Commit Graph

108 Commits

Author SHA1 Message Date
Eric Han
b182f08135 Fix issue in softmax.cu with transformer error when mask seqlen > 1024 (#83639)
Fixes #83142

Adds:
- a test to catch this issue.
- a fix to softmax.cu that broadcasts src_key_padding_mask to the regular attention_mask shape (sketched below).
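A minimal, illustrative PyTorch-level sketch of that broadcast (the actual fix lives in the softmax.cu CUDA kernel): a `(batch, src_len)` key padding mask is expanded to the `(batch, num_heads, tgt_len, src_len)` shape a regular attention mask would have before it is applied to the attention scores. Shapes and sizes are assumptions for illustration.
```
import torch

# Illustrative only -- the real fix is inside the softmax.cu CUDA kernel.
# Expand a (batch, src_len) key padding mask to the (batch, num_heads,
# tgt_len, src_len) shape of a regular attention mask.
batch, num_heads, tgt_len, src_len = 1, 2, 1056, 1056  # seqlen > 1024 hit the bug
key_padding_mask = torch.zeros(batch, src_len, dtype=torch.bool)
key_padding_mask[:, -16:] = True  # True marks padded positions to mask out

attn_mask = key_padding_mask.view(batch, 1, 1, src_len).expand(
    batch, num_heads, tgt_len, src_len
)

scores = torch.randn(batch, num_heads, tgt_len, src_len)
scores = scores.masked_fill(attn_mask, float("-inf"))
attn = torch.softmax(scores, dim=-1)
```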
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83639
Approved by: https://github.com/ngimel
2022-08-30 18:06:27 +00:00
Rui Zhu
e0f2eba93d Move odd num_head in TransformerEncoder to slow_path (#83483)
Summary: an odd nhead is not supported by masked softmax, so we just move it to use the old slow_path.
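A minimal sketch of the routing decision described above; the real dispatch happens inside the C++ fastpath, so the helper name here is hypothetical.
```
import torch.nn as nn

# Hypothetical helper illustrating the rule: an odd number of heads is not
# supported by the masked-softmax fastpath, so such configs fall back to the
# old slow path.
def can_use_masked_softmax_fastpath(num_heads: int) -> bool:
    return num_heads % 2 == 0

layer = nn.TransformerEncoderLayer(d_model=384, nhead=3, batch_first=True)
assert not can_use_masked_softmax_fastpath(layer.self_attn.num_heads)  # nhead=3 -> slow path
```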

Test Plan: CI

Differential Revision: D38720086

Pull Request resolved: https://github.com/pytorch/pytorch/pull/83483
Approved by: https://github.com/erichan1
2022-08-20 10:02:08 +00:00
Yoav Navon
dfc97df64d Add fastpath test for mask check flag (#82999)
Summary: Check that the fastpath is taken, and which type (sparsity fastpath or normal), for a mask that is aligned and for one that is not.
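A rough, hypothetical sketch of the two input cases; the actual test (in caffe2/test:test_transformers) asserts which path the encoder takes, which is not shown here. "Aligned" is read as a padding mask whose padded positions form a contiguous suffix in each row, which is what the sparsity (nested-tensor) fastpath expects; that reading is an assumption.
```
import torch
import torch.nn as nn

# Inputs for the aligned vs. unaligned mask cases; the real test checks which
# fastpath is taken internally.
layer = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=2, enable_nested_tensor=True).eval()

src = torch.randn(2, 6, 64)
aligned_mask = torch.tensor([[False, False, False, False, True, True],
                             [False, False, False, False, False, True]])
unaligned_mask = torch.tensor([[False, True, False, False, True, True],
                               [False, False, False, False, False, True]])

with torch.no_grad():
    out_aligned = encoder(src, src_key_padding_mask=aligned_mask)
    out_unaligned = encoder(src, src_key_padding_mask=unaligned_mask)
```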

Test Plan: buck test caffe2/test:test_transformers

Differential Revision: D38259928

Pull Request resolved: https://github.com/pytorch/pytorch/pull/82999
Approved by: https://github.com/jbschlosser
2022-08-12 00:04:45 +00:00
Joel Benjamin Schlosser
6ca95547ac Initial private SDP interface and naive composite impl (#81956)
Adds an initial private API version of the SDP interface.

Signature:
```
_scaled_dot_product_attention(Tensor query, Tensor key, Tensor value, Tensor? attn_mask=None,
    float dropout_p=0.0, bool need_attn_weights=True, bool is_causal=False) -> (Tensor, Tensor)
```

Returns a tuple of `(output, attn_weights)`.

Note the following:
* `need_attn_weights`: flag indicating that attention weights should be computed. This is useful to toggle off for flash attention as it does not materialize the weights by default, making it more expensive to return them.
* Boolean attention mask support only; `True` values within `attn_mask` indicate that the element should take part in attention (notably, this is the reverse of MHA, which uses `True` to mask *out* values). The mask is optional.
* `is_causal`: Temporary flag indicating whether to use a causal attention weighting. If this is set to `True`, it takes precedence over any value passed in for `attn_mask`. Longer term, the `is_causal` flag can be subsumed into the `attn_mask` arg via tensor subclassing (see e.g. [CausalTensor](https://github.com/facebookresearch/xformers/blob/sparse_cleanup/xformers/sparse/causal_tensor.py) in xFormers).
* Testing is currently done by comparison against the existing Python impl of `F._scaled_dot_product_attention`.
* This PR does not yet drop the new SDP in anywhere; a future PR can hook it up in BT or MHA. A usage sketch of the signature above is included below.
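The sketch below shows how a call to the signature above might look, assuming the op is reachable as `F._scaled_dot_product_attention` (the Python impl mentioned for testing) with the signature and mask semantics quoted in this message; the exact exposure point and shapes are assumptions.
```
import torch
import torch.nn.functional as F

# Usage sketch; assumes the interface above is callable as
# F._scaled_dot_product_attention. Shapes are illustrative.
batch, heads, seq_len, head_dim = 2, 4, 8, 16
q = torch.randn(batch, heads, seq_len, head_dim)
k = torch.randn(batch, heads, seq_len, head_dim)
v = torch.randn(batch, heads, seq_len, head_dim)

# Boolean mask where True means "take part in attention" (reverse of MHA).
attn_mask = torch.ones(seq_len, seq_len, dtype=torch.bool).tril()

output, attn_weights = F._scaled_dot_product_attention(
    q, k, v, attn_mask=attn_mask, dropout_p=0.0,
    need_attn_weights=True, is_causal=False,
)
```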
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81956
Approved by: https://github.com/drisspg, https://github.com/erichan1
2022-08-01 22:26:18 +00:00
PyTorch MergeBot
26776d628c Revert "Initial private SDP interface and naive composite impl (#81956)"
This reverts commit f15c5bf133.

Reverted https://github.com/pytorch/pytorch/pull/81956 on behalf of https://github.com/janeyx99 due to broke all configs on test_scaled_dot_product_attention (__main__.TestNestedTensorAutograd) f15c5bf133
2022-07-27 18:36:54 +00:00
Joel Benjamin Schlosser
f15c5bf133 Initial private SDP interface and naive composite impl (#81956)
Adds an initial private API version of the SDP interface.

Signature:
```
_scaled_dot_product_attention(Tensor query, Tensor key, Tensor value, Tensor? attn_mask=None,
    float dropout_p=0.0, bool need_attn_weights=True, bool is_causal=False) -> (Tensor, Tensor)
```

Returns a tuple of `(output, attn_weights)`.

Note the following:
* `need_attn_weights`: flag indicating that attention weights should be computed. This is useful to toggle off for flash attention as it does not materialize the weights by default, making it more expensive to return them.
* Boolean attention mask support only; `True` values within `attn_mask` indicate that the element should take part in attention (notably, this is the reverse of MHA, which uses `True` to mask *out* values). The mask is optional.
* `is_causal`: Temporary flag indicating whether to use a causal attention weighting. If this is set to `True`, it takes precedence over any value passed in for `attn_mask`. Longer term, the `is_causal` flag can be subsumed into the `attn_mask` arg via tensor subclassing (see e.g. [CausalTensor](https://github.com/facebookresearch/xformers/blob/sparse_cleanup/xformers/sparse/causal_tensor.py) in xFormers).
* Testing is currently done by comparison against the existing Python impl of `F._scaled_dot_product_attention`.
* This PR does not yet drop the new SDP in anywhere; a future PR can hook it up in BT or MHA.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81956
Approved by: https://github.com/drisspg, https://github.com/erichan1
2022-07-27 15:41:45 +00:00
Eric Han
23088fcfdf disable src mask for transformer and multiheadattention fastpath (#81277)
Disable the fastpath if a src_mask is passed to TransformerEncoderLayer or MultiheadAttention (see the sketch below).
- Refactored test_transformerencoder from test_nn.py to test_transformers.py. Added a src_mask test there.
- Added a specific src_mask test in test_transformers.py
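A minimal sketch of the src_mask case the test covers; with a mask present, the layer is expected to take the non-fastpath route. Shapes and values are illustrative.
```
import torch
import torch.nn as nn

# Passing src_mask (a regular attention mask) should keep the layer off the
# fastpath, per the change described above.
layer = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True).eval()
src = torch.randn(2, 5, 64)
src_mask = torch.triu(torch.ones(5, 5, dtype=torch.bool), diagonal=1)  # causal mask

with torch.no_grad():
    out = layer(src, src_mask=src_mask)
```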

Fixes https://github.com/pytorch/pytorch/issues/81129

Pull Request resolved: https://github.com/pytorch/pytorch/pull/81277
Approved by: https://github.com/zrphercule
2022-07-15 20:55:17 +00:00
Eric Han
06274d7a48 Add test for torchscripting nn.TransformerEncoder, including fast path (#79796)
Summary:
Add a test just to check whether TransformerEncoder crashes when enumerating over the params [with_no_grad, use_torchscript, training].

The motivation is that the TransformerEncoder fast path (so with_no_grad=True) combined with use_torchscript=True would crash with the error that NestedTensor doesn't have a size. This happened because the TransformerEncoder fast path automatically generates a NestedTensor as a perf optimization, and TorchScript attempts to query intermediate tensor sizes while it optimizes; since NestedTensor has not implemented a size method, things fail.
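A minimal sketch of that scenario: a scripted TransformerEncoder run under no_grad with a padding mask, so the NestedTensor fast path is taken. Parameter values are illustrative.
```
import torch
import torch.nn as nn

# Script the encoder, then run it in inference mode so the fast path converts
# the padded input to a NestedTensor internally.
layer = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=2, enable_nested_tensor=True).eval()
scripted = torch.jit.script(encoder)

src = torch.randn(2, 6, 64)
src_key_padding_mask = torch.tensor([[False, False, False, False, True, True],
                                     [False, False, False, False, False, False]])

with torch.no_grad():
    out = scripted(src, src_key_padding_mask=src_key_padding_mask)
```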

This test goes together with this fix https://github.com/pytorch/pytorch/pull/79480

Test Plan:
```
buck build --show-output mode/opt -c fbcode.enable_gpu_sections=true -c fbcode.nvcc_arch=a100 mode/inplace  //caffe2/test:transformers

./fbcode/buck-out/gen/caffe2/test/transformers#binary.par
```
The test runs and passes together with the changes from the PR above (I made another diff on top of this one with those changes); it does not pass without the fix.

Reviewed By: mikekgfb

Differential Revision: D37222923

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79796
Approved by: https://github.com/zrphercule
2022-06-17 22:00:49 +00:00