pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
Aniket Patil	6f06832219	Fixed typo in activation.py (#111358 ) liner -> linear Pull Request resolved: https://github.com/pytorch/pytorch/pull/111358 Approved by: https://github.com/mikaylagawarecki	2023-10-16 20:36:55 +00:00
Mikayla Gawarecki	48b1208e05	Disable nn.MHA fastpath for floating point masks (#107641 ) Fixes https://github.com/pytorch/pytorch/issues/107084 by disabling the fast path when floating point masks (which should be additive) are passed - [We claim in our docs for MHA that float masks will be added to the attention](https://pytorch.org/docs/stable/generated/torch.nn.MultiheadAttention.html) (be it `key_padding_mask` or `attn_mask`) - We always canonicalize any mask at the start of MHA in python by converting it to float - my understanding from Driss is that SDPA properly supports additive masking (but there are many special cases for mask shape for MHA that don't work properly currently (BxT, TxT)), so [we're turning this off for now](https://github.com/pytorch/pytorch/blob/main/aten/src/ATen/native/transformers/cuda/attention.cu#L531-L532) - More broadly, the problem isn't with the SDPA path, but that things are broken for the path it falls back to - Right now mha "fast path" code with non-None masks is always going through [this path](https://github.com/pytorch/pytorch/blob/main/aten/src/ATen/native/transformers/cuda/attention.cu#L554-L640) that has a call to `masked_softmax` that [converts the masks back to bool](https://github.com/pytorch/pytorch/blob/main/aten/src/ATen/native/transformers/attention.cpp#L154-L156) - the implication here is that additive floating point attn_mask and additive key_padding_mask to nn.MHA fastpath are broken - This wasn't broken for the user in [https://github.com/pytorch/pytorch/issues/107084](https://github.com/pytorch/pytorch/issues/107084) in 1.13.1 because of [this check which bypassed the fast path if attn_mask was defined](https://github.com/pytorch/pytorch/blob/v1.13.1/torch/nn/modules/activation.py#L1096-L1097) (as Driss pointed out, though, additive key_padding_mask with the fast path was probably broken in 1.13.1) Pull Request resolved: https://github.com/pytorch/pytorch/pull/107641 Approved by: https://github.com/drisspg, https://github.com/jbschlosser	2023-08-23 15:08:18 +00:00
Guang Yang	0b57581dec	[pytorch] Disable fast path in MultiheadAttention in Export (#106824 ) Summary: We are seeing `aten._native_multi_head_attention` op (not in core Aten op set) is left in the exported graph and causes problems in the downstream at runtime. Two proposed solutions: 1. Disable fast path while tracing to leverage the non-optimized path to get decomp, that way, the blamed op won't show up in the exported graph 2. Add a decomp rule for `aten._native_multi_head_attention` After discussing with kimishpatel and bdhirsh, #1 is preferred and verified it could immediately unblock the critical model enablement work for PP. Test Plan: CI Differential Revision: D48169806 Pull Request resolved: https://github.com/pytorch/pytorch/pull/106824 Approved by: https://github.com/kimishpatel	2023-08-10 00:18:37 +00:00
Mikayla Gawarecki	786977c647	[easy] Add reset_parameters for nn.PRelu (#106507 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/106507 Approved by: https://github.com/albanD	2023-08-04 23:22:42 +00:00
Danni Li	5e3aca6c5c	[BE] Input check for torch.nn.MultiheadAttention (#106363 ) Summary: Check `embed_dim` and `num_heads ` of `torch.nn.MultiheadAttention`. Test Plan: Please see GitHub Actions. Differential Revision: D47943134 Fix: #105630 Pull Request resolved: https://github.com/pytorch/pytorch/pull/106363 Approved by: https://github.com/mikaylagawarecki	2023-08-01 23:28:23 +00:00
Michael Gschwind	06dd850dd5	Simplify check (#106044 ) Summary: Simplify check / refactor for readability Test Plan: sandcastle, github Differential Revision: D47800732 Pull Request resolved: https://github.com/pytorch/pytorch/pull/106044 Approved by: https://github.com/mikaylagawarecki	2023-07-27 01:18:25 +00:00
Justin Chu	4cc1745b13	[BE] f-stringify torch/ and scripts (#105538 ) This PR is a follow up on the pyupgrade series to convert more strings to use f-strings using `flynt`. - https://docs.python.org/3/reference/lexical_analysis.html#f-strings - https://pypi.org/project/flynt/ Command used: ``` flynt torch/ -ll 120 flynt scripts/ -ll 120 flynt tools/ -ll 120 ``` and excluded `collect_env.py` Pull Request resolved: https://github.com/pytorch/pytorch/pull/105538 Approved by: https://github.com/ezyang, https://github.com/malfet	2023-07-21 19:35:24 +00:00
Justin Chu	79c5e33349	[BE] Enable ruff's UP rules and autoformat nn/ mps/ and torch/ (#105436 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/105436 Approved by: https://github.com/malfet, https://github.com/albanD	2023-07-21 07:38:46 +00:00
Danni Li	b33d63d97b	[BE] Use ValueError for input.dim check in torch.nn.modules (#105127 ) Summary: Use ValueError for input.dim check instead of Assertion Error. Fix: #104839 Test Plan: Please see GitHub actions. Differential Revision: D47427998 Pull Request resolved: https://github.com/pytorch/pytorch/pull/105127 Approved by: https://github.com/albanD, https://github.com/Skylion007	2023-07-13 23:20:46 +00:00
shibo19	58feefa4ed	add custom device support for special nn.modules (#103419 ) Fixes #103818 1. for some special nn.Modules, there are checks which only support cuda, so I add `privateuse1` check. 2. when get the device type for `privateuse1` by `torch._C._get_privateuse1_backend_name()`, it will get error in `torch.jit.script`, so I add a global variable to avoid this. Pull Request resolved: https://github.com/pytorch/pytorch/pull/103419 Approved by: https://github.com/albanD	2023-06-26 00:58:29 +00:00
MysticalMusings	f1f13a35b0	Fix GELU-related docstring formatting (#102845 ) The docstring about GELU seems formatted incorrectly. The original docstring about GELU is rendered as below: $$ \text{GELU}(x) = 0.5 * x * (1 + \text{Tanh}(\sqrt(2 / \pi) * (x + 0.044715 * x^3))) $$ where the square root of which part is confusing. I double-checked the formula, which should be: $$ \text{GELU}(x) = 0.5 * x * (1 + \text{Tanh}(\sqrt{2 / \pi} * (x + 0.044715 * x^3))) $$ where round brackets in resource code should be brace brackets. > _formula in [original paper](https://arxiv.org/abs/1606.08415)_ > ![Snipaste_2023-06-03_00-43-49](https://github.com/pytorch/pytorch/assets/39690782/22511c4e-2f20-4a16-9bda-4c182a360160) Pull Request resolved: https://github.com/pytorch/pytorch/pull/102845 Approved by: https://github.com/mikaylagawarecki	2023-06-08 20:19:03 +00:00
Michael Gschwind	2361f7f0ce	Update doc strings to make description of is_causal consistent for nn.Transformer and nn.MHA (#101089 ) Summary: Update doc strings to make description of is_causal consistent for nn.Transformer and nn.MHA Test Plan: sandcastle & github CI/CD Differential Revision: D45737197 Pull Request resolved: https://github.com/pytorch/pytorch/pull/101089 Approved by: https://github.com/mikaylagawarecki	2023-05-13 18:14:38 +00:00
Michael Gschwind	29b2745285	Add message about need_weights=False performance profile. (#100396 ) Summary: Add message about need_weights=False/True performance profile. Test Plan: sandcastle Differential Revision: D45446417 Pull Request resolved: https://github.com/pytorch/pytorch/pull/100396 Approved by: https://github.com/albanD	2023-05-01 21:45:41 +00:00
Aaron Gokaslan	e2a3817dfd	[BE] Enable C419 rule for any all shortcircuiting (#99890 ) Apparently https://github.com/pytorch/pytorch/pull/78142 made torch.JIT allow for simple generator expressions which allows us to enable rules that replace unnecessary list comprehensions with generators in any/all. This was originally part of #99280 but I split it off into this PR so that it can be easily reverted should anything break. Pull Request resolved: https://github.com/pytorch/pytorch/pull/99890 Approved by: https://github.com/justinchuby, https://github.com/kit1980, https://github.com/malfet	2023-04-25 15:02:13 +00:00
Michael Gschwind	36e1ae6778	De-select odd numbered heads from nn.MHA fastpath (#99672 ) Summary: https://github.com/pytorch/pytorch/issues/97128 * Add test for mha num_heads %2 != 0 * Fix test * Add test for bias false * show test passes Test Plan: sandcastle Differential Revision: D45161767 Pull Request resolved: https://github.com/pytorch/pytorch/pull/99672 Approved by: https://github.com/ngimel	2023-04-25 00:27:18 +00:00
Felix Divo	70072c926e	Fix MHA doc string (#99146 ) This was missed in #97046 Pull Request resolved: https://github.com/pytorch/pytorch/pull/99146 Approved by: https://github.com/albanD	2023-04-14 15:19:13 +00:00
Lucas Pasqualin	35c6547f02	Adds 3D attn_mask support to merge_masks() for Multihead Attention fast path (#98991 ) Fixes #97409 Adds support for 3D attn_mask by always expanding attn_mask to 4D as per https://github.com/pytorch/pytorch/pull/98375#issuecomment-1499504721 Pull Request resolved: https://github.com/pytorch/pytorch/pull/98991 Approved by: https://github.com/jbschlosser	2023-04-13 20:29:57 +00:00
Michael Gschwind	c757647dd8	[Better Transformer] make is_causal a hint and force attn_mask to be set on `is_causal=True` in F.MHA (#97214 ) Summary: This fixes an issue raised in [is_causal parameter in torch.nn.TransformerEncoderLayer.forward does not work #96941](https://github.com/pytorch/pytorch/issues/96941) where results computed with is_causal do not properly reflect causal masking. In PyTorch 2.0, Accelerated PT Transformers added the is_causal parameter to legacy nn.Transformer* and nn.MHA APIs aligned with and intended to engage the is_causal parameter of the new scaled_dot_product_attention (SDPA) operator. At present is_causal works differently for Transformer* modules, the nn.MHA and F.MHA: * The nn.Transformer* modules treat is_causal as an optional indicator about the format of attn_mask. This is because some layers (such as the CLIP layer use the attention mask in the layer, and thus the attn_mask was a required feature.) * Initially, nn.MHA and F.MHA were defined to align with F.SDPA in behavior: a user may specify either the attention mask, or is_causal, but not both. It seemed to make sense at the time to align SDPA and MHA, esp since there was a larger overlap of parameters which have since changed, e.g., with the removal of need_weights from SDPA. (See below for why this makes sense.) Unfortunately, this does not work because of how MHA was changed to handle the need_weights parameter. When need_weights is present, we do not (any more) call SDPA because support for need_weights was removed from SDPA before the release. The rationale is that need_weights defeats all optimization at the foundation of SDPA performance. Having the flag might thus mislead users into thinking they get good performance and have them disappointed when they enable a legacy feature of MHA which massively degrades performance. (They might not think anything of enabling that, because it is on by default in MHA today, which leads to more issues.) 
Since SDPA does not (no longer) support need_weights, we need to pick a separate path which implements attention using a set of discrete operations that allocates a tensor for weights. Alas, this code path does not have support for is_causal, because attention is implemented as matmul and using the attention mask. Thus, is_causal has no impact. (A substantially similar situation arises with how kpm is implemented today because Nested Tensors are not supported by torch.compile() in 2.0.) This problem was masked because all uses of legacy nn.MHA (and F.MHA) come through nn.Transformer* which called self-attention (i.e., nn.MHA) only ever with the attention mask attn_mask, and never with is_causal, a missed optimization opportunity that would have been addressed in a future performance update. Regrettably, always calling nn.MHA with attn_mask prevented diagnosing of the issue of not having a suitable attention mask when need_weights support was dropped from SDPA and a discrete implementation of attention was added for that scenario, and for the execution path with key_padding_mask. We have two options to address this issue: Solution 1: Whenever nn.MHA and F.MHA are executed with is_causal set, we internally create a causal mask at significant expense of allocating a tensor and filling it with a triangular causal matrix. This increases memory usage, and runtime, for allocating a causal mask. To add insult to injury, in all current (and likely future) execution scenarios, MHA is called by a model using the nn.Transformer API which already has that matrix and passes it from nn.module to nn.module. Then the passing in of attn_mask has to be suppressed by nn.TransformerEncoderLayer, only for nn.MHA to immediately allocate the very same tensor again to satisfy the requirement to have an attention mask for the computation. (We expect new use cases to use SDPA directly.) 
Solution 2: We align the behavior of nn.MHA and F.MHA with the rest of the existing nn.Transformer API, and require the attention mask to be passed into nn.MHA in addition to is_causal as an optional indicator about the nature of the attention mask rather than as an alternative to attn_mask. Then, when we choose the code path for processing MHA with need_weights or a key_padding_mask, we have the attn_mask passed down through the nn.Transformer* hierarchy, without the added overhead of allocating an attention mask as in Solution 1. This PR implements solution 2 which offers better performance and in retrospect aligns MHA better with the rest of the Transformer modules as the definition of SDPA evolved into a more streamlined high-performance operator. It ostensibly changes how is_causal works, by requiring the attention mask to be specified. However, as described here, and as shown in the submitted issue, is_causal is not working as intended today, so it requires a change regardless. In that sense, a change in API does not occur per se, as the current implementation is not working, and a change has to occur either way to resolve the submitted issue, breaking any use cases that depend on the current implementation. Checks exist (and more can be added) that flag any scenarios where is_causal is passed as True, but no attention mask is provided, ensuring that there's no quiet change from even the faulty behavior present in 2.0. As an upside, the present implementation will improve performance by addressing the passing of the is_causal flag from Transformer modules to MHA, speeding up training for these examples, e.g., finetuning BERT, RoBERTa, XLM-R models. Differential Revision: D44245725 Pull Request resolved: https://github.com/pytorch/pytorch/pull/97214 Approved by: https://github.com/albanD	2023-03-25 01:36:30 +00:00
Michael Gschwind	b132220309	Update MHA doc string (#97046 ) Summary: Update MHA doc string Test Plan: sandcastle & github Differential Revision: D44179519 Pull Request resolved: https://github.com/pytorch/pytorch/pull/97046 Approved by: https://github.com/voznesenskym	2023-03-18 02:14:59 +00:00
Michael Gschwind	61cb544397	Align mask formatting of both masks more closely (#96286 ) Summary: Align mask formatting of both masks more closely Test Plan: sandcastle & github Differential Revision: D43878634 Pull Request resolved: https://github.com/pytorch/pytorch/pull/96286 Approved by: https://github.com/cpuhrsch	2023-03-11 02:18:05 +00:00
SiriusNEO	a7689e73f6	[Docs] Document of RReLU about its different behavior between training and evaluation (#95624 ) Current document of [Randomized Leaky ReLU (RReLU)](https://pytorch.org/docs/stable/generated/torch.nn.RReLU.html#torch.nn.RReLU) does not demonstrate its different behavior between training and evaluation. This PR adds illustrations about this. Fixes #95605. Pull Request resolved: https://github.com/pytorch/pytorch/pull/95624 Approved by: https://github.com/albanD, https://github.com/H-Huang	2023-03-10 22:33:24 +00:00
Michael Gschwind	03b6e6979c	Transformers: fix src and key padding mask bool regression (#96009 ) Summary: fix src and pad mask bool regression This fixes a regression introduced previously with #92733. That PR unified testing of masks to remove Byte Tensors as permissible mask, introduced mask compatibility check, and mask conversion to FP mask. The problem addressed in this PR was that after the first mask had been converted, a check for mask compatibility would fail. Test Plan: sandcastle & github Differential Revision: D43782858 Fixes https://github.com/pytorch/pytorch/issues/95702 Pull Request resolved: https://github.com/pytorch/pytorch/pull/96009 Approved by: https://github.com/malfet	2023-03-05 01:50:46 +00:00
Michael Gschwind	4fada6eb95	MHA torch.jit.script fix for in_proj_weight = None (#95653 ) Summary: MHA fix to support in_proj_weight being None Test Plan: sandcastle Differential Revision: D43628206 Pull Request resolved: https://github.com/pytorch/pytorch/pull/95653 Approved by: https://github.com/davidberard98, https://github.com/cpuhrsch	2023-02-28 17:29:29 +00:00
albanD	b7e1477e9b	Improve leaky relu doc (#94090 ) Fixes #83821 Pull Request resolved: https://github.com/pytorch/pytorch/pull/94090 Approved by: https://github.com/jbschlosser	2023-02-14 17:58:51 +00:00
Xuehai Pan	5b1cedacde	[BE] [2/3] Rewrite `super()` calls in functorch and torch (#94588 ) Rewrite Python built-in class `super()` calls. Only non-semantic changes should be applied. - #94587 - #94588 - #94592 Also, methods with only a `super()` call are removed: ```diff class MyModule(nn.Module): - def __init__(self): - super().__init__() - def forward(self, ...): ... ``` Some cases that change the semantics should be kept unchanged. E.g.: `f152a79be9/caffe2/python/net_printer.py (L184-L190)` `f152a79be9/test/test_jit_fuser_te.py (L2628-L2635)` Pull Request resolved: https://github.com/pytorch/pytorch/pull/94588 Approved by: https://github.com/ezyang, https://github.com/albanD	2023-02-10 21:16:33 +00:00
Tri Dao	ffb3561caa	[Docs] Add pointer to FlashAttention paper (#94253 ) As discussed with @drisspg, we're adding pointers to the docs for MHA and Transformers. Pull Request resolved: https://github.com/pytorch/pytorch/pull/94253 Approved by: https://github.com/drisspg, https://github.com/malfet	2023-02-07 08:05:10 +00:00
Michael Gschwind	7265f60ad0	Regularize mask handling for attn_mask and key_padding_mask (#92733 ) Summary: Regularize mask handling for attn_mask and key_padding_mask * Update documentation to remove reference to byte masks (which were deprecated long ago) * Introduce check and warn about deprecation if attn_mask and key_padding_mask types mismatch * Convert all masks to float before combining * Combine by adding Test Plan: sandcastle & github CI Differential Revision: D42653215 Pull Request resolved: https://github.com/pytorch/pytorch/pull/92733 Approved by: https://github.com/ngimel, https://github.com/drisspg	2023-01-24 14:12:05 +00:00
joncrall	ad782ff7df	Enable xdoctest runner in CI for real this time (#83816 ) Builds on #83317 and enables running the doctests. Just need to figure out what is causing the failures. Pull Request resolved: https://github.com/pytorch/pytorch/pull/83816 Approved by: https://github.com/ezyang, https://github.com/malfet	2022-12-29 05:32:42 +00:00
Michael Gschwind	512ec181ec	Introduce causal mask (#90508 ) Summary: Introduce causal mask This PR introduces a causal mask option _causal_mask (as well as causal mask detection if attn_mask is provided), since current custom kernels do not support arbitrary masks. Test Plan: sandcastle & github ci/cd Differential Revision: D41723137 Pull Request resolved: https://github.com/pytorch/pytorch/pull/90508 Approved by: https://github.com/albanD	2022-12-16 21:39:42 +00:00
Michael Gschwind	1f88b208ac	Fix cuda/cpu check on NoneType (Unit test) (#88970 ) Summary: Fix cuda/cpu check on NoneType (unit test) Test Plan: sandcastle/github CI/CD Differential Revision: D41208798 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88970 Approved by: https://github.com/Skylion007, https://github.com/cpuhrsch	2022-11-15 01:25:19 +00:00
Michael Gschwind	ee91c328da	Fix cuda/cpu check on NoneType (#88854 ) Summary: Fix cuda/cpu check on NoneType Test Plan: sandcastle/github CI/CD Differential Revision: D41203955 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88854 Approved by: https://github.com/drisspg, https://github.com/ngimel	2022-11-11 12:19:31 +00:00
Emil Lynegaard	9d09968bbe	Disable check for dropout in MultiheadAttention fast_path (#88831 ) Since we already enforce eval mode for the fast_path, we do not need to also check for a falsy dropout value, as a model trained with dropout will have a non-zero dropout during eval mode, even though it won't be applied. Fixes #88806 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88831 Approved by: https://github.com/drisspg	2022-11-11 03:34:57 +00:00
Grigory Sizov	7ad87f63e2	Support src_mask and src_key_padding_mask for Better Transformer (#88488 ) Fixes T135842750 (follow-up for #87377) ## Description At present, having both `src_key_padding_mask` and `src_mask` at the same time is not supported on the fastpath in Transformer and Multi-Head Attention. This PR enables using both masks on the fastpath on CPU and GPU: if both masks are passed, we merge them into a 4D mask in Python and change mask type to 2 before passing downstream. Downstream processing in native code is not changed, as it already supports 4D mask. Indeed, it is done depending on the device: - on CUDA, by `SoftMax.cu::masked_softmax_cuda`. When mask type is 2, it calls either `dispatch_softmax_forward` -> `softmax_warp_forward` or `at::softmax` (depending on the input size). In both cases 4D mask is supported. - on CPU, by `SoftMax.cpp::masked_softmax_cpu`. It calls `host_softmax` which supports 4D mask. ## Tests - Extended `test_mask_check_fastpath` to check that fast path is indeed taken in Transformer when two masks are passed - Added `test_multihead_self_attn_two_masks_fast_path_mock` to check that fast path is taken in MHA when two masks are passed - Added `test_multihead_self_attn_two_masks_fast_path` to check that fast and slow paths give the same result when two masks are passed in MHA - `test_masked_softmax_mask_types` now covers mask type 2 - `test_transformerencoderlayer_fast_path` (CPU smoke test) is expanded to the case of both masks provided simultaneously - `test_masked_softmax_devices_parity` checks that mask type 2 is accepted by CPU and CUDA paths Pull Request resolved: https://github.com/pytorch/pytorch/pull/88488 Approved by: https://github.com/mikekgfb	2022-11-10 08:12:56 +00:00
Kazuaki Ishizaki	2ddefbdc3c	Fix typos used in documents under torch directory (#88300 ) This PR fixes typos, in comments of Python files, that are found from a search box at https://pytorch.org/docs/master/search.html Pull Request resolved: https://github.com/pytorch/pytorch/pull/88300 Approved by: https://github.com/lezcano	2022-11-02 09:38:13 +00:00
Grigory Sizov	4c78c7c82a	Enable `src_mask` in fast path of `TransformerEncoderLayer` (#87377 ) ## Issues Fixes https://github.com/pytorch/pytorch/issues/81129#issuecomment-1179435674 ## Description Passing a 2D attention mask `src_mask` into the fast path of `TransformerEncoderLayer` in CPU was causing an error and so was disabled in https://github.com/pytorch/pytorch/pull/81277. This PR unrolls this fix, enabling `src_mask` on the fast path: - Either attention mask `src_mask` of shape `(L, L)` or padding mask `src_key_padding_mask` of shape `(B, L)` are now allowed on the CPU fast path. If softmax is applied along the last dimension (as in multi-head attention), these masks are processed without expanding them to 4D. Instead, when iterating through the input, `Softmax.cpp::host_softmax` converts the index to match the mask dimensions, depending on the type. - If softmax is applied along the dimension other than the last, `Softmax.cpp::masked_softmax_cpu` expands masks to 4D, converting them to `mask_type=2`. 
Theoretically one could also add special optimized cases for `dim=0, 1, 2` and process them without mask expansion, but I don't know how often is that used ## Tests: - `test_transformerencoderlayer_fast_path` is extended to cover both attention mask and padding mask - `test_masked_softmax_mask_types_0_1` is added to ensure results from CPU softmax with attention and padding masks match the explicit slow calculation - `test_masked_softmax_devices_parity` is added to ensure results from masked softmax on CPU and CUDA match ## Note I had to replace `float` with `torch.get_default_dtype()` in a couple of tests for the following reason: - `test_nn.py` [sets the default type to `torch.double`](https://github.com/pytorch/pytorch/blob/master/test/test_nn.py#L24-L26) - If I execute `test_nn.py` and `test_transformers.py` in one `pytest` run, this default still holds for transformer tests - Some tests in `test_transformers.py` which were previously following the slow path now switched to fast path, and hard-coded `float` started clashing with default `double` Let me know if there is a better way around it - or maybe I'm not supposed to run tests with `pytest` like this Pull Request resolved: https://github.com/pytorch/pytorch/pull/87377 Approved by: https://github.com/mikekgfb, https://github.com/weiwangmeta, https://github.com/malfet	2022-10-31 19:59:36 +00:00
Rui Zhu	4b757f4633	Assert if padding mask type is unexpected (#86353 ) (#87106 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/86353 Fix the issue described in https://github.com/pytorch/pytorch/issues/86120 Test Plan: buck test mode/opt caffe2/test:test_transformers -- test_train_with_long_type_pad Differential Revision: D40129968 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87106 Approved by: https://github.com/malfet	2022-10-20 16:01:54 +00:00
Animesh Jain	2f8cfb74af	Fix gelu repr (#85790 ) Fixes https://github.com/pytorch/torchdynamo/issues/1378 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85790 Approved by: https://github.com/ezyang	2022-09-28 18:35:51 +00:00
Michael Gschwind	d7029fea51	Remove TS compatibility transition code (#85003 ) Summary: Remove TS compatibility transition code Test Plan: sandcastle Differential Revision: D39494677 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85003 Approved by: https://github.com/erichan1	2022-09-21 02:07:13 +00:00
Abhijit Deo	6d222116a1	[Documentation] Minor rendering issue (#84856 ) There is a Rendering issue with the docstring of nn.GELU. Hope this fixes the [issue.](https://pytorch.org/docs/stable/generated/torch.nn.GELU.html) cc: @malfet Pull Request resolved: https://github.com/pytorch/pytorch/pull/84856 Approved by: https://github.com/kit1980	2022-09-13 00:29:52 +00:00
Eric Han	7a5d5a0020	Disable Transformer/MHA fast path when autocast is enabled (#84722 ) Differential Revision: D39362298 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84722 Approved by: https://github.com/cpuhrsch	2022-09-09 01:15:24 +00:00
joncrall	b136f3f310	More doctest refinements. (#83317 ) Follow up to #82797 Now that the doctests themselves are in a better state, we should be able to enable xdoctest on the CI so they stay that way. @ezyang @vadimkantorov Pull Request resolved: https://github.com/pytorch/pytorch/pull/83317 Approved by: https://github.com/ezyang	2022-08-22 20:07:26 +00:00
Rui Zhu	e0f2eba93d	Move odd num_head in TransformerEncoder to slow_path (#83483 ) Summary: odd nhead is not supported for masked softmax, therefore we just move it to use old slow_path Test Plan: CI Differential Revision: D38720086 Pull Request resolved: https://github.com/pytorch/pytorch/pull/83483 Approved by: https://github.com/erichan1	2022-08-20 10:02:08 +00:00
Michael Gschwind	d589aa531f	TS jit 2 week compatibility window for new TEL forward() (#83467 ) Summary: TS jit 2 week compatibility window for new TEL forward() Test Plan: sandcastle Differential Revision: D38711177 Pull Request resolved: https://github.com/pytorch/pytorch/pull/83467 Approved by: https://github.com/erichan1, https://github.com/jbschlosser	2022-08-16 16:53:10 +00:00
joncrall	4618371da5	Integrate xdoctest - Rebased (#82797 ) This is a new version of #15648 based on the latest master branch. Unlike the previous PR where I fixed a lot of the doctests in addition to integrating xdoctest, I'm going to reduce the scope here. I'm simply going to integrate xdoctest, and then I'm going to mark all of the failing tests as "SKIP". This will let xdoctest run on the dashboards, provide some value, and still let the dashboards pass. I'll leave fixing the doctests themselves to another PR. In my initial commit, I do the bare minimum to get something running with failing dashboards. The few tests that I marked as skip are causing segfaults. Running xdoctest results in 293 failed, 201 passed tests. The next commits will be to disable those tests. (unfortunately I don't have a tool that will insert the `#xdoctest: +SKIP` directive over every failing test, so I'm going to do this mostly manually.) Fixes https://github.com/pytorch/pytorch/issues/71105 @ezyang Pull Request resolved: https://github.com/pytorch/pytorch/pull/82797 Approved by: https://github.com/ezyang	2022-08-12 02:08:01 +00:00
Nicolas Macchioni	b236352036	Add mask identifier for multiplexed src_mask/src_key_padding_mask in BT (#81947 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/81947 Transformer fastpath multiplexes two arguments, src_mask [seq_len x seq_len] and src_key_padding_mask [batch_size x seq_len], and later deduces the type based on mask shape. In the event that batch_size == seq_len, any src_mask is wrongly interpreted as a src_key_padding_mask. This is fixed by requiring a mask_type identifier be supplied whenever batch_size == seq_len. Additionally, added support for src_mask in masked_softmax CPU path. Test Plan: existing unit tests + new unit tests (batch_size == seq_len) Differential Revision: D37932240 Pull Request resolved: https://github.com/pytorch/pytorch/pull/81947 Approved by: https://github.com/zrphercule	2022-08-09 23:42:16 +00:00
ProGamerGov	357b7d589c	Fix docstring inconsistencies: string -> str, boolean -> bool (#82410 ) ### Description Throughout the PyTorch docs and codebase, the `string` type in docstrings is referred to by two separate names. This leads to inconsistent docs, like you can see here: https://pytorch.org/docs/stable/generated/torch.nn.Conv3d.html#torch.nn.Conv3d This PR fixes this issue by ensuring that all mentions of the string type in docstrings, are using the same format that Sphinx generates hyperlinks for. ### Testing No testing should be required for this change Pull Request resolved: https://github.com/pytorch/pytorch/pull/82410 Approved by: https://github.com/jbschlosser	2022-07-28 21:29:57 +00:00
kylematoba	66cf1b6459	correct argument name in docs (#81485 ) Recently introduced `average_attn_weights` argument is documented incorrectly. Pull Request resolved: https://github.com/pytorch/pytorch/pull/81485 Approved by: https://github.com/albanD	2022-07-20 20:07:16 +00:00
Eric Han	23088fcfdf	disable src mask for transformer and multiheadattention fastpath (#81277 ) Disable fastpath if src_mask passed to TransformerEncoderLayer and MultiheadAttention. - Refactored test_transformerencoder from test_nn.py to test_transformers.py. Added a src_mask test there. - Added a specific src_mask test in test_transformers.py Fixes https://github.com/pytorch/pytorch/issues/81129 Pull Request resolved: https://github.com/pytorch/pytorch/pull/81277 Approved by: https://github.com/zrphercule	2022-07-15 20:55:17 +00:00
anjali411	bda04e9f5e	Add __all__ for torch.optim and torch.nn.modules modules (#80237 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/80237 Approved by: https://github.com/albanD	2022-06-24 21:34:10 +00:00
Shawn Zhong	c1516e0c8d	Fix LeakyReLU spelling (#79102 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/79102 Approved by: https://github.com/albanD	2022-06-08 15:39:21 +00:00
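
Note on the mask-handling commits above (#92733, #88488, #107641): several of them rely on the idea that boolean masks are converted to float and then combined by adding. The following is a minimal pure-Python sketch of that idea, not PyTorch's implementation. The helper names `to_additive` and `combine_masks_sketch` are hypothetical, nested lists stand in for tensors, and the sketch assumes the nn.MultiheadAttention convention that `True` in a boolean mask means "do not attend".

```python
NEG_INF = float("-inf")


def to_additive(mask):
    """Convert a 2D boolean mask (True = masked out) into a float additive mask."""
    return [[NEG_INF if blocked else 0.0 for blocked in row] for row in mask]


def combine_masks_sketch(attn_mask, key_padding_mask):
    """Combine an (L, S) attn_mask and a (B, S) key_padding_mask by addition.

    The result has shape (B, L, S): each batch element gets the attention
    mask with its own padding row added to every query row.
    """
    attn = to_additive(attn_mask)
    padding = to_additive(key_padding_mask)
    return [
        [[a + p for a, p in zip(attn_row, pad_row)] for attn_row in attn]
        for pad_row in padding
    ]


# Causal mask over 3 positions; the second batch element has its last key padded.
causal = [
    [False, True, True],
    [False, False, True],
    [False, False, False],
]
padding = [
    [False, False, False],
    [False, False, True],
]
merged = combine_masks_sketch(causal, padding)
```

Because a position blocked by either mask ends up at negative infinity after the addition, the subsequent softmax assigns it zero weight; that is why a path that converts an additive float mask back to bool, as described in #107641, can change results.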

213 Commits