Summary:
All of the pooling modules except MaxUnpool and LPPool return either a
Tensor or a Tuple[Tensor, Tensor]. The current type annotations are
inaccurate and prevent scripting the module when return_indices is set
to True.
There's no clean way to make this agree with mypy, because the correct
overload depends on the value of return_indices, an attribute.
I tried changing the annotations from `Tensor` to
`Union[Tensor, Tuple[Tensor, Tensor]]`, but that breaks a bunch of uses
that have return_indices=False.
For example, this breaks:
https://github.com/pytorch/pytorch/blob/4e94e84f65/torch/nn/modules/container.py#L139
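To illustrate the constraint, here is a minimal, hypothetical sketch (a toy module, not the actual pooling code) of why the `Union` annotation pushes a type error onto callers:
```
from typing import Tuple, Union

import torch
from torch import Tensor

class ToyPool(torch.nn.Module):
    """Toy stand-in for a pooling module whose return type depends on an attribute."""

    def __init__(self, return_indices: bool = False) -> None:
        super().__init__()
        self.return_indices = return_indices

    def forward(self, x: Tensor) -> Union[Tensor, Tuple[Tensor, Tensor]]:
        out, idx = x.max(dim=-1)
        return (out, idx) if self.return_indices else out

pool = ToyPool()  # return_indices=False, so a plain Tensor comes back at runtime...
y = pool(torch.randn(2, 3))
# ...but mypy only sees the Union, so expressions like `y + 1` fail to
# type-check: the correct overload depends on the runtime value of
# return_indices, which mypy cannot express for an attribute.
```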
This PR also cleans up how test names are constructed in test_jit;
previously we got name collisions when there were two tests on the
same nn.Module.
Fixes https://github.com/pytorch/pytorch/issues/45904
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65847
Reviewed By: ZolotukhinM
Differential Revision: D31462517
Pulled By: eellison
fbshipit-source-id: 6f9e8df1be6c75e5e1e9bae07cf3ad3603ba59bd
Summary:
Partially fixes https://github.com/pytorch/pytorch/issues/57505
Also fixes a warning I found when compiling:
```
/home/gaoxiang/pytorch-cub/torch/csrc/distributed/c10d/quantization/quantization_gpu.cu(7): warning: inline qualifier ignored for "__global__" function
```
I also updated the bfloat16 guard to CUDA 11.5
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64498
Reviewed By: mruberry
Differential Revision: D30917077
Pulled By: ngimel
fbshipit-source-id: fb9df08fd469038478a563014b5af7452b4b28c0
Summary:
Fixes https://github.com/pytorch/pytorch/issues/11959
This is an alternative to creating a new `CrossEntropyLossWithSoftLabels` class: this PR simply adds support for "soft targets" (i.e. class probabilities) to the existing `CrossEntropyLoss` and `NLLLoss` classes.
The implementation is simple and unoptimized for now, but future work can add higher-performance kernels for this case.
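As a usage sketch of the new behavior (toy shapes, assuming the post-PR `CrossEntropyLoss`):
```
import torch
import torch.nn as nn

loss_fn = nn.CrossEntropyLoss()
logits = torch.randn(3, 5, requires_grad=True)  # N=3 samples, C=5 classes

# Hard targets (unchanged): class indices of shape (N,)
hard_targets = torch.tensor([1, 0, 4])
print(loss_fn(logits, hard_targets))

# Soft targets (new): class probabilities of shape (N, C), rows summing to 1
soft_targets = torch.softmax(torch.randn(3, 5), dim=1)
print(loss_fn(logits, soft_targets))
```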
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61044
Reviewed By: zou3519
Differential Revision: D29876894
Pulled By: jbschlosser
fbshipit-source-id: 75629abd432284e10d4640173bc1b9be3c52af00
Summary:
Fixes the Python part of https://github.com/pytorch/pytorch/issues/60747
Enhances the Python versions of `Transformer`, `TransformerEncoderLayer`, and `TransformerDecoderLayer` to accept callables as their activation functions. The old way of specifying the activation function as a string still works as well.
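For example, both styles below should now work (a quick sketch; the `d_model`/`nhead` values are arbitrary):
```
import torch.nn.functional as F
from torch import nn

# Old style: activation given as a string
layer_str = nn.TransformerEncoderLayer(d_model=512, nhead=8, activation="gelu")

# New style: activation given as a callable
layer_fn = nn.TransformerEncoderLayer(d_model=512, nhead=8, activation=F.gelu)
```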
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61355
Reviewed By: bdhirsh
Differential Revision: D29967302
Pulled By: jbschlosser
fbshipit-source-id: 8ee6f20083d49dcd3ab432a18e6ad64fe1e05705
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59987
As with GroupNorm, improve the numerical stability of LayerNorm using the Welford algorithm and pairwise summation.
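For reference, here is an illustrative pure-Python sketch of the Welford update (the actual kernels are C++/CUDA, so this is only a model of the algorithm):
```
def welford_mean_var(xs):
    """One-pass, numerically stable mean/variance (Welford's algorithm)."""
    mean, m2 = 0.0, 0.0  # m2 accumulates squared deviations from the running mean
    for n, x in enumerate(xs, start=1):
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)  # note: uses the freshly updated mean
    return mean, m2 / len(xs)  # population variance, as normalization layers use

# Pairwise summation (the other half of the change) likewise bounds rounding
# error by summing halves recursively instead of one long left-to-right chain.
```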
Test Plan: buck test mode/dev-nosan //caffe2/test:nn -- "LayerNorm"
Reviewed By: ngimel
Differential Revision: D29115235
fbshipit-source-id: 5183346c3c535f809ec7d98b8bdf6d8914bfe790
Summary:
Allow these tests to pass on A100 GPUs, which support TF32.
Basically a follow-up to https://github.com/pytorch/pytorch/pull/52871, which also raised some tolerances to 0.05.
For reference, these are the failures I see (the only ones in test_nn with 1.9.0):
```
FAIL: test_Conv3d_pad_same_cuda_tf32 (__main__.TestNN)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/tmp/easybuild-tmp/eb-ED4 (1f47a80e88)M3d/tmpqOhUjN/lib/python3.8/site-packages/torch/testing/_internal/common_utils.py", line 1033, in wrapper
method(*args, **kwargs)
File "/tmp/easybuild-tmp/eb-ED4 (1f47a80e88)M3d/tmpqOhUjN/lib/python3.8/site-packages/torch/testing/_internal/common_utils.py", line 1033, in wrapper
method(*args, **kwargs)
File "test_nn.py", line 11296, in with_tf32_on
test.test_cuda(self, **kwargs)
File "/tmp/easybuild-tmp/eb-ED4 (1f47a80e88)M3d/tmpqOhUjN/lib/python3.8/site-packages/torch/testing/_internal/common_nn.py", line 5103, in test_cuda
test_case.assertEqualIgnoreType(cpu_d_i, gpu_d_i, atol=self.precision, rtol=0)
File "/tmp/easybuild-tmp/eb-ED4 (1f47a80e88)M3d/tmpqOhUjN/lib/python3.8/site-packages/torch/testing/_internal/common_utils.py", line 1254, in assertEqualIgnoreType
return self.assertEqual(*args, exact_dtype=False, **kwargs)
File "/tmp/easybuild-tmp/eb-ED4 (1f47a80e88)M3d/tmpqOhUjN/lib/python3.8/site-packages/torch/testing/_internal/common_utils.py", line 1355, in assertEqual
super().assertTrue(result, msg=self._get_assert_msg(msg, debug_msg=debug_msg))
AssertionError: False is not true : Tensors failed to compare as equal!With rtol=0 and atol=0.005, found 161 element(s) (out of 288) whose difference(s) exceeded the margin of error (including 0 nan compariso
ns). The greatest difference was 0.032408137116391345 (-33.45570601919647 vs. -33.42329788208008), which occurred at index (2, 0, 0, 1, 0).
======================================================================
FAIL: test_Conv3d_pad_same_dilated_cuda_tf32 (__main__.TestNN)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/tmp/easybuild-tmp/eb-ED4 (1f47a80e88)M3d/tmpqOhUjN/lib/python3.8/site-packages/torch/testing/_internal/common_utils.py", line 1033, in wrapper
method(*args, **kwargs)
File "/tmp/easybuild-tmp/eb-ED4 (1f47a80e88)M3d/tmpqOhUjN/lib/python3.8/site-packages/torch/testing/_internal/common_utils.py", line 1033, in wrapper
method(*args, **kwargs)
File "test_nn.py", line 11296, in with_tf32_on
test.test_cuda(self, **kwargs)
File "/tmp/easybuild-tmp/eb-ED4 (1f47a80e88)M3d/tmpqOhUjN/lib/python3.8/site-packages/torch/testing/_internal/common_nn.py", line 5103, in test_cuda
test_case.assertEqualIgnoreType(cpu_d_i, gpu_d_i, atol=self.precision, rtol=0)
File "/tmp/easybuild-tmp/eb-ED4 (1f47a80e88)M3d/tmpqOhUjN/lib/python3.8/site-packages/torch/testing/_internal/common_utils.py", line 1254, in assertEqualIgnoreType
return self.assertEqual(*args, exact_dtype=False, **kwargs)
File "/tmp/easybuild-tmp/eb-ED4 (1f47a80e88)M3d/tmpqOhUjN/lib/python3.8/site-packages/torch/testing/_internal/common_utils.py", line 1355, in assertEqual
super().assertTrue(result, msg=self._get_assert_msg(msg, debug_msg=debug_msg))
AssertionError: False is not true : Tensors failed to compare as equal!With rtol=0 and atol=0.005, found 111 element(s) (out of 288) whose difference(s) exceeded the margin of error (including 0 nan compariso
ns). The greatest difference was 0.024654212557543076 (35.104286017977465 vs. 35.07963180541992), which occurred at index (3, 0, 0, 0, 2).
======================================================================
FAIL: test_Conv3d_pad_valid_cuda_tf32 (__main__.TestNN)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/tmp/easybuild-tmp/eb-ED4 (1f47a80e88)M3d/tmpqOhUjN/lib/python3.8/site-packages/torch/testing/_internal/common_utils.py", line 1033, in wrapper
method(*args, **kwargs)
File "/tmp/easybuild-tmp/eb-ED4 (1f47a80e88)M3d/tmpqOhUjN/lib/python3.8/site-packages/torch/testing/_internal/common_utils.py", line 1033, in wrapper
method(*args, **kwargs)
File "test_nn.py", line 11296, in with_tf32_on
test.test_cuda(self, **kwargs)
File "/tmp/easybuild-tmp/eb-ED4 (1f47a80e88)M3d/tmpqOhUjN/lib/python3.8/site-packages/torch/testing/_internal/common_nn.py", line 5103, in test_cuda
test_case.assertEqualIgnoreType(cpu_d_i, gpu_d_i, atol=self.precision, rtol=0)
File "/tmp/easybuild-tmp/eb-ED4 (1f47a80e88)M3d/tmpqOhUjN/lib/python3.8/site-packages/torch/testing/_internal/common_utils.py", line 1254, in assertEqualIgnoreType
return self.assertEqual(*args, exact_dtype=False, **kwargs)
File "/tmp/easybuild-tmp/eb-ED4 (1f47a80e88)M3d/tmpqOhUjN/lib/python3.8/site-packages/torch/testing/_internal/common_utils.py", line 1355, in assertEqual
super().assertTrue(result, msg=self._get_assert_msg(msg, debug_msg=debug_msg))
AssertionError: False is not true : Tensors failed to compare as equal!With rtol=0 and atol=0.005, found 41 element(s) (out of 288) whose difference(s) exceeded the margin of error (including 0 nan comparisons). The greatest difference was 0.010903167642320355 (8.074376869119371 vs. 8.06347370147705), which occurred at index (0, 0, 1, 0, 0).
```
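For context, these differences stem from TF32 being enabled by default on Ampere; the relevant existing switches are shown below (a usage note, not part of this PR):
```
import torch

# On Ampere GPUs, TF32 is used by default for cuDNN convolutions (and, at the
# time of this PR, for matmuls), trading precision for speed. That is why these
# Conv3d comparisons against fp32 CPU references need looser tolerances.
torch.backends.cudnn.allow_tf32 = True        # convolutions
torch.backends.cuda.matmul.allow_tf32 = True  # matrix multiplications
```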
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60451
Reviewed By: albanD
Differential Revision: D29353255
Pulled By: ngimel
fbshipit-source-id: 155a02242be5a11dcbd9dd40ab63f15c6757ae1b
Summary:
Fixes https://github.com/pytorch/pytorch/issues/27655
This PR adds C++ and Python versions of ReflectionPad3d with structured kernels. The implementation uses lambdas extensively to better share code between the forward and backward passes.
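A brief usage sketch (toy shapes chosen for illustration):
```
import torch
import torch.nn as nn

pad = nn.ReflectionPad3d(1)  # reflect-pad all six sides of a 5D input by 1
x = torch.arange(8.0).reshape(1, 1, 2, 2, 2)
y = pad(x)  # shape (1, 1, 4, 4, 4); border values are mirrored, not repeated
```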
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59791
Reviewed By: gchanan
Differential Revision: D29242015
Pulled By: jbschlosser
fbshipit-source-id: 18e692d3b49b74082be09f373fc95fb7891e1b56
Summary:
Following https://github.com/pytorch/pytorch/issues/59624, I observed some straggling failing tests on Ampere due to TF32 thresholds. This PR just twiddles some more thresholds to fix the six failing tests I saw on A100.
CC Flamefire ptrblck ngimel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60209
Reviewed By: gchanan
Differential Revision: D29220508
Pulled By: ngimel
fbshipit-source-id: 7c83187a246e1b3a24b181334117c0ccf2baf311
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56964
This PR does many things but does not update any logic:
- Prefixes all function names that are not `gradcheck`, `gradgradcheck`, `get_numerical_jacobian`, and `get_analytical_jacobian` with underscore to indicate that they aren't part of the public API (https://github.com/pytorch/pytorch/issues/55714).
- Improves naming to avoid referencing Jacobian rows or Jacobian cols when we really mean vjp and jvp, as suggested by zou3519
- Reduces comment line length so comments are more consistent and easier to read
- Makes other miscellaneous improvements to documentation
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D28096571
Pulled By: soulitzer
fbshipit-source-id: d372b5f8ee080669e525a987402ded72810baa0c
Summary:
This PR adds a `padding_idx` parameter to `nn.EmbeddingBag` and `nn.functional.embedding_bag`. As with `nn.Embedding`'s `padding_idx` argument, if an embedding's index is equal to `padding_idx`, it is ignored and therefore not included in the reduction.
This PR does not add `padding_idx` support for quantized `EmbeddingBag` or for ONNX opset 10/11 (opset 9 is supported); in these cases, an error is thrown if `padding_idx` is provided.
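A brief usage sketch (toy sizes, with index 0 chosen as the padding index):
```
import torch
import torch.nn as nn

bag = nn.EmbeddingBag(num_embeddings=10, embedding_dim=3,
                      mode="sum", padding_idx=0)
inp = torch.tensor([0, 2, 0, 5])   # the 0 entries are excluded from the sums
offsets = torch.tensor([0, 2])     # two bags: [0, 2] and [0, 5]
out = bag(inp, offsets)            # shape (2, 3); each row sums one bag
```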
Fixes https://github.com/pytorch/pytorch/issues/3194
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49237
Reviewed By: walterddr, VitalyFedyunin
Differential Revision: D26948258
Pulled By: jbschlosser
fbshipit-source-id: 3ca672f7e768941f3261ab405fc7597c97ce3dfc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54378
### For release notes
`torch.autograd.gradcheck.get_numerical_jacobian` (not part of the public API) is being deprecated.
In the future, user code relying on this function will break because, among other changes, `get_numerical_jacobian` now returns `List[Tuple[torch.Tensor]]` instead of `List[torch.Tensor]`.
For a `fn` that takes M inputs and produces N outputs, we now return a list of M N-tuples of Jacobians, where `output[i][j]` represents the numerical Jacobian w.r.t. the i-th input and the j-th output. Previously, `get_numerical_jacobian` returned a list of tensors, one per input, each representing the Jacobian w.r.t. that input for a specific output. Finally, the function passed in as the `fn` parameter should now expect to handle individual parameters, whereas previously `fn` was required to expect its parameters wrapped in a tuple.
--- end --
This PR addresses the comment here, https://github.com/pytorch/pytorch/pull/53857#discussion_r595429639, reducing the run time of old gradcheck's `get_numerical_jacobian` by a factor of num_outputs. However, because very few ops actually return multiple outputs, there is not much real speed-up here.
The main benefit of making this change as part of the refactor is that it helps us isolate any bugs specific to switching `get_numerical_jacobian` to run per output rather than over all outputs at once. Much of the logic implemented here will be the same for the fast gradcheck case, so knowing for certain that everything passes after this stage will make the next step much simpler.
The `get_numerical_jacobian` API is also used in common_nn, so we update the call site there as well.
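To make the new `jacobians[i][j]` layout concrete, here is a toy finite-difference sketch (illustrative only, not the actual gradcheck internals; the helper name is hypothetical):
```
import torch

def toy_numerical_jacobians(fn, inputs, eps=1e-6):
    """Return a list of M N-tuples: jacs[i][j] = d(output j) / d(input i)."""
    outputs = fn(*inputs)
    jacs = []
    for inp in inputs:
        flat = inp.reshape(-1)  # a view, so the writes below perturb `inp`
        per_output = []
        for j in range(len(outputs)):
            jac = torch.zeros(inp.numel(), outputs[j].numel(), dtype=inp.dtype)
            for k in range(inp.numel()):
                orig = flat[k].item()
                flat[k] = orig + eps
                plus = fn(*inputs)[j].reshape(-1)
                flat[k] = orig - eps
                minus = fn(*inputs)[j].reshape(-1)
                flat[k] = orig
                jac[k] = (plus - minus) / (2 * eps)  # central difference
            per_output.append(jac)
        jacs.append(tuple(per_output))
    return jacs  # List[Tuple[Tensor, ...]], indexed jacs[i][j]

fn = lambda x, y: (x + y, x * y)  # M = 2 inputs, N = 2 outputs
x, y = torch.randn(3, dtype=torch.double), torch.randn(3, dtype=torch.double)
jacs = toy_numerical_jacobians(fn, (x, y))
assert torch.allclose(jacs[0][1], torch.diag(y), atol=1e-4)  # d(x*y)/dx = diag(y)
```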
Test Plan: Imported from OSS
Reviewed By: jbschlosser
Differential Revision: D27728720
Pulled By: soulitzer
fbshipit-source-id: ee0f90b4f26ddc5fdbe949c4965eaa91c9ed0bb8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55574
nn.EmbeddingBag's backward is non-deterministic on GPU when the reducing mode is Max; reducing modes Mean and Sum should be deterministic.
Test Plan: NA
Reviewed By: ngimel
Differential Revision: D27633832
fbshipit-source-id: 50786ed8522f1aae27442f5f244a65eab8000b06