Summary:
Fixes https://github.com/pytorch/pytorch/issues/11959
Alternative approach to creating a new `CrossEntropyLossWithSoftLabels` class. This PR simply adds support for "soft targets" AKA class probabilities to the existing `CrossEntropyLoss` and `NLLLoss` classes.
Implementation is dumb and simple right now, but future work can add higher performance kernels for this case.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61044
Reviewed By: zou3519
Differential Revision: D29876894
Pulled By: jbschlosser
fbshipit-source-id: 75629abd432284e10d4640173bc1b9be3c52af00
Summary:
Fixes Python part of https://github.com/pytorch/pytorch/issues/60747
Enhances the Python versions of `Transformer`, `TransformerEncoderLayer`, and `TransformerDecoderLayer` to support callables as their activation functions. The old way of specifying activation function still works as well.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61355
Reviewed By: bdhirsh
Differential Revision: D29967302
Pulled By: jbschlosser
fbshipit-source-id: 8ee6f20083d49dcd3ab432a18e6ad64fe1e05705
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59987
Similar as GroupNorm, improve numerical stability of LayerNorm by Welford algorithm and pairwise sum.
Test Plan: buck test mode/dev-nosan //caffe2/test:nn -- "LayerNorm"
Reviewed By: ngimel
Differential Revision: D29115235
fbshipit-source-id: 5183346c3c535f809ec7d98b8bdf6d8914bfe790
Summary:
Allow those tests to pass on A100 GPUs which support tf32
Basically follow-up to https://github.com/pytorch/pytorch/pull/52871 which also increased some precisions to 0.05
For reference these are the failures I see (only ones in testnn with 1.9.0):
```
FAIL: test_Conv3d_pad_same_cuda_tf32 (__main__.TestNN)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/tmp/easybuild-tmp/eb-ED4 (1f47a80e88)M3d/tmpqOhUjN/lib/python3.8/site-packages/torch/testing/_internal/common_utils.py", line 1033, in wrapper
method(*args, **kwargs)
File "/tmp/easybuild-tmp/eb-ED4 (1f47a80e88)M3d/tmpqOhUjN/lib/python3.8/site-packages/torch/testing/_internal/common_utils.py", line 1033, in wrapper
method(*args, **kwargs)
File "test_nn.py", line 11296, in with_tf32_on
test.test_cuda(self, **kwargs)
File "/tmp/easybuild-tmp/eb-ED4 (1f47a80e88)M3d/tmpqOhUjN/lib/python3.8/site-packages/torch/testing/_internal/common_nn.py", line 5103, in test_cuda
test_case.assertEqualIgnoreType(cpu_d_i, gpu_d_i, atol=self.precision, rtol=0)
File "/tmp/easybuild-tmp/eb-ED4 (1f47a80e88)M3d/tmpqOhUjN/lib/python3.8/site-packages/torch/testing/_internal/common_utils.py", line 1254, in assertEqualIgnoreType
return self.assertEqual(*args, exact_dtype=False, **kwargs)
File "/tmp/easybuild-tmp/eb-ED4 (1f47a80e88)M3d/tmpqOhUjN/lib/python3.8/site-packages/torch/testing/_internal/common_utils.py", line 1355, in assertEqual
super().assertTrue(result, msg=self._get_assert_msg(msg, debug_msg=debug_msg))
AssertionError: False is not true : Tensors failed to compare as equal!With rtol=0 and atol=0.005, found 161 element(s) (out of 288) whose difference(s) exceeded the margin of error (including 0 nan compariso
ns). The greatest difference was 0.032408137116391345 (-33.45570601919647 vs. -33.42329788208008), which occurred at index (2, 0, 0, 1, 0).
======================================================================
FAIL: test_Conv3d_pad_same_dilated_cuda_tf32 (__main__.TestNN)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/tmp/easybuild-tmp/eb-ED4 (1f47a80e88)M3d/tmpqOhUjN/lib/python3.8/site-packages/torch/testing/_internal/common_utils.py", line 1033, in wrapper
method(*args, **kwargs)
File "/tmp/easybuild-tmp/eb-ED4 (1f47a80e88)M3d/tmpqOhUjN/lib/python3.8/site-packages/torch/testing/_internal/common_utils.py", line 1033, in wrapper
method(*args, **kwargs)
File "test_nn.py", line 11296, in with_tf32_on
test.test_cuda(self, **kwargs)
File "/tmp/easybuild-tmp/eb-ED4 (1f47a80e88)M3d/tmpqOhUjN/lib/python3.8/site-packages/torch/testing/_internal/common_nn.py", line 5103, in test_cuda
test_case.assertEqualIgnoreType(cpu_d_i, gpu_d_i, atol=self.precision, rtol=0)
File "/tmp/easybuild-tmp/eb-ED4 (1f47a80e88)M3d/tmpqOhUjN/lib/python3.8/site-packages/torch/testing/_internal/common_utils.py", line 1254, in assertEqualIgnoreType
return self.assertEqual(*args, exact_dtype=False, **kwargs)
File "/tmp/easybuild-tmp/eb-ED4 (1f47a80e88)M3d/tmpqOhUjN/lib/python3.8/site-packages/torch/testing/_internal/common_utils.py", line 1355, in assertEqual
super().assertTrue(result, msg=self._get_assert_msg(msg, debug_msg=debug_msg))
AssertionError: False is not true : Tensors failed to compare as equal!With rtol=0 and atol=0.005, found 111 element(s) (out of 288) whose difference(s) exceeded the margin of error (including 0 nan compariso
ns). The greatest difference was 0.024654212557543076 (35.104286017977465 vs. 35.07963180541992), which occurred at index (3, 0, 0, 0, 2).
======================================================================
FAIL: test_Conv3d_pad_valid_cuda_tf32 (__main__.TestNN)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/tmp/easybuild-tmp/eb-ED4 (1f47a80e88)M3d/tmpqOhUjN/lib/python3.8/site-packages/torch/testing/_internal/common_utils.py", line 1033, in wrapper
method(*args, **kwargs)
File "/tmp/easybuild-tmp/eb-ED4 (1f47a80e88)M3d/tmpqOhUjN/lib/python3.8/site-packages/torch/testing/_internal/common_utils.py", line 1033, in wrapper
method(*args, **kwargs)
File "test_nn.py", line 11296, in with_tf32_on
test.test_cuda(self, **kwargs)
File "/tmp/easybuild-tmp/eb-ED4 (1f47a80e88)M3d/tmpqOhUjN/lib/python3.8/site-packages/torch/testing/_internal/common_nn.py", line 5103, in test_cuda
test_case.assertEqualIgnoreType(cpu_d_i, gpu_d_i, atol=self.precision, rtol=0)
File "/tmp/easybuild-tmp/eb-ED4 (1f47a80e88)M3d/tmpqOhUjN/lib/python3.8/site-packages/torch/testing/_internal/common_utils.py", line 1254, in assertEqualIgnoreType
return self.assertEqual(*args, exact_dtype=False, **kwargs)
File "/tmp/easybuild-tmp/eb-ED4 (1f47a80e88)M3d/tmpqOhUjN/lib/python3.8/site-packages/torch/testing/_internal/common_utils.py", line 1355, in assertEqual
super().assertTrue(result, msg=self._get_assert_msg(msg, debug_msg=debug_msg))
AssertionError: False is not true : Tensors failed to compare as equal!With rtol=0 and atol=0.005, found 41 element(s) (out of 288) whose difference(s) exceeded the margin of error (including 0 nan comparisons). The greatest difference was 0.010903167642320355 (8.074376869119371 vs. 8.06347370147705), which occurred at index (0, 0, 1, 0, 0).
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60451
Reviewed By: albanD
Differential Revision: D29353255
Pulled By: ngimel
fbshipit-source-id: 155a02242be5a11dcbd9dd40ab63f15c6757ae1b
Summary:
Fixes https://github.com/pytorch/pytorch/issues/27655
This PR adds a C++ and Python version of ReflectionPad3d with structured kernels. The implementation uses lambdas extensively to better share code from the backward and forward pass.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59791
Reviewed By: gchanan
Differential Revision: D29242015
Pulled By: jbschlosser
fbshipit-source-id: 18e692d3b49b74082be09f373fc95fb7891e1b56
Summary:
Following https://github.com/pytorch/pytorch/issues/59624 I observed some straggling failing tests on Ampere due to TF32 thresholds. This PR just twiddles some more thresholds to fix the (6) failing tests I saw on A100.
CC Flamefire ptrblck ngimel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60209
Reviewed By: gchanan
Differential Revision: D29220508
Pulled By: ngimel
fbshipit-source-id: 7c83187a246e1b3a24b181334117c0ccf2baf311
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56964
This PR does many things but does not update any logic:
- Prefixes all function names that are not `gradcheck`, `gradgradcheck`, `get_numerical_jacobian`, and `get_analytical_jacobian` with underscore to indicate that they aren't part of the public API (https://github.com/pytorch/pytorch/issues/55714).
- Improve naming to avoid referencing Jacobian rows or Jacobian cols when we really mean vjp and jvp as suggested by zou3519
- Try to reduce comment line length so they are more consistent and easier to read
- Other misc improvements to documentaiton
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D28096571
Pulled By: soulitzer
fbshipit-source-id: d372b5f8ee080669e525a987402ded72810baa0c
Summary:
This PR adds a `padding_idx` parameter to `nn.EmbeddingBag` and `nn.functional.embedding_bag`. As with `nn.Embedding`'s `padding_idx` argument, if an embedding's index is equal to `padding_idx` it is ignored, so it is not included in the reduction.
This PR does not add support for `padding_idx` for quantized or ONNX `EmbeddingBag` for opset10/11 (opset9 is supported). In these cases, an error is thrown if `padding_idx` is provided.
Fixes https://github.com/pytorch/pytorch/issues/3194
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49237
Reviewed By: walterddr, VitalyFedyunin
Differential Revision: D26948258
Pulled By: jbschlosser
fbshipit-source-id: 3ca672f7e768941f3261ab405fc7597c97ce3dfc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54378
### For release notes
`torch.autograd.gradcheck.get_numerical_jacobian` (not part of the public api) is being deprecated.
In the future, user code relying on this function will break because, among other changes, `get_numerical_jacobian` now returns `List[Tuple[torch.Tensor]]` instead of `List[torch.Tensor]`.
(more details if necessary)
For a `fn` that takes in M inputs and N outputs we now return a list of M N-tuples of jacobians where `output[i][j]` would represent the numerical jacobian w.r.t. to the ith input and the jth output. Previously `get_numerical_jacobian` returned a list of tensors where each tensor represents the jacobian w.r.t. to each of the M inputs and a specific output. Finally, the function passed in as the parameter `fn` should expect to handle individual parameters, where previously `fn` is required to expect its parameters wrapped in a tuple.
--- end --
This PR addresses the comment here https://github.com/pytorch/pytorch/pull/53857#discussion_r595429639, to reduce the run-time of old gradcheck's get numerical jacobian by a factor of num_outputs. However, because very few ops actually return multiple outputs, there is not too much real speed up here.
The main benefit of doing this change as part of the refactor is that it helps us isolate the possible bugs that are specific to switching `get numerical jacobian` to run in a per output way vs all outputs at once. Much of the logic implemented here will be the same for the fast gradcheck case, so knowing for certain that everything should pass after this stage will make the next step much simpler.
The get_numerical_jacobian api is also being used in common_nn. So we update the callsite there as well.
Test Plan: Imported from OSS
Reviewed By: jbschlosser
Differential Revision: D27728720
Pulled By: soulitzer
fbshipit-source-id: ee0f90b4f26ddc5fdbe949c4965eaa91c9ed0bb8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55574
nn.EmbeddingBag backward is non-deterministic when reducing_mode = Max and on GPU, reducing mode Mean and Sum should be deterministic
Test Plan: NA
Reviewed By: ngimel
Differential Revision: D27633832
fbshipit-source-id: 50786ed8522f1aae27442f5f244a65eab8000b06
Summary:
* Lowering NLLLoss/CrossEntropyLoss to ATen dispatch
* This allows the MLC device to override these ops
* Reduce code duplication between the Python and C++ APIs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53789
Reviewed By: ailzhang
Differential Revision: D27345793
Pulled By: albanD
fbshipit-source-id: 99c0d617ed5e7ee8f27f7a495a25ab4158d9aad6
Summary:
Also modify the `tf32_on_and_off` decorator to make it support function without `device` argument.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52871
Reviewed By: ngimel
Differential Revision: D27286674
Pulled By: mruberry
fbshipit-source-id: 14f6d558271bd6a1d0bc40691c170d47e81de1ff
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45667
First part of #3867 (Pooling operators still to do)
This adds a `padding='same'` mode to the interface of `conv{n}d`and `nn.Conv{n}d`. This should match the behaviour of `tensorflow`. I couldn't find it explicitly documented but through experimentation I found `tensorflow` returns the shape `ceil(len/stride)` and always adds any extra asymmetric padding onto the right side of the input.
Since the `native_functions.yaml` schema doesn't seem to support strings or enums, I've moved the function interface into python and it now dispatches between the numerically padded `conv{n}d` and the `_conv{n}d_same` variant. Underscores because I couldn't see any way to avoid exporting a function into the `torch` namespace.
A note on asymmetric padding. The total padding required can be odd if both the kernel-length is even and the dilation is odd. mkldnn has native support for asymmetric padding, so there is no overhead there, but for other backends I resort to padding the input tensor by 1 on the right hand side to make the remaining padding symmetrical. In these cases, I use `TORCH_WARN_ONCE` to notify the user of the performance implications.
Test Plan: Imported from OSS
Reviewed By: ejguan
Differential Revision: D27170744
Pulled By: jbschlosser
fbshipit-source-id: b3d8a0380e0787ae781f2e5d8ee365a7bfd49f22
Summary:
Fixes #{issue number}
Resubmitting a new PR as the older one got reverted due to problems in test_optim.py.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51227
Reviewed By: ezyang
Differential Revision: D26142505
Pulled By: ailzhang
fbshipit-source-id: a2ab5d85630aac2d2ce17652ba19c11ea668a6a9
Summary:
Fixes https://github.com/pytorch/pytorch/issues/3307
Previously, `self.grad` was not ~cloned~ deepcopied to the returned tensor in `deepcopy`. Added a test and an implementation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50663
Reviewed By: heitorschueroff
Differential Revision: D26074811
Pulled By: albanD
fbshipit-source-id: 536dad36415f1d03714b4ce57453f406ad802b8c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50744
This PR adds a `check_batched_grad=True` option to CriterionTest and
turns it on by default for all CriterionTest-generated tests
Test Plan: - run tests
Reviewed By: ejguan
Differential Revision: D25997676
Pulled By: zou3519
fbshipit-source-id: cc730731e6fae2bddc01bc93800fd0e3de28b32d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50593
There are no equivalent to torch.FloatTensor, torch.cuda.FloatTensor for complex
types. So `get_gpu_type` and `get_cpu_type` are broken for complex tensors.
Also found a few places that explicitly cast inputs to floating point types,
which would drop the imaginary component before running the test.
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D25954050
Pulled By: mruberry
fbshipit-source-id: 1fa8e5af233aa095c839d5e2f860564baaf92aef
Summary:
Building on top of the work of anjali411 (https://github.com/pytorch/pytorch/issues/46640)
Things added in this PR:
1. Modify backward and double-backward formulas
2. Add complex support for `new module tests` and criterion tests (and add complex tests for L1)
3. Modify some existing tests to support complex
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49912
Reviewed By: zhangguanheng66
Differential Revision: D25853036
Pulled By: soulitzer
fbshipit-source-id: df619f1b71c450ab2818eb17804e0c55990aa8ad