Walter Shen
f5178bf151
Revert D25607503: Add base forward grad logic
...
Test Plan: revert-hammer
Differential Revision:
D25607503 (fdf02eff3d )
Original commit changeset: f1396290de1d
fbshipit-source-id: 057206e28ff48ee288856adfe3ca577d4880789f
2020-12-21 19:56:28 -08:00
albanD
fdf02eff3d
Add base forward grad logic ( #49097 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49097
RFC: https://github.com/pytorch/rfcs/pull/11
This PR adds the basic logic to handle forward grad as dual Tensors; a short usage sketch of the Python API follows the reading guide below.
It contains the following:
- Mechanism to save dual state on a Tensor and clear it up when the dual level ends
- C++ and python user facing API
- Updated view system that is able to track both forward and backward views
The current PR has the following limitations:
- Extensive tests are in the next PR in the stack as formulas are needed to write full tests.
- Only the manual formulas have been audited and no other formula is actually implemented here (they are in the next PR in the stack)
- Only level 0 is allowed for now. It was discussed and agreed that higher levels are not needed for the first version of this PR.
- We can save one ViewInfo creation when both the forward and backward views have the same base. This can be done by adding a boolean flag to the DifferentiableViewMeta and extra logic in the `as_view` method. This is left out to keep this PR concise.
- We can skip tracking forward views if the base has a forward grad. This can be done by adding extra logic in the `as_view` method. This is left out to keep this PR concise.
Reading guide:
- Updated view handling in [gen_variable_type.py](https://github.com/pytorch/pytorch/pull/49097/files#diff-f6553cec68caeaea36f6c8b14ff76a6d39dfd774e0ea9ef2f76e8d81fd9af5df ), [VariableTypeUtils.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-ec71cfa45954dece1236c661d170e6341879c5be637f4abf52e826d61b40695a ), [variable.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-60e3bfe444e89efc7149f25b38e472710525984789934ab83f1bd5671b8ff285 ) (skip code below "[Forward Grad View]" for now), [variable.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-1604bcd0e4350ed99ec45e437cee7ac9ebe337392c9ea16a236247aeeb35b02bR266-R542 ) and [custom_function.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-dd85f452082b5bb6612bbc12adb496f8827defa228509f7b493de1d517522d5d ). This introduces the new ViewInfo to hold view information shared by forward and backward. It also updates the differentiable view meta to use this, and updates the as_view function to handle both forward and backward views.
- New forward grad class that handles storing gradients and tracking at each level [forward_grad.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-c6c5b9ab2d7e5dde4102495faa1b6bbbfc23aa3e47deb7359c0bfe1eb004c0cb ), [forward_grad.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-de2ab54ade7312701850d71a119a4f4ee4b9fc5a9c42a467cdd4e73c033531dd ) and [build_variables.bzl](https://github.com/pytorch/pytorch/pull/49097/files#diff-dfdfa2efb17beddfd9094524f95351fd197db6c8857e96b436fb599870359325 ). EDIT: These files also contain the new flag to globally disable forward AD that allows us to reduce performance issues while this is in development.
- Lowest level API and binding between Tensor and AutogradMeta in [TensorBody.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-7554853205392fa743357bf845ecc350a974ec049383248c12daaf2f4de04911 ), [TensorImpl.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-052bd9150ef8e09289ddf644b5a6830ede49207201cd41728f6d7cc6d9cead94 ), [TensorImpl.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-a15aae4cf23da44970db7cece62ff981265575c798c62f7b52d87c8809dfe2e1 ) and the rest of [variable.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-60e3bfe444e89efc7149f25b38e472710525984789934ab83f1bd5671b8ff285R557-R677 )
- API to access the forward primal that needs to be a differentiable function (and so in native_functions.yaml) [native_functions.yaml](https://github.com/pytorch/pytorch/pull/49097/files#diff-2f3dbd85efb9b5172f2264eedd3be47dd765e6ab7cc8bf3ade5e62c28ae35991 ) [NamedRegistrations.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-69bd3bea510c9b64e1633fa18c3ea63d4b8348dbad3a78ad9de844ab3e43dc1d ), [VariableMethodsStub.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-23f5fcb737a2b289811fe0f4b65aef775e7c824b2e629ecd343df51405cd434f ), [derivatives.yaml](https://github.com/pytorch/pytorch/pull/49097/files#diff-e4c2f99a2404e98c3586e07425da73008f36b1bada790648a7297af141d37f8c ), [gen_python_functions.py](https://github.com/pytorch/pytorch/pull/49097/files#diff-e4c2f99a2404e98c3586e07425da73008f36b1bada790648a7297af141d37f8c ), [gen_trace_type.py](https://github.com/pytorch/pytorch/pull/49097/files#diff-54e0b976027bf8debefb959ff360b89ae93466970c843365b1b3a03806d868ce ), [TraceTypeManual.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-f34636741ad4a23d018e0c289bc750c3bad887b45660e1d6eaf440d234a78fbf ) and [part of VariableTypeManual.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-6e19a1bce8cbdba8714b6e2c794a76bc0864b64a49cfa757cb0b5afdc937d1a4R198-R243 )
- c++ API [autograd.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-349028fbe8291a965a7a263c323b208fe071c35c66179ee997ef84fa81aa4b1e ), [autograd.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-a3fe908d67dfec16a1fcde300de68b0701bf68b88db7451f29f2bee255cf30c9 )
- python binding [init.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-c58a67c85191c22c9b3bb439117d8053edfd9dea839fa010cf967d404c3c630d )
- python API [forward_ad.py](https://github.com/pytorch/pytorch/pull/49097/files#diff-a4efad4ba18fffdfb264c21e5475997a24a743089a899f8ec1a5ff962c6738d9 ), [autograd/__init__.py](https://github.com/pytorch/pytorch/pull/49097/files#diff-743abcafd32ad0e69f39ac5a91df4197b7e1921c135cacee7ef6dc829a8a7af8 )
- c++ and python printing [Formatting.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-881dba501e71662e2e4818b4b016f739b344c8aed2f5edc6b871eda47a2aced0 ), [_tensor_str.py](https://github.com/pytorch/pytorch/pull/49097/files#diff-a7911f8d5e73adbff914d99fd7818ace2a7030b6a3748abe06ec6fc6e3df9cc3 )
- Utility for formulas and updated manual functions to respect new view system as well as forward grad [FunctionsManual.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-6378bb6dc81a64dab676d61731341fa5d1088418f32a1473a33a0ccfc2357dc1 ), [FunctionsManual.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-4adbd88239afcd60e8198aab65d4f5e43b62314e34b80551e997a1ea503adea5 ) [rest of VariableTypeManual.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-6e19a1bce8cbdba8714b6e2c794a76bc0864b64a49cfa757cb0b5afdc937d1a4R264-R433 )
- Ensure SavedVariable saves forward grads properly [saved_variable.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-c1b8039d776241abe177d5aa99b79dd9489a9b3e529da8ab24c2e386c1238ae2 ), [saved_variable.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-cc9fba479b5beae06b2eea2e390d17796e0341c5b037a20b5bcaccbb0c341030 )
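For orientation, a minimal sketch of the intended Python usage, assuming the `forward_ad` module names introduced above (`dual_level`, `make_dual`, `unpack_dual`); note that most derivative formulas only land in the next PR in the stack:
```python
import torch
import torch.autograd.forward_ad as fwAD

primal = torch.randn(3)
tangent = torch.randn(3)

with fwAD.dual_level():                     # only level 0 is allowed in this PR
    dual = fwAD.make_dual(primal, tangent)  # attach the tangent as the forward grad
    out = dual.sin()
    _, jvp = fwAD.unpack_dual(out)          # forward gradient (JVP) of out, once the formula exists
```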
Test Plan: Imported from OSS
Reviewed By: mrshenli
Differential Revision: D25607503
Pulled By: albanD
fbshipit-source-id: f1396290de1d75760f3d380c43cdd56e86fa6099
2020-12-21 14:39:43 -08:00
Alexander
44ce0b8883
Sparse-sparse matrix multiplication (CPU/CUDA) ( #39526 )
...
Summary:
This PR implements matrix multiplication support for 2-d sparse tensors using the COO sparse format.
The current implementation of `torch.sparse.mm` supports the configuration
`torch.sparse.mm(sparse_matrix1, sparse_matrix2.to_dense())`, but this can consume a lot of memory when sparse_matrix2's shape is large.
This implementation extends the `torch.sparse.mm` function to support `torch.sparse.mm(sparse_matrix1, sparse_matrix2)` directly; a usage sketch follows the checklist below.
Resolves #[20988](https://github.com/pytorch/pytorch/issues/20988 ) for CPU/CUDA.
- [x] sparse matmul
- [x] CPU/CUDA C++ implementation
- [x] unittests
- [x] update torch.sparse.mm documentation
- [x] autograd support
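A usage sketch of the extended API (hypothetical shapes), as referenced above:
```python
import torch

a = torch.randn(64, 128).relu().to_sparse().requires_grad_()
b = torch.randn(128, 32).relu().to_sparse()

c = torch.sparse.mm(a, b)        # both operands stay sparse COO; b is never densified
c.to_dense().sum().backward()    # autograd support: a.grad is sparse, masked by a's pattern
```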
The CPU sparse-sparse matmul was implemented using the work "Sparse Matrix Multiplication Package (SMMP)" as a reference. The GPU sparse-sparse matmul is based on cuSPARSE, with separate code paths for CUSPARSE_VERSION >= 11 and for older versions. Both the CPU and CUDA implementations rely on a sparse-sparse matmul algorithm over CSR indices, as it is one of the fastest algorithms.
Here are the latest benchmark results (script is here) for torch.sparse.mm (CUDA), torch.sparse.mm (CPU) and scipy; values are float32 scalars:
size | density | sparse.mm(CUDA) | sparse.mm(CPU) | scipy_coo_matmul
-- | -- | -- | -- | --
(32, 10000) | 0.01 | 822.7 | 79.4 | 704.1
(32, 10000) | 0.05 | 1741.1 | 402.6 | 1155.3
(32, 10000) | 0.1 | 2956.8 | 840.8 | 1885.4
(32, 10000) | 0.25 | 6417.7 | 2832.3 | 4665.2
(512, 10000) | 0.01 | 1010.2 | 3941.3 | 26937.7
(512, 10000) | 0.05 | 2216.2 | 26903.8 | 57343.7
(512, 10000) | 0.1 | 4868.4 | 87773.7 | 117477.0
(512, 10000) | 0.25 | 16639.3 | 608105.0 | 624290.4
(1024, 10000) | 0.01 | 1224.8 | 13088.1 | 110379.2
(1024, 10000) | 0.05 | 3897.5 | 94783.9 | 236541.8
(1024, 10000) | 0.1 | 10559.1 | 405312.5 | 525483.4
(1024, 10000) | 0.25 | 57456.3 | 2424337.5 | 2729318.7
A new backward algorithm was implemented using only `sparse @ sparse` and `sparse_mask` operations. Here is some benchmarking:
```
[------------------------- sparse.mm-backward -------------------------]
| sparse.backward | dense.backward
-----------------------------------------------------------------------
(32, 10000) | 0.01 | 13.5 | 2.4
(32, 10000) | 0.05 | 52.3 | 2.4
(512, 10000) | 0.01 | 1016.8 | 491.5
(512, 10000) | 0.05 | 1604.3 | 492.3
(1024, 10000) | 0.01 | 2384.1 | 1963.7
(1024, 10000) | 0.05 | 3965.8 | 1951.9
```
I added new benchmark tests. Now I am using a real dataset used in recent studies [1, 2] with different sparsity levels.
```
[---------------------------------- matmul ---------------------------------]
| 0.5 | 0.7 | 0.8 | 0.9 | 0.95 | 0.98
1 threads: ------------------------------------------------------------------
(cpu) torch | 5.4 | 5.4 | 5.2 | 5.3 | 5.3 | 5.4
torch.sparse | 122.2 | 51.9 | 27.5 | 11.4 | 4.9 | 1.8
scipy | 150.1 | 87.4 | 69.2 | 56.8 | 38.4 | 17.1
(cuda) torch | 1.3 | 1.1 | 1.1 | 1.1 | 1.1 | 1.1
torch.sparse | 20.0 | 8.4 | 5.1 | 2.5 | 1.5 | 1.1
[----------------------------------- backward -----------------------------------]
| 0.5 | 0.7 | 0.8 | 0.9 | 0.95 | 0.98
1 threads: -----------------------------------------------------------------------
(cpu) torch | 17.7 | 17.9 | 17.7 | 17.7 | 17.6 | 17.9
torch.sparse | 672.9 | 432.6 | 327.5 | 230.8 | 176.7 | 116.7
(cuda) torch | 3.8 | 3.6 | 3.5 | 3.5 | 3.6 | 3.5
torch.sparse | 68.8 | 46.2 | 35.6 | 24.2 | 17.8 | 11.9
Times are in milliseconds (ms).
```
In summary, the new `sparse @ sparse` backward algorithm is preferable: it is more about saving memory than raw performance, and it is better than the other options tested before.
## **References**
1. Trevor Gale, Matei Zaharia, Cliff Young, Erich Elsen. **Sparse GPU Kernels for Deep Learning.** Proceedings of the International Conference for High Performance Computing, 2020. [https://github.com/google-research/google-research/tree/master/sgk ](https://github.com/google-research/google-research/tree/master/sgk )
2. Trevor Gale, Erich Elsen, Sara Hooker. **The State of Sparsity in Deep Neural Networks.** [https://github.com/google-research/google-research/tree/master/state_of_sparsity ](https://github.com/google-research/google-research/tree/master/state_of_sparsity )
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39526
Reviewed By: mruberry
Differential Revision: D25661239
Pulled By: ngimel
fbshipit-source-id: b515ecd66d25f347d637e159d51aa45fb43b6938
2020-12-21 11:53:55 -08:00
Ryan Spring
65876d3f51
Change aten::native_layer_norm signature to match torch.layer_norm definition ( #48971 )
...
Summary:
This PR changes the `aten::native_layer_norm` and `aten::native_layer_norm_backward` signatures to match the `torch.layer_norm` definition. The current definition doesn't provide enough information for the PyTorch JIT to fuse layer_norm during training.
`native_layer_norm(X, gamma, beta, M, N, eps)` =>
`native_layer_norm(input, normalized_shape, weight, bias, eps)`
`native_layer_norm_backward(dY, X, mean, rstd, gamma, M, N, grad_input_mask)` =>
`native_layer_norm_backward(dY, input, normalized_shape, mean, rstd, weight, bias, grad_input_mask)`
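A sketch of the new argument order, assuming the op is exposed in Python as `torch.native_layer_norm` (as in recent releases); shapes are hypothetical:
```python
import torch

x = torch.randn(2, 3, 8)
weight, bias = torch.ones(8), torch.zeros(8)

# new order mirrors torch.layer_norm: (input, normalized_shape, weight, bias, eps)
out, mean, rstd = torch.native_layer_norm(x, [8], weight, bias, 1e-5)
```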
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48971
Reviewed By: izdeby
Differential Revision: D25574070
Pulled By: ngimel
fbshipit-source-id: 23e2804295a95bda3f1ca6b41a1e4c5a3d4d31b4
2020-12-16 23:09:18 -08:00
Peter Bell
fc0a3a1787
Improve torch.fft n-dimensional transforms ( #46911 )
...
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46911
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D25420647
Pulled By: mruberry
fbshipit-source-id: bf7e6a2ec41f9f95ffb05c128ee0f3297e34aae2
2020-12-09 12:40:06 -08:00
Erjia Guan
86bb413600
Optimize backward for torch.repeat ( #46726 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46726
Fixes #43192
Test Plan: Imported from OSS
Reviewed By: heitorschueroff
Differential Revision: D24739840
Pulled By: ejguan
fbshipit-source-id: ddf21fc52c4676de25ad7bfb0b5c1c23daa77ee6
2020-11-09 15:12:40 -08:00
Erjia Guan
bba5a31176
Revert D24481801: Optimize backward for torch.repeat
...
Test Plan: revert-hammer
Differential Revision:
D24481801 (4e6f2440d8 )
Original commit changeset: 95c155e0de83
fbshipit-source-id: 0fb0afde760b0f5e17bd75df950a5d76aee5370b
2020-11-04 10:44:40 -08:00
Erjia Guan
f1ac63d324
Implement copysign ( #46396 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46396
Related #38349
[numpy](https://numpy.org/doc/stable/reference/generated/numpy.copysign.html?highlight=copysign#numpy.copysign )
- No in-place function
- No method
- Optional output
- Available: byte, char, bool, int, short, long, float, double, half
- Integral promoted to float
- Not available: float/double complex
`c = np.copysign(a, b)`
| a | b | c | a.grad |
| -- | -- | -- | -- |
| -1 | -1 | -1 | 1 |
| -0 | -1 | -0 | 0 |
| 0 | -1 | -0 | 0 |
| 1 | -1 | -1 | -1 |
| -1 | -0 | -1 | 1 |
| -0 | -0 | 0 | 0 |
| 0 | -0 | 0 | 0 |
| 1 | -0 | -1 | -1 |
| -1 | 0 | 1 | -1 |
| -0 | 0 | 0 | 0 |
| 0 | 0 | 0 | 0 |
| 1 | 0 | 1 | 1 |
| -1 | 1 | 1 | -1 |
| -0 | 1 | 0 | 0 |
| 0 | 1 | 0 | 0 |
| 1 | 1 | 1 | 1 |
This function becomes **non-differentiable** at `a=0` for any `b`. So, in my opinion, we may set the gradient for `a=0` to 0.
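A quick sketch of the resulting behavior (values follow the table above; the zero gradient at `a=0` is the choice proposed here):
```python
import torch

a = torch.tensor([-1.0, 0.0, 2.0], requires_grad=True)
b = torch.tensor([ 1.0, -1.0, -0.5])

c = torch.copysign(a, b)   # tensor([ 1., -0., -2.])
c.sum().backward()
print(a.grad)              # tensor([-1.,  0., -1.]) -- zero where a == 0
```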
TODO:
- [x] test (cpu/gpu)
- [x] doc
- [x] ~kernel_vec~
Test Plan: Imported from OSS
Reviewed By: mruberry
Differential Revision: D24401366
Pulled By: ejguan
fbshipit-source-id: 3621c5ff74b185376a3705589983bb5197ab896d
2020-11-04 08:08:57 -08:00
Erjia Guan
4e6f2440d8
Optimize backward for torch.repeat ( #46726 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46726
Fixes #43192
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D24481801
Pulled By: ejguan
fbshipit-source-id: 95c155e0de83b71f173c9135732ea84ba6399d69
2020-11-03 11:16:55 -08:00
kshitij12345
1d233d7d1f
[fix] torch.nn.functional.embedding -> padding_idx behavior ( #46714 )
...
Summary:
Reference https://github.com/pytorch/pytorch/issues/46585
Fix for the second snippet in the mentioned issue.
```python
predefined_weights = torch.rand(10, 3)
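# look up rows 1, 2 and 0 of predefined_weights, with index 0 treated as the padding index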
result = torch.nn.functional.embedding(torch.LongTensor([1,2,0]), predefined_weights, padding_idx=0)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46714
Reviewed By: VitalyFedyunin
Differential Revision: D24593352
Pulled By: albanD
fbshipit-source-id: 655b69d9ec57891871e26feeda2aa0dcff73beba
2020-10-29 13:29:00 -07:00
anjali411
d94bd998ec
Update backward formulas (Re #44444 ) ( #46275 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46275
Re #44444
Test Plan: Imported from OSS
Reviewed By: zou3519
Differential Revision: D24285785
Pulled By: anjali411
fbshipit-source-id: c60ecd4fe4f144132085f2c91d3b950e92b2a491
2020-10-25 19:40:59 -07:00
Kurt Mohler
28f8372bf4
Avoid mat1 references in mm_mat1_backward ( #45777 )
...
Summary:
Avoiding references to `mat1` in `mm_mat1_backward` is a first step to solving issue https://github.com/pytorch/pytorch/issues/42371
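For context, a small numerical check of the identity that makes this possible (the gradient of `mat1` depends only on the incoming gradient and `mat2`):
```python
import torch

mat1 = torch.randn(3, 4, requires_grad=True)
mat2 = torch.randn(4, 5)
grad_out = torch.randn(3, 5)

(mat1 @ mat2).backward(grad_out)
# mat1's gradient never needs mat1's own values:
assert torch.allclose(mat1.grad, grad_out @ mat2.t())
```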
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45777
Reviewed By: malfet
Differential Revision: D24347967
Pulled By: albanD
fbshipit-source-id: f09a8149d9795481b5ed5b48fdd0e598ba027d0b
2020-10-16 13:52:44 -07:00
Edward Yang
546aab66c1
Revert D24027761: Update backward definition for more operators and reenable tests in test_ops.py
...
Test Plan: revert-hammer
Differential Revision:
D24027761 (7d809f5d8e )
Original commit changeset: c1f707c2a039
fbshipit-source-id: 30750d2f08886036fb8b2cd0ae51c7732d3b7b19
2020-10-02 18:52:57 -07:00
anjali411
7d809f5d8e
Update backward definition for more operators and reenable tests in test_ops.py ( #44444 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44444
This PR:
1. Fixes https://github.com/pytorch/pytorch/issues/41510. Updates the backward formulas for the following functions: `asin`, `acos`, `asinh`, `acosh`, `atan`, `atanh`, `div`, `log`, `log10`, `log2`, `log1p`, `pow`, `reciprocal`, `angle`. (A usage sketch follows this list.)
2. Re-enables the tests in `test_ops.py`.
3. Adds dispatch for complex dtypes for `tanh_backward`.
4. Re-enables commented tests in `common_methods_invocation.py`.
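A hedged end-to-end sketch (hypothetical values) exercising one of the updated complex formulas:
```python
import torch

z = torch.randn(4, dtype=torch.cdouble, requires_grad=True)
loss = z.log().abs().sum()   # C -> C (log) followed by C -> R (abs)
loss.backward()
print(z.grad)                # complex-valued gradient
```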
Test Plan: Imported from OSS
Reviewed By: glaringlee
Differential Revision: D24027761
Pulled By: anjali411
fbshipit-source-id: c1f707c2a039149a6e04bbde53ee120d9119d99a
2020-10-02 13:37:10 -07:00
anjali411
18876b5722
Update backward formula for torch.dot and add backward definition for torch.vdot ( #45074 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45074
TODO: Add R -> C tests in https://github.com/pytorch/pytorch/pull/44744 (blocked on some JIT changes)
Test Plan: Imported from OSS
Reviewed By: gchanan
Differential Revision: D23975361
Pulled By: anjali411
fbshipit-source-id: 3512bd2962b588a198bc317673bd18cc96ac823f
2020-09-29 12:52:03 -07:00
Brian Hirsh
439930c81b
adding a beta parameter to the smooth_l1 loss fn ( #44433 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44433
- Not entirely sure why, but changing the type of beta from `float` to `double` in autocast_mode.cpp and FunctionsManual.h fixes my compiler errors, failing instead at link time.
- Fixed some type errors and updated the function signature in a few more files.
- Removed the usage of Scalar, making beta a double everywhere instead.
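For context, a minimal sketch of the user-facing knob this adds, assuming the functional API exposes `beta` (as it does in current releases):
```python
import torch
import torch.nn.functional as F

pred = torch.randn(8, requires_grad=True)
target = torch.randn(8)

# beta sets the point where the loss switches from quadratic to L1 behavior
loss = F.smooth_l1_loss(pred, target, beta=0.5)
loss.backward()
```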
Test Plan: Imported from OSS
Reviewed By: mrshenli
Differential Revision: D23636720
Pulled By: bdhirsh
fbshipit-source-id: caea2a1f8dd72b3b5fd1d72dd886b2fcd690af6d
2020-09-25 16:36:28 -07:00
kshitij12345
00e704e757
[fix] torch.repeat : dim-0 backward ( #45212 )
...
Summary:
Fixes https://github.com/pytorch/pytorch/issues/45201
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45212
Reviewed By: mrshenli
Differential Revision: D23905545
Pulled By: albanD
fbshipit-source-id: c5bf9cf481c8cf3ccc1fdbfb364006b29f67dc9f
2020-09-25 07:53:00 -07:00
anjali411
58b6ab69e5
torch.sgn for complex tensors ( #39955 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39955
resolves https://github.com/pytorch/pytorch/issues/36323 by adding `torch.sgn` for complex tensors.
`torch.sgn` returns `x/abs(x)` for `x != 0` and returns `0 + 0j` for `x==0`
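A quick illustration of that definition:
```python
import torch

z = torch.tensor([3 + 4j, 0j, -2 + 0j])
torch.sgn(z)   # 0.6+0.8j, 0+0j, -1+0j  (i.e. x / abs(x), with 0 mapped to 0)
```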
This PR doesn't test the correctness of the gradients. It will be done as part of auditing all the ops in the future, once we decide the autograd behavior (JAX vs TF) and add gradcheck.
Test Plan: Imported from OSS
Reviewed By: mruberry
Differential Revision: D23460526
Pulled By: anjali411
fbshipit-source-id: 70fc4e14e4d66196e27cf188e0422a335fc42f92
2020-09-22 08:24:53 -07:00
anjali411
9f67176b82
Complex gradcheck logic ( #43208 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43208
This PR adds gradcheck for complex. The logic used for complex gradcheck is described in Section 3.5.3 here: https://arxiv.org/pdf/1701.00392.pdf
More concretely, this PR introduces the following changes:
1. Updates get_numerical_jacobian to take as input a scalar value for the vector (v). Adds gradcheck logic for C -> C, C -> R and R -> C; for R -> C functions, only the real part of the gradient is propagated. (A minimal usage sketch follows this list.)
2. Adds backward definition for `torch.complex` and also adds a test to verify the definition added.
3. Updates backward for `mul`, `sin`, `cos`, `sinh`, `cosh`.
4. Adds tests for all `torch.real`, `torch.imag`, `torch.view_as_real`, `torch.view_as_complex`, `torch.conj`.
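A minimal usage sketch of the resulting check (the C -> C case):
```python
import torch
from torch.autograd import gradcheck

z = torch.randn(4, dtype=torch.cdouble, requires_grad=True)
# gradcheck perturbs real and imaginary parts and compares numerical
# and analytical gradients following the logic described above
assert gradcheck(torch.sin, (z,))
```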
Follow up tasks:
1. Add more thorough tests for R -> C cases. Specifically, add R->C test variants for functions. for e.g., `torch.mul(complex_tensor, real_tensor)`
2. Add back commented test in `common_methods_invocation.py`.
3. Add more special case checking for complex gradcheck to make debugging easier.
4. Update complex autograd note.
5. disable complex autograd for operators not tested for complex.
Test Plan: Imported from OSS
Reviewed By: zou3519
Differential Revision: D23655088
Pulled By: anjali411
fbshipit-source-id: caa75e09864b5f6ead0f988f6368dce64cf15deb
2020-09-20 22:05:04 -07:00
Peter Bell
da7863f46b
Add one dimensional FFTs to torch.fft namespace ( #43011 )
...
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43011
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D23751850
Pulled By: mruberry
fbshipit-source-id: 8dc5fec75102d8809eeb85a3d347ba1b5de45b33
2020-09-19 23:32:22 -07:00
Richard Zou
69f6d94caa
Register diag_backward, diagonal_backward, infinitetely...gelu_backward as operators ( #44422 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44422
See #44052 for context.
Test Plan:
- `pytest test/test_autograd.py -v`
- `pytest test/test_nn.py -v`
Reviewed By: mrshenli
Differential Revision: D23607691
Pulled By: zou3519
fbshipit-source-id: 09fbcd66b877af4fa85fd9b2f851ed3912ce84d6
2020-09-10 18:43:18 -07:00
Richard Zou
7ff7e6cfc8
Register cummaxmin_backward, cumprod_backward as operators ( #44410 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44410
See #44052 for context. One of the cumprod_backward overloads was unused
so I just deleted it.
Test Plan: - `pytest test/test_autograd.py -v`
Reviewed By: mrshenli
Differential Revision: D23605503
Pulled By: zou3519
fbshipit-source-id: f9c5b595e62d2d6e71f26580ba96df15cc9de4f7
2020-09-10 18:43:15 -07:00
Richard Zou
08b431f54c
Add trace_backward, masked_select_backward, and take_backward as ops ( #44408 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44408
See #44052 for context.
Test Plan: - `pytest test/test_autograd.py -v`
Reviewed By: mrshenli
Differential Revision: D23605504
Pulled By: zou3519
fbshipit-source-id: b9b1646d13caa6e536d08669c29bfc2ad8ff89a3
2020-09-10 18:41:07 -07:00
Richard Zou
9a5a732866
Register some backwards functions as operators ( #44052 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44052
Summary
=======
This PR registers the following backwards functions as operators:
- slice_backward
- select_backward
- gather_backward
- index_select_backward (the backward function for index_select)
- select_index_backward (previously known as index_select_backward, but is actually the backward function for max.dim, min.dim, etc)
In the future, I'd like to register more backward functions as operators
so that we can write batching rules for the backward functions. Batching
rules for backward functions makes it so that we can compute batched
gradients.
Motivation
==========
The rationale behind this PR is that a lot of backwards functions (27 in total)
are incompatible with BatchedTensor due to using in-place operations.
Sometimes we can allow the in-place operations, but other times we can't.
For example, consider select_backward:
```
Tensor select_backward(const Tensor& grad, IntArrayRef input_sizes, int64_t dim, int64_t index) {
  auto grad_input = at::zeros(input_sizes, grad.options());
  grad_input.select(dim, index).copy_(grad);
  return grad_input;
}
```
and consider the following code:
```
x = torch.randn(5, requires_grad=True)

def select_grad(v):
    return torch.autograd.grad(x[0], x, v)

vs = torch.randn(B0)
batched_grads = vmap(select_grad)(vs)
```
For the batched gradient use case, `grad` is a BatchedTensor.
The physical version of `grad` has size `(B0,)`.
However, select_backward creates a `grad_input` of shape `(5)`, and
tries to copy `grad` to a slice of it.
Other approaches
================
I've considered the following:
- register select_backward as an operator (this PR; a call sketch follows this list)
- have a branch inside select_backward for if `grad` is batched.
  - this is OK, but what if we have more tensor extensions that want to override this?
- modify select_backward to work with BatchedTensor, by creating a new operator for the "select + copy_ behavior".
  - select + copy_ isn't used elsewhere in derivative formulas so this doesn't seem useful
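For illustration, a hedged sketch of what registering the backward as an operator buys, assuming the op is reachable through `torch.ops.aten` with the C++ signature shown above:
```python
import torch

# gradient of x.select(0, 2) for an x of shape (5, 3) has shape (3,)
grad = torch.randn(3)
grad_input = torch.ops.aten.select_backward(grad, [5, 3], 0, 2)
print(grad_input.shape)   # torch.Size([5, 3])
```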
Test Plan
=========
- `pytest test/test_autograd.py -v`
- Registering backward functions may impact performance. I benchmarked
select_backward to see if registering it as an operator led to any noticeable
performance overheads: https://gist.github.com/zou3519/56d6cb53775649047b0e66de6f0007dc .
The TL;DR is that the overhead is pretty minimal.
Test Plan: Imported from OSS
Reviewed By: ezyang, fbhuba
Differential Revision: D23481183
Pulled By: zou3519
fbshipit-source-id: 125af62eb95824626dc83d06bbc513262ee27350
2020-09-04 08:30:39 -07:00
albanD
73f009a2aa
refactor manual function definitions ( #43711 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43711
This makes them available in forward if needed.
No change to the file content, just a copy-paste.
Test Plan: Imported from OSS
Reviewed By: mrshenli
Differential Revision: D23454146
Pulled By: albanD
fbshipit-source-id: 6269a4aaf02ed53870fadf8b769ac960e49af195
2020-09-02 09:23:21 -07:00