Commit Graph

75 Commits

Author SHA1 Message Date
albanD
cba8516b52 make internal forwardAD methods on at::Tensor internal (#54099)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/54099

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D27117838

Pulled By: albanD

fbshipit-source-id: ede96529a4b099dea9cf885d0bf2cb352aa30fa5
2021-03-18 09:27:17 -07:00
Kurt Mohler
382a47b493 Add torch.linalg.vector_norm function (#51099)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/50214
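
For context, a minimal usage sketch of the new function (comments note how it differs from `torch.linalg.norm`):

```python
import torch

x = torch.arange(9, dtype=torch.float32) - 4  # values -4 ... 4
# vector_norm always treats its input as (a batch of) vectors, unlike
# torch.linalg.norm, whose behaviour depends on the input's dimensionality
print(torch.linalg.vector_norm(x))                    # Euclidean (ord=2) norm
print(torch.linalg.vector_norm(x, ord=float('inf')))  # largest absolute value
```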

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51099

Reviewed By: agolynski

Differential Revision: D27147360

Pulled By: mruberry

fbshipit-source-id: 1056f840e7027ad81971c9d1a9f952ab9648f1b5
2021-03-18 06:41:39 -07:00
Ivan Yashchuk
564456ac44 Added autograd support for torch.orgqr (#52637)
Summary:
This PR adds autograd support for `torch.orgqr`.

Since `torch.orgqr` is one of the few functions that still expose LAPACK's naming, while all other linear algebra routines were renamed long ago, I also added a new function with a new name; `torch.orgqr` is now an alias for it.

The new proposed name is `householder_product`. For a matrix `input` and a vector `tau`, LAPACK's orgqr operation takes the columns of `input` (called Householder vectors, or elementary reflectors) and the scalars of `tau`, which together represent Householder matrices, and then computes the product of these matrices. See https://www.netlib.org/lapack/lug/node128.html.
Other linear algebra libraries that I'm aware of do not expose this LAPACK function, so there is some freedom in naming it. It is usually used internally only for QR decomposition, but it can be useful for deep learning tasks now that it supports differentiation.
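
A minimal sketch of the intended usage, assuming the new function is exposed as `torch.linalg.householder_product` (with `torch.orgqr` kept as an alias):

```python
import torch

A = torch.randn(5, 3, dtype=torch.float64)
# geqrf packs the Householder reflectors into `a` and returns the scalars `tau`
a, tau = torch.geqrf(A)
# householder_product (aka orgqr) assembles Q from the reflectors, so together
# with R = triu of the leading rows of `a` it reproduces the reduced QR of A
Q = torch.linalg.householder_product(a, tau)
R = a[:3].triu()
print(torch.allclose(Q @ R, A))
```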

Resolves https://github.com/pytorch/pytorch/issues/50104

Pull Request resolved: https://github.com/pytorch/pytorch/pull/52637

Reviewed By: agolynski

Differential Revision: D27114246

Pulled By: mruberry

fbshipit-source-id: 9ab51efe52aec7c137aa018c7bd486297e4111ce
2021-03-18 05:42:18 -07:00
lezcano
1f5b9170aa Faster backwards for cumsum and cumprod (#53711)
Summary:
Provides a faster backward formula for `cumprod` in the case when the input has zeros. This formula is non-differentiable, so we keep the previous formula for the cases when `at::GradMode::is_enabled()` is true.

This new formula gives up to 10x and 30x speed-ups on CPU and GPU respectively (see the benchmarks below).

The `cumsum` backward formula was rewritten so that no copies are necessary, and a double negation was removed from it. This gives a significant speed-up on CPU, while being almost as efficient as the copy-based formula on GPU. We can see this speed-up when comparing the "No zeros" part of the benchmark.

Benchmarks:

N.B. the script times the forward and the backward of `cumprod` together, so the backward-only speed-ups are even larger than those reported here.
<details>
<summary>Script</summary>

```python
from IPython import get_ipython
import torch
from itertools import product

torch.manual_seed(13)
torch.set_num_threads(1)

ipython = get_ipython()

cpu = torch.device('cpu')
cuda = torch.device('cuda')

def run_test(ndims, size, size_prod, zeros, device):
    print(f"ndims: {ndims}, tensor_size: {size}, size_prod: {size_prod}, zeros: {zeros}, device: {device}")

    for dim in range(ndims):
        sizes = ndims * [size]
        sizes[dim] = size_prod
        tensor = torch.rand(*sizes, device=device)
        with torch.no_grad():
            if zeros:
                # Set 0.1 of them to zero
                p_drop = 0.1
                mask = torch.full_like(tensor, 1.0 - p_drop)
                tensor = tensor * torch.bernoulli(mask)
            else:
                tensor = tensor + 1e-3
        tensor.requires_grad_()
        grad = torch.ones_like(tensor)
        # We test both forward + backward, meaning that the speed-up is actually greater than reported
        # That being said, this is more realistic than doing `retain_graph=True`
        command = "torch.autograd.grad([tensor.cumprod(dim)], [tensor], grad_outputs=[grad])"
        if device == cuda:
            command += "; torch.cuda.synchronize()"
        ipython.magic(f"timeit {command}")
    print()

for device, zeros in product([cuda, cpu], [True, False]):
    run_test(3, 300, 10, zeros, device)
    run_test(3, 300, 100, zeros, device)
    if device == cuda:
        run_test(3, 300, 300, zeros, device)
```

</details>

<details>
<summary>CPU This PR (some regression on small tensors, 4x speed-up on large tensors)</summary>

```
Zeros:
ndims: 3, tensor_size: 300, size_prod: 10, zeros: True, device: cpu
28.2 ms ± 12.1 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
29.8 ms ± 78.9 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
24.5 ms ± 29.1 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

ndims: 3, tensor_size: 300, size_prod: 100, zeros: True, device: cpu
414 ms ± 3.63 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
428 ms ± 4.12 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
382 ms ± 3.18 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

No Zeros:
ndims: 3, tensor_size: 300, size_prod: 10, zeros: False, device: cpu
3.11 ms ± 9.72 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
3.83 ms ± 3.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
4.08 ms ± 10.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

ndims: 3, tensor_size: 300, size_prod: 100, zeros: False, device: cpu
92.2 ms ± 113 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
101 ms ± 101 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
87 ms ± 170 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
```
</details>

<details>
<summary>CUDA This PR (7-30x speed-up)</summary>

```

Zeros:
ndims: 3, tensor_size: 300, size_prod: 10, zeros: True, device: cuda
1.46 ms ± 2.07 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.48 ms ± 3.51 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.93 ms ± 8.07 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

ndims: 3, tensor_size: 300, size_prod: 100, zeros: True, device: cuda
10.5 ms ± 914 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)
10.6 ms ± 509 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)
11.7 ms ± 864 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)

ndims: 3, tensor_size: 300, size_prod: 300, zeros: True, device: cuda
30.3 ms ± 5.16 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
30.6 ms ± 6.44 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
32.2 ms ± 2.34 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

No Zeros:
ndims: 3, tensor_size: 300, size_prod: 10, zeros: False, device: cuda
248 µs ± 335 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
252 µs ± 186 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
438 µs ± 254 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)

ndims: 3, tensor_size: 300, size_prod: 100, zeros: False, device: cuda
2.1 ms ± 193 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)
2.16 ms ± 380 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)
2.59 ms ± 398 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)

ndims: 3, tensor_size: 300, size_prod: 300, zeros: False, device: cuda
6.3 ms ± 857 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)
6.39 ms ± 288 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)
7.15 ms ± 233 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)
```

</details>

<details>
<summary>CPU master</summary>

```
Zeros:
ndims: 3, tensor_size: 300, size_prod: 10, zeros: True, device: cpu
8.27 ms ± 12.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
10.8 ms ± 13.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
28.2 ms ± 74.4 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

ndims: 3, tensor_size: 300, size_prod: 100, zeros: True, device: cpu
1.53 s ± 116 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
1.95 s ± 4.38 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
1.86 s ± 3.58 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

No Zeros:
ndims: 3, tensor_size: 300, size_prod: 10, zeros: False, device: cpu
3.42 ms ± 20 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
4.25 ms ± 3.65 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
4.34 ms ± 3.04 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

ndims: 3, tensor_size: 300, size_prod: 100, zeros: False, device: cpu
104 ms ± 148 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
117 ms ± 99.5 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
94.8 ms ± 125 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
```

</details>

<details>
<summary>CUDA master</summary>

```
Zeros:
ndims: 3, tensor_size: 300, size_prod: 10, zeros: True, device: cuda
912 µs ± 431 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.05 ms ± 2.46 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
2.74 ms ± 381 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)

ndims: 3, tensor_size: 300, size_prod: 100, zeros: True, device: cuda
71.3 ms ± 7.91 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
85.4 ms ± 9.82 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
119 ms ± 6.21 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

ndims: 3, tensor_size: 300, size_prod: 300, zeros: True, device: cuda
646 ms ± 103 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
776 ms ± 81.7 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
917 ms ± 160 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)

No Zeros:
ndims: 3, tensor_size: 300, size_prod: 10, zeros: False, device: cuda
301 µs ± 893 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
308 µs ± 236 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
592 µs ± 140 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)

ndims: 3, tensor_size: 300, size_prod: 100, zeros: False, device: cuda
2.61 ms ± 375 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)
2.68 ms ± 524 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)
3.38 ms ± 736 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)

ndims: 3, tensor_size: 300, size_prod: 300, zeros: False, device: cuda
7.89 ms ± 848 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)
8.03 ms ± 517 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)
9.24 ms ± 405 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)
```

</details>

cc nikitaved

Pull Request resolved: https://github.com/pytorch/pytorch/pull/53711

Reviewed By: jbschlosser

Differential Revision: D27059662

Pulled By: anjali411

fbshipit-source-id: be610d5590c0199b4412dff66fac47666faaff9d
2021-03-16 13:57:43 -07:00
Wenlei Xie
2ecb2c7931 Pass Scalar by reference (#53583)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53583

`Scalar` takes 32 bytes because `c10::complex<double>`
requires 16-byte alignment. Passing `Scalar` by reference
shows about a 1% improvement in instruction count.

All the changes in this commit are codemoded except for
the following 4 files (which code-gen signatures):
```
tools/codegen/api/cpp.py
tools/codegen/api/native.py
tools/codegen/api/structured.py
caffe2/contrib/aten/gen_op.py
```

# Codemode

## Main Step

For the codemod part, here is the main command used:
```
fastmod --extensions h '([a-zA-Z_+]\([^)]*,?\s*)Scalar (\w+)' '${1}const Scalar& ${2}'
fastmod --extensions h '([a-zA-Z_+]\([^)]*,?\s*)optional<Scalar> (\w+)' '${1}const optional<Scalar>& ${2}'
fastmod --extensions cpp '([a-zA-Z_+]\([^)]*,?\s*)Scalar (\w+)' '${1}const Scalar& ${2}'
fastmod --extensions cpp '([a-zA-Z_+]\([^)]*,?\s*)optional<Scalar> (\w+)' '${1}const optional<Scalar>& ${2}'
```

As you can tell, it codemods both `Scalar` and `optional<Scalar>`.  Apply these commands iteratively until reaching a fixed point (since one method signature might contain multiple `Scalar` parameters).

In retrospect, excluding `third_party` and `torch/csrc/jit` would have been a good idea (I reverted those changes manually later; see https://github.com/pytorch/pytorch/pull/53479 for reference).

## Pre-Step

Prior to applying the main command, since some `Scalar` occurrences appear as `at::Scalar` or `c10::Scalar`, I codemodded some of them in advance. Here is an incomplete list:
```
fastmod --extensions h '([a-zA-Z_+]\([^)]*,?\s*)at::Scalar (\w+)' '${1}const at::Scalar& ${2}'
fastmod --extensions cpp '([a-zA-Z_+]\([^)]*,?\s*)at::Scalar (\w+)' '${1}const at::Scalar& ${2}'
fastmod --extensions h '([a-zA-Z_+]\([^)]*,?\s*)c10::optional<Scalar> (\w+)' '${1}const c10::optional<Scalar>& ${2}'
fastmod --extensions cpp '([a-zA-Z_+]\([^)]*,?\s*)c10::optional<Scalar> (\w+)' '${1}const c10::optional<Scalar>& ${2}'
```

## Fixup
There are a couple of post-codemod fixups. For example, `const Scalar` gets codemodded into `const const Scalar&`, and `at::Scalar` gets codemodded into `at::const Scalar&` (if the pre-step is not done comprehensively). Here is an incomplete list:
```
fastmod --extensions cpp 'const const Scalar' 'const Scalar'
fastmod --extensions h 'const const c10::optional<Scalar>' 'const c10::optional<Scalar>'
fastmod --extensions cpp 'const const c10::optional<Scalar>' 'const c10::optional<Scalar>'
fastmod 'at::const Scalar&' 'const at::Scalar&'
```

## Supplementary

`cu` and `mm` files also need to be codemoded, for example:

```
fastmod --extensions cu 'at::const Scalar&' 'const at::Scalar&'
fastmod --extensions mm '([a-zA-Z_+]\([^)]*,?\s*)Scalar (\w+)' '${1}const Scalar& ${2}'
```

Function pointers are not covered by the main codemod and need dedicated patterns. Here is an incomplete list:

```
# Cover case: using index_fill_fn = void(*)(TensorIterator & iter, int64_t dim, int64_t self_dim_size, int64_t self_dim_stride, Scalar source);
fastmod --extensions h '(void\s*\(\s*\*\s*\)\([^)]*,?\s*)Scalar (\w+)' '${1}const Scalar& ${2}'

# Cover case: using softplus_fn = void (*)(TensorIterator&, Scalar, Scalar);
fastmod --extensions h '(void\s*\(\s*\*\s*\)\([^)]*,?\s*)Scalar([, \)])' '${1}const Scalar&${2}'
fastmod --extensions cpp '(void\s*\(\s*\*\s*\)\([^)]*,?\s*)Scalar([, \)])' '${1}const Scalar&${2}'
fastmod --extensions h '(void\s*\(\s*\*\s*\)\([^)]*,?\s*)optional<Scalar>([, \)])' '${1}const optional<Scalar>&${2}'
```

Some corner cases need to be fixed manually.

ghstack-source-id: 123970306

Test Plan: Imported from OSS

Reviewed By: smessmer

Differential Revision: D26904445

fbshipit-source-id: 8d8a002af4b5125f153a32f03c6956be7ae5671d
2021-03-15 23:17:06 -07:00
Nikita Vedeneev
8f15a2f052 eig_backward: faster and with complex support (#52875)
Summary:
As per title. Compared to the previous version, it makes lighter use of the `at::solve` and `at::matmul` methods.

Fixes https://github.com/pytorch/pytorch/issues/51621

Pull Request resolved: https://github.com/pytorch/pytorch/pull/52875

Reviewed By: mrshenli

Differential Revision: D26768653

Pulled By: anjali411

fbshipit-source-id: aab141968d02587440128003203fed4b94c4c655
2021-03-10 11:33:30 -08:00
Joel Schlosser
e86476f736 Huber loss (#50553)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/48595.

## Background

This PR implements HuberLoss, which differs from SmoothL1Loss by a factor of beta. The current implementation does not share logic between the two. Feedback is welcome on the optimal way to minimize code duplication while remaining performant.

I've done some early [benchmarking](https://pytorch.org/tutorials/recipes/recipes/benchmark.html#collecting-instruction-counts-with-callgrind) with Huber calling into the Smooth L1 kernel and scaling afterwards; for the simple test case I used, instruction counts are as follows:
```
Huber loss calls dedicated Huber kernel: 2,795,300
Huber loss calls Smooth L1 kernel and scales afterwards: 4,523,612
```
With these numbers, instruction counts are ~62% higher when using the pre-existing Smooth L1 kernel.
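
For reference, a small sketch of the beta/delta relationship described above, assuming the module names this PR targets (`nn.HuberLoss` taking `delta`, `nn.SmoothL1Loss` taking `beta`):

```python
import torch

x = torch.randn(8)
y = torch.randn(8)
delta = 2.0

huber = torch.nn.HuberLoss(delta=delta)(x, y)
smooth_l1 = torch.nn.SmoothL1Loss(beta=delta)(x, y)
# HuberLoss with delta equals delta * SmoothL1Loss with beta=delta
print(torch.allclose(huber, delta * smooth_l1))
```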

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50553

Test Plan:
```
python test/test_nn.py TestNN.test_HuberLoss
python test/test_nn.py TestNN.test_HuberLoss_delta
python test/test_nn.py TestNN.test_huber_loss_invalid_delta
python test/test_nn.py TestNNDeviceTypeCPU.test_smooth_l1_loss_vs_huber_loss_cpu
python test/test_nn.py TestNNDeviceTypeCUDA.test_smooth_l1_loss_vs_huber_loss_cuda
python test/test_nn.py TestNNDeviceTypeCPU.test_invalid_reduction_strings_cpu
python test/test_nn.py TestNNDeviceTypeCUDA.test_invalid_reduction_strings_cuda
python test/test_nn.py TestNN.test_loss_equal_input_target_shape
python test/test_nn.py TestNN.test_pointwise_loss_broadcast
python test/test_overrides.py
python test/test_jit.py TestJitGeneratedFunctional.test_nn_huber_loss
python test/test_type_hints.py
python test/test_cpp_api_parity.py
build/bin/test_api
```

## Documentation
<img width="677" alt="Screen Shot 2021-01-14 at 4 25 08 PM" src="https://user-images.githubusercontent.com/75754324/104651224-5a445980-5685-11eb-884b-14ea517958c2.png">
<img width="677" alt="Screen Shot 2021-01-14 at 4 24 35 PM" src="https://user-images.githubusercontent.com/75754324/104651190-4e589780-5685-11eb-974d-8c63a89c050e.png">
<img width="661" alt="Screen Shot 2021-01-14 at 4 24 45 PM" src="https://user-images.githubusercontent.com/75754324/104651198-50225b00-5685-11eb-958e-136b36f6f8a8.png">
<img width="869" alt="Screen Shot 2021-01-14 at 4 25 27 PM" src="https://user-images.githubusercontent.com/75754324/104651208-53b5e200-5685-11eb-9fe4-5ff433aa13c5.png">
<img width="862" alt="Screen Shot 2021-01-14 at 4 25 48 PM" src="https://user-images.githubusercontent.com/75754324/104651209-53b5e200-5685-11eb-8051-b0cfddcb07d3.png">

Reviewed By: H-Huang

Differential Revision: D26734071

Pulled By: jbschlosser

fbshipit-source-id: c98c1b5f32a16f7a2a4e04bdce678080eceed5d5
2021-03-02 17:30:45 -08:00
kshitij12345
748285ccd7 [complex] add autograd support for torch.polar (#52488)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/33152
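
A minimal sketch of what this enables: gradients flowing back to the real-valued inputs of `torch.polar`.

```python
import torch

abs_ = torch.rand(4, dtype=torch.float64, requires_grad=True)
angle = torch.rand(4, dtype=torch.float64, requires_grad=True)

z = torch.polar(abs_, angle)   # complex tensor abs_ * exp(i * angle)
z.abs().sum().backward()       # with this PR, backward reaches abs_ and angle
print(abs_.grad, angle.grad)
```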

Pull Request resolved: https://github.com/pytorch/pytorch/pull/52488

Reviewed By: zou3519

Differential Revision: D26711841

Pulled By: anjali411

fbshipit-source-id: b8538fb8cb44456b832e4f993cf41954b3ddd2e8
2021-03-01 21:57:35 -08:00
Richard Barnes
fa325d7c9f Use sum_integers and multiply_integers (#51146)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51146

Test Plan: Sandcastle tests

Reviewed By: ngimel

Differential Revision: D25903430

fbshipit-source-id: 329c14018c9e5192864eed88a8ed0a5068ff1c69
2021-02-10 18:05:45 -08:00
Alexander
0c313564af Backward through sparse_coo_tensor (#50361)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/49683

This PR fixes the backward-through-`sparse_coo_tensor` bug by implementing a `sparse_mask_helper` function for n-dimensional sparse tensors on CPU and CUDA, which is then used to reimplement the `sparse_constructor_values_backward` function.

A `sparse_mask` function was implemented before for the backward of sparse-sparse matmul. However, the algorithm here is a little different because it must be applicable not only to matrices but to n-dimensional tensors. Thankfully it was not too hard to extend, and now both share the same code base.

Note that no new tests are required because the backward for sparse-sparse matmul now uses the new `sparse_mask_helper`.
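
A minimal sketch of the behavior being fixed: gradients flowing back to the values used to build a sparse COO tensor.

```python
import torch

indices = torch.tensor([[0, 1, 1], [2, 0, 2]])
values = torch.tensor([3.0, 4.0, 5.0], requires_grad=True)

s = torch.sparse_coo_tensor(indices, values, (2, 3))
torch.sparse.sum(s).backward()
print(values.grad)  # one gradient entry per specified value
```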

ngimel, mruberry - kindly review this.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50361

Reviewed By: zhangguanheng66

Differential Revision: D26270483

Pulled By: ngimel

fbshipit-source-id: ee4bda49ff86e769342674b64d3c4bc34eae38ef
2021-02-06 23:15:54 -08:00
Peter Bell
b150f150ba Add division overload with rounding_mode selection (#51706)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51706

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50280

As mentioned in gh-43874, this adds a `rounding_mode={'true', 'trunc', 'floor'}`
argument so `torch.div` can be used as a replacement for `floor_divide` during
the transitional period.

I've included dedicated kernels for truncated and floor division which
aren't strictly necessary for float, but do perform significantly better (~2x) than
doing true division followed by a separate rounding kernel.

Note: I introduce new overloads for `aten::div` instead of just adding a default
`rounding_mode` because various JIT passes rely on the exact operator schema.
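
A minimal usage sketch of the new overloads (true division remains the behavior when no `rounding_mode` is passed):

```python
import torch

a = torch.tensor([-7.0, 7.0])
b = torch.tensor([2.0, -2.0])

print(torch.div(a, b))                         # true division: [-3.5, -3.5]
print(torch.div(a, b, rounding_mode='trunc'))  # round toward zero: [-3., -3.]
print(torch.div(a, b, rounding_mode='floor'))  # round toward -inf: [-4., -4.]
```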

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D26123271

Pulled By: mruberry

fbshipit-source-id: 51a83717602114597ec9c4d946e35a392eb01d46
2021-02-04 13:08:36 -08:00
anjali411
bd3ae117fc Fixes cat backward formula to return correct gradient values for R -> C case (#51681)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51681

Fixes https://github.com/pytorch/pytorch/issues/51627

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D26238748

Pulled By: anjali411

fbshipit-source-id: 1dc47f8ddddbf3f2c176f21e5dcee917f84f4c93
2021-02-03 21:29:55 -08:00
XiaobingSuper
ec378055c3 add OneDNN linear backward (#49453)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49453

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D26006889

Pulled By: VitalyFedyunin

fbshipit-source-id: 06e2a02b6e01d847395521a31fe84d844f2ee9ae
2021-02-02 12:18:59 -08:00
Ivan Yashchuk
ddf26816d3 Make torch.svd return V, not V.conj() for complex inputs (#51012)
Summary:
**BC-breaking note:**

torch.svd() added support for complex inputs in PyTorch 1.7, but was not documented as doing so. The complex "V" tensor returned was actually the complex conjugate of what's expected. This PR fixes the discrepancy.

This will silently break all users of torch.svd() with complex inputs.

**Original PR Summary:**

This PR resolves https://github.com/pytorch/pytorch/issues/45821.

The problem was that when support for complex inputs was introduced for `torch.svd`, it was overlooked that LAPACK/MAGMA returns the conjugate transpose of the V matrix, not just the transpose of V. So `torch.svd` was silently returning U, S, V.conj() instead of U, S, V.

Behavior of `torch.linalg.pinv`, `torch.pinverse` and `torch.linalg.svd` (they depend on `torch.svd`) is not changed in this PR.
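
A minimal sketch of the fixed behavior for complex inputs: the returned V now satisfies the usual reconstruction A = U diag(S) V^H.

```python
import torch

a = torch.randn(3, 3, dtype=torch.complex128)
u, s, v = torch.svd(a)
# with this fix, v is V itself (not V.conj()), so the standard reconstruction
# with the conjugate transpose of V holds
recon = u @ torch.diag(s.to(a.dtype)) @ v.conj().t()
print(torch.allclose(recon, a))
```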

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51012

Reviewed By: bdhirsh

Differential Revision: D26047593

Pulled By: albanD

fbshipit-source-id: d1e08dbc3aab9ce1150a95806ef3b5da98b5d3ca
2021-01-25 14:06:41 -08:00
Tugsbayasgalan Manlaibaatar
1a38fa9930 Striding for lists Part 1 (#48719)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48719

Attempt to break this PR (https://github.com/pytorch/pytorch/pull/33019) into two parts. As per our discussion with eellison, the first part is to make sure our aten::slice operator takes optional parameters for begin/step/end. This will help with refactoring ir_emitter.cpp for generic handling of list and slice striding. Once this PR is merged, we will submit a second PR with the compiler change.

Test Plan:
None for this PR, but new tests will be added for the second part.

Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D25929902

fbshipit-source-id: 5385df04e6d61ded0699b09bbfec6691396b56c3
2021-01-19 09:30:01 -08:00
Richard Zou
1154a8594e Add instructional error message for cudnn RNN double backward workaround (#33884)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33884

Mitigates https://github.com/pytorch/pytorch/issues/5261.

It's not possible for us to support cudnn RNN double backwards due to
limitations in the cudnn API. This PR makes it so that we raise an
informative error if users try to take the double backward of a cudnn RNN;
the error message suggests using the non-cudnn RNN instead.
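
A sketch of the kind of workaround the error message points to (illustrative only; assumes a CUDA build and uses the `torch.backends.cudnn.flags` context manager to fall back to the non-cudnn implementation):

```python
import torch

rnn = torch.nn.LSTM(4, 4).cuda()
x = torch.randn(3, 1, 4, device='cuda', requires_grad=True)

# double backward through a cudnn RNN raises the new error; running the RNN
# with cudnn disabled uses the non-cudnn implementation instead
with torch.backends.cudnn.flags(enabled=False):
    out, _ = rnn(x)
    (grad_x,) = torch.autograd.grad(out.sum(), x, create_graph=True)
    grad_x.sum().backward()
```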

Test Plan: - added some tests to check the error message

Reviewed By: albanD

Differential Revision: D20143544

Pulled By: zou3519

fbshipit-source-id: c2e49b3d8bdb9b34b561f006150e4c7551a78fac
2021-01-19 09:05:36 -08:00
Ivan Yashchuk
f9a5ba7398 Added linalg.slogdet (#49194)
Summary:
This PR adds `torch.linalg.slogdet`.

Changes compared to the original torch.slogdet:

- Complex input now works as in NumPy
- Added out= variant (allocates temporary and makes a copy for now)
- Updated `slogdet_backward` to work with complex input

Ref. https://github.com/pytorch/pytorch/issues/42666
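
A minimal usage sketch (for complex input the sign is a complex number of unit modulus, matching NumPy):

```python
import torch

a = torch.randn(3, 3, dtype=torch.complex128)
sign, logabsdet = torch.linalg.slogdet(a)
# det(a) can be recovered as sign * exp(logabsdet); logabsdet is always real
det = sign * torch.exp(logabsdet)
print(sign, logabsdet, det)
```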

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49194

Reviewed By: VitalyFedyunin

Differential Revision: D25916959

Pulled By: mruberry

fbshipit-source-id: cf9be8c5c044870200dcce38be48cd0d10e61a48
2021-01-19 07:28:12 -08:00
anjali411
227acc2e51 Complex autograd support for torch.{baddbmm, addbmm, addmm, addmv} (#50632)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50632

I'll port the following method tests in follow-up PRs:
`'baddbmm', 'addbmm', 'addmv', 'addr'`
After the tests are ported to OpInfo based tests, it would also be much easier to add tests with complex alpha and beta values.
Edit- it seems like it's hard to port the broadcasting variant tests because one ends up skipping `test_inplace_grad` and `test_variant_consistency_eager` even for the case when inputs are not required to be broadcasted.

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D25947471

Pulled By: anjali411

fbshipit-source-id: 9faa7f1fd55a1269bad282adac2b39d19bfa4591
2021-01-18 14:05:02 -08:00
Jeffrey Wan
6e3e57095c Add complex support for torch.nn.L1Loss (#49912)
Summary:
Building on top of the work of anjali411 (https://github.com/pytorch/pytorch/issues/46640)

Things added in this PR (a minimal usage sketch follows the list):
1. Modify the backward and double-backward formulas
2. Add complex support for `new module tests` and criterion tests (and add complex tests for L1)
3. Modify some existing tests to support complex
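
A minimal sketch of the added complex support (the loss itself is real-valued, so backward is well defined):

```python
import torch

x = torch.randn(4, dtype=torch.complex128, requires_grad=True)
target = torch.randn(4, dtype=torch.complex128)

loss = torch.nn.L1Loss()(x, target)  # mean of |x - target|, a real scalar
loss.backward()
print(x.grad)
```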

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49912

Reviewed By: zhangguanheng66

Differential Revision: D25853036

Pulled By: soulitzer

fbshipit-source-id: df619f1b71c450ab2818eb17804e0c55990aa8ad
2021-01-15 15:53:15 -08:00
Howard Huang
ec51b67282 Fix elu backward operation for negative alpha (#49272)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/47671

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49272

Test Plan:
```
x = torch.tensor([-2, -1, 0, 1, 2], dtype=torch.float32, requires_grad=True)
y = torch.nn.functional.elu_(x.clone(), alpha=-2)
grads = torch.ones_like(y)
y.backward(grads)
```

```
RuntimeError: In-place elu backward calculation is triggered with a negative slope which is not supported.
This is caused by calling in-place forward function with a negative slope, please call out-of-place
version instead.
```

Reviewed By: albanD

Differential Revision: D25569839

Pulled By: H-Huang

fbshipit-source-id: e3c6c0c2c810261566c10c0cc184fd81b280c650
2021-01-11 12:52:52 -08:00
Nikita Vedeneev
eb87686511 svd_backward: more memory and computationally efficient. (#50109)
Summary:
As per title.

CC IvanYashchuk (unfortunately I cannot add you as a reviewer for some reason).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50109

Reviewed By: gchanan

Differential Revision: D25828536

Pulled By: albanD

fbshipit-source-id: 3791c3dd4f5c2a2917eac62e6527ecd1edcb400d
2021-01-11 05:28:43 -08:00
Antonio Cuni
b5ab0a7f78 Improve torch.linalg.qr (#50046)
Summary:
This is a follow-up to PR https://github.com/pytorch/pytorch/issues/47764 that fixes the remaining details.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50046

Reviewed By: zou3519

Differential Revision: D25825557

Pulled By: mruberry

fbshipit-source-id: b8e335e02265e73484a99b0189e4cc042828e0a9
2021-01-08 09:52:31 -08:00
Sebastian Messmer
c7e9abb66a Making ops c10-full: list of optional tensors (#49138)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49138

See for details: https://fb.quip.com/QRtJAin66lPN

We need to model optional types explicitly, mostly for schema inference. So we cannot pass a `Tensor?[]` as `ArrayRef<Tensor>`; instead we need to pass it as an optional type. This PR changes it to `torch::List<c10::optional<Tensor>>`. It also makes c10-full the ops that were blocked by this.

## Backwards Compatibility

- This should not break the Python API because the representation in Python is the same and python_arg_parser just transforms the python list into a `List<optional<Tensor>>` instead of into a `List<Tensor>`.
- This should not break serialized models because there's some logic that allows loading a serialized `List<Tensor>` as `List<optional<Tensor>>`, see https://github.com/pytorch/pytorch/pull/49138/files#diff-9315f5dd045f47114c677174dcaa2f982721233eee1aa19068a42ff3ef775315R57
- This will break backwards compatibility for the C++ API. There is no implicit conversion from `ArrayRef<Tensor>` (which was the old argument type) to `List<optional<Tensor>>`. One common call pattern is `tensor.index({indices_tensor})`, where indices_tensor is another `Tensor`, and that will continue working because the `{}` initializer_list constructor for `List<optional<Tensor>>` can take `Tensor` elements that are implicitly converted to `optional<Tensor>`, but another common call pattern was `tensor.index(indices_tensor)`, where previously, the `Tensor` got implicitly converted to an `ArrayRef<Tensor>`, and to implicitly convert `Tensor -> optional<Tensor> -> List<optional<Tensor>>` would be two implicit conversions, and C++ doesn't allow chaining two implicit conversions. So those call sites have to be rewritten to `tensor.index({indices_tensor})`.

ghstack-source-id: 119269131

Test Plan:
## Benchmarks (C++ instruction counts):
### Forward
#### Script
```py
from torch.utils.benchmark import Timer

counts = Timer(
    stmt="""
        auto t = {{op call to measure}};
    """,
    setup="""
        using namespace torch::indexing;
        auto x = torch::ones({4, 4, 4});
    """,
    language="cpp",
).collect_callgrind(number=1_000)
print(counts)
```
#### Results
|  Op call                                                              |before   |after   |delta  |delta %|
|------------------------------------------------------------------------|---------|--------|-------|------|
|x[0] = 1                                                                |11566015 |11566015|0      |0.00% |
|x.index({0})                                                            |6807019  |6801019 |-6000  |-0.09%|
|x.index({0, 0})                                                         |13529019 |13557019|28000  |0.21% |
|x.index({0, 0, 0})                                                      |10677004 |10692004|15000  |0.14% |
|x.index({"..."})                                                        |5512015  |5506015 |-6000  |-0.11%|
|x.index({Slice(None, None, None)})                                      |6866016  |6936016 |70000  |1.02% |
|x.index({None})                                                         |8554015  |8548015 |-6000  |-0.07%|
|x.index({false})                                                        |22400000 |22744000|344000 |1.54% |
|x.index({true})                                                         |27624088 |27264393|-359695|-1.30%|
|x.index({"...", 0, true, Slice(1, None, 2), torch::tensor({1, 2})})|123472000|123463306|-8694|-0.01%|

### Autograd
#### Script
```py
from torch.utils.benchmark import Timer

counts = Timer(
    stmt="""
        auto t = {{op call to measure}};
    """,
    setup="""
        using namespace torch::indexing;
        auto x = torch::ones({4, 4, 4}, torch::requires_grad());
    """,
    language="cpp",
).collect_callgrind(number=1_000)
print(counts)
```
Note: the script measures the **forward** path of an op call with autograd enabled (i.e. calls into VariableType). It does not measure the backward path.

#### Results
|  Op call                                                              |before   |after   |delta  |delta %|
|------------------------------------------------------------------------|---------|--------|-------|------|
|x.index({0})                                                            |14839019|14833019|-6000| 0.00% |
|x.index({0, 0})                                                         |28342019|28370019|28000| 0.00% |
|x.index({0, 0, 0})                                                      |24434004|24449004|15000| 0.00% |
|x.index({"..."})                                                       |12773015|12767015|-6000| 0.00% |
|x.index({Slice(None, None, None)})                                      |14837016|14907016|70000| 0.47% |
|x.index({None})                                                        |15926015|15920015|-6000| 0.00% |
|x.index({false})                                                        |36958000|37477000|519000| 1.40% |
|x.index({true})                                                         |41971408|42426094|454686| 1.08% |
|x.index({"...", 0, true, Slice(1, None, 2), torch::tensor({1, 2})}) |168184392|164545682|-3638710| -2.16% |

Reviewed By: bhosmer

Differential Revision: D25454632

fbshipit-source-id: 28ab0cffbbdbdff1c40b4130ca62ee72f981b76d
2021-01-04 05:04:02 -08:00
Jeffrey Wan
4677fc69a2 Fix inf norm grad (reland) (#48611)
Summary:
Reland of https://github.com/pytorch/pytorch/issues/48122

Does this result in a regression? No significant regression observed.

Timer script:
```
import torch
from torch.utils.benchmark import Timer

setup="""
a = torch.rand((2, 2), requires_grad=True)
gradient = torch.ones(2)
"""

stmt="""
torch.autograd.grad(torch.norm(a, dim=(0,), keepdim=False), a, gradient)
"""

timer = Timer(stmt, setup)

print(timer.timeit(10000))
print(timer.collect_callgrind(100))
```
Note: small matrix, keepdim is False, and dims is non-empty

Before change
```
Runtime   37.37 us
1 measurement, 10000 runs , 1 thread

                           All          Noisy symbols removed
    Instructions:     15279045                   15141710
    Baseline:             4257                       3851
100 runs per measurement, 1 thread
```

After change
```
Runtime 36.08 us
1 measurement, 10000 runs , 1 thread

                           All          Noisy symbols removed
    Instructions:     15296974                   15153534
    Baseline:             4257                       3851
100 runs per measurement, 1 thread
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48611

Reviewed By: albanD, mruberry

Differential Revision: D25309997

Pulled By: soulitzer

fbshipit-source-id: 5fb950dc9259234342985c0e84ada25a7e3814d6
2020-12-30 21:13:33 -08:00
anjali411
97c17b4772 Fix auto exponent issue for torch.pow (#49809)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49809

Fixes https://github.com/pytorch/xla/issues/2688 #46936

Test Plan: Imported from OSS

Reviewed By: nikithamalgifb

Differential Revision: D25724176

Pulled By: anjali411

fbshipit-source-id: 16287a1f481e9475679b99d6fb45de840da225be
2020-12-29 17:02:56 -08:00
Antonio Cuni
361f5ed91d Implement torch.linalg.qr (#47764)
Summary:
I am opening this PR early to have a place to discuss design issues.
The biggest difference between `torch.qr` and `numpy.linalg.qr` is that the former takes a boolean parameter `some=True`, while the latter takes a string parameter `mode='reduced'`, which can be one of the following:

`reduced`
this is completely equivalent to `some=True`, and both are the default.

`complete`
this is completely equivalent to `some=False`.

`r`
this returns only `r` instead of a tuple `(q, r)`. We have already decided that we don't want different return types depending on the parameters, so I propose to return `(r, empty_tensor)` instead. I **think** that in this mode it will be impossible to implement the backward pass, so we should raise an appropriate error in that case.

`raw`
in this mode, it returns `(h, tau)` instead of `(q, r)`. Internally, `h` and `tau` are obtained by calling lapack's `dgeqrf` and are later used to compute the actual values of `(q, r)`. The numpy docs suggest that these might be useful for calling other lapack functions, but at the moment none of them is exposed by numpy and I don't know how often it is used in the real world.
I suppose that implementing the backward pass needs some attention: the most straightforward solution is to use `(h, tau)` to compute `(q, r)` and then use the normal logic for `qr_backward`, but there might be faster alternatives.

`full`, `f`
alias for `reduced`, deprecated since numpy 1.8.0

`economic`, `e`
similar to `raw`, but it returns only `h` instead of `(h, tau)`. Deprecated since numpy 1.8.0

To summarize:
  * `reduced`, `complete` and `r` are straightforward to implement (a usage sketch follows below).

  * `raw` needs a bit of extra care, but I don't know how high priority it is: since it is rarely used, we might not want to support it right now and maybe implement it in the future?

  * I think we should just leave `full` and `economic` out, and possibly add a note to the docs explaining what to use instead.
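
A small usage sketch of the string modes discussed above (`raw`, `full` and `economic` are not shown):

```python
import torch

a = torch.randn(5, 3)

q, r = torch.linalg.qr(a)                   # mode='reduced' is the default
print(q.shape, r.shape)                     # (5, 3), (3, 3)
print(torch.allclose(q @ r, a))

q, r = torch.linalg.qr(a, mode='complete')  # full orthogonal basis
print(q.shape, r.shape)                     # (5, 5), (5, 3)

# mode='r' computes only r; the other slot of the returned tuple is filled
# with an empty tensor, as proposed above
```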

/cc mruberry

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47764

Reviewed By: ngimel

Differential Revision: D25708870

Pulled By: mruberry

fbshipit-source-id: c25c70a23a02ec4322430d636542041e766ebe1b
2020-12-28 17:28:17 -08:00
albanD
c23808d8e8 Reland: Add base forward grad logic (#49734)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49734

RFC: https://github.com/pytorch/rfcs/pull/11

This PR adds the basic logic to handle forward grads as dual Tensors.
It contains the following (a minimal usage sketch follows the list):
- Mechanism to save dual state on a Tensor and clear it up when the dual level ends
- C++ and python user facing API
- Updated view system that is able to track both forward and backward views
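
A minimal usage sketch of the Python-facing dual-tensor API, assuming the `torch.autograd.forward_ad` module introduced in this stack and the entry points `dual_level`, `make_dual` and `unpack_dual` (the per-op formulas land in the follow-up PRs):

```python
import torch
import torch.autograd.forward_ad as fwAD

primal = torch.randn(3)
tangent = torch.randn(3)

with fwAD.dual_level():
    # a dual tensor bundles the primal value with its forward gradient;
    # the dual state is cleared when the level exits
    dual = fwAD.make_dual(primal, tangent)
    out = dual * 2
    _, jvp = fwAD.unpack_dual(out)
    print(jvp)  # 2 * tangent, the forward-mode JVP
```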

The current PR has the following limitations:
- Extensive tests are in the next PR in the stack as formulas are needed to write full tests.
- Only the manual formulas have been audited and no other formula is actually implemented here (they are in the next PR in the stack)
- Only level 0 is allowed for now. This was discussed and agreed that it is not needed for the first version of this PR.
- We can save one ViewInfo creation when both the forward and backward views have the same base. This can be done by adding a boolean flag to the DifferentiableViewMeta and extra logic in the `as_view` method. This is left out to keep this PR concise.
- We can skip tracking forward views if the base has a forward grad. This can be done by adding extra logic in the `as_view` method. This is left out to keep this PR concise.

Reading guide:
- Updated view handling in [gen_variable_type.py](https://github.com/pytorch/pytorch/pull/49097/files#diff-f6553cec68caeaea36f6c8b14ff76a6d39dfd774e0ea9ef2f76e8d81fd9af5df), [VariableTypeUtils.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-ec71cfa45954dece1236c661d170e6341879c5be637f4abf52e826d61b40695a), [variable.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-60e3bfe444e89efc7149f25b38e472710525984789934ab83f1bd5671b8ff285) (skip code below "[Forward Grad View]" for now), [variable.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-1604bcd0e4350ed99ec45e437cee7ac9ebe337392c9ea16a236247aeeb35b02bR266-R542) and [custom_function.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-dd85f452082b5bb6612bbc12adb496f8827defa228509f7b493de1d517522d5d). This introduces the new ViewInfo to hold view informations shared for forward and backward. It also updates the differentiable view meta to use this. And it updates the as_view function to handle both forward and backward view.
- New forward grad class that handle storing gradients and tracking at each level [forward_grad.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-c6c5b9ab2d7e5dde4102495faa1b6bbbfc23aa3e47deb7359c0bfe1eb004c0cb), [forward_grad.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-de2ab54ade7312701850d71a119a4f4ee4b9fc5a9c42a467cdd4e73c033531dd) and [build_variables.bzl](https://github.com/pytorch/pytorch/pull/49097/files#diff-dfdfa2efb17beddfd9094524f95351fd197db6c8857e96b436fb599870359325). EDIT: These files also contain the new flag to globally disable forward AD that allows us to reduce performance issues while this is in development.
- Lowest level API and binding between Tensor and AutogradMeta in [TensorBody.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-7554853205392fa743357bf845ecc350a974ec049383248c12daaf2f4de04911), [TensorImpl.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-052bd9150ef8e09289ddf644b5a6830ede49207201cd41728f6d7cc6d9cead94), [TensorImpl.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-a15aae4cf23da44970db7cece62ff981265575c798c62f7b52d87c8809dfe2e1) and the rest of [variable.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-60e3bfe444e89efc7149f25b38e472710525984789934ab83f1bd5671b8ff285R557-R677)
- API to access the forward primal that needs to be a differentiable function (and so in native_functions.yaml) [native_functions.yaml](https://github.com/pytorch/pytorch/pull/49097/files#diff-2f3dbd85efb9b5172f2264eedd3be47dd765e6ab7cc8bf3ade5e62c28ae35991) [NamedRegistrations.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-69bd3bea510c9b64e1633fa18c3ea63d4b8348dbad3a78ad9de844ab3e43dc1d), [VariableMethodsStub.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-23f5fcb737a2b289811fe0f4b65aef775e7c824b2e629ecd343df51405cd434f), [derivatives.yaml](https://github.com/pytorch/pytorch/pull/49097/files#diff-e4c2f99a2404e98c3586e07425da73008f36b1bada790648a7297af141d37f8c), [gen_python_functions.py](https://github.com/pytorch/pytorch/pull/49097/files#diff-e4c2f99a2404e98c3586e07425da73008f36b1bada790648a7297af141d37f8c), [gen_trace_type.py](https://github.com/pytorch/pytorch/pull/49097/files#diff-54e0b976027bf8debefb959ff360b89ae93466970c843365b1b3a03806d868ce), [TraceTypeManual.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-f34636741ad4a23d018e0c289bc750c3bad887b45660e1d6eaf440d234a78fbf) and [part of VariableTypeManual.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-6e19a1bce8cbdba8714b6e2c794a76bc0864b64a49cfa757cb0b5afdc937d1a4R198-R243)
- c++ API [autograd.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-349028fbe8291a965a7a263c323b208fe071c35c66179ee997ef84fa81aa4b1e), [autograd.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-a3fe908d67dfec16a1fcde300de68b0701bf68b88db7451f29f2bee255cf30c9)
- python binding [init.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-c58a67c85191c22c9b3bb439117d8053edfd9dea839fa010cf967d404c3c630d)
- python API [forward_ad.py](https://github.com/pytorch/pytorch/pull/49097/files#diff-a4efad4ba18fffdfb264c21e5475997a24a743089a899f8ec1a5ff962c6738d9), [autograd/__init__.py](https://github.com/pytorch/pytorch/pull/49097/files#diff-743abcafd32ad0e69f39ac5a91df4197b7e1921c135cacee7ef6dc829a8a7af8)
- c++ and python printing [Formatting.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-881dba501e71662e2e4818b4b016f739b344c8aed2f5edc6b871eda47a2aced0), [_tensor_str.py](https://github.com/pytorch/pytorch/pull/49097/files#diff-a7911f8d5e73adbff914d99fd7818ace2a7030b6a3748abe06ec6fc6e3df9cc3)
- Utility for formulas and updated manual functions to respect new view system as well as forward grad [FunctionsManual.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-6378bb6dc81a64dab676d61731341fa5d1088418f32a1473a33a0ccfc2357dc1), [FunctionsManual.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-4adbd88239afcd60e8198aab65d4f5e43b62314e34b80551e997a1ea503adea5) [rest of VariableTypeManual.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-6e19a1bce8cbdba8714b6e2c794a76bc0864b64a49cfa757cb0b5afdc937d1a4R264-R433)
- Ensure SavedVariable save forward grad properly [saved_variable.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-c1b8039d776241abe177d5aa99b79dd9489a9b3e529da8ab24c2e386c1238ae2), [saved_variable.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-cc9fba479b5beae06b2eea2e390d17796e0341c5b037a20b5bcaccbb0c341030)

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D25678797

Pulled By: albanD

fbshipit-source-id: 3d58550c11b5f58b9b73fd30596d042b857fb9dd
2020-12-22 12:11:27 -08:00
Martin Yuan
590e7168ed [PyTorch] Remove direct reference to native symbols in sparse related non-native codes (#49721)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49721

As part of the per-app selective build refactoring effort, we are decoupling ATen/native from the rest of ATen (D25413998).
All symbols of ATen/native may only be referenced through the dispatcher (https://github.com/pytorch/pytorch/issues/48684).

This diff decouples the native references recently introduced for sparse tensors.
ghstack-source-id: 119028080

Test Plan: CI

Reviewed By: dhruvbird, ngimel

Differential Revision: D25675711

fbshipit-source-id: 381cbb3b361ee41b002055399d4996a9ca21377c
2020-12-21 22:16:20 -08:00
Walter Shen
f5178bf151 Revert D25607503: Add base forward grad logic
Test Plan: revert-hammer

Differential Revision:
D25607503 (fdf02eff3d)

Original commit changeset: f1396290de1d

fbshipit-source-id: 057206e28ff48ee288856adfe3ca577d4880789f
2020-12-21 19:56:28 -08:00
albanD
fdf02eff3d Add base forward grad logic (#49097)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49097

RFC: https://github.com/pytorch/rfcs/pull/11

This PR adds the basic logic to handle forward grads as dual Tensors.
It contains the following:
- Mechanism to save dual state on a Tensor and clear it up when the dual level ends
- C++ and python user facing API
- Updated view system that is able to track both forward and backward views

The current PR has the following limitations:
- Extensive tests are in the next PR in the stack as formulas are needed to write full tests.
- Only the manual formulas have been audited and no other formula is actually implemented here (they are in the next PR in the stack)
- Only level 0 is allowed for now. This was discussed and agreed that it is not needed for the first version of this PR.
- We can save one ViewInfo creation when both the forward and backward views have the same base. This can be done by adding a boolean flag to the DifferentiableViewMeta and extra logic in the `as_view` method. This is left out to keep this PR concise.
- We can skip tracking forward views if the base has a forward grad. This can be done by adding extra logic in the `as_view` method. This is left out to keep this PR concise.

Reading guide:
- Updated view handling in [gen_variable_type.py](https://github.com/pytorch/pytorch/pull/49097/files#diff-f6553cec68caeaea36f6c8b14ff76a6d39dfd774e0ea9ef2f76e8d81fd9af5df), [VariableTypeUtils.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-ec71cfa45954dece1236c661d170e6341879c5be637f4abf52e826d61b40695a), [variable.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-60e3bfe444e89efc7149f25b38e472710525984789934ab83f1bd5671b8ff285) (skip code below "[Forward Grad View]" for now), [variable.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-1604bcd0e4350ed99ec45e437cee7ac9ebe337392c9ea16a236247aeeb35b02bR266-R542) and [custom_function.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-dd85f452082b5bb6612bbc12adb496f8827defa228509f7b493de1d517522d5d). This introduces the new ViewInfo to hold view informations shared for forward and backward. It also updates the differentiable view meta to use this. And it updates the as_view function to handle both forward and backward view.
- New forward grad class that handle storing gradients and tracking at each level [forward_grad.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-c6c5b9ab2d7e5dde4102495faa1b6bbbfc23aa3e47deb7359c0bfe1eb004c0cb), [forward_grad.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-de2ab54ade7312701850d71a119a4f4ee4b9fc5a9c42a467cdd4e73c033531dd) and [build_variables.bzl](https://github.com/pytorch/pytorch/pull/49097/files#diff-dfdfa2efb17beddfd9094524f95351fd197db6c8857e96b436fb599870359325). EDIT: These files also contain the new flag to globally disable forward AD that allows us to reduce performance issues while this is in development.
- Lowest level API and binding between Tensor and AutogradMeta in [TensorBody.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-7554853205392fa743357bf845ecc350a974ec049383248c12daaf2f4de04911), [TensorImpl.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-052bd9150ef8e09289ddf644b5a6830ede49207201cd41728f6d7cc6d9cead94), [TensorImpl.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-a15aae4cf23da44970db7cece62ff981265575c798c62f7b52d87c8809dfe2e1) and the rest of [variable.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-60e3bfe444e89efc7149f25b38e472710525984789934ab83f1bd5671b8ff285R557-R677)
- API to access the forward primal that needs to be a differentiable function (and so in native_functions.yaml) [native_functions.yaml](https://github.com/pytorch/pytorch/pull/49097/files#diff-2f3dbd85efb9b5172f2264eedd3be47dd765e6ab7cc8bf3ade5e62c28ae35991) [NamedRegistrations.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-69bd3bea510c9b64e1633fa18c3ea63d4b8348dbad3a78ad9de844ab3e43dc1d), [VariableMethodsStub.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-23f5fcb737a2b289811fe0f4b65aef775e7c824b2e629ecd343df51405cd434f), [derivatives.yaml](https://github.com/pytorch/pytorch/pull/49097/files#diff-e4c2f99a2404e98c3586e07425da73008f36b1bada790648a7297af141d37f8c), [gen_python_functions.py](https://github.com/pytorch/pytorch/pull/49097/files#diff-e4c2f99a2404e98c3586e07425da73008f36b1bada790648a7297af141d37f8c), [gen_trace_type.py](https://github.com/pytorch/pytorch/pull/49097/files#diff-54e0b976027bf8debefb959ff360b89ae93466970c843365b1b3a03806d868ce), [TraceTypeManual.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-f34636741ad4a23d018e0c289bc750c3bad887b45660e1d6eaf440d234a78fbf) and [part of VariableTypeManual.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-6e19a1bce8cbdba8714b6e2c794a76bc0864b64a49cfa757cb0b5afdc937d1a4R198-R243)
- c++ API [autograd.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-349028fbe8291a965a7a263c323b208fe071c35c66179ee997ef84fa81aa4b1e), [autograd.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-a3fe908d67dfec16a1fcde300de68b0701bf68b88db7451f29f2bee255cf30c9)
- python binding [init.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-c58a67c85191c22c9b3bb439117d8053edfd9dea839fa010cf967d404c3c630d)
- python API [forward_ad.py](https://github.com/pytorch/pytorch/pull/49097/files#diff-a4efad4ba18fffdfb264c21e5475997a24a743089a899f8ec1a5ff962c6738d9), [autograd/__init__.py](https://github.com/pytorch/pytorch/pull/49097/files#diff-743abcafd32ad0e69f39ac5a91df4197b7e1921c135cacee7ef6dc829a8a7af8)
- c++ and python printing [Formatting.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-881dba501e71662e2e4818b4b016f739b344c8aed2f5edc6b871eda47a2aced0), [_tensor_str.py](https://github.com/pytorch/pytorch/pull/49097/files#diff-a7911f8d5e73adbff914d99fd7818ace2a7030b6a3748abe06ec6fc6e3df9cc3)
- Utility for formulas and updated manual functions to respect new view system as well as forward grad [FunctionsManual.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-6378bb6dc81a64dab676d61731341fa5d1088418f32a1473a33a0ccfc2357dc1), [FunctionsManual.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-4adbd88239afcd60e8198aab65d4f5e43b62314e34b80551e997a1ea503adea5) [rest of VariableTypeManual.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-6e19a1bce8cbdba8714b6e2c794a76bc0864b64a49cfa757cb0b5afdc937d1a4R264-R433)
- Ensure SavedVariable save forward grad properly [saved_variable.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-c1b8039d776241abe177d5aa99b79dd9489a9b3e529da8ab24c2e386c1238ae2), [saved_variable.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-cc9fba479b5beae06b2eea2e390d17796e0341c5b037a20b5bcaccbb0c341030)

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D25607503

Pulled By: albanD

fbshipit-source-id: f1396290de1d75760f3d380c43cdd56e86fa6099
2020-12-21 14:39:43 -08:00
Alexander
44ce0b8883 Sparse-sparse matrix multiplication (CPU/CUDA) (#39526)
Summary:
This PR implements matrix multiplication support for 2-d sparse tensors using the COO sparse format.

The current implementation of `torch.sparse.mm` supports the configuration
`torch.sparse.mm(sparse_matrix1, sparse_matrix2.to_dense())`, but this can use a lot of memory when sparse_matrix2's shape is large.

This implementation extends the `torch.sparse.mm` function to support `torch.sparse.mm(sparse_matrix1, sparse_matrix2)`.
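
A minimal sketch of the extended call (a hedged illustration; the exact layout of the resulting gradients is not asserted here):

```python
import torch

a = torch.randn(4, 6).relu().to_sparse().requires_grad_(True)
b = torch.randn(6, 3).relu().to_sparse().requires_grad_(True)

c = torch.sparse.mm(a, b)   # both operands stay sparse; result is sparse COO
print(torch.allclose(c.to_dense(),
                     a.detach().to_dense() @ b.detach().to_dense()))

c.to_dense().sum().backward()  # autograd support through the sparse product
print(a.grad)
```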

Resolves  #[20988](https://github.com/pytorch/pytorch/issues/20988) for CPU/CUDA.

- [x] sparse matmul
  - [x] CPU/CUDA C++ implementation
  - [x] unittests
  - [x] update torch.sparse.mm documentation
  - [x] autograd support

The CPU sparse-sparse matmul was implemented using the work "Sparse Matrix Multiplication Package (SMMP)" as a reference. The GPU sparse-sparse matmul is based on cuSPARSE; there is specific code for CUSPARSE_VERSION >= 11 and for older versions of cuSPARSE. Both the CPU and CUDA implementations rely on the sparse-sparse matmul algorithm using the CSR index format, as it is one of the fastest algorithms.

Here are the latest benchmark results (script is here) for torch.sparse.mm (CUDA), torch.sparse.mm (CPU) and scipy; values are float32 scalars:

size | density | sparse.mm(CUDA) | sparse.mm(CPU) | scipy_coo_matmul
-- | -- | -- | -- | --
(32, 10000) | 0.01 | 822.7 | 79.4 | 704.1
(32, 10000) | 0.05 | 1741.1 | 402.6 | 1155.3
(32, 10000) | 0.1 | 2956.8 | 840.8 | 1885.4
(32, 10000) | 0.25 | 6417.7 | 2832.3 | 4665.2
(512, 10000) | 0.01 | 1010.2 | 3941.3 | 26937.7
(512, 10000) | 0.05 | 2216.2 | 26903.8 | 57343.7
(512, 10000) | 0.1 | 4868.4 | 87773.7 | 117477.0
(512, 10000) | 0.25 | 16639.3 | 608105.0 | 624290.4
(1024, 10000) | 0.01 | 1224.8 | 13088.1 | 110379.2
(1024, 10000) | 0.05 | 3897.5 | 94783.9 | 236541.8
(1024, 10000) | 0.1 | 10559.1 | 405312.5 | 525483.4
(1024, 10000) | 0.25 | 57456.3 | 2424337.5 | 2729318.7

A new backward algorithm was implemented using only `sparse @ sparse` and `sparse_mask` operations. Here is some benchmarking:

```
[------------------------- sparse.mm-backward -------------------------]
                            |   sparse.backward   |  dense.backward
 -----------------------------------------------------------------------
      (32, 10000) | 0.01    |            13.5          |         2.4
      (32, 10000) | 0.05    |            52.3          |         2.4
      (512, 10000) | 0.01   |          1016.8          |       491.5
      (512, 10000) | 0.05   |          1604.3          |       492.3
      (1024, 10000) | 0.01  |          2384.1          |      1963.7
      (1024, 10000) | 0.05  |          3965.8          |      1951.9
```

I added new benchmark tests. Now I am using a real dataset used in recent studies [1, 2] with different sparsity levels.

```
[---------------------------------- matmul ---------------------------------]
                        |   0.5   |  0.7   |  0.8   |  0.9   |  0.95  |  0.98
1 threads: ------------------------------------------------------------------
  (cpu)   torch         |    5.4  |   5.4  |   5.2  |   5.3  |   5.3  |   5.4
          torch.sparse  |  122.2  |  51.9  |  27.5  |  11.4  |   4.9  |   1.8
          scipy         |  150.1  |  87.4  |  69.2  |  56.8  |  38.4  |  17.1
  (cuda)  torch         |    1.3  |   1.1  |   1.1  |   1.1  |   1.1  |   1.1
          torch.sparse  |   20.0  |   8.4  |   5.1  |   2.5  |   1.5  |   1.1

[----------------------------------- backward -----------------------------------]
                        |   0.5   |   0.7   |   0.8   |   0.9   |   0.95  |   0.98
1 threads: -----------------------------------------------------------------------
  (cpu)   torch         |   17.7  |   17.9  |   17.7  |   17.7  |   17.6  |   17.9
          torch.sparse  |  672.9  |  432.6  |  327.5  |  230.8  |  176.7  |  116.7
  (cuda)  torch         |    3.8  |    3.6  |    3.5  |    3.5  |    3.6  |    3.5
          torch.sparse  |   68.8  |   46.2  |   35.6  |   24.2  |   17.8  |   11.9

Times are in milliseconds (ms).
```

In summary, the new `sparse @ sparse` backward algorithm is preferable because it is more about saving memory than raw performance. Moreover, it is better than the other options tested before.

## **References**

1. Trevor Gale, Matei Zaharia, Cliff Young, Erich Elsen. **Sparse GPU Kernels for Deep Learning.**  Proceedings of the International Conference for High Performance Computing, 2020. [https://github.com/google-research/google-research/tree/master/sgk](https://github.com/google-research/google-research/tree/master/sgk)
2. Trevor Gale, Erich Elsen, Sara Hooker. **The State of Sparsity in Deep Neural Networks.** [https://github.com/google-research/google-research/tree/master/state_of_sparsity](https://github.com/google-research/google-research/tree/master/state_of_sparsity)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/39526

Reviewed By: mruberry

Differential Revision: D25661239

Pulled By: ngimel

fbshipit-source-id: b515ecd66d25f347d637e159d51aa45fb43b6938
2020-12-21 11:53:55 -08:00
Ivan Yashchuk
8be205ae13 Added linalg.solve (#48456)
Summary:
This PR adds `torch.linalg.solve`.

`linalg_solve_out` uses in-place operations on the provided result tensor.

I modified `apply_solve` to accept a tensor of Int instead of a std::vector; that way we can write a function similar to `linalg_solve_out` but without the error checks and device memory synchronization.

In comparison to `torch.solve`, this routine accepts 1-dimensional tensors and batches of 1-dimensional tensors for the right-hand-side term; `torch.solve` requires it to be at least 2-dimensional.
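For example, a small sketch of the 1-D and batched 1-D right-hand-side cases (values are arbitrary):

```python
import torch

A = torch.randn(3, 3, dtype=torch.float64)
b = torch.randn(3, dtype=torch.float64)          # 1-D right-hand side
x = torch.linalg.solve(A, b)
print(torch.allclose(A @ x, b))

# A batch of matrices with a batch of 1-D right-hand sides.
A_batch = torch.randn(5, 3, 3, dtype=torch.float64)
b_batch = torch.randn(5, 3, dtype=torch.float64)
x_batch = torch.linalg.solve(A_batch, b_batch)   # shape (5, 3)
```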

Ref. https://github.com/pytorch/pytorch/issues/42666

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48456

Reviewed By: izdeby

Differential Revision: D25562222

Pulled By: mruberry

fbshipit-source-id: a9355c029e2442c2e448b6309511919631f9e43b
2020-12-21 10:11:12 -08:00
Ivan Yashchuk
f5ee619d2a Updated derivative rules for complex svd and pinverse (#47761)
Summary:
Updated `svd_backward` to work correctly for complex-valued inputs.
Updated `common_methods_invocations.py` to take dtype, device arguments for input construction.
Removed `test_pinverse` from `test_autograd.py`; it is replaced by entries in `common_methods_invocations.py`.
Added `svd` and `pinverse` to the list of complex tests.

References for complex-valued SVD differentiation:

- https://giggleliu.github.io/2019/04/02/einsumbp.html
- https://arxiv.org/abs/1909.02659

The derived rules assume gauge invariance of loss functions, so the result would not be correct for loss functions that are not gauge invariant.
https://re-ra.xyz/Gauge-Problem-in-Automatic-Differentiation/

The same rule is implemented in Tensorflow and [BackwardsLinalg.jl](https://github.com/GiggleLiu/BackwardsLinalg.jl).
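As a small sketch of what "gauge invariant" means in practice: a loss built only from the singular values does not see the U/V phase freedom, so its gradient check passes (this assumes gradcheck's complex support; the matrix below is arbitrary):

```python
import torch

A = torch.randn(4, 4, dtype=torch.complex128, requires_grad=True)

def loss(x):
    # Singular values are invariant under the phase (gauge) freedom of U and V,
    # so this loss is gauge invariant and its gradient is well defined.
    return torch.svd(x).S.sum()

torch.autograd.gradcheck(loss, A)
```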

Ref. https://github.com/pytorch/pytorch/issues/33152

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47761

Reviewed By: ngimel

Differential Revision: D25658897

Pulled By: mruberry

fbshipit-source-id: ba33ecbbea3f592238c01e62c7f193daf22a9d01
2020-12-20 14:39:31 -08:00
Mike Ruberry
f5b68e74d7 Revert D25574962: [pytorch][PR] Updated derivative rules for complex svd and pinverse
Test Plan: revert-hammer

Differential Revision:
D25574962 (9955355853)

Original commit changeset: 832b61303e88

fbshipit-source-id: d73f77f3e51b0f535dad6d21c5bebf8d41a6bfbd
2020-12-17 00:59:43 -08:00
Ryan Spring
65876d3f51 Change aten::native_layer_norm signature to match torch.layer_norm definition (#48971)
Summary:
This PR changes the `aten::native_layer_norm` and `aten::native_layer_norm_backward` signatures to match the `torch.layer_norm` definition. The current definition doesn't provide enough information for the PyTorch JIT to fuse layer_norm during training.

`native_layer_norm(X, gamma, beta, M, N, eps)` =>
`native_layer_norm(input, normalized_shape, weight, bias, eps)`

`native_layer_norm_backward(dY, X, mean, rstd, gamma, M, N, grad_input_mask)` =>
`native_layer_norm_backward(dY, input, normalized_shape, mean, rstd, weight, bias, grad_input_mask)`
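For reference, the public functional entry point already follows the new argument convention, so a call like the sketch below (sizes are arbitrary) maps directly onto the updated aten signature:

```python
import torch
import torch.nn.functional as F

x = torch.randn(8, 16, 32)
normalized_shape = (32,)
weight = torch.randn(32)
bias = torch.randn(32)

# (input, normalized_shape, weight, bias, eps) -- the same ordering that
# aten::native_layer_norm now uses.
out = F.layer_norm(x, normalized_shape, weight, bias, eps=1e-5)
print(out.shape)   # torch.Size([8, 16, 32])
```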

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48971

Reviewed By: izdeby

Differential Revision: D25574070

Pulled By: ngimel

fbshipit-source-id: 23e2804295a95bda3f1ca6b41a1e4c5a3d4d31b4
2020-12-16 23:09:18 -08:00
Ivan Yashchuk
9955355853 Updated derivative rules for complex svd and pinverse (#47761)
Summary:
Updated `svd_backward` to work correctly for complex-valued inputs.
Updated `common_methods_invocations.py` to take dtype, device arguments for input construction.
Removed `test_pinverse` from `test_autograd.py`; it is replaced by entries in `common_methods_invocations.py`.
Added `svd` and `pinverse` to the list of complex tests.

References for complex-valued SVD differentiation:

- https://giggleliu.github.io/2019/04/02/einsumbp.html
- https://arxiv.org/abs/1909.02659

The derived rules assume gauge invariance of loss functions, so the result would not be correct for loss functions that are not gauge invariant.
https://re-ra.xyz/Gauge-Problem-in-Automatic-Differentiation/

The same rule is implemented in Tensorflow and [BackwardsLinalg.jl](https://github.com/GiggleLiu/BackwardsLinalg.jl).

Ref. https://github.com/pytorch/pytorch/issues/33152

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47761

Reviewed By: izdeby

Differential Revision: D25574962

Pulled By: mruberry

fbshipit-source-id: 832b61303e883ad3a451b84850ccf0f36763a6f6
2020-12-16 12:32:22 -08:00
Richard Zou
f98d8c6237 Move inplace_is_vmap_compatible to BatchedTensorImpl.h (#49118)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49118

I need this in the next stack up. It seems useful to have as a helper
function.

Test Plan: - run tests

Reviewed By: izdeby

Differential Revision: D25563546

Pulled By: zou3519

fbshipit-source-id: a4031fdc4b2373cc230ba3c66738d91dcade96e2
2020-12-16 11:30:13 -08:00
Peter Bell
94a3d4b083 Remove unused operator at::_fft_with_size (#48905)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48905

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D25480385

Pulled By: mruberry

fbshipit-source-id: 192d04a1b7e33b4e408cda8a82679c3ae3490a7d
2020-12-13 20:28:41 -08:00
Ivan Yashchuk
6c1b405a3b Updated derivative rules for complex QR decomposition (#48489)
Summary:
Updated `qr_backward` to work correctly for complex-valued inputs.
Added `torch.qr` to the list of complex tests.

The previous implementation for real-valued differentiation used equation 42 from https://arxiv.org/abs/1001.1654.
The current implementation is a bit simpler, but the result for the real-valued input case is the same and all tests still pass.
Derivation of the complex-valued QR differentiation: https://giggleliu.github.io/2019/04/02/einsumbp.html
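A minimal numerical check of the updated rule (assuming gradcheck's complex support; the input is an arbitrary tall matrix):

```python
import torch

A = torch.randn(5, 3, dtype=torch.complex128, requires_grad=True)

# gradcheck compares the analytical qr_backward against finite differences
# for both outputs (Q, R) of the reduced QR decomposition.
torch.autograd.gradcheck(torch.qr, A)
```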

Ref. https://github.com/pytorch/pytorch/issues/33152

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48489

Reviewed By: bdhirsh

Differential Revision: D25272344

Pulled By: albanD

fbshipit-source-id: b53c1fca1683f4aee5f4d5ce3cab9e559170e7cf
2020-12-11 14:14:40 -08:00
Kurt Mohler
54f0556ee4 Add missing complex support for torch.norm and torch.linalg.norm (#48284)
Summary:
**BC-breaking note:**

Previously, when given a complex input, `torch.linalg.norm` and `torch.norm` would return a complex output. `torch.linalg.cond` would sometimes return a complex output and sometimes return a real output when given a complex input, depending on its `p` argument. This PR changes this behavior to match `numpy.linalg.norm` and `numpy.linalg.cond`, so that a complex input will result in the downgraded real number type, consistent with NumPy.
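A short sketch of the dtype behavior described above (values are arbitrary):

```python
import torch

z = torch.randn(3, 3, dtype=torch.complex64)

# With this change, complex inputs produce the downgraded real dtype,
# matching numpy.linalg.norm.
print(torch.linalg.norm(z).dtype)                # torch.float32 (Frobenius norm)
print(torch.linalg.norm(z, ord=2, dim=0).dtype)  # torch.float32 (vector 2-norm along dim 0)
```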

**PR Summary:**

The following cases were previously unsupported for complex inputs, and this commit adds support:

- Frobenius norm
- Norm order 2 (vector and matrix)
- CUDA vector norm

Part of https://github.com/pytorch/pytorch/issues/47833

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48284

Reviewed By: H-Huang

Differential Revision: D25420880

Pulled By: mruberry

fbshipit-source-id: 11f6a2f3cad57d66476d30921c3f6ab8f3cd4017
2020-12-10 10:23:45 -08:00
Peter Bell
fc0a3a1787 Improve torch.fft n-dimensional transforms (#46911)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46911

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D25420647

Pulled By: mruberry

fbshipit-source-id: bf7e6a2ec41f9f95ffb05c128ee0f3297e34aae2
2020-12-09 12:40:06 -08:00
Ivan Yashchuk
85121a7a0f Added CUDA support for complex input for torch.cholesky_solve (#47047)
Summary:
`torch.cholesky_solve` now works for complex inputs on GPU.
I moved the existing tests to `test_linalg.py` and modified them to test complex and float32 dtypes.
Differentiation also works correctly with complex inputs now.
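A minimal sketch of the complex path (it falls back to CPU when CUDA is unavailable; the Hermitian positive-definite matrix below is an arbitrary example):

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

a = torch.randn(3, 3, dtype=torch.complex128, device=device)
A = a @ a.conj().transpose(-2, -1) + torch.eye(3, dtype=torch.complex128, device=device)
b = torch.randn(3, 2, dtype=torch.complex128, device=device)

L = torch.cholesky(A)              # lower-triangular factor (upper=False is the default)
x = torch.cholesky_solve(b, L)     # solves A x = b given the factor
print(torch.allclose(A @ x, b))
```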

Ref. https://github.com/pytorch/pytorch/issues/33152

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47047

Reviewed By: ngimel

Differential Revision: D24730020

Pulled By: mruberry

fbshipit-source-id: 95402da5789c56e5a682019790985207fa28fa1f
2020-12-05 20:18:30 -08:00
Ivan Yashchuk
4ed7f36ed1 Added linalg.eigh, linalg.eigvalsh (#45526)
Summary:
This PR adds `torch.linalg.eigh`, and `torch.linalg.eigvalsh` for NumPy compatibility.
The current `torch.symeig` uses (on CPU) a different LAPACK routine than NumPy (`syev` vs `syevd`). Even though it shouldn't matter in practice, `torch.linalg.eigh` uses `syevd` (as NumPy does).
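For example, a quick usage sketch (the Hermitian input here is arbitrary):

```python
import torch

a = torch.randn(4, 4, dtype=torch.complex128)
A = a + a.conj().transpose(-2, -1)          # make the input Hermitian

w, v = torch.linalg.eigh(A)                 # eigenvalues are returned as real
print(w.dtype)                              # torch.float64

# Reconstruct A = V diag(w) V^H; (v * w) scales each eigenvector by its eigenvalue.
print(torch.allclose((v * w) @ v.conj().transpose(-2, -1), A))

w_only = torch.linalg.eigvalsh(A)           # eigenvalues only
```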

Ref https://github.com/pytorch/pytorch/issues/42666

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45526

Reviewed By: gchanan

Differential Revision: D25022659

Pulled By: mruberry

fbshipit-source-id: 3676b77a121c4b5abdb712ad06702ac4944e900a
2020-11-22 04:57:28 -08:00
Nikita Shulga
6d0947c8cf Revert D25093315: [pytorch][PR] Fix inf norm grad
Test Plan: revert-hammer

Differential Revision:
D25093315 (ca880d77b8)

Original commit changeset: be1a7af32fe8

fbshipit-source-id: b383ec2a2c5884149b4fc7896f9d2856259794cd
2020-11-20 18:27:52 -08:00
Jeffrey Wan
ca880d77b8 Fix inf norm grad (#48122)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/41779

Also fixes an issue with the inf norm returning small non-zero values due to the usage of `numeric_limits::min`, which actually "returns the minimum positive normalized value" when applied to floating-point types. See https://en.cppreference.com/w/cpp/types/numeric_limits/min.

```
>>> import torch
>>> with torch.enable_grad():
...     a = torch.tensor([
...         [9., 2., 9.],
...         [-2., -3., -4.],
...         [7., 8., -9.],
...     ], requires_grad=True)
...     b = torch.norm(a, p=float('inf'))
...     b.backward()
...     print(a.grad)
...
tensor([[ 0.3333,  0.0000,  0.3333],
        [-0.0000, -0.0000, -0.0000],
        [ 0.0000,  0.0000, -0.3333]])
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48122

Reviewed By: izdeby

Differential Revision: D25093315

Pulled By: soulitzer

fbshipit-source-id: be1a7af32fe8bac0df877971fd75089d33e4bd43
2020-11-20 10:22:11 -08:00
Richard Zou
370310bedb batched grad for binary_cross_entropy, symeig (#48057)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48057

This PR fixes batched grad computation for:
- binary_cross_entropy (i.e., vmap through binary_cross_entropy_double_backward)
- symeig (i.e. vmap through symeig_backward)

It was previously impossible to vmap through those functions because
they use in-place operations in a vmap-incompatible way.

See note at
233192be73/aten/src/ATen/BatchedFallback.cpp (L117-L122)
for what it means for an in-place operation to be vmap-incompatible.

This PR adds a check: if the in-place operations in e.g. symeig are
vmap-incompatible and we are inside of a vmap, then we do the
out-of-place variant of the operation. Ditto for binary_cross_entropy.

This is to avoid code duplication: the alternative would be to register
the backward formula as an operator and change just those lines to be
out-of-place!

This PR also adds some general guidelines for what to do if an in-place
operation is vmap-incompatible.

General guidelines
------------------

If an in-place operation used in a backward formula is vmap-incompatible,
then as developers we have the following options:

- If the in-place operation directly followed the creation of a tensor with
  a factory function like at::zeros(...), we should replace the factory with a
  corresponding grad.new_zeros(...) call. The grad.new_zeros(...) call
  propagates the batch dims to the resulting tensor.
  For example:
    Before: at::zeros(input.sizes(), grad.options()).copy_(grad)
    After:  grad.new_zeros(input.sizes()).copy_(grad)

- If the in-place operation followed some sequence of operations, and
  we want to be able to vmap over the backward formula as-is (this is
  usually the case for simple (<15 LOC) backward formulas), then use
  inplace_is_vmap_compatible to guard the operation. For example:
            c = a * b
    Before: c.mul_(grad)
    After:  c = inplace_is_vmap_compatible(c, grad) ? c.mul_(grad) : c * grad

- If we don't want to vmap directly over the backward formula (e.g., if the
  backward formula is too complicated or has a lot of vmap-incompatible
  operations), then register the backward formula as an operator and eventually
  write a batching rule for it.

Test Plan
---------
New tests

Test Plan: Imported from OSS

Reviewed By: zhangguanheng66

Differential Revision: D25069525

Pulled By: zou3519

fbshipit-source-id: e0dfeb5a812f35b7579fc6ecf7252bf31ce0d790
2020-11-19 07:59:02 -08:00
Mike Ruberry
013e6a3d9d Revert D24698027: Fix auto exponent issue for torch.pow
Test Plan: revert-hammer

Differential Revision:
D24698027 (8ef7ccd669)

Original commit changeset: f23fdb65c925

fbshipit-source-id: 9a67a2c6310c9e4fdefbb421a8cd4fa41595bc9a
2020-11-15 03:58:44 -08:00
anjali411
8ef7ccd669 Fix auto exponent issue for torch.pow (#47024)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47024

Fixes https://github.com/pytorch/pytorch/issues/46936

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#47024 Fix auto exponent issue for torch.pow**

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D24698027

Pulled By: anjali411

fbshipit-source-id: f23fdb65c925166243593036e08214c4f041a63d
2020-11-14 22:50:12 -08:00
Ivan Yashchuk
149190c014 Added CUDA support for complex input for torch.solve (#47045)
Summary:
`torch.solve` now works for complex inputs on GPU.
I moved the existing tests to `test_linalg.py` and modified them to test complex and float32 dtypes.
Differentiation also works correctly with complex inputs.
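A minimal sketch of the complex path (it falls back to CPU when CUDA is unavailable; note the (B, A) argument order of this older API):

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

A = torch.randn(3, 3, dtype=torch.complex128, device=device)
B = torch.randn(3, 2, dtype=torch.complex128, device=device)

X, LU = torch.solve(B, A)          # solves A X = B; also returns the LU factorization
print(torch.allclose(A @ X, B))
```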

Fixes https://github.com/pytorch/pytorch/issues/41084
Ref. https://github.com/pytorch/pytorch/issues/33152

anjali411 I hope you don't mind that I took over https://github.com/pytorch/pytorch/pull/42737

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47045

Reviewed By: nikithamalgifb

Differential Revision: D24921503

Pulled By: anjali411

fbshipit-source-id: 4c3fc4f193a84b6e28c43c08672d480715000923
2020-11-12 12:22:59 -08:00
Richard Zou
57dcb04239 Batched gradient support for view+inplace operations (#47227)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47227

Motivation
----------
We would like to compute batched gradients for view+inplace operations.
This most notably shows up in the internal implementation of operations.
For example, many view backward functions (SelectBackward, DiagonalBackward)
are implemented with view+inplace, so to support vectorized hessian
computation for e.g. torch.select and torch.diagonal we would need a
way to handle or work around view+inplace.

Approach
--------
view+inplace creates a CopySlices node and transmutes the view backward node
into an AsStridedBackward node. For example,

```
leaf = torch.randn(4, 5, requires_grad=True)
base = leaf * leaf
view = base[0]
view.cos_()
```

base.grad_fn is CopySlices and view.grad_fn is AsStridedBackward.

To support vmap over CopySlices and AsStridedBackward:
- We use `new_empty_strided` instead of `empty_strided` in CopySlices
so that the batch dims get propagated
- We use `new_zeros` inside AsStridedBackward so that the batch dims get
propagated.

Test Plan
---------
- New tests. When we get closer to having most operations support batched
grad computation via vmap, I'd like to add it as an option to gradcheck
and turn it on for our tests.

Test Plan: Imported from OSS

Reviewed By: kwanmacher, glaringlee

Differential Revision: D24741687

Pulled By: zou3519

fbshipit-source-id: 8210064f782a0a7a193752029a4340e505ffb5d8
2020-11-10 07:38:02 -08:00