pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
Richard Zou	41e1ab0785	Introduce isTensorSubclassLike; add special cases to backwards formulas (#69534 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69534 Something is TensorSubclassLike if it is a Tensor subclass or if it has the same problems as Tensor subclasses. Today that just includes Tensor Subclasses and meta tensors but may include other things in the future. Some of our backwards formulas are incompatible with TensorSubclassLike objects. For example, calling .data_ptr() is a problem because many TensorSubclassLike objects don't have storage. Another problem is in-place operations: performing `regular_tensor.inplace_(tensor_subclass)` is a problem. This PR adds special cases to the backward formulas for torch.max and torch.clamp to handle this. The backward formulas for torch.max and torch.clamp are not dispatcher operations so they cannot be overridden and we hesitate to make them dispatcher operations for FC/BC concerns and performance overhead concerns. Furthermore, the old concept of "is this inplace operation vmap compatible?" can be subsumed by the general "is this inplace operation tensor-subclass compatible" question, so I replaced all instances of isInplaceVmapCompatible with isTensorSubclassLike checks. Test Plan - I tested the changes using functorch. - It's possible to write a test for these in core (one has to make a custom tensor subclass and then send it through the operation and then invoke autograd), but I wanted to push the work to doing some generic testing for backward formulas (https://github.com/pytorch/pytorch/issues/69530) instead of doing some one-off things now. Test Plan: Imported from OSS Reviewed By: mrshenli Differential Revision: D32967727 Pulled By: zou3519 fbshipit-source-id: 30fda1a7581da4c55179b7a3ca05069150bbe2dc	2021-12-09 15:03:22 -08:00
lezcano	cafcf599d0	Deprecate torch.triangular_solve (#63570 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63570 There is a use of `at::triangular_solve_out` in the file `torch/csrc/jit/tensorexpr/external_functions.cpp` that I have not dared to move to `at::linalg_solve_triangular_out`. Deprecation note: This PR deprecates the `torch.triangular_solve` function in favor of `torch.linalg.solve_triangular`. An upgrade guide is added to the documentation for `torch.triangular_solve`. Note that it DOES NOT remove `torch.triangular_solve`, but `torch.triangular_solve` will be removed in a future PyTorch release. cc jianyuh nikitaved pearu mruberry walterddr IvanYashchuk xwang233 Lezcano Test Plan: Imported from OSS Reviewed By: mruberry Differential Revision: D32618035 Pulled By: anjali411 fbshipit-source-id: 0bfb48eeb6d96eff3e96e8a14818268cceb93c83	2021-12-02 13:24:55 -08:00
lezcano	f9e69af22e	Modify LU_backward and lu_solve_backward to use linalg_solve_triangular (#63569 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63569 This PR also rewrites `lu_solve_backward` from scratch going from solving 5 systems of equations to just 2. cc jianyuh nikitaved pearu mruberry walterddr IvanYashchuk xwang233 Lezcano Test Plan: Imported from OSS Reviewed By: mruberry Differential Revision: D32618014 Pulled By: anjali411 fbshipit-source-id: 0e915bcf7045a4db43ffd076d807beac816c8538	2021-12-01 07:34:38 -08:00
Mike Ruberry	6ae34ea6f8	Revert D32521980: Add linalg.lu_factor Test Plan: revert-hammer Differential Revision: D32521980 (`b10929a14a`) Original commit changeset: 26a49ebd87f8 fbshipit-source-id: e1a6bb9c2ece9bd78190fe17e16a46e3358c5c82	2021-11-28 17:22:15 -08:00
lezcano	b10929a14a	Add linalg.lu_factor (#66933 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66933 This PR exposes `torch.lu` as `torch.linalg.lu_factor` and `torch.linalg.lu_factor_ex`. This PR also adds support for matrices with zero elements both in the size of the matrix and the batch. Note that this function simply returns empty tensors of the correct size in this case. We add a test and an OpInfo for the new function. This PR also adds documentation for this new function in line of the documentation in the rest of `torch.linalg`. Fixes https://github.com/pytorch/pytorch/issues/56590 Fixes https://github.com/pytorch/pytorch/issues/64014 cc jianyuh nikitaved pearu mruberry walterddr IvanYashchuk xwang233 Lezcano Test Plan: Imported from OSS Reviewed By: albanD Differential Revision: D32521980 Pulled By: mruberry fbshipit-source-id: 26a49ebd87f8a41472f8cd4e9de4ddfb7f5581fb	2021-11-27 17:52:48 -08:00
lezcano	b46c89d950	Add linalg.solve_triangular (#63568 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63568 This PR adds the first solver with structure to `linalg`. This solver has an API compatible with that of `linalg.solve` preparing these for a possible future merge of the APIs. The new API: - Just returns the solution, rather than the solution and a copy of `A` - Removes the confusing `transpose` argument and replaces it by a correct handling of conj and strides within the call - Adds a `left=True` kwarg. This can be achieved via transposes of the inputs and the result, but it's exposed for convenience. This PR also implements a dataflow that minimises the number of copies needed before calling LAPACK / MAGMA / cuBLAS and takes advantage of the conjugate and neg bits. This algorithm is implemented for `solve_triangular` (which, for this, is the most complex of all the solvers due to the `upper` parameters). Once more solvers are added, we will factor out this calling algorithm, so that all of them can take advantage of it. Given the complexity of this algorithm, we implement some thorough testing. We also added tests for all the backends, which was not done before. We also add forward AD support for `linalg.solve_triangular` and improve the docs of `linalg.solve_triangular`. We also fix a few issues with those of `torch.triangular_solve`. Resolves https://github.com/pytorch/pytorch/issues/54258 Resolves https://github.com/pytorch/pytorch/issues/56327 Resolves https://github.com/pytorch/pytorch/issues/45734 cc jianyuh nikitaved pearu mruberry walterddr IvanYashchuk xwang233 Lezcano Test Plan: Imported from OSS Reviewed By: jbschlosser Differential Revision: D32588230 Pulled By: mruberry fbshipit-source-id: 69e484849deb9ad7bb992cc97905df29c8915910	2021-11-22 12:41:06 -08:00
soulitzer	7bb401a4c9	Add forward AD support for miscellaneous operators (#67820 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67820 Original PR here: https://github.com/pytorch/pytorch/pull/67040 Test Plan: Imported from OSS Reviewed By: gchanan Differential Revision: D32314423 Pulled By: soulitzer fbshipit-source-id: ecd898dc903692cab084f6922a1d86986f957b1b	2021-11-19 14:31:06 -08:00
jiej	ca92111758	Add native_dropout (#63937 ) Summary: Adds native_dropout to have a reasonable target for torchscript in auto diff. native_dropout has scale and train as arguments in its signature; this makes native_dropout more consistent with other operators and removes conditionals in the autodiff definition. cc gmagogsfm Pull Request resolved: https://github.com/pytorch/pytorch/pull/63937 Reviewed By: mruberry Differential Revision: D32477657 Pulled By: ngimel fbshipit-source-id: d37b137a37acafa50990f60c77f5cea2818454e4	2021-11-18 19:41:10 -08:00
Jane Xu	9f4e004abd	Revert D32283178: Add linalg.solve_triangular Test Plan: revert-hammer Differential Revision: D32283178 (`0706607abc`) Original commit changeset: deb672e6e52f fbshipit-source-id: d2a3421292147426cc61c2f063b721acf9004755	2021-11-18 14:46:10 -08:00
lezcano	0706607abc	Add linalg.solve_triangular (#63568 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63568 This PR adds the first solver with structure to `linalg`. This solver has an API compatible with that of `linalg.solve` preparing these for a possible future merge of the APIs. The new API: - Just returns the solution, rather than the solution and a copy of `A` - Removes the confusing `transpose` argument and replaces it by a correct handling of conj and strides within the call - Adds a `left=True` kwarg. This can be achieved via transposes of the inputs and the result, but it's exposed for convenience. This PR also implements a dataflow that minimises the number of copies needed before calling LAPACK / MAGMA / cuBLAS and takes advantage of the conjugate and neg bits. This algorithm is implemented for `solve_triangular` (which, for this, is the most complex of all the solvers due to the `upper` parameters). Once more solvers are added, we will factor out this calling algorithm, so that all of them can take advantage of it. Given the complexity of this algorithm, we implement some thorough testing. We also added tests for all the backends, which was not done before. We also add forward AD support for `linalg.solve_triangular` and improve the docs of `linalg.solve_triangular`. We also fix a few issues with those of `torch.triangular_solve`. Resolves https://github.com/pytorch/pytorch/issues/54258 Resolves https://github.com/pytorch/pytorch/issues/56327 Resolves https://github.com/pytorch/pytorch/issues/45734 cc jianyuh nikitaved pearu mruberry walterddr IvanYashchuk xwang233 Lezcano Test Plan: Imported from OSS Reviewed By: zou3519, JacobSzwejbka Differential Revision: D32283178 Pulled By: mruberry fbshipit-source-id: deb672e6e52f58b76536ab4158073927a35e43a8	2021-11-18 09:45:51 -08:00
Nikita Vedeneev	857fed1f42	torch.linalg.qr: forward AD support (#67268 ) Summary: As per title. Pull Request resolved: https://github.com/pytorch/pytorch/pull/67268 Reviewed By: ngimel Differential Revision: D31960517 Pulled By: albanD fbshipit-source-id: bfd1028a8d352f550efb420f9ca609c09f4a7484	2021-11-18 08:11:54 -08:00
Matthias Reis	4c346bd073	Added forward derivatives for neg, diag, inverse, linalg_eig (#67837 ) Summary: Recreated due to CI failures as per comment https://github.com/pytorch/pytorch/pull/67339#issuecomment-959893293 === See also discussion in https://github.com/pytorch/pytorch/issues/10223, starting from [this](https://github.com/pytorch/pytorch/issues/10223#issuecomment-949499666) comment The formulas for the derivatives are taken from https://people.maths.ox.ac.uk/gilesm/files/NA-08-01.pdf. As indicated, the method linalg_eig_jvp should be used instead of linalg_eig_jvp_eigenvalues and linalg_eig_jvp_eigenvectors in the future. Due to a codegen limitation, this is not yet possible. CC albanD Lezcano Pull Request resolved: https://github.com/pytorch/pytorch/pull/67837 Reviewed By: mrshenli Differential Revision: D32403662 Pulled By: soulitzer fbshipit-source-id: 529cb93f865ce4cc2e24fa6f672d4234e7abe2b1	2021-11-16 20:32:47 -08:00
Masaki Kozuki	c5e5264be2	Disable TF32 in `pinv_jvp` and `pinv_backward` (#67948 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/67947 cc ptrblck xwang233 zasdfgbnm Pull Request resolved: https://github.com/pytorch/pytorch/pull/67948 Reviewed By: H-Huang Differential Revision: D32251934 Pulled By: ngimel fbshipit-source-id: a2b1a118337b38db61350c9e49f1ba19030d70ec	2021-11-08 22:33:29 -08:00
Natalia Gimelshein	98be5216e2	Revert D32104006: [pytorch][PR] Added forward derivatives for neg, diag, inverse, linalg_eig Test Plan: revert-hammer Differential Revision: D32104006 (`88c61b8d06`) Original commit changeset: 1f6ace09ee3e fbshipit-source-id: f9f950b4177e1fe29b9059f4b5dfb9c8c67f479a	2021-11-03 12:40:00 -07:00
Matthias Reis	88c61b8d06	Added forward derivatives for neg, diag, inverse, linalg_eig (#67339 ) Summary: See also discussion in https://github.com/pytorch/pytorch/issues/10223, starting from [this](https://github.com/pytorch/pytorch/issues/10223#issuecomment-949499666) comment The formulas for the derivatives are taken from https://people.maths.ox.ac.uk/gilesm/files/NA-08-01.pdf. As indicated, the method linalg_eig_jvp should be used instead of linalg_eig_jvp_eigenvalues and linalg_eig_jvp_eigenvectors in the future. Due to a codegen limitation, this is not yet possible. CC albanD Lezcano Pull Request resolved: https://github.com/pytorch/pytorch/pull/67339 Reviewed By: ejguan Differential Revision: D32104006 Pulled By: albanD fbshipit-source-id: 1f6ace09ee3e737b99520543b30550601809ceb5	2021-11-03 11:21:54 -07:00
Nikita Vedeneev	3c61700cf7	`torch.linalg.householder_product`: forward AD support (#67043 ) Summary: As per title. cc ezyang albanD zou3519 gqchen pearu nikitaved soulitzer Lezcano Varal7 jianyuh mruberry walterddr IvanYashchuk xwang233 Pull Request resolved: https://github.com/pytorch/pytorch/pull/67043 Reviewed By: VitalyFedyunin Differential Revision: D31897617 Pulled By: albanD fbshipit-source-id: ef135fe3d9e5b9b2a541c355017f07cdb1309979	2021-10-26 08:34:00 -07:00
lezcano	d3fc3c4ded	Implement forward AD for linalg.matrix_exp (#62716 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62716 cc jianyuh nikitaved pearu mruberry walterddr IvanYashchuk xwang233 Lezcano Test Plan: Imported from OSS Reviewed By: anjali411 Differential Revision: D31823231 Pulled By: mruberry fbshipit-source-id: 6d19b8988dce773b5716f0522d06febfe167fead	2021-10-21 23:55:36 -07:00
lezcano	0974215c4d	Prefer mT and mH over transpose(-2, -1) and transpose(-2, -1).conj() (#64181 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64181 This PR replaces all the calls to: - `transpose(-2, -1)` or `transpose(-1, -2)` by `mT()` in C++ and `mT` in Python - `conj().transpose(-2, -1)` or `transpose(-2, -1).conj()` or `conj().transpose(-1, -2)` or `transpose(-1, -2).conj()` by `mH()` in C++ and `mH` in Python. It also simplifies two pieces of code, and fixes one bug where a pair of parentheses were missing in the function `make_symmetric_matrices`. Test Plan: Imported from OSS Reviewed By: H-Huang Differential Revision: D31692896 Pulled By: anjali411 fbshipit-source-id: e9112c42343663d442dc5bd53ff2b492094b434a	2021-10-18 13:02:25 -07:00
Nikita Vedeneev	7fad47e522	`torch.linalg.lstsq`: forward/backward AD support (#65054 ) Summary: As per title. cc ezyang albanD zou3519 gqchen pearu nikitaved soulitzer Lezcano Varal7 jianyuh mruberry walterddr IvanYashchuk xwang233 Pull Request resolved: https://github.com/pytorch/pytorch/pull/65054 Reviewed By: zou3519 Differential Revision: D31729468 Pulled By: albanD fbshipit-source-id: ab7df824bc80128e7f64f6444c7a4baa4786c161	2021-10-18 11:28:44 -07:00
Nikita Vedeneev	06c37876b8	`torch.linalg.householder_product` faster backward (#63880 ) Summary: This PR implements a much more efficient algorithm. This algorithm achieves MASSIVE speed-ups, especially for batched and/or larger double-precision inputs. Here are some benchmarks: <details> <summary>Testing script</summary> ```python from IPython import get_ipython import torch import itertools torch.manual_seed(13) #torch.set_num_threads(1) ipython = get_ipython() cpu = torch.device('cpu') cuda = torch.device('cuda') def generate_input(shape, dtype=torch.double, device=cpu): eigvals = torch.rand(shape[:-1], dtype=dtype, device=device) eigvecs = torch.rand(shape, dtype=dtype, device=device) input = (eigvecs * eigvals.unsqueeze(-2)) @ eigvecs.inverse() input.requires_grad_(True) tau = torch.rand(*shape[:-1], dtype=dtype, device=device) tau.requires_grad_(True) return input, tau def run_test(shape, device, dtype): print(f"shape: {shape}, device: {device}, dtype: {dtype}") a, tau = generate_input(shape, dtype=dtype, device=device) prod = torch.linalg.householder_product(a, tau) ones_prod = torch.ones_like(prod) command = "torch.autograd.backward((prod,), (ones_prod), retain_graph=True)" if device == cuda: command = command + "; torch.cuda.synchronize()" ipython.magic(f"timeit {command}") print() dtypes = [torch.float, torch.double] devices = [cpu, cuda] #devices = [cuda] sizes = [ (10, 10), (1000, 10, 10), (100, 100), (1000, 100, 100), (1000, 1000), (10, 1000, 1000), ] for device, dtype, size in itertools.product(devices, dtypes, sizes): run_test(size, device, dtype) ``` </details> <details> <summary>This PR, cuda float32</summary> ``` shape: (10, 10), device: cuda, dtype: torch.float32 1.33 ms ± 1.82 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) shape: (1000, 10, 10), device: cuda, dtype: torch.float32 1.52 ms ± 40.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) shape: (100, 100), device: cuda, dtype: torch.float32 10.8 ms ± 9.62 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) shape: (1000, 100, 100), device: cuda, dtype: torch.float32 127 ms ± 8.45 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) shape: (1000, 1000), device: cuda, dtype: torch.float32 151 ms ± 127 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) shape: (10, 1000, 1000), device: cuda, dtype: torch.float32 981 ms ± 91.4 µs per loop (mean ± std. dev. of 7 runs, 1 loop each) ``` </details> <details> <summary>Master, cuda float32</summary> ``` shape: (10, 10), device: cuda, dtype: torch.float32 1.64 ms ± 6.36 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) shape: (1000, 10, 10), device: cuda, dtype: torch.float32 298 ms ± 463 µs per loop (mean ± std. dev. of 7 runs, 1 loop each) shape: (100, 100), device: cuda, dtype: torch.float32 15.4 ms ± 41.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) shape: (1000, 100, 100), device: cuda, dtype: torch.float32 5.36 s ± 711 µs per loop (mean ± std. dev. of 7 runs, 1 loop each) shape: (1000, 1000), device: cuda, dtype: torch.float32 1.64 s ± 1.07 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) shape: (10, 1000, 1000), device: cuda, dtype: torch.float32 15.7 s ± 43.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) ``` </details> <details> <summary>This PR, cuda float64</summary> ``` shape: (10, 10), device: cuda, dtype: torch.float64 1.14 ms ± 1.43 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) shape: (1000, 10, 10), device: cuda, dtype: torch.float64 2.22 ms ± 1.32 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) shape: (100, 100), device: cuda, dtype: torch.float64 10.6 ms ± 11.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) shape: (1000, 100, 100), device: cuda, dtype: torch.float64 287 ms ± 84.9 µs per loop (mean ± std. dev. of 7 runs, 1 loop each) shape: (1000, 1000), device: cuda, dtype: torch.float64 236 ms ± 41.9 µs per loop (mean ± std. dev. of 7 runs, 1 loop each) shape: (10, 1000, 1000), device: cuda, dtype: torch.float64 1.88 s ± 88.3 µs per loop (mean ± std. dev. of 7 runs, 1 loop each) ``` </details> <details> <summary>Master, cuda float64</summary> ``` shape: (10, 10), device: cuda, dtype: torch.float64 1.58 ms ± 8.21 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) shape: (1000, 10, 10), device: cuda, dtype: torch.float64 308 ms ± 213 µs per loop (mean ± std. dev. of 7 runs, 1 loop each) shape: (100, 100), device: cuda, dtype: torch.float64 79 ms ± 14.5 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) shape: (1000, 100, 100), device: cuda, dtype: torch.float64 54.2 s ± 1.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) shape: (1000, 1000), device: cuda, dtype: torch.float64 31.5 s ± 698 µs per loop (mean ± std. dev. of 7 runs, 1 loop each) shape: (10, 1000, 1000), device: cuda, dtype: torch.float64 4min 45s ± 2.48 s per loop (mean ± std. dev. of 7 runs, 1 loop each) ``` </details> <details> <summary>This PR, cpu float32</summary> ``` shape: (10, 10), device: cpu, dtype: torch.float32 476 µs ± 21.4 µs per loop (mean ± std. dev. of 7 runs, 1 loop each) shape: (1000, 10, 10), device: cpu, dtype: torch.float32 5.1 ms ± 100 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) shape: (100, 100), device: cpu, dtype: torch.float32 4.38 ms ± 4.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) shape: (1000, 100, 100), device: cpu, dtype: torch.float32 1.55 s ± 6.64 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) shape: (1000, 1000), device: cpu, dtype: torch.float32 745 ms ± 407 µs per loop (mean ± std. dev. of 7 runs, 1 loop each) shape: (10, 1000, 1000), device: cpu, dtype: torch.float32 5.44 s ± 15.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) ``` </details> <details> <summary>Master, cpu float32</summary> ``` shape: (10, 10), device: cpu, dtype: torch.float32 387 µs ± 645 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each) shape: (1000, 10, 10), device: cpu, dtype: torch.float32 12.3 ms ± 23.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) shape: (100, 100), device: cpu, dtype: torch.float32 39.4 ms ± 80.3 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) shape: (1000, 100, 100), device: cpu, dtype: torch.float32 29.1 s ± 44.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) shape: (1000, 1000), device: cpu, dtype: torch.float32 9.42 s ± 14.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) shape: (10, 1000, 1000), device: cpu, dtype: torch.float32 1min 50s ± 282 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) ``` </details> <details> <summary>This PR, cpu float64</summary> ``` shape: (10, 10), device: cpu, dtype: torch.float64 381 µs ± 761 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each) shape: (1000, 10, 10), device: cpu, dtype: torch.float64 6.19 ms ± 13.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) shape: (100, 100), device: cpu, dtype: torch.float64 4.6 ms ± 3.26 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) shape: (1000, 100, 100), device: cpu, dtype: torch.float64 2.59 s ± 8.25 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) shape: (1000, 1000), device: cpu, dtype: torch.float64 1.07 s ± 5.09 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) shape: (10, 1000, 1000), device: cpu, dtype: torch.float64 14.4 s ± 13.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) ``` </details> <details> <summary>Master, cpu float64</summary> ``` shape: (10, 10), device: cpu, dtype: torch.float64 395 µs ± 1.04 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) shape: (1000, 10, 10), device: cpu, dtype: torch.float64 14.6 ms ± 9.76 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) shape: (100, 100), device: cpu, dtype: torch.float64 45.5 ms ± 154 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) shape: (1000, 100, 100), device: cpu, dtype: torch.float64 33.1 s ± 69.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) shape: (1000, 1000), device: cpu, dtype: torch.float64 19.3 s ± 80.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) shape: (10, 1000, 1000), device: cpu, dtype: torch.float64 3min 30s ± 1.29 s per loop (mean ± std. dev. of 7 runs, 1 loop each) ``` </details> Pull Request resolved: https://github.com/pytorch/pytorch/pull/63880 Reviewed By: soulitzer Differential Revision: D30639435 Pulled By: anjali411 fbshipit-source-id: 127789943ae56e2f1dd03e0fe76ef7b6db86bcf0	2021-10-15 09:54:30 -07:00
Peter Bell	5f45927d15	Autograd: Delay warnings until the end of backward execution (#66235 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/50209 This adds a new warning handler that stores all warnings in a shared queue, which can be "replayed" at a later time and, crucially, on another thread. Then, I use this inside the autograd engine to ensure that warnings are processed by the handler registered on the main thread. For testing, I also add an operator that always warns in the backward pass and test that the warning is a normal Python warning. cc ezyang albanD zou3519 gqchen pearu nikitaved soulitzer Lezcano Varal7 Pull Request resolved: https://github.com/pytorch/pytorch/pull/66235 Reviewed By: ejguan Differential Revision: D31505413 Pulled By: albanD fbshipit-source-id: 1a7f60b038f55c20591c0748b9e86735b3fec2f9	2021-10-13 15:38:04 -07:00
Nikita Vedeneev	1b40daac74	pinv: forward/backward AD which is Frechet-defined in a rank-preserving neighborhood. (#66092 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/65911. Also enables complex support/tests for `linalg_pinv` in OpInfo. cc ezyang albanD zou3519 gqchen pearu nikitaved soulitzer Lezcano Varal7 jianyuh mruberry walterddr IvanYashchuk xwang233 Pull Request resolved: https://github.com/pytorch/pytorch/pull/66092 Reviewed By: ejguan Differential Revision: D31503072 Pulled By: albanD fbshipit-source-id: 52018e826826ae62beaad76becb5edf880be253f	2021-10-11 08:33:28 -07:00
Nikita Vedeneev	1d586e78c6	`_solve` methods: implements forward AD (#65546 ) Summary: This PR adds forward AD for `_solve` methods. Additionally, `cholesky_solve` gets OpInfo + a bug fix when wrong leading dimensions could be passed to LAPACK, and `lu_solve` gets forward AD with 2x`lu_solve` instead of 1x`lu_solve` + 2x`triangular_solve`. cc ezyang albanD zou3519 gqchen pearu nikitaved soulitzer Lezcano Varal7 jianyuh mruberry walterddr IvanYashchuk xwang233 Pull Request resolved: https://github.com/pytorch/pytorch/pull/65546 Reviewed By: dagitses Differential Revision: D31431847 Pulled By: albanD fbshipit-source-id: 0e343e0d9da3c3d2051fca215fad289d77275251	2021-10-06 16:04:22 -07:00
soulitzer	4cdfceddd2	[Reland] Avoid saving self for `softmax` and `log_softmax` (#66018 ) Summary: Reland of https://github.com/pytorch/pytorch/pull/65242 The last attempt of the reland automatically rebased onto stable, which did not yet have the revert commit Pull Request resolved: https://github.com/pytorch/pytorch/pull/66018 Reviewed By: albanD Differential Revision: D31348822 Pulled By: soulitzer fbshipit-source-id: 881d701b404530c1352ac9245bd67264e1652b8a	2021-10-03 21:35:01 -07:00
Michael Suo	9ae63bd87c	Revert D31238123: [pytorch][PR] Avoid saving self for `softmax` and `log_softmax` Test Plan: revert-hammer Differential Revision: D31238123 (`fb412bdd80`) Original commit changeset: afd319d3676d fbshipit-source-id: b7980d653a4b8322a225f1dd08c2857ecbe5bc94	2021-09-30 11:34:14 -07:00
soulitzer	fb412bdd80	Avoid saving self for `softmax` and `log_softmax` (#65242 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/64000 - updates double backward formula to compute grad wrt output instead of self - ~~In some of the error messages, we still refer to the dtype of the input, even though we are now checking the dtype of the output~~ Pull Request resolved: https://github.com/pytorch/pytorch/pull/65242 Reviewed By: albanD Differential Revision: D31238123 Pulled By: soulitzer fbshipit-source-id: afd319d3676d9ef8d81607e0e8c2a3e6d09f68e4	2021-09-29 18:16:12 -07:00
Mike Ruberry	0a0564a347	Revert D31206837: [pytorch][PR] `*_solve` methods: implements forward AD Test Plan: revert-hammer Differential Revision: D31206837 (`26e31f76b0`) Original commit changeset: 040beda97442 fbshipit-source-id: f28091327357af9f54f367eda6606240924b93ac	2021-09-28 23:31:16 -07:00
Nikita Vedeneev	26e31f76b0	`_solve` methods: implements forward AD (#65546 ) Summary: This PR adds forward AD for `_solve` methods. Additionally, `cholesky_solve` gets OpInfo + a bug fix when wrong leading dimensions could be passed to LAPACK, and `lu_solve` gets forward AD with 2x`lu_solve` instead of 1x`lu_solve` + 2x`triangular_solve`. cc ezyang albanD zou3519 gqchen pearu nikitaved soulitzer Lezcano Varal7 jianyuh mruberry walterddr IvanYashchuk xwang233 Pull Request resolved: https://github.com/pytorch/pytorch/pull/65546 Reviewed By: gchanan Differential Revision: D31206837 Pulled By: albanD fbshipit-source-id: 040beda97442e7a88a9df9abc7bb18313ce55bc3	2021-09-28 06:51:32 -07:00
Ivan Yashchuk	0aef44cb3d	Add forward AD for torch.linalg.eigh (#62163 ) Summary: This PR adds forward mode differentiation for `torch.linalg.eigh` and a few other functions required for tests to pass. For some reason running tests for `torch.linalg.eigvalsh` and complex `torch.linalg.eigh` hangs. These tests are skipped for now. cc ezyang albanD zou3519 gqchen pearu nikitaved soulitzer Lezcano Varal7 jianyuh mruberry heitorschueroff walterddr IvanYashchuk xwang233 Pull Request resolved: https://github.com/pytorch/pytorch/pull/62163 Reviewed By: jbschlosser Differential Revision: D30903988 Pulled By: albanD fbshipit-source-id: d6a74adb9e6d2f4be8ac707848ecabf06d629823	2021-09-13 21:15:38 -07:00
Nikita Vedeneev	88fff22023	`torch.lu`: forward AD support (#64742 ) Summary: As per title. Pull Request resolved: https://github.com/pytorch/pytorch/pull/64742 Reviewed By: H-Huang Differential Revision: D30841227 Pulled By: albanD fbshipit-source-id: dc4d043ab94358594adb110fbbbb60750c98262a	2021-09-10 07:19:11 -07:00
Nikita Vedeneev	dc53546655	`torch.lu_solve`: forward AD support (#64646 ) Summary: As per title. Pull Request resolved: https://github.com/pytorch/pytorch/pull/64646 Reviewed By: VitalyFedyunin Differential Revision: D30807898 Pulled By: albanD fbshipit-source-id: 1f943c22357dd1b3662cfe0d2a26af68e3a2df4c	2021-09-09 08:58:00 -07:00
Ivan Yashchuk	dd8f6ac597	Add forward mode differentiation for torch.linalg.cholesky and transpose (#62159 ) Summary: This PR adds forward mode differentiation for `torch.linalg.cholesky`, `torch.linalg.cholesky_ex`, and `transpose` functions. Complex tests for Cholesky fail because for some reason the gradcheck sends matrices full of zeros to `cholesky_jvp` function. cc ezyang albanD zou3519 gqchen pearu nikitaved soulitzer Lezcano Varal7 jianyuh mruberry heitorschueroff walterddr IvanYashchuk xwang233 Pull Request resolved: https://github.com/pytorch/pytorch/pull/62159 Reviewed By: mrshenli Differential Revision: D30776829 Pulled By: albanD fbshipit-source-id: 32e5539ed6423eed8c18cce16271330ab0ea8d5e	2021-09-08 09:44:30 -07:00
soulitzer	92a154aa29	Move variabletype functions around (#63330 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63330 - This is in preparation for templated/boxed autograd-not-implemented fallback - Make sure VariableTypeUtils does not depend on generated code - Lift `isFwGradDefined` into `autograd/functions/utils.cpp` so it's available to mobile builds - Removes `using namespace at` from VariableTypeUtils; previously we needed this for the Templated version, and although it's no longer strictly necessary, removing it is still a good change to avoid name conflicts if this header is included elsewhere in the future. Test Plan: Imported from OSS Reviewed By: heitorschueroff Differential Revision: D30518573 Pulled By: soulitzer fbshipit-source-id: a0fb904baafc9713de609fffec4b813f6cfcc000	2021-08-26 16:02:39 -07:00
Nikita Vedeneev	dbcfd7739f	Make `torch.lu` differentiable for wide/tall inputs + jit (#61564 ) Summary: As per title. Pull Request resolved: https://github.com/pytorch/pytorch/pull/61564 Reviewed By: astaff Differential Revision: D30338136 Pulled By: mruberry fbshipit-source-id: f01436fc90980544cdfa270feee16bb3dda21b93	2021-08-16 11:40:57 -07:00
Nikita Vedeneev	741accb11e	Implements backward for `torch.lu_solve` (#61681 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/22620 Pull Request resolved: https://github.com/pytorch/pytorch/pull/61681 Reviewed By: ngimel Differential Revision: D30063116 Pulled By: mruberry fbshipit-source-id: e095b0cadfb7c8b37a7ef91bae5b5dc170d8ef1c	2021-08-12 21:17:11 -07:00
jiej	ed0b8a3e83	LayerNorm Support in autodiff: (#50467 ) Summary: 1. extend autodiff by adding entry for layer_norm in symbolic script, we now use native_layer_norm_backward 2. added backward function `layernorm_double_backward` for `native_layer_norm_backward`, preserves double backward support for LayerNorm in autodiff/ScriptModule 3. added python test to verify autodiff on layer_norm with various configuration of optional tensors; (verify the fix in https://github.com/pytorch/pytorch/issues/49430) Pull Request resolved: https://github.com/pytorch/pytorch/pull/50467 Reviewed By: eellison Differential Revision: D30232864 Pulled By: jansel fbshipit-source-id: b9c33075386aff96afff7415df9f94388bfb474a Co-authored-by: Ryan Spring <rspring@nvidia.com> Co-authored-by: Jie <jiej@nvidia.com>	2021-08-12 11:05:53 -07:00
Nikita Shulga	30214aef2d	[BE] irangefy (#62928 ) Summary: Replace for loop with for `irange` loop. Also fix some unused variable warnings in range loop cases Pull Request resolved: https://github.com/pytorch/pytorch/pull/62928 Reviewed By: driazati Differential Revision: D30171904 Pulled By: malfet fbshipit-source-id: 1b437a0f7e3515f4a2e324f3450e93312f1933ae	2021-08-07 13:34:13 -07:00
Nikita Vedeneev	8e35df0bf3	det_backward: return svd path for double backward (so that all ci tests pass) (#62570 ) Summary: Potentially fixes https://github.com/pytorch/pytorch/issues/62327 and fixes https://github.com/pytorch/pytorch/issues/62328. This PR replaces the double backward of det from eig to svd. The latter is slower but should be more stable. CC anjali411 Pull Request resolved: https://github.com/pytorch/pytorch/pull/62570 Reviewed By: pbelevich Differential Revision: D30072876 Pulled By: anjali411 fbshipit-source-id: c91b507dbfd6a3ec47dc6d0b0dcfa5f8c8228c30	2021-08-04 13:43:51 -07:00
Nikita Vedeneev	d7ddae8e4f	det_backward: correct, more robust and with complex support [clone] (#61905 ) Summary: Clone of https://github.com/pytorch/pytorch/pull/58195 to ease the import. Done by request from anjali411 Pull Request resolved: https://github.com/pytorch/pytorch/pull/61905 Reviewed By: albanD Differential Revision: D29937920 Pulled By: anjali411 fbshipit-source-id: 025892a8e6147790825b20458986730ad8c5bb0f	2021-07-27 10:08:26 -07:00
Ivan Yashchuk	3cd12448b4	Add forward mode differentiation for inverse and solve (#62160 ) Summary: This PR adds forward mode differentiation for `torch.linalg.inv`, `torch.linalg.inv_ex`, and `torch.linalg.solve` functions. Pull Request resolved: https://github.com/pytorch/pytorch/pull/62160 Reviewed By: mruberry Differential Revision: D29917213 Pulled By: albanD fbshipit-source-id: b08bbc830f77f342cc7ca5b823d7ea4380f2aaa8	2021-07-27 07:51:22 -07:00
Nikita Shulga	a9b0a921d5	Disable `avoid-non-const-global-variables` lint check (#62008 ) Summary: The GoogleTest `TEST` macro is non-compliant with this check, as is `DEFINE_DISPATCH`. All changes except the ones to `.clang-tidy` were generated using the following script: ``` for i in `find . -type f -iname "*.c*" -or -iname "*.h"\|xargs grep cppcoreguidelines-avoid-non-const-global-variables\|cut -f1 -d:\|sort\|uniq`; do sed -i "/\/\/ NOLINTNEXTLINE(cppcoreguidelines-avoid-non-const-global-variables)/d" $i; done ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/62008 Reviewed By: driazati, r-barnes Differential Revision: D29838584 Pulled By: malfet fbshipit-source-id: 1b2f8602c945bd4ce50a9bfdd204755556e31d13	2021-07-22 18:04:40 -07:00
Mike Ruberry	1ce3281a6d	Revert D29361872: [pytorch][PR] det_backward: more robust and with complex support Test Plan: revert-hammer Differential Revision: D29361872 (`fce85480b9`) Original commit changeset: b1f0fec7e3ac fbshipit-source-id: feffa74ad65b0b294e0a9b0ee72d245393421f70	2021-07-15 15:26:00 -07:00
Nikita Vedeneev	fce85480b9	det_backward: more robust and with complex support (#58195 ) Summary: As per title. Pull Request resolved: https://github.com/pytorch/pytorch/pull/58195 Reviewed By: albanD Differential Revision: D29361872 Pulled By: anjali411 fbshipit-source-id: b1f0fec7e3ac52acd1481bcc878cc0c1d07c1852	2021-07-15 11:04:42 -07:00
Anjali Chourdia	30e48bbeae	Add neg bit (#56058 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/56058 User facing changes: 1. Adds a negative bit and corresponding new API (`is_neg()`,`resolve_neg()`) 2. `tensor.conj().imag` now returns a floating point tensor with neg bit set to 1 instead of a tensor with no notion of negative bit. Note that imag is still a view and all the view properties still hold for imag. Non user facing changes: 1. Added a new Negative dispatch key and a backend fallback to handle it 2. Updated copy kernel to handle negative bit 3. Merged conjugate and negative bit fallback kernel 4. fixed https://github.com/pytorch/pytorch/issues/60478 (caused due to https://github.com/pytorch/pytorch/pull/54987) Testing: 1. Added a new OpInfo based test `test_neg_view` (verifies that out-of-place and in-place operations work correctly for all operations when the input is a neg view tensor by checking the result against an actually negated tensor, verifies that autograd returns the same output for both neg view and actually negated tensors as well as it works fine when grad_out is a neg view). 2. Added a new test class containing `test_conj_view`, `test_neg_view`. Test Plan: Imported from OSS Reviewed By: soulitzer Differential Revision: D29636403 fbshipit-source-id: 12214c9dc4806c51850f4a72a109db9527c0ca63	2021-07-13 13:50:42 -07:00
albanD	056a8e0d5c	Remove un-used parameter in _trilinear backward (#60673 ) Summary: This argument is only important for speed and memory usage. So it is ok to ignore it during the backward. As discussed, we might want to change this to speed up backward in the future. Pull Request resolved: https://github.com/pytorch/pytorch/pull/60673 Reviewed By: soulitzer Differential Revision: D29370125 Pulled By: albanD fbshipit-source-id: ad50b3ea530aeb194f5a51845523b517a50f2c71	2021-06-25 17:47:10 -07:00
lezcano	dfc8247d33	Faster cumsum and cumprod backwards (#60642 ) Summary: Piggybacking on https://github.com/pytorch/pytorch/pull/58747, we can now implement the backward of `cumsum` and `cumprod` without tricks. This minimises the number of kernels launched on the GPU, so we see a reasonable speed-up on GPU. We should also get better stability for ill-conditioned inputs, as we do not perform any numerical tricks to get the result. Note that the benchmarks test forward + backward, so the true speed-up on the backward alone should be even larger. Even more so for `cumsum`, as it requires fewer operations than the backward of `cumprod`. <details> <summary> Test Script </summary> ```python from itertools import product import torch from torch.utils.benchmark import Compare, Timer def get_timer(ndims, prod_dim, dim, num_threads, device): size = [500] * ndims size[dim] = prod_dim x = torch.rand(size, device=device, requires_grad=True) # Make sure there are no zeros as the formula for the backward # that we are testing is for when the backward has no zeros with torch.no_grad(): x.add_(1e-3) grad = torch.ones_like(x) timer = Timer( "torch.autograd.grad([x.cumprod(dim)], [x], grad_outputs=[grad])", globals={"x": x, "dim": dim, "grad": grad}, label=f"Cumprod + Backwards {device}", description=f"dim: {dim}", sub_label=f"prod_dim: {prod_dim}", num_threads=num_threads, ) return timer.blocked_autorange(min_run_time=5) def get_params(): ndims = 3 dims = range(ndims) prod_dims = [10, 100, 500] for dim, prod_dim, device in product(dims, prod_dims, ("cpu", "cuda")): threads = (1, 2, 4) if device == "cpu" else (1,) for num_threads in threads: yield ndims, prod_dim, dim, num_threads, device compare = Compare([get_timer(*params) for params in get_params()]) compare.trim_significant_figures() compare.print() ``` </details> <details> <summary> Benchmark PR </summary> ``` [------------ Cumprod + Backwards cpu -------------] \| dim: 0 \| dim: 1 \| dim: 2 1 threads: ----------------------------------------- prod_dim: 10 \| 11 \| 14 \| 12 prod_dim: 100 \| 260 \| 270 \| 260 prod_dim: 500 \| 1400 \| 1550 \| 1360 2 threads: ----------------------------------------- prod_dim: 10 \| 6 \| 6 \| 6 prod_dim: 100 \| 170 \| 166 \| 167 prod_dim: 500 \| 902 \| 950 \| 858 4 threads: ----------------------------------------- prod_dim: 10 \| 4 \| 3 \| 3 prod_dim: 100 \| 110 \| 108 \| 106 prod_dim: 500 \| 576 \| 590 \| 547 Times are in milliseconds (ms). [------------ Cumprod + Backwards cuda ------------] \| dim: 0 \| dim: 1 \| dim: 2 1 threads: ----------------------------------------- prod_dim: 10 \| 562 \| 566 \| 1075 prod_dim: 100 \| 5388 \| 5394 \| 6697 prod_dim: 500 \| 28170 \| 27580 \| 30740 Times are in microseconds (us). ``` </details> <details> <summary> Benchmark master </summary> ``` [------------ Cumprod + Backwards cpu -------------] \| dim: 0 \| dim: 1 \| dim: 2 1 threads: ----------------------------------------- prod_dim: 10 \| 11 \| 13 \| 12 prod_dim: 100 \| 270 \| 270 \| 256 prod_dim: 500 \| 1500 \| 1590 \| 1300 2 threads: ----------------------------------------- prod_dim: 10 \| 6 \| 6 \| 6 prod_dim: 100 \| 170 \| 170 \| 164 prod_dim: 500 \| 911 \| 940 \| 840 4 threads: ----------------------------------------- prod_dim: 10 \| 4 \| 4 \| 4 prod_dim: 100 \| 111 \| 109 \| 105 prod_dim: 500 \| 570 \| 590 \| 536 Times are in milliseconds (ms). [------------ Cumprod + Backwards cuda ------------] \| dim: 0 \| dim: 1 \| dim: 2 1 threads: ----------------------------------------- prod_dim: 10 \| 616 \| 597 \| 1109 prod_dim: 100 \| 5976 \| 5723 \| 7017 prod_dim: 500 \| 31110 \| 29160 \| 32320 Times are in microseconds (us). ``` </details> Pull Request resolved: https://github.com/pytorch/pytorch/pull/60642 Reviewed By: ngimel Differential Revision: D29366368 Pulled By: albanD fbshipit-source-id: b0d692ce030352965c2f152e0f92fbb61fc5ebde	2021-06-25 12:44:12 -07:00
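As background for the `cumsum` case in the entry above: mathematically, the gradient of a cumulative sum is a reversed cumulative sum of the incoming gradient. The sketch below illustrates this on plain Python lists. It is not the ATen implementation, and `cumsum` and `cumsum_backward` are hypothetical helper names chosen for this illustration.

```python
from itertools import accumulate


def cumsum(xs):
    """Forward pass: y[j] = x[0] + ... + x[j]."""
    return list(accumulate(xs))


def cumsum_backward(grad_out):
    """Backward pass for cumsum (illustrative helper, not a PyTorch API).

    x[i] contributes to every output y[j] with j >= i, so
    grad_x[i] = sum(grad_out[i:]), i.e. a reversed cumulative sum.
    """
    return list(accumulate(reversed(grad_out)))[::-1]
```

With an all-ones incoming gradient of length 3, the first input receives the largest gradient because it appears in all three partial sums.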
Richard Barnes	b162d95e46	Fix a number of lint perf and safety issues in torch (#59897 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59897 Test Plan: Sandcastle Reviewed By: ngimel Differential Revision: D29037012 fbshipit-source-id: 7c16286d5fc2b67964fb65f8374dfff4d1a7aefb	2021-06-15 13:14:51 -07:00
albanD	a524ee00ca	Forward AD formulas batch 3 (#59711 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59711 This is the exact same PR as before. This was reverted because the PR below was faulty. Test Plan: Imported from OSS Reviewed By: zou3519 Differential Revision: D28995762 Pulled By: albanD fbshipit-source-id: 65940ad93bced9b5f97106709d603d1cd7260812	2021-06-10 19:30:02 -07:00
Richard Barnes	e3d75b8475	irange for PyTorch sans jit (#59481 ) Summary: Switches most of the simple for loops outside of `jit` directories to use `c10::irange`. Generated with D28874212. Pull Request resolved: https://github.com/pytorch/pytorch/pull/59481 Test Plan: Sandcastle Reviewed By: ngimel Differential Revision: D28909681 fbshipit-source-id: ec9ab1bd602933238d9d0f73d4d8d027b75d9d85	2021-06-09 14:46:11 -07:00
Ivan Yashchuk	90303157ab	Enable complex dtypes for coo_sparse-coo_sparse matmul [CPU] (#59554 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59554 This PR enables complex numbers supports for matrix-matrix multiplication of COO sparse matrices. Test Plan: Imported from OSS Reviewed By: jbschlosser Differential Revision: D28968309 Pulled By: anjali411 fbshipit-source-id: 4fd471e76a5584366aabc86c08b4564667ee54ca	2021-06-08 19:34:41 -07:00
Jane Xu	14f4c8d333	Revert D28387762: Forward AD formulas batch 3 Test Plan: revert-hammer Differential Revision: D28387762 (`58348bea06`) Original commit changeset: fc395c92af7e fbshipit-source-id: 608d704ff5bc560714790a576eaf9ed7f1f44e13	2021-06-08 15:19:26 -07:00
Natalia Gimelshein	9d533ef3ac	Renorm fix (#59615 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/59584 albanD, soulitzer, `renorm` grad was completely busted. Fast gradcheck is definitely not doing its job. Pull Request resolved: https://github.com/pytorch/pytorch/pull/59615 Reviewed By: jbschlosser Differential Revision: D28964271 Pulled By: ngimel fbshipit-source-id: b6878cd24db9189b64b67eb58bd2cd8956cda78a	2021-06-08 14:59:24 -07:00
Victor Quach	c268eefe96	Use TORCH_CHECK_NOT_IMPLEMENTED for AD not implemented (#59482 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59482 Fixes #53398 Test Plan: Imported from OSS Reviewed By: albanD Differential Revision: D28933809 fbshipit-source-id: 53387ec9690fc235b0622b50800feced706ea1ee	2021-06-08 14:02:04 -07:00
albanD	58348bea06	Forward AD formulas batch 3 (#58094 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58094 Test Plan: Imported from OSS Reviewed By: zou3519 Differential Revision: D28387762 Pulled By: albanD fbshipit-source-id: fc395c92af7ebb5ebae95c40f6c76273047f4097	2021-06-08 13:00:21 -07:00
Nikita Vedeneev	a30b359590	fix double backward for `binary_cross_entropy` loss function when `reduction=sum`. (#59479 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/59477. ```python In [1]: import torch In [2]: x = torch.rand(3, 3, dtype=torch.double, requires_grad=True) In [3]: y = torch.rand(3, 3, dtype=torch.double) In [4]: torch.autograd.gradgradcheck(lambda x, y: torch.nn.functional.binary_cross_entropy(x, y, reduction='sum'), [x, y]) Out[4]: True In [5]: torch.autograd.gradgradcheck(lambda x, y: torch.nn.functional.binary_cross_entropy(x, y, reduction='mean'), [x, y]) Out[5]: True In [6]: torch.autograd.gradcheck(lambda x, y: torch.nn.functional.binary_cross_entropy(x, y, reduction='sum'), [x, y]) Out[6]: True ``` More comprehensive testing could be added in https://github.com/pytorch/pytorch/pull/59447 where explicit `gradcheck` and `gradgradcheck` tests are added. Pull Request resolved: https://github.com/pytorch/pytorch/pull/59479 Reviewed By: ejguan Differential Revision: D28934354 Pulled By: albanD fbshipit-source-id: 12ce68e3c5c499b2531f7cdba3c22548d67e07e9	2021-06-07 14:14:08 -07:00
Nikita Vedeneev	c51abf8fca	Make `binary_cross_entropy` differentiable wrt `target` (#59447 ) Summary: As per title. Resolves https://github.com/pytorch/pytorch/issues/56683. `gradgradcheck` will fail once `target.requires_grad() == True` because of the limitations of the current double backward implementation. Pull Request resolved: https://github.com/pytorch/pytorch/pull/59447 Reviewed By: agolynski Differential Revision: D28910140 Pulled By: albanD fbshipit-source-id: 20934880eb4d22bec34446a6d1be0a38ef95edc7	2021-06-07 09:20:17 -07:00
anjali411	3607478ecd	Conjugate View (#54987 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/54987 Based off of ezyang (https://github.com/pytorch/pytorch/pull/44799) and bdhirsh (https://github.com/pytorch/pytorch/pull/43702) 's prototype: Here's a summary of the changes in this PR: This PR adds a new dispatch key called Conjugate. This enables us to make conjugate operation a view and leverage the specialized library functions that fast path with the hermitian operation (conj + transpose). 1. Conjugate operation will now return a view with conj bit (1) for complex tensors and returns self for non-complex tensors as before. This also means `torch.view_as_real` will no longer be a view on conjugated complex tensors and is hence disabled. To fill the gap, we have added `torch.view_as_real_physical` which would return the real tensor agnostic of the conjugate bit on the input complex tensor. The information about conjugation on the old tensor can be obtained by calling `.is_conj()` on the new tensor. 2. NEW API: a) `.conj()` -- now returning a view. b) `.conj_physical()` -- does the physical conjugate operation. If the conj bit for input was set, you'd get `self.clone()`, else you'll get a new tensor with conjugated value in its memory. c) `.conj_physical_()`, and `out=` variant d) `.resolve_conj()` -- materializes the conjugation. returns self if the conj bit is unset, else returns a new tensor with conjugated values and conj bit set to 0. e) `.resolve_conj_()` in-place version of (d) f) `view_as_real_physical` -- as described in (1), it's functionally same as `view_as_real`, just that it doesn't error out on conjugated tensors. g) `view_as_real` -- existing function, but now errors out on conjugated tensors. 3. Conjugate Fallback a) Vast majority of PyTorch functions would currently use this fallback when they are called on a conjugated tensor. 
b) This fallback is well equipped to handle the following cases: - functional operation e.g., `torch.sin(input)` - Mutable inputs and in-place operations e.g., `tensor.add_(2)` - out-of-place operation e.g., `torch.sin(input, out=out)` - Tensorlist input args - NOTE: Meta tensors don't work with conjugate fallback. 4. Autograd a) `resolve_conj()` is an identity function w.r.t. autograd b) Everything else works as expected. 5. Testing: a) All method_tests run with conjugate view tensors. b) OpInfo tests that run with conjugate views - test_variant_consistency_eager/jit - gradcheck, gradgradcheck - test_conj_views (that only run for `torch.cfloat` dtype) NOTE: functions like `empty_like`, `zero_like`, `randn_like`, `clone` don't propagate the conjugate bit. Follow up work: 1. conjugate view RFC 2. Add neg bit to re-enable view operation on conjugated tensors 3. Update linalg functions to call into specialized functions that fast path with the hermitian operation. Test Plan: Imported from OSS Reviewed By: VitalyFedyunin Differential Revision: D28227315 Pulled By: anjali411 fbshipit-source-id: acab9402b9d6a970c6d512809b627a290c8def5f	2021-06-04 14:12:41 -07:00
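The lazy conjugation described in the entry above can be pictured with a toy model in plain Python. `LazyComplex` is a hypothetical class invented for this illustration. It does not reflect how the Conjugate dispatch key or the fallback is implemented, and only mirrors the semantics the commit message states for `conj()`, `is_conj()`, and `resolve_conj()`.

```python
class LazyComplex:
    """Toy stand-in for a complex tensor carrying a conj bit (illustration only)."""

    def __init__(self, data, conj_bit=False):
        self._data = data          # shared storage: a list of complex numbers
        self._conj_bit = conj_bit  # True means "read values as conjugated"

    def conj(self):
        # Constant time: share the storage and flip the bit; no data is touched.
        return LazyComplex(self._data, not self._conj_bit)

    def is_conj(self):
        return self._conj_bit

    def resolve_conj(self):
        # Returns self if the bit is unset; otherwise materializes the
        # conjugated values into new storage with the bit cleared.
        if not self._conj_bit:
            return self
        return LazyComplex([z.conjugate() for z in self._data], False)

    def tolist(self):
        # Logical values, independent of how they are physically stored.
        if self._conj_bit:
            return [z.conjugate() for z in self._data]
        return list(self._data)
```

In this model `x.conj()` shares storage with `x`, and only `resolve_conj()` pays for a copy, which is the trade-off that makes conjugation cheap as a view.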
Peter Bell	6408cbd918	Migrate renorm to ATen (CPU and CUDA) (#59250 ) Summary: Resubmit of https://github.com/pytorch/pytorch/issues/59108, closes https://github.com/pytorch/pytorch/issues/24754, closes https://github.com/pytorch/pytorch/issues/24616 This reuses `linalg_vector_norm` to calculate the norms. I just add a new kernel that turns the norm into a normalization factor, then multiply the original tensor using a normal broadcasted `mul` operator. The result is less code, and better performance to boot. #### Benchmarks (CPU): \| Shape \| Dim \| Before \| After (1 thread) \| After (8 threads) \| \|:------------:\|:---:\|--------:\|-----------------:\|------------------:\| \| (10, 10, 10) \| 0 \| 11.6 us \| 4.2 us \| 4.2 us \| \| \| 1 \| 14.3 us \| 5.2 us \| 5.2 us \| \| \| 2 \| 12.7 us \| 4.6 us \| 4.6 us \| \| (50, 50, 50) \| 0 \| 330 us \| 120 us \| 24.4 us \| \| \| 1 \| 350 us \| 135 us \| 28.2 us \| \| \| 2 \| 417 us \| 130 us \| 24.4 us \| #### Benchmarks (CUDA) \| Shape \| Dim \| Before \| After \| \|:------------:\|:---:\|--------:\|--------:\| \| (10, 10, 10) \| 0 \| 12.5 us \| 12.1 us \| \| \| 1 \| 13.1 us \| 12.2 us \| \| \| 2 \| 13.1 us \| 11.8 us \| \| (50, 50, 50) \| 0 \| 33.7 us \| 11.6 us \| \| \| 1 \| 36.5 us \| 15.8 us \| \| \| 2 \| 41.1 us \| 15 us \| Pull Request resolved: https://github.com/pytorch/pytorch/pull/59250 Reviewed By: mruberry Differential Revision: D28820359 Pulled By: ngimel fbshipit-source-id: 572486adabac8135d52a9b8700f9d145c2a4ed45	2021-06-03 11:43:27 -07:00
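The two-step scheme in the entry above (turn each norm into a normalization factor, then apply a broadcasted multiply) can be sketched in plain Python for a 2-D input whose rows are the sub-tensors. This is a minimal sketch under stated assumptions: `norm_factors` and `renorm_rows` are hypothetical names, the input is a list of lists rather than a tensor, and any small epsilon the real kernel may add to the denominator is omitted.

```python
def norm_factors(rows, p, maxnorm):
    """Per-row scale: maxnorm / norm where the p-norm exceeds maxnorm, else 1."""
    factors = []
    for row in rows:
        norm = sum(abs(v) ** p for v in row) ** (1.0 / p)
        # Assumption: no epsilon term; rows within the limit are left untouched.
        factors.append(maxnorm / norm if norm > maxnorm else 1.0)
    return factors


def renorm_rows(rows, p, maxnorm):
    """Multiply every row by its factor (the 'broadcasted mul' step)."""
    factors = norm_factors(rows, p, maxnorm)
    return [[v * f for v in row] for row, f in zip(rows, factors)]
```

For example, with `p=2` and `maxnorm=1.0`, the row `[3.0, 4.0]` (norm 5) is scaled by 0.2, while `[0.3, 0.4]` (norm 0.5) is returned unchanged.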
albanD	d095ec75a1	Forward AD formulas batch 2 (#57863 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57863 Test Plan: Imported from OSS Reviewed By: zou3519 Differential Revision: D28387763 Pulled By: albanD fbshipit-source-id: e1b60ab728bb05b9e3323ee0dc7e401aaf5b8817	2021-06-03 07:33:04 -07:00
Richard Barnes	3979cb0656	irange for size_t (#55320 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55320 Test Plan: Sandcastle Reviewed By: ngimel Differential Revision: D27572577 fbshipit-source-id: 97710fd2bb1303006b05828a0d1343b0b59ccb03	2021-06-03 01:04:13 -07:00
kshitij12345	5c18994674	[special] Add `i1` and `i1e` (#56352 ) Summary: Reference: https://github.com/pytorch/pytorch/issues/50345 * [x] Check Docs https://12721710-65600975-gh.circle-artifacts.com/0/docs/special.html * [x] Investigate fp32 failure on CI?! (Fails on clang. Reproduced locally with clang-11) * [ ] Kernel vs Composite? * [x] Autograd for `i0e` for zero? Pull Request resolved: https://github.com/pytorch/pytorch/pull/56352 Reviewed By: anjali411 Differential Revision: D28700888 Pulled By: mruberry fbshipit-source-id: 91a3cbb94f5b8a3b063589ec38179848c11def83	2021-05-29 20:55:23 -07:00
Natalia Gimelshein	355b24438c	make vector_norm backward call norm_backward (#59135 ) Summary: Per title. Remove duplicated code. Pull Request resolved: https://github.com/pytorch/pytorch/pull/59135 Reviewed By: mruberry Differential Revision: D28775716 Pulled By: ngimel fbshipit-source-id: 50dc77590db15976453fc41c3657a77198749849	2021-05-29 12:14:46 -07:00
Adnios	09a8f22bf9	Add mish activation function (#58648 ) Summary: See issue: https://github.com/pytorch/pytorch/issues/58375 Pull Request resolved: https://github.com/pytorch/pytorch/pull/58648 Reviewed By: gchanan Differential Revision: D28625390 Pulled By: jbschlosser fbshipit-source-id: 23ea2eb7d5b3dc89c6809ff6581b90ee742149f4	2021-05-25 10:36:21 -07:00
Kurt Mohler	fe8e5eb260	Change native functions to take `c10::string_view` args instead of `std::string` (#57680 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/53546 Pull Request resolved: https://github.com/pytorch/pytorch/pull/57680 Reviewed By: malfet Differential Revision: D28511799 Pulled By: ezyang fbshipit-source-id: 43142f994d048b28b3279ccdb7a28cbaa3190973	2021-05-20 18:15:45 -07:00
lezcano	1f3807ce5d	More stable and faster implementation of the gradient of torch.linalg.eigh (#55049 ) Summary: This PR: - Renames symeig_backward to eigh_backward - Improves the stability and speed of the gradient computation by doing `V(A + B)Vh` instead of `VAVh + VBVh` when both the gradients of the eigenvectors and eigenvalues are defined. - Updates the comments of the function to make them arguably clearer Pull Request resolved: https://github.com/pytorch/pytorch/pull/55049 Reviewed By: ngimel Differential Revision: D28396823 Pulled By: mruberry fbshipit-source-id: a144482bfb1054e281b58ae1fe3cf1015bab505d	2021-05-13 17:17:35 -07:00
lezcano	9e156b01e5	linalg.eig backwards and linalg.eigvals (#57276 ) Summary: This PR adds backwards support for `eig` and `eigvals`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/57276 Reviewed By: ngimel Differential Revision: D28405056 Pulled By: mruberry fbshipit-source-id: 27ef03f139f44d75f4d319b0f3e77e99eea9bb01	2021-05-13 09:42:13 -07:00
lezcano	db13119fc4	Deprecate symeig (#57732 ) Summary: This one had a tricky usage of `torch.symeig` that had to be replaced. I tested the replacement locally though. Pull Request resolved: https://github.com/pytorch/pytorch/pull/57732 Reviewed By: bdhirsh Differential Revision: D28328189 Pulled By: mruberry fbshipit-source-id: 7f000fcbf2b029beabc76e5a89ff158b47977474	2021-05-12 02:21:35 -07:00
Nikita Vedeneev	c790fd2bf8	ATen lu_unpack. Required for making `torch.lu_solve` differentiable. (#46913 ) Summary: Backward methods for `torch.lu` and `torch.lu_solve` require the `torch.lu_unpack` method. However, while `torch.lu` is a Python wrapper over a native function, so its gradient is implemented via `autograd.Function`, `torch.lu_solve` is a native function, so it cannot access `torch.lu_unpack` as it is implemented in Python. Hence this PR presents a native (ATen) `lu_unpack` version. It is also possible to update the gradients for `torch.lu` so that backward+JIT is supported (no JIT for `autograd.Function`) with this function. ~~The interface for this method is different from the original `torch.lu_unpack`, so it is decided to keep it hidden.~~ Pull Request resolved: https://github.com/pytorch/pytorch/pull/46913 Reviewed By: albanD Differential Revision: D28355725 Pulled By: mruberry fbshipit-source-id: 281260f3b6e93c15b08b2ba66d5a221314b00e78	2021-05-11 22:53:21 -07:00
Ivan Yashchuk	aaca12bcc2	Deprecate in docs torch.svd and change svd -> linalg_svd (#57981 ) Summary: This PR adds a note to the documentation that torch.svd is deprecated together with an upgrade guide on how to use `torch.linalg.svd` and `torch.linalg.svdvals` (Lezcano's instructions from https://github.com/pytorch/pytorch/issues/57549). In addition, all usage of the old svd function is replaced with a new one from torch.linalg module, except for the `at::linalg_pinv` function, that fails the XLA CI build (https://github.com/pytorch/xla/issues/2755, see failure in draft PR https://github.com/pytorch/pytorch/pull/57772). Pull Request resolved: https://github.com/pytorch/pytorch/pull/57981 Reviewed By: ngimel Differential Revision: D28345558 Pulled By: mruberry fbshipit-source-id: 02dd9ae6efe975026e80ca128e9b91dfc65d7213	2021-05-11 18:04:10 -07:00
lezcano	415ae54c31	Deprecate torch.eig (#57727 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57727 Reviewed By: bdhirsh Differential Revision: D28317984 Pulled By: mruberry fbshipit-source-id: fa1aa1b78fd3611ac208bca93e2b745a1bac41f1	2021-05-10 23:31:02 -07:00
Mike Ruberry	3c87fe9b14	Revert D28117714: [pytorch][PR] ATen lu_unpack. Required for making `torch.lu_solve` differentiable. Test Plan: revert-hammer Differential Revision: D28117714 (`5c67d8dfd3`) Original commit changeset: befd33db12ec fbshipit-source-id: 295b2134935542a903a73f90a7998239dfe6cc81	2021-05-09 23:20:06 -07:00
Nikita Vedeneev	5c67d8dfd3	ATen lu_unpack. Required for making `torch.lu_solve` differentiable. (#46913 ) Summary: Backward methods for `torch.lu` and `torch.lu_solve` require the `torch.lu_unpack` method. However, while `torch.lu` is a Python wrapper over a native function, so its gradient is implemented via `autograd.Function`, `torch.lu_solve` is a native function, so it cannot access `torch.lu_unpack` as it is implemented in Python. Hence this PR presents a native (ATen) `lu_unpack` version. It is also possible to update the gradients for `torch.lu` so that backward+JIT is supported (no JIT for `autograd.Function`) with this function. ~~The interface for this method is different from the original `torch.lu_unpack`, so it is decided to keep it hidden.~~ Pull Request resolved: https://github.com/pytorch/pytorch/pull/46913 Reviewed By: astaff Differential Revision: D28117714 Pulled By: mruberry fbshipit-source-id: befd33db12ecc147afacac792418b6f4948fa4a4	2021-05-09 19:12:56 -07:00
Nikita Shulga	3a66a1cb99	[clang-tidy] Exclude cppcoreguidelines-avoid-magic-numbers (#57841 ) Summary: Add cppcoreguidelines-avoid-magic-numbers exclusion to clang-tidy Remove existing nolint warnings using following script: ``` for file in `git ls-files \| grep -v \.py`; do gsed '/^ *\/\/ NOLINTNEXTLINE(cppcoreguidelines-avoid-magic-numbers)/d' -i $file; done ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/57841 Reviewed By: samestep Differential Revision: D28295045 Pulled By: malfet fbshipit-source-id: 7c6e8d1213c9593f169ed3df6a916498f1a97163	2021-05-07 20:02:33 -07:00
Peter Bell	2043093217	Add correction parameter to std/var (#50903 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50903 First part of #50010. Also fixes #51127. Test Plan: Imported from OSS Reviewed By: ngimel Differential Revision: D27911345 Pulled By: mruberry fbshipit-source-id: 7138fddc935802918ab9ff19f4bc1b9f4d745d41	2021-05-07 14:40:28 -07:00
Alexander	6f2c0cccdd	New: sparse complex: add linear algebra, addmm (#57129 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57129 Test Plan: Imported from OSS Reviewed By: janeyx99, astaff Differential Revision: D28112701 Pulled By: ezyang fbshipit-source-id: 1b253453dc19e908fb18d0b1a83738243e0a8d59	2021-05-07 05:37:48 -07:00
Heitor Schueroff	1f1e2dab6b	Remove optional type for ord parameter in vector_norm (#57662 ) Summary: As per discussion here https://github.com/pytorch/pytorch/pull/57127#discussion_r624948215 Note that we cannot remove the optional type from the `dim` parameter because the default is to flatten the input tensor which cannot be easily captured by a value other than `None` ### BC Breaking Note This PR changes the `ord` parameter of `torch.linalg.vector_norm` so that it no longer accepts `None` arguments. The default behavior of `2` is equivalent to the previous default of `None`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/57662 Reviewed By: albanD, mruberry Differential Revision: D28228870 Pulled By: heitorschueroff fbshipit-source-id: 040fd8055bbe013f64d3c8409bbb4b2c87c99d13	2021-05-06 17:53:25 -07:00
Peter Bell	33eea146ee	torch.clamp with tensor min and max (#52695 ) Summary: Fixes gh-2793 Pull Request resolved: https://github.com/pytorch/pytorch/pull/52695 Reviewed By: mruberry Differential Revision: D27395977 Pulled By: ezyang fbshipit-source-id: f86aa240feb034d42e4c45447e72218f6a773c24	2021-05-03 12:56:16 -07:00
Kevin Rose	ec86f96e91	Fix for derivative of sinc(x) when x is positive but very very small (#56986 ) Summary: Problem arises for sinc'(x) where x != 0, but x ** 2 == 0, which happens for some very small floats. I realized that my solution from https://github.com/pytorch/pytorch/issues/56763 was incomplete when I did a quick implementation using `torch.autograd.Function` and still got a `NaN` from my derivative. Pull Request resolved: https://github.com/pytorch/pytorch/pull/56986 Reviewed By: gchanan Differential Revision: D28093507 Pulled By: albanD fbshipit-source-id: 2a30e1065b08c5c60de843a0778dedeb0fb295f4	2021-04-29 11:16:39 -07:00
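The failure mode in the entry above (`x != 0` but `x ** 2 == 0`) can be reproduced with ordinary double-precision floats. The sketch below assumes the normalized definition `sinc(x) = sin(pi x) / (pi x)`. `sinc_grad_naive` and `sinc_grad_guarded` are hypothetical names, and the guard shown is one possible way to handle the case, not necessarily the code from the PR.

```python
import math


def sinc_grad_naive(x):
    """d/dx [sin(pi x) / (pi x)] = (pi x cos(pi x) - sin(pi x)) / (pi x^2)."""
    num = math.pi * x * math.cos(math.pi * x) - math.sin(math.pi * x)
    den = math.pi * x * x  # underflows to 0.0 for very small nonzero x
    # Plain Python raises on float division by zero; IEEE arithmetic in a
    # tensor kernel would produce NaN for 0/0, which is emulated here.
    return num / den if den != 0.0 else float("nan")


def sinc_grad_guarded(x):
    """Same formula, but special-cased to 0 whenever x * x is zero."""
    # Checking x == 0.0 alone is not enough: 1e-200 is nonzero,
    # yet 1e-200 * 1e-200 underflows to exactly 0.0.
    if x * x == 0.0:
        return 0.0
    return sinc_grad_naive(x)
```

Testing the squared term rather than `x` itself covers both the exact zero from the earlier fix and the underflow case this commit describes.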
Nikita Shulga	4cb534f92e	Make PyTorch code-base clang-tidy compliant (#56892 ) Summary: This is an automatic change generated by the following script: ``` #!/usr/bin/env python3 from subprocess import check_output, check_call import os def get_compiled_files_list(): import json with open("build/compile_commands.json") as f: data = json.load(f) files = [os.path.relpath(node['file']) for node in data] for idx, fname in enumerate(files): if fname.startswith('build/') and fname.endswith('.DEFAULT.cpp'): files[idx] = fname[len('build/'):-len('.DEFAULT.cpp')] return files def run_clang_tidy(fname): check_call(["python3", "tools/clang_tidy.py", "-c", "build", "-x", fname,"-s"]) changes = check_output(["git", "ls-files", "-m"]) if len(changes) == 0: return check_call(["git", "commit","--all", "-m", f"NOLINT stubs for {fname}"]) def main(): git_files = check_output(["git", "ls-files"]).decode("ascii").split("\n") compiled_files = get_compiled_files_list() for idx, fname in enumerate(git_files): if fname not in compiled_files: continue if fname.startswith("caffe2/contrib/aten/"): continue print(f"[{idx}/{len(git_files)}] Processing {fname}") run_clang_tidy(fname) if __name__ == "__main__": main() ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/56892 Reviewed By: H-Huang Differential Revision: D27991944 Pulled By: malfet fbshipit-source-id: 5415e1eb2c1b34319a4f03024bfaa087007d7179	2021-04-28 14:10:25 -07:00
Kevin Rose	5854e93bc9	Fix derivative of sinc at x=0 (#56763 ) Summary: Attempting to fix https://github.com/pytorch/pytorch/issues/56760 The derivative of `sinc(x)` at `x=0` should be special cased to 0. Pull Request resolved: https://github.com/pytorch/pytorch/pull/56763 Reviewed By: zhangguanheng66 Differential Revision: D27978135 Pulled By: albanD fbshipit-source-id: ede5e734613cf60e720f6bcc7387c3cd9c6ec233	2021-04-26 09:43:42 -07:00
Xiao Wang	7b31ba4708	Fix cudnn ctc loss backward (#56639 ) Summary: Fix cudnn ctc loss backward Fix https://github.com/pytorch/pytorch/issues/49046, which was working in pytorch 1.1 Originally modified in this PR in Oct 2019, https://github.com/pytorch/pytorch/pull/27039/files#diff-25ec2c1108ee03e2167622588ec31d167897ef1cccb12a4cfe77eb98777316daR2383-R2392 According to the original code `90ffab6e37/tools/autograd/derivatives.yaml (L1387-L1388)` and the code after PR `f461184505/tools/autograd/templates/Functions.cpp (L2456-L2465)` This `at::zeros({0}, raw_grad.options())` in line 2460 seems suspicious, and is causing `infer_size` runtime error ``` RuntimeError: The size of tensor a (0) must match the size of tensor b (177) at non-singleton dimension 2 Exception raised from infer_size at ..\aten\src\ATen\ExpandUtils.cpp:24 (most recent call first): ``` I've modified that to `at::zeros_like(raw_grad)`, which looks more accurate. Pull Request resolved: https://github.com/pytorch/pytorch/pull/56639 Reviewed By: mruberry Differential Revision: D27987860 Pulled By: ngimel fbshipit-source-id: 5ad65e78d017c26894fb26318a5992b0878d04d5	2021-04-25 22:51:19 -07:00
Brian Hirsh	e8faf69739	fix torch.pow type promotion issue (#54085 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/54085 Fixes https://github.com/pytorch/pytorch/issues/50121. This fixes two similar issues pointed out with the dtype that `torch.pow` performs its computation. Thanks ngimel for spotting the issues originally (comments [here](https://github.com/pytorch/pytorch/pull/53669#discussion_r594624355) and [here](https://github.com/pytorch/pytorch/pull/53669#discussion_r594719704))! Before: ``` >>> torch.pow(2, torch.tensor([17], dtype=torch.uint8), out=torch.tensor([0])) tensor([0]) >>> torch.pow(2, torch.tensor(17, dtype=torch.uint8), out=torch.tensor(0)) tensor(131072) >>> torch.pow(2, torch.tensor([17], dtype=torch.uint8, device='cuda'), out=torch.tensor([0], device='cuda')) tensor([131072], device='cuda:0') >>> torch.pow(2, torch.tensor(17, dtype=torch.uint8, device='cuda'), out=torch.tensor(0, device='cuda')) tensor(131072, device='cuda:0') ``` After: ``` >>> torch.pow(2, torch.tensor([17], dtype=torch.uint8), out=torch.tensor([0])) tensor([0]) >>> torch.pow(2, torch.tensor(17, dtype=torch.uint8), out=torch.tensor(0)) tensor(0) >>> torch.pow(2, torch.tensor([17], dtype=torch.uint8, device='cuda'), out=torch.tensor([0], device='cuda')) tensor([0], device='cuda:0') >>> torch.pow(2, torch.tensor(17, dtype=torch.uint8, device='cuda'), out=torch.tensor(0, device='cuda')) tensor(0, device='cuda:0') ``` In all four cases above, `tensor(0, ...)` is the correct value because the computed "common dtype" among the inputs is expected to be `uint8`. Computing `2 ** 17` in uint8 will then overflow to zero. Finally, we cast the computed output to the output tensor's dtype, which is `int64`. 
There were two separate issues fixed in this PR: one for cpu and one for cuda: * For CPU, The `pow(Scalar, Tensor)` overload wasn't calling `set_wrapped_number(true)` after wrapping the scalar in a Tensor, which caused the "promoted" scalar to incorrectly participate in type promotion (see the documented behavior [here](`aa8714dfed/c10/core/TensorImpl.h (L590)`)) * For CUDA, the cuda kernels defined in `PowKernel.cu` were using the output's dtype to run the computation, instead of the common dtype. As an aside: The CPU and CUDA kernels actually both use `iter.dtype()` instead of `iter.common_dtype()` to run the computation, which I fixed. The reason that only manifested here for CUDA is because TensorIterator has cpu-specific logic to create temporary outputs with the intermediate dtype (shown [here](`aa8714dfed/aten/src/ATen/TensorIterator.cpp (L349)`)). I'm not sure what the end state is there- I can imagine that being something we're more okay doing for cpu than for cuda, but it also leads to hard-to-track-down inconsistencies between the two like in this case. Test Plan: Imported from OSS Reviewed By: ngimel Differential Revision: D27096330 Pulled By: bdhirsh fbshipit-source-id: a7e2909243851625cb3056d1e7abb2383bfe95f2	2021-04-15 08:55:53 -07:00
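The wraparound described in the entry above can be checked without PyTorch. The following is a minimal sketch in plain Python, assuming uint8 arithmetic is modular (mod 256); the helper name `pow_uint8` is hypothetical and is not part of PyTorch.

```python
def pow_uint8(base: int, exponent: int) -> int:
    """Hypothetical helper: integer power with uint8 wraparound (mod 2**8)."""
    result = 1
    for _ in range(exponent):
        # Keep only the low 8 bits after every multiplication.
        result = (result * base) % 256
    return result


# 2 ** 17 = 131072 is a multiple of 256, so it wraps to 0 in uint8.
wrapped = pow_uint8(2, 17)

# Casting the wrapped value to a wider output dtype afterwards cannot
# recover the lost high bits, which is why the result stays 0.
print(wrapped)
```

Because the computation happens in the common dtype first, the width of the `out=` tensor only affects the final cast, not the value.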
Richard Barnes	d690973295	irange on int64_t (#55148 ) Summary: Converts loops of the form: ``` for(int64_t VAR=0;VAR<LIMIT;VAR++) ``` to the form ``` for(const auto VAR : c10::irange(LIMIT)) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/55148 Test Plan: Sandcastle Reviewed By: ngimel Differential Revision: D27447811 fbshipit-source-id: 6311a094ec4a81a0b57383aaee0ba1b1dc2445c4	2021-04-05 16:14:00 -07:00
Peter Bell	2ee02b30b1	Replace rounding_mode="true" with rounding_mode=None (#51988 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51988 * #51988 Replace rounding_mode="true" with rounding_mode=None Test Plan: Imported from OSS Reviewed By: ngimel Differential Revision: D27561817 Pulled By: mruberry fbshipit-source-id: 60d1d9c389570f60d599fc1876518717367fb368	2021-04-05 14:53:43 -07:00
Antonio Cuni	980d6f2589	torch.linalg.det (#53119 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/51652. In particular: - the main implementation is in `torch.linalg.det` now. `torch.det` is just a deprecated alias to it - add a new `OpInfo` for `torch.linalg.det` - remove the old-style tests for `torch.det` (this is similar to what we did for `torch.linalg.slogdet`, see https://github.com/pytorch/pytorch/issues/49194) - added an `out=` argument to `torch.linalg.det`, but not to `torch.det`. It is worth noting that I had to skip a few tests: - `TestGradientsCuda::test_fn_gradgrad_linalg_det_cuda_float64`. This is not a regression: the functionality is also broken on master, but the test is not executed properly due to https://github.com/pytorch/pytorch/issues/53361. And the following tests, which fail only on ROCm: - `test_variant_consistency_jit_cuda_{float64,float32}` - `test_fn_grad_cuda_float64` I think that the ROCm tests fail because the current linalg.det backward is unstable if the matrix has repeated singular values, see https://github.com/pytorch/pytorch/issues/53364 . (At the time of writing some CI jobs are still running, but I believe the build will be green, since the only difference with respect to the last push is the skip of the ROCm tests) Pull Request resolved: https://github.com/pytorch/pytorch/pull/53119 Reviewed By: H-Huang Differential Revision: D27441999 Pulled By: mruberry fbshipit-source-id: 5eab14c4f0a165e0cf9ec626c3f4bb23359f2a9e	2021-04-05 08:45:27 -07:00
Mike Ruberry	c0ac0fef4e	Revert D27448156: irange for size_t Test Plan: revert-hammer Differential Revision: D27448156 (`041b4431b2`) Original commit changeset: 585da57d4de9 fbshipit-source-id: 8e047c29f391c0166e0a1a87c3fb2a0854377365	2021-04-03 19:14:00 -07:00
Richard Barnes	041b4431b2	irange for size_t (#55163 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55163 Test Plan: Sandcastle Reviewed By: ngimel Differential Revision: D27448156 fbshipit-source-id: 585da57d4de91c692b6360d65f7b8a66deb0f8c1	2021-04-02 23:22:29 -07:00
Nikita Vedeneev	61b074581c	`torch.prod` backward for complex types. (#48125 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/53511 torch.det does depend on torch.prod, which in turn depends on several other functions, and they also depend on torch.prod, so there is a circular relationship, hence this PR will enable complex backward support for several functions at once. Pull Request resolved: https://github.com/pytorch/pytorch/pull/48125 Reviewed By: pbelevich Differential Revision: D27188589 Pulled By: anjali411 fbshipit-source-id: bbb80f8ecb83a0c3bea2b917627d3cd3b84eb09a	2021-03-19 09:44:08 -07:00
albanD	09b4af2f0f	Remove legacy from optional-related function names (#54101 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/54101 Test Plan: Imported from OSS Reviewed By: ejguan Differential Revision: D27117839 Pulled By: albanD fbshipit-source-id: 1f50b06ff9b0be8301f6ea9eca14f73a3a5fa137	2021-03-18 09:29:00 -07:00
albanD	cba8516b52	make internal forwardAD methods on at::Tensor internal (#54099 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/54099 Test Plan: Imported from OSS Reviewed By: ejguan Differential Revision: D27117838 Pulled By: albanD fbshipit-source-id: ede96529a4b099dea9cf885d0bf2cb352aa30fa5	2021-03-18 09:27:17 -07:00
Kurt Mohler	382a47b493	Add torch.linalg.vector_norm function (#51099 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/50214 Pull Request resolved: https://github.com/pytorch/pytorch/pull/51099 Reviewed By: agolynski Differential Revision: D27147360 Pulled By: mruberry fbshipit-source-id: 1056f840e7027ad81971c9d1a9f952ab9648f1b5	2021-03-18 06:41:39 -07:00
Ivan Yashchuk	564456ac44	Added autograd support for torch.orgqr (#52637 ) Summary: This PR adds autograd support for `torch.orgqr`. Since `torch.orgqr` is one of the few functions that expose LAPACK's naming and all other linear algebra routines were renamed a long time ago, I also added a new function with a new name, and `torch.orgqr` is now an alias for it. The new proposed name is `householder_product`. For a matrix `input` and a vector `tau`, LAPACK's orgqr operation takes the columns of `input` (called Householder vectors or elementary reflectors) and the scalars of `tau`, which together represent Householder matrices, and then computes the product of these matrices. See https://www.netlib.org/lapack/lug/node128.html. Other linear algebra libraries that I'm aware of do not expose this LAPACK function, so there is some freedom in naming it. It is usually used internally only for QR decomposition, but can be useful for deep learning tasks now that it supports differentiation. Resolves https://github.com/pytorch/pytorch/issues/50104 Pull Request resolved: https://github.com/pytorch/pytorch/pull/52637 Reviewed By: agolynski Differential Revision: D27114246 Pulled By: mruberry fbshipit-source-id: 9ab51efe52aec7c137aa018c7bd486297e4111ce	2021-03-18 05:42:18 -07:00
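The product of Householder matrices described in the entry above can be sketched in plain Python. This is an illustration under stated assumptions, not the PyTorch or LAPACK implementation: it assumes the LAPACK storage convention (reflector `i` has an implicit 1 on the diagonal and zeros above it, with the remaining entries read from column `i` of the input), it builds each matrix as `H_i = I - tau_i * v_i * v_i^T`, and it returns the full square product for simplicity. All function names are hypothetical.

```python
def matmul(x, y):
    """Dense matrix product of two lists-of-rows."""
    return [[sum(x[i][p] * y[p][j] for p in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def householder_matrix(v, tau):
    """H = I - tau * v v^T for a real vector v."""
    m = len(v)
    return [[(1.0 if i == j else 0.0) - tau * v[i] * v[j]
             for j in range(m)] for i in range(m)]


def householder_product_full(a, tau):
    """Multiply H_1 H_2 ... H_k, reading reflectors from the columns of a."""
    m = len(a)
    q = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
    for i, t in enumerate(tau):
        # Zeros above the diagonal, implicit 1 on it, column entries below it.
        v = [0.0] * i + [1.0] + [a[r][i] for r in range(i + 1, m)]
        q = matmul(q, householder_matrix(v, t))
    return q


# Entries on and above the diagonal of `a` are ignored by this convention.
a = [[0.0, 0.0], [0.5, 0.0], [-1.0, 2.0]]
# Choosing tau = 2 / (v . v) makes every H_i an orthogonal reflection.
tau = [2.0 / (1.0 + 0.5 ** 2 + 1.0 ** 2), 2.0 / (1.0 + 2.0 ** 2)]
q = householder_product_full(a, tau)
```

With these `tau` values the result `q` is orthogonal, which is the property QR decomposition relies on.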
lezcano	1f5b9170aa	Faster backwards for cumsum and cumprod (#53711 ) Summary: Provides a faster formula for `cumprod` in the case when the input has zeros. This formula is non-differentiable, so we leave the previous formula for the cases when `at::GradMode::is_enabled()`. This new formula gives up to x10 and x30 speed-ups in CPU and GPU (see the benchmarks below). The `cumsum` backward formula was rewritten so that no copies are necessary. We also removed a double negation in its formula. This gives a significant speed-up in CPU, while being almost as efficient as the formula with copies in GPU. We can see this speed-up when comparing the "No zeros" part of the benchmark. Benchmarks: nb. It is worth noting that the script tests the forward and the backward for `cumprod`, so the speed-ups should be even larger than those announced here. <details> <summary>Script</summary> ```python from IPython import get_ipython import torch from itertools import product torch.manual_seed(13) torch.set_num_threads(1) ipython = get_ipython() cpu = torch.device('cpu') cuda = torch.device('cuda') def run_test(ndims, size, size_prod, zeros, device): print(f"ndims: {ndims}, tensor_size: {size}, size_prod: {size_prod}, zeros: {zeros}, device: {device}") for dim in range(ndims): sizes = ndims * [size] sizes[dim] = size_prod tensor = torch.rand(sizes, device=device) with torch.no_grad(): if zeros: # Set 0.1 of them to zero p_drop = 0.1 mask = torch.full_like(tensor, 1.0 - p_drop) tensor = tensor * torch.bernoulli(mask) else: tensor = tensor + 1e-3 tensor.requires_grad_() grad = torch.ones_like(tensor) # We test both forward + backward, meaning that the speed-up is actually greater than reported # That being said, this is more realistic than doing `retain_graph=True` command = "torch.autograd.grad([tensor.cumprod(dim)], [tensor], grad_outputs=[grad])" if device == cuda: command += "; torch.cuda.synchronize()" ipython.magic(f"timeit {command}") print() for device, zeros in product([cuda,
cpu], [True, False]): run_test(3, 300, 10, zeros, device) run_test(3, 300, 100, zeros, device) if device == cuda: run_test(3, 300, 300, zeros, device) ``` </details> <details> <summary>CPU This PR (Some regression small tensors, x4 speed-up large tensors)</summary> ``` Zeros: ndims: 3, tensor_size: 300, size_prod: 10, zeros: True, device: cpu 28.2 ms ± 12.1 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) 29.8 ms ± 78.9 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) 24.5 ms ± 29.1 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) ndims: 3, tensor_size: 300, size_prod: 100, zeros: True, device: cpu 414 ms ± 3.63 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) 428 ms ± 4.12 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) 382 ms ± 3.18 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) No Zeros: ndims: 3, tensor_size: 300, size_prod: 10, zeros: False, device: cpu 3.11 ms ± 9.72 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) 3.83 ms ± 3.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) 4.08 ms ± 10.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) ndims: 3, tensor_size: 300, size_prod: 100, zeros: False, device: cpu 92.2 ms ± 113 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) 101 ms ± 101 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) 87 ms ± 170 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) ``` </details> <details> <summary>CUDA This PR (7-30x speed-up)</summary> ``` Zeros: ndims: 3, tensor_size: 300, size_prod: 10, zeros: True, device: cuda 1.46 ms ± 2.07 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) 1.48 ms ± 3.51 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) 1.93 ms ± 8.07 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) ndims: 3, tensor_size: 300, size_prod: 100, zeros: True, device: cuda 10.5 ms ± 914 ns per loop (mean ± std. dev. of 7 runs, 100 loops each) 10.6 ms ± 509 ns per loop (mean ± std. dev. 
of 7 runs, 100 loops each) 11.7 ms ± 864 ns per loop (mean ± std. dev. of 7 runs, 100 loops each) ndims: 3, tensor_size: 300, size_prod: 300, zeros: True, device: cuda 30.3 ms ± 5.16 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) 30.6 ms ± 6.44 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) 32.2 ms ± 2.34 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) No Zeros: ndims: 3, tensor_size: 300, size_prod: 10, zeros: False, device: cuda 248 µs ± 335 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each) 252 µs ± 186 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each) 438 µs ± 254 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each) ndims: 3, tensor_size: 300, size_prod: 100, zeros: False, device: cuda 2.1 ms ± 193 ns per loop (mean ± std. dev. of 7 runs, 100 loops each) 2.16 ms ± 380 ns per loop (mean ± std. dev. of 7 runs, 100 loops each) 2.59 ms ± 398 ns per loop (mean ± std. dev. of 7 runs, 100 loops each) ndims: 3, tensor_size: 300, size_prod: 300, zeros: False, device: cuda 6.3 ms ± 857 ns per loop (mean ± std. dev. of 7 runs, 100 loops each) 6.39 ms ± 288 ns per loop (mean ± std. dev. of 7 runs, 100 loops each) 7.15 ms ± 233 ns per loop (mean ± std. dev. of 7 runs, 100 loops each) ``` </details> <details> <summary>CPU master</summary> ``` Zeros: ndims: 3, tensor_size: 300, size_prod: 10, zeros: True, device: cpu 8.27 ms ± 12.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) 10.8 ms ± 13.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) 28.2 ms ± 74.4 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) ndims: 3, tensor_size: 300, size_prod: 100, zeros: True, device: cpu 1.53 s ± 116 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) 1.95 s ± 4.38 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) 1.86 s ± 3.58 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) No Zeros: ndims: 3, tensor_size: 300, size_prod: 10, zeros: False, device: cpu 3.42 ms ± 20 µs per loop (mean ± std. dev. 
of 7 runs, 100 loops each) 4.25 ms ± 3.65 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) 4.34 ms ± 3.04 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) ndims: 3, tensor_size: 300, size_prod: 100, zeros: False, device: cpu 104 ms ± 148 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) 117 ms ± 99.5 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) 94.8 ms ± 125 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) ``` </details> <details> <summary>CUDA master</summary> ``` Zeros: ndims: 3, tensor_size: 300, size_prod: 10, zeros: True, device: cuda 912 µs ± 431 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each) 1.05 ms ± 2.46 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) 2.74 ms ± 381 ns per loop (mean ± std. dev. of 7 runs, 100 loops each) ndims: 3, tensor_size: 300, size_prod: 100, zeros: True, device: cuda 71.3 ms ± 7.91 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) 85.4 ms ± 9.82 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) 119 ms ± 6.21 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) ndims: 3, tensor_size: 300, size_prod: 300, zeros: True, device: cuda 646 ms ± 103 µs per loop (mean ± std. dev. of 7 runs, 1 loop each) 776 ms ± 81.7 µs per loop (mean ± std. dev. of 7 runs, 1 loop each) 917 ms ± 160 µs per loop (mean ± std. dev. of 7 runs, 1 loop each) No Zeros: ndims: 3, tensor_size: 300, size_prod: 10, zeros: False, device: cuda 301 µs ± 893 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each) 308 µs ± 236 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each) 592 µs ± 140 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each) ndims: 3, tensor_size: 300, size_prod: 100, zeros: False, device: cuda 2.61 ms ± 375 ns per loop (mean ± std. dev. of 7 runs, 100 loops each) 2.68 ms ± 524 ns per loop (mean ± std. dev. of 7 runs, 100 loops each) 3.38 ms ± 736 ns per loop (mean ± std. dev. 
of 7 runs, 100 loops each) ndims: 3, tensor_size: 300, size_prod: 300, zeros: False, device: cuda 7.89 ms ± 848 ns per loop (mean ± std. dev. of 7 runs, 100 loops each) 8.03 ms ± 517 ns per loop (mean ± std. dev. of 7 runs, 100 loops each) 9.24 ms ± 405 ns per loop (mean ± std. dev. of 7 runs, 100 loops each) ``` </details> cc nikitaved Pull Request resolved: https://github.com/pytorch/pytorch/pull/53711 Reviewed By: jbschlosser Differential Revision: D27059662 Pulled By: anjali411 fbshipit-source-id: be610d5590c0199b4412dff66fac47666faaff9d	2021-03-16 13:57:43 -07:00
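The entry above does not show the rewritten `cumsum` backward formula, so the following is not claimed to be the PR's code. It is a minimal plain-Python sketch of one standard way to write that gradient: since output `j` depends on every input `i <= j`, the gradient for input `i` is the sum of the incoming gradients at positions `j >= i`, which is a reversed cumulative sum. The function names are hypothetical.

```python
from itertools import accumulate


def cumsum(xs):
    """Forward pass: running sum along a 1-D list."""
    return list(accumulate(xs))


def cumsum_backward(grad_out):
    """Backward pass: reverse, take the running sum, reverse again."""
    return list(accumulate(reversed(grad_out)))[::-1]


def cumsum_backward_reference(grad_out):
    """Same gradient from the explicit Jacobian: dy_j/dx_i = 1 when i <= j."""
    n = len(grad_out)
    return [sum(grad_out[j] for j in range(i, n)) for i in range(n)]


grad_in = cumsum_backward([1, 2, 3, 4])
```

The reversed-scan form does one pass over the data, while the reference form is quadratic and is included only to check the result.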
Wenlei Xie	2ecb2c7931	Pass Scalar by reference (#53583 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/53583 `Scalar` takes 32 bytes because `c10::complex<double>` requires 16-byte alignment. Passing Scalar by reference shows about a 1% improvement in instruction count. All the changes in this commit are codemodded except for the following 4 files (which code-gen signatures): ``` tools/codegen/api/cpp.py tools/codegen/api/native.py tools/codegen/api/structured.py caffe2/contrib/aten/gen_op.py ``` # Codemod ## Main Step For the codemod part, here is the main command used: ``` fastmod --extensions h '([a-zA-Z_+]\([^)],?\s)Scalar (\w+)' '${1}const Scalar& ${2}' fastmod --extensions h '([a-zA-Z_+]\([^)],?\s)optional<Scalar> (\w+)' '${1}const optional<Scalar>& ${2}' fastmod --extensions cpp '([a-zA-Z_+]\([^)],?\s)Scalar (\w+)' '${1}const Scalar& ${2}' fastmod --extensions cpp '([a-zA-Z_+]\([^)],?\s)optional<Scalar> (\w+)' '${1}const optional<Scalar>& ${2}' ``` As you can tell, it codemods both `Scalar` and `optional<Scalar>`. Apply these commands iteratively until reaching a fixed point (since one method signature might contain multiple `Scalar` parameters). In retrospect, excluding `third_party` and `torch/csrc/jit` would have been a good idea. (I reverted it manually later; see https://github.com/pytorch/pytorch/pull/53479 as a reference). ## Pre-Step Some `Scalar` parameters are written as `at::Scalar` or `c10::Scalar`, so I codemodded some of them before applying the main command.
Here is an incomplete list: ``` fastmod --extensions h '([a-zA-Z_+]\([^)],?\s)at::Scalar (\w+)' '${1}const at::Scalar& ${2}' fastmod --extensions cpp '([a-zA-Z_+]\([^)],?\s)at::Scalar (\w+)' '${1}const at::Scalar& ${2}' fastmod --extensions h '([a-zA-Z_+]\([^)],?\s)c10::optional<Scalar> (\w+)' '${1}const c10::optional<Scalar>& ${2}' fastmod --extensions cpp '([a-zA-Z_+]\([^)],?\s)c10::optional<Scalar> (\w+)' '${1}const c10::optional<Scalar>& ${2}' ``` ## Fixup There are a couple of post-codemod fixups. For example, `const Scalar` will be codemodded into `const const Scalar&`. `at::Scalar` will be codemodded into `at::const Scalar&` (if `Pre-step` is not done comprehensively). Here is an incomplete list: ``` fastmod --extensions cpp 'const const Scalar' 'const Scalar' fastmod --extensions h 'const const c10::optional<Scalar>' 'const c10::optional<Scalar>' fastmod --extensions cpp 'const const c10::optional<Scalar>' 'const c10::optional<Scalar>' fastmod 'at::const Scalar&' 'const at::Scalar&' ``` ## Supplementary `cu` and `mm` files also need to be codemodded, for example: ``` fastmod --extensions cu 'at::const Scalar&' 'const at::Scalar&' fastmod --extensions mm '([a-zA-Z_+]$[^)],?\s)Scalar (\w+)' '${1}const Scalar& ${2}' ``` Function pointers are not codemodded. Here is an incomplete list: ``` # Cover case: using index_fill_fn = void()(TensorIterator & iter, int64_t dim, int64_t self_dim_size, int64_t self_dim_stride, Scalar source); fastmod --extensions h '(void\s\(\s\\s$$[^)],?\s)Scalar (\w+)' '${1}const Scalar& ${2}' # Cover case: using softplus_fn = void ()(TensorIterator&, Scalar, Scalar); fastmod --extensions h '(void\s\(\s\\s$$[^)],?\s)Scalar([, $])' '${1}const Scalar&${2}' fastmod --extensions cpp '(void\s$\s\\s$$[^)],?\s)Scalar([, $])' '${1}const Scalar&${2}' fastmod --extensions h '(void\s$\s\\s$$[^)],?\s)optional<Scalar>([, $])' '${1}const optional<Scalar>&${2}' ``` Some corner cases need to be fixed manually.
ghstack-source-id: 123970306 Test Plan: Imported from OSS Reviewed By: smessmer Differential Revision: D26904445 fbshipit-source-id: 8d8a002af4b5125f153a32f03c6956be7ae5671d	2021-03-15 23:17:06 -07:00
Nikita Vedeneev	8f15a2f052	eig_backward: faster and with complex support (#52875 ) Summary: As per title. Compared to the previous version, it is lighter on the usage of `at::solve` and `at::matmul` methods. Fixes https://github.com/pytorch/pytorch/issues/51621 Pull Request resolved: https://github.com/pytorch/pytorch/pull/52875 Reviewed By: mrshenli Differential Revision: D26768653 Pulled By: anjali411 fbshipit-source-id: aab141968d02587440128003203fed4b94c4c655	2021-03-10 11:33:30 -08:00
Joel Schlosser	e86476f736	Huber loss (#50553 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/48595. ## Background This PR implements HuberLoss, which differs from SmoothL1Loss by a factor of beta. The current implementation does not share logic between the two. Feedback is welcome for the optimal way to minimize code duplication while remaining performant. I've done some early [benchmarking](https://pytorch.org/tutorials/recipes/recipes/benchmark.html#collecting-instruction-counts-with-callgrind) with Huber calling in to the Smooth L1 kernel and scaling afterwards; for the simple test case I used, instruction counts are as follows: ``` Huber loss calls dedicated Huber kernel: 2,795,300 Huber loss calls Smooth L1 kernel and scales afterwards: 4,523,612 ``` With these numbers, instruction counts are ~62% higher when using the pre-existing Smooth L1 kernel. Pull Request resolved: https://github.com/pytorch/pytorch/pull/50553 Test Plan: ``` python test/test_nn.py TestNN.test_HuberLoss python test/test_nn.py TestNN.test_HuberLoss_delta python test/test_nn.py TestNN.test_huber_loss_invalid_delta python test/test_nn.py TestNNDeviceTypeCPU.test_smooth_l1_loss_vs_huber_loss_cpu python test/test_nn.py TestNNDeviceTypeCUDA.test_smooth_l1_loss_vs_huber_loss_cuda python test/test_nn.py TestNNDeviceTypeCPU.test_invalid_reduction_strings_cpu python test/test_nn.py TestNNDeviceTypeCUDA.test_invalid_reduction_strings_cuda python test/test_nn.py TestNN.test_loss_equal_input_target_shape python test/test_nn.py TestNN.test_pointwise_loss_broadcast python test/test_overrides.py python test/test_jit.py TestJitGeneratedFunctional.test_nn_huber_loss python test/test_type_hints.py python test/test_cpp_api_parity.py build/bin/test_api ``` ## Documentation <img width="677" alt="Screen Shot 2021-01-14 at 4 25 08 PM" src="https://user-images.githubusercontent.com/75754324/104651224-5a445980-5685-11eb-884b-14ea517958c2.png"> <img width="677" alt="Screen Shot 2021-01-14 at 4 24 35 PM" 
src="https://user-images.githubusercontent.com/75754324/104651190-4e589780-5685-11eb-974d-8c63a89c050e.png"> <img width="661" alt="Screen Shot 2021-01-14 at 4 24 45 PM" src="https://user-images.githubusercontent.com/75754324/104651198-50225b00-5685-11eb-958e-136b36f6f8a8.png"> <img width="869" alt="Screen Shot 2021-01-14 at 4 25 27 PM" src="https://user-images.githubusercontent.com/75754324/104651208-53b5e200-5685-11eb-9fe4-5ff433aa13c5.png"> <img width="862" alt="Screen Shot 2021-01-14 at 4 25 48 PM" src="https://user-images.githubusercontent.com/75754324/104651209-53b5e200-5685-11eb-8051-b0cfddcb07d3.png"> Reviewed By: H-Huang Differential Revision: D26734071 Pulled By: jbschlosser fbshipit-source-id: c98c1b5f32a16f7a2a4e04bdce678080eceed5d5	2021-03-02 17:30:45 -08:00
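The "factor of beta" relationship mentioned in the entry above can be checked with scalar versions of the two losses. This is a minimal sketch using the standard textbook definitions, assuming `beta` and `delta` are the same positive threshold; the function names are hypothetical and this is not the kernel code from the PR.

```python
def smooth_l1(x: float, beta: float) -> float:
    """Scalar Smooth L1: quadratic (scaled by 1/beta) near zero, linear beyond."""
    ax = abs(x)
    return 0.5 * ax * ax / beta if ax < beta else ax - 0.5 * beta


def huber(x: float, delta: float) -> float:
    """Scalar Huber: plain quadratic near zero, linear with slope delta beyond."""
    ax = abs(x)
    return 0.5 * ax * ax if ax <= delta else delta * (ax - 0.5 * delta)


threshold = 1.5
errors = [-3.0, -1.5, -0.3, 0.0, 0.7, 1.5, 2.0]
# Largest deviation between huber(x) and threshold * smooth_l1(x).
max_gap = max(abs(huber(e, threshold) - threshold * smooth_l1(e, threshold))
              for e in errors)
```

Scaling the Smooth L1 result afterwards gives the same values, which is the second option benchmarked in the entry; the instruction counts quoted there are what motivated a dedicated kernel instead.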
kshitij12345	748285ccd7	[complex] add autograd support for torch.polar (#52488 ) Summary: Reference: https://github.com/pytorch/pytorch/issues/33152 Pull Request resolved: https://github.com/pytorch/pytorch/pull/52488 Reviewed By: zou3519 Differential Revision: D26711841 Pulled By: anjali411 fbshipit-source-id: b8538fb8cb44456b832e4f993cf41954b3ddd2e8	2021-03-01 21:57:35 -08:00
Richard Barnes	fa325d7c9f	Use `sum_integers` and `multiply_integers` (#51146 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51146 Test Plan: Sandcastle tests Reviewed By: ngimel Differential Revision: D25903430 fbshipit-source-id: 329c14018c9e5192864eed88a8ed0a5068ff1c69	2021-02-10 18:05:45 -08:00
Alexander	0c313564af	Backward through sparse_coo_tensor (#50361 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/49683 This PR fixes the backward-through-`sparse_coo_tensor` bug by implementing a `sparse_mask_helper` function for n-dimensional sparse tensors on CPU and CUDA, which is used to reimplement the `sparse_constructor_values_backward` function. This `sparse_mask` function was implemented before for the backward of sparse-sparse matmul. However, the algorithm is a little different because in this case it must be applicable not only to matrices but also to n-dimensional tensors. Thankfully it was not hard to extend, and now both share the same code base. Note that no new tests are required because the backward for sparse-sparse matmul now uses the new `sparse_mask_helper`. ngimel, mruberry - kindly review this. Pull Request resolved: https://github.com/pytorch/pytorch/pull/50361 Reviewed By: zhangguanheng66 Differential Revision: D26270483 Pulled By: ngimel fbshipit-source-id: ee4bda49ff86e769342674b64d3c4bc34eae38ef	2021-02-06 23:15:54 -08:00
Peter Bell	b150f150ba	Add division overload with rounding_mode selection (#51706 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51706 Pull Request resolved: https://github.com/pytorch/pytorch/pull/50280 As mentioned in gh-43874, this adds a `rounding_mode={'true', 'trunc', 'floor'}` argument so `torch.div` can be used as a replacement for `floor_divide` during the transitional period. I've included dedicated kernels for truncated and floor division which aren't strictly necessary for float, but do perform significantly better (~2x) than doing true division followed by a separate rounding kernel. Note: I introduce new overloads for `aten::div` instead of just adding a default `rounding_mode` because various JIT passes rely on the exact operator schema. Test Plan: Imported from OSS Reviewed By: ngimel Differential Revision: D26123271 Pulled By: mruberry fbshipit-source-id: 51a83717602114597ec9c4d946e35a392eb01d46	2021-02-04 13:08:36 -08:00
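The three rounding modes named in the entry above differ only in how the quotient is rounded. The following is a scalar model in plain Python, not PyTorch's kernels; the helper `div_scalar` is hypothetical. It accepts both `"true"` (as in this entry) and `None` (which a later entry in this log, #51988, substitutes for `"true"`).

```python
import math


def div_scalar(a: float, b: float, rounding_mode=None) -> float:
    """Hypothetical scalar model of division with a rounding mode."""
    quotient = a / b
    if rounding_mode is None or rounding_mode == "true":
        return quotient                      # no rounding
    if rounding_mode == "trunc":
        return float(math.trunc(quotient))   # round toward zero
    if rounding_mode == "floor":
        return float(math.floor(quotient))   # round toward negative infinity
    raise ValueError(f"unknown rounding_mode: {rounding_mode!r}")


# The modes only disagree when the exact quotient is negative and fractional.
results = {mode: div_scalar(-7.0, 2.0, mode) for mode in (None, "trunc", "floor")}
```

For positive quotients `"trunc"` and `"floor"` agree, so the negative case is the one to test when migrating from `floor_divide`.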


214 Commits