Commit Graph

184 Commits

Nikita Shulga
c9efb58223 Revert D33850228: [pytorch][PR] Implement Tanh Gelu Approximation
Test Plan: revert-hammer

Differential Revision:
D33850228 (23d03025dc)

Original commit changeset: 3cc33fb298e4

Original Phabricator Diff: D33850228 (23d03025dc)

fbshipit-source-id: 9436e7df73c2b2e2011f321674f24973316d3692
2022-01-31 09:38:13 -08:00
Ryan Spring
3a53b3e94f Implement Tanh Gelu Approximation (#61439)
Summary:
1. Implements https://github.com/pytorch/pytorch/issues/39853
2. Adds an `approximate` string flag to Gelu
3. Enables the Tanh Gelu approximation
4. Adds double backward support for Gelu
5. Enables Tanh Gelu in NvFuser

```python
import torch

def normcdf(x):
    # standard normal CDF: Phi(x) = 0.5 * (1 + erf(x / sqrt(2)))
    return 0.5 * (1.0 + torch.erf(x * 0.7071067811865476))  # 0.7071... = 1/sqrt(2)

def gelu(x, approximate: str = 'none'):
    if approximate == 'tanh':
        # sqrt(2/pi) = 0.7978845608028654
        return 0.5 * x * (1.0 + torch.tanh(0.7978845608028654 * (x + 0.044715 * torch.pow(x, 3.0))))
    else:
        return x * normcdf(x)
```
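
Assuming the flag lands in the functional API as described, usage would look like the sketch below (hedged; only the `approximate` kwarg is new here):

```python
import torch
import torch.nn.functional as F

x = torch.randn(8)
y_exact = F.gelu(x)                      # default: exact erf-based GELU
y_tanh = F.gelu(x, approximate='tanh')   # tanh approximation
print(torch.allclose(y_exact, y_tanh, atol=1e-3))
```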

Linking XLA PR - https://github.com/pytorch/xla/pull/3039

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61439

Reviewed By: cpuhrsch

Differential Revision: D33850228

Pulled By: jbschlosser

fbshipit-source-id: 3cc33fb298e480d7ecc5c67716da019d60c6ab33
2022-01-31 09:00:32 -08:00
Joel Schlosser
e9fb2d1db1 Revert D33744717: [pytorch][PR] Implement Tanh Gelu Approximation
Test Plan: revert-hammer

Differential Revision:
D33744717 (f499ab9cef)

Original commit changeset: d64532a562ed

Original Phabricator Diff: D33744717 (f499ab9cef)

fbshipit-source-id: 396c3f63de5865f894dbc353d0790a01a624be93
2022-01-28 10:32:14 -08:00
Ryan Spring
4713dd9cca Implement Tanh Gelu Approximation (#61439)
Summary:
1. Implements https://github.com/pytorch/pytorch/issues/39853
2. Adds an `approximate` string flag to Gelu
3. Enables the Tanh Gelu approximation
4. Adds double backward support for Gelu
5. Enables Tanh Gelu in NvFuser

```python
import torch

def normcdf(x):
    # standard normal CDF: Phi(x) = 0.5 * (1 + erf(x / sqrt(2)))
    return 0.5 * (1.0 + torch.erf(x * 0.7071067811865476))  # 0.7071... = 1/sqrt(2)

def gelu(x, approximate: str = 'none'):
    if approximate == 'tanh':
        # sqrt(2/pi) = 0.7978845608028654
        return 0.5 * x * (1.0 + torch.tanh(0.7978845608028654 * (x + 0.044715 * torch.pow(x, 3.0))))
    else:
        return x * normcdf(x)
```

Linking XLA PR - https://github.com/pytorch/xla/pull/3039

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61439

Reviewed By: mikaylagawarecki

Differential Revision: D33744717

Pulled By: jbschlosser

fbshipit-source-id: d64532a562ed53247bb4fa52bb16722634d5c187
2022-01-28 08:55:48 -08:00
kshitij12345
ba0407a86e index_backward: use out-of-place index_put if any input is subclass (#71779)
Summary:
Reference: https://github.com/pytorch/functorch/issues/393

Context:

The derivative of `__getitem__`/`index` is
f5a71ec2d6/tools/autograd/derivatives.yaml (L733-L734)

where `index_backward` is defined as
f5a71ec2d6/torch/csrc/autograd/FunctionsManual.cpp (L3892-L3894)

The problem arises when `grad` is not a BatchedTensor but one of the other inputs is. In that case, `grad.new_zeros` returns an unbatched tensor, and the call to the in-place `_index_put_impl_` errors, as it expects `zeros_like_self` to be batched.

To avoid this, we dispatch to the out-of-place `index_put` if any of the input tensors is subclassed; otherwise, we dispatch to the in-place `_index_put_impl_`.
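
A rough Python sketch of the resulting dispatch (the actual change is C++ in `FunctionsManual.cpp`; `is_tensor_subclass_like` is an illustrative stand-in for the C++ `areAnyTensorSubclassLike` check):

```python
def index_backward(grad, indices, zeros_like_self):
    # illustrative sketch, not the actual ATen code
    if any(is_tensor_subclass_like(t) for t in (grad, zeros_like_self, *indices)):
        # out-of-place index_put: works even when grad is unbatched
        # but an index (or zeros_like_self) is a BatchedTensor
        return zeros_like_self.index_put(indices, grad, accumulate=True)
    # fast path: in-place update (the real code calls _index_put_impl_)
    return zeros_like_self.index_put_(indices, grad, accumulate=True)
```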

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71779

Reviewed By: albanD

Differential Revision: D33790596

Pulled By: zou3519

fbshipit-source-id: 9d6d81b758740cab7b3db9b905f1e8053f82b835
2022-01-28 08:15:43 -08:00
soulitzer
73fd3e021c Fix forward AD for cudnn batch norm (#71901)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71901

We didn't catch this initially because CuDNN is not being tested on CI.

The following tests fail on master (if we build with CuDNN), but pass with this PR:
- `test_forward_mode_AD_nn_functional_batch_norm_cuda_float64`
- `test_forward_mode_AD_nn_functional_instance_norm_cuda_float64`

I don't think it is documented anywhere, but from the tests passing now, I'm going to guess that `result1` and `result2` return `mean` and `invstd` respectively. Previously, I thought the mean and variance were returned because the variables were named `saved_mean` and `saved_var`.

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D33818652

Pulled By: soulitzer

fbshipit-source-id: ecee760f5aec620dc70f57de4fb3573c8f2f5f31
2022-01-27 15:52:47 -08:00
lezcano
391319ed8f Implement forward AD for linalg.svd and improve svd_backward (#70253)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70253

I included a derivation of the formula in the complex case, as it is
particularly tricky. As far as I know, this is the first time this formula
is derived in the literature.

I also implemented a more efficient and more accurate version of svd_backward.
More importantly, I added a lax check in the complex case, making sure the loss
function depends only on the subspaces spanned by the pairs of singular
vectors, and not on their joint phase.

cc jianyuh nikitaved pearu mruberry walterddr IvanYashchuk xwang233 Lezcano

Test Plan: Imported from OSS

Reviewed By: mikaylagawarecki

Differential Revision: D33751982

Pulled By: mruberry

fbshipit-source-id: c2a4a92a921a732357e99c01ccb563813b1af512
2022-01-27 10:37:08 -08:00
lezcano
a1860bd567 Rewrite svd and linalg.svd as structured kernels (#69827)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69827

In general, the current pattern allows for implementing optimisations
for all the backends in a common place (see for example the optimisation
for empty matrices).

After this PR, `torch.svd` is implemented in terms of `linalg.svd` and
`linalg.svdvals`, as expected. This makes it differentiable in the case
when `compute_uv=False`, although this is not particularly important, as
`torch.svd` will eventually be deprecated.

This PR also instantiates smaller `U` / `V` when calling cusolver_gesvdj
in the cases when `full_matrices=False` or `compute_uv=False`.

The memory for the auxiliary `U` and `V` needed by some cuSOLVER routines in
the cases above is allocated via raw allocators rather than through fully
fledged tensors, as it's just a blob of memory that the algorithm requests.
As the code is better structured now, it was easier to see that `U` and
`Vh` needn't be allocated when calling `svd_cusolver_gesvd`.

Now `linalg.svdvals` works as expected wrt the `out=` parameter.
Note that in the test `test_svd_memory_allocation` we were
passing a tensor of the wrong size and dtype, and the test seemed to
pass...

This PR also changes the backward formula to avoid saving the input
matrix, as it's not necessary. In a follow up PR, I will clean the
backward formula and make it more numerically stable and efficient.

This PR also does a number of memory optimisations here and there, and fixes
the call to cusolver_gesvd, which was incorrect for m <= n. To test
this path, I compiled the code with a flag to unconditionally execute
the `if (!gesvdj_convergence_check.empty())` branch, and all the tests
passed.

I also took this chance to simplify the tests for these functions in
`test_linalg.py`, as we had lots of tests that were testing some
functionality that is already currently tested in the corresponding
OpInfos. I used xwang233's feature to test both MAGMA and CUDA
backends. This is particularly good for SVD, as cuSOLVER is always
chosen over MAGMA when available, so testing MAGMA otherwise would be
tricky.

cc jianyuh nikitaved pearu mruberry walterddr IvanYashchuk xwang233 Lezcano

Test Plan: Imported from OSS

Reviewed By: mikaylagawarecki

Differential Revision: D33751983

Pulled By: mruberry

fbshipit-source-id: 11d48d977946345583d33d14fb11a170a7d14fd2
2022-01-27 10:35:47 -08:00
Mikayla Gawarecki
ddcddac726 Add new reduce options and autograd support for scatter_reduce (#71788)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/71788

Test Plan: Imported from OSS

Reviewed By: mikaylagawarecki

Differential Revision: D33778525

Pulled By: cpuhrsch

fbshipit-source-id: 47b8544e29df3075bc6ede894c59499a7ffec876
2022-01-27 09:34:01 -08:00
soulitzer
60765438e8 Add forward AD formulas for some losses (#71026)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71026

...and fmod

Testing:
- L1Loss: new module tests (linear in the real case only)
- SmoothL1Loss: new module tests
- MSELoss: tested - OpInfo + new module tests
- huberloss: tested - OpInfo + new module tests
- multi-margin-loss: new module tests
- kl-div: OpInfo + new module tests
- fmod: OpInfo
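
For reference, the forward AD formulas for these ops can be exercised through the public dual-number API; a minimal sketch using `mse_loss`:

```python
import torch
import torch.autograd.forward_ad as fwAD

x, target = torch.randn(5), torch.randn(5)
tangent = torch.randn(5)  # direction in which to compute the JVP

with fwAD.dual_level():
    dual_x = fwAD.make_dual(x, tangent)
    loss = torch.nn.functional.mse_loss(dual_x, target)
    primal, jvp = fwAD.unpack_dual(loss)
print(primal, jvp)
```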

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D33485661

Pulled By: soulitzer

fbshipit-source-id: 542ef5148183b9f574d06b2e2e345d0d889537b7
2022-01-26 08:29:46 -08:00
lezcano
97585ae1e7 Simplify forward / backward AD for linalg.eigh and add checks (#70528)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70528

This PR adds checks for the backward of `linalg.eigh`, similar to those
deduced in https://github.com/pytorch/pytorch/pull/70253

It also makes its implementation parallel that of the (fwd/bwd) derivative of
`torch.linalg.eig`, and it makes most OpInfo tests pass.

cc jianyuh nikitaved pearu mruberry walterddr IvanYashchuk xwang233 Lezcano

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D33530149

Pulled By: albanD

fbshipit-source-id: 1f368b8d450d4e9e8ae74d3881c78513c27eb956
2022-01-12 08:35:52 -08:00
lezcano
061be8d600 Correct forward AD for linalg.eig and add checks (#70527)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70527

This PR adds checks for the backward of `linalg.eig`, similar to those
deduced in https://github.com/pytorch/pytorch/pull/70253

It also modifies the function so that it does not save the input matrix,
as it's not necessary.

It also corrects the forward AD formula for it to be correct. Now all
the tests pass for `linalg.eig` and `linalg.eigvals`.

It also updates the docs to reflect better what's going on here.

cc jianyuh nikitaved pearu mruberry walterddr IvanYashchuk xwang233 Lezcano

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D33530148

Pulled By: albanD

fbshipit-source-id: 984521a04f81ecb28ac1c4402b0243c63dd6959d
2022-01-12 08:30:55 -08:00
soulitzer
78994d13c0 Add forward AD formulas for {batch,layer,group}_norm (#70355)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70355

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D33405362

Pulled By: soulitzer

fbshipit-source-id: 55a92e88a04e7b15a0a223025d66c14f7db2a190
2022-01-10 13:52:16 -08:00
soulitzer
3051aabd0e Add forward AD formulas for convolution and some others (#69956)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69956

Test Plan: Imported from OSS

Reviewed By: albanD, bdhirsh

Differential Revision: D33235974

Pulled By: soulitzer

fbshipit-source-id: ea60d687edc5d62d92f3fd3cb6640421d32c908c
2022-01-06 08:39:51 -08:00
Amir Khojaste
748790588c Upgrading the loop to use irange (#70326)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70326

See D24145988 for context: it allows loops such as `for(int i = 0; i < 10; i++)` to be expressed as `for(const auto i : c10::irange(10))`. This is nice because it auto-types the loops and adds const-safety to the iteration variable.

Test Plan: buck run //caffe2/torch/fb/sparsenn:test

Reviewed By: r-barnes

Differential Revision: D33243400

fbshipit-source-id: b1f1b4163f4bf662031baea9e5268459b40c69a3
2022-01-06 07:06:53 -08:00
lezcano
a35b4b49d2 Add linalg.lu_factor (#66933)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66933

This PR exposes `torch.lu` as `torch.linalg.lu_factor` and
`torch.linalg.lu_factor_ex`.

This PR also adds support for matrices with zero elements both in
the size of the matrix and the batch. Note that this function simply
returns empty tensors of the correct size in this case.

We add a test and an OpInfo for the new function.

This PR also adds documentation for this new function, in line with the
documentation in the rest of `torch.linalg`.

Fixes https://github.com/pytorch/pytorch/issues/56590
Fixes https://github.com/pytorch/pytorch/issues/64014
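
A minimal usage sketch of the new function, reusing the factorization through the existing `torch.lu_solve`:

```python
import torch

A = torch.randn(3, 3, dtype=torch.float64)
LU, pivots = torch.linalg.lu_factor(A)

# solve A x = b by reusing the factorization
b = torch.randn(3, 2, dtype=torch.float64)
x = torch.lu_solve(b, LU, pivots)
print(torch.allclose(A @ x, b))
```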

cc jianyuh nikitaved pearu mruberry walterddr IvanYashchuk xwang233 Lezcano

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D32834069

Pulled By: mruberry

fbshipit-source-id: 51ef12535fa91d292f419acf83b800b86ee9c7eb
2022-01-05 20:32:12 -08:00
Richard Zou
29f1ccc8f0 Fix some Composite Compliance problems with binary_cross_entropy backward (#70198)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70198

This PR fixes composite compliance problems with:
- binary_cross_entropy's backward formula
- binary_cross_entropy_with_logits's backward formula
- binary_cross_entropy's double backward formula

It does so by adding checks for areAnyTensorSubclassLike.

Test Plan:
- I tested everything with functorch.
- We are going to do https://github.com/pytorch/pytorch/issues/69530 in
the future so we have a way of testing this in core. I need the
binary_cross_entropy ones for something right now and didn't want to
wait until we come up with a solution for #69530.

Reviewed By: Chillee

Differential Revision: D33246995

Pulled By: zou3519

fbshipit-source-id: 310ed3196b937d01b189870b86a6c5f77f9258b4
2021-12-22 07:24:04 -08:00
Joel Schlosser
4d5dd00e61 Remove backward ops for cuDNN transposed convolution (#69902)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69902

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D33093795

Pulled By: jbschlosser

fbshipit-source-id: 8b90150bd1996e48c0c888bdab4e95a849d10ef5
2021-12-15 17:48:25 -08:00
Joel Schlosser
3dc3651e0e Remove backward ops for cuDNN convolution (#69901)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69901

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D33093796

Pulled By: jbschlosser

fbshipit-source-id: f5beab6f3078144b6c8e5c4c51d69823815a9f99
2021-12-15 17:46:49 -08:00
soulitzer
b399a4d7b9 Add some reduction forward AD formulas (#69661)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69661

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D33020601

Pulled By: soulitzer

fbshipit-source-id: 110da6dcd490e5c3849cace62a777aa1a2b6982e
2021-12-14 23:33:43 -08:00
Richard Zou
41e1ab0785 Introduce isTensorSubclassLike; add special cases to backwards formulas (#69534)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69534

Something is TensorSubclassLike if it is a Tensor subclass or if it has
the same problems as Tensor subclasses. Today that just includes Tensor
Subclasses and meta tensors but may include other things in the future.

Some of our backwards formulas are incompatible with TensorSubclassLike
objects. For example, calling .data_ptr() is a problem because many
TensorSubclassLike objects don't have storage. Another problem is
in-place operations: performing `regular_tensor.inplace_(tensor_subclass)`
is ill-defined for such objects.

This PR adds special cases to the backward formulas for torch.max and
torch.clamp to handle this. The backward formulas for torch.max and
torch.clamp are not dispatcher operations so they cannot be overridden
and we hesitate to make them dispatcher operations for FC/BC concerns
and performance overhead concerns.

Furthermore, the old concept of "is this in-place operation vmap
compatible?" is subsumed by the more general "is this in-place operation
tensor-subclass compatible?" question, so I removed all instances of
isInplaceVmapCompatible and replaced them with isTensorSubclassLike
checks.
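
A hedged Python analogue of the guard pattern (the real formulas live in C++; the helper name and the exact formula below are illustrative):

```python
import torch

def is_tensor_subclass_like(t):
    # illustrative stand-in for the C++ isTensorSubclassLike
    return type(t) is not torch.Tensor or t.is_meta

def clamp_backward_sketch(grad, self, lo, hi):
    if is_tensor_subclass_like(self) or is_tensor_subclass_like(grad):
        # subclass-safe path: no in-place ops on possibly-subclassed tensors
        mask = (self >= lo) & (self <= hi)
    else:
        # fast path: build the mask in place
        mask = (self >= lo).logical_and_(self <= hi)
    return torch.where(mask, grad, torch.zeros_like(grad))
```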

Test Plan
- I tested the changes using functorch.
- It's possible to write a test for these in core (one has to make
a custom tensor subclass and then send it through the operation and then
invoke autograd), but I wanted to push the work to doing some
generic testing for backward formulas
(https://github.com/pytorch/pytorch/issues/69530) instead of doing some
one-off things now.

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D32967727

Pulled By: zou3519

fbshipit-source-id: 30fda1a7581da4c55179b7a3ca05069150bbe2dc
2021-12-09 15:03:22 -08:00
lezcano
cafcf599d0 Deprecate torch.triangular_solve (#63570)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63570

There is a use of `at::triangular_solve_out` in the file
`torch/csrc/jit/tensorexpr/external_functions.cpp` that I have not dared
to move to `at::linalg_solve_triangular_out`.

**Deprecation note:**

This PR deprecates the `torch.triangular_solve` function in favor of
`torch.linalg.solve_triangular`. An upgrade guide is added to the
documentation for `torch.triangular_solve`.

Note that it DOES NOT remove `torch.triangular_solve`, but
`torch.triangular_solve` will be removed in a future PyTorch release.
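
A hedged migration sketch (note the swapped argument order and the namedtuple return of the old API):

```python
import torch

A = torch.randn(3, 3).tril()
A.diagonal().add_(3.0)  # keep the system well-conditioned
B = torch.randn(3, 2)

# deprecated API: returns (solution, cloned_coefficient)
X_old = torch.triangular_solve(B, A, upper=False).solution

# replacement
X_new = torch.linalg.solve_triangular(A, B, upper=False)
print(torch.allclose(X_old, X_new))
```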

cc jianyuh nikitaved pearu mruberry walterddr IvanYashchuk xwang233 Lezcano

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D32618035

Pulled By: anjali411

fbshipit-source-id: 0bfb48eeb6d96eff3e96e8a14818268cceb93c83
2021-12-02 13:24:55 -08:00
lezcano
f9e69af22e Modify LU_backward and lu_solve_backward to use linalg_solve_triangular (#63569)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63569

This PR also rewrites `lu_solve_backward` from scratch going from
solving 5 systems of equations to just 2.

cc jianyuh nikitaved pearu mruberry walterddr IvanYashchuk xwang233 Lezcano

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D32618014

Pulled By: anjali411

fbshipit-source-id: 0e915bcf7045a4db43ffd076d807beac816c8538
2021-12-01 07:34:38 -08:00
Mike Ruberry
6ae34ea6f8 Revert D32521980: Add linalg.lu_factor
Test Plan: revert-hammer

Differential Revision:
D32521980 (b10929a14a)

Original commit changeset: 26a49ebd87f8

fbshipit-source-id: e1a6bb9c2ece9bd78190fe17e16a46e3358c5c82
2021-11-28 17:22:15 -08:00
lezcano
b10929a14a Add linalg.lu_factor (#66933)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66933

This PR exposes `torch.lu` as `torch.linalg.lu_factor` and
`torch.linalg.lu_factor_ex`.

This PR also adds support for matrices with zero elements both in
the size of the matrix and the batch. Note that this function simply
returns empty tensors of the correct size in this case.

We add a test and an OpInfo for the new function.

This PR also adds documentation for this new function, in line with the
documentation in the rest of `torch.linalg`.

Fixes https://github.com/pytorch/pytorch/issues/56590
Fixes https://github.com/pytorch/pytorch/issues/64014

cc jianyuh nikitaved pearu mruberry walterddr IvanYashchuk xwang233 Lezcano

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D32521980

Pulled By: mruberry

fbshipit-source-id: 26a49ebd87f8a41472f8cd4e9de4ddfb7f5581fb
2021-11-27 17:52:48 -08:00
lezcano
b46c89d950 Add linalg.solve_triangular (#63568)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63568

This PR adds the first solver with structure to `linalg`. This solver
has an API compatible with that of `linalg.solve` preparing these for a
possible future merge of the APIs. The new API:
- Just returns the solution, rather than the solution and a copy of `A`
- Removes the confusing `transpose` argument and replaces it with correct
handling of conj and strides within the call
- Adds a `left=True` kwarg. This can be achieved via transposes of the
inputs and the result, but it's exposed for convenience (see the sketch below).
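
A minimal sketch of the `left=False` equivalence mentioned above:

```python
import torch

A = torch.randn(3, 3).triu()
A.diagonal().add_(3.0)
B = torch.randn(2, 3)

# X @ A == B  (left=False) is the transpose of  A.mT @ X.mT == B.mT
X = torch.linalg.solve_triangular(A, B, upper=True, left=False)
X_ref = torch.linalg.solve_triangular(A.mT, B.mT, upper=False).mT
print(torch.allclose(X, X_ref))
```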

This PR also implements a dataflow that minimises the number of copies
needed before calling LAPACK / MAGMA / cuBLAS and takes advantage of the
conjugate and neg bits.

This algorithm is implemented for `solve_triangular` (which, for this, is
the most complex of all the solvers due to the `upper` parameters).
Once more solvers are added, we will factor out this calling algorithm,
so that all of them can take advantage of it.

Given the complexity of this algorithm, we implement some thorough
testing. We also added tests for all the backends, which was not done
before.

We also add forward AD support for `linalg.solve_triangular` and improve the
docs of `linalg.solve_triangular`. We also fix a few issues with those of
`torch.triangular_solve`.

Resolves https://github.com/pytorch/pytorch/issues/54258
Resolves https://github.com/pytorch/pytorch/issues/56327
Resolves https://github.com/pytorch/pytorch/issues/45734

cc jianyuh nikitaved pearu mruberry walterddr IvanYashchuk xwang233 Lezcano

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D32588230

Pulled By: mruberry

fbshipit-source-id: 69e484849deb9ad7bb992cc97905df29c8915910
2021-11-22 12:41:06 -08:00
soulitzer
7bb401a4c9 Add forward AD support for miscellanous operators (#67820)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67820

Original PR here: https://github.com/pytorch/pytorch/pull/67040

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D32314423

Pulled By: soulitzer

fbshipit-source-id: ecd898dc903692cab084f6922a1d86986f957b1b
2021-11-19 14:31:06 -08:00
jiej
ca92111758 Add native_dropout (#63937)
Summary:
Adds native_dropout to have a reasonable target for torchscript in autodiff. native_dropout has scale and train as arguments in its signature; this makes native_dropout more consistent with other operators and removes conditionals in the autodiff definition.
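
For context, a sketch of the computation such an op performs (standard inverted dropout; the names and signature below are illustrative, not the ATen ones):

```python
import torch

def dropout_sketch(x, p, train):
    # inverted dropout: keep each element with prob 1-p, scale by 1/(1-p)
    if not train or p == 0.0:
        return x, torch.ones_like(x, dtype=torch.bool)
    mask = torch.rand_like(x) >= p
    return x * mask / (1.0 - p), mask
```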

cc gmagogsfm

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63937

Reviewed By: mruberry

Differential Revision: D32477657

Pulled By: ngimel

fbshipit-source-id: d37b137a37acafa50990f60c77f5cea2818454e4
2021-11-18 19:41:10 -08:00
Jane Xu
9f4e004abd Revert D32283178: Add linalg.solve_triangular
Test Plan: revert-hammer

Differential Revision:
D32283178 (0706607abc)

Original commit changeset: deb672e6e52f

fbshipit-source-id: d2a3421292147426cc61c2f063b721acf9004755
2021-11-18 14:46:10 -08:00
lezcano
0706607abc Add linalg.solve_triangular (#63568)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63568

This PR adds the first solver with structure to `linalg`. This solver
has an API compatible with that of `linalg.solve` preparing these for a
possible future merge of the APIs. The new API:
- Just returns the solution, rather than the solution and a copy of `A`
- Removes the confusing `transpose` argument and replaces it with correct
handling of conj and strides within the call
- Adds a `left=True` kwarg. This can be achieved via transposes of the
inputs and the result, but it's exposed for convenience.

This PR also implements a dataflow that minimises the number of copies
needed before calling LAPACK / MAGMA / cuBLAS and takes advantage of the
conjugate and neg bits.

This algorithm is implemented for `solve_triangular` (which, for this, is
the most complex of all the solvers due to the `upper` parameters).
Once more solvers are added, we will factor out this calling algorithm,
so that all of them can take advantage of it.

Given the complexity of this algorithm, we implement some thorough
testing. We also added tests for all the backends, which was not done
before.

We also add forward AD support for `linalg.solve_triangular` and improve the
docs of `linalg.solve_triangular`. We also fix a few issues with those of
`torch.triangular_solve`.

Resolves https://github.com/pytorch/pytorch/issues/54258
Resolves https://github.com/pytorch/pytorch/issues/56327
Resolves https://github.com/pytorch/pytorch/issues/45734

cc jianyuh nikitaved pearu mruberry walterddr IvanYashchuk xwang233 Lezcano

Test Plan: Imported from OSS

Reviewed By: zou3519, JacobSzwejbka

Differential Revision: D32283178

Pulled By: mruberry

fbshipit-source-id: deb672e6e52f58b76536ab4158073927a35e43a8
2021-11-18 09:45:51 -08:00
Nikita Vedeneev
857fed1f42 torch.linalg.qr: forward AD support (#67268)
Summary:
As per title.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67268

Reviewed By: ngimel

Differential Revision: D31960517

Pulled By: albanD

fbshipit-source-id: bfd1028a8d352f550efb420f9ca609c09f4a7484
2021-11-18 08:11:54 -08:00
Matthias Reis
4c346bd073 Added forward derivatives for neg, diag, inverse, linalg_eig (#67837)
Summary:
Recreated due to CI failures as per comment https://github.com/pytorch/pytorch/pull/67339#issuecomment-959893293

===

See also discussion in https://github.com/pytorch/pytorch/issues/10223, starting from [this](https://github.com/pytorch/pytorch/issues/10223#issuecomment-949499666) comment

The formulas for the derivatives are taken from https://people.maths.ox.ac.uk/gilesm/files/NA-08-01.pdf.

As indicated, the method linalg_eig_jvp should be used instead of linalg_eig_jvp_eigenvalues and linalg_eig_jvp_eigenvectors in the future. Due to a codegen limitation, this is not yet possible.

CC albanD Lezcano

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67837

Reviewed By: mrshenli

Differential Revision: D32403662

Pulled By: soulitzer

fbshipit-source-id: 529cb93f865ce4cc2e24fa6f672d4234e7abe2b1
2021-11-16 20:32:47 -08:00
Masaki Kozuki
c5e5264be2 Disable TF32 in pinv_jvp and pinv_backward (#67948)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/67947
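
For context, TF32 truncates matmul precision on Ampere GPUs, which is too lossy for the numerically sensitive `pinv` derivative formulas. This PR disables it inside those formulas in C++; the user-facing Python switches for the same behaviour are:

```python
import torch

# globally disable TF32 for CUDA matmuls (cuDNN has a separate flag)
torch.backends.cuda.matmul.allow_tf32 = False
torch.backends.cudnn.allow_tf32 = False
```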

cc ptrblck xwang233 zasdfgbnm

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67948

Reviewed By: H-Huang

Differential Revision: D32251934

Pulled By: ngimel

fbshipit-source-id: a2b1a118337b38db61350c9e49f1ba19030d70ec
2021-11-08 22:33:29 -08:00
Natalia Gimelshein
98be5216e2 Revert D32104006: [pytorch][PR] Added forward derivatives for neg, diag, inverse, linalg_eig
Test Plan: revert-hammer

Differential Revision:
D32104006 (88c61b8d06)

Original commit changeset: 1f6ace09ee3e

fbshipit-source-id: f9f950b4177e1fe29b9059f4b5dfb9c8c67f479a
2021-11-03 12:40:00 -07:00
Matthias Reis
88c61b8d06 Added forward derivatives for neg, diag, inverse, linalg_eig (#67339)
Summary:
See also discussion in https://github.com/pytorch/pytorch/issues/10223, starting from [this](https://github.com/pytorch/pytorch/issues/10223#issuecomment-949499666) comment

The formulas for the derivatives are taken from https://people.maths.ox.ac.uk/gilesm/files/NA-08-01.pdf.

As indicated, the method linalg_eig_jvp should be used instead of linalg_eig_jvp_eigenvalues and linalg_eig_jvp_eigenvectors in the future. Due to a codegen limitation, this is not yet possible.

CC albanD Lezcano

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67339

Reviewed By: ejguan

Differential Revision: D32104006

Pulled By: albanD

fbshipit-source-id: 1f6ace09ee3e737b99520543b30550601809ceb5
2021-11-03 11:21:54 -07:00
Nikita Vedeneev
3c61700cf7 torch.linalg.householder_product: forward AD support (#67043)
Summary:
As per title.

cc ezyang albanD zou3519 gqchen pearu nikitaved soulitzer Lezcano Varal7 jianyuh mruberry walterddr IvanYashchuk xwang233

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67043

Reviewed By: VitalyFedyunin

Differential Revision: D31897617

Pulled By: albanD

fbshipit-source-id: ef135fe3d9e5b9b2a541c355017f07cdb1309979
2021-10-26 08:34:00 -07:00
lezcano
d3fc3c4ded Implement forward AD for linalg.matrix_exp (#62716)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62716

cc jianyuh nikitaved pearu mruberry walterddr IvanYashchuk xwang233 Lezcano

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D31823231

Pulled By: mruberry

fbshipit-source-id: 6d19b8988dce773b5716f0522d06febfe167fead
2021-10-21 23:55:36 -07:00
lezcano
0974215c4d Prefer mT and mH over transpose(-2, -1) and transpose(-2, -1).conj() (#64181)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64181

This PR replaces all the calls to:
- `transpose(-2, -1)` or `transpose(-1, -2)` by `mT()` in C++ and `mT` in Python
- `conj().transpose(-2, -1)` or `transpose(-2, -1).conj()` or `conj().transpose(-1, -2)` or `transpose(-1, -2).conj()` by `mH()` in C++ and `mH` in Python.

It also simplifies two pieces of code, and fixes one bug where a pair
of parentheses were missing in the function `make_symmetric_matrices`.
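
As a quick check of the equivalences:

```python
import torch

A = torch.randn(2, 3, 4, dtype=torch.complex64)
print(torch.equal(A.mT, A.transpose(-2, -1)))
print(torch.equal(A.mH, A.transpose(-2, -1).conj()))
```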

Test Plan: Imported from OSS

Reviewed By: H-Huang

Differential Revision: D31692896

Pulled By: anjali411

fbshipit-source-id: e9112c42343663d442dc5bd53ff2b492094b434a
2021-10-18 13:02:25 -07:00
Nikita Vedeneev
7fad47e522 torch.linalg.lstsq: forward/backward AD support (#65054)
Summary:
As per title.

cc ezyang albanD zou3519 gqchen pearu nikitaved soulitzer Lezcano Varal7 jianyuh mruberry walterddr IvanYashchuk xwang233

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65054

Reviewed By: zou3519

Differential Revision: D31729468

Pulled By: albanD

fbshipit-source-id: ab7df824bc80128e7f64f6444c7a4baa4786c161
2021-10-18 11:28:44 -07:00
Nikita Vedeneev
06c37876b8 torch.linalg.householder_product faster backward (#63880)
Summary:
This PR implements a much more efficient algorithm, which achieves massive speed-ups, especially for batched and/or larger double-precision inputs.
Here are some benchmarks:

<details>

<summary>Testing script</summary>

```python
from IPython import get_ipython
import torch
import itertools

torch.manual_seed(13)
#torch.set_num_threads(1)

ipython = get_ipython()

cpu = torch.device('cpu')
cuda = torch.device('cuda')

def generate_input(shape, dtype=torch.double, device=cpu):
    eigvals = torch.rand(*shape[:-1], dtype=dtype, device=device)
    eigvecs = torch.rand(*shape, dtype=dtype, device=device)
    input = (eigvecs * eigvals.unsqueeze(-2)) @ eigvecs.inverse()
    input.requires_grad_(True)
    tau = torch.rand(*shape[:-1], dtype=dtype, device=device)
    tau.requires_grad_(True)
    return input, tau

def run_test(shape, device, dtype):
    print(f"shape: {shape}, device: {device}, dtype: {dtype}")
    a, tau = generate_input(shape, dtype=dtype, device=device)
    prod = torch.linalg.householder_product(a, tau)
    ones_prod = torch.ones_like(prod)

    command = "torch.autograd.backward((prod,), (ones_prod), retain_graph=True)"
    if device == cuda:
        command = command + "; torch.cuda.synchronize()"
    ipython.magic(f"timeit {command}")
    print()

dtypes = [torch.float, torch.double]
devices = [cpu, cuda]
#devices = [cuda]
sizes = [
    (10, 10),
    (1000, 10, 10),
    (100, 100),
    (1000, 100, 100),
    (1000, 1000),
    (10, 1000, 1000),
]

for device, dtype, size in itertools.product(devices, dtypes, sizes):
    run_test(size, device, dtype)

```

</details>

<details>

<summary>This PR, cuda float32</summary>

```
shape: (10, 10), device: cuda, dtype: torch.float32
1.33 ms ± 1.82 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

shape: (1000, 10, 10), device: cuda, dtype: torch.float32
1.52 ms ± 40.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

shape: (100, 100), device: cuda, dtype: torch.float32
10.8 ms ± 9.62 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

shape: (1000, 100, 100), device: cuda, dtype: torch.float32
127 ms ± 8.45 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

shape: (1000, 1000), device: cuda, dtype: torch.float32
151 ms ± 127 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

shape: (10, 1000, 1000), device: cuda, dtype: torch.float32
981 ms ± 91.4 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)

```

</details>

<details>

<summary>Master, cuda float32</summary>

```
shape: (10, 10), device: cuda, dtype: torch.float32
1.64 ms ± 6.36 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

shape: (1000, 10, 10), device: cuda, dtype: torch.float32
298 ms ± 463 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)

shape: (100, 100), device: cuda, dtype: torch.float32
15.4 ms ± 41.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

shape: (1000, 100, 100), device: cuda, dtype: torch.float32
5.36 s ± 711 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)

shape: (1000, 1000), device: cuda, dtype: torch.float32
1.64 s ± 1.07 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

shape: (10, 1000, 1000), device: cuda, dtype: torch.float32
15.7 s ± 43.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

```

</details>

<details>

<summary>This PR, cuda float64</summary>

```
shape: (10, 10), device: cuda, dtype: torch.float64
1.14 ms ± 1.43 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

shape: (1000, 10, 10), device: cuda, dtype: torch.float64
2.22 ms ± 1.32 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

shape: (100, 100), device: cuda, dtype: torch.float64
10.6 ms ± 11.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

shape: (1000, 100, 100), device: cuda, dtype: torch.float64
287 ms ± 84.9 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)

shape: (1000, 1000), device: cuda, dtype: torch.float64
236 ms ± 41.9 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)

shape: (10, 1000, 1000), device: cuda, dtype: torch.float64
1.88 s ± 88.3 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)

```
</details>

<details>

<summary>Master, cuda float64</summary>

```
shape: (10, 10), device: cuda, dtype: torch.float64
1.58 ms ± 8.21 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

shape: (1000, 10, 10), device: cuda, dtype: torch.float64
308 ms ± 213 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)

shape: (100, 100), device: cuda, dtype: torch.float64
79 ms ± 14.5 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

shape: (1000, 100, 100), device: cuda, dtype: torch.float64
54.2 s ± 1.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

shape: (1000, 1000), device: cuda, dtype: torch.float64
31.5 s ± 698 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)

shape: (10, 1000, 1000), device: cuda, dtype: torch.float64
4min 45s ± 2.48 s per loop (mean ± std. dev. of 7 runs, 1 loop each)

```
</details>

<details>

<summary>This PR, cpu float32</summary>

```
shape: (10, 10), device: cpu, dtype: torch.float32
476 µs ± 21.4 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)

shape: (1000, 10, 10), device: cpu, dtype: torch.float32
5.1 ms ± 100 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

shape: (100, 100), device: cpu, dtype: torch.float32
4.38 ms ± 4.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

shape: (1000, 100, 100), device: cpu, dtype: torch.float32
1.55 s ± 6.64 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

shape: (1000, 1000), device: cpu, dtype: torch.float32
745 ms ± 407 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)

shape: (10, 1000, 1000), device: cpu, dtype: torch.float32
5.44 s ± 15.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

```
</details>

<details>

<summary>Master, cpu float32</summary>

```
shape: (10, 10), device: cpu, dtype: torch.float32
387 µs ± 645 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)

shape: (1000, 10, 10), device: cpu, dtype: torch.float32
12.3 ms ± 23.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

shape: (100, 100), device: cpu, dtype: torch.float32
39.4 ms ± 80.3 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

shape: (1000, 100, 100), device: cpu, dtype: torch.float32
29.1 s ± 44.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

shape: (1000, 1000), device: cpu, dtype: torch.float32
9.42 s ± 14.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

shape: (10, 1000, 1000), device: cpu, dtype: torch.float32
1min 50s ± 282 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

```
</details>

<details>

<summary>This PR, cpu float64</summary>

```
shape: (10, 10), device: cpu, dtype: torch.float64
381 µs ± 761 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)

shape: (1000, 10, 10), device: cpu, dtype: torch.float64
6.19 ms ± 13.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

shape: (100, 100), device: cpu, dtype: torch.float64
4.6 ms ± 3.26 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

shape: (1000, 100, 100), device: cpu, dtype: torch.float64
2.59 s ± 8.25 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

shape: (1000, 1000), device: cpu, dtype: torch.float64
1.07 s ± 5.09 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

shape: (10, 1000, 1000), device: cpu, dtype: torch.float64
14.4 s ± 13.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

```
</details>

<details>

<summary>Master, cpu float64</summary>

```
shape: (10, 10), device: cpu, dtype: torch.float64
395 µs ± 1.04 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

shape: (1000, 10, 10), device: cpu, dtype: torch.float64
14.6 ms ± 9.76 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

shape: (100, 100), device: cpu, dtype: torch.float64
45.5 ms ± 154 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

shape: (1000, 100, 100), device: cpu, dtype: torch.float64
33.1 s ± 69.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

shape: (1000, 1000), device: cpu, dtype: torch.float64
19.3 s ± 80.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

shape: (10, 1000, 1000), device: cpu, dtype: torch.float64
3min 30s ± 1.29 s per loop (mean ± std. dev. of 7 runs, 1 loop each)

```
</details>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63880

Reviewed By: soulitzer

Differential Revision: D30639435

Pulled By: anjali411

fbshipit-source-id: 127789943ae56e2f1dd03e0fe76ef7b6db86bcf0
2021-10-15 09:54:30 -07:00
Peter Bell
5f45927d15 Autograd: Delay warnings until the end of backward execution (#66235)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/50209

This adds a new warning handler that stores all warnings in a shared
queue, which can be "replayed" at a later time and, crucially, on
another thread. Then, I use this inside the autograd engine to ensure
that warnings are processed by the handler registered on the main
thread.

For testing, I also add an operator that always warns in the backward
pass and test that the warning is a normal Python warning.
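
A minimal Python-level analogue of that test (the added operator warns from C++; here an autograd `Function` that warns in `backward` stands in for it):

```python
import warnings
import torch

class WarnsInBackward(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        return x.clone()

    @staticmethod
    def backward(ctx, grad_output):
        warnings.warn("warning raised during the backward pass")
        return grad_output

x = torch.randn(3, requires_grad=True)
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    WarnsInBackward.apply(x).sum().backward()
print([str(w.message) for w in caught])  # surfaces as a normal Python warning
```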

cc ezyang albanD zou3519 gqchen pearu nikitaved soulitzer Lezcano Varal7

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66235

Reviewed By: ejguan

Differential Revision: D31505413

Pulled By: albanD

fbshipit-source-id: 1a7f60b038f55c20591c0748b9e86735b3fec2f9
2021-10-13 15:38:04 -07:00
Nikita Vedeneev
1b40daac74 pinv: forward/backward AD which is Frechet-defined in a rank-preserving neighborhood. (#66092)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/65911. Also enables complex support/tests for `linalg_pinv` in OpInfo.

cc ezyang albanD zou3519 gqchen pearu nikitaved soulitzer Lezcano Varal7 jianyuh mruberry walterddr IvanYashchuk xwang233

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66092

Reviewed By: ejguan

Differential Revision: D31503072

Pulled By: albanD

fbshipit-source-id: 52018e826826ae62beaad76becb5edf880be253f
2021-10-11 08:33:28 -07:00
Nikita Vedeneev
1d586e78c6 *_solve methods: implements forward AD (#65546)
Summary:
This PR adds forward AD for `*_solve` methods.
Additionally, `cholesky_solve` gets an OpInfo plus a bug fix for a case where wrong leading dimensions could be passed to LAPACK,
and `lu_solve` gets forward AD with 2x `lu_solve` instead of 1x `lu_solve` + 2x `triangular_solve`.

cc ezyang albanD zou3519 gqchen pearu nikitaved soulitzer Lezcano Varal7 jianyuh mruberry walterddr IvanYashchuk xwang233

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65546

Reviewed By: dagitses

Differential Revision: D31431847

Pulled By: albanD

fbshipit-source-id: 0e343e0d9da3c3d2051fca215fad289d77275251
2021-10-06 16:04:22 -07:00
soulitzer
4cdfceddd2 [Reland] Avoid saving self for softmax and log_softmax (#66018)
Summary:
Reland of https://github.com/pytorch/pytorch/pull/65242

The last attempt at the reland was automatically rebased onto stable, which did not yet have the revert commit.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66018

Reviewed By: albanD

Differential Revision: D31348822

Pulled By: soulitzer

fbshipit-source-id: 881d701b404530c1352ac9245bd67264e1652b8a
2021-10-03 21:35:01 -07:00
Michael Suo
9ae63bd87c Revert D31238123: [pytorch][PR] Avoid saving self for softmax and log_softmax
Test Plan: revert-hammer

Differential Revision:
D31238123 (fb412bdd80)

Original commit changeset: afd319d3676d

fbshipit-source-id: b7980d653a4b8322a225f1dd08c2857ecbe5bc94
2021-09-30 11:34:14 -07:00
soulitzer
fb412bdd80 Avoid saving self for softmax and log_softmax (#65242)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/64000
 - updates the double backward formula to compute the grad w.r.t. the output instead of self (see the sketch below)
 - ~~In some of the error messages, we still refer to the dtype of the input, even though we are now checking the dtype of the output~~
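
The key identity is that the softmax VJP can be written purely in terms of the output `y`, which is why the input never needs to be saved. A quick numeric check:

```python
import torch

x = torch.randn(4, requires_grad=True)
y = torch.softmax(x, dim=0)
g = torch.randn(4)  # upstream gradient dL/dy

# dL/dx = y * (g - sum(g * y))
manual = y * (g - (g * y).sum(dim=0, keepdim=True))
auto, = torch.autograd.grad(y, x, g)
print(torch.allclose(manual, auto))
```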

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65242

Reviewed By: albanD

Differential Revision: D31238123

Pulled By: soulitzer

fbshipit-source-id: afd319d3676d9ef8d81607e0e8c2a3e6d09f68e4
2021-09-29 18:16:12 -07:00
Mike Ruberry
0a0564a347 Revert D31206837: [pytorch][PR] *_solve methods: implements forward AD
Test Plan: revert-hammer

Differential Revision:
D31206837 (26e31f76b0)

Original commit changeset: 040beda97442

fbshipit-source-id: f28091327357af9f54f367eda6606240924b93ac
2021-09-28 23:31:16 -07:00
Nikita Vedeneev
26e31f76b0 *_solve methods: implements forward AD (#65546)
Summary:
This PR adds forward AD for `*_solve` methods.
Additionally, `cholesky_solve` gets an OpInfo plus a bug fix for a case where wrong leading dimensions could be passed to LAPACK,
and `lu_solve` gets forward AD with 2x `lu_solve` instead of 1x `lu_solve` + 2x `triangular_solve`.

cc ezyang albanD zou3519 gqchen pearu nikitaved soulitzer Lezcano Varal7 jianyuh mruberry walterddr IvanYashchuk xwang233

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65546

Reviewed By: gchanan

Differential Revision: D31206837

Pulled By: albanD

fbshipit-source-id: 040beda97442e7a88a9df9abc7bb18313ce55bc3
2021-09-28 06:51:32 -07:00
Ivan Yashchuk
0aef44cb3d Add forward AD for torch.linalg.eigh (#62163)
Summary:
This PR adds forward mode differentiation for `torch.linalg.eigh` and a few other functions required for tests to pass.

For some reason running tests for `torch.linalg.eigvalsh` and complex `torch.linalg.eigh` hangs. These tests are skipped for now.

cc ezyang albanD zou3519 gqchen pearu nikitaved soulitzer Lezcano Varal7 jianyuh mruberry heitorschueroff walterddr IvanYashchuk xwang233

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62163

Reviewed By: jbschlosser

Differential Revision: D30903988

Pulled By: albanD

fbshipit-source-id: d6a74adb9e6d2f4be8ac707848ecabf06d629823
2021-09-13 21:15:38 -07:00
Nikita Vedeneev
88fff22023 torch.lu: forward AD support (#64742)
Summary:
As per title.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64742

Reviewed By: H-Huang

Differential Revision: D30841227

Pulled By: albanD

fbshipit-source-id: dc4d043ab94358594adb110fbbbb60750c98262a
2021-09-10 07:19:11 -07:00