Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74226
Update signature of `scatter_reduce_` to match `scatter_/scatter_add_`
`Tensor.scatter_reduce_(int64 dim, Tensor index, Tensor src, str reduce)`
- Add new reduction options in ScatterGatherKernel.cpp and update `scatter_reduce` to call into the cpu kernel for `scatter.reduce`
- `scatter_reduce` now has the same shape constraints as `scatter_` and `scatter_add_`
- Migrate `test/test_torch.py:test_scatter_reduce` to `test/test_scatter_gather_ops.py`
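As a rough usage sketch of the updated signature (the reduce modes and the `include_self` default shown here follow the released `scatter_reduce_` docs and are assumptions with respect to this exact PR):

```python
import torch

src = torch.tensor([1.0, 2.0, 3.0, 4.0])
index = torch.tensor([0, 1, 0, 1])
out = torch.zeros(2)

# Same shape constraints as scatter_/scatter_add_: index and src are indexed
# along `dim`, and the values of index select positions in `out`.
out.scatter_reduce_(0, index, src, reduce="sum")
print(out)  # tensor([4., 6.])

# Other reduce modes added by the kernel changes, e.g. "amax":
out2 = torch.zeros(2)
out2.scatter_reduce_(0, index, src, reduce="amax")
print(out2)  # tensor([3., 4.])
```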
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D35222842
Pulled By: mikaylagawarecki
fbshipit-source-id: 84930add2ad30baf872c495251373313cb7428bd
(cherry picked from commit 1b45139482e22eb0dc8b6aec2a7b25a4b58e31df)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73741
There are probably more perf improvements that can be made, for example reusing more quantities from the forward pass and doing more things in-place, but in the spirit of improving coverage this is probably OK for now.
Note: I didn't do anything with half_to_float, but CUDA (locally) hasn't complained yet
Test Plan: Imported from OSS
Reviewed By: ejguan
Differential Revision: D34690141
Pulled By: soulitzer
fbshipit-source-id: fe934e191fee2c8e956d7a5f4b553923adf1b33f
(cherry picked from commit ae49aff7f7c8496e04a3ce7667d8f068ca0a52ec)
Summary:
This should (hopefully) make all the CI from `functorch` go green (including jvps!) after replacing `VARIADIC_BDIMS_BOXED(_svd_helper);` with `VARIADIC_BDIMS_BOXED(_linalg_svd);` and removing all the skips and xfails associated with `linalg.svdvals`.
Locally, just one test started failing because of this: `test_vmapjvpall_norm_nuc_cpu_float32`. I have no idea what's going on here, but it's a jvp issue, so not a regression, and it might very well be caused by the jvp of another operation within `norm_nuc`, as this is a composite operation.
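As an illustration of the kind of batched call this batch-rule change covers (a sketch, assuming a functorch build that includes this change; not a test from the PR):

```python
import torch
from functorch import vmap

A = torch.randn(4, 3, 3)
# vmap over the leading batch dimension; each 3x3 slice gets its singular values.
S = vmap(torch.linalg.svdvals)(A)
print(S.shape)  # torch.Size([4, 3])
```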
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72181
Reviewed By: ngimel
Differential Revision: D33952744
Pulled By: zou3519
fbshipit-source-id: 2a2510d97eed4a0bfc25615264ddd36e38856efe
(cherry picked from commit 5805fa107c)
Summary:
Reference: https://github.com/pytorch/functorch/issues/393
Context :
The derivative of `__getitem__`/`index` is
f5a71ec2d6/tools/autograd/derivatives.yaml (L733-L734)
where `index_backward` is defined as
f5a71ec2d6/torch/csrc/autograd/FunctionsManual.cpp (L3892-L3894)
The problem arises when `grad` is not a BatchedTensor but one of the other inputs is. In that case, `grad.new_zeros` returns an unbatched tensor, and the call to the in-place `_index_put_impl_` errors as it expects `zeros_like_self` to be batched.
To avoid this, we dispatch to the out-of-place `index_put` if any of the input tensors is subclassed; otherwise we dispatch to the in-place `_index_put_impl_`.
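A rough Python analogue of the two paths just described (a sketch only; the real logic lives in C++, and the subclass check below is a stand-in for `areAnyTensorSubclassLike`):

```python
import torch

def index_backward_sketch(grad, self_shape, indices):
    zeros_like_self = grad.new_zeros(self_shape)

    # stand-in for the C++ areAnyTensorSubclassLike check
    def any_subclass(*tensors):
        return any(type(t) is not torch.Tensor for t in tensors)

    if any_subclass(grad, zeros_like_self, *indices):
        # out-of-place: safe when e.g. some inputs are BatchedTensors
        return zeros_like_self.index_put(indices, grad, accumulate=True)
    # in-place fast path for plain tensors
    return zeros_like_self.index_put_(indices, grad, accumulate=True)

grad = torch.randn(3)
print(index_backward_sketch(grad, (5,), (torch.tensor([0, 2, 4]),)))
```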
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71779
Reviewed By: albanD
Differential Revision: D33790596
Pulled By: zou3519
fbshipit-source-id: 9d6d81b758740cab7b3db9b905f1e8053f82b835
(cherry picked from commit ba0407a86e)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71901
We didn't catch this initially because CuDNN is not being tested on CI.
The following tests fail on master (if we build with CuDNN), but pass with this PR:
- `test_forward_mode_AD_nn_functional_batch_norm_cuda_float64`
- `test_forward_mode_AD_nn_functional_instance_norm_cuda_float64`
I don't think it is documented anywhere, but from the tests passing now I'm going to guess that `result1` and `result2` return `mean` and `invstd`, respectively. Previously, I thought the mean and variance were returned, because the variables were named `saved_mean` and `saved_var`.
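A quick numerical way to sanity-check that guess (a sketch; `torch.native_batch_norm`'s output layout is an internal detail, and the `(output, save_mean, save_invstd)` ordering assumed here follows the released CPU behaviour):

```python
import torch

x = torch.randn(8, 3, dtype=torch.float64)
eps = 1e-5
out, save1, save2 = torch.native_batch_norm(
    x, None, None, None, None, True, 0.1, eps)  # training=True, momentum=0.1

mean = x.mean(dim=0)
invstd = 1.0 / torch.sqrt(x.var(dim=0, unbiased=False) + eps)
print(torch.allclose(save1, mean), torch.allclose(save2, invstd))  # True True
```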
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D33818652
Pulled By: soulitzer
fbshipit-source-id: ecee760f5aec620dc70f57de4fb3573c8f2f5f31
(cherry picked from commit 73fd3e021c)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70253
I included a derivation of the formula in the complex case, as it is
particularly tricky. As far as I know, this is the first time this formula
has been derived in the literature.
I also implemented a more efficient and more accurate version of svd_backward.
More importantly, I also added a lax check in the complex case making sure that the loss
function depends only on the subspaces spanned by the pairs of singular
vectors, and not on their joint phase.
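The gauge freedom that this check guards against can be seen directly (a small sketch; the phase rotation below is generic, not code from this PR):

```python
import torch

A = torch.randn(3, 3, dtype=torch.complex128)
U, S, Vh = torch.linalg.svd(A)

# Rotate each pair of singular vectors by an arbitrary phase: the decomposition
# is equally valid, so a well-defined loss must not depend on this phase.
phase = torch.exp(1j * torch.rand(3, dtype=torch.float64))
U2 = U * phase
Vh2 = phase.conj().unsqueeze(-1) * Vh

print(torch.allclose((U * S) @ Vh, A))    # True
print(torch.allclose((U2 * S) @ Vh2, A))  # True: same matrix, different phases
```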
cc jianyuh nikitaved pearu mruberry walterddr IvanYashchuk xwang233 Lezcano
Test Plan: Imported from OSS
Reviewed By: mikaylagawarecki
Differential Revision: D33751982
Pulled By: mruberry
fbshipit-source-id: c2a4a92a921a732357e99c01ccb563813b1af512
(cherry picked from commit 391319ed8f)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69827
In general, the current pattern allows for implementing optimisations
for all the backends in a common place (see for example the optimisation
for empty matrices).
After this PR, `torch.svd` is implemented in terms of `linalg.svd` and
`linalg.svdvals`, as expected. This makes it differentiable in the case
when `compute_uv=False`, although this is not particularly important, as
`torch.svd` will eventually be deprecated.
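A rough Python-level picture of that relationship (a sketch of the semantics, not of the actual C++ dispatch):

```python
import torch

A = torch.randn(4, 3, dtype=torch.float64)

U, S, Vh = torch.linalg.svd(A, full_matrices=False)
u, s, v = torch.svd(A, some=True)   # deprecated API; returns V rather than Vh

print(torch.allclose(S, s))             # same singular values
print(torch.allclose((U * S) @ Vh, A))  # both factorizations reconstruct A
print(torch.allclose((u * s) @ v.mH, A))

# With compute_uv=False, only the singular values matter;
# torch.linalg.svdvals is the direct analogue.
print(torch.allclose(torch.linalg.svdvals(A), s))
```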
This PR also instantiates smaller `U` / `V` when calling cusolver_gesvdj
in the cases when `full_matrices=False` or `compute_uv=False`.
The memory for the auxiliary `U` and `V` in the cases above, needed for some
cuSOLVER routines, is allocated through raw allocators rather than through fully
fledged tensors, as it's just a blob of memory the algorithm requests.
As the code is better structured now, it was easier to see that `U` and
`Vh` needn't be allocated when calling `svd_cusolver_gesvd`.
Now `linalg.svdvals` works as expected with respect to the `out=` parameter.
Note that in the test `test_svd_memory_allocation` we were
passing a tensor of the wrong size and dtype and the test seemed to
pass...
This PR also changes the backward formula to avoid saving the input
matrix, as it's not necessary. In a follow up PR, I will clean the
backward formula and make it more numerically stable and efficient.
This PR also does a number of memory optimisations here and there, and fixes
the call to cusolver_gesvd, which was incorrect for m <= n. To test
this path, I compiled the code with a flag to unconditionally execute
the `if (!gesvdj_convergence_check.empty())` branch, and all the tests
passed.
I also took this chance to simplify the tests for these functions in
`test_linalg.py`, as we had lots of tests that were testing some
functionality that is already tested in the corresponding
OpInfos. I used xwang233's feature to test both MAGMA and CUDA
backends. This is particularly good for SVD, as cuSOLVER is always
chosen over MAGMA when available, so testing MAGMA otherwise would be
tricky.
cc jianyuh nikitaved pearu mruberry walterddr IvanYashchuk xwang233 Lezcano
Test Plan: Imported from OSS
Reviewed By: mikaylagawarecki
Differential Revision: D33751983
Pulled By: mruberry
fbshipit-source-id: 11d48d977946345583d33d14fb11a170a7d14fd2
(cherry picked from commit a1860bd567)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70528
This PR adds checks for the backward of `linalg.eigh`, similar to those
deduced in https://github.com/pytorch/pytorch/pull/70253
It also makes its implementation parallel that of the (fwd/bwd) derivative of
`torch.linalg.eig`, and it makes most OpInfo tests pass.
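A minimal sketch of one of the OpInfo-style gradient checks involved (eigenvalues only; the symmetrization wrapper and float64 inputs below are our own assumptions, and the full eigenvector checks are more involved):

```python
import torch

A = torch.randn(4, 4, dtype=torch.float64, requires_grad=True)

def eigvals_of_symmetrized(x):
    # symmetrize so that gradcheck's unconstrained perturbations stay valid
    w, _ = torch.linalg.eigh(x + x.mT)
    return w

print(torch.autograd.gradcheck(eigvals_of_symmetrized, (A,)))  # True
```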
cc jianyuh nikitaved pearu mruberry walterddr IvanYashchuk xwang233 Lezcano
Test Plan: Imported from OSS
Reviewed By: mruberry
Differential Revision: D33530149
Pulled By: albanD
fbshipit-source-id: 1f368b8d450d4e9e8ae74d3881c78513c27eb956
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70527
This PR adds checks for the backward of `linalg.eig`, similar to those
deduced in https://github.com/pytorch/pytorch/pull/70253
It also modifies the function so that it does not save the input matrix,
as it's not necessary.
It also corrects the forward AD formula. Now all
the tests pass for `linalg.eig` and `linalg.eigvals`.
It also updates the docs to reflect better what's going on here.
cc jianyuh nikitaved pearu mruberry walterddr IvanYashchuk xwang233 Lezcano
Test Plan: Imported from OSS
Reviewed By: mruberry
Differential Revision: D33530148
Pulled By: albanD
fbshipit-source-id: 984521a04f81ecb28ac1c4402b0243c63dd6959d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70326
See D24145988 for context: it allows loops such as `for (int i = 0; i < 10; i++)` to be expressed as `for (const auto i : c10::irange(10))`. This is nice because it auto-types the loops and adds const-safety to the iteration variable.
Test Plan: buck run //caffe2/torch/fb/sparsenn:test
Reviewed By: r-barnes
Differential Revision: D33243400
fbshipit-source-id: b1f1b4163f4bf662031baea9e5268459b40c69a3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66933
This PR exposes `torch.lu` as `torch.linalg.lu_factor` and
`torch.linalg.lu_factor_ex`.
This PR also adds support for matrices with zero-sized dimensions, both in
the matrix itself and in the batch dimensions. Note that in this case the
function simply returns empty tensors of the correct size.
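A brief usage sketch of the new API (the pairing with the pre-existing `torch.lu_solve` below is an assumption based on the released docs):

```python
import torch

A = torch.randn(3, 3, dtype=torch.float64)
LU, pivots = torch.linalg.lu_factor(A)

b = torch.randn(3, 1, dtype=torch.float64)
x = torch.lu_solve(b, LU, pivots)
print(torch.allclose(A @ x, b))  # True

# Zero-sized batches are now supported and just produce empty outputs.
LU0, piv0 = torch.linalg.lu_factor(torch.empty(0, 3, 3))
print(LU0.shape, piv0.shape)  # torch.Size([0, 3, 3]) torch.Size([0, 3])
```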
We add a test and an OpInfo for the new function.
This PR also adds documentation for this new function in line with
the documentation in the rest of `torch.linalg`.
Fixes https://github.com/pytorch/pytorch/issues/56590
Fixes https://github.com/pytorch/pytorch/issues/64014
cc jianyuh nikitaved pearu mruberry walterddr IvanYashchuk xwang233 Lezcano
Test Plan: Imported from OSS
Reviewed By: gchanan
Differential Revision: D32834069
Pulled By: mruberry
fbshipit-source-id: 51ef12535fa91d292f419acf83b800b86ee9c7eb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70198
This PR fixes composite compliance problems with:
- binary_cross_entropy's backward formula
- binary_cross_entropy_with_logits's backward formula
- binary_cross_entropy's double backward formula
It does so by adding checks for `areAnyTensorSubclassLike`.
Test Plan:
- I tested everything with functorch.
- We are going to do https://github.com/pytorch/pytorch/issues/69530 in
the future so we have a way of testing this in core. I need the
binary_cross_entropy ones for something right now and didn't want to
wait until we come up with a solution for #69530.
Reviewed By: Chillee
Differential Revision: D33246995
Pulled By: zou3519
fbshipit-source-id: 310ed3196b937d01b189870b86a6c5f77f9258b4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69534
Something is TensorSubclassLike if it is a Tensor subclass or if it has
the same problems as Tensor subclasses. Today that just includes Tensor
subclasses and meta tensors, but it may include other things in the future.
Some of our backwards formulas are incompatible with TensorSubclassLike
objects. For example, calling .data_ptr() is a problem because many
TensorSubclassLike objects don't have storage. Another problem is
in-place operations: performing `regular_tensor.inplace_(tensor_subclass)`
is generally unsafe.
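A toy illustration of that in-place hazard (the subclass below is purely hypothetical; for wrapper subclasses such as functorch's BatchedTensor the in-place write is an outright error rather than a silent loss of information):

```python
import torch

class TaggedTensor(torch.Tensor):
    """Minimal Tensor subclass, for illustration only."""

plain = torch.zeros(3)
sub = torch.ones(3).as_subclass(TaggedTensor)

out = plain + sub            # out-of-place: the result keeps the subclass type
print(type(out).__name__)    # TaggedTensor

plain.add_(sub)              # in-place: `plain` remains a plain Tensor, so any
print(type(plain).__name__)  # Tensor -- subclass-specific state cannot survive
```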
This PR adds special cases to the backward formulas for torch.max and
torch.clamp to handle this. The backward formulas for torch.max and
torch.clamp are not dispatcher operations, so they cannot be overridden,
and we hesitate to make them dispatcher operations because of FC/BC and
performance-overhead concerns.
Furthermore, the old concept of "is this inplace operation vmap
compatible?" can be subsumed by the general "is this inplace operation
tensor-subclass compatible?" question, so I replaced all instances of
isInplaceVmapCompatible with the isTensorSubclassLike checks.
Test Plan
- I tested the changes using functorch.
- It's possible to write a test for these in core (one has to make
a custom tensor subclass and then send it through the operation and then
invoke autograd), but I wanted to push the work to doing some
generic testing for backward formulas
(https://github.com/pytorch/pytorch/issues/69530) instead of doing some
one-off things now.
Test Plan: Imported from OSS
Reviewed By: mrshenli
Differential Revision: D32967727
Pulled By: zou3519
fbshipit-source-id: 30fda1a7581da4c55179b7a3ca05069150bbe2dc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63570
There is a use of `at::triangular_solve_out` in the file
`torch/csrc/jit/tensorexpr/external_functions.cpp` that I have not dared
to move to `at::linalg_solve_triangular_out`.
**Deprecation note:**
This PR deprecates the `torch.triangular_solve` function in favor of
`torch.linalg.solve_triangular`. An upgrade guide is added to the
documentation for `torch.triangular_solve`.
Note that it DOES NOT remove `torch.triangular_solve`, but
`torch.triangular_solve` will be removed in a future PyTorch release.
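A sketch of the upgrade path, per the note above (argument names follow the released `torch.linalg.solve_triangular` docs; note the swapped argument order and the single return value):

```python
import torch

A = torch.randn(3, 3, dtype=torch.float64).triu()  # upper-triangular system
b = torch.randn(3, 2, dtype=torch.float64)

# Deprecated API: (b, A) order, returns (solution, cloned coefficient matrix)
x_old, _ = torch.triangular_solve(b, A, upper=True)

# Replacement: (A, B) order, returns just the solution
x_new = torch.linalg.solve_triangular(A, b, upper=True)

print(torch.allclose(x_old, x_new))  # True
```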
cc jianyuh nikitaved pearu mruberry walterddr IvanYashchuk xwang233 Lezcano
Test Plan: Imported from OSS
Reviewed By: mruberry
Differential Revision: D32618035
Pulled By: anjali411
fbshipit-source-id: 0bfb48eeb6d96eff3e96e8a14818268cceb93c83
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63569
This PR also rewrites `lu_solve_backward` from scratch, going from
solving 5 systems of equations to just 2.
cc jianyuh nikitaved pearu mruberry walterddr IvanYashchuk xwang233 Lezcano
Test Plan: Imported from OSS
Reviewed By: mruberry
Differential Revision: D32618014
Pulled By: anjali411
fbshipit-source-id: 0e915bcf7045a4db43ffd076d807beac816c8538
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66933
This PR exposes `torch.lu` as `torch.linalg.lu_factor` and
`torch.linalg.lu_factor_ex`.
This PR also adds support for matrices with zero-sized dimensions, both in
the matrix itself and in the batch dimensions. Note that in this case the
function simply returns empty tensors of the correct size.
We add a test and an OpInfo for the new function.
This PR also adds documentation for this new function in line with
the documentation in the rest of `torch.linalg`.
Fixes https://github.com/pytorch/pytorch/issues/56590
Fixes https://github.com/pytorch/pytorch/issues/64014
cc jianyuh nikitaved pearu mruberry walterddr IvanYashchuk xwang233 Lezcano
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D32521980
Pulled By: mruberry
fbshipit-source-id: 26a49ebd87f8a41472f8cd4e9de4ddfb7f5581fb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63568
This PR adds the first solver with structure to `linalg`. This solver
has an API compatible with that of `linalg.solve`, preparing them for a
possible future merge of the APIs. The new API:
- Just returns the solution, rather than the solution and a copy of `A`
- Removes the confusing `transpose` argument and replaces it by a
correct handling of conj and strides within the call
- Adds a `left=True` kwarg. This can be achieved via transposes of the
inputs and the result, but it's exposed for convenience (see the sketch below).
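A small sketch of what the `left` kwarg buys (argument names per the released docs; the transpose-based equivalent is shown for comparison):

```python
import torch

A = torch.randn(3, 3, dtype=torch.float64).tril()
B = torch.randn(2, 3, dtype=torch.float64)

# left=False solves X @ A = B ...
X = torch.linalg.solve_triangular(A, B, upper=False, left=False)

# ... which is the transpose-based formulation it replaces:
X_ref = torch.linalg.solve_triangular(A.mT, B.mT, upper=True, left=True).mT

print(torch.allclose(X, X_ref))  # True
print(torch.allclose(X @ A, B))  # True
```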
This PR also implements a dataflow that minimises the number of copies
needed before calling LAPACK / MAGMA / cuBLAS and takes advantage of the
conjugate and neg bits.
This algorithm is implemented for `solve_triangular` (which, for this, is
the most complex of all the solvers due to the `upper` parameters).
Once more solvers are added, we will factor out this calling algorithm,
so that all of them can take advantage of it.
Given the complexity of this algorithm, we implement some thorough
testing. We also added tests for all the backends, which was not done
before.
We also add forward AD support for `linalg.solve_triangular` and improve the
docs of `linalg.solve_triangular`. We also fix a few issues with those of
`torch.triangular_solve`.
Resolves https://github.com/pytorch/pytorch/issues/54258
Resolves https://github.com/pytorch/pytorch/issues/56327
Resolves https://github.com/pytorch/pytorch/issues/45734
cc jianyuh nikitaved pearu mruberry walterddr IvanYashchuk xwang233 Lezcano
Test Plan: Imported from OSS
Reviewed By: jbschlosser
Differential Revision: D32588230
Pulled By: mruberry
fbshipit-source-id: 69e484849deb9ad7bb992cc97905df29c8915910
Summary:
Adds native_dropout to have a reasonable target for TorchScript in autodiff. native_dropout has scale and train as arguments in its signature; this makes native_dropout more consistent with other operators and removes conditionals in the autodiff definition.
cc gmagogsfm
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63937
Reviewed By: mruberry
Differential Revision: D32477657
Pulled By: ngimel
fbshipit-source-id: d37b137a37acafa50990f60c77f5cea2818454e4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63568
This PR adds the first solver with structure to `linalg`. This solver
has an API compatible with that of `linalg.solve`, preparing them for a
possible future merge of the APIs. The new API:
- Just returns the solution, rather than the solution and a copy of `A`
- Removes the confusing `transpose` argument and replaces it by a
correct handling of conj and strides within the call
- Adds a `left=True` kwarg. This can be achieved via transposes of the
inputs and the result, but it's exposed for convenience.
This PR also implements a dataflow that minimises the number of copies
needed before calling LAPACK / MAGMA / cuBLAS and takes advantage of the
conjugate and neg bits.
This algorithm is implemented for `solve_triangular` (which, for this, is
the most complex of all the solvers due to the `upper` parameters).
Once more solvers are added, we will factor out this calling algorithm,
so that all of them can take advantage of it.
Given the complexity of this algorithm, we implement some thorough
testing. We also added tests for all the backends, which was not done
before.
We also add forward AD support for `linalg.solve_triangular` and improve the
docs of `linalg.solve_triangular`. We also fix a few issues with those of
`torch.triangular_solve`.
Resolves https://github.com/pytorch/pytorch/issues/54258
Resolves https://github.com/pytorch/pytorch/issues/56327
Resolves https://github.com/pytorch/pytorch/issues/45734
cc jianyuh nikitaved pearu mruberry walterddr IvanYashchuk xwang233 Lezcano
Test Plan: Imported from OSS
Reviewed By: zou3519, JacobSzwejbka
Differential Revision: D32283178
Pulled By: mruberry
fbshipit-source-id: deb672e6e52f58b76536ab4158073927a35e43a8