Turns out this is just a composite compliance issue. Branching on whether
something requires grad can lead to incorrect gradients when we have a
BatchedTensor wrapping a tensor that requires grad.
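A minimal, hypothetical sketch of the pattern in question (the op and both branches below are illustrative, not the actual kernel):
```python
import torch

def op(x):
    # Not composite compliant: a wrapper such as a BatchedTensor created by vmap
    # may not report requires_grad even when the tensor it wraps does, so this
    # branch can silently pick the non-differentiable path and the gradients
    # come out wrong.
    if x.requires_grad:
        return x * 2           # path intended for autograd
    return x.detach() * 2      # "cheap" path that cuts the autograd graph

# The fix is to make the op behave identically regardless of requires_grad,
# i.e. avoid this kind of metadata-dependent branching altogether.
```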
Test Plan:
- tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84939
Approved by: https://github.com/soulitzer
This ref does more than `torch.norm`, and it fixes a few bugs
that `torch.norm` has. This implementation and the `torch.norm`
implementation are reconciled in the next PR of this stack.
We put this PR first, as otherwise `test_decomp.py` was failing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81765
Approved by: https://github.com/ngimel
`prod` performs a sync to test for zeros, as the formula is substantially
simpler when there are no zeros, but this doesn't work for meta tensors.
The double backward formula works great in all cases, though!
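A rough sketch of the trade-off described above (not the actual ATen code; the zero-free formula and the explicit data-dependent check are the point):
```python
import torch

def prod_backward_sketch(grad, x, result):
    # Checking for zeros reads the data, which forces a host-device sync and is
    # impossible for meta tensors (they carry no data at all).
    if (x == 0).any():
        # A zero in x forces a much more involved per-element formula; omitted here.
        raise NotImplementedError("zero case omitted in this sketch")
    # With no zeros, d/dx_i prod(x) = prod(x) / x_i, so:
    return grad * result / x
```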
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81617
Approved by: https://github.com/soulitzer
This PR adds complex support for `logdet`, makes all these functions
support `out=`, implements them as composites of a single function, and
improves their docs.
We also use `linalg_lu_factor_ex` in these functions, which removes the
synchronisation that was present before.
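A small usage sketch of the behaviour described above (written against the current public `torch.linalg` API):
```python
import torch

A = torch.randn(3, 3, dtype=torch.complex128)
sign, logabsdet = torch.linalg.slogdet(A)  # sign is complex for complex inputs
ld = torch.logdet(A)                       # complex support added by this PR
# det = sign * exp(logabsdet), so logdet = log(sign) + logabsdet
torch.testing.assert_close(ld, torch.log(sign) + logabsdet)
```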
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79742
Approved by: https://github.com/IvanYashchuk, https://github.com/albanD
This PR is in preparation for implementing `logdet` and `slogdet` as
structured kernels and implementing them with more efficient derivatives.
We implement forward AD for `det`. We also simplify the implementation of
the backward and leave a note on how to implement it properly for
singular matrices; we leave that for future work.
Note (by looking at the OpInfo) that the current implementation passes
the same tests as the previous one. We skip forward-over-backward in
the singular case, as it was not working in the gradgrad case either.
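For non-singular inputs, the forward AD of `det` follows Jacobi's formula, d det(A) = det(A) * tr(A^{-1} dA). A sanity-check sketch using the public forward-AD API (an illustration, not the internal formula):
```python
import torch
import torch.autograd.forward_ad as fwAD

A = torch.randn(3, 3, dtype=torch.float64)
dA = torch.randn(3, 3, dtype=torch.float64)
with fwAD.dual_level():
    dual = fwAD.make_dual(A, dA)
    _, ddet = fwAD.unpack_dual(torch.linalg.det(dual))
# Jacobi's formula, written without forming A^{-1} explicitly
expected = torch.linalg.det(A) * torch.trace(torch.linalg.solve(A, dA))
torch.testing.assert_close(ddet, expected)
```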
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79487
Approved by: https://github.com/nikitaved, https://github.com/albanD
The previous PR in this stack uncovered an error in the forward over
backward for this function.
In this PR, we fix this error and also fix the gradgrad implementation
(making it more stable and faster by using `logsigmoid`).
We also move the double backward for this function to `FunctionsManual`,
as there's no reason for it to be in `native_functions`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80083
Approved by: https://github.com/zou3519
This PR:
- Corrects the forward AD formula of `torch.sgn`.
- The reason why we can't use `auto_element_wise` for this operation is rather subtle. I left a comment.
- This, in turn, fixes a problem we had in forward-over-backward for `linalg.svd` and other spectral decompositions (and `norm`, `linalg.norm`, `linalg.matrix_norm`) that were using `torch.abs` (whose derivative is given by `torch.sgn`); see the sanity-check sketch after this list.
- Implements the formula for a number of missing operations: `nansum`, `amax`, `amin`...
- Simplifies a few formulas, most notably the forward AD for `div` and the derivative of `norm`, `linalg.norm` and `vector_norm` for `ord=+-inf`.
- Corrects the formula for `mean`, `std_mean`, `var_mean` when `dim` is provided and equal to `()` (or `None`).
- A few minor improvements to `sum_backward`, `unsqueeze_multiple` and formulas depending on them.
- Fixes the derivatives of `std_mean` and `std_var` (complex support,
ASAN, forward AD...)
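The sanity-check sketch referenced above: for complex z != 0, d|z| = Re(conj(sgn(z)) dz), which is the identity the corrected `torch.sgn` formula feeds into. This uses the public forward-AD API as an illustration, not the internal implementation:
```python
import torch
import torch.autograd.forward_ad as fwAD

z = torch.randn(4, dtype=torch.complex128)
dz = torch.randn(4, dtype=torch.complex128)
with fwAD.dual_level():
    dual = fwAD.make_dual(z, dz)
    _, dabs = fwAD.unpack_dual(dual.abs())
# d|z| = Re(conj(sgn(z)) * dz)
expected = (torch.sgn(z).conj() * dz).real
torch.testing.assert_close(dabs, expected)
```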
Fixes: https://github.com/pytorch/pytorch/issues/67539
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80082
Approved by: https://github.com/zou3519
The previous PR in this stack uncovered an error in the forward over
backward for this function.
In this PR, we fix this error and also fix the gradgrad implementation
(making it more stable and faster by using `logsigmoid`).
We also move the double backward for this function to `FunctionsManual`,
as there's no reason for it to be in `native_functions`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79381
Approved by: https://github.com/soulitzer
This PR:
- Corrects the forward AD formula of `torch.sgn`.
- The reason why we can't use `auto_element_wise` for this operation is rather subtle. I left a comment.
- This, in turn, fixes a problem we had in forward-over-backward for `linalg.svd` and other spectral decompositions (and `norm`, `linalg.norm`, `linalg.matrix_norm`) that were using `torch.abs` (whose derivative is given by `torch.sgn`).
- Implements the formula for a number of missing operations: `nansum`, `amax`, `amin`...
- Simplifies a few formulas, most notably the forward AD for `div` and the derivative of `norm`, `linalg.norm` and `vector_norm` for `ord=+-inf`.
- Corrects the formula for `mean`, `std_mean`, `var_mean` when `dim` is provided and equal to `()` (or `None`).
- A few minor improvements to `sum_backward`, `unsqueeze_multiple` and formulas depending on them.
- Fixes the derivatives of `std_mean` and `std_var` (complex support,
ASAN, forward AD...)
Fixes: https://github.com/pytorch/pytorch/issues/67539
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77975
Approved by: https://github.com/soulitzer
Fixing the forward AD for `sgn` in the next PR of this stack uncovered a
number of issues with the derivatives of `l1_loss`. Upon inspection,
`l1_loss` was just implemented as a composite function, but it was not
differentiable. This PR makes it a fully differentiable function.
As a side note, `l1_loss_out` was incorrect in a number of ways. What's
more, it is not exposed to the public, as `F.l1_loss` does not accept an
`out=` parameter; as such, it is not even tested. I wonder how useful it
is to have `out=` variants for loss functions if we don't expose them at
all. Even more, I wonder how useful it is to have `_out` variants for loss
functions at all, given that their most common use case is returning just
a real number. cc jbschlosser
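A quick way to exercise the now fully differentiable `l1_loss` (a sketch using the standard gradcheck utilities; double precision is needed for the finite-difference checks):
```python
import torch
import torch.nn.functional as F

x = torch.randn(4, 3, dtype=torch.float64, requires_grad=True)
target = torch.randn(4, 3, dtype=torch.float64)
# first and second order checks of the new derivative formulas
assert torch.autograd.gradcheck(lambda a: F.l1_loss(a, target), (x,))
assert torch.autograd.gradgradcheck(lambda a: F.l1_loss(a, target), (x,))
```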
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79804
Approved by: https://github.com/zou3519, https://github.com/malfet
Fixing the forward AD for `sgn` in the next PR of this stack uncovered a
number of issues with the derivatives of `l1_loss`. Upon inspection,
`l1_loss` was just implemented as a composite function, but it was not
differentiable. This PR makes it a fully differentiable function.
As a side note, `l1_loss_out` was incorrect in a number of ways. What's
more, it is not exposed to the public, as `F.l1_loss` does not accept an
`out=` parameter; as such, it is not even tested. I wonder how useful it
is to have `out=` variants for loss functions if we don't expose them at
all. Even more, I wonder how useful it is to have `_out` variants for loss
functions at all, given that their most common use case is returning just
a real number. cc jbschlosser
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78257
Approved by: https://github.com/jbschlosser
This PR heavily simplifies the code of `linalg.solve`. At the same time,
this implementation saves quite a few copies of the input data in some
cases (e.g. when A is contiguous).
We also implement it in such a way that the derivative goes from
computing two LU decompositions and two LU solves to no LU
decompositions and one LU solve. The derivative also no longer performs a
number of unnecessary copies (at least the copies of two matrices).
On top of this, we add a `left` kw-only arg that lets the user
solve `XA = B` rather concisely.
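A usage sketch of the new `left` keyword (shapes chosen arbitrarily):
```python
import torch

A = torch.randn(3, 3, dtype=torch.float64)
B = torch.randn(5, 3, dtype=torch.float64)
X = torch.linalg.solve(A, B, left=False)  # solves X A = B instead of A X = B
torch.testing.assert_close(X @ A, B)
```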
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74046
Approved by: https://github.com/nikitaved, https://github.com/IvanYashchuk, https://github.com/mruberry
This PR adds `linalg.lu_solve`. While doing so, I found a bug in MAGMA
when calling the batched MAGMA backend with trans=True. We work around it
by solving two triangular systems instead.
We also update the heuristics for this function, as they were fairly
outdated. We found that cuSolver is king, so luckily we do not need to
rely on the buggy MAGMA backend for this function.
We added tests exercising this function left and right, as well as tests
for the different backends. We also activated the tests on AMD, as
those should work as well.
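A usage sketch of the new function (shapes chosen arbitrarily):
```python
import torch

A = torch.randn(4, 4, dtype=torch.float64)
B = torch.randn(4, 2, dtype=torch.float64)
LU, pivots = torch.linalg.lu_factor(A)
X = torch.linalg.lu_solve(LU, pivots, B)                      # solves A X = B
torch.testing.assert_close(A @ X, B)
X_left = torch.linalg.lu_solve(LU, pivots, B.mT, left=False)  # solves X A = B.mT
torch.testing.assert_close(X_left @ A, B.mT)
```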
Fixes https://github.com/pytorch/pytorch/issues/61657
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77634
Approved by: https://github.com/malfet
This PR adds `linalg.lu_solve`. While doing so, I found a bug in MAGMA
when calling the batched MAGMA backend with trans=True. We work around it
by solving two triangular systems instead.
We also update the heuristics for this function, as they were fairly
outdated. We found that cuSolver is king, so luckily we do not need to
rely on the buggy MAGMA backend for this function.
We added tests exercising this function left and right, as well as tests
for the different backends. We also activated the tests on AMD, as
those should work as well.
Fixes https://github.com/pytorch/pytorch/issues/61657
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72935
Approved by: https://github.com/IvanYashchuk, https://github.com/mruberry
This PR modifies `lu_unpack` by:
- Using less memory when unpacking `L` and `U`
- Fusing the subtraction of 1 into `unpack_pivots_stub`
- Defining tensors of the correct types to avoid copies
- Porting `lu_unpack` to be a structured kernel so that its `_out` version
does not incur extra copies
Then we implement `linalg.lu` as a structured kernel, as we want to
compute its derivative manually. We do so because composing the
derivatives of `torch.lu_factor` and `torch.lu_unpack` would be less efficient.
This new function and `lu_unpack` come with everything they can come with:
forward and backward AD, decent docs, correctness tests, OpInfos, complex support,
support for meta tensors, and support for vmap and vmap over the gradients.
I really hope we don't continue adding more features.
This PR also avoids saving some tensors that were previously
saved unnecessarily for the backward in `lu_factor_ex_backward` and
`lu_backward`, and makes some other general improvements here and there
to the forward and backward AD formulae of other related functions.
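A usage sketch relating the new `linalg.lu` to the `lu_factor`/`lu_unpack` pair:
```python
import torch

A = torch.randn(4, 4, dtype=torch.float64)
P, L, U = torch.linalg.lu(A)              # the new structured op
torch.testing.assert_close(P @ L @ U, A)

LU, pivots = torch.linalg.lu_factor(A)    # compact factorization
P2, L2, U2 = torch.lu_unpack(LU, pivots)  # the (now cheaper) unpacking
torch.testing.assert_close(P2 @ L2 @ U2, A)
```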
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67833
Approved by: https://github.com/IvanYashchuk, https://github.com/nikitaved, https://github.com/mruberry
We derive and implement a more concise rule for the forward and backward
derivatives of the QR decomposition. While doing this, we:
- Fix the composite compliance of `linalg.qr` and make it support batches
- Improve the performance and simplify the implementation of both forward and backward (a gradcheck sketch follows this list)
- Avoid saving the input matrix for the backward computation.
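The gradcheck sketch mentioned above, exercising the new rules through the public testing utilities (double precision for the finite-difference checks):
```python
import torch

A = torch.randn(5, 3, dtype=torch.float64, requires_grad=True)
# checks the backward and the backward-of-backward of the reduced QR
assert torch.autograd.gradcheck(torch.linalg.qr, (A,))
assert torch.autograd.gradgradcheck(torch.linalg.qr, (A,))
```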
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76115
Approved by: https://github.com/nikitaved, https://github.com/albanD
The problem was that `grad_input` and `grad_target` may be ZeroTensors,
which are immutable. This PR changes it so that operations on `grad_input`
and `grad_target` in `binary_cross_entropy_with_logits_jvp` are no longer
in-place.
Test Plan:
- run existing tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76322
Approved by: https://github.com/soulitzer
This PR:
- Derives formally a new rule for Cholesky (write-up to come)
- Implements it without using in-place operations in the forward or backward.
- Does not instantiate inverses explicitly; instead, it solves two triangular systems of equations (two triangular solves vs one triangular solve plus two matmuls should be comparable in cost, but the former should be more stable). A gradcheck sketch follows this list.
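The gradcheck sketch referenced above; the input is mapped to a symmetric positive-definite matrix so the checks stay inside the domain of `cholesky` (the helper name is illustrative):
```python
import torch

def chol(x):
    # build an SPD matrix from an unconstrained square one
    A = x @ x.mT + torch.eye(3, dtype=x.dtype)
    return torch.linalg.cholesky(A)

x = torch.randn(3, 3, dtype=torch.float64, requires_grad=True)
assert torch.autograd.gradcheck(chol, (x,))
assert torch.autograd.gradgradcheck(chol, (x,))
```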
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76032
Approved by: https://github.com/nikitaved, https://github.com/albanD
Summary:
[Comment](https://github.com/pytorch/pytorch/pull/62445/files#r680132022) claims it got added for consistency with the top-level CMakeLists.txt, but `-Wno-unused-variable` is not mentioned there.
Fix the violations in 50+ files that were added in the interim, either by removing unused variables or by decorating the code with `C10_UNUSED` when the local variable is likely used to extend an object's lifetime until the end of the block.
The suppressed warning caused a preventable revert in https://github.com/pytorch/pytorch/pull/72633#issuecomment-1092300787
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75538
Reviewed By: anjali411
Differential Revision: D35747333
Pulled By: malfet
fbshipit-source-id: 3fc5828e44a4c05ba0e89e92613e6ebbdb260626
(cherry picked from commit c179fba21cfa2a0093fad50ccad5a22dd7cff52c)
This pull request enables accumulating gradients for the CSR tensor.
Functions that work and are tested:
- tensor.abs()
- tensor.neg()
- tensor.conj_physical()
- torch.addmm
`torch.mm` also works, but tests will be added later.
In addition, this PR makes accessing strides, storage, and contiguity info on a CSR tensor throw an error.
`tensor.to_sparse_csr().to_sparse_csr()` was failing and is now fixed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75435
Approved by: https://github.com/cpuhrsch
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74226
Update signature of `scatter_reduce_` to match `scatter_/scatter_add_`
`Tensor.scatter_reduce_(int64 dim, Tensor index, Tensor src, str reduce)`
- Add new reduction options in ScatterGatherKernel.cpp and update `scatter_reduce` to call into the cpu kernel for `scatter.reduce`
- `scatter_reduce` now has the same shape constraints as `scatter_` and `scatter_add_`
- Migrate `test/test_torch.py:test_scatter_reduce` to `test/test_scatter_gather_ops.py`
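A small usage sketch of the updated signature (the reduction names follow the current public API and are an assumption as far as this commit message goes):
```python
import torch

index = torch.tensor([0, 1, 1, 2])
src = torch.tensor([1., 2., 3., 4.])

x = torch.zeros(3)
x.scatter_reduce_(0, index, src, reduce="sum")    # x -> [1., 5., 4.]

y = torch.zeros(3)
y.scatter_reduce_(0, index, src, reduce="amax")   # y -> [1., 3., 4.]
```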
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D35222842
Pulled By: mikaylagawarecki
fbshipit-source-id: 84930add2ad30baf872c495251373313cb7428bd
(cherry picked from commit 1b45139482e22eb0dc8b6aec2a7b25a4b58e31df)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73741
There are probably more perf improvements that can be made, for example reusing more quantities from the forward or doing more things in-place, but in the spirit of improving coverage, this is probably OK for now.
Note: I didn't do anything with half_to_float, but CUDA (locally) hasn't complained yet.
Test Plan: Imported from OSS
Reviewed By: ejguan
Differential Revision: D34690141
Pulled By: soulitzer
fbshipit-source-id: fe934e191fee2c8e956d7a5f4b553923adf1b33f
(cherry picked from commit ae49aff7f7c8496e04a3ce7667d8f068ca0a52ec)
Summary:
This should (hopefully) make all the `functorch` CI go green (including jvp's!) after replacing `VARIADIC_BDIMS_BOXED(_svd_helper);` with `VARIADIC_BDIMS_BOXED(_linalg_svd);` and removing all the skips and xfails associated with `linalg.svdvals`.
Locally, just one test started failing because of this: `test_vmapjvpall_norm_nuc_cpu_float32`. I have no idea what's going on there, but it's a jvp product, so not a regression, and it might very well be caused by the jvp of another operation within `norm_nuc`, as this is a composite operation.
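A sketch of the kind of test this unblocks, written against today's `torch.vmap` (at the time the transform lived in `functorch`):
```python
import torch

A = torch.randn(8, 4, 4, dtype=torch.float64)
s = torch.vmap(torch.linalg.svdvals)(A)   # batched through the single _linalg_svd kernel
torch.testing.assert_close(s, torch.linalg.svdvals(A))
```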
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72181
Reviewed By: ngimel
Differential Revision: D33952744
Pulled By: zou3519
fbshipit-source-id: 2a2510d97eed4a0bfc25615264ddd36e38856efe
(cherry picked from commit 5805fa107c)
Summary:
Reference: https://github.com/pytorch/functorch/issues/393
Context:
The derivative of `__getitem__`/`index` is
f5a71ec2d6/tools/autograd/derivatives.yaml (L733-L734)
where `index_backward` is defined as
f5a71ec2d6/torch/csrc/autograd/FunctionsManual.cpp (L3892-L3894)
The problem arises when `grad` is not a BatchedTensor but one of the other inputs is. In that case, `grad.new_zeros` returns an unbatched tensor, and the call to the in-place `_index_put_impl_` errors, as it expects `zeros_like_self` to be batched.
To avoid this, we dispatch to the out-of-place `index_put` if any of the input tensors is subclassed; otherwise we dispatch to the in-place `_index_put_impl_` (see the sketch below).
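A schematic Python sketch of that dispatch (the function name and the subclass flag are illustrative; the real logic lives in C++ in FunctionsManual.cpp):
```python
import torch

def index_backward_sketch(zeros_like_self, indices, grad, any_input_is_subclassed):
    # With a wrapped tensor (e.g. a BatchedTensor under vmap) in the mix, the
    # in-place _index_put_impl_ would fail because zeros_like_self is unbatched,
    # so we fall back to the out-of-place variant.
    if any_input_is_subclassed:
        return zeros_like_self.index_put(indices, grad, accumulate=True)
    return zeros_like_self.index_put_(indices, grad, accumulate=True)
```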
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71779
Reviewed By: albanD
Differential Revision: D33790596
Pulled By: zou3519
fbshipit-source-id: 9d6d81b758740cab7b3db9b905f1e8053f82b835
(cherry picked from commit ba0407a86e)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71901
We didn't catch this initially because CuDNN is not being tested on CI.
The following tests fail on master (if we build with CuDNN), but pass with this PR:
- `test_forward_mode_AD_nn_functional_batch_norm_cuda_float64`
- `test_forward_mode_AD_nn_functional_instance_norm_cuda_float64`
I don't think it is documented anywhere, but from the tests passing now, I'm going to guess that `result1` and `result2` return `mean` and `invstd`, respectively. Previously, I thought the mean and variance were returned because the variables were named `saved_mean` and `saved_var`.
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D33818652
Pulled By: soulitzer
fbshipit-source-id: ecee760f5aec620dc70f57de4fb3573c8f2f5f31
(cherry picked from commit 73fd3e021c)