Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49721
As part of the per-app selective build refactoring effort, we are decoupling ATen/native from the rest of ATen (D25413998).
All symbols of ATen/native should be referenced only through the dispatcher (https://github.com/pytorch/pytorch/issues/48684).
This diff decouples the native reference recently introduced for sparse tensors.
ghstack-source-id: 119028080
Test Plan: CI
Reviewed By: dhruvbird, ngimel
Differential Revision: D25675711
fbshipit-source-id: 381cbb3b361ee41b002055399d4996a9ca21377c
Summary:
This PR adds `torch.linalg.solve`.
`linalg_solve_out` uses in-place operations on the provided result tensor.
I modified `apply_solve` to accept a tensor of Int instead of a `std::vector`. That way we can write a function similar to `linalg_solve_out` but without the error checks and device memory synchronization.
Unlike `torch.solve`, this routine accepts 1-dimensional tensors and batches of 1-dimensional tensors for the right-hand-side term; `torch.solve` requires it to be at least 2-dimensional.
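To make the shape rule concrete, here is a minimal pure-Python sketch (no PyTorch involved; the helper `solve` is hypothetical and handles a single, unbatched system). It treats a 1-D right-hand side as a single column, solves by Gaussian elimination with partial pivoting, and squeezes the column back out, so a 1-D `b` gives a 1-D result and a 2-D `b` gives a 2-D result.

```python
from typing import List, Union

Vector = List[float]
Matrix = List[List[float]]


def solve(a: Matrix, b: Union[Vector, Matrix]) -> Union[Vector, Matrix]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting.

    Hypothetical illustration of the shape rule only: a 1-D rhs yields a
    1-D result, a 2-D rhs (one column per system) yields a 2-D result.
    """
    n = len(a)
    is_vector = not isinstance(b[0], list)
    # Promote a 1-D rhs to a single column, as if it had shape (n, 1).
    cols = [[v] for v in b] if is_vector else [list(row) for row in b]
    k = len(cols[0])
    # Augmented matrix [a | b].
    aug = [[float(v) for v in a[i]] + [float(v) for v in cols[i]] for i in range(n)]
    for col in range(n):
        # Partial pivoting: use the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if aug[pivot][col] == 0.0:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + k):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution, once per right-hand-side column.
    x = [[0.0] * k for _ in range(n)]
    for j in range(k):
        for i in range(n - 1, -1, -1):
            s = aug[i][n + j] - sum(aug[i][c] * x[c][j] for c in range(i + 1, n))
            x[i][j] = s / aug[i][i]
    # Squeeze the column back out for a 1-D rhs.
    return [row[0] for row in x] if is_vector else x
```

For example, `solve([[2.0, 1.0], [1.0, 3.0]], [3.0, 5.0])` returns a flat list close to `[0.8, 1.4]`, while passing `[[3.0], [5.0]]` returns the same values as a column.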
Ref. https://github.com/pytorch/pytorch/issues/42666
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48456
Reviewed By: izdeby
Differential Revision: D25562222
Pulled By: mruberry
fbshipit-source-id: a9355c029e2442c2e448b6309511919631f9e43b
Summary:
This PR changes the `aten::native_layer_norm` and `aten::native_layer_norm_backward` signatures to match the `torch.layer_norm` definition. The current definition doesn't provide enough information for the PyTorch JIT to fuse layer_norm during training.
`native_layer_norm(X, gamma, beta, M, N, eps)` =>
`native_layer_norm(input, normalized_shape, weight, bias, eps)`
`native_layer_norm_backward(dY, X, mean, rstd, gamma, M, N, grad_input_mask)` =>
`native_layer_norm_backward(dY, input, normalized_shape, mean, rstd, weight, bias, grad_input_mask)`
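One way to see why the new signature carries more information: `M` and `N` can be recovered from `input` and `normalized_shape`, but `normalized_shape` cannot be recovered from `M` and `N` alone. The sketch below is a hypothetical pure-Python illustration on a flattened, row-major input (the names `layer_norm_dims` and `layer_norm` are made up here). It derives `(M, N)` and then normalizes each of the `M` groups of `N` elements.

```python
import math
from typing import List, Optional, Sequence, Tuple


def layer_norm_dims(input_shape: Sequence[int],
                    normalized_shape: Sequence[int]) -> Tuple[int, int]:
    """Recover (M, N): N elements are normalized together, in M groups."""
    k = len(normalized_shape)
    if k > len(input_shape) or list(input_shape[len(input_shape) - k:]) != list(normalized_shape):
        raise ValueError("normalized_shape must match the trailing input dims")
    n = math.prod(normalized_shape)
    m = math.prod(input_shape) // n
    return m, n


def layer_norm(flat: List[float], input_shape: Sequence[int],
               normalized_shape: Sequence[int],
               weight: Optional[List[float]] = None,
               bias: Optional[List[float]] = None,
               eps: float = 1e-5) -> List[float]:
    m, n = layer_norm_dims(input_shape, normalized_shape)
    out: List[float] = []
    for i in range(m):
        row = flat[i * n:(i + 1) * n]
        mean = sum(row) / n
        var = sum((v - mean) ** 2 for v in row) / n  # biased variance
        rstd = 1.0 / math.sqrt(var + eps)
        for j, v in enumerate(row):
            y = (v - mean) * rstd
            if weight is not None:
                y *= weight[j]  # elementwise affine scale
            if bias is not None:
                y += bias[j]    # elementwise affine shift
            out.append(y)
    return out
```

For an input of shape `(2, 3, 4)` with `normalized_shape=(3, 4)`, `layer_norm_dims` gives `M = 2`, `N = 12`.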
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48971
Reviewed By: izdeby
Differential Revision: D25574070
Pulled By: ngimel
fbshipit-source-id: 23e2804295a95bda3f1ca6b41a1e4c5a3d4d31b4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49118
I need this in the next stack up. It seems useful to have as a helper
function.
Test Plan: - run tests
Reviewed By: izdeby
Differential Revision: D25563546
Pulled By: zou3519
fbshipit-source-id: a4031fdc4b2373cc230ba3c66738d91dcade96e2
Summary:
Updated `qr_backward` to work correctly for complex-valued inputs.
Added `torch.qr` to list of complex tests.
The previous implementation for real-valued differentiation used equation 42 from https://arxiv.org/abs/1001.1654
The current implementation is a bit simpler but the result for the real-valued input case is the same and all tests still pass.
Derivation of complex-valued QR differentiation https://giggleliu.github.io/2019/04/02/einsumbp.html
Ref. https://github.com/pytorch/pytorch/issues/33152
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48489
Reviewed By: bdhirsh
Differential Revision: D25272344
Pulled By: albanD
fbshipit-source-id: b53c1fca1683f4aee5f4d5ce3cab9e559170e7cf
Summary:
**BC-breaking note:**
Previously, when given a complex input, `torch.linalg.norm` and `torch.norm` would return a complex output. `torch.linalg.cond` would sometimes return a complex output and sometimes return a real output when given a complex input, depending on its `p` argument. This PR changes this behavior to match `numpy.linalg.norm` and `numpy.linalg.cond`: a complex input now produces an output of the corresponding real dtype (e.g. `complex64` -> `float32`), consistent with NumPy.
**PR Summary:**
The following cases were previously unsupported for complex inputs, and this commit adds support:
- Frobenius norm
- Norm order 2 (vector and matrix)
- CUDA vector norm
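As a small illustration of the new dtype rule, the hypothetical helpers below compute a vector 2-norm and a Frobenius norm over Python `complex` values. Because each term is `abs(v) ** 2`, the result is a real `float`, mirroring how a complex tensor input now yields a real-valued norm.

```python
import math
from typing import Sequence


def vector_norm2(x: Sequence[complex]) -> float:
    """2-norm of a (possibly complex) vector; the result is always real."""
    return math.sqrt(sum(abs(v) ** 2 for v in x))


def frobenius_norm(a: Sequence[Sequence[complex]]) -> float:
    """Frobenius norm: the 2-norm of all matrix entries, again real."""
    return math.sqrt(sum(abs(v) ** 2 for row in a for v in row))
```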
Part of https://github.com/pytorch/pytorch/issues/47833
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48284
Reviewed By: H-Huang
Differential Revision: D25420880
Pulled By: mruberry
fbshipit-source-id: 11f6a2f3cad57d66476d30921c3f6ab8f3cd4017
Summary:
`torch.cholesky_solve` now works for complex inputs on GPU.
I moved the existing tests to `test_linalg.py` and modified them to test complex and float32 dtypes.
Differentiation also works correctly with complex inputs now.
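For reference, a Cholesky solve with a complex factor needs a conjugate transpose in the second triangular solve. The sketch below is a hypothetical pure-Python helper, assuming a lower factor `L` with `A = L @ L^H`. It takes the right-hand side first and the factor second, which is the argument order of `torch.cholesky_solve`.

```python
from typing import List

CMatrix = List[List[complex]]
CVector = List[complex]


def cholesky_solve(b: CVector, L: CMatrix) -> CVector:
    """Solve A x = b, given the lower Cholesky factor L of A = L @ L^H."""
    n = len(L)
    # Forward substitution: L y = b.
    y = [0j] * n
    for i in range(n):
        s = b[i] - sum(L[i][k] * y[k] for k in range(i))
        y[i] = s / L[i][i]
    # Back substitution: L^H x = y, where (L^H)[i][k] = conj(L[k][i]).
    x = [0j] * n
    for i in range(n - 1, -1, -1):
        s = y[i] - sum(L[k][i].conjugate() * x[k] for k in range(i + 1, n))
        x[i] = s / L[i][i].conjugate()
    return x
```

With `L = [[2, 0], [1j, 1]]`, `A = L @ L^H = [[4, -2j], [2j, 2]]`, and solving against `b = [6, 4j]` recovers `x = [1, 1j]`.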
Ref. https://github.com/pytorch/pytorch/issues/33152
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47047
Reviewed By: ngimel
Differential Revision: D24730020
Pulled By: mruberry
fbshipit-source-id: 95402da5789c56e5a682019790985207fa28fa1f
Summary:
This PR adds `torch.linalg.eigh` and `torch.linalg.eigvalsh` for NumPy compatibility.
The current `torch.symeig` uses (on CPU) a different LAPACK routine than NumPy (`syev` vs `syevd`). Even though it shouldn't matter in practice, `torch.linalg.eigh` uses `syevd`, as NumPy does.
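Both functions assume a Hermitian (or real symmetric) input, so the eigenvalues are real and, as in NumPy, are returned in ascending order. For a 2x2 Hermitian matrix they have a closed form. The hypothetical helper below sketches what `eigvalsh` returns in that case; it does not call LAPACK.

```python
import math
from typing import List


def eigvalsh_2x2(a: float, b: complex, d: float) -> List[float]:
    """Eigenvalues of the Hermitian matrix [[a, b], [conj(b), d]], ascending."""
    mid = (a + d) / 2
    radius = math.sqrt(((a - d) / 2) ** 2 + abs(b) ** 2)
    return [mid - radius, mid + radius]
```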
Ref https://github.com/pytorch/pytorch/issues/42666
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45526
Reviewed By: gchanan
Differential Revision: D25022659
Pulled By: mruberry
fbshipit-source-id: 3676b77a121c4b5abdb712ad06702ac4944e900a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48057
This PR fixes batched grad computation for:
- binary_cross_entropy (i.e., vmap through binary_cross_entropy_double_backward)
- symeig (i.e. vmap through symeig_backward)
It was previously impossible to vmap through those functions because
they use in-place operations in a vmap-incompatible way.
See note at
233192be73/aten/src/ATen/BatchedFallback.cpp (L117-L122)
for what it means for an in-place operation to be vmap-incompatible.
This PR adds a check: if the in-place operations in e.g. symeig are
vmap-incompatible and we are inside of a vmap, then we do the
out-of-place variant of the operation. Ditto for binary_cross_entropy.
This is to avoid code duplication: the alternative would be to register
the backward formula as an operator and change just those lines to be
out-of-place!
This PR also adds some general guidelines for what to do if an in-place
operation is vmap-incompatible.
General guidelines
------------------
If an in-place operation used in a backward formula is vmap-incompatible,
then as developers we have the following options:
- If the in-place operation directly followed the creation of a tensor with
a factory function like at::zeros(...), we should replace the factory with a
corresponding grad.new_zeros(...) call. The grad.new_zeros(...) call
propagates the batch dims to the resulting tensor.
For example:
Before: at::zeros(input.sizes(), grad.options()).copy_(grad)
After: grad.new_zeros(input.sizes()).copy_(grad)
- If the in-place operation followed some sequence of operations, and we
want to be able to vmap over the backward formula as-is (this is
usually the case for simple (<15 LOC) backward formulas), then use
inplace_is_vmap_compatible to guard the operation. For example:
c = a * b
Before: c.mul_(grad)
After: c = inplace_is_vmap_compatible(c, grad) ? c.mul_(grad) : c * grad
- If we don't want to vmap directly over the backward formula (e.g., if the
backward formula is too complicated or has a lot of vmap-incompatible
operations), then register the backward formula as an operator and eventually
write a batching rule for it.
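The rule behind `inplace_is_vmap_compatible` can be sketched with a toy model. Assumption: a tensor is modeled only by the set of vmap levels it is batched over, and an in-place write `self.op_(other)` is compatible when `self` is batched over every level that `other` is. `ToyTensor` and `mul_guarded` are hypothetical names, and the `inplace_is_vmap_compatible` below is a toy re-implementation, not the ATen function.

```python
from dataclasses import dataclass, field
from typing import FrozenSet


@dataclass(frozen=True)
class ToyTensor:
    """Hypothetical stand-in: records only which vmap levels it is batched over."""
    name: str
    batch_levels: FrozenSet[int] = field(default_factory=frozenset)


def inplace_is_vmap_compatible(self_: ToyTensor, other: ToyTensor) -> bool:
    # Writing `other` into `self_` in place is only well defined if `self_`
    # already has room for every batch level that `other` carries.
    return other.batch_levels <= self_.batch_levels


def mul_guarded(c: ToyTensor, grad: ToyTensor) -> str:
    # Mirrors: c = inplace_is_vmap_compatible(c, grad) ? c.mul_(grad) : c * grad
    if inplace_is_vmap_compatible(c, grad):
        return "in-place c.mul_(grad)"
    return "out-of-place c * grad"
```

Under vmap, `grad` is batched but an intermediate like `c` may not be, so the guard falls back to the out-of-place product; outside vmap neither tensor is batched and the in-place path is kept.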
Test Plan
---------
New tests
Test Plan: Imported from OSS
Reviewed By: zhangguanheng66
Differential Revision: D25069525
Pulled By: zou3519
fbshipit-source-id: e0dfeb5a812f35b7579fc6ecf7252bf31ce0d790
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47227
Motivation
----------
We would like to compute batched gradients for view+inplace operations.
This most notably shows up in the internal implementations of operations.
For example, many view backward functions (SelectBackward, DiagonalBackward)
are implemented with view+inplace, so to support vectorized hessian
computation for e.g. torch.select and torch.diagonal we would need a
way to handle or work around view+inplace.
Approach
--------
view+inplace creates a CopySlices node and transmutes the view's backward
node into an AsStridedBackward node. For example,
```
import torch

leaf = torch.randn(4, 5, requires_grad=True)
base = leaf * leaf
view = base[0]
view.cos_()  # in-place op on a view of a non-leaf tensor
```
base.grad_fn is CopySlices and view.grad_fn is AsStridedBackward.
To support vmap over CopySlices and AsStridedBackward:
- We use `new_empty_strided` instead of `empty_strided` in CopySlices
so that the batch dims get propagated
- We use `new_zeros` inside AsStridedBackward so that the batch dims get
propagated.
Test Plan
---------
- New tests. When we get closer to having most operations support batched
grad computation via vmap, I'd like to add it as an option to gradcheck
and turn it on for our tests.
Test Plan: Imported from OSS
Reviewed By: kwanmacher, glaringlee
Differential Revision: D24741687
Pulled By: zou3519
fbshipit-source-id: 8210064f782a0a7a193752029a4340e505ffb5d8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46674
Summary
-------
This adds batched gradient support (i.e., vmap through the gradient
formulas) for the overloads of Tensor.max(), Tensor.min(), and Tensor.median()
that use evenly_distribute_backward as their backward formula.
Previously, the plan was to register incompatible gradient formulas as
backward operators (see #44052). However, it turns out that we can just use
`new_zeros` to get around some incompatible gradient formulas (see next
section for discussion).
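For context, `evenly_distribute_backward` is the gradient for full reductions such as `max()`, `min()`, and `median()`, where several elements may tie for the result. A minimal pure-Python sketch of the rule, assuming the upstream gradient is split equally among all elements equal to the result (with a NaN result matched by NaN inputs):

```python
import math
from typing import List


def evenly_distribute_backward(grad: float, inputs: List[float],
                               value: float) -> List[float]:
    """Split `grad` evenly over every element of `inputs` equal to `value`."""
    if math.isnan(value):
        # NaN != NaN, so match NaN results against NaN inputs explicitly.
        mask = [math.isnan(v) for v in inputs]
    else:
        mask = [v == value for v in inputs]
    count = sum(mask)  # number of tied elements
    return [grad / count if hit else 0.0 for hit in mask]
```

For `inputs = [3.0, 1.0, 3.0]` and a max of `3.0`, an upstream gradient of `1.0` becomes `[0.5, 0.0, 0.5]`.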
Context: the vmap+inplace problem
---------------------------------
A lot of backwards functions are incompatible with BatchedTensor due to
using in-place operations. Sometimes we can allow the in-place
operations, but other times we can't. For example, consider select_backward:
```
Tensor select_backward(const Tensor& grad, IntArrayRef input_sizes,
                       int64_t dim, int64_t index) {
  auto grad_input = at::zeros(input_sizes, grad.options());
  grad_input.select(dim, index).copy_(grad);
  return grad_input;
}
```
and consider the following code:
```
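import torch
from torch import vmap

B0 = 3  # batch size; any positive integer works
x = torch.randn(5, requires_grad=True)

def select_grad(v):
    # Vector-Jacobian product of x[0] w.r.t. x, with v as the upstream grad
    return torch.autograd.grad(x[0], x, v)

vs = torch.randn(B0)
batched_grads = vmap(select_grad)(vs)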
```
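For the batched gradient use case, grad is a BatchedTensor.
The physical version of grad has size (B0,).
However, select_backward creates a grad_input of shape (5,), which is not
batched, and tries to copy grad to a slice of it. A single unbatched slot
cannot hold B0 different gradient values, so this in-place copy_ is
vmap-incompatible.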
Up until now, the proposal to handle this has been to register these
backward formulas as operators so that vmap doesn’t actually see the
`copy_` calls (see #44052). However, it turns out we can actually just
use `new_zeros` to construct a new Tensor that has the same
"batched-ness" as grad:
```
auto grad_input = grad.new_zeros(input_sizes);
grad_input.select(dim, index).copy_(grad);
```
We should use this for simple backward functions. For more complicated
backward functions where this solution doesn't work, we should register
those as operators.
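The effect of `new_zeros` can be modeled without PyTorch. In this hypothetical 1-D-only sketch, the physical `grad` is a list of `B0` scalars. Allocating one zero row per batch element (what `grad.new_zeros(input_sizes)` amounts to for a batched `grad`) gives every batch element its own slot, so the `select(...).copy_(grad)` step becomes well defined.

```python
from typing import List


def select_backward_batched(grad_batch: List[float], input_size: int,
                            index: int) -> List[List[float]]:
    """Toy model of select_backward under vmap (1-D input, dim=0 only).

    grad_batch is the *physical* grad: one scalar per batch element, i.e.
    shape (B0,). The result has physical shape (B0, input_size).
    """
    # Analogue of grad.new_zeros(input_sizes): one zero row per batch element.
    grad_input = [[0.0] * input_size for _ in grad_batch]
    # Analogue of grad_input.select(dim, index).copy_(grad), per batch element.
    for b, g in enumerate(grad_batch):
        grad_input[b][index] = g
    return grad_input
```

With `at::zeros(input_sizes, ...)` there would be only a single row of length 5, with nowhere to put the other `B0 - 1` values.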
Alternatives
------------
Option 2: Register `evenly_distribute_backward` as an operator and have the
vmap fallback run it in a loop.
- This requires more LOC changes.
- Furthermore, we'd have to write an efficient batching rule for
`evenly_distribute_backward` in the future.
- If we use `new_zeros` instead, we don't need to write an efficient
batching rule for `evenly_distribute_backward` as long as the
constituents of `evenly_distribute_backward` have efficient batching rules.
Option 3: Have factory functions behave differently if they are called
inside vmap.
- For example, `at::zeros(3, 5)` could return a Tensor of shape
`(B0, B1, 3, 5)` if we are vmapping over two dimensions with size B0 and B1.
This requires maintaining some global and/or thread-local state about
the sizes of the dims being vmapped over, which can be tricky.
And more...
Future
------
- I will undo some of the work I’ve done in the past to move backward
functions to being operators (#44052, #44408). The simpler backward
functions (like select backward) can just use Tensor.new_zeros.
I apologize for the thrashing.
- Include a NOTE about the vmap+inplace problem somewhere in the
codebase. I don't have a good idea of where to put it at the moment.
Test Plan
---------
- New tests
Test Plan: Imported from OSS
Reviewed By: gchanan
Differential Revision: D24456781
Pulled By: zou3519
fbshipit-source-id: 9c6c8ee2cb1a4e25afd779bdf0bdf5ab76b9bc20
Summary:
Updated `cholesky_backward` to work correctly for complex input.
Note that the current implementation gives the conjugate of what JAX would return. anjali411, is that the correct thing to do?
Ref. https://github.com/pytorch/pytorch/issues/44895
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45267
Reviewed By: bwasti
Differential Revision: D23975269
Pulled By: anjali411
fbshipit-source-id: 9908b0bb53c411e5ad24027ff570c4f0abd451e6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45280
Performance is unchanged on CPU, and on CUDA it is only 1-1.05x slower. This change is necessary for the upcoming NaN ops, including nan(min|max|median).
Test Plan: Imported from OSS
Reviewed By: gchanan
Differential Revision: D23908796
Pulled By: heitorschueroff
fbshipit-source-id: c2b57acbe924cfa59fbd85216811f29f4af05088
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44433
Not entirely sure why, but changing the type of beta from `float` to `double` in autocast_mode.cpp and FunctionsManual.h fixes my compiler errors; the build now fails at link time instead.
fixing some type errors, updated fn signature in a few more files
removing my usage of Scalar, making beta a double everywhere instead
Test Plan: Imported from OSS
Reviewed By: mrshenli
Differential Revision: D23636720
Pulled By: bdhirsh
fbshipit-source-id: caea2a1f8dd72b3b5fd1d72dd886b2fcd690af6d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39955
resolves https://github.com/pytorch/pytorch/issues/36323 by adding `torch.sgn` for complex tensors.
`torch.sgn` returns `x/abs(x)` for `x != 0` and returns `0 + 0j` for `x==0`
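A direct pure-Python transcription of that definition, as a sketch over Python `complex` scalars (`sgn` here is a hypothetical standalone helper, not `torch.sgn` itself):

```python
def sgn(x: complex) -> complex:
    """Complex sign: x / |x| for x != 0, and 0 + 0j for x == 0."""
    if x == 0:
        return 0j
    return x / abs(x)
```

For real inputs this reduces to the usual sign, and for nonzero complex inputs the result always lies on the unit circle.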
This PR doesn't test the correctness of the gradients. That will be done as part of auditing all the ops in the future, once we decide on the autograd behavior (JAX vs TF) and add gradcheck.
Test Plan: Imported from OSS
Reviewed By: mruberry
Differential Revision: D23460526
Pulled By: anjali411
fbshipit-source-id: 70fc4e14e4d66196e27cf188e0422a335fc42f92
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43208
This PR adds gradcheck for complex. The logic used for complex gradcheck is described in Section 3.5.3 here: https://arxiv.org/pdf/1701.00392.pdf
More concretely, this PR introduces the following changes:
1. Updates get_numerical_jacobian to take a scalar value for the vector (v) as input. Adds gradcheck logic for C -> C, C -> R, and R -> C functions. For R -> C functions, only the real part of the gradient is propagated.
2. Adds backward definition for `torch.complex` and also adds a test to verify the definition added.
3. Updates backward for `mul`, `sin`, `cos`, `sinh`, `cosh`.
4. Adds tests for `torch.real`, `torch.imag`, `torch.view_as_real`, `torch.view_as_complex`, `torch.conj`.
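Numerical checks for complex functions typically rest on Wirtinger derivatives. The sketch below is a standalone illustration of that idea, not the code added to `get_numerical_jacobian`. It estimates df/dz and df/dconj(z) by central differences along the real and imaginary axes, using df/dz = (df/dx - i df/dy) / 2 and df/dconj(z) = (df/dx + i df/dy) / 2 with z = x + iy.

```python
from typing import Callable, Tuple


def wirtinger_derivatives(f: Callable[[complex], complex], z: complex,
                          eps: float = 1e-6) -> Tuple[complex, complex]:
    """Estimate (df/dz, df/dconj(z)) at z with central differences."""
    df_dx = (f(z + eps) - f(z - eps)) / (2 * eps)              # along real axis
    df_dy = (f(z + 1j * eps) - f(z - 1j * eps)) / (2 * eps)    # along imag axis
    return (df_dx - 1j * df_dy) / 2, (df_dx + 1j * df_dy) / 2
```

For the holomorphic `f(z) = z * z` this gives approximately `(2z, 0)`, while for `f(z) = conj(z)` it gives approximately `(0, 1)`.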
Follow up tasks:
1. Add more thorough tests for R -> C cases. Specifically, add R -> C test variants for functions, e.g., `torch.mul(complex_tensor, real_tensor)`.
2. Add back commented test in `common_methods_invocations.py`.
3. Add more special case checking for complex gradcheck to make debugging easier.
4. Update complex autograd note.
5. Disable complex autograd for operators not tested for complex.
Test Plan: Imported from OSS
Reviewed By: zou3519
Differential Revision: D23655088
Pulled By: anjali411
fbshipit-source-id: caa75e09864b5f6ead0f988f6368dce64cf15deb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44410
See #44052 for context. One of the cumprod_backward overloads was unused
so I just deleted it.
Test Plan: - `pytest test/test_autograd.py -v`
Reviewed By: mrshenli
Differential Revision: D23605503
Pulled By: zou3519
fbshipit-source-id: f9c5b595e62d2d6e71f26580ba96df15cc9de4f7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44052
Summary
=======
This PR registers the following backwards functions as operators:
- slice_backward
- select_backward
- gather_backward
- index_select_backward (the backward function for index_select)
- select_index_backward (previously known as index_select_backward, but is actually the backward function for max.dim, min.dim, etc.)
In the future, I'd like to register more backward functions as operators
so that we can write batching rules for the backward functions. Batching
rules for backward functions makes it so that we can compute batched
gradients.
Motivation
==========
The rationale behind this PR is that a lot of backwards functions (27 in total)
are incompatible with BatchedTensor due to using in-place operations.
Sometimes we can allow the in-place operations, but other times we can't.
For example, consider select_backward:
```
Tensor select_backward(const Tensor& grad, IntArrayRef input_sizes,
                       int64_t dim, int64_t index) {
  auto grad_input = at::zeros(input_sizes, grad.options());
  grad_input.select(dim, index).copy_(grad);
  return grad_input;
}
```
and consider the following code:
```
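import torch
from torch import vmap

B0 = 3  # batch size; any positive integer works
x = torch.randn(5, requires_grad=True)

def select_grad(v):
    # Vector-Jacobian product of x[0] w.r.t. x, with v as the upstream grad
    return torch.autograd.grad(x[0], x, v)

vs = torch.randn(B0)
batched_grads = vmap(select_grad)(vs)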
```
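For the batched gradient use case, `grad` is a BatchedTensor.
The physical version of `grad` has size `(B0,)`.
However, select_backward creates a `grad_input` of shape `(5,)`, which is
not batched, and tries to copy `grad` to a slice of it. A single unbatched
slot cannot hold `B0` different gradient values, so this in-place `copy_`
is vmap-incompatible.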
Other approaches
================
I've considered the following:
- register select_backward as an operator (this PR)
- have a branch inside select_backward for if `grad` is batched.
- this is OK, but what if we have more tensor extensions that want to override this?
- modify select_backward to work with BatchedTensor, by creating a new operator for the "select + copy_ behavior".
- select + copy_ isn't used elsewhere in derivative formulas so this doesn't seem useful
Test Plan
=========
- `pytest test/test_autograd.py -v`
- Registering backward functions may impact performance. I benchmarked
select_backward to see if registering it as an operator led to any noticeable
performance overhead: https://gist.github.com/zou3519/56d6cb53775649047b0e66de6f0007dc.
The TL;DR is that the overhead is pretty minimal.
Test Plan: Imported from OSS
Reviewed By: ezyang, fbhuba
Differential Revision: D23481183
Pulled By: zou3519
fbshipit-source-id: 125af62eb95824626dc83d06bbc513262ee27350
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43711
This makes them available in the forward pass if needed.
No change to the file content, just a copy-paste.
Test Plan: Imported from OSS
Reviewed By: mrshenli
Differential Revision: D23454146
Pulled By: albanD
fbshipit-source-id: 6269a4aaf02ed53870fadf8b769ac960e49af195