Commit Graph

371 Commits

kshitij12345
ffd76d11c9 [fix] take : backward batching rule (#95772)
Fixes https://github.com/pytorch/pytorch/issues/95738

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95772
Approved by: https://github.com/zou3519
2023-03-30 17:18:17 +00:00
Li-Huai (Allan) Lin
7776653a0c Add linear gradgrad (#97151)
Fixes #92206
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97151
Approved by: https://github.com/albanD
2023-03-30 07:25:02 +00:00
Edward Z. Yang
32fdd44577 SymIntify maybe_multiply (#97675)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97675
Approved by: https://github.com/albanD
2023-03-27 23:20:23 +00:00
Joel Schlosser
0d66db1b2a Implement last dim split_with_sizes for NT (forward only, non-SymInt-ified) (#97446)
This is needed for the HSTU model.

Details:
* ~~NT `chunk` now calls into NT `split_with_sizes` since the latter is more general~~ (removed; they're totally separate)
* Throws for backward
* Only operates over the last dim (`dim=-1`)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97446
Approved by: https://github.com/cpuhrsch
2023-03-23 22:17:06 +00:00
Driss Guessous
98a5cf090d [SDPA] Remove the chunk_grad from mem-eff attention (#96880)
# Summary

There exists an optimization within the scaled_dot_product_efficient backward attention path to, under the right conditions, output grad_q, grad_k, grad_v all as aliases of the same storage. This was done to optimize for the hot path where mha does packed linear_projection -> chunk -> (view stuff) -> sdpa. The thought was that `chunk.backward()` would then be able to "trivially" cat its inputs. However, upon closer inspection, chunk.backward will call `cat` regardless of the inputs, so this is not being utilized.

I validated this by profiling on main and then on this branch: the traces were the same, with `split.backward()` calling into cat in both cases.
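For context, a minimal sketch of the hot path described above (packed projection -> chunk -> SDPA), with hypothetical shapes rather than the profiled model:

```python
import torch
import torch.nn.functional as F

B, S, E, H = 2, 8, 64, 4                        # hypothetical batch/seq/embed/heads
x = torch.randn(B, S, E, requires_grad=True)
w = torch.randn(3 * E, E, requires_grad=True)   # packed q/k/v projection weight
qkv = F.linear(x, w)                            # (B, S, 3*E)
q, k, v = qkv.chunk(3, dim=-1)                  # three views of the same storage
q, k, v = (t.view(B, S, H, E // H).transpose(1, 2) for t in (q, k, v))
out = F.scaled_dot_product_attention(q, k, v)   # (B, H, S, E//H)
out.sum().backward()  # chunk's backward calls `cat` on the three grads either way
```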

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96880
Approved by: https://github.com/cpuhrsch
2023-03-17 21:28:25 +00:00
Driss Guessous
7ec0d6f006 Moves SDPA backward helper native function to functionsmanual.cpp (#95821)
## Summary
chunk_grad_outputs should have been created within FunctionsManual.cpp to begin with. This removes it as a native function and moves it to its appropriate home.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95821
Approved by: https://github.com/cpuhrsch, https://github.com/albanD
2023-03-14 17:49:07 +00:00
Nikita Shulga
82daf98151 [Sparse] Move SparseTensorUtils.* to native/ (#96696)
Fixes an internal linking problem after `DECLARE_DISPATCH` was introduced in SparseTensorUtils.cpp but implemented inside the native library.

Also, fixes a `sign-unsigned` compare in `_flatten_indices_impl`.
Follow-ups:
* Move code declared/implemented in `SparseTensorUtils.*` to the `at::native` namespace
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96696
Approved by: https://github.com/albanD
2023-03-14 02:56:52 +00:00
Kazuaki Ishizaki
69aa6b4bb9 fix typo in comments under torch/csrc/autograd (#96061)
This PR fixes typos in comments of `.cpp` and `.h` files under the `torch/csrc/autograd` directory.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96061
Approved by: https://github.com/soulitzer
2023-03-06 18:05:14 +00:00
Masaki Kozuki
49f6849f58 Fix codegen logic for foreach derivatives (#95263)
follow-up https://github.com/pytorch/pytorch/pull/93901.

Unexpected numerical mismatches observed in some foreach functions' backward results seemed to be caused by the wrong order of `IndexRangeGenerator::range` calls.
This PR makes `args_with_derivatives` follow the same (or a similar) order as `foreach_native_function.func.arguments.flat_non_out`.

---

what the current master generates for `_foreach_mul.List`:
```cpp
variable_list ForeachMulBackward0List::apply(variable_list&& grads) {
  std::lock_guard<std::mutex> lock(mutex_);
  TORCH_CHECK(!other_released_, ERR_BACKWARD_TWICE);
  TORCH_CHECK(!self_released_, ERR_BACKWARD_TWICE);
  IndexRangeGenerator gen;
  auto other_ix = gen.range(other_size_);
  auto self_ix = gen.range(self_size_);
  variable_list grad_inputs(gen.size());
  auto other = unpack_list(other_);
  auto self = unpack_list(self_);
  if (task_should_compute_output({ other_ix })) {
    std::vector<Tensor> grad_result;
    grad_result.reserve(grads.size());
    for (const auto & i : c10::irange(grads.size())) {
      grad_result.emplace_back(mul_tensor_backward(grads[i], self[i], other[i].scalar_type()));
    }
    copy_range(grad_inputs, other_ix, grad_result);
  }
  if (task_should_compute_output({ self_ix })) {
    std::vector<Tensor> grad_result;
    grad_result.reserve(grads.size());
    for (const auto & i : c10::irange(grads.size())) {
      grad_result.emplace_back(mul_tensor_backward(grads[i], other[i], self[i].scalar_type()));
    }
    copy_range(grad_inputs, self_ix, grad_result);
  }
  return grad_inputs;
}
```

with this PR the generated backward is
```cpp
variable_list ForeachMulBackward0List::apply(variable_list&& grads) {
  std::lock_guard<std::mutex> lock(mutex_);
  TORCH_CHECK(!self_released_, ERR_BACKWARD_TWICE);
  TORCH_CHECK(!other_released_, ERR_BACKWARD_TWICE);
  IndexRangeGenerator gen;
  auto self_ix = gen.range(self_size_);                                         <----- diff
  auto other_ix = gen.range(other_size_);                                       <----- diff
  variable_list grad_inputs(gen.size());
  auto self = unpack_list(self_);
  auto other = unpack_list(other_);
  if (task_should_compute_output({ other_ix })) {
    std::vector<Tensor> grad_result;
    grad_result.reserve(grads.size());
    for (const auto & i : c10::irange(grads.size())) {
      grad_result.emplace_back(mul_tensor_backward(grads[i], self[i], other[i].scalar_type()));
    }
    copy_range(grad_inputs, other_ix, grad_result);
  }
  if (task_should_compute_output({ self_ix })) {
    std::vector<Tensor> grad_result;
    grad_result.reserve(grads.size());
    for (const auto & i : c10::irange(grads.size())) {
      grad_result.emplace_back(mul_tensor_backward(grads[i], other[i], self[i].scalar_type()));
    }
    copy_range(grad_inputs, self_ix, grad_result);
  }
  return grad_inputs;
}

```

The change fixes the order of `self_ix` and `other_ix`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95263
Approved by: https://github.com/soulitzer
2023-03-04 20:03:54 +00:00
Edward Z. Yang
fb10e66d35 Bulk convert numel() to sym_numel() in FunctionsManual (#95543)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95543
Approved by: https://github.com/ngimel, https://github.com/Skylion007
2023-02-27 13:46:13 +00:00
Peter Bell
bc438af6fe std/var: support floating point correction value (#94073)
Ref https://github.com/pytorch/pytorch/issues/61492#issuecomment-1413003480

The array API specifies `correction` to be `Union[int, float]`, while we currently only support integers.
https://data-apis.org/array-api/latest/API_specification/generated/array_api.std.html

As std/var is currently calculated, the final division by the element count is already done
in floating point, so we can make the correction floating point without any loss
of precision or generality.
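
A small usage sketch, assuming the float `correction` overload from this PR:

```python
import torch

x = torch.randn(100)
v = torch.var(x, correction=0.5)  # float correction; previously only ints were accepted
# equivalent manual computation: sum((x - mean)^2) / (N - correction)
manual = (x - x.mean()).pow(2).sum() / (x.numel() - 0.5)
torch.testing.assert_close(v, manual)
```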

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94073
Approved by: https://github.com/ezyang
2023-02-23 05:50:45 +00:00
Li-Huai (Allan) Lin
b6a1c238bd [MPS] Remove mps specialized path in BCE backward (#95220)
Remove the MPS specialized path in BCE backward, as the `logit` op has been implemented for MPS.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95220
Approved by: https://github.com/soulitzer
2023-02-22 19:43:53 +00:00
kshitij12345
311b20aae1 [fix] torch.pow handle real negative base and complex exponent (#95198)
Fixes https://github.com/pytorch/pytorch/issues/89903 https://github.com/pytorch/pytorch/issues/95111

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95198
Approved by: https://github.com/albanD, https://github.com/ngimel
2023-02-21 18:36:20 +00:00
min-jean-cho
92f3feabaa fix torch.var backward when n==correction (#94546)
Fixes #94184

This PR, as discussed in [this comment](https://github.com/pytorch/pytorch/issues/94184#issuecomment-1422128166), returns `x.grad` with the same shape as `x`, filled with `NaN`, when the gradient of `torch.var(unbiased=True)` is `NaN`. The gradient of the unbiased variance is `NaN` (undefined: division by zero in the denominator `N-1`, where `N` is the number of samples) when `N` is 1, i.e. there is only one sample (the product of the dims is 1, e.g. shapes like `[1]` or `[1,...,1]`).
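A minimal sketch illustrating the behavior described above, after this fix:

```python
import torch

x = torch.tensor([2.0], requires_grad=True)  # a single sample: N == 1
y = torch.var(x, unbiased=True)              # divides by N - 1 == 0 -> NaN
y.backward()
print(y)       # tensor(nan, grad_fn=...)
print(x.grad)  # tensor([nan]): same shape as x, filled with NaN
```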
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94546
Approved by: https://github.com/soulitzer
2023-02-13 23:38:38 +00:00
Yanming Wang
9bef1ebb9e Fix div by fp64 scalar issue on xla device (#94459)
This PR fixes https://github.com/pytorch/xla/issues/4574. I'll create a separate test PR in pytorch/xla repo.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94459
Approved by: https://github.com/ezyang
2023-02-10 17:57:47 +00:00
cyy
1a32db15e7 Some performance fixes (#94034)
Applies some performance fixes

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94034
Approved by: https://github.com/Skylion007
2023-02-04 02:17:48 +00:00
Nikita Vedeneev
b484d17c24 _sparse_coo_tensor_with_dims_and_tensors backward: simplify and optimize (#91704)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91704
Approved by: https://github.com/albanD, https://github.com/cpuhrsch
2023-02-01 09:02:25 +00:00
cyy
4d51c8532c Some simple fixes (#93221)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/93221
Approved by: https://github.com/Skylion007
2023-01-30 05:14:03 +00:00
Aaron Gokaslan
0247ed27cc Apply Clang-Tidy readability-container-size-empty (#93236)
Not only is this change usually shorter and more readable, it can also yield better performance: size() is not always a constant-time operation (e.g., on linked lists), but empty() always is.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/93236
Approved by: https://github.com/malfet
2023-01-29 23:28:19 +00:00
mfkasim1
75cfc0be21 Logcumsumexp for CPU (#93153)
Partial work from #90847, in the direction of solving #89205.
Most of the content is from #90847, but this is only for CPU, so hopefully it does not increase the build time by a lot.

tag: @albanD, @malfet

Pull Request resolved: https://github.com/pytorch/pytorch/pull/93153
Approved by: https://github.com/malfet, https://github.com/Skylion007
2023-01-27 22:29:33 +00:00
cyy
e292ddff4e More clang-tidy fixes (#92944)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92944
Approved by: https://github.com/Skylion007
2023-01-25 19:11:51 +00:00
PyTorch MergeBot
9b23fd378f Revert "Logcumsumexp for complex in CPU and CUDA (#90847)"
This reverts commit 64985123e4.

Reverted https://github.com/pytorch/pytorch/pull/90847 on behalf of https://github.com/malfet due to Reverting to decrease build time, let's discuss the alternatives here
2023-01-24 20:49:08 +00:00
Aaron Gokaslan
8c8cd9539d Add missing moves to torch autograd (#92772)
Applies std::move to some additional opportunities in torch/csrc/autograd that were found via static analysis.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92772
Approved by: https://github.com/ezyang
2023-01-24 02:01:52 +00:00
Nikita Vedeneev
9f381c9b7f sparse_sparse_matmul: simplify backward (#91712)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91712
Approved by: https://github.com/albanD
2023-01-23 19:24:28 +00:00
mfkasim1
64985123e4 Logcumsumexp for complex in CPU and CUDA (#90847)
Another PR towards solving #89205.
What's in this PR:

* The implementation of forward `logcumsumexp` for complex numbers in CPU & CUDA
* The tests on forward call of `logcumsumexp` for complex numbers
* The implementation of backward `logcumsumexp` for complex numbers

What's missing:

* The test on the backward gradient of `logcumsumexp`: it complains `RuntimeError: logcumsumexp does not support automatic differentiation for outputs with complex dtype.`, and I don't know how to solve the error or where to put the test for the backward computation. If possible, I'd like this to be done in this PR.

It's really tricky to handle the edge cases here (i.e. the ones involving `inf`), but I've tried my best to put some comments explaining the reasoning behind my decisions in this PR.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90847
Approved by: https://github.com/albanD
2023-01-20 15:10:50 +00:00
Peter Bell
4058dedf21 Replace log(1 + x) with log1p(x) (#92114)
`log1p` offers better precision near zero since `(1 + x) - 1` truncates any
values less than the float epsilon to zero. For `soft_margin_loss` this also
requires one fewer kernel invocation, which for numel=1e7 gives me a 1.2x speedup
on CUDA and a 1.1x speedup on CPU.
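A quick illustration of the precision difference near zero (float32):

```python
import torch

x = torch.tensor(1e-10, dtype=torch.float32)
print(torch.log(1 + x))  # 0.0: 1 + 1e-10 rounds to 1.0 in float32
print(torch.log1p(x))    # ~1e-10: accurate near zero
```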

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92114
Approved by: https://github.com/ngimel, https://github.com/lezcano
2023-01-18 10:43:56 +00:00
Peter Bell
fb1427ea8f squeeze: allow squeezing multiple dimensions at once (#89017)
Ref #70924

This addresses part 1 of the issue, allowing `torch.squeeze` to be
passed a tuple of dimensions. e.g.
```python
x.squeeze(0).squeeze(0)
```
can now be written
```python
x.squeeze((0, 1))
```
(assuming x has at least 2 dimensions)
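
A runnable version of the example, assuming the tuple overload added here:

```python
import torch

x = torch.randn(1, 1, 3)
assert x.squeeze((0, 1)).shape == (3,)        # new tuple-of-dims form
assert x.squeeze(0).squeeze(0).shape == (3,)  # equivalent chained form
```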

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89017
Approved by: https://github.com/albanD
2023-01-17 14:20:15 +00:00
David Berard
d7dc1c2fd5 Support zero dimensions in softmax decompositions (#91322)
The eager implementation of softmax supports computation along zero dimensions, but many of the other implementations did not, including:
* decompositions & refs (this was causing dynamo failures)
* forward AD for logsumexp
* MPS log_softmax_backward

This PR handles the `input.numel() == 0` cases separately to avoid running `amax()`, which fails for zero dimensions, and updates opinfos.

example of "computation along zero dimensions":

```python
# example where the eager path passes but the refs path fails
import torch

t = torch.rand((4, 0, 0))
print("~")
print(torch.nn.functional.softmax(t, dim=-1))  # this passes
print("~")
torch._refs.softmax(t, dim=-1)  # this fails
print("~")
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91322
Approved by: https://github.com/lezcano
2023-01-11 09:35:43 +00:00
PyTorch MergeBot
df4b3b13bc Revert "squeeze: allow squeezing multiple dimensions at once (#89017)"
This reverts commit e26cb06681.

Reverted https://github.com/pytorch/pytorch/pull/89017 on behalf of https://github.com/mehtanirav due to Internal breakages
2023-01-05 19:25:08 +00:00
Peter Bell
e26cb06681 squeeze: allow squeezing multiple dimensions at once (#89017)
Ref #70924

This addresses part 1 of the issue, allowing `torch.squeeze` to be
passed a tuple of dimensions. e.g.
```python
x.squeeze(0).squeeze(0)
```
can now be written
```python
x.squeeze((0, 1))
```
(assuming x has at least 2 dimensions)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89017
Approved by: https://github.com/albanD
2023-01-04 14:40:56 +00:00
lezcano
d5163f5206 Fix NumPy broadcasting in lstsq_backward (#91460)
Fixes https://github.com/pytorch/pytorch/issues/77225

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91460
Approved by: https://github.com/albanD
2022-12-30 10:49:20 +00:00
lezcano
051d16a2f7 Fix NumPy-compat broadcasting in the derivative of linalg.solve (#91456)
Fixes https://github.com/pytorch/pytorch/issues/89761
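
A small sketch (hypothetical shapes) of the NumPy-style broadcasting case whose derivative this touches:

```python
import torch

A = torch.randn(2, 3, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)  # broadcast against the batch of matrices
x = torch.linalg.solve(A, b)            # shape (2, 3)
x.sum().backward()
print(A.grad.shape, b.grad.shape)       # grads follow the original (pre-broadcast) shapes
```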

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91456
Approved by: https://github.com/albanD
2022-12-30 10:49:20 +00:00
lezcano
484dd40022 Implement PReLU in a compositional way (#91238)
The PReLU implementation was all over the place. This led to a number
of bugs like https://github.com/pytorch/pytorch/issues/68760. We fix it by:
- Keeping the weird broadcasting logic it has as a CompositeImplicit kernel that calls into a second kernel
- This second kernel is just a good-ol' pointwise kernel.
- We implement the derivative for the pointwise kernel via TI as well for speed.
- We implement the second derivative for the pointwise kernel and the forward AD derivatives compositionally

This fixes a number of issues:
- We don't perform copies any more when the inputs are not contiguous
- The derivatives are now correct
- We fix vmap and many other functorch-related issues.
- CPU and CUDA now share the relevant broadcasting logic
- The implementation is about 1/3 the length.
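
A rough sketch (not PyTorch's actual kernels) of the compositional structure described above: a thin broadcasting wrapper around a pointwise op.

```python
import torch

def prelu_pointwise(x, weight):
    # pointwise rule: x if x >= 0 else weight * x
    return torch.where(x >= 0, x, weight * x)

def prelu(x, weight):
    # PReLU's per-channel weight broadcasts over dim 1 when x has >= 2 dims
    if weight.numel() != 1 and x.ndim >= 2:
        shape = [1] * x.ndim
        shape[1] = weight.numel()
        weight = weight.view(shape)
    return prelu_pointwise(x, weight)

x = torch.randn(2, 3, 4, requires_grad=True)
w = torch.randn(3, requires_grad=True)
prelu(x, w).sum().backward()  # autograd flows through the pointwise formulation
```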

Fixes https://github.com/pytorch/pytorch/issues/68760
Fixes https://github.com/pytorch/pytorch/issues/89895

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91238
Approved by: https://github.com/kshitij12345, https://github.com/jbschlosser, https://github.com/albanD
2022-12-30 10:42:30 +00:00
lezcano
5b223c43ec Avoid calling allclose in the backward if there are tensor subclasses (#91444)
`allclose` is data-dependent (it returns a bool), so it does not play well
with functorch. We skip that check in the context of subclasses
to avoid hard errors.
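
A rough illustration of why a data-dependent check is problematic under functorch (using `torch.func.vmap`; this is not the code changed in this PR):

```python
import torch
from torch.func import vmap

def check(a, b):
    return torch.allclose(a, b)  # returns a Python bool: data-dependent

a, b = torch.randn(5, 3), torch.randn(5, 3)
print(check(a[0], b[0]))   # fine on plain tensors
# vmap(check)(a, b)        # errors: the bool would depend on per-sample data
```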

Partially fixes https://github.com/pytorch/pytorch/issues/90499

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91444
Approved by: https://github.com/albanD
2022-12-28 19:12:50 +00:00
Nikita Karetnikov
cc11edb084 [aot_autograd] symintify logsumexp (#91442)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91442
Approved by: https://github.com/albanD
2022-12-28 18:06:26 +00:00
Nikita Vedeneev
3870a9e28d to_sparse_XXX: backward support (#90281)
As per title. Fixes https://github.com/pytorch/pytorch/issues/85226

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90281
Approved by: https://github.com/cpuhrsch, https://github.com/soulitzer
2022-12-14 09:05:17 +00:00
soulitzer
98a9235dce Fix prelu ref when a.ndim < 2 (#89809)
Fixes https://github.com/pytorch/pytorch/issues/89560

Previously the test case for "input is 1-D or scalar + weight is not scalar" did not exist; adding it introduced some failures:
- forward AD (fixed in this PR)
- vmap (filed https://github.com/pytorch/pytorch/issues/89895)
- ref/meta (fixed in this PR, though this also regresses nvFuser support)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89809
Approved by: https://github.com/ngimel
2022-12-12 23:55:31 +00:00
Aaron Gokaslan
7541c9f8be [Fix]: remove unnecessary copies in aten, c10, and torch bindings (#90629)
Applies various automated fixes that reduce the number of spurious copies in torch, aten, and c10. I also inlined any default dtors that would have made the type trivially destructible.

Follow up to #89000

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90629
Approved by: https://github.com/ezyang
2022-12-12 17:05:52 +00:00
Richard Zou
4b1053497c [vmap] Prepend "legacy" to files for old vmap implementation (#90324)
We have an older torch.vmap implementation. It is no longer supported.
It still needs to exist somewhere for the sake of BC with
torch.autograd.functional.

This PR makes it clear which files implement the old vmap. I've seen a
couple of recent PRs adding support to the old vmap implementation, so
this should lessen the confusion.

Test Plan:
- CI
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90324
Approved by: https://github.com/samdow
2022-12-07 18:46:15 +00:00
Nikita Karetnikov
4cb6bbbe27 Symintify embedding (#89327)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89327
Approved by: https://github.com/ezyang
2022-11-24 03:25:00 +00:00
Andrew M. James
a41f70603a Round out rad2deg sparse support (#88442)
- Add sparse coo dispatch
- Modify backward to work with sparse compressed layouts
- Enable sparse_compressed autograd testing
- Correct layout support attributes on OpInfo

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88442
Approved by: https://github.com/cpuhrsch
2022-11-17 06:00:23 +00:00
Kazuaki Ishizaki
e0c194f10b Fix typos in messages under torch (#88961)
This PR fixes typos in messages and params in C++ source and header files under the `torch` directory.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88961
Approved by: https://github.com/albanD
2022-11-14 19:06:41 +00:00
Brian Hirsh
a16ced03c9 reland "fix as_strided_scatter_backward (#87646)" (#88342)
This reverts commit 71fb763e54.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88342
Approved by: https://github.com/zou3519
2022-11-07 15:00:58 +00:00
Andrew M. James
ff6770a9a1 enable backward for log1p (sparse layouts) (#88155)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88155
Approved by: https://github.com/cpuhrsch
2022-11-04 20:59:26 +00:00
Andrew M. James
6938dd0b2c Support sparse inputs to deg2rad (#88156)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88156
Approved by: https://github.com/cpuhrsch
2022-11-04 20:59:26 +00:00
PyTorch MergeBot
71fb763e54 Revert "fix as_strided_scatter_backward (#87646)"
This reverts commit f9d7985851.

Reverted https://github.com/pytorch/pytorch/pull/87646 on behalf of https://github.com/huydhn due to Sorry for reverting your PR, but I think this one or one of the PRs in the stack breaks bionic-cuda11.7 on trunk 70782981f0
2022-11-02 16:54:36 +00:00
Brian Hirsh
f9d7985851 fix as_strided_scatter_backward (#87646)
as_strided_scatter's derivative formula was broken - instead of making a "mask" of 1's and 0's, it would effectively make a mask of 1's and uninitialized memory.
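
A small sketch of the intended semantics (hypothetical sizes/strides): the gradient w.r.t. `input` should be grad_output with the scattered positions masked to 0, which requires a fully initialized mask.

```python
import torch

inp = torch.randn(4, requires_grad=True)
src = torch.randn(2, requires_grad=True)
# write `src` over the strided view (elements 0 and 2) of a copy of `inp`
out = torch.as_strided_scatter(inp, src, size=(2,), stride=(2,))
out.sum().backward()
print(inp.grad)  # tensor([0., 1., 0., 1.]): scattered positions masked out
print(src.grad)  # tensor([1., 1.])
```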

Fixes https://github.com/pytorch/pytorch/issues/88105

Pull Request resolved: https://github.com/pytorch/pytorch/pull/87646
Approved by: https://github.com/albanD
2022-11-02 14:36:49 +00:00
albanD
8a9aca7b8d Reland 2 Many symintifications (#87604) (#87980)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/87980
Approved by: https://github.com/ezyang
2022-10-28 13:40:11 +00:00
PyTorch MergeBot
8b4d95759c Revert "Many symintifications (#87604)"
This reverts commit 777e6a2c51.

Reverted https://github.com/pytorch/pytorch/pull/87604 on behalf of https://github.com/weiwangmeta due to breaking internal builds
2022-10-28 03:00:11 +00:00
albanD
777e6a2c51 Many symintifications (#87604)
Adds:
* expand_inplace
* conv conv_double_backward
* convolution
* adaptive_avg_pool2d_symint
* _embedding_bag_backward_symint
* cudnn_grid_sampler
* cuda 32 bit indexing
* nll_loss / nll_loss_2d
* tensor split
* pooling same mode
* cudnn_is_acceptable
* storage nbytes

Pull Request resolved: https://github.com/pytorch/pytorch/pull/87604
Approved by: https://github.com/ezyang
2022-10-26 17:33:53 +00:00