Commit Graph

449 Commits

Author SHA1 Message Date
Nikita Karetnikov
c4a6f86062 [pt2] add metas for max_unpool2d and max_unpool3d (#103821)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103821
Approved by: https://github.com/Skylion007, https://github.com/Chillee
2023-07-01 01:33:35 +00:00
Nikita Vedeneev
5cf3a99013 sampled_addmm: backward performance improvements (#103544)
No need to do a double `sparse_mask`; let's squash everything into one call!
This PR exercises https://github.com/pytorch/pytorch/pull/103750, so here is the autogenerated code for the backward pass.

```
at::Tensor sparse_sampled_addmm(c10::DispatchKeySet ks, const at::Tensor & self, const at::Tensor & mat1, const at::Tensor & mat2, const at::Scalar & beta, const at::Scalar & alpha) {
  auto& self_ = unpack(self, "self", 0);
  auto& mat1_ = unpack(mat1, "mat1", 1);
  auto& mat2_ = unpack(mat2, "mat2", 2);
  [[maybe_unused]] auto _any_requires_grad = compute_requires_grad( self, mat1, mat2 );

  std::shared_ptr<SparseSampledAddmmBackward0> grad_fn;
  if (_any_requires_grad) {
    grad_fn = std::shared_ptr<SparseSampledAddmmBackward0>(new SparseSampledAddmmBackward0(), deleteNode);
    grad_fn->set_next_edges(collect_next_edges( self, mat1, mat2 ));
    grad_fn->alpha = alpha;
    grad_fn->beta = beta;
    if (grad_fn->should_compute_output(2)) {
      grad_fn->mat1_ = SavedVariable(mat1, false);
    }
    if (grad_fn->should_compute_output(1)) {
      grad_fn->mat2_ = SavedVariable(mat2, false);
    }
    grad_fn->self_ = SavedVariable(self, false);
  }

```

As you can see, we do not save tensors unless needed.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103544
Approved by: https://github.com/soulitzer
2023-06-28 08:49:54 +00:00
Nikita Vedeneev
567b5e5b28 Multioutput backward formula: allow conditional guards against saving (#103750)
Multi-output backward formulas break the ability of autogen to decide which variables have to be stored in the graph.
This PR introduces a macro, `wrap_opt_if`, which can be used to hint autogen about variable interdependence.

For example, the following code is being generated for `_trilinear` with this modification:
```
at::Tensor _trilinear(c10::DispatchKeySet ks, const at::Tensor & i1, const at::Tensor & i2, const at::Tensor & i3, at::IntArrayRef expand1, at::IntArrayRef expand2, at::IntArrayRef expand3, at::IntArrayRef sumdim, int64_t unroll_dim) {
  auto& i1_ = unpack(i1, "i1", 0);
  auto& i2_ = unpack(i2, "i2", 1);
  auto& i3_ = unpack(i3, "i3", 2);
  [[maybe_unused]] auto _any_requires_grad = compute_requires_grad( i1, i2, i3 );

  [[maybe_unused]] auto _any_has_forward_grad_result = (isFwGradDefined(i1) || isFwGradDefined(i2) || isFwGradDefined(i3));
  std::shared_ptr<TrilinearBackward0> grad_fn;
  if (_any_requires_grad) {
    grad_fn = std::shared_ptr<TrilinearBackward0>(new TrilinearBackward0(), deleteNode);
    grad_fn->set_next_edges(collect_next_edges( i1, i2, i3 ));
    grad_fn->expand1 = expand1.vec();
    grad_fn->expand2 = expand2.vec();
    grad_fn->expand3 = expand3.vec();
    if (grad_fn->should_compute_output(1) || grad_fn->should_compute_output(2)) {
      grad_fn->i1_ = SavedVariable(i1, false);
    }
    if (grad_fn->should_compute_output(0) || grad_fn->should_compute_output(2)) {
      grad_fn->i2_ = SavedVariable(i2, false);
    }
    if (grad_fn->should_compute_output(0) || grad_fn->should_compute_output(1)) {
      grad_fn->i3_ = SavedVariable(i3, false);
    }
    grad_fn->sumdim = sumdim.vec();
  }

```

with the following backward modifications:
```
 - name: _trilinear(Tensor i1, Tensor i2, Tensor i3, int[] expand1, int[] expand2, int[] expand3, int[] sumdim, int unroll_dim=1) -> Tensor
  - i1, i2, i3: _trilinear_backward(grad, i1, i2, i3, expand1, expand2, expand3, sumdim, grad_input_mask)
  + i1, i2, i3: "_trilinear_backward(grad,
  +             wrap_opt_if(i1, grad_input_mask[1] || grad_input_mask[2]),
  +             wrap_opt_if(i2, grad_input_mask[0] || grad_input_mask[2]),
  +             wrap_opt_if(i3, grad_input_mask[0] || grad_input_mask[1]),
  +             expand1, expand2, expand3, sumdim, grad_input_mask)"
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103750
Approved by: https://github.com/soulitzer
2023-06-27 15:12:09 +00:00
cyy
d4a98280a8 [Reland] Use missing-prototypes in torch_cpu (#104138)
This PR enables -Wmissing-prototypes in torch_cpu, except for some generated cpp files, the mps, metal, and vulkan backends, and the caffe2 sources.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104138
Approved by: https://github.com/albanD, https://github.com/malfet
2023-06-26 22:53:43 +00:00
PyTorch MergeBot
7274582390 Revert "sparse_mask: backward support for sparse lhs (#95165)"
This reverts commit f090fdf3b4.

Reverted https://github.com/pytorch/pytorch/pull/95165 on behalf of https://github.com/huydhn due to Sorry for reverting this. I think one of the tests test_sparse.py::TestSparseCUDA::test_sparse_mask_backward_cuda_complex128 is failing on slow gradcheck f090fdf3b4 ([comment](https://github.com/pytorch/pytorch/pull/95165#issuecomment-1604696109))
2023-06-23 18:40:15 +00:00
Nikita Vedeneev
f090fdf3b4 sparse_mask: backward support for sparse lhs (#95165)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95165
Approved by: https://github.com/pearu, https://github.com/cpuhrsch
2023-06-23 12:27:27 +00:00
PyTorch MergeBot
b5594f7df0 Revert "Use missing-prototypes in torch_cpu (#103725)"
This reverts commit 716b3b893d.

Reverted https://github.com/pytorch/pytorch/pull/103725 on behalf of https://github.com/osalpekar due to Broke caffe2 builds. More info at [D46920675](https://www.internalfb.com/diff/D46920675) ([comment](https://github.com/pytorch/pytorch/pull/103725#issuecomment-1603129273))
2023-06-22 18:30:31 +00:00
cyy
716b3b893d Use missing-prototypes in torch_cpu (#103725)
This PR enables -Wmissing-prototypes in torch_cpu, except for some generated cpp files and the mps and metal backends.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103725
Approved by: https://github.com/albanD
2023-06-21 13:19:55 +00:00
Nikita Vedeneev
056d92e2a0 sparse.mm backward: performance improvements (#94991)
`torch.sparse.mm` is now faster and sync-free in "most" cases.
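
For illustration, a minimal sketch of the kind of call this speeds up (shapes and values are made up; not taken from the PR itself):

```python
import torch

# Sketch: backward through torch.sparse.mm with a sparse COO lhs and a dense rhs.
i = torch.tensor([[0, 1, 1], [2, 0, 2]])
v = torch.tensor([3.0, 4.0, 5.0])
s = torch.sparse_coo_tensor(i, v, (2, 3), requires_grad=True)
d = torch.randn(3, 2, requires_grad=True)

out = torch.sparse.mm(s, d)   # dense (2, 2) result
out.sum().backward()          # s.grad is sparse, d.grad is dense
```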

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94991
Approved by: https://github.com/Skylion007, https://github.com/pearu, https://github.com/cpuhrsch
2023-06-12 20:57:29 +00:00
Nikita Karetnikov
2b3d955ffd [pt2] add meta and SymInt support for linalg_matrix_exp (#102945)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102945
Approved by: https://github.com/lezcano
2023-06-09 22:45:16 +00:00
Nikita Karetnikov
4bda4a7e4d [pt2] add meta for lu_unpack (#102937)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102937
Approved by: https://github.com/lezcano
2023-06-06 08:06:53 +00:00
Li-Huai (Allan) Lin
1c2dfdf30c Add renorm forward-ad (#100798)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100798
Approved by: https://github.com/soulitzer
2023-06-05 20:25:35 +00:00
Nikita Karetnikov
995ac703cd [pt2] add SymInt support for linalg.pinv (#102367)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102367
Approved by: https://github.com/lezcano
2023-05-27 11:10:47 +00:00
PyTorch MergeBot
da3aba1e46 Revert "[pt2] add SymInt support for linalg.pinv (#102367)"
This reverts commit 0d5b74da0c.

Reverted https://github.com/pytorch/pytorch/pull/102367 on behalf of https://github.com/kit1980 due to Broke slow tests https://github.com/pytorch/pytorch/actions/runs/5095190248/jobs/9160028124 ([comment](https://github.com/pytorch/pytorch/pull/102367#issuecomment-1565104562))
2023-05-27 00:33:42 +00:00
Nikita Karetnikov
0d5b74da0c [pt2] add SymInt support for linalg.pinv (#102367)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102367
Approved by: https://github.com/lezcano
2023-05-26 15:20:34 +00:00
Nikita Karetnikov
9eb1748b2b [pt2] add meta and SymInt support for linalg_lu (#101372)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101372
Approved by: https://github.com/lezcano, https://github.com/albanD
2023-05-15 20:25:00 +00:00
Nikita Karetnikov
a8964d6377 [pt2] add meta and SymInt support for linalg_householder_product (#101315)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101315
Approved by: https://github.com/lezcano
2023-05-15 02:56:49 +00:00
yanbing-j
36d91b5513 Add differentiable mkldnn_rnn_layer_backward to support double backward of LSTM (#100627)
### Description

This PR fixes #99413, which shows the limitation of double backward for LSTM when using oneDNN.

This PR does not implement the double backward function itself, because that is pretty hard to spell out. Instead, it implements `mkldnn_rnn_layer_backward` using differentiable operations, so that double backward can be derived automatically.

The backward pass needs the gates and hidden states between cells within one layer. However, these intermediate variables are stored in the `workspace` and are hard to recover, so the backward re-computes them first.

A corresponding UT has been added based on the failing case in #99413. The gradcheck/gradgradcheck UT added in https://github.com/pytorch/pytorch/pull/26660 cannot test LSTM with oneDNN, because that UT only supports the `double` datatype, which oneDNN does not support.
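
For illustration, a minimal sketch of the double-backward pattern this unlocks (sizes are arbitrary; assumes the CPU/oneDNN path):

```python
import torch

# Sketch: second-order gradients through an LSTM, which previously failed on the oneDNN path.
lstm = torch.nn.LSTM(input_size=2, hidden_size=3)
x = torch.randn(5, 1, 2, requires_grad=True)

out, _ = lstm(x)
(grad_x,) = torch.autograd.grad(out.sum(), x, create_graph=True)
grad_x.sum().backward()  # double backward now flows through the differentiable layer backward
```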

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100627
Approved by: https://github.com/jgong5, https://github.com/soulitzer
2023-05-09 12:58:57 +00:00
Nikita Karetnikov
6eb0d7541d [pt2] add SymInt support for linalg_qr_backward (#100833)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100833
Approved by: https://github.com/ezyang
2023-05-08 13:48:25 +00:00
Li-Huai (Allan) Lin
6af509860e Add logcumsumexp forward-ad (#100629)
### 🤖 Generated by Copilot at 8bb6158

This pull request adds forward and backward AD support for the `logcumsumexp` operator in functorch, a library for composable function transformations. It implements a forward-mode formula and a decomposition in `derivatives.yaml`, a C++ function for computing directional derivatives in `FunctionsManual.cpp`, and updates the tests and metadata in `test_ops.py` and `common_methods_invocations.py`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100629
Approved by: https://github.com/soulitzer
2023-05-06 04:08:55 +00:00
PyTorch MergeBot
d66add688f Revert "Add logcumsumexp forward-ad (#100629)"
This reverts commit d658c62677.

Reverted https://github.com/pytorch/pytorch/pull/100629 on behalf of https://github.com/clee2000 due to broke slow test, see above comment for details ([comment](https://github.com/pytorch/pytorch/pull/100629#issuecomment-1536575442))
2023-05-05 17:42:35 +00:00
Nikita Karetnikov
37f1be041a [pt2] enable svd in fake_tensor (#100130)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100130
Approved by: https://github.com/ezyang, https://github.com/lezcano
2023-05-05 06:27:59 +00:00
Li-Huai (Allan) Lin
d658c62677 Add logcumsumexp forward-ad (#100629)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100629
Approved by: https://github.com/soulitzer
2023-05-05 02:21:27 +00:00
Li-Huai (Allan) Lin
cb569dbccd Fix cat forward-AD tests (#99596)
Fixes #94115

Not sure where to add the test. There is an existing sample input, but it apparently doesn't fail any test.

6580b160d3/torch/testing/_internal/common_methods_invocations.py (L2043)

Edited: Found the skip and xfailed some failures, which are pre-existing and unrelated to the fix in question. (Those failures are from the gradgrad check, while the fix is for forward AD.)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99596
Approved by: https://github.com/soulitzer
2023-04-27 15:21:26 +00:00
cyy
dbc7e919b8 add Wmissing-prototypes to clang-tidy (#96805)
This PR introduces the clang-tidy **-Wmissing-prototypes** check to prevent further coding errors such as the one fixed by PR #96714.

### 🤖 Generated by Copilot at fd2cf2a

This pull request makes several internal functions static to improve performance and avoid name clashes. It also fixes some typos, formatting, and missing includes in various files. It adds a new .clang-tidy check to warn about missing prototypes for non-static functions.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96805
Approved by: https://github.com/malfet, https://github.com/albanD
2023-04-25 18:20:36 +00:00
Nikita Karetnikov
21681f36f4 [pt2] add SymInt support for fft ops (#99115)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99115
Approved by: https://github.com/ezyang
2023-04-15 18:01:39 +00:00
Nikita Karetnikov
f89b7c2bec [pt2] add SymInt support for roll (#99114)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99114
Approved by: https://github.com/ezyang
2023-04-15 18:01:39 +00:00
Li-Huai (Allan) Lin
ca791b6909 [MPS] Add higher order derivatives warning to max_pool2d (#98582)
The higher-order derivative calculations of `max_pool2d` require indices to be provided, but the `mps_max_pool2d` kernel doesn't compute them. Computing indices during backpropagation afterwards would be expensive and unnecessary, since users can directly call `max_pool2d` with `return_indices=True`, which computes `indices` along the way.

This PR adds a warning for it.
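
For reference, a small sketch of the recommended call (shapes are illustrative):

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 1, 8, 8, requires_grad=True)
# Requesting indices up front avoids recomputing them later for higher-order derivatives.
out, indices = F.max_pool2d(x, kernel_size=2, return_indices=True)
```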
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98582
Approved by: https://github.com/soulitzer
2023-04-11 18:03:46 +00:00
kshitij12345
ffd76d11c9 [fix] take : backward batching rule (#95772)
Fixes https://github.com/pytorch/pytorch/issues/95738

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95772
Approved by: https://github.com/zou3519
2023-03-30 17:18:17 +00:00
Li-Huai (Allan) Lin
7776653a0c Add linear gradgrad (#97151)
Fixes #92206
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97151
Approved by: https://github.com/albanD
2023-03-30 07:25:02 +00:00
Edward Z. Yang
32fdd44577 SymIntify maybe_multiply (#97675)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97675
Approved by: https://github.com/albanD
2023-03-27 23:20:23 +00:00
Joel Schlosser
0d66db1b2a Implement last dim split_with_sizes for NT (forward only, non-SymInt-ified) (#97446)
This is needed for the HSTU model.

Details:
* ~~NT `chunk` now calls into NT `split_with_sizes` since the latter is more general~~ (removed; they're totally separate)
* Throws for backward
* Only operates over the last dim (`dim=-1`)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97446
Approved by: https://github.com/cpuhrsch
2023-03-23 22:17:06 +00:00
Driss Guessous
98a5cf090d [SDPA] Remove the chunk_grad from mem-eff attention (#96880)
# Summary

There exists an optimization within the scaled_dot_product_efficient backward attention path to, under the right conditions, output grad_q, grad_k, and grad_v as aliases of the same storage. This was done to optimize for the hot path where mha does packed linear_projection -> chunk -> (view stuff) -> sdpa. The thought was that chunk.backward() would be able to "trivially" cat its inputs. However, upon closer inspection, chunk.backward() calls `cat` regardless of the inputs, so this optimization is not being utilized.

I validated this by profiling on main and then on this branch; the traces were the same in both cases, with `split.backward()` calling into cat.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96880
Approved by: https://github.com/cpuhrsch
2023-03-17 21:28:25 +00:00
Driss Guessous
7ec0d6f006 Moves SDPA backward helper native function to functionsmanual.cpp (#95821)
## Summary
chunk_grad_outputs should have been created within FunctionsManual.cpp to begin with. This removes it as a native function and moves it to its appropriate home.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95821
Approved by: https://github.com/cpuhrsch, https://github.com/albanD
2023-03-14 17:49:07 +00:00
Nikita Shulga
82daf98151 [Sparse] Move SparseTensorUtils.* to native/ (#96696)
Fixes an internal linking problem after `DECLARE_DISPATCH` was introduced in SparseTensorUtils.cpp but implemented inside the native library.

Also, fix a `sign-unsigned` compare in `_flatten_indices_impl`.
Follow-ups:
- Move code declared/implemented in `SparseTensorUtils.*` to the `at::native` namespace
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96696
Approved by: https://github.com/albanD
2023-03-14 02:56:52 +00:00
Kazuaki Ishizaki
69aa6b4bb9 fix typo in comments under torch/csrc/autograd (#96061)
This PR fixes typos in comments of `.cpp` and `.h` files under the `torch/csrc/autograd` directory.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96061
Approved by: https://github.com/soulitzer
2023-03-06 18:05:14 +00:00
Masaki Kozuki
49f6849f58 Fix codegen logic for foreach derivatives (#95263)
follow-up https://github.com/pytorch/pytorch/pull/93901.

Unexpected numerical mismatches observed in some foreach functions' backward results seemed to be caused by the wrong order of `IndexRangeGenerator::range` calls.
This PR makes `args_with_derivatives` follow the same (or similar) order as `foreach_native_function.func.arguments.flat_non_out`.

---

what the current master generates for `_foreach_mul.List`:
```cpp
variable_list ForeachMulBackward0List::apply(variable_list&& grads) {
  std::lock_guard<std::mutex> lock(mutex_);
  TORCH_CHECK(!other_released_, ERR_BACKWARD_TWICE);
  TORCH_CHECK(!self_released_, ERR_BACKWARD_TWICE);
  IndexRangeGenerator gen;
  auto other_ix = gen.range(other_size_);
  auto self_ix = gen.range(self_size_);
  variable_list grad_inputs(gen.size());
  auto other = unpack_list(other_);
  auto self = unpack_list(self_);
  if (task_should_compute_output({ other_ix })) {
    std::vector<Tensor> grad_result;
    grad_result.reserve(grads.size());
    for (const auto & i : c10::irange(grads.size())) {
      grad_result.emplace_back(mul_tensor_backward(grads[i], self[i], other[i].scalar_type()));
    }
    copy_range(grad_inputs, other_ix, grad_result);
  }
  if (task_should_compute_output({ self_ix })) {
    std::vector<Tensor> grad_result;
    grad_result.reserve(grads.size());
    for (const auto & i : c10::irange(grads.size())) {
      grad_result.emplace_back(mul_tensor_backward(grads[i], other[i], self[i].scalar_type()));
    }
    copy_range(grad_inputs, self_ix, grad_result);
  }
  return grad_inputs;
}
```

with this PR the generated backward is
```cpp
variable_list ForeachMulBackward0List::apply(variable_list&& grads) {
  std::lock_guard<std::mutex> lock(mutex_);
  TORCH_CHECK(!self_released_, ERR_BACKWARD_TWICE);
  TORCH_CHECK(!other_released_, ERR_BACKWARD_TWICE);
  IndexRangeGenerator gen;
  auto self_ix = gen.range(self_size_);                                         <----- diff
  auto other_ix = gen.range(other_size_);                                       <----- diff
  variable_list grad_inputs(gen.size());
  auto self = unpack_list(self_);
  auto other = unpack_list(other_);
  if (task_should_compute_output({ other_ix })) {
    std::vector<Tensor> grad_result;
    grad_result.reserve(grads.size());
    for (const auto & i : c10::irange(grads.size())) {
      grad_result.emplace_back(mul_tensor_backward(grads[i], self[i], other[i].scalar_type()));
    }
    copy_range(grad_inputs, other_ix, grad_result);
  }
  if (task_should_compute_output({ self_ix })) {
    std::vector<Tensor> grad_result;
    grad_result.reserve(grads.size());
    for (const auto & i : c10::irange(grads.size())) {
      grad_result.emplace_back(mul_tensor_backward(grads[i], other[i], self[i].scalar_type()));
    }
    copy_range(grad_inputs, self_ix, grad_result);
  }
  return grad_inputs;
}

```

The change fixes the order of `self_ix` and `other_ix`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95263
Approved by: https://github.com/soulitzer
2023-03-04 20:03:54 +00:00
Edward Z. Yang
fb10e66d35 Bulk convert numel() to sym_numel() in FunctionsManual (#95543)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95543
Approved by: https://github.com/ngimel, https://github.com/Skylion007
2023-02-27 13:46:13 +00:00
Peter Bell
bc438af6fe std/var: support floating point correction value (#94073)
Ref https://github.com/pytorch/pytorch/issues/61492#issuecomment-1413003480

The array API specifies correction to be `Union[int, float]` while we currently only support integers.
https://data-apis.org/array-api/latest/API_specification/generated/array_api.std.html

As std/var is currently calculated, the final element count is already done in floating point, so we can make the correction floating point without any loss of precision or generality.
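
A minimal sketch of the resulting call surface (values are illustrative):

```python
import torch

x = torch.randn(100)
v_int = torch.var(x, correction=1)     # integer correction, as before (Bessel's correction)
v_frac = torch.var(x, correction=0.5)  # fractional correction, now accepted per the array API
```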

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94073
Approved by: https://github.com/ezyang
2023-02-23 05:50:45 +00:00
Li-Huai (Allan) Lin
b6a1c238bd [MPS] Remove mps specialized path in BCE backward (#95220)
Remove the MPS specialized path in BCE backward, as the `logit` op has been implemented for MPS.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95220
Approved by: https://github.com/soulitzer
2023-02-22 19:43:53 +00:00
kshitij12345
311b20aae1 [fix] torch.pow handle real negative base and complex exponent (#95198)
Fixes https://github.com/pytorch/pytorch/issues/89903 https://github.com/pytorch/pytorch/issues/95111

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95198
Approved by: https://github.com/albanD, https://github.com/ngimel
2023-02-21 18:36:20 +00:00
min-jean-cho
92f3feabaa fix torch.var backward when n==correction (#94546)
Fixes #94184

This PR, as discussed in [this comment](https://github.com/pytorch/pytorch/issues/94184#issuecomment-1422128166), returns `x.grad` with the same shape as `x`, filled with `NaN`, when the gradient of `torch.var(unbiased=True)` is `NaN`. The gradient of unbiased variance is `NaN` (undefined: division by zero in the denominator `N-1`, where `N` is the number of samples) when `N` is 1, i.e., there is only one sample -- the product of the dims is 1, such as shape `[1]` or `[1,...,1]`.
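
A sketch of the edge case, assuming the post-fix behavior:

```python
import torch

x = torch.ones(1, requires_grad=True)
torch.var(x, unbiased=True).backward()
print(x.grad)  # tensor([nan]): same shape as x, filled with NaN since N - 1 == 0
```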
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94546
Approved by: https://github.com/soulitzer
2023-02-13 23:38:38 +00:00
Yanming Wang
9bef1ebb9e Fix div by fp64 scalar issue on xla device (#94459)
This PR fixes https://github.com/pytorch/xla/issues/4574. I'll create a separate test PR in pytorch/xla repo.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94459
Approved by: https://github.com/ezyang
2023-02-10 17:57:47 +00:00
cyy
1a32db15e7 Some performance fixes (#94034)
Applies some performance fixes

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94034
Approved by: https://github.com/Skylion007
2023-02-04 02:17:48 +00:00
Nikita Vedeneev
b484d17c24 _sparse_coo_tensor_with_dims_and_tensors backward: simplify and optimize (#91704)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91704
Approved by: https://github.com/albanD, https://github.com/cpuhrsch
2023-02-01 09:02:25 +00:00
cyy
4d51c8532c Some simple fixes (#93221)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/93221
Approved by: https://github.com/Skylion007
2023-01-30 05:14:03 +00:00
Aaron Gokaslan
0247ed27cc Apply Clang-Tidy readability-container-size-empty (#93236)
Not only is this change usually shorter and more readable, it can also yield better performance. size() is not always a constant-time operation (e.g., on linked lists), but empty() always is.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/93236
Approved by: https://github.com/malfet
2023-01-29 23:28:19 +00:00
mfkasim1
75cfc0be21 Logcumsumexp for CPU (#93153)
Partial work from #90847, in the direction of solving #89205.
Most of the content is from #90847, but this is only for CPU, so hopefully it does not increase the build time by a lot.

tag: @albanD, @malfet

Pull Request resolved: https://github.com/pytorch/pytorch/pull/93153
Approved by: https://github.com/malfet, https://github.com/Skylion007
2023-01-27 22:29:33 +00:00
cyy
e292ddff4e More clang-tidy fixes (#92944)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92944
Approved by: https://github.com/Skylion007
2023-01-25 19:11:51 +00:00
PyTorch MergeBot
9b23fd378f Revert "Logcumsumexp for complex in CPU and CUDA (#90847)"
This reverts commit 64985123e4.

Reverted https://github.com/pytorch/pytorch/pull/90847 on behalf of https://github.com/malfet due to Reverting to decrease build time, let's discuss the alternatives here
2023-01-24 20:49:08 +00:00
Aaron Gokaslan
8c8cd9539d Add missing moves to torch autograd (#92772)
Applies additional std::move calls in torch/csrc/autograd at opportunities that were found via static analysis.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92772
Approved by: https://github.com/ezyang
2023-01-24 02:01:52 +00:00
Nikita Vedeneev
9f381c9b7f sparse_sparse_matmul: simplify backward (#91712)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91712
Approved by: https://github.com/albanD
2023-01-23 19:24:28 +00:00
mfkasim1
64985123e4 Logcumsumexp for complex in CPU and CUDA (#90847)
Another PR towards solving #89205.
What's in this PR:

* The implementation of forward `logcumsumexp` for complex numbers in CPU & CUDA
* The tests on forward call of `logcumsumexp` for complex numbers
* The implementation of backward `logcumsumexp` for complex numbers

What's missing:

* The test on the backward gradient of `logcumsumexp` (it complains `RuntimeError: logcumsumexp does not support automatic differentiation for outputs with complex dtype.`, and I don't know how to solve that error or where to put the test for the backward computation). If possible, I'd like this to be done in this PR.

It's really tricky to handle the edge cases here (i.e. the ones involving `inf`), but I've tried my best to put comments explaining the reasoning behind my decisions in this PR.
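
A tiny forward-only sketch of what this enables (assumes a build that includes this change):

```python
import torch

# Forward logcumsumexp on a complex tensor; the backward test is the part still missing above.
x = torch.randn(5, dtype=torch.complex64)
y = torch.logcumsumexp(x, dim=0)
```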

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90847
Approved by: https://github.com/albanD
2023-01-20 15:10:50 +00:00
Peter Bell
4058dedf21 Replace log(1 + x) with log1p(x) (#92114)
`log1p` offers better precision near zero since `(1 + x) - 1` truncates any value less than the float epsilon to zero. For `soft_margin_loss` this also requires one fewer kernel invocation, which for numel=1e7 gives me a 1.2x speedup on CUDA and a 1.1x speedup on CPU.
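
A quick numeric sketch of the precision point:

```python
import torch

x = torch.tensor(1e-10, dtype=torch.float32)
print(torch.log(1 + x))  # 0.0: 1 + 1e-10 rounds to 1.0 in float32
print(torch.log1p(x))    # ~1e-10: precision preserved near zero
```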

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92114
Approved by: https://github.com/ngimel, https://github.com/lezcano
2023-01-18 10:43:56 +00:00
Peter Bell
fb1427ea8f squeeze: allow squeezing multiple dimensions at once (#89017)
Ref #70924

This addresses part 1 of the issue, allowing `torch.squeeze` to be
passed a tuple of dimensions. e.g.
```python
x.squeeze(0).squeeze(0)
```
can now be written
```python
x.squeeze((0, 1))
```
(assuming x has at least 2 dimensions)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89017
Approved by: https://github.com/albanD
2023-01-17 14:20:15 +00:00
David Berard
d7dc1c2fd5 Support zero dimensions in softmax decompositions (#91322)
The eager implementation of softmax supports computation along zero dimensions, but many of the other implementations did not, including:
* decompositions & refs (this was causing dynamo failures)
* forward AD for logsumexp
* MPS log_softmax_backward

This PR handles the `input.numel() == 0` cases separately to avoid running `amax()`, which fails for zero dimensions, and updates opinfos.

example of "computation along zero dimensions":

```python
# example where softmax along a zero-sized dimension fails in the ref
import torch

t = torch.rand((4, 0, 0))
print("~")
print(torch.nn.functional.softmax(t, dim=-1))  # this passes
print("~")
torch._refs.softmax(t, dim=-1)  # this fails
print("~")
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91322
Approved by: https://github.com/lezcano
2023-01-11 09:35:43 +00:00
PyTorch MergeBot
df4b3b13bc Revert "squeeze: allow squeezing multiple dimensions at once (#89017)"
This reverts commit e26cb06681.

Reverted https://github.com/pytorch/pytorch/pull/89017 on behalf of https://github.com/mehtanirav due to Internal breakages
2023-01-05 19:25:08 +00:00
Peter Bell
e26cb06681 squeeze: allow squeezing multiple dimensions at once (#89017)
Ref #70924

This addresses part 1 of the issue, allowing `torch.squeeze` to be
passed a tuple of dimensions. e.g.
```python
x.squeeze(0).squeeze(0)
```
can now be written
```python
x.squeeze((0, 1))
```
(assuming x has at least 2 dimensions)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89017
Approved by: https://github.com/albanD
2023-01-04 14:40:56 +00:00
lezcano
d5163f5206 Fix NumPy broadcasting in lstsq_backward (#91460)
Fixes https://github.com/pytorch/pytorch/issues/77225

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91460
Approved by: https://github.com/albanD
2022-12-30 10:49:20 +00:00
lezcano
051d16a2f7 Fix NumPy-compat broadcasting in the derivative of linalg.solve (#91456)
Fixes https://github.com/pytorch/pytorch/issues/89761

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91456
Approved by: https://github.com/albanD
2022-12-30 10:49:20 +00:00
lezcano
484dd40022 Implement PReLU in a compositional way (#91238)
The PReLU implementation was all over the place. This led to a number
of bugs like https://github.com/pytorch/pytorch/issues/68760.  We fix it by:
- Keeping the weird broadcasting logic it has (sketched below) as a CompositeImplicit kernel that calls into a second kernel
- This second kernel is just a good-ol' pointwise kernel.
- We implement the derivative for the pointwise kernel via TI as well for speed.
- We implement the second derivative for the pointwise kernel and the forward AD derivatives compositionally
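
An illustrative sketch of the broadcasting behavior the composite kernel keeps (a per-channel weight broadcast over dim 1):

```python
import torch
import torch.nn.functional as F

x = torch.randn(2, 3, 4, 4)
w = torch.randn(3)   # one slope per channel, broadcast over dim 1
y = F.prelu(x, w)    # broadcasting handled by the composite kernel, then a pointwise kernel
```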

This fixes a number of issues:
- We don't perform copies any more when the inputs are not contiguous
- The derivatives are now correct
- We fix vmap and many other functorch-related issues.
- CPU and CUDA now share the relevant broadcasting logic
- The implementation is about 1/3 the length.

Fixes https://github.com/pytorch/pytorch/issues/68760
Fixes https://github.com/pytorch/pytorch/issues/89895

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91238
Approved by: https://github.com/kshitij12345, https://github.com/jbschlosser, https://github.com/albanD
2022-12-30 10:42:30 +00:00
lezcano
5b223c43ec Avoid calling allclose in the backward if there are tensor subclasses (#91444)
`allclose` is data-dependent (it returns a bool), so it does not play well with functorch. We skip that check in the context of subclasses to avoid hard errors.

Partially fixes https://github.com/pytorch/pytorch/issues/90499

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91444
Approved by: https://github.com/albanD
2022-12-28 19:12:50 +00:00
Nikita Karetnikov
cc11edb084 [aot_autograd] symintify logsumexp (#91442)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91442
Approved by: https://github.com/albanD
2022-12-28 18:06:26 +00:00
Nikita Vedeneev
3870a9e28d to_sparse_XXX: backward support (#90281)
As per title. Fixes https://github.com/pytorch/pytorch/issues/85226

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90281
Approved by: https://github.com/cpuhrsch, https://github.com/soulitzer
2022-12-14 09:05:17 +00:00
soulitzer
98a9235dce Fix prelu ref when a.ndim < 2 (#89809)
Fixes https://github.com/pytorch/pytorch/issues/89560

Previously the test case for "input is 1-D or scalar + weight is not scalar" did not exist; adding it introduced some failures:
- forward AD (fixed in this PR)
- vmap (filed https://github.com/pytorch/pytorch/issues/89895)
- ref/meta (fixed in this PR, though this also regresses nvFuser support)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89809
Approved by: https://github.com/ngimel
2022-12-12 23:55:31 +00:00
Aaron Gokaslan
7541c9f8be [Fix]: remove unnecessary copies in aten, c10, and torch bindings (#90629)
Applies various automated fixes that reduce the number of spurious copies in torch, aten, and c10. I also inlined any default dtors that would make the type trivially destructible.

Follow up to #89000

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90629
Approved by: https://github.com/ezyang
2022-12-12 17:05:52 +00:00
Richard Zou
4b1053497c [vmap] Prepend "legacy" to files for old vmap implementation (#90324)
We have an older torch.vmap implementation. It is no longer supported.
It still needs to exist somewhere for the sake of BC with
torch.autograd.functional.

This PR makes it clear what files are meant for implementing the old
vmap implementation. I've seen a couple of PRs recently adding support
for the old vmap implementation, so this will lessen the confusion.

Test Plan:
- CI
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90324
Approved by: https://github.com/samdow
2022-12-07 18:46:15 +00:00
Nikita Karetnikov
4cb6bbbe27 Symintify embedding (#89327)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89327
Approved by: https://github.com/ezyang
2022-11-24 03:25:00 +00:00
Andrew M. James
a41f70603a Round out rad2deg sparse support (#88442)
- Add sparse coo dispatch
- Modify backward to work with sparse compressed layouts
- Enable sparse_compressed autograd testing
- Correct layout support attributes on OpInfo

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88442
Approved by: https://github.com/cpuhrsch
2022-11-17 06:00:23 +00:00
Kazuaki Ishizaki
e0c194f10b Fix typos in messages under torch (#88961)
This PR fixes typos in messages and parameters in C++ source and header files under the `torch` directory.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88961
Approved by: https://github.com/albanD
2022-11-14 19:06:41 +00:00
Brian Hirsh
a16ced03c9 reland "fix as_strided_scatter_backward (#87646)" (#88342)
This reverts commit 71fb763e54.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88342
Approved by: https://github.com/zou3519
2022-11-07 15:00:58 +00:00
Andrew M. James
ff6770a9a1 enable backward for log1p (sparse layouts) (#88155)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88155
Approved by: https://github.com/cpuhrsch
2022-11-04 20:59:26 +00:00
Andrew M. James
6938dd0b2c Support sparse inputs to deg2rad (#88156)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88156
Approved by: https://github.com/cpuhrsch
2022-11-04 20:59:26 +00:00
PyTorch MergeBot
71fb763e54 Revert "fix as_strided_scatter_backward (#87646)"
This reverts commit f9d7985851.

Reverted https://github.com/pytorch/pytorch/pull/87646 on behalf of https://github.com/huydhn due to Sorry for reverting your PR but I think this one or one of the PR in the stack break bionic-cuda11.7 on trunk 70782981f0
2022-11-02 16:54:36 +00:00
Brian Hirsh
f9d7985851 fix as_strided_scatter_backward (#87646)
as_strided_scatter's derivative formula was broken - instead of making a "mask" of 1's and 0's, it would effectively make a mask of 1's and uninitialized memory.
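
A small sketch of the kind of call whose backward was affected (sizes and strides are made up):

```python
import torch

base = torch.zeros(8, dtype=torch.float64, requires_grad=True)
src = torch.ones(2, 2, dtype=torch.float64, requires_grad=True)
out = torch.as_strided_scatter(base, src, size=(2, 2), stride=(2, 1))
out.sum().backward()  # the gradient w.r.t. base relies on the 0/1 mask this PR fixes
```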

Fixes https://github.com/pytorch/pytorch/issues/88105

Pull Request resolved: https://github.com/pytorch/pytorch/pull/87646
Approved by: https://github.com/albanD
2022-11-02 14:36:49 +00:00
albanD
8a9aca7b8d Reland 2 Many symintifications (#87604) (#87980)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/87980
Approved by: https://github.com/ezyang
2022-10-28 13:40:11 +00:00
PyTorch MergeBot
8b4d95759c Revert "Many symintifications (#87604)"
This reverts commit 777e6a2c51.

Reverted https://github.com/pytorch/pytorch/pull/87604 on behalf of https://github.com/weiwangmeta due to breaking internal builds
2022-10-28 03:00:11 +00:00
albanD
777e6a2c51 Many symintifications (#87604)
Adds:
- expand_inplace
- conv, conv_double_backward
- convolution
- adaptive_avg_pool2d_symint
- _embedding_bag_backward_symint
- cudnn_grid_sampler
- cuda 32 bit indexing
- nll_loss / nll_loss_2d
- tensor split
- pooling same mode
- cudnn_is_acceptable
- storage nbytes

Pull Request resolved: https://github.com/pytorch/pytorch/pull/87604
Approved by: https://github.com/ezyang
2022-10-26 17:33:53 +00:00
albanD
12b2f70a89 Symintify pad ops (#87046)
Following comments below, we need to add support for `std::negate`/`std::min`/`std::max`/`operator-` for SymInt
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87046
Approved by: https://github.com/ezyang
2022-10-19 21:43:08 +00:00
Nikita Vedeneev
f2ec9fbd03 torch.ormqr: backward support (#86800)
Seems good to have, especially when neither `a` nor `tau` requires grad and/or they are pretty small in number.
Fixes https://github.com/pytorch/pytorch/issues/86267
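
An illustrative sketch of differentiating through ormqr (assumes the gradient w.r.t. `other` is the one wanted):

```python
import torch

A = torch.randn(4, 4, dtype=torch.float64)
a, tau = torch.geqrf(A)                # Householder representation of Q
other = torch.randn(4, 3, dtype=torch.float64, requires_grad=True)

out = torch.ormqr(a, tau, other)       # applies Q to `other`
out.sum().backward()                   # backward through ormqr is now supported
```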

Pull Request resolved: https://github.com/pytorch/pytorch/pull/86800
Approved by: https://github.com/lezcano
2022-10-18 09:07:35 +00:00
albanD
3a4c0900c7 Reland 3 of Merge more symbolic meta kernels and symint changes from branch (#86795)
Take 3
Contains:
- symintification of split*
- floor support on SymFloat
- pad_backward, gather, scatter meta
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86795
Approved by: https://github.com/z-a-f
2022-10-17 02:09:40 +00:00
Brian Hirsh
34c86adec4 symintify all of derivatives.yaml (#86610)
Big-bang PR to symintify **all** .sizes() calls in derivatives.yaml, which will be needed for symbolic tracing.

* with the exception of `split()`, which is tougher to land because it requires internal changes.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/86610
Approved by: https://github.com/albanD
2022-10-14 20:15:48 +00:00
albanD
66cab5245f Reland 2 min/max support for SymInt/Floats, finish as_strided/scatter/squeeze() backward symint support (#86797)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86797
Approved by: https://github.com/bdhirsh
2022-10-13 00:31:19 +00:00
PyTorch MergeBot
2aa981ab74 Revert "Reland 2 of Merge more symbolic meta kernels and symint changes from branch (#86334) (#86488)"
This reverts commit 978b46d7c9.

Reverted https://github.com/pytorch/pytorch/pull/86488 on behalf of https://github.com/osalpekar due to Broke executorch builds internally with the following message: RuntimeError: Missing out variant for functional op: aten::split.Tensor(Tensor(a -> *) self, SymInt split_size, int dim=0) -> Tensor(a)[] . Make sure you have loaded your custom_ops_generated_lib
2022-10-11 23:39:50 +00:00
PyTorch MergeBot
811b8e012b Revert "min/max support for SymInt/Floats, finish as_strided/scatter/squeeze() backward symint support (#86643)"
This reverts commit 86f914e996.

Reverted https://github.com/pytorch/pytorch/pull/86643 on behalf of https://github.com/osalpekar due to Need to revert this to cleanly revert https://github.com/pytorch/pytorch/pull/86488. This should be safe to re-land later
2022-10-11 23:12:40 +00:00
albanD
86f914e996 min/max support for SymInt/Floats, finish as_strided/scatter/squeeze() backward symint support (#86643)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86643
Approved by: https://github.com/anjali411
2022-10-11 17:37:30 +00:00
albanD
978b46d7c9 Reland 2 of Merge more symbolic meta kernels and symint changes from branch (#86334) (#86488)
symintify split_with_sizes, dropout, fused_fake_obs_quant. meta for padding_2d ops

add meta_bernoulli_

meta kernel for at::gather

get pytorch_struct to pass: meta for scatter_add, fix backward

symintify split ops
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86488
Approved by: https://github.com/ezyang
2022-10-10 15:54:28 +00:00
anjali411
c89d286af6 symintify unbind_backward and tensor_split (#86357)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86357
Approved by: https://github.com/albanD
2022-10-09 16:25:55 +00:00
Edward Z. Yang
33f0e98a49 Re-land*4 "SymIntify cat and narrow" (#86468)
This re-lands https://github.com/pytorch/pytorch/pull/86289 but with more wrappers.

Contains implicit inclusion of <ATen/native/NonSymbolicBC.h> in internal usage.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86468
Approved by: https://github.com/albanD
2022-10-08 07:17:37 +00:00
PyTorch MergeBot
65b408074f Revert "Relandx3 "SymIntify cat and narrow" (#86289)"
This reverts commit a00f8489df.

Reverted https://github.com/pytorch/pytorch/pull/86289 on behalf of https://github.com/malfet due to @seemether  unlanded the rest of the stack and it will fail intern import anyway
2022-10-07 16:29:27 +00:00
PyTorch MergeBot
75df4b5e3d Revert "Merge more symbolic meta kernels and symint changes from branch (#86334)"
This reverts commit 08e3999fa4.

Reverted https://github.com/pytorch/pytorch/pull/86334 on behalf of https://github.com/seemethere due to Trying to revert https://github.com/pytorch/pytorch/pull/86207, this PR causes merge conflicts with the initial revert so will have to revert this as well
2022-10-07 16:03:30 +00:00
Edward Z. Yang
a00f8489df Relandx3 "SymIntify cat and narrow" (#86289)
This reverts commit fc94a2115b.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86289
Approved by: https://github.com/wconstab
2022-10-07 14:04:10 +00:00
PyTorch MergeBot
2110c89443 Revert "Revert "Revert "SymIntify cat and narrow (#86191)"" (#86289)"
This reverts commit e778fbf519.

Reverted https://github.com/pytorch/pytorch/pull/86289 on behalf of https://github.com/seemethere due to Fails internal tests see: https://www.internalfb.com/intern/sandcastle/job/27021598552487548/
2022-10-07 05:20:36 +00:00
Brian Hirsh
08e3999fa4 Merge more symbolic meta kernels and symint changes from branch (#86334)
symintify split_with_sizes, dropout, fused_fake_obs_quant. meta for padding_2d ops

add meta_bernoulli_

meta kernel for at::gather

get pytorch_struct to pass: meta for scatter_add, fix backward

symintify split ops
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86334
Approved by: https://github.com/ezyang
2022-10-06 23:29:04 +00:00
Pearu Peterson
8f2c2167d4 Support autograd on sparse_mm in full. (#86301)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86301
Approved by: https://github.com/cpuhrsch
2022-10-06 18:39:31 +00:00
Pearu Peterson
f104490d63 Support autograd on Linear with sparse compressed weight. (#86137)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86137
Approved by: https://github.com/cpuhrsch
2022-10-06 18:39:25 +00:00
Edward Z. Yang
e778fbf519 Revert "Revert "SymIntify cat and narrow (#86191)"" (#86289)
This reverts commit fc94a2115b.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86289
Approved by: https://github.com/wconstab
2022-10-05 20:51:28 +00:00
PyTorch MergeBot
fc94a2115b Revert "SymIntify cat and narrow (#86191)"
This reverts commit 63d8d4f6ec.

Reverted https://github.com/pytorch/pytorch/pull/86191 on behalf of https://github.com/seemethere due to Fails internal tests, see [D40106464](https://www.internalfb.com/diff/D40106464)
2022-10-05 17:19:55 +00:00
Will Constable
63d8d4f6ec SymIntify cat and narrow (#86191)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86191
Approved by: https://github.com/ezyang
2022-10-05 14:46:55 +00:00
Edward Z. Yang
79dd621f76 Symbolic shapes mega merge PR (Oct 3) (#86160)
- TensorGeometry supports symint
- check_size supports symint
- functorch batch rule improved symint
- Some operator support for symint in LTC
- More supported operations on SymInt and SymFloat
- More symint support in backwards formulas

This merge includes code contributions from bdhirsh and anjali411.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86160
Approved by: https://github.com/Chillee
2022-10-04 04:12:09 +00:00