Commit Graph

186 Commits

Author SHA1 Message Date
cyy
ecbe82b9ce Change ATEN generator argument type to const std::optional<Generator>& (#120076)
This PR proposes to use std::optional<Generator>& for underlying functions to avoid unnecessary copy and move operations. The torchgen code was changed to generate the new type.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120076
Approved by: https://github.com/malfet
2024-03-22 03:49:31 +00:00
Joel Schlosser
cd6bfc7965 Proper view support for jagged layout NestedTensor (#113279)
This PR:
* Introduces an ATen op for creating true jagged views from a dense values buffer
    * `_nested_view_from_jagged(values, offsets, lengths, ragged_idx, dummy)`
    * This ops is implemented on the Python side using torch.library so we can return a subclass instance
    * `jagged_from_list()` now uses this instead of the old autograd.Function `NestedViewFromBuffer`
    * The latter op is used for non-contiguous JTs returned via `torch.nested.narrow()`
    * `dummy` is an awful hack to ensure that `NestedTensor.__torch_dispatch__()` is invoked for our view
* Introduces an ATen op for accessing the `values` component of an NT via a view
    * `_nested_get_values(nt)`
* **Removes** the autograd.Functions `ViewNestedFromBuffer` and `ViewBufferFromNested` in favor of `nested_from_values_offsets()` / `nested_from_values_offsets_lengths()` and `nt.values()`, respectively.
* Changes test code to prefer `as_nested_tensor()` over `jagged_from_list()` directly
    * Similarly, avoid `buffer_from_jagged()`, preferring `values()`
* Depends on general subclass view fake-ification on the PT2 side (handled solely in previous PRs in the stack)

With these changes, the semantics of jagged layout NTs are such that they are considered a true view of the underlying `values` buffer. This means views of jagged NTs are views of the underlying buffer as well, simplifying some handling.

Differential Revision: [D54269922](https://our.internmc.facebook.com/intern/diff/D54269922)
Co-authored-by: voznesenskym <voznesenskym@gmail.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113279
Approved by: https://github.com/ezyang
2024-03-22 02:12:36 +00:00
PyTorch MergeBot
224beecee6 Revert "Proper view support for jagged layout NestedTensor (#113279)"
This reverts commit 5855c490f0.

Reverted https://github.com/pytorch/pytorch/pull/113279 on behalf of https://github.com/jbschlosser due to Need to fix BC thing ([comment](https://github.com/pytorch/pytorch/pull/113279#issuecomment-2013899762))
2024-03-21 22:03:01 +00:00
Joel Schlosser
5855c490f0 Proper view support for jagged layout NestedTensor (#113279)
This PR:
* Introduces an ATen op for creating true jagged views from a dense values buffer
    * `_nested_view_from_jagged(values, offsets, lengths, ragged_idx, dummy)`
    * This ops is implemented on the Python side using torch.library so we can return a subclass instance
    * `jagged_from_list()` now uses this instead of the old autograd.Function `NestedViewFromBuffer`
    * The latter op is used for non-contiguous JTs returned via `torch.nested.narrow()`
    * `dummy` is an awful hack to ensure that `NestedTensor.__torch_dispatch__()` is invoked for our view
* Introduces an ATen op for accessing the `values` component of an NT via a view
    * `_nested_get_values(nt)`
* **Removes** the autograd.Functions `ViewNestedFromBuffer` and `ViewBufferFromNested` in favor of `nested_from_values_offsets()` / `nested_from_values_offsets_lengths()` and `nt.values()`, respectively.
* Changes test code to prefer `as_nested_tensor()` over `jagged_from_list()` directly
    * Similarly, avoid `buffer_from_jagged()`, preferring `values()`
* Depends on general subclass view fake-ification on the PT2 side (handled solely in previous PRs in the stack)

With these changes, the semantics of jagged layout NTs are such that they are considered a true view of the underlying `values` buffer. This means views of jagged NTs are views of the underlying buffer as well, simplifying some handling.

Differential Revision: [D54269922](https://our.internmc.facebook.com/intern/diff/D54269922)
Co-authored-by: voznesenskym <voznesenskym@gmail.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113279
Approved by: https://github.com/ezyang
2024-03-20 23:45:34 +00:00
PyTorch MergeBot
c0996866f4 Revert "Change ATEN generator argument type to const std::optional<Generator>& (#120076)"
This reverts commit 4305c64fea.

Reverted https://github.com/pytorch/pytorch/pull/120076 on behalf of https://github.com/izaitsevfb due to breaking internal builds(take 3) ([comment](https://github.com/pytorch/pytorch/pull/120076#issuecomment-1986338164))
2024-03-08 20:01:03 +00:00
cyy
4305c64fea Change ATEN generator argument type to const std::optional<Generator>& (#120076)
This PR proposes to use std::optional<Generator>& for underlying functions to avoid unnecessary copy and move operations. The torchgen code was changed to generate the new type.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120076
Approved by: https://github.com/malfet
2024-03-07 09:52:21 +00:00
Isuru Fernando
c3496d50f0 Fix torch.return_types init signature (#119284)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119284
Approved by: https://github.com/peterbell10, https://github.com/XuehaiPan
2024-02-23 21:52:34 +00:00
Yu, Guangye
5c46600f84 [RELAND] refactor lazy init to device-agnostic (#119248)
# Motivation
This PR intends to extend `cuda_lazy_init` to `device_lazy_init` which is a device-agnostic API that can support any backend. And change `maybe_initialize_cuda` to `maybe_initialize_device` to support lazy initialization for CUDA while maintaining scalability.

# Design
We maintain a flag for each backend to manage the lazy initialization state separately.

# Additional Context
No need more UTs.
This is a reland PR, the original PR is [refactor lazy init to device-agnostic](https://github.com/pytorch/pytorch/pull/118846).
This is a common PR, and does not trigger xpu ciflow.

Differential Revision: [D53478332](https://our.internmc.facebook.com/intern/diff/D53478332)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119248
Approved by: https://github.com/EikanWang, https://github.com/gujinghui, https://github.com/jgong5, https://github.com/atalman
2024-02-07 15:58:51 +00:00
PyTorch MergeBot
ab613a4019 Revert "refactor lazy init to device-agnostic (#118846)"
This reverts commit 520771d7b3.

Reverted https://github.com/pytorch/pytorch/pull/118846 on behalf of https://github.com/atalman due to Failing, tests https://github.com/pytorch/torchdistx/blob/main/src/python/torchdistx/_C/fake.cc#L11  ([comment](https://github.com/pytorch/pytorch/pull/118846#issuecomment-1927651305))
2024-02-05 18:06:30 +00:00
Yu, Guangye
520771d7b3 refactor lazy init to device-agnostic (#118846)
# Motivation
This PR intends to extend `cuda_lazy_init` to `device_lazy_init` which is a device-agnostic API that can support any backend. And change `maybe_initialize_cuda` to `maybe_initialize_device` to support lazy initialization for CUDA while maintaining scalability.

# Design
We maintain a flag for each backend to manage the lazy initialization state separately.

# Additional Context
No need more UTs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118846
Approved by: https://github.com/malfet
2024-02-02 12:10:39 +00:00
Aaron Gokaslan
1562dae62c [BE]: Apply RUF025 dict.fromkeys preview rule (#118637)
Simplifies and optimizes dict construction using the `fromkeys` classmethod ctor. This also makes it really obvious when all the keys will have the same static value, which could be a bug if unintentional. It is also significantly faster than using a dict comprehension. The rule is in preview, but I am adding a forward fix for when it becomes stable.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/118637
Approved by: https://github.com/albanD
2024-01-30 20:46:54 +00:00
Edward Z. Yang
46712b019d Enable local_partial_types (#118467)
When using dmypy, this setting is enabled and cannot be turned off. Force it for regular mypy too.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/118467
Approved by: https://github.com/Skylion007
ghstack dependencies: #118414, #118418, #118432
2024-01-28 13:38:22 +00:00
albanD
24133e44b1 Fix return type hint for list types (#118238)
All single element list types are `Tensor[]` so they will always be Tuple.
I don't know of any way to easily access the pyi type and compare that to a real run so no testing here :(
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118238
Approved by: https://github.com/ezyang
2024-01-25 23:35:20 +00:00
Joel Schlosser
16d69290c6 Use view name instead of view_copy name for functional inverses (#117056)
Ex: `unsqueeze_copy_inverse()` -> `unsqueeze_inverse()`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/117056
Approved by: https://github.com/bdhirsh
2024-01-10 00:52:36 +00:00
Joel Schlosser
52f0457d7d Support view returns for functional inverses on narrowing views (#115893)
Part 1 of implementation for general [subclass view fake-ification](https://docs.google.com/document/d/1C5taWiplmX7nKiURXDOAZG2W5VNJ2iV0fQFq92H0Cxw).

The following functional inverses are currently implemented scatter-style and thus never return views:
* `as_strided_copy_inverse()`
* `diagonal_copy_inverse()`
* `expand_copy_inverse()`
* `select_copy_int_inverse()`
* `slice_copy_Tensor_inverse()`
* `split_copy_Tensor_inverse()`
* `split_with_sizes_copy_inverse()`
* `unbind_copy_int_inverse()`
* `unfold_copy_inverse()`

We need to get actual views for the introduction of reverse view funcs coming next.

Details:
* Use `as_strided()` to implement actual view inverses for the above
    * Assumes we're given a mutated_view that is actually part of a bigger storage; this isn't really the case for functionalization
* Introduce `InverseReturnMode` enum for customization of functional inverses
    * `AlwaysView` - always return an actual view; needed for reverse view_funcs()
    * `NeverView` - always do a copy; useful for certain functionalization use cases (e.g. XLA, executorch)
    * `ViewOrScatterInverse` - return an actual view in most cases, but prefer scatter inverses when they exist. this avoids the need to implement `as_strided()` for subclasses, which can be difficult or impossible
* Make sure functionalization works as before
    * Use `ViewOrScatterInverse` when reapply_views TLS is True or `NeverView` otherwise
    * Adds tests to ensure old behavior for above inverses **in functionalization**
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115893
Approved by: https://github.com/bdhirsh
2023-12-21 21:39:22 +00:00
Aaron Gokaslan
ee5d981249 [BE]: Enable RUFF PERF402 and apply fixes (#115505)
* Enable PERF402. Makes code more efficient and succinct by removing useless list copies that could be accomplished either via a list constructor or extend call. All test cases have noqa added since performance is not as sensitive in that folder.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115505
Approved by: https://github.com/malfet
2023-12-20 18:01:24 +00:00
cdzhan
99554112d3 [pytorch] add namespace for optTypeMetaToScalarType in codegen to avoid not declared when compile (#115623)
Fixes compilation failure in some environment.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115623
Approved by: https://github.com/albanD
2023-12-13 00:59:01 +00:00
Antonio Kim
7fc292930c Add support for torch.Generator type in TorchScript (#110413)
- Add support for `torch.Generator` type in TorchScript
- Add `generator` args to all `torch.nn.init` functions that call `uniform_` or `normal_`
- Add support for `torch.Generator` in LTC's TorchScript backend (CC: @wconstab)

CC: @eellison @davidberard98 @GlebKazantaev @behzad-a
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110413
Approved by: https://github.com/wconstab, https://github.com/albanD, https://github.com/glebk-cerebras, https://github.com/davidberard98
2023-11-21 23:07:21 +00:00
Edward Z. Yang
8c4812be80 Replace expect_int with guard_int (#113921)
The idea is that instead of erroring, we will just specialize at these sites.

Fixes https://github.com/pytorch/pytorch/issues/113142

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113921
Approved by: https://github.com/zou3519
2023-11-20 21:27:48 +00:00
Brian Vaughan
dbb96ef30d improve annotation device parameters where a device ordinal is allowed (#113647)
Using mypy in code that depends on pytorch, I noticed that the type annotation doesn't allow a device ordinal.

`error: Argument "device" to "to_empty" of "Module" has incompatible type "int"; expected "str | device"  [arg-type]`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113647
Approved by: https://github.com/albanD
2023-11-17 14:41:22 +00:00
Jane Xu
deec2380c7 Add 0dim Tensor overload for _foreach_div (#113688)
This PR is ALMOST basically just following the steps from #106677 EXCEPT! We do add one feature. Similar to fused_adam(w), for the CUDA dispatches: when the scalar tensor is on CPU, we .item and redispatch to the normal scalar overload. Otherwise, the cuda kernel will complain about mismatch in devices between the scalar and the tensors.

Why do we add this feature? Our optimizers want to allow lr as a tensor, and lr could be a CPU tensor. lr is used with foreach_div_ in Adam, so our CI will break otherwise.

After this PR, `_foreach_mul` and `_foreach_div` will accept either a CPU or a GPU tensor for the scalar tensor (vs only a GPU tensor). They join the ranks of `fused_adam(w)` in this characteristic. I did not yet do the same thing for foreach_add (the only other foreach op with a .Tensor overload) because there is no use case and will be more involved.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113688
Approved by: https://github.com/mlazos, https://github.com/albanD
2023-11-15 20:59:32 +00:00
George White
6c187246d6 Add support for float8_e4m3fnuz and _e5m2fnuz (#107586)
This PR relates to the feature in [this feature submission](https://docs.google.com/document/d/1pF2T1xz54IPg1jG7FhykbrpbcJZVelQw0v8vBaoLkfs/edit). It has been based on #104242 which adds similar float8 types.

These new types added in this PR are described in the paper at https://arxiv.org/abs/2206.02915. A brief description and comparison of the types with other float8 types can be also found in the [OpenXLA RFC](https://github.com/openxla/stablehlo/blob/main/rfcs/20230321-fp8_fnuz.md).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107586
Approved by: https://github.com/seemethere, https://github.com/malfet
2023-11-15 15:01:11 +00:00
PyTorch MergeBot
252e68a83b Revert "Add support for torch.Generator type in TorchScript (#110413)"
This reverts commit 54493fe8c4.

Reverted https://github.com/pytorch/pytorch/pull/110413 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it is, unfortunately, still breaking internal builds ([comment](https://github.com/pytorch/pytorch/pull/110413#issuecomment-1811625557))
2023-11-15 00:51:23 +00:00
Antonio Kim
54493fe8c4 Add support for torch.Generator type in TorchScript (#110413)
- Add support for `torch.Generator` type in TorchScript
- Add `generator` args to all `torch.nn.init` functions that call `uniform_` or `normal_`
- Add support for `torch.Generator` in LTC's TorchScript backend (CC: @wconstab)

CC: @eellison @davidberard98 @GlebKazantaev @behzad-a
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110413
Approved by: https://github.com/wconstab, https://github.com/albanD, https://github.com/glebk-cerebras, https://github.com/davidberard98
2023-11-13 23:18:14 +00:00
PyTorch MergeBot
9a28a7b498 Revert "Add support for torch.Generator type in TorchScript (#110413)"
This reverts commit 27e31ab6e8.

Reverted https://github.com/pytorch/pytorch/pull/110413 on behalf of https://github.com/PaliC due to breaking internal builds ([comment](https://github.com/pytorch/pytorch/pull/110413#issuecomment-1799003164))
2023-11-07 15:53:32 +00:00
Antonio Kim
27e31ab6e8 Add support for torch.Generator type in TorchScript (#110413)
- Add support for `torch.Generator` type in TorchScript
- Add `generator` args to all `torch.nn.init` functions that call `uniform_` or `normal_`
- Add support for `torch.Generator` in LTC's TorchScript backend (CC: @wconstab)

CC: @eellison @davidberard98 @GlebKazantaev @behzad-a
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110413
Approved by: https://github.com/wconstab, https://github.com/albanD, https://github.com/glebk-cerebras, https://github.com/davidberard98
2023-11-06 21:27:02 +00:00
Mengwei Liu
19e9f5cc7b [torchgen] Add support for optional tensor (#112938)
Summary: As titled

Test Plan: rely on CI

Differential Revision: D50997957

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112938
Approved by: https://github.com/Skylion007
2023-11-06 20:03:05 +00:00
Aaron Gokaslan
1ad0f0b308 [BE]: remove unnecessary enumerate calls (#111690)
Remove unnecessary enumerate calls, entirely automated fixes so probably reasonably low risk.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111690
Approved by: https://github.com/malfet
2023-10-20 23:20:29 +00:00
Jane Xu
ca7d084ff9 Add ScalarTensor or 0dim overload for _foreach_add (#111079)
Adding a Tensor overload will allow us to:
- optimize in more cases than before
- increase coverage for scalarTensor instead of just scalars in our foreach APIs

The main complication in this PR was that add.Tensor has a scalar overload, so I've now built out support for that.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111079
Approved by: https://github.com/albanD
2023-10-20 01:34:07 +00:00
Kazuaki Ishizaki
ac48c11ab7 Fix typo under torchgen directory (#111154)
This PR fixes typo in comments and messages in files under `torchgen` directory.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111154
Approved by: https://github.com/rajveer43, https://github.com/Skylion007
2023-10-13 16:43:46 +00:00
isdanni
dede1e96e2 [BE] Enable Ruff's Flake8 PYI018 (#111101)
Enable [unused-private-type-var (PYI018)](https://docs.astral.sh/ruff/rules/unused-private-type-var/#unused-private-type-var-pyi018)

Link: #110950

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111101
Approved by: https://github.com/albanD
2023-10-12 16:26:21 +00:00
Edward Z. Yang
6a974bec5d Change flash attention outputs to be SymInt instead of int (#110533)
Fixes https://github.com/pytorch/pytorch/issues/110322

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110533
Approved by: https://github.com/albanD
2023-10-05 01:00:07 +00:00
Fabrice Pont
053367b1ed fix: flake8-bugbear code B024 (#107265)
See #106571 item B024

This fix concerns the addition of `abstractmethod` to methods declared inside abstract classes.

Should I also include PEP8 compliant reformatting on the files I had to modify ?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107265
Approved by: https://github.com/kit1980
2023-10-04 23:52:52 +00:00
cyy
d9fb7166d6 [BE] use DeviceIndex instead of int64_t for related device interfaces (#103068)
This PR unifies the device interfaces in aten/*cpp and torch/csrc/*cpp to use  **c10::DeviceIndex**.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103068
Approved by: https://github.com/malfet
2023-08-25 20:16:14 +00:00
Masaki Kozuki
5814380e7b Revert "Revert "Reland "Add forward mode AD to out-place foreach functions (#102409) (#106043)""" (#106320)
Fixed a typo specifying the number of tensors and elements in the test having failed in slow gradcheck
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106320
Approved by: https://github.com/soulitzer
2023-08-18 23:01:42 +00:00
PyTorch MergeBot
2b427ae3a7 Revert "Reland "Add forward mode AD to out-place foreach functions (#102409) (#106043)"
This reverts commit e773f28ee3.

Reverted https://github.com/pytorch/pytorch/pull/106043 on behalf of https://github.com/DanilBaibak due to Break slow tests ([comment](https://github.com/pytorch/pytorch/pull/106043#issuecomment-1658642734))
2023-07-31 15:50:36 +00:00
Masaki Kozuki
e773f28ee3 Reland "Add forward mode AD to out-place foreach functions (#102409) (#106043)
forward-mode AD of out-of-place foreach functions, finally.

rel:
- #102409
- #105504
- #58833
- #100695

---

# Generated Foreach
```c++
::std::vector<at::Tensor> _foreach_sinh(c10::DispatchKeySet ks, at::TensorList self) {
  auto self_ = unpack(self, "self", 0);
  [[maybe_unused]] auto _any_requires_grad = compute_requires_grad( self );

  std::vector<bool> _any_has_forward_grad_result(self.size());
  for (const auto& i : c10::irange(self.size())) {
    _any_has_forward_grad_result[i] = isFwGradDefined(self[i]);
  }
  std::shared_ptr<ForeachSinhBackward0> grad_fn;
  if (_any_requires_grad) {
    grad_fn = std::shared_ptr<ForeachSinhBackward0>(new ForeachSinhBackward0(), deleteNode);
    grad_fn->set_next_edges(collect_next_edges( self ));
    grad_fn->self_ = make_saved_variable_list(self);
    grad_fn->self_size_ = self.size();
  }
  #ifndef NDEBUG
  std::vector<c10::optional<Storage>> self__storage_saved(self_.size());
  for (const Tensor& tensor : self_)
    self__storage_saved.push_back(
      tensor.has_storage() ? c10::optional<Storage>(tensor.storage()) : c10::nullopt);
  std::vector<c10::intrusive_ptr<TensorImpl>> self__impl_saved(self_.size());
  for (size_t i=0; i<self_.size(); i++)
    if (self_[i].defined()) self__impl_saved[i] = self_[i].getIntrusivePtr();
  #endif
  auto _tmp = ([&]() {
    at::AutoDispatchBelowADInplaceOrView guard;
    return at::redispatch::_foreach_sinh(ks & c10::after_autograd_keyset, self_);
  })();
  auto result = std::move(_tmp);
  #ifndef NDEBUG
  for (size_t i=0; i<self_.size() && !at::impl::dispatch_mode_enabled(); i++) {
    if (self__storage_saved[i].has_value() && !at::impl::tensorlist_has_dispatch(self_))
      TORCH_INTERNAL_ASSERT(self__storage_saved[i].value().is_alias_of(self_[i].storage()));
  }
  for (size_t i=0; i<self_.size() && !at::impl::dispatch_mode_enabled(); i++) {
    if (self__impl_saved[i] && !at::impl::tensorlist_has_dispatch(self_))
      TORCH_INTERNAL_ASSERT(self__impl_saved[i] == self_[i].getIntrusivePtr());
  }
  #endif
  if (grad_fn) {
      set_history(flatten_tensor_args( result ), grad_fn);
  }
  std::vector<c10::optional<at::Tensor>> result_new_fw_grad_opts(self.size(), c10::nullopt);
  for (const auto& i : c10::irange(result_new_fw_grad_opts.size())) {
    if (_any_has_forward_grad_result[i]) {
        auto self_t_raw = toNonOptFwGrad(self[i]);
        auto self_tensor = toNonOptTensor(self[i]);
        auto self_t = (self_t_raw.defined() || !self_tensor.defined())
          ? self_t_raw : at::_efficientzerotensor(self_tensor.sizes(), self_tensor.options());
        auto self_p = toNonOptPrimal(self[i]);
        result_new_fw_grad_opts[i] = (self_t.conj() * self_p.cosh().conj()).conj();
    }
  }
  for (const auto& i : c10::irange(result_new_fw_grad_opts.size())) {
    auto& result_new_fw_grad_opt = result_new_fw_grad_opts[i];
    if (result_new_fw_grad_opt.has_value() && result_new_fw_grad_opt.value().defined() && result[i].defined()) {
      // The hardcoded 0 here will need to be updated once we support multiple levels.
      result[i]._set_fw_grad(result_new_fw_grad_opt.value(), /* level */ 0, /* is_inplace_op */ false);
    }
  }
  return result;
}

::std::vector<at::Tensor> _foreach_norm_Scalar(c10::DispatchKeySet ks, at::TensorList self, const at::Scalar & ord) {
  auto self_ = unpack(self, "self", 0);
  [[maybe_unused]] auto _any_requires_grad = compute_requires_grad( self );

  std::vector<bool> _any_has_forward_grad_result(self.size());
  for (const auto& i : c10::irange(self.size())) {
    _any_has_forward_grad_result[i] = isFwGradDefined(self[i]);
  }
  std::shared_ptr<ForeachNormBackward0> grad_fn;
  if (_any_requires_grad) {
    grad_fn = std::shared_ptr<ForeachNormBackward0>(new ForeachNormBackward0(), deleteNode);
    grad_fn->set_next_edges(collect_next_edges( self ));
    grad_fn->ord = ord;
    grad_fn->self_ = make_saved_variable_list(self);
    grad_fn->self_size_ = self.size();
  }
  #ifndef NDEBUG
  std::vector<c10::optional<Storage>> self__storage_saved(self_.size());
  for (const Tensor& tensor : self_)
    self__storage_saved.push_back(
      tensor.has_storage() ? c10::optional<Storage>(tensor.storage()) : c10::nullopt);
  std::vector<c10::intrusive_ptr<TensorImpl>> self__impl_saved(self_.size());
  for (size_t i=0; i<self_.size(); i++)
    if (self_[i].defined()) self__impl_saved[i] = self_[i].getIntrusivePtr();
  #endif
  auto _tmp = ([&]() {
    at::AutoDispatchBelowADInplaceOrView guard;
    return at::redispatch::_foreach_norm(ks & c10::after_autograd_keyset, self_, ord);
  })();
  auto result = std::move(_tmp);
  #ifndef NDEBUG
  for (size_t i=0; i<self_.size() && !at::impl::dispatch_mode_enabled(); i++) {
    if (self__storage_saved[i].has_value() && !at::impl::tensorlist_has_dispatch(self_))
      TORCH_INTERNAL_ASSERT(self__storage_saved[i].value().is_alias_of(self_[i].storage()));
  }
  for (size_t i=0; i<self_.size() && !at::impl::dispatch_mode_enabled(); i++) {
    if (self__impl_saved[i] && !at::impl::tensorlist_has_dispatch(self_))
      TORCH_INTERNAL_ASSERT(self__impl_saved[i] == self_[i].getIntrusivePtr());
  }
  #endif
  if (grad_fn) {
      set_history(flatten_tensor_args( result ), grad_fn);
  }
  std::vector<c10::optional<at::Tensor>> result_new_fw_grad_opts(self.size(), c10::nullopt);
  for (const auto& i : c10::irange(result_new_fw_grad_opts.size())) {
    if (_any_has_forward_grad_result[i]) {
        auto self_t_raw = toNonOptFwGrad(self[i]);
        auto self_tensor = toNonOptTensor(self[i]);
        auto self_t = (self_t_raw.defined() || !self_tensor.defined())
          ? self_t_raw : at::_efficientzerotensor(self_tensor.sizes(), self_tensor.options());
        auto self_p = toNonOptPrimal(self[i]);
        result_new_fw_grad_opts[i] = norm_jvp(self_p, self_t, ord, result[i]);
    }
  }
  for (const auto& i : c10::irange(result_new_fw_grad_opts.size())) {
    auto& result_new_fw_grad_opt = result_new_fw_grad_opts[i];
    if (result_new_fw_grad_opt.has_value() && result_new_fw_grad_opt.value().defined() && result[i].defined()) {
      // The hardcoded 0 here will need to be updated once we support multiple levels.
      result[i]._set_fw_grad(result_new_fw_grad_opt.value(), /* level */ 0, /* is_inplace_op */ false);
    }
  }
  if (grad_fn) {
    grad_fn->result = result;
  }
  return result;
}

```

# Reference
```c++
at::Tensor sinh(c10::DispatchKeySet ks, const at::Tensor & self) {
  auto& self_ = unpack(self, "self", 0);
  [[maybe_unused]] auto _any_requires_grad = compute_requires_grad( self );

  [[maybe_unused]] auto _any_has_forward_grad_result = (isFwGradDefined(self));
  std::shared_ptr<SinhBackward0> grad_fn;
  if (_any_requires_grad) {
    grad_fn = std::shared_ptr<SinhBackward0>(new SinhBackward0(), deleteNode);
    grad_fn->set_next_edges(collect_next_edges( self ));
    grad_fn->self_ = SavedVariable(self, false);
  }
  #ifndef NDEBUG
  c10::optional<Storage> self__storage_saved =
    self_.has_storage() ? c10::optional<Storage>(self_.storage()) : c10::nullopt;
  c10::intrusive_ptr<TensorImpl> self__impl_saved;
  if (self_.defined()) self__impl_saved = self_.getIntrusivePtr();
  #endif
  auto _tmp = ([&]() {
    at::AutoDispatchBelowADInplaceOrView guard;
    return at::redispatch::sinh(ks & c10::after_autograd_keyset, self_);
  })();
  auto result = std::move(_tmp);
  #ifndef NDEBUG
  if (self__storage_saved.has_value() &&
      !at::impl::dispatch_mode_enabled() &&
      !at::impl::tensor_has_dispatch(self_))
    TORCH_INTERNAL_ASSERT(self__storage_saved.value().is_alias_of(self_.storage()));
  if (self__impl_saved && !at::impl::dispatch_mode_enabled() && !at::impl::tensor_has_dispatch(self_))
    TORCH_INTERNAL_ASSERT(self__impl_saved == self_.getIntrusivePtr());
  if (result.has_storage() && !at::impl::dispatch_mode_enabled() && !at::impl::tensor_has_dispatch(result)) {
    TORCH_INTERNAL_ASSERT(result.storage().use_count() == 1, "function: sinh");
  }
  if (!at::impl::dispatch_mode_enabled() && !at::impl::tensor_has_dispatch(result))
    TORCH_INTERNAL_ASSERT(result.use_count() <= 1, "function: sinh");
  #endif
  if (grad_fn) {
      set_history(flatten_tensor_args( result ), grad_fn);
  }
  c10::optional<at::Tensor> result_new_fw_grad_opt = c10::nullopt;
  if (_any_has_forward_grad_result && (result.defined())) {
      auto self_t_raw = toNonOptFwGrad(self);
      auto self_tensor = toNonOptTensor(self);
      auto self_t = (self_t_raw.defined() || !self_tensor.defined())
        ? self_t_raw : at::_efficientzerotensor(self_tensor.sizes(), self_tensor.options());
      auto self_p = toNonOptPrimal(self);
      result_new_fw_grad_opt = (self_t.conj() * self_p.cosh().conj()).conj();
  }
  if (result_new_fw_grad_opt.has_value() && result_new_fw_grad_opt.value().defined() && result.defined()) {
    // The hardcoded 0 here will need to be updated once we support multiple levels.
    result._set_fw_grad(result_new_fw_grad_opt.value(), /* level */ 0, /* is_inplace_op */ false);
  }
  return result;
}
at::Tensor norm_Scalar(c10::DispatchKeySet ks, const at::Tensor & self, const at::Scalar & p) {
  auto& self_ = unpack(self, "self", 0);
  [[maybe_unused]] auto _any_requires_grad = compute_requires_grad( self );

  [[maybe_unused]] auto _any_has_forward_grad_result = (isFwGradDefined(self));
  std::shared_ptr<NormBackward0> grad_fn;
  if (_any_requires_grad) {
    grad_fn = std::shared_ptr<NormBackward0>(new NormBackward0(), deleteNode);
    grad_fn->set_next_edges(collect_next_edges( self ));
    grad_fn->p = p;
    grad_fn->self_ = SavedVariable(self, false);
  }
  #ifndef NDEBUG
  c10::optional<Storage> self__storage_saved =
    self_.has_storage() ? c10::optional<Storage>(self_.storage()) : c10::nullopt;
  c10::intrusive_ptr<TensorImpl> self__impl_saved;
  if (self_.defined()) self__impl_saved = self_.getIntrusivePtr();
  #endif
  auto _tmp = ([&]() {
    at::AutoDispatchBelowADInplaceOrView guard;
    return at::redispatch::norm(ks & c10::after_autograd_keyset, self_, p);
  })();
  auto result = std::move(_tmp);
  #ifndef NDEBUG
  if (self__storage_saved.has_value() &&
      !at::impl::dispatch_mode_enabled() &&
      !at::impl::tensor_has_dispatch(self_))
    TORCH_INTERNAL_ASSERT(self__storage_saved.value().is_alias_of(self_.storage()));
  if (self__impl_saved && !at::impl::dispatch_mode_enabled() && !at::impl::tensor_has_dispatch(self_))
    TORCH_INTERNAL_ASSERT(self__impl_saved == self_.getIntrusivePtr());
  if (result.has_storage() && !at::impl::dispatch_mode_enabled() && !at::impl::tensor_has_dispatch(result)) {
    TORCH_INTERNAL_ASSERT(result.storage().use_count() == 1, "function: norm_Scalar");
  }
  if (!at::impl::dispatch_mode_enabled() && !at::impl::tensor_has_dispatch(result))
    TORCH_INTERNAL_ASSERT(result.use_count() <= 1, "function: norm_Scalar");
  #endif
  if (grad_fn) {
      set_history(flatten_tensor_args( result ), grad_fn);
  }
  throw_error_for_complex_autograd(result, "norm");
  c10::optional<at::Tensor> result_new_fw_grad_opt = c10::nullopt;
  if (_any_has_forward_grad_result && (result.defined())) {
      auto self_t_raw = toNonOptFwGrad(self);
      auto self_tensor = toNonOptTensor(self);
      auto self_t = (self_t_raw.defined() || !self_tensor.defined())
        ? self_t_raw : at::_efficientzerotensor(self_tensor.sizes(), self_tensor.options());
      auto self_p = toNonOptPrimal(self);
      result_new_fw_grad_opt = norm_jvp(self_p, self_t, p, result);
  }
  if (result_new_fw_grad_opt.has_value() && result_new_fw_grad_opt.value().defined() && result.defined()) {
    // The hardcoded 0 here will need to be updated once we support multiple levels.
    result._set_fw_grad(result_new_fw_grad_opt.value(), /* level */ 0, /* is_inplace_op */ false);
  }
  if (grad_fn) {
    grad_fn->result_ = SavedVariable(result, true);
  }
  return result;
}

```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106043
Approved by: https://github.com/soulitzer
2023-07-27 03:13:24 +00:00
Alan Ji
70b0f1b248 fix some typos (#106018)
Fixes #ISSUE_NUMBER
Fix typos in `test_static_module.cc`, `backend_cutting_test.cc` and `types_base.py`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106018
Approved by: https://github.com/awgu
2023-07-26 18:14:44 +00:00
Justin Chu
4cc1745b13 [BE] f-stringify torch/ and scripts (#105538)
This PR is a follow up on the pyupgrade series to convert more strings to use f-strings using `flynt`.

- https://docs.python.org/3/reference/lexical_analysis.html#f-strings
- https://pypi.org/project/flynt/

Command used:

```
flynt torch/ -ll 120
flynt scripts/ -ll 120
flynt tools/ -ll 120
```

and excluded `collect_env.py`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105538
Approved by: https://github.com/ezyang, https://github.com/malfet
2023-07-21 19:35:24 +00:00
Amadeusz Skrzypczak
b64bd4a5dd Add torch.float8_e5m2 and torch.float8_e4m3 data types (#104242)
Proposal of two float8 variants - e5m2 and e4m3 - based on https://arxiv.org/pdf/2209.05433.pdf

Hide all Float8 operator implementations behind `#if !defined(C10_MOBILE)` guard to keep Android build size almost unchanged

TODO:
 - Refactor duplicated code
 - Cleanup unbalanced pragma pop in dtype utils
 - Add native implementation on the CUDA size

Co-authored-by: Nikita Shulga <nshulga@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104242
Approved by: https://github.com/albanD
2023-07-20 16:09:11 +00:00
PyTorch MergeBot
f2b15772ff Revert "Add torch.float8_e5m2 and torch.float8_e4m3 data types (#104242)"
This reverts commit a9804130e5.

Reverted https://github.com/pytorch/pytorch/pull/104242 on behalf of https://github.com/PaliC due to breaks lint (run lintrunner and remerge) ([comment](https://github.com/pytorch/pytorch/pull/104242#issuecomment-1644150284))
2023-07-20 15:37:53 +00:00
Amadeusz Skrzypczak
a9804130e5 Add torch.float8_e5m2 and torch.float8_e4m3 data types (#104242)
Proposal of two float8 variants - e5m2 and e4m3 - based on https://arxiv.org/pdf/2209.05433.pdf

Hide all Float8 operator implementations behind `#if !defined(C10_MOBILE)` guard to keep Android build size almost unchanged

TODO:
 - Refactor duplicated code
 - Cleanup unbalanced pragma pop in dtype utils
 - Add native implementation on the CUDA size

Co-authored-by: Nikita Shulga <nshulga@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104242
Approved by: https://github.com/albanD
2023-07-20 09:45:45 +00:00
Justin Chu
964d29f312 [BE] Enable ruff's UP rules and autoformat torchgen/ (#105423)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105423
Approved by: https://github.com/Skylion007
2023-07-18 06:44:20 +00:00
PyTorch MergeBot
8958f041be Revert "Add forward mode AD to out-place foreach functions (#102409)"
This reverts commit e2ec0ba404.

Reverted https://github.com/pytorch/pytorch/pull/102409 on behalf of https://github.com/huydhn due to Sorry for reverting your PR but it is failing some tests in trunk e799f565eb ([comment](https://github.com/pytorch/pytorch/pull/102409#issuecomment-1615254393))
2023-06-30 22:46:57 +00:00
Masaki Kozuki
e2ec0ba404 Add forward mode AD to out-place foreach functions (#102409)
The major difference from in-place support is that some out-place functions have their derivatives spelled out in derivatives.yaml, which requires some changes in `load_derivatives.py` and some handlings in various places due to the others whose derivatives are generated by `torchgen`.

rel:
- #58833
- #100695

---

# Generated Foreach
```c++
::std::vector<at::Tensor> _foreach_sinh(c10::DispatchKeySet ks, at::TensorList self) {
  auto self_ = unpack(self, "self", 0);
  [[maybe_unused]] auto _any_requires_grad = compute_requires_grad( self );

  std::vector<bool> _any_has_forward_grad_result(self.size());
  for (const auto& i : c10::irange(self.size())) {
    _any_has_forward_grad_result[i] = isFwGradDefined(self[i]);
  }
  std::shared_ptr<ForeachSinhBackward0> grad_fn;
  if (_any_requires_grad) {
    grad_fn = std::shared_ptr<ForeachSinhBackward0>(new ForeachSinhBackward0(), deleteNode);
    grad_fn->set_next_edges(collect_next_edges( self ));
    grad_fn->self_ = make_saved_variable_list(self);
    grad_fn->self_size_ = self.size();
  }
  #ifndef NDEBUG
  std::vector<c10::optional<Storage>> self__storage_saved(self_.size());
  for (const Tensor& tensor : self_)
    self__storage_saved.push_back(
      tensor.has_storage() ? c10::optional<Storage>(tensor.storage()) : c10::nullopt);
  std::vector<c10::intrusive_ptr<TensorImpl>> self__impl_saved(self_.size());
  for (size_t i=0; i<self_.size(); i++)
    if (self_[i].defined()) self__impl_saved[i] = self_[i].getIntrusivePtr();
  #endif
  auto _tmp = ([&]() {
    at::AutoDispatchBelowADInplaceOrView guard;
    return at::redispatch::_foreach_sinh(ks & c10::after_autograd_keyset, self_);
  })();
  auto result = std::move(_tmp);
  #ifndef NDEBUG
  for (size_t i=0; i<self_.size() && !at::impl::dispatch_mode_enabled(); i++) {
    if (self__storage_saved[i].has_value() && !at::impl::tensorlist_has_dispatch(self_))
      TORCH_INTERNAL_ASSERT(self__storage_saved[i].value().is_alias_of(self_[i].storage()));
  }
  for (size_t i=0; i<self_.size() && !at::impl::dispatch_mode_enabled(); i++) {
    if (self__impl_saved[i] && !at::impl::tensorlist_has_dispatch(self_))
      TORCH_INTERNAL_ASSERT(self__impl_saved[i] == self_[i].getIntrusivePtr());
  }
  #endif
  if (grad_fn) {
      set_history(flatten_tensor_args( result ), grad_fn);
  }
  std::vector<c10::optional<at::Tensor>> result_new_fw_grad_opts(self.size(), c10::nullopt);
  for (const auto& i : c10::irange(result_new_fw_grad_opts.size())) {
    if (_any_has_forward_grad_result[i]) {
        auto self_t_raw = toNonOptFwGrad(self[i]);
        auto self_tensor = toNonOptTensor(self[i]);
        auto self_t = (self_t_raw.defined() || !self_tensor.defined())
          ? self_t_raw : at::_efficientzerotensor(self_tensor.sizes(), self_tensor.options());
        auto self_p = toNonOptPrimal(self[i]);
        result_new_fw_grad_opts[i] = (self_t.conj() * self_p.cosh().conj()).conj();
    }
  }
  for (const auto& i : c10::irange(result_new_fw_grad_opts.size())) {
    auto& result_new_fw_grad_opt = result_new_fw_grad_opts[i];
    if (result_new_fw_grad_opt.has_value() && result_new_fw_grad_opt.value().defined() && result[i].defined()) {
      // The hardcoded 0 here will need to be updated once we support multiple levels.
      result[i]._set_fw_grad(result_new_fw_grad_opt.value(), /* level */ 0, /* is_inplace_op */ false);
    }
  }
  return result;
}

::std::vector<at::Tensor> _foreach_norm_Scalar(c10::DispatchKeySet ks, at::TensorList self, const at::Scalar & ord) {
  auto self_ = unpack(self, "self", 0);
  [[maybe_unused]] auto _any_requires_grad = compute_requires_grad( self );

  std::vector<bool> _any_has_forward_grad_result(self.size());
  for (const auto& i : c10::irange(self.size())) {
    _any_has_forward_grad_result[i] = isFwGradDefined(self[i]);
  }
  std::shared_ptr<ForeachNormBackward0> grad_fn;
  if (_any_requires_grad) {
    grad_fn = std::shared_ptr<ForeachNormBackward0>(new ForeachNormBackward0(), deleteNode);
    grad_fn->set_next_edges(collect_next_edges( self ));
    grad_fn->ord = ord;
    grad_fn->self_ = make_saved_variable_list(self);
    grad_fn->self_size_ = self.size();
  }
  #ifndef NDEBUG
  std::vector<c10::optional<Storage>> self__storage_saved(self_.size());
  for (const Tensor& tensor : self_)
    self__storage_saved.push_back(
      tensor.has_storage() ? c10::optional<Storage>(tensor.storage()) : c10::nullopt);
  std::vector<c10::intrusive_ptr<TensorImpl>> self__impl_saved(self_.size());
  for (size_t i=0; i<self_.size(); i++)
    if (self_[i].defined()) self__impl_saved[i] = self_[i].getIntrusivePtr();
  #endif
  auto _tmp = ([&]() {
    at::AutoDispatchBelowADInplaceOrView guard;
    return at::redispatch::_foreach_norm(ks & c10::after_autograd_keyset, self_, ord);
  })();
  auto result = std::move(_tmp);
  #ifndef NDEBUG
  for (size_t i=0; i<self_.size() && !at::impl::dispatch_mode_enabled(); i++) {
    if (self__storage_saved[i].has_value() && !at::impl::tensorlist_has_dispatch(self_))
      TORCH_INTERNAL_ASSERT(self__storage_saved[i].value().is_alias_of(self_[i].storage()));
  }
  for (size_t i=0; i<self_.size() && !at::impl::dispatch_mode_enabled(); i++) {
    if (self__impl_saved[i] && !at::impl::tensorlist_has_dispatch(self_))
      TORCH_INTERNAL_ASSERT(self__impl_saved[i] == self_[i].getIntrusivePtr());
  }
  #endif
  if (grad_fn) {
      set_history(flatten_tensor_args( result ), grad_fn);
  }
  std::vector<c10::optional<at::Tensor>> result_new_fw_grad_opts(self.size(), c10::nullopt);
  for (const auto& i : c10::irange(result_new_fw_grad_opts.size())) {
    if (_any_has_forward_grad_result[i]) {
        auto self_t_raw = toNonOptFwGrad(self[i]);
        auto self_tensor = toNonOptTensor(self[i]);
        auto self_t = (self_t_raw.defined() || !self_tensor.defined())
          ? self_t_raw : at::_efficientzerotensor(self_tensor.sizes(), self_tensor.options());
        auto self_p = toNonOptPrimal(self[i]);
        result_new_fw_grad_opts[i] = norm_jvp(self_p, self_t, ord, result[i]);
    }
  }
  for (const auto& i : c10::irange(result_new_fw_grad_opts.size())) {
    auto& result_new_fw_grad_opt = result_new_fw_grad_opts[i];
    if (result_new_fw_grad_opt.has_value() && result_new_fw_grad_opt.value().defined() && result[i].defined()) {
      // The hardcoded 0 here will need to be updated once we support multiple levels.
      result[i]._set_fw_grad(result_new_fw_grad_opt.value(), /* level */ 0, /* is_inplace_op */ false);
    }
  }
  if (grad_fn) {
    grad_fn->result = result;
  }
  return result;
}

```

# Reference
```c++
at::Tensor sinh(c10::DispatchKeySet ks, const at::Tensor & self) {
  auto& self_ = unpack(self, "self", 0);
  [[maybe_unused]] auto _any_requires_grad = compute_requires_grad( self );

  [[maybe_unused]] auto _any_has_forward_grad_result = (isFwGradDefined(self));
  std::shared_ptr<SinhBackward0> grad_fn;
  if (_any_requires_grad) {
    grad_fn = std::shared_ptr<SinhBackward0>(new SinhBackward0(), deleteNode);
    grad_fn->set_next_edges(collect_next_edges( self ));
    grad_fn->self_ = SavedVariable(self, false);
  }
  #ifndef NDEBUG
  c10::optional<Storage> self__storage_saved =
    self_.has_storage() ? c10::optional<Storage>(self_.storage()) : c10::nullopt;
  c10::intrusive_ptr<TensorImpl> self__impl_saved;
  if (self_.defined()) self__impl_saved = self_.getIntrusivePtr();
  #endif
  auto _tmp = ([&]() {
    at::AutoDispatchBelowADInplaceOrView guard;
    return at::redispatch::sinh(ks & c10::after_autograd_keyset, self_);
  })();
  auto result = std::move(_tmp);
  #ifndef NDEBUG
  if (self__storage_saved.has_value() &&
      !at::impl::dispatch_mode_enabled() &&
      !at::impl::tensor_has_dispatch(self_))
    TORCH_INTERNAL_ASSERT(self__storage_saved.value().is_alias_of(self_.storage()));
  if (self__impl_saved && !at::impl::dispatch_mode_enabled() && !at::impl::tensor_has_dispatch(self_))
    TORCH_INTERNAL_ASSERT(self__impl_saved == self_.getIntrusivePtr());
  if (result.has_storage() && !at::impl::dispatch_mode_enabled() && !at::impl::tensor_has_dispatch(result)) {
    TORCH_INTERNAL_ASSERT(result.storage().use_count() == 1, "function: sinh");
  }
  if (!at::impl::dispatch_mode_enabled() && !at::impl::tensor_has_dispatch(result))
    TORCH_INTERNAL_ASSERT(result.use_count() <= 1, "function: sinh");
  #endif
  if (grad_fn) {
      set_history(flatten_tensor_args( result ), grad_fn);
  }
  c10::optional<at::Tensor> result_new_fw_grad_opt = c10::nullopt;
  if (_any_has_forward_grad_result && (result.defined())) {
      auto self_t_raw = toNonOptFwGrad(self);
      auto self_tensor = toNonOptTensor(self);
      auto self_t = (self_t_raw.defined() || !self_tensor.defined())
        ? self_t_raw : at::_efficientzerotensor(self_tensor.sizes(), self_tensor.options());
      auto self_p = toNonOptPrimal(self);
      result_new_fw_grad_opt = (self_t.conj() * self_p.cosh().conj()).conj();
  }
  if (result_new_fw_grad_opt.has_value() && result_new_fw_grad_opt.value().defined() && result.defined()) {
    // The hardcoded 0 here will need to be updated once we support multiple levels.
    result._set_fw_grad(result_new_fw_grad_opt.value(), /* level */ 0, /* is_inplace_op */ false);
  }
  return result;
}
at::Tensor norm_Scalar(c10::DispatchKeySet ks, const at::Tensor & self, const at::Scalar & p) {
  auto& self_ = unpack(self, "self", 0);
  [[maybe_unused]] auto _any_requires_grad = compute_requires_grad( self );

  [[maybe_unused]] auto _any_has_forward_grad_result = (isFwGradDefined(self));
  std::shared_ptr<NormBackward0> grad_fn;
  if (_any_requires_grad) {
    grad_fn = std::shared_ptr<NormBackward0>(new NormBackward0(), deleteNode);
    grad_fn->set_next_edges(collect_next_edges( self ));
    grad_fn->p = p;
    grad_fn->self_ = SavedVariable(self, false);
  }
  #ifndef NDEBUG
  c10::optional<Storage> self__storage_saved =
    self_.has_storage() ? c10::optional<Storage>(self_.storage()) : c10::nullopt;
  c10::intrusive_ptr<TensorImpl> self__impl_saved;
  if (self_.defined()) self__impl_saved = self_.getIntrusivePtr();
  #endif
  auto _tmp = ([&]() {
    at::AutoDispatchBelowADInplaceOrView guard;
    return at::redispatch::norm(ks & c10::after_autograd_keyset, self_, p);
  })();
  auto result = std::move(_tmp);
  #ifndef NDEBUG
  if (self__storage_saved.has_value() &&
      !at::impl::dispatch_mode_enabled() &&
      !at::impl::tensor_has_dispatch(self_))
    TORCH_INTERNAL_ASSERT(self__storage_saved.value().is_alias_of(self_.storage()));
  if (self__impl_saved && !at::impl::dispatch_mode_enabled() && !at::impl::tensor_has_dispatch(self_))
    TORCH_INTERNAL_ASSERT(self__impl_saved == self_.getIntrusivePtr());
  if (result.has_storage() && !at::impl::dispatch_mode_enabled() && !at::impl::tensor_has_dispatch(result)) {
    TORCH_INTERNAL_ASSERT(result.storage().use_count() == 1, "function: norm_Scalar");
  }
  if (!at::impl::dispatch_mode_enabled() && !at::impl::tensor_has_dispatch(result))
    TORCH_INTERNAL_ASSERT(result.use_count() <= 1, "function: norm_Scalar");
  #endif
  if (grad_fn) {
      set_history(flatten_tensor_args( result ), grad_fn);
  }
  throw_error_for_complex_autograd(result, "norm");
  c10::optional<at::Tensor> result_new_fw_grad_opt = c10::nullopt;
  if (_any_has_forward_grad_result && (result.defined())) {
      auto self_t_raw = toNonOptFwGrad(self);
      auto self_tensor = toNonOptTensor(self);
      auto self_t = (self_t_raw.defined() || !self_tensor.defined())
        ? self_t_raw : at::_efficientzerotensor(self_tensor.sizes(), self_tensor.options());
      auto self_p = toNonOptPrimal(self);
      result_new_fw_grad_opt = norm_jvp(self_p, self_t, p, result);
  }
  if (result_new_fw_grad_opt.has_value() && result_new_fw_grad_opt.value().defined() && result.defined()) {
    // The hardcoded 0 here will need to be updated once we support multiple levels.
    result._set_fw_grad(result_new_fw_grad_opt.value(), /* level */ 0, /* is_inplace_op */ false);
  }
  if (grad_fn) {
    grad_fn->result_ = SavedVariable(result, true);
  }
  return result;
}

```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102409
Approved by: https://github.com/soulitzer
2023-06-30 04:51:43 +00:00
SherlockNoMad
d997969b8b [Reland] Add sym_size/stride/numel/storage_offset to native_function.yaml (#103107)
Differential Revision: D46459100

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103107
Approved by: https://github.com/angelayi, https://github.com/soulitzer
2023-06-12 19:18:49 +00:00
Masaki Kozuki
ba2bc7df8f Enable backward on _foreach_zero_ (#101149)
Currently torchgen cannot find an appropriate `DifferentiabilityInfo` for `_foreach_zero_` because `gen_foreach_derivativeinfo` doesn't correctly make use of `functional_info_by_signature` and `differentiability_infos`, and `is_reference_for_foreach` a bit too strict to `_foreach_zero_`.

Generated code in `VariableType`
```c++
void _foreach_zero_(c10::DispatchKeySet ks, at::TensorList self) {
  auto self_ = unpack(self, "self", 0);
  [[maybe_unused]] auto _any_requires_grad = compute_requires_grad( self );

  std::vector<c10::optional<at::Tensor>> original_selfs(self.size());
  std::vector<std::shared_ptr<ZeroBackward0>> grad_fns;
  if (_any_requires_grad) {
    for (const auto& i : c10::irange( self.size() )) {
      const auto ith_requires_grad = compute_requires_grad(self[i]);
      check_inplace(self[i], ith_requires_grad);
      grad_fns.push_back([&]() -> std::shared_ptr<ZeroBackward0> {
          if (!ith_requires_grad) {
              return nullptr;
          } else {
              auto grad_fn = std::shared_ptr<ZeroBackward0>(new ZeroBackward0(), deleteNode);
              grad_fn->set_next_edges(collect_next_edges( self[i] ));
              return grad_fn;
          }
      }());
    }
  }
  #ifndef NDEBUG
  std::vector<c10::optional<Storage>> self__storage_saved(self_.size());
  for (const Tensor& tensor : self_)
    self__storage_saved.push_back(
      tensor.has_storage() ? c10::optional<Storage>(tensor.storage()) : c10::nullopt);
  std::vector<c10::intrusive_ptr<TensorImpl>> self__impl_saved(self_.size());
  for (size_t i=0; i<self_.size(); i++)
    if (self_[i].defined()) self__impl_saved[i] = self_[i].getIntrusivePtr();
  #endif
  {
    at::AutoDispatchBelowAutograd guard;
    at::redispatch::_foreach_zero_(ks & c10::after_autograd_keyset, self_);
  }
  #ifndef NDEBUG
  for (size_t i=0; i<self_.size() && !at::impl::dispatch_mode_enabled(); i++) {
    if (self__storage_saved[i].has_value() && !at::impl::tensorlist_has_dispatch(self_))
      TORCH_INTERNAL_ASSERT(self__storage_saved[i].value().is_alias_of(self_[i].storage()));
  }
  for (size_t i=0; i<self_.size() && !at::impl::dispatch_mode_enabled(); i++) {
    if (self__impl_saved[i] && !at::impl::tensorlist_has_dispatch(self_))
      TORCH_INTERNAL_ASSERT(self__impl_saved[i] == self_[i].getIntrusivePtr());
  }
  #endif
  if (!grad_fns.empty()) {
      auto differentiable_outputs = flatten_tensor_args( self );
      TORCH_INTERNAL_ASSERT(differentiable_outputs.size() == grad_fns.size());
      for (const auto& i : c10::irange(grad_fns.size())) {
          auto grad_fn = grad_fns[i];
          if (grad_fn != nullptr) {
              rebase_history(differentiable_outputs[i], grad_fns[i]);
          }
      }
  }
}
```

Rel:
- #58833
- #96405
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101149
Approved by: https://github.com/soulitzer
2023-05-17 03:10:13 +00:00
Nikita Shulga
20cf42de2c Revert "[Reland] Add sym_size/stride/numel/storage_offset to native_function.… (#100749)"
This reverts commit bb454891ed.
2023-05-16 18:17:02 -07:00
Edward Z. Yang
b94f143ace SymIntify convNd and conv_transposeNd, fix inductor symint handling (#101488)
Fixes https://github.com/pytorch/pytorch/issues/101014

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101488
Approved by: https://github.com/ngimel
2023-05-16 17:46:52 +00:00
Sherlock Huang
bb454891ed [Reland] Add sym_size/stride/numel/storage_offset to native_function.… (#100749)
…yaml (#91… (#91919)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91919 Approved by: https://github.com/ezyang

Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92402

Reviewed By: ezyang

Differential Revision: D42565586

Pulled By: SherlockNoMad

fbshipit-source-id: 1c2986e45307e076d239836a1b45441a9fa3c9d9
ghstack-source-id: 969f4928486e04c57aaf98e20e3c3ca946c51613

Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100749
Approved by: https://github.com/zhxchen17, https://github.com/albanD
2023-05-12 22:57:42 +00:00
Natalia Gimelshein
bfe5f5bbe1 [WIP] enable cuda graphs support for flash attention with dropout (#100196)
Fixes #99905

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100196
Approved by: https://github.com/drisspg
2023-05-08 16:19:18 +00:00
PyTorch MergeBot
c3aa59c8f5 Revert "[WIP] enable cuda graphs support for flash attention with dropout (#100196)"
This reverts commit 32615618e4.

Reverted https://github.com/pytorch/pytorch/pull/100196 on behalf of https://github.com/clee2000 due to broke no ops build 32615618e4 https://github.com/pytorch/pytorch/actions/runs/4866578063/jobs/8678258318 ([comment](https://github.com/pytorch/pytorch/pull/100196#issuecomment-1532352810))
2023-05-03 01:41:56 +00:00
Natalia Gimelshein
32615618e4 [WIP] enable cuda graphs support for flash attention with dropout (#100196)
Fixes #99905

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100196
Approved by: https://github.com/drisspg
2023-05-02 23:05:31 +00:00
Masaki Kozuki
6c934a89a7 Skip invalid grads in outplace foreachs' backward (#100256)
Fixes #100248
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100256
Approved by: https://github.com/soulitzer, https://github.com/albanD
2023-04-29 22:45:26 +00:00
Masaki Kozuki
674018903d per-Tensor grad_fn for in-place foreach functions (#96405)
Generate a `grad_fn` for each (tuple of) `Tensor`(s) of the same index for `_foreach_foo_` and each `grad_fn` is `FooBackward`.

The current status of foreach functions' backward support for the record:
- out-place: Implemented, but no optimized implementations like their forward path
- in-place: not implemented. I think this check 7eaaefafb3/torchgen/api/autograd.py (L309-L311) is partly responsible but the difference of signature between out-place and in-place (see https://github.com/pytorch/pytorch/pull/96405#discussion_r1154690940) would prevent in-place from using out-place versions (the logic is around 7eaaefafb3/torchgen/api/autograd.py (L495-L500))

```c++
void _foreach_abs_(c10::DispatchKeySet ks, at::TensorList self) {
  auto self_ = unpack(self, "self", 0);
  #ifndef NDEBUG
  std::vector<c10::optional<Storage>> self__storage_saved(self_.size());
  for (const Tensor& tensor : self_)
    self__storage_saved.push_back(
      tensor.has_storage() ? c10::optional<Storage>(tensor.storage()) : c10::nullopt);
  std::vector<c10::intrusive_ptr<TensorImpl>> self__impl_saved(self_.size());
  for (size_t i=0; i<self_.size(); i++)
    if (self_[i].defined()) self__impl_saved[i] = self_[i].getIntrusivePtr();
  #endif
  {
    at::AutoDispatchBelowAutograd guard;
    at::redispatch::_foreach_abs_(ks & c10::after_autograd_keyset, self_);
  }
  #ifndef NDEBUG
  for (size_t i=0; i<self_.size() && !at::impl::dispatch_mode_enabled(); i++) {
    if (self__storage_saved[i].has_value() && !at::impl::tensorlist_has_dispatch(self_))
      AT_ASSERT(self__storage_saved[i].value().is_alias_of(self_[i].storage()));
  }
  for (size_t i=0; i<self_.size() && !at::impl::dispatch_mode_enabled(); i++) {
    if (self__impl_saved[i] && !at::impl::tensorlist_has_dispatch(self_))
      AT_ASSERT(self__impl_saved[i] == self_[i].getIntrusivePtr());
  }
  #endif
}
```

Related:
- #95431
- #95765 for multiple `grad_fn`s logic

---

Examples: outputs of `_foreach_add_.List`, `_foreach_addcmul_.ScalarList`, and `_foreach_exp`

```c++
void _foreach_addcmul__ScalarList(c10::DispatchKeySet ks, at::TensorList self, at::TensorList tensor1, at::TensorList tensor2, at::ArrayRef<at::Scalar> scalars) {
  auto self_ = unpack(self, "self", 0);
  auto tensor1_ = unpack(tensor1, "tensor1", 1);
  auto tensor2_ = unpack(tensor2, "tensor2", 2);
  auto _any_requires_grad = compute_requires_grad( self, tensor1, tensor2 );

  (void)_any_requires_grad;
  std::vector<c10::optional<at::Tensor>> original_selfs(self.size());
  std::vector<std::shared_ptr<AddcmulBackward0>> grad_fns;
  if (_any_requires_grad) {
    for (const auto& i : c10::irange( self.size() )) {
      const auto ith_requires_grad = compute_requires_grad(self[i], tensor1[i], tensor2[i]);
      check_inplace(self[i], ith_requires_grad);
      grad_fns.push_back([&]() -> std::shared_ptr<AddcmulBackward0> {
          if (!ith_requires_grad) {
              return nullptr;
          } else {
              auto grad_fn = std::shared_ptr<AddcmulBackward0>(new AddcmulBackward0(), deleteNode);
              grad_fn->set_next_edges(collect_next_edges( self[i], tensor1[i], tensor2[i] ));
              return grad_fn;
          }
      }());
    }
    if (!grad_fns.empty()) {

        for (const auto& i : c10::irange(grad_fns.size())) {
            auto grad_fn = grad_fns[i];
            if (grad_fn != nullptr) {
                grad_fn->self_scalar_type = self[i].scalar_type();
                grad_fn->tensor1_scalar_type = tensor1[i].scalar_type();
                if (grad_fn->should_compute_output(1)) {
                  grad_fn->tensor2_ = SavedVariable(tensor2[i], false);
                }
                grad_fn->value = scalars[i];
                if (grad_fn->should_compute_output(2)) {
                  grad_fn->tensor1_ = SavedVariable(tensor1[i], false);
                }
                grad_fn->tensor2_scalar_type = tensor2[i].scalar_type();
            }
        }
    }
  }
  #ifndef NDEBUG
  std::vector<c10::optional<Storage>> self__storage_saved(self_.size());
  for (const Tensor& tensor : self_)
    self__storage_saved.push_back(
      tensor.has_storage() ? c10::optional<Storage>(tensor.storage()) : c10::nullopt);
  std::vector<c10::intrusive_ptr<TensorImpl>> self__impl_saved(self_.size());
  for (size_t i=0; i<self_.size(); i++)
    if (self_[i].defined()) self__impl_saved[i] = self_[i].getIntrusivePtr();
  std::vector<c10::optional<Storage>> tensor1__storage_saved(tensor1_.size());
  for (const Tensor& tensor : tensor1_)
    tensor1__storage_saved.push_back(
      tensor.has_storage() ? c10::optional<Storage>(tensor.storage()) : c10::nullopt);
  std::vector<c10::intrusive_ptr<TensorImpl>> tensor1__impl_saved(tensor1_.size());
  for (size_t i=0; i<tensor1_.size(); i++)
    if (tensor1_[i].defined()) tensor1__impl_saved[i] = tensor1_[i].getIntrusivePtr();
  std::vector<c10::optional<Storage>> tensor2__storage_saved(tensor2_.size());
  for (const Tensor& tensor : tensor2_)
    tensor2__storage_saved.push_back(
      tensor.has_storage() ? c10::optional<Storage>(tensor.storage()) : c10::nullopt);
  std::vector<c10::intrusive_ptr<TensorImpl>> tensor2__impl_saved(tensor2_.size());
  for (size_t i=0; i<tensor2_.size(); i++)
    if (tensor2_[i].defined()) tensor2__impl_saved[i] = tensor2_[i].getIntrusivePtr();
  #endif
  {
    at::AutoDispatchBelowAutograd guard;
    at::redispatch::_foreach_addcmul_(ks & c10::after_autograd_keyset, self_, tensor1_, tensor2_, scalars);
  }
  #ifndef NDEBUG
  for (size_t i=0; i<self_.size() && !at::impl::dispatch_mode_enabled(); i++) {
    if (self__storage_saved[i].has_value() && !at::impl::tensorlist_has_dispatch(self_))
      TORCH_INTERNAL_ASSERT(self__storage_saved[i].value().is_alias_of(self_[i].storage()));
  }
  for (size_t i=0; i<self_.size() && !at::impl::dispatch_mode_enabled(); i++) {
    if (self__impl_saved[i] && !at::impl::tensorlist_has_dispatch(self_))
      TORCH_INTERNAL_ASSERT(self__impl_saved[i] == self_[i].getIntrusivePtr());
  }
  for (size_t i=0; i<tensor1_.size() && !at::impl::dispatch_mode_enabled(); i++) {
    if (tensor1__storage_saved[i].has_value() && !at::impl::tensorlist_has_dispatch(tensor1_))
      TORCH_INTERNAL_ASSERT(tensor1__storage_saved[i].value().is_alias_of(tensor1_[i].storage()));
  }
  for (size_t i=0; i<tensor1_.size() && !at::impl::dispatch_mode_enabled(); i++) {
    if (tensor1__impl_saved[i] && !at::impl::tensorlist_has_dispatch(tensor1_))
      TORCH_INTERNAL_ASSERT(tensor1__impl_saved[i] == tensor1_[i].getIntrusivePtr());
  }
  for (size_t i=0; i<tensor2_.size() && !at::impl::dispatch_mode_enabled(); i++) {
    if (tensor2__storage_saved[i].has_value() && !at::impl::tensorlist_has_dispatch(tensor2_))
      TORCH_INTERNAL_ASSERT(tensor2__storage_saved[i].value().is_alias_of(tensor2_[i].storage()));
  }
  for (size_t i=0; i<tensor2_.size() && !at::impl::dispatch_mode_enabled(); i++) {
    if (tensor2__impl_saved[i] && !at::impl::tensorlist_has_dispatch(tensor2_))
      TORCH_INTERNAL_ASSERT(tensor2__impl_saved[i] == tensor2_[i].getIntrusivePtr());
  }
  #endif
  if (!grad_fns.empty()) {
      auto differentiable_outputs = flatten_tensor_args( self );
      TORCH_INTERNAL_ASSERT(differentiable_outputs.size() == grad_fns.size());
      for (const auto& i : c10::irange(grad_fns.size())) {
          auto grad_fn = grad_fns[i];
          if (grad_fn != nullptr) {
              rebase_history(differentiable_outputs[i], grad_fns[i]);
          }
      }
  }
}

```

```c++
void _foreach_add__List(c10::DispatchKeySet ks, at::TensorList self, at::TensorList other, const at::Scalar & alpha) {
  auto self_ = unpack(self, "self", 0);
  auto other_ = unpack(other, "other", 1);
  auto _any_requires_grad = compute_requires_grad( self, other );

  (void)_any_requires_grad;
  std::vector<c10::optional<at::Tensor>> original_selfs(self.size());
  std::vector<std::shared_ptr<AddBackward0>> grad_fns;
  if (_any_requires_grad) {
    for (const auto& i : c10::irange( self.size() )) {
      const auto ith_requires_grad = compute_requires_grad(self[i], other[i]);
      check_inplace(self[i], ith_requires_grad);
      grad_fns.push_back([&]() -> std::shared_ptr<AddBackward0> {
          if (!ith_requires_grad) {
              return nullptr;
          } else {
              auto grad_fn = std::shared_ptr<AddBackward0>(new AddBackward0(), deleteNode);
              grad_fn->set_next_edges(collect_next_edges( self[i], other[i] ));
              return grad_fn;
          }
      }());
    }
    if (!grad_fns.empty()) {

        for (const auto& i : c10::irange(grad_fns.size())) {
            auto grad_fn = grad_fns[i];
            if (grad_fn != nullptr) {
                grad_fn->other_scalar_type = other[i].scalar_type();
                grad_fn->alpha = alpha;
                grad_fn->self_scalar_type = self[i].scalar_type();
            }
        }
    }
  }
  #ifndef NDEBUG
  std::vector<c10::optional<Storage>> self__storage_saved(self_.size());
  for (const Tensor& tensor : self_)
    self__storage_saved.push_back(
      tensor.has_storage() ? c10::optional<Storage>(tensor.storage()) : c10::nullopt);
  std::vector<c10::intrusive_ptr<TensorImpl>> self__impl_saved(self_.size());
  for (size_t i=0; i<self_.size(); i++)
    if (self_[i].defined()) self__impl_saved[i] = self_[i].getIntrusivePtr();
  std::vector<c10::optional<Storage>> other__storage_saved(other_.size());
  for (const Tensor& tensor : other_)
    other__storage_saved.push_back(
      tensor.has_storage() ? c10::optional<Storage>(tensor.storage()) : c10::nullopt);
  std::vector<c10::intrusive_ptr<TensorImpl>> other__impl_saved(other_.size());
  for (size_t i=0; i<other_.size(); i++)
    if (other_[i].defined()) other__impl_saved[i] = other_[i].getIntrusivePtr();
  #endif
  {
    at::AutoDispatchBelowAutograd guard;
    at::redispatch::_foreach_add_(ks & c10::after_autograd_keyset, self_, other_, alpha);
  }
  #ifndef NDEBUG
  for (size_t i=0; i<self_.size() && !at::impl::dispatch_mode_enabled(); i++) {
    if (self__storage_saved[i].has_value() && !at::impl::tensorlist_has_dispatch(self_))
      TORCH_INTERNAL_ASSERT(self__storage_saved[i].value().is_alias_of(self_[i].storage()));
  }
  for (size_t i=0; i<self_.size() && !at::impl::dispatch_mode_enabled(); i++) {
    if (self__impl_saved[i] && !at::impl::tensorlist_has_dispatch(self_))
      TORCH_INTERNAL_ASSERT(self__impl_saved[i] == self_[i].getIntrusivePtr());
  }
  for (size_t i=0; i<other_.size() && !at::impl::dispatch_mode_enabled(); i++) {
    if (other__storage_saved[i].has_value() && !at::impl::tensorlist_has_dispatch(other_))
      TORCH_INTERNAL_ASSERT(other__storage_saved[i].value().is_alias_of(other_[i].storage()));
  }
  for (size_t i=0; i<other_.size() && !at::impl::dispatch_mode_enabled(); i++) {
    if (other__impl_saved[i] && !at::impl::tensorlist_has_dispatch(other_))
      TORCH_INTERNAL_ASSERT(other__impl_saved[i] == other_[i].getIntrusivePtr());
  }
  #endif
  if (!grad_fns.empty()) {
      auto differentiable_outputs = flatten_tensor_args( self );
      TORCH_INTERNAL_ASSERT(differentiable_outputs.size() == grad_fns.size());
      for (const auto& i : c10::irange(grad_fns.size())) {
          auto grad_fn = grad_fns[i];
          if (grad_fn != nullptr) {
              rebase_history(differentiable_outputs[i], grad_fns[i]);
          }
      }
  }
}

...

void _foreach_exp_(c10::DispatchKeySet ks, at::TensorList self) {
  auto self_ = unpack(self, "self", 0);
  auto _any_requires_grad = compute_requires_grad( self );

  (void)_any_requires_grad;
  std::vector<c10::optional<at::Tensor>> original_selfs(self.size());
  std::vector<std::shared_ptr<ExpBackward0>> grad_fns;
  if (_any_requires_grad) {
    for (const auto& i : c10::irange( self.size() )) {
      const auto ith_requires_grad = compute_requires_grad(self[i]);
      check_inplace(self[i], ith_requires_grad);
      grad_fns.push_back([&]() -> std::shared_ptr<ExpBackward0> {
          if (!ith_requires_grad) {
              return nullptr;
          } else {
              auto grad_fn = std::shared_ptr<ExpBackward0>(new ExpBackward0(), deleteNode);
              grad_fn->set_next_edges(collect_next_edges( self[i] ));
              return grad_fn;
          }
      }());
    }
  }
  #ifndef NDEBUG
  std::vector<c10::optional<Storage>> self__storage_saved(self_.size());
  for (const Tensor& tensor : self_)
    self__storage_saved.push_back(
      tensor.has_storage() ? c10::optional<Storage>(tensor.storage()) : c10::nullopt);
  std::vector<c10::intrusive_ptr<TensorImpl>> self__impl_saved(self_.size());
  for (size_t i=0; i<self_.size(); i++)
    if (self_[i].defined()) self__impl_saved[i] = self_[i].getIntrusivePtr();
  #endif
  {
    at::AutoDispatchBelowAutograd guard;
    at::redispatch::_foreach_exp_(ks & c10::after_autograd_keyset, self_);
  }
  #ifndef NDEBUG
  for (size_t i=0; i<self_.size() && !at::impl::dispatch_mode_enabled(); i++) {
    if (self__storage_saved[i].has_value() && !at::impl::tensorlist_has_dispatch(self_))
      TORCH_INTERNAL_ASSERT(self__storage_saved[i].value().is_alias_of(self_[i].storage()));
  }
  for (size_t i=0; i<self_.size() && !at::impl::dispatch_mode_enabled(); i++) {
    if (self__impl_saved[i] && !at::impl::tensorlist_has_dispatch(self_))
      TORCH_INTERNAL_ASSERT(self__impl_saved[i] == self_[i].getIntrusivePtr());
  }
  #endif
  if (!grad_fns.empty()) {
      auto differentiable_outputs = flatten_tensor_args( self );
      TORCH_INTERNAL_ASSERT(differentiable_outputs.size() == grad_fns.size());
      for (const auto& i : c10::irange(grad_fns.size())) {
          auto grad_fn = grad_fns[i];
          if (grad_fn != nullptr) {
              rebase_history(differentiable_outputs[i], grad_fns[i]);
          }
      }
  }
  if (!grad_fns.empty()) {

      for (const auto& i : c10::irange(grad_fns.size())) {
          auto grad_fn = grad_fns[i];
          if (grad_fn != nullptr) {
              grad_fn->result_ = SavedVariable(self[i], true, self[i].is_view());
          }
      }
  }
}

```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96405
Approved by: https://github.com/soulitzer
2023-04-28 00:55:04 +00:00
Edward Z. Yang
3a5427baf4 Add torch.utils._content_store (#99809)
Implements a simple content-addressable store for storages (with tensors implemented as cheap references on top), enabling incremental serialization of tensors to disk, which I intend to use in the accuracy repro extractor.  Check the comment at the top of torch/utils/_content_store.py for more details on the intended use case.

One major piece of this PR is implementing the content hash for tensors.  For our prospective use case, we may need to repeatedly hash up to 80 GB of tensor data every time we snapshot (and we may snapshot multiple times).  Using a conventional cryptographic hash and hashing each snapshot would likely take on order of minutes, which seemed too slow to me.  So instead, I implemented a crappy hash function that can be run on GPU.  It is at least somewhat theoretically grounded: using random parameters generated by Philox, we use the standard shift-multiply and xor sum universal hash family.  The hash function is a bit dorky though; instead of properly doing 160-bit math, it just runs 32-bit hash five times and cats them together.  By the way, this sets the first precedent for kernel in PyTorch library which MUST be torch.compile'd to be run (in fact, this kernel does not run in eager mode because of the use of xor_sum, which doesn't actually exist in ATen.)

I had to add a few more primitives to inductor, namely randint (over the entire int range) and xor_sum.  Fortunately, these primitives are natively supported by Triton/C++, and so they were very easy to plumb through.  xor_sum is exposed as a prim, while randint special cases on when low/high span the entire 32-bit signed integer range.

Thanks to Jeff Johnson for letting me bounce ideas of him on a Saturday morning lol.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99809
Approved by: https://github.com/voznesenskym
2023-04-26 18:02:59 +00:00
Nikita Karetnikov
42921fc801 [torchgen] accept scalars for unary SymInt arrays (#99921)
Fixes https://github.com/pytorch/pytorch/issues/99907
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99921
Approved by: https://github.com/malfet
2023-04-25 00:49:15 +00:00
Luca Wehrstedt
24bf15fe8d Support record_stream in dispatch mode (#99529)
Summary:
Issuing a `t.record_stream(s)` call while a `TorchDispatchMode` is active was throwing because PyTorch was unable to convert a c10::Stream back to a Python object. It's now fixed.

Fixes https://github.com/pytorch/pytorch/issues/94403

Test Plan: Added a unit test

Differential Revision: D45117566

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99529
Approved by: https://github.com/albanD
2023-04-21 07:17:19 +00:00
dilililiwhy
526b564fa0 Uniformly use elem when checking ListType (#97873)
Fixes #ISSUE_NUMBER
a initial trial to let code of arg parser become more readable (go through and understand logic behind *torchgen* as a rookie)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97873
Approved by: https://github.com/ezyang
2023-04-05 12:06:03 +00:00
Aaron Gokaslan
47dca20d80 [BE] Enable flake8-comprehension rule C417 (#97880)
Enables flake8-comprehension rule C417. Ruff autogenerated these fixes to the codebase.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97880
Approved by: https://github.com/ezyang, https://github.com/kit1980, https://github.com/albanD
2023-03-30 14:34:24 +00:00
Aaron Gokaslan
597b558c51 [BE]: Update flake8 and plugins and fix bugs (#97795)
Update flake8 and flake8-plugins in lintrunner to a modern version. Enables more checks and makes flake8 checks significantly faster. Added a few additional rule ignores that will need to be fixed in the future.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97795
Approved by: https://github.com/alexsio27444, https://github.com/janeyx99, https://github.com/ezyang
2023-03-28 23:51:55 +00:00
Brian Hirsh
35c9ea89fa dont bake in defaults when tracing *_like factories (#97564)
quick fix for https://github.com/pytorch/pytorch/issues/97541. letting CI run to see if there's any fallout

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97564
Approved by: https://github.com/ezyang
2023-03-27 22:53:44 +00:00
Chung-chieh Shan
2c588b3ad5 Allow new_full's fill_value argument type to be complex (#91345)
It seems that this code should type-check but doesn't:
```python
torch.zeros((2,3),dtype=torch.cdouble).new_full((4,5),complex(6,7))
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91345
Approved by: https://github.com/zou3519, https://github.com/ezyang
2023-03-21 12:34:00 +00:00
BowenBao
60a68477a6 Bump black version to 23.1.0 (#96578)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96578
Approved by: https://github.com/ezyang
2023-03-15 06:27:59 +00:00
Masaki Kozuki
49f6849f58 Fix codegen logic for foreach derivatives (#95263)
follow-up https://github.com/pytorch/pytorch/pull/93901.

Unexpected numerical mismatches observed in some foreach functions' backward result seemed to be caused by the wrong order of `IndexRangeGenerator::range` call.
This pr has `args_with_derivatives` have the same or similar order of `foreach_native_function.func.arguments.flat_non_out`

---

what the current master generates for `_foreach_mul.List`:
```cpp
variable_list ForeachMulBackward0List::apply(variable_list&& grads) {
  std::lock_guard<std::mutex> lock(mutex_);
  TORCH_CHECK(!other_released_, ERR_BACKWARD_TWICE);
  TORCH_CHECK(!self_released_, ERR_BACKWARD_TWICE);
  IndexRangeGenerator gen;
  auto other_ix = gen.range(other_size_);
  auto self_ix = gen.range(self_size_);
  variable_list grad_inputs(gen.size());
  auto other = unpack_list(other_);
  auto self = unpack_list(self_);
  if (task_should_compute_output({ other_ix })) {
    std::vector<Tensor> grad_result;
    grad_result.reserve(grads.size());
    for (const auto & i : c10::irange(grads.size())) {
      grad_result.emplace_back(mul_tensor_backward(grads[i], self[i], other[i].scalar_type()));
    }
    copy_range(grad_inputs, other_ix, grad_result);
  }
  if (task_should_compute_output({ self_ix })) {
    std::vector<Tensor> grad_result;
    grad_result.reserve(grads.size());
    for (const auto & i : c10::irange(grads.size())) {
      grad_result.emplace_back(mul_tensor_backward(grads[i], other[i], self[i].scalar_type()));
    }
    copy_range(grad_inputs, self_ix, grad_result);
  }
  return grad_inputs;
}
```

with this PR the generated backward is
```cpp
variable_list ForeachMulBackward0List::apply(variable_list&& grads) {
  std::lock_guard<std::mutex> lock(mutex_);
  TORCH_CHECK(!self_released_, ERR_BACKWARD_TWICE);
  TORCH_CHECK(!other_released_, ERR_BACKWARD_TWICE);
  IndexRangeGenerator gen;
  auto self_ix = gen.range(self_size_);                                         <----- diff
  auto other_ix = gen.range(other_size_);                                       <----- diff
  variable_list grad_inputs(gen.size());
  auto self = unpack_list(self_);
  auto other = unpack_list(other_);
  if (task_should_compute_output({ other_ix })) {
    std::vector<Tensor> grad_result;
    grad_result.reserve(grads.size());
    for (const auto & i : c10::irange(grads.size())) {
      grad_result.emplace_back(mul_tensor_backward(grads[i], self[i], other[i].scalar_type()));
    }
    copy_range(grad_inputs, other_ix, grad_result);
  }
  if (task_should_compute_output({ self_ix })) {
    std::vector<Tensor> grad_result;
    grad_result.reserve(grads.size());
    for (const auto & i : c10::irange(grads.size())) {
      grad_result.emplace_back(mul_tensor_backward(grads[i], other[i], self[i].scalar_type()));
    }
    copy_range(grad_inputs, self_ix, grad_result);
  }
  return grad_inputs;
}

```

The change is to fix the order of `self_ix` and `other_ix`.[](url)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95263
Approved by: https://github.com/soulitzer
2023-03-04 20:03:54 +00:00
Xuehai Pan
22d3ac79d2 [torchgen] Prettify generated type annotations (#95877)
Changes:

1. Use class inheritance for `torch/return_types.pyi`:

    Before:

    ```python
    max = NamedTuple("max", [("values", Tensor), ("indices", Tensor)])
    ```

    After:

    ```python
    class max(NamedTuple):
        values: Tensor
        indices: Tensor
    ```

------

2. Add missing spaces in generated type annotations.

    1. Always has a space after `,`.
    2. If an argument is annotated, then there need spaces around `=` when it has a default value.

        ```diff
        - def func(..., out: Optional[Tensor]=None, ...) -> Tensor:
        + def func(..., out: Optional[Tensor] = None, ...) -> Tensor:
        ```

    3. If an argument is not annotated, then there should be no spaces around `=` when it has a default value.

        ```python
        def contiguous(self, memory_format=torch.contiguous_format) -> Tensor: ...
        ```

------

3. ~Remove redundant import alias in `torch/nn/functional.pyi`:~ (Reverted)

    UPDATE: `mypy` needs the alias to work.

    Before:

    ```python
    from .. import conv1d as conv1d
    from .. import conv2d as conv2d
    from .. import conv3d as conv3d
    from .. import conv_transpose1d as conv_transpose1d
    from .. import conv_transpose2d as conv_transpose2d
    from .. import conv_transpose3d as conv_transpose3d
    from .. import conv_tbc as conv_tbc
    from .. import avg_pool1d as avg_pool1d
    from .. import relu_ as relu_
    from .. import selu_ as selu_
    from .. import celu_ as celu_
    from .. import rrelu_ as rrelu_
    from .. import pixel_shuffle as pixel_shuffle
    from .. import pixel_unshuffle as pixel_unshuffle
    from .. import channel_shuffle as channel_shuffle
    from .. import native_channel_shuffle as native_channel_shuffle
    from .. import pdist as pdist
    from .. import cosine_similarity as cosine_similarity
    ```

    After:

    ```python
    from .. import (
        conv1d,
        conv2d,
        conv3d,
        conv_transpose1d,
        conv_transpose2d,
        conv_transpose3d,
        conv_tbc,
        avg_pool1d,
        relu_,
        selu_,
        celu_,
        rrelu_,
        pixel_shuffle,
        pixel_unshuffle,
        channel_shuffle,
        native_channel_shuffle,
        pdist,
        cosine_similarity,
    )
    ```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95877
Approved by: https://github.com/ezyang
2023-03-03 07:08:40 +00:00
alexdremov
b9e95158d5 [MPS] Fix LSTM backward and forward pass (#95137)
Fixes #91694
Fixes #92615

Several transpositions were missing for backward graph in case of `batch_first=True`. The #91694 is not reproduced with `batch_first=False`.

After fixing transpose issue, I finally thought that now I can use LSTM freely in my project. And then I got horrific results on train. Seems related to #92615.

After that I decided to fix LSTM's backward step completely. I collected all my findings in this thread — seems like I succeeded

Funny enough, backward tests were completely disabled before and were not passing:
```python
    @unittest.skipIf(True, "Backward of lstm returns wrong result")
    def test_lstm_2(self, device="mps", dtype=torch.float32):
```

UPD: forward pass of multi-layer version also was wrong due to the incorrect `initState, initCell` slices. Tests were passing because states were inited with zeros. *Accidentally* fixed this too

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95137
Approved by: https://github.com/jhavukainen, https://github.com/kulinseth, https://github.com/soulitzer
2023-02-23 17:32:42 +00:00
Masaki Kozuki
f54233e273 [foreach] bump tensor's version and define backward via torchgen (as possible) (#93901)
## summary
- increment tensor versions in inplace foreach functions
- add a logic to take care of `ArrayRef<Scalar>`

rel: https://github.com/pytorch/pytorch/issues/58833, https://github.com/pytorch/pytorch/pull/89591

Pull Request resolved: https://github.com/pytorch/pytorch/pull/93901
Approved by: https://github.com/albanD
2023-02-20 23:18:07 +00:00
Aaron Gokaslan
67d9790985 [BE] Apply almost all remaining flake8-comprehension checks (#94676)
Applies the remaining flake8-comprehension fixes and checks. This changes replace all remaining unnecessary generator expressions with list/dict/set comprehensions which are more succinct, performant, and better supported by our torch.jit compiler. It also removes useless generators such as 'set(a for a in b)`, resolving it into just the set call.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94676
Approved by: https://github.com/ezyang
2023-02-12 01:01:25 +00:00
PyTorch MergeBot
f7bd5d0ccb Revert "[Reland] Add sym_size/stride/numel/storage_offset to native_function.yaml (#91… (#92402)"
This reverts commit 965f4ea3ba.

Reverted https://github.com/pytorch/pytorch/pull/92402 on behalf of https://github.com/zhxchen17 due to Caused a regression for an export model.
2023-02-03 03:12:43 +00:00
Driss Guessous
653dc73df0 [SDPA] Wire up FlashAttention's backward (#92917)
# Summary
This PR creates _flash_attention_backward and _scaled_dot_product_flash_attention_backward native functions and registers them to the respective derivatives.yaml.

The goal is to replicate the torch.autograd.Function defined in the FlashAttention repo [here](33e0860c9c/flash_attn/flash_attn_interface.py (L126)) natively in PyTorch.  One thing that we don't have access to is ctx.save_for_backward in native PyTorch so in order to save these variables I extended the returned objects from the forward functions.

### MetaFunctions
I also updated the FlashAttention meta functions to mirror the real outputs now. As well I added a meta registration for backwards. I have an XLMR training script and while eager training now works with FlashAttention compiling this module fails with the inductor error down below.

### Questions?
Performance issues vs mem efficient when using torch.nn.mha_forward

TorchCompile -> See purposed solution below.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92917
Approved by: https://github.com/cpuhrsch
2023-02-02 04:02:30 +00:00
Sherlock Huang
965f4ea3ba [Reland] Add sym_size/stride/numel/storage_offset to native_function.yaml (#91… (#92402)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91919
Approved by: https://github.com/ezyang

Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92402
Approved by: https://github.com/ezyang
2023-02-01 04:47:49 +00:00
Masaki Kozuki
30876229a7 [mta] Backward of unary foreach functions (#89591)
as per title, this PR defines backward of those.

This doesn't implement forward-mode automatic differentiation as [the current codegen](a747326423/tools/autograd/gen_variable_type.py (L1513)) doesn't seem to handle `ArrayRef<Tensor>`.

Rel:
- https://github.com/pytorch/pytorch/issues/53796
- https://github.com/pytorch/pytorch/issues/58833

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89591
Approved by: https://github.com/albanD
2023-01-23 08:28:06 +00:00
PyTorch MergeBot
befe815466 Revert "Add sym_size/stride/numel/storage_offset to native_function.yaml (#91919)"
This reverts commit 0388400f3f.

Reverted https://github.com/pytorch/pytorch/pull/91919 on behalf of https://github.com/atalman due to Break internal build
2023-01-17 21:03:18 +00:00
Sherlock Huang
0388400f3f Add sym_size/stride/numel/storage_offset to native_function.yaml (#91919)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91919
Approved by: https://github.com/ezyang
2023-01-17 03:39:57 +00:00
Luca Lumetti
a4a0195c6c Fix torch.where signature mismatch that was caused by torchgen (#91627)
Fixes #91003

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91627
Approved by: https://github.com/albanD
2023-01-13 16:17:55 +00:00
Edward Z. Yang
1c46a32b67 Minor typing improvements (#91068)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91068
Approved by: https://github.com/Skylion007, https://github.com/soumith
2022-12-20 23:43:11 +00:00
Edward Z. Yang
4fa8d774b8
Add macro C10_AS_INTARRAYREF_SLOW (#90675)
This makes it easier to narrow down who is throwing the error,
instead of having to use gdb.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Differential Revision: [D42088781](https://our.internmc.facebook.com/intern/diff/D42088781)
2022-12-16 15:10:35 -08:00
PyTorch MergeBot
140a3139d6 Revert "Add macro C10_AS_INTARRAYREF_SLOW (#90675)"
This reverts commit 8090cb5386.

Reverted https://github.com/pytorch/pytorch/pull/90675 on behalf of https://github.com/osalpekar due to broke internal acc_tensor implementation in training_platform contbuild. See [D42052101](https://www.internalfb.com/diff/D42052101) for details.
2022-12-16 00:30:50 +00:00
Edward Z. Yang
8090cb5386 Add macro C10_AS_INTARRAYREF_SLOW (#90675)
This makes it easier to narrow down who is throwing the error,
instead of having to use gdb.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90675
Approved by: https://github.com/ngimel, https://github.com/malfet, https://github.com/JackCaoG
2022-12-14 21:29:23 +00:00
Larry Liu
453ff96029 [torchgen] Refactor types (#90589)
A retry of #89487. Accidentally closed.

## Split `torchgen.api.types` into `types_base`, `types` and `signatures`.

In `types_base`:
* Created base class `CType`. `BaseCType` and `ConstRefCType` etc are inheriting `CType`.
* Only keep abstract type model definitions, such as `BaseCppType`.

In `types`:
* Define `BaseCppType` with `at` and `c10` namespaces.
* All the signatures using these types.

In `signatures`:
* Define all the signatures.

In `__init__`:
* `from ... import *`, suppress flake8 error.

Differential Revision: [D41455634](https://our.internmc.facebook.com/intern/diff/D41455634/)

**NOTE FOR REVIEWERS**: This PR has internal Meta-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D41455634/)!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90589
Approved by: https://github.com/iseeyuan
2022-12-10 04:34:00 +00:00
albanD
8a9aca7b8d Reland 2 Many symintifications (#87604) (#87980)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/87980
Approved by: https://github.com/ezyang
2022-10-28 13:40:11 +00:00
PyTorch MergeBot
8b4d95759c Revert "Many symintifications (#87604)"
This reverts commit 777e6a2c51.

Reverted https://github.com/pytorch/pytorch/pull/87604 on behalf of https://github.com/weiwangmeta due to breaking internal builds
2022-10-28 03:00:11 +00:00
albanD
777e6a2c51 Many symintifications (#87604)
Adds
expand_inplace
conv conv_double_backward
convolution
adaptive_avg_pool2d_symint
_embedding_bag_backward_symint
cudnn_grid_sampler
cuda 32 bit indexing
nll_loss / nll_loss_2d
tensor split
pooling same mode
cudnn_is_acceptable
storage nbytes

Pull Request resolved: https://github.com/pytorch/pytorch/pull/87604
Approved by: https://github.com/ezyang
2022-10-26 17:33:53 +00:00
Edward Z. Yang
45f03d6948 Add at::symint:: namespace for ease of templated functions (#86329)
Our prevailing strategy for symbolic shapes in C++ is to only
write the SymInt version of the code, and pay a slight performance
tax from not knowing if it is symbolic or not.  However, there are
some fastpath functions where this tax is unacceptable, and we want
to specialize for the int case.  Sometimes, it is easy to template
the function; but when the function involves Tensors, it is not,
because the functions you may want to call are not templated,
e.g., t.view vs t.view_symint

This PR adds an at::symint:: namespace which contains templated
functions for all functions in PyTorch which you can use in this
way.  To show this works, I refactored sum_to to stop incorrectly
reinterpret casting and instead use a template.  Instead of
t.sizes(), we call at::symint::sizes<T>(t), and so forth.

The template functions are SFINAE'd using a template argument that
is not otherwise used. As such, deduction is impossible. Typically, deduction
is hard anyway, because many of the constructors are ambiguous (this
is why we split foo and foo_symint in the first place). So you must pass
a template argument to these functions.

These functions are codegened into Functions.h so they are subject
to per-operator headers.  This matters most for methods, which likely
didn't include the per-operator header, so you will have to add an
include in that case.  We never generate method variants for these.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86329
Approved by: https://github.com/bdhirsh, https://github.com/voznesenskym
2022-10-06 04:09:17 +00:00
Edward Z. Yang
e8b0bea677 Rename fromIntArrayRef to fromIntArrayRefSlow, audit call sites (#86235)
Some of them are known non-negative, I've revised them accordingly.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86235
Approved by: https://github.com/albanD
2022-10-05 23:11:01 +00:00
Edward Z. Yang
c857b3e73e More fixes for LTC symint interlock. (#86043)
Now, we also avoid translating SymInt to valueT if you haven't asked
for a SymInt implementation.  This makes embedding_dense_backward
work without changes to LTC.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86043
Approved by: https://github.com/wconstab
2022-10-02 02:53:49 +00:00
Edward Z. Yang
07800c9c81 Miscellaneous fixes from symbolic-shapes branch (#86042)
- Make toIValue accept SymIntNode and SymFloatNode where number (aka Scalar) is
  expected
- Binding for symintlistOptional in python arg parser
- Teach translate to convert from IntArrayRef to ArrayRef<int64_t>
- Don't query _symint function for meta info in LTC unless LTC is
  code generating a symint function

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86042
Approved by: https://github.com/Chillee
2022-10-01 13:57:58 +00:00
Edward Z. Yang
84a06d7193 Enable convolution_backward with bias and symints (#85970)
Originally by Krovatkin from https://github.com/pytorch/pytorch/pull/85816

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85970
Approved by: https://github.com/albanD
2022-09-30 21:21:11 +00:00
Edward Z. Yang
793488cda2 Revert "Revert "Symintifying slice ops (#85196)"" (#85746)
This reverts commit 3a171dfb0c.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85746
Approved by: https://github.com/albanD
2022-09-28 04:37:35 +00:00
Kannav
0a64e73d12 52189: refractor unreachable Runtime Error (#85725)
Fixes #52189

Since the PR #52228 had gone cold. I made the requested changes and fixed the linting error.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85725
Approved by: https://github.com/lezcano
2022-09-27 22:58:09 +00:00
Brian Hirsh
4a2d2e5e40 Change API type Tensor[] for structured kernels. (#73350)
Partially fixes: #66328

This PR:
- adds support for `ITensorList` to the dispatcher for:
  - computing the dispatch key
  - boxing and unboxing `ITensorList`
- modified the codegen for structured kernels:
  - codegen APIs use `ITensorList` instead of `ArrayRef<Tensor>`

**Changes summary:**

- Signature changes due to the different APIs:
  - dispatcher API (e.g. `BatchingRegistrations.cpp`)
  - C++ API (e.g. `TensorShape.cpp`)
- Miscelaneous functions used by codegen'd functions (e.g. `FunctionalTensorWrapper.*`)
- Dispatcher changes for handling `ITensorList` correctly (e.g. `DispatchKeyExtractor.h`)
- Signature changes of `at::cat` due to the need of `const` inside `TensorBody.h`
- Forward declarations of `ITensorList` (e.g. `MethodOperators.h`)
- Codegen changes, special casing structured kernels (e.g. `gen.py`)

**Short description of structured kernels special casing:**

I introduced, mainly, 5 types of changes to the codegen for generating code depending on
whether the kernel is structured or not:

1. Added a `structured_type_override` flag to the `argument_type` function definition of
the affected APIs (mainly the dispatcher and C++ APIs).
  - `api/cpp.py`, `api/dispatcher.py`, `api/native.py`
2. Added a `structured_type_override` member to the signature
classes (e.g. `CppSignature`), since `FunctionSchema` doesn't really know whether the
function is structured or not
  - `api/types.py`
3. Added a `part_of_structured_group` to `NativeFunction` class, which is just a
convenient function to forward to `structured_type_override` wherever needed
  - `model.py`
4. Appropriately changed the rest of the codegen, whenever it used either the signature
classes or the `arguments` function directly
5. Added a check for `const ITensorList&` type wherever there was a check for `TensorList`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73350
Approved by: https://github.com/bdhirsh
2022-09-26 21:46:38 +00:00
Edward Z. Yang
9e5563dbb1 Delete SymIntArrayRef wrapper struct (#84837)
Since we separated at::foo and at::foo_symint there is no benefit
to trying to make initializer lists work in both cases.  So we can
get rid of the special different struct.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84837
Approved by: https://github.com/kit1980
2022-09-12 20:04:01 +00:00
PyTorch MergeBot
034f2db1fd Revert "Delete SymIntArrayRef wrapper struct (#84837)"
This reverts commit 9c78f599e4.

Reverted https://github.com/pytorch/pytorch/pull/84837 on behalf of https://github.com/ZainRizvi due to The test test_post_localSGD_optimizer_step_reload in the X linux-bionic-cuda11.6-py3.10-gcc7 workflow has started consistently failing since this PR was submitted
2022-09-12 19:04:07 +00:00
Edward Z. Yang
9c78f599e4 Delete SymIntArrayRef wrapper struct (#84837)
Since we separated at::foo and at::foo_symint there is no benefit
to trying to make initializer lists work in both cases.  So we can
get rid of the special different struct.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84837
Approved by: https://github.com/kit1980
2022-09-12 16:28:20 +00:00
Edward Z. Yang
caf034a9a2 Fix bugs in how LTC decides whether or not to symint op or not (#84832)
This fixes two problems:

- First, shape signature didn't respect the symint property (so it
  would always mark the operator as symint).  This was relatively
  easy to fix.

- Second, the call to fallback goes directly through at::_ops, so
  it must always be SymInt-aware, even if SymInt is disabled externally.
  This was a bit more difficult, because the current LTC codegen
  is poorly factored.  First, I needed to make it so individual
  arguments knew if they were going to be SymInt in LTC or not; second,
  I need to plumb enough information about the enclosing bindings so
  that I could use translate to do the expressions (previously, it was
  just assumed the signatures matched.)

The LTC codegen would do well to have a complete rewrite, but this will
have to do for now, I suppose.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84832
Approved by: https://github.com/wconstab
2022-09-12 04:49:04 +00:00
Eli Uriegas
93aef3a010 Use presence of _symint in kernel name to generate symint sig or not (#84579)
Something people found confusing was that whether or not a native::
signature would get SymInt or not in its type was based on the dispatch
key.  This changes it so that SymInt or not in type is based on whether
or not you have _symint in the name of the kernel or not.  This means
that even when we make operators support SymInt, you no longer have to
go and update all the preexisting definitions; instead, you now
selectively write _symint to opt individual kernels into SymInt support.

I then go and update a bunch of kernels that don't have proper SymInt
support to make use of this convention.  There is some hacking around
for view generation code.

I also add support for external backends to specify 'symint' operators, for which we generate SymInt signatures instead of regular signatures.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: [D39310060](https://our.internmc.facebook.com/intern/diff/D39310060)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84579
Approved by: https://github.com/wconstab
2022-09-09 18:31:56 +00:00
Nikolay Korovaiko
f725009a48 as_strided supports SymInt; codegen supports optional SymInt (#84393)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84393
Approved by: https://github.com/ezyang
2022-09-06 16:39:24 +00:00
Edward Z. Yang
ad44670fa1 Back out "Revert D38984222: Don't introduce new overload for SymInt (#83628)" (#84173)
Also Back out "Revert D39075159: [acc_tensor] Use SymIntArrayRef for overloaded empty.memory_format's signature"

Original commit changeset: dab4a9dba4fa
Original commit changeset: dcaf16c037a9

Original Phabricator Diff: D38984222
Original Phabricator Diff: D39075159

Also update Metal registrations for C++ registration changes.

Also update NNPI registration to account for tightened schema checking

Differential Revision: [D39084762](https://our.internmc.facebook.com/intern/diff/D39084762/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D39084762/)!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84173
Approved by: https://github.com/Krovatkin
2022-08-29 18:01:07 +00:00
PyTorch MergeBot
c7edcd6968 Revert "Don't introduce new overload for SymInt (#83628)"
This reverts commit 9790d90e4b.

Reverted https://github.com/pytorch/pytorch/pull/83628 on behalf of https://github.com/malfet due to Breaks internal builds, see D39076487
2022-08-27 01:23:17 +00:00