PyTorch MergeBot
c0996866f4
Revert "Change ATEN generator argument type to const std::optional<Generator>& ( #120076 )"
...
This reverts commit 4305c64fea .
Reverted https://github.com/pytorch/pytorch/pull/120076 on behalf of https://github.com/izaitsevfb due to breaking internal builds (take 3) ([comment](https://github.com/pytorch/pytorch/pull/120076#issuecomment-1986338164 ))
2024-03-08 20:01:03 +00:00
cyy
4305c64fea
Change ATen generator argument type to const std::optional<Generator>& ( #120076 )
...
This PR proposes using `const std::optional<Generator>&` for the underlying functions to avoid unnecessary copy and move operations. The torchgen code was changed to generate the new type.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/120076
Approved by: https://github.com/malfet
2024-03-07 09:52:21 +00:00
Isuru Fernando
c3496d50f0
Fix torch.return_types init signature ( #119284 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119284
Approved by: https://github.com/peterbell10 , https://github.com/XuehaiPan
2024-02-23 21:52:34 +00:00
Yu, Guangye
5c46600f84
[RELAND] refactor lazy init to device-agnostic ( #119248 )
...
# Motivation
This PR extends `cuda_lazy_init` to `device_lazy_init`, a device-agnostic API that can support any backend, and renames `maybe_initialize_cuda` to `maybe_initialize_device` so that lazy initialization still works for CUDA while remaining extensible to other devices.
# Design
We maintain a flag for each backend to manage the lazy initialization state separately.
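In spirit, the flag-per-backend design looks like the following Python sketch (illustrative only; the real implementation lives in C++ and the names here are hypothetical):
```python
# Hypothetical sketch of the per-backend lazy-initialization flag described
# above; the actual logic is implemented in C++ under torch/csrc.
_lazy_init_done: dict[str, bool] = {}  # backend name -> initialized?

def maybe_initialize_device(backend: str) -> None:
    """Run the backend's one-time init the first time it is needed."""
    if not _lazy_init_done.get(backend, False):
        # e.g. for "cuda" this would invoke the CUDA lazy-init hook
        _lazy_init_done[backend] = True
```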
# Additional Context
No additional unit tests are needed.
This is a reland PR; the original PR is [refactor lazy init to device-agnostic](https://github.com/pytorch/pytorch/pull/118846 ).
This is a common PR and does not trigger the xpu ciflow.
Differential Revision: [D53478332](https://our.internmc.facebook.com/intern/diff/D53478332 )
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119248
Approved by: https://github.com/EikanWang , https://github.com/gujinghui , https://github.com/jgong5 , https://github.com/atalman
2024-02-07 15:58:51 +00:00
PyTorch MergeBot
ab613a4019
Revert "refactor lazy init to device-agnostic ( #118846 )"
...
This reverts commit 520771d7b3 .
Reverted https://github.com/pytorch/pytorch/pull/118846 on behalf of https://github.com/atalman due to failing tests: https://github.com/pytorch/torchdistx/blob/main/src/python/torchdistx/_C/fake.cc#L11 ([comment](https://github.com/pytorch/pytorch/pull/118846#issuecomment-1927651305 ))
2024-02-05 18:06:30 +00:00
Yu, Guangye
520771d7b3
refactor lazy init to device-agnostic ( #118846 )
...
# Motivation
This PR extends `cuda_lazy_init` to `device_lazy_init`, a device-agnostic API that can support any backend, and renames `maybe_initialize_cuda` to `maybe_initialize_device` so that lazy initialization still works for CUDA while remaining extensible to other devices.
# Design
We maintain a flag for each backend to manage the lazy initialization state separately.
# Additional Context
No additional unit tests are needed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118846
Approved by: https://github.com/malfet
2024-02-02 12:10:39 +00:00
Aaron Gokaslan
1562dae62c
[BE]: Apply RUF025 dict.fromkeys preview rule ( #118637 )
...
Simplifies and optimizes dict construction using the `fromkeys` classmethod ctor. This also makes it really obvious when all the keys will have the same static value, which could be a bug if unintentional. It is also significantly faster than using a dict comprehension. The rule is in preview, but I am adding a forward fix for when it becomes stable.
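A hypothetical before/after of the rewrite RUF025 suggests:
```python
keys = ["a", "b", "c"]

# Before: a comprehension that repeats the same static value for every key.
flags = {k: False for k in keys}

# After: dict.fromkeys makes the shared value explicit and is faster.
flags = dict.fromkeys(keys, False)
defaults = dict.fromkeys(keys)  # values default to None
```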
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118637
Approved by: https://github.com/albanD
2024-01-30 20:46:54 +00:00
Edward Z. Yang
46712b019d
Enable local_partial_types ( #118467 )
...
When using dmypy, this setting is enabled and cannot be turned off. Force it for regular mypy too.
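For context, a small illustration of what the flag changes (my example, not from the PR): with `local_partial_types` enabled, a bare `None` assignment can no longer have its type completed from another scope, so it needs an explicit annotation.
```python
from typing import Optional

class Counter:
    # total = None   # rejected under local_partial_types: needs an annotation,
    #                # since the type would only be completed in bump() below.

    # OK: the annotation resolves the partial None type in the same scope.
    total: Optional[int] = None

    def bump(self) -> None:
        self.total = (self.total or 0) + 1
```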
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118467
Approved by: https://github.com/Skylion007
ghstack dependencies: #118414 , #118418 , #118432
2024-01-28 13:38:22 +00:00
albanD
24133e44b1
Fix return type hint for list types ( #118238 )
...
All single-element list return types are `Tensor[]`, so they will always be returned as a `Tuple`.
I don't know of any way to easily access the pyi type and compare it to a real run, so there is no testing here :(
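As a hedged illustration, though, the runtime behavior the corrected stubs describe can be seen with any op whose schema returns `Tensor[]` (here `torch.unbind`, chosen arbitrarily):
```python
import torch

# Ops whose schema returns Tensor[] hand back a tuple at runtime, which is
# what the regenerated .pyi hints now advertise instead of a list.
rows = torch.unbind(torch.arange(6).reshape(3, 2))
assert isinstance(rows, tuple)
```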
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118238
Approved by: https://github.com/ezyang
2024-01-25 23:35:20 +00:00
Joel Schlosser
16d69290c6
Use view name instead of view_copy name for functional inverses ( #117056 )
...
Ex: `unsqueeze_copy_inverse()` -> `unsqueeze_inverse()`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/117056
Approved by: https://github.com/bdhirsh
2024-01-10 00:52:36 +00:00
Joel Schlosser
52f0457d7d
Support view returns for functional inverses on narrowing views ( #115893 )
...
Part 1 of implementation for general [subclass view fake-ification](https://docs.google.com/document/d/1C5taWiplmX7nKiURXDOAZG2W5VNJ2iV0fQFq92H0Cxw ).
The following functional inverses are currently implemented scatter-style and thus never return views:
* `as_strided_copy_inverse()`
* `diagonal_copy_inverse()`
* `expand_copy_inverse()`
* `select_copy_int_inverse()`
* `slice_copy_Tensor_inverse()`
* `split_copy_Tensor_inverse()`
* `split_with_sizes_copy_inverse()`
* `unbind_copy_int_inverse()`
* `unfold_copy_inverse()`
We need to get actual views for the introduction of reverse view funcs coming next.
Details:
* Use `as_strided()` to implement actual view inverses for the above
  * Assumes we're given a mutated_view that is actually part of a bigger storage; this isn't really the case for functionalization
* Introduce `InverseReturnMode` enum for customization of functional inverses
  * `AlwaysView` - always return an actual view; needed for reverse view_funcs()
  * `NeverView` - always do a copy; useful for certain functionalization use cases (e.g. XLA, executorch)
  * `ViewOrScatterInverse` - return an actual view in most cases, but prefer scatter inverses when they exist. This avoids the need to implement `as_strided()` for subclasses, which can be difficult or impossible
* Make sure functionalization works as before
  * Use `ViewOrScatterInverse` when reapply_views TLS is True or `NeverView` otherwise
  * Adds tests to ensure old behavior for above inverses **in functionalization**
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115893
Approved by: https://github.com/bdhirsh
2023-12-21 21:39:22 +00:00
Aaron Gokaslan
ee5d981249
[BE]: Enable RUFF PERF402 and apply fixes ( #115505 )
...
* Enable PERF402. Makes code more efficient and succinct by removing useless element-by-element list copies that can instead be done with a list constructor or an `extend` call. All test cases have `noqa` added since performance is not as sensitive in that folder.
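A hypothetical instance of the pattern PERF402 rewrites:
```python
source = [1, 2, 3]

# Before: a useless element-by-element copy loop.
copied = []
for item in source:
    copied.append(item)

# After: a constructor call (or copied.extend(source)) is shorter and faster.
copied = list(source)
```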
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115505
Approved by: https://github.com/malfet
2023-12-20 18:01:24 +00:00
cdzhan
99554112d3
[pytorch] add namespace for optTypeMetaToScalarType in codegen to avoid not declared when compile ( #115623 )
...
Fixes compilation failures in some environments.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115623
Approved by: https://github.com/albanD
2023-12-13 00:59:01 +00:00
Antonio Kim
7fc292930c
Add support for torch.Generator type in TorchScript ( #110413 )
...
- Add support for `torch.Generator` type in TorchScript
- Add `generator` args to all `torch.nn.init` functions that call `uniform_` or `normal_`
- Add support for `torch.Generator` in LTC's TorchScript backend (CC: @wconstab)
CC: @eellison @davidberard98 @GlebKazantaev @behzad-a
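A hedged usage sketch of the new `generator` argument on `torch.nn.init` functions, assuming a PyTorch build where this change has landed:
```python
import torch
import torch.nn as nn

g = torch.Generator().manual_seed(0)
w = torch.empty(3, 5)
# Passing an explicit generator makes the initialization reproducible.
nn.init.uniform_(w, a=-0.1, b=0.1, generator=g)
```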
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110413
Approved by: https://github.com/wconstab , https://github.com/albanD , https://github.com/glebk-cerebras , https://github.com/davidberard98
2023-11-21 23:07:21 +00:00
Edward Z. Yang
8c4812be80
Replace expect_int with guard_int ( #113921 )
...
The idea is that instead of erroring, we will just specialize at these sites.
Fixes https://github.com/pytorch/pytorch/issues/113142
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113921
Approved by: https://github.com/zou3519
2023-11-20 21:27:48 +00:00
Brian Vaughan
dbb96ef30d
improve annotation device parameters where a device ordinal is allowed ( #113647 )
...
Using mypy in code that depends on pytorch, I noticed that the type annotation doesn't allow a device ordinal.
`error: Argument "device" to "to_empty" of "Module" has incompatible type "int"; expected "str | device" [arg-type]`
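A hedged illustration of the call the widened annotation now permits (the runtime already accepted a bare ordinal; only the type hints changed):
```python
import torch
import torch.nn as nn

m = nn.Linear(2, 2)
if torch.cuda.is_available():
    # An int ordinal means that CUDA device; mypy previously rejected it
    # with the arg-type error quoted above even though it works at runtime.
    m = m.to_empty(device=0)
```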
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113647
Approved by: https://github.com/albanD
2023-11-17 14:41:22 +00:00
Jane Xu
deec2380c7
Add 0dim Tensor overload for _foreach_div ( #113688 )
...
This PR mostly follows the steps from #106677, with one added feature: similar to fused_adam(w), for the CUDA dispatches, when the scalar tensor is on CPU we call `.item()` and redispatch to the normal Scalar overload. Otherwise, the CUDA kernel would complain about a device mismatch between the scalar and the tensors.
Why add this feature? Our optimizers want to allow lr as a tensor, and lr could be a CPU tensor. lr is used with `foreach_div_` in Adam, so our CI would break otherwise.
After this PR, `_foreach_mul` and `_foreach_div` will accept either a CPU or a GPU tensor for the scalar tensor (vs. only a GPU tensor), joining `fused_adam(w)` in this characteristic. I did not do the same for `foreach_add` (the only other foreach op with a `.Tensor` overload) because there is no use case and it would be more involved.
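A hedged usage sketch of the CPU-scalar behavior described above, assuming a PyTorch build that includes this overload:
```python
import torch

params = [torch.ones(3), torch.ones(2)]
lr = torch.tensor(0.5)  # 0-dim CPU scalar tensor, as a tensor lr would be

# Dispatches to the .Tensor overload; on CUDA inputs with a CPU scalar the
# kernel .item()s the scalar and redispatches to the Scalar overload.
out = torch._foreach_div(params, lr)
print(out[0])  # tensor([2., 2., 2.])
```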
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113688
Approved by: https://github.com/mlazos , https://github.com/albanD
2023-11-15 20:59:32 +00:00
George White
6c187246d6
Add support for float8_e4m3fnuz and _e5m2fnuz ( #107586 )
...
This PR relates to the feature in [this feature submission](https://docs.google.com/document/d/1pF2T1xz54IPg1jG7FhykbrpbcJZVelQw0v8vBaoLkfs/edit ). It has been based on #104242 which adds similar float8 types.
These new types added in this PR are described in the paper at https://arxiv.org/abs/2206.02915 . A brief description and comparison of the types with other float8 types can be also found in the [OpenXLA RFC](https://github.com/openxla/stablehlo/blob/main/rfcs/20230321-fp8_fnuz.md ).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107586
Approved by: https://github.com/seemethere , https://github.com/malfet
2023-11-15 15:01:11 +00:00
PyTorch MergeBot
252e68a83b
Revert "Add support for torch.Generator type in TorchScript ( #110413 )"
...
This reverts commit 54493fe8c4 .
Reverted https://github.com/pytorch/pytorch/pull/110413 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it is, unfortunately, still breaking internal builds ([comment](https://github.com/pytorch/pytorch/pull/110413#issuecomment-1811625557 ))
2023-11-15 00:51:23 +00:00
Antonio Kim
54493fe8c4
Add support for torch.Generator type in TorchScript ( #110413 )
...
- Add support for `torch.Generator` type in TorchScript
- Add `generator` args to all `torch.nn.init` functions that call `uniform_` or `normal_`
- Add support for `torch.Generator` in LTC's TorchScript backend (CC: @wconstab)
CC: @eellison @davidberard98 @GlebKazantaev @behzad-a
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110413
Approved by: https://github.com/wconstab , https://github.com/albanD , https://github.com/glebk-cerebras , https://github.com/davidberard98
2023-11-13 23:18:14 +00:00
PyTorch MergeBot
9a28a7b498
Revert "Add support for torch.Generator type in TorchScript ( #110413 )"
...
This reverts commit 27e31ab6e8 .
Reverted https://github.com/pytorch/pytorch/pull/110413 on behalf of https://github.com/PaliC due to breaking internal builds ([comment](https://github.com/pytorch/pytorch/pull/110413#issuecomment-1799003164 ))
2023-11-07 15:53:32 +00:00
Antonio Kim
27e31ab6e8
Add support for torch.Generator type in TorchScript ( #110413 )
...
- Add support for `torch.Generator` type in TorchScript
- Add `generator` args to all `torch.nn.init` functions that call `uniform_` or `normal_`
- Add support for `torch.Generator` in LTC's TorchScript backend (CC: @wconstab)
CC: @eellison @davidberard98 @GlebKazantaev @behzad-a
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110413
Approved by: https://github.com/wconstab , https://github.com/albanD , https://github.com/glebk-cerebras , https://github.com/davidberard98
2023-11-06 21:27:02 +00:00
Mengwei Liu
19e9f5cc7b
[torchgen] Add support for optional tensor ( #112938 )
...
Summary: As titled
Test Plan: rely on CI
Differential Revision: D50997957
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112938
Approved by: https://github.com/Skylion007
2023-11-06 20:03:05 +00:00
Aaron Gokaslan
1ad0f0b308
[BE]: remove unnecessary enumerate calls ( #111690 )
...
Remove unnecessary `enumerate` calls. These are entirely automated fixes, so the risk should be reasonably low.
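A hypothetical instance of the automated fix:
```python
names = ["alpha", "beta", "gamma"]

# Before: the index from enumerate is never used.
for _, name in enumerate(names):
    print(name)

# After: iterate directly.
for name in names:
    print(name)
```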
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111690
Approved by: https://github.com/malfet
2023-10-20 23:20:29 +00:00
Jane Xu
ca7d084ff9
Add ScalarTensor or 0dim overload for _foreach_add ( #111079 )
...
Adding a Tensor overload will allow us to:
- optimize in more cases than before
- increase coverage for scalar Tensors, not just Python scalars, in our foreach APIs
The main complication in this PR was that add.Tensor has a scalar overload, so I've now built out support for that.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111079
Approved by: https://github.com/albanD
2023-10-20 01:34:07 +00:00
Kazuaki Ishizaki
ac48c11ab7
Fix typo under torchgen directory ( #111154 )
...
This PR fixes typos in comments and messages in files under the `torchgen` directory.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111154
Approved by: https://github.com/rajveer43 , https://github.com/Skylion007
2023-10-13 16:43:46 +00:00
isdanni
dede1e96e2
[BE] Enable Ruff's Flake8 PYI018 ( #111101 )
...
Enable [unused-private-type-var (PYI018)](https://docs.astral.sh/ruff/rules/unused-private-type-var/#unused-private-type-var-pyi018 )
Link: #110950
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111101
Approved by: https://github.com/albanD
2023-10-12 16:26:21 +00:00
Edward Z. Yang
6a974bec5d
Change flash attention outputs to be SymInt instead of int ( #110533 )
...
Fixes https://github.com/pytorch/pytorch/issues/110322
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110533
Approved by: https://github.com/albanD
2023-10-05 01:00:07 +00:00
Fabrice Pont
053367b1ed
fix: flake8-bugbear code B024 ( #107265 )
...
See #106571 item B024
This fix concerns the addition of `abstractmethod` to methods declared inside abstract classes.
Should I also include PEP8-compliant reformatting of the files I had to modify?
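For illustration (a hypothetical class, not taken from the PR): B024 flags abstract base classes that define no abstract methods, since they can be instantiated silently; the fix marks the intended overrides so instantiation fails loudly.
```python
from abc import ABC, abstractmethod

class Exporter(ABC):
    @abstractmethod
    def export(self, path: str) -> None:
        """Subclasses must implement this; Exporter() itself now raises TypeError."""
```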
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107265
Approved by: https://github.com/kit1980
2023-10-04 23:52:52 +00:00
cyy
d9fb7166d6
[BE] use DeviceIndex instead of int64_t for related device interfaces ( #103068 )
...
This PR unifies the device interfaces in aten/*cpp and torch/csrc/*cpp to use **c10::DeviceIndex**.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103068
Approved by: https://github.com/malfet
2023-08-25 20:16:14 +00:00
Masaki Kozuki
5814380e7b
Revert "Revert "Reland "Add forward mode AD to out-place foreach functions ( #102409 ) ( #106043 )""" ( #106320 )
...
Fixes a typo in the number of tensors and elements specified by the test that failed in slow gradcheck.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106320
Approved by: https://github.com/soulitzer
2023-08-18 23:01:42 +00:00
PyTorch MergeBot
2b427ae3a7
Revert "Reland "Add forward mode AD to out-place foreach functions ( #102409 ) ( #106043 )"
...
This reverts commit e773f28ee3 .
Reverted https://github.com/pytorch/pytorch/pull/106043 on behalf of https://github.com/DanilBaibak due to Break slow tests ([comment](https://github.com/pytorch/pytorch/pull/106043#issuecomment-1658642734 ))
2023-07-31 15:50:36 +00:00
Masaki Kozuki
e773f28ee3
Reland "Add forward mode AD to out-place foreach functions ( #102409 ) ( #106043 )
...
forward-mode AD of out-of-place foreach functions, finally.
rel:
- #102409
- #105504
- #58833
- #100695
---
# Generated Foreach
```c++
::std::vector<at::Tensor> _foreach_sinh(c10::DispatchKeySet ks, at::TensorList self) {
auto self_ = unpack(self, "self", 0);
[[maybe_unused]] auto _any_requires_grad = compute_requires_grad( self );
std::vector<bool> _any_has_forward_grad_result(self.size());
for (const auto& i : c10::irange(self.size())) {
_any_has_forward_grad_result[i] = isFwGradDefined(self[i]);
}
std::shared_ptr<ForeachSinhBackward0> grad_fn;
if (_any_requires_grad) {
grad_fn = std::shared_ptr<ForeachSinhBackward0>(new ForeachSinhBackward0(), deleteNode);
grad_fn->set_next_edges(collect_next_edges( self ));
grad_fn->self_ = make_saved_variable_list(self);
grad_fn->self_size_ = self.size();
}
#ifndef NDEBUG
std::vector<c10::optional<Storage>> self__storage_saved(self_.size());
for (const Tensor& tensor : self_)
self__storage_saved.push_back(
tensor.has_storage() ? c10::optional<Storage>(tensor.storage()) : c10::nullopt);
std::vector<c10::intrusive_ptr<TensorImpl>> self__impl_saved(self_.size());
for (size_t i=0; i<self_.size(); i++)
if (self_[i].defined()) self__impl_saved[i] = self_[i].getIntrusivePtr();
#endif
auto _tmp = ([&]() {
at::AutoDispatchBelowADInplaceOrView guard;
return at::redispatch::_foreach_sinh(ks & c10::after_autograd_keyset, self_);
})();
auto result = std::move(_tmp);
#ifndef NDEBUG
for (size_t i=0; i<self_.size() && !at::impl::dispatch_mode_enabled(); i++) {
if (self__storage_saved[i].has_value() && !at::impl::tensorlist_has_dispatch(self_))
TORCH_INTERNAL_ASSERT(self__storage_saved[i].value().is_alias_of(self_[i].storage()));
}
for (size_t i=0; i<self_.size() && !at::impl::dispatch_mode_enabled(); i++) {
if (self__impl_saved[i] && !at::impl::tensorlist_has_dispatch(self_))
TORCH_INTERNAL_ASSERT(self__impl_saved[i] == self_[i].getIntrusivePtr());
}
#endif
if (grad_fn) {
set_history(flatten_tensor_args( result ), grad_fn);
}
std::vector<c10::optional<at::Tensor>> result_new_fw_grad_opts(self.size(), c10::nullopt);
for (const auto& i : c10::irange(result_new_fw_grad_opts.size())) {
if (_any_has_forward_grad_result[i]) {
auto self_t_raw = toNonOptFwGrad(self[i]);
auto self_tensor = toNonOptTensor(self[i]);
auto self_t = (self_t_raw.defined() || !self_tensor.defined())
? self_t_raw : at::_efficientzerotensor(self_tensor.sizes(), self_tensor.options());
auto self_p = toNonOptPrimal(self[i]);
result_new_fw_grad_opts[i] = (self_t.conj() * self_p.cosh().conj()).conj();
}
}
for (const auto& i : c10::irange(result_new_fw_grad_opts.size())) {
auto& result_new_fw_grad_opt = result_new_fw_grad_opts[i];
if (result_new_fw_grad_opt.has_value() && result_new_fw_grad_opt.value().defined() && result[i].defined()) {
// The hardcoded 0 here will need to be updated once we support multiple levels.
result[i]._set_fw_grad(result_new_fw_grad_opt.value(), /* level */ 0, /* is_inplace_op */ false);
}
}
return result;
}
::std::vector<at::Tensor> _foreach_norm_Scalar(c10::DispatchKeySet ks, at::TensorList self, const at::Scalar & ord) {
auto self_ = unpack(self, "self", 0);
[[maybe_unused]] auto _any_requires_grad = compute_requires_grad( self );
std::vector<bool> _any_has_forward_grad_result(self.size());
for (const auto& i : c10::irange(self.size())) {
_any_has_forward_grad_result[i] = isFwGradDefined(self[i]);
}
std::shared_ptr<ForeachNormBackward0> grad_fn;
if (_any_requires_grad) {
grad_fn = std::shared_ptr<ForeachNormBackward0>(new ForeachNormBackward0(), deleteNode);
grad_fn->set_next_edges(collect_next_edges( self ));
grad_fn->ord = ord;
grad_fn->self_ = make_saved_variable_list(self);
grad_fn->self_size_ = self.size();
}
#ifndef NDEBUG
std::vector<c10::optional<Storage>> self__storage_saved(self_.size());
for (const Tensor& tensor : self_)
self__storage_saved.push_back(
tensor.has_storage() ? c10::optional<Storage>(tensor.storage()) : c10::nullopt);
std::vector<c10::intrusive_ptr<TensorImpl>> self__impl_saved(self_.size());
for (size_t i=0; i<self_.size(); i++)
if (self_[i].defined()) self__impl_saved[i] = self_[i].getIntrusivePtr();
#endif
auto _tmp = ([&]() {
at::AutoDispatchBelowADInplaceOrView guard;
return at::redispatch::_foreach_norm(ks & c10::after_autograd_keyset, self_, ord);
})();
auto result = std::move(_tmp);
#ifndef NDEBUG
for (size_t i=0; i<self_.size() && !at::impl::dispatch_mode_enabled(); i++) {
if (self__storage_saved[i].has_value() && !at::impl::tensorlist_has_dispatch(self_))
TORCH_INTERNAL_ASSERT(self__storage_saved[i].value().is_alias_of(self_[i].storage()));
}
for (size_t i=0; i<self_.size() && !at::impl::dispatch_mode_enabled(); i++) {
if (self__impl_saved[i] && !at::impl::tensorlist_has_dispatch(self_))
TORCH_INTERNAL_ASSERT(self__impl_saved[i] == self_[i].getIntrusivePtr());
}
#endif
if (grad_fn) {
set_history(flatten_tensor_args( result ), grad_fn);
}
std::vector<c10::optional<at::Tensor>> result_new_fw_grad_opts(self.size(), c10::nullopt);
for (const auto& i : c10::irange(result_new_fw_grad_opts.size())) {
if (_any_has_forward_grad_result[i]) {
auto self_t_raw = toNonOptFwGrad(self[i]);
auto self_tensor = toNonOptTensor(self[i]);
auto self_t = (self_t_raw.defined() || !self_tensor.defined())
? self_t_raw : at::_efficientzerotensor(self_tensor.sizes(), self_tensor.options());
auto self_p = toNonOptPrimal(self[i]);
result_new_fw_grad_opts[i] = norm_jvp(self_p, self_t, ord, result[i]);
}
}
for (const auto& i : c10::irange(result_new_fw_grad_opts.size())) {
auto& result_new_fw_grad_opt = result_new_fw_grad_opts[i];
if (result_new_fw_grad_opt.has_value() && result_new_fw_grad_opt.value().defined() && result[i].defined()) {
// The hardcoded 0 here will need to be updated once we support multiple levels.
result[i]._set_fw_grad(result_new_fw_grad_opt.value(), /* level */ 0, /* is_inplace_op */ false);
}
}
if (grad_fn) {
grad_fn->result = result;
}
return result;
}
```
# Reference
```c++
at::Tensor sinh(c10::DispatchKeySet ks, const at::Tensor & self) {
auto& self_ = unpack(self, "self", 0);
[[maybe_unused]] auto _any_requires_grad = compute_requires_grad( self );
[[maybe_unused]] auto _any_has_forward_grad_result = (isFwGradDefined(self));
std::shared_ptr<SinhBackward0> grad_fn;
if (_any_requires_grad) {
grad_fn = std::shared_ptr<SinhBackward0>(new SinhBackward0(), deleteNode);
grad_fn->set_next_edges(collect_next_edges( self ));
grad_fn->self_ = SavedVariable(self, false);
}
#ifndef NDEBUG
c10::optional<Storage> self__storage_saved =
self_.has_storage() ? c10::optional<Storage>(self_.storage()) : c10::nullopt;
c10::intrusive_ptr<TensorImpl> self__impl_saved;
if (self_.defined()) self__impl_saved = self_.getIntrusivePtr();
#endif
auto _tmp = ([&]() {
at::AutoDispatchBelowADInplaceOrView guard;
return at::redispatch::sinh(ks & c10::after_autograd_keyset, self_);
})();
auto result = std::move(_tmp);
#ifndef NDEBUG
if (self__storage_saved.has_value() &&
!at::impl::dispatch_mode_enabled() &&
!at::impl::tensor_has_dispatch(self_))
TORCH_INTERNAL_ASSERT(self__storage_saved.value().is_alias_of(self_.storage()));
if (self__impl_saved && !at::impl::dispatch_mode_enabled() && !at::impl::tensor_has_dispatch(self_))
TORCH_INTERNAL_ASSERT(self__impl_saved == self_.getIntrusivePtr());
if (result.has_storage() && !at::impl::dispatch_mode_enabled() && !at::impl::tensor_has_dispatch(result)) {
TORCH_INTERNAL_ASSERT(result.storage().use_count() == 1, "function: sinh");
}
if (!at::impl::dispatch_mode_enabled() && !at::impl::tensor_has_dispatch(result))
TORCH_INTERNAL_ASSERT(result.use_count() <= 1, "function: sinh");
#endif
if (grad_fn) {
set_history(flatten_tensor_args( result ), grad_fn);
}
c10::optional<at::Tensor> result_new_fw_grad_opt = c10::nullopt;
if (_any_has_forward_grad_result && (result.defined())) {
auto self_t_raw = toNonOptFwGrad(self);
auto self_tensor = toNonOptTensor(self);
auto self_t = (self_t_raw.defined() || !self_tensor.defined())
? self_t_raw : at::_efficientzerotensor(self_tensor.sizes(), self_tensor.options());
auto self_p = toNonOptPrimal(self);
result_new_fw_grad_opt = (self_t.conj() * self_p.cosh().conj()).conj();
}
if (result_new_fw_grad_opt.has_value() && result_new_fw_grad_opt.value().defined() && result.defined()) {
// The hardcoded 0 here will need to be updated once we support multiple levels.
result._set_fw_grad(result_new_fw_grad_opt.value(), /* level */ 0, /* is_inplace_op */ false);
}
return result;
}
at::Tensor norm_Scalar(c10::DispatchKeySet ks, const at::Tensor & self, const at::Scalar & p) {
auto& self_ = unpack(self, "self", 0);
[[maybe_unused]] auto _any_requires_grad = compute_requires_grad( self );
[[maybe_unused]] auto _any_has_forward_grad_result = (isFwGradDefined(self));
std::shared_ptr<NormBackward0> grad_fn;
if (_any_requires_grad) {
grad_fn = std::shared_ptr<NormBackward0>(new NormBackward0(), deleteNode);
grad_fn->set_next_edges(collect_next_edges( self ));
grad_fn->p = p;
grad_fn->self_ = SavedVariable(self, false);
}
#ifndef NDEBUG
c10::optional<Storage> self__storage_saved =
self_.has_storage() ? c10::optional<Storage>(self_.storage()) : c10::nullopt;
c10::intrusive_ptr<TensorImpl> self__impl_saved;
if (self_.defined()) self__impl_saved = self_.getIntrusivePtr();
#endif
auto _tmp = ([&]() {
at::AutoDispatchBelowADInplaceOrView guard;
return at::redispatch::norm(ks & c10::after_autograd_keyset, self_, p);
})();
auto result = std::move(_tmp);
#ifndef NDEBUG
if (self__storage_saved.has_value() &&
!at::impl::dispatch_mode_enabled() &&
!at::impl::tensor_has_dispatch(self_))
TORCH_INTERNAL_ASSERT(self__storage_saved.value().is_alias_of(self_.storage()));
if (self__impl_saved && !at::impl::dispatch_mode_enabled() && !at::impl::tensor_has_dispatch(self_))
TORCH_INTERNAL_ASSERT(self__impl_saved == self_.getIntrusivePtr());
if (result.has_storage() && !at::impl::dispatch_mode_enabled() && !at::impl::tensor_has_dispatch(result)) {
TORCH_INTERNAL_ASSERT(result.storage().use_count() == 1, "function: norm_Scalar");
}
if (!at::impl::dispatch_mode_enabled() && !at::impl::tensor_has_dispatch(result))
TORCH_INTERNAL_ASSERT(result.use_count() <= 1, "function: norm_Scalar");
#endif
if (grad_fn) {
set_history(flatten_tensor_args( result ), grad_fn);
}
throw_error_for_complex_autograd(result, "norm");
c10::optional<at::Tensor> result_new_fw_grad_opt = c10::nullopt;
if (_any_has_forward_grad_result && (result.defined())) {
auto self_t_raw = toNonOptFwGrad(self);
auto self_tensor = toNonOptTensor(self);
auto self_t = (self_t_raw.defined() || !self_tensor.defined())
? self_t_raw : at::_efficientzerotensor(self_tensor.sizes(), self_tensor.options());
auto self_p = toNonOptPrimal(self);
result_new_fw_grad_opt = norm_jvp(self_p, self_t, p, result);
}
if (result_new_fw_grad_opt.has_value() && result_new_fw_grad_opt.value().defined() && result.defined()) {
// The hardcoded 0 here will need to be updated once we support multiple levels.
result._set_fw_grad(result_new_fw_grad_opt.value(), /* level */ 0, /* is_inplace_op */ false);
}
if (grad_fn) {
grad_fn->result_ = SavedVariable(result, true);
}
return result;
}
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106043
Approved by: https://github.com/soulitzer
2023-07-27 03:13:24 +00:00
Alan Ji
70b0f1b248
fix some typos ( #106018 )
...
Fix typos in `test_static_module.cc`, `backend_cutting_test.cc` and `types_base.py`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106018
Approved by: https://github.com/awgu
2023-07-26 18:14:44 +00:00
Justin Chu
4cc1745b13
[BE] f-stringify torch/ and scripts ( #105538 )
...
This PR is a follow-up in the pyupgrade series, converting more strings to f-strings using `flynt`.
- https://docs.python.org/3/reference/lexical_analysis.html#f-strings
- https://pypi.org/project/flynt/
Command used:
```
flynt torch/ -ll 120
flynt scripts/ -ll 120
flynt tools/ -ll 120
```
and excluded `collect_env.py`
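For reference, a hypothetical before/after of the kind of rewrite `flynt` performs:
```python
name, count = "torchgen", 3

before = "processed {} files under {}/".format(count, name)
after = f"processed {count} files under {name}/"
assert before == after
```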
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105538
Approved by: https://github.com/ezyang , https://github.com/malfet
2023-07-21 19:35:24 +00:00
Amadeusz Skrzypczak
b64bd4a5dd
Add torch.float8_e5m2 and torch.float8_e4m3 data types ( #104242 )
...
Proposal of two float8 variants - e5m2 and e4m3 - based on https://arxiv.org/pdf/2209.05433.pdf
Hide all Float8 operator implementations behind `#if !defined(C10_MOBILE)` guard to keep Android build size almost unchanged
TODO:
- Refactor duplicated code
- Cleanup unbalanced pragma pop in dtype utils
- Add native implementation on the CUDA side
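A hedged usage sketch of the new dtypes; exact dtype names and operator coverage depend on the PyTorch build (current releases expose `torch.float8_e5m2`, for example):
```python
import torch

x = torch.randn(4)
x8 = x.to(torch.float8_e5m2)        # lossy cast to the 1-byte e5m2 format
print(x8.dtype, x8.element_size())  # torch.float8_e5m2 1
```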
Co-authored-by: Nikita Shulga <nshulga@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104242
Approved by: https://github.com/albanD
2023-07-20 16:09:11 +00:00
PyTorch MergeBot
f2b15772ff
Revert "Add torch.float8_e5m2 and torch.float8_e4m3 data types ( #104242 )"
...
This reverts commit a9804130e5 .
Reverted https://github.com/pytorch/pytorch/pull/104242 on behalf of https://github.com/PaliC due to breaks lint (run lintrunner and remerge) ([comment](https://github.com/pytorch/pytorch/pull/104242#issuecomment-1644150284 ))
2023-07-20 15:37:53 +00:00
Amadeusz Skrzypczak
a9804130e5
Add torch.float8_e5m2 and torch.float8_e4m3 data types ( #104242 )
...
Proposal of two float8 variants - e5m2 and e4m3 - based on https://arxiv.org/pdf/2209.05433.pdf
Hide all Float8 operator implementations behind `#if !defined(C10_MOBILE)` guard to keep Android build size almost unchanged
TODO:
- Refactor duplicated code
- Cleanup unbalanced pragma pop in dtype utils
- Add native implementation on the CUDA side
Co-authored-by: Nikita Shulga <nshulga@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104242
Approved by: https://github.com/albanD
2023-07-20 09:45:45 +00:00
Justin Chu
964d29f312
[BE] Enable ruff's UP rules and autoformat torchgen/ ( #105423 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105423
Approved by: https://github.com/Skylion007
2023-07-18 06:44:20 +00:00
PyTorch MergeBot
8958f041be
Revert "Add forward mode AD to out-place foreach functions ( #102409 )"
...
This reverts commit e2ec0ba404 .
Reverted https://github.com/pytorch/pytorch/pull/102409 on behalf of https://github.com/huydhn due to Sorry for reverting your PR but it is failing some tests in trunk e799f565eb ([comment](https://github.com/pytorch/pytorch/pull/102409#issuecomment-1615254393 ))
2023-06-30 22:46:57 +00:00
Masaki Kozuki
e2ec0ba404
Add forward mode AD to out-place foreach functions ( #102409 )
...
The major difference from the in-place support is that some out-of-place functions have their derivatives spelled out in derivatives.yaml, which requires some changes in `load_derivatives.py` and special handling in various places for the others, whose derivatives are generated by `torchgen`.
rel:
- #58833
- #100695
---
# Generated Foreach
```c++
::std::vector<at::Tensor> _foreach_sinh(c10::DispatchKeySet ks, at::TensorList self) {
auto self_ = unpack(self, "self", 0);
[[maybe_unused]] auto _any_requires_grad = compute_requires_grad( self );
std::vector<bool> _any_has_forward_grad_result(self.size());
for (const auto& i : c10::irange(self.size())) {
_any_has_forward_grad_result[i] = isFwGradDefined(self[i]);
}
std::shared_ptr<ForeachSinhBackward0> grad_fn;
if (_any_requires_grad) {
grad_fn = std::shared_ptr<ForeachSinhBackward0>(new ForeachSinhBackward0(), deleteNode);
grad_fn->set_next_edges(collect_next_edges( self ));
grad_fn->self_ = make_saved_variable_list(self);
grad_fn->self_size_ = self.size();
}
#ifndef NDEBUG
std::vector<c10::optional<Storage>> self__storage_saved(self_.size());
for (const Tensor& tensor : self_)
self__storage_saved.push_back(
tensor.has_storage() ? c10::optional<Storage>(tensor.storage()) : c10::nullopt);
std::vector<c10::intrusive_ptr<TensorImpl>> self__impl_saved(self_.size());
for (size_t i=0; i<self_.size(); i++)
if (self_[i].defined()) self__impl_saved[i] = self_[i].getIntrusivePtr();
#endif
auto _tmp = ([&]() {
at::AutoDispatchBelowADInplaceOrView guard;
return at::redispatch::_foreach_sinh(ks & c10::after_autograd_keyset, self_);
})();
auto result = std::move(_tmp);
#ifndef NDEBUG
for (size_t i=0; i<self_.size() && !at::impl::dispatch_mode_enabled(); i++) {
if (self__storage_saved[i].has_value() && !at::impl::tensorlist_has_dispatch(self_))
TORCH_INTERNAL_ASSERT(self__storage_saved[i].value().is_alias_of(self_[i].storage()));
}
for (size_t i=0; i<self_.size() && !at::impl::dispatch_mode_enabled(); i++) {
if (self__impl_saved[i] && !at::impl::tensorlist_has_dispatch(self_))
TORCH_INTERNAL_ASSERT(self__impl_saved[i] == self_[i].getIntrusivePtr());
}
#endif
if (grad_fn) {
set_history(flatten_tensor_args( result ), grad_fn);
}
std::vector<c10::optional<at::Tensor>> result_new_fw_grad_opts(self.size(), c10::nullopt);
for (const auto& i : c10::irange(result_new_fw_grad_opts.size())) {
if (_any_has_forward_grad_result[i]) {
auto self_t_raw = toNonOptFwGrad(self[i]);
auto self_tensor = toNonOptTensor(self[i]);
auto self_t = (self_t_raw.defined() || !self_tensor.defined())
? self_t_raw : at::_efficientzerotensor(self_tensor.sizes(), self_tensor.options());
auto self_p = toNonOptPrimal(self[i]);
result_new_fw_grad_opts[i] = (self_t.conj() * self_p.cosh().conj()).conj();
}
}
for (const auto& i : c10::irange(result_new_fw_grad_opts.size())) {
auto& result_new_fw_grad_opt = result_new_fw_grad_opts[i];
if (result_new_fw_grad_opt.has_value() && result_new_fw_grad_opt.value().defined() && result[i].defined()) {
// The hardcoded 0 here will need to be updated once we support multiple levels.
result[i]._set_fw_grad(result_new_fw_grad_opt.value(), /* level */ 0, /* is_inplace_op */ false);
}
}
return result;
}
::std::vector<at::Tensor> _foreach_norm_Scalar(c10::DispatchKeySet ks, at::TensorList self, const at::Scalar & ord) {
auto self_ = unpack(self, "self", 0);
[[maybe_unused]] auto _any_requires_grad = compute_requires_grad( self );
std::vector<bool> _any_has_forward_grad_result(self.size());
for (const auto& i : c10::irange(self.size())) {
_any_has_forward_grad_result[i] = isFwGradDefined(self[i]);
}
std::shared_ptr<ForeachNormBackward0> grad_fn;
if (_any_requires_grad) {
grad_fn = std::shared_ptr<ForeachNormBackward0>(new ForeachNormBackward0(), deleteNode);
grad_fn->set_next_edges(collect_next_edges( self ));
grad_fn->ord = ord;
grad_fn->self_ = make_saved_variable_list(self);
grad_fn->self_size_ = self.size();
}
#ifndef NDEBUG
std::vector<c10::optional<Storage>> self__storage_saved(self_.size());
for (const Tensor& tensor : self_)
self__storage_saved.push_back(
tensor.has_storage() ? c10::optional<Storage>(tensor.storage()) : c10::nullopt);
std::vector<c10::intrusive_ptr<TensorImpl>> self__impl_saved(self_.size());
for (size_t i=0; i<self_.size(); i++)
if (self_[i].defined()) self__impl_saved[i] = self_[i].getIntrusivePtr();
#endif
auto _tmp = ([&]() {
at::AutoDispatchBelowADInplaceOrView guard;
return at::redispatch::_foreach_norm(ks & c10::after_autograd_keyset, self_, ord);
})();
auto result = std::move(_tmp);
#ifndef NDEBUG
for (size_t i=0; i<self_.size() && !at::impl::dispatch_mode_enabled(); i++) {
if (self__storage_saved[i].has_value() && !at::impl::tensorlist_has_dispatch(self_))
TORCH_INTERNAL_ASSERT(self__storage_saved[i].value().is_alias_of(self_[i].storage()));
}
for (size_t i=0; i<self_.size() && !at::impl::dispatch_mode_enabled(); i++) {
if (self__impl_saved[i] && !at::impl::tensorlist_has_dispatch(self_))
TORCH_INTERNAL_ASSERT(self__impl_saved[i] == self_[i].getIntrusivePtr());
}
#endif
if (grad_fn) {
set_history(flatten_tensor_args( result ), grad_fn);
}
std::vector<c10::optional<at::Tensor>> result_new_fw_grad_opts(self.size(), c10::nullopt);
for (const auto& i : c10::irange(result_new_fw_grad_opts.size())) {
if (_any_has_forward_grad_result[i]) {
auto self_t_raw = toNonOptFwGrad(self[i]);
auto self_tensor = toNonOptTensor(self[i]);
auto self_t = (self_t_raw.defined() || !self_tensor.defined())
? self_t_raw : at::_efficientzerotensor(self_tensor.sizes(), self_tensor.options());
auto self_p = toNonOptPrimal(self[i]);
result_new_fw_grad_opts[i] = norm_jvp(self_p, self_t, ord, result[i]);
}
}
for (const auto& i : c10::irange(result_new_fw_grad_opts.size())) {
auto& result_new_fw_grad_opt = result_new_fw_grad_opts[i];
if (result_new_fw_grad_opt.has_value() && result_new_fw_grad_opt.value().defined() && result[i].defined()) {
// The hardcoded 0 here will need to be updated once we support multiple levels.
result[i]._set_fw_grad(result_new_fw_grad_opt.value(), /* level */ 0, /* is_inplace_op */ false);
}
}
if (grad_fn) {
grad_fn->result = result;
}
return result;
}
```
# Reference
```c++
at::Tensor sinh(c10::DispatchKeySet ks, const at::Tensor & self) {
auto& self_ = unpack(self, "self", 0);
[[maybe_unused]] auto _any_requires_grad = compute_requires_grad( self );
[[maybe_unused]] auto _any_has_forward_grad_result = (isFwGradDefined(self));
std::shared_ptr<SinhBackward0> grad_fn;
if (_any_requires_grad) {
grad_fn = std::shared_ptr<SinhBackward0>(new SinhBackward0(), deleteNode);
grad_fn->set_next_edges(collect_next_edges( self ));
grad_fn->self_ = SavedVariable(self, false);
}
#ifndef NDEBUG
c10::optional<Storage> self__storage_saved =
self_.has_storage() ? c10::optional<Storage>(self_.storage()) : c10::nullopt;
c10::intrusive_ptr<TensorImpl> self__impl_saved;
if (self_.defined()) self__impl_saved = self_.getIntrusivePtr();
#endif
auto _tmp = ([&]() {
at::AutoDispatchBelowADInplaceOrView guard;
return at::redispatch::sinh(ks & c10::after_autograd_keyset, self_);
})();
auto result = std::move(_tmp);
#ifndef NDEBUG
if (self__storage_saved.has_value() &&
!at::impl::dispatch_mode_enabled() &&
!at::impl::tensor_has_dispatch(self_))
TORCH_INTERNAL_ASSERT(self__storage_saved.value().is_alias_of(self_.storage()));
if (self__impl_saved && !at::impl::dispatch_mode_enabled() && !at::impl::tensor_has_dispatch(self_))
TORCH_INTERNAL_ASSERT(self__impl_saved == self_.getIntrusivePtr());
if (result.has_storage() && !at::impl::dispatch_mode_enabled() && !at::impl::tensor_has_dispatch(result)) {
TORCH_INTERNAL_ASSERT(result.storage().use_count() == 1, "function: sinh");
}
if (!at::impl::dispatch_mode_enabled() && !at::impl::tensor_has_dispatch(result))
TORCH_INTERNAL_ASSERT(result.use_count() <= 1, "function: sinh");
#endif
if (grad_fn) {
set_history(flatten_tensor_args( result ), grad_fn);
}
c10::optional<at::Tensor> result_new_fw_grad_opt = c10::nullopt;
if (_any_has_forward_grad_result && (result.defined())) {
auto self_t_raw = toNonOptFwGrad(self);
auto self_tensor = toNonOptTensor(self);
auto self_t = (self_t_raw.defined() || !self_tensor.defined())
? self_t_raw : at::_efficientzerotensor(self_tensor.sizes(), self_tensor.options());
auto self_p = toNonOptPrimal(self);
result_new_fw_grad_opt = (self_t.conj() * self_p.cosh().conj()).conj();
}
if (result_new_fw_grad_opt.has_value() && result_new_fw_grad_opt.value().defined() && result.defined()) {
// The hardcoded 0 here will need to be updated once we support multiple levels.
result._set_fw_grad(result_new_fw_grad_opt.value(), /* level */ 0, /* is_inplace_op */ false);
}
return result;
}
at::Tensor norm_Scalar(c10::DispatchKeySet ks, const at::Tensor & self, const at::Scalar & p) {
auto& self_ = unpack(self, "self", 0);
[[maybe_unused]] auto _any_requires_grad = compute_requires_grad( self );
[[maybe_unused]] auto _any_has_forward_grad_result = (isFwGradDefined(self));
std::shared_ptr<NormBackward0> grad_fn;
if (_any_requires_grad) {
grad_fn = std::shared_ptr<NormBackward0>(new NormBackward0(), deleteNode);
grad_fn->set_next_edges(collect_next_edges( self ));
grad_fn->p = p;
grad_fn->self_ = SavedVariable(self, false);
}
#ifndef NDEBUG
c10::optional<Storage> self__storage_saved =
self_.has_storage() ? c10::optional<Storage>(self_.storage()) : c10::nullopt;
c10::intrusive_ptr<TensorImpl> self__impl_saved;
if (self_.defined()) self__impl_saved = self_.getIntrusivePtr();
#endif
auto _tmp = ([&]() {
at::AutoDispatchBelowADInplaceOrView guard;
return at::redispatch::norm(ks & c10::after_autograd_keyset, self_, p);
})();
auto result = std::move(_tmp);
#ifndef NDEBUG
if (self__storage_saved.has_value() &&
!at::impl::dispatch_mode_enabled() &&
!at::impl::tensor_has_dispatch(self_))
TORCH_INTERNAL_ASSERT(self__storage_saved.value().is_alias_of(self_.storage()));
if (self__impl_saved && !at::impl::dispatch_mode_enabled() && !at::impl::tensor_has_dispatch(self_))
TORCH_INTERNAL_ASSERT(self__impl_saved == self_.getIntrusivePtr());
if (result.has_storage() && !at::impl::dispatch_mode_enabled() && !at::impl::tensor_has_dispatch(result)) {
TORCH_INTERNAL_ASSERT(result.storage().use_count() == 1, "function: norm_Scalar");
}
if (!at::impl::dispatch_mode_enabled() && !at::impl::tensor_has_dispatch(result))
TORCH_INTERNAL_ASSERT(result.use_count() <= 1, "function: norm_Scalar");
#endif
if (grad_fn) {
set_history(flatten_tensor_args( result ), grad_fn);
}
throw_error_for_complex_autograd(result, "norm");
c10::optional<at::Tensor> result_new_fw_grad_opt = c10::nullopt;
if (_any_has_forward_grad_result && (result.defined())) {
auto self_t_raw = toNonOptFwGrad(self);
auto self_tensor = toNonOptTensor(self);
auto self_t = (self_t_raw.defined() || !self_tensor.defined())
? self_t_raw : at::_efficientzerotensor(self_tensor.sizes(), self_tensor.options());
auto self_p = toNonOptPrimal(self);
result_new_fw_grad_opt = norm_jvp(self_p, self_t, p, result);
}
if (result_new_fw_grad_opt.has_value() && result_new_fw_grad_opt.value().defined() && result.defined()) {
// The hardcoded 0 here will need to be updated once we support multiple levels.
result._set_fw_grad(result_new_fw_grad_opt.value(), /* level */ 0, /* is_inplace_op */ false);
}
if (grad_fn) {
grad_fn->result_ = SavedVariable(result, true);
}
return result;
}
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102409
Approved by: https://github.com/soulitzer
2023-06-30 04:51:43 +00:00
SherlockNoMad
d997969b8b
[Reland] Add sym_size/stride/numel/storage_offset to native_function.yaml ( #103107 )
...
Differential Revision: D46459100
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103107
Approved by: https://github.com/angelayi , https://github.com/soulitzer
2023-06-12 19:18:49 +00:00
Masaki Kozuki
ba2bc7df8f
Enable backward on _foreach_zero_ ( #101149 )
...
Currently torchgen cannot find an appropriate `DifferentiabilityInfo` for `_foreach_zero_` because `gen_foreach_derivativeinfo` doesn't correctly make use of `functional_info_by_signature` and `differentiability_infos`, and `is_reference_for_foreach` is a bit too strict for `_foreach_zero_`.
Generated code in `VariableType`
```c++
void _foreach_zero_(c10::DispatchKeySet ks, at::TensorList self) {
auto self_ = unpack(self, "self", 0);
[[maybe_unused]] auto _any_requires_grad = compute_requires_grad( self );
std::vector<c10::optional<at::Tensor>> original_selfs(self.size());
std::vector<std::shared_ptr<ZeroBackward0>> grad_fns;
if (_any_requires_grad) {
for (const auto& i : c10::irange( self.size() )) {
const auto ith_requires_grad = compute_requires_grad(self[i]);
check_inplace(self[i], ith_requires_grad);
grad_fns.push_back([&]() -> std::shared_ptr<ZeroBackward0> {
if (!ith_requires_grad) {
return nullptr;
} else {
auto grad_fn = std::shared_ptr<ZeroBackward0>(new ZeroBackward0(), deleteNode);
grad_fn->set_next_edges(collect_next_edges( self[i] ));
return grad_fn;
}
}());
}
}
#ifndef NDEBUG
std::vector<c10::optional<Storage>> self__storage_saved(self_.size());
for (const Tensor& tensor : self_)
self__storage_saved.push_back(
tensor.has_storage() ? c10::optional<Storage>(tensor.storage()) : c10::nullopt);
std::vector<c10::intrusive_ptr<TensorImpl>> self__impl_saved(self_.size());
for (size_t i=0; i<self_.size(); i++)
if (self_[i].defined()) self__impl_saved[i] = self_[i].getIntrusivePtr();
#endif
{
at::AutoDispatchBelowAutograd guard;
at::redispatch::_foreach_zero_(ks & c10::after_autograd_keyset, self_);
}
#ifndef NDEBUG
for (size_t i=0; i<self_.size() && !at::impl::dispatch_mode_enabled(); i++) {
if (self__storage_saved[i].has_value() && !at::impl::tensorlist_has_dispatch(self_))
TORCH_INTERNAL_ASSERT(self__storage_saved[i].value().is_alias_of(self_[i].storage()));
}
for (size_t i=0; i<self_.size() && !at::impl::dispatch_mode_enabled(); i++) {
if (self__impl_saved[i] && !at::impl::tensorlist_has_dispatch(self_))
TORCH_INTERNAL_ASSERT(self__impl_saved[i] == self_[i].getIntrusivePtr());
}
#endif
if (!grad_fns.empty()) {
auto differentiable_outputs = flatten_tensor_args( self );
TORCH_INTERNAL_ASSERT(differentiable_outputs.size() == grad_fns.size());
for (const auto& i : c10::irange(grad_fns.size())) {
auto grad_fn = grad_fns[i];
if (grad_fn != nullptr) {
rebase_history(differentiable_outputs[i], grad_fns[i]);
}
}
}
}
```
Rel:
- #58833
- #96405
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101149
Approved by: https://github.com/soulitzer
2023-05-17 03:10:13 +00:00
Nikita Shulga
20cf42de2c
Revert "[Reland] Add sym_size/stride/numel/storage_offset to native_function.… ( #100749 )"
...
This reverts commit bb454891ed .
2023-05-16 18:17:02 -07:00
Edward Z. Yang
b94f143ace
SymIntify convNd and conv_transposeNd, fix inductor symint handling ( #101488 )
...
Fixes https://github.com/pytorch/pytorch/issues/101014
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101488
Approved by: https://github.com/ngimel
2023-05-16 17:46:52 +00:00
Sherlock Huang
bb454891ed
[Reland] Add sym_size/stride/numel/storage_offset to native_function.… ( #100749 )
...
…yaml (#91… (#91919 )
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91919
Approved by: https://github.com/ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92402
Reviewed By: ezyang
Differential Revision: D42565586
Pulled By: SherlockNoMad
fbshipit-source-id: 1c2986e45307e076d239836a1b45441a9fa3c9d9
ghstack-source-id: 969f4928486e04c57aaf98e20e3c3ca946c51613
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100749
Approved by: https://github.com/zhxchen17 , https://github.com/albanD
2023-05-12 22:57:42 +00:00
Natalia Gimelshein
bfe5f5bbe1
[WIP] enable cuda graphs support for flash attention with dropout ( #100196 )
...
Fixes #99905
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100196
Approved by: https://github.com/drisspg
2023-05-08 16:19:18 +00:00
PyTorch MergeBot
c3aa59c8f5
Revert "[WIP] enable cuda graphs support for flash attention with dropout ( #100196 )"
...
This reverts commit 32615618e4 .
Reverted https://github.com/pytorch/pytorch/pull/100196 on behalf of https://github.com/clee2000 due to broke no ops build 32615618e4 https://github.com/pytorch/pytorch/actions/runs/4866578063/jobs/8678258318 ([comment](https://github.com/pytorch/pytorch/pull/100196#issuecomment-1532352810 ))
2023-05-03 01:41:56 +00:00
Natalia Gimelshein
32615618e4
[WIP] enable cuda graphs support for flash attention with dropout ( #100196 )
...
Fixes #99905
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100196
Approved by: https://github.com/drisspg
2023-05-02 23:05:31 +00:00
Masaki Kozuki
6c934a89a7
Skip invalid grads in outplace foreachs' backward ( #100256 )
...
Fixes #100248
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100256
Approved by: https://github.com/soulitzer , https://github.com/albanD
2023-04-29 22:45:26 +00:00