Nikita Shulga
b56b002842
Fix NULL dereference in binary CPU ops ( #115183 )
...
Targeted fix for https://github.com/pytorch/pytorch/issues/113037
A more fundamental fix, where those functions are not even called for empty tensors, is coming later.
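For illustration, a minimal, hypothetical sketch of the kind of guard such a fix adds (the function name and signature are made up for this example, not the actual PyTorch kernel): bail out before touching data pointers when there is nothing to compute, since empty tensors may carry null data pointers.
```c++
#include <cstddef>
#include <cstdio>

// Hypothetical stand-in for a binary CPU kernel: guard against empty inputs
// so a possibly-null data pointer is never dereferenced.
void add_kernel_cpu(const float* a, const float* b, float* out, std::size_t n) {
  if (n == 0) return;  // empty tensor: nothing to compute, do not touch the pointers
  for (std::size_t i = 0; i < n; ++i) out[i] = a[i] + b[i];
}

int main() {
  add_kernel_cpu(nullptr, nullptr, nullptr, 0);  // the empty-tensor case: safe no-op
  float a[2] = {1.f, 2.f}, b[2] = {3.f, 4.f}, out[2];
  add_kernel_cpu(a, b, out, 2);
  std::printf("%g %g\n", out[0], out[1]);
  return 0;
}
```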
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115183
Approved by: https://github.com/drisspg , https://github.com/atalman , https://github.com/huydhn
2023-12-06 03:37:47 +00:00
Isuru Fernando
2f536ff92c
Refactor values kwarg in foreach tests ( #112781 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112781
Approved by: https://github.com/lezcano
ghstack dependencies: #112778
2023-11-22 22:10:54 +00:00
Isuru Fernando
1f1ff629a8
Use parent class attribute supports_out for foreach_zero opinfo ( #112778 )
...
Instead of introducing a new has_no_out_of_place attribute
Also fixes foreach_copy tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112778
Approved by: https://github.com/lezcano
2023-11-22 18:00:44 +00:00
Jane Xu
deec2380c7
Add 0dim Tensor overload for _foreach_div ( #113688 )
...
This PR is ALMOST just following the steps from #106677, EXCEPT we add one feature. Similar to fused_adam(w), for the CUDA dispatches: when the scalar tensor is on CPU, we .item() it and redispatch to the normal scalar overload. Otherwise, the CUDA kernel would complain about a mismatch in devices between the scalar and the tensors.
Why do we add this feature? Our optimizers want to allow lr as a tensor, and lr could be a CPU tensor. lr is used with foreach_div_ in Adam, so our CI would break otherwise.
After this PR, `_foreach_mul` and `_foreach_div` will accept either a CPU or a GPU tensor for the scalar tensor (vs only a GPU tensor). They join the ranks of `fused_adam(w)` in this characteristic. I did not yet do the same thing for foreach_add (the only other foreach op with a .Tensor overload) because there is no use case and it would be more involved.
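For illustration only, a minimal sketch of the CPU-scalar-tensor handling described above, assuming the `_foreach_div` Scalar and Tensor overloads; the helper name is made up for this example and this is not the actual dispatch code:
```c++
#include <ATen/ATen.h>
#include <vector>

// Hypothetical helper: if the 0-dim "scalar" tensor lives on CPU while the list
// is on CUDA, fall back to the Scalar overload via .item() so the CUDA kernel
// never sees a CPU tensor; otherwise use the Tensor overload directly.
std::vector<at::Tensor> foreach_div_with_scalar_tensor(at::TensorList self,
                                                       const at::Tensor& scalar) {
  if (!self.empty() && self[0].is_cuda() && scalar.device().is_cpu()) {
    return at::_foreach_div(self, scalar.item());  // redispatch to the Scalar overload
  }
  return at::_foreach_div(self, scalar);           // Tensor overload (same device)
}
```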
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113688
Approved by: https://github.com/mlazos , https://github.com/albanD
2023-11-15 20:59:32 +00:00
rraminen
44367c59b2
Update skip reason for failing unit tests on ROCm 5.7 ( #113286 )
...
Follow-up to https://github.com/pytorch/pytorch/pull/110465 . Updated the skip reason for failing unit tests on ROCm 5.7.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113286
Approved by: https://github.com/malfet
2023-11-13 19:29:04 +00:00
Isuru Fernando
3b915f9de0
[pt2] enable meta tests for foreach ops ( #113484 )
...
Try https://github.com/pytorch/pytorch/pull/113059 again.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113484
Approved by: https://github.com/lezcano
2023-11-11 02:43:41 +00:00
Nikita Shulga
d5eb9f725c
Fix test_add_scalar_with_empty_list_tensor ( #113262 )
...
By actually instantiating the test method for different types and devices rather than always creating it on CPU.
Also, remove `bool` from the list, as adding 1 to bool is not supported.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113262
Approved by: https://github.com/jeanschmidt , https://github.com/atalman , https://github.com/lezcano
2023-11-08 20:56:37 +00:00
rraminen
3a429423fc
Upgrade CI to ROCm5.7 ( #110465 )
...
This PR upgrades CI to ROCm 5.7.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110465
Approved by: https://github.com/pruthvistony , https://github.com/malfet
2023-11-08 06:11:10 +00:00
Nikita Shulga
236eff9531
[BE] Refactor repeated asserts in test_foreach.py ( #112348 )
...
The tested conditions in `test_binary_op_list_error_cases` look almost identical, although the test covers both method and in-place variants. Use a for loop to make the distinction a bit more explicit.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112348
Approved by: https://github.com/albanD
ghstack dependencies: #112349
2023-10-31 01:11:44 +00:00
Nikita Shulga
80de49653a
Prevent OOB access in foreach_list variants ( #112349 )
...
By checking that list sizes are the same before computing forward gradients.
Before the change:
```cpp
::std::vector<at::Tensor> _foreach_add_List(c10::DispatchKeySet ks, at::TensorList self, at::TensorList other, const at::Scalar & alpha) {
auto self_ = unpack(self, "self", 0);
auto other_ = unpack(other, "other", 1);
[[maybe_unused]] auto _any_requires_grad = compute_requires_grad( self, other );
std::vector<bool> _any_has_forward_grad_result(self.size());
for (const auto& i : c10::irange(self.size())) {
_any_has_forward_grad_result[i] = isFwGradDefined(self[i]) || isFwGradDefined(other[i]);
}
...
```
After the change:
```cpp
::std::vector<at::Tensor> _foreach_add_List(c10::DispatchKeySet ks, at::TensorList self, at::TensorList other, const at::Scalar & alpha) {
auto self_ = unpack(self, "self", 0);
auto other_ = unpack(other, "other", 1);
[[maybe_unused]] auto _any_requires_grad = compute_requires_grad( self, other );
TORCH_CHECK(
self.size() == other.size(),
"Tensor lists must have the same number of tensors, got ",
self.size(),
" and ",
other.size());
std::vector<bool> _any_has_forward_grad_result(self.size());
for (const auto& i : c10::irange(self.size())) {
_any_has_forward_grad_result[i] = isFwGradDefined(self[i]) || isFwGradDefined(other[i]);
}
```
Add a regression test.
Fixes https://github.com/pytorch/pytorch/issues/112305
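For illustration only, a minimal sketch of the failure mode the added check guards against, assuming the `at::_foreach_add` List overload: with mismatched list lengths, the call now fails a size check instead of reading past the end of the shorter list.
```c++
#include <ATen/ATen.h>
#include <iostream>
#include <vector>

int main() {
  at::Tensor a = at::randn({2});
  at::Tensor b = at::randn({2});
  std::vector<at::Tensor> self{a, a};  // two tensors on the left...
  std::vector<at::Tensor> other{b};    // ...but only one on the right
  try {
    at::_foreach_add(self, other);     // list sizes differ: expected to throw
  } catch (const std::exception& e) {
    std::cout << "caught: " << e.what() << std::endl;
  }
  return 0;
}
```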
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112349
Approved by: https://github.com/Chillee
2023-10-30 20:43:03 +00:00
Isuru Fernando
c7dcba9276
Remove passing disable_fastpath in kwargs ( #112250 )
...
Fixes an issue that came up in https://github.com/pytorch/pytorch/pull/112030
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112250
Approved by: https://github.com/lezcano
2023-10-27 18:29:20 +00:00
Jane Xu
ca7d084ff9
Add ScalarTensor or 0dim overload for _foreach_add ( #111079 )
...
Adding a Tensor overload will allow us to:
- optimize in more cases than before
- increase coverage for scalarTensor instead of just scalars in our foreach APIs
The main complication in this PR was that add.Tensor has a scalar overload, so I've now built out support for that.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111079
Approved by: https://github.com/albanD
2023-10-20 01:34:07 +00:00
Jane Xu
0a60219fe3
[foreach] Fix 0-size handling for real for real ( #109402 )
...
@crcrpar's last attempt to fix the 0-size problem unfortunately did not pass all cases. See my comment in https://github.com/pytorch/pytorch/issues/100701 . When we have a tail tensor of size 0, the old code would mess with the chunk logic to check the previous tensor's length. This is flawed because:
1. if the previous tensor was also 0-sized (so a tensor list of [tensor, tensor, tensor, ..., 0-sized tensor, 0-sized tensor]), chunks would still be 0 and the nested for loop would be skipped.
2. the nested for loop produces side effects on tensorListMeta that _shouldn't_ be there! This can mess up the compute in unexpected ways that I haven't really needed to reason through.
We noticed, via an internal report, that the problem had not been fixed. This PR solves the issue by:
- removing the finagling of chunks when the tail tensor is 0-sized
- adding a surefire way for the kernel to be launched when the last tensor is 0-sized AND there is still content in the metadata, signifying there is stuff left to compute (a toy sketch of this pattern follows below).
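A self-contained toy sketch of this batching-and-flush control flow; the names here are illustrative stand-ins, not the real foreach launch code or tensorListMeta:
```c++
#include <cstdio>
#include <vector>

int main() {
  const std::vector<int> tensor_sizes = {4, 3, 0, 0};  // tail tensors are 0-sized
  const int kChunkSize = 2;   // elements per chunk
  const int kMaxChunks = 3;   // chunks per launch (stand-in for metadata capacity)
  std::vector<int> pending;   // stand-in for the accumulated metadata
  auto launch = [&pending]() {
    std::printf("launch kernel over %zu chunks\n", pending.size());
    pending.clear();
  };
  for (int numel : tensor_sizes) {
    if (numel == 0) continue;  // 0-sized tensors contribute no chunks; no special-casing
    const int chunks = (numel + kChunkSize - 1) / kChunkSize;
    for (int c = 0; c < chunks; ++c) {
      pending.push_back(c);
      if (static_cast<int>(pending.size()) == kMaxChunks) launch();  // full batch
    }
  }
  if (!pending.empty()) launch();  // final flush: launch even if the last tensors
                                   // in the list were 0-sized
  return 0;
}
```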
## test plan
As I went through the code, I also added some comments explaining what's up and modified our tensor inputs to ensure that this case is tested in the test_parity test in test_foreach.py. Yes, I do realize there is quite a bit of duplication and that this file could be due for a refactor. That said, the primary goal of this PR is to fix the pretty egregious bug and refactoring can be a followup.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109402
Approved by: https://github.com/albanD
2023-09-26 17:38:20 +00:00
Jane Xu
4b0281b32c
[BE][foreach] name tests correctly. noncontiguous inputs != fastpath ( #109771 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109771
Approved by: https://github.com/soulitzer
2023-09-22 19:16:14 +00:00
Masaki Kozuki
602413a0a0
Refactor test_foreach.py ( #107869 )
...
## Summary
- Change the default of `supports_autograd` and `supports_forward_ad` of `ForeachFuncInfo` to `True`
- Add `test_zero_size_tensor_inputs` to make sure that foreach functions can handle 0-size Tensor inputs
- Add `test_parity` to check the consistency between the outputs of a foreach function and a for-loop over its native counterpart.
- Add `test_autodiff` to check forward-mode and reverse-mode AD
- Keep the corner cases that are not covered by the newly introduced methods
rel:
- #58833
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107869
Approved by: https://github.com/janeyx99
2023-09-14 19:39:26 +00:00
Masaki Kozuki
5814380e7b
Revert "Revert "Reland "Add forward mode AD to out-place foreach functions ( #102409 ) ( #106043 )""" ( #106320 )
...
Fixed a typo specifying the number of tensors and elements in the test that had been failing in slow gradcheck.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106320
Approved by: https://github.com/soulitzer
2023-08-18 23:01:42 +00:00
Masaki Kozuki
b234b94760
Add in-place _foreach_copy ( #107226 )
...
Fixes #107162
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107226
Approved by: https://github.com/janeyx99
2023-08-17 00:11:18 +00:00
PyTorch MergeBot
354484ea6d
Revert "Add _foreach_clamp ( #106574 )"
...
This reverts commit 2b560d3c3a .
Reverted https://github.com/pytorch/pytorch/pull/106574 on behalf of https://github.com/kit1980 due to breaking internal windows builds ([comment](https://github.com/pytorch/pytorch/pull/106574#issuecomment-1675400335 ))
2023-08-11 21:05:04 +00:00
Masaki Kozuki
2b560d3c3a
Add _foreach_clamp ( #106574 )
...
Rel:
- #106221
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106574
Approved by: https://github.com/janeyx99
2023-08-10 05:26:09 +00:00
Masaki Kozuki
9e4e0ecdd9
Add 0-dim Tensor overload to _foreach_mul ( #106677 )
...
rel:
- https://github.com/pytorch/pytorch/issues/106427
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106677
Approved by: https://github.com/janeyx99
2023-08-08 03:00:01 +00:00
PyTorch MergeBot
2b427ae3a7
Revert "Reland "Add forward mode AD to out-place foreach functions ( #102409 ) ( #106043 )"
...
This reverts commit e773f28ee3 .
Reverted https://github.com/pytorch/pytorch/pull/106043 on behalf of https://github.com/DanilBaibak due to Break slow tests ([comment](https://github.com/pytorch/pytorch/pull/106043#issuecomment-1658642734 ))
2023-07-31 15:50:36 +00:00
Masaki Kozuki
e773f28ee3
Reland "Add forward mode AD to out-place foreach functions ( #102409 ) ( #106043 )
...
forward-mode AD of out-of-place foreach functions, finally.
rel:
- #102409
- #105504
- #58833
- #100695
---
# Generated Foreach
```c++
::std::vector<at::Tensor> _foreach_sinh(c10::DispatchKeySet ks, at::TensorList self) {
auto self_ = unpack(self, "self", 0);
[[maybe_unused]] auto _any_requires_grad = compute_requires_grad( self );
std::vector<bool> _any_has_forward_grad_result(self.size());
for (const auto& i : c10::irange(self.size())) {
_any_has_forward_grad_result[i] = isFwGradDefined(self[i]);
}
std::shared_ptr<ForeachSinhBackward0> grad_fn;
if (_any_requires_grad) {
grad_fn = std::shared_ptr<ForeachSinhBackward0>(new ForeachSinhBackward0(), deleteNode);
grad_fn->set_next_edges(collect_next_edges( self ));
grad_fn->self_ = make_saved_variable_list(self);
grad_fn->self_size_ = self.size();
}
#ifndef NDEBUG
std::vector<c10::optional<Storage>> self__storage_saved(self_.size());
for (const Tensor& tensor : self_)
self__storage_saved.push_back(
tensor.has_storage() ? c10::optional<Storage>(tensor.storage()) : c10::nullopt);
std::vector<c10::intrusive_ptr<TensorImpl>> self__impl_saved(self_.size());
for (size_t i=0; i<self_.size(); i++)
if (self_[i].defined()) self__impl_saved[i] = self_[i].getIntrusivePtr();
#endif
auto _tmp = ([&]() {
at::AutoDispatchBelowADInplaceOrView guard;
return at::redispatch::_foreach_sinh(ks & c10::after_autograd_keyset, self_);
})();
auto result = std::move(_tmp);
#ifndef NDEBUG
for (size_t i=0; i<self_.size() && !at::impl::dispatch_mode_enabled(); i++) {
if (self__storage_saved[i].has_value() && !at::impl::tensorlist_has_dispatch(self_))
TORCH_INTERNAL_ASSERT(self__storage_saved[i].value().is_alias_of(self_[i].storage()));
}
for (size_t i=0; i<self_.size() && !at::impl::dispatch_mode_enabled(); i++) {
if (self__impl_saved[i] && !at::impl::tensorlist_has_dispatch(self_))
TORCH_INTERNAL_ASSERT(self__impl_saved[i] == self_[i].getIntrusivePtr());
}
#endif
if (grad_fn) {
set_history(flatten_tensor_args( result ), grad_fn);
}
std::vector<c10::optional<at::Tensor>> result_new_fw_grad_opts(self.size(), c10::nullopt);
for (const auto& i : c10::irange(result_new_fw_grad_opts.size())) {
if (_any_has_forward_grad_result[i]) {
auto self_t_raw = toNonOptFwGrad(self[i]);
auto self_tensor = toNonOptTensor(self[i]);
auto self_t = (self_t_raw.defined() || !self_tensor.defined())
? self_t_raw : at::_efficientzerotensor(self_tensor.sizes(), self_tensor.options());
auto self_p = toNonOptPrimal(self[i]);
result_new_fw_grad_opts[i] = (self_t.conj() * self_p.cosh().conj()).conj();
}
}
for (const auto& i : c10::irange(result_new_fw_grad_opts.size())) {
auto& result_new_fw_grad_opt = result_new_fw_grad_opts[i];
if (result_new_fw_grad_opt.has_value() && result_new_fw_grad_opt.value().defined() && result[i].defined()) {
// The hardcoded 0 here will need to be updated once we support multiple levels.
result[i]._set_fw_grad(result_new_fw_grad_opt.value(), /* level */ 0, /* is_inplace_op */ false);
}
}
return result;
}
::std::vector<at::Tensor> _foreach_norm_Scalar(c10::DispatchKeySet ks, at::TensorList self, const at::Scalar & ord) {
auto self_ = unpack(self, "self", 0);
[[maybe_unused]] auto _any_requires_grad = compute_requires_grad( self );
std::vector<bool> _any_has_forward_grad_result(self.size());
for (const auto& i : c10::irange(self.size())) {
_any_has_forward_grad_result[i] = isFwGradDefined(self[i]);
}
std::shared_ptr<ForeachNormBackward0> grad_fn;
if (_any_requires_grad) {
grad_fn = std::shared_ptr<ForeachNormBackward0>(new ForeachNormBackward0(), deleteNode);
grad_fn->set_next_edges(collect_next_edges( self ));
grad_fn->ord = ord;
grad_fn->self_ = make_saved_variable_list(self);
grad_fn->self_size_ = self.size();
}
#ifndef NDEBUG
std::vector<c10::optional<Storage>> self__storage_saved(self_.size());
for (const Tensor& tensor : self_)
self__storage_saved.push_back(
tensor.has_storage() ? c10::optional<Storage>(tensor.storage()) : c10::nullopt);
std::vector<c10::intrusive_ptr<TensorImpl>> self__impl_saved(self_.size());
for (size_t i=0; i<self_.size(); i++)
if (self_[i].defined()) self__impl_saved[i] = self_[i].getIntrusivePtr();
#endif
auto _tmp = ([&]() {
at::AutoDispatchBelowADInplaceOrView guard;
return at::redispatch::_foreach_norm(ks & c10::after_autograd_keyset, self_, ord);
})();
auto result = std::move(_tmp);
#ifndef NDEBUG
for (size_t i=0; i<self_.size() && !at::impl::dispatch_mode_enabled(); i++) {
if (self__storage_saved[i].has_value() && !at::impl::tensorlist_has_dispatch(self_))
TORCH_INTERNAL_ASSERT(self__storage_saved[i].value().is_alias_of(self_[i].storage()));
}
for (size_t i=0; i<self_.size() && !at::impl::dispatch_mode_enabled(); i++) {
if (self__impl_saved[i] && !at::impl::tensorlist_has_dispatch(self_))
TORCH_INTERNAL_ASSERT(self__impl_saved[i] == self_[i].getIntrusivePtr());
}
#endif
if (grad_fn) {
set_history(flatten_tensor_args( result ), grad_fn);
}
std::vector<c10::optional<at::Tensor>> result_new_fw_grad_opts(self.size(), c10::nullopt);
for (const auto& i : c10::irange(result_new_fw_grad_opts.size())) {
if (_any_has_forward_grad_result[i]) {
auto self_t_raw = toNonOptFwGrad(self[i]);
auto self_tensor = toNonOptTensor(self[i]);
auto self_t = (self_t_raw.defined() || !self_tensor.defined())
? self_t_raw : at::_efficientzerotensor(self_tensor.sizes(), self_tensor.options());
auto self_p = toNonOptPrimal(self[i]);
result_new_fw_grad_opts[i] = norm_jvp(self_p, self_t, ord, result[i]);
}
}
for (const auto& i : c10::irange(result_new_fw_grad_opts.size())) {
auto& result_new_fw_grad_opt = result_new_fw_grad_opts[i];
if (result_new_fw_grad_opt.has_value() && result_new_fw_grad_opt.value().defined() && result[i].defined()) {
// The hardcoded 0 here will need to be updated once we support multiple levels.
result[i]._set_fw_grad(result_new_fw_grad_opt.value(), /* level */ 0, /* is_inplace_op */ false);
}
}
if (grad_fn) {
grad_fn->result = result;
}
return result;
}
```
# Reference
```c++
at::Tensor sinh(c10::DispatchKeySet ks, const at::Tensor & self) {
auto& self_ = unpack(self, "self", 0);
[[maybe_unused]] auto _any_requires_grad = compute_requires_grad( self );
[[maybe_unused]] auto _any_has_forward_grad_result = (isFwGradDefined(self));
std::shared_ptr<SinhBackward0> grad_fn;
if (_any_requires_grad) {
grad_fn = std::shared_ptr<SinhBackward0>(new SinhBackward0(), deleteNode);
grad_fn->set_next_edges(collect_next_edges( self ));
grad_fn->self_ = SavedVariable(self, false);
}
#ifndef NDEBUG
c10::optional<Storage> self__storage_saved =
self_.has_storage() ? c10::optional<Storage>(self_.storage()) : c10::nullopt;
c10::intrusive_ptr<TensorImpl> self__impl_saved;
if (self_.defined()) self__impl_saved = self_.getIntrusivePtr();
#endif
auto _tmp = ([&]() {
at::AutoDispatchBelowADInplaceOrView guard;
return at::redispatch::sinh(ks & c10::after_autograd_keyset, self_);
})();
auto result = std::move(_tmp);
#ifndef NDEBUG
if (self__storage_saved.has_value() &&
!at::impl::dispatch_mode_enabled() &&
!at::impl::tensor_has_dispatch(self_))
TORCH_INTERNAL_ASSERT(self__storage_saved.value().is_alias_of(self_.storage()));
if (self__impl_saved && !at::impl::dispatch_mode_enabled() && !at::impl::tensor_has_dispatch(self_))
TORCH_INTERNAL_ASSERT(self__impl_saved == self_.getIntrusivePtr());
if (result.has_storage() && !at::impl::dispatch_mode_enabled() && !at::impl::tensor_has_dispatch(result)) {
TORCH_INTERNAL_ASSERT(result.storage().use_count() == 1, "function: sinh");
}
if (!at::impl::dispatch_mode_enabled() && !at::impl::tensor_has_dispatch(result))
TORCH_INTERNAL_ASSERT(result.use_count() <= 1, "function: sinh");
#endif
if (grad_fn) {
set_history(flatten_tensor_args( result ), grad_fn);
}
c10::optional<at::Tensor> result_new_fw_grad_opt = c10::nullopt;
if (_any_has_forward_grad_result && (result.defined())) {
auto self_t_raw = toNonOptFwGrad(self);
auto self_tensor = toNonOptTensor(self);
auto self_t = (self_t_raw.defined() || !self_tensor.defined())
? self_t_raw : at::_efficientzerotensor(self_tensor.sizes(), self_tensor.options());
auto self_p = toNonOptPrimal(self);
result_new_fw_grad_opt = (self_t.conj() * self_p.cosh().conj()).conj();
}
if (result_new_fw_grad_opt.has_value() && result_new_fw_grad_opt.value().defined() && result.defined()) {
// The hardcoded 0 here will need to be updated once we support multiple levels.
result._set_fw_grad(result_new_fw_grad_opt.value(), /* level */ 0, /* is_inplace_op */ false);
}
return result;
}
at::Tensor norm_Scalar(c10::DispatchKeySet ks, const at::Tensor & self, const at::Scalar & p) {
auto& self_ = unpack(self, "self", 0);
[[maybe_unused]] auto _any_requires_grad = compute_requires_grad( self );
[[maybe_unused]] auto _any_has_forward_grad_result = (isFwGradDefined(self));
std::shared_ptr<NormBackward0> grad_fn;
if (_any_requires_grad) {
grad_fn = std::shared_ptr<NormBackward0>(new NormBackward0(), deleteNode);
grad_fn->set_next_edges(collect_next_edges( self ));
grad_fn->p = p;
grad_fn->self_ = SavedVariable(self, false);
}
#ifndef NDEBUG
c10::optional<Storage> self__storage_saved =
self_.has_storage() ? c10::optional<Storage>(self_.storage()) : c10::nullopt;
c10::intrusive_ptr<TensorImpl> self__impl_saved;
if (self_.defined()) self__impl_saved = self_.getIntrusivePtr();
#endif
auto _tmp = ([&]() {
at::AutoDispatchBelowADInplaceOrView guard;
return at::redispatch::norm(ks & c10::after_autograd_keyset, self_, p);
})();
auto result = std::move(_tmp);
#ifndef NDEBUG
if (self__storage_saved.has_value() &&
!at::impl::dispatch_mode_enabled() &&
!at::impl::tensor_has_dispatch(self_))
TORCH_INTERNAL_ASSERT(self__storage_saved.value().is_alias_of(self_.storage()));
if (self__impl_saved && !at::impl::dispatch_mode_enabled() && !at::impl::tensor_has_dispatch(self_))
TORCH_INTERNAL_ASSERT(self__impl_saved == self_.getIntrusivePtr());
if (result.has_storage() && !at::impl::dispatch_mode_enabled() && !at::impl::tensor_has_dispatch(result)) {
TORCH_INTERNAL_ASSERT(result.storage().use_count() == 1, "function: norm_Scalar");
}
if (!at::impl::dispatch_mode_enabled() && !at::impl::tensor_has_dispatch(result))
TORCH_INTERNAL_ASSERT(result.use_count() <= 1, "function: norm_Scalar");
#endif
if (grad_fn) {
set_history(flatten_tensor_args( result ), grad_fn);
}
throw_error_for_complex_autograd(result, "norm");
c10::optional<at::Tensor> result_new_fw_grad_opt = c10::nullopt;
if (_any_has_forward_grad_result && (result.defined())) {
auto self_t_raw = toNonOptFwGrad(self);
auto self_tensor = toNonOptTensor(self);
auto self_t = (self_t_raw.defined() || !self_tensor.defined())
? self_t_raw : at::_efficientzerotensor(self_tensor.sizes(), self_tensor.options());
auto self_p = toNonOptPrimal(self);
result_new_fw_grad_opt = norm_jvp(self_p, self_t, p, result);
}
if (result_new_fw_grad_opt.has_value() && result_new_fw_grad_opt.value().defined() && result.defined()) {
// The hardcoded 0 here will need to be updated once we support multiple levels.
result._set_fw_grad(result_new_fw_grad_opt.value(), /* level */ 0, /* is_inplace_op */ false);
}
if (grad_fn) {
grad_fn->result_ = SavedVariable(result, true);
}
return result;
}
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106043
Approved by: https://github.com/soulitzer
2023-07-27 03:13:24 +00:00
Masaki Kozuki
72f2c87a5a
[foreach] Set SavedVariable.is_output to true for grad_fn->result_ ( #105504 )
...
fixes #105502
The scope of this pull request is out-of-place foreach functions that depend on their output tensorlist for backward such as `_foreach_exp`. An example of the generated code with this update is as follows:
```c++
variable_list ForeachExpBackward0::apply(variable_list&& grads) {
std::lock_guard<std::mutex> lock(mutex_);
TORCH_CHECK(!result_released_, ERR_BACKWARD_TWICE);
IndexRangeGenerator gen;
auto self_ix = gen.range(self_size_);
variable_list grad_inputs(gen.size());
auto result = unpack_list(result_, shared_from_this());
if (task_should_compute_output({ self_ix })) {
std::vector<Tensor> grad_result;
grad_result.reserve(grads.size());
for (const auto & i : c10::irange(grads.size())) {
if (grads[i].defined()) {
grad_result.emplace_back(grads[i] * result[i].conj());
} else {
grad_result.emplace_back(Tensor());
}
}
copy_range(grad_inputs, self_ix, grad_result);
}
return grad_inputs;
}
::std::vector<at::Tensor> _foreach_exp(c10::DispatchKeySet ks, at::TensorList self) {
auto self_ = unpack(self, "self", 0);
[[maybe_unused]] auto _any_requires_grad = compute_requires_grad( self );
std::shared_ptr<ForeachExpBackward0> grad_fn;
if (_any_requires_grad) {
grad_fn = std::shared_ptr<ForeachExpBackward0>(new ForeachExpBackward0(), deleteNode);
grad_fn->set_next_edges(collect_next_edges( self ));
grad_fn->self_size_ = self.size();
}
#ifndef NDEBUG
std::vector<c10::optional<Storage>> self__storage_saved(self_.size());
for (const Tensor& tensor : self_)
self__storage_saved.push_back(
tensor.has_storage() ? c10::optional<Storage>(tensor.storage()) : c10::nullopt);
std::vector<c10::intrusive_ptr<TensorImpl>> self__impl_saved(self_.size());
for (size_t i=0; i<self_.size(); i++)
if (self_[i].defined()) self__impl_saved[i] = self_[i].getIntrusivePtr();
#endif
auto _tmp = ([&]() {
if ((isFwGradDefinedTensorList(self))) {
static c10::OperatorName full_name("aten::_foreach_exp", "");
static c10::optional<c10::OperatorHandle> opt_op = c10::Dispatcher::singleton().findSchema(full_name);
return impl::run_jit_decomposition_with_args_for_jvp<::std::vector<at::Tensor>>("_foreach_exp", *opt_op, ks, self);
} else {
at::AutoDispatchBelowADInplaceOrView guard;
return at::redispatch::_foreach_exp(ks & c10::after_autograd_keyset, self_);
}
})();
auto result = std::move(_tmp);
#ifndef NDEBUG
for (size_t i=0; i<self_.size() && !at::impl::dispatch_mode_enabled(); i++) {
if (self__storage_saved[i].has_value() && !at::impl::tensorlist_has_dispatch(self_))
TORCH_INTERNAL_ASSERT(self__storage_saved[i].value().is_alias_of(self_[i].storage()));
}
for (size_t i=0; i<self_.size() && !at::impl::dispatch_mode_enabled(); i++) {
if (self__impl_saved[i] && !at::impl::tensorlist_has_dispatch(self_))
TORCH_INTERNAL_ASSERT(self__impl_saved[i] == self_[i].getIntrusivePtr());
}
#endif
if (grad_fn) {
set_history(flatten_tensor_args( result ), grad_fn);
}
if (grad_fn) {
grad_fn->result_ = make_saved_variable_list(result, true);
}
return result;
}
```
A bit of context:
- https://github.com/pytorch/pytorch/pull/105368#issuecomment-1640912479
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105504
Approved by: https://github.com/soulitzer
2023-07-26 14:29:32 +00:00
Jane Xu
803d42e457
add lerp cpu support for half ( #105607 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105607
Approved by: https://github.com/albanD
2023-07-21 20:29:05 +00:00
Justin Chu
73e1455327
[BE] Enable ruff's UP rules and autoformat test/ ( #105434 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105434
Approved by: https://github.com/albanD
2023-07-19 20:36:06 +00:00
PyTorch MergeBot
8958f041be
Revert "Add forward mode AD to out-place foreach functions ( #102409 )"
...
This reverts commit e2ec0ba404 .
Reverted https://github.com/pytorch/pytorch/pull/102409 on behalf of https://github.com/huydhn due to Sorry for reverting your PR but it is failing some tests in trunk e799f565eb ([comment](https://github.com/pytorch/pytorch/pull/102409#issuecomment-1615254393 ))
2023-06-30 22:46:57 +00:00
Masaki Kozuki
e2ec0ba404
Add forward mode AD to out-place foreach functions ( #102409 )
...
The major difference from in-place support is that some out-of-place functions have their derivatives spelled out in derivatives.yaml, which requires some changes in `load_derivatives.py`, plus some handling in various places for the others, whose derivatives are generated by `torchgen`.
rel:
- #58833
- #100695
---
# Generated Foreach
```c++
::std::vector<at::Tensor> _foreach_sinh(c10::DispatchKeySet ks, at::TensorList self) {
auto self_ = unpack(self, "self", 0);
[[maybe_unused]] auto _any_requires_grad = compute_requires_grad( self );
std::vector<bool> _any_has_forward_grad_result(self.size());
for (const auto& i : c10::irange(self.size())) {
_any_has_forward_grad_result[i] = isFwGradDefined(self[i]);
}
std::shared_ptr<ForeachSinhBackward0> grad_fn;
if (_any_requires_grad) {
grad_fn = std::shared_ptr<ForeachSinhBackward0>(new ForeachSinhBackward0(), deleteNode);
grad_fn->set_next_edges(collect_next_edges( self ));
grad_fn->self_ = make_saved_variable_list(self);
grad_fn->self_size_ = self.size();
}
#ifndef NDEBUG
std::vector<c10::optional<Storage>> self__storage_saved(self_.size());
for (const Tensor& tensor : self_)
self__storage_saved.push_back(
tensor.has_storage() ? c10::optional<Storage>(tensor.storage()) : c10::nullopt);
std::vector<c10::intrusive_ptr<TensorImpl>> self__impl_saved(self_.size());
for (size_t i=0; i<self_.size(); i++)
if (self_[i].defined()) self__impl_saved[i] = self_[i].getIntrusivePtr();
#endif
auto _tmp = ([&]() {
at::AutoDispatchBelowADInplaceOrView guard;
return at::redispatch::_foreach_sinh(ks & c10::after_autograd_keyset, self_);
})();
auto result = std::move(_tmp);
#ifndef NDEBUG
for (size_t i=0; i<self_.size() && !at::impl::dispatch_mode_enabled(); i++) {
if (self__storage_saved[i].has_value() && !at::impl::tensorlist_has_dispatch(self_))
TORCH_INTERNAL_ASSERT(self__storage_saved[i].value().is_alias_of(self_[i].storage()));
}
for (size_t i=0; i<self_.size() && !at::impl::dispatch_mode_enabled(); i++) {
if (self__impl_saved[i] && !at::impl::tensorlist_has_dispatch(self_))
TORCH_INTERNAL_ASSERT(self__impl_saved[i] == self_[i].getIntrusivePtr());
}
#endif
if (grad_fn) {
set_history(flatten_tensor_args( result ), grad_fn);
}
std::vector<c10::optional<at::Tensor>> result_new_fw_grad_opts(self.size(), c10::nullopt);
for (const auto& i : c10::irange(result_new_fw_grad_opts.size())) {
if (_any_has_forward_grad_result[i]) {
auto self_t_raw = toNonOptFwGrad(self[i]);
auto self_tensor = toNonOptTensor(self[i]);
auto self_t = (self_t_raw.defined() || !self_tensor.defined())
? self_t_raw : at::_efficientzerotensor(self_tensor.sizes(), self_tensor.options());
auto self_p = toNonOptPrimal(self[i]);
result_new_fw_grad_opts[i] = (self_t.conj() * self_p.cosh().conj()).conj();
}
}
for (const auto& i : c10::irange(result_new_fw_grad_opts.size())) {
auto& result_new_fw_grad_opt = result_new_fw_grad_opts[i];
if (result_new_fw_grad_opt.has_value() && result_new_fw_grad_opt.value().defined() && result[i].defined()) {
// The hardcoded 0 here will need to be updated once we support multiple levels.
result[i]._set_fw_grad(result_new_fw_grad_opt.value(), /* level */ 0, /* is_inplace_op */ false);
}
}
return result;
}
::std::vector<at::Tensor> _foreach_norm_Scalar(c10::DispatchKeySet ks, at::TensorList self, const at::Scalar & ord) {
auto self_ = unpack(self, "self", 0);
[[maybe_unused]] auto _any_requires_grad = compute_requires_grad( self );
std::vector<bool> _any_has_forward_grad_result(self.size());
for (const auto& i : c10::irange(self.size())) {
_any_has_forward_grad_result[i] = isFwGradDefined(self[i]);
}
std::shared_ptr<ForeachNormBackward0> grad_fn;
if (_any_requires_grad) {
grad_fn = std::shared_ptr<ForeachNormBackward0>(new ForeachNormBackward0(), deleteNode);
grad_fn->set_next_edges(collect_next_edges( self ));
grad_fn->ord = ord;
grad_fn->self_ = make_saved_variable_list(self);
grad_fn->self_size_ = self.size();
}
#ifndef NDEBUG
std::vector<c10::optional<Storage>> self__storage_saved(self_.size());
for (const Tensor& tensor : self_)
self__storage_saved.push_back(
tensor.has_storage() ? c10::optional<Storage>(tensor.storage()) : c10::nullopt);
std::vector<c10::intrusive_ptr<TensorImpl>> self__impl_saved(self_.size());
for (size_t i=0; i<self_.size(); i++)
if (self_[i].defined()) self__impl_saved[i] = self_[i].getIntrusivePtr();
#endif
auto _tmp = ([&]() {
at::AutoDispatchBelowADInplaceOrView guard;
return at::redispatch::_foreach_norm(ks & c10::after_autograd_keyset, self_, ord);
})();
auto result = std::move(_tmp);
#ifndef NDEBUG
for (size_t i=0; i<self_.size() && !at::impl::dispatch_mode_enabled(); i++) {
if (self__storage_saved[i].has_value() && !at::impl::tensorlist_has_dispatch(self_))
TORCH_INTERNAL_ASSERT(self__storage_saved[i].value().is_alias_of(self_[i].storage()));
}
for (size_t i=0; i<self_.size() && !at::impl::dispatch_mode_enabled(); i++) {
if (self__impl_saved[i] && !at::impl::tensorlist_has_dispatch(self_))
TORCH_INTERNAL_ASSERT(self__impl_saved[i] == self_[i].getIntrusivePtr());
}
#endif
if (grad_fn) {
set_history(flatten_tensor_args( result ), grad_fn);
}
std::vector<c10::optional<at::Tensor>> result_new_fw_grad_opts(self.size(), c10::nullopt);
for (const auto& i : c10::irange(result_new_fw_grad_opts.size())) {
if (_any_has_forward_grad_result[i]) {
auto self_t_raw = toNonOptFwGrad(self[i]);
auto self_tensor = toNonOptTensor(self[i]);
auto self_t = (self_t_raw.defined() || !self_tensor.defined())
? self_t_raw : at::_efficientzerotensor(self_tensor.sizes(), self_tensor.options());
auto self_p = toNonOptPrimal(self[i]);
result_new_fw_grad_opts[i] = norm_jvp(self_p, self_t, ord, result[i]);
}
}
for (const auto& i : c10::irange(result_new_fw_grad_opts.size())) {
auto& result_new_fw_grad_opt = result_new_fw_grad_opts[i];
if (result_new_fw_grad_opt.has_value() && result_new_fw_grad_opt.value().defined() && result[i].defined()) {
// The hardcoded 0 here will need to be updated once we support multiple levels.
result[i]._set_fw_grad(result_new_fw_grad_opt.value(), /* level */ 0, /* is_inplace_op */ false);
}
}
if (grad_fn) {
grad_fn->result = result;
}
return result;
}
```
# Reference
```c++
at::Tensor sinh(c10::DispatchKeySet ks, const at::Tensor & self) {
auto& self_ = unpack(self, "self", 0);
[[maybe_unused]] auto _any_requires_grad = compute_requires_grad( self );
[[maybe_unused]] auto _any_has_forward_grad_result = (isFwGradDefined(self));
std::shared_ptr<SinhBackward0> grad_fn;
if (_any_requires_grad) {
grad_fn = std::shared_ptr<SinhBackward0>(new SinhBackward0(), deleteNode);
grad_fn->set_next_edges(collect_next_edges( self ));
grad_fn->self_ = SavedVariable(self, false);
}
#ifndef NDEBUG
c10::optional<Storage> self__storage_saved =
self_.has_storage() ? c10::optional<Storage>(self_.storage()) : c10::nullopt;
c10::intrusive_ptr<TensorImpl> self__impl_saved;
if (self_.defined()) self__impl_saved = self_.getIntrusivePtr();
#endif
auto _tmp = ([&]() {
at::AutoDispatchBelowADInplaceOrView guard;
return at::redispatch::sinh(ks & c10::after_autograd_keyset, self_);
})();
auto result = std::move(_tmp);
#ifndef NDEBUG
if (self__storage_saved.has_value() &&
!at::impl::dispatch_mode_enabled() &&
!at::impl::tensor_has_dispatch(self_))
TORCH_INTERNAL_ASSERT(self__storage_saved.value().is_alias_of(self_.storage()));
if (self__impl_saved && !at::impl::dispatch_mode_enabled() && !at::impl::tensor_has_dispatch(self_))
TORCH_INTERNAL_ASSERT(self__impl_saved == self_.getIntrusivePtr());
if (result.has_storage() && !at::impl::dispatch_mode_enabled() && !at::impl::tensor_has_dispatch(result)) {
TORCH_INTERNAL_ASSERT(result.storage().use_count() == 1, "function: sinh");
}
if (!at::impl::dispatch_mode_enabled() && !at::impl::tensor_has_dispatch(result))
TORCH_INTERNAL_ASSERT(result.use_count() <= 1, "function: sinh");
#endif
if (grad_fn) {
set_history(flatten_tensor_args( result ), grad_fn);
}
c10::optional<at::Tensor> result_new_fw_grad_opt = c10::nullopt;
if (_any_has_forward_grad_result && (result.defined())) {
auto self_t_raw = toNonOptFwGrad(self);
auto self_tensor = toNonOptTensor(self);
auto self_t = (self_t_raw.defined() || !self_tensor.defined())
? self_t_raw : at::_efficientzerotensor(self_tensor.sizes(), self_tensor.options());
auto self_p = toNonOptPrimal(self);
result_new_fw_grad_opt = (self_t.conj() * self_p.cosh().conj()).conj();
}
if (result_new_fw_grad_opt.has_value() && result_new_fw_grad_opt.value().defined() && result.defined()) {
// The hardcoded 0 here will need to be updated once we support multiple levels.
result._set_fw_grad(result_new_fw_grad_opt.value(), /* level */ 0, /* is_inplace_op */ false);
}
return result;
}
at::Tensor norm_Scalar(c10::DispatchKeySet ks, const at::Tensor & self, const at::Scalar & p) {
auto& self_ = unpack(self, "self", 0);
[[maybe_unused]] auto _any_requires_grad = compute_requires_grad( self );
[[maybe_unused]] auto _any_has_forward_grad_result = (isFwGradDefined(self));
std::shared_ptr<NormBackward0> grad_fn;
if (_any_requires_grad) {
grad_fn = std::shared_ptr<NormBackward0>(new NormBackward0(), deleteNode);
grad_fn->set_next_edges(collect_next_edges( self ));
grad_fn->p = p;
grad_fn->self_ = SavedVariable(self, false);
}
#ifndef NDEBUG
c10::optional<Storage> self__storage_saved =
self_.has_storage() ? c10::optional<Storage>(self_.storage()) : c10::nullopt;
c10::intrusive_ptr<TensorImpl> self__impl_saved;
if (self_.defined()) self__impl_saved = self_.getIntrusivePtr();
#endif
auto _tmp = ([&]() {
at::AutoDispatchBelowADInplaceOrView guard;
return at::redispatch::norm(ks & c10::after_autograd_keyset, self_, p);
})();
auto result = std::move(_tmp);
#ifndef NDEBUG
if (self__storage_saved.has_value() &&
!at::impl::dispatch_mode_enabled() &&
!at::impl::tensor_has_dispatch(self_))
TORCH_INTERNAL_ASSERT(self__storage_saved.value().is_alias_of(self_.storage()));
if (self__impl_saved && !at::impl::dispatch_mode_enabled() && !at::impl::tensor_has_dispatch(self_))
TORCH_INTERNAL_ASSERT(self__impl_saved == self_.getIntrusivePtr());
if (result.has_storage() && !at::impl::dispatch_mode_enabled() && !at::impl::tensor_has_dispatch(result)) {
TORCH_INTERNAL_ASSERT(result.storage().use_count() == 1, "function: norm_Scalar");
}
if (!at::impl::dispatch_mode_enabled() && !at::impl::tensor_has_dispatch(result))
TORCH_INTERNAL_ASSERT(result.use_count() <= 1, "function: norm_Scalar");
#endif
if (grad_fn) {
set_history(flatten_tensor_args( result ), grad_fn);
}
throw_error_for_complex_autograd(result, "norm");
c10::optional<at::Tensor> result_new_fw_grad_opt = c10::nullopt;
if (_any_has_forward_grad_result && (result.defined())) {
auto self_t_raw = toNonOptFwGrad(self);
auto self_tensor = toNonOptTensor(self);
auto self_t = (self_t_raw.defined() || !self_tensor.defined())
? self_t_raw : at::_efficientzerotensor(self_tensor.sizes(), self_tensor.options());
auto self_p = toNonOptPrimal(self);
result_new_fw_grad_opt = norm_jvp(self_p, self_t, p, result);
}
if (result_new_fw_grad_opt.has_value() && result_new_fw_grad_opt.value().defined() && result.defined()) {
// The hardcoded 0 here will need to be updated once we support multiple levels.
result._set_fw_grad(result_new_fw_grad_opt.value(), /* level */ 0, /* is_inplace_op */ false);
}
if (grad_fn) {
grad_fn->result_ = SavedVariable(result, true);
}
return result;
}
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102409
Approved by: https://github.com/soulitzer
2023-06-30 04:51:43 +00:00
Nikita Shulga
6d2887cc06
Reland "Move tensor grouping to ATen" ( #103912 )
...
This is a reland of https://github.com/pytorch/pytorch/pull/100007 with a build fix for Windows debug builds.
`at::native::ParamsHash` only works on structs with standard layout, but `std::string` isn't one in Visual C++ debug builds, which one can easily verify by running something like:
```cpp
#define _DEBUG
#include <type_traits>
#include <string>
static_assert(std::is_standard_layout_v<std::string>, "Oh noes");
```
If the above condition is not met, instead of printing the static_assert message, VC++ raises very cryptic compilation errors; see https://github.com/pytorch/pytorch/pull/100007#discussion_r1227116292 for more detail.
Also, using `std::hash` for string should result in a faster hash function.
(cherry picked from commit 74b7a6c75e )
### 🤖 Generated by Copilot at 5914771
This pull request introduces a new function `_group_tensors_by_device_and_dtype` that can group tensors by their device and dtype, and updates the `foreach` utilities and several optimizers to use this function. The goal is to improve the performance, readability, and compatibility of the code that handles tensors with different properties. The pull request also adds a test case and type annotations for the new function, and some error checks for the `fused` argument in Adam and AdamW.
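As a hedged illustration of the grouping idea only (not PyTorch's actual `_group_tensors_by_device_and_dtype`; the key type and signature here are simplified), bucketing by (device, dtype) lets each bucket be handed to a single foreach/fused kernel:
```c++
#include <ATen/ATen.h>
#include <map>
#include <tuple>
#include <vector>

// Simplified bucketing: tensors that share a device and dtype end up in one group.
using GroupKey = std::tuple<c10::DeviceType, c10::DeviceIndex, c10::ScalarType>;

std::map<GroupKey, std::vector<at::Tensor>> group_by_device_and_dtype(at::TensorList tensors) {
  std::map<GroupKey, std::vector<at::Tensor>> groups;
  for (const at::Tensor& t : tensors) {
    GroupKey key{t.device().type(), t.device().index(), t.scalar_type()};
    groups[key].push_back(t);
  }
  return groups;
}
```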
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103912
Approved by: https://github.com/janeyx99
2023-06-21 09:26:33 +00:00
PyTorch MergeBot
0cb5bc3b04
Revert "Move tensor grouping to ATen ( #100007 )"
...
This reverts commit 74b7a6c75e .
Reverted https://github.com/pytorch/pytorch/pull/100007 on behalf of https://github.com/izaitsevfb due to Breaks internal builds, see D46629727 ([comment](https://github.com/pytorch/pytorch/pull/100007#issuecomment-1587861598 ))
2023-06-12 18:30:33 +00:00
Masaki Kozuki
74b7a6c75e
Move tensor grouping to ATen ( #100007 )
...
rel: #94344
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100007
Approved by: https://github.com/janeyx99
2023-06-09 15:44:46 +00:00
Masaki Kozuki
0bb2b01541
Add forward mode AD to in-place foreach functions ( #100695 )
...
Awkwardly implement fwd AD by
- adding a few `CodeTemplate`s
- allowing for the cases where a variable is initialized with the i-th element of a TensorList
### Rel:
- #58833
- #96405
---
`_foreach_addcmul_.ScalarList` from `VariableType`
```c++
void _foreach_addcmul__ScalarList(c10::DispatchKeySet ks, at::TensorList self, at::TensorList tensor1, at::TensorList tensor2, at::ArrayRef<at::Scalar> scalars) {
auto self_ = unpack(self, "self", 0);
auto tensor1_ = unpack(tensor1, "tensor1", 1);
auto tensor2_ = unpack(tensor2, "tensor2", 2);
[[maybe_unused]] auto _any_requires_grad = compute_requires_grad( self, tensor1, tensor2 );
std::vector<bool> _any_has_forward_grad_self(self.size());
for (const auto& i : c10::irange(self.size())) {
_any_has_forward_grad_self[i] = isFwGradDefined(self[i]) || isFwGradDefined(tensor1[i]) || isFwGradDefined(tensor2[i]);
}
std::vector<c10::optional<at::Tensor>> original_selfs(self.size());
std::vector<std::shared_ptr<AddcmulBackward0>> grad_fns;
if (_any_requires_grad) {
for (const auto& i : c10::irange( self.size() )) {
const auto ith_requires_grad = compute_requires_grad(self[i], tensor1[i], tensor2[i]);
check_inplace(self[i], ith_requires_grad);
grad_fns.push_back([&]() -> std::shared_ptr<AddcmulBackward0> {
if (!ith_requires_grad) {
return nullptr;
} else {
auto grad_fn = std::shared_ptr<AddcmulBackward0>(new AddcmulBackward0(), deleteNode);
grad_fn->set_next_edges(collect_next_edges( self[i], tensor1[i], tensor2[i] ));
return grad_fn;
}
}());
}
if (!grad_fns.empty()) {
for (const auto& i : c10::irange(grad_fns.size())) {
auto grad_fn = grad_fns[i];
if (grad_fn != nullptr) {
grad_fn->self_scalar_type = self[i].scalar_type();
grad_fn->tensor1_scalar_type = tensor1[i].scalar_type();
if (grad_fn->should_compute_output(1)) {
grad_fn->tensor2_ = SavedVariable(tensor2[i], false);
}
grad_fn->value = scalars[i];
if (grad_fn->should_compute_output(2)) {
grad_fn->tensor1_ = SavedVariable(tensor1[i], false);
}
grad_fn->tensor2_scalar_type = tensor2[i].scalar_type();
}
}
}
}
#ifndef NDEBUG
std::vector<c10::optional<Storage>> self__storage_saved(self_.size());
for (const Tensor& tensor : self_)
self__storage_saved.push_back(
tensor.has_storage() ? c10::optional<Storage>(tensor.storage()) : c10::nullopt);
std::vector<c10::intrusive_ptr<TensorImpl>> self__impl_saved(self_.size());
for (size_t i=0; i<self_.size(); i++)
if (self_[i].defined()) self__impl_saved[i] = self_[i].getIntrusivePtr();
std::vector<c10::optional<Storage>> tensor1__storage_saved(tensor1_.size());
for (const Tensor& tensor : tensor1_)
tensor1__storage_saved.push_back(
tensor.has_storage() ? c10::optional<Storage>(tensor.storage()) : c10::nullopt);
std::vector<c10::intrusive_ptr<TensorImpl>> tensor1__impl_saved(tensor1_.size());
for (size_t i=0; i<tensor1_.size(); i++)
if (tensor1_[i].defined()) tensor1__impl_saved[i] = tensor1_[i].getIntrusivePtr();
std::vector<c10::optional<Storage>> tensor2__storage_saved(tensor2_.size());
for (const Tensor& tensor : tensor2_)
tensor2__storage_saved.push_back(
tensor.has_storage() ? c10::optional<Storage>(tensor.storage()) : c10::nullopt);
std::vector<c10::intrusive_ptr<TensorImpl>> tensor2__impl_saved(tensor2_.size());
for (size_t i=0; i<tensor2_.size(); i++)
if (tensor2_[i].defined()) tensor2__impl_saved[i] = tensor2_[i].getIntrusivePtr();
#endif
{
at::AutoDispatchBelowAutograd guard;
at::redispatch::_foreach_addcmul_(ks & c10::after_autograd_keyset, self_, tensor1_, tensor2_, scalars);
}
#ifndef NDEBUG
for (size_t i=0; i<self_.size() && !at::impl::dispatch_mode_enabled(); i++) {
if (self__storage_saved[i].has_value() && !at::impl::tensorlist_has_dispatch(self_))
TORCH_INTERNAL_ASSERT(self__storage_saved[i].value().is_alias_of(self_[i].storage()));
}
for (size_t i=0; i<self_.size() && !at::impl::dispatch_mode_enabled(); i++) {
if (self__impl_saved[i] && !at::impl::tensorlist_has_dispatch(self_))
TORCH_INTERNAL_ASSERT(self__impl_saved[i] == self_[i].getIntrusivePtr());
}
for (size_t i=0; i<tensor1_.size() && !at::impl::dispatch_mode_enabled(); i++) {
if (tensor1__storage_saved[i].has_value() && !at::impl::tensorlist_has_dispatch(tensor1_))
TORCH_INTERNAL_ASSERT(tensor1__storage_saved[i].value().is_alias_of(tensor1_[i].storage()));
}
for (size_t i=0; i<tensor1_.size() && !at::impl::dispatch_mode_enabled(); i++) {
if (tensor1__impl_saved[i] && !at::impl::tensorlist_has_dispatch(tensor1_))
TORCH_INTERNAL_ASSERT(tensor1__impl_saved[i] == tensor1_[i].getIntrusivePtr());
}
for (size_t i=0; i<tensor2_.size() && !at::impl::dispatch_mode_enabled(); i++) {
if (tensor2__storage_saved[i].has_value() && !at::impl::tensorlist_has_dispatch(tensor2_))
TORCH_INTERNAL_ASSERT(tensor2__storage_saved[i].value().is_alias_of(tensor2_[i].storage()));
}
for (size_t i=0; i<tensor2_.size() && !at::impl::dispatch_mode_enabled(); i++) {
if (tensor2__impl_saved[i] && !at::impl::tensorlist_has_dispatch(tensor2_))
TORCH_INTERNAL_ASSERT(tensor2__impl_saved[i] == tensor2_[i].getIntrusivePtr());
}
#endif
if (!grad_fns.empty()) {
auto differentiable_outputs = flatten_tensor_args( self );
TORCH_INTERNAL_ASSERT(differentiable_outputs.size() == grad_fns.size());
for (const auto& i : c10::irange(grad_fns.size())) {
auto grad_fn = grad_fns[i];
if (grad_fn != nullptr) {
rebase_history(differentiable_outputs[i], grad_fns[i]);
}
}
}
std::vector<c10::optional<at::Tensor>> self_new_fw_grad_opts(self.size(), c10::nullopt);
for (const auto& i : c10::irange(self_new_fw_grad_opts.size())) {
if (_any_has_forward_grad_self[i]) {
auto self_t_raw = toNonOptFwGrad(self[i]);
auto self_tensor = toNonOptTensor(self[i]);
auto self_t = (self_t_raw.defined() || !self_tensor.defined())
? self_t_raw : at::zeros(self_tensor.sizes(), self_tensor.options());
auto tensor1_t_raw = toNonOptFwGrad(tensor1[i]);
auto tensor1_tensor = toNonOptTensor(tensor1[i]);
auto tensor1_t = (tensor1_t_raw.defined() || !tensor1_tensor.defined())
? tensor1_t_raw : at::_efficientzerotensor(tensor1_tensor.sizes(), tensor1_tensor.options());
auto tensor1_p = toNonOptPrimal(tensor1[i]);
auto tensor2_t_raw = toNonOptFwGrad(tensor2[i]);
auto tensor2_tensor = toNonOptTensor(tensor2[i]);
auto tensor2_t = (tensor2_t_raw.defined() || !tensor2_tensor.defined())
? tensor2_t_raw : at::_efficientzerotensor(tensor2_tensor.sizes(), tensor2_tensor.options());
auto tensor2_p = toNonOptPrimal(tensor2[i]);
self_t = GradMode::is_enabled() ? self_t.clone() : self_t;
self_new_fw_grad_opts[i] = self_t_raw.defined() ? self_t_raw.copy_(self_t + maybe_multiply(tensor1_t * tensor2_p, scalars[i]) + maybe_multiply(tensor2_t * tensor1_p, scalars[i])) : self_t + maybe_multiply(tensor1_t * tensor2_p, scalars[i]) + maybe_multiply(tensor2_t * tensor1_p, scalars[i]);
}
}
for (const auto& i : c10::irange(self_new_fw_grad_opts.size())) {
auto& self_new_fw_grad_opt = self_new_fw_grad_opts[i];
if (self_new_fw_grad_opt.has_value() && self_new_fw_grad_opt.value().defined() && self[i].defined()) {
// The hardcoded 0 here will need to be updated once we support multiple levels.
self[i]._set_fw_grad(self_new_fw_grad_opt.value(), /* level */ 0, /* is_inplace_op */ true);
}
}
}
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100695
Approved by: https://github.com/soulitzer
2023-05-25 15:39:48 +00:00
Masaki Kozuki
ba2bc7df8f
Enable backward on _foreach_zero_ ( #101149 )
...
Currently torchgen cannot find an appropriate `DifferentiabilityInfo` for `_foreach_zero_` because `gen_foreach_derivativeinfo` doesn't correctly make use of `functional_info_by_signature` and `differentiability_infos`, and `is_reference_for_foreach` is a bit too strict for `_foreach_zero_`.
Generated code in `VariableType`
```c++
void _foreach_zero_(c10::DispatchKeySet ks, at::TensorList self) {
auto self_ = unpack(self, "self", 0);
[[maybe_unused]] auto _any_requires_grad = compute_requires_grad( self );
std::vector<c10::optional<at::Tensor>> original_selfs(self.size());
std::vector<std::shared_ptr<ZeroBackward0>> grad_fns;
if (_any_requires_grad) {
for (const auto& i : c10::irange( self.size() )) {
const auto ith_requires_grad = compute_requires_grad(self[i]);
check_inplace(self[i], ith_requires_grad);
grad_fns.push_back([&]() -> std::shared_ptr<ZeroBackward0> {
if (!ith_requires_grad) {
return nullptr;
} else {
auto grad_fn = std::shared_ptr<ZeroBackward0>(new ZeroBackward0(), deleteNode);
grad_fn->set_next_edges(collect_next_edges( self[i] ));
return grad_fn;
}
}());
}
}
#ifndef NDEBUG
std::vector<c10::optional<Storage>> self__storage_saved(self_.size());
for (const Tensor& tensor : self_)
self__storage_saved.push_back(
tensor.has_storage() ? c10::optional<Storage>(tensor.storage()) : c10::nullopt);
std::vector<c10::intrusive_ptr<TensorImpl>> self__impl_saved(self_.size());
for (size_t i=0; i<self_.size(); i++)
if (self_[i].defined()) self__impl_saved[i] = self_[i].getIntrusivePtr();
#endif
{
at::AutoDispatchBelowAutograd guard;
at::redispatch::_foreach_zero_(ks & c10::after_autograd_keyset, self_);
}
#ifndef NDEBUG
for (size_t i=0; i<self_.size() && !at::impl::dispatch_mode_enabled(); i++) {
if (self__storage_saved[i].has_value() && !at::impl::tensorlist_has_dispatch(self_))
TORCH_INTERNAL_ASSERT(self__storage_saved[i].value().is_alias_of(self_[i].storage()));
}
for (size_t i=0; i<self_.size() && !at::impl::dispatch_mode_enabled(); i++) {
if (self__impl_saved[i] && !at::impl::tensorlist_has_dispatch(self_))
TORCH_INTERNAL_ASSERT(self__impl_saved[i] == self_[i].getIntrusivePtr());
}
#endif
if (!grad_fns.empty()) {
auto differentiable_outputs = flatten_tensor_args( self );
TORCH_INTERNAL_ASSERT(differentiable_outputs.size() == grad_fns.size());
for (const auto& i : c10::irange(grad_fns.size())) {
auto grad_fn = grad_fns[i];
if (grad_fn != nullptr) {
rebase_history(differentiable_outputs[i], grad_fns[i]);
}
}
}
}
```
Rel:
- #58833
- #96405
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101149
Approved by: https://github.com/soulitzer
2023-05-17 03:10:13 +00:00
Masaki Kozuki
6c934a89a7
Skip invalid grads in outplace foreachs' backward ( #100256 )
...
Fixes #100248
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100256
Approved by: https://github.com/soulitzer , https://github.com/albanD
2023-04-29 22:45:26 +00:00
Masaki Kozuki
674018903d
per-Tensor grad_fn for in-place foreach functions ( #96405 )
...
Generate a `grad_fn` for each (tuple of) `Tensor`(s) at the same index for `_foreach_foo_`, where each `grad_fn` is `FooBackward`.
For the record, the current status of foreach functions' backward support:
- out-place: Implemented, but without optimized implementations like their forward path
- in-place: not implemented. I think this check 7eaaefafb3/torchgen/api/autograd.py (L309-L311) is partly responsible, but the difference in signatures between out-place and in-place (see https://github.com/pytorch/pytorch/pull/96405#discussion_r1154690940 ) would prevent in-place from reusing the out-place versions (the logic is around 7eaaefafb3/torchgen/api/autograd.py (L495-L500) )
```c++
void _foreach_abs_(c10::DispatchKeySet ks, at::TensorList self) {
auto self_ = unpack(self, "self", 0);
#ifndef NDEBUG
std::vector<c10::optional<Storage>> self__storage_saved(self_.size());
for (const Tensor& tensor : self_)
self__storage_saved.push_back(
tensor.has_storage() ? c10::optional<Storage>(tensor.storage()) : c10::nullopt);
std::vector<c10::intrusive_ptr<TensorImpl>> self__impl_saved(self_.size());
for (size_t i=0; i<self_.size(); i++)
if (self_[i].defined()) self__impl_saved[i] = self_[i].getIntrusivePtr();
#endif
{
at::AutoDispatchBelowAutograd guard;
at::redispatch::_foreach_abs_(ks & c10::after_autograd_keyset, self_);
}
#ifndef NDEBUG
for (size_t i=0; i<self_.size() && !at::impl::dispatch_mode_enabled(); i++) {
if (self__storage_saved[i].has_value() && !at::impl::tensorlist_has_dispatch(self_))
AT_ASSERT(self__storage_saved[i].value().is_alias_of(self_[i].storage()));
}
for (size_t i=0; i<self_.size() && !at::impl::dispatch_mode_enabled(); i++) {
if (self__impl_saved[i] && !at::impl::tensorlist_has_dispatch(self_))
AT_ASSERT(self__impl_saved[i] == self_[i].getIntrusivePtr());
}
#endif
}
```
Related:
- #95431
- #95765 for multiple `grad_fn`s logic
---
Examples: outputs of `_foreach_add_.List`, `_foreach_addcmul_.ScalarList`, and `_foreach_exp`
```c++
void _foreach_addcmul__ScalarList(c10::DispatchKeySet ks, at::TensorList self, at::TensorList tensor1, at::TensorList tensor2, at::ArrayRef<at::Scalar> scalars) {
auto self_ = unpack(self, "self", 0);
auto tensor1_ = unpack(tensor1, "tensor1", 1);
auto tensor2_ = unpack(tensor2, "tensor2", 2);
auto _any_requires_grad = compute_requires_grad( self, tensor1, tensor2 );
(void)_any_requires_grad;
std::vector<c10::optional<at::Tensor>> original_selfs(self.size());
std::vector<std::shared_ptr<AddcmulBackward0>> grad_fns;
if (_any_requires_grad) {
for (const auto& i : c10::irange( self.size() )) {
const auto ith_requires_grad = compute_requires_grad(self[i], tensor1[i], tensor2[i]);
check_inplace(self[i], ith_requires_grad);
grad_fns.push_back([&]() -> std::shared_ptr<AddcmulBackward0> {
if (!ith_requires_grad) {
return nullptr;
} else {
auto grad_fn = std::shared_ptr<AddcmulBackward0>(new AddcmulBackward0(), deleteNode);
grad_fn->set_next_edges(collect_next_edges( self[i], tensor1[i], tensor2[i] ));
return grad_fn;
}
}());
}
if (!grad_fns.empty()) {
for (const auto& i : c10::irange(grad_fns.size())) {
auto grad_fn = grad_fns[i];
if (grad_fn != nullptr) {
grad_fn->self_scalar_type = self[i].scalar_type();
grad_fn->tensor1_scalar_type = tensor1[i].scalar_type();
if (grad_fn->should_compute_output(1)) {
grad_fn->tensor2_ = SavedVariable(tensor2[i], false);
}
grad_fn->value = scalars[i];
if (grad_fn->should_compute_output(2)) {
grad_fn->tensor1_ = SavedVariable(tensor1[i], false);
}
grad_fn->tensor2_scalar_type = tensor2[i].scalar_type();
}
}
}
}
#ifndef NDEBUG
std::vector<c10::optional<Storage>> self__storage_saved(self_.size());
for (const Tensor& tensor : self_)
self__storage_saved.push_back(
tensor.has_storage() ? c10::optional<Storage>(tensor.storage()) : c10::nullopt);
std::vector<c10::intrusive_ptr<TensorImpl>> self__impl_saved(self_.size());
for (size_t i=0; i<self_.size(); i++)
if (self_[i].defined()) self__impl_saved[i] = self_[i].getIntrusivePtr();
std::vector<c10::optional<Storage>> tensor1__storage_saved(tensor1_.size());
for (const Tensor& tensor : tensor1_)
tensor1__storage_saved.push_back(
tensor.has_storage() ? c10::optional<Storage>(tensor.storage()) : c10::nullopt);
std::vector<c10::intrusive_ptr<TensorImpl>> tensor1__impl_saved(tensor1_.size());
for (size_t i=0; i<tensor1_.size(); i++)
if (tensor1_[i].defined()) tensor1__impl_saved[i] = tensor1_[i].getIntrusivePtr();
std::vector<c10::optional<Storage>> tensor2__storage_saved(tensor2_.size());
for (const Tensor& tensor : tensor2_)
tensor2__storage_saved.push_back(
tensor.has_storage() ? c10::optional<Storage>(tensor.storage()) : c10::nullopt);
std::vector<c10::intrusive_ptr<TensorImpl>> tensor2__impl_saved(tensor2_.size());
for (size_t i=0; i<tensor2_.size(); i++)
if (tensor2_[i].defined()) tensor2__impl_saved[i] = tensor2_[i].getIntrusivePtr();
#endif
{
at::AutoDispatchBelowAutograd guard;
at::redispatch::_foreach_addcmul_(ks & c10::after_autograd_keyset, self_, tensor1_, tensor2_, scalars);
}
#ifndef NDEBUG
for (size_t i=0; i<self_.size() && !at::impl::dispatch_mode_enabled(); i++) {
if (self__storage_saved[i].has_value() && !at::impl::tensorlist_has_dispatch(self_))
TORCH_INTERNAL_ASSERT(self__storage_saved[i].value().is_alias_of(self_[i].storage()));
}
for (size_t i=0; i<self_.size() && !at::impl::dispatch_mode_enabled(); i++) {
if (self__impl_saved[i] && !at::impl::tensorlist_has_dispatch(self_))
TORCH_INTERNAL_ASSERT(self__impl_saved[i] == self_[i].getIntrusivePtr());
}
for (size_t i=0; i<tensor1_.size() && !at::impl::dispatch_mode_enabled(); i++) {
if (tensor1__storage_saved[i].has_value() && !at::impl::tensorlist_has_dispatch(tensor1_))
TORCH_INTERNAL_ASSERT(tensor1__storage_saved[i].value().is_alias_of(tensor1_[i].storage()));
}
for (size_t i=0; i<tensor1_.size() && !at::impl::dispatch_mode_enabled(); i++) {
if (tensor1__impl_saved[i] && !at::impl::tensorlist_has_dispatch(tensor1_))
TORCH_INTERNAL_ASSERT(tensor1__impl_saved[i] == tensor1_[i].getIntrusivePtr());
}
for (size_t i=0; i<tensor2_.size() && !at::impl::dispatch_mode_enabled(); i++) {
if (tensor2__storage_saved[i].has_value() && !at::impl::tensorlist_has_dispatch(tensor2_))
TORCH_INTERNAL_ASSERT(tensor2__storage_saved[i].value().is_alias_of(tensor2_[i].storage()));
}
for (size_t i=0; i<tensor2_.size() && !at::impl::dispatch_mode_enabled(); i++) {
if (tensor2__impl_saved[i] && !at::impl::tensorlist_has_dispatch(tensor2_))
TORCH_INTERNAL_ASSERT(tensor2__impl_saved[i] == tensor2_[i].getIntrusivePtr());
}
#endif
if (!grad_fns.empty()) {
auto differentiable_outputs = flatten_tensor_args( self );
TORCH_INTERNAL_ASSERT(differentiable_outputs.size() == grad_fns.size());
for (const auto& i : c10::irange(grad_fns.size())) {
auto grad_fn = grad_fns[i];
if (grad_fn != nullptr) {
rebase_history(differentiable_outputs[i], grad_fns[i]);
}
}
}
}
```
```c++
void _foreach_add__List(c10::DispatchKeySet ks, at::TensorList self, at::TensorList other, const at::Scalar & alpha) {
auto self_ = unpack(self, "self", 0);
auto other_ = unpack(other, "other", 1);
auto _any_requires_grad = compute_requires_grad( self, other );
(void)_any_requires_grad;
std::vector<c10::optional<at::Tensor>> original_selfs(self.size());
std::vector<std::shared_ptr<AddBackward0>> grad_fns;
if (_any_requires_grad) {
for (const auto& i : c10::irange( self.size() )) {
const auto ith_requires_grad = compute_requires_grad(self[i], other[i]);
check_inplace(self[i], ith_requires_grad);
grad_fns.push_back([&]() -> std::shared_ptr<AddBackward0> {
if (!ith_requires_grad) {
return nullptr;
} else {
auto grad_fn = std::shared_ptr<AddBackward0>(new AddBackward0(), deleteNode);
grad_fn->set_next_edges(collect_next_edges( self[i], other[i] ));
return grad_fn;
}
}());
}
if (!grad_fns.empty()) {
for (const auto& i : c10::irange(grad_fns.size())) {
auto grad_fn = grad_fns[i];
if (grad_fn != nullptr) {
grad_fn->other_scalar_type = other[i].scalar_type();
grad_fn->alpha = alpha;
grad_fn->self_scalar_type = self[i].scalar_type();
}
}
}
}
#ifndef NDEBUG
std::vector<c10::optional<Storage>> self__storage_saved(self_.size());
for (const Tensor& tensor : self_)
self__storage_saved.push_back(
tensor.has_storage() ? c10::optional<Storage>(tensor.storage()) : c10::nullopt);
std::vector<c10::intrusive_ptr<TensorImpl>> self__impl_saved(self_.size());
for (size_t i=0; i<self_.size(); i++)
if (self_[i].defined()) self__impl_saved[i] = self_[i].getIntrusivePtr();
std::vector<c10::optional<Storage>> other__storage_saved(other_.size());
for (const Tensor& tensor : other_)
other__storage_saved.push_back(
tensor.has_storage() ? c10::optional<Storage>(tensor.storage()) : c10::nullopt);
std::vector<c10::intrusive_ptr<TensorImpl>> other__impl_saved(other_.size());
for (size_t i=0; i<other_.size(); i++)
if (other_[i].defined()) other__impl_saved[i] = other_[i].getIntrusivePtr();
#endif
{
at::AutoDispatchBelowAutograd guard;
at::redispatch::_foreach_add_(ks & c10::after_autograd_keyset, self_, other_, alpha);
}
#ifndef NDEBUG
for (size_t i=0; i<self_.size() && !at::impl::dispatch_mode_enabled(); i++) {
if (self__storage_saved[i].has_value() && !at::impl::tensorlist_has_dispatch(self_))
TORCH_INTERNAL_ASSERT(self__storage_saved[i].value().is_alias_of(self_[i].storage()));
}
for (size_t i=0; i<self_.size() && !at::impl::dispatch_mode_enabled(); i++) {
if (self__impl_saved[i] && !at::impl::tensorlist_has_dispatch(self_))
TORCH_INTERNAL_ASSERT(self__impl_saved[i] == self_[i].getIntrusivePtr());
}
for (size_t i=0; i<other_.size() && !at::impl::dispatch_mode_enabled(); i++) {
if (other__storage_saved[i].has_value() && !at::impl::tensorlist_has_dispatch(other_))
TORCH_INTERNAL_ASSERT(other__storage_saved[i].value().is_alias_of(other_[i].storage()));
}
for (size_t i=0; i<other_.size() && !at::impl::dispatch_mode_enabled(); i++) {
if (other__impl_saved[i] && !at::impl::tensorlist_has_dispatch(other_))
TORCH_INTERNAL_ASSERT(other__impl_saved[i] == other_[i].getIntrusivePtr());
}
#endif
if (!grad_fns.empty()) {
auto differentiable_outputs = flatten_tensor_args( self );
TORCH_INTERNAL_ASSERT(differentiable_outputs.size() == grad_fns.size());
for (const auto& i : c10::irange(grad_fns.size())) {
auto grad_fn = grad_fns[i];
if (grad_fn != nullptr) {
rebase_history(differentiable_outputs[i], grad_fns[i]);
}
}
}
}
...
void _foreach_exp_(c10::DispatchKeySet ks, at::TensorList self) {
auto self_ = unpack(self, "self", 0);
auto _any_requires_grad = compute_requires_grad( self );
(void)_any_requires_grad;
std::vector<c10::optional<at::Tensor>> original_selfs(self.size());
std::vector<std::shared_ptr<ExpBackward0>> grad_fns;
if (_any_requires_grad) {
for (const auto& i : c10::irange( self.size() )) {
const auto ith_requires_grad = compute_requires_grad(self[i]);
check_inplace(self[i], ith_requires_grad);
grad_fns.push_back([&]() -> std::shared_ptr<ExpBackward0> {
if (!ith_requires_grad) {
return nullptr;
} else {
auto grad_fn = std::shared_ptr<ExpBackward0>(new ExpBackward0(), deleteNode);
grad_fn->set_next_edges(collect_next_edges( self[i] ));
return grad_fn;
}
}());
}
}
#ifndef NDEBUG
std::vector<c10::optional<Storage>> self__storage_saved(self_.size());
for (const Tensor& tensor : self_)
self__storage_saved.push_back(
tensor.has_storage() ? c10::optional<Storage>(tensor.storage()) : c10::nullopt);
std::vector<c10::intrusive_ptr<TensorImpl>> self__impl_saved(self_.size());
for (size_t i=0; i<self_.size(); i++)
if (self_[i].defined()) self__impl_saved[i] = self_[i].getIntrusivePtr();
#endif
{
at::AutoDispatchBelowAutograd guard;
at::redispatch::_foreach_exp_(ks & c10::after_autograd_keyset, self_);
}
#ifndef NDEBUG
for (size_t i=0; i<self_.size() && !at::impl::dispatch_mode_enabled(); i++) {
if (self__storage_saved[i].has_value() && !at::impl::tensorlist_has_dispatch(self_))
TORCH_INTERNAL_ASSERT(self__storage_saved[i].value().is_alias_of(self_[i].storage()));
}
for (size_t i=0; i<self_.size() && !at::impl::dispatch_mode_enabled(); i++) {
if (self__impl_saved[i] && !at::impl::tensorlist_has_dispatch(self_))
TORCH_INTERNAL_ASSERT(self__impl_saved[i] == self_[i].getIntrusivePtr());
}
#endif
if (!grad_fns.empty()) {
auto differentiable_outputs = flatten_tensor_args( self );
TORCH_INTERNAL_ASSERT(differentiable_outputs.size() == grad_fns.size());
for (const auto& i : c10::irange(grad_fns.size())) {
auto grad_fn = grad_fns[i];
if (grad_fn != nullptr) {
rebase_history(differentiable_outputs[i], grad_fns[i]);
}
}
}
if (!grad_fns.empty()) {
for (const auto& i : c10::irange(grad_fns.size())) {
auto grad_fn = grad_fns[i];
if (grad_fn != nullptr) {
grad_fn->result_ = SavedVariable(self[i], true, self[i].is_view());
}
}
}
}
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96405
Approved by: https://github.com/soulitzer
2023-04-28 00:55:04 +00:00
Masaki Kozuki
13ca08435c
[test_foreach] add cases of zero size tensors ( #95028 )
...
Supply zero-size tensors only if `multi_tensor_apply_kernel` would be called with high probability, i.e. the device is CUDA and the dtype is float32.
rel:
- https://github.com/pytorch/pytorch/pull/94655
- https://github.com/pytorch/pytorch/issues/94865
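A minimal sketch of the idea described above (the helper name and structure are assumptions, not the actual OpInfo change): only mix a zero-size tensor into the sample inputs when the multi_tensor_apply fast path is likely to be hit.
```python
# Hypothetical helper illustrating the sampling rule from this commit message.
import torch

def maybe_add_zero_size(tensors, device, dtype):
    # multi_tensor_apply_kernel is expected to be used w.h.p. only for CUDA float32.
    if device == "cuda" and dtype == torch.float32:
        tensors = tensors + [torch.empty(0, device=device, dtype=dtype)]
    return tensors

if torch.cuda.is_available():
    ts = maybe_add_zero_size([torch.randn(5, device="cuda") for _ in range(3)],
                             "cuda", torch.float32)
    out = torch._foreach_exp(ts)  # zero-size entries should simply pass through
```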
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95028
Approved by: https://github.com/ngimel
2023-03-23 00:12:13 +00:00
Masaki Kozuki
a48d518e45
test_foreach: remove skipMeta ( #96599 )
...
Happened to notice that the test doesn't seem to require the guard (at least in my local environment).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96599
Approved by: https://github.com/bdhirsh
2023-03-13 22:14:36 +00:00
Masaki Kozuki
f54233e273
[foreach] bump tensor's version and define backward via torchgen (as possible) ( #93901 )
...
## summary
- increment tensor versions in in-place foreach functions (a minimal sketch follows the links below)
- add logic to handle `ArrayRef<Scalar>`
rel: https://github.com/pytorch/pytorch/issues/58833 , https://github.com/pytorch/pytorch/pull/89591
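A minimal sketch (not the PR's test) of what the version bump means in practice; it assumes the usual behavior that each in-place write increases `Tensor._version`:
```python
# Check that an in-place foreach op bumps every tensor's version counter,
# which is what autograd's rebase_history machinery relies on.
import torch

ts = [torch.randn(4) for _ in range(3)]
before = [t._version for t in ts]
torch._foreach_exp_(ts)  # in-place foreach op
after = [t._version for t in ts]
assert all(a > b for a, b in zip(after, before))
```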
Pull Request resolved: https://github.com/pytorch/pytorch/pull/93901
Approved by: https://github.com/albanD
2023-02-20 23:18:07 +00:00
Masaki Kozuki
3e9df622fb
[mta] implement _foreach_pow ( #92303 )
...
Mainly for the foreach path of `Adam` and `AdamW`.
rel: https://github.com/pytorch/pytorch/issues/58833
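A usage sketch of the new op as a foreach Adam(W) path might use it (illustrative, not taken from the optimizer code):
```python
import torch

params = [torch.randn(3) for _ in range(4)]
squared = torch._foreach_pow(params, 2.0)      # Tensor[] ** Scalar
pairwise = torch._foreach_pow(params, params)  # Tensor[] ** Tensor[], elementwise per pair
```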
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92303
Approved by: https://github.com/albanD
2023-02-16 02:28:26 +00:00
Aaron Gokaslan
67d9790985
[BE] Apply almost all remaining flake8-comprehension checks ( #94676 )
...
Applies the remaining flake8-comprehensions fixes and checks. This change replaces all remaining unnecessary generator expressions with list/dict/set comprehensions, which are more succinct, performant, and better supported by our torch.jit compiler. It also removes useless generators such as `set(a for a in b)`, resolving them into just the set call.
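For reference, the kind of rewrite flake8-comprehensions asks for (illustrative examples, not lines from the diff):
```python
# Unnecessary generator inside set()/dict() vs. the equivalent comprehension.
squares_gen = set(x * x for x in range(10))       # flagged: useless generator
squares = {x * x for x in range(10)}              # preferred: set comprehension
lengths_gen = dict((k, len(k)) for k in ("a", "bb"))
lengths = {k: len(k) for k in ("a", "bb")}        # preferred: dict comprehension
```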
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94676
Approved by: https://github.com/ezyang
2023-02-12 01:01:25 +00:00
Masaki Kozuki
30876229a7
[mta] Backward of unary foreach functions ( #89591 )
...
As per the title, this PR defines the backward of those functions.
It doesn't implement forward-mode automatic differentiation, as [the current codegen](a747326423/tools/autograd/gen_variable_type.py (L1513) ) doesn't seem to handle `ArrayRef<Tensor>`.
Rel:
- https://github.com/pytorch/pytorch/issues/53796
- https://github.com/pytorch/pytorch/issues/58833
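An illustrative check (not the PR's test code) that out-of-place unary foreach ops now participate in autograd:
```python
import torch

xs = [torch.randn(3, requires_grad=True) for _ in range(2)]
ys = torch._foreach_exp(xs)
sum(y.sum() for y in ys).backward()
for x, y in zip(xs, ys):
    # d/dx exp(x) = exp(x), so each grad should equal the forward output.
    assert torch.allclose(x.grad, y.detach())
```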
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89591
Approved by: https://github.com/albanD
2023-01-23 08:28:06 +00:00
Masaki Kozuki
32b2d8009a
check if multi_tensor_apply_kernel was called ( #92077 )
...
Replace all the hard-coded CUDA kernel launch counts with a check that `multi_tensor_apply_kernel` was actually called, keeping the dependency on the kineto profiler.
Rel: https://github.com/pytorch/pytorch/pull/91844#issuecomment-1379844523
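A sketch of the approach; `called_multi_tensor_apply` is a hypothetical helper, not the actual test utility, and it assumes the kernel name shows up in the kineto/profiler event list:
```python
import torch
from torch.profiler import ProfilerActivity, profile

def called_multi_tensor_apply(fn, *args):
    # Profile the call and look for the fused kernel by name instead of
    # asserting a hard-coded number of kernel launches.
    with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
        fn(*args)
    return any("multi_tensor_apply_kernel" in evt.name for evt in prof.events())

if torch.cuda.is_available():
    ts = [torch.randn(16, device="cuda") for _ in range(10)]
    assert called_multi_tensor_apply(torch._foreach_exp_, ts)
```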
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92077
Approved by: https://github.com/ngimel
2023-01-23 06:46:36 +00:00
milesial
a76bc410df
Fix _foreach_norm on some tensor sizes ( #91844 )
...
This PR fixes two bugs in CUDA `_foreach_norm`:
1. Wrong norm when tensors are larger than kChunkSize = 65536
```
>>> torch._foreach_norm([torch.ones(60000, device="cuda") for _ in range(1)])
(tensor(244.9490, device='cuda:0', grad_fn=<NotImplemented>),)
>>> torch._foreach_norm([torch.ones(70000, device="cuda") for _ in range(1)])
(tensor(256., device='cuda:0', grad_fn=<NotImplemented>),)
>>> torch.ones(60000, device="cuda").norm()
tensor(244.9490, device='cuda:0', grad_fn=<LinalgVectorNormBackward0>)
>>> torch.ones(70000, device="cuda").norm()
tensor(264.5751, device='cuda:0', grad_fn=<LinalgVectorNormBackward0>)
```
2. Error when a tensor's numel is smaller than the number of tensors
```
>> torch._foreach_norm([torch.ones(9, device="cuda") for _ in range(10)])
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: select(): index 9 out of range for tensor of size [9] at dimension 0
```
This bug could have been caught by tests if `PYTORCH_TEST_WITH_SLOW` were 1, because it would have tested tensors of size 300*300=90000. It's not enabled by default; does someone know if it's ever enabled?
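An illustrative cross-check for the two fixed cases (a tensor larger than the 65536-element chunk and more tensors than elements per tensor):
```python
import torch

if torch.cuda.is_available():
    big = [torch.ones(70000, device="cuda")]                  # > kChunkSize elements
    many = [torch.ones(9, device="cuda") for _ in range(10)]  # 10 tensors of 9 elements
    for ts in (big, many):
        got = torch._foreach_norm(ts)
        ref = [t.norm() for t in ts]
        assert all(torch.allclose(g, r) for g, r in zip(got, ref))
```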
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91844
Approved by: https://github.com/ngimel
2023-01-12 05:48:01 +00:00
Masaki Kozuki
554a796aef
Implement torch._foreach_lerp ( #87562 )
...
As per title.
- [ ] ~~Q: Do we want `torch._foreach_lerp.ScalarList` as well?~~
- [ ] ~~we might want to have `ATen/native/cuda/lerp.cuh` and include it in `ATen/native/cuda/Lerp.cu` and `ATen/native/cuda/ForeachTernaryOp.cu`~~
Related:
- https://github.com/pytorch/pytorch/issues/58833
- https://github.com/pytorch/pytorch/issues/71683
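A usage sketch under the assumption that both a Scalar weight and a per-tensor weight list are supported:
```python
import torch

starts = [torch.zeros(4) for _ in range(3)]
ends = [torch.ones(4) for _ in range(3)]
halfway = torch._foreach_lerp(starts, ends, 0.5)      # scalar weight
weights = [torch.full((4,), 0.25) for _ in range(3)]
blended = torch._foreach_lerp(starts, ends, weights)  # per-tensor weights
```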
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87562
Approved by: https://github.com/ngimel
2023-01-11 02:52:04 +00:00
milesial
9d20d6d5ec
Foreach clamp_min clamp_max ( #91384 )
...
Adds `_foreach_clamp_min` and `_foreach_clamp_max` as binary ops, with scalar, scalarlist, and tensorlist support.
Timing example for `_foreach_clamp_min_` on an RTX 3070 Ti across lists of tensors with varying count and item size (times are in microseconds (us)):
CUDA:
```
[------------------ (tensors, scalar) -------------------]
| for loop | foreach
10 tensors of size 4 | 29.0 | 10.2
100 tensors of size 4 | 234.4 | 18.3
1000 tensors of size 4 | 2194.1 | 113.5
10000 tensors of size 4 | 21745.6 | 1144.5
10 tensors of size 16 | 29.5 | 12.0
100 tensors of size 16 | 256.9 | 19.9
1000 tensors of size 16 | 2499.7 | 123.6
10000 tensors of size 16 | 25022.2 | 1295.6
10 tensors of size 256 | 32.8 | 11.2
100 tensors of size 256 | 258.8 | 19.7
1000 tensors of size 256 | 2509.2 | 123.7
10000 tensors of size 256 | 25016.2 | 1295.4
10 tensors of size 65536 | 32.9 | 18.7
100 tensors of size 65536 | 327.1 | 150.3
1000 tensors of size 65536 | 3051.3 | 1388.0
10000 tensors of size 65536 | 30476.9 | 14021.5
[------------------ (tensors, tensors) ------------------]
| for loop | foreach
10 tensors of size 4 | 26.8 | 17.3
100 tensors of size 4 | 206.8 | 90.5
1000 tensors of size 4 | 1993.0 | 828.9
10000 tensors of size 4 | 19851.0 | 9063.3
10 tensors of size 16 | 34.7 | 20.0
100 tensors of size 16 | 232.2 | 102.1
1000 tensors of size 16 | 2220.9 | 977.3
10000 tensors of size 16 | 22644.5 | 10361.4
10 tensors of size 256 | 30.5 | 19.7
100 tensors of size 256 | 231.6 | 102.4
1000 tensors of size 256 | 2251.9 | 978.7
10000 tensors of size 256 | 22680.3 | 10405.8
10 tensors of size 65536 | 30.6 | 34.4
100 tensors of size 65536 | 315.1 | 223.6
1000 tensors of size 65536 | 3252.1 | 2114.4
10000 tensors of size 65536 | 30578.0 | 22826.3
```
CPU:
```
[------------------- (tensors, scalar) -------------------]
| for loop | foreach
10 tensors of size 4 | 13.0 | 9.6
100 tensors of size 4 | 62.4 | 31.6
1000 tensors of size 4 | 562.2 | 245.6
10000 tensors of size 4 | 5552.2 | 2517.7
10 tensors of size 16 | 14.9 | 11.3
100 tensors of size 16 | 74.1 | 36.9
1000 tensors of size 16 | 663.7 | 285.5
10000 tensors of size 16 | 6765.2 | 2947.5
10 tensors of size 256 | 15.2 | 11.8
100 tensors of size 256 | 76.0 | 37.7
1000 tensors of size 256 | 728.8 | 323.9
10000 tensors of size 256 | 7274.4 | 3800.3
10 tensors of size 65536 | 105.6 | 124.5
100 tensors of size 65536 | 982.8 | 939.7
1000 tensors of size 65536 | 14993.1 | 14579.2
10000 tensors of size 65536 | 163091.0 | 151555.8
[------------------- (tensors, tensors) ------------------]
| for loop | foreach
10 tensors of size 4 | 11.8 | 10.5
100 tensors of size 4 | 53.1 | 38.2
1000 tensors of size 4 | 465.1 | 316.1
10000 tensors of size 4 | 4616.9 | 3625.9
10 tensors of size 16 | 13.5 | 12.3
100 tensors of size 16 | 63.0 | 46.5
1000 tensors of size 16 | 560.1 | 359.9
10000 tensors of size 16 | 5586.8 | 3765.9
10 tensors of size 256 | 15.2 | 13.7
100 tensors of size 256 | 64.4 | 48.3
1000 tensors of size 256 | 653.7 | 410.0
10000 tensors of size 256 | 5916.6 | 3901.3
10 tensors of size 65536 | 109.1 | 106.8
100 tensors of size 65536 | 1128.9 | 1105.0
1000 tensors of size 65536 | 16245.0 | 15950.8
10000 tensors of size 65536 | 171111.3 | 163540.2
```
Example use:
```
tensors = [torch.randn(16, device='cuda') for _ in range(10)]
out = torch._foreach_clamp_min(tensors, 0.1)
out = torch._foreach_clamp_min(tensors, [0.1] * len(tensors))
out = torch._foreach_clamp_min(tensors, tensors)
torch._foreach_clamp_min_(tensors, 0.1)
torch._foreach_clamp_min_(tensors, [0.1] * len(tensors))
torch._foreach_clamp_min_(tensors, tensors)
```
Does not support complex types.
Changes the existing `foreach_minimum/maximum` to use this new implementation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91384
Approved by: https://github.com/ngimel
2023-01-09 19:28:47 +00:00
Christian Puhrsch
6fd416650a
Add _foreach_addc(div/mul)(_).Tensor ( #88157 )
...
Support passing value scalars as a flat 1D Tensor.
Currently we can only pass either an individual scalar or a ScalarList.
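An illustration of the new `.Tensor` overload (a sketch, assuming the flat tensor carries one value per list entry):
```python
import torch

self_ts = [torch.zeros(3) for _ in range(2)]
t1 = [torch.ones(3) for _ in range(2)]
t2 = [torch.full((3,), 2.0) for _ in range(2)]
values = torch.tensor([0.5, 1.5])  # one scalar per tensor, packed into a 1D Tensor
out = torch._foreach_addcmul(self_ts, t1, t2, values)
torch._foreach_addcmul_(self_ts, t1, t2, values)
```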
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88157
Approved by: https://github.com/ngimel , https://github.com/albanD
2022-11-02 23:24:35 +00:00
Elias Ellison
f701cb04fb
Test Dynamo CI w Fake Tensors ( #84282 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84282
Approved by: https://github.com/anijain2305
2022-09-01 00:15:05 +00:00
Masaki Kozuki
3139722679
[foreach][mta] Inplace maximum and minimum ( #82523 )
...
### Description
Implement `torch._foreach_maximum_` and `torch._foreach_minimum_` mainly for `_multi_tensor_adam` and `_multi_tensor_adamw` with `amsgrad=True` to correctly update their `max_exp_avg_sqs`.
### Issue
- https://github.com/pytorch/pytorch/issues/78807
- https://github.com/pytorch/pytorch/pull/81894
- https://github.com/pytorch/pytorch/pull/81348
- https://github.com/pytorch/pytorch/pull/81705
- https://github.com/pytorch/pytorch/issues/58833
- https://github.com/pytorch/pytorch/issues/68041
### Testing
Updated `test_foreach.py::TestForeach::_minmax_test` to compare the outputs of `_foreach_maximum_` (and `_foreach_minimum_`) against those of `[torch.maximum(a, b) for a, b in zip(tensors1, tensors2)]`.
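A reference-style check in the spirit of that test (illustrative, not the actual test body):
```python
import torch

a = [torch.randn(4) for _ in range(3)]
b = [torch.randn(4) for _ in range(3)]
expected = [torch.maximum(x, y) for x, y in zip(a, b)]
torch._foreach_maximum_(a, b)  # in-place: results land in `a`
assert all(torch.equal(x, e) for x, e in zip(a, expected))
```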
cc @ngimel @albanD @mikaylagawarecki
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82523
Approved by: https://github.com/albanD
2022-08-03 03:40:42 +00:00
Nikita Shulga
bfac65dfe5
[testing] Update dispatch macros ( #74977 )
...
This PR is reland of #74289
Co-authored-by: Khushi Agrawal <khushiagrawal411@gmail.com>
2022-03-30 14:13:21 -07:00
PyTorch MergeBot
2e4152b118
Revert "[testing] Update dispatch macros"
...
This reverts commit eed19a0f38 .
Reverted https://github.com/pytorch/pytorch/pull/74289 on behalf of https://github.com/malfet
2022-03-30 19:52:37 +00:00
Khushi Agrawal
eed19a0f38
[testing] Update dispatch macros
...
Hi,
This PR is the follow-up to #71561 (the previous PR had a couple of merge conflicts and was reverted; this PR resolves that).
Please take a look. Thanks!
cc: @pmeier @mruberry @kshitij12345
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74289
Approved by: https://github.com/pmeier , https://github.com/mruberry
2022-03-30 16:10:16 +00:00