Commit Graph

326 Commits

Author SHA1 Message Date
Ivan Zaitsev
821493715c Back out "Remove check from _prims_common, replace with torch._check* (#102219)", Back out "Forward fix for D46427687" (#103128)
Test Plan: revertitparrot

Reviewed By: malfet

Differential Revision: D46506433

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103128
Approved by: https://github.com/malfet
2023-06-07 01:41:41 +00:00
Nikita Karetnikov
ec0aa965da [pt2] add meta for _linalg_solve_ex (#102454)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102454
Approved by: https://github.com/lezcano
2023-06-06 08:06:55 +00:00
Nikita Karetnikov
4bda4a7e4d [pt2] add meta for lu_unpack (#102937)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102937
Approved by: https://github.com/lezcano
2023-06-06 08:06:53 +00:00
Nikita Karetnikov
6ac3352a37 [pt2] add meta for _linalg_slogdet (#102464)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102464
Approved by: https://github.com/ezyang
2023-06-05 03:17:08 +00:00
Kurt Mohler
a84bb2709a Remove check from _prims_common, replace with torch._check* (#102219)
Part of #72948

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102219
Approved by: https://github.com/lezcano, https://github.com/albanD
2023-06-03 02:23:21 +00:00
Shunting Zhang
86c7652503 [inductor] layout optimization for conv (#99773)
Convolution kernels with channels-last inputs run much faster than kernels with contiguous inputs. The PR leverages that to optimize tensor layouts so we provide channels-last inputs to convolution. Some care needs to be taken not to convert tensor layouts between contiguous and channels last back and forth, since those extra copies hurt performance quite a bit.

Latest perf number [here](https://hud.pytorch.org/benchmark/compilers?startTime=Wed%2C%2024%20May%202023%2023%3A40%3A37%20GMT&stopTime=Wed%2C%2031%20May%202023%2023%3A40%3A37%20GMT&granularity=hour&suite=torchbench&mode=training&dtype=amp&lBranch=shunting-layout-opt-19&lCommit=baa797fc100688dfb044fbcbdebcfd2591710f78&rBranch=main&rCommit=999bae0f54108ffc5b7cf2524a02a83901554b16)
- TB: 1.64x -> 1.69x
- HF: 1.79x -> 1.78x (random noise)
- TIMM: 1.51x -> 1.65x

Right now we disable layout optimization for dynamic shapes since there is a perf loss in that combination. Here is a GH issue to follow up: https://github.com/pytorch/pytorch/issues/102670
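For reference, a minimal sketch (not the inductor pass itself, illustrative module and shapes) of the channels-last layout this optimization steers convolutions toward:

```python
import torch

# Convert both the module and its input to channels last (NHWC strides);
# convolution kernels generally run faster with this layout.
conv = torch.nn.Conv2d(8, 16, kernel_size=3, padding=1).to(memory_format=torch.channels_last)
x = torch.randn(4, 8, 32, 32).to(memory_format=torch.channels_last)

y = conv(x)
print(y.is_contiguous(memory_format=torch.channels_last))  # True
```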

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99773
Approved by: https://github.com/jansel
2023-06-02 21:08:18 +00:00
PyTorch MergeBot
a7efa0ce35 Revert "Remove check from _prims_common, replace with torch._check* (#102219)"
This reverts commit fb79d43649.

Reverted https://github.com/pytorch/pytorch/pull/102219 on behalf of https://github.com/malfet due to Broke lint, see https://github.com/pytorch/pytorch/actions/runs/5158949959/jobs/9293466925 ([comment](https://github.com/pytorch/pytorch/pull/102219#issuecomment-1574245414))
2023-06-02 20:00:48 +00:00
Kurt Mohler
fb79d43649 Remove check from _prims_common, replace with torch._check* (#102219)
Part of #72948

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102219
Approved by: https://github.com/lezcano, https://github.com/albanD
2023-06-02 19:13:45 +00:00
Nikita Karetnikov
0f1621df1a [pt2] fix typos in checkFloatingOrComplex errors (#102456)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102456
Approved by: https://github.com/lezcano
2023-05-30 11:18:50 +00:00
Nikita Karetnikov
c3ea8cc58b [pt2] convert out params in register_meta (#101344)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101344
Approved by: https://github.com/lezcano
2023-05-27 18:38:52 +00:00
Michael Lazos
69c7f710ba Add meta registrations for some foreach ops (#102225)
as title
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102225
Approved by: https://github.com/ngimel
2023-05-25 02:59:11 +00:00
Peter Bell
ce42010722 [inductor][decomp] Add aten._unsafe_index_put for unchecked indexing (#101812)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101812
Approved by: https://github.com/lezcano
2023-05-24 22:17:32 +00:00
Nikita Karetnikov
42b974e8f7 [pt2] add meta for linalg_lu_solve (#101836)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101836
Approved by: https://github.com/lezcano
2023-05-24 00:21:50 +00:00
PyTorch MergeBot
5147fe4969 Revert "[inductor][decomp] Add aten._unsafe_index_put for unchecked indexing (#101812)"
This reverts commit b9721bd705.

Reverted https://github.com/pytorch/pytorch/pull/101812 on behalf of https://github.com/osalpekar due to Causing test_nn_cuda tests to crash during runtime. More details at [D46093942](https://www.internalfb.com/diff/D46093942) ([comment](https://github.com/pytorch/pytorch/pull/101812#issuecomment-1560238085))
2023-05-23 23:06:21 +00:00
Peter Bell
b9721bd705 [inductor][decomp] Add aten._unsafe_index_put for unchecked indexing (#101812)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101812
Approved by: https://github.com/lezcano
2023-05-22 20:39:18 +00:00
drisspg
6f13d6892a Add meta support for multinomial (#101324)
# Summary
Found this when trying to compile the text gen loop of nanogpt here: b33289942b/torchbenchmark/models/nanogpt_generate/model.py (L322)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101324
Approved by: https://github.com/ngimel
2023-05-19 00:04:26 +00:00
Angela Yi
72a73ef67b Add aten.searchsorted.Tensor meta kernel (#101637)
Test Plan: CI

Differential Revision: D45933187

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101637
Approved by: https://github.com/ezyang
2023-05-18 06:55:11 +00:00
Peter Bell
66e398951a [inductor/decomp] Add aten._unsafe_index to disable range checks (#101602)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101602
Approved by: https://github.com/lezcano, https://github.com/ngimel
2023-05-17 23:36:24 +00:00
Nikita Karetnikov
42e65a2587 [pt2] add meta for linalg_lu_factor_ex (#101375)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101375
Approved by: https://github.com/lezcano
2023-05-16 20:56:54 +00:00
kshitij12345
afea1a9fe9 [meta] error checking for inplace ops (#101532)
Fixes #100753

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101532
Approved by: https://github.com/lezcano
2023-05-16 17:26:59 +00:00
Nikita Karetnikov
9eb1748b2b [pt2] add meta and SymInt support for linalg_lu (#101372)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101372
Approved by: https://github.com/lezcano, https://github.com/albanD
2023-05-15 20:25:00 +00:00
Nikita Karetnikov
ac4cc63ae2 [pt2] add meta for linalg_ldl_solve (#101367)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101367
Approved by: https://github.com/lezcano
2023-05-15 20:25:00 +00:00
Nikita Karetnikov
7dd8e08817 [pt2] add meta for linalg_ldl_factor_ex (#101362)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101362
Approved by: https://github.com/lezcano
2023-05-15 02:56:49 +00:00
Nikita Karetnikov
a8964d6377 [pt2] add meta and SymInt support for linalg_householder_product (#101315)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101315
Approved by: https://github.com/lezcano
2023-05-15 02:56:49 +00:00
Natalia Gimelshein
15a51e2012 simplify sdpa backward meta registration (#101128)
Per title.

There's an off chance that query_reshaped etc. was actually discontiguous after the reshape, but even in that case I'm pretty sure the computed gradients would still be contiguous, and we are properly transposing output gradients to produce correct strides.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101128
Approved by: https://github.com/drisspg
2023-05-11 03:30:07 +00:00
Nikita Karetnikov
c0d33f66c9 [pt2] remove unused meta_linalg_eigh (#100965)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100965
Approved by: https://github.com/ezyang
2023-05-10 15:45:36 +00:00
Nikita Karetnikov
6abde61f8e [pt2] add meta function for _linalg_eigh (#100964)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100964
Approved by: https://github.com/ezyang
2023-05-10 15:45:15 +00:00
Natalia Gimelshein
bfe5f5bbe1 [WIP] enable cuda graphs support for flash attention with dropout (#100196)
Fixes #99905

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100196
Approved by: https://github.com/drisspg
2023-05-08 16:19:18 +00:00
Nikita Karetnikov
1e591a8b64 [pt2] add meta function for solve_triangular (#100829)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100829
Approved by: https://github.com/ezyang
2023-05-08 13:48:15 +00:00
Nikita Karetnikov
266c84e3ab [pt2] add meta function for linalg_qr (#100714)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100714
Approved by: https://github.com/ezyang, https://github.com/lezcano
2023-05-06 15:04:02 +00:00
Nikita Karetnikov
37f1be041a [pt2] enable svd in fake_tensor (#100130)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100130
Approved by: https://github.com/ezyang, https://github.com/lezcano
2023-05-05 06:27:59 +00:00
Michael Voznesensky
fe3ecfe0cf Add AotAutogradFallbackTests to dynamic suite (#100454)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100454
Approved by: https://github.com/ezyang
2023-05-04 04:28:45 +00:00
PyTorch MergeBot
c3aa59c8f5 Revert "[WIP] enable cuda graphs support for flash attention with dropout (#100196)"
This reverts commit 32615618e4.

Reverted https://github.com/pytorch/pytorch/pull/100196 on behalf of https://github.com/clee2000 due to broke no ops build 32615618e4 https://github.com/pytorch/pytorch/actions/runs/4866578063/jobs/8678258318 ([comment](https://github.com/pytorch/pytorch/pull/100196#issuecomment-1532352810))
2023-05-03 01:41:56 +00:00
Natalia Gimelshein
32615618e4 [WIP] enable cuda graphs support for flash attention with dropout (#100196)
Fixes #99905

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100196
Approved by: https://github.com/drisspg
2023-05-02 23:05:31 +00:00
Justin Chu
e779a30d50 [BE] Fix SIM109 compare-with-tuple (#100337)
Use {replacement} instead of multiple equality comparisons
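For context, a small illustration of the SIM109 rewrite (illustrative values, not code from this PR):

```python
x = 2

# before: repeated equality comparisons
if x == 1 or x == 2 or x == 3:
    print("matched")

# after: a single membership test against a tuple
if x in (1, 2, 3):
    print("matched")
```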

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100337
Approved by: https://github.com/Skylion007
2023-04-30 19:51:32 +00:00
Tugsbayasgalan Manlaibaatar
d4bf76c2a4 Persist torch.assert in aten graph (#100101)
This PR introduces a new operator called aten._assert_async.msg, which allows passing a tensor value and an assertion message as inputs. As part of TorchDynamo, we're replacing the use of torch._assert with this new operator so that make_fx also knows how to handle assertions. This is a subset of https://github.com/pytorch/pytorch/pull/98878; refer there for historic reviews.
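A hedged sketch of the user-facing side (the exact captured graph depends on the tracer; the function below is hypothetical):

```python
import torch

def f(x):
    # With this change, TorchDynamo can lower this assertion to the new
    # aten._assert_async.msg operator so it is preserved in the traced aten graph.
    torch._assert(bool((x > 0).all()), "expected all elements of x to be positive")
    return x * 2

print(f(torch.ones(3)))      # passes the assertion
# f(torch.zeros(3)) would raise AssertionError with the message above
```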

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100101
Approved by: https://github.com/jansel
2023-04-28 07:31:43 +00:00
Aaron Gokaslan
e2a3817dfd [BE] Enable C419 rule for any all shortcircuiting (#99890)
Apparently https://github.com/pytorch/pytorch/pull/78142 made torch.jit allow simple generator expressions, which allows us to enable rules that replace unnecessary list comprehensions with generators in any/all. This was originally part of #99280, but I split it off into this PR so that it can be easily reverted should anything break.
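A small illustration of the C419 rewrite (illustrative values):

```python
nums = [1, 3, 5, 7, 9]

# before: builds the whole list before any() sees it
has_big = any([n > 4 for n in nums])

# after: a generator lets any() short-circuit at the first True
has_big = any(n > 4 for n in nums)
print(has_big)  # True
```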

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99890
Approved by: https://github.com/justinchuby, https://github.com/kit1980, https://github.com/malfet
2023-04-25 15:02:13 +00:00
Xiaodong Wang
cc01568efd [pt2] Register meta func to randperm.default (#99593)
Summary:
Looks like we're missing the meta func for randperm.default. I get complaints like this when I compile randperm with dynamic shapes, which I think is because it gets into the real implementation rather than the meta func.

```
RuntimeError: expected int but got s0
Exception raised from expect_int at fbcode/caffe2/c10/core/SymInt.h:128 (most recent call first):
# 0  c10::get_backtrace[abi:cxx11](unsigned long, unsigned long, bool)
# 1  std::_Function_handler<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > (), c10::(anonymous namespace)::GetFetchStackTrace()::$_1>::_M_invoke(std::_Any_data const&)
# 2  c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)
# 3  c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
# 4  c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor (c10::SymInt, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CompositeExplicitAutograd__randperm>, at::Tensor, c10::guts::typelist::typelist<c10::SymInt, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool> > >, at::Tensor (c10::SymInt, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>)>::call(c10::OperatorKernel*, c10::DispatchKeySet, c10::SymInt, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>)
# 5  at::Tensor c10::Dispatcher::redispatch<at::Tensor, c10::SymInt, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool> >(c10::TypedOperatorHandle<at::Tensor (c10::SymInt, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>)> const&, c10::DispatchKeySet, c10::SymInt, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>) const
# 6  at::_ops::randperm::redispatch(c10::DispatchKeySet, c10::SymInt, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>)
# 7  c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor (c10::SymInt, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>), &at::(anonymous namespace)::randperm>, at::Tensor, c10::guts::typelist::typelist<c10::SymInt, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool> > >, at::Tensor (c10::SymInt, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>)>::call(c10::OperatorKernel*, c10::DispatchKeySet, c10::SymInt, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>)
# 8  c10::impl::make_boxed_from_unboxed_functor<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor (c10::SymInt, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>), &at::(anonymous namespace)::randperm>, at::Tensor, c10::guts::typelist::typelist<c10::SymInt, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool> > >, false>::call(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >*)

```
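A hedged sketch of the kind of call that exercises this path (hypothetical function; uses the `eager` dynamo backend just to trace — with dynamic shapes the size is a SymInt, so randperm must go through a meta function rather than the real kernel):

```python
import torch

@torch.compile(backend="eager", dynamic=True)
def shuffle_ids(n: int):
    # with dynamic shapes, `n` can be a SymInt during tracing, so this call
    # needs a meta function for randperm instead of the real implementation
    return torch.randperm(n)

print(shuffle_ids(8).shape)
```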

Differential Revision: D45137851

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99593
Approved by: https://github.com/ezyang
2023-04-25 08:55:43 +00:00
Wanchao Liang
ca24a96216 minor fix to fused adam meta registration (#99436)
This PR fixes the registration by adding `max_exp_avg_sqs` to the
output shape list too, and fixes some type check issues.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99436
Approved by: https://github.com/mrshenli
2023-04-24 22:50:02 +00:00
Edward Z. Yang
10c938abef Handle meta['val'] for tuple of lists. (#99724)
Fixes https://github.com/pytorch/pytorch/issues/99356

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99724
Approved by: https://github.com/wanchaol
2023-04-21 22:33:21 +00:00
Rodrigo Kumpera
38e964056b Reland python ops (#99170)
Waiting for the revert to land.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99170
Approved by: https://github.com/albanD
2023-04-18 15:15:46 +00:00
PyTorch MergeBot
1c042a2137 Revert "Reland python ops (#99170)"
This reverts commit d4de64ae8d.

Reverted https://github.com/pytorch/pytorch/pull/99170 on behalf of https://github.com/DanilBaibak due to Break internal build
2023-04-18 11:37:43 +00:00
Rodrigo Kumpera
d4de64ae8d Reland python ops (#99170)
Waiting for the revert to land.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99170
Approved by: https://github.com/albanD
2023-04-17 21:53:41 +00:00
Nikita Karetnikov
106ccf4a2a [pt2] add meta function for linalg.cross (#99279)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99279
Approved by: https://github.com/ezyang
2023-04-17 21:21:45 +00:00
PyTorch MergeBot
f957334c2b Revert "[pt2] add meta function for linalg.cross (#99279)"
This reverts commit efc3887ea5.

Reverted https://github.com/pytorch/pytorch/pull/99279 on behalf of https://github.com/ezyang due to Apparently this is breaking inductor on master? So weird
2023-04-17 19:33:16 +00:00
Nikita Karetnikov
efc3887ea5 [pt2] add meta function for linalg.cross (#99279)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99279
Approved by: https://github.com/ezyang
2023-04-17 03:05:20 +00:00
Rodrigo Kumpera
a910045add [PATCH] Back out "Move functional collectives implementation to python. (#98595) (#99168)
Summary:
Original commit changeset: ba36f8751adc

Original Phabricator Diff: D44788697

Test Plan: model loading is fine after reverting the diff

Reviewed By: zyan0, sayitmemory

Differential Revision: D44921259
---

Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99168
Approved by: https://github.com/izaitsevfb
2023-04-14 23:48:19 +00:00
XiaobingSuper
9c98f2ceb7 inductor: rewrite mkldnn fx fusion using pattern_matcher(binary) (#97141)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97141
Approved by: https://github.com/jgong5, https://github.com/EikanWang, https://github.com/jansel
2023-04-12 06:23:03 +00:00
XiaobingSuper
c214c50355 inductor: rewrite mkldnn fx fusion using pattern_matcher(conv_unary) (#97007)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97007
Approved by: https://github.com/jgong5, https://github.com/EikanWang, https://github.com/jansel
2023-04-12 05:52:54 +00:00
Guang Yang
c377a8590b Add nonzero_static() op to pytorch to unblock export (#97417)
Summary: Add a new experimental python op (`torch.nonzero_static`) for export. There is NO CUDA impl included in this PR.

Example:

Say input tensor is `x = torch.tensor([[1, 0], [3, 2]])`

Calling regular `nonzero()` on x will give you a tensor `tensor([[0, 0], [1, 0], [1, 1]])`
Calling `nonzero_static(x, size=4)` on x will give you a tensor `tensor([[0, 0], [1, 0], [1, 1], [fill_value, fill_value]])` (padded)
Calling `nonzero_static(x, size=2)` on x will give you a tensor `tensor([[0, 0], [1, 0]])` (truncated)

Test Plan:
**Unit Tests**
```
buck test @mode/dev-nosan //caffe2/test:test_dynamo -- 'caffe2/test:test_dynamo - test_export.py::ExportTests::test_export_with_nonzero_static' -- 'caffe2/test:test_dynamo - test_misc.py::MiscTests::test_nonzero_static'
```

**PT2 Export with `nonzero_static()`**
Example of `GraphModule` in the exported graph
```
def forward(self, x):
    arg0, = fx_pytree.tree_flatten_spec(([x], {}), self._in_spec)
    nonzero_static_default = torch.ops.aten.nonzero_static.default(arg0, size = 4);  arg0 = None
    return pytree.tree_unflatten([nonzero_static_default], self._out_spec)
```

Differential Revision: D44324808

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97417
Approved by: https://github.com/ezyang
2023-04-11 05:13:36 +00:00
Nikita Karetnikov
b411238d76 [pt2] add meta function for logcumsumexp (#98683)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98683
Approved by: https://github.com/ezyang
2023-04-09 01:26:37 +00:00
Rodrigo Kumpera
24d9001527 Move functional collectives implementation to python. (#98595)
This simplifies a lot of the work we need to do to add new ops.

This relands the previous PR, not sure why it was reverted.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98595
Approved by: https://github.com/wconstab
2023-04-07 21:48:05 +00:00
Nikita Karetnikov
1c226f5aad [pt2] add meta functions for cummax and cummin (#98552)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98552
Approved by: https://github.com/Chillee
2023-04-07 17:58:28 +00:00
albanD
0210481dcb Fix _like meta registrations (#98160)
The meta implementation for these _like functions is wrong whenever device != "meta" (it doesn't fill the memory!).
zeros_like is special due to sparse and is fixed directly by always filling it with zeros.
Every other one has a CompositeExplicit implementation, so I went with removing their meta registrations and tweaking code to avoid infinite recursions.
I could do the same as zeros_like (and add the proper filling for each), but that would duplicate the C++ logic and make the meta registrations non-trivial. I can do it if you prefer that to removal.

test_meta works fine with these fixes, relying on CI to see if other tests are breaking as well.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98160
Approved by: https://github.com/ezyang
2023-04-06 18:44:34 +00:00
PyTorch MergeBot
67d1a77086 Revert "Move functional collectives implementation to python. (#98315)"
This reverts commit 8b0374f83c.

Reverted https://github.com/pytorch/pytorch/pull/98315 on behalf of https://github.com/huydhn due to Sorry for reverting for PR. This is failing in trunk probably due to a landrace
2023-04-06 16:49:40 +00:00
Nikita Karetnikov
7b25976323 [pt2] add meta function for take (#98451)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98451
Approved by: https://github.com/ezyang
2023-04-06 14:48:35 +00:00
Rodrigo Kumpera
8b0374f83c Move functional collectives implementation to python. (#98315)
This simplifies a lot of the work we need to do to add new ops.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98315
Approved by: https://github.com/albanD, https://github.com/wconstab, https://github.com/Neilblaze
2023-04-06 14:06:16 +00:00
PyTorch MergeBot
fa08e546f3 Revert "Add all_reduce_coalesced functional collective (#97157)"
This reverts commit a3fc3531f5.

Reverted https://github.com/pytorch/pytorch/pull/97157 on behalf of https://github.com/huydhn due to Sorry for reverting your PR, but it seems to have a land race with https://github.com/pytorch/pytorch/pull/96226 and fails lint on trunk
2023-04-04 01:50:49 +00:00
Rodrigo Kumpera
a3fc3531f5 Add all_reduce_coalesced functional collective (#97157)
Inductor codegen is suboptimal when calling all_reduce_coalesced with input args. We need to fix inductor's calling convention for that, or something else.

Might not work if any output is unused.

Test code:

```python
import torch
import torch.distributed as dist
import torch.nn.functional as F
from functorch import make_fx
import os

import torch.distributed._functional_collectives as ft_c
from torch.testing._internal.common_distributed import (
    spawn_threads_and_init_comms,
)
from torch._inductor.compile_fx import compile_fx_inner

def my_fun(a, b):
    c = a * 3
    tensors = ft_c.all_reduce_coalesced([a, c, b], "sum", [0])
    return ((tensors[1] + tensors[0] + tensors[2]).sum(), )

@spawn_threads_and_init_comms(world_size=1)
def inductor_main(self):

    x = torch.arange(4).cuda() * (dist.get_rank() + 1)
    y = torch.arange(4).cuda() * (dist.get_rank() + 1)
    x = x.to(torch.float)
    y = y.to(torch.float) * 0.5
    res = make_fx(my_fun)(x, y)
    print(f"fx graph:\n{res.graph}")
    ind = compile_fx_inner(res, [x, y])
    print(f"inductor done:\n{ind}")

os.environ["PROXY_TENSOR_TRACING"] = "1"
os.environ["TORCH_COMPILE_DEBUG"] = "1"
torch._dynamo.config.output_code = True

if __name__ == "__main__":
    inductor_main(None)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97157
Approved by: https://github.com/fegin
2023-04-04 01:13:18 +00:00
Shen Li
e8d39606eb [SPMD] Enable fused Adam in full train step tracing (#98113)
Differential Revision: [](https://our.internmc.facebook.com/intern/diff/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98113
Approved by: https://github.com/yifuwang, https://github.com/fegin
2023-04-01 15:54:13 +00:00
Shen Li
bccf2ef0ce Format DTensor dispatch.py and _meta_registrations.py (#98114)
Format-only changes with black and lintrunner to prepare for the commit on top.

Differential Revision: [D44603809](https://our.internmc.facebook.com/intern/diff/D44603809)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98114
Approved by: https://github.com/yifuwang, https://github.com/fegin
2023-04-01 15:54:13 +00:00
Shen Li
9ec6fdb29b Enable adam foreach in full train step tracing (#97897)
Main changes:

1. Registered several foreach ops to both meta and DTensor
2. Skip redundant getitem node when expanding foreach ops with DTensor
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97897
Approved by: https://github.com/wanchaol, https://github.com/fegin
2023-03-30 16:47:10 +00:00
Shen Li
379fb47654 [SPMD] Support foreach optimizers with functionalization (#97853)
My first attempt was to apply the same solution as how proxy_tensor.py
handles other inplace ops. However, foreach is different in the way
that its schema in `native_functions.yaml` does not return anything,
whereas ops like `addcmul_` and `addcdiv_` do return Tensors (Thanks
bdhirsh for teaching me this!). As a result, the proxy output
during tracing does not wrap anything, and hence we cannot correctly
connect it with subsequent operators. Modifying `native_functions.yaml`
is not a preferred solution. After discussing with bdhirsh, the
temporary solution is to do foreach functionalization as a graph
pass for now. Later, when https://github.com/pytorch/pytorch/issues/97852
is addressed, we will switch to default functionalization.

Edit: the latest version follows @bdhirsh's suggestion of using
`make_fx`'s `decomposition_table` instead of implementing manual
fx.Graph transforms to functionalize `_foreach_add_`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97853
Approved by: https://github.com/fegin, https://github.com/wanchaol
2023-03-30 11:27:10 +00:00
Christian Puhrsch
9d37cefcb0 Resubmit _int_mm (#96685)
Avoids any changes to gemm_and_bias

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96685
Approved by: https://github.com/drisspg, https://github.com/ngimel
2023-03-27 16:14:07 +00:00
Shen Li
a8f7e0b213 [Easy] Improve error message for meta_mm (#97533)
Differential Revision: [D44376381](https://our.internmc.facebook.com/intern/diff/D44376381)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97533
Approved by: https://github.com/bdhirsh, https://github.com/albanD
2023-03-25 01:09:41 +00:00
Driss Guessous
98a5cf090d [SDPA] Remove the chunk_grad from mem-eff attention (#96880)
# Summary

There exists an optimization within the scaled_dot_product_efficient backward attention path to, under the right conditions, output grad_q, grad_k, grad_v all as aliases of the same storage. This was done to optimize for the hot path where mha does packed linear_projection -> chunk -> (view stuff) -> sdpa. The thought was that chunk would then be able to "trivially" cat its inputs in chunk.backward(). However, upon closer inspection, chunk.backward calls `cat` regardless of the inputs, so this optimization is not being utilized.

I validated this by profiling on main and then on this branch, and the traces produced were the same, with `split.backward()` calling into cat in both.
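For reference, a simplified single-head sketch of the hot path mentioned above (packed projection -> chunk -> SDPA); names and shapes are illustrative, not the actual mha code:

```python
import torch
import torch.nn.functional as F

x = torch.randn(2, 16, 64)          # (batch, seq, embed)
w_qkv = torch.randn(3 * 64, 64)     # packed q/k/v projection weight

qkv = F.linear(x, w_qkv)            # packed linear projection
q, k, v = qkv.chunk(3, dim=-1)      # chunk into query, key, value
out = F.scaled_dot_product_attention(q, k, v)
print(out.shape)                    # torch.Size([2, 16, 64])
```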

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96880
Approved by: https://github.com/cpuhrsch
2023-03-17 21:28:25 +00:00
Nikita Karetnikov
bf08d1387c [primTorch] handle out in sort meta function (#96719)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96719
Approved by: https://github.com/ezyang
2023-03-16 07:38:53 +00:00
Christian Puhrsch
0a53c9624a Back out "Add _int_mm to expose cuBLAS int8@int8 -> int32 matmul (#94339)" (#96885)
Summary:
Backing out  _int_mm to expose cuBLAS int8@int8 -> int32 matmul (#94339)

Test Plan: CI

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96885
Approved by: https://github.com/drisspg
2023-03-16 05:32:55 +00:00
BowenBao
60a68477a6 Bump black version to 23.1.0 (#96578)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96578
Approved by: https://github.com/ezyang
2023-03-15 06:27:59 +00:00
Nikita Karetnikov
ec536232a3 [primTorch] add meta implementation for upsample_nearest2d_backward (#96612)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96612
Approved by: https://github.com/ezyang
2023-03-14 06:51:42 +00:00
PyTorch MergeBot
be220690d9 Revert "[primTorch] add meta implementation for upsample_nearest2d_backward (#96612)"
This reverts commit fe180596b8.

Reverted https://github.com/pytorch/pytorch/pull/96612 on behalf of https://github.com/malfet due to broke lint
2023-03-13 03:07:23 +00:00
Nikita Karetnikov
fe180596b8 [primTorch] add meta implementation for upsample_nearest2d_backward (#96612)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96612
Approved by: https://github.com/ezyang
2023-03-13 00:25:23 +00:00
Nikita Karetnikov
cb7c796b4b Enable min.unary_out (#96441)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96441
Approved by: https://github.com/ngimel
2023-03-11 19:23:33 +00:00
Nikita Karetnikov
0d7c44096a Add baddbmm meta function (#96548)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96548
Approved by: https://github.com/ezyang
2023-03-11 19:09:24 +00:00
Nikita Karetnikov
8e0d5bf538 [primTorch] add meta implementation for aten.min.dim (#96442)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96442
Approved by: https://github.com/ngimel
2023-03-11 18:51:51 +00:00
Driss Guessous
11aab72dc9 [SDPA] Add an optional scale kwarg (#95259)
# Summary
This PR adds an optional kwarg to torch.nn.functional.scaled_dot_product_attention().
The new kwarg is a scaling factor that is applied after the q@k.T step of the computation. I made updates to the efficient kernel to support it, while flash and math were minimally updated to support it as well.

Will reduce the complexity of: #94729 and has been asked for by a couple of users.

# Review Highlights
- As far as I know I did this the correct way, and it is both BC and FC compliant. However, I always seem to break internal workloads, so I would love it if someone could advise whether I did this right.
- I named the optional arg 'scale'. This is probably dumb and I should name it 'scale_factor'. I will make this change, but it is annoying and will require someone deciding that we should rename it.
- 'scale' is interpreted as `Q@K.T * (scale)`
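A hedged example of the new kwarg described above: `scale` replaces the default 1/sqrt(head_dim) factor applied to Q @ K.T (shapes are illustrative):

```python
import torch
import torch.nn.functional as F

q = torch.randn(2, 4, 16, 8)   # (batch, heads, seq, head_dim)
k = torch.randn(2, 4, 16, 8)
v = torch.randn(2, 4, 16, 8)

out_default = F.scaled_dot_product_attention(q, k, v)             # scale = 1/sqrt(8)
out_custom = F.scaled_dot_product_attention(q, k, v, scale=0.5)   # user-supplied scale
print(out_default.shape, out_custom.shape)
```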

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95259
Approved by: https://github.com/cpuhrsch
2023-03-08 18:07:40 +00:00
Wonjoo Lee
3095c95828 Fixes for PyTorch/XLA functionalization integration (#94537)
Fixes for PyTorch/XLA functionalization integration

---
Some notable changes include:
- More asserts in `FunctionalTensorWrapper`, so bugs show up more cleanly in cases where we e.g. forget to wrap an output
- Make the *_scatter ops `CompositeExplicitAutogradNonFunctional`, so we get a better error message and XLA doesn't accidentally try to use them
- Fix LTC/XLA codegen in core to handle multi-tensor out= ops with no returns
- Better erroring: Allow XLA to use the CPU fallback from core in a way so that it always errors on view ops, which XLA should no longer see.
- Update MetaConverter to exclude XLA tensors in raising NotImplemented…
- Add `_propagate_xla_data` op
- Add meta tensor support for some ops
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94537
Approved by: https://github.com/bdhirsh
2023-03-02 23:02:34 +00:00
Wanchao Liang
f397d1700f Inductor reduce_scatter_tensor (#95764)
This adds reduce_scatter to the functional collective and adds the
inductor lowering support

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95764
Approved by: https://github.com/kumpera
2023-03-02 22:05:30 +00:00
Will Constable
cc6da7b901 Inductor allgather_into_tensor (#95530)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95530
Approved by: https://github.com/kumpera
2023-02-27 21:38:36 +00:00
Christian Puhrsch
1fe2a9d122 Add _int_mm to expose cuBLAS int8@int8 -> int32 matmul (#94339)
Add an _int_mm primitive that binds the cuBLAS int8@int8 -> int32 matmul and that translates to Triton-based mm templates under max autotune. This is a very useful first step towards better supporting quantization on the GPU. This is not a user-facing API, but an internal primitive.
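A hedged sketch of calling the primitive (assumes a CUDA device; cuBLAS imposes shape constraints, roughly that dimensions be larger than 16 and multiples of 8):

```python
import torch

a = torch.randint(-128, 127, (32, 64), dtype=torch.int8, device="cuda")
b = torch.randint(-128, 127, (64, 32), dtype=torch.int8, device="cuda")

c = torch._int_mm(a, b)   # int8 @ int8 with int32 accumulation
print(c.dtype)            # torch.int32
```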

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94339
Approved by: https://github.com/ngimel, https://github.com/jansel
2023-02-27 20:27:25 +00:00
HELSON
f43ce9553b [meta_tensor] polish error strings in meta registrations (#95052)
I found that some error messages should be formatted for more detailed information, so I polished those error messages.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95052
Approved by: https://github.com/bdhirsh
2023-02-27 20:12:09 +00:00
Edward Z. Yang
4833e47feb Add support for nonzero, some improvements to reduce guards (#95387)
This takes the strategy described in https://docs.google.com/document/d/1lFRYAJo5nrfxRhwIzGnfi2pbLpU6T4ytSRSuLJ5qebI/edit#

It is essentially https://github.com/pytorch/pytorch/pull/95222 but squashed and with changes that are unnecessary given that we assume nonzero returns > 1.

What's in the PR:

* nonzero now supports meta propagation. When `capture_dynamic_output_shape_ops` is set, it will return a tensor with an unbacked SymInt representing the size in question (see the sketch after this list)
* The unbacked SymInt is UNSOUNDLY assumed to be not equal to 0/1. We will still error if you guard otherwise.
* PrimTorch pointwise operators are updated to use empty_permuted, to avoid guarding on unbacked SymInt from empty_strided (tested in `test_dynamic_pointwise_scalar`)
* Convolution is updated to skip backend selection if batch is unbacked, to avoid guarding on unbacked SymInt (tested in `test_unbacked_batch_resnet`)
* I kept the helper utilities like `definitely_true` for working with possibly unbacked SymInts. They're not used right now but maybe someone will find them useful.
* Added `constrain_unify` to let you specify two unbacked SymInts must have the same value
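A minimal sketch of the data-dependent-shape path described above (hypothetical function; uses the `eager` dynamo backend, with the config flag named in the PR, so nonzero's output length is traced as an unbacked SymInt):

```python
import torch
import torch._dynamo

torch._dynamo.config.capture_dynamic_output_shape_ops = True

@torch.compile(backend="eager")
def count_hits(x):
    nz = torch.nonzero(x)     # output size depends on the data (unbacked SymInt)
    return (nz + 1).sum()

print(count_hits(torch.tensor([0.0, 1.5, 0.0, 2.5])))
```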

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95387
Approved by: https://github.com/voznesenskym
2023-02-24 00:27:45 +00:00
Yanan Cao (PyTorch)
039b4c8809 Add meta function for _upsample_bilinear2d_aa (#94982)
Differential Revision: D43353000

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94982
Approved by: https://github.com/ezyang
2023-02-19 07:11:20 +00:00
Rodrigo Kumpera
e22d791287 [PTD] Introduce tracing friendly collectives. (#93990)
This change adds torch.distributed.traceable_collectives.

This experimental API enables collectives to be fully traced by dynamo and FX.

See #93173 for the RFC

Pull Request resolved: https://github.com/pytorch/pytorch/pull/93990
Approved by: https://github.com/wconstab, https://github.com/wanchaol, https://github.com/H-Huang
2023-02-16 15:35:01 +00:00
PyTorch MergeBot
641dc0b844 Revert "[quant] Add quantize and dequantize operators to decomposition table (#93312)"
This reverts commit 782e4f5c02.

Reverted https://github.com/pytorch/pytorch/pull/93312 on behalf of https://github.com/jeanschmidt due to this commits breaks internal builds: https://fburl.com/sandcastle/dw0rqcbv
2023-02-13 09:20:37 +00:00
Jerry Zhang
782e4f5c02 [quant] Add quantize and dequantize operators to decomposition table (#93312)
Summary:
This PR tries to decompose the operators in the torch.ops.quantized_decomposed namespace into more
primitive aten operators; this would free us from maintaining the semantics of the quantize/dequantize
operators, which can be expressed more precisely in terms of the underlying aten operators.

Note: this PR just adds them to the decomposition table; we haven't enabled this by default yet.
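A hedged sketch of the idea (not the exact registered decomposition; the helper name is hypothetical): a per-tensor quantize expressed with primitive aten ops only.

```python
import torch

def quantize_per_tensor_decomp(x, scale, zero_point, quant_min, quant_max, dtype):
    # scale, round, shift by zero point, clamp to the quantized range, then cast
    q = torch.round(x / scale) + zero_point
    return torch.clamp(q, quant_min, quant_max).to(dtype)

x = torch.randn(4)
print(quantize_per_tensor_decomp(x, 0.1, 0, -128, 127, torch.int8))
```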

Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_q_dq_decomposition

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/93312
Approved by: https://github.com/vkuzo, https://github.com/SherlockNoMad
2023-02-10 01:40:12 +00:00
PyTorch MergeBot
3a5a762443 Revert "[quant] Add quantize and dequantize operators to decomposition table (#93312)"
This reverts commit 3fd46a2f9c.

Reverted https://github.com/pytorch/pytorch/pull/93312 on behalf of https://github.com/huydhn due to Sorry for reverting your PR, but it breaks trunk due to a landrace 3fd46a2f9c.  Please rebase and re-land it
2023-02-08 18:29:10 +00:00
Jerry Zhang
3fd46a2f9c [quant] Add quantize and dequantize operators to decomposition table (#93312)
Summary:
This PR tries to decompose the operators in the torch.ops.quantized_decomposed namespace into more
primitive aten operators; this would free us from maintaining the semantics of the quantize/dequantize
operators, which can be expressed more precisely in terms of the underlying aten operators.

Note: this PR just adds them to the decomposition table; we haven't enabled this by default yet.

Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_q_dq_decomposition

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/93312
Approved by: https://github.com/vkuzo, https://github.com/SherlockNoMad
2023-02-08 17:26:01 +00:00
Peter Bell
5817695bfa [pt2] Fix arange to match ATen behavior (#93353)
Fixes #92676

`arange` infers the output dtype from the argument types, but in order to reduce
falling back to ATen, inductor preferred to cast whole-number float arguments to
int, which gave the wrong output dtype. Instead, this decomposes floating-point
arange into the prim equivalent for integers.

This also changes the signature of `prims.arange` to

```python
prims.iota(length, *, start, step, **factory_kwargs)
```

which only supports integer arguments. This is done because calculating the
output size from `start, end, step` is surprisingly complex and liable to off-by-one
errors, so it should not be duplicated in each backend.
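A hedged sketch of the decomposition idea in plain Python (hypothetical helper, not the registered decomposition itself): a floating-point arange can be built from an integer iota plus scalar arithmetic, so only the integer primitive needs a backend implementation.

```python
import math
import torch

def float_arange(start, end, step, dtype=torch.float32):
    length = math.ceil((end - start) / step)   # output size computed once, up front
    idx = torch.arange(length)                 # integer "iota"
    return (start + idx * step).to(dtype)

print(float_arange(0.0, 1.0, 0.25))   # tensor([0.0000, 0.2500, 0.5000, 0.7500])
```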

Pull Request resolved: https://github.com/pytorch/pytorch/pull/93353
Approved by: https://github.com/ngimel, https://github.com/lezcano
2023-02-03 00:44:32 +00:00
Michael Suo
4e4293f15f Add meta registration for bucketize (#93893)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/93893
Approved by: https://github.com/zhxchen17
2023-02-02 21:03:08 +00:00
Driss Guessous
653dc73df0 [SDPA] Wire up FlashAttention's backward (#92917)
# Summary
This PR creates _flash_attention_backward and _scaled_dot_product_flash_attention_backward native functions and registers them to the respective derivatives.yaml.

The goal is to replicate the torch.autograd.Function defined in the FlashAttention repo [here](33e0860c9c/flash_attn/flash_attn_interface.py (L126)) natively in PyTorch. One thing we don't have access to in native PyTorch is ctx.save_for_backward, so in order to save these variables I extended the objects returned from the forward functions.

### MetaFunctions
I also updated the FlashAttention meta functions to now mirror the real outputs, and added a meta registration for backwards. I have an XLMR training script, and while eager training now works with FlashAttention, compiling this module fails with the inductor error down below.

### Questions?
Performance issues vs mem efficient when using torch.nn.mha_forward

TorchCompile -> See proposed solution below.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92917
Approved by: https://github.com/cpuhrsch
2023-02-02 04:02:30 +00:00
Driss Guessous
df14650f0b [SDPA] Update SDPA API and make function Public (#92189)
# Summary
In preparation for the PT 2.0 launch, this PR updates SDPA's API and makes the function a public nn.functional function.

## Changes
### API
Previously the function signature was:
`scaled_dot_product_attention(query, key, value, attn_mask=None, need_attn_weights=False, dropout_p=0.0, is_causal=False) -> (Tensor, Tensor)`
Updated signature:
`scaled_dot_product_attention(query, key, value, attn_mask=None, dropout_p=0.0, is_causal=False) -> Tensor`

This PR removes the need_attn_weights optional boolean variable and updates the return type to a singular tensor.

#### Reasoning:
The main goal of this function is to provide an easy interface for users to call into fused attention kernels, e.g. FlashAttention. The fused kernels do not currently support an arbitrary attn_mask or dropout, but there is a PR to mem-efficient attention to enable these. We want to have the API surface ready for when the backing kernels get updated.

The fused kernels save on memory usage by not materializing the weights, and it is unlikely that a fast fused implementation will enable this feature, so we are removing it.

Discussed with folks at FAIR/Xformers and +1 this API change.

#### Make function Public
In preparation for the PT 2.0 launch, we make the function public to start generating user feedback.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92189
Approved by: https://github.com/cpuhrsch
2023-01-23 20:50:46 +00:00
Brian Hirsh
76cb2d0ede fix incorrect _embedding_bag meta (#92549)
Fixes https://github.com/pytorch/pytorch/issues/92286. See the issue for diagnosis.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92549
Approved by: https://github.com/albanD, https://github.com/eellison
2023-01-18 22:50:31 +00:00
Avik Chaudhuri
bb11e072ae Squash and merge linalg meta kernels (#92335)
Squashed changes from https://github.com/pytorch/pytorch/pull/92021 and https://github.com/pytorch/pytorch/pull/92020 and https://github.com/pytorch/pytorch/pull/92019

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92335
Approved by: https://github.com/avikchaudhuri
2023-01-18 05:55:52 +00:00
yanbing-j
94a7c01159 Enable oneDNN implementation in LSTM op (#91158)
### Description
This PR enables a oneDNN implementation in the LSTM op to improve its performance. Both FP32 and BF16 are supported.

### Performance improvement
On CPX 28C, with iomp and jemalloc set.
We chose 8 LSTM input options (including input_size, hidden_size, num_layers, bidirectional, bias, batch_first, dropout, batch_size, seq_len), and the final option is a real input from train-clean-100 in the LibriSpeech dataset. The performance improvements are shown in the following figures. We can see that LSTM with the oneDNN implementation performs better than the original.

In single socket:
![image](https://user-images.githubusercontent.com/61222868/211182994-833debec-518a-4b35-8504-6b0fadb17930.png)

![image](https://user-images.githubusercontent.com/61222868/211183012-31e1253f-2c60-4c92-a656-c239a971b453.png)

In single core:
![image](https://user-images.githubusercontent.com/61222868/211183017-186e5d47-cb9a-4c1e-914f-fa718e769f1c.png)

![image](https://user-images.githubusercontent.com/61222868/211183022-53266857-5a9e-4a95-b300-33fa34811d08.png)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91158
Approved by: https://github.com/jgong5, https://github.com/malfet
2023-01-18 04:41:18 +00:00
Driss Guessous
f219970990 Return empty attention weights when need_atten_weights = False (#91782)
# Summary
This PR updates the second return value from SDPA to be an empty tensor of size 0, rather than what it would be if need_attn_weights were True. It also updates the meta function to account for this change.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91782
Approved by: https://github.com/cpuhrsch
2023-01-06 19:06:48 +00:00
Xia, Weiwen
de9c82f41a [Meta] Register aten.pixel_shuffle.default for meta (#91605)
**Summary**
Fixes #91551
`aten.pixel_shuffle.default` is not registered for meta and it always generates contiguous (channels-first) layout of outputs. It can be reproduced by `torch.compile` (as described in the issue #91551) and running in FakeTensorMode.

**Test plan**
python test/inductor/test_torchinductor.py -k test_pixel_shuffle_channels_last
python test/test_proxy_tensor.py

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91605
Approved by: https://github.com/jgong5, https://github.com/mingfeima, https://github.com/anijain2305
2023-01-06 00:45:14 +00:00
Peter Bell
ad7aefb608 Fix Meta tests for FFT functions (#91628)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91628
Approved by: https://github.com/kit1980
2023-01-05 00:58:26 +00:00
XiaobingSuper
dfb651452a inductor: meta registration for mkldnn ops (#91299)
Fixes https://github.com/pytorch/torchdynamo/issues/198 by adding meta tensor support for the conv/linear fused ops to reduce compilation time.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91299
Approved by: https://github.com/jgong5, https://github.com/EikanWang, https://github.com/jansel
2023-01-03 14:24:36 +00:00
Joel Schlosser
1c40ec46ff Decomps and meta registrations for upsample_nearest 1D / 2D / 3D (#91260)
Adds decompositions and meta registrations for the 1D, 2D, and 3D implementations of `upsample_nearest`. All related OpInfo-based tests for AOTAutograd now pass.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91260
Approved by: https://github.com/ezyang
2022-12-28 16:03:25 +00:00
Yanbo Liang
789b1437e9 Fix meta registration for aten._cudnn_rnn (#91333)
Found this issue from [weekly running 7k github models](https://github.com/pytorch/torchdynamo/issues/1884). This caused a regression on the pass rate; 25 models failed due to this issue.
The reason is that the argument ```cx``` of ```aten._cudnn_rnn``` can be ```None```, but this case is not handled well in the meta registration, so it throws the following error:
```
Traceback (most recent call last):
  File "/scratch/ybliang/work/repos/pytorch/torch/_dynamo/utils.py", line 1059, in run_node
    return nnmodule(*args, **kwargs)
  File "/scratch/ybliang/work/repos/pytorch/torch/nn/modules/module.py", line 1482, in _call_impl
    return forward_call(*args, **kwargs)
  File "/scratch/ybliang/work/repos/pytorch/torch/nn/modules/rnn.py", line 477, in forward
    result = _VF.rnn_tanh(input, hx, self._flat_weights, self.bias, self.num_layers,
  File "/scratch/ybliang/work/repos/pytorch/torch/_subclasses/fake_tensor.py", line 916, in __torch_dispatch__
    r = func(*args, **kwargs)
  File "/scratch/ybliang/work/repos/pytorch/torch/_ops.py", line 284, in __call__
    return self._op(*args, **kwargs or {})
  File "/scratch/ybliang/work/repos/pytorch/torch/_meta_registrations.py", line 2108, in _cudnn_rnn
    cy = cx.new_empty(0 if cx is None else cell_shape)
AttributeError: 'NoneType' object has no attribute 'new_empty'
```
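A hedged sketch of the guard needed (simplified, hypothetical helper name; the actual registration differs): only call `new_empty` on `cx` when it is actually a tensor.

```python
import torch

def make_cy(cx, cell_shape):
    # `cx` is None for plain RNN/GRU, so guard before calling tensor methods on it
    if cx is None:
        return None
    return cx.new_empty(cell_shape)

print(make_cy(None, (1, 2, 8)))                                       # None
print(make_cy(torch.empty(1, 2, 8, device="meta"), (1, 2, 8)).shape)  # torch.Size([1, 2, 8])
```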

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91333
Approved by: https://github.com/ezyang
2022-12-23 22:59:31 +00:00
Brian Hirsh
c47bdd7522 *_scatter ops should preserve input stride/storage_offset (#91029)
It turns out that we *do* need to update the *_scatter ops to return the exact same strides as their inputs. I added a test to `test/test_functionalization.py`, which now trips thanks to Ed's functionalization stride debugging check. It only actually ends up causing silent correctness issues if you try to call .backward() on that function.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91029
Approved by: https://github.com/ezyang
2022-12-22 19:41:53 +00:00
Joel Schlosser
3226209636 LSTM SymInt-aware changes & meta registration (cuDNN) (#90944)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90944
Approved by: https://github.com/ezyang
2022-12-16 21:42:32 +00:00
Joel Schlosser
b0cda0b38c LSTM SymInt-aware changes & meta registration (non-cuDNN CUDA) (#90701)
Adds meta registrations for cuDNN and vanilla CUDA ops underneath `lstm()` and makes the logic SymInt-aware.
TODO:
* cuDNN side does some [nasty stuff](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/cudnn/RNN.cpp#L1567) with buffers; this needs larger redesign to figure out
* Indicate that AOT Autograd can be used when an LSTM is present (remove the check for this once it's fully supported)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90701
Approved by: https://github.com/ezyang
2022-12-16 18:08:45 +00:00
Driss Guessous
51c6c5e156 [SDPA] Standardizes the return shape for dense tensor of SDPA regardless of fused kernel called (#90776)
# Summary
Continues to fix up the meta output story of SDPA to be more correct

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90776
Approved by: https://github.com/cpuhrsch
2022-12-14 18:08:02 +00:00
Driss Guessous
42a5f6ee5d Create stub function for doing SDPA cpp and cuda dispatch (#90576)
## Summary
torch.compile was previously not working for TransformerEncoder because torch SDPA calls a native function on tensors that returns an int. This PR instead creates a dispatch stub for the called function so that a separate fx node is not created for this native function.
This PR also adds meta functions for the fused kernels.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90576
Approved by: https://github.com/cpuhrsch
2022-12-13 03:19:40 +00:00
Peter Bell
79406378ae [primTorch] Add prim and ref for as_strided_scatter (#88426)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88426
Approved by: https://github.com/mruberry
2022-12-08 00:17:39 +00:00
Sherlock Huang
42705bd7b3 Disallow registering meta function for CompositeImplicitAutograd ops (#90222)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90222
Approved by: https://github.com/ezyang
2022-12-06 04:22:31 +00:00
Yanbo Liang
e1532af0bb Fix meta registration for aten._cdist_forward (#90042)
Error from [7k github model](https://github.com/pytorch/torchdynamo/issues/1884).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90042
Approved by: https://github.com/ezyang, https://github.com/eellison
2022-12-02 21:13:52 +00:00
Edward Z. Yang
6fb6eb0a74 Support unspecialized integers with dynamic shapes (#89639)
Previously, we hackily wrapped unspecialized integers into
tensors and treated them as tensor inputs.  Sometimes, downstream
operations would not be able to deal with the tensor input.  Now,
we wrap them into SymInt, so more correct overload selection occurs.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89639
Approved by: https://github.com/anjali411
2022-11-24 22:46:42 +00:00
anjali411
9c0bf9387c Meta impl for linalg_cholesky and linalg_cholesky_ex (#89430)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89430
Approved by: https://github.com/ezyang
2022-11-22 17:05:34 +00:00
Edward Z. Yang
5582001bd5 Reland 2 "Towards unifying symbolic and non symbolic fake tensor (#89038) (#89143)" (#89346)
This reverts commit 8e4c9828f4.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89346
Approved by: https://github.com/wconstab
2022-11-19 21:14:31 +00:00
PyTorch MergeBot
8e4c9828f4 Revert "Reland "Towards unifying symbolic and non symbolic fake tensor (#89038)" (#89143)"
This reverts commit e686b8c3ba.

Reverted https://github.com/pytorch/pytorch/pull/89143 on behalf of https://github.com/ZainRizvi due to This seems to be causing the test_make_fx_symbolic_exhaustive_rad2deg_cpu_float32 and test_make_fx_symbolic_exhaustive_inplace_rad2deg_cpu_float32 test to fail across multiple jobs
2022-11-17 17:02:36 +00:00
Edward Z. Yang
e686b8c3ba Reland "Towards unifying symbolic and non symbolic fake tensor (#89038)" (#89143)
This reverts commit cf6003f046.

Differential Revision: [D41363992](https://our.internmc.facebook.com/intern/diff/D41363992)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89143
Approved by: https://github.com/albanD
2022-11-17 13:55:06 +00:00
PyTorch MergeBot
cf6003f046 Revert "Towards unifying symbolic and non symbolic fake tensor (#89038)"
This reverts commit 37d54239c7.

Reverted https://github.com/pytorch/pytorch/pull/89038 on behalf of https://github.com/ezyang due to executorch segfaults
2022-11-16 16:52:47 +00:00
Edward Z. Yang
37d54239c7 Towards unifying symbolic and non symbolic fake tensor (#89038)
Fake tensor behaves pretty differently depending on whether you have
symbolic shapes or not.  This leads to bugs; for example, we
weren't getting correct convolution_backward strides because we
bypassed the correct stride logic in fake tensor on symbolic
shapes.

This PR attempts to unify the two codepaths.  I don't manage to
unify everything, but I get most of it.  The algorithm is delicate
and I'm still hosing down test failures.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89038
Approved by: https://github.com/anjali411
2022-11-16 14:02:43 +00:00
anjali411
dc40d3f93f Add meta impl for grid_sampler_2d_backward (#88745)
TODO: add an OpInfo

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88745
Approved by: https://github.com/ezyang
2022-11-16 13:01:47 +00:00
anjali411
52be0c42ab meta function for max_pool2d_with_indices_backward (#88743)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88743
Approved by: https://github.com/lezcano, https://github.com/ezyang
2022-11-13 18:31:56 +00:00
anjali411
d615d12289 Add meta impl for topk (#88694)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88694
Approved by: https://github.com/ezyang
2022-11-11 15:28:41 +00:00
anjali411
fc9e36dd42 Add meta support for scalar_tensor and argmax (#88590)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88590
Approved by: https://github.com/albanD
2022-11-11 01:31:00 +00:00
Sherlock Huang
133e61af7a OpOverload is_view (#88722)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88722
Approved by: https://github.com/ezyang
2022-11-09 19:03:12 +00:00
Edward Z. Yang
d81797e845 Meta function for aten.sort and aten.scatter* (#88705)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88705
Approved by: https://github.com/ezyang
2022-11-09 17:47:14 +00:00
Fabio Rocha
652af5ec15 upsample_*.vec ops are now CompositeImplicit (#85638)
It was previously CompositeExplicit but it was not really necessary.
See discussion in https://github.com/pytorch/pytorch/issues/85405

Pull Request resolved: https://github.com/pytorch/pytorch/pull/85638
Approved by: https://github.com/ezyang, https://github.com/lezcano, https://github.com/malfet, https://github.com/jansel
2022-11-09 09:58:04 +00:00
Edward Z. Yang
f0e6cea2ed Meta registrations for inplace operators (#88678)
Also, handle non-default alpha correctly.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88678
Approved by: https://github.com/SherlockNoMad, https://github.com/albanD
2022-11-09 01:27:01 +00:00
Edward Z. Yang
a880ddc164 Meta implementation for unsqueeze_ (#88675)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88675
Approved by: https://github.com/SherlockNoMad
2022-11-09 01:27:01 +00:00
Edward Z. Yang
1dab35ca1b Meta implementation for bernoulli (#88676)
For some reason bernoulli uses legacy memory format, see linked issue.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88676
Approved by: https://github.com/SherlockNoMad
2022-11-09 01:26:58 +00:00
Edward Z. Yang
6bb7f4f29f Minor error message improvements on meta functions (#88677)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88677
Approved by: https://github.com/SherlockNoMad
2022-11-08 19:16:29 +00:00
Edward Z. Yang
245144a636 Propagate layout and pin memory in randint to inner constructor (#88673)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88673
Approved by: https://github.com/anjali411
2022-11-08 18:22:30 +00:00
Sherlock Huang
95d57b54e0 Handle pin_memory in refs.randn (#88473)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88473
Approved by: https://github.com/mruberry
2022-11-07 20:25:56 +00:00
Elias Ellison
7d95b1e344 Run all fallback kernels with FakeTensor (#88248)
This improves the memory compression of resnet18 from .84 -> .94 on inductor no-cudagraphs. It does mean that any extern kernel which incorrectly computes strides will now produce a hard error at runtime, but that's an issue we are going to have to face with dynamic shapes anyway. CC @ezyang, @SherlockNoMad
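For context, a minimal illustration of what running a kernel with fake tensors means (this is not the PR's code; it assumes the `FakeTensorMode` API under `torch._subclasses`, which may differ across versions):

```python
import torch
from torch._subclasses.fake_tensor import FakeTensorMode

# Fake tensors carry only metadata (shape, dtype, stride); ops run "for
# metadata" without real computation, so a kernel that reports wrong output
# strides can be caught as a mismatch rather than silently propagating.
mode = FakeTensorMode()
fake_x = mode.from_tensor(torch.randn(8, 16))

with mode:
    fake_y = torch.mm(fake_x, fake_x.t())  # no real matmul is performed

print(fake_y.shape, fake_y.stride(), fake_y.dtype)
```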
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88248
Approved by: https://github.com/ezyang
2022-11-04 02:06:38 +00:00
Brian Hirsh
70782981f0 aot_dispatch test fix: always use functionalization in symbolic tests (#87647)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87647
Approved by: https://github.com/ezyang, https://github.com/Chillee
2022-11-02 14:36:49 +00:00
Tugsbayasgalan Manlaibaatar
2c7de4a144 Add meta implementation for aten.max.dim (#88005)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88005
Approved by: https://github.com/Chillee, https://github.com/bdhirsh
2022-11-01 18:37:24 +00:00
Sherlock Huang
c368c0faf0 Fix meta for aten.fill, constant_pad_nd, _adaptive_avg_pool2d (#88069)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88069
Approved by: https://github.com/ngimel, https://github.com/malfet
2022-11-01 15:36:06 +00:00
Sherlock Huang
c7ac333430 Fix args for meta__fused_moving_avg_obs_fq_helper (#88058)
Fixes https://github.com/pytorch/torchdynamo/issues/1802

There are a few problems:
1. torch.fused_moving_avg_obs_fake_quant doesn't have an OpInfo test.
2. self.empty_like() is not a valid call; it should be torch.empty_like(self).
3. The python meta function has some unexplained behavior for arguments whose default value is of bool type.

In particular, problem 3 is the most concerning one.
**UPDATE: This is expected behavior, see discussion below for explanation.**

Without setting default values for `per_row_fake_quant` and `symmetric_quant`, we get the following error when running with meta tensors.
```
meta__fused_moving_avg_obs_fq_helper() missing 2 required positional arguments: 'per_row_fake_quant' and 'symmetric_quant'
```
I can fix this by adding default values to these two args (see the sketch below). However, I observed something strange when examining the actual values in the meta function.

```
    print("per_row_fake_quant", per_row_fake_quant)
    print("symmetric_quant", symmetric_quant)
```

When the default values are False, the printed values correctly reflect the argument values populated from the call site.
When the default values are True, the printed values are ALWAYS True, regardless of the values populated from the call site.
When the default values are None, the printed value is `None` when the call site sets the value to 'False', and 'True' when the call site sets the value to 'True'.

I also verified that this bug affects other meta functions with default args....

My speculation is that this has something to do with pybind value packing when calling from the c++ dispatcher into the python meta function, and with default value parsing for python meta functions (and other python dispatch functions).

I tried to find the c++ call stack, but gdb is missing symbols and the C++ stacktrace is not working properly... I'd appreciate it if anyone could point me to the source file for pybind value packing.
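For reference, a hedged sketch of the default-value fix mentioned above (the function name and signature are abbreviated and hypothetical, not the op's exact schema):

```python
import torch

# Illustrative only: the trailing bool arguments now carry defaults, which
# resolves the "missing 2 required positional arguments" error quoted above.
def meta_fused_helper_sketch(self, per_row_fake_quant=False, symmetric_quant=False):
    # Use torch.empty_like(self), not self.empty_like(), which is not a valid call.
    output = torch.empty_like(self)
    mask = torch.empty_like(self, dtype=torch.bool)
    return output, mask
```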

cc @ezyang
cc @bdhirsh. I know you had a fix in the symbolic shape branch...
cc @yanboliang  who reported this bug
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88058
Approved by: https://github.com/bdhirsh, https://github.com/yanboliang
2022-10-31 19:00:16 +00:00
Sherlock Huang
0a4ca9d083 Fix meta for aten.angle and aten.index_copy (#88066)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88066
Approved by: https://github.com/albanD
2022-10-31 17:11:29 +00:00
Sherlock Huang
e8a97a3721 FakeTensorMode and Prims.add/sub/mul/div support scalar only inputs (#87759)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87759
Approved by: https://github.com/ngimel, https://github.com/mruberry, https://github.com/eellison
2022-10-28 04:34:25 +00:00
Sherlock Huang
b21fe312c0 Fix meta for index_add and index_put (#87775)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87775
Approved by: https://github.com/ezyang, https://github.com/ngimel
2022-10-26 20:33:23 +00:00
Sherlock Huang
0b162f5b49 Fix stride for prims.where (#87563)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87563
Approved by: https://github.com/ngimel, https://github.com/mruberry
2022-10-25 21:22:50 +00:00
Sherlock Huang
ece3758afc Fix _refs for aten.zeros/ones/empty/randn (#87569)
The refs for aten.zeros/ones/empty/randn don't support the .names overload.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87569
Approved by: https://github.com/ngimel
2022-10-25 20:06:57 +00:00
Sherlock Huang
eb99c1efce Prefer python meta function over c++ meta function (#87426)
This is a policy update for meta registration. **We now prefer the python meta implementation over the C++ meta function.**  This is a flip of the previous policy, where we preferred the C++ meta function over the python meta function if both existed.

Here's the meta registration process:
1. register_meta and register_decomposition will place the python meta/decomp functions into the `global_decomp_table`.  However, they will NOT register them into the dispatcher.
2. After `global_decomp_table` is populated, we will compile an `active_meta_table` (a sketch of this step follows the list below). For a given op, we pick the most specific decomp function from `global_decomp_table` in the preference order of Meta > PostAutograd > PreAutograd.
3. We will unconditionally register all of them into the python dispatcher, and register them into the C++ dispatcher unless it is one of the following 3 cases:
- 1. the op is a CompositeImplicitAutograd and should rely on the decomposed op's meta
- 2. the op is a view op, as the MetaTensor doesn't support aliased storage
- 3. the op is in the blocklist (due to UT failures; we will burn down this list op by op)
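A minimal sketch of the table-selection step described in item 2 (the names and data layout here are assumptions for illustration, not PyTorch's actual internals):

```python
# Pick the most specific registration per op, in the preference order
# Meta > PostAutograd > PreAutograd. The first hit in PREFERENCE wins.
PREFERENCE = ("Meta", "PostAutograd", "PreAutograd")

def compile_active_meta_table(global_decomp_table):
    # global_decomp_table: {kind: {op_overload: python_fn}} (assumed layout)
    active = {}
    all_ops = {op for table in global_decomp_table.values() for op in table}
    for op in all_ops:
        for kind in PREFERENCE:
            fn = global_decomp_table.get(kind, {}).get(op)
            if fn is not None:
                active[op] = fn
                break
    return active
```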

Over the long run, we wish to implement all meta functions in python. With this PR, 321 op_overloads will have their cpp meta overridden by python meta. There are still 400 op_overloads using cpp meta. The exact list can be found here https://gist.github.com/SherlockNoMad/d20bb736178df8eebd3b054c8bb7cdc5

cc @ngimel @jansel @lezcano @fdrocha @mlazos @soumith @voznesenskym @yanboliang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87426
Approved by: https://github.com/ezyang, https://github.com/jansel
2022-10-25 16:49:02 +00:00
lezcano
faf9c47abb Simplify a few diagonal-related functions (#87180)
`diag` was implemented as a kernel rather than as a composite function,
which made it unnecessarily involved (an explicit backward and everything that entails).

We also replace a few uses of `diag` on 2D tensors with `diagonal()`. The
latter returns a view rather than creating a new tensor.
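A small example of the view-vs-copy distinction (illustrative; not part of this PR):

```python
import torch

x = torch.arange(9.0).reshape(3, 3)

d_view = torch.diagonal(x)  # returns a view into x
d_copy = torch.diag(x)      # materializes a new tensor holding the diagonal

x[0, 0] = 100.0
print(d_view[0].item())  # 100.0 -- the view sees the in-place update
print(d_copy[0].item())  # 0.0   -- the copy does not
```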

We also upgrade its meta implementation to a fully-fledged
decomposition.

I tried implementing the backwards of `diagonal()` via `diag_scatter` (or better `diag_scatter_` to keep the perf) but functionalisation was failing and I was not sure how to fix this, so I moved on. It may be possible to simplify that one as well if @soulitzer or someone knows how to do this.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87180
Approved by: https://github.com/ngimel, https://github.com/albanD, https://github.com/mruberry
2022-10-24 06:11:53 +00:00
Sherlock Huang
f3f1b44778 Fix meta for meta_fill_ (#87493)
The existing meta_fill_ doesn't correctly reflect the aliasing relationship for aten.fill. A new MetaTensor should be returned instead.
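A hedged sketch of the intended behavior (the function name and body are illustrative, not the actual registration):

```python
import torch

# Out-of-place aten.fill produces a fresh tensor, so its meta function should
# allocate a new meta tensor rather than returning (an alias of) the input.
def meta_fill_sketch(self, value):
    return torch.empty_like(self)
```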
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87493
Approved by: https://github.com/eellison, https://github.com/bdhirsh
2022-10-22 12:41:03 +00:00
Edward Z. Yang
d73d4aa7de Audit for error prone isinstance int/float and add lint (#87345)
We recently fixed a bug on the symbolic-shapes branch where
an isinstance(x, int) test failed when passed a SymIntNode.
To prevent this, I've added a lint for all the codepaths
where we may pass SymInt/SymFloat directly: it rejects
direct isinstance int/float tests and asks for one of the
aliases instead.  The lint rule explains the options.  I then
went and fixed all of the existing violations.
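A hedged sketch of the pattern the lint rejects and the alias-based replacement (the alias below is a stand-in, not necessarily the one the lint rule names, and the symbolic-int class name varies across versions):

```python
import torch

# Stand-in alias that accepts both concrete and symbolic integers.
IntLikeType = (int, torch.SymInt)

def bad_check(x):
    return isinstance(x, int)          # rejected by the lint: fails for symbolic ints

def good_check(x):
    return isinstance(x, IntLikeType)  # accepts int and SymInt alike
```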

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87345
Approved by: https://github.com/bdhirsh, https://github.com/albanD
2022-10-21 15:55:24 +00:00
Sherlock Huang
f7da9db9c1 Unify decomp registries into global_decomposition_table (#86857)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86857
Approved by: https://github.com/ezyang
2022-10-20 21:29:05 +00:00
albanD
254b681dc6 Convert torch.Size() argument to sym size in test_proxy_tensor (#87304)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87304
Approved by: https://github.com/ezyang
2022-10-20 14:20:19 +00:00
anjali411
6351220573 Add meta support for _adaptive_avg_pool2d_backward (#86359) (#87074)
This reverts commit 3edf79dc03.

Reland of https://github.com/pytorch/pytorch/pull/86359
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87074
Approved by: https://github.com/ezyang
2022-10-17 16:15:04 +00:00
albanD
b40f4434ac conv backward impl (#87047)
~~Waiting for test run to see if this backward is actually exercised.
If not, I will add test before merging.~~
Test updated. Ready to go now.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87047
Approved by: https://github.com/ezyang
2022-10-17 13:14:12 +00:00
albanD
86c2e44cb6 meta funcs for avg_pool2d and avg_pool2d_backward (#87043)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87043
Approved by: https://github.com/ezyang
2022-10-17 13:14:10 +00:00
albanD
3a4c0900c7 Reland 3 of Merge more symbolic meta kernels and symint changes from branch (#86795)
Take 3
Contains:
- symintification of split*
- floor support on SymFloat
- pad_backward, gather, scatter meta
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86795
Approved by: https://github.com/z-a-f
2022-10-17 02:09:40 +00:00
Brian Hirsh
34c86adec4 symintify all of derivatives.yaml (#86610)
Big-bang PR to symintify **all** .sizes() calls in derivatives.yaml, which will be needed for symbolic tracing.

* with the exception of `split()`, which is tougher to land because it requires internal changes.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/86610
Approved by: https://github.com/albanD
2022-10-14 20:15:48 +00:00