Commit Graph

326 Commits

Author SHA1 Message Date
Ivan Zaitsev
821493715c Back out "Remove check from _prims_common, replace with torch._check* (#102219)", Back out "Forward fix for D46427687" (#103128)
Test Plan: revertitparrot

Reviewed By: malfet

Differential Revision: D46506433

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103128
Approved by: https://github.com/malfet
2023-06-07 01:41:41 +00:00
Nikita Karetnikov
ec0aa965da [pt2] add meta for _linalg_solve_ex (#102454)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102454
Approved by: https://github.com/lezcano
2023-06-06 08:06:55 +00:00
Nikita Karetnikov
4bda4a7e4d [pt2] add meta for lu_unpack (#102937)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102937
Approved by: https://github.com/lezcano
2023-06-06 08:06:53 +00:00
Nikita Karetnikov
6ac3352a37 [pt2] add meta for _linalg_slogdet (#102464)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102464
Approved by: https://github.com/ezyang
2023-06-05 03:17:08 +00:00
Kurt Mohler
a84bb2709a Remove check from _prims_common, replace with torch._check* (#102219)
Part of #72948

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102219
Approved by: https://github.com/lezcano, https://github.com/albanD
2023-06-03 02:23:21 +00:00
Shunting Zhang
86c7652503 [inductor] layout optimization for conv (#99773)
Convolution kernels with channels-last inputs run much faster than kernels with contiguous inputs. The PR leverages that to optimize tensor layouts so we provide channels-last inputs to convolution. Some care needs to be taken not to convert tensor layouts between contiguous and channels last back and forth, since those extra copies hurt performance quite a bit.

Latest perf number [here](https://hud.pytorch.org/benchmark/compilers?startTime=Wed%2C%2024%20May%202023%2023%3A40%3A37%20GMT&stopTime=Wed%2C%2031%20May%202023%2023%3A40%3A37%20GMT&granularity=hour&suite=torchbench&mode=training&dtype=amp&lBranch=shunting-layout-opt-19&lCommit=baa797fc100688dfb044fbcbdebcfd2591710f78&rBranch=main&rCommit=999bae0f54108ffc5b7cf2524a02a83901554b16)
- TB: 1.64x -> 1.69x
- HF: 1.79x -> 1.78x (random noise)
- TIMM: 1.51x -> 1.65x

Right now we disable layout optimization for dynamic shapes since there is a perf loss in that combination. Here is a GH issue to follow up: https://github.com/pytorch/pytorch/issues/102670
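For reference, a minimal sketch (not the inductor pass itself, illustrative module and shapes) of the channels-last layout this optimization steers convolutions toward:

```python
import torch

# Convert both the module and its input to channels last (NHWC strides);
# convolution kernels generally run faster with this layout.
conv = torch.nn.Conv2d(8, 16, kernel_size=3, padding=1).to(memory_format=torch.channels_last)
x = torch.randn(4, 8, 32, 32).to(memory_format=torch.channels_last)

y = conv(x)
print(y.is_contiguous(memory_format=torch.channels_last))  # True
```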

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99773
Approved by: https://github.com/jansel
2023-06-02 21:08:18 +00:00
PyTorch MergeBot
a7efa0ce35 Revert "Remove check from _prims_common, replace with torch._check* (#102219)"
This reverts commit fb79d43649.

Reverted https://github.com/pytorch/pytorch/pull/102219 on behalf of https://github.com/malfet due to Broke lint, see https://github.com/pytorch/pytorch/actions/runs/5158949959/jobs/9293466925 ([comment](https://github.com/pytorch/pytorch/pull/102219#issuecomment-1574245414))
2023-06-02 20:00:48 +00:00
Kurt Mohler
fb79d43649 Remove check from _prims_common, replace with torch._check* (#102219)
Part of #72948

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102219
Approved by: https://github.com/lezcano, https://github.com/albanD
2023-06-02 19:13:45 +00:00
Nikita Karetnikov
0f1621df1a [pt2] fix typos in checkFloatingOrComplex errors (#102456)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102456
Approved by: https://github.com/lezcano
2023-05-30 11:18:50 +00:00
Nikita Karetnikov
c3ea8cc58b [pt2] convert out params in register_meta (#101344)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101344
Approved by: https://github.com/lezcano
2023-05-27 18:38:52 +00:00
Michael Lazos
69c7f710ba Add meta registrations for some foreach ops (#102225)
as title
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102225
Approved by: https://github.com/ngimel
2023-05-25 02:59:11 +00:00
Peter Bell
ce42010722 [inductor][decomp] Add aten._unsafe_index_put for unchecked indexing (#101812)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101812
Approved by: https://github.com/lezcano
2023-05-24 22:17:32 +00:00
Nikita Karetnikov
42b974e8f7 [pt2] add meta for linalg_lu_solve (#101836)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101836
Approved by: https://github.com/lezcano
2023-05-24 00:21:50 +00:00
PyTorch MergeBot
5147fe4969 Revert "[inductor][decomp] Add aten._unsafe_index_put for unchecked indexing (#101812)"
This reverts commit b9721bd705.

Reverted https://github.com/pytorch/pytorch/pull/101812 on behalf of https://github.com/osalpekar due to Causing test_nn_cuda tests to crash during runtime. More details at [D46093942](https://www.internalfb.com/diff/D46093942) ([comment](https://github.com/pytorch/pytorch/pull/101812#issuecomment-1560238085))
2023-05-23 23:06:21 +00:00
Peter Bell
b9721bd705 [inductor][decomp] Add aten._unsafe_index_put for unchecked indexing (#101812)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101812
Approved by: https://github.com/lezcano
2023-05-22 20:39:18 +00:00
drisspg
6f13d6892a Add meta support for multinomial (#101324)
# Summary
Found this when trying to compile the text gen loop of nanogpt here: b33289942b/torchbenchmark/models/nanogpt_generate/model.py (L322)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101324
Approved by: https://github.com/ngimel
2023-05-19 00:04:26 +00:00
Angela Yi
72a73ef67b Add aten.searchsorted.Tensor meta kernel (#101637)
Test Plan: CI

Differential Revision: D45933187

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101637
Approved by: https://github.com/ezyang
2023-05-18 06:55:11 +00:00
Peter Bell
66e398951a [inductor/decomp] Add aten._unsafe_index to disable range checks (#101602)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101602
Approved by: https://github.com/lezcano, https://github.com/ngimel
2023-05-17 23:36:24 +00:00
Nikita Karetnikov
42e65a2587 [pt2] add meta for linalg_lu_factor_ex (#101375)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101375
Approved by: https://github.com/lezcano
2023-05-16 20:56:54 +00:00
kshitij12345
afea1a9fe9 [meta] error checking for inplace ops (#101532)
Fixes #100753

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101532
Approved by: https://github.com/lezcano
2023-05-16 17:26:59 +00:00
Nikita Karetnikov
9eb1748b2b [pt2] add meta and SymInt support for linalg_lu (#101372)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101372
Approved by: https://github.com/lezcano, https://github.com/albanD
2023-05-15 20:25:00 +00:00
Nikita Karetnikov
ac4cc63ae2 [pt2] add meta for linalg_ldl_solve (#101367)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101367
Approved by: https://github.com/lezcano
2023-05-15 20:25:00 +00:00
Nikita Karetnikov
7dd8e08817 [pt2] add meta for linalg_ldl_factor_ex (#101362)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101362
Approved by: https://github.com/lezcano
2023-05-15 02:56:49 +00:00
Nikita Karetnikov
a8964d6377 [pt2] add meta and SymInt support for linalg_householder_product (#101315)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101315
Approved by: https://github.com/lezcano
2023-05-15 02:56:49 +00:00
Natalia Gimelshein
15a51e2012 simplify sdpa backward meta registration (#101128)
Per title.

There's an off chance that query_reshaped etc. was actually discontiguous after the reshape, but even in that case I'm pretty sure the computed gradients would still be contiguous, and we are properly transposing output gradients to produce correct strides.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101128
Approved by: https://github.com/drisspg
2023-05-11 03:30:07 +00:00
Nikita Karetnikov
c0d33f66c9 [pt2] remove unused meta_linalg_eigh (#100965)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100965
Approved by: https://github.com/ezyang
2023-05-10 15:45:36 +00:00
Nikita Karetnikov
6abde61f8e [pt2] add meta function for _linalg_eigh (#100964)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100964
Approved by: https://github.com/ezyang
2023-05-10 15:45:15 +00:00
Natalia Gimelshein
bfe5f5bbe1 [WIP] enable cuda graphs support for flash attention with dropout (#100196)
Fixes #99905

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100196
Approved by: https://github.com/drisspg
2023-05-08 16:19:18 +00:00
Nikita Karetnikov
1e591a8b64 [pt2] add meta function for solve_triangular (#100829)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100829
Approved by: https://github.com/ezyang
2023-05-08 13:48:15 +00:00
Nikita Karetnikov
266c84e3ab [pt2] add meta function for linalg_qr (#100714)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100714
Approved by: https://github.com/ezyang, https://github.com/lezcano
2023-05-06 15:04:02 +00:00
Nikita Karetnikov
37f1be041a [pt2] enable svd in fake_tensor (#100130)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100130
Approved by: https://github.com/ezyang, https://github.com/lezcano
2023-05-05 06:27:59 +00:00
Michael Voznesensky
fe3ecfe0cf Add AotAutogradFallbackTests to dynamic suite (#100454)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100454
Approved by: https://github.com/ezyang
2023-05-04 04:28:45 +00:00
PyTorch MergeBot
c3aa59c8f5 Revert "[WIP] enable cuda graphs support for flash attention with dropout (#100196)"
This reverts commit 32615618e4.

Reverted https://github.com/pytorch/pytorch/pull/100196 on behalf of https://github.com/clee2000 due to broke no ops build 32615618e4 https://github.com/pytorch/pytorch/actions/runs/4866578063/jobs/8678258318 ([comment](https://github.com/pytorch/pytorch/pull/100196#issuecomment-1532352810))
2023-05-03 01:41:56 +00:00
Natalia Gimelshein
32615618e4 [WIP] enable cuda graphs support for flash attention with dropout (#100196)
Fixes #99905

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100196
Approved by: https://github.com/drisspg
2023-05-02 23:05:31 +00:00
Justin Chu
e779a30d50 [BE] Fix SIM109 compare-with-tuple (#100337)
Use {replacement} instead of multiple equality comparisons
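For context, a small illustration of the SIM109 rewrite (illustrative values, not code from this PR):

```python
x = 2

# before: repeated equality comparisons
if x == 1 or x == 2 or x == 3:
    print("matched")

# after: a single membership test against a tuple
if x in (1, 2, 3):
    print("matched")
```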

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100337
Approved by: https://github.com/Skylion007
2023-04-30 19:51:32 +00:00
Tugsbayasgalan Manlaibaatar
d4bf76c2a4 Persist torch.assert in aten graph (#100101)
This PR introduces a new operator called aten._assert_async.msg, which allows passing a tensor value and an assertion message as inputs. As part of TorchDynamo, we're replacing the use of torch._assert with this new operator so that make_fx also knows how to handle assertions. This is a subset of https://github.com/pytorch/pytorch/pull/98878; refer there for historic reviews.
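A hedged sketch of the user-facing side (the exact captured graph depends on the tracer; the function below is hypothetical):

```python
import torch

def f(x):
    # With this change, TorchDynamo can lower this assertion to the new
    # aten._assert_async.msg operator so it is preserved in the traced aten graph.
    torch._assert(bool((x > 0).all()), "expected all elements of x to be positive")
    return x * 2

print(f(torch.ones(3)))      # passes the assertion
# f(torch.zeros(3)) would raise AssertionError with the message above
```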

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100101
Approved by: https://github.com/jansel
2023-04-28 07:31:43 +00:00
Aaron Gokaslan
e2a3817dfd [BE] Enable C419 rule for any all shortcircuiting (#99890)
Apparently https://github.com/pytorch/pytorch/pull/78142 made torch.jit allow simple generator expressions, which allows us to enable rules that replace unnecessary list comprehensions with generators in any/all. This was originally part of #99280, but I split it off into this PR so that it can be easily reverted should anything break.
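A small illustration of the C419 rewrite (illustrative values):

```python
nums = [1, 3, 5, 7, 9]

# before: builds the whole list before any() sees it
has_big = any([n > 4 for n in nums])

# after: a generator lets any() short-circuit at the first True
has_big = any(n > 4 for n in nums)
print(has_big)  # True
```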

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99890
Approved by: https://github.com/justinchuby, https://github.com/kit1980, https://github.com/malfet
2023-04-25 15:02:13 +00:00
Xiaodong Wang
cc01568efd [pt2] Register meta func to randperm.default (#99593)
Summary:
Looks like we're missing the meta func for randperm.default. I get complaints like this when I compile randperm with dynamic shapes, which I think is because it gets into the real implementation rather than the meta func.

```
RuntimeError: expected int but got s0
Exception raised from expect_int at fbcode/caffe2/c10/core/SymInt.h:128 (most recent call first):
# 0  c10::get_backtrace[abi:cxx11](unsigned long, unsigned long, bool)
# 1  std::_Function_handler<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > (), c10::(anonymous namespace)::GetFetchStackTrace()::$_1>::_M_invoke(std::_Any_data const&)
# 2  c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)
# 3  c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
# 4  c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor (c10::SymInt, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CompositeExplicitAutograd__randperm>, at::Tensor, c10::guts::typelist::typelist<c10::SymInt, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool> > >, at::Tensor (c10::SymInt, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>)>::call(c10::OperatorKernel*, c10::DispatchKeySet, c10::SymInt, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>)
# 5  at::Tensor c10::Dispatcher::redispatch<at::Tensor, c10::SymInt, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool> >(c10::TypedOperatorHandle<at::Tensor (c10::SymInt, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>)> const&, c10::DispatchKeySet, c10::SymInt, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>) const
# 6  at::_ops::randperm::redispatch(c10::DispatchKeySet, c10::SymInt, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>)
# 7  c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor (c10::SymInt, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>), &at::(anonymous namespace)::randperm>, at::Tensor, c10::guts::typelist::typelist<c10::SymInt, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool> > >, at::Tensor (c10::SymInt, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>)>::call(c10::OperatorKernel*, c10::DispatchKeySet, c10::SymInt, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>)
# 8  c10::impl::make_boxed_from_unboxed_functor<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor (c10::SymInt, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>), &at::(anonymous namespace)::randperm>, at::Tensor, c10::guts::typelist::typelist<c10::SymInt, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool> > >, false>::call(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >*)

```
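A hedged sketch of the kind of call that exercises this path (hypothetical function; uses the `eager` dynamo backend just to trace — with dynamic shapes the size is a SymInt, so randperm must go through a meta function rather than the real kernel):

```python
import torch

@torch.compile(backend="eager", dynamic=True)
def shuffle_ids(n: int):
    # with dynamic shapes, `n` can be a SymInt during tracing, so this call
    # needs a meta function for randperm instead of the real implementation
    return torch.randperm(n)

print(shuffle_ids(8).shape)
```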

Differential Revision: D45137851

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99593
Approved by: https://github.com/ezyang
2023-04-25 08:55:43 +00:00
Wanchao Liang
ca24a96216 minor fix to fused adam meta registration (#99436)
This PR fixes the registration by adding `max_exp_avg_sqs` to the
output shape list too, and fixes some type check issues.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99436
Approved by: https://github.com/mrshenli
2023-04-24 22:50:02 +00:00
Edward Z. Yang
10c938abef Handle meta['val'] for tuple of lists. (#99724)
Fixes https://github.com/pytorch/pytorch/issues/99356

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99724
Approved by: https://github.com/wanchaol
2023-04-21 22:33:21 +00:00
Rodrigo Kumpera
38e964056b Reland python ops (#99170)
Waiting for the revert to land.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99170
Approved by: https://github.com/albanD
2023-04-18 15:15:46 +00:00
PyTorch MergeBot
1c042a2137 Revert "Reland python ops (#99170)"
This reverts commit d4de64ae8d.

Reverted https://github.com/pytorch/pytorch/pull/99170 on behalf of https://github.com/DanilBaibak due to Break internal build
2023-04-18 11:37:43 +00:00
Rodrigo Kumpera
d4de64ae8d Reland python ops (#99170)
Waiting for the revert to land.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99170
Approved by: https://github.com/albanD
2023-04-17 21:53:41 +00:00
Nikita Karetnikov
106ccf4a2a [pt2] add meta function for linalg.cross (#99279)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99279
Approved by: https://github.com/ezyang
2023-04-17 21:21:45 +00:00
PyTorch MergeBot
f957334c2b Revert "[pt2] add meta function for linalg.cross (#99279)"
This reverts commit efc3887ea5.

Reverted https://github.com/pytorch/pytorch/pull/99279 on behalf of https://github.com/ezyang due to Apparently this is breaking inductor on master? So weird
2023-04-17 19:33:16 +00:00
Nikita Karetnikov
efc3887ea5 [pt2] add meta function for linalg.cross (#99279)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99279
Approved by: https://github.com/ezyang
2023-04-17 03:05:20 +00:00
Rodrigo Kumpera
a910045add [PATCH] Back out "Move functional collectives implementation to python. (#98595) (#99168)
Summary:
Original commit changeset: ba36f8751adc

Original Phabricator Diff: D44788697

Test Plan: model loading is fine after reverting the diff

Reviewed By: zyan0, sayitmemory

Differential Revision: D44921259
---

Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99168
Approved by: https://github.com/izaitsevfb
2023-04-14 23:48:19 +00:00
XiaobingSuper
9c98f2ceb7 inductor: rewrite mkldnn fx fusion using pattern_matcher(binary) (#97141)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97141
Approved by: https://github.com/jgong5, https://github.com/EikanWang, https://github.com/jansel
2023-04-12 06:23:03 +00:00
XiaobingSuper
c214c50355 inductor: rewrite mkldnn fx fusion using pattern_matcher(conv_unary) (#97007)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97007
Approved by: https://github.com/jgong5, https://github.com/EikanWang, https://github.com/jansel
2023-04-12 05:52:54 +00:00
Guang Yang
c377a8590b Add nonzero_static() op to pytorch to unblock export (#97417)
Summary: Add a new experimental python op (`torch.nonzero_static`) for export. There is NO CUDA impl included in this PR.

Example:

Say input tensor is `x = torch.tensor([[1, 0], [3, 2]])`

Calling regular `nonzero()` on x will give you a tensor `tensor([[0, 0], [1, 0], [1, 1]])`
Calling `nonzero_static(x, size=4)` on x will give you a tensor `tensor([[0, 0], [1, 0], [1, 1], [fill_value, fill_value]])` (padded)
Calling `nonzero_static(x, size=2)` on x will give you a tensor `tensor([[0, 0], [1, 0]])` (truncated)

Test Plan:
**Unit Tests**
```
buck test @mode/dev-nosan //caffe2/test:test_dynamo -- 'caffe2/test:test_dynamo - test_export.py::ExportTests::test_export_with_nonzero_static' -- 'caffe2/test:test_dynamo - test_misc.py::MiscTests::test_nonzero_static'
```

**PT2 Export with `nonzero_static()`**
Example of `GraphModule` in the exported graph
```
def forward(self, x):
    arg0, = fx_pytree.tree_flatten_spec(([x], {}), self._in_spec)
    nonzero_static_default = torch.ops.aten.nonzero_static.default(arg0, size = 4);  arg0 = None
    return pytree.tree_unflatten([nonzero_static_default], self._out_spec)
```

Differential Revision: D44324808

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97417
Approved by: https://github.com/ezyang
2023-04-11 05:13:36 +00:00
Nikita Karetnikov
b411238d76 [pt2] add meta function for logcumsumexp (#98683)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98683
Approved by: https://github.com/ezyang
2023-04-09 01:26:37 +00:00
Rodrigo Kumpera
24d9001527 Move functional collectives implementation to python. (#98595)
This simplifies a lot of the work we need to do to add new ops.

This relands the previous PR, not sure why it was reverted.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98595
Approved by: https://github.com/wconstab
2023-04-07 21:48:05 +00:00
Nikita Karetnikov
1c226f5aad [pt2] add meta functions for cummax and cummin (#98552)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98552
Approved by: https://github.com/Chillee
2023-04-07 17:58:28 +00:00
albanD
0210481dcb Fix _like meta registrations (#98160)
The meta implementation for these _like functions is wrong whenever device != "meta" (it doesn't fill the memory!).
zeros_like is special due to sparse and is fixed directly by always filling it with zeros.
Every other one has a CompositeExplicit implementation, so I went with removing their meta registrations and tweaking code to avoid infinite recursions.
I could do the same as zeros_like (and add the proper filling for each), but that would duplicate the C++ logic and make the meta registrations non-trivial. I can do it if you prefer that to removal.

test_meta works fine with these fixes, relying on CI to see if other tests are breaking as well.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98160
Approved by: https://github.com/ezyang
2023-04-06 18:44:34 +00:00
PyTorch MergeBot
67d1a77086 Revert "Move functional collectives implementation to python. (#98315)"
This reverts commit 8b0374f83c.

Reverted https://github.com/pytorch/pytorch/pull/98315 on behalf of https://github.com/huydhn due to Sorry for reverting for PR. This is failing in trunk probably due to a landrace
2023-04-06 16:49:40 +00:00
Nikita Karetnikov
7b25976323 [pt2] add meta function for take (#98451)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98451
Approved by: https://github.com/ezyang
2023-04-06 14:48:35 +00:00
Rodrigo Kumpera
8b0374f83c Move functional collectives implementation to python. (#98315)
This simplifies a lot of the work we need to do to add new ops.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98315
Approved by: https://github.com/albanD, https://github.com/wconstab, https://github.com/Neilblaze
2023-04-06 14:06:16 +00:00
PyTorch MergeBot
fa08e546f3 Revert "Add all_reduce_coalesced functional collective (#97157)"
This reverts commit a3fc3531f5.

Reverted https://github.com/pytorch/pytorch/pull/97157 on behalf of https://github.com/huydhn due to Sorry for reverting your PR, but it seems to have a land race with https://github.com/pytorch/pytorch/pull/96226 and fails lint on trunk
2023-04-04 01:50:49 +00:00
Rodrigo Kumpera
a3fc3531f5 Add all_reduce_coalesced functional collective (#97157)
Inductor codegen is suboptimal when calling all_reduce_coalesced with input args. We need to fix inductor's calling convention for that, or something else.

Might not work if any output is unused.

Test code:

```python
import torch
import torch.distributed as dist
import torch.nn.functional as F
from functorch import make_fx
import os

import torch.distributed._functional_collectives as ft_c
from torch.testing._internal.common_distributed import (
    spawn_threads_and_init_comms,
)
from torch._inductor.compile_fx import compile_fx_inner

def my_fun(a, b):
    c = a * 3
    tensors = ft_c.all_reduce_coalesced([a, c, b], "sum", [0])
    return ((tensors[1] + tensors[0] + tensors[2]).sum(), )

@spawn_threads_and_init_comms(world_size=1)
def inductor_main(self):

    x = torch.arange(4).cuda() * (dist.get_rank() + 1)
    y = torch.arange(4).cuda() * (dist.get_rank() + 1)
    x = x.to(torch.float)
    y = y.to(torch.float) * 0.5
    res = make_fx(my_fun)(x, y)
    print(f"fx graph:\n{res.graph}")
    ind = compile_fx_inner(res, [x, y])
    print(f"inductor done:\n{ind}")

os.environ["PROXY_TENSOR_TRACING"] = "1"
os.environ["TORCH_COMPILE_DEBUG"] = "1"
torch._dynamo.config.output_code = True

if __name__ == "__main__":
    inductor_main(None)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97157
Approved by: https://github.com/fegin
2023-04-04 01:13:18 +00:00
Shen Li
e8d39606eb [SPMD] Enable fused Adam in full train step tracing (#98113)
Differential Revision: [](https://our.internmc.facebook.com/intern/diff/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98113
Approved by: https://github.com/yifuwang, https://github.com/fegin
2023-04-01 15:54:13 +00:00
Shen Li
bccf2ef0ce Format DTensor dispatch.py and _meta_registrations.py (#98114)
Format-only changes with black and lintrunner to prepare for the commit on top.

Differential Revision: [D44603809](https://our.internmc.facebook.com/intern/diff/D44603809)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98114
Approved by: https://github.com/yifuwang, https://github.com/fegin
2023-04-01 15:54:13 +00:00
Shen Li
9ec6fdb29b Enable adam foreach in full train step tracing (#97897)
Main changes:

1. Registered several foreach ops to both meta and DTensor
2. Skip redundant getitem node when expanding foreach ops with DTensor
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97897
Approved by: https://github.com/wanchaol, https://github.com/fegin
2023-03-30 16:47:10 +00:00
Shen Li
379fb47654 [SPMD] Support foreach optimizers with functionalization (#97853)
My first attempt was to apply the same solution as how proxy_tensor.py
handles other inplace ops. However, foreach is different in the way
that its schema in `native_functions.yaml` does not return anything,
whereas ops like `addcmul_` and `addcdiv_` do return Tensors (Thanks
bdhirsh for teaching me this!). As a result, the proxy output
during tracing does not wrap anything, and hence we cannot correctly
connect it with subsequent operators. Modifying `native_functions.yaml`
is not a preferred solution. After discussing with bdhirsh, the
temporary solution is to do foreach functionalization as a graph
pass for now. Later, when https://github.com/pytorch/pytorch/issues/97852
is addressed, we will switch to default functionalization.

Edit: the latest version follows @bdhirsh's suggestion of using
`make_fx`'s `decomposition_table` instead of implementing manual
fx.Graph transforms to functionalize `_foreach_add_`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97853
Approved by: https://github.com/fegin, https://github.com/wanchaol
2023-03-30 11:27:10 +00:00
Christian Puhrsch
9d37cefcb0 Resubmit _int_mm (#96685)
Avoids any changes to gemm_and_bias

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96685
Approved by: https://github.com/drisspg, https://github.com/ngimel
2023-03-27 16:14:07 +00:00
Shen Li
a8f7e0b213 [Easy] Improve error message for meta_mm (#97533)
Differential Revision: [D44376381](https://our.internmc.facebook.com/intern/diff/D44376381)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97533
Approved by: https://github.com/bdhirsh, https://github.com/albanD
2023-03-25 01:09:41 +00:00
Driss Guessous
98a5cf090d [SDPA] Remove the chunk_grad from mem-eff attention (#96880)
# Summary

There exists an optimization within the scaled_dot_product_efficient backward attention path to, under the right conditions, output grad_q, grad_k, grad_v all as aliases of the same storage. This was done to optimize for the hot path where mha does packed linear_projection -> chunk -> (view stuff) -> sdpa. The thought was that chunk would then be able to "trivially" cat its inputs in chunk.backward(). However, upon closer inspection, chunk.backward calls `cat` regardless of the inputs, so this optimization is not being utilized.

I validated this by profiling on main and then on this branch, and the traces produced were the same, with `split.backward()` calling into cat in both.
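For reference, a simplified single-head sketch of the hot path mentioned above (packed projection -> chunk -> SDPA); names and shapes are illustrative, not the actual mha code:

```python
import torch
import torch.nn.functional as F

x = torch.randn(2, 16, 64)          # (batch, seq, embed)
w_qkv = torch.randn(3 * 64, 64)     # packed q/k/v projection weight

qkv = F.linear(x, w_qkv)            # packed linear projection
q, k, v = qkv.chunk(3, dim=-1)      # chunk into query, key, value
out = F.scaled_dot_product_attention(q, k, v)
print(out.shape)                    # torch.Size([2, 16, 64])
```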

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96880
Approved by: https://github.com/cpuhrsch
2023-03-17 21:28:25 +00:00
Nikita Karetnikov
bf08d1387c [primTorch] handle out in sort meta function (#96719)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96719
Approved by: https://github.com/ezyang
2023-03-16 07:38:53 +00:00
Christian Puhrsch
0a53c9624a Back out "Add _int_mm to expose cuBLAS int8@int8 -> int32 matmul (#94339)" (#96885)
Summary:
Backing out  _int_mm to expose cuBLAS int8@int8 -> int32 matmul (#94339)

Test Plan: CI

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96885
Approved by: https://github.com/drisspg
2023-03-16 05:32:55 +00:00
BowenBao
60a68477a6 Bump black version to 23.1.0 (#96578)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96578
Approved by: https://github.com/ezyang
2023-03-15 06:27:59 +00:00
Nikita Karetnikov
ec536232a3 [primTorch] add meta implementation for upsample_nearest2d_backward (#96612)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96612
Approved by: https://github.com/ezyang
2023-03-14 06:51:42 +00:00
PyTorch MergeBot
be220690d9 Revert "[primTorch] add meta implementation for upsample_nearest2d_backward (#96612)"
This reverts commit fe180596b8.

Reverted https://github.com/pytorch/pytorch/pull/96612 on behalf of https://github.com/malfet due to broke lint
2023-03-13 03:07:23 +00:00
Nikita Karetnikov
fe180596b8 [primTorch] add meta implementation for upsample_nearest2d_backward (#96612)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96612
Approved by: https://github.com/ezyang
2023-03-13 00:25:23 +00:00
Nikita Karetnikov
cb7c796b4b Enable min.unary_out (#96441)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96441
Approved by: https://github.com/ngimel
2023-03-11 19:23:33 +00:00
Nikita Karetnikov
0d7c44096a Add baddbmm meta function (#96548)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96548
Approved by: https://github.com/ezyang
2023-03-11 19:09:24 +00:00
Nikita Karetnikov
8e0d5bf538 [primTorch] add meta implementation for aten.min.dim (#96442)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96442
Approved by: https://github.com/ngimel
2023-03-11 18:51:51 +00:00
Driss Guessous
11aab72dc9 [SDPA] Add an optional scale kwarg (#95259)
# Summary
This PR adds an optional kwarg to torch.nn.functional.scaled_dot_product_attention().
The new kwarg is a scaling factor that is applied after the q@k.T step of the computation. I made updates to the efficient kernel to support it, while flash and math were minimally updated to support it as well.

Will reduce the complexity of: #94729 and has been asked for by a couple of users.

# Review Highlights
- As far as I know I did this the correct way, and it is both BC and FC compliant. However, I always seem to break internal workloads, so I would love it if someone could advise whether I did this right.
- I named the optional arg 'scale'. This is probably dumb and I should name it 'scale_factor'. I will make this change, but it is annoying and will require someone deciding that we should rename it.
- 'scale' is interpreted as `Q@K.T * (scale)`
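A hedged example of the new kwarg described above: `scale` replaces the default 1/sqrt(head_dim) factor applied to Q @ K.T (shapes are illustrative):

```python
import torch
import torch.nn.functional as F

q = torch.randn(2, 4, 16, 8)   # (batch, heads, seq, head_dim)
k = torch.randn(2, 4, 16, 8)
v = torch.randn(2, 4, 16, 8)

out_default = F.scaled_dot_product_attention(q, k, v)             # scale = 1/sqrt(8)
out_custom = F.scaled_dot_product_attention(q, k, v, scale=0.5)   # user-supplied scale
print(out_default.shape, out_custom.shape)
```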

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95259
Approved by: https://github.com/cpuhrsch
2023-03-08 18:07:40 +00:00
Wonjoo Lee
3095c95828 Fixes for PyTorch/XLA functionalization integration (#94537)
Fixes for PyTorch/XLA functionalization integration

---
Some notable changes include:
- More asserts in `FunctionalTensorWrapper`, so bugs show up more cleanly in cases where we e.g. forget to wrap an output
- Make the *_scatter ops `CompositeExplicitAutogradNonFunctional`, so we get a better error message and XLA doesn't accidentally try to use them
- Fix LTC/XLA codegen in core to handle multi-tensor out= ops with no returns
- Better erroring: Allow XLA to use the CPU fallback from core in a way so that it always errors on view ops, which XLA should no longer see.
- Update MetaConverter to exclude XLA tensors in raising NotImplemented…
- Add `_propagate_xla_data` op
- Add meta tensor support for some ops
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94537
Approved by: https://github.com/bdhirsh
2023-03-02 23:02:34 +00:00
Wanchao Liang
f397d1700f Inductor reduce_scatter_tensor (#95764)
This adds reduce_scatter to the functional collective and adds the
inductor lowering support

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95764
Approved by: https://github.com/kumpera
2023-03-02 22:05:30 +00:00
Will Constable
cc6da7b901 Inductor allgather_into_tensor (#95530)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95530
Approved by: https://github.com/kumpera
2023-02-27 21:38:36 +00:00
Christian Puhrsch
1fe2a9d122 Add _int_mm to expose cuBLAS int8@int8 -> int32 matmul (#94339)
Add an _int_mm primitive that binds the cuBLAS int8@int8 -> int32 matmul and that translates to Triton-based mm templates under max autotune. This is a very useful first step towards better supporting quantization on the GPU. This is not a user-facing API, but an internal primitive.
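A hedged sketch of calling the primitive (assumes a CUDA device; cuBLAS imposes shape constraints, roughly that dimensions be larger than 16 and multiples of 8):

```python
import torch

a = torch.randint(-128, 127, (32, 64), dtype=torch.int8, device="cuda")
b = torch.randint(-128, 127, (64, 32), dtype=torch.int8, device="cuda")

c = torch._int_mm(a, b)   # int8 @ int8 with int32 accumulation
print(c.dtype)            # torch.int32
```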

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94339
Approved by: https://github.com/ngimel, https://github.com/jansel
2023-02-27 20:27:25 +00:00
HELSON
f43ce9553b [meta_tensor] polish error strings in meta registrations (#95052)
I found that some error messages should be formatted for more detailed information, so I polished those error messages.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95052
Approved by: https://github.com/bdhirsh
2023-02-27 20:12:09 +00:00
Edward Z. Yang
4833e47feb Add support for nonzero, some improvements to reduce guards (#95387)
This takes the strategy described in https://docs.google.com/document/d/1lFRYAJo5nrfxRhwIzGnfi2pbLpU6T4ytSRSuLJ5qebI/edit#

It is essentially https://github.com/pytorch/pytorch/pull/95222 but squashed and with changes that are unnecessary given that we assume nonzero returns > 1.

What's in the PR:

* nonzero now supports meta propagation. When `capture_dynamic_output_shape_ops` is set, it will return a tensor with an unbacked SymInt representing the size in question (see the sketch after this list)
* The unbacked SymInt is UNSOUNDLY assumed to be not equal to 0/1. We will still error if you guard otherwise.
* PrimTorch pointwise operators are updated to use empty_permuted, to avoid guarding on unbacked SymInt from empty_strided (tested in `test_dynamic_pointwise_scalar`)
* Convolution is updated to skip backend selection if batch is unbacked, to avoid guarding on unbacked SymInt (tested in `test_unbacked_batch_resnet`)
* I kept the helper utilities like `definitely_true` for working with possibly unbacked SymInts. They're not used right now but maybe someone will find them useful.
* Added `constrain_unify` to let you specify two unbacked SymInts must have the same value
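A minimal sketch of the data-dependent-shape path described above (hypothetical function; uses the `eager` dynamo backend, with the config flag named in the PR, so nonzero's output length is traced as an unbacked SymInt):

```python
import torch
import torch._dynamo

torch._dynamo.config.capture_dynamic_output_shape_ops = True

@torch.compile(backend="eager")
def count_hits(x):
    nz = torch.nonzero(x)     # output size depends on the data (unbacked SymInt)
    return (nz + 1).sum()

print(count_hits(torch.tensor([0.0, 1.5, 0.0, 2.5])))
```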

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95387
Approved by: https://github.com/voznesenskym
2023-02-24 00:27:45 +00:00
Yanan Cao (PyTorch)
039b4c8809 Add meta function for _upsample_bilinear2d_aa (#94982)
Differential Revision: D43353000

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94982
Approved by: https://github.com/ezyang
2023-02-19 07:11:20 +00:00
Rodrigo Kumpera
e22d791287 [PTD] Introduce tracing friendly collectives. (#93990)
This change adds torch.distributed.traceable_collectives.

This experimental API enables collectives to be fully traced by dynamo and FX.

See #93173 for the RFC

Pull Request resolved: https://github.com/pytorch/pytorch/pull/93990
Approved by: https://github.com/wconstab, https://github.com/wanchaol, https://github.com/H-Huang
2023-02-16 15:35:01 +00:00
PyTorch MergeBot
641dc0b844 Revert "[quant] Add quantize and dequantize operators to decomposition table (#93312)"
This reverts commit 782e4f5c02.

Reverted https://github.com/pytorch/pytorch/pull/93312 on behalf of https://github.com/jeanschmidt due to this commits breaks internal builds: https://fburl.com/sandcastle/dw0rqcbv
2023-02-13 09:20:37 +00:00
Jerry Zhang
782e4f5c02 [quant] Add quantize and dequantize operators to decomposition table (#93312)
Summary:
This PR tries to decompose the operators in the torch.ops.quantized_decomposed namespace into more
primitive aten operators; this would free us from maintaining the semantics of the quantize/dequantize
operators, which can be expressed more precisely in terms of the underlying aten operators.

Note: this PR just adds them to the decomposition table; we haven't enabled this by default yet.
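A hedged sketch of the idea (not the exact registered decomposition; the helper name is hypothetical): a per-tensor quantize expressed with primitive aten ops only.

```python
import torch

def quantize_per_tensor_decomp(x, scale, zero_point, quant_min, quant_max, dtype):
    # scale, round, shift by zero point, clamp to the quantized range, then cast
    q = torch.round(x / scale) + zero_point
    return torch.clamp(q, quant_min, quant_max).to(dtype)

x = torch.randn(4)
print(quantize_per_tensor_decomp(x, 0.1, 0, -128, 127, torch.int8))
```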

Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_q_dq_decomposition

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/93312
Approved by: https://github.com/vkuzo, https://github.com/SherlockNoMad
2023-02-10 01:40:12 +00:00
PyTorch MergeBot
3a5a762443 Revert "[quant] Add quantize and dequantize operators to decomposition table (#93312)"
This reverts commit 3fd46a2f9c.

Reverted https://github.com/pytorch/pytorch/pull/93312 on behalf of https://github.com/huydhn due to Sorry for reverting your PR, but it breaks trunk due to a landrace 3fd46a2f9c.  Please rebase and re-land it
2023-02-08 18:29:10 +00:00
Jerry Zhang
3fd46a2f9c [quant] Add quantize and dequantize operators to decomposition table (#93312)
Summary:
This PR tries to decompose the operators in the torch.ops.quantized_decomposed namespace into more
primitive aten operators; this would free us from maintaining the semantics of the quantize/dequantize
operators, which can be expressed more precisely in terms of the underlying aten operators.

Note: this PR just adds them to the decomposition table; we haven't enabled this by default yet.

Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_q_dq_decomposition

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/93312
Approved by: https://github.com/vkuzo, https://github.com/SherlockNoMad
2023-02-08 17:26:01 +00:00
Peter Bell
5817695bfa [pt2] Fix arange to match ATen behavior (#93353)
Fixes #92676

`arange` infers the output dtype from the argument types, but in order to reduce
falling back to ATen, inductor preferred to cast whole-number float arguments to
int, which gave the wrong output dtype. Instead, this decomposes floating-point
arange into the prim equivalent for integers.

This also changes the signature of `prims.arange` to

```python
prims.iota(length, *, start, step, **factory_kwargs)
```

which only supports integer arguments. This is done because calculating the
output size from `start, end, step` is surprisingly complex and liable to off-by-one
errors, so it should not be duplicated in each backend.
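A hedged sketch of the decomposition idea in plain Python (hypothetical helper, not the registered decomposition itself): a floating-point arange can be built from an integer iota plus scalar arithmetic, so only the integer primitive needs a backend implementation.

```python
import math
import torch

def float_arange(start, end, step, dtype=torch.float32):
    length = math.ceil((end - start) / step)   # output size computed once, up front
    idx = torch.arange(length)                 # integer "iota"
    return (start + idx * step).to(dtype)

print(float_arange(0.0, 1.0, 0.25))   # tensor([0.0000, 0.2500, 0.5000, 0.7500])
```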

Pull Request resolved: https://github.com/pytorch/pytorch/pull/93353
Approved by: https://github.com/ngimel, https://github.com/lezcano
2023-02-03 00:44:32 +00:00
Michael Suo
4e4293f15f Add meta registration for bucketize (#93893)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/93893
Approved by: https://github.com/zhxchen17
2023-02-02 21:03:08 +00:00
Driss Guessous
653dc73df0 [SDPA] Wire up FlashAttention's backward (#92917)
# Summary
This PR creates _flash_attention_backward and _scaled_dot_product_flash_attention_backward native functions and registers them to the respective derivatives.yaml.

The goal is to replicate the torch.autograd.Function defined in the FlashAttention repo [here](33e0860c9c/flash_attn/flash_attn_interface.py (L126)) natively in PyTorch. One thing we don't have access to in native PyTorch is ctx.save_for_backward, so in order to save these variables I extended the objects returned from the forward functions.

### MetaFunctions
I also updated the FlashAttention meta functions to now mirror the real outputs, and added a meta registration for backwards. I have an XLMR training script, and while eager training now works with FlashAttention, compiling this module fails with the inductor error down below.

### Questions?
Performance issues vs mem efficient when using torch.nn.mha_forward

TorchCompile -> See proposed solution below.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92917
Approved by: https://github.com/cpuhrsch
2023-02-02 04:02:30 +00:00
Driss Guessous
df14650f0b [SDPA] Update SDPA API and make function Public (#92189)
# Summary
In preparation for the PT 2.0 launch, this PR updates SDPA's API and makes the function a public nn.functional function.

## Changes
### API
Previously the function signature was:
`scaled_dot_product_attention(query, key, value, attn_mask=None, need_attn_weights=False, dropout_p=0.0, is_causal=False) -> (Tensor, Tensor)`
Updated signature:
`scaled_dot_product_attention(query, key, value, attn_mask=None, dropout_p=0.0, is_causal=False) -> Tensor`

This PR removes the need_attn_weights optional boolean variable and updates the return type to a singular tensor.

#### Reasoning:
The main goal of this function is to provide an easy interface for users to call into fused attention kernels, e.g. FlashAttention. The fused kernels do not currently support an arbitrary attn_mask or dropout, but there is a PR to mem-efficient attention to enable these. We want to have the API surface ready for when the backing kernels get updated.

The fused kernels save on memory usage by not materializing the weights, and it is unlikely that a fast fused implementation will enable this feature, so we are removing it.

Discussed with folks at FAIR/Xformers and +1 this API change.

#### Make function Public
In preparation for the PT 2.0 launch, we make the function public to start generating user feedback.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92189
Approved by: https://github.com/cpuhrsch
2023-01-23 20:50:46 +00:00
Brian Hirsh
76cb2d0ede fix incorrect _embedding_bag meta (#92549)
Fixes https://github.com/pytorch/pytorch/issues/92286. See the issue for diagnosis.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92549
Approved by: https://github.com/albanD, https://github.com/eellison
2023-01-18 22:50:31 +00:00
Avik Chaudhuri
bb11e072ae Squash and merge linalg meta kernels (#92335)
Squashed changes from https://github.com/pytorch/pytorch/pull/92021 and https://github.com/pytorch/pytorch/pull/92020 and https://github.com/pytorch/pytorch/pull/92019

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92335
Approved by: https://github.com/avikchaudhuri
2023-01-18 05:55:52 +00:00
yanbing-j
94a7c01159 Enable oneDNN implementation in LSTM op (#91158)
### Description
This PR enables a oneDNN implementation in the LSTM op to improve its performance. Both FP32 and BF16 are supported.

### Performance improvement
On CPX 28C, with iomp and jemalloc set.
We chose 8 LSTM input options (including input_size, hidden_size, num_layers, bidirectional, bias, batch_first, dropout, batch_size, seq_len), and the final option is a real input from train-clean-100 in the LibriSpeech dataset. The performance improvements are shown in the following figures. We can see that LSTM with the oneDNN implementation performs better than the original.

In single socket:
![image](https://user-images.githubusercontent.com/61222868/211182994-833debec-518a-4b35-8504-6b0fadb17930.png)

![image](https://user-images.githubusercontent.com/61222868/211183012-31e1253f-2c60-4c92-a656-c239a971b453.png)

In single core:
![image](https://user-images.githubusercontent.com/61222868/211183017-186e5d47-cb9a-4c1e-914f-fa718e769f1c.png)

![image](https://user-images.githubusercontent.com/61222868/211183022-53266857-5a9e-4a95-b300-33fa34811d08.png)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91158
Approved by: https://github.com/jgong5, https://github.com/malfet
2023-01-18 04:41:18 +00:00
Driss Guessous
f219970990 Return empty attention weights when need_atten_weights = False (#91782)
# Summary
This PR updates the second return value from SDPA to be an empty tensor of size 0, rather than what it would be if need_attn_weights were True. It also updates the meta function to account for this change.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91782
Approved by: https://github.com/cpuhrsch
2023-01-06 19:06:48 +00:00
Xia, Weiwen
de9c82f41a [Meta] Register aten.pixel_shuffle.default for meta (#91605)
**Summary**
Fixes #91551
`aten.pixel_shuffle.default` is not registered for meta and it always generates contiguous (channels-first) layout of outputs. It can be reproduced by `torch.compile` (as described in the issue #91551) and running in FakeTensorMode.

**Test plan**
python test/inductor/test_torchinductor.py -k test_pixel_shuffle_channels_last
python test/test_proxy_tensor.py

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91605
Approved by: https://github.com/jgong5, https://github.com/mingfeima, https://github.com/anijain2305
2023-01-06 00:45:14 +00:00
Peter Bell
ad7aefb608 Fix Meta tests for FFT functions (#91628)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91628
Approved by: https://github.com/kit1980
2023-01-05 00:58:26 +00:00
XiaobingSuper
dfb651452a inductor: meta registration for mkldnn ops (#91299)
Fixes https://github.com/pytorch/torchdynamo/issues/198 by adding meta tensor support for the conv/linear fused ops to reduce compilation time.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91299
Approved by: https://github.com/jgong5, https://github.com/EikanWang, https://github.com/jansel
2023-01-03 14:24:36 +00:00
Joel Schlosser
1c40ec46ff Decomps and meta registrations for upsample_nearest 1D / 2D / 3D (#91260)
Adds decompositions and meta registrations for the 1D, 2D, and 3D implementations of `upsample_nearest`. All related OpInfo-based tests for AOTAutograd now pass.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91260
Approved by: https://github.com/ezyang
2022-12-28 16:03:25 +00:00
Yanbo Liang
789b1437e9 Fix meta registration for aten._cudnn_rnn (#91333)
Found this issue from [weekly running 7k github models](https://github.com/pytorch/torchdynamo/issues/1884). This caused a regression on the pass rate; 25 models failed due to this issue.
The reason is that the argument ```cx``` of ```aten._cudnn_rnn``` can be ```None```, but this case is not handled well in the meta registration, so it throws the following error:
```
Traceback (most recent call last):
  File "/scratch/ybliang/work/repos/pytorch/torch/_dynamo/utils.py", line 1059, in run_node
    return nnmodule(*args, **kwargs)
  File "/scratch/ybliang/work/repos/pytorch/torch/nn/modules/module.py", line 1482, in _call_impl
    return forward_call(*args, **kwargs)
  File "/scratch/ybliang/work/repos/pytorch/torch/nn/modules/rnn.py", line 477, in forward
    result = _VF.rnn_tanh(input, hx, self._flat_weights, self.bias, self.num_layers,
  File "/scratch/ybliang/work/repos/pytorch/torch/_subclasses/fake_tensor.py", line 916, in __torch_dispatch__
    r = func(*args, **kwargs)
  File "/scratch/ybliang/work/repos/pytorch/torch/_ops.py", line 284, in __call__
    return self._op(*args, **kwargs or {})
  File "/scratch/ybliang/work/repos/pytorch/torch/_meta_registrations.py", line 2108, in _cudnn_rnn
    cy = cx.new_empty(0 if cx is None else cell_shape)
AttributeError: 'NoneType' object has no attribute 'new_empty'
```
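A hedged sketch of the guard needed (simplified, hypothetical helper name; the actual registration differs): only call `new_empty` on `cx` when it is actually a tensor.

```python
import torch

def make_cy(cx, cell_shape):
    # `cx` is None for plain RNN/GRU, so guard before calling tensor methods on it
    if cx is None:
        return None
    return cx.new_empty(cell_shape)

print(make_cy(None, (1, 2, 8)))                                       # None
print(make_cy(torch.empty(1, 2, 8, device="meta"), (1, 2, 8)).shape)  # torch.Size([1, 2, 8])
```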

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91333
Approved by: https://github.com/ezyang
2022-12-23 22:59:31 +00:00
Brian Hirsh
c47bdd7522 *_scatter ops should preserve input stride/storage_offset (#91029)
It turns out that we *do* need to update the *_scatter ops to return the exact same strides as their inputs. I added a test to `test/test_functionalization.py`, which now trips thanks to Ed's functionalization stride debugging check. It only actually ends up causing silent correctness issues if you try to call .backward() on that function.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91029
Approved by: https://github.com/ezyang
2022-12-22 19:41:53 +00:00
Joel Schlosser
3226209636 LSTM SymInt-aware changes & meta registration (cuDNN) (#90944)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90944
Approved by: https://github.com/ezyang
2022-12-16 21:42:32 +00:00
Joel Schlosser
b0cda0b38c LSTM SymInt-aware changes & meta registration (non-cuDNN CUDA) (#90701)
Adds meta registrations for cuDNN and vanilla CUDA ops underneath `lstm()` and makes the logic SymInt-aware.
TODO:
* cuDNN side does some [nasty stuff](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/cudnn/RNN.cpp#L1567) with buffers; this needs larger redesign to figure out
* Indicate that AOT Autograd can be used when an LSTM is present (remove the check for this once it's fully supported)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90701
Approved by: https://github.com/ezyang
2022-12-16 18:08:45 +00:00
Driss Guessous
51c6c5e156 [SDPA] Standardizes the return shape for dense tensor of SDPA regardless of fused kernel called (#90776)
# Summary
Continues to fix up the meta output story of SDPA to be more correct

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90776
Approved by: https://github.com/cpuhrsch
2022-12-14 18:08:02 +00:00
Driss Guessous
42a5f6ee5d Create stub function for doing SDPA cpp and cuda dispatch (#90576)
## Summary
torch.compile was previously not working for TransformerEncoder because torch SDPA calls a native function on tensors that returns an int. This PR instead creates a dispatch stub for the called function so that a separate fx node is not created for this native function.
This PR also adds meta functions for the fused kernels.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90576
Approved by: https://github.com/cpuhrsch
2022-12-13 03:19:40 +00:00
Peter Bell
79406378ae [primTorch] Add prim and ref for as_strided_scatter (#88426)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88426
Approved by: https://github.com/mruberry
2022-12-08 00:17:39 +00:00
Sherlock Huang
42705bd7b3 Disallow registering meta function for CompositeImplicitAutograd ops (#90222)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90222
Approved by: https://github.com/ezyang
2022-12-06 04:22:31 +00:00
Yanbo Liang
e1532af0bb Fix meta registration for aten._cdist_forward (#90042)
Error from [7k github model](https://github.com/pytorch/torchdynamo/issues/1884).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90042
Approved by: https://github.com/ezyang, https://github.com/eellison
2022-12-02 21:13:52 +00:00
Edward Z. Yang
6fb6eb0a74 Support unspecialized integers with dynamic shapes (#89639)
Previously, we hackily wrapped unspecialized integers into
tensors and treated them as tensor inputs.  Sometimes, downstream
operations would not be able to deal with the tensor input.  Now,
we wrap them into SymInt, so more correct overload selection occurs.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89639
Approved by: https://github.com/anjali411
2022-11-24 22:46:42 +00:00
anjali411
9c0bf9387c Meta impl for linalg_cholesky and linalg_cholesky_ex (#89430)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89430
Approved by: https://github.com/ezyang
2022-11-22 17:05:34 +00:00
Edward Z. Yang
5582001bd5 Reland 2 "Towards unifying symbolic and non symbolic fake tensor (#89038) (#89143)" (#89346)
This reverts commit 8e4c9828f4.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89346
Approved by: https://github.com/wconstab
2022-11-19 21:14:31 +00:00
PyTorch MergeBot
8e4c9828f4 Revert "Reland "Towards unifying symbolic and non symbolic fake tensor (#89038)" (#89143)"
This reverts commit e686b8c3ba.

Reverted https://github.com/pytorch/pytorch/pull/89143 on behalf of https://github.com/ZainRizvi due to This seems to be causing the test_make_fx_symbolic_exhaustive_rad2deg_cpu_float32 and test_make_fx_symbolic_exhaustive_inplace_rad2deg_cpu_float32 test to fail across multiple jobs
2022-11-17 17:02:36 +00:00
Edward Z. Yang
e686b8c3ba Reland "Towards unifying symbolic and non symbolic fake tensor (#89038)" (#89143)
This reverts commit cf6003f046.

Differential Revision: [D41363992](https://our.internmc.facebook.com/intern/diff/D41363992)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89143
Approved by: https://github.com/albanD
2022-11-17 13:55:06 +00:00
PyTorch MergeBot
cf6003f046 Revert "Towards unifying symbolic and non symbolic fake tensor (#89038)"
This reverts commit 37d54239c7.

Reverted https://github.com/pytorch/pytorch/pull/89038 on behalf of https://github.com/ezyang due to executorch segfaults
2022-11-16 16:52:47 +00:00
Edward Z. Yang
37d54239c7 Towards unifying symbolic and non symbolic fake tensor (#89038)
Fake tensor behaves pretty differently depending on whether you have
symbolic shapes or not.  This leads to bugs; for example, we
weren't getting correct convolution_backward strides because we
bypassed the correct stride logic in fake tensor on symbolic
shapes.

This PR attempts to unify the two codepaths.  I don't manage to
unify everything, but I get most of it.  The algorithm is delicate
and I'm still hosing down test failures.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89038
Approved by: https://github.com/anjali411
2022-11-16 14:02:43 +00:00
anjali411
dc40d3f93f Add meta impl for grid_sampler_2d_backward (#88745)
TODO: add an OpInfo

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88745
Approved by: https://github.com/ezyang
2022-11-16 13:01:47 +00:00
anjali411
52be0c42ab meta function for max_pool2d_with_indices_backward (#88743)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88743
Approved by: https://github.com/lezcano, https://github.com/ezyang
2022-11-13 18:31:56 +00:00
anjali411
d615d12289 Add meta impl for topk (#88694)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88694
Approved by: https://github.com/ezyang
2022-11-11 15:28:41 +00:00
anjali411
fc9e36dd42 Add meta support for scalar_tensor and argmax (#88590)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88590
Approved by: https://github.com/albanD
2022-11-11 01:31:00 +00:00
Sherlock Huang
133e61af7a OpOverload is_view (#88722)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88722
Approved by: https://github.com/ezyang
2022-11-09 19:03:12 +00:00
Edward Z. Yang
d81797e845 Meta function for aten.sort and aten.scatter* (#88705)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88705
Approved by: https://github.com/ezyang
2022-11-09 17:47:14 +00:00
Fabio Rocha
652af5ec15 upsample_*.vec ops are now CompositeImplicit (#85638)
It was previously CompositeExplicit but it was not really necessary.
See discussion in https://github.com/pytorch/pytorch/issues/85405

Pull Request resolved: https://github.com/pytorch/pytorch/pull/85638
Approved by: https://github.com/ezyang, https://github.com/lezcano, https://github.com/malfet, https://github.com/jansel
2022-11-09 09:58:04 +00:00
Edward Z. Yang
f0e6cea2ed Meta registrations for inplace operators (#88678)
Also, handle non-default alpha correctly.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88678
Approved by: https://github.com/SherlockNoMad, https://github.com/albanD
2022-11-09 01:27:01 +00:00
Edward Z. Yang
a880ddc164 Meta implementation for unsqueeze_ (#88675)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88675
Approved by: https://github.com/SherlockNoMad
2022-11-09 01:27:01 +00:00
Edward Z. Yang
1dab35ca1b Meta implementation for bernoulli (#88676)
For some reason bernoulli uses legacy memory format, see linked issue.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88676
Approved by: https://github.com/SherlockNoMad
2022-11-09 01:26:58 +00:00
Edward Z. Yang
6bb7f4f29f Minor error message improvements on meta functions (#88677)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88677
Approved by: https://github.com/SherlockNoMad
2022-11-08 19:16:29 +00:00
Edward Z. Yang
245144a636 Propagate layout and pin memory in randint to inner constructor (#88673)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88673
Approved by: https://github.com/anjali411
2022-11-08 18:22:30 +00:00
Sherlock Huang
95d57b54e0 Handle pin_memory in refs.randn (#88473)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88473
Approved by: https://github.com/mruberry
2022-11-07 20:25:56 +00:00
Elias Ellison
7d95b1e344 Run all fallback kernels with FakeTensor (#88248)
This improves the memory compression of resnet18 from .84 -> .94 on inductor no-cudagraphs. It does mean that any extern kernel which incorrectly computes strides will now produce a hard error at runtime, but that's an issue we are going to have to face with dynamic shapes anyway. CC @ezyang, @SherlockNoMad
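For context, a minimal illustration of what running a kernel with fake tensors means (this is not the PR's code; it assumes the `FakeTensorMode` API under `torch._subclasses`, which may differ across versions):

```python
import torch
from torch._subclasses.fake_tensor import FakeTensorMode

# Fake tensors carry only metadata (shape, dtype, stride); ops run "for
# metadata" without real computation, so a kernel that reports wrong output
# strides can be caught as a mismatch rather than silently propagating.
mode = FakeTensorMode()
fake_x = mode.from_tensor(torch.randn(8, 16))

with mode:
    fake_y = torch.mm(fake_x, fake_x.t())  # no real matmul is performed

print(fake_y.shape, fake_y.stride(), fake_y.dtype)
```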
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88248
Approved by: https://github.com/ezyang
2022-11-04 02:06:38 +00:00
Brian Hirsh
70782981f0 aot_dispatch test fix: always use functionalization in symbolic tests (#87647)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87647
Approved by: https://github.com/ezyang, https://github.com/Chillee
2022-11-02 14:36:49 +00:00
Tugsbayasgalan Manlaibaatar
2c7de4a144 Add meta implementation for aten.max.dim (#88005)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88005
Approved by: https://github.com/Chillee, https://github.com/bdhirsh
2022-11-01 18:37:24 +00:00
Sherlock Huang
c368c0faf0 Fix meta for aten.fill, constant_pad_nd, _adaptive_avg_pool2d (#88069)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88069
Approved by: https://github.com/ngimel, https://github.com/malfet
2022-11-01 15:36:06 +00:00
Sherlock Huang
c7ac333430 Fix args for meta__fused_moving_avg_obs_fq_helper (#88058)
Fixes https://github.com/pytorch/torchdynamo/issues/1802

There are a few problems:
1. torch.fused_moving_avg_obs_fake_quant doesn't have an OpInfo test.
2. self.empty_like() is not a valid call; it should be torch.empty_like(self).
3. The python meta function has some unexplained behavior for arguments whose default value is of bool type.

In particular, problem 3 is the most concerning one.
**UPDATE: This is expected behavior, see discussion below for explanation.**

Without setting default values for `per_row_fake_quant` and `symmetric_quant`, we get the following error when running with meta tensors.
```
meta__fused_moving_avg_obs_fq_helper() missing 2 required positional arguments: 'per_row_fake_quant' and 'symmetric_quant'
```
I can fix this by adding default values to these two args (see the sketch below). However, I observed something strange when examining the actual values in the meta function.

```
    print("per_row_fake_quant", per_row_fake_quant)
    print("symmetric_quant", symmetric_quant)
```

When the default values are False, the printed values correctly reflect the argument values populated from the call site.
When the default values are True, the printed values are ALWAYS True, regardless of the values populated from the call site.
When the default values are None, the printed value is `None` when the call site sets the value to 'False', and 'True' when the call site sets the value to 'True'.

I also verified that this bug affects other meta functions with default args....

My speculation is that this has something to do with pybind value packing when calling from the c++ dispatcher into the python meta function, and with default value parsing for python meta functions (and other python dispatch functions).

I tried to find the c++ call stack, but gdb is missing symbols and the C++ stacktrace is not working properly... I'd appreciate it if anyone could point me to the source file for pybind value packing.
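For reference, a hedged sketch of the default-value fix mentioned above (the function name and signature are abbreviated and hypothetical, not the op's exact schema):

```python
import torch

# Illustrative only: the trailing bool arguments now carry defaults, which
# resolves the "missing 2 required positional arguments" error quoted above.
def meta_fused_helper_sketch(self, per_row_fake_quant=False, symmetric_quant=False):
    # Use torch.empty_like(self), not self.empty_like(), which is not a valid call.
    output = torch.empty_like(self)
    mask = torch.empty_like(self, dtype=torch.bool)
    return output, mask
```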

cc @ezyang
cc @bdhirsh. I know you had a fix in the symbolic shape branch...
cc @yanboliang  who reported this bug
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88058
Approved by: https://github.com/bdhirsh, https://github.com/yanboliang
2022-10-31 19:00:16 +00:00
Sherlock Huang
0a4ca9d083 Fix meta for aten.angle and aten.index_copy (#88066)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88066
Approved by: https://github.com/albanD
2022-10-31 17:11:29 +00:00
Sherlock Huang
e8a97a3721 FakeTensorMode and Prims.add/sub/mul/div support scalar only inputs (#87759)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87759
Approved by: https://github.com/ngimel, https://github.com/mruberry, https://github.com/eellison
2022-10-28 04:34:25 +00:00
Sherlock Huang
b21fe312c0 Fix meta for index_add and index_put (#87775)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87775
Approved by: https://github.com/ezyang, https://github.com/ngimel
2022-10-26 20:33:23 +00:00
Sherlock Huang
0b162f5b49 Fix stride for prims.where (#87563)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87563
Approved by: https://github.com/ngimel, https://github.com/mruberry
2022-10-25 21:22:50 +00:00
Sherlock Huang
ece3758afc Fix _refs for aten.zeros/ones/empty/randn (#87569)
The refs for aten.zeros/ones/empty/randn don't support the .names overload.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87569
Approved by: https://github.com/ngimel
2022-10-25 20:06:57 +00:00
Sherlock Huang
eb99c1efce Prefer python meta function over c++ meta function (#87426)
This is a policy update for meta registration. **We now prefer the python meta implementation over the C++ meta function.**  This is a flip of the previous policy, where we preferred the C++ meta function over the python meta function if both existed.

Here's the meta registration process:
1. register_meta and register_decomposition will place the python meta/decomp functions into the `global_decomp_table`.  However, they will NOT register them into the dispatcher.
2. After `global_decomp_table` is populated, we will compile an `active_meta_table` (a sketch of this step follows the list below). For a given op, we pick the most specific decomp function from `global_decomp_table` in the preference order of Meta > PostAutograd > PreAutograd.
3. We will unconditionally register all of them into the python dispatcher, and register them into the C++ dispatcher unless it is one of the following 3 cases:
- 1. the op is a CompositeImplicitAutograd and should rely on the decomposed op's meta
- 2. the op is a view op, as the MetaTensor doesn't support aliased storage
- 3. the op is in the blocklist (due to UT failures; we will burn down this list op by op)
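A minimal sketch of the table-selection step described in item 2 (the names and data layout here are assumptions for illustration, not PyTorch's actual internals):

```python
# Pick the most specific registration per op, in the preference order
# Meta > PostAutograd > PreAutograd. The first hit in PREFERENCE wins.
PREFERENCE = ("Meta", "PostAutograd", "PreAutograd")

def compile_active_meta_table(global_decomp_table):
    # global_decomp_table: {kind: {op_overload: python_fn}} (assumed layout)
    active = {}
    all_ops = {op for table in global_decomp_table.values() for op in table}
    for op in all_ops:
        for kind in PREFERENCE:
            fn = global_decomp_table.get(kind, {}).get(op)
            if fn is not None:
                active[op] = fn
                break
    return active
```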

Over the long run, we wish to implement all meta functions in python. With this PR, 321 op_overloads will have their cpp meta overridden by python meta. There are still 400 op_overloads using cpp meta. The exact list can be found here https://gist.github.com/SherlockNoMad/d20bb736178df8eebd3b054c8bb7cdc5

cc @ngimel @jansel @lezcano @fdrocha @mlazos @soumith @voznesenskym @yanboliang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87426
Approved by: https://github.com/ezyang, https://github.com/jansel
2022-10-25 16:49:02 +00:00
lezcano
faf9c47abb Simplify a few diagonal-related functions (#87180)
`diag` was implemented as a kernel rather than as a composite function,
which made it unnecessarily involved (an explicit backward and everything that entails).

We also replace a few uses of `diag` on 2D tensors with `diagonal()`. The
latter returns a view rather than creating a new tensor.
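A small example of the view-vs-copy distinction (illustrative; not part of this PR):

```python
import torch

x = torch.arange(9.0).reshape(3, 3)

d_view = torch.diagonal(x)  # returns a view into x
d_copy = torch.diag(x)      # materializes a new tensor holding the diagonal

x[0, 0] = 100.0
print(d_view[0].item())  # 100.0 -- the view sees the in-place update
print(d_copy[0].item())  # 0.0   -- the copy does not
```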

We also upgrade its meta implementation to a fully-fledged
decomposition.

I tried implementing the backwards of `diagonal()` via `diag_scatter` (or better `diag_scatter_` to keep the perf) but functionalisation was failing and I was not sure how to fix this, so I moved on. It may be possible to simplify that one as well if @soulitzer or someone knows how to do this.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87180
Approved by: https://github.com/ngimel, https://github.com/albanD, https://github.com/mruberry
2022-10-24 06:11:53 +00:00
Sherlock Huang
f3f1b44778 Fix meta for meta_fill_ (#87493)
The existing meta_fill_ doesn't correctly reflect the aliasing relationship for aten.fill. A new MetaTensor should be returned instead.
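A hedged sketch of the intended behavior (the function name and body are illustrative, not the actual registration):

```python
import torch

# Out-of-place aten.fill produces a fresh tensor, so its meta function should
# allocate a new meta tensor rather than returning (an alias of) the input.
def meta_fill_sketch(self, value):
    return torch.empty_like(self)
```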
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87493
Approved by: https://github.com/eellison, https://github.com/bdhirsh
2022-10-22 12:41:03 +00:00
Edward Z. Yang
d73d4aa7de Audit for error prone isinstance int/float and add lint (#87345)
We recently fixed a bug on the symbolic-shapes branch where
an isinstance(x, int) test failed when passed a SymIntNode.
To prevent this, I've added a lint for all the codepaths
where we may pass SymInt/SymFloat directly: it rejects
direct isinstance int/float tests and asks for one of the
aliases instead.  The lint rule explains the options.  I then
went and fixed all of the existing violations.
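A hedged sketch of the pattern the lint rejects and the alias-based replacement (the alias below is a stand-in, not necessarily the one the lint rule names, and the symbolic-int class name varies across versions):

```python
import torch

# Stand-in alias that accepts both concrete and symbolic integers.
IntLikeType = (int, torch.SymInt)

def bad_check(x):
    return isinstance(x, int)          # rejected by the lint: fails for symbolic ints

def good_check(x):
    return isinstance(x, IntLikeType)  # accepts int and SymInt alike
```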

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87345
Approved by: https://github.com/bdhirsh, https://github.com/albanD
2022-10-21 15:55:24 +00:00
Sherlock Huang
f7da9db9c1 Unify decomp registries into global_decomposition_table (#86857)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86857
Approved by: https://github.com/ezyang
2022-10-20 21:29:05 +00:00
albanD
254b681dc6 Convert torch.Size() argument to sym size in test_proxy_tensor (#87304)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87304
Approved by: https://github.com/ezyang
2022-10-20 14:20:19 +00:00
anjali411
6351220573 Add meta support for _adaptive_avg_pool2d_backward (#86359) (#87074)
This reverts commit 3edf79dc03.

Reland of https://github.com/pytorch/pytorch/pull/86359
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87074
Approved by: https://github.com/ezyang
2022-10-17 16:15:04 +00:00
albanD
b40f4434ac conv backward impl (#87047)
~~Waiting for test run to see if this backward is actually exercised.
If not, I will add test before merging.~~
Test updated. Ready to go now.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87047
Approved by: https://github.com/ezyang
2022-10-17 13:14:12 +00:00
albanD
86c2e44cb6 meta funcs for avg_pool2d and avg_pool2d_backward (#87043)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87043
Approved by: https://github.com/ezyang
2022-10-17 13:14:10 +00:00
albanD
3a4c0900c7 Reland 3 of Merge more symbolic meta kernels and symint changes from branch (#86795)
Take 3
Contains:
- symintification of split*
- floor support on SymFloat
- pad_backward, gather, scatter meta
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86795
Approved by: https://github.com/z-a-f
2022-10-17 02:09:40 +00:00
Brian Hirsh
34c86adec4 symintify all of derivatives.yaml (#86610)
Big-bang PR to symintify **all** .sizes() calls in derivatives.yaml, which will be needed for symbolic tracing.

* with the exception of `split()`, which is tougher to land because it requires internal changes.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/86610
Approved by: https://github.com/albanD
2022-10-14 20:15:48 +00:00