pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
SS-JIA	9606cda64e	[core ATen IR] Add decompositions for max, min, var_mean (#110906 ) ## Context Add decompositions for `aten.max`, `aten.min`, and `aten.var_mean`. These operators follow a pattern of returning a tuple of outputs from two component operators: ``` aten.max(x) -> return aten.amax(x), aten.argmax(x) aten.min(x) -> return aten.amin(x), aten.argmin(x) aten.var_mean(x) -> return aten.var(x), aten.mean(x) ``` For `var_mean`, the `refs` implementation was doing something similar, so I changed it to call `torch.` ops instead like was done for other `refs` implementations previously. cc: @peterbell10 @lezcano Note that Inductor lowers all these directly, so they are excluded from the Inductor decomp table. Pull Request resolved: https://github.com/pytorch/pytorch/pull/110906 Approved by: https://github.com/manuelcandales	2023-10-11 00:06:24 +00:00
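The tuple-of-two-ops pattern described in #110906 can be illustrated without PyTorch. The following is a minimal sketch on plain Python lists standing in for 1-D tensors; the helper names are hypothetical, only loosely mirror the `aten` op names, and the variance is assumed to be the unbiased (sample) variance.

```python
import statistics


def amax(x):
    """Largest value (stand-in for aten.amax on a 1-D input)."""
    return max(x)


def argmax(x):
    """Index of the first occurrence of the largest value."""
    return x.index(max(x))


def amin(x):
    """Smallest value (stand-in for aten.amin on a 1-D input)."""
    return min(x)


def argmin(x):
    """Index of the first occurrence of the smallest value."""
    return x.index(min(x))


def max_decomp(x):
    # max -> (amax, argmax): two component ops, one tuple result
    return amax(x), argmax(x)


def min_decomp(x):
    # min -> (amin, argmin)
    return amin(x), argmin(x)


def var_mean_decomp(x):
    # var_mean -> (var, mean); sample variance is an assumption of this sketch
    return statistics.variance(x), statistics.fmean(x)
```

Each decomposed op simply forwards to its two component ops and packs the results, which is why a backend that already supports the components needs no dedicated kernel for the combined op.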
Kazuaki Ishizaki	fde28fdc8c	Fix typo under torch/_decomp directory (#110821 ) This PR fixes typos in comments in files under the `torch/_decomp` directory. Pull Request resolved: https://github.com/pytorch/pytorch/pull/110821 Approved by: https://github.com/Skylion007	2023-10-08 20:33:49 +00:00
Stephen Jia	c2e7a0d689	[core IR] Add decomps for `aten.sum` and `aten.squeeze` variants (#110645 ) Summary: ## Context Both `aten.sum` and `aten.squeeze` have a "most generic" variant in the form of `aten.sum.dim_IntList` and `aten.squeeze.dims` respectively. Add decompositions for other non generic variants of these operators to express them using the most generic variant. Note that to register these decomps, the reference implementation under `_refs` had to be removed as registered decompositions. cc: @lezcano @peterbell10 Test Plan: Github CI + Meta Internal CI Differential Revision: D49965952 Pull Request resolved: https://github.com/pytorch/pytorch/pull/110645 Approved by: https://github.com/peterbell10, https://github.com/digantdesai, https://github.com/manuelcandales	2023-10-07 04:21:51 +00:00
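The idea in #110645 of expressing the non-generic variants through the most generic one can be sketched on shapes alone. This is a minimal illustration for `squeeze`, operating on shape tuples rather than tensors; the function names are hypothetical and are not the registered decompositions.

```python
def squeeze_dims(shape, dims):
    """Most generic variant: drop every listed dim whose size is 1."""
    ndim = len(shape)
    if ndim == 0:
        return shape
    wrapped = {d % ndim for d in dims}  # normalise negative dims
    return tuple(
        size for i, size in enumerate(shape)
        if not (i in wrapped and size == 1)
    )


def squeeze_default(shape):
    # No dim given: ask the generic variant to consider every dim.
    return squeeze_dims(shape, range(len(shape)))


def squeeze_dim(shape, dim):
    # Single dim: wrap it in a one-element list.
    return squeeze_dims(shape, [dim])
```

Only `squeeze_dims` contains real logic; the other two variants are one-line forwards, so a backend needs to implement just the generic form.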
cdzhan	7cc0020a80	[decomp] Fix different return type in threshold_backward vs. eager (#110689 ) due to type promotion with floating point scalar in decompositions.py Fixes part of #100838 Pull Request resolved: https://github.com/pytorch/pytorch/pull/110689 Approved by: https://github.com/ezyang	2023-10-06 20:59:58 +00:00
chilli	ceb773b68d	Fix #110680 (requires_grad typo in decomp) (#110687 ) Fixes https://github.com/pytorch/pytorch/issues/110680 Pull Request resolved: https://github.com/pytorch/pytorch/pull/110687 Approved by: https://github.com/voznesenskym, https://github.com/lezcano ghstack dependencies: #110501, #110504, #110591, #110668	2023-10-06 10:36:01 +00:00
Jerry Zhang	f2a1b93549	Back out "[quant] Support integer implementations for adaptive_avg_pool2d (#104226 )" (#110316 ) Summary: Original commit changeset: acdb5b34e3aa Original Phabricator Diff: D47321689 Test Plan: opinfo tests in CI Differential Revision: D49789403 Pull Request resolved: https://github.com/pytorch/pytorch/pull/110316 Approved by: https://github.com/kimishpatel	2023-10-03 16:59:23 +00:00
Peter Bell	be3b16daad	[decomp] Fix baddbmm decomposition (#109714 ) The decomposition is currently registered without the pw_cast_for_opmath decorator, due to the ordering of decorators being meaningful. Pull Request resolved: https://github.com/pytorch/pytorch/pull/109714 Approved by: https://github.com/lezcano	2023-09-28 21:23:44 +00:00
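Why decorator ordering matters for registration, as noted in #109714, can be shown with a toy registry. This is a sketch only: `register` and `cast_to_float` are hypothetical stand-ins and do not reproduce PyTorch's registration or `pw_cast_for_opmath` code.

```python
import functools

registry = {}


def register(name):
    """Store whatever function object this decorator receives."""
    def deco(fn):
        registry[name] = fn
        return fn
    return deco


def cast_to_float(fn):
    """Hypothetical type-promotion wrapper: cast all args to float."""
    @functools.wraps(fn)
    def wrapper(*args):
        return fn(*(float(a) for a in args))
    return wrapper


# Decorators apply bottom-up. Here `register` runs first and stores the
# bare function; the cast wrapper is added afterwards and never reaches
# the registry.
@cast_to_float
@register("bad")
def add_bad(a, b):
    return a + b


# Here the cast wrapper is applied first, so the registry stores the
# wrapped function.
@register("good")
@cast_to_float
def add_good(a, b):
    return a + b
```

Calling `registry["bad"](1, 2)` skips the cast and returns an `int`, while `registry["good"](1, 2)` returns a `float`.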
SS-JIA	5df8aca994	[core IR] Add a core decomposition for floor_divide (#110046 ) ## Context Introduce a core decomposition for `aten.floor_divide` into other `aten` ops, and add it to the core ATen decomposition table. This replaces the decomposition of `floor_divide` that was used by Inductor. I noticed there was a note on that decomposition ``` # TorchInductor-only decomposition. It should not be taken to core. # See https://github.com/pytorch/torchdynamo/pull/1120 ``` but couldn't discern the reason why this is the case. cc: @lezcano Pull Request resolved: https://github.com/pytorch/pytorch/pull/110046 Approved by: https://github.com/peterbell10	2023-09-26 08:39:21 +00:00
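The commit message does not spell out which ops `floor_divide` is decomposed into. As a hedged illustration of the semantics only, the sketch below treats floor division as flooring the true quotient and contrasts it with truncation; both helper names are hypothetical and work on Python scalars, not tensors.

```python
import math


def floor_divide(a, b):
    """Round the true quotient toward negative infinity."""
    return math.floor(a / b)


def trunc_divide(a, b):
    """Round the true quotient toward zero (shown for contrast)."""
    return math.trunc(a / b)
```

The two differ only when the quotient is negative and not an integer: `floor_divide(-7, 2)` gives `-4`, whereas `trunc_divide(-7, 2)` gives `-3`.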
Mwiza Kunda	5c4b5baf21	Fix python decomps for OpOverloadPackets and add tests (#107707 ) - Extend `test_torch_dispatch_meta_outplace` to test torch ops that do not have an out parameter but have aten op overloads that have out parameters. Additionally, Python decompositions may register `OpOverloadPacket`'s so decompositions need to be tested to ensure all `OpOverloads` still function for the `Meta` key (e.g. if a python decomposition is registered for an aten op `aten.foo` with overloads `[default, out]`, the python function needs to support receiving out arguments) - Add out parameter wrappers to python decomps for aten ops that have out overloads CC. @ezyang @albanD @lezcano Fixes #107713 Pull Request resolved: https://github.com/pytorch/pytorch/pull/107707 Approved by: https://github.com/lezcano	2023-09-25 20:53:30 +00:00
SS-JIA	7de669f2f9	[core IR] Remove trunc decomp and add trunc to core (#109902 ) Following up from [this comment](https://github.com/pytorch/pytorch/pull/109319#discussion_r1330803226). Remove the decomposition for `trunc`, and add it as a core operator. Going forward, provide similar treatment for operators that map cleanly to hardware instructions. Pull Request resolved: https://github.com/pytorch/pytorch/pull/109902 Approved by: https://github.com/peterbell10	2023-09-25 18:18:06 +00:00
Jijie Wei	334ead04a9	Back out "[decomp] Fix baddbmm decomposition (#109714 )" (#109855 ) Summary: Original commit changeset: 95c462a380c9 Original Phabricator Diff: D49484954 This diff causes a test failure in the deterministic NE test; see https://www.internalfb.com/sandcastle/job/18014399565419856/ Test Plan: buck2 test 'fbcode//mode/opt' fbcode//aps_models/ads/icvr/tests:icvr_fm_e2e_deterministic_ne_test -- --exact 'aps_models/ads/icvr/tests:icvr_fm_e2e_deterministic_ne_test - aps_models.ads.icvr.tests.icvr_fm_e2e_deterministic_ne_test.ICVR_FM_E2EDeterministicNeTest: test_e2e_deterministic_icvr_fm_pt2_fsdp_multi_gpus' https://www.internalfb.com/intern/testinfra/testrun/16888498605839953 Differential Revision: D49527271 Pull Request resolved: https://github.com/pytorch/pytorch/pull/109855 Approved by: https://github.com/yanboliang	2023-09-22 22:01:38 +00:00
Mwiza Kunda	8dedc9dd9b	Add meta tests for layer/group/batch norm backward (#109591 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/109591 Approved by: https://github.com/ezyang	2023-09-21 18:58:51 +00:00
Peter Bell	6f0cf5a837	[decomp] Decompose unsafe_split{,_with_sizes} into safe variants (#109668 ) The "safety" aspect refers to the output not being registered as aliasing the input, but after AOTAutograd I don't think this distinction matters. However, we shouldn't use the same decomposition as the safe variant in case the backend doesn't want to decompose split. Pull Request resolved: https://github.com/pytorch/pytorch/pull/109668 Approved by: https://github.com/lezcano ghstack dependencies: #109667	2023-09-20 18:45:56 +00:00
Peter Bell	36a8105f54	[decomp] Fix baddbmm decomposition (#109714 ) The decomposition is currently registered without the pw_cast_for_opmath decorator, due to the ordering of decorators being meaningful. Pull Request resolved: https://github.com/pytorch/pytorch/pull/109714 Approved by: https://github.com/lezcano	2023-09-20 18:40:21 +00:00
Salil Desai	40b2c796dc	[Decomposition] baddbmm (#108534 ) Summary: Moving decomposition of baddbmm from _inductor/decomposition.py and include it in core_aten_decompositions `ff38c0e2f9/torch/_inductor/decomposition.py (L203)` Test Plan: Phabricator + OSS Tests Differential Revision: D48871741 Pull Request resolved: https://github.com/pytorch/pytorch/pull/108534 Approved by: https://github.com/SherlockNoMad	2023-09-20 12:49:32 +00:00
Salil Desai	2e721aab98	[Decomposition] Trunc (#109319 ) Summary: Add Decomp for Trunc and add it to core_aten_decompositions Differential Revision: D49042033 Pull Request resolved: https://github.com/pytorch/pytorch/pull/109319 Approved by: https://github.com/SherlockNoMad	2023-09-19 13:30:13 +00:00
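The commit message for #109319 does not show the decomposition it adds. One plausible way to express `trunc` through simpler rounding ops, given here purely as an assumption, is to floor positive values and take the ceiling of the rest. The sketch works on Python floats and uses a hypothetical helper name.

```python
import math


def trunc_decomp(x):
    """Truncate toward zero using only floor and ceil."""
    return math.floor(x) if x > 0 else math.ceil(x)
```

For any finite float this agrees with `math.trunc`, for example `trunc_decomp(-2.7)` gives `-2`.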
Jez Ng	db48bc80d9	Check index size during decomp of index_add (#108826 ) This partially fixes the `test_index_add_correctness` test (#108181) when run under inductor: it causes an exception to be raised [here][1] as expected. The test as a whole still cannot be made to pass under inductor because the [last assert][2] still fails, likely due to #108798. [1]: `dec2b267d4/test/test_torch.py (L6049)` [2]: `dec2b267d4/test/test_torch.py (L6051)` Pull Request resolved: https://github.com/pytorch/pytorch/pull/108826 Approved by: https://github.com/eellison	2023-09-13 13:06:26 +00:00
Edward Z. Yang	9f37aec964	Add torch._check_is_size (#108685 ) Check comments for what it does. The key distinction is that if you feed it an unbacked SymInt, we will also apply >= 2 assumption at compile time. This will get exercised when I reland https://github.com/pytorch/pytorch/pull/107788 Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/108685 Approved by: https://github.com/albanD, https://github.com/Skylion007	2023-09-07 12:48:39 +00:00
Huy Do	5a4fe05a15	Revert "Force synced KJT to trace unbacked SymInt (#107788 )" (#108684 ) This reverts commit `3b92ef814d`. So let's manually revert it instead. (Not sure why the bot doesn't work on https://github.com/pytorch/pytorch/pull/107788) Pull Request resolved: https://github.com/pytorch/pytorch/pull/108684 Approved by: https://github.com/ezyang	2023-09-06 19:15:45 +00:00
Kimish Patel	ebed490c2f	[sdpa decomp] change sdpa decomp to be consistent with flash attention (#108608 ) Summary: See the comment in code for the reasons of the change Test Plan: buck2 test executorch/examples/export/test:test_export -- test_vit_export_to_executorch Differential Revision: D48992180 Pull Request resolved: https://github.com/pytorch/pytorch/pull/108608 Approved by: https://github.com/larryliu0820	2023-09-06 15:34:03 +00:00
Edward Z. Yang	3b92ef814d	Force synced KJT to trace unbacked SymInt (#107788 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/107788 Approved by: https://github.com/voznesenskym	2023-09-06 03:18:26 +00:00
Kimish Patel	cc50e654d4	[aten decomp] Update sdpa decom (#108371 ) Summary: Earlier, the decomp was routing the _flash* variant to the _math variant, and this resulted in a failure during torch.export for a reason that I couldn't trace. However, it seems that we should really have a decomp for scaled_dot_product_attention instead of scaled_dot_product_flash_attention. Right? This diff adds that. It also adds a test to check whether a model exported via two-stage export has decomposed the op. This test needs improvement to figure out what the core aten opset is and check for anything that is not in it. Test Plan: test_model_exports_to_core_aten Differential Revision: [D48917461](https://our.internmc.facebook.com/intern/diff/D48917461) Pull Request resolved: https://github.com/pytorch/pytorch/pull/108371 Approved by: https://github.com/larryliu0820	2023-09-03 15:17:08 +00:00
lezcano	239ee76177	Add refs/decomps for dot/vdot (#108194 ) Follow-up on https://github.com/pytorch/pytorch/issues/108127#issuecomment-1698142427 Pull Request resolved: https://github.com/pytorch/pytorch/pull/108194 Approved by: https://github.com/peterbell10 ghstack dependencies: #108188	2023-08-31 15:30:23 +00:00
rzou	0e4752bafc	Allow registering decomps for HigherOrderOp; add decomp for out_dtype (#108080 ) We allow registering decomps for HigherOrderOp via the existing decomp mechanisms: - I refactored those APIs to accept torch._ops.OperatorBase, which is the base class for torch.ops.HigherOrderOperator and torch.ops.OpOverload - HigherOrderOps must directly call maybe_handle_decomp in their ProxyTorchDispatchMode handling in order to resolve decompositions. We can change this in the future so that they do not need to do this. Next, we add an inductor decomp for out_dtype. This decomp shouldn't be generally available because we want to preserve out_dtype to the backend for other use cases (i.e. executorch). Test Plan: - new tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/108080 Approved by: https://github.com/HDCharles	2023-08-31 03:15:38 +00:00
Mengwei Liu	0fb1c05c5a	[pytorch] Add decomp rule for scaled_dot_product_attention (#108180 ) `scaled_dot_product_attention` used to be decomposed in pre-autograd, given that it calls `_scaled_dot_product_attention_math` and `_scaled_dot_product_attention_math` only has a `CompositeImplicitAutograd` kernel. As a result it's decomposed into ops with finer granularity. However recent PRs (#103826 #105131) added new logic in `scaled_dot_product_attention` and now it calls `_scaled_dot_product_flash_attention` which contains a CPU kernel. This results in `_scaled_dot_product_flash_attention` showing up in `torch.export()`. This PR adds a decomposition that ensures `scaled_dot_product_attention` is still being decomposed the same way as before, i.e., going through `_scaled_dot_product_attention_math`. Notice that this decomp rule should be excluded by inductor. Differential Revision: [D48762000](https://our.internmc.facebook.com/intern/diff/D48762000/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/108180 Approved by: https://github.com/SherlockNoMad	2023-08-30 15:52:08 +00:00
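What "decomposed into ops with finer granularity" means for attention in #108180 can be sketched with the textbook formula softmax(QK^T / sqrt(d)) V. The code below is a simplified illustration on nested Python lists for a single head; it omits masking, dropout and causal handling, and its function names are hypothetical rather than the signature of `_scaled_dot_product_attention_math`.

```python
import math


def matmul(a, b):
    """Plain matrix product of two nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def softmax(row):
    """Numerically stable softmax over one row."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def sdpa_math(q, k, v):
    """Attention built only from matmul, scaling and softmax."""
    scale = 1.0 / math.sqrt(len(q[0]))
    scores = matmul(q, transpose(k))
    weights = [softmax([s * scale for s in row]) for row in scores]
    return matmul(weights, v)
```

A fused kernel computes the same result in one op; the decomposed form exposes only primitives that every backend is expected to support.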
vfdev-5	0cfc5899f9	[inductor] Improved grid_sampler_2d decomposition for cuda (#104710 )
Description:
- Improved grid_sampler_2d decomposition code to generate single cuda kernel instead of two

Related to https://github.com/pytorch/pytorch/issues/104296

Perfs:
- speed-up on cuda (~x5) and cpu (~x2) for bicubic mode

```
Speed-up PR vs Nightly = ratio between columns "Compiled (2.1.0a0+git52598e9) PR" and "Compiled (2.1.0a0+gitcf76938) Nightly"

[------------------------------ Affine grid sampling, cpu ------------------------------]
                                                                                                   | Eager (2.1.0a0+git52598e9) PR | Compiled (2.1.0a0+git52598e9) PR | Compiled (2.1.0a0+gitcf76938) Nightly | speed-up PR vs Nightly | Eager (2.1.0a0+gitcf76938) Nightly
1 threads: ------------------------------------------------------------------------------
Input: (8, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=True, mode=bilinear  | 38.010 (+-0.118) | 51.466 (+-1.257) | 47.867 (+-0.124) | 0.930 (+-0.000) | 33.654 (+-0.411)
Input: (8, 3, 345, 456) torch.float32, torch.channels_last, align_corners=True, mode=bilinear      | 35.532 (+-0.236) | 52.189 (+-0.093) | 58.979 (+-0.206) | 1.130 (+-0.000) | 32.543 (+-0.198)
Input: (8, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=False, mode=bilinear | 38.187 (+-0.112) | 47.892 (+-0.117) | 45.833 (+-0.081) | 0.957 (+-0.000) | 33.752 (+-0.116)
Input: (8, 3, 345, 456) torch.float32, torch.channels_last, align_corners=False, mode=bilinear     | 36.708 (+-0.244) | 51.680 (+-0.104) | 58.360 (+-0.108) | 1.129 (+-0.000) | 32.576 (+-0.751)
Input: (8, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=True, mode=nearest   | 24.201 (+-0.088) | 27.451 (+-0.059) | 27.937 (+-0.081) | 1.018 (+-0.000) | 24.367 (+-0.074)
Input: (8, 3, 345, 456) torch.float32, torch.channels_last, align_corners=True, mode=nearest       | 19.266 (+-0.105) | 26.070 (+-0.085) | 26.092 (+-0.054) | 1.001 (+-0.000) | 20.144 (+-0.064)
Input: (8, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=False, mode=nearest  | 24.293 (+-0.125) | 26.085 (+-0.064) | 26.575 (+-0.061) | 1.019 (+-0.000) | 24.515 (+-0.095)
Input: (8, 3, 345, 456) torch.float32, torch.channels_last, align_corners=False, mode=nearest      | 19.440 (+-0.075) | 25.252 (+-0.059) | 25.259 (+-0.051) | 1.000 (+-0.000) | 19.770 (+-0.070)
Input: (8, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=True, mode=bicubic   | 114.900 (+-0.508) | 113.416 (+-1.271) | 248.679 (+-1.431) | 2.193 (+-0.000) | 114.609 (+-0.515)
Input: (8, 3, 345, 456) torch.float32, torch.channels_last, align_corners=True, mode=bicubic       | 115.973 (+-0.555) | 124.711 (+-1.596) | 282.187 (+-2.418) | 2.263 (+-0.000) | 115.368 (+-0.652)
Input: (8, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=False, mode=bicubic  | 111.730 (+-0.562) | 110.914 (+-0.865) | 253.899 (+-2.226) | 2.289 (+-0.000) | 111.285 (+-1.226)
Input: (8, 3, 345, 456) torch.float32, torch.channels_last, align_corners=False, mode=bicubic      | 112.859 (+-0.487) | 131.696 (+-1.298) | 294.124 (+-1.963) | 2.233 (+-0.000) | 110.910 (+-0.969)

Times are in milliseconds (ms).

[------------------------------ Affine grid sampling, cuda -----------------------------]
                                                                                                   | Eager (2.1.0a0+git52598e9) PR | Compiled (2.1.0a0+git52598e9) PR | Compiled (2.1.0a0+gitcf76938) Nightly | speed-up PR vs Nightly | Eager (2.1.0a0+gitcf76938) Nightly
1 threads: ------------------------------------------------------------------------------
Input: (8, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=True, mode=bilinear  | 228.811 (+-0.037) | 92.990 (+-0.446) | 92.648 (+-0.286) | 0.996 (+-0.000) | 228.274 (+-0.067)
Input: (8, 3, 345, 456) torch.float32, torch.channels_last, align_corners=True, mode=bilinear      | 222.107 (+-0.076) | 93.247 (+-0.387) | 92.528 (+-0.423) | 0.992 (+-0.000) | 221.922 (+-0.297)
Input: (8, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=False, mode=bilinear | 235.654 (+-0.055) | 75.781 (+-0.566) | 115.865 (+-0.419) | 1.529 (+-0.000) | 236.032 (+-0.111)
Input: (8, 3, 345, 456) torch.float32, torch.channels_last, align_corners=False, mode=bilinear     | 226.752 (+-0.088) | 76.312 (+-0.328) | 116.468 (+-0.477) | 1.526 (+-0.000) | 226.950 (+-0.027)
Input: (8, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=True, mode=nearest   | 225.540 (+-0.013) | 75.638 (+-0.341) | 72.621 (+-0.292) | 0.960 (+-0.000) | 225.937 (+-0.017)
Input: (8, 3, 345, 456) torch.float32, torch.channels_last, align_corners=True, mode=nearest       | 217.425 (+-0.024) | 75.484 (+-0.545) | 73.518 (+-0.296) | 0.974 (+-0.000) | 217.793 (+-0.008)
Input: (8, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=False, mode=nearest  | 231.474 (+-0.020) | 75.972 (+-0.339) | 73.030 (+-0.387) | 0.961 (+-0.000) | 231.991 (+-0.184)
Input: (8, 3, 345, 456) torch.float32, torch.channels_last, align_corners=False, mode=nearest      | 223.408 (+-0.016) | 75.622 (+-0.279) | 73.542 (+-0.336) | 0.973 (+-0.000) | 223.893 (+-0.021)
Input: (8, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=True, mode=bicubic   | 319.382 (+-0.023) | 149.060 (+-0.190) | 772.116 (+-0.266) | 5.180 (+-0.000) | 320.549 (+-0.387)
Input: (8, 3, 345, 456) torch.float32, torch.channels_last, align_corners=True, mode=bicubic       | 319.987 (+-0.134) | 154.443 (+-0.014) | 797.651 (+-0.232) | 5.165 (+-0.000) | 320.665 (+-0.397)
Input: (8, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=False, mode=bicubic  | 326.138 (+-0.439) | 149.092 (+-0.036) | 772.508 (+-0.259) | 5.181 (+-0.000) | 325.751 (+-0.398)
Input: (8, 3, 345, 456) torch.float32, torch.channels_last, align_corners=False, mode=bicubic      | 326.024 (+-0.118) | 154.452 (+-0.209) | 797.756 (+-0.229) | 5.165 (+-0.000) | 326.870 (+-0.372)

Times are in microseconds (us).
```

[Source](https://raw.githubusercontent.com/vfdev-5/pth-inductor-dev/master/output/20230828-134459-affine-grid-sampler-PR-vs-Nightly-speedup.md) Pull Request resolved: https://github.com/pytorch/pytorch/pull/104710 Approved by: https://github.com/lezcano	2023-08-29 05:54:24 +00:00
Sam Larsen	20f3808aa2	Implement decomposition for aten.tensor_split.tensor_indices_or_sections (#107251 ) Summary: Before this change, the tensor_indices_or_sections variant of aten.tensor_split causes a `RuntimeError: The tensor has a non-zero number of elements` due to that operation needing to introspect data. Decomposing into one of the other two tensor_split variants fixes the problem. Test Plan: Enabled tensor_split tests in test/inductor/test_torchinductor_opinfo.py Pull Request resolved: https://github.com/pytorch/pytorch/pull/107251 Approved by: https://github.com/ezyang, https://github.com/eellison	2023-08-28 17:01:23 +00:00
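The decomposition in #107251 routes one `tensor_split` variant to the other two. The sketch below shows that dispatch on Python lists: an integer argument stands in for a scalar `tensor_indices_or_sections` and a sequence stands in for a 1-D one. All names are hypothetical, and the section sizes follow the documented `torch.tensor_split` rule that the first `n % sections` pieces get one extra element.

```python
def split_by_sections(xs, sections):
    """Split into `sections` pieces; the first n % sections get one extra."""
    base, extra = divmod(len(xs), sections)
    pieces, start = [], 0
    for i in range(sections):
        size = base + (1 if i < extra else 0)
        pieces.append(xs[start:start + size])
        start += size
    return pieces


def split_by_indices(xs, indices):
    """Split at the given boundary indices."""
    bounds = [0, *indices, len(xs)]
    return [xs[a:b] for a, b in zip(bounds, bounds[1:])]


def tensor_split(xs, indices_or_sections):
    # Dispatch: a scalar means "number of sections", a sequence means
    # "split points". Each branch forwards to an existing variant.
    if isinstance(indices_or_sections, int):
        return split_by_sections(xs, indices_or_sections)
    return split_by_indices(xs, list(indices_or_sections))
```

Once the argument has been converted to a plain integer or list, the third variant needs no logic of its own.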
Vishwa Raj Singh	35de780aa6	Fix Inplace tensor update on transpose (#104689 ) Fixes https://github.com/pytorch/pytorch/issues/103650 - To align with the HPU device backend architecture, ensure that all non-view ops return contiguous fake tensor outputs. Pull Request resolved: https://github.com/pytorch/pytorch/pull/104689 Approved by: https://github.com/ezyang	2023-08-24 16:58:50 +00:00
Andrew Or	64d5851b1f	make python decomp for native_batch_norm CompositeImplicitAutograd, remove native_batch_norm from core aten opset (#107791 ) Summary: (From Brian Hirsh) Description copied from what I put in a comment in this PR: https://github.com/pytorch/pytorch/pull/106329 So, the slightly-contentious idea behind this PR is that lower in the stack, I updated torch._decomps.get_decomps() to check not only the decomp table to see if a given op has a decomposition available, but to also check the dispatcher for any decomps registered to the CompositeImplicitAutograd key (link: https://github.com/pytorch/pytorch/pull/105865/files#diff-7008e894af47c01ee6b8eb94996363bd6c5a43a061a2c13a472a2f8a9242ad43R190) There's one problem though: we don't actually make any hard guarantees that a given key in the dispatcher does or does not point to a decomposition. We do rely pretty heavily, however, on the fact that everything registered to the CompositeImplicitAutograd key is in fact a decomposition into other ops. QAT would like this API to faithfully return "the set of all decomps that would have run if we had traced through the dispatcher". However, native_batch_norm is an example of an op that has a pre-autograd decomp registered to it (through op.py_impl()), but the decomp is registered directly to the Autograd key instead of being registered to the CompositeImplicitAutograd key. If we want to provide a guarantee to QAT that they can programmatically access all decomps that would have run during tracing, then we need to make sure that every decomp we register to the Autograd key is also registered to the CompositeImplicitAutograd key. This might sound kind of painful (since it requires auditing), but I think in practice this basically only applies to native_batch_norm. 
Test Plan: python test/test_decomp.py Differential Revision: D48607575 Pull Request resolved: https://github.com/pytorch/pytorch/pull/107791 Approved by: https://github.com/jerryzh168, https://github.com/SherlockNoMad	2023-08-24 15:19:07 +00:00
Sherlock Huang	ee4b99cc3a	Decomp for aten.dropout (#106274 ) When exporting dropout with a CPU tensor, we get the following graph module:
```
class GraphModule(torch.nn.Module):
    def forward(self, arg0_1: f32[512, 10]):
        empty_memory_format: f32[512, 10] = torch.ops.aten.empty.memory_format([512, 10], dtype = torch.float32, layout = torch.strided, device = device(type='cpu'), pin_memory = False, memory_format = torch.contiguous_format)
        bernoulli_p: f32[512, 10] = torch.ops.aten.bernoulli.p(empty_memory_format, 0.9); empty_memory_format = None
        div_scalar: f32[512, 10] = torch.ops.aten.div.Scalar(bernoulli_p, 0.9); bernoulli_p = None
        mul_tensor: f32[512, 10] = torch.ops.aten.mul.Tensor(arg0_1, div_scalar); arg0_1 = div_scalar = None
        return (mul_tensor,)
```
In addition, if we export in eval() mode, we get an empty graph. However, when exporting with a CUDA tensor, we get
```
class GraphModule(torch.nn.Module):
    def forward(self, arg0_1: f32[512, 10]):
        native_dropout_default = torch.ops.aten.native_dropout.default(arg0_1, 0.1, True); arg0_1 = None
        getitem: f32[512, 10] = native_dropout_default[0]; native_dropout_default = None
        return (getitem,)
```
and exporting in eval() mode still leaves a dropout node in the graph. This PR makes exporting with a CPU tensor also produce aten.native_dropout. Pull Request resolved: https://github.com/pytorch/pytorch/pull/106274 Approved by: https://github.com/ezyang	2023-08-23 21:12:37 +00:00
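The CPU graph quoted in #106274 (Bernoulli mask with keep probability 0.9, divide by 0.9, multiply) corresponds to standard inverted dropout with p = 0.1. The following is a minimal sketch on Python lists; the function name and the `rng` parameter are hypothetical and not part of any PyTorch API.

```python
import random


def dropout(xs, p, train, rng=random):
    """Inverted dropout: zero each element with probability p and
    scale the survivors by 1 / (1 - p)."""
    if not train or p == 0.0:
        return list(xs)          # eval mode or p == 0: identity
    if p == 1.0:
        return [0.0 for _ in xs]  # everything dropped
    keep = 1.0 - p
    # Bernoulli(keep) mask, then scale so the expected value is unchanged
    return [x / keep if rng.random() < keep else 0.0 for x in xs]
```

In eval mode the function is an identity, which matches the observation in the commit message that exporting the CPU path in eval() mode yields an empty graph.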
Edward Z. Yang	5673c0874c	Use expect_true to make split with unbacked sizes work. (#106788 ) This pattern shows up in torchrec KeyedJaggedTensor. Most of the change in this PR is mechanical: whenever we failed an unbacked symint test due to just error checking, replace the conditional with something that calls expect_true (e.g., torch._check or TORCH_SYM_CHECK). Some of the changes are a bit more nuanced, I've commented on the PR accordingly. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/106788 Approved by: https://github.com/lezcano ghstack dependencies: #106720	2023-08-15 20:31:30 +00:00
lezcano	2c5f96deac	[Inductor] Make softshrink composite implicit (#107052 ) The backward is pretty much equivalent to the one we had written Pull Request resolved: https://github.com/pytorch/pytorch/pull/107052 Approved by: https://github.com/peterbell10 ghstack dependencies: #107038, #107039, #107051	2023-08-14 21:01:50 +00:00
lezcano	3b1254e800	Make hardshrink's decomp composite implicit (#107039 ) The generated code is the same Pull Request resolved: https://github.com/pytorch/pytorch/pull/107039 Approved by: https://github.com/peterbell10 ghstack dependencies: #107038	2023-08-14 21:01:50 +00:00
Sam Larsen	e165938853	Implement decomposition for aten.rrelu_with_noise (#106812 ) Test Plan: * Primarily, added new test in test/test_decomp.py * Updated existing tests, e.g., to NOT expect failure Pull Request resolved: https://github.com/pytorch/pytorch/pull/106812 Approved by: https://github.com/eellison	2023-08-11 19:18:29 +00:00
Stephen Jia	8c8477e55a	Add _unsafe_index decomp (#106814 ) Summary: Redirect `aten._unsafe_index` to `aten.index` through a decomposition. Also add it to the list of core decompositions. Test Plan: contbuild and OSS CI (similar to D40075277) Differential Revision: D48163393 Pull Request resolved: https://github.com/pytorch/pytorch/pull/106814 Approved by: https://github.com/SherlockNoMad	2023-08-10 23:23:37 +00:00
vfdev-5	35a1913370	[inductor] Added affine_grid_generator decomposition (#104709 )
Description:
- Added affine_grid_generator decomposition

Related to https://github.com/pytorch/pytorch/issues/104296

Fixes https://github.com/pytorch/pytorch/issues/105565

Perfs:
- speed-up on cuda with bilinear and nearest modes

```
Speed-up PR vs Nightly = ratio between columns "Compiled (2.1.0a0+git3ed904e) PR-afgg" and "Compiled (2.1.0a0+gitbcdd413) Nightly"

[------------------------------ Affine grid sampling, cpu ------------------------------]
                                                                                                   | Eager (2.1.0a0+git1afae24) PR-afgg | Compiled (2.1.0a0+git1afae24) PR-afgg | Compiled (2.1.0a0+git16df542) Nightly | speed-up PR vs Nightly | Eager (2.1.0a0+git16df542) Nightly
1 threads: ------------------------------------------------------------------------------
Input: (2, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=True, mode=bilinear  | 7.467 (+-0.036) | 11.905 (+-0.276) | 13.391 (+-0.051) | 1.125 (+-0.000) | 7.343 (+-0.036)
Input: (2, 3, 345, 456) torch.float32, torch.channels_last, align_corners=True, mode=bilinear      | 7.722 (+-0.168) | 14.371 (+-0.035) | 15.899 (+-0.038) | 1.106 (+-0.000) | 7.870 (+-0.043)
Input: (2, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=False, mode=bilinear | 7.710 (+-0.051) | 11.354 (+-0.053) | 13.376 (+-0.045) | 1.178 (+-0.000) | 7.698 (+-0.061)
Input: (2, 3, 345, 456) torch.float32, torch.channels_last, align_corners=False, mode=bilinear     | 7.870 (+-0.050) | 13.744 (+-0.237) | 15.206 (+-0.102) | 1.106 (+-0.000) | 7.912 (+-0.039)
Input: (2, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=True, mode=nearest   | 4.738 (+-0.015) | 4.508 (+-0.005) | 6.566 (+-0.027) | 1.456 (+-0.000) | 4.630 (+-0.022)
Input: (2, 3, 345, 456) torch.float32, torch.channels_last, align_corners=True, mode=nearest       | 4.391 (+-0.010) | 4.860 (+-0.390) | 6.438 (+-0.047) | 1.325 (+-0.000) | 4.458 (+-0.010)
Input: (2, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=False, mode=nearest  | 4.279 (+-0.008) | 4.127 (+-0.010) | 6.598 (+-0.709) | 1.599 (+-0.000) | 5.064 (+-0.025)
Input: (2, 3, 345, 456) torch.float32, torch.channels_last, align_corners=False, mode=nearest      | 4.537 (+-0.010) | 4.593 (+-0.006) | 6.365 (+-0.104) | 1.386 (+-0.000) | 4.480 (+-0.011)
Input: (2, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=True, mode=bicubic   | 26.411 (+-0.066) | 62.275 (+-0.436) | 64.486 (+-0.353) | 1.035 (+-0.000) | 26.210 (+-0.110)
Input: (2, 3, 345, 456) torch.float32, torch.channels_last, align_corners=True, mode=bicubic       | 26.457 (+-0.096) | 72.887 (+-0.247) | 74.207 (+-0.337) | 1.018 (+-0.000) | 25.995 (+-0.120)
Input: (2, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=False, mode=bicubic  | 26.457 (+-0.086) | 64.110 (+-0.233) | 66.340 (+-0.406) | 1.035 (+-0.000) | 26.145 (+-0.085)
Input: (2, 3, 345, 456) torch.float32, torch.channels_last, align_corners=False, mode=bicubic      | 26.536 (+-0.094) | 73.742 (+-0.483) | 71.946 (+-0.460) | 0.976 (+-0.000) | 26.457 (+-0.166)

Times are in milliseconds (ms).

[------------------------------ Affine grid sampling, cuda -----------------------------]
                                                                                                   | Eager (2.1.0a0+git1afae24) PR-afgg | Compiled (2.1.0a0+git1afae24) PR-afgg | Compiled (2.1.0a0+git16df542) Nightly | speed-up PR vs Nightly | Eager (2.1.0a0+git16df542) Nightly
1 threads: ------------------------------------------------------------------------------
Input: (2, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=True, mode=bilinear  | 91.971 (+-0.253) | 90.570 (+-0.193) | 137.206 (+-0.214) | 1.515 (+-0.000) | 84.280 (+-0.241)
Input: (2, 3, 345, 456) torch.float32, torch.channels_last, align_corners=True, mode=bilinear      | 91.893 (+-0.361) | 89.866 (+-0.170) | 136.678 (+-0.471) | 1.521 (+-0.000) | 84.573 (+-0.214)
Input: (2, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=False, mode=bilinear | 116.967 (+-0.481) | 110.468 (+-0.326) | 223.770 (+-0.334) | 2.026 (+-0.000) | 108.098 (+-0.392)
Input: (2, 3, 345, 456) torch.float32, torch.channels_last, align_corners=False, mode=bilinear     | 117.563 (+-0.546) | 111.438 (+-0.212) | 223.101 (+-0.350) | 2.002 (+-0.000) | 108.225 (+-0.395)
Input: (2, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=True, mode=nearest   | 80.706 (+-0.289) | 70.525 (+-0.204) | 143.697 (+-0.311) | 2.038 (+-0.000) | 74.485 (+-0.258)
Input: (2, 3, 345, 456) torch.float32, torch.channels_last, align_corners=True, mode=nearest       | 80.955 (+-0.208) | 69.986 (+-0.250) | 143.658 (+-0.244) | 2.053 (+-0.000) | 74.163 (+-0.238)
Input: (2, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=False, mode=nearest  | 117.576 (+-0.435) | 71.179 (+-0.412) | 178.515 (+-0.539) | 2.508 (+-0.000) | 108.394 (+-0.473)
Input: (2, 3, 345, 456) torch.float32, torch.channels_last, align_corners=False, mode=nearest      | 117.441 (+-0.205) | 70.313 (+-0.170) | 178.664 (+-0.555) | 2.541 (+-0.000) | 108.098 (+-0.416)
Input: (2, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=True, mode=bicubic   | 92.962 (+-0.509) | 1740.964 (+-0.597) | 1785.401 (+-0.369) | 1.026 (+-0.000) | 92.638 (+-0.539)
Input: (2, 3, 345, 456) torch.float32, torch.channels_last, align_corners=True, mode=bicubic       | 92.928 (+-0.493) | 1401.146 (+-0.732) | 1453.229 (+-0.628) | 1.037 (+-0.000) | 92.458 (+-0.428)
Input: (2, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=False, mode=bicubic  | 118.152 (+-0.442) | 1740.644 (+-0.480) | 1793.475 (+-0.458) | 1.030 (+-0.000) | 107.962 (+-0.548)
Input: (2, 3, 345, 456) torch.float32, torch.channels_last, align_corners=False, mode=bicubic      | 118.182 (+-0.425) | 1400.621 (+-0.624) | 1461.796 (+-0.630) | 1.044 (+-0.000) | 107.894 (+-0.994)

Times are in microseconds (us).
```

[Source](https://raw.githubusercontent.com/vfdev-5/pth-inductor-dev/master/output/20230801-220216-affine-grid-sampler-PR-afgg-vs-Nightly-speedup.md), [script](https://github.com/vfdev-5/pth-inductor-dev/blob/master/perf_affine_grid_sampler.py) Pull Request resolved: https://github.com/pytorch/pytorch/pull/104709 Approved by: https://github.com/lezcano	2023-08-10 09:52:48 +00:00
Andy Rock	aa1b2f16c5	fix `upsample_nearest` decompositions for `uint8` tensors (#106675 ) Fixes #106674. This PR aligns the implementation of `_compute_upsample_nearest_indices` with `UpSampleKernel.cpp`: `68cb854d73/aten/src/ATen/native/cpu/UpSampleKernel.cpp (L1388-L1393)` Pull Request resolved: https://github.com/pytorch/pytorch/pull/106675 Approved by: https://github.com/albanD	2023-08-07 01:52:41 +00:00
Kshiteej K	a899333ffc	fix: nll_loss batch rule with negative ignore_idx (#106118 ) We use python decompositions instead of writing our own for batching rules. Fixes https://github.com/pytorch/pytorch/issues/105736 Pull Request resolved: https://github.com/pytorch/pytorch/pull/106118 Approved by: https://github.com/lezcano, https://github.com/zou3519	2023-08-04 07:43:02 +00:00
chunyuan	cb6c3cbc91	inductor: enable weight prepack for LSTM (#103071 )
- Enabled LSTM weight prepack in inductor.
- Added a mkldnn decomposition for lstm which won't change for different `seq_lens`. With the previous decomposition, in the dynamic-shapes use case where `seq_lens` changes, the graph will be different.
- Extended several inductor utility functions to support `List(Tensor)` as input. Previously those functions only supported `Tensor` input.

Update 2023-07-26:
- https://github.com/pytorch/pytorch/pull/103851 has moved CPU weight packing to be after AOTAutograd. Fixed the support in this PR to follow the same way (mainly in `3b207f7f1c (diff-6dffed1ade0ba3e887f9a4eafa3bfcec267ab2365b8adcb91bd391f49b3fd2e3)`). LSTM is decomposed into `aten.mkldnn_rnn_layer` by layer and by direction. The weight prepack is done at the `mkldnn_rnn_layer` level.
- Add a fix in the rnn `__get_state__` function in case we need to recompile an `LSTM` module.

When compiling the module, the weight tensors, which are the `named_parameters` of the module, are converted to `functional_tensor` here: `76fb72e24a/torch/nn/utils/stateless.py (L125-L128)`

The forward function of LSTM will be called: `76fb72e24a/torch/_functorch/aot_autograd.py (L3379-L3381)`

In the forward function, the `_flat_weights` are updated to be the same as the weights, thus becoming `functional_tensor`: `76fb72e24a/torch/nn/modules/rnn.py (L775-L778)`

The weight tensors are converted back to the original tensors (which are not `functional_tensor` anymore) before exiting the `_reparametrize_module` context here: `76fb72e24a/torch/nn/utils/stateless.py (L130-L142)`

But since `_flat_weights` is not in the `named_parameters` of the module, it is still `functional_tensor` ([link of the parameters that will be converted to functional and reverted back](`76fb72e24a/torch/_functorch/aot_autograd.py (L3695-L3698)`)).

At this moment, if we need to recompile the model, `deepcopy` will be called: `76fb72e24a/torch/_dynamo/utils.py (L915-L917)`

It will report `UnImplemented` since we have a `functional_tensor` (`_flat_weights`), and will trigger a graph break, which is not what we expect: `76fb72e24a/torch/_subclasses/meta_utils.py (L514)`

Added a fix in `__get_state__` to update `_flat_weights` whenever the weights have changed. The fix is covered by the `test_lstm_packed` UT. Pull Request resolved: https://github.com/pytorch/pytorch/pull/103071 Approved by: https://github.com/jgong5, https://github.com/jansel	2023-07-28 13:54:32 +00:00
lezcano	36ae359655	Update matmul decomp to match eager (#105850 ) The decomposition was not updated after https://github.com/pytorch/pytorch/pull/95261 Pull Request resolved: https://github.com/pytorch/pytorch/pull/105850 Approved by: https://github.com/Chillee	2023-07-26 09:24:51 +00:00
Nikita Karetnikov	45e4706aff	[pt2] add decomps for `multilabel_margin_loss_forward` ops (#105302 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/105302 Approved by: https://github.com/ezyang	2023-07-23 02:16:29 +00:00
Aaron Gokaslan	6d43c89f37	[BE]: Update Ruff to 0.0.280 (#105724 ) Removes unused loop values in Python dictionary iteration. Automated fix from Ruff master. Pull Request resolved: https://github.com/pytorch/pytorch/pull/105724 Approved by: https://github.com/ezyang, https://github.com/janeyx99	2023-07-22 23:03:34 +00:00
Yanbo Liang	8daed86e4e	[Inductor] aten.dist decomposition (#105586 ) Fixes #105557 Pull Request resolved: https://github.com/pytorch/pytorch/pull/105586 Approved by: https://github.com/desertfire, https://github.com/Chillee	2023-07-20 06:42:44 +00:00
Justin Chu	8a688277a2	[BE] Enable ruff's UP rules and autoformat dynamo / functorch and refs (#105432 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/105432 Approved by: https://github.com/ezyang	2023-07-19 13:48:44 +00:00
QSHLGZ	07108ff1e8	Fix typos under _decomp directory (#105210 ) Fix typos under _decomp directory Pull Request resolved: https://github.com/pytorch/pytorch/pull/105210 Approved by: https://github.com/ezyang, https://github.com/Neilblaze	2023-07-17 11:41:30 +00:00
Nikita Karetnikov	7e72126487	[pt2] add decomps for `multi_margin_loss` ops (#104578 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/104578 Approved by: https://github.com/ezyang, https://github.com/lezcano	2023-07-14 21:16:09 +00:00
Adnan Akhundov	4911b80b8e	[inductor] addmm + ReLU / GELU fusion pass (#104132 ) Summary: Add a new path in `post_grad.py` for replacing addmm + ReLU / GELU activation with the corresponding `_addmm_activation` call (with `use_gelu=False` or `True`, respectively). The replacement is done only on `max_autotune_gemm=False` and when the activation is fusible.

Test Plan:
$ python test/inductor/test_pattern_matcher.py -k test_addmm_activation -v
(__main__.TestPaternMatcher.test_addmm_activation) ...
/data/users/aakhundov/pytorch/torch/_inductor/compile_fx.py:128: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled. Consider setting `torch.set_float32_matmul_precision('high')` for better performance.
warnings.warn(
Using FallbackKernel: aten._addmm_activation.default
Using FallbackKernel: aten._addmm_activation.default
/data/users/aakhundov/pytorch/torch/_dynamo/eval_frame.py:373: UserWarning: changing options to `torch.compile()` may require calling `torch._dynamo.reset()` to take effect
warnings.warn(
frames [('total', 1), ('ok', 1)]
stats [('calls_captured', 2), ('unique_graphs', 1)]
aot_autograd [('total', 1), ('ok', 1)]
inductor []
ok
----------------------------------------------------------------------
Ran 1 test in 13.415s
OK

Reviewers: @eellison Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/104132 Approved by: https://github.com/eellison, https://github.com/jansel	2023-07-10 16:44:14 +00:00
Jerry Zhang	1a661639f7	[quant] Support integer implementations for adaptive_avg_pool2d (#104226 ) Summary: This is needed for representing quantized model in pt2 export quantization flow Test Plan: tested by opinfo, python test/test_ops.py Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/104226 Approved by: https://github.com/jgong5, https://github.com/andrewor14	2023-07-07 19:36:31 +00:00
XiaobingSuper	d3589c9456	reduce computation of batch_norm when weight or bias is none (#104616 ) For batch_norm decomposition, if weight or bias is None, we can skip some computations for better performance. Pull Request resolved: https://github.com/pytorch/pytorch/pull/104616 Approved by: https://github.com/lezcano, https://github.com/desertfire, https://github.com/jgong5	2023-07-06 00:47:41 +00:00
Peter Bell	61cd605813	[decomp] Don't call .item() in aten.fill.Tensor decomp (#103880 ) Currently calling the fill.Tensor overload under `torch.compile` results in a `DataDependentOutputException` due to the `.item()` call. This instead does a device-device copy which can then be inlined into subsequent inductor kernels as you would expect, e.g.
```python
def fn(a):
    result = torch.deg2rad(a).sin()
    return torch.empty((128, 128), device=a.device).fill_(result)
```
generates the single kernel
```python
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 16384
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.arange(0, XBLOCK)[:]
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (0))
    tmp1 = tl.broadcast_to(tmp0, [XBLOCK])
    tmp2 = 0.017453292519943295
    tmp3 = tmp1 * tmp2
    tmp4 = tl.sin(tmp3)
    tl.store(out_ptr0 + (x0), tmp4, None)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103880 Approved by: https://github.com/Chillee	2023-06-21 18:45:04 +00:00
Kurt Mohler	ee83c646bb	Replace `_prims_common.check` with `torch._check` (#103240 ) This relands most of the changes from #102219 which were backed out by #103128. However, instead of removing `_prims_common.check`, it adds a warning and a comment mentioning that it will be removed in the future and `torch._check` should be used instead. As mentioned in https://github.com/pytorch/pytorch/pull/103128#pullrequestreview-1466414415, `_prims_common.check` cannot yet be removed because of some internal usage Part of #72948 Pull Request resolved: https://github.com/pytorch/pytorch/pull/103240 Approved by: https://github.com/albanD	2023-06-21 00:46:17 +00:00
Ivan Zaitsev	821493715c	Back out "Remove `check` from `_prims_common`, replace with `torch._check*` (#102219 )", Back out "Forwatd fix for D46427687" (#103128 ) Test Plan: revertitparrot Reviewed By: malfet Differential Revision: D46506433 Pull Request resolved: https://github.com/pytorch/pytorch/pull/103128 Approved by: https://github.com/malfet	2023-06-07 01:41:41 +00:00
Kurt Mohler	a84bb2709a	Remove `check` from `_prims_common`, replace with `torch._check*` (#102219 ) Part of #72948 Pull Request resolved: https://github.com/pytorch/pytorch/pull/102219 Approved by: https://github.com/lezcano, https://github.com/albanD	2023-06-03 02:23:21 +00:00
PyTorch MergeBot	a7efa0ce35	Revert "Remove `check` from `_prims_common`, replace with `torch._check*` (#102219 )" This reverts commit `fb79d43649`. Reverted https://github.com/pytorch/pytorch/pull/102219 on behalf of https://github.com/malfet due to Broke lint, see https://github.com/pytorch/pytorch/actions/runs/5158949959/jobs/9293466925 ([comment](https://github.com/pytorch/pytorch/pull/102219#issuecomment-1574245414))	2023-06-02 20:00:48 +00:00
Kurt Mohler	fb79d43649	Remove `check` from `_prims_common`, replace with `torch._check*` (#102219 ) Part of #72948 Pull Request resolved: https://github.com/pytorch/pytorch/pull/102219 Approved by: https://github.com/lezcano, https://github.com/albanD	2023-06-02 19:13:45 +00:00
Aleksandar Samardžić	51e0f9e858	Add missing decompositons/lowerings for logical/bitwise operators (#102566 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/102566 Approved by: https://github.com/lezcano, https://github.com/alexsio27444, https://github.com/jgong5	2023-06-02 14:27:17 +00:00
Peter Bell	ce42010722	[inductor][decomp] Add aten._unsafe_index_put for unchecked indexing (#101812 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/101812 Approved by: https://github.com/lezcano	2023-05-24 22:17:32 +00:00
vfdev-5	e3d97b6213	[inductor] Added `smooth_l1_loss` refs (#102077 ) Added `smooth_l1_loss` to refs + tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/102077 Approved by: https://github.com/lezcano, https://github.com/ngimel	2023-05-24 15:07:08 +00:00
Matthew Hoffman	29da75cc55	Enable mypy allow redefinition (#102046 ) Related #101528 I tried to enable this in another PR but it uncovered a bunch of type errors: https://github.com/pytorch/pytorch/actions/runs/4999748262/jobs/8956555243?pr=101528#step:10:1305 The goal of this PR is to fix these errors. --- This PR enables [allow_redefinition = True](https://mypy.readthedocs.io/en/stable/config_file.html#confval-allow_redefinition) in `mypy.ini`, which allows for a common pattern: > Allows variables to be redefined with an arbitrary type, as long as the redefinition is in the same block and nesting level as the original definition. `allow_redefinition` allows mypy to be more flexible by allowing reassignment to an existing variable with a different type... for instance (from the linked PR): `4a1e9230ba/torch/nn/parallel/data_parallel.py (L213)` A `Sequence[Union[int, torch.device]]` is narrowed to `Sequence[int]` thru reassignment to the same variable. Pull Request resolved: https://github.com/pytorch/pytorch/pull/102046 Approved by: https://github.com/ezyang	2023-05-24 07:05:30 +00:00
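The redefinition pattern described in the mypy commit above can be illustrated with a small standard-library sketch. The function name `to_indices` and the use of `str` as a stand-in for `torch.device` are assumptions made for illustration; they are not taken from the PR. With `allow_redefinition = True`, mypy is expected to accept the rebinding because the variable is read before it is redefined in the same block.

```python
from typing import List, Sequence, Union


def to_indices(device_ids: Sequence[Union[int, str]]) -> List[int]:
    """Hypothetical helper: normalize mixed device identifiers to ints."""
    # `device_ids` is read on the right-hand side and then rebound to a
    # narrower type (List[int]). Without allow_redefinition, mypy would
    # flag this assignment as an incompatible type.
    device_ids = [int(d) for d in device_ids]
    return device_ids


print(to_indices([0, "1", 2]))
```

At runtime the code behaves the same with or without the setting; the option only changes what the type checker accepts.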
PyTorch MergeBot	5147fe4969	Revert "[inductor][decomp] Add aten._unsafe_index_put for unchecked indexing (#101812 )" This reverts commit `b9721bd705`. Reverted https://github.com/pytorch/pytorch/pull/101812 on behalf of https://github.com/osalpekar due to Causing test_nn_cuda tests to crash during runtime. More details at [D46093942](https://www.internalfb.com/diff/D46093942) ([comment](https://github.com/pytorch/pytorch/pull/101812#issuecomment-1560238085))	2023-05-23 23:06:21 +00:00
Peter Bell	b9721bd705	[inductor][decomp] Add aten._unsafe_index_put for unchecked indexing (#101812 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/101812 Approved by: https://github.com/lezcano	2023-05-22 20:39:18 +00:00
Jason Ansel	0c6f409cda	[inductor] Refactor RNG operators (#100064 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/100064 Approved by: https://github.com/ngimel	2023-05-20 03:43:33 +00:00
lezcano	1930428d89	Minor improvement on the decomposition of upsample_bilinear (#101682 ) This is how it's done in core. Pull Request resolved: https://github.com/pytorch/pytorch/pull/101682 Approved by: https://github.com/ngimel	2023-05-18 16:51:51 +00:00
Peter Bell	66e398951a	[inductor/decomp] Add aten._unsafe_index to disable range checks (#101602 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/101602 Approved by: https://github.com/lezcano, https://github.com/ngimel	2023-05-17 23:36:24 +00:00
PyTorch MergeBot	5f07c589b0	Revert "[inductor] Refactor RNG operators (#100064 )" This reverts commit `3bbf0683a1`. Reverted https://github.com/pytorch/pytorch/pull/100064 on behalf of https://github.com/izaitsevfb due to breaks inductor tests, see D45936056 ([comment](https://github.com/pytorch/pytorch/pull/100064#issuecomment-1552093728))	2023-05-17 21:16:41 +00:00
Jason Ansel	3bbf0683a1	[inductor] Refactor RNG operators (#100064 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/100064 Approved by: https://github.com/ngimel	2023-05-17 01:29:31 +00:00
Thibaut Durand	01da732691	Fix type annotation of `torch.split` (#100655 ) The type annotation indicates `list` but the returned type is `tuple`
```python
>>> import torch
>>> type(torch.arange(10).split(4))
<class 'tuple'>
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100655 Approved by: https://github.com/kit1980	2023-05-16 21:35:41 +00:00
Jiong Gong	788ff0623b	[decomp] fix decomp of batch_norm when weight/bias is not flattened (#101059 ) Fix https://github.com/pytorch/pytorch/issues/100970 Pull Request resolved: https://github.com/pytorch/pytorch/pull/101059 Approved by: https://github.com/ezyang	2023-05-16 00:00:34 +00:00
Animesh Jain	e1021ec535	[decomp] Bad accuracy for elu_backward (#100284 ) Accuracy is tested by the full model at https://github.com/pytorch/pytorch/issues/100061 Pull Request resolved: https://github.com/pytorch/pytorch/pull/100284 Approved by: https://github.com/ngimel	2023-04-29 04:21:20 +00:00
yhl48	07c02b9e92	Add vmap support for `smooth_l1_loss_backward` (#99429 ) Follow-up of #98357 Pull Request resolved: https://github.com/pytorch/pytorch/pull/99429 Approved by: https://github.com/kshitij12345, https://github.com/zou3519	2023-04-28 10:58:07 +00:00
Angela Yi	d06b93b0c7	Decompose arange.default to arange.start_step (#99739 ) The aten op arange.default is not in the core aten IR, and should decompose into the arange.start_step op. Pull Request resolved: https://github.com/pytorch/pytorch/pull/99739 Approved by: https://github.com/SherlockNoMad	2023-04-27 19:06:36 +00:00
XiaobingSuper	41069f2faa	inductor: align inductor behavior with eager mode for split_with_sizes (#99702 ) Fixes https://github.com/pytorch/pytorch/issues/99686. In eager mode, if the given sizes do not meet the requirements, an error is reported, but inductor still runs. Inductor should align with eager mode here; after this PR the behavior is:
```
Traceback (most recent call last):
  File "/home/xiaobing/pytorch-offical/torch/_dynamo/utils.py", line 1267, in run_node
    return node.target(*args, **kwargs)
  File "/home/xiaobing/pytorch-offical/torch/functional.py", line 189, in split
    return tensor.split(split_size_or_sections, dim)
  File "/home/xiaobing/pytorch-offical/torch/_tensor.py", line 804, in split
    return torch._VF.split_with_sizes(self, split_size, dim)
  File "/home/xiaobing/pytorch-offical/torch/utils/_stats.py", line 20, in wrapper
    return fn(*args, **kwargs)
  File "/home/xiaobing/pytorch-offical/torch/_subclasses/fake_tensor.py", line 1095, in __torch_dispatch__
    return self.dispatch(func, types, args, kwargs)
  File "/home/xiaobing/pytorch-offical/torch/_subclasses/fake_tensor.py", line 1259, in dispatch
    return decomposition_table[func](*args, **kwargs)
  File "/home/xiaobing/pytorch-offical/torch/_decomp/decompositions.py", line 1102, in split_with_sizes
    raise ValueError(
ValueError: Split sizes don't add up to the tensor's size in the given dimension

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/xiaobing/pytorch-offical/torch/_dynamo/utils.py", line 1215, in get_fake_value
    return wrap_fake_exception(
  File "/home/xiaobing/pytorch-offical/torch/_dynamo/utils.py", line 835, in wrap_fake_exception
    return fn()
  File "/home/xiaobing/pytorch-offical/torch/_dynamo/utils.py", line 1216, in <lambda>
    lambda: run_node(tx.output, node, args, kwargs, nnmodule)
  File "/home/xiaobing/pytorch-offical/torch/_dynamo/utils.py", line 1279, in run_node
    raise RuntimeError(
RuntimeError: Failed running call_function <function split at 0x7f45b8402ee0>((FakeTensor(..., size=(1, 5)), [2, 1, 1]), **{'dim': 1}):
Split sizes don't add up to the tensor's size in the given dimension
(scroll up for backtrace)

The above exception was the direct cause of the following exception:
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99702 Approved by: https://github.com/jgong5, https://github.com/lezcano, https://github.com/jansel	2023-04-25 01:13:52 +00:00
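The check that the `split_with_sizes` commit above describes can be sketched without PyTorch. This is a minimal illustration, not the code from `torch/_decomp/decompositions.py`: the function name `split_ranges` and the choice to return index ranges are assumptions; only the error message is copied from the traceback.

```python
from typing import List, Sequence


def split_ranges(dim_size: int, split_sizes: Sequence[int]) -> List[range]:
    """Hypothetical helper: index ranges each chunk would cover along one dimension."""
    # Mirror eager mode: reject sizes that do not cover the dimension exactly.
    if sum(split_sizes) != dim_size:
        raise ValueError(
            "Split sizes don't add up to the tensor's size in the given dimension"
        )
    ranges = []
    start = 0
    for size in split_sizes:
        ranges.append(range(start, start + size))
        start += size
    return ranges


print(split_ranges(5, [2, 2, 1]))
try:
    split_ranges(5, [2, 1, 1])  # same sizes as in the traceback: 2 + 1 + 1 != 5
except ValueError as exc:
    print(exc)
```

Validating before any slicing happens is what lets the compiled path fail at trace time with the same message eager mode produces.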
Nikita Karetnikov	ff825de442	[primTorch] add ref for `cumprod` (#98670 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/98670 Approved by: https://github.com/ezyang	2023-04-09 15:22:28 +00:00
albanD	0210481dcb	Fix _like meta registrations (#98160 ) The meta implementation for these _like functions is wrong whenever device != "meta" (it doesn't fill the memory!). zeros_like is special due to sparse and is fixed directly by always filling it with zeros. Every other one is a CompositeExplicit implementation, so I went with removing their meta registration and tweaking code to avoid infinite recursions. I can do the same as zeros_like (and add the proper filling for each) but that would duplicate the C++ logic and make the meta registrations non-trivial. I can do it if you prefer that to removal. test_meta works fine with these fixes, relying on CI to see if other tests are breaking as well. Pull Request resolved: https://github.com/pytorch/pytorch/pull/98160 Approved by: https://github.com/ezyang	2023-04-06 18:44:34 +00:00
Kiersten Stokes	cea13ad9fa	Improve size mismatch error messaging referencing mat/vet sizes (#96863 ) Fixes #94841 This fixes the error messages in the following files, the same as those referenced in the linked issue. I was not able to find any additional examples, but am happy to add commits for any that I may have missed!
```
aten/src/ATen/native/Blas.cpp: "size mismatch, got ", self.size(0), ", ", mat.size(0), "x", mat.size(1), ",", vec.size(0));
torch/_decomp/decompositions.py: lambda: f"size mismatch, got {self.size(0)}x{self.size(1)},{vec.size(0)}",
```
Example output for `Blas.cpp` before:
```
size mismatch, got 3, 3x4,1
```
The new error messages have the following format:
```
aten/src/ATen/native/Blas.cpp: "size mismatch, got bias (", self.size(0), "), matrix (", mat.size(0), "x", mat.size(1), "), vector (", vec.size(0), ")");
torch/_decomp/decompositions.py: lambda: f"size mismatch, got matrix ({self.size(0)}x{self.size(1)}), vector ({vec.size(0)})",
```
Example output for `Blas.cpp` after:
```
size mismatch, got bias (3), matrix (3x4), vector (1)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96863 Approved by: https://github.com/albanD	2023-03-17 21:07:48 +00:00
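To see how the labelled message format from the commit above reads in practice, here is a small pure-Python sketch. The function name `check_mv_sizes` and the plain-tuple shape argument are hypothetical; only the f-string layout follows the `decompositions.py` line quoted in the commit, and the condition (matrix columns must equal vector length) is the usual matrix-vector requirement rather than something quoted from the PR.

```python
from typing import Tuple


def check_mv_sizes(mat_shape: Tuple[int, int], vec_len: int) -> None:
    """Hypothetical helper: validate shapes for a matrix-vector product."""
    rows, cols = mat_shape
    # A (rows x cols) matrix can only multiply a vector of length cols.
    if cols != vec_len:
        raise RuntimeError(
            f"size mismatch, got matrix ({rows}x{cols}), vector ({vec_len})"
        )


try:
    check_mv_sizes((3, 4), 1)
except RuntimeError as exc:
    print(exc)
```

Labelling each operand makes the message readable without knowing the argument order of the failing call.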
Rohan Gupta	b01d6f2cdb	addmv decomp #2 (#96264 ) Fixes #94617 Pull Request resolved: https://github.com/pytorch/pytorch/pull/96264 Approved by: https://github.com/ngimel, https://github.com/ezyang	2023-03-16 23:09:45 +00:00
Christian Puhrsch	0a53c9624a	Back out "Add _int_mm to expose cuBLAS int8@int8 -> int32 matmul (#94339 )" (#96885 ) Summary: Backing out _int_mm to expose cuBLAS int8@int8 -> int32 matmul (#94339) Test Plan: CI Pull Request resolved: https://github.com/pytorch/pytorch/pull/96885 Approved by: https://github.com/drisspg	2023-03-16 05:32:55 +00:00
mingfeima	6d62134f2c	fix aminmax output resize issue when input is a zero dimension tensor (#96171 ) Fix https://github.com/pytorch/pytorch/issues/96042
### before
```
>>> torch.aminmax(torch.tensor(1, device='cpu'), dim=0, keepdim=True)
__main__:1: UserWarning: An output with one or more elements was resized since it had shape [], which does not match the required output shape [1]. This behavior is deprecated, and in a future PyTorch release outputs will not be resized unless they have zero elements. You can explicitly reuse an out tensor t by resizing it, inplace, to zero elements with t.resize_(0). (Triggered internally at ../aten/src/ATen/native/Resize.cpp:24.)
torch.return_types.aminmax(
min=tensor([1]),
max=tensor([1]))
>>> torch.aminmax(torch.tensor(1, device='cpu'), dim=0, keepdim=False)
torch.return_types.aminmax(
min=tensor(1),
max=tensor(1))
```
### after
```
>>> torch.aminmax(torch.tensor(1, device='cpu'), dim=0, keepdim=True)
torch.return_types.aminmax(
min=tensor(1),
max=tensor(1))
>>> torch.aminmax(torch.tensor(1, device='cpu'), dim=0, keepdim=False)
torch.return_types.aminmax(
min=tensor(1),
max=tensor(1))
```
Marked the following test as expected_fail: `test_vmap.py TestVmapOperatorsOpInfoCPU.test_op_has_batch_rule_aminmax_cpu_float32` Given an input of shape (2), the loop output has shape (2) while the batched vmap output has shape (2, 1), which is a mismatch. The loop computes twice on a tensor of shape ( ): without this patch, each output is (1), and they are then stacked into (2, 1); with this patch, each output is ( ), and they are then stacked into (2). Pull Request resolved: https://github.com/pytorch/pytorch/pull/96171 Approved by: https://github.com/jgong5, https://github.com/ngimel, https://github.com/zou3519	2023-03-15 22:44:13 +00:00
BowenBao	60a68477a6	Bump black version to 23.1.0 (#96578 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/96578 Approved by: https://github.com/ezyang	2023-03-15 06:27:59 +00:00
Jason Ansel	5dd52e250f	[inductor] Add some simple decomps (#96039 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/96039 Approved by: https://github.com/ngimel	2023-03-05 17:07:56 +00:00
Natalia Gimelshein	3a7fd20108	fix nll loss decomposition to properly ignore ignore_index (#95833 ) Fixes #95794 This is a hotfix for decomposition only (that is currently used by inductor), reference still accesses invalid indices. Perhaps `_nll_loss_nd` and this decomp should be unified, cc @soumith @voznesenskym @yanboliang @penguinwu @anijain2305 @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @peterbell10 @desertfire @lezcano Pull Request resolved: https://github.com/pytorch/pytorch/pull/95833 Approved by: https://github.com/lezcano, https://github.com/Chillee	2023-03-02 08:37:56 +00:00
Brian Hirsh	ddd6b53d80	fix embedding_backward_dense decomp with broadcasting (#95499 ) Fixes https://github.com/pytorch/pytorch/issues/95182 Pull Request resolved: https://github.com/pytorch/pytorch/pull/95499 Approved by: https://github.com/ezyang, https://github.com/ngimel	2023-02-28 00:24:40 +00:00
Christian Puhrsch	1fe2a9d122	Add _int_mm to expose cuBLAS int8@int8 -> int32 matmul (#94339 ) Add _int_mm primitive that binds cuBLAS int8@int8 -> int32 matmul and that translates to Triton based mm templates under max autotune. This is a very useful first step towards better supporting quantization on the GPU. This is not a user-facing API, but an internal primitive. Pull Request resolved: https://github.com/pytorch/pytorch/pull/94339 Approved by: https://github.com/ngimel, https://github.com/jansel	2023-02-27 20:27:25 +00:00
Yanan Cao (PyTorch)	039b4c8809	Add meta function for _upsample_bilinear2d_aa (#94982 ) Differential Revision: D43353000 Pull Request resolved: https://github.com/pytorch/pytorch/pull/94982 Approved by: https://github.com/ezyang	2023-02-19 07:11:20 +00:00
Brian Hirsh	68600fc7c6	avoid extra copies in batchnorm inference by introducing a new op, _native_batch_norm_legit_no_training (#94946 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/94946 Approved by: https://github.com/ezyang	2023-02-16 11:41:20 +00:00
Peter Bell	e22e323bea	[decomp] Use var_mean in native_batch_norm decomposition (#94140 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/94140 Approved by: https://github.com/ngimel	2023-02-10 15:19:46 +00:00
Horace He	e844120b2f	Fix embedding_dense_backward to not cast indiices to floats (#94572 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/94572 Approved by: https://github.com/ngimel	2023-02-10 12:44:03 +00:00
lezcano	fe0e28ab87	[decompositions] GRU decompositon with and without packed sequence (#91466 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/91466 Approved by: https://github.com/zou3519	2023-02-08 14:16:30 +00:00
lezcano	5a7c1b7894	[decompositions] LSTM with packed input (#91465 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/91465 Approved by: https://github.com/zou3519	2023-02-08 14:16:30 +00:00
lezcano	bef61225c3	[decompositions] add decomposition for RNN with packed sequence (#91281 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/91281 Approved by: https://github.com/zou3519	2023-02-08 14:16:30 +00:00
lezcano	e5f6e1f660	[decompositions] add LSTM decomp (#91124 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/91124 Approved by: https://github.com/zou3519	2023-02-08 14:16:30 +00:00
lezcano	20d01d2dc9	[expanded weights] add RNN support via decomp (#91807 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/91807 Approved by: https://github.com/albanD	2023-02-08 14:16:30 +00:00
lezcano	c2a92687e0	[decompositions] add RNN decomp and testing (#91123 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/91123 Approved by: https://github.com/zou3519	2023-02-08 14:16:30 +00:00
Natalia Gimelshein	8ecda19607	fix upsampling decompositions to have integer output sizes (#94123 ) This allows unet to be compiled with symbolic shapes (but it still fails accuracy, lol). Output sizes are always integer, there's no need to pretend they are ever float. Recomputing scale factors still used nominally float sizes converted to int, we might as well do it from the start. Pull Request resolved: https://github.com/pytorch/pytorch/pull/94123 Approved by: https://github.com/ezyang	2023-02-05 04:56:07 +00:00
Joel Schlosser	e5fd7e6d8f	Fix to use upsample_bicubic2d.vec decomp for dynamic shape support (#92854 ) For the `crossvit_9_240` model - it works now with dynamo. Pull Request resolved: https://github.com/pytorch/pytorch/pull/92854 Approved by: https://github.com/ezyang	2023-01-25 05:08:02 +00:00
PyTorch MergeBot	01f1097770	Revert "Fix to use upsample_bicubic2d.vec decomp for dynamic shape support (#92854 )" This reverts commit `d49187bf88`. Reverted https://github.com/pytorch/pytorch/pull/92854 on behalf of https://github.com/malfet due to Resulted in 50+% flaky failures in dynamo, reverting	2023-01-25 00:10:14 +00:00
Joel Schlosser	d49187bf88	Fix to use upsample_bicubic2d.vec decomp for dynamic shape support (#92854 ) For the `crossvit_9_240` model - it works now with dynamo. Pull Request resolved: https://github.com/pytorch/pytorch/pull/92854 Approved by: https://github.com/ezyang	2023-01-24 21:36:17 +00:00
Tugsbayasgalan (Tugsuu) Manlaibaatar	8f3600b966	[RELAND] Add metadata coverage for unsafe_split and unsafe_split_with_sizes (#92802 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/92802 Approved by: https://github.com/soumith	2023-01-23 10:57:10 +00:00
PyTorch MergeBot	0d9de46d9c	Revert "Add meta kernel coverage for aten.unsafe_split, aten.unsafe_chunk (#92608 )" This reverts commit `36e1f7bc2b`. Reverted https://github.com/pytorch/pytorch/pull/92608 on behalf of https://github.com/ezyang due to test_aot_autograd_symbolic_exhaustive_unsafe_split_cpu_float32 (__main__.TestEagerFusionOpInfoCPU) is now xpass	2023-01-22 13:57:31 +00:00
Tugsbayasgalan Manlaibaatar	36e1f7bc2b	Add meta kernel coverage for aten.unsafe_split, aten.unsafe_chunk (#92608 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/92608 Approved by: https://github.com/ngimel	2023-01-22 07:12:29 +00:00
Peter Bell	dd760c98f8	[decomp] Use new squeeze.dims overload in decompositions (#91602 ) This removes the now-redundant `_squeeze_multiple` helpers and instead decomposes into a single call to `aten::squeeze.dims` which also has the effect of reducing the lowered graph size in inductor. Pull Request resolved: https://github.com/pytorch/pytorch/pull/91602 Approved by: https://github.com/ngimel	2023-01-20 18:08:18 +00:00
PyTorch MergeBot	2891cecd8d	Revert "Add meta kernel coverage for aten.unsafe_split, aten.unsafe_chunk (#92608 )" This reverts commit `4386f317b9`. Reverted https://github.com/pytorch/pytorch/pull/92608 on behalf of https://github.com/ZainRizvi due to test_aot_autograd_symbolic_exhaustive_unsafe_split_cpu_float32 (__main__.TestEagerFusionOpInfoCPU) is failing consistently since this PR was merged	2023-01-20 17:17:35 +00:00
Tugsbayasgalan Manlaibaatar	4386f317b9	Add meta kernel coverage for aten.unsafe_split, aten.unsafe_chunk (#92608 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/92608 Approved by: https://github.com/ngimel	2023-01-20 12:39:56 +00:00
lezcano	8b861544f9	Remove lowering and decompositions of zero_, zero, zeros_like... in favour of their references (#92071 ) The generated triton code is identical. Pull Request resolved: https://github.com/pytorch/pytorch/pull/92071 Approved by: https://github.com/ngimel	2023-01-18 23:22:36 +00:00
Peter Bell	8770a7ed6f	Decompose more inplace ops (#90967 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/90967 Approved by: https://github.com/anijain2305	2023-01-18 21:07:47 +00:00
Peter Bell	4058dedf21	Replace log(1 + x) with log1p(x) (#92114 ) `log1p` offers better precision near zero since `(1 + x) - 1` truncates any values less than the float epsilon to zero. For `soft_margin_loss` this also requires one fewer kernel invocation which for numel=1e7 gives me a 1.2x speedup on CUDA and a 1.1x speedup on CPU. Pull Request resolved: https://github.com/pytorch/pytorch/pull/92114 Approved by: https://github.com/ngimel, https://github.com/lezcano	2023-01-18 10:43:56 +00:00
lezcano	da58f9eb8f	Rewrite out-of-place decompositions in terms of out-of-place ops (#92003 ) Fixes https://github.com/pytorch/torchdynamo/issues/1863 Pull Request resolved: https://github.com/pytorch/pytorch/pull/92003 Approved by: https://github.com/ngimel	2023-01-17 16:53:27 +00:00
vfdev-5	5f55335c2e	Fixed output memory format mismatch for bicubic2d (#90470 ) Description: - output memory format is matching input for bicubic2d Problem: output tensor's memory format does not match input format for bicubic2d ```python import torch i = torch.rand(1, 3, 32, 32).contiguous(memory_format=torch.channels_last) assert i.is_contiguous(memory_format=torch.channels_last) o = torch.nn.functional.interpolate(i, size=(4, 4), mode="bicubic") assert o.is_contiguous(memory_format=torch.channels_last), f"Should be channels last but given channels first ({o.is_contiguous(memory_format=torch.contiguous_format)})" > AssertionError: Should be channels last but given channels first (True) ``` Related PR fixing bilinear ops: https://github.com/pytorch/pytorch/pull/53535 (cc @VitalyFedyunin @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @bdhirsh ) Discovered together with @NicolasHug while working on https://github.com/pytorch/pytorch/tree/interpolate_uint8_images_linear_cpu_support_dev - Updated code to match grad input / output memory formats - temporary tensor creation matches memory format in `separable_upsample_generic_Nd_kernel_impl` - Updated tests - Added missing forward AD support for bicubic with antialiasing Pull Request resolved: https://github.com/pytorch/pytorch/pull/90470 Approved by: https://github.com/NicolasHug, https://github.com/lezcano	2023-01-12 19:52:28 +00:00
min-jean-cho	af242eedfb	[Inductor] Added aten.uniform_ decomp (#90869 ) Fixes #90815 Pull Request resolved: https://github.com/pytorch/pytorch/pull/90869 Approved by: https://github.com/jgong5, https://github.com/jansel, https://github.com/lezcano, https://github.com/ngimel, https://github.com/albanD	2023-01-11 23:23:42 +00:00
David Berard	d7dc1c2fd5	Support zero dimensions in softmax decompositions (#91322 ) The eager implementation of softmax supports computation along zero dimensions, but many of the other implementations did not, including: * decompositions & refs (this was causing dynamo failures) * forward AD for logsumexp * MPS log_softmax_backward This PR handles the `input.numel() == 0` cases separately to avoid running `amax()`, which fails for zero dimensions, and updates opinfos. example of "computation along zero dimensions": ```python # example of where import torch t = torch.rand((4, 0, 0)) print("~") print(torch.nn.functional.softmax(t, dim=-1)) # this passes print("~") torch._refs.softmax(t, dim=-1) # this fails print("~") ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/91322 Approved by: https://github.com/lezcano	2023-01-11 09:35:43 +00:00
XiaobingSuper	3790b50505	inductor: fix .to(memory_format) issue which doesn't generate the right stride (#91948 ) Motivation: for .to(memory_format), Inductor doesn't generate the right stride; see the following example: ``` class Model(torch.nn.Module): def __init__(self): super(Model, self).__init__() def forward(self, x): x = x.to(memory_format=torch.contiguous_format) return x ``` The generated code doesn't do the memory format change and gets a wrong stride (802816, 1, 14336, 256), which is not a contiguous stride. ``` from ctypes import c_void_p, c_long import torch import random from torch import empty_strided, as_strided, device from torch._inductor.codecache import AsyncCompile aten = torch.ops.aten assert_size_stride = torch._C._dynamo.guards.assert_size_stride async_compile = AsyncCompile() async_compile.wait(globals()) del async_compile def call(args): arg0_1, = args args.clear() return (arg0_1, ) if __name__ == "__main__": from torch._dynamo.testing import rand_strided from torch._inductor.utils import print_performance arg0_1 = rand_strided((128, 256, 56, 56), (802816, 1, 14336, 256), device='cpu', dtype=torch.float32) print_performance(lambda: call([arg0_1])) ``` After this PR, the generated code performs the memory format change: ``` from ctypes import c_void_p, c_long import torch import random from torch import empty_strided, as_strided, device from torch._inductor.codecache import AsyncCompile aten = torch.ops.aten assert_size_stride = torch._C._dynamo.guards.assert_size_stride async_compile = AsyncCompile() kernel_cpp_0 = async_compile.cpp(''' #include "/tmp/torchinductor_xiaobing/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" extern "C" void kernel(const float* __restrict__ in_ptr0, float* __restrict__ out_ptr0) { #pragma omp parallel num_threads(40) { { #pragma omp for for(long i0=0; i0<128; i0+=1) { #pragma GCC ivdep for(long i1=0; i1<256; i1+=1) { #pragma GCC ivdep for(long i2=0; i2<3136; i2+=1) { auto tmp0 = in_ptr0[i1 + (256*i2) + (802816*i0)]; out_ptr0[i2 + (3136*i1) + (802816*i0)] = tmp0; } } } } } } ''') async_compile.wait(globals()) del async_compile def call(args): arg0_1, = args args.clear() buf1 = empty_strided((128, 256, 56, 56), (802816, 3136, 56, 1), device='cpu', dtype=torch.float32) kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf1.data_ptr())) del arg0_1 return (buf1, ) if __name__ == "__main__": from torch._dynamo.testing import rand_strided from torch._inductor.utils import print_performance arg0_1 = rand_strided((128, 256, 56, 56), (802816, 1, 14336, 256), device='cpu', dtype=torch.float32) print_performance(lambda: call([arg0_1])) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/91948 Approved by: https://github.com/ngimel	2023-01-11 08:23:26 +00:00
min-jean-cho	364f526b9c	[Inductor] assert generator for random, dropout (#91833 ) See comment https://github.com/pytorch/pytorch/pull/90869#discussion_r1063731541 , https://github.com/pytorch/pytorch/pull/91673#discussion_r1061099337. Pull Request resolved: https://github.com/pytorch/pytorch/pull/91833 Approved by: https://github.com/jansel	2023-01-11 03:24:10 +00:00
PyTorch MergeBot	43050b8301	Revert "[Inductor] Added aten.uniform_ decomp (#90869 )" This reverts commit `c55293d640`. Reverted https://github.com/pytorch/pytorch/pull/90869 on behalf of https://github.com/huydhn due to Crossref error cannot just simply be ignored because it would break trunk for every commit after this, i.e. `fd0030fe74`. The failure would need to be handled gracefully, e.g. by adding an XFAIL	2023-01-11 01:18:11 +00:00
min-jean-cho	c55293d640	[Inductor] Added aten.uniform_ decomp (#90869 ) Fixes #90815 Pull Request resolved: https://github.com/pytorch/pytorch/pull/90869 Approved by: https://github.com/jgong5, https://github.com/jansel, https://github.com/lezcano, https://github.com/ngimel, https://github.com/albanD	2023-01-10 23:05:01 +00:00
Nikita Karetnikov	00e5f3a9c5	[primTorch] Move `logsumexp` decomp to refs (#91860 ) Fixes #91843. Pull Request resolved: https://github.com/pytorch/pytorch/pull/91860 Approved by: https://github.com/lezcano	2023-01-09 17:00:43 +00:00
Natalia Gimelshein	2c00064113	remove unnecessary decomps (#91828 ) in favor of refs. Generated triton code is the same. Pull Request resolved: https://github.com/pytorch/pytorch/pull/91828 Approved by: https://github.com/lezcano, https://github.com/soumith	2023-01-07 20:37:12 +00:00
PyTorch MergeBot	c73147f741	Revert "[decomp] Use new squeeze.dims overload in decompositions (#91602 )" This reverts commit `9262ffc692`. Reverted https://github.com/pytorch/pytorch/pull/91602 on behalf of https://github.com/clee2000 due to stacked pr was reverted, this is dependent	2023-01-05 20:39:52 +00:00
Peter Bell	9262ffc692	[decomp] Use new squeeze.dims overload in decompositions (#91602 ) This removes the now-redundant `_squeeze_multiple` helpers and instead decomposes into a single call to `aten::squeeze.dims` which also has the effect of reducing the lowered graph size in inductor. Pull Request resolved: https://github.com/pytorch/pytorch/pull/91602 Approved by: https://github.com/ngimel	2023-01-05 17:59:32 +00:00
lezcano	484dd40022	Implement PReLU in a compositional way (#91238 ) The PReLU implementation was all over the place. This led to a number of bugs like https://github.com/pytorch/pytorch/issues/68760. We fix it by: - Keeping the weird broadcasting logic it has as a CompositeImplicit kernel that calls into a second kernel - This second kernel is just a good-ol' pointwise kernel. - We implement the derivative for the pointwise kernel via TI as well for speed. - We implement the second derivative for the pointwise kernel and the forward AD derivatives compositionally This fixes a number of issues: - We don't perform copies any more when the inputs are not contiguous - The derivatives are now correct - We fix vmap and many other functorch-related issues. - CPU and CUDA now share the relevant broadcasting logic - The implementation is about 1/3 the length. Fixes https://github.com/pytorch/pytorch/issues/68760 Fixes https://github.com/pytorch/pytorch/issues/89895 Pull Request resolved: https://github.com/pytorch/pytorch/pull/91238 Approved by: https://github.com/kshitij12345, https://github.com/jbschlosser, https://github.com/albanD	2022-12-30 10:42:30 +00:00
Joel Schlosser	8b55b86dbd	Move sym_int and sym_float alongside SymInt / SymFloat in base torch package (#91317 ) This PR moves the definitions for: * `sym_int` * `sym_ceil` (used only for `sym_int`) * `sym_floor` (used only for `sym_int`) * `sym_float` from `torch/fx/experimental/symbolic_shapes.py` to `torch/__init__.py`, where `SymInt` and `SymFloat` are already defined. This removes the need for several in-line imports, and enables proper JIT script gating for #91318. I'm very open to doing this in a better way! Pull Request resolved: https://github.com/pytorch/pytorch/pull/91317 Approved by: https://github.com/ezyang, https://github.com/anijain2305	2022-12-28 16:08:16 +00:00
Joel Schlosser	1c40ec46ff	Decomps and meta registrations for upsample_nearest 1D / 2D / 3D (#91260 ) Adds decompositions and meta registrations for the 1D, 2D, and 3D implementations of `upsample_nearest`. All related OpInfo-based tests for AOTAutograd now pass. Pull Request resolved: https://github.com/pytorch/pytorch/pull/91260 Approved by: https://github.com/ezyang	2022-12-28 16:03:25 +00:00
Nikita Shulga	fd3a7264ae	[MPS] Add `group_norm[fwd+backward]` and `mean_var` (take 2) (#91190 ) Use Prims to implement group_norm, group_norm_backward and mean_var. Use `torch._ops.ops` instead of `torch.ops` in numerous subpackages in order to be able to make them importable from `torch/backends/mps/__init__.py`, as this alias, defined in `15af4b1cee/torch/__init__.py (L1095)`, is executed last during the init process. Add `__all__` to `torch/backends/mps/__init__.py` as well as alias all imports as private. Add `TestNNMPS.test_group_norm_backward` that validates no NaNs are generated during the backward pass. Fixes https://github.com/pytorch/pytorch/issues/88331 Pull Request resolved: https://github.com/pytorch/pytorch/pull/91190 Approved by: https://github.com/albanD	2022-12-22 08:54:37 +00:00
PyTorch MergeBot	645eda0a00	Revert "[MPS] Add `group_norm[fwd+backward]` and `mean_var` (#91190 )" This reverts commit `371716eb36`. Reverted https://github.com/pytorch/pytorch/pull/91190 on behalf of https://github.com/kit1980 due to Broke test_correct_module_names because of underscore _ops	2022-12-21 19:37:43 +00:00
Nikita Shulga	371716eb36	[MPS] Add `group_norm[fwd+backward]` and `mean_var` (#91190 ) Use Prims to implement group_norm, group_norm_backward and mean_var. Use `torch._ops.ops` instead of `torch.ops` in numerous subpackages in order to be able to make them importable from `torch/backends/mps/__init__.py`, as this alias, defined in `15af4b1cee/torch/__init__.py (L1095)`, is executed last during the init process. Depends on https://github.com/pytorch/pytorch/pull/91203 Fixes https://github.com/pytorch/pytorch/issues/88331 Pull Request resolved: https://github.com/pytorch/pytorch/pull/91190 Approved by: https://github.com/albanD	2022-12-21 17:33:27 +00:00
Nikita Shulga	46f64117db	[BE] Use `aten` global var (#91188 ) s/torch.ops.aten/aten/ Pull Request resolved: https://github.com/pytorch/pytorch/pull/91188 Approved by: https://github.com/ngimel	2022-12-21 02:28:51 +00:00
Peter Bell	e670c261c5	Decompose fill, zero, and zeros_like (#90968 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/90968 Approved by: https://github.com/ngimel	2022-12-21 00:59:50 +00:00
Natalia Gimelshein	e689c50922	Don't recompute var in bn decomp (#90984 ) Fixes https://github.com/pytorch/torchdynamo/issues/1988 Repeated `var` computation is not CSE'd for some reason. Pull Request resolved: https://github.com/pytorch/pytorch/pull/90984 Approved by: https://github.com/Chillee	2022-12-16 21:38:49 +00:00
Brian Hirsh	7a683eaeb8	aot_autograd: add assert for functional-only graph (#88816 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88816 Approved by: https://github.com/ezyang, https://github.com/ngimel	2022-12-16 21:04:36 +00:00
soulitzer	98a9235dce	Fix prelu ref when a.ndim < 2 (#89809 ) Fixes https://github.com/pytorch/pytorch/issues/89560 Previously the test case for "input is 1-D or scalar + weight is not scalar" did not exist; adding it introduced some failures: - forward AD (fixed in this PR) - vmap (filed https://github.com/pytorch/pytorch/issues/89895) - ref/meta (fixed this PR, though this also regresses nvFuser support) Pull Request resolved: https://github.com/pytorch/pytorch/pull/89809 Approved by: https://github.com/ngimel	2022-12-12 23:55:31 +00:00
Bin Bao	282dfe8ba4	[inductor][Reland] Use decomposition for _to_copy (#90494 ) Summary: also contains a fix for https://github.com/pytorch/pytorch/issues/89633 Pull Request resolved: https://github.com/pytorch/pytorch/pull/90494 Approved by: https://github.com/ngimel	2022-12-09 16:51:50 +00:00
PyTorch MergeBot	e89685b0b5	Revert "[inductor] Use decomposition for _to_copy (#90314 )" This reverts commit `3fdb5f2dda`. Reverted https://github.com/pytorch/pytorch/pull/90314 on behalf of https://github.com/desertfire due to regresses performance on hf_Bert	2022-12-08 18:29:06 +00:00
Bin Bao	3fdb5f2dda	[inductor] Use decomposition for _to_copy (#90314 ) Summary: also contains a fix for https://github.com/pytorch/pytorch/issues/89633 Pull Request resolved: https://github.com/pytorch/pytorch/pull/90314 Approved by: https://github.com/ngimel	2022-12-08 15:25:44 +00:00
Peter Bell	e6a7278753	Give std/var correction overloads proper defaults (#56398 ) The correction overloads defaults were left off for forward compatibility reasons, but this FC window expired well over a year ago at this point. Differential Revision: [D29625593](https://our.internmc.facebook.com/intern/diff/D29625593) Pull Request resolved: https://github.com/pytorch/pytorch/pull/56398 Approved by: https://github.com/mruberry	2022-12-07 15:15:00 +00:00
Yanbo Liang	25f39c1bce	Fix uniform ref implementation (#90094 ) Fixes https://github.com/pytorch/torchdynamo/issues/1954 Pull Request resolved: https://github.com/pytorch/pytorch/pull/90094 Approved by: https://github.com/ngimel	2022-12-06 21:28:17 +00:00
Animesh Jain	c1950620c5	[decomp] Fix native_batch_norm_backward dtype of dweight and dbias (#89740 ) Discovered while debugging an accuracy issue for Inductor. Pull Request resolved: https://github.com/pytorch/pytorch/pull/89740 Approved by: https://github.com/soumith, https://github.com/ngimel	2022-11-29 03:15:20 +00:00
Brian Hirsh	e20ec44544	fixes for inductor <> batch norm (#89603 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/89603 Approved by: https://github.com/albanD	2022-11-29 02:16:52 +00:00
Jane Xu	8695f0cced	Rectify `native_batch_norm` schema by splitting it into two legit schemas (#88697 ) Using the same repro from the issue (but with BatchNorm2D) Rectifies native_batch_norm schema by splitting the schema into 2: 1. one will have NON-optional alias-able running_mean and running_var inputs 2. the other will just not have those parameters at all (no_stats variation) Calling for name suggestions! ## test plan I've added tests in test_functionalization.py as well as an entry in common_method_invocations.py for `native_batch_norm_legit` CI should pass. ## next steps Because of bc/fc reasons, we reroute native_batch_norm to call our new schemas ONLY through the python dispatcher, but in 2 weeks or so, we should make `native_batch_norm_legit` the official batch_norm. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88697 Approved by: https://github.com/albanD	2022-11-23 23:23:17 +00:00
Elias Ellison	a8d6b82167	Fix norm decomp when dtype is passed in (#89508 ) Fix for https://github.com/pytorch/torchdynamo/issues/1889. The wrapper was doing a downcast even when the dtype was explicitly passed in. Pull Request resolved: https://github.com/pytorch/pytorch/pull/89508 Approved by: https://github.com/anijain2305	2022-11-23 20:49:09 +00:00
Elias Ellison	72110d7833	Fix Upsample Decomp Striding For Small Channels (#89528 ) Fix for https://github.com/pytorch/torchdynamo/issues/623. Pull Request resolved: https://github.com/pytorch/pytorch/pull/89528 Approved by: https://github.com/ngimel, https://github.com/anijain2305	2022-11-23 20:47:39 +00:00
lezcano	154e58c032	Add most in-place references/decompositions (#88117 ) We add most in-place references in a generic way. We also implement a wrapper to implement the annoying interface that `nn.functional` nonlinearities have. We fix along the way a couple decompositions for some non-linearities by extending the arguments that the references have. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88117 Approved by: https://github.com/mruberry	2022-11-18 14:59:46 +00:00
lezcano	3320915303	Fix decomp for embedding_backward and simplify the decomposition of embedding_dense and embedding_dense_backward (#87204 ) See the title Pull Request resolved: https://github.com/pytorch/pytorch/pull/87204 Approved by: https://github.com/Chillee	2022-11-16 17:46:54 +00:00
Sherlock Huang	5faa2792fa	Symintify decomps for split and upsample_bilinear; Fix decomp for _softmax_backward_data and native_dropout_backward (#88761 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88761 Approved by: https://github.com/ezyang	2022-11-15 13:34:45 +00:00
PyTorch MergeBot	eea506aee1	Revert "Symintify decomps for split and upsample_bilinear; Fix decomp for _softmax_backward_data and native_dropout_backward (#88761 )" This reverts commit `9eabcc370f`. Reverted https://github.com/pytorch/pytorch/pull/88761 on behalf of https://github.com/suo due to much broken `9eabcc370f`	2022-11-14 01:58:47 +00:00
Sherlock Huang	9eabcc370f	Symintify decomps for split and upsample_bilinear; Fix decomp for _softmax_backward_data and native_dropout_backward (#88761 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88761 Approved by: https://github.com/ezyang	2022-11-13 21:30:53 +00:00
Horace He	37c5b42fa6	Fix matmul decomp to use reshape instead of contiguous().view() (#88832 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88832 Approved by: https://github.com/bertmaher, https://github.com/ngimel	2022-11-12 00:15:42 +00:00
Ryan Spring	534ae6ae47	[primTorch] Implement group norm reference (#87054 ) Add group norm reference Split from #81191 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87054 Approved by: https://github.com/mruberry	2022-11-11 01:08:20 +00:00
Sherlock Huang	c00c34fb69	Fix meta for aten.upsample_bilinear2d.vec (#88158 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88158 Approved by: https://github.com/ngimel	2022-11-02 16:58:29 +00:00
Sherlock Huang	de1f641f11	Fix meta function for aten.addmm (#88068 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88068 Approved by: https://github.com/albanD	2022-11-01 17:05:48 +00:00
lezcano	fd27246c16	Fix decomposition for std (#87181 ) The previous implementation was lacking a few features and incurred a pretty large error cc @ezyang @mruberry @ngimel @Lezcano @fdrocha Pull Request resolved: https://github.com/pytorch/pytorch/pull/87181 Approved by: https://github.com/ngimel, https://github.com/peterbell10	2022-10-28 00:50:29 +00:00
Sherlock Huang	eb99c1efce	Prefer python meta function over c++ meta function (#87426 ) This is a policy update for meta registration. We now prefer the python meta implementation over the C++ meta function. This is a flip of the previous policy, where we preferred the C++ meta function over the python meta function if they both existed. Here's the meta registration process: 1. register_meta and register_decomposition will place the python meta/decomp functions into the `global_decomp_table`. However, they will NOT register them into the dispatcher. 2. After global_decomp_table is populated, we will compile an `active_meta_table`. For a given op, we pick the most specific decomp function from `global_decomp_table` in the preference order of Meta > PostAutograd > PreAutograd. 3. We will unconditionally register all of them into the python dispatcher, and register them into the C++ dispatcher unless it is one of the following 3 cases - 1. the op is a CompositeImplicitAutograd, and should rely on the decomposed op's meta - 2. the op is a view op, as the MetaTensor doesn't support aliased storage - 3. the op is in the blocklist (due to UT failures, and we will burn down this list op by op) Over the long run, we wish to implement all meta functions in python. With this PR, 321 op_overloads will have cpp meta overridden by python meta. 400 op_overloads are still using cpp meta. The exact list can be found here https://gist.github.com/SherlockNoMad/d20bb736178df8eebd3b054c8bb7cdc5 cc @ngimel @jansel @lezcano @fdrocha @mlazos @soumith @voznesenskym @yanboliang Pull Request resolved: https://github.com/pytorch/pytorch/pull/87426 Approved by: https://github.com/ezyang, https://github.com/jansel	2022-10-25 16:49:02 +00:00

391 Commits