PyTorch MergeBot
98c329b19e
Revert "[core ATen IR] Add decompositions for max, min, var_mean ( #110906 )"
...
This reverts commit 9606cda64e .
Reverted https://github.com/pytorch/pytorch/pull/110906 on behalf of https://github.com/SS-JIA due to Breaks internal CI ([comment](https://github.com/pytorch/pytorch/pull/110906#issuecomment-1757490740 ))
2023-10-11 11:41:21 +00:00
SS-JIA
9606cda64e
[core ATen IR] Add decompositions for max, min, var_mean ( #110906 )
...
## Context
Add decompositions for `aten.max`, `aten.min`, and `aten.var_mean`. These operators follow a pattern of returning a tuple of outputs from two component operators:
```
aten.max(x) -> return aten.amax(x), aten.argmax(x)
aten.min(x) -> return aten.amin(x), aten.argmin(x)
aten.var_mean(x) -> return aten.var(x), aten.mean(x)
```
For `var_mean`, the `refs` implementation was doing something similar, so I changed it to call `torch.` ops instead, as was done previously for other `refs` implementations. cc: @peterbell10 @lezcano
Note that Inductor lowers all these directly, so they are excluded from the Inductor decomp table.
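For illustration, the pattern looks roughly like the sketch below with the usual `register_decomposition` machinery, taking the `.dim` overload of `max` as the example (the actual registrations live in `torch/_decomp/decompositions.py`):
```
import torch
from torch._decomp import register_decomposition

aten = torch.ops.aten

@register_decomposition(aten.max.dim)
def max_dim(x, dim, keepdim=False):
    # aten.max.dim returns (values, indices); build it from the two
    # single-output component ops.
    return aten.amax(x, dim, keepdim), aten.argmax(x, dim, keepdim)
```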
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110906
Approved by: https://github.com/manuelcandales
2023-10-11 00:06:24 +00:00
Stephen Jia
c2e7a0d689
[core IR] Add decomps for aten.sum and aten.squeeze variants ( #110645 )
...
Summary:
## Context
Both `aten.sum` and `aten.squeeze` have a "most generic" variant, `aten.sum.dim_IntList` and `aten.squeeze.dims` respectively. Add decompositions for the other, non-generic variants of these operators that express them through the most generic variant.
Note that to register these decomps, the reference implementations under `_refs` had to be removed from the registered decompositions. cc: @lezcano @peterbell10
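A sketch of what routing a non-generic variant through the generic one can look like (overload names illustrative; the real decomps live in `torch/_decomp`):
```
import torch
from torch._decomp import register_decomposition

aten = torch.ops.aten

@register_decomposition(aten.sum.default)
def sum_default(x, *, dtype=None):
    # Reduce over every dimension via the generic dim_IntList overload.
    return aten.sum.dim_IntList(x, list(range(x.dim())), False, dtype=dtype)
```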
Test Plan: Github CI + Meta Internal CI
Differential Revision: D49965952
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110645
Approved by: https://github.com/peterbell10 , https://github.com/digantdesai , https://github.com/manuelcandales
2023-10-07 04:21:51 +00:00
Stephen Jia
ff96f6d04f
[core IR][reland] Add split.Tensor and unbind decompositions to core ATen decomp table ( #110323 )
...
Summary:
This is a reland of [GitHub PR #110102 ]( https://github.com/pytorch/pytorch/pull/110102 ).
The original PR had to be unlanded due to internal CI failures. This diff applies some small fixes to the failing tests to adjust to the new decompositions.
Note that `lift_fresh` will not be decomposed for now, since it was found that [constant propagation looks specifically for `lift_fresh`](13af952f94/torch/fx/experimental/proxy_tensor.py (L381-L386) ). Therefore decomposing `lift_fresh` would interfere with constant propagation during export.
Test Plan: Github CI and internal CI
Differential Revision: D49761321
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110323
Approved by: https://github.com/jansel
2023-10-03 14:35:04 +00:00
PyTorch MergeBot
e0b035c220
Revert "[core IR] Add lift_fresh, split.Tensor, and unbind decompositions to core ATen decomp table ( #110102 )"
...
This reverts commit 22e706f768 .
Reverted https://github.com/pytorch/pytorch/pull/110102 on behalf of https://github.com/atalman due to Breaks internal CI ([comment](https://github.com/pytorch/pytorch/pull/110102#issuecomment-1739856671 ))
2023-09-28 19:03:25 +00:00
SS-JIA
22e706f768
[core IR] Add lift_fresh, split.Tensor, and unbind decompositions to core ATen decomp table ( #110102 )
...
## Context
Add existing decomps for `lift_fresh`, `split.Tensor`, and `unbind` to the core ATen decomposition table. Do not use them in Inductor, since Inductor currently lowers these directly.
One note, though: `lift_fresh`'s decomposition has a note saying it's not correct under autograd. However, my understanding is that these decompositions are registered to the `"post_autograd"` decomposition table, meaning autograd wouldn't be a factor. Would like some confirmation that this premise is correct.
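For context, a sketch of how the table is consumed downstream, using the public `torch._decomp` and `make_fx` APIs:
```
import torch
from torch._decomp import core_aten_decompositions
from torch.fx.experimental.proxy_tensor import make_fx

def f(x):
    return torch.unbind(x, dim=0)

# Tracing with the core table applies the unbind decomposition.
gm = make_fx(f, decomposition_table=core_aten_decompositions())(torch.randn(3, 4))
gm.print_readable()
```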
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110102
Approved by: https://github.com/jansel
2023-09-28 01:21:45 +00:00
SS-JIA
dec140f1ea
[core IR] Add a core decomposition for aten.all ( #110093 )
...
## Context
Change the ref implementation of `aten.all` to only use other `torch` operators such that we can use it for the core ATen decomposition table. This will replace the decomposition for `aten.all` that was used specifically by Inductor.
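The ref boils down to the De Morgan formulation below (a sketch; the real ref also handles the `dim`/`keepdim` overloads):
```
import torch

def all_decomp(x):
    # all(x) == not any(not x), expressed with torch ops only
    return torch.logical_not(torch.any(torch.logical_not(x)))
```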
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110093
Approved by: https://github.com/manuelcandales , https://github.com/peterbell10 , https://github.com/lezcano
2023-09-27 01:31:41 +00:00
SS-JIA
9928c10e71
[core IR] Add glu as a core decomposition ( #110043 )
...
## Context
Add the decomposition for `aten.glu` as a decomposition in the core ATen decomposition table. Don't use it in the Inductor decomposition table since Inductor has a lowering for it.
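The decomposition amounts to the standard GLU definition (a sketch; the registered version also validates that the size along `dim` is even):
```
import torch

def glu_decomp(x, dim=-1):
    # Split the input in half along `dim`; gate one half with the other.
    a, b = torch.chunk(x, 2, dim=dim)
    return a * torch.sigmoid(b)
```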
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110043
Approved by: https://github.com/peterbell10 , https://github.com/lezcano
ghstack dependencies: #110046
2023-09-27 00:23:05 +00:00
SS-JIA
5df8aca994
[core IR] Add a core decomposition for floor_divide ( #110046 )
...
## Context
Introduce a core decomposition for `aten.floor_divide` into other `aten` ops, and add it to the core ATen decomposition table.
This replaces the decomposition of `floor_divide` that was used by Inductor. I noticed there was a note on that decomposition
```
# TorchInductor-only decomposition. It should not be taken to core.
# See https://github.com/pytorch/torchdynamo/pull/1120
```
but couldn't discern the reason why this is the case. cc: @lezcano
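The decomposition itself is small, along these lines (a sketch; dtype-promotion details omitted):
```
import torch
from torch._decomp import register_decomposition

aten = torch.ops.aten

@register_decomposition(aten.floor_divide)
def floor_divide(a, b):
    # Floor division is just div with floor rounding.
    return aten.div.Tensor_mode(a, b, rounding_mode="floor")
```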
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110046
Approved by: https://github.com/peterbell10
2023-09-26 08:39:21 +00:00
Mwiza Kunda
5c4b5baf21
Fix python decomps for OpOverloadPackets and add tests ( #107707 )
...
- Extend `test_torch_dispatch_meta_outplace` to test torch ops that do not have an out parameter but have aten op overloads that do. Additionally, Python decompositions may register `OpOverloadPacket`s, so decompositions need to be tested to ensure all `OpOverload`s still function for the `Meta` key (e.g. if a python decomposition is registered for an aten op `aten.foo` with overloads `[default, out]`, the python function needs to support receiving out arguments; see the sketch after this list)
- Add out parameter wrappers to python decomps for aten ops that have out overloads
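A sketch of the wrapper pattern, reusing the hypothetical `aten.foo` from the example above (`out_wrapper` is real and lives in `torch._prims_common.wrappers`; `aten.foo` is not an actual op):
```
import torch
from torch._decomp import register_decomposition
from torch._prims_common.wrappers import out_wrapper

aten = torch.ops.aten

# Hypothetical op with [default, out] overloads, as in the example above.
@register_decomposition([aten.foo.default, aten.foo.out])
@out_wrapper()
def foo(x):
    # out_wrapper adds and handles the out= argument for us.
    return x * 2
```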
CC. @ezyang @albanD @lezcano
Fixes #107713
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107707
Approved by: https://github.com/lezcano
2023-09-25 20:53:30 +00:00
SS-JIA
7de669f2f9
[core IR] Remove trunc decomp and add trunc to core ( #109902 )
...
Following up from [this comment](https://github.com/pytorch/pytorch/pull/109319#discussion_r1330803226 ). Remove the decomposition for `trunc`, and add it as a core operator.
Going forward, provide similar treatment for operators that map cleanly to hardware instructions.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109902
Approved by: https://github.com/peterbell10
2023-09-25 18:18:06 +00:00
Mwiza Kunda
6b7b9c796e
Fix registering jit decompositions for jvp for out wrapped decomps ( #109367 )
...
Python decompositions wrapped by `out_wrapper` need to be unwrapped before compiling with TorchScript since:
- `out_wrapper` extends the decomposition's signature with an out parameter; however, this `out` parameter is not present in the source code of the original decomposition, so the resulting `ScriptFunction` would not have an `out` parameter
- `out_wrapper` lives in the `torch._prims_common.wrappers` module, so its `globals()` differ from the globals of the decomposition being wrapped. This can cause symbol resolution to fail in the TorchScript compiler, since it compiles the unwrapped decomp's source code rather than the wrapper
The python decomposition for `aten.trace` is wrapped as an example (see the unwrapping sketch below); other decompositions are to be fixed in https://github.com/pytorch/pytorch/pull/107707
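A sketch of the unwrapping step, assuming the wrapper exposes the original function via `__wrapped__` (as `functools.wraps` would set):
```
import torch

def script_decomp(decomp_fn):
    # Compile the unwrapped decomposition rather than the out_wrapper
    # closure, so TorchScript sees the original signature and globals.
    fn = getattr(decomp_fn, "__wrapped__", decomp_fn)
    return torch.jit.script(fn)
```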
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109367
Approved by: https://github.com/lezcano
2023-09-21 16:36:51 +00:00
Peter Bell
6f0cf5a837
[decomp] Decompose unsafe_split{,_with_sizes} into safe variants ( #109668 )
...
The "safety" aspect refers to the output not being registered as aliasing the
input, but after AOTAutograd I don't think this distinction matters. However,
we shouldn't use the same decomposition as the safe variant in case the backend
doesn't want to decompose split.
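Concretely, the new decompositions are thin forwards to the safe ops, roughly:
```
import torch
from torch._decomp import register_decomposition

aten = torch.ops.aten

@register_decomposition(aten.unsafe_split.Tensor)
def unsafe_split(input, split_size, dim=0):
    return aten.split.Tensor(input, split_size, dim)

@register_decomposition(aten.unsafe_split_with_sizes.default)
def unsafe_split_with_sizes(input, split_sizes, dim=0):
    return aten.split_with_sizes.default(input, split_sizes, dim)
```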
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109668
Approved by: https://github.com/lezcano
ghstack dependencies: #109667
2023-09-20 18:45:56 +00:00
Peter Bell
9e629dd73c
[decomp] Add all std and std_mean overloads to core decompositions ( #109667 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109667
Approved by: https://github.com/lezcano
2023-09-20 18:45:56 +00:00
Salil Desai
40b2c796dc
[Decomposition] baddbmm ( #108534 )
...
Summary:
Move the decomposition of baddbmm from _inductor/decomposition.py and include it in core_aten_decompositions
ff38c0e2f9/torch/_inductor/decomposition.py (L203)
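The decomposition being moved looks roughly like this (a sketch; the real one also special-cases some dtype and scalar combinations):
```
import torch
from torch._decomp import register_decomposition

aten = torch.ops.aten

@register_decomposition(aten.baddbmm)
def baddbmm(self, batch1, batch2, beta=1, alpha=1):
    result = torch.bmm(batch1, batch2)
    if alpha != 1:
        result = result * alpha
    if beta == 0:
        return result
    if beta != 1:
        self = self * beta
    return self + result
```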
Test Plan: Phabricator + OSS Tests
Differential Revision: D48871741
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108534
Approved by: https://github.com/SherlockNoMad
2023-09-20 12:49:32 +00:00
Salil Desai
d0cc623192
[Decomposition] _unsafe_view ( #108713 )
...
Summary:
Decomp already exists, so just add it to core_aten_decompositions
https://www.internalfb.com/code/fbsource/[9d5eabd7b213d1a356d4e7bb400355d574ea924b]/fbcode/caffe2/torch/_decomp/decompositions.py?lines=3091
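For illustration, "adding it to core_aten_decompositions" means listing the op in the table builder in `torch/_decomp/__init__.py`, roughly:
```
import torch
from torch._decomp import get_decompositions

aten = torch.ops.aten

def core_aten_decompositions():
    return get_decompositions([
        # ... many other ops elided ...
        aten._unsafe_view,
    ])
```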
Differential Revision: D48619079
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108713
Approved by: https://github.com/larryliu0820 , https://github.com/SherlockNoMad
2023-09-19 13:37:35 +00:00
Salil Desai
2e721aab98
[Decomposition] Trunc ( #109319 )
...
Summary:
Add Decomp for Trunc and add it to core_aten_decompositions
Differential Revision: D49042033
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109319
Approved by: https://github.com/SherlockNoMad
2023-09-19 13:30:13 +00:00
Salil Desai
ae66d0b3bf
[Decomposition] clamp_max ( #108718 )
...
Summary:
Decomp already exists, so just add it to core_aten_decompositions
https://www.internalfb.com/code/fbsource/[abda43a5a268e83fef6d62b49531a390ce915ad2]/fbcode/caffe2/torch/_refs/__init__.py?lines=1855
Differential Revision: D48880026
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108718
Approved by: https://github.com/SherlockNoMad
2023-09-19 13:25:35 +00:00
Salil Desai
fc47ba2794
[Decomposition] clamp_min ( #108717 )
...
Summary:
Decomp already exists, so just add it to core_aten_decompositions
https://www.internalfb.com/code/fbsource/[abda43a5a268e83fef6d62b49531a390ce915ad2]/fbcode/caffe2/torch/_refs/__init__.py?lines=1846
Differential Revision: D48880080
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108717
Approved by: https://github.com/SherlockNoMad
2023-09-18 12:43:58 +00:00
Salil Desai
a6d4cca7c0
[Decomposition] unsafe_split.Tensor ( #108544 )
...
Summary:
Include decomp in core_aten_decompositions
Decomp already exists
https://www.internalfb.com/code/fbsource/[03ff511cad587fc27ed8fd6a54b87845246e8e0c]/fbcode/caffe2/torch/_decomp/decompositions.py?lines=1209
Test Plan: OSS + Phabricator Tests
Differential Revision: D48940445
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108544
Approved by: https://github.com/larryliu0820 , https://github.com/SherlockNoMad
2023-09-18 12:43:07 +00:00
Salil Desai
af93b29c5e
[Decomposition] std.correction ( #108733 )
...
Summary:
Include decomp in core_aten_decompositions
Decomp:
https://www.internalfb.com/code/fbsource/[e69bf00ff87a55c9a30bd7905881661ff05fa211]/fbcode/caffe2/torch/_refs/__init__.py?lines=2398
Differential Revision: D48940402
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108733
Approved by: https://github.com/larryliu0820 , https://github.com/SherlockNoMad
2023-09-18 11:38:23 +00:00
Ken Jin
c458fa0d35
Decompose/add reference for view_as_complex ( #108005 )
...
Aten source: d4a99631dd/aten/src/ATen/native/ComplexHelper.h (L78)
Documentation reference:
https://pytorch.org/docs/stable/generated/torch.view_as_complex.html
Note: this adds a new primitive `view_of_dtype`, which is trivially implemented, as its meta function is already implemented elsewhere.
Finally, this is not registered as a decomposition (yet), because TorchInductor does not yet support complex types. It should be added once we do.
Closes https://github.com/pytorch/pytorch/issues/108020 as well.
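Behaviorally, the ref matches the sketch below (hedged; the real ref goes through the new `view_of_dtype` prim and does proper dtype/stride checking):
```
import torch

def view_as_complex_sketch(x):
    # (..., 2) float pairs reinterpreted as complex; the dtype view
    # halves the last dimension, which we then squeeze away.
    complex_dtype = {torch.float32: torch.complex64,
                     torch.float64: torch.complex128}[x.dtype]
    return x.view(complex_dtype).squeeze(-1)

x = torch.randn(4, 2)
assert torch.equal(view_as_complex_sketch(x), torch.view_as_complex(x))
```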
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108005
Approved by: https://github.com/peterbell10 , https://github.com/ezyang
2023-09-07 23:49:20 +00:00
Sam Larsen
27fe45eaf6
[inductor][easy] Enable Mypy Checking for torch/_inductor/decomposition.py ( #108682 )
...
Summary: Looks like one simple type mismatch between `get_decompositions()` and `remove_decompositions()`
Test Plan: `lintrunner torch/_inductor/decomposition.py`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108682
Approved by: https://github.com/eellison
2023-09-07 00:48:55 +00:00
lezcano
239ee76177
Add refs/decomps for dot/vdot ( #108194 )
...
Follow-up on https://github.com/pytorch/pytorch/issues/108127#issuecomment-1698142427
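The refs reduce to elementwise multiply-and-sum (a sketch; the real refs add 1-D shape and dtype checks):
```
import torch

def dot(a, b):
    return (a * b).sum()

def vdot(a, b):
    # vdot conjugates its first argument
    return (a.conj() * b).sum()
```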
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108194
Approved by: https://github.com/peterbell10
ghstack dependencies: #108188
2023-08-31 15:30:23 +00:00
rzou
0e4752bafc
Allow registering decomps for HigherOrderOp; add decomp for out_dtype ( #108080 )
...
We allow registering decomps for HigherOrderOp via the existing decomp
mechanisms:
- I refactored those APIs to accept torch._ops.OperatorBase, which is the base
class for torch.ops.HigherOrderOperator and torch.ops.OpOverload
- HigherOrderOps must directly call maybe_handle_decomp in their
ProxyTorchDispatchMode handling in order to resolve decompositions. We
can change this in the future so that they do not need to do this.
Next, we add an inductor decomp for out_dtype. This decomp shouldn't be
generally available because we want to preserve out_dtype to the backend
for other use cases (i.e. executorch).
Test Plan:
- new tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108080
Approved by: https://github.com/HDCharles
2023-08-31 03:15:38 +00:00
chilli
39130c7433
Add reinplacing pass for scatters + incremental fake tensor updating ( #106192 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106192
Approved by: https://github.com/jansel , https://github.com/eellison
2023-08-30 20:41:37 +00:00
Mengwei Liu
0fb1c05c5a
[pytorch] Add decomp rule for scaled_dot_product_attention ( #108180 )
...
`scaled_dot_product_attention` used to be decomposed in pre-autograd, given that it calls `_scaled_dot_product_attention_math` and `_scaled_dot_product_attention_math` only has a `CompositeImplicitAutograd` kernel. As a result, it's decomposed into ops with finer granularity.
However, recent PRs (#103826, #105131) added new logic in `scaled_dot_product_attention`, and now it calls `_scaled_dot_product_flash_attention`, which contains a CPU kernel. This results in `_scaled_dot_product_flash_attention` showing up in `torch.export()`. This PR adds a decomposition that ensures `scaled_dot_product_attention` is still decomposed the same way as before, i.e., going through `_scaled_dot_product_attention_math`. Note that this decomp rule should be excluded by Inductor.
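A sketch of the shape of such a decomp (argument list trimmed; the math op also returns the attention weights, which are dropped here):
```
import torch
from torch._decomp import register_decomposition

aten = torch.ops.aten

@register_decomposition(aten.scaled_dot_product_attention)
def sdpa(query, key, value, attn_mask=None, dropout_p=0.0, is_causal=False):
    out, _attn = aten._scaled_dot_product_attention_math(
        query, key, value, attn_mask, dropout_p, is_causal
    )
    return out
```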
Differential Revision: [D48762000](https://our.internmc.facebook.com/intern/diff/D48762000/ )
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108180
Approved by: https://github.com/SherlockNoMad
2023-08-30 15:52:08 +00:00
vfdev-5
0cfc5899f9
[inductor] Improved grid_sampler_2d decomposition for cuda ( #104710 )
...
Description:
- Improved the grid_sampler_2d decomposition code to generate a single CUDA kernel instead of two
Related to https://github.com/pytorch/pytorch/issues/104296
Perfs:
- speed-up on CUDA (~5x) and CPU (~2x) for bicubic mode
```
Speed-up PR vs Nightly = ratio between columns "Compiled (2.1.0a0+git52598e9) PR" and "Compiled (2.1.0a0+gitcf76938) Nightly"
[------------------------------------------------------------------------------------------------------------------------------- Affine grid sampling, cpu -------------------------------------------------------------------------------------------------------------------------------]
| Eager (2.1.0a0+git52598e9) PR | Compiled (2.1.0a0+git52598e9) PR | Compiled (2.1.0a0+gitcf76938) Nightly | speed-up PR vs Nightly | Eager (2.1.0a0+gitcf76938) Nightly
1 threads: --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Input: (8, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=True, mode=bilinear | 38.010 (+-0.118) | 51.466 (+-1.257) | 47.867 (+-0.124) | 0.930 (+-0.000) | 33.654 (+-0.411)
Input: (8, 3, 345, 456) torch.float32, torch.channels_last, align_corners=True, mode=bilinear | 35.532 (+-0.236) | 52.189 (+-0.093) | 58.979 (+-0.206) | 1.130 (+-0.000) | 32.543 (+-0.198)
Input: (8, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=False, mode=bilinear | 38.187 (+-0.112) | 47.892 (+-0.117) | 45.833 (+-0.081) | 0.957 (+-0.000) | 33.752 (+-0.116)
Input: (8, 3, 345, 456) torch.float32, torch.channels_last, align_corners=False, mode=bilinear | 36.708 (+-0.244) | 51.680 (+-0.104) | 58.360 (+-0.108) | 1.129 (+-0.000) | 32.576 (+-0.751)
Input: (8, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=True, mode=nearest | 24.201 (+-0.088) | 27.451 (+-0.059) | 27.937 (+-0.081) | 1.018 (+-0.000) | 24.367 (+-0.074)
Input: (8, 3, 345, 456) torch.float32, torch.channels_last, align_corners=True, mode=nearest | 19.266 (+-0.105) | 26.070 (+-0.085) | 26.092 (+-0.054) | 1.001 (+-0.000) | 20.144 (+-0.064)
Input: (8, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=False, mode=nearest | 24.293 (+-0.125) | 26.085 (+-0.064) | 26.575 (+-0.061) | 1.019 (+-0.000) | 24.515 (+-0.095)
Input: (8, 3, 345, 456) torch.float32, torch.channels_last, align_corners=False, mode=nearest | 19.440 (+-0.075) | 25.252 (+-0.059) | 25.259 (+-0.051) | 1.000 (+-0.000) | 19.770 (+-0.070)
Input: (8, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=True, mode=bicubic | 114.900 (+-0.508) | 113.416 (+-1.271) | 248.679 (+-1.431) | 2.193 (+-0.000) | 114.609 (+-0.515)
Input: (8, 3, 345, 456) torch.float32, torch.channels_last, align_corners=True, mode=bicubic | 115.973 (+-0.555) | 124.711 (+-1.596) | 282.187 (+-2.418) | 2.263 (+-0.000) | 115.368 (+-0.652)
Input: (8, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=False, mode=bicubic | 111.730 (+-0.562) | 110.914 (+-0.865) | 253.899 (+-2.226) | 2.289 (+-0.000) | 111.285 (+-1.226)
Input: (8, 3, 345, 456) torch.float32, torch.channels_last, align_corners=False, mode=bicubic | 112.859 (+-0.487) | 131.696 (+-1.298) | 294.124 (+-1.963) | 2.233 (+-0.000) | 110.910 (+-0.969)
Times are in milliseconds (ms).
[------------------------------------------------------------------------------------------------------------------------------- Affine grid sampling, cuda ------------------------------------------------------------------------------------------------------------------------------]
| Eager (2.1.0a0+git52598e9) PR | Compiled (2.1.0a0+git52598e9) PR | Compiled (2.1.0a0+gitcf76938) Nightly | speed-up PR vs Nightly | Eager (2.1.0a0+gitcf76938) Nightly
1 threads: --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Input: (8, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=True, mode=bilinear | 228.811 (+-0.037) | 92.990 (+-0.446) | 92.648 (+-0.286) | 0.996 (+-0.000) | 228.274 (+-0.067)
Input: (8, 3, 345, 456) torch.float32, torch.channels_last, align_corners=True, mode=bilinear | 222.107 (+-0.076) | 93.247 (+-0.387) | 92.528 (+-0.423) | 0.992 (+-0.000) | 221.922 (+-0.297)
Input: (8, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=False, mode=bilinear | 235.654 (+-0.055) | 75.781 (+-0.566) | 115.865 (+-0.419) | 1.529 (+-0.000) | 236.032 (+-0.111)
Input: (8, 3, 345, 456) torch.float32, torch.channels_last, align_corners=False, mode=bilinear | 226.752 (+-0.088) | 76.312 (+-0.328) | 116.468 (+-0.477) | 1.526 (+-0.000) | 226.950 (+-0.027)
Input: (8, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=True, mode=nearest | 225.540 (+-0.013) | 75.638 (+-0.341) | 72.621 (+-0.292) | 0.960 (+-0.000) | 225.937 (+-0.017)
Input: (8, 3, 345, 456) torch.float32, torch.channels_last, align_corners=True, mode=nearest | 217.425 (+-0.024) | 75.484 (+-0.545) | 73.518 (+-0.296) | 0.974 (+-0.000) | 217.793 (+-0.008)
Input: (8, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=False, mode=nearest | 231.474 (+-0.020) | 75.972 (+-0.339) | 73.030 (+-0.387) | 0.961 (+-0.000) | 231.991 (+-0.184)
Input: (8, 3, 345, 456) torch.float32, torch.channels_last, align_corners=False, mode=nearest | 223.408 (+-0.016) | 75.622 (+-0.279) | 73.542 (+-0.336) | 0.973 (+-0.000) | 223.893 (+-0.021)
Input: (8, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=True, mode=bicubic | 319.382 (+-0.023) | 149.060 (+-0.190) | 772.116 (+-0.266) | 5.180 (+-0.000) | 320.549 (+-0.387)
Input: (8, 3, 345, 456) torch.float32, torch.channels_last, align_corners=True, mode=bicubic | 319.987 (+-0.134) | 154.443 (+-0.014) | 797.651 (+-0.232) | 5.165 (+-0.000) | 320.665 (+-0.397)
Input: (8, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=False, mode=bicubic | 326.138 (+-0.439) | 149.092 (+-0.036) | 772.508 (+-0.259) | 5.181 (+-0.000) | 325.751 (+-0.398)
Input: (8, 3, 345, 456) torch.float32, torch.channels_last, align_corners=False, mode=bicubic | 326.024 (+-0.118) | 154.452 (+-0.209) | 797.756 (+-0.229) | 5.165 (+-0.000) | 326.870 (+-0.372)
Times are in microseconds (us).
```
[Source](https://raw.githubusercontent.com/vfdev-5/pth-inductor-dev/master/output/20230828-134459-affine-grid-sampler-PR-vs-Nightly-speedup.md )
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104710
Approved by: https://github.com/lezcano
2023-08-29 05:54:24 +00:00
ssjia
86f9fec3ac
Avoid decomposing _unsafe_index in Inductor ( #107882 )
...
`_unsafe_index` was previously added to the core ATen decomp table in https://github.com/pytorch/pytorch/pull/106814 , but this has performance ramifications for Inductor. Therefore, this diff removes it from the decomposition table used by Inductor.
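With the helper in `torch/_inductor/decomposition.py`, the exclusion looks roughly like:
```
import torch
from torch._inductor.decomposition import decompositions, remove_decompositions

aten = torch.ops.aten

# Keep the core decomposition, but drop it from Inductor's own table.
remove_decompositions(decompositions, [aten._unsafe_index])
```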
Differential Revision: [D48649210](https://our.internmc.facebook.com/intern/diff/D48649210/ )
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107882
Approved by: https://github.com/SherlockNoMad
2023-08-25 04:51:53 +00:00
lezcano
2c5f96deac
[Inductor] Make softshrink composite implicit ( #107052 )
...
The backward is pretty much equivalent to the one we had written
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107052
Approved by: https://github.com/peterbell10
ghstack dependencies: #107038 , #107039 , #107051
2023-08-14 21:01:50 +00:00
lezcano
3b1254e800
Make hardshrink's decomp composite implicit ( #107039 )
...
The generated code is the same
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107039
Approved by: https://github.com/peterbell10
ghstack dependencies: #107038
2023-08-14 21:01:50 +00:00
Sam Larsen
e165938853
Implement decomposition for aten.rrelu_with_noise ( #106812 )
...
Test Plan:
* Primarily, added new test in test/test_decomp.py
* Updated existing tests, e.g., to NOT expect failure
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106812
Approved by: https://github.com/eellison
2023-08-11 19:18:29 +00:00
Stephen Jia
8c8477e55a
Add _unsafe_index decomp ( #106814 )
...
Summary:
Redirect `aten._unsafe_index` to `aten.index` through a decomposition.
Also add it to the list of core decompositions.
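The decomposition is essentially a one-liner (sketch):
```
import torch
from torch._decomp import register_decomposition

aten = torch.ops.aten

@register_decomposition(aten._unsafe_index)
def _unsafe_index(x, indices):
    # "Unsafe" only skips bounds checking; plain aten.index is a
    # behavioral superset, so redirecting is safe.
    return aten.index(x, indices)
```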
Test Plan: contbuild and OSS CI (similar to D40075277)
Differential Revision: D48163393
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106814
Approved by: https://github.com/SherlockNoMad
2023-08-10 23:23:37 +00:00
vfdev-5
35a1913370
[inductor] Added affine_grid_generator decomposition ( #104709 )
...
Description:
- Added affine_grid_generator decomposition
Related to https://github.com/pytorch/pytorch/issues/104296
Fixes https://github.com/pytorch/pytorch/issues/105565
Perfs:
- speed-up on CUDA with bilinear and nearest modes
```
Speed-up PR vs Nightly = ratio between columns "Compiled (2.1.0a0+git3ed904e) PR-afgg" and "Compiled (2.1.0a0+gitbcdd413) Nightly"
[------------------------------------------------------------------------------------------------------------------------------------ Affine grid sampling, cpu ------------------------------------------------------------------------------------------------------------------------------------]
| Eager (2.1.0a0+git1afae24) PR-afgg | Compiled (2.1.0a0+git1afae24) PR-afgg | Compiled (2.1.0a0+git16df542) Nightly | speed-up PR vs Nightly | Eager (2.1.0a0+git16df542) Nightly
1 threads: ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Input: (2, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=True, mode=bilinear | 7.467 (+-0.036) | 11.905 (+-0.276) | 13.391 (+-0.051) | 1.125 (+-0.000) | 7.343 (+-0.036)
Input: (2, 3, 345, 456) torch.float32, torch.channels_last, align_corners=True, mode=bilinear | 7.722 (+-0.168) | 14.371 (+-0.035) | 15.899 (+-0.038) | 1.106 (+-0.000) | 7.870 (+-0.043)
Input: (2, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=False, mode=bilinear | 7.710 (+-0.051) | 11.354 (+-0.053) | 13.376 (+-0.045) | 1.178 (+-0.000) | 7.698 (+-0.061)
Input: (2, 3, 345, 456) torch.float32, torch.channels_last, align_corners=False, mode=bilinear | 7.870 (+-0.050) | 13.744 (+-0.237) | 15.206 (+-0.102) | 1.106 (+-0.000) | 7.912 (+-0.039)
Input: (2, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=True, mode=nearest | 4.738 (+-0.015) | 4.508 (+-0.005) | 6.566 (+-0.027) | 1.456 (+-0.000) | 4.630 (+-0.022)
Input: (2, 3, 345, 456) torch.float32, torch.channels_last, align_corners=True, mode=nearest | 4.391 (+-0.010) | 4.860 (+-0.390) | 6.438 (+-0.047) | 1.325 (+-0.000) | 4.458 (+-0.010)
Input: (2, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=False, mode=nearest | 4.279 (+-0.008) | 4.127 (+-0.010) | 6.598 (+-0.709) | 1.599 (+-0.000) | 5.064 (+-0.025)
Input: (2, 3, 345, 456) torch.float32, torch.channels_last, align_corners=False, mode=nearest | 4.537 (+-0.010) | 4.593 (+-0.006) | 6.365 (+-0.104) | 1.386 (+-0.000) | 4.480 (+-0.011)
Input: (2, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=True, mode=bicubic | 26.411 (+-0.066) | 62.275 (+-0.436) | 64.486 (+-0.353) | 1.035 (+-0.000) | 26.210 (+-0.110)
Input: (2, 3, 345, 456) torch.float32, torch.channels_last, align_corners=True, mode=bicubic | 26.457 (+-0.096) | 72.887 (+-0.247) | 74.207 (+-0.337) | 1.018 (+-0.000) | 25.995 (+-0.120)
Input: (2, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=False, mode=bicubic | 26.457 (+-0.086) | 64.110 (+-0.233) | 66.340 (+-0.406) | 1.035 (+-0.000) | 26.145 (+-0.085)
Input: (2, 3, 345, 456) torch.float32, torch.channels_last, align_corners=False, mode=bicubic | 26.536 (+-0.094) | 73.742 (+-0.483) | 71.946 (+-0.460) | 0.976 (+-0.000) | 26.457 (+-0.166)
Times are in milliseconds (ms).
[------------------------------------------------------------------------------------------------------------------------------------ Affine grid sampling, cuda -----------------------------------------------------------------------------------------------------------------------------------]
| Eager (2.1.0a0+git1afae24) PR-afgg | Compiled (2.1.0a0+git1afae24) PR-afgg | Compiled (2.1.0a0+git16df542) Nightly | speed-up PR vs Nightly | Eager (2.1.0a0+git16df542) Nightly
1 threads: ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Input: (2, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=True, mode=bilinear | 91.971 (+-0.253) | 90.570 (+-0.193) | 137.206 (+-0.214) | 1.515 (+-0.000) | 84.280 (+-0.241)
Input: (2, 3, 345, 456) torch.float32, torch.channels_last, align_corners=True, mode=bilinear | 91.893 (+-0.361) | 89.866 (+-0.170) | 136.678 (+-0.471) | 1.521 (+-0.000) | 84.573 (+-0.214)
Input: (2, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=False, mode=bilinear | 116.967 (+-0.481) | 110.468 (+-0.326) | 223.770 (+-0.334) | 2.026 (+-0.000) | 108.098 (+-0.392)
Input: (2, 3, 345, 456) torch.float32, torch.channels_last, align_corners=False, mode=bilinear | 117.563 (+-0.546) | 111.438 (+-0.212) | 223.101 (+-0.350) | 2.002 (+-0.000) | 108.225 (+-0.395)
Input: (2, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=True, mode=nearest | 80.706 (+-0.289) | 70.525 (+-0.204) | 143.697 (+-0.311) | 2.038 (+-0.000) | 74.485 (+-0.258)
Input: (2, 3, 345, 456) torch.float32, torch.channels_last, align_corners=True, mode=nearest | 80.955 (+-0.208) | 69.986 (+-0.250) | 143.658 (+-0.244) | 2.053 (+-0.000) | 74.163 (+-0.238)
Input: (2, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=False, mode=nearest | 117.576 (+-0.435) | 71.179 (+-0.412) | 178.515 (+-0.539) | 2.508 (+-0.000) | 108.394 (+-0.473)
Input: (2, 3, 345, 456) torch.float32, torch.channels_last, align_corners=False, mode=nearest | 117.441 (+-0.205) | 70.313 (+-0.170) | 178.664 (+-0.555) | 2.541 (+-0.000) | 108.098 (+-0.416)
Input: (2, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=True, mode=bicubic | 92.962 (+-0.509) | 1740.964 (+-0.597) | 1785.401 (+-0.369) | 1.026 (+-0.000) | 92.638 (+-0.539)
Input: (2, 3, 345, 456) torch.float32, torch.channels_last, align_corners=True, mode=bicubic | 92.928 (+-0.493) | 1401.146 (+-0.732) | 1453.229 (+-0.628) | 1.037 (+-0.000) | 92.458 (+-0.428)
Input: (2, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=False, mode=bicubic | 118.152 (+-0.442) | 1740.644 (+-0.480) | 1793.475 (+-0.458) | 1.030 (+-0.000) | 107.962 (+-0.548)
Input: (2, 3, 345, 456) torch.float32, torch.channels_last, align_corners=False, mode=bicubic | 118.182 (+-0.425) | 1400.621 (+-0.624) | 1461.796 (+-0.630) | 1.044 (+-0.000) | 107.894 (+-0.994)
Times are in microseconds (us).
```
[Source](https://raw.githubusercontent.com/vfdev-5/pth-inductor-dev/master/output/20230801-220216-affine-grid-sampler-PR-afgg-vs-Nightly-speedup.md ), [script](https://github.com/vfdev-5/pth-inductor-dev/blob/master/perf_affine_grid_sampler.py )
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104709
Approved by: https://github.com/lezcano
2023-08-10 09:52:48 +00:00
Nikita Karetnikov
45e4706aff
[pt2] add decomps for multilabel_margin_loss_forward ops ( #105302 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105302
Approved by: https://github.com/ezyang
2023-07-23 02:16:29 +00:00
angelayi
fed8d3608d
Update core aten decomp table ( #105673 )
...
Updated the decomposition table based on the existing [Core ATen IR](https://pytorch.org/docs/stable/ir.html ) list, and moved the rest of the decompositions to Inductor's decomposition table.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105673
Approved by: https://github.com/SherlockNoMad
2023-07-21 02:45:37 +00:00
Peter Bell
9adfaf8807
[inductor] Add lowering for aten.unfold ( #105165 )
...
The decomposition for unfold uses `as_strided`, which forces the input to be
realized. Instead, this implements it as a `GenericView` with reindexing,
which removes the need to realize, though it does call `mark_reuse` in case
the input computation is expensive and the windows overlap.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105165
Approved by: https://github.com/lezcano , https://github.com/jansel
2023-07-16 13:09:23 +00:00
William Wen
5cd861fcf7
Add empty/empty_like to core aten decomps ( #105158 )
...
Fixes https://github.com/pytorch/pytorch/issues/104871
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105158
Approved by: https://github.com/SherlockNoMad
2023-07-15 18:48:55 +00:00
Nikita Karetnikov
7e72126487
[pt2] add decomps for multi_margin_loss ops ( #104578 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104578
Approved by: https://github.com/ezyang , https://github.com/lezcano
2023-07-14 21:16:09 +00:00
Peter Bell
5c580a9846
[decomp] Add test tracking core ATen operators ( #104262 )
...
This adds an expect-test that finds the set of core ATen operators by
subtracting the operators with a decomposition in core_aten_decompositions from
the set of all operators that have decompositions and could be decomposed.
This is useful because if you add a new decomposition but forget to add it to
the list of core decompositions, it will appear in the PR diff.
Also, by going through this list I have identified some operators where the
functional variant is decomposed but not the inplace variant, which must be an
oversight.
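The set arithmetic is essentially the following (a sketch against the public `torch._decomp` globals):
```
import torch._decomp as decomp

all_decomposed = set(decomp.decomposition_table)        # everything we can decompose
core_decomposed = set(decomp.core_aten_decompositions())
core_aten_ops = all_decomposed - core_decomposed        # what the expect-test tracks
```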
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104262
Approved by: https://github.com/lezcano
2023-07-04 16:41:44 +00:00
David Berard
0b62aca726
Don't decompose aten.bucketize ( #104396 )
...
torch.bucketize takes a tensor of values, and a "boundaries" tensor, which is a sorted list of values that represent buckets. It returns the bucket that each value lies in. E.g. if values = [1, 5, 3, 6] and boundaries=[0, 2, 4, 6, 8], the output will be [1, 3, 2, 4].
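A runnable version of that example (note the boundary tie at value 6: the quoted output corresponds to `right=True`, while the default `right=False` puts 6 in bucket 3):
```
import torch

values = torch.tensor([1, 5, 3, 6])
boundaries = torch.tensor([0, 2, 4, 6, 8])
torch.bucketize(values, boundaries, right=True)   # tensor([1, 3, 2, 4])
torch.bucketize(values, boundaries)               # tensor([1, 3, 2, 3])
```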
The current decomposition of this op doesn't work well with dynamic shapes. It performs a binary search, which bakes in the number of iterations of the binary search and requires recompiling (I don't completely understand why/where this happens). I can't think of a good way to write a decomposition for this op that will work with dynamic shapes.
Use case: this op is very similar to some operations needed by jagged tensors. As a first step, I want to add a lowering for aten.bucketize and make use of opinfos. #104007
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104396
Approved by: https://github.com/Chillee
2023-06-30 05:05:08 +00:00
Peter Bell
8b418f197c
[decomp] Add decomposition for torch.renorm ( #103858 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103858
Approved by: https://github.com/ezyang , https://github.com/nkaretnikov
2023-06-21 20:57:43 +00:00
Peter Bell
591981c5e2
[inductor] Lower diagonal, diagonal_copy and diagonal_scatter ( #103755 )
...
Currently these are decomposed into `as_strided`, which forces a buffer to be
realized. Instead, this lowers them into a native inductor view node and so
doesn't require any buffers to be realized.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103755
Approved by: https://github.com/jansel
2023-06-21 20:16:24 +00:00
Peter Bell
a61096fb94
[decomp] Decompose logaddexp2 ( #103765 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103765
Approved by: https://github.com/Chillee
2023-06-21 20:16:24 +00:00
PyTorch MergeBot
7b6dc72ffa
Revert "[decomp] Decompose logaddexp2 ( #103765 )"
...
This reverts commit bab21d20eb .
Reverted https://github.com/pytorch/pytorch/pull/103765 on behalf of https://github.com/ezyang due to looks like land race ([comment](https://github.com/pytorch/pytorch/pull/103765#issuecomment-1599030496 ))
2023-06-20 15:35:02 +00:00
Peter Bell
bab21d20eb
[decomp] Decompose logaddexp2 ( #103765 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103765
Approved by: https://github.com/Chillee
2023-06-20 09:24:21 +00:00
Nikita Karetnikov
c3ea8cc58b
[pt2] convert out params in register_meta ( #101344 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101344
Approved by: https://github.com/lezcano
2023-05-27 18:38:52 +00:00
vfdev-5
e3d97b6213
[inductor] Added smooth_l1_loss refs ( #102077 )
...
Added `smooth_l1_loss` to refs + tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102077
Approved by: https://github.com/lezcano , https://github.com/ngimel
2023-05-24 15:07:08 +00:00
Bin Bao
b66d7007d8
Add aten.smooth_l1_loss_backward to core_aten_decompositions ( #100267 )
...
Summary: https://github.com/pytorch/pytorch/pull/100242 didn't cover all
test failures
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100267
Approved by: https://github.com/jansel
2023-04-28 19:32:17 +00:00
Angela Yi
d06b93b0c7
Decompose arange.default to arange.start_step ( #99739 )
...
The aten op arange.default is not in the core aten IR, and should decompose into the arange.start_step op.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99739
Approved by: https://github.com/SherlockNoMad
2023-04-27 19:06:36 +00:00