PyTorch MergeBot
98c329b19e
Revert "[core ATen IR] Add decompositions for max, min, var_mean ( #110906 )"
...
This reverts commit 9606cda64e .
Reverted https://github.com/pytorch/pytorch/pull/110906 on behalf of https://github.com/SS-JIA due to Breaks internal CI ([comment](https://github.com/pytorch/pytorch/pull/110906#issuecomment-1757490740 ))
2023-10-11 11:41:21 +00:00
SS-JIA
9606cda64e
[core ATen IR] Add decompositions for max, min, var_mean ( #110906 )
...
## Context
Add decompositions for `aten.max`, `aten.min`, and `aten.var_mean`. These operators follow a pattern of returning a tuple of outputs from two component operators:
```
aten.max(x) -> return aten.amax(x), aten.argmax(x)
aten.min(x) -> return aten.amin(x), aten.argmin(x)
aten.var_mean(x) -> return aten.var(x), aten.mean(x)
```
For `var_mean`, the `refs` implementation was doing something similar, so I changed it to call `torch.` ops instead, as was done previously for other `refs` implementations. cc: @peterbell10 @lezcano
Note that Inductor lowers all these directly, so they are excluded from the Inductor decomp table.
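For illustration, the pattern looks roughly like the sketch below with the usual `register_decomposition` machinery, taking the `.dim` overload of `max` as the example (the actual registrations live in `torch/_decomp/decompositions.py`):
```
import torch
from torch._decomp import register_decomposition

aten = torch.ops.aten

@register_decomposition(aten.max.dim)
def max_dim(x, dim, keepdim=False):
    # aten.max.dim returns (values, indices); build it from the two
    # single-output component ops.
    return aten.amax(x, dim, keepdim), aten.argmax(x, dim, keepdim)
```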
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110906
Approved by: https://github.com/manuelcandales
2023-10-11 00:06:24 +00:00
Stephen Jia
c2e7a0d689
[core IR] Add decomps for aten.sum and aten.squeeze variants ( #110645 )
...
Summary:
## Context
Both `aten.sum` and `aten.squeeze` have a "most generic" variant, `aten.sum.dim_IntList` and `aten.squeeze.dims` respectively. Add decompositions for the other, non-generic variants of these operators that express them through the most generic variant.
Note that to register these decomps, the reference implementations under `_refs` had to be removed from the registered decompositions. cc: @lezcano @peterbell10
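A sketch of what routing a non-generic variant through the generic one can look like (overload names illustrative; the real decomps live in `torch/_decomp`):
```
import torch
from torch._decomp import register_decomposition

aten = torch.ops.aten

@register_decomposition(aten.sum.default)
def sum_default(x, *, dtype=None):
    # Reduce over every dimension via the generic dim_IntList overload.
    return aten.sum.dim_IntList(x, list(range(x.dim())), False, dtype=dtype)
```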
Test Plan: Github CI + Meta Internal CI
Differential Revision: D49965952
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110645
Approved by: https://github.com/peterbell10 , https://github.com/digantdesai , https://github.com/manuelcandales
2023-10-07 04:21:51 +00:00
Stephen Jia
ff96f6d04f
[core IR][reland] Add split.Tensor and unbind decompositions to core ATen decomp table ( #110323 )
...
Summary:
This is a reland of [GitHub PR #110102 ]( https://github.com/pytorch/pytorch/pull/110102 ).
The original PR had to be unlanded due to internal CI failures. This diff applies some small fixes to the failing tests to adjust to the new decompositions.
Note that `lift_fresh` will not be decomposed for now, since it was found that [constant propagation looks specifically for `lift_fresh`](13af952f94/torch/fx/experimental/proxy_tensor.py (L381-L386) ). Therefore decomposing `lift_fresh` would interfere with constant propagation during export.
Test Plan: Github CI and internal CI
Differential Revision: D49761321
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110323
Approved by: https://github.com/jansel
2023-10-03 14:35:04 +00:00
PyTorch MergeBot
e0b035c220
Revert "[core IR] Add lift_fresh, split.Tensor, and unbind decompositions to core ATen decomp table ( #110102 )"
...
This reverts commit 22e706f768 .
Reverted https://github.com/pytorch/pytorch/pull/110102 on behalf of https://github.com/atalman due to Breaks internal CI ([comment](https://github.com/pytorch/pytorch/pull/110102#issuecomment-1739856671 ))
2023-09-28 19:03:25 +00:00
SS-JIA
22e706f768
[core IR] Add lift_fresh, split.Tensor, and unbind decompositions to core ATen decomp table ( #110102 )
...
## Context
Add existing decomps for `lift_fresh`, `split.Tensor`, and `unbind` to the core ATen decomposition table. Do not use them in Inductor, since Inductor currently lowers these directly.
One note, though: `lift_fresh`'s decomposition has a note saying it's not correct under autograd. However, my understanding is that these decompositions are registered to the `"post_autograd"` decomposition table, meaning autograd wouldn't be a factor. Would like some confirmation that this premise is correct.
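For context, a sketch of how the table is consumed downstream, using the public `torch._decomp` and `make_fx` APIs:
```
import torch
from torch._decomp import core_aten_decompositions
from torch.fx.experimental.proxy_tensor import make_fx

def f(x):
    return torch.unbind(x, dim=0)

# Tracing with the core table applies the unbind decomposition.
gm = make_fx(f, decomposition_table=core_aten_decompositions())(torch.randn(3, 4))
gm.print_readable()
```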
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110102
Approved by: https://github.com/jansel
2023-09-28 01:21:45 +00:00
SS-JIA
dec140f1ea
[core IR] Add a core decomposition for aten.all ( #110093 )
...
## Context
Change the ref implementation of `aten.all` to only use other `torch` operators such that we can use it for the core ATen decomposition table. This will replace the decomposition for `aten.all` that was used specifically by Inductor.
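The ref boils down to the De Morgan formulation below (a sketch; the real ref also handles the `dim`/`keepdim` overloads):
```
import torch

def all_decomp(x):
    # all(x) == not any(not x), expressed with torch ops only
    return torch.logical_not(torch.any(torch.logical_not(x)))
```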
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110093
Approved by: https://github.com/manuelcandales , https://github.com/peterbell10 , https://github.com/lezcano
2023-09-27 01:31:41 +00:00
SS-JIA
9928c10e71
[core IR] Add glu as a core decomposition ( #110043 )
...
## Context
Add the decomposition for `aten.glu` as a decomposition in the core ATen decomposition table. Don't use it in the Inductor decomposition table since Inductor has a lowering for it.
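The decomposition amounts to the standard GLU definition (a sketch; the registered version also validates that the size along `dim` is even):
```
import torch

def glu_decomp(x, dim=-1):
    # Split the input in half along `dim`; gate one half with the other.
    a, b = torch.chunk(x, 2, dim=dim)
    return a * torch.sigmoid(b)
```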
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110043
Approved by: https://github.com/peterbell10 , https://github.com/lezcano
ghstack dependencies: #110046
2023-09-27 00:23:05 +00:00
SS-JIA
5df8aca994
[core IR] Add a core decomposition for floor_divide ( #110046 )
...
## Context
Introduce a core decomposition for `aten.floor_divide` into other `aten` ops, and add it to the core ATen decomposition table.
This replaces the decomposition of `floor_divide` that was used by Inductor. I noticed there was a note on that decomposition
```
# TorchInductor-only decomposition. It should not be taken to core.
# See https://github.com/pytorch/torchdynamo/pull/1120
```
but couldn't discern the reason why this is the case. cc: @lezcano
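The decomposition itself is small, along these lines (a sketch; dtype-promotion details omitted):
```
import torch
from torch._decomp import register_decomposition

aten = torch.ops.aten

@register_decomposition(aten.floor_divide)
def floor_divide(a, b):
    # Floor division is just div with floor rounding.
    return aten.div.Tensor_mode(a, b, rounding_mode="floor")
```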
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110046
Approved by: https://github.com/peterbell10
2023-09-26 08:39:21 +00:00
Mwiza Kunda
5c4b5baf21
Fix python decomps for OpOverloadPackets and add tests ( #107707 )
...
- Extend `test_torch_dispatch_meta_outplace` to test torch ops that do not have an out parameter but have aten op overloads that do. Additionally, Python decompositions may register `OpOverloadPacket`s, so decompositions need to be tested to ensure all `OpOverload`s still function for the `Meta` key (e.g. if a python decomposition is registered for an aten op `aten.foo` with overloads `[default, out]`, the python function needs to support receiving out arguments; see the sketch after this list)
- Add out parameter wrappers to python decomps for aten ops that have out overloads
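A sketch of the wrapper pattern, reusing the hypothetical `aten.foo` from the example above (`out_wrapper` is real and lives in `torch._prims_common.wrappers`; `aten.foo` is not an actual op):
```
import torch
from torch._decomp import register_decomposition
from torch._prims_common.wrappers import out_wrapper

aten = torch.ops.aten

# Hypothetical op with [default, out] overloads, as in the example above.
@register_decomposition([aten.foo.default, aten.foo.out])
@out_wrapper()
def foo(x):
    # out_wrapper adds and handles the out= argument for us.
    return x * 2
```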
CC. @ezyang @albanD @lezcano
Fixes #107713
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107707
Approved by: https://github.com/lezcano
2023-09-25 20:53:30 +00:00
SS-JIA
7de669f2f9
[core IR] Remove trunc decomp and add trunc to core ( #109902 )
...
Following up from [this comment](https://github.com/pytorch/pytorch/pull/109319#discussion_r1330803226 ). Remove the decomposition for `trunc`, and add it as a core operator.
Going forward, provide similar treatment for operators that map cleanly to hardware instructions.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109902
Approved by: https://github.com/peterbell10
2023-09-25 18:18:06 +00:00
Mwiza Kunda
6b7b9c796e
Fix registering jit decompositions for jvp for out wrapped decomps ( #109367 )
...
Python decompositions wrapped by `out_wrapper` need to be unwrapped before compiling with TorchScript since:
- `out_wrapper` extends the decomposition's signature with an out parameter; however, this `out` parameter is not present in the source code of the original decomposition, so the resulting `ScriptFunction` would not have an `out` parameter
- `out_wrapper` lives in the `torch._prims_common.wrappers` module, so its `globals()` differ from the globals of the decomposition being wrapped. This can cause symbol resolution to fail in the TorchScript compiler, since it compiles the unwrapped decomp's source code rather than the wrapper
The python decomposition for `aten.trace` is wrapped as an example (see the unwrapping sketch below); other decompositions are to be fixed in https://github.com/pytorch/pytorch/pull/107707
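A sketch of the unwrapping step, assuming the wrapper exposes the original function via `__wrapped__` (as `functools.wraps` would set):
```
import torch

def script_decomp(decomp_fn):
    # Compile the unwrapped decomposition rather than the out_wrapper
    # closure, so TorchScript sees the original signature and globals.
    fn = getattr(decomp_fn, "__wrapped__", decomp_fn)
    return torch.jit.script(fn)
```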
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109367
Approved by: https://github.com/lezcano
2023-09-21 16:36:51 +00:00
Peter Bell
6f0cf5a837
[decomp] Decompose unsafe_split{,_with_sizes} into safe variants ( #109668 )
...
The "safety" aspect refers to the output not being registered as aliasing the
input, but after AOTAutograd I don't think this distinction matters. However,
we shouldn't use the same decomposition as the safe variant in case the backend
doesn't want to decompose split.
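Concretely, the new decompositions are thin forwards to the safe ops, roughly:
```
import torch
from torch._decomp import register_decomposition

aten = torch.ops.aten

@register_decomposition(aten.unsafe_split.Tensor)
def unsafe_split(input, split_size, dim=0):
    return aten.split.Tensor(input, split_size, dim)

@register_decomposition(aten.unsafe_split_with_sizes.default)
def unsafe_split_with_sizes(input, split_sizes, dim=0):
    return aten.split_with_sizes.default(input, split_sizes, dim)
```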
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109668
Approved by: https://github.com/lezcano
ghstack dependencies: #109667
2023-09-20 18:45:56 +00:00
Peter Bell
9e629dd73c
[decomp] Add all std and std_mean overloads to core decompositions ( #109667 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109667
Approved by: https://github.com/lezcano
2023-09-20 18:45:56 +00:00
Salil Desai
40b2c796dc
[Decomposition] baddbmm ( #108534 )
...
Summary:
Move the decomposition of baddbmm from _inductor/decomposition.py and include it in core_aten_decompositions
ff38c0e2f9/torch/_inductor/decomposition.py (L203)
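The decomposition being moved looks roughly like this (a sketch; the real one also special-cases some dtype and scalar combinations):
```
import torch
from torch._decomp import register_decomposition

aten = torch.ops.aten

@register_decomposition(aten.baddbmm)
def baddbmm(self, batch1, batch2, beta=1, alpha=1):
    result = torch.bmm(batch1, batch2)
    if alpha != 1:
        result = result * alpha
    if beta == 0:
        return result
    if beta != 1:
        self = self * beta
    return self + result
```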
Test Plan: Phabricator + OSS Tests
Differential Revision: D48871741
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108534
Approved by: https://github.com/SherlockNoMad
2023-09-20 12:49:32 +00:00
Salil Desai
d0cc623192
[Decomposition] _unsafe_view ( #108713 )
...
Summary:
Decomp already exists, so just add it to core_aten_decompositions
https://www.internalfb.com/code/fbsource/[9d5eabd7b213d1a356d4e7bb400355d574ea924b]/fbcode/caffe2/torch/_decomp/decompositions.py?lines=3091
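For illustration, "adding it to core_aten_decompositions" means listing the op in the table builder in `torch/_decomp/__init__.py`, roughly:
```
import torch
from torch._decomp import get_decompositions

aten = torch.ops.aten

def core_aten_decompositions():
    return get_decompositions([
        # ... many other ops elided ...
        aten._unsafe_view,
    ])
```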
Differential Revision: D48619079
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108713
Approved by: https://github.com/larryliu0820 , https://github.com/SherlockNoMad
2023-09-19 13:37:35 +00:00
Salil Desai
2e721aab98
[Decomposition] Trunc ( #109319 )
...
Summary:
Add Decomp for Trunc and add it to core_aten_decompositions
Differential Revision: D49042033
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109319
Approved by: https://github.com/SherlockNoMad
2023-09-19 13:30:13 +00:00
Salil Desai
ae66d0b3bf
[Decomposition] clamp_max ( #108718 )
...
Summary:
Decomp already exists, so just add it to core_aten_decompositions
https://www.internalfb.com/code/fbsource/[abda43a5a268e83fef6d62b49531a390ce915ad2]/fbcode/caffe2/torch/_refs/__init__.py?lines=1855
Differential Revision: D48880026
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108718
Approved by: https://github.com/SherlockNoMad
2023-09-19 13:25:35 +00:00
Salil Desai
fc47ba2794
[Decomposition] clamp_min ( #108717 )
...
Summary:
Decomp already exists, so just add it to core_aten_decompositions
https://www.internalfb.com/code/fbsource/[abda43a5a268e83fef6d62b49531a390ce915ad2]/fbcode/caffe2/torch/_refs/__init__.py?lines=1846
Differential Revision: D48880080
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108717
Approved by: https://github.com/SherlockNoMad
2023-09-18 12:43:58 +00:00
Salil Desai
a6d4cca7c0
[Decomposition] unsafe_split.Tensor ( #108544 )
...
Summary:
Include decomp in core_aten_decompositions
Decomp already exists
https://www.internalfb.com/code/fbsource/[03ff511cad587fc27ed8fd6a54b87845246e8e0c]/fbcode/caffe2/torch/_decomp/decompositions.py?lines=1209
Test Plan: OSS + Phabricator Tests
Differential Revision: D48940445
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108544
Approved by: https://github.com/larryliu0820 , https://github.com/SherlockNoMad
2023-09-18 12:43:07 +00:00
Salil Desai
af93b29c5e
[Decomposition] std.correction ( #108733 )
...
Summary:
Include decomp in core_aten_decompositions
Decomp:
https://www.internalfb.com/code/fbsource/[e69bf00ff87a55c9a30bd7905881661ff05fa211]/fbcode/caffe2/torch/_refs/__init__.py?lines=2398
Differential Revision: D48940402
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108733
Approved by: https://github.com/larryliu0820 , https://github.com/SherlockNoMad
2023-09-18 11:38:23 +00:00
Ken Jin
c458fa0d35
Decompose/add reference for view_as_complex ( #108005 )
...
Aten source: d4a99631dd/aten/src/ATen/native/ComplexHelper.h (L78)
Documentation reference:
https://pytorch.org/docs/stable/generated/torch.view_as_complex.html
Note: this adds a new primitive `view_of_dtype`, which is trivially implemented, as its meta function is already implemented elsewhere.
Finally, this is not registered as a decomposition (yet), because TorchInductor does not yet support complex types. It should be added once we do.
Closes https://github.com/pytorch/pytorch/issues/108020 as well.
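Behaviorally, the ref matches the sketch below (hedged; the real ref goes through the new `view_of_dtype` prim and does proper dtype/stride checking):
```
import torch

def view_as_complex_sketch(x):
    # (..., 2) float pairs reinterpreted as complex; the dtype view
    # halves the last dimension, which we then squeeze away.
    complex_dtype = {torch.float32: torch.complex64,
                     torch.float64: torch.complex128}[x.dtype]
    return x.view(complex_dtype).squeeze(-1)

x = torch.randn(4, 2)
assert torch.equal(view_as_complex_sketch(x), torch.view_as_complex(x))
```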
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108005
Approved by: https://github.com/peterbell10 , https://github.com/ezyang
2023-09-07 23:49:20 +00:00
Sam Larsen
27fe45eaf6
[inductor][easy] Enable Mypy Checking for torch/_inductor/decomposition.py ( #108682 )
...
Summary: Looks like one simple type mismatch between `get_decompositions()` and `remove_decompositions()`
Test Plan: `lintrunner torch/_inductor/decomposition.py`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108682
Approved by: https://github.com/eellison
2023-09-07 00:48:55 +00:00
lezcano
239ee76177
Add refs/decomps for dot/vdot ( #108194 )
...
Follow-up on https://github.com/pytorch/pytorch/issues/108127#issuecomment-1698142427
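The refs reduce to elementwise multiply-and-sum (a sketch; the real refs add 1-D shape and dtype checks):
```
import torch

def dot(a, b):
    return (a * b).sum()

def vdot(a, b):
    # vdot conjugates its first argument
    return (a.conj() * b).sum()
```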
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108194
Approved by: https://github.com/peterbell10
ghstack dependencies: #108188
2023-08-31 15:30:23 +00:00
rzou
0e4752bafc
Allow registering decomps for HigherOrderOp; add decomp for out_dtype ( #108080 )
...
We allow registering decomps for HigherOrderOp via the existing decomp
mechanisms:
- I refactored those APIs to accept torch._ops.OperatorBase, which is the base
class for torch.ops.HigherOrderOperator and torch.ops.OpOverload
- HigherOrderOps must directly call maybe_handle_decomp in their
ProxyTorchDispatchMode handling in order to resolve decompositions. We
can change this in the future so that they do not need to do this.
Next, we add an inductor decomp for out_dtype. This decomp shouldn't be
generally available because we want to preserve out_dtype to the backend
for other use cases (i.e. executorch).
Test Plan:
- new tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108080
Approved by: https://github.com/HDCharles
2023-08-31 03:15:38 +00:00
chilli
39130c7433
Add reinplacing pass for scatters + incremental fake tensor updating ( #106192 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106192
Approved by: https://github.com/jansel , https://github.com/eellison
2023-08-30 20:41:37 +00:00
Mengwei Liu
0fb1c05c5a
[pytorch] Add decomp rule for scaled_dot_product_attention ( #108180 )
...
`scaled_dot_product_attention` used to be decomposed in pre-autograd, given that it calls `_scaled_dot_product_attention_math` and `_scaled_dot_product_attention_math` only has a `CompositeImplicitAutograd` kernel. As a result, it's decomposed into ops with finer granularity.
However, recent PRs (#103826, #105131) added new logic in `scaled_dot_product_attention`, and now it calls `_scaled_dot_product_flash_attention`, which contains a CPU kernel. This results in `_scaled_dot_product_flash_attention` showing up in `torch.export()`. This PR adds a decomposition that ensures `scaled_dot_product_attention` is still decomposed the same way as before, i.e., going through `_scaled_dot_product_attention_math`. Note that this decomp rule should be excluded by Inductor.
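A sketch of the shape of such a decomp (argument list trimmed; the math op also returns the attention weights, which are dropped here):
```
import torch
from torch._decomp import register_decomposition

aten = torch.ops.aten

@register_decomposition(aten.scaled_dot_product_attention)
def sdpa(query, key, value, attn_mask=None, dropout_p=0.0, is_causal=False):
    out, _attn = aten._scaled_dot_product_attention_math(
        query, key, value, attn_mask, dropout_p, is_causal
    )
    return out
```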
Differential Revision: [D48762000](https://our.internmc.facebook.com/intern/diff/D48762000/ )
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108180
Approved by: https://github.com/SherlockNoMad
2023-08-30 15:52:08 +00:00
vfdev-5
0cfc5899f9
[inductor] Improved grid_sampler_2d decomposition for cuda ( #104710 )
...
Description:
- Improved the grid_sampler_2d decomposition code to generate a single CUDA kernel instead of two
Related to https://github.com/pytorch/pytorch/issues/104296
Perfs:
- speed-up on CUDA (~5x) and CPU (~2x) for bicubic mode
```
Speed-up PR vs Nightly = ratio between columns "Compiled (2.1.0a0+git52598e9) PR" and "Compiled (2.1.0a0+gitcf76938) Nightly"
[------------------------------------------------------------------------------------------------------------------------------- Affine grid sampling, cpu -------------------------------------------------------------------------------------------------------------------------------]
| Eager (2.1.0a0+git52598e9) PR | Compiled (2.1.0a0+git52598e9) PR | Compiled (2.1.0a0+gitcf76938) Nightly | speed-up PR vs Nightly | Eager (2.1.0a0+gitcf76938) Nightly
1 threads: --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Input: (8, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=True, mode=bilinear | 38.010 (+-0.118) | 51.466 (+-1.257) | 47.867 (+-0.124) | 0.930 (+-0.000) | 33.654 (+-0.411)
Input: (8, 3, 345, 456) torch.float32, torch.channels_last, align_corners=True, mode=bilinear | 35.532 (+-0.236) | 52.189 (+-0.093) | 58.979 (+-0.206) | 1.130 (+-0.000) | 32.543 (+-0.198)
Input: (8, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=False, mode=bilinear | 38.187 (+-0.112) | 47.892 (+-0.117) | 45.833 (+-0.081) | 0.957 (+-0.000) | 33.752 (+-0.116)
Input: (8, 3, 345, 456) torch.float32, torch.channels_last, align_corners=False, mode=bilinear | 36.708 (+-0.244) | 51.680 (+-0.104) | 58.360 (+-0.108) | 1.129 (+-0.000) | 32.576 (+-0.751)
Input: (8, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=True, mode=nearest | 24.201 (+-0.088) | 27.451 (+-0.059) | 27.937 (+-0.081) | 1.018 (+-0.000) | 24.367 (+-0.074)
Input: (8, 3, 345, 456) torch.float32, torch.channels_last, align_corners=True, mode=nearest | 19.266 (+-0.105) | 26.070 (+-0.085) | 26.092 (+-0.054) | 1.001 (+-0.000) | 20.144 (+-0.064)
Input: (8, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=False, mode=nearest | 24.293 (+-0.125) | 26.085 (+-0.064) | 26.575 (+-0.061) | 1.019 (+-0.000) | 24.515 (+-0.095)
Input: (8, 3, 345, 456) torch.float32, torch.channels_last, align_corners=False, mode=nearest | 19.440 (+-0.075) | 25.252 (+-0.059) | 25.259 (+-0.051) | 1.000 (+-0.000) | 19.770 (+-0.070)
Input: (8, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=True, mode=bicubic | 114.900 (+-0.508) | 113.416 (+-1.271) | 248.679 (+-1.431) | 2.193 (+-0.000) | 114.609 (+-0.515)
Input: (8, 3, 345, 456) torch.float32, torch.channels_last, align_corners=True, mode=bicubic | 115.973 (+-0.555) | 124.711 (+-1.596) | 282.187 (+-2.418) | 2.263 (+-0.000) | 115.368 (+-0.652)
Input: (8, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=False, mode=bicubic | 111.730 (+-0.562) | 110.914 (+-0.865) | 253.899 (+-2.226) | 2.289 (+-0.000) | 111.285 (+-1.226)
Input: (8, 3, 345, 456) torch.float32, torch.channels_last, align_corners=False, mode=bicubic | 112.859 (+-0.487) | 131.696 (+-1.298) | 294.124 (+-1.963) | 2.233 (+-0.000) | 110.910 (+-0.969)
Times are in milliseconds (ms).
[------------------------------------------------------------------------------------------------------------------------------- Affine grid sampling, cuda ------------------------------------------------------------------------------------------------------------------------------]
| Eager (2.1.0a0+git52598e9) PR | Compiled (2.1.0a0+git52598e9) PR | Compiled (2.1.0a0+gitcf76938) Nightly | speed-up PR vs Nightly | Eager (2.1.0a0+gitcf76938) Nightly
1 threads: --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Input: (8, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=True, mode=bilinear | 228.811 (+-0.037) | 92.990 (+-0.446) | 92.648 (+-0.286) | 0.996 (+-0.000) | 228.274 (+-0.067)
Input: (8, 3, 345, 456) torch.float32, torch.channels_last, align_corners=True, mode=bilinear | 222.107 (+-0.076) | 93.247 (+-0.387) | 92.528 (+-0.423) | 0.992 (+-0.000) | 221.922 (+-0.297)
Input: (8, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=False, mode=bilinear | 235.654 (+-0.055) | 75.781 (+-0.566) | 115.865 (+-0.419) | 1.529 (+-0.000) | 236.032 (+-0.111)
Input: (8, 3, 345, 456) torch.float32, torch.channels_last, align_corners=False, mode=bilinear | 226.752 (+-0.088) | 76.312 (+-0.328) | 116.468 (+-0.477) | 1.526 (+-0.000) | 226.950 (+-0.027)
Input: (8, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=True, mode=nearest | 225.540 (+-0.013) | 75.638 (+-0.341) | 72.621 (+-0.292) | 0.960 (+-0.000) | 225.937 (+-0.017)
Input: (8, 3, 345, 456) torch.float32, torch.channels_last, align_corners=True, mode=nearest | 217.425 (+-0.024) | 75.484 (+-0.545) | 73.518 (+-0.296) | 0.974 (+-0.000) | 217.793 (+-0.008)
Input: (8, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=False, mode=nearest | 231.474 (+-0.020) | 75.972 (+-0.339) | 73.030 (+-0.387) | 0.961 (+-0.000) | 231.991 (+-0.184)
Input: (8, 3, 345, 456) torch.float32, torch.channels_last, align_corners=False, mode=nearest | 223.408 (+-0.016) | 75.622 (+-0.279) | 73.542 (+-0.336) | 0.973 (+-0.000) | 223.893 (+-0.021)
Input: (8, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=True, mode=bicubic | 319.382 (+-0.023) | 149.060 (+-0.190) | 772.116 (+-0.266) | 5.180 (+-0.000) | 320.549 (+-0.387)
Input: (8, 3, 345, 456) torch.float32, torch.channels_last, align_corners=True, mode=bicubic | 319.987 (+-0.134) | 154.443 (+-0.014) | 797.651 (+-0.232) | 5.165 (+-0.000) | 320.665 (+-0.397)
Input: (8, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=False, mode=bicubic | 326.138 (+-0.439) | 149.092 (+-0.036) | 772.508 (+-0.259) | 5.181 (+-0.000) | 325.751 (+-0.398)
Input: (8, 3, 345, 456) torch.float32, torch.channels_last, align_corners=False, mode=bicubic | 326.024 (+-0.118) | 154.452 (+-0.209) | 797.756 (+-0.229) | 5.165 (+-0.000) | 326.870 (+-0.372)
Times are in microseconds (us).
```
[Source](https://raw.githubusercontent.com/vfdev-5/pth-inductor-dev/master/output/20230828-134459-affine-grid-sampler-PR-vs-Nightly-speedup.md )
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104710
Approved by: https://github.com/lezcano
2023-08-29 05:54:24 +00:00
ssjia
86f9fec3ac
Avoid decomposing _unsafe_index in Inductor ( #107882 )
...
`_unsafe_index` was previously added to the core ATen decomp table in https://github.com/pytorch/pytorch/pull/106814 , but this has performance ramifications for Inductor. Therefore, this diff removes it from the decomposition table used by Inductor.
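With the helper in `torch/_inductor/decomposition.py`, the exclusion looks roughly like:
```
import torch
from torch._inductor.decomposition import decompositions, remove_decompositions

aten = torch.ops.aten

# Keep the core decomposition, but drop it from Inductor's own table.
remove_decompositions(decompositions, [aten._unsafe_index])
```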
Differential Revision: [D48649210](https://our.internmc.facebook.com/intern/diff/D48649210/ )
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107882
Approved by: https://github.com/SherlockNoMad
2023-08-25 04:51:53 +00:00
lezcano
2c5f96deac
[Inductor] Make softshrink composite implicit ( #107052 )
...
The backward is pretty much equivalent to the one we had written
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107052
Approved by: https://github.com/peterbell10
ghstack dependencies: #107038 , #107039 , #107051
2023-08-14 21:01:50 +00:00
lezcano
3b1254e800
Make hardshrink's decomp composite implicit ( #107039 )
...
The generated code is the same
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107039
Approved by: https://github.com/peterbell10
ghstack dependencies: #107038
2023-08-14 21:01:50 +00:00
Sam Larsen
e165938853
Implement decomposition for aten.rrelu_with_noise ( #106812 )
...
Test Plan:
* Primarily, added new test in test/test_decomp.py
* Updated existing tests, e.g., to NOT expect failure
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106812
Approved by: https://github.com/eellison
2023-08-11 19:18:29 +00:00
Stephen Jia
8c8477e55a
Add _unsafe_index decomp ( #106814 )
...
Summary:
Redirect `aten._unsafe_index` to `aten.index` through a decomposition.
Also add it to the list of core decompositions.
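The decomposition is essentially a one-liner (sketch):
```
import torch
from torch._decomp import register_decomposition

aten = torch.ops.aten

@register_decomposition(aten._unsafe_index)
def _unsafe_index(x, indices):
    # "Unsafe" only skips bounds checking; plain aten.index is a
    # behavioral superset, so redirecting is safe.
    return aten.index(x, indices)
```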
Test Plan: contbuild and OSS CI (similar to D40075277)
Differential Revision: D48163393
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106814
Approved by: https://github.com/SherlockNoMad
2023-08-10 23:23:37 +00:00
vfdev-5
35a1913370
[inductor] Added affine_grid_generator decomposition ( #104709 )
...
Description:
- Added affine_grid_generator decomposition
Related to https://github.com/pytorch/pytorch/issues/104296
Fixes https://github.com/pytorch/pytorch/issues/105565
Perfs:
- speed-up on CUDA with bilinear and nearest modes
```
Speed-up PR vs Nightly = ratio between columns "Compiled (2.1.0a0+git3ed904e) PR-afgg" and "Compiled (2.1.0a0+gitbcdd413) Nightly"
[------------------------------------------------------------------------------------------------------------------------------------ Affine grid sampling, cpu ------------------------------------------------------------------------------------------------------------------------------------]
| Eager (2.1.0a0+git1afae24) PR-afgg | Compiled (2.1.0a0+git1afae24) PR-afgg | Compiled (2.1.0a0+git16df542) Nightly | speed-up PR vs Nightly | Eager (2.1.0a0+git16df542) Nightly
1 threads: ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Input: (2, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=True, mode=bilinear | 7.467 (+-0.036) | 11.905 (+-0.276) | 13.391 (+-0.051) | 1.125 (+-0.000) | 7.343 (+-0.036)
Input: (2, 3, 345, 456) torch.float32, torch.channels_last, align_corners=True, mode=bilinear | 7.722 (+-0.168) | 14.371 (+-0.035) | 15.899 (+-0.038) | 1.106 (+-0.000) | 7.870 (+-0.043)
Input: (2, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=False, mode=bilinear | 7.710 (+-0.051) | 11.354 (+-0.053) | 13.376 (+-0.045) | 1.178 (+-0.000) | 7.698 (+-0.061)
Input: (2, 3, 345, 456) torch.float32, torch.channels_last, align_corners=False, mode=bilinear | 7.870 (+-0.050) | 13.744 (+-0.237) | 15.206 (+-0.102) | 1.106 (+-0.000) | 7.912 (+-0.039)
Input: (2, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=True, mode=nearest | 4.738 (+-0.015) | 4.508 (+-0.005) | 6.566 (+-0.027) | 1.456 (+-0.000) | 4.630 (+-0.022)
Input: (2, 3, 345, 456) torch.float32, torch.channels_last, align_corners=True, mode=nearest | 4.391 (+-0.010) | 4.860 (+-0.390) | 6.438 (+-0.047) | 1.325 (+-0.000) | 4.458 (+-0.010)
Input: (2, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=False, mode=nearest | 4.279 (+-0.008) | 4.127 (+-0.010) | 6.598 (+-0.709) | 1.599 (+-0.000) | 5.064 (+-0.025)
Input: (2, 3, 345, 456) torch.float32, torch.channels_last, align_corners=False, mode=nearest | 4.537 (+-0.010) | 4.593 (+-0.006) | 6.365 (+-0.104) | 1.386 (+-0.000) | 4.480 (+-0.011)
Input: (2, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=True, mode=bicubic | 26.411 (+-0.066) | 62.275 (+-0.436) | 64.486 (+-0.353) | 1.035 (+-0.000) | 26.210 (+-0.110)
Input: (2, 3, 345, 456) torch.float32, torch.channels_last, align_corners=True, mode=bicubic | 26.457 (+-0.096) | 72.887 (+-0.247) | 74.207 (+-0.337) | 1.018 (+-0.000) | 25.995 (+-0.120)
Input: (2, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=False, mode=bicubic | 26.457 (+-0.086) | 64.110 (+-0.233) | 66.340 (+-0.406) | 1.035 (+-0.000) | 26.145 (+-0.085)
Input: (2, 3, 345, 456) torch.float32, torch.channels_last, align_corners=False, mode=bicubic | 26.536 (+-0.094) | 73.742 (+-0.483) | 71.946 (+-0.460) | 0.976 (+-0.000) | 26.457 (+-0.166)
Times are in milliseconds (ms).
[------------------------------------------------------------------------------------------------------------------------------------ Affine grid sampling, cuda -----------------------------------------------------------------------------------------------------------------------------------]
| Eager (2.1.0a0+git1afae24) PR-afgg | Compiled (2.1.0a0+git1afae24) PR-afgg | Compiled (2.1.0a0+git16df542) Nightly | speed-up PR vs Nightly | Eager (2.1.0a0+git16df542) Nightly
1 threads: ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Input: (2, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=True, mode=bilinear | 91.971 (+-0.253) | 90.570 (+-0.193) | 137.206 (+-0.214) | 1.515 (+-0.000) | 84.280 (+-0.241)
Input: (2, 3, 345, 456) torch.float32, torch.channels_last, align_corners=True, mode=bilinear | 91.893 (+-0.361) | 89.866 (+-0.170) | 136.678 (+-0.471) | 1.521 (+-0.000) | 84.573 (+-0.214)
Input: (2, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=False, mode=bilinear | 116.967 (+-0.481) | 110.468 (+-0.326) | 223.770 (+-0.334) | 2.026 (+-0.000) | 108.098 (+-0.392)
Input: (2, 3, 345, 456) torch.float32, torch.channels_last, align_corners=False, mode=bilinear | 117.563 (+-0.546) | 111.438 (+-0.212) | 223.101 (+-0.350) | 2.002 (+-0.000) | 108.225 (+-0.395)
Input: (2, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=True, mode=nearest | 80.706 (+-0.289) | 70.525 (+-0.204) | 143.697 (+-0.311) | 2.038 (+-0.000) | 74.485 (+-0.258)
Input: (2, 3, 345, 456) torch.float32, torch.channels_last, align_corners=True, mode=nearest | 80.955 (+-0.208) | 69.986 (+-0.250) | 143.658 (+-0.244) | 2.053 (+-0.000) | 74.163 (+-0.238)
Input: (2, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=False, mode=nearest | 117.576 (+-0.435) | 71.179 (+-0.412) | 178.515 (+-0.539) | 2.508 (+-0.000) | 108.394 (+-0.473)
Input: (2, 3, 345, 456) torch.float32, torch.channels_last, align_corners=False, mode=nearest | 117.441 (+-0.205) | 70.313 (+-0.170) | 178.664 (+-0.555) | 2.541 (+-0.000) | 108.098 (+-0.416)
Input: (2, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=True, mode=bicubic | 92.962 (+-0.509) | 1740.964 (+-0.597) | 1785.401 (+-0.369) | 1.026 (+-0.000) | 92.638 (+-0.539)
Input: (2, 3, 345, 456) torch.float32, torch.channels_last, align_corners=True, mode=bicubic | 92.928 (+-0.493) | 1401.146 (+-0.732) | 1453.229 (+-0.628) | 1.037 (+-0.000) | 92.458 (+-0.428)
Input: (2, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=False, mode=bicubic | 118.152 (+-0.442) | 1740.644 (+-0.480) | 1793.475 (+-0.458) | 1.030 (+-0.000) | 107.962 (+-0.548)
Input: (2, 3, 345, 456) torch.float32, torch.channels_last, align_corners=False, mode=bicubic | 118.182 (+-0.425) | 1400.621 (+-0.624) | 1461.796 (+-0.630) | 1.044 (+-0.000) | 107.894 (+-0.994)
Times are in microseconds (us).
```
[Source](https://raw.githubusercontent.com/vfdev-5/pth-inductor-dev/master/output/20230801-220216-affine-grid-sampler-PR-afgg-vs-Nightly-speedup.md ), [script](https://github.com/vfdev-5/pth-inductor-dev/blob/master/perf_affine_grid_sampler.py )
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104709
Approved by: https://github.com/lezcano
2023-08-10 09:52:48 +00:00
Nikita Karetnikov
45e4706aff
[pt2] add decomps for multilabel_margin_loss_forward ops ( #105302 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105302
Approved by: https://github.com/ezyang
2023-07-23 02:16:29 +00:00
angelayi
fed8d3608d
Update core aten decomp table ( #105673 )
...
Updated the decomposition table based on the existing [Core ATen IR](https://pytorch.org/docs/stable/ir.html ) list, and moved the rest of the decompositions to Inductor's decomposition table.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105673
Approved by: https://github.com/SherlockNoMad
2023-07-21 02:45:37 +00:00
Peter Bell
9adfaf8807
[inductor] Add lowering for aten.unfold ( #105165 )
...
The decomposition for unfold uses `as_strided`, which forces the input to be
realized. Instead, this implements it as a `GenericView` with reindexing,
which removes the need to realize, though it does call `mark_reuse` in case
the input computation is expensive and the windows overlap.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105165
Approved by: https://github.com/lezcano , https://github.com/jansel
2023-07-16 13:09:23 +00:00
William Wen
5cd861fcf7
Add empty/empty_like to core aten decomps ( #105158 )
...
Fixes https://github.com/pytorch/pytorch/issues/104871
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105158
Approved by: https://github.com/SherlockNoMad
2023-07-15 18:48:55 +00:00
Nikita Karetnikov
7e72126487
[pt2] add decomps for multi_margin_loss ops ( #104578 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104578
Approved by: https://github.com/ezyang , https://github.com/lezcano
2023-07-14 21:16:09 +00:00
Peter Bell
5c580a9846
[decomp] Add test tracking core ATen operators ( #104262 )
...
This adds an expect-test that finds the set of core ATen operators by
subtracting the operators with a decomposition in core_aten_decompositions from
the set of all operators that have decompositions and could be decomposed.
This is useful because if you add a new decomposition but forget to add it to
the list of core decompositions, it will appear in the PR diff.
Also, by going through this list I have identified some operators where the
functional variant is decomposed but not the inplace variant, which must be an
oversight.
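The set arithmetic is essentially the following (a sketch against the public `torch._decomp` globals):
```
import torch._decomp as decomp

all_decomposed = set(decomp.decomposition_table)        # everything we can decompose
core_decomposed = set(decomp.core_aten_decompositions())
core_aten_ops = all_decomposed - core_decomposed        # what the expect-test tracks
```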
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104262
Approved by: https://github.com/lezcano
2023-07-04 16:41:44 +00:00
David Berard
0b62aca726
Don't decompose aten.bucketize ( #104396 )
...
torch.bucketize takes a tensor of values, and a "boundaries" tensor, which is a sorted list of values that represent buckets. It returns the bucket that each value lies in. E.g. if values = [1, 5, 3, 6] and boundaries=[0, 2, 4, 6, 8], the output will be [1, 3, 2, 4].
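A runnable version of that example (note the boundary tie at value 6: the quoted output corresponds to `right=True`, while the default `right=False` puts 6 in bucket 3):
```
import torch

values = torch.tensor([1, 5, 3, 6])
boundaries = torch.tensor([0, 2, 4, 6, 8])
torch.bucketize(values, boundaries, right=True)   # tensor([1, 3, 2, 4])
torch.bucketize(values, boundaries)               # tensor([1, 3, 2, 3])
```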
The current decomposition of this op doesn't work well with dynamic shapes. It performs a binary search, which bakes in the number of iterations of the binary search and requires recompiling (I don't completely understand why/where this happens). I can't think of a good way to write a decomposition for this op that will work with dynamic shapes.
Use case: this op is very similar to some operations needed by jagged tensors. As a first step, I want to add a lowering for aten.bucketize and make use of opinfos. #104007
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104396
Approved by: https://github.com/Chillee
2023-06-30 05:05:08 +00:00
Peter Bell
8b418f197c
[decomp] Add decomposition for torch.renorm ( #103858 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103858
Approved by: https://github.com/ezyang , https://github.com/nkaretnikov
2023-06-21 20:57:43 +00:00
Peter Bell
591981c5e2
[inductor] Lower diagonal, diagonal_copy and diagonal_scatter ( #103755 )
...
Currently these are decomposed into `as_strided`, which forces a buffer to be
realized. Instead, this lowers them into a native inductor view node and so
doesn't require any buffers to be realized.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103755
Approved by: https://github.com/jansel
2023-06-21 20:16:24 +00:00
Peter Bell
a61096fb94
[decomp] Decompose logaddexp2 ( #103765 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103765
Approved by: https://github.com/Chillee
2023-06-21 20:16:24 +00:00
PyTorch MergeBot
7b6dc72ffa
Revert "[decomp] Decompose logaddexp2 ( #103765 )"
...
This reverts commit bab21d20eb .
Reverted https://github.com/pytorch/pytorch/pull/103765 on behalf of https://github.com/ezyang due to looks like land race ([comment](https://github.com/pytorch/pytorch/pull/103765#issuecomment-1599030496 ))
2023-06-20 15:35:02 +00:00
Peter Bell
bab21d20eb
[decomp] Decompose logaddexp2 ( #103765 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103765
Approved by: https://github.com/Chillee
2023-06-20 09:24:21 +00:00
Nikita Karetnikov
c3ea8cc58b
[pt2] convert out params in register_meta ( #101344 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101344
Approved by: https://github.com/lezcano
2023-05-27 18:38:52 +00:00
vfdev-5
e3d97b6213
[inductor] Added smooth_l1_loss refs ( #102077 )
...
Added `smooth_l1_loss` to refs + tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102077
Approved by: https://github.com/lezcano , https://github.com/ngimel
2023-05-24 15:07:08 +00:00
Bin Bao
b66d7007d8
Add aten.smooth_l1_loss_backward to core_aten_decompositions ( #100267 )
...
Summary: https://github.com/pytorch/pytorch/pull/100242 didn't cover all
test failures
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100267
Approved by: https://github.com/jansel
2023-04-28 19:32:17 +00:00
Angela Yi
d06b93b0c7
Decompose arange.default to arange.start_step ( #99739 )
...
The aten op arange.default is not in the core aten IR, and should decompose into the arange.start_step op.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99739
Approved by: https://github.com/SherlockNoMad
2023-04-27 19:06:36 +00:00