Isuru Fernando
505574c46a
Add decomposition for torch.block_diag ( #115096 )
...
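The idea behind a `block_diag` decomposition can be sketched with ordinary tensor ops; `block_diag_2d` is a hypothetical name, and unlike `torch.block_diag` this sketch only handles 2-D inputs:

```python
import torch

def block_diag_2d(*tensors):
    # Allocate a zero matrix large enough for all blocks, then copy each
    # block onto the diagonal. torch.block_diag also accepts 0-D/1-D inputs;
    # this sketch assumes every input is 2-D.
    rows = sum(t.size(0) for t in tensors)
    cols = sum(t.size(1) for t in tensors)
    out = tensors[0].new_zeros(rows, cols)
    r = c = 0
    for t in tensors:
        out[r:r + t.size(0), c:c + t.size(1)] = t
        r += t.size(0)
        c += t.size(1)
    return out
```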
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115096
Approved by: https://github.com/peterbell10
2023-12-11 20:04:22 +00:00
Isuru Fernando
d40a7c6026
Add decompositions for replication_pad ( #115113 )
...
Fixes #115395
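For the 1-D case, replication padding reduces to indexing with clamped positions; a minimal sketch (`replication_pad1d_ref` is a made-up name, not the PR's code):

```python
import torch

def replication_pad1d_ref(x, pad):
    # Replication padding repeats the edge values: out-of-range positions
    # are clamped back into the valid index range and gathered.
    left, right = pad
    n = x.size(-1)
    idx = torch.arange(-left, n + right, device=x.device).clamp(0, n - 1)
    return x[..., idx]
```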
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115113
Approved by: https://github.com/peterbell10
2023-12-09 02:44:07 +00:00
Isuru Fernando
fb19947962
Add decompositions for reflection_pad{1, 2, 3}d ( #115100 )
...
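Reflection padding is the same indexing trick with a fold-back mapping instead of a clamp; a sketch for the 1-D case, valid when each pad is smaller than the input size (the same restriction the aten op enforces), with `reflection_pad1d_ref` a made-up name:

```python
import torch

def reflection_pad1d_ref(x, pad):
    # Reflection padding mirrors the input around its endpoints (without
    # repeating them); out-of-range positions are folded back into range.
    left, right = pad
    n = x.size(-1)
    pos = torch.arange(-left, n + right, device=x.device)
    pos = pos.abs()                        # reflect positions left of 0
    pos = (n - 1) - (pos - (n - 1)).abs()  # reflect positions past n - 1
    return x[..., pos]
```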
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115100
Approved by: https://github.com/peterbell10
2023-12-08 23:05:57 +00:00
Kurt Mohler
6f32eb7eef
Add decomp for replication_pad2d and use for CUDA deterministic ( #111590 )
...
Fixes #95578
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111590
Approved by: https://github.com/peterbell10
2023-12-01 18:56:09 +00:00
PyTorch MergeBot
013675ff59
Revert "Add decomp for replication_pad2d and use for CUDA deterministic ( #111590 )"
...
This reverts commit f1286161a6 .
Reverted https://github.com/pytorch/pytorch/pull/111590 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it is failing the XLA job. The job is also failing on the PR, but the log classifier failed to find the failed test, which led to it being wrongly marked as flaky ([comment](https://github.com/pytorch/pytorch/pull/111590#issuecomment-1833004794 ))
2023-11-30 02:28:14 +00:00
Kurt Mohler
f1286161a6
Add decomp for replication_pad2d and use for CUDA deterministic ( #111590 )
...
Fixes #95578
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111590
Approved by: https://github.com/peterbell10
2023-11-29 21:50:46 +00:00
PyTorch MergeBot
fe428a284b
Revert "Add torch._lazy_clone to create COW tensors ( #113397 )"
...
This reverts commit 9916d8a9ea .
Reverted https://github.com/pytorch/pytorch/pull/113397 on behalf of https://github.com/DanilBaibak due to Unfortunately, I need to revert your PR because the lower [PR in the stack](https://github.com/pytorch/pytorch/pull/113396 ) is failing a bunch of internal build jobs. ([comment](https://github.com/pytorch/pytorch/pull/113397#issuecomment-1818761224 ))
2023-11-20 10:21:09 +00:00
Kurt Mohler
9916d8a9ea
Add torch._lazy_clone to create COW tensors ( #113397 )
...
Part of #109833
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113397
Approved by: https://github.com/ezyang
ghstack dependencies: #113396
2023-11-17 01:58:51 +00:00
Han Qi
5a6f8014c4
Add a decomposition for _weight_norm_interface. ( #112193 )
...
Fixes #112086
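The weight-norm decomposition amounts to `w = g * v / ||v||`, with the norm taken over every dimension except `dim`; a rough sketch (`weight_norm_ref` is a hypothetical name, not the registered decomposition):

```python
import torch

def weight_norm_ref(v, g, dim=0):
    # Norm of v over all dimensions except `dim` (mirroring aten::norm_except_dim),
    # then scale v so each slice along `dim` has magnitude g.
    keep = [d for d in range(v.dim()) if d != dim]
    norm = v.norm(2, keep, keepdim=True)
    return v * (g / norm), norm
```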
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112193
Approved by: https://github.com/ezyang
2023-11-01 19:51:11 +00:00
Peter Bell
04024926f4
Use pytree.tree_map_ everywhere ( #112417 )
...
Wherever we discard the output of `tree_map` it's better to call `tree_map_`
which doesn't unflatten the mapped results and so is a lot cheaper.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112417
Approved by: https://github.com/lezcano
ghstack dependencies: #112391 , #112392 , #112393 , #112394
2023-10-31 15:57:06 +00:00
PyTorch MergeBot
98c329b19e
Revert "[core ATen IR] Add decompositions for max, min, var_mean ( #110906 )"
...
This reverts commit 9606cda64e .
Reverted https://github.com/pytorch/pytorch/pull/110906 on behalf of https://github.com/SS-JIA due to Breaks internal CI ([comment](https://github.com/pytorch/pytorch/pull/110906#issuecomment-1757490740 ))
2023-10-11 11:41:21 +00:00
SS-JIA
9606cda64e
[core ATen IR] Add decompositions for max, min, var_mean ( #110906 )
...
## Context
Add decompositions for `aten.max`, `aten.min`, and `aten.var_mean`. These operators follow a pattern of returning a tuple of outputs from two component operators:
```
aten.max(x) -> return aten.amax(x), aten.argmax(x)
aten.min(x) -> return aten.amin(x), aten.argmin(x)
aten.var_mean(x) -> return aten.var(x), aten.mean(x)
```
For `var_mean`, the `refs` implementation was doing something similar, so I changed it to call `torch.` ops instead, as was done previously for other `refs` implementations. cc: @peterbell10 @lezcano
Note that Inductor lowers all these directly, so they are excluded from the Inductor decomp table.
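The equivalences above can be checked directly against the composite ops:

```python
import torch

x = torch.randn(3, 5)

# aten.max/min decompose into the value op plus the index op.
vals, idxs = torch.max(x, dim=1)
# aten.var_mean decomposes into separate var and mean calls.
var, mean = torch.var_mean(x, dim=1)
```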
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110906
Approved by: https://github.com/manuelcandales
2023-10-11 00:06:24 +00:00
Stephen Jia
c2e7a0d689
[core IR] Add decomps for aten.sum and aten.squeeze variants ( #110645 )
...
Summary:
## Context
Both `aten.sum` and `aten.squeeze` have a "most generic" variant, in the form of `aten.sum.dim_IntList` and `aten.squeeze.dims` respectively. Add decompositions for the other, non-generic variants of these operators to express them in terms of the most generic variant.
Note that to register these decomps, the reference implementations under `_refs` had to be removed as registered decompositions. cc: @lezcano @peterbell10
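The re-expression in terms of the generic overloads is easy to check from Python (`squeeze` with a tuple of dims requires PyTorch 2.0+):

```python
import torch

x = torch.randn(2, 1, 3)

# aten.sum.default is the dim-list overload reducing over every dimension.
full_sum = torch.sum(x, dim=list(range(x.dim())))
# aten.squeeze.default is the dims overload applied to all size-1 dimensions.
all_squeezed = torch.squeeze(x, dim=tuple(d for d in range(x.dim()) if x.size(d) == 1))
```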
Test Plan: Github CI + Meta Internal CI
Differential Revision: D49965952
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110645
Approved by: https://github.com/peterbell10 , https://github.com/digantdesai , https://github.com/manuelcandales
2023-10-07 04:21:51 +00:00
Stephen Jia
ff96f6d04f
[core IR][reland] Add split.Tensor and unbind decompositions to core ATen decomp table ( #110323 )
...
Summary:
This is a reland of [github PR #110102 ]( https://github.com/pytorch/pytorch/pull/110102 ).
The original PR had to be unlanded due to internal CI failures. This diff applies some small fixes to the failing tests to adjust to the new decompositions.
Note that `lift_fresh` will not be decomposed for now, since it was found that [constant propagation looks specifically for `lift_fresh`](13af952f94/torch/fx/experimental/proxy_tensor.py (L381-L386) ). Therefore decomposing `lift_fresh` will interfere with constant propagation during export.
Test Plan: Github CI and internal CI
Differential Revision: D49761321
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110323
Approved by: https://github.com/jansel
2023-10-03 14:35:04 +00:00
PyTorch MergeBot
e0b035c220
Revert "[core IR] Add lift_fresh, split.Tensor, and unbind decompositions to core ATen decomp table ( #110102 )"
...
This reverts commit 22e706f768 .
Reverted https://github.com/pytorch/pytorch/pull/110102 on behalf of https://github.com/atalman due to Breaks internal CI ([comment](https://github.com/pytorch/pytorch/pull/110102#issuecomment-1739856671 ))
2023-09-28 19:03:25 +00:00
SS-JIA
22e706f768
[core IR] Add lift_fresh, split.Tensor, and unbind decompositions to core ATen decomp table ( #110102 )
...
## Context
Add existing decomps for `lift_fresh`, `split.Tensor`, and `unbind` to the core ATen decomposition table. Do not use them in inductor, since Inductor currently lowers these directly.
One note though is that `lift_fresh`'s decomposition has a note saying it's not correct under autograd. However, my understanding is that these decompositions are registered to the `"post_autograd"` decomposition table, meaning autograd wouldn't be a factor. Would like some confirmation that this premise is correct.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110102
Approved by: https://github.com/jansel
2023-09-28 01:21:45 +00:00
SS-JIA
dec140f1ea
[core IR] Add a core decomposition for aten.all ( #110093 )
...
## Context
Change the ref implementation of `aten.all` to only use other `torch` operators such that we can use it for the core ATen decomposition table. This will replace the decomposition for `aten.all` that was used specifically by Inductor.
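A decomposition in that spirit, written only with public `torch` ops (a sketch, not necessarily the exact ref implementation):

```python
import torch

def all_ref(x):
    # all(x) is true iff no element is falsy, i.e. not any(not x).
    return torch.logical_not(torch.any(torch.logical_not(x)))
```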
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110093
Approved by: https://github.com/manuelcandales , https://github.com/peterbell10 , https://github.com/lezcano
2023-09-27 01:31:41 +00:00
SS-JIA
9928c10e71
[core IR] Add glu as a core decomposition ( #110043 )
...
## Context
Add the decomposition for `aten.glu` as a decomposition in the core ATen decomposition table. Don't use it in the Inductor decomposition table since Inductor has a lowering for it.
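`aten.glu` splits the input in half along `dim` and gates one half with the sigmoid of the other; a sketch of the decomposition (`glu_ref` is a made-up name):

```python
import torch

def glu_ref(x, dim=-1):
    # Split into two equal halves along `dim`, then gate the first half
    # with the sigmoid of the second.
    a, b = x.chunk(2, dim=dim)
    return a * torch.sigmoid(b)
```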
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110043
Approved by: https://github.com/peterbell10 , https://github.com/lezcano
ghstack dependencies: #110046
2023-09-27 00:23:05 +00:00
SS-JIA
5df8aca994
[core IR] Add a core decomposition for floor_divide ( #110046 )
...
## Context
Introduce a core decomposition for `aten.floor_divide` into other `aten` ops, and add it to the core ATen decomposition table.
This replaces the decomposition of `floor_divide` that was used by Inductor. I noticed there was a note on that decomposition
```
# TorchInductor-only decomposition. It should not be taken to core.
# See https://github.com/pytorch/torchdynamo/pull/1120
```
but couldn't discern the reason why this is the case. cc: @lezcano
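In terms of other aten ops, `floor_divide` is just division with floor rounding, which the equivalence below illustrates:

```python
import torch

a = torch.tensor([7.0, -7.0, 5.0])
b = torch.tensor([2.0, 2.0, -2.0])

# floor_divide(a, b) matches div(a, b, rounding_mode="floor"),
# i.e. the quotient rounded toward negative infinity.
q = torch.div(a, b, rounding_mode="floor")
```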
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110046
Approved by: https://github.com/peterbell10
2023-09-26 08:39:21 +00:00
Mwiza Kunda
5c4b5baf21
Fix python decomps for OpOverloadPackets and add tests ( #107707 )
...
- Extend `test_torch_dispatch_meta_outplace` to test torch ops that do not have an out parameter but have aten op overloads that do. Additionally, Python decompositions may register `OpOverloadPacket`s, so decompositions need to be tested to ensure all `OpOverload`s still function for the `Meta` key (e.g., if a Python decomposition is registered for an aten op `aten.foo` with overloads `[default, out]`, the Python function needs to support receiving out arguments)
- Add out parameter wrappers to python decomps for aten ops that have out overloads
CC. @ezyang @albanD @lezcano
Fixes #107713
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107707
Approved by: https://github.com/lezcano
2023-09-25 20:53:30 +00:00
SS-JIA
7de669f2f9
[core IR] Remove trunc decomp and add trunc to core ( #109902 )
...
Following up from [this comment](https://github.com/pytorch/pytorch/pull/109319#discussion_r1330803226 ). Remove the decomposition for `trunc`, and add it as a core operator.
Going forward, provide similar treatment for operators that map cleanly to hardware instructions.
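For reference, the behavior of the removed decomposition can still be expressed with other aten ops (rounding toward zero):

```python
import torch

x = torch.tensor([1.7, -1.7, 0.5, -0.5])

# trunc rounds toward zero: ceil for negatives, floor for non-negatives.
t = torch.where(x < 0, torch.ceil(x), torch.floor(x))
```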
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109902
Approved by: https://github.com/peterbell10
2023-09-25 18:18:06 +00:00
Mwiza Kunda
6b7b9c796e
Fix registering jit decompositions for jvp for out wrapped decomps ( #109367 )
...
Python decompositions wrapped by `out_wrapper` need to be unwrapped before compiling with TorchScript since:
- `out_wrapper` extends the decompositions signature with an out parameter, however this `out` parameter is not present in the source code of the original decomposition so the resulting `ScriptFunction` will not have an `out` parameter
- `out_wrapper` is in the `torch._prims_common.wrappers` module, so its `globals()` differ from the globals of the decomposition being wrapped. This may cause symbol resolution to fail in the TorchScript compiler, since it compiles the unwrapped decomposition's source code rather than the wrapper
The python decomposition for `aten.trace` is wrapped as an example, other decompositions are to be fixed in https://github.com/pytorch/pytorch/pull/107707
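A simplified model of what `out_wrapper` does (the real wrapper in `torch._prims_common.wrappers` also handles multiple outputs, dtype/shape checks, etc.; `out_wrapper_sketch` is a made-up name):

```python
def out_wrapper_sketch(fn):
    # Extend fn's signature with an `out=` keyword: compute the result,
    # then copy it into `out` when one is supplied.
    def wrapper(*args, out=None, **kwargs):
        result = fn(*args, **kwargs)
        if out is None:
            return result
        out.copy_(result)
        return out
    return wrapper
```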
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109367
Approved by: https://github.com/lezcano
2023-09-21 16:36:51 +00:00
Peter Bell
6f0cf5a837
[decomp] Decompose unsafe_split{,_with_sizes} into safe variants ( #109668 )
...
The "safety" aspect refers to the output not being registered as aliasing the
input, but after AOTAutograd I don't think this distinction matters. However,
we shouldn't use the same decomposition as the safe variant in case the backend
doesn't want to decompose split.
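Numerically the unsafe variants match the safe ones; only the aliasing metadata differs:

```python
import torch

x = torch.arange(10.0)

# Same split sizes, same values; unsafe_split just skips registering the
# outputs as aliases of the input.
safe = torch.split(x, 3)
unsafe = torch.unsafe_split(x, 3)
```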
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109668
Approved by: https://github.com/lezcano
ghstack dependencies: #109667
2023-09-20 18:45:56 +00:00
Peter Bell
9e629dd73c
[decomp] Add all std and std_mean overloads to core decompositions ( #109667 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109667
Approved by: https://github.com/lezcano
2023-09-20 18:45:56 +00:00
Salil Desai
40b2c796dc
[Decomposition] baddbmm ( #108534 )
...
Summary:
Move the decomposition of baddbmm out of _inductor/decomposition.py and include it in core_aten_decompositions
ff38c0e2f9/torch/_inductor/decomposition.py (L203)
Test Plan: Phabricator + OSS Tests
Differential Revision: D48871741
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108534
Approved by: https://github.com/SherlockNoMad
2023-09-20 12:49:32 +00:00
Salil Desai
d0cc623192
[Decomposition] _unsafe_view ( #108713 )
...
Summary:
Decomp already exists so just add it to core_aten_decompositions
https://www.internalfb.com/code/fbsource/[9d5eabd7b213d1a356d4e7bb400355d574ea924b]/fbcode/caffe2/torch/_decomp/decompositions.py?lines=3091
Differential Revision: D48619079
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108713
Approved by: https://github.com/larryliu0820 , https://github.com/SherlockNoMad
2023-09-19 13:37:35 +00:00
Salil Desai
2e721aab98
[Decomposition] Trunc ( #109319 )
...
Summary:
Add Decomp for Trunc and add it to core_aten_decompositions
Differential Revision: D49042033
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109319
Approved by: https://github.com/SherlockNoMad
2023-09-19 13:30:13 +00:00
Salil Desai
ae66d0b3bf
[Decomposition] clamp_max ( #108718 )
...
Summary:
Decomp already exists so just add it to core_aten_decompositions
https://www.internalfb.com/code/fbsource/[abda43a5a268e83fef6d62b49531a390ce915ad2]/fbcode/caffe2/torch/_refs/__init__.py?lines=1855
Differential Revision: D48880026
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108718
Approved by: https://github.com/SherlockNoMad
2023-09-19 13:25:35 +00:00
Salil Desai
fc47ba2794
[Decomposition] clamp_min ( #108717 )
...
Summary:
Decomp already exists so just add it to core_aten_decompositions
https://www.internalfb.com/code/fbsource/[abda43a5a268e83fef6d62b49531a390ce915ad2]/fbcode/caffe2/torch/_refs/__init__.py?lines=1846
Differential Revision: D48880080
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108717
Approved by: https://github.com/SherlockNoMad
2023-09-18 12:43:58 +00:00
Salil Desai
a6d4cca7c0
[Decomposition] unsafe_split.Tensor ( #108544 )
...
Summary:
Include decomp in core_aten_decompositions
Decomp already exists
https://www.internalfb.com/code/fbsource/[03ff511cad587fc27ed8fd6a54b87845246e8e0c]/fbcode/caffe2/torch/_decomp/decompositions.py?lines=1209
Test Plan: OSS + Phabricator Tests
Differential Revision: D48940445
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108544
Approved by: https://github.com/larryliu0820 , https://github.com/SherlockNoMad
2023-09-18 12:43:07 +00:00
Salil Desai
af93b29c5e
[Decomposition] std.correction ( #108733 )
...
Summary:
Include decomp in core_aten_decompositions
Decomp:
https://www.internalfb.com/code/fbsource/[e69bf00ff87a55c9a30bd7905881661ff05fa211]/fbcode/caffe2/torch/_refs/__init__.py?lines=2398
Differential Revision: D48940402
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108733
Approved by: https://github.com/larryliu0820 , https://github.com/SherlockNoMad
2023-09-18 11:38:23 +00:00
Ken Jin
c458fa0d35
Decompose/add reference for view_as_complex ( #108005 )
...
Aten source: d4a99631dd/aten/src/ATen/native/ComplexHelper.h (L78)
Documentation reference:
https://pytorch.org/docs/stable/generated/torch.view_as_complex.html
Note: this adds a new primitive `view_of_dtype`, which is trivially implemented, as its meta function is already implemented elsewhere.
Finally, this is not registered as a decomposition (yet), because TorchInductor does not yet support complex types. It should be added once we do.
Closes https://github.com/pytorch/pytorch/issues/108020 as well.
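As a usage reminder, `view_as_complex` reinterprets a trailing dimension of size 2 as real/imaginary pairs, without copying:

```python
import torch

# The input's last dimension must have size 2 (and a compatible layout);
# the result drops that dimension and has a complex dtype.
x = torch.randn(4, 2)
z = torch.view_as_complex(x)  # complex64 tensor of shape (4,)
```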
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108005
Approved by: https://github.com/peterbell10 , https://github.com/ezyang
2023-09-07 23:49:20 +00:00
Sam Larsen
27fe45eaf6
[inductor][easy] Enable Mypy Checking for torch/_inductor/decomposition.py ( #108682 )
...
Summary: Looks like one simple type mismatch between `get_decompositions()` and `remove_decompositions()`
Test Plan: `lintrunner torch/_inductor/decomposition.py`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108682
Approved by: https://github.com/eellison
2023-09-07 00:48:55 +00:00
lezcano
239ee76177
Add refs/decomps for dot/vdot ( #108194 )
...
Follow-up on https://github.com/pytorch/pytorch/issues/108127#issuecomment-1698142427
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108194
Approved by: https://github.com/peterbell10
ghstack dependencies: #108188
2023-08-31 15:30:23 +00:00
rzou
0e4752bafc
Allow registering decomps for HigherOrderOp; add decomp for out_dtype ( #108080 )
...
We allow registering decomps for HigherOrderOp via the existing decomp
mechanisms:
- I refactored those APIs to accept torch._ops.OperatorBase, which is the base
class for torch.ops.HigherOrderOperator and torch.ops.OpOverload
- HigherOrderOps must directly call maybe_handle_decomp in their
ProxyTorchDispatchMode handling in order to resolve decompositions. We
can change this in the future so that they do not need to do this.
Next, we add an inductor decomp for out_dtype. This decomp shouldn't be
generally available because we want to preserve out_dtype to the backend
for other use cases (i.e. executorch).
Test Plan:
- new tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108080
Approved by: https://github.com/HDCharles
2023-08-31 03:15:38 +00:00
chilli
39130c7433
Add reinplacing pass for scatters + incremental fake tensor updating ( #106192 )
...
mutation for params)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106192
Approved by: https://github.com/jansel , https://github.com/eellison
2023-08-30 20:41:37 +00:00
Mengwei Liu
0fb1c05c5a
[pytorch] Add decomp rule for scaled_dot_product_attention ( #108180 )
...
`scaled_dot_product_attention` used to be decomposed in pre-autograd, given that it calls `_scaled_dot_product_attention_math`, which only has a `CompositeImplicitAutograd` kernel. As a result it was decomposed into ops of finer granularity.
However recent PRs (#103826 #105131 ) added new logic in `scaled_dot_product_attention` and now it calls `_scaled_dot_product_flash_attention` which contains a CPU kernel. This results in `_scaled_dot_product_flash_attention` showing up in `torch.export()`. This PR adds a decomposition that ensures `scaled_dot_product_attention` is still being decomposed the same way as before, i.e., going through `_scaled_dot_product_attention_math`. Notice that this decomp rule should be excluded by inductor.
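The math decomposition (ignoring masks, dropout, and custom scaling) is the standard attention formula; a sketch, not the exact `_scaled_dot_product_attention_math` implementation:

```python
import math
import torch

def sdpa_math(q, k, v):
    # softmax(q @ k^T / sqrt(d)) @ v
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    return scores.softmax(dim=-1) @ v
```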
Differential Revision: [D48762000](https://our.internmc.facebook.com/intern/diff/D48762000/ )
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108180
Approved by: https://github.com/SherlockNoMad
2023-08-30 15:52:08 +00:00
vfdev-5
0cfc5899f9
[inductor] Improved grid_sampler_2d decomposition for cuda ( #104710 )
...
Description:
- Improved grid_sampler_2d decomposition code to generate single cuda kernel instead of two
Related to https://github.com/pytorch/pytorch/issues/104296
Perfs:
- speed-up on cuda (~x5) and cpu (~x2) for bicubic mode
```
Speed-up PR vs Nightly = ratio between columns "Compiled (2.1.0a0+git52598e9) PR" and "Compiled (2.1.0a0+gitcf76938) Nightly"
[------------------------------------------------------------------------------------------------------------------------------- Affine grid sampling, cpu -------------------------------------------------------------------------------------------------------------------------------]
| Eager (2.1.0a0+git52598e9) PR | Compiled (2.1.0a0+git52598e9) PR | Compiled (2.1.0a0+gitcf76938) Nightly | speed-up PR vs Nightly | Eager (2.1.0a0+gitcf76938) Nightly
1 threads: --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Input: (8, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=True, mode=bilinear | 38.010 (+-0.118) | 51.466 (+-1.257) | 47.867 (+-0.124) | 0.930 (+-0.000) | 33.654 (+-0.411)
Input: (8, 3, 345, 456) torch.float32, torch.channels_last, align_corners=True, mode=bilinear | 35.532 (+-0.236) | 52.189 (+-0.093) | 58.979 (+-0.206) | 1.130 (+-0.000) | 32.543 (+-0.198)
Input: (8, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=False, mode=bilinear | 38.187 (+-0.112) | 47.892 (+-0.117) | 45.833 (+-0.081) | 0.957 (+-0.000) | 33.752 (+-0.116)
Input: (8, 3, 345, 456) torch.float32, torch.channels_last, align_corners=False, mode=bilinear | 36.708 (+-0.244) | 51.680 (+-0.104) | 58.360 (+-0.108) | 1.129 (+-0.000) | 32.576 (+-0.751)
Input: (8, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=True, mode=nearest | 24.201 (+-0.088) | 27.451 (+-0.059) | 27.937 (+-0.081) | 1.018 (+-0.000) | 24.367 (+-0.074)
Input: (8, 3, 345, 456) torch.float32, torch.channels_last, align_corners=True, mode=nearest | 19.266 (+-0.105) | 26.070 (+-0.085) | 26.092 (+-0.054) | 1.001 (+-0.000) | 20.144 (+-0.064)
Input: (8, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=False, mode=nearest | 24.293 (+-0.125) | 26.085 (+-0.064) | 26.575 (+-0.061) | 1.019 (+-0.000) | 24.515 (+-0.095)
Input: (8, 3, 345, 456) torch.float32, torch.channels_last, align_corners=False, mode=nearest | 19.440 (+-0.075) | 25.252 (+-0.059) | 25.259 (+-0.051) | 1.000 (+-0.000) | 19.770 (+-0.070)
Input: (8, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=True, mode=bicubic | 114.900 (+-0.508) | 113.416 (+-1.271) | 248.679 (+-1.431) | 2.193 (+-0.000) | 114.609 (+-0.515)
Input: (8, 3, 345, 456) torch.float32, torch.channels_last, align_corners=True, mode=bicubic | 115.973 (+-0.555) | 124.711 (+-1.596) | 282.187 (+-2.418) | 2.263 (+-0.000) | 115.368 (+-0.652)
Input: (8, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=False, mode=bicubic | 111.730 (+-0.562) | 110.914 (+-0.865) | 253.899 (+-2.226) | 2.289 (+-0.000) | 111.285 (+-1.226)
Input: (8, 3, 345, 456) torch.float32, torch.channels_last, align_corners=False, mode=bicubic | 112.859 (+-0.487) | 131.696 (+-1.298) | 294.124 (+-1.963) | 2.233 (+-0.000) | 110.910 (+-0.969)
Times are in milliseconds (ms).
[------------------------------------------------------------------------------------------------------------------------------- Affine grid sampling, cuda ------------------------------------------------------------------------------------------------------------------------------]
| Eager (2.1.0a0+git52598e9) PR | Compiled (2.1.0a0+git52598e9) PR | Compiled (2.1.0a0+gitcf76938) Nightly | speed-up PR vs Nightly | Eager (2.1.0a0+gitcf76938) Nightly
1 threads: --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Input: (8, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=True, mode=bilinear | 228.811 (+-0.037) | 92.990 (+-0.446) | 92.648 (+-0.286) | 0.996 (+-0.000) | 228.274 (+-0.067)
Input: (8, 3, 345, 456) torch.float32, torch.channels_last, align_corners=True, mode=bilinear | 222.107 (+-0.076) | 93.247 (+-0.387) | 92.528 (+-0.423) | 0.992 (+-0.000) | 221.922 (+-0.297)
Input: (8, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=False, mode=bilinear | 235.654 (+-0.055) | 75.781 (+-0.566) | 115.865 (+-0.419) | 1.529 (+-0.000) | 236.032 (+-0.111)
Input: (8, 3, 345, 456) torch.float32, torch.channels_last, align_corners=False, mode=bilinear | 226.752 (+-0.088) | 76.312 (+-0.328) | 116.468 (+-0.477) | 1.526 (+-0.000) | 226.950 (+-0.027)
Input: (8, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=True, mode=nearest | 225.540 (+-0.013) | 75.638 (+-0.341) | 72.621 (+-0.292) | 0.960 (+-0.000) | 225.937 (+-0.017)
Input: (8, 3, 345, 456) torch.float32, torch.channels_last, align_corners=True, mode=nearest | 217.425 (+-0.024) | 75.484 (+-0.545) | 73.518 (+-0.296) | 0.974 (+-0.000) | 217.793 (+-0.008)
Input: (8, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=False, mode=nearest | 231.474 (+-0.020) | 75.972 (+-0.339) | 73.030 (+-0.387) | 0.961 (+-0.000) | 231.991 (+-0.184)
Input: (8, 3, 345, 456) torch.float32, torch.channels_last, align_corners=False, mode=nearest | 223.408 (+-0.016) | 75.622 (+-0.279) | 73.542 (+-0.336) | 0.973 (+-0.000) | 223.893 (+-0.021)
Input: (8, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=True, mode=bicubic | 319.382 (+-0.023) | 149.060 (+-0.190) | 772.116 (+-0.266) | 5.180 (+-0.000) | 320.549 (+-0.387)
Input: (8, 3, 345, 456) torch.float32, torch.channels_last, align_corners=True, mode=bicubic | 319.987 (+-0.134) | 154.443 (+-0.014) | 797.651 (+-0.232) | 5.165 (+-0.000) | 320.665 (+-0.397)
Input: (8, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=False, mode=bicubic | 326.138 (+-0.439) | 149.092 (+-0.036) | 772.508 (+-0.259) | 5.181 (+-0.000) | 325.751 (+-0.398)
Input: (8, 3, 345, 456) torch.float32, torch.channels_last, align_corners=False, mode=bicubic | 326.024 (+-0.118) | 154.452 (+-0.209) | 797.756 (+-0.229) | 5.165 (+-0.000) | 326.870 (+-0.372)
Times are in microseconds (us).
```
[Source](https://raw.githubusercontent.com/vfdev-5/pth-inductor-dev/master/output/20230828-134459-affine-grid-sampler-PR-vs-Nightly-speedup.md )
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104710
Approved by: https://github.com/lezcano
2023-08-29 05:54:24 +00:00
ssjia
86f9fec3ac
Avoid decomposing _unsafe_index in Inductor ( #107882 )
...
`_unsafe_index` was previously added to the core ATen decomp table in https://github.com/pytorch/pytorch/pull/106814 , but this has performance ramifications for Inductor. Therefore, this diff removes it from the decomposition table used by Inductor.
Differential Revision: [D48649210](https://our.internmc.facebook.com/intern/diff/D48649210/ )
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107882
Approved by: https://github.com/SherlockNoMad
2023-08-25 04:51:53 +00:00
lezcano
2c5f96deac
[Inductor] Make softshrink composite implicit ( #107052 )
...
The backward is pretty much equivalent to the one we had written
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107052
Approved by: https://github.com/peterbell10
ghstack dependencies: #107038 , #107039 , #107051
2023-08-14 21:01:50 +00:00
lezcano
3b1254e800
Make hardshrink's decomp composite implicit ( #107039 )
...
The generated code is the same
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107039
Approved by: https://github.com/peterbell10
ghstack dependencies: #107038
2023-08-14 21:01:50 +00:00
Sam Larsen
e165938853
Implement decomposition for aten.rrelu_with_noise ( #106812 )
...
Test Plan:
* Primarily, added new test in test/test_decomp.py
* Updated existing tests, e.g., to NOT expect failure
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106812
Approved by: https://github.com/eellison
2023-08-11 19:18:29 +00:00
Stephen Jia
8c8477e55a
Add _unsafe_index decomp ( #106814 )
...
Summary:
Redirect `aten._unsafe_index` to `aten.index` through a decomposition.
Also add it to the list of core decompositions.
Test Plan: contbuild and OSS CI (similar to D40075277)
Differential Revision: D48163393
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106814
Approved by: https://github.com/SherlockNoMad
2023-08-10 23:23:37 +00:00
vfdev-5
35a1913370
[inductor] Added affine_grid_generator decomposition ( #104709 )
...
Description:
- Added affine_grid_generator decomposition
Related to https://github.com/pytorch/pytorch/issues/104296
Fixes https://github.com/pytorch/pytorch/issues/105565
Perfs:
- speed-up on cuda with bilinear and nearest modes
```
Speed-up PR vs Nightly = ratio between columns "Compiled (2.1.0a0+git3ed904e) PR-afgg" and "Compiled (2.1.0a0+gitbcdd413) Nightly"
[------------------------------------------------------------------------------------------------------------------------------------ Affine grid sampling, cpu ------------------------------------------------------------------------------------------------------------------------------------]
| Eager (2.1.0a0+git1afae24) PR-afgg | Compiled (2.1.0a0+git1afae24) PR-afgg | Compiled (2.1.0a0+git16df542) Nightly | speed-up PR vs Nightly | Eager (2.1.0a0+git16df542) Nightly
1 threads: ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Input: (2, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=True, mode=bilinear | 7.467 (+-0.036) | 11.905 (+-0.276) | 13.391 (+-0.051) | 1.125 (+-0.000) | 7.343 (+-0.036)
Input: (2, 3, 345, 456) torch.float32, torch.channels_last, align_corners=True, mode=bilinear | 7.722 (+-0.168) | 14.371 (+-0.035) | 15.899 (+-0.038) | 1.106 (+-0.000) | 7.870 (+-0.043)
Input: (2, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=False, mode=bilinear | 7.710 (+-0.051) | 11.354 (+-0.053) | 13.376 (+-0.045) | 1.178 (+-0.000) | 7.698 (+-0.061)
Input: (2, 3, 345, 456) torch.float32, torch.channels_last, align_corners=False, mode=bilinear | 7.870 (+-0.050) | 13.744 (+-0.237) | 15.206 (+-0.102) | 1.106 (+-0.000) | 7.912 (+-0.039)
Input: (2, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=True, mode=nearest | 4.738 (+-0.015) | 4.508 (+-0.005) | 6.566 (+-0.027) | 1.456 (+-0.000) | 4.630 (+-0.022)
Input: (2, 3, 345, 456) torch.float32, torch.channels_last, align_corners=True, mode=nearest | 4.391 (+-0.010) | 4.860 (+-0.390) | 6.438 (+-0.047) | 1.325 (+-0.000) | 4.458 (+-0.010)
Input: (2, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=False, mode=nearest | 4.279 (+-0.008) | 4.127 (+-0.010) | 6.598 (+-0.709) | 1.599 (+-0.000) | 5.064 (+-0.025)
Input: (2, 3, 345, 456) torch.float32, torch.channels_last, align_corners=False, mode=nearest | 4.537 (+-0.010) | 4.593 (+-0.006) | 6.365 (+-0.104) | 1.386 (+-0.000) | 4.480 (+-0.011)
Input: (2, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=True, mode=bicubic | 26.411 (+-0.066) | 62.275 (+-0.436) | 64.486 (+-0.353) | 1.035 (+-0.000) | 26.210 (+-0.110)
Input: (2, 3, 345, 456) torch.float32, torch.channels_last, align_corners=True, mode=bicubic | 26.457 (+-0.096) | 72.887 (+-0.247) | 74.207 (+-0.337) | 1.018 (+-0.000) | 25.995 (+-0.120)
Input: (2, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=False, mode=bicubic | 26.457 (+-0.086) | 64.110 (+-0.233) | 66.340 (+-0.406) | 1.035 (+-0.000) | 26.145 (+-0.085)
Input: (2, 3, 345, 456) torch.float32, torch.channels_last, align_corners=False, mode=bicubic | 26.536 (+-0.094) | 73.742 (+-0.483) | 71.946 (+-0.460) | 0.976 (+-0.000) | 26.457 (+-0.166)
Times are in milliseconds (ms).
[------------------------------------------------------------------------------------------------------------------------------------ Affine grid sampling, cuda -----------------------------------------------------------------------------------------------------------------------------------]
| Eager (2.1.0a0+git1afae24) PR-afgg | Compiled (2.1.0a0+git1afae24) PR-afgg | Compiled (2.1.0a0+git16df542) Nightly | speed-up PR vs Nightly | Eager (2.1.0a0+git16df542) Nightly
1 threads: ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Input: (2, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=True, mode=bilinear | 91.971 (+-0.253) | 90.570 (+-0.193) | 137.206 (+-0.214) | 1.515 (+-0.000) | 84.280 (+-0.241)
Input: (2, 3, 345, 456) torch.float32, torch.channels_last, align_corners=True, mode=bilinear | 91.893 (+-0.361) | 89.866 (+-0.170) | 136.678 (+-0.471) | 1.521 (+-0.000) | 84.573 (+-0.214)
Input: (2, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=False, mode=bilinear | 116.967 (+-0.481) | 110.468 (+-0.326) | 223.770 (+-0.334) | 2.026 (+-0.000) | 108.098 (+-0.392)
Input: (2, 3, 345, 456) torch.float32, torch.channels_last, align_corners=False, mode=bilinear | 117.563 (+-0.546) | 111.438 (+-0.212) | 223.101 (+-0.350) | 2.002 (+-0.000) | 108.225 (+-0.395)
Input: (2, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=True, mode=nearest | 80.706 (+-0.289) | 70.525 (+-0.204) | 143.697 (+-0.311) | 2.038 (+-0.000) | 74.485 (+-0.258)
Input: (2, 3, 345, 456) torch.float32, torch.channels_last, align_corners=True, mode=nearest | 80.955 (+-0.208) | 69.986 (+-0.250) | 143.658 (+-0.244) | 2.053 (+-0.000) | 74.163 (+-0.238)
Input: (2, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=False, mode=nearest | 117.576 (+-0.435) | 71.179 (+-0.412) | 178.515 (+-0.539) | 2.508 (+-0.000) | 108.394 (+-0.473)
Input: (2, 3, 345, 456) torch.float32, torch.channels_last, align_corners=False, mode=nearest | 117.441 (+-0.205) | 70.313 (+-0.170) | 178.664 (+-0.555) | 2.541 (+-0.000) | 108.098 (+-0.416)
Input: (2, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=True, mode=bicubic | 92.962 (+-0.509) | 1740.964 (+-0.597) | 1785.401 (+-0.369) | 1.026 (+-0.000) | 92.638 (+-0.539)
Input: (2, 3, 345, 456) torch.float32, torch.channels_last, align_corners=True, mode=bicubic | 92.928 (+-0.493) | 1401.146 (+-0.732) | 1453.229 (+-0.628) | 1.037 (+-0.000) | 92.458 (+-0.428)
Input: (2, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=False, mode=bicubic | 118.152 (+-0.442) | 1740.644 (+-0.480) | 1793.475 (+-0.458) | 1.030 (+-0.000) | 107.962 (+-0.548)
Input: (2, 3, 345, 456) torch.float32, torch.channels_last, align_corners=False, mode=bicubic | 118.182 (+-0.425) | 1400.621 (+-0.624) | 1461.796 (+-0.630) | 1.044 (+-0.000) | 107.894 (+-0.994)
Times are in microseconds (us).
```
[Source](https://raw.githubusercontent.com/vfdev-5/pth-inductor-dev/master/output/20230801-220216-affine-grid-sampler-PR-afgg-vs-Nightly-speedup.md ), [script](https://github.com/vfdev-5/pth-inductor-dev/blob/master/perf_affine_grid_sampler.py )
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104709
Approved by: https://github.com/lezcano
2023-08-10 09:52:48 +00:00
Nikita Karetnikov
45e4706aff
[pt2] add decomps for multilabel_margin_loss_forward ops ( #105302 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105302
Approved by: https://github.com/ezyang
2023-07-23 02:16:29 +00:00
angelayi
fed8d3608d
Update core aten decomp table ( #105673 )
...
Updated the decomposition table based on the existing [Core ATen IR](https://pytorch.org/docs/stable/ir.html ) list, and moved the rest of the decompositions to Inductor's decomposition table.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105673
Approved by: https://github.com/SherlockNoMad
2023-07-21 02:45:37 +00:00
Peter Bell
9adfaf8807
[inductor] Add lowering for aten.unfold ( #105165 )
...
The decomposition for unfold uses `as_strided`, which forces the input to be
realized. Instead, this implements it as a `GenericView` with reindexing,
which removes the need to realize, though it does call `mark_reuse` in case
the input computation is expensive and the windows overlap.
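As an illustrative sketch of the windowing semantics that `unfold` implements (a pure-Python, 1-D analogue with a hypothetical helper name, not the actual Inductor lowering):

```python
def unfold_1d(x, size, step):
    # Sliding windows of length `size` taken every `step` elements,
    # mirroring torch.Tensor.unfold on a 1-D input. Windows overlap
    # whenever step < size, which is why the lowering marks the input
    # computation for reuse.
    n = (len(x) - size) // step + 1
    return [x[i * step : i * step + size] for i in range(n)]

print(unfold_1d([0, 1, 2, 3, 4, 5], size=3, step=2))
# [[0, 1, 2], [2, 3, 4]] -- element 2 appears in both windows
```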
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105165
Approved by: https://github.com/lezcano , https://github.com/jansel
2023-07-16 13:09:23 +00:00
William Wen
5cd861fcf7
Add empty/empty_like to core aten decomps ( #105158 )
...
Fixes https://github.com/pytorch/pytorch/issues/104871
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105158
Approved by: https://github.com/SherlockNoMad
2023-07-15 18:48:55 +00:00
Nikita Karetnikov
7e72126487
[pt2] add decomps for multi_margin_loss ops ( #104578 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104578
Approved by: https://github.com/ezyang , https://github.com/lezcano
2023-07-14 21:16:09 +00:00
Peter Bell
5c580a9846
[decomp] Add test tracking core ATen operators ( #104262 )
...
This adds an expect-test that finds the set of core ATen operators by
subtracting the operators with a decomposition in core_aten_decompositions from
the set of all operators that have decompositions and could be decomposed.
This is useful because if you add a new decomposition but forget to add it to
the list of core decompositions, it will appear in the PR diff.
Also, by going through this list I have identified some operators where the
functional variant is decomposed but the in-place variant is not, which must be
an oversight.
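A minimal sketch of the set arithmetic described above, using hypothetical operator names rather than the real registries:

```python
# Hypothetical stand-ins for the real decomposition registries:
# every operator that has a registered decomposition...
all_decomposed = {"aten.addr", "aten.unfold", "aten.block_diag", "aten.logit"}
# ...minus those whose decomposition is in core_aten_decompositions...
core_decompositions = {"aten.unfold", "aten.logit"}
# ...leaves the operators treated as core ATen (no core decomposition).
core_aten_ops = sorted(all_decomposed - core_decompositions)
print(core_aten_ops)
# ['aten.addr', 'aten.block_diag']
```

An expect-test snapshots this list, so a new decomposition that is missing from the core list shows up as a diff in that snapshot.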
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104262
Approved by: https://github.com/lezcano
2023-07-04 16:41:44 +00:00