pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
PyTorch MergeBot	98c329b19e	Revert "[core ATen IR] Add decompositions for max, min, var_mean (#110906 )" This reverts commit `9606cda64e`. Reverted https://github.com/pytorch/pytorch/pull/110906 on behalf of https://github.com/SS-JIA due to Breaks internal CI ([comment](https://github.com/pytorch/pytorch/pull/110906#issuecomment-1757490740))	2023-10-11 11:41:21 +00:00
SS-JIA	9606cda64e	[core ATen IR] Add decompositions for max, min, var_mean (#110906 ) ## Context Add decompositions for `aten.max`, `aten.min`, and `aten.var_mean`. These operators follow a pattern of returning a tuple of outputs from two component operators: ``` aten.max(x) -> return aten.amax(x), aten.argmax(x) aten.min(x) -> return aten.amin(x), aten.argmin(x) aten.var_mean(x) -> return aten.var(x), aten.mean(x) ``` For `var_mean`, the `refs` implementation was doing something similar, so I changed it to call `torch.` ops instead like was done for other `refs` implementations previously. cc: @peterbell10 @lezcano Note that Inductor lowers all these directly, so they are excluded from the Inductor decomp table. Pull Request resolved: https://github.com/pytorch/pytorch/pull/110906 Approved by: https://github.com/manuelcandales	2023-10-11 00:06:24 +00:00
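The tuple-of-two-ops pattern described in the commit above can be sketched in plain Python. This is only an analogue on lists, not the PR's code: `var_mean_sketch` and `max_with_index_sketch` are hypothetical names, and the sample variance from `statistics.variance` is assumed as the stand-in for `aten.var`.

```python
from statistics import mean, variance


def var_mean_sketch(xs):
    """Composite op expressed as a tuple of two simpler component ops."""
    # Sample variance (divides by n - 1) paired with the arithmetic mean.
    return variance(xs), mean(xs)


def max_with_index_sketch(xs):
    """Analogue of returning (amax, argmax) as a pair."""
    best = max(xs)
    # list.index returns the position of the first occurrence of the maximum.
    return best, xs.index(best)


var, avg = var_mean_sketch([1.0, 2.0, 3.0, 4.0])
value, index = max_with_index_sketch([3, 9, 2, 9])
```

Because each composite result is built only from the two component functions, a backend that supports the components needs no dedicated kernel for the composite.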
Kazuaki Ishizaki	fde28fdc8c	Fix typo under torch/_decomp directory (#110821 ) This PR fixes typo of comments in files under `torch/_decomp` directory. Pull Request resolved: https://github.com/pytorch/pytorch/pull/110821 Approved by: https://github.com/Skylion007	2023-10-08 20:33:49 +00:00
Stephen Jia	c2e7a0d689	[core IR] Add decomps for `aten.sum` and `aten.squeeze` variants (#110645 ) Summary: ## Context Both `aten.sum` and `aten.squeeze` have a "most generic" variant in the form of `aten.sum.dim_IntList` and `aten.squeeze.dims` respectively. Add decompositions for other non generic variants of these operators to express them using the most generic variant. Note that to register these decomps, the reference implementation under `_refs` had to be removed as registered decompositions. cc: @lezcano @peterbell10 Test Plan: Github CI + Meta Internal CI Differential Revision: D49965952 Pull Request resolved: https://github.com/pytorch/pytorch/pull/110645 Approved by: https://github.com/peterbell10, https://github.com/digantdesai, https://github.com/manuelcandales	2023-10-07 04:21:51 +00:00
cdzhan	7cc0020a80	[decomp] Fix different return type in threshold_backward vs. eager (#110689 ) due to type promotion with floating point scalar in decompositions.py Fixes part of #100838 Pull Request resolved: https://github.com/pytorch/pytorch/pull/110689 Approved by: https://github.com/ezyang	2023-10-06 20:59:58 +00:00
chilli	ceb773b68d	Fix #110680 (requires_grad typo in decomp) (#110687 ) Fixes https://github.com/pytorch/pytorch/issues/110680 Pull Request resolved: https://github.com/pytorch/pytorch/pull/110687 Approved by: https://github.com/voznesenskym, https://github.com/lezcano ghstack dependencies: #110501, #110504, #110591, #110668	2023-10-06 10:36:01 +00:00
Jerry Zhang	f2a1b93549	Back out "[quant] Support integer implementations for adaptive_avg_pool2d (#104226 )" (#110316 ) Summary: Original commit changeset: acdb5b34e3aa Original Phabricator Diff: D47321689 Test Plan: opinfo tests in CI Differential Revision: D49789403 Pull Request resolved: https://github.com/pytorch/pytorch/pull/110316 Approved by: https://github.com/kimishpatel	2023-10-03 16:59:23 +00:00
Stephen Jia	ff96f6d04f	[core IR][reland] Add `split.Tensor` and `unbind` decompositions to core ATen decomp table (#110323 ) Summary: This is a reland of [github PR #110102](https://github.com/pytorch/pytorch/pull/110102). The original PR had to be unlanded due to internal CI failures. This diff applies some small fixes to the failing tests to adjust to the new decompositions. Note that `lift_fresh` will not be decomposed for now, since it was found that [constant propagation looks specifically for `lift_fresh`](`13af952f94/torch/fx/experimental/proxy_tensor.py (L381-L386)`). Therefore decomposing `lift_fresh` will interfere with constant propagation during export. Test Plan: GitHub CI and internal CI Differential Revision: D49761321 Pull Request resolved: https://github.com/pytorch/pytorch/pull/110323 Approved by: https://github.com/jansel	2023-10-03 14:35:04 +00:00
Peter Bell	be3b16daad	[decomp] Fix baddbmm decomposition (#109714 ) The decomposition is currently registered without the pw_cast_for_opmath decorator, due to the ordering of decorators being meaningful. Pull Request resolved: https://github.com/pytorch/pytorch/pull/109714 Approved by: https://github.com/lezcano	2023-09-28 21:23:44 +00:00
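Why decorator order matters for registration can be shown with a small, self-contained sketch. The registry and both decorators here are hypothetical stand-ins (they are not `register_decomposition` or `pw_cast_for_opmath`); the point is only that a registering decorator stores whatever function it receives, so it must be applied last (written first) to capture the wrapped version.

```python
import functools

REGISTRY = {}


def register(name):
    """Hypothetical registration decorator: stores the function it is given."""
    def deco(fn):
        REGISTRY[name] = fn
        return fn
    return deco


def cast_to_float(fn):
    """Hypothetical casting decorator: converts all arguments to float."""
    @functools.wraps(fn)
    def wrapper(*args):
        return fn(*(float(a) for a in args))
    return wrapper


# Wrong order: `register` runs first (decorators apply bottom-up),
# so the registry holds the function *without* the cast.
@cast_to_float
@register("add_wrong")
def add_wrong(a, b):
    return a + b


# Right order: the cast is applied first, then the wrapped result is registered.
@register("add_right")
@cast_to_float
def add_right(a, b):
    return a + b


wrong = REGISTRY["add_wrong"](1, 2)
right = REGISTRY["add_right"](1, 2)
```

Calling `add_wrong` by name still goes through the cast, which is why this kind of bug is easy to miss: only lookups through the registry see the unwrapped function.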
PyTorch MergeBot	e0b035c220	Revert "[core IR] Add lift_fresh, split.Tensor, and unbind decompositions to core ATen decomp table (#110102 )" This reverts commit `22e706f768`. Reverted https://github.com/pytorch/pytorch/pull/110102 on behalf of https://github.com/atalman due to Breaks internal CI ([comment](https://github.com/pytorch/pytorch/pull/110102#issuecomment-1739856671))	2023-09-28 19:03:25 +00:00
SS-JIA	22e706f768	[core IR] Add lift_fresh, split.Tensor, and unbind decompositions to core ATen decomp table (#110102 ) ## Context Add existing decomps for `lift_fresh`, `split.Tensor`, and `unbind` to the core ATen decomposition table. Do not use them in inductor, since Inductor currently lowers these directly. One note though is that `lift_fresh`'s decomposition has a note saying it's not correct under autograd. However, my understanding is that these decompositions are registered to the `"post_autograd"` decomposition table, meaning autograd wouldn't be a factor. Would like some confirmation that this premise is correct. Pull Request resolved: https://github.com/pytorch/pytorch/pull/110102 Approved by: https://github.com/jansel	2023-09-28 01:21:45 +00:00
SS-JIA	dec140f1ea	[core IR] Add a core decomposition for aten.all (#110093 ) ## Context Change the ref implementation of `aten.all` to only use other `torch` operators such that we can use it for the core ATen decomposition table. This will replace the decomposition for `aten.all` that was used specifically by Inductor. Pull Request resolved: https://github.com/pytorch/pytorch/pull/110093 Approved by: https://github.com/manuelcandales, https://github.com/peterbell10, https://github.com/lezcano	2023-09-27 01:31:41 +00:00
SS-JIA	9928c10e71	[core IR] Add glu as a core decomposition (#110043 ) ## Context Add the decomposition for `aten.glu` as a decomposition in the core ATen decomposition table. Don't use it in the Inductor decomposition table since Inductor has a lowering for it. Pull Request resolved: https://github.com/pytorch/pytorch/pull/110043 Approved by: https://github.com/peterbell10, https://github.com/lezcano ghstack dependencies: #110046	2023-09-27 00:23:05 +00:00
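For readers unfamiliar with GLU, the operation splits its input in half and gates the first half with the sigmoid of the second. The sketch below works on a flat Python list and is only an illustration of that formula; `glu_sketch` is a hypothetical name and the actual decomposition operates on tensors along a chosen dimension.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def glu_sketch(xs):
    """Gated linear unit on a flat list: first_half * sigmoid(second_half)."""
    if len(xs) % 2 != 0:
        raise ValueError("input length must be even")
    half = len(xs) // 2
    a, b = xs[:half], xs[half:]
    return [ai * sigmoid(bi) for ai, bi in zip(a, b)]


gated = glu_sketch([1.0, 2.0, 0.0, 0.0])
```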
SS-JIA	5df8aca994	[core IR] Add a core decomposition for floor_divide (#110046 ) ## Context Introduce a core decomposition for `aten.floor_divide` into other `aten` ops, and add it to the core ATen decomposition table. This replaces the decomposition of `floor_divide` that was used by Inductor. I noticed there was a note on that decomposition ``` # TorchInductor-only decomposition. It should not be taken to core. # See https://github.com/pytorch/torchdynamo/pull/1120 ``` but couldn't discern the reason why this is the case. cc: @lezcano Pull Request resolved: https://github.com/pytorch/pytorch/pull/110046 Approved by: https://github.com/peterbell10	2023-09-26 08:39:21 +00:00
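One way to express floor division through simpler operations is a true division followed by a floor. The sketch below uses plain Python numbers and is not the code from the PR; `floor_divide_sketch` and `trunc_divide_sketch` are hypothetical names, included to show where flooring and truncation disagree.

```python
import math


def floor_divide_sketch(a, b):
    """Floor division built from true division plus floor."""
    return math.floor(a / b)


def trunc_divide_sketch(a, b):
    """Truncating division, shown for contrast: rounds toward zero."""
    return math.trunc(a / b)


floor_neg = floor_divide_sketch(-7, 2)
trunc_neg = trunc_divide_sketch(-7, 2)
floor_pos = floor_divide_sketch(7, 2)
```

The two roundings agree for positive quotients and differ by one for negative non-integer quotients, which is the case any such decomposition has to get right.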
Mwiza Kunda	5c4b5baf21	Fix python decomps for OpOverloadPackets and add tests (#107707 ) - Extend `test_torch_dispatch_meta_outplace` to test torch ops that do not have an out parameter but have aten op overloads that have out parameters. Additionally, Python decompositions may register `OpOverloadPacket`'s so decompositions need to be tested to ensure all `OpOverloads` still function for the `Meta` key (e.g. if a python decomposition is registered for an aten op `aten.foo` with overloads `[default, out]`, the python function needs to support receiving out arguments) - Add out parameter wrappers to python decomps for aten ops that have out overloads CC. @ezyang @albanD @lezcano Fixes #107713 Pull Request resolved: https://github.com/pytorch/pytorch/pull/107707 Approved by: https://github.com/lezcano	2023-09-25 20:53:30 +00:00
SS-JIA	7de669f2f9	[core IR] Remove trunc decomp and add trunc to core (#109902 ) Following up from [this comment](https://github.com/pytorch/pytorch/pull/109319#discussion_r1330803226). Remove the decomposition for `trunc`, and add it as a core operator. Going forward, provide similar treatment for operators that map cleanly to hardware instructions. Pull Request resolved: https://github.com/pytorch/pytorch/pull/109902 Approved by: https://github.com/peterbell10	2023-09-25 18:18:06 +00:00
Jijie Wei	334ead04a9	Back out "[decomp] Fix baddbmm decomposition (#109714 )" (#109855 ) Summary: Original commit changeset: 95c462a380c9 Original Phabricator Diff: D49484954 this diff cause test failure for deterministic ne test see:https://www.internalfb.com/sandcastle/job/18014399565419856/ Test Plan: buck2 test 'fbcode//mode/opt' fbcode//aps_models/ads/icvr/tests:icvr_fm_e2e_deterministic_ne_test -- --exact 'aps_models/ads/icvr/tests:icvr_fm_e2e_deterministic_ne_test - aps_models.ads.icvr.tests.icvr_fm_e2e_deterministic_ne_test.ICVR_FM_E2EDeterministicNeTest: test_e2e_deterministic_icvr_fm_pt2_fsdp_multi_gpus' https://www.internalfb.com/intern/testinfra/testrun/16888498605839953 Differential Revision: D49527271 Pull Request resolved: https://github.com/pytorch/pytorch/pull/109855 Approved by: https://github.com/yanboliang	2023-09-22 22:01:38 +00:00
Mwiza Kunda	8dedc9dd9b	Add meta tests for layer/group/batch norm backward (#109591 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/109591 Approved by: https://github.com/ezyang	2023-09-21 18:58:51 +00:00
Mwiza Kunda	6b7b9c796e	Fix registering jit decompositions for jvp for out wrapped decomps (#109367 ) Python decompositions wrapped by `out_wrapper` need to be unwrapped before compiling with TorchScript since: - `out_wrapper` extends the decompositions signature with an out parameter, however this `out` parameter is not present in the source code of the original decomposition so the resulting `ScriptFunction` will not have an `out` parameter - `out_wrapper` is in the `torch._prims_common.wrappers` module so its `globals()` are different to the globals of the decomposition to be wrapped. This may cause symbol resolution to fail with the TorchScript compiler since it is compiling the unwrapped decomps source code rather than the wrapper The python decomposition for `aten.trace` is wrapped as an example, other decompositions are to be fixed in https://github.com/pytorch/pytorch/pull/107707 Pull Request resolved: https://github.com/pytorch/pytorch/pull/109367 Approved by: https://github.com/lezcano	2023-09-21 16:36:51 +00:00
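The signature mismatch described in the commit above can be reproduced with the standard library alone. In this sketch, `out_wrapper_sketch` is a hypothetical stand-in for `out_wrapper` that works on lists; it is assumed, for illustration, that the wrapper advertises an extra keyword-only `out` parameter while the original function's source knows nothing about it.

```python
import functools
import inspect


def out_wrapper_sketch(fn):
    """Hypothetical stand-in: adds a keyword-only `out` list parameter."""
    @functools.wraps(fn)
    def wrapper(*args, out=None, **kwargs):
        result = fn(*args, **kwargs)
        if out is not None:
            out[:] = result  # write the result into the caller's buffer
            return out
        return result

    # Advertise the extended signature on the wrapper.
    sig = inspect.signature(fn)
    out_param = inspect.Parameter(
        "out", inspect.Parameter.KEYWORD_ONLY, default=None
    )
    wrapper.__signature__ = sig.replace(
        parameters=[*sig.parameters.values(), out_param]
    )
    return wrapper


@out_wrapper_sketch
def double(xs):
    return [2 * x for x in xs]


# functools.wraps records the original function, so it can be recovered.
original = inspect.unwrap(double)

wrapped_params = list(inspect.signature(double).parameters)
original_params = list(inspect.signature(original).parameters)
```

A compiler that works from the original function's source would see only `original_params`, which is why unwrapping before compilation keeps the signature and the source consistent.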
Peter Bell	6f0cf5a837	[decomp] Decompose unsafe_split{,_with_sizes} into safe variants (#109668 ) The "safety" aspect refers to the output not being registered as aliasing the input, but after AOTAutograd I don't think this distinction matters. However, we shouldn't use the same decomposition as the safe variant in case the backend doesn't want to decompose split. Pull Request resolved: https://github.com/pytorch/pytorch/pull/109668 Approved by: https://github.com/lezcano ghstack dependencies: #109667	2023-09-20 18:45:56 +00:00
Peter Bell	9e629dd73c	[decomp] Add all std and std_mean overloads to core decompositions (#109667 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/109667 Approved by: https://github.com/lezcano	2023-09-20 18:45:56 +00:00
Peter Bell	36a8105f54	[decomp] Fix baddbmm decomposition (#109714 ) The decomposition is currently registered without the pw_cast_for_opmath decorator, due to the ordering of decorators being meaningful. Pull Request resolved: https://github.com/pytorch/pytorch/pull/109714 Approved by: https://github.com/lezcano	2023-09-20 18:40:21 +00:00
Salil Desai	40b2c796dc	[Decomposition] baddbmm (#108534 ) Summary: Moving decomposition of baddbmm from _inductor/decomposition.py and include it in core_aten_decompositions `ff38c0e2f9/torch/_inductor/decomposition.py (L203)` Test Plan: Phabricator + OSS Tests Differential Revision: D48871741 Pull Request resolved: https://github.com/pytorch/pytorch/pull/108534 Approved by: https://github.com/SherlockNoMad	2023-09-20 12:49:32 +00:00
Salil Desai	d0cc623192	[Decomposition] _unsafe_view (#108713 ) Summary: Decomp already exists so just add it to core_aten_decompositions https://www.internalfb.com/code/fbsource/[9d5eabd7b213d1a356d4e7bb400355d574ea924b]/fbcode/caffe2/torch/_decomp/decompositions.py?lines=3091 Differential Revision: D48619079 Pull Request resolved: https://github.com/pytorch/pytorch/pull/108713 Approved by: https://github.com/larryliu0820, https://github.com/SherlockNoMad	2023-09-19 13:37:35 +00:00
Salil Desai	2e721aab98	[Decomposition] Trunc (#109319 ) Summary: Add Decomp for Trunc and add it to core_aten_decompositions Differential Revision: D49042033 Pull Request resolved: https://github.com/pytorch/pytorch/pull/109319 Approved by: https://github.com/SherlockNoMad	2023-09-19 13:30:13 +00:00
Salil Desai	ae66d0b3bf	[Decomposition] clamp_max (#108718 ) Summary: Decomp already exists so just add it to core_aten_decompositions https://www.internalfb.com/code/fbsource/[abda43a5a268e83fef6d62b49531a390ce915ad2]/fbcode/caffe2/torch/_refs/__init__.py?lines=1855 Differential Revision: D48880026 Pull Request resolved: https://github.com/pytorch/pytorch/pull/108718 Approved by: https://github.com/SherlockNoMad	2023-09-19 13:25:35 +00:00
Salil Desai	fc47ba2794	[Decomposition] clamp_min (#108717 ) Summary: Decomp already exists so just add it to core_aten_decompositions https://www.internalfb.com/code/fbsource/[abda43a5a268e83fef6d62b49531a390ce915ad2]/fbcode/caffe2/torch/_refs/__init__.py?lines=1846 Differential Revision: D48880080 Pull Request resolved: https://github.com/pytorch/pytorch/pull/108717 Approved by: https://github.com/SherlockNoMad	2023-09-18 12:43:58 +00:00
Salil Desai	a6d4cca7c0	[Decomposition] unsafe_split.Tensor (#108544 ) Summary: Include decomp in core_aten_decompositions Decomp already exists https://www.internalfb.com/code/fbsource/[03ff511cad587fc27ed8fd6a54b87845246e8e0c]/fbcode/caffe2/torch/_decomp/decompositions.py?lines=1209 Test Plan: OSS + Phabricator Tests Differential Revision: D48940445 Pull Request resolved: https://github.com/pytorch/pytorch/pull/108544 Approved by: https://github.com/larryliu0820, https://github.com/SherlockNoMad	2023-09-18 12:43:07 +00:00
Salil Desai	af93b29c5e	[Decomposition] std.correction (#108733 ) Summary: Include decomp in core_aten_decompositions Decomp: https://www.internalfb.com/code/fbsource/[e69bf00ff87a55c9a30bd7905881661ff05fa211]/fbcode/caffe2/torch/_refs/__init__.py?lines=2398 Differential Revision: D48940402 Pull Request resolved: https://github.com/pytorch/pytorch/pull/108733 Approved by: https://github.com/larryliu0820, https://github.com/SherlockNoMad	2023-09-18 11:38:23 +00:00
Jez Ng	db48bc80d9	Check index size during decomp of index_add (#108826 ) This partially fixes the `test_index_add_correctness` test (#108181) when run under inductor: it causes an exception to be raised [here][1] as expected. The test as a whole still cannot be made to pass under inductor because the [last assert][2] still fails, likely due to #108798. [1]: `dec2b267d4/test/test_torch.py (L6049)` [2]: `dec2b267d4/test/test_torch.py (L6051)` Pull Request resolved: https://github.com/pytorch/pytorch/pull/108826 Approved by: https://github.com/eellison	2023-09-13 13:06:26 +00:00
Ken Jin	c458fa0d35	Decompose/add reference for `view_as_complex` (#108005 ) Aten source: `d4a99631dd/aten/src/ATen/native/ComplexHelper.h (L78)` Documentation reference: https://pytorch.org/docs/stable/generated/torch.view_as_complex.html Note: this adds a new primitive `view_of_dtype`, which is trivially implemented, as its meta function is already implemented elsewhere. Finally, this is not registered as a decomposition (yet), because TorchInductor does not yet support complex types. It should be added once we do. Closes https://github.com/pytorch/pytorch/issues/108020 as well. Pull Request resolved: https://github.com/pytorch/pytorch/pull/108005 Approved by: https://github.com/peterbell10, https://github.com/ezyang	2023-09-07 23:49:20 +00:00
Edward Z. Yang	9f37aec964	Add torch._check_is_size (#108685 ) Check comments for what it does. The key distinction is that if you feed it an unbacked SymInt, we will also apply >= 2 assumption at compile time. This will get exercised when I reland https://github.com/pytorch/pytorch/pull/107788 Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/108685 Approved by: https://github.com/albanD, https://github.com/Skylion007	2023-09-07 12:48:39 +00:00
Sam Larsen	27fe45eaf6	[inductor][easy] Enable Mypy Checking for torch/_inductor/decomposition.py (#108682 ) Summary: Looks like one simple type mismatch between `get_decompositions()` and `remove_decompositions()` Test Plan: `lintrunner torch/_inductor/decomposition.py` Pull Request resolved: https://github.com/pytorch/pytorch/pull/108682 Approved by: https://github.com/eellison	2023-09-07 00:48:55 +00:00
Huy Do	5a4fe05a15	Revert "Force synced KJT to trace unbacked SymInt (#107788 )" (#108684 ) This reverts commit `3b92ef814d`. So let's manually revert it instead. (Not sure why the bot doesn't work on https://github.com/pytorch/pytorch/pull/107788) Pull Request resolved: https://github.com/pytorch/pytorch/pull/108684 Approved by: https://github.com/ezyang	2023-09-06 19:15:45 +00:00
Kimish Patel	ebed490c2f	[sdpa decomp] change sdpa decomp to be consistent with flash attention (#108608 ) Summary: See the comment in code for the reasons of the change Test Plan: buck2 test executorch/examples/export/test:test_export -- test_vit_export_to_executorch Differential Revision: D48992180 Pull Request resolved: https://github.com/pytorch/pytorch/pull/108608 Approved by: https://github.com/larryliu0820	2023-09-06 15:34:03 +00:00
Edward Z. Yang	3b92ef814d	Force synced KJT to trace unbacked SymInt (#107788 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/107788 Approved by: https://github.com/voznesenskym	2023-09-06 03:18:26 +00:00
Kimish Patel	cc50e654d4	[aten decomp] Update sdpa decomp (#108371 ) Summary: The earlier decomp routed the _flash* variant to the _match variant, which resulted in a failure during torch.export for a reason that I couldn't trace. However, it seems that we should really have a decomp for scaled_dot_product_attention instead of scaled_dot_product_flash_attention. This diff adds that. It also adds a test to check that a model exported via two-stage export has decomposed the op. This test needs improvement to figure out what the core ATen opset is and check for anything that is not inside it. Test Plan: test_model_exports_to_core_aten Differential Revision: [D48917461](https://our.internmc.facebook.com/intern/diff/D48917461) Pull Request resolved: https://github.com/pytorch/pytorch/pull/108371 Approved by: https://github.com/larryliu0820	2023-09-03 15:17:08 +00:00
lezcano	239ee76177	Add refs/decomps for dot/vdot (#108194 ) Follow-up on https://github.com/pytorch/pytorch/issues/108127#issuecomment-1698142427 Pull Request resolved: https://github.com/pytorch/pytorch/pull/108194 Approved by: https://github.com/peterbell10 ghstack dependencies: #108188	2023-08-31 15:30:23 +00:00
rzou	0e4752bafc	Allow registering decomps for HigherOrderOp; add decomp for out_dtype (#108080 ) We allow registering decomps for HigherOrderOp via the existing decomp mechanisms: - I refactored those APIs to accept torch._ops.OperatorBase, which is the base class for torch.ops.HigherOrderOperator and torch.ops.OpOverload - HigherOrderOps must directly call maybe_handle_decomp in their ProxyTorchDispatchMode handling in order to resolve decompositions. We can change this in the future so that they do not need to do this. Next, we add an inductor decomp for out_dtype. This decomp shouldn't be generally available because we want to preserve out_dtype to the backend for other use cases (i.e. executorch). Test Plan: - new tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/108080 Approved by: https://github.com/HDCharles	2023-08-31 03:15:38 +00:00
chilli	39130c7433	Add reinplacing pass for scatters + incremental fake tensor updating (#106192 ) mutation for params) Pull Request resolved: https://github.com/pytorch/pytorch/pull/106192 Approved by: https://github.com/jansel, https://github.com/eellison	2023-08-30 20:41:37 +00:00
Mengwei Liu	0fb1c05c5a	[pytorch] Add decomp rule for scaled_dot_product_attention (#108180 ) `scaled_dot_product_attention` used to be decomposed in pre-autograd, given that it calls `_scaled_dot_product_attention_math` and `_scaled_dot_product_attention_math` only has a `CompositeImplicitAutograd` kernel. As a result it's decomposed into ops with finer granularity. However recent PRs (#103826 #105131) added new logic in `scaled_dot_product_attention` and now it calls `_scaled_dot_product_flash_attention` which contains a CPU kernel. This results in `_scaled_dot_product_flash_attention` showing up in `torch.export()`. This PR adds a decomposition that ensures `scaled_dot_product_attention` is still being decomposed the same way as before, i.e., going through `_scaled_dot_product_attention_math`. Notice that this decomp rule should be excluded by inductor. Differential Revision: [D48762000](https://our.internmc.facebook.com/intern/diff/D48762000/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/108180 Approved by: https://github.com/SherlockNoMad	2023-08-30 15:52:08 +00:00
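The "finer granularity" that the math path decomposes into is, in essence, a scaled matrix product, a softmax, and a second matrix product. The sketch below shows that computation on nested Python lists. It is an illustration under simplifying assumptions (no mask, no dropout, no causal flag, a single head), and `sdpa_math_sketch` is a hypothetical name rather than the registered decomposition.

```python
import math


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def sdpa_math_sketch(q, k, v):
    """softmax(q @ k^T / sqrt(d)) @ v for 2-D lists (rows are tokens)."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for q_row in q:
        # Scaled dot product of this query with every key.
        scores = [
            scale * sum(a * b for a, b in zip(q_row, k_row)) for k_row in k
        ]
        weights = softmax(scores)
        # Weighted sum of the value rows.
        out.append(
            [
                sum(w * v_row[j] for w, v_row in zip(weights, v))
                for j in range(len(v[0]))
            ]
        )
    return out


# Two identical keys give uniform attention, so the output is the mean of v.
attended = sdpa_math_sketch(
    q=[[1.0, 0.0]],
    k=[[1.0, 0.0], [1.0, 0.0]],
    v=[[2.0, 4.0], [4.0, 8.0]],
)
```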
vfdev-5	0cfc5899f9	[inductor] Improved grid_sampler_2d decomposition for cuda (#104710 )
Description:
- Improved grid_sampler_2d decomposition code to generate single cuda kernel instead of two

Related to https://github.com/pytorch/pytorch/issues/104296

Perfs:
- speed-up on cuda (~x5) and cpu (~x2) for bicubic mode

```
Speed-up PR vs Nightly = ratio between columns "Compiled (2.1.0a0+git52598e9) PR" and "Compiled (2.1.0a0+gitcf76938) Nightly"

[------------------------------ Affine grid sampling, cpu ------------------------------]
                                                                                                   | Eager (2.1.0a0+git52598e9) PR | Compiled (2.1.0a0+git52598e9) PR | Compiled (2.1.0a0+gitcf76938) Nightly | speed-up PR vs Nightly | Eager (2.1.0a0+gitcf76938) Nightly
1 threads: ------------------------------------------------------------------------------
Input: (8, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=True, mode=bilinear  | 38.010 (+-0.118)  | 51.466 (+-1.257)  | 47.867 (+-0.124)  | 0.930 (+-0.000) | 33.654 (+-0.411)
Input: (8, 3, 345, 456) torch.float32, torch.channels_last, align_corners=True, mode=bilinear      | 35.532 (+-0.236)  | 52.189 (+-0.093)  | 58.979 (+-0.206)  | 1.130 (+-0.000) | 32.543 (+-0.198)
Input: (8, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=False, mode=bilinear | 38.187 (+-0.112)  | 47.892 (+-0.117)  | 45.833 (+-0.081)  | 0.957 (+-0.000) | 33.752 (+-0.116)
Input: (8, 3, 345, 456) torch.float32, torch.channels_last, align_corners=False, mode=bilinear     | 36.708 (+-0.244)  | 51.680 (+-0.104)  | 58.360 (+-0.108)  | 1.129 (+-0.000) | 32.576 (+-0.751)
Input: (8, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=True, mode=nearest   | 24.201 (+-0.088)  | 27.451 (+-0.059)  | 27.937 (+-0.081)  | 1.018 (+-0.000) | 24.367 (+-0.074)
Input: (8, 3, 345, 456) torch.float32, torch.channels_last, align_corners=True, mode=nearest       | 19.266 (+-0.105)  | 26.070 (+-0.085)  | 26.092 (+-0.054)  | 1.001 (+-0.000) | 20.144 (+-0.064)
Input: (8, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=False, mode=nearest  | 24.293 (+-0.125)  | 26.085 (+-0.064)  | 26.575 (+-0.061)  | 1.019 (+-0.000) | 24.515 (+-0.095)
Input: (8, 3, 345, 456) torch.float32, torch.channels_last, align_corners=False, mode=nearest      | 19.440 (+-0.075)  | 25.252 (+-0.059)  | 25.259 (+-0.051)  | 1.000 (+-0.000) | 19.770 (+-0.070)
Input: (8, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=True, mode=bicubic   | 114.900 (+-0.508) | 113.416 (+-1.271) | 248.679 (+-1.431) | 2.193 (+-0.000) | 114.609 (+-0.515)
Input: (8, 3, 345, 456) torch.float32, torch.channels_last, align_corners=True, mode=bicubic       | 115.973 (+-0.555) | 124.711 (+-1.596) | 282.187 (+-2.418) | 2.263 (+-0.000) | 115.368 (+-0.652)
Input: (8, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=False, mode=bicubic  | 111.730 (+-0.562) | 110.914 (+-0.865) | 253.899 (+-2.226) | 2.289 (+-0.000) | 111.285 (+-1.226)
Input: (8, 3, 345, 456) torch.float32, torch.channels_last, align_corners=False, mode=bicubic      | 112.859 (+-0.487) | 131.696 (+-1.298) | 294.124 (+-1.963) | 2.233 (+-0.000) | 110.910 (+-0.969)

Times are in milliseconds (ms).

[------------------------------ Affine grid sampling, cuda ------------------------------]
                                                                                                   | Eager (2.1.0a0+git52598e9) PR | Compiled (2.1.0a0+git52598e9) PR | Compiled (2.1.0a0+gitcf76938) Nightly | speed-up PR vs Nightly | Eager (2.1.0a0+gitcf76938) Nightly
1 threads: ------------------------------------------------------------------------------
Input: (8, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=True, mode=bilinear  | 228.811 (+-0.037) | 92.990 (+-0.446)  | 92.648 (+-0.286)  | 0.996 (+-0.000) | 228.274 (+-0.067)
Input: (8, 3, 345, 456) torch.float32, torch.channels_last, align_corners=True, mode=bilinear      | 222.107 (+-0.076) | 93.247 (+-0.387)  | 92.528 (+-0.423)  | 0.992 (+-0.000) | 221.922 (+-0.297)
Input: (8, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=False, mode=bilinear | 235.654 (+-0.055) | 75.781 (+-0.566)  | 115.865 (+-0.419) | 1.529 (+-0.000) | 236.032 (+-0.111)
Input: (8, 3, 345, 456) torch.float32, torch.channels_last, align_corners=False, mode=bilinear     | 226.752 (+-0.088) | 76.312 (+-0.328)  | 116.468 (+-0.477) | 1.526 (+-0.000) | 226.950 (+-0.027)
Input: (8, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=True, mode=nearest   | 225.540 (+-0.013) | 75.638 (+-0.341)  | 72.621 (+-0.292)  | 0.960 (+-0.000) | 225.937 (+-0.017)
Input: (8, 3, 345, 456) torch.float32, torch.channels_last, align_corners=True, mode=nearest       | 217.425 (+-0.024) | 75.484 (+-0.545)  | 73.518 (+-0.296)  | 0.974 (+-0.000) | 217.793 (+-0.008)
Input: (8, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=False, mode=nearest  | 231.474 (+-0.020) | 75.972 (+-0.339)  | 73.030 (+-0.387)  | 0.961 (+-0.000) | 231.991 (+-0.184)
Input: (8, 3, 345, 456) torch.float32, torch.channels_last, align_corners=False, mode=nearest      | 223.408 (+-0.016) | 75.622 (+-0.279)  | 73.542 (+-0.336)  | 0.973 (+-0.000) | 223.893 (+-0.021)
Input: (8, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=True, mode=bicubic   | 319.382 (+-0.023) | 149.060 (+-0.190) | 772.116 (+-0.266) | 5.180 (+-0.000) | 320.549 (+-0.387)
Input: (8, 3, 345, 456) torch.float32, torch.channels_last, align_corners=True, mode=bicubic       | 319.987 (+-0.134) | 154.443 (+-0.014) | 797.651 (+-0.232) | 5.165 (+-0.000) | 320.665 (+-0.397)
Input: (8, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=False, mode=bicubic  | 326.138 (+-0.439) | 149.092 (+-0.036) | 772.508 (+-0.259) | 5.181 (+-0.000) | 325.751 (+-0.398)
Input: (8, 3, 345, 456) torch.float32, torch.channels_last, align_corners=False, mode=bicubic      | 326.024 (+-0.118) | 154.452 (+-0.209) | 797.756 (+-0.229) | 5.165 (+-0.000) | 326.870 (+-0.372)

Times are in microseconds (us).
```

[Source](https://raw.githubusercontent.com/vfdev-5/pth-inductor-dev/master/output/20230828-134459-affine-grid-sampler-PR-vs-Nightly-speedup.md) Pull Request resolved: https://github.com/pytorch/pytorch/pull/104710 Approved by: https://github.com/lezcano	2023-08-29 05:54:24 +00:00
Sam Larsen	20f3808aa2	Implement decomposition for aten.tensor_split.tensor_indices_or_sections (#107251 ) Summary: Before this change, the tensor_indices_or_sections variant of aten.tensor_split causes a `RuntimeError: The tensor has a non-zero number of elements` due to that operation needing to introspect data. Decomposing into one of the other two tensor_split variants fixes the problem. Test Plan: Enabled tensor_split tests in test/inductor/test_torchinductor_opinfo.py Pull Request resolved: https://github.com/pytorch/pytorch/pull/107251 Approved by: https://github.com/ezyang, https://github.com/eellison	2023-08-28 17:01:23 +00:00
ssjia	86f9fec3ac	Avoid decomposing `_unsafe_index` in Inductor (#107882 ) `_unsafe_index` was previously added to the core ATen decomp table in https://github.com/pytorch/pytorch/pull/106814, but this has performance ramifications for Inductor. Therefore, this diff removes it from the decomposition table used by Inductor. Differential Revision: [D48649210](https://our.internmc.facebook.com/intern/diff/D48649210/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/107882 Approved by: https://github.com/SherlockNoMad	2023-08-25 04:51:53 +00:00
Vishwa Raj Singh	35de780aa6	Fix Inplace tensor update on transpose (#104689 ) Fixes #https://github.com/pytorch/pytorch/issues/103650 - To align with HPU device backend architecture. Ensure all non-view ops return contiguous fake tensor outputs. Pull Request resolved: https://github.com/pytorch/pytorch/pull/104689 Approved by: https://github.com/ezyang	2023-08-24 16:58:50 +00:00
Andrew Or	64d5851b1f	make python decomp for native_batch_norm CompositeImplicitAutograd, remove native_batch_norm from core aten opset (#107791 ) Summary: (From Brian Hirsh) Description copied from what I put in a comment in this PR: https://github.com/pytorch/pytorch/pull/106329 So, the slightly-contentious idea behind this PR is that lower in the stack, I updated torch._decomps.get_decomps() to check not only the decomp table to see if a given op has a decomposition available, but to also check the dispatcher for any decomps registered to the CompositeImplicitAutograd key (link: https://github.com/pytorch/pytorch/pull/105865/files#diff-7008e894af47c01ee6b8eb94996363bd6c5a43a061a2c13a472a2f8a9242ad43R190) There's one problem though: we don't actually make any hard guarantees that a given key in the dispatcher points does or does not point to a decomposition. We do rely pretty heavily, however, on the fact that everything registered to the CompositeImplicitAutograd key is in fact a decomposition into other ops. QAT would like this API to faithfully return "the set of all decomps that would have run if we had traced through the dispatcher". However, native_batch_norm is an example of an op that has a pre-autograd decomp registered to it (through op.py_impl(), but the decomp is registered directly to the Autograd key instead of being registered to the CompositeImplicitAutograd key. If we want to provide a guarantee to QAT that they can programatically access all decomps that would have run during tracing, then we need to make sure that every decomp we register to the Autograd key is also registered to the CompositeImplicitAutograd key. This might sound kind of painful (since it requires auditing), but I think in practice this basically only applies to native_batch_norm. 
Test Plan: python test/test_decomp.py Differential Revision: D48607575 Pull Request resolved: https://github.com/pytorch/pytorch/pull/107791 Approved by: https://github.com/jerryzh168, https://github.com/SherlockNoMad	2023-08-24 15:19:07 +00:00
Sherlock Huang	ee4b99cc3a	Decomp for aten.dropout (#106274 ) When exporting dropout with cpu tensor, we get following graph module ``` class GraphModule(torch.nn.Module): def forward(self, arg0_1: f32[512, 10]): empty_memory_format: f32[512, 10] = torch.ops.aten.empty.memory_format([512, 10], dtype = torch.float32, layout = torch.strided, device = device(type='cpu'), pin_memory = False, memory_format = torch.contiguous_format) bernoulli_p: f32[512, 10] = torch.ops.aten.bernoulli.p(empty_memory_format, 0.9); empty_memory_format = None div_scalar: f32[512, 10] = torch.ops.aten.div.Scalar(bernoulli_p, 0.9); bernoulli_p = None mul_tensor: f32[512, 10] = torch.ops.aten.mul.Tensor(arg0_1, div_scalar); arg0_1 = div_scalar = None return (mul_tensor,) ``` In addition, if we export with eval() mode, we will have an empty graph. However, when exporting with cuda tensor, we got ``` class GraphModule(torch.nn.Module): def forward(self, arg0_1: f32[512, 10]): native_dropout_default = torch.ops.aten.native_dropout.default(arg0_1, 0.1, True); arg0_1 = None getitem: f32[512, 10] = native_dropout_default[0]; native_dropout_default = None return (getitem,) ``` and exporting under eval() mode will still have a dropout node in graph. This PR make exporting with CPU tensor also produce aten.native_dropout. Pull Request resolved: https://github.com/pytorch/pytorch/pull/106274 Approved by: https://github.com/ezyang	2023-08-23 21:12:37 +00:00
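The CPU graph shown in the commit above follows a familiar recipe: draw a Bernoulli keep-mask with probability `1 - p`, divide it by the keep probability, and multiply by the input. The sketch below mirrors those three steps on a Python list. It is an illustration only; `dropout_sketch` and its `rng` argument are hypothetical, and it does not model the fused `aten.native_dropout` op that the PR targets.

```python
import random


def dropout_sketch(xs, p, train, rng):
    """Dropout as mask -> rescale -> multiply; identity when not training."""
    if not train or p == 0.0:
        return list(xs)
    keep = 1.0 - p
    # Step 1: Bernoulli mask, 1.0 with probability `keep`.
    mask = [1.0 if rng.random() < keep else 0.0 for _ in xs]
    # Step 2: rescale so the expected value of the output matches the input.
    scaled = [m / keep for m in mask]
    # Step 3: apply the scaled mask elementwise.
    return [x * s for x, s in zip(xs, scaled)]


inputs = [1.0, 2.0, 3.0, 4.0]
trained = dropout_sketch(inputs, p=0.1, train=True, rng=random.Random(0))
evaluated = dropout_sketch(inputs, p=0.1, train=False, rng=random.Random(0))
```

In evaluation mode the function returns its input unchanged, which matches the commit's observation that an eval-mode export of the decomposed form leaves nothing in the graph.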
Edward Z. Yang	5673c0874c	Use expect_true to make split with unbacked sizes work. (#106788 ) This pattern shows up in torchrec KeyedJaggedTensor. Most of the change in this PR is mechanical: whenever we failed an unbacked symint test due to just error checking, replace the conditional with something that calls expect_true (e.g., torch._check or TORCH_SYM_CHECK). Some of the changes are a bit more nuanced, I've commented on the PR accordingly. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/106788 Approved by: https://github.com/lezcano ghstack dependencies: #106720	2023-08-15 20:31:30 +00:00
lezcano	2c5f96deac	[Inductor] Make softshrink composite implicit (#107052 ) The backward is pretty much equivalent to the one we had written Pull Request resolved: https://github.com/pytorch/pytorch/pull/107052 Approved by: https://github.com/peterbell10 ghstack dependencies: #107038, #107039, #107051	2023-08-14 21:01:50 +00:00
lezcano	3b1254e800	Make hardshrink's decomp composite implicit (#107039 ) The generated code is the same Pull Request resolved: https://github.com/pytorch/pytorch/pull/107039 Approved by: https://github.com/peterbell10 ghstack dependencies: #107038	2023-08-14 21:01:50 +00:00

350 Commits