pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
Tugsbayasgalan Manlaibaatar	36e1f7bc2b	Add meta kernel coverage for aten.unsafe_split, aten.unsafe_chunk (#92608 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/92608 Approved by: https://github.com/ngimel	2023-01-22 07:12:29 +00:00
Peter Bell	dd760c98f8	[decomp] Use new squeeze.dims overload in decompositions (#91602 ) This removes the now-redundant `_squeeze_multiple` helpers and instead decomposes into a single call to `aten::squeeze.dims` which also has the effect of reducing the lowered graph size in inductor. Pull Request resolved: https://github.com/pytorch/pytorch/pull/91602 Approved by: https://github.com/ngimel	2023-01-20 18:08:18 +00:00
PyTorch MergeBot	2891cecd8d	Revert "Add meta kernel coverage for aten.unsafe_split, aten.unsafe_chunk (#92608 )" This reverts commit `4386f317b9`. Reverted https://github.com/pytorch/pytorch/pull/92608 on behalf of https://github.com/ZainRizvi due to test_aot_autograd_symbolic_exhaustive_unsafe_split_cpu_float32 (__main__.TestEagerFusionOpInfoCPU) is failing consistently since this PR was merged	2023-01-20 17:17:35 +00:00
Tugsbayasgalan Manlaibaatar	4386f317b9	Add meta kernel coverage for aten.unsafe_split, aten.unsafe_chunk (#92608 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/92608 Approved by: https://github.com/ngimel	2023-01-20 12:39:56 +00:00
lezcano	8b861544f9	Remove lowering and decompositions of zero_, zero, zeros_like... in favour of their references (#92071 ) The generated triton code is identical. Pull Request resolved: https://github.com/pytorch/pytorch/pull/92071 Approved by: https://github.com/ngimel	2023-01-18 23:22:36 +00:00
Peter Bell	8770a7ed6f	Decompose more inplace ops (#90967 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/90967 Approved by: https://github.com/anijain2305	2023-01-18 21:07:47 +00:00
Peter Bell	4058dedf21	Replace log(1 + x) with log1p(x) (#92114 ) `log1p` offers better precision near zero since `(1 + x) - 1` truncates any values less than the float epsilon to zero. For `soft_margin_loss` this also requires one fewer kernel invocation which for numel=1e7 gives me a 1.2x speedup on CUDA and a 1.1x speedup on CPU. Pull Request resolved: https://github.com/pytorch/pytorch/pull/92114 Approved by: https://github.com/ngimel, https://github.com/lezcano	2023-01-18 10:43:56 +00:00
lezcano	da58f9eb8f	Rewrite out-of-place decompositions in terms of out-of-place ops (#92003 ) Fixes https://github.com/pytorch/torchdynamo/issues/1863 Pull Request resolved: https://github.com/pytorch/pytorch/pull/92003 Approved by: https://github.com/ngimel	2023-01-17 16:53:27 +00:00
vfdev-5	5f55335c2e	Fixed output memory format mismatch for bicubic2d (#90470 ) Description: - output memory format is matching input for bicubic2d Problem: output tensor's memory format does not match input format for bicubic2d ```python import torch i = torch.rand(1, 3, 32, 32).contiguous(memory_format=torch.channels_last) assert i.is_contiguous(memory_format=torch.channels_last) o = torch.nn.functional.interpolate(i, size=(4, 4), mode="bicubic") assert o.is_contiguous(memory_format=torch.channels_last), f"Should be channels last but given channels first ({o.is_contiguous(memory_format=torch.contiguous_format)})" > AssertionError: Should be channels last but given channels first (True) ``` Related PR fixing bilinear ops: https://github.com/pytorch/pytorch/pull/53535 (cc @VitalyFedyunin @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @bdhirsh ) Discovered together with @NicolasHug while working on https://github.com/pytorch/pytorch/tree/interpolate_uint8_images_linear_cpu_support_dev - Updated code to match grad input / output memory formats - temporary tensor creation matches memory format in `separable_upsample_generic_Nd_kernel_impl` - Updated tests - Added missing forward AD support for bicubic with antialiasing Pull Request resolved: https://github.com/pytorch/pytorch/pull/90470 Approved by: https://github.com/NicolasHug, https://github.com/lezcano	2023-01-12 19:52:28 +00:00
min-jean-cho	af242eedfb	[Inductor] Added aten.uniform_ decomp (#90869 ) Fixes #90815 Pull Request resolved: https://github.com/pytorch/pytorch/pull/90869 Approved by: https://github.com/jgong5, https://github.com/jansel, https://github.com/lezcano, https://github.com/ngimel, https://github.com/albanD	2023-01-11 23:23:42 +00:00
David Berard	d7dc1c2fd5	Support zero dimensions in softmax decompositions (#91322 ) The eager implementation of softmax supports computation along zero dimensions, but many of the other implementations did not, including: * decompositions & refs (this was causing dynamo failures) * forward AD for logsumexp * MPS log_softmax_backward This PR handles the `input.numel() == 0` cases separately to avoid running `amax()`, which fails for zero dimensions, and updates opinfos. example of "computation along zero dimensions": ```python # example of where import torch t = torch.rand((4, 0, 0)) print("~") print(torch.nn.functional.softmax(t, dim=-1)) # this passes print("~") torch._refs.softmax(t, dim=-1) # this fails print("~") ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/91322 Approved by: https://github.com/lezcano	2023-01-11 09:35:43 +00:00
XiaobingSuper	3790b50505	inductor: fix .to(memory_format) issue which doesn't generate right stride (#91948 ) Motivation: for .to(memory_format), the inductor doesn't generate the right stride, see the following example: ``` class Model(torch.nn.Module): def __init__(self): super(Model, self).__init__() def forward(self, x): x = x.to(memory_format=torch.contiguous_format) return x ``` The generated code doesn't perform the memory format change and returns the wrong stride (802816, 1, 14336, 256), which is not a contiguous stride. ``` from ctypes import c_void_p, c_long import torch import random from torch import empty_strided, as_strided, device from torch._inductor.codecache import AsyncCompile aten = torch.ops.aten assert_size_stride = torch._C._dynamo.guards.assert_size_stride async_compile = AsyncCompile() async_compile.wait(globals()) del async_compile def call(args): arg0_1, = args args.clear() return (arg0_1, ) if __name__ == "__main__": from torch._dynamo.testing import rand_strided from torch._inductor.utils import print_performance arg0_1 = rand_strided((128, 256, 56, 56), (802816, 1, 14336, 256), device='cpu', dtype=torch.float32) print_performance(lambda: call([arg0_1])) ``` After this PR, the generated code performs the memory format change: ``` from ctypes import c_void_p, c_long import torch import random from torch import empty_strided, as_strided, device from torch._inductor.codecache import AsyncCompile aten = torch.ops.aten assert_size_stride = torch._C._dynamo.guards.assert_size_stride async_compile = AsyncCompile() kernel_cpp_0 = async_compile.cpp(''' #include "/tmp/torchinductor_xiaobing/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" extern "C" void kernel(const float* __restrict__ in_ptr0, float* __restrict__ out_ptr0) { #pragma omp parallel num_threads(40) { { #pragma omp for for(long i0=0; i0<128; i0+=1) { #pragma GCC ivdep for(long i1=0; i1<256; i1+=1) { #pragma GCC ivdep for(long i2=0; i2<3136; i2+=1) { auto tmp0 = in_ptr0[i1 + (256*i2) + (802816*i0)]; out_ptr0[i2 + (3136*i1) + (802816*i0)] = tmp0; } } } } } } ''') async_compile.wait(globals()) del async_compile def call(args): arg0_1, = args args.clear() buf1 = empty_strided((128, 256, 56, 56), (802816, 3136, 56, 1), device='cpu', dtype=torch.float32) kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf1.data_ptr())) del arg0_1 return (buf1, ) if __name__ == "__main__": from torch._dynamo.testing import rand_strided from torch._inductor.utils import print_performance arg0_1 = rand_strided((128, 256, 56, 56), (802816, 1, 14336, 256), device='cpu', dtype=torch.float32) print_performance(lambda: call([arg0_1])) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/91948 Approved by: https://github.com/ngimel	2023-01-11 08:23:26 +00:00
min-jean-cho	364f526b9c	[Inductor] assert generator for random, dropout (#91833 ) See comment https://github.com/pytorch/pytorch/pull/90869#discussion_r1063731541 , https://github.com/pytorch/pytorch/pull/91673#discussion_r1061099337. Pull Request resolved: https://github.com/pytorch/pytorch/pull/91833 Approved by: https://github.com/jansel	2023-01-11 03:24:10 +00:00
PyTorch MergeBot	43050b8301	Revert "[Inductor] Added aten.uniform_ decomp (#90869 )" This reverts commit `c55293d640`. Reverted https://github.com/pytorch/pytorch/pull/90869 on behalf of https://github.com/huydhn due to Crossref error cannot just simply be ignored because it would break trunk for every commits after this, i.e. `fd0030fe74`. The failure would need to be handled gracefully, i.e. adding an XFAIL for example	2023-01-11 01:18:11 +00:00
min-jean-cho	c55293d640	[Inductor] Added aten.uniform_ decomp (#90869 ) Fixes #90815 Pull Request resolved: https://github.com/pytorch/pytorch/pull/90869 Approved by: https://github.com/jgong5, https://github.com/jansel, https://github.com/lezcano, https://github.com/ngimel, https://github.com/albanD	2023-01-10 23:05:01 +00:00
Nikita Karetnikov	00e5f3a9c5	[primTorch] Move `logsumexp` decomp to refs (#91860 ) Fixes #91843. Pull Request resolved: https://github.com/pytorch/pytorch/pull/91860 Approved by: https://github.com/lezcano	2023-01-09 17:00:43 +00:00
Natalia Gimelshein	2c00064113	remove unnecessary decomps (#91828 ) in favor of refs. Generated triton code is the same. Pull Request resolved: https://github.com/pytorch/pytorch/pull/91828 Approved by: https://github.com/lezcano, https://github.com/soumith	2023-01-07 20:37:12 +00:00
PyTorch MergeBot	c73147f741	Revert "[decomp] Use new squeeze.dims overload in decompositions (#91602 )" This reverts commit `9262ffc692`. Reverted https://github.com/pytorch/pytorch/pull/91602 on behalf of https://github.com/clee2000 due to stacked pr was reverted, this is dependent	2023-01-05 20:39:52 +00:00
Peter Bell	9262ffc692	[decomp] Use new squeeze.dims overload in decompositions (#91602 ) This removes the now-redundant `_squeeze_multiple` helpers and instead decomposes into a single call to `aten::squeeze.dims` which also has the effect of reducing the lowered graph size in inductor. Pull Request resolved: https://github.com/pytorch/pytorch/pull/91602 Approved by: https://github.com/ngimel	2023-01-05 17:59:32 +00:00
lezcano	484dd40022	Implement PReLU in a compositional way (#91238 ) The PReLU implementation was all over the place. This led to a number of bugs like https://github.com/pytorch/pytorch/issues/68760. We fix it by: - Keeping the weird broadcasting logic it has as a CompositeImplicit kernel that calls into a second kernel - This second kernel is just a good-ol' pointwise kernel. - We implement the derivative for the pointwise kernel via TI as well for speed. - We implement the second derivative for the pointwise kernel and the forward AD derivatives compositionally. This fixes a number of issues: - We don't perform copies any more when the inputs are not contiguous - The derivatives are now correct - We fix vmap and many other functorch-related issues. - CPU and CUDA now share the relevant broadcasting logic - The implementation is about 1/3 the length. Fixes https://github.com/pytorch/pytorch/issues/68760 Fixes https://github.com/pytorch/pytorch/issues/89895 Pull Request resolved: https://github.com/pytorch/pytorch/pull/91238 Approved by: https://github.com/kshitij12345, https://github.com/jbschlosser, https://github.com/albanD	2022-12-30 10:42:30 +00:00
Joel Schlosser	8b55b86dbd	Move sym_int and sym_float alongside SymInt / SymFloat in base torch package (#91317 ) This PR moves the definitions for: * `sym_int` * `sym_ceil` (used only for `sym_int`) * `sym_floor` (used only for `sym_int`) * `sym_float` from `torch/fx/experimental/symbolic_shapes.py` to `torch/__init__.py`, where `SymInt` and `SymFloat` are already defined. This removes the need for several in-line imports, and enables proper JIT script gating for #91318. I'm very open to doing this in a better way! Pull Request resolved: https://github.com/pytorch/pytorch/pull/91317 Approved by: https://github.com/ezyang, https://github.com/anijain2305	2022-12-28 16:08:16 +00:00
Joel Schlosser	1c40ec46ff	Decomps and meta registrations for upsample_nearest 1D / 2D / 3D (#91260 ) Adds decompositions and meta registrations for the 1D, 2D, and 3D implementations of `upsample_nearest`. All related OpInfo-based tests for AOTAutograd now pass. Pull Request resolved: https://github.com/pytorch/pytorch/pull/91260 Approved by: https://github.com/ezyang	2022-12-28 16:03:25 +00:00
Nikita Shulga	fd3a7264ae	[MPS] Add `group_norm[fwd+backward]` and `mean_var` (take 2) (#91190 ) Use Prims to implement group_norm, group_norm_backward and mean_var. Use `torch._ops.ops` instead of `torch.ops` in numerous subpackages so that they can be imported from `torch/backends/mps/__init__.py`, because the line that defines this alias (`15af4b1cee/torch/__init__.py (L1095)`) is executed last during the init process. Add `__all__` to `torch/backends/mps/__init__.py` as well as alias all imports as private. Add `TestNNMPS.test_group_norm_backward` that validates no NaNs are generated during the backward pass. Fixes https://github.com/pytorch/pytorch/issues/88331 Pull Request resolved: https://github.com/pytorch/pytorch/pull/91190 Approved by: https://github.com/albanD	2022-12-22 08:54:37 +00:00
PyTorch MergeBot	645eda0a00	Revert "[MPS] Add `group_norm[fwd+backward]` and `mean_var` (#91190 )" This reverts commit `371716eb36`. Reverted https://github.com/pytorch/pytorch/pull/91190 on behalf of https://github.com/kit1980 due to Broke test_correct_module_names because of underscore _ops	2022-12-21 19:37:43 +00:00
Nikita Shulga	371716eb36	[MPS] Add `group_norm[fwd+backward]` and `mean_var` (#91190 ) Use Prims to implement group_norm, group_norm_backward and mean_var. Use `torch._ops.ops` instead of `torch.ops` in numerous subpackages so that they can be imported from `torch/backends/mps/__init__.py`, because the line that defines this alias (`15af4b1cee/torch/__init__.py (L1095)`) is executed last during the init process. Depends on https://github.com/pytorch/pytorch/pull/91203 Fixes https://github.com/pytorch/pytorch/issues/88331 Pull Request resolved: https://github.com/pytorch/pytorch/pull/91190 Approved by: https://github.com/albanD	2022-12-21 17:33:27 +00:00
Nikita Shulga	46f64117db	[BE] Use `aten` global var (#91188 ) s/torch.ops.aten/aten/ Pull Request resolved: https://github.com/pytorch/pytorch/pull/91188 Approved by: https://github.com/ngimel	2022-12-21 02:28:51 +00:00
Peter Bell	e670c261c5	Decompose fill, zero, and zeros_like (#90968 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/90968 Approved by: https://github.com/ngimel	2022-12-21 00:59:50 +00:00
Natalia Gimelshein	e689c50922	Don't recompute var in bn decomp (#90984 ) Fixes https://github.com/pytorch/torchdynamo/issues/1988 Repeated `var` computation is not CSE'd for some reason. Pull Request resolved: https://github.com/pytorch/pytorch/pull/90984 Approved by: https://github.com/Chillee	2022-12-16 21:38:49 +00:00
Brian Hirsh	7a683eaeb8	aot_autograd: add assert for functional-only graph (#88816 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88816 Approved by: https://github.com/ezyang, https://github.com/ngimel	2022-12-16 21:04:36 +00:00
soulitzer	98a9235dce	Fix prelu ref when a.ndim < 2 (#89809 ) Fixes https://github.com/pytorch/pytorch/issues/89560 Previously the test case for "input is 1-D or scalar + weight is not scalar" did not exist; adding it introduced some failures: - forward AD (fixed in this PR) - vmap (filed https://github.com/pytorch/pytorch/issues/89895) - ref/meta (fixed this PR, though this also regresses nvFuser support) Pull Request resolved: https://github.com/pytorch/pytorch/pull/89809 Approved by: https://github.com/ngimel	2022-12-12 23:55:31 +00:00
Bin Bao	282dfe8ba4	[inductor][Reland] Use decomposition for _to_copy (#90494 ) Summary: also contains a fix for https://github.com/pytorch/pytorch/issues/89633 Pull Request resolved: https://github.com/pytorch/pytorch/pull/90494 Approved by: https://github.com/ngimel	2022-12-09 16:51:50 +00:00
PyTorch MergeBot	e89685b0b5	Revert "[inductor] Use decomposition for _to_copy (#90314 )" This reverts commit `3fdb5f2dda`. Reverted https://github.com/pytorch/pytorch/pull/90314 on behalf of https://github.com/desertfire due to regresses performance on hf_Bert	2022-12-08 18:29:06 +00:00
Bin Bao	3fdb5f2dda	[inductor] Use decomposition for _to_copy (#90314 ) Summary: also contains a fix for https://github.com/pytorch/pytorch/issues/89633 Pull Request resolved: https://github.com/pytorch/pytorch/pull/90314 Approved by: https://github.com/ngimel	2022-12-08 15:25:44 +00:00
Peter Bell	e6a7278753	Give std/var correction overloads proper defaults (#56398 ) The correction overloads defaults were left off for forward compatibility reasons, but this FC window expired well over a year ago at this point. Differential Revision: [D29625593](https://our.internmc.facebook.com/intern/diff/D29625593) Pull Request resolved: https://github.com/pytorch/pytorch/pull/56398 Approved by: https://github.com/mruberry	2022-12-07 15:15:00 +00:00
Yanbo Liang	25f39c1bce	Fix uniform ref implementation (#90094 ) Fixes https://github.com/pytorch/torchdynamo/issues/1954 Pull Request resolved: https://github.com/pytorch/pytorch/pull/90094 Approved by: https://github.com/ngimel	2022-12-06 21:28:17 +00:00
Animesh Jain	c1950620c5	[decomp] Fix native_batch_norm_backward dtype of dweight and dbias (#89740 ) Discovered while debugging an accuracy issue for Inductor. Pull Request resolved: https://github.com/pytorch/pytorch/pull/89740 Approved by: https://github.com/soumith, https://github.com/ngimel	2022-11-29 03:15:20 +00:00
Brian Hirsh	e20ec44544	fixes for inductor <> batch norm (#89603 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/89603 Approved by: https://github.com/albanD	2022-11-29 02:16:52 +00:00
Jane Xu	8695f0cced	Rectify `native_batch_norm` schema by splitting it into two legit schemas (#88697 ) Using the same repro from the issue (but with BatchNorm2D) Rectifies native_batch_norm schema by splitting the schema into 2: 1. one will have NON-optional alias-able running_mean and running_var inputs 2. the other will just not have those parameters at all (no_stats variation) Calling for name suggestions! ## test plan I've added tests in test_functionalization.py as well as an entry in common_method_invocations.py for `native_batch_norm_legit` CI should pass. ## next steps Because of bc/fc reasons, we reroute native_batch_norm to call our new schemas ONLY through the python dispatcher, but in 2 weeks or so, we should make `native_batch_norm_legit` the official batch_norm. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88697 Approved by: https://github.com/albanD	2022-11-23 23:23:17 +00:00
Elias Ellison	a8d6b82167	Fix norm decomp when dtype is passed in (#89508 ) Fix for https://github.com/pytorch/torchdynamo/issues/1889. The wrapper was doing a downcast even when the dtype was explicitly passed in. Pull Request resolved: https://github.com/pytorch/pytorch/pull/89508 Approved by: https://github.com/anijain2305	2022-11-23 20:49:09 +00:00
Elias Ellison	72110d7833	Fix Upsample Decomp Striding For Small Channels (#89528 ) Fix for https://github.com/pytorch/torchdynamo/issues/623. Pull Request resolved: https://github.com/pytorch/pytorch/pull/89528 Approved by: https://github.com/ngimel, https://github.com/anijain2305	2022-11-23 20:47:39 +00:00
lezcano	154e58c032	Add most in-place references/decompositions (#88117 ) We add most in-place references in a generic way. We also implement a wrapper to implement the annoying interface that `nn.functional` nonlinearities have. We fix along the way a couple decompositions for some non-linearities by extending the arguments that the references have. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88117 Approved by: https://github.com/mruberry	2022-11-18 14:59:46 +00:00
lezcano	3320915303	Fix decomp for embedding_backward and simplify the decomposition of embedding_dense and embedding_dense_backward (#87204 ) See the title Pull Request resolved: https://github.com/pytorch/pytorch/pull/87204 Approved by: https://github.com/Chillee	2022-11-16 17:46:54 +00:00
Sherlock Huang	5faa2792fa	Symintify decomps for split and upsample_bilinear; Fix decomp for _softmax_backward_data and native_dropout_backward (#88761 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88761 Approved by: https://github.com/ezyang	2022-11-15 13:34:45 +00:00
PyTorch MergeBot	eea506aee1	Revert "Symintify decomps for split and upsample_bilinear; Fix decomp for _softmax_backward_data and native_dropout_backward (#88761 )" This reverts commit `9eabcc370f`. Reverted https://github.com/pytorch/pytorch/pull/88761 on behalf of https://github.com/suo due to much broken `9eabcc370f`	2022-11-14 01:58:47 +00:00
Sherlock Huang	9eabcc370f	Symintify decomps for split and upsample_bilinear; Fix decomp for _softmax_backward_data and native_dropout_backward (#88761 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88761 Approved by: https://github.com/ezyang	2022-11-13 21:30:53 +00:00
Horace He	37c5b42fa6	Fix matmul decomp to use reshape instead of contiguous().view() (#88832 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88832 Approved by: https://github.com/bertmaher, https://github.com/ngimel	2022-11-12 00:15:42 +00:00
Ryan Spring	534ae6ae47	[primTorch] Implement group norm reference (#87054 ) Add group norm reference Split from #81191 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87054 Approved by: https://github.com/mruberry	2022-11-11 01:08:20 +00:00
Sherlock Huang	c00c34fb69	Fix meta for aten.upsample_bilinear2d.vec (#88158 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88158 Approved by: https://github.com/ngimel	2022-11-02 16:58:29 +00:00
Sherlock Huang	de1f641f11	Fix meta function for aten.addmm (#88068 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88068 Approved by: https://github.com/albanD	2022-11-01 17:05:48 +00:00
lezcano	fd27246c16	Fix decomposition for std (#87181 ) The previous implementation was lacking a few features and incurred a pretty large error. cc @ezyang @mruberry @ngimel @Lezcano @fdrocha Pull Request resolved: https://github.com/pytorch/pytorch/pull/87181 Approved by: https://github.com/ngimel, https://github.com/peterbell10	2022-10-28 00:50:29 +00:00
Sherlock Huang	eb99c1efce	Prefer python meta function over c++ meta function (#87426 ) This is a policy update for meta registration. We now prefer the python meta implementation over the C++ meta function. This is a flip of the previous policy, where we preferred the C++ meta function over the python meta function if they both existed. Here's the meta registration process: 1. register_meta and register_decomposition will place the python meta/decomp functions into the `global_decomp_table`. However, they will NOT register them into the dispatcher. 2. After global_decomp_table is populated, we will compile an `active_meta_table`. For a given op, we pick the most specific decomp function from `global_decomp_table` in the preference order of Meta > PostAutograd > PreAutograd. 3. We will unconditionally register all of them into the python dispatcher, and register them into the C++ dispatcher unless it is one of the following 3 cases - 1. the op is a CompositeImplicitAutograd, and should rely on the decomposed op's meta - 2. the op is a view op, as the MetaTensor doesn't support aliased storage - 3. the op is in the blocklist (due to UT failures, and we will burn down this list op by op) Over the long run, we wish to implement all meta functions in python. With this PR, 321 op_overloads will have cpp meta overridden by python meta. There are still 400 op_overloads using cpp meta. The exact list can be found here https://gist.github.com/SherlockNoMad/d20bb736178df8eebd3b054c8bb7cdc5 cc @ngimel @jansel @lezcano @fdrocha @mlazos @soumith @voznesenskym @yanboliang Pull Request resolved: https://github.com/pytorch/pytorch/pull/87426 Approved by: https://github.com/ezyang, https://github.com/jansel	2022-10-25 16:49:02 +00:00
Ryan Spring	9bb4926de0	Add xlogy and xlog1py references (#77712 ) * Add reference implementations for `xlogy` and `xlog1py` * Replace `_wrap_scalar` helper function with `scalar_tensor` prim Pull Request resolved: https://github.com/pytorch/pytorch/pull/77712 Approved by: https://github.com/mruberry	2022-10-22 17:59:25 +00:00
Edward Z. Yang	d73d4aa7de	Audit for error prone isinstance int/float and add lint (#87345 ) We recently fixed a bug on symbolic-shapes branch where an isinstance(x, int) test failed when passed a SymIntNode. To prevent this, I've added a lint for all the codepaths where we may pass SymInt/SymFloat directly to reject direct isinstance int/float tests, and instead use one of the aliases. The lint rule explains the options. I then go and fix all of them. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/87345 Approved by: https://github.com/bdhirsh, https://github.com/albanD	2022-10-21 15:55:24 +00:00
Sherlock Huang	f7da9db9c1	Unify decomp registries into global_decomposition_table (#86857 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/86857 Approved by: https://github.com/ezyang	2022-10-20 21:29:05 +00:00
Sherlock Huang	ef045695e0	Fix decomp for huber_loss_backward (#86955 ) Fixes https://github.com/pytorch/pytorch/issues/86846 aten.huber_loss_backward calls aten.huber_loss_backward.out in its CompositeExplicitAutograd kernel. The decomp was mistakenly registered for both aten.huber_loss_backward.default and aten.huber_loss_backward.out. Pull Request resolved: https://github.com/pytorch/pytorch/pull/86955 Approved by: https://github.com/Chillee	2022-10-14 18:53:02 +00:00
Nikita Karetnikov	4460e40db4	[primTorch] Add a ref for `addcmul` (#86731 ) Based on: https://github.com/pytorch/pytorch/pull/79827 https://github.com/pytorch/pytorch/pull/72949 Pull Request resolved: https://github.com/pytorch/pytorch/pull/86731 Approved by: https://github.com/lezcano, https://github.com/mruberry	2022-10-14 14:26:23 +00:00
Brian Hirsh	e17732b234	[test] add cross-ref tests for python meta kernels (#86228 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/86228 Approved by: https://github.com/albanD	2022-10-13 14:14:26 +00:00
Elias Ellison	d3f7c34cb3	Enable aten-aten decomps (#85921 ) Invokes aten-aten decomps with re-entrant FakeMode. These decomps are being used in other places, so it's good to unify the path static fake tensor takes / get additional testing etc. There is also an instance where we return different devices with cpu/cuda which this fixes ([batch_norm](https://github.com/pytorch/pytorch/blob/master/torch/_decomp/decompositions.py#L1374)) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85921 Approved by: https://github.com/ezyang	2022-10-08 05:12:42 +00:00
PyTorch MergeBot	7ec12a559c	Revert "Enable aten-aten decomps (#85921 )" This reverts commit `62e4f51efd`. Reverted https://github.com/pytorch/pytorch/pull/85921 on behalf of https://github.com/huydhn due to Sorry for reverting your PR. I think it breaks a dynamo test in trunk `62e4f51efd`	2022-10-08 01:59:54 +00:00
Elias Ellison	62e4f51efd	Enable aten-aten decomps (#85921 ) Invokes aten-aten decomps with re-entrant FakeMode. These decomps are being used in other places, so it's good to unify the path static fake tensor takes / get additional testing etc. There is also an instance where we return different devices with cpu/cuda which this fixes ([batch_norm](https://github.com/pytorch/pytorch/blob/master/torch/_decomp/decompositions.py#L1374)) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85921 Approved by: https://github.com/ezyang	2022-10-07 21:04:39 +00:00
lezcano	28a0b3fb18	Fix col2im and im2col decompositions (#86426 ) I threw in some tests for good measure. Fixes https://github.com/pytorch/pytorch/issues/86332 Pull Request resolved: https://github.com/pytorch/pytorch/pull/86426 Approved by: https://github.com/ngimel	2022-10-07 08:14:06 +00:00
Elias Ellison	9ceadcadb2	Fix unfold backward decomp aliasing for 0 dim input (#86428 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/86428 Approved by: https://github.com/ngimel, https://github.com/ezyang	2022-10-07 03:55:31 +00:00
lezcano	b67e022833	Fix ref / decomposition index_add (#86266 ) The decomposition of `index_add` was using `slice(None)`, when it should use just `None`. The reference for index_add was also wrong, as `x[idx] += t` does not use atomic add, so it does not work when several `idx`s point to the same location. This PR adds extra reference inputs to help test for this. Fixes https://github.com/pytorch/torchdynamo/issues/1356 Pull Request resolved: https://github.com/pytorch/pytorch/pull/86266 Approved by: https://github.com/ngimel	2022-10-05 19:59:15 +00:00
lezcano	c609768896	Add refs for torch.unfold and a decomposition for its backward. (#85629 ) It's not clear to me what the difference is between `unfold` and `unfold_copy`, as the latter is codegen'd. I also took this chance to clean up the implementation of unfold and its reference. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85629 Approved by: https://github.com/mruberry	2022-10-05 12:15:49 +00:00
Edward Z. Yang	d07b85393a	SymInt fixes from symbolic-shapes branch (#86242 ) symintify a few inplace meta functions; symintify resize_(), nbytes(), functionalization input mutations; meta funcs for avg_pool2d_backward. Pull Request resolved: https://github.com/pytorch/pytorch/pull/86242 Approved by: https://github.com/Chillee	2022-10-05 04:52:02 +00:00
Peter Bell	b317736c39	Fix default correction value in std/var decompositions (#85839 ) `torch.std` and `torch.var` default to the unbiased estimator, i.e. `correction=1`. This only works as is because the default on this overload is not exercised by the tests. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85839 Approved by: https://github.com/ezyang	2022-10-04 23:23:39 +00:00
Horace He	82d9592f1b	Batch of symintifications to allow more models to pass in inference (#86104 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/86104 Approved by: https://github.com/ezyang	2022-10-04 04:01:58 +00:00
Horace He	37013bb443	Added _unsafe_view decomp (#86103 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/86103 Approved by: https://github.com/ezyang	2022-10-03 20:38:31 +00:00
lezcano	07ce0b435b	Remove backward for im2col and col2im (#85542 ) `im2col` is a linear map, and `col2im` is its adjoint. As such, the adjoint to `col2im` is `im2col` (the adjoint of the adjoint is the original function). There's no point having explicit derivatives in ATen for these functions, so this PR deletes all of them. Furthermore, along the way, we fix an error in the derivative of im2col for non-batched inputs. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85542 Approved by: https://github.com/soulitzer, https://github.com/ngimel	2022-10-03 00:16:42 +00:00
Horace He	e6dd2965af	A bunch of coverage improvements (re for models in inference snext50, BERT_pytorch, mobilenet_v3_large, pytorch_CycleGAN_and_pix2pix, dcgan, resnet18, mnasnet1_0) (#86050 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/86050 Approved by: https://github.com/ezyang	2022-10-02 20:46:20 +00:00
lezcano	787028cadb	Implement col2im decomposition and fix im2col and add a few preconditions (#85541 ) As per title Pull Request resolved: https://github.com/pytorch/pytorch/pull/85541 Approved by: https://github.com/jansel	2022-09-30 09:31:53 +00:00
Elias Ellison	6a2b12dd65	Turn on aliasing tests for fake backwards, Fix Batch norm running mean/var decomp aliasing (#85471 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85471 Approved by: https://github.com/ezyang	2022-09-28 23:06:59 +00:00
Animesh Jain	796da4df4d	Return contiguous tensor from softmax decomposition (#85788 ) Fixes https://github.com/pytorch/torchdynamo/issues/1135 Softmax decomp's output stride does not match the aten softmax output stride. Not sure if it's desirable. Opening a PR for now. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85788 Approved by: https://github.com/ngimel, https://github.com/ezyang	2022-09-28 20:52:45 +00:00
Nikita Karetnikov	8dd45424ea	[primTorch] Add ref for `huber_loss` and error inputs (#85041 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85041 Approved by: https://github.com/lezcano, https://github.com/mruberry	2022-09-28 19:56:17 +00:00
Edward Z. Yang	793488cda2	Revert "Revert "Symintifying slice ops (#85196 )"" (#85746 ) This reverts commit `3a171dfb0c`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85746 Approved by: https://github.com/albanD	2022-09-28 04:37:35 +00:00
PyTorch MergeBot	3a171dfb0c	Revert "Symintifying slice ops (#85196 )" This reverts commit `4c01c51266`. Reverted https://github.com/pytorch/pytorch/pull/85196 on behalf of https://github.com/atalman due to Break internal build Executorch	2022-09-27 18:01:27 +00:00
Fabio Rocha	d5ce2bbed2	[primTorch] decompositions for upsample_bicubic2d (#85403 ) FYI, this decomposition seems to be significantly slower than the lowering in torchinductor:
```
[------------------------------------- upsample_bicubic2d -------------------------------------]
                                                              |  lowering  |  Inductor  |  Eager
32 threads: ------------------------------------------------------------------------------------
      (torch.Size([16, 4, 128, 256]),), ((512, 1024), True)   |    1.8     |   3.880    |   1.4
      (torch.Size([16, 4, 128, 256]),), ((512, 1024), False)  |    1.9     |   3.887    |   1.4
```
This seems related to the fact that in the lowering we can use int32s as the indices and in the decomp we can only use int64s (see https://github.com/pytorch/torchdynamo/issues/1293). Pull Request resolved: https://github.com/pytorch/pytorch/pull/85403 Approved by: https://github.com/ngimel	2022-09-26 20:11:23 +00:00
Elias Ellison	bcc544e9d7	Add FakeCrossRef tests for backwards, Fix Layer Norm Backward Decomp (#85417 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85417 Approved by: https://github.com/ezyang	2022-09-26 17:08:14 +00:00
Fabio Rocha	ffaff8896a	Removed None arg check in test/test_decomp.py (#85402 ) Not sure why this check was necessary? Tests seem to run fine without it. There were definitely tests this was skipping before that it shouldn't, e.g., pretty much all of the tests for `torch.nn.functional.interpolate` Pull Request resolved: https://github.com/pytorch/pytorch/pull/85402 Approved by: https://github.com/ezyang	2022-09-24 11:37:27 +00:00
Edward Z. Yang	4c01c51266	Symintifying slice ops (#85196 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85196 Approved by: https://github.com/ezyang	2022-09-23 22:01:32 +00:00
PyTorch MergeBot	d10de31cc8	Revert "Add FakeCrossRef tests for backwards, Fix Layer Norm Backward Decomp (#85417 )" This reverts commit `78afa0cf0c`. Reverted https://github.com/pytorch/pytorch/pull/85417 on behalf of https://github.com/clee2000 due to broke tests on trunk `78afa0cf0c`	2022-09-23 17:21:43 +00:00
PyTorch MergeBot	3b195fd33e	Revert "Turn on aliasing tests for fake backwards, Fix Batch norm running mean/var decomp aliasing (#85471 )" This reverts commit `1e92eb8068`. Reverted https://github.com/pytorch/pytorch/pull/85471 on behalf of https://github.com/clee2000 due to stacked prs https://github.com/pytorch/pytorch/pull/85417 and https://github.com/pytorch/pytorch/pull/85434 broke trunk, reverting this so i can revert the others	2022-09-23 17:13:35 +00:00
Elias Ellison	1e92eb8068	Turn on aliasing tests for fake backwards, Fix Batch norm running mean/var decomp aliasing (#85471 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85471 Approved by: https://github.com/ezyang	2022-09-23 16:02:15 +00:00
Elias Ellison	78afa0cf0c	Add FakeCrossRef tests for backwards, Fix Layer Norm Backward Decomp (#85417 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85417 Approved by: https://github.com/ezyang	2022-09-23 15:50:03 +00:00
Ryan Spring	71dddec6ea	Cast grad_input to half when input_dtype is half in _softmax_backward_data aten decomposition (#85497 ) Fixes #85504 `_softmax_backward_data` and `_log_softmax_backward_data` cast `grad_input` to half when the `input_dtype` is half. When running with amp without the cast, consumer ops can trigger `RuntimeError: expected scalar type Float but found Half`. https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/SoftMax.cpp#L70-L83 https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/SoftMax.cpp#L102-L113 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85497 Approved by: https://github.com/ngimel	2022-09-23 06:52:38 +00:00
PyTorch MergeBot	5043457a8e	Revert "Add FakeCrossRef tests for backwards, Fix Layer Norm Backward Decomp (#85417 )" This reverts commit `9c77083965`. Reverted https://github.com/pytorch/pytorch/pull/85417 on behalf of https://github.com/clee2000 due to broke tests on trunk (and pull somehow) `9c77083965`	2022-09-22 15:44:38 +00:00
Elias Ellison	9c77083965	Add FakeCrossRef tests for backwards, Fix Layer Norm Backward Decomp (#85417 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85417 Approved by: https://github.com/ezyang	2022-09-22 13:03:57 +00:00
Horace He	2f4a517d67	Ported matmul compositeimplicitautograd impl into core (#85239 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85239 Approved by: https://github.com/ezyang, https://github.com/lezcano	2022-09-21 09:25:24 +00:00
lezcano	d17b144e65	Adding multigammaln ref and fix arange (#85153 ) Partially based on https://github.com/pytorch/pytorch/pull/83662. I'll help land this one, as Rob does not work in the PyTorch project anymore. I removed the data-dependent check for the args, as data dependencies are bad for many reasons (and it was failing when the input has NaNs). It also registers arange as a decomposition, and fixes the naming of its args. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85153 Approved by: https://github.com/mruberry, https://github.com/ngimel	2022-09-20 17:52:56 +00:00
lezcano	5dd9610e9d	Refs and decompositions for index_{add,copy,select,fill} (#85002 ) As per title Pull Request resolved: https://github.com/pytorch/pytorch/pull/85002 Approved by: https://github.com/ngimel	2022-09-17 19:57:34 +00:00
PyTorch MergeBot	e33b464ffc	Revert "Refs and decompositions for index_{add,copy,select,fill} (#85002 )" This reverts commit `2f0b3de443`. Reverted https://github.com/pytorch/pytorch/pull/85002 on behalf of https://github.com/huydhn due to Broke trunk slow tests	2022-09-17 04:26:04 +00:00
lezcano	2f0b3de443	Refs and decompositions for index_{add,copy,select,fill} (#85002 ) As per title Pull Request resolved: https://github.com/pytorch/pytorch/pull/85002 Approved by: https://github.com/ngimel	2022-09-16 23:59:35 +00:00
Sherlock Huang	29eba319b4	Use alias for nop decomp (#84727 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84727 Approved by: https://github.com/Chillee	2022-09-16 18:50:56 +00:00
Natalia Gimelshein	6162a04364	fix half_to_float arg in *softmax decomp (#85120 ) Fixes https://github.com/pytorch/torchdynamo/issues/1239 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85120 Approved by: https://github.com/Chillee	2022-09-16 15:54:50 +00:00
Horace He	1459a909b4	Added mv, mm, and binary_cross_entropy_with_logits decomps (#84451 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84451 Approved by: https://github.com/ngimel	2022-09-08 17:56:18 +00:00
Ivan Yashchuk	6363b1b358	Add nvFuser support for aten.native_batch_norm_backward (#84546 ) Replacing `tensor.reshape(broadcast_mask)` with unsqueezes makes the implementation of `batch_norm_backward` more friendly for PrimTorch+nvFuser. Pull Request resolved: https://github.com/pytorch/pytorch/pull/84546 Approved by: https://github.com/Chillee	2022-09-06 19:56:17 +00:00
Fabio Rocha	91a5f52f51	Decomp for nn.functional.grid_sampler_2d (#84350 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84350 Approved by: https://github.com/jansel, https://github.com/Lezcano	2022-09-05 21:33:26 +00:00
lezcano	3dfbf09afe	Optimise the decomposition for `adaptive_avg_pool2d` wrt. TorchInductor (#84483 ) This fixes some parts of the implementation that did not work with TorchInductor (e.g. the indices in TorchInductor need to be `int64`s, while in PyTorch we can have `int32`s). It also brings the performance of the kernel up to numbers similar to those of the lowering (benchmarks below). Pull Request resolved: https://github.com/pytorch/pytorch/pull/84483 Approved by: https://github.com/jansel	2022-09-02 22:25:09 +00:00
Sherlock Huang	ef3ab31f1c	Decomp for aten.im2col (#84303 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84303 Approved by: https://github.com/jansel, https://github.com/ngimel	2022-09-01 00:06:35 +00:00
Nikita Karetnikov	71ce9cd072	[primTorch] Add decomp for `soft_margin_loss` (#83804 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83804 Approved by: https://github.com/Lezcano, https://github.com/ngimel	2022-08-31 17:39:34 +00:00
Nikita Shulga	b8e1c54f53	[Prim] Implement group_norm_backward (#84037 ) Test plan: CI, i.e. `python3 test_decomp.py -v -k test_comprehensive_nn_functional_group_norm` plus:
```
#!/usr/bin/env python3.8
import torch

func = torch.ops.aten.native_group_norm_backward.default
decomp = torch._decomp.decomposition_table[func]
for args in (
    (torch.rand(1, 6, 3), torch.rand(1, 6, 3), torch.rand(1, 2), torch.rand(1, 2), torch.rand(6), 1, 6, 3, 2, [True, True, True]),
    (torch.rand(64, 768, 7, 7), torch.rand(64, 768, 7, 7), torch.rand(64, 1), torch.rand(64, 1), torch.rand(768), 64, 768, 49, 1, [True, True, True]),
):
    # Unpack the argument tuple for both the native op and its decomposition
    nrc = func(*args)
    drc = decomp(*args)
    for i in range(len(nrc)):
        print(i, torch.max(nrc[i] - drc[i]))
    print(all(torch.allclose(x, y) for (x, y) in zip(nrc, drc)))
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84037 Approved by: https://github.com/Chillee, https://github.com/ngimel	2022-08-29 09:29:30 +00:00
Natalia Gimelshein	533203f5aa	_to_copy decomp (#84108 ) Per title Pull Request resolved: https://github.com/pytorch/pytorch/pull/84108 Approved by: https://github.com/Chillee	2022-08-29 02:25:02 +00:00
lezcano	9fc02f6bc5	Decomposition for adaptive_avg_pool2d (#84062 ) This was already implemented as a lowering in https://github.com/pytorch/torchdynamo/pull/962. I'm putting the idea up here ~(I haven't even run this code, so it surely has many issues, but I reckon the general idea should hopefully be alright).~ The tests now pass and I corrected the issues that the first implementation had. Pull Request resolved: https://github.com/pytorch/pytorch/pull/84062 Approved by: https://github.com/jansel	2022-08-29 01:38:51 +00:00
PyTorch MergeBot	33db5da4c1	Revert "[Prim] Implement group_norm_backward (#84037 )" This reverts commit `bed85cce8b`. Reverted https://github.com/pytorch/pytorch/pull/84037 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally	2022-08-28 17:30:50 +00:00
PyTorch MergeBot	ff23f3ac1c	Revert "_to_copy decomp (#84108 )" This reverts commit `e33897cb99`. Reverted https://github.com/pytorch/pytorch/pull/84108 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally	2022-08-28 13:27:49 +00:00
Natalia Gimelshein	e33897cb99	_to_copy decomp (#84108 ) Per title Pull Request resolved: https://github.com/pytorch/pytorch/pull/84108 Approved by: https://github.com/Chillee	2022-08-27 03:51:03 +00:00
Nikita Shulga	bed85cce8b	[Prim] Implement group_norm_backward (#84037 ) Test plan: CI, i.e. `python3 test_decomp.py -v -k test_comprehensive_nn_functional_group_norm` plus:
```
#!/usr/bin/env python3.8
import torch

func = torch.ops.aten.native_group_norm_backward.default
decomp = torch._decomp.decomposition_table[func]
for args in (
    (torch.rand(1, 6, 3), torch.rand(1, 6, 3), torch.rand(1, 2), torch.rand(1, 2), torch.rand(6), 1, 6, 3, 2, [True, True, True]),
    (torch.rand(64, 768, 7, 7), torch.rand(64, 768, 7, 7), torch.rand(64, 1), torch.rand(64, 1), torch.rand(768), 64, 768, 49, 1, [True, True, True]),
):
    # Unpack the argument tuple for both the native op and its decomposition
    nrc = func(*args)
    drc = decomp(*args)
    for i in range(len(nrc)):
        print(i, torch.max(nrc[i] - drc[i]))
    print(all(torch.allclose(x, y) for (x, y) in zip(nrc, drc)))
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84037 Approved by: https://github.com/Chillee, https://github.com/ngimel	2022-08-27 01:10:27 +00:00
Horace He	9a236c7ab4	Made some minor cleanups to decompositions (#83814 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83814 Approved by: https://github.com/ngimel	2022-08-26 10:55:31 +00:00
Animesh Jain	e2f75d63d4	Decomposition - batch_norm, save_mean and save_variance always float32 (#84013 ) AMP error shown here - https://github.com/pytorch/torchdynamo/issues/835 Test missing Pull Request resolved: https://github.com/pytorch/pytorch/pull/84013 Approved by: https://github.com/ezyang	2022-08-25 16:09:52 +00:00
Ivan Yashchuk	473b733bae	Replace .new_zeros(()) with 0.0 in torch/_decomp/decompositions (#83734 ) `new_zeros` is decomposed into `prims.empty_strided`+`prims.fill`+`prims.copy_to` and none of these are supported by prims+nvFuser executor currently. Replacing it with 0.0 makes these backward decompositions nvFuser friendly. Example with `torch.ops.aten.hardsigmoid_backward.default`:
```py
# Before this PR
opcode         name                      target                            args                                                          kwargs
-------------  ------------------------  --------------------------------  ------------------------------------------------------------  ----------------------------------------------------------------------------------------
placeholder    a_1                       a_1                               ()                                                            {}
placeholder    g_1                       g_1                               ()                                                            {}
call_function  gt_default                nvprims.gt.default                (a_1, -3.0)                                                   {}
call_function  lt_default                nvprims.lt.default                (a_1, 3.0)                                                    {}
call_function  bitwise_and_default       nvprims.bitwise_and.default       (gt_default, lt_default)                                      {}
call_function  mul_default               nvprims.mul.default               (g_1, 0.16666666666666666)                                    {}
call_function  empty_strided             prims.empty_strided.default       ([], [])                                                      {'dtype': torch.float32, 'device': device(type='cuda', index=0), 'requires_grad': False}
call_function  fill_default              prims.fill.default                (empty_strided, 0)                                            {}
call_function  copy_to_default           prims.copy_to.default             (empty_strided, fill_default)                                 {}
call_function  broadcast_in_dim_default  nvprims.broadcast_in_dim.default  (copy_to_default, [3, 2], [])                                 {}
call_function  where_default             nvprims.where.default             (bitwise_and_default, mul_default, broadcast_in_dim_default)  {}
output         output                    output                            (where_default,)                                              {}

# After this PR
opcode         name                 target                       args                                     kwargs
-------------  -------------------  ---------------------------  ---------------------------------------  --------
placeholder    a_1                  a_1                          ()                                       {}
placeholder    g_1                  g_1                          ()                                       {}
call_function  gt_default           nvprims.gt.default           (a_1, -3.0)                              {}
call_function  lt_default           nvprims.lt.default           (a_1, 3.0)                               {}
call_function  bitwise_and_default  nvprims.bitwise_and.default  (gt_default, lt_default)                 {}
call_function  mul_default          nvprims.mul.default          (g_1, 0.16666666666666666)               {}
call_function  where_default        nvprims.where.default        (bitwise_and_default, mul_default, 0.0)  {}
output         output               output                       (where_default,)                         {}
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83734 Approved by: https://github.com/Chillee	2022-08-22 09:12:13 +00:00
Edward Z. Yang	02581f053b	Address CR comments for "Delete ProxyTensor wrapper subclass" (#83646 ) CR is on https://github.com/pytorch/pytorch/pull/83330 - Factor proxy slot getters/setters into helper functions - Use a weak map for storing proxies, so they go away when tracing is done - More documentation on SymDispatchMode Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/83646 Approved by: https://github.com/Chillee	2022-08-18 22:18:09 +00:00
Edward Z. Yang	817a82704f	Delete ProxyTensor wrapper subclass (#83330 ) I was working on https://github.com/pytorch/torchdynamo/issues/80 and my working hypothesis for what was causing the error was that proxy tensor was not advertising correct dispatch keys, causing AMP to operate differently when you traced. I could have fixed this directly by replicating fake tensor's fix for setting dispatch keys to also apply to proxy tensor, but I was like, "Why must I repeat myself." This PR is the result. It completely deletes the ProxyTensor wrapper subclass, so that when we are tracing, the tensors flowing through the program are the original real or fake tensors, depending on what the user requested in the top-level API. There is no more wrapping. To store the Proxy objects necessary for actually doing tracing, I store the property directly on the tensors. (Note: I never clean up old entries from the map at the moment, this is easily fixed by using a weak map) Benefits of doing this: * No more tip-toeing around no_dispatch() creation of new ProxyTensors; we never create new tensors (except when we call the underlying func), so you don't have to worry about accidentally tracing them. * No more syncing up metadata from in place operators. In particular https://github.com/pytorch/pytorch/issues/81526 is mooted * This fixes https://github.com/pytorch/torchdynamo/issues/519 as we no longer need to teach proxy tensor to support sparse tensor. * No more schlepping symbolic integers from the inner fake tensor to the outer proxy tensor. If you can make a fake tensor with symbolic ints, you're done, nothing else to do. To avoid having to rewrite all of the guts, when I get to the actual proxy tensor handler, I first "fetch" the stored ProxyTensor data from the weakmap via a tree_map, and then operate on the consequent data as before. A more optimized implementation is possible. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/83330 Approved by: https://github.com/Chillee	2022-08-18 01:56:07 +00:00
Nikita Karetnikov	cd86d25515	[primTorch] Move addcdiv from decompositions -> refs (#80842 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/80842 Approved by: https://github.com/Lezcano, https://github.com/ngimel	2022-08-16 17:23:00 +00:00
Horace He	f02f304657	Added nll_loss_forward decomposition + some other minor decomps (#83235 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83235 Approved by: https://github.com/ngimel	2022-08-13 10:24:58 +00:00
Natalia Gimelshein	112ec24f09	Fix device behavior for masked_fill (#82737 ) Fixes #81018, based on #81036. It will create graph break for cpu 0d tensor value due to .item() call (we could maybe specialize on that instead of breaking?), but otherwise it would create graph break due to synchronizing `to` call, so there's no way around :-(, and for number `value` argument we already should be specializing. Pull Request resolved: https://github.com/pytorch/pytorch/pull/82737 Approved by: https://github.com/Chillee	2022-08-04 15:47:56 +00:00
Brian Hirsh	4a77bee661	prevent python view impls from getting registered to the meta key (#82007 ) We don't want to register view ops in python to the `Meta` dispatch key, because doing that prevents us from correctly aliasing storage information. This PR fixes the existing python registrations, and makes it an error to do that in the future. Example:
```
with FakeTensorMode.push() as mode:
    b = torch.ones(2)
    c = b.unsqueeze(-1)
    b_ = StorageWeakRef(b.storage())
    c_ = StorageWeakRef(c.storage())
    print(b_.cdata)
    print(c_.cdata)
    # their storages are different (now fixed in this PR)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82007 Approved by: https://github.com/ezyang, https://github.com/eellison	2022-07-27 17:15:05 +00:00
Shangdi Yu	9088757cc6	move aten.native_batch_norm_backward decomposition to core (#81522 ) Move aten.native_batch_norm_backward decomposition from https://github.com/pytorch/functorch/blob/main/functorch/_src/decompositions.py#L148. Changed to not recompute mean and invstd, added type cast. In functorch, changed `@register_decomposition_for(aten.native_batch_norm_backward)` to `@register_decomposition_for_jvp(aten.native_batch_norm_backward)` Passing `pytest test/test_decomp.py -k norm` Note that when the output mask is False for grad_weight and grad_bias, we should return None to be consistent with the non-decomposed operator's behavior. But "None" doesn't work with vjp, so the version of decomposition in functorch used zeros. See `b33c1f7dd4/functorch/functorch/_src/decompositions.py (L210)`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/81522 Approved by: https://github.com/Chillee	2022-07-27 06:11:34 +00:00
lezcano	11fe277b62	[PrimTorch] Add reference for torch.norm (#81765 ) This ref does more things than `torch.norm`, and it fixes a few bugs that `torch.norm` has. This implementation and the `torch.norm` implementation come to terms in the next PR of this stack. We put this PR before, as otherwise `test_decomp.py` was failing. Pull Request resolved: https://github.com/pytorch/pytorch/pull/81765 Approved by: https://github.com/ngimel	2022-07-25 19:57:21 +00:00
Vivek Khandelwal	cb63ffc553	Add decomposition for `aten.upsample_bilinear2d.vec` (#80964 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/80964 Approved by: https://github.com/jansel, https://github.com/Chillee	2022-07-23 02:22:15 +00:00
Huy Do	12cb26509a	Apply ufmt to torch internal (#81643 ) This is a big bang PR, merge conflicts are probably expected and will be addressed at merge. Pull Request resolved: https://github.com/pytorch/pytorch/pull/81643 Approved by: https://github.com/ezyang	2022-07-22 02:19:50 +00:00
Horace He	a5fb41e3d3	Revert "Revert "Refactored prim utils into _prims_utils folder"" (#81746 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/81746 Approved by: https://github.com/anijain2305, https://github.com/Krovatkin	2022-07-20 23:43:57 +00:00
PyTorch MergeBot	e43a02c314	Revert "Refactored prim utils into _prims_utils folder (#81088 )" This reverts commit `80231d0a72`. Reverted https://github.com/pytorch/pytorch/pull/81088 on behalf of https://github.com/jeanschmidt due to breaking internal tests	2022-07-19 19:56:41 +00:00
Horace He	80231d0a72	Refactored prim utils into _prims_utils folder (#81088 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/81088 Approved by: https://github.com/ngimel	2022-07-19 03:55:51 +00:00
Natalia Gimelshein	50d205c551	make clamp decomps use torch.* calls, move clamp_min/clamp_max to refs (#81619 ) Per title, @chillee is anything else necessary to remove decomp other than decorating ref with `register_decomposition`? Pull Request resolved: https://github.com/pytorch/pytorch/pull/81619 Approved by: https://github.com/Chillee	2022-07-18 16:52:45 +00:00
Horace He	5139053e02	Fixed the decomposition for `embedding_dense_backward` (#81528 ) No guarantee about the strides of `grad_output`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/81528 Approved by: https://github.com/jansel	2022-07-15 17:51:00 +00:00
Edward Z. Yang	fca03eeec1	Make proxy tensor support item() calls on torch.tensor constants (#81192 ) This PR is doing a few interrelated things, all of which are necessary to get correctness. Read the comment in torch/fx/experimental/proxy_tensor.py for the high level overview. Let's break down the parts of this PR: * Bug fix where `enable_torch_dispatch_mode` with `None` doesn't work. This makes `enable_torch_dispatch_mode(current_mode.inner)` work, which is the basis for how we temporarily disable fake tensor mode. * Bug fix for when fake tensor mode is combined with a non-mode tensor subclass. This actually could be ablated from this PR but it affects where the logic for allowing non fake tensor inputs with lift goes, so it's all in here in one go. There are some relevant tests for the fix in fake tensor, but it turns out I didn't need this because I'm always using proxy tensors as a mode (which ensures the ordering is right.) * New `lift_fresh` view operator. Note that like lift, we have to manually write the functionalize kernel for these functions. * The actual change, which is to save constants when we see them in the proxy tensor mode, and then propagate them as we go (because otherwise you'll handle mutations on constants incorrectly--see test.) This is mildly BC-breaking if anyone was previously interposing on at::lift, but this operator was relatively new and I checked functorch which has no explicit reference to lift. So I think it should not be too disruptive. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/81192 Approved by: https://github.com/samdow, https://github.com/bdhirsh	2022-07-15 03:53:40 +00:00
lezcano	b5b9db9f84	Make `kl_div` a composite function. (#80334 ) Benchmarks: https://github.com/pytorch/pytorch/pull/80334#issuecomment-1167229285 Fixes https://github.com/pytorch/pytorch/issues/80158 Fixes https://github.com/pytorch/pytorch/issues/78867 Fixes https://github.com/pytorch/pytorch/issues/69230 Supersedes https://github.com/pytorch/pytorch/pull/79007 Supersedes https://github.com/pytorch/pytorch/pull/69212 Supersedes https://github.com/pytorch/pytorch/pull/19659 Pull Request resolved: https://github.com/pytorch/pytorch/pull/80334 Approved by: https://github.com/ezyang	2022-07-13 20:07:36 +00:00
PyTorch MergeBot	f2c8557521	Revert "Make `kl_div` a composite function. (#80334 )" This reverts commit `828c787ea9`. Reverted https://github.com/pytorch/pytorch/pull/80334 on behalf of https://github.com/ezyang due to doesn't work with xla	2022-07-06 17:51:06 +00:00
lezcano	eb0889cf7d	Add support for multiple inputs to out_wrapper and strict dtype checking (#80601 ) Reland of https://github.com/pytorch/pytorch/pull/79941 Pull Request resolved: https://github.com/pytorch/pytorch/pull/80601 Approved by: https://github.com/albanD	2022-07-05 12:31:21 +00:00
lezcano	828c787ea9	Make `kl_div` a composite function. (#80334 ) Benchmarks: https://github.com/pytorch/pytorch/pull/80334#issuecomment-1167229285 Fixes https://github.com/pytorch/pytorch/issues/80158 Fixes https://github.com/pytorch/pytorch/issues/78867 Fixes https://github.com/pytorch/pytorch/issues/69230 Supersedes https://github.com/pytorch/pytorch/pull/79007 Supersedes https://github.com/pytorch/pytorch/pull/69212 Supersedes https://github.com/pytorch/pytorch/pull/19659 Pull Request resolved: https://github.com/pytorch/pytorch/pull/80334 Approved by: https://github.com/ezyang	2022-07-04 19:33:43 +00:00
PyTorch MergeBot	184a065ba7	Revert "Add support for multiple inputs to out_wrapper and strict dtype checking (#79941 )" This reverts commit `dc7066a8f0`. Reverted https://github.com/pytorch/pytorch/pull/79941 on behalf of https://github.com/suo due to broke master `dc7066a8f0`	2022-06-30 03:29:30 +00:00
lezcano	dc7066a8f0	Add support for multiple inputs to out_wrapper and strict dtype checking (#79941 ) When a function returns multiple parameters in PyTorch, the `out` parameter takes a tuple of tensors (see `linalg.svd` for example). The current implementation in `out_wrapper_multi` modelled this wrong, as it assumed that it would take a number of different named parameters. This PR implements the correct behaviour in `out_wrapper`. As a small side-effect, we now need to call `@out_wrapper()` when the output is just one tensor. This PR also implements an additional optional parameter that checks whether the dtype of the given `out` is exactly the dtype that the meta function requires. This is the behaviour that we currently have in PyTorch, and this check is necessary in eager when we call with these tensors into external libraries. We also make the functions with several outputs return a namedtuple, similar to what we do in PyTorch. Pull Request resolved: https://github.com/pytorch/pytorch/pull/79941 Approved by: https://github.com/mruberry, https://github.com/ezyang	2022-06-30 02:47:16 +00:00
Horace He	d43e6c9f4a	Revert "Revert "formatted _decomp folder with black"" This reverts commit `2027eae67c`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/79226 Approved by: https://github.com/Krovatkin	2022-06-22 20:47:52 +00:00
Horace He	4193252de9	Revert "Revert "Added kl_div_backward decomp"" This reverts commit `60a13f4ec9`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/79225 Approved by: https://github.com/Krovatkin	2022-06-22 18:09:52 +00:00
Horace He	e89676f76c	fix logical_not reland issues Pull Request resolved: https://github.com/pytorch/pytorch/pull/79900 Approved by: https://github.com/ngimel	2022-06-21 03:41:18 +00:00
Nikita Shulga	f5eb05f107	Revert "Reland #2 of "Added {logical_not, trace} refs, moved logical ops to use method overloads"" This reverts commit `f3665dd237`. Reverted https://github.com/pytorch/pytorch/pull/79819 on behalf of https://github.com/malfet due to land raced with softshrink refs	2022-06-20 14:22:15 -07:00
Horace He	f3665dd237	Reland #2 of "Added {logical_not, trace} refs, moved logical ops to use method overloads" Pull Request resolved: https://github.com/pytorch/pytorch/pull/79819 Approved by: https://github.com/mruberry	2022-06-20 19:50:43 +00:00
lezcano	16f30b494c	Make l1_loss composite Fixing the forward AD for `sgn` in the next PR of this stack uncovered a number of issues with the derivatives of `l1_loss`. Upon inspection, `l1_loss` was just implemented as a composite function, but it was not differentiable. This PR makes it a fully differentiable function. As a side note, `l1_loss_out` was incorrect in a number of ways. Even more, it is not exposed to the public as `F.l1_loss` does not accept an `out=` parameter. As such it is not even tested. I wonder how useful it is to have `out=` variants for loss functions if we don't expose them at all. Even more, I wonder how useful it is to have `_out` variants for loss functions, given that their most normal use case is to return just a real number cc jbschlosser Pull Request resolved: https://github.com/pytorch/pytorch/pull/79804 Approved by: https://github.com/zou3519, https://github.com/malfet	2022-06-20 19:10:54 +00:00
Jason Ansel	d2e18606e7	Fix view issue in embedding_dense_backward decomp (#79857 ) I was hitting:
```
  File "/home/jansel/pytorch/torch/fx/experimental/proxy_tensor.py", line 66, in proxy_call
    return CURRENT_DECOMPOSITION_TABLE[func_overload](*args, **kwargs)
  File "/home/jansel/pytorch/torch/_decomp/decompositions.py", line 801, in embedding_dense_backward
    indices_rank1 = indices.view(numel)
  File "/home/jansel/pytorch/torch/fx/experimental/proxy_tensor.py", line 122, in __torch_dispatch__
    return proxy_call(func_overload, args, kwargs)
  File "/home/jansel/pytorch/torch/fx/experimental/proxy_tensor.py", line 86, in proxy_call
    real_out = func_overload(*args, **kwargs)
  File "/home/jansel/pytorch/torch/_ops.py", line 49, in __call__
    return self._op(*args, **kwargs or {})
RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79857 Approved by: https://github.com/Chillee	2022-06-20 17:58:14 +00:00
PyTorch MergeBot	d4a9438786	Revert "Make l1_loss composite" This reverts commit `61a5c779bf`. Reverted https://github.com/pytorch/pytorch/pull/78257 on behalf of https://github.com/malfet due to This breaks executorch	2022-06-17 18:14:21 +00:00
Ivan Yashchuk	bc1fef96af	Reference implementations for rsqrt and native_layer_norm (#79413 ) This PR adds references for: - `torch.rsqrt` - `torch.native_layer_norm` - `torch.nn.functional.layer_norm` `native_layer_norm` had a different number of dimensions if the input was 0-sized. I fixed that. Pull Request resolved: https://github.com/pytorch/pytorch/pull/79413 Approved by: https://github.com/mruberry, https://github.com/Chillee	2022-06-17 07:24:02 +00:00
Jason Ansel	c8fb02b452	Use amax instead of max for softmax decomps (#79667 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/79667 Approved by: https://github.com/Chillee	2022-06-16 04:09:33 +00:00
lezcano	61a5c779bf	Make l1_loss composite Fixing the forward AD for `sgn` in the next PR of this stack uncovered a number of issues with the derivatives of `l1_loss`. Upon inspection, `l1_loss` was just implemented as a composite function, but it was not differentiable. This PR makes it a fully differentiable function. As a side note, `l1_loss_out` was incorrect in a number of ways. Moreover, it is not exposed to the public, as `F.l1_loss` does not accept an `out=` parameter. As such, it is not even tested. I wonder how useful it is to have `out=` variants for loss functions if we don't expose them at all. I also wonder how useful it is to have `_out` variants for loss functions, given that their most common use case is to return just a real number. cc jbschlosser Pull Request resolved: https://github.com/pytorch/pytorch/pull/78257 Approved by: https://github.com/jbschlosser	2022-06-16 00:03:22 +00:00
PyTorch MergeBot	fefff54cad	Revert "Revert "Revert "Added {logical_not, trace} refs, moved logical ops to use method overloads""" This reverts commit `a2d2981e8e`. Reverted https://github.com/pytorch/pytorch/pull/79224 on behalf of https://github.com/suo due to broke lots of things `a2d2981e8e`	2022-06-10 04:40:43 +00:00
Horace He	a2d2981e8e	Revert "Revert "Added {logical_not, trace} refs, moved logical ops to use method overloads"" This reverts commit `d67309aefb`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/79224 Approved by: https://github.com/mruberry	2022-06-10 03:07:14 +00:00
PyTorch MergeBot	d67309aefb	Revert "Added {logical_not, trace} refs, moved logical ops to use method overloads" This reverts commit `64b6bd8c1e`. Reverted https://github.com/pytorch/pytorch/pull/79000 on behalf of https://github.com/malfet due to Introduces test failure, see https://hud.pytorch.org/pr/79000	2022-06-09 13:11:23 +00:00
PyTorch MergeBot	60a13f4ec9	Revert "Added kl_div_backward decomp" This reverts commit `a08685ebc9`. Reverted https://github.com/pytorch/pytorch/pull/79001 on behalf of https://github.com/malfet due to PR failed in newly added tests, see https://hud.pytorch.org/pr/79001	2022-06-09 13:08:30 +00:00
PyTorch MergeBot	2027eae67c	Revert "formatted _decomp folder with black" This reverts commit `4945c72151`. Reverted https://github.com/pytorch/pytorch/pull/79002 on behalf of https://github.com/janeyx99 due to Broke decomp tests on trunk + also on PR https://hud.pytorch.org/minihud#4945c72151e29cb524974e1714654cf790ddb37d	2022-06-09 12:58:03 +00:00
Horace He	4945c72151	formatted _decomp folder with black Pull Request resolved: https://github.com/pytorch/pytorch/pull/79002 Approved by: https://github.com/ezyang	2022-06-09 07:16:37 +00:00
Horace He	a08685ebc9	Added kl_div_backward decomp Pull Request resolved: https://github.com/pytorch/pytorch/pull/79001 Approved by: https://github.com/ezyang	2022-06-09 07:16:37 +00:00

292 Commits