Commit Graph

292 Commits

Author SHA1 Message Date
Tugsbayasgalan Manlaibaatar
36e1f7bc2b Add meta kernel coverage for aten.unsafe_split, aten.unsafe_chunk (#92608)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92608
Approved by: https://github.com/ngimel
2023-01-22 07:12:29 +00:00
Peter Bell
dd760c98f8 [decomp] Use new squeeze.dims overload in decompositions (#91602)
This removes the now-redundant `_squeeze_multiple` helpers and instead decomposes into a single call to `aten::squeeze.dims` which also has the effect of reducing the lowered graph size in inductor.
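For context, a minimal illustration of the new overload's behavior (assumed semantics, not code from the PR):

```python
import torch

x = torch.zeros(1, 3, 1, 4)
# One call to the new overload replaces the old _squeeze_multiple loop of single-dim squeezes.
y = torch.ops.aten.squeeze.dims(x, [0, 2])
assert y.shape == (3, 4)
```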
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91602
Approved by: https://github.com/ngimel
2023-01-20 18:08:18 +00:00
PyTorch MergeBot
2891cecd8d Revert "Add meta kernel coverage for aten.unsafe_split, aten.unsafe_chunk (#92608)"
This reverts commit 4386f317b9.

Reverted https://github.com/pytorch/pytorch/pull/92608 on behalf of https://github.com/ZainRizvi due to test_aot_autograd_symbolic_exhaustive_unsafe_split_cpu_float32 (__main__.TestEagerFusionOpInfoCPU) is failing consistently since this PR was merged
2023-01-20 17:17:35 +00:00
Tugsbayasgalan Manlaibaatar
4386f317b9 Add meta kernel coverage for aten.unsafe_split, aten.unsafe_chunk (#92608)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92608
Approved by: https://github.com/ngimel
2023-01-20 12:39:56 +00:00
lezcano
8b861544f9 Remove lowering and decompositions of zero_, zero, zeros_like... in favour of their references (#92071)
The generated triton code is identical.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92071
Approved by: https://github.com/ngimel
2023-01-18 23:22:36 +00:00
Peter Bell
8770a7ed6f Decompose more inplace ops (#90967)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90967
Approved by: https://github.com/anijain2305
2023-01-18 21:07:47 +00:00
Peter Bell
4058dedf21 Replace log(1 + x) with log1p(x) (#92114)
`log1p` offers better precision near zero since `(1 + x) - 1` truncates any
values less than the float epsilon to zero. For `soft_margin_loss` this also
requires one fewer kernel invocation, which for numel=1e7 gives me a 1.2x speedup
on CUDA and a 1.1x speedup on CPU.
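For illustration only (not part of the commit), the precision difference is easy to see:

```python
import torch

x = torch.tensor(1e-10, dtype=torch.float32)
print(torch.log(1 + x))   # tensor(0.) -- 1 + x rounds to 1.0 in float32
print(torch.log1p(x))     # tensor(1.0000e-10)
```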

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92114
Approved by: https://github.com/ngimel, https://github.com/lezcano
2023-01-18 10:43:56 +00:00
lezcano
da58f9eb8f Rewrite out-of-place decompositions in terms of out-of-place ops (#92003)
Fixes https://github.com/pytorch/torchdynamo/issues/1863

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92003
Approved by: https://github.com/ngimel
2023-01-17 16:53:27 +00:00
vfdev-5
5f55335c2e Fixed output memory format mismatch for bicubic2d (#90470)
Description:

- Output memory format now matches the input for bicubic2d

Problem: output tensor's memory format does not match input format for bicubic2d

```python
import torch

i = torch.rand(1, 3, 32, 32).contiguous(memory_format=torch.channels_last)
assert i.is_contiguous(memory_format=torch.channels_last)
o = torch.nn.functional.interpolate(i, size=(4, 4), mode="bicubic")
assert o.is_contiguous(memory_format=torch.channels_last), f"Should be channels last but given channels first ({o.is_contiguous(memory_format=torch.contiguous_format)})"

> AssertionError: Should be channels last but given channels first (True)
```

Related PR fixing bilinear ops: https://github.com/pytorch/pytorch/pull/53535 (cc @VitalyFedyunin @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @bdhirsh )

Discovered together with @NicolasHug while working on https://github.com/pytorch/pytorch/tree/interpolate_uint8_images_linear_cpu_support_dev

- Updated code to match grad input / output memory formats
- Temporary tensor creation matches the memory format in `separable_upsample_generic_Nd_kernel_impl`
- Updated tests
- Added missing forward AD support for bicubic with antialiasing

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90470
Approved by: https://github.com/NicolasHug, https://github.com/lezcano
2023-01-12 19:52:28 +00:00
min-jean-cho
af242eedfb [Inductor] Added aten.uniform_ decomp (#90869)
Fixes #90815

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90869
Approved by: https://github.com/jgong5, https://github.com/jansel, https://github.com/lezcano, https://github.com/ngimel, https://github.com/albanD
2023-01-11 23:23:42 +00:00
David Berard
d7dc1c2fd5 Support zero dimensions in softmax decompositions (#91322)
The eager implementation of softmax supports computation along zero dimensions, but many of the other implementations did not, including:
* decompositions & refs (this was causing dynamo failures)
* forward AD for logsumexp
* MPS log_softmax_backward

This PR handles the `input.numel() == 0` cases separately to avoid running `amax()`, which fails for zero dimensions, and updates opinfos.

example of "computation along zero dimensions":

```python
# example of where the ref fails but eager passes
import torch

t = torch.rand((4, 0, 0))
print("~")
print(torch.nn.functional.softmax(t, dim=-1))  # this passes
print("~")
torch._refs.softmax(t, dim=-1)  # this fails
print("~")
```
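A minimal sketch of the kind of guard this implies (illustrative, not the exact patch):

```python
import torch

def softmax_ref_sketch(x, dim):
    if x.numel() == 0:
        # amax() over a zero-size dimension raises, so skip the max-subtraction trick entirely
        unnormalized = torch.exp(x)
    else:
        unnormalized = torch.exp(x - torch.amax(x, dim, keepdim=True))
    return unnormalized / torch.sum(unnormalized, dim, keepdim=True)

print(softmax_ref_sketch(torch.rand(4, 0, 0), dim=-1).shape)  # torch.Size([4, 0, 0])
```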
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91322
Approved by: https://github.com/lezcano
2023-01-11 09:35:43 +00:00
XiaobingSuper
3790b50505 inductor: fix .to(memory_format) issue which doesn't generate the right stride (#91948)
Motivation: for **.to(memory_format)**, the inductor doesn't generate the right stride; see the following example:
```
class Model(torch.nn.Module):
    def __init__(self):
        super(Model, self).__init__()

    def forward(self, x):
        x = x.to(memory_format=torch.contiguous_format)
        return x
```

the generated code doesn't do the memory format change and gets a wrong stride **(802816, 1, 14336, 256)**, which is not a contiguous stride.

```
from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, = args
    args.clear()
    return (arg0_1, )

if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((128, 256, 56, 56), (802816, 1, 14336, 256), device='cpu', dtype=torch.float32)
    print_performance(lambda: call([arg0_1]))
```

After this PR, the generated code will do the memory format change:

```
from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

kernel_cpp_0 = async_compile.cpp('''
#include "/tmp/torchinductor_xiaobing/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
extern "C" void kernel(const float* __restrict__ in_ptr0,
                       float* __restrict__ out_ptr0)
{
    #pragma omp parallel num_threads(40)
    {
        {
            #pragma omp for
            for(long i0=0; i0<128; i0+=1)
            {
                #pragma GCC ivdep
                for(long i1=0; i1<256; i1+=1)
                {
                    #pragma GCC ivdep
                    for(long i2=0; i2<3136; i2+=1)
                    {
                        auto tmp0 = in_ptr0[i1 + (256*i2) + (802816*i0)];
                        out_ptr0[i2 + (3136*i1) + (802816*i0)] = tmp0;
                    }
                }
            }
        }
    }
}
''')

async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, = args
    args.clear()
    buf1 = empty_strided((128, 256, 56, 56), (802816, 3136, 56, 1), device='cpu', dtype=torch.float32)
    kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf1.data_ptr()))
    del arg0_1
    return (buf1, )

if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((128, 256, 56, 56), (802816, 1, 14336, 256), device='cpu', dtype=torch.float32)
    print_performance(lambda: call([arg0_1]))
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91948
Approved by: https://github.com/ngimel
2023-01-11 08:23:26 +00:00
min-jean-cho
364f526b9c [Inductor] assert generator for random, dropout (#91833)
See comment https://github.com/pytorch/pytorch/pull/90869#discussion_r1063731541 , https://github.com/pytorch/pytorch/pull/91673#discussion_r1061099337.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91833
Approved by: https://github.com/jansel
2023-01-11 03:24:10 +00:00
PyTorch MergeBot
43050b8301 Revert "[Inductor] Added aten.uniform_ decomp (#90869)"
This reverts commit c55293d640.

Reverted https://github.com/pytorch/pytorch/pull/90869 on behalf of https://github.com/huydhn due to Crossref error cannot just simply be ignored because it would break trunk for every commit after this, i.e. fd0030fe74.  The failure would need to be handled gracefully, e.g. by adding an XFAIL
2023-01-11 01:18:11 +00:00
min-jean-cho
c55293d640 [Inductor] Added aten.uniform_ decomp (#90869)
Fixes #90815

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90869
Approved by: https://github.com/jgong5, https://github.com/jansel, https://github.com/lezcano, https://github.com/ngimel, https://github.com/albanD
2023-01-10 23:05:01 +00:00
Nikita Karetnikov
00e5f3a9c5 [primTorch] Move logsumexp decomp to refs (#91860)
Fixes #91843.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91860
Approved by: https://github.com/lezcano
2023-01-09 17:00:43 +00:00
Natalia Gimelshein
2c00064113 remove unnecessary decomps (#91828)
in favor of refs. Generated triton code is the same.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91828
Approved by: https://github.com/lezcano, https://github.com/soumith
2023-01-07 20:37:12 +00:00
PyTorch MergeBot
c73147f741 Revert "[decomp] Use new squeeze.dims overload in decompositions (#91602)"
This reverts commit 9262ffc692.

Reverted https://github.com/pytorch/pytorch/pull/91602 on behalf of https://github.com/clee2000 due to stacked pr was reverted, this is dependent
2023-01-05 20:39:52 +00:00
Peter Bell
9262ffc692 [decomp] Use new squeeze.dims overload in decompositions (#91602)
This removes the now-redundant `_squeeze_multiple` helpers and instead decomposes into a single call to `aten::squeeze.dims` which also has the effect of reducing the lowered graph size in inductor.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91602
Approved by: https://github.com/ngimel
2023-01-05 17:59:32 +00:00
lezcano
484dd40022 Implement PReLU in a compositional way (#91238)
The PReLU implementation was all over the place. This led to a number
of bugs like https://github.com/pytorch/pytorch/issues/68760.  We fix it by:
- Keeping the weird broadcasting logic it has as a CompositeImplicit kernel that calls into a second kernel
- This second kernel is just a good-ol' pointwise kernel (its definition is recalled at the end of this message).
- We implement the derivative for the pointwise kernel via TI (TensorIterator) as well, for speed.
- We implement the second derivative for the pointwise kernel and the forward AD derivatives compositionally

This fixes a number of issues:
- We don't perform copies any more when the inputs are not contiguous
- The derivatives are now correct
- We fix vmap and many other functorch-related issues.
- CPU and CUDA now share the relevant broadcasting logic
- The implementation is about 1/3 the length.

Fixes https://github.com/pytorch/pytorch/issues/68760
Fixes https://github.com/pytorch/pytorch/issues/89895
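For reference, the pointwise definition the second kernel computes (standard PReLU semantics; the snippet is illustrative, not the new kernel):

```python
import torch

def prelu_pointwise(x, w):
    # PReLU(x) = x if x >= 0 else w * x
    return torch.where(x >= 0, x, w * x)

x = torch.randn(4, 3)
w = torch.full((3,), 0.25)   # one weight per channel (dim 1 for a 2-D input)
assert torch.allclose(prelu_pointwise(x, w), torch.nn.functional.prelu(x, w))
```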

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91238
Approved by: https://github.com/kshitij12345, https://github.com/jbschlosser, https://github.com/albanD
2022-12-30 10:42:30 +00:00
Joel Schlosser
8b55b86dbd Move sym_int and sym_float alongside SymInt / SymFloat in base torch package (#91317)
This PR moves the definitions for:
* `sym_int`
* `sym_ceil` (used only for `sym_int`)
* `sym_floor` (used only for `sym_int`)
* `sym_float`

from `torch/fx/experimental/symbolic_shapes.py` to `torch/__init__.py`, where `SymInt` and `SymFloat` are already defined.

This removes the need for several in-line imports, and enables proper JIT script gating for #91318. I'm very open to doing this in a better way!

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91317
Approved by: https://github.com/ezyang, https://github.com/anijain2305
2022-12-28 16:08:16 +00:00
Joel Schlosser
1c40ec46ff Decomps and meta registrations for upsample_nearest 1D / 2D / 3D (#91260)
Adds decompositions and meta registrations for the 1D, 2D, and 3D implementations of `upsample_nearest`. All related OpInfo-based tests for AOTAutograd now pass.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91260
Approved by: https://github.com/ezyang
2022-12-28 16:03:25 +00:00
Nikita Shulga
fd3a7264ae [MPS] Add group_norm[fwd+backward] and mean_var (take 2) (#91190)
Use Prims to implement group_norm, group_norm_backward and mean_var

Use `torch._ops.ops` instead of `torch.ops` in numerous subpackages in
order to be able to make them importable from `torch/backends/mps/__init__.py`, as the `torch.ops` alias, defined in
15af4b1cee/torch/__init__.py (L1095),
is executed last during the init process.

Add `__all__` to `torch/backends/mps/__init__.py` as well as alias all imports as private

Add `TestNNMPS.test_group_norm_backward` that validates no NaNs are generated during the backward pass

Fixes https://github.com/pytorch/pytorch/issues/88331
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91190
Approved by: https://github.com/albanD
2022-12-22 08:54:37 +00:00
PyTorch MergeBot
645eda0a00 Revert "[MPS] Add group_norm[fwd+backward] and mean_var (#91190)"
This reverts commit 371716eb36.

Reverted https://github.com/pytorch/pytorch/pull/91190 on behalf of https://github.com/kit1980 due to Broke test_correct_module_names because of underscore _ops
2022-12-21 19:37:43 +00:00
Nikita Shulga
371716eb36 [MPS] Add group_norm[fwd+backward] and mean_var (#91190)
Use Prims to implement group_norm, group_norm_backward and mean_var

Use `torch._ops.ops` instead of `torch.ops` in numerous subpackages in
order to be able to make them importable from `torch/backends/mps/__init__.py`, as the `torch.ops` alias, defined in
15af4b1cee/torch/__init__.py (L1095),
is executed last during the init process.

Depends on https://github.com/pytorch/pytorch/pull/91203

Fixes https://github.com/pytorch/pytorch/issues/88331
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91190
Approved by: https://github.com/albanD
2022-12-21 17:33:27 +00:00
Nikita Shulga
46f64117db [BE] Use aten global var (#91188)
s/torch.ops.aten/aten/
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91188
Approved by: https://github.com/ngimel
2022-12-21 02:28:51 +00:00
Peter Bell
e670c261c5 Decompose fill, zero, and zeros_like (#90968)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90968
Approved by: https://github.com/ngimel
2022-12-21 00:59:50 +00:00
Natalia Gimelshein
e689c50922 Don't recompute var in bn decomp (#90984)
Fixes https://github.com/pytorch/torchdynamo/issues/1988
Repeated `var` computation is not CSE'd for some reason.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90984
Approved by: https://github.com/Chillee
2022-12-16 21:38:49 +00:00
Brian Hirsh
7a683eaeb8 aot_autograd: add assert for functional-only graph (#88816)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88816
Approved by: https://github.com/ezyang, https://github.com/ngimel
2022-12-16 21:04:36 +00:00
soulitzer
98a9235dce Fix prelu ref when a.ndim < 2 (#89809)
Fixes https://github.com/pytorch/pytorch/issues/89560

Previously the test case for "input is 1-D or scalar + weight is not scalar" did not exist; adding it introduced some failures:
- forward AD (fixed in this PR)
- vmap (filed https://github.com/pytorch/pytorch/issues/89895)
- ref/meta (fixed in this PR, though this also regresses nvFuser support)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89809
Approved by: https://github.com/ngimel
2022-12-12 23:55:31 +00:00
Bin Bao
282dfe8ba4 [inductor][Reland] Use decomposition for _to_copy (#90494)
Summary: also contains a fix for https://github.com/pytorch/pytorch/issues/89633

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90494
Approved by: https://github.com/ngimel
2022-12-09 16:51:50 +00:00
PyTorch MergeBot
e89685b0b5 Revert "[inductor] Use decomposition for _to_copy (#90314)"
This reverts commit 3fdb5f2dda.

Reverted https://github.com/pytorch/pytorch/pull/90314 on behalf of https://github.com/desertfire due to regresses performance on hf_Bert
2022-12-08 18:29:06 +00:00
Bin Bao
3fdb5f2dda [inductor] Use decomposition for _to_copy (#90314)
Summary: also contains a fix for https://github.com/pytorch/pytorch/issues/89633

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90314
Approved by: https://github.com/ngimel
2022-12-08 15:25:44 +00:00
Peter Bell
e6a7278753 Give std/var correction overloads proper defaults (#56398)
The `correction` overloads' defaults were left off for forward
compatibility (FC) reasons, but this FC window expired well over a year ago
at this point.

Differential Revision: [D29625593](https://our.internmc.facebook.com/intern/diff/D29625593)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56398
Approved by: https://github.com/mruberry
2022-12-07 15:15:00 +00:00
Yanbo Liang
25f39c1bce Fix uniform ref implementation (#90094)
Fixes https://github.com/pytorch/torchdynamo/issues/1954

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90094
Approved by: https://github.com/ngimel
2022-12-06 21:28:17 +00:00
Animesh Jain
c1950620c5 [decomp] Fix native_batch_norm_backward dtype of dweight and dbias (#89740)
Discovered while debugging an accuracy issue for Inductor.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89740
Approved by: https://github.com/soumith, https://github.com/ngimel
2022-11-29 03:15:20 +00:00
Brian Hirsh
e20ec44544 fixes for inductor <> batch norm (#89603)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89603
Approved by: https://github.com/albanD
2022-11-29 02:16:52 +00:00
Jane Xu
8695f0cced Rectify native_batch_norm schema by splitting it into two legit schemas (#88697)
Using the same repro from the issue (but with BatchNorm2D)

Rectifies native_batch_norm schema by splitting the schema into 2:
1. one will have NON-optional alias-able running_mean and running_var inputs
2. the other will just not have those parameters at all (no_stats variation)

**Calling for name suggestions!**

## test plan
I've added tests in test_functionalization.py as well as an entry in common_method_invocations.py for `native_batch_norm_legit`
CI should pass.

## next steps
For BC/FC reasons, we reroute native_batch_norm to call our new schemas ONLY through the Python dispatcher, but in 2 weeks or so, we should make `native_batch_norm_legit` the official batch_norm.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88697
Approved by: https://github.com/albanD
2022-11-23 23:23:17 +00:00
Elias Ellison
a8d6b82167 Fix norm decomp when dtype is passed in (#89508)
Fix for https://github.com/pytorch/torchdynamo/issues/1889. The wrapper was doing a downcast even when the dtype was explicitly passed in.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89508
Approved by: https://github.com/anijain2305
2022-11-23 20:49:09 +00:00
Elias Ellison
72110d7833 Fix Upsample Decomp Striding For Small Channels (#89528)
Fix for https://github.com/pytorch/torchdynamo/issues/623.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89528
Approved by: https://github.com/ngimel, https://github.com/anijain2305
2022-11-23 20:47:39 +00:00
lezcano
154e58c032 Add most in-place references/decompositions (#88117)
We add most in-place references in a generic way. We also implement a
wrapper that handles the awkward interface that `nn.functional`
nonlinearities have.

Along the way, we fix a couple of decompositions for some non-linearities by
extending the arguments that the references accept.
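A minimal sketch of what a generic in-place-from-out-of-place wrapper can look like (the names here are illustrative assumptions, not the PR's actual helpers):

```python
import torch

def make_inplace(out_of_place_fn):
    def inplace_fn(a, *args, **kwargs):
        # Compute out of place, then write the result back into the input tensor.
        return a.copy_(out_of_place_fn(a, *args, **kwargs))
    return inplace_fn

relu_ = make_inplace(torch.relu)   # stand-in for a reference implementation
x = torch.randn(4)
relu_(x)                           # x is now modified in place
```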
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88117
Approved by: https://github.com/mruberry
2022-11-18 14:59:46 +00:00
lezcano
3320915303 Fix decomp for embedding_backward and simplify the decomposition of embedding_dense and embedding_dense_backward (#87204)
See the title

Pull Request resolved: https://github.com/pytorch/pytorch/pull/87204
Approved by: https://github.com/Chillee
2022-11-16 17:46:54 +00:00
Sherlock Huang
5faa2792fa Symintify decomps for split and upsample_bilinear; Fix decomp for _softmax_backward_data and native_dropout_backward (#88761)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88761
Approved by: https://github.com/ezyang
2022-11-15 13:34:45 +00:00
PyTorch MergeBot
eea506aee1 Revert "Symintify decomps for split and upsample_bilinear; Fix decomp for _softmax_backward_data and native_dropout_backward (#88761)"
This reverts commit 9eabcc370f.

Reverted https://github.com/pytorch/pytorch/pull/88761 on behalf of https://github.com/suo due to much broken 9eabcc370f
2022-11-14 01:58:47 +00:00
Sherlock Huang
9eabcc370f Symintify decomps for split and upsample_bilinear; Fix decomp for _softmax_backward_data and native_dropout_backward (#88761)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88761
Approved by: https://github.com/ezyang
2022-11-13 21:30:53 +00:00
Horace He
37c5b42fa6 Fix matmul decomp to use reshape instead of contiguous().view() (#88832)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88832
Approved by: https://github.com/bertmaher, https://github.com/ngimel
2022-11-12 00:15:42 +00:00
Ryan Spring
534ae6ae47 [primTorch] Implement group norm reference (#87054)
Add group norm reference
Split from #81191
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87054
Approved by: https://github.com/mruberry
2022-11-11 01:08:20 +00:00
Sherlock Huang
c00c34fb69 Fix meta for aten.upsample_bilinear2d.vec (#88158)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88158
Approved by: https://github.com/ngimel
2022-11-02 16:58:29 +00:00
Sherlock Huang
de1f641f11 Fix meta function for aten.addmm (#88068)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88068
Approved by: https://github.com/albanD
2022-11-01 17:05:48 +00:00
lezcano
fd27246c16 Fix decomposition for std (#87181)
The previous implementation was lacking a few features and incurred a
pretty large error.

cc @ezyang @mruberry @ngimel @Lezcano @fdrocha
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87181
Approved by: https://github.com/ngimel, https://github.com/peterbell10
2022-10-28 00:50:29 +00:00
Sherlock Huang
eb99c1efce Prefer python meta function over c++ meta function (#87426)
This is a policy update for meta registration. **We now prefer the Python meta implementation over the C++ meta function.**  This is a flip of the previous policy, where we preferred the C++ meta function over the Python meta function if both existed.

Here's the meta registration process:
1. register_meta and register_decomposition will place the python meta/decomp functions into the `global_decomp_table`.  However, they will NOT register them into dispatcher.
2. After global_decomp_table is populated, we will compile an `active_meta_table`. For a given op, we pick the most specific decomp function from `global_decomp_table` in the preference order of Meta > PostAutograd > PreAutograd.
3. We will unconditionally register all of them into the Python dispatcher, and register them into the C++ dispatcher unless it is one of the following 3 cases:
- 1. the op is a CompositeImplicitAutograd, and should rely on decomposed op's meta
- 2. the op is a view op, as the MetaTensor doesn't support aliased storage
- 3. the op is in the blocklist (due to UT failures, and we will burn down this list op by op)

Over the long run, we wish to implement all meta functions in Python. With this PR, 321 op_overloads will have their C++ meta overridden by a Python meta. There are still 400 op_overloads using the C++ meta. The exact list can be found here https://gist.github.com/SherlockNoMad/d20bb736178df8eebd3b054c8bb7cdc5

cc @ngimel @jansel @lezcano @fdrocha @mlazos @soumith @voznesenskym @yanboliang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87426
Approved by: https://github.com/ezyang, https://github.com/jansel
2022-10-25 16:49:02 +00:00
Ryan Spring
9bb4926de0 Add xlogy and xlog1py references (#77712)
* Add reference implementations for `xlogy` and `xlog1py` (standard definitions recalled below)
* Replace the `_wrap_scalar` helper function with the `scalar_tensor` prim
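The standard definitions, for context (not the new reference code itself):

```python
import torch

# xlogy(x, y)   = 0 if x == 0, else x * log(y)
# xlog1py(x, y) = 0 if x == 0, else x * log1p(y)
x = torch.tensor([0.0, 2.0])
y = torch.tensor([0.0, 3.0])
print(torch.xlogy(x, y))            # tensor([0.0000, 2.1972]) -- 0 * log(0) is defined as 0
print(torch.special.xlog1py(x, y))  # tensor([0.0000, 2.7726])
```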
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77712
Approved by: https://github.com/mruberry
2022-10-22 17:59:25 +00:00
Edward Z. Yang
d73d4aa7de Audit for error prone isinstance int/float and add lint (#87345)
We recently fixed a bug on symbolic-shapes branch where
an isinstance(x, int) test failed when passed a SymIntNode.
To prevent this, I've added a lint for all the codepaths
where we may pass SymInt/SymFloat directly to reject
direct isinstance int/float tests, and instead use one of
the aliases.  The lint rule explains the options.  I then
go and fix all of them.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87345
Approved by: https://github.com/bdhirsh, https://github.com/albanD
2022-10-21 15:55:24 +00:00
Sherlock Huang
f7da9db9c1 Unify decomp registries into global_decomposition_table (#86857)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86857
Approved by: https://github.com/ezyang
2022-10-20 21:29:05 +00:00
Sherlock Huang
ef045695e0 Fix decomp for huber_loss_backward (#86955)
Fixes https://github.com/pytorch/pytorch/issues/86846

aten.huber_loss_backward calls aten.huber_loss_backward.out in its CompositeExplicitAutograd kernel.
The decomp was mistakenly registered for both aten.huber_loss_backward.default and aten.huber_loss_backward.out.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/86955
Approved by: https://github.com/Chillee
2022-10-14 18:53:02 +00:00
Nikita Karetnikov
4460e40db4 [primTorch] Add a ref for addcmul (#86731)
Based on:
https://github.com/pytorch/pytorch/pull/79827
https://github.com/pytorch/pytorch/pull/72949
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86731
Approved by: https://github.com/lezcano, https://github.com/mruberry
2022-10-14 14:26:23 +00:00
Brian Hirsh
e17732b234 [test] add cross-ref tests for python meta kernels (#86228)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86228
Approved by: https://github.com/albanD
2022-10-13 14:14:26 +00:00
Elias Ellison
d3f7c34cb3 Enable aten-aten decomps (#85921)
Invokes aten-aten decomps with re-entrant FakeMode. These decomps are being used in other places, so it's good to unify the path static fake tensor takes / get additional testing etc. There is also an instance where we return different devices with cpu/cuda which this fixes ([batch_norm](https://github.com/pytorch/pytorch/blob/master/torch/_decomp/decompositions.py#L1374))

Pull Request resolved: https://github.com/pytorch/pytorch/pull/85921
Approved by: https://github.com/ezyang
2022-10-08 05:12:42 +00:00
PyTorch MergeBot
7ec12a559c Revert "Enable aten-aten decomps (#85921)"
This reverts commit 62e4f51efd.

Reverted https://github.com/pytorch/pytorch/pull/85921 on behalf of https://github.com/huydhn due to Sorry for reverting your PR. I think it breaks a dynamo test in trunk 62e4f51efd
2022-10-08 01:59:54 +00:00
Elias Ellison
62e4f51efd Enable aten-aten decomps (#85921)
Invokes aten-aten decomps with re-entrant FakeMode. These decomps are being used in other places, so it's good to unify the path static fake tensor takes / get additional testing etc. There is also an instance where we return different devices with cpu/cuda which this fixes ([batch_norm](https://github.com/pytorch/pytorch/blob/master/torch/_decomp/decompositions.py#L1374))

Pull Request resolved: https://github.com/pytorch/pytorch/pull/85921
Approved by: https://github.com/ezyang
2022-10-07 21:04:39 +00:00
lezcano
28a0b3fb18 Fix col2im and im2col decompositions (#86426)
I threw in some tests for good measure.

Fixes https://github.com/pytorch/pytorch/issues/86332
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86426
Approved by: https://github.com/ngimel
2022-10-07 08:14:06 +00:00
Elias Ellison
9ceadcadb2 Fix unfold backward decomp aliasing for 0 dim input (#86428)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86428
Approved by: https://github.com/ngimel, https://github.com/ezyang
2022-10-07 03:55:31 +00:00
lezcano
b67e022833 Fix ref / decomposition index_add (#86266)
The decomposition of `index_add` was using `slice(None)`, when it should
use just `None`.

The reference for index_add was also wrong, as `x[idx] += t` does not
use atomic add, so it does not work when several `idx`s point to the
same location.

This PR adds extra reference inputs to help test for this.
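An illustration of the aliasing pitfall described above (not the PR's test inputs):

```python
import torch

x = torch.zeros(3)
idx = torch.tensor([0, 0])           # duplicate indices
t = torch.ones(2)

y = x.clone()
y[idx] += t                          # last write wins: y[0] == 1, no accumulation
z = x.index_add(0, idx, t)           # accumulates: z[0] == 2
print(y, z)
```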

Fixes https://github.com/pytorch/torchdynamo/issues/1356
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86266
Approved by: https://github.com/ngimel
2022-10-05 19:59:15 +00:00
lezcano
c609768896 Add refs for torch.unfold and a decomposition for its backward. (#85629)
It's not clear to me what the difference is between `unfold` and `unfold_copy`, as the latter is codegen'd.

I also took this chance to clean up the implementation of unfold and its reference.
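For context, `Tensor.unfold` extracts sliding windows along a dimension (standard semantics, not the new ref itself):

```python
import torch

x = torch.arange(7.)
print(x.unfold(0, 3, 2))   # windows of size 3 with step 2
# tensor([[0., 1., 2.],
#         [2., 3., 4.],
#         [4., 5., 6.]])
```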
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85629
Approved by: https://github.com/mruberry
2022-10-05 12:15:49 +00:00
Edward Z. Yang
d07b85393a SymInt fixes from symbolic-shapes branch (#86242)
symintify a few inplace meta functions

symintify resize_(), nbytes(), functionalization input mutations

meta funcs for avg_pool2d_backward
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86242
Approved by: https://github.com/Chillee
2022-10-05 04:52:02 +00:00
Peter Bell
b317736c39 Fix default correction value in std/var decompositions (#85839)
`torch.std` and `torch.var` default to the unbiased estimator, i.e.
`correction=1`. This only works as is because the default on this
overload is not exercised by the tests.
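A quick sanity check of that convention on a recent PyTorch build (illustrative only):

```python
import torch

x = torch.randn(100)
assert torch.allclose(x.var(), x.var(correction=1))                       # default: unbiased
assert torch.allclose(x.var(correction=0), ((x - x.mean()) ** 2).mean())  # biased estimator
```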
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85839
Approved by: https://github.com/ezyang
2022-10-04 23:23:39 +00:00
Horace He
82d9592f1b Batch of symintifications to allow more models to pass in inference (#86104)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86104
Approved by: https://github.com/ezyang
2022-10-04 04:01:58 +00:00
Horace He
37013bb443 Added _unsafe_view decomp (#86103)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86103
Approved by: https://github.com/ezyang
2022-10-03 20:38:31 +00:00
lezcano
07ce0b435b Remove backward for im2col and col2im (#85542)
`im2col` is a linear map, and `col2im` is its adjoint. As such, the
adjoint to `col2im` is `im2col` (the adjoint of the adjoint is the
original function).

There's no point having explicit derivatives in ATen for these
functions, so this PR deletes them all.

Furthermore, along the way, we fix an error for the derivative of im2col
for non-batched inputs.
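A numerical check of the adjointness claim, using `F.unfold` (im2col) and `F.fold` (col2im); the shapes below are arbitrary:

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 2, 6, 6)
y = torch.randn(1, 2 * 3 * 3, 16)   # matches the shape of F.unfold(x, kernel_size=3)

# <im2col(x), y> == <x, col2im(y)> holds when col2im is the adjoint of im2col.
lhs = (F.unfold(x, kernel_size=3) * y).sum()
rhs = (x * F.fold(y, output_size=(6, 6), kernel_size=3)).sum()
assert torch.allclose(lhs, rhs, atol=1e-4)
```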
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85542
Approved by: https://github.com/soulitzer, https://github.com/ngimel
2022-10-03 00:16:42 +00:00
Horace He
e6dd2965af A bunch of coverage improvements (for models in inference: snext50, BERT_pytorch, mobilenet_v3_large, pytorch_CycleGAN_and_pix2pix, dcgan, resnet18, mnasnet1_0) (#86050)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86050
Approved by: https://github.com/ezyang
2022-10-02 20:46:20 +00:00
lezcano
787028cadb Implement col2im decomposition and fix im2col and add a few preconditions (#85541)
As per title
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85541
Approved by: https://github.com/jansel
2022-09-30 09:31:53 +00:00
Elias Ellison
6a2b12dd65 Turn on aliasing tests for fake backwards, Fix Batch norm running mean/var decomp aliasing (#85471)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85471
Approved by: https://github.com/ezyang
2022-09-28 23:06:59 +00:00
Animesh Jain
796da4df4d Return contiguous tensor from softmax decomposition (#85788)
Fixes https://github.com/pytorch/torchdynamo/issues/1135

Softmax decomp's output stride does not match the aten softmax output stride. Not sure if it's desirable. Opening a PR for now.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/85788
Approved by: https://github.com/ngimel, https://github.com/ezyang
2022-09-28 20:52:45 +00:00
Nikita Karetnikov
8dd45424ea [primTorch] Add ref for huber_loss and error inputs (#85041)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85041
Approved by: https://github.com/lezcano, https://github.com/mruberry
2022-09-28 19:56:17 +00:00
Edward Z. Yang
793488cda2 Revert "Revert "Symintifying slice ops (#85196)"" (#85746)
This reverts commit 3a171dfb0c.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85746
Approved by: https://github.com/albanD
2022-09-28 04:37:35 +00:00
PyTorch MergeBot
3a171dfb0c Revert "Symintifying slice ops (#85196)"
This reverts commit 4c01c51266.

Reverted https://github.com/pytorch/pytorch/pull/85196 on behalf of https://github.com/atalman due to Breaks internal Executorch build
2022-09-27 18:01:27 +00:00
Fabio Rocha
d5ce2bbed2 [primTorch] decompositions for upsample_bicubic2d (#85403)
FYI, this decomposition seems to be significantly slower than the lowering in torchinductor:

```
------------------------------------- upsample_bicubic2d -------------------------------------]
                                                              |  lowering  |  Inductor  |  Eager
32 threads: ------------------------------------------------------------------------------------
      (torch.Size([16, 4, 128, 256]),), ((512, 1024), True)   |    1.8     |   3.880    |   1.4
      (torch.Size([16, 4, 128, 256]),), ((512, 1024), False)  |    1.9     |   3.887    |   1.4
```

This seems related to the fact that in the lowering we can use int32s as the indices and in the decomp we can only use int64s (see https://github.com/pytorch/torchdynamo/issues/1293).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/85403
Approved by: https://github.com/ngimel
2022-09-26 20:11:23 +00:00
Elias Ellison
bcc544e9d7 Add FakeCrossRef tests for backwards, Fix Layer Norm Backward Decomp (#85417)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85417
Approved by: https://github.com/ezyang
2022-09-26 17:08:14 +00:00
Fabio Rocha
ffaff8896a Removed None arg check in test/test_decomp.py (#85402)
Not sure why this check was necessary; tests seem to run fine without
it.
There were definitely tests this was skipping that it shouldn't have,
e.g. pretty much all of the tests for `torch.nn.functional.interpolate`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85402
Approved by: https://github.com/ezyang
2022-09-24 11:37:27 +00:00
Edward Z. Yang
4c01c51266 Symintifying slice ops (#85196)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85196
Approved by: https://github.com/ezyang
2022-09-23 22:01:32 +00:00
PyTorch MergeBot
d10de31cc8 Revert "Add FakeCrossRef tests for backwards, Fix Layer Norm Backward Decomp (#85417)"
This reverts commit 78afa0cf0c.

Reverted https://github.com/pytorch/pytorch/pull/85417 on behalf of https://github.com/clee2000 due to broke tests on trunk 78afa0cf0c
2022-09-23 17:21:43 +00:00
PyTorch MergeBot
3b195fd33e Revert "Turn on aliasing tests for fake backwards, Fix Batch norm running mean/var decomp aliasing (#85471)"
This reverts commit 1e92eb8068.

Reverted https://github.com/pytorch/pytorch/pull/85471 on behalf of https://github.com/clee2000 due to stacked prs https://github.com/pytorch/pytorch/pull/85417 and https://github.com/pytorch/pytorch/pull/85434 broke trunk, reverting this so i can revert the others
2022-09-23 17:13:35 +00:00
Elias Ellison
1e92eb8068 Turn on aliasing tests for fake backwards, Fix Batch norm running mean/var decomp aliasing (#85471)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85471
Approved by: https://github.com/ezyang
2022-09-23 16:02:15 +00:00
Elias Ellison
78afa0cf0c Add FakeCrossRef tests for backwards, Fix Layer Norm Backward Decomp (#85417)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85417
Approved by: https://github.com/ezyang
2022-09-23 15:50:03 +00:00
Ryan Spring
71dddec6ea Cast grad_input to half when input_dtype is half in _softmax_backward_data aten decomposition (#85497)
Fixes #85504

`_softmax_backward_data` and `_log_softmax_backward_data` cast `grad_input` to half when the `input_dtype` is half.
When running with amp without the cast, consumer ops can trigger `RuntimeError: expected scalar type Float but found Half`.
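A hedged sketch of the shape of the fix (standard softmax backward formula; not the exact decomposition):

```python
import torch

def softmax_backward_data_sketch(grad_output, output, dim, input_dtype):
    grad_input = output * (grad_output - (output * grad_output).sum(dim, keepdim=True))
    if grad_input.dtype != input_dtype:
        grad_input = grad_input.to(input_dtype)   # e.g. float32 -> float16 under amp
    return grad_input
```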

https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/SoftMax.cpp#L70-L83
https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/SoftMax.cpp#L102-L113

Pull Request resolved: https://github.com/pytorch/pytorch/pull/85497
Approved by: https://github.com/ngimel
2022-09-23 06:52:38 +00:00
PyTorch MergeBot
5043457a8e Revert "Add FakeCrossRef tests for backwards, Fix Layer Norm Backward Decomp (#85417)"
This reverts commit 9c77083965.

Reverted https://github.com/pytorch/pytorch/pull/85417 on behalf of https://github.com/clee2000 due to broke tests on trunk (and pull somehow) 9c77083965
2022-09-22 15:44:38 +00:00
Elias Ellison
9c77083965 Add FakeCrossRef tests for backwards, Fix Layer Norm Backward Decomp (#85417)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85417
Approved by: https://github.com/ezyang
2022-09-22 13:03:57 +00:00
Horace He
2f4a517d67 Ported matmul compositeimplicitautograd impl into core (#85239)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85239
Approved by: https://github.com/ezyang, https://github.com/lezcano
2022-09-21 09:25:24 +00:00
lezcano
d17b144e65 Adding multigammaln ref and fix arange (#85153)
Partially based on https://github.com/pytorch/pytorch/pull/83662.

I'll help land this one, as Rob does not work in the PyTorch project
anymore

I removed the data-dependent check for the args, as data dependencies
are bad for many reasons (and it was failing when the input has NaNs).

It also registers arange as a decomposition, and fixes the naming of its
args.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85153
Approved by: https://github.com/mruberry, https://github.com/ngimel
2022-09-20 17:52:56 +00:00
lezcano
5dd9610e9d Refs and decompositions for index_{add,copy,select,fill} (#85002)
As per title
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85002
Approved by: https://github.com/ngimel
2022-09-17 19:57:34 +00:00
PyTorch MergeBot
e33b464ffc Revert "Refs and decompositions for index_{add,copy,select,fill} (#85002)"
This reverts commit 2f0b3de443.

Reverted https://github.com/pytorch/pytorch/pull/85002 on behalf of https://github.com/huydhn due to Broke trunk slow tests
2022-09-17 04:26:04 +00:00
lezcano
2f0b3de443 Refs and decompositions for index_{add,copy,select,fill} (#85002)
As per title
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85002
Approved by: https://github.com/ngimel
2022-09-16 23:59:35 +00:00
Sherlock Huang
29eba319b4 Use alias for nop decomp (#84727)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84727
Approved by: https://github.com/Chillee
2022-09-16 18:50:56 +00:00
Natalia Gimelshein
6162a04364 fix half_to_float arg in *softmax decomp (#85120)
Fixes https://github.com/pytorch/torchdynamo/issues/1239

Pull Request resolved: https://github.com/pytorch/pytorch/pull/85120
Approved by: https://github.com/Chillee
2022-09-16 15:54:50 +00:00
Horace He
1459a909b4 Added mv, mm, and binary_cross_entropy_with_logits decomps (#84451)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84451
Approved by: https://github.com/ngimel
2022-09-08 17:56:18 +00:00
Ivan Yashchuk
6363b1b358 Add nvFuser support for aten.native_batch_norm_backward (#84546)
Replacing `tensor.reshape(broadcast_mask)` with unsqueezes makes the implementation of `batch_norm_backward` more friendly for PrimTorch+nvFuser.
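Illustration of the equivalence for an NCHW per-channel statistic (shapes assumed, not the PR's code):

```python
import torch

mean = torch.randn(3)                                  # per-channel statistic, C = 3
a = mean.reshape([1, 3, 1, 1])                         # the old broadcast_mask-style reshape
b = mean.unsqueeze(0).unsqueeze(-1).unsqueeze(-1)      # the unsqueeze-based equivalent
assert torch.equal(a, b) and a.shape == (1, 3, 1, 1)
```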
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84546
Approved by: https://github.com/Chillee
2022-09-06 19:56:17 +00:00
Fabio Rocha
91a5f52f51 Decomp for nn.functional.grid_sampler_2d (#84350)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84350
Approved by: https://github.com/jansel, https://github.com/Lezcano
2022-09-05 21:33:26 +00:00
lezcano
3dfbf09afe Optimise the decomposition for adaptive_avg_pool2d wrt. TorchInductor (#84483)
This fixes some part of the implementation that did not work with
TorchInductor (e.g. the indices in TorchInductor need to be `int64`s,
while in PyTorch we can have `int32`s).

It also brings the performance of the kernel up to numbers similar to
those of the lowering (benchmarks below).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84483
Approved by: https://github.com/jansel
2022-09-02 22:25:09 +00:00
Sherlock Huang
ef3ab31f1c Decomp for aten.im2col (#84303)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84303
Approved by: https://github.com/jansel, https://github.com/ngimel
2022-09-01 00:06:35 +00:00
Nikita Karetnikov
71ce9cd072 [primTorch] Add decomp for soft_margin_loss (#83804)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83804
Approved by: https://github.com/Lezcano, https://github.com/ngimel
2022-08-31 17:39:34 +00:00
Nikita Shulga
b8e1c54f53 [Prim] Implement group_norm_backward (#84037)
Test plan: CI, i.e. `python3 test_decomp.py -v -k test_comprehensive_nn_functional_group_norm` plus:
```
#!/usr/bin/env python3.8
import torch

func = torch.ops.aten.native_group_norm_backward.default
decomp =  torch._decomp.decomposition_table[func]
for args in (
        (torch.rand(1, 6, 3), torch.rand(1, 6, 3), torch.rand(1, 2), torch.rand(1, 2), torch.rand(6), 1, 6, 3, 2, [True, True, True]),
        (torch.rand(64, 768, 7, 7), torch.rand(64, 768, 7, 7), torch.rand(64, 1), torch.rand(64, 1), torch.rand(768), 64, 768, 49, 1, [True, True, True])):
    nrc=func(*args)
    drc=decomp(*args)
    for i in range(len(nrc)):
       print(i, torch.max(nrc[i]-drc[i]))
    print(all(torch.allclose(x, y) for (x, y) in zip(nrc, drc)))
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84037
Approved by: https://github.com/Chillee, https://github.com/ngimel
2022-08-29 09:29:30 +00:00
Natalia Gimelshein
533203f5aa _to_copy decomp (#84108)
Per title
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84108
Approved by: https://github.com/Chillee
2022-08-29 02:25:02 +00:00
lezcano
9fc02f6bc5 Decomposition for adaptive_avg_pool2d (#84062)
This was already implemented as a lowering in https://github.com/pytorch/torchdynamo/pull/962. I'm putting the idea up here ~(I haven't even run this code, so it surely has *many* issues, but I reckon the general idea should hopefully be alright).~ The tests now pass and I corrected the issues that the first implementation had.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84062
Approved by: https://github.com/jansel
2022-08-29 01:38:51 +00:00
PyTorch MergeBot
33db5da4c1 Revert "[Prim] Implement group_norm_backward (#84037)"
This reverts commit bed85cce8b.

Reverted https://github.com/pytorch/pytorch/pull/84037 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally
2022-08-28 17:30:50 +00:00
PyTorch MergeBot
ff23f3ac1c Revert "_to_copy decomp (#84108)"
This reverts commit e33897cb99.

Reverted https://github.com/pytorch/pytorch/pull/84108 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally
2022-08-28 13:27:49 +00:00
Natalia Gimelshein
e33897cb99 _to_copy decomp (#84108)
Per title
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84108
Approved by: https://github.com/Chillee
2022-08-27 03:51:03 +00:00
Nikita Shulga
bed85cce8b [Prim] Implement group_norm_backward (#84037)
Test plan: CI, i.e. `python3 test_decomp.py -v -k test_comprehensive_nn_functional_group_norm` plus:
```
#!/usr/bin/env python3.8
import torch

func = torch.ops.aten.native_group_norm_backward.default
decomp =  torch._decomp.decomposition_table[func]
for args in (
        (torch.rand(1, 6, 3), torch.rand(1, 6, 3), torch.rand(1, 2), torch.rand(1, 2), torch.rand(6), 1, 6, 3, 2, [True, True, True]),
        (torch.rand(64, 768, 7, 7), torch.rand(64, 768, 7, 7), torch.rand(64, 1), torch.rand(64, 1), torch.rand(768), 64, 768, 49, 1, [True, True, True])):
    nrc=func(*args)
    drc=decomp(*args)
    for i in range(len(nrc)):
       print(i, torch.max(nrc[i]-drc[i]))
    print(all(torch.allclose(x, y) for (x, y) in zip(nrc, drc)))
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84037
Approved by: https://github.com/Chillee, https://github.com/ngimel
2022-08-27 01:10:27 +00:00
Horace He
9a236c7ab4 Made some minor cleanups to decompositions (#83814)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83814
Approved by: https://github.com/ngimel
2022-08-26 10:55:31 +00:00
Animesh Jain
e2f75d63d4 Decomposition - batch_norm, save_mean and save_variance always float32 (#84013)
AMP error shown here - https://github.com/pytorch/torchdynamo/issues/835

Test missing
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84013
Approved by: https://github.com/ezyang
2022-08-25 16:09:52 +00:00
Ivan Yashchuk
473b733bae Replace .new_zeros(()) with 0.0 in torch/_decomp/decompositions (#83734)
`new_zeros` is decomposed into `prims.empty_strided`+`prims.fill`+`prims.copy_to`, and none of these is currently supported by the prims+nvFuser executor.
Replacing it with 0.0 makes these backward decompositions nvFuser-friendly.
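A hypothetical before/after sketch of the kind of change described (a hardsigmoid_backward-style decomposition; not the exact diff):

```python
import torch

def hardsigmoid_backward_before(grad, a):
    mask = (a > -3.0) & (a < 3.0)
    return torch.where(mask, grad * (1.0 / 6.0), grad.new_zeros(()))  # materializes a 0-dim tensor

def hardsigmoid_backward_after(grad, a):
    mask = (a > -3.0) & (a < 3.0)
    return torch.where(mask, grad * (1.0 / 6.0), 0.0)                 # plain Python scalar, prim-friendly
```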

Example with `torch.ops.aten.hardsigmoid_backward.default`:
```py
# Before this PR
opcode         name                      target                            args                                                          kwargs
-------------  ------------------------  --------------------------------  ------------------------------------------------------------  ----------------------------------------------------------------------------------------
placeholder    a_1                       a_1                               ()                                                            {}
placeholder    g_1                       g_1                               ()                                                            {}
call_function  gt_default                nvprims.gt.default                (a_1, -3.0)                                                   {}
call_function  lt_default                nvprims.lt.default                (a_1, 3.0)                                                    {}
call_function  bitwise_and_default       nvprims.bitwise_and.default       (gt_default, lt_default)                                      {}
call_function  mul_default               nvprims.mul.default               (g_1, 0.16666666666666666)                                    {}
call_function  empty_strided             prims.empty_strided.default       ([], [])                                                      {'dtype': torch.float32, 'device': device(type='cuda', index=0), 'requires_grad': False}
call_function  fill_default              prims.fill.default                (empty_strided, 0)                                            {}
call_function  copy_to_default           prims.copy_to.default             (empty_strided, fill_default)                                 {}
call_function  broadcast_in_dim_default  nvprims.broadcast_in_dim.default  (copy_to_default, [3, 2], [])                                 {}
call_function  where_default             nvprims.where.default             (bitwise_and_default, mul_default, broadcast_in_dim_default)  {}
output         output                    output                            (where_default,)                                              {}

# After this PR
opcode         name                 target                       args                                     kwargs
-------------  -------------------  ---------------------------  ---------------------------------------  --------
placeholder    a_1                  a_1                          ()                                       {}
placeholder    g_1                  g_1                          ()                                       {}
call_function  gt_default           nvprims.gt.default           (a_1, -3.0)                              {}
call_function  lt_default           nvprims.lt.default           (a_1, 3.0)                               {}
call_function  bitwise_and_default  nvprims.bitwise_and.default  (gt_default, lt_default)                 {}
call_function  mul_default          nvprims.mul.default          (g_1, 0.16666666666666666)               {}
call_function  where_default        nvprims.where.default        (bitwise_and_default, mul_default, 0.0)  {}
output         output               output                       (where_default,)                         {}
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/83734
Approved by: https://github.com/Chillee
2022-08-22 09:12:13 +00:00
Edward Z. Yang
02581f053b Address CR comments for "Delete ProxyTensor wrapper subclass" (#83646)
CR is on https://github.com/pytorch/pytorch/pull/83330

- Factor proxy slot getters/setters into helper functions
- Use a weak map for storing proxies, so they go away when
  tracing is done
- More documentation on SymDispatchMode

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83646
Approved by: https://github.com/Chillee
2022-08-18 22:18:09 +00:00
Edward Z. Yang
817a82704f Delete ProxyTensor wrapper subclass (#83330)
I was working on https://github.com/pytorch/torchdynamo/issues/80 and my
working hypothesis for what was causing the error was that proxy tensor
was not advertising correct dispatch keys, causing AMP to operate
differently when you traced.  I could have fixed this directly by
replicating fake tensor's fix for setting dispatch keys to also apply to
proxy tensor, but I was like, "Why must I repeat myself."

This PR is the result.  It completely deletes the ProxyTensor wrapper
subclass, so that when we are tracing, the tensors flowing through the
program are the *original* real or fake tensors, depending on what the
user requested in the top-level API.  There is no more wrapping.  To
store the Proxy objects necessary for actually doing tracing, I store
the property directly on the tensors.  (Note: I never
clean up old entries from the map at the moment, this is easily fixed
by using a weak map)

Benefits of doing this:

* No more tip-toeing around no_dispatch() creation of new ProxyTensors;
  we never create new tensors (except when we call the underlying func),
  so you don't have to worry about accidentally tracing them.

* No more syncing up metadata from in place operators.  In particular
  https://github.com/pytorch/pytorch/issues/81526 is mooted

* This fixes https://github.com/pytorch/torchdynamo/issues/519 as we no longer need to teach proxy tensor to support sparse tensor.

* No more schlepping symbolic integers from the inner fake tensor to the
  outer proxy tensor.  If you can make a fake tensor with symbolic ints,
  you're done, nothing else to do.

To avoid having to rewrite all of the guts, when I get to the actual
proxy tensor handler, I first "fetch" the stored ProxyTensor data from
the weakmap via a tree_map, and then operate on the consequent data as
before.  A more optimized implementation is possible.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83330
Approved by: https://github.com/Chillee
2022-08-18 01:56:07 +00:00
Nikita Karetnikov
cd86d25515 [primTorch] Move addcdiv from decompositions -> refs (#80842)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80842
Approved by: https://github.com/Lezcano, https://github.com/ngimel
2022-08-16 17:23:00 +00:00
Horace He
f02f304657 Added nll_loss_forward decomposition + some other minor decomps (#83235)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83235
Approved by: https://github.com/ngimel
2022-08-13 10:24:58 +00:00
Natalia Gimelshein
112ec24f09 Fix device behavior for masked_fill (#82737)
Fixes #81018, based on #81036.
It will create a graph break for a CPU 0-d tensor `value` due to the `.item()` call (we could maybe specialize on that instead of breaking?), but otherwise it would create a graph break due to the synchronizing `to` call, so there's no way around it :-(. For a number `value` argument we should already be specializing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82737
Approved by: https://github.com/Chillee
2022-08-04 15:47:56 +00:00
Brian Hirsh
4a77bee661 prevent python view impls from getting registered to the meta key (#82007)
We don't want to register view ops in python to the `Meta` dispatch key, because doing that prevents us from correctly aliasing storage information. This PR fixes the existing python registrations, and makes it an error to do that in the future. Example:
```
with FakeTensorMode.push() as mode:
    b = torch.ones(2)
    c = b.unsqueeze(-1)
    b_ = StorageWeakRef(b.storage())
    c_ = StorageWeakRef(c.storage())
    print(b_.cdata)
    print(c_.cdata)  # their storages are different (now fixed in this PR)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/82007
Approved by: https://github.com/ezyang, https://github.com/eellison
2022-07-27 17:15:05 +00:00
Shangdi Yu
9088757cc6 move aten.native_batch_norm_backward decomposition to core (#81522)
Move  aten.native_batch_norm_backward decomposition from  https://github.com/pytorch/functorch/blob/main/functorch/_src/decompositions.py#L148.

Changed to not recompute mean and invstd, added type cast.

In functorch, changed `@register_decomposition_for(aten.native_batch_norm_backward)` to `@register_decomposition_for_jvp(aten.native_batch_norm_backward)`

Passing `pytest test/test_decomp.py -k norm`

Note that when the output mask is False for grad_weight and grad_bias, we should return None to be consistent with the non-decomposed operator's behavior. But "None" doesn't work with vjp, so the version of decomposition in functorch used zeros. See b33c1f7dd4/functorch/functorch/_src/decompositions.py (L210).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81522
Approved by: https://github.com/Chillee
2022-07-27 06:11:34 +00:00
lezcano
11fe277b62 [PrimTorch] Add reference for torch.norm (#81765)
This ref does more than `torch.norm`, and it fixes a few bugs
that `torch.norm` has. This implementation and the `torch.norm`
implementation are reconciled in the next PR of this stack.

We put this PR first, as otherwise `test_decomp.py` was failing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81765
Approved by: https://github.com/ngimel
2022-07-25 19:57:21 +00:00
Vivek Khandelwal
cb63ffc553 Add decomposition for aten.upsample_bilinear2d.vec (#80964)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80964
Approved by: https://github.com/jansel, https://github.com/Chillee
2022-07-23 02:22:15 +00:00
Huy Do
12cb26509a Apply ufmt to torch internal (#81643)
This is a big-bang PR; merge conflicts are expected and will be addressed at merge time.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81643
Approved by: https://github.com/ezyang
2022-07-22 02:19:50 +00:00
Horace He
a5fb41e3d3 Revert "Revert "Refactored prim utils into _prims_utils folder (#81746)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81746
Approved by: https://github.com/anijain2305, https://github.com/Krovatkin
2022-07-20 23:43:57 +00:00
PyTorch MergeBot
e43a02c314 Revert "Refactored prim utils into _prims_utils folder (#81088)"
This reverts commit 80231d0a72.

Reverted https://github.com/pytorch/pytorch/pull/81088 on behalf of https://github.com/jeanschmidt due to breaking internal tests
2022-07-19 19:56:41 +00:00
Horace He
80231d0a72 Refactored prim utils into _prims_utils folder (#81088)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81088
Approved by: https://github.com/ngimel
2022-07-19 03:55:51 +00:00
Natalia Gimelshein
50d205c551 make clamp decomps use torch.* calls, move clamp_min/clamp_max to refs (#81619)
Per the title.
@chillee, is anything else necessary to remove a decomp other than decorating the ref with `register_decomposition`?

Pull Request resolved: https://github.com/pytorch/pytorch/pull/81619
Approved by: https://github.com/Chillee
2022-07-18 16:52:45 +00:00
Horace He
5139053e02 Fixed the decomposition for embedding_dense_backward (#81528)
There is no guarantee about the strides of `grad_output`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81528
Approved by: https://github.com/jansel
2022-07-15 17:51:00 +00:00
Edward Z. Yang
fca03eeec1 Make proxy tensor support item() calls on torch.tensor constants (#81192)
This PR is doing a few interrelated things, all of which are necessary to get correctness. Read the comment in torch/fx/experimental/proxy_tensor.py for the high level overview.

Let's break down the parts of this PR:

* Bug fix where `enable_torch_dispatch_mode` with `None` doesn't work. This makes `enable_torch_dispatch_mode(current_mode.inner)` work, which is the basis for how we temporarily disable fake tensor mode.
* Bug fix for when fake tensor mode is combined with a non-mode tensor subclass. This could actually be split out of this PR, but it affects where the logic for allowing non-fake-tensor inputs with lift goes, so it's all in here in one go. There are some relevant tests for the fix in fake tensor, but it turns out I didn't need this because I'm always using proxy tensors as a mode (which ensures the ordering is right).
* New `lift_fresh` view operator. Note that, like `lift`, we have to manually write the functionalize kernel for these functions.
* The actual change, which is to save constants when we see them in the proxy tensor mode, and then propagate them as we go (because otherwise you'll handle mutations on constants incorrectly; see the test).
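A minimal sketch of the behavior this enables, using `make_fx` from torch/fx/experimental/proxy_tensor.py (the exact entry point and printed graph are assumptions for illustration):
```
import torch
from torch.fx.experimental.proxy_tensor import make_fx

def f(x):
    scale = torch.tensor(2.0)   # tensor constant created inside the traced region
    return x * scale.item()     # .item() on the constant is now handled during tracing

gm = make_fx(f)(torch.randn(3))
print(gm.graph)
```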

This is mildly BC-breaking if anyone was previously interposing on
at::lift, but this operator was relatively new, and I checked
functorch, which has no explicit reference to lift, so it should
not be too disruptive.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81192
Approved by: https://github.com/samdow, https://github.com/bdhirsh
2022-07-15 03:53:40 +00:00
lezcano
b5b9db9f84 Make kl_div a composite function. (#80334)
Benchmarks: https://github.com/pytorch/pytorch/pull/80334#issuecomment-1167229285

Fixes https://github.com/pytorch/pytorch/issues/80158
Fixes https://github.com/pytorch/pytorch/issues/78867
Fixes https://github.com/pytorch/pytorch/issues/69230

Supersedes https://github.com/pytorch/pytorch/pull/79007
Supersedes https://github.com/pytorch/pytorch/pull/69212
Supersedes https://github.com/pytorch/pytorch/pull/19659
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80334
Approved by: https://github.com/ezyang
2022-07-13 20:07:36 +00:00
PyTorch MergeBot
f2c8557521 Revert "Make kl_div a composite function. (#80334)"
This reverts commit 828c787ea9.

Reverted https://github.com/pytorch/pytorch/pull/80334 on behalf of https://github.com/ezyang due to doesn't work with xla
2022-07-06 17:51:06 +00:00
lezcano
eb0889cf7d Add support for multiple inputs to out_wrapper and strict dtype checking (#80601)
Reland of https://github.com/pytorch/pytorch/pull/79941
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80601
Approved by: https://github.com/albanD
2022-07-05 12:31:21 +00:00
lezcano
828c787ea9 Make kl_div a composite function. (#80334)
Benchmarks: https://github.com/pytorch/pytorch/pull/80334#issuecomment-1167229285

Fixes https://github.com/pytorch/pytorch/issues/80158
Fixes https://github.com/pytorch/pytorch/issues/78867
Fixes https://github.com/pytorch/pytorch/issues/69230

Supersedes https://github.com/pytorch/pytorch/pull/79007
Supersedes https://github.com/pytorch/pytorch/pull/69212
Supersedes https://github.com/pytorch/pytorch/pull/19659
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80334
Approved by: https://github.com/ezyang
2022-07-04 19:33:43 +00:00
PyTorch MergeBot
184a065ba7 Revert "Add support for multiple inputs to out_wrapper and strict dtype checking (#79941)"
This reverts commit dc7066a8f0.

Reverted https://github.com/pytorch/pytorch/pull/79941 on behalf of https://github.com/suo due to broke master dc7066a8f0
2022-06-30 03:29:30 +00:00
lezcano
dc7066a8f0 Add support for multiple inputs to out_wrapper and strict dtype checking (#79941)
When a function returns multiple tensors in PyTorch, the `out`
parameter takes a tuple of tensors (see `linalg.svd` for example).
The current implementation in `out_wrapper_multi` modelled this incorrectly,
as it assumed that it would take a number of separately named
parameters.

This PR implements the correct behaviour in `out_wrapper`. As a small
side-effect, we now need to call `@out_wrapper()` when the output is
just one tensor.

This PR also implements an additional optional parameter that checks
whether the dtype of the given `out` is exactly the dtype that the meta
function requires. This is the behaviour that we currently have in
PyTorch, and this check is necessary in eager mode when we pass these
tensors into external libraries.

We also make the functions with several outputs return a namedtuple,
similar to what we do in PyTorch.
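For reference, a minimal sketch of the `out=` convention being modelled, using the `linalg.svd` example mentioned above:
```
import torch

A = torch.randn(4, 3)

# Functions with several outputs take `out=` as a tuple of tensors...
U, S, Vh = torch.empty(4, 4), torch.empty(3), torch.empty(3, 3)
result = torch.linalg.svd(A, out=(U, S, Vh))

# ...and return a namedtuple, which the wrapped references now mirror.
print(result.U.shape, result.S.shape, result.Vh.shape)
```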
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79941
Approved by: https://github.com/mruberry, https://github.com/ezyang
2022-06-30 02:47:16 +00:00
Horace He
d43e6c9f4a Revert "Revert "formatted _decomp folder with black""
This reverts commit 2027eae67c.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79226

Approved by: https://github.com/Krovatkin
2022-06-22 20:47:52 +00:00
Horace He
4193252de9 Revert "Revert "Added kl_div_backward decomp""
This reverts commit 60a13f4ec9.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79225

Approved by: https://github.com/Krovatkin
2022-06-22 18:09:52 +00:00
Horace He
e89676f76c fix logical_not reland issues
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79900

Approved by: https://github.com/ngimel
2022-06-21 03:41:18 +00:00
Nikita Shulga
f5eb05f107 Revert "Reland #2 of "Added {logical_not, trace} refs, moved logical ops to use method overloads""
This reverts commit f3665dd237.

Reverted https://github.com/pytorch/pytorch/pull/79819 on behalf of https://github.com/malfet due to land raced with softshrink refs
2022-06-20 14:22:15 -07:00
Horace He
f3665dd237 Reland #2 of "Added {logical_not, trace} refs, moved logical ops to use method overloads"
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79819

Approved by: https://github.com/mruberry
2022-06-20 19:50:43 +00:00
lezcano
16f30b494c Make l1_loss composite
Fixing the forward AD for `sgn` in the next PR of this stack uncovered a
number of issues with the derivatives of `l1_loss`. Upon inspection,
`l1_loss` was just implemented as a composite function, but it was not
differentiable. This PR makes it a fully differentiable function.

As a side note, `l1_loss_out` was incorrect in a number of ways. What is
more, it is not exposed to the public, as `F.l1_loss` does not accept an
`out=` parameter, so it is not even tested. I wonder how useful it is
to have `out=` variants for loss functions if we don't expose them at
all. More generally, I wonder how useful `_out` variants are for loss
functions, given that their most common use case is to return just a
real number. cc @jbschlosser
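For intuition, a small sketch of the composite form the loss boils down to (default `'mean'` reduction assumed; illustrative, not the exact code added here):
```
import torch
import torch.nn.functional as F

x = torch.randn(5, requires_grad=True)
y = torch.randn(5)

loss = F.l1_loss(x, y)               # composite: mean(|x - y|)
manual = (x - y).abs().mean()
print(torch.allclose(loss, manual))  # True

loss.backward()                      # differentiable end to end
print(x.grad)
```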

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79804

Approved by: https://github.com/zou3519, https://github.com/malfet
2022-06-20 19:10:54 +00:00
Jason Ansel
d2e18606e7 Fix view issue in embedding_dense_backward decomp (#79857)
I was hitting:
```
  File "/home/jansel/pytorch/torch/fx/experimental/proxy_tensor.py", line 66, in proxy_call
    return CURRENT_DECOMPOSITION_TABLE[func_overload](*args, **kwargs)
  File "/home/jansel/pytorch/torch/_decomp/decompositions.py", line 801, in embedding_dense_backward
    indices_rank1 = indices.view(numel)
  File "/home/jansel/pytorch/torch/fx/experimental/proxy_tensor.py", line 122, in __torch_dispatch__
    return proxy_call(func_overload, args, kwargs)
  File "/home/jansel/pytorch/torch/fx/experimental/proxy_tensor.py", line 86, in proxy_call
    real_out = func_overload(*args, **kwargs)
  File "/home/jansel/pytorch/torch/_ops.py", line 49, in __call__
    return self._op(*args, **kwargs or {})
RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.
```
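A minimal standalone repro of the underlying issue (the transposed tensor here just stands in for whatever non-contiguous `indices` the decomposition produced):
```
import torch

indices = torch.arange(6).reshape(2, 3).t()   # non-contiguous view
numel = indices.numel()

try:
    indices.view(numel)        # fails: layout is not view-compatible
except RuntimeError as e:
    print("view failed:", e)

print(indices.reshape(numel))  # reshape copies when needed, so it always works
```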
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79857
Approved by: https://github.com/Chillee
2022-06-20 17:58:14 +00:00
PyTorch MergeBot
d4a9438786 Revert "Make l1_loss composite"
This reverts commit 61a5c779bf.

Reverted https://github.com/pytorch/pytorch/pull/78257 on behalf of https://github.com/malfet due to This breaks executorch
2022-06-17 18:14:21 +00:00
Ivan Yashchuk
bc1fef96af Reference implementations for rsqrt and native_layer_norm (#79413)
This PR adds references for:
- `torch.rsqrt`
- `torch.native_layer_norm`
- `torch.nn.functional.layer_norm`

`native_layer_norm` returned outputs with a different number of dimensions when the input was 0-sized. I fixed that.
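A quick sketch of the two eager behaviors these references must match (the 0-sized layer_norm case is the edge case mentioned above):
```
import torch
import torch.nn.functional as F

x = torch.rand(3) + 0.1
print(torch.allclose(torch.rsqrt(x), 1 / torch.sqrt(x)))  # True

# 0-sized input: the output keeps the input's number of dimensions.
empty = torch.empty(0, 4)
print(F.layer_norm(empty, (4,)).shape)  # torch.Size([0, 4])
```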
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79413
Approved by: https://github.com/mruberry, https://github.com/Chillee
2022-06-17 07:24:02 +00:00
Jason Ansel
c8fb02b452 Use amax instead of max for softmax decomps (#79667)
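For context, a sketch of the stabilized softmax decomposition this targets (illustrative, not the exact code in `_decomp`): `amax` returns only the reduced values, whereas `max` along a dim also computes indices the decomposition never uses.
```
import torch

def softmax_decomp(x, dim):
    x_max = torch.amax(x, dim, keepdim=True)   # values only, no indices
    unnormalized = torch.exp(x - x_max)
    return unnormalized / unnormalized.sum(dim, keepdim=True)

x = torch.randn(2, 5)
print(torch.allclose(softmax_decomp(x, -1), torch.softmax(x, -1)))  # True
```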
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79667
Approved by: https://github.com/Chillee
2022-06-16 04:09:33 +00:00
lezcano
61a5c779bf Make l1_loss composite
Fixing the forward AD for `sgn` in the next PR of this stack uncovered a
number of issues with the derivatives of `l1_loss`. Upon inspection,
`l1_loss` was just implemented as a composite function, but it was not
differentiable. This PR makes it a fully differentiable function.

As a side note, `l1_loss_out` was incorrect in a number of ways. What is
more, it is not exposed to the public, as `F.l1_loss` does not accept an
`out=` parameter, so it is not even tested. I wonder how useful it is
to have `out=` variants for loss functions if we don't expose them at
all. More generally, I wonder how useful `_out` variants are for loss
functions, given that their most common use case is to return just a
real number. cc @jbschlosser

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78257

Approved by: https://github.com/jbschlosser
2022-06-16 00:03:22 +00:00
PyTorch MergeBot
fefff54cad Revert "Revert "Revert "Added {logical_not, trace} refs, moved logical ops to use method overloads"""
This reverts commit a2d2981e8e.

Reverted https://github.com/pytorch/pytorch/pull/79224 on behalf of https://github.com/suo due to broke lots of things a2d2981e8e
2022-06-10 04:40:43 +00:00
Horace He
a2d2981e8e Revert "Revert "Added {logical_not, trace} refs, moved logical ops to use method overloads""
This reverts commit d67309aefb.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79224

Approved by: https://github.com/mruberry
2022-06-10 03:07:14 +00:00
PyTorch MergeBot
d67309aefb Revert "Added {logical_not, trace} refs, moved logical ops to use method overloads"
This reverts commit 64b6bd8c1e.

Reverted https://github.com/pytorch/pytorch/pull/79000 on behalf of https://github.com/malfet due to Introduces test failure, see https://hud.pytorch.org/pr/79000
2022-06-09 13:11:23 +00:00
PyTorch MergeBot
60a13f4ec9 Revert "Added kl_div_backward decomp"
This reverts commit a08685ebc9.

Reverted https://github.com/pytorch/pytorch/pull/79001 on behalf of https://github.com/malfet due to PR failed in newly added tests, see https://hud.pytorch.org/pr/79001
2022-06-09 13:08:30 +00:00
PyTorch MergeBot
2027eae67c Revert "formatted _decomp folder with black"
This reverts commit 4945c72151.

Reverted https://github.com/pytorch/pytorch/pull/79002 on behalf of https://github.com/janeyx99 due to Broke decomp tests on trunk + also on PR https://hud.pytorch.org/minihud#4945c72151e29cb524974e1714654cf790ddb37d
2022-06-09 12:58:03 +00:00
Horace He
4945c72151 formatted _decomp folder with black
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79002

Approved by: https://github.com/ezyang
2022-06-09 07:16:37 +00:00
Horace He
a08685ebc9 Added kl_div_backward decomp
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79001

Approved by: https://github.com/ezyang
2022-06-09 07:16:37 +00:00