Commit Graph

350 Commits

PyTorch MergeBot
2891cecd8d Revert "Add meta kernel coverage for aten.unsafe_split, aten.unsafe_chunk (#92608)"
This reverts commit 4386f317b9.

Reverted https://github.com/pytorch/pytorch/pull/92608 on behalf of https://github.com/ZainRizvi due to test_aot_autograd_symbolic_exhaustive_unsafe_split_cpu_float32 (__main__.TestEagerFusionOpInfoCPU) is failing consistently since this PR was merged
2023-01-20 17:17:35 +00:00
Tugsbayasgalan Manlaibaatar
4386f317b9 Add meta kernel coverage for aten.unsafe_split, aten.unsafe_chunk (#92608)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92608
Approved by: https://github.com/ngimel
2023-01-20 12:39:56 +00:00
lezcano
8b861544f9 Remove lowering and decompositions of zero_, zero, zeros_like... in favour of their references (#92071)
The generated triton code is identical.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92071
Approved by: https://github.com/ngimel
2023-01-18 23:22:36 +00:00
Peter Bell
8770a7ed6f Decompose more inplace ops (#90967)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90967
Approved by: https://github.com/anijain2305
2023-01-18 21:07:47 +00:00
Peter Bell
4058dedf21 Replace log(1 + x) with log1p(x) (#92114)
`log1p` offers better precision near zero since `(1 + x) - 1` truncates any
values less than the float epsilon to zero. For `soft_margin_loss` this also
requires one fewer kernel invocation which for numel=1e7 gives me a 1.2x speedup
on CUDA and a 1.1x speedup on CPU.
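
A minimal float32 sketch of the precision gap described above (the value `1e-10` is an arbitrary illustration, not taken from the PR):

```python
import torch

x = torch.tensor(1e-10, dtype=torch.float32)
# 1 + 1e-10 rounds to exactly 1.0 in float32 (eps ~ 1.2e-7), so the naive form loses x entirely
print(torch.log(1 + x))  # tensor(0.)
print(torch.log1p(x))    # tensor(1.0000e-10)
```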

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92114
Approved by: https://github.com/ngimel, https://github.com/lezcano
2023-01-18 10:43:56 +00:00
lezcano
da58f9eb8f Rewrite out-of-place decompositions in terms of out-of-place ops (#92003)
Fixes https://github.com/pytorch/torchdynamo/issues/1863

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92003
Approved by: https://github.com/ngimel
2023-01-17 16:53:27 +00:00
vfdev-5
5f55335c2e Fixed output memory format mismatch for bicubic2d (#90470)
Description:

- make the output memory format match the input for bicubic2d

Problem: output tensor's memory format does not match input format for bicubic2d

```python
import torch

i = torch.rand(1, 3, 32, 32).contiguous(memory_format=torch.channels_last)
assert i.is_contiguous(memory_format=torch.channels_last)
o = torch.nn.functional.interpolate(i, size=(4, 4), mode="bicubic")
assert o.is_contiguous(memory_format=torch.channels_last), f"Should be channels last but given channels first ({o.is_contiguous(memory_format=torch.contiguous_format)})"

> AssertionError: Should be channels last but given channels first (True)
```

Related PR fixing bilinear ops: https://github.com/pytorch/pytorch/pull/53535 (cc @VitalyFedyunin @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @bdhirsh )

Discovered together with @NicolasHug while working on https://github.com/pytorch/pytorch/tree/interpolate_uint8_images_linear_cpu_support_dev

- Updated code to match grad input / output memory formats
- temporary tensor creation matches memory format in `separable_upsample_generic_Nd_kernel_impl`
- Updated tests
- Added missing forward AD support for bicubic with antialiasing

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90470
Approved by: https://github.com/NicolasHug, https://github.com/lezcano
2023-01-12 19:52:28 +00:00
min-jean-cho
af242eedfb [Inductor] Added aten.uniform_ decomp (#90869)
Fixes #90815

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90869
Approved by: https://github.com/jgong5, https://github.com/jansel, https://github.com/lezcano, https://github.com/ngimel, https://github.com/albanD
2023-01-11 23:23:42 +00:00
David Berard
d7dc1c2fd5 Support zero dimensions in softmax decompositions (#91322)
The eager implementation of softmax supports computation along zero dimensions, but many of the other implementations did not, including:
* decompositions & refs (this was causing dynamo failures)
* forward AD for logsumexp
* MPS log_softmax_backward

This PR handles the `input.numel() == 0` cases separately to avoid running `amax()`, which fails for zero dimensions, and updates opinfos.

example of "computation along zero dimensions":

```python
# example of computation along a zero-size dimension
import torch

t = torch.rand((4, 0, 0))
print("~")
print(torch.nn.functional.softmax(t, dim=-1))  # this passes
print("~")
torch._refs.softmax(t, dim=-1)  # this fails
print("~")
```
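
A rough sketch of the kind of guard described above (a hypothetical standalone helper, not the code added by the PR); when `numel() == 0` the result is empty anyway, so the reduction is simply skipped:

```python
import torch

def softmax_ref_sketch(a: torch.Tensor, dim: int) -> torch.Tensor:
    if a.numel() == 0:
        # nothing to reduce over; amax() would error on the zero-size dimension
        return torch.exp(a)
    a_max = torch.amax(a, dim, keepdim=True)
    unnormalized = torch.exp(a - a_max)
    return unnormalized / torch.sum(unnormalized, dim, keepdim=True)

print(softmax_ref_sketch(torch.rand(4, 0, 0), dim=-1).shape)  # torch.Size([4, 0, 0])
```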
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91322
Approved by: https://github.com/lezcano
2023-01-11 09:35:43 +00:00
XiaobingSuper
3790b50505 inductor: fix .to(memory_format) issue which doesn't generate the right stride (#91948)
Motivation: for **.to(memory_format)**, the inductor doesn't generate the right stride; see the following example:
```
class Model(torch.nn.Module):
    def __init__(self):
        super(Model, self).__init__()

    def forward(self, x):
        x = x.to(memory_format=torch.contiguous_format)
        return x
```

the generated code doesn't do the memory format change and keeps the wrong stride **(802816, 1, 14336, 256)**, which is not a contiguous stride.

```
from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, = args
    args.clear()
    return (arg0_1, )

if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((128, 256, 56, 56), (802816, 1, 14336, 256), device='cpu', dtype=torch.float32)
    print_performance(lambda: call([arg0_1]))
```

After this PR, the generated code will include a memory format change:

```
from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

kernel_cpp_0 = async_compile.cpp('''
#include "/tmp/torchinductor_xiaobing/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
extern "C" void kernel(const float* __restrict__ in_ptr0,
                       float* __restrict__ out_ptr0)
{
    #pragma omp parallel num_threads(40)
    {
        {
            #pragma omp for
            for(long i0=0; i0<128; i0+=1)
            {
                #pragma GCC ivdep
                for(long i1=0; i1<256; i1+=1)
                {
                    #pragma GCC ivdep
                    for(long i2=0; i2<3136; i2+=1)
                    {
                        auto tmp0 = in_ptr0[i1 + (256*i2) + (802816*i0)];
                        out_ptr0[i2 + (3136*i1) + (802816*i0)] = tmp0;
                    }
                }
            }
        }
    }
}
''')

async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, = args
    args.clear()
    buf1 = empty_strided((128, 256, 56, 56), (802816, 3136, 56, 1), device='cpu', dtype=torch.float32)
    kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf1.data_ptr()))
    del arg0_1
    return (buf1, )

if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((128, 256, 56, 56), (802816, 1, 14336, 256), device='cpu', dtype=torch.float32)
    print_performance(lambda: call([arg0_1]))
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91948
Approved by: https://github.com/ngimel
2023-01-11 08:23:26 +00:00
min-jean-cho
364f526b9c [Inductor] assert generator for random, dropout (#91833)
See comment https://github.com/pytorch/pytorch/pull/90869#discussion_r1063731541 , https://github.com/pytorch/pytorch/pull/91673#discussion_r1061099337.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91833
Approved by: https://github.com/jansel
2023-01-11 03:24:10 +00:00
PyTorch MergeBot
43050b8301 Revert "[Inductor] Added aten.uniform_ decomp (#90869)"
This reverts commit c55293d640.

Reverted https://github.com/pytorch/pytorch/pull/90869 on behalf of https://github.com/huydhn due to the Crossref error: it cannot simply be ignored because it would break trunk for every commit after this, i.e. fd0030fe74. The failure would need to be handled gracefully, e.g. by adding an XFAIL
2023-01-11 01:18:11 +00:00
min-jean-cho
c55293d640 [Inductor] Added aten.uniform_ decomp (#90869)
Fixes #90815

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90869
Approved by: https://github.com/jgong5, https://github.com/jansel, https://github.com/lezcano, https://github.com/ngimel, https://github.com/albanD
2023-01-10 23:05:01 +00:00
Nikita Karetnikov
00e5f3a9c5 [primTorch] Move logsumexp decomp to refs (#91860)
Fixes #91843.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91860
Approved by: https://github.com/lezcano
2023-01-09 17:00:43 +00:00
Natalia Gimelshein
2c00064113 remove unnecessary decomps (#91828)
in favor of refs. Generated triton code is the same.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91828
Approved by: https://github.com/lezcano, https://github.com/soumith
2023-01-07 20:37:12 +00:00
PyTorch MergeBot
c73147f741 Revert "[decomp] Use new squeeze.dims overload in decompositions (#91602)"
This reverts commit 9262ffc692.

Reverted https://github.com/pytorch/pytorch/pull/91602 on behalf of https://github.com/clee2000 due to the stacked PR being reverted; this one depends on it
2023-01-05 20:39:52 +00:00
Peter Bell
9262ffc692 [decomp] Use new squeeze.dims overload in decompositions (#91602)
This removes the now-redundant `_squeeze_multiple` helpers and instead decomposes into a single call to `aten::squeeze.dims` which also has the effect of reducing the lowered graph size in inductor.
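
For illustration, assuming a build where the new overload is reachable through `torch.squeeze`, several size-1 dimensions can now be removed in one call:

```python
import torch

t = torch.zeros(1, 3, 1, 4)
# one squeeze over a tuple of dims instead of a loop of per-dim squeezes
print(torch.squeeze(t, dim=(0, 2)).shape)  # torch.Size([3, 4])
```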
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91602
Approved by: https://github.com/ngimel
2023-01-05 17:59:32 +00:00
lezcano
484dd40022 Implement PReLU in a compositional way (#91238)
The PReLU implementation was all over the place. This led to a number
of bugs like https://github.com/pytorch/pytorch/issues/68760.  We fix it by:
- Keeping the weird broadcasting logic it has as a CompositeImplicit kernel that calls into a second kernel
- This second kernel is just a good-ol' pointwise kernel.
- We implement the derivative for the pointwise kernel via TI as well for speed.
- We implement the second derivative for the pointwise kernel and the forward AD derivatives compositionally

This fixes a number of issues:
- We don't perform copies any more when the inputs are not contiguous
- The derivatives are now correct
- We fix vmap and many other functorch-related issues.
- CPU and CUDA now share the relevant broadcasting logic
- The implementation is about 1/3 the length.

Fixes https://github.com/pytorch/pytorch/issues/68760
Fixes https://github.com/pytorch/pytorch/issues/89895

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91238
Approved by: https://github.com/kshitij12345, https://github.com/jbschlosser, https://github.com/albanD
2022-12-30 10:42:30 +00:00
Joel Schlosser
8b55b86dbd Move sym_int and sym_float alongside SymInt / SymFloat in base torch package (#91317)
This PR moves the definitions for:
* `sym_int`
* `sym_ceil` (used only for `sym_int`)
* `sym_floor` (used only for `sym_int`)
* `sym_float`

from `torch/fx/experimental/symbolic_shapes.py` to `torch/__init__.py`, where `SymInt` and `SymFloat` are already defined.

This removes the need for several in-line imports, and enables proper JIT script gating for #91318. I'm very open to doing this in a better way!

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91317
Approved by: https://github.com/ezyang, https://github.com/anijain2305
2022-12-28 16:08:16 +00:00
Joel Schlosser
1c40ec46ff Decomps and meta registrations for upsample_nearest 1D / 2D / 3D (#91260)
Adds decompositions and meta registrations for the 1D, 2D, and 3D implementations of `upsample_nearest`. All related OpInfo-based tests for AOTAutograd now pass.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91260
Approved by: https://github.com/ezyang
2022-12-28 16:03:25 +00:00
Nikita Shulga
fd3a7264ae [MPS] Add group_norm[fwd+backward] and mean_var (take 2) (#91190)
Use Prims to implement group_norm, group_norm_backward and mean_var

Use `torch._ops.ops` instead of `torch.ops` in numerous subpackages in
order to be able to make them importable from `torch/backend/mps/__init__.py`, as this alias is defined in
15af4b1cee/torch/__init__.py (L1095)
which is executed last during the init process.

Add `__all__` to `torch/backends/mps/__init__.py` as well as alias all imports as private

Add `TestNNMPS.test_group_norm_backward` that validates no NaNs are generated during the backward pass

Fixes https://github.com/pytorch/pytorch/issues/88331
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91190
Approved by: https://github.com/albanD
2022-12-22 08:54:37 +00:00
PyTorch MergeBot
645eda0a00 Revert "[MPS] Add group_norm[fwd+backward] and mean_var (#91190)"
This reverts commit 371716eb36.

Reverted https://github.com/pytorch/pytorch/pull/91190 on behalf of https://github.com/kit1980 due to Broke test_correct_module_names because of underscore _ops
2022-12-21 19:37:43 +00:00
Nikita Shulga
371716eb36 [MPS] Add group_norm[fwd+backward] and mean_var (#91190)
Use Prims to implement group_norm, group_norm_backward and mean_var

Use `torch._ops.ops` instead of `torch.ops` in numerous subpackages in
order to be able to make them importable from `torch/backend/mps/__init__.py`, as this alias is defined in
15af4b1cee/torch/__init__.py (L1095)
which is executed last during the init process.

Depends on https://github.com/pytorch/pytorch/pull/91203

Fixes https://github.com/pytorch/pytorch/issues/88331
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91190
Approved by: https://github.com/albanD
2022-12-21 17:33:27 +00:00
Nikita Shulga
46f64117db [BE] Use aten global var (#91188)
s/torch.ops.aten/aten/
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91188
Approved by: https://github.com/ngimel
2022-12-21 02:28:51 +00:00
Peter Bell
e670c261c5 Decompose fill, zero, and zeros_like (#90968)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90968
Approved by: https://github.com/ngimel
2022-12-21 00:59:50 +00:00
Natalia Gimelshein
e689c50922 Don't recompute var in bn decomp (#90984)
Fixes https://github.com/pytorch/torchdynamo/issues/1988
Repeated `var` computation is not CSE'd for some reason.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90984
Approved by: https://github.com/Chillee
2022-12-16 21:38:49 +00:00
Brian Hirsh
7a683eaeb8 aot_autograd: add assert for functional-only graph (#88816)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88816
Approved by: https://github.com/ezyang, https://github.com/ngimel
2022-12-16 21:04:36 +00:00
soulitzer
98a9235dce Fix prelu ref when a.ndim < 2 (#89809)
Fixes https://github.com/pytorch/pytorch/issues/89560

Previously the test case for "input is 1-D or scalar + weight is not scalar" did not exist; adding it introduced some failures:
- forward AD (fixed in this PR)
- vmap (filed https://github.com/pytorch/pytorch/issues/89895)
- ref/meta (fixed in this PR, though this also regresses nvFuser support)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89809
Approved by: https://github.com/ngimel
2022-12-12 23:55:31 +00:00
Bin Bao
282dfe8ba4 [inductor][Reland] Use decomposition for _to_copy (#90494)
Summary: also contains a fix for https://github.com/pytorch/pytorch/issues/89633

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90494
Approved by: https://github.com/ngimel
2022-12-09 16:51:50 +00:00
PyTorch MergeBot
e89685b0b5 Revert "[inductor] Use decomposition for _to_copy (#90314)"
This reverts commit 3fdb5f2dda.

Reverted https://github.com/pytorch/pytorch/pull/90314 on behalf of https://github.com/desertfire due to regresses performance on hf_Bert
2022-12-08 18:29:06 +00:00
Bin Bao
3fdb5f2dda [inductor] Use decomposition for _to_copy (#90314)
Summary: also contains a fix for https://github.com/pytorch/pytorch/issues/89633

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90314
Approved by: https://github.com/ngimel
2022-12-08 15:25:44 +00:00
Peter Bell
e6a7278753 Give std/var correction overloads proper defaults (#56398)
The correction overloads' defaults were left off for forward
compatibility reasons, but that FC window expired well over a year ago
at this point.

Differential Revision: [D29625593](https://our.internmc.facebook.com/intern/diff/D29625593)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56398
Approved by: https://github.com/mruberry
2022-12-07 15:15:00 +00:00
Yanbo Liang
25f39c1bce Fix uniform ref implementation (#90094)
Fixes https://github.com/pytorch/torchdynamo/issues/1954

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90094
Approved by: https://github.com/ngimel
2022-12-06 21:28:17 +00:00
Animesh Jain
c1950620c5 [decomp] Fix native_batch_norm_backward dtype of dweight and dbias (#89740)
Discovered while debugging an accuracy issue for Inductor.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89740
Approved by: https://github.com/soumith, https://github.com/ngimel
2022-11-29 03:15:20 +00:00
Brian Hirsh
e20ec44544 fixes for inductor <> batch norm (#89603)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89603
Approved by: https://github.com/albanD
2022-11-29 02:16:52 +00:00
Jane Xu
8695f0cced Rectify native_batch_norm schema by splitting it into two legit schemas (#88697)
Using the same repro from the issue (but with BatchNorm2D)

Rectifies native_batch_norm schema by splitting the schema into 2:
1. one will have NON-optional alias-able running_mean and running_var inputs
2. the other will just not have those parameters at all (no_stats variation)

**Calling for name suggestions!**

## test plan
I've added tests in test_functionalization.py as well as an entry in common_method_invocations.py for `native_batch_norm_legit`
CI should pass.

## next steps
Because of bc/fc reasons, we reroute native_batch_norm to call our new schemas ONLY through the python dispatcher, but in 2 weeks or so, we should make `native_batch_norm_legit` the official batch_norm.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88697
Approved by: https://github.com/albanD
2022-11-23 23:23:17 +00:00
Elias Ellison
a8d6b82167 Fix norm decomp when dtype is passed in (#89508)
Fix for https://github.com/pytorch/torchdynamo/issues/1889. The wrapper was doing a downcast even when the dtype was explicitly passed in.
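
The eager behaviour being matched, roughly (dtypes here are illustrative, not the repro from the linked issue): an explicitly requested `dtype` should be honoured in the result.

```python
import torch

x = torch.randn(4)  # float32 input
print(torch.norm(x).dtype)                       # torch.float32
print(torch.norm(x, dtype=torch.float64).dtype)  # torch.float64, not downcast back
```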

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89508
Approved by: https://github.com/anijain2305
2022-11-23 20:49:09 +00:00
Elias Ellison
72110d7833 Fix Upsample Decomp Striding For Small Channels (#89528)
Fix for https://github.com/pytorch/torchdynamo/issues/623.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89528
Approved by: https://github.com/ngimel, https://github.com/anijain2305
2022-11-23 20:47:39 +00:00
lezcano
154e58c032 Add most in-place references/decompositions (#88117)
We add most in-place references in a generic way. We also implement a
wrapper to implement the annoying interface that `nn.functional`
nonlinearities have.

Along the way, we fix a couple of decompositions for some non-linearities by extending the arguments that the references accept.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88117
Approved by: https://github.com/mruberry
2022-11-18 14:59:46 +00:00
lezcano
3320915303 Fix decomp for embedding_backward and simplify the decomposition of embedding_dense and embedding_dense_backward (#87204)
See the title

Pull Request resolved: https://github.com/pytorch/pytorch/pull/87204
Approved by: https://github.com/Chillee
2022-11-16 17:46:54 +00:00
Sherlock Huang
5faa2792fa Symintify decomps for split and upsample_bilinear; Fix decomp for _softmax_backward_data and native_dropout_backward (#88761)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88761
Approved by: https://github.com/ezyang
2022-11-15 13:34:45 +00:00
PyTorch MergeBot
eea506aee1 Revert "Symintify decomps for split and upsample_bilinear; Fix decomp for _softmax_backward_data and native_dropout_backward (#88761)"
This reverts commit 9eabcc370f.

Reverted https://github.com/pytorch/pytorch/pull/88761 on behalf of https://github.com/suo due to much broken 9eabcc370f
2022-11-14 01:58:47 +00:00
Sherlock Huang
9eabcc370f Symintify decomps for split and upsample_bilinear; Fix decomp for _softmax_backward_data and native_dropout_backward (#88761)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88761
Approved by: https://github.com/ezyang
2022-11-13 21:30:53 +00:00
Horace He
37c5b42fa6 Fix matmul decomp to use reshape instead of contiguous().view() (#88832)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88832
Approved by: https://github.com/bertmaher, https://github.com/ngimel
2022-11-12 00:15:42 +00:00
Ryan Spring
534ae6ae47 [primTorch] Implement group norm reference (#87054)
Add group norm reference
Split from #81191
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87054
Approved by: https://github.com/mruberry
2022-11-11 01:08:20 +00:00
Sherlock Huang
c00c34fb69 Fix meta for aten.upsample_bilinear2d.vec (#88158)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88158
Approved by: https://github.com/ngimel
2022-11-02 16:58:29 +00:00
Sherlock Huang
de1f641f11 Fix meta function for aten.addmm (#88068)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88068
Approved by: https://github.com/albanD
2022-11-01 17:05:48 +00:00
lezcano
fd27246c16 Fix decomposition for std (#87181)
The previous implementation was lacking a few features and incurred a
pretty large error.

cc @ezyang @mruberry @ngimel @Lezcano @fdrocha
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87181
Approved by: https://github.com/ngimel, https://github.com/peterbell10
2022-10-28 00:50:29 +00:00
Sherlock Huang
eb99c1efce Prefer python meta function over c++ meta function (#87426)
This is a policy update for meta registration. **We now prefer python meta implementation over C++ meta function.**  This is a flip of the previous policy, where we prefer C++ meta function over python meta function if they both exist.

Here's the meta registration process:
1. register_meta and register_decomposition will place the python meta/decomp functions into the `global_decomp_table`.  However, they will NOT register them into dispatcher.
2. After global_decomp_table is populated, we will compile an `active_meta_table`. For a given op, we pick the most specific decomp function from `global_decomp_table` in the preference order of Meta > PostAutograd > PreAutograd.
3. We will unconditionally register all of them into the python dispatcher, and register them into the C++ dispatcher unless it is one of the following 3 cases
- 1. the op is a CompositeImplicitAutograd, and should rely on decomposed op's meta
- 2. the op is a view op, as the MetaTensor doesn't support aliased storage
- 3. the op is in the blocklist (due to UT failures, and we will burn down this list op by op)

Over the long run, we wish to implement all meta functions in python. With this PR, 321 op_overloads will have their cpp meta overridden by a python meta. There are still 400 op_overloads using the cpp meta. The exact list can be found here: https://gist.github.com/SherlockNoMad/d20bb736178df8eebd3b054c8bb7cdc5
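
As a quick illustration of python meta kernels at work (not tied to any particular op in the list above), shape and dtype inference can run entirely on `device='meta'` tensors that carry no data:

```python
import torch

a = torch.empty(4, 8, device="meta")
b = torch.empty(8, 16, device="meta")
bias = torch.empty(4, 16, device="meta")
out = torch.addmm(bias, a, b)  # only metadata is propagated, no real computation
print(out.shape, out.device)   # torch.Size([4, 16]) meta
```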

cc @ngimel @jansel @lezcano @fdrocha @mlazos @soumith @voznesenskym @yanboliang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87426
Approved by: https://github.com/ezyang, https://github.com/jansel
2022-10-25 16:49:02 +00:00
Ryan Spring
9bb4926de0 Add xlogy and xlog1py references (#77712)
* Add reference implementations for `xlogy` and `xlog1py`
 * Replace `_wrap_scalar` helper function with `scalar_tensor` prim
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77712
Approved by: https://github.com/mruberry
2022-10-22 17:59:25 +00:00
Edward Z. Yang
d73d4aa7de Audit for error prone isinstance int/float and add lint (#87345)
We recently fixed a bug on symbolic-shapes branch where
an isinstance(x, int) test failed when passed a SymIntNode.
To prevent this, I've added a lint for all the codepaths
where we may pass SymInt/SymFloat directly to reject
direct isinstance int/float tests, and instead use one of
the aliases.  The lint rule explains the options.  I then
go and fix all of them.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87345
Approved by: https://github.com/bdhirsh, https://github.com/albanD
2022-10-21 15:55:24 +00:00
Sherlock Huang
f7da9db9c1 Unify decomp registries into global_decomposition_table (#86857)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86857
Approved by: https://github.com/ezyang
2022-10-20 21:29:05 +00:00
Sherlock Huang
ef045695e0 Fix decomp for huber_loss_backward (#86955)
Fixes https://github.com/pytorch/pytorch/issues/86846

aten.huber_loss_backward calls aten.huber_loss_backward.out in its CompositeExplicitAutograd kernel.
The decomp was mistakenly registered for both aten.huber_loss_backward.default and aten.huber_loss_backward.out.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/86955
Approved by: https://github.com/Chillee
2022-10-14 18:53:02 +00:00
Nikita Karetnikov
4460e40db4 [primTorch] Add a ref for addcmul (#86731)
Based on:
https://github.com/pytorch/pytorch/pull/79827
https://github.com/pytorch/pytorch/pull/72949
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86731
Approved by: https://github.com/lezcano, https://github.com/mruberry
2022-10-14 14:26:23 +00:00
Brian Hirsh
e17732b234 [test] add cross-ref tests for python meta kernels (#86228)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86228
Approved by: https://github.com/albanD
2022-10-13 14:14:26 +00:00
Elias Ellison
d3f7c34cb3 Enable aten-aten decomps (#85921)
Invokes aten-aten decomps with re-entrant FakeMode. These decomps are being used in other places, so it's good to unify the path static fake tensor takes / get additional testing etc. There is also an instance where we return different devices with cpu/cuda which this fixes ([batch_norm](https://github.com/pytorch/pytorch/blob/master/torch/_decomp/decompositions.py#L1374))

Pull Request resolved: https://github.com/pytorch/pytorch/pull/85921
Approved by: https://github.com/ezyang
2022-10-08 05:12:42 +00:00
PyTorch MergeBot
7ec12a559c Revert "Enable aten-aten decomps (#85921)"
This reverts commit 62e4f51efd.

Reverted https://github.com/pytorch/pytorch/pull/85921 on behalf of https://github.com/huydhn due to Sorry for reverting your PR. I think it breaks a dynamo test in trunk 62e4f51efd
2022-10-08 01:59:54 +00:00
Elias Ellison
62e4f51efd Enable aten-aten decomps (#85921)
Invokes aten-aten decomps with re-entrant FakeMode. These decomps are being used in other places, so it's good to unify the path static fake tensor takes / get additional testing etc. There is also an instance where we return different devices with cpu/cuda which this fixes ([batch_norm](https://github.com/pytorch/pytorch/blob/master/torch/_decomp/decompositions.py#L1374))

Pull Request resolved: https://github.com/pytorch/pytorch/pull/85921
Approved by: https://github.com/ezyang
2022-10-07 21:04:39 +00:00
lezcano
28a0b3fb18 Fix col2im and im2col decompositions (#86426)
I threw in some tests for good measure.

Fixes https://github.com/pytorch/pytorch/issues/86332
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86426
Approved by: https://github.com/ngimel
2022-10-07 08:14:06 +00:00
Elias Ellison
9ceadcadb2 Fix unfold backward decomp aliasing for 0 dim input (#86428)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86428
Approved by: https://github.com/ngimel, https://github.com/ezyang
2022-10-07 03:55:31 +00:00
lezcano
b67e022833 Fix ref / decomposition index_add (#86266)
The decomposition of `index_add` was using `slice(None)`, when it should
use just `None`.

The reference for index_add was also wrong, as `x[idx] += t` does not
use atomic add, so it does not work when several `idx`s point to the
same location.
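
A small repro of the repeated-index case, using only the public `index_add_` API:

```python
import torch

x = torch.zeros(3)
idx = torch.tensor([0, 0])
t = torch.ones(2)

# index_add_ accumulates both updates into position 0
print(x.clone().index_add_(0, idx, t))  # tensor([2., 0., 0.])

# advanced-indexing assignment keeps only one of the writes, so it is not a
# faithful reference when indices repeat
y = x.clone()
y[idx] += t
print(y)  # tensor([1., 0., 0.])
```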

This PR adds extra reference inputs to help test for this.

Fixes https://github.com/pytorch/torchdynamo/issues/1356
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86266
Approved by: https://github.com/ngimel
2022-10-05 19:59:15 +00:00
lezcano
c609768896 Add refs for torch.unfold and a decomposition for its backward. (#85629)
It's not clear to me what the difference is between `unfold` and `unfold_copy`, as the latter is codegen'd

I also took this chance to clean the implementation of unfold and its reference
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85629
Approved by: https://github.com/mruberry
2022-10-05 12:15:49 +00:00
Edward Z. Yang
d07b85393a SymInt fixes from symbolic-shapes branch (#86242)
symintify a few inplace meta functions

symintify resize_(), nbytes(), functionalization input mutations

meta funcs for avg_pool2d_backward
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86242
Approved by: https://github.com/Chillee
2022-10-05 04:52:02 +00:00
Peter Bell
b317736c39 Fix default correction value in std/var decompositions (#85839)
`torch.std` and `torch.var` default to the unbiased estimator, i.e.
`correction=1`. This only works as is because the default on this
overload is not exercised by the tests.
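
A quick sanity check of the default, on an arbitrary 4-element tensor:

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0, 4.0])
print(torch.var(x))                # default correction=1 (unbiased): tensor(1.6667)
print(torch.var(x, correction=0))  # biased estimator: tensor(1.2500)
```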
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85839
Approved by: https://github.com/ezyang
2022-10-04 23:23:39 +00:00
Horace He
82d9592f1b Batch of symintifications to allow more models to pass in inference (#86104)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86104
Approved by: https://github.com/ezyang
2022-10-04 04:01:58 +00:00
Edward Z. Yang
f3d7ab5438 Unconditionally register Python decomps to Meta key in Python Dispatcher (#85750)
This makes them available for Python Dispatcher to service them when
symbolic shapes are involved.  This is needed because under certain
conditions, functionalization will directly call the Meta kernel for a
function in order to produce a properly sized output wrapper tensor
for a view operation. This direct call bypasses the normal decomposition
table mechanism.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85750
Approved by: https://github.com/wconstab
2022-10-03 22:49:25 +00:00
Horace He
37013bb443 Added _unsafe_view decomp (#86103)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86103
Approved by: https://github.com/ezyang
2022-10-03 20:38:31 +00:00
lezcano
07ce0b435b Remove backward for im2col and col2im (#85542)
`im2col` is a linear map, and `col2im` is its adjoint. As such, the
adjoint of `col2im` is `im2col` (the adjoint of the adjoint is the
original function).

There's no point having explicit derivatives in ATen for these
functions, so this PR deletes all these.

Furthermore, along the way, we fix an error for the derivative of im2col
for non-batched inputs.
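
One way to check the adjoint relationship numerically (shapes chosen arbitrarily for illustration): `<im2col(x), y>` should equal `<x, col2im(y)>`.

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 2, 6, 6)
y = torch.randn(1, 2 * 3 * 3, 16)  # 16 = number of 3x3 patches in a 6x6 image

lhs = (F.unfold(x, kernel_size=3) * y).sum()                    # <im2col(x), y>
rhs = (x * F.fold(y, output_size=(6, 6), kernel_size=3)).sum()  # <x, col2im(y)>
print(torch.allclose(lhs, rhs, atol=1e-4))  # True (loose atol for float32 accumulation)
```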
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85542
Approved by: https://github.com/soulitzer, https://github.com/ngimel
2022-10-03 00:16:42 +00:00
Horace He
e6dd2965af A bunch of coverage improvements (re for models in inference snext50, BERT_pytorch, mobilenet_v3_large, pytorch_CycleGAN_and_pix2pix, dcgan, resnet18, mnasnet1_0) (#86050)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86050
Approved by: https://github.com/ezyang
2022-10-02 20:46:20 +00:00
lezcano
787028cadb Implement col2im decomposition and fix im2col and add a few preconditions (#85541)
As per title
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85541
Approved by: https://github.com/jansel
2022-09-30 09:31:53 +00:00
Elias Ellison
6a2b12dd65 Turn on aliasing tests for fake backwards, Fix Batch norm running mean/var decomp aliasing (#85471)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85471
Approved by: https://github.com/ezyang
2022-09-28 23:06:59 +00:00
Animesh Jain
796da4df4d Return contiguous tensor from softmax decomposition (#85788)
Fixes https://github.com/pytorch/torchdynamo/issues/1135

The softmax decomp's output stride does not match the aten softmax output stride. Not sure if it's desirable. Opening a PR for now.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/85788
Approved by: https://github.com/ngimel, https://github.com/ezyang
2022-09-28 20:52:45 +00:00
Nikita Karetnikov
8dd45424ea [primTorch] Add ref for huber_loss and error inputs (#85041)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85041
Approved by: https://github.com/lezcano, https://github.com/mruberry
2022-09-28 19:56:17 +00:00
Edward Z. Yang
793488cda2 Revert "Revert "Symintifying slice ops (#85196)"" (#85746)
This reverts commit 3a171dfb0c.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85746
Approved by: https://github.com/albanD
2022-09-28 04:37:35 +00:00
PyTorch MergeBot
3a171dfb0c Revert "Symintifying slice ops (#85196)"
This reverts commit 4c01c51266.

Reverted https://github.com/pytorch/pytorch/pull/85196 on behalf of https://github.com/atalman due to Break internal build Exutorch
2022-09-27 18:01:27 +00:00
Fabio Rocha
d5ce2bbed2 [primTorch] decompositions for upsample_bicubic2d (#85403)
FYI, this decomposition seems to be significantly slower than the lowering in torchinductor:

```
------------------------------------- upsample_bicubic2d -------------------------------------]
                                                              |  lowering  |  Inductor  |  Eager
32 threads: ------------------------------------------------------------------------------------
      (torch.Size([16, 4, 128, 256]),), ((512, 1024), True)   |    1.8     |   3.880    |   1.4
      (torch.Size([16, 4, 128, 256]),), ((512, 1024), False)  |    1.9     |   3.887    |   1.4
```

This seems related to the fact that in the lowering we can use int32s as the indices and in the decomp we can only use int64s (see https://github.com/pytorch/torchdynamo/issues/1293).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/85403
Approved by: https://github.com/ngimel
2022-09-26 20:11:23 +00:00
Elias Ellison
bcc544e9d7 Add FakeCrossRef tests for backwards, Fix Layer Norm Backward Decomp (#85417)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85417
Approved by: https://github.com/ezyang
2022-09-26 17:08:14 +00:00
Fabio Rocha
ffaff8896a Removed None arg check in test/test_decomp.py (#85402)
Not sure why this check was necessary? Tests seem to run fine without
it.
There were definitely tests this was skipping before that it shouldn't have,
e.g. pretty much all of the tests for `torch.nn.functional.interpolate`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85402
Approved by: https://github.com/ezyang
2022-09-24 11:37:27 +00:00
Edward Z. Yang
4c01c51266 Symintifying slice ops (#85196)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85196
Approved by: https://github.com/ezyang
2022-09-23 22:01:32 +00:00
PyTorch MergeBot
d10de31cc8 Revert "Add FakeCrossRef tests for backwards, Fix Layer Norm Backward Decomp (#85417)"
This reverts commit 78afa0cf0c.

Reverted https://github.com/pytorch/pytorch/pull/85417 on behalf of https://github.com/clee2000 due to broke tests on trunk 78afa0cf0c
2022-09-23 17:21:43 +00:00
PyTorch MergeBot
3b195fd33e Revert "Turn on aliasing tests for fake backwards, Fix Batch norm running mean/var decomp aliasing (#85471)"
This reverts commit 1e92eb8068.

Reverted https://github.com/pytorch/pytorch/pull/85471 on behalf of https://github.com/clee2000 due to stacked prs https://github.com/pytorch/pytorch/pull/85417 and https://github.com/pytorch/pytorch/pull/85434 broke trunk, reverting this so i can revert the others
2022-09-23 17:13:35 +00:00
Elias Ellison
1e92eb8068 Turn on aliasing tests for fake backwards, Fix Batch norm running mean/var decomp aliasing (#85471)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85471
Approved by: https://github.com/ezyang
2022-09-23 16:02:15 +00:00
Elias Ellison
78afa0cf0c Add FakeCrossRef tests for backwards, Fix Layer Norm Backward Decomp (#85417)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85417
Approved by: https://github.com/ezyang
2022-09-23 15:50:03 +00:00
Ryan Spring
71dddec6ea Cast grad_input to half when input_dtype is half in _softmax_backward_data aten decomposition (#85497)
Fixes #85504

`_softmax_backward_data` and `_log_softmax_backward_data` cast `grad_input` to half when the `input_dtype` is half.
Without that cast, consumer ops can trigger `RuntimeError: expected scalar type Float but found Half` when running with amp.

https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/SoftMax.cpp#L70-L83
https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/SoftMax.cpp#L102-L113
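
A rough sketch of the dtype handling described above (a hypothetical standalone function, not the actual decomposition):

```python
import torch

def softmax_backward_sketch(grad_output, output, dim, input_dtype):
    # standard softmax backward
    grad_input = output * (grad_output - (output * grad_output).sum(dim, keepdim=True))
    # match eager: hand the gradient back in the original input dtype (e.g. half under amp)
    return grad_input.to(input_dtype)

g = torch.randn(2, 5)
out = torch.softmax(torch.randn(2, 5), dim=-1)
print(softmax_backward_sketch(g, out, -1, torch.float16).dtype)  # torch.float16
```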

Pull Request resolved: https://github.com/pytorch/pytorch/pull/85497
Approved by: https://github.com/ngimel
2022-09-23 06:52:38 +00:00
PyTorch MergeBot
5043457a8e Revert "Add FakeCrossRef tests for backwards, Fix Layer Norm Backward Decomp (#85417)"
This reverts commit 9c77083965.

Reverted https://github.com/pytorch/pytorch/pull/85417 on behalf of https://github.com/clee2000 due to broke tests on trunk (and pull somehow) 9c77083965
2022-09-22 15:44:38 +00:00
Elias Ellison
9c77083965 Add FakeCrossRef tests for backwards, Fix Layer Norm Backward Decomp (#85417)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85417
Approved by: https://github.com/ezyang
2022-09-22 13:03:57 +00:00
Horace He
2f4a517d67 Ported matmul compositeimplicitautograd impl into core (#85239)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85239
Approved by: https://github.com/ezyang, https://github.com/lezcano
2022-09-21 09:25:24 +00:00
lezcano
d17b144e65 Adding multigammaln ref and fix arange (#85153)
Partially based on https://github.com/pytorch/pytorch/pull/83662.

I'll help land this one, as Rob does not work in the PyTorch project
anymore

I removed the data-dependent check for the args, as data dependencies
are bad for many reasons (and it was failing when the input has NaNs).

It also registers arange as a decomposition, and fixes the naming of its
args.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85153
Approved by: https://github.com/mruberry, https://github.com/ngimel
2022-09-20 17:52:56 +00:00
lezcano
5dd9610e9d Refs and decompositions for index_{add,copy,select,fill} (#85002)
As per title
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85002
Approved by: https://github.com/ngimel
2022-09-17 19:57:34 +00:00
PyTorch MergeBot
e33b464ffc Revert "Refs and decompositions for index_{add,copy,select,fill} (#85002)"
This reverts commit 2f0b3de443.

Reverted https://github.com/pytorch/pytorch/pull/85002 on behalf of https://github.com/huydhn due to Broke trunk slow tests
2022-09-17 04:26:04 +00:00
lezcano
2f0b3de443 Refs and decompositions for index_{add,copy,select,fill} (#85002)
As per title
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85002
Approved by: https://github.com/ngimel
2022-09-16 23:59:35 +00:00
Sherlock Huang
29eba319b4 Use alias for nop decomp (#84727)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84727
Approved by: https://github.com/Chillee
2022-09-16 18:50:56 +00:00
Natalia Gimelshein
6162a04364 fix half_to_float arg in *softmax decomp (#85120)
Fixes https://github.com/pytorch/torchdynamo/issues/1239

Pull Request resolved: https://github.com/pytorch/pytorch/pull/85120
Approved by: https://github.com/Chillee
2022-09-16 15:54:50 +00:00
soulitzer
7f88934a8f [reland 2] Call jit decomp in VariableType to improve forward AD coverage (#84976)
Reland of https://github.com/pytorch/pytorch/pull/84675
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84976
Approved by: https://github.com/zou3519
2022-09-15 22:46:19 +00:00
PyTorch MergeBot
36d79143ce Revert "[reland] Call jit decomposition in VariableType to increase forward AD coverage (#84151) (#84675)"
This reverts commit bb4e96c964.

Reverted https://github.com/pytorch/pytorch/pull/84675 on behalf of https://github.com/osalpekar due to causing asan xplat link-time errors like ld.lld: error: undefined symbol: torch::jit::has_jit_decomposition(c10::FunctionSchema const&)
2022-09-13 22:54:54 +00:00
soulitzer
bb4e96c964 [reland] Call jit decomposition in VariableType to increase forward AD coverage (#84151) (#84675)
This reverts commit acb4a09628.

In addition, we also fix a memory leak in layer norm.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84675
Approved by: https://github.com/zou3519
2022-09-12 20:33:14 +00:00
Horace He
1459a909b4 Added mv, mm, and binary_cross_entropy_with_logits decomps (#84451)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84451
Approved by: https://github.com/ngimel
2022-09-08 17:56:18 +00:00
soulitzer
e31ad1c2d3 [reland] Move decompositions and helpers for jvp from functorch into core (#84581)
Reland of https://github.com/pytorch/pytorch/pull/84358
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84581
Approved by: https://github.com/samdow
2022-09-07 15:31:46 +00:00
Ivan Yashchuk
6363b1b358 Add nvFuser support for aten.native_batch_norm_backward (#84546)
Replacing `tensor.reshape(broadcast_mask)` with unsqueezes makes the implementation of `batch_norm_backward` more friendly for PrimTorch+nvFuser.
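
The equivalence being relied on, sketched with an arbitrary per-channel shape:

```python
import torch

stats = torch.randn(4)                              # e.g. per-channel mean or invstd
broadcast_mask = [1, 4, 1, 1]
a = stats.reshape(broadcast_mask)                   # reshape to the broadcast mask ...
b = stats.unsqueeze(0).unsqueeze(-1).unsqueeze(-1)  # ... or build the same shape from unsqueezes
print(torch.equal(a, b))  # True
```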
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84546
Approved by: https://github.com/Chillee
2022-09-06 19:56:17 +00:00
Fabio Rocha
91a5f52f51 Decomp for nn.functional.grid_sampler_2d (#84350)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84350
Approved by: https://github.com/jansel, https://github.com/Lezcano
2022-09-05 21:33:26 +00:00
lezcano
3dfbf09afe Optimise the decomposition for adaptive_avg_pool2d wrt. TorchInductor (#84483)
This fixes some part of the implementation that did not work with
TorchInductor (e.g. the indices in TorchInductor need to be `int64`s,
while in PyTorch we can have `int32`s).

It also brings the performance of the kernel up to numbers similar to
those of the lowering (benchmarks below).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84483
Approved by: https://github.com/jansel
2022-09-02 22:25:09 +00:00
PyTorch MergeBot
375d6cd5b7 Revert "Move decompositions and helpers for jvp from functorch into core (#84358)"
This reverts commit a3c60a4db4.

Reverted https://github.com/pytorch/pytorch/pull/84358 on behalf of https://github.com/malfet due to Broke lint
2022-09-01 23:42:48 +00:00
soulitzer
a3c60a4db4 Move decompositions and helpers for jvp from functorch into core (#84358)
This refactor shouldn't change any behavior. At this point functorch still relies on the mechanism in DynamicLayerFront; we just moved some parts of it into core.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84358
Approved by: https://github.com/samdow
2022-09-01 22:39:15 +00:00
Sherlock Huang
ef3ab31f1c Decomp for aten.im2col (#84303)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84303
Approved by: https://github.com/jansel, https://github.com/ngimel
2022-09-01 00:06:35 +00:00
Nikita Karetnikov
71ce9cd072 [primTorch] Add decomp for soft_margin_loss (#83804)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83804
Approved by: https://github.com/Lezcano, https://github.com/ngimel
2022-08-31 17:39:34 +00:00
Nikita Shulga
b8e1c54f53 [Prim] Implement group_norm_backward (#84037)
Test plan: CI, i.e. `python3 test_decomp.py -v -k test_comprehensive_nn_functional_group_norm` plus:
```
#!/usr/bin/env python3.8
import torch

func = torch.ops.aten.native_group_norm_backward.default
decomp =  torch._decomp.decomposition_table[func]
for args in (
        (torch.rand(1, 6, 3), torch.rand(1, 6, 3), torch.rand(1, 2), torch.rand(1, 2), torch.rand(6), 1, 6, 3, 2, [True, True, True]),
        (torch.rand(64, 768, 7, 7), torch.rand(64, 768, 7, 7), torch.rand(64, 1), torch.rand(64, 1), torch.rand(768), 64, 768, 49, 1, [True, True, True])):
    nrc=func(*args)
    drc=decomp(*args)
    for i in range(len(nrc)):
       print(i, torch.max(nrc[i]-drc[i]))
    print(all(torch.allclose(x, y) for (x, y) in zip(nrc, drc)))
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84037
Approved by: https://github.com/Chillee, https://github.com/ngimel
2022-08-29 09:29:30 +00:00
Natalia Gimelshein
533203f5aa _to_copy decomp (#84108)
Per title
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84108
Approved by: https://github.com/Chillee
2022-08-29 02:25:02 +00:00
lezcano
9fc02f6bc5 Decomposition for adaptive_avg_pool2d (#84062)
This was already implemented as a lowering in https://github.com/pytorch/torchdynamo/pull/962. I'm putting the idea up here ~(I haven't even run this code, so it surely has *many* issues, but I reckon the general idea should hopefully be alright).~ The tests now pass and I corrected the issues that the first implementation had.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84062
Approved by: https://github.com/jansel
2022-08-29 01:38:51 +00:00
PyTorch MergeBot
33db5da4c1 Revert "[Prim] Implement group_norm_backward (#84037)"
This reverts commit bed85cce8b.

Reverted https://github.com/pytorch/pytorch/pull/84037 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally
2022-08-28 17:30:50 +00:00
PyTorch MergeBot
ff23f3ac1c Revert "_to_copy decomp (#84108)"
This reverts commit e33897cb99.

Reverted https://github.com/pytorch/pytorch/pull/84108 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally
2022-08-28 13:27:49 +00:00
Natalia Gimelshein
e33897cb99 _to_copy decomp (#84108)
Per title
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84108
Approved by: https://github.com/Chillee
2022-08-27 03:51:03 +00:00
Nikita Shulga
bed85cce8b [Prim] Implement group_norm_backward (#84037)
Test plan: CI, i.e. `python3 test_decomp.py -v -k test_comprehensive_nn_functional_group_norm` plus:
```
#!/usr/bin/env python3.8
import torch

func = torch.ops.aten.native_group_norm_backward.default
decomp =  torch._decomp.decomposition_table[func]
for args in (
        (torch.rand(1, 6, 3), torch.rand(1, 6, 3), torch.rand(1, 2), torch.rand(1, 2), torch.rand(6), 1, 6, 3, 2, [True, True, True]),
        (torch.rand(64, 768, 7, 7), torch.rand(64, 768, 7, 7), torch.rand(64, 1), torch.rand(64, 1), torch.rand(768), 64, 768, 49, 1, [True, True, True])):
    nrc=func(*args)
    drc=decomp(*args)
    for i in range(len(nrc)):
       print(i, torch.max(nrc[i]-drc[i]))
    print(all(torch.allclose(x, y) for (x, y) in zip(nrc, drc)))
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84037
Approved by: https://github.com/Chillee, https://github.com/ngimel
2022-08-27 01:10:27 +00:00
Horace He
9a236c7ab4 Made some minor cleanups to decompositions (#83814)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83814
Approved by: https://github.com/ngimel
2022-08-26 10:55:31 +00:00
Animesh Jain
e2f75d63d4 Decomposition - batch_norm, save_mean and save_variance always float32 (#84013)
AMP error shown here - https://github.com/pytorch/torchdynamo/issues/835

Test missing
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84013
Approved by: https://github.com/ezyang
2022-08-25 16:09:52 +00:00
Ivan Yashchuk
473b733bae Replace .new_zeros(()) with 0.0 in torch/_decomp/decompositions (#83734)
`new_zeros` is decomposed into `prims.empty_strided`+`prims.fill`+`prims.copy_to` and none of these are supported by prims+nvFuser executor currently.
Replacing it with 0.0 makes these backward decompositions nvFuser friendly.

Example with `torch.ops.aten.hardsigmoid_backward.default`:
```py
# Before this PR
opcode         name                      target                            args                                                          kwargs
-------------  ------------------------  --------------------------------  ------------------------------------------------------------  ----------------------------------------------------------------------------------------
placeholder    a_1                       a_1                               ()                                                            {}
placeholder    g_1                       g_1                               ()                                                            {}
call_function  gt_default                nvprims.gt.default                (a_1, -3.0)                                                   {}
call_function  lt_default                nvprims.lt.default                (a_1, 3.0)                                                    {}
call_function  bitwise_and_default       nvprims.bitwise_and.default       (gt_default, lt_default)                                      {}
call_function  mul_default               nvprims.mul.default               (g_1, 0.16666666666666666)                                    {}
call_function  empty_strided             prims.empty_strided.default       ([], [])                                                      {'dtype': torch.float32, 'device': device(type='cuda', index=0), 'requires_grad': False}
call_function  fill_default              prims.fill.default                (empty_strided, 0)                                            {}
call_function  copy_to_default           prims.copy_to.default             (empty_strided, fill_default)                                 {}
call_function  broadcast_in_dim_default  nvprims.broadcast_in_dim.default  (copy_to_default, [3, 2], [])                                 {}
call_function  where_default             nvprims.where.default             (bitwise_and_default, mul_default, broadcast_in_dim_default)  {}
output         output                    output                            (where_default,)                                              {}

# After this PR
opcode         name                 target                       args                                     kwargs
-------------  -------------------  ---------------------------  ---------------------------------------  --------
placeholder    a_1                  a_1                          ()                                       {}
placeholder    g_1                  g_1                          ()                                       {}
call_function  gt_default           nvprims.gt.default           (a_1, -3.0)                              {}
call_function  lt_default           nvprims.lt.default           (a_1, 3.0)                               {}
call_function  bitwise_and_default  nvprims.bitwise_and.default  (gt_default, lt_default)                 {}
call_function  mul_default          nvprims.mul.default          (g_1, 0.16666666666666666)               {}
call_function  where_default        nvprims.where.default        (bitwise_and_default, mul_default, 0.0)  {}
output         output               output                       (where_default,)                         {}
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/83734
Approved by: https://github.com/Chillee
2022-08-22 09:12:13 +00:00
Edward Z. Yang
02581f053b Address CR comments for "Delete ProxyTensor wrapper subclass" (#83646)
CR is on https://github.com/pytorch/pytorch/pull/83330

- Factor proxy slot getters/setters into helper functions
- Use a weak map for storing proxies, so they go away when
  tracing is done
- More documentation on SymDispatchMode

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83646
Approved by: https://github.com/Chillee
2022-08-18 22:18:09 +00:00
Edward Z. Yang
817a82704f Delete ProxyTensor wrapper subclass (#83330)
I was working on https://github.com/pytorch/torchdynamo/issues/80 and my
working hypothesis for what was causing the error was that proxy tensor
was not advertising correct dispatch keys, causing AMP to operate
differently when you traced.  I could have fixed this directly by
replicating fake tensor's fix for setting dispatch keys to also apply to
proxy tensor, but I was like, "Why must I repeat myself."

This PR is the result.  It completely deletes the ProxyTensor wrapper
subclass, so that when we are tracing, the tensors flowing through the
program are the *original* real or fake tensors, depending on what the
user requested in the top-level API.  There is no more wrapping.  To
store the Proxy objects necessary for actually doing tracing, I store
the property directly on the tensors.  (Note: I never
clean up old entries from the map at the moment; this is easily fixed
by using a weak map)

Benefits of doing this:

* No more tip-toeing around no_dispatch() creation of new ProxyTensors;
  we never create new tensors (except when we call the underlying func),
  so you don't have to worry about accidentally tracing them.

* No more syncing up metadata from in place operators.  In particular
  https://github.com/pytorch/pytorch/issues/81526 is mooted

* This fixes https://github.com/pytorch/torchdynamo/issues/519 as we no longer need to teach proxy tensor to support sparse tensor.

* No more schlepping symbolic integers from the inner fake tensor to the
  outer proxy tensor.  If you can make a fake tensor with symbolic ints,
  you're done, nothing else to do.

To avoid having to rewrite all of the guts, when I get to the actual
proxy tensor handler, I first "fetch" the stored ProxyTensor data from
the weakmap via a tree_map, and then operate on the consequent data as
before.  A more optimized implementation is possible.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83330
Approved by: https://github.com/Chillee
2022-08-18 01:56:07 +00:00
Nikita Karetnikov
cd86d25515 [primTorch] Move addcdiv from decompositions -> refs (#80842)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80842
Approved by: https://github.com/Lezcano, https://github.com/ngimel
2022-08-16 17:23:00 +00:00
Horace He
f02f304657 Added nll_loss_forward decomposition + some other minor decomps (#83235)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83235
Approved by: https://github.com/ngimel
2022-08-13 10:24:58 +00:00
Brian Hirsh
1a51efd8bb dispatch API for checking computed table, use it in prim decomps (#82358)
Fixes https://github.com/pytorch/pytorch/issues/82331

Expose a `torch._C._dispatch_has_computed_kernel_for_dispatch_key` to check if an operator has a kernel registered to the given dispatch key in the **computed table**.

Use it in the prim registration logic, making it more accurate and robust (so that it e.g. picks up `CompositeExplicitAutograd` kernels).

It looks like before this change we'd register 134 prim ops to the meta key, and after we only register 62. So that's 72 ops that now use an existing C++ decomp to get meta working, instead of going directly through the prim decomp.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/82358
Approved by: https://github.com/ezyang
2022-08-10 23:42:02 +00:00
Natalia Gimelshein
112ec24f09 Fix device behavior for masked_fill (#82737)
Fixes #81018, based on #81036.
It will create a graph break for a CPU 0-d tensor `value` due to the `.item()` call (we could maybe specialize on that instead of breaking?), but otherwise it would create a graph break due to the synchronizing `to` call, so there's no way around it :-(. For a number `value` argument we should already be specializing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82737
Approved by: https://github.com/Chillee
2022-08-04 15:47:56 +00:00
Brian Hirsh
4a77bee661 prevent python view impls from getting registered to the meta key (#82007)
We don't want to register view ops in python to the `Meta` dispatch key, because doing that prevents us from correctly aliasing storage information. This PR fixes the existing python registrations, and makes it an error to do that in the future. Example:
```
import torch
from torch._subclasses.fake_tensor import FakeTensorMode
from torch.multiprocessing.reductions import StorageWeakRef

with FakeTensorMode.push() as mode:
    b = torch.ones(2)
    c = b.unsqueeze(-1)
    b_ = StorageWeakRef(b.storage())
    c_ = StorageWeakRef(c.storage())
    print(b_.cdata)
    print(c_.cdata)  # their storages are different (now fixed in this PR)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/82007
Approved by: https://github.com/ezyang, https://github.com/eellison
2022-07-27 17:15:05 +00:00
Shangdi Yu
9088757cc6 move aten.native_batch_norm_backward decomposition to core (#81522)
Moves the aten.native_batch_norm_backward decomposition from https://github.com/pytorch/functorch/blob/main/functorch/_src/decompositions.py#L148.

Changed to not recompute mean and invstd, added type cast.

In functorch, changed `@register_decomposition_for(aten.native_batch_norm_backward)` to `@register_decomposition_for_jvp(aten.native_batch_norm_backward)`.

Passing `pytest test/test_decomp.py -k norm`

Note that when the output mask is False for grad_weight and grad_bias, we should return None to be consistent with the non-decomposed operator's behavior. But "None" doesn't work with vjp, so the version of the decomposition in functorch used zeros. See b33c1f7dd4/functorch/functorch/_src/decompositions.py (L210).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81522
Approved by: https://github.com/Chillee
2022-07-27 06:11:34 +00:00
lezcano
11fe277b62 [PrimTorch] Add reference for torch.norm (#81765)
This ref does more than `torch.norm`, and it fixes a few bugs
that `torch.norm` has. This implementation and the `torch.norm`
implementation are reconciled in the next PR of this stack.

We put this PR first, as otherwise `test_decomp.py` was failing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81765
Approved by: https://github.com/ngimel
2022-07-25 19:57:21 +00:00
Vivek Khandelwal
cb63ffc553 Add decomposition for aten.upsample_bilinear2d.vec (#80964)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80964
Approved by: https://github.com/jansel, https://github.com/Chillee
2022-07-23 02:22:15 +00:00
Huy Do
12cb26509a Apply ufmt to torch internal (#81643)
This is a big-bang PR; merge conflicts are to be expected and will be addressed at merge.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81643
Approved by: https://github.com/ezyang
2022-07-22 02:19:50 +00:00
Horace He
a5fb41e3d3 Revert "Revert "Refactored prim utils into _prims_utils folder (#81746)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81746
Approved by: https://github.com/anijain2305, https://github.com/Krovatkin
2022-07-20 23:43:57 +00:00
PyTorch MergeBot
e43a02c314 Revert "Refactored prim utils into _prims_utils folder (#81088)"
This reverts commit 80231d0a72.

Reverted https://github.com/pytorch/pytorch/pull/81088 on behalf of https://github.com/jeanschmidt due to breaking internal tests
2022-07-19 19:56:41 +00:00
Horace He
80231d0a72 Refactored prim utils into _prims_utils folder (#81088)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81088
Approved by: https://github.com/ngimel
2022-07-19 03:55:51 +00:00
Natalia Gimelshein
50d205c551 make clamp decomps use torch.* calls, move clamp_min/clamp_max to refs (#81619)
Per title.
@Chillee, is anything else necessary to remove a decomp other than decorating the ref with `register_decomposition`?

Pull Request resolved: https://github.com/pytorch/pytorch/pull/81619
Approved by: https://github.com/Chillee
2022-07-18 16:52:45 +00:00
Horace He
5139053e02 Fixed the decomposition for embedding_dense_backward (#81528)
No guarantee about the strides of `grad_output`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81528
Approved by: https://github.com/jansel
2022-07-15 17:51:00 +00:00
Edward Z. Yang
fca03eeec1 Make proxy tensor support item() calls on torch.tensor constants (#81192)
This PR is doing a few interrelated things, all of which are necessary to get correctness. Read the comment in torch/fx/experimental/proxy_tensor.py for the high level overview.

Let's break down the parts of this PR:

* Bug fix where `enable_torch_dispatch_mode` with `None` doesn't work. This makes `enable_torch_dispatch_mode(current_mode.inner)` work, which is the basis for how we temporarily disable fake tensor mode.
* Bug fix for when fake tensor mode is combined with a non-mode tensor subclass. This could actually be split out of this PR, but it affects where the logic for allowing non-fake-tensor inputs with lift goes, so it's all in here in one go. There are some relevant tests for the fix in fake tensor, but it turns out I didn't need this because I'm always using proxy tensors as a mode (which ensures the ordering is right).
* New `lift_fresh` view operator.  Note that like lift, we have to manually write the functionalize kernel for these functions.
* The actual change, which is to save constants when we see them in the proxy tensor mode, and then propagate them as we go (because otherwise you'll handle mutations on constants incorrectly--see test.)

This is mildly BC-breaking if anyone was previously interposing on
at::lift, but this operator was relatively new and I checked
functorch which has no explicit reference to lift.  So I think it
should not be too disruptive.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81192
Approved by: https://github.com/samdow, https://github.com/bdhirsh
2022-07-15 03:53:40 +00:00
lezcano
b5b9db9f84 Make kl_div a composite function. (#80334)
Benchmarks: https://github.com/pytorch/pytorch/pull/80334#issuecomment-1167229285

Fixes https://github.com/pytorch/pytorch/issues/80158
Fixes https://github.com/pytorch/pytorch/issues/78867
Fixes https://github.com/pytorch/pytorch/issues/69230

Supersedes https://github.com/pytorch/pytorch/pull/79007
Supersedes https://github.com/pytorch/pytorch/pull/69212
Supersedes https://github.com/pytorch/pytorch/pull/19659
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80334
Approved by: https://github.com/ezyang
2022-07-13 20:07:36 +00:00
PyTorch MergeBot
f2c8557521 Revert "Make kl_div a composite function. (#80334)"
This reverts commit 828c787ea9.

Reverted https://github.com/pytorch/pytorch/pull/80334 on behalf of https://github.com/ezyang due to doesn't work with xla
2022-07-06 17:51:06 +00:00
lezcano
eb0889cf7d Add support for multiple inputs to out_wrapper and strict dtype checking (#80601)
Reland of https://github.com/pytorch/pytorch/pull/79941
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80601
Approved by: https://github.com/albanD
2022-07-05 12:31:21 +00:00
lezcano
828c787ea9 Make kl_div a composite function. (#80334)
Benchmarks: https://github.com/pytorch/pytorch/pull/80334#issuecomment-1167229285

Fixes https://github.com/pytorch/pytorch/issues/80158
Fixes https://github.com/pytorch/pytorch/issues/78867
Fixes https://github.com/pytorch/pytorch/issues/69230

Supersedes https://github.com/pytorch/pytorch/pull/79007
Supersedes https://github.com/pytorch/pytorch/pull/69212
Supersedes https://github.com/pytorch/pytorch/pull/19659
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80334
Approved by: https://github.com/ezyang
2022-07-04 19:33:43 +00:00
PyTorch MergeBot
184a065ba7 Revert "Add support for multiple inputs to out_wrapper and strict dtype checking (#79941)"
This reverts commit dc7066a8f0.

Reverted https://github.com/pytorch/pytorch/pull/79941 on behalf of https://github.com/suo due to broke master dc7066a8f0
2022-06-30 03:29:30 +00:00
lezcano
dc7066a8f0 Add support for multiple inputs to out_wrapper and strict dtype checking (#79941)
When a function returns multiple outputs in PyTorch, the `out`
parameter takes a tuple of tensors (see `linalg.svd` for example).
The current implementation in `out_wrapper_multi` modelled this incorrectly,
as it assumed that it would take a number of different named
parameters.

This PR implements the correct behaviour in `out_wrapper`. As a small
side-effect, we now need to call `@out_wrapper()` when the output is
just one tensor.

This PR also implements an additional optional parameter that checks
whether the dtype of the given `out` is exactly the dtype that the meta
function requires. This is the behaviour that we currently have in
PyTorch, and the check is necessary in eager mode when we pass these
tensors into external libraries.

We also make the functions with several outputs return a namedtuple,
similar to what we do in PyTorch.
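
As a rough sketch only (not the real `out_wrapper`; the parameter names and error handling below are assumptions), the behaviour described above looks roughly like:

```
import functools
import torch

def out_wrapper_sketch(exact_dtype: bool = False):
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, out=None, **kwargs):
            result = fn(*args, **kwargs)
            if out is None:
                return result
            # For multi-output functions, `out` is a tuple of tensors that
            # lines up with the tuple of results (as in linalg.svd).
            results = result if isinstance(result, tuple) else (result,)
            outs = out if isinstance(out, tuple) else (out,)
            for r, o in zip(results, outs):
                if exact_dtype and o.dtype != r.dtype:
                    raise RuntimeError(
                        f"Expected out dtype {r.dtype} but got {o.dtype}"
                    )
                o.resize_(r.shape)
                o.copy_(r)
            return out
        return wrapper
    return decorator

# A single-output ref would now be decorated as @out_wrapper_sketch().
```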
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79941
Approved by: https://github.com/mruberry, https://github.com/ezyang
2022-06-30 02:47:16 +00:00
Horace He
d43e6c9f4a Revert "Revert "formatted _decomp folder with black""
This reverts commit 2027eae67c.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79226

Approved by: https://github.com/Krovatkin
2022-06-22 20:47:52 +00:00
Horace He
4193252de9 Revert "Revert "Added kl_div_backward decomp""
This reverts commit 60a13f4ec9.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79225

Approved by: https://github.com/Krovatkin
2022-06-22 18:09:52 +00:00
Horace He
e89676f76c fix logical_not reland issues
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79900

Approved by: https://github.com/ngimel
2022-06-21 03:41:18 +00:00
PyTorch MergeBot
79507d2a9d error when registering meta kernels to composite ops in core
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79741

Approved by: https://github.com/Chillee, https://github.com/albanD
2022-06-21 02:17:13 +00:00
Nikita Shulga
f5eb05f107 Revert "Reland #2 of "Added {logical_not, trace} refs, moved logical ops to use method overloads""
This reverts commit f3665dd237.

Reverted https://github.com/pytorch/pytorch/pull/79819 on behalf of https://github.com/malfet due to land raced with softshrink refs
2022-06-20 14:22:15 -07:00
Horace He
f3665dd237 Reland #2 of "Added {logical_not, trace} refs, moved logical ops to use method overloads"
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79819

Approved by: https://github.com/mruberry
2022-06-20 19:50:43 +00:00
lezcano
16f30b494c Make l1_loss composite
Fixing the forward AD for `sgn` in the next PR of this stack uncovered a
number of issues with the derivatives of `l1_loss`. Upon inspection,
`l1_loss` was just implemented as a composite function, but it was not
differentiable. This PR makes it a fully differentiable function.

As a side note, `l1_loss_out` was incorrect in a number of ways. What is
more, it is not exposed to the public, as `F.l1_loss` does not accept an
`out=` parameter, so it is not even tested. I wonder how useful it is
to have `out=` variants for loss functions if we don't expose them at
all. Even more, I wonder how useful it is to have `_out` variants for loss
functions at all, given that their most common use case is to return just a
real number. cc @jbschlosser

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79804

Approved by: https://github.com/zou3519, https://github.com/malfet
2022-06-20 19:10:54 +00:00
Jason Ansel
d2e18606e7 Fix view issue in embedding_dense_backward decomp (#79857)
I was hitting:
```
  File "/home/jansel/pytorch/torch/fx/experimental/proxy_tensor.py", line 66, in proxy_call
    return CURRENT_DECOMPOSITION_TABLE[func_overload](*args, **kwargs)
  File "/home/jansel/pytorch/torch/_decomp/decompositions.py", line 801, in embedding_dense_backward
    indices_rank1 = indices.view(numel)
  File "/home/jansel/pytorch/torch/fx/experimental/proxy_tensor.py", line 122, in __torch_dispatch__
    return proxy_call(func_overload, args, kwargs)
  File "/home/jansel/pytorch/torch/fx/experimental/proxy_tensor.py", line 86, in proxy_call
    real_out = func_overload(*args, **kwargs)
  File "/home/jansel/pytorch/torch/_ops.py", line 49, in __call__
    return self._op(*args, **kwargs or {})
RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79857
Approved by: https://github.com/Chillee
2022-06-20 17:58:14 +00:00
PyTorch MergeBot
d4a9438786 Revert "Make l1_loss composite"
This reverts commit 61a5c779bf.

Reverted https://github.com/pytorch/pytorch/pull/78257 on behalf of https://github.com/malfet due to This breaks executorch
2022-06-17 18:14:21 +00:00
Ivan Yashchuk
bc1fef96af Reference implementations for rsqrt and native_layer_norm (#79413)
This PR adds references for:
- `torch.rsqrt`
- `torch.native_layer_norm`
-  `torch.nn.functional.layer_norm`

`native_layer_norm` had a different number of dimensions if the input was 0-sized. I fixed that.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79413
Approved by: https://github.com/mruberry, https://github.com/Chillee
2022-06-17 07:24:02 +00:00
Jason Ansel
c8fb02b452 Use amax instead of max for softmax decomps (#79667)
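For context, `torch.amax` returns only the reduced values, whereas `torch.max(x, dim)` also returns indices; only the values are needed for the usual max-subtraction trick. A hedged sketch, not necessarily the decomposition merged in this PR:

```
import torch

def softmax_sketch(x: torch.Tensor, dim: int) -> torch.Tensor:
    # amax returns just the values (no indices), which keeps the decomposed
    # graph simpler than torch.max(x, dim, keepdim=True).
    x_max = torch.amax(x, dim, keepdim=True)
    unnormalized = torch.exp(x - x_max)
    return unnormalized / torch.sum(unnormalized, dim, keepdim=True)
```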
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79667
Approved by: https://github.com/Chillee
2022-06-16 04:09:33 +00:00
lezcano
61a5c779bf Make l1_loss composite
Fixing the forward AD for `sgn` in the next PR of this stack uncovered a
number of issues with the derivatives of `l1_loss`. Upon inspection,
`l1_loss` was just implemented as a composite function, but it was not
differentiable. This PR makes it a fully differentiable function.

As a side note, `l1_loss_out` was incorrect in a number of ways. What is
more, it is not exposed to the public, as `F.l1_loss` does not accept an
`out=` parameter, so it is not even tested. I wonder how useful it is
to have `out=` variants for loss functions if we don't expose them at
all. Even more, I wonder how useful it is to have `_out` variants for loss
functions at all, given that their most common use case is to return just a
real number. cc @jbschlosser

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78257

Approved by: https://github.com/jbschlosser
2022-06-16 00:03:22 +00:00
PyTorch MergeBot
fefff54cad Revert "Revert "Revert "Added {logical_not, trace} refs, moved logical ops to use method overloads"""
This reverts commit a2d2981e8e.

Reverted https://github.com/pytorch/pytorch/pull/79224 on behalf of https://github.com/suo due to broke lots of things a2d2981e8e
2022-06-10 04:40:43 +00:00
Horace He
a2d2981e8e Revert "Revert "Added {logical_not, trace} refs, moved logical ops to use method overloads""
This reverts commit d67309aefb.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79224

Approved by: https://github.com/mruberry
2022-06-10 03:07:14 +00:00
PyTorch MergeBot
d67309aefb Revert "Added {logical_not, trace} refs, moved logical ops to use method overloads"
This reverts commit 64b6bd8c1e.

Reverted https://github.com/pytorch/pytorch/pull/79000 on behalf of https://github.com/malfet due to Introduces test failure, see https://hud.pytorch.org/pr/79000
2022-06-09 13:11:23 +00:00
PyTorch MergeBot
60a13f4ec9 Revert "Added kl_div_backward decomp"
This reverts commit a08685ebc9.

Reverted https://github.com/pytorch/pytorch/pull/79001 on behalf of https://github.com/malfet due to PR failed in newly added tests, see https://hud.pytorch.org/pr/79001
2022-06-09 13:08:30 +00:00
PyTorch MergeBot
2027eae67c Revert "formatted _decomp folder with black"
This reverts commit 4945c72151.

Reverted https://github.com/pytorch/pytorch/pull/79002 on behalf of https://github.com/janeyx99 due to Broke decomp tests on trunk + also on PR https://hud.pytorch.org/minihud#4945c72151e29cb524974e1714654cf790ddb37d
2022-06-09 12:58:03 +00:00
Horace He
4945c72151 formatted _decomp folder with black
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79002

Approved by: https://github.com/ezyang
2022-06-09 07:16:37 +00:00
Horace He
a08685ebc9 Added kl_div_backward decomp
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79001

Approved by: https://github.com/ezyang
2022-06-09 07:16:37 +00:00
Horace He
64b6bd8c1e Added {logical_not, trace} refs, moved logical ops to use method overloads
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79000

Approved by: https://github.com/ezyang
2022-06-09 07:16:36 +00:00
Horace He
dc11a5642d Improved stack ref and added more decomposition annotations
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78994

Approved by: https://github.com/mruberry
2022-06-09 03:20:28 +00:00
PyTorch MergeBot
8ce310b943 Revert "Revert "moved logit to use torch ops instead of refs + added …a couple more decompositions"" (#79082)
cc: @osalpekar
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79082
Approved by: https://github.com/eellison
2022-06-08 01:44:53 +00:00
PyTorch MergeBot
7d192d48d2 Revert "moved logit to use torch ops instead of refs + added a couple more decompositions"
This reverts commit 1d9f445b5d.

Reverted https://github.com/pytorch/pytorch/pull/78984 on behalf of https://github.com/osalpekar due to broke some jobs, like meta functorch builds
2022-06-07 21:51:41 +00:00
Horace He
1d9f445b5d moved logit to use torch ops instead of refs + added a couple more decompositions
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78984

Approved by: https://github.com/ezyang
2022-06-07 05:34:05 +00:00
Horace He
69778ee4eb Ported nn.functional functions to use torch calls instead of ref calls
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78978

Approved by: https://github.com/ezyang
2022-06-07 05:09:05 +00:00
Horace He
e675dbadc4 Ported gelu decomp to ref (#78697)
Ugh... these are actually so painful to write without operator overloading lol.

Decided to just utilize operator overloading, and xfail the ref tests for now.
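
For reference, the exact (erf-based) gelu can be written with plain tensor operators as in this sketch; it is illustrative only, not necessarily the merged ref:

```
import math
import torch

def gelu_sketch(x: torch.Tensor) -> torch.Tensor:
    # gelu(x) = x * Phi(x), where Phi is the standard normal CDF.
    return 0.5 * x * (1.0 + torch.erf(x * (1.0 / math.sqrt(2.0))))
```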
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78697
Approved by: https://github.com/mruberry
2022-06-06 22:30:20 +00:00
Horace He
ea3c4d0c75 Added glu_backward decomp (#78919)
Requested in https://github.com/pytorch/torchdynamo/issues/327
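For context, glu splits its input in two along `dim` and computes `a * sigmoid(b)`; a hedged sketch of a backward decomposition follows (illustrative, not necessarily the merged version):

```
import torch

def glu_backward_sketch(grad_output: torch.Tensor, x: torch.Tensor, dim: int) -> torch.Tensor:
    a, b = x.chunk(2, dim=dim)          # forward: out = a * sigmoid(b)
    sig_b = torch.sigmoid(b)
    grad_a = grad_output * sig_b                      # d(out)/da = sigmoid(b)
    grad_b = grad_output * a * sig_b * (1 - sig_b)    # d(out)/db = a * sigmoid'(b)
    return torch.cat([grad_a, grad_b], dim=dim)
```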
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78919
Approved by: https://github.com/mruberry
2022-06-06 19:56:01 +00:00
Horace He
080cf84bed Reland hardtanh ref (again) (#78914)
Fixes land race between 823ddb6e87 and Ed's stack.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78914
Approved by: https://github.com/wanchaol
2022-06-06 09:39:01 +00:00
PyTorch MergeBot
ddf1930734 Revert "reland Hardtanh ref (#78894)"
This reverts commit 823ddb6e87.

Reverted https://github.com/pytorch/pytorch/pull/78894 on behalf of https://github.com/suo due to this caused unexpected successes on master (lol), search test_python_ref_meta__refs_nn_functional_hardtanh_cpu_bfloat16: 823ddb6e87
2022-06-06 03:59:53 +00:00
Horace He
823ddb6e87 reland Hardtanh ref (#78894)
Reland of https://github.com/pytorch/pytorch/pull/78689

cc: @kit1980
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78894
Approved by: https://github.com/kit1980
2022-06-06 02:09:31 +00:00
PyTorch MergeBot
e6cc2e8d38 Revert "Ported hardtanh decomposition to ref (#78689)"
This reverts commit 484282a6fd.

Reverted https://github.com/pytorch/pytorch/pull/78689 on behalf of https://github.com/kit1980 due to test_meta_nn_functional_hardtanh_cuda_float32 failed on both PR and trunk, see 484282a6fd
2022-06-05 17:46:54 +00:00
Horace He
484282a6fd Ported hardtanh decomposition to ref (#78689)
One note:

The logic for handling scalar boundary conditions seems to be a bit different from other ops - I simply copied the ATen logic (https://github.com/pytorch/pytorch/blob/hardtanh_ref/aten/src/ATen/native/Activation.cpp#L370). Not sure if it's an inconsistency we should fix.

Will add an error OpInfo after figuring out the scalar boundary condition handling.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78689
Approved by: https://github.com/mruberry
2022-06-05 11:41:23 +00:00
Horace He
1ea4075bda Ported t decomp to become a ref (#78686)
Also added an error input for `t`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78686
Approved by: https://github.com/mruberry
2022-06-03 01:16:20 +00:00
Shangdi Yu
02273f056b Norm decomposition (#78582)
A decomposition for torch.ops.aten.norm
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78582
Approved by: https://github.com/Chillee
2022-06-02 00:25:43 +00:00
Jason Ansel
dabf8f0569 Populate the torch._decomp table on import (#78476)
#78041 broke TorchInductor, because of:
```
>>> from torch import _decomp
>>> import torch
>>> _decomp.get_decompositions([torch.ops.aten.leaky_relu])
{}
>>> import torch._refs.nn.functional
>>> _decomp.get_decompositions([torch.ops.aten.leaky_relu])
{<OpOverload(op='aten.leaky_relu', overload='default')>: <function leaky_relu at 0x7f5a39b56c10>, <OpOverload(op='aten.leaky_relu', overload='out')>: <function leaky_relu at 0x7f5a39b56c10>}
```

cc @Chillee

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78476
Approved by: https://github.com/Chillee
2022-05-31 03:46:38 +00:00
Bairen Yi
b6672b10e1 Fix incorrect decomposition for native_dropout (#77933)
Quick sanity check: it should be the identity function if p=0.
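That sanity check can be expressed against the public API (a hedged check, not the decomposition itself):

```
import torch

x = torch.randn(4, 5)
# With p=0, dropout must be the identity even in training mode.
out = torch.nn.functional.dropout(x, p=0.0, training=True)
assert torch.equal(out, x)
```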
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77933
Approved by: https://github.com/Chillee
2022-05-30 20:08:48 +00:00
Aidyn-A
31016eb81e [primTorch] Elementwise Binary Ops I (#78023)
This PR is a result of collaboration with @rdspring1 and @mruberry on primTorch.

It adds the following prims:
- `fmax`
- `fmin`
- `fmod`

And adds the following refs:
- `fmax`
- `fmin`
- `fmod`
- `logical_xor`

The work is in progress as there are some tests that fail.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78023
Approved by: https://github.com/mruberry
2022-05-26 20:22:27 +00:00
Horace He
ea5d01e629 [Primtorch] Tried porting leaky_relu into a ref (#78041)
Feels good to delete it from `torch._decomps`. This is mainly to clarify the process for me.

It seems like there are still some components missing from the `torch <-> refs` mapping? For example, methods don't seem to work yet for mapping from torch <-> refs, and neither do the meta tests? (cc: @ezyang)

If I replace `torch` with `refs`, then the tests seem to pass.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78041
Approved by: https://github.com/mruberry
2022-05-23 18:00:21 +00:00
Horace He
4428218945 [primtorch] Added native_group_norm decomp (#78029)
cc: @jansel @bertmaher

More or less identical in spirit to the layer norm and batch norm ones.

One annoying thing about all 3 of these is that layer_norm has slightly different `mean/var` semantics than batch norm and group norm. After normalization, `layer_norm` keeps them unsqueezed (so they're something like [1, 5, 1, 1]) while batch norm and group norm squeeze out the 1-dims.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78029
Approved by: https://github.com/bertmaher
2022-05-21 08:07:02 +00:00
Edward Z. Yang
6b273444c4 Add logit ref; allow non-refs to be called in refs.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77816

Approved by: https://github.com/mruberry
2022-05-21 02:35:14 +00:00
Horace He
64b4bb4b01 Fix meta tests on norm (and relanding norm fixes) (#77930)
Had a land race with meta tests.

Will also be relanding https://github.com/pytorch/pytorch/pull/77407
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77930
Approved by: https://github.com/malfet, https://github.com/ezyang
2022-05-20 23:15:53 +00:00
PyTorch MergeBot
03546e9c07 Revert "Fixed type promotion semantics for native_batch_norm and native_layer_norm (#77407)"
This reverts commit 70d80fb424.

Reverted https://github.com/pytorch/pytorch/pull/77407 on behalf of https://github.com/malfet due to as it broke meta tests ( I guess due to landrace), see 70d80fb424
2022-05-20 02:31:57 +00:00
Horace He
70d80fb424 Fixed type promotion semantics for native_batch_norm and native_layer_norm (#77407)
Originally, when these were written, they simply used the naive strategy of "upcast all inputs to floats, and downcast all inputs back". In addition to being... not quite what the kernels did, they also didn't capture some additional semantics. Namely, that the norms (except for layer norm on CPU! cc: @ngimel) return fp32 for the mean and rstd values.

Also, folks didn't like that I wrote `native_layer_norm` in terms of `native_batch_norm`. Which is fair - so I refactored the common logic into a `normalize` function.

cc: @jansel / @bertmaher , who've been looking at lowering layer norm/batch norm.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77407
Approved by: https://github.com/bertmaher
2022-05-19 17:11:47 +00:00
Edward Z. Yang
88c89c9eb9 log_sigmoid_forward out support; out_wrapper_multi
Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77739

Approved by: https://github.com/mruberry
2022-05-19 14:43:35 +00:00
Edward Z. Yang
4941e72e40 Revert "Revert "Implement sym_sizes to create proper IR for sym ints representing tensor sizes (#76836)""
This reverts commit c35bd8d423.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77719

Approved by: https://github.com/Chillee, https://github.com/malfet
2022-05-18 18:40:57 +00:00
Mike Ruberry
580a053832 [primTorch] Enforces stride metadata (#77542)
This PR...

**Filed the Following Issues**
- https://github.com/pytorch/pytorch/issues/77553
- https://github.com/pytorch/pytorch/issues/77526
- https://github.com/pytorch/pytorch/issues/77600

**Testing**
- Updates test_dtypes to no longer attempt to test the backward of sample inputs where no inputs require grad
- Adds a new test_python_reference_errors; it ensures the meta operations for references throw errors as expected
- Updates compare_tensor_meta to better handle CUDA devices, and (temporarily) restricts stride checking to the CUDA device type
- Elementwise unary and elementwise binary operators now have arbitrarily strided reference inputs
- Reference inputs for _like functions are added
- An OpInfo for torch.empty is added
- Reference inputs for torch.clone are added
- A NumPy reference for clone is added
- Adds OpInfos for refs.empty and refs.empty_like

**Prims**
- Renames the "max" and "min" prims to "maximum" and "minimum", respectively, to better conform to their ATen names
- Adds the empty, empty_like, full, and full_like prims
- Fixes the elementwise meta function's stride propagation
- Fixes clone's meta function's stride propagation
- Fixes convert_element_type's meta's stride propagation
- Adds a (temporary) _to_dtype private prim that casts a tensor while preserving its stride permutation
- Removes the _set prim comment
- Adds utils.compute_elementwise_output_strides, which computes the correct output strides for elementwise operations
- Corrects an issue where utils.make_contiguous_strides_for was creating the incorrect strides for tensors with no elements

**References**
- Adds the empty, empty_like, full, full_like, and ones_like refs
- Extends make_elementwise_unary_reference to accept an additional callable to perform extra input validation
- Adds an extra validation function to handle refs.neg(BoolTensor)
- Updates the isfinite ref to call ones_like when appropriate
- Models Python scalar handling for elementwise binary operations
- Added a 64 dim check for the amin and amax references
- opmath is now a flag that can be set separately for cpu and CUDA
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77542
Approved by: https://github.com/ezyang
2022-05-18 13:57:26 +00:00
PyTorch MergeBot
48581d74ad Revert "Add dispatch mode testing for meta tensors and other stuff"
This reverts commit c1cdb1216b.

Reverted https://github.com/pytorch/pytorch/pull/77477 on behalf of https://github.com/malfet
2022-05-18 02:56:48 +00:00
Edward Z. Yang
c1cdb1216b Add dispatch mode testing for meta tensors and other stuff
We don't have any coverage for meta tensor correctness for backwards
because torch function mode only allows us to interpose on
Python torch API calls, but backward invocations happen from C++.
To make this possible, I add a torch_dispatch_meta test which runs the
tests with __torch_dispatch__.

While doing this, I needed to generate fresh expected failure / skip
lists for the new test suite, and I discovered that my original
scaffolding for this purpose was woefully insufficient.  So I rewrote
how the test framework worked, and at the same time rewrote the
__torch_function__ code to also use the new logic.  Here's whats
new:

- Expected failure / skip is now done on a per function call basis,
  rather than the entire test.  This means that separate OpInfo
  samples for a function don't affect each other.

- There are now only two lists: an expected failure list (where the test
  consistently fails on all runs) and a skip list (where the test
  sometimes passes and sometimes fails).

- We explicitly notate the dtype that failed.  I considered detecting
  when something failed on all dtypes, but this was complicated and
  listing everything out seemed to be nice and simple.  To keep the
  dtypes short, I introduce a shorthand notation for dtypes.

- Conversion to meta tensors is factored into its own class
  MetaConverter

- To regenerate the expected failure / skip lists, just run with
  PYTORCH_COLLECT_EXPECT and filter on a specific test type
  (test_meta or test_dispatch_meta) for whichever you want to update.

Other misc fixes:

- Fix max_pool1d to work with BFloat16 in all circumstances, by making
  it dispatch and then fixing a minor compile error (constexpr doesn't
  work with BFloat16)

- Add resolve_name for turning random torch API functions into string
  names

- Add push classmethod to the Mode classes, so that you can more easily
  push a mode onto the mode stack

- Add some more skips for missing LAPACK

- Added an API to let you query if there's already a registration for
  a function, added a test to check that we register_meta for all
  decompositions (except detach, that decomp is wrong lol), and then
  update all the necessary sites to make the test pass.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77477

Approved by: https://github.com/zou3519
2022-05-18 00:18:34 +00:00
Horace He
8626f76555 Add trace and log_sigmoid_forward decomps (#77329)
The main question mark is that `log_sigmoid_forward` uses `acc_t` instead of `opmath_t` - I'm not sure whether we have a decorator for that today?

Glad to add one if we don't.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77329
Approved by: https://github.com/ezyang
2022-05-13 04:55:52 +00:00
Edward Z. Yang
d5ed73badd Make it possible to register decompositions to Meta key
Decompositions can be used to fill in meta support where necessary,
assuming the operations they decompose into support the meta key.
This PR adds a register_meta kwarg to register_decomposition that
optionally lets you register the decomposition to the C++ dispatch table
as the kernel for meta tensors.  I use this to then get the meta functions for
where and huber_loss for free.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77353

Approved by: https://github.com/mruberry
2022-05-12 23:20:16 +00:00
Horace He
c25bdeea26 Added logsumexp decomposition (#77219)
Pretty simple.

cc: @jansel who mentioned this.
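
For context, a numerically stable logsumexp decomposition typically uses the same max-subtraction trick as softmax; a hedged sketch, not necessarily the exact decomposition added here:

```
import torch

def logsumexp_sketch(x: torch.Tensor, dim: int, keepdim: bool = False) -> torch.Tensor:
    # Subtract the per-slice max so exp() cannot overflow, then add it back.
    # (A production decomposition would also guard against infinite maxima.)
    x_max = torch.amax(x, dim, keepdim=True)
    out = torch.log(torch.sum(torch.exp(x - x_max), dim, keepdim=True)) + x_max
    return out if keepdim else out.squeeze(dim)
```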
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77219
Approved by: https://github.com/jansel
2022-05-12 02:01:31 +00:00
samdow
d694cf60fe add decomposition for nll_loss2d_backward (#77198)
Adds a decomposition for `nll_loss2d_backward`

This will let us actually run all the tests for jvpvjp ([see this functorch PR](https://github.com/pytorch/functorch/pull/792)). I confirmed locally that this made those tests pass too
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77198
Approved by: https://github.com/Chillee
2022-05-11 20:41:20 +00:00
Mike Ruberry
bb8baea932 [primTorch] flatten, squeeze, unsqueeze... (#77043)
This PR ...

Makes the following testing changes:

- Updates stride testing in test_python_reference_consistency to only check strides of dimensions with length > 1
- Creates reference inputs for reshape
- Creates reference inputs for chunk
- Extends the sample inputs for unsqueeze
- Extends the sample inputs for stack -- test_conj_view and test_neg_view are now xfailed
  - https://github.com/pytorch/pytorch/issues/77046

Makes the following architecture changes:
- Adds the refs.special (sub)module
- Adds the refs.nn.functional (sub)module

Adds the following prims:
- expand_dims
- view_of
- rev
- clone

Adds the following references:
  -  flatten
  - squeeze
  - unsqueeze
  - special.i0e
  - special.i1e
  - logical_or
  - logical_and
  - isclose
  - flip
  - stack
  - nn.functional.elu
  - chunk
  - clone
  - narrow

Identifies the following bugs in PyTorch today:
- https://github.com/pytorch/pytorch/issues/77054
- https://github.com/pytorch/pytorch/issues/77055

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77043
Approved by: https://github.com/ngimel
2022-05-09 11:24:55 +00:00
Mike Ruberry
c031643e39 Adds decorators for Python References and extends Python Reference testing (#76945)
This PR does the following...

Tests:
- fixes test_type_promotion in test_binary_ufuncs to correctly generate scalar cpu tensors
- fixes test_python_reference_consistency to use the Python Reference's reference inputs
- extends Python reference testing to test_conj_view, test_neg_view, and test_neg_conj_view
- adds a NaN propagation sample input for elementwise unary and binary operations
- fixes the UnaryUfuncInfo class to properly register its reference inputs
- Updates the Python Reference OpInfos to skip error inputs when their behavior on scalar inputs is inconsistent with their reference operators

Code organization:
- moves elementwise type promotion functionality to prims.utils

Prims & Refs:
- fixes scalar cpu tensor handling by having them pass through broadcasting and device and shape checks
- adds two decorators, `elementwise_type_promotion_wrapper` and `out_wrapper`, the former allows for elementwise type promotion to be automated and the latter automatically adds the out kwarg and handles it properly

cc @ezyang who also had some thoughts on cpu scalar tensor handling
cc @chillee -- might want to use this new decorator as we converge decompositions and references
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76945
Approved by: https://github.com/ngimel
2022-05-07 03:42:24 +00:00
Edward Z. Yang
f2eed9400d Register PrimTorch refs as decompositions.
For the most part, PrimTorch refs have the same signature as their
ATen equivalents.  I modify most PrimTorch refs to register themselves
as decompositions, using the prim name they wrap to find the aten name
(except for a few cases where the prim/aten names mismatch).  There are
some exclusions, falling into one of two categories:

- The torch equivalent was already implemented as a CompositeImplicitAutograd
  decomposition in C++

- The ref doesn't support enough features (e.g., the real deal has more
  kwargs / overloads than are currently implemented)

PrimTorch refs are written as a single function that supports all
overloads, and this style is convenient for cases where we have a bundle
of overloads for what morally is a single overload with a Union type
on an argument (which we ought to have supported in
native_functions.yaml but blah); to support registering a single decomp
for all the overloads, we modify register_decomposition to register
to ALL overloads if you pass it an overload packet.  This is technically
BC breaking but no tests started failing because of it.
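
A hedged illustration of the overload-packet behaviour described above; the op, the local registry argument, and the body are placeholders for illustration, not code from this PR:

```
import torch
from torch._decomp import register_decomposition

aten = torch.ops.aten

# aten.leaky_relu is an OpOverloadPacket; it bundles e.g. the 'default' and
# 'out' overloads.
print(aten.leaky_relu.overloads())

# Passing the packet (rather than a specific overload) registers the single
# Python function below for every overload in the packet. The local registry
# here is an assumption used so the sketch does not touch the global table.
my_table = {}

@register_decomposition(aten.leaky_relu, registry=my_table)
def leaky_relu(a, negative_slope=0.01):
    return torch.where(a > 0, a, a * negative_slope)

print(list(my_table.keys()))
```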

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76835

Approved by: https://github.com/Chillee, https://github.com/mruberry
2022-05-06 20:11:45 +00:00
Horace He
e9f34931ef Add some shape decomps (t, transpose, rot90, stack)
Also fixes xlogy (turns out the only thing it was missing was a type cast annotation! nice!)

I also renamed `canonicalize_idx` => `canonicalize_dim` (to align with `canonicalize_dims`) and fixed a bug in it (cc: @mruberry)
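
For illustration, dim canonicalization amounts to wrapping negative indices; a hedged sketch only, not the torch._prims_common helper itself:

```
def canonicalize_dim_sketch(rank: int, dim: int) -> int:
    # Wrap a possibly negative dimension index into [0, rank).
    if dim < 0:
        dim += rank
    # max(rank, 1) lets rank-0 tensors still accept dim 0 / -1.
    if not 0 <= dim < max(rank, 1):
        raise IndexError(f"dim {dim} out of range for rank {rank}")
    return dim
```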

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76873
Approved by: https://github.com/mruberry
2022-05-06 02:40:57 +00:00
Horace He
6917034afb Added logit/reciprocal decomps, fixed var for complex, moved type promotion logic to standardize on primtorch's
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76633
Approved by: https://github.com/ezyang
2022-05-04 21:29:52 +00:00
PyTorch MergeBot
ce63c53c9b Revert "Add binary_cross_entropy and trace decomp - fixed _log_softmax/_softmax dtype promotion semantics"
This reverts commit 8a3e9255ea.

Reverted https://github.com/pytorch/pytorch/pull/76670 on behalf of https://github.com/mruberry
2022-05-04 10:42:39 +00:00
Horace He
ed18181d83 Added gelu decomposition
^
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76763
Approved by: https://github.com/ezyang
2022-05-03 23:23:18 +00:00
Horace He
8a3e9255ea Add binary_cross_entropy and trace decomp - fixed _log_softmax/_softmax dtype promotion semantics
cc: @zou3519
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76670
Approved by: https://github.com/ezyang
2022-05-03 18:20:17 +00:00
Horace He
fb24614011 Port functorch decomps over and fix some tests
Still some stuff to fix up, will finish later.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76621
Approved by: https://github.com/ezyang
2022-05-01 08:48:48 +00:00
Edward Z. Yang
a3f10ec281 Move functorch decompositions to PyTorch
Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76311

Approved by: https://github.com/Chillee
2022-04-30 16:47:53 +00:00