Commit Graph

118 Commits

Author SHA1 Message Date
Xuehai Pan
a229b4526f [BE] Prefer dash over underscore in command-line options (#94505)
Preferring dash over underscore in command-line options. Add `--command-arg-name` to the argument parser. The old arguments with underscores `--command_arg_name` are kept for backward compatibility.

Both dashes and underscores are used in the PyTorch codebase. Some argument parsers only have dashes or only have underscores in arguments. For example, the `torchrun` utility for distributed training only accepts underscore arguments (e.g., `--master_port`). The dashes are more common in other command-line tools. And it looks to be the default choice in the Python standard library:

`argparse.BooleanOptionalAction`: 4a9dff0e5a/Lib/argparse.py (L893-L895)

```python
class BooleanOptionalAction(Action):
    def __init__(...):
            if option_string.startswith('--'):
                option_string = '--no-' + option_string[2:]
                _option_strings.append(option_string)
```

It adds `--no-argname`, not `--no_argname`. Also typing `_` need to press the shift or the caps-lock key than `-`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94505
Approved by: https://github.com/ezyang, https://github.com/seemethere
2023-02-09 20:16:49 +00:00
Edward Z. Yang
c028fc4e25 Decouple PT2 dynamic shapes from the functorch setting (#94469)
The functorch setting still exists, but now it is no longer necessary:
we infer use of Python dispatcher by checking if the ambient
FakeTensorMode has a ShapeEnv or not.  The setting still exists,
but it is for controlling direct AOTAutograd use now; for PT2,
it's sufficient to use torch._dynamo.config.dynamic_shapes.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94469
Approved by: https://github.com/Chillee, https://github.com/voznesenskym, https://github.com/jansel
2023-02-09 06:41:41 +00:00
PyTorch MergeBot
ca63040d2b Revert "Set torch.backends.cudnn.enabled to false when testing accuracy (#94363)"
This reverts commit 7bfc59993d.

Reverted https://github.com/pytorch/pytorch/pull/94363 on behalf of https://github.com/huydhn due to This change fails in trunk 7bfc59993d running out of memory.  Mark this as weird because it was green in PR
2023-02-09 01:24:35 +00:00
Bin Bao
7bfc59993d Set torch.backends.cudnn.enabled to false when testing accuracy (#94363)
Summary: It looks like setting torch.backends.cudnn.deterministic to
True is not enough for eliminating non-determinism when testing
benchmarks with --accuracy, so let's turn off cudnn completely.
With this change, mobilenet_v3_large does not show random failure on my
local environment. Also take this chance to clean up CI skip lists.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94363
Approved by: https://github.com/ezyang
2023-02-08 23:30:10 +00:00
Jason Ansel
eb1aca162e Re-enable cudagraphs for benchmark scripts (#94192)
Related to https://github.com/pytorch/pytorch/pull/93253

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94192
Approved by: https://github.com/albanD, https://github.com/desertfire
2023-02-08 16:38:32 +00:00
chuanqiw
94394e568e change the dynamo benchmark timeout as a parameter (#94284)
Change the dynamo benchmark timeout from hard code to a parameter with default value 1200ms, cause the hard code 1200ms timeout led some single thread mode model crashed on CPU platform. With the parameter, users can specify the timeout freely.

Fixes #94281

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94284
Approved by: https://github.com/malfet
2023-02-08 00:45:08 +00:00
Bin Bao
db011e11ea Skip sebotnet33ts_256 on CI (#94067)
Summary: Random failure on CI and it happens more frequently lately.
Skip for now and filed an issue at https://github.com/pytorch/pytorch/issues/94066

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94067
Approved by: https://github.com/ezyang, https://github.com/malfet
2023-02-06 14:58:54 +00:00
Edward Z. Yang
1d53123f44 Report graph breaks separately from graph count (#94143)
graph break != graph count - 1.  Suppose you have a nested
inline function call f1 to f2 to f3.  A graph break in f3
results in six graphs: f1 before, f2 before, f3 before, f3 after,
f2 after, f1 after.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94143
Approved by: https://github.com/voznesenskym
2023-02-05 04:03:12 +00:00
Edward Z. Yang
c1da35af5e Update dynamic benchmark skips (#94114)
Data from https://github.com/pytorch/pytorch/pull/94134

Signed-off-by: Edward Z. Yang <ezyangmeta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94114
Approved by: https://github.com/SherlockNoMad
2023-02-04 20:36:51 +00:00
Jason Ansel
e071d72f3c Tag dynamo backends as debug/experimental (#93878)
Hides debug/experimental backends by default.

Before:
```
torch._dynamo.list_backends()
['aot_eager', 'aot_eager_decomp_partition', 'aot_torchxla_trace_once', 'aot_torchxla_trivial', 'aot_ts', 'aot_ts_nvfuser', 'cudagraphs', 'dynamo_accuracy_minifier_backend', 'dynamo_minifier_backend', 'eager', 'inductor', 'ipex', 'nvprims_aten', 'nvprims_nvfuser', 'onnxrt', 'tensorrt', 'torchxla_trace_once', 'torchxla_trivial', 'ts', 'tvm']
```

After:
```
torch._dynamo.list_backends()
['aot_ts_nvfuser', 'cudagraphs', 'inductor', 'ipex', 'nvprims_nvfuser', 'onnxrt', 'tensorrt', 'tvm']
```

Fixes https://github.com/pytorch/pytorch/issues/93733

Pull Request resolved: https://github.com/pytorch/pytorch/pull/93878
Approved by: https://github.com/voznesenskym
2023-02-04 00:50:51 +00:00
Jason Ansel
0a93e6db5a Fix/refactor dynamo ipex backend (#93863)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/93863
Approved by: https://github.com/desertfire
2023-02-03 21:42:27 +00:00
Jason Ansel
203b2cad3e Remove fx2trt/torch2trt backends (#93822)
These backends have been broken for some time.  I tried to get them
running again, but as far as I can tell they are not maintained.
Installing torch_tensorrt downgrades PyTorch to 1.12.  If I manually
bypass that downgrade, I get import errors from inside fx2trt.  Fixes that
re-add these are welcome, but it might make sense to move these wrappers
to the torch_tensorrt repo once PyTorch 2.0 support is added.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/93822
Approved by: https://github.com/frank-wei
2023-02-03 21:04:21 +00:00
Jason Ansel
a5ff40032d Fix/refactor dynamo onnxrt backend (#93818)
Fixes https://github.com/pytorch/pytorch/issues/90352

Pull Request resolved: https://github.com/pytorch/pytorch/pull/93818
Approved by: https://github.com/voznesenskym
2023-02-03 20:48:02 +00:00
Edward Z. Yang
2481fc0df4 Add count to FakeTensorMode.__torch_dispatch__ (#93936)
Most calls to fake tensor never hit `FakeTensor.__torch_dispatch__`

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/93936
Approved by: https://github.com/bdhirsh, https://github.com/albanD
2023-02-03 14:21:11 +00:00
Fabio Rocha
63115b70f0 Fixed issue with --diff-branch arg in dynamo benchmarks (#93989)
As @peterbell10 pointed out, it was giving incorrect results for `compression_ratio`
and `compression_latency` when you used `--diff-branch`.

This fixes this by running a separate subprocess for each branch to make sure you are not being affected by run for other branch.

Also added a couple of more significant figures
to numbers in summary table.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/93989
Approved by: https://github.com/jansel
2023-02-03 08:36:57 +00:00
Jason Ansel
60e8c766b5 Refactor dynamo training backends (#93409)
This splits training.py into many files and moves them from `dynamo.optimizations.training` to `dynamo.backends.*`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/93409
Approved by: https://github.com/ezyang
2023-02-03 03:07:15 +00:00
atalman
6e285c479d Remove cuda 11.6 from CI replace with 11.7 (#93406)
Remove cuda 11.6 from CI replace with 11.7
Following the Release readme here: https://github.com/pytorch/pytorch/blob/master/RELEASE.md#release-compatibility-matrix

Pull Request resolved: https://github.com/pytorch/pytorch/pull/93406
Approved by: https://github.com/malfet, https://github.com/desertfire
2023-02-02 19:16:05 +00:00
Jason Ansel
d7b39b17ab Remove torch/_dynamo/optimizations/{analysis,log_args}.py (#93279)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/93279
Approved by: https://github.com/voznesenskym
2023-02-02 02:34:36 +00:00
Edward Z. Yang
03b465a6d0 Add --iterations to benchmark script (#93858)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/93858
Approved by: https://github.com/williamwen42
2023-02-01 21:56:49 +00:00
Edward Z. Yang
08041c5264 Configurable repro_tolerance for same_two_models (#93398)
Fixes https://github.com/pytorch/pytorch/issues/93293

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/93398
Approved by: https://github.com/SherlockNoMad
2023-02-01 01:41:48 +00:00
Edward Z. Yang
811e95a15e --dynamic-ci-skips now works for all backends (#93369)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/93369
Approved by: https://github.com/albanD
2023-01-31 20:07:58 +00:00
Edward Z. Yang
efee879695 Don't suppress warnings in CI. (#93269)
Warnings are an important clue that something bad is going on.
You want to see them in logs.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/93269
Approved by: https://github.com/voznesenskym
2023-01-30 19:21:09 +00:00
Edward Z. Yang
9eb402d18e Update dynamic benchmark skips (#93228)
Data from https://github.com/pytorch/pytorch/pull/93223

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/93228
Approved by: https://github.com/desertfire
2023-01-30 14:22:53 +00:00
XiaobingSuper
9a2becf60a inductor: fix inplace op's wrong lowering issue when preop is NopKernel (#92247)
For TIMM ghostnet_100, there has such case, concat+inplace_add:

```
import torch
from torch._inductor import config
config.debug = True
torch._dynamo.config.verbose=True

class MockModule(torch.nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, x, y, z):
        out = torch.cat([x, y], dim=1)
        out+=z
        return out

mod = MockModule().eval()
inputs = (
                torch.randn([1, 64, 16, 16]),
                torch.randn([1, 64, 16, 16]),
                torch.randn([1, 128, 16, 16]),
            )
ref = mod(*inputs)

with torch.no_grad():
    opt_model = torch._dynamo.optimize('inductor')(mod)
    out = opt_model(*inputs)
    out = opt_model(*inputs)
    out = opt_model(*inputs)
print(torch.equal(ref, out))
```

the inductor always get a wrong result, I find that inductor get a wrong code:

```

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile
from torch._inductor.select_algorithm import extern_kernels

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

kernel_cpp_0 = async_compile.cpp('''
#include "/tmp/torchinductor_xiaobing/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
extern "C" void kernel(const float* __restrict__ in_ptr0,
                       const float* __restrict__ in_ptr1,
                       const float* __restrict__ in_ptr2,
                       const float* __restrict__ in_ptr3,
                       float* __restrict__ out_ptr0,
                       float* __restrict__ out_ptr1,
                       float* __restrict__ out_ptr2)
{
    {
        for(long i0=0; i0<1024; i0+=1)
        {
            auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 16*i0);
            tmp0.store(out_ptr0 + 16*i0);
        }
        #pragma omp simd simdlen(8)
        for(long i0=16384; i0<16384; i0+=1)
        {
            auto tmp0 = in_ptr0[i0];
            out_ptr0[i0] = tmp0;
        }
    }
    {
        for(long i0=0; i0<1024; i0+=1)
        {
            auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr1 + 16*i0);
            tmp0.store(out_ptr1 + 16*i0);
        }
        #pragma omp simd simdlen(8)
        for(long i0=16384; i0<16384; i0+=1)
        {
            auto tmp0 = in_ptr1[i0];
            out_ptr1[i0] = tmp0;
        }
    }
    {
        for(long i0=0; i0<2048; i0+=1)
        {
            auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr2 + 16*i0);
            auto tmp1 = at::vec::Vectorized<float>::loadu(in_ptr3 + 16*i0);
            auto tmp2 = tmp0 + tmp1;
            tmp2.store(out_ptr2 + 16*i0);
        }
        #pragma omp simd simdlen(8)
        for(long i0=32768; i0<32768; i0+=1)
        {
            auto tmp0 = in_ptr2[i0];
            auto tmp1 = in_ptr3[i0];
            auto tmp2 = tmp0 + tmp1;
            out_ptr2[i0] = tmp2;
        }
    }
}
''')

async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1, arg2_1 = args
    args.clear()
    buf3 = empty_strided((1, 128, 16, 16), (32768, 256, 16, 1), device='cpu', dtype=torch.float32)
    buf0 = as_strided(buf3, (1, 64, 16, 16), (32768, 256, 16, 1))  # alias
    buf1 = as_strided(buf3, (1, 64, 16, 16), (32768, 256, 16, 1), 16384)  # alias
    buf2 = empty_strided((1, 128, 16, 16), (32768, 256, 16, 1), device='cpu', dtype=torch.float32)
    kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf2.data_ptr()), c_void_p(arg2_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf3.data_ptr()))
    del arg0_1
    del arg1_1
    del arg2_1
    return (buf3, )

if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((1, 64, 16, 16), (16384, 256, 16, 1), device='cpu', dtype=torch.float32)
    arg1_1 = rand_strided((1, 64, 16, 16), (16384, 256, 16, 1), device='cpu', dtype=torch.float32)
    arg2_1 = rand_strided((1, 128, 16, 16), (32768, 256, 16, 1), device='cpu', dtype=torch.float32)
    print_performance(lambda: call([arg0_1, arg1_1, arg2_1]))

```
you can see that the add operation always adds a random value, see the ir code:

1. **ir_pre_fusion.txt**
```
buf0: SchedulerNode(ComputedBuffer)
buf0.writes = [MemoryDep(name='buf0', index=c0, size=(16384,))]
buf0.unmet_dependencies = []
buf0.met_dependencies = [MemoryDep(name='arg0_1', index=c0, size=(16384,))]
buf0.group.device = cpu
buf0.group.iteration = ((16384,), ())
buf0.sizes = ([16384], [])
buf0.aliases = ['buf3']
class buf0_loop_body:
    var_ranges = {z0: 16384}
    index0 = z0
    def body(self, ops):
        get_index = self.get_index('index0')
        load = ops.load('arg0_1', get_index)
        get_index_1 = self.get_index('index0')
        store = ops.store('buf0', get_index_1, load, None)
        return store

buf1: SchedulerNode(ComputedBuffer)
buf1.writes = [MemoryDep(name='buf1', index=c0, size=(16384,))]
buf1.unmet_dependencies = []
buf1.met_dependencies = [MemoryDep(name='arg1_1', index=c0, size=(16384,))]
buf1.group.device = cpu
buf1.group.iteration = ((16384,), ())
buf1.sizes = ([16384], [])
buf1.aliases = ['buf3']
class buf1_loop_body:
    var_ranges = {z0: 16384}
    index0 = z0
    def body(self, ops):
        get_index = self.get_index('index0')
        load = ops.load('arg1_1', get_index)
        get_index_1 = self.get_index('index0')
        store = ops.store('buf1', get_index_1, load, None)
        return store

buf2: NopKernelSchedulerNode(ConcatKernel)
buf2.writes = [StarDep(name='buf2')]
buf2.unmet_dependencies = [StarDep(name='buf0'), StarDep(name='buf1')]
buf2.met_dependencies = []

buf3: SchedulerNode(ComputedBuffer)
buf3.writes = [MemoryDep(name='buf3', index=c0, size=(32768,))]
buf3.unmet_dependencies = [MemoryDep(name='buf2', index=c0, size=(32768,))]
buf3.met_dependencies = [MemoryDep(name='arg2_1', index=c0, size=(32768,))]
buf3.group.device = cpu
buf3.group.iteration = ((32768,), ())
buf3.sizes = ([32768], [])
class buf3_loop_body:
    var_ranges = {z0: 32768}
    index0 = z0
    def body(self, ops):
        get_index = self.get_index('index0')
        load = ops.load('buf2', get_index)
        get_index_1 = self.get_index('index0')
        load_1 = ops.load('arg2_1', get_index_1)
        add = ops.add(load, load_1)
        get_index_2 = self.get_index('index0')
        store = ops.store('buf3', get_index_2, add, None)
        return store

```
2. **ir_post_fusion.txt**
```
buf0: SchedulerNode(ComputedBuffer)
buf0.writes = [MemoryDep(name='buf0', index=c0, size=(16384,))]
buf0.unmet_dependencies = []
buf0.met_dependencies = [MemoryDep(name='arg0_1', index=c0, size=(16384,))]
buf0.group.device = cpu
buf0.group.iteration = ((16384,), ())
buf0.sizes = ([16384], [])
buf0.aliases = ['buf3']
class buf0_loop_body:
    var_ranges = {z0: 16384}
    index0 = z0
    def body(self, ops):
        get_index = self.get_index('index0')
        load = ops.load('arg0_1', get_index)
        get_index_1 = self.get_index('index0')
        store = ops.store('buf0', get_index_1, load, None)
        return store

buf1: SchedulerNode(ComputedBuffer)
buf1.writes = [MemoryDep(name='buf1', index=c0, size=(16384,))]
buf1.unmet_dependencies = []
buf1.met_dependencies = [MemoryDep(name='arg1_1', index=c0, size=(16384,))]
buf1.group.device = cpu
buf1.group.iteration = ((16384,), ())
buf1.sizes = ([16384], [])
buf1.aliases = ['buf3']
class buf1_loop_body:
    var_ranges = {z0: 16384}
    index0 = z0
    def body(self, ops):
        get_index = self.get_index('index0')
        load = ops.load('arg1_1', get_index)
        get_index_1 = self.get_index('index0')
        store = ops.store('buf1', get_index_1, load, None)
        return store

buf2: NopKernelSchedulerNode(ConcatKernel)
buf2.writes = [StarDep(name='buf2')]
buf2.unmet_dependencies = [StarDep(name='buf0'), StarDep(name='buf1')]
buf2.met_dependencies = []

buf3: SchedulerNode(ComputedBuffer)
buf3.writes = [MemoryDep(name='buf3', index=c0, size=(32768,))]
buf3.unmet_dependencies = [MemoryDep(name='buf2', index=c0, size=(32768,))]
buf3.met_dependencies = [MemoryDep(name='arg2_1', index=c0, size=(32768,))]
buf3.group.device = cpu
buf3.group.iteration = ((32768,), ())
buf3.sizes = ([32768], [])
class buf3_loop_body:
    var_ranges = {z0: 32768}
    index0 = z0
    def body(self, ops):
        get_index = self.get_index('index0')
        load = ops.load('buf2', get_index)
        get_index_1 = self.get_index('index0')
        load_1 = ops.load('arg2_1', get_index_1)
        add = ops.add(load, load_1)
        get_index_2 = self.get_index('index0')
        store = ops.store('buf3', get_index_2, add, None)
        return store
```

From the ir code, you can see the buf3 always adds an empty buf2 which has never been written. The root cause is that there has a potential issue when doing the mutation for inplace add when its' input is a NopKernel.

After this PR, the ir will be like(**ir_pre_fusion.txt**):

```
buf0: SchedulerNode(ComputedBuffer)
buf0.writes = [MemoryDep(name='buf0', index=c0, size=(16384,))]
buf0.unmet_dependencies = []
buf0.met_dependencies = [MemoryDep(name='arg0_1', index=c0, size=(16384,))]
buf0.group.device = cpu
buf0.group.iteration = ((16384,), ())
buf0.sizes = ([16384], [])
buf0.aliases = ['buf2']
class buf0_loop_body:
    var_ranges = {z0: 16384}
    index0 = z0
    def body(self, ops):
        get_index = self.get_index('index0')
        load = ops.load('arg0_1', get_index)
        get_index_1 = self.get_index('index0')
        store = ops.store('buf0', get_index_1, load, None)
        return store

buf1: SchedulerNode(ComputedBuffer)
buf1.writes = [MemoryDep(name='buf1', index=c0, size=(16384,))]
buf1.unmet_dependencies = []
buf1.met_dependencies = [MemoryDep(name='arg1_1', index=c0, size=(16384,))]
buf1.group.device = cpu
buf1.group.iteration = ((16384,), ())
buf1.sizes = ([16384], [])
buf1.aliases = ['buf2']
class buf1_loop_body:
    var_ranges = {z0: 16384}
    index0 = z0
    def body(self, ops):
        get_index = self.get_index('index0')
        load = ops.load('arg1_1', get_index)
        get_index_1 = self.get_index('index0')
        store = ops.store('buf1', get_index_1, load, None)
        return store

buf2: NopKernelSchedulerNode(ConcatKernel)
buf2.writes = [StarDep(name='buf2')]
buf2.unmet_dependencies = [StarDep(name='buf0'), StarDep(name='buf1')]
buf2.met_dependencies = []

buf3: SchedulerNode(ComputedBuffer)
buf3.writes = [MemoryDep(name='buf3', index=c0, size=(32768,))]
buf3.unmet_dependencies = [MemoryDep(name='buf2', index=c0, size=(32768,)), StarDep(name='buf2')]
buf3.met_dependencies = [MemoryDep(name='arg2_1', index=c0, size=(32768,))]
buf3.group.device = cpu
buf3.group.iteration = ((32768,), ())
buf3.sizes = ([32768], [])
buf3.mutations = ['buf2']
class buf3_loop_body:
    var_ranges = {z0: 32768}
    index0 = z0
    def body(self, ops):
        get_index = self.get_index('index0')
        load = ops.load('buf2', get_index)
        get_index_1 = self.get_index('index0')
        load_1 = ops.load('arg2_1', get_index_1)
        add = ops.add(load, load_1)
        get_index_2 = self.get_index('index0')
        store = ops.store('buf3', get_index_2, add, None)
        return store

```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92247
Approved by: https://github.com/ngimel, https://github.com/desertfire, https://github.com/jansel
2023-01-29 05:35:21 +00:00
Edward Z. Yang
025ef99ddf Get rid of dedicated inductor dynamic_shapes config (#93076)
Instead, use Dynamo dynamic_shapes config

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/93076
Approved by: https://github.com/voznesenskym
2023-01-27 02:58:16 +00:00
Edward Z. Yang
5e9fa0a8fc Mark crossvit_9_240 as passing dynamic=True (#92981)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92981
Approved by: https://github.com/Chillee
2023-01-26 13:05:37 +00:00
Michael Voznesensky
d322f82b05 Add @count util to torch, use it to track benchmark stats (#93013)
<img width="1333" alt="image" src="https://user-images.githubusercontent.com/4755252/214687911-f766f072-c162-4298-9aed-c889f1375336.png">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/93013
Approved by: https://github.com/ezyang
2023-01-26 03:09:12 +00:00
Edward Z. Yang
2ee94633a1 Change ciflow/inductor to test inductor inference with dynamic shapes (#92771)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92771
Approved by: https://github.com/voznesenskym
2023-01-25 02:21:02 +00:00
Edward Z. Yang
f724ecbd52 Add dynamic shapes aot_eager to periodic (#92770)
This means it overlaps with ciflow/inductor, but I'm about
to change that soon.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92770
Approved by: https://github.com/voznesenskym, https://github.com/albanD, https://github.com/desertfire
2023-01-25 02:21:02 +00:00
Edward Z. Yang
fb46d3e138 Run all of the timm models shards in the periodic (#92900)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92900
Approved by: https://github.com/bdhirsh, https://github.com/atalman
2023-01-24 17:56:20 +00:00
Horace He
c0327eb463 Some more inductor fixes for symbolic shapes (#92867)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92867
Approved by: https://github.com/ezyang
2023-01-24 15:05:46 +00:00
PyTorch MergeBot
2cf03bbbab Revert "Run all of the timm models shards in the periodic (#92743)"
This reverts commit de69cedf98.

Reverted https://github.com/pytorch/pytorch/pull/92743 on behalf of https://github.com/atalman due to This needs to be landed after https://github.com/pytorch/pytorch/pull/92845 and https://github.com/pytorch/pytorch/pull/92846 are landed
2023-01-23 23:44:09 +00:00
Fabio Rocha
a43b55e135 A few usability improvements for the dynamo benchmarks. (#92713)
--diff_main renamed to --diff-branch BRANCH and now works again
Summary table splits results per branch.
csv output now has column with branch name when run in this mode

Added --progress flag so you can track how many models are going to be
run.

Example output:
```
$ python benchmarks/dynamo/torchbench.py  --quiet --performance --backend inductor --float16 --batch-size-file $(realpath benchmarks/dynamo/torchbench_models_list.txt)   --filter 'alexnet|vgg16' --progress  --diff viable/strict
Running model 1/2
batch size: 1024
cuda eval  alexnet                             dynamo_bench_diff_branch   1.251x p=0.00
cuda eval  alexnet                             viable/strict              1.251x p=0.00
Running model 2/2
batch size: 128
cuda eval  vgg16                               dynamo_bench_diff_branch   1.344x p=0.00
cuda eval  vgg16                               viable/strict              1.342x p=0.00

Summary for tag=dynamo_bench_diff_branch:
speedup             gmean=1.30x mean=1.30x
abs_latency         gmean=24.09x mean=25.26x
compilation_latency mean=2.0 seconds
compression_ratio   mean=0.9x

Summary for tag=viable/strict:
speedup             gmean=1.30x mean=1.30x
abs_latency         gmean=24.11x mean=25.29x
compilation_latency mean=0.5 seconds
compression_ratio   mean=1.0x
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92713
Approved by: https://github.com/jansel
2023-01-23 18:23:35 +00:00
Edward Z. Yang
4a3fb7bcbc Make CI_SKIPS into a consolidated dict (#92769)
This makes it easier to add more configurations without causing a
thicket of if statements selecting the correct variable.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92769
Approved by: https://github.com/voznesenskym, https://github.com/desertfire
2023-01-23 14:57:18 +00:00
Edward Z. Yang
3cfd2fa1c7 Make --inductor imply --backend inductor (#92764)
This is to make some downstream code more uniform (can always ask args.backend for backend)

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92764
Approved by: https://github.com/voznesenskym, https://github.com/desertfire
2023-01-23 14:57:18 +00:00
Edward Z. Yang
c52567ec18 Switch CI exclusions to use exact match. (#92761)
Since the CI exclusions are hard-coded in our script, we might as well require them to match exactly. This solved some head scratching where I was like, "this model is not obviously excluded, why is it not showing up in CI."

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92761
Approved by: https://github.com/jansel
2023-01-22 17:10:20 +00:00
Edward Z. Yang
de69cedf98 Run all of the timm models shards in the periodic (#92743)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92743
Approved by: https://github.com/kit1980
2023-01-21 18:39:17 +00:00
Michael Voznesensky
5778c04a15 Add --timing flag, phase timing to @dynamo_timed (#92637)
Ex output:
```
 TIMING:
 entire_frame_compile:8.574629999999999
 backend_compile:5.26806
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92637
Approved by: https://github.com/ezyang
2023-01-21 10:52:13 +00:00
Edward Z. Yang
27bf879b8c Forward fix: restore sebotnet33ts_256 aot_eager skip (#92741)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92741
Approved by: https://github.com/kit1980
2023-01-21 08:10:23 +00:00
Edward Z. Yang
9ad0aca6e5 Update aot_eager CI failures (#92696)
Based on https://hud.pytorch.org/pr/92689

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92696
Approved by: https://github.com/desertfire
2023-01-21 02:29:22 +00:00
PyTorch MergeBot
44132cc4b0 Revert "Add --timing flag, phase timing to @dynamo_timed (#92637)"
This reverts commit 773b513435.

Reverted https://github.com/pytorch/pytorch/pull/92637 on behalf of https://github.com/malfet due to Broke lint
2023-01-20 16:23:20 +00:00
Michael Voznesensky
773b513435 Add --timing flag, phase timing to @dynamo_timed (#92637)
Ex output:
```
 TIMING:
 entire_frame_compile:8.574629999999999
 backend_compile:5.26806
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92637
Approved by: https://github.com/ezyang
2023-01-20 05:01:21 +00:00
Edward Z. Yang
44e52ea514 Reenable mobilevit_s in CI, seems to pass (#92585)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92585
Approved by: https://github.com/Chillee
2023-01-19 15:24:45 +00:00
Edward Z. Yang
b92a7afed9 Reclassify some dynamic aot_eager failures as static failures (#92376)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92376
Approved by: https://github.com/Chillee
2023-01-18 19:27:11 +00:00
Wu, Chunyuan
3aa6cec18c [dynamo] exclude reset_rng_state when measure timing (#92237)
Fixes inductor performance regression on CPU: https://github.com/pytorch/torchdynamo/issues/2027, https://github.com/pytorch/torchdynamo/issues/2028 and https://github.com/pytorch/torchdynamo/issues/2029.
The details are explained here: https://github.com/pytorch/torchdynamo/issues/2028#issuecomment-1381496678.

### Performance

- Model: lennard_jones
- Machine: IceLake (32 cores per socket)
- Configuration: single instance, 32 cores per instance
- jemalloc and iomp enabled

```bash
python benchmarks/dynamo/torchbench.py  --inductor-settings --inductor --performance --float32 -dcpu -n5000  --no-skip --dashboard --only=lennard_jones --quiet
```

<html xmlns:v="urn:schemas-microsoft-com:vml"
xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:x="urn:schemas-microsoft-com:office:excel"
xmlns="http://www.w3.org/TR/REC-html40">

<head>

<meta name=ProgId content=Excel.Sheet>
<meta name=Generator content="Microsoft Excel 15">
<link id=Main-File rel=Main-File
href="file:///C:/Users/chunyuan/AppData/Local/Temp/msohtmlclip1/01/clip.htm">
<link rel=File-List
href="file:///C:/Users/chunyuan/AppData/Local/Temp/msohtmlclip1/01/clip_filelist.xml">

</head>

<body link="#0563C1" vlink="#954F72">

Time before   regression | Time after regression | Time with this PR
-- | -- | --
0.00020483799744397402 | 0.0002818034990923479 | 0.00020241099991835654

</body>

</html>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92237
Approved by: https://github.com/jgong5, https://github.com/jansel
2023-01-18 13:17:28 +00:00
Edward Z. Yang
fbbb19599a Update dynamic skips after #92076 (#92103)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92103
Approved by: https://github.com/voznesenskym, https://github.com/Chillee
2023-01-13 04:05:10 +00:00
Edward Z. Yang
74cbf058a5 Support --dynamic-ci-skips (#91893)
This makes it easier for us to run only the skipped benchmarks and
see if that actually started passing.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91893
Approved by: https://github.com/albanD
2023-01-11 20:02:58 +00:00
Edward Z. Yang
d24324bf1d s/INDCUTOR/INDUCTOR/ (#91885)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91885
Approved by: https://github.com/Skylion007, https://github.com/atalman, https://github.com/malfet
2023-01-11 12:28:19 +00:00
Edward Z. Yang
56ed976edf hrnet_w18, tts_angular works with dynamic shapes (#91891)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91891
Approved by: https://github.com/voznesenskym
2023-01-11 11:40:16 +00:00
blzheng
0c1777acec Dynamo benchmark: add CPU specific changes (#88477)
This pr adds some CPU specific changes:

- Add support for IPEX backend
- https://github.com/pytorch/torchdynamo/issues/1618
- https://github.com/pytorch/torchdynamo/issues/1534
- Enable CPU launcher in runner.py.
- Fix the issue that some environment variables are not support on CPU

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88477
Approved by: https://github.com/jgong5, https://github.com/jansel
2023-01-07 09:26:06 +00:00