pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-06 12:20:52 +01:00

Author	SHA1	Message	Date
Xuehai Pan	a229b4526f	[BE] Prefer dash over underscore in command-line options (#94505 ) Prefer dashes over underscores in command-line options. Add `--command-arg-name` to the argument parser. The old arguments with underscores `--command_arg_name` are kept for backward compatibility. Both dashes and underscores are used in the PyTorch codebase. Some argument parsers only have dashes or only have underscores in arguments. For example, the `torchrun` utility for distributed training only accepts underscore arguments (e.g., `--master_port`). Dashes are more common in other command-line tools, and they appear to be the default choice in the Python standard library: `argparse.BooleanOptionalAction`: `4a9dff0e5a/Lib/argparse.py (L893-L895)` ```python class BooleanOptionalAction(Action): def __init__(...): if option_string.startswith('--'): option_string = '--no-' + option_string[2:] _option_strings.append(option_string) ``` It adds `--no-argname`, not `--no_argname`. Also, typing `_` requires pressing the Shift or Caps Lock key, unlike `-`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/94505 Approved by: https://github.com/ezyang, https://github.com/seemethere	2023-02-09 20:16:49 +00:00
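A minimal sketch of the dash-first, underscore-compatible pattern described in the entry above, using only the standard `argparse` module. The parser name, the `--master-port` default, and the `--verbose` flag are hypothetical and chosen for illustration; this is not the code from the PR.

```python
import argparse


def build_parser():
    # Hypothetical tool name; only the option-naming pattern matters here.
    parser = argparse.ArgumentParser(prog="example-tool")
    # The dashed spelling is listed first, so argparse derives dest="master_port"
    # from it. The underscore spelling is kept as a backward-compatible alias.
    parser.add_argument("--master-port", "--master_port", type=int, default=29500)
    # BooleanOptionalAction (Python 3.9+) generates the dashed "--no-verbose" form.
    parser.add_argument("--verbose", action=argparse.BooleanOptionalAction, default=False)
    return parser


new_style = build_parser().parse_args(["--master-port", "1234"])
old_style = build_parser().parse_args(["--master_port", "1234", "--no-verbose"])
print(new_style.master_port, old_style.master_port)
```

Both spellings populate the same attribute, so downstream code does not need to know which one the user typed.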
Edward Z. Yang	c028fc4e25	Decouple PT2 dynamic shapes from the functorch setting (#94469 ) The functorch setting still exists, but now it is no longer necessary: we infer use of Python dispatcher by checking if the ambient FakeTensorMode has a ShapeEnv or not. The setting still exists, but it is for controlling direct AOTAutograd use now; for PT2, it's sufficient to use torch._dynamo.config.dynamic_shapes. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/94469 Approved by: https://github.com/Chillee, https://github.com/voznesenskym, https://github.com/jansel	2023-02-09 06:41:41 +00:00
PyTorch MergeBot	ca63040d2b	Revert "Set torch.backends.cudnn.enabled to false when testing accuracy (#94363 )" This reverts commit `7bfc59993d`. Reverted https://github.com/pytorch/pytorch/pull/94363 on behalf of https://github.com/huydhn due to This change fails in trunk `7bfc59993d` running out of memory. Mark this as weird because it was green in PR	2023-02-09 01:24:35 +00:00
Bin Bao	7bfc59993d	Set torch.backends.cudnn.enabled to false when testing accuracy (#94363 ) Summary: It looks like setting torch.backends.cudnn.deterministic to True is not enough for eliminating non-determinism when testing benchmarks with --accuracy, so let's turn off cudnn completely. With this change, mobilenet_v3_large does not show random failure on my local environment. Also take this chance to clean up CI skip lists. Pull Request resolved: https://github.com/pytorch/pytorch/pull/94363 Approved by: https://github.com/ezyang	2023-02-08 23:30:10 +00:00
Jason Ansel	eb1aca162e	Re-enable cudagraphs for benchmark scripts (#94192 ) Related to https://github.com/pytorch/pytorch/pull/93253 Pull Request resolved: https://github.com/pytorch/pytorch/pull/94192 Approved by: https://github.com/albanD, https://github.com/desertfire	2023-02-08 16:38:32 +00:00
chuanqiw	94394e568e	change the dynamo benchmark timeout as a parameter (#94284 ) Change the dynamo benchmark timeout from a hard-coded value to a parameter with a default of 1200ms, because the hard-coded 1200ms timeout caused some models to crash in single-thread mode on CPU platforms. With the parameter, users can specify the timeout freely. Fixes #94281 Pull Request resolved: https://github.com/pytorch/pytorch/pull/94284 Approved by: https://github.com/malfet	2023-02-08 00:45:08 +00:00
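A rough sketch of turning a hard-coded timeout into a command-line parameter, as the entry above describes. The flag name `--timeout`, the helper names, and the returned status strings are assumptions for illustration. Note that `subprocess.run` interprets its `timeout` argument in seconds; the sketch simply forwards the number and makes no claim about the unit the benchmark script uses.

```python
import argparse
import subprocess
import sys


def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    # Hypothetical flag: the default mirrors the 1200 quoted in the commit message.
    parser.add_argument("--timeout", type=float, default=1200)
    return parser.parse_args(argv)


def run_one_model(cmd, timeout):
    """Run a child process and report how it ended instead of raising."""
    try:
        subprocess.run(cmd, timeout=timeout, check=True)
        return "ok"
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising TimeoutExpired.
        return "timeout"
    except subprocess.CalledProcessError:
        return "failed"


args = parse_args(["--timeout", "30"])
print(run_one_model([sys.executable, "-c", "pass"], args.timeout))
```

A slow single-threaded run can then be given more headroom by passing a larger value rather than editing the script.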
Bin Bao	db011e11ea	Skip sebotnet33ts_256 on CI (#94067 ) Summary: Random failure on CI and it happens more frequently lately. Skip for now and filed an issue at https://github.com/pytorch/pytorch/issues/94066 Pull Request resolved: https://github.com/pytorch/pytorch/pull/94067 Approved by: https://github.com/ezyang, https://github.com/malfet	2023-02-06 14:58:54 +00:00
Edward Z. Yang	1d53123f44	Report graph breaks separately from graph count (#94143 ) graph break != graph count - 1. Suppose you have a nested inline function call f1 to f2 to f3. A graph break in f3 results in six graphs: f1 before, f2 before, f3 before, f3 after, f2 after, f1 after. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/94143 Approved by: https://github.com/voznesenskym	2023-02-05 04:03:12 +00:00
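The counting argument in the entry above can be sketched as a toy model. It assumes, as the commit message describes, that a single graph break in the innermost inlined function splits every enclosing frame into a "before" and an "after" graph. The function name is hypothetical and nothing here calls into Dynamo.

```python
def graphs_after_single_break(call_chain):
    """Toy model: list the graphs produced by one break in the innermost call."""
    before = [f"{name} before" for name in call_chain]
    # Frames resume in reverse order as the inlined calls return.
    after = [f"{name} after" for name in reversed(call_chain)]
    return before + after


segments = graphs_after_single_break(["f1", "f2", "f3"])
print(segments)
print(len(segments))  # 6 graphs from a single graph break
```

With one break and six graphs, "graph count - 1" would report five breaks, which is why the two numbers are reported separately.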
Edward Z. Yang	c1da35af5e	Update dynamic benchmark skips (#94114 ) Data from https://github.com/pytorch/pytorch/pull/94134 Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/94114 Approved by: https://github.com/SherlockNoMad	2023-02-04 20:36:51 +00:00
Jason Ansel	e071d72f3c	Tag dynamo backends as debug/experimental (#93878 ) Hides debug/experimental backends by default. Before: ``` torch._dynamo.list_backends() ['aot_eager', 'aot_eager_decomp_partition', 'aot_torchxla_trace_once', 'aot_torchxla_trivial', 'aot_ts', 'aot_ts_nvfuser', 'cudagraphs', 'dynamo_accuracy_minifier_backend', 'dynamo_minifier_backend', 'eager', 'inductor', 'ipex', 'nvprims_aten', 'nvprims_nvfuser', 'onnxrt', 'tensorrt', 'torchxla_trace_once', 'torchxla_trivial', 'ts', 'tvm'] ``` After: ``` torch._dynamo.list_backends() ['aot_ts_nvfuser', 'cudagraphs', 'inductor', 'ipex', 'nvprims_nvfuser', 'onnxrt', 'tensorrt', 'tvm'] ``` Fixes https://github.com/pytorch/pytorch/issues/93733 Pull Request resolved: https://github.com/pytorch/pytorch/pull/93878 Approved by: https://github.com/voznesenskym	2023-02-04 00:50:51 +00:00
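One way to picture the tagging scheme in the entry above is a small registry that attaches tags to each backend and filters on them when listing. This is a standalone sketch: the registry, the `register_backend` helper, the `exclude_tags` parameter, and the tag assigned to each backend are assumptions for illustration, not the `torch._dynamo` implementation.

```python
# Hypothetical in-memory registry: backend name -> set of tags.
_BACKENDS = {}


def register_backend(name, tags=()):
    _BACKENDS[name] = frozenset(tags)


def list_backends(exclude_tags=("debug", "experimental")):
    """Return sorted backend names, hiding any backend carrying an excluded tag."""
    excluded = set(exclude_tags or ())
    return sorted(name for name, tags in _BACKENDS.items() if not (tags & excluded))


# Tag assignments below are illustrative only.
register_backend("inductor")
register_backend("cudagraphs")
register_backend("eager", tags=("debug",))
register_backend("aot_eager", tags=("debug",))
register_backend("torchxla_trivial", tags=("experimental",))

print(list_backends())                   # debug/experimental hidden by default
print(list_backends(exclude_tags=None))  # everything, including tagged backends
```

Hiding by tag keeps the debug backends registered and usable by name while shortening the default listing.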
Jason Ansel	0a93e6db5a	Fix/refactor dynamo ipex backend (#93863 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/93863 Approved by: https://github.com/desertfire	2023-02-03 21:42:27 +00:00
Jason Ansel	203b2cad3e	Remove fx2trt/torch2trt backends (#93822 ) These backends have been broken for some time. I tried to get them running again, but as far as I can tell they are not maintained. Installing torch_tensorrt downgrades PyTorch to 1.12. If I manually bypass that downgrade, I get import errors from inside fx2trt. Fixes that re-add these are welcome, but it might make sense to move these wrappers to the torch_tensorrt repo once PyTorch 2.0 support is added. Pull Request resolved: https://github.com/pytorch/pytorch/pull/93822 Approved by: https://github.com/frank-wei	2023-02-03 21:04:21 +00:00
Jason Ansel	a5ff40032d	Fix/refactor dynamo onnxrt backend (#93818 ) Fixes https://github.com/pytorch/pytorch/issues/90352 Pull Request resolved: https://github.com/pytorch/pytorch/pull/93818 Approved by: https://github.com/voznesenskym	2023-02-03 20:48:02 +00:00
Edward Z. Yang	2481fc0df4	Add count to FakeTensorMode.__torch_dispatch__ (#93936 ) Most calls to fake tensor never hit `FakeTensor.__torch_dispatch__` Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/93936 Approved by: https://github.com/bdhirsh, https://github.com/albanD	2023-02-03 14:21:11 +00:00
Fabio Rocha	63115b70f0	Fixed issue with --diff-branch arg in dynamo benchmarks (#93989 ) As @peterbell10 pointed out, it was giving incorrect results for `compression_ratio` and `compilation_latency` when you used `--diff-branch`. This fixes it by running a separate subprocess for each branch, so that one branch's run cannot affect the other's. Also added a couple more significant figures to the numbers in the summary table. Pull Request resolved: https://github.com/pytorch/pytorch/pull/93989 Approved by: https://github.com/jansel	2023-02-03 08:36:57 +00:00
Jason Ansel	60e8c766b5	Refactor dynamo training backends (#93409 ) This splits training.py into many files and moves them from `dynamo.optimizations.training` to `dynamo.backends.*`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/93409 Approved by: https://github.com/ezyang	2023-02-03 03:07:15 +00:00
atalman	6e285c479d	Remove cuda 11.6 from CI replace with 11.7 (#93406 ) Remove cuda 11.6 from CI replace with 11.7 Following the Release readme here: https://github.com/pytorch/pytorch/blob/master/RELEASE.md#release-compatibility-matrix Pull Request resolved: https://github.com/pytorch/pytorch/pull/93406 Approved by: https://github.com/malfet, https://github.com/desertfire	2023-02-02 19:16:05 +00:00
Jason Ansel	d7b39b17ab	Remove torch/_dynamo/optimizations/{analysis,log_args}.py (#93279 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/93279 Approved by: https://github.com/voznesenskym	2023-02-02 02:34:36 +00:00
Edward Z. Yang	03b465a6d0	Add --iterations to benchmark script (#93858 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/93858 Approved by: https://github.com/williamwen42	2023-02-01 21:56:49 +00:00
Edward Z. Yang	08041c5264	Configurable repro_tolerance for same_two_models (#93398 ) Fixes https://github.com/pytorch/pytorch/issues/93293 Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/93398 Approved by: https://github.com/SherlockNoMad	2023-02-01 01:41:48 +00:00
Edward Z. Yang	811e95a15e	--dynamic-ci-skips now works for all backends (#93369 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/93369 Approved by: https://github.com/albanD	2023-01-31 20:07:58 +00:00
Edward Z. Yang	efee879695	Don't suppress warnings in CI. (#93269 ) Warnings are an important clue that something bad is going on. You want to see them in logs. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/93269 Approved by: https://github.com/voznesenskym	2023-01-30 19:21:09 +00:00
Edward Z. Yang	9eb402d18e	Update dynamic benchmark skips (#93228 ) Data from https://github.com/pytorch/pytorch/pull/93223 Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/93228 Approved by: https://github.com/desertfire	2023-01-30 14:22:53 +00:00
XiaobingSuper	9a2becf60a	inductor: fix inplace op's wrong lowering issue when preop is NopKernel (#92247 ) For TIMM ghostnet_100, there is such a case, concat+inplace_add: ``` import torch from torch._inductor import config config.debug = True torch._dynamo.config.verbose=True class MockModule(torch.nn.Module): def __init__(self): super().__init__() def forward(self, x, y, z): out = torch.cat([x, y], dim=1) out+=z return out mod = MockModule().eval() inputs = ( torch.randn([1, 64, 16, 16]), torch.randn([1, 64, 16, 16]), torch.randn([1, 128, 16, 16]), ) ref = mod(*inputs) with torch.no_grad(): opt_model = torch._dynamo.optimize('inductor')(mod) out = opt_model(*inputs) out = opt_model(*inputs) out = opt_model(*inputs) print(torch.equal(ref, out)) ``` Inductor always gets a wrong result. I found that inductor generates wrong code: ``` from ctypes import c_void_p, c_long import torch import random from torch import empty_strided, as_strided, device from torch._inductor.codecache import AsyncCompile from torch._inductor.select_algorithm import extern_kernels aten = torch.ops.aten assert_size_stride = torch._C._dynamo.guards.assert_size_stride async_compile = AsyncCompile() kernel_cpp_0 = async_compile.cpp(''' #include "/tmp/torchinductor_xiaobing/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" extern "C" void kernel(const float* __restrict__ in_ptr0, const float* __restrict__ in_ptr1, const float* __restrict__ in_ptr2, const float* __restrict__ in_ptr3, float* __restrict__ out_ptr0, float* __restrict__ out_ptr1, float* __restrict__ out_ptr2) { { for(long i0=0; i0<1024; i0+=1) { auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 16*i0); tmp0.store(out_ptr0 + 16*i0); } #pragma omp simd simdlen(8) for(long i0=16384; i0<16384; i0+=1) { auto tmp0 = in_ptr0[i0]; out_ptr0[i0] = tmp0; } } { for(long i0=0; i0<1024; i0+=1) { auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr1 + 16*i0); tmp0.store(out_ptr1 + 16*i0); } #pragma omp simd simdlen(8) for(long i0=16384; 
i0<16384; i0+=1) { auto tmp0 = in_ptr1[i0]; out_ptr1[i0] = tmp0; } } { for(long i0=0; i0<2048; i0+=1) { auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr2 + 16*i0); auto tmp1 = at::vec::Vectorized<float>::loadu(in_ptr3 + 16*i0); auto tmp2 = tmp0 + tmp1; tmp2.store(out_ptr2 + 16*i0); } #pragma omp simd simdlen(8) for(long i0=32768; i0<32768; i0+=1) { auto tmp0 = in_ptr2[i0]; auto tmp1 = in_ptr3[i0]; auto tmp2 = tmp0 + tmp1; out_ptr2[i0] = tmp2; } } } ''') async_compile.wait(globals()) del async_compile def call(args): arg0_1, arg1_1, arg2_1 = args args.clear() buf3 = empty_strided((1, 128, 16, 16), (32768, 256, 16, 1), device='cpu', dtype=torch.float32) buf0 = as_strided(buf3, (1, 64, 16, 16), (32768, 256, 16, 1)) # alias buf1 = as_strided(buf3, (1, 64, 16, 16), (32768, 256, 16, 1), 16384) # alias buf2 = empty_strided((1, 128, 16, 16), (32768, 256, 16, 1), device='cpu', dtype=torch.float32) kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf2.data_ptr()), c_void_p(arg2_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf3.data_ptr())) del arg0_1 del arg1_1 del arg2_1 return (buf3, ) if __name__ == "__main__": from torch._dynamo.testing import rand_strided from torch._inductor.utils import print_performance arg0_1 = rand_strided((1, 64, 16, 16), (16384, 256, 16, 1), device='cpu', dtype=torch.float32) arg1_1 = rand_strided((1, 64, 16, 16), (16384, 256, 16, 1), device='cpu', dtype=torch.float32) arg2_1 = rand_strided((1, 128, 16, 16), (32768, 256, 16, 1), device='cpu', dtype=torch.float32) print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) ``` You can see that the add operation always adds a random value; see the IR code: 1. 
ir_pre_fusion.txt* ``` buf0: SchedulerNode(ComputedBuffer) buf0.writes = [MemoryDep(name='buf0', index=c0, size=(16384,))] buf0.unmet_dependencies = [] buf0.met_dependencies = [MemoryDep(name='arg0_1', index=c0, size=(16384,))] buf0.group.device = cpu buf0.group.iteration = ((16384,), ()) buf0.sizes = ([16384], []) buf0.aliases = ['buf3'] class buf0_loop_body: var_ranges = {z0: 16384} index0 = z0 def body(self, ops): get_index = self.get_index('index0') load = ops.load('arg0_1', get_index) get_index_1 = self.get_index('index0') store = ops.store('buf0', get_index_1, load, None) return store buf1: SchedulerNode(ComputedBuffer) buf1.writes = [MemoryDep(name='buf1', index=c0, size=(16384,))] buf1.unmet_dependencies = [] buf1.met_dependencies = [MemoryDep(name='arg1_1', index=c0, size=(16384,))] buf1.group.device = cpu buf1.group.iteration = ((16384,), ()) buf1.sizes = ([16384], []) buf1.aliases = ['buf3'] class buf1_loop_body: var_ranges = {z0: 16384} index0 = z0 def body(self, ops): get_index = self.get_index('index0') load = ops.load('arg1_1', get_index) get_index_1 = self.get_index('index0') store = ops.store('buf1', get_index_1, load, None) return store buf2: NopKernelSchedulerNode(ConcatKernel) buf2.writes = [StarDep(name='buf2')] buf2.unmet_dependencies = [StarDep(name='buf0'), StarDep(name='buf1')] buf2.met_dependencies = [] buf3: SchedulerNode(ComputedBuffer) buf3.writes = [MemoryDep(name='buf3', index=c0, size=(32768,))] buf3.unmet_dependencies = [MemoryDep(name='buf2', index=c0, size=(32768,))] buf3.met_dependencies = [MemoryDep(name='arg2_1', index=c0, size=(32768,))] buf3.group.device = cpu buf3.group.iteration = ((32768,), ()) buf3.sizes = ([32768], []) class buf3_loop_body: var_ranges = {z0: 32768} index0 = z0 def body(self, ops): get_index = self.get_index('index0') load = ops.load('buf2', get_index) get_index_1 = self.get_index('index0') load_1 = ops.load('arg2_1', get_index_1) add = ops.add(load, load_1) get_index_2 = self.get_index('index0') store = 
ops.store('buf3', get_index_2, add, None) return store ``` 2. ir_post_fusion.txt ``` buf0: SchedulerNode(ComputedBuffer) buf0.writes = [MemoryDep(name='buf0', index=c0, size=(16384,))] buf0.unmet_dependencies = [] buf0.met_dependencies = [MemoryDep(name='arg0_1', index=c0, size=(16384,))] buf0.group.device = cpu buf0.group.iteration = ((16384,), ()) buf0.sizes = ([16384], []) buf0.aliases = ['buf3'] class buf0_loop_body: var_ranges = {z0: 16384} index0 = z0 def body(self, ops): get_index = self.get_index('index0') load = ops.load('arg0_1', get_index) get_index_1 = self.get_index('index0') store = ops.store('buf0', get_index_1, load, None) return store buf1: SchedulerNode(ComputedBuffer) buf1.writes = [MemoryDep(name='buf1', index=c0, size=(16384,))] buf1.unmet_dependencies = [] buf1.met_dependencies = [MemoryDep(name='arg1_1', index=c0, size=(16384,))] buf1.group.device = cpu buf1.group.iteration = ((16384,), ()) buf1.sizes = ([16384], []) buf1.aliases = ['buf3'] class buf1_loop_body: var_ranges = {z0: 16384} index0 = z0 def body(self, ops): get_index = self.get_index('index0') load = ops.load('arg1_1', get_index) get_index_1 = self.get_index('index0') store = ops.store('buf1', get_index_1, load, None) return store buf2: NopKernelSchedulerNode(ConcatKernel) buf2.writes = [StarDep(name='buf2')] buf2.unmet_dependencies = [StarDep(name='buf0'), StarDep(name='buf1')] buf2.met_dependencies = [] buf3: SchedulerNode(ComputedBuffer) buf3.writes = [MemoryDep(name='buf3', index=c0, size=(32768,))] buf3.unmet_dependencies = [MemoryDep(name='buf2', index=c0, size=(32768,))] buf3.met_dependencies = [MemoryDep(name='arg2_1', index=c0, size=(32768,))] buf3.group.device = cpu buf3.group.iteration = ((32768,), ()) buf3.sizes = ([32768], []) class buf3_loop_body: var_ranges = {z0: 32768} index0 = z0 def body(self, ops): get_index = self.get_index('index0') load = ops.load('buf2', get_index) get_index_1 = self.get_index('index0') load_1 = ops.load('arg2_1', get_index_1) add = 
ops.add(load, load_1) get_index_2 = self.get_index('index0') store = ops.store('buf3', get_index_2, add, None) return store ``` From the IR code, you can see that buf3 always adds an empty buf2 that has never been written. The root cause is a latent issue in how the mutation is handled for an in-place add when its input is a NopKernel. After this PR, the IR looks like this (ir_pre_fusion.txt): ``` buf0: SchedulerNode(ComputedBuffer) buf0.writes = [MemoryDep(name='buf0', index=c0, size=(16384,))] buf0.unmet_dependencies = [] buf0.met_dependencies = [MemoryDep(name='arg0_1', index=c0, size=(16384,))] buf0.group.device = cpu buf0.group.iteration = ((16384,), ()) buf0.sizes = ([16384], []) buf0.aliases = ['buf2'] class buf0_loop_body: var_ranges = {z0: 16384} index0 = z0 def body(self, ops): get_index = self.get_index('index0') load = ops.load('arg0_1', get_index) get_index_1 = self.get_index('index0') store = ops.store('buf0', get_index_1, load, None) return store buf1: SchedulerNode(ComputedBuffer) buf1.writes = [MemoryDep(name='buf1', index=c0, size=(16384,))] buf1.unmet_dependencies = [] buf1.met_dependencies = [MemoryDep(name='arg1_1', index=c0, size=(16384,))] buf1.group.device = cpu buf1.group.iteration = ((16384,), ()) buf1.sizes = ([16384], []) buf1.aliases = ['buf2'] class buf1_loop_body: var_ranges = {z0: 16384} index0 = z0 def body(self, ops): get_index = self.get_index('index0') load = ops.load('arg1_1', get_index) get_index_1 = self.get_index('index0') store = ops.store('buf1', get_index_1, load, None) return store buf2: NopKernelSchedulerNode(ConcatKernel) buf2.writes = [StarDep(name='buf2')] buf2.unmet_dependencies = [StarDep(name='buf0'), StarDep(name='buf1')] buf2.met_dependencies = [] buf3: SchedulerNode(ComputedBuffer) buf3.writes = [MemoryDep(name='buf3', index=c0, size=(32768,))] buf3.unmet_dependencies = [MemoryDep(name='buf2', index=c0, size=(32768,)), StarDep(name='buf2')] buf3.met_dependencies = [MemoryDep(name='arg2_1', index=c0, 
size=(32768,))] buf3.group.device = cpu buf3.group.iteration = ((32768,), ()) buf3.sizes = ([32768], []) buf3.mutations = ['buf2'] class buf3_loop_body: var_ranges = {z0: 32768} index0 = z0 def body(self, ops): get_index = self.get_index('index0') load = ops.load('buf2', get_index) get_index_1 = self.get_index('index0') load_1 = ops.load('arg2_1', get_index_1) add = ops.add(load, load_1) get_index_2 = self.get_index('index0') store = ops.store('buf3', get_index_2, add, None) return store ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/92247 Approved by: https://github.com/ngimel, https://github.com/desertfire, https://github.com/jansel	2023-01-29 05:35:21 +00:00
Edward Z. Yang	025ef99ddf	Get rid of dedicated inductor dynamic_shapes config (#93076 ) Instead, use Dynamo dynamic_shapes config Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/93076 Approved by: https://github.com/voznesenskym	2023-01-27 02:58:16 +00:00
Edward Z. Yang	5e9fa0a8fc	Mark crossvit_9_240 as passing dynamic=True (#92981 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/92981 Approved by: https://github.com/Chillee	2023-01-26 13:05:37 +00:00
Michael Voznesensky	d322f82b05	Add @count util to torch, use it to track benchmark stats (#93013 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/93013 Approved by: https://github.com/ezyang	2023-01-26 03:09:12 +00:00
Edward Z. Yang	2ee94633a1	Change ciflow/inductor to test inductor inference with dynamic shapes (#92771 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/92771 Approved by: https://github.com/voznesenskym	2023-01-25 02:21:02 +00:00
Edward Z. Yang	f724ecbd52	Add dynamic shapes aot_eager to periodic (#92770 ) This means it overlaps with ciflow/inductor, but I'm about to change that soon. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/92770 Approved by: https://github.com/voznesenskym, https://github.com/albanD, https://github.com/desertfire	2023-01-25 02:21:02 +00:00
Edward Z. Yang	fb46d3e138	Run all of the timm models shards in the periodic (#92900 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/92900 Approved by: https://github.com/bdhirsh, https://github.com/atalman	2023-01-24 17:56:20 +00:00
Horace He	c0327eb463	Some more inductor fixes for symbolic shapes (#92867 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/92867 Approved by: https://github.com/ezyang	2023-01-24 15:05:46 +00:00
PyTorch MergeBot	2cf03bbbab	Revert "Run all of the timm models shards in the periodic (#92743 )" This reverts commit `de69cedf98`. Reverted https://github.com/pytorch/pytorch/pull/92743 on behalf of https://github.com/atalman due to This needs to be landed after https://github.com/pytorch/pytorch/pull/92845 and https://github.com/pytorch/pytorch/pull/92846 are landed	2023-01-23 23:44:09 +00:00
Fabio Rocha	a43b55e135	A few usability improvements for the dynamo benchmarks. (#92713 ) --diff_main renamed to --diff-branch BRANCH and now works again. Summary table splits results per branch. CSV output now has a column with the branch name when run in this mode. Added --progress flag so you can track how many models are going to be run. Example output: ``` $ python benchmarks/dynamo/torchbench.py --quiet --performance --backend inductor --float16 --batch-size-file $(realpath benchmarks/dynamo/torchbench_models_list.txt) --filter 'alexnet\|vgg16' --progress --diff viable/strict Running model 1/2 batch size: 1024 cuda eval alexnet dynamo_bench_diff_branch 1.251x p=0.00 cuda eval alexnet viable/strict 1.251x p=0.00 Running model 2/2 batch size: 128 cuda eval vgg16 dynamo_bench_diff_branch 1.344x p=0.00 cuda eval vgg16 viable/strict 1.342x p=0.00 Summary for tag=dynamo_bench_diff_branch: speedup gmean=1.30x mean=1.30x abs_latency gmean=24.09x mean=25.26x compilation_latency mean=2.0 seconds compression_ratio mean=0.9x Summary for tag=viable/strict: speedup gmean=1.30x mean=1.30x abs_latency gmean=24.11x mean=25.29x compilation_latency mean=0.5 seconds compression_ratio mean=1.0x ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/92713 Approved by: https://github.com/jansel	2023-01-23 18:23:35 +00:00
Edward Z. Yang	4a3fb7bcbc	Make CI_SKIPS into a consolidated dict (#92769 ) This makes it easier to add more configurations without causing a thicket of if statements selecting the correct variable. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/92769 Approved by: https://github.com/voznesenskym, https://github.com/desertfire	2023-01-23 14:57:18 +00:00
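The consolidation described in the entry above could look roughly like the following: one dictionary keyed by configuration instead of one variable per configuration. The key fields, the `skips_for` helper, and the placeholder model names are hypothetical, not taken from the benchmark script.

```python
from collections import namedtuple

# Hypothetical configuration key: each CI job is identified by these fields.
CI = namedtuple("CI", ["backend", "training", "dynamic"])

# One dict replaces a family of per-configuration skip-list variables.
# Model names are placeholders for illustration.
CI_SKIPS = {
    CI("aot_eager", training=False, dynamic=False): ["model_a"],
    CI("aot_eager", training=False, dynamic=True): ["model_a", "model_b"],
    CI("inductor", training=True, dynamic=False): ["model_c"],
}


def skips_for(backend, training, dynamic):
    """Look up the skip list with a single key instead of nested if statements."""
    return set(CI_SKIPS.get(CI(backend, training, dynamic), []))


print(sorted(skips_for("aot_eager", training=False, dynamic=True)))
print(sorted(skips_for("inductor", training=False, dynamic=True)))  # no entry -> empty
```

Adding a new configuration then means adding one dictionary entry rather than another branch in the selection logic.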
Edward Z. Yang	3cfd2fa1c7	Make --inductor imply --backend inductor (#92764 ) This is to make some downstream code more uniform (can always ask args.backend for backend) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/92764 Approved by: https://github.com/voznesenskym, https://github.com/desertfire	2023-01-23 14:57:18 +00:00
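A small sketch of one flag implying another, so that later code can always read `args.backend`, as the entry above describes. The parser below is a hypothetical stand-in for the benchmark script's parser and does not handle the case where both flags are given with conflicting values.

```python
import argparse


def parse_args(argv):
    parser = argparse.ArgumentParser()
    parser.add_argument("--backend", default=None)
    parser.add_argument("--inductor", action="store_true")
    args = parser.parse_args(argv)
    # Normalise after parsing: the shorthand flag fills in the general option.
    if args.inductor:
        args.backend = "inductor"
    return args


print(parse_args(["--inductor"]).backend)
print(parse_args(["--backend", "aot_eager"]).backend)
```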
Edward Z. Yang	c52567ec18	Switch CI exclusions to use exact match. (#92761 ) Since the CI exclusions are hard-coded in our script, we might as well require them to match exactly. This solved some head scratching where I was like, "this model is not obviously excluded, why is it not showing up in CI." Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/92761 Approved by: https://github.com/jansel	2023-01-22 17:10:20 +00:00
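The head-scratching mentioned in the entry above is easy to reproduce with a toy comparison. The sketch assumes the earlier behaviour was a looser, substring-style match (the commit message does not spell this out), and the model names are hypothetical.

```python
def is_skipped_loose(name, skips):
    """Assumed older behaviour: any skip entry contained in the name matches."""
    return any(entry in name for entry in skips)


def is_skipped_exact(name, skips):
    """Exact match: a model is skipped only if its full name is listed."""
    return name in skips


skips = {"resnet"}  # hypothetical skip entry
print(is_skipped_loose("resnet50", skips))  # True: silently excluded
print(is_skipped_exact("resnet50", skips))  # False: runs as expected
```

With exact matching, a model disappears from CI only when its own name appears in the list, which makes the exclusion easy to find by searching.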
Edward Z. Yang	de69cedf98	Run all of the timm models shards in the periodic (#92743 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/92743 Approved by: https://github.com/kit1980	2023-01-21 18:39:17 +00:00
Michael Voznesensky	5778c04a15	Add `--timing` flag, phase timing to @dynamo_timed (#92637 ) Ex output: ``` TIMING: entire_frame_compile:8.574629999999999 backend_compile:5.26806 ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/92637 Approved by: https://github.com/ezyang	2023-01-21 10:52:13 +00:00
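Phase timing of the kind shown in the example output above can be sketched with a plain decorator that accumulates wall-clock time per phase. The decorator name `timed_phase`, the global `phase_times` table, and the stand-in functions are hypothetical; this is not the `@dynamo_timed` implementation.

```python
import functools
import time
from collections import defaultdict

# Hypothetical accumulator: phase name -> total seconds spent.
phase_times = defaultdict(float)


def timed_phase(name):
    """Decorator that adds each call's elapsed time to phase_times[name]."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                phase_times[name] += time.perf_counter() - start
        return wrapper
    return decorator


@timed_phase("backend_compile")
def backend_compile():
    time.sleep(0.01)  # stand-in for real work


@timed_phase("entire_frame_compile")
def entire_frame_compile():
    backend_compile()  # nested phase: its time is also part of the outer phase


def timing_report():
    return "TIMING: " + " ".join(f"{k}:{v}" for k, v in phase_times.items())


entire_frame_compile()
print(timing_report())
```

Because the phases nest, the outer total includes the inner one, matching the shape of the example where `entire_frame_compile` is larger than `backend_compile`.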
Edward Z. Yang	27bf879b8c	Forward fix: restore sebotnet33ts_256 aot_eager skip (#92741 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/92741 Approved by: https://github.com/kit1980	2023-01-21 08:10:23 +00:00
Edward Z. Yang	9ad0aca6e5	Update aot_eager CI failures (#92696 ) Based on https://hud.pytorch.org/pr/92689 Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/92696 Approved by: https://github.com/desertfire	2023-01-21 02:29:22 +00:00
PyTorch MergeBot	44132cc4b0	Revert "Add `--timing` flag, phase timing to @dynamo_timed (#92637 )" This reverts commit `773b513435`. Reverted https://github.com/pytorch/pytorch/pull/92637 on behalf of https://github.com/malfet due to Broke lint	2023-01-20 16:23:20 +00:00
Michael Voznesensky	773b513435	Add `--timing` flag, phase timing to @dynamo_timed (#92637 ) Ex output: ``` TIMING: entire_frame_compile:8.574629999999999 backend_compile:5.26806 ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/92637 Approved by: https://github.com/ezyang	2023-01-20 05:01:21 +00:00
Edward Z. Yang	44e52ea514	Reenable mobilevit_s in CI, seems to pass (#92585 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/92585 Approved by: https://github.com/Chillee	2023-01-19 15:24:45 +00:00
Edward Z. Yang	b92a7afed9	Reclassify some dynamic aot_eager failures as static failures (#92376 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/92376 Approved by: https://github.com/Chillee	2023-01-18 19:27:11 +00:00
Wu, Chunyuan	3aa6cec18c	[dynamo] exclude reset_rng_state when measure timing (#92237 ) Fixes inductor performance regression on CPU: https://github.com/pytorch/torchdynamo/issues/2027, https://github.com/pytorch/torchdynamo/issues/2028 and https://github.com/pytorch/torchdynamo/issues/2029. The details are explained here: https://github.com/pytorch/torchdynamo/issues/2028#issuecomment-1381496678. ### Performance - Model: lennard_jones - Machine: IceLake (32 cores per socket) - Configuration: single instance, 32 cores per instance - jemalloc and iomp enabled ```bash python benchmarks/dynamo/torchbench.py --inductor-settings --inductor --performance --float32 -dcpu -n5000 --no-skip --dashboard --only=lennard_jones --quiet ``` Time before regression | Time after regression | Time with this PR -- | -- | -- 0.00020483799744397402 | 0.0002818034990923479 | 0.00020241099991835654 Pull Request resolved: https://github.com/pytorch/pytorch/pull/92237 Approved by: https://github.com/jgong5, https://github.com/jansel	2023-01-18 13:17:28 +00:00
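The idea behind the fix in the entry above, keeping per-iteration setup out of the measured region, can be sketched as follows. The helper name `timed_mean` and the stand-in workload are hypothetical; the real benchmark code is not reproduced here.

```python
import random
import time


def timed_mean(run_iteration, reset_state, repeats=5):
    """Average time of run_iteration, with reset_state excluded from the timing."""
    total = 0.0
    for _ in range(repeats):
        reset_state()                 # setup happens before the clock starts
        start = time.perf_counter()
        run_iteration()
        total += time.perf_counter() - start
    return total / repeats


calls = []


def reset_rng_state():
    calls.append("reset")
    random.seed(1337)


def workload():
    return sum(random.random() for _ in range(1000))


print(timed_mean(workload, reset_rng_state))
```

For a very short workload such as the one in the table, even a cheap reset inside the timed region can account for a visible share of the measurement.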
Edward Z. Yang	fbbb19599a	Update dynamic skips after #92076 (#92103 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/92103 Approved by: https://github.com/voznesenskym, https://github.com/Chillee	2023-01-13 04:05:10 +00:00
Edward Z. Yang	74cbf058a5	Support --dynamic-ci-skips (#91893 ) This makes it easier for us to run only the skipped benchmarks and see if that actually started passing. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/91893 Approved by: https://github.com/albanD	2023-01-11 20:02:58 +00:00
Edward Z. Yang	d24324bf1d	s/INDCUTOR/INDUCTOR/ (#91885 ) Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/91885 Approved by: https://github.com/Skylion007, https://github.com/atalman, https://github.com/malfet	2023-01-11 12:28:19 +00:00
Edward Z. Yang	56ed976edf	hrnet_w18, tts_angular works with dynamic shapes (#91891 ) Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/91891 Approved by: https://github.com/voznesenskym	2023-01-11 11:40:16 +00:00
blzheng	0c1777acec	Dynamo benchmark: add CPU specific changes (#88477 ) This PR adds some CPU-specific changes: - Add support for IPEX backend - https://github.com/pytorch/torchdynamo/issues/1618 - https://github.com/pytorch/torchdynamo/issues/1534 - Enable CPU launcher in runner.py. - Fix the issue that some environment variables are not supported on CPU Pull Request resolved: https://github.com/pytorch/pytorch/pull/88477 Approved by: https://github.com/jgong5, https://github.com/jansel	2023-01-07 09:26:06 +00:00


118 Commits