Huy Do
5315317b7b
Skip some detectron2_maskrcnn models with KeyError _ignore_torch_cuda_oom ( #99599 )
...
These tests are failing in trunk 233cc34d3b with `KeyError: '_ignore_torch_cuda_oom'`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99599
Approved by: https://github.com/malfet
2023-04-20 18:11:35 +00:00
Jason Ansel
3233450d07
Add TorchXLA option to benchmark runner ( #99505 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99505
Approved by: https://github.com/voznesenskym
2023-04-19 22:44:52 +00:00
Will Constable
9ac2b041c9
Make opacus xfail instead of skip ( #99380 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99380
Approved by: https://github.com/desertfire , https://github.com/anijain2305
2023-04-19 21:09:06 +00:00
Michael Voznesensky
113bd11cf4
Skip levit ( #99491 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99491
Approved by: https://github.com/ezyang
2023-04-19 07:41:42 +00:00
Edward Z. Yang
039faf0dbf
Add invariant that all symbolic shapes must be bound in graph ( #99089 )
...
Previously, we had a problem when partitioning forward-backward dynamic graphs: we could end up with a backward graph that mentions a symbol from an input tensor (e.g., `f32[s0 + s1]`) without that symbol being bound anywhere else. When this happens, we have no way of actually deriving the values of `s0` and `s1`. Our fix for this in https://github.com/pytorch/pytorch/pull/93059 was to just retrace the graph, so that s0 + s1 got allocated a new symbol s2 and everything was happy. However, this strategy had other problems, namely (1) we lost all information from the previous ShapeEnv, including guards, and (2) we ended up allocating a LOT of fresh new symbols in backwards.
With this change, we preserve the same ShapeEnv between forward and backwards. How do we do this? We simply require that every symbol which may be present inside tensors ALSO be a plain SymInt input to the graph. This invariant is enforced by Dynamo. Once we have done this, we can straightforwardly modify the partitioner to save these SymInts for backwards whenever the backwards graph needs them to preserve the invariant as well.
This apparently breaks yolov3, but since everything else is OK I'm merging this as obviously good and investigating later.
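A minimal sketch of the invariant (hypothetical signatures, not the actual Dynamo/partitioner internals; the real artifacts are FX graphs): any symbol appearing in a tensor input's shape must also arrive as a plain SymInt graph input, so its value is always recoverable.
```
import torch

def forward(s0: int, s1: int, x: torch.Tensor):
    # x has shape f32[s0 + s1]; s0 and s1 are bound as explicit inputs,
    # so nothing needs to be re-derived from the composite size s0 + s1.
    assert x.shape[0] == s0 + s1
    return x * 2, (s0, s1)  # the partitioner can save s0, s1 for backward

def backward(s0: int, s1: int, grad: torch.Tensor):
    # The same symbols are bound here too, so forward and backward can
    # share one ShapeEnv instead of retracing with fresh symbols.
    assert grad.shape[0] == s0 + s1
    return grad * 2
```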
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99089
Approved by: https://github.com/voznesenskym
2023-04-16 01:48:19 +00:00
Yanbo Liang
15fe5a0798
[Dynamo] Fix benchmark --verbose error ( #99224 )
...
Dynamo benchmark --verbose is broken:
```
Traceback (most recent call last):
File "/scratch/ybliang/work/repos/pytorch/benchmarks/dynamo/torchbench.py", line 400, in <module>
torchbench_main()
File "/scratch/ybliang/work/repos/pytorch/benchmarks/dynamo/torchbench.py", line 396, in torchbench_main
main(TorchBenchmarkRunner(), original_dir)
File "/scratch/ybliang/work/repos/pytorch/benchmarks/dynamo/common.py", line 1967, in main
return maybe_fresh_cache(
File "/scratch/ybliang/work/repos/pytorch/benchmarks/dynamo/common.py", line 993, in inner
return fn(*args, **kwargs)
File "/scratch/ybliang/work/repos/pytorch/benchmarks/dynamo/common.py", line 2135, in run
torch._dynamo.config.log_level = logging.DEBUG
File "/scratch/ybliang/work/repos/pytorch/torch/_dynamo/config_utils.py", line 67, in __setattr__
raise AttributeError(f"{self.__name__}.{name} does not exist")
AttributeError: torch._dynamo.config.log_level does not exist
```
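The `log_level` knob was presumably removed when Dynamo moved to the artifact-based logging system. A plausible sketch of the replacement wiring (an assumption on my part; the actual change in the PR may differ):
```
import logging
import torch._logging

# Instead of the removed torch._dynamo.config.log_level, --verbose can be
# routed through torch._logging (assumed fix, for illustration only):
torch._logging.set_logs(dynamo=logging.DEBUG)
```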
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99224
Approved by: https://github.com/voznesenskym
2023-04-15 20:18:50 +00:00
Bin Bao
34f681c13b
[CI] Remove inductor skip list for timm_models ( #98840 )
...
Summary: Check against the expected CSV file instead of skipping tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98840
Approved by: https://github.com/ezyang
2023-04-15 13:54:41 +00:00
Bin Bao
e5501a967e
[inductor] Support IndexPutFallback in cpp_wrapper ( #98972 )
...
Summary:
1) Make the fallback index_put generate the right cpp code in cpp_wrapper
2) Add a --cpp-wrapper option to common.py
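A rough sketch of exercising the new path (illustrative only; whether a given index_put call actually takes the ATen fallback depends on the op, device, and build):
```
import torch
import torch._inductor.config as inductor_config

inductor_config.cpp_wrapper = True  # the codegen mode this PR extends

@torch.compile
def f(x, idx, vals):
    # index_put with accumulate=True is the kind of op that can hit the
    # ATen fallback (IndexPutFallback) rather than generated kernel code.
    return torch.index_put(x, (idx,), vals, accumulate=True)

x = torch.zeros(8)
idx = torch.tensor([1, 3, 3])
vals = torch.ones(3)
print(f(x, idx, vals))
```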
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98972
Approved by: https://github.com/jgong5 , https://github.com/jansel
2023-04-13 15:41:03 +00:00
Edward Z. Yang
b8b840be3d
Convert logging f-strings to use % format, part five ( #98765 )
...
This handles some annoying but simple cases by hand.
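For context, the codemod series rewrites eager f-string interpolation into lazy %-style arguments, e.g. (illustrative):
```
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger(__name__)
n, t = 42, 1.234

# Before: the f-string is formatted eagerly, even if the level is disabled.
log.info(f"loaded {n} models in {t:.2f}s")

# After: arguments are only formatted if the record is actually emitted.
log.info("loaded %d models in %.2fs", n, t)
```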
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98765
Approved by: https://github.com/wanchaol
2023-04-11 13:17:59 +00:00
Edward Z. Yang
b09722f540
Convert logging f-strings to use % format, part two ( #98700 )
...
This hits multi-line logging strings.
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98700
Approved by: https://github.com/voznesenskym
2023-04-10 12:19:31 +00:00
Edward Z. Yang
9a8f71f23e
Convert logging f-strings to use % format ( #98697 )
...
Codemod done with
https://gist.github.com/ezyang/2e8b0463cdc6be278478495b23ff0530 with
assistance from ChatGPT.
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98697
Approved by: https://github.com/voznesenskym
2023-04-10 12:19:31 +00:00
Edward Z. Yang
bdb79a8f52
Turn off divisible_by_16 for dynamic shapes; support ablation ( #98471 )
...
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98471
Approved by: https://github.com/ngimel , https://github.com/voznesenskym
2023-04-06 12:57:07 +00:00
Edward Z. Yang
cf1bfca2ba
Require batch dimensions to be compiled dynamically ( #98334 )
...
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98334
Approved by: https://github.com/voznesenskym
2023-04-05 19:40:22 +00:00
Edward Z. Yang
b923f84805
Switch accuracy CI to dynamic batch only ( #98307 )
...
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98307
Approved by: https://github.com/wconstab
2023-04-05 01:20:12 +00:00
Elias Ellison
a3365e1d0d
Increment pending forwards after invocation ( #98101 )
...
Forwards are only pending following invocation, not before.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98101
Approved by: https://github.com/ngimel
2023-04-05 00:04:39 +00:00
Bin Bao
69ff39d2e7
Skip gat, gcn and sage for TorchBench CUDA test ( #98244 )
...
Summary: The three models only support CPU for now.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98244
Approved by: https://github.com/ezyang
2023-04-04 01:06:18 +00:00
Jason Ansel
55afaa46a4
Support functools.partial and itertools.product ( #98120 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98120
Approved by: https://github.com/anijain2305
2023-04-03 18:23:25 +00:00
Bin Bao
ba7ee00f00
Add a --inference flag to dynamo benchmark script ( #98173 )
...
Summary: When calling the benchmark scripts, make it a requirement to pass
either --inference or --training.
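A minimal argparse sketch of the requirement (not the benchmark scripts' actual argument setup):
```
import argparse

parser = argparse.ArgumentParser()
mode = parser.add_mutually_exclusive_group(required=True)
mode.add_argument("--inference", action="store_true")
mode.add_argument("--training", action="store_true")
args = parser.parse_args()  # errors out unless exactly one mode is given
```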
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98173
Approved by: https://github.com/huydhn
2023-04-03 17:12:28 +00:00
Jason Ansel
92b46202ef
Add --stats option to benchmark scripts ( #98109 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98109
Approved by: https://github.com/anijain2305
2023-04-02 02:23:13 +00:00
Edward Z. Yang
5df59f957f
Fix G001,G002,G003 in logs to % syntax ( #97812 )
...
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97812
Approved by: https://github.com/Skylion007 , https://github.com/kiukchung , https://github.com/malfet , https://github.com/mlazos
2023-04-01 01:43:33 +00:00
Bin Bao
c699ac17df
[CI] Bump up torchbench version to fix dynamo graph breaks in transformers ( #98003 )
...
Summary: When we bumped up the torchbench version pin last time, we found
new graph breaks introduced by the transformers version upgrade; see
https://github.com/pytorch/pytorch/pull/96782 . It turns out they were
already fixed upstream; see
https://github.com/huggingface/transformers/pull/21648 and https://github.com/pytorch/benchmark/pull/1511
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98003
Approved by: https://github.com/ngimel
2023-03-31 16:52:09 +00:00
Edward Z. Yang
97fc8ea5f4
Run the benchmark suite with dynamic batch only ( #97912 )
...
Symbolic shapes compile time on full CI with inductor is horribly long (even though our aot_eager local runs seemed to suggest that the added latency was only 10s per model). To patch over the problem for now, run the benchmark suite with dynamic batch only. This should absolve a lot of sins.
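Roughly, dynamic-batch-only means treating just dim 0 as symbolic, e.g. (illustrative; the suite drives this through its own flags):
```
import torch
import torch._dynamo

x = torch.randn(8, 3, 224, 224)
torch._dynamo.mark_dynamic(x, 0)  # only the batch dim becomes symbolic

@torch.compile
def f(t):
    return t.sin()

f(x)  # compiled with a dynamic batch size; other dims stay static
```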
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97912
Approved by: https://github.com/janeyx99 , https://github.com/desertfire
2023-03-30 18:04:48 +00:00
Aaron Gokaslan
47dca20d80
[BE] Enable flake8-comprehension rule C417 ( #97880 )
...
Enables flake8-comprehension rule C417. Ruff autogenerated these fixes across the codebase.
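C417 flags unnecessary `map` calls; the autofix rewrites them as comprehensions, e.g.:
```
xs = [1, 2, 3]

# Before (triggers C417):
ys = list(map(lambda x: x * 2, xs))

# After (Ruff's rewrite):
ys = [x * 2 for x in xs]
assert ys == [2, 4, 6]
```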
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97880
Approved by: https://github.com/ezyang , https://github.com/kit1980 , https://github.com/albanD
2023-03-30 14:34:24 +00:00
William Wen
b93e1f377e
[dynamo, benchmarks] Add inductor-mode (for max-autotune) and warm start options to dynamo benchmarks ( #97719 )
...
Title.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97719
Approved by: https://github.com/shunting314
2023-03-29 21:09:00 +00:00
Edward Z. Yang
f754be897a
Disable speedup_experiment_ds ( #97806 )
...
It seems to be broken.
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97806
Approved by: https://github.com/jansel
2023-03-29 01:27:31 +00:00
Bin Bao
a9a81ab7e3
[CI] Run benchmark test with dynamo_eager in periodic ( #97543 )
...
Summary: The idea is to catch any dynamo_eager regressions earlier, and
also to take them off the dashboard run.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97543
Approved by: https://github.com/huydhn
2023-03-28 01:02:49 +00:00
Shunting Zhang
652592efa9
[inductor] use torch.profiler in the triton wrapper ( #97405 )
...
I think it's helpful to use torch.profiler to profile the triton wrapper.
E.g., I tried it for nvidia_deeprecommender's inference graph.
Even with max-autotune, we see that the GPU spends the majority of the time running 2 mm/addmm ops. That's why max-autotune does not help for this model: tuning does not affect the external mm ops.
<img width="711" alt="Screenshot 2023-03-22 at 5 49 28 PM" src="https://user-images.githubusercontent.com/52589240/227072474-2f0d7205-4a10-4929-b1b7-551214788c61.png ">
Next step: I'll check why the triton mm kernels are not picked.
EDIT: the above screenshot was captured without max-autotune due to a typo. Below is the trace with max-autotune enabled:
<img width="712" alt="Screenshot 2023-03-22 at 6 43 26 PM" src="https://user-images.githubusercontent.com/52589240/227077624-fdccf928-be08-4211-871b-a9e3d7b76fbe.png ">
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97405
Approved by: https://github.com/ngimel
2023-03-27 21:54:25 +00:00
Edward Z. Yang
cff4826f28
pytorch_unet is now passing ( #97309 )
...
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97309
Approved by: https://github.com/janeyx99 , https://github.com/zou3519
2023-03-22 13:55:05 +00:00
Bin Bao
be49d3b170
[CI] Turn on debug logging for dla102 and gernet_l ( #97307 )
...
Summary: Log the generated code for those two flaky tests to see if
there is any codegen difference when they fail.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97307
Approved by: https://github.com/ezyang
2023-03-22 13:42:13 +00:00
Natalia Gimelshein
e7d9331688
[inductor] hoist symbolic padding expressions ( #97099 )
...
Towards fixing pnasnet5large, see #96709 . The generated kernel looks much better:
```
@pointwise(size_hints=[1048576], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32', 3: 'i32', 4: 'i32', 5: 'i32', 6: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': [], 'configs': [instance_descriptor(divisible_by_16=(0, 1, 6), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, ks0, ks1, ks2, ks3, xnumel, XBLOCK : tl.constexpr):
xoffset = tl.program_id(0) * XBLOCK
xindex = xoffset + tl.arange(0, XBLOCK)[:]
xmask = xindex < xnumel
x1 = (xindex // ks0) % ks0
x0 = xindex % ks0
x2 = (xindex // ks3)
x4 = xindex
tmp0 = x1 + ((-1)*ks1)
tmp1 = 0
tmp2 = tmp0 >= tmp1
tmp3 = ks2
tmp4 = tmp0 < tmp3
tmp5 = x0 + ((-1)*ks1)
tmp6 = tmp5 >= tmp1
tmp7 = tmp5 < tmp3
tmp8 = tmp2 & tmp4
tmp9 = tmp8 & tmp6
tmp10 = tmp9 & tmp7
tmp11 = tl.load(in_ptr0 + (x0 + ((-1)*ks1) + (ks2*x1) + (x2*(ks2*ks2)) + ((-1)*ks1*ks2) + tl.zeros([XBLOCK], tl.int32)), tmp10 & xmask, other=0)
tmp12 = tl.where(tmp10, tmp11, 0.0)
tl.store(out_ptr0 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp12, xmask)
```
Interestingly, removing `expand` in the index `simplify` function makes the `load` expression a little bit better, but the `store` then fails to simplify to a flat store, so I'm leaving `expand` in.
Full pnasnet still chokes on `ceiling` in batch_norm kernels; additionally, it looks like shape propagation goofs in inductor and generates overly complicated expressions, so we should switch to the metadata from the fx graph.
I'm still not adding a `ceil` print to triton, because we should be able to hoist all indexing expressions (and just printing ceil without converting to int64 doesn't work).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97099
Approved by: https://github.com/jansel
2023-03-21 21:43:32 +00:00
Edward Z. Yang
e74c5e5637
rexnet_100 is disabled for static, does not need dynamic listing ( #97100 )
...
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97100
Approved by: https://github.com/Skylion007
2023-03-19 20:57:49 +00:00
Bin Bao
577d930c39
[CI] Revert https://github.com/pytorch/pytorch/pull/96195 ( #96897 )
...
Summary: https://github.com/pytorch/pytorch/pull/96195 was an experiment
for debugging flaky failures on CI.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96897
Approved by: https://github.com/ngimel
2023-03-16 06:28:18 +00:00
Edward Z. Yang
3606f59366
Default specialize_int to False ( #96624 )
...
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96624
Approved by: https://github.com/janeyx99
2023-03-16 02:54:18 +00:00
Will Constable
54cd4a67d0
Output peak memory stats from dynamo torchbench perf CI ( #95666 )
...
Adds absolute memory usage numbers (in addition to compression ratio) to performance jobs.
Example output:
<img width="1211" alt="image" src="https://user-images.githubusercontent.com/4984825/225419950-500908c5-00ce-4711-afa2-c995bf90d35d.png ">
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95666
Approved by: https://github.com/ezyang , https://github.com/williamwen42
2023-03-15 19:24:47 +00:00
Bin Bao
33c7be360f
[reland][CI] switch torchbench to a pinned version ( #96782 )
...
Summary: This is reland of https://github.com/pytorch/pytorch/pull/96553
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96782
Approved by: https://github.com/huydhn
2023-03-15 12:46:36 +00:00
Edward Z. Yang
037acd5a22
Update CI skips ( #96745 )
...
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96745
Approved by: https://github.com/wconstab
2023-03-14 22:19:10 +00:00
PyTorch MergeBot
be4eaa69c2
Revert "[CI] switch torchbench to a pinned version ( #96553 )"
...
This reverts commit 61d6ccd29a .
Reverted https://github.com/pytorch/pytorch/pull/96553 on behalf of https://github.com/desertfire due to land race
2023-03-14 21:39:45 +00:00
PyTorch MergeBot
ba4fb9b6ad
Revert "Default specialize_int to False ( #96624 )"
...
This reverts commit 1ac8782db2 .
Reverted https://github.com/pytorch/pytorch/pull/96624 on behalf of https://github.com/kit1980 due to Broke inductor/test_torchinductor_dynamic_shapes.py
2023-03-14 19:43:47 +00:00
Bin Bao
61d6ccd29a
[CI] switch torchbench to a pinned version ( #96553 )
...
Summary: Previously we were using a branch on torchbench which skips
torchaudio. We should switch to a pinned version to ensure good test coverage.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96553
Approved by: https://github.com/huydhn , https://github.com/ezyang
2023-03-14 18:42:22 +00:00
Edward Z. Yang
1ac8782db2
Default specialize_int to False ( #96624 )
...
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96624
Approved by: https://github.com/janeyx99
2023-03-14 18:37:47 +00:00
David Berard
6e3d51b08a
[inductor][CI] also skip rexnet_100 on non-dynamic shapes ( #96691 )
...
Recent failures show that rexnet_100 accuracy is also flaky on non-dynamic shapes (it was already disabled for dynamic shapes in #96474 ). The failure occurs for the same reason (stem.bn.weight.grad).
e.g. https://github.com/pytorch/pytorch/actions/runs/4402868441/jobs/7710977874
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96691
Approved by: https://github.com/desertfire
2023-03-14 18:11:59 +00:00
Edward Z. Yang
ff7e510d1e
Correctly use PythonPrinter for generating wrapper code referencing sympy ( #96710 )
...
Otherwise you get stuff like ceiling(s0), which is not valid Python code. Fixes volo_d1_224
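The same issue can be reproduced with plain sympy (a generic illustration; inductor's PythonPrinter is its own subclass):
```
import sympy
from sympy import pycode

s0 = sympy.Symbol("s0")
expr = sympy.ceiling(s0 / 2)

print(expr)          # ceiling(s0/2) -- not valid Python
print(pycode(expr))  # e.g. math.ceil((1/2)*s0) -- valid Python
```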
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96710
Approved by: https://github.com/ngimel , https://github.com/jansel
2023-03-14 14:35:52 +00:00
Wang, Eikan
3cad8d23d0
[Inductor] Skip the hf_T5_base due to intermittent failure on CI ( #96649 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96649
Approved by: https://github.com/desertfire
2023-03-14 07:40:20 +00:00
Edward Z. Yang
507feb805f
Don't specialize torch.Size with specialize_int = False ( #96419 )
...
Fixes https://github.com/pytorch/pytorch/issues/95868
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96419
Approved by: https://github.com/jansel , https://github.com/ngimel
2023-03-14 01:32:58 +00:00
Edward Z. Yang
c7f39c0820
Update CI skips ( #96554 )
...
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96554
Approved by: https://github.com/janeyx99
2023-03-13 13:40:45 +00:00
David Berard
29cd60dfb7
[CI] handle more dynamo benchmark models that are not expected to be deterministic ( #96324 )
...
Follow-up to #96245 . alexnet, Background_Matting, vision_maskrcnn, and vgg16 all have the same problem, but on float32 they were also failing on the previous day, so I missed this. Once the amp jobs became available, I could see that these have the same issue (on both float32 and amp).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96324
Approved by: https://github.com/desertfire
2023-03-10 18:15:34 +00:00
Bin Bao
a651e6253a
[CI] Change compile_threads to 1 when running benchmark accuracy test on CI ( #96195 )
...
Summary: This is not a pretty solution, but it is a way to verify whether the flakiness is coming from parallel compilation.
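The corresponding knob (a sketch; the PR sets it for CI accuracy runs):
```
import torch._inductor.config as inductor_config

# Serialize Inductor compilation to rule out races from parallel
# compilation; compile_threads normally defaults to the CPU count.
inductor_config.compile_threads = 1
```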
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96195
Approved by: https://github.com/ngimel
2023-03-10 17:39:38 +00:00
Edward Z. Yang
ff2e14f200
Skip rexnet_100 in dynamic CI ( #96474 )
...
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96474
Approved by: https://github.com/yanboliang , https://github.com/msaroufim
2023-03-10 01:23:19 +00:00
Edward Z. Yang
c988de1040
[EASY] Update inductor training dynamic skips ( #96298 )
...
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96298
Approved by: https://github.com/Chillee , https://github.com/janeyx99
2023-03-08 19:31:46 +00:00
Bin Bao
b3a079810e
[CI] Add a workflow for quick perf comparison ( #96166 )
...
Summary: ciflow/inductor-perf-test-nightly now contains a full dashboard
run, which takes a very long time. Ed proposed a simplification of the
perf run there, but it is still worthwhile to have a set of fast perf tests
which only includes one configuration (--training --amp).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96166
Approved by: https://github.com/huydhn , https://github.com/weiwangmeta
2023-03-08 19:09:04 +00:00