Commit Graph

1332 Commits

Author SHA1 Message Date
xndcn
bebb8b7c1e [inductor] use native fetch_add function for trivial types (#101931)
Floating-point is supported by std::atomic::fetch_add since C++20. However, this code path is not activated yet because cpp_flags in codecache.py is hard-coded to "-std=c++17".

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101931
Approved by: https://github.com/jgong5, https://github.com/EikanWang, https://github.com/jansel
2023-06-01 03:47:56 +00:00
Animesh Jain
65631d4515 [benchmarks] Use train mode for accuracy checks for HF models (#102578)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102578
Approved by: https://github.com/desertfire
2023-05-31 19:47:18 +00:00
Bin Bao
47b884a74c [inductor] Revert a CI remedy for Triton compilation error (#102541)
Summary: revert https://github.com/pytorch/pytorch/pull/91634

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102541
Approved by: https://github.com/ngimel
2023-05-31 13:13:51 +00:00
Animesh Jain
33a49eeae7 [benchmark] Flag to switch on activation checkpointing for HF models (#102557)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102557
Approved by: https://github.com/ngimel, https://github.com/Chillee
2023-05-30 23:46:14 +00:00
Yanbo Liang
9ff1932d2b [Dynamo] Save global autocast state to restore on graph break (#102415)
Fixes #102414

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102415
Approved by: https://github.com/yf225
2023-05-30 23:03:21 +00:00
Horace He
e71ab21422 update triton pin (#101919)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101919
Approved by: https://github.com/ngimel
2023-05-30 17:16:05 +00:00
Animesh Jain
040d2cc969 [dynamo] Some torchrec_dlrm related fixes (#101953)
Issue 1 of https://github.com/pytorch/pytorch/issues/101918

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101953
Approved by: https://github.com/jansel
2023-05-28 17:56:08 +00:00
William Wen
3c77310752 fix benchmarks/dynamo/runner.py (#102311)
Benchmark performance CSVs can now contain `infra_error` strings, leading to failed parses. Fix by converting strings in the data to 0.
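
A minimal sketch of that conversion, assuming a pandas DataFrame with a hypothetical `speedup` column:

```python
import pandas as pd

df = pd.DataFrame({"speedup": ["1.25", "infra_error", "0.98"]})  # illustrative data
# Non-numeric entries such as "infra_error" become NaN, then 0.
df["speedup"] = pd.to_numeric(df["speedup"], errors="coerce").fillna(0)
```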

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102311
Approved by: https://github.com/yanboliang
2023-05-25 22:42:03 +00:00
Bin Bao
ee33bae5c7 Fix an issue where checking sameness throws an exception (#102279)
Summary: currently the exception is caught by the outer caller and marked as an infra_error

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102279
Approved by: https://github.com/anijain2305
2023-05-25 19:49:23 +00:00
Jongsoo Park
b91eb97d34 [transformer benchmark] relax tolerance in sdp.py (#101965)
Summary:
Otherwise we get
```
Traceback (most recent call last):
  File "<string>", line 49, in <module>
  File "<string>", line 47, in __run
  File "/usr/local/fbcode/platform010/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/local/fbcode/platform010/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/data/users/jongsoo/fbsource/buck-out/v2/gen/fbcode/ef4169ac7f95fb74/caffe2/benchmarks/transformer/__sdp__/sdp#link-tree/caffe2/benchmarks/transformer/sdp.py", line 346, in <module>
    main(save_path)
  File "/data/users/jongsoo/fbsource/buck-out/v2/gen/fbcode/ef4169ac7f95fb74/caffe2/benchmarks/transformer/__sdp__/sdp#link-tree/caffe2/benchmarks/transformer/sdp.py", line 328, in main
    experiment = run_single_experiment(experiment_config)
  File "/data/users/jongsoo/fbsource/buck-out/v2/gen/fbcode/ef4169ac7f95fb74/caffe2/benchmarks/transformer/__sdp__/sdp#link-tree/caffe2/benchmarks/transformer/sdp.py", line 229, in run_single_experiment
    assert_close_tensors(nn_mha_output, composite_mha_output)
  File "/data/users/jongsoo/fbsource/buck-out/v2/gen/fbcode/ef4169ac7f95fb74/caffe2/benchmarks/transformer/__sdp__/sdp#link-tree/caffe2/benchmarks/transformer/sdp.py", line 196, in assert_close_tensors
    assert torch.allclose(a, b, atol=1e-3, rtol=1e-3)
AssertionError
```

Test Plan: buck run mode/dev-nosan //caffe2/benchmarks/transformer:sdp

Differential Revision: D45843836

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101965
Approved by: https://github.com/drisspg
2023-05-23 06:54:08 +00:00
Jason Ansel
5ba16011d7 Suppress profiler spam in dynamo benchmarks (#101942)
Makes this stuff go away:
```
STAGE:2023-05-20 20:49:34 63580:63580 ActivityProfilerController.cpp:311] Completed Stage: Warm Up
STAGE:2023-05-20 20:49:34 63580:63580 ActivityProfilerController.cpp:317] Completed Stage: Collection
STAGE:2023-05-20 20:49:34 63580:63580 ActivityProfilerController.cpp:321] Completed Stage: Post Processing
```
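
One way to achieve this, sketched here on the assumption that the Kineto library honors the `KINETO_LOG_LEVEL` environment variable (not necessarily what this PR does):

```python
import os

# Raise Kineto's log level so the STAGE banners are filtered out.
# Must be set before the profiler initializes Kineto.
os.environ["KINETO_LOG_LEVEL"] = "5"

import torch  # noqa: E402
```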

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101942
Approved by: https://github.com/shunting314, https://github.com/desertfire
2023-05-22 18:32:31 +00:00
Edward Z. Yang
22ca1a1124 Partially fix shape mismatch in vision_maskrcnn (#101477)
The bulk of the heavy lifting is happening in
https://github.com/pytorch/vision/pull/7592

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101477
Approved by: https://github.com/voznesenskym
2023-05-21 05:20:08 +00:00
PyTorch MergeBot
7f3fed125e Revert "separate out dynamo .requires_grad and .is_grad_enabled guards (#100570)"
This reverts commit 1fabee399d.

Reverted https://github.com/pytorch/pytorch/pull/100570 on behalf of https://github.com/PaliC due to breaking inductor tests along with #101219 ([comment](https://github.com/pytorch/pytorch/pull/100570#issuecomment-1555271267))
2023-05-19 21:29:09 +00:00
Elias Ellison
e5e451a9db Update batch size for a couple models (#101837)
The memory compression for these models is at parity, but because we interleave timings between torch.compile and eager runs, memory is duplicated between the eager and cudagraphs pools and causes OOM.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101837
Approved by: https://github.com/anijain2305
2023-05-19 19:09:59 +00:00
Brian Hirsh
1fabee399d separate out dynamo .requires_grad and .is_grad_enabled guards (#100570)
Fixes https://github.com/pytorch/pytorch/issues/100977

This will hopefully fix this error (from [issue](https://github.com/pytorch/pytorch/issues/99616))

This PR fixes an internal model: we were running an inductor inference graph, but `torch.is_grad_enabled()` was True, causing us to error inside of the inference graph when we encountered an out= operator.

I haven't been able to create a smaller repro - before landing this, I want to create a smaller repro to convince myself of why we need to separate out these guards.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100570
Approved by: https://github.com/ezyang
2023-05-19 16:14:56 +00:00
Michael Voznesensky
4c1bc91f42 Support autograd.Function w/ grad (#99483)
This PR adds support for tracing autograd.Function with grad.

A few important bullet points outlining our approach:

1) Our goal is to verify soundness in order to add a call_function to the autograd.Function's `apply` into the graph.
2) We achieve (1) by either verifying or rejecting soundness, ensuring that both the forward and backward of the autograd.Function are sound.
3) For the forward, if we verify soundness, we install its guards into the graph.
4) For the backward, if we verify soundness, we throw it out. However, backwards soundness verification is more onerous, and has a config driven set of banned attrs and methods for tensors.

1-4 above are achieved by turning the forward and backward into UserDefinedFunctionVariables and inlining through them, relying on dynamo's soundness detection. If we graph break while inlining, we raise and treat the function as unsound. As noted above, the backward check is stricter still.

For the tracing, the safety comes from dynamo's HigherOrderOperator system. That system ensures not only that we trace soundly, but that no new variables are lifted into inputs during the tracing, and that the forward and backward are entirely self-contained.

Whenever we reject a function as unsound, we restore, as usual.

Due to some limitations in the lifting logic, we implemented an escape hatch for tensors that are known in the forward but cross into the backward through save_tensors (save) / saved_tensors (load). The escape hatch avoids having the known saved tensors coming from the forward accidentally treated as lifted variables (and rejected). This is sound, but feels a little hacky.

Additionally, due to some limitations in fx node removal, combined with how we produce subgraphs for the traces installed from HigherOrderOperators, we had to improve our node removal logic. In the event of a restore, we remove the old nodes from the graph, as usual in dynamo. However, because references to these nodes may exist in subgraphs, we traverse each node's users and remove them first, if and only if they are in another graph. This is always sound, because removal should only happen downstream of restoration at this point.
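
For context, a minimal sketch of the pattern this PR makes traceable; the function body is illustrative only:

```python
import torch

class MyScale(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return x * 2

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        return grad_out * 2

@torch.compile
def f(x):
    # dynamo inlines forward/backward to check soundness, then emits a
    # single call_function to MyScale.apply in the graph
    return MyScale.apply(x)

f(torch.randn(3, requires_grad=True)).sum().backward()
```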

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99483
Approved by: https://github.com/zou3519
2023-05-19 01:26:21 +00:00
drisspg
6f13d6892a Add meta support for multinomial (#101324)
# Summary
Found this when trying to compile the text gen loop of nanogpt here: b33289942b/torchbenchmark/models/nanogpt_generate/model.py (L322)
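
For context, a meta kernel lets the operator run on `device="meta"` tensors (shapes and dtypes only, no data), which is what compilation needs to trace sampling code; a small sketch:

```python
import torch

probs = torch.rand(4, 10, device="meta")
# With meta support, this yields a meta tensor of shape (4, 1) during
# tracing instead of raising a missing-meta-kernel error.
idx = torch.multinomial(probs, num_samples=1)
assert idx.shape == (4, 1) and idx.device.type == "meta"
```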

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101324
Approved by: https://github.com/ngimel
2023-05-19 00:04:26 +00:00
Animesh Jain
794cc3952e adding moco to CI (#101098)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101098
Approved by: https://github.com/desertfire
2023-05-18 10:01:49 +00:00
chuanqiw
b315c9b5ab [CI] Enlarge memory for OOM models in inductor cpu HF accuracy test (#101395)
Change the Inductor CPU HF accuracy test node from `linux.4xlarge` (32GB) to `linux.24xlarge` (192GB) to enlarge the node memory. Also add 3 HF models back to CI test.

Fixes #101390

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101395
Approved by: https://github.com/EikanWang, https://github.com/desertfire, https://github.com/huydhn
2023-05-18 09:23:30 +00:00
Animesh Jain
dafa009c3c [dynamo][moco] Save global torch state to restore on graph break (#101201)
This is relevant to  https://github.com/pytorch/pytorch/pull/100570 as well.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101201
Approved by: https://github.com/voznesenskym
2023-05-18 01:03:15 +00:00
Peter Bell
ef512db0f8 [inductor] Constant and index_expr propagation pass (#101077)
This pass does a limited form of constant propagation, as well as propagation of
sympy indexing expressions. For example, say you have the function:
```python
def flip(x):
    i = torch.arange(x.size(0) - 1, -1, -1, device=x.device)
    return x[i]
```

On current main this results in indirect indexing:
```python
class buf0_loop_body:
    var_ranges = {z0: 4, z1: 3}
    index0 = 3 - z0
    index1 = 3*indirect0 + z1
    index2 = 3*z0 + z1
    def body(self, ops):
        get_index = self.get_index('index0')
        index_expr = ops.index_expr(get_index, torch.int64)
        set_indirect0 = self.set_indirect0(index_expr)
        get_index_1 = self.get_index('index1')
        load = ops.load('arg0_1', get_index_1)
        get_index_2 = self.get_index('index2')
        store = ops.store('buf0', get_index_2, load, None)
        return store
```

With this PR the indexing is propagated through the computation and into direct
indexing:

```python
class buf0_loop_body:
    var_ranges = {z0: 4, z1: 3}
    index0 = -3*z0 + z1 + 9
    index1 = 3*z0 + z1
    def body(self, ops):
        get_index = self.get_index('index0')
        load = ops.load('arg0_1', get_index)
        get_index_1 = self.get_index('index1')
        store = ops.store('buf0', get_index_1, load, None)
        return store
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101077
Approved by: https://github.com/lezcano, https://github.com/ngimel
2023-05-17 23:36:24 +00:00
Jongsoo Park
ebae77e891 [transformer benchmark] sort by cuda time (#101349)
Summary: The benchmark is running on CUDA

Test Plan: buck run mode/opt //caffe2/benchmarks/transformer:sdp_backwards

Differential Revision: D45843837

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101349
Approved by: https://github.com/drisspg
2023-05-17 15:38:56 +00:00
Jason Ansel
403ce1a1c9 Fix benchmark model names printouts with tqdm (#101627)
With the TQDM changes in #100969, the model names ended up getting hidden in the benchmark printouts. We would print the model name with no newline, then tqdm would print a `\r` and overwrite the name of the running model.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101627
Approved by: https://github.com/ezyang
2023-05-17 15:31:11 +00:00
PaliC
e0fc24cdc5 add retries to inductor benchmark suite (#101019)
This PR accomplishes two things:
1) Enables retries for downloading torchbenchmark and huggingface models, similar to how we do it for timm models right now (a sketch of the retry pattern follows this list).
2) Creates a `_download_model` function for the huggingface and TIMM runners whose output I plan to use to preload the models somewhere if possible (please double-check that I'll be saving the right thing). Instead of retries, we plan to just add torchbench to a docker image, as it is relatively small.
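
A minimal sketch of the retry pattern, with hypothetical attempt count and wait time:

```python
import time

def with_retries(fn, attempts=3, wait=5):
    """Call fn(), retrying on any exception up to `attempts` times."""
    for i in range(attempts):
        try:
            return fn()
        except Exception:
            if i == attempts - 1:
                raise
            time.sleep(wait)  # back off before retrying the download
```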

### <samp>🤖 Generated by Copilot at 3361a4c</samp>

> _We're the brave and bold coders of the `common.py` module_
> _We've made a handy function for downloading models_
> _We've shared it with our mates in the other runners_
> _So pull and push and try again, we'll get them all in time_

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101019
Approved by: https://github.com/huydhn, https://github.com/desertfire
2023-05-16 21:41:50 +00:00
Edward Z. Yang
41468833fb vision_maskrcnn is now deterministic (#101116)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101116
Approved by: https://github.com/ngimel
2023-05-16 21:32:17 +00:00
Edward Z. Yang
23d1cc3811 Update llama to failing (#101565)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101565
Approved by: https://github.com/janeyx99
2023-05-16 14:12:26 +00:00
Yanbo Liang
e4eaf33346 Re-enable detectron2_maskrcnn on CI (#100791)
#99665 has been fixed, we can re-enable these models on CI.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100791
Approved by: https://github.com/huydhn
2023-05-16 04:25:58 +00:00
Edward Z. Yang
f48718f749 Update torchbench pin (#101365)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101365
Approved by: https://github.com/albanD, https://github.com/awgu
2023-05-15 16:52:31 +00:00
Edward Z. Yang
fcf2fb273c Make missing model import error marginally better (#101221)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101221
Approved by: https://github.com/albanD, https://github.com/anijain2305
2023-05-14 19:57:01 +00:00
Jongsoo Park
8876c0b282 [transformer benchmark] fix in sdp_bwd for scaled_dot_product_attention return type (#101341)
Summary:
Otherwise we get
```
Traceback (most recent call last):
  File "<string>", line 49, in <module>
  File "<string>", line 47, in __run
  File "/usr/local/fbcode/platform010/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/local/fbcode/platform010/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/data/users/jongsoo/fbsource/buck-out/v2/gen/fbcode/ef4169ac7f95fb74/caffe2/benchmarks/transformer/__sdp_backwards__/sdp_backwards#link-tree/caffe2/benchmarks/transformer/sdp_backwards.py", line 188, in <module>
    main()
  File "/data/users/jongsoo/fbsource/buck-out/v2/gen/fbcode/ef4169ac7f95fb74/caffe2/benchmarks/transformer/__sdp_backwards__/sdp_backwards#link-tree/caffe2/benchmarks/transformer/sdp_backwards.py", line 184, in main
    run_timing(min_run_time, batch_size, embed_dim, num_heads, max_seq_len, dtype)
  File "/data/users/jongsoo/fbsource/buck-out/v2/gen/fbcode/ef4169ac7f95fb74/caffe2/benchmarks/transformer/__sdp_backwards__/sdp_backwards#link-tree/caffe2/benchmarks/transformer/sdp_backwards.py", line 105, in run_timing
    rand_fused_upward = cpt(x, x, x, mask).clone().detach()
  File "/data/users/jongsoo/fbsource/buck-out/v2/gen/fbcode/ef4169ac7f95fb74/caffe2/benchmarks/transformer/__sdp_backwards__/sdp_backwards#link-tree/torch/nn/modules/module.py", line 1502, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/data/users/jongsoo/fbsource/buck-out/v2/gen/fbcode/ef4169ac7f95fb74/caffe2/benchmarks/transformer/__sdp_backwards__/sdp_backwards#link-tree/torch/nn/modules/module.py", line 1511, in _call_impl
    return forward_call(*args, **kwargs)
  File "/data/users/jongsoo/fbsource/buck-out/v2/gen/fbcode/ef4169ac7f95fb74/caffe2/benchmarks/transformer/__sdp_backwards__/sdp_backwards#link-tree/caffe2/benchmarks/transformer/sdp_backwards.py", line 39, in forward
    attn, _ = torch.nn.functional.scaled_dot_product_attention(
ValueError: too many values to unpack (expected 2)
```
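
For reference, `scaled_dot_product_attention` returns a single tensor rather than an `(output, weights)` tuple, so the fix is to drop the unpacking; a sketch:

```python
import torch
import torch.nn.functional as F

q = k = v = torch.randn(2, 8, 16, 64)  # illustrative shapes
# One return value, no attention weights: no tuple unpacking.
attn = F.scaled_dot_product_attention(q, k, v)
```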

Test Plan: buck run mode/dev-nosan //caffe2/benchmarks/transformer:sdp_backwards

Differential Revision: D45843838

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101341
Approved by: https://github.com/drisspg
2023-05-14 01:34:51 +00:00
Natalia Gimelshein
49578913fb update timm commit (#100931)
Fixes #100903

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100931
Approved by: https://github.com/ezyang, https://github.com/malfet
2023-05-12 04:22:08 +00:00
Edward Z. Yang
41a4e22015 Update torchbench pin (#101071)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101071
Approved by: https://github.com/malfet
2023-05-11 18:09:40 +00:00
Edward Z. Yang
ad070b6dfa Check canary_models for models too in torchbench.py (#101081)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101081
Approved by: https://github.com/desertfire
2023-05-11 13:23:17 +00:00
lezcano
8b4e28d65d Fix microbenchmarks (#101065)
As per title

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101065
Approved by: https://github.com/jansel
2023-05-11 09:14:22 +00:00
Jason Ansel
036a8d6b4a Remove NullContext() from benchmark runners (#100309)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100309
Approved by: https://github.com/Skylion007, https://github.com/anijain2305
2023-05-11 06:42:27 +00:00
XiaobingSuper
c84627c2ee benchmarks: make --amp work for cpu path (#101057)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101057
Approved by: https://github.com/jgong5, https://github.com/desertfire, https://github.com/jansel
2023-05-11 02:51:38 +00:00
Edward Z. Yang
c658732950 [RFC] Add tqdm to benchmarking script (#100969)
Here's what it looks like, on a slower running benchmark:

https://github.com/pytorch/pytorch/assets/13564/47c4a5bd-e963-45de-a15c-2fd943de0fa4

There's actually quite a bit of dead time; it's possible there are more spots we should add tqdm to. Looking for opinions on the utility of this.
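
The integration is essentially wrapping the benchmark's model loop; a sketch with hypothetical names:

```python
from tqdm import tqdm

model_names = ["resnet50", "bert"]  # hypothetical model list

for name in tqdm(model_names, desc="models"):
    ...  # run one model's benchmark here
```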

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100969
Approved by: https://github.com/Skylion007
2023-05-10 15:39:24 +00:00
Edward Z. Yang
1e89a56a5b Apply static policy correctly to unspec (#98983)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98983
Approved by: https://github.com/ezyang
2023-05-10 05:59:12 +00:00
Bin Bao
76cc3ab4f3 [CI] Delete skips from https://github.com/pytorch/pytorch/issues/93847 (#96049)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96049
Approved by: https://github.com/jansel
2023-05-10 01:27:27 +00:00
Edward Z. Yang
9eab13fc90 Reenable llama benchmark (#100877)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100877
Approved by: https://github.com/albanD
2023-05-09 01:12:54 +00:00
Natalia Gimelshein
9790f9174a skip lcnet (#100726)
Per title

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100726
Approved by: https://github.com/voznesenskym
2023-05-05 23:19:42 +00:00
Animesh Jain
3f025c607c summarize graph breaks (#100696)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100696
Approved by: https://github.com/yanboliang
2023-05-05 22:27:47 +00:00
Natalia Gimelshein
4ca26d183a [CI] update hf version for ci (#100666)
per title

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100666
Approved by: https://github.com/malfet
2023-05-05 18:12:53 +00:00
Animesh Jain
8994d9e610 [dynamo] Hide guard_fail_hook behind a flag to improve cache lookup time (+10% DebertaV2) (#100590)
For TorchDynamo eager backend, DebertaV2 speedup improves from 0.77x to 0.87x.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100590
Approved by: https://github.com/voznesenskym, https://github.com/wconstab
2023-05-04 18:52:21 +00:00
Edward Z. Yang
c58d9642d0 Don't build Triton from source in benchmarks/dynamo/Makefile (#100613)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100613
Approved by: https://github.com/voznesenskym
2023-05-04 06:13:42 +00:00
Edward Z. Yang
d25c93f919 Remove speech_transformer workaround, torchbench handles it correctly now (#100558)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100558
Approved by: https://github.com/albanD
2023-05-04 01:14:24 +00:00
Yanbo Liang
896eb1db26 [Dynamo] Skip TB Background_Matting model eager accuracy check because of non-determinism (#100513)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100513
Approved by: https://github.com/anijain2305
2023-05-03 07:06:50 +00:00
Animesh Jain
0acfe2ce09 [dashboard] higher tolerance for AlbertForQuestionAnswering (#100277)
@desertfire

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100277
Approved by: https://github.com/desertfire
2023-05-02 23:51:08 +00:00
Jason Ansel
fdc853b14c Add --baseline option to benchmark runners (#100266)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100266
Approved by: https://github.com/ngimel
2023-05-02 02:35:11 +00:00
Edward Z. Yang
e918fd18e7 Disable densenet121 as it is flaky (#100371)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100371
Approved by: https://github.com/voznesenskym
2023-05-02 01:49:11 +00:00
Michael Voznesensky
aafc6ce8cc Produce constant variables in cases where a SymNode is created with a constant (#100144)
` AOT_DYNAMIC_SHAPES=1 TORCHDYNAMO_DYNAMIC_SHAPES=1  benchmarks/dynamo/huggingface.py --performance  --training --amp --backend eager --disable-cudagraphs --device cuda --only AllenaiLongformerBase --explain`

Looks promising!

Goes from:

Dynamo produced 173 graphs covering 2760 ops with 160 graph breaks (14 unique)

To:

Dynamo produced 6 graphs covering 2298 ops with 15 graph breaks (7 unique)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100144
Approved by: https://github.com/ezyang
2023-05-01 21:32:11 +00:00
Edward Z. Yang
5d93265cce Report timeout/infra_error instead of 0.0000 on infra error (#100372)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100372
Approved by: https://github.com/Skylion007, https://github.com/albanD
2023-05-01 14:56:01 +00:00
PyTorch MergeBot
89c43f4108 Revert "Produce constant variables in cases where a SymNode is created with a constant (#100144)"
This reverts commit d7bdfd3454.

Reverted https://github.com/pytorch/pytorch/pull/100144 on behalf of https://github.com/ezyang due to ci failure is real ([comment](https://github.com/pytorch/pytorch/pull/100144#issuecomment-1529587039))
2023-05-01 11:10:48 +00:00
Michael Voznesensky
d7bdfd3454 Produce constant variables in cases where a SymNode is created with a constant (#100144)
` AOT_DYNAMIC_SHAPES=1 TORCHDYNAMO_DYNAMIC_SHAPES=1  benchmarks/dynamo/huggingface.py --performance  --training --amp --backend eager --disable-cudagraphs --device cuda --only AllenaiLongformerBase --explain`

Looks promising!

Goes from:

Dynamo produced 173 graphs covering 2760 ops with 160 graph breaks (14 unique)

To:

Dynamo produced 6 graphs covering 2298 ops with 15 graph breaks (7 unique)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100144
Approved by: https://github.com/ezyang
2023-04-30 17:13:57 +00:00
Animesh Jain
006785cd46 [dynamo][hf_bigbird] Actually graph break on tensor.unsqueeze_/resize_ (#99986)
Currently, we return `unimplemented` without a graph break on seeing an `x.unsqueeze_` when `x` is an input. This essentially means we fall back to the original frame.

This PR actually graph breaks so that we can generate the continuation frame for the rest of the function. Instead of graph breaking at LOAD_ATTR, we delay the graph break to the actual CALL_FUNCTION, where it's cleaner to graph break.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99986
Approved by: https://github.com/jansel
2023-04-26 18:50:06 +00:00
Aaron Gokaslan
e2a3817dfd [BE] Enable C419 rule for any all shortcircuiting (#99890)
Apparently https://github.com/pytorch/pytorch/pull/78142 made torch.jit allow simple generator expressions, which lets us enable rules that replace unnecessary list comprehensions with generators in any/all. This was originally part of #99280, but I split it off into this PR so that it can be easily reverted should anything break.
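
For reference, C419 flags list comprehensions inside `any`/`all`, where a bare generator short-circuits instead of materializing the whole list first:

```python
xs = range(1000)
any([x > 5 for x in xs])  # before: builds the full list, then scans it
any(x > 5 for x in xs)    # after: stops at the first match
```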

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99890
Approved by: https://github.com/justinchuby, https://github.com/kit1980, https://github.com/malfet
2023-04-25 15:02:13 +00:00
Huy Do
9a69634b28 Skip some failing dynamic shape models on periodic (#99895)
After some recent changes, these tests are failing in periodic trunk.  So let's move them to unstable while waiting for the team to root cause the issue https://github.com/pytorch/pytorch/issues/99893.  Note that a forward fix can use `ciflow/unstable` to run those unstable jobs to confirm that they are fixed.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99895
Approved by: https://github.com/malfet
2023-04-25 07:05:08 +00:00
Edward Z. Yang
04e8df4dd7 Return full accuracy status for printing, not abbreviated version (#99894)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99894
Approved by: https://github.com/jansel
2023-04-25 05:17:10 +00:00
Jiong Gong
e5c9a0fcf5 [dynamo] avoid graph break on repeat_interleave.self_int (#99528)
Address convit_base failure: https://github.com/pytorch/torchdynamo/issues/1886 mentioned in https://github.com/pytorch/pytorch/issues/93777
Also for models like EleutherAI/gpt-j-6B.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99528
Approved by: https://github.com/ezyang
2023-04-25 04:47:39 +00:00
Edward Z. Yang
cd61707167 yolov3 dynamic training accuracy is fixed (#99896)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99896
Approved by: https://github.com/albanD
2023-04-25 01:15:24 +00:00
Edward Z. Yang
0b545bc667 Stop marking sequence length as dynamic (#99889)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99889
Approved by: https://github.com/voznesenskym, https://github.com/huydhn
2023-04-25 01:04:16 +00:00
chuanqiw
e9e5ffe83e Re-enable dynamic shapes test in dynamo benchmark (#99816)
Set `torch._dynamo.config.assume_static_by_default = False` when the dynamic shapes flag is enabled

Fixes #99815

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99816
Approved by: https://github.com/jgong5, https://github.com/ezyang
2023-04-24 20:34:52 +00:00
Yanbo Liang
3009c42e7d [CI Testing] Re-enable timm_efficientdet training (#99787)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99787
Approved by: https://github.com/desertfire
2023-04-24 20:05:15 +00:00
Edward Z. Yang
dc1c0924ec Properly parenthesize dynamo_dynamic_indices test (#99823)
I've got the E2E test case which triggered this in https://github.com/pytorch/pytorch/pull/99809

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99823
Approved by: https://github.com/voznesenskym
2023-04-23 22:41:34 +00:00
Edward Z. Yang
f602b3a6ae Preserve mark_dynamic when cloning inputs (#99617)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99617
Approved by: https://github.com/ngimel, https://github.com/voznesenskym, https://github.com/anijain2305
2023-04-22 19:46:31 +00:00
Natalia Gimelshein
bfbc4e74ab adjust batch sizes for hf suite (#99691)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99691
Approved by: https://github.com/yanboliang, https://github.com/anijain2305
2023-04-21 23:57:53 +00:00
Bin Bao
e09f785a72 [CI] Remove inductor skip list for Huggingface (#99375)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99375
Approved by: https://github.com/anijain2305
2023-04-21 18:13:22 +00:00
Edward Z. Yang
fc8fa6c356 Require at least one tensor to be marked dynamic with --dynamic-batch-only (#99620)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99620
Approved by: https://github.com/voznesenskym
2023-04-21 00:17:08 +00:00
Huy Do
5315317b7b Skip some detectron2_maskrcnn models with KeyError _ignore_torch_cuda_oom (#99599)
These tests are failing in trunk 233cc34d3b with `KeyError: '_ignore_torch_cuda_oom'`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99599
Approved by: https://github.com/malfet
2023-04-20 18:11:35 +00:00
Shunting Zhang
68bc0fc012 [inductor] a script to benchmark the perf impact from tensor layout (#99583)
Follow-up on Jason's idea of tensor layout tuning. Add a script to show the perf impact of layout on convolution (will add more cases like batch/layer norm and reduction to the script).

For convolution, a quick test shows that using the channels-last layout gives a 1.4x speedup:
```
baseline 4.509183883666992 test 3.178528070449829 speedup 1.419x
```

The speedup definitely also depends on input/weight shapes. E.g., changing the input channels in the test from 3 to 8 raises the speedup to 2.1x.
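
A sketch of the comparison being measured, assuming a CUDA device (shapes are illustrative, not the script's):

```python
import copy
import torch
import torch.nn as nn
from torch.utils.benchmark import Timer

conv = nn.Conv2d(3, 64, kernel_size=3, padding=1).cuda()
x = torch.randn(32, 3, 224, 224, device="cuda")

# Deep-copy first: Module.to(memory_format=...) converts in place.
conv_cl = copy.deepcopy(conv).to(memory_format=torch.channels_last)
x_cl = x.contiguous(memory_format=torch.channels_last)

t0 = Timer("conv(x)", globals={"conv": conv, "x": x}).timeit(100)
t1 = Timer("conv(x)", globals={"conv": conv_cl, "x": x_cl}).timeit(100)
print(f"contiguous {t0.mean:.6f}s  channels_last {t1.mean:.6f}s")
```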

The trace shows cudnn calls different kernels when input layout changes to channels last.

<img width="997" alt="Screenshot 2023-04-19 at 5 27 54 PM" src="https://user-images.githubusercontent.com/52589240/233228656-4bdcac0a-7633-416a-82e1-17d8dc8ea9a6.png">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99583
Approved by: https://github.com/jansel
2023-04-20 06:26:10 +00:00
Jason Ansel
3233450d07 Add TorchXLA option to benchmark runner (#99505)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99505
Approved by: https://github.com/voznesenskym
2023-04-19 22:44:52 +00:00
Will Constable
9ac2b041c9 Make opacus xfail instead of skip (#99380)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99380
Approved by: https://github.com/desertfire, https://github.com/anijain2305
2023-04-19 21:09:06 +00:00
Huy Do
5d395769a6 Skip vision_maskrcnn after #98923 (#99394)
This is failing in trunk as documented in https://github.com/pytorch/pytorch/issues/99438

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99394
Approved by: https://github.com/desertfire
2023-04-19 17:07:07 +00:00
Michael Voznesensky
113bd11cf4 Skip levit (#99491)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99491
Approved by: https://github.com/ezyang
2023-04-19 07:41:42 +00:00
Edward Z. Yang
e60557793f Make hash update script more robust and run it (#99370)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99370
Approved by: https://github.com/Chillee, https://github.com/voznesenskym
2023-04-19 02:26:03 +00:00
Bin Bao
46b9377190 [CI] Collect inductor max-autotune performance every Sunday (#99387)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99387
Approved by: https://github.com/malfet, https://github.com/huydhn
2023-04-18 13:20:13 +00:00
PyTorch MergeBot
ce7c4ba11d Revert "Mark doctr_det_predictor as broken on master (#99370)"
This reverts commit b290381e09.

Reverted https://github.com/pytorch/pytorch/pull/99370 on behalf of https://github.com/ezyang due to malfet already directly fixed it
2023-04-18 13:18:10 +00:00
Edward Z. Yang
b290381e09 Mark doctr_det_predictor as broken on master (#99370)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99370
Approved by: https://github.com/Chillee, https://github.com/voznesenskym
2023-04-18 06:58:47 +00:00
Edward Z. Yang
039faf0dbf Add invariant that all symbolic shapes must be bound in graph (#99089)
Previously, we had a problem when partitioning forward-backward dynamic graphs, which is that we could end up with a backward graph that mentions a symbol in an input tensor (e.g., `f32[s0 + s1]`), but without this symbol being otherwise bound elsewhere. When this happens, we have no way of actually deriving the values of `s0` and `s1`. Our fix for this in https://github.com/pytorch/pytorch/pull/93059 was to just retrace the graph, so that s0 + s1 got allocated a new symbol s2 and everything was happy. However, this strategy had other problems, namely (1) we lost all information from the previous ShapeEnv, including guards and (2) we end up allocating a LOT of fresh new symbols in backwards.

With this change, we preserve the same ShapeEnv between forward and backwards. How do we do this? We simply require that every symbol which may be present inside tensors ALSO be a plain SymInt input to the graph. This invariant is enforced by Dynamo. Once we have done this, we can straightforwardly modify the partitioner to preserve these SymInts as saved for backwards, if they are needed in the backwards graph to preserve the invariant as well.
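
A small sketch of the invariant, assuming dynamic shapes are enabled: sizes like `s0` and `s1` now arrive as plain SymInt graph inputs alongside the tensors, so a backward graph mentioning `f32[s0 + s1]` can recover them:

```python
import torch

@torch.compile(dynamic=True)
def f(x, y):
    # output has symbolic size s0 + s1; s0 and s1 are also bound
    # as explicit SymInt inputs to the graph
    return torch.cat([x, y]).sum()

f(torch.randn(3, requires_grad=True), torch.randn(5, requires_grad=True)).backward()
```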

This apparently breaks yolov3, but since everything else is OK I'm merging this as obviously good and investigating later.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99089
Approved by: https://github.com/voznesenskym
2023-04-16 01:48:19 +00:00
Yanbo Liang
15fe5a0798 [Dynamo] Fix benchmark --verbose error (#99224)
Dynamo benchmark --verbose is broken:
```
Traceback (most recent call last):
  File "/scratch/ybliang/work/repos/pytorch/benchmarks/dynamo/torchbench.py", line 400, in <module>
    torchbench_main()
  File "/scratch/ybliang/work/repos/pytorch/benchmarks/dynamo/torchbench.py", line 396, in torchbench_main
    main(TorchBenchmarkRunner(), original_dir)
  File "/scratch/ybliang/work/repos/pytorch/benchmarks/dynamo/common.py", line 1967, in main
    return maybe_fresh_cache(
  File "/scratch/ybliang/work/repos/pytorch/benchmarks/dynamo/common.py", line 993, in inner
    return fn(*args, **kwargs)
  File "/scratch/ybliang/work/repos/pytorch/benchmarks/dynamo/common.py", line 2135, in run
    torch._dynamo.config.log_level = logging.DEBUG
  File "/scratch/ybliang/work/repos/pytorch/torch/_dynamo/config_utils.py", line 67, in __setattr__
    raise AttributeError(f"{self.__name__}.{name} does not exist")
AttributeError: torch._dynamo.config.log_level does not exist
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99224
Approved by: https://github.com/voznesenskym
2023-04-15 20:18:50 +00:00
Bin Bao
34f681c13b [CI] Remove inductor skip list for timm_models (#98840)
Summary: check against the expected csv file instead of skipping tests

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98840
Approved by: https://github.com/ezyang
2023-04-15 13:54:41 +00:00
Bin Bao
a595a50653 [CI] Use expected accuracy csv files to check benchmark test status (#98839)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98839
Approved by: https://github.com/ezyang
2023-04-15 13:54:41 +00:00
Will Constable
6eab5e88c8 Graph-break on allowed modules if they have hooks (#97184)
Allowed modules are stuck into dynamo's fx graph as call_module nodes, without dynamo doing any tracing of the module. This means that at AOT trace time, hooks will fire during tracing when the call_module is executed, but the hooks themselves will disappear after that and not be present in the compiled program. (Worse, if they performed any tensor operations, those would get traced, so you could end up with part of the hook's functionality.)
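
A sketch of the failure mode, with a hypothetical scaling hook:

```python
import torch
import torch.nn as nn

lin = nn.Linear(4, 4)

def scale_hook(mod, inputs, output):
    return output * 2  # a tensor op: only its traced effect would survive

lin.register_forward_hook(scale_hook)
# If nn.Linear is treated as 'allowed', dynamo records a call_module node
# without the hook, so the compiled program silently drops scale_hook.
```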

To circumvent this, there are two options for 'allowed modules' with hooks:
1) don't treat them as 'allowed' - trace into them
2) graph-break, so the module is no longer part of the dynamo trace at all

(1) will fail for users who opted into allowed modules because they know their module has problems being traced by dynamo.
(2) causes graph breaks on common modules such as nn.Linear, just because they are marked as 'allowed'.

It would help matters if we could differentiate between types of allowed modules:
(A) allowed to avoid overheads - used for common ops like nn.Linear
(B) allowed to avoid dynamo graph breaks caused by unsupported code

Ideally, we'd use method (1) for group (A) and (2) for (B). For now, graph-break on all cases of allowed modules.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97184
Approved by: https://github.com/jansel
2023-04-15 01:46:15 +00:00
lezcano
1e78a2edcc Make summarize_perf.py work with perf-compare (#99095)
[perf-compare](https://github.com/pytorch/pytorch/actions/workflows/inductor-perf-compare.yml) has a different structure than that of the nightlies.
For these files, the script now generates:

```
# cuda float32 training performance results
## Geometric mean speedup
            huggingface    timm_models    torchbench
--------  -------------  -------------  ------------
inductor           1.46            1.4          1.17

## Mean compilation time
            huggingface    timm_models    torchbench
--------  -------------  -------------  ------------
inductor          57.85          97.63         60.18

## Peak memory compression ratio
            huggingface    timm_models    torchbench
--------  -------------  -------------  ------------
inductor           1.06           1.01          0.83
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99095
Approved by: https://github.com/ezyang
2023-04-14 12:10:54 +00:00
Bin Bao
e5501a967e [inductor] Support IndexPutFallback in cpp_wrapper (#98972)
Summary:
1) Make the fallback index_put generate the right cpp code in cpp_wrapper
2) Add a --cpp-wrapper option to common.py

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98972
Approved by: https://github.com/jgong5, https://github.com/jansel
2023-04-13 15:41:03 +00:00
Will Constable
296822c475 Make update_expected not fail on one missing file (#98982)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98982
Approved by: https://github.com/voznesenskym
2023-04-13 03:59:20 +00:00
PyTorch MergeBot
629377ea8b Revert "Replace _dynamo.config with an object instead of module (#96455)"
This reverts commit 420104a886.

Reverted https://github.com/pytorch/pytorch/pull/96455 on behalf of https://github.com/jansel due to BC breaking, was landed prematurely
2023-04-12 15:06:14 +00:00
Han Qi
420104a886 Replace _dynamo.config with an object instead of module (#96455)
Summary: Replace _dynamo.config with an object instead of a module.

Current usage patterns of setting and reading fields on config will work unchanged.

Only changes needed going forward:
1. `import torch._dynamo.config` will not work. However, just doing `import torch._dynamo` is sufficient to access dynamo config as `torch._dynamo.config`.
2. Files inside the _dynamo folder need to access config via `from torch._dynamo.config_util import config` instead of `from torch._dynamo import config`, because `_dynamo/__init__.py` imports some of those files and the import would otherwise be circular.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96455
Approved by: https://github.com/williamwen42
2023-04-11 21:23:32 +00:00
Edward Z. Yang
16beb636b8 Generalize summary script to work with more CSV names (#98500)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98500
Approved by: https://github.com/wconstab
2023-04-11 19:05:18 +00:00
Edward Z. Yang
b8b840be3d Convert logging f-strings to use % format, part five (#98765)
This handles some annoying but simple cases by hand.
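
The shape of the conversion, for reference; %-style arguments defer formatting until the record is actually emitted:

```python
import logging

log = logging.getLogger(__name__)
n = 3
log.info(f"compiled {n} graphs")   # before: formats eagerly
log.info("compiled %d graphs", n)  # after: formats lazily
```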

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98765
Approved by: https://github.com/wanchaol
2023-04-11 13:17:59 +00:00
Edward Z. Yang
b09722f540 Convert logging f-strings to use % format, part two (#98700)
This hits multi-line logging strings

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98700
Approved by: https://github.com/voznesenskym
2023-04-10 12:19:31 +00:00
Edward Z. Yang
9a8f71f23e Convert logging f-strings to use % format (#98697)
Codemod done with
https://gist.github.com/ezyang/2e8b0463cdc6be278478495b23ff0530 with
assistance from ChatGPT.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98697
Approved by: https://github.com/voznesenskym
2023-04-10 12:19:31 +00:00
Jason Ansel
f4858fa8ef Improve dynamo support for autograd.Function (#98158)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98158
Approved by: https://github.com/yanboliang, https://github.com/anijain2305
2023-04-10 00:33:51 +00:00
Bin Bao
5210d7c423 [CI] Mark vision_maskrcnn as NONDETERMINISTIC (#98570)
Summary: vision_maskrcnn fails eager checking, so mark it as
NONDETERMINISTIC to reduce noise on the dashboard.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98570
Approved by: https://github.com/eellison, https://github.com/huydhn
2023-04-07 19:33:20 +00:00
PyTorch MergeBot
e394f6db5a Revert "Improve dynamo support for autograd.Function (#98158)"
This reverts commit 4716fa2411.

Reverted https://github.com/pytorch/pytorch/pull/98158 on behalf of https://github.com/huydhn due to Sorry for reverting your PR, but it seems to break the MacOS trunk job 4716fa2411. The signal was missing from the PR because we disabled the MacOS job yesterday due to https://github.com/pytorch/pytorch/issues/98362
2023-04-06 18:15:02 +00:00
William Wen
bb33173962 Add max-autotune compilers to benchmarks (#98464)
Title

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98464
Approved by: https://github.com/shunting314
2023-04-06 17:13:02 +00:00
Jason Ansel
4716fa2411 Improve dynamo support for autograd.Function (#98158)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98158
Approved by: https://github.com/yanboliang, https://github.com/anijain2305
2023-04-06 16:44:37 +00:00
Edward Z. Yang
bdb79a8f52 Turn off divisible_by_16 for dynamic shapes; support ablation (#98471)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98471
Approved by: https://github.com/ngimel, https://github.com/voznesenskym
2023-04-06 12:57:07 +00:00
Bin Bao
007587aa00 [CI] Update update_expected.py to make it generate a combined csv file (#98407)
Summary: make update_expected.py combine csv files from all shards into a single csv file for each test suite

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98407
Approved by: https://github.com/wconstab, https://github.com/ezyang
2023-04-06 00:00:58 +00:00
Edward Z. Yang
37b9143206 Require sequence length in huggingface to be dynamic (#98335)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98335
Approved by: https://github.com/voznesenskym
2023-04-05 19:40:22 +00:00
Edward Z. Yang
cf1bfca2ba Require batch dimensions to be compiled dynamically (#98334)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98334
Approved by: https://github.com/voznesenskym
2023-04-05 19:40:22 +00:00
Bin Bao
c4de7fdef5 [CI] Mark sebotnet33ts_256 as nondeterministic (#98356)
Summary: The goal is to make sure the new dashboard doesn't give noisy
alerts on this test.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98356
Approved by: https://github.com/ezyang
2023-04-05 12:05:47 +00:00
Edward Z. Yang
b923f84805 Switch accuracy CI to dynamic batch only (#98307)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98307
Approved by: https://github.com/wconstab
2023-04-05 01:20:12 +00:00
Elias Ellison
a3365e1d0d Increment pending forwards after invocation (#98101)
Forwards are only pending following invocation, not before.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98101
Approved by: https://github.com/ngimel
2023-04-05 00:04:39 +00:00
Bin Bao
bd6db54285 [CI] Mark mobilenet_v3_large as nondeterministic (#98314)
Summary: Skip mobilenet_v3_large for accuracy checking to reduce
noise on the dashboard. The root cause still needs to be investigated.

mobilenet_v3_large shows random accuracy check failures with different
error values from time to time, and here are some examples:
```
cuda train mobilenet_v3_large                  [2023-04-04 14:54:50,990] torch._dynamo.utils: [ERROR] RMSE (res-fp64): 0.02172, (ref-fp64): 0.01068 and shape=torch.Size([960, 1, 5, 5])
[2023-04-04 14:54:50,990] torch._dynamo.utils: [ERROR] Accuracy failed for key name features.14.block.1.0.weight.grad
```
```
cuda train mobilenet_v3_large                  [2023-04-04 14:57:59,972] torch._dynamo.utils: [ERROR] RMSE (res-fp64): 0.07744, (ref-fp64): 0.03073 and shape=torch.Size([72, 1, 5, 5])
[2023-04-04 14:57:59,973] torch._dynamo.utils: [ERROR] Accuracy failed for key name features.4.block.1.0.weight.grad
```

One observation is that turning off cudnn in eager mode with
`torch.backends.cudnn.enabled = False` makes the non-deterministic
behavior go away, but then it fails accuracy checking consistently.
The minifier didn't help to narrow down the error.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98314
Approved by: https://github.com/huydhn
2023-04-04 21:55:23 +00:00
Edward Z. Yang
3c36f82fa2 [EASY] Handle new inference csv from CI (#98294)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98294
Approved by: https://github.com/wconstab
2023-04-04 20:37:51 +00:00
William Wen
4cf3e7c255 [dynamo benchmarks] Fix inference benchmark runs (#98248)
Update flags for dynamo inference benchmark runs. Add flag to not compute regressions/metric graphs (useful if there aren't previous runs to compare with).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98248
Approved by: https://github.com/shunting314
2023-04-04 01:24:13 +00:00
Bin Bao
69ff39d2e7 Skip gat, gcn and sage for TorchBench CUDA test (#98244)
Summary: The three models only support CPU for now.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98244
Approved by: https://github.com/ezyang
2023-04-04 01:06:18 +00:00
Jason Ansel
55afaa46a4 Support functools.partial and itertools.product (#98120)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98120
Approved by: https://github.com/anijain2305
2023-04-03 18:23:25 +00:00
Bin Bao
ba7ee00f00 Add a --inference flag to dynamo benchmark script (#98173)
Summary: When calling benchmark scripts, make it a requirement to pass
--inference or --training

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98173
Approved by: https://github.com/huydhn
2023-04-03 17:12:28 +00:00
Jason Ansel
76074dc0a3 Improve support for dict subclasses (#98154)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98154
Approved by: https://github.com/anijain2305
2023-04-03 01:42:08 +00:00
Jason Ansel
bc9dd969e1 Support inlining no_grad() decorator (#98121)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98121
Approved by: https://github.com/anijain2305, https://github.com/voznesenskym
2023-04-03 00:24:56 +00:00
Jason Ansel
92b46202ef Add --stats option to benchmark scripts (#98109)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98109
Approved by: https://github.com/anijain2305
2023-04-02 02:23:13 +00:00
Edward Z. Yang
5df59f957f Fix G001,G002,G003 in logs to % syntax (#97812)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97812
Approved by: https://github.com/Skylion007, https://github.com/kiukchung, https://github.com/malfet, https://github.com/mlazos
2023-04-01 01:43:33 +00:00
Animesh Jain
6b319d1525 [dynamo][graph break fix] inplace add for empty tuple (#97923)
Fixes one of the frequent graph breaks in HF models.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97923
Approved by: https://github.com/yanboliang, https://github.com/jansel
2023-04-01 00:11:16 +00:00
Bin Bao
c699ac17df [CI] Bump up torchbench version to fix dynamo graph breaks in transformers (#98003)
Summary: When we bumped up the torchbench version pin last time, we found
there were new graph breaks introduced with the transformers version
upgrade; see https://github.com/pytorch/pytorch/pull/96782. It turns out
they are already fixed upstream; see
https://github.com/huggingface/transformers/pull/21648 and https://github.com/pytorch/benchmark/pull/1511

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98003
Approved by: https://github.com/ngimel
2023-03-31 16:52:09 +00:00
Edward Z. Yang
91ad5984d8 Add script to summarize performance from CI performance run (#97977)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97977
Approved by: https://github.com/wconstab
2023-03-31 12:44:48 +00:00
Edward Z. Yang
97fc8ea5f4 Run the benchmark suite with dynamic batch only (#97912)
Symbolic shapes compile time on full CI with inductor is horribly long (even though our aot_eager local runs seemed to suggest that the added latency was only 10s per model). To patch over the problem for now, run the benchmark suite with dynamic batch only. This should absolve a lot of sins.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97912
Approved by: https://github.com/janeyx99, https://github.com/desertfire
2023-03-30 18:04:48 +00:00
Aaron Gokaslan
47dca20d80 [BE] Enable flake8-comprehension rule C417 (#97880)
Enables flake8-comprehension rule C417. Ruff autogenerated these fixes to the codebase.
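
For reference, C417 rewrites `map` over a lambda into a comprehension or generator expression:

```python
xs = [1, 2, 3]
ys = list(map(lambda x: x + 1, xs))  # before: unnecessary map
ys = [x + 1 for x in xs]             # after: equivalent comprehension
```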

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97880
Approved by: https://github.com/ezyang, https://github.com/kit1980, https://github.com/albanD
2023-03-30 14:34:24 +00:00
Will Constable
2f86c9bc0b Update query version for update_expected.py (#97898)
Unclear why this wobbled, but rocks had an outage and fixed it; maybe new endpoints were generated as a result of that.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97898
Approved by: https://github.com/huydhn
2023-03-29 21:50:19 +00:00
William Wen
b93e1f377e [dynamo, benchmarks] Add inductor-mode (for max-autotune) and warm start options to dynamo benchmarks (#97719)
Title.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97719
Approved by: https://github.com/shunting314
2023-03-29 21:09:00 +00:00
Edward Z. Yang
f754be897a Disable speedup_experiment_ds (#97806)
It seems to be broken.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97806
Approved by: https://github.com/jansel
2023-03-29 01:27:31 +00:00
Aaron Gokaslan
597b558c51 [BE]: Update flake8 and plugins and fix bugs (#97795)
Update flake8 and flake8-plugins in lintrunner to a modern version. Enables more checks and makes flake8 checks significantly faster. Added a few additional rule ignores that will need to be fixed in the future.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97795
Approved by: https://github.com/alexsio27444, https://github.com/janeyx99, https://github.com/ezyang
2023-03-28 23:51:55 +00:00
Bin Bao
a9a81ab7e3 [CI] Run benchmark test with dynamo_eager in periodic (#97543)
Summary: The idea is to catch any dynamo_eager regression earlier, and also
we can take that off the dashboard run.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97543
Approved by: https://github.com/huydhn
2023-03-28 01:02:49 +00:00
Shunting Zhang
652592efa9 [inductor] use torch.profiler in the triton wrapper (#97405)
I think it's helpful to use torch.profiler to profile the triton wrapper.

E.g., I tried it for nvidia_deeprecommender's inference graph.

Even with max-autotune, we see that the majority of the time the GPU is running 2 mm/addmm ops. That's why max-autotune does not help for this model, since tuning does not affect the external mm ops.
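
A sketch of profiling a compiled model this way (the workload is a stand-in, not the model from the PR):

```python
import torch
from torch.profiler import profile, ProfilerActivity

model = torch.compile(torch.nn.Linear(64, 64).cuda())  # stand-in workload
x = torch.randn(32, 64, device="cuda")
model(x)  # warm up / compile once

with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
    model(x)
print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))
```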

<img width="711" alt="Screenshot 2023-03-22 at 5 49 28 PM" src="https://user-images.githubusercontent.com/52589240/227072474-2f0d7205-4a10-4929-b1b7-551214788c61.png">

Next step: I'll check why the triton mm kernels are not picked.

EDIT: the above screenshot was captured without max-autotune due to a typo. Below is the trace with max-autotune enabled:
<img width="712" alt="Screenshot 2023-03-22 at 6 43 26 PM" src="https://user-images.githubusercontent.com/52589240/227077624-fdccf928-be08-4211-871b-a9e3d7b76fbe.png">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97405
Approved by: https://github.com/ngimel
2023-03-27 21:54:25 +00:00
Yanbo Liang
d305d4a57f [Dynamo] Fix TIMM benchmark compute_loss (#97423)
Fixes #97382

#95416 fixed a critical bug in the dynamo benchmark, where AMP tests fell back to eager mode before that PR. However, after that PR, we found [a list of TIMM models failing amp + eager + training testing](https://docs.google.com/spreadsheets/d/1DEhirVOkj15Lu4UNawIUon9MqkVLaWqyT-DQPif5NHk/edit#gid=0).
We have now identified the root cause: high loss values make gradient checking harder, as small changes in accumulation order upset accuracy checks. We should switch to the helper function ```reduce_to_scalar_loss```, which has been used by the Torchbench tests.
After switching to ```reduce_to_scalar_loss```, the TIMM models' accuracy pass rate grows from 67.74% to 91.94% in my local test. The remaining 5 failing models (ese_vovnet19b_dw, fbnetc_100, mnasnet_100, mobilevit_s, sebotnet33ts_256) need further investigation, but I think the reason is similar.
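
A sketch of the switch, assuming the helper lives at `torch._dynamo.testing.reduce_to_scalar_loss`:

```python
import torch
from torch._dynamo.testing import reduce_to_scalar_loss  # assumed location

model = torch.nn.Linear(8, 8)      # stand-in model
out = model(torch.randn(4, 8))
loss = reduce_to_scalar_loss(out)  # mean-style reduction keeps the loss small
loss.backward()
```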

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97423
Approved by: https://github.com/Chillee
2023-03-24 16:50:28 +00:00
Scott Wolchok
3b54592050 [PyTorch] Add annotation_str benchmark (#96496)
To be used to evaluate performance of following improvements. Baseline numbers:

https://gist.github.com/swolchok/c8bcb92be1dc6e67c4f7efad498becd5

Differential Revision: [D43919653](https://our.internmc.facebook.com/intern/diff/D43919653/)

**NOTE FOR REVIEWERS**: This PR has internal Meta-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D43919653/)!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96496
Approved by: https://github.com/Skylion007
2023-03-23 04:18:07 +00:00
Jason Ansel
9370f253e3 [inductor] Rewrite convolution triton templates (#95556)
Fixes #95775

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95556
Approved by: https://github.com/Chillee, https://github.com/ngimel
2023-03-22 18:12:23 +00:00
Edward Z. Yang
cff4826f28 pytorch_unet is now passing (#97309)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97309
Approved by: https://github.com/janeyx99, https://github.com/zou3519
2023-03-22 13:55:05 +00:00
Bin Bao
be49d3b170 [CI] Turn on debug logging for dla102 and gernet_l (#97307)
Summary: Log the generated code for those two flaky tests to see if
there is any codegen difference when they fail.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97307
Approved by: https://github.com/ezyang
2023-03-22 13:42:13 +00:00
jjsjann123
2b32a74ab0 moving nvfuser benchmark to third_party/nvfuser (#96725)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96725
Approved by: https://github.com/davidberard98
2023-03-21 23:19:15 +00:00
Natalia Gimelshein
e7d9331688 [inductor] hoist symbolic padding expressions (#97099)
Towards fixing pnasnet5large, see #96709. The generated kernel looks much better:
```
@pointwise(size_hints=[1048576], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32', 3: 'i32', 4: 'i32', 5: 'i32', 6: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': [], 'configs': [instance_descriptor(divisible_by_16=(0, 1, 6), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, ks0, ks1, ks2, ks3, xnumel, XBLOCK : tl.constexpr):
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.arange(0, XBLOCK)[:]
    xmask = xindex < xnumel
    x1 = (xindex // ks0) % ks0
    x0 = xindex % ks0
    x2 = (xindex // ks3)
    x4 = xindex
    tmp0 = x1 + ((-1)*ks1)
    tmp1 = 0
    tmp2 = tmp0 >= tmp1
    tmp3 = ks2
    tmp4 = tmp0 < tmp3
    tmp5 = x0 + ((-1)*ks1)
    tmp6 = tmp5 >= tmp1
    tmp7 = tmp5 < tmp3
    tmp8 = tmp2 & tmp4
    tmp9 = tmp8 & tmp6
    tmp10 = tmp9 & tmp7
    tmp11 = tl.load(in_ptr0 + (x0 + ((-1)*ks1) + (ks2*x1) + (x2*(ks2*ks2)) + ((-1)*ks1*ks2) + tl.zeros([XBLOCK], tl.int32)), tmp10 & xmask, other=0)
    tmp12 = tl.where(tmp10, tmp11, 0.0)
    tl.store(out_ptr0 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp12, xmask)
```
Interestingly, removing `expand` in the index `simplify` function makes the `load` expression a little better, but then `store` fails to simplify to a flat store, so I'm leaving `expand` in.
Full pnasnet still chokes on `ceiling` in batch_norm kernels. Additionally, it looks like shape propagation goofs in inductor and generates overly complicated expressions; we should switch to metadata from the fx graph.
I'm still not adding a `ceil` printer for triton, because we should be able to hoist all indexing expressions (and just printing ceil without converting to int64 doesn't work).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97099
Approved by: https://github.com/jansel
2023-03-21 21:43:32 +00:00
Bin Bao
ead5186462 [CI] Change tests used by the new dashboard (#96986)
Summary: Stop using runner.py to trigger the new dashboard run. Instead,
we spell out the actual command, which will be easier to extend. This PR
drops the perf tests for dynamo_eager and aot_eager.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96986
Approved by: https://github.com/huydhn, https://github.com/weiwangmeta
2023-03-20 17:28:12 +00:00
Edward Z. Yang
e74c5e5637 rexnet_100 is disabled for static, does not need dynamic listing (#97100)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97100
Approved by: https://github.com/Skylion007
2023-03-19 20:57:49 +00:00
David Berard
a4c706bcbc [dynamo][dashboard] fix triton clone step in dashboard (#96623)
Previously this would clone triton and then try to check out without being in the git repo directory. This usually wasn't a problem because the environment already had a triton repo downloaded, but I ran into it while constructing a new environment.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96623
Approved by: https://github.com/anijain2305
2023-03-17 22:36:26 +00:00
Bin Bao
577d930c39 [CI] Revert https://github.com/pytorch/pytorch/pull/96195 (#96897)
Summary: https://github.com/pytorch/pytorch/pull/96195 was an experiment
for debugging flaky failures on CI.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96897
Approved by: https://github.com/ngimel
2023-03-16 06:28:18 +00:00
Edward Z. Yang
3606f59366 Default specialize_int to False (#96624)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96624
Approved by: https://github.com/janeyx99
2023-03-16 02:54:18 +00:00
Will Constable
54cd4a67d0 Output peak memory stats from dynamo torchbench perf CI (#95666)
Adds absolute memory usage numbers (in addition to compression ratio) to performance jobs.

Example output:
<img width="1211" alt="image" src="https://user-images.githubusercontent.com/4984825/225419950-500908c5-00ce-4711-afa2-c995bf90d35d.png">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95666
Approved by: https://github.com/ezyang, https://github.com/williamwen42
2023-03-15 19:24:47 +00:00
Bin Bao
33c7be360f [reland][CI] switch torchbench to a pinned version (#96782)
Summary: This is reland of https://github.com/pytorch/pytorch/pull/96553

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96782
Approved by: https://github.com/huydhn
2023-03-15 12:46:36 +00:00
BowenBao
60a68477a6 Bump black version to 23.1.0 (#96578)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96578
Approved by: https://github.com/ezyang
2023-03-15 06:27:59 +00:00
Edward Z. Yang
037acd5a22 Update CI skips (#96745)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96745
Approved by: https://github.com/wconstab
2023-03-14 22:19:10 +00:00
PyTorch MergeBot
be4eaa69c2 Revert "[CI] switch torchbench to a pinned version (#96553)"
This reverts commit 61d6ccd29a.

Reverted https://github.com/pytorch/pytorch/pull/96553 on behalf of https://github.com/desertfire due to land race
2023-03-14 21:39:45 +00:00
PyTorch MergeBot
2951a75c3a Revert "Update perf smoke test threshold in check_hf_bert_perf_csv.py (#96772)"
This reverts commit 2eed44933b.

Reverted https://github.com/pytorch/pytorch/pull/96772 on behalf of https://github.com/desertfire due to land race
2023-03-14 21:37:30 +00:00
Wei Wang
2eed44933b Update perf smoke test threshold in check_hf_bert_perf_csv.py (#96772)
Reduce the threshold a little further due to runner-to-runner performance variations. E.g., https://github.com/pytorch/pytorch/actions/runs/4419276220/jobs/7747985757 and https://github.com/pytorch/pytorch/actions/runs/4419548525/jobs/7748553775 failed to meet 1.145 but were above 1.140.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96772
Approved by: https://github.com/seemethere, https://github.com/huydhn, https://github.com/atalman
2023-03-14 21:00:13 +00:00
PyTorch MergeBot
ba4fb9b6ad Revert "Default specialize_int to False (#96624)"
This reverts commit 1ac8782db2.

Reverted https://github.com/pytorch/pytorch/pull/96624 on behalf of https://github.com/kit1980 due to Broke inductor/test_torchinductor_dynamic_shapes.py
2023-03-14 19:43:47 +00:00
Will Constable
66871d61bb One line print for check_graph_breaks (#96750)
New output looks like this:

<img width="1040" alt="image" src="https://user-images.githubusercontent.com/4984825/225059313-fbac5152-ea8b-46ba-893d-dc1e2f8d82cc.png">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96750
Approved by: https://github.com/ezyang
2023-03-14 19:35:54 +00:00
Bin Bao
61d6ccd29a [CI] switch torchbench to a pinned version (#96553)
Summary: Previously we were using a torchbench branch that skips
torchaudio. We should switch to ensure good test coverage.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96553
Approved by: https://github.com/huydhn, https://github.com/ezyang
2023-03-14 18:42:22 +00:00
Edward Z. Yang
1ac8782db2 Default specialize_int to False (#96624)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96624
Approved by: https://github.com/janeyx99
2023-03-14 18:37:47 +00:00
David Berard
6e3d51b08a [inductor][CI] also skip rexnet_100 on non-dynamic shapes (#96691)
Recent failures show rexnet_100 accuracy is also flaky on non-dynamic shapes (it was already disabled for dynamic shapes in #96474). The failure occurs for the same reason (stem.bn.weight.grad).
e.g. https://github.com/pytorch/pytorch/actions/runs/4402868441/jobs/7710977874

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96691
Approved by: https://github.com/desertfire
2023-03-14 18:11:59 +00:00
Edward Z. Yang
ff7e510d1e Correctly use PythonPrinter for generating wrapper code referencing sympy (#96710)
Otherwise you get stuff like ceiling(s0), which is not valid Python code. Fixes volo_d1_224.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96710
Approved by: https://github.com/ngimel, https://github.com/jansel
2023-03-14 14:35:52 +00:00
Will Constable
f1d4d291b0 update_expected.py to parse artifacts and update graph break stats (#96480)
TODO (cc @soumith @voznesenskym @penguinwu @anijain2305 @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @desertfire @ZainRizvi): hopefully I can convert the Rockset query I'm using to a public API and delete the Rockset API usage (and the need for an API key) from this before landing. If that's not easy, or if I need to make a new query first, maybe I should land this as-is so at least people can use it if they get an API key. Also, any bad practices in how I parsed/mangled the filenames? It would be nice to make the naming of artifacts more consistent with the job names so less mangling is needed.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96480
Approved by: https://github.com/ZainRizvi
2023-03-14 13:37:21 +00:00
Wang, Eikan
3cad8d23d0 [Inductor] Skip the hf_T5_base due to intermittent failure on CI (#96649)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96649
Approved by: https://github.com/desertfire
2023-03-14 07:40:20 +00:00
Will Constable
218eeacacd Check dynamo graph-breaks in CI (#96346)
- add graph-breaks baselines
- add check_graph_breaks script (message users on regress or improvement)
- hook up test.sh for existing accuracy job

Refactor graph-break CI check

Take steps toward merging the checker with the existing check flow;
consider merging it all the way into the bench runner.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96346
Approved by: https://github.com/ezyang
2023-03-14 03:39:36 +00:00
Edward Z. Yang
507feb805f Don't specialize torch.Size with specialize_int = False (#96419)
Fixes https://github.com/pytorch/pytorch/issues/95868

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96419
Approved by: https://github.com/jansel, https://github.com/ngimel
2023-03-14 01:32:58 +00:00
David Berard
1d792288a5 [dynamo][dashboard] Clear local changes before pulling git repos (#96667)
The current dashboard issue is due to a .pt file in torchbench that has been modified for some reason. This clears any local changes before pulling.

Tested in a duplicate dashboard environment with the same .pt file modified:
* Before the change to this makefile, `make pull-deps` fails
* After the change to this makefile, `make pull-deps` succeeds.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96667
Approved by: https://github.com/anijain2305
2023-03-13 22:50:38 +00:00
Edward Z. Yang
c7f39c0820 Update CI skips (#96554)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96554
Approved by: https://github.com/janeyx99
2023-03-13 13:40:45 +00:00
Huy Do
c3614c7a61 Add a flag to benchmarks script to keep the test report directory (#96398)
I noticed from the Rockset data that there are only `float32` records, while there should be records for both dtypes. It turns out that the benchmark script generated by `runner.py` always removes the output directory by default, so only records from the `float32` run, which happens later, are left.

For example, `rm -rf /var/lib/jenkins/workspace/test/test-reports` appeared twice in the CI log https://ossci-raw-job-status.s3.amazonaws.com/log/11840774308.

I'm adding a new flag `--keep-output-dir` to keep the output directory. This is off by default, as I'm not sure how this script is used internally; people probably expect the output directory to be cleaned up every time.
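
A rough sketch of the flag's shape (names and paths here are illustrative, not the exact `runner.py` diff):

```python
import argparse
import shutil

parser = argparse.ArgumentParser()
parser.add_argument(
    "--keep-output-dir",
    action="store_true",
    help="keep the test report directory instead of removing it before each run",
)
args = parser.parse_args()

if not args.keep_output_dir:
    # previous behavior: always wipe the directory, losing earlier dtype runs
    shutil.rmtree("test/test-reports", ignore_errors=True)
```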

### Testing

I don't really want to start the 10h jobs just to test this small flag, so I'm triple-checking the change to make sure there are no bugs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96398
Approved by: https://github.com/weiwangmeta
2023-03-11 03:16:56 +00:00
Yanbo Liang
7fcf8b1829 [Dynamo] Support torch.{cuda/cpu}.amp.autocast (#95416)
For Meta internal use cases.
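
A small usage sketch of what this enables (assuming an available CUDA device):

```python
import torch

@torch.compile
def matmul_amp(a, b):
    # an autocast context inside a compiled function is now handled by Dynamo
    with torch.cuda.amp.autocast():
        return torch.mm(a, b)

a = torch.randn(128, 128, device="cuda")
b = torch.randn(128, 128, device="cuda")
out = matmul_amp(a, b)  # mm runs in float16 under autocast
```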

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95416
Approved by: https://github.com/jansel
2023-03-10 21:48:08 +00:00
Wei Wang
49eed50d19 [Inductor Perf CI] Lower the threshold of performance smoke test speedup. (#96531)
Avoids issues with https://github.com/pytorch/pytorch/issues/96530

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96531
Approved by: https://github.com/seemethere
2023-03-10 18:58:28 +00:00
David Berard
29cd60dfb7 [CI] handle more dynamo benchmark models that are not expected to be deterministic (#96324)
Follow-up to #96245. alexnet, Background_Matting, vision_maskrcnn, and vgg16 all have the same problem, but on float32 they were also failing the previous day, so I missed this. Once the amp jobs became available, I could see that these have the same issue (on both float32 and amp).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96324
Approved by: https://github.com/desertfire
2023-03-10 18:15:34 +00:00
Bin Bao
a651e6253a [CI] Change compile_threads to 1 when running benchmark accuracy test on CI (#96195)
Summary: This is not a pretty solution, but it is a way to verify whether the flakiness is coming from parallel compilation.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96195
Approved by: https://github.com/ngimel
2023-03-10 17:39:38 +00:00
Edward Z. Yang
ff2e14f200 Skip rexnet_100 in dynamic CI (#96474)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96474
Approved by: https://github.com/yanboliang, https://github.com/msaroufim
2023-03-10 01:23:19 +00:00
Horace He
5bbec680d7 Fix usages of contextmanager without finally (#96170)
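
The pattern being fixed, sketched generically (a hypothetical helper, not code from the PR): without `try/finally`, an exception raised in the `with` body skips everything after the `yield`, leaking the state change.

```python
from contextlib import contextmanager

@contextmanager
def set_flag(obj):
    obj.enabled = True
    try:
        yield
    finally:
        obj.enabled = False  # restored even if the with-body raises
```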
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96170
Approved by: https://github.com/ngimel, https://github.com/malfet
2023-03-08 20:59:27 +00:00
Edward Z. Yang
c988de1040 [EASY] Update inductor training dynamic skips (#96298)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96298
Approved by: https://github.com/Chillee, https://github.com/janeyx99
2023-03-08 19:31:46 +00:00
Bin Bao
b3a079810e [CI] Add a workflow for quick perf comparison (#96166)
Summary: ciflow/inductor-perf-test-nightly now contains a full dashboard
run, which takes a very long time. Ed proposed a simplification of the
perf run there, but it is still worthwhile to have a set of fast perf
tests that includes only one configuration (--training --amp).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96166
Approved by: https://github.com/huydhn, https://github.com/weiwangmeta
2023-03-08 19:09:04 +00:00
Bin Bao
664381b293 [CI] Avoid calling torch.use_deterministic_algorithms for some model tests (#96245)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96245
Approved by: https://github.com/davidberard98
2023-03-08 03:35:32 +00:00
Edward Z. Yang
d0641ed247 [TEST] Turn on unspecialize int dynamic training inductor CI (#96058)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96058
Approved by: https://github.com/janeyx99, https://github.com/voznesenskym
2023-03-07 16:08:45 +00:00
Edward Z. Yang
a6e3e7905e Turn on unspecialize int dynamic inductor CI (#96034)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96034
Approved by: https://github.com/voznesenskym
2023-03-07 12:39:55 +00:00
Jason Ansel
95d17dc93d [inductor] Reland #95567 part 1 (#96023)
This is the non-problematic part of #95567. The errors were coming from
the IR printing changes, which will be next in the stack.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96023
Approved by: https://github.com/ngimel, https://github.com/mlazos
2023-03-06 22:57:22 +00:00
Edward Z. Yang
1fd7ea1ba8 Update skips for RecursionError (#96109)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96109
Approved by: https://github.com/huydhn
2023-03-06 17:55:38 +00:00
Bin Bao
02792ff16f [CI] Make inductor-perf-test-nightly produce data for dashboard (#95685)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95685
Approved by: https://github.com/ezyang, https://github.com/huydhn
2023-03-06 03:14:03 +00:00
Bin Bao
60cf95610d [CI] Skip xcit_large_24_p8_224 in TIMM (#96048)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96048
Approved by: https://github.com/jansel
2023-03-05 14:54:46 +00:00
Bin Bao
1359d16fe8 [CI] Further tighten the checking of two eager runs (#95902)
Summary: To catch nondeterminism in eager if there is any.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95902
Approved by: https://github.com/jansel
2023-03-05 14:53:02 +00:00
Edward Z. Yang
c7c4a20321 Update dynamic skips (#95966)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95966
Approved by: https://github.com/janeyx99, https://github.com/voznesenskym
2023-03-04 23:01:58 +00:00
Jason Ansel
43dd043ea7 Revert "[inductor] Improve error messages (#95567)" (#96014)
This reverts commit 62b775583f.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96014
Approved by: https://github.com/Chillee
2023-03-04 04:03:31 +00:00
Edward Z. Yang
d303665d33 Make int unspecialization actually work (#95621)
OK, so this PR used to be about reducing the number of constants we specialize on, but it turns out that unspecialization was ~essentially never used (because we still constant specialized way too aggressively) and I ended up having to fix a bunch of issues to actually get tests to pass. So this PR is now "make int unspecialization actually work". As part of this, I have to turn off unspecialization by default, as there are still latent bugs in inductor.

The general strategy is that an unspecialized int is represented as a SymInt. Representing it as a 0d tensor (which is what the code used to do) is untenable: (1) we often need unspecialized ints to participate in size computations, but we have no way of propagating sympy expressions through tensor compute, and (2) a lot of APIs work when passed SymInt, but not when passed a Tensor. However, I continue to represent Numpy scalars as Tensors, as they are rarely used for size computation and they have an explicit dtype, so they are more accurately modeled as 0d tensors.
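
A hedged sketch of that strategy (the config name is the one used in this PR series; exact recompilation behavior depends on other flags):

```python
import torch
import torch._dynamo

torch._dynamo.config.specialize_int = False

@torch.compile(dynamic=True)
def scale(x, k: int):
    return x * k  # k flows through as a SymInt, not a baked-in constant

x = torch.randn(4)
for k in (2, 3, 7):  # ideally one compiled graph serves all three calls
    scale(x, k)      # (0 and 1 would still specialize, per the note below)
```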

* I folded in the changes from https://github.com/pytorch/pytorch/pull/95099 as I cannot represent unspecialized ints as SymInts without also turning on dynamic shapes. This also eliminates the necessity for test_unspec.py, as toggling specialization without dynamic shapes doesn't do anything. As dynamic shapes defaults to unspecializing, I just deleted this entirely; for the specialization case, I rely on regular static shape tests to catch it. (Hypothetically, we could also rerun all the tests with dynamic shapes, but WITH int/float specialization, but this seems... not that useful? I mean, I guess export wants it, but I'd kind of like our Source heuristic to improve enough that export doesn't have to toggle this either.)
* Only 0/1 integers get specialized by default now
* A hodgepodge of fixes. I'll comment on the PR about them.

Fixes https://github.com/pytorch/pytorch/issues/95469

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95621
Approved by: https://github.com/jansel, https://github.com/Chillee
2023-03-04 01:22:08 +00:00
Jason Ansel
62b775583f [inductor] Improve error messages (#95567)
Example error message before/after (710 to 131 lines):
https://gist.github.com/jansel/6fecad057738089fa95bf08c3de9fc8a

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95567
Approved by: https://github.com/mlazos
2023-03-02 02:20:55 +00:00
Bin Bao
879f0c3fee [CI] Increase the timeout limit for benchmark test (#95787)
Summary: xcit_large_24_p8_224 occasionally hits TIMEOUT on CI. Bump up
the limit to reduce flakiness.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95787
Approved by: https://github.com/ezyang, https://github.com/ZainRizvi
2023-03-01 19:54:25 +00:00
Bin Bao
e79b2b7792 [CI] Force clear triton cache between running each test (#95729)
Summary: The idea is to see if this reduces some of the flakiness
we have seen on CI. If it does help, then we have a problem in our
caching implementation.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95729
Approved by: https://github.com/ngimel
2023-03-01 04:10:03 +00:00
William Wen
cf3638a9cc [dynamo] Clear cache on dynamo dashboard accuracy tests (#95726)
Might fix some flaky accuracy tests?

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95726
Approved by: https://github.com/ngimel, https://github.com/anijain2305, https://github.com/desertfire
2023-03-01 00:50:19 +00:00
Will Constable
1a72712645 Add dynamo graph break stats to CI (#95635)
Adds columns to the CSV produced by the accuracy job, including dynamo graph-break stats.

Example output from torchbench CI job:
<img width="771" alt="image" src="https://user-images.githubusercontent.com/4984825/221716236-9276684e-1be8-43e1-837e-f41671d4e0e3.png">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95635
Approved by: https://github.com/ezyang
2023-02-28 16:17:46 +00:00
Edward Z. Yang
3762e801ba Update dynamic skips (#95587)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95587
Approved by: https://github.com/voznesenskym
2023-02-28 03:26:55 +00:00
Bin Bao
fa5a4b0dfc [CI] Do not compare two eager run results against fp64 result (#95616)
Summary: When running the benchmark test with --accuracy, two eager runs
should return the same result. If not, we want to detect it early, but
comparing against fp64_output may hide the non-determinism in eager.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95616
Approved by: https://github.com/ZainRizvi
2023-02-27 20:11:21 +00:00
Bin Bao
ab1ab3ab19 [CI] Specify more torch.backends.cudnn options to reduce non-determinism (#95478)
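
For context, these are the kind of settings involved (an illustrative set; the exact options are in the PR diff):

```python
import torch

torch.backends.cudnn.deterministic = True  # force deterministic algorithms
torch.backends.cudnn.benchmark = False     # disable autotuned algorithm selection
torch.backends.cudnn.allow_tf32 = False    # avoid TF32-induced numeric drift
```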
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95478
Approved by: https://github.com/ezyang
2023-02-25 18:54:12 +00:00
Edward Z. Yang
b8151d2ba9 Utility for running delta comparisons between two flag configs (#95411)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95411
Approved by: https://github.com/Chillee
2023-02-25 02:30:22 +00:00
Bin Bao
4c8ad93a7c [Inductor][CI] Remove hf_GPT2_large from CPU inference test (#95473)
Summary: hf_GPT2_large shows random failures on CI for CPU inference. Created https://github.com/pytorch/pytorch/issues/95474 for the Intel team to investigate.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95473
Approved by: https://github.com/anijain2305
2023-02-24 18:21:36 +00:00
Will Constable
8de4238a31 Add dynamo bench arg --per_process_memory_fraction (#95260)
Simply pipes the arg to the existing torch.cuda API of the same name.

Useful for locally debugging OOMs that happened on a smaller GPU.
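
The underlying call the new benchmark arg forwards to, sketched:

```python
import torch

# Cap this process at half the device's memory, e.g. to reproduce an OOM
# that occurred on a smaller GPU.
torch.cuda.set_per_process_memory_fraction(0.5)
```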

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95260
Approved by: https://github.com/davidberard98
2023-02-22 05:11:18 +00:00
Edward Z. Yang
08370ddad8 Update model skips (#95089)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95089
Approved by: https://github.com/albanD
2023-02-20 13:24:49 +00:00
Wang, Eikan
954c767bc6 [Inductor] Enable accuracy test for CPPBackend (#94898)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94898
Approved by: https://github.com/jgong5, https://github.com/desertfire
2023-02-20 05:02:15 +00:00
Edward Z. Yang
a2f44d82f8 Flag guard unbacked SymInt/SymFloat support (#94987)
I believe this fixes the AllenaiLongformerBase problem in periodic.

The longer version of the problem: we are currently optimistically converting all item() calls into unbacked SymInt/SymFloat, but sometimes this results in a downstream error due to a data-dependent guard. Fallbacks for this case are non-existent; this will just crash the model. This is bad, so we flag-guard until we have working fallbacks.
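
A hedged sketch of the failure mode (exact behavior depends on config flags such as `capture_scalar_outputs`; GuardOnDataDependentSymNode is the error quoted later in this log):

```python
import torch

@torch.compile(fullgraph=True)
def f(x):
    m = x.max().item()  # under this feature, traced as an unbacked SymFloat
    if m > 0:           # data-dependent guard: nothing is known about m
        return x * 2
    return x
```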

What could these fallbacks look like? One idea I have is to optimistically make data-dependent calls unbacked, but if that results in a crash, restart Dynamo analysis with a plan to graph-break immediately at the item() call.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94987
Approved by: https://github.com/Skylion007, https://github.com/malfet
2023-02-17 00:25:05 +00:00
Edward Z. Yang
5747a51657 Fix flaky StaticRuntime.Nonzero test (#94418)
If the operator produces a zero-size tensor, the memory
may be equal to the original. With nonzero, we would sometimes
get unlucky and everything would be zero.

See failing tests at https://hud.pytorch.org/failure/%5B%20%20FAILED%20%20%5D%20StaticRuntime.Nonzero

Arguably we should also fix the seeding but it was less obvious
to me where to do that.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94418
Approved by: https://github.com/albanD
2023-02-16 21:25:15 +00:00
Edward Z. Yang
7aaebe00ee Fail dynamic_aot_eager AllenaiLongformerBase model (#94986)
```
GuardOnDataDependentSymNode: It appears that you're trying to get a value out of symbolic int/float whose value is data-dependent (and thus we do not know the true value.)  The expression we were trying to evaluate is Eq(i3, -1).  Scroll up to see where each of these data-dependent accesses originally occurred.

While executing %as_strided : [#users=1] = call_method[target=as_strided](args = (%pad,), kwargs = {size: (12, %add, 768, 64), stride: (%getitem, %mul, %getitem_1, %getitem_2)})
Original traceback:
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/transformers/models/longformer/modeling_longformer.py", line 928, in <graph break in _sliding_chunks_matmul_attn_probs_value>
    chunked_value = padded_value.as_strided(size=chunked_value_size, stride=chunked_value_stride)
```

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94986
Approved by: https://github.com/albanD
2023-02-16 20:02:46 +00:00
Wei Wang
5705199fb1 Update smoke test threshold (#94888)
https://github.com/pytorch/pytorch/pull/94249 touched upon what values we should set. It turns out 1.17 is too high, as seemingly innocent commits fail to reach 1.17x; they yielded ~1.168x.
https://github.com/pytorch/pytorch/actions/runs/4180998255/jobs/7242758816
https://github.com/pytorch/pytorch/actions/runs/4180998255/jobs/7242758816
<img width="881" alt="image" src="https://user-images.githubusercontent.com/109318740/218951536-476d3764-1aa6-481b-bd92-f55d1c50e385.png">

Setting it to 1.165x.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94888
Approved by: https://github.com/ngimel
2023-02-15 07:29:41 +00:00
Xuehai Pan
b005ec62b9 [BE] Remove dependency on six and future (#94709)
Remove the Python 2/3 compatibility libraries [six](https://pypi.org/project/six) and [future](https://pypi.org/project/future), along with `torch._six`. We only support Python 3.8+ now; it's time to retire them.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94709
Approved by: https://github.com/malfet, https://github.com/Skylion007
2023-02-14 09:14:14 +00:00
Natalia Gimelshein
f2aee8b8d5 small fixes for mlir backend (#94717)
Fixes for skipped tests with the MLIR triton backend (will unskip once #94249 lands).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94717
Approved by: https://github.com/malfet, https://github.com/atalman
2023-02-13 22:42:53 +00:00
Aaron Gokaslan
0444a6c90a [BE] Remove deprecated logging warn method (#94708)
Swaps all logging.warn calls to logging.warning since the former is deprecated and even raises a deprecation warning now.
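
The one-line rewrite, for reference:

```python
import logging

log = logging.getLogger(__name__)
log.warning("preferred spelling")  # logging.warn() is a deprecated alias
```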

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94708
Approved by: https://github.com/ezyang
2023-02-13 18:24:52 +00:00
Edward Z. Yang
ae7a628b03 Dynamic shapes CI updates (#94690)
Data from https://github.com/pytorch/pytorch/pull/94683

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94690
Approved by: https://github.com/cpuhrsch
2023-02-13 18:20:12 +00:00
Nikita Shulga
4869929f32 Update Triton hash (#94249)
That includes MLIR + the latest packaging changes (which also download ptxas from CUDA-12).
Tweak CI to install gcc-9 to build triton.

Disable a few tests to keep everything correct.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94249
Approved by: https://github.com/Skylion007, https://github.com/ngimel, https://github.com/weiwangmeta
2023-02-13 13:17:36 +00:00
Aaron Gokaslan
67d9790985 [BE] Apply almost all remaining flake8-comprehension checks (#94676)
Applies the remaining flake8-comprehension fixes and checks. This change replaces all remaining unnecessary generator expressions with list/dict/set comprehensions, which are more succinct, performant, and better supported by our torch.jit compiler. It also removes useless generators such as `set(a for a in b)`, resolving them into just the set call.
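
Representative rewrites of the kind applied here (illustrative, not taken from the diff):

```python
b = [1, 2, 2, 3]

unique = {a for a in b}                  # instead of set(a for a in b)
squares = [a * a for a in b]             # instead of list(a * a for a in b)
index = {v: i for i, v in enumerate(b)}  # instead of dict((v, i) for i, v in enumerate(b))
```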

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94676
Approved by: https://github.com/ezyang
2023-02-12 01:01:25 +00:00
Xuehai Pan
8d45f555d7 [BE] [1/3] Rewrite super() calls in caffe2 and benchmarks (#94587)
Rewrite Python built-in class `super()` calls. Only non-semantic changes should be applied.

- #94587
- #94588
- #94592

Also, methods with only a `super()` call are removed:

```diff
class MyModule(nn.Module):
-   def __init__(self):
-       super().__init__()
-
    def forward(self, ...):
        ...
```
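
The core rewrite itself looks like this (sketched on a hypothetical module):

```diff
 class MyModule(nn.Module):
     def __init__(self):
-        super(MyModule, self).__init__()
+        super().__init__()
         self.linear = nn.Linear(4, 4)
```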

Some cases where the rewrite would change semantics are kept unchanged, e.g.:

f152a79be9/caffe2/python/net_printer.py (L184-L190)

f152a79be9/test/test_jit_fuser_te.py (L2628-L2635)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94587
Approved by: https://github.com/ezyang
2023-02-11 18:19:48 +00:00