Commit Graph

42 Commits

Bin Bao
8a90249bc2 [inductor] Update triton pin (#114772)
Differential Revision: [D51761353](https://our.internmc.facebook.com/intern/diff/D51761353)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114772
Approved by: https://github.com/shunting314, https://github.com/atalman
2023-12-02 19:13:56 +00:00
Bin Bao
1f845d5898 [CI] Fix a REQUIRE_HIGHER_TOLERANCE comparison bug (#114870)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114870
Approved by: https://github.com/jansel
2023-11-30 21:11:15 +00:00
Bin Bao
212f668408 [CI] Remove CI skip list for inductor integration tests (#113446)
Summary: Switch to completely rely on checking against expected result files.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113446
Approved by: https://github.com/ezyang, https://github.com/malfet, https://github.com/jansel
ghstack dependencies: #113574, #113575
2023-11-21 21:20:41 +00:00
Jane Xu
ac08022137 [BE][benchmarks] Minor comment cleanup, typos (#113898)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113898
Approved by: https://github.com/desertfire
2023-11-17 19:03:41 +00:00
eellison
605236af06 Force fp16 for vision_maskrcnn inference (#113110)
Force fp16 for maskrcnn inference (it doesn't support bf16). Also skip phi_1_5 in training - it OOMs even with batch size 1.
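
A hedged sketch of the kind of per-model precision override this implies; the set name and threshold here are illustrative, not the exact code in the runner:

```
import torch

# Hypothetical override set: vision_maskrcnn's kernels lack bf16 support,
# so inference for it falls back to fp16.
FORCE_FP16_FOR_INFERENCE = {"vision_maskrcnn"}

def inference_dtype(model_name: str) -> torch.dtype:
    if model_name in FORCE_FP16_FOR_INFERENCE:
        return torch.float16
    return torch.bfloat16  # default inference precision per #103535
```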

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113110
Approved by: https://github.com/xmfan
2023-11-10 02:25:11 +00:00
Simon Fan
54c5f474a7 Forward rank and world size info to Torchbench models when using dynamo runner (#108438)
Adds support for passing rank and world_size to a torchbench model via its extra_args parameter: https://github.com/pytorch/benchmark/blob/main/torchbenchmark/util/model.py#L83C80-L83C90

This is used for models which distribute over multiple GPUs, e.g. simple_gpt: https://github.com/pytorch/benchmark/pull/1867

Also adds an option to skip multiprocess-only GPU models.

Testing via `python benchmarks/dynamo/torchbench.py -d cuda --output=benchmark_logs/performance.csv --inference --performance --timing --print-memory --multiprocess --only simple_gpt`
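
A minimal sketch of the forwarding, assuming the runner builds an extra_args list; the flag names below are illustrative:

```
import os

# Hypothetical wiring: under torchrun/multiprocess, rank and world size
# come from the environment and are forwarded through extra_args so the
# torchbench model can shard itself across GPUs.
rank = int(os.environ.get("RANK", 0))
world_size = int(os.environ.get("WORLD_SIZE", 1))
extra_args = ["--rank", str(rank), "--world_size", str(world_size)]
# benchmark_cls(..., extra_args=extra_args)  # consumed by torchbench's model.py
```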

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108438
Approved by: https://github.com/Chillee
2023-09-14 21:01:20 +00:00
Elias Ellison
d960664842 Lower batch on cait_m36_384 (#106091)
The memory compression for this model is 0.9839, but we OOM with cudagraphs because we interleave eager runs with cudagraph runs, which duplicates memory because of the cudagraph memory pool.
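
A hypothetical shape of the override, in the spirit of the per-model batch-size tables the runners keep (names and value illustrative):

```
# Models that OOM under cudagraphs get a smaller benchmark batch size.
SMALL_BATCH_SIZE = {"cait_m36_384": 4}  # hypothetical value

def pick_batch_size(model_name: str, default: int) -> int:
    return SMALL_BATCH_SIZE.get(model_name, default)
```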

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106091
Approved by: https://github.com/anijain2305
2023-07-27 19:33:38 +00:00
Justin Chu
5ef023b05a [BE] Enable ruff's UP rules and autoformat benchmarks/ (#105429)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105429
Approved by: https://github.com/malfet
2023-07-19 04:46:37 +00:00
Akila Premachandra
1f1fb58b8a [dynamo] Fix TimmRunner typo in benchmarks (#104052)
Minor fix - removes the extra "n" from the TimmRunner class name.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104052
Approved by: https://github.com/kit1980, https://github.com/malfet
2023-06-22 22:25:25 +00:00
Bin Bao
a2988c9e6a [CI] Switch inference accuracy and performance tests to bfloat16 (#103535)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103535
Approved by: https://github.com/eellison
2023-06-17 00:24:37 +00:00
Animesh Jain
33a49eeae7 [benchmark] Flag to switch on activation checkpointing for HF models (#102557)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102557
Approved by: https://github.com/ngimel, https://github.com/Chillee
2023-05-30 23:46:14 +00:00
Elias Ellison
e5e451a9db Update batch size for a couple models (#101837)
The memory compression for these models is at parity, but because we interleave timings between torch.compile and eager runs, memory is duplicated between the eager and cudagraphs pools and causes OOM.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101837
Approved by: https://github.com/anijain2305
2023-05-19 19:09:59 +00:00
PaliC
e0fc24cdc5 add retries to inductor benchmark suite (#101019)
This PR accomplishes two things:
1) Enables retries for downloading torchbenchmark and huggingface models, in a similar way to how we do it for TIMM models right now (a sketch of the retry pattern follows below).
2) Creates a `_download_model` function for the Hugging Face and TIMM runners, whose output I plan to use to preload the models somewhere if possible (please double check that I'll be saving the right thing). Instead of retries, we plan to just add torchbench to a Docker image, as it is relatively small.
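
A minimal sketch of the retry pattern, assuming a generic download callable; the retry count and backoff are illustrative:

```
import time

def download_with_retries(download_fn, retries=3, wait_s=10):
    # Retry transient network failures when fetching model weights.
    for attempt in range(retries):
        try:
            return download_fn()
        except Exception:
            if attempt == retries - 1:
                raise
            time.sleep(wait_s * (attempt + 1))  # simple linear backoff
```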

### <samp>🤖 Generated by Copilot at 3361a4c</samp>

> _We're the brave and bold coders of the `common.py` module_
> _We've made a handy function for downloading models_
> _We've shared it with our mates in the other runners_
> _So pull and push and try again, we'll get them all in time_

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101019
Approved by: https://github.com/huydhn, https://github.com/desertfire
2023-05-16 21:41:50 +00:00
Bin Bao
34f681c13b [CI] Remove inductor skip list for timm_models (#98840)
Summary: Check against the expected CSV file instead of skipping tests.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98840
Approved by: https://github.com/ezyang
2023-04-15 13:54:41 +00:00
Bin Bao
c4de7fdef5 [CI] Mark sebotnet33ts_256 as nondeterministic (#98356)
Summary: The goal is to make sure the new dashboard doesn't give noisy
alerts on this test.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98356
Approved by: https://github.com/ezyang
2023-04-05 12:05:47 +00:00
Yanbo Liang
d305d4a57f [Dynamo] Fix TIMM benchmark compute_loss (#97423)
Fixes #97382

#95416 fixed a critical bug in the dynamo benchmark where, before that PR, AMP tests fell back to eager mode. However, after that PR we found [a list of TIMM models whose amp + eager + training testing failed](https://docs.google.com/spreadsheets/d/1DEhirVOkj15Lu4UNawIUon9MqkVLaWqyT-DQPif5NHk/edit#gid=0).
We have now identified the root cause: high loss values make gradient checking harder, as small changes in accumulation order upset accuracy checks. We should switch to the helper function ```reduce_to_scalar_loss```, which has been used by the Torchbench tests.
After switching to ```reduce_to_scalar_loss```, the TIMM models' accuracy pass rate grows from 67.74% to 91.94% in my local test. The remaining 5 failing models (ese_vovnet19b_dw, fbnetc_100, mnasnet_100, mobilevit_s, sebotnet33ts_256) need further investigation and handling, but I think the reason should be similar.
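
For reference, `reduce_to_scalar_loss` lives in `torch._dynamo.testing`; a minimal sketch of the switched-over `compute_loss` (the surrounding runner code is elided):

```
from torch._dynamo.testing import reduce_to_scalar_loss

def compute_loss(pred):
    # Collapse the model output (tensor/tuple/dict/...) to a scalar mean,
    # keeping loss magnitudes small so accumulation order matters less.
    return reduce_to_scalar_loss(pred)
```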

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97423
Approved by: https://github.com/Chillee
2023-03-24 16:50:28 +00:00
BowenBao
60a68477a6 Bump black version to 23.1.0 (#96578)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96578
Approved by: https://github.com/ezyang
2023-03-15 06:27:59 +00:00
Bin Bao
02792ff16f [CI] Make inductor-perf-test-nightly produce data for dashboard (#95685)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95685
Approved by: https://github.com/ezyang, https://github.com/huydhn
2023-03-06 03:14:03 +00:00
Xuehai Pan
8d45f555d7 [BE] [1/3] Rewrite super() calls in caffe2 and benchmarks (#94587)
Rewrite Python built-in class `super()` calls. Only non-semantic changes should be applied.

- #94587
- #94588
- #94592

Also, methods with only a `super()` call are removed:

```diff
class MyModule(nn.Module):
-   def __init__(self):
-       super().__init__()
-
    def forward(self, ...):
        ...
```
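
The rewrite itself looks like this (illustrative class name):

```diff
 class MyModule(nn.Module):
     def __init__(self):
-        super(MyModule, self).__init__()
+        super().__init__()
```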

Some cases where the rewrite would change the semantics are kept unchanged. E.g.:

f152a79be9/caffe2/python/net_printer.py (L184-L190)

f152a79be9/test/test_jit_fuser_te.py (L2628-L2635)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94587
Approved by: https://github.com/ezyang
2023-02-11 18:19:48 +00:00
Michael Voznesensky
333e771394 Add benchmarks.py to run all benchmarks, add new file with all torchbench model names (#94146)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94146
Approved by: https://github.com/ezyang
2023-02-08 01:18:38 +00:00
Edward Z. Yang
498c6ed8d8 Add missing format string (#93866)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
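
Illustrative of the bug class being fixed - a format string missing its `f` prefix, so the placeholder is printed literally (example values hypothetical):

```
name = "resnet50"
print("loading {name}")   # bug: prints the literal text "loading {name}"
print(f"loading {name}")  # fix: prints "loading resnet50"
```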

Pull Request resolved: https://github.com/pytorch/pytorch/pull/93866
Approved by: https://github.com/albanD, https://github.com/Skylion007
2023-02-01 20:56:46 +00:00
soulitzer
f646126ecd Running timm benchmarks no longer silently retries (#93030)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/93030
Approved by: https://github.com/eellison
2023-01-26 03:44:38 +00:00
Edward Z. Yang
c52567ec18 Switch CI exclusions to use exact match. (#92761)
Since the CI exclusions are hard-coded in our script, we might as well require them to match exactly. This solved some head-scratching where I was like, "this model is not obviously excluded, why is it not showing up in CI."
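
A hedged sketch of the behavioral change, with illustrative entries; the real skip lists live in the benchmark scripts:

```
CI_SKIP = {"detectron2_maskrcnn", "moco"}  # illustrative entries

def is_excluded(model_name: str) -> bool:
    # After this change: exact match only.
    return model_name in CI_SKIP
    # Before: a looser substring-style check could silently catch near-misses,
    # e.g. any(skip in model_name for skip in CI_SKIP)
```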

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92761
Approved by: https://github.com/jansel
2023-01-22 17:10:20 +00:00
blzheng
0c1777acec Dynamo benchmark: add CPU specific changes (#88477)
This PR adds some CPU-specific changes:

- Add support for the IPEX backend (see the sketch after this list)
- https://github.com/pytorch/torchdynamo/issues/1618
- https://github.com/pytorch/torchdynamo/issues/1534
- Enable the CPU launcher in runner.py.
- Fix the issue that some environment variables are not supported on CPU
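
A hedged sketch of selecting an IPEX backend through the dynamo API of that era; the exact benchmark flag wiring may differ, and the backend is only available when intel_extension_for_pytorch is installed:

```
import torch._dynamo as dynamo

# Assumption: "ipex" is registered as a dynamo backend by
# intel_extension_for_pytorch; compilation then routes through IPEX.
@dynamo.optimize("ipex")
def run(model, inputs):
    return model(*inputs)
```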

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88477
Approved by: https://github.com/jgong5, https://github.com/jansel
2023-01-07 09:26:06 +00:00
Bin Bao
84e73e1269 [inductor] small CI improvements (#91140)
Summary: 1) Increase the timm_model download retry count; 2) skip certain
random Triton failures.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91140
Approved by: https://github.com/williamwen42
2022-12-20 17:26:12 +00:00
William Wen
7ebc45eadd [dynamo] Better error message for bad timm model name (#91049)
Fixes https://github.com/pytorch/torchdynamo/issues/1995

Running `python benchmarks/dynamo/timm_models.py --performance --float32 -dcuda --output=out.csv --training --inductor --only bad_model_name` gives
```
Traceback (most recent call last):
  File "benchmarks/dynamo/timm_models.py", line 338, in <module>
    main(TimmRunnner())
  File "/scratch/williamwen/work/pytorch/benchmarks/dynamo/common.py", line 1660, in main
    return maybe_fresh_cache(run, args.cold_start_latency and args.only)(
  File "/scratch/williamwen/work/pytorch/benchmarks/dynamo/common.py", line 833, in inner
    return fn(*args, **kwargs)
  File "/scratch/williamwen/work/pytorch/benchmarks/dynamo/common.py", line 2000, in run
    ) = runner.load_model(device, model_name, batch_size=batch_size)
  File "benchmarks/dynamo/timm_models.py", line 215, in load_model
    raise RuntimeError(f"Failed to load model '{model_name}'")
RuntimeError: Failed to load model 'bad_model_name'
```
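
A minimal sketch of the failure path this adds, assuming timm's `create_model` and a retry loop around it (retry count hypothetical):

```
import timm

def load_model(model_name: str):
    model = None
    for _ in range(3):  # hypothetical retry count
        try:
            model = timm.create_model(model_name, pretrained=True)
            break
        except Exception:
            pass
    if model is None:
        # Surface a clear error instead of an opaque downstream failure.
        raise RuntimeError(f"Failed to load model '{model_name}'")
    return model
```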

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91049
Approved by: https://github.com/ezyang
2022-12-19 22:37:34 +00:00
Michael Lazos
7c524221ba [reland3][dynamo] Revert "Revert "[reland][dynamo] use optimizers correctly in benchmar… (#90956)
…king (#87492)" (#90746)"

This reverts commit ff1bbc2773.

This should be okay to merge now. The flakiness of HF models will be fixed by seeding the rng (https://github.com/pytorch/pytorch/pull/90936), and the numeric mismatch was root-caused to three decomps (still investigating why those decomps cause this); see https://github.com/pytorch/torchdynamo/issues/1985 for more detail.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90956
Approved by: https://github.com/desertfire
2022-12-17 06:27:15 +00:00
PyTorch MergeBot
6bc6fb21db Revert "[reland2][dynamo] Revert "Revert "[reland][dynamo] use optimizers correctly in benchmar… (#90956)"
This reverts commit 8bc38ae4e2.

Reverted https://github.com/pytorch/pytorch/pull/90956 on behalf of https://github.com/desertfire due to Causing TIMM model failures
2022-12-16 19:28:05 +00:00
Michael Lazos
8bc38ae4e2 [reland2][dynamo] Revert "Revert "[reland][dynamo] use optimizers correctly in benchmar… (#90956)
…king (#87492)" (#90746)"

This reverts commit ff1bbc2773.

This should be okay to merge now. The flakiness of HF models will be fixed by seeding the rng (https://github.com/pytorch/pytorch/pull/90936), and the numeric mismatch was root-caused to three decomps (still investigating why those decomps cause this); see https://github.com/pytorch/torchdynamo/issues/1985 for more detail.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90956
Approved by: https://github.com/desertfire
2022-12-16 13:33:38 +00:00
eqy
57e2090e21 [Dynamo][TIMM][Benchmarks] Fix TIMM 0.8.0dev breaking the timm_models.py script's data config (#90404)
It seems `0.8.0dev` breaks the current argument passing by expecting a dictionary instead of a namespace after 0dadb4a6e9

CC @desertfire @ngimel

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90404
Approved by: https://github.com/ngimel
2022-12-15 22:21:19 +00:00
Bin Bao
ff1bbc2773 Revert "[reland][dynamo] use optimizers correctly in benchmarking (#87492)" (#90746)
This reverts commit d91d7a3221.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90746
Approved by: https://github.com/anijain2305
2022-12-13 11:37:16 +00:00
Animesh Jain
d91d7a3221 [reland][dynamo] use optimizers correctly in benchmarking (#87492)
Reland https://github.com/pytorch/pytorch/pull/87311

mlazos: updated to use SGD to not add a bunch of additional memory allocations (like Adam)
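
Illustrative of the trade-off mlazos mentions: SGD keeps no per-parameter state, while Adam allocates exp_avg/exp_avg_sq buffers for every parameter:

```
import torch

def init_benchmark_optimizer(model):
    # SGD: no extra state tensors, so benchmark memory stays close to eager.
    return torch.optim.SGD(model.parameters(), lr=0.01)
    # Adam would add two fp32 state tensors per parameter:
    # return torch.optim.Adam(model.parameters(), lr=0.01)
```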

Pull Request resolved: https://github.com/pytorch/pytorch/pull/87492
Approved by: https://github.com/desertfire
2022-12-09 20:32:53 +00:00
Bin Bao
f7cdd3a7a0 [inductor] Use a large tolerance for botnet26t_256 (#90383)
Summary: botnet26t_256 shows random tolerance failures on CI. The root
cause of this randomness is still to be investigated, but let's use a
larger tolerance for now.
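
A hedged sketch of the mechanism, reusing the REQUIRE_HIGHER_TOLERANCE name that appears later in this history (#114870); the threshold values are illustrative:

```
REQUIRE_HIGHER_TOLERANCE = {"botnet26t_256"}

def accuracy_tolerance(model_name: str) -> float:
    # Looser threshold for models with known-noisy numerics on CI.
    return 8e-2 if model_name in REQUIRE_HIGHER_TOLERANCE else 1e-2
```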

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90383
Approved by: https://github.com/ezyang
2022-12-07 19:35:06 +00:00
Animesh Jain
3162a48a77 [dynamo][benchmarks] Call zero grad (#90026)
Hoping that it might reduce some flakiness

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90026
Approved by: https://github.com/williamwen42
2022-12-02 04:05:57 +00:00
Animesh Jain
68805b08d1 [benchmarks][dynamo] Trying CI - Set train() for TIMM models accuracy tests (#89780)
Moving to train mode for TIMM models and also raising the batch size for accuracy testing.

Raising the batch size seems to remove a lot of the noise/instability coming from the batch_norm decomposition.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89780
Approved by: https://github.com/ngimel
2022-11-30 12:57:35 +00:00
Animesh Jain
1b575782a0 [dynamo][benchmarks] use fresh inductor cache and raise batch size wherever possible (#88044)
cc @mlazos @soumith @voznesenskym @yanboliang @penguinwu @EikanWang @jgong5 @Guobing-Chen @chunyuan-w @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx
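
A sketch of the fresh-cache idea; `fresh_inductor_cache` exists today as a context manager in `torch._inductor.utils`, though the benchmark wiring at the time may have differed:

```
from torch._inductor.utils import fresh_inductor_cache

# Each model compiles against an empty cache directory, so measured
# compile times are not polluted by earlier runs.
with fresh_inductor_cache():
    run_one_model()  # hypothetical helper
```
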
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88044
Approved by: https://github.com/ngimel
2022-10-30 17:10:17 +00:00
Bin Bao
57b36bf353 Bring back TIMM model inductor CI test (#87730)
Summary: https://github.com/pytorch/pytorch/pull/87588 has solved the
inductor compilation speed regression, so we can try to run TIMM models
with fewer shards and also enable pretrained model downloading, which
should resolve the flakiness we have seen previously.

cc @jansel @mlazos @soumith @voznesenskym @yanboliang @penguinwu @anijain2305
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87730
Approved by: https://github.com/anijain2305
2022-10-26 00:15:35 +00:00
Bin Bao
f047dadab9 Enable inductor CI for TIMM (#87462)
cc @jansel @lezcano @fdrocha @mlazos @soumith @voznesenskym @yanboliang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87462
Approved by: https://github.com/anijain2305
2022-10-22 05:50:00 +00:00
PyTorch MergeBot
f38a88c4dd Revert "[dynamo] use optimizers correctly in benchmarking (#87311)"
This reverts commit 703c19008d.

Reverted https://github.com/pytorch/pytorch/pull/87311 on behalf of https://github.com/anijain2305 due to Bin (desertfire) is trying to get torchbench models in CI, and this PR prevents that. I will bring this back after models are in CI.
2022-10-20 22:01:51 +00:00
Animesh Jain
703c19008d [dynamo] use optimizers correctly in benchmarking (#87311)
We were not setting optimizers correctly

* This hid the issue that we see here - https://github.com/pytorch/torchdynamo/issues/1687
* This has also revealed that we are activating profilers for every dynamo-optimized model call. This could affect speedup.

cc @jansel @lezcano @fdrocha
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87311
Approved by: https://github.com/mlazos, https://github.com/yanboliang
2022-10-20 05:46:25 +00:00
Animesh Jain
c30cfb07ab [dynamo][dashboard] Run 2 iterations for the correctness runs (#87104)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87104
Approved by: https://github.com/soumith
2022-10-18 15:53:40 +00:00
Jason Ansel
c7c09722ad Move TorchDynamo into PyTorch core (#86461)
Context:
https://github.com/pytorch/torchdynamo/issues/1588

This PR moves [TorchDynamo](https://github.com/pytorch/torchdynamo) and TorchInductor into PyTorch core.
- `torchdynamo` becomes `torch._dynamo`
- `torchinductor` becomes `torch._inductor`
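
In user code the move amounts to an import rename (illustrative):

```diff
-import torchdynamo
-torchdynamo.optimize(...)
+import torch._dynamo
+torch._dynamo.optimize(...)
```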

This PR was generated by running `copy_to_core.sh` in https://github.com/pytorch/torchdynamo/pull/1538

Pull Request resolved: https://github.com/pytorch/pytorch/pull/86461
Approved by: https://github.com/voznesenskym
2022-10-13 23:18:06 +00:00