Commit Graph

128 Commits

Author SHA1 Message Date
Michael Voznesensky
333e771394 Add benchmarks.py to run all benchmarks, add new file with all torchbench model names (#94146)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94146
Approved by: https://github.com/ezyang
2023-02-08 01:18:38 +00:00
atalman
6e285c479d Remove cuda 11.6 from CI replace with 11.7 (#93406)
Remove cuda 11.6 from CI replace with 11.7
Following the Release readme here: https://github.com/pytorch/pytorch/blob/master/RELEASE.md#release-compatibility-matrix

Pull Request resolved: https://github.com/pytorch/pytorch/pull/93406
Approved by: https://github.com/malfet, https://github.com/desertfire
2023-02-02 19:16:05 +00:00
Edward Z. Yang
c52567ec18 Switch CI exclusions to use exact match. (#92761)
Since the CI exclusions are hard-coded in our script, we might as well require them to match exactly. This solved some head scratching where I was like, "this model is not obviously excluded, why is it not showing up in CI."

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92761
Approved by: https://github.com/jansel
2023-01-22 17:10:20 +00:00
Jason Ansel
7c1c239db1 [inductor] Rewrite Triton templates + epilogue fusion (retry) (#91575)
This reverts commit 94262efc7d to reland #91105 / #90738.

Fixes https://github.com/pytorch/torchdynamo/issues/2015

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91575
Approved by: https://github.com/ngimel
2023-01-11 00:08:03 +00:00
blzheng
0c1777acec Dynamo benchmark: add CPU specific changes (#88477)
This pr adds some CPU specific changes:

- Add support for IPEX backend
- https://github.com/pytorch/torchdynamo/issues/1618
- https://github.com/pytorch/torchdynamo/issues/1534
- Enable CPU launcher in runner.py.
- Fix the issue that some environment variables are not support on CPU

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88477
Approved by: https://github.com/jgong5, https://github.com/jansel
2023-01-07 09:26:06 +00:00
Shunting Zhang
a5f32f8978 training support for dynamo+torchxla integration (#88449)
We've already shown some promising perf result by integrating dynamo with torchxla for inference. To provide consistent UX for training and for inference, in this PR we try to enable training for dynamo/torchxla.

Training is trickier than inference and we may not expect much perf gains since
1. in training case, torchxla only generate a single combined graph for fwd/bwd/optimizer while in `torchxla_trace_once` bridge we added in dynamo, due to how AOT_Autograd works, we will generate 3 graphs: one for forward, one for backward and one for the optimizer. XLA favors larger graph to do more optimizations.
2. in training case, tracing overhead can be overlapped with computation. Tracing overhead is not as a big deal for training as for inference. After all training cares more about throughput while inference cares more about latency.
3. in training case, people can increase batch size to 'mitigate' the tracing overhead. Increase batch size does not change tracing overhead, thus it shows like the tracing overhead 'per example' reduces.

But we still want to add training support to dynamo/torchxla to make the work complete.

We added '--iterations-per-run' argument to control how may iterations we do per measure/device sync. This is to understand the impact of item 2 above.

Results:

With '--iterations-per-run' equals to 1, here are the perf numbers:
```
+-------------------------+--------------------+-------------------------+
| Model                   |   XLA (trace once) |   XLA (trace everytime) |
+=========================+====================+=========================+
| resnet18                |             0.91   |                0.959    |
+-------------------------+--------------------+-------------------------+
| resnet50                |             0.917  |                0.932    |
+-------------------------+--------------------+-------------------------+
| resnext50_32x4d         |             0.912  |                0.905    |
+-------------------------+--------------------+-------------------------+
| alexnet                 |             1.038  |                0.974    |
+-------------------------+--------------------+-------------------------+
| mobilenet_v2            |             0.881  |                0.835    |
+-------------------------+--------------------+-------------------------+
| mnasnet1_0              |             0.903  |                0.931    |
+-------------------------+--------------------+-------------------------+
| vgg16                   |             0.914  |                0.967    |
+-------------------------+--------------------+-------------------------+
| BERT_pytorch            |             1.359  |                0.84     |
+-------------------------+--------------------+-------------------------+
| timm_vision_transformer |             1.288  |                0.893    |
+-------------------------+--------------------+-------------------------+
| geomean                 |             1.0006 |                0.913794 |
+-------------------------+--------------------+-------------------------+
```

Overall it looks like graph break indeed cause perf loss. But for BERT_pytorch and timm_vision_transformer we still see perf gain. We need do more experiments with larger '--iterations-per-run'

NOTE:
In torchbench.py I added the following code to do a few workaround:
```
from myscripts import workaround # TODO will remove this line before landing
```

Here are the content of workaround.py:
```
import torch
from torch import nn
import os

# override max_pool2d with avg_pool2d
if os.environ.get("REPLACE_MAXPOOL", "0") == "1":
    torch.nn.MaxPool2d = torch.nn.AvgPool2d

```

It work around a few issues we found
1. MaxPool2d does not work for training in dynamo/torchxla: https://github.com/pytorch/torchdynamo/issues/1837 . WIP fix from Brian in https://github.com/pytorch/pytorch/pull/90226 , https://github.com/pytorch/xla/pull/4276/files (WIP)
2. recent change ( this PR https://github.com/pytorch/pytorch/pull/88697 ) in op decomposition cause batch_norm ops to fallback in torchxla. Fix from jack in https://github.com/pytorch/xla/pull/4282#event-7969608134 . (confirmed the fix after adding Deduper to handle duplicated return from fx graph generated by AOTAutograd)
3. we have issue to handle dropout because of random seed out of sync issue. Here is the fix: https://github.com/pytorch/xla/pull/4293 (confirmed the fix)

Example command:
```
REPLACE_MAXPOOL=1 USE_FAKE_TENSOR=0 GPU_NUM_DEVICES=1 python benchmarks/dynamo/torchbench.py --randomize-input --performance --trace-on-xla --training --backend=aot_torchxla_trace_once --only vgg16
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88449
Approved by: https://github.com/wconstab, https://github.com/qihqi, https://github.com/malfet
2023-01-05 19:59:34 +00:00
PyTorch MergeBot
94262efc7d Revert "[inductor] Rewrite Triton templates + epilogue fusion (retry) (#91105)"
This reverts commit d6dd2e97da.

Reverted https://github.com/pytorch/pytorch/pull/91105 on behalf of https://github.com/atalman due to Broke internal builds
2022-12-21 00:02:38 +00:00
Jason Ansel
d6dd2e97da [inductor] Rewrite Triton templates + epilogue fusion (retry) (#91105)
https://github.com/pytorch/pytorch/pull/90738 seems a bit borked. ghimport fails on it, and I unlinked it from the Phabricator diff, but it still won't land.  This is an exact copy that PR without using ghstack.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91105
Approved by: https://github.com/ngimel
2022-12-20 02:38:23 +00:00
Edward Z. Yang
212873c615 Add dynamic shapes benchmark accuracy to CI (#90444)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90444
Approved by: https://github.com/voznesenskym
2022-12-17 11:17:20 +00:00
PyTorch MergeBot
e2377c8300 Revert "Add dynamic shapes benchmark accuracy to CI (#90444)"
This reverts commit 85db031e60.

Reverted https://github.com/pytorch/pytorch/pull/90444 on behalf of https://github.com/ezyang due to lint failing
2022-12-17 07:18:07 +00:00
Edward Z. Yang
85db031e60 Add dynamic shapes benchmark accuracy to CI (#90444)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90444
Approved by: https://github.com/voznesenskym
2022-12-17 06:39:45 +00:00
Michael Lazos
7c524221ba [reland3][dynamo] Revert "Revert "[reland][dynamo] use optimizers correctly in benchmar… (#90956)
…king (#87492)" (#90746)"

This reverts commit ff1bbc2773.

This should be okay to merge now. The flakiness of HF models will be fixed by seeding the rng (https://github.com/pytorch/pytorch/pull/90936), and the numeric mismatch was root-caused to three decomps (still investigating why those decomps cause this) see https://github.com/pytorch/torchdynamo/issues/1985 for more detail.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90956
Approved by: https://github.com/desertfire
2022-12-17 06:27:15 +00:00
PyTorch MergeBot
6bc6fb21db Revert "[reland2][dynamo] Revert "Revert "[reland][dynamo] use optimizers correctly in benchmar… (#90956)"
This reverts commit 8bc38ae4e2.

Reverted https://github.com/pytorch/pytorch/pull/90956 on behalf of https://github.com/desertfire due to Causing TIMM model failures
2022-12-16 19:28:05 +00:00
Michael Lazos
8bc38ae4e2 [reland2][dynamo] Revert "Revert "[reland][dynamo] use optimizers correctly in benchmar… (#90956)
…king (#87492)" (#90746)"

This reverts commit ff1bbc2773.

This should be okay to merge now. The flakiness of HF models will be fixed by seeding the rng (https://github.com/pytorch/pytorch/pull/90936), and the numeric mismatch was root-caused to three decomps (still investigating why those decomps cause this) see https://github.com/pytorch/torchdynamo/issues/1985 for more detail.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90956
Approved by: https://github.com/desertfire
2022-12-16 13:33:38 +00:00
Bin Bao
ff1bbc2773 Revert "[reland][dynamo] use optimizers correctly in benchmarking (#87492)" (#90746)
This reverts commit d91d7a3221.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90746
Approved by: https://github.com/anijain2305
2022-12-13 11:37:16 +00:00
Animesh Jain
d91d7a3221 [reland][dynamo] use optimizers correctly in benchmarking (#87492)
Reland https://github.com/pytorch/pytorch/pull/87311

mlazos: updated to use SGD to not add a bunch of additional memory allocations (like Adam)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/87492
Approved by: https://github.com/desertfire
2022-12-09 20:32:53 +00:00
Animesh Jain
3162a48a77 [dynamo][benchmarks] Call zero grad (#90026)
Hoping that it might reduce some flakiness

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90026
Approved by: https://github.com/williamwen42
2022-12-02 04:05:57 +00:00
Animesh Jain
68805b08d1 [benchmarks][dynamo] Trying CI - Set train() for TIMM models accuracy tests (#89780)
Moving to train mode for TIMM models and also raising batch size for accuracy testing.

Raising batch size seems to remove a lot of noise/instability coming from batch_norm decomposition.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89780
Approved by: https://github.com/ngimel
2022-11-30 12:57:35 +00:00
Xu Zhao
e4d9dbd7d2 Port torchdynamo's torchbench script to userbenchmark (#89239)
Summary:
This Diff ports the torchbench.py script from torchdynamo to torchbench to support the development of internal models.

Currently, only works with the `--only` option, and can only test one model at a time.

Note that the noisy logs are from upstream model code, not the benchmark code.
In the internal environment, `torch._dynamo.config.base_dir` is not writable, so we add an option to specify the output directory.

Test Plan:
```
$ buck2 run mode/opt //caffe2/benchmarks/dynamo:torchbench -- --performance --only ads_dhen_5x --part over --output-directory /tmp/tb-test/
cuda eval  ads_dhen_5x
  1/  1 +0 frames   2s  1 graphs  1 graph calls  412/ 411 = 100% ops 100% time
```

```
$  buck2 run mode/opt //caffe2/benchmarks/dynamo:torchbench -- --performance --only cmf_10x --part over --output-directory /tmp/tb-test/
cuda eval  cmf_10x
  1/  1 +0 frames   1s  1 graphs  1 graph calls  306/ 305 = 100% ops 100% time
```

Reviewed By: jansel

Differential Revision: D41294311

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89239
Approved by: https://github.com/jansel
2022-11-21 17:25:28 +00:00
Animesh Jain
cad5772c2c [dashboard][huggingface] skip accuracy checks for really large models… (#89273)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89273
Approved by: https://github.com/desertfire
2022-11-19 00:22:45 +00:00
Edward Z. Yang
d596b048e5 Also skip large models for normal --accuracy runs (#88086)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>

cc @mlazos @soumith @voznesenskym @yanboliang @penguinwu @anijain2305 @EikanWang @jgong5 @Guobing-Chen @chunyuan-w @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88086
Approved by: https://github.com/albanD
2022-11-01 00:59:09 +00:00
Will Constable
ee231671c0 Make torchbench setup a function (#87469)
cc @jansel @lezcano @fdrocha @mlazos @soumith @voznesenskym @yanboliang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87469
Approved by: https://github.com/anijain2305
2022-10-21 19:58:38 +00:00
PyTorch MergeBot
f38a88c4dd Revert "[dynamo] use optimizers correctly in benchmarking (#87311)"
This reverts commit 703c19008d.

Reverted https://github.com/pytorch/pytorch/pull/87311 on behalf of https://github.com/anijain2305 due to Bin (desertfire) is trying to get torchbench models in CI, and this PR prevents that. I will bring this back after models are in CI.
2022-10-20 22:01:51 +00:00
Animesh Jain
703c19008d [dynamo] use optimizers correctly in benchmarking (#87311)
We were not setting optimizers correctly

* This hid the issue that we see here - https://github.com/pytorch/torchdynamo/issues/1687
* This has also revealed that we are activating profilers for every dynamo optimized model call. This could affect speedup

cc @jansel @lezcano @fdrocha
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87311
Approved by: https://github.com/mlazos, https://github.com/yanboliang
2022-10-20 05:46:25 +00:00
Animesh Jain
c30cfb07ab [dynamo][dashboard] Run 2 iterations for the correctness runs (#87104)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87104
Approved by: https://github.com/soumith
2022-10-18 15:53:40 +00:00
Jason Ansel
054a2fd6c2 Sync changes from pytorch/torchdynamo (#87013)
This updates to:
6380959be2

Generated with:
https://github.com/pytorch/torchdynamo/blob/main/copy_to_core.sh
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87013
Approved by: https://github.com/voznesenskym
2022-10-15 21:00:57 +00:00
Jason Ansel
8f71e8de7e Sync changes from pytorch/torchdynamo, enable tests (#86950)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86950
Approved by: https://github.com/Chillee
2022-10-14 23:08:58 +00:00
Jason Ansel
c7c09722ad Move TorchDynamo into PyTorch core (#86461)
Context:
https://github.com/pytorch/torchdynamo/issues/1588

This PR moves [TorchDynamo](https://github.com/pytorch/torchdynamo) and TorchInductor into PyTorch core.
- `torchdynamo` becomes `torch._dynamo`
- `torchinductor` becomes `torch._inductor`

This PR was generated by running `copy_to_core.sh` in https://github.com/pytorch/torchdynamo/pull/1538

Pull Request resolved: https://github.com/pytorch/pytorch/pull/86461
Approved by: https://github.com/voznesenskym
2022-10-13 23:18:06 +00:00