Commit Graph

75 Commits

Author SHA1 Message Date
Edward Z. Yang
b92a7afed9 Reclassify some dynamic aot_eager failures as static failures (#92376)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92376
Approved by: https://github.com/Chillee
2023-01-18 19:27:11 +00:00
Wu, Chunyuan
3aa6cec18c [dynamo] exclude reset_rng_state when measure timing (#92237)
Fixes inductor performance regression on CPU: https://github.com/pytorch/torchdynamo/issues/2027, https://github.com/pytorch/torchdynamo/issues/2028 and https://github.com/pytorch/torchdynamo/issues/2029.
The details are explained here: https://github.com/pytorch/torchdynamo/issues/2028#issuecomment-1381496678.

### Performance

- Model: lennard_jones
- Machine: IceLake (32 cores per socket)
- Configuration: single instance, 32 cores per instance
- jemalloc and iomp enabled

```bash
python benchmarks/dynamo/torchbench.py  --inductor-settings --inductor --performance --float32 -dcpu -n5000  --no-skip --dashboard --only=lennard_jones --quiet
```

<html xmlns:v="urn:schemas-microsoft-com:vml"
xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:x="urn:schemas-microsoft-com:office:excel"
xmlns="http://www.w3.org/TR/REC-html40">

<head>

<meta name=ProgId content=Excel.Sheet>
<meta name=Generator content="Microsoft Excel 15">
<link id=Main-File rel=Main-File
href="file:///C:/Users/chunyuan/AppData/Local/Temp/msohtmlclip1/01/clip.htm">
<link rel=File-List
href="file:///C:/Users/chunyuan/AppData/Local/Temp/msohtmlclip1/01/clip_filelist.xml">

</head>

<body link="#0563C1" vlink="#954F72">

Time before   regression | Time after regression | Time with this PR
-- | -- | --
0.00020483799744397402 | 0.0002818034990923479 | 0.00020241099991835654

</body>

</html>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92237
Approved by: https://github.com/jgong5, https://github.com/jansel
2023-01-18 13:17:28 +00:00
Edward Z. Yang
fbbb19599a Update dynamic skips after #92076 (#92103)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92103
Approved by: https://github.com/voznesenskym, https://github.com/Chillee
2023-01-13 04:05:10 +00:00
Edward Z. Yang
74cbf058a5 Support --dynamic-ci-skips (#91893)
This makes it easier for us to run only the skipped benchmarks and
see if that actually started passing.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91893
Approved by: https://github.com/albanD
2023-01-11 20:02:58 +00:00
Edward Z. Yang
d24324bf1d s/INDCUTOR/INDUCTOR/ (#91885)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91885
Approved by: https://github.com/Skylion007, https://github.com/atalman, https://github.com/malfet
2023-01-11 12:28:19 +00:00
Edward Z. Yang
56ed976edf hrnet_w18, tts_angular works with dynamic shapes (#91891)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91891
Approved by: https://github.com/voznesenskym
2023-01-11 11:40:16 +00:00
blzheng
0c1777acec Dynamo benchmark: add CPU specific changes (#88477)
This pr adds some CPU specific changes:

- Add support for IPEX backend
- https://github.com/pytorch/torchdynamo/issues/1618
- https://github.com/pytorch/torchdynamo/issues/1534
- Enable CPU launcher in runner.py.
- Fix the issue that some environment variables are not support on CPU

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88477
Approved by: https://github.com/jgong5, https://github.com/jansel
2023-01-07 09:26:06 +00:00
Shunting Zhang
a5f32f8978 training support for dynamo+torchxla integration (#88449)
We've already shown some promising perf result by integrating dynamo with torchxla for inference. To provide consistent UX for training and for inference, in this PR we try to enable training for dynamo/torchxla.

Training is trickier than inference and we may not expect much perf gains since
1. in training case, torchxla only generate a single combined graph for fwd/bwd/optimizer while in `torchxla_trace_once` bridge we added in dynamo, due to how AOT_Autograd works, we will generate 3 graphs: one for forward, one for backward and one for the optimizer. XLA favors larger graph to do more optimizations.
2. in training case, tracing overhead can be overlapped with computation. Tracing overhead is not as a big deal for training as for inference. After all training cares more about throughput while inference cares more about latency.
3. in training case, people can increase batch size to 'mitigate' the tracing overhead. Increase batch size does not change tracing overhead, thus it shows like the tracing overhead 'per example' reduces.

But we still want to add training support to dynamo/torchxla to make the work complete.

We added '--iterations-per-run' argument to control how may iterations we do per measure/device sync. This is to understand the impact of item 2 above.

Results:

With '--iterations-per-run' equals to 1, here are the perf numbers:
```
+-------------------------+--------------------+-------------------------+
| Model                   |   XLA (trace once) |   XLA (trace everytime) |
+=========================+====================+=========================+
| resnet18                |             0.91   |                0.959    |
+-------------------------+--------------------+-------------------------+
| resnet50                |             0.917  |                0.932    |
+-------------------------+--------------------+-------------------------+
| resnext50_32x4d         |             0.912  |                0.905    |
+-------------------------+--------------------+-------------------------+
| alexnet                 |             1.038  |                0.974    |
+-------------------------+--------------------+-------------------------+
| mobilenet_v2            |             0.881  |                0.835    |
+-------------------------+--------------------+-------------------------+
| mnasnet1_0              |             0.903  |                0.931    |
+-------------------------+--------------------+-------------------------+
| vgg16                   |             0.914  |                0.967    |
+-------------------------+--------------------+-------------------------+
| BERT_pytorch            |             1.359  |                0.84     |
+-------------------------+--------------------+-------------------------+
| timm_vision_transformer |             1.288  |                0.893    |
+-------------------------+--------------------+-------------------------+
| geomean                 |             1.0006 |                0.913794 |
+-------------------------+--------------------+-------------------------+
```

Overall it looks like graph break indeed cause perf loss. But for BERT_pytorch and timm_vision_transformer we still see perf gain. We need do more experiments with larger '--iterations-per-run'

NOTE:
In torchbench.py I added the following code to do a few workaround:
```
from myscripts import workaround # TODO will remove this line before landing
```

Here are the content of workaround.py:
```
import torch
from torch import nn
import os

# override max_pool2d with avg_pool2d
if os.environ.get("REPLACE_MAXPOOL", "0") == "1":
    torch.nn.MaxPool2d = torch.nn.AvgPool2d

```

It work around a few issues we found
1. MaxPool2d does not work for training in dynamo/torchxla: https://github.com/pytorch/torchdynamo/issues/1837 . WIP fix from Brian in https://github.com/pytorch/pytorch/pull/90226 , https://github.com/pytorch/xla/pull/4276/files (WIP)
2. recent change ( this PR https://github.com/pytorch/pytorch/pull/88697 ) in op decomposition cause batch_norm ops to fallback in torchxla. Fix from jack in https://github.com/pytorch/xla/pull/4282#event-7969608134 . (confirmed the fix after adding Deduper to handle duplicated return from fx graph generated by AOTAutograd)
3. we have issue to handle dropout because of random seed out of sync issue. Here is the fix: https://github.com/pytorch/xla/pull/4293 (confirmed the fix)

Example command:
```
REPLACE_MAXPOOL=1 USE_FAKE_TENSOR=0 GPU_NUM_DEVICES=1 python benchmarks/dynamo/torchbench.py --randomize-input --performance --trace-on-xla --training --backend=aot_torchxla_trace_once --only vgg16
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88449
Approved by: https://github.com/wconstab, https://github.com/qihqi, https://github.com/malfet
2023-01-05 19:59:34 +00:00
Bin Bao
6bf0e3b697 [inductor] Check for BackendCompilerFailed on CI (#91634)
Summary: https://github.com/pytorch/pytorch/pull/91283/ skips certain
random triton failure on CI, but we need to check against the
BackendCompilerFailed exception type.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91634
Approved by: https://github.com/ngimel
2023-01-03 22:38:29 +00:00
Animesh Jain
a32916190d buck-related minifier work (#91215)
Summary: Extending the minifier to generate buck target

Test Plan: N/A

Differential Revision: D42173893

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91215
Approved by: https://github.com/bertmaher, https://github.com/ngimel
2022-12-22 19:33:50 +00:00
Bin Bao
07c61685c8 [inductor] CI improvments (#91283)
Summary:
1) Setting torch.backends.cudnn.deterministic to True helps to
eliminate the eager_variance failures seen on CI
2) Skip Triton failure instead of retry
3) Some minor script cleanup is also included in this PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91283
Approved by: https://github.com/anijain2305
2022-12-22 15:37:43 +00:00
Michael Lazos
2f5759eaba Disable non-deterministic models for optimizers (#91149)
These two models are non-deterministic even with constant inputs + weights and sometimes fail with variations between the fp64 and fp32 models in CI very rarely as a result.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91149
Approved by: https://github.com/desertfire
2022-12-20 20:19:54 +00:00
Bin Bao
84e73e1269 [inductor] small CI improvements (#91140)
Summary: 1) Increase timm_model download retry times; 2) Skip certain
random triton failures.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91140
Approved by: https://github.com/williamwen42
2022-12-20 17:26:12 +00:00
Michael Lazos
07c340bb2a Remove debug code (#91148)
Removes some debug code

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91148
Approved by: https://github.com/desertfire, https://github.com/williamwen42
2022-12-20 15:00:55 +00:00
Bin Bao
2a37ba8e81 [inductor] Add retry after benchmark test fails on CI (#90808)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90808
Approved by: https://github.com/malfet
2022-12-19 18:10:55 +00:00
Michael Lazos
1accd915a4 Re-enable optimizers (#90709)
Fixes
https://github.com/pytorch/pytorch/issues/90165
https://github.com/pytorch/torchdynamo/issues/328

Re-enables optimizer capture + compilation now that the dynamo slowdowns have been fixed

and it has speedups, numbers to come soon

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90709
Approved by: https://github.com/anijain2305, https://github.com/jansel, https://github.com/yanboliang
2022-12-19 04:07:41 +00:00
Edward Z. Yang
212873c615 Add dynamic shapes benchmark accuracy to CI (#90444)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90444
Approved by: https://github.com/voznesenskym
2022-12-17 11:17:20 +00:00
PyTorch MergeBot
e2377c8300 Revert "Add dynamic shapes benchmark accuracy to CI (#90444)"
This reverts commit 85db031e60.

Reverted https://github.com/pytorch/pytorch/pull/90444 on behalf of https://github.com/ezyang due to lint failing
2022-12-17 07:18:07 +00:00
Edward Z. Yang
85db031e60 Add dynamic shapes benchmark accuracy to CI (#90444)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90444
Approved by: https://github.com/voznesenskym
2022-12-17 06:39:45 +00:00
Michael Lazos
7c524221ba [reland3][dynamo] Revert "Revert "[reland][dynamo] use optimizers correctly in benchmar… (#90956)
…king (#87492)" (#90746)"

This reverts commit ff1bbc2773.

This should be okay to merge now. The flakiness of HF models will be fixed by seeding the rng (https://github.com/pytorch/pytorch/pull/90936), and the numeric mismatch was root-caused to three decomps (still investigating why those decomps cause this) see https://github.com/pytorch/torchdynamo/issues/1985 for more detail.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90956
Approved by: https://github.com/desertfire
2022-12-17 06:27:15 +00:00
PyTorch MergeBot
6bc6fb21db Revert "[reland2][dynamo] Revert "Revert "[reland][dynamo] use optimizers correctly in benchmar… (#90956)"
This reverts commit 8bc38ae4e2.

Reverted https://github.com/pytorch/pytorch/pull/90956 on behalf of https://github.com/desertfire due to Causing TIMM model failures
2022-12-16 19:28:05 +00:00
Michael Lazos
8bc38ae4e2 [reland2][dynamo] Revert "Revert "[reland][dynamo] use optimizers correctly in benchmar… (#90956)
…king (#87492)" (#90746)"

This reverts commit ff1bbc2773.

This should be okay to merge now. The flakiness of HF models will be fixed by seeding the rng (https://github.com/pytorch/pytorch/pull/90936), and the numeric mismatch was root-caused to three decomps (still investigating why those decomps cause this) see https://github.com/pytorch/torchdynamo/issues/1985 for more detail.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90956
Approved by: https://github.com/desertfire
2022-12-16 13:33:38 +00:00
Bin Bao
ad4189c8db [reland][inductor] Update TIMM skip list (#90762)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90762
Approved by: https://github.com/eellison
2022-12-13 19:56:23 +00:00
Bin Bao
ff1bbc2773 Revert "[reland][dynamo] use optimizers correctly in benchmarking (#87492)" (#90746)
This reverts commit d91d7a3221.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90746
Approved by: https://github.com/anijain2305
2022-12-13 11:37:16 +00:00
PyTorch MergeBot
e37c8c8436 Revert "[inductor] Update TIMM skip list (#90188)"
This reverts commit fd3f5d7bf7.

Reverted https://github.com/pytorch/pytorch/pull/90188 on behalf of https://github.com/desertfire due to flaky accuracy failure
2022-12-12 15:31:50 +00:00
Edward Z. Yang
e1ed5ad5a5 Add a timeout to benchmark script (#90634)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90634
Approved by: https://github.com/voznesenskym
2022-12-11 23:12:29 +00:00
Jiong Gong
181d37475d Simple fix: add missing positional arg in init_optimizer() call (#90641)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90641
Approved by: https://github.com/kit1980
2022-12-11 13:18:05 +00:00
Bin Bao
fd3f5d7bf7 [inductor] Update TIMM skip list (#90188)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90188
Approved by: https://github.com/anijain2305
2022-12-09 21:30:23 +00:00
Animesh Jain
d91d7a3221 [reland][dynamo] use optimizers correctly in benchmarking (#87492)
Reland https://github.com/pytorch/pytorch/pull/87311

mlazos: updated to use SGD to not add a bunch of additional memory allocations (like Adam)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/87492
Approved by: https://github.com/desertfire
2022-12-09 20:32:53 +00:00
Ram Rachum
351d73b97f Fix exception causes all over the codebase (#90271)
This is the continuation to #90134 and hopefully the final PR in this series.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90271
Approved by: https://github.com/kit1980
2022-12-07 04:29:00 +00:00
David Berard
8f079b895b [Dynamo+FSDP] Update benchmarks with use_orig_params=True (#90100)
After https://github.com/pytorch/pytorch/pull/89523, we now need to assert use_orig_params=True, even in the non-recursive case where (I think) we wouldn't otherwise need to run with use_orig_params=True.

Tested with `python benchmarks/dynamo/torchbench.py --training --accuracy --only hf_T5 --fsdp`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90100
Approved by: https://github.com/wconstab
2022-12-07 03:33:58 +00:00
Richard Zou
4068c5467d [Reland] Move functorch/_src to torch/_functorch (#88756) (#90091)
This will be the last disruptive functorch internals change.

Why are we moving these files?
- As a part of rationalizing functorch we are moving the code in
functorch/_src to torch/_functorch
- This is so that we can offer the functorch APIs as native PyTorch APIs
(coming soon) and resolve some internal build issues.

Why are we moving all of these files at once?
- It's better to break developers all at once rather than many times

Test Plan:
- wait for tests

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90091
Approved by: https://github.com/anijain2305, https://github.com/ezyang
2022-12-03 14:17:15 +00:00
Wang, Eikan
0bde810572 Add more debug information for Inductor (#90008)
- Add graph index to the profile information of the Inductor kernel for better debugability.

  The generated code for different graphs could produce kernels with the same name. The side effect is that it is hard to identify the portion of E2E performance for these kernels because the profiler will aggregate the performance with the same kernel name regardless of different graphs. Hence, this PR added the graph index to the profile information to address this limitation.

- Label arbitrary code ranges for `eager` and `opt` modes for better debugability

  The profile information of dynamo benchmarks mixes the eager mode and opt mode. It is hard to separate the range for different modes. This PR added eager and opt marks to the profile information to address this limitation.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90008
Approved by: https://github.com/jgong5, https://github.com/jansel
2022-12-02 09:34:48 +00:00
Animesh Jain
3162a48a77 [dynamo][benchmarks] Call zero grad (#90026)
Hoping that it might reduce some flakiness

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90026
Approved by: https://github.com/williamwen42
2022-12-02 04:05:57 +00:00
Animesh Jain
68805b08d1 [benchmarks][dynamo] Trying CI - Set train() for TIMM models accuracy tests (#89780)
Moving to train mode for TIMM models and also raising batch size for accuracy testing.

Raising batch size seems to remove a lot of noise/instability coming from batch_norm decomposition.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89780
Approved by: https://github.com/ngimel
2022-11-30 12:57:35 +00:00
PyTorch MergeBot
218d9c6e09 Revert "Move functorch/_src to torch/_functorch (#88756)"
This reverts commit 52bc5c1cfe.

Reverted https://github.com/pytorch/pytorch/pull/88756 on behalf of https://github.com/clee2000 due to broke imports in tests 52bc5c1cfe https://github.com/pytorch/pytorch/actions/runs/3574742513/jobs/6010814968 probably a landrace
2022-11-29 17:17:11 +00:00
Richard Zou
52bc5c1cfe Move functorch/_src to torch/_functorch (#88756)
This will be the last disruptive functorch internals change.

Why are we moving these files?
- As a part of rationalizing functorch we are moving the code in
functorch/_src to torch/_functorch
- This is so that we can offer the functorch APIs as native PyTorch APIs
(coming soon) and resolve some internal build issues.

Why are we moving all of these files at once?
- It's better to break developers all at once rather than many times

Test Plan:
- wait for tests

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88756
Approved by: https://github.com/ezyang
2022-11-29 13:55:42 +00:00
Bin Bao
465ee7bc09 [inductor] skip dm_nfnet_f0 in TIMM model test (#89768)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89768
Approved by: https://github.com/clee2000
2022-11-28 20:08:41 +00:00
Animesh Jain
cdf4087597 [benchmarks] Disabling gradscaler (#89741)
Disabling Gradscaler because
 1) Benchmark setup runs 2 iterations of fwd-bwd. So, not useful.
 2) Current setup shares grad_scaler for eager and dynamo model,
 which is bad as Gradscaler has state and can adjust the scaling
 factor between eager and dynamo run, making accuracy check
 harder.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89741
Approved by: https://github.com/ngimel
2022-11-28 20:08:37 +00:00
Bin Bao
049a0f2cd5 [inductor] Update CI model tests (#89499)
Summary:
1) Add model inference test
2) Switch model training test to use AMP

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89499
Approved by: https://github.com/bertmaher
2022-11-23 18:30:51 +00:00
Edward Z. Yang
ed32511974 Don't use explain() for --explain; instead read it off the counters (#89518)
Fixes huggingface problem where example_inputs is not actually the
args.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89518
Approved by: https://github.com/albanD
2022-11-23 02:43:53 +00:00
Will Constable
26322544b8 Add limited FSDP correctness to torchdynamo benchmark (#89469)
- Does not do recursive wrapping
- Only supports accuracy bench
- Mainly useful for sweeping over models for correctness, in part
  to evaluate whether dynamo support for FSDP is breaking anywhere

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89469
Approved by: https://github.com/davidberard98, https://github.com/aazzolini
2022-11-23 00:19:36 +00:00
Animesh Jain
f281f435a8 Fix benchmarks - xla tensor test (#89509)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89509
Approved by: https://github.com/ngimel, https://github.com/shunting314
2022-11-22 18:42:13 +00:00
Shunting Zhang
e545caa50f dynamo/torchxla integration: trace on xla rather than eager (#88904)
In #87741 we added the inference support for dynamo/torchxla integration. Later on in #88449 we attempt to add the training support. That attempt is not smooth because
- we try 2 things together
   1. let dynamo trace the model on xla rather than eager
   2. enable training
- It turns out neither of these two tasks are trivial enough.

Furthermore, item 2 (enable training) depends on item 1 (tracing on xla). We enable training via AOTAutograd. AOTAutograd lift all model parameters/buffers as graph inputs. Without item 1 being done, we would need copy all graph inputs (including model parameters/buffers) from eager device to xla devices. That hurts performance a lot. Have a cache to map eager parameter to XLA parameter does not solve the problem since the update on either will not sync automatically to the other. They will easily go out of sync.

This PR let dynamo trace the model on XLA rather than eager. This is a preparation step to enabling training.

Also, tracing on XLA makes the data movement more efficient. We see 1.5x geomean speedup compared to previous 1.38x.
```
+-------------------------+--------------------+-------------------------+
| Model                   |   XLA (trace once) |   XLA (trace everytime) |
+=========================+====================+=========================+
| resnet18                |            1.38    |                 1.008   |
+-------------------------+--------------------+-------------------------+
| resnet50                |            1.227   |                 0.998   |
+-------------------------+--------------------+-------------------------+
| resnext50_32x4d         |            1.544   |                 1.008   |
+-------------------------+--------------------+-------------------------+
| alexnet                 |            1.085   |                 1.045   |
+-------------------------+--------------------+-------------------------+
| mobilenet_v2            |            2.028   |                 1.013   |
+-------------------------+--------------------+-------------------------+
| mnasnet1_0              |            1.516   |                 0.995   |
+-------------------------+--------------------+-------------------------+
| squeezenet1_1           |            0.868   |                 1.01    |
+-------------------------+--------------------+-------------------------+
| vgg16                   |            1.099   |                 1.008   |
+-------------------------+--------------------+-------------------------+
| BERT_pytorch            |            3.26    |                 1.027   |
+-------------------------+--------------------+-------------------------+
| timm_vision_transformer |            2.182   |                 1.015   |
+-------------------------+--------------------+-------------------------+
| geomean                 |            1.50389 |                 1.01261 |
+-------------------------+--------------------+-------------------------+
```

Example command
```
GPU_NUM_DEVICES=1 python benchmarks/dynamo/torchbench.py --randomize-input --performance --trace-on-xla --only resnet18 --backend=torchxla_trace_once
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88904
Approved by: https://github.com/wconstab, https://github.com/JackCaoG, https://github.com/jansel
2022-11-22 03:57:04 +00:00
Xu Zhao
e4d9dbd7d2 Port torchdynamo's torchbench script to userbenchmark (#89239)
Summary:
This Diff ports the torchbench.py script from torchdynamo to torchbench to support the development of internal models.

Currently, only works with the `--only` option, and can only test one model at a time.

Note that the noisy logs are from upstream model code, not the benchmark code.
In the internal environment, `torch._dynamo.config.base_dir` is not writable, so we add an option to specify the output directory.

Test Plan:
```
$ buck2 run mode/opt //caffe2/benchmarks/dynamo:torchbench -- --performance --only ads_dhen_5x --part over --output-directory /tmp/tb-test/
cuda eval  ads_dhen_5x
  1/  1 +0 frames   2s  1 graphs  1 graph calls  412/ 411 = 100% ops 100% time
```

```
$  buck2 run mode/opt //caffe2/benchmarks/dynamo:torchbench -- --performance --only cmf_10x --part over --output-directory /tmp/tb-test/
cuda eval  cmf_10x
  1/  1 +0 frames   1s  1 graphs  1 graph calls  306/ 305 = 100% ops 100% time
```

Reviewed By: jansel

Differential Revision: D41294311

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89239
Approved by: https://github.com/jansel
2022-11-21 17:25:28 +00:00
Michael Voznesensky
631baecbcd Add --explain flag to bench (#89316)
TORCHDYNAMO_DYNAMIC_SHAPES=1 AOT_DYNAMIC_SHAPES=1 time python benchmarks/dynamo/torchbench.py  --accuracy --explain  --backend aot_eager --train --only BERT_pytorch

Dynamo produced 76 graphs with 75 graph break and 198 ops

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89316
Approved by: https://github.com/ezyang
2022-11-19 03:35:09 +00:00
Bin Bao
19fcb80551 [inductor] Skip DALLE2_pytorch in torchbench (#89288)
Summary: DALLE2_pytorch fails in eager as well.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89288
Approved by: https://github.com/Krovatkin
2022-11-18 16:21:17 +00:00
Bin Bao
1f7c0ff6e7 [inductor] Temporarily disable functorch_dp_cifar10 test in TorchBench (#89281)
Summary: The failure wasn't caught because of a land race. Skip the test
for now.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89281
Approved by: https://github.com/Krovatkin
2022-11-18 16:07:44 +00:00
Bin Bao
31b10e7d40 Enable inductor CI for TorchBench (#87465)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87465
Approved by: https://github.com/malfet
2022-11-17 23:16:21 +00:00
Animesh Jain
74610a1ced [dynamo][benchmarks] HF - Fix seq len and batch sizes (#89165)
Fixes many models in https://github.com/pytorch/torchdynamo/issues/1842
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89165
Approved by: https://github.com/ngimel
2022-11-17 06:14:24 +00:00