370 Commits

Author SHA1 Message Date
BowenBao
30f43e3d89 [ONNX][bench] Deepcopy model to another device before export to avoid OOM (#118710)
Prior to onnx export, the model is deepcopied to avoid modifications that may affect later performance profiling. However this increases the memory requirement on the device.
This PR modifies the script to deepcopy and export the model on another device when possible.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/118710
Approved by: https://github.com/thiagocrepaldi
2024-01-31 23:03:39 +00:00
Simon Fan
ed0ec2e0be Remove dynamo runner's dependency on distributed build (#117903)
This lets us bisect faster without needing to rebuild the distributed module. The annotation is removed to avoid a flake8 undefined-name lint error.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/117903
Approved by: https://github.com/xuzhao9
2024-01-24 06:51:14 +00:00
Bin Bao
4d625c1c92 [AOTI] Fix a bug in the torch._export.aot_load API (#118039)
Summary:
tree_flatten_spec should use args instead of *args

clone of https://github.com/pytorch/pytorch/pull/117948 but with some fbcode specific changes

Test Plan: CI

Differential Revision: D52982401

Pull Request resolved: https://github.com/pytorch/pytorch/pull/118039
Approved by: https://github.com/angelayi
2024-01-23 14:54:02 +00:00
Michael Lazos
f302a0d380 Re-enable SGD (#117434)
Re-enables the SGD optimizer now that compile times are more reasonable. [Benchmark run](https://github.com/pytorch/pytorch/actions/runs/7511073761)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/117434
Approved by: https://github.com/anijain2305, https://github.com/janeyx99
2024-01-19 04:28:50 +00:00
Bin Bao
26956980c6 [AOTI] Add torch._export.aot_load (#117610)
Summary: Add a torch._export.aot_load API that can load an AOTInductor-compiled model.so into a python executable.

Test Plan: CI

Differential Revision: D52825456

Pull Request resolved: https://github.com/pytorch/pytorch/pull/117610
Approved by: https://github.com/angelayi, https://github.com/khabinov, https://github.com/chenyang78
2024-01-18 15:02:16 +00:00
PyTorch MergeBot
b0084be114 Revert "Re-enable SGD (#117434)"
This reverts commit e7fac72be7.

Reverted https://github.com/pytorch/pytorch/pull/117434 on behalf of https://github.com/lezcano due to breaks test_profiler.py when run with dynamo ([comment](https://github.com/pytorch/pytorch/pull/117434#issuecomment-1898311961))
2024-01-18 11:37:36 +00:00
Michael Lazos
e7fac72be7 Re-enable SGD (#117434)
Re-enables the SGD optimizer now that compile times are more reasonable. [Benchmark run](https://github.com/pytorch/pytorch/actions/runs/7511073761)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/117434
Approved by: https://github.com/anijain2305, https://github.com/janeyx99
2024-01-18 06:47:15 +00:00
Simon Fan
4b25948ee6 Torchbench Dynamo Runner: Enable DDP for perf test and traces (#113332)
- Removes an outdated assert that prevents perf tests from running DDP. We now have single node --multiprocess, and perf tests already wrap the model using `deepcopy_and_maybe_ddp`
- Append rank name to traces to avoid all ranks trying to create the same file
- Renames `deepcopy_and_maybe_ddp` to `deepcopy_and_maybe_parallelize` to include FSDP
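
The rank-suffix idea from the second bullet can be sketched as follows. This is a minimal illustration, not the runner's actual code: `rank_trace_path` and the `_rank_N` naming scheme are assumptions made for the example.

```python
import os


def rank_trace_path(path, rank):
    """Insert a rank suffix before the file extension (hypothetical helper)."""
    root, ext = os.path.splitext(path)
    return f"{root}_rank_{rank}{ext}"


# Each rank writes to its own file instead of all ranks racing on one path.
paths = [rank_trace_path("traces/model.json", r) for r in range(2)]
```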

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113332
Approved by: https://github.com/H-Huang, https://github.com/wconstab
2024-01-12 22:41:09 +00:00
Simon Fan
88bf84f106 [benchmark] add --compile-autograd to dynamo benchmarks (#117196)
Adds `--compile-autograd` flag to benchmark suite to run accuracy and performance tests. Also adds autograd_captures and autograd_compiles to dynamo stats

e.g. accuracy_inductor.csv
```
dev,name,batch_size,accuracy,calls_captured,unique_graphs,graph_breaks,unique_graph_breaks,autograd_captures,autograd_compiles
cuda,BERT_pytorch,4,pass,2655,2,8,7,1,1
cuda,Background_Matting,4,pass_due_to_skip,0,0,0,0,0,0
cuda,DALLE2_pytorch,0,eager_fail_to_run,0,0,0,0,0,0
cuda,LearningToPaint,4,pass,639,2,8,7,1,1
...
```

e.g. speedup_inductor.csv
```
dev,name,batch_size,speedup,abs_latency,compilation_latency,compression_ratio,eager_peak_mem,dynamo_peak_mem,calls_captured,unique_graphs,graph_breaks,unique_graph_breaks,autograd_captures,autograd_compiles
cuda,hf_T5,8,1.214311,136.236793,88.350570,0.751322,18.754706,24.962275,3298,2,8,8,1,1
cuda,hf_T5,8,1.226645,135.431856,52.461461,1.040973,18.754706,18.016508,795,1,7,7,0,0
...
```
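
As a rough illustration, the two new columns can be read back with Python's standard `csv` module. The sketch inlines the rows of the accuracy sample above; `summarize` is a hypothetical helper and not part of the benchmark suite.

```python
import csv
import io

# Header and rows copied from the accuracy_inductor.csv sample above.
SAMPLE = """\
dev,name,batch_size,accuracy,calls_captured,unique_graphs,graph_breaks,unique_graph_breaks,autograd_captures,autograd_compiles
cuda,BERT_pytorch,4,pass,2655,2,8,7,1,1
cuda,Background_Matting,4,pass_due_to_skip,0,0,0,0,0,0
cuda,DALLE2_pytorch,0,eager_fail_to_run,0,0,0,0,0,0
cuda,LearningToPaint,4,pass,639,2,8,7,1,1
"""


def summarize(text):
    """Map model name -> (autograd_captures, autograd_compiles) as ints."""
    reader = csv.DictReader(io.StringIO(text))
    return {
        row["name"]: (int(row["autograd_captures"]), int(row["autograd_compiles"]))
        for row in reader
    }


stats = summarize(SAMPLE)
```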

Pull Request resolved: https://github.com/pytorch/pytorch/pull/117196
Approved by: https://github.com/jansel
2024-01-11 20:12:58 +00:00
Bin Bao
7e9cbc6834 [CI] Catch more exception types when running eager in PT2 tests (#117120)
Summary: https://github.com/pytorch/pytorch/actions/runs/7467073391/job/20320251143#step:16:1332 shows a case where model loading fails with a KeyError but the error is not logged in the report csv file, which can cause an eager model failure to be silently ignored in the PT2 integration test.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/117120
Approved by: https://github.com/huydhn
2024-01-11 17:46:11 +00:00
Bin Bao
b8374314cc [AOTI] Update AOTI runner util (#116971)
Summary: Update the runner used in integration tests after https://github.com/pytorch/torchrec/pull/1604

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116971
Approved by: https://github.com/chenyang78
2024-01-09 19:07:54 +00:00
Bin Bao
640d46f823 [inductor] Control the cpp_wrapper mode with an env variable (#116615)
Summary: also add one model test for the cpp_wrapper mode on CI

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116615
Approved by: https://github.com/angelayi
2024-01-02 21:50:25 +00:00
Aaron Gokaslan
bd10fea79a [BE]: Enable F821 and fix bugs (#116579)
Fixes #112371

I tried to fix as many of the bugs as I could. For a few, I could not figure out what the proper fix was, so I left them with noqas.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116579
Approved by: https://github.com/ezyang
2024-01-01 08:40:46 +00:00
Isuru Fernando
a254fbfd61 Initialize variable for all codepaths in dynamo benchmarks (#116260)
Sometimes, the first statement that sets this variable in the try block fails due to out of memory issues and the finally block tries to delete this variable, but it was not written to in the first place.
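
The failure mode can be reproduced in plain Python. The sketch below is an assumption-laden illustration (the names `run_once` and `failing_load` are invented, and a `MemoryError` stands in for a device out-of-memory error): binding the variable before the `try` keeps the `finally` block valid on every path.

```python
def run_once(load):
    """Run `load`, always cleaning up, even if `load` raises."""
    model = None  # bound before the try, so the finally block can always refer to it
    try:
        model = load()
        return "ok"
    except MemoryError:
        return "oom"
    finally:
        # Without the initialization above, this would raise UnboundLocalError
        # whenever `load()` fails before `model` is assigned.
        del model


def failing_load():
    raise MemoryError("simulated out of memory")
```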

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116260
Approved by: https://github.com/lezcano
2023-12-26 05:15:39 +00:00
BowenBao
259b0af367 [ONNX] Add copy before export for perf bench to avoid mutating base model (#115945)
Otherwise the base model might be mutated, which affects the measured performance.
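
The copy-before-export pattern can be shown with the standard library alone. In this sketch a plain dict stands in for the model and `export_in_place` is a hypothetical stand-in for an export step that mutates its input; neither is the benchmark's real code.

```python
import copy


def export_in_place(model):
    """Stand-in for an export step that mutates its input (hypothetical)."""
    model["weights"].append("export_marker")
    return model


base_model = {"weights": [1.0, 2.0]}
# Export a deep copy so the base model used for later timing stays untouched.
exported = export_in_place(copy.deepcopy(base_model))
```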

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115945
Approved by: https://github.com/justinchuby, https://github.com/titaiwangms
2023-12-21 01:20:46 +00:00
Michael Lazos
be90b757d9 Enable compiled Adam in the benchmarks (#116093)
Commit b697bcc583 of mlazos/compiled-adam2 at https://hud.pytorch.org/benchmark/compilers
is an initial benchmark run

Increases compile time by 20s for torchbench and HF, and 30s for TIMM

I expect the compile time to come down significantly with fake tensor prop caching

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116093
Approved by: https://github.com/janeyx99
2023-12-21 00:17:36 +00:00
Michael Lazos
80b1ecc308 Run eager adam optimizer in benchmarks where possible (#115445)
Runs eager Adam (instead of SGD) on all models that don't fail accuracy.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115445
Approved by: https://github.com/desertfire
2023-12-18 18:28:23 +00:00
BowenBao
7e6ec8d3db [ONNX] Add proper iobinding synchronize for ONNX cuda bench (#115773)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115773
Approved by: https://github.com/thiagocrepaldi
ghstack dependencies: #115670, #115673
2023-12-15 00:37:32 +00:00
BowenBao
823523acc0 [ONNX] Dump sarif diagnostics for failed onnx exports in benchmark (#115673)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115673
Approved by: https://github.com/thiagocrepaldi
ghstack dependencies: #115670
2023-12-15 00:37:32 +00:00
BowenBao
0959e67de3 [ONNX] Set correct cuda.current_device for multi-device onnx performance bench (#115670)
Otherwise `torch.cuda.synchronize()` works on a different device from the one that
runs the PyTorch model, which leads to incorrect performance numbers.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115670
Approved by: https://github.com/thiagocrepaldi
2023-12-15 00:37:32 +00:00
haozhe.zhu
6500ccebd7 enable fp16 autocast for dynamo benchmark (#114088)
`--amp` enables the amp path for `CUDA` (default amp_dtype will be float16) and `CPU` (default amp_dtype will be bfloat16).

If users set `--amp_dtype`, the amp_dtype from users will have the highest priority.
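
The precedence described above could look roughly like this. `resolve_amp_dtype` and the string dtype names are assumptions made for the sketch, which only mirrors the stated defaults and does not reproduce the benchmark's argument handling.

```python
# Defaults described in the commit message: float16 on CUDA, bfloat16 on CPU.
DEFAULT_AMP_DTYPE = {"cuda": "float16", "cpu": "bfloat16"}


def resolve_amp_dtype(device, user_amp_dtype=None):
    """Return the dtype name to use for autocast (hypothetical helper)."""
    if user_amp_dtype is not None:
        return user_amp_dtype  # an explicit --amp_dtype takes priority
    return DEFAULT_AMP_DTYPE[device]
```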

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114088
Approved by: https://github.com/jgong5, https://github.com/jansel
2023-12-14 12:38:44 +00:00
Bin Bao
26266c9718 [CI] Call torch.cuda.empty_cache to release device memory (#114663)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114663
Approved by: https://github.com/eellison
2023-12-10 21:27:42 +00:00
Jason Ansel
694cc6af56 [benchmarks] Fix NameError: name 'args' is not defined (#115494)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115494
Approved by: https://github.com/Skylion007, https://github.com/desertfire
2023-12-10 21:22:21 +00:00
Bin Bao
81b565b142 [CI] Fix a missing write_csv_when_exception problem (#115370)
Summary: Fix a problem shown in https://github.com/pytorch/pytorch/actions/runs/7124839624/job/19400589129 when a model times out.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115370
Approved by: https://github.com/eellison
2023-12-08 18:09:53 +00:00
Bin Bao
5f939e32e3 [CI] Log load_model failures in csv (#114784)
Summary: Right now when load_model fails (either because of loading error or validation eager run failure), the result won't be logged in generated csv files. Let's log them in csv so that they are monitored by the expected results checking.
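
A minimal sketch of the idea, assuming invented names (`run_models`, the `status` column, and the `load_failed` label are not taken from the real script): every model produces a csv row, including models whose loading raises.

```python
import csv
import io


def run_models(loaders, out):
    """Write one csv row per model, including models that fail to load."""
    writer = csv.writer(out)
    writer.writerow(["name", "status"])
    for name, load in loaders.items():
        try:
            load()
            status = "pass"
        except Exception as exc:  # broad on purpose: any load error is recorded
            status = f"load_failed: {type(exc).__name__}"
        writer.writerow([name, status])


def broken():
    raise KeyError("missing weight")


buf = io.StringIO()
run_models({"good_model": lambda: None, "bad_model": broken}, buf)
rows = buf.getvalue().splitlines()
```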

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114784
Approved by: https://github.com/malfet
2023-12-06 15:19:16 +00:00
BowenBao
b9c4fb68c5 [ONNX][Bench] Fix model name retrieval and remove unused argument (#115108)
Possibly due to upstream updates, the previous hack no longer picks up model names. This updates it to use another, more appropriate variable.
Also fixes a bug with an unused argument that was supposed to be removed.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115108
Approved by: https://github.com/thiagocrepaldi
2023-12-05 23:55:12 +00:00
BowenBao
77c4565d58 [ONNX][Bench] Remove double export and session init in perf test (#114907)
Previously both the `optimize_ctx` call and the `experiment` call performed export and session creation, doubling the resource cost. This PR makes the `experiment` call reuse the onnx model created by `optimize_ctx`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114907
Approved by: https://github.com/thiagocrepaldi
ghstack dependencies: #110178
2023-12-02 00:17:07 +00:00
BowenBao
baeb0705fe [ONNX][Bench] Add warmup for onnx cuda runs (#114821)
Increases perf accuracy especially for low iteration runs.
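
The effect of warmup on a timing loop can be sketched generically. `timed` and its default `warmup` count are assumptions for illustration; the actual change concerns ONNX CUDA runs, which this standard-library example does not attempt to reproduce.

```python
import time


def timed(fn, iterations, warmup=3):
    """Return mean seconds per call, excluding `warmup` untimed calls."""
    for _ in range(warmup):
        fn()  # untimed: absorbs one-time setup cost
    start = time.perf_counter()
    for _ in range(iterations):
        fn()
    return (time.perf_counter() - start) / iterations


calls = []
mean = timed(lambda: calls.append(1), iterations=5, warmup=2)
```

With few timed iterations, a single slow first call would otherwise dominate the mean, which is why warmup matters most for low iteration counts.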

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114821
Approved by: https://github.com/thiagocrepaldi
ghstack dependencies: #112179, #114767
2023-11-30 20:41:44 +00:00
BowenBao
c1e51fcbfc [ONNX][Bench] Relax tolerance for cuda accuracy check (#114767)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114767
Approved by: https://github.com/thiagocrepaldi
ghstack dependencies: #112179
2023-11-30 04:43:46 +00:00
Bin Bao
ffa974b940 [CI] Dump more detailed error msg in PT2 integration tests (#114683)
Summary: Sometimes a PT2 CI test shows as both pass and infra_error, e.g. https://github.com/pytorch/pytorch/actions/runs/7015184949/job/19086433407. Add more logging to investigate what has happened.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114683
Approved by: https://github.com/eellison
2023-11-29 18:44:23 +00:00
Bin Bao
11277cc510 [CI] Remove an exception catching for Triton compiler error (#113064)
Summary: The workaround dates from when the Triton compiler was at an early stage.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113064
Approved by: https://github.com/eellison
2023-11-28 23:46:30 +00:00
Aaron Gokaslan
9f073ae304 [BE][Easy]: add some PLR pylint checks and exclusions to ruff (#114519)
Add a couple of additional checks and exclusions

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114519
Approved by: https://github.com/jansel
2023-11-28 20:49:03 +00:00
BowenBao
bebe66e262 [ONNX] Benchmark to save sample inputs to disk before running (#114163)
So that even if failures occur during the model run, the sample inputs
remain accessible for later investigation.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114163
Approved by: https://github.com/thiagocrepaldi
ghstack dependencies: #113780
2023-11-22 05:39:00 +00:00
Bin Bao
6ff7260700 [CI] Switch to check against expected result files for cpu inductor integration tests (#113668)
Summary: With this, we can completely remove CI_SKIP from common.py.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113668
Approved by: https://github.com/ezyang, https://github.com/jansel
ghstack dependencies: #113574, #113575, #113446, #113559
2023-11-21 21:20:47 +00:00
Bin Bao
a9f9f98e2f [CI] Switch to check against expected result files for dynamo_eager and aot_eager benchmark tests (#113559)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113559
Approved by: https://github.com/ezyang, https://github.com/jansel
ghstack dependencies: #113574, #113575, #113446
2023-11-21 21:20:47 +00:00
Bin Bao
212f668408 [CI] Remove CI skip list for inductor integration tests (#113446)
Summary: Switch to completely rely on checking against expected result files.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113446
Approved by: https://github.com/ezyang, https://github.com/malfet, https://github.com/jansel
ghstack dependencies: #113574, #113575
2023-11-21 21:20:41 +00:00
leslie-fang-intel
fb3bc3949a [Inductor] remove GPT2ForSequenceClassification from ci skip list (#112100)
**Summary**
As discussed in https://github.com/pytorch/pytorch/issues/109019, the accuracy issue of `GPT2ForSequenceClassification` has been fixed in https://github.com/pytorch/pytorch/pull/108690. Remove it from CI Skip list.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112100
Approved by: https://github.com/lezcano
2023-11-19 05:12:18 +00:00
BowenBao
b169f04170 [ONNX] Fix bench w/ iobinding; Remove cpu fallback (#113703)
Summary
- `TORCH_TO_NUMPY_DTYPE` was previously misplaced, so subclasses could not access it.
- Remove cpu fallback when benching onnx with gpu, so that gpu run failures are exposed properly.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113703
Approved by: https://github.com/thiagocrepaldi
ghstack dependencies: #113404, #113697
2023-11-18 01:33:06 +00:00
Jane Xu
ac08022137 [BE][benchmarks] Minor comment cleanup, typos (#113898)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113898
Approved by: https://github.com/desertfire
2023-11-17 19:03:41 +00:00
eellison
605236af06 Force fp16 for vision_maskrcnn inference (#113110)
Force fp16 for maskrcnn inference (it doesn't support bf16). Also skip phi_1_5 in training, since it OOMs even with batch size 1.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113110
Approved by: https://github.com/xmfan
2023-11-10 02:25:11 +00:00
Bin Bao
f6c00b16c8 [aotinductor] Update the benchmarking script to clone an eager model (#113046)
Summary: fix https://github.com/pytorch/pytorch/issues/113029 where running a model in eager somehow can change a weight stride

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113046
Approved by: https://github.com/angelayi
2023-11-08 22:05:03 +00:00
William Wen
ad1c3467e2 [dynamo] run guard fail hooks for each cache entry for which there is a cache miss (#110325)
Attempt number 2 at https://github.com/pytorch/pytorch/issues/108950.

Improves debugging for guard failures/recompilations by:
- only running guard fail reason generation during recompilation, instead of when a guard fails during dynamo cache lookup (so generating guard failure reasons is not on the critical path)
- ~~always reporting all guard failures~~ Reports the first-failing guard failure for each cache entry.
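
The reporting rule in the second bullet can be modeled with a toy example. This is a sketch under stated assumptions and not Dynamo's implementation: cache entries are represented as plain lists of `(message, check)` guards, and all function names are invented.

```python
def first_failing_guard(guards, value):
    """Return the message of the first guard that rejects `value`, else None."""
    for message, check in guards:
        if not check(value):
            return message
    return None


def recompile_reasons(cache_entries, value):
    """Collect one reason per cache entry that missed (toy model of the behavior)."""
    reasons = []
    for guards in cache_entries:
        reason = first_failing_guard(guards, value)
        if reason is not None:
            reasons.append(reason)
    return reasons


def size_guard(expected):
    """Build a guard that passes only for the given batch size."""
    return (f"size mismatch: expected {expected}", lambda actual: actual == expected)


# Two cached compilations, specialized on batch sizes 17 and 16.
entries = [[size_guard(17)], [size_guard(16)]]
reasons = recompile_reasons(entries, 18)
```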

We don't expect a performance hit since the guard fail reasons are only generated at recompile time rather than runtime. Perf benchmark to check this (https://hud.pytorch.org/benchmark/torchbench/inductor_with_cudagraphs?startTime=Fri,%2027%20Oct%202023%2017:42:43%20GMT&stopTime=Fri,%2003%20Nov%202023%2017:42:43%20GMT&granularity=hour&mode=training&dtype=amp&lBranch=gh/williamwen42/62/head&lCommit=f4724f5ffc6d17ceae513a42fc18627be7b85482&rBranch=main&rCommit=29f3d392bf230072e3bffae37b078e770cae1956). We may also need to verify this on benchmarks where guard fails are common.

Sample script:
```python
import torch
from torchvision.models import resnet18


def generate_data(b):
    # Random image batch and labels for batch size b.
    return (
        torch.randn(b, 3, 32, 32).to(torch.float32).cuda(),
        torch.randint(1000, (b,)).cuda(),
    )


def init_model():
    return resnet18().to(torch.float32).cuda()


model = init_model()
# dynamic=False disables dynamic shapes, so a new batch size triggers recompilation.
model_opt = torch.compile(model, dynamic=False)

for b in range(16, 32):
    data = generate_data(b)
    model_opt(data[0])

Sample logs:
```bash
(/data/users/williamwen/py310-env) [williamwen@devgpu020.odn1 /data/users/williamwen/pytorch (wwen/log-all-guards)]$ python playground5.py
/data/users/williamwen/pytorch/torch/_inductor/compile_fx.py:141: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled. Consider setting `torch.set_float32_matmul_precision('high')` for better performance.
  warnings.warn(
[2023-11-06 14:50:47,605] torch._dynamo.convert_frame: [WARNING] torch._dynamo hit config.cache_size_limit (8)
[2023-11-06 14:50:47,605] torch._dynamo.convert_frame: [WARNING]    function: 'forward' (/data/users/williamwen/torchvision/torchvision/models/resnet.py:284)
[2023-11-06 14:50:47,605] torch._dynamo.convert_frame: [WARNING]    last reason: tensor 'L['x']' size mismatch at index 0. expected 16, actual 24
[2023-11-06 14:50:47,605] torch._dynamo.convert_frame: [WARNING] To log all recompilation reasons, use TORCH_LOGS="recompiles".
[2023-11-06 14:50:47,605] torch._dynamo.convert_frame: [WARNING] To diagnose recompilation issues, see https://pytorch.org/docs/master/compile/troubleshooting.html.
(/data/users/williamwen/py310-env) [williamwen@devgpu020.odn1 /data/users/williamwen/pytorch (wwen/log-all-guards)]$ TORCH_LOGS="recompiles" python playground5.py
/data/users/williamwen/pytorch/torch/_inductor/compile_fx.py:141: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled. Consider setting `torch.set_float32_matmul_precision('high')` for better performance.
  warnings.warn(
[2023-11-06 14:53:31,591] torch._dynamo.guards.__recompiles: [DEBUG] Recompiling function forward in /data/users/williamwen/torchvision/torchvision/models/resnet.py:284
[2023-11-06 14:53:31,591] torch._dynamo.guards.__recompiles: [DEBUG]     triggered by the following guard failure(s):
[2023-11-06 14:53:31,591] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 16, actual 17
[2023-11-06 14:53:41,333] torch._dynamo.guards.__recompiles: [DEBUG] Recompiling function forward in /data/users/williamwen/torchvision/torchvision/models/resnet.py:284
[2023-11-06 14:53:41,333] torch._dynamo.guards.__recompiles: [DEBUG]     triggered by the following guard failure(s):
[2023-11-06 14:53:41,333] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 17, actual 18
[2023-11-06 14:53:41,333] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 16, actual 18
[2023-11-06 14:53:50,463] torch._dynamo.guards.__recompiles: [DEBUG] Recompiling function forward in /data/users/williamwen/torchvision/torchvision/models/resnet.py:284
[2023-11-06 14:53:50,463] torch._dynamo.guards.__recompiles: [DEBUG]     triggered by the following guard failure(s):
[2023-11-06 14:53:50,463] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 18, actual 19
[2023-11-06 14:53:50,463] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 17, actual 19
[2023-11-06 14:53:50,463] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 16, actual 19
[2023-11-06 14:53:59,848] torch._dynamo.guards.__recompiles: [DEBUG] Recompiling function forward in /data/users/williamwen/torchvision/torchvision/models/resnet.py:284
[2023-11-06 14:53:59,848] torch._dynamo.guards.__recompiles: [DEBUG]     triggered by the following guard failure(s):
[2023-11-06 14:53:59,848] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 19, actual 20
[2023-11-06 14:53:59,848] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 18, actual 20
[2023-11-06 14:53:59,848] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 17, actual 20
[2023-11-06 14:53:59,848] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 16, actual 20
[2023-11-06 14:54:08,549] torch._dynamo.guards.__recompiles: [DEBUG] Recompiling function forward in /data/users/williamwen/torchvision/torchvision/models/resnet.py:284
[2023-11-06 14:54:08,549] torch._dynamo.guards.__recompiles: [DEBUG]     triggered by the following guard failure(s):
[2023-11-06 14:54:08,549] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 20, actual 21
[2023-11-06 14:54:08,549] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 19, actual 21
[2023-11-06 14:54:08,549] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 18, actual 21
[2023-11-06 14:54:08,549] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 17, actual 21
[2023-11-06 14:54:08,549] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 16, actual 21
[2023-11-06 14:54:17,795] torch._dynamo.guards.__recompiles: [DEBUG] Recompiling function forward in /data/users/williamwen/torchvision/torchvision/models/resnet.py:284
[2023-11-06 14:54:17,795] torch._dynamo.guards.__recompiles: [DEBUG]     triggered by the following guard failure(s):
[2023-11-06 14:54:17,795] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 21, actual 22
[2023-11-06 14:54:17,795] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 20, actual 22
[2023-11-06 14:54:17,795] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 19, actual 22
[2023-11-06 14:54:17,795] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 18, actual 22
[2023-11-06 14:54:17,795] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 17, actual 22
[2023-11-06 14:54:17,795] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 16, actual 22
[2023-11-06 14:54:27,430] torch._dynamo.guards.__recompiles: [DEBUG] Recompiling function forward in /data/users/williamwen/torchvision/torchvision/models/resnet.py:284
[2023-11-06 14:54:27,430] torch._dynamo.guards.__recompiles: [DEBUG]     triggered by the following guard failure(s):
[2023-11-06 14:54:27,430] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 22, actual 23
[2023-11-06 14:54:27,430] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 21, actual 23
[2023-11-06 14:54:27,430] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 20, actual 23
[2023-11-06 14:54:27,430] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 19, actual 23
[2023-11-06 14:54:27,430] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 18, actual 23
[2023-11-06 14:54:27,430] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 17, actual 23
[2023-11-06 14:54:27,430] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 16, actual 23
[2023-11-06 14:54:36,744] torch._dynamo.guards.__recompiles: [DEBUG] Recompiling function forward in /data/users/williamwen/torchvision/torchvision/models/resnet.py:284
[2023-11-06 14:54:36,744] torch._dynamo.guards.__recompiles: [DEBUG]     triggered by the following guard failure(s):
[2023-11-06 14:54:36,744] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 23, actual 24
[2023-11-06 14:54:36,744] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 22, actual 24
[2023-11-06 14:54:36,744] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 21, actual 24
[2023-11-06 14:54:36,744] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 20, actual 24
[2023-11-06 14:54:36,744] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 19, actual 24
[2023-11-06 14:54:36,744] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 18, actual 24
[2023-11-06 14:54:36,744] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 17, actual 24
[2023-11-06 14:54:36,744] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 16, actual 24
[2023-11-06 14:54:36,744] torch._dynamo.convert_frame: [WARNING] torch._dynamo hit config.cache_size_limit (8)
[2023-11-06 14:54:36,744] torch._dynamo.convert_frame: [WARNING]    function: 'forward' (/data/users/williamwen/torchvision/torchvision/models/resnet.py:284)
[2023-11-06 14:54:36,744] torch._dynamo.convert_frame: [WARNING]    last reason: tensor 'L['x']' size mismatch at index 0. expected 16, actual 24
[2023-11-06 14:54:36,744] torch._dynamo.convert_frame: [WARNING] To log all recompilation reasons, use TORCH_LOGS="recompiles".
[2023-11-06 14:54:36,744] torch._dynamo.convert_frame: [WARNING] To diagnose recompilation issues, see https://pytorch.org/docs/master/compile/troubleshooting.html.
[2023-11-06 14:54:45,922] torch._dynamo.guards.__recompiles: [DEBUG] Recompiling function _forward_impl in /data/users/williamwen/torchvision/torchvision/models/resnet.py:266
[2023-11-06 14:54:45,922] torch._dynamo.guards.__recompiles: [DEBUG]     triggered by the following guard failure(s):
[2023-11-06 14:54:45,922] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 24, actual 25
[2023-11-06 14:54:54,691] torch._dynamo.guards.__recompiles: [DEBUG] Recompiling function _forward_impl in /data/users/williamwen/torchvision/torchvision/models/resnet.py:266
[2023-11-06 14:54:54,691] torch._dynamo.guards.__recompiles: [DEBUG]     triggered by the following guard failure(s):
[2023-11-06 14:54:54,691] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 25, actual 26
[2023-11-06 14:54:54,691] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 24, actual 26
[2023-11-06 14:55:03,591] torch._dynamo.guards.__recompiles: [DEBUG] Recompiling function _forward_impl in /data/users/williamwen/torchvision/torchvision/models/resnet.py:266
[2023-11-06 14:55:03,591] torch._dynamo.guards.__recompiles: [DEBUG]     triggered by the following guard failure(s):
[2023-11-06 14:55:03,591] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 26, actual 27
[2023-11-06 14:55:03,591] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 25, actual 27
[2023-11-06 14:55:03,591] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 24, actual 27
[2023-11-06 14:55:12,384] torch._dynamo.guards.__recompiles: [DEBUG] Recompiling function _forward_impl in /data/users/williamwen/torchvision/torchvision/models/resnet.py:266
[2023-11-06 14:55:12,384] torch._dynamo.guards.__recompiles: [DEBUG]     triggered by the following guard failure(s):
[2023-11-06 14:55:12,384] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 27, actual 28
[2023-11-06 14:55:12,384] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 26, actual 28
[2023-11-06 14:55:12,384] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 25, actual 28
[2023-11-06 14:55:12,384] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 24, actual 28
[2023-11-06 14:55:21,442] torch._dynamo.guards.__recompiles: [DEBUG] Recompiling function _forward_impl in /data/users/williamwen/torchvision/torchvision/models/resnet.py:266
[2023-11-06 14:55:21,442] torch._dynamo.guards.__recompiles: [DEBUG]     triggered by the following guard failure(s):
[2023-11-06 14:55:21,442] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 28, actual 29
[2023-11-06 14:55:21,442] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 27, actual 29
[2023-11-06 14:55:21,442] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 26, actual 29
[2023-11-06 14:55:21,442] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 25, actual 29
[2023-11-06 14:55:21,442] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 24, actual 29
[2023-11-06 14:55:30,315] torch._dynamo.guards.__recompiles: [DEBUG] Recompiling function _forward_impl in /data/users/williamwen/torchvision/torchvision/models/resnet.py:266
[2023-11-06 14:55:30,315] torch._dynamo.guards.__recompiles: [DEBUG]     triggered by the following guard failure(s):
[2023-11-06 14:55:30,315] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 29, actual 30
[2023-11-06 14:55:30,315] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 28, actual 30
[2023-11-06 14:55:30,315] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 27, actual 30
[2023-11-06 14:55:30,315] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 26, actual 30
[2023-11-06 14:55:30,315] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 25, actual 30
[2023-11-06 14:55:30,315] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 24, actual 30
[2023-11-06 14:55:39,839] torch._dynamo.guards.__recompiles: [DEBUG] Recompiling function _forward_impl in /data/users/williamwen/torchvision/torchvision/models/resnet.py:266
[2023-11-06 14:55:39,839] torch._dynamo.guards.__recompiles: [DEBUG]     triggered by the following guard failure(s):
[2023-11-06 14:55:39,839] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 30, actual 31
[2023-11-06 14:55:39,839] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 29, actual 31
[2023-11-06 14:55:39,839] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 28, actual 31
[2023-11-06 14:55:39,839] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 27, actual 31
[2023-11-06 14:55:39,839] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 26, actual 31
[2023-11-06 14:55:39,839] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 25, actual 31
[2023-11-06 14:55:39,839] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 24, actual 31
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110325
Approved by: https://github.com/ezyang, https://github.com/jon-chuang
2023-11-07 20:10:59 +00:00
Peter Bell
65ecb36621 Move ShapeEnv config out of dynamo (#112933)
Previously there was a circular dependency between fx and dynamo that happened
to work out since ShapeEnv didn't access the config at module init time.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112933
Approved by: https://github.com/ezyang
2023-11-07 01:10:25 +00:00
Thiago Crepaldi
eefe327b11 Rename torch.onnx.ExportOutput* to ONNXProgram* (#112263)
In PyTorch 2.1, the torch.export API was introduced, and the term "export"
became overloaded because of the already existing torch.onnx.export API.

The torch.onnx.dynamo_export API was introduced in PyTorch 2.0 and
exposes a torch.onnx.ExportOutput, which can now be confused with the
output of torch.export.export.

To prevent this ambiguity and standardize names around the new
torch.export.ExportedProgram, this PR renames torch.onnx.ExportOutput to
torch.onnx.ONNXProgram.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112263
Approved by: https://github.com/BowenBao
ghstack dependencies: #112444
2023-11-06 22:27:15 +00:00
angelayi
ff35e1e45b [pytree] Add custom treespec fqn field (#112428)
Custom classes that are serialized with pytree are serialized by default with `f"{class.__module__}.{class.__name__}"`. This creates a dependency from our serialized program directly on the outer Python environment: if a user moves the class to a different directory, the serialized program can no longer be loaded. So, we will require users to pass in an FQN if they want to serialize their custom treespec type.
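The fragility described above can be illustrated with a small standalone sketch. It does not use the real pytree registration API; `register`, `lookup`, `default_name`, and the `fqn` argument are hypothetical names chosen only for illustration, and moving the class is simulated by reassigning `__module__`.

```python
from dataclasses import dataclass

# Hypothetical registry: serialized type name -> Python class.
_REGISTRY = {}


def default_name(cls):
    # Implicit name: tied to wherever the class currently lives.
    return f"{cls.__module__}.{cls.__name__}"


def register(cls, fqn=None):
    # An explicit FQN decouples the serialized form from the source layout.
    name = fqn if fqn is not None else default_name(cls)
    _REGISTRY[name] = cls
    return name


def lookup(name):
    return _REGISTRY[name]


@dataclass
class Point:
    x: int
    y: int


stable_name = register(Point, fqn="my_project.Point")
implicit_before = default_name(Point)

# Simulate moving the class to a different module.
Point.__module__ = "my_project.geometry.v2"
implicit_after = default_name(Point)
```

The implicit name changes when the class moves, so anything serialized under the old implicit name would no longer resolve, while the explicitly chosen name stays valid.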

Differential Revision: [D50886366](https://our.internmc.facebook.com/intern/diff/D50886366)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112428
Approved by: https://github.com/suo
2023-11-02 00:26:41 +00:00
Shunting Zhang
a1e222ef02 metric table (#109245)
In dynamo/inductor, it sometimes helps to gather metrics/statistics for each model at different levels, such as the model level, graph level, kernel level, or pair-of-fusion-nodes level. This kind of thing is very easy to do with Scuba, but we only have Scuba in fbcode. This PR builds metric tables to solve part of the problem.

Q: Why not log to stdout/stderr directly?
A: Sometimes we need more structured data. E.g., it would be helpful to gather all the stats in a CSV and then do post-processing (like calculating a geomean). Also, the metric table tags each row with the model name, which is helpful.

Q: What's the difference from speedup_inductor.csv?
A: speedup_inductor.csv is a special case that gathers statistics at the model level, i.e., we have one row for each model. But recording statistics at a finer-grained level, such as per graph, is also helpful.

Example use cases:
- As a follow-up to the benchmark fusion PR, I want to gather all the 'slow' fusions and analyze them. With the metric table, I can easily log slow fusions for each model into a CSV file. Here is the log gathered for huggingface:
 https://gist.github.com/shunting314/964e73cc98368b301414ec7b7ad4c702 .
- To help understand the effect of the 'loop ordering after fusion' PR, it would be helpful to gather stats like how many fusions happen for each graph. Previously we logged these metrics to stderr directly, but logging them in a structured way is useful.
- Gather the number of registers, register spills, and shared memory usage for each kernel in each model, with runnable kernel code logged.
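
The idea of a metric table can be sketched with the standard library alone. This is not the implementation added by the PR; the `MetricTable` class, its methods, and the column names below are assumptions made for illustration. It only shows the two properties mentioned above: rows have a fixed set of columns, and every row is tagged with the model name.

```python
import csv
import io


class MetricTable:
    """Hypothetical structured metric log: one CSV row per recorded event."""

    def __init__(self, name, columns):
        self.name = name
        self.columns = list(columns)
        self.rows = []

    def add_row(self, model_name, **values):
        # Reject rows that do not match the declared columns exactly.
        if set(values) != set(self.columns):
            raise ValueError(f"expected columns {self.columns}, got {sorted(values)}")
        # Tag every row with the model it came from.
        self.rows.append({"model_name": model_name, **values})

    def to_csv(self):
        buf = io.StringIO()
        writer = csv.DictWriter(
            buf, fieldnames=["model_name", *self.columns], lineterminator="\n"
        )
        writer.writeheader()
        writer.writerows(self.rows)
        return buf.getvalue()


# Made-up example data: kernel names and speedups are placeholders.
slow_fusion = MetricTable("slow_fusion", ["kernel1", "kernel2", "speedup"])
slow_fusion.add_row("model_a", kernel1="k0", kernel2="k1", speedup=0.91)
slow_fusion.add_row("model_b", kernel1="k2", kernel2="k3", speedup=0.87)
csv_text = slow_fusion.to_csv()
```

Because the output is plain CSV, post-processing such as computing a geomean per model can be done afterwards with any tabular tool.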

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109245
Approved by: https://github.com/jansel, https://github.com/mlazos
2023-11-01 02:33:42 +00:00
Peter Bell
bbd5b935e4 Use pytree.tree_leaves everywhere (#112324)
This changes all the instances I could find of `tree_flatten(...)[0]` or
`x, _ = tree_flatten` to use `tree_leaves`.
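
The equivalence behind this refactor can be shown with a toy pytree over nested lists, tuples, and dicts. The two functions below are simplified stand-ins written for illustration, not the versions in `torch.utils._pytree`; the spec format in particular is an assumption.

```python
def tree_flatten(tree):
    """Toy flatten: return (leaves, spec) for nested lists, tuples and dicts."""
    if isinstance(tree, dict):
        keys, children = list(tree.keys()), list(tree.values())
    elif isinstance(tree, (list, tuple)):
        keys, children = None, list(tree)
    else:
        return [tree], ("leaf",)
    leaves, child_specs = [], []
    for child in children:
        child_leaves, child_spec = tree_flatten(child)
        leaves.extend(child_leaves)
        child_specs.append(child_spec)
    return leaves, (type(tree).__name__, keys, child_specs)


def tree_leaves(tree):
    """Toy leaves-only traversal: skips building the spec entirely."""
    if isinstance(tree, dict):
        children = tree.values()
    elif isinstance(tree, (list, tuple)):
        children = tree
    else:
        return [tree]
    return [leaf for child in children for leaf in tree_leaves(child)]


nested = {"a": [1, 2], "b": (3, {"c": 4})}

old_style = tree_flatten(nested)[0]  # builds a spec, then discards it
new_style = tree_leaves(nested)      # states the intent directly
```

Both calls yield the same leaves; the second avoids constructing a spec that is never used.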

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112324
Approved by: https://github.com/lezcano
ghstack dependencies: #112327, #112323
2023-10-30 03:39:04 +00:00
angelayi
b126adcdee [aotinductor] Pass TorchIR to AOTInductor (#110020)
Updates `_export.aot_compile` to pass a torch IR graph to inductor, allowing inductor to now run the pre_grad_passes and reuse more of inductor's code.
Also updates the API to return only the `so_path`, not the exported program. The pytree call spec is now serialized and placed inside the generated model code. When calling the model, because there is no C++ pytree implementation linked yet, we can access the call specs through `get_call_spec()` and call pytree flatten/unflatten in Python.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110020
Approved by: https://github.com/desertfire
2023-10-26 15:54:31 +00:00
Simon Fan
9e6c97890b Dynamo runner: add FSDP handcrafted module wrapping policy (#111505)
The default size-based auto wrap policy may not be representative of actual usage of the models. We add support for a few handpicked models, and fall back to the size-based policy otherwise.
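
The selection logic can be sketched as follows. This is a minimal illustration and not the runner's code: the model name, the class name, the parameter threshold, and the policy signature `(class_name, num_params)` are all assumptions.

```python
# Hypothetical table of handcrafted policies: model name -> module classes to wrap.
HANDCRAFTED_WRAP_CLASSES = {
    "model_a": {"TransformerBlock"},
}


def size_based_policy(class_name, num_params, min_num_params=1000):
    # Fallback: wrap any module that is large enough, regardless of its type.
    return num_params >= min_num_params


def make_wrap_policy(model_name):
    wrap_classes = HANDCRAFTED_WRAP_CLASSES.get(model_name)
    if wrap_classes is None:
        # No handcrafted entry for this model: use the size-based default.
        return size_based_policy

    def class_based_policy(class_name, num_params):
        # Handcrafted: wrap exactly the listed module classes.
        return class_name in wrap_classes

    return class_based_policy


policy_a = make_wrap_policy("model_a")
policy_other = make_wrap_policy("unlisted_model")
```

A handcrafted entry wraps by module type and ignores size, while unlisted models keep the size-based behavior.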

sample command:
`PYTHONPATH=~/benchmark/ python benchmarks/dynamo/torchbench.py -dcuda --training --backend=inductor --multiprocess --performance --only nanogpt --fsdp`

1.257x
1.256x
1.257x
1.252x
1.257x
1.262x
1.258x
1.272x

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111505
Approved by: https://github.com/H-Huang, https://github.com/xuzhao9
2023-10-25 03:05:31 +00:00
BowenBao
ad4971c0b1 Delete deepcopied model after use in benchmark to reduce memory consumption (#111868)
As title.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111868
Approved by: https://github.com/msaroufim, https://github.com/thiagocrepaldi
ghstack dependencies: #111867, #111593
2023-10-24 23:44:14 +00:00