mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-06 12:20:52 +01:00


Laith Sakka 39df901b2a Introduce definitely_contiguous and use it for reshape and tensor metadata computation (#153432). When a tensor has unbacked symbols, it can be general enough to represent both contiguous and non-contiguous tensors, in which case we cannot really evaluate is_contiguous. In many places in the code base we check is_contiguous to take a fast path, but the general path usually works for both contiguous and non-contiguous tensors; in those cases we probably want to use the definitely_contiguous API. This is applied to reshape in this PR and also to tensor metadata computation: the metadata now has an attribute saying the tensor is contiguous only when it is always contiguous, and we store it only if definitely_contiguous is true. Pull Request resolved: https://github.com/pytorch/pytorch/pull/153432 Approved by: https://github.com/bobrenjc93		2025-05-28 03:41:26 +00:00
ci_expected_accuracy	update torchvision pin (#154255 )	2025-05-27 16:15:25 +00:00
microbenchmarks	Add `flag _metrics_log_runtime` to disable runtime metric logging by default (#153506 )	2025-05-22 01:02:11 +00:00
pr_time_benchmarks	introduce definitely_contiguous and use it for reshape and tensor meta data computation. (#153432 )	2025-05-28 03:41:26 +00:00
__init__.py
all_torchbench_models_list.txt
benchmarks.py	PEP585 update - benchmarks tools torchgen (#145101 )	2025-01-18 05:05:07 +00:00
cachebench.py	[BE][CI] bump `ruff` to 0.9.2: multiline `assert` statements (#144546 )	2025-02-27 20:46:16 +00:00
check_accuracy.py	Add timm_efficientnet to flaky models after cuda 12.6 update in CI/CD (#148788 )	2025-03-10 13:40:41 +00:00
check_csv.py	[BE][CI] bump `ruff` to 0.9.0: string quote styles (#144569 )	2025-02-24 19:56:09 +00:00
check_graph_breaks.py	[BE][CI] bump `ruff` to 0.9.0: string quote styles (#144569 )	2025-02-24 19:56:09 +00:00
check_memory_compression_ratio.py	[BE][CI] bump `ruff` to 0.9.0: string quote styles (#144569 )	2025-02-24 19:56:09 +00:00
check_perf_csv.py	[BE][CI] bump `ruff` to 0.9.0: string quote styles (#144569 )	2025-02-24 19:56:09 +00:00
combine_csv.py	[BE][Easy][3/19] enforce style for empty lines in import segments in `benchmarks/` (#129754 )	2024-07-17 14:34:42 +00:00
common.py	update torchvision pin (#154255 )	2025-05-27 16:15:25 +00:00
dist_util.py	Fix unused Python variables outside torch/ and test/ (#136359 )	2024-12-11 17:10:23 +00:00
distributed.py	[BE][Easy][3/19] enforce style for empty lines in import segments in `benchmarks/` (#129754 )	2024-07-17 14:34:42 +00:00
expected_ci_perf_inductor_torchbench.csv
expected_ci_speedup_inductor_torchbench_cpu.csv	[AOTI] Add a boxed_run API (#142213 )	2025-01-14 18:47:42 +00:00
huggingface_models_list_cpu.txt
huggingface_models_list.txt
huggingface.py	[BE]: Update ruff to 0.11.8 (#153249 )	2025-05-12 18:30:52 +00:00
huggingface.yaml	change GPT2ForSequenceClassification inference accuracy tolerance (#136749 )	2024-10-12 01:12:28 +00:00
join_results.py	[BE] Format `.ci/` / `.github/` / `benchmarks/` / `functorch/` / `tools/` / `torchgen/` with `ruff format` (#132577 )	2024-10-11 18:30:26 +00:00
Makefile	Clean up conda usage in benchmark scripts (#152552 )	2025-04-30 21:27:29 +00:00
parse_logs.py	[BE]: Enable ruff rule SIM113 (#147290 )	2025-02-16 22:41:16 +00:00
README.md
run_all.sh
run_delta.sh
runner.py	[BE]: Update ruff to 0.11.8 (#153249 )	2025-05-12 18:30:52 +00:00
summarize_perf.py	[BE]: Update ruff to 0.11.8 (#153249 )	2025-05-12 18:30:52 +00:00
test.py	[BE][Easy][3/19] enforce style for empty lines in import segments in `benchmarks/` (#129754 )	2024-07-17 14:34:42 +00:00
timm_models_list_cpu.txt
timm_models_list.txt
timm_models.py	[cuBLAS][cuBLASLt] Use cuBLAS default workspace size in Lt (#153556 )	2025-05-24 03:43:35 +00:00
timm_models.yaml	[dynamo][benchmarks] Stop benchmarking compile time of dead code (#145590 )	2025-01-29 22:14:47 +00:00
torchao_backend.py	[BE]: Update ruff to 0.11.8 (#153249 )	2025-05-12 18:30:52 +00:00
torchbench_models_list_cpu.txt
torchbench_models_list.txt
torchbench.py	Revert "[Reopen] [Intel GPU] Set higher tolerance for some models only on XPU Device (#144756 )"	2025-04-17 11:09:01 +00:00
torchbench.yaml	Allow higher fp16 tolerance for phlippe_resnet on CUDA 12.8 (#154109 )	2025-05-22 14:25:12 +00:00
training_loss.py	[BE] fix ruff rule E226: add missing whitespace around operator in f-strings (#144415 )	2025-01-08 21:55:00 +00:00

README.md

`torch.compile()` Benchmarking

This directory contains benchmarking code for TorchDynamo and many backends including TorchInductor. It includes three main benchmark suites:

TorchBenchmark: A diverse set of models, initially seeded from highly cited research models as ranked by Papers With Code. See torchbench installation and torchbench.py for the low-level runner. The Makefile also contains the commands needed to set up TorchBenchmark to match the versions used in PyTorch CI.
Models from HuggingFace: Primarily transformer models, with representative models chosen for each category available. The low-level runner (huggingface.py) automatically downloads and installs the needed dependencies on first run.
Models from TIMM: Primarily vision models, with representative models chosen for each category available. The low-level runner (timm_models.py) automatically downloads and installs the needed dependencies on first run.

GPU Performance Dashboard

Daily results from the benchmarks here are available in the TorchInductor Performance Dashboard, currently run on an NVIDIA A100 GPU.

The inductor-perf-test-nightly.yml workflow generates the data in the performance dashboard. If you have the needed permissions, you can benchmark your own branch on the PyTorch GitHub repo by:

Select "Run workflow" in the top right of the workflow
Select the branch you want to benchmark
Choose the options (such as training vs inference)
Click "Run workflow"
Wait for the job to complete (4 to 12 hours depending on backlog)
Go to the dashboard
Select your branch and commit at the top of the dashboard

The dashboard compares two commits: a "Base Commit" and a "New Commit". An entry such as 2.38x → 2.41x means that performance improved from 2.38x in the base commit to 2.41x in the new commit. All performance results are normalized to eager mode PyTorch (1x), and higher is better.
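As a rough illustration of how to read such an entry, the sketch below assumes that a speedup is the eager-mode latency divided by the compiled latency. The timings and the helper names (`speedup`, `format_entry`) are hypothetical and are not taken from the benchmark code.

```python
def speedup(eager_seconds: float, compiled_seconds: float) -> float:
    """Speedup relative to eager mode (eager is 1x by definition, higher is better)."""
    if eager_seconds <= 0 or compiled_seconds <= 0:
        raise ValueError("timings must be positive")
    return eager_seconds / compiled_seconds


def format_entry(base: float, new: float) -> str:
    """Render a base/new pair the way the text above describes a dashboard entry."""
    return f"{base:.2f}x → {new:.2f}x"


# Hypothetical timings for one model, in seconds per iteration.
eager = 0.100
base_compiled = 0.0420  # compiled latency at the base commit (assumed)
new_compiled = 0.0415   # compiled latency at the new commit (assumed)

entry = format_entry(speedup(eager, base_compiled), speedup(eager, new_compiled))
print(entry)
```

Because both numbers share the same eager baseline, a larger value on the right of the arrow indicates that the new commit is faster than the base commit for that model.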

CPU Performance Dashboard

The TorchInductor CPU Performance Dashboard is tracked on a GitHub issue and updated periodically.

Running Locally

Raw commands used to generate the data for the performance dashboards can be found here.

To summarize, there are three scripts, one for each set of benchmarks:

./benchmarks/dynamo/torchbench.py ...
./benchmarks/dynamo/huggingface.py ...
./benchmarks/dynamo/timm_models.py ...

Each of these scripts takes the same set of arguments. The ones used by the dashboards are:

--accuracy or --performance: selects between checking correctness and measuring speedup (both are run for the dashboard).
--training or --inference: selects between measuring training or inference (both are run for the dashboard).
--device=cuda or --device=cpu: selects device to measure.
--amp, --bfloat16, --float16, --float32: selects the precision to use; --amp is used for training and --bfloat16 for inference.
--cold-start-latency: disables caching to accurately measure compile times.
--backend=inductor: selects TorchInductor as the compiler backend to measure. Many more are available, see --help.
--output=<filename>.csv: where to write results to.
--dynamic-shapes --dynamic-batch-only: used when the dynamic config is enabled.
--disable-cudagraphs: used by configurations without cudagraphs enabled (default).
--freezing: enable additional inference-only optimizations.
--cpp-wrapper: enable C++ wrapper code to lower overheads.
TORCHINDUCTOR_MAX_AUTOTUNE=1 (environment variable): used to measure max-autotune mode, which is run weekly due to longer compile times.
--export-aot-inductor: benchmarks ahead-of-time compilation mode.
--total-partitions and --partition-id: used to parallelize benchmarking across different machines.

For debugging you can run just a single benchmark by adding the --only=<NAME> flag.

A complete list of options can be seen by running each of the runners with the --help flag.
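The flags above can be assembled programmatically when scripting many runs. The following is a minimal sketch, not part of the benchmark suite: `build_command` and its parameters are hypothetical names, it covers only a subset of the flags listed above, and it assumes the scripts are invoked from the repository root.

```python
import shlex

SUITES = ("torchbench", "huggingface", "timm_models")


def build_command(suite, mode, *, check="performance", backend="inductor",
                  device=None, only=None, output=None):
    """Build an argument list for one of the three runners (hypothetical helper)."""
    if suite not in SUITES:
        raise ValueError(f"unknown suite: {suite}")
    if mode not in ("training", "inference"):
        raise ValueError(f"unknown mode: {mode}")
    if check not in ("accuracy", "performance"):
        raise ValueError(f"unknown check: {check}")
    # Per the argument list above: --amp for training, --bfloat16 for inference.
    precision = "--amp" if mode == "training" else "--bfloat16"
    cmd = [f"./benchmarks/dynamo/{suite}.py", f"--{check}", f"--{mode}",
           precision, f"--backend={backend}"]
    if device:
        cmd.append(f"--device={device}")
    if only:
        cmd.append(f"--only={only}")  # restrict the run to a single benchmark
    if output:
        cmd.append(f"--output={output}")
    return cmd


cmd = build_command("torchbench", "training", output="torchbench_training.csv")
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. Passing `only="<NAME>"` with a real model name narrows the run to one benchmark for debugging.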

As an example, the commands to run the first line of the dashboard (performance only) would be:

./benchmarks/dynamo/torchbench.py --performance --training --amp --backend=inductor --output=torchbench_training.csv
./benchmarks/dynamo/torchbench.py --performance --inference --bfloat16 --backend=inductor --output=torchbench_inference.csv

./benchmarks/dynamo/huggingface.py --performance --training --amp --backend=inductor --output=huggingface_training.csv
./benchmarks/dynamo/huggingface.py --performance --inference --bfloat16 --backend=inductor --output=huggingface_inference.csv

./benchmarks/dynamo/timm_models.py --performance --training --amp --backend=inductor --output=timm_models_training.csv
./benchmarks/dynamo/timm_models.py --performance --inference --bfloat16 --backend=inductor --output=timm_models_inference.csv
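Once a run finishes, the CSV written via --output can be summarized with a few lines of Python. The sketch below assumes the file has a header row with a `speedup` column and treats non-positive or unparsable values as failed runs to skip. Both the column name and that convention are assumptions, so check them against an actual output file. The geometric mean is used here as one reasonable way to average ratios, not as a statement of how the dashboard aggregates results.

```python
import csv
import io
import math


def geomean_speedup(csv_file, column="speedup"):
    """Geometric mean of the positive values in `column` (assumed column name)."""
    values = []
    for row in csv.DictReader(csv_file):
        try:
            value = float(row[column])
        except (KeyError, TypeError, ValueError):
            continue  # missing or non-numeric entry
        if value > 0:  # assumption: non-positive values mark failed runs
            values.append(value)
    if not values:
        raise ValueError("no usable speedup values found")
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Tiny in-memory stand-in for a results file; model names and values are made up.
sample = io.StringIO("name,speedup\nmodel_a,2.0\nmodel_b,8.0\nmodel_c,0.0\n")
result = geomean_speedup(sample)
print(f"geomean speedup: {result:.2f}x")
```

To use it on a real file, open it with `open("torchbench_inference.csv", newline="")` and pass the file object in place of `sample`.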
