Torchdynamo Benchmarks

What We Benchmark

TorchDynamo provides a benchmark harness that benchmarks different models uniformly. It interleaves eager and dynamo runs to mitigate machine noise/variability, and reports median-based results along with p-values.
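For intuition, here is a minimal sketch of the interleaving idea, not the actual harness code (which lives in common.py and also computes p-values). The function names, arguments, and repeat counts below are illustrative assumptions only.

```python
import statistics
import time

import torch


def bench_one(fn, inputs, repeat=1):
    # Time `repeat` calls of fn(*inputs); assumes a CUDA device is in use.
    timings = []
    for _ in range(repeat):
        torch.cuda.synchronize()
        start = time.perf_counter()
        fn(*inputs)
        torch.cuda.synchronize()
        timings.append(time.perf_counter() - start)
    return timings


def interleaved_speedup(eager_fn, dynamo_fn, inputs, repeat=50):
    # Alternate eager and compiled runs so that clock/thermal drift affects
    # both sides roughly equally, then compare the medians.
    eager_times, dynamo_times = [], []
    for _ in range(repeat):
        eager_times += bench_one(eager_fn, inputs)
        dynamo_times += bench_one(dynamo_fn, inputs)
    return statistics.median(eager_times) / statistics.median(dynamo_times)
```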

The runner integrates with models from the TorchBenchmark, HuggingFace, and TIMM suites, and covers both training and inference.

The infrastructure allows us to specify a loss function. For torchbench models, we use a .sum().backward() call in place of a native loss function. For TIMM models, we use a CrossEntropy loss. HF models contain a loss function inside the model itself, so no special loss handling is needed.

Training benchmarks approximate training by running the model forward, computing the loss, running backward, and then stepping the optimizer (SGD). Note: the optimizer is currently not compiled by TorchDynamo.
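As a rough sketch of what one measured training iteration looks like, combining the loss handling described above with the forward/backward/optimizer steps. The real logic lives in common.py and the per-suite scripts; the suite names, input shapes, and output handling here are simplified assumptions.

```python
import torch


def training_step(model, inputs, targets, suite, optimizer):
    optimizer.zero_grad(set_to_none=True)
    # Assumed convention: HF models take keyword inputs, others positional.
    outputs = model(**inputs) if suite == "huggingface" else model(*inputs)

    if suite == "huggingface":
        # HF models return their own loss when labels are passed in.
        loss = outputs.loss
    elif suite == "timm_models":
        loss = torch.nn.functional.cross_entropy(outputs, targets)
    else:  # torchbench: no native loss, reduce the output to a scalar instead
        loss = (
            outputs.sum()
            if isinstance(outputs, torch.Tensor)
            else sum(o.sum() for o in outputs)
        )

    loss.backward()
    optimizer.step()  # plain SGD; not compiled by TorchDynamo
```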

Both inference and training benchmarks measure correctness by comparing dynamo and eager model outputs given fixed inputs and seeds.
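A simplified sketch of that accuracy check is shown below: same seed, same inputs, and an elementwise comparison within a tolerance. In the real harness, tolerances vary per model and dtype, and outputs that are not plain tensors are handled specially; the function and seed here are placeholders.

```python
import torch


def check_accuracy(model_fn, inputs, tolerance=1e-3):
    # Run eager and compiled versions from the same seed and compare outputs.
    torch.manual_seed(1337)
    eager_out = model_fn(*inputs)

    torch.manual_seed(1337)
    compiled_fn = torch.compile(model_fn)
    compiled_out = compiled_fn(*inputs)

    return torch.allclose(eager_out, compiled_out, rtol=tolerance, atol=tolerance)
```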

Setup

Machine

We run benchmarks on AWS machines (p4d.24xlarge) with 8x NVIDIA A100 40GB GPUs. We suggest using CUDA 11.6 for consistency.

Benchmarks

Make sure to carefully follow the torchbench installation instructions, taking care to build the auxiliary libraries (torchvision, torchtext) from versions that match your PyTorch version.

For HF and TIMM models, the scripts install the transformers and timm packages, respectively, on the first run.

Runbook

Basic Usage

There are a lot of flags in the benchmark runner, and it can be confusing to know which settings to use or which machine to run on. To support apples-to-apples comparisons, we provide 'standard' settings in runner.py. This script is a wrapper over the common benchmarking infrastructure and simplifies the flags. We will continually update runner.py with the latest and most relevant compilers for training and inference. It also provides graph utilities to visualize and compare results. Some example commands are:

Inference Commands

  • Inference compilers on torchbench models - python benchmarks/dynamo/runner.py --suites=torchbench --inference --dtypes=float16
  • Inductor Inference compiler on torchbench models - python benchmarks/dynamo/runner.py --suites=torchbench --inference --dtypes=float16 --compilers=inductor

Training Commands

  • Training compilers on TIMM models - python benchmarks/dynamo/runner.py --suites=timm_models --training --dtypes=float32 --output-dir=timm_logs
  • AOTAutograd Training compiler on TIMM models - python benchmarks/dynamo/runner.py --suites=timm_models --training --dtypes=float32 --compilers=aot_nvfuser --output-dir=timm_logs
  • Inductor Training compiler on TIMM models - python benchmarks/dynamo/runner.py --suites=timm_models --training --dtypes=float32 --compilers=inductor --output-dir=timm_logs

Running runner.py generates a file named run.sh, which contains the actual commands that invoke the common benchmarking infrastructure with the appropriate flags. This brings us to advanced usage.

Advanced Usage

You can also call torchbench.py, huggingface.py, or timm_models.py directly with the necessary flags. These scripts accept a large number of flags; some examples follow and are subject to change.

Inference Commands

  • TorchScript (with TorchDynamo capture) NVFuser Inference - python benchmarks/dynamo/torchbench.py -dcuda -n100 --speedup-dynamo-ts --performance
  • TorchInductor CUDA Graphs Inference - python benchmarks/dynamo/torchbench.py -dcuda --float32 -n50 --inductor --performance

Training Commands

  • TorchScript (with TorchDynamo capture) NVFuser Training - python benchmarks/dynamo/torchbench.py --float32 -dcuda --training --nvfuser --speedup-dynamo-ts --performance
  • TorchInductor CUDA Graphs Training - python benchmarks/dynamo/torchbench.py --float32 -dcuda --training --inductor --performance

The above commands are for torchbench models. Simply replace torchbench.py with huggingface.py for HF models, or timm_models.py for TIMM models.