pytorch/benchmarks
eellison 000d55870a Enable in oss (#124031)
The biggest movement is 4% for HF inference and 9% for TIMM inference. Note that this is max-autotune mode, so we are more tolerant of compilation-time increases. We could improve compilation time by limiting:
```
# Take how many of the top triton kernels to benchmark epilogue
max_epilogue_benchmarked_choices = 3
```
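As a rough sketch, the setting above could be lowered from Python before compiling a model. This assumes the option lives in `torch._inductor.config`, as other Inductor settings do. The helper name `set_max_epilogue_choices` is hypothetical, and the function does nothing when PyTorch or the option is unavailable.

```python
import importlib.util


def set_max_epilogue_choices(limit: int) -> bool:
    """Try to set Inductor's max_epilogue_benchmarked_choices.

    Returns True if the value was applied, False if torch is not installed
    or this build of Inductor does not expose the option.
    """
    if importlib.util.find_spec("torch") is None:
        return False  # PyTorch is not installed in this environment

    # Assumption: the option is defined in torch._inductor.config.
    import torch._inductor.config as inductor_config

    if not hasattr(inductor_config, "max_epilogue_benchmarked_choices"):
        return False  # older or newer build without this option

    inductor_config.max_epilogue_benchmarked_choices = limit
    return True


if __name__ == "__main__":
    applied = set_max_epilogue_choices(1)
    print("setting applied" if applied else "setting not available")
```

A smaller limit means fewer candidate kernels are benchmarked with their epilogue, which is the compile-time trade-off mentioned above.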

There is an hf_Whisper accuracy failure that you can repro on main without this stack with `TORCHINDUCTOR_MAX_AUTOTUNE_GEMM_BACKENDS=TRITON TORCHINDUCTOR_MAX_AUTOTUNE=1 python benchmarks/dynamo/torchbench.py --backend inductor --amp --accuracy --training --only hf_Whisper`. Turning off epilogue fusion fixes the accuracy. I bisected the failure to an epilogue; however, when you compare the output of that fused epilogue with the output of the corresponding separate kernels, the results are equivalent.
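For scripting the repro, the command above can be assembled in Python. This is only a sketch: the environment variables and flags are copied from the command in the commit message, while the helper name `build_repro` is hypothetical. It builds the command without running it.

```python
import os
import shlex


def build_repro(model: str = "hf_Whisper"):
    """Return (env, argv) for the accuracy repro quoted in the commit message."""
    env = dict(os.environ)
    # Environment variables taken verbatim from the repro command.
    env["TORCHINDUCTOR_MAX_AUTOTUNE_GEMM_BACKENDS"] = "TRITON"
    env["TORCHINDUCTOR_MAX_AUTOTUNE"] = "1"
    argv = [
        "python",
        "benchmarks/dynamo/torchbench.py",
        "--backend", "inductor",
        "--amp",
        "--accuracy",
        "--training",
        "--only", model,
    ]
    return env, argv


if __name__ == "__main__":
    env, argv = build_repro()
    print(shlex.join(argv))
    # To actually run it from the repository root (requires a CUDA machine
    # with the benchmark dependencies installed):
    #   import subprocess
    #   subprocess.run(argv, env=env, check=False)
```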

Inference:

<img width="1686" alt="image" src="https://github.com/pytorch/pytorch/assets/11477974/0b240080-cd33-4c08-89d3-583103b1fb0c">

Training:

<img width="1329" alt="Screenshot 2024-04-16 at 6 16 30 PM" src="https://github.com/pytorch/pytorch/assets/11477974/db0afcc9-7288-4c27-84ce-4fc1a5690788">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124031
Approved by: https://github.com/Chillee, https://github.com/shunting314
ghstack dependencies: #124030, #122642, #123229, #122825
2024-04-19 20:28:55 +00:00
| Name | Last commit | Date |
| --- | --- | --- |
| distributed | IntraNodeComm primitives for allgather_matmul (#118038) | 2024-04-04 00:46:08 +00:00 |
| dynamo | Enable in oss (#124031) | 2024-04-19 20:28:55 +00:00 |
| fastrnns | Apply UFMT to all files in benchmarks/ (#105928) | 2023-07-26 01:18:48 +00:00 |
| framework_overhead_benchmark | [BE]: Enable F821 and fix bugs (#116579) | 2024-01-01 08:40:46 +00:00 |
| functional_autograd_benchmark | [BE]: Enable F821 and fix bugs (#116579) | 2024-01-01 08:40:46 +00:00 |
| fuser | Apply UFMT to all files in benchmarks/ (#105928) | 2023-07-26 01:18:48 +00:00 |
| gpt_fast | [BE]: Optimize min/max/sum comprehensions C419 (#123960) | 2024-04-12 23:54:15 +00:00 |
| inference | Allow more backend worker threads with each using a separate cuda stream (#116190) | 2023-12-20 22:08:29 +00:00 |
| instruction_counts | Use strict to toggle strict options in MYPYSTRICT (#118479) | 2024-01-28 19:22:22 +00:00 |
| nested | Apply UFMT to all files in benchmarks/ (#105928) | 2023-07-26 01:18:48 +00:00 |
| operator_benchmark | highlight readme code block (#120228) | 2024-02-22 21:23:08 +00:00 |
| overrides_benchmark | [BE]: Update ruff to 0.285 (#107519) | 2023-08-22 23:16:38 +00:00 |
| profiler_benchmark | Apply UFMT to all files in benchmarks/ (#105928) | 2023-07-26 01:18:48 +00:00 |
| record_function_benchmark | Apply UFMT to all files in benchmarks/ (#105928) | 2023-07-26 01:18:48 +00:00 |
| serialization | Apply UFMT to all files in benchmarks/ (#105928) | 2023-07-26 01:18:48 +00:00 |
| sparse | [BE]: Enable F821 and fix bugs (#116579) | 2024-01-01 08:40:46 +00:00 |
| static_runtime | [PyTorch] fix mixed int32/int64 indices/offsets for embedding_bag_out (#120752) | 2024-02-28 20:13:30 +00:00 |
| tensorexpr | [BE]: Optimize min/max/sum comprehensions C419 (#123960) | 2024-04-12 23:54:15 +00:00 |
| transformer | Optimized templated attention to use exp2 (#124356) | 2024-04-19 01:58:19 +00:00 |
| compare-fastrnn-results.py | Apply UFMT to all files in benchmarks/ (#105928) | 2023-07-26 01:18:48 +00:00 |
| compare.sh | | |
| README.md | Add more child links to benchmark readme (#104627) | 2023-07-06 12:11:00 +00:00 |
| upload_scribe.py | Apply UFMT to all files in benchmarks/ (#105928) | 2023-07-26 01:18:48 +00:00 |

PyTorch Benchmarks

This folder contains scripts that produce reproducible timings of various PyTorch features.

It also provides mechanisms to compare PyTorch with other frameworks.

Setup environment

Make sure you're on a machine with CUDA, torchvision, and PyTorch installed. Install them in the following order:

# Install torchvision. It comes with the pytorch stable release binary
conda install pytorch torchvision -c pytorch

# Install the latest pytorch master from source.
# It should supersede the installation from the release binary.
cd $PYTORCH_HOME
python setup.py build develop

# Check the pytorch installation version
python -c "import torch; print(torch.__version__)"
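
The version check above can be extended to cover torchvision and CUDA. The following is a minimal sketch, not part of the benchmark tooling. The helper names `installed_versions` and `cuda_available` are hypothetical, and the reported versions come from package metadata, which may be formatted differently from `torch.__version__` on a source build.

```python
import importlib.util
from importlib import metadata


def installed_versions(packages=("torch", "torchvision")):
    """Map each package name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        if importlib.util.find_spec(name) is None:
            versions[name] = None  # not importable in this environment
            continue
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "unknown"  # importable, but no distribution metadata
    return versions


def cuda_available():
    """Return torch.cuda.is_available(), or False when torch is not installed."""
    if importlib.util.find_spec("torch") is None:
        return False
    import torch

    return torch.cuda.is_available()


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
    print(f"CUDA available: {cuda_available()}")
```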

Benchmark List

Please refer to each subfolder to discover each benchmark suite. Links are provided where descriptions exist: