# PyTorch Benchmarks

This folder contains scripts that produce reproducible timings of various PyTorch features.

It also provides mechanisms to compare PyTorch with other frameworks.
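
As a rough illustration of the kind of timing these scripts collect, the sketch below uses `torch.utils.benchmark.Timer` to measure a single matrix multiply. This is only an example of the general approach; the suites in this folder each use their own harnesses and workloads.

```python
# Minimal timing sketch (illustrative only, not one of the benchmark suites).
import torch
from torch.utils.benchmark import Timer

x = torch.randn(1024, 1024)
y = torch.randn(1024, 1024)

timer = Timer(
    stmt="torch.mm(x, y)",                 # statement to time
    globals={"torch": torch, "x": x, "y": y},  # names visible to the statement
)

# blocked_autorange() chooses the number of runs automatically and
# reports summary statistics (median, IQR) for the measurements.
print(timer.blocked_autorange())
```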

## Setup environment

Make sure you're on a machine with CUDA available. Install PyTorch and torchvision in the following order:

```bash
# Install torchvision; it comes with the PyTorch stable release binary.
conda install pytorch torchvision -c pytorch

# Install the latest PyTorch master from source.
# It should supersede the installation from the release binary.
cd $PYTORCH_HOME
python setup.py build develop

# Check the installed PyTorch version.
python -c "import torch; print(torch.__version__)"
```

## Benchmark List

Please refer to each subfolder to discover each benchmark suite. Links are provided where descriptions exist: