# PyTorch Benchmarks
This folder contains scripts that produce reproducible timings of various PyTorch features.
It also provides mechanisms to compare PyTorch with other frameworks.
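Most of these suites boil down to the same core pattern: time an operation after warmup, with device synchronization handled correctly. As an illustrative sketch (not taken from any suite in this folder), PyTorch's built-in `torch.utils.benchmark.Timer` captures that pattern:

```python
# Illustrative only: the kind of warmup-aware measurement the suites perform.
import torch
from torch.utils.benchmark import Timer

a = torch.randn(1024, 1024)
b = torch.randn(1024, 1024)

timer = Timer(
    stmt="torch.mm(a, b)",     # operation under test
    globals={"a": a, "b": b},  # names visible to stmt
)

# blocked_autorange() chooses the number of runs automatically, handles
# warmup, and synchronizes CUDA if the tensors live on a GPU.
print(timer.blocked_autorange())
```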
## Setup environment

Make sure you're on a machine with CUDA, torchvision, and PyTorch installed. Install them in the following order:
```bash
# Install torchvision. It comes with the PyTorch stable release binary.
conda install pytorch torchvision -c pytorch

# Install the latest PyTorch master from source.
# It should supersede the installation from the release binary.
cd $PYTORCH_HOME
python setup.py build develop

# Check the installed PyTorch version.
python -c "import torch; print(torch.__version__)"
```
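Several of the suites assume a working GPU. As a quick sanity check (illustrative, not part of the benchmark scripts), you can confirm that the source build is active and that CUDA is visible:

```python
# Illustrative sanity check: a from-source build typically reports a dev
# version string (e.g. ending in "a0+git<sha>") rather than a plain release.
import torch
print(torch.__version__)          # expect a dev/from-source version string
print(torch.cuda.is_available())  # expect True on a CUDA machine
```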
## Benchmark List

Please refer to each subfolder to discover each benchmark suite. Links are provided where descriptions exist: