pytorch/benchmarks
Latest commit aaba3a87b1 (Shunting Zhang): tune down batch-size for res2net to avoid OOM (#122977)
The batch size for this model was previously 64. It was later changed to 256, which caused OOM in the cudagraphs setting. This PR tunes the batch size down to 128.

Sharing more logs from my local run:
```
cuda,res2net101_26w_4s,128,1.603578,110.273572,335.263494,1.042566,11.469964,11.001666,807,2,7,6,0,0
cuda,res2net101_26w_4s,256,1.714980,207.986155,344.013071,1.058278,22.260176,21.034332,807,2,7,6,0,0
```

The log shows that torch.compile uses about 11 GB at batch size 128 and about 21 GB at batch size 256. I suspect the benchmark script adds extra overhead, which causes the model to OOM at batch size 256 in the dashboard run.
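For context, numbers like these typically come from the dynamo benchmark runner in this folder. The command below is only a sketch of how such a local run might be reproduced; the flag combination is an assumption, not the exact invocation behind the log above.

```
# Hypothetical reproduction of a local res2net run; the flag combination is an
# assumption, not the exact command used to produce the log above.
python benchmarks/dynamo/torchbench.py --performance --training --amp \
  --backend=inductor --only res2net101_26w_4s --batch-size 128
```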

Pull Request resolved: https://github.com/pytorch/pytorch/pull/122977
Approved by: https://github.com/Chillee
2024-03-30 03:54:53 +00:00
| Name | Last commit | Last updated |
| --- | --- | --- |
| distributed | [BE]: Apply FURB118 (prev): replaces unnecessary lambdas with operator. (#116027) | 2023-12-20 19:35:08 +00:00 |
| dynamo | tune down batch-size for res2net to avoid OOM (#122977) | 2024-03-30 03:54:53 +00:00 |
| fastrnns | Apply UFMT to all files in benchmarks/ (#105928) | 2023-07-26 01:18:48 +00:00 |
| framework_overhead_benchmark | [BE]: Enable F821 and fix bugs (#116579) | 2024-01-01 08:40:46 +00:00 |
| functional_autograd_benchmark | [BE]: Enable F821 and fix bugs (#116579) | 2024-01-01 08:40:46 +00:00 |
| fuser | Apply UFMT to all files in benchmarks/ (#105928) | 2023-07-26 01:18:48 +00:00 |
| gpt_fast | made gpt_fast benchmark run faster (#122872) | 2024-03-29 03:49:19 +00:00 |
| inference | Allow more backend worker threads with each using a separate cuda stream (#116190) | 2023-12-20 22:08:29 +00:00 |
| instruction_counts | Use strict to toggle strict options in MYPYSTRICT (#118479) | 2024-01-28 19:22:22 +00:00 |
| nested | Apply UFMT to all files in benchmarks/ (#105928) | 2023-07-26 01:18:48 +00:00 |
| operator_benchmark | highlight readme code block (#120228) | 2024-02-22 21:23:08 +00:00 |
| overrides_benchmark | [BE]: Update ruff to 0.285 (#107519) | 2023-08-22 23:16:38 +00:00 |
| profiler_benchmark | Apply UFMT to all files in benchmarks/ (#105928) | 2023-07-26 01:18:48 +00:00 |
| record_function_benchmark | Apply UFMT to all files in benchmarks/ (#105928) | 2023-07-26 01:18:48 +00:00 |
| serialization | Apply UFMT to all files in benchmarks/ (#105928) | 2023-07-26 01:18:48 +00:00 |
| sparse | [BE]: Enable F821 and fix bugs (#116579) | 2024-01-01 08:40:46 +00:00 |
| static_runtime | [PyTorch] fix mixed int32/int64 indices/offsets for embedding_bag_out (#120752) | 2024-02-28 20:13:30 +00:00 |
| tensorexpr | [BE]: Enable F821 and fix bugs (#116579) | 2024-01-01 08:40:46 +00:00 |
| transformer | Add an option to sdpa benchmark to specify backend (#122368) | 2024-03-21 07:00:40 +00:00 |
| compare-fastrnn-results.py | Apply UFMT to all files in benchmarks/ (#105928) | 2023-07-26 01:18:48 +00:00 |
| compare.sh | | |
| README.md | Add more child links to benchmark readme (#104627) | 2023-07-06 12:11:00 +00:00 |
| upload_scribe.py | Apply UFMT to all files in benchmarks/ (#105928) | 2023-07-26 01:18:48 +00:00 |

PyTorch Benchmarks

This folder contains scripts that produce reproducible timings of various PyTorch features.

It also provides mechanisms to compare PyTorch with other frameworks.

Setup environment

Make sure you're on a machine with CUDA, torchvision, and PyTorch installed. Install them in the following order:

```
# Install torchvision. It comes with the pytorch stable release binary
conda install pytorch torchvision -c pytorch

# Install the latest pytorch master from source.
# It should supersede the installation from the release binary.
cd $PYTORCH_HOME
python setup.py build develop

# Check the pytorch installation version
python -c "import torch; print(torch.__version__)"
```

Benchmark List

Please refer to each subfolder to discover each benchmark suite. Links are provided where descriptions exist: