mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-06 12:20:52 +01:00

distributed/ddp	[BE] Remove outdated RPC benchmark (#146716 )	2025-03-29 04:44:36 +00:00
dynamo	Enable max autotune for AOTInductor benchmark (#149309 )	2025-04-28 06:54:26 +00:00
fastrnns	[BE]: Enable ruff rule SIM113 (#147290 )	2025-02-16 22:41:16 +00:00
framework_overhead_benchmark	Fix unused Python variables outside torch/ and test/ (#136359 )	2024-12-11 17:10:23 +00:00
functional_autograd_benchmark	Fix broken URLs (#152237 )	2025-04-27 09:56:42 +00:00
fuser	Fix unused Python variables outside torch/ and test/ (#136359 )	2024-12-11 17:10:23 +00:00
gpt_fast	[BE][CI] bump `ruff` to 0.9.2: multiline `assert` statements (#144546 )	2025-02-27 20:46:16 +00:00
inductor_backends	[cutlass backend] add addmm and bmm for cutlass backend benchmark (#152163 )	2025-04-28 20:16:17 +00:00
inference	[BE][Easy][3/19] enforce style for empty lines in import segments in `benchmarks/` (#129754 )	2024-07-17 14:34:42 +00:00
instruction_counts	[BE][CI] bump `ruff` to 0.9.2: multiline `assert` statements (#144546 )	2025-02-27 20:46:16 +00:00
nested	Fix unused Python variables outside torch/ and test/ (#136359 )	2024-12-11 17:10:23 +00:00
operator_benchmark	Fix broken URLs (#152237 )	2025-04-27 09:56:42 +00:00
overrides_benchmark	[BE][Easy][3/19] enforce style for empty lines in import segments in `benchmarks/` (#129754 )	2024-07-17 14:34:42 +00:00
profiler_benchmark	Apply TorchFix TOR203 fixes (#143691 )	2024-12-23 18:21:03 +00:00
record_function_benchmark
serialization	Fix unused Python variables outside torch/ and test/ (#136359 )	2024-12-11 17:10:23 +00:00
sparse	Fix broken URLs (#152237 )	2025-04-27 09:56:42 +00:00
static_runtime	[StaticRuntime] Fix a bug that memory planner ignores subblocks (#146728 ) (#146855 )	2025-02-11 13:59:54 +00:00
tensorexpr	Fix broken URLs (#152237 )	2025-04-27 09:56:42 +00:00
transformer	Add sparsity (#148513 )	2025-03-07 01:47:52 +00:00
compare-fastrnn-results.py	[BE][Easy][3/19] enforce style for empty lines in import segments in `benchmarks/` (#129754 )	2024-07-17 14:34:42 +00:00
compare.sh
README.md
upload_scribe.py	Fix broken URLs (#152237 )	2025-04-27 09:56:42 +00:00

README.md

PyTorch Benchmarks

This folder contains scripts that produce reproducible timings of various PyTorch features.

It also provides mechanisms to compare PyTorch with other frameworks.

Setup environment

Make sure you're on a machine with CUDA, torchvision, and PyTorch installed. Install them in the following order:

# Install the PyTorch stable release binary together with torchvision.
conda install pytorch torchvision -c pytorch

# Build and install the latest PyTorch master from source.
# This build should supersede the release binary installed above.
cd $PYTORCH_HOME
python setup.py build develop

# Check the installed PyTorch version
python -c "import torch; print(torch.__version__)"
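
The version check above can be extended into a slightly fuller environment report. The following is a minimal sketch, not part of the benchmark suite: the helper name `describe_environment` is hypothetical, and it assumes only that `torch` and `torchvision` expose `__version__` and that `torch.cuda.is_available()` reports CUDA support. Packages that are missing or fail to import are reported as `None` instead of raising.

```python
import importlib
import importlib.util


def describe_environment(packages=("torch", "torchvision")):
    """Map each package name to its version string, or None if it cannot be imported."""
    report = {}
    for name in packages:
        # find_spec returns None when the top-level package is not installed.
        if importlib.util.find_spec(name) is None:
            report[name] = None
            continue
        try:
            module = importlib.import_module(name)
        except Exception:
            # A broken install (for example mismatched binaries) is treated as missing.
            report[name] = None
            continue
        report[name] = getattr(module, "__version__", "unknown")

    # CUDA availability can only be queried when torch itself imported cleanly.
    if report.get("torch") is not None:
        import torch

        report["cuda_available"] = bool(torch.cuda.is_available())
    else:
        report["cuda_available"] = False
    return report


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

Running the script prints one line per package plus a `cuda_available` line. If the source build superseded the release binary as intended, the `torch` entry would typically show a development version string.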

Benchmark List

Please refer to each subfolder to discover each benchmark suite. Links are provided where descriptions exist: