PyTorch Benchmarks
This folder contains scripts that produce reproducible timings of various PyTorch features.
It also provides mechanisms to compare PyTorch with other frameworks.
Setup environment
Make sure you're on a machine with CUDA, torchvision, and PyTorch installed. Install them in the following order:
# Install torchvision. It comes with the pytorch stable release binary
conda install pytorch torchvision -c pytorch
# Install the latest pytorch master from source.
# It should supersede the installation from the release binary.
cd $PYTORCH_HOME
python setup.py build develop
# Check the pytorch installation version
python -c "import torch; print(torch.__version__)"
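As a slightly fuller variant of the version check above, the following is a minimal sketch that reports which of the required packages can be imported and whether CUDA is visible. The helper name `check_environment` is hypothetical and not part of this repository; it assumes only that each package exposes a `__version__` attribute, and it falls back to `"unknown"` when one does not.

```python
import importlib


def check_environment(packages=("torch", "torchvision")):
    """Map each package name to its version string, or None if it cannot be imported.

    Hypothetical helper for illustration only; not part of the benchmark scripts.
    """
    report = {}
    for name in packages:
        try:
            module = importlib.import_module(name)
        except ImportError:
            # Package is missing or its import failed.
            report[name] = None
            continue
        report[name] = getattr(module, "__version__", "unknown")

    # CUDA availability can only be queried when torch itself imported cleanly.
    if report.get("torch") is not None:
        import torch

        report["cuda_available"] = torch.cuda.is_available()
    else:
        report["cuda_available"] = False
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

A `None` entry means the package still needs to be installed. If `cuda_available` is `False`, GPU benchmarks will most likely be unable to run on this machine.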
Benchmark List
Each subfolder contains one benchmark suite. Links are provided where descriptions exist: