mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-06 12:20:52 +01:00

Nicolas De Carli 3e77a2b478 [PyTorch] Improve aarch64 performance of bfloat16 ops (#166028)	2025-10-22 19:25:33 +00:00

Summary:
PR allows compiler to better optimize some bfloat16-based operations, when ran on NEON

Benchmarks show measurable improvements:

Before:
bfloat16 add: 250.503us
bfloat16 sub: 245.674us
bfloat16 neg: 113.945us

After:
bfloat16 add: 203.862us ---> 23% higher throughput
bfloat16 sub: 201.526us ---> 22% higher throughput
bfloat16 neg: 74.986us ---> 52% higher throughput

Test Plan:
Correctness:
buck2 test mode/opt //caffe2/test:test_ops
buck2 test mode/opt //caffe2/test:torch

Performance:
binary_test.py has been updated, to run bfloat16 benchmarks using basic arithmetic functions

Differential Revision: D85186786
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166028
Approved by: https://github.com/Skylion007
data	Dataloader benchmark script (#159432)	2025-08-06 19:05:19 +00:00
distributed	[SymmMem] Tiled reduce (#162243)	2025-10-08 02:03:04 +00:00
dynamo	[reland][fx] Move Node._prepend/Node._remove_from_list to C++ (#165882)	2025-10-21 19:43:55 +00:00
fastrnns	Enable PLW0127 in ruff (#165851)	2025-10-21 03:30:57 +00:00
framework_overhead_benchmark	Fix unused Python variables outside torch/ and test/ (#136359)	2024-12-11 17:10:23 +00:00
functional_autograd_benchmark	[9/N] Apply ruff UP035 rule (#165515)	2025-10-17 00:09:51 +00:00
fuser	Fix unused Python variables outside torch/ and test/ (#136359)	2024-12-11 17:10:23 +00:00
gpt_fast	Enable all PIE rules on ruff (#165814)	2025-10-18 07:36:18 +00:00
inductor_backends	[9/N] Apply ruff UP035 rule (#165515)	2025-10-17 00:09:51 +00:00
inference	[BE] fix typos in benchmarks/ (#156077)	2025-06-17 13:12:18 +00:00
instruction_counts	Enable ruff rule E721 (#165162)	2025-10-13 01:48:55 +00:00
nested	Fix unused Python variables outside torch/ and test/ (#136359)	2024-12-11 17:10:23 +00:00
operator_benchmark	[PyTorch] Improve aarch64 performance of bfloat16 ops (#166028)	2025-10-22 19:25:33 +00:00
overrides_benchmark	[BE][Easy][3/19] enforce style for empty lines in import segments in `benchmarks/` (#129754)	2024-07-17 14:34:42 +00:00
profiler_benchmark	Apply TorchFix TOR203 fixes (#143691)	2024-12-23 18:21:03 +00:00
record_function_benchmark	[Caffe2]Remove Caffe2 scripts and benchmarks (#126747)	2024-06-05 23:46:31 +00:00
serialization	Fix unused Python variables outside torch/ and test/ (#136359)	2024-12-11 17:10:23 +00:00
sparse	[build] modernize build-frontend: `python setup.py develop/install` -> `[uv ]pip install --no-build-isolation [-e ].` (#156027)	2025-07-09 11:24:27 +00:00
static_runtime	[3/N] Use internal linkage in C++ files (#151297)	2025-05-05 17:48:39 +00:00
tensorexpr	[BE] fix typos in benchmarks/ (#156077)	2025-06-17 13:12:18 +00:00
transformer	Enable all SIM rules except disabled ones (#164645)	2025-10-17 07:27:11 +00:00
compare-fastrnn-results.py	[BE][Easy][3/19] enforce style for empty lines in import segments in `benchmarks/` (#129754)	2024-07-17 14:34:42 +00:00
compare.sh
README.md	[build] modernize build-frontend: `python setup.py develop/install` -> `[uv ]pip install --no-build-isolation [-e ].` (#156027)	2025-07-09 11:24:27 +00:00
upload_scribe.py	Fix broken URLs (#152237)	2025-04-27 09:56:42 +00:00

README.md

PyTorch Benchmarks

This folder contains scripts that produce reproducible timings of various PyTorch features.

It also provides mechanisms to compare PyTorch with other frameworks.
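As a rough illustration of what a reproducible timing involves, the sketch below warms up a callable, times it several times, and reports the median. It uses only the Python standard library; `time_callable` and its parameters are hypothetical names chosen for this example and are not taken from any script in this folder.

```python
import statistics
import time


def time_callable(fn, warmup=3, repeats=10):
    """Return the median wall-clock time of fn() in seconds.

    Hypothetical helper for illustration only; the benchmark suites in
    this folder use their own harnesses.
    """
    for _ in range(warmup):
        fn()  # untimed runs to warm caches and lazy initialisation
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    # The median is less sensitive to occasional slow outliers than the mean.
    return statistics.median(samples)


if __name__ == "__main__":
    median_s = time_callable(lambda: sum(range(100_000)))
    print(f"median: {median_s * 1e6:.1f} us")
```

For CUDA workloads, kernels launch asynchronously, so a timer like this would also need a `torch.cuda.synchronize()` call before each clock read. PyTorch ships `torch.utils.benchmark.Timer`, which handles that synchronization and warm-up for you.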

Setup environment

Make sure you're on a machine with CUDA available, then install torchvision and PyTorch in the following order:

# Install torchvision together with the stable PyTorch release binary.
python -m pip install torch torchvision

# Install the latest PyTorch from source (the main branch).
# It should supersede the installation from the release binary.
cd $PYTORCH_HOME
python -m pip install --no-build-isolation -v -e .

# Check the pytorch installation version
python -c "import torch; print(torch.__version__)"
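The version check above can be extended into a small script that also reports whether torchvision imports and whether CUDA is visible. This is a minimal sketch: `describe_environment` is a hypothetical helper name, and it only uses `torch.__version__`, `torchvision.__version__`, and `torch.cuda.is_available()`. Missing packages are reported as `None` rather than raising.

```python
import importlib
import importlib.util


def describe_environment():
    """Collect installed versions and CUDA visibility (hypothetical helper)."""
    info = {"torch": None, "torchvision": None, "cuda_available": None}
    for name in ("torch", "torchvision"):
        # find_spec returns None when the package is not installed.
        if importlib.util.find_spec(name) is None:
            continue
        try:
            module = importlib.import_module(name)
        except Exception as exc:  # e.g. mismatched torch/torchvision builds
            info[name] = f"import failed: {exc}"
            continue
        info[name] = getattr(module, "__version__", "unknown")
        if name == "torch":
            info["cuda_available"] = module.cuda.is_available()
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If the source build superseded the release binary as intended, the reported `torch` version should correspond to your source checkout rather than the stable release.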

Benchmark List

Please refer to each subfolder to discover each benchmark suite. Links are provided where descriptions exist: