mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-06 12:20:52 +01:00


Nikolay Korovaiko 195ab5e864 remove non-default settings in fuser.py (#48862)	2020-12-05 20:58:39 -08:00

Summary: I've noticed we are setting `_jit_set_num_profiled_runs` to 2 (which isn't our default) and sometimes we don't. We are also setting `_jit_set_bailout_depth` to 20 which is our default. I suggest we remove this logic altogether. I did a quick run to see if there's any impact and thankfully, the numbers seem to be consistent, but we should try avoiding testing configurations that aren't default or aren't considered to become default.

numactl -C 3 python -m fastrnns.bench --fuser=te --executor=profiling

non-defaults:
```
Namespace(cnns=None, cuda_pointwise_block_count=None, cuda_pointwise_block_size=None, cuda_pointwise_loop_level=None, device='cuda', executor='profiling', fuser='te', group=['cnns', 'rnns'], hiddenSize=512, inputSize=512, miniBatch=64, nloops=100, numLayers=1, print_json=None, rnns=None, sep=' ', seqLength=100, variable_lstms=False, warmup=10)
Benchmarking LSTMs...
name             avg_fwd  std_fwd  info_fwd  avg_bwd  std_bwd  info_bwd
cudnn            5.057    0.06287  None      7.322    0.07404  None
aten             5.602    0.06303  None      13.64    0.4078   None
jit              7.019    0.07995  None      13.77    0.554    None
jit_premul       5.324    0.06203  None      12.01    0.2996   None
jit_premul_bias  5.148    0.08061  None      11.62    0.4104   None
jit_simple       6.69     0.2317   None      13.37    0.3791   None
jit_multilayer   7.006    0.251    None      13.67    0.2239   None
py               19.05    0.1119   None      28.28    0.6346   None
Benchmarking ResNets...
name             avg_fwd  std_fwd  info_fwd  avg_bwd  std_bwd  info_bwd
resnet18         8.712    0.01628  None      19.93    0.03512  None
resnet18_jit     8.688    0.01374  None      19.79    0.07518  None
resnet50         31.04    0.08049  None      66.44    0.08187  None
resnet50_jit     31.11    0.07171  None      66.45    0.09157  None
```

defaults:
```
Namespace(cnns=None, cuda_pointwise_block_count=None, cuda_pointwise_block_size=None, cuda_pointwise_loop_level=None, device='cuda', executor='profiling', fuser='te', group=['cnns', 'rnns'], hiddenSize=512, inputSize=512, miniBatch=64, nloops=100, numLayers=1, print_json=None, rnns=None, sep=' ', seqLength=100, variable_lstms=False, warmup=10)
Benchmarking LSTMs...
name             avg_fwd  std_fwd  info_fwd  avg_bwd  std_bwd  info_bwd
cudnn            5.086    0.115    None      7.394    0.1743   None
aten             5.611    0.2559   None      13.54    0.387    None
jit              7.062    0.3358   None      13.24    0.3688   None
jit_premul       5.379    0.2086   None      11.57    0.3987   None
jit_premul_bias  5.202    0.2127   None      11.13    0.06748  None
jit_simple       6.648    0.05794  None      12.84    0.3047   None
jit_multilayer   6.964    0.1104   None      13.24    0.3283   None
py               19.14    0.09959  None      28.17    0.4946   None
Benchmarking ResNets...
name             avg_fwd  std_fwd  info_fwd  avg_bwd  std_bwd  info_bwd
resnet18         8.713    0.01563  None      19.93    0.02759  None
resnet18_jit     8.697    0.01792  None      19.78    0.06916  None
resnet50         31.14    0.07431  None      66.57    0.07418  None
resnet50_jit     31.21    0.0677   None      66.56    0.08655  None
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48862
Reviewed By: bertmaher
Differential Revision: D25342097
Pulled By: Krovatkin
fbshipit-source-id: 8d2f72c2770793ec8cecee9dfab9aaaf2e1ad2b1
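
The commit message says the numbers "seem to be consistent" between the two runs. One way to check that claim is to compute the relative change in `avg_fwd` and `avg_bwd` for each benchmark. The sketch below is illustrative and not part of the repository: the values are copied from the two tables in the commit message, and the helper names `relative_change_percent` and `compare` are hypothetical.

```python
# (avg_fwd, avg_bwd) per benchmark, copied from the commit message above.
NON_DEFAULTS = {
    "cudnn": (5.057, 7.322), "aten": (5.602, 13.64), "jit": (7.019, 13.77),
    "jit_premul": (5.324, 12.01), "jit_premul_bias": (5.148, 11.62),
    "jit_simple": (6.69, 13.37), "jit_multilayer": (7.006, 13.67),
    "py": (19.05, 28.28), "resnet18": (8.712, 19.93),
    "resnet18_jit": (8.688, 19.79), "resnet50": (31.04, 66.44),
    "resnet50_jit": (31.11, 66.45),
}
DEFAULTS = {
    "cudnn": (5.086, 7.394), "aten": (5.611, 13.54), "jit": (7.062, 13.24),
    "jit_premul": (5.379, 11.57), "jit_premul_bias": (5.202, 11.13),
    "jit_simple": (6.648, 12.84), "jit_multilayer": (6.964, 13.24),
    "py": (19.14, 28.17), "resnet18": (8.713, 19.93),
    "resnet18_jit": (8.697, 19.78), "resnet50": (31.14, 66.57),
    "resnet50_jit": (31.21, 66.56),
}


def relative_change_percent(before, after):
    """Signed change from `before` to `after`, as a percentage of `before`."""
    return (after - before) / before * 100.0


def compare(before_run, after_run):
    """Map each benchmark name to (fwd change %, bwd change %)."""
    changes = {}
    for name, (fwd_before, bwd_before) in before_run.items():
        fwd_after, bwd_after = after_run[name]
        changes[name] = (
            relative_change_percent(fwd_before, fwd_after),
            relative_change_percent(bwd_before, bwd_after),
        )
    return changes


if __name__ == "__main__":
    for name, (fwd, bwd) in compare(NON_DEFAULTS, DEFAULTS).items():
        print(f"{name:16s} fwd {fwd:+6.2f}%  bwd {bwd:+6.2f}%")
```

A single run per configuration cannot separate a real effect from noise, so the reported `std_fwd` and `std_bwd` columns are still needed to interpret these differences.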
cpp/tensorexpr	[te][benchmark] Add more optimized versions of gemm (#48159)	2020-11-18 12:21:08 -08:00
distributed/ddp	Flake8 fixes (#48453)	2020-11-25 19:09:50 -08:00
fastrnns	remove non-default settings in fuser.py (#48862)	2020-12-05 20:58:39 -08:00
framework_overhead_benchmark	Remove py2 compatible future imports (#44735)	2020-09-16 12:55:57 -07:00
functional_autograd_benchmark	Reland of benchmark code (#43428)	2020-08-24 13:27:26 -07:00
operator_benchmark	[OpBench] change relu entry point after D24747035	2020-11-13 15:38:27 -08:00
overrides_benchmark	Add __torch_function__ for methods (#37091)	2020-08-05 20:44:13 -07:00
profiler_benchmark	Use libkineto in profiler (#46470)	2020-11-25 04:32:16 -08:00
record_function_benchmark	Fix D23995953 import.	2020-09-29 19:30:23 -07:00
serialization	[JIT] Make new zip serialization for torch save/load significantly (~70%) faster (#38379)	2020-05-29 01:56:18 -07:00
static_runtime	[PT][StaticRuntime] Move prim op impl to ops.cpp (#48210)	2020-11-18 23:07:39 -08:00
tensorexpr	[NVFuser]Benchmark minor update (#46778)	2020-10-26 12:22:36 -07:00
compare-fastrnn-results.py	Benchmarks: add scripts for FastRNNs results comparison. (#44134)	2020-09-03 13:44:42 -07:00
compare.sh	Benchmarks: add scripts for FastRNNs results comparison. (#44134)	2020-09-03 13:44:42 -07:00
README.md	Fix spelling errors	2020-01-28 04:46:15 -08:00
upload_scribe.py	Benchmarks: make fuser and executor configurable from command line. (#44291)	2020-09-09 11:59:35 -07:00

README.md

PyTorch Benchmarks

NOTE: This folder is currently a work in progress.

This folder contains scripts that produce reproducible timings of various PyTorch features.

It also provides mechanisms to compare PyTorch with other frameworks.

Setup environment

Make sure you're on a machine with CUDA, torchvision, and PyTorch installed. Install them in the following order:

# Install torchvision. It comes with the pytorch stable release binary
conda install pytorch torchvision -c pytorch

# Install the latest pytorch master from source.
# It should supersede the installation from the release binary.
cd $PYTORCH_HOME
python setup.py build develop

# Check the pytorch installation version
python -c "import torch; print(torch.__version__)"
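
After the steps above, it can help to confirm that everything the benchmarks expect is actually present. The following is a minimal sketch, not a script from this folder: the helper names `check_environment` and `cuda_available` are hypothetical, and it relies only on the standard library plus `torch.cuda.is_available()`.

```python
import importlib.util


def check_environment(packages=("torch", "torchvision")):
    """Map each package name to True if Python can locate it for import."""
    return {
        name: importlib.util.find_spec(name) is not None
        for name in packages
    }


def cuda_available():
    """Return True only if torch is installed and reports a usable CUDA device."""
    if importlib.util.find_spec("torch") is None:
        return False
    import torch  # imported lazily so the check also works without torch

    return torch.cuda.is_available()


if __name__ == "__main__":
    for name, present in check_environment().items():
        print(f"{name}: {'found' if present else 'MISSING'}")
    print(f"CUDA available: {cuda_available()}")
```

Checking with `find_spec` before importing keeps the script usable on a machine where the setup is incomplete, which is when such a check is most useful.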

Benchmark List

Please refer to each subfolder for details on its benchmark suite.

Fast RNNs benchmarks