pytorch/benchmarks
Ansha Yu 690c8b434f [static runtime] binding for aten::sub_out (#56656)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/56656

Test Plan:
```
./buck-out/opt/gen/caffe2/caffe2/fb/predictor/ptvsc2_predictor_bench --scripted_model=/data/users/ansha/tmp/adfinder/aug_1x/210616848_0.predictor.disagg.local.local.pt --pt_inputs=/data/users/ansha/tmp/adfinder/aug_1x/210616848_0.predictor.disagg.input_data.container.pt --iters=500 --warmup_iters=500 --num_threads=1 --pt_enable_static_runtime=1 --pt_cleanup_activations=true --pt_enable_out_variant=1 --pt_optimize_memory=1 --compare_results=1 --do_profile=1 --adsfinder_compatibility=1
```
```
Time per node type:
        1.85766 ms.    35.7817%. fb::sigrid_transforms_torch_bind (1 nodes)
         1.1238 ms.    21.6464%. aten::linear (6 nodes)
       0.858116 ms.    16.5288%. aten::argmin (1 nodes)
       0.334183 ms.    6.43694%. aten::matmul (1 nodes)
       0.173697 ms.     3.3457%. fb::clip_ranges_gather_sigrid_hash_v3 (77 nodes)
       0.118827 ms.    2.28881%. fb::clip_ranges_gather (263 nodes)
       0.101348 ms.    1.95215%. aten::sub (1 nodes)
      0.0748209 ms.    1.44118%. aten::repeat (1 nodes)
      0.0582576 ms.    1.12214%. aten::norm (1 nodes)
      0.0474353 ms.   0.913686%. fb::batch_box_cox (1 nodes)
      0.0457588 ms.   0.881393%. aten::__getitem__ (506 nodes)
      0.0435175 ms.   0.838222%. prim::TupleUnpack (254 nodes)
      0.0425416 ms.   0.819425%. aten::sigmoid (2 nodes)
      0.0383822 ms.   0.739308%. fb::offsets_to_ranges (253 nodes)
      0.0330187 ms.   0.635996%. aten::mul (3 nodes)
       0.027534 ms.   0.530352%. fb::simple_embedding_bag_sum (3 nodes)
      0.0274914 ms.   0.529532%. aten::pow (1 nodes)
      0.0236733 ms.   0.455989%. fb::casted_batch_one_hot_lengths (1 nodes)
       0.023348 ms.   0.449723%. fb::concat_add_mul_replacenan_clip (1 nodes)
      0.0193511 ms.   0.372735%. aten::sum (3 nodes)
      0.0188839 ms.   0.363737%. prim::DictConstruct (2 nodes)
      0.0183191 ms.   0.352858%. prim::TupleConstruct (1 nodes)
      0.0119029 ms.    0.22927%. aten::div (1 nodes)
      0.0103263 ms.   0.198902%. static_runtime::to_copy (8 nodes)
     0.00977658 ms.   0.188314%. prim::ListConstruct (4 nodes)
     0.00924042 ms.   0.177986%. fb::sigrid_hash_precompute (1 nodes)
     0.00692162 ms.   0.133322%. aten::contiguous (1 nodes)
     0.00567485 ms.   0.109307%. aten::narrow (4 nodes)
     0.00362285 ms.  0.0697823%. aten::logit (1 nodes)
     0.00329995 ms.  0.0635627%. aten::add (1 nodes)
     0.00285633 ms.  0.0550178%. aten::full (1 nodes)
     0.00268469 ms.  0.0517118%. fb::gather_ranges (4 nodes)
     0.00248577 ms.  0.0478803%. aten::stack (1 nodes)
     0.00241782 ms.  0.0465715%. aten::relu (1 nodes)
     0.00233674 ms.  0.0450096%. aten::clamp_min (1 nodes)
     0.00222238 ms.  0.0428068%. static_runtime::reshape_copy (2 nodes)
     0.00171177 ms.  0.0329716%. aten::size (3 nodes)
     0.00120008 ms.  0.0231155%. aten::expand_as (1 nodes)
     0.00112628 ms.  0.0216942%. fb::clip_ranges (2 nodes)
     0.00103193 ms.  0.0198768%. fb::lengths_to_offsets (3 nodes)
    0.000598624 ms.  0.0115305%. static_runtime::flatten_copy (1 nodes)
    0.000236196 ms. 0.00454954%. prim::device (1 nodes)
        5.19164 ms. in Total
StaticRuntime setup time: 0.000868 ms
Memory allocation time: 0.0109619 ms
Memory deallocation time: 0.071791 ms
Outputs deallocation time: 0.0560187 ms
Total memory managed: 1232320 bytes
Total number of reused tensors: 32
W0421 17:40:52.053653 1746499 PyTorchPredictorContainer.cpp:200] Failed to load metadata file
W0421 17:40:52.053757 1746499 PyTorchPredictorContainer.cpp:457] Couldn't find model param config file xl_model_weights/model_param_config
I0421 17:40:52.053779 1746499 PyTorchPredictorBenchLib.cpp:137] PyTorch predictor: number of prediction threads 1
I0421 17:40:52.185776 1746499 PyTorchPredictorBenchLib.cpp:230] PyTorch run finished. Milliseconds per iter: 131.985. Iters per second: 7.57661
I0421 17:40:52.337853 1746499 PtVsBlackBoxPredictorBenchLib.cpp:132] Finished comparing PT static runtime and jit interpreter results
```

Reviewed By: hlu1

Differential Revision: D27929253

fbshipit-source-id: 5a7984ba3ce2d6d4bce0a0ab6c5e09e8c037b44e
2021-04-22 08:40:35 -07:00
cpp/tensorexpr Use at::cpu in bench_approx (#56563) 2021-04-21 22:56:07 -07:00
distributed Add lint for unqualified type: ignore (#56290) 2021-04-21 08:07:23 -07:00
fastrnns Add lint for unqualified noqa (#56272) 2021-04-19 13:16:18 -07:00
framework_overhead_benchmark Remove py2 compatible future imports (#44735) 2020-09-16 12:55:57 -07:00
functional_autograd_benchmark Add lint for unqualified type: ignore (#56290) 2021-04-21 08:07:23 -07:00
instruction_counts Add lint for unqualified type: ignore (#56290) 2021-04-21 08:07:23 -07:00
operator_benchmark Add lint for unqualified noqa (#56272) 2021-04-19 13:16:18 -07:00
overrides_benchmark Remove legacy constructor calls from pytorch codebase. (#54142) 2021-04-11 15:45:17 -07:00
profiler_benchmark Use libkineto in profiler (#46470) 2020-11-25 04:32:16 -08:00
record_function_benchmark Fix D23995953 import. 2020-09-29 19:30:23 -07:00
serialization [JIT] Make new zip serialization for torch save/load significantly (~70%) faster (#38379) 2020-05-29 01:56:18 -07:00
sparse Add CSR (compressed sparse row) layout for sparse tensors (#50937) 2021-04-12 10:09:12 -07:00
static_runtime [static runtime] binding for aten::sub_out (#56656) 2021-04-22 08:40:35 -07:00
tensorexpr [NNC] Implementation for aten::cat without conditionals. (#53128) 2021-03-07 22:57:02 -08:00
compare-fastrnn-results.py Benchmarks: add scripts for FastRNNs results comparison. (#44134) 2020-09-03 13:44:42 -07:00
compare.sh Benchmarks: add scripts for FastRNNs results comparison. (#44134) 2020-09-03 13:44:42 -07:00
README.md Add CSR (compressed sparse row) layout for sparse tensors (#50937) 2021-04-12 10:09:12 -07:00
upload_scribe.py Benchmarks: make fuser and executor configurable from command line. (#44291) 2020-09-09 11:59:35 -07:00

PyTorch Benchmarks

This folder contains scripts that produce reproducible timings of various PyTorch features.

It also provides mechanisms to compare PyTorch with other frameworks.
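
As a minimal illustration of the kind of reproducible timing these scripts aim for, here is a hedged sketch using torch.utils.benchmark.Timer, a timing utility that ships with PyTorch. The matmul workload and tensor sizes are arbitrary examples and are not part of any suite in this folder.

```python
# A minimal timing sketch using torch.utils.benchmark; the matmul workload
# and tensor sizes here are illustrative, not taken from any benchmark suite.
import torch
import torch.utils.benchmark as benchmark

x = torch.randn(256, 256)
y = torch.randn(256, 256)

timer = benchmark.Timer(
    stmt="torch.mm(x, y)",
    globals={"torch": torch, "x": x, "y": y},
    num_threads=1,
)

# blocked_autorange() repeats the statement enough times to produce a
# stable measurement and reports the time per run.
print(timer.blocked_autorange())
```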

Setup environment

Make sure you are on a machine with CUDA available, and install PyTorch and torchvision in the following order:

# Install torchvision. It comes with the PyTorch stable release binary
conda install pytorch torchvision -c pytorch

# Install the latest PyTorch master from source.
# It should supersede the installation from the release binary.
cd $PYTORCH_HOME
python setup.py build develop

# Check the PyTorch installation version
python -c "import torch; print(torch.__version__)"
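
To also confirm the CUDA and torchvision requirements mentioned above, a quick sanity check could look like the following; it only prints the installed versions and whether CUDA is available.

```python
# Quick environment sanity check for the requirements listed above:
# PyTorch and torchvision versions plus CUDA availability.
import torch
import torchvision

print("torch:", torch.__version__)
print("torchvision:", torchvision.__version__)
print("CUDA available:", torch.cuda.is_available())
```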

Benchmark List

Please refer to each subfolder for the details of its benchmark suite.