pytorch/benchmarks
Ansha Yu 690c8b434f [static runtime] binding for aten::sub_out (#56656)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/56656

Test Plan:
```
./buck-out/opt/gen/caffe2/caffe2/fb/predictor/ptvsc2_predictor_bench --scripted_model=/data/users/ansha/tmp/adfinder/aug_1x/210616848_0.predictor.disagg.local.local.pt --pt_inputs=/data/users/ansha/tmp/adfinder/aug_1x/210616848_0.predictor.disagg.input_data.container.pt --iters=500 --warmup_iters=500 --num_threads=1 --pt_enable_static_runtime=1 --pt_cleanup_activations=true --pt_enable_out_variant=1 --pt_optimize_memory=1 --compare_results=1 --do_profile=1 --adsfinder_compatibility=1
```
```
Time per node type:
        1.85766 ms.    35.7817%. fb::sigrid_transforms_torch_bind (1 nodes)
         1.1238 ms.    21.6464%. aten::linear (6 nodes)
       0.858116 ms.    16.5288%. aten::argmin (1 nodes)
       0.334183 ms.    6.43694%. aten::matmul (1 nodes)
       0.173697 ms.     3.3457%. fb::clip_ranges_gather_sigrid_hash_v3 (77 nodes)
       0.118827 ms.    2.28881%. fb::clip_ranges_gather (263 nodes)
       0.101348 ms.    1.95215%. aten::sub (1 nodes)
      0.0748209 ms.    1.44118%. aten::repeat (1 nodes)
      0.0582576 ms.    1.12214%. aten::norm (1 nodes)
      0.0474353 ms.   0.913686%. fb::batch_box_cox (1 nodes)
      0.0457588 ms.   0.881393%. aten::__getitem__ (506 nodes)
      0.0435175 ms.   0.838222%. prim::TupleUnpack (254 nodes)
      0.0425416 ms.   0.819425%. aten::sigmoid (2 nodes)
      0.0383822 ms.   0.739308%. fb::offsets_to_ranges (253 nodes)
      0.0330187 ms.   0.635996%. aten::mul (3 nodes)
       0.027534 ms.   0.530352%. fb::simple_embedding_bag_sum (3 nodes)
      0.0274914 ms.   0.529532%. aten::pow (1 nodes)
      0.0236733 ms.   0.455989%. fb::casted_batch_one_hot_lengths (1 nodes)
       0.023348 ms.   0.449723%. fb::concat_add_mul_replacenan_clip (1 nodes)
      0.0193511 ms.   0.372735%. aten::sum (3 nodes)
      0.0188839 ms.   0.363737%. prim::DictConstruct (2 nodes)
      0.0183191 ms.   0.352858%. prim::TupleConstruct (1 nodes)
      0.0119029 ms.    0.22927%. aten::div (1 nodes)
      0.0103263 ms.   0.198902%. static_runtime::to_copy (8 nodes)
     0.00977658 ms.   0.188314%. prim::ListConstruct (4 nodes)
     0.00924042 ms.   0.177986%. fb::sigrid_hash_precompute (1 nodes)
     0.00692162 ms.   0.133322%. aten::contiguous (1 nodes)
     0.00567485 ms.   0.109307%. aten::narrow (4 nodes)
     0.00362285 ms.  0.0697823%. aten::logit (1 nodes)
     0.00329995 ms.  0.0635627%. aten::add (1 nodes)
     0.00285633 ms.  0.0550178%. aten::full (1 nodes)
     0.00268469 ms.  0.0517118%. fb::gather_ranges (4 nodes)
     0.00248577 ms.  0.0478803%. aten::stack (1 nodes)
     0.00241782 ms.  0.0465715%. aten::relu (1 nodes)
     0.00233674 ms.  0.0450096%. aten::clamp_min (1 nodes)
     0.00222238 ms.  0.0428068%. static_runtime::reshape_copy (2 nodes)
     0.00171177 ms.  0.0329716%. aten::size (3 nodes)
     0.00120008 ms.  0.0231155%. aten::expand_as (1 nodes)
     0.00112628 ms.  0.0216942%. fb::clip_ranges (2 nodes)
     0.00103193 ms.  0.0198768%. fb::lengths_to_offsets (3 nodes)
    0.000598624 ms.  0.0115305%. static_runtime::flatten_copy (1 nodes)
    0.000236196 ms. 0.00454954%. prim::device (1 nodes)
        5.19164 ms. in Total
StaticRuntime setup time: 0.000868 ms
Memory allocation time: 0.0109619 ms
Memory deallocation time: 0.071791 ms
Outputs deallocation time: 0.0560187 ms
Total memory managed: 1232320 bytes
Total number of reused tensors: 32
W0421 17:40:52.053653 1746499 PyTorchPredictorContainer.cpp:200] Failed to load metadata file
W0421 17:40:52.053757 1746499 PyTorchPredictorContainer.cpp:457] Couldn't find model param config file xl_model_weights/model_param_config
I0421 17:40:52.053779 1746499 PyTorchPredictorBenchLib.cpp:137] PyTorch predictor: number of prediction threads 1
I0421 17:40:52.185776 1746499 PyTorchPredictorBenchLib.cpp:230] PyTorch run finished. Milliseconds per iter: 131.985. Iters per second: 7.57661
I0421 17:40:52.337853 1746499 PtVsBlackBoxPredictorBenchLib.cpp:132] Finished comparing PT static runtime and jit interpreter results
```

Reviewed By: hlu1

Differential Revision: D27929253

fbshipit-source-id: 5a7984ba3ce2d6d4bce0a0ab6c5e09e8c037b44e
2021-04-22 08:40:35 -07:00
cpp/tensorexpr Use at::cpu in bench_approx (#56563) 2021-04-21 22:56:07 -07:00
distributed Add lint for unqualified type: ignore (#56290) 2021-04-21 08:07:23 -07:00
fastrnns Add lint for unqualified noqa (#56272) 2021-04-19 13:16:18 -07:00
framework_overhead_benchmark Remove py2 compatible future imports (#44735) 2020-09-16 12:55:57 -07:00
functional_autograd_benchmark Add lint for unqualified type: ignore (#56290) 2021-04-21 08:07:23 -07:00
instruction_counts Add lint for unqualified type: ignore (#56290) 2021-04-21 08:07:23 -07:00
operator_benchmark Add lint for unqualified noqa (#56272) 2021-04-19 13:16:18 -07:00
overrides_benchmark Remove legacy constructor calls from pytorch codebase. (#54142) 2021-04-11 15:45:17 -07:00
profiler_benchmark Use libkineto in profiler (#46470) 2020-11-25 04:32:16 -08:00
record_function_benchmark Fix D23995953 import. 2020-09-29 19:30:23 -07:00
serialization [JIT] Make new zip serialization for torch save/load significantly (~70%) faster (#38379) 2020-05-29 01:56:18 -07:00
sparse Add CSR (compressed sparse row) layout for sparse tensors (#50937) 2021-04-12 10:09:12 -07:00
static_runtime [static runtime] binding for aten::sub_out (#56656) 2021-04-22 08:40:35 -07:00
tensorexpr [NNC] Implementation for aten::cat without conditionals. (#53128) 2021-03-07 22:57:02 -08:00
compare-fastrnn-results.py Benchmarks: add scripts for FastRNNs results comparison. (#44134) 2020-09-03 13:44:42 -07:00
compare.sh Benchmarks: add scripts for FastRNNs results comparison. (#44134) 2020-09-03 13:44:42 -07:00
README.md Add CSR (compressed sparse row) layout for sparse tensors (#50937) 2021-04-12 10:09:12 -07:00
upload_scribe.py Benchmarks: make fuser and executor configurable from command line. (#44291) 2020-09-09 11:59:35 -07:00

PyTorch Benchmarks

This folder contains scripts that produce reproducible timings of various PyTorch features.

It also provides mechanisms to compare PyTorch with other frameworks.
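
As a minimal illustration of the kind of reproducible timing these scripts aim for, here is a hedged sketch using torch.utils.benchmark.Timer, a timing utility that ships with PyTorch. The matmul workload and tensor sizes are arbitrary examples and are not part of any suite in this folder.

```python
# A minimal timing sketch using torch.utils.benchmark; the matmul workload
# and tensor sizes here are illustrative, not taken from any benchmark suite.
import torch
import torch.utils.benchmark as benchmark

x = torch.randn(256, 256)
y = torch.randn(256, 256)

timer = benchmark.Timer(
    stmt="torch.mm(x, y)",
    globals={"torch": torch, "x": x, "y": y},
    num_threads=1,
)

# blocked_autorange() repeats the statement enough times to produce a
# stable measurement and reports the time per run.
print(timer.blocked_autorange())
```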

Setup environment

Make sure you are on a machine with CUDA available, and install PyTorch and torchvision in the following order:

# Install torchvision. It comes with the PyTorch stable release binary
conda install pytorch torchvision -c pytorch

# Install the latest PyTorch master from source.
# It should supersede the installation from the release binary.
cd $PYTORCH_HOME
python setup.py build develop

# Check the PyTorch installation version
python -c "import torch; print(torch.__version__)"
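
To also confirm the CUDA and torchvision requirements mentioned above, a quick sanity check could look like the following; it only prints the installed versions and whether CUDA is available.

```python
# Quick environment sanity check for the requirements listed above:
# PyTorch and torchvision versions plus CUDA availability.
import torch
import torchvision

print("torch:", torch.__version__)
print("torchvision:", torchvision.__version__)
print("CUDA available:", torch.cuda.is_available())
```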

Benchmark List

Please refer to each subfolder for the details of its benchmark suite.