pytorch/benchmarks
Nikolay Korovaiko 195ab5e864 remove non-default settings in fuser.py (#48862)
Summary:
I've noticed we are setting `_jit_set_num_profiled_runs` to 2 (which isn't our default) and sometimes we don't. We are also setting `_jit_set_bailout_depth` to 20 which **is** our default. I suggest we remove this logic altogether.
I did a quick run to see if there's any impact and thankfully, the numbers seem to be consistent, but we should try avoding testing configurations that aren't default or aren't  considered to become default.

 numactl -C 3 python -m fastrnns.bench --fuser=te --executor=profiling

non-defaults:

```
Namespace(cnns=None, cuda_pointwise_block_count=None, cuda_pointwise_block_size=None, cuda_pointwise_loop_level=None, device='cuda', executor='profiling', fuser='te', group=['cnns', 'rnns'], hiddenSize=512, inputSize=512, miniBatch=64, nloops=100, numLayers=1, print_json=None, rnns=None, sep=' ', seqLength=100, variable_lstms=False, warmup=10)
Benchmarking LSTMs...
            name          avg_fwd          std_fwd         info_fwd          avg_bwd          std_bwd         info_bwd
           cudnn            5.057          0.06287             None            7.322          0.07404             None
            aten            5.602          0.06303             None            13.64           0.4078             None
             jit            7.019          0.07995             None            13.77            0.554             None
      jit_premul            5.324          0.06203             None            12.01           0.2996             None
 jit_premul_bias            5.148          0.08061             None            11.62           0.4104             None
      jit_simple             6.69           0.2317             None            13.37           0.3791             None
  jit_multilayer            7.006            0.251             None            13.67           0.2239             None
              py            19.05           0.1119             None            28.28           0.6346             None

Benchmarking ResNets...
            name          avg_fwd          std_fwd         info_fwd          avg_bwd          std_bwd         info_bwd
        resnet18            8.712          0.01628             None            19.93          0.03512             None
    resnet18_jit            8.688          0.01374             None            19.79          0.07518             None
        resnet50            31.04          0.08049             None            66.44          0.08187             None
    resnet50_jit            31.11          0.07171             None            66.45          0.09157             None
```

defaults:
```
Namespace(cnns=None, cuda_pointwise_block_count=None, cuda_pointwise_block_size=None, cuda_pointwise_loop_level=None, device='cuda', executor='profiling', fuser='te', group=['cnns', 'rnns'], hiddenSize=512, inputSize=512, miniBatch=64, nloops=100, numLayers=1, print_json=None, rnns=None, sep=' ', seqLength=100, variable_lstms=False, warmup=10)
Benchmarking LSTMs...
            name          avg_fwd          std_fwd         info_fwd          avg_bwd          std_bwd         info_bwd
           cudnn            5.086            0.115             None            7.394           0.1743             None
            aten            5.611           0.2559             None            13.54            0.387             None
             jit            7.062           0.3358             None            13.24           0.3688             None
      jit_premul            5.379           0.2086             None            11.57           0.3987             None
 jit_premul_bias            5.202           0.2127             None            11.13          0.06748             None
      jit_simple            6.648          0.05794             None            12.84           0.3047             None
  jit_multilayer            6.964           0.1104             None            13.24           0.3283             None
              py            19.14          0.09959             None            28.17           0.4946             None

Benchmarking ResNets...
            name          avg_fwd          std_fwd         info_fwd          avg_bwd          std_bwd         info_bwd
        resnet18            8.713          0.01563             None            19.93          0.02759             None
    resnet18_jit            8.697          0.01792             None            19.78          0.06916             None
        resnet50            31.14          0.07431             None            66.57          0.07418             None
    resnet50_jit            31.21           0.0677             None            66.56          0.08655             None

```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48862

Reviewed By: bertmaher

Differential Revision: D25342097

Pulled By: Krovatkin

fbshipit-source-id: 8d2f72c2770793ec8cecee9dfab9aaaf2e1ad2b1
2020-12-05 20:58:39 -08:00
..
cpp/tensorexpr [te][benchmark] Add more optimized versions of gemm (#48159) 2020-11-18 12:21:08 -08:00
distributed/ddp Flake8 fixes (#48453) 2020-11-25 19:09:50 -08:00
fastrnns remove non-default settings in fuser.py (#48862) 2020-12-05 20:58:39 -08:00
framework_overhead_benchmark Remove py2 compatible future imports (#44735) 2020-09-16 12:55:57 -07:00
functional_autograd_benchmark Reland of benchmark code (#43428) 2020-08-24 13:27:26 -07:00
operator_benchmark [OpBench] change relu entry point after D24747035 2020-11-13 15:38:27 -08:00
overrides_benchmark Add __torch_function__ for methods (#37091) 2020-08-05 20:44:13 -07:00
profiler_benchmark Use libkineto in profiler (#46470) 2020-11-25 04:32:16 -08:00
record_function_benchmark Fix D23995953 import. 2020-09-29 19:30:23 -07:00
serialization [JIT] Make new zip serialization for torch save/load significantly (~70%) faster (#38379) 2020-05-29 01:56:18 -07:00
static_runtime [PT][StaticRuntime] Move prim op impl to ops.cpp (#48210) 2020-11-18 23:07:39 -08:00
tensorexpr [NVFuser]Benchmark minor update (#46778) 2020-10-26 12:22:36 -07:00
compare-fastrnn-results.py Benchmarks: add scripts for FastRNNs results comparison. (#44134) 2020-09-03 13:44:42 -07:00
compare.sh Benchmarks: add scripts for FastRNNs results comparison. (#44134) 2020-09-03 13:44:42 -07:00
README.md Fix spelling errors 2020-01-28 04:46:15 -08:00
upload_scribe.py Benchmarks: make fuser and executor configurable from command line. (#44291) 2020-09-09 11:59:35 -07:00

PyTorch Benchmarks

NOTE: This folder is currently work in progress.

This folder contains scripts that produce reproducible timings of various PyTorch features.

It also provides mechanisms to compare PyTorch with other frameworks.

Setup environment

Make sure you're on a machine with CUDA, torchvision, and pytorch installed. Install in the following order:

# Install torchvision. It comes with the pytorch stable release binary
conda install pytorch torchvision -c pytorch

# Install the latest pytorch master from source.
# It should supersede the installation from the release binary.
cd $PYTORCH_HOME
python setup.py build develop

# Check the pytorch installation version
python -c "import torch; print(torch.__version__)"

Benchmark List

Please refer to each subfolder to discover each benchmark suite