PyTorch Benchmarks

This folder contains scripts that produce reproducible timings of various PyTorch features.

It also provides mechanisms to compare PyTorch with other frameworks.
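To make timings reproducible, benchmark harnesses like the ones in this folder typically warm up first and then report a robust statistic (such as the median) over many runs. A minimal sketch of that pattern, using only the Python standard library (the helper name `bench` and its parameters are illustrative, not part of this repo):

```python
import timeit
import statistics

def bench(fn, warmup=5, repeats=20, number=100):
    # Warm up to avoid measuring one-time setup costs (caching, JIT, etc.).
    for _ in range(warmup):
        fn()
    # Time several batches and report the median per-call time in seconds;
    # the median is less sensitive to outliers than the mean.
    times = [timeit.timeit(fn, number=number) / number for _ in range(repeats)]
    return statistics.median(times)

median_s = bench(lambda: sum(range(1000)))
print(f"median per-call time: {median_s:.3e} s")
```

The individual suites in this folder build on the same idea, often via `torch.utils.benchmark`, which additionally handles thread counts and CUDA synchronization.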

Setup environment

Make sure you're on a machine with CUDA available. Then install pytorch and torchvision in the following order:

# Install torchvision. It comes with the pytorch stable release binary.
conda install pytorch torchvision -c pytorch

# Install the latest pytorch master from source.
# It should supersede the installation from the release binary.
cd $PYTORCH_HOME
python setup.py build develop

# Check the pytorch installation version
python -c "import torch; print(torch.__version__)"

Benchmark List

Please refer to each subfolder to discover each benchmark suite.