
# PyTorch Benchmarks

This folder contains scripts that produce reproducible timings of various PyTorch features.

It also provides mechanisms to compare PyTorch with other frameworks.
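To illustrate what "reproducible timings" typically involves, here is a minimal sketch (plain Python, no PyTorch required) of the warmup-then-repeat pattern benchmark suites commonly build on. The helper `time_op` and the toy workload are hypothetical illustrations, not part of any suite in this folder:

```python
import statistics
import timeit

def time_op(fn, warmup=5, repeats=20):
    """Return the median runtime in seconds over `repeats` timed runs,
    after `warmup` untimed calls (warmup amortizes one-time costs such
    as allocator or JIT startup, making repeated runs comparable)."""
    for _ in range(warmup):
        fn()
    times = [timeit.timeit(fn, number=1) for _ in range(repeats)]
    return statistics.median(times)

# Hypothetical workload standing in for a PyTorch operator.
elapsed = time_op(lambda: sum(i * i for i in range(10_000)))
print(f"{elapsed * 1e6:.1f} us")
```

Reporting the median rather than the mean makes the result robust to occasional slow runs caused by OS scheduling noise.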

## Setup environment

Make sure you're on a machine with CUDA available, then install `torchvision` and PyTorch in the following order:

```bash
# Install torchvision. It comes with the pytorch stable release binary.
conda install pytorch torchvision -c pytorch

# Install the latest pytorch master from source.
# It should supersede the installation from the release binary.
cd $PYTORCH_HOME
python setup.py build develop

# Check the pytorch installation version.
python -c "import torch; print(torch.__version__)"
```
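The version string itself indicates which installation is active: a build from source usually carries a local-version suffix (an alpha tag plus a git hash), while a stable release binary reports a plain version. A small helper (hypothetical, not part of this repo) makes the distinction explicit:

```python
def is_dev_build(version: str) -> bool:
    """Heuristic: a PyTorch build from source reports a version such as
    '1.11.0a0+git1a2b3c4', while a stable release binary reports a
    plain version such as '1.10.0'."""
    return "a0" in version or "+git" in version

print(is_dev_build("1.10.0"))             # release binary -> False
print(is_dev_build("1.11.0a0+git1a2b3"))  # source build   -> True
```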

## Benchmark List

Please refer to each subfolder to discover each benchmark suite.