pytorch/torch/csrc/jit/runtime/static/ProcessedNodeInputs.cpp
Scott Wolchok 639258499f [PyTorch][Static Runtime] Add & use "small array" for ProcessedNodeInputs (#67935)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67935

Rationale should be documented in code comments. In short, we
can avoid heap-allocating arrays of input indexes for operators with 5
or fewer inputs, at the cost of a tag bit check on access.
ghstack-source-id: 143429112

Test Plan:
Patched d1jang's D32181666, which prints static runtime memory usage.

Previous diff, local:

```
I1105 12:17:36.459688 866763 PyTorchStaticRuntimePredictor.cpp:82] memory turnover after creating an instance of StaticRuntime: 354208
```

This diff, local:

```
I1105 12:48:35.820663 1066520 PyTorchStaticRuntimePredictor.cpp:82] memory turnover after creating an instance of StaticRuntime: 338064
```
4.5% savings (16144 bytes)

Ran 10 repetitions of CMF local_ro with core pinning: P467095603. This diff is perf neutral compared to the previous diff.

Reviewed By: hlu1

Differential Revision: D32216573

fbshipit-source-id: d18483db255f75f1d90e610ecded7727c6ffe65c
2021-11-16 10:21:12 -08:00

2 lines
63 B
C++

#include <torch/csrc/jit/runtime/static/ProcessedNodeInputs.h>