pytorch/torch/csrc/jit/runtime
Mike Iovine 63c1f2fef9 [Static Runtime] Fold linear prepack ops (#85289)
Summary: Split `quantized_linear_unpacked_weight_v2` into `linear_prepack` and `quantized_linear` so that the prepacking operation may be eliminated by constant folding.
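
As an illustration of why the split helps, here is a minimal toy sketch of constant folding. It does not use PyTorch or Static Runtime code: `Node`, `prepack`, `linear`, `fold_constants`, and `run` are hypothetical stand-ins. It assumes the weight is a graph constant, so a standalone prepack node can be evaluated once ahead of time and dropped from the graph.

```python
from dataclasses import dataclass


@dataclass
class Node:
    op: str        # operator name, looked up in OPS
    inputs: list   # names of the values this node reads
    output: str    # name of the value this node writes


def prepack(weight):
    # Hypothetical stand-in for an expensive weight-packing step.
    return tuple(tuple(row) for row in weight)


def linear(x, packed):
    # Toy matrix-vector product over the packed rows.
    return [sum(xi * wi for xi, wi in zip(x, row)) for row in packed]


OPS = {"prepack": prepack, "linear": linear}


def fold_constants(nodes, constants):
    """Evaluate nodes whose inputs are all constants and drop them from the graph."""
    constants = dict(constants)
    remaining = []
    for node in nodes:
        if all(name in constants for name in node.inputs):
            args = [constants[name] for name in node.inputs]
            constants[node.output] = OPS[node.op](*args)
        else:
            remaining.append(node)
    return remaining, constants


def run(nodes, constants, inputs):
    """Execute the nodes in order and return the final value environment."""
    env = {**constants, **inputs}
    for node in nodes:
        env[node.output] = OPS[node.op](*(env[name] for name in node.inputs))
    return env


# The split form: prepack depends only on the constant weight "w".
graph = [
    Node("prepack", ["w"], "w_packed"),
    Node("linear", ["x", "w_packed"], "y"),
]
constants = {"w": [[1, 2], [3, 4]]}

folded_graph, folded_constants = fold_constants(graph, constants)
result = run(folded_graph, folded_constants, {"x": [1, 1]})["y"]
```

In this toy, a fused op that packs and multiplies in one node could not be folded, because it also depends on the runtime input `x`; splitting it leaves only `linear` to run per call.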

Test Plan:
Fixes a huge regression in an internal model:

```
Before
        89.6141 ms.    99.0923%. fb::quantized_linear_unpacked_weight_v2 (12 nodes)
After
       0.806852 ms.    53.5365%. quantized::linear (12 nodes, out variant)
(prepacking eliminated)
```

Differential Revision: D39622530

Pull Request resolved: https://github.com/pytorch/pytorch/pull/85289
Approved by: https://github.com/davidberard98
2022-09-22 20:23:07 +00:00
interpreter
static [Static Runtime] Fold linear prepack ops (#85289) 2022-09-22 20:23:07 +00:00
argument_spec.cpp
argument_spec.h
autodiff.cpp
autodiff.h
calculate_necessary_args.h
custom_operator.h Revert "Fix crash on unload torch cpu dll (#67632)" 2022-08-02 00:56:18 +00:00
decomposition_registry_util.cpp
decomposition_registry_util.h
decomposition_registry.cpp [reland 2] Call jit decomp in VariableType to improve forward AD coverage (#84976) 2022-09-15 22:46:19 +00:00
decomposition_registry.h [reland 2] Call jit decomp in VariableType to improve forward AD coverage (#84976) 2022-09-15 22:46:19 +00:00
exception_message.h
graph_executor_impl.h
graph_executor.cpp Autograd graphtask trim unnecessary edges (#82544) 2022-08-11 18:50:09 +00:00
graph_executor.h
graph_iterator.h
instruction.cpp
instruction.h
interpreter.cpp
interpreter.h
jit_exception.cpp
jit_exception.h
jit_trace.cpp
jit_trace.h
logging.cpp
logging.h
operator_options.h
operator.cpp
operator.h Add OpOverload.decompose API (#83075) 2022-08-09 18:53:19 +00:00
print_handler.cpp
print_handler.h
profiling_graph_executor_impl.cpp
profiling_graph_executor_impl.h
profiling_record.cpp Revert "Fix crash on unload torch cpu dll (#67632)" 2022-08-02 00:56:18 +00:00
profiling_record.h Revert "Fix crash on unload torch cpu dll (#67632)" 2022-08-02 00:56:18 +00:00
register_c10_ops.cpp
register_cuda_ops.cpp [ROCm] Enable/fix unit tests test_stream_args and test_event_args (#82346) 2022-08-01 22:55:15 +00:00
register_distributed_ops.cpp
register_ops_utils.cpp
register_ops_utils.h
register_prim_ops_fulljit.cpp
register_prim_ops.cpp Performance optimizations to proxy tensor (#85049) 2022-09-16 00:28:50 +00:00
register_special_ops.cpp
script_profile.cpp
script_profile.h
serialized_shape_function_registry.cpp Fix for transposed convolution shape functions (#83557) 2022-08-22 19:05:41 +00:00
serialized_shape_function_registry.h
shape_function_registry.h
simple_graph_executor_impl.cpp
simple_graph_executor_impl.h
slice_indices_adjust.cpp
slice_indices_adjust.h
symbolic_script.cpp Supports symbolic diff for silu (#81724) 2022-08-09 01:18:10 +00:00
symbolic_script.h
symbolic_shape_registry_util.cpp [NNC] add eltwise OPs: mish and elu (#80586) 2022-09-17 01:44:34 +00:00
symbolic_shape_registry_util.h
symbolic_shape_registry.cpp Adding additional debug logging and documentation for shape functions (#77115) 2022-08-15 23:39:28 +00:00
symbolic_shape_registry.h
vararg_functions.cpp
vararg_functions.h
variable_tensor_list.h