Commit Graph

14 Commits

Author SHA1 Message Date
Vasiliy Kuznetsov
5b37ac54cb dbr quant overhead [14/x]: cache whether an op is a module (#68877)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68877

Caches whether an op's type is a module during tracing, so we
can avoid recalculating this when validating the op during inference.
This leads to a small speedup.
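
A minimal sketch of the pattern (names such as `SeenOpInfo.op_is_module`
are hypothetical, not the PR's actual code):
```
import torch
import torch.nn.functional as F

class SeenOpInfo:
    """Hypothetical per-op record captured during tracing."""
    def __init__(self, op):
        self.op = op
        # computed once at trace time, reused on every inference call
        self.op_is_module = isinstance(op, torch.nn.Module)

def validate_cur_op(cur_op, seen_op_info):
    # read the cached flag instead of redoing the isinstance check
    if seen_op_info.op_is_module:
        assert type(cur_op) is type(seen_op_info.op)
    else:
        assert cur_op is seen_op_info.op

validate_cur_op(F.relu, SeenOpInfo(F.relu))
conv = torch.nn.Conv2d(3, 8, 1)
validate_cur_op(conv, SeenOpInfo(conv))
```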

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```

```
// MobileNetV2, 1x3x224x224, function level profiling

// before
validate_cur_op - 1.77%

// after
validate_cur_op - 1.41%

```

Reviewed By: jerryzh168

Differential Revision: D32646149

Pulled By: vkuzo

fbshipit-source-id: 03ebc4fedceb84bb885939dff8dec81d30ba6892
2021-11-30 06:13:06 -08:00
Vasiliy Kuznetsov
f253370bb9 dbr quant overhead [13/x]: cache results of get_module_hook_type (#68841)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68841

Caches the current module's hook type as an attribute on the module.
This relies on the assumption that a module's hook type does not
change during inference, which is an assumption we can commit to.
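
A sketch of the memoization (the attribute name `_cached_hook_type`
and the enum are hypothetical, not the PR's actual code):
```
import enum
import torch

class HookType(enum.Enum):
    OP_HOOKS = 0
    NONE = 1

def get_module_hook_type(mod: torch.nn.Module) -> HookType:
    # first call computes the hook type and stashes it on the module;
    # later calls hit the cache, relying on the hook type being stable
    cached = getattr(mod, '_cached_hook_type', None)
    if cached is None:
        cached = (HookType.OP_HOOKS if hasattr(mod, '_auto_quant_state')
                  else HookType.NONE)
        mod._cached_hook_type = cached
    return cached
```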

Test Plan:
correctness
```
python test/test_quantization.py TestQuantizeDBR
```

performance
```
// MobileNetV2, 1x3x224x224, function profiling

// before
get_module_hook_type -> 2.58%

// after
get_module_hook_type -> 0.73%
```

Reviewed By: jerryzh168

Differential Revision: D32630881

Pulled By: vkuzo

fbshipit-source-id: 667f2667ef9c5514e5d82e4e7e4c02b8238edc65
2021-11-29 16:10:24 -08:00
Vasiliy Kuznetsov
e1c449ff34 dbr quant overhead[9/x]: precalculate when to skip op_convert_after_hook (#68432)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68432

Speeds up `op_convert_after_hook` by precalculating, from information
gathered while tracing, whether this hook is a no-op, and skipping
execution when the precalculated flag is set.
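
The shape of the fast path, sketched (the flag name is hypothetical):
```
class SeenOpInfo:
    """Hypothetical trace-time record for one op."""
    def __init__(self, op_convert_hook_is_noop):
        # decided once during tracing: True when this op's convert-time
        # hook has no quantization work to do
        self.op_convert_hook_is_noop = op_convert_hook_is_noop

def op_convert_after_hook(output, seen_op_info):
    # early return: skip all hook work when tracing proved it a no-op
    if seen_op_info.op_convert_hook_is_noop:
        return output
    # ... otherwise run the usual quantize/dequantize logic on `output`
    return output
```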

```
MobileNetV2, function level profiling, 1x3x224x224

// before
op_convert_before_hook = 3.25%

// after
op_convert_before_hook = 1.35%
```

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```

Reviewed By: jerryzh168

Differential Revision: D32463752

Pulled By: vkuzo

fbshipit-source-id: b0c3d37909ddc8c254fe53f90954f625ae874e3b
2021-11-21 07:08:29 -08:00
Vasiliy Kuznetsov
f1021bcf38 dbr quant overhead[8/x]: small speedup in op_needs_quantization (#68373)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68373

Removes redundant logic in `op_needs_quantization`, for a small speedup.

Test Plan:
```
// MobileNetV2, 1x3x224x224 input, % of time spent by function during DBR convert

// before
cur_op_needs_hooks - 0.76%
op_needs_quantization - 0.41%

// after
cur_op_needs_hooks - 0.70%
op_needs_quantization - 0.36%
```

Reviewed By: jerryzh168

Differential Revision: D32463762

Pulled By: vkuzo

fbshipit-source-id: 334591c514dfa5af6fabc1390005088e8c5ca952
2021-11-21 07:08:17 -08:00
Vasiliy Kuznetsov
16a6e0612d dbr quant: clean up key types in AutoQuantizationState mappings (#68369)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68369

`AutoQuantizationState` has various mappings keyed on IDs. Only
`tensor_id_to_observer` actually needs string keys, because it is a
`torch.nn.ModuleDict`.  This PR changes the other mappings to have
integer keys, for simplicity and performance.
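
Why `tensor_id_to_observer` is the exception, sketched (the
scale/zero_point mapping is illustrative):
```
import torch
from torch.ao.quantization import MinMaxObserver

class AutoQuantizationState(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # torch.nn.ModuleDict only accepts string keys, so observer
        # lookups have to go through str(tensor_id)
        self.tensor_id_to_observer = torch.nn.ModuleDict(
            {'0': MinMaxObserver()})
        # a plain dict can key on int directly: simpler and faster
        self.tensor_id_to_scale_zp = {0: (0.1, 128)}

state = AutoQuantizationState()
obs = state.tensor_id_to_observer[str(0)]   # string key required
scale, zp = state.tensor_id_to_scale_zp[0]  # int key, no str() call
```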

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```

Reviewed By: jerryzh168

Differential Revision: D32463765

Pulled By: vkuzo

fbshipit-source-id: 5a9bf2a1102859097eedf1e536761084cd408856
2021-11-21 07:08:06 -08:00
Vasiliy Kuznetsov
3fc9bc43c6 dbr quant overhead[4/x]: speed up hook type calculations (#68351)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68351

Speeds up `get_module_hook_type` and `get_torch_function_hook_type` by
bypassing the expensive `torch.nn.Module` getters and setters and
fetching `_auto_quant_state` directly.
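
The idea, sketched (not the exact PR code): a failed normal attribute
lookup on a `torch.nn.Module` falls through to `__getattr__`, which
searches `_parameters`, `_buffers`, and `_modules` in turn, while
`_modules` itself lives in the instance `__dict__`:
```
import torch

def get_auto_quant_state(mod: torch.nn.Module):
    # one dict lookup, instead of the linear search performed by
    # torch.nn.Module.__getattr__ when evaluating
    # getattr(mod, '_auto_quant_state', None)
    return mod._modules.get('_auto_quant_state', None)
```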

Test Plan:
Model level benchmarking is noisy.  Individual `cProfile` results:

```
// MobileNetV2, 1x3x224x224 input, % of time spent by function during DBR convert

// before
get_module_hook_type - 5.96%
get_torch_function_hook_type - 2.24%

// after
get_module_hook_type - 2.10%
get_torch_function_hook_type - 0.57%
```

Reviewed By: jerryzh168

Differential Revision: D32463756

Pulled By: vkuzo

fbshipit-source-id: 6eb199052ddf8d78f1c123a427e7437fc7c4fe58
2021-11-21 07:08:03 -08:00
Vasiliy Kuznetsov
9fba8971a7 dbr quant: move model level utils into own file (#68346)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68346

Some utility functions for DBR quant need to be aware
of `AutoQuantizationState`.  This PR moves them into their own file, so they
can use the type directly without circular imports, and removes the mypy
ignores which are no longer necessary after this change.
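
The resulting layout, sketched with illustrative file and function
names:
```
# quantization_state.py
#     class AutoQuantizationState(torch.nn.Module): ...
#
# model_utils.py   <- new file from this PR
#     from quantization_state import AutoQuantizationState
#     def has_quant_state(mod) -> bool:
#         # the real type is importable here, so annotations work
#         # and the mypy ignores can be dropped
#         return isinstance(getattr(mod, '_auto_quant_state', None),
#                           AutoQuantizationState)
#
# auto_trace.py
#     import model_utils   # nothing imports back up: no cycle
```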

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```

Reviewed By: jerryzh168

Differential Revision: D32463763

Pulled By: vkuzo

fbshipit-source-id: e2c367de0d5887c61e6d2c3a73d82f7d76af3de1
2021-11-20 15:17:10 -08:00
Vasiliy Kuznetsov
52cc9cb0ee dbr quant: refactor AutoQuantizationState._get_packed_param_name (#68344)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68344

Makes `AutoQuantizationState._get_packed_param_name` use `seen_op_info`
instead of the current op. This will make future performance improvements
easier.

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```

Reviewed By: albanD

Differential Revision: D32463758

Pulled By: vkuzo

fbshipit-source-id: 0c16fe4bc989cb66180ad674ec55060cd970e32e
2021-11-20 15:17:04 -08:00
Vasiliy Kuznetsov
2755cf457c dbr quant: refactor AutoQuantizationState._get_input_args_quant_dequant_info (#68343)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68343

Refactors `AutoQuantizationState._get_input_args_quant_dequant_info` to
use less internal state, removes its side effects by passing the state
in through the arguments, and moves the function to the utils file.

This will help with a future refactor to cache this info at runtime.
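
The property that enables the later caching, sketched (field names are
hypothetical):
```
from types import SimpleNamespace

def get_input_args_quant_dequant_info(seen_op_info, tensor_id_to_scale_zp):
    # pure function: all state arrives through the arguments and none
    # of it is mutated, so the result can be cached per seen_op_info
    return [tensor_id_to_scale_zp.get(tid)  # (scale, zp), or None
            for tid in seen_op_info.input_tensor_ids]

info = SimpleNamespace(input_tensor_ids=[0, 1])
print(get_input_args_quant_dequant_info(info, {0: (0.05, 128)}))
# [(0.05, 128), None]
```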

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```

Reviewed By: jerryzh168

Differential Revision: D32463760

Pulled By: vkuzo

fbshipit-source-id: bdd50b0772f128755f9b734b5eeb0a9f4bc4970b
2021-11-20 15:17:02 -08:00
Vasiliy Kuznetsov
57472ec414 dbr quant: refactor get_quantized_op to only use seen_op_info (#68342)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68342

Before this PR, `get_quantized_op` required the current callable.

After this PR, `get_quantized_op` only requires `seen_op_info`.
The signature was changed slightly to return `None` if the original
callable does not need replacement for quantization.

This will make it easier to make performance improvements in a
future PR.
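
A sketch of the new contract (the mapping and field names are
hypothetical):
```
from types import SimpleNamespace
from typing import Callable, Optional
import torch

# hypothetical trace-time mapping of fp32 callables to quantized kernels
fp32_to_int8_fun_mapping = {torch.add: torch.ops.quantized.add}

def get_quantized_op(seen_op_info) -> Optional[Callable]:
    # None means the original callable needs no replacement
    return fp32_to_int8_fun_mapping.get(seen_op_info.type)

assert get_quantized_op(SimpleNamespace(type=torch.add)) is not None
assert get_quantized_op(SimpleNamespace(type=torch.relu)) is None
```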

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```

Reviewed By: jerryzh168

Differential Revision: D32463768

Pulled By: vkuzo

fbshipit-source-id: 5db2c4199f6c0529817f4c058f81fd1d32b9fa9f
2021-11-20 15:16:59 -08:00
Vasiliy Kuznetsov
9cf4779ec9 dbr quant: refactor get_func_output_obs_type to only use seen_op_info (#68341)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68341

Before this PR, `get_func_output_obs_type` used information from the
incoming op and its arguments, which made the result hard to cache.

This PR refactors `get_func_output_obs_type` to only use information
collected during tracing. This will make it easier to make performance
improvements in a future PR.
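
The contract after the refactor, sketched (enum values and field names
are hypothetical):
```
import enum
from types import SimpleNamespace

class FuncOutputObsType(enum.Enum):
    NONE = 0
    NEW_OBS = 1
    REUSES_FIRST_INPUT_OBS = 2

def get_func_output_obs_type(seen_op_info) -> FuncOutputObsType:
    # depends only on the trace-time record, never on the live op or
    # its runtime arguments, so the result is cacheable
    if not seen_op_info.output_is_quantized:
        return FuncOutputObsType.NONE
    if seen_op_info.op_is_inplace:
        return FuncOutputObsType.REUSES_FIRST_INPUT_OBS
    return FuncOutputObsType.NEW_OBS

info = SimpleNamespace(output_is_quantized=True, op_is_inplace=False)
assert get_func_output_obs_type(info) is FuncOutputObsType.NEW_OBS
```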

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```

Reviewed By: jerryzh168

Differential Revision: D32463755

Pulled By: vkuzo

fbshipit-source-id: 25a220de652f0285685d43aedf7392082104b26c
2021-11-20 15:16:56 -08:00
Vasiliy Kuznetsov
ed6ef0eec4 dbr quantization: inline scale and zp (#68251)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68251

Before this PR, DBR quantization recalculated scale and zero_point
in the converted model every time they were needed, which is slow.
This PR adds a pass to the convert function that goes through every
observer in the model and caches its scale and zero_point.

Note: restricting this to only the observers which correspond to int8
operations is left for a future PR.
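
The pass, sketched (the cached attribute names are hypothetical):
```
import torch
from torch.ao.quantization import ObserverBase

def cache_qparams(model: torch.nn.Module) -> None:
    # one pass at convert time: compute each observer's qparams once,
    # instead of calling calculate_qparams() on every inference
    for mod in model.modules():
        if isinstance(mod, ObserverBase):
            scale, zero_point = mod.calculate_qparams()
            mod._cached_scale = scale
            mod._cached_zero_point = zero_point
```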

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```

Reviewed By: VitalyFedyunin

Differential Revision: D32463769

Pulled By: vkuzo

fbshipit-source-id: d1d2e598e2bccc1958e5023096b451d69dc34e29
2021-11-20 15:16:51 -08:00
Vasiliy Kuznetsov
ca499567d2 barebones numeric suite for quantization with dynamic tracing (#67776)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67776

This adds a barebones `add_loggers` and `extract_logger_info` API
to analyze intermediate activations of models using quantization
with dynamic tracing.  The API generally matches the NS for FX tool,
with some omissions.  For now, we are moving fast to help us debug
real models; the API will be fully aligned with NS for FX in future
PRs, before this is marketed to users.

Note: the current approach couples Numeric Suite with the quantization
logic. This is not the best for composability, and may be changed
at a future time.
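
A toy illustration of what the loggers record (this is not the PR's
implementation, just the concept):
```
import torch

class Logger(torch.nn.Module):
    """Toy stand-in for a numeric suite logger: records activations
    flowing through one point of a model for later comparison."""
    def __init__(self, name):
        super().__init__()
        self.name = name
        self.stats = []

    def forward(self, x):
        self.stats.append(x.detach())
        return x

# add_loggers presumably attaches loggers at matching points of the
# fp32 and quantized models; extract_logger_info then collates their
# recorded `stats`, keyed by layer name
fp32_log, int8_log = Logger('fp32'), Logger('int8')
x = torch.randn(1, 4)
fp32_log(torch.relu(x))
int8_log(torch.relu(x))  # the quantized op's output, in practice
```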

Test Plan:
```
python test/test_quantization.py TestAutoTracing.test_numeric_suite
```

Differential Revision: D32231332

Reviewed By: jerryzh168

Pulled By: vkuzo

fbshipit-source-id: 8adfb50cd8b7836c391669afe2e2ff6acae6d40a
2021-11-20 15:15:48 -08:00
Vasiliy Kuznetsov
4466ba8f30 Working POC of define-by-run quantization (#64676)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64676

We implement a working eager mode quantization flow which uses
tracing and `__torch_function__` and `torch.nn.Module.__call__`
overrides to automate the model modifications needed for quantization.
Partial program capture (instead of full program capture) is used,
allowing this scheme to target a wide variety of user programs.
Control flow over quantizeable ops is not supported, but general
control flow is supported.

In particular:
* `auto_trace.py` contains the machinery to override `__torch_function__` and `torch.nn.Module.__call__` and call hooks before and after each quantizeable module or function (a minimal interception sketch follows this list)
* `quantization_state.py` contains the state needed to use the hooks to implement quantization logic such as adding quants/dequants, observers, etc.
* please see `README.md` for more details
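
A minimal sketch of the interception mechanism, using the documented
tensor-subclass `__torch_function__` pattern (this is not the PR's
actual code, which also overrides `torch.nn.Module.__call__`):
```
import torch

class InterceptedTensor(torch.Tensor):
    """Every torch function applied to this subclass is routed
    through __torch_function__, where quantization hooks can run."""

    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        if kwargs is None:
            kwargs = {}
        print(f'before hook: {func.__name__}')  # e.g. observe inputs
        out = super().__torch_function__(func, types, args, kwargs)
        print(f'after hook: {func.__name__}')   # e.g. observe outputs
        return out

x = torch.randn(1, 3).as_subclass(InterceptedTensor)
y = torch.relu(x)  # prints the before/after hooks around relu
```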

Test Plan:
```
python test/test_quantization.py TestAutoTracing
python test/test_quantization.py TestAutoTracingModels
```

Differential Revision: D31992281

Reviewed By: HDCharles

Pulled By: vkuzo

fbshipit-source-id: 6d40e855f3c96b9a4b637a0e677388a7b92f7967
2021-11-11 06:25:24 -08:00