Commit Graph

6 Commits

Author SHA1 Message Date
Vasiliy Kuznetsov
2755cf457c dbr quant: refactor AutoQuantizationState._get_input_args_quant_dequant_info (#68343)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68343

Refactors `AutoQuantizationState._get_input_args_quant_dequant_info` to
use less internal state, makes the function side-effect free by passing
the state in via arguments, and moves the function to the utils file.

This will help with a future refactor to cache this info at runtime.
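
The refactor pattern can be sketched in simplified form (all names below are hypothetical stand-ins, not the actual PyTorch internals): a method that reads instance state and mutates it becomes a pure helper that receives that state explicitly, so its output depends only on its arguments and is safe to cache.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical stand-in for the per-op info collected during tracing.
@dataclass
class SeenOpInfo:
    input_tensor_ids: List[int]

# Before: a method coupled to internal state, with a side effect.
class AutoQuantizationState:
    def __init__(self, seen_op_info: SeenOpInfo):
        self.seen_op_info = seen_op_info
        self.cache = {}

    def _get_input_args_quant_dequant_info(self):
        # reads self.seen_op_info and mutates self.cache
        info = [(tid, 0.1, 0) for tid in self.seen_op_info.input_tensor_ids]
        self.cache["input_args_info"] = info
        return info

# After: a pure utility function; the state is passed in and nothing is
# mutated, so the result can be cached at runtime.
def get_input_args_quant_dequant_info(
    seen_op_info: SeenOpInfo,
) -> List[Tuple[int, float, int]]:
    return [(tid, 0.1, 0) for tid in seen_op_info.input_tensor_ids]
```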

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```

Reviewed By: jerryzh168

Differential Revision: D32463760

Pulled By: vkuzo

fbshipit-source-id: bdd50b0772f128755f9b734b5eeb0a9f4bc4970b
2021-11-20 15:17:02 -08:00
Vasiliy Kuznetsov
57472ec414 dbr quant: refactor get_quantized_op to only use seen_op_info (#68342)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68342

Before this PR, `get_quantized_op` required the current callable.

After this PR, `get_quantized_op` only requires `seen_op_info`.
The signature was changed slightly to return `None` if the original
callable does not need replacement for quantization.

This will make it easier to make performance improvements in a
future PR.
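
The new contract can be illustrated with a minimal sketch (the op names and the replacement map below are hypothetical; the real code maps torch callables to their quantized counterparts):

```python
from typing import Callable, Dict, Optional

# Hypothetical stand-ins for framework ops.
def conv2d(x): return x + 1
def quantized_conv2d(x): return x + 2
def relu(x): return max(x, 0)  # needs no int8 replacement

class SeenOpInfo:
    """Simplified record of an op captured during tracing."""
    def __init__(self, type_: Callable):
        self.type = type_

QUANT_REPLACEMENT_MAP: Dict[Callable, Callable] = {conv2d: quantized_conv2d}

def get_quantized_op(seen_op_info: SeenOpInfo) -> Optional[Callable]:
    # New signature: only seen_op_info is needed; None means the original
    # callable needs no replacement for quantization.
    return QUANT_REPLACEMENT_MAP.get(seen_op_info.type)
```

Returning `None` rather than echoing the original callable lets the caller skip any swap work for ops that already run correctly on quantized inputs.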

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```

Reviewed By: jerryzh168

Differential Revision: D32463768

Pulled By: vkuzo

fbshipit-source-id: 5db2c4199f6c0529817f4c058f81fd1d32b9fa9f
2021-11-20 15:16:59 -08:00
Vasiliy Kuznetsov
9cf4779ec9 dbr quant: refactor get_func_output_obs_type to only use seen_op_info (#68341)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68341

Before this PR, `get_func_output_obs_type` used information from the
incoming op and its arguments, which made it hard to cache.

This PR refactors `get_func_output_obs_type` to only use information
collected during tracing. This will make it easier to make performance
improvements in a future PR.
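
A simplified sketch of the idea (the enum values and fields below are illustrative, not the exact PyTorch definitions): because the decision depends only on `seen_op_info`, it can be computed once after tracing and cached.

```python
import enum

class FuncOutputObsType(enum.Enum):
    NONE = 0
    NEW_OBS = 1

class SeenOpInfo:
    """Simplified record of info collected during tracing."""
    def __init__(self, type_name: str, output_is_quantized: bool):
        self.type_name = type_name
        self.output_is_quantized = output_is_quantized

def get_func_output_obs_type(seen_op_info: SeenOpInfo) -> FuncOutputObsType:
    # Decided purely from traced info (no live op or arguments needed),
    # so the result is cacheable per seen op.
    if seen_op_info.output_is_quantized:
        return FuncOutputObsType.NEW_OBS
    return FuncOutputObsType.NONE
```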

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```

Reviewed By: jerryzh168

Differential Revision: D32463755

Pulled By: vkuzo

fbshipit-source-id: 25a220de652f0285685d43aedf7392082104b26c
2021-11-20 15:16:56 -08:00
Vasiliy Kuznetsov
ed6ef0eec4 dbr quantization: inline scale and zp (#68251)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68251

Before this PR, DBR quantization used to recalculate scale and zero_point
in the converted model every time it was needed, which is slow.
This PR creates a pass during the convert function to go through every
observer in the model and cache its scale and zero_point.

Note: restricting this to only the observers which correspond to int8
operations is left for a future PR.
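
The convert-time pass can be sketched as follows (a hypothetical simplification: real PyTorch observers track min/max statistics and expose `calculate_qparams()`, which returns the scale and zero_point):

```python
class MinMaxObserver:
    """Hypothetical simplification of a PyTorch observer module."""
    def __init__(self, min_val: float, max_val: float):
        self.min_val, self.max_val = min_val, max_val
        self.scale = None
        self.zero_point = None

    def calculate_qparams(self):
        # affine quantization parameters for a uint8 range of [0, 255]
        scale = (self.max_val - self.min_val) / 255.0
        zero_point = round(-self.min_val / scale)
        return scale, zero_point

def inline_qparams(observers):
    # convert-time pass: compute scale/zero_point once per observer and
    # cache them, instead of recalculating on every forward
    for obs in observers:
        obs.scale, obs.zero_point = obs.calculate_qparams()
```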

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```

Reviewed By: VitalyFedyunin

Differential Revision: D32463769

Pulled By: vkuzo

fbshipit-source-id: d1d2e598e2bccc1958e5023096b451d69dc34e29
2021-11-20 15:16:51 -08:00
Vasiliy Kuznetsov
ca499567d2 barebones numeric suite for quantization with dynamic tracing (#67776)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67776

This adds a barebones `add_loggers` and `extract_logger_info` API
to analyze intermediate activations of models using quantization
with dynamic tracing.  The API generally matches the NS for FX tool,
with some omissions.  For now, this is moving fast to help us debug
real models; the API will be fully aligned in future PRs before it is
marketed to users.

Note: the current approach couples Numeric Suite with the quantization
logic. This is not the best for composability, and may be changed
at a future time.
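
The shape of the API can be illustrated with a toy, framework-free sketch (the function names mirror the PR's `add_loggers`/`extract_logger_info`, but everything else here is hypothetical): loggers are interleaved after each layer, record intermediate activations during a run, and their captures are collected at the end.

```python
class Logger:
    """Hypothetical logger that records each activation it sees."""
    def __init__(self, name):
        self.name = name
        self.stats = []

    def __call__(self, x):
        self.stats.append(x)
        return x  # pass-through: logging must not change the computation

def add_loggers(layers):
    # interleave a logger after each (name, fn) layer
    logged, wrapped = {}, []
    for name, fn in layers:
        logger = Logger(name)
        logged[name] = logger
        wrapped.append((fn, logger))
    return wrapped, logged

def extract_logger_info(logged):
    # collect captured intermediate activations keyed by layer name
    return {name: logger.stats for name, logger in logged.items()}

def run(wrapped, x):
    for fn, logger in wrapped:
        x = logger(fn(x))
    return x
```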

Test Plan:
```
python test/test_quantization.py TestAutoTracing.test_numeric_suite
```
Differential Revision: D32231332

Reviewed By: jerryzh168

Pulled By: vkuzo

fbshipit-source-id: 8adfb50cd8b7836c391669afe2e2ff6acae6d40a
2021-11-20 15:15:48 -08:00
Vasiliy Kuznetsov
4466ba8f30 Working POC of define-by-run quantization (#64676)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64676

We implement a working eager-mode quantization flow which uses tracing
and overrides of `__torch_function__` and `torch.nn.Module.__call__` to
automate the model modifications needed for quantization.  Partial
program capture (instead of full program capture) is used, allowing this
scheme to target a wide variety of user programs.  Control flow over
quantizeable ops is not supported, but general control flow is supported.

In particular:
* `auto_trace.py` contains the machinery to override `__torch_function__` and `torch.nn.Module.__call__` and call hooks before and after each quantizeable module or function
* `quantization_state.py` contains the state needed to use the hooks to implement quantization logic such as adding quants/dequants, observers, etc.
* please see `README.md` for more details
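
The hook mechanism can be illustrated without PyTorch (a hypothetical, heavily simplified dispatcher standing in for the `__torch_function__` override): quantizeable ops are wrapped with before/after hooks, everything else passes through untouched.

```python
QUANTIZEABLE = {"conv", "linear"}  # illustrative set of quantizeable ops

class QuantizationState:
    """Simplified stand-in for the per-module quantization state."""
    def __init__(self):
        self.calls = []

    def op_prepare_before_hook(self, op_name, x):
        self.calls.append(("before", op_name))
        return x  # a real implementation would observe/quantize inputs here

    def op_prepare_after_hook(self, op_name, y):
        self.calls.append(("after", op_name))
        return y  # ...and observe outputs here

def dispatch(state, op_name, fn, x):
    # mirrors the override: hooks wrap quantizeable ops only, so general
    # control flow in the user program is unaffected
    if op_name in QUANTIZEABLE:
        x = state.op_prepare_before_hook(op_name, x)
        y = fn(x)
        return state.op_prepare_after_hook(op_name, y)
    return fn(x)
```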

Test Plan:
```
python test/test_quantization.py TestAutoTracing
python test/test_quantization.py TestAutoTracingModels
```

Differential Revision: D31992281

Reviewed By: HDCharles

Pulled By: vkuzo

fbshipit-source-id: 6d40e855f3c96b9a4b637a0e677388a7b92f7967
2021-11-11 06:25:24 -08:00