Commit Graph

28 Commits

Author SHA1 Message Date
Vasiliy Kuznetsov
0db4324ea9 dbr quant function fusion [2/x]: use fusion for observation and inference (#71781)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71781

The previous PR added information about fusions found in the subgraphs.

This PR uses that information for:
1. inserting observers at the end of fusions and not in the middle
2. during inference, replacing the original op with the fused op. This is
implemented by replacing the base op with the fused op and replacing all
other ops in the pattern with identity functions.
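
A minimal sketch of the replacement idea, assuming a matched pattern such as
`(F.linear, F.relu)`; `replacements_for_fusion` and `_identity` are invented
names for illustration, not the actual DBR internals:
```
import torch
import torch.nn.functional as F

def _identity(x, *args, **kwargs):
    # stand-in for the non-base ops in a matched pattern
    return x

def replacements_for_fusion(pattern):
    # base op -> fused kernel, every other op in the pattern -> identity
    if pattern == (F.linear, F.relu):
        return {
            F.linear: torch.ops.quantized.linear_relu,  # fused kernel, assumed here for illustration
            F.relu: _identity,
        }
    return {}
```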

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR.test_fusion_functions
```

Reviewed By: jerryzh168

Differential Revision: D33775097

Pulled By: vkuzo

fbshipit-source-id: 12249b85b2f7ba7545a54872aeb5f1ff2fc928cf
2022-02-07 05:59:03 -08:00
Vasiliy Kuznetsov
4ad1ca1abc dbr quant function fusion [1/x]: record matches for functions (#71764)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71764

For DBR quant, adds the code for matching seen ops to function fusion
patterns. After the full DAG is recorded, a separate pass walks the DAG and
adds matched fusion patterns to the seen op data structure.

This is the first PR in the stack which implements matching and
recording the match results. Future PRs in this stack will use
the match results to modify observer insertion and inference.
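
A rough sketch of what such a matching pass could look like, assuming the
recorded ops are available as an ordered list; `FUSION_PATTERNS` and
`match_fusion_patterns` are hypothetical names, not the real implementation:
```
import torch.nn.functional as F

FUSION_PATTERNS = [(F.linear, F.relu)]  # illustrative pattern set

def match_fusion_patterns(seen_ops):
    """seen_ops: op callables in the order they were traced."""
    matches = []
    idx = 0
    while idx < len(seen_ops):
        for pattern in FUSION_PATTERNS:
            window = tuple(seen_ops[idx:idx + len(pattern)])
            if window == pattern:
                matches.append((idx, pattern))
                idx += len(pattern) - 1
                break
        idx += 1
    return matches
```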

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR.test_fusion_functions
```

Reviewed By: jerryzh168

Differential Revision: D33775098

Pulled By: vkuzo

fbshipit-source-id: 488aac902bf568d41c863ee49248990411ed9c53
2022-02-07 05:58:57 -08:00
Vasiliy Kuznetsov
b0d48a8e66 dbr quant: record dag of non-quantizeable ops (#71551)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71551

This PR makes DBR quant record the DAG of non-quantizeable
ops. Having this will enable us to analyze the entire traced
graph of PyTorch ops, regardless of whether they support quantization
or not.  That, in turn, will enable analysis of uses of each op,
allowing us to safely determine whether a subgraph of ops can be
fused or not.

In future PRs, this functionality will be used to implement function
fusion.
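
One kind of analysis this enables, sketched with invented structures: a
candidate subgraph is only safe to fuse if every intermediate output has a
single user, which requires knowing about non-quantizeable users as well.
```
def can_fuse(op_users, candidate_op_ids):
    # op_users: {op_id: [ids of ops consuming this op's output]}, built from
    # the full DAG (quantizeable and non-quantizeable ops alike)
    intermediates = candidate_op_ids[:-1]
    return all(len(op_users[op_id]) == 1 for op_id in intermediates)
```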

Test Plan:
```
python test/test_quantization.py -k DBR
```

Reviewed By: jerryzh168

Differential Revision: D33684130

Pulled By: vkuzo

fbshipit-source-id: 497d9882f0670a36eef2a0900ea2517c82addf66
2022-02-07 05:58:54 -08:00
Vasiliy Kuznetsov
6d85745810 dbr quant: rename seen_op_info to seen_q_op_info (#71312)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71312

Renames `seen_op_info` to `seen_q_op_info` and `SeenOpInfo` to `SeenQOpInfo`.

This is to make it clear that the op here is a quantizeable op.
This is useful for a future PR where we will start recording the DAG
of non-quantizeable ops. That will be needed to properly support
function fusion.

Test Plan: CI and mypy

Reviewed By: albanD

Differential Revision: D33584751

Pulled By: vkuzo

fbshipit-source-id: 0b659d4ecefc96d532c451abac410c638e457dcb
2022-02-07 05:57:25 -08:00
Peter Bell
f7b1884845 [quant][fx] Don't assume bias is a keyword-argument (#71426)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71426

DBR quantization makes faulty assumptions about which arguments are
passed as keyword arguments and which are passed as positional
arguments. This happens to work currently due to a quirk of how
`__torch_function__` is implemented for Python functions, but will
break when the operators are moved to C++.
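
A sketch of the general fix, using `F.linear` as an example; the helper name
is invented for illustration:
```
import torch.nn.functional as F

def get_linear_bias(args, kwargs):
    # F.linear(input, weight, bias=None): bias may be passed positionally or
    # by keyword, so check both instead of assuming kwargs
    if len(args) > 2:
        return args[2]
    return kwargs.get('bias', None)
```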

Test Plan: Imported from OSS

Reviewed By: george-qi

Differential Revision: D33754262

Pulled By: albanD

fbshipit-source-id: 63515d7a166449726e1beaba6659443b6261742d
2022-02-01 08:56:16 -08:00
Vasiliy Kuznetsov
7fdec92b9c dbr quant: make SeenOpInfo a dataclass (#71267)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71267

Refactors `SeenOpInfo` to be a dataclass, to be consistent with
`QTensorInfo`, so we can get real typing. Fixes the type errors. No logic change.
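
A minimal sketch of the namedtuple-to-dataclass shape of the change; the
fields shown here are illustrative, not the real `SeenOpInfo` definition:
```
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class SeenOpInfo:
    idx: int                   # order in which the op was seen during tracing
    type: Callable             # the op itself, e.g. torch.add or F.linear
    fqn: Optional[str] = None  # fully qualified name of the owning module, if any
```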

Test Plan:
```
python test/test_quantization.py -k DBR
```

Reviewed By: HDCharles

Differential Revision: D33567129

Pulled By: vkuzo

fbshipit-source-id: 55f89d7a497b6db1fd9956255d964663032a0401
2022-01-24 06:23:31 -08:00
Vasiliy Kuznetsov
2985849916 dbr quant: split observer insertion to a separate pass (#71253)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71253

Before this PR, observers were inserted at the same time as
we recorded ops seen while tracing with example input. This is not
ideal because for function fusion (not yet implemented),
we need to be able to look ahead from the current op to properly
insert observers.

This PR refactors observer insertion in DBR quantization to happen
in a separate pass after the ops are recorded.  There is no functionality
change in this diff, but this PR will make it easier to implement
function fusion in a future PR.

Note: the qconfig is still used during tracing to assign each
op an inference dtype. This is not ideal; in the future we may move this
step to a separate pass as well. We keep it as is in this PR because more
refactoring would be needed for it to both happen in a separate pass and
survive module boundaries.
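
Conceptually, the prepare flow now looks like the two-pass sketch below; the
helper names are invented stand-ins for the real code:
```
def prepare_sketch(model, example_inputs):
    # pass 1: trace with example input and record the quantizeable ops seen
    seen_q_op_infos = trace_and_record_ops(model, example_inputs)
    # pass 2: walk the recorded ops afterwards, with full look-ahead
    # available, and decide where observers are inserted
    insert_observers(model, seen_q_op_infos)
    return model
```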

Test Plan:
```
python test/test_quantization.py -k DBR
```

Reviewed By: wenleix

Differential Revision: D33558280

Pulled By: vkuzo

fbshipit-source-id: 54e9cea6ad05317a8c7c92be005d33653617bed6
2022-01-24 06:23:28 -08:00
Vasiliy Kuznetsov
3c8db24360 dbr quant: make QTensorInfo a dataclass and add orig_dtype (#71245)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71245

This is a refactor to make it easier for a future PR to move observer
insertion into a separate pass.
1. adds orig_dtype, so we always record what was seen while tracing
2. switches from namedtuple to dataclass, so we can have more explicit types
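
A minimal sketch of what the dataclass might look like after this change;
field names beyond `orig_dtype` are illustrative:
```
from dataclasses import dataclass
import torch

@dataclass
class QTensorInfo:
    id: int                  # unique tensor id assigned during tracing
    orig_dtype: torch.dtype  # dtype actually seen while tracing
    inf_dtype: torch.dtype   # dtype the tensor should have at inference
```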

Test Plan: CI and mypy

Reviewed By: HDCharles

Differential Revision: D33558281

Pulled By: vkuzo

fbshipit-source-id: b9f87e25a3538fee145f020916a31699046a9c11
2022-01-24 06:23:24 -08:00
Vasiliy Kuznetsov
9c455d7086 dbr quant: add limited support for torch.nn.ModuleList (#70372)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70372

Enables basic support for `torch.nn.ModuleList` in DBR quant
by stopping it from being a leaf.  For now, we
require the user to check for `AutoQuantizationState` if they are
looping over the contents without any bounds checking.

In future PRs, we can explore how to solve this without requiring
user code changes.
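
A sketch of the user-side check mentioned above, written against the class
name only to avoid pinning an import path:
```
def forward(self, x):
    for child in self.module_list:
        # skip the quantization state object DBR quant attaches as a child
        if type(child).__name__ == 'AutoQuantizationState':
            continue
        x = child(x)
    return x
```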

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR.test_module_list
```

Reviewed By: VitalyFedyunin

Differential Revision: D33302329

Pulled By: vkuzo

fbshipit-source-id: 1604748d4b6c2b9d14b50df46268246da807d539
2022-01-06 13:25:13 -08:00
Vasiliy Kuznetsov
b12852eb41 dbr quant: support for custom leaf modules, part 1/x (#70330)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70330

Starts adding support for custom leaf modules, part 1/x.
In this PR, we ensure that leaf modules and all of their children
do not get `AutoQuantizationState` objects attached to them.
The API is matching prepare_fx, using the `prepare_custom_config_dict`
argument and the `non_traceable_module_class` key within that dict.

The next couple of PRs will ensure that modules and functions in
leaves do not get quantized, keeping it separate to make PRs smaller.
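
The dict format matches prepare_fx; a hedged usage sketch is below. The DBR
`prepare` call itself is shown schematically, since its exact signature is
not part of this log:
```
import torch

class MyNonTraceableModule(torch.nn.Module):
    ...

prepare_custom_config_dict = {
    'non_traceable_module_class': [MyNonTraceableModule],
}
# hypothetical call shape; the key point is that MyNonTraceableModule and all
# of its children will not get AutoQuantizationState objects attached
model = prepare(model, example_inputs,
                prepare_custom_config_dict=prepare_custom_config_dict)
```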

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR.test_prepare_custom_config_dict_non_traceable_module_class
```

Reviewed By: jerryzh168

Differential Revision: D33285310

Pulled By: vkuzo

fbshipit-source-id: 532025fda5532b420fad0a4a0847074d1ac4ad93
2022-01-06 13:25:04 -08:00
Vasiliy Kuznetsov
dfb807d65e dbr quant: do not attach auto_quant_state to observers (#70256)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70256

At some point in previous PRs we started attaching `AutoQuantizationState`
to observers. This PR removes that, as it has no purpose and makes model
debugging more complicated.

Test Plan:
```
python test/test_quantization.py -k DBR
```

Reviewed By: jerryzh168

Differential Revision: D33262299

Pulled By: vkuzo

fbshipit-source-id: a3543b44c517325d57f5ed03b961a8955049e682
2022-01-06 13:23:43 -08:00
Vasiliy Kuznetsov
adaf383837 dbr quant: better fix for bug with recursion on dequantize (#70128)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70128

Previous code disabled `__torch_function__` when dequantizing arguments
to an unquantizeable function.  This PR blocklists the `dequantize`
method from the dequantize hook instead, so the previous hack can be
removed.

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```

Reviewed By: ejguan

Differential Revision: D33194396

Pulled By: vkuzo

fbshipit-source-id: 6175c2da637c1d0c93b3fea0ef8218eaee6a2872
2021-12-21 06:25:37 -08:00
Vasiliy Kuznetsov
a4173fc887 dbr quant: extend qconfig_dict support to functions, part 1 (#69758)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69758

Extends DBR quant `qconfig_dict['object_type']` support to function types,
with the restriction that a parent module must have a qconfig.

A future PR will remove the restriction above (it is due to some technical
debt), to keep PR sizes small.
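
An example of the `qconfig_dict['object_type']` form for functions described
above; the entry format mirrors FX graph mode quantization:
```
import torch
import torch.nn.functional as F

qconfig_dict = {
    # the parent module needs a qconfig for the function entry to take effect
    '': torch.quantization.default_qconfig,
    'object_type': [
        (F.linear, torch.quantization.default_qconfig),
    ],
}
```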

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```

Reviewed By: jerryzh168

Differential Revision: D33020217

Pulled By: vkuzo

fbshipit-source-id: ce8a8185f9c87d437e1319ff6f19e8f6adf41e02
2021-12-17 05:59:52 -08:00
Vasiliy Kuznetsov
4f450f44bf dbr quant: initial support of qconfig_dict for modules (#69719)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69719

This PR changes the API signature of DBR quant to use `qconfig_dict`,
similar to FX graph mode quantization.  In this first PR, only basic
functionality is implemented:
* only qconfig=None and static quantization with quint8 are tested
* non-default qconfigs are tested for modules only
* targeting ops by order is not implemented

Expanding this support will be done in future PRs.
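
A sketch of the basic usage covered by this PR, using the same dict shape as
FX graph mode quantization; only the pieces listed above are exercised:
```
import torch

qconfig_dict = {
    '': torch.quantization.default_qconfig,  # global static quint8 qconfig
    'object_type': [
        # per-module-type qconfig
        (torch.nn.Conv2d, torch.quantization.default_qconfig),
        # qconfig=None turns quantization off for a type
        (torch.nn.Linear, None),
    ],
}
```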

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```

Reviewed By: jerryzh168

Differential Revision: D33003475

Pulled By: vkuzo

fbshipit-source-id: f5af81e29c34ea57c2e23333650e44e1758102e4
2021-12-17 05:59:44 -08:00
Vasiliy Kuznetsov
5b37ac54cb dbr quant overhead [14/x]: cache whether an op is a module (#68877)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68877

Saves whether an op type is a module during tracing, so we
can avoid recalculating this when validating the op during inference.
This leads to a small speedup.

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```

```
// MobileNetV2, 1x3x224x224, function level profiling

// before
validate_cur_op - 1.77%

// after
validate_cur_op - 1.41%

```

Reviewed By: jerryzh168

Differential Revision: D32646149

Pulled By: vkuzo

fbshipit-source-id: 03ebc4fedceb84bb885939dff8dec81d30ba6892
2021-11-30 06:13:06 -08:00
Vasiliy Kuznetsov
f253370bb9 dbr quant overhead [13/x]: cache results of get_module_hook_type (#68841)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68841

Caches the current module's hook type as an attribute on the module.
This relies on the assumption that the current module's hook type
does not change during inference, an assumption we can commit to.
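
A sketch of the caching pattern being described; the attribute name and the
fallback helper are invented:
```
def get_module_hook_type_cached(parent_module, cur_module):
    cached = getattr(cur_module, '_cached_hook_type', None)
    if cached is not None:
        return cached
    # original (slower) computation, now run once per module
    hook_type = compute_module_hook_type(parent_module, cur_module)
    cur_module._cached_hook_type = hook_type
    return hook_type
```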

Test Plan:
correctness
```
python test/test_quantization.py TestQuantizeDBR
```

performance
```
// MobileNetV2, 1x3x224x224, function profiling

// before
get_module_hook_type -> 2.58%

// after
get_module_hook_type -> 0.73%
```

Reviewed By: jerryzh168

Differential Revision: D32630881

Pulled By: vkuzo

fbshipit-source-id: 667f2667ef9c5514e5d82e4e7e4c02b8238edc65
2021-11-29 16:10:24 -08:00
Vasiliy Kuznetsov
e1c449ff34 dbr quant overhead[9/x]: precalculate when to skip op_convert_after_hook (#68432)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68432

Speeds up `op_convert_after_hook` by precalculating, from information
gathered while tracing, when this hook is a no-op, and skipping execution
when this flag is true.

```
MobileNetV2, function level profiling, 1x3x224x224

// before
op_convert_before_hook = 3.25%

// after
op_convert_before_hook = 1.35%
```
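
A sketch of the early-exit shape being described; the flag and helper names
are invented stand-ins, not the real attributes:
```
def op_convert_after_hook_sketch(self, op, output):
    info = self._get_cur_seen_q_op_info()  # stand-in lookup of trace-time info
    if info.convert_after_hook_is_noop:    # flag precomputed while tracing
        return output
    return self._convert_output_dtype(op, output, info)  # original conversion path
```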

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```

Reviewed By: jerryzh168

Differential Revision: D32463752

Pulled By: vkuzo

fbshipit-source-id: b0c3d37909ddc8c254fe53f90954f625ae874e3b
2021-11-21 07:08:29 -08:00
Vasiliy Kuznetsov
f1021bcf38 dbr quant overhead[8/x]: small speedup in op_needs_quantization (#68373)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68373

Removes redundant logic in `op_needs_quantization`, for a small speedup.

Test Plan:
```
// MobileNetV2, 1x3x224x224 input, % of time spent by function during DBR convert

// before
cur_op_needs_hooks - 0.76%
op_needs_quantization - 0.41%

// after
cur_op_needs_hooks - 0.70%
op_needs_quantization - 0.36%
```

Reviewed By: jerryzh168

Differential Revision: D32463762

Pulled By: vkuzo

fbshipit-source-id: 334591c514dfa5af6fabc1390005088e8c5ca952
2021-11-21 07:08:17 -08:00
Vasiliy Kuznetsov
16a6e0612d dbr quant: clean up key types in AutoQuantizationState mappings (#68369)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68369

`AutoQuantizationState` has various mappings keyed on IDs. Only
`tensor_id_to_observer` actually needs string keys because it is a
`torch.nn.ModuleDict`.  This PR changes the other mappings to have
integer keys, for simplicity and performance.
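
The constraint being worked around, shown concretely (the observer class is
just an example, and the second mapping name is illustrative):
```
import torch

# torch.nn.ModuleDict requires string keys, so observers stay keyed by str(id)
tensor_id_to_observer = torch.nn.ModuleDict()
tensor_id_to_observer[str(0)] = torch.quantization.MinMaxObserver()

# ordinary dicts have no such restriction, so the other mappings can use ints
tensor_id_to_other_info = {0: 'some per-tensor metadata'}
```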

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```

Reviewed By: jerryzh168

Differential Revision: D32463765

Pulled By: vkuzo

fbshipit-source-id: 5a9bf2a1102859097eedf1e536761084cd408856
2021-11-21 07:08:06 -08:00
Vasiliy Kuznetsov
3fc9bc43c6 dbr quant overhead[4/x]: speed up hook type calculations (#68351)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68351

Speeds up `get_module_hook_type` and `get_torch_function_hook_type` by
bypassing the expensive `torch.nn.Module` getters and setters and
fetching `_auto_quant_state` directly.
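
A sketch of the bypass, assuming the state object is registered as a child
module named `_auto_quant_state` (the exact storage location is an assumption
here):
```
def get_auto_quant_state(module):
    # direct dict lookup; module.__getattr__ would walk _parameters, _buffers
    # and _modules on every access, which is what this avoids
    return module._modules.get('_auto_quant_state', None)
```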

Test Plan:
Model level benchmarking is noisy.  Individual `cProfile` results:

```
// MobileNetV2, 1x3x224x224 input, % of time spent by function during DBR convert

// before
get_module_hook_type - 5.96%
get_torch_function_hook_type - 2.24%

// after
get_module_hook_type - 2.10%
get_torch_function_hook_type - 0.57%
```

Reviewed By: jerryzh168

Differential Revision: D32463756

Pulled By: vkuzo

fbshipit-source-id: 6eb199052ddf8d78f1c123a427e7437fc7c4fe58
2021-11-21 07:08:03 -08:00
Vasiliy Kuznetsov
9fba8971a7 dbr quant: move model level utils into own file (#68346)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68346

Some utility functions for DBR quant need to be aware
of `AutoQuantizationState`.  This PR moves them into their own file, so they
can use the type directly without circular imports, and removes the mypy
ignores which are no longer necessary after this change.

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```

Reviewed By: jerryzh168

Differential Revision: D32463763

Pulled By: vkuzo

fbshipit-source-id: e2c367de0d5887c61e6d2c3a73d82f7d76af3de1
2021-11-20 15:17:10 -08:00
Vasiliy Kuznetsov
52cc9cb0ee dbr quant: refactor AutoQuantizationState._get_packed_param_name (#68344)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68344

Makes `AutoQuantizationState._get_packed_param_name` use `seen_op_info`
instead of the current op. This will make future performance improvements
easier.

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```

Reviewed By: albanD

Differential Revision: D32463758

Pulled By: vkuzo

fbshipit-source-id: 0c16fe4bc989cb66180ad674ec55060cd970e32e
2021-11-20 15:17:04 -08:00
Vasiliy Kuznetsov
2755cf457c dbr quant: refactor AutoQuantizationState._get_input_args_quant_dequant_info (#68343)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68343

Refactors `AutoQuantizationState._get_input_args_quant_dequant_info` to
use less internal state, makes the function have no side effects by passing
the state in the arguments, and moves the function to the utils file.

This will help with a future refactor to cache this info at runtime.

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```

Reviewed By: jerryzh168

Differential Revision: D32463760

Pulled By: vkuzo

fbshipit-source-id: bdd50b0772f128755f9b734b5eeb0a9f4bc4970b
2021-11-20 15:17:02 -08:00
Vasiliy Kuznetsov
57472ec414 dbr quant: refactor get_quantized_op to only use seen_op_info (#68342)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68342

Before this PR, `get_quantized_op` required the current callable.

After this PR, `get_quantized_op` only requires `seen_op_info`.
The signature was changed slightly to return `None` if the original
callable does not need replacement for quantization.

This will make it easier to make performance improvements in a
future PR.
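
A sketch of the resulting shape of the function; the mapping contents are
illustrative only:
```
import torch
import torch.nn.functional as F

def get_quantized_op_sketch(seen_op_info):
    fp32_to_int8 = {
        F.conv2d: torch.ops.quantized.conv2d,
        torch.add: torch.ops.quantized.add,
    }
    # None means "the original callable does not need to be replaced"
    return fp32_to_int8.get(seen_op_info.type, None)
```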

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```

Reviewed By: jerryzh168

Differential Revision: D32463768

Pulled By: vkuzo

fbshipit-source-id: 5db2c4199f6c0529817f4c058f81fd1d32b9fa9f
2021-11-20 15:16:59 -08:00
Vasiliy Kuznetsov
9cf4779ec9 dbr quant: refactor get_func_output_obs_type to only use seen_op_info (#68341)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68341

Before this PR, `get_func_output_obs_type` used information from the
incoming op and its arguments, which made it hard to cache.

This PR refactors `get_func_output_obs_type` to only use information
collected during tracing. This will make it easier to make performance
improvements in a future PR.

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```

Reviewed By: jerryzh168

Differential Revision: D32463755

Pulled By: vkuzo

fbshipit-source-id: 25a220de652f0285685d43aedf7392082104b26c
2021-11-20 15:16:56 -08:00
Vasiliy Kuznetsov
ed6ef0eec4 dbr quantization: inline scale and zp (#68251)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68251

Before this PR, DBR quantization used to recalculate scale and zero_point
in the converted model every time it was needed, which is slow.
This PR creates a pass during the convert function to go through every
observer in the model and cache its scale and zero_point.

Note: restricting this to only the observers which correspond to int8
operations is left for a future PR.
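
A sketch of what such a caching pass could look like; `tensor_id_to_scale_zp`
is an assumed attribute name used only for illustration:
```
def cache_scale_zp(model):
    for _, module in model.named_modules():
        if type(module).__name__ != 'AutoQuantizationState':
            continue
        for tensor_id, obs in module.tensor_id_to_observer.items():
            # calculate_qparams() is called here, once at convert time,
            # instead of on every inference call
            scale, zp = obs.calculate_qparams()
            module.tensor_id_to_scale_zp[int(tensor_id)] = (scale, zp)
```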

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```

Reviewed By: VitalyFedyunin

Differential Revision: D32463769

Pulled By: vkuzo

fbshipit-source-id: d1d2e598e2bccc1958e5023096b451d69dc34e29
2021-11-20 15:16:51 -08:00
Vasiliy Kuznetsov
ca499567d2 barebones numeric suite for quantization with dynamic tracing (#67776)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67776

This adds a barebones `add_loggers` and `extract_logger_info` API
to analyze intermediate activations of models using quantization
with dynamic tracing.  The API generally matches the NS for FX tool,
with some omissions.  For now, this is moving fast to help us
debug real models; the API will be fully aligned with the FX tool in
future PRs, before it is marketed to users.

Note: the current approach couples Numeric Suite with the quantization
logic. This is not the best for composability, and may be changed
at a future time.
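
A hedged usage sketch following the NS for FX shape of the API; the exact
argument lists of the DBR versions are an assumption, not shown in this log:
```
# assumed call shapes, mirroring the FX numeric suite
m_fp32_logged, m_int8_logged = add_loggers('fp32', model_fp32, 'int8', model_int8)
m_fp32_logged(example_input)
m_int8_logged(example_input)
results = extract_logger_info(m_fp32_logged, m_int8_logged, 'int8')
```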

Test Plan:
```
python test/test_quantization.py TestAutoTracing.test_numeric_suite
```

Differential Revision: D32231332

Reviewed By: jerryzh168

Pulled By: vkuzo

fbshipit-source-id: 8adfb50cd8b7836c391669afe2e2ff6acae6d40a
2021-11-20 15:15:48 -08:00
Vasiliy Kuznetsov
4466ba8f30 Working POC of define-by-run quantization (#64676)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64676

We implement a working eager mode quantization flow which uses
tracing and `__torch_function__` and `torch.nn.Module.__call__` overrides to automate the model modifications needed for quantization.  Partial program capture (instead of full program capture) is used, allowing this scheme to target a wide variety of user programs.  Control flow over quantizeable ops is not supported, but general control flow is supported.

In particular:
* `auto_trace.py` contains the machinery to override `__torch_function__` and `torch.nn.Module.__call__` and call hooks before and after each quantizeable module or function
* `quantization_state.py` contains the state needed to use the hooks to implement quantization logic such as adding quants/dequants, observers, etc.
* please see `README.md` for more details
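
A minimal sketch of the interception mechanism itself (not the DBR
implementation): a `__torch_function__` override sees every op call, which is
the hook point used to run logic before and after each quantizeable op.
```
import torch

class InterceptedTensor(torch.Tensor):
    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        kwargs = kwargs or {}
        # a "before" hook (e.g. observation, quant/dequant insertion) would run here
        out = super().__torch_function__(func, types, args, kwargs)
        # an "after" hook (e.g. observing the output) would run here
        return out
```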

Test Plan:
```
python test/test_quantization.py TestAutoTracing
python test/test_quantization.py TestAutoTracingModels
```

Differential Revision: D31992281

Reviewed By: HDCharles

Pulled By: vkuzo

fbshipit-source-id: 6d40e855f3c96b9a4b637a0e677388a7b92f7967
2021-11-11 06:25:24 -08:00