Summary:
This diff adds a way to:
- clone previously observed method
- Add calls to observer's calculate_qparams methods
- Extract the scale and zero point
- Use them to insert quant dequant nodes
Now for forward method we have
- observe_forward
- quantize_forward
observe_forward is used post training to observer statistics. In the
case of dynamic PTQ this requires just running that method once to
update weight observer statistics.
quantize_forward method will be used to use the observer
statistics to calculate quantization parameters and apply that to quant
dequant op.
Subsequent diffs will replace dequant + op with their quantized op
counter parts and replace quantize ops with relevant packed params class
where possible
Test Plan:
To be written
Reviewers:
Subscribers:
Tasks:
Tags:
Differential Revision: [D38771419](https://our.internmc.facebook.com/intern/diff/D38771419)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83570
Approved by: https://github.com/jerryzh168
Summary:
TO support on device quantization this diff introduces observer
insertion. Specifically observers are inserted by adding new method with
prefix observ_.
Intent is that post training, this method will be run to record
statistics
Test Plan:
test_ondevice_quantization.py
Reviewers:
Subscribers:
Tasks:
Tags:
Differential Revision: [D38771417](https://our.internmc.facebook.com/intern/diff/D38771417)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83568
Approved by: https://github.com/jerryzh168
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76637
The previous naming convention `default_affine_fixed_qparams_observer`
and `default_symmetric_fixed_qparams_observer` were uninformative, and users had to read
the definition in order to understand what these observers are. The new
naming convention reveals information about the range of the observers
The analogous changes were also made for
`default_symmetric_fixed_qparams_fake_quant` and
`default_affine_fixed_qparams_fake_quant`
Test Plan:
```
python test/test_quantization.py
```
```
python test/test_quantization.py
```
Differential Revision:
D36054169
D36054169
Reviewed By: vkuzo
Pulled By: dzdang
fbshipit-source-id: 215f7786a4b7abda7327f17cc61735697ec5cca9
(cherry picked from commit 21a4e6eda4467c8adca7fd534a506a14e975f9cf)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64814
1. move the file
```
hg mv caffe2/torch/quantization/fake_quantize.py caffe2/torch/ao/quantization/
```
2. create a new file in the old location and copy the imports
3. fix all callsites inside `torch`
Test Plan:
```
buck test mode/dev //caffe2/test:quantization
```
Reviewed By: z-a-f
Differential Revision: D30866792
fbshipit-source-id: 7a221cb46c0ab01f1c5de9be061f09ecc83ce23e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62348
Originally we have a supported_dtypes check for linear and conv, but it's only valid for non reference option,
this PR removes the constraint when is_reference=True and enables producing reference patterns for the dtype
combinations that's not supported by fbgemm/qnnpack, for example qint8 activation dtypes
Test Plan:
python test/test_quantization.py TestQuantizeFx.test_linear_qint8_activation
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D29968675
fbshipit-source-id: 2abe37940eb62e16fcf0cbb700c174de49719223
Summary:
Generally wildcard imports are bad for the reasons described here: https://www.flake8rules.com/rules/F403.html
This PR replaces wildcard imports with an explicit list of imported items where possible, and adds a `# noqa: F403` comment in the other cases (mostly re-exports in `__init__.py` files).
This is a prerequisite for https://github.com/pytorch/pytorch/issues/55816, because currently [`tools/codegen/dest/register_dispatch_key.py` simply fails if you sort its imports](https://github.com/pytorch/pytorch/actions/runs/742505908).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55838
Test Plan: CI. You can also run `flake8` locally.
Reviewed By: jbschlosser
Differential Revision: D27724232
Pulled By: samestep
fbshipit-source-id: 269fb09cb4168f8a51fd65bfaacc6cda7fb87c34
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55388
temporarily revert D27314678 (c57541ce06), it appears to cause a perf regression that makes quantization of some models take too long to complete tests.
Reviewed By: houseroad
Differential Revision: D27583809
fbshipit-source-id: e9c088ccbfd3bfb3a1d4c7eafee3eca29ee7717b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54644
Previously we special case copy operator in normal insert observer code, this PR tries to split the
special case logic to a separate function and keep the rest of the code clean.
Test Plan: Imported from OSS
Reviewed By: vkuzo
Differential Revision: D27314678
fbshipit-source-id: d36870ceb3717bc01eaeaa6f3f1532ad562cbaf1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48069
also renamed float_qparam_dynamic_qconfig to float_qparam_weight_only_qconfig
It's not used in user code yet so we only need to update the tests.
Test Plan: Imported from OSS
Reviewed By: supriyar
Differential Revision: D25010175
fbshipit-source-id: caa3eaa5358a8bc5c808bf5f64e6ebff3e0b61e8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46786
Previously we only support static quant, this PR added support for other types of quantization.
Note qat is actually orthogonal to these quant types, this is referring to the convert step where we
convert the observed module to a quantized module.
for qat, user will provide a CustomModule -> FakeQuantizedCustomModule in prepare_custom_config_dict
and FakeQuantizedCustomModule -> static/dynamic/weight_only quantized CustomModule in convert_custom_config_dict.
Test Plan: Imported from OSS
Reviewed By: raghuramank100
Differential Revision: D24514701
fbshipit-source-id: 2918be422dd76093d67a6df560aaaf949b7f338c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46657
This is used to simulate fake quantize operation for ops with fixed quantization parameters
e.g. hardsigmoid
Test Plan:
Imported from OSS
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D24451406
fbshipit-source-id: 26cc140c00f12bdec9a8f9dc880f4c425f4d4074
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45538
This is used to simulate fake quantize operation for ops with fixed quantization parameters
e.g. hardsigmoid
Test Plan: Imported from OSS
Reviewed By: vkuzo
Differential Revision: D24004795
fbshipit-source-id: fc4797f80842daacd3b3584c5b72035774634edd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46337
We plan to pass around the mappings instead of using global registration api to keep
the mappings local to the transformations user is performing
Test Plan: Imported from OSS
Reviewed By: vkuzo
Differential Revision: D24317436
fbshipit-source-id: 81569b88f05eeeaa9595447e482a12827aeb961f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44835
This is for feature parity with fx graph mode quantization
Test Plan: Imported from OSS
Reviewed By: z-a-f
Differential Revision: D23745086
fbshipit-source-id: ae2fc86129f9896d5a9039b73006a4da15821307
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44766
There might be modules that are not symbolically traceable, e.g. LSTM (since it has
input dependent control flows), to support quantization in these cases, user will provide
the corresponding observed and quantized version of the custom module, the observed
custom module with observers already inserted in the module and the quantized version will
have the corresponding ops quantized. And use
```
from torch.quantization import register_observed_custom_module_mapping
from torch.quantization import register_quantized_custom_module_mapping
register_observed_custom_module_mapping(CustomModule, ObservedCustomModule)
register_quantized_custom_module_mapping(CustomModule, QuantizedCustomModule)
```
to register the custom module mappings, we'll also need to define a custom delegate class
for symbolic trace in order to prevent the custom module from being traced:
```python
class CustomDelegate(DefaultDelegate):
def is_leaf_module(self, m):
return (m.__module__.startswith('torch.nn') and
not isinstance(m, torch.nn.Sequential)) or \
isinstance(m, CustomModule)
m = symbolic_trace(original_m, delegate_class=CustomDelegate)
```
Test Plan: Imported from OSS
Reviewed By: z-a-f
Differential Revision: D23723455
fbshipit-source-id: 50d666e29b94cbcbea5fb6bcc73b00cff87eb77a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43901
Add similar APIs like eager and graph mode on torchscript
- fuse_fx
- quantize_fx (for both post training static and qat)
- quantize_dynamic_fx (for post training dynamic)
- prepare_fx (for both post training static and qat)
- prepare_dynamic_fx (for post training dynamic)
- convert_fx (for all modes)
Test Plan:
Imported from OSS
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D23432430
fbshipit-source-id: fc99eb75cbecd6ee7a3aa6c8ec71cd499ff7e3c1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43581
Add similar APIs like eager and graph mode on torchscript
- fuse_fx
- quantize_fx (for both post training static and qat)
- quantize_dynamic_fx (for post training dynamic)
- prepare_fx (for both post training static and qat)
- prepare_dynamic_fx (for post training dynamic)
- convert_fx (for all modes)
Test Plan: Imported from OSS
Reviewed By: vkuzo
Differential Revision: D23385091
fbshipit-source-id: b789e54e1a0f3af6b026fd568281984e253e0433
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26390
`quantize_script`: top level API for graph mode quantization
Test Plan:
there are some known issues, we can enable test after all known issues are fixed.
Imported from OSS
Differential Revision: D17645132
fbshipit-source-id: 61f261d5607409d493b39a2f4e05ebd017279f6b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26782
At least we should be consistent on top-level APIs and prepare/convert/etc.
Logic is inplace=False by default but top-level APIs take care of doing fewer copies.
Also renames always-inplace methods like add_observer to have underscore in the end.
One fix for MinMaxObserver was triggered by deepcopy surfacing that we were accidentally keeping autograd around
Test Plan: Imported from OSS
Differential Revision: D17595956
Pulled By: dzhulgakov
fbshipit-source-id: 801f9f5536b553f24c7a660064dd6fce685edd65
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26709
Polishes implementation from #25975. Primarily, we use NoopObserver to communicate that weights need to be quantized to float16. The very top-level API (quantize_dynamic) stays the same with `dtype` argument but the implementation follows the common flow.
One can argue that dynamic fp16 quantization doesn't really fit into the 'observer' mechanism. It's in fact not ideal, but it's better to have the same flow than branching on both dtype and qconfig.
Test Plan: Imported from OSS
Differential Revision: D17544103
Pulled By: dzhulgakov
fbshipit-source-id: 6af3f18c35929a1a53ea734079c005f656e4925f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25975
We would like to add the FP16 weight support for the dynamic quantized LSTM.
Test Plan:
buck test mode/dev caffe2/test:quantization -- 'test_quantized_rnn \(test_quantization\.PostTrainingDynamicQuantTest\)' --print-passing-details
```
[jianyuhuang@devvm794.ftw3.facebook.com: ~/fbsource/fbcode/caffe2/test] $ buck test mode/dev caffe2/test:quantization
-- 'test_quantized_rnn \(test_quantization\.PostTrainingDynamicQuantTest\)' --print-passing-details
Building: finished in 13.4 sec (100%) 8134/8134 jobs, 81 updated
Total time: 13.9 sec
Trace available for this run at /tmp/testpilot.20190910-210241.2092790.log
TestPilot test runner for Facebook. See https://fburl.com/testpilot for details.
Testpilot build revision c86e65add357582accb6ec0be23b92c8a2c510bd fbpkg ca46e8f5b26c451a8b0b2462c11bb61d at Mon Sep 9
22:16:37 2019 by twsvcscm from /usr/local/fbprojects/packages/testinfra.testpilot/696/t.par
Discovering tests
Running 1 tests
Started new test run: https://our.intern.facebook.com/intern/testinfra/testrun/1125900050322971
✓ caffe2/test:quantization - test_quantized_rnn (test_quantization.PostTrainingDynamicQuantTest) 0.183 1/1 (passed)
Test output:
> test_quantized_rnn (test_quantization.PostTrainingDynamicQuantTest) ... ok
>
> ----------------------------------------------------------------------
> Ran 1 test in 0.184s
>
> OK
Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/1125900050322971
Summary (total time 4.35s):
PASS: 1
FAIL: 0
SKIP: 0
FATAL: 0
TIMEOUT: 0
OMIT: 0
```
Differential Revision: D17299116
fbshipit-source-id: 7fe91ece25867f2c0496f1b63fb1041e6b815166
Summary:
- ~~Add a unit test for the Dynamic Quantized Linear operator (```torch.fbgemm_linear_quantize_weight```, ```torch.fbgemm_pack_quantized_matrix```, and ```torch.fbgemm_linear_int8_weight```) in ```test_quantized.py```.~~ Move this to D16404027 for a separate review.
- Add the Dynamic Quantized Linear module in ```torch/nn/quantized/modules/linear.py```. ~~This is in a rudimentary stage. Will add more functions later~~.
- Add the torch.quantize logic (prepare, eval, convert) for dynamic quantization.
- Add a unit test for the Dynamic Quantized Linear module in ```test_nn_quantized.py```.
- Add a unit test for the Model-level Quantization API
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23128
ghstack-source-id: 88257232
Differential Revision: D16258664
fbshipit-source-id: 4be3ac39ee27c088b341c741d3f09f51d5a23ef0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23003
torch.quantization.fuse_module and torch.nn._intrinsic convRelu and LinearRelu
Fusion function to combine specific modules: (conv,bn) and (conv,bn,relu).
In all cases, replace modules in place. The first module is replaced with the _intrinsic fused module and the remaining modules are replaced by nn.Identity.
Support both training and eval. For training, the modules are "fused" with a sequential container. This is to allow for further module swaps for quantization aware training.
Also add: torch.nn._intrinsic for convRelu and LinearRelu.
TODO: Add tests for _intrinsic modules.
Conv BN fusion code is based on DsKhudia's implementation
Differential Revision: D16199720
fbshipit-source-id: 95fb9ffe72b361d280313b2ec57de2acd4f9dda2
Summary:
Add support for quantization aware training in eager mode
Modifications to Post training flow:
## Prepare
* Fusion: e.g. (Conv, Bn) → ConvBn (float)
* Swapping: To insert fake_quant to weight, we need to swap the float modules that has weight with different qat modules, e.g. Conv → torch.nn.qat.Conv , ConvBn → torch.nn._intrinsic.qat.ConvBn
```
* previously we were thinking about modify the weight in forward_pre hook and change it back in forward_hook:
* def forward_pre_hook(self, input):
self.float_weight = self.weight
self.weight = self.fake_quantize(self.float_weight)
def forward_hook(self, input):
self.weight = self.float_weight
```
* Assignments to self.weight are needed because we can’t change forward function and in forward function they are using self.weight.
* But we will need to keep two copies of weight in this case, so it’s probably better to just swap the module
* So we want to just swap Conv to torch.nn.qat.Conv and Linear to torch.nn.qat.Linear
* qat modules will have fake_quant for output and weights inserted in forward function
## Convert
* flow should be identical to ptq, but the swapping dictionary is slightly different since modules are changed in prepare step.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23082
ghstack-source-id: 86824650
Differential Revision: D16379374
fbshipit-source-id: 7d16d1acd87025065a24942ff92abf18e9fc8070
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22732
Add support for quantization aware training in eager mode
Modifications to Post training flow:
## Prepare
* Fusion: e.g. (Conv, Bn) → ConvBn (float)
* Swapping: To insert fake_quant to weight, we need to swap the float modules that has weight with different qat modules, e.g. Conv → torch.nn.qat.Conv , ConvBn → torch.nn._intrinsic.qat.ConvBn
```
* previously we were thinking about modify the weight in forward_pre hook and change it back in forward_hook:
* def forward_pre_hook(self, input):
self.float_weight = self.weight
self.weight = self.fake_quantize(self.float_weight)
def forward_hook(self, input):
self.weight = self.float_weight
```
* Assignments to self.weight are needed because we can’t change forward function and in forward function they are using self.weight.
* But we will need to keep two copies of weight in this case, so it’s probably better to just swap the module
* So we want to just swap Conv to torch.nn.qat.Conv and Linear to torch.nn.qat.Linear
* qat modules will have fake_quant for output and weights inserted in forward function
## Convert
* flow should be identical to ptq, but the swapping dictionary is slightly different since modules are changed in prepare step.
Reviewed By: zafartahirov
Differential Revision: D16199356
fbshipit-source-id: 62aeaf47c12c62a87d9cac208f25f7592e245d6c