Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24418
Fixes #24394
The observer was not added correctly because one of the required conditions was not met.
Test Plan: Imported from OSS
Differential Revision: D16833951
Pulled By: zafartahirov
fbshipit-source-id: bb4699e6a1cf6368c7278272a68e5e7c6d3f59a8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24431
Pickle's fully-qualified name lookup would fail when trying to serialize QConfig_dynamic, since the `__name__` on the instance would refer to the wrong class name.
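For context, a minimal sketch (not the actual PyTorch code) of how pickle's fully-qualified name lookup fails when a class's `__name__` does not match the name it is bound to in its module:
```
import collections
import pickle

# Hypothetical illustration: the class's __name__ is "Pt", but it is bound to
# the module-level name "Point", so pickle's lookup of __main__.Pt fails.
Point = collections.namedtuple("Pt", ["x", "y"])

try:
    pickle.dumps(Point(1, 2))
except pickle.PicklingError as e:
    print(e)  # Can't pickle <class '__main__.Pt'>: it's not found as __main__.Pt
```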
Test Plan: Imported from OSS
Differential Revision: D16835705
Pulled By: jamesr66a
fbshipit-source-id: e146835cbe10b08923d77298bc93b0f5b0ba37c5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23753
Add intrinsic (fused) module mappings in quantize.py to enable mapping fused modules in both QAT and post-training quantization (PTQ).
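A hedged sketch of what such a mapping might look like; the exact module paths under `torch.nn._intrinsic` are assumptions based on this commit, not the literal dict in quantize.py:
```
# Hypothetical mapping from float fused modules to their QAT / quantized
# counterparts; string keys/values are used so the sketch stands alone.
FUSED_MODULE_MAPPING = {
    "torch.nn._intrinsic.ConvBn2d":     "torch.nn._intrinsic.qat.ConvBn2d",
    "torch.nn._intrinsic.ConvBnReLU2d": "torch.nn._intrinsic.qat.ConvBnReLU2d",
    "torch.nn._intrinsic.LinearReLU":   "torch.nn._intrinsic.quantized.LinearReLU",
}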
Differential Revision: D16820749
fbshipit-source-id: 07de76a4f09b44bde8b193c103eac02c22b875b6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24232
As suggested in https://github.com/pytorch/pytorch/pull/23128#discussion_r306650311, we make `torch.nn.Linear` the key of default_qconfig_dict. That is, we dynamically quantize `torch.nn.Linear` modules by default when the user just calls `torch.quantize_dynamic(model)`.
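A minimal usage sketch of the default behavior described above; `torch.quantization.quantize_dynamic` is the name in current releases, while the commit text refers to `torch.quantize_dynamic`:
```
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(16, 8), nn.ReLU(), nn.Linear(8, 4))

# With torch.nn.Linear as the default key, every Linear layer is dynamically
# quantized without listing the layers by name.
qmodel = torch.quantization.quantize_dynamic(model)
print(qmodel)  # Linear layers replaced by dynamic quantized Linear modules
```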
ghstack-source-id: 88287089
Differential Revision: D16781191
fbshipit-source-id: 991a5e151a9ea32b879d6897cd9862855d747135
Summary:
We want to use the Module type as the key in qconfig_dict for module replacement during quantization.
Before this Diff, to dynamically quantize the BERT model we had to specify each layer:
```
qconfig_dict = {
    'encoder.layer.0.attention.self.query': default_qconfig,
    'encoder.layer.0.attention.self.key': default_qconfig,
    'encoder.layer.0.attention.self.value': default_qconfig,
    'encoder.layer.0.attention.output.dense': default_qconfig,
    'encoder.layer.0.intermediate.dense': default_qconfig,
    'encoder.layer.0.output.dense': default_qconfig,
    'encoder.layer.1.attention.self.query': default_qconfig,
    'encoder.layer.1.attention.self.key': default_qconfig,
    'encoder.layer.1.attention.self.value': default_qconfig,
    'encoder.layer.1.attention.output.dense': default_qconfig,
    'encoder.layer.1.intermediate.dense': default_qconfig,
    'encoder.layer.1.output.dense': default_qconfig,
    ...
}
```
After this Diff, we only need the following:
```
qconfig_dict = {
    torch.nn.Linear: default_qconfig
}
```
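A hedged sketch of passing the type-keyed dict explicitly; the `qconfig_spec` argument name and `default_dynamic_qconfig` match the current torch.quantization API and are assumptions about this commit's exact signature (the text above uses `default_qconfig`):
```
import torch
import torch.nn as nn
from torch.quantization import quantize_dynamic, default_dynamic_qconfig

model = nn.Sequential(nn.Linear(16, 8), nn.Linear(8, 4))

# One type-keyed entry covers every Linear layer in the model.
qmodel = quantize_dynamic(model, qconfig_spec={nn.Linear: default_dynamic_qconfig})
```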
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23212
ghstack-source-id: 88287091
Reviewed By: zafartahirov
Differential Revision: D16436542
fbshipit-source-id: 11fbe68ee460560c1a7cdded63581eb7a00e5a89
Summary:
- ~~Add a unit test for the Dynamic Quantized Linear operator (```torch.fbgemm_linear_quantize_weight```, ```torch.fbgemm_pack_quantized_matrix```, and ```torch.fbgemm_linear_int8_weight```) in ```test_quantized.py```.~~ Move this to D16404027 for a separate review.
- Add the Dynamic Quantized Linear module in ```torch/nn/quantized/modules/linear.py``` (see the usage sketch after this list). ~~This is in a rudimentary stage. Will add more functions later~~.
- Add the torch.quantize logic (prepare, eval, convert) for dynamic quantization.
- Add a unit test for the Dynamic Quantized Linear module in ```test_nn_quantized.py```.
- Add a unit test for the model-level quantization API.
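A minimal sketch of using the dynamic quantized Linear module directly; the `torch.nn.quantized.dynamic` path and `from_float` follow the current API and are assumptions about this commit's exact layout:
```
import torch
import torch.nn.quantized.dynamic as nnqd
from torch.quantization import default_dynamic_qconfig

fp32_linear = torch.nn.Linear(16, 8)
fp32_linear.qconfig = default_dynamic_qconfig

# Weights are quantized to int8 up front; activations stay fp32 and are
# quantized on the fly inside the op.
dq_linear = nnqd.Linear.from_float(fp32_linear)
out = dq_linear(torch.randn(2, 16))
```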
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23128
ghstack-source-id: 88257232
Differential Revision: D16258664
fbshipit-source-id: 4be3ac39ee27c088b341c741d3f09f51d5a23ef0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24196
The observer returns its input unchanged for post-training quantization. This unifies observer semantics for QAT and PTQ.
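A minimal sketch (not the actual implementation) of the observer contract described above: record statistics in forward, but return the input unchanged so inserting observers does not alter the numerics of a PTQ calibration run:
```
import torch

class PassThroughObserverSketch(torch.nn.Module):
    """Records min/max of the observed tensor; forward is the identity."""

    def __init__(self):
        super().__init__()
        self.min_val = float("inf")
        self.max_val = float("-inf")

    def forward(self, x):
        self.min_val = min(self.min_val, x.min().item())
        self.max_val = max(self.max_val, x.max().item())
        return x  # pass-through: output equals input
```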
ghstack-source-id: 88140887
Differential Revision: D16768277
fbshipit-source-id: fae7c94e3dc0eeda363e9982b3865a15113e11bd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23635
Adding new modules through a base class turns out to be no more complex than using a generation script.
Test Plan: Imported from OSS
Differential Revision: D16593364
Pulled By: zafartahirov
fbshipit-source-id: 852dcf41f3dfa2a89152042b8e61d0b6defa8feb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23465
We decided not to allow users to use qconfig_dict for quantization, since that API is not robust.
Differential Revision: D16611504
fbshipit-source-id: b0d1d311b32c990a165c480f50e9ce3d68b785b5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23003
Add torch.quantization.fuse_module and the torch.nn._intrinsic convRelu and LinearRelu modules.
The fusion function combines specific module sequences: (conv, bn) and (conv, bn, relu).
In all cases the modules are replaced in place: the first module is replaced with the _intrinsic fused module and the remaining modules are replaced by nn.Identity.
Both training and eval are supported. For training, the modules are "fused" into a sequential container, which allows further module swaps for quantization aware training.
Also add torch.nn._intrinsic modules for convRelu and LinearRelu.
TODO: Add tests for _intrinsic modules.
Conv+BN fusion code is based on DsKhudia's implementation.
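A minimal fusion sketch; `fuse_modules` (plural) is the name in current torch.quantization and is assumed here, and the model is put in eval mode since Conv+BN fusion folds BN statistics into the conv:
```
import torch
import torch.nn as nn

model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8), nn.ReLU())
model.eval()

# "0", "1", "2" are the child names inside the Sequential; after fusion,
# child "0" holds the fused Conv+BN+ReLU and "1"/"2" become nn.Identity.
fused = torch.quantization.fuse_modules(model, [["0", "1", "2"]])
print(fused)
```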
Differential Revision: D16199720
fbshipit-source-id: 95fb9ffe72b361d280313b2ec57de2acd4f9dda2
Summary:
Add support for quantization aware training in eager mode
Modifications to the post training flow:
## Prepare
* Fusion: e.g. (Conv, Bn) → ConvBn (float)
* Swapping: To insert fake_quant for the weight, we need to swap the float modules that have weight with qat modules, e.g. Conv → torch.nn.qat.Conv, ConvBn → torch.nn._intrinsic.qat.ConvBn
* Previously we were thinking about modifying the weight in a forward_pre hook and changing it back in a forward hook:
```
def forward_pre_hook(self, input):
    self.float_weight = self.weight
    self.weight = self.fake_quantize(self.float_weight)

def forward_hook(self, input):
    self.weight = self.float_weight
```
* Assignments to self.weight are needed because we can't change the forward function, and the forward function uses self.weight.
* But we would need to keep two copies of the weight in this case, so it's better to just swap the module.
* So we just swap Conv to torch.nn.qat.Conv and Linear to torch.nn.qat.Linear.
* The qat modules have fake_quant for outputs and weights inserted in their forward function.
## Convert
* The flow is identical to PTQ, but the swapping dictionary is slightly different since the modules were changed in the prepare step (see the sketch below).
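A hedged sketch of the eager-mode QAT flow described above, using current torch.quantization names (`get_default_qat_qconfig`, `prepare_qat`, `convert`), which are assumptions relative to this commit; fusion is omitted for brevity:
```
import torch
import torch.nn as nn

model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU())
model.train()
model.qconfig = torch.quantization.get_default_qat_qconfig("fbgemm")

# Prepare: swap float modules for qat modules with fake_quant on weights/outputs.
torch.quantization.prepare_qat(model, inplace=True)

# ... training loop runs here with fake-quantized weights and activations ...

# Convert: swap qat modules for real quantized modules.
model.eval()
qmodel = torch.quantization.convert(model)
```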
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23082
ghstack-source-id: 86824650
Differential Revision: D16379374
fbshipit-source-id: 7d16d1acd87025065a24942ff92abf18e9fc8070
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22732
Add support for quantization aware training in eager mode
Modifications to the post training flow:
## Prepare
* Fusion: e.g. (Conv, Bn) → ConvBn (float)
* Swapping: To insert fake_quant for the weight, we need to swap the float modules that have weight with qat modules, e.g. Conv → torch.nn.qat.Conv, ConvBn → torch.nn._intrinsic.qat.ConvBn
* Previously we were thinking about modifying the weight in a forward_pre hook and changing it back in a forward hook:
```
def forward_pre_hook(self, input):
    self.float_weight = self.weight
    self.weight = self.fake_quantize(self.float_weight)

def forward_hook(self, input):
    self.weight = self.float_weight
```
* Assignments to self.weight are needed because we can't change the forward function, and the forward function uses self.weight.
* But we would need to keep two copies of the weight in this case, so it's better to just swap the module.
* So we just swap Conv to torch.nn.qat.Conv and Linear to torch.nn.qat.Linear.
* The qat modules have fake_quant for outputs and weights inserted in their forward function.
## Convert
* The flow is identical to PTQ, but the swapping dictionary is slightly different since the modules were changed in the prepare step.
Reviewed By: zafartahirov
Differential Revision: D16199356
fbshipit-source-id: 62aeaf47c12c62a87d9cac208f25f7592e245d6c