Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73863
This PR fully aligns the convert function with the design: https://github.com/pytorch/rfcs/blob/master/RFC-0019-Extending-PyTorch-Quantization-to-Custom-Backends.md
and simplifies the implementation of the convert function by always producing a reference quantized model (with reference patterns) first,
and then lowering the model to a quantized model that is runnable on the PyTorch native backends (fbgemm/qnnpack).
This PR makes convert.py much easier to understand than the previous implementation, and we are able to remove the majority of the code
in quantization_patterns.py as well (in follow-up PRs).
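For context, here is a minimal sketch of the user-facing FX graph mode flow (assuming the torch 1.11-era `prepare_fx`/`convert_fx` signatures); the change is internal to `convert_fx`, which now always builds a reference quantized model and then lowers it:
```
import torch
from torch.ao.quantization import get_default_qconfig
from torch.ao.quantization.quantize_fx import prepare_fx, convert_fx

model = torch.nn.Sequential(torch.nn.Linear(8, 4)).eval()
qconfig_dict = {"": get_default_qconfig("fbgemm")}

prepared = prepare_fx(model, qconfig_dict)  # insert observers
prepared(torch.randn(2, 8))                 # calibrate
# convert now (1) produces a reference quantized model with reference
# patterns, then (2) lowers it to ops runnable on fbgemm/qnnpack.
quantized = convert_fx(prepared)
```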
Test Plan:
```
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
python test/test_quantization.py TestFXNumericSuiteCoreAPIs
python test/test_quantization.py TestFXNumericSuiteCoreAPIsModels
```
and other internal/OSS regression tests
Imported from OSS
Reviewed By: andrewor14
Differential Revision: D34778506
fbshipit-source-id: 0678b66addf736039a8749b352f6f569caca962b
(cherry picked from commit 33ec9caf23f3ab373d827117efbd9db0668b2437)
Summary:
**Summary:** This commit adds the `torch.nn.qat.dynamic.modules.Linear`
module, the dynamic counterpart to `torch.nn.qat.modules.Linear`.
Functionally these are very similar, except the dynamic version
expects a memoryless observer and is converted into a dynamically
quantized module before inference.
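A minimal sketch of the intended usage (the qconfig below is an illustrative assumption: `averaging_constant=1` makes the weight observer memoryless, and dynamic quantization needs no real activation observer):
```
import torch
from torch.ao.quantization import QConfig, MovingAverageMinMaxObserver, PlaceholderObserver

qconfig = QConfig(
    activation=PlaceholderObserver.with_args(dtype=torch.float),
    weight=MovingAverageMinMaxObserver.with_args(
        averaging_constant=1,  # memoryless: only the latest batch counts
        dtype=torch.qint8,
        qscheme=torch.per_tensor_symmetric,
    ),
)

float_linear = torch.nn.Linear(16, 8)
float_linear.qconfig = qconfig
# Fake-quantizes the weight during training; converted to a dynamically
# quantized Linear before inference.
qat_linear = torch.nn.qat.dynamic.modules.Linear.from_float(float_linear)
```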
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67325
Test Plan:
`python3 test/test_quantization.py TestQuantizationAwareTraining.test_dynamic_qat_linear`
**Reviewers:** Charles David Hernandez, Jerry Zhang
**Subscribers:** Charles David Hernandez, Supriya Rao, Yining Lu
**Tasks:** 99696812
**Tags:** pytorch
Reviewed By: malfet, jerryzh168
Differential Revision: D32178739
Pulled By: andrewor14
fbshipit-source-id: 5051bdd7e06071a011e4e7d9cc7769db8d38fd73
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65900
This changes the imports in `caffe2/torch/nn/quantized` to use the new import locations.
```
codemod -d torch/nn/quantized --extensions py 'torch.quantization' 'torch.ao.quantization'
```
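The effect on each touched file is a mechanical rewrite of the import path, for example (`QuantStub` is just an illustrative symbol):
```
# before
from torch.quantization import QuantStub
# after
from torch.ao.quantization import QuantStub
```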
Test Plan: `python test/run_test.py`
Reviewed By: jerryzh168
Differential Revision: D31301193
fbshipit-source-id: 58efb1ad51a8b441e2a3bd5b91af11eab6b9331f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63799
Add a new module that can be swapped in for the `nni.LinearReLU` module in the convert function.
Currently supports INT8 only (the FP16 op doesn't have ReLU fusion yet).
Fixes #55393
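A minimal usage sketch (assuming `nni.LinearReLU` is registered in the dynamic quantization module mappings, so convert swaps in the new module):
```
import torch
import torch.nn as nn
import torch.nn.intrinsic as nni

class M(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(8, 4)
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.relu(self.linear(x))

m = M().eval()
# Fuse Linear + ReLU into a single nni.LinearReLU module.
m = torch.quantization.fuse_modules(m, [["linear", "relu"]])
# Dynamic INT8 quantization; convert replaces the fused module with the
# new dynamically quantized LinearReLU.
qm = torch.quantization.quantize_dynamic(m, {nni.LinearReLU}, dtype=torch.qint8)
```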
Test Plan:
python test/test_quantization.py test_dynamic_fusion
Imported from OSS
Reviewed By: heitorschueroff
Differential Revision: D30502812
fbshipit-source-id: 3668e4f001a0626d469e17ac323acf582ee28a51
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49964
`torch.nn.modules.linear._LinearWithBias` is used only in the transformer modules and is completely identical to `torch.nn.Linear`.
This PR creates a mapping so that this module is treated the same as `Linear`.
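Conceptually, the added entries look like this sketch (`_LinearWithBias` is an internal module and may not exist in later releases):
```
import torch.nn.quantized as nnq
import torch.nn.quantized.dynamic as nnqd
from torch.nn.modules.linear import _LinearWithBias

# _LinearWithBias converts exactly like nn.Linear, in both the static and
# the dynamic quantized module mappings.
static_entry = {_LinearWithBias: nnq.Linear}
dynamic_entry = {_LinearWithBias: nnqd.Linear}
```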
Test Plan:
```
python test/test_quantization.py TestDynamicQuantizedModule TestStaticQuantizedModule
```
Differential Revision: D25731589
Reviewed By: jerryzh168
Pulled By: z-a-f
fbshipit-source-id: 1b2697014e250e97d3010cdb542f9d130b71fbc3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47068
Filter the dtype config before performing quantization in linear.
Test Plan: Imported from OSS
Reviewed By: supriyar
Differential Revision: D24627907
fbshipit-source-id: 162fa47b3fcf6648049f8bc0438e41ee97ac19e9
Summary:
To avoid conflicts, this PR does not remove all imports. More removals are coming in follow-up PRs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43808
Reviewed By: wanchaol
Differential Revision: D23436675
Pulled By: ailzhang
fbshipit-source-id: ccc21a1955c244f0804277e9e47e54bfd23455cd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40931
Fix docstrings for dynamic quantized Linear/LSTM and associated classes
ghstack-source-id: 107064446
Test Plan: Docs show up correctly
Differential Revision: D22360787
fbshipit-source-id: 8e357e081dc59ee42fd7f12ea5079ce5d0cc9df2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39125
Switch to setting `reduce_range=True` for version > 3.
Models serialized with an older state_dict will have version <= 3 and so will run with `reduce_range=False`.
Verified with the backward compatibility tests (they pass with no changes).
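A sketch of the version gate (assuming the version is read from the state_dict metadata during load):
```
def _reduce_range_for_version(version):
    # Checkpoints written before this change (version <= 3, or no version
    # at all) keep the old behavior; newly serialized modules opt into
    # reduce_range=True.
    return version is not None and version > 3
```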
Test Plan:
python test/test_quantization.py
Imported from OSS
Differential Revision: D21769689
fbshipit-source-id: 131f2ae736e31705222e82bdc77480f2f1826fe8
Summary:
When applying float16 dynamic quantization with
```
model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.float16
)
print(model)
```
there is an issue when we try to print the model: we cannot print the `qscheme` information for a float16 weight, since `qscheme` is only defined for the per-tensor and per-channel quantization schemes used in int8 dynamic quantization.
Before this PR:
```
Traceback (most recent call last):
  File "dlrm_s_pytorch.py", line 860, in <module>
    print(dlrm)
  File "/home/jianyuhuang/miniconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1142, in __repr__
    mod_str = repr(module)
  File "/home/jianyuhuang/miniconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1142, in __repr__
    mod_str = repr(module)
  File "/home/jianyuhuang/miniconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1136, in __repr__
    extra_repr = self.extra_repr()
  File "/home/jianyuhuang/miniconda3/lib/python3.7/site-packages/torch/nn/quantized/dynamic/modules/linear.py", line 55, in extra_repr
    self.in_features, self.out_features, self.weight().qscheme()
RuntimeError: Could not run 'aten::qscheme' with arguments from the 'CPUTensorId' backend. 'aten::qscheme' is only available for these backends: [QuantizedCPUTensorId, VariableTensorId].
```
After this PR:
```
(4): DynamicQuantizedLinear(
  in_features=2, out_features=1, dtype=torch.float16
  (_packed_params): LinearPackedParams()
)
```
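The fix amounts to guarding the qscheme lookup on the weight dtype; a sketch (the attribute names are assumptions):
```
def extra_repr(self):
    extra = 'in_features={}, out_features={}, dtype={}'.format(
        self.in_features, self.out_features, self._packed_params.dtype)
    # qscheme() is only defined for quantized tensors, so report it only
    # for int8 weights; float16 weights have no qscheme.
    if self._packed_params.dtype == torch.qint8:
        extra += ', qscheme={}'.format(self.weight().qscheme())
    return extra
```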
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36044
Differential Revision: D20860811
Pulled By: jianyuh
fbshipit-source-id: d1405a185f46a8110e6d27982b40534c854f4d1c
Summary:
This PR enables per-channel (row-wise) dynamic quantization for the linear operator. Since we have seen some accuracy drop from per-tensor quantization, we expect per-channel quantization to help improve accuracy.
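A minimal usage sketch (assuming the `per_channel_dynamic_qconfig` that this path uses):
```
import torch
from torch.quantization import quantize_dynamic, per_channel_dynamic_qconfig

model = torch.nn.Sequential(torch.nn.Linear(16, 8)).eval()
# Each output row of the weight matrix gets its own scale/zero_point,
# instead of one per-tensor pair for the whole matrix.
qmodel = quantize_dynamic(
    model,
    qconfig_spec={torch.nn.Linear: per_channel_dynamic_qconfig},
    dtype=torch.qint8,
)
```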
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30122
Differential Revision: D18630541
Pulled By: lly-zero-one
fbshipit-source-id: d52685deec5e7de46cd686ae649a8c8765b9cacf
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28767
The scale and zero_point are for the output activation tensor, not for the weight tensor. We remove them here because dynamic quantization computes its output in floating point and therefore does not need scales and zero points for the output tensor.
ghstack-source-id: 92807318
Test Plan: CI
Differential Revision: D18164949
fbshipit-source-id: 0f9172bfef615c30dc28e1dd4448a9f3cc897c2e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26709
Polishes the implementation from #25975. Primarily, we use NoopObserver to communicate that weights need to be quantized to float16. The very top-level API (`quantize_dynamic`) keeps its `dtype` argument, but the implementation now follows the common observer-driven flow.
One can argue that dynamic fp16 quantization doesn't really fit into the 'observer' mechanism. It's in fact not ideal, but it's better to keep a single flow than to branch on both dtype and qconfig.
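A sketch of the idea, using the `QConfigDynamic`/`NoopObserver` APIs of that era:
```
import torch
from torch.quantization import QConfigDynamic, NoopObserver, quantize_dynamic

# NoopObserver records no statistics; it only carries the target dtype, so
# convert knows to cast the weight to float16 instead of observing it and
# computing int8 quantization parameters.
fp16_qconfig = QConfigDynamic(weight=NoopObserver.with_args(dtype=torch.float16))

model = torch.nn.Sequential(torch.nn.Linear(8, 4)).eval()
qmodel = quantize_dynamic(model, qconfig_spec={torch.nn.Linear: fp16_qconfig})
```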
Test Plan: Imported from OSS
Differential Revision: D17544103
Pulled By: dzhulgakov
fbshipit-source-id: 6af3f18c35929a1a53ea734079c005f656e4925f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26574
Since we also have `quantized::linear`, the name `quantize_linear` is confusing, so we plan to rename it before the branch cut.
Test Plan:
ci
Imported from OSS
Differential Revision: D17514876
fbshipit-source-id: 01d9005e6ec8cb9950b9d8bba122109c389641d3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25428
Added bias as an optional parameter to the `quantized_linear_prepack` function.
The bias is quantized at run time using the input scale and the weight scale.
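A sketch of the op-level usage (scale values are arbitrary):
```
import torch

w = torch.randn(4, 8)
b = torch.randn(4)  # float bias; stays float until run time
qw = torch.quantize_per_tensor(w, scale=0.1, zero_point=0, dtype=torch.qint8)

# Bias is an optional second argument; at run time it is quantized using
# input_scale * weight_scale.
packed = torch.ops.quantized.linear_prepack(qw, b)
```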
ghstack-source-id: 89601399
Test Plan: python test/run_test.py --exclude nn --verbose --bring-to-front quantization quantized quantized_tensor quantized_nn_mods quantizer
Differential Revision: D17121304
fbshipit-source-id: 8adb0e55e4aed0a5430aaa2c8639c8ad1639c85a
Summary:
- ~~Add a unit test for the Dynamic Quantized Linear operator (```torch.fbgemm_linear_quantize_weight```, ```torch.fbgemm_pack_quantized_matrix```, and ```torch.fbgemm_linear_int8_weight```) in ```test_quantized.py```.~~ Move this to D16404027 for a separate review.
- Add the Dynamic Quantized Linear module in ```torch/nn/quantized/modules/linear.py```. ~~This is in a rudimentary stage. Will add more functions later~~.
- Add the torch.quantize logic (prepare, eval, convert) for dynamic quantization.
- Add a unit test for the Dynamic Quantized Linear module in ```test_nn_quantized.py```.
- Add a unit test for the Model-level Quantization API (a usage sketch follows below).
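A sketch of the model-level flow this stack enables (using the `quantize_dynamic` convenience API):
```
import torch

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = torch.nn.Linear(8, 4)

    def forward(self, x):
        return self.fc(x)

m = M().eval()
# nn.Linear is swapped for the new Dynamic Quantized Linear module.
qm = torch.quantization.quantize_dynamic(m, {torch.nn.Linear}, dtype=torch.qint8)
print(qm(torch.randn(2, 8)).shape)
```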
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23128
ghstack-source-id: 88257232
Differential Revision: D16258664
fbshipit-source-id: 4be3ac39ee27c088b341c741d3f09f51d5a23ef0