Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73863
This PR fully aligns the convert function with the design: https://github.com/pytorch/rfcs/blob/master/RFC-0019-Extending-PyTorch-Quantization-to-Custom-Backends.md
and simplifies the implementation of the convert function by always producing a reference quantized model (with reference patterns) first,
and then lowering the model to a quantized model that is runnable with the PyTorch native backends (fbgemm/qnnpack).
This PR makes convert.py much easier to understand than the previous implementation, and it allows us to remove the majority of the code
in quantization_patterns.py as well (in follow-up PRs).
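A minimal sketch of the resulting flow, assuming the torch.ao.quantization.quantize_fx APIs (exact signatures vary across releases, e.g. example_inputs is required in newer ones):
```
import torch
from torch.ao.quantization import get_default_qconfig
from torch.ao.quantization.quantize_fx import prepare_fx, convert_fx

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(4, 4)

    def forward(self, x):
        return self.linear(x)

model = M().eval()
example_inputs = (torch.randn(1, 4),)
qconfig_dict = {"": get_default_qconfig("fbgemm")}

prepared = prepare_fx(model, qconfig_dict, example_inputs)
prepared(*example_inputs)  # calibrate
# convert_fx now always produces a reference quantized model internally,
# then lowers it to fbgemm/qnnpack quantized ops
quantized = convert_fx(prepared)
```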
Test Plan:
```
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
python test/test_quantization.py TestFXNumericSuiteCoreAPIs
python test/test_quantization.py TestFXNumericSuiteCoreAPIsModels
```
and other internal/oss regression tests
Imported from OSS
Reviewed By: andrewor14
Differential Revision: D34778506
fbshipit-source-id: 0678b66addf736039a8749b352f6f569caca962b
(cherry picked from commit 33ec9caf23f3ab373d827117efbd9db0668b2437)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73509
This adds functionality to lower reference models
involving the Linear-Bn1d pattern in FX QAT mode. This follows
https://github.com/pytorch/pytorch/pull/72431 and https://github.com/pytorch/pytorch/pull/72796, which added Linear-Bn1d fusion functionality
to eager QAT mode.
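A hedged sketch of the pattern and flow (the module and qconfig below are illustrative; signatures vary by release):
```
import torch
from torch.ao.quantization import get_default_qat_qconfig
from torch.ao.quantization.quantize_fx import prepare_qat_fx, convert_fx

class LinearBn(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(8, 8)
        self.bn = torch.nn.BatchNorm1d(8)

    def forward(self, x):
        return self.bn(self.linear(x))

model = LinearBn().train()
example_inputs = (torch.randn(2, 8),)
qconfig_dict = {"": get_default_qat_qconfig("fbgemm")}
prepared = prepare_qat_fx(model, qconfig_dict, example_inputs)
# ... run QAT training ...
quantized = convert_fx(prepared.eval())  # lowers the fused Linear-Bn1d pattern
```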
Test Plan:
python test/test_quantization.py TestQuantizeFxOps.test_linear_module
Imported from OSS
Reviewed By: dagitses
Differential Revision: D34591251
fbshipit-source-id: 39144485f9954ee1830c8b414e724560fd7e47bf
(cherry picked from commit b97a39b4d9df00e045fab4c01eca88e562ca2c02)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73233
This PR makes CopyNodeQuantizeHandler always produce reference patterns, and we have
custom lowering passes to rewrite the reference quantized patterns to quantized ops.
The lowering passes have been implemented previously; we just need to enable the reference path here
and clean up the previous code to allowlist some of the ops (`check_node`).
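To illustrate, here is what the reference pattern around a copy node looks like and what lowering removes (a sketch with illustrative qparams; flatten stands in for any copy op):
```
import torch

xq = torch.quantize_per_tensor(torch.randn(2, 4), scale=0.1, zero_point=0,
                               dtype=torch.quint8)

# reference pattern: dequantize -> copy op -> quantize
y_ref = torch.quantize_per_tensor(torch.flatten(xq.dequantize()),
                                  0.1, 0, torch.quint8)

# lowered form: the copy op runs directly on the quantized tensor,
# so the dequantize/quantize pair is removed
y_lowered = torch.flatten(xq)
assert torch.equal(y_ref.int_repr(), y_lowered.int_repr())
```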
Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
Imported from OSS
Reviewed By: mrshenli
Differential Revision: D34469446
fbshipit-source-id: b9d9c5f793fbb735839199056c197ae98969cc4b
(cherry picked from commit af0cf4e79e11e7343d57e6ff7766c80e72ec60f3)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72953
This PR makes BinaryOpQuantizeHandler always produce reference patterns, and we have
custom lowering passes to rewrite the reference quantized patterns to quantized ops.
This includes rewrites for
torch.ops.quantized.add, torch.ops.quantized.mul, and torch.ops.quantized.matmul.
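For add, the rewrite goes roughly from the reference pattern to the fused quantized op (a sketch with illustrative qparams):
```
import torch

a = torch.quantize_per_tensor(torch.randn(2, 2), 0.1, 0, torch.quint8)
b = torch.quantize_per_tensor(torch.randn(2, 2), 0.1, 0, torch.quint8)

# reference pattern: dequantize both inputs, add in float, re-quantize
ref = torch.quantize_per_tensor(a.dequantize() + b.dequantize(),
                                0.2, 0, torch.quint8)

# lowered pattern produced by the lowering pass; agrees with the
# reference result up to rounding at the requantization step
lowered = torch.ops.quantized.add(a, b, 0.2, 0)
```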
Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
Imported from OSS
Reviewed By: gchanan
Differential Revision: D34292408
fbshipit-source-id: 9872a5098249bc77db15e9fb614416958e62b9b2
(cherry picked from commit dbdc61ee8b5dde2e54a34a370a3af887e5117398)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72444
In https://github.com/pytorch/pytorch/pull/71783, support was added for
quantized matmul.
In this PR, FX graph mode quantization workflow support for this
operator is added, for int8 dtypes.
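A hedged sketch of the workflow this enables (signatures vary by release):
```
import torch
from torch.ao.quantization import get_default_qconfig
from torch.ao.quantization.quantize_fx import prepare_fx, convert_fx

class MatmulModule(torch.nn.Module):
    def forward(self, x, y):
        return torch.matmul(x, y)

m = MatmulModule().eval()
example_inputs = (torch.randn(2, 2), torch.randn(2, 2))
prepared = prepare_fx(m, {"": get_default_qconfig("fbgemm")}, example_inputs)
prepared(*example_inputs)         # calibrate
quantized = convert_fx(prepared)  # matmul is lowered to the quantized kernel
```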
Test Plan:
```
python test/test_quantization.py TestQuantizeFxOps.test_qmatmul
```
Imported from OSS
Reviewed By: andrewor14
Differential Revision: D34047310
fbshipit-source-id: 781219047419ce621a4deb46ea04881818bf4209
(cherry picked from commit 7e039fa3a1)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71780
Adds support for matching operator.add -> torch.relu in FX graph
mode quantization.
It would be nice to support torch.relu better in general, but we are
saving that for a future PR to keep PRs small.
This is useful for DBR quant because we have some test cases in DBR
quant which use add-relu, and we'd like to match them to FX.
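The pattern being matched, for illustration:
```
import operator
import torch

class AddRelu(torch.nn.Module):
    def forward(self, x, y):
        # operator.add is what `x + y` traces to in FX
        return torch.relu(operator.add(x, y))
```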
Test Plan:
```
python test/test_quantization.py TestQuantizeFxOps.test_add_relu
python test/test_quantization.py TestQuantizeFxOps.test_mul_relu
```
Reviewed By: jerryzh168
Differential Revision: D33775096
Pulled By: vkuzo
fbshipit-source-id: 889d9b41d3758ecbbb6d7eab67f64ce3d4892d24
(cherry picked from commit c1f9f38ca1)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71427
This commit adds a lowering path for the LinearReLU modules
in static quantization mode. This includes torch.nn.qat.Linear,
torch.nn.intrinsic.LinearReLU, and torch.nn.intrinsic.qat.LinearReLU.
Future commits will add support for dynamic quantization and functional
LinearReLU.
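For illustration, the fused float module the lowering path handles in static mode (these intrinsic modules moved under torch.ao.nn in later releases):
```
import torch
import torch.nn.intrinsic as nni

linear_relu = nni.LinearReLU(torch.nn.Linear(4, 4), torch.nn.ReLU())
y = linear_relu(torch.randn(2, 4))
```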
Test Plan:
python test/test_quantization.py TestQuantizeFxOps.test_linear_module
Imported from OSS
Reviewed By: george-qi
Differential Revision: D33694742
fbshipit-source-id: 19af11f82b1ad8ade0c307498971c29a3f776036
(cherry picked from commit b3f607de43)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71168
In this PR we enable the reference path by default for CopyNodeQuantizeHandler.
Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
Imported from OSS
Reviewed By: andrewor14
Differential Revision: D33715995
fbshipit-source-id: eda44892fcea3a1cba54ac75bc020f73e1becc8c
(cherry picked from commit a2cf63f68d)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70757
This is an initial PR toward preserving stack traces throughout FX
graph mode quantization. It preserves the stack traces of the ops
for all of the quantize handlers. A future PR will add stack traces
for dtype transitions.
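A sketch of what "preserved" means: each FX node carries a .stack_trace string pointing back at the original source line. In plain FX, stack traces are recorded when the tracer enables them:
```
import torch
import torch.fx

class M(torch.nn.Module):
    def forward(self, x):
        return torch.sigmoid(x)

class StackTraceTracer(torch.fx.Tracer):
    record_stack_traces = True  # populate node.stack_trace during tracing

graph = StackTraceTracer().trace(M())
for node in graph.nodes:
    print(node.op, node.target, node.stack_trace)
```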
Test Plan:
```
python test/test_quantization.py
TestQuantizeFx.test_stack_trace_preserved
```
Note: the above only tests a single case. In a future PR, once we
expand coverage, we can expand the utility functions to check for stack
traces on all tests.
Imported from OSS
Differential Revision: D33432485
Reviewed By: jerryzh168
Pulled By: vkuzo
fbshipit-source-id: 56c56850393132487430a850fa1def826a9c39c0
(cherry picked from commit c11155b31e)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69720
This function is also useful for DBR quant, so we move it from FX utils
to common utils.
Test Plan:
```
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeDBR
```
Reviewed By: jerryzh168
Differential Revision: D33003473
Pulled By: vkuzo
fbshipit-source-id: 20360682c69d614a645c14fc29d3ee023d6b2623
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69334
The original PR #68121 broke due to an incompatible qengine on macOS; this PR re-introduces the changes with a fix.
It adds FX support for the QAT EmbeddingBag operator, which previously had only eager mode support.
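A rough sketch of the pattern now supported in FX (qconfig details omitted; embedding-style ops use weight-only float_qparams quantization rather than the default QAT qconfig):
```
import torch

class EmbeddingBagLinear(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = torch.nn.EmbeddingBag(num_embeddings=10, embedding_dim=4)
        self.linear = torch.nn.Linear(4, 4)

    def forward(self, indices, offsets):
        return self.linear(self.emb(indices, offsets))

# prepare_qat_fx / convert_fx can now handle this module, with an
# embedding-specific qconfig assigned to self.emb
```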
Test Plan:
pytest test/quantization/fx/test_quantize_fx.py -v -k "test_qat_embeddingbag_linear"
Imported from OSS
Reviewed By: jingsh
Differential Revision: D32815153
fbshipit-source-id: 33654ce29de6e81920bf3277a75027fe403a1eb2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69333
The original PR was reverted due to a break with an incompatible qengine on macOS; this diff fixes that.
It supports the QAT workflow through the torch.fx QAT APIs, e.g. `prepare_qat_fx` and `convert_fx`.
Test Plan:
`pytest test/quantization/fx/test_quantize_fx.py -v -k "test_qat_embedding_linear"`
Imported from OSS
Reviewed By: jingsh
Differential Revision: D32814827
fbshipit-source-id: f7a69d2b596f1276dc5860b397c5d5d07e5b9e16
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68229
This PR makes BinaryOpQuantizeHandler always produce reference patterns, and we rely on
subgraph_rewriter to rewrite the reference quantized patterns to quantized ops.
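A toy illustration of the torch.fx.subgraph_rewriter mechanism used here (the real patterns are the reference quantized subgraphs):
```
import torch
import torch.fx
from torch.fx import subgraph_rewriter

class M(torch.nn.Module):
    def forward(self, x, y):
        return torch.add(x, y)

def pattern(x, y):
    return torch.add(x, y)

def replacement(x, y):
    return torch.mul(x, y)  # stand-in for the quantized op

gm = torch.fx.symbolic_trace(M())
subgraph_rewriter.replace_pattern(gm, pattern, replacement)
print(gm.code)  # the add has been rewritten to mul
```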
Test Plan:
```
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
```
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D32537714
fbshipit-source-id: 456086b308c4446840d8d37997daa6f8f8068479
Summary:
**Summary**: FixedQParams operators do not need fake quantization
in the prepare step. This commit introduces FixedQParamsObserver
and makes FixedQParamsFakeQuantize a simple wrapper around this
observer. It also removes the fake quantize logic in forward.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68143
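A sketch of the new observer: sigmoid's output range is fixed at [0, 1], so its qparams can be fixed up front (the scale/zero_point below follow the quint8 convention for sigmoid):
```
import torch
from torch.ao.quantization.observer import FixedQParamsObserver

obs = FixedQParamsObserver(scale=1.0 / 256.0, zero_point=0,
                           dtype=torch.quint8)
obs(torch.sigmoid(torch.randn(4)))  # observing is a no-op
print(obs.calculate_qparams())      # always the fixed (scale, zero_point)
```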
Test Plan:
Added two tests:
python3 test/test_quantization.py TestQuantizeFx.test_fixed_qparams_patterns
python3 test/test_quantization.py TestQuantizeFx.test_register_patterns
**Reviewers**: Jerry Zhang
**Subscribers**: Jerry Zhang, Supriya Rao
**Tasks**: T104942885
**Tags**: pytorch
Reviewed By: albanD
Differential Revision: D32484427
Pulled By: andrewor14
fbshipit-source-id: 5a048b90eb4da79074c5ceffa3c8153f8d8cd662
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67067
We plan to gradually add features to backend_config_dict; this PR adds support
for specifying the dtype for the input and output of a given pattern.
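An illustrative shape of such an entry (field names evolved across releases; treat this as a sketch of the idea rather than the final schema):
```
import torch

backend_config_dict = {
    "configs": [
        {
            "pattern": torch.nn.ReLU,
            "dtype_configs": [
                {
                    "input_dtype": torch.quint8,
                    "output_dtype": torch.quint8,
                },
            ],
        },
    ],
}
```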
Test Plan: Imported from OSS
Reviewed By: vkuzo
Differential Revision: D31849074
fbshipit-source-id: ca2fbb873176fe72e08ea79ed1bc659bf27cbd8a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65723
Example of lowering the reference linear module to the fbgemm/qnnpack quantized linear module.
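The lowering target, as a sketch (torch.nn.quantized.Linear is the fbgemm/qnnpack-backed module; default qparams shown):
```
import torch
import torch.nn.quantized as nnq

qlinear = nnq.Linear(4, 4)  # backed by fbgemm/qnnpack kernels
x = torch.quantize_per_tensor(torch.randn(2, 4), 0.1, 0, torch.quint8)
y = qlinear(x)              # quantized in, quantized out
```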
Test Plan: Imported from OSS
Reviewed By: vkuzo
Differential Revision: D31567461
fbshipit-source-id: 0b8fffaf8e742ec15cb07bf6a4672cf3e856db2d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66647
Missed in the last round:
this adds reference patterns for general shape ops like view when is_reference is True.
bc-breaking:
this basically disables getitem from supporting quantized ops here; we may support it later in fbgemm.
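A sketch of how the reference path is requested (the is_reference flag used at the time was later replaced by convert_to_reference_fx; the newer spelling is shown):
```
import torch
from torch.ao.quantization import get_default_qconfig
from torch.ao.quantization.quantize_fx import prepare_fx, convert_to_reference_fx

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(4, 4)

    def forward(self, x):
        return self.linear(x).view(-1)

m = M().eval()
example_inputs = (torch.randn(2, 4),)
prepared = prepare_fx(m, {"": get_default_qconfig("fbgemm")}, example_inputs)
prepared(*example_inputs)
# reference model: view appears between explicit dequantize/quantize nodes
reference = convert_to_reference_fx(prepared)
```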
Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
python test/test_quantization.py TestQuantizeFxModels
Imported from OSS
Reviewed By: H-Huang
Differential Revision: D31680379
fbshipit-source-id: 6a3a7128514baf6d92b1607308c40339469d0066
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61647
`prepare_fx` currently assumes that bias is always a positional argument to
convolutions, and only a keyword argument to other functions. This happens to work
today due to a quirk in how `__torch_function__` is handled for Python
functions, but it shouldn't be considered stable.
Instead, we should support `bias` in both positional and keyword forms.
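Both call forms, for illustration:
```
import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 8, 8)
w = torch.randn(6, 3, 3, 3)
b = torch.randn(6)

y1 = F.conv2d(x, w, b)       # bias as a positional argument
y2 = F.conv2d(x, w, bias=b)  # bias as a keyword argument
assert torch.equal(y1, y2)
```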
cc jerryzh168 jianyuh raghuramank100 jamesr66a vkuzo
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D31401360
Pulled By: albanD
fbshipit-source-id: 1e2f53d80e2176b870f326dc498e251e2386136e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66058
After the initial migration from `torch.quantization` to `torch.ao.quantization`, some of the files did not change.
This happened because the migration was done in parallel, and some of the files were landed while the others were still in the original location.
This is the last fix in the AO migration phase 1, which completely enables the ao.quantization namespace.
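After this change both namespaces resolve to the same objects, with torch.ao.quantization as the canonical location; a quick check:
```
import torch.quantization as legacy
import torch.ao.quantization as ao

# the legacy namespace re-exports from torch.ao.quantization
assert legacy.get_default_qconfig is ao.get_default_qconfig
```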
Test Plan: `python test/test_quantization.py`
Reviewed By: vkuzo
Differential Revision: D31366066
Pulled By: z-a-f
fbshipit-source-id: bf4a74885be89d098df2d87e685795a2a64026c5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65484
This PR makes sure we only use FixedQParamsFakeQuantize for the quint8 dtype and allows users
to use other dtypes for ops like sigmoid. This is useful for producing reference patterns for
these ops that can be used in other backends like TensorRT.
Test Plan:
python test/test_quantization.py TestQuantizeFxOps
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D31120377
fbshipit-source-id: 3b529d588e2b6ff0377a89c181f6237f8f0cc2f5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65033
1. Move the files:
```
hg mv caffe2/torch/quantization/fx caffe2/torch/ao/quantization/fx
hg mv caffe2/torch/quantization/quantize_fx.py caffe2/torch/ao/quantization/quantize_fx.py
```
2. Create new files
```
touch caffe2/torch/quantization/quantize_fx.py
touch caffe2/torch/quantization/fx/__init__.py
```
3. Import things in the new files (see the sketch below)
4. Add tests to test/quantization/ao_migration/test_quantization_fx.py;
this is because we have some fx imports in quantize_fx and fx/*.py
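Step 3 amounts to re-exporting from the new location; a sketch of the forwarding file (hedged, the real file imports specific names):
```
# torch/quantization/quantize_fx.py -- backward-compatibility shim
from torch.ao.quantization.quantize_fx import *  # noqa: F401,F403
```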
Test Plan: buck test mode/dev //caffe2/test:quantization
Reviewed By: vkuzo, z-a-f
Differential Revision: D30949749
fbshipit-source-id: 9e5d4d039c8a0a0820bc9040e224f0d2c26886d3