This reverts commit 59071ab1e7.
It breaks `quantization.jit.test_ondevice_quantization.TestOnDeviceDynamicPTQFinalize`, which is not run in OSS, but is mandatory for internal CI.
Summary:
This PR deprecates the `compute_dtype` field on observers, and replaces
it with the `is_dynamic` field on observers. This is better aligned
with the reference model spec.
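For illustration, a minimal sketch of what the change means for a dynamic-quantization activation observer (observer class and argument names follow the current `torch.ao.quantization` observer API and are meant as an example, not the exact diff):
```
import torch
from torch.ao.quantization.observer import PlaceholderObserver

# Before: dynamic activation quantization was signalled via compute_dtype.
old_dynamic_act_observer = PlaceholderObserver.with_args(
    dtype=torch.float, compute_dtype=torch.quint8)

# After: the same intent is expressed with is_dynamic, which matches the
# reference model spec (dtype is the dtype passed to the quantize op).
new_dynamic_act_observer = PlaceholderObserver.with_args(
    dtype=torch.quint8, is_dynamic=True)
```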
Test plan:
```
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85431
Approved by: https://github.com/jerryzh168
Summary: In both eager and FX graph mode quantization,
`torch.ao.nn.quantizable.LSTM` is used as an observed custom module,
which is responsible for inserting its own observers. By default,
the user specifies a single QConfig for the custom module (either
through QConfigMapping or by setting the "qconfig" attribute),
and all inner ops will [inherit this
QConfig](dc00bb51b8/torch/ao/nn/quantizable/modules/rnn.py (L366-L378))
and use the same observer/fake_quantize constructors.
Today, users who wish to override this behavior must extend
`torch.ao.nn.quantizable.LSTM` and write a lot of custom code
to manually assign the QConfigs to the inner ops. This commit
alleviates this burden on the user by providing a helper function
to assign QConfigs with custom observers. An example use case of
this is providing a reference implementation for a backend kernel
that hardcodes qparams for efficiency.
Example usage:
```
import torch
from torch.ao.quantization import get_default_qconfig_mapping
from torch.ao.quantization.observer import FixedQParamsObserver
from torch.ao.quantization.quantize_fx import prepare_fx, convert_fx
from torch.ao.quantization.fx.custom_config import (
    PrepareCustomConfig,
    ConvertCustomConfig,
)

class MyModel(torch.nn.Module):
    ...

class UserLSTM(torch.ao.nn.quantizable.LSTM):
    @classmethod
    def from_float(cls, other):
        assert isinstance(other, cls._FLOAT_MODULE)
        linear_output_obs_ctr = FixedQParamsObserver.with_args(
            scale=2 ** -11, zero_point=2 ** 15, dtype=torch.qint32)
        sigmoid_obs_ctr = FixedQParamsObserver.with_args(
            scale=2 ** -16, zero_point=0, dtype=torch.qint32)
        tanh_obs_ctr = FixedQParamsObserver.with_args(
            scale=2 ** -15, zero_point=2 ** 15, dtype=torch.qint32)
        cell_state_obs_ctr = FixedQParamsObserver.with_args(
            scale=2 ** -11, zero_point=0, dtype=torch.qint32)
        hidden_state_obs_ctr = FixedQParamsObserver.with_args(
            scale=2 ** -7, zero_point=2 ** 7, dtype=torch.quint8)
        return torch.ao.quantization.utils._get_lstm_with_individually_observed_parts(
            float_lstm=other,
            linear_output_obs_ctr=linear_output_obs_ctr,
            sigmoid_obs_ctr=sigmoid_obs_ctr,
            tanh_obs_ctr=tanh_obs_ctr,
            cell_state_obs_ctr=cell_state_obs_ctr,
            hidden_state_obs_ctr=hidden_state_obs_ctr,
        )

qconfig_mapping = get_default_qconfig_mapping()
example_inputs = (torch.rand(5, 3, 50), torch.rand(1, 3, 50), torch.randn(1, 3, 50))
prepare_custom_config = PrepareCustomConfig() \
    .set_float_to_observed_mapping(torch.nn.LSTM, UserLSTM)
convert_custom_config = ConvertCustomConfig() \
    .set_observed_to_quantized_mapping(UserLSTM, torch.ao.nn.quantized.LSTM)
model = MyModel()
model = prepare_fx(model, qconfig_mapping, example_inputs, prepare_custom_config=prepare_custom_config)
model(*example_inputs)  # calibrate
model = convert_fx(model, convert_custom_config=convert_custom_config)
model(*example_inputs)
```
Test Plan:
python test/test_quantization.py TestQuantizeFx.test_static_lstm_with_custom_fixed_qparams
Reviewers: jerryzh168, vkuzo
Subscribers: jerryzh168, vkuzo
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88456
Approved by: https://github.com/jerryzh168, https://github.com/vkuzo
Summary:
`_convert_to_reference_decomposed` is a private convert function in the FX graph mode quantization flow that converts
a calibrated/trained model to a reference quantized model with a decomposed quantized tensor representation.
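A minimal usage sketch (assuming the helper is exposed as `_convert_to_reference_decomposed_fx` in `torch.ao.quantization.quantize_fx`, matching the test name below):
```
import torch
from torch.ao.quantization import get_default_qconfig_mapping
from torch.ao.quantization.quantize_fx import (
    prepare_fx,
    _convert_to_reference_decomposed_fx,
)

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(5, 5)

    def forward(self, x):
        return self.linear(x)

example_inputs = (torch.randn(1, 5),)
model = prepare_fx(M().eval(), get_default_qconfig_mapping(), example_inputs)
model(*example_inputs)  # calibrate
# Produces a reference quantized model whose quantize/dequantize steps use the
# decomposed representation (ops on plain tensors instead of quantized tensors).
model = _convert_to_reference_decomposed_fx(model)
```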
Test Plan:
python test/test_quantization.py TestQuantizeFx.test__convert_to_reference_decomposed_fx
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87094
Approved by: https://github.com/andrewor14
Summary: the main problem here was that the various objects defined simply as
'Any' should theoretically be public, but making them public either A) results
in an error about the module being 'typing' rather than whatever module it
should be, or B) requires setting the module manually, thereby changing the
module for the original 'Any' class.
Note: QuantizeHandler has a similar issue, as it is simply defined as 'Any'.
Pattern was defined in multiple places, which was causing issues, so I moved it to a single
place, given that the note at the top of quantization_types.py indicates
these definitions should be moved to utils at some point anyway.
Finally, I changed any references to these objects to point at the
correct locations. Note: I didn't see any fb internal references to
NodePattern or QuantizerCls that would cause issues.
Test Plan: python test/test_public_bindings.py
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86031
Approved by: https://github.com/jerryzh168
Summary: include `torch.qint32` in `activation_is_statically_quantized` and `get_quant_type` so that a fake quantize with `dtype=torch.qint32` won't be skipped
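For example, a qconfig along the lines of the sketch below (scale/zero_point values are illustrative only) is now treated as statically quantized instead of being skipped:
```
import torch
from torch.ao.quantization import QConfig, default_weight_observer
from torch.ao.quantization.observer import FixedQParamsObserver

# Activation observer with dtype=torch.qint32; after this change
# activation_is_statically_quantized/get_quant_type recognize it as static.
qint32_act_qconfig = QConfig(
    activation=FixedQParamsObserver.with_args(
        scale=2 ** -16, zero_point=0, dtype=torch.qint32),
    weight=default_weight_observer,
)
```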
Test Plan: updated `test_custom_module_class`
Differential Revision: D40128178
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86345
Approved by: https://github.com/jerryzh168
Summary:
Before this PR, the `dtype` attribute of observers was not clearly
defined. It originally meant `interface_dtype` in the eager mode
workflow, which is how the codebase before this PR was using it.
In the new reference model spec, the `dtype` attribute of an observer
represents the `dtype` value that needs to be passed into a `quantize`
function. This PR aligns the codebase to this definition of dtype. In detail:
1. change util functions to interpret `dtype` using the reference model definition
2. change `prepare` to interpret `dtype` using the reference model definition
3. change observers for dynamic quantization to interpret `dtype` using the reference model definition
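As a concrete illustration of item 3, a sketch for a dynamic-quantization activation observer (observer and argument names follow the `torch.ao.quantization` observer API; values are illustrative):
```
import torch
from torch.ao.quantization.observer import PlaceholderObserver

# Old interpretation: dtype was the interface dtype seen at the op boundary,
# so a dynamically quantized activation observer carried dtype=torch.float.
old_interpretation = PlaceholderObserver.with_args(
    dtype=torch.float, compute_dtype=torch.quint8)

# New interpretation (reference model spec): dtype is what gets passed to the
# quantize function, so the same observer now carries dtype=torch.quint8.
new_interpretation = PlaceholderObserver.with_args(
    dtype=torch.quint8, compute_dtype=torch.quint8)
```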
A future PR (left out of this one to keep LOC small) will deprecate the
`compute_dtype` field and instead expose `is_dynamic` on observers.
"
Test plan:
```
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
```
Differential Revision: [D39675209](https://our.internmc.facebook.com/intern/diff/D39675209)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85345
Approved by: https://github.com/z-a-f, https://github.com/jerryzh168
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79066
Following https://github.com/pytorch/pytorch/pull/78452,
this commit replaces the following config dicts with python objects:
- prepare_custom_config_dict -> PrepareCustomConfig
- convert_custom_config_dict -> ConvertCustomConfig
- fuse_custom_config_dict -> FuseCustomConfig
This leads to better type safety and better user experience in
notebook settings due to improved auto completion. The new APIs
are as follows:
```
from torch.ao.quantization.fx.custom_config import (
    PrepareCustomConfig,
    ConvertCustomConfig,
)
from torch.ao.quantization.quantize_fx import prepare_fx, convert_fx

prepare_custom_config = PrepareCustomConfig() \
    .set_float_to_observed_mapping(float_class, observed_class) \
    .set_non_traceable_module_names(["mod1", "mod2"]) \
    .set_non_traceable_module_classes([class1, class2]) \
    .set_input_quantized_indexes([0, 1]) \
    .set_output_quantized_indexes([0]) \
    .set_preserved_attributes(["attr1", "attr2"])

convert_custom_config = ConvertCustomConfig() \
    .set_observed_to_quantized_mapping(observed_class, quantized_class) \
    .set_preserved_attributes(["attr1", "attr2"])

model = prepare_fx(
    model,
    qconfig_mapping,
    example_inputs,
    prepare_custom_config=prepare_custom_config)
model(data)
model = convert_fx(model, convert_custom_config=convert_custom_config)
```
For backwards compatibility, prepare_fx, prepare_qat_fx, and
convert_fx will continue to accept Dicts, which will be converted
to the relevant *CustomConfig object internally.
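For instance, a sketch of the backwards-compatibility path (assuming the config classes expose a `from_dict` constructor, which is how the internal conversion is expected to work):
```
from torch.ao.quantization.fx.custom_config import PrepareCustomConfig

# A legacy prepare_custom_config_dict is converted into the new object form.
legacy_dict = {"preserved_attributes": ["attr1", "attr2"]}
prepare_custom_config = PrepareCustomConfig.from_dict(legacy_dict)
assert prepare_custom_config.preserved_attributes == ["attr1", "attr2"]
```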
Note that this commit does not modify existing tests to use the
new API; they will continue to pass in Dicts as before, which still
works but triggers a deprecation warning. This will be handled in
a future commit.
Differential Revision: [D37088095](https://our.internmc.facebook.com/intern/diff/D37088095/)
Approved by: https://github.com/jerryzh168
Summary:
After https://github.com/pytorch/pytorch/pull/77608, `example_inputs` is a required input for `prepare_fx` and `prepare_qat_fx`.
This makes quantizing submodules harder, so we added this utility function to get a dictionary from fqn to submodule example_inputs.
Example Call:
```
example_inputs = (tensor0,)
get_fqn_to_example_inputs(m, example_inputs)
```
Example output:
```
{
"linear1": (tensor1,),
"linear2": (tensor2,),
"sub": (tensor3,),
"sub.linear1": (tensor4,),
...
}
```
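A sketch of a model shape that would produce a mapping like the one above (assuming the helper lives in `torch.ao.quantization.utils`):
```
import torch
from torch.ao.quantization.utils import get_fqn_to_example_inputs

class Sub(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear1 = torch.nn.Linear(5, 5)

    def forward(self, x):
        return self.linear1(x)

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear1 = torch.nn.Linear(5, 5)
        self.linear2 = torch.nn.Linear(5, 5)
        self.sub = Sub()

    def forward(self, x):
        return self.sub(self.linear2(self.linear1(x)))

example_inputs = (torch.randn(1, 5),)
# Keys are fully qualified names; values are the positional args each
# submodule received during a forward pass with example_inputs.
fqn_to_example_inputs = get_fqn_to_example_inputs(M(), example_inputs)
```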
Test Plan:
python test/test_quantization.py TestUtils
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78286
Approved by: https://github.com/dzdang
Summary:
After https://github.com/pytorch/pytorch/pull/77608, `example_inputs` is a required input for `prepare_fx` and `prepare_qat_fx`.
This makes quantizing submodules harder, so we added this utility function to get a dictionary from fqn to submodule example_inputs.
Example Call:
```
example_inputs = (tensor0,)
get_fqn_to_example_inputs(m, example_inputs)
```
Example output:
```
{
"linear1": (tensor1,),
"linear2": (tensor2,),
"sub": (tensor3,),
"sub.linear1": (tensor4,),
...
}
```
Test Plan:
python test/test_quantization.py TestUtils
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78146
Approved by: https://github.com/vkuzo
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74845
This PR adds support for the quantization flow to detect
parametrized modules and match them using their original module types.
This mainly involved using the new type_before_parametrizations function rather than
type to check for module matching.
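A sketch of why the original type check breaks (this assumes `type_before_parametrizations` is importable from `torch.nn.utils.parametrize`):
```
import torch
from torch.nn.utils import parametrize
from torch.nn.utils.parametrize import type_before_parametrizations

class Identity(torch.nn.Module):
    def forward(self, w):
        return w

linear = torch.nn.Linear(5, 5)
parametrize.register_parametrization(linear, "weight", Identity())

# type() now reports a dynamically generated Parametrized* class, which would
# not match nn.Linear in the quantization pattern tables...
print(type(linear))
# ...while type_before_parametrizations recovers the original class.
assert type_before_parametrizations(linear) is torch.nn.Linear
```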
Test Plan:
python test/test_ao_sparsity.py TestComposability
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D35240274
fbshipit-source-id: 7294d89c9c2e069e51d8b9bafa45c15f92bed124
(cherry picked from commit ed5cdb7b636c42e040d1b4a67b6b94604d06e1ff)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74717
Currently the weight maps to 0 and max_float to 65535 due to incorrect qmin/qmax in the customized qint16 qrange.
The expectation from the set observers is that the integer representation is a signed int16, i.e. -32768 to 32767.
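For reference, a sketch of an observer configured with the intended signed int16 range (this assumes an observer with a customized quantization range on a wider storage dtype; values follow directly from the int16 bounds stated above):
```
import torch
from torch.ao.quantization.observer import MinMaxObserver

# Signed int16: qmin = -2**15 = -32768, qmax = 2**15 - 1 = 32767.
int16_range_observer = MinMaxObserver.with_args(
    dtype=torch.qint32,       # storage dtype; the range below customizes it
    quant_min=-(2 ** 15),
    quant_max=2 ** 15 - 1,
)
```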
Test Plan: Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D35129924
fbshipit-source-id: 924902dd7e64c1218971422ba2451c2a484fd2f4
(cherry picked from commit 95659cdeeec7b3a01a64355244847e211c6dd2a6)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74581
As titled: currently the quant_min/quant_max of the FakeQuantize are not propagated to the observer. We plan to populate them when both are not None.
To do this we need to:
1. Remove the current default quant_min/quant_max values (0/255), as they are not universal across dtypes.
2. Move the upper bound/lower bound check to before the observer is created.
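A sketch of the resulting behavior (values illustrative): when both bounds are given on the fake quantize, they are forwarded to the observer it constructs.
```
import torch
from torch.ao.quantization import FakeQuantize, MovingAverageMinMaxObserver

# quant_min/quant_max specified on FakeQuantize are passed through to the
# underlying observer instead of the observer keeping its 0/255 default.
act_fake_quant = FakeQuantize.with_args(
    observer=MovingAverageMinMaxObserver,
    quant_min=0,
    quant_max=127,
    dtype=torch.quint8,
)()
assert act_fake_quant.activation_post_process.quant_max == 127
```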
Test Plan:
```
[jiaxuzhu@devvm3400.frc0 /data/users/jiaxuzhu/fbsource/fbcode] buck test mode/dev //caffe2/test:quantization -- --exact 'caffe2/test:quantization - test_quant_min_max_override (quantization.core.test_workflow_module.TestFakeQuantize)'
Parsing buck files: finished in 0.8 sec
Downloaded 0/2 artifacts, 0.00 bytes, 100.0% cache miss (for updated rules)
Building: finished in 9.5 sec (100%) 18535/84579 jobs, 2/84579 updated
Total time: 10.3 sec
More details at https://www.internalfb.com/intern/buck/build/1cab97ef-0788-4d06-92ed-a828995e3bde
BUILD SUCCEEDED
Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details.
Running with tpx session id: 24be645e-eebc-45d6-8111-052ef1225fa0
Trace available for this run at /tmp/tpx-20220323-094106.724238-24be645e-eebc-45d6-8111-052ef1225fa0/trace.log
RemoteExecution session id: reSessionID-24be645e-eebc-45d6-8111-052ef1225fa0-tpx
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/5066549674998735
✓ ListingSuccess: caffe2/test:quantization : 483 tests discovered (20.179)
✓ Pass: caffe2/test:quantization - test_quant_min_max_override (quantization.core.test_workflow_module.TestFakeQuantize) (18.896)
Summary
Pass: 1
ListingSuccess: 1
If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/5066549674998735
```
Reviewed By: jerryzh168
Differential Revision: D34971236
fbshipit-source-id: 4407fd03116a296053256b333f7ce6d28dcc9c42
(cherry picked from commit f6980bccea802f220cc5b6dfe1bf3a3a3eef0a34)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73863
This PR fully aligns the convert function with the design: https://github.com/pytorch/rfcs/blob/master/RFC-0019-Extending-PyTorch-Quantization-to-Custom-Backends.md
and simplifies the implementation of the convert function by always producing a reference quantized model (with reference patterns) first,
and then lowering the model to a quantized model that is runnable with the PyTorch native backends (fbgemm/qnnpack).
This PR makes convert.py much easier to understand than the previous implementation, and we are able to remove the majority of the code
in quantization_patterns.py as well (in follow-up PRs).
Test Plan:
```
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
python test/test_quantization.py TestFXNumericSuiteCoreAPIs
python test/test_quantization.py TestFXNumericSuiteCoreAPIsModels
```
and other internal/oss regression tests
Imported from OSS
Reviewed By: andrewor14
Differential Revision: D34778506
fbshipit-source-id: 0678b66addf736039a8749b352f6f569caca962b
(cherry picked from commit 33ec9caf23f3ab373d827117efbd9db0668b2437)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73493
This PR enables basic support for reference modules in DBR quant.
For now, the support is limited to:
1. modules that have reference versions defined only (no functions)
2. torch.qint32 dtype only
Currently, the reference module logic is enabled whenever dtype is
torch.qint32. This is done because this is needed the earliest for
the first use case. A future PR will support more dtypes and also
add the `is_reference` flag to the API.
Test Plan:
```
python test/test_quantization.py TestQuantizeDBR.test_conv_int32_reference_model
```
Reviewed By: jerryzh168
Differential Revision: D34520759
Pulled By: vkuzo
fbshipit-source-id: 363db715315c5c7c20962a1818330ce288948778
(cherry picked from commit 6ccdfe2889c252211f191edc49f4147f66e803a4)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73344
Not user facing as of now, since we haven't advertised the backend_config_dict API.
We need this in fuser_method_mapping.py; this is to avoid a circular dependency.
Test Plan:
python test/test_quantization.py TestQuantizeFx
Imported from OSS
Reviewed By: VitalyFedyunin
Differential Revision: D34441778
fbshipit-source-id: 7a01c359e4b21e9e98345dc7781f735628209a20
(cherry picked from commit 758537094c5a98a17a8825b3f240c8d5acdd72b0)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71168
In this PR we want to enable the reference path by default for CopyNodeQuantizeHandler
Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
Imported from OSS
Reviewed By: andrewor14
Differential Revision: D33715995
fbshipit-source-id: eda44892fcea3a1cba54ac75bc020f73e1becc8c
(cherry picked from commit a2cf63f68d)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70257
Makes dynamic quantization for linear module work in DBR quant.
Coverage for more ops and functionals will be in future PRs.
Test Plan:
```
python test/test_quantization.py -k DBR
```
Reviewed By: jerryzh168
Differential Revision: D33262300
Pulled By: vkuzo
fbshipit-source-id: c1cb0f9dd3f42216ad6ba19f4222b171ff170174
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69720
This function is also useful for DBR quant, moving it from FX utils
to common utils.
Test Plan:
```
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeDBR
```
Reviewed By: jerryzh168
Differential Revision: D33003473
Pulled By: vkuzo
fbshipit-source-id: 20360682c69d614a645c14fc29d3ee023d6b2623
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68770
Previous fusion only works for a sequence of ops, which is not general enough for fusion patterns
that are defined by a subgraph; this PR refactors fusion to make it more general.
Test Plan:
```
python test/test_quantization.py TestFuseFx
```
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D32602637
fbshipit-source-id: a7897c62081b9d71c67fb56e78484cf68deaacf6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68769
As titled; we want to use this type in fuser_method_mapping in later PRs.
Test Plan:
No change to logic; just regression tests on CI.
```
python test/test_quantization.py
```
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D32602636
fbshipit-source-id: 15b95241431dfca9b1088d0920bf75705b37aa9a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64919
AO Team is migrating the existing torch.quantization into torch.ao.quantization. We are doing it one file at a time to make sure that the internal callsites are updated properly. This migrates the quantization utilities.
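The practical effect on imports (a sketch; the old `torch.quantization.utils` module is kept as a thin forwarding wrapper so existing callsites keep working):
```
# New canonical location after the migration.
from torch.ao.quantization import utils as ao_quant_utils

# Old location still imports and forwards to the torch.ao namespace.
from torch.quantization import utils as legacy_quant_utils
```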
ghstack-source-id: 138303325
Test Plan: `buck test mode/dev //caffe2/test:quantization`
Reviewed By: jerryzh168
Differential Revision: D30899082
fbshipit-source-id: 85eb38c419e417147e71758b682cd095308dd0c9