Summary: The existing BackendConfig fusion pattern
uses a "reversed nested tuple" format that is highly
unintuitive. For example,
```
linear-relu -> (nn.ReLU, nn.Linear)
conv-bn-relu -> (nn.ReLU, (nn.BatchNorm2d, nn.Conv2d))
```
This pattern format also complicates the signatures
of the user-specified "fuser methods", which needed
to accept arguments in reverse nested order to match
the patterns:
```
def fuse_linear_relu(is_qat, relu, linear):
    ...

def fuse_conv_bn_relu(is_qat, relu, bn_conv):
    (bn, conv) = bn_conv
    ...
```
Instead, this commit introduces a new pattern format that
simply specifies the ops in forward order with no nesting:
```
linear-relu -> (nn.Linear, nn.ReLU)
conv-bn-relu -> (nn.Conv2d, nn.BatchNorm2d, nn.ReLU)
def fuse_linear_relu(is_qat, linear, relu):
    ...

def fuse_conv_bn_relu(is_qat, conv, bn, relu):
    ...
```
Note that the legacy "reversed nested tuple" format is still
used internally, since it is more general. In the
future, we should replace it with the format used in
the subgraph rewriter in `torch.fx`, and simplify the
existing pattern matching code to handle the new
format added in this commit.
BC-breaking Notes:
Before:
```
import torch.nn as nn
import torch.ao.nn.intrinsic as nni
from torch.ao.quantization.backend_config import BackendPatternConfig

def fuse_conv_bn_relu(is_qat, relu, bn_conv):
    (bn, conv) = bn_conv
    return nni.ConvBnReLU2d(conv, bn, relu)

config = BackendPatternConfig((nn.ReLU, (nn.BatchNorm2d, nn.Conv2d))) \
    .set_dtype_configs(...) \
    .set_fuser_method(fuse_conv_bn_relu) \
    .set_fused_module(nni.ConvBnReLU2d)
```
After:
```
def fuse_conv_bn_relu(is_qat, conv, bn, relu):
    return nni.ConvBnReLU2d(conv, bn, relu)

config = BackendPatternConfig((nn.Conv2d, nn.BatchNorm2d, nn.ReLU)) \
    .set_dtype_configs(...) \
    .set_fuser_method(fuse_conv_bn_relu) \
    .set_fused_module(nni.ConvBnReLU2d)
```
OR (for backward-compatibility)
```
def fuse_conv_bn_relu(is_qat, relu, bn_conv):
    (bn, conv) = bn_conv
    return nni.ConvBnReLU2d(conv, bn, relu)

config = BackendPatternConfig() \
    ._set_pattern_complex_format((nn.ReLU, (nn.BatchNorm2d, nn.Conv2d))) \
    .set_dtype_configs(...) \
    .set_fuser_method(fuse_conv_bn_relu) \
    .set_fused_module(nni.ConvBnReLU2d) \
    ._set_use_legacy_pattern_format(True)
```
Before:
```
backend_config.configs # returns Dict[Pattern, BackendPatternConfig]
```
After:
```
backend_config.configs # returns List[BackendPatternConfig]
```
Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
python test/test_quantization.py TestBackendConfig
Reviewers: jerryzh168, vkuzo
Subscribers: jerryzh168, vkuzo
Differential Revision: [D41954553](https://our.internmc.facebook.com/intern/diff/D41954553)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90698
Approved by: https://github.com/vkuzo, https://github.com/jerryzh168
Summary: Previously we explicitly set a qconfig for ops
like conv and linear in the default QConfigMapping. However,
this made it difficult for users to override the global QConfig and
have the new global take effect for basic ops. This commit
removes these explicit settings so the user can simply run
the following to quantize these ops.
```
qconfig_mapping = get_default_qconfig_mapping()
qconfig_mapping.set_global(my_qconfig)
```
There is no change in behavior for the default use case
of not setting anything on the default QConfigMapping.
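For concreteness, a minimal end-to-end sketch of the override workflow
(the model and backend choice here are illustrative):
```
import torch
from torch.ao.quantization import get_default_qconfig, get_default_qconfig_mapping
from torch.ao.quantization.quantize_fx import prepare_fx

model = torch.nn.Sequential(torch.nn.Linear(4, 4)).eval()
qconfig_mapping = get_default_qconfig_mapping()
# The override now applies to conv/linear too, since they no longer carry
# explicit per-op settings in the default mapping
qconfig_mapping.set_global(get_default_qconfig("fbgemm"))
example_inputs = (torch.randn(1, 4),)
prepared = prepare_fx(model, qconfig_mapping, example_inputs)
```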
Test Plan:
python test/test_quantization.py TestQuantizeFx.test_default_qconfig_mapping_override_global
Reviewers: vkuzo, jerryzh168
Subscribers: vkuzo, jerryzh168
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90066
Approved by: https://github.com/vkuzo, https://github.com/jerryzh168
Summary:
This PR adds support for matching patterns whose ops have multiple arguments; this is needed for quantization in the PyTorch 2.0 early prototype.
Before this PR, we only support patterns like:
```
x -> conv -> bn -> relu
(relu, (bn, conv))
```
where each operator takes a single input; the code broke when we wanted to match a pattern containing an op with multiple arguments, such as:
```
        shape \
transpose -> reshape -> output
```
where `reshape` has two arguments.
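For illustration, a small traced function that produces such a graph
(a sketch; the function and its names are mine, not from the PR):
```
import torch
import torch.fx

def f(x):
    y = x.transpose(1, 2)
    # reshape consumes both the transpose result and a shape derived from it,
    # so the matched reshape node has multiple arguments
    return y.reshape(y.shape[0], -1)

gm = torch.fx.symbolic_trace(f)
print(gm.graph)
```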
Test Plan:
python test/test_quantization.py TestQuantizeFx.test_match_pattern_with_multiple_args
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89986
Approved by: https://github.com/vkuzo
Summary: This commit renames fx/quantization_patterns.py
to fx/quantize_handler.py, and fx/fusion_patterns.py to
fx/fuse_handler.py. This is because these files contain
only QuantizeHandler and FuseHandler respectively, so the
new names are more descriptive. A future commit will
further break BC by removing all the empty *QuantizeHandler
classes.
BC-breaking notes:
The following classes under the
`torch.ao.quantization.fx.quantization_patterns` namespace
are migrated to the `torch.ao.quantization.fx.quantize_handler`
namespace:
```
QuantizeHandler
BinaryOpQuantizeHandler
CatQuantizeHandler
ConvReluQuantizeHandler
LinearReLUQuantizeHandler
BatchNormQuantizeHandler
EmbeddingQuantizeHandler
RNNDynamicQuantizeHandler
DefaultNodeQuantizeHandler
FixedQParamsOpQuantizeHandler
CopyNodeQuantizeHandler
GeneralTensorShapeOpQuantizeHandler
CustomModuleQuantizeHandler
StandaloneModuleQuantizeHandler
```
The following classes under the
`torch.ao.quantization.fx.fusion_patterns` namespace are
migrated to the `torch.ao.quantization.fx.fuse_handler`
namespace:
```
DefaultFuseHandler
FuseHandler
```
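For downstream code that imported these private classes, the migration is
a one-line import change, e.g. (sketch):
```
# Before:
# from torch.ao.quantization.fx.quantization_patterns import QuantizeHandler
# from torch.ao.quantization.fx.fusion_patterns import FuseHandler
# After:
from torch.ao.quantization.fx.quantize_handler import QuantizeHandler
from torch.ao.quantization.fx.fuse_handler import FuseHandler
```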
Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
Reviewers: jerryzh168, vkuzo
Subscribers: jerryzh168, vkuzo
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89872
Approved by: https://github.com/jerryzh168
Preparation for the next PR in this stack: #89559.
I replaced
- `self.assertTrue(torch.equal(...))` with `self.assertEqual(..., rtol=0, atol=0, exact_device=True)`,
- the same for `self.assertFalse(...)` with `self.assertNotEqual(...)`, and
- `assert torch.equal(...)` with `torch.testing.assert_close(..., rtol=0, atol=0)` (note that we don't need to set `check_device=True` here since that is the default).
There were a few instances where the result of `torch.equal` is used directly. In those cases, I've replaced it with `(... == ...).all().item()`, sometimes also dropping the `.item()` depending on the context.
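Putting the replacements together, a small self-contained sketch (using the
internal `TestCase`, whose `assertEqual` accepts `rtol`/`atol`/`exact_device`):
```
import torch
from torch.testing._internal.common_utils import TestCase

class ExampleTest(TestCase):
    def test_exact_equality(self):
        t1, t2 = torch.zeros(3), torch.zeros(3)
        # was: self.assertTrue(torch.equal(t1, t2))
        self.assertEqual(t1, t2, rtol=0, atol=0, exact_device=True)
        # was: assert torch.equal(t1, t2)
        torch.testing.assert_close(t1, t2, rtol=0, atol=0)
        # was: if torch.equal(t1, t2): ...
        assert (t1 == t2).all().item()
```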
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89527
Approved by: https://github.com/mruberry
Summary:
As titled: after this PR we can produce `quantize_per_channel` and `dequantize_per_channel` ops (typically used for quantizing weights)
in the reference flow using the decomposed tensor representation.
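A rough sketch of the resulting decomposed pattern for a per-channel
quantized weight (the `quantized_decomposed` op signatures here are my
assumption of the registered schema; values are illustrative):
```
import torch
import torch.ao.quantization.fx._decomposed  # noqa: F401, registers quantized_decomposed ops (assumption)

w = torch.randn(8, 4)
scales = torch.full((8,), 0.1)
zero_points = torch.zeros(8, dtype=torch.int64)
# quantize_per_channel(input, scales, zero_points, axis, quant_min, quant_max, dtype)
w_int = torch.ops.quantized_decomposed.quantize_per_channel(
    w, scales, zero_points, 0, -128, 127, torch.int8)
w_dq = torch.ops.quantized_decomposed.dequantize_per_channel(
    w_int, scales, zero_points, 0, -128, 127, torch.int8)
```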
Test Plan:
python test/test_quantization.py -k test__convert_to_reference_decomposed_fx_per_channel_quant
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89270
Approved by: https://github.com/vkuzo
Summary:
Split the `is_decomposed` logic for `_replace_observer_with_quantize_dequantize_node` into a separate function and added support for dynamic quantization in the decomposed version of this function.
In the case of dynamic quantization, we'll produce the following reference quantized pattern in decomposed mode:
```
x -> choose_qparams -> quantize_per_tensor -> dequantize_per_tensor -> linear
```
Test Plan:
python test/test_quantization.py -k test__convert_to_reference_decomposed_fx_dynamic_quant
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89248
Approved by: https://github.com/vkuzo
Summary: In both eager and FX graph mode quantization,
`torch.ao.nn.quantizable.LSTM` is used as an observed custom module,
which is responsible for inserting its own observers. By default,
the user specifies a single QConfig for the custom module (either
through QConfigMapping or by setting the "qconfig" attribute),
and all inner ops will [inherit this
QConfig](dc00bb51b8/torch/ao/nn/quantizable/modules/rnn.py (L366-L378))
and use the same observer/fake_quantize constructors.
Today, users who wish to override this behavior must extend
`torch.ao.nn.quantizable.LSTM` and write a lot of custom code
to manually assign the QConfigs to the inner ops. This commit
alleviates this burden on the user by providing a helper function
to assign QConfigs with custom observers. An example use case of
this is providing a reference implementation for a backend kernel
that hardcodes qparams for efficiency.
Example usage:
```
import torch
from torch.ao.quantization import get_default_qconfig_mapping
from torch.ao.quantization.observer import FixedQParamsObserver
from torch.ao.quantization.quantize_fx import prepare_fx, convert_fx
from torch.ao.quantization.fx.custom_config import (
    PrepareCustomConfig,
    ConvertCustomConfig,
)

class MyModel(torch.nn.Module):
    ...

class UserLSTM(torch.ao.nn.quantizable.LSTM):
    @classmethod
    def from_float(cls, other):
        assert isinstance(other, cls._FLOAT_MODULE)
        linear_output_obs_ctr = FixedQParamsObserver.with_args(
            scale=2 ** -11, zero_point=2 ** 15, dtype=torch.qint32)
        sigmoid_obs_ctr = FixedQParamsObserver.with_args(
            scale=2 ** -16, zero_point=0, dtype=torch.qint32)
        tanh_obs_ctr = FixedQParamsObserver.with_args(
            scale=2 ** -15, zero_point=2 ** 15, dtype=torch.qint32)
        cell_state_obs_ctr = FixedQParamsObserver.with_args(
            scale=2 ** -11, zero_point=0, dtype=torch.qint32)
        hidden_state_obs_ctr = FixedQParamsObserver.with_args(
            scale=2 ** -7, zero_point=2 ** 7, dtype=torch.quint8)
        return torch.ao.quantization.utils._get_lstm_with_individually_observed_parts(
            float_lstm=other,
            linear_output_obs_ctr=linear_output_obs_ctr,
            sigmoid_obs_ctr=sigmoid_obs_ctr,
            tanh_obs_ctr=tanh_obs_ctr,
            cell_state_obs_ctr=cell_state_obs_ctr,
            hidden_state_obs_ctr=hidden_state_obs_ctr,
        )

qconfig_mapping = get_default_qconfig_mapping()
example_inputs = (torch.rand(5, 3, 50), torch.rand(1, 3, 50), torch.randn(1, 3, 50))
prepare_custom_config = PrepareCustomConfig() \
    .set_float_to_observed_mapping(torch.nn.LSTM, UserLSTM)
convert_custom_config = ConvertCustomConfig() \
    .set_observed_to_quantized_mapping(UserLSTM, torch.ao.nn.quantized.LSTM)
model = MyModel()
model = prepare_fx(model, qconfig_mapping, example_inputs, prepare_custom_config=prepare_custom_config)
model(*example_inputs)  # calibrate
model = convert_fx(model, convert_custom_config=convert_custom_config)
model(*example_inputs)
```
Test Plan:
python test/test_quantization.py TestQuantizeFx.test_static_lstm_with_custom_fixed_qparams
Reviewers: jerryzh168, vkuzo
Subscribers: jerryzh168, vkuzo
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88456
Approved by: https://github.com/jerryzh168, https://github.com/vkuzo
Summary: The same function existed in both observer and quantize;
this consolidates it into a single function. Note the two definitions
were slightly different; I've changed the consolidated definition to be
maximally inclusive so that the name of the function is more accurate.
Test Plan: python test/test_public_bindings.py
python test/test_quantization.py
Reviewers:
Subscribers:
Tasks:
Tags:
Differential Revision: [D40709276](https://our.internmc.facebook.com/intern/diff/D40709276)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87520
Approved by: https://github.com/jcaip
Summary: When the BackendConfig was first introduced,
`overwrite_output_observer` and `overwrite_output_fake_quantize`
were added to ensure fixed qparams ops like `torch.nn.Sigmoid`
and `torch.nn.Tanh` used the correct observers and fake quantizes.
However, this is hacky because the BackendConfig should not set
the observer constructors themselves, but should instead specify
only requirements on the observers.
Later, https://github.com/pytorch/pytorch/pull/80184 added the
correct observers to `get_default_qconfig_mapping` along with
validation logic that throws an error if incorrect observers
were specified. With this change, we no longer need to overwrite
the observers from the BackendConfig, since we expect the user to
pass in the correct observers for these ops.
This commit removes these overwrite observer settings in the
BackendConfig. Instead, we represent the observer constraints for
fixed qparams ops through the existing DTypeWithConstraints
mechanism. Note that, however, to be consistent with other
DTypeWithConstraints checks, we no longer throw an error if an
incorrect observer is specified, but simply ignore the offending
QConfig and log a warning instead. This is the BC-breaking part
of the change.
BC-breaking notes:
```
from torch.ao.quantization import QConfigMapping
from torch.ao.quantization.qconfig import default_qconfig
from torch.ao.quantization.quantize_fx import prepare_fx
model = ModelWithFixedQParamsOps()
qconfig_mapping = QConfigMapping().set_global(default_qconfig)
example_inputs = ...
prepare_fx(model, qconfig_mapping, example_inputs)
```
Before this commit, running the above leads to an exception
because the wrong observers are used for fixed qparams ops.
After this commit, the above will only encounter a warning,
and the fixed qparams ops will not be quantized. In both cases,
switching to `get_default_qconfig_mapping` will cause the
fixed qparams ops to be quantized.
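A minimal sketch of the recommended path after this change (the model
here is an illustrative stand-in for `ModelWithFixedQParamsOps`):
```
import torch
from torch.ao.quantization import get_default_qconfig_mapping
from torch.ao.quantization.quantize_fx import prepare_fx

model = torch.nn.Sequential(torch.nn.Linear(4, 4), torch.nn.Sigmoid()).eval()
example_inputs = (torch.randn(1, 4),)
# The default mapping carries the fixed qparams observers for ops like
# sigmoid/tanh, so those ops are quantized instead of skipped with a warning
prepared = prepare_fx(model, get_default_qconfig_mapping(), example_inputs)
```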
Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
Reviewers: jerryzh168, vkuzo
Subscribers: jerryzh168, vkuzo
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88620
Approved by: https://github.com/jerryzh168
## Description
Support lowering of channel shuffle in FX by adding its module and functional op to `is_copy_node` list in `torch/ao/quantization/fx/_lower_to_native_backend.py`
## Validation
UTs added to test
- correctness of quantized `ChannelShuffle` module.
- FX lowering of `ChannelShuffle` module and functional `channel_shuffle`.
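For reference, a quick eager-mode sketch of the copy-node property
(quantized tensors pass through `ChannelShuffle` with scale/zero_point
unchanged; values are illustrative):
```
import torch

m = torch.nn.ChannelShuffle(2)
xq = torch.quantize_per_tensor(torch.randn(1, 4, 2, 2), scale=0.1, zero_point=0, dtype=torch.quint8)
yq = m(xq)
assert yq.q_scale() == xq.q_scale() and yq.q_zero_point() == xq.q_zero_point()
```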
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83731
Approved by: https://github.com/jerryzh168
Summary:
Previously we hardcoded the supported observers for fixed qparams ops. This PR changes that to take the information from the BackendConfig,
which allows users to customize the support for fixed qparams ops.
Test Plan:
python test/test_quantization.py TestQuantizeFx.test_change_backend_config_for_fixed_qparam_ops
Reviewers:
Subscribers:
Tasks:
Tags:
Unlinked from the internal diff since it's too hard to land.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87425
Approved by: https://github.com/andrewor14
Summary:
`_convert_to_reference_decomposed` is a private convert function in the FX graph mode quantization flow that converts
a calibrated/trained model to a reference quantized model with decomposed quantized tensor representations.
Test Plan:
python test/test_quantization.py TestQuantizeFx.test__convert_to_reference_decomposed_fx
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87094
Approved by: https://github.com/andrewor14
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86259
Add an assertion to make sure the backend is one of "fbgemm", "x86", "qnnpack", and "onednn"
for `get_default_qconfig`, `get_default_qat_qconfig`, `get_default_qconfig_mapping`, and `get_default_qat_qconfig_mapping`.
Test Plan:
python test/test_quantization.py -k test_get_default_qconfig_mapping
Imported from OSS
Reviewed By: jcaip
Differential Revision: D40236474
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87331
Approved by: https://github.com/andrewor14
Summary: Today, in order to get XNNPACK quantized ops to work,
the user must write some code that refers to private data
structures (`_FIXED_QPARAMS_OP_TO_OBSERVER`) to create a
QConfigMapping that is compatible with the symmetric constraints
in the QNNPACK BackendConfig. This is because
`get_default_qconfig("qnnpack")` produces a QConfig that does
not satisfy these constraints, and the default QConfigMapping
for QNNPACK uses this QConfig.
Instead, we simply put this code into a helper function to make
it easier for the user to run XNNPACK quantized ops. In the
future, once there is feature parity between the set of ops
supported by QNNPACK and XNNPACK, we should revisit whether
to simply change `get_default_qconfig("qnnpack")` to return
an XNNPACK-compatible QConfig.
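Usage of the new helper looks roughly like this (the helper is private,
and its name here is per my recollection, so treat it as an assumption):
```
from torch.ao.quantization.qconfig_mapping import _get_symmetric_qnnpack_qconfig_mapping

# replaces manual construction from _FIXED_QPARAMS_OP_TO_OBSERVER
qconfig_mapping = _get_symmetric_qnnpack_qconfig_mapping()
```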
Test Plan:
python test/test_quantization.py
TestQuantizeFx.test_symmetric_qnnpack_qconfig_mapping
Reviewers: jerryzh168, vkuzo
Subscribers: jerryzh168, vkuzo
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87002
Approved by: https://github.com/vkuzo
Summary:
This PR adds checks for the existence of "weight_dtype" and "bias_dtype" in the node_name_to_dtype dictionary before accessing it.
The corner case is hit when we check the compatibility of qconfig and backend_config for a weight and bias that appear before the activation in the argument list (e.g. `torch.addmm`).
Test Plan:
python test/test_quantization.py -k test_backend_config_check_for_weight_and_bias
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86719
Approved by: https://github.com/andrewor14
Summary:
Previously the call failed because there was an infinite loop in `_get_share_qparams_ops_configs`.
Test Plan:
python test/test_quantization.py -k test_get_executorch_backend_config
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86338
Approved by: https://github.com/andrewor14
Summary: include `torch.qint32` in `activation_is_statically_quantized` and `get_quant_type` so that fake quantize with `dtype=torch.qint32` won't be skipped.
Test Plan: updated `test_custom_module_class`
Differential Revision: D40128178
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86345
Approved by: https://github.com/jerryzh168
**Summary:** This commit enforces the following constraints on the
QNNPACK BackendConfig:
- `quant_min_lower_bound` = -127 for qint8 weight
- `quant_max_upper_bound` = 127 for qint8 weight
- `scale_min_lower_bound` = 2 ** -12 for qint8 activations and weight
These constraints will enable users to use this BackendConfig with
faster XNNPACK quantized ops. They are also consistent with the
existing settings in `default_symmetric_qnnpack_qconfig` and its
per_channel and QAT variants. For more detail on why these exact
values were chosen, please see the description of
https://github.com/pytorch/pytorch/pull/74396.
Note that there are currently no restrictions on the qscheme in
DTypeConfig. This should be added in the future to further enforce
the restriction that the weights must be quantized with either
per_tensor_symmetric or per_channel_symmetric.
Existing default QConfigs such as `get_default_qconfig("qnnpack")`
and `get_default_qat_qconfig("qnnpack")` will continue to be
supported, but only for the existing dtypes, e.g. quint8 activations
for weighted ops like linear and conv. In the future, we should
revisit whether to enable XNNPACK ops using these QConfigs as well.
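As a point of reference, a QConfigMapping built from the existing
symmetric QConfig already satisfies these bounds (a sketch):
```
from torch.ao.quantization import QConfigMapping
from torch.ao.quantization.qconfig import default_symmetric_qnnpack_qconfig

# qint8 weights with quant range [-127, 127] and eps >= 2 ** -12
qconfig_mapping = QConfigMapping().set_global(default_symmetric_qnnpack_qconfig)
```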
**Test Plan:**
python test/test_quantization.py TestQuantizeFx.test_qnnpack_backend_config
**Reviewers:** jerryzh168, vkuzo
**Subscribers:** jerryzh168, vkuzo
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85863
Approved by: https://github.com/jerryzh168
**Summary:** This commit adds the following constraints to
BackendConfig:
```
quant_min_lower_bound
quant_max_upper_bound
scale_min_lower_bound
scale_max_upper_bound
```
This is motivated by QNNPACK constraints on qint8 weight
values and the min scale value. Actually enforcing these
constraints in the QNNPACK BackendConfig will follow in a
future commit.
Today, users can also specify the above constraints through
QConfigs, and these settings may not necessarily match the
ones specified in the BackendConfig. In this case, we will
handle the discrepancy as follows:
(1) Require QConfig quant ranges to fall within the backend's
(2) Require QConfig min scale value (eps) >= backend's
(3) Require QConfig to specify quant range if the backend
specified one
(4) Require QConfig to specify min scale value (eps) if the
backend specified one
Public API changes:
* Previous API, still supported after this commit:
```
dtype_config = DTypeConfig(
    input_dtype=torch.quint8,
    output_dtype=torch.quint8,
    weight_dtype=torch.qint8,
    bias_dtype=torch.float,
)
```
* New API:
```
dtype_config = DTypeConfig(
    input_dtype=DTypeWithConstraints(
        dtype=torch.quint8,
        quant_min_lower_bound=0,
        quant_max_upper_bound=127,
        scale_min_lower_bound=2 ** -12,
    ),
    output_dtype=DTypeWithConstraints(
        dtype=torch.quint8,
        quant_min_lower_bound=0,
        quant_max_upper_bound=127,
        scale_min_lower_bound=2 ** -12,
    ),
    weight_dtype=DTypeWithConstraints(
        dtype=torch.qint8,
        quant_min_lower_bound=-128,
        quant_max_upper_bound=127,
        scale_min_lower_bound=2 ** -12,
    ),
    bias_dtype=torch.float,
)
```
* Additionally, the following `DTypeConfig` attributes
have new types with helper getters:
```
# These have type DTypeWithConstraints
dtype_config.input_dtype
dtype_config.output_dtype
dtype_config.weight_dtype
# These return Optional[torch.dtype]
dtype_config.get_input_dtype()
dtype_config.get_output_dtype()
dtype_config.get_weight_dtype()
```
Note that scale_max is currently not used because there is
no existing mechanism to enforce this on the observer. In the
future, we can validate this as well if there is a use case.
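A short sketch exercising the new helpers (values are illustrative):
```
import torch
from torch.ao.quantization.backend_config import DTypeConfig, DTypeWithConstraints

dtype_config = DTypeConfig(
    input_dtype=DTypeWithConstraints(
        dtype=torch.quint8,
        scale_min_lower_bound=2 ** -12,
    ),
    output_dtype=torch.quint8,  # plain dtypes are still accepted
    weight_dtype=torch.qint8,
)
assert dtype_config.get_input_dtype() == torch.quint8  # unwraps to Optional[torch.dtype]
assert dtype_config.input_dtype.scale_min_lower_bound == 2 ** -12
```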
**Test Plan:**
python test/test_quantization.py
TestBackendConfig.test_dtype_with_constraints
python test/test_quantization.py
TestQuantizeFx.test_backend_config_scale_min
python test/test_quantization.py
TestQuantizeFx.test_backend_config_quantization_range
**Reviewers:** jerryzh168, vkuzo
**Subscribers:** jerryzh168, vkuzo
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85200
Approved by: https://github.com/jerryzh168
**Summary:** This commit enables the custom module LSTM path for
FX graph mode static quantization. This has the same flow as eager
mode, which was already previously supported:
```
torch.nn.LSTM
| (prepare_fx)
v
torch.ao.nn.quantizable.LSTM
| (convert_fx)
v
torch.ao.nn.quantized.LSTM
```
The main reason why custom module LSTM is not supported in FX
graph mode quantization today is because its inputs and outputs
are nested tuples, and existing constructs such as observers,
"quantize" nodes, and "dequantize" nodes do not understand how
to handle complex structures.
Note that the approach taken in this commit is only intended to
be a short-term solution highly tailored to the input and output
formats of custom module LSTM. In the future, for the longer-term
solution, we should design a more general QConfig that allows users
to specify complex input and output formats, and enable FX graph
mode quantization to understand arbitrary nested structures and
automatically infer how to transform the graph accordingly.
**Context:**
Today, in FX graph mode static quantization, custom modules are
assumed to have quantized inputs and quantized outputs, with the
exact dtypes derived from the associated QConfig (default quint8).
Since custom modules are currently not handled through the reference
model flow, their observer replacement logic are a little different
from normal operators:
```
# (1) Original model
input -> custom_module -> output
# (2) Observed model (after prepare)
input -> obs0 -> custom_module -> obs1 -> output
# (3) Quantized model (after convert)
input -> quant -> quantized_custom_module -> dequant -> output
```
In the last step, input observers are replaced with "quantize"
and output observers are replaced with "dequantize", in contrast
to other non-custom-module patterns where observers are replaced
with "quantize-dequantize" pairs instead. Note that, conceptually,
the output observer `obs1` is really just a DeQuantStub, since no
observation is actually needed.
**Custom module LSTM:**
The reason why custom module LSTM cannot be handled in the same
way is because, unlike other custom modules, its inputs and outputs
are nested tuples instead of single tensors. This is how the existing
custom module code would try to handle LSTMs:
```
# (1) Original model
# input format: (input, (hidden0, hidden1))
# output format: (output, (hidden0, hidden1))
input -> lstm -> output
hidden0 -/ \-> hidden0
hidden1 -/ \-> hidden1
# (2) Observed model (after prepare)
input -> obs0 -> lstm -> obs1 # fails
hidden0 -/ # missing observer
hidden1 -/ # missing observer
```
However, this fails today because 1) we assume there is only one input
to the custom module, and so we never end up quantizing `hidden0` and
`hidden1`, and 2) the output observer `obs1` is fed a tuple, which it
does not understand how to handle.
**Short-term fix:**
This commit addresses the above by specifically handling the input
and output structures used by custom module LSTM. For the inputs,
we manually insert observers for `hidden0` and `hidden1` to ensure
all input tensors are quantized.
For the outputs, we split the tuple into its internal nodes, attach
a DeQuantStub to each node, and recombine these DeQuantStubs
according to the original structure. Finally, we must also reroute
consumers of the original LSTM tuple (and its internal nodes, e.g.
`lstm[0]`) to these DeQuantStubs:
```
# (1) Original model
input -> lstm -> output -> linear0
hidden0 -/ \-> hidden0 -> linear1
hidden1 -/ \-> hidden1 -> linear2
# (2) Observed model (after prepare)
input -> obs0 -> lstm -> output -> dqstub -> linear0 -> obs3
hidden0 -> obs1 -/ \-> hidden0 -> dqstub -> linear1 -> obs4
hidden1 -> obs2 -/ \-> hidden1 -> dqstub -> linear2 -> obs5
# (3) Reference model (after convert)
input -> quant -> qlstm -> output -> dequant -> linear0 -> quant -> dequant
hidden0 -> quant -/ \-> hidden0 -> dequant -> linear1 -> quant -> dequant
hidden1 -> quant -/ \-> hidden1 -> dequant -> linear2 -> quant -> dequant
# (4) Quantized model (after lowering)
input -> quant -> qlstm -> output -> quantized_linear0 -> dequant
hidden0 -> quant -/ \-> hidden0 -> quantized_linear1 -> dequant
hidden1 -> quant -/ \-> hidden1 -> quantized_linear2 -> dequant
```
Note that we choose to insert DeQuantStubs here instead of observers
because these will ultimately be replaced by "dequantize" nodes. This
matches the general custom module behavior, where output observers
are replaced only with "dequantize" nodes (as opposed to the normal
"quantize-dequantize" pair), since custom module outputs are assumed
to already be quantized. Using DeQuantStubs instead of observers also
simplifies the "dequantize" insertion logic. In the future, we should use
DeQuantStubs in place of output observers for custom modules in general.
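End to end, the flow from the diagrams above looks roughly like this
(a sketch; the custom module mappings mirror the eager-mode path):
```
import torch
from torch.ao.quantization import get_default_qconfig_mapping
from torch.ao.quantization.quantize_fx import prepare_fx, convert_fx
from torch.ao.quantization.fx.custom_config import PrepareCustomConfig, ConvertCustomConfig

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.lstm = torch.nn.LSTM(50, 50, 1)
    def forward(self, inputs, h0, c0):
        return self.lstm(inputs, (h0, c0))

m = M().eval()
example_inputs = (torch.rand(5, 3, 50), torch.rand(1, 3, 50), torch.rand(1, 3, 50))
prepare_custom_config = PrepareCustomConfig().set_float_to_observed_mapping(
    torch.nn.LSTM, torch.ao.nn.quantizable.LSTM)
convert_custom_config = ConvertCustomConfig().set_observed_to_quantized_mapping(
    torch.ao.nn.quantizable.LSTM, torch.ao.nn.quantized.LSTM)
m = prepare_fx(m, get_default_qconfig_mapping(), example_inputs,
               prepare_custom_config=prepare_custom_config)
m(*example_inputs)  # calibrate
m = convert_fx(m, convert_custom_config=convert_custom_config)
```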
**Test plan:**
python test/test_quantization.py TestQuantizeFx.test_static_lstm
python test/test_quantization.py
TestQuantizeFx.test_static_lstm_consume_tuple
**Reviewers:** jerryzh168, vkuzo
**Subscribers:** jerryzh168, vkuzo
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85068
Approved by: https://github.com/jerryzh168
Summary:
`TestQuantizeFx.test_custom_module_class` was subtly broken because the
various parts of the test case were modifying the original model. This
is incorrect because `prepare_fx` and `convert_fx` operate on the model in place.
To fix this, we can `copy.deepcopy` the model before applying the
test cases to it.
This issue was surfaced by an unrelated refactor; the fix is split into
a separate diff to keep the refactor clean.
Test plan:
```
python test/test_quantization.py TestQuantizeFx.test_custom_module_class
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85344
Approved by: https://github.com/dzdang, https://github.com/z-a-f, https://github.com/jerryzh168
Summary:
- `remove_quant_dequant_pairs` removes ops when a `quant` is followed by a `dequant`
- It looks like the quantized implementation of `layer_norm` only supports float weights, so this updates the default qconfig to avoid quantizing the weight param.
- Fixes broken test, `test_norm_weight_bias`. This was the only test that broke, because the default qconfig dict we pass in quantizes the weight. I just pulled the native qconfig object and converted it to a dict.
- Adds in qconfig and backend config support for layernorm
Test Plan:
```
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
python test/test_quantization.py TestQuantizeFxModels
```
Reviewers:
Subscribers:
Tasks: Fixes https://github.com/pytorch/pytorch/issues/83110
Tags: quant, fx
Differential Revision: [D39395141](https://our.internmc.facebook.com/intern/diff/D39395141)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84203
Approved by: https://github.com/jerryzh168
Context: In order to avoid the cluttering of the `torch.nn` namespace
the quantized modules namespace is moved to `torch.ao.nn`.
The list of the `nn.quantized` files that are being migrated:
- [X] `torch.nn.quantized` → `torch.ao.nn.quantized`
- [X] `torch.nn.quantized.functional` → `torch.ao.nn.quantized.functional`
- [X] `torch.nn.quantized.modules` → `torch.ao.nn.quantized.modules`
- [X] `torch.nn.quantized.dynamic` → `torch.ao.nn.quantized.dynamic`
- [X] `torch.nn.quantized._reference` → `torch.ao.nn.quantized._reference`
- [X] `torch.nn.quantizable` → `torch.ao.nn.quantizable`
- [X] [Current PR] `torch.nn.qat` → `torch.ao.nn.qat`
- [X] `torch.nn.qat.modules` → `torch.ao.nn.qat.modules`
- [X] `torch.nn.qat.dynamic` → `torch.ao.nn.qat.dynamic`
- [ ] `torch.nn.intrinsic` → `torch.ao.nn.intrinsic`
- [ ] `torch.nn.intrinsic.modules` → `torch.ao.nn.intrinsic.modules`
- [ ] `torch.nn.intrinsic.qat` → `torch.ao.nn.intrinsic.qat`
- [ ] `torch.nn.intrinsic.quantized` → `torch.ao.nn.intrinsic.quantized`
- [ ] `torch.nn.intrinsic.quantized.modules` → `torch.ao.nn.intrinsic.quantized.modules`
- [ ] `torch.nn.intrinsic.quantized.dynamic` → `torch.ao.nn.intrinsic.quantized.dynamic`
The majority of the files are just moved to the new location.
However, specific files need to be double checked:
- None
Differential Revision: [D36861197](https://our.internmc.facebook.com/intern/diff/D36861197/)
**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D36861197/)!
Differential Revision: [D36861197](https://our.internmc.facebook.com/intern/diff/D36861197)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78716
Approved by: https://github.com/jerryzh168
Context: In order to avoid the cluttering of the `torch.nn` namespace
the quantized modules namespace is moved to `torch.ao.nn`.
The list of the `nn.quantized` files that are being migrated:
- [ ] `torch.nn.quantized` → `torch.ao.nn.quantized`
- [X] `torch.nn.quantized.functional` → `torch.ao.nn.quantized.functional`
- [X] `torch.nn.quantized.modules` → `torch.ao.nn.quantized.modules`
- [X] `torch.nn.quantized.dynamic` → `torch.ao.nn.quantized.dynamic`
- [X] [Current PR] `torch.nn.quantized._reference` → `torch.ao.nn.quantized._reference`
- [ ] `torch.nn.quantizable` → `torch.ao.nn.quantizable`
- [ ] `torch.nn.qat` → `torch.ao.nn.qat`
- [ ] `torch.nn.qat.modules` → `torch.ao.nn.qat.modules`
- [ ] `torch.nn.qat.dynamic` → `torch.ao.nn.qat.dynamic`
- [ ] `torch.nn.intrinsic` → `torch.ao.nn.intrinsic`
- [ ] `torch.nn.intrinsic.modules` → `torch.ao.nn.intrinsic.modules`
- [ ] `torch.nn.intrinsic.qat` → `torch.ao.nn.intrinsic.qat`
- [ ] `torch.nn.intrinsic.quantized` → `torch.ao.nn.intrinsic.quantized`
- [ ] `torch.nn.intrinsic.quantized.modules` → `torch.ao.nn.intrinsic.quantized.modules`
- [ ] `torch.nn.intrinsic.quantized.dynamic` → `torch.ao.nn.intrinsic.quantized.dynamic`
The majority of the files are just moved to the new location.
However, specific files need to be double checked:
- None
Differential Revision: [D36860927](https://our.internmc.facebook.com/intern/diff/D36860927/)
**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D36860927/)!
Differential Revision: [D36860927](https://our.internmc.facebook.com/intern/diff/D36860927)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78715
Approved by: https://github.com/jerryzh168
Context: In order to avoid the cluttering of the `torch.nn` namespace
the quantized modules namespace is moved to `torch.ao.nn`.
The list of the `nn.quantized` files that are being migrated:
- [ ] `torch.nn.quantized` → `torch.ao.nn.quantized`
- [X] `torch.nn.quantized.functional` → `torch.ao.nn.quantized.functional`
- [X] `torch.nn.quantized.modules` → `torch.ao.nn.quantized.modules`
- [X] [Current PR] `torch.nn.quantized.dynamic` → `torch.ao.nn.quantized.dynamic`
- [ ] `torch.nn.quantized._reference` → `torch.ao.nn.quantized._reference`
- [ ] `torch.nn.quantizable` → `torch.ao.nn.quantizable`
- [ ] `torch.nn.qat` → `torch.ao.nn.qat`
- [ ] `torch.nn.qat.modules` → `torch.ao.nn.qat.modules`
- [ ] `torch.nn.qat.dynamic` → `torch.ao.nn.qat.dynamic`
- [ ] `torch.nn.intrinsic` → `torch.ao.nn.intrinsic`
- [ ] `torch.nn.intrinsic.modules` → `torch.ao.nn.intrinsic.modules`
- [ ] `torch.nn.intrinsic.qat` → `torch.ao.nn.intrinsic.qat`
- [ ] `torch.nn.intrinsic.quantized` → `torch.ao.nn.intrinsic.quantized`
- [ ] `torch.nn.intrinsic.quantized.modules` → `torch.ao.nn.intrinsic.quantized.modules`
- [ ] `torch.nn.intrinsic.quantized.dynamic` → `torch.ao.nn.intrinsic.quantized.dynamic`
The majority of the files are just moved to the new location.
However, specific files need to be double checked:
- [Documentation](docs/source/quantization-support.rst) @vkuzo
- [Public API test list](test/allowlist_for_publicAPI.json) @peterbell10
- [BC test](test/quantization/bc/test_backward_compatibility.py) @vkuzo
- [IR emitter](torch/csrc/jit/frontend/ir_emitter.cpp) @jamesr66a
- [JIT serialization](torch/csrc/jit/serialization/import_source.cpp) @IvanKobzarev @jamesr66a
Differential Revision: [D36860660](https://our.internmc.facebook.com/intern/diff/D36860660/)
**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D36860660/)!
Differential Revision: [D36860660](https://our.internmc.facebook.com/intern/diff/D36860660)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78714
Approved by: https://github.com/jerryzh168
Fix use-dict-literal pylint suggestions by changing `dict()` to `{}`. This PR should do the change for every Python file except test/jit/test_list_dict.py, where I think the intent is to test the constructor.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83718
Approved by: https://github.com/albanD
Summary:
As titled: we probably missed this op during the migration to the reference flow.
Test Plan:
python test/test_quantization.py TestQuantizeFxOps.test_qmatmul
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83885
Approved by: https://github.com/andrewor14