Summary:
Previously we could only use native PyTorch int dtypes that have corresponding quantized dtypes (e.g. quint8, qint8). This
PR removes that assumption in observers/fake_quants so that users can use all PyTorch native dtypes (except for int64, which we can add later if needed);
the main addition here is int16.
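For illustration, a minimal sketch of what this enables (not the exact test from this PR; the observer choice and range values are assumptions):
```
import torch
from torch.ao.quantization.observer import MinMaxObserver

# after this change, a plain integer dtype such as torch.int16 can be used
# directly, together with an explicit quant_min/quant_max range
obs = MinMaxObserver(
    dtype=torch.int16,
    quant_min=-(2 ** 15),
    quant_max=2 ** 15 - 1,
)
obs(torch.randn(4, 4))                      # observe some calibration data
scale, zero_point = obs.calculate_qparams()
```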
Test Plan:
python test/test_quantization.py TestQuantizePT2E
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108453
Approved by: https://github.com/kimishpatel
This updates ruff to 0.285, which is faster, better, and fixes a bunch of false negatives with regard to f-strings.
I also enabled RUF017, which looks for accidental quadratic list summation. Luckily, there seem to be no instances of it in our codebase, so I'm enabling it so that it stays that way. :)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107519
Approved by: https://github.com/ezyang
Summary:
Previously if we have:
```
conv1 -> cat
conv2 /
```
and configure the outputs of conv1/conv2 to be int8 quantized, and cat to also be int8 quantized with shared inputs,
it will not produce the expected result (the inputs of cat will not be shared).
The problem is that some checks were missing when inserting observers for the inputs of cat.
This PR fixes the problem; see the sketch below for what the expected sharing looks like.
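A hedged illustration of what "shared" means here (hypothetical graph/module access, not the actual test): after prepare, both inputs of cat should be fed by the same observer instance so they end up with identical quantization parameters.
```
import torch

# hypothetical: locate the cat node in the prepared graph and compare the
# observer modules attached to its two inputs
cat_node = next(
    n for n in prepared_model.graph.nodes
    if n.target == torch.ops.aten.cat.default
)
input_observers = [
    getattr(prepared_model, arg.target)    # observer call_module feeding cat
    for arg in cat_node.args[0]
]
assert input_observers[0] is input_observers[1], "cat inputs should share one observer"
```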
Fixes: https://github.com/pytorch/pytorch/issues/106760
Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_shared_qspec
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106922
Approved by: https://github.com/kimishpatel
Summary:
As title.
There's a corner case where both CPU and GPU are available: although the model is moved to CPU, the newly created PTQ weight observer is still on GPU. Therefore, during convert, this line will fail https://fburl.com/4rhipfvb
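A minimal sketch of the failure mode and the kind of fix applied (illustrative only; the module and observer names are hypothetical):
```
import torch

model = model.to("cpu")                    # model weights now live on CPU
weight_observer = weight_observer_ctr()    # a freshly created observer may default to CUDA
# keep the observer on the same device as the tensor it observes before convert
weight_observer = weight_observer.to(model.linear.weight.device)
weight_observer(model.linear.weight)
```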
Test Plan: CI
Differential Revision: D48141494
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106755
Approved by: https://github.com/jerryzh168
Summary: Move the quantizer to torch.ao.quantization to make it a public API, since pt2e is a folder for implementations
Test Plan:
CIs
sanity check: "buck test //executorch/backends/xnnpack/test:test_xnnpack_quantized_models -- test_resnet18"
Differential Revision: D47727838
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105885
Approved by: https://github.com/andrewor14
This PR re-lands
- [Typing] Fix PEP 484 Violation (#105022)
- Update mypy to 1.4.1 (#91983)
These were reverted due to a conflict with the internal source repo.
Mostly fixes for PEP 484 violations (i.e. when a default arg is set to None but the type is not annotated as Optional)
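A generic example of the PEP 484 pattern being fixed (not a specific line from this PR):
```
from typing import Optional

# before: implicit Optional; newer mypy rejects a None default on a non-Optional type
def load(path: str = None): ...

# after: the None default is reflected in the annotation
def load_fixed(path: Optional[str] = None): ...
```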
Plus a few real fixes:
- Add missing `_get_upgraders_entry_map` to `torch/_C/__init__.pyi`
- Add missing return statement to `torch._export.deserialize_graph`
- Fix error message in `torch.ao.ns.fx.weight_utils.get_lstm_mod_weights`
- Add assert in `torch/optim/optimizer.py` that the Optional list is not None
TODO (in followup PR):
- Fix erroneous `isinstance` check in `torch/ao/quantization/_pt2e/qat_utils.py`
Unrelated, to bypass CI failures due to the gcc9 dependency update in Ubuntu-18.04:
- Add hack to `.ci/docker/install_conda.sh` to squash the older libstdc++ from the conda environment in favor of the one from the OS
- Update bazel cuda builds to focal, as with libstdc++-6.0.32 bazel builds lose the ability to catch exceptions (probably because they link with cupti statically, but I could not find where that is done)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105227
Approved by: https://github.com/atalman, https://github.com/albanD, https://github.com/Skylion007
Summary: Similar to quantized add, in this PR we added the reference representation for the quantize/dequantize operators
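A hedged sketch of what such a reference representation looks like, written with plain tensor ops (the exact rewrite used by convert_pt2e may differ in details such as rounding and dtype handling):
```
import torch

def quantize_per_tensor_ref(x_fp32, scale, zero_point, quant_min, quant_max, dtype):
    # scale, round, offset by zero_point, then clamp into the target integer range
    x_int = torch.round(x_fp32 / scale) + zero_point
    return torch.clamp(x_int, quant_min, quant_max).to(dtype)

def dequantize_per_tensor_ref(x_int, scale, zero_point):
    # map the integer representation back to floating point
    return (x_int.to(torch.float32) - zero_point) * scale
```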
Test Plan:
buck2 test caffe2/test:quantization_pt2e -- --exact 'caffe2/test:quantization_pt2e - test_representation_quantize (quantization.pt2e.test_quantize_pt2e.TestQuantizePT2E)'
buck2 test caffe2/test:quantization_pt2e -- --exact 'caffe2/test:quantization_pt2e - test_representation_dequantize (quantization.pt2e.test_quantize_pt2e.TestQuantizePT2E)'
Reviewed By: kimishpatel
Differential Revision: D46959928
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104395
Approved by: https://github.com/andrewor14
Summary:
The planned e2e for quantization in pytorch 2.0 export is the following:
float_model -> prepare_pt2e -> calibration -> convert_pt2e -> ...
Inside convert_pt2e, we will first produce a q/dq representation of the quantized model, similar to the previous output of
convert_to_reference_fx in fx graph mode quantization:
```
torch.ops.quantized_decomposed.dequantize_per_tensor -> torch.ops.aten.add -> torch.ops.quantized_decomposed.quantize_per_tensor
torch.ops.quantized_decomposed.dequantize_per_tensor /
```
Then we'll rewrite the above into a representation that expresses the intent more precisely: since we actually want to do int8 addition
instead of simulating the int8 addition with fp32 operations, the representation for
quantized add is:
```
def quantized_add(x_i8, x_scale, x_zero_point, y_i8, y_scale, y_zero_point, out_scale, out_zero_point):
    x = (x_scale / out_scale) * x_i8
    y = (y_scale / out_scale) * y_i8
    out = x + y
    out -= (x_zero_point * x_scale + y_zero_point * y_scale) / out_scale
    out += out_zero_point
    return out
```
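A quick numeric sanity sketch for the formula above (made-up quantization parameters; the real rewrite additionally rounds and clamps back into the int8 range):
```
import torch

x_i8 = torch.randint(-128, 128, (8,)).float()
y_i8 = torch.randint(-128, 128, (8,)).float()
x_s, x_zp, y_s, y_zp, out_s, out_zp = 0.02, 3, 0.03, -1, 0.05, 2

# simulate with fp32 ops: dequantize both inputs, add, then requantize (no clamp)
fp = (x_i8 - x_zp) * x_s + (y_i8 - y_zp) * y_s
simulated = fp / out_s + out_zp

out = quantized_add(x_i8, x_s, x_zp, y_i8, y_s, y_zp, out_s, out_zp)
assert torch.allclose(out, simulated)
```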
Test Plan:
```
buck2 test caffe2/test:quantization_pt2e -- --exact 'caffe2/test:quantization_pt2e - test_representation_add (quantization.pt2e.test_quantize_pt2e.TestQuantizePT2E)'
```
Reviewed By: kimishpatel
Differential Revision: D45628032
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104130
Approved by: https://github.com/kimishpatel
Summary:
Special qspecs like `SharedQuantizationSpec` and
`DerivedQuantizationSpec` refer to other nodes in the graph.
However, after subgraph rewriting in QAT, the nodes referred
to in these special qspecs may be replaced by new nodes.
This could lead to the following error when inserting
observers according to these qspecs:
```
AssertionError: please make sure only refer to edge or node
that has observer/fake_quant inserted: 'getitem' not in
dict_keys([(arg0, convolution_default_1), (mul_tensor, convolution_default_1), getitem_3])
```
This commit fixes this by keeping track of the nodes that
are replaced during subgraph rewriting in QAT, and using
this mapping to update the dangling references used in these
special qspecs.
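A hedged sketch of the remapping idea (helper and field names are assumptions, not the exact implementation):
```
import dataclasses

def _remap_edge_or_node(edge_or_node, original_to_replacement):
    # an edge is a (source_node, user_node) tuple; a plain node stands for its output
    if isinstance(edge_or_node, tuple):
        return tuple(original_to_replacement.get(n, n) for n in edge_or_node)
    return original_to_replacement.get(edge_or_node, edge_or_node)

def _update_shared_qspec(qspec, original_to_replacement):
    # retarget a SharedQuantizationSpec whose referenced node was replaced during rewriting
    return dataclasses.replace(
        qspec,
        edge_or_node=_remap_edge_or_node(qspec.edge_or_node, original_to_replacement),
    )
```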
Test Plan: python test/test_quantization.py TestQuantizePT2E.test_qat_update_shared_qspec
Reviewed By: jerryzh168
Differential Revision: D46606614
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103970
Approved by: https://github.com/jerryzh168
Summary:
Dynamo tracing, via dynamo.export with aten_graph, generates a graph whose nodes
have targets that are instances of torch._ops.OpOverload. The quantization workflow is
a bit inconsistent: it inserts quantize/dequantize ops that are sometimes instances of
torch._ops.OpOverload (quantize_per_tensor.tensor) and other times instances
of torch._ops.OpOverloadPacket (quantize_per_tensor).
It is also unclear whether an exported model is valid if it has nodes with targets
of type torch._ops.OpOverloadPacket.
Without the op overload name attached to the target, it fails during executorch
tracing, because executorch tracing expects node targets to be
instances of torch._ops.OpOverload and not torch._ops.OpOverloadPacket.
So, for consistency and tracing reasons, this fixes the convert pass to insert ops that
are torch._ops.OpOverload.
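For reference, the distinction being enforced (using aten.add to keep the snippet self-contained; quantize_per_tensor vs. quantize_per_tensor.tensor follows the same pattern):
```
import torch

packet = torch.ops.aten.add            # an OpOverloadPacket (no overload name)
overload = torch.ops.aten.add.Tensor   # a specific OpOverload

assert isinstance(packet, torch._ops.OpOverloadPacket)
assert isinstance(overload, torch._ops.OpOverload)
# executorch tracing expects node.target to be the OpOverload form
```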
Test Plan: CI
Reviewed By: jerryzh168
Differential Revision: D46342822
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103251
Approved by: https://github.com/andrewor14
This commit changes ModelReportObserver variables to buffers, similar to other observers. This allows gathering data on devices other than CPU.
It also updates InputWeightEqualizationDetector to compute weight stats for weights that are on a GPU.
Tested by running the tests in `test/quantization/fx/test_model_report_fx.py`
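A hedged sketch of the underlying change (a toy observer, not the actual ModelReportObserver): storing state with register_buffer lets it follow the module across devices.
```
import torch

class TinyObserver(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # buffers move with .to(device)/.cuda() and are saved in state_dict,
        # unlike plain tensor attributes
        self.register_buffer("min_val", torch.tensor(float("inf")))
        self.register_buffer("max_val", torch.tensor(float("-inf")))

    def forward(self, x):
        self.min_val = torch.min(self.min_val, x.min())
        self.max_val = torch.max(self.max_val, x.max())
        return x
```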
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97971
Approved by: https://github.com/vkuzo
Summary:
Previously, the test for the convert flow in Conv + BN
QAT fusion was mistakenly left disabled. However, re-enabling this
test uncovered several bugs:
(1) The replaced nodes returned by subgraph rewriter were not
handled correctly. This is because a recent change in the subgraph
rewriter (#100556) fixed only the prepare case but not the convert
case. This commit brings this fix to the convert case as well and
deduplicates some code between the two cases.
(2) When folding BN into conv, we used the wrong arg index to get
the BN eps value. This resulted in an incorrect conv weight (a sketch of the folding math follows below).
(3) In FX, we currently do a hack for weighted modules where we
observe the weights once in convert in order to ensure we get the
right shapes for these weight observers. This caused the numerics
to diverge between PT2 and FX. This commit fixes this by skipping
this unnecessary hack for `_convert_to_reference_decomposed_fx`.
(4) Per channel support was simply missing. This commit adds
support for this by matching the quantize_per_channel and
dequantize_per_channel ops in addition to the existing ones.
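A hedged sketch of the conv + BN weight folding math referenced in (2); reading the wrong arg index for eps corrupts the denominator and hence the folded weight:
```
import torch

def fold_bn_weight(conv_w, bn_running_var, bn_gamma, bn_eps):
    # scale each output channel of the conv weight by gamma / sqrt(running_var + eps)
    scale = bn_gamma / torch.sqrt(bn_running_var + bn_eps)
    return conv_w * scale.reshape(-1, 1, 1, 1)

conv_w = torch.randn(8, 4, 3, 3)
folded = fold_bn_weight(conv_w, torch.rand(8) + 0.5, torch.randn(8), 1e-5)
```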
Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_qat_conv_bn_numerics
Reviewed By: jerryzh168
Differential Revision: D46097783
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102224
Approved by: https://github.com/jerryzh168
Summary:
```
"""
4. DerivedQuantizationSpec
this is the quantization spec for the Tensors whose quantization parameters are derived from other Tensors
"""
class DerivedQuantizationSpec(QuantizationSpecBase):
    # specifies which Tensors the quantization parameters are derived from;
    # each entry can either be an edge from an argument to a node, or a node
    derived_from: List[EdgeOrNode]
    derive_qparams_fn: Callable[[List[ObserverOrFakeQuantize]], Tuple[Tensor, Tensor]]
    ...
```
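A hedged usage sketch (the node objects are hypothetical and the import path is an assumption based on where the class lives today): deriving conv bias qparams from the input activation and weight observers.
```
from typing import List, Tuple
import torch
from torch.ao.quantization.quantizer import DerivedQuantizationSpec  # assumed path

def derive_bias_qparams(obs_or_fqs: List) -> Tuple[torch.Tensor, torch.Tensor]:
    act_obs, weight_obs = obs_or_fqs
    act_scale, _ = act_obs.calculate_qparams()
    weight_scale, _ = weight_obs.calculate_qparams()
    # conventional choice: bias scale = input scale * weight scale, zero_point = 0
    return act_scale * weight_scale, torch.zeros_like(weight_scale, dtype=torch.int32)

bias_qspec = DerivedQuantizationSpec(
    derived_from=[(input_node, conv_node), (weight_node, conv_node)],  # hypothetical edges
    derive_qparams_fn=derive_bias_qparams,
    dtype=torch.int32,
    quant_min=-(2 ** 31),
    quant_max=2 ** 31 - 1,
    qscheme=torch.per_tensor_symmetric,
)
```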
Test Plan:
```
buck2 test mode/opt caffe2/test:quantization_pt2e -- 'caffe2/test:quantization_pt2e'
buck2 test mode/opt caffe2/test:quantization_pt2e -- --exact 'caffe2/test:quantization_pt2e - test_resnet18_with_quantizer_api (quantization.pt2e.test_quantize_pt2e.TestQuantizePT2EModels)'
```
Reviewed By: kimishpatel
Differential Revision: D46097855
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102282
Approved by: https://github.com/andrewor14
Summary:
This PR adds support for SharedQuantizationSpec, which is used to express sharing between
two Tensors in the prepared graph. Each Tensor is either the input of some node (expressed as a tuple of fx nodes, i.e. an edge) or
the output of some node (expressed as an fx Node).
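A hedged annotation sketch (the node objects and act_qspec are hypothetical; the import path is an assumption based on where these classes live today): making the second input and the output of an add share quantization parameters with the first input.
```
from torch.ao.quantization.quantizer import (
    QuantizationAnnotation,
    SharedQuantizationSpec,
)

# share with the edge (first_input_node, add_node); passing an fx Node instead
# would share with that node's output
share_with_first_input = SharedQuantizationSpec((first_input_node, add_node))

add_node.meta["quantization_annotation"] = QuantizationAnnotation(
    input_qspec_map={
        first_input_node: act_qspec,
        second_input_node: share_with_first_input,
    },
    output_qspec=share_with_first_input,
)
```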
Test Plan:
```
buck2 test mode/opt caffe2/test:quantization_pt2e -- 'caffe2/test:quantization_pt2e'
buck2 test mode/opt caffe2/test:quantization_pt2e -- --exact 'caffe2/test:quantization_pt2e - test_resnet18_with_quantizer_api (quantization.pt2e.test_quantize_pt2e.TestQuantizePT2EModels)'
```
Differential Revision: D46043026
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102184
Approved by: https://github.com/kimishpatel, https://github.com/leslie-fang-intel
Similar to https://github.com/pytorch/pytorch/pull/96160 but for the modules
nn.PixelShuffle and nn.PixelUnshuffle.
torch.nn.PixelShuffle and torch.nn.PixelUnshuffle accept both float and quantized inputs.
However, previously we would unnecessarily dequantize quantized inputs into floats
before passing them to these modules. This commit fixes this by lowering the patterns
[dequant - PixelShuffle - quant] and
[dequant - PixelUnshuffle - quant].
Test Plan:
python test/test_quantization.py TestQuantizeFxOps.test_pixel_shuffle_module
python test/test_quantization.py TestQuantizeFxOps.test_pixel_unshuffle_module
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101926
Approved by: https://github.com/jerryzh168
Summary:
In this PR we aligned with the design of the annotation API and use QuantizationSpec directly for annotation.
The main change is in prepare: we consume the quantization_spec object directly instead of the observer or fake quant constructor, and create the constructor
inside prepare. After this PR, annotation API users only need to interact with QuantizationSpec objects.
Test Plan:
```
buck2 test mode/opt caffe2/test:quantization_pt2e -- --exact 'caffe2/test:quantization_pt2e - test_resnet18_with_quantizer_api (quantization.pt2e.test_quantize_pt2e.TestQuantizePT2EModels)'
```
Reviewed By: kimishpatel
Differential Revision: D45934088
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102054
Approved by: https://github.com/kimishpatel
Summary:
This diff adds QuantizationAnnotation and also refactors the existing annotation to use this object
```
@dataclass
class QuantizationAnnotation:
    # How some input nodes should be quantized, expressed as QuantizationSpec;
    # a map from torch.fx.Node to QuantizationSpec
    input_qspec_map: Dict[Node, QuantizationSpec]
    # How the output of this node is quantized, expressed as QuantizationSpec
    output_qspec: QuantizationSpec

class QuantizationSpec:
    dtype: torch.dtype
    is_dynamic: bool = False
    quant_min: Optional[int] = None
    quant_max: Optional[int] = None
    qscheme: Optional[torch.qscheme] = None
    ch_axis: Optional[int] = None
    # TODO: follow up PR will add this
    # Kind of observer such as MinMaxObserver, PerChannelHistogramObserver etc.
    # observer_or_fake_quant_type: Union[ObserverBase, FakeQuantizeBase]
```
Example after full refactor:
```
int8_qspec = QuantizationSpec(dtype=torch.int8, ...)
weight_qspec = QuantizationSpec(dtype=torch.int8, ...)
conv_node.meta["quantization_annotation"] = QuantizationAnnotation(
    input_qspec_map={input_node: int8_qspec, weight_node: weight_qspec},
    output_qspec=int8_qspec,
)
```
Note: right now input_qspec_map and output_qspec are still using observer and fake quant constructors.
Follow-up PR: change input_qspec_map and output_qspec to use QuantizationSpec directly.
Test Plan:
```
buck2 test mode/opt caffe2/test:quantization_pt2e -- --exact 'caffe2/test:quantization_pt2e - test_resnet18_with_quantizer_api (quantization.pt2e.test_quantize_pt2e.TestQuantizePT2EModels)'
```
Differential Revision: D45895027
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101708
Approved by: https://github.com/andrewor14
Summary: We found that `_get_lstm_with_individually_observed_parts()` was missing a setup step that initializes the weights and biases of the LSTM layer. This diff fixes the numerical discrepancy seen by the CTRL team when using the above API.
Test Plan: N3358643
Differential Revision: D45821681
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101299
Approved by: https://github.com/andrewor14
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101005
Previously the node annotation looked like the following:
```
node.meta["..."] = {
"input_act_obs_or_fq_ctr": ...,
"weight_obs_or_fq_ctr": ...,
"weight_index": 1,
}
```
Basically we needed to specify the index for the weight and also have a separate key for the weight config; in this PR we changed that to:
```
node.meta["..."] = {
"input_act_obs_or_fq_ctr_map": {input_node: ..., weight_node: ...},
}
```
This can support specifying the observer/fake quant constructor for any argument of the node
Test Plan: buck2 test @//mode/opt //caffe2/test:quantization_pt2e -- --exact 'caffe2/test:quantization_pt2e - test_resnet18_with_quantizer_api (quantization.pt2e.test_quantize_pt2e.TestQuantizePT2EModels)'
Differential Revision: D45719781
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101041
Approved by: https://github.com/andrewor14
Summary:
Previously the node annotation looked like the following:
```
node.meta["..."] = {
"input_act_obs_or_fq_ctr": ...,
"weight_obs_or_fq_ctr": ...,
"weight_index": 1,
}
```
Basically we needed to specify the index for the weight and also have a separate key for the weight config; in this PR we changed that to:
```
node.meta["..."] = {
"input_act_obs_or_fq_ctr_map": {input_node: ..., weight_node: ...},
}
```
This can support specifying the observer/fake quant constructor for any argument of the node
Test Plan: buck2 test @//mode/opt //caffe2/test:quantization_pt2e -- --exact 'caffe2/test:quantization_pt2e - test_resnet18_with_quantizer_api (quantization.pt2e.test_quantize_pt2e.TestQuantizePT2EModels)'
Reviewed By: kimishpatel
Differential Revision: D45553195
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101005
Approved by: https://github.com/kimishpatel
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99220
Previously we had two places where we needed to decide whether to insert an observer or fake quantizer:
(1) the input arguments of a node and (2) the output of a node, and we had separate code for each.
In this PR, the logic is unified in a `_needs_obs_or_fq` helper function that takes the target_dtype and is_dynamic from the previous output,
and the target_dtype and is_dynamic for the current Tensor we are looking at.
Let's use a conv node as an example:
```
conv = convolution(input, weight, bias, ...)
```
Let's say we have an `input_node` object for argument `input`, and a `conv_node` for the `conv` node in the graph.
(1) input arguments, e.g. `input`:
the target_dtype/is_dynamic from the previous output come from the node that produces `input`; we get them from
input_node.meta["target_dtype_info"]["output_act_obs_or_fq"].
The target_dtype/is_dynamic for the current argument `input` come from conv_node.meta["target_dtype_info"]["input_act_obs_or_fq"];
similarly, for the weight they come from conv_node.meta["target_dtype_info"]["weight_obs_or_fq"], etc.
(2) output of the conv node:
the target_dtype/is_dynamic from the previous output is the floating point output of the fp32 convolution operator, so it
is hardcoded to (torch.float, False). Technically we should get this from node.meta["val"], but since the
current code base is shared by fx graph mode quantization and pytorch 2.0 export quantization, we cannot do that; we can revisit
after we decide to deprecate fx graph mode quantization.
The target_dtype/is_dynamic for the current output comes from conv_node.meta["target_dtype_info"]["output_act_obs_or_fq"].
There is one caveat here about dynamic quantization that is explained in a code comment, so I won't repeat it here.
Note: also fixed some places in `_get_arg_target_dtype_as_input_to_node` and `_get_arg_target_is_dynamic_as_input_to_node` so that "not specified" is treated the same as specifying a fp32 placeholder observer.
Next: we can merge the two functions that get the target dtype and is_dynamic to reduce code duplication.
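A rough sketch of the kind of decision the unified helper makes (simplified; the real `_needs_obs_or_fq` handles more cases than this):
```
import torch

def needs_obs_or_fq(prev_dtype, prev_is_dynamic, target_dtype, target_is_dynamic):
    if target_is_dynamic:
        # dynamic quantization: qparams are computed at runtime from a float input,
        # so an observer is only needed when coming from a float producer
        return prev_dtype == torch.float
    # static case: re-observe whenever the dtype changes
    return prev_dtype != target_dtype
```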
Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
python test/test_quantization.py TestQuantizeFxModels
python test/test_quantization.py TestQuantizePT2E
python test/test_quantization.py TestQuantizePT2EModels
Imported from OSS
Differential Revision: D45198323
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99767
Approved by: https://github.com/kimishpatel
Summary:
Previously we had two places where we needed to decide whether to insert an observer or fake quantizer:
(1) the input arguments of a node and (2) the output of a node, and we had separate code for each.
In this PR, the logic is unified in a `_needs_obs_or_fq` helper function that takes the target_dtype and is_dynamic from the previous output,
and the target_dtype and is_dynamic for the current Tensor we are looking at.
Let's use a conv node as an example:
```
conv = convolution(input, weight, bias, ...)
```
Let's say we have an `input_node` object for argument `input`, and a `conv_node` for the `conv` node in the graph.
(1) input arguments, e.g. `input`:
the target_dtype/is_dynamic from the previous output come from the node that produces `input`; we get them from
input_node.meta["target_dtype_info"]["output_act_obs_or_fq"].
The target_dtype/is_dynamic for the current argument `input` come from conv_node.meta["target_dtype_info"]["input_act_obs_or_fq"];
similarly, for the weight they come from conv_node.meta["target_dtype_info"]["weight_obs_or_fq"], etc.
(2) output of the conv node:
the target_dtype/is_dynamic from the previous output is the floating point output of the fp32 convolution operator, so it
is hardcoded to (torch.float, False). Technically we should get this from node.meta["val"], but since the
current code base is shared by fx graph mode quantization and pytorch 2.0 export quantization, we cannot do that; we can revisit
after we decide to deprecate fx graph mode quantization.
The target_dtype/is_dynamic for the current output comes from conv_node.meta["target_dtype_info"]["output_act_obs_or_fq"].
There is one caveat here about dynamic quantization that is explained in a code comment, so I won't repeat it here.
Note: also fixed some places in `_get_arg_target_dtype_as_input_to_node` and `_get_arg_target_is_dynamic_as_input_to_node` so that "not specified" is treated the same as specifying a fp32 placeholder observer.
Next: we can merge the two functions that get the target dtype and is_dynamic to reduce code duplication.
Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
python test/test_quantization.py TestQuantizeFxModels
python test/test_quantization.py TestQuantizePT2E
python test/test_quantization.py TestQuantizePT2EModels
Reviewers:
Subscribers:
Tasks:
Tags:
Differential Revision: [D45167585](https://our.internmc.facebook.com/intern/diff/D45167585)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99220
Approved by: https://github.com/kimishpatel
Summary:
This PR changes prepare to use a default observer/fq constructor when "target_dtype_info" is not set, which allows users to not initialize all nodes with a default
observer/fq constructor. Note we may still need to annotate intermediate nodes after this PR; there will be a follow-up PR to allow users to only annotate the things they
want to quantize.
Test Plan:
python test/test_quantization.py TestQuantizePT2E
python test/test_quantization.py TestQuantizePT2EModels
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99001
Approved by: https://github.com/kimishpatel, https://github.com/andrewor14
Summary:
This PR added a quantizer API to prepare_pt2e_quantizer, which enables users to annotate the nodes in the graph
directly to configure quantization, instead of relying on QConfigMapping; please see the test cases in
test_quantize_pt2e.py for examples. Also added a prototype for QNNPackQuantizer, which will be modified later
to fully support the different quantization capabilities of QNNPack/XNNPack.
The goal of introducing the quantizer is to add flexibility to the quantization API so that modeling users and backend developers can express their quantization intentions programmatically, which will free the architecture optimization team from supporting different use cases in the core API in the future. As a concrete example, we used to have https://pytorch.org/docs/master/generated/torch.ao.quantization.qconfig_mapping.QConfigMapping.html#torch.ao.quantization.qconfig_mapping.QConfigMapping as the API for users to express their intent for quantization in fx graph mode quantization, and it has some fancy options like `set_module_name_regex` and `set_module_name_object_type_order`. These are not needed by all backends and add a maintenance burden to the AO team. In the quantizer API we will move these options into a backend-specific `Quantizer` that needs the feature, and every backend, or even an advanced modeling user, can implement its own quantizer to express its quantization intent by annotating the nodes. For example, to express the intention of quantizing a convolution node, a user would find the convolution node in the graph and do:
```
operator_spec = qnnpack_quantizer.get_default_per_channel_symmetric_qnnpack_operator_spec()
conv_node.meta["target_dtype_info"] = {
    "input_act_obs_or_fq_ctr": _get_act_obs_or_fq_ctr(operator_spec),
    "weight_obs_or_fq_ctr": _get_weight_obs_or_fq_ctr(operator_spec),
    "bias_obs_or_fq_ctr": _get_bias_obs_or_fq_ctr(operator_spec),
    "output_act_obs_or_fq_ctr": _get_act_obs_or_fq_ctr(operator_spec),
    # TODO: validate that weight_index is set if weight_obs_or_fq_ctr is set
    "weight_index": 1,
    # TODO: validate that bias_index is set if bias_obs_or_fq_ctr is set
    "bias_index": 2,
}
```
Each backend will introduce its own quantizer, e.g. QNNPackQuantizer, which may expose more convenient APIs for modeling users to configure the annotation, and different quantizers can compose with each other to annotate the graph correctly for quantization.
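A hedged end-to-end sketch of the quantizer-based flow (the module paths and the bare QNNPackQuantizer() config are assumptions based on the experimental layout around this PR; the APIs have since been renamed and moved):
```
import torch
import torch._dynamo as torchdynamo
from torch.ao.quantization._quantize_pt2e import prepare_pt2e_quantizer, convert_pt2e  # assumed path
from torch.ao.quantization._pt2e.quantizer import QNNPackQuantizer  # assumed path

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = torch.nn.Conv2d(3, 8, 3)

    def forward(self, x):
        return self.conv(x)

example_inputs = (torch.randn(1, 3, 32, 32),)
m, _ = torchdynamo.export(M().eval(), *example_inputs, aten_graph=True)

quantizer = QNNPackQuantizer()             # a real run would also set its global config
m = prepare_pt2e_quantizer(m, quantizer)   # the quantizer annotates nodes inside prepare
m(*example_inputs)                         # calibration
m = convert_pt2e(m)
```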
Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_simple_quantizer
python test/test_quantization.py TestQuantizePT2E.test_qnnpack_quantizer_conv
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97994
Approved by: https://github.com/vkuzo