Summary:
Added `set_module_name_qconfig` support to allow users to set configurations based on module name in `X86InductorQuantizer`.
For example, to quantize only the `sub` submodule:
```python
import torch
import torch.ao.quantization.quantizer.x86_inductor_quantizer as xiq
from torch.ao.quantization.quantizer.x86_inductor_quantizer import X86InductorQuantizer

class Sub(torch.nn.Module):
    # Minimal stand-in: the original example assumes a `Sub` module defined elsewhere.
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(5, 5)

    def forward(self, x):
        return self.linear(x)

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(5, 5)
        self.sub = Sub()

    def forward(self, x):
        x = self.linear(x)
        x = self.sub(x)
        return x

m = M().eval()
example_inputs = (torch.randn(3, 5),)
# Set the config for a specific submodule.
quantizer = X86InductorQuantizer()
quantizer.set_module_name_qconfig("sub", xiq.get_default_x86_inductor_quantization_config())
```
- Added `set_module_name_qconfig` to allow users to set the configuration at the `module_name` level.
- Unified the annotation process to follow this order: `module_name_qconfig`, `operator_type_qconfig`, and `global_config` (a sketch of the resulting precedence follows this list).
- Added `config_checker` to validate all user configurations and prevent mixing static/dynamic or QAT/non-QAT configs.
- Moved `_get_module_name_filter` from `xnnpack_quantizer.py` into `utils.py`, as it is common to all quantizers.
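A hedged sketch of how the configuration levels interact, assuming (per this PR) that passing `None` as a module's config skips quantization of that module:
```python
import torch.ao.quantization.quantizer.x86_inductor_quantizer as xiq
from torch.ao.quantization.quantizer.x86_inductor_quantizer import X86InductorQuantizer

config = xiq.get_default_x86_inductor_quantization_config()
quantizer = X86InductorQuantizer()
quantizer.set_global(config)                    # lowest priority: applies everywhere
quantizer.set_module_name_qconfig("sub", None)  # highest priority: skip `sub`
```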
Test Plan:
```bash
python -m pytest quantization/pt2e/test_x86inductor_quantizer.py -k test_set_module_name
```
@Xia-Weiwen @leslie-fang-intel @jgong5
Pull Request resolved: https://github.com/pytorch/pytorch/pull/126044
Approved by: https://github.com/jgong5, https://github.com/leslie-fang-intel, https://github.com/jerryzh168
Summary:
During the convert step, observers are first replaced by a Q-DQ pair. In some
scenarios, like the following, the output DQ has a fan-out:
```
                ---> OP2 -> Q -> DQ
               /
OP -> Q -> DQ -
               \
                ---> OP3 -> Q -> DQ
```
If either OP2 or OP3 is configured to be quantized, then its input is
expected to be quantized. In that case, the quantized equivalent of a
pattern that the quantizer asked to be quantized should look like
[DQ -> {pattern} -> Q]. However, in a scenario like the one above, where the
DQ node is shared between multiple "quantized" patterns, the boundary of a
"quantized" pattern is not clear, because the DQ node now belongs to multiple
quantized patterns.
This poses challenges for:
- Porting metadata: it is ambiguous which "quantized" partition this DQ node belongs to.
- Quantized representation: it equally needs to identify a self-contained
quantized pattern that is replaced by an equivalent pattern capturing the
compute in the quantized precision.
A sketch of the DQ-duplication approach follows.
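Per the test plan below, the resolution is a pass that duplicates the shared DQ node so each consumer owns its own copy. The following is a minimal sketch of that idea (not the actual pass); the dequantize target set is an assumption for illustration:
```python
import torch
import torch.ao.quantization.fx._decomposed  # registers quantized_decomposed ops
from torch.fx import GraphModule

# Assumed set of dequantize targets to match on.
DQ_TARGETS = {
    torch.ops.quantized_decomposed.dequantize_per_tensor.default,
}

def duplicate_shared_dq(gm: GraphModule) -> GraphModule:
    for node in list(gm.graph.nodes):
        if node.op == "call_function" and node.target in DQ_TARGETS:
            users = list(node.users)
            # Leave the first user on the original DQ; give each additional
            # user its own copy, so no DQ is shared across patterns and every
            # quantized pattern ends at an unshared [DQ -> {pattern} -> Q] boundary.
            for user in users[1:]:
                with gm.graph.inserting_after(node):
                    dq_copy = gm.graph.node_copy(node)
                user.replace_input_with(node, dq_copy)
    gm.graph.lint()
    gm.recompile()
    return gm
```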
Test Plan:
test_duplicate_dq_pass
Differential Revision: [D48663147](https://our.internmc.facebook.com/intern/diff/D48663147)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107900
Approved by: https://github.com/jerryzh168, https://github.com/andrewor14, https://github.com/leslie-fang-intel
ghstack dependencies: #107105, #107106, #107899
Summary:
Currently, quantizer/quantize_pt2e imports things from specific quantizers (XNNPACKQuantizer, QuantizationConfig, etc.).
This PR removes those imports so it is clearer that they are not part of the core quantization code base.
This PR also removes get_supported_operators from the base Quantizer class, since we have not seen a clear need for this API.
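For context, a typical pt2e flow imports only the core entry points plus whichever quantizer the user chooses. A hedged sketch, assuming the XNNPACKQuantizer import path after the move described later in this log:
```python
from torch.ao.quantization.quantize_pt2e import prepare_pt2e, convert_pt2e
# Specific quantizers live outside the core code base and are imported explicitly.
from torch.ao.quantization.quantizer.xnnpack_quantizer import (
    XNNPACKQuantizer,
    get_symmetric_quantization_config,
)

quantizer = XNNPACKQuantizer().set_global(get_symmetric_quantization_config())
# Given `exported_model` captured from the model:
# prepared = prepare_pt2e(exported_model, quantizer)
# ... run calibration inputs through `prepared` ...
# quantized = convert_pt2e(prepared)
```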
Test Plan:
CIs
Imported from OSS
Differential Revision: D48340367
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107259
Approved by: https://github.com/kimishpatel
Summary:
Added support to allow users to set configurations based on module name in XNNPACKQuantizer; this can also serve as an example
for implementing new quantizers. A short usage sketch follows.
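A short usage sketch, assuming the public module path introduced by the move to torch.ao.quantization (see the last PR in this log):
```python
from torch.ao.quantization.quantizer.xnnpack_quantizer import (
    XNNPACKQuantizer,
    get_symmetric_quantization_config,
)

quantizer = XNNPACKQuantizer()
# Annotate only submodules whose fully qualified name matches "sub";
# modules without a matching config are left unquantized.
quantizer.set_module_name("sub", get_symmetric_quantization_config())
```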
Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_xnnpack_quantizer_set_module_name
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106087
Approved by: https://github.com/andrewor14
Summary: Moving the quantizer to torch.ao.quantization to make it a public API, since pt2e is a folder for implementations.
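After the move, for example, the base class is importable from the public namespace (a one-line sketch, assuming the resulting module layout):
```python
# Public import path for the quantizer base class after the move (assumed layout).
from torch.ao.quantization.quantizer import Quantizer
```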
Test Plan:
CIs
sanity check: "buck test //executorch/backends/xnnpack/test:test_xnnpack_quantized_models -- test_resnet18"
Differential Revision: D47727838
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105885
Approved by: https://github.com/andrewor14