Commit Graph

97 Commits

Author SHA1 Message Date
Jerry Zhang
b0de6a8002 [quant][executorch] Support inception_v4 in examples (#108382)
Summary: Verified that the pt2e quant flow matches the fx flow with the executorch backend config

Test Plan:
with-proxy buck2 run executorch/examples/quantization:example -- -m=ic4 --verify

```
[INFO 2023-08-31 16:08:06,923 example.py:77] prepare sqnr: inf
[INFO 2023-08-31 16:08:06,932 example.py:81] quant diff max: 0.0
[INFO 2023-08-31 16:08:06,936 example.py:85] quant sqnr: inf
```

full output: https://www.internalfb.com/intern/paste/P818520579/
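
For reference, the sqnr numbers above are the kind of comparison the example script reports; below is a minimal sketch of such a check, with illustrative names only (not the script's actual code):

```
import torch

def sqnr(ref: torch.Tensor, other: torch.Tensor) -> torch.Tensor:
    # signal-to-quantization-noise ratio in dB; inf means the two outputs are identical
    noise = ref - other
    return 10 * torch.log10(ref.pow(2).sum() / noise.pow(2).sum())

# Hypothetical usage, comparing the FX-quantized and PT2E-quantized model outputs:
# fx_out = fx_quantized_model(*example_inputs)
# pt2e_out = pt2e_quantized_model(*example_inputs)
# print("quant diff max:", (fx_out - pt2e_out).abs().max().item())
# print("quant sqnr:", sqnr(fx_out, pt2e_out).item())
```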

Differential Revision: D48889075

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108382
Approved by: https://github.com/kimishpatel
2023-09-08 17:39:31 +00:00
leslie-fang-intel
fa6be2fa6f [Quant][PT2E] Remove x86 inductor pt2e backend config (#105039)
**Summary**
For the quantization PT2E path, we recommend using `X86InductorQuantizer` instead of the `x86_inductor_pt2e_backend_config` backend config. This change removes `x86_inductor_pt2e_backend_config` and the relevant testing.
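
A minimal sketch of the recommended flow, assuming the current public locations of these helpers (module paths and the capture entry point have shifted across releases, so treat them as assumptions rather than the exact API at the time of this commit):

```
import torch
from torch.ao.quantization.quantize_pt2e import prepare_pt2e, convert_pt2e
from torch.ao.quantization.quantizer.x86_inductor_quantizer import (
    X86InductorQuantizer,
    get_default_x86_inductor_quantization_config,
)

model = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3), torch.nn.ReLU()).eval()
example_inputs = (torch.randn(1, 3, 32, 32),)

# capture the model as an ATen-level graph (the capture API varies by release)
exported = torch.export.export(model, example_inputs).module()

quantizer = X86InductorQuantizer()
quantizer.set_global(get_default_x86_inductor_quantization_config())

prepared = prepare_pt2e(exported, quantizer)
prepared(*example_inputs)           # calibrate
quantized = convert_pt2e(prepared)  # then compile with torch.compile / inductor
```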

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105039
Approved by: https://github.com/jgong5, https://github.com/jerryzh168
2023-07-19 23:18:29 +00:00
Justin Chu
c0d8a4af0a [BE] Enable ruff's UP rules and autoformat ao/ (#105430)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105430
Approved by: https://github.com/albanD, https://github.com/malfet
2023-07-19 13:44:37 +00:00
maxren
88f1885ec9 [XNNPACK][QS8] torch.cat (#104800)
Differential Revision: [D47304143](https://our.internmc.facebook.com/intern/diff/D47304143/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104800
Approved by: https://github.com/digantdesai
2023-07-19 00:15:05 +00:00
maxren
332f2057df [XNNPACK][QS8] torch.nn.ELU (#104307)
Differential Revision: [D47075933](https://our.internmc.facebook.com/intern/diff/D47075933/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104307
Approved by: https://github.com/digantdesai
2023-07-11 00:35:13 +00:00
maxren
c4e084e3c7 [XNNPACK][QS8] torch.nn.ConstantPad2d (#104306)
Differential Revision: [D47075932](https://our.internmc.facebook.com/intern/diff/D47075932/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104306
Approved by: https://github.com/digantdesai
2023-07-11 00:35:02 +00:00
maxren
2c960c73a3 [XNNPACK][QS8] torch.permute (#104305)
Differential Revision: [D47075934](https://our.internmc.facebook.com/intern/diff/D47075934/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104305
Approved by: https://github.com/digantdesai
2023-07-11 00:34:58 +00:00
maxren
d41c4a8338 [XNNPACK][QS8] torch.clamp (#104304)
Differential Revision: [D47075935](https://our.internmc.facebook.com/intern/diff/D47075935/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104304
Approved by: https://github.com/digantdesai
2023-07-11 00:34:58 +00:00
Digant Desai
36c4dad197 [ET][XNNPACK] Add support for quantized LeakyReLU (#104309)
Summary: Also adds support for backend_config

Test Plan: `buck test fbcode//mode/dev-nosan fbcode//executorch/backends/xnnpack/test:`

Reviewed By: mcr229

Differential Revision: D47043207

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104309
Approved by: https://github.com/salilsdesai, https://github.com/manuelcandales
2023-06-30 17:42:22 +00:00
Digant Desai
ef285faeba [ET][XNNPACK] Add support for quantized Multiply (#104134)
Summary:
Also adds support for backend_config with relu fusion since XNNPACK allows it.

We should revisit the relu fusion once we gain more clarity on quantSrcPartition or some other way to do these fusions without having to add all combinations.

TODO: we should rename the backend config to something like et_xnnpack.py.

Test Plan: `buck test fbcode//mode/dev-nosan fbcode//executorch/backends/xnnpack/test:`

Differential Revision: D46985169

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104134
Approved by: https://github.com/mcr229, https://github.com/salilsdesai
2023-06-27 16:59:28 +00:00
Digant Desai
bd8841101b [ET][XNNPACK] Add support for quantized Sub (#104090)
Summary:
Also adds support for backend_config with relu fusion since XNNPACK allows it.

We should revisit the relu fusion once we gain more clarity on quantSrcPartition or some other way to do these fusions without having to add all combinations.

TODO: we should rename the backend config to something like et_xnnpack.py.

Test Plan: `buck test fbcode//mode/dev-nosan fbcode//executorch/backends/xnnpack/test:`

Differential Revision: D46924209

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104090
Approved by: https://github.com/mcr229
2023-06-26 16:32:15 +00:00
andrewor14
0d5f1cb666 [quant] Add torch.flatten to executorch backend_config (#103988)
Summary: This is needed to make the short-term and long-term
quantization numerics match for mobilenetv2.

Test Plan:
python test/test_quantization.py TestQuantizeFx

Reviewers: jerryzh, kimishpatel

Subscribers: jerryzh, kimishpatel

Differential Revision: [D46909962](https://our.internmc.facebook.com/intern/diff/D46909962)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103988
Approved by: https://github.com/jerryzh168
2023-06-22 22:11:48 +00:00
maxren
f37be77813 [Quant][XNNPACK] Delegate add_relu fusion (#103266)
Quantized ResNet currently contains a fused add-relu pattern:
```
--> dq
       \
        add --> relu --> quant
       /
--> dq
```

Let us support this fusion in the delegate: XNNPACK can use the output_min and output_max of the op nodes to clamp the values and perform a fused add-relu operation.
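
A conceptual illustration of the idea (a hypothetical helper, not the delegate's actual node-visitor code): folding the relu into the add amounts to tightening the clamp range that XNNPACK applies to the add's output.

```
def add_output_clamp(out_scale: float, out_zero_point: int,
                     fuse_relu: bool, qmin: int = -128, qmax: int = 127):
    # dequantized range representable by the output's quantization parameters
    output_min = (qmin - out_zero_point) * out_scale
    output_max = (qmax - out_zero_point) * out_scale
    if fuse_relu:
        # relu clamps the output at 0, so raise the lower bound instead of
        # emitting a separate relu node
        output_min = max(output_min, 0.0)
    return output_min, output_max
```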

Differential Revision: [D45258028](https://our.internmc.facebook.com/intern/diff/D45258028/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103266
Approved by: https://github.com/jerryzh168
2023-06-12 04:35:29 +00:00
Riley Dulin
424c930f76 Add quantization lowering for nn.PixelShuffle and nn.PixelUnshuffle (#101926)
Similar to https://github.com/pytorch/pytorch/pull/96160 but for the modules
nn.PixelShuffle and nn.PixelUnshuffle.

These modules accept both float and quantized inputs.
However, previously we would unnecessarily dequantize quantized inputs into floats
before passing them to the modules. This commit fixes this by lowering the patterns
[dequant - PixelShuffle - quant] and [dequant - PixelUnshuffle - quant].

Test Plan:

python test/test_quantization.py TestQuantizeFxOps.test_pixel_shuffle_module
python test/test_quantization.py TestQuantizeFxOps.test_pixel_unshuffle_module

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101926
Approved by: https://github.com/jerryzh168
2023-05-24 19:33:26 +00:00
Max Ren
151d76cc23 [quant][pt2e] remove dropout from fx quant
Differential Revision: D45250152

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99935
2023-04-27 11:22:41 -07:00
maxren
e63c502baa [Executorch][XNNPACK] Quantized Max Pool 2d (#99587)
Adding support for Quantized Max Pool 2d

Additions:
- Add quantized max pool 2d to executorch backend config
- Modify max pool node visitors to grab quant params from input/output
- Add qmaxpool 2d patterns for partitioners

Differential Revision: [D44977783](https://our.internmc.facebook.com/intern/diff/D44977783/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99587
Approved by: https://github.com/jerryzh168
2023-04-22 07:17:13 +00:00
maxren
a964a3dbed [quant][pt2e] add all convs-relu fusion qat configs (#99586)
Currently, when running prepare_qat_fx with the executorch backend config, we do not properly quantize conv or conv-relu.

To fix this, we add all the necessary QAT configs for conv and conv-relu.

Differential Revision: [D45135947](https://our.internmc.facebook.com/intern/diff/D45135947/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99586
Approved by: https://github.com/jerryzh168
2023-04-22 06:44:23 +00:00
maxren
c139dfd71e [quant][pt2e] add dropout to executorch backend config (#99585)
The OD model has a dropout layer during training. To match eager-mode QAT, we also fake-quantize the dropout layer in prepare_qat_fx.

To do this, we add the dropout layer to the default_op_configs, where the observation type uses a different observer from its input.
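
A minimal sketch of such a default-op entry using the public BackendConfig API (the dtype config here is illustrative, not the exact executorch one):

```
import torch
import torch.nn as nn
from torch.ao.quantization.backend_config import (
    BackendPatternConfig,
    DTypeConfig,
    ObservationType,
)

# illustrative quint8 activation dtype config
act_quint8 = DTypeConfig(input_dtype=torch.quint8, output_dtype=torch.quint8)

dropout_config = (
    BackendPatternConfig(nn.Dropout)
    # the output gets its own observer rather than sharing the input's observer
    .set_observation_type(ObservationType.OUTPUT_USE_DIFFERENT_OBSERVER_AS_INPUT)
    .set_dtype_configs([act_quint8])
)
```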

Differential Revision: [D45095936](https://our.internmc.facebook.com/intern/diff/D45095936/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99585
Approved by: https://github.com/jerryzh168
2023-04-22 06:41:44 +00:00
maxren
80eab63587 [Quant][pt2e] torch.mean and ReLU6 (#98984)
Add the nn.Module ReLU6 in addition to the functional relu6.

Also add torch.mean to the quantization config.

Differential Revision: [D44901038](https://our.internmc.facebook.com/intern/diff/D44901038/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98984
Approved by: https://github.com/jerryzh168
2023-04-17 18:33:04 +00:00
maxren
444a9769ae [quant][pt2e] QAT Linear (#98897)
Differential Revision: [D44901039](https://our.internmc.facebook.com/intern/diff/D44901039/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98897
Approved by: https://github.com/tiandiao123, https://github.com/manuelcandales
2023-04-17 18:27:39 +00:00
maxren
568935caca [quant][pt2e] QAT conv + bn + relu (#98896)
Differential Revision: [D44901040](https://our.internmc.facebook.com/intern/diff/D44901040/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98896
Approved by: https://github.com/manuelcandales
2023-04-17 18:24:08 +00:00
Kazuaki Ishizaki
a13a63ae9a Fix typos under torch/ao directory (#97679)
This PR fixes typos in comments and messages of `.py` files under `torch/ao` directory

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97679
Approved by: https://github.com/janeyx99, https://github.com/kit1980
2023-04-10 22:25:15 +00:00
Xia, Weiwen
e61b842001 [Quant][FX] lower functional conv_transpose ops (#97126)
**Summary**
Support quantizing and lowering functional `conv_transpose1d`, `conv_transpose2d` and `conv_transpose3d`.
Please note that
- `conv_transpose + relu` fusion is not supported. Remember to keep the `relu` node in the graph when lowering.
- `conv_transpose` requires a per-tensor quantization scheme for the weight. Use the default `qconfig_mappings` instead of the deprecated `qconfig_dict` for test cases (a minimal FX quantization sketch follows).
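
A minimal FX graph mode quantization sketch for a functional `conv_transpose2d`, assuming the default qconfig mapping selects a per-tensor weight observer for conv_transpose ops (the model and shapes are illustrative, not the PR's test case):

```
import torch
import torch.nn.functional as F
from torch.ao.quantization import get_default_qconfig_mapping
from torch.ao.quantization.quantize_fx import prepare_fx, convert_fx

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # conv_transpose2d weight layout: (in_channels, out_channels, kH, kW)
        self.weight = torch.nn.Parameter(torch.randn(3, 8, 2, 2))

    def forward(self, x):
        return F.conv_transpose2d(x, self.weight)

example_inputs = (torch.randn(1, 3, 16, 16),)
m = M().eval()
m = prepare_fx(m, get_default_qconfig_mapping(), example_inputs)
m(*example_inputs)   # calibrate
m = convert_fx(m)
```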

**Test plan**
python test/test_quantization.py -k test_conv_transpose_not_reference
python test/test_quantization.py -k test_conv_transpose_reference
python test/test_quantization.py -k test_conv_transpose_relu_not_reference
python test/test_quantization.py -k test_conv_transpose_relu_reference

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97126
Approved by: https://github.com/jgong5, https://github.com/jerryzh168
2023-03-31 07:17:29 +00:00
maxren
3a5ca4bdd4 [quant][pt2e] Add support for conv bn fusion in et backend config (#97389)
Batch Norm was supported by XNNPACK via fusion with the preceding convolution op. We do the same here by fusing across q -> dq nodes.

We must update the original pass to fuse the convolution weight/bias with the batch norm parameters; this way, quantization is supported for batch norm.
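
The parameter fusion itself follows the standard conv-BN folding formula (a generic sketch, not the executorch pass): W' = W * gamma / sqrt(var + eps) and b' = (b - mean) * gamma / sqrt(var + eps) + beta.

```
import torch

def fold_bn_into_conv(conv_w, conv_b, bn_rm, bn_rv, bn_w, bn_b, eps=1e-5):
    # scale each output channel by gamma / sqrt(running_var + eps)
    scale = bn_w * torch.rsqrt(bn_rv + eps)
    fused_w = conv_w * scale.reshape(-1, 1, 1, 1)
    fused_b = (conv_b - bn_rm) * scale + bn_b
    return fused_w, fused_b
```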

Differential Revision: [D43976324](https://our.internmc.facebook.com/intern/diff/D43976324/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97389
Approved by: https://github.com/salilsdesai
2023-03-31 05:33:42 +00:00
maxren
fe2bdfb2cd [Executorch][XNNPACK] Quantized mean (#97388)
Support quantized Mean.dim for XNNPACK.

Add another pattern to the quantized partitioner and a test to ensure the quantized operator works.

Differential Revision: [D43915706](https://our.internmc.facebook.com/intern/diff/D43915706/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97388
Approved by: https://github.com/salilsdesai
2023-03-31 05:08:53 +00:00
maxren
f9ca48ddb5 [Executorch][XNNPACK] Quantized hardtanh (#97387)
Lower Quantized Hardtanh to XNNPACK

Also add symmetric quantization support for hardtanh in executorch backend config

Differential Revision: [D43901222](https://our.internmc.facebook.com/intern/diff/D43901222/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97387
Approved by: https://github.com/salilsdesai
2023-03-31 04:58:24 +00:00
leslie-fang-intel
a6d8c70933 Init quantization backend config for inductor (#96476)
**Summary**
Initialize the backend config file with quantization recipes for the quantization 2.0 inductor path. In this PR, we only add recipes for `convolution` and `convolution_relu`.

**Test Plan**
```
clear && python -m pytest test_quantization.py -k test_inductor_backend_config_conv
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96476
Approved by: https://github.com/jgong5, https://github.com/EikanWang, https://github.com/jerryzh168
2023-03-22 07:56:56 +00:00
Jiaxu Zhu
08fb13db65 [Quant] Add lowering for pixel_unshuffle/narrow (#96160)
Summary:
torch.nn.functional.pixel_unshuffle and torch.narrow accept both float
and quantized inputs. However, previously we would unnecessarily
dequantize quantized inputs into floats before passing them to
the functions. This commit fixes this by lowering the patterns
[dequant - pixel_unshuffle - quant] and [dequant - narrow - quant].

Test Plan:
```
python test/test_quantization.py TestQuantizeFxOps.test_pixel_unshuffle
```

```
python test/test_quantization.py TestQuantizeFxOps.test_narrow
```

Differential Revision: D43858199

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96160
Approved by: https://github.com/andrewor14
2023-03-08 05:25:03 +00:00
Xia, Weiwen
f3c25cd348 [Quant][PT2.0] fix issues for rearranging weight observer for decomposed linear (#94296)
**Summary**
Linear is decomposed to `t - addmm/mm` after `dynamo.export`, and the weight's observer is initially inserted between `t` and `addmm/mm`. `_rearrange_weight_observer_for_addmm()` is then called to move the observer between the weight and `t`.
```
    before:
         weight - t - observer \
           input - observer - addmm/mm
    after:
         weight - observer - t \
           input - observer - addmm/mm
```
We found two issues with `_rearrange_weight_observer_for_addmm()`:
- It does not call `m.recompile()` at the end, so it does not function correctly.
- It does not support `aten.mm.default` which is from decomposed linear without bias.

This PR fixes the two issues and renames the function to `_rearrange_weight_observer_for_decomposed_linear`.

**Test plan**
python test/test_quantization.py -k test_rearrange_weight_observer_for_decomposed_linear

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94296
Approved by: https://github.com/jgong5, https://github.com/andrewor14
2023-03-03 15:54:11 +00:00
Jacob Szwejbka
fc324d3485 [quant][pt2e] Add support for dynamic quantization with symmetric quant for input (#94854)
Summary:
Previously we assumed asymmetric quantization for dynamic quantization; this diff adds support for symmetric quantization
of the input in dynamic quantization.
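
Conceptually, the difference is in how the input's quantization parameters are computed at runtime (a generic sketch, not executorch's lowered kernel): symmetric quantization pins the zero point at 0 and derives the scale from the maximum absolute value.

```
import torch

def dynamic_symmetric_quantize(x: torch.Tensor, qmin: int = -127, qmax: int = 127):
    scale = x.abs().max().clamp_min(1e-12) / qmax
    zero_point = 0  # symmetric: zero point fixed at 0
    q = torch.clamp(torch.round(x / scale), qmin, qmax).to(torch.int8)
    return q, scale, zero_point
```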

Test Plan: buck run executorch/exir/tests:quant_lowering_custom_backend_pass -- "executorch.exir.tests.test_quant_lowering_custom_backend_pass.TestQuantLoweringCustomBackendPass.test_quantized_linear_dynamic"

Reviewed By: digantdesai

Differential Revision: D43134794

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94854
Approved by: https://github.com/digantdesai
2023-02-28 19:39:31 +00:00
andrewor14
4fc277c338 [Quant] Add lowering for pixel_shuffle (#94769)
Summary: `torch.nn.functional.pixel_shuffle` accepts both float
and quantized inputs. However, previously we would unnecessarily
dequantize quantized inputs into floats before passing them to
the function. This commit fixes this by lowering the pattern
[dequant - pixel_shuffle - quant].

Test Plan:
python test/test_quantization.py TestQuantizeFxOps.test_pixel_shuffle

Reviewers: vkuzo

Subscribers: vkuzo, supriyar
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94769
Approved by: https://github.com/vkuzo
2023-02-17 23:11:17 +00:00
Vasiliy Kuznetsov
f15ab8a7f2 AO migration: replace torch internal callsites (#94170)
Summary:

Do the following renames:
`torch.quantization` -> `torch.ao.quantization`
`torch.nn.quantized` -> `torch.ao.nn.quantized`
`torch.nn.quantizable` -> `torch.ao.nn.quantizable`
`torch.nn.qat` -> `torch.ao.nn.qat`
`torch.nn.intrinsic` -> `torch.ao.nn.intrinsic`

And then, do
`torch.ao.nn.quantized._reference` -> `torch.ao.nn.quantized.reference` to clean up the aftermath of https://github.com/pytorch/pytorch/pull/84974

Then, manually update `test/test_module_init.py` to fix hanging whitespace due to the replace.

Run this script to do the replacements: https://gist.github.com/vkuzo/7f7afebf8c31b9ba48306223e68a1c82

This is for https://github.com/pytorch/pytorch/issues/81667

Test plan: CI
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94170
Approved by: https://github.com/jerryzh168
2023-02-07 02:32:23 +00:00
leslie-fang-intel
0f802eedc2 [Quant][FX] Lower QConvAddReLU2d for onednn backend (#91155)
**Summary**
Add quantization mappings for QConvAddReLU2d for int8 inference for the onednn backend. The fusion and lowering are supported only in FX mode.

**Test plan**
```
python -m pytest test_quantization.py -k test_fuse_conv_bn_add_relu_onednn
python -m pytest test_quantization.py -k test_fuse_conv_bn_add_relu_by_default
python -m pytest test_quantization.py -k test_fuse_conv_bn_add_relu_lowering
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91155
Approved by: https://github.com/jgong5, https://github.com/jerryzh168
2023-02-01 01:18:52 +00:00
leslie-fang-intel
ef4118e435 [Quant][FX] Lower QConvAdd2d for onednn backend (#91153)
**Summary**
Add quantization mappings for QConvAdd2d for int8 inference for the onednn backend. The fusion and lowering are supported only in FX mode.

**Test plan**
```
python -m pytest test_quantization.py -k test_fuse_conv_bn_add_relu_onednn
python -m pytest test_quantization.py -k test_fuse_conv_bn_add_relu_by_default
python -m pytest test_quantization.py -k test_fuse_conv_bn_add_relu_lowering
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91153
Approved by: https://github.com/jgong5, https://github.com/jerryzh168
2023-02-01 01:14:12 +00:00
Jerry Zhang
61457671a5 [quant][fx][be] Remove _input_output_observed from backend_config (#92589)
Summary:
This is no longer needed; we can use the dtype to decide whether an observer is needed.

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92589
Approved by: https://github.com/jcaip
2023-01-27 22:17:05 +00:00
Xia, Weiwen
6fa84fdea2 [FX][Quant] Enable FX quant for patterns like x.view(x.size(...), ...) (#90001)
**Summary**
This work continues with https://github.com/pytorch/pytorch/pull/83784 by @vkuzo and includes all the changes in that PR.
Quote from https://github.com/pytorch/pytorch/pull/83784:
> Issue #83658 reports that ops followed by a certain pattern of `view` and `size` ops were not quantized correctly by FX graph mode quantization.
Before this PR, the "size" op was in the "op shares qparams with input" category, and the code assumed that the input of this op has the same dtype as its output. This led to incorrectly propagating the `int` dtype as the output of whichever op was preceding the `view` op, which in turn made that op blocklisted from quantization.

> The fix is to create a new category of ops which work on different dtypes of tensors but are not observed. This PR does so for `size`, and also for `shape` since it works the same way.
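
A minimal reproduction of the kind of pattern from issue #83658 that this change lets FX graph mode quantization handle (an illustrative module, not the test case itself):

```
import torch

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(16, 16)

    def forward(self, x):
        x = self.linear(x)
        # size() produces ints; that must not block quantization of the linear above
        return x.view(x.size(0), -1)
```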

**Note**: This PR needs https://github.com/pytorch/pytorch/pull/91297 to be landed first; otherwise there is a unit-test failure.

**Test plan**
```
python test/test_quantization.py -k test_linear_size_view
python test/test_quantization.py -k test_linear_shape_view
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90001
Approved by: https://github.com/jgong5, https://github.com/jerryzh168
2023-01-27 07:56:29 +00:00
Jacob Szwejbka
eb32bb2ca6 [Executorch][Quantization] Backend Config for functional embedding (#92700)
Summary: title

Test Plan: ci

Differential Revision: D42643985

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92700
Approved by: https://github.com/jerryzh168
2023-01-24 03:12:56 +00:00
Jerry Zhang
ec3941ada6 [quant][fx] Add support for GRU in fx graph mode quantization (#91976)
Summary:
might be needed by a meta-internal use case

Test Plan:
python test/test_quantization.py TestQuantizeFxOps.test_rnn

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91976
Approved by: https://github.com/jcaip
2023-01-13 07:00:12 +00:00
andrewor14
0bd3fa3d22 [Quant][docs] Move parts of BackendConfig tutorial (#91999)
Summary: This commit moves the API specification section of
the BackendConfig tutorial to the docstrings, which is a more
suitable place for this content. This change also reduces some
duplication. There is no new content added in this change.

Reviewers: jerryzh168, vkuzo

Subscribers: jerryzh168, vkuzo
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91999
Approved by: https://github.com/vkuzo, https://github.com/jerryzh168
2023-01-13 05:59:22 +00:00
Vasiliy Kuznetsov
ebb7f20afc quant: make various configs printable (#91419)
Summary:

Makes various quantization configs print out human-readable values instead
of just the class name. This is useful when printing these configs out while
debugging.

Test plan:

test script
```
conf_1 = torch.ao.quantization.backend_config.backend_config.DTypeConfig()
print(conf_1)

conf_2 = torch.ao.quantization.backend_config.backend_config.BackendConfig()
print(conf_2)

conf_3 = torch.ao.quantization.backend_config.backend_config.BackendPatternConfig()
print(conf_3)

conf_4 = torch.ao.quantization.fx.custom_config.PrepareCustomConfig()\
    .set_input_quantized_indexes([0])
print(conf_4)

conf_5 = torch.ao.quantization.fx.custom_config.ConvertCustomConfig()\
    .set_preserved_attributes(['foo'])
print(conf_5)

conf_6 = torch.ao.quantization.fx.custom_config.FuseCustomConfig()\
    .set_preserved_attributes(['foo'])
print(conf_6)
```

test script output
```
DTypeConfig(input_dtype_with_constraints=DTypeWithConstraints(dtype=None, quant_min_lower_bound=None,
  quant_max_upper_bound=None, scale_min_lower_bound=None, scale_max_upper_bound=None,
  scale_exact_match=None, zero_point_exact_match=None),
output_dtype_with_constraints=DTypeWithConstraints(dtype=None, quant_min_lower_bound=None,
  quant_max_upper_bound=None, scale_min_lower_bound=None, scale_max_upper_bound=None,
  scale_exact_match=None, zero_point_exact_match=None),
weight_dtype_with_constraints=DTypeWithConstraints(dtype=None, quant_min_lower_bound=None,
  quant_max_upper_bound=None, scale_min_lower_bound=None, scale_max_upper_bound=None,
  scale_exact_match=None, zero_point_exact_match=None),
bias_dtype=None, is_dynamic=None)
BackendConfig({'name': '', '_pattern_complex_format_to_config': {}})
BackendPatternConfig({'observation_type': <ObservationType.OUTPUT_USE_DIFFERENT_OBSERVER_AS_INPUT: 0>})
PrepareCustomConfig({'input_quantized_indexes': [0]})
ConvertCustomConfig({'preserved_attributes': ['foo']})
FuseCustomConfig({'preserved_attributes': ['foo']})
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91419
Approved by: https://github.com/andrewor14
2023-01-04 04:52:20 +00:00
joncrall
ad782ff7df Enable xdoctest runner in CI for real this time (#83816)
Builds on #83317 and enables running the doctests. Just need to figure out what is causing the failures.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/83816
Approved by: https://github.com/ezyang, https://github.com/malfet
2022-12-29 05:32:42 +00:00
Jerry Zhang
2a23dfe8ed [quant] Support lowering for quantized embedding byte operator (#91159)
Summary: This PR adds lowering for the quantized embedding byte operator in the executorch flow

Test Plan: buck run executorch/exir/tests:quant_fusion_pass -- "executorch.exir.tests.test_quant_fusion_pass.TestQuantFusionPass.test_embedding_byte"

Reviewed By: qihqi

Differential Revision: D41673139

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91159
Approved by: https://github.com/vkuzo
2022-12-21 22:52:24 +00:00
Xia, Weiwen
a5eb564ba4 [Quant] lower fused LinearTanh for onednn backend (#89188)
**Summary**
Add fuser method and quantization mappings for `QLinearLeakyReLU` for int8 inference for onednn backend. The fusion and lowering are supported only in FX mode.

**Test plan**
python test_quantization.py TestFuseFx TestQuantizeFx

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89188
Approved by: https://github.com/jgong5, https://github.com/jerryzh168
2022-12-20 01:30:21 +00:00
Xia, Weiwen
7b0ec67e34 [Quant][FX] Add backend config for onednn backend and fuse Linear-LeakyReLU (#88665)
**Summary**
Add a backend config for the onednn backend so that it can support more post-op fusions for int8 inference. First, `Linear - LeakyReLU` fusion is implemented based on previous PRs.
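
A minimal sketch of using the new backend config in FX graph mode quantization so the `Linear - LeakyReLU` fusion can apply, assuming the public `get_onednn_backend_config` and `get_default_qconfig_mapping("onednn")` entry points:

```
import torch
from torch.ao.quantization import get_default_qconfig_mapping
from torch.ao.quantization.backend_config import get_onednn_backend_config
from torch.ao.quantization.quantize_fx import prepare_fx, convert_fx

model = torch.nn.Sequential(torch.nn.Linear(8, 8), torch.nn.LeakyReLU()).eval()
example_inputs = (torch.randn(2, 8),)

backend_config = get_onednn_backend_config()
prepared = prepare_fx(model, get_default_qconfig_mapping("onednn"), example_inputs,
                      backend_config=backend_config)
prepared(*example_inputs)  # calibrate
quantized = convert_fx(prepared, backend_config=backend_config)
```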

**Test plan**
python test_quantization.py TestFuseFx

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88665
Approved by: https://github.com/jgong5, https://github.com/jerryzh168
2022-12-17 03:33:08 +00:00
Jerry Zhang
f7b384cc46 [reland][quant][pt2e] Add early prototype top level quantize_pt2e APIs (#91035)
Summary:
This PR introduces the top level APIs for quantization support in PyTorch 2.0 Export stack
* torch.ao.quantization.quantize_pt2e.prepare_pt2e
Takes a model that is captured by the PyTorch 2.0 export (torchdynamo full graph mode) and prepares the model for calibration
for post training quantization

* torch.ao.quantization.quantize_pt2e.convert_pt2e
Takes a calibrated model and converts that to a reference quantized model that can be lowered later to quantized operator libraries or delegation modules

Also added a backend config for the qnnpack_pt2e backend:
* torch.ao.quantization.backend_config.get_qnnpack_pt2e_backend_config

Note: everything related to quantize_pt2e is experimental (prototype), and we don't have any BC guarantees
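
A rough sketch of the flow described above, based only on this message; the prototype signatures changed in later releases, so the argument list of prepare_pt2e and the capture call are assumptions here rather than the exact API:

```
import torch
import torch._dynamo as torchdynamo
from torch.ao.quantization import get_default_qconfig_mapping
from torch.ao.quantization.quantize_pt2e import prepare_pt2e, convert_pt2e
from torch.ao.quantization.backend_config import get_qnnpack_pt2e_backend_config

model = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3), torch.nn.ReLU()).eval()
example_inputs = (torch.randn(1, 3, 32, 32),)

# capture with the PyTorch 2.0 export (torchdynamo full-graph mode)
captured, _ = torchdynamo.export(model, *example_inputs, aten_graph=True)

prepared = prepare_pt2e(captured, get_default_qconfig_mapping("qnnpack"),
                        example_inputs, get_qnnpack_pt2e_backend_config())
prepared(*example_inputs)           # calibration
converted = convert_pt2e(prepared)  # reference quantized model, ready for lowering
```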

Test Plan:
python test/test_quantization.py TestQuantizePT2EModels

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91035
Approved by: https://github.com/HDCharles
2022-12-17 02:15:53 +00:00
PyTorch MergeBot
ad1b04c4a9 Revert "[reland][quant][pt2e] Add early prototype top level quantize_pt2e APIs (#90971)"
This reverts commit 7dd5e55497.

Reverted https://github.com/pytorch/pytorch/pull/90971 on behalf of https://github.com/ezyang due to still broke tons of master jobs sorry
2022-12-16 09:29:39 +00:00
Jerry Zhang
7dd5e55497 [reland][quant][pt2e] Add early prototype top level quantize_pt2e APIs (#90971)
Summary:
This PR introduces the top level APIs for quantization support in PyTorch 2.0 Export stack
* torch.ao.quantization.quantize_pt2e.prepare_pt2e
Takes a model that is captured by the PyTorch 2.0 export (torchdynamo full graph mode) and prepares the model for calibration
for post training quantization

* torch.ao.quantization.quantize_pt2e.convert_pt2e
Takes a calibrated model and converts that to a reference quantized model that can be lowered later to quantized operator libraries or delegation modules

Also added a backend config for the qnnpack_pt2e backend:
* torch.ao.quantization.backend_config.get_qnnpack_pt2e_backend_config

Note: everything related to quantize_pt2e is experimental (prototype), and we don't have any BC guarantees

Test Plan:
python test/test_quantization.py TestQuantizePT2EModels

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90971
Approved by: https://github.com/HDCharles
2022-12-16 06:24:28 +00:00
PyTorch MergeBot
9c912c7dd0 Revert "[quant][pt2e] Add early prototype top level quantize_pt2e APIs (#90802)"
This reverts commit a66af1feba.

Reverted https://github.com/pytorch/pytorch/pull/90802 on behalf of https://github.com/malfet due to somehow broke test_resnet18 (quantization.fx.test_quantize_pt2e.TestQuantizePT2EModels), see a66af1feba
2022-12-15 23:28:21 +00:00
Jerry Zhang
a66af1feba [quant][pt2e] Add early prototype top level quantize_pt2e APIs (#90802)
Summary:
This PR introduces the top level APIs for quantization support in PyTorch 2.0 Export stack
* torch.ao.quantization.quantize_pt2e.prepare_pt2e
Takes a model that is captured by the PyTorch 2.0 export (torchdynamo full graph mode) and prepares the model for calibration
for post training quantization

* torch.ao.quantization.quantize_pt2e.convert_pt2e
Takes a calibrated model and converts that to a reference quantized model that can be lowered later to quantized operator libraries or delegation modules

Also added a backend config for the qnnpack_pt2e backend:
* torch.ao.quantization.backend_config.get_qnnpack_pt2e_backend_config

Note: everything related to quantize_pt2e is experimental (prototype), and we don't have any BC guarantees

Test Plan:
python test/test_quantization.py TestQuantizePT2EModels

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90802
Approved by: https://github.com/qihqi
2022-12-15 21:50:29 +00:00
andrewor14
691a44f403 [Quant][fx][bc-breaking] Add simpler BackendConfig pattern format (#90698)
Summary: The existing BackendConfig fusion pattern
uses a "reversed nested tuple" format that is highly
unintuitive. For example,
```
linear-relu -> (nn.ReLU, nn.Linear)
conv-bn-relu -> (nn.ReLU, (nn.BatchNorm2d, nn.Conv2d))
```
This pattern format also complicates the signatures
of the user specified "fuser methods", which needed
to accept arguments in reverse nested order to match
the patterns:
```
def fuse_linear_relu(is_qat, relu, linear):
    ...

def fuse_conv_bn_relu(is_qat, relu, bn_conv):
    (bn, conv) = bn_conv
    ...
```
Instead, this commit introduces a new pattern format that
simply specifies the ops in forward order with no nesting:
```
linear-relu -> (nn.Linear, nn.ReLU)
conv-bn-relu -> (nn.Conv2d, nn.BatchNorm2d, nn.ReLU)

def fuse_linear_relu(is_qat, linear, relu):
    ...

def fuse_conv_bn_relu(is_qat, conv, bn, relu):
    ...
```
Note that the legacy "reversed nested tuple" format is still
used internally since it is more general. In the
future, we should replace it with the format used in
the subgraph rewriter in `torch.fx`, and simplify the
existing pattern matching code to handle the new
format added in this commit.

BC-breaking Notes:

Before:
```
import torch.nn as nn
import torch.ao.nn.intrinsic as nni
from torch.ao.quantization.backend_config import BackendPatternConfig

def fuse_conv_bn_relu(is_qat, relu, bn_conv):
    (bn, conv) = bn_conv
    return nni.ConvBnReLU2d(conv, bn, relu)

config = BackendPatternConfig((nn.ReLU, (nn.BatchNorm2d, nn.Conv2d))) \
    .set_dtype_configs(...) \
    .set_fuser_method(fuse_conv_bn_relu) \
    .set_fused_module(nni.ConvBnReLU2d)
```

After:
```
def fuse_conv_bn_relu(is_qat, conv, bn, relu):
    return nni.ConvBnReLU2d(conv, bn, relu)

config = BackendPatternConfig((nn.Conv2d, nn.BatchNorm2d, nn.ReLU)) \
    .set_dtype_configs(...) \
    .set_fuser_method(fuse_conv_bn_relu) \
    .set_fused_module(nni.ConvBnReLU2d)
```

OR (for backward-compatibility)

```
def fuse_conv_bn_relu(is_qat, relu, bn_conv):
    (bn, conv) = bn_conv
    return nni.ConvBnReLU2d(conv, bn, relu)

config = BackendPatternConfig() \
    ._set_pattern_complex_format((nn.ReLU, (nn.BatchNorm2d, nn.Conv2d))) \
    .set_dtype_configs(...) \
    .set_fuser_method(fuse_conv_bn_relu) \
    .set_fused_module(nni.ConvBnReLU2d) \
    ._set_use_legacy_pattern_format(True)
```

Before:
```
backend_config.configs  # returns Dict[Pattern, BackendPatternConfig]
```

After:
```
backend_config.configs  # returns List[BackendPatternConfig]
```

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
python test/test_quantization.py TestBackendConfig

Reviewers: jerryzh168, vkuzo

Subscribers: jerryzh168, vkuzo

Differential Revision: [D41954553](https://our.internmc.facebook.com/intern/diff/D41954553)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90698
Approved by: https://github.com/vkuzo, https://github.com/jerryzh168
2022-12-14 22:44:29 +00:00