Commit Graph

27 Commits

Author SHA1 Message Date
Jerry Zhang
1b51d29b66 [quant][pt2e] Enable constant folding for quantize ops (#109343)
Summary:
This PR added constant folding for quantize ops so that, instead of storing fp32 weights in the
quantized model, we get int8/int16 etc. weights.
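
A rough sketch of the flow this lands in, assuming the PT2E entry points of this era (`capture_pre_autograd_graph` has since moved around); illustrative, not the exact code in the PR:
```
import torch
from torch._export import capture_pre_autograd_graph
from torch.ao.quantization.quantize_pt2e import prepare_pt2e, convert_pt2e
from torch.ao.quantization.quantizer.xnnpack_quantizer import (
    XNNPACKQuantizer,
    get_symmetric_quantization_config,
)

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(8, 8)

    def forward(self, x):
        return self.linear(x)

example_inputs = (torch.randn(1, 8),)
m = capture_pre_autograd_graph(M(), example_inputs)

quantizer = XNNPACKQuantizer().set_global(get_symmetric_quantization_config())
m = prepare_pt2e(m, quantizer)
m(*example_inputs)  # calibrate
m = convert_pt2e(m)
# With folding, the converted graph stores the already-quantized integer
# weight constant instead of an fp32 weight followed by a quantize op.
```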

Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_fold_quantize

also will verify in executorch later

Differential Revision: [D49399210](https://our.internmc.facebook.com/intern/diff/D49399210)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109343
Approved by: https://github.com/kimishpatel, https://github.com/jgong5
2023-09-27 06:04:45 +00:00
Kimish Patel
73ac814148 [Pytorch][quant] Move xnnpack quantizer to use aten.linear (#109254)
Summary:
Now that quantization works on the pre-dispatch aten IR, moving to the full set
of aten ops is OK. Plus, when tracing models like ViT, the linear
projections of k, q, v use functional.linear rather than nn.Linear,
which results in not being able to extract the nodes corresponding to linear.
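
In the pre-dispatch aten IR, both call sites lower to the same op, so a matcher can key on it directly; a minimal sketch, assuming a captured `torch.fx.GraphModule`:
```
import torch

def find_linear_nodes(gm: torch.fx.GraphModule):
    """Nodes that lower to aten.linear, covering both nn.Linear and
    functional.linear call sites (e.g. the k/q/v projections in ViT)."""
    return [
        n for n in gm.graph.nodes
        if n.op == "call_function" and n.target is torch.ops.aten.linear.default
    ]
```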

Test Plan:
quant tests

Differential Revision: [D49252194](https://our.internmc.facebook.com/intern/diff/D49252194)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109254
Approved by: https://github.com/jerryzh168
2023-09-18 20:26:44 +00:00
Jerry Zhang
b01b934aca [quant][be] Cleanup xnnpack_quantizer implementation (#108921)
Summary:
As titled.

Test Plan:
python test/test_quantization.py TestQuantizePT2E

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108921
Approved by: https://github.com/andrewor14
2023-09-12 19:28:41 +00:00
Jerry Zhang
241e84bf98 [quant][be] Rewrite xnnpack_quantizer_utils.py to use decorators (#108920)
Summary:
As titled.
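
The registration pattern is broadly a decorator-populated table; a hedged sketch (names modeled on `OP_TO_ANNOTATOR` in xnnpack_quantizer_utils, details simplified):
```
from typing import Callable, Dict

OP_TO_ANNOTATOR: Dict[str, Callable] = {}

def register_annotator(op: str):
    def decorator(annotator: Callable) -> Callable:
        OP_TO_ANNOTATOR[op] = annotator
        return annotator
    return decorator

@register_annotator("linear")
def _annotate_linear(gm, quantization_config, filter_fn=None):
    ...  # match aten.linear nodes and attach annotations
```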

Test Plan:
python test/test_quantization.py TestQuantizePT2E

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108920
Approved by: https://github.com/kimishpatel
2023-09-12 00:09:13 +00:00
Jerry Zhang
b0de6a8002 [quant][executorch] Support inception_v4 in examples (#108382)
Summary: Verified that pt2e quant flow matches the fx flow with executorch backend config

Test Plan:
with-proxy buck2 run executorch/examples/quantization:example -- -m=ic4 --verify

```
[INFO 2023-08-31 16:08:06,923 example.py:77] prepare sqnr: inf
[INFO 2023-08-31 16:08:06,932 example.py:81] quant diff max: 0.0
[INFO 2023-08-31 16:08:06,936 example.py:85] quant sqnr: inf
```

full output: https://www.internalfb.com/intern/paste/P818520579/
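
For reference, sqnr here is the signal-to-quantized-noise ratio; a hypothetical helper showing the measurement (inf means the outputs match exactly):
```
import torch

def sqnr(signal: torch.Tensor, quantized: torch.Tensor) -> torch.Tensor:
    noise = signal - quantized
    return 10 * torch.log10(signal.pow(2).mean() / noise.pow(2).mean())
```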

Differential Revision: D48889075

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108382
Approved by: https://github.com/kimishpatel
2023-09-08 17:39:31 +00:00
Jerry Zhang
32a16d4999 [quant][pt2e] Support int16 quantization (#108453)
Summary:
Previously we could only use native PyTorch int dtypes that have corresponding quantized dtypes (e.g. quint8, qint8). This
PR removes that assumption in observers/fake_quants so that users can use all PyTorch native dtypes (except int64, which we can add later if needed);
the main addition here is int16.
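
A hedged example of what this enables: an int16 activation spec built from a plain observer (field values are illustrative):
```
import torch
from torch.ao.quantization.observer import MinMaxObserver
from torch.ao.quantization.quantizer import QuantizationSpec

int16_spec = QuantizationSpec(
    dtype=torch.int16,  # previously rejected: no corresponding "quantized" dtype
    quant_min=-(2**15),
    quant_max=2**15 - 1,
    qscheme=torch.per_tensor_affine,
    is_dynamic=False,
    observer_or_fake_quant_ctr=MinMaxObserver.with_args(eps=2**-12),
)
```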

Test Plan:
python test/test_quantization.py TestQuantizePT2E

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108453
Approved by: https://github.com/kimishpatel
2023-09-06 19:31:20 +00:00
Kimish Patel
eb67c452c8 [Quant] Add DQ duplication pass (#107900)
Summary:
During the convert step, observers are first replaced by a Q-DQ pair. In some
scenarios, like the following, the output DQ has a fan-out.

                 ---> OP2 -> Q -> DQ
                /
OP -> Q -> DQ -
                \
                 ---> OP3 -> Q -> DQ

If either OP2 or OP3 is configured to be quantized, then its input
is expected to be quantized. In that case, the quantized equivalent of a
pattern that the quantizer asked to be quantized should look like
[DQ -> {pattern} -> Q]. However, in a scenario like the above, where the DQ node
is shared between multiple "quantized" patterns, the boundary of a "quantized"
pattern is not clear, because the DQ now belongs to multiple quantized
patterns.

This poses challenges for:
- Porting metadata: it is unclear which "quantized" partition the DQ node belongs to.
- The quantized representation, which equivalently needs to identify a
self-contained quantized pattern that can be replaced by its equivalent pattern
capturing the compute in the quantized precision.
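
A minimal sketch of the duplication itself, assuming an FX graph where `dq_node` is the shared dequantize (the actual pass carries more bookkeeping):
```
import torch

def duplicate_dq(gm: torch.fx.GraphModule, dq_node: torch.fx.Node) -> None:
    # Give each consumer its own DQ copy so every quantized pattern owns a
    # private [DQ -> {pattern} -> Q] boundary.
    for user in list(dq_node.users):
        with gm.graph.inserting_after(dq_node):
            new_dq = gm.graph.node_copy(dq_node)
        user.replace_input_with(dq_node, new_dq)
    gm.graph.erase_node(dq_node)
    gm.recompile()
```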

Test Plan:
test_duplicate_dq_pass

Differential Revision: [D48663147](https://our.internmc.facebook.com/intern/diff/D48663147)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107900
Approved by: https://github.com/jerryzh168, https://github.com/andrewor14, https://github.com/leslie-fang-intel
ghstack dependencies: #107105, #107106, #107899
2023-09-02 06:20:03 +00:00
Kimish Patel
37b0d76e35 [Quantization] Make annotation util functions return annotated nodes (#107106)
Summary:
Having annotation functions return the nodes they annotated is useful,
specifically for adding "quantization_tag" to those nodes.
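
A hedged sketch of the calling pattern this enables (helper name and tag value are hypothetical):
```
def tag_partition(annotated_nodes, tag: str) -> None:
    # Annotation utils now return the nodes they touched, so callers can
    # stamp a partition tag onto each one.
    for node in annotated_nodes:
        node.meta["quantization_tag"] = tag
```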

Test Plan:
CI

Differential Revision: [D48488415](https://our.internmc.facebook.com/intern/diff/D48488415)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107106
Approved by: https://github.com/jerryzh168
ghstack dependencies: #107105
2023-09-02 06:19:55 +00:00
Kimish Patel
99168c1fa9 [Quant] Use input_qspec_map for weight quantization of linear (#107105)
Summary:
In preparation for the metadata porting diff, it is required that weight
quant annotation happens via edge quantization, i.e. input_qspec_map.

Reason: metadata is ported by associating a DQ node's metadata with its
consumer, and a Q node's metadata with its producer. Furthermore, such
porting must be qualified via user intent, i.e. checking whether the
consumer of the DQ, or the producer of the Q, actually specified an
intent to quantize.

By making the quantization annotation on the linear node's weight via
input_qspec_map, we can associate the DQ of [weight -> Q -> DQ]
with the linear module.
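
A hedged sketch of edge-based weight annotation (the specs are assumed to be built elsewhere; field names follow the pt2e QuantizationAnnotation API):
```
from torch.ao.quantization.quantizer import QuantizationAnnotation

def annotate_linear(linear_node, act_spec, weight_spec, out_spec) -> None:
    input_act, weight = linear_node.args[0], linear_node.args[1]
    linear_node.meta["quantization_annotation"] = QuantizationAnnotation(
        # Weight is annotated as an input edge (weight -> linear) rather
        # than as a standalone node, so DQ metadata ports to the consumer.
        input_qspec_map={input_act: act_spec, weight: weight_spec},
        output_qspec=out_spec,
        _annotated=True,
    )
```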

Test Plan:
CI

Differential Revision: [D48488414](https://our.internmc.facebook.com/intern/diff/D48488414)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107105
Approved by: https://github.com/jerryzh168
2023-09-02 06:19:50 +00:00
Jerry Zhang
a9fe0b5b74 [quant][pt2e] Move propagate_annotation from quant flow to quantizer (#108320)
Summary:
Previously we ran propagate_annotation by default in the quantization flow, to propagate annotations for ops like reshape, view, etc.

Not all quantizers need this, so we moved it to xnnpack_quantizer_utils for now.
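
What the propagation amounts to, as a hedged sketch: for "transparent" ops, the output (and input) reuse the producer's observer via SharedQuantizationSpec (op list abbreviated):
```
import torch
from torch.ao.quantization.quantizer import (
    QuantizationAnnotation,
    SharedQuantizationSpec,
)

_SHARE_OBS_OPS = {
    torch.ops.aten.view.default,
    torch.ops.aten.reshape.default,
}

def propagate_annotation(gm: torch.fx.GraphModule) -> None:
    for node in gm.graph.nodes:
        if node.op == "call_function" and node.target in _SHARE_OBS_OPS:
            prev = node.args[0]
            shared = SharedQuantizationSpec((prev, node))
            node.meta["quantization_annotation"] = QuantizationAnnotation(
                input_qspec_map={prev: shared},
                output_qspec=shared,
                _annotated=True,
            )
```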

Next steps:
* make the propagate_annotation function configurable with a custom list of ops
* remove unneeded ops in `_is_share_obs_or_fq_op`

Test Plan:
python test/test_quantization.py TestQuantizePT2E

Differential Revision: [D48856985](https://our.internmc.facebook.com/intern/diff/D48856985)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108320
Approved by: https://github.com/kimishpatel
2023-09-01 01:49:19 +00:00
leslie-fang-intel
fb808c30c7 x86_inductor_quantizer switches to new graph capture API (#108214)
**Summary**
Update `X86InductorQuantizer` and the related test cases to the new graph capture API, `capture_pre_autograd_graph`.
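
A minimal sketch of the new entry point (the model is a stand-in):
```
import torch
from torch._export import capture_pre_autograd_graph

model = torch.nn.Conv2d(3, 8, 3).eval()
example_inputs = (torch.randn(1, 3, 16, 16),)
exported_model = capture_pre_autograd_graph(model, example_inputs)
```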

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108214
Approved by: https://github.com/jgong5, https://github.com/jerryzh168
2023-09-01 00:43:45 +00:00
Jerry Zhang
9ae3d7ca90 [reland][quant][pt2e][xnnpack_quantizer] Add support for mul and mul_relu (#107930) (#107992)
Summary: As titled.

Test Plan: buck2 run executorch/examples/quantization:example -- -m=mv3 --verify

Differential Revision: D48588121

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107992
Approved by: https://github.com/digantdesai, https://github.com/mcr229
2023-08-27 14:50:03 +00:00
Xia, Weiwen
e9b0f62a19 [Quant][PT2E] Enable linear and linear-unary post-op quant recipe for x86 inductor quantizer (#106781)
**Summary**
Add a linear and linear-unary post-op quantization recipe to the X86 inductor quantizer, for PT2E with Inductor. With this, the quantization path will add the `quant-dequant` pattern for linear and linear-unary post ops.
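
A hedged end-to-end sketch with a linear + ReLU module (API locations as of this era):
```
import torch
from torch._export import capture_pre_autograd_graph
from torch.ao.quantization.quantize_pt2e import prepare_pt2e, convert_pt2e
import torch.ao.quantization.quantizer.x86_inductor_quantizer as xiq

class LinearReLU(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(16, 16)
        self.relu = torch.nn.ReLU()

    def forward(self, x):
        return self.relu(self.linear(x))

example_inputs = (torch.randn(2, 16),)
m = capture_pre_autograd_graph(LinearReLU().eval(), example_inputs)

quantizer = xiq.X86InductorQuantizer()
quantizer.set_global(xiq.get_default_x86_inductor_quantization_config())
m = prepare_pt2e(m, quantizer)
m(*example_inputs)   # calibrate
m = convert_pt2e(m)  # inserts the quant-dequant pattern around linear(+relu)
```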

**Test plan**
python test/test_quantization.py -k test_linear_with_quantizer_api
python test/test_quantization.py -k test_linear_unary_with_quantizer_api

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106781
Approved by: https://github.com/leslie-fang-intel, https://github.com/jgong5, https://github.com/jerryzh168
ghstack dependencies: #105818
2023-08-27 10:50:17 +00:00
leslie-fang-intel
1147a28b0b [Quant][PT2E] Add cat and avg_pool2d recipe into x86InductorQuantizer (#106836)
**Summary**
Add `cat` and `avg_pool2d` quantization recipes, annotated as input/output share observer ops, into `X86InductorQuantizer`.
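
A hedged sketch of the share-observer annotation for `cat` (the real recipe also handles the same-input corner cases):
```
from torch.ao.quantization.quantizer import (
    QuantizationAnnotation,
    SharedQuantizationSpec,
)

def annotate_cat(cat_node, act_spec) -> None:
    inputs = cat_node.args[0]  # the list of tensors being concatenated
    first = inputs[0]
    shared = SharedQuantizationSpec((first, cat_node))
    input_qspec_map = {first: act_spec}
    for inp in inputs[1:]:
        input_qspec_map[inp] = shared  # every input shares one observer
    cat_node.meta["quantization_annotation"] = QuantizationAnnotation(
        input_qspec_map=input_qspec_map,
        output_qspec=shared,  # and the output shares it too
        _annotated=True,
    )
```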

**Test Plan**
```
clear && python -m pytest test_x86inductor_quantizer.py -k test_cat_recipe
clear && python -m pytest test_x86inductor_quantizer.py -k test_cat_recipe_same_inputs
clear && python -m pytest test_x86inductor_quantizer.py -k test_cat_recipe_single_input
clear && python -m pytest test_x86inductor_quantizer.py -k test_avg_pool2d_recipe
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106836
Approved by: https://github.com/jgong5, https://github.com/jerryzh168
2023-08-26 16:51:13 +00:00
leslie-fang-intel
70ca18f8a0 [Quant][PT2E] Enable X86InductorQuantizer single quantizable op (maxpool2d) (#105639)
**Summary**
In this PR, we mainly enable two things:

- Enable the skeleton of the quantization recipe for single quantizable operators in `X86InductorQuantizer`.
- Add the quantization recipe for `maxpool2d` and annotate it as an input/output share observer op.

**Test Plan**
```
python -m pytest test_x86inductor_quantizer.py -k test_maxpool2d_recipe
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105639
Approved by: https://github.com/jgong5, https://github.com/jerryzh168
ghstack dependencies: #104580, #104581, #104588, #104590, #105455, #105456
2023-08-26 08:34:15 +00:00
andrewor14
240bdbea61 [quant][pt2e] Fix annotation for conv no bias case (#107971)
Summary: This fixes the no-bias case for conv annotations.
Previously this would result in an index out of bounds, since
the new aten.conv2d op may not have the bias arg (unlike the
old aten.convolution op). This was not caught earlier because of
a lack of test cases, which are added in this commit.
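
The guard amounts to treating bias as optional; a hedged sketch:
```
import torch

def get_conv_bias(conv_node: torch.fx.Node):
    # aten.conv2d carries bias as an optional third argument; unlike the old
    # aten.convolution signature it may simply be absent, so bound-check first.
    if len(conv_node.args) > 2:
        return conv_node.args[2]  # may still be None
    return None
```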

Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_qat_conv_no_bias
python test/test_quantization.py TestQuantizePT2E.test_qat_conv_bn_relu_fusion_no_conv_bias

Reviewers: jerryzh168, kimishpatel

Subscribers: jerryzh168, kimishpatel

Differential Revision: [D48696874](https://our.internmc.facebook.com/intern/diff/D48696874)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107971
Approved by: https://github.com/jerryzh168
2023-08-26 01:01:54 +00:00
PyTorch MergeBot
8d44b0f5a5 Revert "[quant][pt2e][xnnpack_quantizer] Add support for mul and mul_relu (#107930)"
This reverts commit 1d1739dc6d.

Reverted https://github.com/pytorch/pytorch/pull/107930 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/107930#issuecomment-1694069330))
2023-08-26 00:37:02 +00:00
Jerry Zhang
1d1739dc6d [quant][pt2e][xnnpack_quantizer] Add support for mul and mul_relu (#107930)
Summary: As titled.

Test Plan: buck2 run executorch/examples/quantization:example -- -m=mv3 --verify

Differential Revision: D48588121

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107930
Approved by: https://github.com/kimishpatel
2023-08-25 23:36:19 +00:00
Jerry Zhang
a0cfaf0688 [quant][pt2e] Make sure XNNPACKQuantizer works with pre_dispatch=True (#107872)
Summary: As titled.

Test Plan:
```
buck test //executorch/backends/xnnpack/test:test_xnnpack_quantized_models -- test_resnet18

buck2 test 'fbcode//mode/opt' fbcode//caffe2/test:quantization_pt2e
```

Reviewed By: andrewor14, tugsbayasgalan

Differential Revision: D48415977

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107872
Approved by: https://github.com/andrewor14
2023-08-25 05:04:01 +00:00
Jerry Zhang
16fcb07846 [quant][pt2e] Add support for channel in DerivedQuantizationSpec (#107833)
Summary:
As titled.
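
The canonical use is deriving per-channel bias qparams from the activation and weight observers; a hedged sketch (the edge layout is assumed from a linear/conv node):
```
import torch
from torch.ao.quantization.quantizer import DerivedQuantizationSpec

def _derive_bias_qparams(obs_or_fqs):
    act_scale, _ = obs_or_fqs[0].calculate_qparams()
    weight_scale, _ = obs_or_fqs[1].calculate_qparams()  # per-channel tensor
    bias_scale = (act_scale * weight_scale).flatten()
    return bias_scale, torch.zeros_like(bias_scale, dtype=torch.int64)

def bias_spec_for(node):
    input_act, weight = node.args[0], node.args[1]
    return DerivedQuantizationSpec(
        derived_from=[(input_act, node), (weight, node)],
        derive_qparams_fn=_derive_bias_qparams,
        dtype=torch.int32,
        quant_min=-(2**31),
        quant_max=2**31 - 1,
        ch_axis=0,  # the newly supported per-channel axis
        qscheme=torch.per_channel_symmetric,
    )
```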

Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_derived_qspec_per_channel

Differential Revision: [D48630535](https://our.internmc.facebook.com/intern/diff/D48630535)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107833
Approved by: https://github.com/andrewor14
2023-08-24 07:45:13 +00:00
Jerry Zhang
28be2c674a [quant][pt2e] Move specific quantizer related things outside of main quant code base (#106806) (#107259)
Summary:

Currently in quantizer/quantize_pt2e we import things from specific quantizers (XNNPACKQuantizer, QuantizationConfig, etc.);
this PR removes them so it's clearer that they are not part of the core quantization code base.

This PR also removed get_supported_operators from the main Quantizer, since we haven't seen a clear need for this API.

Test Plan:
CIs

Imported from OSS

Differential Revision: D48340367

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107259
Approved by: https://github.com/kimishpatel
2023-08-18 21:29:09 +00:00
Jerry Zhang
4afab40b56 [quant][pt2e] Removed mean/hardtanh annotations and refactored adaptive_avg_pool annotation (#106805)
Summary:
Removed annotations for some ops, since they are handled in torch/ao/quantization/pt2e/_propagate_annotation.py

Test Plan:
CIs

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106805
Approved by: https://github.com/kimishpatel
2023-08-10 04:51:06 +00:00
Jerry Zhang
e1a1780626 [quant][pt2e] Move annotate functions in XNNPACKQuantizer to utils (#106642)
Summary:
This allows these annotate functions to be shared by other quantizers, so that writing a new quantizer is easier.

Note that these annotation functions will be maintained by XNNPACKQuantizer developers instead of the AO team.

Test Plan:
python test/test_quantization.py TestQuantizePT2E

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106642
Approved by: https://github.com/andrewor14
2023-08-09 18:52:39 +00:00
Jerry Zhang
d528a137e0 [quant][pt2e][quantizer] Support set_module_type in XNNPACKQuantizer (#106094)
Summary:
Added support for users to set configurations based on module type in XNNPACKQuantizer; this can also serve as an example
for implementing new quantizers.
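
A hedged usage sketch (config choices are illustrative):
```
import torch
from torch.ao.quantization.quantizer.xnnpack_quantizer import (
    XNNPACKQuantizer,
    get_symmetric_quantization_config,
)

quantizer = XNNPACKQuantizer()
quantizer.set_global(get_symmetric_quantization_config())
# Override the global config for every nn.Linear, regardless of name:
quantizer.set_module_type(
    torch.nn.Linear, get_symmetric_quantization_config(is_per_channel=True)
)
```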

Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_xnnpack_quantizer_set_module_type

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106094
Approved by: https://github.com/andrewor14
ghstack dependencies: #106087
2023-08-02 08:33:58 +00:00
Jerry Zhang
92a22a8098 [quant][pt2e][quantizer] Support set_module_name in XNNPACKQuantizer (#106087)
Summary:
Added support for users to set configurations based on module name in XNNPACKQuantizer; this can also serve as an example
for implementing new quantizers.
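
A hedged usage sketch (the module name is hypothetical):
```
from torch.ao.quantization.quantizer.xnnpack_quantizer import (
    XNNPACKQuantizer,
    get_symmetric_quantization_config,
)

quantizer = XNNPACKQuantizer()
quantizer.set_global(get_symmetric_quantization_config())
# Only the submodule at this fully qualified name gets the override:
quantizer.set_module_name(
    "blocks.0.linear", get_symmetric_quantization_config(is_per_channel=True)
)
```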

Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_xnnpack_quantizer_set_module_name

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106087
Approved by: https://github.com/andrewor14
2023-08-02 01:19:23 +00:00
Edward Z. Yang
7b9d250f06 Change _dynamo.export to be export(f)(*args, **kwargs) (#106109)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
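
The call-site change, sketched (f is a stand-in):
```
import torch
import torch._dynamo as dynamo

def f(x):
    return x.sin() + x.cos()

# Before: gm, guards = dynamo.export(f, torch.randn(3))
# After, export is curried, taking the callable first:
gm, guards = dynamo.export(f)(torch.randn(3))
```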

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106109
Approved by: https://github.com/voznesenskym
2023-07-27 21:41:13 +00:00
Jerry Zhang
3a77f9aaaf [quant][api] Move torch.ao.quantization.pt2e.quantizer to torch.ao.quantization.quantizer (#105885)
Summary: Moving quantizer to torch.ao.quantization to make it a public API, since pt2e is a folder for implementations.

Test Plan:
CIs

sanity check: "buck test //executorch/backends/xnnpack/test:test_xnnpack_quantized_models -- test_resnet18"

Differential Revision: D47727838

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105885
Approved by: https://github.com/andrewor14
2023-07-26 18:20:09 +00:00