Commit Graph

1529 Commits

Author SHA1 Message Date
leslie-fang-intel
bfed2da2e4 [Quant][PT2E] Re-enable test case of conv add/add_relu recipe for x86inductorquantizer (#105638)
**Summary**
Re-enable the test cases `test_conv2d_binary_with_quantizer_api` and `test_conv2d_binary_unary_with_quantizer_api` for X86InductorQuantizer. We previously disabled these 2 test cases due to a timeout issue in internal CI.

**Test Plan**
```
python -m pytest test_x86inductor_quantizer.py -k test_conv2d_binary_with_quantizer_api
python -m pytest test_x86inductor_quantizer.py -k test_conv2d_binary_unary_with_quantizer_api
```

Differential Revision: [D47745372](https://our.internmc.facebook.com/intern/diff/D47745372)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105638
Approved by: https://github.com/jerryzh168, https://github.com/andrewor14
2023-08-02 17:26:22 +00:00
Jerry Zhang
d528a137e0 [quant][pt2e][quantizer] Support set_module_type in XNNPACKQuantizer (#106094)
Summary:
Added support for users to set configurations based on module type in XNNPACKQuantizer; this can also serve as an example
for implementing new quantizers.

Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_xnnpack_quantizer_set_module_type
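
A minimal usage sketch (the import path follows where XNNPACKQuantizer lived after the moves in this period; treat it as an assumption):

```python
import torch
from torch.ao.quantization.quantizer.xnnpack_quantizer import (
    XNNPACKQuantizer,
    get_symmetric_quantization_config,
)

quantizer = XNNPACKQuantizer()
# Every nn.Linear in the model gets this config, regardless of its name.
quantizer.set_module_type(torch.nn.Linear, get_symmetric_quantization_config())
```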

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106094
Approved by: https://github.com/andrewor14
ghstack dependencies: #106087
2023-08-02 08:33:58 +00:00
Sergii Dymchenko
af37608276 Remove duplicate ops tests in test_quantized_op.py (#106398)
The duplicates are after https://github.com/pytorch/pytorch/pull/94170
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106398
Approved by: https://github.com/izaitsevfb, https://github.com/malfet, https://github.com/jerryzh168
2023-08-02 02:37:36 +00:00
Jerry Zhang
92a22a8098 [quant][pt2e][quantizer] Support set_module_name in XNNPACKQuantizer (#106087)
Summary:
Added support for users to set configurations based on module name in XNNPACKQuantizer; this can also serve as an example
for implementing new quantizers.

Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_xnnpack_quantizer_set_module_name
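
A minimal usage sketch, analogous to set_module_type above (the fully qualified name "sub.linear" is hypothetical):

```python
from torch.ao.quantization.quantizer.xnnpack_quantizer import (
    XNNPACKQuantizer,
    get_symmetric_quantization_config,
)

quantizer = XNNPACKQuantizer()
# Only the submodule with this fully qualified name gets the config.
quantizer.set_module_name("sub.linear", get_symmetric_quantization_config())
```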

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106087
Approved by: https://github.com/andrewor14
2023-08-02 01:19:23 +00:00
Mikayla Gawarecki
d8e5f2aa6d Reland "Make adding buffers more like adding parameters (#104069)" (#106224)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106224
Approved by: https://github.com/atalman, https://github.com/albanD
2023-07-31 17:18:56 +00:00
PyTorch MergeBot
93b2036bef Revert "[quant][pt2e] store scale/zero_point as tensor attributes to support serialization (#105894)"
This reverts commit 3ca71ed735.

Reverted https://github.com/pytorch/pytorch/pull/105894 on behalf of https://github.com/huydhn due to breaking executorch tests internally ([comment](https://github.com/pytorch/pytorch/pull/105894#issuecomment-1654831950))
2023-07-28 01:16:02 +00:00
Jerry Zhang
3ca71ed735 [quant][pt2e] store scale/zero_point as tensor attributes to support serialization (#105894)
Summary:
Currently, scale/zero_point for per-tensor quant are stored as burnt-in literals, which means these values can't be serialized in state_dict. This
PR changes them to buffers/Tensors so that they can be serialized.
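
A minimal sketch of the difference this makes (a hypothetical module, not the code touched by this PR): values registered as buffers land in state_dict, while burnt-in literals do not.

```python
import torch

class DQStub(torch.nn.Module):
    def __init__(self, scale: float, zero_point: int):
        super().__init__()
        # As buffers, scale/zero_point are part of state_dict() and
        # survive save/load; literals baked into the graph would not.
        self.register_buffer("scale", torch.tensor(scale))
        self.register_buffer("zero_point", torch.tensor(zero_point, dtype=torch.int64))

m = DQStub(0.05, 128)
print(list(m.state_dict().keys()))  # ['scale', 'zero_point']
```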

Test Plan:
python test/test_quantization.py TestQuantizePT2E

Differential Revision: [D47770963](https://our.internmc.facebook.com/intern/diff/D47770963)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105894
Approved by: https://github.com/kimishpatel
2023-07-26 20:15:06 +00:00
Jerry Zhang
3a77f9aaaf [quant][api] Move torch.ao.quantization.pt2e.quantizer to torch.ao.quantization.quantizer (#105885)
Summary: Moving quantizer to torch.ao.quantization to make it a public API, since pt2e is a folder for implementations.

Test Plan:
CIs

sanity check: "buck test //executorch/backends/xnnpack/test:test_xnnpack_quantized_models -- test_resnet18"

Differential Revision: D47727838

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105885
Approved by: https://github.com/andrewor14
2023-07-26 18:20:09 +00:00
lezcano
36ae359655 Update matmul decomp to match eager (#105850)
The decomposition was not updated after https://github.com/pytorch/pytorch/pull/95261

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105850
Approved by: https://github.com/Chillee
2023-07-26 09:24:51 +00:00
vasiliy
8b34fa5e9b add basic cuda support for float8 dtypes (#105807)
Summary:

Ensures that creating tensors, copying, filling with zeroes, and checking for nan work on CUDA for the `float8` dtypes. This should be enough for float8 emulation on CUDA.

Note that I skipped the mul test - it's less trivial to add (it needs a new C++ macro), and there is no use case for it yet. We can follow up on that in the future.

Test Plan:

```
python test/test_quantization.py TestFloat8Dtype
```
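
A minimal sketch of the operations this enables (assumes a CUDA build; torch.float8_e5m2 per the float8 dtype PR later in this log):

```python
import torch

x = torch.zeros(8, device="cuda", dtype=torch.float8_e5m2)  # creation + zero fill
y = torch.empty_like(x)
y.copy_(x)                                     # copying between float8 tensors
print(torch.isnan(x.to(torch.float32)).any())  # nan check via upcast
```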

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105807
Approved by: https://github.com/ezyang, https://github.com/jerryzh168, https://github.com/albanD
2023-07-25 03:43:36 +00:00
Aaron Gokaslan
6d43c89f37 [BE]: Update Ruff to 0.0.280 (#105724)
Removes unused loop values in Python dictionary iteration. Automated fix from Ruff master.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105724
Approved by: https://github.com/ezyang, https://github.com/janeyx99
2023-07-22 23:03:34 +00:00
Justin Chu
4cc1745b13 [BE] f-stringify torch/ and scripts (#105538)
This PR is a follow up on the pyupgrade series to convert more strings to use f-strings using `flynt`.

- https://docs.python.org/3/reference/lexical_analysis.html#f-strings
- https://pypi.org/project/flynt/

Command used:

```
flynt torch/ -ll 120
flynt scripts/ -ll 120
flynt tools/ -ll 120
```

and excluded `collect_env.py`
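
Illustratively, flynt rewrites %-style and .format() strings into equivalent f-strings, e.g.:

```python
name, count = "conv2d", 3
before = "node %s has %d users" % (name, count)
after = f"node {name} has {count} users"
assert before == after
```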

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105538
Approved by: https://github.com/ezyang, https://github.com/malfet
2023-07-21 19:35:24 +00:00
Jerry Zhang
143c83d637 [quant][pt2e][be] Remove unneeded code (#105676)
Summary:
att

Test Plan:
CIs

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105676
Approved by: https://github.com/andrewor14
2023-07-21 00:51:22 +00:00
PaliC
9760ea58a3 fix lint (#105675)
Forward fix of the lint issues introduced by https://github.com/pytorch/pytorch/pull/104242
We are forward fixing as this PR contains Meta internal changes that would be tricky to revert smoothly.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105675
Approved by: https://github.com/jerryzh168, https://github.com/albanD, https://github.com/atalman
2023-07-20 18:42:25 +00:00
Amadeusz Skrzypczak
b64bd4a5dd Add torch.float8_e5m2 and torch.float8_e4m3 data types (#104242)
Proposal of two float8 variants - e5m2 and e4m3 - based on https://arxiv.org/pdf/2209.05433.pdf

Hide all Float8 operator implementations behind `#if !defined(C10_MOBILE)` guard to keep Android build size almost unchanged
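
A small sketch of the trade-off between the two variants (e5m2 trades mantissa bits for exponent range; the exact dtype names in torch, where e4m3 carries an "fn" finite-only suffix, are an assumption here):

```python
import torch

# e5m2: 5 exponent bits, 2 mantissa bits -> wider range, coarser precision
# e4m3: 4 exponent bits, 3 mantissa bits -> narrower range, finer precision
x = torch.tensor([0.1, 1.0, 240.0])
print(x.to(torch.float8_e5m2).to(torch.float32))
print(x.to(torch.float8_e4m3fn).to(torch.float32))
```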

TODO:
 - Refactor duplicated code
 - Cleanup unbalanced pragma pop in dtype utils
 - Add native implementation on the CUDA side

Co-authored-by: Nikita Shulga <nshulga@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104242
Approved by: https://github.com/albanD
2023-07-20 16:09:11 +00:00
PyTorch MergeBot
f2b15772ff Revert "Add torch.float8_e5m2 and torch.float8_e4m3 data types (#104242)"
This reverts commit a9804130e5.

Reverted https://github.com/pytorch/pytorch/pull/104242 on behalf of https://github.com/PaliC due to breaks lint (run lintrunner and remerge) ([comment](https://github.com/pytorch/pytorch/pull/104242#issuecomment-1644150284))
2023-07-20 15:37:53 +00:00
Amadeusz Skrzypczak
a9804130e5 Add torch.float8_e5m2 and torch.float8_e4m3 data types (#104242)
Proposal of two float8 variants - e5m2 and e4m3 - based on https://arxiv.org/pdf/2209.05433.pdf

Hide all Float8 operator implementations behind `#if !defined(C10_MOBILE)` guard to keep Android build size almost unchanged

TODO:
 - Refactor duplicated code
 - Cleanup unbalanced pragma pop in dtype utils
 - Add native implementation on the CUDA side

Co-authored-by: Nikita Shulga <nshulga@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104242
Approved by: https://github.com/albanD
2023-07-20 09:45:45 +00:00
Jerry Zhang
dff4e034b8 [quant][pt2e][be] Rename qnnpack quantizer to xnnpack quantizer (#105551)
Summary: att

Test Plan: sandcastle CI and OSS CI

Reviewed By: andrewor14

Differential Revision: D47422894

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105551
Approved by: https://github.com/andrewor14
2023-07-20 03:52:40 +00:00
Andrey Talman
c6653b65d8 Back out "Make adding buffers more like adding parameters (#104069)" (#105581)
Summary:
D47537831 is breaking pyper tests: https://fb.workplace.com/groups/802176577445480/posts/1018902842439518/

with `TypeError: register_buffer() takes 3 positional arguments but 4 were given`

Original commit changeset: d4b4069fbd38

Original Phabricator Diff: D47537831

Test Plan:
```
buck2 run //caffe2/torch/fb/training_toolkit/integration_tests/training_lifecycle/cogwheel_tests/pyper_release_v2:cogwheel_smallworld_inline_cvr_infer_pyper_pyper__canary_offline_training-launcher -- --run-harness-in-tupperware --build-fbpkg ads_dper3 --build-fbpkg training_platform
```

Reviewed By: atalman

Differential Revision: D47600140

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105581
Approved by: https://github.com/mikaylagawarecki
2023-07-20 03:39:53 +00:00
leslie-fang-intel
fa6be2fa6f [Quant][PT2E] Remove x86 inductor pt2e backend config (#105039)
**Summary**
For the Quantization PT2E path, we recommend using `X86InductorQuantizer` instead of the backend config `x86_inductor_pt2e_backend_config`. Remove `x86_inductor_pt2e_backend_config` and the relevant testing.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105039
Approved by: https://github.com/jgong5, https://github.com/jerryzh168
2023-07-19 23:18:29 +00:00
Justin Chu
73e1455327 [BE] Enable ruff's UP rules and autoformat test/ (#105434)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105434
Approved by: https://github.com/albanD
2023-07-19 20:36:06 +00:00
Jerry Zhang
554052f321 [quant][pt2e][be] Rename prepare_pt2e_quantizer to prepare_pt2e (#105484)
Summary: att

Test Plan: sandcastle and OSS CI

Reviewed By: andrewor14

Differential Revision: D47422892

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105484
Approved by: https://github.com/andrewor14
2023-07-19 04:51:37 +00:00
Jerry Zhang
ed2b9f1af1 [quant][pt2e] rename _quantize_pt2e to quantize_pt2e (#105377)
Summary: att

Test Plan: CIs

Reviewed By: andrewor14

Differential Revision: D47234357

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105377
Approved by: https://github.com/andrewor14
2023-07-18 16:46:05 +00:00
ekamiti
32d422f335 Make adding buffers more like adding parameters (#104069)
Add semantics for creating a buffer object similar to those for creating a parameter, by introducing a new `Buffer` class that can be used for type disambiguation. The underlying functionality of registering a buffer remains the same: the `register_buffer` method has not been changed. The `persistent` parameter in the `Buffer` type indicates whether a buffer object should be persistent or not. Other non-test changes have to do with getting the new `Buffer` type recognized by inductor and dynamo. The remaining changes are test changes to make sure that the `Buffer` type can be used as a drop-in replacement for `register_buffer`, as it just leads to `register_buffer` being called. The new functionality still allows normal tensors to be used as buffers, so these changes are intended to be backwards compatible.

Fixes #35735
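
A minimal sketch of the semantics described above (assuming the class is exposed as torch.nn.Buffer; assignment routes through register_buffer):

```python
import torch
import torch.nn as nn

class Stats(nn.Module):
    def __init__(self):
        super().__init__()
        self.gain = nn.Parameter(torch.ones(4))           # parameter, as before
        # Equivalent to: self.register_buffer("running_mean", torch.zeros(4))
        self.running_mean = nn.Buffer(torch.zeros(4), persistent=True)

m = Stats()
print("running_mean" in m.state_dict())  # True: persistent buffers serialize
```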

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104069
Approved by: https://github.com/mikaylagawarecki
2023-07-17 17:59:05 +00:00
Jerry Zhang
7b4d080496 [quant][pt2e] Rename _pt2e to pt2e (#104668)
Summary:
X-link: https://github.com/pytorch/executorch/pull/3

att

Test Plan: Imported from OSS

Differential Revision: D47202807

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104668
Approved by: https://github.com/andrewor14
2023-07-15 06:34:17 +00:00
Tuan Tran
85745cd3d9 Fix bug in fuse_modules (#105069)
Summary: This diff fixes the issue reported in https://github.com/pytorch/pytorch/issues/105063, which is also related to an internal caffe2 bug (reproduced error in internal fb pytorch: N3945540).

Test Plan: Wait for sandcastle with the added unit test in caffe2/torch/ao/quantization/eager/test_fuse_eager

Differential Revision: D47402357

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105069
Approved by: https://github.com/jerryzh168
2023-07-13 23:39:59 +00:00
Andrew Or
4b29829ece [quant][pt2] Fix QAT convert for mobilenetv2 (#104110)
Summary:
QAT convert for mobilenetv2 was previously not working
because we incorrectly applied dropout during eval as well as
training. This is because, for exported models, model.eval() does
not change the behavior of dropout, unlike models with torch ops.
This commit simulates the effects of model.eval() for exported
models as well by replacing the aten dropout pattern before eval.
As of this commit, end-to-end QAT numerics now match for
mobilenetv2 between FX and PT2.
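
A minimal sketch of the failure mode (using the present-day torch.export API as a stand-in for the dynamo export used at the time; treat that substitution as an assumption):

```python
import torch

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.dropout = torch.nn.Dropout(p=0.5)

    def forward(self, x):
        return self.dropout(x)

ep = torch.export.export(M().train(), (torch.randn(16),))
gm = ep.module()
gm.eval()  # no effect: the traced aten graph baked in training=True for dropout
```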

Test Plan: python test/test_quantization.py TestQuantizePT2EModels.test_qat_mobilenet_v2

Differential Revision: D46750343

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104110
Approved by: https://github.com/jerryzh168
2023-07-11 18:42:42 +00:00
Jerry Zhang
c42de84708 [quant] Skip some x86 quantizer tests for now due to time out (#104666)
Summary: att

Test Plan: sandcastle ci

Reviewed By: malfet

Differential Revision: D47234616

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104666
Approved by: https://github.com/DanilBaibak
2023-07-06 17:34:13 +00:00
leslie-fang-intel
8e2e2d730e [Quant][PT2E]Accelerate test of conv2d_add and conv2d_add_relu by reducing test configs (#104686)
**Summary**
Reduce the test time of `test_conv2d_binary_with_quantizer_api` and `test_conv2d_binary_unary_with_quantizer_api`.
* For `test_conv2d_binary_with_quantizer_api`, reduce the number of test config from 12 to 2.
* For `test_conv2d_binary_unary_with_quantizer_api`, reduce the number of test config from 24 to 2.

**Test Plan**
```
python -m pytest test_x86inductor_quantizer.py -k test_conv2d_binary_with_quantizer_api
python -m pytest test_x86inductor_quantizer.py -k test_conv2d_binary_unary_with_quantizer_api
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104686
Approved by: https://github.com/jerryzh168
2023-07-06 07:34:46 +00:00
Jerry Zhang
611febf6cf [quant] Support integer implementations for max_pool2d (#104225)
Summary:
This is needed for representing quantized models in the pt2 export quantization flow.

Test Plan:
tested by opinfo, python test/test_ops.py
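
A small sketch of what this enables (int8 input, as used to represent quantized tensors in the export flow):

```python
import torch

x = torch.randint(-128, 128, (1, 3, 8, 8), dtype=torch.int8)
y = torch.nn.functional.max_pool2d(x, kernel_size=2)  # runs directly on int8
print(y.dtype)  # torch.int8
```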

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104225
Approved by: https://github.com/kimishpatel
2023-07-05 23:54:07 +00:00
leslie-fang-intel
2a21469a77 [Quant][PT2E] Enable conv2d unary and binary recipe for x86 inductor quantizer (#98826)
**Summary**

- Recipe to annotate `conv2d_relu` for `X86InductorQuantizer` is added.
- Recipe to annotate `conv2d_add` for `X86InductorQuantizer` is added.
- Recipe to annotate `conv2d_add_relu` for `X86InductorQuantizer` is added.

**Test Plan**
```
python -u -m pytest -s -v test_x86inductor_quantizer.py -k TestQuantizePT2EX86Inductor
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98826
Approved by: https://github.com/jerryzh168
2023-07-04 00:01:10 +00:00
Kimish Patel
bd0f0f40a1 [PT2][Quant] Enable symbolic shape in linear quantization (#104473)
When tracing with symbolic shapes, arbitrary sym_size nodes can appear in the
graph. Earlier changes did not account for this, and the quantizer failed to annotate
the right nodes. This diff fixes that by not annotating sym_size nodes, which
should really not be relevant for quantization.

As next steps, we should validate in the quant workflow that a) sym_int nodes are not
being quantized and b) add similar support, as in this diff, for generic
annotations.
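
A hypothetical sketch of the filtering described (names are illustrative, not the actual quantizer code):

```python
import torch

def _is_sym_size_node(node) -> bool:
    # sym_size nodes carry shape information, not tensor values,
    # so the quantizer should skip annotating them.
    return node.op == "call_function" and node.target == torch.ops.aten.sym_size.int
```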

Differential Revision: [D47132050](https://our.internmc.facebook.com/intern/diff/D47132050/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104473
Approved by: https://github.com/jerryzh168
2023-07-01 05:14:30 +00:00
Jerry Zhang
ecca9591d5 [quant][pt2e] Add reference representation for quantize/dequantize operators (#104395)
Summary: Similar to quantized add, in this PR we added the reference representation for quantize/dequantize operators.
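
For reference, a minimal sketch of the affine quantize/dequantize semantics being represented (a simplification of the reference rewrite, assuming int8):

```python
import torch

def quantize_per_tensor(x, scale, zero_point, quant_min, quant_max):
    # round to nearest, shift by zero_point, clamp to the integer range
    q = torch.round(x / scale) + zero_point
    return torch.clamp(q, quant_min, quant_max).to(torch.int8)

def dequantize_per_tensor(x_i8, scale, zero_point):
    return (x_i8.to(torch.float32) - zero_point) * scale
```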

Test Plan:
buck2 test caffe2/test:quantization_pt2e -- --exact 'caffe2/test:quantization_pt2e - test_representation_quantize (quantization.pt2e.test_quantize_pt2e.TestQuantizePT2E)'
buck2 test caffe2/test:quantization_pt2e -- --exact 'caffe2/test:quantization_pt2e - test_representation_dequantize (quantization.pt2e.test_quantize_pt2e.TestQuantizePT2E)'

Reviewed By: kimishpatel

Differential Revision: D46959928

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104395
Approved by: https://github.com/andrewor14
2023-06-30 04:32:18 +00:00
leslie-fang-intel
945a257277 [Quant][PT2E] Supported customized _EQUIVALENT_TYPES in Module Partition API (#102516)
**Summary**
`Module Partition API` can simplify the pattern match process in Quantization annotation. However, the current implementation of
`Module Partition API` has hardcoded `_EQUIVALENT_TYPES` 999bae0f54/torch/ao/quantization/_pt2e/graph_utils.py (L13-L20). So, PyTorch extension libraries such as [intel-extension-for-pytorch](https://github.com/intel/intel-extension-for-pytorch) can't use `Module Partition API` with customized `_EQUIVALENT_TYPES`. In this PR, we enable customized `_EQUIVALENT_TYPES` by passing them in as a parameter.

**Test Plan**
```
python -m pytest test_graph_utils.py -k test_customized_equivalet_types_dict
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102516
Approved by: https://github.com/jgong5, https://github.com/kimishpatel
2023-06-28 00:20:25 +00:00
Jerry Zhang
c98896b76f [quant][pt2e] Add more precise representation for quantized add (#104130)
Summary:
The planned e2e for quantization in pytorch 2.0 export is the following:

float_model -> prepare_pt2e -> calibration -> convert_pt2e -> ...

inside convert_pt2e, we will first produce a q/dq representation of the quantized model, similar to the previous output of
convert_to_reference_fx in fx graph mode quantization:

```
torch.ops.quantized_decomposed.dequantize_per_tensor -> torch.ops.aten.add -> torch.ops.quantized_decomposed.quantize_per_tensor
torch.ops.quantized_decomposed.dequantize_per_tensor   /
```

Then we'll rewrite the above to a representation that expresses the intent more precisely: here we actually
want to do int8 addition rather than simulate it with fp32 operations. The representation for
quantized add is:

```
def quantized_add(x_i8, x_scale, x_zero_point, y_i8, y_scale, y_zero_point, out_scale, out_zero_point):
    x = (x_scale / out_scale) * x_i8
    y = (y_scale / out_scale) * y_i8
    out = x + y
    out -= (x_zero_point * x_scale + y_zero_point * y_scale) / out_scale
    out += out_zero_point
    return out
```
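
A quick numeric sanity check (a sketch, ignoring rounding and clamping) that the integer-arithmetic form above matches dequantize -> add -> quantize:

```python
import torch

def dq(x_i8, s, zp):
    return (x_i8.float() - zp) * s

x_i8 = torch.randint(-128, 128, (4,))
y_i8 = torch.randint(-128, 128, (4,))
xs, xzp, ys, yzp, os_, ozp = 0.1, 3, 0.2, -2, 0.3, 1

ref = (dq(x_i8, xs, xzp) + dq(y_i8, ys, yzp)) / os_ + ozp
got = (xs / os_) * x_i8 + (ys / os_) * y_i8 - (xzp * xs + yzp * ys) / os_ + ozp
assert torch.allclose(ref, got.float())
```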

Test Plan:
```
buck2 test caffe2/test:quantization_pt2e -- --exact 'caffe2/test:quantization_pt2e - test_representation_add (quantization.pt2e.test_quantize_pt2e.TestQuantizePT2E)'
```

Reviewed By: kimishpatel

Differential Revision: D45628032

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104130
Approved by: https://github.com/kimishpatel
2023-06-27 20:11:30 +00:00
HDCharles
8176cd8c0f [ao] fixing quantized prelu workflow (#103455)
Summary: https://github.com/pytorch/pytorch/issues/100654 noticed that PReLU
was not running its observers when the quantization flow was run.
This was a bug, which is now fixed, and the relevant PReLU tests now
check for this. Also added a corrected observer for PReLU to
qconfig_mapping.

Test Plan: python test/test_quantization.py TestStaticQuantizedModule.test_prelu

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103455
Approved by: https://github.com/jerryzh168
2023-06-23 16:45:40 +00:00
Andrew Or
7320ef5651 [quant][pt2] Add prepare QAT test for mobilenetv2 (#104068)
Summary:
Prepare QAT for mobilenetv2 has matching numerics with
FX. There were two changes needed to achieve this, however.
First, this commit adds observer sharing for ReLU6, which is
used extensively throughout this model. Second, in the tests we
have to use the same manual seed every time we call the models
in order to get the same results between FX and PT2. This is
because there is a dropout at the end of the model.

Test Plan: python test/test_quantization.py TestQuantizePT2EModels.test_qat_mobilenet_v2

Reviewed By: kimishpatel

Differential Revision: D46707786

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104068
Approved by: https://github.com/jerryzh168
2023-06-23 16:34:25 +00:00
leslie-fang-intel
fcb7a47f8b [Quant][PT2E]Fix the maxpool2d input observer didn't insert after QuantizationAnnotation API (#101941)
**Summary**
The previous UT was accidentally broken, since the output of the conv2d node was annotated by mistake.
Re-enable these UTs for these cases:

- For a single `conv2d` node, if we don't annotate the output node of `conv2d`, there should be no fake quant at conv2d's output.
- For the `conv2d-maxpool` pattern, `maxpool` should have fake quant inserted at its input and output nodes, since we annotate these nodes.

**Test Plan**
```
python -m pytest test_quantize_pt2e.py -k test_wo_annotate_conv_output_quantizer
python -m pytest test_quantize_pt2e.py -k test_max_pool2d_quantizer
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101941
Approved by: https://github.com/jgong5, https://github.com/jerryzh168
2023-06-23 11:50:31 +00:00
Andrew Or
303ff84b04 [quant][pt2] Update special qspecs after QAT rewrite (#103970)
Summary:
Special qspecs like `SharedQuantizationSpec` and
`DerivedQuantizationSpec` refer to other nodes in the graph.
However, after subgraph rewriting in QAT, the nodes referred
to in these special qspecs may be replaced by new nodes.
This could lead to the following error when inserting
observers according to these qspecs:

```
AssertionError: please make sure only refer to edge or node
that has observer/fake_quant inserted: 'getitem' not in
dict_keys([(arg0, convolution_default_1), (mul_tensor, convolution_default_1), getitem_3])
```

This commit fixes this by keeping track of the nodes that
are replaced during subgraph rewriting in QAT, and using
this mapping to update the dangling references used in these
special qspecs.
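
A hypothetical sketch of the remapping described (illustrative names; the real pass operates on the quantizer's qspec data structures):

```python
def _remap_edge_or_node(edge_or_node, old_to_new):
    # An edge is an (argument_node, user_node) tuple; a node stands alone.
    if isinstance(edge_or_node, tuple):
        return tuple(old_to_new.get(n, n) for n in edge_or_node)
    return old_to_new.get(edge_or_node, edge_or_node)
```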

Test Plan: python test/test_quantization.py TestQuantizePT2E.test_qat_update_shared_qspec

Reviewed By: jerryzh168

Differential Revision: D46606614

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103970
Approved by: https://github.com/jerryzh168
2023-06-22 20:05:57 +00:00
Omkar Salpekar
ae1ed27756 [codemod][numpy] replace np.str with str (#103931)
Summary:
`np.str` is removed from numpy 1.20.0. It was an alias to builtin `str` and it's safe to do the replacement.

The whole change is mechanical, generated using the following one-liner:
```
fbgr -sl 'np\.str\b' | xargs perl -pi -e 's,\bnp\.str\b,str,g'
```

Test Plan: sandcastle

Differential Revision: D46586144

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103931
Approved by: https://github.com/huydhn
2023-06-21 18:16:42 +00:00
Andrew Or
873f772df2 [quant][pt2] Fix QAT convert for resnet18 (#103759)
Summary:
Before this commit, only prepare QAT numerics matched
between PT2 and FX for resnet18. Convert numerics diverged,
however, for two reasons:

(1) Existing patterns did not handle inplace ReLUs. This commit
fixes this by adding extra patterns that use these ReLUs instead
of the normal ones.

(2) Subgraph rewriter could not handle skip connections in
quantized models, because the dequantize node is used in both
the conv node within the match pattern, and an inplace add node
outside of the match pattern. This led the subgraph matcher to
filter out the match, complaining that it was not self contained.
This commit fixes this problem by duplicating the dequantize
nodes, one for each user, such that subsequent matches will
be self contained.
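
A sketch of the duplication fix over a torch.fx graph (a hypothetical helper, not the exact code in this PR):

```python
def duplicate_dequant_for_users(graph, dq_node):
    # Give every user after the first its own copy of the dequantize
    # node, so each subgraph match is self-contained.
    users = list(dq_node.users)
    for user in users[1:]:
        with graph.inserting_after(dq_node):
            new_dq = graph.node_copy(dq_node)
        user.replace_input_with(dq_node, new_dq)
    graph.lint()
```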

Test Plan: python test/test_quantization.py TestQuantizePT2EModels.test_qat_resnet18

Reviewed By: jerryzh168

Differential Revision: D46564114

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103759
Approved by: https://github.com/jerryzh168
2023-06-21 15:36:07 +00:00
leslie-fang-intel
dbc8eb2a8f [Quant][PT2E]Enable x86 inductor quantizer (#98730)
**Summary**

- Enable `X86InductorQuantizer` basics.
- Recipe to annotate conv2d is added.

**Test Plan**
```
python -u -m pytest -s -v test_x86inductor_quantizer.py -k TestQuantizePT2EX86Inductor
```
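
A minimal usage sketch (the import path reflects where the quantizer ended up after the module moves in this period; treat it as an assumption):

```python
import torch.ao.quantization.quantizer.x86_inductor_quantizer as xiq

quantizer = xiq.X86InductorQuantizer()
# Apply the default x86 inductor config globally; conv2d gets annotated
# per the recipe added in this PR.
quantizer.set_global(xiq.get_default_x86_inductor_quantization_config())
```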

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98730
Approved by: https://github.com/jgong5, https://github.com/jerryzh168
2023-06-17 06:10:23 +00:00
Andrew Or
2bc56bec07 [quant][pt2] Handle literal conv args in convert QAT (#103731)
Summary:
Similar to the prepare case, we need to manually copy
over literal conv args such as padding and stride to the new,
replaced conv nodes, since these args are not captured by the
subgraph rewriter.

Test Plan: python test/test_quantization.py TestQuantizePT2E.test_qat_conv_bn_fusion_literal_args

Reviewed By: jerryzh168

Differential Revision: D46383130

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103731
Approved by: https://github.com/jerryzh168
2023-06-16 17:15:37 +00:00
Andrew Or
dad29f906b [quant][pt2] Fix no conv bias in convert QAT (#103298)
Summary:
Previously, the QAT pattern for conv + bn with no conv
bias was not actually replaced in convert. This commit adds an
extra pattern in the convert path for this case and the numerics
now match FX's.

Test Plan: python test/test_quantization.py TestQuantizePT2E.test_prepare_qat_conv_bn_fusion_no_conv_bias

Reviewed By: jerryzh168

Differential Revision: D46382819

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103298
Approved by: https://github.com/jerryzh168
2023-06-16 01:59:48 +00:00
Kimish Patel
90ee6a7354 [PT2][Quant] Update op names for decomposed quantized lib (#103251)
Summary:
Dynamo tracing, via dynamo.export with aten_graph, generates a graph whose nodes
have targets that are instances of torch._ops.OpOverload. The quantization workflow
inserts quantize/dequantize ops that are sometimes instances of
torch._ops.OpOverload (quantize_per_tensor.tensor) and other times instances
of torch._ops.OpOverloadPacket (quantize_per_tensor), which is a bit inconsistent.

It is also unclear whether a model counts as validly exported if it has nodes
with targets of type torch._ops.OpOverloadPacket.

Without the op overload name attached to the target, it fails during executorch
tracing. The reason is that executorch tracing expects nodes' targets to be
instances of torch._ops.OpOverload and not torch._ops.OpOverloadPacket.

So, for consistency and tracing reasons, fix the convert pass to insert ops that
are torch._ops.OpOverload.
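
The distinction, illustrated with an aten op:

```python
import torch

packet = torch.ops.aten.add           # torch._ops.OpOverloadPacket
overload = torch.ops.aten.add.Tensor  # torch._ops.OpOverload (named overload)
print(type(packet).__name__, type(overload).__name__)
```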

Test Plan: CI

Reviewed By: jerryzh168

Differential Revision: D46342822

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103251
Approved by: https://github.com/andrewor14
2023-06-15 04:37:58 +00:00
Jerry Zhang
0cd155b042 [reland][quant][pt2e] Annotate GRU module (#103358) (#103526)
Summary:

att, we use module partition API to identify the GRU submodule and annotate all necessary patterns

Test Plan: buck2 test mode/opt caffe2/test:quantization_pt2e -- 'caffe2/test:quantization_pt2e'

Differential Revision: D46689428

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103526
Approved by: https://github.com/andrewor14
2023-06-13 23:43:10 +00:00
PyTorch MergeBot
13777e3391 Revert "[quant][pt2e] Annotate GRU module (#103358)"
This reverts commit 23892d8ee4.

Reverted https://github.com/pytorch/pytorch/pull/103358 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/103358#issuecomment-1588729657))
2023-06-13 07:45:40 +00:00
Jerry Zhang
23892d8ee4 [quant][pt2e] Annotate GRU module (#103358)
Summary: att, we use module partition API to identify the GRU submodule and annotate all necessary patterns

Test Plan: buck2 test mode/opt caffe2/test:quantization_pt2e -- 'caffe2/test:quantization_pt2e'

Reviewed By: kimishpatel

Differential Revision: D46384329

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103358
Approved by: https://github.com/HDCharles
2023-06-13 04:10:13 +00:00
Andrew Or
89d57f269f [quant][pt2] Fix convert in Conv + BN + ReLU QAT fusion (#102993)
Summary:
Previously, the QAT pattern for conv + bn + relu was
not actually replaced in convert. This is because the quantized
QAT pattern used in convert doesn't actually have a relu node.
This commit adds this extra pattern in the convert path and
the numerics now match FX's.

Test Plan: python test/test_quantization.py TestQuantizePT2E.test_qat_conv_bn_relu_numerics

Reviewed By: jerryzh168

Differential Revision: D46372411

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102993
Approved by: https://github.com/jerryzh168
2023-06-08 22:10:29 +00:00
Andrew Or
9508e60c1e [quant][pt2] Add prepare QAT test for resnet18 (#103020)
Summary:
Prepare QAT for resnet18 has matching numerics with FX.
Adding this test requires us to refactor the way the test code
is structured, however.

Test Plan: python test/test_quantization.py TestQuantizePT2EModels.test_qat_resnet18

Differential Revision: D46456243

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103020
Approved by: https://github.com/kimishpatel
2023-06-08 05:17:20 +00:00
Andrew Or
2e8d2a2e69 [quant][pt2] Add test for inplace add (#102867)
Summary: This was broken after the recent partitioner refactors.

Test Plan: python test/test_quantization.py TestQuantizePT2E.test_qat_inplace_add_relu

Differential Revision: D46402378

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102867
Approved by: https://github.com/jerryzh168
2023-06-07 19:43:28 +00:00
Kimish Patel
471407cf78 [PT2][Quant] Use composble quantizer for embedding + static conv + dynamic (#103116)
Summary:
In this diff we test a module that a) does an embedding lookup, b) runs a 1D
(converted to 2D) conv, and c) runs linear on the output of the 1D conv.

a is quantized using embedding quantizer.
c is quantized using dynamic quantization.
b is quantized using static quantization.

We compose quantizer from [a, c, b]. Tested it against similar fx config.

Test Plan: test_embedding_conv_linear_quantization

Reviewed By: jerryzh168

Differential Revision: D46267688

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103116
Approved by: https://github.com/jerryzh168
2023-06-07 17:34:59 +00:00
Kimish Patel
8e0837cf84 [PT2][Quant] Move embedding quantization to OSS (#103088)
Summary:
This is in preparation for enabling embedding quantization on models with
embeddings.

Test Plan: test_embedding_quantizer

Reviewed By: jerryzh168

Differential Revision: D46267689

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103088
Approved by: https://github.com/andrewor14
2023-06-06 23:07:57 +00:00
Kimish Patel
8824101fb6 [PT2][Quant] Introduce composable quantizer (#102846)
Summary:
Using the composable quantizer, we can now compose two or more quantizers. In
the test here we compose a quantizer configured for dynamic linear quantization
with a quantizer configured for static quantization.

Note that the composable quantizer has a strict order in which annotations are
applied.
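
A minimal composition sketch (import paths are assumptions; these modules moved between torch.ao.quantization._pt2e and torch.ao.quantization.quantizer during this period):

```python
from torch.ao.quantization.quantizer.composable_quantizer import ComposableQuantizer
from torch.ao.quantization.quantizer.xnnpack_quantizer import (
    XNNPACKQuantizer,
    get_symmetric_quantization_config,
)

dynamic_q = XNNPACKQuantizer().set_global(
    get_symmetric_quantization_config(is_dynamic=True)
)
static_q = XNNPACKQuantizer().set_global(get_symmetric_quantization_config())
# Order is strict: quantizers annotate in list order.
composed = ComposableQuantizer([dynamic_q, static_q])
```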

Test Plan: test_composable_quantizer*

Reviewed By: jerryzh168

Differential Revision: D46267690

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102846
Approved by: https://github.com/andrewor14
2023-06-06 14:01:55 +00:00
Andrew Or
604a414bfc [quant][pt2] Fix convert in Conv + BN QAT fusion (#102224)
Summary:
Previously, the test for the convert flow in Conv + BN
QAT fusion was not enabled by mistake. However, reenabling this
test uncovered several bugs:

(1) The replaced nodes returned by subgraph rewriter were not
handled correctly. This is because a recent change in the subgraph
rewriter (#100556) fixed only the prepare case but not the convert
case. This commit brings this fix to the convert case as well and
deduplicates some code between the two cases.

(2) When folding BN into conv, we used the wrong arg index to get
the BN eps value. This resulted in an incorrect conv weight (see the
folding sketch after this list).

(3) In FX, we currently do a hack for weighted modules where we
observe the weights once in convert in order to ensure we get the
right shapes for these weight observers. This caused the numerics
to diverge between PT2 and FX. This commit fixes this by skipping
this unnecessary hack for `_convert_to_reference_decomposed_fx`.

(4) Per channel support was simply missing. This commit adds
support for this by matching the quantize_per_channel and
dequantize_per_channel ops in addition to the existing ones.
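
For item (2), a sketch of the standard BN-into-conv weight folding (the eps term is where the wrong arg index corrupted the result):

```python
import torch

def fold_bn_weight(conv_w, bn_running_var, bn_gamma, bn_eps):
    # w_folded[oc] = w[oc] * gamma[oc] / sqrt(running_var[oc] + eps)
    scale = bn_gamma / torch.sqrt(bn_running_var + bn_eps)
    return conv_w * scale.reshape(-1, 1, 1, 1)
```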

Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_qat_conv_bn_numerics

Reviewed By: jerryzh168

Differential Revision: D46097783

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102224
Approved by: https://github.com/jerryzh168
2023-06-05 18:09:28 +00:00
Jerry Zhang
eb0971cfe9 [quant][pt2e][be] Remove _input_output_share_observers and _reuse_input_obs_or_fq from QuantizationAnnotation (#102854)
Summary:
After we support SharedQuantizationSpec, we don't need these things anymore. This PR refactors the
uses of _input_output_share_observers to SharedQuantizationSpec.

Test Plan:
```
buck2 test mode/opt caffe2/test:quantization_pt2e -- 'caffe2/test:quantization_pt2e'
```

Reviewed By: andrewor14

Differential Revision: D46301342

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102854
Approved by: https://github.com/andrewor14
2023-06-03 07:31:09 +00:00
Andrew Or
a1142053f0 [reland][quant][test] Fix broken PT2 import, add warnings (#102819)
Summary:
We are currently silently skipping all PT2 quantization
tests due to a recent typo. This commit fixes this and also adds
warnings so it'll be easier to debug similar issues in the future.

Test Plan: python test/test_quantization.py

Differential Revision: D46383546

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102819
Approved by: https://github.com/jerryzh168
2023-06-02 22:35:30 +00:00
Kimish Patel
2296ee08fa [PT2][Quant][BE] Test refactor to be organize them better (#102704)
Collected most of the test modules under TestHelperModules. This allows reuse
of modules when possible. We can probably refactor a bit more, but some QAT-related
helper modules were left in their respective tests.

Differential Revision: [D46267687](https://our.internmc.facebook.com/intern/diff/D46267687/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102704
Approved by: https://github.com/andrewor14
2023-06-02 18:40:05 +00:00
Kimish Patel
a53acafd2b [PT2][Quant] Enable dynamic quantization (#102703)
Enable dynamic quantization of linear layers.

Differential Revision: [D46235070](https://our.internmc.facebook.com/intern/diff/D46235070/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102703
Approved by: https://github.com/andrewor14
2023-06-02 17:52:14 +00:00
Xia, Weiwen
ce9923a1cb [Quant][PT2E][Inductor] Lower quantized conv to Inductor (#101164)
**Summary**
Enable the lowering path for reference quantized conv after PT2E to Inductor.

The pattern `decomposed dequantize -> aten.convolution -> decomposed quantize` is fused to `quantized.functional.conv1d/2d/3d` and Inductor makes external calls to these ops.

This PR focuses on functionality only. The implementation is expected to have low performance.

Code example:
```Python
class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 6, 2, stride=2, padding=0, dilation=1)

    def forward(self, x):
        return nn.functional.gelu(self.conv(x))

m = M().eval()
example_inputs = (torch.randn(2, 3, 6, 6),)
exported_model, guards = torchdynamo.export(
    m,
    *copy.deepcopy(example_inputs),
    aten_graph=True,
    tracing_mode="real",
)

qconfig = get_default_qconfig("x86")
qconfig_mapping = QConfigMapping().set_global(qconfig)
backend_config_inductor = get_x86_inductor_pt2e_backend_config()
prepared_model = prepare_pt2e(
    exported_model,
    qconfig_mapping,
    example_inputs,
    backend_config_inductor
)
prepared_model(*example_inputs)
converted_model = convert_pt2e(prepared_model)
run = compile_fx(converted_model, example_inputs)
```
Output code by Inductor
```python
from ctypes import c_void_p, c_long
import torch
import math
import random
import os
import tempfile
from torch._inductor.hooks import run_intermediate_hooks
from torch._inductor.utils import maybe_profile

from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile
from torch._inductor.select_algorithm import extern_kernels

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

kernel_cpp_0 = async_compile.cpp('''
#include "/tmp/torchinductor_weiwen/5d/c5dsrjrcd4jlzryilhxl5hdvcrzsoek52xzzqqy57hcoezvxxxwm.h"
extern "C" void kernel(const float* in_ptr0,
                       const float* in_ptr1,
                       const long* in_ptr2,
                       unsigned char* out_ptr0)
{
    {
        #pragma GCC ivdep
        for(long i0=static_cast<long>(0L); i0<static_cast<long>(2L); i0+=static_cast<long>(1L))
        {
            #pragma GCC ivdep
            for(long i1=static_cast<long>(0L); i1<static_cast<long>(3L); i1+=static_cast<long>(1L))
            {
                #pragma GCC ivdep
                for(long i2=static_cast<long>(0L); i2<static_cast<long>(36L); i2+=static_cast<long>(1L))
                {
                    auto tmp0 = in_ptr0[static_cast<long>(i2 + (36L*i1) + (108L*i0))];
                    auto tmp1 = in_ptr1[static_cast<long>(0L)];
                    auto tmp7 = in_ptr2[static_cast<long>(0L)];
                    auto tmp2 = 1 / tmp1;
                    auto tmp3 = static_cast<float>(1.0);
                    auto tmp4 = decltype(tmp2)(tmp2 * tmp3);
                    auto tmp5 = decltype(tmp0)(tmp0 * tmp4);
                    auto tmp6 = std::nearbyint(tmp5);
                    auto tmp8 = static_cast<float>(tmp7);
                    auto tmp9 = tmp6 + tmp8;
                    auto tmp10 = static_cast<float>(0);
                    auto tmp11 = max_propagate_nan(tmp9, tmp10);
                    auto tmp12 = static_cast<float>(127);
                    auto tmp13 = min_propagate_nan(tmp11, tmp12);
                    auto tmp14 = static_cast<unsigned char>(tmp13);
                    out_ptr0[static_cast<long>(i1 + (3L*i2) + (108L*i0))] = tmp14;
                }
            }
        }
    }
}
''')

kernel_cpp_1 = async_compile.cpp('''
#include "/tmp/torchinductor_weiwen/5d/c5dsrjrcd4jlzryilhxl5hdvcrzsoek52xzzqqy57hcoezvxxxwm.h"
extern "C" void kernel(const unsigned char* in_ptr0,
                       const long* in_ptr1,
                       const float* in_ptr2,
                       float* out_ptr0)
{
    {
        #pragma GCC ivdep
        for(long i0=static_cast<long>(0L); i0<static_cast<long>(2L); i0+=static_cast<long>(1L))
        {
            #pragma GCC ivdep
            for(long i1=static_cast<long>(0L); i1<static_cast<long>(6L); i1+=static_cast<long>(1L))
            {
                #pragma GCC ivdep
                for(long i2=static_cast<long>(0L); i2<static_cast<long>(9L); i2+=static_cast<long>(1L))
                {
                    auto tmp0 = in_ptr0[static_cast<long>(i1 + (6L*i2) + (54L*i0))];
                    auto tmp2 = in_ptr1[static_cast<long>(0L)];
                    auto tmp5 = in_ptr2[static_cast<long>(0L)];
                    auto tmp1 = static_cast<float>(tmp0);
                    auto tmp3 = static_cast<float>(tmp2);
                    auto tmp4 = tmp1 - tmp3;
                    auto tmp6 = decltype(tmp4)(tmp4 * tmp5);
                    auto tmp7 = static_cast<float>(0.5);
                    auto tmp8 = decltype(tmp6)(tmp6 * tmp7);
                    auto tmp9 = static_cast<float>(0.7071067811865476);
                    auto tmp10 = decltype(tmp6)(tmp6 * tmp9);
                    auto tmp11 = std::erf(tmp10);
                    auto tmp12 = static_cast<float>(1);
                    auto tmp13 = tmp11 + tmp12;
                    auto tmp14 = decltype(tmp8)(tmp8 * tmp13);
                    out_ptr0[static_cast<long>(i2 + (9L*i1) + (54L*i0))] = tmp14;
                }
            }
        }
    }
}
''')

async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1, arg2_1, arg3_1, arg4_1, arg5_1, arg6_1, arg7_1, arg8_1 = args
    args.clear()
    buf0 = torch.ops.quantized_decomposed.quantize_per_channel.default(arg0_1, arg4_1, arg5_1, 0, -128, 127, torch.int8)
    del arg0_1
    buf1 = buf0
    assert_size_stride(buf1, (6, 3, 2, 2), (12, 4, 2, 1))
    del buf0
    buf2 = empty_strided((2, 3, 6, 6), (108, 1, 18, 3), device='cpu', dtype=torch.uint8)
    kernel_cpp_0(c_void_p(arg8_1.data_ptr()), c_void_p(arg2_1.data_ptr()), c_void_p(arg3_1.data_ptr()), c_void_p(buf2.data_ptr()))
    del arg8_1
    buf2 = torch._make_per_tensor_quantized_tensor(buf2, arg2_1, arg3_1)
    buf1 = torch._make_per_channel_quantized_tensor(buf1, arg4_1, arg5_1, 0)
    buf3 = torch.ao.nn.quantized.functional.conv2d(buf2, buf1, arg1_1, (2, 2), (0, 0), (1, 1), 1, 'zeros', arg6_1, arg7_1, torch.uint8)
    assert_size_stride(buf3, (2, 6, 3, 3), (54, 1, 18, 6))
    del arg1_1
    del arg2_1
    del arg3_1
    del arg4_1
    del arg5_1
    del buf1
    del buf2
    buf4 = empty_strided((2, 6, 3, 3), (54, 9, 3, 1), device='cpu', dtype=torch.float32)
    kernel_cpp_1(c_void_p(buf3.data_ptr()), c_void_p(arg7_1.data_ptr()), c_void_p(arg6_1.data_ptr()), c_void_p(buf4.data_ptr()))
    del arg6_1
    del arg7_1
    return (buf4, )

def benchmark_compiled_module(times=10, repeat=10):
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((6, 3, 2, 2), (12, 4, 2, 1), device='cpu', dtype=torch.float32)
    arg1_1 = rand_strided((6, ), (1, ), device='cpu', dtype=torch.float32)
    arg2_1 = rand_strided((), (), device='cpu', dtype=torch.float32)
    arg3_1 = rand_strided((), (), device='cpu', dtype=torch.int64)
    arg4_1 = rand_strided((6, ), (1, ), device='cpu', dtype=torch.float32)
    arg5_1 = rand_strided((6, ), (1, ), device='cpu', dtype=torch.int64)
    arg6_1 = rand_strided((), (), device='cpu', dtype=torch.float32)
    arg7_1 = rand_strided((), (), device='cpu', dtype=torch.int64)
    arg8_1 = rand_strided((2, 3, 6, 6), (108, 36, 6, 1), device='cpu', dtype=torch.float32)
    return print_performance(lambda: call([arg0_1, arg1_1, arg2_1, arg3_1, arg4_1, arg5_1, arg6_1, arg7_1, arg8_1]), times=times, repeat=repeat)

if __name__ == "__main__":
    from torch._inductor.utils import compiled_module_main
    compiled_module_main('None', benchmark_compiled_module)
```

**Test Plan**
python test/test_quantization.py TestQuantizePT2EFXX86Inductor.test_inductor_qconv_lowering

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101164
Approved by: https://github.com/jgong5, https://github.com/jansel
2023-06-01 10:22:02 +00:00
Kimish Patel
4f468646d9 [PT2][Quant][BE] Refactor test code to reduce duplication and standardize (#102497)
Summary:
This refactor introduces an internal function which selectively tests against fx
quant as well. Notably this does increase test times, so we need to figure out
how to resolve that.

Test Plan: test_quantization_pt2e

Reviewed By: jerryzh168

Differential Revision: D46154323

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102497
Approved by: https://github.com/jerryzh168
2023-05-30 21:37:59 +00:00
Jerry Zhang
f14ac74fce [quant][pt2e] Add support for FixedQParamsQuantizationSpec (#102439)
Summary:
This PR adds support for FixedQParamsQuantizationSpec:

```
@dataclass(eq=True, frozen=True)
class FixedQParamsQuantizationSpec(QuantizationSpecBase):
    dtype: torch.dtype
    scale: float
    zero_point: int
    quant_min: Optional[int] = None
    quant_max: Optional[int] = None
    qscheme: Optional[torch.qscheme] = None
```

This is useful for defining the quantization spec for operators like sigmoid, which have a predefined and fixed scale/zero_point.
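
A usage sketch for sigmoid (scale=1/256, zero_point=0 is the conventional fixed mapping of a [0, 1) output onto uint8; the import path is an assumption, as these modules moved during this period):

```python
import torch
from torch.ao.quantization.quantizer import FixedQParamsQuantizationSpec

sigmoid_qspec = FixedQParamsQuantizationSpec(
    dtype=torch.uint8,
    scale=1.0 / 256.0,
    zero_point=0,
    quant_min=0,
    quant_max=255,
    qscheme=torch.per_tensor_affine,
)
```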

Test Plan:
```
buck2 test mode/opt caffe2/test:quantization_pt2e -- 'caffe2/test:quantization_pt2e'
buck2 test mode/opt caffe2/test:quantization_pt2e -- --exact 'caffe2/test:quantization_pt2e - test_fixed_qparams_qspec (quantization.pt2e.test_quantize_pt2e.TestQuantizePT2E)'
```

Reviewed By: kimishpatel

Differential Revision: D46153082

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102439
Approved by: https://github.com/kimishpatel
2023-05-30 21:28:13 +00:00
Kimish Patel
af70fe9f3e [PT2][Quant] Enable test_qnnpack_quantizer_conv_linear test (#102399)
Earlier this test was disabled due to pattern matching not working correctly.
Enabling this test now since we moved to module-partitioner-based matching.

Differential Revision: [D46130722](https://our.internmc.facebook.com/intern/diff/D46130722/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102399
Approved by: https://github.com/jerryzh168
2023-05-28 06:44:16 +00:00
Kimish Patel
4cb6add471 [PT2][Quant] Use module partition for fused patterns (#102394)
This diff introduces the utility `find_sequential_partitions`.
This utility allows one to specify a sequential pattern of
nn.Module/nn.functional and returns a list. Each item in the list contains a
List[SourcePartition] that represents sequentially connected partitions
of the requested pattern.
For example `find_sequential_partitions(model, [nn.Conv2d, nn.ReLU])` will find
all nn.Conv2d and nn.ReLU partitions that are sequentially connected.

Furthermore, move to using `find_sequential_partitions` for conv_bn/conv_bn_relu
for QAT.
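
A usage sketch (the import path is where the utility lived under the pt2e graph utils; treat it as an assumption):

```python
import torch.nn as nn
from torch.ao.quantization.pt2e.graph_utils import find_sequential_partitions

def inspect_conv_relu(gm):  # gm: the exported torch.fx.GraphModule
    for conv_p, relu_p in find_sequential_partitions(gm, [nn.Conv2d, nn.ReLU]):
        # Each pair is a sequentially connected Conv2d/ReLU partition.
        print(conv_p.output_nodes, relu_p.output_nodes)
```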

Differential Revision: [D45948057](https://our.internmc.facebook.com/intern/diff/D45948057/)

**NOTE FOR REVIEWERS**: This PR has internal Meta-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D45948057/)!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102394
Approved by: https://github.com/jerryzh168
2023-05-28 05:29:16 +00:00
Jerry Zhang
eda5abf5e0 [quant][pt2e] Fix propagate_annotation after recent refactors (#102422)
Summary:
Recently we changed the annotation from "target_dtype_info" to "quantization_annotation" and introduced the QuantizationAnnotation API
and the SharedQuantizationSpec API for users to convey sharing between inputs/outputs. This PR updates the _propagate_annotation
pass to accommodate the recent changes.

Test Plan:
```
buck2 test mode/opt caffe2/test:quantization_pt2e -- 'caffe2/test:quantization_pt2e'
```

Reviewed By: kimishpatel

Differential Revision: D46153084

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102422
Approved by: https://github.com/kimishpatel
2023-05-27 16:01:47 +00:00
Jerry Zhang
23223402eb [quant][pt2e] Add Support for DerivedQuantizationSpec (#102282)
Summary:
```
"""
4. DerivedQuantizationSpec
this is the quantization spec for the Tensors whose quantization parameters are derived from other Tensors
"""

class DerivedQuantizationSpec(QuantizationSpecBase):
    # specifies which Tensors the quantization parameters are derived from
    # this can either be an edge from argument to node, or a node
    derived_from: List[EdgeOrNode]
    derive_qparams_fn: Callable[[List[ObserverOrFakeQuantize]], Tuple[Tensor, Tensor]]
     ...
```

Test Plan:
```
buck2 test mode/opt caffe2/test:quantization_pt2e -- 'caffe2/test:quantization_pt2e'
buck2 test mode/opt caffe2/test:quantization_pt2e -- --exact 'caffe2/test:quantization_pt2e - test_resnet18_with_quantizer_api (quantization.pt2e.test_quantize_pt2e.TestQuantizePT2EModels)'
```

Reviewed By: kimishpatel

Differential Revision: D46097855

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102282
Approved by: https://github.com/andrewor14
2023-05-27 00:24:39 +00:00
Kimish Patel
9b5e4c308c [PT2][Quant][BE] Apply formatting to test_quantize_pt2e (#102275)
Summary: Just formatting diff

Test Plan: CI

Reviewed By: jerryzh168

Differential Revision: D45948056

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102275
Approved by: https://github.com/andrewor14
2023-05-26 14:24:34 +00:00
Jerry Zhang
ed87508b32 [quant][pt2e] Add support for SharedQuantizationSpec (#102184)
Summary:
This PR adds support for SharedQuantizationSpec, which is used to express sharing between
two Tensors in the prepared graph. The shared Tensor will either be an input of some node (expressed as a tuple of fx nodes) or
the output of some node (expressed as an fx Node).
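
A sketch of the two forms (node names are hypothetical; the import path is an assumption, as these modules moved during this period):

```python
from torch.ao.quantization.quantizer import SharedQuantizationSpec

# Share with the observer of an input edge: (argument_node, user_node).
share_with_edge = SharedQuantizationSpec((input_act_node, add_node))
# Share with the observer of another node's output.
share_with_output = SharedQuantizationSpec(conv_node)
```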

Test Plan:
```
buck2 test mode/opt caffe2/test:quantization_pt2e -- 'caffe2/test:quantization_pt2e'
buck2 test mode/opt caffe2/test:quantization_pt2e -- --exact 'caffe2/test:quantization_pt2e - test_resnet18_with_quantizer_api (quantization.pt2e.test_quantize_pt2e.TestQuantizePT2EModels)'
```

Differential Revision: D46043026

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102184
Approved by: https://github.com/kimishpatel, https://github.com/leslie-fang-intel
2023-05-25 17:31:59 +00:00
Riley Dulin
424c930f76 Add quantization lowering for nn.PixelShuffle and nn.PixelUnshuffle (#101926)
Similar to https://github.com/pytorch/pytorch/pull/96160 but for the modules
nn.PixelShuffle and nn.PixelUnshuffle.

torch.nn.PixelUnshuffle accepts both float and quantized inputs.
However, previously we would unnecessarily dequantize quantized inputs into floats
before passing them to the module. This commit fixes this by lowering the patterns
[dequant - PixelShuffle - quant] and
[dequant - PixelUnshuffle - quant].

Test Plan:

python test/test_quantization.py TestQuantizeFxOps.test_pixel_shuffle_module
python test/test_quantization.py TestQuantizeFxOps.test_pixel_unshuffle_module

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101926
Approved by: https://github.com/jerryzh168
2023-05-24 19:33:26 +00:00
Jerry Zhang
94ed26d177 [quant][pt2e] prepare_pt2e use quantization spec directly (#102054)
Summary:
In this PR we aligned with the design of the annotation API and use the quantization spec directly for annotation.
The main change is in prepare: we consume the quantization_spec object directly instead of the observer or fake quant constructor, we create the constructor
inside prepare, and annotation API users only need to interact with the quantization spec object after this PR.

Test Plan:
```
buck2 test mode/opt caffe2/test:quantization_pt2e -- --exact 'caffe2/test:quantization_pt2e - test_resnet18_with_quantizer_api (quantization.pt2e.test_quantize_pt2e.TestQuantizePT2EModels)'
```

Reviewed By: kimishpatel

Differential Revision: D45934088

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102054
Approved by: https://github.com/kimishpatel
2023-05-23 23:25:56 +00:00
leslie-fang-intel
488a4303a5 Enable quantized_max_pool3d (#101654)
**Summary**
Enable `quantized_max_pool3d` kernel to fix the issue https://github.com/pytorch/pytorch/issues/101386.

**Test Plan**
```
clear && python -u -m pytest -s -v test_quantized_op.py -k test_max_pool3d
clear && python -u -m pytest -s -v test_quantized_op.py -k test_max_pool3d_nhwc
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101654
Approved by: https://github.com/albanD, https://github.com/jgong5, https://github.com/mingfeima
2023-05-23 00:45:38 +00:00
Jerry Zhang
15495f2d96 [quant][pt2e] Introduce QuantizationAnnotation API (#101708)
Summary:
This diff adds QuantizationAnnotation and also refactors the existing annotation to use this object

```
@dataclass
class QuantizationAnnotation:
  # How some input nodes should be quantized, expressed as QuantizationSpec
  # a map from torch.fx.Node to QuantizationSpec
  input_qspec_map: Dict[Node, QuantizationSpec]

  # How the output of this node is quantized, expressed as QuantizationSPec
  output_qspec: QuantizationSpec

class QuantizationSpec:
    dtype: torch.dtype
    is_dynamic: bool = False
    quant_min: Optional[int] = None
    quant_max: Optional[int] = None
    qscheme: Optional[torch.qscheme] = None
    ch_axis: Optional[int] = None
    # TODO: follow up PR will add this
    # Kind of observer such as MinMaxObserver, PerChannelHistogramObserver etc.
    # observer_or_fake_quant_type: Union[ObserverBase, FakeQuantizeBase]
```

Example after full refactor:

```
int8_qspec = QuantizationSpec(dtype=torch.int8, ...)
weight_qspec = QuantizationSpec(dtype=torch.int8, ...)
conv_node["quantization_annotation"] = QuantizationAnnotation(
    input_qspec_map={input_node: int8_qspec, weight_node: weight_qspec}
    output_qspec=int8_qspec,
)
```

Note: right now input_qspec_map and output_qspec map are still using observer and fake quant constructors.
Follow up PR: change the input_qspec_map and output_qspec to use QuantizationSpec directly

Test Plan:
```
buck2 test mode/opt caffe2/test:quantization_pt2e -- --exact 'caffe2/test:quantization_pt2e - test_resnet18_with_quantizer_api (quantization.pt2e.test_quantize_pt2e.TestQuantizePT2EModels)'
```

Differential Revision: D45895027

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101708
Approved by: https://github.com/andrewor14
2023-05-19 22:54:27 +00:00
andrewor14
8e51521cee [quant][pt2] Handle maxpool + conv + bn case in prepare QAT (#100941)
Summary: This commit fixes a bug where we copy the metadata from
the wrong node after replace_pattern. This happened in the case
of [maxpool -> getitem1 -> conv -> bn -> getitem2], where
`getitem1` is the placeholder node fed into the fused conv + bn
pattern, and we incorrectly copied the metadata from `getitem1`
instead of from `getitem2`. We fix this bug by filtering out
the placeholder nodes before doing the metadata copying.

Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_prepare_qat_conv_bn_fusion_getitem_placeholder

Reviewers: jerryzh168, kimishpatel

Differential Revision: [D45916751](https://our.internmc.facebook.com/intern/diff/D45916751)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100941
Approved by: https://github.com/jerryzh168
2023-05-17 17:36:32 +00:00
andrewor14
964e61ee95 [quant][pt2] Handle no conv bias in prepare QAT fusion (#100610)
Summary: This commit adds support for conv + BN fusion for the
case where conv has no bias. Since the replacement patterns with
and without conv bias are substantially different, we perform the
replacement for each of these two cases separately.

Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_prepare_qat_conv_bn_fusion_no_conv_bias

Reviewers: jerryzh168, kimishpatel

Differential Revision: [D45743510](https://our.internmc.facebook.com/intern/diff/D45743510)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100610
Approved by: https://github.com/jerryzh168
2023-05-16 04:05:53 +00:00
PyTorch MergeBot
66eef31444 Revert "[fx] change from #users to num_users in graph printout (#101140)"
This reverts commit e568c5a18d.

Reverted https://github.com/pytorch/pytorch/pull/101140 on behalf of https://github.com/jeanschmidt due to There are internal changes to this commit that are preventing landing, so I am reverting to unblock the diff train ([comment](https://github.com/pytorch/pytorch/pull/101140#issuecomment-1547989487))
2023-05-15 14:35:22 +00:00
Aaron Gokaslan
616208b4fe [BE]: Cleanup deprecated stdlib imports (UP006,UP035) (#101361)
Automated fix to clean up some deprecated/useless Python imports.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101361
Approved by: https://github.com/zou3519
2023-05-15 14:32:41 +00:00
andrewor14
4434b9af6a [quant][pt2] Handle constant conv args in prepare QAT fusion (#100525)
Summary: Previously, we would only match and replace conv + BN
patterns with default constant args for conv (stride, padding,
dilation, etc.). If the user set one of these args to a value
different from the default, we would simply not fuse
the pattern. This is due to a limitation in the subgraph
rewriter: see https://github.com/pytorch/pytorch/issues/100419.

This commit works around the above limitation by first
configuring the subgraph rewriter to ignore literals when
matching, and then manually copy over the constant args to the
new subgraph after `replace_pattern`.
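
A hedged sketch of the workaround, assuming the `replace_pattern_with_filters`
variant of the rewriter; `gm`, `pattern`, and `replacement` stand for the graph
module and the traced pattern/replacement callables, and the post-processing
loop is schematic:

```
from torch.fx import subgraph_rewriter

# Match with literals ignored so user-set stride/padding/dilation still match.
matches = subgraph_rewriter.replace_pattern_with_filters(
    gm, pattern, replacement, match_filters=[], ignore_literals=True
)
for m in matches:
    # Schematic: find the original and replacement conv nodes in the match
    # bookkeeping, then copy the constant args, e.g.
    # new_conv.args = new_conv.args[:2] + old_conv.args[2:]
    ...
```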

Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_prepare_qat_conv_bn_fusion_constant_args

Reviewers: jerryzh168, kimishpatel

Differential Revision: [D45515437](https://our.internmc.facebook.com/intern/diff/D45515437)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100525
Approved by: https://github.com/jerryzh168
2023-05-12 19:15:47 +00:00
Michael Suo
e568c5a18d [fx] change from #users to num_users in graph printout (#101140)
`#users` means stuff in various chat apps, which makes it annoying to copypasta graphs into them.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101140
Approved by: https://github.com/ezyang
2023-05-12 04:34:01 +00:00
leslie-fang-intel
a66de845de [Quant][PT2E]Fix pt2e quantization maxpool input observer issue (#100961)
**Summary**
Fix the issue https://github.com/pytorch/pytorch/issues/100959. The root cause: a `torch.ops.aten.max_pool2d_with_indices.default` node has two outputs, the output tensor and the max indices, so its `node.meta["val"]` is a tuple of `FakeTensor`s (for example: `'val': (FakeTensor(..., size=(1, 2, s1, s1)), FakeTensor(..., size=(1, 2, s1, s1), dtype=torch.int64))`). This fails the observer-insertion check, which only accepts a single `FakeTensor`.
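
A minimal sketch of the relaxed check (the helper name is hypothetical):

```
from torch._subclasses import FakeTensor

# Accept either a single FakeTensor or a tuple of them, as produced by
# multi-output ops like max_pool2d_with_indices.
def is_observable_val(meta_val):
    if isinstance(meta_val, (tuple, list)):
        return all(isinstance(v, FakeTensor) for v in meta_val)
    return isinstance(meta_val, FakeTensor)
```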

**Test Plan**
```
python -m pytest test_quantize_pt2e.py -k test_max_pool2d_quantizer
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100961
Approved by: https://github.com/jerryzh168, https://github.com/jgong5
2023-05-11 06:14:34 +00:00
Jerry Zhang
c3f3cb5b0f [quant][pt2e] Support conv bn fusion in convert step for QAT flow (#100442)
Summary:
This PR adds support for folding BN weights into conv for the QAT flow; this is equivalent
to the QAT branch of `from_float` in the eager mode quantized conv module: https://github.com/pytorch/pytorch/blob/main/torch/ao/nn/quantized/modules/conv.py#L223
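
The folding itself is the standard conv-BN weight fusion; a hedged sketch using
the eager-mode helper (module shapes illustrative):

```
import torch
from torch.nn.utils.fusion import fuse_conv_bn_weights

conv, bn = torch.nn.Conv2d(3, 8, 3), torch.nn.BatchNorm2d(8)
# Fold BN statistics into the conv weight/bias: the same math this convert
# step applies to the traced graph.
fused_w, fused_b = fuse_conv_bn_weights(
    conv.weight, conv.bias,
    bn.running_mean, bn.running_var, bn.eps, bn.weight, bn.bias,
)
```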

Items that need follow-up:
* there are some workarounds because quantize_per_tensor takes float/int args and dynamo does not support such args; fix after we change the quantized model representation and also change these args to Tensor

Test Plan: buck2 test @//mode/opt //caffe2/test:quantization_pt2e -- --exact 'caffe2/test:quantization_pt2e - test_convert_qat_conv_bn_fusion (quantization.pt2e.test_quantize_pt2e.TestQuantizePT2E)'

Reviewed By: andrewor14

Differential Revision: D45344281

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100442
Approved by: https://github.com/kimishpatel
2023-05-09 19:43:51 +00:00
andrewor14
4154c8ea15 [quant][pt2] Add Conv + BN + ReLU fusion for prepare QAT (#100283)
Summary: This follows https://github.com/pytorch/pytorch/pull/98568,
which lays all the groundwork for Conv + BN fusion in prepare QAT.
Conv + BN + ReLU fusion can reuse the same match and replace
patterns and is handled similarly.
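
As a hedged sketch (function name hypothetical), the relu variant simply
appends an activation to the same traced pattern, leaving the match/replace
machinery unchanged:

```
import torch.nn.functional as F

def conv_bn_relu_pattern(x, conv_w, conv_b, bn_w, bn_b, bn_rm, bn_rv):
    x = F.conv2d(x, conv_w, conv_b)
    x = F.batch_norm(x, bn_rm, bn_rv, bn_w, bn_b, training=True)
    return F.relu(x)
```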

Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_prepare_qat_conv_bn_relu_fusion
python test/test_quantization.py TestQuantizePT2E.test_prepare_qat_conv_bn_relu_numerics

Reviewers: kimishpatel, jerryzh168

Differential Revision: [D45515494](https://our.internmc.facebook.com/intern/diff/D45515494)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100283
Approved by: https://github.com/jerryzh168
2023-05-07 20:35:16 +00:00
andrewor14
d4dad36cf1 [quant][pt2] Improve prepare_qat Conv + BN numerics test (#100271)
Summary: This commit makes two improvements to the existing
test for Conv + BN fusion in `prepare_qat_pt2e`:

(1) Test `per_tensor_symmetric` in addition to `per_channel_symmetric`
(2) Initialize BN stats the same way in both flows. This is
    necessary to get the `per_tensor_symmetric` case to pass.
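
For reference, a hedged sketch of what the per_tensor_symmetric case
corresponds to in the eager-mode observer constructors:

```
import torch
from torch.ao.quantization.observer import MovingAverageMinMaxObserver

# Illustrative: a per-tensor symmetric qint8 activation observer constructor.
act_observer_ctr = MovingAverageMinMaxObserver.with_args(
    dtype=torch.qint8, qscheme=torch.per_tensor_symmetric
)
```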

Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_prepare_qat_conv_bn_numerics

Reviewers: jerryzh168, kimishpatel

Differential Revision: [D45512851](https://our.internmc.facebook.com/intern/diff/D45512851)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100271
Approved by: https://github.com/jerryzh168
2023-05-05 04:46:13 +00:00
Kimish Patel
24e9b8f5f4 [PT2E][Quant] Use subgraph matcher to annotate linear pattern (#100566)
This diff adds a subgraph matcher for pattern matching. Furthermore, we also move
annotations so that only the input and output nodes of the matched subgraph carry
valid quantization-related annotations.
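
A hedged, self-contained sketch of the matcher flow; the annotation payload is
schematic and the pattern doubles as a stand-in model:

```
import torch
from torch.fx import symbolic_trace
from torch.fx.passes.utils.matcher_utils import SubgraphMatcher

def linear_pattern(x, weight, bias):
    return torch.nn.functional.linear(x, weight, bias)

pattern_gm = symbolic_trace(linear_pattern)
model_gm = symbolic_trace(linear_pattern)  # stand-in for a real model

for match in SubgraphMatcher(pattern_gm.graph).match(model_gm.graph):
    # Only the boundary nodes of the match are annotated; interior nodes
    # are left without quantization annotations.
    for n in match.placeholder_nodes + match.returning_nodes:
        n.meta["quantization_annotation"] = ...  # schematic payload
```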

Differential Revision: [D45535539](https://our.internmc.facebook.com/intern/diff/D45535539/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100566
Approved by: https://github.com/jerryzh168
2023-05-04 21:31:59 +00:00
andrewor14
d176e3ff69 [quant][pt2] Add test for prepare_qat Conv + BN numerics (#99846)
Summary: This adds a test comparing the numerics of PT2 vs FX
after the Conv + BN fusion in prepare_qat.

Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_prepare_qat_conv_bn_numerics

Reviewers: kimishpatel, jerryzh168

Differential Revision: [D45360706](https://our.internmc.facebook.com/intern/diff/D45360706)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99846
Approved by: https://github.com/jerryzh168
2023-04-28 16:43:10 +00:00
Aaron Gokaslan
e2a3817dfd [BE] Enable C419 rule for any all shortcircuiting (#99890)
Apparently https://github.com/pytorch/pytorch/pull/78142 made torch.jit allow simple generator expressions, which lets us enable rules that replace unnecessary list comprehensions with generators in any/all. This was originally part of #99280, but I split it off into this PR so that it can be easily reverted should anything break.
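
For context, a minimal example of the rewrite C419 performs:

```
data = [1, -2, 3]
any([x > 0 for x in data])  # before: builds the whole list, no short-circuit
any(x > 0 for x in data)    # after: generator short-circuits on first True
```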

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99890
Approved by: https://github.com/justinchuby, https://github.com/kit1980, https://github.com/malfet
2023-04-25 15:02:13 +00:00
Edward Z. Yang
0eb59ad093 Change export tracing_mode default to symbolic (#99877)
Differential Revision: [D45231039](https://our.internmc.facebook.com/intern/diff/D45231039/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99877
Approved by: https://github.com/albanD, https://github.com/voznesenskym
2023-04-25 00:12:12 +00:00
PyTorch MergeBot
c83e1f517d Revert "Delete tracing_mode argument to export (#99555)"
This reverts commit e9786149ab.

Reverted https://github.com/pytorch/pytorch/pull/99555 on behalf of https://github.com/DanilBaibak due to Break internal build
2023-04-24 08:21:41 +00:00
Justin Chu
79c9e82e27 Fix flake8 lint errors reported by ruff - take 2 (#99798)
Replaces #99784. This PR is pure autofix.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99798
Approved by: https://github.com/Skylion007, https://github.com/kit1980
2023-04-23 23:09:51 +00:00
Edward Z. Yang
e9786149ab Delete tracing_mode argument to export (#99555)
You can have any color you want, as long as it's tracing_mode="symbolic"

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99555
Approved by: https://github.com/voznesenskym
2023-04-21 16:20:51 +00:00
andrewor14
22af604e1b [quant][pt2] Add Conv + BN fusion for prepare QAT (#98568)
**Summary:** This commit adds the `prepare_qat_pt2e` API and the
fusion logic for Conv + BN. We use the subgraph rewriter to
match and replace the pattern with the existing logic in
`nniqat.ConvBn2d`. Note this is not the end-to-end flow yet.
In particular, the convert flow needs to swap the new subgraph
with another one that merges the batchnorm stats back into conv.

The Conv + BN fusion is implemented in the following steps:

1. Annotate all nodes in the pattern `[conv - bn - getitem]`

2. Match and replace this pattern with the fused QAT pattern
   (note that this is a larger subgraph than the original one)

3. Copy over metadata from the original nodes to the
   corresponding nodes in the new subgraph, to ensure the
   stack traces and dtype annotations are preserved

4. Prepare will insert fake quantizes in the right places
   based on the annotations
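
A hedged sketch of steps 2 and 3; `m` stands for the captured graph module, and
the pattern/replacement callables stand in for the real traced patterns:

```
from torch.fx import subgraph_rewriter

# Step 2: swap the [conv - bn - getitem] subgraph for the fused QAT pattern.
matches = subgraph_rewriter.replace_pattern(m, conv_bn_pattern, qat_conv_bn_pattern)

# Step 3 (schematic): for each match, copy meta (stack traces, dtype
# annotations) from the original nodes onto the corresponding new nodes;
# the original-to-replacement bookkeeping is omitted here.
for match in matches:
    ...
```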

**Test Plan:**
python test/test_quantization.py TestQuantizePT2E.test_qat_conv_bn_fusion

**Reviewers:** jerryzh168, kimishpatel, yanboliang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98568
Approved by: https://github.com/kimishpatel
2023-04-20 20:15:28 +00:00
Jerry Zhang
b0df0cd7cc [reland][quant][fix] Compare resnet with quantizer api with the prepare_fx and decomposed convert flow (#99355)
Summary:
Use a decomposed convert to make sure we get an exact match; this means the nodes in
resnet are annotated correctly. Reland of https://github.com/pytorch/pytorch/pull/98905

Test Plan:
python test/test_quantization.py TestQuantizePT2EModels.test_resnet18_with_quantizer_api

Reviewers:

Subscribers:

Tasks:

Tags:

Differential Revision: [D45071168](https://our.internmc.facebook.com/intern/diff/D45071168)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99355
Approved by: https://github.com/kimishpatel
2023-04-19 16:47:15 +00:00
Kimish Patel
c0be06667f [PT2E][Quant] Support for embedding op quantization via ExecuTorchNativeQuantizer (#99106)

ExecuTorchNativeQuantizer is a terrible name, I admit; however, let's fix it once
we align on what the quantized kernel lib within the executorch runtime should be called

Differential Revision: [D44986258](https://our.internmc.facebook.com/intern/diff/D44986258/)

**NOTE FOR REVIEWERS**: This PR has internal Meta-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D44986258/)!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99106
Approved by: https://github.com/jerryzh168
2023-04-18 16:59:37 +00:00
Kimish Patel
36a95625da [PT2E][Quant][BE] Refactor observer code (#99066)
Combine per-channel and per-tensor observer code

Differential Revision: [D44918494](https://our.internmc.facebook.com/intern/diff/D44918494/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99066
Approved by: https://github.com/jerryzh168
2023-04-17 16:17:36 +00:00
Kimish Patel
4f4e0db5bd [PT2E][Quant][BE] Split short term and long term tests in different files (#99065)
Just for better organization

Differential Revision: [D44918492](https://our.internmc.facebook.com/intern/diff/D44918492/)

**NOTE FOR REVIEWERS**: This PR has internal Meta-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D44918492/)!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99065
Approved by: https://github.com/jerryzh168
2023-04-17 16:12:47 +00:00
Kimish Patel
bcf6393024 [PT2E][Quant][BE] Move pt2e quantization test to separate folder (#99064)
Move it out of fx for better code organization

Differential Revision: [D44918496](https://our.internmc.facebook.com/intern/diff/D44918496/)

**NOTE FOR REVIEWERS**: This PR has internal Meta-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D44918496/)!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99064
Approved by: https://github.com/jerryzh168
2023-04-17 16:07:03 +00:00
Kimish Patel
31f311a816 [PT2E][Quantization] Refactor Quantizer and QNNPACKQuantizer (#99063)
This diff renames quantization spec/config and operator config, and moves these
data structures to the base quantizer.
The base quantizer API now has get_supported_operators, which returns the list of
patterns that a quantizer quantizes.
There are two choices being debated for how to convey to the user what a particular
quantizer will quantize.

1. Modules. We just convey which nn.Modules will be quantized. Of course that
does not mean that equivalent functional variants won't be quantized; however,
for simplicity we just use nn.Module. If certain ops are quantized in a fused
manner, then that is considered an internal detail. Pros and cons of this
approach:
Pros:
  - Simple. Only nn Modules are listed.
  - User does not have to see fusion patterns.
Cons:
  - Perhaps confusing, because it is not clear whether supporting nn.Conv2d also
    means that the quantizer supports functional.conv2d.
  - Hiding fusion patterns means the user has no say in not fusing: if
    conv2d + relu is fused and the user configures quantization for only conv, the
    quantizer will also quantize the following relu as if conv2d + relu were fused.
2. Patterns. Be explicit about what is supported and enumerate all possible
combinations.
Pros:
  - It is very clear what the quantizer will do; no surprises.
Cons:
  - It is not simple to parse.
  - It can be argued that fusion is an internal detail of the quantizer, so some
    quantizer implementations may choose to expose fusion patterns, while others
    may not and may not even provide any configurability.

One option is to move set_supported_operators/modules out of the base quantizer and
let each quantizer define its own way of communicating what is supported. The issue
with this is that when we want to "compose" multiple quantizers, there is no way
for the user to define the order of composition without knowing what each
quantizer supports. For example, quantizer A may quantize conv + relu while B
quantizes only conv, but B's implementation is faster. In that case you may compose (B, A)
such that B quantizes conv and A quantizes relu. Not knowing what A
and B support makes such composition harder.

Differential Revision: [D44895547](https://our.internmc.facebook.com/intern/diff/D44895547/)

**NOTE FOR REVIEWERS**: This PR has internal Meta-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D44895547/)!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99063
Approved by: https://github.com/jerryzh168
2023-04-17 00:34:18 +00:00
PyTorch MergeBot
20a1788136 Revert "[quant][fix] Compare resnet with quantizer api with the prepare_fx and decomposed convert flow (#98905)"
This reverts commit 9e0df2379b.

Reverted https://github.com/pytorch/pytorch/pull/98905 on behalf of https://github.com/izaitsevfb due to Conflicts with D44918496 landed internally, blocks diff train import
2023-04-17 00:17:10 +00:00
Jerry Zhang
9e0df2379b [quant][fix] Compare resnet with quantizer api with the prepare_fx and decomposed convert flow (#98905)
Summary:
Use a decomposed convert to make sure we get an exact match; this means the nodes in
resnet are annotated correctly

Test Plan:
python test/test_quantization.py TestQuantizePT2EModels.test_resnet18_with_quantizer_api

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98905
Approved by: https://github.com/andrewor14
2023-04-14 16:25:15 +00:00
Jerry Zhang
09ebdf44fa [quant][pt2e] Fix a bug in reference quantized module (decomposed mode) (#98903)
Summary:
Fixed quant_min/quant_max for per-channel quantized weight in the reference quantized module in decomposed mode;
this bug was triggered while onboarding an internal model

Test Plan:
python test/test_quantization.py TestQuantizeFx.test__convert_to_reference_decomposed_fx_per_channel_quant_module

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98903
Approved by: https://github.com/andrewor14
2023-04-13 21:55:45 +00:00
Jerry Zhang
6a568779b6 [quant][pt2e][improvement] Remove the need to annotate all nodes with default annotation (#99001)
Summary:
This PR changes prepare to use a default observer/fq constructor when "target_dtype_info" is not set, which
allows users not to initialize every node with the default observer/fq constructor. Note we may still need to
annotate intermediate nodes after this PR; there will be a follow-up PR to allow users to annotate only the
things they want to quantize.
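
A hedged sketch of the fallback; the helper, the default dict, and its keys are
hypothetical names for illustration:

```
import torch
from torch.ao.quantization.observer import PlaceholderObserver

_DEFAULT_TARGET_DTYPE_INFO = {
    "input_act_obs_or_fq_ctr": PlaceholderObserver.with_args(dtype=torch.float),
    "output_act_obs_or_fq_ctr": PlaceholderObserver.with_args(dtype=torch.float),
}

def get_target_dtype_info(node):
    # Fall back to a pass-through default when the user left a node unannotated.
    return node.meta.get("target_dtype_info", _DEFAULT_TARGET_DTYPE_INFO)
```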

Test Plan:
python test/test_quantization.py TestQuantizePT2E
python test/test_quantization.py TestQuantizePT2EModels

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99001
Approved by: https://github.com/kimishpatel, https://github.com/andrewor14
2023-04-13 09:31:51 +00:00