Commit Graph

1153 Commits

Author SHA1 Message Date
Zhicheng Yan
77643ed2eb [torch quantization]raise exception when OOM during combine histogram in observer (#123309)
Summary:
Even with changes in D55347133, it is still possible to OOM in histogram observer, because the size of allocated tensor also depends on *downsample_rate*.

For example, I still see OOM due to the attempt of allocating a 10GB+ histogram tensor in multi-task model.

To fix OOM issue better, we use *try-catch* clause to avoid OOM.
Empirically, we set the max size of a single histogram tensor size to 1 GB.

Test Plan: Test the change for Multi-Task model (depth + segmentation)

Differential Revision: D55567292

Pull Request resolved: https://github.com/pytorch/pytorch/pull/123309
Approved by: https://github.com/jerryzh168
2024-04-06 03:15:02 +00:00
andrewor14
fe29a8fbea [quant][be] Simplify fake_quant_per_channel (#123186)
Summary: We probably don't need
`torch._C._AutoDispatchBelowAutograd()`, which is to prevent
infinite recursion if the implementation calls itself. Let's
remove it and see if anything breaks. The other major change
is registering the op to the more general Autograd dispatch
key so it can be used on cuda as well.

Test Plan:
python test/inductor/test_cpu_repro.py -k test_decomposed_fake_quant_per_channel

Reviewers: zou3519, bdhirsh

Subscribers: zou3519, bdhirsh, jerryzh168, leslie-fang-intel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123186
Approved by: https://github.com/zou3519, https://github.com/leslie-fang-intel
2024-04-03 18:06:45 +00:00
Zhengxu Chen
dacc73669c [export] Make quantizer compatible with the standard nn_module_stack. (#122819)
Summary: When we migrate to torch.export, we won't put L['self'] as the prefix for all the fqn in nn_module_stack. This diff adds the branch to handle the new case.

Test Plan: buck test mode/opt caffe2/test/quantization:test_quantization -- -r set_module_name

Differential Revision: D55436617

Pull Request resolved: https://github.com/pytorch/pytorch/pull/122819
Approved by: https://github.com/tugsbayasgalan
2024-03-28 19:36:46 +00:00
Zhicheng Yan
07f94df1a6 [torch quantization]fix HistogramObserver OOM when (self.max_val - self.min_val) is too small (#122659)
Differential Revision: D55347133

Pull Request resolved: https://github.com/pytorch/pytorch/pull/122659
Approved by: https://github.com/jerryzh168
2024-03-28 17:41:21 +00:00
Jerry Zhang
5af839f86d [quant][pt2e] Enable observer sharing between different quantization specs (#122734)
Summary:

Right now we don't insert additional observers (share observers) if qspec.dtype and qspec.is_dynamic matches exactly,
since fixed qparams quantization spec and derived quantization spec do have have is_dynamic field curerntly, observer sharing does not happen between them and quantization spec, in this PR we fixed the issue by
adding is_dynamic to all quantization specs.

Note: SharedQuantizationSpec should probably be its own type in the future
TODO later:
(1). move all these fields (dtype, is_dynamic, quant_min, quant_max etc.) to QuantizationSpecBase,
(2). make SharedQuantizationSpec a separate type
(3). add quant_min/quant_max in observer sharing checking in pt2e/prepare.py

Test Plan:
python test/test_quantization.py -k test_fixed_qparams_qspec_observer_dedup
Reviewers:

Subscribers:

Tasks:

Tags:

Differential Revision: [D55396546](https://our.internmc.facebook.com/intern/diff/D55396546)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/122734
Approved by: https://github.com/andrewor14
2024-03-27 16:45:19 +00:00
haozhe.zhu
e0329cba8a [Quant] [PT2] Add SiLU into X86InductorQuantizer Conv2d Unary Annotation (#122267)
**Summary**
Add `SiLU` into X86InductorQuantizer Conv2d Unary Annotation

**TestPlan**
```
python -m pytest test_x86inductor_quantizer.py -k test_conv2d_unary
python -m pytest test_x86inductor_quantizer.py -k test_qat_conv2d_unary
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/122267
Approved by: https://github.com/leslie-fang-intel, https://github.com/jgong5
ghstack dependencies: #122266
2024-03-26 08:03:42 +00:00
Guang Yang
c677221798 remove torchao dependency (#122524)
Test Plan:
CI

```
buck2 run mode/dev-nosan mode/inplace executorch/examples/models/llama2:export_llama -- -c ~/llama/ultra_new_checkpoint.pt -p ~/llama/params.json -kv -E 8,8 -d fp32 --pt2e_quantize "xnnpack_dynamic" -2
```

```
buck run //executorch/backends/xnnpack/test:test_xnnpack_ops -- executorch.backends.xnnpack.test.ops.linear.TestLinear.test_qd8_fp32_per_token_weight_per_channel_group_int4
```

Differential Revision: D55263008

Pull Request resolved: https://github.com/pytorch/pytorch/pull/122524
Approved by: https://github.com/jerryzh168
2024-03-23 03:18:43 +00:00
PyTorch MergeBot
60bc29aa0b Revert "[Quant] [PT2] Add SiLU into X86InductorQuantizer Conv2d Unary Annotation (#122267)"
This reverts commit 2c6eeb26d3.

Reverted https://github.com/pytorch/pytorch/pull/122267 on behalf of https://github.com/jeanschmidt due to Not sure if this PR caused breakages in main rocm jobs, I'll remerge if reverting does not fix it ([comment](https://github.com/pytorch/pytorch/pull/122267#issuecomment-2015294491))
2024-03-22 15:04:30 +00:00
andrewor14
ea8e0c75c7 [quant][pt2] Fix create FQ with FixedQParamsQSpec (#122104)
Summary: Before we just returned a _PartialWrapper object when
using FixedQParamsQuantizationSpec in QAT. This is wrong and
we should return a FQ object instead.

Differential Revision: [D55021106](https://our.internmc.facebook.com/intern/diff/D55021106)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/122104
Approved by: https://github.com/jerryzh168
2024-03-22 14:23:05 +00:00
haozhe.zhu
2c6eeb26d3 [Quant] [PT2] Add SiLU into X86InductorQuantizer Conv2d Unary Annotation (#122267)
**Summary**
Add `SiLU` into X86InductorQuantizer Conv2d Unary Annotation

**TestPlan**
```
python -m pytest test_x86inductor_quantizer.py -k test_conv2d_unary
python -m pytest test_x86inductor_quantizer.py -k test_qat_conv2d_unary
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/122267
Approved by: https://github.com/leslie-fang-intel, https://github.com/jgong5
ghstack dependencies: #122266
2024-03-22 08:12:23 +00:00
Jerry Zhang
901ba2be86 [quant][pt2e] Add support for conv transpose + bn + {relu} weights fusion in PTQ (#122046)
Summary:

also added some utils in xnnpack_quantizer_utils.py
* annotate_conv_tranpsose_bn_relu and annotate_conv_transpose_bn -> this is for QAT
* annotate_conv_transpose_relu

conv_transpose + bn weights fusion is performed automatically and can not be disabled currently
we can add support to allow disable this fusion later if needed

Test Plan:
python test/test_quantization.py -k test_conv_transpose_bn_fusion

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/122046
Approved by: https://github.com/andrewor14
2024-03-19 21:00:57 +00:00
Tugsbayasgalan Manlaibaatar
53d2188df9 Update get_aten_graph_module (#121937)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121937
Approved by: https://github.com/andrewor14
2024-03-15 20:35:55 +00:00
Le-Zheng
25e00545bb [Quant][PT2E] Enable linear and linear-unary post-op gelu quant recipe for x86 inductor quantizer (#114853)
**Summary**
Add Gelu for linear-unary post-op quantization recipe to x86 inductor quantizer.

**Test plan**
python -m pytest test/quantization/pt2e/test_x86inductor_quantizer.py -k test_linear_unary_gelu
python test/test_quantization.py -k test_linear_unary_with_quantizer_api
Co-authored-by: leslie-fang-intel <leslie.fang@intel.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114853
Approved by: https://github.com/jgong5, https://github.com/leslie-fang-intel, https://github.com/jerryzh168
2024-03-14 01:46:35 +00:00
Manuel Candales
c53e3f57b5 allow fp16 in quant/dequant decompositions (#121738)
Test Plan:
```
buck2 run mode/dev-nosan mode/inplace executorch/examples/models/llama2:export_llama -- -c ~/llama/ultra_new_checkpoint.pt -p ~/llama/params.json -kv -E 8,8 -d fp16 --pt2e_quantize "xnnpack_dynamic" -2
```

Reviewed By: kirklandsign

Differential Revision: D54785950

Pull Request resolved: https://github.com/pytorch/pytorch/pull/121738
Approved by: https://github.com/jerryzh168
2024-03-13 21:45:08 +00:00
Manuel Candales
6d8a7d6e58 [pytorch] optional zero points on dequantize per channel (#121724)
Summary:
X-link: https://github.com/pytorch/executorch/pull/2364

bypass-github-export-checks

Test Plan: sandcastle

Reviewed By: mikekgfb

Differential Revision: D54709217

Pull Request resolved: https://github.com/pytorch/pytorch/pull/121724
Approved by: https://github.com/mikekgfb
2024-03-12 19:54:11 +00:00
kausik
edf22f3a48 Modify signature of dequantize ops for decomposed quantized Tensor (#119173) (#121450)
Summary:
X-link: https://github.com/pytorch/executorch/pull/2308

Note: The initial purpose of this PR is to draw suggestion and feedback regarding better alternative, if any.

At present, dequantize op for decomposed quantized Tensor representation e.g. dequantize_per_tensor() assumes the output dtype as torch.float and hence, it does not have the output dtype in its operator argument list. However, this op signature becomes unusable when the assumption breaks. Because, in case the output dtype is different from torch.float, there is no way to specify the same during dequantization.

This change is aimed at generalizing the signature of dequantize op like dequantize_per_tensor() for wider use-cases where the output dtype can be different from torch.float and needs to passed during dequantization. The proposal is to use an additional argument named 'output_dtype' to solve the problem. However, we would also like to have suggestion and feedback regarding any better alternative that can be used instead.

cc jerryzh168 jianyuh raghuramank100 jamesr66a vkuzo jgong5 Xia-Weiwen leslie-fang-intel

Reviewed By: digantdesai

Differential Revision: D53590486

Pulled By: manuelcandales

Co-authored-by: kausik <kmaiti@habana.ai>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121450
Approved by: https://github.com/jerryzh168
2024-03-12 12:36:31 +00:00
Shen Xu
159f30331f [quant][pt2e] Call sub-quantizers' transform_for_annotation in ComposableQuantizer (#121548)
Test Plan:
```
buck run caffe2/test:quantization_pt2e
```

Differential Revision: D54454707

Pull Request resolved: https://github.com/pytorch/pytorch/pull/121548
Approved by: https://github.com/jerryzh168
2024-03-12 02:59:12 +00:00
Jerry Zhang
a6a67da333 [quant] Add error check for input_edge annotation (#121536)
Summary:
Raises error when an input edge contains non-Node elements like constant values etc in annotation.

Test Plan:
python test/test_quantization.py -k test_input_edge_sanity_check

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121536
Approved by: https://github.com/andrewor14
2024-03-09 06:13:04 +00:00
leslie-fang-intel
975d428425 [Quant] Add the operator of decomposed fake quant per channel (#121297)
**Summary**
Add the operator of `quantized_decomposed.fake_quant_per_channel` and test the forward and backward of this op with comparing to ATen.

**Test Plan**
```
python -u -m pytest -s -v test_cpu_repro.py -k test_decomposed_fake_quant_per_channel
```

**Next Step**
Optimize the performance: from the generated code of forward and backward graph, the code didn't vectorize.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/121297
Approved by: https://github.com/jerryzh168, https://github.com/jgong5
2024-03-08 10:51:37 +00:00
Tugsbayasgalan (Tugsuu) Manlaibaatar
b474a523c6 Ban passing in free function into capture_pre_autograd_graph (#120817)
Summary: Today we don't allow free functions to be tracing callable in torch.export. As a part of migrating capture_preautograd_graph usages to torch.export, we need to ban free functions to capture_preautograd_graph  as well

Test Plan: CI

Differential Revision: D54319597

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120817
Approved by: https://github.com/zhxchen17, https://github.com/andrewor14
2024-03-01 19:38:58 +00:00
andrewor14
91190d8087 [quant][pt2e] Relax model_is_exported input (#120720)
Summary: This commit relaxes the `model_is_exported` API to
additionally work for `torch.nn.Module`s in addition to just
`torch.fx.GraphModule`s, simplifying downstream uses.

Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_model_is_exported

Differential Revision: [D54263935](https://our.internmc.facebook.com/intern/diff/D54263935)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/120720
Approved by: https://github.com/tugsbayasgalan
2024-02-28 18:32:03 +00:00
leslie-fang-intel
84de851539 [Inductor] Enable the decomposition of quant/dequant per channel (#119177)
**Summary**
Part 2 of fixing https://github.com/pytorch/pytorch/issues/119141 which needs vectorized code generation of per channel quant and int8 data type.
Enable decomposition of quant/dequant per channel to make it vectorized code generation.

**TestPlan**
```
python -u -m pytest -s -v test_cpu_repro.py -k test_per_channel_fake_quant_uint8
python -u -m pytest -s -v test_cpu_repro.py -k test_per_channel_fake_quant_int8
python -u -m pytest -s -v test_cpu_repro.py -k test_per_channel_fake_quant_uint8_bf16_input
python -u -m pytest -s -v test_cpu_repro.py -k test_per_channel_fake_quant_int8_bf16_input
```

Co-authored-by: Jiong Gong <jiong.gong@intel.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119177
Approved by: https://github.com/peterbell10, https://github.com/jansel
2024-02-19 01:30:44 +00:00
andrewor14
6ea4480818 [quant][pt2e] Add model_is_exported util function (#119726)
Summary: This commit adds the `model_is_exported` util function
for users to be able to easily tell what APIs to call to move
their models between train and eval modes. This has the
additional advantage of hiding the implementation of how we
detect a model is exported, in case the metadata format changes
in the future.

Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_model_is_exported

Differential Revision: [D53812972](https://our.internmc.facebook.com/intern/diff/D53812972)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119726
Approved by: https://github.com/tugsbayasgalan, https://github.com/albanD
2024-02-16 19:29:36 +00:00
andrewor14
8ec8d78ef2 [quant][pt2e][be] Rename eval_utils -> export_utils (#119725)
It's not really eval_utils anymore, since we added some training
related utils. Instead it should be util functions that are
related to general export use cases.

Differential Revision: [D53711494](https://our.internmc.facebook.com/intern/diff/D53711494)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119725
Approved by: https://github.com/tugsbayasgalan
2024-02-13 19:10:06 +00:00
andrewor14
830ed6d9b2 [quant][pt2] Fix _disallow_eval_train error message (#119694)
Fix the message to use the right function name.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119694
Approved by: https://github.com/tugsbayasgalan
2024-02-13 00:17:53 +00:00
Riley Dulin
44796682d0 [torch][ao] Fix module name filter for pytorch2 quantization for underscores (#119344)
Summary:
There was a bug in the module name filter for modules that had an underscore
already in them, as it was replaced with a "dot" notation.
This is because it was thought that underscores always meant a module separator,
but this isn't the case for modules whose name contains an underscore.

Test Plan:
Added a unit test. Before this change, that test failed (due to applying the wrong
qscheme). Now it passes.

Differential Revision: D53502771

Pull Request resolved: https://github.com/pytorch/pytorch/pull/119344
Approved by: https://github.com/jerryzh168
2024-02-10 00:29:08 +00:00
Jerry Zhang
7082e24ce8 [quant][pt2e][bc-breaking] Set fold_quantize to True in convert_pt2e (#119425)
Summary: This is a follow up to https://github.com/pytorch/pytorch/pull/118605 to set `fold_quantize` flag to True in `convert_pt2e`

Test Plan: CI

Differential Revision: D53550237

Pull Request resolved: https://github.com/pytorch/pytorch/pull/119425
Approved by: https://github.com/andrewor14
2024-02-09 18:13:43 +00:00
PyTorch MergeBot
81abc2b249 Revert "[quant][pt2e][bc-breaking] Remove fold_quantize flag (#118701)"
This reverts commit 482d952e88.

Reverted https://github.com/pytorch/pytorch/pull/118701 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/118701#issuecomment-1932866964))
2024-02-07 20:56:16 +00:00
Jerry Zhang
482d952e88 [quant][pt2e][bc-breaking] Remove fold_quantize flag (#118701)
Summary:
This is a follow up to https://github.com/pytorch/pytorch/pull/118605 to remove `fold_quantize` flag from
`convert_pt2e`

Test Plan: CI

Differential Revision: D53247301

BC Breaking Note:

flag `fold_quantize` set to True `convert_pt2e` and now we'll fold the quantize op in the weight by default, so users will see model size reduction by default after pt2e quantization.
2.2
```
folded_model = convert_pt2e(model, fold_quantize=True)

non_folded_model = convert_pt2e(model)
```

2.3
```
folded_model = convert_pt2e(model)

non_folded_model = convert_pt2e(model, fold_quantize=False)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/118701
Approved by: https://github.com/andrewor14, https://github.com/leslie-fang-intel
2024-02-07 19:10:51 +00:00
andrewor14
6c1cca153e [quant][pt2e] Allow users to override train/eval behavior (#119091)
Summary: This commit adds a util for PT2E quantization users
to call `model.train()` and `model.eval()` without error.
Instead, these will automatically call the equivalent
`move_exported_model_to_train/eval` for the user, which only
switch behavior for special ops like dropout and batchnorm.
This enables users to onboard to the PT2E flow more easily.

Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_allow_exported_model_train_eval

Reviewers: jerryzh168, tugsbayasgalan, zhxchen17

Subscribers: jerryzh168, tugsbayasgalan, zhxchen17, supriyar

Differential Revision: [D53426636](https://our.internmc.facebook.com/intern/diff/D53426636)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119091
Approved by: https://github.com/jerryzh168, https://github.com/tugsbayasgalan, https://github.com/zhxchen17
2024-02-06 22:19:58 +00:00
andrewor14
70605d150b [quant][pt2] Add move_exported_model_to_train (#113492)
Summary: This is the equivalent API to `model.train()` for
exported models, analogous to `move_exported_model_to_eval`.

Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_move_exported_model_dropout
python test/test_quantization.py TestQuantizePT2E.test_move_exported_model_dropout_inplace
python test/test_quantization.py TestQuantizePT2E.test_move_exported_model_dropout_bn

Reviewers: jerryzh168, kimishpatel

Subscribers: jerryzh168, kimishpatel, supriyar
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113492
Approved by: https://github.com/jerryzh168, https://github.com/tugsbayasgalan
2024-02-02 17:39:47 +00:00
Jiaxu Zhu
b97ab47619 [pytorch][ao] Update PerChannelMinMaxObserver default _load_from_state_dict (#118659)
Summary:
When `version` is missing in the metadata, use `min_val/max_val` as keys instead of `max_vals/min_vals`

## Reasons
1. It's been almost 2 years since this change D30003700, which means now most checkpoints are using the `max_val/min_val` keys

2. most checkpoints dumps using `model.state_dict()` don't have version info, which will lead a fake `missing keys` error when loading state_dict

Test Plan: CI

Differential Revision: D53233012

Pull Request resolved: https://github.com/pytorch/pytorch/pull/118659
Approved by: https://github.com/jerryzh168
2024-02-01 04:39:31 +00:00
suo
5586d7797e fix up batchnorm folding in pt2 quant (#118720)
Changes to how attributes are structured messed this pass up, fix it

Differential Revision: [D53253601](https://our.internmc.facebook.com/intern/diff/D53253601/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118720
Approved by: https://github.com/SherlockNoMad
2024-01-31 17:40:47 +00:00
Jerry Zhang
82a7460b67 [quant][bc-breaking] Turn on fold_quantize by default (#118605)
Summary:
Previously by default we don't generate quantized weight, that is, we'll have fp32 weight, and
`fp32 weight -> q -> dq -> linear -> ...` in the quantized model

After this PR, we'll produce a graph with int8 weight by default after convert_pt2e:
`int8 weight -> dq -> linear -> ...`

We'll remove the fold_quantize flag in the next PR

Test Plan: CI

Differential Revision: D51730862

Pull Request resolved: https://github.com/pytorch/pytorch/pull/118605
Approved by: https://github.com/andrewor14
2024-01-30 21:42:29 +00:00
Catherine Lee
4f5785b6b3 Enable possibly-undefined error code (#118533)
Fixes https://github.com/pytorch/pytorch/issues/118129

Suppressions automatically added with

```
import re

with open("error_file.txt", "r") as f:
    errors = f.readlines()

error_lines = {}
for error in errors:
    match = re.match(r"(.*):(\d+):\d+: error:.*\[(.*)\]", error)
    if match:
        file_path, line_number, error_type = match.groups()
        if file_path not in error_lines:
            error_lines[file_path] = {}
        error_lines[file_path][int(line_number)] = error_type

for file_path, lines in error_lines.items():
    with open(file_path, "r") as f:
        code = f.readlines()
    for line_number, error_type in sorted(lines.items(), key=lambda x: x[0], reverse=True):
        code[line_number - 1] = code[line_number - 1].rstrip() + f"  # type: ignore[{error_type}]\n"
    with open(file_path, "w") as f:
        f.writelines(code)
```

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Co-authored-by: Catherine Lee <csl@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118533
Approved by: https://github.com/Skylion007, https://github.com/zou3519
2024-01-30 21:07:01 +00:00
PyTorch MergeBot
40ece2e579 Revert "Enable possibly-undefined error code (#118533)"
This reverts commit 4f13f69a45.

Reverted https://github.com/pytorch/pytorch/pull/118533 on behalf of https://github.com/clee2000 due to sorry i'm trying to figure out a codev merge conflict, if this works i'll be back to rebase and merge ([comment](https://github.com/pytorch/pytorch/pull/118533#issuecomment-1917695185))
2024-01-30 19:00:34 +00:00
Edward Z. Yang
4f13f69a45 Enable possibly-undefined error code (#118533)
Fixes https://github.com/pytorch/pytorch/issues/118129

Suppressions automatically added with

```
import re

with open("error_file.txt", "r") as f:
    errors = f.readlines()

error_lines = {}
for error in errors:
    match = re.match(r"(.*):(\d+):\d+: error:.*\[(.*)\]", error)
    if match:
        file_path, line_number, error_type = match.groups()
        if file_path not in error_lines:
            error_lines[file_path] = {}
        error_lines[file_path][int(line_number)] = error_type

for file_path, lines in error_lines.items():
    with open(file_path, "r") as f:
        code = f.readlines()
    for line_number, error_type in sorted(lines.items(), key=lambda x: x[0], reverse=True):
        code[line_number - 1] = code[line_number - 1].rstrip() + f"  # type: ignore[{error_type}]\n"
    with open(file_path, "w") as f:
        f.writelines(code)
```

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/118533
Approved by: https://github.com/Skylion007, https://github.com/zou3519
2024-01-30 05:08:10 +00:00
Edward Z. Yang
46712b019d Enable local_partial_types (#118467)
When using dmypy, this setting is enabled and cannot be turned off. Force it for regular mypy too.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/118467
Approved by: https://github.com/Skylion007
ghstack dependencies: #118414, #118418, #118432
2024-01-28 13:38:22 +00:00
Edward Z. Yang
9bce208dfb Replace follow_imports = silent with normal (#118414)
This is a lot of files changed! Don't panic! Here's how it works:

* Previously, we set `follow_imports = silent` for our mypy.ini configuration. Per https://mypy.readthedocs.io/en/stable/running_mypy.html#follow-imports, what this does is whenever we have an import to a module which is not listed as a file to be typechecked in mypy, we typecheck it as normal but suppress all errors that occurred in that file.
* When mypy is run inside lintrunner, the list of files is precisely the files covered by the glob in lintrunner.toml, but with files in excludes excluded.
* The top-level directive `# mypy: ignore-errors` instructs mypy to typecheck the file as normal, but ignore all errors.
* Therefore, it should be equivalent to set `follow_imports = normal`, if we put `# mypy: ignore-errors` on all files that were previously excluded from the file list.
* Having done this, we can remove the exclude list from .lintrunner.toml, since excluding a file from typechecking is baked into the files themselves.
* torch/_dynamo and torch/_inductor were previously in the exclude list, because they were covered by MYPYINDUCTOR. It is not OK to mark these as `# mypy: ignore-errors` as this will impede typechecking on the alternate configuration. So they are temporarily being checked twice, but I am suppressing the errors in these files as the configurations are not quite the same. I plan to unify the configurations so this is only a temporary state.
* There were some straggler type errors after these changes somehow, so I fixed them as needed. There weren't that many.

In the future, to start type checking a file, just remove the ignore-errors directive from the top of the file.

The codemod was done with this script authored by GPT-4:

```
import glob

exclude_patterns = [
    ...
]

for pattern in exclude_patterns:
    for filepath in glob.glob(pattern, recursive=True):
        if filepath.endswith('.py'):
            with open(filepath, 'r+') as f:
                content = f.read()
                f.seek(0, 0)
                f.write('# mypy: ignore-errors\n\n' + content)
```

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/118414
Approved by: https://github.com/thiagocrepaldi, https://github.com/albanD
2024-01-27 02:44:11 +00:00
le-zheng
94f0472579 [Quant] [PT2] Add Hardswish into X86InductorQuantizer Conv2d Unary Annotation (#117488)
**Summary**
Add `hardswish`  into X86InductorQuantizer Conv2d Unary Annotation

**TestPlan**
```
python -m pytest test_x86inductor_quantizer.py -k test_conv2d_unary
python -m pytest test_x86inductor_quantizer.py -k test_qat_conv2d_unary
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/117488
Approved by: https://github.com/leslie-fang-intel, https://github.com/jgong5
ghstack dependencies: #117487
2024-01-20 01:37:33 +00:00
Jerry Zhang
8f1bc876b2 [quant] Support custom qmin/qmax for activation and weight for xnnpack quantizer (#117305)
Summary:
att, this allows us to experiment with 4 bit quant in xnnpack

Test Plan:
python test/test_quantization.py -k test_dynamic_linear_int4_weight

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/117305
Approved by: https://github.com/digantdesai
2024-01-17 03:22:49 +00:00
Xia, Weiwen
94db6578cc [Quant] Add dynamic quantization config for x86 inductor backend (#115337)
**Description**
Add dynamic quantization config for x86 inductor backend.
To support the QKV structure in self-attention, we removed an assertion in port-metadata-pass that requires single dequantize node after quantize node.

**Test plan**
```
python test/test_quantization.py -k TestQuantizePT2EX86Inductor.test_dynamic_quant_linear
python test/test_quantization.py -k TestQuantizePT2EX86Inductor.test_qat_dynamic_quant_linear
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115337
Approved by: https://github.com/jgong5, https://github.com/jerryzh168
2024-01-10 11:33:37 +00:00
Max Ren
d2033a0639 [quant][pt2e][xnnpack_quantizer] add support for linear_relu (#117052)
Add support for linear_relu annotation for XNNPACKQuantizer, this allows the input to linear and the output to relu to share the same quantization parameter.s

Differential Revision: [D52574086](https://our.internmc.facebook.com/intern/diff/D52574086/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/117052
Approved by: https://github.com/jerryzh168, https://github.com/digantdesai
2024-01-09 23:19:52 +00:00
Zhengxu Chen
5ac57a06eb [export] Refactor ExportPassBase. (#116778)
Summary:
X-link: https://github.com/pytorch/executorch/pull/1532

as title. This diff decouple the pass base library from torch export and exir, so that different layers can evolve in their own fashion, and we have more head room to divide and conquer in the future.

Test Plan: CI

Reviewed By: angelayi

Differential Revision: D52514517

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116778
Approved by: https://github.com/angelayi
2024-01-04 21:32:14 +00:00
Aaron Gokaslan
3fe437b24b [BE]: Update flake8 to v6.1.0 and fix lints (#116591)
Updates flake8 to v6.1.0 and fixes a few lints using sed and some ruff tooling.
- Replace `assert(0)` with `raise AssertionError()`
- Remove extraneous parenthesis i.e.
  - `assert(a == b)` -> `assert a == b`
  - `if(x > y or y < z):`->`if x > y or y < z:`
  - And `return('...')` -> `return '...'`

Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116591
Approved by: https://github.com/albanD, https://github.com/malfet
2024-01-03 06:04:44 +00:00
Jerry Zhang
41f265b06a [quant][pt2e] Preserve numeric_debug_handle in quantization flows (#116477)
Summary:
We introduced `node.meta["numeric_debug_handle"]` in https://github.com/pytorch/pytorch/pull/114315 to
indicate the numeric debug handle for values in the graph, in this PR we supported preserving this field
in prepare and convert so that we can use these for numerical debugging

Next: we also want to preserve these in deepcopy of GraphModule as well

Test Plan:
python test/test_quantization.py -k test_quantize_pt2e_preserve_handle

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/116477
Approved by: https://github.com/tugsbayasgalan
2024-01-03 03:39:00 +00:00
Aaron Gokaslan
bd10fea79a [BE]: Enable F821 and fix bugs (#116579)
Fixes #112371

I tried to fix as many of the bugs as I could, a few I could not figure out what the proper fix for them was though and so I left them with noqas.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116579
Approved by: https://github.com/ezyang
2024-01-01 08:40:46 +00:00
Jerry Zhang
8173d98c57 [quant][be] Skip conv-bn folding when there are no batchnorm ops (#116440)
Summary:
`_fold_conv_bn_qat` is taking a long time currently, so skipping it when it's not necessary,
we can have follow up fixes to actually reduce the patterns or cache the patterns if possible

Test Plan:
uncomment the print in `test_speed`, run

python test/test_quantization.py -k test_speed

and make sure the convert time is low, e.g. 0.1s instead of 8-9 seconds

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/116440
Approved by: https://github.com/andrewor14
2023-12-28 23:33:21 +00:00
Aaron Gokaslan
bbe3261dd3 [BE]: Use iterable.chain.from_iterable where possible (#116376)
This is more readable and more efficient when dealing with lots of sequences to chain together.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116376
Approved by: https://github.com/albanD
2023-12-27 19:20:07 +00:00
Xuehai Pan
199e07f108 [pytree][BE] update treespec num_children access (#116370)
Change `len(treespec.children_spes) -> treespec.num_children`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116370
Approved by: https://github.com/Skylion007
2023-12-24 20:54:32 +00:00
Jerry Zhang
db25462ffd [quant][pt2e] Relax constraints on dtype and qscheme to allow for customizations (#116287)
Summary:
att

Test Plan:
CI

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/116287
Approved by: https://github.com/kimishpatel
2023-12-22 03:12:04 +00:00
andrewor14
6988e40b48 [quant][fx] Lower operator.matmul in convert_fx (#113954)
Summary: We support lowering `torch.matmul` but not
`operator.matmul`. This commit adds support for the latter,
which enables lowering the shorthand `@`. This address
https://github.com/pytorch/pytorch/issues/111450.

Test Plan:
python test/test_quantization.py TestQuantizeFx

Reviewers: jerryzh168

Subscribers: jerryzh168, supriyar
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113954
Approved by: https://github.com/jerryzh168
2023-12-12 00:34:58 +00:00
HDCharles
b5d3d3ebf0 [ao] making hist_obs handle torch.inf and closeby values (#103467)
Summary: This PR does 2 things:

1) Previously this would simply error, now it will ignore any
torch.inf values that it recieves. note: The code checks for torch.inf after
aminmax that way if there are no torch.inf values found, the perf is a
relatively unchanged

2) as mentioned in https://github.com/pytorch/pytorch/issues/100051,
values close to (but not quite at) the maximum/minimum float value could
overflow to infinity in the course of _adjust_min_max() (when this large
value would be multiplied by something in the middle of a calculation
that would otherwise result in a non inf value). This was fixed by
rearranging the order of operations for the lines in question without
altering the actual equations. Specifically, where operations in lines
1095, 1098 and 1100 have multiplication and division of large values,
its better to divide the two large values before multiplying, rather
than multiplying the two large values together (creating overflow) before dividing like it had been.

Test Plan: python test/test_quantization.py
TestObserver.test_histogram_observer_ignore_infinity

python test/test_quantization.py TestObserver.test_histogram_observer_handle_close_to_infinity
Reviewers:

Subscribers:

Tasks:

Tags:

Differential Revision: [D51489345](https://our.internmc.facebook.com/intern/diff/D51489345)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103467
Approved by: https://github.com/andrewor14
2023-12-08 21:41:31 +00:00
Jerry Zhang
cc8f6f56dc [quant][pt2e] Add convert callback to Observer module (#115001)
Summary:
This is to allow easier extension of quant workflow in the future, as we are seening more
diverse ways of doing quantization

putting up this for feedbacks first

Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_observer_callback

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115001
Approved by: https://github.com/kimishpatel
2023-12-08 13:47:37 +00:00
Jerry Zhang
ecba053cff [quant][pt2e] XNNPACKQuantizer skip inserting observers for non-float Tensors (#114999)
Summary:
att

Test Plan:
python test/test_quantization.py -k test_add_mul_long

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114999
Approved by: https://github.com/kimishpatel, https://github.com/guangy10
2023-12-07 22:13:36 +00:00
leslie-fang-intel
7ec145bfed [Quant] [PT2] Fix XNNPACKQuantizer set_module_type issue (#115252)
**Summary**
Fix the issue https://github.com/pytorch/pytorch/issues/115251, the root-cause is we pass the `filter_fn` parameter of `find_sequential_partitions` in wrong position. Use keyword arg to fix this issue.

**Summary**
```
python -u -m pytest -s -v test_quantization.py -k test_set_module_type_case_2
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115252
Approved by: https://github.com/jgong5, https://github.com/jerryzh168
2023-12-07 03:08:20 +00:00
leslie-fang-intel
1489e4bcf3 [Quant] [PT2] Enable batchnorm in _move_exported_model_to_eval (#114547)
**Summary**
Add standalone batchnorm into `_move_exported_model_to_eval` to move it from training mode into eval mode

**Test Plan**
```
python -m pytest test_mkldnn_pattern_matcher.py -k test_qat_bn_conv2d
python -u -m pytest -s -v test_quantize_pt2e.py -k test_bn_move_exported_model_to_eval
```

Differential Revision: [D51853407](https://our.internmc.facebook.com/intern/diff/D51853407)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114547
Approved by: https://github.com/jgong5, https://github.com/andrewor14
2023-12-06 19:51:22 +00:00
Jerry Zhang
1474dad28c [quant][pt2e][xnnpack] Add support for QAT dynamic quantization for linear in XNNPACKQuantizer (#113288)
Summary:
FX graph mode quant workflow and also pt2e flow relies on the `is_dynamic` flag in observer/quantizationspec to
convert an observer to dynamic quantization patterns (choose_qparams -> q -> dq), this PR added is_dynamic flag
for all observers so that it's possible to convert these observers to the pattern.

However, this dynamic quantization pattern (choose_qparams -> q -> dq) is actually only valid for MovingAverageObserver(averaging_constant=1)
for the computation before convert and after convert to match in the context of QAT. So we'll have some sanity
checks in other observers to make sure the is_dynamic is False.

Test Plan:
python test/test_quantization.py TestXNNPACKQuantizer.test_qat_dynamic_linear

Reviewers:

Subscribers:

Tasks:

Tags:

Differential Revision: [D51124725](https://our.internmc.facebook.com/intern/diff/D51124725)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113288
Approved by: https://github.com/kimishpatel
2023-12-04 23:06:38 +00:00
Jerry Zhang
8f164017ee [quant][pt2e][xnnpack] XNNPACKQuantizer skip quantization for input and output to workaround histogram observer problem (#113405)
Summary:
att, this is because histogram observer does not work for a corner case in mobilebert (observing a scalar tensor of float32 max value)
because histc operator errors out when the value is larger than certain number

Test Plan:
python test/test_quantization.py -k test_mul_float32_max

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113405
Approved by: https://github.com/mcr229
2023-12-02 00:44:42 +00:00
PyTorch MergeBot
c6e975bc0e Revert "[Quant] [PT2] Enable batchnorm in _move_exported_model_to_eval (#114547)"
This reverts commit bab054063c.

Reverted https://github.com/pytorch/pytorch/pull/114547 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/114547#issuecomment-1836612143))
2023-12-01 18:52:51 +00:00
Jerry Zhang
64fd706b21 [quant][pt2e] Add generate_numeric_debug_handle pass (#114315)
Summary:
This is a util for numeric suite in pt2 export so that we can build
a more streamlined UX for numerical debugging in quant + executorch stack

Test Plan:
python test/test_quantization.py TestGenerateNumericDebugHandle

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114315
Approved by: https://github.com/zhxchen17
2023-12-01 03:38:17 +00:00
leslie-fang-intel
fd7201029a [Quant] [PT2] Enable Inplace Dropout in _move_exported_model_to_eval (#114725)
**Summary**
Enable Inplace Dropout replacement in `_move_exported_model_to_eval`

**Test Plan**
```
python -u -m pytest -s -v test_quantize_pt2e.py -k test_move_exported_model_to_eval
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114725
Approved by: https://github.com/andrewor14, https://github.com/jgong5
ghstack dependencies: #114547
2023-11-30 04:43:22 +00:00
leslie-fang-intel
bab054063c [Quant] [PT2] Enable batchnorm in _move_exported_model_to_eval (#114547)
**Summary**
Add standalone batchnorm into `_move_exported_model_to_eval` to move it from training mode into eval mode

**Test Plan**
```
python -m pytest test_mkldnn_pattern_matcher.py -k test_qat_bn_conv2d
python -u -m pytest -s -v test_quantize_pt2e.py -k test_bn_move_exported_model_to_eval
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114547
Approved by: https://github.com/jgong5, https://github.com/andrewor14
2023-11-30 04:31:27 +00:00
Jerry Zhang
d2f4215dbb [quant][pt2e] Fix the order for implicit sharing code (#114704)
Summary:
Current order of implicit sharing breaks common annotation patterns of SharedQuantizationSpec, so we changed the order here.
But it's not going to work in all possible annotation cases, so quantizer implementors still need to be careful.
In general if people only refer to node/edges that comes before the current node/edge in SharedQuantizationSpec, it should work I think

Test Plan: CI, make sure this Fixed some internal tests

Differential Revision: D51605918

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114704
Approved by: https://github.com/andrewor14
2023-11-29 08:58:28 +00:00
leslie-fang-intel
8c1f65dc2b [Quant] [PT2] Add Hardtanh and ReLU6 into X86InductorQuantizer Conv2d Unary Annotation (#114579)
**Summary**
Add `Hardtanh` and `ReLU6` into X86InductorQuantizer Conv2d Unary Annotation

**TestPlan**
```
python -m pytest test_x86inductor_quantizer.py -k test_conv2d_unary
python -m pytest test_x86inductor_quantizer.py -k test_qat_conv2d_unary
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114579
Approved by: https://github.com/jgong5, https://github.com/jerryzh168
ghstack dependencies: #114578
2023-11-28 07:18:00 +00:00
Jez Ng
5cfa0647a7 Update mypy to 1.7.0 (#114160)
It appears that `mypy` is now checking a few more previously-unchecked files; these files
are being found via import-following. Not sure exactly why they weren't being checked before.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114160
Approved by: https://github.com/eellison
ghstack dependencies: #114162
2023-11-28 06:45:55 +00:00
leslie-fang-intel
74370a8a9d Add adaptive_avg_pool2d and flatten into x86 Inductor Quantizer recipe (#114442)
**Summary**
Add adaptive_avg_pool2d and flatten into x86 Inductor Quantizer recipe

**Test Plan**
```
python -m pytest test_x86inductor_quantizer.py -k test_adaptive_avg_pool2d_recipe
python -m pytest test_x86inductor_quantizer.py -k test_flatten_recipe
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114442
Approved by: https://github.com/jgong5, https://github.com/jerryzh168
2023-11-28 01:35:57 +00:00
leslie-fang-intel
e592b9a469 [Quant] [PT2] Fix an issue in Conv Binary Quantization Annotation (#114540)
**Summary**
To annotate a conv-binary pattern, should skip the pattern if the conv node has more than one user.

**Test Plan**
```
python -m pytest test_x86inductor_quantizer.py -k test_conv2d_binary2
python -m pytest test_x86inductor_quantizer.py -k test_qat_conv2d_binary2
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114540
Approved by: https://github.com/jgong5, https://github.com/jerryzh168
2023-11-28 01:06:48 +00:00
Jerry Zhang
d6578b3678 [quant][pt2e] Refactor some internal code for observer insertion (#113500)
Summary:
att

Test Plan:
python test/test_quantization.py TestQuantizePT2E

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113500
Approved by: https://github.com/kimishpatel
2023-11-22 17:44:46 +00:00
Jerry Zhang
a785fbe513 [reland][quant][pt2e] Refactor insert observer to do sharing checking in the same place (#113458) (#113920)
Summary:
Previously it is scatter in two different places: before inserting observer and during observer,
this PR moved everything before we insert observer

* Next: refactor QuantizationSpec and check more fields for sharing

Test Plan:
CI (regression tests)

Reviewers:

Subscribers:

Tasks:

Tags:

Differential Revision: [D51420029](https://our.internmc.facebook.com/intern/diff/D51420029)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113920
Approved by: https://github.com/andrewor14
2023-11-22 01:48:51 +00:00
andrewor14
e5102ccd27 [quant][pt2] Support conv1d-bn QAT fusion (#113714)
Summary: Previously the PT2 QAT code only supported conv2d-bn.
This commit extends all existing QAT fusion support to conv1d-bn,
including support for all variants like relu, no bias, literal
args, cuda etc. This commit also refactors the code such that
we can support conv3d-bn easily in the future.

Test Plan:
python test/test_quantization.py TestQuantizePT2EQAT_ConvBn1d

Reviewers: jerryzh168, kimishpatel

Subscribers: jerryzh168, kimishpatel, supriyar

Differential Revision: [](https://our.internmc.facebook.com/intern/diff/)

Differential Revision: [D51428979](https://our.internmc.facebook.com/intern/diff/D51428979)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113714
Approved by: https://github.com/jerryzh168
2023-11-17 22:09:30 +00:00
Aaron Gokaslan
d9f2cf9974 [BE]: Enable ruff rule PIE800 - unnecessary nested dict expansion (#113880)
Adds an additional list which removes unnecessary dict literal unpacking, also applies the fixes.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113880
Approved by: https://github.com/albanD
2023-11-16 22:34:38 +00:00
PyTorch MergeBot
3c4e4d9947 Revert "[quant][pt2e] Refactor insert observer to do sharing checking in the same place (#113458)"
This reverts commit 585e315b3a.

Reverted https://github.com/pytorch/pytorch/pull/113458 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it is failing executorch export test for llama2 ([comment](https://github.com/pytorch/pytorch/pull/113458#issuecomment-1815280715))
2023-11-16 20:43:38 +00:00
andrewor14
8241fe6edb [quant][pt2][be] Rewrite QAT annotations using subgraph matcher (#113709)
Summary: This is the recommended way to write quantizers according
to https://pytorch.org/tutorials/prototype/pt2e_quantizer.html#a-note-on-ir-for-pt2e-quantization-flow.
It is agnostic to changes in the aten IR and can be easily extended
to support conv1d-bn and conv3d-bn fusion patterns in the future.
This is the first step towards rewriting XNNPACKQuantizer using
this subgraph matcher.

Test Plan:
python test/test_quantization.py TestQuantizePT2EQAT_ConvBn2d

Reviewers: jerryzh168, kimishpatel

Subscribers: jerryzh168, kimishpatel, supriyar

Differential Revision: [D51366525](https://our.internmc.facebook.com/intern/diff/D51366525)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113709
Approved by: https://github.com/jerryzh168
2023-11-16 03:57:37 +00:00
Jerry Zhang
585e315b3a [quant][pt2e] Refactor insert observer to do sharing checking in the same place (#113458)
Summary:
Previously it is scatter in two different places: before inserting observer and during observer,
this PR moved everything before we insert observer

* Next: refactor QuantizationSpec and check more fields for sharing

Test Plan:
CI (regression tests)

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113458
Approved by: https://github.com/kimishpatel
2023-11-15 21:08:39 +00:00
albanD
296c9e3ce7 upgrade lintrunner to the lowest supported versions on python 3.12 (#113562)
As per title, the current versions fail to install on 3.12.

The failures are related to https://github.com/numpy/numpy/issues/25147
They are fixed by adding manual annotations for the code in PyTorch and ignoring them on caffe2 as discussed with @malfet.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113562
Approved by: https://github.com/malfet
2023-11-15 18:12:01 +00:00
Andrew Hoblitzell
9724d0fd87 docstyle _correct_bias.py _equalize.py _learnable_fake_quantize.py backend_config experimental fake_quantize.py fuse_modules.py fuser_method_mappings.py (#112992)
Fixes #112988

For files

__init__.py
_correct_bias.py
_equalize.py
_learnable_fake_quantize.py
backend_config
experimental
fake_quantize.py
fuse_modules.py
fuser_method_mappings.py

Correct the following

__init__.py:1 at module level:
        D104: Missing docstring in public package
__init__.py:144 in public function `default_eval_fn`:
        D205: 1 blank line required between summary line and description (found 0)
__init__.py:144 in public function `default_eval_fn`:
        D400: First line should end with a period (not 'f')
__init__.py:144 in public function `default_eval_fn`:
        D401: First line should be in imperative mood; try rephrasing (found 'Default')
__init__.py:152 in private class `_DerivedObserverOrFakeQuantize`:
        D204: 1 blank line required after class docstring (found 0)
__init__.py:152 in private class `_DerivedObserverOrFakeQuantize`:
        D205: 1 blank line required between summary line and description (found 0)
__init__.py:152 in private class `_DerivedObserverOrFakeQuantize`:
        D210: No whitespaces allowed surrounding docstring text
__init__.py:152 in private class `_DerivedObserverOrFakeQuantize`:
        D400: First line should end with a period (not 's')
_correct_bias.py:20 in public function `get_module`:
        D200: One-line docstring should fit on one line with quotes (found 2)
_correct_bias.py:20 in public function `get_module`:
        D210: No whitespaces allowed surrounding docstring text
_correct_bias.py:20 in public function `get_module`:
        D300: Use """triple double quotes""" (found '''-quotes)
_correct_bias.py:20 in public function `get_module`:
        D400: First line should end with a period (not 'l')
_correct_bias.py:25 in public function `parent_child_names`:
        D200: One-line docstring should fit on one line with quotes (found 2)
_correct_bias.py:25 in public function `parent_child_names`:
        D300: Use """triple double quotes""" (found '''-quotes)
_correct_bias.py:25 in public function `parent_child_names`:
        D400: First line should end with a period (not 'e')
_correct_bias.py:25 in public function `parent_child_names`:
        D401: First line should be in imperative mood (perhaps 'Split', not 'Splits')
_correct_bias.py:34 in public function `get_param`:
        D205: 1 blank line required between summary line and description (found 0)
_correct_bias.py:34 in public function `get_param`:
        D210: No whitespaces allowed surrounding docstring text
_correct_bias.py:34 in public function `get_param`:
        D300: Use """triple double quotes""" (found '''-quotes)
_correct_bias.py:34 in public function `get_param`:
        D400: First line should end with a period (not 's')
_correct_bias.py:44 in public class `MeanShadowLogger`:
        D204: 1 blank line required after class docstring (found 0)
_correct_bias.py:44 in public class `MeanShadowLogger`:
        D205: 1 blank line required between summary line and description (found 0)
_correct_bias.py:44 in public class `MeanShadowLogger`:
        D400: First line should end with a period (not 'n')
_correct_bias.py:47 in public method `__init__`:
        D107: Missing docstring in __init__
_correct_bias.py:56 in public method `forward`:
        D205: 1 blank line required between summary line and description (found 0)
_correct_bias.py:56 in public method `forward`:
        D210: No whitespaces allowed surrounding docstring text
_correct_bias.py:56 in public method `forward`:
        D300: Use """triple double quotes""" (found '''-quotes)
_correct_bias.py:56 in public method `forward`:
        D401: First line should be in imperative mood; try rephrasing (found 'The')
_correct_bias.py:77 in public method `clear`:
        D102: Missing docstring in public method
_correct_bias.py:85 in public function `bias_correction`:
        D205: 1 blank line required between summary line and description (found 0)
_correct_bias.py:85 in public function `bias_correction`:
        D210: No whitespaces allowed surrounding docstring text
_correct_bias.py:85 in public function `bias_correction`:
        D300: Use """triple double quotes""" (found '''-quotes)
_correct_bias.py:85 in public function `bias_correction`:
        D400: First line should end with a period (not 's')
_correct_bias.py:85 in public function `bias_correction`:
        D401: First line should be in imperative mood (perhaps 'Use', not 'Using')
_equalize.py:22 in public function `set_module_weight`:
        D103: Missing docstring in public function
_equalize.py:28 in public function `set_module_bias`:
        D103: Missing docstring in public function
_equalize.py:34 in public function `get_module_weight`:
        D103: Missing docstring in public function
_equalize.py:40 in public function `get_module_bias`:
        D103: Missing docstring in public function
_equalize.py:47 in public function `max_over_ndim`:
        D200: One-line docstring should fit on one line with quotes (found 2)
_equalize.py:47 in public function `max_over_ndim`:
        D210: No whitespaces allowed surrounding docstring text
_equalize.py:47 in public function `max_over_ndim`:
        D300: Use """triple double quotes""" (found '''-quotes)
_equalize.py:47 in public function `max_over_ndim`:
        D400: First line should end with a period (not 's')
_equalize.py:47 in public function `max_over_ndim`:
        D401: First line should be in imperative mood (perhaps 'Apply', not 'Applies')
_equalize.py:55 in public function `min_over_ndim`:
        D200: One-line docstring should fit on one line with quotes (found 2)
_equalize.py:55 in public function `min_over_ndim`:
        D210: No whitespaces allowed surrounding docstring text
_equalize.py:55 in public function `min_over_ndim`:
        D300: Use """triple double quotes""" (found '''-quotes)
_equalize.py:55 in public function `min_over_ndim`:
        D400: First line should end with a period (not 's')
_equalize.py:55 in public function `min_over_ndim`:
        D401: First line should be in imperative mood (perhaps 'Apply', not 'Applies')
_equalize.py:63 in public function `channel_range`:
        D200: One-line docstring should fit on one line with quotes (found 2)
_equalize.py:63 in public function `channel_range`:
        D210: No whitespaces allowed surrounding docstring text
_equalize.py:63 in public function `channel_range`:
        D300: Use """triple double quotes""" (found '''-quotes)
_equalize.py:63 in public function `channel_range`:
        D400: First line should end with a period (not 'l')
_equalize.py:63 in public function `channel_range`:
        D401: First line should be in imperative mood (perhaps 'Find', not 'finds')
_equalize.py:63 in public function `channel_range`:
        D403: First word of the first line should be properly capitalized ('Finds', not 'finds')
_equalize.py:76 in public function `cross_layer_equalization`:
        D205: 1 blank line required between summary line and description (found 0)
_equalize.py:76 in public function `cross_layer_equalization`:
        D210: No whitespaces allowed surrounding docstring text
_equalize.py:76 in public function `cross_layer_equalization`:
        D300: Use """triple double quotes""" (found '''-quotes)
_equalize.py:76 in public function `cross_layer_equalization`:
        D400: First line should end with a period (not 't')
_equalize.py:120 in public function `equalize`:
        D205: 1 blank line required between summary line and description (found 0)
_equalize.py:120 in public function `equalize`:
        D210: No whitespaces allowed surrounding docstring text
_equalize.py:120 in public function `equalize`:
        D300: Use """triple double quotes""" (found '''-quotes)
_equalize.py:120 in public function `equalize`:
        D400: First line should end with a period (not 'l')
_equalize.py:159 in public function `converged`:
        D205: 1 blank line required between summary line and description (found 0)
_equalize.py:159 in public function `converged`:
        D210: No whitespaces allowed surrounding docstring text
_equalize.py:159 in public function `converged`:
        D300: Use """triple double quotes""" (found '''-quotes)
_equalize.py:159 in public function `converged`:
        D400: First line should end with a period (not 's')
_equalize.py:159 in public function `converged`:
        D401: First line should be in imperative mood (perhaps 'Test', not 'Tests')
_learnable_fake_quantize.py:8 in private class `_LearnableFakeQuantize`:
        D204: 1 blank line required after class docstring (found 0)
_learnable_fake_quantize.py:8 in private class `_LearnableFakeQuantize`:
        D205: 1 blank line required between summary line and description (found 0)
_learnable_fake_quantize.py:8 in private class `_LearnableFakeQuantize`:
        D210: No whitespaces allowed surrounding docstring text
_learnable_fake_quantize.py:8 in private class `_LearnableFakeQuantize`:
        D400: First line should end with a period (not 'h')
_learnable_fake_quantize.py:68 in private method `enable_param_learning`:
        D205: 1 blank line required between summary line and description (found 0)
_learnable_fake_quantize.py:68 in private method `enable_param_learning`:
        D400: First line should end with a period (not 'd')
_learnable_fake_quantize.py:68 in private method `enable_param_learning`:
        D401: First line should be in imperative mood (perhaps 'Enable', not 'Enables')
_learnable_fake_quantize.py:78 in private method `enable_static_estimate`:
        D205: 1 blank line required between summary line and description (found 0)
_learnable_fake_quantize.py:78 in private method `enable_static_estimate`:
        D400: First line should end with a period (not 'f')
_learnable_fake_quantize.py:78 in private method `enable_static_estimate`:
        D401: First line should be in imperative mood (perhaps 'Enable', not 'Enables')
_learnable_fake_quantize.py:87 in private method `enable_static_observation`:
        D205: 1 blank line required between summary line and description (found 0)
_learnable_fake_quantize.py:87 in private method `enable_static_observation`:
        D400: First line should end with a period (not 't')
_learnable_fake_quantize.py:87 in private method `enable_static_observation`:
        D401: First line should be in imperative mood (perhaps 'Enable', not 'Enables')
fake_quantize.py:1 at module level:
        D205: 1 blank line required between summary line and description (found 0)
fake_quantize.py:1 at module level:
        D400: First line should end with a period (not 'n')
fake_quantize.py:61 in public class `FakeQuantizeBase`:
        D205: 1 blank line required between summary line and description (found 0)
fake_quantize.py:61 in public class `FakeQuantizeBase`:
        D210: No whitespaces allowed surrounding docstring text
fake_quantize.py:61 in public class `FakeQuantizeBase`:
        D400: First line should end with a period (not 'e')
fake_quantize.py:74 in public method `__init__`:
        D107: Missing docstring in __init__
fake_quantize.py:83 in public method `forward`:
        D102: Missing docstring in public method
fake_quantize.py:87 in public method `calculate_qparams`:
        D102: Missing docstring in public method
fake_quantize.py:91 in public method `enable_fake_quant`:
        D102: Missing docstring in public method
fake_quantize.py:95 in public method `disable_fake_quant`:
        D102: Missing docstring in public method
fake_quantize.py:99 in public method `enable_observer`:
        D102: Missing docstring in public method
fake_quantize.py:103 in public method `disable_observer`:
        D102: Missing docstring in public method
fake_quantize.py:107 in public method `with_args`:
        D102: Missing docstring in public method
fake_quantize.py:115 in public class `FakeQuantize`:
        D205: 1 blank line required between summary line and description (found 0)
fake_quantize.py:115 in public class `FakeQuantize`:
        D210: No whitespaces allowed surrounding docstring text
fake_quantize.py:115 in public class `FakeQuantize`:
        D412: No blank lines allowed between a section header and its content ('Attributes')
fake_quantize.py:150 in public method `__init__`:
        D107: Missing docstring in __init__
fake_quantize.py:188 in public method `calculate_qparams`:
        D102: Missing docstring in public method
fake_quantize.py:191 in public method `forward`:
        D102: Missing docstring in public method
fake_quantize.py:214 in public method `extra_repr`:
        D102: Missing docstring in public method
fake_quantize.py:262 in public class `FixedQParamsFakeQuantize`:
        D205: 1 blank line required between summary line and description (found 0)
fake_quantize.py:262 in public class `FixedQParamsFakeQuantize`:
        D210: No whitespaces allowed surrounding docstring text
fake_quantize.py:262 in public class `FixedQParamsFakeQuantize`:
        D400: First line should end with a period (not 'n')
fake_quantize.py:268 in public method `__init__`:
        D107: Missing docstring in __init__
fake_quantize.py:279 in public method `calculate_qparams`:
        D102: Missing docstring in public method
fake_quantize.py:283 in public method `extra_repr`:
        D102: Missing docstring in public method
fake_quantize.py:292 in public class `FusedMovingAvgObsFakeQuantize`:
        D205: 1 blank line required between summary line and description (found 0)
fake_quantize.py:292 in public class `FusedMovingAvgObsFakeQuantize`:
        D400: First line should end with a period (not 'e')
fake_quantize.py:307 in public method `__init__`:
        D107: Missing docstring in __init__
fake_quantize.py:322 in public method `calculate_qparams`:
        D102: Missing docstring in public method
fake_quantize.py:326 in public method `extra_repr`:
        D102: Missing docstring in public method
fake_quantize.py:342 in public method `forward`:
        D102: Missing docstring in public method
fake_quantize.py:480 in private function `_is_fake_quant_script_module`:
        D200: One-line docstring should fit on one line with quotes (found 2)
fake_quantize.py:480 in private function `_is_fake_quant_script_module`:
        D210: No whitespaces allowed surrounding docstring text
fake_quantize.py:480 in private function `_is_fake_quant_script_module`:
        D300: Use """triple double quotes""" (found '''-quotes)
fake_quantize.py:480 in private function `_is_fake_quant_script_module`:
        D401: First line should be in imperative mood (perhaps 'Return', not 'Returns')
fake_quantize.py:491 in public function `disable_fake_quant`:
        D400: First line should end with a period (not ':')
fake_quantize.py:502 in public function `enable_fake_quant`:
        D400: First line should end with a period (not ':')
fake_quantize.py:513 in public function `disable_observer`:
        D400: First line should end with a period (not ':')
fake_quantize.py:524 in public function `enable_observer`:
        D400: First line should end with a period (not ':')
fuse_modules.py:1 at module level:
        D100: Missing docstring in public module
fuse_modules.py:39 in public function `fuse_known_modules`:
        D205: 1 blank line required between summary line and description (found 0)
fuse_modules.py:39 in public function `fuse_known_modules`:
        D400: First line should end with a period (not 'd')
fuse_modules.py:39 in public function `fuse_known_modules`:
        D401: First line should be in imperative mood (perhaps 'Return', not 'Returns')
fuse_modules.py:104 in public function `fuse_modules`:
        D400: First line should end with a period (not 'e')
fuse_modules.py:167 in public function `fuse_modules_qat`:
        D200: One-line docstring should fit on one line with quotes (found 2)
fuse_modules.py:167 in public function `fuse_modules_qat`:
        D210: No whitespaces allowed surrounding docstring text
fuse_modules.py:167 in public function `fuse_modules_qat`:
        D400: First line should end with a period (not '`')
fuser_method_mappings.py:1 at module level:
        D100: Missing docstring in public module
fuser_method_mappings.py:18 in public function `fuse_conv_bn`:
        D400: First line should end with a period (not 'e')
fuser_method_mappings.py:55 in public function `fuse_conv_bn_relu`:
        D400: First line should end with a period (not 'e')
fuser_method_mappings.py:102 in public function `fuse_linear_bn`:
        D400: First line should end with a period (not 'e')
fuser_method_mappings.py:131 in public function `fuse_convtranspose_bn`:
        D400: First line should end with a period (not 'e')
fuser_method_mappings.py:154 in private function `_sequential_wrapper2`:
        D205: 1 blank line required between summary line and description (found 0)
fuser_method_mappings.py:154 in private function `_sequential_wrapper2`:
        D210: No whitespaces allowed surrounding docstring text
fuser_method_mappings.py:154 in private function `_sequential_wrapper2`:
        D400: First line should end with a period (not 's')
fuser_method_mappings.py:182 in public function `get_fuser_method`:
        D205: 1 blank line required between summary line and description (found 0)
fuser_method_mappings.py:182 in public function `get_fuser_method`:
        D210: No whitespaces allowed surrounding docstring text
fuser_method_mappings.py:182 in public function `get_fuser_method`:
        D300: Use """triple double quotes""" (found '''-quotes)
fuser_method_mappings.py:182 in public function `get_fuser_method`:
        D400: First line should end with a period (not ',')
fuser_method_mappings.py:205 in private function `_get_valid_patterns`:
        D205: 1 blank line required between summary line and description (found 0)
fuser_method_mappings.py:205 in private function `_get_valid_patterns`:
        D400: First line should end with a period (not ',')
fuser_method_mappings.py:205 in private function `_get_valid_patterns`:
        D401: First line should be in imperative mood (perhaps 'Return', not 'Returns')
fuser_method_mappings.py:238 in public function `get_fuser_method_new`:
        D205: 1 blank line required between summary line and description (found 0)
fuser_method_mappings.py:238 in public function `get_fuser_method_new`:
        D210: No whitespaces allowed surrounding docstring text
fuser_method_mappings.py:238 in public function `get_fuser_method_new`:
        D400: First line should end with a period (not 'd')
fuser_method_mappings.py:238 in public function `get_fuser_method_new`:
        D401: First line should be in imperative mood; try rephrasing (found 'This')

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112992
Approved by: https://github.com/kit1980
2023-11-15 00:59:44 +00:00
andrewor14
14eb92cb43 [quant][pt2][be] Remove add/relu from conv-bn QAT pattern (#113006)
Summary: This commit significantly simplifies the QAT fusion
code for the `conv-bn` pattern by removing add and relu nodes
from the match and replacement patterns. This does not reduce
functionality; patterns like `conv-bn-relu`, `conv-bn-add`,
and `conv-bn-add-relu` are still supported. We simply do not
match these extra nodes, since there is actually no need to
replace them.

This has the additional benefit of reducing the number of
patterns being matched by 16x, since for each add and relu
variant of the `conv-bn` pattern there is also an in-place
variant. This also enables more flexible `conv-bn` pattern
matching in the future and keeps the number of patterns
more scalable.

One important change needed in this commit was to remove
the match filter that requires the input and output
activations to be quantized. This was necessary because
otherwise we would always expect q-dq nodes immediately
after the getitem node, instead of after the add or relu
nodes for example. This has another side benefit of
keeping QAT fusion flexible enough to support weight
only quantization.

Test Plan:
python test/test_quantization.py TestQuantizePT2EQAT

Reviewers: jerryzh168, kimishpatel

Subscribers: jerryzh168, kimishpatel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113006
Approved by: https://github.com/jerryzh168
2023-11-14 16:08:37 +00:00
Snabel Kabiya
c21320b3b1 CPU Publish: Fix Assign device error, when module has multiple devices (#109149) (#113509)
Summary:
new version of this: https://www.internalfb.com/diff/D49110166?dst_version_fbid=252052334533986

Fix Assign device error, when module has multiple devices
If fc_fp16_quantization enabled for CPU model.
And module REMOTE_OTHER has multiple devices: {device(type='meta'), device(type='cpu')}
We fail on this assertion:
fbcode/caffe2/torch/ao/quantization/fx/utils.py
232
    assert len(devices) <= 1, (
Since CPU models work on CPU devices, added a condition before the assertion.
In case, we have CPU in module list of devices. Set device as CPU.
Please see debug details:
https://docs.google.com/document/d/1pMPCeJyMPA15NhFc2uAyNDkS9azR40uaNyOP0DIgHjU/edit

Test Plan:
AIMP_DISAGG_CPU=true buck run mode/opt -c python.package_style=inplace -c fbcode.enable_gpu_sections=true lego/scripts:lego_cli -- run-locally --model_entity_id 959168967 --config_version 28 --publish_context OFFLINE_PUBLISH --lego_pipeline aiplatform.modelstore.model_generation.lego.lego_pipeline_builder.gmpp_lego_pipeline --gmpp_config '{"gmpp_pipeline_descriptor": "aiplatform.modelstore.model_generation.v1.ads_pipelines.aimp_pyper_pipeline.model_generation_pipeline", "worker_process_number":12, "worker_thread_per_process_number": 6, "use_work_assignment": true}' 2>&1 | tee /tmp/gmpp_lc.txt
Snapshot:
https://www.internalfb.com/manifold/explorer/ads_storage_fblearner/tree/user/facebook/fblearner/predictor/959168967/47

Differential Revision: D51226114

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113509
Approved by: https://github.com/jerryzh168
2023-11-14 06:15:32 +00:00
Jerry Zhang
501d118255 [quant][pt2e] Add transform_for_annotation method in Quantizer (#113115)
Summary:
Adding the method so that people can do some transformations before annotation to make the graph easier to annotate

Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_transform_for_annotation

Reviewers:

Subscribers:

Tasks:

Tags:

Differential Revision: [D51141080](https://our.internmc.facebook.com/intern/diff/D51141080)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113115
Approved by: https://github.com/kimishpatel
2023-11-09 20:23:29 +00:00
Jerry Zhang
12c257cc00 [qunat][pt2e] Support allow_implicit_sharing flag (#112929)
Summary:
For a Node: node1 and edge: (node1, node2), since they are observing the same
Tensor, we may want to implicitly share observers, this flag allows people to
turn off this behavior for the output of the node

See the test_allow_implicit_sharing test for use case

Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_allow_implicit_sharing

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112929
Approved by: https://github.com/kimishpatel
2023-11-08 23:47:17 +00:00
andrewor14
c0aba9be41 [quant][pt2] Fix custom dtype per channel weight in QAT (#112612)
Summary: Previously we only copied over q/dq args for the per
tensor case. This was because the qparams for `quantize_per_tensor`
are literals while the qparams for `quantize_per_channel` are
`get_attr` nodes (tensors), which disappear from the original
nodes in the graph after subgraph rewriting.

However, this is problematic because, in the per channel case,
not all q/dq args are tensors. In particular, the args after
the qparams (axis, qmin, qmax, dtype) are all literals. For
these literal args we simply used the hardcoded ones
(0, -127, 127, torch.int8 respectively), even if the user
explicitly specified to use a different weight dtype. This
commit fixes this by copying over these literal args for the
per channel case as well.

Test Plan:
python test/test_quantization.py TestQuantizePT2EQAT.test_qat_per_channel_weight_custom_dtype

Reviewers: jerryzh168, kimishpatel

Subscribers: jerryzh168, kimishpatel, supriyar
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112612
Approved by: https://github.com/jerryzh168
2023-11-07 20:10:53 +00:00
Aaron Gokaslan
8219bf051b [BE]: Apply RUF015 to torch folder (#113025)
Removes unnecessary allocations of iterators. There is a small chance this may have side effects as the entire iterator is no longer consumed, but this is a way more efficient method for retrieving the first element.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113025
Approved by: https://github.com/ezyang, https://github.com/malfet
2023-11-07 00:48:15 +00:00
andrewor14
b6e85eb8d5 [quant][pt2] Support quantized conv bias in QAT fusion (#112528)
Summary: Previously QAT fusion assumes bias is not quantized.
This works for the existing XNNPACKQuantizer, but not for custom
quantizers that wish to quantize the bias. This commit supports
this by adding the necessary patterns. This requires refactoring
the code, however, since it previously assumed that there will
only be one pair of q-dq (from conv weight) in the matched
pattern, and this is no longer true.

Test Plan:
python test/test_quantization.py TestQuantizePT2EQAT.test_qat_conv_bn_bias_derived_qspec

Reviewers: jerryzh168, kimishpatel

Subscribers: jerryzh168, kimishpatel, supriyar

Differential Revision: [D50856377](https://our.internmc.facebook.com/intern/diff/D50856377)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112528
Approved by: https://github.com/jerryzh168
2023-11-06 17:58:57 +00:00
leslie-fang-intel
6ba2748690 [Quant] [PT2] Enable Decomposed quant per tensor/channel to accept bfloat16 input (#112225)
**Summary**
- PR 4 for enabling Int8-Mixed-BF16 PT2E PTQ Quantization with Inductor https://github.com/pytorch/pytorch/issues/111640.
- Enable `decomposed quant_per_tensor` and `quant_per_channel` accepts bfloat16 input.

**TestPlan**
```
python -m pytest test_quantized_tensor.py -k test_decomposed_quantize_per_tensor_bfloat16_input
python -m pytest test_quantized_tensor.py -k test_decomposed_quantize_per_channel_bfloat16_input
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112225
Approved by: https://github.com/jgong5, https://github.com/jerryzh168
2023-11-03 23:47:43 +00:00
leslie-fang-intel
871e27a61c [Quant] [PT2] Remove the output Annotation of Conv/Linear in x86InductorQuantizer (#112140)
**Summary**
- PR 3 for enabling Int8-Mixed-BF16 PT2E PTQ Quantization with Inductor https://github.com/pytorch/pytorch/issues/111640.
- Remove the output annotation of QConv/QLinear in X86InductorQuantizer.

**Test Plan**
```
python -m pytest test_mkldnn_pattern_matcher.py -k test_qconv2d
python -m pytest test_mkldnn_pattern_matcher.py -k test_qlinear
python -m pytest test_x86inductor_quantizer.py -k Conv2d
python -m pytest test_x86inductor_quantizer.py -k Linear
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112140
Approved by: https://github.com/jgong5, https://github.com/jerryzh168
ghstack dependencies: #112010, #112126
2023-11-03 08:24:55 +00:00
leslie-fang-intel
6c19de07cd [Quant] [PT2] Add ConvBNAdd(ReLU) Annotation into X86InductorQuantizer (#111281)
**Summary**
This PR adds ConvBNAdd(ReLU) QAT Annotation into `X86InductorQuantizer`.

**Test Plan**
```
python -m pytest test_x86inductor_quantizer.py -k test_qat_conv2d_binary_with_quantizer_api
python -m pytest test_x86inductor_quantizer.py -k test_qat_conv2d_binary_unary_with_quantizer_api
python -m pytest test_mkldnn_pattern_matcher.py -k test_qat_qconv2d_add
python -m pytest test_mkldnn_pattern_matcher.py -k test_qat_qconv2d_add_relu
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111281
Approved by: https://github.com/jgong5, https://github.com/jerryzh168
ghstack dependencies: #111280
2023-11-02 02:05:49 +00:00
leslie-fang-intel
56ca0043f6 [Quant] [PT2] Enable QAT Quantization flow in X86InductorQuantizer (#111280)
**Summary**
This PR enables PT2 QAT Quantization flow in `X86InductorQuantizer`.

**Test Plan**
```
python -m pytest test_x86inductor_quantizer.py -k test_qat_conv2d_with_quantizer_api
python -m pytest test_x86inductor_quantizer.py -k test_qat_conv2d_unary_with_quantizer_api
python -m pytest test_mkldnn_pattern_matcher.py -k test_qat_qconv2d
python -m pytest test_mkldnn_pattern_matcher.py -k test_qat_qconv2d_relu
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111280
Approved by: https://github.com/jgong5, https://github.com/jerryzh168
2023-11-02 02:03:10 +00:00
andrewor14
5cd1208415 [quant][pt2][be] Refactor QAT q-dq patterns (#112279)
Summary: This commit refactors q-dq patterns used in QAT fusion,
reducing code duplication. This is important for future efforts
to support quantizing bias.

Test Plan:
python test/test_quantization.py TestQuantizePT2EQAT

Reviewers: jerryzh168, kimishpatel

Subscribers: jerryzh168, kimishpatel, supriyar
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112279
Approved by: https://github.com/jerryzh168
ghstack dependencies: #112159
2023-10-31 18:04:23 +00:00
andrewor14
231129ea36 [quant][pt2] Fix QAT conv-bn bias derived qspec (#112159)
Summary: Today, we have special handling for special qspecs like
`SharedQuantizationSpec` or `DerivedQuantizationSpec`, since these
qspecs refer to other nodes in the graph and these node references
need to be updated after replacement (since they referred to nodes
in the original graph that no longer exist in the new graph).

However, we only do the above for special nodes like conv, bn,
getitem, and relu. This doesn't cover the common use case of
having conv bias derive its qparams from those of conv input
activations and conv weight. This commit adds support for this
use case by also replacing the node references for these nodes.

Test Plan:
python test/test_quantization.py TestQuantizePT2EQAT.test_qat_conv_bn_bias_derived_qspec

Reviewers: jerryzh168, kimishpatel

Subscribers: jerryzh168, kimishpatel, supriyar

Differential Revision: [D50697078](https://our.internmc.facebook.com/intern/diff/D50697078)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112159
Approved by: https://github.com/jerryzh168
2023-10-31 18:04:23 +00:00
Jerry Zhang
3db0095ea2 [reland][quant][pt2e][be] Cleanup observer insertion logic (#111828) (#112453)
Summary: att, after SharedQuantizationSpec bug fix we are doing some checks before hand, this can simplify the logic when we insert observers

Test Plan:
contbuild & OSS CI, see bf998a2c5d

Test plan from GitHub:
python test/test_quantization.py TestQuantizePT2E

CIs

Differential Revision: D50816224

Pulled By: jerryzh168

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112453
Approved by: https://github.com/andrewor14
2023-10-31 17:33:24 +00:00
PyTorch MergeBot
797d7100de Revert "[quant][pt2e][be] Cleanup observer insertion logic (#111828)"
This reverts commit bf998a2c5d.

Reverted https://github.com/pytorch/pytorch/pull/111828 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/111828#issuecomment-1782154648))
2023-10-27 01:35:27 +00:00
Jerry Zhang
bf998a2c5d [quant][pt2e][be] Cleanup observer insertion logic (#111828)
Summary:
att, after SharedQuantizationSpec bug fix we are doing some checks before hand, this can simplify the logic when we insert observers

Test Plan:
python test/test_quantization.py TestQuantizePT2E

CIs

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111828
Approved by: https://github.com/kimishpatel
ghstack dependencies: #111827
2023-10-25 03:48:36 +00:00
Jerry Zhang
6e2dfb360b [quant][be] Clean up prepare code (#111827)
Summary:
att

Test Plan:
CI

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111827
Approved by: https://github.com/andrewor14
2023-10-25 00:14:59 +00:00
Kimish Patel
a8760f1b42 [Quantization] Add a test for QAT + PTQ selective quantization in (#111689)
xnnpack quantizer

Summary:
For some workflows you want to quantize some parts of the model via qat
and then continue eager mode training. After training, you want to
export the whole model and perform PTQ on the rest.

Test Plan:
test added

Reviewers:

Subscribers:

Tasks:

Tags:

Differential Revision: [D50510480](https://our.internmc.facebook.com/intern/diff/D50510480)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111689
Approved by: https://github.com/jerryzh168
2023-10-24 23:25:38 +00:00
Jerry Zhang
43c211facb [quant][pt2e] Actually support transitive sharing for SharedQuantizationSpec (#111172)
Summary:
Previously we actually did not really support this, this PR added the support.

Next
* clean up insert observer logic
* add allow_transitive_sharing boolean flag to allow people to turn this op for certain edges

Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_shared_qspec_transitivity

Reviewers:

Subscribers:

Tasks:

Tags:

Differential Revision: [D50250789](https://our.internmc.facebook.com/intern/diff/D50250789)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111172
Approved by: https://github.com/kimishpatel
2023-10-20 23:25:17 +00:00
Aaron Gokaslan
1ad0f0b308 [BE]: remove unnecessary enumerate calls (#111690)
Remove unnecessary enumerate calls, entirely automated fixes so probably reasonably low risk.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111690
Approved by: https://github.com/malfet
2023-10-20 23:20:29 +00:00
andrewor14
e4e7d34fe9 [pt2][quant] Clean up QAT get conv-bn-relu nodes (#111515)
Summary: Reduces duplicate code to map original matched nodes
to replacement nodes.

Test Plan:
python test/test_quantization.py TestQuantizePT2EQAT

Reviewers: jerryzh168

Subscribers: jerryzh168, supriyar
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111515
Approved by: https://github.com/jerryzh168
2023-10-20 20:01:38 +00:00
Aaron Gokaslan
a0632389b7 [BE]: Update lintrunner mypy to 1.6.0 (#111375)
Follow up to #111305 that updates lintrunner's version too.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111375
Approved by: https://github.com/malfet
2023-10-17 01:22:06 +00:00
Jerry Zhang
d589106bcd [quant][pt2e] Disable remove_qconfig (#111000)
Summary:
This is a hacky flag that we had before in fx flow, and we don't want this in the new flow

Test Plan:
python test/test_quantization.py TestQuantizePT2E

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111000
Approved by: https://github.com/andrewor14
2023-10-11 19:43:46 +00:00
andrewor14
0e551bbcd7 [quant][pt2] Preserve source_fn_stack after QAT fusion (#110899)
Test Plan:
python test/test_quantization.py TestQuantizePT2EQAT.test_qat_preserve_source_fn_stack

Reviewers: jerryzh168, kimishpatel

Subscribers: jerryzh168, kimishpatel, supriyar

Differential Revision: [D50101253](https://our.internmc.facebook.com/intern/diff/D50101253)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110899
Approved by: https://github.com/jerryzh168
2023-10-11 02:55:52 +00:00
Kazuaki Ishizaki
b5f9696d81 Fix typo under torch directory (#110824)
This PR fixes typo `the the` of comments and exception messages in files under `torch` directory.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110824
Approved by: https://github.com/H-Huang
2023-10-09 19:16:43 +00:00
Jeff Daily
e8f1f4ed66 [quant][pt2][ROCm] follow-up PR 109908 for miopen_batch_norm (#110653)
Fixes recent broken unit tests caused by PR #109908 because cudnn and miopen have separate batch norm functions.

```
2023-10-05T09:35:01.6606614Z _______________ TestQuantizePT2EQAT.test_qat_conv_bn_fusion_cuda _______________
2023-10-05T09:35:01.6606948Z Traceback (most recent call last):
2023-10-05T09:35:01.6607362Z   File "/var/lib/jenkins/pytorch/test/quantization/pt2e/test_quantize_pt2e_qat.py", line 323, in test_qat_conv_bn_fusion_cuda
2023-10-05T09:35:01.6607767Z     self._verify_symmetric_xnnpack_qat_graph(
2023-10-05T09:35:01.6608217Z   File "/var/lib/jenkins/pytorch/test/quantization/pt2e/test_quantize_pt2e_qat.py", line 130, in _verify_symmetric_xnnpack_qat_graph
2023-10-05T09:35:01.6608658Z     self._verify_symmetric_xnnpack_qat_graph_helper(
2023-10-05T09:35:01.6609105Z   File "/var/lib/jenkins/pytorch/test/quantization/pt2e/test_quantize_pt2e_qat.py", line 173, in _verify_symmetric_xnnpack_qat_graph_helper
2023-10-05T09:35:01.6609623Z     m = prepare_qat_pt2e(m, quantizer)
2023-10-05T09:35:01.6610171Z   File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/ao/quantization/quantize_pt2e.py", line 178, in prepare_qat_pt2e
2023-10-05T09:35:01.6610561Z     _fuse_conv_bn_qat(model)
2023-10-05T09:35:01.6611072Z   File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/ao/quantization/pt2e/qat_utils.py", line 501, in _fuse_conv_bn_qat
2023-10-05T09:35:01.6611497Z     m = _fuse_conv_bn_qat_helper(m, is_cuda=True)
2023-10-05T09:35:01.6612065Z   File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/ao/quantization/pt2e/qat_utils.py", line 575, in _fuse_conv_bn_qat_helper
2023-10-05T09:35:01.6612492Z     _get_conv_bn_getitem_nodes(r.replacements)
2023-10-05T09:35:01.6613058Z   File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/ao/quantization/pt2e/qat_utils.py", line 383, in _get_conv_bn_getitem_nodes
2023-10-05T09:35:01.6613465Z     assert bn_node is not None
2023-10-05T09:35:01.6613716Z AssertionError
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110653
Approved by: https://github.com/jerryzh168, https://github.com/pruthvistony
2023-10-06 15:30:55 +00:00
Jerry Zhang
7b6042111f [quant][pt2e] Refactor conv related annotation for XNNPACKQuantizer (#110308)
Summary:
Since we changed IR that we are working with to pre autograd aten IR, it's easier
to use plain pattern match instead of relying on source_matcher_utils now, this
PR refactors the annotation for conv to use aten ops directly.

Also fixed reentrant test after this change.

Test Plan:
python test/test_quantization.py TestQuantizePT2E

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110308
Approved by: https://github.com/kimishpatel
2023-10-05 22:36:18 +00:00
Andrew Or
7c72238e4b Back out "Enable pickling model prepared with QAT qconfig" (#110392)
Summary:
D49187352 caused our model conversion and loading of QAT checkpoint to be stuck with thrift time out.

we are actively checking in final code and model for static quant HTP prod model, and encountered this breakage at head Thursday.

Thrift timeout is a not failing, and because of that, it's hard to bisect and find this culprit. It is also hard to set up unit test, because the job simply time-out. Better test is needed to guard downstream model conversion against upstream changes.

Our suspicion of why this diff broke us is that we create a lot of modules with qat (in a recursive manner) but our model is not a qat traceable module (it is a graph with many qat modules and floating point modules). With fuctools.partial as in the original diff, we will be caching modules in the memory and causing the memory of the machine to be taken up completely.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110392
Approved by: https://github.com/junesg, https://github.com/jerryzh168
2023-10-05 14:41:00 +00:00
andrewor14
62cad5b5b0 [quant][pt2] Support cudnn_batch_norm in QAT fusion (#109908)
Summary: Today, we get different batch norm ops depending on
the device the model is placed on at export time. Exporting
`model.cpu()` gives `_native_batch_norm_legit`, while exporting
`model.cuda()` gives `cudnn_batch_norm`. QAT fusion currently
only supports the former and silently ignores the latter. This
commit fixes this by additionally matching on the latter op
during QAT fusion.

Test Plan:
python test/test_quantization.py TestQuantizePT2EQAT.test_qat_conv_bn_fusion
python test/test_quantization.py TestQuantizePT2EQAT.test_qat_conv_bn_relu_fusion

Reviewers: jerryzh168, kimishpatel

Subscribers: jerryzh168, kimishpatel, supriyar

Differential Revision: [D49615145](https://our.internmc.facebook.com/intern/diff/D49615145)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109908
Approved by: https://github.com/jerryzh168
2023-10-05 04:08:44 +00:00
Fabrice Pont
053367b1ed fix: flake8-bugbear code B024 (#107265)
See #106571 item B024

This fix concerns the addition of `abstractmethod` to methods declared inside abstract classes.

Should I also include PEP8 compliant reformatting on the files I had to modify ?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107265
Approved by: https://github.com/kit1980
2023-10-04 23:52:52 +00:00
Max Ren
08c7dcda65 [pt2e][xnnpack_quantizer] quantize "mul" (#110428)
Adding "mul" to list of partitions that are supported by the quantizer. This shows up in EDSR, where we still want to quantize the mul op

Differential Revision: [D49850151](https://our.internmc.facebook.com/intern/diff/D49850151/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110428
Approved by: https://github.com/jerryzh168
ghstack dependencies: #110427
2023-10-04 05:11:53 +00:00
Max Ren
66202ed29c [pt2e][xnnpack_quantizer] add util function to convert scalars to attrs (#110427)
Jerry provided a notebook solution for converting scalars to attrs so that they may be properly quantized:

https://fburl.com/anp/kzz7tfn1

Adding this pass as a util function in xnnpack_quantizer_utils.py

Differential Revision: [D49850150](https://our.internmc.facebook.com/intern/diff/D49850150/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110427
Approved by: https://github.com/jerryzh168
2023-10-04 05:11:53 +00:00
Jerry Zhang
c9b8e06060 [quant] Enable quantization for wav2letter (#109830)
Summary:
Also added annotation support for conv1d_relu and conv1d in XNNPACKQuantizer, the quantized results still
matches fx quant path (didn't quantize conv1d) so tests are not disabled

Test Plan: with-proxy buck2 run executorch/examples/quantization:example -- -m=w2l --verify

Differential Revision: D49479546

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109830
Approved by: https://github.com/kimishpatel
2023-09-29 00:47:34 +00:00
Jerry Zhang
e3eb1d92d8 [quant][docs] Add documentation for prepare_pt2e, prepare_qat_pt2e and convert_pt2e (#110097)
Summary:
att

Test Plan:
.

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110097
Approved by: https://github.com/kimishpatel
2023-09-28 18:24:58 +00:00
Sindi Shkodrani
419ec3b229 Enable pickling model prepared with QAT qconfig (#109288)
Summary:
Resolving error:

AttributeError: Can't pickle local object '_add_module_to_qconfig_obs_ctr.<locals>.get_factory_kwargs_based_on_module_device'

by moving nested function out to the main module

Test Plan: Added test to CI

Reviewed By: andrewor14

Differential Revision: D49187352

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109288
Approved by: https://github.com/andrewor14
2023-09-28 09:51:19 +00:00
Jerry Zhang
1b51d29b66 [quant][pt2e] Enable constant folding for quantize ops (#109343)
Summary:
This PR added constant folding for quantize ops so that instead of storing fp32 weight in the
quantized model, we'll get int8/int16 etc. weight

Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_fold_quantize

also will verify in executorch later

Reviewers:

Subscribers:

Tasks:

Tags:

Differential Revision: [D49399210](https://our.internmc.facebook.com/intern/diff/D49399210)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109343
Approved by: https://github.com/kimishpatel, https://github.com/jgong5
2023-09-27 06:04:45 +00:00
Jiaxu Zhu
595af261b2 [ao] Support Subclasses of FloatFunctional in eager mode prepare (#109646)
Summary: As title, if a module is subclassing `nnq.FloatFunctional`, also adding observers to it like `nnq.FloatFunctional`

Test Plan: CI

Differential Revision: D49431968

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109646
Approved by: https://github.com/jerryzh168
2023-09-20 08:09:55 +00:00
Kimish Patel
73ac814148 [Pytorch][quant] Move xnnpack quantizer to use aten.linear (#109254)
Summary:
Now that quantization works on pre-dispatch aten IR, moving to full set
of aten ops is ok. Plus when tracing models like ViT, the linear
projections of of k, q, v uses functional.linear and not nn.Linear,
which results not being able to extract nodes corresponding to linear.

Test Plan:
quant tests

Reviewers:

Subscribers:

Tasks:

Tags:

Differential Revision: [D49252194](https://our.internmc.facebook.com/intern/diff/D49252194)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109254
Approved by: https://github.com/jerryzh168
2023-09-18 20:26:44 +00:00
Jerry Zhang
3943afc94e [quant][be] Remove unused APIs (#109342)
Summary:
att

Test Plan:
python test/test_quantization.py TestQuantizePT2E

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109342
Approved by: https://github.com/kimishpatel, https://github.com/andrewor14
2023-09-15 16:07:01 +00:00
Jerry Zhang
41e2189843 [quant] Remove reference representation rewrite for adaptive_avg_pool2d (#108924)
Summary:
integer adaptive_avg_pool2d is not well defined due to different possible ways of rounding fp32 value to integer value, and
this op isn't too critical for numerics (since it appears not too often), so we'll skip this for now.

we might need to revert the changes that adds integer impl for adaptive_avg_pool op as well

Test Plan:
python test/test_quantization.py TestQuantizePT2ERepresentation

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108924
Approved by: https://github.com/kimishpatel
2023-09-14 10:18:36 +00:00
Jerry Zhang
cf26e5575d [quant][be] Reduce warnings in tests (#108922)
Summary:
att

Test Plan:
python test/test_quantization.py TestQuantizePT2E

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108922
Approved by: https://github.com/andrewor14
ghstack dependencies: #108920, #108921
2023-09-12 21:54:33 +00:00
Jerry Zhang
b01b934aca [quant][be] Cleanup xnnpack_quantizer implementation (#108921)
Summary:
att

Test Plan:
python test/test_quantization.py TestQuantizePT2E

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108921
Approved by: https://github.com/andrewor14
2023-09-12 19:28:41 +00:00
Jerry Zhang
241e84bf98 [quant][be] Rewrite xnnpack_quantizer_utils.py to use decorators (#108920)
Summary:
att

Test Plan:
python test/test_quantization.py TestQuantizePT2E

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108920
Approved by: https://github.com/kimishpatel
2023-09-12 00:09:13 +00:00
Andrew Or
e8a402c56e [quant][pt2] Fix and rename move_model_to_eval (#108891)
Summary:
This commit fixes two silent correctness problems with
the current implementation of `move_model_to_eval`:

(1) Previously the user had to manually call `eliminate_dead_code`
before calling `move_model_to_eval`, otherwise the dropout pattern
won't actually get eliminated. This is because subgraph rewriter
complains the match is not self-contained, and so silently does
not do the replacement.

(2) We wish to error when the user calls `model.train()` or
`model.eval()` on an exported model. This error is raised
correctly immediately after export today, but no longer raised
after the user calls prepare or convert.

We fix (1) by moving the `eliminate_dead_code` call into
`move_model_to_eval`, and fix (2) by ensuring the respective
errors are thrown after prepare and convert as well.

Additionally, this commit renames `move_model_to_eval` to
`move_exported_model_to_eval` to be more explicit.

bypass-github-export-checks

Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_disallow_eval_train
python test/test_quantization.py TestQuantizePT2E.test_move_exported_model_to_eval

Imported from OSS

Differential Revision: D49097293

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108891
Approved by: https://github.com/jerryzh168
2023-09-11 15:37:01 +00:00
Jerry Zhang
b0de6a8002 [quant][executorch] Support inception_v4 in examples (#108382)
Summary: Verified that pt2e quant flow matches the fx flow with executorch backend config

Test Plan:
with-proxy buck2 run executorch/examples/quantization:example -- -m=ic4 --verify

```
[INFO 2023-08-31 16:08:06,923 example.py:77] prepare sqnr: inf
[INFO 2023-08-31 16:08:06,932 example.py:81] quant diff max: 0.0
[INFO 2023-08-31 16:08:06,936 example.py:85] quant sqnr: inf
```

full output: https://www.internalfb.com/intern/paste/P818520579/

Differential Revision: D48889075

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108382
Approved by: https://github.com/kimishpatel
2023-09-08 17:39:31 +00:00
Paul Zhang
51c2b587c9 Back out "[PyPer][BE] Fix test_scripted_module in StatCollector" (#108588)
Differential Revision: D48908507

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108588
Approved by: https://github.com/jerryzh168
2023-09-08 14:33:58 +00:00
Kimish Patel
c1877e99c5 [Quant] Move to BFS instead of DFS to check for connectedness (#108572)
Summary:
Using dfs to check if two nodes are connecgted is making it very slow.
Use of BFS makes it much faster.

Test Plan:
https://gist.github.com/leslie-fang-intel/9cd828623f567a3afbf41564d3546398

Reviewers:

Subscribers:

Tasks:

Tags:

Differential Revision: [D48971710](https://our.internmc.facebook.com/intern/diff/D48971710)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108572
Approved by: https://github.com/jerryzh168, https://github.com/osalpekar
2023-09-07 00:26:28 +00:00
Jerry Zhang
32a16d4999 [quant][pt2e] Support int16 quantization (#108453)
Summary:
Previously we can only use native pytorch int dtypes that has corresponding quantized dtypes (e.g. quint8, qint8), this
PR removes this assumption in observers/fake_quants so that users can use all pytorch native dtypes (except for int64, we can add it later if need)
the main addition here is int16.

Test Plan:
python test/test_quantization.py TestQuantizePT2E

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108453
Approved by: https://github.com/kimishpatel
2023-09-06 19:31:20 +00:00
Kimish Patel
ffc0c46092 [Quantization] Add metadata porting for nodes added by quantization (#107107)
Summary:
This diff adds adding metadata to q-dq nodes by inferring the
quatization intent from node annotations. Annotations on the node are
way for user to specify how a node or subgraph is supposed to be
quantized. We continue to use that information to copy metadata on Q/DQ
node from appropriate nodes.

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

Differential Revision: [D48488416](https://our.internmc.facebook.com/intern/diff/D48488416)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107107
Approved by: https://github.com/jerryzh168
ghstack dependencies: #107105, #107106, #107899, #107900
2023-09-02 06:38:14 +00:00
Kimish Patel
eb67c452c8 [Quant] Add DQ duplication pass (#107900)
Summary:
During convert step observers are first replaced by Q-DQ pair. In some
scenarios like following output DQ has a fan out.

                 ---> OP2 -> Q -> DQ
                /
OP -> Q -> DQ -
                \
                 ---> OP3 -> Q -> DQ

If either op OP2 or OP3 are configured to be quantized, then the input
is expected to quantized. In this case quantized equivalent of some
pattern, that quantizer asked to be quantized, should look like:
[DQ -> {pattern} -> Q]. However, in scenario like above where DQ node
is shared between multiple "quantized" patterns, boundary of "quantized"
pattern is not clear because DQ now belongs to multiple quantized
patterns.

This poses challenge for:
- Porting metadata: which "quantized" partition this DQ node belongs
- Quantized representation, equivalently, needs to identify
self-contained quantized pattern that is replaced by its equivalent pattern
that captures compute in the quantized precision.

Test Plan:
test_duplicate_dq_pass

Reviewers:

Subscribers:

Tasks:

Tags:

Differential Revision: [D48663147](https://our.internmc.facebook.com/intern/diff/D48663147)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107900
Approved by: https://github.com/jerryzh168, https://github.com/andrewor14, https://github.com/leslie-fang-intel
ghstack dependencies: #107105, #107106, #107899
2023-09-02 06:20:03 +00:00
Kimish Patel
f8d1ca9835 [Quant] Bug fix (#107899)
Summary:
When two layers are quantized differently, observer map update updates
map for key (observed_node, node), whereas it should really be
(original_input, node)

Test Plan:
Test in the next diff adds a test where it otherwise fails

Reviewers:

Subscribers:

Tasks:

Tags:

Differential Revision: [D48663145](https://our.internmc.facebook.com/intern/diff/D48663145)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107899
Approved by: https://github.com/jerryzh168
ghstack dependencies: #107105, #107106
2023-09-02 06:20:03 +00:00
Kimish Patel
37b0d76e35 [Quantization] Make annotation util functions return annotated nodes (#107106)
Summary:
Having annotation functions return nodes that are annotated is useful
specifically for adding "quantization_tag" to those nodes

Test Plan:
CI

Reviewers:

Subscribers:

Tasks:

Tags:

Differential Revision: [D48488415](https://our.internmc.facebook.com/intern/diff/D48488415)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107106
Approved by: https://github.com/jerryzh168
ghstack dependencies: #107105
2023-09-02 06:19:55 +00:00
Kimish Patel
99168c1fa9 [Quant] Use input_qspec_map for weight quantization of linear (#107105)
Summary:
In prepararation for metadata porting diff, it is required that weight
quant annotation happens via edge quantization, i.e. input_qspec_map.

Reason: Metadata is ported via associating DQ node's metadata with its
consumer while associating Q node's metadata with its producer.
Furthermore, such porting must be qualified via user intent to see if
the consumder of DQ, or producer of Q, actually specified intent of
quantization

By making quantization annotation on linear node's weight via
input_qspec_map, we can enable associating DQ of [weight -> Q -> DQ],
with the linear module.

Test Plan:
CI

Reviewers:

Subscribers:

Tasks:

Tags:

Differential Revision: [D48488414](https://our.internmc.facebook.com/intern/diff/D48488414)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107105
Approved by: https://github.com/jerryzh168
2023-09-02 06:19:50 +00:00
Paul Zhang
4a9c6f1b73 [PyPer][BE] Fix test_scripted_module in StatCollector (#108232)
Summary: D41985889 removed the cast to int for the inputs to torch.histc below, allowing the inputs to still be tensors. These tensors still have require_grad_ set to True, causing issues with the call to torch.histc.

Test Plan: buck2 test 'fbcode//mode/opt' fbcode//dper3/dper3/modules/low_level_modules/tests:stat_collector_test -- --exact 'dper3/dper3/modules/low_level_modules/tests:stat_collector_test - test_scripted_module (dper3.dper3.modules.low_level_modules.tests.stat_collector_test.StatCollectorTest_1)'

Differential Revision: D48800879

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108232
Approved by: https://github.com/jerryzh168
2023-09-01 04:23:57 +00:00
Jerry Zhang
a9fe0b5b74 [quant][pt2e] Move propagate_annotation from quant flow to quantizer (#108320)
Summary:
Previously we run propagate_annotation by default in quantization flow to propagate annotations for ops like reshape, view etc.

Not all quantizers would need this so we moved this to xnnpack_quantizer_utils for now.

Next Step:
* make propagate_annotation function configurable with a custom list of ops
* remove unneeded ops in `_is_share_obs_or_fq_op`

Test Plan:
python test/test_quantization.py TestQuantizePT2E

Reviewers:

Subscribers:

Tasks:

Tags:

Differential Revision: [D48856985](https://our.internmc.facebook.com/intern/diff/D48856985)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108320
Approved by: https://github.com/kimishpatel
2023-09-01 01:49:19 +00:00
leslie-fang-intel
6c342ec368 Revert PR-107951 to only support new graph capture API in Quantization (#108317)
**Summary**
Revert the changes in https://github.com/pytorch/pytorch/pull/107951 to make the utils function only support graph captured by `capture_pre_autograd_graph`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108317
Approved by: https://github.com/jgong5, https://github.com/jerryzh168
ghstack dependencies: #108214
2023-09-01 00:47:10 +00:00
leslie-fang-intel
fb808c30c7 x86_inductor_quantizer switches to new graph capture API (#108214)
**Summary**
Update `X86InductorQuantizer` and related testcase to the new graph capture API `capture_pre_autograd_graph`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108214
Approved by: https://github.com/jgong5, https://github.com/jerryzh168
2023-09-01 00:43:45 +00:00
andrewor14
057b807178 [quant] Move dropout replacement to move_model_to_eval (#108184)
Summary: This commit adds a public facing
`torch.ao.quantization.move_model_to_eval` util function
for QAT users. Instead of calling model.eval() on an exported
model (which doesn't work, see
https://github.com/pytorch/pytorch/issues/103681), the user
would call this new util function instead. This ensures special
ops such as dropout and batchnorm (not supported yet) will have
the right behavior when the graph is later used for inference.

Note: Support for an equivalent `move_model_to_train` will be
added in the future. This is difficult to do for dropout
currently because the eval pattern of dropout is simply a clone
op, which we cannot just match and replace with a dropout op.

Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_move_model_to_eval

Reviewers: jerryzh168, kimishpatel

Subscribers: jerryzh168, kimishpatel, supriyar

Differential Revision: [D48814735](https://our.internmc.facebook.com/intern/diff/D48814735)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108184
Approved by: https://github.com/jerryzh168
2023-08-30 16:33:17 +00:00
Jerry Zhang
147b3495e2 [quant][pt2e] Add reference representation for dynamic quantized linear (#108073)
Summary: att

Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_representation_dynamic_linear
buck2 test 'fbcode//mode/opt' fbcode//caffe2/test:quantization_pt2e -- 'test_representation_dynamic_linear'

Reviewed By: kimishpatel

Differential Revision: D48703076

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108073
Approved by: https://github.com/andrewor14
2023-08-29 07:12:55 +00:00
Jerry Zhang
9ae3d7ca90 [reland][quant][pt2e][xnnpack_quantizer] Add support for mul and mul_relu (#107930) (#107992)
Summary: att

Test Plan: buck2 run executorch/examples/quantization:example -- -m=mv3 --verify

Differential Revision: D48588121

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107992
Approved by: https://github.com/digantdesai, https://github.com/mcr229
2023-08-27 14:50:03 +00:00
Xia, Weiwen
e9b0f62a19 [Quant][PT2E] Enable linear and linear-unary post-op quant recipe for x86 inductor quantizer (#106781)
**Summary**
Add linear and linear-unary post-op quantization recipe to x86 inductor quantizer. For PT2E with Inductor. With this, the quantization path will add `quant-dequant` pattern for linear and linear-unary post op.

**Test plan**
python test/test_quantization.py -k test_linear_with_quantizer_api
python test/test_quantization.py -k test_linear_unary_with_quantizer_api

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106781
Approved by: https://github.com/leslie-fang-intel, https://github.com/jgong5, https://github.com/jerryzh168
ghstack dependencies: #105818
2023-08-27 10:50:17 +00:00
leslie-fang-intel
c85c5954f2 [Quant][PT2E]Make _fuse_conv_bn_ support graph capture by torch._dynamo.export (#107951)
**Summary**
The latest check-in a0cfaf0688 for the conv-bn folding assumes the graph is captured by the new graph capture API `torch._export.capture_pre_autograd_graph`. Since we still need to use the original graph capture API `torch._dynamo_export` in 2.1 release. So, this check-in made negative impact to workloads' performance heavily. Made this PR to fix this issue by trying to make the conv-bn folding function workable with both new and original graph capture API.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107951
Approved by: https://github.com/jgong5, https://github.com/jerryzh168
ghstack dependencies: #106836, #106838, #106958
2023-08-26 17:19:41 +00:00
leslie-fang-intel
1147a28b0b [Quant][PT2E] Add cat and avg_pool2d recipe into x86InductorQuantizer (#106836)
**Summary**
Add `cat` and `avg_pool2d` quantization recipe as input output share observer into `x86InductorQuantizer`.

**Test Plan**
```
clear && python -m pytest test_x86inductor_quantizer.py -k test_cat_recipe
clear && python -m pytest test_x86inductor_quantizer.py -k test_cat_recipe_same_inputs
clear && python -m pytest test_x86inductor_quantizer.py -k test_cat_recipe_single_input
clear && python -m pytest test_x86inductor_quantizer.py -k test_avg_pool2d_recipe
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106836
Approved by: https://github.com/jgong5, https://github.com/jerryzh168
2023-08-26 16:51:13 +00:00
Jerry Zhang
15d4dedbbf [quant][pt2e] Add reference representation rewrite for statically quantized linear (#107994)
Summary: att

Test Plan:
```
python test/test_quantization.py TestQuantizePT2E.test_representation_linear
buck2 test 'fbcodemode/opt' fbcodecaffe2/test:quantization_pt2e -- 'test_representation_linear'
```

Reviewed By: kimishpatel

Differential Revision: D48674862

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107994
Approved by: https://github.com/mcr229, https://github.com/guangy10
2023-08-26 15:39:52 +00:00
leslie-fang-intel
70ca18f8a0 [Quant][PT2E] Enable X86InductorQuantizer single quantizable op(maxpool2d) (#105639)
**Summary**
In this PR, we mainly enable 2 things.

- Enable the skeleton of quantization recipe for single quantizable operators in `X86InductorQuantizer`.
- Add quantization recipe of `maxpool2d` and annotate it as input./output share observer.

**Test Plan**
```
python -m pytest test_x86inductor_quantizer.py -k test_maxpool2d_recipe
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105639
Approved by: https://github.com/jgong5, https://github.com/jerryzh168
ghstack dependencies: #104580, #104581, #104588, #104590, #105455, #105456
2023-08-26 08:34:15 +00:00
andrewor14
240bdbea61 [quant][pt2e] Fix annotation for conv no bias case (#107971)
Summary: This fixes the no bias case for conv annotations.
Previously this would result in an index out of bounds, since
the new aten.conv2d op may not have the bias arg (unlike the
old aten.convolution op). This was not caught because of a lack
of test cases, which are added in this commit.

Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_qat_conv_no_bias
python test/test_quantization.py TestQuantizePT2E.test_qat_conv_bn_relu_fusion_no_conv_bias

Reviewers: jerryzh168, kimishpatel

Subscribers: jerryzh168, kimishpatel

Differential Revision: [D48696874](https://our.internmc.facebook.com/intern/diff/D48696874)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107971
Approved by: https://github.com/jerryzh168
2023-08-26 01:01:54 +00:00
Jerry Zhang
f92f69dbfb [quant][pt2e] Enable testing for reference quant model representations (#107474)
Summary:
Previously these tests were disabled due to time out in dynamo export in fbcode,
this might have been resolved, so trying to enable again

Test Plan:
python test/test_quantization.py TestQuantizePT2E

Reviewers:

Subscribers:

Tasks:

Tags:

Differential Revision: [D48619072](https://our.internmc.facebook.com/intern/diff/D48619072)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107474
Approved by: https://github.com/andrewor14
2023-08-26 00:37:45 +00:00
PyTorch MergeBot
8d44b0f5a5 Revert "[quant][pt2e][xnnpack_quantizer] Add support for mul and mul_relu (#107930)"
This reverts commit 1d1739dc6d.

Reverted https://github.com/pytorch/pytorch/pull/107930 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/107930#issuecomment-1694069330))
2023-08-26 00:37:02 +00:00
Jerry Zhang
1d1739dc6d [quant][pt2e][xnnpack_quantizer] Add support for mul and mul_relu (#107930)
Summary: att

Test Plan: buck2 run executorch/examples/quantization:example -- -m=mv3 --verify

Differential Revision: D48588121

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107930
Approved by: https://github.com/kimishpatel
2023-08-25 23:36:19 +00:00
leslie-fang-intel
1374974d60 [Quant][Inductor] Enable quantization conv_binary(add/add_relu) pattern fusion inside inductor (#105456)
**Summary**
Enable the `dequant-conv2d-binary_postop(add)-unary_postop(relu)-quant` pattern fusion and lowering inside inductor.

**Test Plan**
```
clear && python -m pytest test_mkldnn_pattern_matcher.py -k test_qconv2d_binary
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105456
Approved by: https://github.com/jgong5, https://github.com/eellison
ghstack dependencies: #104580, #104581, #104588, #104590, #105455
2023-08-25 21:16:02 +00:00
Jerry Zhang
a0cfaf0688 [quant][pt2e] Make sure XNNPACKQuantizer works with the pre_dispatch=True (#107872)
Summary: att

Test Plan:
```
buck test //executorch/backends/xnnpack/test:test_xnnpack_quantized_models -- test_resnet18

buck2 test 'fbcode//mode/opt' fbcode//caffe2/test:quantization_pt2e
```

Reviewed By: andrewor14, tugsbayasgalan

Differential Revision: D48415977

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107872
Approved by: https://github.com/andrewor14
2023-08-25 05:04:01 +00:00
Kimish Patel
2fbe6ef2f8 [pytorch][Quant] Fix bias quant bug (#107810)
Summary: Bias should be quantized by act_scale * weight_scale in conv and linear

Test Plan: Rewrite tests

Differential Revision: D48606828

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107810
Approved by: https://github.com/jerryzh168
2023-08-24 23:44:19 +00:00
Jerry Zhang
16fcb07846 [quant][pt2e] Add support for channel in DerivedQuantizationSpec (#107833)
Summary:
att

Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_derived_qspec_per_channel

Reviewers:

Subscribers:

Tasks:

Tags:

Differential Revision: [D48630535](https://our.internmc.facebook.com/intern/diff/D48630535)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107833
Approved by: https://github.com/andrewor14
2023-08-24 07:45:13 +00:00
Sherlock Huang
ee4b99cc3a Decomp for aten.dropout (#106274)
When exporting dropout with cpu tensor, we get following graph module
```
    class GraphModule(torch.nn.Module):
        def forward(self, arg0_1: f32[512, 10]):
            empty_memory_format: f32[512, 10] = torch.ops.aten.empty.memory_format([512, 10], dtype = torch.float32, layout = torch.strided, device = device(type='cpu'), pin_memory = False, memory_format = torch.contiguous_format)
            bernoulli_p: f32[512, 10] = torch.ops.aten.bernoulli.p(empty_memory_format, 0.9);  empty_memory_format = None
            div_scalar: f32[512, 10] = torch.ops.aten.div.Scalar(bernoulli_p, 0.9);  bernoulli_p = None
            mul_tensor: f32[512, 10] = torch.ops.aten.mul.Tensor(arg0_1, div_scalar);  arg0_1 = div_scalar = None
            return (mul_tensor,)
```

In addition, if we export with eval() mode, we will have an empty graph.

However, when exporting with cuda tensor, we got
```
    class GraphModule(torch.nn.Module):
        def forward(self, arg0_1: f32[512, 10]):
            native_dropout_default = torch.ops.aten.native_dropout.default(arg0_1, 0.1, True);  arg0_1 = None
            getitem: f32[512, 10] = native_dropout_default[0];  native_dropout_default = None
            return (getitem,)
```
and exporting under eval() mode will still have a dropout node in graph.

This PR make exporting with CPU tensor also produce aten.native_dropout.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106274
Approved by: https://github.com/ezyang
2023-08-23 21:12:37 +00:00
Aaron Gokaslan
660e8060ad [BE]: Update ruff to 0.285 (#107519)
This updates ruff to 0.285 which is faster, better, and have fixes a bunch of false negatives with regards to fstrings.

I also enabled RUF017 which looks for accidental quadratic list summation. Luckily, seems like there are no instances of it in our codebase, so enabling it so that it stays like that. :)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107519
Approved by: https://github.com/ezyang
2023-08-22 23:16:38 +00:00
PyTorch MergeBot
d59a6864fb Revert "[BE]: Update ruff to 0.285 (#107519)"
This reverts commit 88ab3e4322.

Reverted https://github.com/pytorch/pytorch/pull/107519 on behalf of https://github.com/ZainRizvi due to Sorry, but this PR breaks internal tests. @ezyang, can you please hep them get unblocked? It seems like one of the strings was prob accidentally modified ([comment](https://github.com/pytorch/pytorch/pull/107519#issuecomment-1688833480))
2023-08-22 19:53:32 +00:00
Aaron Gokaslan
88ab3e4322 [BE]: Update ruff to 0.285 (#107519)
This updates ruff to 0.285 which is faster, better, and have fixes a bunch of false negatives with regards to fstrings.

I also enabled RUF017 which looks for accidental quadratic list summation. Luckily, seems like there are no instances of it in our codebase, so enabling it so that it stays like that. :)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107519
Approved by: https://github.com/ezyang
2023-08-20 01:36:18 +00:00
Jerry Zhang
28be2c674a [quant][pt2e] Move specific quantizer related things outside of main quant code base (#106806) (#107259)
Summary:

Currently in quantizer/quantize_pt2e we import things from specific quantizers (XNNPACKQuantizer, QuantizationConfig) etc.
this PR removes them so it's clearer that they are not part of the core quantization code base

This PR also removed get_supported_operators from main Quantizer since we haven't seen a clear need for this API

Test Plan:
CIs

Imported from OSS

Differential Revision: D48340367

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107259
Approved by: https://github.com/kimishpatel
2023-08-18 21:29:09 +00:00
Jerry Zhang
d3c4ec767b [quant][pt2e] Fix handling for SharedQuantizationSpec (#106922)
Summary:
Previously if we have:
```
conv1 -> cat
conv2  /
```
and configure output of conv1/conv2 to be int8 quantized, and cat also int8 quantized and with shared inputs,
it will not produce expected results (input of cat will not be shared)

The problem is that there is some missing checks when inserting observers for input for cat

This PR fixes the problem.

Fixes: https://github.com/pytorch/pytorch/issues/106760
Test Plan:
python tes/test_quantization.py TestQuantzePT2E.test_shared_qspec

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106922
Approved by: https://github.com/kimishpatel
2023-08-16 21:16:45 +00:00
Jerry Zhang
4afab40b56 [quant][pt2e] Removed mean/hardtanh annotations and refactored adaptive_avg_pool annotation (#106805)
Summary:
Removed annotations for some ops, since they are handled in torch/ao/quantization/pt2e/_propagate_annotation.py

Test Plan:
CIs

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106805
Approved by: https://github.com/kimishpatel
2023-08-10 04:51:06 +00:00
Jerry Zhang
97ce979e5d [quant][pt2e] Add reference representation for quantized conv2d (#105784)
Summary:
Implementing reference representation for quantized ops we decided in https://docs.google.com/document/d/17h-OEtD4o_hoVuPqUFsdm5uo7psiNMY8ThN03F9ZZwg/edit#heading=h.ov8z39149wy8

Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_representation_quantize_dequantize_per_channel

Although right now it is not really testing things since there is some problem with dynamo export

Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105784
Approved by: https://github.com/kimishpatel
ghstack dependencies: #105783
2023-08-09 22:41:35 +00:00
Tugsbayasgalan (Tugsuu) Manlaibaatar
a44c072c89 Make InternalModel and Resnet work with rexportable flow (#106676)
Summary: Internal model and Resnet uses "re-export" flow now. Also did some refactoring to make the code little cleaner

Some changes for OSS:
1. Correctly use the "cached" fake tensors so that static symbols are still resolved to static
2. Change logic in PassBase to allocate static shapes for parameters
3. Add "is_torch_exported" tag to every node to make it survive during various graph transformations.
4. Added experimental wrapper API for quantization team to get pre_dispatch=True graph. Note that it doesn't actually do that right now. But we plan to switch soon.

Test Plan: CI

Differential Revision: D47890878

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106676
Approved by: https://github.com/jerryzh168
2023-08-09 20:10:48 +00:00
Jerry Zhang
e1a1780626 [quant][pt2e] Move annotate functions in XNNPACKQuantizer to utils (#106642)
Summary:
This is to allow sharing these annotate functions by other quantizers so that writing a new quantizer is easier

note that these annotation functions will be maintained by XNNPACKQuantizer developers instead of AO team

Test Plan:
python test/test_quantization.py TestQuantizePT2E

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106642
Approved by: https://github.com/andrewor14
2023-08-09 18:52:39 +00:00
Jerry Zhang
69ecad6f2b [quant][pt2e] Add reference representation for quantize_per_channel and dequantize_per_channel (#105783)
Summary:
Implementing reference representation for quantized ops we decided in https://docs.google.com/document/d/17h-OEtD4o_hoVuPqUFsdm5uo7psiNMY8ThN03F9ZZwg/edit#heading=h.ov8z39149wy8

Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_representation_quantize_dequantize_per_channel

Although right now it is not really testing things since there is some problem with dynamo export

Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105783
Approved by: https://github.com/kimishpatel
2023-08-09 01:39:52 +00:00
Jiaxu Zhu
9e35df4adc [pytorch][ao] force weight observer/fake_quant to be on the same device as the weight tensor (#106755)
Summary:
As title.
There's a corner case where both cpu and gpu are avaiable, although the model is moved to cpu, the newly created PTQ weight observer is still on gpu. Therefore, during the convert, this line will fail https://fburl.com/4rhipfvb

Test Plan: CI

Differential Revision: D48141494

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106755
Approved by: https://github.com/jerryzh168
2023-08-09 00:22:49 +00:00
Jerry Zhang
2156f0434c [quant][pt2e] Add reference representation for quantized adaptive_avg_pool2d (#105709)
Summary:
Implementing reference representation for quantized ops we decided in https://docs.google.com/document/d/17h-OEtD4o_hoVuPqUFsdm5uo7psiNMY8ThN03F9ZZwg/edit#heading=h.ov8z39149wy8

Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_representation_adaptive_avg_pool2d

Although right now it is not really testing things since there is some problem with dynamo export

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105709
Approved by: https://github.com/andrewor14
ghstack dependencies: #105708
2023-08-04 18:49:14 +00:00
Jerry Zhang
9e301949ec [quant][pt2e] Add reference representation for quantized max_pool2d (#105708)
Summary:
Implementing reference representation for quantized ops we decided in https://docs.google.com/document/d/17h-OEtD4o_hoVuPqUFsdm5uo7psiNMY8ThN03F9ZZwg/edit#heading=h.ov8z39149wy8

Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_representation_maxpool2d

Although right now it is not really testing things since there is some problem with dynamo export

Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105708
Approved by: https://github.com/andrewor14
2023-08-04 08:19:52 +00:00
Jerry Zhang
820e68b58a [quant][pt2e] Add reference representation for quantized add - relu (#105707)
Summary:
Implementing reference representation for quantized ops we decided in https://docs.google.com/document/d/17h-OEtD4o_hoVuPqUFsdm5uo7psiNMY8ThN03F9ZZwg/edit#heading=h.ov8z39149wy8

Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_representation_add_relu

Although right now it is not really testing things since there is some problem with dynamo export
Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105707
Approved by: https://github.com/andrewor14
2023-08-03 00:42:06 +00:00
Jerry Zhang
d528a137e0 [quant][pt2e][quantizer] Suppoert set_module_type in XNNPACKQuantizer (#106094)
Summary:
Added support to allow users to set configurations based on module type in XNNPACKQuantizer, can also serve as an example
for implementing new quantizers

Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_xnnpack_quantizer_set_module_type

Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106094
Approved by: https://github.com/andrewor14
ghstack dependencies: #106087
2023-08-02 08:33:58 +00:00
Leon
850ad54139 correct spelling mistake (#106309)
Fixes #ISSUE_NUMBER
correct spelling mistake
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106309
Approved by: https://github.com/kit1980
2023-08-02 04:38:23 +00:00
Jerry Zhang
92a22a8098 [quant][pt2e][quantizer] Suppoert set_module_name in XNNPACKQuantizer (#106087)
Summary:
Added support to allow users to set configurations based on module name in XNNPACKQuantizer, can also serve as an example
for implementing new quantizers

Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_xnnpack_quantizer_set_module_name

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106087
Approved by: https://github.com/andrewor14
2023-08-02 01:19:23 +00:00
PyTorch MergeBot
93b2036bef Revert "[quant][pt2e] store scale/zero_point as tensor attributes to support serialization (#105894)"
This reverts commit 3ca71ed735.

Reverted https://github.com/pytorch/pytorch/pull/105894 on behalf of https://github.com/huydhn due to breaking executorch tests internally ([comment](https://github.com/pytorch/pytorch/pull/105894#issuecomment-1654831950))
2023-07-28 01:16:02 +00:00
Edward Z. Yang
7b9d250f06 Change _dynamo.export to be export(f)(*args, **kwargs) (#106109)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106109
Approved by: https://github.com/voznesenskym
2023-07-27 21:41:13 +00:00
Jerry Zhang
3ca71ed735 [quant][pt2e] store scale/zero_point as tensor attributes to support serialization (#105894)
Summary:
Currently scale/zero_point for per tensor quant is stored as burnt in literals, this means these values can't be serialized in state_dict, this
PR changes them to buffers/Tensors so that they can be serialized

Test Plan:
python test/test_quantization.py TestQuantizePT2E

Reviewers:

Subscribers:

Tasks:

Tags:

Differential Revision: [D47770963](https://our.internmc.facebook.com/intern/diff/D47770963)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105894
Approved by: https://github.com/kimishpatel
2023-07-26 20:15:06 +00:00
Jerry Zhang
3a77f9aaaf [quant][api] Move torch.ao.quantization.pt2e.quantizer to torch.ao.quantization.quantizer (#105885)
Summary: moving quantizer to torch.ao.quantization to make it a public api, since pt2e is a folder for implementations

Test Plan:
CIs

sanity check: "buck test //executorch/backends/xnnpack/test:test_xnnpack_quantized_models -- test_resnet18"

Differential Revision: D47727838

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105885
Approved by: https://github.com/andrewor14
2023-07-26 18:20:09 +00:00
Jerry Zhang
d767cff7c7 [quant][fx] Fix docs for prepare_fx/prepare_qat_fx (#105979)
Summary:
Fixes: https://github.com/pytorch/pytorch/issues/103661

Test Plan:
visual inspectation of docs https://pytorch.org/docs/2.0/generated/torch.ao.quantization.quantize_fx.prepare_fx.html#torch.ao.quantization.quantize_fx.prepare_fx

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105979
Approved by: https://github.com/andrewor14
2023-07-26 09:56:18 +00:00
Aaron Gokaslan
6d43c89f37 [BE]: Update Ruff to 0.0.280 (#105724)
Removes unusued loop values in python dictionary iteration. Automated fix from Ruff master

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105724
Approved by: https://github.com/ezyang, https://github.com/janeyx99
2023-07-22 23:03:34 +00:00
Jerry Zhang
143c83d637 [quant][pt2e][be] Remove unneeded code (#105676)
Summary:
att

Test Plan:
CIs

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105676
Approved by: https://github.com/andrewor14
2023-07-21 00:51:22 +00:00
Jerry Zhang
dff4e034b8 [quant][pt2e][be] Rename qnnpack quantizer to xnnpack quantizer (#105551)
Summary: att

Test Plan: sandcastle CI and OSS CI

Reviewed By: andrewor14

Differential Revision: D47422894

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105551
Approved by: https://github.com/andrewor14
2023-07-20 03:52:40 +00:00
Max Ren
bc6bca9d42 [XNNPACK][QS8] torch.slice (#105252)
Differential Revision: [D47487423](https://our.internmc.facebook.com/intern/diff/D47487423/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105252
Approved by: https://github.com/digantdesai
2023-07-19 23:36:02 +00:00
leslie-fang-intel
fa6be2fa6f [Quant][PT2E] Remove x86 inductor pt2e backend config (#105039)
**Summary**
For the Quantization PT2E path, we recommend to use `X86InductorQuantizer` instead of backend config of `x86_inductor_pt2e_backend_config`. Remove the `x86_inductor_pt2e_backend_config` and the relevant testing.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105039
Approved by: https://github.com/jgong5, https://github.com/jerryzh168
2023-07-19 23:18:29 +00:00
Justin Chu
c0d8a4af0a [BE] Enable ruff's UP rules and autoformat ao/ (#105430)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105430
Approved by: https://github.com/albanD, https://github.com/malfet
2023-07-19 13:44:37 +00:00
Jerry Zhang
554052f321 [quant][pt2e][be] Rename prepare_pt2e_quantizer to prepare_pt2e (#105484)
Summary: att

Test Plan: sandcastle and OSS CI

Reviewed By: andrewor14

Differential Revision: D47422892

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105484
Approved by: https://github.com/andrewor14
2023-07-19 04:51:37 +00:00
Tugsbayasgalan (Tugsuu) Manlaibaatar
5666d20bb8 Add unlifting pass under private config (#104897)
Summary: We wanna do this little by little. For now, I tried only on DissectedPartsModel which needs to use aot_export version.

Test Plan: CI

Reviewed By: zhxchen17

Differential Revision: D46785735

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104897
Approved by: https://github.com/JacobSzwejbka
2023-07-19 01:16:35 +00:00
maxren
88f1885ec9 [XNNPACK][QS8] torch.cat (#104800)
Differential Revision: [D47304143](https://our.internmc.facebook.com/intern/diff/D47304143/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104800
Approved by: https://github.com/digantdesai
2023-07-19 00:15:05 +00:00
Nikita Shulga
78829d6e07 Fix isinstance check in quat_utils (#105476)
Calling `isinstance(x, Tuple[Node, Node])` would either fail, or raise a
type error on a more modern Python, as none of the tuples are actually
instances of `Tuple`

```python
>>> from typing import Tuple
>>> from torch.fx import Node
>>> edge_or_node=(Node(None, "foo", "output", "foo", None, None), Node(None, "bar", "output", "bar", None, None))
>>> isinstance(edge_or_node, tuple) and len(edge_or_node) == 2 and all(isinstance(x, Node) for x in edge_or_node)
True
>>> isinstance(edge_or_node, Tuple[Node, Node])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/malfet/miniconda3/lib/python3.10/typing.py", line 994, in __instancecheck__
    return self.__subclasscheck__(type(obj))
  File "/Users/malfet/miniconda3/lib/python3.10/typing.py", line 997, in __subclasscheck__
    raise TypeError("Subscripted generics cannot be used with"
TypeError: Subscripted generics cannot be used with class and instance checks
```

<!--
copilot:poem
-->
### <samp>🤖 Generated by Copilot at 40fa451</samp>

> _Fix type annotation_
> _Quantize nodes in the graph_
> _Autumn leaves falling_

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105476
Approved by: https://github.com/jerryzh168
2023-07-18 21:16:05 +00:00
Jerry Zhang
ed2b9f1af1 [quant][pt2e] rename _quantize_pt2e to quantize_pt2e (#105377)
Summary: att

Test Plan: CIs

Reviewed By: andrewor14

Differential Revision: D47234357

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105377
Approved by: https://github.com/andrewor14
2023-07-18 16:46:05 +00:00
Nikita Shulga
5837e95d30 [Reland] Update mypy to 1.4.1 (#105227)
This PR re-lands
- [Typing] Fix PEP 484 Violation (#105022)
- Update mypy to 1.4.1 (#91983)

That were reverted due to the conflict with internal source repo.

Mostly fixes for PEP-484 violation (i.e. when default arg is set to None, but type is not annotated as optional)
Plus few real fixes:
  - Add missing `_get_upgraders_entry_map` to `torch/_C/__init__.pyi`
  - Add missing return statement to `torch._export. deserialize_graph`
  - Fix error message in `torch.ao.ns.fx.weight_utils.get_lstm_mod_weights`
  - Add assert it `torch/optim/optimizer.py` that Optional list is not None
TODO (in followup PR):
  - Fix erroneous `isinstance` check in `torch/ao/quantization/_pt2e/qat_utils.py`

Unrelated, to bypass CI failures due to the gcc9 dependency update in Ubuntu-18.04:
- Add hack to squash older libstdc++ from conda environment in favor one from OS to `.ci/docker/install_conda.sh`
- Update bazel cuda builds to focal, as with libstdc++-6.0.32 bazel builds loose the ability to catch exceptions (probably because they link with cupti statically, but I could not found where it is done)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105227
Approved by: https://github.com/atalman, https://github.com/albanD, https://github.com/Skylion007
2023-07-15 20:30:20 +00:00
Jerry Zhang
7b4d080496 [quant][pt2e] Rename _pt2e to pt2e (#104668)
Summary:
X-link: https://github.com/pytorch/executorch/pull/3

att

Test Plan: Imported from OSS

Differential Revision: D47202807

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104668
Approved by: https://github.com/andrewor14
2023-07-15 06:34:17 +00:00
PyTorch MergeBot
15fd1ea118 Revert "[Reland] Update mypy to 1.4.1 (#105227)"
This reverts commit c9c4f8efc3.

Reverted https://github.com/pytorch/pytorch/pull/105227 on behalf of https://github.com/atalman due to trying to mitigate ci sev #105248 ([comment](https://github.com/pytorch/pytorch/pull/105227#issuecomment-1636510935))
2023-07-14 22:28:35 +00:00
Nikita Shulga
c9c4f8efc3 [Reland] Update mypy to 1.4.1 (#105227)
This PR re-lands
- [Typing] Fix PEP 484 Violation (#105022)
- Update mypy to 1.4.1 (#91983)

That were reverted due to the conflict with internal source repo.

Mostly fixes for PEP-484 violation (i.e. when default arg is set to None, but type is not annotated as optional)
Plus few real fixes:
  - Add missing `_get_upgraders_entry_map` to `torch/_C/__init__.pyi`
  - Add missing return statement to `torch._export. deserialize_graph`
  - Fix error message in `torch.ao.ns.fx.weight_utils.get_lstm_mod_weights`
  - Add assert it `torch/optim/optimizer.py` that Optional list is not None
TODO (in followup PR):
  - Fix erroneous `isinstance` check in `torch/ao/quantization/_pt2e/qat_utils.py`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105227
Approved by: https://github.com/atalman, https://github.com/albanD, https://github.com/Skylion007
2023-07-14 20:45:12 +00:00
PyTorch MergeBot
3c5a494d7a Revert "Update mypy to 1.4.1 (#91983)"
This reverts commit 634659e262.

Reverted https://github.com/pytorch/pytorch/pull/91983 on behalf of https://github.com/malfet due to It's dependent change was reverted, so reverting this one as well, to keep CI clean ([comment](https://github.com/pytorch/pytorch/pull/91983#issuecomment-1636059709))
2023-07-14 15:59:16 +00:00
Jerry Zhang
90b50f0303 [quant][pt2e] change internal code to only import from _quantize_pt2e (#105162)
Summary: This is to make public api clear so that we can make implementation details change easier in the future

Test Plan: CIs

Differential Revision: D47445767

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105162
Approved by: https://github.com/andrewor14
2023-07-14 05:14:29 +00:00
Tuan Tran
85745cd3d9 Fix bug in fuse_modules (#105069)
Summary: This diff fixes the issue reported in https://github.com/pytorch/pytorch/issues/105063 and also related to internal caffe2 bug (reproduced error in internal fb pytorch: N3945540)

Test Plan: Wait for sandcastle with the added unit test in caffe2/torch/ao/quantization/eager/test_fuse_eager

Differential Revision: D47402357

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105069
Approved by: https://github.com/jerryzh168
2023-07-13 23:39:59 +00:00
Nikita Shulga
634659e262 Update mypy to 1.4.1 (#91983)
Mostly fixes for PEP-484 violation (i.e. when default arg is set to None, but type is not annotated as optional)
Plus few real fixes:
  - Add missing `_get_upgraders_entry_map` to `torch/_C/__init__.pyi`
  - Add missing return statement to `torch._export. deserialize_graph`
  - Fix error message in `torch.ao.ns.fx.weight_utils.get_lstm_mod_weights`
  -
TODO (in followup PR):
  - Fix erroneous `isinstance` check in `torch/ao/quantization/_pt2e/qat_utils.py`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91983
Approved by: https://github.com/kit1980, https://github.com/ZainRizvi, https://github.com/huydhn, https://github.com/thiagocrepaldi, https://github.com/aaronenyeshi
2023-07-13 16:30:36 +00:00
Aaron Gokaslan
96b91ab248 Fix merged lintrunner error (#105005)
Fixes lintrunner linter race condition. Follow up to #104917

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105005
Approved by: https://github.com/malfet, https://github.com/ezyang
2023-07-11 22:04:49 +00:00
Aaron Gokaslan
2f95a3d0fc [BE]: Apply ruff PERF fixes to torch (#104917)
Applies automated ruff fixes in the PERF modules and enables all automatic ones. I also updated ruff which applied some additional fixes.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104917
Approved by: https://github.com/ezyang, https://github.com/albanD
2023-07-11 20:45:21 +00:00
Andrew Or
4b29829ece [quant][pt2] Fix QAT convert for mobilenetv2 (#104110)
Summary:
QAT convert for mobilenetv2 was previously not working
because we incorrectly applied dropout during eval as well as
training. This is because, for exported models, model.eval() does
not change the behavior of dropout, unlike models with torch ops.
This commit simulates the effects of model.eval() for exported
models as well by replacing the aten dropout pattern before eval.
As of this commit, end-to-end QAT numerics now match for
mobilenetv2 between FX and PT2.

Test Plan: python test/test_quantization.py TestQuantizePT2EModels.test_qat_mobilenet_v2

Differential Revision: D46750343

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104110
Approved by: https://github.com/jerryzh168
2023-07-11 18:42:42 +00:00
maxren
332f2057df [XNNPACK][QS8] torch.nn.ELU (#104307)
Differential Revision: [D47075933](https://our.internmc.facebook.com/intern/diff/D47075933/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104307
Approved by: https://github.com/digantdesai
2023-07-11 00:35:13 +00:00
maxren
c4e084e3c7 [XNNPACK][QS8] torch.nn.ConstantPad2d (#104306)
Differential Revision: [D47075932](https://our.internmc.facebook.com/intern/diff/D47075932/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104306
Approved by: https://github.com/digantdesai
2023-07-11 00:35:02 +00:00
maxren
2c960c73a3 [XNNPACK][QS8] torch.permute (#104305)
Differential Revision: [D47075934](https://our.internmc.facebook.com/intern/diff/D47075934/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104305
Approved by: https://github.com/digantdesai
2023-07-11 00:34:58 +00:00
maxren
d41c4a8338 [XNNPACK][QS8] torch.clamp (#104304)
Differential Revision: [D47075935](https://our.internmc.facebook.com/intern/diff/D47075935/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104304
Approved by: https://github.com/digantdesai
2023-07-11 00:34:58 +00:00
leslie-fang-intel
2a21469a77 [Quant][PT2E] Enable conv2d unary and binary recipe for x86 inductor quantizer (#98826)
**Summary**

- Recipe to annotate `conv2d_relu` for `X86InductorQuantizer` is added.
- Recipe to annotate `conv2d_add` for `X86InductorQuantizer` is added.
- Recipe to annotate `conv2d_add_relu` for `X86InductorQuantizer` is added.

**Test Plan**
```
python -u -m pytest -s -v test_x86inductor_quantizer.py -k TestQuantizePT2EX86Inductor
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98826
Approved by: https://github.com/jerryzh168
2023-07-04 00:01:10 +00:00
Kimish Patel
bd0f0f40a1 [PT2][Quant] Enable symbolic shape in linear quantization (#104473)
When tracing with symbolic shapes, arbitrary sym_size nodes can appear in the
graph. Earlier changes did not account for this and quantizer fails to annotate
the right nodes. This diff fixes that by not annotating sym_size nodes, which
should really not be relevant for quantization.

As next steps, we should validate in quant workflow that a) sym_int nodes are not
being quantized and b) add similar support, as this diff, for generic
annotations

Differential Revision: [D47132050](https://our.internmc.facebook.com/intern/diff/D47132050/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104473
Approved by: https://github.com/jerryzh168
2023-07-01 05:14:30 +00:00
Digant Desai
36c4dad197 [ET][XNNPACK] Add support for quantized LeakyReLU (#104309)
Summary: Also adds support for backend_config

Test Plan: `buck test fbcode//mode/dev-nosan fbcode//executorch/backends/xnnpack/test:`

Reviewed By: mcr229

Differential Revision: D47043207

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104309
Approved by: https://github.com/salilsdesai, https://github.com/manuelcandales
2023-06-30 17:42:22 +00:00
Jerry Zhang
ecca9591d5 [quant][pt2e] Add reference representation for quantize/dequantize operators (#104395)
Summary: Similar to quantized add, in this PR we added the reference represenation for quantize/dequantize operators

Test Plan:
buck2 test caffe2/test:quantization_pt2e -- --exact 'caffe2/test:quantization_pt2e - test_representation_quantize (quantization.pt2e.test_quantize_pt2e.TestQuantizePT2E)'
buck2 test caffe2/test:quantization_pt2e -- --exact 'caffe2/test:quantization_pt2e - test_representation_dequantize (quantization.pt2e.test_quantize_pt2e.TestQuantizePT2E)'

Reviewed By: kimishpatel

Differential Revision: D46959928

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104395
Approved by: https://github.com/andrewor14
2023-06-30 04:32:18 +00:00
leslie-fang-intel
945a257277 [Quant][PT2E] Supported customized _EQUIVALENT_TYPES in Module Partition API (#102516)
**Summary**
`Module Partition API` can simplify the pattern match process in Quantization annotation. However, current implementation of
`Module Partition API` has hardcoded `_EQUIVALENT_TYPES` 999bae0f54/torch/ao/quantization/_pt2e/graph_utils.py (L13-L20). So, PyTorch Extension Libraries such as [intel-extension-for-pytorch](https://github.com/intel/intel-extension-for-pytorch) can't use `Module Partition API` with customized `_EQUIVALENT_TYPES` . In this PR, we plan to enable customized `_EQUIVALENT_TYPES` by pass in parameter.

**Test Plan**
```
python -m pytest test_graph_utils.py -k test_customized_equivalet_types_dict
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102516
Approved by: https://github.com/jgong5, https://github.com/kimishpatel
2023-06-28 00:20:25 +00:00
Jerry Zhang
c98896b76f [quant][pt2e] Add more precise representation for quantized add (#104130)
Summary:
The planned e2e for quantization in pytorch 2.0 export is the following:

float_model -> prepare_pt2e -> calibration -> convert_pt2e -> ...

inside convert_pt2e, we will first produce a q/dq representation of the quantized model, similar to the previous output of
convert_to_reference_fx in fx grah mode quantization:

```
torch.ops.quantized_decomposed.dequantize_per_tensor -> torch.ops.aten.add -> torch.ops.quantized_decomopsed.quantize_per_tensor
torch.ops.quantized_decomposed.dequantize_per_tensor   /
```

Then we'll rewrite the above to a more precise representation that express the intention in a more precise manner, since
here we actually want to do int8 addition, instead of simulating the int8 addition with fp32 operations, the representation for
quantized add is:

```
def quantized_add(x_i8, x_scale, x_zero_point, y_i8, y_scale, y_zero_point, out_scale, out_zero_point):
    x = (x_scale / out_scale) * x_i8
    y = (y_scale / out_scale) * y_i8
    out = x + y
    out -= (x_zero_point * x_scale - y_zero_point * y_scale) / out_scale
    out += out_zero_point
    return out
```

Test Plan:
```
buck2 test caffe2/test:quantization_pt2e -- --exact 'caffe2/test:quantization_pt2e - test_representation_add (quantization.pt2e.test_quantize_pt2e.TestQuantizePT2E)'
```

Reviewed By: kimishpatel

Differential Revision: D45628032

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104130
Approved by: https://github.com/kimishpatel
2023-06-27 20:11:30 +00:00
Digant Desai
ef285faeba [ET][XNNPACK] Add support for quantized Multiply (#104134)
Summary:
Also adds support for backend_config with relu fusion since XNNPACK allows it.

We should revisit the relu fusion once we gain more clarity on quantSrcPartition or some other way to do these fusion and not having to add all combinations.

We should really rename the backend config to et_xnnpack.py or something TODO

Test Plan: `buck test fbcode//mode/dev-nosan fbcode//executorch/backends/xnnpack/test:`

Differential Revision: D46985169

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104134
Approved by: https://github.com/mcr229, https://github.com/salilsdesai
2023-06-27 16:59:28 +00:00
Digant Desai
bd8841101b [ET][XNNPACK] Add support for quantized Sub (#104090)
Summary:
Also adds support for backend_config with relu fusion since XNNPACK allows it.

We should revisit the relu fusion once we gain more clarity on quantSrcPartition or some other way to do these fusion and not having to add all combinations.

We should really rename the backend config to et_xnnpack.py or something TODO

Test Plan: `buck test fbcode//mode/dev-nosan fbcode//executorch/backends/xnnpack/test:`

Differential Revision: D46924209

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104090
Approved by: https://github.com/mcr229
2023-06-26 16:32:15 +00:00
HDCharles
8176cd8c0f [ao] fixing quantized prelu workflow (#103455)
Summary: https://github.com/pytorch/pytorch/issues/100654 noticed prelu
was not running its observers when the quantization flow was being run,
this was a bug which is now fixed and the relevant prelu tests also now
check for this. Also added a corrected observer for PReLU to
qconfig_mapping

Test Plan: python test/test_quantization.py TestStaticQuantizedModule.test_prelu

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103455
Approved by: https://github.com/jerryzh168
2023-06-23 16:45:40 +00:00
Andrew Or
7320ef5651 [quant][pt2] Add prepare QAT test for mobilenetv2 (#104068)
Summary:
Prepare QAT for mobilenetv2 has matching numerics with
FX. There were two changes needed to achieve this, however.
First, this commit adds observer sharing for ReLU6, which is
used extensively throughout this model. Second, in the tests we
have to use the same manual seed every time we call the models
in order to get the same results between FX and PT2. This is
because there is a dropout at the end of the model.

Test Plan: python test/test_quantization.py TestQuantizePT2EModels.test_qat_mobilenet_v2

Reviewed By: kimishpatel

Differential Revision: D46707786

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104068
Approved by: https://github.com/jerryzh168
2023-06-23 16:34:25 +00:00
andrewor14
0d5f1cb666 [quant] Add torch.flatten to executorch backend_config (#103988)
Summary: This is needed to make the short-term and long-term
quantization numerics match for mobilenetv2.

Test Plan:
python test/test_quantization.py TestQuantizeFx

Reviewers: jerryzh, kimishpatel

Subscribers: jerryzh, kimishpatel

Differential Revision: [D46909962](https://our.internmc.facebook.com/intern/diff/D46909962)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103988
Approved by: https://github.com/jerryzh168
2023-06-22 22:11:48 +00:00
Andrew Or
303ff84b04 [quant][pt2] Update special qspecs after QAT rewrite (#103970)
Summary:
Special qspecs like `SharedQuantizationSpec` and
`DerivedQuantizationSpec` refer to other nodes in the graph.
However, after subgraph rewriting in QAT, the nodes referred
to in these special qspecs may be replaced by new nodes.
This could lead to the following error when inserting
observers according to these qspecs:

```
AssertionError: please make sure only refer to edge or node
that has observer/fake_quant inserted: 'getitem' not in
dict_keys([(arg0, convolution_default_1), (mul_tensor, convolution_default_1), getitem_3])
```

This commit fixes this by keeping track of the nodes that
are replaced during subgraph rewriting in QAT, and using
this mapping to update the dangling references used in these
special qspecs.

Test Plan: python test/test_quantization.py TestQuantizePT2E.test_qat_update_shared_qspec

Reviewed By: jerryzh168

Differential Revision: D46606614

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103970
Approved by: https://github.com/jerryzh168
2023-06-22 20:05:57 +00:00
Andrew Or
873f772df2 [quant][pt2] Fix QAT convert for resnet18 (#103759)
Summary:
Before this commit, only prepare QAT numerics matched
between PT2 and FX for resnet18. Convert numerics diverged,
however, for two reasons:

(1) Existing patterns did not handle inplace ReLUs. This commit
fixes this by adding extra patterns that use these ReLUs instead
of the normal ones.

(2) Subgraph rewriter could not handle skip connections in
quantized models, because the dequantize node is used in both
the conv node within the match pattern, and an inplace add node
outside of the match pattern. This led the subgraph matcher to
filter out the match, complaining that it was not self contained.
This commit fixes this problem by duplicating the dequantize
nodes, one for each user, such that subsequent matches will
be self contained.

Test Plan: python test/test_quantization.py TestQuantizePT2EModels.test_qat_resnet18

Reviewed By: jerryzh168

Differential Revision: D46564114

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103759
Approved by: https://github.com/jerryzh168
2023-06-21 15:36:07 +00:00
leslie-fang-intel
9832cfbbfe Quantization oneDNN backend only support VNNI CPU (#103653)
**Summary**

- Update the quantization document that default qconfig with oneDNN backend is recommended to be used on CPUs with Vector Neural Network Instruction support.
- Add the warning message when user uses default qconfig with oneDNN backend on CPU without Vector Neural Network Instruction support.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103653
Approved by: https://github.com/jgong5, https://github.com/malfet
2023-06-19 09:50:07 +00:00
leslie-fang-intel
dbc8eb2a8f [Quant][PT2E]Enable x86 inductor quantizer (#98730)
**Summary**

- Enable `X86InductorQuantizer` basics.
- Recipe to annotate conv2d is added.

**Test Plan**
```
python -u -m pytest -s -v test_x86inductor_quantizer.py -k TestQuantizePT2EX86Inductor
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98730
Approved by: https://github.com/jgong5, https://github.com/jerryzh168
2023-06-17 06:10:23 +00:00
Andrew Or
2bc56bec07 [quant][pt2] Handle literal conv args in convert QAT (#103731)
Summary:
Similar to the prepare case, we need to manually copy
over literal conv args such as padding and stride to the new,
replaced conv nodes, since these args are not captured by the
subgraph rewriter.

Test Plan: python test/test_quantization.py TestQuantizePT2E.test_qat_conv_bn_fusion_literal_args

Reviewed By: jerryzh168

Differential Revision: D46383130

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103731
Approved by: https://github.com/jerryzh168
2023-06-16 17:15:37 +00:00
Andrew Or
dad29f906b [quant][pt2] Fix no conv bias in convert QAT (#103298)
Summary:
Previously, the QAT pattern for conv + bn with no conv
bias was not actually replaced in convert. This commit adds an
extra pattern in the convert path for this case and the numerics
now match FX's.

Test Plan: python test/test_quantization.py TestQuantizePT2E.test_prepare_qat_conv_bn_fusion_no_conv_bias

Reviewed By: jerryzh168

Differential Revision: D46382819

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103298
Approved by: https://github.com/jerryzh168
2023-06-16 01:59:48 +00:00
Kimish Patel
90ee6a7354 [PT2][Quant] Update op names for decomposed quantized lib (#103251)
Summary:
Dynamo trace, via dynamo.export, with aten_graph, generates graph with nodes
whose target is an isntance of torch._ops.OpOverload. Quantization workflow
inserting quantize/dequantize ops which are sometimes instances of
torch._ops.OpOverload (quantize_per_tensor.tensor) while other times instances
of torch._ops.OpOverloadPacket (quantizer_per_tensor) is a bit inconsistent.

Also not sure if it is a valid exported model, if it has nodes with target
of type torch._ops.OpOverloadPacket.

Without op overload name attached to the 'target', it fails during executorch
tracing. Reason is that executorch tracing expects node's targets to be
instances of torch._ops.OpOverload and not torch._ops.OpOverloadPacket.

So for consistency and tracing reasons, fixing convert pass to insert ops which
are torch._ops.OpOverload

Test Plan: CI

Reviewed By: jerryzh168

Differential Revision: D46342822

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103251
Approved by: https://github.com/andrewor14
2023-06-15 04:37:58 +00:00
Piotr Sebastian Kluska
b4056ba744 chore: Update ModelReportObserver variables to buffers (#97971)
This commit changes ModelReportObserver variables to buffers similar to other observers. This will allow for gathering data on other device than CPU.
Moreover, updates InputWeightEqualizationDetector to compute weight stats that are on GPU

Tested with running tests `test/quantization/fx/test_model_report_fx.py`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97971
Approved by: https://github.com/vkuzo
2023-06-15 03:15:41 +00:00
Kimish Patel
49dcf48e66 [PT2][Quant] Change quat conv bn fusion code (#103556)
Summary:
Dynamo burn in scalars instead of keeping them on module. This results in
quantize_per_tensor and dequantize_per_tensor nodes to have burnt in scale and
zero point value, if we trace them scalar.

Graph rewrite ignores literals and when match pattern is replaced with
replacement pattern, we lose the scale/zp and other values from nodes in
original graph and instead get one from replacement graph.

This diff fixes that for q/dq per tensor node by manually copying these values
over.

Note that this is not robust because it works only when there is only a single
q/dq node

Test Plan: quantization_pt2e

Reviewed By: andrewor14

Differential Revision: D46614000

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103556
Approved by: https://github.com/andrewor14
2023-06-14 18:37:43 +00:00
Jerry Zhang
0cd155b042 [reland][quant][pt2e] Annotate GRU module (#103358) (#103526)
Summary:

att, we use module partition API to identify the GRU submodule and annotate all necessary patterns

Test Plan: buck2 test mode/opt caffe2/test:quantization_pt2e -- 'caffe2/test:quantization_pt2e'

Differential Revision: D46689428

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103526
Approved by: https://github.com/andrewor14
2023-06-13 23:43:10 +00:00
PyTorch MergeBot
13777e3391 Revert "[quant][pt2e] Annotate GRU module (#103358)"
This reverts commit 23892d8ee4.

Reverted https://github.com/pytorch/pytorch/pull/103358 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/103358#issuecomment-1588729657))
2023-06-13 07:45:40 +00:00
Jerry Zhang
23892d8ee4 [quant][pt2e] Annotate GRU module (#103358)
Summary: att, we use module partition API to identify the GRU submodule and annotate all necessary patterns

Test Plan: buck2 test mode/opt caffe2/test:quantization_pt2e -- 'caffe2/test:quantization_pt2e'

Reviewed By: kimishpatel

Differential Revision: D46384329

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103358
Approved by: https://github.com/HDCharles
2023-06-13 04:10:13 +00:00
Yash Vardhan
6ed3c4499a Fix fuse_custom_config_dict arg from being None (#102154)
`fuse_custom_config_dict` in [fuse_modules.py](https://github.com/pytorch/pytorch/blob/main/torch/ao/quantization/fuse_modules.py#L164) being passed as None even if a fuse_custom_config_dict is provided.

This patch fixes the `fuse_custom_config_dict` from being passed as None.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102154
Approved by: https://github.com/kit1980
2023-06-13 03:45:20 +00:00
maxren
f37be77813 [Quant][XNNPACK] Delegate add_relu fusion (#103266)
Quantized Resnet currently sees fused add-relu
```
--> dq
       \
        add --> relu --> quant
       /
--> dq
```

Let us support this fusion in the delegate as xnnpack can use the output_min and output_max of the op nodes to clamp the values and perform a fused add - relu operation

Differential Revision: [D45258028](https://our.internmc.facebook.com/intern/diff/D45258028/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103266
Approved by: https://github.com/jerryzh168
2023-06-12 04:35:29 +00:00
Andrew Or
89d57f269f [quant][pt2] Fix convert in Conv + BN + ReLU QAT fusion (#102993)
Summary:
Previously, the QAT pattern for conv + bn + relu was
not actually replaced in convert. This is because the quantized
QAT pattern used in convert doesn't actually have a relu node.
This commit adds this extra pattern in the convert path and
the numerics now match FX's.

Test Plan: python test/test_quantization.py TestQuantizePT2E.test_qat_conv_bn_relu_numerics

Reviewed By: jerryzh168

Differential Revision: D46372411

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102993
Approved by: https://github.com/jerryzh168
2023-06-08 22:10:29 +00:00
Kimish Patel
a49aefdce2 [PT2][Quant] In linear partition include functional.linear (#103186)
Summary: as title

Test Plan: tested in subsequent diff

Reviewed By: jerryzh168

Differential Revision: D46342824

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103186
Approved by: https://github.com/jerryzh168
2023-06-08 09:48:09 +00:00
Kimish Patel
471407cf78 [PT2][Quant] Use composble quantizer for embedding + static conv + dynamic (#103116)
Summary:
In this diff we test a module that does a) emedding lookup b) runs 1D
(converted to 2D) conv and c) runs linear on the output of 1d conv.

a is quantized using embedding quantizer.
c is quantized using dynamic quantization.
b is quantized using static quantization.

We compose quantizer from [a, c, b]. Tested it against similar fx config.

Test Plan: test_embedding_conv_linear_quantization

Reviewed By: jerryzh168

Differential Revision: D46267688

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103116
Approved by: https://github.com/jerryzh168
2023-06-07 17:34:59 +00:00
Kimish Patel
8e0837cf84 [PT2][Quant] Move embedding quantization to osss (#103088)
Summary:
This is in preperation to enable embeddign quantization on models with
embeddings.

Test Plan: test_embedding_quantizer

Reviewed By: jerryzh168

Differential Revision: D46267689

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103088
Approved by: https://github.com/andrewor14
2023-06-06 23:07:57 +00:00
Xuan Xie
6261055471 dst_bin_of_end_center is defined twice (#102755)
(line 995 and line 1011)
both definations are the same.
Delete one of them.

Fixes

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102755
Approved by: https://github.com/janeyx99
2023-06-06 21:17:07 +00:00
Kimish Patel
8824101fb6 [PT2][Quant] Introduce composable quantizer (#102846)
Summary:
Using composable quantizer, we can now composable two or more quantizers. In
the test here we compose quantizer configured with dynamic linear quantization,
with quantizer configured for static quantization.

Note that composable quantizer has strict order in which annotations are
applied

Test Plan: test_composable_quantizer*

Reviewed By: jerryzh168

Differential Revision: D46267690

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102846
Approved by: https://github.com/andrewor14
2023-06-06 14:01:55 +00:00
Jerry Zhang
5fbbae4283 [quant][pt2e][be] Cleanup prepare function in _pt2e (#103022)
Summary: att

Test Plan:
```
buck2 test mode/opt caffe2/test:quantization_pt2e -- 'caffe2/test:quantization_pt2e'
```

Differential Revision: D46346087

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103022
Approved by: https://github.com/andrewor14
2023-06-06 04:33:05 +00:00
Andrew Or
604a414bfc [quant][pt2] Fix convert in Conv + BN QAT fusion (#102224)
Summary:
Previously, the test for the convert flow in Conv + BN
QAT fusion was not enabled by mistake. However, reenabling this
test uncovered several bugs:

(1) The replaced nodes returned by subgraph rewriter were not
handled correctly. This is because a recent change in the subgraph
rewriter (#100556) fixed only the prepare case but not the convert
case. This commit brings this fix to the convert case as well and
deduplicates some code between the two cases.

(2) When folding BN into conv, we used the wrong arg index to get
the BN eps value. This resulted in an incorrect conv weight.

(3) In FX, we currently do a hack for weighted modules where we
observe the weights once in convert in order to ensure we get the
right shapes for these weight observers. This caused the numerics
to diverge between PT2 and FX. This commit fixes this by skipping
this unnecessary hack for `_convert_to_reference_decomposed_fx`.

(4) Per channel support was simply missing. This commit adds
support for this by matching the quantize_per_channel and
dequantize_per_channel ops in addition to the existing ones.

Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_qat_conv_bn_numerics

Reviewed By: jerryzh168

Differential Revision: D46097783

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102224
Approved by: https://github.com/jerryzh168
2023-06-05 18:09:28 +00:00
Jerry Zhang
eb0971cfe9 [quant][pt2e][be] Remove _input_output_share_observers and _reuse_input_obs_or_fq from QuantizationAnnotation (#102854)
Summary:
att, after we support SharedQuantizationSpec we don't need these things anymore, this PR refactors the
uses of _input_output_share_observers to SharedQuantizationSpec

Test Plan:
```
buck2 test mode/opt caffe2/test:quantization_pt2e -- 'caffe2/test:quantization_pt2e'
```

Reviewed By: andrewor14

Differential Revision: D46301342

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102854
Approved by: https://github.com/andrewor14
2023-06-03 07:31:09 +00:00
Kimish Patel
a53acafd2b [PT2][Quant] Enable dynamic quantization (#102703)
Enable dynamic quantization of linear layers.

Differential Revision: [D46235070](https://our.internmc.facebook.com/intern/diff/D46235070/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102703
Approved by: https://github.com/andrewor14
2023-06-02 17:52:14 +00:00
Kimish Patel
2301b624ae [PT2][Quant] Update quconfig to contain input/qoutput activation qspec (#102702)
As title

Differential Revision: [D46342823](https://our.internmc.facebook.com/intern/diff/D46342823/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102702
Approved by: https://github.com/andrewor14
2023-06-02 17:41:46 +00:00
Kimish Patel
6492b7d22e [PT2][Quant][BE] Refactor qnnpack_quantizer.py (#102701)
This diff refactors annotate functions so as to couple annotate functions with
corresponding quantization configs that they support. This will help in dynamic
quantization which is only supported for linear layers

Differential Revision: [D46235071](https://our.internmc.facebook.com/intern/diff/D46235071/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102701
Approved by: https://github.com/jerryzh168
2023-06-02 17:14:56 +00:00
Jerry Zhang
ce8d31551b [quant][be] Change return type for zero_point to be int32 Tensor (#102234)
Summary: This is probably a typo

Test Plan: CI

Reviewed By: salilsdesai

Differential Revision: D46172706

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102234
Approved by: https://github.com/salilsdesai
2023-06-01 18:30:44 +00:00
Jerry Zhang
d930bfc419 [quant][pt2e][be] Add QuantizationSpecBase (#102582)
Summary:
Make all quantization spec to inherit from the same base class in order to simplify the typing
for QuantizationAnnotation

Test Plan:
```
buck2 test mode/opt caffe2/test:quantization_pt2e -- 'caffe2/test:quantization_pt2e'
```

Reviewed By: kimishpatel

Differential Revision: D46173954

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102582
Approved by: https://github.com/andrewor14
2023-06-01 17:55:22 +00:00
Jerry Zhang
f14ac74fce [quant][pt2e] Add support for FixedQParamsQuantizationSpec (#102439)
Summary:
This PR adds support for FixedQParamsQuantizationSpec:

```
dataclass(eq=True, frozen=True)
class FixedQParamsQuantizationSpec(QuantizationSpecBase):
    dtype: torch.dtype
    scale: float
    zero_point: int
    quant_min: Optional[int] = None
    quant_max: Optional[int] = None
    qscheme: Optional[torch.qscheme] = None
```

This is useful to define quantization spec for operators like sigmoid which has predefined and fixed scale/zero_point

Test Plan:
```
buck2 test mode/opt caffe2/test:quantization_pt2e -- 'caffe2/test:quantization_pt2e'
buck2 test mode/opt caffe2/test:quantization_pt2e -- --exact 'caffe2/test:quantization_pt2e - test_fixed_qparams_qspec (quantization.pt2e.test_quantize_pt2e.TestQuantizePT2E)'
```

Reviewed By: kimishpatel

Differential Revision: D46153082

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102439
Approved by: https://github.com/kimishpatel
2023-05-30 21:28:13 +00:00
Kimish Patel
af70fe9f3e [PT2][Quant] Enable test_qnnpack_quantizer_conv_linear test (#102399)
Earlier this test was disabled due to pattern matching not working correctly.
Enablign this test now since we moved to module partitioner based matching.

Differential Revision: [D46130722](https://our.internmc.facebook.com/intern/diff/D46130722/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102399
Approved by: https://github.com/jerryzh168
2023-05-28 06:44:16 +00:00
Kimish Patel
0d876f7d43 [PT2][Quant] Move observer sharing ops to use module partitions (#102398)
As title

Differential Revision: [D46095331](https://our.internmc.facebook.com/intern/diff/D46095331/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102398
Approved by: https://github.com/jerryzh168
2023-05-28 05:50:15 +00:00
Kimish Patel
9fac5afbcc [PT2][Quant] Move add/add relu pattern via module partitioner (#102397)
This diff uses module partitioners to find add and add + relu patterns.

Differential Revision: [D46095330](https://our.internmc.facebook.com/intern/diff/D46095330/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102397
Approved by: https://github.com/jerryzh168
2023-05-28 05:47:43 +00:00
Kimish Patel
3d8f405022 [PT2][Quant] Move maxpool_2d quant to use module partitioners (#102396)
As summary

Differential Revision: [D46095332](https://our.internmc.facebook.com/intern/diff/D46095332/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102396
Approved by: https://github.com/jerryzh168
2023-05-28 05:44:54 +00:00
Kimish Patel
d997e3aac6 [PT2][Quant] Use module partitions for conv2d and conv2d + relu (#102395)
In this diff we continue to use source partition for identifying node patterns
to annotate. Here we expand the usecase for conv2d+relu and conv2d

Differential Revision: [D46095329](https://our.internmc.facebook.com/intern/diff/D46095329/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102395
Approved by: https://github.com/jerryzh168
2023-05-28 05:40:45 +00:00
Kimish Patel
4cb6add471 [PT2][Quant] Use module partition for fused patterns (#102394)
This diff introduces utility `find_sequential_partitions`.
This utility allows one to specify sequential pattern of
nn.Module/nn.functional and returns a list. Each item in the list contains a
List[SourcePartition] that represents sequentially connected partitions that
are of the pattern requested.
For example `find_sequential_partitions(model, [nn.Conv2d, nn.ReLU])` will find
all nn.Conv2d and nn.ReLU partitions that are sequentially connected.

Furthmore, move to using `find_sequential_partitions` for conv_bn/conv_bn_relu
for QAT.

Differential Revision: [D45948057](https://our.internmc.facebook.com/intern/diff/D45948057/)

**NOTE FOR REVIEWERS**: This PR has internal Meta-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D45948057/)!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102394
Approved by: https://github.com/jerryzh168
2023-05-28 05:29:16 +00:00
Jerry Zhang
eda5abf5e0 [quant][pt2e] Fix propagate_annotation after recent refactors (#102422)
Summary:
Recently we changed the annotation from "target_dtype_info" to "quantization_annotation" and introduced QuantizationAnnotation API
and SharedQuantizationSpec API for users to convey sharing between input/outputs, this PR updates the _propagate_annotation
pass to accommadate the recent changes

Test Plan:
```
buck2 test mode/opt caffe2/test:quantization_pt2e -- 'caffe2/test:quantization_pt2e'
```

Reviewed By: kimishpatel

Differential Revision: D46153084

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102422
Approved by: https://github.com/kimishpatel
2023-05-27 16:01:47 +00:00
Jerry Zhang
23223402eb [quant][pt2e] Add Support for DerivedQuantizationSpec (#102282)
Summary:
```
"""
4. DerivedQuantizationSpec
this is the quantization spec for the Tensors whose quantization parameters are derived from other Tensors
"""

class DerivedQuantizationSpec(QuantizationSpecBase):
    # specifies which Tensors the quantization parameters are derived from
    # this can either be an edge from argument to node, or a node
    derived_from: List[EdgeOrNode]
    derive_qparams_fn: Callabale[List[ObserverOrFakeQuantize], Tuple[Tensor, Tensor]]
     ...
```

Test Plan:
```
buck2 test mode/opt caffe2/test:quantization_pt2e -- 'caffe2/test:quantization_pt2e'
buck2 test mode/opt caffe2/test:quantization_pt2e -- --exact 'caffe2/test:quantization_pt2e - test_resnet18_with_quantizer_api (quantization.pt2e.test_quantize_pt2e.TestQuantizePT2EModels)'
```

Reviewed By: kimishpatel

Differential Revision: D46097855

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102282
Approved by: https://github.com/andrewor14
2023-05-27 00:24:39 +00:00
Jerry Zhang
ed87508b32 [quant][pt2e] Add support for SharedQuantizationSpec (#102184)
Summary:
This PR adds support for SharedQuantizationSpec, it's used to express the sharing between
two Tensors in the prepared graph, the Tensor will either be input of some node (expressed as a Tuple of fx nodes) or
output of some node (expressed as an fx Node)

Test Plan:
```
buck2 test mode/opt caffe2/test:quantization_pt2e -- 'caffe2/test:quantization_pt2e'
buck2 test mode/opt caffe2/test:quantization_pt2e -- --exact 'caffe2/test:quantization_pt2e - test_resnet18_with_quantizer_api (quantization.pt2e.test_quantize_pt2e.TestQuantizePT2EModels)'
```

Differential Revision: D46043026

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102184
Approved by: https://github.com/kimishpatel, https://github.com/leslie-fang-intel
2023-05-25 17:31:59 +00:00
Riley Dulin
424c930f76 Add quantization lowering for nn.PixelShuffle and nn.PixelUnshuffle (#101926)
Similar to https://github.com/pytorch/pytorch/pull/96160 but for the modules
nn.PixelShuffle and nn.PixelUnshuffle.

torch.nn.PixelUnshuffle accepts both float and quantized inputs.
However, previously we would unnecessarily dequantize quantized inputs into floats
before passing them to the function. This commit fixes this by lowering the pattern
[dequant - PixelShuffle - quant].
[dequant - PixelUnshuffle - quant].

Test Plan:

python test/test_quantization.py TestQuantizeFxOps.test_pixel_shuffle_module
python test/test_quantization.py TestQuantizeFxOps.test_pixel_unshuffle_module

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101926
Approved by: https://github.com/jerryzh168
2023-05-24 19:33:26 +00:00
Jerry Zhang
3baa67caee [quant][pt2e][be] Move annotate helper function to quantizer/utils.py (#102127)
Summary: att

Test Plan:
```
buck2 test mode/opt caffe2/test:quantization_pt2e -- --exact 'caffe2/test:quantization_pt2e - test_resnet18_with_quantizer_api (quantization.pt2e.test_quantize_pt2e.TestQuantizePT2EModels)'
```

Reviewed By: kimishpatel

Differential Revision: D46001285

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102127
Approved by: https://github.com/kimishpatel
2023-05-24 16:13:28 +00:00