Commit Graph

1019 Commits

Author SHA1 Message Date
Andrew Or
4b29829ece [quant][pt2] Fix QAT convert for mobilenetv2 (#104110)
Summary:
QAT convert for mobilenetv2 was previously not working
because we incorrectly applied dropout during eval as well as
training. This is because, for exported models, model.eval() does
not change the behavior of dropout, unlike models with torch ops.
This commit simulates the effects of model.eval() for exported
models as well by replacing the aten dropout pattern before eval.
As of this commit, end-to-end QAT numerics now match for
mobilenetv2 between FX and PT2.
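
A minimal sketch of the idea, assuming the exported graph contains plain `aten.dropout.default` calls (the actual pass in the PR matches the aten dropout pattern; the helper name below is hypothetical):

```
import torch

def _disable_dropout_for_eval(gm: torch.fx.GraphModule) -> torch.fx.GraphModule:
    # replace each aten.dropout call with its input, mimicking model.eval()
    for node in list(gm.graph.nodes):
        if node.op == "call_function" and node.target == torch.ops.aten.dropout.default:
            node.replace_all_uses_with(node.args[0])
            gm.graph.erase_node(node)
    gm.graph.eliminate_dead_code()
    gm.recompile()
    return gm
```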

Test Plan: python test/test_quantization.py TestQuantizePT2EModels.test_qat_mobilenet_v2

Differential Revision: D46750343

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104110
Approved by: https://github.com/jerryzh168
2023-07-11 18:42:42 +00:00
maxren
332f2057df [XNNPACK][QS8] torch.nn.ELU (#104307)
Differential Revision: [D47075933](https://our.internmc.facebook.com/intern/diff/D47075933/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104307
Approved by: https://github.com/digantdesai
2023-07-11 00:35:13 +00:00
maxren
c4e084e3c7 [XNNPACK][QS8] torch.nn.ConstantPad2d (#104306)
Differential Revision: [D47075932](https://our.internmc.facebook.com/intern/diff/D47075932/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104306
Approved by: https://github.com/digantdesai
2023-07-11 00:35:02 +00:00
maxren
2c960c73a3 [XNNPACK][QS8] torch.permute (#104305)
Differential Revision: [D47075934](https://our.internmc.facebook.com/intern/diff/D47075934/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104305
Approved by: https://github.com/digantdesai
2023-07-11 00:34:58 +00:00
maxren
d41c4a8338 [XNNPACK][QS8] torch.clamp (#104304)
Differential Revision: [D47075935](https://our.internmc.facebook.com/intern/diff/D47075935/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104304
Approved by: https://github.com/digantdesai
2023-07-11 00:34:58 +00:00
leslie-fang-intel
2a21469a77 [Quant][PT2E] Enable conv2d unary and binary recipe for x86 inductor quantizer (#98826)
**Summary**

- Recipe to annotate `conv2d_relu` for `X86InductorQuantizer` is added.
- Recipe to annotate `conv2d_add` for `X86InductorQuantizer` is added.
- Recipe to annotate `conv2d_add_relu` for `X86InductorQuantizer` is added.

**Test Plan**
```
python -u -m pytest -s -v test_x86inductor_quantizer.py -k TestQuantizePT2EX86Inductor
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98826
Approved by: https://github.com/jerryzh168
2023-07-04 00:01:10 +00:00
Kimish Patel
bd0f0f40a1 [PT2][Quant] Enable symbolic shape in linear quantization (#104473)
When tracing with symbolic shapes, arbitrary sym_size nodes can appear in the
graph. Earlier changes did not account for this and quantizer fails to annotate
the right nodes. This diff fixes that by not annotating sym_size nodes, which
should really not be relevant for quantization.
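
A hedged sketch of the kind of check involved (the exact set of sym ops matched in the PR may differ; this is illustrative):

```
import torch

def _is_sym_size_node(node: torch.fx.Node) -> bool:
    # nodes like aten.sym_size.int carry shape information rather than tensor
    # values, so the quantizer should skip them when annotating
    return node.op == "call_function" and node.target == torch.ops.aten.sym_size.int
```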

As next steps, we should a) validate in the quant workflow that sym_int nodes are not
being quantized and b) add similar support, as in this diff, for generic
annotations.

Differential Revision: [D47132050](https://our.internmc.facebook.com/intern/diff/D47132050/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104473
Approved by: https://github.com/jerryzh168
2023-07-01 05:14:30 +00:00
Digant Desai
36c4dad197 [ET][XNNPACK] Add support for quantized LeakyReLU (#104309)
Summary: Also adds support for backend_config

Test Plan: `buck test fbcode//mode/dev-nosan fbcode//executorch/backends/xnnpack/test:`

Reviewed By: mcr229

Differential Revision: D47043207

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104309
Approved by: https://github.com/salilsdesai, https://github.com/manuelcandales
2023-06-30 17:42:22 +00:00
Jerry Zhang
ecca9591d5 [quant][pt2e] Add reference representation for quantize/dequantize operators (#104395)
Summary: Similar to quantized add, in this PR we added the reference representation for quantize/dequantize operators.
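
For reference, the per-tensor affine quantize/dequantize semantics being represented are the standard ones (a hedged sketch, not the PR's exact rewrite):

```
import torch

def quantize_per_tensor_ref(x, scale, zero_point, quant_min, quant_max):
    # round to nearest, shift by zero_point, clamp to the quantized range
    q = torch.clamp(torch.round(x / scale) + zero_point, quant_min, quant_max)
    return q.to(torch.int8)

def dequantize_per_tensor_ref(q, scale, zero_point):
    return (q.to(torch.float32) - zero_point) * scale
```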

Test Plan:
buck2 test caffe2/test:quantization_pt2e -- --exact 'caffe2/test:quantization_pt2e - test_representation_quantize (quantization.pt2e.test_quantize_pt2e.TestQuantizePT2E)'
buck2 test caffe2/test:quantization_pt2e -- --exact 'caffe2/test:quantization_pt2e - test_representation_dequantize (quantization.pt2e.test_quantize_pt2e.TestQuantizePT2E)'

Reviewed By: kimishpatel

Differential Revision: D46959928

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104395
Approved by: https://github.com/andrewor14
2023-06-30 04:32:18 +00:00
leslie-fang-intel
945a257277 [Quant][PT2E] Supported customized _EQUIVALENT_TYPES in Module Partition API (#102516)
**Summary**
`Module Partition API` can simplify the pattern match process in Quantization annotation. However, the current implementation of
`Module Partition API` has a hardcoded `_EQUIVALENT_TYPES` 999bae0f54/torch/ao/quantization/_pt2e/graph_utils.py (L13-L20). So, PyTorch extension libraries such as [intel-extension-for-pytorch](https://github.com/intel/intel-extension-for-pytorch) can't use `Module Partition API` with a customized `_EQUIVALENT_TYPES`. In this PR, we enable a customized `_EQUIVALENT_TYPES` by passing it in as a parameter.
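
A hedged illustration of the kind of equivalent-types table involved; the extension entry and the exact way the parameter is passed in are assumptions for illustration only:

```
import torch.nn as nn
import torch.nn.functional as F

class MyExtensionLinear(nn.Linear):  # stand-in for an extension-library module
    pass

customized_equivalent_types = [
    {nn.Conv2d, F.conv2d},
    {nn.ReLU, F.relu},
    {nn.Linear, F.linear, MyExtensionLinear},  # extension-provided equivalence
]
```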

**Test Plan**
```
python -m pytest test_graph_utils.py -k test_customized_equivalet_types_dict
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102516
Approved by: https://github.com/jgong5, https://github.com/kimishpatel
2023-06-28 00:20:25 +00:00
Jerry Zhang
c98896b76f [quant][pt2e] Add more precise representation for quantized add (#104130)
Summary:
The planned e2e for quantization in pytorch 2.0 export is the following:

float_model -> prepare_pt2e -> calibration -> convert_pt2e -> ...

inside convert_pt2e, we will first produce a q/dq representation of the quantized model, similar to the previous output of
convert_to_reference_fx in fx graph mode quantization:

```
torch.ops.quantized_decomposed.dequantize_per_tensor -> torch.ops.aten.add -> torch.ops.quantized_decomopsed.quantize_per_tensor
torch.ops.quantized_decomposed.dequantize_per_tensor   /
```

Then we'll rewrite the above into a more precise representation that expresses the intent directly: here we actually
want to do int8 addition, instead of simulating the int8 addition with fp32 operations. The representation for
quantized add is:

```
def quantized_add(x_i8, x_scale, x_zero_point, y_i8, y_scale, y_zero_point, out_scale, out_zero_point):
    x = (x_scale / out_scale) * x_i8
    y = (y_scale / out_scale) * y_i8
    out = x + y
    out -= (x_zero_point * x_scale + y_zero_point * y_scale) / out_scale
    out += out_zero_point
    return out
```
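
For contrast, a hedged sketch (not from the PR) of what the plain dq -> add -> q sequence above computes when simulated in fp32:

```
def simulated_add(x_i8, x_scale, x_zero_point, y_i8, y_scale, y_zero_point,
                  out_scale, out_zero_point):
    # dequantize both inputs to fp32, add, then requantize the fp32 result
    x_fp32 = (x_i8 - x_zero_point) * x_scale
    y_fp32 = (y_i8 - y_zero_point) * y_scale
    out_fp32 = x_fp32 + y_fp32
    return round(out_fp32 / out_scale) + out_zero_point
```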

Test Plan:
```
buck2 test caffe2/test:quantization_pt2e -- --exact 'caffe2/test:quantization_pt2e - test_representation_add (quantization.pt2e.test_quantize_pt2e.TestQuantizePT2E)'
```

Reviewed By: kimishpatel

Differential Revision: D45628032

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104130
Approved by: https://github.com/kimishpatel
2023-06-27 20:11:30 +00:00
Digant Desai
ef285faeba [ET][XNNPACK] Add support for quantized Multiply (#104134)
Summary:
Also adds support for backend_config with relu fusion since XNNPACK allows it.

We should revisit the relu fusion once we gain more clarity on quantSrcPartition or some other way to do these fusions without having to add all combinations.

TODO: We should really rename the backend config to et_xnnpack.py or something.

Test Plan: `buck test fbcode//mode/dev-nosan fbcode//executorch/backends/xnnpack/test:`

Differential Revision: D46985169

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104134
Approved by: https://github.com/mcr229, https://github.com/salilsdesai
2023-06-27 16:59:28 +00:00
Digant Desai
bd8841101b [ET][XNNPACK] Add support for quantized Sub (#104090)
Summary:
Also adds support for backend_config with relu fusion since XNNPACK allows it.

We should revisit the relu fusion once we gain more clarity on quantSrcPartition or some other way to do these fusions without having to add all combinations.

TODO: We should really rename the backend config to et_xnnpack.py or something.

Test Plan: `buck test fbcode//mode/dev-nosan fbcode//executorch/backends/xnnpack/test:`

Differential Revision: D46924209

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104090
Approved by: https://github.com/mcr229
2023-06-26 16:32:15 +00:00
HDCharles
8176cd8c0f [ao] fixing quantized prelu workflow (#103455)
Summary: https://github.com/pytorch/pytorch/issues/100654 noticed that prelu
was not running its observers when the quantization flow was run.
This was a bug, which is now fixed, and the relevant prelu tests now
check for this. Also added a corrected observer for PReLU to
qconfig_mapping.

Test Plan: python test/test_quantization.py TestStaticQuantizedModule.test_prelu

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103455
Approved by: https://github.com/jerryzh168
2023-06-23 16:45:40 +00:00
Andrew Or
7320ef5651 [quant][pt2] Add prepare QAT test for mobilenetv2 (#104068)
Summary:
Prepare QAT for mobilenetv2 has matching numerics with
FX. There were two changes needed to achieve this, however.
First, this commit adds observer sharing for ReLU6, which is
used extensively throughout this model. Second, in the tests we
have to use the same manual seed every time we call the models
in order to get the same results between FX and PT2. This is
because there is a dropout at the end of the model.

Test Plan: python test/test_quantization.py TestQuantizePT2EModels.test_qat_mobilenet_v2

Reviewed By: kimishpatel

Differential Revision: D46707786

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104068
Approved by: https://github.com/jerryzh168
2023-06-23 16:34:25 +00:00
andrewor14
0d5f1cb666 [quant] Add torch.flatten to executorch backend_config (#103988)
Summary: This is needed to make the short-term and long-term
quantization numerics match for mobilenetv2.

Test Plan:
python test/test_quantization.py TestQuantizeFx

Reviewers: jerryzh, kimishpatel

Subscribers: jerryzh, kimishpatel

Differential Revision: [D46909962](https://our.internmc.facebook.com/intern/diff/D46909962)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103988
Approved by: https://github.com/jerryzh168
2023-06-22 22:11:48 +00:00
Andrew Or
303ff84b04 [quant][pt2] Update special qspecs after QAT rewrite (#103970)
Summary:
Special qspecs like `SharedQuantizationSpec` and
`DerivedQuantizationSpec` refer to other nodes in the graph.
However, after subgraph rewriting in QAT, the nodes referred
to in these special qspecs may be replaced by new nodes.
This could lead to the following error when inserting
observers according to these qspecs:

```
AssertionError: please make sure only refer to edge or node
that has observer/fake_quant inserted: 'getitem' not in
dict_keys([(arg0, convolution_default_1), (mul_tensor, convolution_default_1), getitem_3])
```

This commit fixes this by keeping track of the nodes that
are replaced during subgraph rewriting in QAT, and using
this mapping to update the dangling references used in these
special qspecs.
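
A hedged sketch of the remapping step (pure illustration; the PR's helper and the way the original-to-replacement node map is collected may differ):

```
def _remap_edge_or_node(edge_or_node, replacement_map):
    # special qspecs point at either a node or an (arg, node) edge; rewrite any
    # reference whose node was replaced during subgraph rewriting
    if isinstance(edge_or_node, tuple):
        return tuple(replacement_map.get(n, n) for n in edge_or_node)
    return replacement_map.get(edge_or_node, edge_or_node)
```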

Test Plan: python test/test_quantization.py TestQuantizePT2E.test_qat_update_shared_qspec

Reviewed By: jerryzh168

Differential Revision: D46606614

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103970
Approved by: https://github.com/jerryzh168
2023-06-22 20:05:57 +00:00
Andrew Or
873f772df2 [quant][pt2] Fix QAT convert for resnet18 (#103759)
Summary:
Before this commit, only prepare QAT numerics matched
between PT2 and FX for resnet18. Convert numerics diverged,
however, for two reasons:

(1) Existing patterns did not handle inplace ReLUs. This commit
fixes this by adding extra patterns that use these ReLUs instead
of the normal ones.

(2) Subgraph rewriter could not handle skip connections in
quantized models, because the dequantize node is used in both
the conv node within the match pattern, and an inplace add node
outside of the match pattern. This led the subgraph matcher to
filter out the match, complaining that it was not self contained.
This commit fixes this problem by duplicating the dequantize
nodes, one for each user, such that subsequent matches will
be self contained.
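
A hedged sketch of the node-duplication idea using the public torch.fx graph APIs (not the PR's exact pass):

```
import torch

def duplicate_dequantize_per_user(graph: torch.fx.Graph, dq_node: torch.fx.Node) -> None:
    # give every user its own copy of the dequantize node so each match stays
    # self-contained, then remove the original
    for user in list(dq_node.users):
        with graph.inserting_after(dq_node):
            new_dq = graph.node_copy(dq_node)
        user.replace_input_with(dq_node, new_dq)
    graph.erase_node(dq_node)
```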

Test Plan: python test/test_quantization.py TestQuantizePT2EModels.test_qat_resnet18

Reviewed By: jerryzh168

Differential Revision: D46564114

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103759
Approved by: https://github.com/jerryzh168
2023-06-21 15:36:07 +00:00
leslie-fang-intel
9832cfbbfe Quantization oneDNN backend only support VNNI CPU (#103653)
**Summary**

- Update the quantization document that default qconfig with oneDNN backend is recommended to be used on CPUs with Vector Neural Network Instruction support.
- Add the warning message when user uses default qconfig with oneDNN backend on CPU without Vector Neural Network Instruction support.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103653
Approved by: https://github.com/jgong5, https://github.com/malfet
2023-06-19 09:50:07 +00:00
leslie-fang-intel
dbc8eb2a8f [Quant][PT2E]Enable x86 inductor quantizer (#98730)
**Summary**

- Enable `X86InductorQuantizer` basics.
- Recipe to annotate conv2d is added.

**Test Plan**
```
python -u -m pytest -s -v test_x86inductor_quantizer.py -k TestQuantizePT2EX86Inductor
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98730
Approved by: https://github.com/jgong5, https://github.com/jerryzh168
2023-06-17 06:10:23 +00:00
Andrew Or
2bc56bec07 [quant][pt2] Handle literal conv args in convert QAT (#103731)
Summary:
Similar to the prepare case, we need to manually copy
over literal conv args such as padding and stride to the new,
replaced conv nodes, since these args are not captured by the
subgraph rewriter.
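
A hedged sketch of the copy step, assuming the aten convolution arg layout of (input, weight, bias, stride, padding, dilation, ...):

```
import torch

def copy_literal_conv_args(original_conv: torch.fx.Node, new_conv: torch.fx.Node) -> None:
    # keep the new node's (input, weight, bias) but take the literal args such as
    # stride and padding from the original node, since the rewriter drops them
    new_conv.args = new_conv.args[:3] + original_conv.args[3:]
```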

Test Plan: python test/test_quantization.py TestQuantizePT2E.test_qat_conv_bn_fusion_literal_args

Reviewed By: jerryzh168

Differential Revision: D46383130

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103731
Approved by: https://github.com/jerryzh168
2023-06-16 17:15:37 +00:00
Andrew Or
dad29f906b [quant][pt2] Fix no conv bias in convert QAT (#103298)
Summary:
Previously, the QAT pattern for conv + bn with no conv
bias was not actually replaced in convert. This commit adds an
extra pattern in the convert path for this case and the numerics
now match FX's.

Test Plan: python test/test_quantization.py TestQuantizePT2E.test_prepare_qat_conv_bn_fusion_no_conv_bias

Reviewed By: jerryzh168

Differential Revision: D46382819

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103298
Approved by: https://github.com/jerryzh168
2023-06-16 01:59:48 +00:00
Kimish Patel
90ee6a7354 [PT2][Quant] Update op names for decomposed quantized lib (#103251)
Summary:
Dynamo trace, via dynamo.export with aten_graph, generates a graph whose nodes
have targets that are instances of torch._ops.OpOverload. The quantization workflow
inserts quantize/dequantize ops that are sometimes instances of
torch._ops.OpOverload (quantize_per_tensor.tensor) and other times instances
of torch._ops.OpOverloadPacket (quantize_per_tensor), which is a bit inconsistent.

It is also not clear whether a model is a valid exported model if it has nodes whose
targets are of type torch._ops.OpOverloadPacket.

Without the op overload name attached to the target, it fails during executorch
tracing, because executorch tracing expects nodes' targets to be
instances of torch._ops.OpOverload and not torch._ops.OpOverloadPacket.

So for consistency and tracing reasons, this fixes the convert pass to insert ops that
are torch._ops.OpOverload.
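
The distinction being relied on, illustrated with a standard aten op (general PyTorch behavior, not code from the PR):

```
import torch

packet = torch.ops.aten.add            # OpOverloadPacket: no specific overload chosen
overload = torch.ops.aten.add.Tensor   # OpOverload: a concrete, named overload

assert isinstance(packet, torch._ops.OpOverloadPacket)
assert isinstance(overload, torch._ops.OpOverload)
```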

Test Plan: CI

Reviewed By: jerryzh168

Differential Revision: D46342822

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103251
Approved by: https://github.com/andrewor14
2023-06-15 04:37:58 +00:00
Piotr Sebastian Kluska
b4056ba744 chore: Update ModelReportObserver variables to buffers (#97971)
This commit changes ModelReportObserver variables to buffers, similar to other observers. This allows gathering data on devices other than CPU.
Moreover, it updates InputWeightEqualizationDetector to compute weight stats that live on GPU.
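
A minimal sketch of why buffers matter here (illustrative attribute names, not the observer's real fields): state registered with register_buffer moves with `module.to(device)`, while plain tensor attributes do not.

```
import torch

class TinyObserver(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # buffers follow .to(device) / .cuda() along with the rest of the module
        self.register_buffer("num_batches", torch.tensor(0))
        self.register_buffer("average_value", torch.tensor(0.0))

    def forward(self, x):
        self.num_batches += 1
        self.average_value += (x.detach().mean() - self.average_value) / self.num_batches
        return x
```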

Tested with running tests `test/quantization/fx/test_model_report_fx.py`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97971
Approved by: https://github.com/vkuzo
2023-06-15 03:15:41 +00:00
Kimish Patel
49dcf48e66 [PT2][Quant] Change qat conv bn fusion code (#103556)
Summary:
Dynamo burns scalars into the graph instead of keeping them on the module. This results in
quantize_per_tensor and dequantize_per_tensor nodes having burnt-in scale and
zero point values if we trace with them as scalars.

Graph rewrite ignores literals, so when the match pattern is replaced with the
replacement pattern, we lose the scale/zp and other values from nodes in the
original graph and instead get the ones from the replacement graph.

This diff fixes that for q/dq per tensor nodes by manually copying these values
over.

Note that this is not robust because it works only when there is only a single
q/dq node

Test Plan: quantization_pt2e

Reviewed By: andrewor14

Differential Revision: D46614000

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103556
Approved by: https://github.com/andrewor14
2023-06-14 18:37:43 +00:00
Jerry Zhang
0cd155b042 [reland][quant][pt2e] Annotate GRU module (#103358) (#103526)
Summary:

att, we use module partition API to identify the GRU submodule and annotate all necessary patterns

Test Plan: buck2 test mode/opt caffe2/test:quantization_pt2e -- 'caffe2/test:quantization_pt2e'

Differential Revision: D46689428

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103526
Approved by: https://github.com/andrewor14
2023-06-13 23:43:10 +00:00
PyTorch MergeBot
13777e3391 Revert "[quant][pt2e] Annotate GRU module (#103358)"
This reverts commit 23892d8ee4.

Reverted https://github.com/pytorch/pytorch/pull/103358 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/103358#issuecomment-1588729657))
2023-06-13 07:45:40 +00:00
Jerry Zhang
23892d8ee4 [quant][pt2e] Annotate GRU module (#103358)
Summary: att, we use module partition API to identify the GRU submodule and annotate all necessary patterns

Test Plan: buck2 test mode/opt caffe2/test:quantization_pt2e -- 'caffe2/test:quantization_pt2e'

Reviewed By: kimishpatel

Differential Revision: D46384329

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103358
Approved by: https://github.com/HDCharles
2023-06-13 04:10:13 +00:00
Yash Vardhan
6ed3c4499a Fix fuse_custom_config_dict arg from being None (#102154)
`fuse_custom_config_dict` in [fuse_modules.py](https://github.com/pytorch/pytorch/blob/main/torch/ao/quantization/fuse_modules.py#L164) is being passed as None even if a fuse_custom_config_dict is provided.

This patch fixes the `fuse_custom_config_dict` from being passed as None.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102154
Approved by: https://github.com/kit1980
2023-06-13 03:45:20 +00:00
maxren
f37be77813 [Quant][XNNPACK] Delegate add_relu fusion (#103266)
Quantized Resnet currently sees fused add-relu
```
--> dq
       \
        add --> relu --> quant
       /
--> dq
```

Let us support this fusion in the delegate, as XNNPACK can use the output_min and output_max of the op nodes to clamp the values and perform a fused add-relu operation.

Differential Revision: [D45258028](https://our.internmc.facebook.com/intern/diff/D45258028/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103266
Approved by: https://github.com/jerryzh168
2023-06-12 04:35:29 +00:00
Andrew Or
89d57f269f [quant][pt2] Fix convert in Conv + BN + ReLU QAT fusion (#102993)
Summary:
Previously, the QAT pattern for conv + bn + relu was
not actually replaced in convert. This is because the quantized
QAT pattern used in convert doesn't actually have a relu node.
This commit adds this extra pattern in the convert path and
the numerics now match FX's.

Test Plan: python test/test_quantization.py TestQuantizePT2E.test_qat_conv_bn_relu_numerics

Reviewed By: jerryzh168

Differential Revision: D46372411

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102993
Approved by: https://github.com/jerryzh168
2023-06-08 22:10:29 +00:00
Kimish Patel
a49aefdce2 [PT2][Quant] In linear partition include functional.linear (#103186)
Summary: as title

Test Plan: tested in subsequent diff

Reviewed By: jerryzh168

Differential Revision: D46342824

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103186
Approved by: https://github.com/jerryzh168
2023-06-08 09:48:09 +00:00
Kimish Patel
471407cf78 [PT2][Quant] Use composble quantizer for embedding + static conv + dynamic (#103116)
Summary:
In this diff we test a module that a) does an embedding lookup, b) runs a 1D
(converted to 2D) conv, and c) runs linear on the output of the 1D conv.

a is quantized using embedding quantizer.
c is quantized using dynamic quantization.
b is quantized using static quantization.

We compose a quantizer from [a, c, b] and test it against a similar fx config.

Test Plan: test_embedding_conv_linear_quantization

Reviewed By: jerryzh168

Differential Revision: D46267688

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103116
Approved by: https://github.com/jerryzh168
2023-06-07 17:34:59 +00:00
Kimish Patel
8e0837cf84 [PT2][Quant] Move embedding quantization to osss (#103088)
Summary:
This is in preparation for enabling embedding quantization on models with
embeddings.

Test Plan: test_embedding_quantizer

Reviewed By: jerryzh168

Differential Revision: D46267689

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103088
Approved by: https://github.com/andrewor14
2023-06-06 23:07:57 +00:00
Xuan Xie
6261055471 dst_bin_of_end_center is defined twice (#102755)
(line 995 and line 1011)
Both definitions are the same.
Delete one of them.

Fixes

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102755
Approved by: https://github.com/janeyx99
2023-06-06 21:17:07 +00:00
Kimish Patel
8824101fb6 [PT2][Quant] Introduce composable quantizer (#102846)
Summary:
Using the composable quantizer, we can now compose two or more quantizers. In
the test here we compose a quantizer configured for dynamic linear quantization
with a quantizer configured for static quantization.

Note that the composable quantizer has a strict order in which annotations are
applied.
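
A hedged usage sketch; the import paths and names moved around during this work, and the ones below reflect the current public quantizer modules, so treat them as assumptions:

```
from torch.ao.quantization.quantizer.composable_quantizer import ComposableQuantizer
from torch.ao.quantization.quantizer.xnnpack_quantizer import (
    XNNPACKQuantizer,
    get_symmetric_quantization_config,
)

dynamic_linear_quantizer = XNNPACKQuantizer().set_global(
    get_symmetric_quantization_config(is_dynamic=True)
)
static_quantizer = XNNPACKQuantizer().set_global(get_symmetric_quantization_config())

# the composed quantizers apply their annotations in a strict order
composed = ComposableQuantizer([dynamic_linear_quantizer, static_quantizer])
```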

Test Plan: test_composable_quantizer*

Reviewed By: jerryzh168

Differential Revision: D46267690

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102846
Approved by: https://github.com/andrewor14
2023-06-06 14:01:55 +00:00
Jerry Zhang
5fbbae4283 [quant][pt2e][be] Cleanup prepare function in _pt2e (#103022)
Summary: att

Test Plan:
```
buck2 test mode/opt caffe2/test:quantization_pt2e -- 'caffe2/test:quantization_pt2e'
```

Differential Revision: D46346087

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103022
Approved by: https://github.com/andrewor14
2023-06-06 04:33:05 +00:00
Andrew Or
604a414bfc [quant][pt2] Fix convert in Conv + BN QAT fusion (#102224)
Summary:
Previously, the test for the convert flow in Conv + BN
QAT fusion was mistakenly not enabled. However, re-enabling this
test uncovered several bugs:

(1) The replaced nodes returned by subgraph rewriter were not
handled correctly. This is because a recent change in the subgraph
rewriter (#100556) fixed only the prepare case but not the convert
case. This commit brings this fix to the convert case as well and
deduplicates some code between the two cases.

(2) When folding BN into conv, we used the wrong arg index to get
the BN eps value. This resulted in an incorrect conv weight.

(3) In FX, we currently do a hack for weighted modules where we
observe the weights once in convert in order to ensure we get the
right shapes for these weight observers. This caused the numerics
to diverge between PT2 and FX. This commit fixes this by skipping
this unnecessary hack for `_convert_to_reference_decomposed_fx`.

(4) Per channel support was simply missing. This commit adds
support for this by matching the quantize_per_channel and
dequantize_per_channel ops in addition to the existing ones.
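
A hedged sketch of the standard conv + BN folding math referenced in (2) above (not the PR's exact helper):

```
import torch

def fold_bn_into_conv(conv_w, conv_b, bn_rm, bn_rv, bn_w, bn_b, bn_eps):
    # scale conv weights by gamma / sqrt(running_var + eps) and fold the BN shift
    # into the conv bias
    scale = bn_w / torch.sqrt(bn_rv + bn_eps)
    folded_w = conv_w * scale.reshape(-1, 1, 1, 1)
    folded_b = (conv_b - bn_rm) * scale + bn_b
    return folded_w, folded_b
```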

Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_qat_conv_bn_numerics

Reviewed By: jerryzh168

Differential Revision: D46097783

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102224
Approved by: https://github.com/jerryzh168
2023-06-05 18:09:28 +00:00
Jerry Zhang
eb0971cfe9 [quant][pt2e][be] Remove _input_output_share_observers and _reuse_input_obs_or_fq from QuantizationAnnotation (#102854)
Summary:
att, after we support SharedQuantizationSpec we don't need these things anymore; this PR refactors the
uses of _input_output_share_observers to SharedQuantizationSpec.

Test Plan:
```
buck2 test mode/opt caffe2/test:quantization_pt2e -- 'caffe2/test:quantization_pt2e'
```

Reviewed By: andrewor14

Differential Revision: D46301342

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102854
Approved by: https://github.com/andrewor14
2023-06-03 07:31:09 +00:00
Kimish Patel
a53acafd2b [PT2][Quant] Enable dynamic quantization (#102703)
Enable dynamic quantization of linear layers.

Differential Revision: [D46235070](https://our.internmc.facebook.com/intern/diff/D46235070/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102703
Approved by: https://github.com/andrewor14
2023-06-02 17:52:14 +00:00
Kimish Patel
2301b624ae [PT2][Quant] Update quconfig to contain input/qoutput activation qspec (#102702)
As title

Differential Revision: [D46342823](https://our.internmc.facebook.com/intern/diff/D46342823/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102702
Approved by: https://github.com/andrewor14
2023-06-02 17:41:46 +00:00
Kimish Patel
6492b7d22e [PT2][Quant][BE] Refactor qnnpack_quantizer.py (#102701)
This diff refactors the annotate functions so as to couple them with the
corresponding quantization configs that they support. This will help with dynamic
quantization, which is only supported for linear layers.

Differential Revision: [D46235071](https://our.internmc.facebook.com/intern/diff/D46235071/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102701
Approved by: https://github.com/jerryzh168
2023-06-02 17:14:56 +00:00
Jerry Zhang
ce8d31551b [quant][be] Change return type for zero_point to be int32 Tensor (#102234)
Summary: This is probably a typo

Test Plan: CI

Reviewed By: salilsdesai

Differential Revision: D46172706

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102234
Approved by: https://github.com/salilsdesai
2023-06-01 18:30:44 +00:00
Jerry Zhang
d930bfc419 [quant][pt2e][be] Add QuantizationSpecBase (#102582)
Summary:
Make all quantization specs inherit from the same base class in order to simplify the typing
for QuantizationAnnotation.

Test Plan:
```
buck2 test mode/opt caffe2/test:quantization_pt2e -- 'caffe2/test:quantization_pt2e'
```

Reviewed By: kimishpatel

Differential Revision: D46173954

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102582
Approved by: https://github.com/andrewor14
2023-06-01 17:55:22 +00:00
Jerry Zhang
f14ac74fce [quant][pt2e] Add support for FixedQParamsQuantizationSpec (#102439)
Summary:
This PR adds support for FixedQParamsQuantizationSpec:

```
@dataclass(eq=True, frozen=True)
class FixedQParamsQuantizationSpec(QuantizationSpecBase):
    dtype: torch.dtype
    scale: float
    zero_point: int
    quant_min: Optional[int] = None
    quant_max: Optional[int] = None
    qscheme: Optional[torch.qscheme] = None
```

This is useful for defining the quantization spec for operators like sigmoid, which have predefined, fixed scale/zero_point values.
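
A hedged example using the dataclass sketched above; the values shown are the commonly used uint8 sigmoid qparams and are given only as an illustration:

```
import torch

sigmoid_qspec = FixedQParamsQuantizationSpec(
    dtype=torch.uint8,
    scale=1.0 / 256.0,
    zero_point=0,
    quant_min=0,
    quant_max=255,
    qscheme=torch.per_tensor_affine,
)
```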

Test Plan:
```
buck2 test mode/opt caffe2/test:quantization_pt2e -- 'caffe2/test:quantization_pt2e'
buck2 test mode/opt caffe2/test:quantization_pt2e -- --exact 'caffe2/test:quantization_pt2e - test_fixed_qparams_qspec (quantization.pt2e.test_quantize_pt2e.TestQuantizePT2E)'
```

Reviewed By: kimishpatel

Differential Revision: D46153082

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102439
Approved by: https://github.com/kimishpatel
2023-05-30 21:28:13 +00:00
Kimish Patel
af70fe9f3e [PT2][Quant] Enable test_qnnpack_quantizer_conv_linear test (#102399)
Earlier this test was disabled because pattern matching was not working correctly.
Enabling this test now since we moved to module partitioner based matching.

Differential Revision: [D46130722](https://our.internmc.facebook.com/intern/diff/D46130722/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102399
Approved by: https://github.com/jerryzh168
2023-05-28 06:44:16 +00:00
Kimish Patel
0d876f7d43 [PT2][Quant] Move observer sharing ops to use module partitions (#102398)
As title

Differential Revision: [D46095331](https://our.internmc.facebook.com/intern/diff/D46095331/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102398
Approved by: https://github.com/jerryzh168
2023-05-28 05:50:15 +00:00
Kimish Patel
9fac5afbcc [PT2][Quant] Move add/add relu pattern via module partitioner (#102397)
This diff uses module partitioners to find add and add + relu patterns.

Differential Revision: [D46095330](https://our.internmc.facebook.com/intern/diff/D46095330/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102397
Approved by: https://github.com/jerryzh168
2023-05-28 05:47:43 +00:00
Kimish Patel
3d8f405022 [PT2][Quant] Move maxpool_2d quant to use module partitioners (#102396)
As summary

Differential Revision: [D46095332](https://our.internmc.facebook.com/intern/diff/D46095332/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102396
Approved by: https://github.com/jerryzh168
2023-05-28 05:44:54 +00:00
Kimish Patel
d997e3aac6 [PT2][Quant] Use module partitions for conv2d and conv2d + relu (#102395)
In this diff we continue to use source partitions for identifying node patterns
to annotate. Here we expand the use case to conv2d+relu and conv2d.

Differential Revision: [D46095329](https://our.internmc.facebook.com/intern/diff/D46095329/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102395
Approved by: https://github.com/jerryzh168
2023-05-28 05:40:45 +00:00
Kimish Patel
4cb6add471 [PT2][Quant] Use module partition for fused patterns (#102394)
This diff introduces the utility `find_sequential_partitions`.
This utility allows one to specify a sequential pattern of
nn.Module/nn.functional and returns a list. Each item in the list contains a
List[SourcePartition] that represents sequentially connected partitions
of the requested pattern.
For example `find_sequential_partitions(model, [nn.Conv2d, nn.ReLU])` will find
all nn.Conv2d and nn.ReLU partitions that are sequentially connected.

Furthermore, move to using `find_sequential_partitions` for conv_bn/conv_bn_relu
for QAT.
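
A hedged usage sketch; the module path for `find_sequential_partitions` moved during this work, so the import below reflects where current releases expose it and should be treated as an assumption:

```
import torch.nn as nn
from torch.ao.quantization.pt2e.graph_utils import find_sequential_partitions

def annotate_conv_relu(gm):
    # each item is a pair of sequentially connected [conv_partition, relu_partition]
    for conv_partition, relu_partition in find_sequential_partitions(gm, [nn.Conv2d, nn.ReLU]):
        conv_node = conv_partition.output_nodes[0]
        relu_node = relu_partition.output_nodes[0]
        # annotate conv_node inputs and relu_node output here
```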

Differential Revision: [D45948057](https://our.internmc.facebook.com/intern/diff/D45948057/)

**NOTE FOR REVIEWERS**: This PR has internal Meta-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D45948057/)!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102394
Approved by: https://github.com/jerryzh168
2023-05-28 05:29:16 +00:00
Jerry Zhang
eda5abf5e0 [quant][pt2e] Fix propagate_annotation after recent refactors (#102422)
Summary:
Recently we changed the annotation from "target_dtype_info" to "quantization_annotation" and introduced the QuantizationAnnotation API
and the SharedQuantizationSpec API for users to convey sharing between inputs/outputs. This PR updates the _propagate_annotation
pass to accommodate the recent changes.

Test Plan:
```
buck2 test mode/opt caffe2/test:quantization_pt2e -- 'caffe2/test:quantization_pt2e'
```

Reviewed By: kimishpatel

Differential Revision: D46153084

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102422
Approved by: https://github.com/kimishpatel
2023-05-27 16:01:47 +00:00
Jerry Zhang
23223402eb [quant][pt2e] Add Support for DerivedQuantizationSpec (#102282)
Summary:
```
"""
4. DerivedQuantizationSpec
this is the quantization spec for the Tensors whose quantization parameters are derived from other Tensors
"""

class DerivedQuantizationSpec(QuantizationSpecBase):
    # specifies which Tensors the quantization parameters are derived from
    # this can either be an edge from argument to node, or a node
    derived_from: List[EdgeOrNode]
    derive_qparams_fn: Callable[[List[ObserverOrFakeQuantize]], Tuple[Tensor, Tensor]]
     ...
```

Test Plan:
```
buck2 test mode/opt caffe2/test:quantization_pt2e -- 'caffe2/test:quantization_pt2e'
buck2 test mode/opt caffe2/test:quantization_pt2e -- --exact 'caffe2/test:quantization_pt2e - test_resnet18_with_quantizer_api (quantization.pt2e.test_quantize_pt2e.TestQuantizePT2EModels)'
```

Reviewed By: kimishpatel

Differential Revision: D46097855

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102282
Approved by: https://github.com/andrewor14
2023-05-27 00:24:39 +00:00
Jerry Zhang
ed87508b32 [quant][pt2e] Add support for SharedQuantizationSpec (#102184)
Summary:
This PR adds support for SharedQuantizationSpec. It's used to express sharing between
two Tensors in the prepared graph; each Tensor will either be an input of some node (expressed as a tuple of fx nodes) or
the output of some node (expressed as an fx Node).

Test Plan:
```
buck2 test mode/opt caffe2/test:quantization_pt2e -- 'caffe2/test:quantization_pt2e'
buck2 test mode/opt caffe2/test:quantization_pt2e -- --exact 'caffe2/test:quantization_pt2e - test_resnet18_with_quantizer_api (quantization.pt2e.test_quantize_pt2e.TestQuantizePT2EModels)'
```

Differential Revision: D46043026

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102184
Approved by: https://github.com/kimishpatel, https://github.com/leslie-fang-intel
2023-05-25 17:31:59 +00:00
Riley Dulin
424c930f76 Add quantization lowering for nn.PixelShuffle and nn.PixelUnshuffle (#101926)
Similar to https://github.com/pytorch/pytorch/pull/96160 but for the modules
nn.PixelShuffle and nn.PixelUnshuffle.

torch.nn.PixelUnshuffle accepts both float and quantized inputs.
However, previously we would unnecessarily dequantize quantized inputs into floats
before passing them to the function. This commit fixes this by lowering the patterns
[dequant - PixelShuffle - quant] and
[dequant - PixelUnshuffle - quant].

Test Plan:

python test/test_quantization.py TestQuantizeFxOps.test_pixel_shuffle_module
python test/test_quantization.py TestQuantizeFxOps.test_pixel_unshuffle_module

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101926
Approved by: https://github.com/jerryzh168
2023-05-24 19:33:26 +00:00
Jerry Zhang
3baa67caee [quant][pt2e][be] Move annotate helper function to quantizer/utils.py (#102127)
Summary: att

Test Plan:
```
buck2 test mode/opt caffe2/test:quantization_pt2e -- --exact 'caffe2/test:quantization_pt2e - test_resnet18_with_quantizer_api (quantization.pt2e.test_quantize_pt2e.TestQuantizePT2EModels)'
```

Reviewed By: kimishpatel

Differential Revision: D46001285

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102127
Approved by: https://github.com/kimishpatel
2023-05-24 16:13:28 +00:00
Matthew Hoffman
29da75cc55 Enable mypy allow redefinition (#102046)
Related #101528

I tried to enable this in another PR but it uncovered a bunch of type errors: https://github.com/pytorch/pytorch/actions/runs/4999748262/jobs/8956555243?pr=101528#step:10:1305

The goal of this PR is to fix these errors.

---

This PR enables [allow_redefinition = True](https://mypy.readthedocs.io/en/stable/config_file.html#confval-allow_redefinition) in `mypy.ini`, which allows for a common pattern:

> Allows variables to be redefined with an arbitrary type, as long as the redefinition is in the same block and nesting level as the original definition.

`allow_redefinition` allows mypy to be more flexible by allowing reassignment to an existing variable with a different type... for instance (from the linked PR):

4a1e9230ba/torch/nn/parallel/data_parallel.py (L213)

A `Sequence[Union[int, torch.device]]` is narrowed to `Sequence[int]` through reassignment to the same variable.
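
A small, self-contained illustration (not from the PR) of the pattern `allow_redefinition` permits:

```
from typing import List, Sequence, Union

def to_indices(devices: Sequence[Union[int, str]]) -> List[int]:
    # with allow_redefinition, mypy accepts rebinding `devices` here from
    # Sequence[Union[int, str]] to the narrower List[int]
    devices = [int(d) for d in devices]
    return devices
```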

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102046
Approved by: https://github.com/ezyang
2023-05-24 07:05:30 +00:00
Jerry Zhang
94ed26d177 [quant][pt2e] prepare_pt2e use quantization spec directly (#102054)
Summary:
In this PR we align with the design of the annotation API and use the quantization spec directly for annotation.
The main change is in prepare: we consume the quantization_spec object directly instead of the observer or fake quant constructor, and we create the constructor
inside prepare. After this PR, annotation API users only need to interact with the quantization spec object.

Test Plan:
```
buck2 test mode/opt caffe2/test:quantization_pt2e -- --exact 'caffe2/test:quantization_pt2e - test_resnet18_with_quantizer_api (quantization.pt2e.test_quantize_pt2e.TestQuantizePT2EModels)'
```

Reviewed By: kimishpatel

Differential Revision: D45934088

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102054
Approved by: https://github.com/kimishpatel
2023-05-23 23:25:56 +00:00
Jerry Zhang
f7c736e1e7 [quant][pt2e] Add observer_or_fake_quant_ctr to QuantizationSpec (#101920)
Summary:
This is the second refactor to align the annotation API with the design;
the next step is to change prepare_pt2e to consume the QuantizationSpec object directly.

Test Plan:
```
buck2 test mode/opt caffe2/test:quantization_pt2e -- --exact 'caffe2/test:quantization_pt2e - test_resnet18_with_quantizer_api (quantization.pt2e.test_quantize_pt2e.TestQuantizePT2EModels)'
```

Reviewed By: kimishpatel

Differential Revision: D45927416

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101920
Approved by: https://github.com/andrewor14
2023-05-23 05:48:23 +00:00
Jerry Zhang
15495f2d96 [quant][pt2e] Introduce QuantizationAnnotation API (#101708)
Summary:
This diff adds QuantizationAnnotation and also refactors the existing annotation to use this object

```
@dataclass
class QuantizationAnnotation:
  # How some input nodes should be quantized, expressed as QuantizationSpec
  # a map from torch.fx.Node to QuantizationSpec
  input_qspec_map: Dict[Node, QuantizationSpec]

  # How the output of this node is quantized, expressed as QuantizationSpec
  output_qspec: QuantizationSpec

class QuantizationSpec:
    dtype: torch.dtype
    is_dynamic: bool = False
    quant_min: Optional[int] = None
    quant_max: Optional[int] = None
    qscheme: Optional[torch.qscheme] = None
    ch_axis: Optional[int] = None
    # TODO: follow up PR will add this
    # Kind of observer such as MinMaxObserver, PerChannelHistogramObserver etc.
    # observer_or_fake_quant_type: Union[ObserverBase, FakeQuantizeBase]
```

Example after full refactor:

```
int8_qspec = QuantizationSpec(dtype=torch.int8, ...)
weight_qspec = QuantizationSpec(dtype=torch.int8, ...)
conv_node["quantization_annotation"] = QuantizationAnnotation(
    input_qspec_map={input_node: int8_qspec, weight_node: weight_qspec}
    output_qspec=int8_qspec,
)
```

Note: right now input_qspec_map and output_qspec map are still using observer and fake quant constructors.
Follow up PR: change the input_qspec_map and output_qspec to use QuantizationSpec directly

Test Plan:
```
buck2 test mode/opt caffe2/test:quantization_pt2e -- --exact 'caffe2/test:quantization_pt2e - test_resnet18_with_quantizer_api (quantization.pt2e.test_quantize_pt2e.TestQuantizePT2EModels)'
```

Differential Revision: D45895027

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101708
Approved by: https://github.com/andrewor14
2023-05-19 22:54:27 +00:00
Nitin Jain
556bb691fd [AO]Fix observed LSTM layer setup individually observed LSTM (#101299)
Summary: We have found that `_get_lstm_with_individually_observed_parts()` is missing a setup step which sets up the LSTM layer state by initializing the weights and biases of this layer. This diff fixes the numerical discrepancy observed by the CTRL team when using the above API.

Test Plan: N3358643

Differential Revision: D45821681

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101299
Approved by: https://github.com/andrewor14
2023-05-18 19:15:01 +00:00
andrewor14
8e51521cee [quant][pt2] Handle maxpool + conv + bn case in prepare QAT (#100941)
Summary: This commit fixes a bug where we copy the metadata from
the wrong node after replace_pattern. This happened in the case
of [maxpool -> getitem1 -> conv -> bn -> getitem2], where
`getitem1` is the placeholder node fed into the fused conv + bn
pattern, and we incorrectly copied the metadata from `getitem1`
instead of from `getitem2`. We fix this bug by filtering out
the placeholder nodes before doing the metadata copying.

Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_prepare_qat_conv_bn_fusion_getitem_placeholder

Reviewers: jerryzh168, kimishpatel

Differential Revision: [D45916751](https://our.internmc.facebook.com/intern/diff/D45916751)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100941
Approved by: https://github.com/jerryzh168
2023-05-17 17:36:32 +00:00
Kimish Patel
07e759eca2 [PT2][Quant] Move to module partitioner for linear pattern quantization (#101122)
Subgraph matcher is somewhat unreliable as the pattern can vary depending on
the dimensionality of input tensor used to trace _and_ what appears before
linear

Differential Revision: [D45713915](https://our.internmc.facebook.com/intern/diff/D45713915/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101122
Approved by: https://github.com/jerryzh168
2023-05-17 15:47:08 +00:00
Kimish Patel
2c807a4acf [PT2][Quant] Remove None annotations (#101120)
None annotations are not needed anymore. Remove them.

Differential Revision: [D45713917](https://our.internmc.facebook.com/intern/diff/D45713917/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101120
Approved by: https://github.com/jerryzh168
2023-05-17 14:38:34 +00:00
Angela Yi
9e023e1818 [fx] Better replacements finder in subgraph rewriter (#100556)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100556
Approved by: https://github.com/mcr229
2023-05-16 14:08:44 +00:00
andrewor14
964e61ee95 [quant][pt2] Handle no conv bias in prepare QAT fusion (#100610)
Summary: This commit adds support for conv + BN fusion for the
case where conv has no bias. Since the replacement patterns with
and without conv bias are substantially different, we perform the
replacement for each of these two cases separately.

Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_prepare_qat_conv_bn_fusion_no_conv_bias

Reviewers: jerryzh168, kimishpatel

Differential Revision: [D45743510](https://our.internmc.facebook.com/intern/diff/D45743510)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100610
Approved by: https://github.com/jerryzh168
2023-05-16 04:05:53 +00:00
PyTorch MergeBot
13056ca229 Revert "[fx] Better replacements finder in subgraph rewriter (#100556)"
This reverts commit 9842d1ef94.

Reverted https://github.com/pytorch/pytorch/pull/100556 on behalf of https://github.com/izaitsevfb due to Reverting temporarily to unblock diff train, see D45743510 and #100610 ([comment](https://github.com/pytorch/pytorch/pull/100556#issuecomment-1548934932))
2023-05-16 03:50:06 +00:00
Angela Yi
9842d1ef94 [fx] Better replacements finder in subgraph rewriter (#100556)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100556
Approved by: https://github.com/mcr229
2023-05-15 20:00:59 +00:00
andrewor14
4434b9af6a [quant][pt2] Handle constant conv args in prepare QAT fusion (#100525)
Summary: Previously, we would only match and replace conv + BN
patterns with default constant args for conv (stride, padding,
dilation etc.). If the user sets one of these args to values
that are different from the default, we would simply not fuse
the pattern. This is due to a limitation in the subgraph
rewriter: see https://github.com/pytorch/pytorch/issues/100419.

This commit works around the above limitation by first
configuring the subgraph rewriter to ignore literals when
matching, and then manually copy over the constant args to the
new subgraph after `replace_pattern`.

Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_prepare_qat_conv_bn_fusion_constant_args

Reviewers: jerryzh168, kimishpatel

Differential Revision: [D45515437](https://our.internmc.facebook.com/intern/diff/D45515437)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100525
Approved by: https://github.com/jerryzh168
2023-05-12 19:15:47 +00:00
leslie-fang-intel
a66de845de [Quant][PT2E]Fix pt2e quantization maxpool input observer issue (#100961)
**Summary**
Fix the issue https://github.com/pytorch/pytorch/issues/100959. The root cause is that a node of `torch.ops.aten.max_pool2d_with_indices.default` has 2 outputs: the output tensor and the max indices. So its `node.meta["val"]` is a tuple of `FakeTensors` (for example: `'val': (FakeTensor(..., size=(1, 2, s1, s1)), FakeTensor(..., size=(1, 2, s1, s1), dtype=torch.int64))`). This fails the observer insertion check, which only accepts the single-`FakeTensor` case.

**Test Plan**
```
python -m pytest test_quantize_pt2e.py -k test_max_pool2d_quantizer
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100961
Approved by: https://github.com/jerryzh168, https://github.com/jgong5
2023-05-11 06:14:34 +00:00
Jerry Zhang
058d740f59 [reland][quant][pt2e] Change input act annotation to a map and allow dynamic quantization for non zeroth argument (#101005) (#101041)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101005

Previously the node annotation looked like the following:
```
node.meta["..."] = {
    "input_act_obs_or_fq_ctr": ...,
    "weight_obs_or_fq_ctr": ...,
    "weight_index": 1,
}
```
Basically we needed to specify the index for the weight and also have a separate key for the weight config; in this PR we changed that to:
```
node.meta["..."] = {
    "input_act_obs_or_fq_ctr_map": {input_node: ..., weight_node: ...},
}
```
This can support specifying the observer/fake quant constructor for any argument of the node

Test Plan: buck2 test @//mode/opt //caffe2/test:quantization_pt2e -- --exact 'caffe2/test:quantization_pt2e - test_resnet18_with_quantizer_api (quantization.pt2e.test_quantize_pt2e.TestQuantizePT2EModels)'

Differential Revision: D45719781

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101041
Approved by: https://github.com/andrewor14
2023-05-10 17:43:21 +00:00
PyTorch MergeBot
2241aaa60c Revert "[quant][pt2e] Change input act annotation to a map and allow dynamic quantization for non zeroth argument (#101005)"
This reverts commit f08ddae888.

Reverted https://github.com/pytorch/pytorch/pull/101005 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/101005#issuecomment-1541143426))
2023-05-10 01:27:47 +00:00
Jerry Zhang
f08ddae888 [quant][pt2e] Change input act annotation to a map and allow dynamic quantization for non zeroth argument (#101005)
Summary:
Previously the node annotation looked like the following:
```
node.meta["..."] = {
    "input_act_obs_or_fq_ctr": ...,
    "weight_obs_or_fq_ctr": ...,
    "weight_index": 1,
}
```
Basically we needed to specify the index for the weight and also have a separate key for the weight config; in this PR we changed that to:
```
node.meta["..."] = {
    "input_act_obs_or_fq_ctr_map": {input_node: ..., weight_node: ...},
}
```
This can support specifying the observer/fake quant constructor for any argument of the node

Test Plan: buck2 test @//mode/opt //caffe2/test:quantization_pt2e -- --exact 'caffe2/test:quantization_pt2e - test_resnet18_with_quantizer_api (quantization.pt2e.test_quantize_pt2e.TestQuantizePT2EModels)'

Reviewed By: kimishpatel

Differential Revision: D45553195

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101005
Approved by: https://github.com/kimishpatel
2023-05-10 00:42:25 +00:00
Jerry Zhang
c3f3cb5b0f [quant][pt2e] Support conv bn fusion in convert step for QAT flow (#100442)
Summary:
This PR adds support for folding bn weights into conv for the QAT flow. This is equivalent
to the QAT branch of `from_float` in the eager mode quantized conv module: https://github.com/pytorch/pytorch/blob/main/torch/ao/nn/quantized/modules/conv.py#L223

Items that need follow-up:
* there are some workarounds here because quantize_per_tensor uses float/int args and dynamo does not support these args; this needs to be fixed after we change the quantized model representation and also change these args to Tensor

Test Plan: buck2 test @//mode/opt //caffe2/test:quantization_pt2e -- --exact 'caffe2/test:quantization_pt2e - test_convert_qat_conv_bn_fusion (quantization.pt2e.test_quantize_pt2e.TestQuantizePT2E)'

Reviewed By: andrewor14

Differential Revision: D45344281

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100442
Approved by: https://github.com/kimishpatel
2023-05-09 19:43:51 +00:00
Aaron Gokaslan
8769fb854d [BE] Fix flake8 B027 errors - missing abstractmethod decorator (#100715)
Enables B027 and applies fixes by adding abstract method decorators. Autofix generated by ruff master.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100715
Approved by: https://github.com/ezyang
2023-05-09 17:28:48 +00:00
andrewor14
4154c8ea15 [quant][pt2] Add Conv + BN + ReLU fusion for prepare QAT (#100283)
Summary: This follows https://github.com/pytorch/pytorch/pull/98568,
which lays all the groundwork for Conv + BN fusion in prepare QAT.
Conv + BN + ReLU fusion can reuse the same match and replace
patterns and is handled similarly.

Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_prepare_qat_conv_bn_relu_fusion
python test/test_quantization.py TestQuantizePT2E.test_prepare_qat_conv_bn_relu_numerics

Reviewers: kimishpatel, jerryzh168

Differential Revision: [D45515494](https://our.internmc.facebook.com/intern/diff/D45515494)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100283
Approved by: https://github.com/jerryzh168
2023-05-07 20:35:16 +00:00
Danni Li
4a90deb137 [Doc] Add GRU new gate calculation difference (#100646)
Summary: Add a note for the calculation difference of GRU new gate `n_t` between PyTorch and original paper.

Fix: #99531

Test Plan: Please see GitHub pipelines.

Differential Revision: D45579790

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100646
Approved by: https://github.com/mikaylagawarecki
2023-05-05 22:18:54 +00:00
Kimish Patel
24e9b8f5f4 [PT2E][Quant] Use subgraph matcher annotate linear pattern (#100566)
This diff adds a subgraph matcher for pattern matching. Furthermore, we also move
annotations for the matched subgraph such that only the input and output nodes
of the matched subgraph have valid quantization-related annotations.

Differential Revision: [D45535539](https://our.internmc.facebook.com/intern/diff/D45535539/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100566
Approved by: https://github.com/jerryzh168
2023-05-04 21:31:59 +00:00
Richard Barnes
6370ac0251 [codemod] Replace hasattr with getattr in caffe2/torch/ao/quantization/stubs.py (#100597)
Summary:
The pattern
```
X.Y if hasattr(X, "Y") else Z
```
can be replaced with
```
getattr(X, "Y", Z)
```

The [getattr](https://www.w3schools.com/python/ref_func_getattr.asp) function gives more succinct code than the [hasattr](https://www.w3schools.com/python/ref_func_hasattr.asp) function. Please use it when appropriate.

**This diff is very low risk. Green tests indicate that you can safely Accept & Ship.**

Test Plan: Sandcastle

Reviewed By: vkuzo

Differential Revision: D44886422

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100597
Approved by: https://github.com/Skylion007
2023-05-04 16:36:23 +00:00
Richard Barnes
6120c5842c [codemod] Replace hasattr with getattr in caffe2/torch/ao/quantization/utils.py (#100361)
Summary:
The pattern
```
X.Y if hasattr(X, "Y") else Z
```
can be replaced with
```
getattr(X, "Y", Z)
```

The [getattr](https://www.w3schools.com/python/ref_func_getattr.asp) function gives more succinct code than the [hasattr](https://www.w3schools.com/python/ref_func_hasattr.asp) function. Please use it when appropriate.

**This diff is very low risk. Green tests indicate that you can safely Accept & Ship.**

Test Plan: Sandcastle

Reviewed By: jerryzh168

Differential Revision: D44886493

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100361
Approved by: https://github.com/Skylion007
2023-05-04 14:46:38 +00:00
Kimish Patel
771a9debbe [PT2E][Quant] Refactor quantizer and qnnpack qantizer code to support dqlinear config (#99399)
This diff introduces a few refactors:

- Move observer creation to utils.py.
- Use quantization spec to supply args to observers.
- Use annotation function registration corresponding to QuantizationConfig. This
  will be later used in dynamic quantized linear.

Differential Revision: [D45073790](https://our.internmc.facebook.com/intern/diff/D45073790/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99399
Approved by: https://github.com/jerryzh168
2023-05-03 03:23:32 +00:00
Kimish Patel
8ec0a939a2 [PT2E][Quant] Fix bug in quant spec of symmetric static quant (#99398)
Activation quant spec should have qscheme = per_tensor_affine
Weights quant spec should have ch_axis=0 for per_channel_symmetric

Differential Revision: [D45073789](https://our.internmc.facebook.com/intern/diff/D45073789/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99398
Approved by: https://github.com/jerryzh168
2023-05-03 00:36:03 +00:00
Max Ren
151d76cc23 [quant][pt2e] remove dropout from fx quant
Differential Revision: D45250152

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99935
2023-04-27 11:22:41 -07:00
andrewor14
6c550bb4d5 [quant][be] Easier way to override default in QConfigMapping (#99888)
Summary: This commit adds a private helper function to override
the default QConfig in the default QConfigMapping. Previously we
needed to override all the object_types manually while skipping
the fixed qparams ops. This led to duplicate code every time
someone wanted a new default QConfig. After this commit, we can
just call the same helper function instead.

Test Plan:
python test/test_quantization.py TestQuantizeFx

Reviewers: jerryzh168, vkuzo
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99888
Approved by: https://github.com/vkuzo, https://github.com/jerryzh168
2023-04-26 18:14:01 +00:00
Jerry Zhang
df3455b716 [reland][quant][pt2e][refactor] Cleanup the logic for deciding whether to insert observer/fq or not (#99220) (#99767)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99220

Previously we had two places where we needed to decide whether to insert an observer or fake quantizer:
(1) the input arguments of a node and (2) the output of a node, and right now we have separate code to do this.
In this PR, the logic is unified in the `_needs_obs_or_fq` helper function that takes the target_dtype and is_dynamic from the previous output
and the target_dtype and is_dynamic for the current Tensor we are looking at.

let's use an example for conv node:
```
conv = convolution(input, weight, bias, ...)
```

let's say we have `input_node` object for argument `input`, and `conv_node` for `conv` node in the graph

(1) input arguments, e.g. `input`:
the target_dtype/is_dynamic from the previous output comes from the node that produces `input`; we get this from
input_node.meta["target_dtype_info"]["output_act_obs_or_fq"]

the target_dtype/is_dynamic for the current argument `input` comes from conv_node.meta["target_dtype_info"]["input_act_obs_or_fq"];
similarly for the weight it comes from conv_node.meta["target_dtype_info"]["weight_obs_or_fq"], etc.

(2) output for conv node
the target_dtype/is_dynamic from the previous output will be the floating point output from the fp32 convolution operator, so it
is hardcoded to be (torch.float, False). Technically we should get this from node.meta["val"], but since the
current code base is shared by fx graph mode quantization and pytorch 2.0 export quantization, we cannot do that; we can revisit this
after we decide to deprecate fx graph mode quantization.

the target_dtype/is_dynamic for the current output comes from conv_node.meta["target_dtype_info"]["output_act_obs_or_fq"]

there is one caveat here about dynamic quantization, that is explained in the comment, so I won't repeat here

Note: also fixed some places in `_get_arg_target_dtype_as_input_to_node` and `_get_arg_target_is_dynamic_as_input_to_node` to make sure "not specified" == specifying a fp32 placeholder observer as well

Next: we can merge the two get target dtype and get is_dynamic function to reduce code duplication
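
A minimal sketch of the unified check described above, with the argument names taken from the summary; this is a simplification for illustration, not the actual helper (which also handles observer reuse and other edge cases):

```python
import torch

# Simplified sketch: decide whether an observer/fake-quant is needed by
# comparing what the producer already emits with what the current tensor
# is annotated to need.
def _needs_obs_or_fq(prev_dtype, prev_is_dynamic, cur_dtype, cur_is_dynamic) -> bool:
    # Nothing to observe if the current tensor is meant to stay floating point.
    if cur_dtype in (None, torch.float32):
        return False
    # Dynamic quantization computes qparams at runtime, so it always needs its
    # own observer/fake-quant on the input, regardless of the producer.
    if cur_is_dynamic:
        return True
    # Static quantization only needs a new observer when the dtype or the
    # dynamic-ness actually changes between producer and consumer.
    return (cur_dtype, cur_is_dynamic) != (prev_dtype, prev_is_dynamic)
```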

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
python test/test_quantization.py TestQuantizeFxModels
python test/test_quantization.py TestQuantizePT2E
python test/test_quantization.py TestQuantizePT2EModels

Imported from OSS

Differential Revision: D45198323

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99767
Approved by: https://github.com/kimishpatel
2023-04-25 16:53:02 +00:00
Aaron Gokaslan
e2a3817dfd [BE] Enable C419 rule for any all shortcircuiting (#99890)
Apparently https://github.com/pytorch/pytorch/pull/78142 made torch.JIT allow for simple generator expressions which allows us to enable rules that replace unnecessary list comprehensions with generators in any/all. This was originally part of #99280 but I split it off into this PR so that it can be easily reverted should anything break.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99890
Approved by: https://github.com/justinchuby, https://github.com/kit1980, https://github.com/malfet
2023-04-25 15:02:13 +00:00
PyTorch MergeBot
c83e1f517d Revert "Delete tracing_mode argument to export (#99555)"
This reverts commit e9786149ab.

Reverted https://github.com/pytorch/pytorch/pull/99555 on behalf of https://github.com/DanilBaibak due to Break internal build
2023-04-24 08:21:41 +00:00
Justin Chu
79c9e82e27 Fix flake8 lint errors reported by ruff - take 2 (#99798)
Replaces #99784. This PR is pure autofix.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99798
Approved by: https://github.com/Skylion007, https://github.com/kit1980
2023-04-23 23:09:51 +00:00
maxren
e63c502baa [Executorch][XNNPACK] Quantized Max Pool 2d (#99587)
Adding support for Quantized Max Pool 2d

Additions:
- Add quantized max pool 2d to executorch backend config
- modify max pool node visitors to grab quant params from input/output
- Add qmaxpool 2d patterns for partitioners

Differential Revision: [D44977783](https://our.internmc.facebook.com/intern/diff/D44977783/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99587
Approved by: https://github.com/jerryzh168
2023-04-22 07:17:13 +00:00
maxren
a964a3dbed [quant][pt2e] add all convs-relu fusion qat configs (#99586)
Currently, when running prepare_qat_fx with the executorch backend config, we do not properly quantize conv or conv - relu.

To fix this, we add all the necessary QAT configs for conv and conv-relu.

Differential Revision: [D45135947](https://our.internmc.facebook.com/intern/diff/D45135947/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99586
Approved by: https://github.com/jerryzh168
2023-04-22 06:44:23 +00:00
maxren
c139dfd71e [quant][pt2e] add dropout to executorch backend config (#99585)
The OD model has a dropout layer during training. In order to match eager mode QAT, we also fake quantize the dropout layer in prepare_qat_fx.

To do this, we add the dropout layer to the default_op_configs, where the observation type uses a different observer from its input.

Differential Revision: [D45095936](https://our.internmc.facebook.com/intern/diff/D45095936/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99585
Approved by: https://github.com/jerryzh168
2023-04-22 06:41:44 +00:00
PyTorch MergeBot
75e754800f Revert "[quant][pt2e][refactor] Cleanup the logic for deciding whether to insert observer/fq or not (#99220)"
This reverts commit d56adb1b54.

Reverted https://github.com/pytorch/pytorch/pull/99220 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally
2023-04-21 18:04:21 +00:00
Jerry Zhang
d56adb1b54 [quant][pt2e][refactor] Cleanup the logic for deciding whether to insert observer/fq or not (#99220)
Summary:
Previously we have two places we need to decide whether to insert observer or fake quantizer or not:
(1) input arguments of a node (2) output of a node, and right now we have separate code to do this
in this PR, the logic is unified in `_needs_obs_or_fq` helper function that takes the target_dtype and is_dynamic from previous output
and target_dtype and is_dynamic for the current Tensor we are looking at

let's use an example for conv node:
```
conv = convolution(input, weight, bias, ...)
```

let's say we have `input_node` object for argument `input`, and `conv_node` for `conv` node in the graph

(1) input arguments, e.g. `input`
the target_dtype/is_dynamic from the previous output comes from the node that produces `input`; we get this from
input_node.meta["target_dtype_info"]["output_act_obs_or_fq"]

the target_dtype/is_dynamic for the current argument `input` comes from conv_node.meta["target_dtype_info"]["input_act_obs_or_fq"]
similarly for weight it comes from conv_node.meta["target_dtype_info"]["weight_obs_or_fq"] etc.

(2) output for conv node
the target_dtype/is_dynamic from previous output will be the floating point output from the fp32 convolution operator, so it
is hardcoded to be (torch.float, False), however, technically we should get this from node.meta["val"], but since the
current code base is shared by fx graph mode quantization and pytorch 2.0 export quantization, we cannot do that, we can revisit
after we decide to deprecate fx graph mode quantization

the target_dtype/is_dynamic for the current output comes from conv_node.meta["target_dtype_info"]["output_act_obs_or_fq"]

there is one caveat here about dynamic quantization, that is explained in the comment, so I won't repeat here

Note: also fixed some places in `_get_arg_target_dtype_as_input_to_node` and `_get_arg_target_is_dynamic_as_input_to_node` to make sure "not specified" == specifying a fp32 placeholder observer as well

Next: we can merge the two get target dtype and get is_dynamic function to reduce code duplication

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
python test/test_quantization.py TestQuantizeFxModels
python test/test_quantization.py TestQuantizePT2E
python test/test_quantization.py TestQuantizePT2EModels

Differential Revision: [D45167585](https://our.internmc.facebook.com/intern/diff/D45167585)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99220
Approved by: https://github.com/kimishpatel
2023-04-21 16:58:35 +00:00
Edward Z. Yang
e9786149ab Delete tracing_mode argument to export (#99555)
You can have any color you want, as long as it's tracing_mode="symbolic"

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99555
Approved by: https://github.com/voznesenskym
2023-04-21 16:20:51 +00:00
andrewor14
22af604e1b [quant][pt2] Add Conv + BN fusion for prepare QAT (#98568)
**Summary:** This commit adds the `prepare_qat_pt2e` API and the
fusion logic for Conv + BN. We use the subgraph rewriter to
match and replace the pattern with the existing logic in
`nniqat.ConvBn2d`. Note this is not the end-to-end flow yet.
In particular, the convert flow needs to swap the new subgraph
with another one that merges the batchnorm stats back into conv.

The Conv + BN fusion is implemented in the following steps (a rough sketch of the step 2 rewrite follows the list):

1. Annotate all nodes in the pattern `[conv - bn - getitem]`

2. Match and replace this pattern with the fused QAT pattern
   (note that this is a larger subgraph than the original one)

3. Copy over metadata from the original nodes to the
   corresponding nodes in the new subgraph, to ensure the
   stack traces and dtype annotations are preserved

4. Prepare will insert fake quantizes in the right places
   based on the annotations
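
A rough sketch of the rewrite in step 2 using `torch.fx.subgraph_rewriter`; the pattern/replacement signatures and the folding arithmetic below are simplified assumptions for illustration, not the actual implementation (which matches the aten-level `[conv - bn - getitem]` pattern and copies metadata per step 3):

```python
import torch
from torch.fx import subgraph_rewriter

def pattern(x, conv_weight, bn_weight, bn_bias, bn_mean, bn_var):
    x = torch.nn.functional.conv2d(x, conv_weight)
    return torch.nn.functional.batch_norm(
        x, bn_mean, bn_var, bn_weight, bn_bias, training=True)

def replacement(x, conv_weight, bn_weight, bn_bias, bn_mean, bn_var):
    # Scale the conv weight by the BN factor so the weight fake-quant later
    # observes BN-folded values, undo the scaling on the conv output, then
    # re-apply BN so training statistics keep flowing (the nniqat.ConvBn2d idea).
    scale_factor = bn_weight * torch.rsqrt(bn_var + 1e-5)
    scaled_weight = conv_weight * scale_factor.reshape(-1, 1, 1, 1)
    x = torch.nn.functional.conv2d(x, scaled_weight)
    x = x / scale_factor.reshape(1, -1, 1, 1)
    return torch.nn.functional.batch_norm(
        x, bn_mean, bn_var, bn_weight, bn_bias, training=True)

# matches = subgraph_rewriter.replace_pattern(prepared_model, pattern, replacement)
```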

**Test Plan:**
python test/test_quantization.py TestQuantizePT2E.test_qat_conv_bn_fusion

**Reviewers:** jerryzh168, kimishpatel, yanboliang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98568
Approved by: https://github.com/kimishpatel
2023-04-20 20:15:28 +00:00
Jerry Zhang
36acad58b6 [quant][pt2e][refactor] Move the annotation for observer sharing ops into separate util (#99384)
Summary:
In order to keep the quantizer simple, we want to move the annotation code for operators like flatten, hardtanh, etc. to
a separate utility function that is called after the quantizer annotation is done. This makes these ops (the operator list) not
configurable by the user, and also makes prepare_pt2e operator aware instead of operator agnostic. This design is not final;
we may change it in the future if we find there are use cases that need these to be configurable, or if we feel it is important for prepare_pt2e
to stay agnostic to operators/operator patterns.

Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_qnnpack_quantizer_obs_sharing_ops

Differential Revision: [D45071006](https://our.internmc.facebook.com/intern/diff/D45071006)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99384
Approved by: https://github.com/kimishpatel
2023-04-19 23:49:33 +00:00
Nikita Shulga
8a89eec2f8 [BE] Do not use unicode quotes (#99446)
They are mostly used in commented code examples, but even Python-3.12
does not recognize `“foobar”` as valid string literal

I.e. just `s/[“”]/"/`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99446
Approved by: https://github.com/huydhn, https://github.com/ezyang
2023-04-18 22:59:56 +00:00
Kimish Patel
c0be06667f [PT2E][Quant] Support for embedding op quantization via ExecuTorchNativeQuantizer (#99106)

ExecuTorchNativeQuantizer is a terrible name, I admit; however, let's fix it once
we align on what the quantized kernel lib within the executorch runtime should be called.

Differential Revision: [D44986258](https://our.internmc.facebook.com/intern/diff/D44986258/)

**NOTE FOR REVIEWERS**: This PR has internal Meta-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D44986258/)!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99106
Approved by: https://github.com/jerryzh168
2023-04-18 16:59:37 +00:00
maxren
80eab63587 [Quant][pt2e] torch.mean and ReLU6 (#98984)
Add nn.Module ReLU6 in addition to functional relu6.

Also add torch.mean to the quantization config.

Differential Revision: [D44901038](https://our.internmc.facebook.com/intern/diff/D44901038/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98984
Approved by: https://github.com/jerryzh168
2023-04-17 18:33:04 +00:00
maxren
444a9769ae [quant][pt2e] QAT Linear (#98897)
Differential Revision: [D44901039](https://our.internmc.facebook.com/intern/diff/D44901039/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98897
Approved by: https://github.com/tiandiao123, https://github.com/manuelcandales
2023-04-17 18:27:39 +00:00
maxren
568935caca [quant][pt2e] QAT conv + bn + relu (#98896)
Differential Revision: [D44901040](https://our.internmc.facebook.com/intern/diff/D44901040/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98896
Approved by: https://github.com/manuelcandales
2023-04-17 18:24:08 +00:00
Kimish Patel
cdab6c8df9 [PT2E][Quant] Support specifying None for obs_or_fq_ctr in target_dtype_info (#99071)
It is cleaner for a quantizer to say what does not need observation instead of
putting fp32 observers. This diff adds support for that by checking if
target_dtype_info contains None for specific observers, and if so, skipping inserting
observers for those.
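
A minimal sketch of that check; the meta key and the helper name here are assumptions for illustration, not the actual code:

```python
from torch.fx import Node

# Illustrative only: an explicit None in target_dtype_info means
# "do not observe this value", so nothing is inserted for it.
def maybe_create_input_obs_or_fq(node: Node):
    obs_or_fq_ctr = node.meta.get("target_dtype_info", {}).get("input_act_obs_or_fq_ctr")
    if obs_or_fq_ctr is None:
        return None  # the quantizer opted this input out of observation
    return obs_or_fq_ctr()  # construct the observer/fake-quant to insert
```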

Differential Revision: [D44971357](https://our.internmc.facebook.com/intern/diff/D44971357/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99071
Approved by: https://github.com/jerryzh168
2023-04-17 16:37:16 +00:00
Kimish Patel
36a95625da [PT2E][Quant][BE] Refactor observer code (#99066)
Combine per channel and per tensor observer code

Differential Revision: [D44918494](https://our.internmc.facebook.com/intern/diff/D44918494/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99066
Approved by: https://github.com/jerryzh168
2023-04-17 16:17:36 +00:00
Kimish Patel
31f311a816 [PT2E][Quantization] Refactor Quantizer and QNNPACKQuantizer (#99063)
This diff renames quantization spec/config and operator config. It moves these
data structures to the base quantizer.
The base quantizer API now has get_supported_operators, which returns a list of
patterns that a quantizer quantizes.
There are two choices being debated for how to convey to user what a particular
quantizer will quantize.

1. Modules. We just convey which nn.Modules will be quantized. Of course that
does not mean that equivalent functional variants won't be quantized; however,
for simplicity we just use nn.Module. If certain ops are quantized in a fused
manner, then that will be considered an internal detail. Pros and cons of this
approach:
Pros:
  - Simple. Only nn Modules are listed.
  - User does not have to see fusion patterns.
Cons:
  - Perhaps confusing, because it is not clear whether supported = nn.Conv2d also
    means that the quantizer supports functional.conv2d
  - Hiding the fusion pattern means the user has no say in not fusing. Meaning if
    conv2d + relu is fused and the user configures to quantize only conv, the quantizer
    will also quantize the following relu as if conv2d + relu were fused.

2. Patterns. Be explicit about what is supported and enumerate all possible
combinations.
Pros:
  - It is very clear what the quantizer will do. No surprises.
Cons:
  - It is not simple to parse.
  - It can be argued that fusion is an internal detail of the quantizer. So some
    quantizer implementations may choose to expose fusion patterns, while others
    may not, and may not even provide any configurability.

One option is to move set_supported_operators/modules out of the base quantizer and
let each quantizer define its own way of communicating what is supported. The issue
with this is that when we want to "compose" multiple quantizers, there is no way
for the user to define the order of composition if the user does not know what a
quantizer supports. For example, quantizer A may quantize conv + relu while B
quantizes only conv, but B's implementation is faster. In that case you may compose (B, A)
such that B quantizes conv and A quantizes relu. Not knowing what A
and B support makes such composition harder.

Differential Revision: [D44895547](https://our.internmc.facebook.com/intern/diff/D44895547/)

**NOTE FOR REVIEWERS**: This PR has internal Meta-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D44895547/)!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99063
Approved by: https://github.com/jerryzh168
2023-04-17 00:34:18 +00:00
Aaron Gokaslan
85f38b8a33 [BE] Update flake8-comprehensions and adapt to rule C418 (#99178)
Applies rule C418 and fixes all instances of it. Also updates flake8-comprehensions.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99178
Approved by: https://github.com/ezyang
2023-04-15 15:33:42 +00:00
Sudarshan Raghunathan
e45fa1a581 Back out "[core][pruning][be] rename BaseSparsifier to BasePruner (#98747)" (#99171)
Summary: Back out D44856390 since renaming the type breaks backwards compatibility of existing models used in integration tests and likely in prod as well.

Test Plan:
buck2 run //aiplatform/modelstore/model_generation/integration_tests:cogwheel_igr_tab_offline_and_recurring_model_generation_v1_api_test-launcher -- --build-fbpkg --run-disabled --run-harness-in-tupperware

Now fails with an OOM: https://www.internalfb.com/servicelab/experiment/100000000259121/trial/100000000331723/run

It was failing with an import error without this revert.

Differential Revision: D44991351

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99171
Approved by: https://github.com/izaitsevfb, https://github.com/osalpekar
2023-04-15 00:37:45 +00:00
Jerry Zhang
09ebdf44fa [quant][pt2e] Fix a bug in reference quantized module (decomposed mode) (#98903)
Summary:
Fixed quant_min/quant_max for per-channel quantized weight for the reference quantized module in decomposed mode;
this bug was triggered while onboarding an internal model.

Test Plan:
python test/test_quantization.py TestQuantizeFx.test__convert_to_reference_decomposed_fx_per_channel_quant_module

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98903
Approved by: https://github.com/andrewor14
2023-04-13 21:55:45 +00:00
PyTorch MergeBot
dda7ce4bb3 Revert "[core][pruning][be] Rename sparsifier folder to pruner (#98758)"
This reverts commit 778fd1922a.

Reverted https://github.com/pytorch/pytorch/pull/98758 on behalf of https://github.com/jcaip due to https://www.internalfb.com/diff/D44905951 need to fix broken import in fbcode
2023-04-13 16:30:47 +00:00
Jerry Zhang
6a568779b6 [quant][pt2e][improvement] Remove the need to annotate all nodes with default annotation (#99001)
Summary:
This PR changes prepare to use a default observer/fq constructor when "target_dtype_info" is not set; this allows users to not initialize all nodes to the default
observer/fq constructor. Note we may still need to annotate intermediate nodes after this PR; there will be a follow-up PR to allow users to only annotate the things they
want to quantize.
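
A minimal sketch of the fallback; the key names follow the convention used elsewhere in these commits, but the helper itself is illustrative:

```python
import torch
from torch.fx import Node
from torch.ao.quantization.observer import PlaceholderObserver

# Illustrative fallback: an un-annotated node behaves as if it were annotated
# with fp32 placeholder observers, so quantizers only need to annotate the
# nodes they actually want quantized.
_FP32_PLACEHOLDER = PlaceholderObserver.with_args(dtype=torch.float32)

def get_target_dtype_info(node: Node) -> dict:
    return node.meta.get(
        "target_dtype_info",
        {
            "input_act_obs_or_fq_ctr": _FP32_PLACEHOLDER,
            "output_act_obs_or_fq_ctr": _FP32_PLACEHOLDER,
        },
    )
```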

Test Plan:
python test/test_quantization.py TestQuantizePT2E
python test/test_quantization.py TestQuantizePT2EModels

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99001
Approved by: https://github.com/kimishpatel, https://github.com/andrewor14
2023-04-13 09:31:51 +00:00
PyTorch MergeBot
46a31e9bab Revert "[quant][pt2e] Fix a bug in reference quantized module (decomposed mode) (#98903)"
This reverts commit a2e809f29b.

Reverted https://github.com/pytorch/pytorch/pull/98903 on behalf of https://github.com/huydhn due to Sorry for reverting your PR, but it breaks Windows tests on trunk a2e809f29b
2023-04-13 01:58:27 +00:00
Jerry Zhang
a2e809f29b [quant][pt2e] Fix a bug in reference quantized module (decomposed mode) (#98903)
Summary:
Fixed quant_min/quant_max for per-channel quantized weight for the reference quantized module in decomposed mode;
this bug was triggered while onboarding an internal model.

Test Plan:
python test/test_quantization.py TestQuantizeFx.test__convert_to_reference_decomposed_fx_per_channel_quant_module

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98903
Approved by: https://github.com/andrewor14
2023-04-12 22:35:24 +00:00
Wyatt Borsos
6361c3debc Return zero_point from determine_qparams as a int64 (#98746)
Summary:
In some cases, zero_point is returned as an int tensor. We want it to be a long.

This fixes a failed assertion in Executorch op_choose_qparams:
https://www.internalfb.com/code/fbsource/[4609e7dbbf2e]/fbcode/executorch/kernels/quantized/cpu/op_choose_qparams.cpp?lines=49-52
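
A minimal illustration of the dtype fix (hedged; the actual change is inside the determine_qparams utility):

```python
import torch

# zero_point computed as int32 somewhere upstream ...
zero_point = torch.tensor([0], dtype=torch.int32)

# ... is cast to int64 ("long") before being returned, so downstream kernels
# such as Executorch's op_choose_qparams see the dtype they assert on.
zero_point = zero_point.to(torch.int64)
assert zero_point.dtype == torch.int64
```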

Test Plan: CI

Reviewed By: jerryzh168

Differential Revision: D44764070

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98746
Approved by: https://github.com/jerryzh168
2023-04-11 19:01:05 +00:00
Jesse Cai
778fd1922a [core][pruning][be] Rename sparsifier folder to pruner (#98758)
Summary:
att

Test Plan:
```
python test/test_ao_sparsity.py
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98758
Approved by: https://github.com/jerryzh168
2023-04-11 17:26:29 +00:00
Kazuaki Ishizaki
a13a63ae9a Fix typos under torch/ao directory (#97679)
This PR fixes typos in comments and messages of `.py` files under `torch/ao` directory

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97679
Approved by: https://github.com/janeyx99, https://github.com/kit1980
2023-04-10 22:25:15 +00:00
Jesse Cai
4584851da5 [core][pruning][be] rename BaseSparsifier to BasePruner (#98747)
Summary:

att

Test Plan:
`python test/test_ao_sparsity.py -- TestBasePruner`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98747
Approved by: https://github.com/jerryzh168
2023-04-10 21:25:19 +00:00
Edward Z. Yang
b09722f540 Convert logging f-strings to use % format, part two (#98700)
This hits multi-line logging strings

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98700
Approved by: https://github.com/voznesenskym
2023-04-10 12:19:31 +00:00
Jerry Zhang
c5269ad6c6 [quant][pt2e] Add support for a few ops in QNNPackQuantizer to enable quantizing internal model (#98560)
Summary:
This PR adds support for adaptive_avg_pool2d (traced as mean.dim), mean and hardtanh to QNNPackQuantizer

Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_qnnpack_quantizer_obs_sharing_ops

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98560
Approved by: https://github.com/andrewor14
2023-04-07 19:26:45 +00:00
maxren
483fd3351a [Quant] Add get_symmetric_qnnpack_qat_qconfig_mapping (#98569)
Differential Revision: [D44776230](https://our.internmc.facebook.com/intern/diff/D44776230/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98569
Approved by: https://github.com/andrewor14
2023-04-07 17:57:56 +00:00
Jerry Zhang
616f50da3a [quant][pt2e] QNNPackQuantizer support annotation for resnet18 (#98507)
Summary:
This PR adds annotation support for conv2d relu, linear, maxpool2d, add and add relu so
that we can successfully quantize resnet18 with the prepare_pt2e_quantizer API and get the same result
as fx graph mode quantization

Test Plan:
python test/test_quantization.py TestQuantizePT2EModels.test_resnet18_with_quantizer_api

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98507
Approved by: https://github.com/vkuzo
2023-04-07 04:27:21 +00:00
Kazuaki Ishizaki
482f87a7bc [quantized] Fix return values of _get_name() in quantized ConvTranspose (#97678)
This PR fixes incorrect return values of _get_name() in quantized `ConvTranspose?d`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97678
Approved by: https://github.com/vkuzo, https://github.com/kit1980
2023-04-07 01:14:42 +00:00
Jerry Zhang
3142ce208f [quant][pt2e] Support quantizer API in prepare_pt2e_quantizer (#97994)
Summary:
This PR added a quantizer API to prepare_pt2e_quantizer, which enables user to annotate the nodes in the graph
directly to configure quantization, instead of relying on QConfigMapping, please see test cases in
test_quantize_pt2e.py for examples. Also added a prototype for QNNPackQuantizer, that will be modified later
to fully support different quantization capabilities of QNNPack/XNNPack

The goal for introducing quantizer is to add flexibility to the quantization API to allow modeling users and backend developers to express their quantization intentions programmably, which will free architecture optimization team from supporting different use cases in the core API in the future, as a concrete example, we used to have https://pytorch.org/docs/master/generated/torch.ao.quantization.qconfig_mapping.QConfigMapping.html#torch.ao.quantization.qconfig_mapping.QConfigMapping as the API for users to express their intent for quantization in fx graph mode quantization, and it has some fancy options like `set_module_name_regex` and `set_module_name_object_type_order`, this is not needed for all backends and adds burden of maintenance to AO team, in the quantizer API we will move these APIs to a backend specific `Quantizer` that needs this feature, and all the backends or even advanced modeling users can implement their own quantizer to express their intent for quantization through annotating the nodes, for example, to express the quantization intention of quantizing a convolution node, a user will find the convolution node in the graph and do:
```
operator_spec = qnnpack_quantizer.get_default_per_channel_symmetric_qnnpack_operator_spec()
conv_node.meta["target_dtype_info"] = {
    "input_act_obs_or_fq_ctr": _get_act_obs_or_fq_ctr(operator_spec),
    "weight_obs_or_fq_ctr": _get_weight_obs_or_fq_ctr(operator_spec)
    "bias_obs_or_fq_ctr": _get_bias_obs_or_fq_ctr(operator_spec),
    "output_act_obs_or_fq_ctr": _get_act_obs_or_fq_ctr(operator_spec),
    # TODO: validation of weight_index must be set if weight_obs_or_fq_ctr is set
    "weight_index": 1,
    # TODO: validation of bias_index must be set if bias_obs_or_fq_ctr is set
    "bias_index": 2,
}
```
each backend will introduce their own quantizer, e.g. QNNPackQuantizer, which may expose more convenient APIs for modeling users to configure the annotation, and different quantizer can compose with each other to annotate the graph correctly for quantization.

Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_simple_quantizer
python test/test_quantization.py TestQuantizePT2E.test_qnnpack_quantizer_conv

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97994
Approved by: https://github.com/vkuzo
2023-04-06 11:34:10 +00:00
Jerry Zhang
a76114832a [quant][pt2e][fix] Fix the internal test failures caused by refactor (#98378)
Summary: att, this PR removes some incorrect assumptions from `_maybe_insert_observers_before_graph_output`

Test Plan:
internal test

Differential Revision: D44697212

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98378
Approved by: https://github.com/andrewor14
2023-04-05 23:27:34 +00:00
Jesse Cai
93063768da [pruning][core][feature] Implement convert for pruner (#97545)
Summary:

This PR implements `BaseSparsifier.convert()`, which performs module swapping.
The modules and mappings will be merged in a future PR.

Test Plan:
`python test/test_ao_sparsity.py -- TestBaseSparsifier.test_convert`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97545
Approved by: https://github.com/jerryzh168
2023-04-05 16:57:11 +00:00
Tugsbayasgalan Manlaibaatar
75ac6fdcdd Propagate dynamo shape_env to make_fx (#96437)
Currently, when we use the assume_static_by_default flag, dynamo won't produce any symbols for input tensors. But when we pass the dynamo-generated graph onto make_fx via torchdynamo.export(aten_graph=True), there is no way to pass this flag. We enable this by directly passing the fake tensors dynamo used to make_fx and calling make_fx in "real" mode with the fake tensors from dynamo.

Note that this is modified version of (https://github.com/pytorch/pytorch/pull/96143)

Differential Revision: [D44561753](https://our.internmc.facebook.com/intern/diff/D44561753)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96437
Approved by: https://github.com/jansel, https://github.com/ezyang
2023-04-04 20:37:30 +00:00
Jerry Zhang
b109083098 [quant][pt2e][refactor] Remove backend_config from _maybe_insert_input_observers_for_node (#98094)
Summary:
The goal is to remove the need to use backend_config when the pt2e flow code calls this function.

Test Plan:
python test/test_quantization.py TestQuantizeFx

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98094
Approved by: https://github.com/jcaip
2023-04-04 03:18:24 +00:00
Jerry Zhang
553bb01df9 [quant][pt2e][refactor] Remove extra arguments of _maybe_insert_observers_before_graph_output (#98029)
Summary:
This PR allows _maybe_insert_observers_before_graph_output to be reused by pt2e flow

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
python test/test_quantization.py TestQuantizeFxModels

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98029
Approved by: https://github.com/vkuzo
2023-04-01 05:38:36 +00:00
Jerry Zhang
7dde61ce46 [quant][pt2e][refactor] Remove extra arguments of _maybe_insert_output_observer_for_node (#97959)
Summary:
The goal is for this function to be reused by the pt2e flow

Test Plan:
python test/test_quantization.py TestQuantizeFx

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97959
Approved by: https://github.com/andrewor14
2023-03-31 23:59:43 +00:00
Jesse Cai
d158545b16 [pruning] Add gelu to list of supported activation functions (#95618)
Summary:

This PR adds nn.GELU and F.gelu respectively to the list of supported
activation functions.

Test Plan:
```
python test/test_ao_sparsity.py -- TestBaseSparsifier
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95618
Approved by: https://github.com/andrewor14
2023-03-31 19:55:12 +00:00
Jerry Zhang
1c21cd2213 [quant][pt2e][refactor] Add input_output_share_observers to node.meta["target_dtype_info"] (#97949)
Summary:
The goal for this PR is to unify the flow of information to reduce fragmentation of implementations between fx graph mode quantization
and quantize_pt2e. Since quantize_pt2e will be using node.meta to store this information, we'd like to make sure fx graph mode quantization
gets this information from the same place.

Test Plan:
python test/test_quantization.py TestQuantizeFx

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97949
Approved by: https://github.com/andrewor14
2023-03-31 15:54:19 +00:00
Xia, Weiwen
e073979794 [Quant][FX] Add test case for lowering conv_transpose with kwargs (#97311)
**Summary**
As the title

**Test plan**
python test/test_quantization.py -k test_lowering_functional_conv_transpose_with_kwargs

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97311
Approved by: https://github.com/jerryzh168
2023-03-31 10:39:29 +00:00
Xia, Weiwen
e61b842001 [Quant][FX] lower functional conv_transpose ops (#97126)
**Summary**
Support quantizing and lowering functional `conv_transpose1d`, `conv_transpose2d` and `conv_transpose3d`.
Please note that
- `conv_transpose + relu` fusion is not supported. Remember to keep the `relu` node in the graph when lowering.
- `conv_transpose` requires a `per-tensor` scheme for weight. Use the default `qconfig_mappings` instead of the deprecated `qconfig_dict` for test cases.

**Test plan**
python test/test_quantization.py -k test_conv_transpose_not_reference
python test/test_quantization.py -k test_conv_transpose_reference
python test/test_quantization.py -k test_conv_transpose_relu_not_reference
python test/test_quantization.py -k test_conv_transpose_relu_reference

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97126
Approved by: https://github.com/jgong5, https://github.com/jerryzh168
2023-03-31 07:17:29 +00:00
maxren
3a5ca4bdd4 [quant][pt2e] Add support for conv bn fusion in et backend config (#97389)
Batch Norm was supported by XNNPACK via fusion with the preceding convolution op. We do the same here by fusing across q -> dq nodes.

We must update the original pass in order to fuse convolution weight/bias with batch norm parameters; this way quantization is supported for batch norm.

Differential Revision: [D43976324](https://our.internmc.facebook.com/intern/diff/D43976324/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97389
Approved by: https://github.com/salilsdesai
2023-03-31 05:33:42 +00:00
maxren
fe2bdfb2cd [Executorch][XNNPACK] Quantized mean (#97388)
Support Quantized Mean.dim for xnnpack

Adding another pattern for the Quantized Partitioner and a test to ensure the quantized operator works.

Differential Revision: [D43915706](https://our.internmc.facebook.com/intern/diff/D43915706/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97388
Approved by: https://github.com/salilsdesai
2023-03-31 05:08:53 +00:00
Jerry Zhang
f78b44b2d9 [quant][pt2e][refactor] Refactor prepare to remove the use of qconfig in _maybe_insert_input_observer_for_arg_or_kwarg (#97948)
Summary:
The goal is for this function to be reused by quantize_pt2e

Test Plan:
python test/test_quantization.py TestQuantizeFx

Differential Revision: [D44558929](https://our.internmc.facebook.com/intern/diff/D44558929)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97948
Approved by: https://github.com/andrewor14
2023-03-31 05:07:58 +00:00
maxren
f9ca48ddb5 [Executorch][XNNPACK] Quantized hardtanh (#97387)
Lower Quantized Hardtanh to XNNPACK

Also add symmetric quantization support for hardtanh in executorch backend config

Differential Revision: [D43901222](https://our.internmc.facebook.com/intern/diff/D43901222/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97387
Approved by: https://github.com/salilsdesai
2023-03-31 04:58:24 +00:00
Aaron Gokaslan
47dca20d80 [BE] Enable flake8-comprehension rule C417 (#97880)
Enables flake8-comprehension rule C417. Ruff autogenerated these fixes to the codebase.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97880
Approved by: https://github.com/ezyang, https://github.com/kit1980, https://github.com/albanD
2023-03-30 14:34:24 +00:00
Jerry Zhang
15271d353a [quant][pt2e] Support convtranspose + bn fusion (#97933)
Summary:
This PR extends `_fuse_conv_bn_` function to support fusing convtranspose and bn

Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_transposed_conv_bn_fusion

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97933
Approved by: https://github.com/vkuzo
2023-03-30 07:02:39 +00:00
PyTorch MergeBot
8e5c5d2023 Revert "Propagate dynamo shape_env to make_fx (#96437)"
This reverts commit 3a22916c7a.

Reverted https://github.com/pytorch/pytorch/pull/96437 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally
2023-03-29 23:47:59 +00:00
Tugsbayasgalan Manlaibaatar
3a22916c7a Propagate dynamo shape_env to make_fx (#96437)
Currently, when we use the assume_static_by_default flag, dynamo won't produce any symbols for input tensors. But when we pass the dynamo-generated graph onto make_fx via torchdynamo.export(aten_graph=True), there is no way to pass this flag. We enable this by directly passing the fake tensors dynamo used to make_fx and calling make_fx in "real" mode with the fake tensors from dynamo.

Note that this is modified version of (https://github.com/pytorch/pytorch/pull/96143)

Differential Revision: [D43994693](https://our.internmc.facebook.com/intern/diff/D43994693)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96437
Approved by: https://github.com/jansel, https://github.com/ezyang
2023-03-29 22:34:37 +00:00
Aaron Gokaslan
597b558c51 [BE]: Update flake8 and plugins and fix bugs (#97795)
Update flake8 and flake8-plugins in lintrunner to a modern version. Enables more checks and makes flake8 checks significantly faster. Added a few additional rule ignores that will need to be fixed in the future.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97795
Approved by: https://github.com/alexsio27444, https://github.com/janeyx99, https://github.com/ezyang
2023-03-28 23:51:55 +00:00
Xia, Weiwen
08766b23de [Quant][FX] lower ConvTranspose3d (#97125)
**Summary**
Enable quantization and lowering of `ConvTranspose3d`.
Add test cases for `ConvTranspose1d`, `ConvTranspose2d` and `ConvTranspose3d` since there were no such test cases.

**Test plan**
python test/test_quantization.py -k test_conv_transpose_not_reference
python test/test_quantization.py -k test_conv_transpose_reference

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97125
Approved by: https://github.com/jgong5, https://github.com/jerryzh168
2023-03-28 11:58:29 +00:00
leslie-fang-intel
a6d8c70933 Init quantization backend config for inductor (#96476)
**Summary**
Init the backend config file with quantization recipes for quantization 2.0 inductor path. In this PR, we only init the recipe for `convolution` and `convolution_relu`.

**Test Plan**
```
clear && python -m pytest test_quantization.py -k test_inductor_backend_config_conv
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96476
Approved by: https://github.com/jgong5, https://github.com/EikanWang, https://github.com/jerryzh168
2023-03-22 07:56:56 +00:00
Xia, Weiwen
e8be6d813b [Quant][FX] Fix issue of lowering weighted functional ops with kwargs (#95865)
Fixes #95492

**Summary**
This PR fixes the issue that weighted functional ops with kwargs are not lowered correctly since kwargs are ignored.
These kwargs should be moved from the functional op to its corresponding prepack op, e.g., from `F.conv2d` to `quantized.conv2d_prepack`.
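
A small sketch of what the lowering has to carry over; the `conv2d_prepack` argument order below is assumed for illustration rather than taken from this PR:

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 8, 8)
w = torch.randn(4, 3, 3, 3)
b = torch.randn(4)

# A functional conv called with kwargs; before this fix such kwargs were
# dropped during lowering instead of being carried over.
y = F.conv2d(x, w, b, stride=2, padding=1)

# Where those kwargs have to end up: the prepack op takes them positionally.
w_q = torch.quantize_per_tensor(w, scale=0.1, zero_point=0, dtype=torch.qint8)
packed = torch.ops.quantized.conv2d_prepack(w_q, b, [2, 2], [1, 1], [1, 1], 1)
```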

**Test plan**
python test/test_quantization.py -k test_lowering_functional_conv_with_kwargs
python test/test_quantization.py -k test_lowering_functional_conv_transpose_with_kwargs
python test/test_quantization.py -k test_lowering_functional_linear_with_kwargs

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95865
Approved by: https://github.com/jgong5, https://github.com/supriyar
2023-03-21 05:29:03 +00:00
Nitin Jain
40df3b41aa [AO] Update qLSTM implementation to remove unsupported backend ops (#96436)
Summary:
The reference quantized LSTM implementation uses unbind and inplace squeeze, both of which are not supported when building BoltNN's Espresso IR graph.

This change adjusts the reference AO Quantizable LSTM implementation without affecting numerics, while enabling removal of the unsupported ops in BoltNN.

Modifications & Adjustments (a small illustration follows the list)
1. Unbind ops appear when unstacking a tensor in a loop. Replaced this by getting the first dim from the shape and looping using a ranged index.
2. Removed unbind op calls where the pattern
`[x = t.unbind(0) -> x[i]]` can simply be replaced by `t[i]`, as creating a tuple from unbind is unnecessary.
3. Inplace squeeze (`squeeze_`) uses that were not required have been replaced by `squeeze`.
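
A minimal illustration of items 1-3; the tensors and names here are made up, not from the actual LSTM code:

```python
import torch

t = torch.randn(4, 3)

# Before: materialize a tuple of slices via unbind just to index it.
xs = t.unbind(0)
first = xs[0]

# After: loop over the first dimension by index; t[i] yields the same slice
# without emitting the unsupported unbind op.
for i in range(t.shape[0]):
    step = t[i]

# Inplace squeeze_ calls that were not required become out-of-place squeeze.
y = t[0].squeeze()
```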

See notebook N3235193, which was used for testing the quantization flow; inspect the torch-scripted quantized model for the set of ops used (see last cell).

Test Plan: N3235193

Reviewed By: andrewor14

Differential Revision: D43935389

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96436
Approved by: https://github.com/andrewor14
2023-03-14 17:58:34 +00:00
andrewor14
ca7e53324f [Quant][fx] Remove unused is_qat args in prepare_fx (#96631)
Test Plan:
python test/test_quantization.py TestQuantizeFx

Reviewers: vkuzo, jcaip

Subscribers: vkuzo, jcaip
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96631
Approved by: https://github.com/vkuzo
2023-03-14 00:33:18 +00:00
chenxujun
6a492908cc Update conv_fused.py (#95551)
Fix typos in conv_fused.py
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95551
Approved by: https://github.com/Skylion007, https://github.com/kit1980, https://github.com/malfet
2023-03-13 23:42:34 +00:00
yiliu30
2ea0cb1207 Fix the typo for the docstring of args in the observer (#95887)
This PR fixes the typo in `torch.ao.quantization.observer.py`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95887
Approved by: https://github.com/kit1980
2023-03-13 23:03:57 +00:00
Vasiliy Kuznetsov
cdab1d676c pt2e short term quant: respect qmin/qmax for linear weight (#96232)
Summary:

Makes the `nnqr.Linear` module respect the qmin/qmax attributes of the weight observer. This is to unblock some customer teams that depend on non-default values of these attributes.
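
A hedged sketch of the idea using the public observer API (the real change is inside the reference `nnqr.Linear` module):

```python
import torch
from torch.ao.quantization.observer import MinMaxObserver

# Configure a weight observer with non-default qmin/qmax and make sure those
# values, not the dtype defaults, drive the weight quantization.
weight = torch.randn(8, 16)
observer = MinMaxObserver(dtype=torch.qint8, qscheme=torch.per_tensor_symmetric,
                          quant_min=-127, quant_max=127)
observer(weight)
scale, zero_point = observer.calculate_qparams()
print(observer.quant_min, observer.quant_max)  # -127 127, not the int8 default of -128/127
```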

Test plan:

```
python test/test_quantization.py -k TestReferenceQuantizedModule.test_linear_decomposed
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96232
Approved by: https://github.com/andrewor14
2023-03-10 04:46:20 +00:00
andrewor14
faa4cb29b2 [Quant][fx] Create new FX-based LSTM reference module (#96343)
Summary: The previous LSTM reference module implementation did
not handle dtypes other than quint8 correctly. This is because
the internal LSTM custom module quantization used eager mode,
which did not insert the q-dq ops properly. E.g., we want the
following reference quantized model:

```
[dq -> linear1_fp32 -> q_to_qint32] -> dq -> q_to_quint8 ->
  [dq - linear2_fp32 -> q_to_quint8] -> dq -> ...
```

This requires two sets of `q - dq` pairs between two adjacent
ops that have different dtypes (linear1 and linear2). However,
these `q - dq` pairs were not inserted in the old flow, because
eager mode required users to insert Quant/DeQuantStubs manually.

This commit changes the internal LSTM custom module quantization
to use FX graph mode quantization, which automatically inserts
the `q - dq` ops that convert the dtypes between adjacent ops
correctly. However, using FX graph mode quantization here comes
with its own set of challenges that required some hacks to get
the end-to-end flow to work. These hacks are detailed in the
comments in the util functions.

Test Plan:
python test/test_quantization.py TestQuantizeFx.test_static_lstm_with_custom_fixed_qparams

This commit also updates the corresponding test to verify the
dtypes as well as the qparams in the reference quantized graph.
This test case should serve as an example for users to set up
their own LSTM reference module flows.

Reviewers: vkuzo, supriyar, jcaip

Subscribers: vkuzo, supriyar, jcaip
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96343
Approved by: https://github.com/vkuzo
2023-03-09 23:23:48 +00:00
Jiaxu Zhu
08fb13db65 [Quant] Add lowering for pixel_unshuffle/narrow (#96160)
Summary:
torch.nn.functional.pixel_unshuffle and torch.narrow accept both float
and quantized inputs. However, previously we would unnecessarily
dequantize quantized inputs into floats before passing them to
these functions. This commit fixes this by lowering the patterns
[dequant - pixel_unshuffle - quant] and
[dequant - narrow - quant].
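
A small illustration of why the fp32 round trip is unnecessary for pixel_unshuffle; the quant/dequant here are hand-rolled for clarity, not the decomposed ops this lowering actually matches:

```python
import torch
import torch.nn.functional as F

x_q = torch.randint(0, 256, (1, 4, 8, 8), dtype=torch.uint8)
scale, zero_point = 0.1, 128

# Before lowering: dequant -> pixel_unshuffle -> quant (a pointless fp32 round trip).
x_fp = (x_q.to(torch.float32) - zero_point) * scale
y_fp = F.pixel_unshuffle(x_fp, downscale_factor=2)
y_q = torch.clamp(torch.round(y_fp / scale) + zero_point, 0, 255).to(torch.uint8)

# After lowering: the op runs directly on the quantized values, since
# pixel_unshuffle only rearranges elements and never changes them.
y_q_lowered = F.pixel_unshuffle(x_q, downscale_factor=2)
assert torch.equal(y_q, y_q_lowered)
```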

Test Plan:
```
python test/test_quantization.py TestQuantizeFxOps.test_pixel_unshuffle
```

```
python test/test_quantization.py TestQuantizeFxOps.test_narrow
```

Differential Revision: D43858199

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96160
Approved by: https://github.com/andrewor14
2023-03-08 05:25:03 +00:00