Commit Graph

146 Commits

Author SHA1 Message Date
Xuehai Pan
30293319a8 [BE][Easy][19/19] enforce style for empty lines in import segments in torch/[o-z]*/ (#129771)
See https://github.com/pytorch/pytorch/pull/129751#issue-2380881501. Most changes are auto-generated by the linter.

You can review these PRs via:

```bash
git diff --ignore-all-space --ignore-blank-lines HEAD~1
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129771
Approved by: https://github.com/justinchuby, https://github.com/janeyx99
2024-08-01 17:07:14 +00:00
Edward Z. Yang
3bf922a6ce Apply UFMT to low traffic torch modules (#106249)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106249
Approved by: https://github.com/Skylion007
2023-07-29 23:37:30 +00:00
Vasiliy Kuznetsov
f15ab8a7f2 AO migration: replace torch internal callsites (#94170)
Summary:

Do the following renames:
`torch.quantization` -> `torch.ao.quantization`
`torch.nn.quantized` -> `torch.ao.nn.quantized`
`torch.nn.quantizable` -> `torch.ao.nn.quantizable`
`torch.nn.qat` -> `torch.ao.nn.qat`
`torch.nn.intrinsic` -> `torch.ao.nn.intrinsic`

And then, do
`torch.ao.nn.quantized._reference` -> `torch.ao.nn.quantized.reference` to clean up the aftermath of https://github.com/pytorch/pytorch/pull/84974

Then, manually update `test/test_module_init.py` to fix hanging whitespace caused by the replacement.

Run this script to do the replacements: https://gist.github.com/vkuzo/7f7afebf8c31b9ba48306223e68a1c82

This is for https://github.com/pytorch/pytorch/issues/81667
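
As a hedged illustration (not taken from the linked script), a typical callsite update under these renames might look like the following; the specific imports below are examples, not ones changed in this PR:

```python
# Hedged example of a callsite update under the renames above (illustrative only).
# Old (pre-migration) namespaces:
#   from torch.quantization import get_default_qconfig
#   from torch.nn.quantized import Linear as QuantizedLinear
# New (torch.ao.*) namespaces:
from torch.ao.quantization import get_default_qconfig
from torch.ao.nn.quantized import Linear as QuantizedLinear

qconfig = get_default_qconfig("fbgemm")
print(qconfig)
print(QuantizedLinear)
```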

Test plan: CI
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94170
Approved by: https://github.com/jerryzh168
2023-02-07 02:32:23 +00:00
andrewor14
d80056312a [Quant][fx][bc-breaking] Rename fx/*patterns.py (#89872)
Summary: This commit renames fx/quantization_patterns.py
to fx/quantize_handler.py, and fx/fusion_patterns.py to
fx/fuse_handler.py. This is because these files contain
only QuantizeHandler and FuseHandler respectively, so the
new names are more descriptive. A future commit will
further break BC by removing all the empty *QuantizeHandler
classes.

BC-breaking notes:

The following classes under the
`torch.ao.quantization.fx.quantization_patterns` namespace
are migrated to the `torch.ao.quantization.fx.quantize_handler`
namespace:
```
QuantizeHandler
BinaryOpQuantizeHandler
CatQuantizeHandler
ConvReluQuantizeHandler
LinearReLUQuantizeHandler
BatchNormQuantizeHandler
EmbeddingQuantizeHandler
RNNDynamicQuantizeHandler
DefaultNodeQuantizeHandler
FixedQParamsOpQuantizeHandler
CopyNodeQuantizeHandler
GeneralTensorShapeOpQuantizeHandler
CustomModuleQuantizeHandler
StandaloneModuleQuantizeHandler
```

The following classes under the
`torch.ao.quantization.fx.fusion_patterns` namespace are
migrated to the `torch.ao.quantization.fx.fuse_handler`
namespace:
```
DefaultFuseHandler
FuseHandler
```
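
As a hedged sketch, code importing these handler classes would be updated roughly as follows (module paths assume the post-rename layout described above):

```python
# Hedged sketch of an import update after this rename.
# Before:
#   from torch.ao.quantization.fx.quantization_patterns import QuantizeHandler
#   from torch.ao.quantization.fx.fusion_patterns import DefaultFuseHandler
# After:
from torch.ao.quantization.fx.quantize_handler import QuantizeHandler
from torch.ao.quantization.fx.fuse_handler import DefaultFuseHandler

print(QuantizeHandler, DefaultFuseHandler)
```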

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Reviewers: jerryzh168, vkuzo

Subscribers: jerryzh168, vkuzo

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89872
Approved by: https://github.com/jerryzh168
2022-12-01 17:37:07 +00:00
Jerry Zhang
74454bdb46 [quant][fx] Move backend_config folder to torch.ao.quantization
Summary:
Following https://github.com/pytorch/rfcs/blob/master/RFC-0019-Extending-PyTorch-Quantization-to-Custom-Backends.md we implemented the backend configuration for the fbgemm/qnnpack backend. Currently it lives under the fx folder, but we'd like to use it for all workflows, including eager, fx graph, and define-by-run quantization, so this PR moves it to the torch.ao.quantization namespace so that it can be shared by different workflows.
This PR also moves some fx-specific utility functions to fx/backend_config_utils.py; some files are kept in the fx folder (quantize_handler.py and fuse_handler.py).
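
A hedged sketch of how the shared namespace can be consumed; the get_native_backend_config/BackendConfig API shown here is today's torch.ao.quantization.backend_config surface and is an assumption relative to this commit:

```python
# Hedged sketch: inspect the backend configuration from the shared
# torch.ao.quantization namespace rather than an fx-internal location.
from torch.ao.quantization.backend_config import get_native_backend_config

backend_config = get_native_backend_config()
# each entry describes one op/module pattern the fbgemm/qnnpack backend supports
for pattern_config in backend_config.configs[:5]:
    print(pattern_config.pattern)
```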

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
python test/test_quantization.py TestQuantizeFxModels
python test/test_quantization.py TestAOMigrationQuantization
python test/test_quantization.py TestAOMigrationQuantizationFx

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75823

Approved by: https://github.com/vkuzo
2022-04-19 15:38:57 +00:00
Jerry Zhang
508845f2b5 [quant] AO migration of the torch/quantization/quantize_fx.py and torch/quantization/fx/* (#65033)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65033

1. Move the files:
```
hg mv caffe2/torch/quantization/fx caffe2/torch/ao/quantization/fx
hg mv caffe2/torch/quantization/quantize_fx.py caffe2/torch/ao/quantization/quantize_fx.py
```
2. Create new files
```
touch caffe2/torch/quantization/quantize_fx.py
touch caffe2/torch/quantization/fx/__init__.py
```
3. Import the relevant names in the new files.
4. Add tests to test/quantization/ao_migration/test_quantization_fx.py, since we have some fx imports in quantize_fx and fx/*.py.
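
A hedged sketch of what step 3 amounts to: the old location becomes a thin re-export of the new torch.ao module, so both import paths resolve to the same objects (the identity check below is an illustration, not part of this diff):

```python
# Hedged check: the old torch.quantization path re-exports the torch.ao implementation.
from torch.quantization.quantize_fx import prepare_fx as old_prepare_fx
from torch.ao.quantization.quantize_fx import prepare_fx as new_prepare_fx

assert old_prepare_fx is new_prepare_fx
print("torch.quantization.quantize_fx re-exports torch.ao.quantization.quantize_fx")
```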

Test Plan: buck test mode/dev //caffe2/test:quantization

Reviewed By: vkuzo, z-a-f

Differential Revision: D30949749

fbshipit-source-id: 9e5d4d039c8a0a0820bc9040e224f0d2c26886d3
2021-09-22 09:29:15 -07:00
Jerry Zhang
14347d0dd5 [quant][fx][graphmode] Fix a bug for sub (#65109)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65109

Previously we set the dtype for sub from the qconfig since it is matched with a QuantizeHandler. However, this is incorrect: the dtype for sub is decided by whether the output is quantized or not, so we added an is_output_quantized check when deciding the dtype for the output of sub.

Later: is_output_quantized now depends on is_reference, which is pretty confusing and may cause problems down the road; we should remove this dependency in the future.

Test Plan:
python test/test_quantization.py TestQuantizeFx.test_sub_scalar

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D30977826

fbshipit-source-id: 551fd63bd61b43b3c3415944ff73174e3a21cc8a
2021-09-20 10:36:09 -07:00
Jerry Zhang
670853295a [quant][tensorrt] Add tensorrt backend config (#64623)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64623

The config API will change, but we'll add configs gradually for TensorRT to unblock experimentation.

Test Plan:
python torch/fx/experimental/fx2trt/example/unittests.py

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D30800474

fbshipit-source-id: 3c4640de1205a0f19b62943ab84f386d80394ec2
2021-09-14 15:27:33 -07:00
Jerry Zhang
d4a86c1f3b [quant][fx2trt] Add lowering support for reference linear/conv modules (#64368)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64368

Test Plan:
python torch/fx/experimental/fx2trt/example/quantized_resnet_test.py

Imported from OSS

Reviewed By: 842974287

Differential Revision: D30708738

fbshipit-source-id: 88142b7ce43ed96093597112dab03a2d277de993
2021-09-10 22:25:27 -07:00
Jerry Zhang
ef2c9d7d8a [quant][fix] Fix quantization for sub_scalar (#64603)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64603

We'll insert an observer only when both the operator and the dtype are supported.

Test Plan:
python test/test_quantization.py TestQuantizeFx.test_sub_scalar

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D30797025

fbshipit-source-id: a77c21e2749405534fc245374cf33a0657a3d2c8
2021-09-09 17:18:31 -07:00
Zafar Takhirov
9cc44aad21 [quant] AO migration of the quantize.py (resubmission) (#64445)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64445

AO Team is migrating the existing torch.quantization into torch.ao.quantization. We are doing it one file at a time to make sure that the internal callsites are updated properly.
This migrates the quantize.py from torch.quantization to torch.ao.quantization.
At this point both locations will be supported. Eventually the torch.quantization will be deprecated.

Test Plan: `buck test mode/dev //caffe2/test:quantization`

Reviewed By: HDCharles

Differential Revision: D30734870

fbshipit-source-id: dc204f3cc46bff2cc81c95159eab9d333b43bb4b
2021-09-08 04:58:47 -07:00
Zafar Takhirov
046ed57a4d Revert D30055886: [quant] AO migration of the quantize.py
Test Plan: revert-hammer

Differential Revision:
D30055886 (44e3ed88c9)

Original commit changeset: 8ef7470f9fa6

fbshipit-source-id: c5bd3ead43a2d44b9e56872ec5bd7a195bdac725
2021-09-02 16:59:59 -07:00
Jerry Zhang
8f88f797db [quant][graphmode][fx] Add reference quantized conv module (#63828)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63828

Added a reference quantized conv module for the custom backend flow. The reference quantized module will
have the following code:
```
        w(float) -- quant - dequant \
        x(float) ------------- F.conv2d ---
```
In the full model, we will see
```
        w(float) -- quant - *dequant \
        x -- quant --- *dequant --  *F.conv2d --- *quant - dequant
```
and the backend should be able to fuse the ops with `*` into a quantized conv
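
A minimal, hedged sketch of this reference pattern in plain PyTorch (illustrative only; the real reference module carries proper qparams, dtypes, and module structure):

```python
# Hedged sketch of the "w -- quant - dequant -> F.conv2d" reference pattern above.
import torch
import torch.nn.functional as F

def reference_conv2d(x_fp32, w_fp32, bias, w_scale=0.1, w_zero_point=0):
    # weight path: quantize then immediately dequantize, so the conv itself stays a float op
    w_q = torch.quantize_per_tensor(w_fp32, w_scale, w_zero_point, torch.qint8)
    w_dq = w_q.dequantize()
    # a backend can later fuse the surrounding quant/dequant ops into a quantized conv
    return F.conv2d(x_fp32, w_dq, bias)

x = torch.randn(1, 3, 8, 8)
w = torch.randn(4, 3, 3, 3)
b = torch.randn(4)
print(reference_conv2d(x, w, b).shape)  # torch.Size([1, 4, 6, 6])
```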

Test Plan:
python test/test_quantization.py TestQuantizeFx.test_conv_linear_reference

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D30504749

fbshipit-source-id: e1d8c43a0e0d6d9ea2375b8ca59a9c0f455514fb
2021-08-30 14:23:17 -07:00
Zafar Takhirov
44e3ed88c9 [quant] AO migration of the quantize.py (#64086)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64086

AO Team is migrating the existing torch.quantization into torch.ao.quantization. We are doing it one file at a time to make sure that the internal callsites are updated properly.

This migrates the `quantize.py` from torch.quantization to `torch.ao.quantization`.

At this point both locations will be supported. Eventually the torch.quantization will be deprecated.

Test Plan: `buck test mode/opt //caffe2/test:quantization`

Reviewed By: jerryzh168, raghuramank100

Differential Revision: D30055886

fbshipit-source-id: 8ef7470f9fa640c0042bef5bb843e7a05ecd0b9f
2021-08-29 20:30:01 -07:00
Jerry Zhang
0d0605eaa9 [quant][graphmode][fx] Add reference quantized linear module (#63627)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63627

Added a reference quantized linear module for the custom backend flow. The reference quantized module will
have the following code:
```
        w(float) -- quant - dequant \
        x(float) ------------- F.linear ---
```
In the full model, we will see
```
        w(float) -- quant - *dequant \
        x -- quant --- *dequant --  *F.linear --- *quant - dequant
```
and the backend should be able to fuse the ops with `*` into a quantized linear

Test Plan:
python test/test_quantization.py TestQuantizeFx.test_conv_linear_reference

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D30504750

fbshipit-source-id: 5729921745c2b6a0fb344efc3689f3b170e89500
2021-08-27 22:53:24 -07:00
Supriya Rao
294db0603f [quant] Add support for linear_relu fusion for FP16 dynamic quant (#63826)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63826

Support the conversion of the intrinsic LinearReLU module to the quantized dynamic LinearReLU module.
Verify the support works for both linear module and functional linear fusion.

Test Plan:
python test/test_quantization.py test_dynamic_with_fusion

Imported from OSS

Reviewed By: iramazanli

Differential Revision: D30503513

fbshipit-source-id: 70446797e9670dfef7341cba2047183d6f88b70f
2021-08-26 21:12:06 -07:00
Supriya Rao
c7027f19ef [quant][fx] Add support for dynamic linear + relu fusion (INT8) (#63799)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63799

Add a new module that can be used for module swap with the nni.LinearReLU module in the convert function.
Supports INT8 currently (since the FP16 op doesn't have relu fusion yet).

Fixes #55393
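
A hedged end-to-end usage sketch (written against the present-day FX front end with QConfigMapping and example_inputs, which postdates this commit):

```python
# Hedged usage sketch: dynamic INT8 quantization where Linear + ReLU should be
# fused and then swapped for the dynamic LinearReLU module described above.
import torch
from torch.ao.quantization import QConfigMapping, default_dynamic_qconfig
from torch.ao.quantization.quantize_fx import convert_fx, prepare_fx

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(8, 8)
        self.relu = torch.nn.ReLU()

    def forward(self, x):
        return self.relu(self.linear(x))

model = M().eval()
qconfig_mapping = QConfigMapping().set_global(default_dynamic_qconfig)
example_inputs = (torch.randn(1, 8),)
prepared = prepare_fx(model, qconfig_mapping, example_inputs)
quantized = convert_fx(prepared)
print(quantized)  # expect a fused dynamic LinearReLU in place of linear + relu
```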

Test Plan:
python test/test_quantization.py test_dynamic_fusion

Imported from OSS

Reviewed By: heitorschueroff

Differential Revision: D30502812

fbshipit-source-id: 3668e4f001a0626d469e17ac323acf582ee28a51
2021-08-26 21:10:46 -07:00
Jerry Zhang
0301c3bc01 [quant][graphmode][fx] Make maxpool and flatten produce the reference pattern (#63501)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63501

Currently some ops are considered to work with both float and quantized input, so we may have things like
"quant - some_op - dequant"; this might not work well with the backend. We may consider changing everything
to produce "quant - dequant - some_op - quant - dequant" instead in the future. This PR fixes it for maxpool
and flatten only, to unblock resnet benchmarking on TensorRT.

Test Plan:
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: mruberry

Differential Revision: D30402788

fbshipit-source-id: 892c5ff6552775070e2c1453f65846590fb12735
2021-08-24 21:31:01 -07:00
Jerry Zhang
c8527bc398 [quant][graphmode][fx] Add a separate lower_to_native_backend function for relu (#62861)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62861

This PR adds a lower_to_native_backend function to lower a quantized reference model
to a model that uses fbgemm/qnnpack ops. We'll gradually add support and remove
the fbgemm/qnnpack specific handling in quantization_patterns.py

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D30165828

fbshipit-source-id: de1149cd7e7c1840c17c251cd4d35004afd015b7
2021-08-24 21:07:03 -07:00
Jerry Zhang
5b28e3c183 [quant][graphmode][fx] Add reference option support for binary ops (#62698)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62698

We also removed the special handling in match_utils for binary ops

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D30093781

fbshipit-source-id: 58cc972de8211a80dd4d111e25dc4ad36057933f
2021-08-24 18:22:11 -07:00
Jerry Zhang
bcddc71f26 [quant][graphmode][fx][bc-breaking] Support for reference pattern for fixqparam ops in eval mode (#62608)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62608

Insert an extra fixed-qparams fake quant at the output of fixed qparam ops in fbgemm (e.g. sigmoid),
so that we can produce reference patterns for these ops.

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: iramazanli

Differential Revision: D30053978

fbshipit-source-id: c527944b6e791bb4d45ebe96265af52794203695
2021-08-17 14:42:40 -07:00
Jerry Zhang
990c2190d1 [quant][graphmode] Reference pattern support for elu (#62607)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62607

Removing the quantize handler for elu since it can be covered by DefaultNodeQuantizeHandler

Test Plan:
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: iramazanli

Differential Revision: D30053977

fbshipit-source-id: 426789443e928bb01a88907de616cbda5866f621
2021-08-10 14:00:39 -07:00
Jerry Zhang
cb7f35d47a [quant][refactor] Checking activation_dtype instead of activation_post_process (#62489)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62489

Addressing comment from previous PR: https://github.com/pytorch/pytorch/pull/62374#discussion_r679354145

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: iramazanli

Differential Revision: D30053980

fbshipit-source-id: 79c216410282eccd6f0a8f24e38c55c4d18ec0d0
2021-08-10 12:17:36 -07:00
Jerry Zhang
3c1d1170a4 [quant][graphmode][fx] Attach a weight qparam dict to linear and conv in reference quantized model (#62488)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62488

Instead of attaching a weight observer/fake_quant to the float linear and conv modules, we can
compute the quantization parameters and attach them as a dictionary to these modules, so
that we can reduce the model size and make the reference module clearer.

TODO: the numerics for linear and conv in the reference quantized model are still not correct since
we did not quantize the weight; we may explore things like parameterization to implement this support.
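
A hedged illustration of the idea; the `_weight_qparams` attribute name below is hypothetical, not the actual attribute used by convert:

```python
# Hedged sketch: compute weight qparams once with a weight observer and attach
# them as a plain dict instead of keeping the observer module on the float module.
import torch
from torch.ao.quantization import default_weight_observer

linear = torch.nn.Linear(4, 4)
observer = default_weight_observer()  # per-tensor symmetric qint8 MinMaxObserver
observer(linear.weight)
scale, zero_point = observer.calculate_qparams()
# hypothetical attribute name, for illustration only
linear._weight_qparams = {"scale": scale, "zero_point": zero_point, "dtype": torch.qint8}
print(linear._weight_qparams)
```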

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D30053979

fbshipit-source-id: b5f8497cf6cf65eec924df2d8fb10a9e154b8cab
2021-08-09 16:55:14 -07:00
Jerry Zhang
74291c8347 [quant][graphmode][fx] Fix the calls to load_arg in quantization_patterns.py (#62376)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62376

load_arg(quantized=...) accepts a dictionary from index to dtype, not a list of dtypes;
the call is just to make sure the inputs are quantized with the correct dtype.

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: heitorschueroff

Differential Revision: D29979711

fbshipit-source-id: 8499976ac5df8eb2019c3beae573dec6c9a56247
2021-07-29 17:28:07 -07:00
Jerry Zhang
219917706e [quant][graphmode] Add support for reference pattern for default ops (#62375)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62375

Default ops are ops that have one quantized input and one quantized output,
e.g. gelu, silu, leaky_relu, etc., and we need to insert an observer for the output.

Test Plan:
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D29979712

fbshipit-source-id: ed88210a9d6f1ab5cdb9397b4ff7f1628162ef22
2021-07-29 17:27:37 -07:00
Jerry Zhang
8f519c5e07 [quant][graphmode] Add support for reference pattern for torch.cat (#62374)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62374

Test Plan:
python test/test_quantization.py TestQuantizeFxOps.test_cat

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D29979713

fbshipit-source-id: 2d38991f96fcca783169ffd306bc2b0fb7debc69
2021-07-29 16:31:09 -07:00
Jerry Zhang
dc8b5db5f8 [quant][graphmode] relax the constraint for supported_dtypes for reference option (Linear and Conv) (#62348)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62348

Originally we had a supported_dtypes check for linear and conv, but it is only valid for the non-reference option.
This PR removes the constraint when is_reference=True and enables producing reference patterns for dtype
combinations that are not supported by fbgemm/qnnpack, for example qint8 activation dtypes.

Test Plan:
python test/test_quantization.py TestQuantizeFx.test_linear_qint8_activation

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D29968675

fbshipit-source-id: 2abe37940eb62e16fcf0cbb700c174de49719223
2021-07-29 16:31:04 -07:00
Jerry Zhang
cdf85a82ed [quant][graphmode][fx] Add reference pattern support for BatchNorm (#62215)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62215

including batchnorm2d, batchnorm3d, batchnormrelu2d and batchnormrelu3d

Test Plan:
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D29917524

fbshipit-source-id: 3a9520ff659cb21e6e2fe614973b3d08aa0af923
2021-07-28 10:14:16 -07:00
Jerry Zhang
f4baa83eae [bc-breaking] reference option for conv produce a pattern instead of reference conv module (#61942)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61942

This PR changes is_reference=True for conv to produce a pattern consisting of dequant - float conv - quant instead of a reference conv module. This is useful for future transformations to custom backends, and it also helps simplify the implementation of convert in the future.

Test Plan:
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D29810656

fbshipit-source-id: 549237a62bfda4341a2a7474c124f5e33350e267
2021-07-28 09:13:40 -07:00
Jerry Zhang
7507aeded5 [reland][bc-breaking] reference option for linear produce a pattern instead of reference linear module (#61892) (#62277)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62277

This PR changes is_reference=True for linear to produce a pattern consisting of dequant - float linear - quant instead of a reference linear module. This is useful for future transformations to custom backends, and it also helps simplify the implementation of convert in the future.

Test Plan:
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: ejguan

Differential Revision: D29941079

fbshipit-source-id: 84bdfc0bb872c34fc345875e545c8b323e77c41e
2021-07-27 15:46:44 -07:00
Erjia Guan
8cdf16d1de Revert D29810657: [bc-breaking] reference option for linear produce a pattern instead of reference linear module
Test Plan: revert-hammer

Differential Revision:
D29810657 (9df605133e)

Original commit changeset: 949615bbc017

fbshipit-source-id: 54597d1f9636b0f94ae01c66018ff2592e5c39fc
2021-07-27 10:10:13 -07:00
Jerry Zhang
9df605133e [bc-breaking] reference option for linear produce a pattern instead of reference linear module (#61892)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61892

This PR changes is_reference=True for linear to produce a pattern consisting of dequant - float linear - quant instead of a reference linear module. This is useful for future transformations to custom backends, and it also helps simplify the implementation of convert in the future.

Test Plan:
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D29810657

fbshipit-source-id: 949615bbc017bc454d81c8a6b2bdec53badaab19
2021-07-27 09:49:20 -07:00
Jerry Zhang
2d7c1e3fa8 [bc-breaking] Produce quantization pattern for add_scalar and mul_scalar (#61859)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61859

BC-breaking note:
Previously we did not add an observer/fake_quant for the output of add/mul tensor-scalar operations.
In this PR we add an observer/fake_quant instance (the same as the input's) to correctly model
the behavior of the quantized add_scalar and mul_scalar ops (since quantized add/mul scalar assumes the
output quantized tensor has the same quantization parameters as the input).

Test Plan:
python test/test_quantization.py TestQuantizeFxOps.test_add
python test/test_quantization.py TestQuantizeFxOps.test_mul

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D29770859

fbshipit-source-id: f43fcbfecd04c392467770b22c481bbbdaf43c25
2021-07-27 02:46:00 -07:00
Jerry Zhang
457a3fb6d1 [bc-breaking][quant][graphmode][fx] Produce dequant - fp_op - quant pattern for copy nodes (#61763)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61763

This PR changes the is_reference=True option for convert_fx to produce a dequant - fp_op - quant
pattern for copy nodes like maxpool op.

Before the PR:
```
def forward(self, x):
    maxpool2d_input_scale_0 = self.maxpool2d_input_scale_0
    maxpool2d_input_zero_point_0 = self.maxpool2d_input_zero_point_0
    quantize_per_tensor = torch.quantize_per_tensor(x, maxpool2d_input_scale_0, maxpool2d_input_zero_point_0, torch.quint8);  x = maxpool2d_input_scale_0 = maxpool2d_input_zero_point_0 = None
    maxpool2d = self.maxpool2d(quantize_per_tensor);  quantize_per_tensor = None
    dequantize = maxpool2d.dequantize();  maxpool2d = None
    return dequantize
```

After (we expand the maxpool2d that works with quantized input into a "dequant - maxpool2d - quant" pattern):
```
def forward(self, x):
    maxpool2d_input_scale_0 = self.maxpool2d_input_scale_0
    maxpool2d_input_zero_point_0 = self.maxpool2d_input_zero_point_0
    quantize_per_tensor = torch.quantize_per_tensor(x, maxpool2d_input_scale_0, maxpool2d_input_zero_point_0, torch.quint8);  x = maxpool2d_input_scale_0 = maxpool2d_input_zero_point_0 = None
    dequantize = quantize_per_tensor.dequantize();  quantize_per_tensor = None
    maxpool2d = self.maxpool2d(dequantize);  dequantize = None
    maxpool2d_output_scale_0 = self.maxpool2d_output_scale_0
    maxpool2d_output_zero_point_0 = self.maxpool2d_output_zero_point_0
    quantize_per_tensor_1 = torch.quantize_per_tensor(maxpool2d, maxpool2d_output_scale_0, maxpool2d_output_zero_point_0, torch.quint8);  maxpool2d = maxpool2d_output_scale_0 = maxpool2d_output_zero_point_0 = None
    dequantize_1 = quantize_per_tensor_1.dequantize();  quantize_per_tensor_1 = None
    return dequantize_1
```

note that the call to self.maxpool2d is expanded to
```
    dequantize = quantize_per_tensor.dequantize();  quantize_per_tensor = None
    maxpool2d = self.maxpool2d(dequantize);  dequantize = None
    maxpool2d_output_scale_0 = self.maxpool2d_output_scale_0
    maxpool2d_output_zero_point_0 = self.maxpool2d_output_zero_point_0
    quantize_per_tensor_1 = torch.quantize_per_tensor(maxpool2d, maxpool2d_output_scale_0, maxpool2d_output_zero_point_0, torch.quint8);  maxpool2d = maxpool2d_output_scale_0 = maxpool2d_output_zero_point_0 = None
```

Test Plan:
```
python test/test_quantization.py TestQuantizeFx.test_copy_node_has_shared_actpp_instance
```

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D29728900

fbshipit-source-id: cf2c7f1f6659e3ba97cbb7c6204dd13983da10bd
2021-07-25 19:49:13 -07:00
Jerry Zhang
cc263ef795 [bc-breaking][quant][graphmode][fx] Add observer/fake_quant for copy nodes (#61687)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61687

Previously we did not insert an observer/fake_quant for the output of copy nodes (e.g. maxpool).
But to produce reference patterns we need to insert an observer/fake_quant for the output and later convert that to a quantize
node.

Model:
```
class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.maxpool2d = torch.nn.MaxPool2d(kernel_size=3)

    def forward(self, x):
        x = self.maxpool2d(x)
        return x
```
result of prepare:

Before:
def forward(self, x):
    x_activation_post_process_0 = self.x_activation_post_process_0(x);  x = None
    maxpool2d = self.maxpool2d(x_activation_post_process_0);  x_activation_post_process_0 = None
    return maxpool2d

After:
def forward(self, x):
    x_activation_post_process_0 = self.x_activation_post_process_0(x);  x = None
    maxpool2d = self.maxpool2d(x_activation_post_process_0);  x_activation_post_process_0 = None
    maxpool2d_activation_post_process_0 = self.maxpool2d_activation_post_process_0(maxpool2d);  maxpool2d = None
    return maxpool2d_activation_post_process_0

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D29715566

fbshipit-source-id: 817df9b2933a35cad5331d8d8ce1c5ba0752e9df
2021-07-23 21:29:37 -07:00
Charles David Hernandez
32d0c3e8ee Support for reference convert_fx working on gpu
Summary:
This PR enables GPU-only quantization, best used with is_reference since
there are not many GPU kernels for quantized ops as of now.

This PR mainly changes how qconfigs and their observer constructors operate once they are
attached to a module's qconfig. The function add_module_to_qconfig_obs_ctr takes the observer constructors on the original
qconfig and configures them so that, when invoked, the created observers will
be on whatever device the module occupies. (Once observers are created,
module.to(device) is already set up so that it moves any observers.) To do this,
a new method and a few small changes were added to the _PartialWrapper class that
our observers already use to create constructors (without changing the
existing functionality). These changes work in
concert with changes to the prepare flow such that when the qconfigs are
propagated to the modules (in quantize.py and qconfig_utils.py) they are configured using add_module_to_qconfig_obs_ctr.

Ideally this would work on other models, but the is_reference support for
a lot of modules isn't there yet; those tests should be added in a
future PR.
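
A hedged sketch of the observable effect (eager-mode prepare shown for brevity; a CUDA device is required to see observers land on the GPU):

```python
# Hedged sketch: after this change, observers created from a module's qconfig are
# placed on the same device as the module, so GPU-only flows can prepare correctly.
import torch
from torch.ao.quantization import get_default_qconfig, prepare

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = torch.nn.Conv2d(3, 3, 1)

    def forward(self, x):
        return self.conv(x)

if torch.cuda.is_available():
    model = M().cuda().eval()
    model.qconfig = get_default_qconfig("fbgemm")
    prepared = prepare(model)
    observer = prepared.conv.activation_post_process
    print(type(observer).__name__, observer.min_val.device)  # expect a cuda device
```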

Test Plan:
python test/test_quantization.py TestQuantizeFxModels.test_static_gpu_convert_basic

python test/test_quantization.py TestQuantizeFxModels.test_switch_device_prepare_convert

python test/test_quantization.py TestQuantizeFxModels.test_prepare_serialize_switch_device_convert

python test/test_quantization.py TestQuantizeFx.test_qconfig_precedence

Reviewed By: vkuzo

Differential Revision: D29684114

fbshipit-source-id: 19fefb8e1998eaf212723e836276ccf39467f2e7
2021-07-23 10:30:38 -07:00
Vasiliy Kuznetsov
a70505cdbd ns for fx: support comparing fp32 vs fp32_prepared, except shadowed (#61129)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61129

Adds support for comparing an fp32 model (without quantization) to an
fp32 model prepared with quantization. The main missing feature was
handling conv-bn fusion, since this fusion for PTQ happens outside
of quantization patterns.

Adds testing for this case for comparing weights and comparing activations.

Adds a TODO for also handling this for shadow activations; we need to
first stop removing observers in graph passes before we can add
this support, which will be done in a future PR.
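
A hedged sketch of the comparison this enables; the torch.ao.ns._numeric_suite_fx.extract_weights call below reflects the current FX numeric suite API and is an assumption relative to this commit:

```python
# Hedged sketch: compare weights between an fp32 model and the same model prepared
# with quantization (the conv-bn fusion during prepare is what this PR handles).
import copy

import torch
import torch.ao.ns._numeric_suite_fx as ns
from torch.ao.quantization import QConfigMapping, get_default_qconfig
from torch.ao.quantization.quantize_fx import prepare_fx

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = torch.nn.Conv2d(3, 3, 3)
        self.bn = torch.nn.BatchNorm2d(3)

    def forward(self, x):
        return self.bn(self.conv(x))

model_fp32 = M().eval()
example_inputs = (torch.randn(1, 3, 8, 8),)
qconfig_mapping = QConfigMapping().set_global(get_default_qconfig("fbgemm"))
model_prepared = prepare_fx(copy.deepcopy(model_fp32), qconfig_mapping, example_inputs)

# extract_weights matches layers across the two graphs, including the fused conv-bn
results = ns.extract_weights("fp32", model_fp32, "fp32_prepared", model_prepared)
print(list(results.keys()))
```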

Test Plan:
```
python test/test_quantization.py TestFXGraphMatcherModels.test_mobilenet_v2
python test/test_quantization.py TestFXGraphMatcherModels.test_mobilenet_v2_qat
python test/test_quantization.py TestFXNumericSuiteCoreAPIsModels.test_compare_activations_conv
```

Imported from OSS

Reviewed By: raghuramank100

Differential Revision: D29520009

fbshipit-source-id: f63484a998f1424bd9cacf5d823b82b2edfea1ae
2021-07-17 20:52:23 -07:00
Jerry Zhang
4a3eea9a6a [quant][graphmode][fx] Produce reference linear module in convert (#60152)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60152

Test Plan:
python test/test_quantization.py TestQuantizeFx

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D29188263

fbshipit-source-id: f7bbbef5d4d747eadf7a627a4e77a5ec9bb0bc94
2021-06-20 20:08:12 -07:00
Jerry Zhang
2293ab4e53 [quant][graphmode][fx] Refactor convert for linear to use get_static_module_mapping and get_dynamic_module_mapping (#60151)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60151

Test Plan:
```
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
```

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D29188264

fbshipit-source-id: d2b77ffcf4b7446fc6c43248e43218092d2a6aea
2021-06-20 19:41:16 -07:00
Jerry Zhang
47d727fe1b [quant][graphmode][fx] Produce conv reference static quant modules (#60138)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60138

Test Plan:
python test/test_quantization.py TestQuantizeFx

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D29184791

fbshipit-source-id: 971a40012dbba0cf687c62a3a4af9358513c253b
2021-06-20 19:25:45 -07:00
Jerry Zhang
a029422cae [quant][graphmode][fx][refactor] Change the env map to add dtype as a key (#60054)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60054

Previously, env in convert was Dict[str, Tuple[Node, torch.dtype]]; that is, at any given time each node could only have one dtype.
This causes a problem for the following case:
```
class M(torch.nn.Module):
        def __init__(self):
            super().__init__()
            self.conv = nn.Conv2d(1, 1, 1)

        def forward(self, x):
            x = self.conv(x)
            x1 = x.expand_as(x)
            x2 = torch.add(x, x1)
            return x2

def forward(self, x):
    x = self.activation_post_process_0(x)
    x = self.conv(x)
    x = self.activation_post_process_1(x)
    x1 = x.expand_as(x)
    x1 = self.activation_post_process_2(x1)
    x2 = torch.add(x, x1)
    x2 = self.activation_post_process_3(x2)
    return x2

def forward(self, x):
    x = torch.quantize_per_tensor(x, ...)
    x = self.conv(x)  # quantized conv
    x = torch.dequantize(x)
    x1 = x.expand_as(x)
    x1 = torch.quantize_per_tensor(x1, ...)
    # Error: x is dequantized
    x2 = torch.ops.quantized.add(x, x1)
    return x2

```

Currently we have an env that is a map from node name in the observed graph to the Node in the quantized graph. The problem is that, following the quantized conv operator, we have two consumers: one expecting float input (expand_as) and one expecting quantized input (quantized add). In the quantized graph, expand_as should ideally consume the dequantized output, and quantized add should consume the quantized output:

    quantized_conv - dequantize - expand_as
      \ ------- quantized_add

But currently each node in env can only be either quantized or not quantized. Therefore we need to change env to include the dtype as well:
env: Dict[str, Dict[dtype, Node]], e.g. {'x': {torch.float: dequantized_node, torch.quint8: quantized_node}}
And when we load from env, we also need to provide the dtype of the Node we want to load. We can have a separate pass to figure out this information for each node.
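
A tiny, hedged illustration of the new env shape (real code stores torch.fx Nodes; strings are used as stand-ins here):

```python
# Hedged illustration of env: Dict[str, Dict[dtype, Node]] as described above.
import torch

env = {
    "x": {
        torch.float: "dequantize_node",   # consumed by expand_as
        torch.quint8: "quantized_node",   # consumed by quantized add
    },
}

def load_from_env(name, dtype):
    # callers now also specify which dtype version of the node they need
    return env[name][dtype]

print(load_from_env("x", torch.quint8))
print(load_from_env("x", torch.float))
```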

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D29149408

fbshipit-source-id: c9e4b7d65444ab6a6f573929bae1db5037629892
2021-06-18 13:31:43 -07:00
Jerry Zhang
3218d890dd [quant][graphmode][fx][fix] Fix support for custom module (#59041)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59041

Static quantization support for custom modules was removed in a previous refactor
(https://github.com/pytorch/pytorch/pull/57519) since it was not covered by the test case.
This PR re-enables the test case and fixes the support.

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D28724866

fbshipit-source-id: 1974675b88b56a2173daf86965d6f3fb7ebd783b
2021-06-01 22:31:15 -07:00
Jerry Zhang
06af7618e7 [quant][graphmode][fx][refactor] Remove Quantizer class from convert (QuantizeHandler) (#59040)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59040

To remove the Quantizer class and split the prepare and convert functions into different files

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D28724870

fbshipit-source-id: c0f748711b825cd46bdfcc05c054c77a41e8207a
2021-06-01 22:00:49 -07:00
Jerry Zhang
50e6ee3ca2 [quant][graphmode][fx][refactor] Remove Quantizer class from quantize_node (#59039)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59039

To remove the Quantizer class and split the prepare and convert functions into different files

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D28724874

fbshipit-source-id: bd984716b2da1d6879c3e92fa827574783a41567
2021-06-01 21:40:08 -07:00
Jerry Zhang
20348fb32e [quant][graphmode][fx][refactor] Remove find_matches from Quantizer class (#59037)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59037

To remove the Quantizer class and split the prepare and convert functions into different files

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D28724865

fbshipit-source-id: 6c6824d0af7dd47d4c111d6a08e373bc65f33e08
2021-06-01 16:07:07 -07:00
Jerry Zhang
e4b2684331 [quant][graphmode][fx][refactor] Remove patterns from Quantizer class (#59033)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59033

To remove the Quantizer class and split the prepare and convert functions into different files

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D28724861

fbshipit-source-id: 97b38e851b6bf581510a24636b1d8d6f1d977f5a
2021-06-01 13:44:08 -07:00
Jerry Zhang
83892c1861 [quant][graphmode][fx][refactor] Remove node_name_to_scope from Quantizer (#59032)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59032

To remove the Quantizer class and split the prepare and convert functions into different files

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D28724868

fbshipit-source-id: 6df639f20076b480812b6dcf0fc7d2c87ca29d8b
2021-06-01 13:26:09 -07:00
Jerry Zhang
3826f7e8e0 [quant][graphmode][fx][refactor] Remove quantized_graph from Quantizer (#59031)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59031

Trying to remove the Quantizer class and split the prepare and convert code

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D28724871

fbshipit-source-id: dad0332ba271c4cfb6ec1e8f2036443149b5bea4
2021-06-01 13:01:54 -07:00
Jerry Zhang
1b4586ee20 [quant][fx][graphmode][refactor] Remove modules from Quantizer (#59030)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59030

Trying to remove the Quantizer class and split the prepare and convert code

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D28724875

fbshipit-source-id: d6610c1d5eb7755331252be9e348a230abf4175c
2021-06-01 12:42:28 -07:00