Commit Graph

45 Commits

Author SHA1 Message Date
Justin Chu
c0d8a4af0a [BE] Enable ruff's UP rules and autoformat ao/ (#105430)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105430
Approved by: https://github.com/albanD, https://github.com/malfet
2023-07-19 13:44:37 +00:00
HDCharles
8176cd8c0f [ao] fixing quantized prelu workflow (#103455)
Summary: https://github.com/pytorch/pytorch/issues/100654 noticed that PReLU
was not running its observers when the quantization flow was run. This was
a bug, which is now fixed, and the relevant PReLU tests now check for this.
Also added a corrected observer for PReLU to qconfig_mapping.

Test Plan: python test/test_quantization.py TestStaticQuantizedModule.test_prelu

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103455
Approved by: https://github.com/jerryzh168
2023-06-23 16:45:40 +00:00
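The class of bug fixed here can be sketched in pure Python (all names below are hypothetical stand-ins, not the actual PyTorch code): a quantization-prep module whose forward pass skips its attached observer collects no calibration statistics.

```python
# Illustrative sketch of the observer-skipping bug. CountingObserver and
# PReLUWithObserver are hypothetical stand-ins for the real torch.ao classes.

class CountingObserver:
    """Records how many tensors it has seen (stand-in for a real observer)."""
    def __init__(self):
        self.call_count = 0

    def __call__(self, x):
        self.call_count += 1
        return x  # a real observer would also update min/max statistics here


class PReLUWithObserver:
    def __init__(self, observer):
        self.activation_post_process = observer

    def forward_buggy(self, x):
        # Bug: output returned without routing it through the observer.
        return max(x, 0.25 * x)

    def forward_fixed(self, x):
        # Fix: observe the output so calibration statistics are collected.
        return self.activation_post_process(max(x, 0.25 * x))


obs = CountingObserver()
mod = PReLUWithObserver(obs)
mod.forward_buggy(-1.0)
assert obs.call_count == 0  # observer silently skipped
mod.forward_fixed(-1.0)
assert obs.call_count == 1  # observer now runs during calibration
```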
leslie-fang-intel
9832cfbbfe Quantization oneDNN backend only support VNNI CPU (#103653)
**Summary**

- Update the quantization documentation to recommend that the default qconfig with the oneDNN backend be used on CPUs with Vector Neural Network Instruction (VNNI) support.
- Add a warning message when a user uses the default qconfig with the oneDNN backend on a CPU without VNNI support.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103653
Approved by: https://github.com/jgong5, https://github.com/malfet
2023-06-19 09:50:07 +00:00
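The warning logic this commit describes can be sketched as follows. This is a hedged illustration, not the real implementation: `get_default_qconfig_sketch` and the `cpu_supports_vnni` flag are hypothetical, and PyTorch queries CPU capabilities internally rather than taking them as a parameter.

```python
import warnings

def get_default_qconfig_sketch(backend, cpu_supports_vnni):
    # Sketch: warn when the oneDNN default qconfig is requested on a CPU
    # without VNNI support, as the commit summary describes.
    if backend == "onednn" and not cpu_supports_vnni:
        warnings.warn(
            "Default qconfig of the oneDNN backend may have accuracy issues "
            "on CPUs without Vector Neural Network Instruction support."
        )
    return {"backend": backend}  # placeholder for a real QConfig object

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    get_default_qconfig_sketch("onednn", cpu_supports_vnni=False)
assert len(caught) == 1  # warning emitted on non-VNNI CPU

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    get_default_qconfig_sketch("onednn", cpu_supports_vnni=True)
assert len(caught) == 0  # no warning when VNNI is available
```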
Jerry Zhang
f7c736e1e7 [quant][pt2e] Add observer_or_fake_quant_ctr to QuantizationSpec (#101920)
Summary:
This is the second refactor to align the annotation API with the design;
the next step is to change prepare_pt2e to consume QuantizationSpec objects directly.

Test Plan:
```
buck2 test mode/opt caffe2/test:quantization_pt2e -- --exact 'caffe2/test:quantization_pt2e - test_resnet18_with_quantizer_api (quantization.pt2e.test_quantize_pt2e.TestQuantizePT2EModels)'
```

Reviewed By: kimishpatel

Differential Revision: D45927416

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101920
Approved by: https://github.com/andrewor14
2023-05-23 05:48:23 +00:00
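The idea behind the change can be sketched as a record that carries a constructor ("ctr") for the observer or fake-quant module alongside the quantization parameters; `prepare` then instantiates the observer from it. Field and class names below are illustrative, not the exact pt2e API.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class QuantizationSpecSketch:
    # Hypothetical mirror of a QuantizationSpec-like record.
    dtype: str
    quant_min: int
    quant_max: int
    qscheme: str
    observer_or_fake_quant_ctr: Callable  # called later by prepare to build the observer


class MinMaxObserverStub:
    """Stand-in for a real observer class."""
    def __init__(self):
        self.min_val, self.max_val = float("inf"), float("-inf")


spec = QuantizationSpecSketch(
    dtype="qint8",
    quant_min=-128,
    quant_max=127,
    qscheme="per_tensor_affine",
    observer_or_fake_quant_ctr=MinMaxObserverStub,
)

# prepare-like code instantiates the observer from the spec's constructor:
observer = spec.observer_or_fake_quant_ctr()
assert isinstance(observer, MinMaxObserverStub)
```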
Aaron Gokaslan
1e2d82b8e4 [BE] Merge isinstance calls together (#94419)
Simplifies and speeds up isinstance calls by checking for multiple types at the same time.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94419
Approved by: https://github.com/ezyang
2023-02-09 00:47:26 +00:00
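The pattern this commit applies: collapse chained `isinstance` checks into a single call with a tuple of types, which is both shorter and slightly faster because the function is dispatched once.

```python
def is_number_chained(x):
    # Before: one isinstance call per type.
    return isinstance(x, int) or isinstance(x, float) or isinstance(x, complex)

def is_number_merged(x):
    # After: a single isinstance call with a tuple of types.
    return isinstance(x, (int, float, complex))

# The two forms are semantically identical:
for value in (1, 2.5, 1j, "no", None):
    assert is_number_chained(value) == is_number_merged(value)
```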
Jerry Zhang
59c1b5025f [quant][fx][pt2e] Refactor prepare so it's aligned better with the new API plan in pt2e (#94011)
Summary:
There are three things that happen in the current prepare code:
(1). the user expresses their intent for how the model should be quantized with QConfigMapping, which we translate to
node.meta["target_dtype_info"]
(2). we validate the settings against BackendConfig
(3). we insert observers based on the validated node.meta["target_dtype_info"]

Previously (2) and (3) were mixed together; this PR moves (2) closer to (1), with one edge case left. This refactor
moves us closer to our target design for quantization in the PyTorch 2.0 export path.

this is a follow up PR for https://github.com/pytorch/pytorch/pull/92641

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
python test/test_quantization.py TestQuantizeFxModels

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94011
Approved by: https://github.com/vkuzo
2023-02-07 08:23:56 +00:00
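The three-phase split described above can be sketched as a tiny pipeline. All names here are hypothetical stand-ins for the real FX prepare machinery: (1) translate user intent into per-node metadata, (2) validate that metadata against the backend's supported configs, (3) insert observers based only on the validated metadata.

```python
SUPPORTED_DTYPES = {"quint8", "qint8"}  # stand-in for a BackendConfig

def annotate(nodes, qconfig_mapping):
    # Step (1): user intent -> node.meta["target_dtype_info"]
    for node in nodes:
        node["meta"] = {"target_dtype_info": qconfig_mapping.get(node["name"])}
    return nodes

def validate(nodes):
    # Step (2): drop settings the backend cannot honor.
    for node in nodes:
        if node["meta"]["target_dtype_info"] not in SUPPORTED_DTYPES:
            node["meta"]["target_dtype_info"] = None
    return nodes

def insert_observers(nodes):
    # Step (3): observers placed purely from validated metadata.
    return [n["name"] for n in nodes if n["meta"]["target_dtype_info"] is not None]

nodes = [{"name": "conv"}, {"name": "relu"}]
observed = insert_observers(validate(annotate(nodes, {"conv": "quint8", "relu": "float16"})))
assert observed == ["conv"]  # float16 rejected by the backend, so relu is not observed
```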
XiaobingSuper
4bae860813 quantization: make x86 as default backend (#88799)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88799
Approved by: https://github.com/kit1980
2022-12-01 02:09:54 +00:00
Vasiliy Kuznetsov
22a1b5e243 quantization: deprecate observer compute_dtype and replace with is_dynamic (#85431)
Summary:

This PR deprecates the `compute_dtype` field on observers, and replaces
it with the `is_dynamic` field on observers.  This is better aligned
with the reference model spec.

Test plan:

```
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/85431
Approved by: https://github.com/jerryzh168
2022-11-24 07:07:34 +00:00
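The shape of the deprecation can be sketched as follows (a hedged illustration; `ObserverSketch` is hypothetical, not the torch.ao observer class): instead of encoding "dynamic" through a special `compute_dtype` value, the observer carries an explicit `is_dynamic` boolean, with the old spelling mapped onto the new field during the deprecation window.

```python
class ObserverSketch:
    def __init__(self, dtype, is_dynamic=False, compute_dtype=None):
        self.dtype = dtype
        if compute_dtype is not None:
            # Old spelling: a compute_dtype implied dynamic quantization.
            # Keep it working by mapping it onto the new field.
            is_dynamic = True
        self.is_dynamic = is_dynamic

old_style = ObserverSketch(dtype="quint8", compute_dtype="quint8")
new_style = ObserverSketch(dtype="quint8", is_dynamic=True)
assert old_style.is_dynamic == new_style.is_dynamic == True
```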
peterjc123
60e59c0755 Fix get_default_qat_qconfig for PT 1.13 (#88876)
See https://github.com/pytorch/pytorch/pull/84329/files#r1019916766 for more context

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88876
Approved by: https://github.com/jgong5, https://github.com/vkuzo
2022-11-15 06:36:24 +00:00
HDCharles
6fe4ccc7cb [ao] qconfig.py fix public v private (#87515)
Summary: made is_reuse_input_qconfig, _activation_is_memoryless,
_partial_wrapper_equals, _obs_or_fq_ctr_equals,
_add_module_to_qconfig_obs_ctr, _assert_valid_qconfig private

Test Plan: python test/test_public_bindings.py

Differential Revision: [D40709280](https://our.internmc.facebook.com/intern/diff/D40709280)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/87515
Approved by: https://github.com/jcaip
2022-11-09 22:30:03 +00:00
Jerry Zhang
4caddac534 [quant][api] Add assert for backend in get_default_qconfig related apis (#86259) (#87331)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86259

Add assertion to make sure backend is one of "fbgemm", "x86", "qnnpack" and "onednn"
for get_default_qconfig, get_default_qat_qconfig, get_default_qconfig_mapping and get_default_qat_qconfig_mapping

Test Plan:
python test/test_quantization.py -k test_get_default_qconfig_mapping

Imported from OSS

Reviewed By: jcaip

Differential Revision: D40236474

Pull Request resolved: https://github.com/pytorch/pytorch/pull/87331
Approved by: https://github.com/andrewor14
2022-10-21 16:57:35 +00:00
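A minimal sketch of the validation this commit adds, assuming the set of valid names from the summary above (the function itself is a hypothetical stand-in for the real `get_default_qconfig` family): reject unknown backend strings up front instead of silently returning a wrong qconfig.

```python
VALID_BACKENDS = ("fbgemm", "x86", "qnnpack", "onednn")

def get_default_qconfig_checked(backend):
    # Fail fast on unsupported backend strings.
    assert backend in VALID_BACKENDS, (
        f"backend: {backend} not supported. backend must be one of {VALID_BACKENDS}"
    )
    return {"backend": backend}  # placeholder for the real QConfig

assert get_default_qconfig_checked("fbgemm")["backend"] == "fbgemm"
try:
    get_default_qconfig_checked("cuda")
except AssertionError:
    pass  # unsupported backend correctly rejected
else:
    raise RuntimeError("expected an AssertionError for an unsupported backend")
```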
Jerry Zhang
8a47a49d5e [quant] Move the order of x86 engine to avoid changing the default qengine (#86631)
Since the default qengine is the last element of the supported_engines list, adding the x86 qengine at the end of the list changed the default quantized engine as well. This PR is a short-term fix to revert that change. We have an issue here to track the proper fix: https://github.com/pytorch/pytorch/issues/86404

Motivation:
A Meta internal team found that inference failed in oneDNN prepacking with the error "could not create a primitive descriptor for a reorder primitive" on a COPPER_LAKE machine. We are working with Intel to reproduce and fix the problem; in the meantime, we'll revert the default option back to fbgemm.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86631
Approved by: https://github.com/vkuzo
2022-10-11 00:07:41 +00:00
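The footgun described above can be sketched directly (a hedged illustration of the list behavior, not the real `supported_engines` API): if the default qengine is picked as the last element of the list, appending a new engine silently changes the default, while inserting it earlier preserves it.

```python
def default_engine(supported_engines):
    # The behavior this commit works around: default = last element.
    return supported_engines[-1]

engines = ["qnnpack", "onednn", "fbgemm"]
assert default_engine(engines) == "fbgemm"

# Appending x86 at the end would flip the default:
assert default_engine(engines + ["x86"]) == "x86"

# The short-term fix: insert x86 before fbgemm so the default stays fbgemm.
engines.insert(engines.index("fbgemm"), "x86")
assert default_engine(engines) == "fbgemm"
```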
HDCharles
facf210f9a [ao] fixing public v private for qconfig.py (#86026)
Summary: no changes; just removed the exception for this file, since someone
had already fixed the actual file

Test Plan: python test/test_public_bindings.py

Pull Request resolved: https://github.com/pytorch/pytorch/pull/86026
Approved by: https://github.com/jerryzh168
2022-10-06 21:42:44 +00:00
Xia, Weiwen
4b86a9359a [Quant] Make x86 backend default when querying qconfig (#85461)
This PR is a follow-up of #84329 [[Quant] Add unified x86 quant backend](https://github.com/pytorch/pytorch/pull/84329)
It makes `x86` the default backend when querying `qconfig`. Users get x86's qconfig/qconfig_mappings if the backend is not specified.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85461
Approved by: https://github.com/jgong5, https://github.com/vkuzo
2022-09-30 23:44:45 +00:00
andrewor14
24fc680ee4 [Quant] Enable XNNPACK ops in QNNPACK BackendConfig (#85863)
**Summary:** This commit enforces the following constraints on the
QNNPACK BackendConfig:

- `quant_min_lower_bound` = -127 for qint8 weight
- `quant_max_upper_bound` = 127 for qint8 weight
- `scale_min_lower_bound` = 2 ** -12 for qint8 activations and weight

These constraints will enable users to use this BackendConfig with
faster XNNPACK quantized ops. They are also consistent with the
existing settings in `default_symmetric_qnnpack_qconfig` and its
per_channel and QAT variants. For more detail on why these exact
values were chosen, please see the description of
https://github.com/pytorch/pytorch/pull/74396.

Note that there are currently no restrictions on the qscheme in
DTypeConfig. This should be added in the future to further enforce
the restriction that the weights must be quantized with either
per_tensor_symmetric or per_channel_symmetric.

Existing default QConfigs such as `get_default_qconfig("qnnpack")`
and `get_default_qat_qconfig("qnnpack")` will continue to be
supported, but only for the existing dtypes, e.g. quint8 activations
for weighted ops like linear and conv. In the future, we should
revisit whether to enable XNNPACK ops using these QConfigs as well.

**Test Plan:**

python test/test_quantization.py TestQuantizeFx.test_qnnpack_backend_config

**Reviewers:** jerryzh168, vkuzo

**Subscribers:** jerryzh168, vkuzo
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85863
Approved by: https://github.com/jerryzh168
2022-09-30 22:53:38 +00:00
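The three constraints listed above can be checked with a small pure-Python sketch (illustrative only; `quantize_weight` is hypothetical and mirrors the constraints, not the real fake-quant implementation): qint8 weight values restricted to [-127, 127] and scales bounded below by 2\*\*-12.

```python
QUANT_MIN, QUANT_MAX = -127, 127   # quant_min_lower_bound / quant_max_upper_bound
SCALE_MIN = 2 ** -12               # scale_min_lower_bound

def quantize_weight(w, scale, zero_point=0):
    # Enforce the scale lower bound, then quantize and clamp to the
    # restricted qint8 range (the value -128 is excluded).
    scale = max(scale, SCALE_MIN)
    q = round(w / scale) + zero_point
    return max(QUANT_MIN, min(QUANT_MAX, q))

assert quantize_weight(-100.0, 0.5) == -127          # clamped at the restricted minimum
assert quantize_weight(0.01, 1e-9) == 41             # tiny scale raised to 2**-12: 0.01 / 2**-12 ~= 40.96
```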
Xia, Weiwen
3a3e2002d8 [Quant] Add unified x86 quant backend (#84329)
## Description

Implement the unified quantization backend 'X86' for x86 platforms. It combines the advantages of FBGEMM and ONEDNN: it selects kernels during weight prepacking and hides the details from end users. It will be the default backend in place of FBGEMM.

For details, please refer to this RFC: [[RFC] Unified quantization backend for x86 CPU platforms](https://github.com/pytorch/pytorch/issues/83888)

## Validation
**Correctness**
Covered by UT

**Accuracy**
By running torchvision models on imagenet, no accuracy difference is found between FBGEMM and the unified X86 backend:
[torchvision_accuracy_comparison_fbgemm_vs_x86.xlsx](https://github.com/pytorch/pytorch/files/9598114/torchvision_accuracy_comparison_fbgemm_vs_x86.xlsx)

**Performance**
Depends on https://github.com/pytorch/pytorch/pull/84470 which improves performance.
For early PoC results, please refer to https://github.com/pytorch/pytorch/files/9399202/unified_qengine_poc_performance_bechmark.xlsx

With the two PRs combined, we collected some data on Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz
Method: Run multi-instances with 4 cores per instance on whole socket. Using JeMalloc and Intel OMP.
Models/throughput | fbgemm | x86 | improvement
-- | -- | -- | --
wide_resnet101_2 | 173.5675 | 241.815 | 39.32%
resnext101_32x8d | 174.365 | 339.8175 | 94.89%
resnet50 | 573.155 | 1174.14 | 104.86%
vgg19_bn | 260.335 | 337.92 | 29.80%
vgg19 | 257.935 | 333.265 | 29.21%
inception_v3 | 601.1175 | 1309.33 | 117.82%
densenet161 | 296.645 | 435.5625 | 46.83%
mnasnet1_0 | 1216.7 | 4057.515 | 233.49%
squeezenet1_0 | 1220.085 | 5153.3875 | 322.38%
alexnet | 2294.91 | 2624.6375 | 14.37%
fbnetc_100 | 976.2825 | 3110.1825 | 218.57%
shufflenet_v2_x0_5 | 1555.76 | 3026.125 | 94.51%
spnasnet_100 | 1059.065 | 3502.0975 | 230.68%
pytorch-unet | 192.76 | 246.77 | 28.02%
acgan | 257.32 | 333.7325 | 29.70%
cgan | 7790.6925 | 7803.1025 | 0.16%
sgan | 257.565 | 338.8875 | 31.57%
se_resnet50 | 492.3725 | 916.5175 | 86.14%
vggm | 300.2875 | 316.2075 | 5.30%

Environment:
- PyTorch version: 1.13.0a0+gitcdd625b
- Is debug build: False
- CUDA used to build PyTorch: None
- ROCM used to build PyTorch: N/A
- OS: Ubuntu 20.04.3 LTS (x86_64)
- GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
- Clang version: Could not collect
- CMake version: version 3.22.5
- Libc version: glibc-2.31
- Python version: 3.9.12 (main, Jun  1 2022, 11:38:51)  [GCC 7.5.0] (64-bit runtime)
- Python platform: Linux-5.11.0-27-generic-x86_64-with-glibc2.31
- Is CUDA available: False
- CUDA runtime version: No CUDA
- GPU models and configuration: No CUDA
- Nvidia driver version: No CUDA
- cuDNN version: No CUDA
- HIP runtime version: N/A
- MIOpen runtime version: N/A
- Is XNNPACK available: True

Versions of relevant libraries:
- [pip3] intel-extension-for-pytorch==1.13.0+cpu
- [pip3] numpy==1.23.3
- [pip3] pytorch-widedeep==0.3.7
- [pip3] torch==1.13.0a0+git48b423b
- [pip3] torchvision==0.14.0a0+ebb68f3
- [conda] blas                      1.0                         mkl
- [conda] intel-extension-for-pytorch 1.13.0+cpu               pypi_0    pypi
- [conda] mkl                       2021.4.0           h06a4308_640
- [conda] mkl-include               2022.1.0                 pypi_0    pypi
- [conda] mkl-service               2.4.0            py39h7f8727e_0
- [conda] mkl-static                2022.1.0                 pypi_0    pypi
- [conda] mkl_fft                   1.3.1            py39hd3c417c_0
- [conda] mkl_random                1.2.2            py39h51133e4_0
- [conda] numpy                     1.23.3                   pypi_0    pypi
- [conda] numpy-base                1.22.3           py39hf524024_0
- [conda] torch                     1.13.0a0+git48b423b          pypi_0    pypi
- [conda] torchvision               0.14.0a0+ebb68f3          pypi_0    pypi

Pull Request resolved: https://github.com/pytorch/pytorch/pull/84329
Approved by: https://github.com/jerryzh168
2022-09-29 00:44:40 +00:00
Vasiliy Kuznetsov
09965957cd quantization: align observer dtype with reference model spec (#85345)
Summary:

Before this PR, the `dtype` attribute of observers was not clearly
defined.  It originally meant `interface_dtype` in the eager mode
workflow, which is how the codebase before this PR was using it.

In the new reference model spec, `dtype` attribute of an observer
represents the `dtype` value which needs to be passed into a `quantize`
function in the reference model spec. This PR aligns the codebase
to this definition of dtype.  In detail:
1. change util functions to interpret `dtype` using the reference model definition
2. change `prepare` to interpret `dtype` using the reference model definition
3. change observers for dynamic quantization to interpret `dtype` using the reference
   model definition.

A future PR (left out of this one to keep LOC small) will deprecate the
`compute_dtype` field and instead expose `is_dynamic` on observers.

Test plan:

```
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
```

Differential Revision: [D39675209](https://our.internmc.facebook.com/intern/diff/D39675209)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85345
Approved by: https://github.com/z-a-f, https://github.com/jerryzh168
2022-09-21 06:34:26 +00:00
Jerry Zhang
446edadd95 [quant][fx] Follow up fixes for qconfig validations for fixedqparams ops (#81010)
Summary:
This adds a few things on top of https://github.com/pytorch/pytorch/pull/80184:
1). node.target was assumed to be "tanh", torch.nn.Tanh, etc.; this PR handles that properly
2). adds FixedQParamsFakeQuantize support
3). extends the comparison function _partial_wrapper_equals to work with FakeQuantize.with_args(observer=...)

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Differential Revision: [D37735193](https://our.internmc.facebook.com/intern/diff/D37735193)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81010
Approved by: https://github.com/andrewor14
2022-07-14 18:06:23 +00:00
Andrew Or
c44317704a [Quant][fx] Add default configs for fixed qparams ops (#80184)
Summary: This commit adds qconfigs with special observers for fixed
qparams ops in get_default_qconfig_mapping and
get_default_qat_qconfig_mapping. For correctness, we also require
users to use these special observers if we detect these fixed
qparams ops in prepare.

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Reviewers: jerryzh168, vkuzo

Subscribers: jerryzh168, vkuzo

Differential Revision: [D37396379](https://our.internmc.facebook.com/intern/diff/D37396379)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80184
Approved by: https://github.com/jerryzh168
2022-06-29 23:07:26 +00:00
Andrew Or
61a1eef7fc [Quant][fx] Add get_default_qconfig_mapping
Summary: This follows https://github.com/pytorch/pytorch/pull/78452,
which replaced the qconfig_dict with QConfigMapping. This PR
additionally replaces get_default_*qconfig_dict with
get_default_*qconfig_mapping. For backward compatibility, we
deprecate the old functions instead of removing them.

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Reviewers: jerryzh168, vkuzo

Subscribers: jerryzh168, vkuzo, supriyar

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79618

Approved by: https://github.com/jerryzh168
2022-06-16 16:10:14 +00:00
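The API-shape change can be sketched as follows: a loosely-typed config dict replaced by a small builder object with typed, chainable setter methods. This is a hedged illustration of the idea behind QConfigMapping; the class and method names below mirror the torch.ao pattern but are stand-ins, not the real API.

```python
class QConfigMappingSketch:
    def __init__(self):
        self.global_qconfig = None
        self.object_type_qconfigs = {}

    def set_global(self, qconfig):
        self.global_qconfig = qconfig
        return self  # chainable, like the real builder

    def set_object_type(self, object_type, qconfig):
        self.object_type_qconfigs[object_type] = qconfig
        return self

# Replaces dict-based config like {"": default, "object_type": [("nn.Linear", ...)]}
mapping = (
    QConfigMappingSketch()
    .set_global("default_qconfig")
    .set_object_type("nn.Linear", "linear_qconfig")
)
assert mapping.global_qconfig == "default_qconfig"
assert mapping.object_type_qconfigs["nn.Linear"] == "linear_qconfig"
```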
Andrew Or
5dcbcc6de8 [Quant][fx] Fix get_default_qconfig_dict for fused modules
Summary: Calling `prepare_fx` with `get_default_qconfig_dict`
failed for models with fused modules, such as `ConvReLU2d`.
This commit fixes this by adding qconfig entries for ReLU
and BatchNorm as well.

Test Plan:
python test/test_quantization.py TestQuantizeFx.test_qconfig_dict_with_fused_modules

Reviewers: jerryzh168

Subscribers: jerryzh168, vkuzo

Issue: https://github.com/pytorch/pytorch/issues/75825

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75838

Approved by: https://github.com/jerryzh168
2022-04-15 22:37:26 +00:00
Digant Desai
09f32eba7a [quant] Add default symmetric qat qconfig for qnnpack (#74507)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74507

* This is the default symmetric qat qconfigs for qnnpack.
* Support for symmetric quantization is not available from other backends.
* Observers are similar to symmetric PTQ qconfigs for qnnpack.

Reviewed By: jerryzh168

Differential Revision: D34804808

fbshipit-source-id: 22c11b89242a98f54029ac195f7b984e42809164
(cherry picked from commit ea751ded1174ba2c2f061bafc81573faaf248a9a)
2022-03-24 16:19:28 +00:00
Digant Desai
cfe1a41b01 [quant] Add default symmetric qconfig for qnnpack (#74396)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74396

# New qconfig `default_symmetric_qnnpack_qconfig`

Returns a qconfig with signed activations and symmetric weights with range restrictions. Also adds a per_channel variant of the same.

## Restrictions on weights

Restrictions on weights include:
1. the weight zero point is forced to zero, and
2. weight 8-bit signed quantized values are limited to [-127, +127], excluding the value -128.

This is driven, in part, by the desire to achieve better performance by XNNPACK ops.

## qengine/backend = `qnnpack` and XNNPACK ops

The qconfig returned by this function allows us to use faster XNNPACK quantized ops on CPUs, with said restrictions. Although we are using XNNPACK ops, the qengine is still `qnnpack`, and there are no plans to introduce a new qengine for XNNPACK ops. Support for using XNNPACK ops with the asymmetric qconfig (returned by get_default_qconfig()) is WIP.

## Updated EPS value:
* From PyTorch:

eps:
```
>>> import torch
>>> torch.finfo(torch.float32).eps
1.1920928955078125e-07
>>> torch.finfo(torch.float32).eps.hex()
'0x1.0000000000000p-23'
```
All scale values are float32 and `scale = max(scale, eps)`

* Requirement from XNNPACK

For both fp32 as well as rndnu requantization schema, `0x1p-32 <= requantization_scale < 256.0`
Where, requantization_scale = (input_scale * kernel_scale) / (output_scale)

* New minimum allowed scale value

With the current float32 eps (=0x1p-23) as the minimum, the xnnpack lower bound is the problem. We haven't observed upper-bound issues so far when assuming a max scale value of 256. So, focusing on the lower bound: to cover all possible requantization values, we must conservatively have a minimum possible requantization scale value such that,

```
minimum_requantization_value = xnnpack_lower_threshold
input_scale * kernel_scale / output_scale = 0x1p-32
min_scale_value * min_scale_value / max_scale_value = 0x1p-32
min_scale_value * new_eps / 256 = 0x1p-32
min_scale_value**2 = 0x1p-24
min_scale_value = 0x1p-12
```

With `scale_value >= 0x1p-12`, we should be able to avoid the lower threshold on requantization scale by xnnpack kernels.

Obviously this is very unlikely to happen. So practically, we should be able to get away with a much smaller value than `0x1p-12` as EPS, but it is not easy to choose a smaller value empirically.

* Impact on accuracy is unclear as of writing this.

Reviewed By: kimishpatel

Differential Revision: D34625300

fbshipit-source-id: 005e6757ed1185b3940b58ac55246cba8b267828
(cherry picked from commit 61ed1a2a308a1792ccbfc316153a6dc39798f02a)
2022-03-18 13:42:41 +00:00
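The derivation above can be checked numerically (all constants are taken from the commit message itself; powers of two are exact in float64, so the equality is exact):

```python
XNNPACK_LOWER = 2.0 ** -32   # XNNPACK requantization lower threshold (0x1p-32)
MAX_SCALE = 256.0            # assumed maximum scale value
NEW_EPS = 2.0 ** -12         # derived minimum scale value (0x1p-12)

# Worst case: input_scale = kernel_scale = eps, output_scale = max scale.
worst_case_requant = NEW_EPS * NEW_EPS / MAX_SCALE
assert worst_case_requant == XNNPACK_LOWER

# The old float32 eps (0x1p-23) would violate the bound in the worst case:
OLD_EPS = 2.0 ** -23
assert OLD_EPS * OLD_EPS / MAX_SCALE < XNNPACK_LOWER
```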
Weiwen Xia
060f1b822a Add onednn quant backend (#74137)
Summary:
Resolve the conflicts in https://github.com/pytorch/pytorch/pull/69820
jerryzh168 Please review. Thanks.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/74137

Reviewed By: samdow

Differential Revision: D34840477

Pulled By: jerryzh168

fbshipit-source-id: 8aa60981ff7be211a1609644f273b16d18efd425
(cherry picked from commit de76bb808b315e9a2e45d8c5f1c1233a47d669c4)
2022-03-15 01:28:21 +00:00
Jerry Zhang
5a897536f3 Revert D33716039: [pytorch][PR] Add ONEDNN quantization backend
Test Plan: revert-hammer

Differential Revision:
D33716039 (989b24855e)

Original commit changeset: 6f7bb807e857

Original Phabricator Diff: D33716039 (989b24855e)

fbshipit-source-id: ed233c5b99d4edb7d5a9d6c600825c78555f16d0
(cherry picked from commit d3e1f825b06ef67adb13623ccb7cbf1b700c1dd5)
2022-03-11 22:06:25 +00:00
Xia Weiwen
989b24855e Add ONEDNN quantization backend (#69820)
Summary:
This PR adds a new quantization backend, ONEDNN, with quantized conv and linear kernels in the same code path as the FBGEMM backend

The ONEDNN backend is an alternative to the FBGEMM and QNNPACK backends. It takes advantage of features of the latest Intel® CPU products: it supports VNNI on Cascade Lake and the AMX instruction set, available on Sapphire Rapids, which has 8X the int8 peak TOPS of VNNI.

ONEDNN demonstrates better performance on conv kernels of popular CNN models than FBGEMM. It also supports more fused ops, such as convolution-add-ReLU, than FBGEMM and QNNPACK.
To use this backend, users only need to set the quantization backend to 'onednn' before any calculation without a single change to models.
```python
torch.backends.quantized.engine = 'onednn'
```

## Design docs
https://github.com/pytorch/pytorch/issues/21120#issuecomment-562371983
https://github.com/pytorch/pytorch/pull/67177#issuecomment-963787096

## File changes
**Add ONEDNN to qengine list**
- aten/src/ATen/Context.cpp
- c10/core/QEngine.h
- torch/ao/quantization/qconfig.py
- torch/backends/quantized/\_\_init\_\_.py

**Implement qconv & qlinear for ONEDNN backend**
- aten/src/ATen/native/quantized/cpu/conv_serialization.h
- aten/src/ATen/native/quantized/cpu/fbgemm_utils.cpp
- aten/src/ATen/native/quantized/cpu/onednn_utils.h
- aten/src/ATen/native/quantized/cpu/qconv.cpp
- aten/src/ATen/native/quantized/cpu/qconv_dynamic.cpp
- aten/src/ATen/native/quantized/cpu/qconv_prepack.cpp
- aten/src/ATen/native/quantized/cpu/qconv_unpack.cpp
- aten/src/ATen/native/quantized/cpu/qlinear.cpp
- aten/src/ATen/native/quantized/cpu/qlinear_dynamic.cpp
- aten/src/ATen/native/quantized/cpu/qlinear_prepack.cpp
- aten/src/ATen/native/quantized/cpu/qlinear_unpack.cpp

**Skip tests that are not supported by ONEDNN**
- test/ao/sparsity/test_kernels.py
- test/quantization/core/test_quantized_module.py
- test/quantization/core/test_quantized_op.py

## Validation results
This PR has passed `test_quantization.py` and `test_mkldnn.py`.
Below are performance data of int8 2d convolution and linear on the Cascade Lake Xeon® platform:
(Note: Tested with single instance on single core. Using the latest oneDNN library.)

**Table 1. Performance comparison of int8 2d convolution operator**
|No.|	Shape|	FBGEMM|	ONEDNN|	Gain|
|-|-|-|-|-|
|1|	IC=128, OC=128, kernel=3, stride=1, N=4, H=32, W=32, G=1, pad=0|	668.310us|	535.630us|	24.8%|
|2|	IC=128, OC=128, kernel=3, stride=2, N=4, H=32, W=32, G=1, pad=0|	290.630us|	281.810us|	3.1%|
|3|	IC=128, OC=256, kernel=3, stride=1, N=4, H=32, W=32, G=1, pad=0|	1.045ms|	893.010us|	17.0%|
|4|	IC=128, OC=256, kernel=3, stride=2, N=4, H=32, W=32, G=1, pad=0|	385.320us|	373.720us|	3.1%|
|5|	IC=256, OC=256, kernel=3, stride=1, N=4, H=32, W=32, G=1, pad=0|	1.876ms|	1.641ms|	14.3%|
|6|	IC=256, OC=256, kernel=3, stride=2, N=4, H=32, W=32, G=1, pad=0|	660.460us|	638.470us|	3.4%|

**Table 2. Performance comparison of int8 linear operator**
|No.|	Shape (m, n, k)|	FBGEMM|	ONEDNN|	Gap|
|-|-|-|-|-|
|1|	64, 800, 320|	80.550us|	96.770us|	20.10%|
|2|	64, 768, 512|	101.230us|	130.720us|	29.10%|
|3|	16, 256, 512|	30.230us|	51.450us|	70.20%|
|4|	128, 128, 128|	33.810us|	50.480us|	49.30%|
|5|	256, 512, 256|	154.490us|	195.050us|	26.30%|
|6|	1024, 1024, 1024|	3.134ms|	3.514ms|	12.10%|

ONEDNN showed advantages over FBGEMM for convolution. However, it has a performance gap relative to FBGEMM for linear ops. The gap is a known issue, and further optimization is in progress in the oneDNN library. On the latest platforms, ONEDNN achieves better performance for both conv and linear.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69820

Reviewed By: HDCharles

Differential Revision: D33716039

Pulled By: jerryzh168

fbshipit-source-id: 6f7bb807e85798142dfcffccfca8b8bd652fb3dd
(cherry picked from commit 91526b373560f42ba0ad307f9cccfc0eb5218b1f)
2022-03-11 20:31:49 +00:00
Jerry Zhang
7ddf212f33 [quant][fx] Fully align convert with the reference model design and simplify the implementation (#73863)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73863

This PR fully aligns the convert function with the design: https://github.com/pytorch/rfcs/blob/master/RFC-0019-Extending-PyTorch-Quantization-to-Custom-Backends.md
and simplifies the implementation of convert function by always produce a reference quantized model (with reference patterns) first,
and then lower the model to a quantized model that is runnable with PyTorch native backend (fbgemm/qnnpack).

This PR makes the convert.py much easier to understand than the previous implementation, and we are able to remove majority of code
in quantization_patterns.py as well (in followup PRs).

Test Plan:
```
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
python test/test_quantization.py TestFXNumericSuiteCoreAPIs
python test/test_quantization.py TestFXNumericSuiteCoreAPIsModels
```
and other internal/oss regression tests

Imported from OSS

Reviewed By: andrewor14

Differential Revision: D34778506

fbshipit-source-id: 0678b66addf736039a8749b352f6f569caca962b
(cherry picked from commit 33ec9caf23f3ab373d827117efbd9db0668b2437)
2022-03-11 17:11:30 +00:00
Charles David Hernandez
39605a5632 [ao] Removing memoryless observer args for MovingAverage (#73947)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73947

The original implementation of memoryless observers used MinMaxObservers and
a memoryless argument to manipulate the behavior of the observer so that it wouldn't
keep track of previously observed mins and maxes. It was later pointed
out that this is equivalent to a MovingAverage observer with averaging_constant=1,
which requires less overhead and no one-off args (memoryless), so this PR refactors
out the memoryless arg and uses MovingAverage observers instead. Although the memoryless
adjective is still used, a complete definition was also added to clarify error
messages given these changes.

Test Plan:
python test/test_quantization.py TestQuantizeEagerQAT
python test/test_quantization.py TestObserver

Test Plan: Imported from OSS

Reviewed By: andrewor14

Differential Revision: D34732080

Pulled By: HDCharles

fbshipit-source-id: 227a1ab29d18adae55093a684ea35ac34523d07a
(cherry picked from commit 5238e70e8f90f3219c36f9c64b647951dcf64b5a)
2022-03-11 00:21:49 +00:00
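The equivalence this commit relies on can be shown with a pure-Python stand-in (illustrative only, not the torch observer code): a moving-average min/max update with averaging constant `c` reduces to "keep only the current batch", i.e. memoryless behavior, exactly when `c == 1`.

```python
def moving_average_update(old, new, c):
    # The standard exponential-moving-average update used by
    # MovingAverage-style observers.
    return old + c * (new - old)

old_min, new_min = -3.0, -1.0

# averaging_constant = 1: history is discarded entirely -> memoryless behavior.
assert moving_average_update(old_min, new_min, c=1.0) == new_min

# averaging_constant < 1: previous observations still influence the result.
assert moving_average_update(old_min, new_min, c=0.01) != new_min
```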
dzdang
a39e8e8f5e [Quant][fx] Added explicit entries for functional and module conv&linear support into get_default_qconfig_dict&get_default_qat_qconfig_dict (#73528)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/73528

Test Plan: Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D34535572

Pulled By: dzdang

fbshipit-source-id: 883f46e014e47aeba3ea6f9fb401c54e3792b2ac
(cherry picked from commit 66713d518295b2e7306561030aa6b7ca049a708c)
2022-03-04 03:29:20 +00:00
Jerry Zhang
5db711f9d3 [quant][be] Replace QConfigDynamic with QConfig in code (#69864)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69864

att, will have a follow up PR that removes QConfigDynamic in the api

Test Plan:
regression tests
```
python test/test_quantization.py TestPostTrainingStatic
python test/test_quantization.py TestPostTrainingDynamic
python test/test_quantization.py TestQuantizeFx
```

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D33073235

fbshipit-source-id: 6c1a1647032453803c55cdad7c04154502f085db
2021-12-17 22:30:57 -08:00
Charles David Hernandez
497ec9d9b8 Getting NS to work with Ferraris (#68908)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68908

see description in github

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D32928449

fbshipit-source-id: ba7085b823a0ebcd0d9e40f4ac19ca0a2cac1169
2021-12-08 12:26:00 -08:00
Ben Koopman
93aa3603ee [quant][embedding qat] Re-Land Support Embedding QAT via FX API (#69333)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69333

Original PR was reverted due to a break with an incompatible qengine on macOS; this diff fixes that.

Support QAT workflow by using torch.fx QAT API.  e.g. `prepare_qat_fx` and `convert_fx`.

Test Plan:
`pytest test/quantization/fx/test_quantize_fx.py -v -k "test_qat_embedding_linear"`

Imported from OSS

Reviewed By: jingsh

Differential Revision: D32814827

fbshipit-source-id: f7a69d2b596f1276dc5860b397c5d5d07e5b9e16
2021-12-08 05:28:07 -08:00
Jerry Zhang
ca945d989a [quant][graphmode][fx] Add default_replay_qconfig for ops like reshape (#69249)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69249

This PR added default_replay_qconfig and default_replay_observer, which are used
when we want to configure an operator to reuse the observer from its input. If the input
Tensor for the operator is not observed, we will not observe the output of this operator either;
if the input Tensor is observed, we will observe the output of the operator with the same observer.

e.g.

```
x1 = x0.reshape()
```
if reshape is configured with default_replay_qconfig:
1. if x0 is observed with observer_0, we'll observe x1 with the same observer instance
2. if x0 is not observed, we won't observe x1 either

Test Plan:
```
python test/test_quantization.py TestQuantizeFx.test_replay_qconfig
```

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D32774723

fbshipit-source-id: 26862b2bc181d0433e2243daeb3b8f7ec3dd33b2
2021-12-06 22:56:14 -08:00
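The reuse-input-observer rule described above can be sketched in a few lines (names hypothetical; this mirrors the rule, not the FX implementation): for an op configured with a replay qconfig, the output shares the input's observer instance if one exists, and stays unobserved otherwise.

```python
class ObserverInstance:
    """Stand-in for a concrete observer object attached to a tensor."""
    pass

def observe_replay_op(input_observer):
    # input_observer: the observer attached to the op's input, or None.
    # Replay semantics: reuse the same instance for the output, or nothing.
    return input_observer

obs = ObserverInstance()
assert observe_replay_op(obs) is obs    # case 1: same observer instance reused
assert observe_replay_op(None) is None  # case 2: output stays unobserved
```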
Nikita Shulga
a0367f8980 Revert D32404517: [quant][embedding qat] Support Embedding QAT via FX API
Test Plan: revert-hammer

Differential Revision:
D32404517 (abda069ce2)

Original commit changeset: 0484df8c826b

fbshipit-source-id: 4e7d62b9ccdb84eb4d184cd0b3c9506013fd8336
2021-12-02 14:28:35 -08:00
Ben Koopman
abda069ce2 [quant][embedding qat] Support Embedding QAT via FX API (#68296)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68296

Support QAT workflow by using torch.fx QAT API.  e.g. `prepare_qat_fx` and `convert_fx`.

Test Plan:
`pytest test/quantization/fx/test_quantize_fx.py -v -k "test_qat_embedding_linear"`

Imported from OSS

Reviewed By: jingsh, supriyar

Differential Revision: D32404517

fbshipit-source-id: 0484df8c826b823b60dfecd9def77bf8cffe0527
2021-12-02 08:42:45 -08:00
Ben Koopman
f6e45102d2 [quant][embedding qat] Support non-partial functions in qconfig comparison (#68067)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68067

Embedding QAT uses a NoopObserver class for the activation
and a FakeQuant for the weight; make sure that qconfig comparison
works properly for a mix of partial functions and classes in a
qconfig.
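
A minimal sketch of the kind of comparison this requires (plain Python, not the actual torch.ao helper; `_ctor_equal` is a hypothetical name): `functools.partial` objects do not define structural equality, so a qconfig comparison has to unwrap them before comparing:

```python
from functools import partial

def _ctor_equal(a, b):
    # Compare observer/fake-quant constructors that may be plain classes
    # or functools.partial wrappers: partials compare by wrapped callable,
    # positional args, and keyword args.
    if isinstance(a, partial) and isinstance(b, partial):
        return a.func == b.func and a.args == b.args and a.keywords == b.keywords
    return a == b

class FakeQuant:
    pass

p1 = partial(FakeQuant, quant_min=0, quant_max=255)
p2 = partial(FakeQuant, quant_min=0, quant_max=255)
assert p1 != p2                        # plain == on partials is identity-based
assert _ctor_equal(p1, p2)             # structural comparison succeeds
assert _ctor_equal(FakeQuant, FakeQuant)
assert not _ctor_equal(p1, FakeQuant)  # mixed partial/class compares unequal
```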

Test Plan:
`pytest test/quantization/eager/test_quantize_eager_qat.py  -v -k "test_embedding_qat_qconfig_equal"`

Imported from OSS

Reviewed By: HDCharles

Differential Revision: D32318434

fbshipit-source-id: c036eef9cbabe7c247745930501328e9c75a8cb0
2021-11-12 12:48:00 -08:00
andrewor
4a8f27445d [Quant] Add dynamic QAT Linear module (#67325)
Summary:
**Summary:** This commit adds the `torch.nn.qat.dynamic.modules.Linear`
module, the dynamic counterpart to `torch.nn.qat.modules.Linear`.
Functionally these are very similar, except the dynamic version
expects a memoryless observer and is converted into a dynamically
quantized module before inference.
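
To illustrate what "memoryless" means here, a toy sketch (plain Python, not the torch observer classes): a memoryless observer keeps only the latest batch's range, while a running min/max observer accumulates statistics across all batches:

```python
class MemorylessMinMax:
    # Tracks only the most recent input: each call overwrites the stats.
    def __call__(self, xs):
        self.min, self.max = min(xs), max(xs)

class RunningMinMax:
    # Tracks the global range across every input seen so far.
    min, max = float("inf"), float("-inf")
    def __call__(self, xs):
        self.min = min(self.min, *xs)
        self.max = max(self.max, *xs)

m, r = MemorylessMinMax(), RunningMinMax()
for batch in ([0.0, 10.0], [2.0, 3.0]):
    m(batch)
    r(batch)
assert (m.min, m.max) == (2.0, 3.0)   # forgot the first batch
assert (r.min, r.max) == (0.0, 10.0)  # remembers the global range
```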

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67325

Test Plan:
`python3 test/test_quantization.py TestQuantizationAwareTraining.test_dynamic_qat_linear`

**Reviewers:** Charles David Hernandez, Jerry Zhang

**Subscribers:** Charles David Hernandez, Supriya Rao, Yining Lu

**Tasks:** 99696812

**Tags:** pytorch

Reviewed By: malfet, jerryzh168

Differential Revision: D32178739

Pulled By: andrewor14

fbshipit-source-id: 5051bdd7e06071a011e4e7d9cc7769db8d38fd73
2021-11-08 10:24:25 -08:00
Ben Koopman
aa7da7b09c [quant][embedding qat] Enable quint4 in EmbeddingBag QAT workflow (#66348)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66348

Test Plan: Imported from OSS

Reviewed By: HDCharles

Differential Revision: D31691300

Pulled By: b-koopman

fbshipit-source-id: 11bd75b608b972394fe9f7c9b7bf034af42f28b5
2021-10-18 08:51:39 -07:00
Vasiliy Kuznetsov
8b1258698e Improve quantization API docs (#66379)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66379

Description:

Creates a quantization API reference and fixes all the docblock errors.

This is #66122 to #66210 squashed together

Test Plan:
```
cd docs
make html
python -m http.server
// open webpage, inspect it, looks good
```

Reviewed By: ejguan

Differential Revision: D31543172

Pulled By: vkuzo

fbshipit-source-id: 9131363d6528337e9f100759654d3f34f02142a9
2021-10-11 18:46:11 -07:00
Mike Ruberry
10633460ce Revert D31447614: Create a documentation page for torch.ao.quantization.QConfig
Test Plan: revert-hammer

Differential Revision:
D31447614 (7332ed13ed)

Original commit changeset: 5d9dd2a4e864

fbshipit-source-id: 6ac15a956222ca61f7fbb75ed36bcc58b23f0f36
2021-10-10 01:51:09 -07:00
Vasiliy Kuznetsov
7332ed13ed Create a documentation page for torch.ao.quantization.QConfig (#66129)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66129

Adds a documentation page for `torch.ao.quantization.QConfig`. It is useful
for this to have a separate page since it shared between Eager and FX graph
mode quantization.

Also, ensures that all important functions and module attributes in this
module have docstrings, so users can discover these without reading the
source code.

Test Plan:
```
cd docs
make html
python -m http.server
// open webpage, inspect it, renders correctly
```

Reviewed By: jerryzh168

Differential Revision: D31447614

Pulled By: vkuzo

fbshipit-source-id: 5d9dd2a4e8647fa17b96cefbaae5299adede619c
2021-10-09 06:45:58 -07:00
Ben Koopman
a58ff186e8 [quant][embedding qat] Add basic EmbeddingBag QAT fakeQuant workflow (#65443)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65443

Test Plan: Imported from OSS

Reviewed By: dagitses, supriyar

Differential Revision: D31456445

Pulled By: b-koopman

fbshipit-source-id: 0edda6e272d9005fce65f2ba6a5e6abc831836de
2021-10-07 20:19:29 -07:00
Supriya Rao
8a974a482c [quant] Add support for quantization of Embedding{Bag} in dynamic quant APIs (#65674)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65674

Before this PR the user had to use the eager mode static quantization APIs to quantize Embedding/EmbeddingBag modules.
With this PR they can use either the static or dynamic quantization APIs for Embedding quantization.

The only qconfig supported for embedding quantization is float_qparams_weight_only_qconfig, which is currently enforced in the from_float
method of the quantized Embedding/EmbeddingBag modules.

To combine embedding quantization with Linear dynamic quantization, the user can use the qconfig_dict to specify a different qconfig for each module type.

The prepare/convert APIs can still be used to quantize Embeddings, with the caveat that the user needs to ensure inputs to the Embedding ops are FP32.
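
A hedged sketch of the per-module-type mapping idea described above (illustrative names only; in torch the mapping goes from actual module classes to qconfig objects, e.g. via `quantize_dynamic`'s qconfig_spec or FX's qconfig_dict):

```python
# Strings stand in for qconfig objects; real code maps nn.Module classes.
QCONFIG_BY_TYPE = {
    "EmbeddingBag": "float_qparams_weight_only_qconfig",
    "Linear": "default_dynamic_qconfig",
}

def qconfig_for(module_type_name, fallback=None):
    # Each module type can carry its own qconfig; unlisted types fall back
    # (a fallback of None means "leave the module unquantized").
    return QCONFIG_BY_TYPE.get(module_type_name, fallback)

assert qconfig_for("EmbeddingBag") == "float_qparams_weight_only_qconfig"
assert qconfig_for("Linear") == "default_dynamic_qconfig"
assert qconfig_for("Conv2d") is None
```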

Addresses Issue #65185
ghstack-source-id: 139935419

Test Plan:
python test/test_quantization.py

Imported from OSS

Reviewed By: gchanan

Differential Revision: D31211199

fbshipit-source-id: 8c747881caee5ccbf8b93c6704b08d132049dea4
2021-10-06 23:19:38 -07:00
Zafar
0d020effab [quant] Fix the parts that were missing after initial migration (#66058)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66058

After the initial migration from `torch.quantization` to `torch.ao.quantization`, some of the files did not change.
This happened because the migration was done in parallel, and some of the files were landed while the others were still in the original location.
This is the last fix in the AO migration phase 1, which completely enables the ao.quantization namespace.

Test Plan: `python test/test_quantization.py`

Reviewed By: vkuzo

Differential Revision: D31366066

Pulled By: z-a-f

fbshipit-source-id: bf4a74885be89d098df2d87e685795a2a64026c5
2021-10-05 11:45:37 -07:00
Charles David Hernandez
f309f8fbd4 [quant] ao migration of observer and qconfig (#64982)
Summary:
(Had to recreate this diff so it wasn't dependent on the stack)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64982

migration of qconfig.py and observer.py to torch/ao/quantization using new test format
ghstack-source-id: 138215256

Test Plan:
buck test mode/opt //caffe2/test:quantization

https://www.internalfb.com/intern/testinfra/testconsole/testrun/8444249354294701/

buck test mode/dev //caffe2/test:quantization -- TestAOMigrationQuantization

https://www.internalfb.com/intern/testinfra/testrun/3940649742829796

Reviewed By: z-a-f

Differential Revision: D30982534

fbshipit-source-id: 48d08969b1984311ceb036eac0877c811cd6add9
2021-09-16 10:33:16 -07:00