Commit Graph

1886 Commits

Author SHA1 Message Date
Kwanghoon An
eb0b16db92 Initial implementation of AdaRound (#126153)
Summary:
This is an implementation of AdaRound from the paper https://arxiv.org/abs/2004.10568.

This algorithm is going to be used by multiple people, hence we need to make it an official implementation.

Differential Revision: D57227565

Pull Request resolved: https://github.com/pytorch/pytorch/pull/126153
Approved by: https://github.com/jerryzh168, https://github.com/huydhn
2024-05-17 19:44:50 +00:00
andrewor14
6931f781c2 [quant][pt2e] Allow multi users without output observers (#126487)
Summary: The PT2E quantization flow does not support unquantized
outputs yet. To work around this, users may wish to remove the
output observer from their graphs. However, this currently fails
in some cases because the `PortNodeMetaForQDQ` pass is too
restrictive, for example:

```
conv -> obs -------> output0
         \-> add -> output1
```

Previously we expected conv to always have exactly 1 user,
which is the observer. When the observer is removed, however,
conv now has 2 users, and this fails the check.

```
conv -------> output0
  \-> add -> output1
```

This commit relaxes the error into a warning to enable
this workaround.
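
A minimal sketch of the workaround this enables, assuming an FX `GraphModule` `m` whose first graph output is fed by an observer (the graph surgery here is illustrative, not the exact user code):

```
# Rewire the first graph output past its observer, then drop the observer.
output_node = next(n for n in m.graph.nodes if n.op == "output")
obs = output_node.args[0][0]                      # assumed: the output observer node
output_node.replace_input_with(obs, obs.args[0])  # output now reads the observer's input
if len(obs.users) == 0:
    m.graph.erase_node(obs)
m.graph.lint()
m.recompile()
```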

Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_multi_users_without_output_observer

Reviewers: jerryzh168

Subscribers: jerryzh168, supriyar

Differential Revision: [D57472601](https://our.internmc.facebook.com/intern/diff/D57472601)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/126487
Approved by: https://github.com/tarun292
2024-05-17 18:48:21 +00:00
PyTorch MergeBot
ae6fdfa539 Revert "Initial implementation of AdaRound (#126153)"
This reverts commit 175c18af81.

Reverted https://github.com/pytorch/pytorch/pull/126153 on behalf of https://github.com/huydhn due to Sorry for reverting your change but the lint failure is legit because there are more than one lint issues, torch/optim/asgd.py is just the last one ([comment](https://github.com/pytorch/pytorch/pull/126153#issuecomment-2113902522))
2024-05-16 02:34:49 +00:00
Kwanghoon An
175c18af81 Initial implementation of AdaRound (#126153)
Summary:
This is an implementation of AdaRound from the paper https://arxiv.org/abs/2004.10568.

This algorithm is going to be used by multiple people, hence we need to make it an official implementation.

Differential Revision: D57227565

Pull Request resolved: https://github.com/pytorch/pytorch/pull/126153
Approved by: https://github.com/jerryzh168
2024-05-16 02:09:18 +00:00
andrewor14
3cba50e478 [quant] Make per_group and per_token quant match torch.fake_quantize (#125781)
Summary: Follow-up to https://github.com/pytorch/ao/pull/229.
This resolves the difference between `input.div(scales)` and
`input.mul(1.0 / scales)`, which results in small numerical
discrepancies on some inputs.
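
A tiny illustration of the kind of mismatch being fixed (values are arbitrary; whether a given input shows a difference depends on rounding):

```
import torch

x = torch.tensor([0.4123, 1.7321, 3.1415])
s = torch.tensor(0.0071)
a = x.div(s)        # divide by the scale directly
b = x.mul(1.0 / s)  # multiply by the reciprocal: can differ by an ulp
print((a - b).abs().max())
```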

Test Plan:
python test/test_quantization.py TestQuantizedTensor.test_decomposed_quantize_per_channel_group
python test/test_quantization.py TestQuantizedTensor.test_decomposed_quantize_per_token

Reviewers: jerryzh168

Subscribers: jerryzh168, supriyar
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125781
Approved by: https://github.com/jerryzh168
2024-05-14 18:18:54 +00:00
Aaron Gokaslan
34910f87f0 [BE]: Update ruff to v0.4.4 (#125031)
Update ruff version to 0.4.4. This version mostly has bugfixes for the new parser and also updates the f-string rule to be able to apply more fixes.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/125031
Approved by: https://github.com/albanD, https://github.com/malfet
2024-05-12 20:02:37 +00:00
leslie-fang-intel
d83ab88f81 [Inductor] [Quant] Enable lowering of quant per tensor and refactor quant pattern (#124041)
**Summary**
Per the discussion in https://github.com/pytorch/pytorch/pull/123444, the `decomposed quant/dequant` patterns changed after https://github.com/pytorch/pytorch/pull/123445. We can move the optimization of `decomposed quant/dequant` from inductor decomposition into the lowering phase to avoid those changes. In this way, we can:

- Avoid the pattern matcher failure introduced in https://github.com/pytorch/pytorch/pull/123445
- Make the quantization pattern clearer in the pattern matcher phase, since the `quant/dequant` nodes have not been decomposed.

**Changes in this PR**

- Move optimization of `decomposed quant/dequant` from inductor decomposition into lowering phase.
- Corresponding changes in the quantization pattern matcher to ensure no BC breakage.

**TestPlan**
```
python -u -m pytest -s -v test/inductor/test_mkldnn_pattern_matcher.py -k test_q
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124041
Approved by: https://github.com/peterbell10, https://github.com/jgong5
2024-05-09 08:40:44 +00:00
PyTorch MergeBot
ea3f625e32 Revert "[Inductor] [Quant] Enable lowering of quant per tensor and refactor quant pattern (#124041)"
This reverts commit 33e6791645.

Reverted https://github.com/pytorch/pytorch/pull/124041 on behalf of https://github.com/huydhn due to Sorry for reverting your change but I think there is a land race with the change 33e6791645 ([comment](https://github.com/pytorch/pytorch/pull/124041#issuecomment-2101766558))
2024-05-09 01:34:19 +00:00
leslie-fang-intel
33e6791645 [Inductor] [Quant] Enable lowering of quant per tensor and refactor quant pattern (#124041)
**Summary**
Per the discussion in https://github.com/pytorch/pytorch/pull/123444, the `decomposed quant/dequant` patterns changed after https://github.com/pytorch/pytorch/pull/123445. We can move the optimization of `decomposed quant/dequant` from inductor decomposition into the lowering phase to avoid those changes. In this way, we can:

- Avoid the pattern matcher failure introduced in https://github.com/pytorch/pytorch/pull/123445
- Make the quantization pattern clearer in the pattern matcher phase, since the `quant/dequant` nodes have not been decomposed.

**Changes in this PR**

- Move optimization of `decomposed quant/dequant` from inductor decomposition into lowering phase.
- Corresponding changes in the quantization pattern matcher to ensure no BC breakage.

**TestPlan**
```
python -u -m pytest -s -v test/inductor/test_mkldnn_pattern_matcher.py -k test_q
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124041
Approved by: https://github.com/peterbell10, https://github.com/jgong5
2024-05-09 00:54:22 +00:00
PyTorch MergeBot
1b396d69cb Revert "[CUDNN] Remove defunct cuDNN V8 API build flag (#120006)"
This reverts commit ee4cafa098.

Reverted https://github.com/pytorch/pytorch/pull/120006 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it is failing ROCm jobs in trunk ee4cafa098 ([comment](https://github.com/pytorch/pytorch/pull/120006#issuecomment-2098849813))
2024-05-07 16:28:04 +00:00
eqy
ee4cafa098 [CUDNN] Remove defunct cuDNN V8 API build flag (#120006)
The flag basically does nothing following #95722

Let's see if the quantization tests break

CC @malfet @atalmanagement

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120006
Approved by: https://github.com/malfet
2024-05-06 23:13:58 +00:00
andrewor14
8242fb62a7 [quant][pt2e] Fix conv-bn weight + bias per channel QAT (#125208)
Summary: This commit fixes the pattern matching for conv-bn
during QAT fusion where both weight and bias are quantized per
channel. Previously this failed because weights and biases used
the same example kwargs for their scales and zero points,
causing these qparams to be tied during pattern matching.

Test Plan:
python test/test_quantization.py TestQuantizePT2EQAT_ConvBn1d.test_qat_conv_bn_per_channel_weight_bias
python test/test_quantization.py TestQuantizePT2EQAT_ConvBn2d.test_qat_conv_bn_per_channel_weight_bias

Reviewers: jerryzh168, angelayi

Subscribers: jerryzh168, angelayi, supriyar

Differential Revision: [D56740694](https://our.internmc.facebook.com/intern/diff/D56740694)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125208
Approved by: https://github.com/angelayi
2024-04-30 18:12:25 +00:00
Xia, Weiwen
35b332882b [Quant][PT2E] Enable linear-binary(-unary) post-op recipe for X86Inductor quantizer (#122387)
As the title
**Test plan**
python test/test_quantization.py -k test_linear_binary

Differential Revision: [D56288440](https://our.internmc.facebook.com/intern/diff/D56288440)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/122387
Approved by: https://github.com/leslie-fang-intel, https://github.com/jgong5
ghstack dependencies: #123240
2024-04-27 02:40:57 +00:00
andrewor14
85b28ffc3a [quant][pt2e] Move batch norm op between eval/train for cuda (#123957)
Summary: Previously, `move_exported_model_to_train/eval` only
switched the CPU versions of the batch norm op. This commit adds
support for the cuda versions of the op too. Note that this fix
is temporary; we won't have to differentiate between these two
cases once we have batch norm consolidation.
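
A usage sketch, assuming the capture entry point of this era (`capture_pre_autograd_graph` under `torch._export`) and that both helpers are exposed under `torch.ao.quantization`:

```
import torch
from torch._export import capture_pre_autograd_graph
from torch.ao.quantization import (
    move_exported_model_to_eval,
    move_exported_model_to_train,
)

model = torch.nn.Sequential(torch.nn.Conv2d(3, 3, 1), torch.nn.BatchNorm2d(3))
m = capture_pre_autograd_graph(model, (torch.randn(1, 3, 8, 8),))
m = move_exported_model_to_eval(m)   # with this commit, cuda batch norm ops are switched too
m = move_exported_model_to_train(m)
```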

Test Plan:
python test/test_quantization.py -k test_move_exported_model_bn

Reviewers: jerryzh168

Subscribers: jerryzh168, leslie-fang-intel, supriyar

Differential Revision: [D56070054](https://our.internmc.facebook.com/intern/diff/D56070054)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123957
Approved by: https://github.com/jerryzh168
2024-04-24 22:01:50 +00:00
Shen Xu
8885638f95 [quant][pt2e] Propagate get_attr meta through known ops only (#124415)
Summary: Avoid the situation where the graph traversal finds a matmul node with a `get_attr` as its `args[0]` and incorrectly propagates the `get_attr`'s meta to everything downstream.

Test Plan: CI

Differential Revision: D56219120

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124415
Approved by: https://github.com/jerryzh168
2024-04-24 20:55:56 +00:00
PyTorch MergeBot
e739a2d59e Revert "[quant][pt2e] Move batch norm op between eval/train for cuda (#123957)"
This reverts commit 4efb28c900.

Reverted https://github.com/pytorch/pytorch/pull/123957 on behalf of https://github.com/jeanschmidt due to reverting to check if it will fix rocm jobs on main ([comment](https://github.com/pytorch/pytorch/pull/123957#issuecomment-2075158146))
2024-04-24 15:02:11 +00:00
andrewor14
4efb28c900 [quant][pt2e] Move batch norm op between eval/train for cuda (#123957)
Summary: Previously, `move_exported_model_to_train/eval` only
switched the CPU versions of the batch norm op. This commit adds
support for the cuda versions of the op too. Note that this fix
is temporary; we won't have to differentiate between these two
cases once we have batch norm consolidation.

Test Plan:
python test/test_quantization.py -k test_move_exported_model_bn

Reviewers: jerryzh168

Subscribers: jerryzh168, leslie-fang-intel, supriyar

Differential Revision: [D56070054](https://our.internmc.facebook.com/intern/diff/D56070054)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123957
Approved by: https://github.com/jerryzh168
2024-04-24 01:02:59 +00:00
Amadeusz Skrzypczak
107f944f22 Support fp8 quantization (#123161)
This commit enables float8_e5m2 and float8_e4m3fn dtypes in fx quantization and PT2E.

Motivation for using fp8 quantization instead of int8:
- inference works better when run with the same datatype the model was trained with,
- fp8 can handle outliers better, which is one of the problems in LLM activations.

The numerical recipe we want to use this for is fp8 inference (a scaling sketch follows the list):
- bgemms/gemms running in float8_e4m3fn,
- Per-Tensor-Quantization/Scaling,
- amax observer for measurement with input_backoff and weight_backoff.
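
A minimal sketch of per-tensor fp8 scaling with these dtypes (the backoff/observer logic is not shown; the scale choice here is illustrative):

```
import torch

x = torch.randn(4, 4)
fp8_max = torch.finfo(torch.float8_e4m3fn).max  # 448 for e4m3fn
scale = x.abs().max() / fp8_max                 # per-tensor scaling
x_fp8 = (x / scale).to(torch.float8_e4m3fn)     # quantize
x_deq = x_fp8.to(torch.float32) * scale         # dequantize to inspect error
print((x - x_deq).abs().max())
```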
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123161
Approved by: https://github.com/jgong5, https://github.com/jerryzh168
2024-04-23 13:35:27 +00:00
Mikayla Gawarecki
c82fcb7b30 Add testing and fix weights_only load for quantized types and nn.Parameters with python attrs (#124330)
Adds the following to the allowed globals for the `weights_only` unpickler (a round-trip sketch follows the lists below)
- [x] `torch._utils._rebuild_qtensor` and qtensor related types
- [x] `torch._utils._rebuild_parameter_with_state` (used when deserializing a parameter that has user-defined attributes like `Param.foo`)

The remaining rebuild functions that have not been allowlisted are

- [x] `torch._utils._rebuild_wrapper_subclass` (allowlisted in above PR)
- [ ] `torch._utils._rebuild_device_tensor_from_numpy`
- [ ] `torch._utils._rebuild_xla_tensor` (legacy)
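
A round-trip sketch of what the allowlist additions enable:

```
import io
import torch

qt = torch.quantize_per_tensor(torch.randn(3), scale=0.1, zero_point=0, dtype=torch.qint8)
buf = io.BytesIO()
torch.save(qt, buf)
buf.seek(0)
loaded = torch.load(buf, weights_only=True)  # previously rejected the qtensor rebuild
print(loaded.dequantize())
```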

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124330
Approved by: https://github.com/albanD
2024-04-23 04:13:26 +00:00
leslie-fang-intel
dd440ac734 Add Matmul recipe into x86_inductor_quantizer (#122776)
**Summary**
Add `matmul` to the quantization recipes, noting that it's not a general recipe but is tailored to meet accuracy criteria for specific models. The `matmul` recipe is disabled by default.

**Test Plan**
```
python -m pytest quantization/pt2e/test_x86inductor_quantizer.py -k test_attention_block
```

Differential Revision: [D56288468](https://our.internmc.facebook.com/intern/diff/D56288468)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/122776
Approved by: https://github.com/jgong5, https://github.com/jerryzh168
2024-04-23 00:25:41 +00:00
Aaron Gokaslan
5a1216bb2e [BE]: Update ruff to 0.4.1 (#124549)
Update ruff to 0.4.1.
This version fixes a lot of false negatives/false positives, is 20-40% faster, and has various other bug fixes.

Below is a before and after table showing the execution time of ruff lint and ruff format in milliseconds courtesy of https://astral.sh/blog/ruff-v0.4.0

| Repository                                         | Linter (v0.3) | Linter (v0.4) | Formatter (v0.3) | Formatter (v0.4) |
|----------------------------------------------------|---------------|---------------|------------------|------------------|
| [pytorch/pytorch](https://github.com/pytorch/pytorch) | 328.7         | 251.8         | 351.1            | 274.9            |

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124549
Approved by: https://github.com/ezyang
2024-04-21 14:06:23 +00:00
andrewor14
3eea300680 [quant] Do not decompose choose_qparams_per_token_asymmetric (#124178)
Summary: https://github.com/pytorch/pytorch/pull/123452 added
backward support to this op by turning it into
CompositeImplicitAutograd, which meant it gets decomposed during
export/compile. However, this is not desirable behavior for the
PTQ case when we try to lower the model. This commit enables
QAT without breaking PTQ by refactoring the impl into a separate
op that does have backward support.

Test Plan:
python test/test_quantization.py -k test_decomposed_choose_qparams_per_token_asymmetric_backward

Reviewers: jerryzh168, digantdesai, zou3519

Subscribers: jerryzh168, digantdesai, zou3519, supriyar

Differential Revision: [D56192116](https://our.internmc.facebook.com/intern/diff/D56192116)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124178
Approved by: https://github.com/digantdesai
2024-04-16 22:58:48 +00:00
WeiChunyu-star
635c238bad Enable UFMT on all of test/quantization/jit &pt2e (#124010)
Partially addresses #123062
Ran lintrunner on:
- test/quantization/jit
- test/quantization/pt2e

Detail:
```
$ lintrunner -a --take UFMT --all-files
ok No lint issues.
Successfully applied all patches.
```

cc @ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124010
Approved by: https://github.com/ezyang
2024-04-14 06:07:23 +00:00
WeiChunyu-star
6ac8fe46dd Enable UFMT on all of test/quantization/ao_migration &bc (#123994)
Partially addresses #123062
Ran lintrunner on:
- test/quantization/ao_migration
- test/quantization/bc

Detail:
```
$ lintrunner -a --take UFMT --all-files
ok No lint issues.
Successfully applied all patches.
```

@ezyang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/123994
Approved by: https://github.com/ezyang
2024-04-13 06:36:10 +00:00
Aaron Gokaslan
1d6c5972c1 [BE]: Optimize min/max/sum comprehensions C419 (#123960)
Automatic fixes that replace certain list comprehensions with generator expressions where appropriate so that they are immediately consumed. This is preview functionality in ruff for rule C419 and it was automatically applied.

Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123960
Approved by: https://github.com/malfet
2024-04-12 23:54:15 +00:00
andrewor14
762e19606e [quant] Enable backward for choose_qparams_per_token_asymmetric (#123452)
Summary: When running the backward for this op, we get the error:
```
RuntimeError: derivative for aten::aminmax is not implemented
```
This commit replaces this call with separate amin and amax
calls instead, which do have implemented derivatives.
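
The shape of the fix, as a runnable sketch (not the op's exact code):

```
import torch

x = torch.randn(8, requires_grad=True)
# torch.aminmax(x) had no registered derivative at the time; amin/amax do:
min_val, max_val = torch.amin(x), torch.amax(x)
(max_val - min_val).backward()  # succeeds with the separate reductions
```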

Test Plan:
python test/test_quantization.py -k test_decomposed_choose_qparams_per_token_asymmetric_backward

Reviewers: jerryzh168, digantdesai

Subscribers: jerryzh168, digantdesai, supriyar

Differential Revision: [D55805170](https://our.internmc.facebook.com/intern/diff/D55805170)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123452
Approved by: https://github.com/digantdesai, https://github.com/jerryzh168, https://github.com/zou3519
2024-04-12 20:05:56 +00:00
andrewor14
5c0a380bdf [pt2e][qat] Support conv-transpose-bn[-relu] QAT fusion (#123652)
Summary: This commit adds support for QAT fusion for the
[conv-transpose-bn] and [conv-transpose-bn-relu] patterns.

Test Plan:
python test/test_quantization.py TestQuantizePT2EQAT_ConvBn1d.test_qat_conv_transpose_bn
python test/test_quantization.py TestQuantizePT2EQAT_ConvBn1d.test_qat_conv_transpose_bn_relu
python test/test_quantization.py TestQuantizePT2EQAT_ConvBn2d.test_qat_conv_transpose_bn
python test/test_quantization.py TestQuantizePT2EQAT_ConvBn2d.test_qat_conv_transpose_bn_relu

Reviewers: jerryzh168

Subscribers: jerryzh168, supriyar

Tasks: https://github.com/pytorch/pytorch/issues/122224

Differential Revision: [D55930704](https://our.internmc.facebook.com/intern/diff/D55930704)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123652
Approved by: https://github.com/jerryzh168
2024-04-12 17:16:02 +00:00
PyTorch MergeBot
5669334175 Revert "Add Matmul recipe into x86_inductor_quantizer (#122776)"
This reverts commit e8e9261b90.

Reverted https://github.com/pytorch/pytorch/pull/122776 on behalf of https://github.com/DanilBaibak due to Break internal build ([comment](https://github.com/pytorch/pytorch/pull/122776#issuecomment-2051073373))
2024-04-12 06:29:27 +00:00
PyTorch MergeBot
fe092da874 Revert "[quant] Enable backward for choose_qparams_per_token_asymmetric (#123452)"
This reverts commit c83900887f.

Reverted https://github.com/pytorch/pytorch/pull/123452 on behalf of https://github.com/clee2000 due to broke test_quantization.py::TestQuantizedTensor::test_decomposed_choose_qparams_per_token_asymmetric_backward on multiple jobs c83900887f https://github.com/pytorch/pytorch/actions/runs/8648781225/job/23714753103, probably a landrace ([comment](https://github.com/pytorch/pytorch/pull/123452#issuecomment-2050056601))
2024-04-11 16:19:28 +00:00
andrewor14
c83900887f [quant] Enable backward for choose_qparams_per_token_asymmetric (#123452)
Summary: When running the backward for this op, we get the error:
```
RuntimeError: derivative for aten::aminmax is not implemented
```
This commit replaces this call with separate amin and amax
calls instead, which do have implemented derivatives.

Test Plan:
python test/test_quantization.py -k test_decomposed_choose_qparams_per_token_asymmetric_backward

Reviewers: jerryzh168, digantdesai

Subscribers: jerryzh168, digantdesai, supriyar

Differential Revision: [D55805170](https://our.internmc.facebook.com/intern/diff/D55805170)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123452
Approved by: https://github.com/digantdesai, https://github.com/jerryzh168
2024-04-11 14:51:42 +00:00
leslie-fang-intel
e8e9261b90 Add Matmul recipe into x86_inductor_quantizer (#122776)
**Summary**
Add `matmul` to the quantization recipes, noting that it's not a general recipe but is tailored to meet accuracy criteria for specific models. The `matmul` recipe is disabled by default.

**Test Plan**
```
python -m pytest quantization/pt2e/test_x86inductor_quantizer.py -k test_attention_block
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/122776
Approved by: https://github.com/jgong5, https://github.com/jerryzh168
ghstack dependencies: #122775
2024-04-11 09:32:47 +00:00
leslie-fang-intel
8798f5bf0d Add Quantization recipe filter per operator type for x86_inductor_quantizer (#122775)
**Summary**
Default recipes are enabled in `X86InductorQuantizer`, and requests have come in to customize recipes based on these defaults.

- Avoid annotation propagation and restrict annotation to `conv`/`linear` only.
- Add `matmul` to the quantization recipes, noting that it's not a general recipe but is tailored to meet accuracy criteria for specific models.

To meet these requests, we made changes in this PR by introducing the interfaces `set_function_type_qconfig` and `set_module_type_qconfig`:

- `set_function_type_qconfig` accepts a functional input such as `torch.nn.functional.linear` or `torch.matmul`; `set_module_type_qconfig` accepts an nn.Module input such as `torch.nn.Conv2d`.
- To disable the recipe for an operator, the user can simply exclude it by passing `None`, as in `quantizer.set_function_type_qconfig(op, None)`.
- To modify or extend the default recipe for an operator, the user can customize it with `quantizer.set_function_type_qconfig(op, config)`; see the usage sketch below.
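
A usage sketch of the interface described above (import paths assumed from the X86InductorQuantizer module):

```
import torch
from torch.ao.quantization.quantizer.x86_inductor_quantizer import (
    X86InductorQuantizer,
    get_default_x86_inductor_quantization_config,
)

quantizer = X86InductorQuantizer()
quantizer.set_global(get_default_x86_inductor_quantization_config())
# Disable the recipe for linear entirely:
quantizer.set_function_type_qconfig(torch.nn.functional.linear, None)
# Opt in to the (off-by-default) matmul recipe:
quantizer.set_function_type_qconfig(
    torch.matmul, get_default_x86_inductor_quantization_config()
)
# Customize the module-level recipe for Conv2d:
quantizer.set_module_type_qconfig(
    torch.nn.Conv2d, get_default_x86_inductor_quantization_config()
)
```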

**Test Plan**
```
python -m pytest quantization/pt2e/test_x86inductor_quantizer.py -k test_filter_conv2d_recipe
python -m pytest quantization/pt2e/test_x86inductor_quantizer.py -k test_filter_linear_recipe
python -m pytest quantization/pt2e/test_x86inductor_quantizer.py -k test_filter_maxpool2d_recipe
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/122775
Approved by: https://github.com/jgong5, https://github.com/jerryzh168
2024-04-11 09:30:31 +00:00
PyTorch MergeBot
8d9af8b91c Revert "[Quant][PT2E] Enable linear-binary(-unary) post-op recipe for X86Inductor quantizer (#122387)"
This reverts commit 82e0153487.

Reverted https://github.com/pytorch/pytorch/pull/122387 on behalf of https://github.com/DanilBaibak due to Break internal build ([comment](https://github.com/pytorch/pytorch/pull/122387#issuecomment-2048294643))
2024-04-10 19:34:26 +00:00
Xia, Weiwen
82e0153487 [Quant][PT2E] Enable linear-binary(-unary) post-op recipe for X86Inductor quantizer (#122387)
As the title
**Test plan**
python test/test_quantization.py -k test_linear_binary

Pull Request resolved: https://github.com/pytorch/pytorch/pull/122387
Approved by: https://github.com/leslie-fang-intel, https://github.com/jgong5
2024-04-10 01:34:14 +00:00
Xia, Weiwen
d86cb9c747 [Quant][Inductor] Add qlinear_pointwise.binary op for X86Inductor backend (#123144)
**Note**: This is a reopen of https://github.com/pytorch/pytorch/pull/122288, which was merged by `ghstack land` to its base (not main) by mistake.

**Description**
Add qlinear_binary op for X86Inductor backend of quantization PT2E. It only supports `add` and `add_relu` now.
It will use post op sum if the extra input has the same dtype as output. Otherwise, it uses binary add.
```
+-------------------+--------------+---------------+
| Extra input dtype | Output dtype | Post op       |
+-------------------+--------------+---------------+
| Fp32/bf16         | fp32/bf16    | sum or add*   |
+-------------------+--------------+---------------+
| Fp32/bf16         | int8         | add           |
+-------------------+--------------+---------------+
| int8              | fp32/bf16    | not supported |
+-------------------+--------------+---------------+
| int8              | int8         | sum           |
+-------------------+--------------+---------------+
*Use sum if extra input and output have the same dtype; otherwise use add.
```

**Test plan**
python test_quantization.py -k test_qlinear_add_pt2e
python test_quantization.py -k test_qlinear_add_relu_pt2e

Pull Request resolved: https://github.com/pytorch/pytorch/pull/123144
Approved by: https://github.com/leslie-fang-intel, https://github.com/jgong5, https://github.com/jerryzh168
2024-04-09 04:56:37 +00:00
Zhicheng Yan
77643ed2eb [torch quantization]raise exception when OOM during combine histogram in observer (#123309)
Summary:
Even with the changes in D55347133, it is still possible to OOM in the histogram observer, because the size of the allocated tensor also depends on *downsample_rate*.

For example, I still see OOM due to an attempt to allocate a 10GB+ histogram tensor in a multi-task model.

To better handle the OOM issue, we use a *try-catch* clause to avoid OOM (a minimal sketch follows).
Empirically, we cap the size of a single histogram tensor at 1 GB.
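
A minimal sketch of the guard (names and the 1 GB arithmetic are illustrative, not the observer's real code):

```
import torch

MAX_HIST_NUMEL = (1 << 30) // 4  # ~1 GB of float32

def combine_histogram_safely(values, bins):
    try:
        if bins > MAX_HIST_NUMEL:
            raise RuntimeError(f"{bins}-bin histogram would exceed the 1 GB cap")
        return torch.histc(values, bins=bins)
    except RuntimeError:
        return None  # skip combining instead of OOM-ing the process
```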

Test Plan: Test the change for Multi-Task model (depth + segmentation)

Differential Revision: D55567292

Pull Request resolved: https://github.com/pytorch/pytorch/pull/123309
Approved by: https://github.com/jerryzh168
2024-04-06 03:15:02 +00:00
William Wen
cbde0f048b [dynamo, 3.12] enable tests disabled due to missing dynamo 3.12 support (#123300)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123300
Approved by: https://github.com/jansel, https://github.com/malfet, https://github.com/zou3519
2024-04-05 20:13:17 +00:00
Zhengxu Chen
b4c810491e [export] Temporarily block mutating ops in quant tests. (#122863)
Summary: After we migrate to torch.export, we won't see ops like add_ and mul_ due to functionalization. We are rolling out pre-dispatch export, so for now we just skip those mutating ops in tests.

Test Plan: buck run mode/opt caffe2/test/quantization:test_quantization

Reviewed By: tugsbayasgalan

Differential Revision: D55442019

Pull Request resolved: https://github.com/pytorch/pytorch/pull/122863
Approved by: https://github.com/clee2000
2024-04-01 16:41:13 +00:00
Xia, Weiwen
2cd3ef4777 Check scale dtype for fake_quantize_per_channel_affine_cachemask (#120987)
Fixes #120903

The scale for fake quant is assumed to be FP32 but not checked. If scales of double dtype are passed in, an internal error is raised: `TORCH_INTERNAL_ASSERT(!needs_dynamic_casting<func_t>::check(iter));` in aten/src/ATen/native/cpu/Loops.h
This PR adds a check of the scale dtype; a repro sketch follows.
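
A repro sketch for the check (per-channel over dim 1 of a (2, 3) input):

```
import torch

x = torch.randn(2, 3)
zero_point = torch.zeros(3, dtype=torch.int32)
scale_f32 = torch.full((3,), 0.1)                       # accepted
scale_f64 = torch.full((3,), 0.1, dtype=torch.float64)  # now rejected up front

torch.fake_quantize_per_channel_affine(x, scale_f32, zero_point, 1, 0, 255)
# With this PR, the double-dtype scale raises a clear error instead of
# tripping the internal assert in Loops.h:
torch.fake_quantize_per_channel_affine(x, scale_f64, zero_point, 1, 0, 255)
```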

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120987
Approved by: https://github.com/jgong5, https://github.com/jerryzh168
2024-03-30 07:32:32 +00:00
Mu-Chu Lee
966ae943df Add wrapper for fbgemm quantization operations (#122763)
Summary:
We add wrappers for fbgemm's packing so we can pass it through PT2 to the
lowering phase of AOTInductor.

Test Plan:
Included in commit.
test_quantized_ops::test_wrapped_fbgemm_linear_fp16

Reviewers:

Subscribers:

Tasks:

Tags:

Differential Revision: [D55433204](https://our.internmc.facebook.com/intern/diff/D55433204)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/122763
Approved by: https://github.com/jerryzh168
ghstack dependencies: #122762
2024-03-28 18:41:18 +00:00
Mu-Chu Lee
a3b30851c5 Add quantized.linear_unpacked_dynamic_fp16 (#122762)
Summary:

We add a new op, quantized.linear_unpacked_dynamic_fp16, which is essentially linear_dynamic_fp16 with a different (unpacked) weight/bias format.
This op does packing on the fly for each call, with standard at::Tensor weight & bias.

Test Plan:
Included in commit.
test_quantized_op::test_unpacked_qlinear_dynamic_fp16

Differential Revision: [D55433203](https://our.internmc.facebook.com/intern/diff/D55433203)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/122762
Approved by: https://github.com/jerryzh168
2024-03-28 18:02:27 +00:00
Jerry Zhang
5af839f86d [quant][pt2e] Enable observer sharing between different quantization specs (#122734)
Summary:

Right now we don't insert additional observers (i.e., we share observers) if qspec.dtype and qspec.is_dynamic match exactly.
Since fixed qparams quantization spec and derived quantization spec do not have an is_dynamic field currently, observer sharing does not happen between them and quantization spec. In this PR we fix the issue by
adding is_dynamic to all quantization specs (see the sketch after the TODO list below).

Note: SharedQuantizationSpec should probably be its own type in the future
TODO later:
(1). move all these fields (dtype, is_dynamic, quant_min, quant_max etc.) to QuantizationSpecBase,
(2). make SharedQuantizationSpec a separate type
(3). add quant_min/quant_max in observer sharing checking in pt2e/prepare.py
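
A sketch of two spec types now carrying the same `is_dynamic` field (constructor arguments assumed from the quantizer API):

```
import torch
from torch.ao.quantization.observer import MinMaxObserver
from torch.ao.quantization.quantizer import (
    FixedQParamsQuantizationSpec,
    QuantizationSpec,
)

act_spec = QuantizationSpec(
    dtype=torch.uint8,
    quant_min=0,
    quant_max=255,
    qscheme=torch.per_tensor_affine,
    observer_or_fake_quant_ctr=MinMaxObserver,
    is_dynamic=False,
)
sigmoid_out_spec = FixedQParamsQuantizationSpec(
    dtype=torch.uint8,
    scale=1.0 / 256.0,
    zero_point=0,
    quant_min=0,
    quant_max=255,
    qscheme=torch.per_tensor_affine,
    is_dynamic=False,  # field added by this PR, enabling observer sharing checks
)
```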

Test Plan:
python test/test_quantization.py -k test_fixed_qparams_qspec_observer_dedup
Reviewers:

Subscribers:

Tasks:

Tags:

Differential Revision: [D55396546](https://our.internmc.facebook.com/intern/diff/D55396546)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/122734
Approved by: https://github.com/andrewor14
2024-03-27 16:45:19 +00:00
haozhe.zhu
e0329cba8a [Quant] [PT2] Add SiLU into X86InductorQuantizer Conv2d Unary Annotation (#122267)
**Summary**
Add `SiLU` into X86InductorQuantizer Conv2d Unary Annotation

**TestPlan**
```
python -m pytest test_x86inductor_quantizer.py -k test_conv2d_unary
python -m pytest test_x86inductor_quantizer.py -k test_qat_conv2d_unary
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/122267
Approved by: https://github.com/leslie-fang-intel, https://github.com/jgong5
ghstack dependencies: #122266
2024-03-26 08:03:42 +00:00
PyTorch MergeBot
60bc29aa0b Revert "[Quant] [PT2] Add SiLU into X86InductorQuantizer Conv2d Unary Annotation (#122267)"
This reverts commit 2c6eeb26d3.

Reverted https://github.com/pytorch/pytorch/pull/122267 on behalf of https://github.com/jeanschmidt due to Not sure if this PR caused breakages in main rocm jobs, I'll remerge if reverting does not fix it ([comment](https://github.com/pytorch/pytorch/pull/122267#issuecomment-2015294491))
2024-03-22 15:04:30 +00:00
andrewor14
ea8e0c75c7 [quant][pt2] Fix create FQ with FixedQParamsQSpec (#122104)
Summary: Previously we just returned a _PartialWrapper object when
using FixedQParamsQuantizationSpec in QAT. This is wrong; we
should return a fake quantize (FQ) object instead.

Differential Revision: [D55021106](https://our.internmc.facebook.com/intern/diff/D55021106)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/122104
Approved by: https://github.com/jerryzh168
2024-03-22 14:23:05 +00:00
haozhe.zhu
2c6eeb26d3 [Quant] [PT2] Add SiLU into X86InductorQuantizer Conv2d Unary Annotation (#122267)
**Summary**
Add `SiLU` into X86InductorQuantizer Conv2d Unary Annotation

**TestPlan**
```
python -m pytest test_x86inductor_quantizer.py -k test_conv2d_unary
python -m pytest test_x86inductor_quantizer.py -k test_qat_conv2d_unary
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/122267
Approved by: https://github.com/leslie-fang-intel, https://github.com/jgong5
ghstack dependencies: #122266
2024-03-22 08:12:23 +00:00
haozhe.zhu
a337ee0a3a [Quant] Enable QConv2d with silu post op (#122266)
**Summary**
Enable QConv2d implementation with post op `silu`

**Test Plan**
```
python -m pytest test_quantized_op.py -k test_qconv2d_silu_pt2e
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/122266
Approved by: https://github.com/leslie-fang-intel, https://github.com/jgong5
2024-03-22 07:58:45 +00:00
Jerry Zhang
901ba2be86 [quant][pt2e] Add support for conv transpose + bn + {relu} weights fusion in PTQ (#122046)
Summary:

also added some utils in xnnpack_quantizer_utils.py:
* annotate_conv_transpose_bn_relu and annotate_conv_transpose_bn -> these are for QAT
* annotate_conv_transpose_relu

conv_transpose + bn weights fusion is performed automatically and cannot be disabled currently;
we can add support to disable this fusion later if needed

Test Plan:
python test/test_quantization.py -k test_conv_transpose_bn_fusion

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/122046
Approved by: https://github.com/andrewor14
2024-03-19 21:00:57 +00:00
Le-Zheng
25e00545bb [Quant][PT2E] Enable linear and linear-unary post-op gelu quant recipe for x86 inductor quantizer (#114853)
**Summary**
Add Gelu for linear-unary post-op quantization recipe to x86 inductor quantizer.

**Test plan**
python -m pytest test/quantization/pt2e/test_x86inductor_quantizer.py -k test_linear_unary_gelu
python test/test_quantization.py -k test_linear_unary_with_quantizer_api
Co-authored-by: leslie-fang-intel <leslie.fang@intel.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114853
Approved by: https://github.com/jgong5, https://github.com/leslie-fang-intel, https://github.com/jerryzh168
2024-03-14 01:46:35 +00:00
Shen Xu
159f30331f [quant][pt2e] Call sub-quantizers' transform_for_annotation in ComposableQuantizer (#121548)
Test Plan:
```
buck run caffe2/test:quantization_pt2e
```

Differential Revision: D54454707

Pull Request resolved: https://github.com/pytorch/pytorch/pull/121548
Approved by: https://github.com/jerryzh168
2024-03-12 02:59:12 +00:00
Jerry Zhang
a6a67da333 [quant] Add error check for input_edge annotation (#121536)
Summary:
Raise an error when an input edge in an annotation contains non-Node elements, like constant values.

Test Plan:
python test/test_quantization.py -k test_input_edge_sanity_check

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121536
Approved by: https://github.com/andrewor14
2024-03-09 06:13:04 +00:00
albanD
6791b0c09e Change default torch_function behavior to be disabled when torch_dispatch is defined (take 2) (#120632)
This does not introduce a new test but is tested by checking that all the classes we already have still behave as before now that they don't explicitly disable torch_function.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120632
Approved by: https://github.com/ezyang
2024-03-09 01:08:37 +00:00
Tugsbayasgalan (Tugsuu) Manlaibaatar
b474a523c6 Ban passing in free function into capture_pre_autograd_graph (#120817)
Summary: Today we don't allow free functions to be the tracing callable in torch.export. As part of migrating capture_pre_autograd_graph usages to torch.export, we need to ban free functions in capture_pre_autograd_graph as well.

Test Plan: CI

Differential Revision: D54319597

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120817
Approved by: https://github.com/zhxchen17, https://github.com/andrewor14
2024-03-01 19:38:58 +00:00
Nikita Shulga
98c4ba683e [EZ][BE] Fix ResourceWarning (#120886)
By closing the file handle

Fixes
```
/Users/nshulga/git/pytorch/pytorch/test/quantization/core/test_docs.py:132: ResourceWarning: unclosed file <_io.TextIOWrapper name='/Users/nshulga/git/pytorch/pytorch/docs/source/quantization.rst' mode='r' encoding='UTF-8'>
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120886
Approved by: https://github.com/seemethere, https://github.com/kit1980, https://github.com/Skylion007
2024-02-29 17:07:39 +00:00
andrewor14
91190d8087 [quant][pt2e] Relax model_is_exported input (#120720)
Summary: This commit relaxes the `model_is_exported` API to
work for `torch.nn.Module`s in addition to just
`torch.fx.GraphModule`s, simplifying downstream uses.

Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_model_is_exported

Differential Revision: [D54263935](https://our.internmc.facebook.com/intern/diff/D54263935)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/120720
Approved by: https://github.com/tugsbayasgalan
2024-02-28 18:32:03 +00:00
andrewor14
6ea4480818 [quant][pt2e] Add model_is_exported util function (#119726)
Summary: This commit adds the `model_is_exported` util function
for users to be able to easily tell what APIs to call to move
their models between train and eval modes. This has the
additional advantage of hiding the implementation of how we
detect a model is exported, in case the metadata format changes
in the future.
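
Usage sketch (assuming the util is exposed under `torch.ao.quantization.pt2e.export_utils`):

```
import torch
from torch.ao.quantization.pt2e.export_utils import model_is_exported

m = torch.nn.Linear(4, 4)
print(model_is_exported(m))  # False for a plain eager nn.Module
# After capture/export, the same check returns True, telling callers to use
# move_exported_model_to_train/eval instead of .train()/.eval().
```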

Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_model_is_exported

Differential Revision: [D53812972](https://our.internmc.facebook.com/intern/diff/D53812972)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119726
Approved by: https://github.com/tugsbayasgalan, https://github.com/albanD
2024-02-16 19:29:36 +00:00
gs-olive
e0f6fa6a7c Windows Dynamo Error Removal CI Check (#115969)
Rebase of #111313 onto `main`, for CI validation

Co-authored-by: Stella Laurenzo <stellaraccident@gmail.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115969
Approved by: https://github.com/PaliC, https://github.com/thiagocrepaldi
2024-02-14 21:14:36 +00:00
atalman
244b124bb8 Add linux cpu test for 3.12 (#117853)
This is continuation of work: https://github.com/pytorch/pytorch/pull/113987

Co-authored-by: albanD <desmaison.alban@gmail.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/117853
Approved by: https://github.com/albanD
2024-02-14 20:52:23 +00:00
PyTorch MergeBot
4a5b2cd6cb Revert "Windows Dynamo Error Removal CI Check (#115969)"
This reverts commit 45e7af5818.

Reverted https://github.com/pytorch/pytorch/pull/115969 on behalf of https://github.com/PaliC due to this pr ended up breaking some of our periodic tests ([comment](https://github.com/pytorch/pytorch/pull/115969#issuecomment-1942934386))
2024-02-14 01:11:46 +00:00
Sergii Dymchenko
bd9db6a9c7 Update to TorchFix 0.4.0 (#119424)
`torch.library.Library` updated to `torch.library._scoped_library` in files with many tests where it seems obvious to do, otherwise `noqa: TOR901` added - see https://github.com/pytorch/pytorch/pull/118318 for more context.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119424
Approved by: https://github.com/zou3519
2024-02-12 23:30:12 +00:00
Riley Dulin
44796682d0 [torch][ao] Fix module name filter for pytorch2 quantization for underscores (#119344)
Summary:
There was a bug in the module name filter for modules that already had an underscore
in their name, since the underscore was replaced with "dot" notation.
This is because underscores were assumed to always be module separators,
but this isn't the case for modules whose names contain an underscore.

Test Plan:
Added a unit test. Before this change, that test failed (due to applying the wrong
qscheme). Now it passes.

Differential Revision: D53502771

Pull Request resolved: https://github.com/pytorch/pytorch/pull/119344
Approved by: https://github.com/jerryzh168
2024-02-10 00:29:08 +00:00
Jerry Zhang
7082e24ce8 [quant][pt2e][bc-breaking] Set fold_quantize to True in convert_pt2e (#119425)
Summary: This is a follow-up to https://github.com/pytorch/pytorch/pull/118605 to set the `fold_quantize` flag to True in `convert_pt2e`

Test Plan: CI

Differential Revision: D53550237

Pull Request resolved: https://github.com/pytorch/pytorch/pull/119425
Approved by: https://github.com/andrewor14
2024-02-09 18:13:43 +00:00
gs-olive
45e7af5818 Windows Dynamo Error Removal CI Check (#115969)
Rebase of #111313 onto `main`, for CI validation

Co-authored-by: Stella Laurenzo <stellaraccident@gmail.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115969
Approved by: https://github.com/ezyang
2024-02-08 21:23:45 +00:00
Angela Yi
0827510fd3 [export] Remove torch._export.export (#119095)
XLA changes: https://github.com/pytorch/xla/pull/6486

Test Plan: CI

Differential Revision: D53316196

Pull Request resolved: https://github.com/pytorch/pytorch/pull/119095
Approved by: https://github.com/ydwu4, https://github.com/zhxchen17, https://github.com/tugsbayasgalan, https://github.com/avikchaudhuri, https://github.com/jerryzh168
2024-02-08 21:22:04 +00:00
PyTorch MergeBot
81abc2b249 Revert "[quant][pt2e][bc-breaking] Remove fold_quantize flag (#118701)"
This reverts commit 482d952e88.

Reverted https://github.com/pytorch/pytorch/pull/118701 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/118701#issuecomment-1932866964))
2024-02-07 20:56:16 +00:00
Jerry Zhang
482d952e88 [quant][pt2e][bc-breaking] Remove fold_quantize flag (#118701)
Summary:
This is a follow-up to https://github.com/pytorch/pytorch/pull/118605 to remove the `fold_quantize` flag from
`convert_pt2e`

Test Plan: CI

Differential Revision: D53247301

BC Breaking Note:

The `fold_quantize` flag is now set to True in `convert_pt2e`, and we fold the quantize ops into the weights by default, so users will see a model size reduction by default after PT2E quantization.
2.2
```
folded_model = convert_pt2e(model, fold_quantize=True)

non_folded_model = convert_pt2e(model)
```

2.3
```
folded_model = convert_pt2e(model)

non_folded_model = convert_pt2e(model, fold_quantize=False)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/118701
Approved by: https://github.com/andrewor14, https://github.com/leslie-fang-intel
2024-02-07 19:10:51 +00:00
andrewor14
6c1cca153e [quant][pt2e] Allow users to override train/eval behavior (#119091)
Summary: This commit adds a util for PT2E quantization users
to call `model.train()` and `model.eval()` without error.
Instead, these will automatically call the equivalent
`move_exported_model_to_train/eval` for the user, which only
switch behavior for special ops like dropout and batchnorm.
This enables users to onboard to the PT2E flow more easily.
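
A flow sketch; the util name here is inferred from the test name and may differ:

```
import torch
from torch._export import capture_pre_autograd_graph
from torch.ao.quantization import allow_exported_model_train_eval

model = torch.nn.Sequential(torch.nn.Linear(4, 4), torch.nn.Dropout(0.5))
m = capture_pre_autograd_graph(model, (torch.randn(1, 4),))
allow_exported_model_train_eval(m)
m.eval()   # now dispatches to move_exported_model_to_eval under the hood
m.train()  # and to move_exported_model_to_train
```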

Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_allow_exported_model_train_eval

Reviewers: jerryzh168, tugsbayasgalan, zhxchen17

Subscribers: jerryzh168, tugsbayasgalan, zhxchen17, supriyar

Differential Revision: [D53426636](https://our.internmc.facebook.com/intern/diff/D53426636)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119091
Approved by: https://github.com/jerryzh168, https://github.com/tugsbayasgalan, https://github.com/zhxchen17
2024-02-06 22:19:58 +00:00
andrewor14
70605d150b [quant][pt2] Add move_exported_model_to_train (#113492)
Summary: This is the equivalent API to `model.train()` for
exported models, analogous to `move_exported_model_to_eval`.

Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_move_exported_model_dropout
python test/test_quantization.py TestQuantizePT2E.test_move_exported_model_dropout_inplace
python test/test_quantization.py TestQuantizePT2E.test_move_exported_model_dropout_bn

Reviewers: jerryzh168, kimishpatel

Subscribers: jerryzh168, kimishpatel, supriyar
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113492
Approved by: https://github.com/jerryzh168, https://github.com/tugsbayasgalan
2024-02-02 17:39:47 +00:00
Jiaxu Zhu
b97ab47619 [pytorch][ao] Update PerChannelMinMaxObserver default _load_from_state_dict (#118659)
Summary:
When `version` is missing in the metadata, use `min_val/max_val` as keys instead of `max_vals/min_vals`

## Reasons
1. It's been almost 2 years since this change D30003700, which means most checkpoints now use the `max_val/min_val` keys

2. Most checkpoint dumps created via `model.state_dict()` don't have version info, which leads to a spurious `missing keys` error when loading the state_dict

Test Plan: CI

Differential Revision: D53233012

Pull Request resolved: https://github.com/pytorch/pytorch/pull/118659
Approved by: https://github.com/jerryzh168
2024-02-01 04:39:31 +00:00
Aaron Gokaslan
1562dae62c [BE]: Apply RUF025 dict.fromkeys preview rule (#118637)
Simplifies and optimizes dict construction using the `fromkeys` classmethod ctor. This also makes it really obvious when all the keys will have the same static value, which could be a bug if unintentional. It is also significantly faster than using a dict comprehension. The rule is in preview, but I am adding a forward fix for when it becomes stable.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/118637
Approved by: https://github.com/albanD
2024-01-30 20:46:54 +00:00
Peter Bell
1460334436 [quant] Remove deprecated torch.jit.quantized APIs (#118406)
The `torch.jit.quantized` interface has been deprecated since #40102 (June 2020).

BC-breaking message:

All functions and classes under `torch.jit.quantized` will now raise an error if
called/instantiated. This API has long been deprecated in favor of
`torch.ao.nn.quantized`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/118406
Approved by: https://github.com/jerryzh168
2024-01-27 18:32:45 +00:00
Jerry Zhang
af1ebc45d3 [quant][pt2e] Add fold_quantize=True for all convert_pt2e calls (#117797)
Summary: In preparation for enabling fold_quantize=True by default

Test Plan: CI

Differential Revision: D52879612

Pull Request resolved: https://github.com/pytorch/pytorch/pull/117797
Approved by: https://github.com/andrewor14
2024-01-24 17:54:13 +00:00
le-zheng
94f0472579 [Quant] [PT2] Add Hardswish into X86InductorQuantizer Conv2d Unary Annotation (#117488)
**Summary**
Add `hardswish`  into X86InductorQuantizer Conv2d Unary Annotation

**TestPlan**
```
python -m pytest test_x86inductor_quantizer.py -k test_conv2d_unary
python -m pytest test_x86inductor_quantizer.py -k test_qat_conv2d_unary
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/117488
Approved by: https://github.com/leslie-fang-intel, https://github.com/jgong5
ghstack dependencies: #117487
2024-01-20 01:37:33 +00:00
le-zheng
f115f1cde1 [Quant] Enable QConv2d with hardswish post op (#117487)
**Summary**
Enable QConv2d implementation with post op `hardswish`

**Test Plan**
```
python -m pytest test_quantized_op.py -k test_qconv2d_hardswish_pt2e
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/117487
Approved by: https://github.com/leslie-fang-intel, https://github.com/jgong5
2024-01-19 13:24:06 +00:00
rzou
db1a6eda9e [codemod] markDynamoStrictTest batch 22 (#117729)
[codemod] markDynamoStrictTest test_autograd
[codemod] markDynamoStrictTest test_ao_sparsity
[codemod] markDynamoStrictTest test_jit
[codemod] markDynamoStrictTest test_quantization
Pull Request resolved: https://github.com/pytorch/pytorch/pull/117729
Approved by: https://github.com/bdhirsh
2024-01-18 16:59:26 +00:00
Jerry Zhang
8f1bc876b2 [quant] Support custom qmin/qmax for activation and weight for xnnpack quantizer (#117305)
Summary:
As titled; this allows us to experiment with 4-bit quant in XNNPACK (sketch below).
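
A sketch, with the qmin/qmax parameter names assumed from this PR:

```
from torch.ao.quantization.quantizer.xnnpack_quantizer import (
    XNNPACKQuantizer,
    get_symmetric_quantization_config,
)

quantizer = XNNPACKQuantizer().set_global(
    get_symmetric_quantization_config(
        is_per_channel=True,
        is_dynamic=True,
        weight_qmin=-8,  # 4-bit signed range carried in int8
        weight_qmax=7,
    )
)
```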

Test Plan:
python test/test_quantization.py -k test_dynamic_linear_int4_weight

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/117305
Approved by: https://github.com/digantdesai
2024-01-17 03:22:49 +00:00
Jerry Zhang
3e397cefc5 Add uint1 to uint7 dtypes (#117208)
Summary:
These dtypes are added since we see more demand for sub-byte dtypes, especially with
the popularity of LLMs (https://pytorch.org/blog/accelerating-generative-ai-2/#step-4-reducing-the-size-of-the-weights-even-more-with-int4-quantization-and-gptq-2021-toks)

Note these are just placeholders; operator support for these dtypes will be implemented with tensor subclasses.
e.g. torch.empty(..., dtype=torch.uint1) will return a tensor subclass of uint1 that supports different operations like bitwise ops, add, mul, etc. (to be added later)

Also note that these are not quantized data types; we'll implement quantization logic with tensor subclasses backed by these dtypes as well.
e.g. `Int4GroupedQuantization(torch.Tensor)` will be implemented with torch.uint4 Tensors (see https://github.com/pytorch-labs/ao/pull/13 as an example)
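
The placeholders are visible immediately, even though ops come later via tensor subclasses:

```
import torch

# placeholder sub-byte dtypes; no operator support yet as of this commit
print(torch.uint1, torch.uint2, torch.uint3, torch.uint4,
      torch.uint5, torch.uint6, torch.uint7)
```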

Test Plan:
CIs
python test/test_quantization.py -k test_uint1_7_dtype

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/117208
Approved by: https://github.com/ezyang
2024-01-13 01:09:23 +00:00
rzou
7b753cc7b8 Skip some slow tests (under Dynamo) (#117389)
Otherwise these may cause timeouts.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/117389
Approved by: https://github.com/jerryzh168, https://github.com/voznesenskym
ghstack dependencies: #117318, #117320
2024-01-12 22:18:07 +00:00
Xia, Weiwen
94db6578cc [Quant] Add dynamic quantization config for x86 inductor backend (#115337)
**Description**
Add dynamic quantization config for x86 inductor backend.
To support the QKV structure in self-attention, we removed an assertion in the port-metadata pass that required a single dequantize node after a quantize node.

**Test plan**
```
python test/test_quantization.py -k TestQuantizePT2EX86Inductor.test_dynamic_quant_linear
python test/test_quantization.py -k TestQuantizePT2EX86Inductor.test_qat_dynamic_quant_linear
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115337
Approved by: https://github.com/jgong5, https://github.com/jerryzh168
2024-01-10 11:33:37 +00:00
Huy Do
907e80239d Fix broken lint after #117052 (#117080)
https://hud.pytorch.org/pr/pytorch/pytorch/117052#20318344490 breaks lint, forward fixing with `lintrunner -a`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/117080
Approved by: https://github.com/atalman, https://github.com/clee2000, https://github.com/Skylion007
2024-01-10 00:44:19 +00:00
Max Ren
d2033a0639 [quant][pt2e][xnnpack_quantizer] add support for linear_relu (#117052)
Add support for linear_relu annotation in XNNPACKQuantizer; this allows the input to linear and the output of relu to share the same quantization parameters.

Differential Revision: [D52574086](https://our.internmc.facebook.com/intern/diff/D52574086/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/117052
Approved by: https://github.com/jerryzh168, https://github.com/digantdesai
2024-01-09 23:19:52 +00:00
Jerry Zhang
28e2e12b2a [quant][be] enable xnnpack_quantizer tests to run in internal CI (#116911)
Summary: fixed an import problem for test_xnnpack_quantizer so that it can run in CI

Test Plan:
internal CI
sanity check: buck2 test 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- --exact 'caffe2/test/quantization:test_quantization - test_conv2d (caffe2.test.quantization.pt2e.test_xnnpack_quantizer.TestXNNPACKQuantizer)'

Differential Revision: D52576449

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116911
Approved by: https://github.com/mcr229
2024-01-08 23:43:47 +00:00
Aaron Gokaslan
3fe437b24b [BE]: Update flake8 to v6.1.0 and fix lints (#116591)
Updates flake8 to v6.1.0 and fixes a few lints using sed and some ruff tooling.
- Replace `assert(0)` with `raise AssertionError()`
- Remove extraneous parenthesis i.e.
  - `assert(a == b)` -> `assert a == b`
  - `if(x > y or y < z):`->`if x > y or y < z:`
  - And `return('...')` -> `return '...'`

Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116591
Approved by: https://github.com/albanD, https://github.com/malfet
2024-01-03 06:04:44 +00:00
Jerry Zhang
41f265b06a [quant][pt2e] Preserve numeric_debug_handle in quantization flows (#116477)
Summary:
We introduced `node.meta["numeric_debug_handle"]` in https://github.com/pytorch/pytorch/pull/114315 to
indicate the numeric debug handle for values in the graph. In this PR we support preserving this field
in prepare and convert so that we can use it for numerical debugging.

Next: we also want to preserve this field through deepcopy of GraphModule as well
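
A quick way to inspect the preserved field on a prepared/converted GraphModule `m` (sketch; `m` comes from the prepare/convert flow):

```
for node in m.graph.nodes:
    if "numeric_debug_handle" in node.meta:
        print(node.name, node.meta["numeric_debug_handle"])
```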

Test Plan:
python test/test_quantization.py -k test_quantize_pt2e_preserve_handle

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/116477
Approved by: https://github.com/tugsbayasgalan
2024-01-03 03:39:00 +00:00
Le-Zheng
95a86ed9ca [Quant] Add int8 linear op gelu for quantization PT2E with Inductor. input is an int8 CPU tensor; weight is an int8 MkldnnCPU tensor (#114852)
**Summary**
Enable Int8 Linear Gelu post-op fusions for stock PyTorch Inductor. The input is an int8 CPU tensor and the weight is an int8 MkldnnCPU tensor.

**Test plan**
python test/test_quantization.py -k test_qlinear_gelu_pt2e

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114852
Approved by: https://github.com/leslie-fang-intel, https://github.com/jgong5
2024-01-02 08:11:26 +00:00
Aaron Gokaslan
bd10fea79a [BE]: Enable F821 and fix bugs (#116579)
Fixes #112371

I tried to fix as many of the bugs as I could; for a few I could not figure out the proper fix, so I left them with noqas.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116579
Approved by: https://github.com/ezyang
2024-01-01 08:40:46 +00:00
Jerry Zhang
8173d98c57 [quant][be] Skip conv-bn folding when there are no batchnorm ops (#116440)
Summary:
`_fold_conv_bn_qat` is currently taking a long time, so we skip it when it's not necessary;
we can have follow-up fixes to actually reduce the patterns or cache them if possible

Test Plan:
uncomment the print in `test_speed`, run

python test/test_quantization.py -k test_speed

and make sure the convert time is low, e.g. 0.1s instead of 8-9 seconds

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/116440
Approved by: https://github.com/andrewor14
2023-12-28 23:33:21 +00:00
Nikita Shulga
e86636266f [Quantized] Fixed equal_quantized_cpu for QUInt4 (#116307)
- Return false if scalar_type is different (because QInt8 and QUInt8 have identical item_size but shouldn't be compared by comparing data)
- Compute data_size correctly for QUInt4x2 and QUInt2x4 dtypes
- Add regression test

Fixes https://github.com/pytorch/pytorch/issues/116087

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116307
Approved by: https://github.com/jerryzh168
2023-12-26 21:52:28 +00:00
leslie-fang-intel
81cebca3d2 [Inductor] [Quant] Fix QConv Binary Inplace Layout Issue (#115613)
This pull request primarily addresses two issues to resolve the `QConvPointWiseBinaryPT2E` layout problem:

- Following the changes made in 611a7457ca, for `QConvPointWiseBinaryPT2E` with post-op `sum`, we should also utilize `NoneLayout` and return `accum` instead of `QConvPointWiseBinaryPT2E`.

- Additionally, this pull request fixes an issue in the `_quantized_convolution_onednn` implementation. Given that we expect `accum` to be changed in place, we should avoid copying `accum` by changing the memory format or data type inside the kernel implementation. Instead, we have moved the necessary changes of memory format or data type to the lowering of `QConvPointWiseBinaryPT2E`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115613
Approved by: https://github.com/jgong5, https://github.com/oulgen
ghstack dependencies: #116172
2023-12-24 08:04:29 +00:00
leslie-fang-intel
dfb6815170 [Reland] [PT2] [Quant] Change the QConv2d Binary post op name from add to sum (#116172)
**Summary**
Re-land https://github.com/pytorch/pytorch/pull/115329. Opened a new PR since the original branch has been deleted.
Change the QConv2d Binary fusion post op name from `add` to `sum`, since we are actually using OneDNN `post op sum` instead of `Binary_Add` for now.

**TestPlan**
```
python -m pytest test_quantized_op.py -k test_qconv2d_sum_pt2e
python -m pytest test_quantized_op.py -k test_qconv2d_sum_relu_pt2e
python -m pytest test_quantized_op.py -k test_qconv2d_sum_relu_float_output_pt2e
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116172
Approved by: https://github.com/kit1980
2023-12-24 08:00:21 +00:00
PyTorch MergeBot
b6d0d0819a Revert "[PT2] [Quant] Change the QConv2d Binary post op name from add to sum (#115329)"
This reverts commit 9ae0e62929.

Reverted https://github.com/pytorch/pytorch/pull/115329 on behalf of https://github.com/jeanschmidt due to Breaking internal builds, please check internal diff to get the list and logs, @jerryzh168 please support the author in order to get these changes merged and landed ([comment](https://github.com/pytorch/pytorch/pull/115329#issuecomment-1863021726))
2023-12-19 15:52:57 +00:00
leslie-fang-intel
9ae0e62929 [PT2] [Quant] Change the QConv2d Binary post op name from add to sum (#115329)
**Summary**
Change the QConv2d Binary fusion post op name from `add` to `sum`, since we are actually using OneDNN `post op sum` instead of `Binary_Add` for now.

**TestPlan**
```
python -m pytest test_quantized_op.py -k test_qconv2d_sum_pt2e
python -m pytest test_quantized_op.py -k test_qconv2d_sum_relu_pt2e
python -m pytest test_quantized_op.py -k test_qconv2d_sum_relu_float_output_pt2e
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115329
Approved by: https://github.com/jgong5, https://github.com/jerryzh168
2023-12-15 05:10:47 +00:00
angelayi
dd42201cb8 [export] Preserve FQN in export_to_torch_ir (#115462)
AOTInductor currently relies of export_to_torch_ir to generate a graph, and passes it to inductor to generate the .so. They would like the FQN to be consistent so that they can easily find/update the weights in the .so.

Note that since export flattens all modules into a single computational graph, we will change the FQNs in the original module by replacing all periods with underscores. For example, `foo.child1param`, which points to a submodule named `foo`'s parameter named `child1param`, will be renamed to `foo_child1param` since we no longer have the submodule `foo`. This is done simply via `name.replace(".", "_")`.

Generated AOTInductor C++ code: https://www.internalfb.com/phabricator/paste/view/P900120950?lines=377-355%2C354

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115462
Approved by: https://github.com/tugsbayasgalan
2023-12-13 04:58:47 +00:00
Aaron Gokaslan
794545c11f [BE]: Enable RUF015 codebase wide (#115507)
Constant-time access of the first value in a collection. This is a constant-time operation, instead of converting the collection to a list to get the first item, which is linear. The rule is now enabled, autofixed, and enforced.
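
A representative autofix, illustrative of what the rule rewrites:

```python
items = {"a": 1, "b": 2, "c": 3}
first = list(items)[0]     # O(n): materializes the whole list just to index it
first = next(iter(items))  # O(1): constant-time access to the first element
```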

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115507
Approved by: https://github.com/malfet
2023-12-11 15:51:01 +00:00
HDCharles
b5d3d3ebf0 [ao] making hist_obs handle torch.inf and closeby values (#103467)
Summary: This PR does 2 things:

1) Previously this would simply error; now it will ignore any torch.inf values that it receives. Note: the code checks for torch.inf after aminmax, so that if no torch.inf values are found, performance is relatively unchanged.

2) As mentioned in https://github.com/pytorch/pytorch/issues/100051,
values close to (but not quite at) the maximum/minimum float value could
overflow to infinity in the course of _adjust_min_max() (when such a large
value is multiplied by something in the middle of a calculation that would
otherwise produce a non-inf result). This was fixed by rearranging the order
of operations for the lines in question without altering the actual
equations. Specifically, where operations in lines 1095, 1098 and 1100
multiply and divide large values, it's better to divide the two large values
before multiplying, rather than multiplying the two large values together
(creating overflow) before dividing, as the code had been doing.
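
An illustrative sketch of the overflow behavior (not the observer's actual equations):

```python
import torch

# Near float32 max, multiplying two large values first overflows,
# while dividing first keeps intermediates in range.
a = torch.tensor(3e38, dtype=torch.float32)
b = torch.tensor(2e38, dtype=torch.float32)
d = torch.tensor(2e38, dtype=torch.float32)
print(a * b / d)  # inf: the product a * b overflows float32
print(a / d * b)  # tensor(3.0000e+38): finite, mathematically the same value
```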

Test Plan: python test/test_quantization.py TestObserver.test_histogram_observer_ignore_infinity

python test/test_quantization.py TestObserver.test_histogram_observer_handle_close_to_infinity
Differential Revision: [D51489345](https://our.internmc.facebook.com/intern/diff/D51489345)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103467
Approved by: https://github.com/andrewor14
2023-12-08 21:41:31 +00:00
Jerry Zhang
cc8f6f56dc [quant][pt2e] Add convert callback to Observer module (#115001)
Summary:
This is to allow easier extension of the quant workflow in the future, as we are seeing more
diverse ways of doing quantization.

Putting this up for feedback first.

Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_observer_callback

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115001
Approved by: https://github.com/kimishpatel
2023-12-08 13:47:37 +00:00
Jerry Zhang
ecba053cff [quant][pt2e] XNNPACKQuantizer skip inserting observers for non-float Tensors (#114999)
Summary:
att

Test Plan:
python test/test_quantization.py -k test_add_mul_long

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114999
Approved by: https://github.com/kimishpatel, https://github.com/guangy10
2023-12-07 22:13:36 +00:00
Jerry Zhang
a93b9ee9d8 [quant][be] Add a test for per channel quant for groupwise conv (#115224)
Summary:
just making sure this works

Test Plan:
python test/test_quantization.py -k test_groupwise_per_channel_quant

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115224
Approved by: https://github.com/andrewor14
2023-12-07 04:46:20 +00:00
leslie-fang-intel
7ec145bfed [Quant] [PT2] Fix XNNPACKQuantizer set_module_type issue (#115252)
**Summary**
Fix the issue https://github.com/pytorch/pytorch/issues/115251. The root cause is that we passed the `filter_fn` parameter of `find_sequential_partitions` in the wrong position; use a keyword argument to fix this.
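
A schematic sketch of the bug class; the real `find_sequential_partitions` signature is simplified to a stand-in here:

```python
def find_sequential_partitions(gm, partition_types, flag=True, filter_fn=None):
    # simplified stand-in, not the real signature
    return flag, filter_fn

my_filter = lambda partition: True
print(find_sequential_partitions("gm", ["conv"], my_filter))            # bug: filter lands in `flag`
print(find_sequential_partitions("gm", ["conv"], filter_fn=my_filter))  # fix: pass by keyword
```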

**Test Plan**
```
python -u -m pytest -s -v test_quantization.py -k test_set_module_type_case_2
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115252
Approved by: https://github.com/jgong5, https://github.com/jerryzh168
2023-12-07 03:08:20 +00:00
leslie-fang-intel
1489e4bcf3 [Quant] [PT2] Enable batchnorm in _move_exported_model_to_eval (#114547)
**Summary**
Add standalone batchnorm into `_move_exported_model_to_eval` to move it from training mode into eval mode
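
A minimal sketch of what moving a standalone batchnorm to eval changes numerically:

```python
import torch

bn = torch.nn.BatchNorm2d(8)
x = torch.randn(2, 8, 4, 4)
bn.train(); y_train = bn(x)  # normalizes with batch statistics, updates running stats
bn.eval();  y_eval = bn(x)   # normalizes with stored running_mean / running_var
```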

**Test Plan**
```
python -m pytest test_mkldnn_pattern_matcher.py -k test_qat_bn_conv2d
python -u -m pytest -s -v test_quantize_pt2e.py -k test_bn_move_exported_model_to_eval
```

Differential Revision: [D51853407](https://our.internmc.facebook.com/intern/diff/D51853407)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114547
Approved by: https://github.com/jgong5, https://github.com/andrewor14
2023-12-06 19:51:22 +00:00
leslie-fang-intel
4a624d1f8a [Quant] [PT2] Enable QLinear input with multi dims (#113733)
**Summary**
The previous QLinear implementation assumed that inputs are 2-dimensional. In this update, we have modified QLinear to accept inputs with more than 2 dimensions, incorporating input and output reshaping accordingly.
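
A rough sketch of the reshaping trick in plain float PyTorch (shapes illustrative):

```python
import torch

in_f, out_f = 4, 6
w, b = torch.randn(out_f, in_f), torch.randn(out_f)
x = torch.randn(2, 3, 5, in_f)                                 # input with dim > 2
y = (x.reshape(-1, in_f) @ w.t() + b).reshape(*x.shape[:-1], out_f)
assert torch.allclose(y, torch.nn.functional.linear(x, w, b), atol=1e-6)
```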

**Test Plan**
```
python -u -m pytest -s -v test_quantized_op.py -k test_qlinear_pt2e
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113733
Approved by: https://github.com/jgong5, https://github.com/eellison
2023-12-06 01:16:51 +00:00
Natalia Gimelshein
b8ce05456c enable cat for cuda bits types (#115044)
It was already working for cpu, so bring parity.
Also, slightly reduce number of compiled kernels by using OpaqueType.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115044
Approved by: https://github.com/malfet
2023-12-06 00:05:18 +00:00
PyTorch MergeBot
063423edf5 Revert "enable cat for cuda bits types (#115044)"
This reverts commit 4cf97c40f7.

Reverted https://github.com/pytorch/pytorch/pull/115044 on behalf of https://github.com/malfet due to This breaks ROCM ([comment](https://github.com/pytorch/pytorch/pull/115044#issuecomment-1841494814))
2023-12-05 19:37:25 +00:00
Natalia Gimelshein
4cf97c40f7 enable cat for cuda bits types (#115044)
It was already working for cpu, so bring parity.
Also, slightly reduce number of compiled kernels by using OpaqueType.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115044
Approved by: https://github.com/malfet
2023-12-05 17:14:42 +00:00
Jerry Zhang
1474dad28c [quant][pt2e][xnnpack] Add support for QAT dynamic quantization for linear in XNNPACKQuantizer (#113288)
Summary:
The FX graph mode quant workflow and the PT2E flow rely on the `is_dynamic` flag in the observer/QuantizationSpec to
convert an observer to the dynamic quantization pattern (choose_qparams -> q -> dq). This PR adds an is_dynamic flag
to all observers so that it's possible to convert these observers to the pattern.

However, this dynamic quantization pattern (choose_qparams -> q -> dq) is actually only valid for MovingAverageObserver(averaging_constant=1),
so that the computation before convert and after convert matches in the context of QAT. So we'll have some sanity
checks in the other observers to make sure is_dynamic is False.
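
A plain-tensor sketch of the (choose_qparams -> q -> dq) pattern; this is not the actual decomposed op signatures:

```python
import torch

def dynamic_qdq(x, qmin=-128, qmax=127):
    scale = (x.max() - x.min()).clamp(min=1e-9) / (qmax - qmin)  # choose_qparams
    zp = qmin - torch.round(x.min() / scale)
    q = torch.clamp(torch.round(x / scale) + zp, qmin, qmax)     # quantize
    return (q - zp) * scale                                      # dequantize

print(dynamic_qdq(torch.randn(8)))
```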

Test Plan:
python test/test_quantization.py TestXNNPACKQuantizer.test_qat_dynamic_linear

Differential Revision: [D51124725](https://our.internmc.facebook.com/intern/diff/D51124725)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113288
Approved by: https://github.com/kimishpatel
2023-12-04 23:06:38 +00:00
Jerry Zhang
8f164017ee [quant][pt2e][xnnpack] XNNPACKQuantizer skip quantization for input and output to workaround histogram observer problem (#113405)
Summary:
att; this is because the histogram observer does not work for a corner case in mobilebert (observing a scalar tensor of the float32 max value):
the histc operator errors out when the value is larger than a certain threshold.
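
A sketch of the reported corner case (illustrative, not the PR's test):

```python
import torch
from torch.ao.quantization.observer import HistogramObserver

# A scalar tensor holding float32 max, which histc reportedly cannot bin.
obs = HistogramObserver()
obs(torch.tensor([torch.finfo(torch.float32).max]))
```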

Test Plan:
python test/test_quantization.py -k test_mul_float32_max

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113405
Approved by: https://github.com/mcr229
2023-12-02 00:44:42 +00:00
PyTorch MergeBot
c6e975bc0e Revert "[Quant] [PT2] Enable batchnorm in _move_exported_model_to_eval (#114547)"
This reverts commit bab054063c.

Reverted https://github.com/pytorch/pytorch/pull/114547 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/114547#issuecomment-1836612143))
2023-12-01 18:52:51 +00:00
Huy Do
5687285ca5 Skip quantization tests running from BaseTestQuantizePT2EQAT_ConvBn (#114829)
Summary: This is a follow-up from D51428979.  These tests should be run only from `TestQuantizePT2EQAT_ConvBn1d` and `TestQuantizePT2EQAT_ConvBn2d`. The base class doesn't have the necessary setup to run them and is expected to fail.  I previously ignored the failures on D51428979, and these failed tests have been disabled.
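
One common way to express such a skip, as an illustrative sketch rather than necessarily this PR's code:

```python
import unittest

class BaseTestQuantizePT2EQAT_ConvBn(unittest.TestCase):
    def setUp(self):
        # Only the concrete 1d/2d subclasses have the setup needed to run these.
        if self.__class__ is BaseTestQuantizePT2EQAT_ConvBn:
            self.skipTest("Skipping test running from BaseTestQuantizePT2EQAT_ConvBn")
```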

Test Plan:
Run an example test there and confirm that two versions from `TestQuantizePT2EQAT_ConvBn1d` and `TestQuantizePT2EQAT_ConvBn2d` are run while the one from `BaseTestQuantizePT2EQAT_ConvBn` is skipped

```
$ buck2 test 'fbcode//mode/opt' fbcode//caffe2/test/quantization:test_quantization -- --run-disabled 'caffe2/test/quantization:test_quantization - test_qat_conv_bn_fusion_literal_args'
File changed: fbcode//caffe2/test/quantization/pt2e/test_quantize_pt2e_qat.py
↷ Skip: caffe2/test/quantization:test_quantization - test_qat_conv_bn_fusion_literal_args (caffe2.test.quantization.pt2e.test_quantize_pt2e_qat.BaseTestQuantizePT2EQAT_ConvBn) (0.0s)

/data/users/huydo/fbsource/buck-out/v2/gen/fbcode/689edf96bfbb5738/caffe2/test/quantization/__test_quantization__/test_quantization#link-tree/torch/_utils_internal.py:230: NCCL_DEBUG env var is set to None
/data/users/huydo/fbsource/buck-out/v2/gen/fbcode/689edf96bfbb5738/caffe2/test/quantization/__test_quantization__/test_quantization#link-tree/torch/_utils_internal.py:239: NCCL_DEBUG is WARN from /etc/nccl.conf
INFO:2023-11-29 19:20:33 3049620:3049620 CuptiActivityProfiler.cpp:225] CUDA versions. CUPTI: 18; Runtime: 12000; Driver: 12000
/data/users/huydo/fbsource/buck-out/v2/gen/fbcode/689edf96bfbb5738/caffe2/test/quantization/__test_quantization__/test_quantization#link-tree/torch/_utils_internal.py:158: DeprecationWarning: This is a NOOP in python >= 3.7, its just too dangerous with how we write code at facebook. Instead we patch os.fork and multiprocessing which can raise exceptions if a deadlock would happen.
  threadSafeForkRegisterAtFork()
test_qat_conv_bn_fusion_literal_args (caffe2.test.quantization.pt2e.test_quantize_pt2e_qat.BaseTestQuantizePT2EQAT_ConvBn) ... skipped 'Skipping test running from BaseTestQuantizePT2EQAT_ConvBn'

----------------------------------------------------------------------
Ran 1 test in 0.001s

OK (skipped=1)

Skipped: Skipping test running from BaseTestQuantizePT2EQAT_ConvBn

Buck UI: https://www.internalfb.com/buck2/7b70fb33-44cb-4745-92e1-64031bb413b8
Test UI: https://www.internalfb.com/intern/testinfra/testrun/6473924660765251
Network: Up: 12KiB  Down: 0B  (reSessionID-0399f0c3-e671-4770-a41c-75c06ae709d5)
Jobs completed: 11. Time elapsed: 1:07.2s.
Cache hits: 0%. Commands: 1 (cached: 0, remote: 0, local: 1)
Tests finished: Pass 2. Fail 0. Fatal 0. Skip 1. Build failure 0
```

Differential Revision: D51694959

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114829
Approved by: https://github.com/clee2000
2023-12-01 05:13:27 +00:00
Jerry Zhang
64fd706b21 [quant][pt2e] Add generate_numeric_debug_handle pass (#114315)
Summary:
This is a util for the numeric suite in PT2 export, so that we can build
a more streamlined UX for numerical debugging in the quant + ExecuTorch stack.

Test Plan:
python test/test_quantization.py TestGenerateNumericDebugHandle

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114315
Approved by: https://github.com/zhxchen17
2023-12-01 03:38:17 +00:00
leslie-fang-intel
fd7201029a [Quant] [PT2] Enable Inplace Dropout in _move_exported_model_to_eval (#114725)
**Summary**
Enable Inplace Dropout replacement in `_move_exported_model_to_eval`

**Test Plan**
```
python -u -m pytest -s -v test_quantize_pt2e.py -k test_move_exported_model_to_eval
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114725
Approved by: https://github.com/andrewor14, https://github.com/jgong5
ghstack dependencies: #114547
2023-11-30 04:43:22 +00:00
leslie-fang-intel
bab054063c [Quant] [PT2] Enable batchnorm in _move_exported_model_to_eval (#114547)
**Summary**
Add standalone batchnorm into `_move_exported_model_to_eval` to move it from training mode into eval mode

**Test Plan**
```
python -m pytest test_mkldnn_pattern_matcher.py -k test_qat_bn_conv2d
python -u -m pytest -s -v test_quantize_pt2e.py -k test_bn_move_exported_model_to_eval
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114547
Approved by: https://github.com/jgong5, https://github.com/andrewor14
2023-11-30 04:31:27 +00:00
leslie-fang-intel
8c1f65dc2b [Quant] [PT2] Add Hardtanh and ReLU6 into X86InductorQuantizer Conv2d Unary Annotation (#114579)
**Summary**
Add `Hardtanh` and `ReLU6` into X86InductorQuantizer Conv2d Unary Annotation

**TestPlan**
```
python -m pytest test_x86inductor_quantizer.py -k test_conv2d_unary
python -m pytest test_x86inductor_quantizer.py -k test_qat_conv2d_unary
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114579
Approved by: https://github.com/jgong5, https://github.com/jerryzh168
ghstack dependencies: #114578
2023-11-28 07:18:00 +00:00
leslie-fang-intel
8a35a68bb7 [Quant] Enable QConv2d with hardtanh post op (#114578)
**Summary**
Enable QConv2d implementation with post op `hardtanh`

**Test Plan**
```
python -m pytest test_quantized_op.py -k test_qconv2d_hardtanh_pt2e
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114578
Approved by: https://github.com/jgong5, https://github.com/jerryzh168
2023-11-28 07:13:01 +00:00
leslie-fang-intel
74370a8a9d Add adaptive_avg_pool2d and flatten into x86 Inductor Quantizer recipe (#114442)
**Summary**
Add adaptive_avg_pool2d and flatten into x86 Inductor Quantizer recipe

**Test Plan**
```
python -m pytest test_x86inductor_quantizer.py -k test_adaptive_avg_pool2d_recipe
python -m pytest test_x86inductor_quantizer.py -k test_flatten_recipe
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114442
Approved by: https://github.com/jgong5, https://github.com/jerryzh168
2023-11-28 01:35:57 +00:00
Huy Do
cf9f3ae8d8 Skip an example of test_instance_norm when running internally due to its size (#114452)
After https://github.com/pytorch/pytorch/pull/113420, `torch.unique` now includes a call to `torch.sort` and that call is slow when running in dev mode, i.e. `@fbcode//mode/dev`.  This causes the test to take more than 10 minutes and time out internally [T170720856](https://www.internalfb.com/intern/tasks/?t=170720856).  Running the test in `@fbcode//mode/opt` is fine, so please let me know if there is a way to set that.  Otherwise, this change will skip the largest example when running in sandcastle internally.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114452
Approved by: https://github.com/malfet
2023-11-28 01:11:19 +00:00
leslie-fang-intel
e592b9a469 [Quant] [PT2] Fix an issue in Conv Binary Quantization Annotation (#114540)
**Summary**
When annotating a conv-binary pattern, we should skip the pattern if the conv node has more than one user.

**Test Plan**
```
python -m pytest test_x86inductor_quantizer.py -k test_conv2d_binary2
python -m pytest test_x86inductor_quantizer.py -k test_qat_conv2d_binary2
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114540
Approved by: https://github.com/jgong5, https://github.com/jerryzh168
2023-11-28 01:06:48 +00:00
Xia, Weiwen
d18e6b07aa Overload vec::dequantize to eliminate rounding error for quantized sigmoid (#114098)
**Description**
Fix #107030
Dequantize X by `(x_val - zp) * scale` instead of `x_val * scale + (-zp * scale)` to eliminate rounding error.
Now this overload is used for sigmoid only.
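
A sketch of why the order of operations matters in float32 (values illustrative):

```python
import torch

x = torch.arange(0, 256, dtype=torch.float32)  # raw quantized values
zp, scale = 128.0, 0.1                         # 0.1 is inexact in float32
a = (x - zp) * scale                           # new form: exact subtraction, one rounding
b = x * scale + (-zp * scale)                  # old form: multiple roundings accumulate
print((a - b).abs().max())                     # typically nonzero
```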

Performance impact:
![image](https://github.com/pytorch/pytorch/assets/12522207/655abd16-7d9d-4a9a-8c59-327ebf39157a)
Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz (Ice Lake)

**Test plan**
`python test_quantization.py TestQuantizedOps.test_sigmoid_dequantize_rounding_error`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114098
Approved by: https://github.com/jgong5, https://github.com/jerryzh168
2023-11-23 04:33:57 +00:00
HDCharles
18e1a37c4e [ao] updating embedding_bag support for fx and eager (#107623)
Summary: Our docs said dynamic embedding_bag wasn't supported, but
it actually is (at least at the same level as embeddings); it just wasn't previously tested or listed.

Test Plan: python test/test_quantization.py -k "test_embedding"

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107623
Approved by: https://github.com/jerryzh168
2023-11-21 03:54:00 +00:00
drisspg
2b97f5a9a1 Disallow fp8 type promotion (#113975)
Fixes #113663

As well as updating the promotion logic to disallow automatic type promotion between fp8 types, this PR also cleans up the table entries.
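
A minimal sketch of the new behavior (dtypes real, outcome as described by this PR):

```python
import torch

a = torch.randn(2).to(torch.float8_e4m3fn)
b = torch.randn(2).to(torch.float8_e5m2)
a + b  # expected to raise after this PR: fp8 dtypes no longer promote implicitly
```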

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113975
Approved by: https://github.com/albanD, https://github.com/malfet
2023-11-20 19:47:43 +00:00
andrewor14
e5102ccd27 [quant][pt2] Support conv1d-bn QAT fusion (#113714)
Summary: Previously the PT2 QAT code only supported conv2d-bn.
This commit extends all existing QAT fusion support to conv1d-bn,
including support for all variants like relu, no bias, literal
args, cuda etc. This commit also refactors the code such that
we can support conv3d-bn easily in the future.
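
A rough usage sketch, assuming the PT2E QAT entry points of this era:

```python
import torch
from torch._export import capture_pre_autograd_graph
from torch.ao.quantization.quantize_pt2e import prepare_qat_pt2e
from torch.ao.quantization.quantizer.xnnpack_quantizer import (
    XNNPACKQuantizer,
    get_symmetric_quantization_config,
)

m = torch.nn.Sequential(torch.nn.Conv1d(3, 8, 3), torch.nn.BatchNorm1d(8))
example_inputs = (torch.randn(1, 3, 16),)
m = capture_pre_autograd_graph(m, example_inputs)
quantizer = XNNPACKQuantizer().set_global(get_symmetric_quantization_config(is_qat=True))
m = prepare_qat_pt2e(m, quantizer)  # conv1d-bn now fuses the same way conv2d-bn does
```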

Test Plan:
python test/test_quantization.py TestQuantizePT2EQAT_ConvBn1d

Reviewers: jerryzh168, kimishpatel

Subscribers: jerryzh168, kimishpatel, supriyar

Differential Revision: [D51428979](https://our.internmc.facebook.com/intern/diff/D51428979)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113714
Approved by: https://github.com/jerryzh168
2023-11-17 22:09:30 +00:00
Peter Bell
9f47580ad7 [BE] Don't mutate torch.compile global config in tests (#113882)
We should uniformly use `config.patch` so that configuration changes don't affect
other tests.
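
A sketch of the scoped pattern (the particular flag is illustrative):

```python
import torch._dynamo.config as dynamo_config

@dynamo_config.patch(verbose=True)  # reverted automatically when the test exits
def test_something(self):
    ...
```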

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113882
Approved by: https://github.com/lezcano
2023-11-17 16:49:48 +00:00
Nikita Shulga
0d6d97d956 Relax constraints on test_cast_round_trip (#113872)
Results of floating-point operations can be affected by execution order, and the compiler is not guaranteed to make trivial optimizations, which might result in a loss of precision when compiling in debug mode.
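
A one-liner illustration of why: floating-point addition is not associative:

```python
# Results can legitimately differ with execution order or optimization level.
a, b, c = 1e20, -1e20, 1.0
print((a + b) + c)  # 1.0
print(a + (b + c))  # 0.0
```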

Fixes https://github.com/pytorch/pytorch/issues/113829

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113872
Approved by: https://github.com/Skylion007, https://github.com/huydhn
2023-11-16 19:52:05 +00:00
George White
6c187246d6 Add support for float8_e4m3fnuz and _e5m2fnuz (#107586)
This PR relates to the feature in [this feature submission](https://docs.google.com/document/d/1pF2T1xz54IPg1jG7FhykbrpbcJZVelQw0v8vBaoLkfs/edit). It has been based on #104242 which adds similar float8 types.

These new types added in this PR are described in the paper at https://arxiv.org/abs/2206.02915. A brief description and comparison of the types with other float8 types can be also found in the [OpenXLA RFC](https://github.com/openxla/stablehlo/blob/main/rfcs/20230321-fp8_fnuz.md).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107586
Approved by: https://github.com/seemethere, https://github.com/malfet
2023-11-15 15:01:11 +00:00
andrewor14
f9ea697112 [quant][pt2][be] Refactor QAT tests for future patterns (#113658)
Summary: Currently the QAT tests are very specific to conv-bn-2d.
This makes it difficult to test new patterns like conv-bn-1d if
we want to add them. This commit refactors these tests so we can
add and test future patterns easily.

Test Plan:
python test/test_quantization.py TestQuantizePT2EQAT_ConvBn2d

Reviewers: jerryzh168, kimishpatel

Subscribers: jerryzh168, kimishpatel, supriyar
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113658
Approved by: https://github.com/jerryzh168
2023-11-15 02:17:13 +00:00
Natalia Gimelshein
15a2caea8e Enables copy/clone/reshape/contiguous operations for bits types (#113508)
Per title

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113508
Approved by: https://github.com/albanD
2023-11-11 22:51:50 +00:00
George White
8880584015 Improve test_float8.py (#113361)
The numeric test for round-trip casting of float8 dtypes originally consisted of generating a 100x100 tensor in the range 0..max.

This change refactors the test, adds further edge cases and fixes multiple issues with the lower precision simulation which the results of the round-trip cast test were checked against.

Set atol=0 and rtol=0 to ensure an exact equality comparison.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113361
Approved by: https://github.com/malfet, https://github.com/Neilblaze
2023-11-10 15:23:22 +00:00
Jerry Zhang
501d118255 [quant][pt2e] Add transform_for_annotation method in Quantizer (#113115)
Summary:
Adding the method so that people can do some transformations before annotation to make the graph easier to annotate
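
A schematic override, with an illustrative body:

```python
import torch
from torch.ao.quantization.quantizer import Quantizer

class MyQuantizer(Quantizer):
    def transform_for_annotation(self, model: torch.fx.GraphModule) -> torch.fx.GraphModule:
        # e.g. normalize or decompose ops here so annotation is simpler
        return model

    def annotate(self, model):
        return model

    def validate(self, model):
        pass
```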

Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_transform_for_annotation

Differential Revision: [D51141080](https://our.internmc.facebook.com/intern/diff/D51141080)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113115
Approved by: https://github.com/kimishpatel
2023-11-09 20:23:29 +00:00
Jerry Zhang
12c257cc00 [quant][pt2e] Support allow_implicit_sharing flag (#112929)
Summary:
For a node `node1` and an edge `(node1, node2)`: since they observe the same
Tensor, we may want to implicitly share observers. This flag allows people to
turn off that behavior for the output of the node.

See the test_allow_implicit_sharing test for use case

Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_allow_implicit_sharing

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112929
Approved by: https://github.com/kimishpatel
2023-11-08 23:47:17 +00:00
andrewor14
c0aba9be41 [quant][pt2] Fix custom dtype per channel weight in QAT (#112612)
Summary: Previously we only copied over q/dq args for the per
tensor case. This was because the qparams for `quantize_per_tensor`
are literals while the qparams for `quantize_per_channel` are
`get_attr` nodes (tensors), which disappear from the original
nodes in the graph after subgraph rewriting.

However, this is problematic because, in the per channel case,
not all q/dq args are tensors. In particular, the args after
the qparams (axis, qmin, qmax, dtype) are all literals. For
these literal args we simply used the hardcoded ones
(0, -127, 127, torch.int8 respectively), even if the user
explicitly specified to use a different weight dtype. This
commit fixes this by copying over these literal args for the
per channel case as well.

Test Plan:
python test/test_quantization.py TestQuantizePT2EQAT.test_qat_per_channel_weight_custom_dtype

Reviewers: jerryzh168, kimishpatel

Subscribers: jerryzh168, kimishpatel, supriyar
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112612
Approved by: https://github.com/jerryzh168
2023-11-07 20:10:53 +00:00
andrewor14
b6e85eb8d5 [quant][pt2] Support quantized conv bias in QAT fusion (#112528)
Summary: Previously QAT fusion assumed bias is not quantized.
This works for the existing XNNPACKQuantizer, but not for custom
quantizers that wish to quantize the bias. This commit supports
this by adding the necessary patterns. This requires refactoring
the code, however, since it previously assumed that there would
only be one pair of q-dq (from conv weight) in the matched
pattern, and this is no longer true.

Test Plan:
python test/test_quantization.py TestQuantizePT2EQAT.test_qat_conv_bn_bias_derived_qspec

Reviewers: jerryzh168, kimishpatel

Subscribers: jerryzh168, kimishpatel, supriyar

Differential Revision: [D50856377](https://our.internmc.facebook.com/intern/diff/D50856377)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112528
Approved by: https://github.com/jerryzh168
2023-11-06 17:58:57 +00:00
leslie-fang-intel
6ba2748690 [Quant] [PT2] Enable Decomposed quant per tensor/channel to accept bfloat16 input (#112225)
**Summary**
- PR 4 for enabling Int8-Mixed-BF16 PT2E PTQ Quantization with Inductor https://github.com/pytorch/pytorch/issues/111640.
- Enable decomposed `quant_per_tensor` and `quant_per_channel` to accept bfloat16 input.

**TestPlan**
```
python -m pytest test_quantized_tensor.py -k test_decomposed_quantize_per_tensor_bfloat16_input
python -m pytest test_quantized_tensor.py -k test_decomposed_quantize_per_channel_bfloat16_input
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112225
Approved by: https://github.com/jgong5, https://github.com/jerryzh168
2023-11-03 23:47:43 +00:00
leslie-fang-intel
871e27a61c [Quant] [PT2] Remove the output Annotation of Conv/Linear in x86InductorQuantizer (#112140)
**Summary**
- PR 3 for enabling Int8-Mixed-BF16 PT2E PTQ Quantization with Inductor https://github.com/pytorch/pytorch/issues/111640.
- Remove the output annotation of QConv/QLinear in X86InductorQuantizer.

**Test Plan**
```
python -m pytest test_mkldnn_pattern_matcher.py -k test_qconv2d
python -m pytest test_mkldnn_pattern_matcher.py -k test_qlinear
python -m pytest test_x86inductor_quantizer.py -k Conv2d
python -m pytest test_x86inductor_quantizer.py -k Linear
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112140
Approved by: https://github.com/jgong5, https://github.com/jerryzh168
ghstack dependencies: #112010, #112126
2023-11-03 08:24:55 +00:00
leslie-fang-intel
a53d29cc18 Enable oneDNN QLinear FP32/BF16 output (#112126)
**Summary**
- PR 2 for enabling Int8-Mixed-BF16 PT2E PTQ Quantization with Inductor https://github.com/pytorch/pytorch/issues/111640.
- Enable QLinear (relu) with BFloat16 or Float32 output.

**TestPlan**
```
python -u -m pytest -s -v test_quantized_op.py -k test_qlinear_pt2e
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112126
Approved by: https://github.com/jerryzh168, https://github.com/jgong5
ghstack dependencies: #112010
2023-11-03 08:20:54 +00:00
leslie-fang-intel
b6fc7af8a0 Enable oneDNN QConv FP32/BF16 output (#112010)
**Summary**

- PR 1 for enabling Int8-Mixed-BF16 PT2E PTQ Quantization with Inductor https://github.com/pytorch/pytorch/issues/111640.
- Enable QConv (relu, add, add_relu) with BFloat16 or Float32 output.

**Test Plan**
```
python -u -m pytest -s -v test_quantized_op.py -k test_qconv1d_pt2e
python -u -m pytest -s -v test_quantized_op.py -k test_qconv2d_pt2e
python -u -m pytest -s -v test_quantized_op.py -k test_qconv3d_pt2e
python -u -m pytest test_quantized_op.py -k test_qconv2d_relu_pt2e
python -u -m pytest test_quantized_op.py -k test_qconv2d_add_pt2e
python -u -m pytest test_quantized_op.py -k test_qconv2d_add_relu_pt2e
python -u -m pytest test_quantized_op.py -k test_qconv2d_add_relu_float_output_pt2e
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112010
Approved by: https://github.com/jerryzh168, https://github.com/jgong5
2023-11-03 08:16:45 +00:00
leslie-fang-intel
6c19de07cd [Quant] [PT2] Add ConvBNAdd(ReLU) Annotation into X86InductorQuantizer (#111281)
**Summary**
This PR adds ConvBNAdd(ReLU) QAT Annotation into `X86InductorQuantizer`.

**Test Plan**
```
python -m pytest test_x86inductor_quantizer.py -k test_qat_conv2d_binary_with_quantizer_api
python -m pytest test_x86inductor_quantizer.py -k test_qat_conv2d_binary_unary_with_quantizer_api
python -m pytest test_mkldnn_pattern_matcher.py -k test_qat_qconv2d_add
python -m pytest test_mkldnn_pattern_matcher.py -k test_qat_qconv2d_add_relu
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111281
Approved by: https://github.com/jgong5, https://github.com/jerryzh168
ghstack dependencies: #111280
2023-11-02 02:05:49 +00:00
leslie-fang-intel
56ca0043f6 [Quant] [PT2] Enable QAT Quantization flow in X86InductorQuantizer (#111280)
**Summary**
This PR enables PT2 QAT Quantization flow in `X86InductorQuantizer`.

**Test Plan**
```
python -m pytest test_x86inductor_quantizer.py -k test_qat_conv2d_with_quantizer_api
python -m pytest test_x86inductor_quantizer.py -k test_qat_conv2d_unary_with_quantizer_api
python -m pytest test_mkldnn_pattern_matcher.py -k test_qat_qconv2d
python -m pytest test_mkldnn_pattern_matcher.py -k test_qat_qconv2d_relu
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111280
Approved by: https://github.com/jgong5, https://github.com/jerryzh168
2023-11-02 02:03:10 +00:00
Kimish Patel
9e2af971fc [Quantization] Add "quantization_tag" as metadata to fx proxy (#108764)
Summary:
In order to make sure that quantization_tag is preserved through second-stage
export, this PR adds it as a special piece of metadata that should be
preserved.

Since quantization in the export path works on top of the pre-dispatch
graph, subsequent post-dispatch op decomposition will decompose ops
that the quant workflow tagged. In order to make sure that the patterns
identified by the quantizer remain identifiable even after decompositions
are applied, we must preserve "quantization_tag".

This enables backend delegates that quantized a model for a specific
backend to identify "quantized" patterns.

Test Plan:
metadata porting tests

Differential Revision: [D49056259](https://our.internmc.facebook.com/intern/diff/D49056259)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108764
Approved by: https://github.com/tugsbayasgalan, https://github.com/jerryzh168
2023-11-01 21:41:58 +00:00
andrewor14
231129ea36 [quant][pt2] Fix QAT conv-bn bias derived qspec (#112159)
Summary: Today, we have special handling for special qspecs like
`SharedQuantizationSpec` or `DerivedQuantizationSpec`, since these
qspecs refer to other nodes in the graph and these node references
need to be updated after replacement (since they referred to nodes
in the original graph that no longer exist in the new graph).

However, we only do the above for special nodes like conv, bn,
getitem, and relu. This doesn't cover the common use case of
having conv bias derive its qparams from those of conv input
activations and conv weight. This commit adds support for this
use case by also replacing the node references for these nodes.

Test Plan:
python test/test_quantization.py TestQuantizePT2EQAT.test_qat_conv_bn_bias_derived_qspec

Reviewers: jerryzh168, kimishpatel

Subscribers: jerryzh168, kimishpatel, supriyar

Differential Revision: [D50697078](https://our.internmc.facebook.com/intern/diff/D50697078)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112159
Approved by: https://github.com/jerryzh168
2023-10-31 18:04:23 +00:00
Jerry Zhang
3db0095ea2 [reland][quant][pt2e][be] Cleanup observer insertion logic (#111828) (#112453)
Summary: att; after the SharedQuantizationSpec bug fix we do some checks beforehand, which simplifies the logic when we insert observers

Test Plan:
contbuild & OSS CI, see bf998a2c5d

Test plan from GitHub:
python test/test_quantization.py TestQuantizePT2E

CIs

Differential Revision: D50816224

Pulled By: jerryzh168

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112453
Approved by: https://github.com/andrewor14
2023-10-31 17:33:24 +00:00
PyTorch MergeBot
797d7100de Revert "[quant][pt2e][be] Cleanup observer insertion logic (#111828)"
This reverts commit bf998a2c5d.

Reverted https://github.com/pytorch/pytorch/pull/111828 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/111828#issuecomment-1782154648))
2023-10-27 01:35:27 +00:00
PyTorch MergeBot
5ce8002d24 Revert "Remove deprecated fbgemm operators (#104535)"
This reverts commit 57c7aa12db.

Reverted https://github.com/pytorch/pytorch/pull/104535 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/104535#issuecomment-1779650412))
2023-10-25 16:34:16 +00:00
Jerry Zhang
bf998a2c5d [quant][pt2e][be] Cleanup observer insertion logic (#111828)
Summary:
att; after the SharedQuantizationSpec bug fix we do some checks beforehand, which simplifies the logic when we insert observers

Test Plan:
python test/test_quantization.py TestQuantizePT2E

CIs

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111828
Approved by: https://github.com/kimishpatel
ghstack dependencies: #111827
2023-10-25 03:48:36 +00:00
Kimish Patel
a8760f1b42 [Quantization] Add a test for QAT + PTQ selective quantization in xnnpack quantizer (#111689)

Summary:
For some workflows you want to quantize some parts of the model via qat
and then continue eager mode training. After training, you want to
export the whole model and perform PTQ on the rest.

Test Plan:
test added

Differential Revision: [D50510480](https://our.internmc.facebook.com/intern/diff/D50510480)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111689
Approved by: https://github.com/jerryzh168
2023-10-24 23:25:38 +00:00
Oleg Bulatov
192477b5ba Enable flake8-bugbear B020 lint (#110823)
Fixes part of https://github.com/pytorch/pytorch/issues/106571

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110823
Approved by: https://github.com/Skylion007
2023-10-24 22:43:47 +00:00
Peter Bell
57c7aa12db Remove deprecated fbgemm operators (#104535)
These operators are not used and have been deprecated since #72690 (Feb 2022). Additionally, the `torch.jit.quantized` interface has been deprecated since #40102 (June 2020).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104535
Approved by: https://github.com/ezyang
2023-10-22 06:10:09 +00:00
Jerry Zhang
43c211facb [quant][pt2e] Actually support transitive sharing for SharedQuantizationSpec (#111172)
Summary:
Previously we did not really support this; this PR adds the support.

Next
* clean up insert observer logic
* add an allow_transitive_sharing boolean flag to allow people to turn this off for certain edges

Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_shared_qspec_transitivity

Differential Revision: [D50250789](https://our.internmc.facebook.com/intern/diff/D50250789)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111172
Approved by: https://github.com/kimishpatel
2023-10-20 23:25:17 +00:00
Jerry Zhang
e0ddc3ff9c [quant][pt2e][be] Move xnnpack quantizer tests to separate file (#111004)
Summary:
att

Test Plan:
python test/test_quantization.py TestXNNPACKQuantizer
python test/test_quantization.py TestXNNPACKQuantizerModels

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111004
Approved by: https://github.com/andrewor14
2023-10-12 01:16:05 +00:00
andrewor14
0e551bbcd7 [quant][pt2] Preserve source_fn_stack after QAT fusion (#110899)
Test Plan:
python test/test_quantization.py TestQuantizePT2EQAT.test_qat_preserve_source_fn_stack

Reviewers: jerryzh168, kimishpatel

Subscribers: jerryzh168, kimishpatel, supriyar

Differential Revision: [D50101253](https://our.internmc.facebook.com/intern/diff/D50101253)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110899
Approved by: https://github.com/jerryzh168
2023-10-11 02:55:52 +00:00
Nikita Shulga
65d40a72c4 Delete rogue print from test_quantize_pt2e.py (#110732)
Introduced by https://github.com/pytorch/pytorch/pull/110308

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110732
Approved by: https://github.com/clee2000, https://github.com/ZainRizvi, https://github.com/jerryzh168
2023-10-06 22:16:10 +00:00
Jeff Daily
e8f1f4ed66 [quant][pt2][ROCm] follow-up PR 109908 for miopen_batch_norm (#110653)
Fixes recently broken unit tests caused by PR #109908, since cudnn and miopen have separate batch norm functions.

```
2023-10-05T09:35:01.6606614Z _______________ TestQuantizePT2EQAT.test_qat_conv_bn_fusion_cuda _______________
2023-10-05T09:35:01.6606948Z Traceback (most recent call last):
2023-10-05T09:35:01.6607362Z   File "/var/lib/jenkins/pytorch/test/quantization/pt2e/test_quantize_pt2e_qat.py", line 323, in test_qat_conv_bn_fusion_cuda
2023-10-05T09:35:01.6607767Z     self._verify_symmetric_xnnpack_qat_graph(
2023-10-05T09:35:01.6608217Z   File "/var/lib/jenkins/pytorch/test/quantization/pt2e/test_quantize_pt2e_qat.py", line 130, in _verify_symmetric_xnnpack_qat_graph
2023-10-05T09:35:01.6608658Z     self._verify_symmetric_xnnpack_qat_graph_helper(
2023-10-05T09:35:01.6609105Z   File "/var/lib/jenkins/pytorch/test/quantization/pt2e/test_quantize_pt2e_qat.py", line 173, in _verify_symmetric_xnnpack_qat_graph_helper
2023-10-05T09:35:01.6609623Z     m = prepare_qat_pt2e(m, quantizer)
2023-10-05T09:35:01.6610171Z   File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/ao/quantization/quantize_pt2e.py", line 178, in prepare_qat_pt2e
2023-10-05T09:35:01.6610561Z     _fuse_conv_bn_qat(model)
2023-10-05T09:35:01.6611072Z   File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/ao/quantization/pt2e/qat_utils.py", line 501, in _fuse_conv_bn_qat
2023-10-05T09:35:01.6611497Z     m = _fuse_conv_bn_qat_helper(m, is_cuda=True)
2023-10-05T09:35:01.6612065Z   File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/ao/quantization/pt2e/qat_utils.py", line 575, in _fuse_conv_bn_qat_helper
2023-10-05T09:35:01.6612492Z     _get_conv_bn_getitem_nodes(r.replacements)
2023-10-05T09:35:01.6613058Z   File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/ao/quantization/pt2e/qat_utils.py", line 383, in _get_conv_bn_getitem_nodes
2023-10-05T09:35:01.6613465Z     assert bn_node is not None
2023-10-05T09:35:01.6613716Z AssertionError
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110653
Approved by: https://github.com/jerryzh168, https://github.com/pruthvistony
2023-10-06 15:30:55 +00:00
Jerry Zhang
7b6042111f [quant][pt2e] Refactor conv related annotation for XNNPACKQuantizer (#110308)
Summary:
Since we changed the IR that we are working with to the pre-autograd aten IR, it's now easier
to use plain pattern matching instead of relying on source_matcher_utils. This
PR refactors the annotation for conv to use aten ops directly.

Also fixed the reentrant test after this change.

Test Plan:
python test/test_quantization.py TestQuantizePT2E

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110308
Approved by: https://github.com/kimishpatel
2023-10-05 22:36:18 +00:00
Andrew Or
7c72238e4b Back out "Enable pickling model prepared with QAT qconfig" (#110392)
Summary:
D49187352 caused our model conversion and loading of the QAT checkpoint to get stuck with a Thrift timeout.

We are actively checking in the final code and model for the static quant HTP prod model, and encountered this breakage at head on Thursday.

A Thrift timeout is not a failure, and because of that, it's hard to bisect and find this culprit. It is also hard to set up a unit test, because the job simply times out. A better test is needed to guard downstream model conversion against upstream changes.

Our suspicion of why this diff broke us is that we create a lot of modules with QAT (in a recursive manner), but our model is not a QAT-traceable module (it is a graph with many QAT modules and floating-point modules). With functools.partial as in the original diff, we end up caching modules in memory and causing the memory of the machine to be taken up completely.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110392
Approved by: https://github.com/junesg, https://github.com/jerryzh168
2023-10-05 14:41:00 +00:00
andrewor14
62cad5b5b0 [quant][pt2] Support cudnn_batch_norm in QAT fusion (#109908)
Summary: Today, we get different batch norm ops depending on
the device the model is placed on at export time. Exporting
`model.cpu()` gives `_native_batch_norm_legit`, while exporting
`model.cuda()` gives `cudnn_batch_norm`. QAT fusion currently
only supports the former and silently ignores the latter. This
commit fixes this by additionally matching on the latter op
during QAT fusion.

Test Plan:
python test/test_quantization.py TestQuantizePT2EQAT.test_qat_conv_bn_fusion
python test/test_quantization.py TestQuantizePT2EQAT.test_qat_conv_bn_relu_fusion

Reviewers: jerryzh168, kimishpatel

Subscribers: jerryzh168, kimishpatel, supriyar

Differential Revision: [D49615145](https://our.internmc.facebook.com/intern/diff/D49615145)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109908
Approved by: https://github.com/jerryzh168
2023-10-05 04:08:44 +00:00
HDCharles
428cbd7513 [ao] fixing multihead attention convert size (#110407)
Summary: After converting nn.MultiheadAttention, we weren't deleting the
old in_proj_weight and in_proj_bias, despite not (really) using them.

Test Plan: python test/test_quantization.py -k "test_custom_module_multi_head_attention"

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110407
Approved by: https://github.com/jerryzh168
2023-10-03 08:49:12 +00:00
Jerry Zhang
c9b8e06060 [quant] Enable quantization for wav2letter (#109830)
Summary:
Also added annotation support for conv1d_relu and conv1d in XNNPACKQuantizer; the quantized results still
match the fx quant path (which didn't quantize conv1d), so tests are not disabled

Test Plan: with-proxy buck2 run executorch/examples/quantization:example -- -m=w2l --verify

Differential Revision: D49479546

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109830
Approved by: https://github.com/kimishpatel
2023-09-29 00:47:34 +00:00
Jerry Zhang
3de42995e4 [quant][pt2e] Add quant API re-entrant test (#110125)
Summary:
Add the test to make sure we can call the quantize API multiple times

Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_reentrant

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110125
Approved by: https://github.com/kimishpatel
ghstack dependencies: #110097
2023-09-28 22:41:59 +00:00
Sindi Shkodrani
419ec3b229 Enable pickling model prepared with QAT qconfig (#109288)
Summary:
Resolving error:

AttributeError: Can't pickle local object '_add_module_to_qconfig_obs_ctr.<locals>.get_factory_kwargs_based_on_module_device'

by moving nested function out to the main module

Test Plan: Added test to CI

Reviewed By: andrewor14

Differential Revision: D49187352

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109288
Approved by: https://github.com/andrewor14
2023-09-28 09:51:19 +00:00
Jerry Zhang
1b51d29b66 [quant][pt2e] Enable constant folding for quantize ops (#109343)
Summary:
This PR added constant folding for quantize ops so that, instead of storing fp32 weights in the
quantized model, we'll get int8/int16/etc. weights

Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_fold_quantize

also will verify in executorch later

Differential Revision: [D49399210](https://our.internmc.facebook.com/intern/diff/D49399210)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109343
Approved by: https://github.com/kimishpatel, https://github.com/jgong5
2023-09-27 06:04:45 +00:00
andrewor14
7da3c938cf [quant][be] Move QAT tests to its own file (#108061)
Test Plan:
python test/test_quantization.py TestQuantizePT2EQAT
python test/test_quantization.py TestQuantizePT2EQATModels

Reviewers: jerryzh168, kimishpatel

Subscribers: jerryzh168, kimishpatel, supriyar
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108061
Approved by: https://github.com/jerryzh168
2023-09-15 18:34:44 +00:00
Jerry Zhang
58a883093f [quant][pt2e] Add test for serialize and deserialize quantized model (#109158)
Summary:
att

Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_save_load

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109158
Approved by: https://github.com/andrewor14
ghstack dependencies: #108924, #108925
2023-09-15 00:50:55 +00:00
Jerry Zhang
9187559e75 [quant][be] Remove test/quantization/pt2e/test_quantize_pt2e_fx.py (#108925)
Summary:
this is no longer needed since we have the quantizer api now

Test Plan:
.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108925
Approved by: https://github.com/andrewor14
ghstack dependencies: #108924
2023-09-14 18:35:17 +00:00
Jerry Zhang
41e2189843 [quant] Remove reference representation rewrite for adaptive_avg_pool2d (#108924)
Summary:
Integer adaptive_avg_pool2d is not well defined due to the different possible ways of rounding an fp32 value to an integer value, and
this op isn't too critical for numerics (since it doesn't appear too often), so we'll skip it for now.

We might need to revert the changes that add an integer impl for the adaptive_avg_pool op as well.

Test Plan:
python test/test_quantization.py TestQuantizePT2ERepresentation

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108924
Approved by: https://github.com/kimishpatel
2023-09-14 10:18:36 +00:00
Jerry Zhang
c914ca7577 [quant][be] Add TestPT2ERepresentation test case (#108923)
Summary:
att

Test Plan:
python test/test_quantization.py TestPT2ERepresentation
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108923
Approved by: https://github.com/andrewor14
2023-09-14 02:01:38 +00:00
Andrew Or
e8a402c56e [quant][pt2] Fix and rename move_model_to_eval (#108891)
Summary:
This commit fixes two silent correctness problems with
the current implementation of `move_model_to_eval`:

(1) Previously the user had to manually call `eliminate_dead_code`
before calling `move_model_to_eval`, otherwise the dropout pattern
won't actually get eliminated. This is because subgraph rewriter
complains the match is not self-contained, and so silently does
not do the replacement.

(2) We wish to error when the user calls `model.train()` or
`model.eval()` on an exported model. This error is raised
correctly immediately after export today, but no longer raised
after the user calls prepare or convert.

We fix (1) by moving the `eliminate_dead_code` call into
`move_model_to_eval`, and fix (2) by ensuring the respective
errors are thrown after prepare and convert as well.

Additionally, this commit renames `move_model_to_eval` to
`move_exported_model_to_eval` to be more explicit.

bypass-github-export-checks

Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_disallow_eval_train
python test/test_quantization.py TestQuantizePT2E.test_move_exported_model_to_eval

Imported from OSS

Differential Revision: D49097293

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108891
Approved by: https://github.com/jerryzh168
2023-09-11 15:37:01 +00:00
Jerry Zhang
b0de6a8002 [quant][executorch] Support inception_v4 in examples (#108382)
Summary: Verified that pt2e quant flow matches the fx flow with executorch backend config

Test Plan:
with-proxy buck2 run executorch/examples/quantization:example -- -m=ic4 --verify

```
[INFO 2023-08-31 16:08:06,923 example.py:77] prepare sqnr: inf
[INFO 2023-08-31 16:08:06,932 example.py:81] quant diff max: 0.0
[INFO 2023-08-31 16:08:06,936 example.py:85] quant sqnr: inf
```

full output: https://www.internalfb.com/intern/paste/P818520579/

Differential Revision: D48889075

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108382
Approved by: https://github.com/kimishpatel
2023-09-08 17:39:31 +00:00
Kurt Mohler
3f88e3105f Reland: Remove remaining global set_default_dtype calls from tests (#108088)
Fixes #68972

Relands #107246

To avoid causing Meta-internal CI failures, this PR avoids always asserting that the default dtype is float in the `TestCase.setUp/tearDown` methods. Instead, the assert is only done if `TestCase._default_dtype_check_enabled == True`. `_default_dtype_check_enabled` is set to True in the `if __name__ == "__main__":` blocks of all the relevant test files that have required changes for this issue

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108088
Approved by: https://github.com/ezyang
2023-09-07 03:04:34 +00:00
Jerry Zhang
32a16d4999 [quant][pt2e] Support int16 quantization (#108453)
Summary:
Previously we could only use native PyTorch int dtypes that have corresponding quantized dtypes (e.g. quint8, qint8). This
PR removes that assumption in observers/fake_quants so that users can use all PyTorch native int dtypes (except for int64, which we can add later if needed);
the main addition here is int16.
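
A sketch of the new capability (parameter values illustrative):

```python
import torch
from torch.ao.quantization.observer import MinMaxObserver

# An observer with a plain int dtype that has no quantized counterpart;
# quant_min/quant_max chosen here for int16.
obs = MinMaxObserver(dtype=torch.int16, quant_min=-32768, quant_max=32767)
obs(torch.randn(4))
print(obs.calculate_qparams())
```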

Test Plan:
python test/test_quantization.py TestQuantizePT2E

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108453
Approved by: https://github.com/kimishpatel
2023-09-06 19:31:20 +00:00
Kimish Patel
ffc0c46092 [Quantization] Add metadata porting for nodes added by quantization (#107107)
Summary:
This diff adds metadata to Q-DQ nodes by inferring the
quantization intent from node annotations. Annotations on a node are a
way for the user to specify how a node or subgraph is supposed to be
quantized. We continue to use that information to copy metadata onto Q/DQ
nodes from the appropriate nodes.

Test Plan:

Differential Revision: [D48488416](https://our.internmc.facebook.com/intern/diff/D48488416)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107107
Approved by: https://github.com/jerryzh168
ghstack dependencies: #107105, #107106, #107899, #107900
2023-09-02 06:38:14 +00:00
Kimish Patel
eb67c452c8 [Quant] Add DQ duplication pass (#107900)
Summary:
During the convert step, observers are first replaced by a Q-DQ pair. In some
scenarios, like the following, the output DQ has a fan-out.

                 ---> OP2 -> Q -> DQ
                /
OP -> Q -> DQ -
                \
                 ---> OP3 -> Q -> DQ

If either OP2 or OP3 is configured to be quantized, then its input
is expected to be quantized. In that case the quantized equivalent of a
pattern that the quantizer asked to be quantized should look like
[DQ -> {pattern} -> Q]. However, in a scenario like the above, where the DQ node
is shared between multiple "quantized" patterns, the boundary of a "quantized"
pattern is not clear because the DQ now belongs to multiple quantized
patterns.

This poses challenges for:
- Porting metadata: it is unclear which "quantized" partition this DQ node belongs to.
- The quantized representation, equivalently, needs to identify a
self-contained quantized pattern that is replaced by its equivalent pattern
capturing the compute in the quantized precision.

Test Plan:
test_duplicate_dq_pass

Differential Revision: [D48663147](https://our.internmc.facebook.com/intern/diff/D48663147)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107900
Approved by: https://github.com/jerryzh168, https://github.com/andrewor14, https://github.com/leslie-fang-intel
ghstack dependencies: #107105, #107106, #107899
2023-09-02 06:20:03 +00:00
leslie-fang-intel
fb808c30c7 x86_inductor_quantizer switches to new graph capture API (#108214)
**Summary**
Update `X86InductorQuantizer` and related testcase to the new graph capture API `capture_pre_autograd_graph`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108214
Approved by: https://github.com/jgong5, https://github.com/jerryzh168
2023-09-01 00:43:45 +00:00
andrewor14
057b807178 [quant] Move dropout replacement to move_model_to_eval (#108184)
Summary: This commit adds a public facing
`torch.ao.quantization.move_model_to_eval` util function
for QAT users. Instead of calling model.eval() on an exported
model (which doesn't work, see
https://github.com/pytorch/pytorch/issues/103681), the user
would call this new util function instead. This ensures special
ops such as dropout and batchnorm (not supported yet) will have
the right behavior when the graph is later used for inference.

Note: Support for an equivalent `move_model_to_train` will be
added in the future. This is difficult to do for dropout
currently because the eval pattern of dropout is simply a clone
op, which we cannot just match and replace with a dropout op.

Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_move_model_to_eval

Reviewers: jerryzh168, kimishpatel

Subscribers: jerryzh168, kimishpatel, supriyar

Differential Revision: [D48814735](https://our.internmc.facebook.com/intern/diff/D48814735)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108184
Approved by: https://github.com/jerryzh168
2023-08-30 16:33:17 +00:00
Jerry Zhang
147b3495e2 [quant][pt2e] Add reference representation for dynamic quantized linear (#108073)
Summary: att

Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_representation_dynamic_linear
buck2 test 'fbcode//mode/opt' fbcode//caffe2/test:quantization_pt2e -- 'test_representation_dynamic_linear'

Reviewed By: kimishpatel

Differential Revision: D48703076

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108073
Approved by: https://github.com/andrewor14
2023-08-29 07:12:55 +00:00
andrewor14
199e23bc3a [quant][be] Clean up QAT tests in test_quantize_pt2e.py (#107991)
Summary: This commit does 4 main things:

1. When verifying QAT numerics, automatically check both the
per tensor and the per channel cases, and automatically verify
convert numerics

2. When verifying the QAT graph, automatically check both the
per tensor and the per channel cases

3. Merge verify graph and verify numerics tests for conv-bn

4. Fix `test_prepare_qat_conv_bn_fusion_getitem_placeholder`,
which was no longer testing the right thing after recent capture
changes, since the maxpool op is no longer followed by a
getitem node. However, we do still need this test for other
ops that *are* followed by getitem nodes (e.g. standalone BN).

Items (1) - (3) make the QAT tests significantly less verbose
and easier to read.

Test Plan:
python test/test_quantization.py TestQuantizePT2E
python test/test_quantization.py TestQuantizePT2EModels

Reviewers: jerryzh168, kimishpatel

Subscribers: jerryzh168, kimishpatel, supriyar
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107991
Approved by: https://github.com/jerryzh168
2023-08-28 21:12:00 +00:00
Jerry Zhang
9ae3d7ca90 [reland][quant][pt2e][xnnpack_quantizer] Add support for mul and mul_relu (#107930) (#107992)
Summary: att

Test Plan: buck2 run executorch/examples/quantization:example -- -m=mv3 --verify

Differential Revision: D48588121

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107992
Approved by: https://github.com/digantdesai, https://github.com/mcr229
2023-08-27 14:50:03 +00:00
Xia, Weiwen
e9b0f62a19 [Quant][PT2E] Enable linear and linear-unary post-op quant recipe for x86 inductor quantizer (#106781)
**Summary**
Add linear and linear-unary post-op quantization recipes to the x86 inductor quantizer, for PT2E with Inductor. With this, the quantization path will add the `quant-dequant` pattern for linear and linear-unary post ops.

**Test plan**
python test/test_quantization.py -k test_linear_with_quantizer_api
python test/test_quantization.py -k test_linear_unary_with_quantizer_api

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106781
Approved by: https://github.com/leslie-fang-intel, https://github.com/jgong5, https://github.com/jerryzh168
ghstack dependencies: #105818
2023-08-27 10:50:17 +00:00
Xia, Weiwen
a6d3da1835 [Quant] Add int8 linear op impl for quantization PT2E with Inductor. Input is an int8 CPU tensor; weight is an int8 MkldnnCPU tensor. (#105818)
**Summary**
Add a new onednn qlinear op for quantization PT2E with Inductor. Input is an int8 CPU tensor and weight is an int8 MkldnnCPU tensor.

**Test plan**
python test/test_quantization.py -k test_qlinear_pt2e

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105818
Approved by: https://github.com/jgong5, https://github.com/leslie-fang-intel, https://github.com/jerryzh168
2023-08-27 08:13:12 +00:00
leslie-fang-intel
1147a28b0b [Quant][PT2E] Add cat and avg_pool2d recipe into x86InductorQuantizer (#106836)
**Summary**
Add `cat` and `avg_pool2d` quantization recipes, annotated as input/output shared observer, into `X86InductorQuantizer`.

**Test Plan**
```
clear && python -m pytest test_x86inductor_quantizer.py -k test_cat_recipe
clear && python -m pytest test_x86inductor_quantizer.py -k test_cat_recipe_same_inputs
clear && python -m pytest test_x86inductor_quantizer.py -k test_cat_recipe_single_input
clear && python -m pytest test_x86inductor_quantizer.py -k test_avg_pool2d_recipe
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106836
Approved by: https://github.com/jgong5, https://github.com/jerryzh168
2023-08-26 16:51:13 +00:00
Jerry Zhang
15d4dedbbf [quant][pt2e] Add reference representation rewrite for statically quantized linear (#107994)
Summary: att

Test Plan:
```
python test/test_quantization.py TestQuantizePT2E.test_representation_linear
buck2 test 'fbcode//mode/opt' fbcode//caffe2/test:quantization_pt2e -- 'test_representation_linear'
```

Reviewed By: kimishpatel

Differential Revision: D48674862

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107994
Approved by: https://github.com/mcr229, https://github.com/guangy10
2023-08-26 15:39:52 +00:00
leslie-fang-intel
9319dd1c7c [Quant][Inductor] Enable the lowering of quantized maxpool2d (#105906)
**Summary**
Enable the `dq-maxpool2d-q` pattern match and lower into `torch.ops.quantized.max_pool2d`.
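
Why this lowering is sound, as a small numeric sketch (illustrative only, not the PR's code): max pooling commutes with the int8 affine mapping whenever scale > 0, so `dq -> max_pool2d -> q` equals pooling the int8 values directly.

```
import torch
import torch.nn.functional as F

x_i8 = torch.randint(-128, 128, (1, 1, 4, 4), dtype=torch.int8)
scale, zp = 0.1, 3

# float path: dequantize -> max_pool2d -> quantize
xf = scale * (x_i8.float() - zp)
yf = F.max_pool2d(xf, 2)
y_q = torch.clamp(torch.round(yf / scale) + zp, -128, 127).to(torch.int8)

# integer path: pool the int8 values directly (via float for kernel support)
y_int = F.max_pool2d(x_i8.float(), 2).to(torch.int8)

print(torch.equal(y_q, y_int))  # True: max is monotone under scale > 0
```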

**Test Plan**
```
python -m pytest test_mkldnn_pattern_matcher.py -k test_qmaxpool2d
python -m pytest test_quantized_op.py -k test_max_pool2d_pt2e
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105906
Approved by: https://github.com/jgong5, https://github.com/eellison
ghstack dependencies: #104580, #104581, #104588, #104590, #105455, #105456, #105639
2023-08-26 08:36:47 +00:00
leslie-fang-intel
70ca18f8a0 [Quant][PT2E] Enable X86InductorQuantizer single quantizable op(maxpool2d) (#105639)
**Summary**
In this PR, we mainly enable 2 things.

- Enable the skeleton of quantization recipe for single quantizable operators in `X86InductorQuantizer`.
- Add a quantization recipe for `maxpool2d` and annotate it as input/output shared observer.

**Test Plan**
```
python -m pytest test_x86inductor_quantizer.py -k test_maxpool2d_recipe
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105639
Approved by: https://github.com/jgong5, https://github.com/jerryzh168
ghstack dependencies: #104580, #104581, #104588, #104590, #105455, #105456
2023-08-26 08:34:15 +00:00
andrewor14
240bdbea61 [quant][pt2e] Fix annotation for conv no bias case (#107971)
Summary: This fixes the no bias case for conv annotations.
Previously this would result in an index out of bounds, since
the new aten.conv2d op may not have the bias arg (unlike the
old aten.convolution op). This was not caught because of a lack
of test cases, which are added in this commit.
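
The shape of the fix, as a hypothetical helper (the name and exact placement are illustrative): read the bias arg defensively instead of indexing unconditionally.

```
# aten.conv2d args are (input, weight, bias?, stride, ...); bias may be absent,
# so node.args[2] cannot be indexed unconditionally.
def _get_conv_bias(node):
    return node.args[2] if len(node.args) > 2 else None
```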

Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_qat_conv_no_bias
python test/test_quantization.py TestQuantizePT2E.test_qat_conv_bn_relu_fusion_no_conv_bias

Reviewers: jerryzh168, kimishpatel

Subscribers: jerryzh168, kimishpatel

Differential Revision: [D48696874](https://our.internmc.facebook.com/intern/diff/D48696874)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107971
Approved by: https://github.com/jerryzh168
2023-08-26 01:01:54 +00:00
Jerry Zhang
f92f69dbfb [quant][pt2e] Enable testing for reference quant model representations (#107474)
Summary:
Previously these tests were disabled due to a timeout in dynamo export in fbcode;
this may have been resolved, so we are trying to enable them again

Test Plan:
python test/test_quantization.py TestQuantizePT2E

Differential Revision: [D48619072](https://our.internmc.facebook.com/intern/diff/D48619072)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107474
Approved by: https://github.com/andrewor14
2023-08-26 00:37:45 +00:00
PyTorch MergeBot
8d44b0f5a5 Revert "[quant][pt2e][xnnpack_quantizer] Add support for mul and mul_relu (#107930)"
This reverts commit 1d1739dc6d.

Reverted https://github.com/pytorch/pytorch/pull/107930 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/107930#issuecomment-1694069330))
2023-08-26 00:37:02 +00:00
Jerry Zhang
1d1739dc6d [quant][pt2e][xnnpack_quantizer] Add support for mul and mul_relu (#107930)
Summary: att

Test Plan: buck2 run executorch/examples/quantization:example -- -m=mv3 --verify

Differential Revision: D48588121

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107930
Approved by: https://github.com/kimishpatel
2023-08-25 23:36:19 +00:00
Tugsbayasgalan (Tugsuu) Manlaibaatar
2b7271c703 Support cond and out_dtype for predispatch (#107941)
Summary: Title

Test Plan: CI

Differential Revision: D48675742

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107941
Approved by: https://github.com/jerryzh168
2023-08-25 17:37:16 +00:00
leslie-fang-intel
8ef057255d [Quant][PT2E] Enable qconv for quantization 2.0 export (#104580)
**Summary**
Enable the `qconv1d/2d/3d`, `qconv2d_relu`, `qconv2d_add`, and `qconv2d_add_relu` operators for quantization 2.0 export with the oneDNN library.

**Test Plan**
```
python -u -m pytest -s -v test_quantized_op.py -k test_qconv1d_pt2e
python -u -m pytest -s -v test_quantized_op.py -k test_qconv2d_pt2e
python -u -m pytest -s -v test_quantized_op.py -k test_qconv3d_pt2e
python -u -m pytest -s -v test_quantized_op.py -k test_qconv2d_relu_pt2e
python -u -m pytest -s -v test_quantized_op.py -k test_qconv2d_add_pt2e
python -u -m pytest -s -v test_quantized_op.py -k test_qconv2d_add_relu_pt2e
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104580
Approved by: https://github.com/jgong5, https://github.com/jerryzh168
2023-08-25 17:34:45 +00:00
Jerry Zhang
a0cfaf0688 [quant][pt2e] Make sure XNNPACKQuantizer works with the pre_dispatch=True (#107872)
Summary: att

Test Plan:
```
buck test //executorch/backends/xnnpack/test:test_xnnpack_quantized_models -- test_resnet18

buck2 test 'fbcode//mode/opt' fbcode//caffe2/test:quantization_pt2e
```

Reviewed By: andrewor14, tugsbayasgalan

Differential Revision: D48415977

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107872
Approved by: https://github.com/andrewor14
2023-08-25 05:04:01 +00:00
Jerry Zhang
16fcb07846 [quant][pt2e] Add support for channel in DerivedQuantizationSpec (#107833)
Summary:
att

Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_derived_qspec_per_channel

Differential Revision: [D48630535](https://our.internmc.facebook.com/intern/diff/D48630535)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107833
Approved by: https://github.com/andrewor14
2023-08-24 07:45:13 +00:00
vasiliy
61fe49b8ed pt2: make aot_eager backend handle basic float8 operations (#107783)
Summary:

Reland of https://github.com/pytorch/pytorch/pull/107642 with a fix for tests on Windows.

Makes aot_eager backend of torch.compile handle basic float8 operations.

This is useful for float8 training UX.

Test Plan:

```
python test/test_quantization.py -k test_pt2_traceable_aot_eager
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107783
Approved by: https://github.com/albanD
2023-08-23 18:10:53 +00:00
Aaron Gokaslan
660e8060ad [BE]: Update ruff to 0.285 (#107519)
This updates ruff to 0.285, which is faster and fixes a bunch of false negatives with regard to f-strings.

I also enabled RUF017, which looks for accidental quadratic list summation. Luckily, it seems there are no instances of it in our codebase, so I am enabling the rule to keep it that way. :)
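
For reference, the pattern RUF017 guards against (a small self-contained example, not from this PR):

```
import itertools

rows = [[1, 2], [3, 4], [5, 6]]
flat_slow = sum(rows, [])  # O(n^2): builds a new list on every addition
flat_fast = list(itertools.chain.from_iterable(rows))  # linear time
assert flat_slow == flat_fast == [1, 2, 3, 4, 5, 6]
```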

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107519
Approved by: https://github.com/ezyang
2023-08-22 23:16:38 +00:00
PyTorch MergeBot
5025fb9213 Revert "pt2: make aot_eager backend handle basic float8 operations (#107642)"
This reverts commit 24147a8e1c.

Reverted https://github.com/pytorch/pytorch/pull/107642 on behalf of https://github.com/huydhn due to Sorry for reverting this, but it is failing Windows CPU test in trunk. The Windows failures on your PR looks related I think ([comment](https://github.com/pytorch/pytorch/pull/107642#issuecomment-1688999380))
2023-08-22 22:17:36 +00:00
PyTorch MergeBot
d59a6864fb Revert "[BE]: Update ruff to 0.285 (#107519)"
This reverts commit 88ab3e4322.

Reverted https://github.com/pytorch/pytorch/pull/107519 on behalf of https://github.com/ZainRizvi due to Sorry, but this PR breaks internal tests. @ezyang, can you please help them get unblocked? It seems like one of the strings was prob accidentally modified ([comment](https://github.com/pytorch/pytorch/pull/107519#issuecomment-1688833480))
2023-08-22 19:53:32 +00:00
vasiliy
24147a8e1c pt2: make aot_eager backend handle basic float8 operations (#107642)
Summary:

Makes aot_eager backend of torch.compile handle basic float8 operations.

This is useful for float8 training UX.

Test Plan:

```
python test/test_quantization.py -k test_pt2_traceable_aot_eager
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107642
Approved by: https://github.com/albanD
2023-08-22 18:57:14 +00:00
Tugsbayasgalan Manlaibaatar
ee72071fc7 Avoid executing side-effectful graph_module as validation step (#107271)
Dynamo currently runs the real graph module with real inputs as a way to match the graph module's return result with the eager return type. This is unsafe when the graph module is side-effectful. In the long term, we will get rid of this step; in the short term, we just fakify the graph module again and run it.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107271
Approved by: https://github.com/ezyang
2023-08-22 04:22:31 +00:00
Aaron Gokaslan
88ab3e4322 [BE]: Update ruff to 0.285 (#107519)
This updates ruff to 0.285, which is faster and fixes a bunch of false negatives with regard to f-strings.

I also enabled RUF017, which looks for accidental quadratic list summation. Luckily, it seems there are no instances of it in our codebase, so I am enabling the rule to keep it that way. :)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107519
Approved by: https://github.com/ezyang
2023-08-20 01:36:18 +00:00
Jerry Zhang
28be2c674a [quant][pt2e] Move specific quantizer related things outside of main quant code base (#106806) (#107259)
Summary:

Currently quantizer/quantize_pt2e imports things from specific quantizers (XNNPACKQuantizer, QuantizationConfig, etc.);
this PR removes them so it's clearer that they are not part of the core quantization code base.

This PR also removes get_supported_operators from the main Quantizer, since we haven't seen a clear need for this API.

Test Plan:
CIs

Imported from OSS

Differential Revision: D48340367

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107259
Approved by: https://github.com/kimishpatel
2023-08-18 21:29:09 +00:00
Jerry Zhang
d3c4ec767b [quant][pt2e] Fix handling for SharedQuantizationSpec (#106922)
Summary:
Previously if we have:
```
conv1 -> cat
conv2  /
```
and configure the outputs of conv1/conv2 to be int8 quantized, and cat to be int8 quantized with shared inputs,
it does not produce the expected result (the inputs of cat are not shared).

The problem is that some checks were missing when inserting observers for the inputs of cat.

This PR fixes the problem.
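
A sketch of what shared observers across cat's inputs look like at the annotation level (hand-built fx graph for illustration; the import paths and spec fields are assumptions based on the quantizer API described elsewhere in this log):

```
import torch
import torch.fx as fx
from torch.ao.quantization.observer import HistogramObserver
from torch.ao.quantization.quantizer import (
    QuantizationAnnotation, QuantizationSpec, SharedQuantizationSpec,
)

class M(torch.nn.Module):
    def forward(self, a, b):
        return torch.cat([a, b], dim=0)

gm = fx.symbolic_trace(M())
cat_node = next(n for n in gm.graph.nodes if n.target is torch.cat)
in0, in1 = cat_node.args[0]

act_qspec = QuantizationSpec(
    dtype=torch.int8, quant_min=-128, quant_max=127,
    qscheme=torch.per_tensor_affine,
    observer_or_fake_quant_ctr=HistogramObserver,
)
# Point the second input (and the output) at the first input edge so all
# three end up with one shared observer.
shared = SharedQuantizationSpec((in0, cat_node))
cat_node.meta["quantization_annotation"] = QuantizationAnnotation(
    input_qspec_map={in0: act_qspec, in1: shared},
    output_qspec=shared,
)
print(cat_node.meta["quantization_annotation"])
```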

Fixes: https://github.com/pytorch/pytorch/issues/106760
Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_shared_qspec

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106922
Approved by: https://github.com/kimishpatel
2023-08-16 21:16:45 +00:00
Jiaxu Zhu
152203d3c3 [pytorch][ao] Add torch.matmul in FloatFunctional/QFunctional (#106831)
Summary: As title

Test Plan: new unit tests

Differential Revision: D48172841

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106831
Approved by: https://github.com/jerryzh168
2023-08-10 22:43:36 +00:00
Jerry Zhang
79449e6272 [quant][pt2e][fix] Remove the requirement of using no_grad for reference model that contains quantized conv2d (#106924)
Summary:
att

We don't actually need the gradient for conv2d, we just need it to run without error, so we delay the out_dtype gradient error
until the user actually requests it.

Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_representation_conv2d

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106924
Approved by: https://github.com/zou3519, https://github.com/kimishpatel
2023-08-10 19:16:10 +00:00
Jerry Zhang
97ce979e5d [quant][pt2e] Add reference representation for quantized conv2d (#105784)
Summary:
Implementing reference representation for quantized ops we decided in https://docs.google.com/document/d/17h-OEtD4o_hoVuPqUFsdm5uo7psiNMY8ThN03F9ZZwg/edit#heading=h.ov8z39149wy8

Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_representation_quantize_dequantize_per_channel

Although right now it is not really testing things since there is some problem with dynamo export

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105784
Approved by: https://github.com/kimishpatel
ghstack dependencies: #105783
2023-08-09 22:41:35 +00:00
Tugsbayasgalan (Tugsuu) Manlaibaatar
a44c072c89 Make InternalModel and Resnet work with rexportable flow (#106676)
Summary: The internal model and ResNet use the "re-export" flow now. Also did some refactoring to make the code a little cleaner.

Some changes for OSS:
1. Correctly use the "cached" fake tensors so that static symbols are still resolved to static
2. Change logic in PassBase to allocate static shapes for parameters
3. Add "is_torch_exported" tag to every node to make it survive during various graph transformations.
4. Added an experimental wrapper API for the quantization team to get a pre_dispatch=True graph. Note that it doesn't actually do that right now, but we plan to switch soon.

Test Plan: CI

Differential Revision: D47890878

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106676
Approved by: https://github.com/jerryzh168
2023-08-09 20:10:48 +00:00
Jerry Zhang
69ecad6f2b [quant][pt2e] Add reference representation for quantize_per_channel and dequantize_per_channel (#105783)
Summary:
Implementing reference representation for quantized ops we decided in https://docs.google.com/document/d/17h-OEtD4o_hoVuPqUFsdm5uo7psiNMY8ThN03F9ZZwg/edit#heading=h.ov8z39149wy8
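
For intuition, a self-contained sketch of the per-channel q/dq math (an illustration of the semantics, not the PR's rewrite code):

```
import torch

def quantize_per_channel(x, scales, zero_points, axis, qmin=-128, qmax=127):
    shape = [1] * x.dim()
    shape[axis] = -1  # broadcast scales/zero_points along the channel axis
    s = scales.reshape(shape)
    zp = zero_points.reshape(shape)
    return torch.clamp(torch.round(x / s) + zp, qmin, qmax).to(torch.int8)

def dequantize_per_channel(x_i8, scales, zero_points, axis):
    shape = [1] * x_i8.dim()
    shape[axis] = -1
    return (x_i8.to(torch.float32) - zero_points.reshape(shape)) * scales.reshape(shape)

w = torch.randn(2, 3)
scales = torch.tensor([0.1, 0.2])
zps = torch.tensor([0, 5])
w_i8 = quantize_per_channel(w, scales, zps, axis=0)
print(dequantize_per_channel(w_i8, scales, zps, axis=0))  # ~= w, per-channel error
```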

Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_representation_quantize_dequantize_per_channel

Although right now it is not really testing things since there is some problem with dynamo export

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105783
Approved by: https://github.com/kimishpatel
2023-08-09 01:39:52 +00:00
Jason Lu
bc88028e8e Back out "Reland "Make adding buffers more like adding parameters (#104069)" (#106224)" (#106743)
Summary:
Original commit changeset: 81319beb97f3

Original Phabricator Diff: D47961182

Test Plan: revert to maintain backward compat with legacy ads_dper3 production package. Read details in: S357822

Reviewed By: atuljangra

Differential Revision: D48131623

@diff-train-skip-merge
(D48131623 landed internally)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106743
Approved by: https://github.com/malfet
2023-08-08 15:27:34 +00:00
Jerry Zhang
2156f0434c [quant][pt2e] Add reference representation for quantized adaptive_avg_pool2d (#105709)
Summary:
Implementing reference representation for quantized ops we decided in https://docs.google.com/document/d/17h-OEtD4o_hoVuPqUFsdm5uo7psiNMY8ThN03F9ZZwg/edit#heading=h.ov8z39149wy8

Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_representation_adaptive_avg_pool2d

Although right now it is not really testing things since there is some problem with dynamo export

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105709
Approved by: https://github.com/andrewor14
ghstack dependencies: #105708
2023-08-04 18:49:14 +00:00
Jerry Zhang
9e301949ec [quant][pt2e] Add reference representation for quantized max_pool2d (#105708)
Summary:
Implementing reference representation for quantized ops we decided in https://docs.google.com/document/d/17h-OEtD4o_hoVuPqUFsdm5uo7psiNMY8ThN03F9ZZwg/edit#heading=h.ov8z39149wy8

Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_representation_maxpool2d

Although right now it is not really testing things since there is some problem with dynamo export

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105708
Approved by: https://github.com/andrewor14
2023-08-04 08:19:52 +00:00
Jerry Zhang
820e68b58a [quant][pt2e] Add reference representation for quantized add - relu (#105707)
Summary:
Implementing reference representation for quantized ops we decided in https://docs.google.com/document/d/17h-OEtD4o_hoVuPqUFsdm5uo7psiNMY8ThN03F9ZZwg/edit#heading=h.ov8z39149wy8

Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_representation_add_relu

Although right now it is not really testing things since there is some problem with dynamo export
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105707
Approved by: https://github.com/andrewor14
2023-08-03 00:42:06 +00:00
Jerry Zhang
ba387b8830 [easy][be] operator_config -> quantization_config renaming (#106479)
Summary:
att

Test Plan:
CIs

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106479
Approved by: https://github.com/andrewor14
2023-08-03 00:36:44 +00:00
leslie-fang-intel
bfed2da2e4 [Quant][PT2E] Re-enable test case of conv add/add_relu recipe for x86inductorquantizer (#105638)
**Summary**
Re-enable the test cases `test_conv2d_binary_with_quantizer_api` and `test_conv2d_binary_unary_with_quantizer_api` for X86InductorQuantizer. We previously disabled these 2 test cases due to a timeout issue in internal CI.

**Test Plan**
```
python -m pytest test_x86inductor_quantizer.py -k test_conv2d_binary_with_quantizer_api
python -m pytest test_x86inductor_quantizer.py -k test_conv2d_binary_unary_with_quantizer_api
```

Differential Revision: [D47745372](https://our.internmc.facebook.com/intern/diff/D47745372)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105638
Approved by: https://github.com/jerryzh168, https://github.com/andrewor14
2023-08-02 17:26:22 +00:00
Jerry Zhang
d528a137e0 [quant][pt2e][quantizer] Support set_module_type in XNNPACKQuantizer (#106094)
Summary:
Added support to allow users to set configurations based on module type in XNNPACKQuantizer; this can also serve as an example
for implementing new quantizers.

Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_xnnpack_quantizer_set_module_type

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106094
Approved by: https://github.com/andrewor14
ghstack dependencies: #106087
2023-08-02 08:33:58 +00:00
Sergii Dymchenko
af37608276 Remove duplicate ops tests in test_quantized_op.py (#106398)
The duplicates are after https://github.com/pytorch/pytorch/pull/94170
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106398
Approved by: https://github.com/izaitsevfb, https://github.com/malfet, https://github.com/jerryzh168
2023-08-02 02:37:36 +00:00
Jerry Zhang
92a22a8098 [quant][pt2e][quantizer] Support set_module_name in XNNPACKQuantizer (#106087)
Summary:
Added support to allow users to set configurations based on module name in XNNPACKQuantizer; this can also serve as an example
for implementing new quantizers.

Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_xnnpack_quantizer_set_module_name

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106087
Approved by: https://github.com/andrewor14
2023-08-02 01:19:23 +00:00
Mikayla Gawarecki
d8e5f2aa6d Reland "Make adding buffers more like adding parameters (#104069)" (#106224)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106224
Approved by: https://github.com/atalman, https://github.com/albanD
2023-07-31 17:18:56 +00:00
PyTorch MergeBot
93b2036bef Revert "[quant][pt2e] store scale/zero_point as tensor attributes to support serialization (#105894)"
This reverts commit 3ca71ed735.

Reverted https://github.com/pytorch/pytorch/pull/105894 on behalf of https://github.com/huydhn due to breaking executorch tests internally ([comment](https://github.com/pytorch/pytorch/pull/105894#issuecomment-1654831950))
2023-07-28 01:16:02 +00:00
Jerry Zhang
3ca71ed735 [quant][pt2e] store scale/zero_point as tensor attributes to support serialization (#105894)
Summary:
Currently scale/zero_point for per tensor quant are stored as burnt-in literals, which means these values can't be serialized in state_dict. This
PR changes them to buffers/Tensors so that they can be serialized.
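
A small demonstration of the difference (illustration only): values registered as buffers appear in state_dict and round-trip through serialization, while literals baked into the graph do not.

```
import torch

class Q(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # As buffers, these are part of state_dict and survive save/load.
        self.register_buffer("scale", torch.tensor(0.02))
        self.register_buffer("zero_point", torch.tensor(5, dtype=torch.int64))

m = Q()
print(sorted(m.state_dict().keys()))  # ['scale', 'zero_point']
# A burnt-in Python float inside the traced graph would never appear here.
```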

Test Plan:
python test/test_quantization.py TestQuantizePT2E

Differential Revision: [D47770963](https://our.internmc.facebook.com/intern/diff/D47770963)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105894
Approved by: https://github.com/kimishpatel
2023-07-26 20:15:06 +00:00
Jerry Zhang
3a77f9aaaf [quant][api] Move torch.ao.quantization.pt2e.quantizer to torch.ao.quantization.quantizer (#105885)
Summary: Moving quantizer to torch.ao.quantization to make it a public API, since pt2e is a folder for implementations.

Test Plan:
CIs

sanity check: "buck test //executorch/backends/xnnpack/test:test_xnnpack_quantized_models -- test_resnet18"

Differential Revision: D47727838

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105885
Approved by: https://github.com/andrewor14
2023-07-26 18:20:09 +00:00
lezcano
36ae359655 Update matmul decomp to match eager (#105850)
The decomposition was not updated after https://github.com/pytorch/pytorch/pull/95261

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105850
Approved by: https://github.com/Chillee
2023-07-26 09:24:51 +00:00
vasiliy
8b34fa5e9b add basic cuda support for float8 dtypes (#105807)
Summary:

Ensures that creating tensors, copying, filling with zeroes, and checking for nan all work on cuda for the `float8` dtypes.  This should be enough for float8 emulation on cuda.

Note that I skipped the mul test - it's less trivial to add (need a new c++ macro), and there is no use case for it. We can follow up on that in the future.

Test Plan:

```
python test/test_quantization.py TestFloat8Dtype
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105807
Approved by: https://github.com/ezyang, https://github.com/jerryzh168, https://github.com/albanD
2023-07-25 03:43:36 +00:00
Aaron Gokaslan
6d43c89f37 [BE]: Update Ruff to 0.0.280 (#105724)
Removes unused loop values in Python dictionary iteration. Automated fix from Ruff master.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105724
Approved by: https://github.com/ezyang, https://github.com/janeyx99
2023-07-22 23:03:34 +00:00
Justin Chu
4cc1745b13 [BE] f-stringify torch/ and scripts (#105538)
This PR is a follow up on the pyupgrade series to convert more strings to use f-strings using `flynt`.

- https://docs.python.org/3/reference/lexical_analysis.html#f-strings
- https://pypi.org/project/flynt/

Command used:

```
flynt torch/ -ll 120
flynt scripts/ -ll 120
flynt tools/ -ll 120
```

and excluded `collect_env.py`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105538
Approved by: https://github.com/ezyang, https://github.com/malfet
2023-07-21 19:35:24 +00:00
Jerry Zhang
143c83d637 [quant][pt2e][be] Remove unneeded code (#105676)
Summary:
att

Test Plan:
CIs

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105676
Approved by: https://github.com/andrewor14
2023-07-21 00:51:22 +00:00
PaliC
9760ea58a3 fix lint (#105675)
Forward fix of the lint issues introduced by https://github.com/pytorch/pytorch/pull/104242
We are forward fixing as this PR contains Meta internal changes that would be tricky to revert smoothly.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105675
Approved by: https://github.com/jerryzh168, https://github.com/albanD, https://github.com/atalman
2023-07-20 18:42:25 +00:00
Amadeusz Skrzypczak
b64bd4a5dd Add torch.float8_e5m2 and torch.float8_e4m3 data types (#104242)
Proposal of two float8 variants - e5m2 and e4m3 - based on https://arxiv.org/pdf/2209.05433.pdf
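
Basic usage, sketched against the dtype names that eventually shipped (`torch.float8_e5m2`; the exact spelling of the e4m3 variant changed after this PR):

```
import torch

x = torch.randn(4)
x8 = x.to(torch.float8_e5m2)   # lossy: only 2 mantissa bits
print(x8.dtype)                # torch.float8_e5m2
print(x8.to(torch.float32))    # values rounded to the float8 grid
```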

Hide all Float8 operator implementations behind `#if !defined(C10_MOBILE)` guard to keep Android build size almost unchanged

TODO:
 - Refactor duplicated code
 - Clean up unbalanced pragma pop in dtype utils
 - Add native implementation on the CUDA side

Co-authored-by: Nikita Shulga <nshulga@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104242
Approved by: https://github.com/albanD
2023-07-20 16:09:11 +00:00
PyTorch MergeBot
f2b15772ff Revert "Add torch.float8_e5m2 and torch.float8_e4m3 data types (#104242)"
This reverts commit a9804130e5.

Reverted https://github.com/pytorch/pytorch/pull/104242 on behalf of https://github.com/PaliC due to breaks lint (run lintrunner and remerge) ([comment](https://github.com/pytorch/pytorch/pull/104242#issuecomment-1644150284))
2023-07-20 15:37:53 +00:00
Amadeusz Skrzypczak
a9804130e5 Add torch.float8_e5m2 and torch.float8_e4m3 data types (#104242)
Proposal of two float8 variants - e5m2 and e4m3 - based on https://arxiv.org/pdf/2209.05433.pdf

Hide all Float8 operator implementations behind `#if !defined(C10_MOBILE)` guard to keep Android build size almost unchanged

TODO:
 - Refactor duplicated code
 - Clean up unbalanced pragma pop in dtype utils
 - Add native implementation on the CUDA side

Co-authored-by: Nikita Shulga <nshulga@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104242
Approved by: https://github.com/albanD
2023-07-20 09:45:45 +00:00
Jerry Zhang
dff4e034b8 [quant][pt2e][be] Rename qnnpack quantizer to xnnpack quantizer (#105551)
Summary: att

Test Plan: sandcastle CI and OSS CI

Reviewed By: andrewor14

Differential Revision: D47422894

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105551
Approved by: https://github.com/andrewor14
2023-07-20 03:52:40 +00:00
Andrey Talman
c6653b65d8 Back out "Make adding buffers more like adding parameters (#104069)" (#105581)
Summary:
D47537831 is breaking pyper tests: https://fb.workplace.com/groups/802176577445480/posts/1018902842439518/

with `TypeError: register_buffer() takes 3 positional arguments but 4 were given`

Original commit changeset: d4b4069fbd38

Original Phabricator Diff: D47537831

Test Plan:
```
buck2 run //caffe2/torch/fb/training_toolkit/integration_tests/training_lifecycle/cogwheel_tests/pyper_release_v2:cogwheel_smallworld_inline_cvr_infer_pyper_pyper__canary_offline_training-launcher -- --run-harness-in-tupperware --build-fbpkg ads_dper3 --build-fbpkg training_platform
```

Reviewed By: atalman

Differential Revision: D47600140

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105581
Approved by: https://github.com/mikaylagawarecki
2023-07-20 03:39:53 +00:00
leslie-fang-intel
fa6be2fa6f [Quant][PT2E] Remove x86 inductor pt2e backend config (#105039)
**Summary**
For the quantization PT2E path, we recommend using `X86InductorQuantizer` instead of the backend config `x86_inductor_pt2e_backend_config`. Remove `x86_inductor_pt2e_backend_config` and the relevant testing.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105039
Approved by: https://github.com/jgong5, https://github.com/jerryzh168
2023-07-19 23:18:29 +00:00
Justin Chu
73e1455327 [BE] Enable ruff's UP rules and autoformat test/ (#105434)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105434
Approved by: https://github.com/albanD
2023-07-19 20:36:06 +00:00
Jerry Zhang
554052f321 [quant][pt2e][be] Rename prepare_pt2e_quantizer to prepare_pt2e (#105484)
Summary: att

Test Plan: sandcastle and OSS CI

Reviewed By: andrewor14

Differential Revision: D47422892

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105484
Approved by: https://github.com/andrewor14
2023-07-19 04:51:37 +00:00
Jerry Zhang
ed2b9f1af1 [quant][pt2e] rename _quantize_pt2e to quantize_pt2e (#105377)
Summary: att

Test Plan: CIs

Reviewed By: andrewor14

Differential Revision: D47234357

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105377
Approved by: https://github.com/andrewor14
2023-07-18 16:46:05 +00:00
ekamiti
32d422f335 Make adding buffers more like adding parameters (#104069)
Add semantics for creating a buffer object that mirror those for creating a parameter. This is done by introducing a new `Buffer` class that can be used for type disambiguation. The underlying functionality of registering a buffer remains the same: the `register_buffer` method has not been changed. The `persistent` parameter on the `Buffer` type indicates whether the buffer should be persistent or not. The other non-test changes have to do with getting the new `Buffer` type recognized by inductor and dynamo. The remaining changes are test changes to make sure that the `Buffer` type can be used as a drop-in replacement for `register_buffer`, as it just leads to `register_buffer` being called. This new functionality still allows normal tensors to be used as buffers, so these changes are intended to be backwards compatible.
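
Intended usage per the description above (a sketch assuming `Buffer` is exposed at `torch.nn`, as this PR proposes):

```
import torch
from torch.nn import Buffer  # assumed import path, per this PR

class Stats(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # Assigning a Buffer behaves like calling register_buffer(...).
        self.running_mean = Buffer(torch.zeros(4), persistent=True)

m = Stats()
print("running_mean" in m.state_dict())  # True for a persistent buffer
```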

Fixes #35735

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104069
Approved by: https://github.com/mikaylagawarecki
2023-07-17 17:59:05 +00:00
Jerry Zhang
7b4d080496 [quant][pt2e] Rename _pt2e to pt2e (#104668)
Summary:
X-link: https://github.com/pytorch/executorch/pull/3

att

Test Plan: Imported from OSS

Differential Revision: D47202807

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104668
Approved by: https://github.com/andrewor14
2023-07-15 06:34:17 +00:00
Tuan Tran
85745cd3d9 Fix bug in fuse_modules (#105069)
Summary: This diff fixes the issue reported in https://github.com/pytorch/pytorch/issues/105063 and is also related to an internal caffe2 bug (error reproduced in internal fb pytorch: N3945540)

Test Plan: Wait for sandcastle with the added unit test in caffe2/torch/ao/quantization/eager/test_fuse_eager

Differential Revision: D47402357

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105069
Approved by: https://github.com/jerryzh168
2023-07-13 23:39:59 +00:00
Andrew Or
4b29829ece [quant][pt2] Fix QAT convert for mobilenetv2 (#104110)
Summary:
QAT convert for mobilenetv2 was previously not working
because we incorrectly applied dropout during eval as well as
training. This is because, for exported models, model.eval() does
not change the behavior of dropout, unlike models with torch ops.
This commit simulates the effects of model.eval() for exported
models as well by replacing the aten dropout pattern before eval.
As of this commit, end-to-end QAT numerics now match for
mobilenetv2 between FX and PT2.
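
The rewrite can be pictured with `torch.fx.subgraph_rewriter` (a simplified, hypothetical pass; the PR's actual pattern handles the dropout arguments more generally):

```
import torch
from torch.fx import symbolic_trace, subgraph_rewriter

def _train_dropout(x):
    return torch.ops.aten.dropout.default(x, 0.5, True)

def _eval_dropout(x):
    return torch.ops.aten.dropout.default(x, 0.5, False)

class M(torch.nn.Module):
    def forward(self, x):
        return torch.ops.aten.dropout.default(x, 0.5, True)

gm = symbolic_trace(M())
subgraph_rewriter.replace_pattern(gm, _train_dropout, _eval_dropout)
print(gm.code)  # dropout is now called with train=False
```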

Test Plan: python test/test_quantization.py TestQuantizePT2EModels.test_qat_mobilenet_v2

Differential Revision: D46750343

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104110
Approved by: https://github.com/jerryzh168
2023-07-11 18:42:42 +00:00
Jerry Zhang
c42de84708 [quant] Skip some x86 quantizer tests for now due to time out (#104666)
Summary: att

Test Plan: sandcastle ci

Reviewed By: malfet

Differential Revision: D47234616

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104666
Approved by: https://github.com/DanilBaibak
2023-07-06 17:34:13 +00:00
leslie-fang-intel
8e2e2d730e [Quant][PT2E]Accelerate test of conv2d_add and conv2d_add_relu by reducing test configs (#104686)
**Summary**
Reduce the test time of `test_conv2d_binary_with_quantizer_api` and `test_conv2d_binary_unary_with_quantizer_api`.
* For `test_conv2d_binary_with_quantizer_api`, reduce the number of test configs from 12 to 2.
* For `test_conv2d_binary_unary_with_quantizer_api`, reduce the number of test configs from 24 to 2.

**Test Plan**
```
python -m pytest test_x86inductor_quantizer.py -k test_conv2d_binary_with_quantizer_api
python -m pytest test_x86inductor_quantizer.py -k test_conv2d_binary_unary_with_quantizer_api
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104686
Approved by: https://github.com/jerryzh168
2023-07-06 07:34:46 +00:00
Jerry Zhang
611febf6cf [quant] Support integer implementations for max_pool2d (#104225)
Summary:
This is needed for representing quantized models in the pt2 export quantization flow

Test Plan:
tested by opinfo, python test/test_ops.py

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104225
Approved by: https://github.com/kimishpatel
2023-07-05 23:54:07 +00:00
leslie-fang-intel
2a21469a77 [Quant][PT2E] Enable conv2d unary and binary recipe for x86 inductor quantizer (#98826)
**Summary**

- Recipe to annotate `conv2d_relu` for `X86InductorQuantizer` is added.
- Recipe to annotate `conv2d_add` for `X86InductorQuantizer` is added.
- Recipe to annotate `conv2d_add_relu` for `X86InductorQuantizer` is added.

**Test Plan**
```
python -u -m pytest -s -v test_x86inductor_quantizer.py -k TestQuantizePT2EX86Inductor
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98826
Approved by: https://github.com/jerryzh168
2023-07-04 00:01:10 +00:00
Kimish Patel
bd0f0f40a1 [PT2][Quant] Enable symbolic shape in linear quantization (#104473)
When tracing with symbolic shapes, arbitrary sym_size nodes can appear in the
graph. Earlier changes did not account for this, and the quantizer failed to annotate
the right nodes. This diff fixes that by not annotating sym_size nodes, which
should really not be relevant for quantization.

As next steps, we should a) validate in the quant workflow that sym_int nodes are not
being quantized and b) add similar support, as in this diff, for generic
annotations.

Differential Revision: [D47132050](https://our.internmc.facebook.com/intern/diff/D47132050/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104473
Approved by: https://github.com/jerryzh168
2023-07-01 05:14:30 +00:00
Jerry Zhang
ecca9591d5 [quant][pt2e] Add reference representation for quantize/dequantize operators (#104395)
Summary: Similar to quantized add, in this PR we added the reference representation for quantize/dequantize operators

Test Plan:
buck2 test caffe2/test:quantization_pt2e -- --exact 'caffe2/test:quantization_pt2e - test_representation_quantize (quantization.pt2e.test_quantize_pt2e.TestQuantizePT2E)'
buck2 test caffe2/test:quantization_pt2e -- --exact 'caffe2/test:quantization_pt2e - test_representation_dequantize (quantization.pt2e.test_quantize_pt2e.TestQuantizePT2E)'

Reviewed By: kimishpatel

Differential Revision: D46959928

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104395
Approved by: https://github.com/andrewor14
2023-06-30 04:32:18 +00:00
leslie-fang-intel
945a257277 [Quant][PT2E] Supported customized _EQUIVALENT_TYPES in Module Partition API (#102516)
**Summary**
The `Module Partition API` can simplify the pattern match process in quantization annotation. However, the current implementation of the
`Module Partition API` hardcodes `_EQUIVALENT_TYPES` (999bae0f54, torch/ao/quantization/_pt2e/graph_utils.py L13-L20), so PyTorch extension libraries such as [intel-extension-for-pytorch](https://github.com/intel/intel-extension-for-pytorch) can't use the `Module Partition API` with a customized `_EQUIVALENT_TYPES`. In this PR, we enable a customized `_EQUIVALENT_TYPES` by passing it in as a parameter.

**Test Plan**
```
python -m pytest test_graph_utils.py -k test_customized_equivalet_types_dict
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102516
Approved by: https://github.com/jgong5, https://github.com/kimishpatel
2023-06-28 00:20:25 +00:00
Jerry Zhang
c98896b76f [quant][pt2e] Add more precise representation for quantized add (#104130)
Summary:
The planned e2e for quantization in pytorch 2.0 export is the following:

float_model -> prepare_pt2e -> calibration -> convert_pt2e -> ...

inside convert_pt2e, we will first produce a q/dq representation of the quantized model, similar to the previous output of
convert_to_reference_fx in fx graph mode quantization:

```
torch.ops.quantized_decomposed.dequantize_per_tensor -> torch.ops.aten.add -> torch.ops.quantized_decomposed.quantize_per_tensor
torch.ops.quantized_decomposed.dequantize_per_tensor   /
```

Then we'll rewrite the above into a representation that expresses the intent more precisely: here we actually want to do int8 addition,
instead of simulating the int8 addition with fp32 operations. The representation for
quantized add is:

```
def quantized_add(x_i8, x_scale, x_zero_point, y_i8, y_scale, y_zero_point, out_scale, out_zero_point):
    x = (x_scale / out_scale) * x_i8
    y = (y_scale / out_scale) * y_i8
    out = x + y
    out -= (x_zero_point * x_scale + y_zero_point * y_scale) / out_scale
    out += out_zero_point
    return out
```
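
As a sanity check, the int8-domain form above is algebraically identical to dequantize -> fp32 add -> quantize (a quick self-contained verification; rounding and clamping are added here for the comparison):

```
import torch

def quantized_add(x_i8, x_s, x_zp, y_i8, y_s, y_zp, out_s, out_zp):
    x = (x_s / out_s) * x_i8.to(torch.float32)
    y = (y_s / out_s) * y_i8.to(torch.float32)
    out = x + y - (x_zp * x_s + y_zp * y_s) / out_s + out_zp
    return torch.clamp(torch.round(out), -128, 127).to(torch.int8)

def dq_add_q(x_i8, x_s, x_zp, y_i8, y_s, y_zp, out_s, out_zp):
    xf = x_s * (x_i8.to(torch.float32) - x_zp)
    yf = y_s * (y_i8.to(torch.float32) - y_zp)
    out = (xf + yf) / out_s + out_zp
    return torch.clamp(torch.round(out), -128, 127).to(torch.int8)

x = torch.randint(-128, 128, (8,), dtype=torch.int8)
y = torch.randint(-128, 128, (8,), dtype=torch.int8)
a = quantized_add(x, 0.1, 3, y, 0.2, -4, 0.3, 1)
b = dq_add_q(x, 0.1, 3, y, 0.2, -4, 0.3, 1)
print(torch.equal(a, b))  # True, up to float reassociation at rounding boundaries
```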

Test Plan:
```
buck2 test caffe2/test:quantization_pt2e -- --exact 'caffe2/test:quantization_pt2e - test_representation_add (quantization.pt2e.test_quantize_pt2e.TestQuantizePT2E)'
```

Reviewed By: kimishpatel

Differential Revision: D45628032

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104130
Approved by: https://github.com/kimishpatel
2023-06-27 20:11:30 +00:00
HDCharles
8176cd8c0f [ao] fixing quantized prelu workflow (#103455)
Summary: https://github.com/pytorch/pytorch/issues/100654 noticed that prelu
was not running its observers when the quantization flow was run.
This was a bug, which is now fixed, and the relevant prelu tests now
check for this. Also added a corrected observer for PReLU to the
qconfig_mapping.

Test Plan: python test/test_quantization.py TestStaticQuantizedModule.test_prelu

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103455
Approved by: https://github.com/jerryzh168
2023-06-23 16:45:40 +00:00
Andrew Or
7320ef5651 [quant][pt2] Add prepare QAT test for mobilenetv2 (#104068)
Summary:
Prepare QAT for mobilenetv2 has matching numerics with
FX. There were two changes needed to achieve this, however.
First, this commit adds observer sharing for ReLU6, which is
used extensively throughout this model. Second, in the tests we
have to use the same manual seed every time we call the models
in order to get the same results between FX and PT2. This is
because there is a dropout at the end of the model.

Test Plan: python test/test_quantization.py TestQuantizePT2EModels.test_qat_mobilenet_v2

Reviewed By: kimishpatel

Differential Revision: D46707786

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104068
Approved by: https://github.com/jerryzh168
2023-06-23 16:34:25 +00:00
leslie-fang-intel
fcb7a47f8b [Quant][PT2E]Fix the maxpool2d input observer didn't insert after QuantizationAnnotation API (#101941)
**Summary**
The previous UT was accidentally broken, since the output of the conv2d node was annotated by mistake.
Re-enable these UTs for these cases:

- For a single `conv2d` node, if we don't annotate the output node of `conv2d`, there should be no fake quant at conv2d's output.
- For the `conv2d-maxpool` pattern, `maxpool` should have fake quant inserted at its input and output nodes, since we annotate those nodes.

**Test Plan**
```
python -m pytest test_quantize_pt2e.py -k test_wo_annotate_conv_output_quantizer
python -m pytest test_quantize_pt2e.py -k test_max_pool2d_quantizer
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101941
Approved by: https://github.com/jgong5, https://github.com/jerryzh168
2023-06-23 11:50:31 +00:00
Andrew Or
303ff84b04 [quant][pt2] Update special qspecs after QAT rewrite (#103970)
Summary:
Special qspecs like `SharedQuantizationSpec` and
`DerivedQuantizationSpec` refer to other nodes in the graph.
However, after subgraph rewriting in QAT, the nodes referred
to in these special qspecs may be replaced by new nodes.
This could lead to the following error when inserting
observers according to these qspecs:

```
AssertionError: please make sure only refer to edge or node
that has observer/fake_quant inserted: 'getitem' not in
dict_keys([(arg0, convolution_default_1), (mul_tensor, convolution_default_1), getitem_3])
```

This commit fixes this by keeping track of the nodes that
are replaced during subgraph rewriting in QAT, and using
this mapping to update the dangling references used in these
special qspecs.
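
A sketch of the fix's shape (hypothetical helper; `edge_or_node` is the field `SharedQuantizationSpec` uses to refer to other nodes):

```
def remap_shared_qspec(qspec, replacements):
    # replacements: old fx.Node -> new fx.Node produced by the rewriter
    ref = qspec.edge_or_node
    if isinstance(ref, tuple):  # an (arg_node, user_node) edge
        qspec.edge_or_node = tuple(replacements.get(n, n) for n in ref)
    else:                       # a single node
        qspec.edge_or_node = replacements.get(ref, ref)
```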

Test Plan: python test/test_quantization.py TestQuantizePT2E.test_qat_update_shared_qspec

Reviewed By: jerryzh168

Differential Revision: D46606614

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103970
Approved by: https://github.com/jerryzh168
2023-06-22 20:05:57 +00:00
Omkar Salpekar
ae1ed27756 [codemod][numpy] replace np.str with str (#103931)
Summary:
`np.str` is removed from numpy 1.20.0. It was an alias to builtin `str` and it's safe to do the replacement.

The whole change is mechanical, generated using the following one-liner:
```
fbgr -sl 'np\.str\b' | xargs perl -pi -e 's,\bnp\.str\b,str,g'
```

Test Plan: sandcastle

Differential Revision: D46586144

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103931
Approved by: https://github.com/huydhn
2023-06-21 18:16:42 +00:00
Andrew Or
873f772df2 [quant][pt2] Fix QAT convert for resnet18 (#103759)
Summary:
Before this commit, only prepare QAT numerics matched
between PT2 and FX for resnet18. Convert numerics diverged,
however, for two reasons:

(1) Existing patterns did not handle inplace ReLUs. This commit
fixes this by adding extra patterns that use these ReLUs instead
of the normal ones.

(2) Subgraph rewriter could not handle skip connections in
quantized models, because the dequantize node is used in both
the conv node within the match pattern, and an inplace add node
outside of the match pattern. This led the subgraph matcher to
filter out the match, complaining that it was not self contained.
This commit fixes this problem by duplicating the dequantize
nodes, one for each user, such that subsequent matches will
be self contained.
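
The duplication trick in isolation (a simplified illustration built on fx utilities, not the PR's exact code):

```
import torch.fx as fx

def duplicate_dequant_per_user(gm: fx.GraphModule, dq_node: fx.Node):
    # Give every user after the first its own copy of the dequantize node so
    # each pattern match stays self-contained.
    for user in list(dq_node.users)[1:]:
        with gm.graph.inserting_after(dq_node):
            clone = gm.graph.node_copy(dq_node, lambda n: n)
        user.replace_input_with(dq_node, clone)
    gm.recompile()
```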

Test Plan: python test/test_quantization.py TestQuantizePT2EModels.test_qat_resnet18

Reviewed By: jerryzh168

Differential Revision: D46564114

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103759
Approved by: https://github.com/jerryzh168
2023-06-21 15:36:07 +00:00
leslie-fang-intel
dbc8eb2a8f [Quant][PT2E]Enable x86 inductor quantizer (#98730)
**Summary**

- Enable `X86InductorQuantizer` basics.
- Recipe to annotate conv2d is added.

**Test Plan**
```
python -u -m pytest -s -v test_x86inductor_quantizer.py -k TestQuantizePT2EX86Inductor
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98730
Approved by: https://github.com/jgong5, https://github.com/jerryzh168
2023-06-17 06:10:23 +00:00
Andrew Or
2bc56bec07 [quant][pt2] Handle literal conv args in convert QAT (#103731)
Summary:
Similar to the prepare case, we need to manually copy
over literal conv args such as padding and stride to the new,
replaced conv nodes, since these args are not captured by the
subgraph rewriter.
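
Conceptually (a hypothetical fixup; arg positions follow the aten.conv2d schema `(input, weight, bias, stride, padding, dilation, groups)`):

```
def copy_literal_conv_args(original_conv, new_conv):
    # The rewriter only wires up tensor placeholders (input, weight, bias);
    # literal args like stride/padding must be carried over by hand.
    new_conv.args = tuple(new_conv.args[:3]) + tuple(original_conv.args[3:])
```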

Test Plan: python test/test_quantization.py TestQuantizePT2E.test_qat_conv_bn_fusion_literal_args

Reviewed By: jerryzh168

Differential Revision: D46383130

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103731
Approved by: https://github.com/jerryzh168
2023-06-16 17:15:37 +00:00