Commit Graph

56 Commits

Author SHA1 Message Date
Maggie Moss
b13cd141b3 Add pyrefly suppressions (#164748)
Adds suppressions so that pyrefly will typecheck clean: https://github.com/pytorch/pytorch/issues/163283

Test plan:
dmypy restart && python3 scripts/lintrunner.py -a
pyrefly check

step 1: delete lines from the `project-excludes` field in the pyrefly.toml file
step 2: run pyrefly check
step 3: add suppressions, clean up unused suppressions (see the sketch below)
before: https://gist.github.com/maggiemoss/4b3bf2037014e116bc00706a16aef199

after:

0 errors (4,263 ignored)
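A hypothetical sketch of a step-3 suppression, assuming pyrefly's `# pyrefly: ignore` comment syntax (the flagged code here is illustrative, not from the PR):

```
def untyped():  # no return annotation, so the inferred type is imprecise
    return "not an int"

x: int = untyped()  # pyrefly: ignore
```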

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164748
Approved by: https://github.com/oulgen
2025-10-07 17:31:18 +00:00
Xuehai Pan
279cae52e7 [BE][PYFMT] migrate PYFMT for torch/ao/ to ruff format (#148185)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/148185
Approved by: https://github.com/ezyang
2025-06-14 16:47:04 +00:00
Aaron Orenstein
d782e46a36 [BE] typing for decorators - library (#138969)
Test Plan: unit tests

Differential Revision: D62302678

Pull Request resolved: https://github.com/pytorch/pytorch/pull/138969
Approved by: https://github.com/zou3519
2025-01-15 17:08:55 +00:00
bobrenjc93
a55977f763 Migrate from Tuple -> tuple in torch/ao (#144265)
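
The migration is the standard PEP 585 change from `typing.Tuple` to the builtin `tuple` generic, e.g.:

```
# Before: typing.Tuple
from typing import Tuple

def qparams_old() -> Tuple[float, int]:
    return 0.1, 0

# After: builtin generic (PEP 585)
def qparams_new() -> tuple[float, int]:
    return 0.1, 0
```
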
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144265
Approved by: https://github.com/aorenste
2025-01-10 00:12:06 +00:00
Xia, Weiwen
9827d677b4 [Quant][PT2E][X86] annotate and convert for linear_dynamic_fp16 (#141480)
Annotate the linear node for `linear_dynamic_fp16` with `X86InductorQuantizer`.
After `convert_pt2e`, the pattern will be
```
  x
  |
linear <- to_fp32 <- to_fp16 <- w
```
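A minimal numeric sketch of that converted pattern, assuming the usual dynamic-fp16 semantics (weight stored in fp16 and upcast to fp32 before the matmul):

```
import torch

x = torch.randn(2, 16)
w = torch.randn(8, 16)
w_fp16 = w.to(torch.float16)  # to_fp16, applied to the weight at convert time
y = torch.nn.functional.linear(x, w_fp16.to(torch.float32))  # to_fp32 -> linear
```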

**Test plan**
```
pytest test/quantization/pt2e/test_x86inductor_quantizer.py -k test_linear_dynamic_fp16
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/141480
Approved by: https://github.com/jgong5, https://github.com/jerryzh168
2024-11-29 07:48:39 +00:00
Shen Xu
19a4d68224 Add missing mappings to support torch.uint16 in quantization and export (#136547)
Test Plan: CI.

Differential Revision: D63142844

Pull Request resolved: https://github.com/pytorch/pytorch/pull/136547
Approved by: https://github.com/angelayi
2024-10-01 00:01:01 +00:00
Kimish Patel
e5a57932f0 [Pytorch][AO] Update choose_qparams_per_token op to output correct shape for scales and zp (#136807)
- also makes the scales and zp dtypes reconcile with the meta impl as well as other
quantized ops' representation of scales and zero point
- make sure quantize_per_token's output_dtype is respected

There are a few places where we need to reconcile the scale and zero point dtypes,
but that will come later. These fixes are mainly being done to enable the quantized
KV cache through the ET stack.
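For context, a hedged sketch of the expected per-token qparams shapes (one scale/zero_point per token; the int8 asymmetric formulas here are illustrative, not the op's exact implementation):

```
import torch

x = torch.randn(2, 3, 8)  # tokens along the leading dims
mn = x.amin(dim=-1, keepdim=True)
mx = x.amax(dim=-1, keepdim=True)
scales = (mx - mn) / 255.0                     # shape (2, 3, 1)
zero_points = -128 - torch.round(mn / scales)  # shape (2, 3, 1)
```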

Differential Revision: [D62301840](https://our.internmc.facebook.com/intern/diff/D62301840/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/136807
Approved by: https://github.com/jerryzh168
2024-09-27 18:46:17 +00:00
Scott Wolchok
e2b94923ba [PyTorch] Speed up decomposed quantize_per_channel (#133029)
Similar to D60871396 (#132828).

Differential Revision: [D60978385](https://our.internmc.facebook.com/intern/diff/D60978385/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/133029
Approved by: https://github.com/cccclai
2024-08-08 23:48:34 +00:00
Scott Wolchok
eeb6ad0744 [quant] Speed up dequantize_per_channel (#132828)
Tensor-wise operations are much faster than looping over tensor elements. Rewrite the loop in dequantize_per_channel to use whole-Tensor operations accordingly.
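A minimal sketch of the whole-Tensor rewrite (illustrative, not the exact PR code):

```
import torch

def dequantize_per_channel_vectorized(q, scales, zero_points, axis):
    # Reshape the qparams so they broadcast along the channel axis,
    # replacing a Python loop over channels with one tensor expression.
    shape = [1] * q.dim()
    shape[axis] = -1
    s = scales.reshape(shape)
    zp = zero_points.reshape(shape)
    return (q.to(torch.float32) - zp) * s
```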

Differential Revision: [D60871396](https://our.internmc.facebook.com/intern/diff/D60871396/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132828
Approved by: https://github.com/cccclai
2024-08-08 16:44:41 +00:00
PyTorch MergeBot
a3ba405871 Revert "[BE] typing for decorators - library (#131570)"
This reverts commit 5731b486c8.

Reverted https://github.com/pytorch/pytorch/pull/131570 on behalf of https://github.com/clee2000 due to same as https://github.com/pytorch/pytorch/pull/131572#issuecomment-2254328359 but I clicked the wrong link by accident.  This is where it actually starts ([comment](https://github.com/pytorch/pytorch/pull/131568#issuecomment-2254330781))
2024-07-28 03:43:39 +00:00
Aaron Orenstein
5731b486c8 [BE] typing for decorators - library (#131570)
See #131429

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131570
Approved by: https://github.com/oulgen, https://github.com/zou3519
ghstack dependencies: #131568, #131569
2024-07-25 22:24:19 +00:00
Xuehai Pan
2ce734cee9 [BE] enable UFMT for torch/ao/quantization/ (#128863)
Part of #123062

- #123062

Pull Request resolved: https://github.com/pytorch/pytorch/pull/128863
Approved by: https://github.com/ezyang
ghstack dependencies: #128861, #128862
2024-07-25 04:17:54 +00:00
Aaron Orenstein
5a0068cc69 [BE] mypy: disallow untyped decorators (#131428)
Untyped decorators strip the types from their decorated function, so even if the underlying function is fully typed, callers don't get any benefit from its type annotations.
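The usual fix is a ParamSpec-typed decorator; a minimal sketch (illustrative, not code from this PR):

```
from typing import Callable, ParamSpec, TypeVar

P = ParamSpec("P")
R = TypeVar("R")

def logged(fn: Callable[P, R]) -> Callable[P, R]:
    # The ParamSpec preserves the decorated function's signature for type checkers.
    def wrapper(*args: P.args, **kwargs: P.kwargs) -> R:
        print(f"calling {fn.__name__}")
        return fn(*args, **kwargs)
    return wrapper
```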

Step 1 - Enable the error and override in all the offending files.

#131429

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131428
Approved by: https://github.com/justinchuby, https://github.com/oulgen
2024-07-23 21:50:55 +00:00
Aaron Orenstein
62bcdc0ac9 Flip default value for mypy disallow_untyped_defs [4/11] (#127841)
See #127836 for details.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127841
Approved by: https://github.com/oulgen
2024-06-08 18:36:48 +00:00
andrewor14
3cba50e478 [quant] Make per_group and per_token quant match torch.fake_quantize (#125781)
Summary: Follow-up to https://github.com/pytorch/ao/pull/229.
This resolves the difference between `input.div(scales)` and
`input.mul(1.0 / scales)`, which results in small numerical
discrepancies on some inputs.
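A quick illustration of the kind of discrepancy involved (rounding in the precomputed reciprocal):

```
import torch

torch.manual_seed(0)
x = torch.randn(10_000)
s = torch.rand(10_000) + 0.5
diff = (x.div(s) - x.mul(1.0 / s)).abs().max()
print(diff)  # typically a nonzero last-ulp difference
```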

Test Plan:
python test/test_quantization.py TestQuantizedTensor.test_decomposed_quantize_per_channel_group
python test/test_quantization.py TestQuantizedTensor.test_decomposed_quantize_per_token

Reviewers: jerryzh168

Subscribers: jerryzh168, supriyar
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125781
Approved by: https://github.com/jerryzh168
2024-05-14 18:18:54 +00:00
Amadeusz Skrzypczak
107f944f22 Support fp8 quantization (#123161)
This commit enables float8_e5m2 and float8_e4m3fn dtypes in fx quantization and PT2E.

Motivation for using fp8 quantization instead of int8:
- it works better to run inference with the same datatype the model was trained with,
- fp8 can handle outliers better, which is one of the problems in LLMs activations.

The numerical recipe we want to use this for is fp8 inference:
- bgemms/gemms running in float8_e4m3fn,
- Per-Tensor-Quantization/Scaling,
- amax observer for measurement with input_backoff and weight_backoff.
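
For reference, the two fp8 dtypes named above are available as native torch dtypes in sufficiently recent PyTorch builds:

```
import torch

x = torch.randn(4)
print(x.to(torch.float8_e4m3fn))  # the gemm dtype in the recipe above
print(x.to(torch.float8_e5m2))
```
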
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123161
Approved by: https://github.com/jgong5, https://github.com/jerryzh168
2024-04-23 13:35:27 +00:00
Aaron Gokaslan
c5fafe9f48 [BE]: TRY002 - Ban raising vanilla exceptions (#124570)
Adds a ruff lint rule to ban raising raw exceptions. Most of these should at the very least be runtime errors, value errors, type errors, or some other more specific error type. There are hundreds of instances of these bad exception types already in the codebase, so I have noqa'd most of them. Hopefully this error code will get committers to rethink what exception type they should raise when they submit a PR.

I also encourage people to gradually go and fix all the existing noqas that have been added so they can be removed over time and our exception typing can be improved.
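A typical before/after for this rule:

```
# Before: flagged by ruff TRY002 (vanilla exception)
def check_old(n):
    if n < 0:
        raise Exception("n must be non-negative")

# After: a specific exception type
def check_new(n):
    if n < 0:
        raise ValueError("n must be non-negative")
```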

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124570
Approved by: https://github.com/ezyang
2024-04-21 22:26:40 +00:00
andrewor14
3eea300680 [quant] Do not decompose choose_qparams_per_token_asymmetric (#124178)
Summary: https://github.com/pytorch/pytorch/pull/123452 added
backward support to this op by turning it into
CompositeImplicitAutograd, which meant it gets decomposed during
export/compile. However, this is not desirable behavior for the
PTQ case when we try to lower the model. This commit enables
QAT without breaking PTQ by refactoring the impl into a separate
op that does have backward support.

Test Plan:
python test/test_quantization.py -k test_decomposed_choose_qparams_per_token_asymmetric_backward

Reviewers: jerryzh168, digantdesai, zou3519

Subscribers: jerryzh168, digantdesai, zou3519, supriyar

Differential Revision: [D56192116](https://our.internmc.facebook.com/intern/diff/D56192116)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124178
Approved by: https://github.com/digantdesai
2024-04-16 22:58:48 +00:00
andrewor14
762e19606e [quant] Enable backward for choose_qparams_per_token_asymmetric (#123452)
Summary: When running the backward for this op, we get the error:
```
RuntimeError: derivative for aten::aminmax is not implemented
```
This commit replaces this call with separate amin and amax
calls instead, which do have implemented derivatives.
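A quick check that the replacement supports autograd:

```
import torch

x = torch.randn(4, 8, requires_grad=True)
mn, mx = x.amin(dim=-1), x.amax(dim=-1)  # both have registered derivatives
(mx - mn).sum().backward()
print(x.grad.shape)  # torch.Size([4, 8])
```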

Test Plan:
python test/test_quantization.py -k test_decomposed_choose_qparams_per_token_asymmetric_backward

Reviewers: jerryzh168, digantdesai

Subscribers: jerryzh168, digantdesai, supriyar

Differential Revision: [D55805170](https://our.internmc.facebook.com/intern/diff/D55805170)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123452
Approved by: https://github.com/digantdesai, https://github.com/jerryzh168, https://github.com/zou3519
2024-04-12 20:05:56 +00:00
PyTorch MergeBot
f0eb162730 Revert "Switch quantized_decomposed over to new custom ops API (#123454)"
This reverts commit 638729c0cd.

Reverted https://github.com/pytorch/pytorch/pull/123454 on behalf of https://github.com/DanilBaibak due to Break internal build ([comment](https://github.com/pytorch/pytorch/pull/123454#issuecomment-2051738976))
2024-04-12 13:14:59 +00:00
PyTorch MergeBot
fe092da874 Revert "[quant] Enable backward for choose_qparams_per_token_asymmetric (#123452)"
This reverts commit c83900887f.

Reverted https://github.com/pytorch/pytorch/pull/123452 on behalf of https://github.com/clee2000 due to broke test_quantization.py::TestQuantizedTensor::test_decomposed_choose_qparams_per_token_asymmetric_backward on multiple jobs c83900887f https://github.com/pytorch/pytorch/actions/runs/8648781225/job/23714753103, probably a landrace ([comment](https://github.com/pytorch/pytorch/pull/123452#issuecomment-2050056601))
2024-04-11 16:19:28 +00:00
andrewor14
c83900887f [quant] Enable backward for choose_qparams_per_token_asymmetric (#123452)
Summary: When running the backward for this op, we get the error:
```
RuntimeError: derivative for aten::aminmax is not implemented
```
This commit replaces this call with separate amin and amax
calls instead, which do have implemented derivatives.

Test Plan:
python test/test_quantization.py -k test_decomposed_choose_qparams_per_token_asymmetric_backward

Reviewers: jerryzh168, digantdesai

Subscribers: jerryzh168, digantdesai, supriyar

Differential Revision: [D55805170](https://our.internmc.facebook.com/intern/diff/D55805170)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123452
Approved by: https://github.com/digantdesai, https://github.com/jerryzh168
2024-04-11 14:51:42 +00:00
rzou
638729c0cd Switch quantized_decomposed over to new custom ops API (#123454)
We are taking API feedback. Changes:
- I removed some of the default values (they weren't being used).
- I was unable to convert the last op (which is essentially an
  autograd.Function registered as CompositeImplicitAutograd). That one
  is "incorrectly registered"; I punt fixing it to the future.

Test Plan:
- existing tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123454
Approved by: https://github.com/andrewor14
ghstack dependencies: #123453, #123578
2024-04-11 13:18:06 +00:00
andrewor14
fe29a8fbea [quant][be] Simplify fake_quant_per_channel (#123186)
Summary: We probably don't need
`torch._C._AutoDispatchBelowAutograd()`, which is to prevent
infinite recursion if the implementation calls itself. Let's
remove it and see if anything breaks. The other major change
is registering the op to the more general Autograd dispatch
key so it can be used on cuda as well.

Test Plan:
python test/inductor/test_cpu_repro.py -k test_decomposed_fake_quant_per_channel

Reviewers: zou3519, bdhirsh

Subscribers: zou3519, bdhirsh, jerryzh168, leslie-fang-intel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123186
Approved by: https://github.com/zou3519, https://github.com/leslie-fang-intel
2024-04-03 18:06:45 +00:00
Guang Yang
c677221798 remove torchao dependency (#122524)
Test Plan:
CI

```
buck2 run mode/dev-nosan mode/inplace executorch/examples/models/llama2:export_llama -- -c ~/llama/ultra_new_checkpoint.pt -p ~/llama/params.json -kv -E 8,8 -d fp32 --pt2e_quantize "xnnpack_dynamic" -2
```

```
buck run //executorch/backends/xnnpack/test:test_xnnpack_ops -- executorch.backends.xnnpack.test.ops.linear.TestLinear.test_qd8_fp32_per_token_weight_per_channel_group_int4
```

Differential Revision: D55263008

Pull Request resolved: https://github.com/pytorch/pytorch/pull/122524
Approved by: https://github.com/jerryzh168
2024-03-23 03:18:43 +00:00
Manuel Candales
c53e3f57b5 allow fp16 in quant/dequant decompositions (#121738)
Test Plan:
```
buck2 run mode/dev-nosan mode/inplace executorch/examples/models/llama2:export_llama -- -c ~/llama/ultra_new_checkpoint.pt -p ~/llama/params.json -kv -E 8,8 -d fp16 --pt2e_quantize "xnnpack_dynamic" -2
```

Reviewed By: kirklandsign

Differential Revision: D54785950

Pull Request resolved: https://github.com/pytorch/pytorch/pull/121738
Approved by: https://github.com/jerryzh168
2024-03-13 21:45:08 +00:00
Manuel Candales
6d8a7d6e58 [pytorch] optional zero points on dequantize per channel (#121724)
Summary:
X-link: https://github.com/pytorch/executorch/pull/2364

bypass-github-export-checks

Test Plan: sandcastle

Reviewed By: mikekgfb

Differential Revision: D54709217

Pull Request resolved: https://github.com/pytorch/pytorch/pull/121724
Approved by: https://github.com/mikekgfb
2024-03-12 19:54:11 +00:00
kausik
edf22f3a48 Modify signature of dequantize ops for decomposed quantized Tensor (#119173) (#121450)
Summary:
X-link: https://github.com/pytorch/executorch/pull/2308

Note: the initial purpose of this PR is to draw suggestions and feedback regarding a better alternative, if any.

At present, dequantize ops for the decomposed quantized Tensor representation, e.g. dequantize_per_tensor(), assume the output dtype is torch.float and hence do not take an output dtype in their argument lists. This signature becomes unusable when that assumption breaks: if the output dtype differs from torch.float, there is no way to specify it during dequantization.

This change generalizes the signature of dequantize ops like dequantize_per_tensor() for wider use cases where the output dtype can differ from torch.float and needs to be passed during dequantization. The proposal is to add an argument named 'output_dtype'. We would also welcome suggestions and feedback on any better alternative.
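A sketch of the generalized signature (illustrative; not the exact aten schema):

```
import torch

def dequantize_per_tensor(q, scale, zero_point, quant_min, quant_max, dtype,
                          *, output_dtype=torch.float32):
    # output_dtype generalizes the previously hard-coded torch.float output.
    return ((q.to(torch.float32) - zero_point) * scale).to(output_dtype)
```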

cc jerryzh168 jianyuh raghuramank100 jamesr66a vkuzo jgong5 Xia-Weiwen leslie-fang-intel

Reviewed By: digantdesai

Differential Revision: D53590486

Pulled By: manuelcandales

Co-authored-by: kausik <kmaiti@habana.ai>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121450
Approved by: https://github.com/jerryzh168
2024-03-12 12:36:31 +00:00
leslie-fang-intel
975d428425 [Quant] Add the operator of decomposed fake quant per channel (#121297)
**Summary**
Add the operator `quantized_decomposed.fake_quant_per_channel` and test its forward and backward against the ATen implementation.
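A minimal forward-only sketch of per-channel fake quantization (not the PR's exact kernel):

```
import torch

def fake_quant_per_channel(x, scales, zero_points, axis, qmin, qmax):
    shape = [1] * x.dim()
    shape[axis] = -1
    s = scales.reshape(shape)
    zp = zero_points.reshape(shape)
    q = torch.clamp(torch.round(x / s) + zp, qmin, qmax)  # quantize
    return (q - zp) * s                                   # dequantize
```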

**Test Plan**
```
python -u -m pytest -s -v test_cpu_repro.py -k test_decomposed_fake_quant_per_channel
```

**Next Step**
Optimize performance: the generated code for the forward and backward graphs is not vectorized.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/121297
Approved by: https://github.com/jerryzh168, https://github.com/jgong5
2024-03-08 10:51:37 +00:00
leslie-fang-intel
84de851539 [Inductor] Enable the decomposition of quant/dequant per channel (#119177)
**Summary**
Part 2 of fixing https://github.com/pytorch/pytorch/issues/119141, which needs vectorized code generation for per-channel quant with the int8 data type.
Enable the decomposition of per-channel quant/dequant so that vectorized code can be generated.

**TestPlan**
```
python -u -m pytest -s -v test_cpu_repro.py -k test_per_channel_fake_quant_uint8
python -u -m pytest -s -v test_cpu_repro.py -k test_per_channel_fake_quant_int8
python -u -m pytest -s -v test_cpu_repro.py -k test_per_channel_fake_quant_uint8_bf16_input
python -u -m pytest -s -v test_cpu_repro.py -k test_per_channel_fake_quant_int8_bf16_input
```

Co-authored-by: Jiong Gong <jiong.gong@intel.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119177
Approved by: https://github.com/peterbell10, https://github.com/jansel
2024-02-19 01:30:44 +00:00
leslie-fang-intel
6ba2748690 [Quant] [PT2] Enable Decomposed quant per tensor/channel to accept bfloat16 input (#112225)
**Summary**
- PR 4 for enabling Int8-Mixed-BF16 PT2E PTQ Quantization with Inductor https://github.com/pytorch/pytorch/issues/111640.
- Enable decomposed `quant_per_tensor` and `quant_per_channel` to accept bfloat16 input.

**TestPlan**
```
python -m pytest test_quantized_tensor.py -k test_decomposed_quantize_per_tensor_bfloat16_input
python -m pytest test_quantized_tensor.py -k test_decomposed_quantize_per_channel_bfloat16_input
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112225
Approved by: https://github.com/jgong5, https://github.com/jerryzh168
2023-11-03 23:47:43 +00:00
Jerry Zhang
32a16d4999 [quant][pt2e] Support int16 quantization (#108453)
Summary:
Previously we could only use native PyTorch int dtypes that have corresponding quantized dtypes (e.g. quint8, qint8). This
PR removes that assumption in observers/fake_quants so that users can use all PyTorch native dtypes (except for int64,
which we can add later if needed); the main addition here is int16.
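A hedged example of what this enables, assuming observers accept int16 after this change (quant_min/quant_max chosen for int16 here):

```
import torch
from torch.ao.quantization.observer import MinMaxObserver

obs = MinMaxObserver(dtype=torch.int16, quant_min=-(2**15), quant_max=2**15 - 1)
obs(torch.randn(8))
scale, zero_point = obs.calculate_qparams()
print(scale, zero_point)
```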

Test Plan:
python test/test_quantization.py TestQuantizePT2E

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108453
Approved by: https://github.com/kimishpatel
2023-09-06 19:31:20 +00:00
Jerry Zhang
ecca9591d5 [quant][pt2e] Add reference representation for quantize/dequantize operators (#104395)
Summary: Similar to quantized add, in this PR we added the reference representation for quantize/dequantize operators

Test Plan:
buck2 test caffe2/test:quantization_pt2e -- --exact 'caffe2/test:quantization_pt2e - test_representation_quantize (quantization.pt2e.test_quantize_pt2e.TestQuantizePT2E)'
buck2 test caffe2/test:quantization_pt2e -- --exact 'caffe2/test:quantization_pt2e - test_representation_dequantize (quantization.pt2e.test_quantize_pt2e.TestQuantizePT2E)'

Reviewed By: kimishpatel

Differential Revision: D46959928

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104395
Approved by: https://github.com/andrewor14
2023-06-30 04:32:18 +00:00
Jerry Zhang
c98896b76f [quant][pt2e] Add more precise representation for quantized add (#104130)
Summary:
The planned e2e for quantization in pytorch 2.0 export is the following:

float_model -> prepare_pt2e -> calibration -> convert_pt2e -> ...

Inside convert_pt2e, we will first produce a q/dq representation of the quantized model, similar to the previous output of
convert_to_reference_fx in fx graph mode quantization:

```
torch.ops.quantized_decomposed.dequantize_per_tensor -> torch.ops.aten.add -> torch.ops.quantized_decomopsed.quantize_per_tensor
torch.ops.quantized_decomposed.dequantize_per_tensor   /
```

Then we'll rewrite the above into a more precise representation that expresses the intent directly: here we actually
want to do int8 addition, rather than simulate the int8 addition with fp32 operations. The representation for
quantized add is:

```
def quantized_add(x_i8, x_scale, x_zero_point, y_i8, y_scale, y_zero_point, out_scale, out_zero_point):
    x = (x_scale / out_scale) * x_i8
    y = (y_scale / out_scale) * y_i8
    out = x + y
    out -= (x_zero_point * x_scale - y_zero_point * y_scale) / out_scale
    out += out_zero_point
    return out
```
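For contrast, a minimal sketch of the naive q/dq representation being rewritten (clamping and dtype casts omitted):

```
import torch

def dq_add_q(x_i8, x_s, x_zp, y_i8, y_s, y_zp, out_s, out_zp):
    x_fp32 = (x_i8.to(torch.float32) - x_zp) * x_s  # dequantize_per_tensor
    y_fp32 = (y_i8.to(torch.float32) - y_zp) * y_s  # dequantize_per_tensor
    out_fp32 = x_fp32 + y_fp32                      # aten.add
    return torch.round(out_fp32 / out_s) + out_zp   # quantize_per_tensor
```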

Test Plan:
```
buck2 test caffe2/test:quantization_pt2e -- --exact 'caffe2/test:quantization_pt2e - test_representation_add (quantization.pt2e.test_quantize_pt2e.TestQuantizePT2E)'
```

Reviewed By: kimishpatel

Differential Revision: D45628032

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104130
Approved by: https://github.com/kimishpatel
2023-06-27 20:11:30 +00:00
Jerry Zhang
ce8d31551b [quant][be] Change return type for zero_point to be int32 Tensor (#102234)
Summary: This is probably a typo

Test Plan: CI

Reviewed By: salilsdesai

Differential Revision: D46172706

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102234
Approved by: https://github.com/salilsdesai
2023-06-01 18:30:44 +00:00
Kazuaki Ishizaki
a13a63ae9a Fix typos under torch/ao directory (#97679)
This PR fixes typos in comments and messages of `.py` files under the `torch/ao` directory

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97679
Approved by: https://github.com/janeyx99, https://github.com/kit1980
2023-04-10 22:25:15 +00:00
Jacob Szwejbka
fc324d3485 [quant][pt2e] Add support for dynamic quantization with symmetric quant for input (#94854)
Summary:
Previously we assumed asymmetric quantization for dynamic quantization; this diff adds support for symmetric quantization
of the input in dynamic quantization

Test Plan: buck run executorch/exir/tests:quant_lowering_custom_backend_pass -- "executorch.exir.tests.test_quant_lowering_custom_backend_pass.TestQuantLoweringCustomBackendPass.test_quantized_linear_dynamic"

Reviewed By: digantdesai

Differential Revision: D43134794

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94854
Approved by: https://github.com/digantdesai
2023-02-28 19:39:31 +00:00
PyTorch MergeBot
641dc0b844 Revert "[quant] Add quantize and dequantize operators to decomposition table (#93312)"
This reverts commit 782e4f5c02.

Reverted https://github.com/pytorch/pytorch/pull/93312 on behalf of https://github.com/jeanschmidt due to this commits breaks internal builds: https://fburl.com/sandcastle/dw0rqcbv
2023-02-13 09:20:37 +00:00
Jacob Szwejbka
2628901033 [Executorch][Quant] Add Choose_qparams_symmetric (#94685)
Summary: needed for symmetric dynamic quant flow

Test Plan: todo

Reviewed By: jerryzh168

Differential Revision: D43134117

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94685
Approved by: https://github.com/larryliu0820
2023-02-13 07:27:48 +00:00
Jerry Zhang
782e4f5c02 [quant] Add quantize and dequantize operators to decomposition table (#93312)
Summary:
This PR decomposes the operators in the torch.ops.quantized_decomposed namespace into more
primitive aten operators. This frees us from maintaining the semantics of the quantize/dequantize
operators, which can be expressed more precisely in terms of the underlying aten operators.

Note: this PR just adds them to the decomposition table; we haven't enabled this by default yet
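An illustrative aten-level decomposition for one of these ops (simplified; quant_min/quant_max handling omitted):

```
import torch

def dequantize_per_tensor_decomp(x_int, scale, zero_point, quant_min, quant_max, dtype):
    # Dequantize expressed purely in primitive aten ops: cast, subtract, multiply.
    return (x_int.to(torch.float32) - zero_point) * scale
```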

Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_q_dq_decomposition

Pull Request resolved: https://github.com/pytorch/pytorch/pull/93312
Approved by: https://github.com/vkuzo, https://github.com/SherlockNoMad
2023-02-10 01:40:12 +00:00
Jacob Szwejbka
bb48d90b00 [Executorch][Quant][BE] Refactor Choose_Qparams (#94338)
Summary: Refactor so that it can be decomposed

Test Plan: ci

Differential Revision: D42681268

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94338
Approved by: https://github.com/jerryzh168
2023-02-09 01:20:17 +00:00
PyTorch MergeBot
3a5a762443 Revert "[quant] Add quantize and dequantize operators to decomposition table (#93312)"
This reverts commit 3fd46a2f9c.

Reverted https://github.com/pytorch/pytorch/pull/93312 on behalf of https://github.com/huydhn due to Sorry for reverting your PR, but it breaks trunk due to a landrace 3fd46a2f9c.  Please rebase and re-land it
2023-02-08 18:29:10 +00:00
Jerry Zhang
3fd46a2f9c [quant] Add quantize and dequantize operators to decomposition table (#93312)
Summary:
This PR decomposes the operators in the torch.ops.quantized_decomposed namespace into more
primitive aten operators. This frees us from maintaining the semantics of the quantize/dequantize
operators, which can be expressed more precisely in terms of the underlying aten operators.

Note: this PR just adds them to the decomposition table; we haven't enabled this by default yet

Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_q_dq_decomposition

Pull Request resolved: https://github.com/pytorch/pytorch/pull/93312
Approved by: https://github.com/vkuzo, https://github.com/SherlockNoMad
2023-02-08 17:26:01 +00:00
Nikita Shulga
c0dd9b3b67 Revert "[Executorch][Quantization][BE] Refactor Choose Qparams (#92592)"
This reverts commit 59071ab1e7.

It breaks `quantization.jit.test_ondevice_quantization.TestOnDeviceDynamicPTQFinalize`, which is not run in OSS, but is mandatory for internal CI.
2023-01-23 09:13:02 -08:00
Jacob Szwejbka
59071ab1e7 [Executorch][Quantization][BE] Refactor Choose Qparams (#92592)
Summary: Should hopefully be a little faster. Definitely cleaner to not create an observer inside the op

Test Plan: ci

Differential Revision: D42154677

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92592
Approved by: https://github.com/jerryzh168
2023-01-20 01:36:47 +00:00
Jerry Zhang
2a23dfe8ed [quant] Support lowering for quantized embedding byte operator (#91159)
Summary: This PR adds lowering for the quantized embedding byte operator in the executorch flow

Test Plan: buck run executorch/exir/tests:quant_fusion_pass -- "executorch.exir.tests.test_quant_fusion_pass.TestQuantFusionPass.test_embedding_byte"

Reviewed By: qihqi

Differential Revision: D41673139

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91159
Approved by: https://github.com/vkuzo
2022-12-21 22:52:24 +00:00
Jacob Szwejbka
bd94ee66ea [quantized] [executorch] typo (#89960)
Summary: Inefficient impl in python

Test Plan: buck2 test mode/dev //caffe2/test/quantization:test_quantization -- --exact 'caffe2/test/quantization:test_quantization - test_quantized_embedding_byte (caffe2.test.quantization.core.test_quantized_tensor.TestQuantizedTensor)'

Differential Revision: D41627744

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89960
Approved by: https://github.com/jerryzh168
2022-12-16 19:49:09 +00:00
Jerry Zhang
94b9bb324f [quant] Add example for lowering quantized dynamic linear pattern through delegation (#90640)
Summary: Only the pattern part, will leave the delegation example to Chen

Test Plan: buck run executorch/exir/tests:quant_lowering_custom_backend_pass -- "executorch.exir.tests.test_quant_lowering_custom_backend_pass.TestQuantLoweringCustomBackendPass.test_quantized_linear_dynamic"

Reviewed By: cccclai

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90640
Approved by: https://github.com/cccclai
2022-12-13 00:57:33 +00:00
Edward Z. Yang
a747326423 Add manual meta implementations to quantize_per_tensor.tensor and co (#89958)
When you are writing a meta function, you cannot call item() on the tensor because there is no real data on the tensor and it will fail. The error message was not very good in this case, see also https://github.com/pytorch/pytorch/issues/89959

This PR takes a brute force approach to resolving the problem: just manually define meta implementations for the naughty functions that are calling item(). However, this results in a lot of code duplication. The easiest way to avoid this situation is to rewrite the decomps so they don't call item. It should not be that difficult to use direct tensors on your operations, as scalar tensors can broadcast too.
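A minimal illustration of the constraint (meta tensors carry no data):

```
import torch

t = torch.empty((), device="meta")
try:
    t.item()  # data-dependent: fails on meta tensors, hence the manual meta impls
except Exception as e:
    print(type(e).__name__, e)
```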

I could only test this with `buck test @mode/opt -c python.package_style=inplace //executorch/backends/test:test_backends` in internal with D41555454. Test coverage needs to be improved, otherwise don't blame us when we break you.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89958
Approved by: https://github.com/jerryzh168
2022-12-01 06:04:37 +00:00
Jerry Zhang
9e4a25c731 [quant][decomposed] Add support for int32 for decomposed q/dq ops (#89881)
Summary: as titled.

Test Plan:
python test/test_quantization.py -k test_decomposed_quantize_per_tensor
python test/test_quantization.py -k test_decomposed_dequantize_per_tensor


Pull Request resolved: https://github.com/pytorch/pytorch/pull/89881
Approved by: https://github.com/cccclai
2022-11-30 21:24:00 +00:00