pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-06 12:20:52 +01:00

Author	SHA1	Message	Date
Maggie Moss	b13cd141b3	Add pyrefly suppressions (#164748 ) Adds suppressions so that pyrefly will typecheck clean: https://github.com/pytorch/pytorch/issues/163283 Test plan: dmypy restart && python3 scripts/lintrunner.py -a pyrefly check step 1: delete lines in the pyrefly.toml file from the `project-excludes` field step 2: run pyrefly check step 3: add suppressions, clean up unused suppressions before: https://gist.github.com/maggiemoss/4b3bf2037014e116bc00706a16aef199 after: 0 errors (4,263 ignored) Pull Request resolved: https://github.com/pytorch/pytorch/pull/164748 Approved by: https://github.com/oulgen	2025-10-07 17:31:18 +00:00
Xuehai Pan	279cae52e7	[BE][PYFMT] migrate PYFMT for `torch/ao/` to `ruff format` (#148185 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/148185 Approved by: https://github.com/ezyang	2025-06-14 16:47:04 +00:00
Aaron Orenstein	d782e46a36	[BE] typing for decorators - library (#138969 ) Test Plan: unit tests Differential Revision: D62302678 Pull Request resolved: https://github.com/pytorch/pytorch/pull/138969 Approved by: https://github.com/zou3519	2025-01-15 17:08:55 +00:00
bobrenjc93	a55977f763	Migrate from Tuple -> tuple in torch/ao (#144265 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/144265 Approved by: https://github.com/aorenste	2025-01-10 00:12:06 +00:00
Xia, Weiwen	9827d677b4	[Quant][PT2E][X86] annotate and convert for linear_dynamic_fp16 (#141480 ) Annotate linear node for `linear_dynamic_fp16` with `X86InductorQuantizer` After `convert_pt2e`, the pattern will be ``` x \| linear <- to_fp32 <- to_fp16 <- w ``` Test plan ``` pytest test/quantization/pt2e/test_x86inductor_quantizer.py -k test_linear_dynamic_fp16 ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/141480 Approved by: https://github.com/jgong5, https://github.com/jerryzh168	2024-11-29 07:48:39 +00:00
Shen Xu	19a4d68224	Add missing mappings to support torch.uint16 in quantization and export (#136547 ) Test Plan: CI. Differential Revision: D63142844 Pull Request resolved: https://github.com/pytorch/pytorch/pull/136547 Approved by: https://github.com/angelayi	2024-10-01 00:01:01 +00:00
Kimish Patel	e5a57932f0	[Pytorch][AO] Update choose_qparams_per_token op to output correct shape for scales and zp (#136807 ) - also makes scales and zp dtype reconcile with meta impl as well as other quantized ops representation of scales and zero point - make sure quantize_per_token's output_dtype is respected There are a few places where we need to reconcile on scale and zero point dtype but that will come later. These fixes are mainly being done to enable quantized kv cache through the ET stack Differential Revision: [D62301840](https://our.internmc.facebook.com/intern/diff/D62301840/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/136807 Approved by: https://github.com/jerryzh168	2024-09-27 18:46:17 +00:00
Scott Wolchok	e2b94923ba	[PyTorch] Speed up decomposed quantize_per_channel (#133029 ) Similar to D60871396 (#132828). Differential Revision: [D60978385](https://our.internmc.facebook.com/intern/diff/D60978385/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/133029 Approved by: https://github.com/cccclai	2024-08-08 23:48:34 +00:00
Scott Wolchok	eeb6ad0744	[quant] Speed up dequantize_per_channel (#132828 ) Tensor-wise operations are much faster than looping over tensor elements. Rewrite loop in dequantize_per_channel to use whole-Tensor operations accordingly. Differential Revision: [D60871396](https://our.internmc.facebook.com/intern/diff/D60871396/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/132828 Approved by: https://github.com/cccclai	2024-08-08 16:44:41 +00:00
PyTorch MergeBot	a3ba405871	Revert "[BE] typing for decorators - library (#131570 )" This reverts commit `5731b486c8`. Reverted https://github.com/pytorch/pytorch/pull/131570 on behalf of https://github.com/clee2000 due to same as https://github.com/pytorch/pytorch/pull/131572#issuecomment-2254328359 but I clicked the wrong link by accident. This is where it actually starts ([comment](https://github.com/pytorch/pytorch/pull/131568#issuecomment-2254330781))	2024-07-28 03:43:39 +00:00
Aaron Orenstein	5731b486c8	[BE] typing for decorators - library (#131570 ) See #131429 Pull Request resolved: https://github.com/pytorch/pytorch/pull/131570 Approved by: https://github.com/oulgen, https://github.com/zou3519 ghstack dependencies: #131568, #131569	2024-07-25 22:24:19 +00:00
Xuehai Pan	2ce734cee9	[BE] enable UFMT for `torch/ao/quantization/` (#128863 ) Part of #123062 - #123062 Pull Request resolved: https://github.com/pytorch/pytorch/pull/128863 Approved by: https://github.com/ezyang ghstack dependencies: #128861, #128862	2024-07-25 04:17:54 +00:00
Aaron Orenstein	5a0068cc69	[BE] mypy: disallow untyped decorators (#131428 ) Untyped decorators strip the types from their decorated function so even if the underlying function is fully typed then callers to it don't get any benefit from type annotations. Step 1 - Enable the error and override in all the offending files. #131429 Pull Request resolved: https://github.com/pytorch/pytorch/pull/131428 Approved by: https://github.com/justinchuby, https://github.com/oulgen	2024-07-23 21:50:55 +00:00
Aaron Orenstein	62bcdc0ac9	Flip default value for mypy disallow_untyped_defs [4/11] (#127841 ) See #127836 for details. Pull Request resolved: https://github.com/pytorch/pytorch/pull/127841 Approved by: https://github.com/oulgen	2024-06-08 18:36:48 +00:00
andrewor14	3cba50e478	[quant] Make per_group and per_token quant match torch.fake_quantize (#125781 ) Summary: Follow-up to https://github.com/pytorch/ao/pull/229. This resolves the difference between `input.div(scales)` and `input.mul(1.0 / scales)`, which results in small numerical discrepancies on some inputs. Test Plan: python test/test_quantization.py TestQuantizedTensor.test_decomposed_quantize_per_channel_group python test/test_quantization.py TestQuantizedTensor.test_decomposed_quantize_per_token Reviewers: jerryzh168 Subscribers: jerryzh168, supriyar Pull Request resolved: https://github.com/pytorch/pytorch/pull/125781 Approved by: https://github.com/jerryzh168	2024-05-14 18:18:54 +00:00
Amadeusz Skrzypczak	107f944f22	Support fp8 quantization (#123161 ) This commit enables float8_e5m2 and float8_e4m3fn dtypes in fx quantization and PT2E. Motivation for using fp8 quantization instead of int8: - it works better to run inference with the same datatype the model was trained with, - fp8 can handle outliers better, which is one of the problems in LLMs activations. The numerical recipe we want to use it for is fp8 inference: - bgemms/gemms running in float8_e4m3fn, - Per-Tensor-Quantization/Scaling, - amax observer for measurement with input_backoff and weight_backoff. Pull Request resolved: https://github.com/pytorch/pytorch/pull/123161 Approved by: https://github.com/jgong5, https://github.com/jerryzh168	2024-04-23 13:35:27 +00:00
Aaron Gokaslan	c5fafe9f48	[BE]: TRY002 - Ban raising vanilla exceptions (#124570 ) Adds a ruff lint rule to ban raising raw exceptions. Most of these should at the very least be runtime exceptions, value errors, type errors or some other errors. There are hundreds of instances of these bad exception types already in the codebase, so I have noqa'd most of them. Hopefully this error code will get committers to rethink what exception type they should raise when they submit a PR. I also encourage people to gradually go and fix all the existing noqas that have been added so they can be removed over time and our exception typing can be improved. Pull Request resolved: https://github.com/pytorch/pytorch/pull/124570 Approved by: https://github.com/ezyang	2024-04-21 22:26:40 +00:00
andrewor14	3eea300680	[quant] Do not decompose choose_qparams_per_token_asymmetric (#124178 ) Summary: https://github.com/pytorch/pytorch/pull/123452 added backward support to this op by turning it into CompositeImplicitAutograd, which meant it gets decomposed during export/compile. However, this is not desirable behavior for the PTQ case when we try to lower the model. This commit enables QAT without breaking PTQ by refactoring the impl into a separate op that does have backward support. Test Plan: python test/test_quantization.py -k test_decomposed_choose_qparams_per_token_asymmetric_backward Reviewers: jerryzh168, digantdesai, zou3519 Subscribers: jerryzh168, digantdesai, zou3519, supriyar Differential Revision: [D56192116](https://our.internmc.facebook.com/intern/diff/D56192116) Pull Request resolved: https://github.com/pytorch/pytorch/pull/124178 Approved by: https://github.com/digantdesai	2024-04-16 22:58:48 +00:00
andrewor14	762e19606e	[quant] Enable backward for choose_qparams_per_token_asymmetric (#123452 ) Summary: When running the backward for this op, we get the error: ``` RuntimeError: derivative for aten::aminmax is not implemented ``` This commit replaces this call with separate amin and amax calls instead, which do have implemented derivatives. Test Plan: python test/test_quantization.py -k test_decomposed_choose_qparams_per_token_asymmetric_backward Reviewers: jerryzh168, digantdesai Subscribers: jerryzh168, digantdesai, supriyar Differential Revision: [D55805170](https://our.internmc.facebook.com/intern/diff/D55805170) Pull Request resolved: https://github.com/pytorch/pytorch/pull/123452 Approved by: https://github.com/digantdesai, https://github.com/jerryzh168, https://github.com/zou3519	2024-04-12 20:05:56 +00:00
PyTorch MergeBot	f0eb162730	Revert "Switch quantized_decomposed over to new custom ops API (#123454 )" This reverts commit `638729c0cd`. Reverted https://github.com/pytorch/pytorch/pull/123454 on behalf of https://github.com/DanilBaibak due to Break internal build ([comment](https://github.com/pytorch/pytorch/pull/123454#issuecomment-2051738976))	2024-04-12 13:14:59 +00:00
PyTorch MergeBot	fe092da874	Revert "[quant] Enable backward for choose_qparams_per_token_asymmetric (#123452 )" This reverts commit `c83900887f`. Reverted https://github.com/pytorch/pytorch/pull/123452 on behalf of https://github.com/clee2000 due to broke test_quantization.py::TestQuantizedTensor::test_decomposed_choose_qparams_per_token_asymmetric_backward on multiple jobs `c83900887f` https://github.com/pytorch/pytorch/actions/runs/8648781225/job/23714753103, probably a landrace ([comment](https://github.com/pytorch/pytorch/pull/123452#issuecomment-2050056601))	2024-04-11 16:19:28 +00:00
andrewor14	c83900887f	[quant] Enable backward for choose_qparams_per_token_asymmetric (#123452 ) Summary: When running the backward for this op, we get the error: ``` RuntimeError: derivative for aten::aminmax is not implemented ``` This commit replaces this call with separate amin and amax calls instead, which do have implemented derivatives. Test Plan: python test/test_quantization.py -k test_decomposed_choose_qparams_per_token_asymmetric_backward Reviewers: jerryzh168, digantdesai Subscribers: jerryzh168, digantdesai, supriyar Differential Revision: [D55805170](https://our.internmc.facebook.com/intern/diff/D55805170) Pull Request resolved: https://github.com/pytorch/pytorch/pull/123452 Approved by: https://github.com/digantdesai, https://github.com/jerryzh168	2024-04-11 14:51:42 +00:00
rzou	638729c0cd	Switch quantized_decomposed over to new custom ops API (#123454 ) We are taking API feedback. Changes: - I removed some of the default values (they weren't being used). - I was unable to convert the last op (which is essentially an autograd.Function registered as CompositeImplicitAutograd). That one is "incorrectly registered"; I punt fixing it to the future. Test Plan: - existing tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/123454 Approved by: https://github.com/andrewor14 ghstack dependencies: #123453, #123578	2024-04-11 13:18:06 +00:00
andrewor14	fe29a8fbea	[quant][be] Simplify fake_quant_per_channel (#123186 ) Summary: We probably don't need `torch._C._AutoDispatchBelowAutograd()`, which is to prevent infinite recursion if the implementation calls itself. Let's remove it and see if anything breaks. The other major change is registering the op to the more general Autograd dispatch key so it can be used on cuda as well. Test Plan: python test/inductor/test_cpu_repro.py -k test_decomposed_fake_quant_per_channel Reviewers: zou3519, bdhirsh Subscribers: zou3519, bdhirsh, jerryzh168, leslie-fang-intel Pull Request resolved: https://github.com/pytorch/pytorch/pull/123186 Approved by: https://github.com/zou3519, https://github.com/leslie-fang-intel	2024-04-03 18:06:45 +00:00
Guang Yang	c677221798	remove torchao dependency (#122524 ) Test Plan: CI ``` buck2 run mode/dev-nosan mode/inplace executorch/examples/models/llama2:export_llama -- -c ~/llama/ultra_new_checkpoint.pt -p ~/llama/params.json -kv -E 8,8 -d fp32 --pt2e_quantize "xnnpack_dynamic" -2 ``` ``` buck run //executorch/backends/xnnpack/test:test_xnnpack_ops -- executorch.backends.xnnpack.test.ops.linear.TestLinear.test_qd8_fp32_per_token_weight_per_channel_group_int4 ``` Differential Revision: D55263008 Pull Request resolved: https://github.com/pytorch/pytorch/pull/122524 Approved by: https://github.com/jerryzh168	2024-03-23 03:18:43 +00:00
Manuel Candales	c53e3f57b5	allow fp16 in quant/dequant decompositions (#121738 ) Test Plan: ``` buck2 run mode/dev-nosan mode/inplace executorch/examples/models/llama2:export_llama -- -c ~/llama/ultra_new_checkpoint.pt -p ~/llama/params.json -kv -E 8,8 -d fp16 --pt2e_quantize "xnnpack_dynamic" -2 ``` Reviewed By: kirklandsign Differential Revision: D54785950 Pull Request resolved: https://github.com/pytorch/pytorch/pull/121738 Approved by: https://github.com/jerryzh168	2024-03-13 21:45:08 +00:00
Manuel Candales	6d8a7d6e58	[pytorch] optional zero points on dequantize per channel (#121724 ) Summary: X-link: https://github.com/pytorch/executorch/pull/2364 bypass-github-export-checks Test Plan: sandcastle Reviewed By: mikekgfb Differential Revision: D54709217 Pull Request resolved: https://github.com/pytorch/pytorch/pull/121724 Approved by: https://github.com/mikekgfb	2024-03-12 19:54:11 +00:00
kausik	edf22f3a48	Modify signature of dequantize ops for decomposed quantized Tensor (#119173 ) (#121450 ) Summary: X-link: https://github.com/pytorch/executorch/pull/2308 Note: The initial purpose of this PR is to draw suggestion and feedback regarding better alternative, if any. At present, dequantize op for decomposed quantized Tensor representation e.g. dequantize_per_tensor() assumes the output dtype as torch.float and hence, it does not have the output dtype in its operator argument list. However, this op signature becomes unusable when the assumption breaks. Because, in case the output dtype is different from torch.float, there is no way to specify the same during dequantization. This change is aimed at generalizing the signature of dequantize op like dequantize_per_tensor() for wider use-cases where the output dtype can be different from torch.float and needs to be passed during dequantization. The proposal is to use an additional argument named 'output_dtype' to solve the problem. However, we would also like to have suggestion and feedback regarding any better alternative that can be used instead. cc jerryzh168 jianyuh raghuramank100 jamesr66a vkuzo jgong5 Xia-Weiwen leslie-fang-intel Reviewed By: digantdesai Differential Revision: D53590486 Pulled By: manuelcandales Co-authored-by: kausik <kmaiti@habana.ai> Pull Request resolved: https://github.com/pytorch/pytorch/pull/121450 Approved by: https://github.com/jerryzh168	2024-03-12 12:36:31 +00:00
leslie-fang-intel	975d428425	[Quant] Add the operator of decomposed fake quant per channel (#121297 ) Summary Add the operator of `quantized_decomposed.fake_quant_per_channel` and test the forward and backward of this op with comparing to ATen. Test Plan ``` python -u -m pytest -s -v test_cpu_repro.py -k test_decomposed_fake_quant_per_channel ``` Next Step Optimize the performance: from the generated code of forward and backward graph, the code didn't vectorize. Pull Request resolved: https://github.com/pytorch/pytorch/pull/121297 Approved by: https://github.com/jerryzh168, https://github.com/jgong5	2024-03-08 10:51:37 +00:00
leslie-fang-intel	84de851539	[Inductor] Enable the decomposition of quant/dequant per channel (#119177 ) Summary Part 2 of fixing https://github.com/pytorch/pytorch/issues/119141 which needs vectorized code generation of per channel quant and int8 data type. Enable decomposition of quant/dequant per channel to make it vectorized code generation. TestPlan ``` python -u -m pytest -s -v test_cpu_repro.py -k test_per_channel_fake_quant_uint8 python -u -m pytest -s -v test_cpu_repro.py -k test_per_channel_fake_quant_int8 python -u -m pytest -s -v test_cpu_repro.py -k test_per_channel_fake_quant_uint8_bf16_input python -u -m pytest -s -v test_cpu_repro.py -k test_per_channel_fake_quant_int8_bf16_input ``` Co-authored-by: Jiong Gong <jiong.gong@intel.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/119177 Approved by: https://github.com/peterbell10, https://github.com/jansel	2024-02-19 01:30:44 +00:00
leslie-fang-intel	6ba2748690	[Quant] [PT2] Enable Decomposed quant per tensor/channel to accept bfloat16 input (#112225 ) Summary - PR 4 for enabling Int8-Mixed-BF16 PT2E PTQ Quantization with Inductor https://github.com/pytorch/pytorch/issues/111640. - Enable `decomposed quant_per_tensor` and `quant_per_channel` accepts bfloat16 input. TestPlan ``` python -m pytest test_quantized_tensor.py -k test_decomposed_quantize_per_tensor_bfloat16_input python -m pytest test_quantized_tensor.py -k test_decomposed_quantize_per_channel_bfloat16_input ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/112225 Approved by: https://github.com/jgong5, https://github.com/jerryzh168	2023-11-03 23:47:43 +00:00
Jerry Zhang	32a16d4999	[quant][pt2e] Support int16 quantization (#108453 ) Summary: Previously we could only use native pytorch int dtypes that have corresponding quantized dtypes (e.g. quint8, qint8). This PR removes this assumption in observers/fake_quants so that users can use all pytorch native dtypes (except for int64, we can add it later if needed). The main addition here is int16. Test Plan: python test/test_quantization.py TestQuantizePT2E Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/108453 Approved by: https://github.com/kimishpatel	2023-09-06 19:31:20 +00:00
Jerry Zhang	ecca9591d5	[quant][pt2e] Add reference representation for quantize/dequantize operators (#104395 ) Summary: Similar to quantized add, in this PR we added the reference representation for quantize/dequantize operators Test Plan: buck2 test caffe2/test:quantization_pt2e -- --exact 'caffe2/test:quantization_pt2e - test_representation_quantize (quantization.pt2e.test_quantize_pt2e.TestQuantizePT2E)' buck2 test caffe2/test:quantization_pt2e -- --exact 'caffe2/test:quantization_pt2e - test_representation_dequantize (quantization.pt2e.test_quantize_pt2e.TestQuantizePT2E)' Reviewed By: kimishpatel Differential Revision: D46959928 Pull Request resolved: https://github.com/pytorch/pytorch/pull/104395 Approved by: https://github.com/andrewor14	2023-06-30 04:32:18 +00:00
Jerry Zhang	c98896b76f	[quant][pt2e] Add more precise representation for quantized add (#104130 ) Summary: The planned e2e for quantization in pytorch 2.0 export is the following: float_model -> prepare_pt2e -> calibration -> convert_pt2e -> ... inside convert_pt2e, we will first produce a q/dq representation of the quantized model, similar to the previous output of convert_to_reference_fx in fx graph mode quantization: ``` torch.ops.quantized_decomposed.dequantize_per_tensor -> torch.ops.aten.add -> torch.ops.quantized_decomposed.quantize_per_tensor torch.ops.quantized_decomposed.dequantize_per_tensor / ``` Then we'll rewrite the above to a more precise representation that expresses the intention in a more precise manner, since here we actually want to do int8 addition, instead of simulating the int8 addition with fp32 operations, the representation for quantized add is: ``` def quantized_add(x_i8, x_scale, x_zero_point, y_i8, y_scale, y_zero_point, out_scale, out_zero_point): x = (x_scale / out_scale) * x_i8 y = (y_scale / out_scale) * y_i8 out = x + y out -= (x_zero_point * x_scale - y_zero_point * y_scale) / out_scale out += out_zero_point return out ``` Test Plan: ``` buck2 test caffe2/test:quantization_pt2e -- --exact 'caffe2/test:quantization_pt2e - test_representation_add (quantization.pt2e.test_quantize_pt2e.TestQuantizePT2E)' ``` Reviewed By: kimishpatel Differential Revision: D45628032 Pull Request resolved: https://github.com/pytorch/pytorch/pull/104130 Approved by: https://github.com/kimishpatel	2023-06-27 20:11:30 +00:00
Jerry Zhang	ce8d31551b	[quant][be] Change return type for zero_point to be int32 Tensor (#102234 ) Summary: This is probably a typo Test Plan: CI Reviewed By: salilsdesai Differential Revision: D46172706 Pull Request resolved: https://github.com/pytorch/pytorch/pull/102234 Approved by: https://github.com/salilsdesai	2023-06-01 18:30:44 +00:00
Kazuaki Ishizaki	a13a63ae9a	Fix typos under torch/ao directory (#97679 ) This PR fixes typos in comments and messages of `.py` files under `torch/ao` directory Pull Request resolved: https://github.com/pytorch/pytorch/pull/97679 Approved by: https://github.com/janeyx99, https://github.com/kit1980	2023-04-10 22:25:15 +00:00
Jacob Szwejbka	fc324d3485	[quant][pt2e] Add support for dynamic quantization with symmetric quant for input (#94854 ) Summary: Previously we assumed asymmetric quantization for dynamic quantization, this diff adds the support of symmetric quantization for the input in dynamic quantization Test Plan: buck run executorch/exir/tests:quant_lowering_custom_backend_pass -- "executorch.exir.tests.test_quant_lowering_custom_backend_pass.TestQuantLoweringCustomBackendPass.test_quantized_linear_dynamic" Reviewed By: digantdesai Differential Revision: D43134794 Pull Request resolved: https://github.com/pytorch/pytorch/pull/94854 Approved by: https://github.com/digantdesai	2023-02-28 19:39:31 +00:00
PyTorch MergeBot	641dc0b844	Revert "[quant] Add quantize and dequantize operators to decomposition table (#93312 )" This reverts commit `782e4f5c02`. Reverted https://github.com/pytorch/pytorch/pull/93312 on behalf of https://github.com/jeanschmidt due to this commits breaks internal builds: https://fburl.com/sandcastle/dw0rqcbv	2023-02-13 09:20:37 +00:00
Jacob Szwejbka	2628901033	[Executorch][Quant] Add Choose_qparams_symmetric (#94685 ) Summary: needed for symmetric dynamic quant flow Test Plan: todo Reviewed By: jerryzh168 Differential Revision: D43134117 Pull Request resolved: https://github.com/pytorch/pytorch/pull/94685 Approved by: https://github.com/larryliu0820	2023-02-13 07:27:48 +00:00
Jerry Zhang	782e4f5c02	[quant] Add quantize and dequantize operators to decomposition table (#93312 ) Summary: This PR tries to decompose the operators in torch.ops.quantized_decomposed namespace to more primitive aten operators, this would free us from maintaining the semantics of the quantize/dequantize operators, which can be expressed more precisely in terms of underlying aten operators Note: this PR just adds them to the decomposition table, we haven't enabled this by default yet Test Plan: python test/test_quantization.py TestQuantizePT2E.test_q_dq_decomposition Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/93312 Approved by: https://github.com/vkuzo, https://github.com/SherlockNoMad	2023-02-10 01:40:12 +00:00
Jacob Szwejbka	bb48d90b00	[Executorch][Quant][BE] Refactor Choose_Qparams (#94338 ) Summary: Refactor so that it can be decomposed Test Plan: ci Differential Revision: D42681268 Pull Request resolved: https://github.com/pytorch/pytorch/pull/94338 Approved by: https://github.com/jerryzh168	2023-02-09 01:20:17 +00:00
PyTorch MergeBot	3a5a762443	Revert "[quant] Add quantize and dequantize operators to decomposition table (#93312 )" This reverts commit `3fd46a2f9c`. Reverted https://github.com/pytorch/pytorch/pull/93312 on behalf of https://github.com/huydhn due to Sorry for reverting your PR, but it breaks trunk due to a landrace `3fd46a2f9c`. Please rebase and re-land it	2023-02-08 18:29:10 +00:00
Jerry Zhang	3fd46a2f9c	[quant] Add quantize and dequantize operators to decomposition table (#93312 ) Summary: This PR tries to decompose the operators in torch.ops.quantized_decomposed namespace to more primitive aten operators, this would free us from maintaining the semantics of the quantize/dequantize operators, which can be expressed more precisely in terms of underlying aten operators Note: this PR just adds them to the decomposition table, we haven't enabled this by default yet Test Plan: python test/test_quantization.py TestQuantizePT2E.test_q_dq_decomposition Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/93312 Approved by: https://github.com/vkuzo, https://github.com/SherlockNoMad	2023-02-08 17:26:01 +00:00
Nikita Shulga	c0dd9b3b67	Revert "[Executorch][Quantization][BE] Refactor Choose Qparams (#92592 )" This reverts commit `59071ab1e7`. It breaks `quantization.jit.test_ondevice_quantization.TestOnDeviceDynamicPTQFinalize`, which is not run in OSS, but is mandatory for internal CI.	2023-01-23 09:13:02 -08:00
Jacob Szwejbka	59071ab1e7	[Executorch][Quantization][BE] Refactor Choose Qparams (#92592 ) Summary: Should hopefully be a little faster. Definitely cleaner to not create an observer inside the op Test Plan: ci Differential Revision: D42154677 Pull Request resolved: https://github.com/pytorch/pytorch/pull/92592 Approved by: https://github.com/jerryzh168	2023-01-20 01:36:47 +00:00
Jerry Zhang	2a23dfe8ed	[quant] Support lowering for quantized embedding byte operator (#91159 ) Summary: This PR adds lowering for embedding in quantization in executorch flow Test Plan: buck run executorch/exir/tests:quant_fusion_pass -- "executorch.exir.tests.test_quant_fusion_pass.TestQuantFusionPass.test_embedding_byte" Reviewed By: qihqi Differential Revision: D41673139 Pull Request resolved: https://github.com/pytorch/pytorch/pull/91159 Approved by: https://github.com/vkuzo	2022-12-21 22:52:24 +00:00
Jacob Szwejbka	bd94ee66ea	[quantized] [executorch] typo (#89960 ) Summary: Inefficient impl in python Test Plan: buck2 test mode/dev //caffe2/test/quantization:test_quantization -- --exact 'caffe2/test/quantization:test_quantization - test_quantized_embedding_byte (caffe2.test.quantization.core.test_quantized_tensor.TestQuantizedTensor)' Differential Revision: D41627744 Pull Request resolved: https://github.com/pytorch/pytorch/pull/89960 Approved by: https://github.com/jerryzh168	2022-12-16 19:49:09 +00:00
Jerry Zhang	94b9bb324f	[quant] Add example for lowering quantized dynamic linear pattern through delegation (#90640 ) Summary: Only the pattern part, will leave the delegation example to Chen Test Plan: buck run executorch/exir/tests:quant_lowering_custom_backend_pass -- "executorch.exir.tests.test_quant_lowering_custom_backend_pass.TestQuantLoweringCustomBackendPass.test_quantized_linear_dynamic" Reviewed By: cccclai Pull Request resolved: https://github.com/pytorch/pytorch/pull/90640 Approved by: https://github.com/cccclai	2022-12-13 00:57:33 +00:00
Edward Z. Yang	a747326423	Add manual meta implementations to quantize_per_tensor.tensor and co (#89958 ) When you are writing a meta function, you cannot call item() on the tensor because there is no real data on the tensor and it will fail. The error message was not very good in this case, see also https://github.com/pytorch/pytorch/issues/89959 This PR takes a brute force approach to resolving the problem: just manually define meta implementations for the naughty functions that are calling item(). However, this results in a lot of code duplication. The easiest way to avoid this situation is to rewrite the decomps so they don't call item. It should not be that difficult to use direct tensors on your operations, as scalar tensors can broadcast too. I could only test this with `buck test @mode/opt -c python.package_style=inplace //executorch/backends/test:test_backends` in internal with D41555454. Test coverage needs to be improved, otherwise don't blame us when we break you. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/89958 Approved by: https://github.com/jerryzh168	2022-12-01 06:04:37 +00:00
Jerry Zhang	9e4a25c731	[quant][decomposed] Add support for int32 for decomposed q/dq ops (#89881 ) Summary: att Test Plan: python test/test_quantization.py -k test_decomposed_quantize_per_tensor python test/test_quantization.py -k test_decomposed_dequantize_per_tensor Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/89881 Approved by: https://github.com/cccclai	2022-11-30 21:24:00 +00:00
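
Commit c98896b76f (#104130) describes rewriting the dequantize -> add -> quantize pattern into a representation that works on the int8 values directly. The following is a minimal pure-Python sketch of that idea, not the code that landed: the helper names (`dequantize`, `quantize`, `add_via_q_dq`, `add_fused`), the scalar float arithmetic, and the int8 clamp range are all assumptions made for illustration. The zero-point correction is obtained by expanding `(x_i8 - x_zp) * x_scale + (y_i8 - y_zp) * y_scale`, which yields a sum of the two zero-point terms.

```python
def dequantize(q: int, scale: float, zero_point: int) -> float:
    """Affine dequantization: real = (q - zero_point) * scale."""
    return (q - zero_point) * scale


def quantize(x: float, scale: float, zero_point: int,
             qmin: int = -128, qmax: int = 127) -> int:
    """Affine quantization with round-to-nearest and clamping to [qmin, qmax]."""
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))


def add_via_q_dq(x_i8, x_scale, x_zp, y_i8, y_scale, y_zp, out_scale, out_zp):
    """Simulated path: dequantize both inputs, add in float, quantize the sum."""
    total = dequantize(x_i8, x_scale, x_zp) + dequantize(y_i8, y_scale, y_zp)
    return quantize(total, out_scale, out_zp)


def add_fused(x_i8, x_scale, x_zp, y_i8, y_scale, y_zp, out_scale, out_zp,
              qmin: int = -128, qmax: int = 127) -> int:
    """Same quantity computed directly from the integer codes."""
    out = (x_scale / out_scale) * x_i8 + (y_scale / out_scale) * y_i8
    # Zero-point correction from expanding the q/dq expression (hypothetical
    # derivation for this sketch, not taken from the PyTorch source).
    out -= (x_zp * x_scale + y_zp * y_scale) / out_scale
    q = round(out) + out_zp
    return max(qmin, min(qmax, q))
```

With power-of-two scales every intermediate value is exact, so both paths agree: for `x_i8=10, x_scale=0.5, x_zp=2, y_i8=20, y_scale=0.25, y_zp=4, out_scale=0.5, out_zp=1` each function returns 17. With arbitrary scales the two paths can differ by a rounding step, which is one reason a reference representation needs to pin down the exact arithmetic.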

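
Commit 3cba50e478 (#125781) attributes small numerical discrepancies to the difference between `input.div(scales)` and `input.mul(1.0 / scales)`. The effect is a general property of floating-point arithmetic and can be reproduced with plain Python floats. The sketch below is illustrative only; the function names are hypothetical and it does not use the PyTorch ops themselves.

```python
def scaled_by_division(x: float, scale: float) -> float:
    """Divide directly: a single correctly rounded operation."""
    return x / scale


def scaled_by_reciprocal(x: float, scale: float) -> float:
    """Multiply by the reciprocal: two roundings (1.0 / scale, then the product)."""
    return x * (1.0 / scale)


def mismatches(values, scale):
    """Return the inputs for which the two formulations give different floats."""
    return [v for v in values
            if scaled_by_division(v, scale) != scaled_by_reciprocal(v, scale)]


print(mismatches([1.0, 2.0, 3.0], 10.0))  # prints [3.0]
```

Here `3.0 / 10.0` is `0.3`, while `3.0 * (1.0 / 10.0)` is `0.30000000000000004`. A difference of this size only matters when the scaled value sits close to a rounding boundary, which is consistent with the commit's note that discrepancies appear on some inputs only.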

56 Commits