pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 00:21:07 +01:00

Author	SHA1	Message	Date
Maggie Moss	154e4d36e9	Fix pyrefly ignore syntax in distributions and ao (#166248 ) Ensures existing pyrefly ignores only ignore the intended error code pyrefly check lintrunner Pull Request resolved: https://github.com/pytorch/pytorch/pull/166248 Approved by: https://github.com/oulgen	2025-10-26 22:13:48 +00:00
Yuanyuan Chen	a60d9e1f6d	Fix flake8 B028 warnings (#166224 ) This PR fixes flake8 B028 warning by specifying stacklevel=2 in `warnings.warn`. The advantage is that users can know more contextual information about PyTorch warnings. Pull Request resolved: https://github.com/pytorch/pytorch/pull/166224 Approved by: https://github.com/ezyang	2025-10-26 06:18:55 +00:00
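The effect of `stacklevel=2` mentioned in the commit above can be illustrated with a minimal standard-library sketch. The names `legacy_api` and `user_code` are hypothetical and are not PyTorch functions; the sketch only shows that the recorded warning is attributed to the caller's line.

```python
import warnings


def legacy_api():
    # stacklevel=2 attributes the warning to whoever called legacy_api,
    # rather than to this line inside the library function.
    warnings.warn("legacy_api is deprecated", FutureWarning, stacklevel=2)


def user_code():
    legacy_api()  # the recorded warning points at this line


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    user_code()

record = caught[0]
print(record.category.__name__, record.lineno)
```

With the default `stacklevel=1`, the reported line would instead be the `warnings.warn` call itself, which tells the user little about their own code.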
PyTorch MergeBot	8daef35cf1	Revert "[Code Clean] Clean asserts in torch/ao/quantization (root, quantizer, backend_config) (#165433 )" This reverts commit `df64c0c464`. Reverted https://github.com/pytorch/pytorch/pull/165433 on behalf of https://github.com/clee2000 due to I think this broke some quantization tests ([comment](https://github.com/pytorch/pytorch/pull/165433#issuecomment-3429741770))	2025-10-21 22:10:19 +00:00
zhudada	df64c0c464	[Code Clean] Clean asserts in torch/ao/quantization (root, quantizer, backend_config) (#165433 ) Replace assert statements with explicit if/raise patterns in: - torch/ao/quantization/~ - torch/ao/quantization/quantizer/ - torch/ao/quantization/backend_config/ Partially fixes #164878 Pull Request resolved: https://github.com/pytorch/pytorch/pull/165433 Approved by: https://github.com/albanD	2025-10-20 22:42:51 +00:00
Maggie Moss	b13cd141b3	Add pyrefly suppressions (#164748 ) Adds suppressions so that pyrefly will typecheck clean: https://github.com/pytorch/pytorch/issues/163283 Test plan: dmypy restart && python3 scripts/lintrunner.py -a pyrefly check step 1: delete lines in the pyrefly.toml file from the `project-excludes` field step 2: run pyrefly check step 3: add suppressions, clean up unused suppressions before: https://gist.github.com/maggiemoss/4b3bf2037014e116bc00706a16aef199 after: 0 errors (4,263 ignored) Pull Request resolved: https://github.com/pytorch/pytorch/pull/164748 Approved by: https://github.com/oulgen	2025-10-07 17:31:18 +00:00
albanD	25f4d7e482	Use new type statement to fix public API of types (#158487 ) Since type statement breaks older python version, trying to find equivalent behavior without the type mechanics. Pull Request resolved: https://github.com/pytorch/pytorch/pull/158487 Approved by: https://github.com/andrewor14	2025-07-17 18:46:44 +00:00
albanD	058fb1790f	Fix compilation and "import torch" issues for cpython 3.14 (#158184 ) Beginning of process for 3.14 bringup. State of things from this PR: - Nothing too scary looking from the Dynamo CPython side, nothing we heavily rely on seems to be missing @williamwen42 - The existing check that makes torch.compile() nicely fail is working as expected. So all these empty functions shouldn't cause any weirdness. - The `__module__` update changes look suspicious, we should investigate what is the reason and impact of that, in particular for our public API checking @jbschlosser - Leaving the weakref.py thread safety change as a follow up to keep this a bit simpler. I vendored the whole struct in the meantime FYI @ezyang EDIT: The `__module__` change is even more cursed than I thought due to changes to Union and Optional type where the `__module__` field cannot be changed anymore. See https://github.com/python/cpython/issues/132139 for details. For now, I'm just skipping the `__module__` setting for 3.14 which will trip the public API checks. Will revisit once I have a final answer on the cpython issue. Pull Request resolved: https://github.com/pytorch/pytorch/pull/158184 Approved by: https://github.com/msaroufim	2025-07-15 05:06:55 +00:00
Xuehai Pan	279cae52e7	[BE][PYFMT] migrate PYFMT for `torch/ao/` to `ruff format` (#148185 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/148185 Approved by: https://github.com/ezyang	2025-06-14 16:47:04 +00:00
Aaron Gokaslan	7f65a20884	[BE]: Enable ruff SLOT checks (#146276 ) This enables a check that a class which only inherits from immutable classes like str, tuple, and NamedTuple also defines `__slots__`, so that instances don't allocate memory unnecessarily. This also ensures contributors think about how they define classes that subclass NamedTuple and str, of which we have many in our codebase. Pull Request resolved: https://github.com/pytorch/pytorch/pull/146276 Approved by: https://github.com/aorenste	2025-02-04 19:18:23 +00:00
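What the SLOT check guards against can be sketched as follows. The class names `PlainLabel` and `SlottedLabel` are hypothetical and only illustrate the difference; they are not taken from the PyTorch codebase.

```python
class PlainLabel(str):
    """str subclass without __slots__: every instance carries a __dict__."""


class SlottedLabel(str):
    """Same subclass with empty __slots__: no per-instance __dict__ is allocated."""

    __slots__ = ()


plain = PlainLabel("fbgemm")
slotted = SlottedLabel("fbgemm")

# Both behave like ordinary strings; only the instance layout differs.
print(hasattr(plain, "__dict__"), hasattr(slotted, "__dict__"))  # True False
```

Declaring `__slots__ = ()` is enough here because the subclass adds no attributes of its own.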
Aaron Orenstein	9e0437a04a	PEP585 update - torch/ao/quantization (#145140 ) See #145101 for details. Pull Request resolved: https://github.com/pytorch/pytorch/pull/145140 Approved by: https://github.com/bobrenjc93	2025-01-19 10:20:00 +00:00
sanchitintel	43dcb4bb61	Revise CPU vectorization ISA support API (#135075 ) Revising (mostly renaming) CPU vectorization ISA support API (non-frontend-user-facing). Also added AVX512_BF16 ISA detection API. Pull Request resolved: https://github.com/pytorch/pytorch/pull/135075 Approved by: https://github.com/leslie-fang-intel, https://github.com/jgong5, https://github.com/ezyang	2024-09-05 12:14:56 +00:00
Xuehai Pan	2ce734cee9	[BE] enable UFMT for `torch/ao/quantization/` (#128863 ) Part of #123062 Pull Request resolved: https://github.com/pytorch/pytorch/pull/128863 Approved by: https://github.com/ezyang ghstack dependencies: #128861, #128862	2024-07-25 04:17:54 +00:00
Aaron Orenstein	62bcdc0ac9	Flip default value for mypy disallow_untyped_defs [4/11] (#127841 ) See #127836 for details. Pull Request resolved: https://github.com/pytorch/pytorch/pull/127841 Approved by: https://github.com/oulgen	2024-06-08 18:36:48 +00:00
Xuehai Pan	67ef2683d9	[BE] wrap deprecated function/class with `typing_extensions.deprecated` (#127689 ) Use `typing_extensions.deprecated` for deprecation annotation if possible. Otherwise, add `category=FutureWarning` to `warnings.warn("message")` if the category is missing. Note that only warnings whose messages contain `[Dd]eprecat(ed|ion)` are updated in this PR. Resolves #126888 This PR is split from PR #126898. Pull Request resolved: https://github.com/pytorch/pytorch/pull/127689 Approved by: https://github.com/Skylion007	2024-06-02 12:30:43 +00:00
PyTorch MergeBot	033e733021	Revert "[BE] wrap deprecated function/class with `typing_extensions.deprecated` (#126898 )" This reverts commit `749a132fb0`. Reverted https://github.com/pytorch/pytorch/pull/126898 on behalf of https://github.com/fbgheith due to switching typing-extensions=4.3.0 to 4.9.0 causes internal failure ([comment](https://github.com/pytorch/pytorch/pull/126898#issuecomment-2142884456))	2024-05-31 19:47:24 +00:00
Xuehai Pan	749a132fb0	[BE] wrap deprecated function/class with `typing_extensions.deprecated` (#126898 ) Use `typing_extensions.deprecated` for deprecation annotation if possible. Otherwise, add `category=FutureWarning` to `warnings.warn("message")` if the category is missing. Note that only warnings whose messages contain `[Dd]eprecat(ed|ion)` are updated in this PR. UPDATE: Use `FutureWarning` instead of `DeprecationWarning`. Resolves #126888 Pull Request resolved: https://github.com/pytorch/pytorch/pull/126898 Approved by: https://github.com/albanD	2024-05-29 12:09:27 +00:00
Andrew Or	7c72238e4b	Back out "Enable pickling model prepared with QAT qconfig" (#110392 ) Summary: D49187352 caused our model conversion and loading of QAT checkpoints to get stuck with a thrift timeout. We are actively checking in the final code and model for the static quant HTP prod model, and encountered this breakage at head on Thursday. A thrift timeout does not surface as a failure, and because of that, it's hard to bisect and find the culprit. It is also hard to set up a unit test, because the job simply times out. A better test is needed to guard downstream model conversion against upstream changes. Our suspicion of why this diff broke us is that we create a lot of modules with qat (in a recursive manner) but our model is not a qat traceable module (it is a graph with many qat modules and floating point modules). With functools.partial as in the original diff, we will be caching modules in memory, causing the memory of the machine to be taken up completely. Pull Request resolved: https://github.com/pytorch/pytorch/pull/110392 Approved by: https://github.com/junesg, https://github.com/jerryzh168	2023-10-05 14:41:00 +00:00
Sindi Shkodrani	419ec3b229	Enable pickling model prepared with QAT qconfig (#109288 ) Summary: Resolving error: AttributeError: Can't pickle local object '_add_module_to_qconfig_obs_ctr.<locals>.get_factory_kwargs_based_on_module_device' by moving nested function out to the main module Test Plan: Added test to CI Reviewed By: andrewor14 Differential Revision: D49187352 Pull Request resolved: https://github.com/pytorch/pytorch/pull/109288 Approved by: https://github.com/andrewor14	2023-09-28 09:51:19 +00:00
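The pickling error quoted in the commit above comes from a general Python limitation: functions are pickled by qualified name, so a function defined inside another function cannot be looked up again. The following is a minimal standard-library sketch, assuming it is run as a script; `make_local_factory` and `module_level_factory` are hypothetical stand-ins, not the helpers in `qconfig.py`.

```python
import functools
import pickle


def make_local_factory():
    # Nested function: its qualified name contains "<locals>", so pickle
    # cannot find it again by name.
    def local_factory(device=None):
        return {"device": device}

    return local_factory


def module_level_factory(device=None):
    # Same logic moved to module scope, as the commit describes.
    return {"device": device}


def can_pickle(obj):
    try:
        pickle.dumps(obj)
        return True
    except (AttributeError, pickle.PicklingError):
        return False


local_picklable = can_pickle(make_local_factory())
# A partial over a module-level function is expected to pickle when the
# module is importable (for example when this file is run as a script).
module_level_picklable = can_pickle(
    functools.partial(module_level_factory, device="cpu")
)
print(local_picklable, module_level_picklable)
```

Note that the follow-up commit backed this change out; the sketch illustrates only the pickling limitation, not the reason for the revert.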
Justin Chu	c0d8a4af0a	[BE] Enable ruff's UP rules and autoformat ao/ (#105430 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/105430 Approved by: https://github.com/albanD, https://github.com/malfet	2023-07-19 13:44:37 +00:00
HDCharles	8176cd8c0f	[ao] fixing quantized prelu workflow (#103455 ) Summary: https://github.com/pytorch/pytorch/issues/100654 noticed prelu was not running its observers when the quantization flow was being run, this was a bug which is now fixed and the relevant prelu tests also now check for this. Also added a corrected observer for PReLU to qconfig_mapping Test Plan: python test/test_quantization.py TestStaticQuantizedModule.test_prelu Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/103455 Approved by: https://github.com/jerryzh168	2023-06-23 16:45:40 +00:00
leslie-fang-intel	9832cfbbfe	Quantization oneDNN backend only support VNNI CPU (#103653 ) Summary - Update the quantization document that default qconfig with oneDNN backend is recommended to be used on CPUs with Vector Neural Network Instruction support. - Add the warning message when user uses default qconfig with oneDNN backend on CPU without Vector Neural Network Instruction support. Pull Request resolved: https://github.com/pytorch/pytorch/pull/103653 Approved by: https://github.com/jgong5, https://github.com/malfet	2023-06-19 09:50:07 +00:00
Jerry Zhang	f7c736e1e7	[quant][pt2e] Add observer_or_fake_quant_ctr to QuantizationSpec (#101920 ) Summary: This is the second refactor to align the annotation API with design, next step is to change prepare_pt2e to consume QuantizationSpec object directly Test Plan: ``` buck2 test mode/optcaffe2/test:quantization_pt2e -- --exact 'caffe2/test:quantization_pt2e - test_resnet18_with_quantizer_api (quantization.pt2e.test_quantize_pt2e.TestQuantizePT2EModels)' ``` Reviewed By: kimishpatel Differential Revision: D45927416 Pull Request resolved: https://github.com/pytorch/pytorch/pull/101920 Approved by: https://github.com/andrewor14	2023-05-23 05:48:23 +00:00
Aaron Gokaslan	1e2d82b8e4	[BE] Merge isinstance calls together (#94419 ) Simplify and speeds up isinstance calls by checking for multiple types at the same time. Pull Request resolved: https://github.com/pytorch/pytorch/pull/94419 Approved by: https://github.com/ezyang	2023-02-09 00:47:26 +00:00
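The rewrite described in the commit above relies on `isinstance` accepting a tuple of types. A small sketch with hypothetical helper names:

```python
def is_number_separate(value):
    # Before: one isinstance call per type, joined with `or`.
    return isinstance(value, int) or isinstance(value, float)


def is_number_merged(value):
    # After: a single call that checks all types at once.
    return isinstance(value, (int, float))


samples = [1, 2.5, "3", None, True, [4]]
print([is_number_merged(s) for s in samples])
```

Both forms return the same result for every input; the merged form is shorter and avoids a second function call.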
Jerry Zhang	59c1b5025f	[quant][fx][pt2e] Refactor prepare so it's aligned better with the new API plan in pt2e (#94011 ) Summary: There are three things that happens in the current prepare code, (1). user express their intention of how they want the model to be quantized with QConfigMapping, we translate that to node.meta["target_dtype_info"] (2). we validate the setting against BackendConfig (3). insert observers based on the validated node.meta["target_dtype_info"] previously (2) and (3) are mixed together, this PR tries to move (2) closer to (1), with one edge case left, this refactor moves us closer to our target design for quantization in pytorch 2.0 export path this is a follow up PR for https://github.com/pytorch/pytorch/pull/92641 Test Plan: python test/test_quantization.py TestQuantizeFx python test/test_quantization.py TestQuantizeFxOps python test/test_quantization.py TestQuantizeFxModels Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/94011 Approved by: https://github.com/vkuzo	2023-02-07 08:23:56 +00:00
XiaobingSuper	4bae860813	quantization: make x86 as default backend (#88799 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88799 Approved by: https://github.com/kit1980	2022-12-01 02:09:54 +00:00
Vasiliy Kuznetsov	22a1b5e243	quantization: deprecate observer compute_dtype and replace with is_dynamic (#85431 ) Summary: This PR deprecates the `compute_dtype` field on observers, and replaces it with the `is_dynamic` field on observers. This is better aligned with the reference model spec. Test plan: ``` python test/test_quantization.py TestQuantizeFx python test/test_quantization.py TestQuantizeFxOps ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/85431 Approved by: https://github.com/jerryzh168	2022-11-24 07:07:34 +00:00
peterjc123	60e59c0755	Fix get_default_qat_qconfig for PT 1.13 (#88876 ) See https://github.com/pytorch/pytorch/pull/84329/files#r1019916766 for more context Pull Request resolved: https://github.com/pytorch/pytorch/pull/88876 Approved by: https://github.com/jgong5, https://github.com/vkuzo	2022-11-15 06:36:24 +00:00
HDCharles	6fe4ccc7cb	[ao] qconfig.py fix public v private (#87515 ) Summary: made is_reuse_input_qconfig, _activation_is_memoryless, _partial_wrapper_equals, _obs_or_fq_ctr_equals, _add_module_to_qconfig_obs_ctr, _assert_valid_qconfig private Test Plan: python test/test_public_bindings.py Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D40709280](https://our.internmc.facebook.com/intern/diff/D40709280) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87515 Approved by: https://github.com/jcaip	2022-11-09 22:30:03 +00:00
Jerry Zhang	4caddac534	[quant][api] Add assert for backend in get_default_qconfig related apis (#86259 ) (#87331 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/86259 Add assertion to make sure backend is one of "fbgemm", "x86", "qnnpack" and "onednn" for get_default_qconfig, get_default_qat_qconfig, get_default_qconfig_mapping and get_default_qat_qconfig_mapping Test Plan: python test/test_quantization.py -k test_get_default_qconfig_mapping Imported from OSS Reviewed By: jcaip Differential Revision: D40236474 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87331 Approved by: https://github.com/andrewor14	2022-10-21 16:57:35 +00:00
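The kind of check this commit describes can be sketched as below. The backend names come from the commit message; the function name `check_backend` and the error text are hypothetical and do not reproduce the actual code in `qconfig.py`.

```python
# Backend names listed in the commit message.
SUPPORTED_BACKENDS = ("fbgemm", "x86", "qnnpack", "onednn")


def check_backend(backend):
    # Hypothetical helper: fail early with a readable message instead of
    # silently returning a qconfig for an unknown backend.
    assert backend in SUPPORTED_BACKENDS, (
        f"backend {backend!r} not supported; "
        f"expected one of {', '.join(SUPPORTED_BACKENDS)}"
    )
    return backend


print(check_backend("x86"))
```

Keep in mind that `assert` statements are skipped when Python runs with `-O`, which is one reason a later commit in this history tried to replace asserts with explicit `if`/`raise`.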
Jerry Zhang	8a47a49d5e	[quant] Move the order of x86 engine to avoid changing the default qengine (#86631 ) Since the default qengine is the last element of the supported_engines list, adding the x86 qengine at the end of the list changes the default quantized engine as well. This PR is a short-term fix to revert that change. We have an issue here to track the proper fix: https://github.com/pytorch/pytorch/issues/86404 Motivation: a Meta-internal team found that inference failed in onednn prepacking with the error "could not create a primitive descriptor for a reorder primitive." on a COPPER_LAKE machine. We are working with Intel to reproduce and fix the problem. In the meantime, we'll revert the default option back to fbgemm. Pull Request resolved: https://github.com/pytorch/pytorch/pull/86631 Approved by: https://github.com/vkuzo	2022-10-11 00:07:41 +00:00
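The ordering problem described in this commit can be shown with a toy model. This is a sketch under the assumption stated in the commit message (the default is the last entry of the list); the list contents and the helper `default_engine` are illustrative, not PyTorch's actual engine registry.

```python
def default_engine(supported_engines):
    # Assumption from the commit message: the last entry is the default.
    return supported_engines[-1]


before = ["none", "qnnpack", "onednn", "fbgemm"]

# Appending the new engine silently changes the default ...
appended = before + ["x86"]

# ... while inserting it before the last entry keeps the old default.
reordered = before[:-1] + ["x86"] + before[-1:]

print(default_engine(before), default_engine(appended), default_engine(reordered))
```

Deriving a default from list position is fragile, which is why the commit calls this a short-term fix and tracks a proper one separately.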
HDCharles	facf210f9a	[ao] fixing public v private for qconfig.py (#86026 ) Summary: no changes, just removed the exception for this file, someone had already fixed the actual file Test Plan: python test/test_public_bindings.py Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/86026 Approved by: https://github.com/jerryzh168	2022-10-06 21:42:44 +00:00
Xia, Weiwen	4b86a9359a	[Quant] Make x86 backend default when querying qconfig (#85461 ) This PR is a follow-up of #84329 [[Quant] Add unified x86 quant backend](https://github.com/pytorch/pytorch/pull/84329) It makes `x86` backend default when querying `qconfig`. Users get x86's qconfig/qconfig_mappings if backend is not specified. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85461 Approved by: https://github.com/jgong5, https://github.com/vkuzo	2022-09-30 23:44:45 +00:00
andrewor14	24fc680ee4	[Quant] Enable XNNPACK ops in QNNPACK BackendConfig (#85863 ) Summary: This commit enforces the following constraints on the QNNPACK BackendConfig: - `quant_min_lower_bound` = -127 for qint8 weight - `quant_max_upper_bound` = 127 for qint8 weight - `scale_min_lower_bound` = 2 ** -12 for qint8 activations and weight These constraints will enable users to use this BackendConfig with faster XNNPACK quantized ops. They are also consistent with the existing settings in `default_symmetric_qnnpack_qconfig` and its per_channel and QAT variants. For more detail on why these exact values were chosen, please see the description of https://github.com/pytorch/pytorch/pull/74396. Note that there are currently no restrictions on the qscheme in DTypeConfig. This should be added in the future to further enforce the restriction that the weights must be quantized with either per_tensor_symmetric or per_channel_symmetric. Existing default QConfigs such as `get_default_qconfig("qnnpack")` and `get_default_qat_qconfig("qnnpack")` will continue to be supported, but only for the existing dtypes, e.g. quint8 activations for weighted ops like linear and conv. In the future, we should revisit whether to enable XNNPACK ops using these QConfigs as well. Test Plan: python test/test_quantization.py TestQuantizeFx.test_qnnpack_backend_config Reviewers: jerryzh168, vkuzo Subscribers: jerryzh168, vkuzo Pull Request resolved: https://github.com/pytorch/pytorch/pull/85863 Approved by: https://github.com/jerryzh168	2022-09-30 22:53:38 +00:00
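The three constraints listed in this commit can be illustrated with a plain-Python sketch of symmetric per-tensor weight quantization. The function `quantize_symmetric` is hypothetical and is not how PyTorch observers are implemented; it only shows how a zero point of 0, the [-127, 127] range, and the 2 ** -12 scale floor fit together.

```python
SCALE_MIN = 2.0 ** -12   # scale_min_lower_bound from the commit message
QMIN, QMAX = -127, 127   # restricted qint8 weight range from the commit message


def quantize_symmetric(weights):
    # Symmetric quantization: zero point is fixed at 0, so only a scale is needed.
    max_abs = max(abs(w) for w in weights)
    scale = max(max_abs / QMAX, SCALE_MIN)  # never go below the scale floor
    quantized = [min(QMAX, max(QMIN, round(w / scale))) for w in weights]
    return quantized, scale


q, scale = quantize_symmetric([-1.0, 0.0, 0.25, 1.0])
tiny_q, tiny_scale = quantize_symmetric([1e-9, -1e-9])
print(q, scale, tiny_scale == SCALE_MIN)
```

For very small weights the floor takes over, so the scale stays at 2 ** -12 and the values quantize to 0.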
Xia, Weiwen	3a3e2002d8	[Quant] Add unified x86 quant backend (#84329 )

## Description

Implement unified quantization backend 'X86' for x86 platforms. It combines the advantages of FBGEMM and ONEDNN. It selects kernels during weight prepacking and hide the details from end users. It will be the default backend in place of FBGEMM. For details, please refer to this RFC: [[RFC] Unified quantization backend for x86 CPU platforms](https://github.com/pytorch/pytorch/issues/83888)

## Validation

Correctness: Covered by UT

Accuracy: By running torchvision models on imagenet, no accuracy difference is found between FBGEMM and the unified X86 backend: [torchvision_accuracy_comparison_fbgemm_vs_x86.xlsx](https://github.com/pytorch/pytorch/files/9598114/torchvision_accuracy_comparison_fbgemm_vs_x86.xlsx)

Performance: Depends on https://github.com/pytorch/pytorch/pull/84470 which improves performance. For early PoC results, please refer to https://github.com/pytorch/pytorch/files/9399202/unified_qengine_poc_performance_bechmark.xlsx

With the two PRs combined, we collected some data on Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz

Method: Run multi-instances with 4 cores per instance on whole socket. Using JeMalloc and Intel OMP.

| Models/throughput | fbgemm | x86 | improvement |
| -- | -- | -- | -- |
| wide_resnet101_2 | 173.5675 | 241.815 | 39.32% |
| resnext101_32x8d | 174.365 | 339.8175 | 94.89% |
| resnet50 | 573.155 | 1174.14 | 104.86% |
| vgg19_bn | 260.335 | 337.92 | 29.80% |
| vgg19 | 257.935 | 333.265 | 29.21% |
| inception_v3 | 601.1175 | 1309.33 | 117.82% |
| densenet161 | 296.645 | 435.5625 | 46.83% |
| mnasnet1_0 | 1216.7 | 4057.515 | 233.49% |
| squeezenet1_0 | 1220.085 | 5153.3875 | 322.38% |
| alexnet | 2294.91 | 2624.6375 | 14.37% |
| fbnetc_100 | 976.2825 | 3110.1825 | 218.57% |
| shufflenet_v2_x0_5 | 1555.76 | 3026.125 | 94.51% |
| spnasnet_100 | 1059.065 | 3502.0975 | 230.68% |
| pytorch-unet | 192.76 | 246.77 | 28.02% |
| acgan | 257.32 | 333.7325 | 29.70% |
| cgan | 7790.6925 | 7803.1025 | 0.16% |
| sgan | 257.565 | 338.8875 | 31.57% |
| se_resnet50 | 492.3725 | 916.5175 | 86.14% |
| vggm | 300.2875 | 316.2075 | 5.30% |

Environment:
- PyTorch version: 1.13.0a0+gitcdd625b
- Is debug build: False
- CUDA used to build PyTorch: None
- ROCM used to build PyTorch: N/A
- OS: Ubuntu 20.04.3 LTS (x86_64)
- GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
- Clang version: Could not collect
- CMake version: version 3.22.5
- Libc version: glibc-2.31
- Python version: 3.9.12 (main, Jun 1 2022, 11:38:51) [GCC 7.5.0] (64-bit runtime)
- Python platform: Linux-5.11.0-27-generic-x86_64-with-glibc2.31
- Is CUDA available: False
- CUDA runtime version: No CUDA
- GPU models and configuration: No CUDA
- Nvidia driver version: No CUDA
- cuDNN version: No CUDA
- HIP runtime version: N/A
- MIOpen runtime version: N/A
- Is XNNPACK available: True

Versions of relevant libraries:
- [pip3] intel-extension-for-pytorch==1.13.0+cpu
- [pip3] numpy==1.23.3
- [pip3] pytorch-widedeep==0.3.7
- [pip3] torch==1.13.0a0+git48b423b
- [pip3] torchvision==0.14.0a0+ebb68f3
- [conda] blas 1.0 mkl
- [conda] intel-extension-for-pytorch 1.13.0+cpu pypi_0 pypi
- [conda] mkl 2021.4.0 h06a4308_640
- [conda] mkl-include 2022.1.0 pypi_0 pypi
- [conda] mkl-service 2.4.0 py39h7f8727e_0
- [conda] mkl-static 2022.1.0 pypi_0 pypi
- [conda] mkl_fft 1.3.1 py39hd3c417c_0
- [conda] mkl_random 1.2.2 py39h51133e4_0
- [conda] numpy 1.23.3 pypi_0 pypi
- [conda] numpy-base 1.22.3 py39hf524024_0
- [conda] torch 1.13.0a0+git48b423b pypi_0 pypi
- [conda] torchvision 0.14.0a0+ebb68f3 pypi_0 pypi

Pull Request resolved: https://github.com/pytorch/pytorch/pull/84329 Approved by: https://github.com/jerryzh168	2022-09-29 00:44:40 +00:00
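The improvement column in the throughput figures of the x86 backend commit above appears to be the relative throughput gain over fbgemm. The sketch below recomputes it for three of the reported rows; the helper name `improvement_percent` is hypothetical, and the formula is inferred from the published numbers rather than stated in the commit.

```python
def improvement_percent(baseline, candidate):
    # Relative gain of the candidate throughput over the baseline, in percent.
    return (candidate / baseline - 1.0) * 100.0


# (fbgemm throughput, x86 throughput) copied from the commit message.
rows = {
    "wide_resnet101_2": (173.5675, 241.815),
    "resnet50": (573.155, 1174.14),
    "cgan": (7790.6925, 7803.1025),
}

computed = {name: improvement_percent(fb, x86) for name, (fb, x86) in rows.items()}
for name, value in computed.items():
    print(f"{name}: {value:.2f}%")
```

The recomputed values agree with the reported 39.32%, 104.86% and 0.16% to within rounding.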
Vasiliy Kuznetsov	09965957cd	quantization: align observer dtype with reference model spec (#85345 ) Summary: Before this PR, the `dtype` attribute of observers was not clearly defined. It originally meant `interface_dtype` in the eager mode workflow, which is how the codebase before this PR is using it. In the new reference model spec, `dtype` attribute of an observer represents the `dtype` value which needs to be passed into a `quantize` function in the reference model spec. This PR aligns the codebase to this definition of dtype. In detail: 1. change util functions to interpret `dtype` using the reference model definition 2. change `prepare` to interpret `dtype` using the reference model definition 3. change observers for dynamic quantization to interpret `dtype` using the reference model definition. A future PR (left out of this one to keep LOC small) will deprecate the `compute_dtype` field and instead expose `is_dynamic` on observers. Test plan: ``` python test/test_quantization.py TestQuantizeFx python test/test_quantization.py TestQuantizeFxOps ``` Differential Revision: [D39675209](https://our.internmc.facebook.com/intern/diff/D39675209) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85345 Approved by: https://github.com/z-a-f, https://github.com/jerryzh168	2022-09-21 06:34:26 +00:00
Jerry Zhang	446edadd95	[quant][fx] Follow up fixes for qconfig validations for fixedqparams ops (#81010 ) Summary: This adds a few things on top of https://github.com/pytorch/pytorch/pull/80184, 1). node.target was assumed to be "tanh", torch.nn.Tanh etc. this PR handles that properly 2). adds FixedQParamsFakeQuantize support 3). extends the comparison function _partial_wrapper_equals to work with FakeQuantize.with_args(observer=...) Test Plan: python test/test_quantization.py TestQuantizeFx python test/test_quantization.py TestQuantizeFxOps Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D37735193](https://our.internmc.facebook.com/intern/diff/D37735193) Pull Request resolved: https://github.com/pytorch/pytorch/pull/81010 Approved by: https://github.com/andrewor14	2022-07-14 18:06:23 +00:00
Andrew Or	c44317704a	[Quant][fx] Add default configs for fixed qparams ops (#80184 ) Summary: This commit adds qconfigs with special observers for fixed qparams ops in get_default_qconfig_mapping and get_default_qat_qconfig_mapping. For correctness, we also require users to use these special observers if we detect these fixed qparams ops in prepare. Test Plan: python test/test_quantization.py TestQuantizeFx python test/test_quantization.py TestQuantizeFxOps Reviewers: jerryzh168, vkuzo Subscribers: jerryzh168, vkuzo Differential Revision: [D37396379](https://our.internmc.facebook.com/intern/diff/D37396379) Pull Request resolved: https://github.com/pytorch/pytorch/pull/80184 Approved by: https://github.com/jerryzh168	2022-06-29 23:07:26 +00:00
Andrew Or	61a1eef7fc	[Quant][fx] Add get_default_qconfig_mapping Summary: This follows https://github.com/pytorch/pytorch/pull/78452, which replaced the qconfig_dict with QConfigMapping. This PR additionally replaces get_default_qconfig_dict with get_default_qconfig_mapping. For backward compatibility, we deprecate the old functions instead of removing them. Test Plan: python test/test_quantization.py TestQuantizeFx python test/test_quantization.py TestQuantizeFxOps Reviewers: jerryzh168, vkuzo Subscribers: jerryzh168, vkuzo, supriyar Pull Request resolved: https://github.com/pytorch/pytorch/pull/79618 Approved by: https://github.com/jerryzh168	2022-06-16 16:10:14 +00:00
Andrew Or	5dcbcc6de8	[Quant][fx] Fix get_default_qconfig_dict for fused modules Summary: Calling `prepare_fx` with `get_default_qconfig_dict` failed for models with fused modules, such as `ConvReLU2d`. This commit fixes this by adding qconfig entries for ReLU and BatchNorm as well. Test Plan: python test/test_quantization.py TestQuantizeFx.test_qconfig_dict_with_fused_modules Reviewers: jerryzh168 Subscribers: jerryzh168, vkuzo Issue: https://github.com/pytorch/pytorch/issues/75825 Pull Request resolved: https://github.com/pytorch/pytorch/pull/75838 Approved by: https://github.com/jerryzh168	2022-04-15 22:37:26 +00:00
Digant Desai	09f32eba7a	[quant] Add default symmetric qat qconfig for qnnpack (#74507 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/74507 * This is the default symmetric qat qconfigs for qnnpack. * Support for symmetric quantization is not available from other backends. * Observers are similar to symmetric PTQ qconfigs for qnnpack. Reviewed By: jerryzh168 Differential Revision: D34804808 fbshipit-source-id: 22c11b89242a98f54029ac195f7b984e42809164 (cherry picked from commit ea751ded1174ba2c2f061bafc81573faaf248a9a)	2022-03-24 16:19:28 +00:00
Digant Desai	cfe1a41b01	[quant] Add default symmetric qconfig for qnnpack (#74396 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/74396

# New qconfig `default_symmetric_qnnpack_qconfig`

Returns a qconfig with signed activation and symmetric weights with range restrictions. Also adds a per_channel variant of the same.

## Restrictions on weights

Restrictions on weights include: 1. the weight zero point is forced to zero, and 2. weight 8-bit signed quantized values are limited to [-127, +127], excluding the value -128. This is driven, in part, by the desire to achieve better performance with XNNPACK ops.

## qengine/backend = `qnnpack` and XNNPACK ops

The qconfig returned by this function allows us to use faster XNNPACK quantized ops for CPUs w/ said restrictions. Although we are using XNNPACK ops the qengine is still `qnnpack`, and there are no plans to introduce a new qengine for XNNPACK ops. Support for using XNNPACK ops with the asymmetric qconfig (returned by get_default_qconfig()) is WIP.

## Updated EPS value:

* From PyTorch: eps:

```
>>> import torch
>>> torch.finfo(torch.float32).eps
1.1920928955078125e-07
>>> torch.finfo(torch.float32).eps.hex()
'0x1.0000000000000p-23'
```

All scale values are float32 and `scale = max(scale, eps)`

* Requirement from XNNPACK

For both fp32 as well as rndnu requantization schema, `0x1p-32 <= requantization_scale < 256.0` where requantization_scale = (input_scale * kernel_scale) / (output_scale)

* New minimum allowed scale value

With the current float32 eps (=0x1p-23) as minimum, the xnnpack lower bound is the problem. We haven’t observed upper bound issues so far when assuming a max scale value of 256. So focusing on the lower bound, to cover all possible cases of requantization value, conservatively, we must have the minimum possible requantization scale value such that,

```
minimum_requantization_value = xnnpack_lower_threshold
input_scale * kernel_scale / output_scale = 0x1p-32
min_scale_value * min_scale_value / max_scale_value = 0x1p-32
min_scale_value * new_eps / 256 = 0x1p-32
min_scale_value ** 2 = 0x1p-24
min_scale_value = 0x1p-12
```

With `scale_value >= 0x1p-12`, we should be able to avoid the lower threshold on requantization scale by xnnpack kernels. Obviously this worst case is very unlikely to happen, so in practice we should be able to get away with a much smaller value than `0x1p-12` as EPS, but it is not easy to choose a smaller value empirically. Impact on accuracy is unclear as of writing this.

Reviewed By: kimishpatel Differential Revision: D34625300 fbshipit-source-id: 005e6757ed1185b3940b58ac55246cba8b267828 (cherry picked from commit 61ed1a2a308a1792ccbfc316153a6dc39798f02a)	2022-03-18 13:42:41 +00:00
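The derivation of the new minimum scale in the commit above can be checked numerically with the standard library alone. The constant and function names below are hypothetical; the values (`0x1p-32`, 256, `0x1p-23`) are the ones quoted in the commit message.

```python
import math

XNNPACK_REQUANT_LOWER = float.fromhex("0x1p-32")  # lower bound on requantization scale
MAX_SCALE = 256.0                                 # assumed maximum output scale
FLOAT32_EPS = float.fromhex("0x1p-23")            # previous minimum scale


def worst_case_requant_scale(min_scale, max_scale=MAX_SCALE):
    # Smallest possible (input_scale * kernel_scale) / output_scale when both
    # numerator scales sit at the minimum and the output scale at the maximum.
    return min_scale * min_scale / max_scale


# Solve min_scale ** 2 / 256 = 0x1p-32 for min_scale.
min_scale = math.sqrt(XNNPACK_REQUANT_LOWER * MAX_SCALE)

print(min_scale.hex())  # 0x1.0000000000000p-12
print(worst_case_requant_scale(FLOAT32_EPS) < XNNPACK_REQUANT_LOWER)  # True
```

With the old eps the worst case falls far below the XNNPACK threshold, while a floor of `0x1p-12` lands exactly on it.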
Weiwen Xia	060f1b822a	Add onednn quant backend (#74137 ) Summary: Resolve the conflicts in https://github.com/pytorch/pytorch/pull/69820 jerryzh168 Please review. Thanks. Pull Request resolved: https://github.com/pytorch/pytorch/pull/74137 Reviewed By: samdow Differential Revision: D34840477 Pulled By: jerryzh168 fbshipit-source-id: 8aa60981ff7be211a1609644f273b16d18efd425 (cherry picked from commit de76bb808b315e9a2e45d8c5f1c1233a47d669c4)	2022-03-15 01:28:21 +00:00
Jerry Zhang	5a897536f3	Revert D33716039: [pytorch][PR] Add ONEDNN quantization backend Test Plan: revert-hammer Differential Revision: D33716039 (`989b24855e`) Original commit changeset: 6f7bb807e857 Original Phabricator Diff: D33716039 (`989b24855e`) fbshipit-source-id: ed233c5b99d4edb7d5a9d6c600825c78555f16d0 (cherry picked from commit d3e1f825b06ef67adb13623ccb7cbf1b700c1dd5)	2022-03-11 22:06:25 +00:00
Xia Weiwen	989b24855e	Add ONEDNN quantization backend (#69820 ) Summary: This PR adds a new quantization backend, ONEDNN, with quantized conv and linear kernels in the same code path as the FBGEMM backend The ONEDNN backend is an alternative of FBGEMM and QNNPACK backends. It takes advantage of features of the latest Intel® CPU products. It supports VNNI on Cascade Lake and the AMX instruction set to be available on Sapphire Rapids which has 8X int8 peak TOPS over VNNI. ONEDNN demonstrates better performance on conv kernels of popular CNN models than FBGEMM. It also supports more fused ops, such as convolution-add-ReLU, than FBGEMM and QNNPACK. To use this backend, users only need to set the quantization backend to 'onednn' before any calculation without a single change to models. ```python torch.backends.quantized.engine = 'onednn' ``` ## Design docs https://github.com/pytorch/pytorch/issues/21120#issuecomment-562371983 https://github.com/pytorch/pytorch/pull/67177#issuecomment-963787096 ## File changes Add ONEDNN to qengine list - aten/src/ATen/Context.cpp - c10/core/QEngine.h - torch/ao/quantization/qconfig.py - torch/backends/quantized/\_\_init\_\_.py Implement qconv & qlinear for ONEDNN backend - aten/src/ATen/native/quantized/cpu/conv_serialization.h - aten/src/ATen/native/quantized/cpu/fbgemm_utils.cpp - aten/src/ATen/native/quantized/cpu/onednn_utils.h - aten/src/ATen/native/quantized/cpu/qconv.cpp - aten/src/ATen/native/quantized/cpu/qconv_dynamic.cpp - aten/src/ATen/native/quantized/cpu/qconv_prepack.cpp - aten/src/ATen/native/quantized/cpu/qconv_unpack.cpp - aten/src/ATen/native/quantized/cpu/qlinear.cpp - aten/src/ATen/native/quantized/cpu/qlinear_dynamic.cpp - aten/src/ATen/native/quantized/cpu/qlinear_prepack.cpp - aten/src/ATen/native/quantized/cpu/qlinear_unpack.cpp Skip tests that are not supported by ONEDNN - test/ao/sparsity/test_kernels.py - test/quantization/core/test_quantized_module.py - test/quantization/core/test_quantized_op.py ## 
Validation results

This PR has passed `test_quantization.py` and `test_mkldnn.py`. Below are performance data for the int8 2D convolution and linear operators on the Cascade Lake Xeon® platform. (Note: tested with a single instance on a single core, using the latest oneDNN library.)

Table 1. Performance comparison of int8 2D convolution operator

| No. | Shape | FBGEMM | ONEDNN | Gain |
|-|-|-|-|-|
| 1 | IC=128, OC=128, kernel=3, stride=1, N=4, H=32, W=32, G=1, pad=0 | 668.310us | 535.630us | 24.8% |
| 2 | IC=128, OC=128, kernel=3, stride=2, N=4, H=32, W=32, G=1, pad=0 | 290.630us | 281.810us | 3.1% |
| 3 | IC=128, OC=256, kernel=3, stride=1, N=4, H=32, W=32, G=1, pad=0 | 1.045ms | 893.010us | 17.0% |
| 4 | IC=128, OC=256, kernel=3, stride=2, N=4, H=32, W=32, G=1, pad=0 | 385.320us | 373.720us | 3.1% |
| 5 | IC=256, OC=256, kernel=3, stride=1, N=4, H=32, W=32, G=1, pad=0 | 1.876ms | 1.641ms | 14.3% |
| 6 | IC=256, OC=256, kernel=3, stride=2, N=4, H=32, W=32, G=1, pad=0 | 660.460us | 638.470us | 3.4% |

Table 2. Performance comparison of int8 linear operator

| No. | Shape (m, n, k) | FBGEMM | ONEDNN | Gap |
|-|-|-|-|-|
| 1 | 64, 800, 320 | 80.550us | 96.770us | 20.10% |
| 2 | 64, 768, 512 | 101.230us | 130.720us | 29.10% |
| 3 | 16, 256, 512 | 30.230us | 51.450us | 70.20% |
| 4 | 128, 128, 128 | 33.810us | 50.480us | 49.30% |
| 5 | 256, 512, 256 | 154.490us | 195.050us | 26.30% |
| 6 | 1024, 1024, 1024 | 3.134ms | 3.514ms | 12.10% |

ONEDNN showed advantages over FBGEMM for convolution. However, it has a performance gap relative to FBGEMM for linear ops. The gap is a known issue, and further optimization is in progress in the oneDNN library. On the latest platforms, ONEDNN achieves better performance for both conv and linear.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69820 Reviewed By: HDCharles Differential Revision: D33716039 Pulled By: jerryzh168 fbshipit-source-id: 6f7bb807e85798142dfcffccfca8b8bd652fb3dd (cherry picked from commit 91526b373560f42ba0ad307f9cccfc0eb5218b1f)	2022-03-11 20:31:49 +00:00
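The commit message for `989b24855e` does not say how the Gain and Gap columns were computed. The sketch below assumes Gain is FBGEMM time divided by ONEDNN time minus one, and Gap is ONEDNN time divided by FBGEMM time minus one; both formulas are inferred from the published numbers, not stated by the author. The function names `gain_pct` and `gap_pct` are hypothetical, and timings given in ms are converted to microseconds.

```python
# Timings copied from Tables 1 and 2 of the commit message, in microseconds.
# Each pair is (FBGEMM, ONEDNN).
CONV_US = [
    (668.310, 535.630),
    (290.630, 281.810),
    (1045.0, 893.010),
    (385.320, 373.720),
    (1876.0, 1641.0),
    (660.460, 638.470),
]
LINEAR_US = [
    (80.550, 96.770),
    (101.230, 130.720),
    (30.230, 51.450),
    (33.810, 50.480),
    (154.490, 195.050),
    (3134.0, 3514.0),
]


def gain_pct(fbgemm_us, onednn_us):
    """Assumed formula: how much faster ONEDNN is than FBGEMM, in percent."""
    return (fbgemm_us / onednn_us - 1.0) * 100.0


def gap_pct(fbgemm_us, onednn_us):
    """Assumed formula: how much slower ONEDNN is than FBGEMM, in percent."""
    return (onednn_us / fbgemm_us - 1.0) * 100.0


conv_gains = [round(gain_pct(f, o), 1) for f, o in CONV_US]
linear_gaps = [round(gap_pct(f, o), 1) for f, o in LINEAR_US]

print("conv gain %:", conv_gains)
print("linear gap %:", linear_gaps)
```

Under these assumed formulas the rounded results match every row of both tables, which suggests the two columns use different baselines: Gain is relative to the ONEDNN time, Gap to the FBGEMM time.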
Jerry Zhang	7ddf212f33	[quant][fx] Fully align convert with the reference model design and simplify the implementation (#73863 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/73863 This PR fully aligns the convert function with the design: https://github.com/pytorch/rfcs/blob/master/RFC-0019-Extending-PyTorch-Quantization-to-Custom-Backends.md and simplifies the implementation of convert function by always produce a reference quantized model (with reference patterns) first, and then lower the model to a quantized model that is runnable with PyTorch native backend (fbgemm/qnnpack). This PR makes the convert.py much easier to understand than the previous implementation, and we are able to remove majority of code in quantization_patterns.py as well (in followup PRs). Test Plan: ``` python test/test_quantization.py TestQuantizeFx python test/test_quantization.py TestQuantizeFxOps python test/test_quantization.py TestFXNumericSuiteCoreAPIs python test/test_quantization.py TestFXNumericSuiteCoreAPIsModels ``` and other internal/oss regression tests Imported from OSS Reviewed By: andrewor14 Differential Revision: D34778506 fbshipit-source-id: 0678b66addf736039a8749b352f6f569caca962b (cherry picked from commit 33ec9caf23f3ab373d827117efbd9db0668b2437)	2022-03-11 17:11:30 +00:00
Charles David Hernandez	39605a5632	[ao] Removing memoryless observer args for MovingAverage (#73947 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/73947 The original implementation of memoryless observers used MinMaxObservers and a memoryless argument to manipulate the behavior of the observer such that it wouldn't keep track of previously observed min and max values. It was later pointed out that this was equivalent to a movingaverageobserver with averaging_constant=1, which requires less overhead and no one-off args (memoryless), so this PR refactors the memoryless arg and uses MovingAverage observers instead. Although the memoryless adjective is still used, a complete definition was also added to clarify error messages given these changes. Test Plan: python test/test_quantization.py TestQuantizeEagerQAT python test/test_quantization.py TestObserver Imported from OSS Reviewed By: andrewor14 Differential Revision: D34732080 Pulled By: HDCharles fbshipit-source-id: 227a1ab29d18adae55093a684ea35ac34523d07a (cherry picked from commit 5238e70e8f90f3219c36f9c64b647951dcf64b5a)	2022-03-11 00:21:49 +00:00
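The equivalence described in `39605a5632` can be illustrated with a small pure-Python sketch. It is not the PyTorch observer implementation: the class name `MovingAverageMinMax` and its `observe` method are hypothetical, and the update rule is the standard exponential moving average, assumed here for illustration. With `averaging_constant=1` the previous statistics receive zero weight, so the observer reports only the latest batch's min and max, which is the "memoryless" behavior.

```python
class MovingAverageMinMax:
    """Toy min/max tracker using an exponential moving average.

    Hypothetical stand-in for illustration only; not the PyTorch class.
    """

    def __init__(self, averaging_constant=0.01):
        self.c = averaging_constant
        self.min_val = None
        self.max_val = None

    def observe(self, values):
        cur_min, cur_max = min(values), max(values)
        if self.min_val is None:
            # First batch: nothing to average against yet.
            self.min_val, self.max_val = cur_min, cur_max
        else:
            # new = (1 - c) * old + c * current
            self.min_val = (1.0 - self.c) * self.min_val + self.c * cur_min
            self.max_val = (1.0 - self.c) * self.max_val + self.c * cur_max
        return self.min_val, self.max_val


# averaging_constant=1: history is discarded on every update.
memoryless = MovingAverageMinMax(averaging_constant=1.0)
memoryless.observe([-4.0, 2.0])
memoryless_stats = memoryless.observe([-1.0, 0.5])

# averaging_constant=0.5: the earlier batch still influences the result.
smoothed = MovingAverageMinMax(averaging_constant=0.5)
smoothed.observe([-4.0, 2.0])
smoothed_stats = smoothed.observe([-1.0, 0.5])

print(memoryless_stats, smoothed_stats)
```

Because the averaging constant alone selects the behavior, no separate `memoryless` flag is needed, which matches the motivation given in the commit message.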
dzdang	a39e8e8f5e	[Quant][fx] Added explicit entries for functional and module conv&linear support into get_default_qconfig_dict&get_default_qat_qconfig_dict (#73528 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/73528 Test Plan: Imported from OSS Reviewed By: jerryzh168 Differential Revision: D34535572 Pulled By: dzdang fbshipit-source-id: 883f46e014e47aeba3ea6f9fb401c54e3792b2ac (cherry picked from commit 66713d518295b2e7306561030aa6b7ca049a708c)	2022-03-04 03:29:20 +00:00
Jerry Zhang	5db711f9d3	[quant][be] Replace QConfigDynamic with QConfig in code (#69864 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69864 att, will have a follow up PR that removes QConfigDynamic in the api Test Plan: regression tests ``` python test/test_quantization.py TestPostTrainingStatic python test/test_quantization.py TestPostTrainingDynamic python test/test_quantization.py TestQuantizeFx ``` Imported from OSS Reviewed By: vkuzo Differential Revision: D33073235 fbshipit-source-id: 6c1a1647032453803c55cdad7c04154502f085db	2021-12-17 22:30:57 -08:00
Charles David Hernandez	497ec9d9b8	Getting NS to work with Ferraris (#68908 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68908 see description in github Test Plan: Imported from OSS Reviewed By: vkuzo Differential Revision: D32928449 fbshipit-source-id: ba7085b823a0ebcd0d9e40f4ac19ca0a2cac1169	2021-12-08 12:26:00 -08:00
Ben Koopman	93aa3603ee	[quant][embedding qat] Re-Land Support Embedding QAT via FX API (#69333 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69333 Original PR reverted due to break with incompatible qengine on Mac OS, this diff fixes that. Support QAT workflow by using torch.fx QAT API. e.g. `prepare_qat_fx` and `convert_fx`. Test Plan: `pytest test/quantization/fx/test_quantize_fx.py -v -k "test_qat_embedding_linear"` Imported from OSS Reviewed By: jingsh Differential Revision: D32814827 fbshipit-source-id: f7a69d2b596f1276dc5860b397c5d5d07e5b9e16	2021-12-08 05:28:07 -08:00


63 Commits