Commit Graph

24 Commits

Author SHA1 Message Date
HDCharles
8176cd8c0f [ao] fixing quantized prelu workflow (#103455)
Summary: https://github.com/pytorch/pytorch/issues/100654 reported that PReLU
was not running its observers when the quantization flow was run. This
was a bug, which is now fixed, and the relevant PReLU tests now check
for it. Also added the correct observer for PReLU to qconfig_mapping.
A minimal calibration sketch follows.
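
For reference, a minimal eager-mode sketch of the workflow this fixes (the module and input shapes are illustrative, not taken from the PR):

```
import torch
import torch.ao.quantization as tq

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = tq.QuantStub()
        self.prelu = torch.nn.PReLU()
        self.dequant = tq.DeQuantStub()

    def forward(self, x):
        return self.dequant(self.prelu(self.quant(x)))

m = M().eval()
m.qconfig = tq.get_default_qconfig("x86")
prepared = tq.prepare(m)
prepared(torch.randn(4, 8))  # calibration: PReLU's observer should now record stats
quantized = tq.convert(prepared)
```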

Test Plan: python test/test_quantization.py TestStaticQuantizedModule.test_prelu

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103455
Approved by: https://github.com/jerryzh168
2023-06-23 16:45:40 +00:00
andrewor14
6c550bb4d5 [quant][be] Easier way to override default in QConfigMapping (#99888)
Summary: This commit adds a private helper function to override
the default QConfig in the default QConfigMapping. Previously we
needed to override all the object_types manually while skipping
the fixed qparams ops. This led to duplicate code every time
someone wanted a new default QConfig. After this commit, we can
just call a single shared helper function instead (sketched below).
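
A rough sketch of the idea behind the helper (the function name and the private `_FIXED_QPARAMS_OP_TO_OBSERVER` import are assumptions based on this commit's description, not the exact API):

```
from torch.ao.quantization.qconfig_mapping import (
    get_default_qconfig_mapping,
    _FIXED_QPARAMS_OP_TO_OBSERVER,  # private table of fixed qparams ops
)

def set_default_qconfig(new_qconfig, backend="x86"):
    # Override the global and every non-fixed-qparams object_type entry
    # in one place, instead of duplicating this loop at each call site.
    mapping = get_default_qconfig_mapping(backend)
    mapping.set_global(new_qconfig)
    for op in list(mapping.object_type_qconfigs):
        if op not in _FIXED_QPARAMS_OP_TO_OBSERVER:
            mapping.set_object_type(op, new_qconfig)
    return mapping
```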

Test Plan:
python test/test_quantization.py TestQuantizeFx

Reviewers: jerryzh168, vkuzo
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99888
Approved by: https://github.com/vkuzo, https://github.com/jerryzh168
2023-04-26 18:14:01 +00:00
maxren
483fd3351a [Quant] Add get_symmetric_qnnpack_qat_qconfig_mapping (#98569)
Differential Revision: [D44776230](https://our.internmc.facebook.com/intern/diff/D44776230/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98569
Approved by: https://github.com/andrewor14
2023-04-07 17:57:56 +00:00
Xia, Weiwen
1d03a6a901 [Quant][Fx] Fix issue: qconfig_mappings of onednn backend are not correctly set for fused modules (#91297)
**Summary**
For onednn quantization backend only.
Currently, FX fusion requires that all separate ops in a fused module/op have the same `qconfig`. To support `linear - leaky_relu` and `linear - tanh` fusion with the onednn backend, we previously set the same `qconfig` explicitly on `linear`, `leaky_relu`, and `tanh`. However, this brings two problems:
- It breaks fusion of `linear - relu`, since `relu` does not have the same `qconfig` as `linear`. It also does not look good to set a `qconfig` on all these ops; they should use the global `qconfig` by default.
- `Tanh` requires `fixed_qparams_qconfig`, otherwise it is not quantized, so we cannot set a different `qconfig` on `tanh`.

There is no straightforward way to solve both problems at once. This PR fixes them as follows:
- Do not set a `qconfig` on these ops, so that they use the global `qconfig` and both `linear - relu` and `linear - leaky_relu` are fused correctly.
- Require users to set the same `qconfig` on `linear` and `tanh` manually when they want to fuse `linear - tanh` with the onednn backend (a sketch follows below).

A known issue still exists: users cannot fuse `linear - tanh` and quantize standalone `tanh` at the same time.
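
For illustration, the manual `linear - tanh` setup described above might look like this (a sketch assuming the default onednn mapping):

```
import torch
from torch.ao.quantization import get_default_qconfig_mapping

qconfig_mapping = get_default_qconfig_mapping("onednn")
shared_qconfig = qconfig_mapping.global_qconfig
# Give Linear and Tanh the same qconfig so FX can fuse linear - tanh.
# Note: this replaces Tanh's fixed_qparams_qconfig, so a standalone tanh
# will no longer be quantized (the known issue above).
qconfig_mapping.set_object_type(torch.nn.Linear, shared_qconfig)
qconfig_mapping.set_object_type(torch.nn.Tanh, shared_qconfig)
```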

**Test plan**
python test/test_quantization.py -k test_qconfig_dict_with_fused_modules

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91297
Approved by: https://github.com/jgong5, https://github.com/jerryzh168
2023-01-26 09:55:34 +00:00
Xia, Weiwen
a5eb564ba4 [Quant] lower fused LinearTanh for onednn backend (#89188)
**Summary**
Add a fuser method and quantization mappings for the fused `Linear - Tanh` module for int8 inference with the onednn backend. The fusion and lowering are supported only in FX mode.

**Test plan**
python test/test_quantization.py TestFuseFx TestQuantizeFx

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89188
Approved by: https://github.com/jgong5, https://github.com/jerryzh168
2022-12-20 01:30:21 +00:00
Xia, Weiwen
9ca41a986c [Quant][FX] Lower QLinearLeakyReLU for onednn backend (#88668)
**Summary**
Add quantization mappings for `QLinearLeakyReLU` for int8 inference for the onednn backend. The fusion and lowering are supported only in FX mode.

**Test plan**
python test/test_quantization.py TestQuantizeFx

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88668
Approved by: https://github.com/jgong5, https://github.com/jerryzh168
2022-12-19 00:44:24 +00:00
Xia, Weiwen
7b0ec67e34 [Quant][FX] Add backend config for onednn backend and fuse Linear-LeakyReLU (#88665)
**Summary**
Add a backend config for the onednn backend so that it can support more post-op fusions for int8 inference. As a first step, `Linear - LeakyReLU` fusion is implemented based on previous PRs.

**Test plan**
python test/test_quantization.py TestFuseFx

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88665
Approved by: https://github.com/jgong5, https://github.com/jerryzh168
2022-12-17 03:33:08 +00:00
andrewor14
29d1d8f3ef [Quant] Remove explicitly default QConfigMapping settings (#90066)
Summary: Previously we explicitly set a qconfig for ops
like conv and linear in the default QConfigMapping. However,
this makes it difficult for users to override the global and
have the new global take effect for basic ops. This commit
removes these explicit settings so the user can simply run
the following to quantize these ops.
```
qconfig_mapping = get_default_qconfig_mapping()
qconfig_mapping.set_global(my_qconfig)
```
There is no change in behavior for the default use case
of not setting anything on the default QConfigMapping.
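
For example, `my_qconfig` above could be any custom QConfig; a sketch (the observer choices here are illustrative only, not a recommendation):

```
import torch
from torch.ao.quantization import QConfig, get_default_qconfig_mapping
from torch.ao.quantization.observer import MinMaxObserver

my_qconfig = QConfig(
    activation=MinMaxObserver.with_args(dtype=torch.quint8),
    weight=MinMaxObserver.with_args(dtype=torch.qint8, qscheme=torch.per_tensor_symmetric),
)
qconfig_mapping = get_default_qconfig_mapping()
qconfig_mapping.set_global(my_qconfig)  # now also takes effect for conv, linear, etc.
```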

Test Plan:
python test/test_quantization.py TestQuantizeFx.test_default_qconfig_mapping_override_global

Reviewers: vkuzo, jerryzh168

Subscribers: vkuzo, jerryzh168
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90066
Approved by: https://github.com/vkuzo, https://github.com/jerryzh168
2022-12-02 23:33:47 +00:00
HDCharles
9013c92a9f [ao] making QConfigMapping print in a user friendly way (#89932)
Summary: Added `__repr__` to `QConfigMapping` and `QConfigMultiMapping`,
loosely based on `__repr__` for `BaseSparsifier`.

example output:

```
>>> import torch
>>> print(torch.ao.quantization.qconfig_mapping.get_default_qconfig_mapping())
QConfigMapping (
 global_qconfig
  QConfig(activation=functools.partial(<class 'torch.ao.quantization.observer.HistogramObserver'>, reduce_range=True){}, weight=functools.partial(<class 'torch.ao.quantization.observer.PerChannelMinMaxObserver'>, dtype=torch.qint8, qscheme=torch.per_channel_symmetric){})
 object_type_qconfigs
  reshape: QConfig(activation=<class 'torch.ao.quantization.observer.ReuseInputObserver'>, weight=<class 'torch.ao.quantization.observer.NoopObserver'>)
  <class 'torch.nn.modules.conv.Conv1d'>: QConfig(activation=functools.partial(<class 'torch.ao.quantization.observer.HistogramObserver'>, reduce_range=True){}, weight=functools.partial(<class 'torch.ao.quantization.observer.PerChannelMinMaxObserver'>, dtype=torch.qint8, qscheme=torch.per_channel_symmetric){})
  <class 'torch.nn.modules.conv.Conv2d'>: QConfig(activation=functools.partial(<class 'torch.ao.quantization.observer.HistogramObserver'>, reduce_range=True){}, weight=functools.partial(<class 'torch.ao.quantization.observer.PerChannelMinMaxObserver'>, dtype=torch.qint8, qscheme=torch.per_channel_symmetric){})
  <class 'torch.nn.modules.conv.Conv3d'>: QConfig(activation=functools.partial(<class 'torch.ao.quantization.observer.HistogramObserver'>, reduce_range=True){}, weight=functools.partial(<class 'torch.ao.quantization.observer.PerChannelMinMaxObserver'>, dtype=torch.qint8, qscheme=torch.per_channel_symmetric){})
  <class 'torch.nn.modules.conv.ConvTranspose1d'>: QConfig(activation=functools.partial(<class 'torch.ao.quantization.observer.HistogramObserver'>, reduce_range=True){}, weight=functools.partial(<class 'torch.ao.quantization.observer.MinMaxObserver'>, dtype=torch.qint8, qscheme=torch.per_tensor_symmetric){})
  <class 'torch.nn.modules.conv.ConvTranspose2d'>: QConfig(activation=functools.partial(<class 'torch.ao.quantization.observer.HistogramObserver'>, reduce_range=True){}, weight=functools.partial(<class 'torch.ao.quantization.observer.MinMaxObserver'>, dtype=torch.qint8, qscheme=torch.per_tensor_symmetric){})
  <class 'torch.nn.modules.conv.ConvTranspose3d'>: QConfig(activation=functools.partial(<class 'torch.ao.quantization.observer.HistogramObserver'>, reduce_range=True){}, weight=functools.partial(<class 'torch.ao.quantization.observer.MinMaxObserver'>, dtype=torch.qint8, qscheme=torch.per_tensor_symmetric){})
  <class 'torch.nn.modules.linear.Linear'>: QConfig(activation=functools.partial(<class 'torch.ao.quantization.observer.HistogramObserver'>, reduce_range=True){}, weight=functools.partial(<class 'torch.ao.quantization.observer.PerChannelMinMaxObserver'>, dtype=torch.qint8, qscheme=torch.per_channel_symmetric){})
  <built-in method conv1d of type object at 0x7f08b99497e0>: QConfig(activation=functools.partial(<class 'torch.ao.quantization.observer.HistogramObserver'>, reduce_range=True){}, weight=functools.partial(<class 'torch.ao.quantization.observer.PerChannelMinMaxObserver'>, dtype=torch.qint8, qscheme=torch.per_channel_symmetric){})
  <built-in method conv2d of type object at 0x7f08b99497e0>: QConfig(activation=functools.partial(<class 'torch.ao.quantization.observer.HistogramObserver'>, reduce_range=True){}, weight=functools.partial(<class 'torch.ao.quantization.observer.PerChannelMinMaxObserver'>, dtype=torch.qint8, qscheme=torch.per_channel_symmetric){})
  <built-in method conv3d of type object at 0x7f08b99497e0>: QConfig(activation=functools.partial(<class 'torch.ao.quantization.observer.HistogramObserver'>, reduce_range=True){}, weight=functools.partial(<class 'torch.ao.quantization.observer.PerChannelMinMaxObserver'>, dtype=torch.qint8, qscheme=torch.per_channel_symmetric){})
  <built-in method conv_transpose1d of type object at 0x7f08b99497e0>: QConfig(activation=functools.partial(<class 'torch.ao.quantization.observer.HistogramObserver'>, reduce_range=True){}, weight=functools.partial(<class 'torch.ao.quantization.observer.MinMaxObserver'>, dtype=torch.qint8, qscheme=torch.per_tensor_symmetric){})
  <built-in method conv_transpose2d of type object at 0x7f08b99497e0>: QConfig(activation=functools.partial(<class 'torch.ao.quantization.observer.HistogramObserver'>, reduce_range=True){}, weight=functools.partial(<class 'torch.ao.quantization.observer.MinMaxObserver'>, dtype=torch.qint8, qscheme=torch.per_tensor_symmetric){})
  <built-in method conv_transpose3d of type object at 0x7f08b99497e0>: QConfig(activation=functools.partial(<class 'torch.ao.quantization.observer.HistogramObserver'>, reduce_range=True){}, weight=functools.partial(<class 'torch.ao.quantization.observer.MinMaxObserver'>, dtype=torch.qint8, qscheme=torch.per_tensor_symmetric){})
  <built-in function linear>: QConfig(activation=functools.partial(<class 'torch.ao.quantization.observer.HistogramObserver'>, reduce_range=True){}, weight=functools.partial(<class 'torch.ao.quantization.observer.PerChannelMinMaxObserver'>, dtype=torch.qint8, qscheme=torch.per_channel_symmetric){})
  <class 'torch.nn.modules.activation.ReLU'>: QConfig(activation=functools.partial(<class 'torch.ao.quantization.observer.HistogramObserver'>, reduce_range=True){}, weight=functools.partial(<class 'torch.ao.quantization.observer.PerChannelMinMaxObserver'>, dtype=torch.qint8, qscheme=torch.per_channel_symmetric){})
  <function relu at 0x7f08ad57bc10>: QConfig(activation=functools.partial(<class 'torch.ao.quantization.observer.HistogramObserver'>, reduce_range=True){}, weight=functools.partial(<class 'torch.ao.quantization.observer.PerChannelMinMaxObserver'>, dtype=torch.qint8, qscheme=torch.per_channel_symmetric){})
  <built-in method relu of type object at 0x7f08b99497e0>: QConfig(activation=functools.partial(<class 'torch.ao.quantization.observer.HistogramObserver'>, reduce_range=True){}, weight=functools.partial(<class 'torch.ao.quantization.observer.PerChannelMinMaxObserver'>, dtype=torch.qint8, qscheme=torch.per_channel_symmetric){})
  <class 'torch.nn.modules.batchnorm.BatchNorm1d'>: QConfig(activation=functools.partial(<class 'torch.ao.quantization.observer.HistogramObserver'>, reduce_range=True){}, weight=functools.partial(<class 'torch.ao.quantization.observer.PerChannelMinMaxObserver'>, dtype=torch.qint8, qscheme=torch.per_channel_symmetric){})
  <class 'torch.nn.modules.batchnorm.BatchNorm2d'>: QConfig(activation=functools.partial(<class 'torch.ao.quantization.observer.HistogramObserver'>, reduce_range=True){}, weight=functools.partial(<class 'torch.ao.quantization.observer.PerChannelMinMaxObserver'>, dtype=torch.qint8, qscheme=torch.per_channel_symmetric){})
  <class 'torch.nn.modules.batchnorm.BatchNorm3d'>: QConfig(activation=functools.partial(<class 'torch.ao.quantization.observer.HistogramObserver'>, reduce_range=True){}, weight=functools.partial(<class 'torch.ao.quantization.observer.PerChannelMinMaxObserver'>, dtype=torch.qint8, qscheme=torch.per_channel_symmetric){})
  <function layer_norm at 0x7f08ad57fca0>: QConfig(activation=functools.partial(<class 'torch.ao.quantization.observer.HistogramObserver'>, reduce_range=True){}, weight=<class 'torch.ao.quantization.observer.PlaceholderObserver'>)
  <class 'torch.nn.modules.normalization.LayerNorm'>: QConfig(activation=functools.partial(<class 'torch.ao.quantization.observer.HistogramObserver'>, reduce_range=True){}, weight=<class 'torch.ao.quantization.observer.PlaceholderObserver'>)
  <class 'torch.nn.modules.activation.Hardsigmoid'>: QConfig(activation=functools.partial(<class 'torch.ao.quantization.observer.FixedQParamsObserver'>, scale=0.00390625, zero_point=0, dtype=torch.quint8, quant_min=0, quant_max=255){}, weight=functools.partial(<class 'torch.ao.quantization.observer.MinMaxObserver'>, dtype=torch.qint8, qscheme=torch.per_tensor_symmetric){})
  <function hardsigmoid at 0x7f08ad57f670>: QConfig(activation=functools.partial(<class 'torch.ao.quantization.observer.FixedQParamsObserver'>, scale=0.00390625, zero_point=0, dtype=torch.quint8, quant_min=0, quant_max=255){}, weight=functools.partial(<class 'torch.ao.quantization.observer.MinMaxObserver'>, dtype=torch.qint8, qscheme=torch.per_tensor_symmetric){})
  hardsigmoid: QConfig(activation=functools.partial(<class 'torch.ao.quantization.observer.FixedQParamsObserver'>, scale=0.00390625, zero_point=0, dtype=torch.quint8, quant_min=0, quant_max=255){}, weight=functools.partial(<class 'torch.ao.quantization.observer.MinMaxObserver'>, dtype=torch.qint8, qscheme=torch.per_tensor_symmetric){})
  hardsigmoid_: QConfig(activation=functools.partial(<class 'torch.ao.quantization.observer.FixedQParamsObserver'>, scale=0.00390625, zero_point=0, dtype=torch.quint8, quant_min=0, quant_max=255){}, weight=functools.partial(<class 'torch.ao.quantization.observer.MinMaxObserver'>, dtype=torch.qint8, qscheme=torch.per_tensor_symmetric){})
  <class 'torch.nn.modules.activation.Sigmoid'>: QConfig(activation=functools.partial(<class 'torch.ao.quantization.observer.FixedQParamsObserver'>, scale=0.00390625, zero_point=0, dtype=torch.quint8, quant_min=0, quant_max=255){}, weight=functools.partial(<class 'torch.ao.quantization.observer.MinMaxObserver'>, dtype=torch.qint8, qscheme=torch.per_tensor_symmetric){})
  <built-in method sigmoid of type object at 0x7f08b99497e0>: QConfig(activation=functools.partial(<class 'torch.ao.quantization.observer.FixedQParamsObserver'>, scale=0.00390625, zero_point=0, dtype=torch.quint8, quant_min=0, quant_max=255){}, weight=functools.partial(<class 'torch.ao.quantization.observer.MinMaxObserver'>, dtype=torch.qint8, qscheme=torch.per_tensor_symmetric){})
  sigmoid: QConfig(activation=functools.partial(<class 'torch.ao.quantization.observer.FixedQParamsObserver'>, scale=0.00390625, zero_point=0, dtype=torch.quint8, quant_min=0, quant_max=255){}, weight=functools.partial(<class 'torch.ao.quantization.observer.MinMaxObserver'>, dtype=torch.qint8, qscheme=torch.per_tensor_symmetric){})
  sigmoid_: QConfig(activation=functools.partial(<class 'torch.ao.quantization.observer.FixedQParamsObserver'>, scale=0.00390625, zero_point=0, dtype=torch.quint8, quant_min=0, quant_max=255){}, weight=functools.partial(<class 'torch.ao.quantization.observer.MinMaxObserver'>, dtype=torch.qint8, qscheme=torch.per_tensor_symmetric){})
  <class 'torch.nn.modules.activation.Softmax'>: QConfig(activation=functools.partial(<class 'torch.ao.quantization.observer.FixedQParamsObserver'>, scale=0.00390625, zero_point=0, dtype=torch.quint8, quant_min=0, quant_max=255){}, weight=functools.partial(<class 'torch.ao.quantization.observer.MinMaxObserver'>, dtype=torch.qint8, qscheme=torch.per_tensor_symmetric){})
  <class 'torch.nn.modules.activation.Tanh'>: QConfig(activation=functools.partial(<class 'torch.ao.quantization.observer.FixedQParamsObserver'>, scale=0.0078125, zero_point=128, dtype=torch.quint8, quant_min=0, quant_max=255){}, weight=functools.partial(<class 'torch.ao.quantization.observer.MinMaxObserver'>, dtype=torch.qint8, qscheme=torch.per_tensor_symmetric){})
  <built-in method tanh of type object at 0x7f08b99497e0>: QConfig(activation=functools.partial(<class 'torch.ao.quantization.observer.FixedQParamsObserver'>, scale=0.0078125, zero_point=128, dtype=torch.quint8, quant_min=0, quant_max=255){}, weight=functools.partial(<class 'torch.ao.quantization.observer.MinMaxObserver'>, dtype=torch.qint8, qscheme=torch.per_tensor_symmetric){})
  tanh: QConfig(activation=functools.partial(<class 'torch.ao.quantization.observer.FixedQParamsObserver'>, scale=0.0078125, zero_point=128, dtype=torch.quint8, quant_min=0, quant_max=255){}, weight=functools.partial(<class 'torch.ao.quantization.observer.MinMaxObserver'>, dtype=torch.qint8, qscheme=torch.per_tensor_symmetric){})
  tanh_: QConfig(activation=functools.partial(<class 'torch.ao.quantization.observer.FixedQParamsObserver'>, scale=0.0078125, zero_point=128, dtype=torch.quint8, quant_min=0, quant_max=255){}, weight=functools.partial(<class 'torch.ao.quantization.observer.MinMaxObserver'>, dtype=torch.qint8, qscheme=torch.per_tensor_symmetric){})
 module_name_regex_qconfigs
  OrderedDict()
 module_name_qconfigs
  OrderedDict()
 module_name_object_type_order_qconfigs
  OrderedDict()
)
```

Test Plan:
python test/test_quantization.py TestFXNumericSuiteNShadows.test_qconfig_multi_mapping_repr
python test/test_quantization.py TestQuantizeFx.test_qconfig_mapping_repr

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89932
Approved by: https://github.com/vkuzo
2022-12-02 05:24:47 +00:00
XiaobingSuper
4bae860813 quantization: make x86 the default backend (#88799)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88799
Approved by: https://github.com/kit1980
2022-12-01 02:09:54 +00:00
andrewor14
61801799a0 [Quant][bc-breaking] Remove overwrite_output_observer (#88620)
Summary: When the BackendConfig was first introduced,
`overwrite_output_observer` and `overwrite_output_fake_quantize`
were added to ensure fixed qparams ops like `torch.nn.Sigmoid`
and `torch.nn.Tanh` used the correct observers and fake quantizes.
However, this is hacky because the BackendConfig should not set
the observer constructors themselves, but should instead specify
only requirements on the observers.

Later, https://github.com/pytorch/pytorch/pull/80184 added the
correct observers to `get_default_qconfig_mapping` along with
validation logic that throws an error if incorrect observers
were specified. With this change, we no longer need to overwrite
the observers from the BackendConfig, since we expect the user to
pass in the correct observers for these ops.

This commit removes these overwrite observer settings in the
BackendConfig. Instead, we represent the observer constraints for
fixed qparams ops through the existing DTypeWithConstraints
mechanism. Note that, however, to be consistent with other
DTypeWithConstraints checks, we no longer throw an error if an
incorrect observer is specified, but simply ignore the offending
QConfig and log a warning instead. This is the BC-breaking part
of the change.

BC-breaking notes:

```
from torch.ao.quantization.qconfig import default_qconfig
from torch.ao.quantization.quantize_fx import prepare_fx

model = ModelWithFixedQParamsOps()
qconfig_mapping = QConfigMapping().set_global(default_qconfig)
example_inputs = ...
prepare_fx(model, qconfig_mapping, example_inputs)
```

Before this commit, running the above leads to an exception
because the wrong observers are used for fixed qparams ops.
After this commit, the above will only encounter a warning,
and the fixed qparams ops will not be quantized. In both cases,
switching to `get_default_qconfig_mapping` will cause the
fixed qparams ops to be quantized.
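
Concretely, the recommended pattern after this change (reusing the placeholders from the snippet above) is:

```
from torch.ao.quantization import get_default_qconfig_mapping
from torch.ao.quantization.quantize_fx import prepare_fx

# The default mapping already carries the correct fixed qparams observers,
# so these ops are quantized instead of skipped with a warning.
qconfig_mapping = get_default_qconfig_mapping()
prepare_fx(model, qconfig_mapping, example_inputs)
```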

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Reviewers: jerryzh168, vkuzo

Subscribers: jerryzh168, vkuzo

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88620
Approved by: https://github.com/jerryzh168
2022-11-16 18:44:12 +00:00
HDCharles
072834d56d [ao] qconfig_mapping.py fixing public v private (#87518)
Summary: made _GLOBAL_DICT_KEY, _OBJECT_TYPE_DICT_KEY,
_MODULE_NAME_REGEX_DICT_KEY, _MODULE_NAME_DICT_KEY,
_MODULE_NAME_OBJECT_TYPE_ORDER_DICT_KEY private

Test Plan: python test/test_public_bindings.py

Differential Revision: [D40709278](https://our.internmc.facebook.com/intern/diff/D40709278)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/87518
Approved by: https://github.com/jcaip
2022-11-11 00:32:24 +00:00
Jerry Zhang
4caddac534 [quant][api] Add assert for backend in get_default_qconfig related apis (#86259) (#87331)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86259

Add an assertion to make sure the backend is one of "fbgemm", "x86", "qnnpack", and "onednn"
for `get_default_qconfig`, `get_default_qat_qconfig`, `get_default_qconfig_mapping`, and `get_default_qat_qconfig_mapping`.
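
Roughly, the new behavior (sketch):

```
from torch.ao.quantization import get_default_qconfig

get_default_qconfig("x86")            # ok
get_default_qconfig("not_a_backend")  # now fails the backend assertion
```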

Test Plan:
python test/test_quantization.py -k test_get_default_qconfig_mapping

Imported from OSS

Reviewed By: jcaip

Differential Revision: D40236474

Pull Request resolved: https://github.com/pytorch/pytorch/pull/87331
Approved by: https://github.com/andrewor14
2022-10-21 16:57:35 +00:00
andrewor14
0cae309069 [Quant] Add get_symmetric_qnnpack_qconfig_mapping (#87002)
Summary: Today, in order to get XNNPACK quantized ops to work,
the user must write some code that refers to private data
structures (`_FIXED_QPARAMS_OP_TO_OBSERVER`) to create a
QConfigMapping that is compatible with the symmetric constraints
in the QNNPACK BackendConfig. This is because
`get_default_qconfig("qnnpack")` produces a QConfig that does
not satisfy these constraints, and the default QConfigMapping
for QNNPACK uses this QConfig.

Instead, we simply put this code into a helper function to make
it easier for the user to run XNNPACK quantized ops. In the
future, once there is feature parity between the set of ops
supported by QNNPACK and XNNPACK, we should revisit whether
to simply change `get_default_qconfig("qnnpack")` to return
an XNNPACK-compatible QConfig.
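
A usage sketch, assuming the helper lives in `torch.ao.quantization.qconfig_mapping` and with a toy model standing in for a real one:

```
import torch
from torch.ao.quantization.qconfig_mapping import get_symmetric_qnnpack_qconfig_mapping
from torch.ao.quantization.quantize_fx import prepare_fx

model = torch.nn.Sequential(torch.nn.Linear(4, 4)).eval()
example_inputs = (torch.randn(1, 4),)
# QConfigMapping compatible with the symmetric QNNPACK/XNNPACK constraints
qconfig_mapping = get_symmetric_qnnpack_qconfig_mapping()
prepared = prepare_fx(model, qconfig_mapping, example_inputs)
```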

Test Plan:
python test/test_quantization.py TestQuantizeFx.test_symmetric_qnnpack_qconfig_mapping

Reviewers: jerryzh168, vkuzo

Subscribers: jerryzh168, vkuzo
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87002
Approved by: https://github.com/vkuzo
2022-10-20 02:33:15 +00:00
Jerry Zhang
8a47a49d5e [quant] Move the order of x86 engine to avoid changing the default qengine (#86631)
Since the default qengine is the last element of the supported_engines list, adding the x86 qengine at the end of the list changed the default quantized engine as well. This PR is a short-term fix to revert that change. We have an issue here to track the proper fix: https://github.com/pytorch/pytorch/issues/86404

Motivation:
A Meta-internal team found that inference failed in onednn prepacking with the error "could not create a primitive descriptor for a reorder primitive" on a COPPER_LAKE machine. We are working with Intel to reproduce and fix the problem; in the meantime, we'll revert the default option back to fbgemm.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86631
Approved by: https://github.com/vkuzo
2022-10-11 00:07:41 +00:00
Xia, Weiwen
4b86a9359a [Quant] Make x86 backend default when querying qconfig (#85461)
This PR is a follow-up of #84329 [[Quant] Add unified x86 quant backend](https://github.com/pytorch/pytorch/pull/84329)
It makes the `x86` backend the default when querying `qconfig`: users get x86's qconfig/qconfig_mapping if the backend is not specified, as shown below.
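
Sketch: omitting the backend argument now yields the x86 configs.

```
from torch.ao.quantization import get_default_qconfig, get_default_qconfig_mapping

qconfig = get_default_qconfig()                  # same as get_default_qconfig("x86")
qconfig_mapping = get_default_qconfig_mapping()  # same as get_default_qconfig_mapping("x86")
```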
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85461
Approved by: https://github.com/jgong5, https://github.com/vkuzo
2022-09-30 23:44:45 +00:00
Xia, Weiwen
3a3e2002d8 [Quant] Add unified x86 quant backend (#84329)
## Description

Implement the unified quantization backend 'X86' for x86 platforms. It combines the advantages of FBGEMM and ONEDNN: it selects kernels during weight prepacking and hides the details from end users. It will become the default backend in place of FBGEMM.

For details, please refer to this RFC: [[RFC] Unified quantization backend for x86 CPU platforms](https://github.com/pytorch/pytorch/issues/83888)

## Validation
**Correctness**
Covered by UT

**Accuracy**
By running torchvision models on imagenet, no accuracy difference is found between FBGEMM and the unified X86 backend:
[torchvision_accuracy_comparison_fbgemm_vs_x86.xlsx](https://github.com/pytorch/pytorch/files/9598114/torchvision_accuracy_comparison_fbgemm_vs_x86.xlsx)

**Performance**
Depends on https://github.com/pytorch/pytorch/pull/84470 which improves performance.
For early PoC results, please refer to https://github.com/pytorch/pytorch/files/9399202/unified_qengine_poc_performance_bechmark.xlsx

With the two PRs combined, we collected some data on an Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz.
Method: run multiple instances with 4 cores per instance on a whole socket, using JeMalloc and Intel OMP.
Models/throughput | fbgemm | x86 | improvement
-- | -- | -- | --
wide_resnet101_2 | 173.5675 | 241.815 | 39.32%
resnext101_32x8d | 174.365 | 339.8175 | 94.89%
resnet50 | 573.155 | 1174.14 | 104.86%
vgg19_bn | 260.335 | 337.92 | 29.80%
vgg19 | 257.935 | 333.265 | 29.21%
inception_v3 | 601.1175 | 1309.33 | 117.82%
densenet161 | 296.645 | 435.5625 | 46.83%
mnasnet1_0 | 1216.7 | 4057.515 | 233.49%
squeezenet1_0 | 1220.085 | 5153.3875 | 322.38%
alexnet | 2294.91 | 2624.6375 | 14.37%
fbnetc_100 | 976.2825 | 3110.1825 | 218.57%
shufflenet_v2_x0_5 | 1555.76 | 3026.125 | 94.51%
spnasnet_100 | 1059.065 | 3502.0975 | 230.68%
pytorch-unet | 192.76 | 246.77 | 28.02%
acgan | 257.32 | 333.7325 | 29.70%
cgan | 7790.6925 | 7803.1025 | 0.16%
sgan | 257.565 | 338.8875 | 31.57%
se_resnet50 | 492.3725 | 916.5175 | 86.14%
vggm | 300.2875 | 316.2075 | 5.30%

Environment:
- PyTorch version: 1.13.0a0+gitcdd625b
- Is debug build: False
- CUDA used to build PyTorch: None
- ROCM used to build PyTorch: N/A
- OS: Ubuntu 20.04.3 LTS (x86_64)
- GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
- Clang version: Could not collect
- CMake version: version 3.22.5
- Libc version: glibc-2.31
- Python version: 3.9.12 (main, Jun  1 2022, 11:38:51)  [GCC 7.5.0] (64-bit runtime)
- Python platform: Linux-5.11.0-27-generic-x86_64-with-glibc2.31
- Is CUDA available: False
- CUDA runtime version: No CUDA
- GPU models and configuration: No CUDA
- Nvidia driver version: No CUDA
- cuDNN version: No CUDA
- HIP runtime version: N/A
- MIOpen runtime version: N/A
- Is XNNPACK available: True

Versions of relevant libraries:
- [pip3] intel-extension-for-pytorch==1.13.0+cpu
- [pip3] numpy==1.23.3
- [pip3] pytorch-widedeep==0.3.7
- [pip3] torch==1.13.0a0+git48b423b
- [pip3] torchvision==0.14.0a0+ebb68f3
- [conda] blas                      1.0                         mkl
- [conda] intel-extension-for-pytorch 1.13.0+cpu               pypi_0    pypi
- [conda] mkl                       2021.4.0           h06a4308_640
- [conda] mkl-include               2022.1.0                 pypi_0    pypi
- [conda] mkl-service               2.4.0            py39h7f8727e_0
- [conda] mkl-static                2022.1.0                 pypi_0    pypi
- [conda] mkl_fft                   1.3.1            py39hd3c417c_0
- [conda] mkl_random                1.2.2            py39h51133e4_0
- [conda] numpy                     1.23.3                   pypi_0    pypi
- [conda] numpy-base                1.22.3           py39hf524024_0
- [conda] torch                     1.13.0a0+git48b423b          pypi_0    pypi
- [conda] torchvision               0.14.0a0+ebb68f3          pypi_0    pypi

Pull Request resolved: https://github.com/pytorch/pytorch/pull/84329
Approved by: https://github.com/jerryzh168
2022-09-29 00:44:40 +00:00
Jerry Zhang
4523ac7aa1 [quant][docs][ez] Fix formatting for qconfig_mapping (#85306)
Summary:
As titled.

Test Plan:
visual inspection of generated docs

Pull Request resolved: https://github.com/pytorch/pytorch/pull/85306
Approved by: https://github.com/vkuzo, https://github.com/andrewor14
2022-09-22 02:09:36 +00:00
Jesse Cai
d6b2f5c643 [Quant][fx] Remove remove_quant_dequant_pairs and fix tests (#84203)
Summary:
- `remove_quant_dequant_pairs` removes ops when a `quant` is followed by a `dequant`
- It looks like the quantized implementation of `layer_norm` only supports float weights, so we updated the default qconfig to avoid quantizing the weight param.
- Fixes the broken test `test_norm_weight_bias`. This was the only test that broke, because the default qconfig dict we pass in quantizes the weight. I just pulled the native qconfig object and converted it to a dict.
- Adds qconfig and backend config support for layer_norm (see the sketch after this list).
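
The sketch mentioned above: a layer_norm qconfig that observes activations but leaves the weight as float, mirroring the default mapping shown in the `__repr__` example earlier in this log (observer choices follow that output):

```
import torch
from torch.ao.quantization import QConfig, QConfigMapping
from torch.ao.quantization.observer import HistogramObserver, PlaceholderObserver

layer_norm_qconfig = QConfig(
    activation=HistogramObserver.with_args(reduce_range=True),
    weight=PlaceholderObserver.with_args(dtype=torch.float),  # weight stays float
)
qconfig_mapping = (
    QConfigMapping()
    .set_object_type(torch.nn.LayerNorm, layer_norm_qconfig)
    .set_object_type(torch.nn.functional.layer_norm, layer_norm_qconfig)
)
```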

Test Plan:
```
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
python test/test_quantization.py TestQuantizeFxModels
```

Tasks: Fixes https://github.com/pytorch/pytorch/issues/83110

Tags: quant, fx

Differential Revision: [D39395141](https://our.internmc.facebook.com/intern/diff/D39395141)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84203
Approved by: https://github.com/jerryzh168
2022-09-12 16:32:15 +00:00
Jerry Zhang
214a6500e3 [quant][docs] Additional fixes for quantize_fx docs (#84587)
Summary:
Some more clarifications for the arguments, including linking to the object docs (QConfigMapping, BackendConfig) and adding types
to the doc.

Test Plan:
```
cd docs
make html
```
and visual inspection of the generated docs

Pull Request resolved: https://github.com/pytorch/pytorch/pull/84587
Approved by: https://github.com/vkuzo
2022-09-09 15:23:23 +00:00
Jerry Zhang
446edadd95 [quant][fx] Follow up fixes for qconfig validations for fixedqparams ops (#81010)
Summary:
This adds a few things on top of https://github.com/pytorch/pytorch/pull/80184:
1. `node.target` was assumed to be "tanh", `torch.nn.Tanh`, etc.; this PR handles that properly.
2. Adds `FixedQParamsFakeQuantize` support.
3. Extends the comparison function `_partial_wrapper_equals` to work with `FakeQuantize.with_args(observer=...)`.

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Differential Revision: [D37735193](https://our.internmc.facebook.com/intern/diff/D37735193)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81010
Approved by: https://github.com/andrewor14
2022-07-14 18:06:23 +00:00
Andrew Or
c44317704a [Quant][fx] Add default configs for fixed qparams ops (#80184)
Summary: This commit adds qconfigs with special observers for fixed
qparams ops in get_default_qconfig_mapping and
get_default_qat_qconfig_mapping. For correctness, we also require
users to use these special observers if we detect these fixed
qparams ops in prepare.
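
One way to see the effect (a sketch; the expected observer settings are those shown in the `__repr__` example earlier in this log):

```
import torch
from torch.ao.quantization import get_default_qconfig_mapping

mapping = get_default_qconfig_mapping("fbgemm")
sigmoid_qconfig = mapping.object_type_qconfigs[torch.nn.Sigmoid]
# Expect a FixedQParamsObserver partial (scale=1/256, zero_point=0) rather
# than the generic activation observer.
print(sigmoid_qconfig.activation)
```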

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Reviewers: jerryzh168, vkuzo

Subscribers: jerryzh168, vkuzo

Differential Revision: [D37396379](https://our.internmc.facebook.com/intern/diff/D37396379)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80184
Approved by: https://github.com/jerryzh168
2022-06-29 23:07:26 +00:00
Andrew Or
61a1eef7fc [Quant][fx] Add get_default_qconfig_mapping
Summary: This follows https://github.com/pytorch/pytorch/pull/78452,
which replaced the qconfig_dict with QConfigMapping. This PR
additionally replaces get_default_*qconfig_dict with
get_default_*qconfig_mapping. For backward compatibility, we
deprecate the old functions instead of removing them.
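
Usage sketch with a toy model (the old `get_default_qconfig_dict` still works but emits a deprecation warning):

```
import torch
from torch.ao.quantization import get_default_qconfig_mapping
from torch.ao.quantization.quantize_fx import prepare_fx

model = torch.nn.Sequential(torch.nn.Linear(4, 4)).eval()
example_inputs = (torch.randn(1, 4),)
qconfig_mapping = get_default_qconfig_mapping("fbgemm")
prepared = prepare_fx(model, qconfig_mapping, example_inputs)
```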

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Reviewers: jerryzh168, vkuzo

Subscribers: jerryzh168, vkuzo, supriyar

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79618

Approved by: https://github.com/jerryzh168
2022-06-16 16:10:14 +00:00
Andrew Or
c7b4eec233 [Quant][fx][bc-breaking] Replace qconfig_dict with a config object (#78452)
**Summary:** Previously, FX graph mode quantization configurations
were specified through a dictionary of qconfigs. However, this
API was not in line with other core APIs in PyTorch. This commit
replaces this dictionary with a config object that users will
create and pass to prepare and convert. This leads to better
type safety and better user experience in notebook settings
due to improved auto completion.

The new API is as follows:

```
from torch.ao.quantization import QConfigMapping
from torch.ao.quantization.quantize_fx import prepare_fx

qconfig_mapping = (
    QConfigMapping()
    .set_global(qconfig)
    .set_object_type(torch.nn.Linear, qconfig)
    .set_module_name_regex("foo.*bar", qconfig)
    .set_module_name("mod", qconfig)
)

prepare_fx(model, qconfig_mapping)
```

For backwards compatibility, `prepare_fx`, `prepare_qat_fx`,
and `convert_fx` will continue to accept qconfig_dicts, which
will be converted to QConfigMappings internally.

Note that this commit does not modify existing tests to use the
new API; they will continue to pass in qconfig_dict as before,
which still works but triggers a deprecation warning. This will
be handled in a future commit.
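
For reference, the still-accepted dict style looks roughly like this (the keys mirror the QConfigMapping setters):

```
# Deprecated but still accepted; triggers a deprecation warning
qconfig_dict = {
    "": qconfig,  # global
    "object_type": [(torch.nn.Linear, qconfig)],
    "module_name_regex": [("foo.*bar", qconfig)],
    "module_name": [("mod", qconfig)],
}
prepare_fx(model, qconfig_dict)
```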

**Test Plan:**
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

**Reviewers:** jerryzh168, vkuzo

**Subscribers:** jerryzh168, vkuzo

Differential Revision: D36747998

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78452
Approved by: https://github.com/jerryzh168
2022-05-30 18:30:07 +00:00