Summary: The existing BackendConfig fusion pattern
uses a "reversed nested tuple" format that is highly
unintuitive. For example,
```
linear-relu -> (nn.ReLU, nn.Linear)
conv-bn-relu -> (nn.ReLU, (nn.BatchNorm2d, nn.Conv2d))
```
This pattern format also complicates the signatures
of the user-specified "fuser methods", which needed
to accept arguments in reverse nested order to match
the patterns:
```
def fuse_linear_relu(is_qat, relu, linear):
    ...

def fuse_conv_bn_relu(is_qat, relu, bn_conv):
    (bn, conv) = bn_conv
    ...
```
Instead, this commit introduces a new pattern format that
simply specifies the ops in forward order with no nesting:
```
linear-relu -> (nn.Linear, nn.ReLU)
conv-bn-relu -> (nn.Conv2d, nn.BatchNorm2d, nn.ReLU)
def fuse_linear_relu(is_qat, linear, relu):
    ...

def fuse_conv_bn_relu(is_qat, conv, bn, relu):
    ...
```
Note that the legacy "reversed nested tuple" format is still
used internally, since it is more general. In the
future, we should replace it with the format used in
the subgraph rewriter in `torch.fx`, and simplify the
existing pattern matching code to handle the new
format added in this commit.
BC-breaking Notes:
Before:
```
import torch.nn as nn
import torch.ao.nn.intrinsic as nni
from torch.ao.quantization.backend_config import BackendPatternConfig

def fuse_conv_bn_relu(is_qat, relu, bn_conv):
    (bn, conv) = bn_conv
    return nni.ConvBnReLU2d(conv, bn, relu)

config = BackendPatternConfig((nn.ReLU, (nn.BatchNorm2d, nn.Conv2d))) \
    .set_dtype_configs(...) \
    .set_fuser_method(fuse_conv_bn_relu) \
    .set_fused_module(nni.ConvBnReLU2d)
```
After:
```
def fuse_conv_bn_relu(is_qat, conv, bn, relu):
    return nni.ConvBnReLU2d(conv, bn, relu)

config = BackendPatternConfig((nn.Conv2d, nn.BatchNorm2d, nn.ReLU)) \
    .set_dtype_configs(...) \
    .set_fuser_method(fuse_conv_bn_relu) \
    .set_fused_module(nni.ConvBnReLU2d)
```
OR (for backward-compatibility)
```
def fuse_conv_bn_relu(is_qat, relu, bn_conv):
    (bn, conv) = bn_conv
    return nni.ConvBnReLU2d(conv, bn, relu)

config = BackendPatternConfig() \
    ._set_pattern_complex_format((nn.ReLU, (nn.BatchNorm2d, nn.Conv2d))) \
    .set_dtype_configs(...) \
    .set_fuser_method(fuse_conv_bn_relu) \
    .set_fused_module(nni.ConvBnReLU2d) \
    ._set_use_legacy_pattern_format(True)
```
Before:
```
backend_config.configs # returns Dict[Pattern, BackendPatternConfig]
```
After:
```
backend_config.configs # returns List[BackendPatternConfig]
```
Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
python test/test_quantization.py TestBackendConfig
Reviewers: jerryzh168, vkuzo
Subscribers: jerryzh168, vkuzo
Differential Revision: [D41954553](https://our.internmc.facebook.com/intern/diff/D41954553)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90698
Approved by: https://github.com/vkuzo, https://github.com/jerryzh168
Summary: Previously we explicitly set a qconfig for ops
like conv and linear in the default QConfigMapping. However,
this made it difficult for users to override the global QConfig and
have the new global take effect for basic ops. This commit
removes these explicit settings so the user can simply run
the following to quantize these ops.
```
qconfig_mapping = get_default_qconfig_mapping()
qconfig_mapping.set_global(my_qconfig)
```
There is no change in behavior for the default use case
of not setting anything on the default QConfigMapping.
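For concreteness, a minimal end-to-end sketch of the override workflow
(the model and backend choice here are illustrative):
```
import torch
from torch.ao.quantization import get_default_qconfig, get_default_qconfig_mapping
from torch.ao.quantization.quantize_fx import prepare_fx

model = torch.nn.Sequential(torch.nn.Linear(4, 4)).eval()
qconfig_mapping = get_default_qconfig_mapping()
# The override now applies to conv/linear too, since they no longer carry
# explicit per-op settings in the default mapping
qconfig_mapping.set_global(get_default_qconfig("fbgemm"))
example_inputs = (torch.randn(1, 4),)
prepared = prepare_fx(model, qconfig_mapping, example_inputs)
```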
Test Plan:
python test/test_quantization.py TestQuantizeFx.test_default_qconfig_mapping_override_global
Reviewers: vkuzo, jerryzh168
Subscribers: vkuzo, jerryzh168
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90066
Approved by: https://github.com/vkuzo, https://github.com/jerryzh168
Summary:
This PR adds support for matching patterns whose ops have multiple arguments; this is needed for quantization in the PyTorch 2.0 early prototype.
Before this PR, we only support patterns like:
```
x -> conv -> bn -> relu
(relu, (bn, conv))
```
where each operator takes a single input; the code broke when we wanted to match a pattern containing an op with multiple arguments, such as:
```
        shape \
transpose -> reshape -> output
```
where `reshape` has two arguments.
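For illustration, a small traced function that produces such a graph
(a sketch; the function and its names are mine, not from the PR):
```
import torch
import torch.fx

def f(x):
    y = x.transpose(1, 2)
    # reshape consumes both the transpose result and a shape derived from it,
    # so the matched reshape node has multiple arguments
    return y.reshape(y.shape[0], -1)

gm = torch.fx.symbolic_trace(f)
print(gm.graph)
```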
Test Plan:
python test/test_quantization.py TestQuantizeFx.test_match_pattern_with_multiple_args
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89986
Approved by: https://github.com/vkuzo
Summary: This commit renames fx/quantization_patterns.py
to fx/quantize_handler.py, and fx/fusion_patterns.py to
fx/fuse_handler.py. This is because these files contain
only QuantizeHandler and FuseHandler respectively, so the
new names are more descriptive. A future commit will
further break BC by removing all the empty *QuantizeHandler
classes.
BC-breaking notes:
The following classes under the
`torch.ao.quantization.fx.quantization_patterns` namespace
are migrated to the `torch.ao.quantization.fx.quantize_handler`
namespace:
```
QuantizeHandler
BinaryOpQuantizeHandler
CatQuantizeHandler
ConvReluQuantizeHandler
LinearReLUQuantizeHandler
BatchNormQuantizeHandler
EmbeddingQuantizeHandler
RNNDynamicQuantizeHandler
DefaultNodeQuantizeHandler
FixedQParamsOpQuantizeHandler
CopyNodeQuantizeHandler
GeneralTensorShapeOpQuantizeHandler
CustomModuleQuantizeHandler
StandaloneModuleQuantizeHandler
```
The following classes under the
`torch.ao.quantization.fx.fusion_patterns` namespace are
migrated to the `torch.ao.quantization.fx.fuse_handler`
namespace:
```
DefaultFuseHandler
FuseHandler
```
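For downstream code that imported these private classes, the migration is
a one-line import change, e.g. (sketch):
```
# Before:
# from torch.ao.quantization.fx.quantization_patterns import QuantizeHandler
# from torch.ao.quantization.fx.fusion_patterns import FuseHandler
# After:
from torch.ao.quantization.fx.quantize_handler import QuantizeHandler
from torch.ao.quantization.fx.fuse_handler import FuseHandler
```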
Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
Reviewers: jerryzh168, vkuzo
Subscribers: jerryzh168, vkuzo
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89872
Approved by: https://github.com/jerryzh168
Preparation for the next PR in this stack: #89559.
I replaced
- `self.assertTrue(torch.equal(...))` with `self.assertEqual(..., rtol=0, atol=0, exact_device=True)`,
- the same for `self.assertFalse(...)` with `self.assertNotEqual(...)`, and
- `assert torch.equal(...)` with `torch.testing.assert_close(..., rtol=0, atol=0)` (note that we don't need to set `check_device=True` here since that is the default).
There were a few instances where the result of `torch.equal` is used directly. In those cases, I've replaced it with `(... == ...).all().item()`, sometimes also dropping the `.item()` depending on the context.
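Putting the replacements together, a small self-contained sketch (using the
internal `TestCase`, whose `assertEqual` accepts `rtol`/`atol`/`exact_device`):
```
import torch
from torch.testing._internal.common_utils import TestCase

class ExampleTest(TestCase):
    def test_exact_equality(self):
        t1, t2 = torch.zeros(3), torch.zeros(3)
        # was: self.assertTrue(torch.equal(t1, t2))
        self.assertEqual(t1, t2, rtol=0, atol=0, exact_device=True)
        # was: assert torch.equal(t1, t2)
        torch.testing.assert_close(t1, t2, rtol=0, atol=0)
        # was: if torch.equal(t1, t2): ...
        assert (t1 == t2).all().item()
```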
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89527
Approved by: https://github.com/mruberry
Summary:
As titled: after this PR we can produce `quantize_per_channel` and `dequantize_per_channel` ops (typically used for quantizing weights)
in the reference flow using the decomposed tensor representation.
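A rough sketch of the resulting decomposed pattern for a per-channel
quantized weight (the `quantized_decomposed` op signatures here are my
assumption of the registered schema; values are illustrative):
```
import torch
import torch.ao.quantization.fx._decomposed  # noqa: F401, registers quantized_decomposed ops (assumption)

w = torch.randn(8, 4)
scales = torch.full((8,), 0.1)
zero_points = torch.zeros(8, dtype=torch.int64)
# quantize_per_channel(input, scales, zero_points, axis, quant_min, quant_max, dtype)
w_int = torch.ops.quantized_decomposed.quantize_per_channel(
    w, scales, zero_points, 0, -128, 127, torch.int8)
w_dq = torch.ops.quantized_decomposed.dequantize_per_channel(
    w_int, scales, zero_points, 0, -128, 127, torch.int8)
```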
Test Plan:
python test/test_quantization.py -k test__convert_to_reference_decomposed_fx_per_channel_quant
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89270
Approved by: https://github.com/vkuzo
Summary:
Split the `is_decomposed` logic for `_replace_observer_with_quantize_dequantize_node` into a separate function and added support for dynamic quantization in the decomposed version of this function.
In the case of dynamic quantization, we'll produce the following reference quantized pattern in decomposed mode:
```
x -> choose_qparams -> quantize_per_tensor -> dequantize_per_tensor -> linear
```
Test Plan:
python test/test_quantization.py -k test__convert_to_reference_decomposed_fx_dynamic_quant
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89248
Approved by: https://github.com/vkuzo
Summary: In both eager and FX graph mode quantization,
`torch.ao.nn.quantizable.LSTM` is used as an observed custom module,
which is responsible for inserting its own observers. By default,
the user specifies a single QConfig for the custom module (either
through QConfigMapping or by setting the "qconfig" attribute),
and all inner ops will [inherit this
QConfig](dc00bb51b8/torch/ao/nn/quantizable/modules/rnn.py (L366-L378))
and use the same observer/fake_quantize constructors.
Today, users who wish to override this behavior must extend
`torch.ao.nn.quantizable.LSTM` and write a lot of custom code
to manually assign the QConfigs to the inner ops. This commit
alleviates this burden on the user by providing a helper function
to assign QConfigs with custom observers. An example use case of
this is providing a reference implementation for a backend kernel
that hardcodes qparams for efficiency.
Example usage:
```
import torch
from torch.ao.quantization import get_default_qconfig_mapping
from torch.ao.quantization.observer import FixedQParamsObserver
from torch.ao.quantization.quantize_fx import prepare_fx, convert_fx
from torch.ao.quantization.fx.custom_config import (
    PrepareCustomConfig,
    ConvertCustomConfig,
)

class MyModel(torch.nn.Module):
    ...

class UserLSTM(torch.ao.nn.quantizable.LSTM):
    @classmethod
    def from_float(cls, other):
        assert isinstance(other, cls._FLOAT_MODULE)
        linear_output_obs_ctr = FixedQParamsObserver.with_args(
            scale=2 ** -11, zero_point=2 ** 15, dtype=torch.qint32)
        sigmoid_obs_ctr = FixedQParamsObserver.with_args(
            scale=2 ** -16, zero_point=0, dtype=torch.qint32)
        tanh_obs_ctr = FixedQParamsObserver.with_args(
            scale=2 ** -15, zero_point=2 ** 15, dtype=torch.qint32)
        cell_state_obs_ctr = FixedQParamsObserver.with_args(
            scale=2 ** -11, zero_point=0, dtype=torch.qint32)
        hidden_state_obs_ctr = FixedQParamsObserver.with_args(
            scale=2 ** -7, zero_point=2 ** 7, dtype=torch.quint8)
        return torch.ao.quantization.utils._get_lstm_with_individually_observed_parts(
            float_lstm=other,
            linear_output_obs_ctr=linear_output_obs_ctr,
            sigmoid_obs_ctr=sigmoid_obs_ctr,
            tanh_obs_ctr=tanh_obs_ctr,
            cell_state_obs_ctr=cell_state_obs_ctr,
            hidden_state_obs_ctr=hidden_state_obs_ctr,
        )

qconfig_mapping = get_default_qconfig_mapping()
example_inputs = (torch.rand(5, 3, 50), torch.rand(1, 3, 50), torch.randn(1, 3, 50))
prepare_custom_config = PrepareCustomConfig() \
    .set_float_to_observed_mapping(torch.nn.LSTM, UserLSTM)
convert_custom_config = ConvertCustomConfig() \
    .set_observed_to_quantized_mapping(UserLSTM, torch.ao.nn.quantized.LSTM)
model = MyModel()
model = prepare_fx(model, qconfig_mapping, example_inputs, prepare_custom_config=prepare_custom_config)
model(*example_inputs)  # calibrate
model = convert_fx(model, convert_custom_config=convert_custom_config)
model(*example_inputs)
```
Test Plan:
python test/test_quantization.py TestQuantizeFx.test_static_lstm_with_custom_fixed_qparams
Reviewers: jerryzh168, vkuzo
Subscribers: jerryzh168, vkuzo
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88456
Approved by: https://github.com/jerryzh168, https://github.com/vkuzo
Summary: The same function existed in both observer and quantize;
this consolidates it into a single function. Note the two definitions
were slightly different; I've changed the consolidated definition to be
maximally inclusive so that the name of the function is more accurate.
Test Plan: python test/test_public_bindings.py
python test/test_quantization.py
Reviewers:
Subscribers:
Tasks:
Tags:
Differential Revision: [D40709276](https://our.internmc.facebook.com/intern/diff/D40709276)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87520
Approved by: https://github.com/jcaip
Summary: When the BackendConfig was first introduced,
`overwrite_output_observer` and `overwrite_output_fake_quantize`
were added to ensure fixed qparams ops like `torch.nn.Sigmoid`
and `torch.nn.Tanh` used the correct observers and fake quantizes.
However, this is hacky because the BackendConfig should not set
the observer constructors themselves, but should instead specify
only requirements on the observers.
Later, https://github.com/pytorch/pytorch/pull/80184 added the
correct observers to `get_default_qconfig_mapping` along with
validation logic that throws an error if incorrect observers
were specified. With this change, we no longer need to overwrite
the observers from the BackendConfig, since we expect the user to
pass in the correct observers for these ops.
This commit removes these overwrite observer settings in the
BackendConfig. Instead, we represent the observer constraints for
fixed qparams ops through the existing DTypeWithConstraints
mechanism. Note that, however, to be consistent with other
DTypeWithConstraints checks, we no longer throw an error if an
incorrect observer is specified, but simply ignore the offending
QConfig and log a warning instead. This is the BC-breaking part
of the change.
BC-breaking notes:
```
from torch.ao.quantization import QConfigMapping
from torch.ao.quantization.qconfig import default_qconfig
from torch.ao.quantization.quantize_fx import prepare_fx
model = ModelWithFixedQParamsOps()
qconfig_mapping = QConfigMapping().set_global(default_qconfig)
example_inputs = ...
prepare_fx(model, qconfig_mapping, example_inputs)
```
Before this commit, running the above leads to an exception
because the wrong observers are used for fixed qparams ops.
After this commit, the above will only encounter a warning,
and the fixed qparams ops will not be quantized. In both cases,
switching to `get_default_qconfig_mapping` will cause the
fixed qparams ops to be quantized.
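A minimal sketch of the recommended path after this change (the model
here is an illustrative stand-in for `ModelWithFixedQParamsOps`):
```
import torch
from torch.ao.quantization import get_default_qconfig_mapping
from torch.ao.quantization.quantize_fx import prepare_fx

model = torch.nn.Sequential(torch.nn.Linear(4, 4), torch.nn.Sigmoid()).eval()
example_inputs = (torch.randn(1, 4),)
# The default mapping carries the fixed qparams observers for ops like
# sigmoid/tanh, so those ops are quantized instead of skipped with a warning
prepared = prepare_fx(model, get_default_qconfig_mapping(), example_inputs)
```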
Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
Reviewers: jerryzh168, vkuzo
Subscribers: jerryzh168, vkuzo
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88620
Approved by: https://github.com/jerryzh168
## Description
Support lowering of channel shuffle in FX by adding its module and functional op to `is_copy_node` list in `torch/ao/quantization/fx/_lower_to_native_backend.py`
## Validation
UTs added to test
- correctness of quantized `ChannelShuffle` module.
- FX lowering of `ChannelShuffle` module and functional `channel_shuffle`.
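For reference, a quick eager-mode sketch of the copy-node property
(quantized tensors pass through `ChannelShuffle` with scale/zero_point
unchanged; values are illustrative):
```
import torch

m = torch.nn.ChannelShuffle(2)
xq = torch.quantize_per_tensor(torch.randn(1, 4, 2, 2), scale=0.1, zero_point=0, dtype=torch.quint8)
yq = m(xq)
assert yq.q_scale() == xq.q_scale() and yq.q_zero_point() == xq.q_zero_point()
```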
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83731
Approved by: https://github.com/jerryzh168
Summary:
Previously we hardcoded the supported observers for fixed qparams ops. This PR changes that to take the information from the BackendConfig,
which allows users to customize the support for fixed qparams ops.
Test Plan:
python test/test_quantization.py TestQuantizeFx.test_change_backend_config_for_fixed_qparam_ops
Reviewers:
Subscribers:
Tasks:
Tags:
Unlinked from the internal diff since it's too hard to land.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87425
Approved by: https://github.com/andrewor14
Summary:
`_convert_to_reference_decomposed` is a private convert function in the FX graph mode quantization flow that converts
a calibrated/trained model to a reference quantized model with decomposed quantized tensor representations.
Test Plan:
python test/test_quantization.py TestQuantizeFx.test__convert_to_reference_decomposed_fx
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87094
Approved by: https://github.com/andrewor14
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86259
Add an assertion to make sure the backend is one of "fbgemm", "x86", "qnnpack", and "onednn"
for `get_default_qconfig`, `get_default_qat_qconfig`, `get_default_qconfig_mapping`, and `get_default_qat_qconfig_mapping`.
Test Plan:
python test/test_quantization.py -k test_get_default_qconfig_mapping
Imported from OSS
Reviewed By: jcaip
Differential Revision: D40236474
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87331
Approved by: https://github.com/andrewor14
Summary: Today, in order to get XNNPACK quantized ops to work,
the user must write some code that refers to private data
structures (`_FIXED_QPARAMS_OP_TO_OBSERVER`) to create a
QConfigMapping that is compatible with the symmetric constraints
in the QNNPACK BackendConfig. This is because
`get_default_qconfig("qnnpack")` produces a QConfig that does
not satisfy these constraints, and the default QConfigMapping
for QNNPACK uses this QConfig.
Instead, we simply put this code into a helper function to make
it easier for the user to run XNNPACK quantized ops. In the
future, once there is feature parity between the set of ops
supported by QNNPACK and XNNPACK, we should revisit whether
to simply change `get_default_qconfig("qnnpack")` to return
an XNNPACK-compatible QConfig.
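Usage of the new helper looks roughly like this (the helper is private,
and its name here is per my recollection, so treat it as an assumption):
```
from torch.ao.quantization.qconfig_mapping import _get_symmetric_qnnpack_qconfig_mapping

# replaces manual construction from _FIXED_QPARAMS_OP_TO_OBSERVER
qconfig_mapping = _get_symmetric_qnnpack_qconfig_mapping()
```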
Test Plan:
python test/test_quantization.py
TestQuantizeFx.test_symmetric_qnnpack_qconfig_mapping
Reviewers: jerryzh168, vkuzo
Subscribers: jerryzh168, vkuzo
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87002
Approved by: https://github.com/vkuzo
Summary:
This PR adds checks for the existence of "weight_dtype" and "bias_dtype" in the node_name_to_dtype dictionary before accessing it.
The corner case is hit when we check the compatibility of qconfig and backend_config for a weight and bias that appear before the activation in the argument list (e.g. `torch.addmm`).
Test Plan:
python test/test_quantization.py -k test_backend_config_check_for_weight_and_bias
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86719
Approved by: https://github.com/andrewor14
Summary:
Previously the call failed because there was an infinite loop in `_get_share_qparams_ops_configs`.
Test Plan:
python test/test_quantization.py -k test_get_executorch_backend_config
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86338
Approved by: https://github.com/andrewor14
Summary: include `torch.qint32` in `activation_is_statically_quantized` and `get_quant_type` so that fake quantize with `dtype=torch.qint32` won't be skipped.
Test Plan: updated `test_custom_module_class`
Differential Revision: D40128178
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86345
Approved by: https://github.com/jerryzh168
**Summary:** This commit enforces the following constraints on the
QNNPACK BackendConfig:
- `quant_min_lower_bound` = -127 for qint8 weight
- `quant_max_upper_bound` = 127 for qint8 weight
- `scale_min_lower_bound` = 2 ** -12 for qint8 activations and weight
These constraints will enable users to use this BackendConfig with
faster XNNPACK quantized ops. They are also consistent with the
existing settings in `default_symmetric_qnnpack_qconfig` and its
per_channel and QAT variants. For more detail on why these exact
values were chosen, please see the description of
https://github.com/pytorch/pytorch/pull/74396.
Note that there are currently no restrictions on the qscheme in
DTypeConfig. This should be added in the future to further enforce
the restriction that the weights must be quantized with either
per_tensor_symmetric or per_channel_symmetric.
Existing default QConfigs such as `get_default_qconfig("qnnpack")`
and `get_default_qat_qconfig("qnnpack")` will continue to be
supported, but only for the existing dtypes, e.g. quint8 activations
for weighted ops like linear and conv. In the future, we should
revisit whether to enable XNNPACK ops using these QConfigs as well.
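As a point of reference, a QConfigMapping built from the existing
symmetric QConfig already satisfies these bounds (a sketch):
```
from torch.ao.quantization import QConfigMapping
from torch.ao.quantization.qconfig import default_symmetric_qnnpack_qconfig

# qint8 weights with quant range [-127, 127] and eps >= 2 ** -12
qconfig_mapping = QConfigMapping().set_global(default_symmetric_qnnpack_qconfig)
```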
**Test Plan:**
python test/test_quantization.py TestQuantizeFx.test_qnnpack_backend_config
**Reviewers:** jerryzh168, vkuzo
**Subscribers:** jerryzh168, vkuzo
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85863
Approved by: https://github.com/jerryzh168
**Summary:** This commit adds the following constraints to
BackendConfig:
```
quant_min_lower_bound
quant_max_upper_bound
scale_min_lower_bound
scale_max_upper_bound
```
This is motivated by QNNPACK constraints on qint8 weight
values and the min scale value. Actually enforcing these
constraints in the QNNPACK BackendConfig will follow in a
future commit.
Today, users can also specify the above constraints through
QConfigs, and these settings may not necessarily match the
ones specified in the BackendConfig. In this case, we will
handle the discrepancy as follows:
(1) Require QConfig quant ranges to fall within the backend's
(2) Require QConfig min scale value (eps) >= backend's
(3) Require QConfig to specify quant range if the backend
specified one
(4) Require QConfig to specify min scale value (eps) if the
backend specified one
Public API changes:
* Previous API, still supported after this commit:
```
dtype_config = DTypeConfig(
    input_dtype=torch.quint8,
    output_dtype=torch.quint8,
    weight_dtype=torch.qint8,
    bias_dtype=torch.float,
)
```
* New API:
```
dtype_config = DTypeConfig(
    input_dtype=DTypeWithConstraints(
        dtype=torch.quint8,
        quant_min_lower_bound=0,
        quant_max_upper_bound=127,
        scale_min_lower_bound=2 ** -12,
    ),
    output_dtype=DTypeWithConstraints(
        dtype=torch.quint8,
        quant_min_lower_bound=0,
        quant_max_upper_bound=127,
        scale_min_lower_bound=2 ** -12,
    ),
    weight_dtype=DTypeWithConstraints(
        dtype=torch.qint8,
        quant_min_lower_bound=-128,
        quant_max_upper_bound=127,
        scale_min_lower_bound=2 ** -12,
    ),
    bias_dtype=torch.float,
)
```
* Additionally, the following `DTypeConfig` attributes
have new types with helper getters:
```
# These have type DTypeWithConstraints
dtype_config.input_dtype
dtype_config.output_dtype
dtype_config.weight_dtype
# These return Optional[torch.dtype]
dtype_config.get_input_dtype()
dtype_config.get_output_dtype()
dtype_config.get_weight_dtype()
```
Note that scale_max is currently not used because there is
no existing mechanism to enforce this on the observer. In the
future, we can validate this as well if there is a use case.
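A short sketch exercising the new helpers (values are illustrative):
```
import torch
from torch.ao.quantization.backend_config import DTypeConfig, DTypeWithConstraints

dtype_config = DTypeConfig(
    input_dtype=DTypeWithConstraints(
        dtype=torch.quint8,
        scale_min_lower_bound=2 ** -12,
    ),
    output_dtype=torch.quint8,  # plain dtypes are still accepted
    weight_dtype=torch.qint8,
)
assert dtype_config.get_input_dtype() == torch.quint8  # unwraps to Optional[torch.dtype]
assert dtype_config.input_dtype.scale_min_lower_bound == 2 ** -12
```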
**Test Plan:**
python test/test_quantization.py
TestBackendConfig.test_dtype_with_constraints
python test/test_quantization.py
TestQuantizeFx.test_backend_config_scale_min
python test/test_quantization.py
TestQuantizeFx.test_backend_config_quantization_range
**Reviewers:** jerryzh168, vkuzo
**Subscribers:** jerryzh168, vkuzo
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85200
Approved by: https://github.com/jerryzh168
**Summary:** This commit enables the custom module LSTM path for
FX graph mode static quantization. This has the same flow as eager
mode, which was already previously supported:
```
torch.nn.LSTM
| (prepare_fx)
v
torch.ao.nn.quantizable.LSTM
| (convert_fx)
v
torch.ao.nn.quantized.LSTM
```
The main reason why custom module LSTM is not supported in FX
graph mode quantization today is because its inputs and outputs
are nested tuples, and existing constructs such as observers,
"quantize" nodes, and "dequantize" nodes do not understand how
to handle complex structures.
Note that the approach taken in this commit is only intended to
be a short-term solution highly tailored to the input and output
formats of custom module LSTM. In the future, for the longer-term
solution, we should design a more general QConfig that allows users
to specify complex input and output formats, and enable FX graph
mode quantization to understand arbitrary nested structures and
automatically infer how to transform the graph accordingly.
**Context:**
Today, in FX graph mode static quantization, custom modules are
assumed to have quantized inputs and quantized outputs, with the
exact dtypes derived from the associated QConfig (default quint8).
Since custom modules are currently not handled through the reference
model flow, their observer replacement logic are a little different
from normal operators:
```
# (1) Original model
input -> custom_module -> output
# (2) Observed model (after prepare)
input -> obs0 -> custom_module -> obs1 -> output
# (3) Quantized model (after convert)
input -> quant -> quantized_custom_module -> dequant -> output
```
In the last step, input observers are replaced with "quantize"
and output observers are replaced with "dequantize", in contrast
to other non-custom-module patterns where observers are replaced
with "quantize-dequantize" pairs instead. Note that, conceptually,
the output observer `obs1` is really just a DeQuantStub, since no
observation is actually needed.
**Custom module LSTM:**
The reason why custom module LSTM cannot be handled in the same
way is because, unlike other custom modules, its inputs and outputs
are nested tuples instead of single tensors. This is how the existing
custom module code would try to handle LSTMs:
```
# (1) Original model
# input format: (input, (hidden0, hidden1))
# output format: (output, (hidden0, hidden1))
input -> lstm -> output
hidden0 -/ \-> hidden0
hidden1 -/ \-> hidden1
# (2) Observed model (after prepare)
input -> obs0 -> lstm -> obs1 # fails
hidden0 -/ # missing observer
hidden1 -/ # missing observer
```
However, this fails today because 1) we assume there is only one input
to the custom module, and so we never end up quantizing `hidden0` and
`hidden1`, and 2) the output observer `obs1` is fed a tuple, which it
does not understand how to handle.
**Short-term fix:**
This commit addresses the above by specifically handling the input
and output structures used by custom module LSTM. For the inputs,
we manually insert observers for `hidden0` and `hidden1` to ensure
all input tensors are quantized.
For the outputs, we split the tuple into its internal nodes, attach
a DeQuantStub to each node, and recombine these DeQuantStubs
according to the original structure. Finally, we must also reroute
consumers of the original LSTM tuple (and its internal nodes, e.g.
`lstm[0]`) to these DeQuantStubs:
```
# (1) Original model
input -> lstm -> output -> linear0
hidden0 -/ \-> hidden0 -> linear1
hidden1 -/ \-> hidden1 -> linear2
# (2) Observed model (after prepare)
input -> obs0 -> lstm -> output -> dqstub -> linear0 -> obs3
hidden0 -> obs1 -/ \-> hidden0 -> dqstub -> linear1 -> obs4
hidden1 -> obs2 -/ \-> hidden1 -> dqstub -> linear2 -> obs5
# (3) Reference model (after convert)
input -> quant -> qlstm -> output -> dequant -> linear0 -> quant -> dequant
hidden0 -> quant -/ \-> hidden0 -> dequant -> linear1 -> quant -> dequant
hidden1 -> quant -/ \-> hidden1 -> dequant -> linear2 -> quant -> dequant
# (4) Quantized model (after lowering)
input -> quant -> qlstm -> output -> quantized_linear0 -> dequant
hidden0 -> quant -/ \-> hidden0 -> quantized_linear1 -> dequant
hidden1 -> quant -/ \-> hidden1 -> quantized_linear2 -> dequant
```
Note that we choose to insert DeQuantStubs here instead of observers
because these will ultimately be replaced by "dequantize" nodes. This
matches the general custom module behavior, where output observers
are replaced only with "dequantize" nodes (as opposed to the normal
"quantize-dequantize" pair), since custom module outputs are assumed
to already be quantized. Using DeQuantStubs instead of observers also
simplifies the "dequantize" insertion logic. In the future, we should use
DeQuantStubs in place of output observers for custom modules in general.
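End to end, the flow from the diagrams above looks roughly like this
(a sketch; the custom module mappings mirror the eager-mode path):
```
import torch
from torch.ao.quantization import get_default_qconfig_mapping
from torch.ao.quantization.quantize_fx import prepare_fx, convert_fx
from torch.ao.quantization.fx.custom_config import PrepareCustomConfig, ConvertCustomConfig

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.lstm = torch.nn.LSTM(50, 50, 1)
    def forward(self, inputs, h0, c0):
        return self.lstm(inputs, (h0, c0))

m = M().eval()
example_inputs = (torch.rand(5, 3, 50), torch.rand(1, 3, 50), torch.rand(1, 3, 50))
prepare_custom_config = PrepareCustomConfig().set_float_to_observed_mapping(
    torch.nn.LSTM, torch.ao.nn.quantizable.LSTM)
convert_custom_config = ConvertCustomConfig().set_observed_to_quantized_mapping(
    torch.ao.nn.quantizable.LSTM, torch.ao.nn.quantized.LSTM)
m = prepare_fx(m, get_default_qconfig_mapping(), example_inputs,
               prepare_custom_config=prepare_custom_config)
m(*example_inputs)  # calibrate
m = convert_fx(m, convert_custom_config=convert_custom_config)
```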
**Test plan:**
python test/test_quantization.py TestQuantizeFx.test_static_lstm
python test/test_quantization.py
TestQuantizeFx.test_static_lstm_consume_tuple
**Reviewers:** jerryzh168, vkuzo
**Subscribers:** jerryzh168, vkuzo
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85068
Approved by: https://github.com/jerryzh168
Summary:
`TestQuantizeFx.test_custom_module_class` was subtly broken because the
various parts of the test case were modifying the original model. This
is incorrect because `prepare_fx` and `convert_fx` operate on the model in place.
To fix this, we can `copy.deepcopy` the model before applying the
test cases to it.
This issue was surfaced by an unrelated refactor; the fix is split into
a separate diff to keep the refactor clean.
Test plan:
```
python test/test_quantization.py TestQuantizeFx.test_custom_module_class
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85344
Approved by: https://github.com/dzdang, https://github.com/z-a-f, https://github.com/jerryzh168
Summary:
- `remove_quant_dequant_pairs` removes ops when a `quant` is followed by a `dequant`
- It looks like the quantized implementation of `layer_norm` only supports float weights, so this updates the default qconfig to avoid quantizing the weight param.
- Fixes broken test, `test_norm_weight_bias`. This was the only test that broke, because the default qconfig dict we pass in quantizes the weight. I just pulled the native qconfig object and converted it to a dict.
- Adds in qconfig and backend config support for layernorm
Test Plan:
```
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
python test/test_quantization.py TestQuantizeFxModels
```
Reviewers:
Subscribers:
Tasks: Fixes https://github.com/pytorch/pytorch/issues/83110
Tags: quant, fx
Differential Revision: [D39395141](https://our.internmc.facebook.com/intern/diff/D39395141)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84203
Approved by: https://github.com/jerryzh168
Context: In order to avoid the cluttering of the `torch.nn` namespace
the quantized modules namespace is moved to `torch.ao.nn`.
The list of the `nn.quantized` files that are being migrated:
- [X] `torch.nn.quantized` → `torch.ao.nn.quantized`
- [X] `torch.nn.quantized.functional` → `torch.ao.nn.quantized.functional`
- [X] `torch.nn.quantized.modules` → `torch.ao.nn.quantized.modules`
- [X] `torch.nn.quantized.dynamic` → `torch.ao.nn.quantized.dynamic`
- [X] `torch.nn.quantized._reference` → `torch.ao.nn.quantized._reference`
- [X] `torch.nn.quantizable` → `torch.ao.nn.quantizable`
- [X] [Current PR] `torch.nn.qat` → `torch.ao.nn.qat`
- [X] `torch.nn.qat.modules` → `torch.ao.nn.qat.modules`
- [X] `torch.nn.qat.dynamic` → `torch.ao.nn.qat.dynamic`
- [ ] `torch.nn.intrinsic` → `torch.ao.nn.intrinsic`
- [ ] `torch.nn.intrinsic.modules` → `torch.ao.nn.intrinsic.modules`
- [ ] `torch.nn.intrinsic.qat` → `torch.ao.nn.intrinsic.qat`
- [ ] `torch.nn.intrinsic.quantized` → `torch.ao.nn.intrinsic.quantized`
- [ ] `torch.nn.intrinsic.quantized.modules` → `torch.ao.nn.intrinsic.quantized.modules`
- [ ] `torch.nn.intrinsic.quantized.dynamic` → `torch.ao.nn.intrinsic.quantized.dynamic`
The majority of the files are just moved to the new location.
However, specific files need to be double checked:
- None
Differential Revision: [D36861197](https://our.internmc.facebook.com/intern/diff/D36861197/)
**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D36861197/)!
Differential Revision: [D36861197](https://our.internmc.facebook.com/intern/diff/D36861197)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78716
Approved by: https://github.com/jerryzh168
Context: In order to avoid the cluttering of the `torch.nn` namespace
the quantized modules namespace is moved to `torch.ao.nn`.
The list of the `nn.quantized` files that are being migrated:
- [ ] `torch.nn.quantized` → `torch.ao.nn.quantized`
- [X] `torch.nn.quantized.functional` → `torch.ao.nn.quantized.functional`
- [X] `torch.nn.quantized.modules` → `torch.ao.nn.quantized.modules`
- [X] `torch.nn.quantized.dynamic` → `torch.ao.nn.quantized.dynamic`
- [X] [Current PR] `torch.nn.quantized._reference` → `torch.ao.nn.quantized._reference`
- [ ] `torch.nn.quantizable` → `torch.ao.nn.quantizable`
- [ ] `torch.nn.qat` → `torch.ao.nn.qat`
- [ ] `torch.nn.qat.modules` → `torch.ao.nn.qat.modules`
- [ ] `torch.nn.qat.dynamic` → `torch.ao.nn.qat.dynamic`
- [ ] `torch.nn.intrinsic` → `torch.ao.nn.intrinsic`
- [ ] `torch.nn.intrinsic.modules` → `torch.ao.nn.intrinsic.modules`
- [ ] `torch.nn.intrinsic.qat` → `torch.ao.nn.intrinsic.qat`
- [ ] `torch.nn.intrinsic.quantized` → `torch.ao.nn.intrinsic.quantized`
- [ ] `torch.nn.intrinsic.quantized.modules` → `torch.ao.nn.intrinsic.quantized.modules`
- [ ] `torch.nn.intrinsic.quantized.dynamic` → `torch.ao.nn.intrinsic.quantized.dynamic`
The majority of the files are just moved to the new location.
However, specific files need to be double checked:
- None
Differential Revision: [D36860927](https://our.internmc.facebook.com/intern/diff/D36860927/)
**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D36860927/)!
Differential Revision: [D36860927](https://our.internmc.facebook.com/intern/diff/D36860927)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78715
Approved by: https://github.com/jerryzh168
Context: In order to avoid the cluttering of the `torch.nn` namespace
the quantized modules namespace is moved to `torch.ao.nn`.
The list of the `nn.quantized` files that are being migrated:
- [ ] `torch.nn.quantized` → `torch.ao.nn.quantized`
- [X] `torch.nn.quantized.functional` → `torch.ao.nn.quantized.functional`
- [X] `torch.nn.quantized.modules` → `torch.ao.nn.quantized.modules`
- [X] [Current PR] `torch.nn.quantized.dynamic` → `torch.ao.nn.quantized.dynamic`
- [ ] `torch.nn.quantized._reference` → `torch.ao.nn.quantized._reference`
- [ ] `torch.nn.quantizable` → `torch.ao.nn.quantizable`
- [ ] `torch.nn.qat` → `torch.ao.nn.qat`
- [ ] `torch.nn.qat.modules` → `torch.ao.nn.qat.modules`
- [ ] `torch.nn.qat.dynamic` → `torch.ao.nn.qat.dynamic`
- [ ] `torch.nn.intrinsic` → `torch.ao.nn.intrinsic`
- [ ] `torch.nn.intrinsic.modules` → `torch.ao.nn.intrinsic.modules`
- [ ] `torch.nn.intrinsic.qat` → `torch.ao.nn.intrinsic.qat`
- [ ] `torch.nn.intrinsic.quantized` → `torch.ao.nn.intrinsic.quantized`
- [ ] `torch.nn.intrinsic.quantized.modules` → `torch.ao.nn.intrinsic.quantized.modules`
- [ ] `torch.nn.intrinsic.quantized.dynamic` → `torch.ao.nn.intrinsic.quantized.dynamic`
The majority of the files are just moved to the new location.
However, specific files need to be double checked:
- [Documentation](docs/source/quantization-support.rst) @vkuzo
- [Public API test list](test/allowlist_for_publicAPI.json) @peterbell10
- [BC test](test/quantization/bc/test_backward_compatibility.py) @vkuzo
- [IR emitter](torch/csrc/jit/frontend/ir_emitter.cpp) @jamesr66a
- [JIT serialization](torch/csrc/jit/serialization/import_source.cpp) @IvanKobzarev @jamesr66a
Differential Revision: [D36860660](https://our.internmc.facebook.com/intern/diff/D36860660/)
**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D36860660/)!
Differential Revision: [D36860660](https://our.internmc.facebook.com/intern/diff/D36860660)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78714
Approved by: https://github.com/jerryzh168
Fix use-dict-literal pylint suggestions by changing `dict()` to `{}`. This PR should do the change for every Python file except test/jit/test_list_dict.py, where I think the intent is to test the constructor.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83718
Approved by: https://github.com/albanD
Summary:
As titled: we probably missed this op during the migration to the reference flow.
Test Plan:
python test/test_quantization.py TestQuantizeFxOps.test_qmatmul
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83885
Approved by: https://github.com/andrewor14