Mirror of https://github.com/zebrajr/pytorch.git (synced 2025-12-07 00:21:07 +01:00)
Branch: master, 34 commits

954ce94950 | Add __main__ guards to quantization tests (#154728)
This PR is part of a series attempting to re-submit https://github.com/pytorch/pytorch/pull/134592 as smaller PRs. In quantization tests:
- Add and use a common `raise_on_run_directly` helper for test files that should not be run directly; it prints the file the user should have run instead.
- Raise a RuntimeError from tests which have been disabled (not run).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154728
Approved by: https://github.com/ezyang
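A minimal sketch of the guard pattern this PR describes; the helper name comes from the PR text, while the signature and message are illustrative assumptions:
```python
# Hypothetical sketch of the guard added to each quantization test file.
def raise_on_run_directly(runner_file: str) -> None:
    """Fail fast when a test file is executed directly instead of via its runner."""
    raise RuntimeError(
        "This test file is not meant to be run directly. "
        f"Run it via: python {runner_file}"
    )

if __name__ == "__main__":
    raise_on_run_directly("test/test_quantization.py")
```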

221350e3a4 | Add None return type to `__init__` -- tests (#132352)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132352
Approved by: https://github.com/ezyang
ghstack dependencies: #132335, #132351

f15ab8a7f2 | AO migration: replace torch internal callsites (#94170)
Summary: Do the following renames:
- `torch.quantization` → `torch.ao.quantization`
- `torch.nn.quantized` → `torch.ao.nn.quantized`
- `torch.nn.quantizable` → `torch.ao.nn.quantizable`
- `torch.nn.qat` → `torch.ao.nn.qat`
- `torch.nn.intrinsic` → `torch.ao.nn.intrinsic`

Then, do `torch.ao.nn.quantized._reference` → `torch.ao.nn.quantized.reference` to clean up the aftermath of https://github.com/pytorch/pytorch/pull/84974. Then, manually update `test/test_module_init.py` to fix hanging whitespace due to the replace.

Run this script to do the replacements: https://gist.github.com/vkuzo/7f7afebf8c31b9ba48306223e68a1c82

This is for https://github.com/pytorch/pytorch/issues/81667

Test plan: CI
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94170
Approved by: https://github.com/jerryzh168
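For reference, a sketch of what the rename means at a callsite; the old and new import paths are the real pre/post-migration namespaces, and the snippet itself is illustrative:
```python
# Old (pre-migration) callsite, kept as a comment:
#   from torch.quantization import get_default_qconfig
# New (post-migration) callsite -- same API, new namespace:
from torch.ao.quantization import get_default_qconfig

qconfig = get_default_qconfig("fbgemm")
```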

c92e5ac95b | [quant][ao_migration] torch.nn.quantized.modules → torch.ao.nn.quantized.modules (#78713)
Context: In order to avoid cluttering the `torch.nn` namespace,
the quantized modules namespace is moved to `torch.ao.nn`.
The list of the `nn.quantized` files that are being migrated:
- [ ] `torch.nn.quantized` → `torch.ao.nn.quantized`
- [X] `torch.nn.quantized.functional` → `torch.ao.nn.quantized.functional`
- [X] [Current PR] `torch.nn.quantized.modules` → `torch.ao.nn.quantized.modules`
- [ ] `torch.nn.quantized.dynamic` → `torch.ao.nn.quantized.dynamic`
- [ ] `torch.nn.quantized._reference` → `torch.ao.nn.quantized._reference`
- [ ] `torch.nn.quantizable` → `torch.ao.nn.quantizable`
- [ ] `torch.nn.qat` → `torch.ao.nn.qat`
- [ ] `torch.nn.qat.modules` → `torch.ao.nn.qat.modules`
- [ ] `torch.nn.qat.dynamic` → `torch.ao.nn.qat.dynamic`
- [ ] `torch.nn.intrinsic` → `torch.ao.nn.intrinsic`
- [ ] `torch.nn.intrinsic.modules` → `torch.ao.nn.intrinsic.modules`
- [ ] `torch.nn.intrinsic.qat` → `torch.ao.nn.intrinsic.qat`
- [ ] `torch.nn.intrinsic.quantized` → `torch.ao.nn.intrinsic.quantized`
- [ ] `torch.nn.intrinsic.quantized.modules` → `torch.ao.nn.intrinsic.quantized.modules`
- [ ] `torch.nn.intrinsic.quantized.dynamic` → `torch.ao.nn.intrinsic.quantized.dynamic`
The majority of the files are simply moved to the new location;
however, specific files need to be double-checked:
- Documentation @vkuzo
- docs/source/conf.py
- docs/source/quantization.rst
- [quantize_fx](torch/ao/quantization/quantize_fx.py) @jerryzh168
- [common test routine](test/quantization/ao_migration/common.py) @HDCharles
- JIT stuff @jamesr66a
- torch/csrc/jit/passes/hoist_conv_packed_params.cpp
- torch/csrc/jit/passes/quantization/helper.h
- torch/csrc/jit/serialization/import_source.cpp
Differential Revision: [D38926012](https://our.internmc.facebook.com/intern/diff/D38926012/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78713
Approved by: https://github.com/jerryzh168
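A hedged illustration of the module move this PR performs; the alias behavior is an assumption based on the migration plan above, not something this commit message states:
```python
# New canonical location for quantized module classes after the migration:
import torch.ao.nn.quantized as nnq

qlinear = nnq.Linear(in_features=4, out_features=8)

# The old torch.nn.quantized path is expected to keep working as an alias
# during the migration window, so existing user code does not break.
```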

6a9c02339d | Revert "[quant][ao_migration] torch.nn.quantized.modules → torch.ao.nn.quantized.modules (#78713)"
This reverts commit 432f037498 (the original landing, shown below).

432f037498 | [quant][ao_migration] torch.nn.quantized.modules → torch.ao.nn.quantized.modules (#78713)
Context: In order to avoid cluttering the `torch.nn` namespace,
the quantized modules namespace is moved to `torch.ao.nn`.
The list of the `nn.quantized` files that are being migrated:
- [ ] `torch.nn.quantized` → `torch.ao.nn.quantized`
- [X] `torch.nn.quantized.functional` → `torch.ao.nn.quantized.functional`
- [X] [Current PR] `torch.nn.quantized.modules` → `torch.ao.nn.quantized.modules`
- [ ] `torch.nn.quantized.dynamic` → `torch.ao.nn.quantized.dynamic`
- [ ] `torch.nn.quantized._reference` → `torch.ao.nn.quantized._reference`
- [ ] `torch.nn.quantizable` → `torch.ao.nn.quantizable`
- [ ] `torch.nn.qat` → `torch.ao.nn.qat`
- [ ] `torch.nn.qat.modules` → `torch.ao.nn.qat.modules`
- [ ] `torch.nn.qat.dynamic` → `torch.ao.nn.qat.dynamic`
- [ ] `torch.nn.intrinsic` → `torch.ao.nn.intrinsic`
- [ ] `torch.nn.intrinsic.modules` → `torch.ao.nn.intrinsic.modules`
- [ ] `torch.nn.intrinsic.qat` → `torch.ao.nn.intrinsic.qat`
- [ ] `torch.nn.intrinsic.quantized` → `torch.ao.nn.intrinsic.quantized`
- [ ] `torch.nn.intrinsic.quantized.modules` → `torch.ao.nn.intrinsic.quantized.modules`
- [ ] `torch.nn.intrinsic.quantized.dynamic` → `torch.ao.nn.intrinsic.quantized.dynamic`
The majority of the files are simply moved to the new location;
however, specific files need to be double-checked:
- Documentation @vkuzo
- docs/source/conf.py
- docs/source/quantization.rst
- [quantize_fx](torch/ao/quantization/quantize_fx.py) @jerryzh168
- [common test routine](test/quantization/ao_migration/common.py) @HDCharles
- JIT stuff @jamesr66a
- torch/csrc/jit/passes/hoist_conv_packed_params.cpp
- torch/csrc/jit/passes/quantization/helper.h
- torch/csrc/jit/serialization/import_source.cpp
Differential Revision: [D36860145](https://our.internmc.facebook.com/intern/diff/D36860145/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78713
Approved by: https://github.com/jerryzh168

8aedd8fb25 | [Quant][fx] Hide equalization_config from prepare APIs (#80164)
Summary: This PR hides the equalization_config argument from prepare_fx. This is a private API that we do not wish to expose to users or maintain backward compatibility for.
Test Plan: python test/test_quantization.py TestEqualizeFx
Reviewers: jerryzh168
Subscribers: jerryzh168
Differential Revision: [D37394353](https://our.internmc.facebook.com/intern/diff/D37394353)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80164
Approved by: https://github.com/jerryzh168

c7b4eec233 | [Quant][fx][bc-breaking] Replace qconfig_dict with a config object (#78452)
**Summary:** Previously, FX graph mode quantization configurations
were specified through a dictionary of qconfigs. However, this
API was not in line with other core APIs in PyTorch. This commit
replaces this dictionary with a config object that users will
create and pass to prepare and convert. This leads to better
type safety and better user experience in notebook settings
due to improved auto completion.
The new API is as follows:
```
from torch.ao.quantization import QConfigMapping
from torch.ao.quantization.quantize_fx import prepare_fx
qconfig_mapping = QConfigMapping()
.set_global(qconfig)
.set_object_type(torch.nn.Linear, qconfig)
.set_module_name_regex("foo.*bar", qconfig)
.set_module_name("mod", qconfig)
prepare_fx(model, qconfig_mapping)
```
For backwards compatibility, `prepare_fx`, `prepare_qat_fx`,
and `convert_fx` will continue to accept qconfig_dicts, which
will be converted to `QConfigMapping` objects internally.
Note that this commit does not modify existing tests to use the
new API; they will continue to pass in qconfig_dict as before,
which still works but triggers a deprecation warning. This will
be handled in a future commit.
**Test Plan:**
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
**Reviewers:** jerryzh168, vkuzo
**Subscribers:** jerryzh168, vkuzo
Differential Revision: D36747998
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78452
Approved by: https://github.com/jerryzh168
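For contrast, a sketch of the legacy dict form this commit replaces; the keys follow the old qconfig_dict schema that the new setters map onto (illustrative, not exhaustive):
```python
import torch
from torch.ao.quantization import get_default_qconfig

qconfig = get_default_qconfig("fbgemm")
qconfig_dict = {
    "": qconfig,                                   # global, ~ set_global
    "object_type": [(torch.nn.Linear, qconfig)],   # ~ set_object_type
    "module_name_regex": [("foo.*bar", qconfig)],  # ~ set_module_name_regex
    "module_name": [("mod", qconfig)],             # ~ set_module_name
}
```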

8225f42a8a | [quant][fx][equalization] Fix example_inputs follow ups in test_equalize_fx
Summary: As a follow-up to https://github.com/pytorch/pytorch/pull/76496, we defined model-specific example_inputs for the test models in common_quantization.py and used these in test_equalize_fx.
Test Plan: python test/test_quantization.py TestEqualizeFx
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78314
Approved by: https://github.com/vkuzo

416899d1a9 | [quant][fx][bc-breaking] Add required example_args argument to prepare_fx and prepare_qat_fx (#249) (#77608)
Summary:
X-link: https://github.com/facebookresearch/d2go/pull/249
X-link: https://github.com/fairinternal/ClassyVision/pull/104
X-link: https://github.com/pytorch/benchmark/pull/916
X-link: https://github.com/facebookresearch/ClassyVision/pull/791
X-link: https://github.com/facebookresearch/mobile-vision/pull/68

FX Graph Mode Quantization needs to know whether an fx node is a floating point Tensor before it can decide whether to insert an observer/fake_quantize module, since we only insert observer/fake_quantize modules for floating point Tensors. Currently we support this with some hacks, such as rules like NON_OBSERVABLE_ARG_DICT (https://github.com/pytorch/pytorch/blob/master/torch/ao/quantization/fx/utils.py#L496), but this approach is fragile and we do not plan to maintain it long term in the pytorch code base.

As we discussed in the design review, we need to ask users to provide sample args and sample keyword args so that we can infer the type in a more robust way. This PR starts by changing the prepare_fx and prepare_qat_fx APIs to require the user to provide example arguments through example_inputs. Note this api doesn't support kwargs; kwargs could make https://github.com/pytorch/pytorch/pull/76496#discussion_r861230047 (comment) simpler, but that case will be rare, and even then we can still work around it with positional arguments. Also, torch.jit.trace (https://pytorch.org/docs/stable/generated/torch.jit.trace.html) and ShapeProp (https://github.com/pytorch/pytorch/blob/master/torch/fx/passes/shape_prop.py#L140) take only positional args, so we'll just use a single example_inputs argument for now. If needed, we can extend the api with an optional example_kwargs, e.g. in case there are a lot of arguments for forward and it makes more sense to pass the arguments by keyword.

BC-breaking Note:

Before:
```python
m = resnet18(...)
m = prepare_fx(m, qconfig_dict)
# or
m = prepare_qat_fx(m, qconfig_dict)
```
After:
```python
m = resnet18(...)
m = prepare_fx(m, qconfig_dict, example_inputs=(torch.randn(1, 3, 224, 224),))
# or
m = prepare_qat_fx(m, qconfig_dict, example_inputs=(torch.randn(1, 3, 224, 224),))
```
Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
python test/test_quantization.py TestQuantizeFxModels

Imported from OSS
Reviewed By: vkuzo, andrewor14
Differential Revision: D35984526
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77608
Approved by: https://github.com/dzdang

d13829e6be | [quant][fx] update observer_fqn to not depend on node.name (#66767)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66767
Makes the observer FQN in the prepare step independent of the input_node/observed_node name. This change names the observers `{input/output}_activation_post_process_{idx}`, where idx is incremented for each new observer instance and is guaranteed to be unique.
Test Plan: python test/test_quantization.py test_observer_fqn
Imported from OSS
Reviewed By: anjali411
Differential Revision: D31752052
fbshipit-source-id: e0995b1ef33a99d5b012133fe92d303d55a73b7d

6a224b3370 | Set test owners for quantization tests (#66832)
Summary: Action following https://github.com/pytorch/pytorch/issues/66232
cc jerryzh168 jianyuh raghuramank100 jamesr66a vkuzo
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66832
Reviewed By: saketh-are
Differential Revision: D31842880
Pulled By: janeyx99
fbshipit-source-id: 8aee760e4203045c12e7548a21ed5b71c557e3ee

227e37dd39 | pytorch quantization ao migration phase 2: caffe2/test (#65832)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65832
Renames `torch.quantization` to `torch.ao.quantization` in the `caffe2/test` folder.
```
find caffe2/test/ -type f -name "*.py" -print0 | xargs -0 sed -i "s/torch\.quantization/torch.ao.quantization/g"
HG: manually revert the files testing this migration
hg revert caffe2/test/quantization/ao_migration/common.py
hg revert caffe2/test/quantization/ao_migration/test_ao_migration.py
```
Test Plan: CI
Reviewed By: z-a-f
Differential Revision: D31275754
fbshipit-source-id: 4ed54a74525634feb0f47a26d071102e19c30049

767a104698 | [quant] change observer FQNs generated in prepare step (#65420)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65420
Context: In some FB use cases we have a need to map observer stats from a train model checkpoint to an inference model. We observed that some buffer names are different because the intermediate activation tensors are generated differently across the train and inference models. More details in https://fb.quip.com/PtGcAR0S5CQP

Currently, for each observer (activation_post_process), the FQN of the inserted module is determined based on the FQN of the input tensor it is observing. In this change we change the observer FQN to include the FQN of the op/module it is observing, rather than tensor/intermediate op names, along with the "input"/"output" detail.

Before:
```python
def forward(self, x):
    x_activation_post_process_0 = self.x_activation_post_process_0(x); x = None
    mods1_w = self.mods1.w
    mods1_w_activation_post_process_0 = self.mods1_w_activation_post_process_0(mods1_w); mods1_w = None
    mods1_b = self.mods1.b
    linear = torch.nn.functional.linear(x_activation_post_process_0, mods1_w_activation_post_process_0, bias = mods1_b); x_activation_post_process_0 = mods1_w_activation_post_process_0 = mods1_b = None
    linear_activation_post_process_0 = self.linear_activation_post_process_0(linear); linear = None
    return linear_activation_post_process_0
```
After:
```python
def forward(self, x):
    mods1_input_activation_post_process_0 = self.mods1_input_activation_post_process_0(x); x = None
    mods1_w = self.mods1.w
    mods1_w_activation_post_process_0 = self.mods1_w_activation_post_process_0(mods1_w); mods1_w = None
    mods1_b = self.mods1.b
    linear = torch.nn.functional.linear(mods1_input_activation_post_process_0, mods1_w_activation_post_process_0, bias = mods1_b); x_activation_post_process_0 = mods1_w_activation_post_process_0 = mods1_b = None
    mods1_output_activation_post_process_0 = self.mods1_output_activation_post_process_0(linear); linear = None
    return mods1_output_activation_post_process_0
```
Test Plan: python test/test_quantization.py test_observer_fqn
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D31088652
fbshipit-source-id: 2f1526f578a13000b34cfd30d11f16f402fd3447

57d4c6cf42 | replace self.assertTrue(torch.allclose(..)) with self.assertEqual(…) (#63637)
Summary: Fixes https://github.com/pytorch/pytorch/issues/63565
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63637
Reviewed By: malfet
Differential Revision: D30541266
Pulled By: mruberry
fbshipit-source-id: ab461949782c6908a589ea098fcfcf5c3e081ee6
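A sketch of the replacement pattern applied across the test suite: PyTorch's internal TestCase overrides assertEqual with a tensor-aware comparison, so wrapping torch.allclose in assertTrue is redundant and loses failure detail. The test class and values below are illustrative:
```python
import torch
from torch.testing._internal.common_utils import TestCase, run_tests

class ExampleTest(TestCase):
    def test_close(self) -> None:
        a = torch.tensor([1.0, 2.0])
        b = a + 1e-8
        # Before: self.assertTrue(torch.allclose(a, b))  -- reports only True/False
        self.assertEqual(a, b)  # After: reports an element-wise diff on failure

if __name__ == "__main__":
    run_tests()
```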

d9154b9b26 | [quant] Input-Weight Equalization - allow logical evaluation (#61603)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61603
Test Plan: Imported from OSS
Reviewed By: supriyar
Differential Revision: D29686878
fbshipit-source-id: 67ca4cab98b3d592ff2bb8db86499789b85bd582

836b2431dc | [quant] Input-Weight Equalization - selective equalization (#61916)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61916
Functions used to run selective equalization based on the SQNR obtained from running the Numeric Suite. After running the Numeric Suite between the equalized and float models, we get the SQNR between the two models and construct an equalization_qconfig_dict that specifies to equalize only the layers with the highest quantization errors.
How to run:
```
layer_to_sqnr_dict = get_layer_sqnr_dict(float_model, equalized_model, input)
eq_qconfig_dict = get_equalization_qconfig_dict(layer_to_sqnr_dict, equalized_model, num_layers_to_equalize)
prepared = prepare_fx(float_model, qconfig_dict, eq_qconfig_dict)
...
```
Test Plan: `python test/test_quantization.py TestEqualizeFx.test_selective_equalization`
Imported from OSS
Reviewed By: supriyar
Differential Revision: D29796950
fbshipit-source-id: 91f0f8427d751beaea32d8ffc2f3b8aa8ef7ea95

91ef19309e | [quant] Input-weight equalization - branch support (#62366)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62366
In the case of models with branches, we are unable to equalize the branching part of the graph. For example, given this graph:
```
      conv2
     /     \
x -> conv1 -> add
```
After prepare, we will ignore the branched layers (conv1 and conv2) and will not insert the equalization observers. A warning message will also be printed with the layers that are unable to be equalized.
```
          conv2 -> out_quant_obs2
         /                       \
x -> input_quant_obs -> conv1 -> out_quant_obs1 -> add
```
Test Plan: `python test/test_quantization.py TestEqualizeFx.test_input_weight_equalization_prepare`
Imported from OSS
Reviewed By: malfet, supriyar
Differential Revision: D29982585
fbshipit-source-id: 706297e7f1861975998dfa83e7ca59af09d80618

cfd0f5ebc9 | [quant] update per-channel observer min/max_val attribute names (#62345)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62345
This PR updates the attribute names from min_vals to min_val. The motivation is to keep the attribute name consistent with per-tensor observers, so that dependencies (like FusedMovingAvgObsFakeQuantize) don't need to differentiate between the two observer types to access the attributes. It also adds some BC tests to make sure that observers saved earlier with min_vals/max_vals can be loaded, depending on the state_dict version.
Note: Scriptability of the observers isn't fully supported yet, so we aren't testing for that in this PR.
Test Plan: python test/test_quantization.py TestSerialization
Imported from OSS
Reviewed By: HDCharles
Differential Revision: D30003700
fbshipit-source-id: 20e673f1bb15e2b209551b6b9d5f8f3be3f85c0a
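A quick illustration of the renamed attributes, written against the current torch.ao paths; the observer classes and the min_val/max_val buffers are real, the snippet itself is illustrative:
```python
import torch
from torch.ao.quantization.observer import MinMaxObserver, PerChannelMinMaxObserver

per_tensor = MinMaxObserver()
per_channel = PerChannelMinMaxObserver(ch_axis=0)

x = torch.randn(4, 8)
per_tensor(x)    # observers are modules; forward() records running min/max
per_channel(x)

# After this PR both observer types expose the same attribute names,
# so dependent code no longer needs to special-case min_vals/max_vals:
print(per_tensor.min_val, per_tensor.max_val)
print(per_channel.min_val, per_channel.max_val)
```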

0751a41ab1 | [quant] Input-Weight Equalization - ConvReLU support (#61350)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61350
Applied changes in convert to allow for ConvReLU2d layers.
Initial Model: `x -> conv1 -> relu`
After fusion: `x -> convRelu2d`
After prepare: `x -> input_quant_obs -> input_eq_obs1 -> convRelu2d -> output_quant_obs1`
After equalization functions: `x -> mul -> input_quant_obs (scaled) -> convRelu2d -> output_quant_obs`
After convert: `x -> mul -> quantize_per_tensor -> quantized::convRelu2d -> dequantize`
Test Plan: `python test/test_quantization.py TestEqualizeFx`
Initial Model:
```
ConvReluModel(
  (fc): Conv2d(3, 5, kernel_size=(3, 3), stride=(1, 1))
  (relu): ReLU()
)
```
After prepare:
```
GraphModule(
  (x_activation_post_process_0): MinMaxObserver(min_val=5.960464477539063e-08, max_val=0.9999999403953552)
  (x_activation_post_process_0_equalization_process_0): _InputEqualizationObserver(
    (input_obs): PerChannelMinMaxObserver(min_val=tensor([1.1921e-07, 3.3379e-06, 5.9605e-08]), max_val=tensor([1.0000, 1.0000, 1.0000]))
  )
  (fc): ConvReLU2d(
    (0): Conv2d(3, 5, kernel_size=(3, 3), stride=(1, 1))
    (1): ReLU()
  )
  (fc_activation_post_process_0): MinMaxObserver(min_val=0.0, max_val=1.2341605424880981)
)

graph():
    %x : [#users=1] = placeholder[target=x]
    %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_0](args = (%x,), kwargs = {})
    %x_activation_post_process_0_equalization_process_0 : [#users=1] = call_module[target=x_activation_post_process_0_equalization_process_0](args = (%x_activation_post_process_0,), kwargs = {})
    %fc : [#users=1] = call_module[target=fc](args = (%x_activation_post_process_0_equalization_process_0,), kwargs = {})
    %fc_activation_post_process_0 : [#users=1] = call_module[target=fc_activation_post_process_0](args = (%fc,), kwargs = {})
    return fc_activation_post_process_0
```
After equalization functions:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_equalization_scale0 : [#users=1] = get_attr[target=x_equalization_scale0]
    %mul : [#users=1] = call_function[target=torch.mul](args = (%x, %x_equalization_scale0), kwargs = {})
    %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_0](args = (%mul,), kwargs = {})
    %fc : [#users=1] = call_module[target=fc](args = (%x_activation_post_process_0,), kwargs = {})
    %fc_activation_post_process_0 : [#users=1] = call_module[target=fc_activation_post_process_0](args = (%fc,), kwargs = {})
    return fc_activation_post_process_0
```
After convert:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_equalization_scale0 : [#users=1] = get_attr[target=x_equalization_scale0]
    %mul : [#users=1] = call_function[target=torch.mul](args = (%x, %x_equalization_scale0), kwargs = {})
    %fc_input_scale_0 : [#users=1] = get_attr[target=fc_input_scale_0]
    %fc_input_zero_point_0 : [#users=1] = get_attr[target=fc_input_zero_point_0]
    %quantize_per_tensor : [#users=1] = call_function[target=torch.quantize_per_tensor](args = (%mul, %fc_input_scale_0, %fc_input_zero_point_0, torch.quint8), kwargs = {})
    %fc : [#users=1] = call_module[target=fc](args = (%quantize_per_tensor,), kwargs = {})
    %dequantize : [#users=1] = call_method[target=dequantize](args = (%fc,), kwargs = {})
    return dequantize
```
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D29638275
fbshipit-source-id: 40d4666a4451e132612ea38fdfeaaec177a1defb

b3e4dab45a | [quant] Input-Weight Equalization - Conv convert support (#61287)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61287
Modifications to functions during convert() to support equalization. Note that this implementation does not work for connected F.conv2d layers yet.
Initial:
```
        w
        |
x -> conv -> y
```
After prepare:
```
w -> weight_quant_obs -> weight_eq_obs
                                      \
x -> input_quant_obs -> input_eq_obs -> conv -> out_quant_obs -> y
```
After convert:
```
      scale, zero_point                      w (scaled)
              |                                  |
x -> mul -> quantize_per_tensor (scaled) -> quantized::conv -> dequant -> y
      |
  eq_scale
```
Test Plan: `python test/test_quantization.py TestEqualizeFx`
Initial model:
```
ConvModel(
  (conv): Conv2d(3, 5, kernel_size=(3, 3), stride=(1, 1), bias=False)
)
```
After prepare:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_0](args = (%x,), kwargs = {})
    %x_activation_post_process_0_equalization_process_0 : [#users=1] = call_module[target=x_activation_post_process_0_equalization_process_0](args = (%x_activation_post_process_0,), kwargs = {})
    %conv : [#users=1] = call_module[target=conv](args = (%x_activation_post_process_0_equalization_process_0,), kwargs = {})
    %conv_activation_post_process_0 : [#users=1] = call_module[target=conv_activation_post_process_0](args = (%conv,), kwargs = {})
    return conv_activation_post_process_0
```
After equalization functions:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_equalization_scale0 : [#users=1] = get_attr[target=x_equalization_scale0]
    %mul : [#users=1] = call_function[target=torch.mul](args = (%x, %x_equalization_scale0), kwargs = {})
    %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_0](args = (%mul,), kwargs = {})
    %conv : [#users=1] = call_module[target=conv](args = (%x_activation_post_process_0,), kwargs = {})
    %conv_activation_post_process_0 : [#users=1] = call_module[target=conv_activation_post_process_0](args = (%conv,), kwargs = {})
    return conv_activation_post_process_0
```
After convert:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_equalization_scale0 : [#users=1] = get_attr[target=x_equalization_scale0]
    %mul : [#users=1] = call_function[target=torch.mul](args = (%x, %x_equalization_scale0), kwargs = {})
    %conv_input_scale_0 : [#users=1] = get_attr[target=conv_input_scale_0]
    %conv_input_zero_point_0 : [#users=1] = get_attr[target=conv_input_zero_point_0]
    %quantize_per_tensor : [#users=1] = call_function[target=torch.quantize_per_tensor](args = (%mul, %conv_input_scale_0, %conv_input_zero_point_0, torch.quint8), kwargs = {})
    %conv : [#users=1] = call_module[target=conv](args = (%quantize_per_tensor,), kwargs = {})
    %dequantize : [#users=1] = call_method[target=dequantize](args = (%conv,), kwargs = {})
    return dequantize
```
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D29557055
fbshipit-source-id: dc9f44182e31fa362c43ad2dfe224e6f4e4a730e

77d36b657a | [quant] Input-Weight Equalization - Conv prepare support (#61286)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61286
Modifies the prepare step to support conv layers during input-weight equalization, and adds tests to make sure that the results are as expected.
Initial:
```
        w
        |
x -> conv -> y
```
After prepare:
```
w -> weight_quant_obs -> weight_eq_obs
                                      \
x -> input_quant_obs -> input_eq_obs -> conv -> out_quant_obs -> y
```
Test Plan: `python test/test_quantization.py TestEqualizeFx.test_input_weight_equalization_prepare`
Initial:
```
ConvModel(
  (conv): Conv2d(3, 5, kernel_size=(3, 3), stride=(1, 1), bias=False)
)
```
After prepare:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_0](args = (%x,), kwargs = {})
    %x_activation_post_process_0_equalization_process_0 : [#users=1] = call_module[target=x_activation_post_process_0_equalization_process_0](args = (%x_activation_post_process_0,), kwargs = {})
    %conv : [#users=1] = call_module[target=conv](args = (%x_activation_post_process_0_equalization_process_0,), kwargs = {})
    %conv_activation_post_process_0 : [#users=1] = call_module[target=conv_activation_post_process_0](args = (%conv,), kwargs = {})
    return conv_activation_post_process_0
```
Imported from OSS
Reviewed By: supriyar
Differential Revision: D29557051
fbshipit-source-id: 25d1531645dfaf565f5c615e2ee850fcf96c7eb9

ce9cedd119 | [quant] Input-Weight Equalization - Conv observer support (#61285)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61285
Modifies observers to support conv layers, and tests to make sure that the observers are returning the expected values for conv inputs.
Test Plan: `python test/test_quantization.py TestEqualizeFx.test_input_weight_eq_observer`
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D29557041
fbshipit-source-id: 5e43329f189ba352eb8b991f38bf37752eebb6e6

1a0195db49 | [quant] Input-Weight Equalization - support for LinearReLU layers (#60653)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60653
Special casing was needed to get the weight attribute in the linear layers of fused LinearReLU layers.
Initial Model: `x -> linear1 -> relu`
After fusion: `x -> linearRelu`
After prepare: `x -> input_quant_obs -> input_eq_obs1 -> linearRelu -> output_quant_obs1`
After equalization functions: `x -> mul -> input_quant_obs (scaled) -> linearRelu -> output_quant_obs`
After convert: `x -> mul -> quantize_per_tensor -> quantized::linearRelu -> dequantize`
More step-throughs here: https://fb.quip.com/A9J3AsBxkykR
Test Plan: `python test/test_quantization.py TestEqualizeFx`
Original model:
```
LinearReluModel(
  (fc): Linear(in_features=5, out_features=5, bias=True)
  (relu): ReLU()
)
```
Graph after `prepare_fx`:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_0](args = (%x,), kwargs = {})
    %x_activation_post_process_0_equalization_process_0 : [#users=1] = call_module[target=x_activation_post_process_0_equalization_process_0](args = (%x_activation_post_process_0,), kwargs = {})
    %fc : [#users=1] = call_module[target=fc](args = (%x_activation_post_process_0_equalization_process_0,), kwargs = {})
    %fc_activation_post_process_0 : [#users=1] = call_module[target=fc_activation_post_process_0](args = (%fc,), kwargs = {})
    return fc_activation_post_process_0
```
Graph after equalization functions:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_equalization_scale0 : [#users=1] = get_attr[target=x_equalization_scale0]
    %mul : [#users=1] = call_function[target=torch.mul](args = (%x, %x_equalization_scale0), kwargs = {})
    %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_0](args = (%mul,), kwargs = {})
    %fc : [#users=1] = call_module[target=fc](args = (%x_activation_post_process_0,), kwargs = {})
    %fc_activation_post_process_0 : [#users=1] = call_module[target=fc_activation_post_process_0](args = (%fc,), kwargs = {})
    return fc_activation_post_process_0
```
Graph after `convert_fx`:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_equalization_scale0 : [#users=1] = get_attr[target=x_equalization_scale0]
    %mul : [#users=1] = call_function[target=torch.mul](args = (%x, %x_equalization_scale0), kwargs = {})
    %fc_input_scale_0 : [#users=1] = get_attr[target=fc_input_scale_0]
    %fc_input_zero_point_0 : [#users=1] = get_attr[target=fc_input_zero_point_0]
    %quantize_per_tensor : [#users=1] = call_function[target=torch.quantize_per_tensor](args = (%mul, %fc_input_scale_0, %fc_input_zero_point_0, torch.quint8), kwargs = {})
    %fc : [#users=1] = call_module[target=fc](args = (%quantize_per_tensor,), kwargs = {})
    %dequantize : [#users=1] = call_method[target=dequantize](args = (%fc,), kwargs = {})
    return dequantize
```
Imported from OSS
Reviewed By: supriyar
Differential Revision: D29406999
fbshipit-source-id: add38e8e7fb84a241c3b10bfb8451b50103effd4

da70dd199d | [quant] Input-Weight Equalization - tests (#60378)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60378
Created the following unit tests to check that our equalization algorithm behaves as expected:
- Check that the equalization scales calculated and stored in the graph are as expected
- Check that the scaled weights and biases are as expected
- Check that the min/max values in the quantization observers are as expected
- Check that graphs with equalization are structured in the same way as graphs without equalization (except that equalized graphs have additional equalization scale and mul nodes), both before and after quantization
Test Plan:
`python test/test_quantization TestEqualizeFx.test_input_weight_equalization_equalization_scales`
`python test/test_quantization TestEqualizeFx.test_input_weight_equalization_weights_bias`
`python test/test_quantization TestEqualizeFx.test_input_activation_values`
`python test/test_quantization TestEqualizeFx.test_input_weight_equalization_graphs`
Imported from OSS
Reviewed By: supriyar
Differential Revision: D29406942
fbshipit-source-id: 518208546ae5835c1ebb2af217507e90af66fbe4

dfb9c0bae8 | [quant] Input-Weight Equalization - support for connected F.linear layer (#60272)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60272
Test Plan: `python test/test_quantization.py TestEqualizeFx`
Original model:
```
FunctionalLinear2Module(
  (linear1): Linear()
  (linear2): Linear()
)
```
Graph after `prepare_fx`:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_0](args = (%x,), kwargs = {})
    %x_activation_post_process_0_equalization_process_0 : [#users=1] = call_module[target=x_activation_post_process_0_equalization_process_0](args = (%x_activation_post_process_0,), kwargs = {})
    %linear1_w : [#users=1] = get_attr[target=linear1.w]
    %linear1_w_activation_post_process_0 : [#users=1] = call_module[target=linear1_w_activation_post_process_0](args = (%linear1_w,), kwargs = {})
    %linear1_w_activation_post_process_0_equalization_process_0 : [#users=1] = call_module[target=linear1_w_activation_post_process_0_equalization_process_0](args = (%linear1_w_activation_post_process_0,), kwargs = {})
    %linear1_b : [#users=1] = get_attr[target=linear1.b]
    %linear : [#users=1] = call_function[target=torch.nn.functional.linear](args = (%x_activation_post_process_0_equalization_process_0, %linear1_w_activation_post_process_0_equalization_process_0), kwargs = {bias: %linear1_b})
    %linear_activation_post_process_0 : [#users=1] = call_module[target=linear_activation_post_process_0](args = (%linear,), kwargs = {})
    %linear_activation_post_process_0_equalization_process_0 : [#users=1] = call_module[target=linear_activation_post_process_0_equalization_process_0](args = (%linear_activation_post_process_0,), kwargs = {})
    %linear2_w : [#users=1] = get_attr[target=linear2.w]
    %linear2_w_activation_post_process_0 : [#users=1] = call_module[target=linear2_w_activation_post_process_0](args = (%linear2_w,), kwargs = {})
    %linear2_w_activation_post_process_0_equalization_process_0 : [#users=1] = call_module[target=linear2_w_activation_post_process_0_equalization_process_0](args = (%linear2_w_activation_post_process_0,), kwargs = {})
    %linear2_b : [#users=1] = get_attr[target=linear2.b]
    %linear_1 : [#users=1] = call_function[target=torch.nn.functional.linear](args = (%linear_activation_post_process_0_equalization_process_0, %linear2_w_activation_post_process_0_equalization_process_0), kwargs = {bias: %linear2_b})
    %linear_1_activation_post_process_0 : [#users=1] = call_module[target=linear_1_activation_post_process_0](args = (%linear_1,), kwargs = {})
    return linear_1_activation_post_process_0
```
Graph after equalization steps:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_equalization_scale0 : [#users=1] = get_attr[target=x_equalization_scale0]
    %mul : [#users=1] = call_function[target=torch.mul](args = (%x, %x_equalization_scale0), kwargs = {})
    %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_0](args = (%mul,), kwargs = {})
    %linear1_w : [#users=1] = get_attr[target=linear1.w]
    %linear1_w_activation_post_process_0 : [#users=1] = call_module[target=linear1_w_activation_post_process_0](args = (%linear1_w,), kwargs = {})
    %linear1_b : [#users=1] = get_attr[target=linear1.b]
    %linear : [#users=1] = call_function[target=torch.nn.functional.linear](args = (%x_activation_post_process_0, %linear1_w_activation_post_process_0), kwargs = {bias: %linear1_b})
    %linear_activation_post_process_0 : [#users=1] = call_module[target=linear_activation_post_process_0](args = (%linear,), kwargs = {})
    %linear2_w : [#users=1] = get_attr[target=linear2.w]
    %linear2_w_activation_post_process_0 : [#users=1] = call_module[target=linear2_w_activation_post_process_0](args = (%linear2_w,), kwargs = {})
    %linear2_b : [#users=1] = get_attr[target=linear2.b]
    %linear_1 : [#users=1] = call_function[target=torch.nn.functional.linear](args = (%linear_activation_post_process_0, %linear2_w_activation_post_process_0), kwargs = {bias: %linear2_b})
    %linear_1_activation_post_process_0 : [#users=1] = call_module[target=linear_1_activation_post_process_0](args = (%linear_1,), kwargs = {})
    return linear_1_activation_post_process_0
```
Graph after `convert_fx`:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_equalization_scale0 : [#users=1] = get_attr[target=x_equalization_scale0]
    %mul : [#users=1] = call_function[target=torch.mul](args = (%x, %x_equalization_scale0), kwargs = {})
    %linear1_input_scale_0 : [#users=1] = get_attr[target=linear1_input_scale_0]
    %linear1_input_zero_point_0 : [#users=1] = get_attr[target=linear1_input_zero_point_0]
    %quantize_per_tensor : [#users=1] = call_function[target=torch.quantize_per_tensor](args = (%mul, %linear1_input_scale_0, %linear1_input_zero_point_0, torch.quint8), kwargs = {})
    %linear1_packed_weight_0 : [#users=1] = get_attr[target=linear1_packed_weight_0]
    %linear1_scale_0 : [#users=1] = get_attr[target=linear1_scale_0]
    %linear1_zero_point_0 : [#users=1] = get_attr[target=linear1_zero_point_0]
    %linear : [#users=1] = call_function[target=torch.ops.quantized.linear](args = (%quantize_per_tensor, %linear1_packed_weight_0, %linear1_scale_0, %linear1_zero_point_0), kwargs = {})
    %linear2_packed_weight_0 : [#users=1] = get_attr[target=linear2_packed_weight_0]
    %linear2_scale_0 : [#users=1] = get_attr[target=linear2_scale_0]
    %linear2_zero_point_0 : [#users=1] = get_attr[target=linear2_zero_point_0]
    %linear_1 : [#users=1] = call_function[target=torch.ops.quantized.linear](args = (%linear, %linear2_packed_weight_0, %linear2_scale_0, %linear2_zero_point_0), kwargs = {})
    %dequantize : [#users=1] = call_method[target=dequantize](args = (%linear_1,), kwargs = {})
    return dequantize
```
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D29267218
fbshipit-source-id: 6b97bed1a307f1d0b1f5efcbecf41f35418242f7

ddf2ce03bb | [quant] Input-Weight Equalization - support for connected linear layers (#60034)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60034
Added support for equalizing models with connected linear layers. To account for connected linear layers, we will additionally multiply the previous weight values (row-wise) by the next equalization scale, and remove the input equalization observer between the two linear layers. We also want to scale the bias by the next equalization scale. The math is shown here: https://fb.quip.com/fK8rA9aRM4ca
Original Model: `x -> linear1 -> linear2`
After `prepare_fx`: `x -> InpEqObs -> InpQuantObs -> linear1 -> OutQuantObs -> InpEqObs -> linear2`
After equalization: `x -> mul -> InpQuantObs -> linear1 -> OutQuantObs -> linear2`
Test Plan: `python test/test_quantization.py TestEqualizeFx.test_input_weight_equalization_convert`
Original Model:
```
Linear2Module(
  (linear1): Linear(in_features=2, out_features=2, bias=True)
  (linear2): Linear(in_features=2, out_features=2, bias=True)
)
```
Graph after `prepare_fx`:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_0](args = (%x,), kwargs = {})
    %x_activation_post_process_0_equalization_process_0 : [#users=1] = call_module[target=x_activation_post_process_0_equalization_process_0](args = (%x_activation_post_process_0,), kwargs = {})
    %linear1 : [#users=1] = call_module[target=linear1](args = (%x_activation_post_process_0_equalization_process_0,), kwargs = {})
    %linear1_activation_post_process_0 : [#users=1] = call_module[target=linear1_activation_post_process_0](args = (%linear1,), kwargs = {})
    %linear1_activation_post_process_0_equalization_process_0 : [#users=1] = call_module[target=linear1_activation_post_process_0_equalization_process_0](args = (%linear1_activation_post_process_0,), kwargs = {})
    %linear2 : [#users=1] = call_module[target=linear2](args = (%linear1_activation_post_process_0_equalization_process_0,), kwargs = {})
    %linear2_activation_post_process_0 : [#users=1] = call_module[target=linear2_activation_post_process_0](args = (%linear2,), kwargs = {})
    return linear2_activation_post_process_0
```
Graph after equalization functions:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_activation_post_process_0_equalization_process_0_scale : [#users=1] = get_attr[target=x_activation_post_process_0_equalization_process_0_scale]
    %mul : [#users=1] = call_function[target=torch.mul](args = (%x, %x_activation_post_process_0_equalization_process_0_scale), kwargs = {})
    %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_0](args = (%mul,), kwargs = {})
    %linear1 : [#users=1] = call_module[target=linear1](args = (%x_activation_post_process_0,), kwargs = {})
    %linear1_activation_post_process_0 : [#users=1] = call_module[target=linear1_activation_post_process_0](args = (%linear1,), kwargs = {})
    %linear2 : [#users=1] = call_module[target=linear2](args = (%linear1_activation_post_process_0,), kwargs = {})
    %linear2_activation_post_process_0 : [#users=1] = call_module[target=linear2_activation_post_process_0](args = (%linear2,), kwargs = {})
    return linear2_activation_post_process_0
```
Graph after `convert_fx`:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_activation_post_process_0_equalization_process_0_scale : [#users=1] = get_attr[target=x_activation_post_process_0_equalization_process_0_scale]
    %mul : [#users=1] = call_function[target=torch.mul](args = (%x, %x_activation_post_process_0_equalization_process_0_scale), kwargs = {})
    %linear1_input_scale_0 : [#users=1] = get_attr[target=linear1_input_scale_0]
    %linear1_input_zero_point_0 : [#users=1] = get_attr[target=linear1_input_zero_point_0]
    %quantize_per_tensor : [#users=1] = call_function[target=torch.quantize_per_tensor](args = (%mul, %linear1_input_scale_0, %linear1_input_zero_point_0, torch.quint8), kwargs = {})
    %linear1 : [#users=1] = call_module[target=linear1](args = (%quantize_per_tensor,), kwargs = {})
    %linear2 : [#users=1] = call_module[target=linear2](args = (%linear1,), kwargs = {})
    %dequantize : [#users=1] = call_method[target=dequantize](args = (%linear2,), kwargs = {})
    return dequantize
```
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D29204347
fbshipit-source-id: 6bb9e25e2468f50df523885ded2edc731f002ac1

7917318917 | [quant] Input-Weight Equalization - support for F.linear layers (#59964)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59964
Input-Weight Equalization support for functional layers.
Test Plan: `python test/test_quantization.py TestEqualizeFx.test_input_weight_equalization_convert`
Original model:
```
FunctionalLinearModule(
  (linear1): Linear()
)
```
Graph after `prepare_fx`:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_equalization_process_0 : [#users=1] = call_module[target=x_equalization_process_0](args = (%x,), kwargs = {})
    %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_00](args = (%x_equalization_process_0,), kwargs = {})
    %linear1_w : [#users=1] = get_attr[target=linear1.w]
    %linear1_w_equalization_process_0 : [#users=1] = call_module[target=linear1_w_equalization_process_0](args = (%linear1_w,), kwargs = {})
    %linear1_w_activation_post_process_0 : [#users=1] = call_module[target=linear1_w_activation_post_process_00](args = (%linear1_w_equalization_process_0,), kwargs = {})
    %linear1_b : [#users=1] = get_attr[target=linear1.b]
    %linear : [#users=1] = call_function[target=torch.nn.functional.linear](args = (%x_activation_post_process_0, %linear1_w_activation_post_process_0), kwargs = {bias: %linear1_b})
    %linear_activation_post_process_0 : [#users=1] = call_module[target=linear_activation_post_process_0](args = (%linear,), kwargs = {})
    return linear_activation_post_process_0
```
Graph after equalization functions:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_equalization_process_0_scale : [#users=1] = get_attr[target=x_equalization_process_0_scale]
    %mul : [#users=1] = call_function[target=torch.mul](args = (%x, %x_equalization_process_0_scale), kwargs = {})
    %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_00](args = (%mul,), kwargs = {})
    %linear1_w : [#users=1] = get_attr[target=linear1.w]
    %linear1_w_equalization_process_0 : [#users=1] = call_module[target=linear1_w_equalization_process_0](args = (%linear1_w,), kwargs = {})
    %linear1_w_activation_post_process_0 : [#users=1] = call_module[target=linear1_w_activation_post_process_00](args = (%linear1_w_equalization_process_0,), kwargs = {})
    %linear1_b : [#users=1] = get_attr[target=linear1.b]
    %linear : [#users=1] = call_function[target=torch.nn.functional.linear](args = (%x_activation_post_process_0, %linear1_w_activation_post_process_0), kwargs = {bias: %linear1_b})
    %linear_activation_post_process_0 : [#users=1] = call_module[target=linear_activation_post_process_0](args = (%linear,), kwargs = {})
    return linear_activation_post_process_0
```
Graph after `convert_fx`:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_equalization_process_0_scale : [#users=1] = get_attr[target=x_equalization_process_0_scale]
    %mul : [#users=1] = call_function[target=torch.mul](args = (%x, %x_equalization_process_0_scale), kwargs = {})
    %linear1_input_scale_0 : [#users=1] = get_attr[target=linear1_input_scale_0]
    %linear1_input_zero_point_0 : [#users=1] = get_attr[target=linear1_input_zero_point_0]
    %quantize_per_tensor : [#users=1] = call_function[target=torch.quantize_per_tensor](args = (%mul, %linear1_input_scale_0, %linear1_input_zero_point_0, torch.quint8), kwargs = {})
    %linear1_packed_weight_0 : [#users=1] = get_attr[target=linear1_packed_weight_0]
    %linear1_scale_0 : [#users=1] = get_attr[target=linear1_scale_0]
    %linear1_zero_point_0 : [#users=1] = get_attr[target=linear1_zero_point_0]
    %linear : [#users=1] = call_function[target=torch.ops.quantized.linear](args = (%quantize_per_tensor, %linear1_packed_weight_0, %linear1_scale_0, %linear1_zero_point_0), kwargs = {})
    %dequantize : [#users=1] = call_method[target=dequantize](args = (%linear,), kwargs = {})
    return dequantize
```
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D29135459
fbshipit-source-id: 1e69bfbb82a0c89538e55b64968effd0b11b2fde

e13a9587b4 | Revert "Revert D29135358: [quant] Input-Weight Equalization - convert modifications" (#60646)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60646
This reverts commit e60f9cfc58 (the revert below), re-landing D29135358.

e60f9cfc58 | Revert D29135358: [quant] Input-Weight Equalization - convert modifications
Test Plan: revert-hammer
Differential Revision: D29135358

3de79b7757 | [quant] Input-Weight Equalization - convert modifications (#59963)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59963
When converting, before quantizing the nodes, we call `update_obs_for_equalization()` and `convert_eq_obs()`.

`update_obs_for_equalization`:
1. For each InputEqualizationObserver, we find the corresponding WeightEqualizationObserver.
2. For nn.Linear layers, we create an instance of the WeightEqualizationObserver and run forward on the observer with the given weights.
3. Calculate the equalization scale between the InputEqualizationObserver and WeightEqualizationObserver.

`convert_eq_obs`:
For every InputEqualizationObserver, we do the following:
1. Create a node (e.g. `x0_activation_post_process_scale`) containing the equalization scale constant.
2. Create another node containing a `mul` operator multiplying the equalization scale and the input.
3. Remove the current InputEqualizationObserver node, and replace it with the `mul` node.

For every WeightEqualizationObserver, we do the following:
1. Get the next equalization scale (we may need this for equalizing connected linear layers).
2. Scale the weights by multiplying them by the reciprocal of the current equalization scale and the next equalization scale.

Currently, this supports models with `nn.Linear` layers, but does not support connected linear layers.
Test Plan: `python test/test_quantization.py TestEqualizeFx.test_input_weight_equalization_convert`
Original Model:
```
LinearModule(
  (linear): Linear(in_features=2, out_features=2, bias=True)
)
```
Graph after `prepare_fx`:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_equalization_process_0 : [#users=1] = call_module[target=x_equalization_process_0](args = (%x,), kwargs = {})
    %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_00](args = (%x_equalization_process_0,), kwargs = {})
    %linear : [#users=1] = call_module[target=linear](args = (%x_activation_post_process_0,), kwargs = {})
    %linear_activation_post_process_0 : [#users=1] = call_module[target=linear_activation_post_process_0](args = (%linear,), kwargs = {})
    return linear_activation_post_process_0
```
Graph after equalization functions:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_equalization_process_0_scale : [#users=1] = get_attr[target=x_equalization_process_0_scale]
    %mul : [#users=1] = call_function[target=torch.mul](args = (%x, %x_equalization_process_0_scale), kwargs = {})
    %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_00](args = (%mul,), kwargs = {})
    %linear : [#users=1] = call_module[target=linear](args = (%x_activation_post_process_0,), kwargs = {})
    %linear_activation_post_process_0 : [#users=1] = call_module[target=linear_activation_post_process_0](args = (%linear,), kwargs = {})
    return linear_activation_post_process_0
```
Graph after `convert_fx`:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_equalization_process_0_scale : [#users=1] = get_attr[target=x_equalization_process_0_scale]
    %mul : [#users=1] = call_function[target=torch.mul](args = (%x, %x_equalization_process_0_scale), kwargs = {})
    %linear_input_scale_0 : [#users=1] = get_attr[target=linear_input_scale_0]
    %linear_input_zero_point_0 : [#users=1] = get_attr[target=linear_input_zero_point_0]
    %quantize_per_tensor : [#users=1] = call_function[target=torch.quantize_per_tensor](args = (%mul, %linear_input_scale_0, %linear_input_zero_point_0, torch.quint8), kwargs = {})
    %linear : [#users=1] = call_module[target=linear](args = (%quantize_per_tensor,), kwargs = {})
    %dequantize : [#users=1] = call_method[target=dequantize](args = (%linear,), kwargs = {})
    return dequantize
```
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D29135358
fbshipit-source-id: 2d00056729041318463de61841483490b6bfeee5
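A hedged sketch of the weight-scaling step described above; the function name and shapes are assumptions for illustration, with the weight laid out [out_features, in_features] as for nn.Linear:
```python
from typing import Optional

import torch

def scale_weight(weight: torch.Tensor,
                 equalization_scale: torch.Tensor,
                 next_equalization_scale: Optional[torch.Tensor] = None) -> torch.Tensor:
    # Undo the input scaling per input channel: w' = w / S  (columns)
    scaled = weight * equalization_scale.reciprocal().reshape(1, -1)
    if next_equalization_scale is not None:
        # Fold the next layer's scale into this layer's output channels (rows)
        scaled = scaled * next_equalization_scale.reshape(-1, 1)
    return scaled
```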

c0b7c59e55 | [quant] Equalization Observer modifications (#59953)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59953
The following modifications were made to the equalization observers due to design changes:
- [InputEqualizationObserver] Replaced `calculate_qparams()` with `calculate_scaled_minmax()`, since we will need to return the scaled min/max values to update the following input quantization observer.
- [WeightEqualizationObserver] We no longer need a row observer, since this will be taken care of by the following weight quantization observer.
- [WeightEqualizationObserver] Following the previous comment, we no longer need to calculate the scaled qparam values. Instead, we will use the equalization scale to later scale the weights, and the qparams will be taken care of by the weight quantization observer.
Test Plan: `python test/test_quantization.py TestEqualizeFx.test_input_weight_eq_observer`
Imported from OSS
Reviewed By: supriyar
Differential Revision: D29135332
fbshipit-source-id: be7e468273c8b62fc183b1e1ec50f6bd6d8cf831
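For intuition, a minimal sketch of how an input-weight equalization scale is typically derived from the two observers' per-channel ranges; this is the standard formulation S_c = sqrt((w_max_c - w_min_c) / (x_max_c - x_min_c)), not necessarily the exact library code:
```python
import torch

def equalization_scale(x_min: torch.Tensor, x_max: torch.Tensor,
                       w_min: torch.Tensor, w_max: torch.Tensor,
                       eps: float = 1e-9) -> torch.Tensor:
    """Per-input-channel scale that balances input and weight dynamic ranges."""
    input_range = (x_max - x_min).clamp(min=eps)
    weight_range = (w_max - w_min).clamp(min=eps)
    return torch.sqrt(weight_range / input_range)
```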

45c31cabb5 | [quant] Input Weight Equalization - prepare modifications (#59747)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59747
Modifies prepare_fx for input-weight equalization. If a current node is being equalized (i.e. there exists an EqualizationQConfig for it), then the EqualizationObserver is inserted before its quantization observer.
For a singular linear layer, the general flow looks like:
Original graph: `x0 -> linear -> x1`, `w -> linear`
After prepare: `x0 -> InpEqObs -> MinMaxObs -> linear1 -> MinMaxObs -> x1`, `w -> WeightEqObs -> MinMaxObs -> linear1`
For two connected linear layers, the general flow looks like:
Original graph: `x0 -> linear1 -> linear2 -> x1`, `w1 -> linear1`, `w2 -> linear2`
After prepare: `x0 -> InpEqObs -> MinMaxObs -> linear1 -> MinMaxObs -> InpEqObs -> linear2 -> MinMaxObs -> x1`, `w1 -> WeightEqObs -> MinMaxObs -> linear1`, `w2 -> WeightEqObs -> MinMaxObs -> linear2`
Test Plan: `python test/test_quantization.py TestEqualizeFx.test_input_equalization_prepare`
Original model with one `nn.Linear` layer:
```
LinearModule(
  (linear): Linear(in_features=1, out_features=1, bias=True)
)
```
Graph after `prepare_fx`:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_equalization_process_0 : [#users=1] = call_module[target=x_equalization_process_0](args = (%x,), kwargs = {})
    %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_00](args = (%x_equalization_process_0,), kwargs = {})
    %linear : [#users=1] = call_module[target=linear](args = (%x_activation_post_process_0,), kwargs = {})
    %linear_activation_post_process_0 : [#users=1] = call_module[target=linear_activation_post_process_0](args = (%linear,), kwargs = {})
    return linear_activation_post_process_0
```
Original model with two connected functional linear layers:
```
FunctionalLinearModule(
  (linear1): Linear()
  (linear2): Linear()
)
```
Graph after `prepare_fx`:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_equalization_process_0 : [#users=1] = call_module[target=x_equalization_process_0](args = (%x,), kwargs = {})
    %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_00](args = (%x_equalization_process_0,), kwargs = {})
    %linear1_w : [#users=1] = get_attr[target=linear1.w]
    %linear1_w_equalization_process_0 : [#users=1] = call_module[target=linear1_w_equalization_process_0](args = (%linear1_w,), kwargs = {})
    %linear1_w_activation_post_process_0 : [#users=1] = call_module[target=linear1_w_activation_post_process_00](args = (%linear1_w_equalization_process_0,), kwargs = {})
    %linear1_b : [#users=1] = get_attr[target=linear1.b]
    %linear : [#users=1] = call_function[target=torch.nn.functional.linear](args = (%x_activation_post_process_0, %linear1_w_activation_post_process_0), kwargs = {bias: %linear1_b})
    %linear_activation_post_process_0 : [#users=1] = call_module[target=linear_activation_post_process_0](args = (%linear,), kwargs = {})
    %linear_activation_post_process_0_equalization_process_0 : [#users=1] = call_module[target=linear_activation_post_process_0_equalization_process_0](args = (%linear_activation_post_process_0,), kwargs = {})
    %linear2_w : [#users=1] = get_attr[target=linear2.w]
    %linear2_w_equalization_process_0 : [#users=1] = call_module[target=linear2_w_equalization_process_0](args = (%linear2_w,), kwargs = {})
    %linear2_w_activation_post_process_0 : [#users=1] = call_module[target=linear2_w_activation_post_process_00](args = (%linear2_w_equalization_process_0,), kwargs = {})
    %linear2_b : [#users=1] = get_attr[target=linear2.b]
    %linear_1 : [#users=1] = call_function[target=torch.nn.functional.linear](args = (%linear_activation_post_process_0_equalization_process_0, %linear2_w_activation_post_process_0), kwargs = {bias: %linear2_b})
    %linear_1_activation_post_process_0 : [#users=1] = call_module[target=linear_1_activation_post_process_0](args = (%linear_1,), kwargs = {})
    return linear_1_activation_post_process_0
```
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D29135316
fbshipit-source-id: 91697e805ede254dbb2a42ee4c23eb1c1c64590e

cc03ea2c47 | [quant] Implemented InputWeightObserver for Linear inputs
Summary: Implemented two observers (InputEqualObserver and WeightEqualObserver) which will be inserted into the graph during prepare_fx().
Test Plan: python test/test_quantization.py TestEqualizeFx
Reviewed By: supriyar
Differential Revision: D28836954
fbshipit-source-id: 25517dc82ae67698ed8b2dc334e3323286976104
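A minimal sketch of what such an input-side observer might look like, based on the structure visible in the prepared graphs above; the class name and ch_axis choice are assumptions, not the library implementation:
```python
import torch
import torch.nn as nn
from torch.ao.quantization.observer import PerChannelMinMaxObserver

class InputEqualizationObserver(nn.Module):
    """Pass-through observer that records per-channel input ranges for equalization."""

    def __init__(self) -> None:
        super().__init__()
        # Observe ranges along the input-channel axis (last dim of a 2-D activation)
        self.input_obs = PerChannelMinMaxObserver(ch_axis=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        self.input_obs(x)
        return x  # like activation_post_process observers, it does not modify x
```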