Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73863
This PR fully aligns the convert function with the design: https://github.com/pytorch/rfcs/blob/master/RFC-0019-Extending-PyTorch-Quantization-to-Custom-Backends.md
and simplifies the implementation of the convert function by always producing a reference quantized model (with reference patterns) first,
and then lowering the model to a quantized model that is runnable on the PyTorch native backends (fbgemm/qnnpack).
This PR makes convert.py much easier to understand than the previous implementation, and we are able to remove the majority of the code
in quantization_patterns.py as well (in follow-up PRs).
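For context, here is a minimal sketch of the user-facing FX graph mode flow (assuming the torch 1.11-era `prepare_fx`/`convert_fx` signatures); the change is internal to `convert_fx`, which now always builds a reference quantized model and then lowers it:
```
import torch
from torch.ao.quantization import get_default_qconfig
from torch.ao.quantization.quantize_fx import prepare_fx, convert_fx

model = torch.nn.Sequential(torch.nn.Linear(8, 4)).eval()
qconfig_dict = {"": get_default_qconfig("fbgemm")}

prepared = prepare_fx(model, qconfig_dict)  # insert observers
prepared(torch.randn(2, 8))                 # calibrate
# convert now (1) produces a reference quantized model with reference
# patterns, then (2) lowers it to ops runnable on fbgemm/qnnpack.
quantized = convert_fx(prepared)
```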
Test Plan:
```
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
python test/test_quantization.py TestFXNumericSuiteCoreAPIs
python test/test_quantization.py TestFXNumericSuiteCoreAPIsModels
```
and other internal/OSS regression tests
Imported from OSS
Reviewed By: andrewor14
Differential Revision: D34778506
fbshipit-source-id: 0678b66addf736039a8749b352f6f569caca962b
(cherry picked from commit 33ec9caf23f3ab373d827117efbd9db0668b2437)
Summary:
**Summary:** This commit adds the `torch.nn.qat.dynamic.modules.Linear`
module, the dynamic counterpart to `torch.nn.qat.modules.Linear`.
Functionally these are very similar, except the dynamic version
expects a memoryless observer and is converted into a dynamically
quantized module before inference.
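A minimal sketch of the intended usage (the qconfig below is an illustrative assumption: `averaging_constant=1` makes the weight observer memoryless, and dynamic quantization needs no real activation observer):
```
import torch
from torch.ao.quantization import QConfig, MovingAverageMinMaxObserver, PlaceholderObserver

qconfig = QConfig(
    activation=PlaceholderObserver.with_args(dtype=torch.float),
    weight=MovingAverageMinMaxObserver.with_args(
        averaging_constant=1,  # memoryless: only the latest batch counts
        dtype=torch.qint8,
        qscheme=torch.per_tensor_symmetric,
    ),
)

float_linear = torch.nn.Linear(16, 8)
float_linear.qconfig = qconfig
# Fake-quantizes the weight during training; converted to a dynamically
# quantized Linear before inference.
qat_linear = torch.nn.qat.dynamic.modules.Linear.from_float(float_linear)
```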
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67325
Test Plan:
`python3 test/test_quantization.py TestQuantizationAwareTraining.test_dynamic_qat_linear`
**Reviewers:** Charles David Hernandez, Jerry Zhang
**Subscribers:** Charles David Hernandez, Supriya Rao, Yining Lu
**Tasks:** 99696812
**Tags:** pytorch
Reviewed By: malfet, jerryzh168
Differential Revision: D32178739
Pulled By: andrewor14
fbshipit-source-id: 5051bdd7e06071a011e4e7d9cc7769db8d38fd73
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65900
This changes the imports in `caffe2/torch/nn/quantized` to use the new import locations.
```
codemod -d torch/nn/quantized --extensions py 'torch.quantization' 'torch.ao.quantization'
```
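The effect on each touched file is a mechanical rewrite of the import path, for example (`QuantStub` is just an illustrative symbol):
```
# before
from torch.quantization import QuantStub
# after
from torch.ao.quantization import QuantStub
```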
Test Plan: `python test/run_test.py`
Reviewed By: jerryzh168
Differential Revision: D31301193
fbshipit-source-id: 58efb1ad51a8b441e2a3bd5b91af11eab6b9331f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63799
Add a new module that can be swapped in for the `nni.LinearReLU` module in the convert function.
Currently supports INT8 only (the FP16 op doesn't have ReLU fusion yet).
Fixes #55393
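A minimal usage sketch (assuming `nni.LinearReLU` is registered in the dynamic quantization module mappings, so convert swaps in the new module):
```
import torch
import torch.nn as nn
import torch.nn.intrinsic as nni

class M(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(8, 4)
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.relu(self.linear(x))

m = M().eval()
# Fuse Linear + ReLU into a single nni.LinearReLU module.
m = torch.quantization.fuse_modules(m, [["linear", "relu"]])
# Dynamic INT8 quantization; convert replaces the fused module with the
# new dynamically quantized LinearReLU.
qm = torch.quantization.quantize_dynamic(m, {nni.LinearReLU}, dtype=torch.qint8)
```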
Test Plan:
python test/test_quantization.py test_dynamic_fusion
Imported from OSS
Reviewed By: heitorschueroff
Differential Revision: D30502812
fbshipit-source-id: 3668e4f001a0626d469e17ac323acf582ee28a51
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49964
`torch.nn.modules.linear._LinearWithBias` is used only in the transformer modules and is completely identical to `torch.nn.Linear`.
This PR creates a mapping so that this module is treated the same as `Linear`.
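Conceptually, the added entries look like this sketch (`_LinearWithBias` is an internal module and may not exist in later releases):
```
import torch.nn.quantized as nnq
import torch.nn.quantized.dynamic as nnqd
from torch.nn.modules.linear import _LinearWithBias

# _LinearWithBias converts exactly like nn.Linear, in both the static and
# the dynamic quantized module mappings.
static_entry = {_LinearWithBias: nnq.Linear}
dynamic_entry = {_LinearWithBias: nnqd.Linear}
```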
Test Plan:
```
python test/test_quantization.py TestDynamicQuantizedModule TestStaticQuantizedModule
```
Differential Revision: D25731589
Reviewed By: jerryzh168
Pulled By: z-a-f
fbshipit-source-id: 1b2697014e250e97d3010cdb542f9d130b71fbc3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47068
Filter the dtype config before performing quantization in linear.
Test Plan: Imported from OSS
Reviewed By: supriyar
Differential Revision: D24627907
fbshipit-source-id: 162fa47b3fcf6648049f8bc0438e41ee97ac19e9
Summary:
To avoid conflicts, this PR does not remove all imports. More removals are coming in follow-up PRs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43808
Reviewed By: wanchaol
Differential Revision: D23436675
Pulled By: ailzhang
fbshipit-source-id: ccc21a1955c244f0804277e9e47e54bfd23455cd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40931
Fix docstrings for dynamic quantized Linear/LSTM and associated classes
ghstack-source-id: 107064446
Test Plan: Docs show up correctly
Differential Revision: D22360787
fbshipit-source-id: 8e357e081dc59ee42fd7f12ea5079ce5d0cc9df2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39125
Switch to setting `reduce_range=True` for version > 3.
Models serialized with an older state_dict will have version <= 3 and so will run with `reduce_range=False`.
Verified with the backward compatibility tests (they pass with no changes).
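A sketch of the version gate (assuming the version is read from the state_dict metadata during load):
```
def _reduce_range_for_version(version):
    # Checkpoints written before this change (version <= 3, or no version
    # at all) keep the old behavior; newly serialized modules opt into
    # reduce_range=True.
    return version is not None and version > 3
```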
Test Plan:
python test/test_quantization.py
Imported from OSS
Differential Revision: D21769689
fbshipit-source-id: 131f2ae736e31705222e82bdc77480f2f1826fe8
Summary:
When applying float16 dynamic quantization with
```
model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.float16
)
print(model)
```
there is an issue when we try to print the model: we cannot print the `qscheme` information for a float16 weight, since `qscheme` is only defined for the per-tensor and per-channel quantization schemes used in int8 dynamic quantization.
Before this PR:
```
Traceback (most recent call last):
  File "dlrm_s_pytorch.py", line 860, in <module>
    print(dlrm)
  File "/home/jianyuhuang/miniconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1142, in __repr__
    mod_str = repr(module)
  File "/home/jianyuhuang/miniconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1142, in __repr__
    mod_str = repr(module)
  File "/home/jianyuhuang/miniconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1136, in __repr__
    extra_repr = self.extra_repr()
  File "/home/jianyuhuang/miniconda3/lib/python3.7/site-packages/torch/nn/quantized/dynamic/modules/linear.py", line 55, in extra_repr
    self.in_features, self.out_features, self.weight().qscheme()
RuntimeError: Could not run 'aten::qscheme' with arguments from the 'CPUTensorId' backend. 'aten::qscheme' is only available for these backends: [QuantizedCPUTensorId, VariableTensorId].
```
After this PR:
```
(4): DynamicQuantizedLinear(
  in_features=2, out_features=1, dtype=torch.float16
  (_packed_params): LinearPackedParams()
)
```
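The fix amounts to guarding the qscheme lookup on the weight dtype; a sketch (the attribute names are assumptions):
```
def extra_repr(self):
    extra = 'in_features={}, out_features={}, dtype={}'.format(
        self.in_features, self.out_features, self._packed_params.dtype)
    # qscheme() is only defined for quantized tensors, so report it only
    # for int8 weights; float16 weights have no qscheme.
    if self._packed_params.dtype == torch.qint8:
        extra += ', qscheme={}'.format(self.weight().qscheme())
    return extra
```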
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36044
Differential Revision: D20860811
Pulled By: jianyuh
fbshipit-source-id: d1405a185f46a8110e6d27982b40534c854f4d1c
Summary:
This PR enables per-channel (row-wise) dynamic quantization for the linear operator. Since we have seen some accuracy drop from per-tensor quantization, we expect per-channel quantization to help improve accuracy.
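A minimal usage sketch (assuming the `per_channel_dynamic_qconfig` that this path uses):
```
import torch
from torch.quantization import quantize_dynamic, per_channel_dynamic_qconfig

model = torch.nn.Sequential(torch.nn.Linear(16, 8)).eval()
# Each output row of the weight matrix gets its own scale/zero_point,
# instead of one per-tensor pair for the whole matrix.
qmodel = quantize_dynamic(
    model,
    qconfig_spec={torch.nn.Linear: per_channel_dynamic_qconfig},
    dtype=torch.qint8,
)
```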
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30122
Differential Revision: D18630541
Pulled By: lly-zero-one
fbshipit-source-id: d52685deec5e7de46cd686ae649a8c8765b9cacf
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28767
The scale and zero_point are for the output activation tensor, not for the weight tensor. We remove them here because dynamic quantization computes its output in floating point and therefore does not need scales and zero points for the output tensor.
ghstack-source-id: 92807318
Test Plan: CI
Differential Revision: D18164949
fbshipit-source-id: 0f9172bfef615c30dc28e1dd4448a9f3cc897c2e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26709
Polishes the implementation from #25975. Primarily, we use NoopObserver to communicate that weights need to be quantized to float16. The very top-level API (`quantize_dynamic`) keeps its `dtype` argument, but the implementation now follows the common observer-driven flow.
One can argue that dynamic fp16 quantization doesn't really fit into the 'observer' mechanism. It's in fact not ideal, but it's better to keep a single flow than to branch on both dtype and qconfig.
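A sketch of the idea, using the `QConfigDynamic`/`NoopObserver` APIs of that era:
```
import torch
from torch.quantization import QConfigDynamic, NoopObserver, quantize_dynamic

# NoopObserver records no statistics; it only carries the target dtype, so
# convert knows to cast the weight to float16 instead of observing it and
# computing int8 quantization parameters.
fp16_qconfig = QConfigDynamic(weight=NoopObserver.with_args(dtype=torch.float16))

model = torch.nn.Sequential(torch.nn.Linear(8, 4)).eval()
qmodel = quantize_dynamic(model, qconfig_spec={torch.nn.Linear: fp16_qconfig})
```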
Test Plan: Imported from OSS
Differential Revision: D17544103
Pulled By: dzhulgakov
fbshipit-source-id: 6af3f18c35929a1a53ea734079c005f656e4925f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26574
Since we also have `quantized::linear`, the name `quantize_linear` is confusing, so we plan to rename it before the branch cut.
Test Plan:
ci
Imported from OSS
Differential Revision: D17514876
fbshipit-source-id: 01d9005e6ec8cb9950b9d8bba122109c389641d3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25428
Added bias as an optional parameter to the `quantized_linear_prepack` function.
The bias is quantized at run time using the input scale and the weight scale.
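A sketch of the op-level usage (scale values are arbitrary):
```
import torch

w = torch.randn(4, 8)
b = torch.randn(4)  # float bias; stays float until run time
qw = torch.quantize_per_tensor(w, scale=0.1, zero_point=0, dtype=torch.qint8)

# Bias is an optional second argument; at run time it is quantized using
# input_scale * weight_scale.
packed = torch.ops.quantized.linear_prepack(qw, b)
```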
ghstack-source-id: 89601399
Test Plan: python test/run_test.py --exclude nn --verbose --bring-to-front quantization quantized quantized_tensor quantized_nn_mods quantizer
Differential Revision: D17121304
fbshipit-source-id: 8adb0e55e4aed0a5430aaa2c8639c8ad1639c85a
Summary:
- ~~Add a unit test for the Dynamic Quantized Linear operator (```torch.fbgemm_linear_quantize_weight```, ```torch.fbgemm_pack_quantized_matrix```, and ```torch.fbgemm_linear_int8_weight```) in ```test_quantized.py```.~~ Move this to D16404027 for a separate review.
- Add the Dynamic Quantized Linear module in ```torch/nn/quantized/modules/linear.py```. ~~This is in a rudimentary stage. Will add more functions later~~.
- Add the torch.quantize logic (prepare, eval, convert) for dynamic quantization.
- Add a unit test for the Dynamic Quantized Linear module in ```test_nn_quantized.py```.
- Add a unit test for the Model-level Quantization API (a usage sketch follows below).
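A sketch of the model-level flow this stack enables (using the `quantize_dynamic` convenience API):
```
import torch

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = torch.nn.Linear(8, 4)

    def forward(self, x):
        return self.fc(x)

m = M().eval()
# nn.Linear is swapped for the new Dynamic Quantized Linear module.
qm = torch.quantization.quantize_dynamic(m, {torch.nn.Linear}, dtype=torch.qint8)
print(qm(torch.randn(2, 8)).shape)
```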
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23128
ghstack-source-id: 88257232
Differential Revision: D16258664
fbshipit-source-id: 4be3ac39ee27c088b341c741d3f09f51d5a23ef0