pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
Kimish Patel	4cb6add471	[PT2][Quant] Use module partition for fused patterns (#102394 ) This diff introduces utility `find_sequential_partitions`. This utility allows one to specify a sequential pattern of nn.Module/nn.functional and returns a list. Each item in the list contains a List[SourcePartition] that represents sequentially connected partitions that are of the pattern requested. For example `find_sequential_partitions(model, [nn.Conv2d, nn.ReLU])` will find all nn.Conv2d and nn.ReLU partitions that are sequentially connected. Furthermore, move to using `find_sequential_partitions` for conv_bn/conv_bn_relu for QAT. Differential Revision: [D45948057](https://our.internmc.facebook.com/intern/diff/D45948057/) NOTE FOR REVIEWERS: This PR has internal Meta-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D45948057/)! Pull Request resolved: https://github.com/pytorch/pytorch/pull/102394 Approved by: https://github.com/jerryzh168	2023-05-28 05:29:16 +00:00
Jerry Zhang	eda5abf5e0	[quant][pt2e] Fix propagate_annotation after recent refactors (#102422 ) Summary: Recently we changed the annotation from "target_dtype_info" to "quantization_annotation" and introduced the QuantizationAnnotation API and SharedQuantizationSpec API for users to convey sharing between inputs/outputs. This PR updates the _propagate_annotation pass to accommodate the recent changes Test Plan: ``` buck2 test mode/opt caffe2/test:quantization_pt2e -- 'caffe2/test:quantization_pt2e' ``` Reviewed By: kimishpatel Differential Revision: D46153084 Pull Request resolved: https://github.com/pytorch/pytorch/pull/102422 Approved by: https://github.com/kimishpatel	2023-05-27 16:01:47 +00:00
Jerry Zhang	23223402eb	[quant][pt2e] Add Support for DerivedQuantizationSpec (#102282 ) Summary: ``` """ 4. DerivedQuantizationSpec this is the quantization spec for the Tensors whose quantization parameters are derived from other Tensors """ class DerivedQuantizationSpec(QuantizationSpecBase): # specifies which Tensors the quantization parameters are derived from # this can either be an edge from argument to node, or a node derived_from: List[EdgeOrNode] derive_qparams_fn: Callable[[List[ObserverOrFakeQuantize]], Tuple[Tensor, Tensor]] ... ``` Test Plan: ``` buck2 test mode/opt caffe2/test:quantization_pt2e -- 'caffe2/test:quantization_pt2e' buck2 test mode/opt caffe2/test:quantization_pt2e -- --exact 'caffe2/test:quantization_pt2e - test_resnet18_with_quantizer_api (quantization.pt2e.test_quantize_pt2e.TestQuantizePT2EModels)' ``` Reviewed By: kimishpatel Differential Revision: D46097855 Pull Request resolved: https://github.com/pytorch/pytorch/pull/102282 Approved by: https://github.com/andrewor14	2023-05-27 00:24:39 +00:00
Jerry Zhang	ed87508b32	[quant][pt2e] Add support for SharedQuantizationSpec (#102184 ) Summary: This PR adds support for SharedQuantizationSpec, it's used to express the sharing between two Tensors in the prepared graph, the Tensor will either be input of some node (expressed as a Tuple of fx nodes) or output of some node (expressed as an fx Node) Test Plan: ``` buck2 test mode/opt caffe2/test:quantization_pt2e -- 'caffe2/test:quantization_pt2e' buck2 test mode/opt caffe2/test:quantization_pt2e -- --exact 'caffe2/test:quantization_pt2e - test_resnet18_with_quantizer_api (quantization.pt2e.test_quantize_pt2e.TestQuantizePT2EModels)' ``` Differential Revision: D46043026 Pull Request resolved: https://github.com/pytorch/pytorch/pull/102184 Approved by: https://github.com/kimishpatel, https://github.com/leslie-fang-intel	2023-05-25 17:31:59 +00:00
Riley Dulin	424c930f76	Add quantization lowering for nn.PixelShuffle and nn.PixelUnshuffle (#101926 ) Similar to https://github.com/pytorch/pytorch/pull/96160 but for the modules nn.PixelShuffle and nn.PixelUnshuffle. torch.nn.PixelUnshuffle accepts both float and quantized inputs. However, previously we would unnecessarily dequantize quantized inputs into floats before passing them to the function. This commit fixes this by lowering the patterns [dequant - PixelShuffle - quant] and [dequant - PixelUnshuffle - quant]. Test Plan: python test/test_quantization.py TestQuantizeFxOps.test_pixel_shuffle_module python test/test_quantization.py TestQuantizeFxOps.test_pixel_unshuffle_module Pull Request resolved: https://github.com/pytorch/pytorch/pull/101926 Approved by: https://github.com/jerryzh168	2023-05-24 19:33:26 +00:00
Jerry Zhang	3baa67caee	[quant][pt2e][be] Move annotate helper function to quantizer/utils.py (#102127 ) Summary: att Test Plan: ``` buck2 test mode/opt caffe2/test:quantization_pt2e -- --exact 'caffe2/test:quantization_pt2e - test_resnet18_with_quantizer_api (quantization.pt2e.test_quantize_pt2e.TestQuantizePT2EModels)' ``` Reviewed By: kimishpatel Differential Revision: D46001285 Pull Request resolved: https://github.com/pytorch/pytorch/pull/102127 Approved by: https://github.com/kimishpatel	2023-05-24 16:13:28 +00:00
Matthew Hoffman	29da75cc55	Enable mypy allow redefinition (#102046 ) Related #101528 I tried to enable this in another PR but it uncovered a bunch of type errors: https://github.com/pytorch/pytorch/actions/runs/4999748262/jobs/8956555243?pr=101528#step:10:1305 The goal of this PR is to fix these errors. --- This PR enables [allow_redefinition = True](https://mypy.readthedocs.io/en/stable/config_file.html#confval-allow_redefinition) in `mypy.ini`, which allows for a common pattern: > Allows variables to be redefined with an arbitrary type, as long as the redefinition is in the same block and nesting level as the original definition. `allow_redefinition` allows mypy to be more flexible by allowing reassignment to an existing variable with a different type... for instance (from the linked PR): `4a1e9230ba/torch/nn/parallel/data_parallel.py (L213)` A `Sequence[Union[int, torch.device]]` is narrowed to `Sequence[int]` thru reassignment to the same variable. Pull Request resolved: https://github.com/pytorch/pytorch/pull/102046 Approved by: https://github.com/ezyang	2023-05-24 07:05:30 +00:00
Jerry Zhang	94ed26d177	[quant][pt2e] prepare_pt2e use quantization spec directly (#102054 ) Summary: In this PR we aligned with the design of annotation API and uses quantization spec directly for annotation. main change is in prepare, we consume quantization_spec object directly instead of the observer or fake quant constructor, we create the constructor inside prepare, and annotation api users only need to interact with quantization spec object after this PR Test Plan: ``` buck2 test mode/opt caffe2/test:quantization_pt2e -- --exact 'caffe2/test:quantization_pt2e - test_resnet18_with_quantizer_api (quantization.pt2e.test_quantize_pt2e.TestQuantizePT2EModels)' ``` Reviewed By: kimishpatel Differential Revision: D45934088 Pull Request resolved: https://github.com/pytorch/pytorch/pull/102054 Approved by: https://github.com/kimishpatel	2023-05-23 23:25:56 +00:00
Jerry Zhang	f7c736e1e7	[quant][pt2e] Add observer_or_fake_quant_ctr to QuantizationSpec (#101920 ) Summary: This is the second refactor to align the annotation API with design, next step is to change prepare_pt2e to consume QuantizationSpec object directly Test Plan: ``` buck2 test mode/opt caffe2/test:quantization_pt2e -- --exact 'caffe2/test:quantization_pt2e - test_resnet18_with_quantizer_api (quantization.pt2e.test_quantize_pt2e.TestQuantizePT2EModels)' ``` Reviewed By: kimishpatel Differential Revision: D45927416 Pull Request resolved: https://github.com/pytorch/pytorch/pull/101920 Approved by: https://github.com/andrewor14	2023-05-23 05:48:23 +00:00
Jerry Zhang	15495f2d96	[quant][pt2e] Introduce QuantizationAnnotation API (#101708 ) Summary: This diff adds QuantizationAnnotation and also refactors the existing annotation to use this object ``` @dataclass class QuantizationAnnotation: # How some input nodes should be quantized, expressed as QuantizationSpec # a map from torch.fx.Node to QuantizationSpec input_qspec_map: Dict[Node, QuantizationSpec] # How the output of this node is quantized, expressed as QuantizationSpec output_qspec: QuantizationSpec class QuantizationSpec: dtype: torch.dtype is_dynamic: bool = False quant_min: Optional[int] = None quant_max: Optional[int] = None qscheme: Optional[torch.qscheme] = None ch_axis: Optional[int] = None # TODO: follow up PR will add this # Kind of observer such as MinMaxObserver, PerChannelHistogramObserver etc. # observer_or_fake_quant_type: Union[ObserverBase, FakeQuantizeBase] ``` Example after full refactor: ``` int8_qspec = QuantizationSpec(dtype=torch.int8, ...) weight_qspec = QuantizationSpec(dtype=torch.int8, ...) conv_node["quantization_annotation"] = QuantizationAnnotation( input_qspec_map={input_node: int8_qspec, weight_node: weight_qspec}, output_qspec=int8_qspec, ) ``` Note: right now input_qspec_map and output_qspec map are still using observer and fake quant constructors. Follow up PR: change the input_qspec_map and output_qspec to use QuantizationSpec directly Test Plan: ``` buck2 test mode/opt caffe2/test:quantization_pt2e -- --exact 'caffe2/test:quantization_pt2e - test_resnet18_with_quantizer_api (quantization.pt2e.test_quantize_pt2e.TestQuantizePT2EModels)' ``` Differential Revision: D45895027 Pull Request resolved: https://github.com/pytorch/pytorch/pull/101708 Approved by: https://github.com/andrewor14	2023-05-19 22:54:27 +00:00
Nitin Jain	556bb691fd	[AO]Fix observed LSTM layer setup individually observed LSTM (#101299 ) Summary: We have found that `_get_lstm_with_individually_observed_parts()` is missing setup step which sets up the LSTM layer state initializing weights and biases of this layer. This diff fixes the observed numerical discrepancy seen by CTRL team in using the above API. Test Plan: N3358643 Differential Revision: D45821681 Pull Request resolved: https://github.com/pytorch/pytorch/pull/101299 Approved by: https://github.com/andrewor14	2023-05-18 19:15:01 +00:00
andrewor14	8e51521cee	[quant][pt2] Handle maxpool + conv + bn case in prepare QAT (#100941 ) Summary: This commit fixes a bug where we copy the metadata from the wrong node after replace_pattern. This happened in the case of [maxpool -> getitem1 -> conv -> bn -> getitem2], where `getitem1` is the placeholder node fed into the fused conv + bn pattern, and we incorrectly copied the metadata from `getitem1` instead of from `getitem2`. We fix this bug by filtering out the placeholder nodes before doing the metadata copying. Test Plan: python test/test_quantization.py TestQuantizePT2E.test_prepare_qat_conv_bn_fusion_getitem_placeholder Reviewers: jerryzh168, kimishpatel Differential Revision: [D45916751](https://our.internmc.facebook.com/intern/diff/D45916751) Pull Request resolved: https://github.com/pytorch/pytorch/pull/100941 Approved by: https://github.com/jerryzh168	2023-05-17 17:36:32 +00:00
Kimish Patel	07e759eca2	[PT2][Quant] Move to module partitioner for linear pattern quantization (#101122 ) Subgraph matcher is somewhat unreliable as the pattern can vary depending on the dimensionality of input tensor used to trace _and_ what appears before linear Differential Revision: [D45713915](https://our.internmc.facebook.com/intern/diff/D45713915/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/101122 Approved by: https://github.com/jerryzh168	2023-05-17 15:47:08 +00:00
Kimish Patel	2c807a4acf	[PT2][Quant] Remove None annotations (#101120 ) None annotations are not needed anymore. Remove them. Differential Revision: [D45713917](https://our.internmc.facebook.com/intern/diff/D45713917/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/101120 Approved by: https://github.com/jerryzh168	2023-05-17 14:38:34 +00:00
Angela Yi	9e023e1818	[fx] Better replacements finder in subgraph rewriter (#100556 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/100556 Approved by: https://github.com/mcr229	2023-05-16 14:08:44 +00:00
andrewor14	964e61ee95	[quant][pt2] Handle no conv bias in prepare QAT fusion (#100610 ) Summary: This commit adds support for conv + BN fusion for the case where conv has no bias. Since the replacement patterns with and without conv bias are substantially different, we perform the replacement for each of these two cases separately. Test Plan: python test/test_quantization.py TestQuantizePT2E.test_prepare_qat_conv_bn_fusion_no_conv_bias Reviewers: jerryzh168, kimishpatel Differential Revision: [D45743510](https://our.internmc.facebook.com/intern/diff/D45743510) Pull Request resolved: https://github.com/pytorch/pytorch/pull/100610 Approved by: https://github.com/jerryzh168	2023-05-16 04:05:53 +00:00
PyTorch MergeBot	13056ca229	Revert "[fx] Better replacements finder in subgraph rewriter (#100556 )" This reverts commit `9842d1ef94`. Reverted https://github.com/pytorch/pytorch/pull/100556 on behalf of https://github.com/izaitsevfb due to Reverting temporarily to unblock diff train, see D45743510 and #100610 ([comment](https://github.com/pytorch/pytorch/pull/100556#issuecomment-1548934932))	2023-05-16 03:50:06 +00:00
Angela Yi	9842d1ef94	[fx] Better replacements finder in subgraph rewriter (#100556 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/100556 Approved by: https://github.com/mcr229	2023-05-15 20:00:59 +00:00
andrewor14	4434b9af6a	[quant][pt2] Handle constant conv args in prepare QAT fusion (#100525 ) Summary: Previously, we would only match and replace conv + BN patterns with default constant args for conv (stride, padding, dilation etc.). If the user sets one of these args to values that are different from the default, we would simply not fuse the pattern. This is due to a limitation in the subgraph rewriter: see https://github.com/pytorch/pytorch/issues/100419. This commit works around the above limitation by first configuring the subgraph rewriter to ignore literals when matching, and then manually copy over the constant args to the new subgraph after `replace_pattern`. Test Plan: python test/test_quantization.py TestQuantizePT2E.test_prepare_qat_conv_bn_fusion_constant_args Reviewers: jerryzh168, kimishpatel Differential Revision: [D45515437](https://our.internmc.facebook.com/intern/diff/D45515437) Pull Request resolved: https://github.com/pytorch/pytorch/pull/100525 Approved by: https://github.com/jerryzh168	2023-05-12 19:15:47 +00:00
leslie-fang-intel	a66de845de	[Quant][PT2E]Fix pt2e quantization maxpool input observer issue (#100961 ) Summary Fix the issue https://github.com/pytorch/pytorch/issues/100959. The root cause is that a `torch.ops.aten.max_pool2d_with_indices.default` node has 2 outputs, the output tensor and the max indices, so its `node.meta["val"]` is a tuple of `FakeTensors` (for example: `'val': (FakeTensor(..., size=(1, 2, s1, s1)), FakeTensor(..., size=(1, 2, s1, s1), dtype=torch.int64))`). This fails the observer-insertion check, which only accepts a single `FakeTensor`. Test Plan ``` python -m pytest test_quantize_pt2e.py -k test_max_pool2d_quantizer ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/100961 Approved by: https://github.com/jerryzh168, https://github.com/jgong5	2023-05-11 06:14:34 +00:00
Jerry Zhang	058d740f59	[reland][quant][pt2e] Change input act annotation to a map and allow dynamic quantization for non zeroth argument (#101005 ) (#101041 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/101005 Previously the node annotation looks like the following: ``` node.meta["..."] = { "input_act_obs_or_fq_ctr": ..., "weight_obs_or_fq_ctr": ..., "weight_index": 1, } ``` Basically we need to specify the index for weight and also have a separate key for weight config, in this PR we changed that to: ``` node.meta["..."] = { "input_act_obs_or_fq_ctr_map": {input_node: ..., weight_node: ...}, } ``` This can support specifying the observer/fake quant constructor for any argument of the node Test Plan: buck2 test @//mode/opt //caffe2/test:quantization_pt2e -- --exact 'caffe2/test:quantization_pt2e - test_resnet18_with_quantizer_api (quantization.pt2e.test_quantize_pt2e.TestQuantizePT2EModels)' Differential Revision: D45719781 Pull Request resolved: https://github.com/pytorch/pytorch/pull/101041 Approved by: https://github.com/andrewor14	2023-05-10 17:43:21 +00:00
PyTorch MergeBot	2241aaa60c	Revert "[quant][pt2e] Change input act annotation to a map and allow dynamic quantization for non zeroth argument (#101005 )" This reverts commit `f08ddae888`. Reverted https://github.com/pytorch/pytorch/pull/101005 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/101005#issuecomment-1541143426))	2023-05-10 01:27:47 +00:00
Jerry Zhang	f08ddae888	[quant][pt2e] Change input act annotation to a map and allow dynamic quantization for non zeroth argument (#101005 ) Summary: Previously the node annotation looks like the following: ``` node.meta["..."] = { "input_act_obs_or_fq_ctr": ..., "weight_obs_or_fq_ctr": ..., "weight_index": 1, } ``` Basically we need to specify the index for weight and also have a separate key for weight config, in this PR we changed that to: ``` node.meta["..."] = { "input_act_obs_or_fq_ctr_map": {input_node: ..., weight_node: ...}, } ``` This can support specifying the observer/fake quant constructor for any argument of the node Test Plan: buck2 test @//mode/opt //caffe2/test:quantization_pt2e -- --exact 'caffe2/test:quantization_pt2e - test_resnet18_with_quantizer_api (quantization.pt2e.test_quantize_pt2e.TestQuantizePT2EModels)' Reviewed By: kimishpatel Differential Revision: D45553195 Pull Request resolved: https://github.com/pytorch/pytorch/pull/101005 Approved by: https://github.com/kimishpatel	2023-05-10 00:42:25 +00:00
Jerry Zhang	c3f3cb5b0f	[quant][pt2e] Support conv bn fusion in convert step for QAT flow (#100442 ) Summary: This PR adds support for folding bn weights into conv for QAT flow, this is equivalent to the QAT branch of `from_float` in eager mode quantized conv module: https://github.com/pytorch/pytorch/blob/main/torch/ao/nn/quantized/modules/conv.py#L223 Items that needs followup: * there are some workaround I did because quantize_per_tensor is using float/int args and dynamo does not support these args, need to fix after we change the quantized model representation and also change these args to Tensor Test Plan: buck2 test @//mode/opt //caffe2/test:quantization_pt2e -- --exact 'caffe2/test:quantization_pt2e - test_convert_qat_conv_bn_fusion (quantization.pt2e.test_quantize_pt2e.TestQuantizePT2E)' Reviewed By: andrewor14 Differential Revision: D45344281 Pull Request resolved: https://github.com/pytorch/pytorch/pull/100442 Approved by: https://github.com/kimishpatel	2023-05-09 19:43:51 +00:00
Aaron Gokaslan	8769fb854d	[BE] Fix flake8 B027 errors - missing abstractmethod decorator (#100715 ) Enables B027 and applies fixes by adding abstract method decorators. Autofix generated by ruff master. Pull Request resolved: https://github.com/pytorch/pytorch/pull/100715 Approved by: https://github.com/ezyang	2023-05-09 17:28:48 +00:00
andrewor14	4154c8ea15	[quant][pt2] Add Conv + BN + ReLU fusion for prepare QAT (#100283 ) Summary: This follows https://github.com/pytorch/pytorch/pull/98568, which lays all the groundwork for Conv + BN fusion in prepare QAT. Conv + BN + ReLU fusion can reuse the same match and replace patterns and is handled similarly. Test Plan: python test/test_quantization.py TestQuantizePT2E.test_prepare_qat_conv_bn_relu_fusion python test/test_quantization.py TestQuantizePT2E.test_prepare_qat_conv_bn_relu_numerics Reviewers: kimishpatel, jerryzh168 Differential Revision: [](https://our.internmc.facebook.com/intern/diff/) Differential Revision: [D45515494](https://our.internmc.facebook.com/intern/diff/D45515494) Pull Request resolved: https://github.com/pytorch/pytorch/pull/100283 Approved by: https://github.com/jerryzh168	2023-05-07 20:35:16 +00:00
Danni Li	4a90deb137	[Doc] Add GRU new gate calculation difference (#100646 ) Summary: Add a note for the calculation difference of GRU new gate `n_t` between PyTorch and original paper. Fix: #99531 Test Plan: Please see GitHub pipelines. Differential Revision: D45579790 Pull Request resolved: https://github.com/pytorch/pytorch/pull/100646 Approved by: https://github.com/mikaylagawarecki	2023-05-05 22:18:54 +00:00
Kimish Patel	24e9b8f5f4	[PT2E][Quant] Use subgraph matcher annotate linear pattern (#100566 ) This diff adds subgraph matcher for pattern matching. Furthermore, we also move annotations for the matched subgraph in a way that only input and output nodes of the matched subgraph have quantization related valid annotations. Differential Revision: [D45535539](https://our.internmc.facebook.com/intern/diff/D45535539/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/100566 Approved by: https://github.com/jerryzh168	2023-05-04 21:31:59 +00:00
Richard Barnes	6370ac0251	[codemod] Replace hasattr with getattr in caffe2/torch/ao/quantization/stubs.py (#100597 ) Summary: The pattern ``` X.Y if hasattr(X, "Y") else Z ``` can be replaced with ``` getattr(X, "Y", Z) ``` The [getattr](https://www.w3schools.com/python/ref_func_getattr.asp) function gives more succinct code than the [hasattr](https://www.w3schools.com/python/ref_func_hasattr.asp) function. Please use it when appropriate. This diff is very low risk. Green tests indicate that you can safely Accept & Ship. Test Plan: Sandcastle Reviewed By: vkuzo Differential Revision: D44886422 Pull Request resolved: https://github.com/pytorch/pytorch/pull/100597 Approved by: https://github.com/Skylion007	2023-05-04 16:36:23 +00:00
Richard Barnes	6120c5842c	[codemod] Replace hasattr with getattr in caffe2/torch/ao/quantization/utils.py (#100361 ) Summary: The pattern ``` X.Y if hasattr(X, "Y") else Z ``` can be replaced with ``` getattr(X, "Y", Z) ``` The [getattr](https://www.w3schools.com/python/ref_func_getattr.asp) function gives more succinct code than the [hasattr](https://www.w3schools.com/python/ref_func_hasattr.asp) function. Please use it when appropriate. This diff is very low risk. Green tests indicate that you can safely Accept & Ship. Test Plan: Sandcastle Reviewed By: jerryzh168 Differential Revision: D44886493 Pull Request resolved: https://github.com/pytorch/pytorch/pull/100361 Approved by: https://github.com/Skylion007	2023-05-04 14:46:38 +00:00
Kimish Patel	771a9debbe	[PT2E][Quant] Refactor quantizer and qnnpack quantizer code to support dqlinear config (#99399 ) This diff introduces a few refactors: - Move observer creation to utils.py. - Use quantization spec to supply args to observers. - Use annotation function registration corresponding QuantizationConfig. This will be later used in dynamic quantized linear. Differential Revision: [D45073790](https://our.internmc.facebook.com/intern/diff/D45073790/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/99399 Approved by: https://github.com/jerryzh168	2023-05-03 03:23:32 +00:00
Kimish Patel	8ec0a939a2	[PT2E][Quant] Fix bug in quant spec of symmetric static quant (#99398 ) Activation quant spec should have qscheme = per_tensor_affine. Weights quant spec should have ch_axis=0 for per_channel_symmetric. Differential Revision: [D45073789](https://our.internmc.facebook.com/intern/diff/D45073789/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/99398 Approved by: https://github.com/jerryzh168	2023-05-03 00:36:03 +00:00
Max Ren	151d76cc23	[quant][pt2e] remove dropout from fx quant Differential Revision: D45250152 Pull Request resolved: https://github.com/pytorch/pytorch/pull/99935	2023-04-27 11:22:41 -07:00
andrewor14	6c550bb4d5	[quant][be] Easier way to override default in QConfigMapping (#99888 ) Summary: This commit adds a private helper function to override the default QConfig in the default QConfigMapping. Previously we needed to override all the object_types manually while skipping the fixed qparams ops. This led to duplicate code every time someone wanted a new default QConfig. After this commit, we can just call the same helper function instead. Test Plan: python test/test_quantization.py TestQuantizeFx Reviewers: jerryzh168, vkuzo Pull Request resolved: https://github.com/pytorch/pytorch/pull/99888 Approved by: https://github.com/vkuzo, https://github.com/jerryzh168	2023-04-26 18:14:01 +00:00
Jerry Zhang	df3455b716	[reland][quant][pt2e][refactor] Cleanup the logic for deciding whether to insert observer/fq or not (#99220 ) (#99767 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/99220 Previously we have two places we need to decide whether to insert observer or fake quantizer or not: (1) input arguments of a node (2) output of a node, and right now we have separate code to do this in this PR, the logic is unified in `_needs_obs_or_fq` helper function that takes the target_dtype and is_dynamic from previous output and target_dtype and is_dynamic for the current Tensor we are looking at let's use an example for conv node: ``` conv = convolution(input, weight, bias, ...) ``` let's say we have `input_node` object for argument `input`, and `conv_node` for `conv` node in the graph (1) input arguments, e.g. `input` the target_dtype/is_dynamic from previous output is the node that produces `input`, we get this from input_node.meta["target_dtype_info"]["output_act_obs_or_fq"] the target_dtype/is_dynamic for the current argument `input`, comes from conv_node.meta["target_dtype_info"]["input_act_obs_or_fq"] similarly for weight it comes from conv_node.meta["target"]["weightobs_or_fq"] etc. 
(2) output for conv node the target_dtype/is_dynamic from previous output will be the floating point output from the fp32 convolution operator, so it is hardcoded to be (torch.float, False), however, technically we should get this from node.meta["val"], but since the current code base is shared by fx graph mode quantization and pytorch 2.0 export quantization, we cannot do that, we can revisit after we decide to deprecate fx graph mode quantization the target_dtype/is_dynamic for the current output comes from conv_node.meta["target_dtype_info"]["output_act_obs_or_fq"] there is one caveat here about dynamic quantization, that is explained in the comment, so I won't repeat here Note: also fixed some places in `_get_arg_target_dtype_as_input_to_node` and `_get_arg_target_is_dynamic_as_input_to_node` to make sure "not specified" == specifying a fp32 placeholder observer as well Next: we can merge the two get target dtype and get is_dynamic function to reduce code duplication Test Plan: python test/test_quantization.py TestQuantizeFx python test/test_quantization.py TestQuantizeFxOps python test/test_quantization.py TestQuantizeFxModels python test/test_quantization.py TestQuantizePT2E python test/test_quantization.py TestQuantizePT2EModels Imported from OSS Differential Revision: D45198323 Pull Request resolved: https://github.com/pytorch/pytorch/pull/99767 Approved by: https://github.com/kimishpatel	2023-04-25 16:53:02 +00:00
Aaron Gokaslan	e2a3817dfd	[BE] Enable C419 rule for any all shortcircuiting (#99890 ) Apparently https://github.com/pytorch/pytorch/pull/78142 made torch.JIT allow for simple generator expressions which allows us to enable rules that replace unnecessary list comprehensions with generators in any/all. This was originally part of #99280 but I split it off into this PR so that it can be easily reverted should anything break. Pull Request resolved: https://github.com/pytorch/pytorch/pull/99890 Approved by: https://github.com/justinchuby, https://github.com/kit1980, https://github.com/malfet	2023-04-25 15:02:13 +00:00
PyTorch MergeBot	c83e1f517d	Revert "Delete tracing_mode argument to export (#99555 )" This reverts commit `e9786149ab`. Reverted https://github.com/pytorch/pytorch/pull/99555 on behalf of https://github.com/DanilBaibak due to Break internal build	2023-04-24 08:21:41 +00:00
Justin Chu	79c9e82e27	Fix flake8 lint errors reported by ruff - take 2 (#99798 ) Replaces #99784. This PR is pure autofix. Pull Request resolved: https://github.com/pytorch/pytorch/pull/99798 Approved by: https://github.com/Skylion007, https://github.com/kit1980	2023-04-23 23:09:51 +00:00
maxren	e63c502baa	[Executorch][XNNPACK] Quantized Max Pool 2d (#99587 ) Adding support for Quantized Max Pool 2d Additions: - Add quantized max pool 2d to executorch backend config - modify max pool node visitors to grab quant params from input/output - Add qmaxpool 2d patterns for partitioners Differential Revision: [D44977783](https://our.internmc.facebook.com/intern/diff/D44977783/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/99587 Approved by: https://github.com/jerryzh168	2023-04-22 07:17:13 +00:00
maxren	a964a3dbed	[quant][pt2e] add all convs-relu fusion qat configs (#99586 ) Currently when prepare_qat_fx with executorch backend config we do not properly quantize conv or conv - relu To fix this we add all the necessary qat configs for conv and conv-relu Differential Revision: [D45135947](https://our.internmc.facebook.com/intern/diff/D45135947/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/99586 Approved by: https://github.com/jerryzh168	2023-04-22 06:44:23 +00:00
maxren	c139dfd71e	[quant][pt2e] add dropout to executorch backend config (#99585 ) OD Model has a dropout layer in training, In order to match eager mode qat, we also fake quantize the drop out layer in prepare_qat_fx. To do this we add the dropout layer to the default_op_configs in which the observation type uses a different observer from its input Differential Revision: [D45095936](https://our.internmc.facebook.com/intern/diff/D45095936/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/99585 Approved by: https://github.com/jerryzh168	2023-04-22 06:41:44 +00:00
PyTorch MergeBot	75e754800f	Revert "[quant][pt2e][refactor] Cleanup the logic for deciding whether to insert observer/fq or not (#99220 )" This reverts commit `d56adb1b54`. Reverted https://github.com/pytorch/pytorch/pull/99220 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally	2023-04-21 18:04:21 +00:00
Jerry Zhang	d56adb1b54	[quant][pt2e][refactor] Cleanup the logic for deciding whether to insert observer/fq or not (#99220 ) Summary: Previously we have two places we need to decide whether to insert observer or fake quantizer or not: (1) input arguments of a node (2) output of a node, and right now we have separate code to do this in this PR, the logic is unified in `_needs_obs_or_fq` helper function that takes the target_dtype and is_dynamic from previous output and target_dtype and is_dynamic for the current Tensor we are looking at let's use an example for conv node: ``` conv = convolution(input, weight, bias, ...) ``` let's say we have `input_node` object for argument `input`, and `conv_node` for `conv` node in the graph (1) input arguments, e.g. `input` the target_dtype/is_dynamic from previous output is the node that produces `input`, we get this from input_node.meta["target_dtype_info"]["output_act_obs_or_fq"] the target_dtype/is_dynamic for the current argument `input`, comes from conv_node.meta["target_dtype_info"]["input_act_obs_or_fq"] similarly for weight it comes from conv_node.meta["target"]["weightobs_or_fq"] etc. 
(2) output for conv node the target_dtype/is_dynamic from previous output will be the floating point output from the fp32 convolution operator, so it is hardcoded to be (torch.float, False), however, technically we should get this from node.meta["val"], but since the current code base is shared by fx graph mode quantization and pytorch 2.0 export quantization, we cannot do that, we can revisit after we decide to deprecate fx graph mode quantization the target_dtype/is_dynamic for the current output comes from conv_node.meta["target_dtype_info"]["output_act_obs_or_fq"] there is one caveat here about dynamic quantization, that is explained in the comment, so I won't repeat here Note: also fixed some places in `_get_arg_target_dtype_as_input_to_node` and `_get_arg_target_is_dynamic_as_input_to_node` to make sure "not specified" == specifying a fp32 placeholder observer as well Next: we can merge the two get target dtype and get is_dynamic function to reduce code duplication Test Plan: python test/test_quantization.py TestQuantizeFx python test/test_quantization.py TestQuantizeFxOps python test/test_quantization.py TestQuantizeFxModels python test/test_quantization.py TestQuantizePT2E python test/test_quantization.py TestQuantizePT2EModels Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D45167585](https://our.internmc.facebook.com/intern/diff/D45167585) Pull Request resolved: https://github.com/pytorch/pytorch/pull/99220 Approved by: https://github.com/kimishpatel	2023-04-21 16:58:35 +00:00
Edward Z. Yang	e9786149ab	Delete tracing_mode argument to export (#99555 ) You can have any color you want, as long as it's tracing_mode="symbolic" Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/99555 Approved by: https://github.com/voznesenskym	2023-04-21 16:20:51 +00:00
andrewor14	22af604e1b	[quant][pt2] Add Conv + BN fusion for prepare QAT (#98568 ) Summary: This commit adds the `prepare_qat_pt2e` API and the fusion logic for Conv + BN. We use the subgraph rewriter to match and replace the pattern with the existing logic in `nniqat.ConvBn2d`. Note this is not the end-to-end flow yet. In particular, the convert flow needs to swap the new subgraph with another one that merges the batchnorm stats back into conv. The Conv + BN fusion is implemented in the following steps: 1. Annotate all nodes in the pattern `[conv - bn - getitem]` 2. Match and replace this pattern with the fused QAT pattern (note that this is a larger subgraph than the original one) 3. Copy over metadata from the original nodes to the corresponding nodes in the new subgraph, to ensure the stack traces and dtype annotations are preserved 4. Prepare will insert fake quantizes in the right places based on the annotations Test Plan: python test/test_quantization.py TestQuantizePT2E.test_qat_conv_bn_fusion Reviewers: jerryzh168, kimishpatel, yanboliang Pull Request resolved: https://github.com/pytorch/pytorch/pull/98568 Approved by: https://github.com/kimishpatel	2023-04-20 20:15:28 +00:00
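The "merge the batchnorm stats back into conv" step that this message defers to the convert flow is usually based on the standard batch norm folding identity. The sketch below shows that identity on a single channel with scalar values; the function names are hypothetical and this is not the PyTorch implementation.

```python
import math


def conv(x, weight, bias):
    """Single-channel, 1x1 'convolution' on a scalar: an affine map."""
    return weight * x + bias


def batch_norm(y, gamma, beta, mean, var, eps=1e-5):
    """Inference-mode batch norm on a scalar, using fixed running stats."""
    return gamma * (y - mean) / math.sqrt(var + eps) + beta


def fold_bn_into_conv(weight, bias, gamma, beta, mean, var, eps=1e-5):
    """Return (weight, bias) of a conv equivalent to conv followed by batch norm."""
    scale = gamma / math.sqrt(var + eps)
    return weight * scale, (bias - mean) * scale + beta


# Illustrative values only.
w, b = 2.0, 0.5
gamma, beta, mean, var = 1.5, -0.25, 0.1, 4.0
x = 3.0

folded_w, folded_b = fold_bn_into_conv(w, b, gamma, beta, mean, var)
unfused_out = batch_norm(conv(x, w, b), gamma, beta, mean, var)
fused_out = conv(x, folded_w, folded_b)
```

With real tensors the same scale is applied per output channel, which is why the folded weight is also the natural place to attach the weight fake quantize during QAT.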
Jerry Zhang	36acad58b6	[quant][pt2e][refactor] Move the annotation for observer sharing ops into separate util (#99384 ) Summary: In order to keep quantizer simple, we want to move the annotation code for operators like flatten, hardtanh etc. to a separate utility function that is called after the quantizer annotation is done, this makes these ops (operator list) not configurable by user, and also makes prepare_pt2e operator aware instead of operator agnostic, this design is not final, we may change it in the future if we find there are use cases that need these to be configurable or if we feel it is important for prepare_pt2e to stay agnostic to operator/operator patterns Test Plan: python test/test_quantization.py TestQuantizePT2E.test_qnnpack_quantizer_obs_sharing_ops Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D45071006](https://our.internmc.facebook.com/intern/diff/D45071006) Pull Request resolved: https://github.com/pytorch/pytorch/pull/99384 Approved by: https://github.com/kimishpatel	2023-04-19 23:49:33 +00:00
Nikita Shulga	8a89eec2f8	[BE] Do not use unicode quotes (#99446 ) They are mostly used in commented code examples, but even Python-3.12 does not recognize `“foobar”` as valid string literal I.e. just `s/[“”]/"/` Pull Request resolved: https://github.com/pytorch/pytorch/pull/99446 Approved by: https://github.com/huydhn, https://github.com/ezyang	2023-04-18 22:59:56 +00:00
Kimish Patel	c0be06667f	[PT2E][Quant] Support for embedding op quantization via ExecuTorchNativeQuantizer (#99106) ExecuTorchNativeQuantizer ExecuTorchNativeQuantizer is a terrible name, I admit; however, let's fix it once we align on what the quantized kernel lib within executorch runtime should be called Differential Revision: [D44986258](https://our.internmc.facebook.com/intern/diff/D44986258/) NOTE FOR REVIEWERS: This PR has internal Meta-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D44986258/)! Pull Request resolved: https://github.com/pytorch/pytorch/pull/99106 Approved by: https://github.com/jerryzh168	2023-04-18 16:59:37 +00:00
maxren	80eab63587	[Quant][pt2e] torch.mean and ReLU6 (#98984 ) Add nn.Module ReLU6 in addition to functional relu6. Also add torch.mean to quantization config Differential Revision: [D44901038](https://our.internmc.facebook.com/intern/diff/D44901038/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/98984 Approved by: https://github.com/jerryzh168	2023-04-17 18:33:04 +00:00
maxren	444a9769ae	[quant][pt2e] QAT Linear (#98897 ) Differential Revision: [D44901039](https://our.internmc.facebook.com/intern/diff/D44901039/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/98897 Approved by: https://github.com/tiandiao123, https://github.com/manuelcandales	2023-04-17 18:27:39 +00:00
maxren	568935caca	[quant][pt2e] QAT conv + bn + relu (#98896 ) Differential Revision: [D44901040](https://our.internmc.facebook.com/intern/diff/D44901040/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/98896 Approved by: https://github.com/manuelcandales	2023-04-17 18:24:08 +00:00
Kimish Patel	cdab6c8df9	[PT2E][Quant] Support specifying None for obs_or_fq_ctr in target_dtype_info (#99071 ) It is cleaner for a quantizer to say what does not need observation instead of putting fp32 observers. This diff adds support for that by checking whether target_dtype_info contains None for specific observers and, if so, skipping observer insertion for those. Differential Revision: [D44971357](https://our.internmc.facebook.com/intern/diff/D44971357/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/99071 Approved by: https://github.com/jerryzh168	2023-04-17 16:37:16 +00:00
Kimish Patel	36a95625da	[PT2E][Quant][BE] Refactor observer code (#99066 ) Combine per channel and per tensor observer code Differential Revision: [D44918494](https://our.internmc.facebook.com/intern/diff/D44918494/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/99066 Approved by: https://github.com/jerryzh168	2023-04-17 16:17:36 +00:00
Kimish Patel	31f311a816	[PT2E][Quantization] Refactor Quantizer and QNNPACKQuantizer (#99063 ) This diff renames quantization spec/config and operator config. It moves these data structures to base quantizer. Base quantizer API now has get_supported_operators that returns list of patterns that a quantizer quantizes. There are two choices being debated for how to convey to user what a particular quantizer will quantize. 1. Modules. We just convey what nn.Modules will be quantized. Of course that does not mean that equivalent functional variants won't be quantized; however, for simplicity we just use nn.Module. If certain ops are quantized in a fused manner then that will be considered an internal detail. Pros and cons of this approach. Pros: - Simple. Only nn Modules are listed. - User does not have to see fusion patterns. Cons: - Confusing perhaps, because it is not clear if supported = nn.Conv2d also means that the quantizer supports functional.conv2d - Hiding fusion patterns means user has no say in not fusing. Meaning if conv2d + relu is fused and user configures to quantize only conv, quantizer will also quantize the following relu as if conv2d + relu are fused. 2. Patterns. Be explicit about what is supported and enumerate all possible combinations. Pros: - It is very clear what quantizer will do. No surprises. Cons: - It is not simple to parse. - It can be argued that fusion is an internal detail of the quantizer. So some quantizer implementations may choose to expose fusion patterns, while others may not and may not even provide any configurability. One option is to move set_supported_operators/modules out of base quantizer and let each quantizer define its own way of communicating what is supported. Issue with this is that when we want to "Compose" multiple quantizers there is no way for user to define the order of composition if user does not know what a quantizer supports. For example, quantizer A may quantize conv + relu while B only conv, but B's implementation is fast. 
In that case you may compose (B, A) such that B quantizes conv and A quantizes relu. Not knowing what A and B support makes such composition harder Differential Revision: [D44895547](https://our.internmc.facebook.com/intern/diff/D44895547/) NOTE FOR REVIEWERS: This PR has internal Meta-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D44895547/)! Pull Request resolved: https://github.com/pytorch/pytorch/pull/99063 Approved by: https://github.com/jerryzh168	2023-04-17 00:34:18 +00:00
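The composition argument in this message, where the order (B, A) decides which quantizer handles which op, can be sketched as follows. Everything here is hypothetical: quantizers are modeled as a name plus a set of supported op names, and "first quantizer that supports the op wins" is an assumed composition rule rather than an existing PyTorch API.

```python
def compose_annotations(graph_ops, quantizers):
    """Assign each op to the first quantizer (in order) that supports it.

    graph_ops:  list of op names in the graph (hypothetical representation)
    quantizers: ordered list of (quantizer_name, supported_op_names) pairs
    """
    annotations = {}
    for op in graph_ops:
        for name, supported in quantizers:
            if op in supported:
                annotations[op] = name
                break  # earlier quantizers take priority
    return annotations


# Quantizer A handles conv and relu; B handles only conv, but is assumed faster.
quantizer_a = ("A", {"conv", "relu"})
quantizer_b = ("B", {"conv"})

# Composing (B, A): B claims conv, A picks up the remaining relu.
b_then_a = compose_annotations(["conv", "relu"], [quantizer_b, quantizer_a])

# Composing (A, B): A claims everything and B is never used.
a_then_b = compose_annotations(["conv", "relu"], [quantizer_a, quantizer_b])
```

Choosing the order requires knowing each quantizer's supported set up front, which is the reason the message gives for keeping a way to query it on the base quantizer.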
Aaron Gokaslan	85f38b8a33	[BE] Update flake8-comprehensions and adapt to rule C418 (#99178 ) Applies rule C418 and fixes all instances of it. Also updates flake8-comprehension Pull Request resolved: https://github.com/pytorch/pytorch/pull/99178 Approved by: https://github.com/ezyang	2023-04-15 15:33:42 +00:00
Sudarshan Raghunathan	e45fa1a581	Back out "[core][pruning][be] rename BaseSparsifier to BasePruner (#98747 )" (#99171 ) Summary: Back out D44856390 since renaming the type breaks backwards compatibility of existing models used in integration tests and likely in prod as well. Test Plan: buck2 run //aiplatform/modelstore/model_generation/integration_tests:cogwheel_igr_tab_offline_and_recurring_model_generation_v1_api_test-launcher -- --build-fbpkg --run-disabled --run-harness-in-tupperware Now fails with an OOM: https://www.internalfb.com/servicelab/experiment/100000000259121/trial/100000000331723/run It was failing with an import error without this revert. Differential Revision: D44991351 Pull Request resolved: https://github.com/pytorch/pytorch/pull/99171 Approved by: https://github.com/izaitsevfb, https://github.com/osalpekar	2023-04-15 00:37:45 +00:00
Jerry Zhang	09ebdf44fa	[quant][pt2e] Fix a bug in reference quantized module (decomposed mode) (#98903 ) Summary: Fixed quant_min/quant_max for per channel quantized weight for reference quantized module in decomposed mode, this bug was triggered while onboarding an internal model Test Plan: python test/test_quantization.py TestQuantizeFx.test__convert_to_reference_decomposed_fx_per_channel_quant_module Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/98903 Approved by: https://github.com/andrewor14	2023-04-13 21:55:45 +00:00
PyTorch MergeBot	dda7ce4bb3	Revert "[core][pruning][be] Rename sparsifier folder to pruner (#98758 )" This reverts commit `778fd1922a`. Reverted https://github.com/pytorch/pytorch/pull/98758 on behalf of https://github.com/jcaip due to https://www.internalfb.com/diff/D44905951 need to fix broken import in fbcode	2023-04-13 16:30:47 +00:00
Jerry Zhang	6a568779b6	[quant][pt2e][improvement] Remove the need to annotate all nodes with default annotation (#99001 ) Summary: This PR changes prepare to use some default observer/fq constructor when "target_dtype_info" is not set, this allows user to not initialize all nodes to default observer/fq constructor. Note we may still need to annotate intermediate node after this PR, there will be a follow up PR to allow users to only annotate things they want to quantize Test Plan: python test/test_quantization.py TestQuantizePT2E python test/test_quantization.py TestQuantizePT2EModels Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/99001 Approved by: https://github.com/kimishpatel, https://github.com/andrewor14	2023-04-13 09:31:51 +00:00
PyTorch MergeBot	46a31e9bab	Revert "[quant][pt2e] Fix a bug in reference quantized module (decomposed mode) (#98903 )" This reverts commit `a2e809f29b`. Reverted https://github.com/pytorch/pytorch/pull/98903 on behalf of https://github.com/huydhn due to Sorry for reverting your PR, but it breaks Windows tests on trunk `a2e809f29b`	2023-04-13 01:58:27 +00:00
Jerry Zhang	a2e809f29b	[quant][pt2e] Fix a bug in reference quantized module (decomposed mode) (#98903 ) Summary: Fixed quant_min/quant_max for per channel quantized weight for reference quantized module in decomposed mode, this bug was triggered while onboarding an internal model Test Plan: python test/test_quantization.py TestQuantizeFx.test__convert_to_reference_decomposed_fx_per_channel_quant_module Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/98903 Approved by: https://github.com/andrewor14	2023-04-12 22:35:24 +00:00
Wyatt Borsos	6361c3debc	Return zero_point from determine_qparams as a int64 (#98746 ) Summary: In some cases, zero_point is returned as an int tensor. We want it to be a long. This fixes a failed assertion in Executorch op_choose_qparams: https://www.internalfb.com/code/fbsource/[4609e7dbbf2e]/fbcode/executorch/kernels/quantized/cpu/op_choose_qparams.cpp?lines=49-52 Test Plan: CI Reviewed By: jerryzh168 Differential Revision: D44764070 Pull Request resolved: https://github.com/pytorch/pytorch/pull/98746 Approved by: https://github.com/jerryzh168	2023-04-11 19:01:05 +00:00
Jesse Cai	778fd1922a	[core][pruning][be] Rename sparsifier folder to pruner (#98758 ) Summary: att Test Plan: ``` python test/test_ao_sparsity.py ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/98758 Approved by: https://github.com/jerryzh168	2023-04-11 17:26:29 +00:00
Kazuaki Ishizaki	a13a63ae9a	Fix typos under torch/ao directory (#97679 ) This PR fixes typos in comments and messages of `.py` files under `torch/ao` directory Pull Request resolved: https://github.com/pytorch/pytorch/pull/97679 Approved by: https://github.com/janeyx99, https://github.com/kit1980	2023-04-10 22:25:15 +00:00
Jesse Cai	4584851da5	[core][pruning][be] rename BaseSparsifier to BasePruner (#98747 ) Summary: att Test Plan: `python test/test_ao_sparsity.py -- TestBasePruner` Pull Request resolved: https://github.com/pytorch/pytorch/pull/98747 Approved by: https://github.com/jerryzh168	2023-04-10 21:25:19 +00:00
Edward Z. Yang	b09722f540	Convert logging f-strings to use % format, part two (#98700 ) This hits multi-line logging strings Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/98700 Approved by: https://github.com/voznesenskym	2023-04-10 12:19:31 +00:00
Jerry Zhang	c5269ad6c6	[quant][pt2e] Add support for a few ops in QNNPackQuantizer to enable quantizing internal model (#98560 ) Summary: This PR adds support for adaptive_avg_pool2d (traced as mean.dim), mean and hardtanh to QNNPackQuantizer Test Plan: python test/test_quantization.py TestQuantizePT2E.test_qnnpack_quantizer_obs_sharing_ops Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/98560 Approved by: https://github.com/andrewor14	2023-04-07 19:26:45 +00:00
maxren	483fd3351a	[Quant] Add get_symmetric_qnnpack_qat_qconfig_mapping (#98569 ) Differential Revision: [D44776230](https://our.internmc.facebook.com/intern/diff/D44776230/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/98569 Approved by: https://github.com/andrewor14	2023-04-07 17:57:56 +00:00
Jerry Zhang	616f50da3a	[quant][pt2e] QNNPackQuantizer support annotation for resnet18 (#98507 ) Summary: This PR adds annotation support for conv2d relu, linear, maxpool2d, add and add relu so that we can successfully quantize resnet18 with the prepare_pt2e_quantizer API and get the same result as fx graph mode quantization Test Plan: python test/test_quantization.py TestQuantizePT2EModels.test_resnet18_with_quantizer_api Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/98507 Approved by: https://github.com/vkuzo	2023-04-07 04:27:21 +00:00
Kazuaki Ishizaki	482f87a7bc	[quantized] Fix return values of _get_name() in quantized ConvTranspose (#97678 ) This PR fixes incorrect return values of _get_name() in quantized `ConvTranspose?d`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/97678 Approved by: https://github.com/vkuzo, https://github.com/kit1980	2023-04-07 01:14:42 +00:00
Jerry Zhang	3142ce208f	[quant][pt2e] Support quantizer API in prepare_pt2e_quantizer (#97994 ) Summary: This PR added a quantizer API to prepare_pt2e_quantizer, which enables user to annotate the nodes in the graph directly to configure quantization, instead of relying on QConfigMapping. Please see test cases in test_quantize_pt2e.py for examples. Also added a prototype for QNNPackQuantizer, which will be modified later to fully support different quantization capabilities of QNNPack/XNNPack. The goal for introducing quantizer is to add flexibility to the quantization API to allow modeling users and backend developers to express their quantization intentions programmatically, which will free architecture optimization team from supporting different use cases in the core API in the future. As a concrete example, we used to have https://pytorch.org/docs/master/generated/torch.ao.quantization.qconfig_mapping.QConfigMapping.html#torch.ao.quantization.qconfig_mapping.QConfigMapping as the API for users to express their intent for quantization in fx graph mode quantization, and it has some fancy options like `set_module_name_regex` and `set_module_name_object_type_order`. This is not needed for all backends and adds burden of maintenance to AO team. In the quantizer API we will move these APIs to a backend specific `Quantizer` that needs this feature, and all the backends or even advanced modeling users can implement their own quantizer to express their intent for quantization through annotating the nodes. For example, to express the quantization intention of quantizing a convolution node, a user will find the convolution node in the graph and do: ``` operator_spec = qnnpack_quantizer.get_default_per_channel_symmetric_qnnpack_operator_spec() conv_node.meta["target_dtype_info"] = { "input_act_obs_or_fq_ctr": _get_act_obs_or_fq_ctr(operator_spec), "weight_obs_or_fq_ctr": _get_weight_obs_or_fq_ctr(operator_spec), "bias_obs_or_fq_ctr": _get_bias_obs_or_fq_ctr(operator_spec), 
"output_act_obs_or_fq_ctr": _get_act_obs_or_fq_ctr(operator_spec), # TODO: validation of weight_index must be set if weight_obs_or_fq_ctr is set "weight_index": 1, # TODO: validation of bias_index must be set if bias_obs_or_fq_ctr is set "bias_index": 2, } ``` each backend will introduce their own quantizer, e.g. QNNPackQuantizer, which may expose more convenient APIs for modeling users to configure the annotation, and different quantizer can compose with each other to annotate the graph correctly for quantization. Test Plan: python test/test_quantization.py TestQuantizePT2E.test_simple_quantizer python test/test_quantization.py TestQuantizePT2E.test_qnnpack_quantizer_conv Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/97994 Approved by: https://github.com/vkuzo	2023-04-06 11:34:10 +00:00
Jerry Zhang	a76114832a	[quant][pt2e][fix] Fix the internal test failures caused by refactor (#98378 ) Summary: att, this PR removes some incorrect assumptions from `_maybe_insert_observers_before_graph_output` Test Plan: internal test Differential Revision: D44697212 Pull Request resolved: https://github.com/pytorch/pytorch/pull/98378 Approved by: https://github.com/andrewor14	2023-04-05 23:27:34 +00:00
Jesse Cai	93063768da	[pruning][core][feature] Implement convert for pruner (#97545 ) Summary: This PR implements `BaseSparsifier.convert()`, which performs module swapping. The modules and mappings will be merged in a future PR. Test Plan: `python test/test_ao_sparsity.py -- TestBaseSparsifier.test_convert` Pull Request resolved: https://github.com/pytorch/pytorch/pull/97545 Approved by: https://github.com/jerryzh168	2023-04-05 16:57:11 +00:00
Tugsbayasgalan Manlaibaatar	75ac6fdcdd	Propogate dynamo shape_env to make_fx (#96437 ) Currently, when we use assume_static_by_default flag, dynamo won't produce any symbols for input tensors. But when we pass the dynamo generated graph onto make_fx via torchdynamo.export(aten_graph=True), there is no way to pass this flag. We enable this by directly passing the fake tensors dynamo used to make_fx and call make_fx with "real" mode with fake tensors from dynamo. Note that this is modified version of (https://github.com/pytorch/pytorch/pull/96143) Differential Revision: [D44561753](https://our.internmc.facebook.com/intern/diff/D44561753) Pull Request resolved: https://github.com/pytorch/pytorch/pull/96437 Approved by: https://github.com/jansel, https://github.com/ezyang	2023-04-04 20:37:30 +00:00
Jerry Zhang	b109083098	[quant][pt2e][refactor] Remove `backend_config` from `_maybe_insert_input_observers_for_node` (#98094 ) Summary: The goal is to remove the need to use backend_config when pt2e flow code call this function Test Plan: python test/test_quantization.py TestQuantizeFx Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/98094 Approved by: https://github.com/jcaip	2023-04-04 03:18:24 +00:00
Jerry Zhang	553bb01df9	[quant][pt2e][refactor] Remove extra arguments of _maybe_insert_observers_before_graph_output (#98029 ) Summary: This PR allows _maybe_insert_observers_before_graph_output to be reused by pt2e flow Test Plan: python test/test_quantization.py TestQuantizeFx python test/test_quantization.py TestQuantizeFxOps python test/test_quantization.py TestQuantizeFxModels Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/98029 Approved by: https://github.com/vkuzo	2023-04-01 05:38:36 +00:00
Jerry Zhang	7dde61ce46	[quant][pt2e][refactor] Remove extra arguments of `_maybe_insert_output_observer_for_node` (#97959 ) Summary: The goal is for this function to be reused by the pt2e flow Test Plan: python test/test_quantization.py TestQuantizeFx Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/97959 Approved by: https://github.com/andrewor14	2023-03-31 23:59:43 +00:00
Jesse Cai	d158545b16	[pruning] Add gelu to list of supported activation functions (#95618 ) Summary: This PR adds nn.GELU and F.gelu to the list of supported activation functions Test Plan: ``` python test/test_ao_sparsity.py -- TestBaseSparsifier ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/95618 Approved by: https://github.com/andrewor14	2023-03-31 19:55:12 +00:00
Jerry Zhang	1c21cd2213	[quant][pt2e][refactor] Add input_output_share_observers to node.meta["target_dtype_info"] (#97949 ) Summary: The goal for this PR is to unify the flow of information to reduce fragmentation of implementations between fx graph mode quantization and quantize_pt2e, since quantize_pt2e will be using node.meta to store this information, we'd like to make sure fx graph mode quantization get this information from the same place Test Plan: python test/test_quantization.py TestQuantizeFx Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/97949 Approved by: https://github.com/andrewor14	2023-03-31 15:54:19 +00:00
Xia, Weiwen	e073979794	[Quant][FX] Add test case for lowering conv_transpose with kwargs (#97311 ) Summary As the title Test plan python test/test_quantization.py -k test_lowering_functional_conv_transpose_with_kwargs Pull Request resolved: https://github.com/pytorch/pytorch/pull/97311 Approved by: https://github.com/jerryzh168	2023-03-31 10:39:29 +00:00
Xia, Weiwen	e61b842001	[Quant][FX] lower functional conv_transpose ops (#97126 ) Summary Support quantizing and lowering functional `conv_transpose1d`, `conv_transpose2d` and `conv_transpose3d`. Please note that - `conv_transpose + relu` fusion is not supported. Remember to keep `relu` node in graph when lowering. - `conv_transpose` requires `per-tensor` scheme for weight. Use default `qconfig_mappings` instead of deprecated `qconfig_dict` for test cases. Test plan python test/test_quantization.py -k test_conv_transpose_not_reference python test/test_quantization.py -k test_conv_transpose_reference python test/test_quantization.py -k test_conv_transpose_relu_not_reference python test/test_quantization.py -k test_conv_transpose_relu_reference Pull Request resolved: https://github.com/pytorch/pytorch/pull/97126 Approved by: https://github.com/jgong5, https://github.com/jerryzh168	2023-03-31 07:17:29 +00:00
maxren	3a5ca4bdd4	[quant][pt2e] Add support for conv bn fusion in et backend config (#97389 ) Batch Norm was supported by XNNPACK via fusion with the preceding convolution op. We do the same here by fusing across q -> dq nodes. We must update the original pass in order to fuse convolution weight/bias with batch norm parameters, this way quantization is supported for batch norm Differential Revision: [D43976324](https://our.internmc.facebook.com/intern/diff/D43976324/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/97389 Approved by: https://github.com/salilsdesai	2023-03-31 05:33:42 +00:00
maxren	fe2bdfb2cd	[Executorch][XNNPACK] Quantized mean (#97388 ) Support Quantized Mean.dim for xnnpack Adding another pattern for Quantized Partitioner and test to ensure quantized operator works Differential Revision: [D43915706](https://our.internmc.facebook.com/intern/diff/D43915706/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/97388 Approved by: https://github.com/salilsdesai	2023-03-31 05:08:53 +00:00
Jerry Zhang	f78b44b2d9	[quant][pt2e][refactor] Refactor prepare to remove the use of qconfig in `_maybe_insert_input_observer_for_arg_or_kwarg` (#97948 ) Summary: The goal is for this function to be reused by quantize_pt2e Test Plan: python test/test_quantization.py TestQuantizeFx Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D44558929](https://our.internmc.facebook.com/intern/diff/D44558929) Pull Request resolved: https://github.com/pytorch/pytorch/pull/97948 Approved by: https://github.com/andrewor14	2023-03-31 05:07:58 +00:00
maxren	f9ca48ddb5	[Executorch][XNNPACK] Quantized hardtanh (#97387 ) Lower Quantized Hardtanh to XNNPACK Also add symmetric quantization support for hardtanh in executorch backend config Differential Revision: [D43901222](https://our.internmc.facebook.com/intern/diff/D43901222/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/97387 Approved by: https://github.com/salilsdesai	2023-03-31 04:58:24 +00:00
Aaron Gokaslan	47dca20d80	[BE] Enable flake8-comprehension rule C417 (#97880 ) Enables flake8-comprehension rule C417. Ruff autogenerated these fixes to the codebase. Pull Request resolved: https://github.com/pytorch/pytorch/pull/97880 Approved by: https://github.com/ezyang, https://github.com/kit1980, https://github.com/albanD	2023-03-30 14:34:24 +00:00
Jerry Zhang	15271d353a	[quant][pt2e] Support convtranspose + bn fusion (#97933 ) Summary: This PR extends `_fuse_conv_bn_` function to support fusing convtranspose and bn Test Plan: python test/test_quantization.py TestQuantizePT2E.test_transposed_conv_bn_fusion Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/97933 Approved by: https://github.com/vkuzo	2023-03-30 07:02:39 +00:00
PyTorch MergeBot	8e5c5d2023	Revert "Propogate dynamo shape_env to make_fx (#96437 )" This reverts commit `3a22916c7a`. Reverted https://github.com/pytorch/pytorch/pull/96437 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally	2023-03-29 23:47:59 +00:00
Tugsbayasgalan Manlaibaatar	3a22916c7a	Propogate dynamo shape_env to make_fx (#96437 ) Currently, when we use assume_static_by_default flag, dynamo won't produce any symbols for input tensors. But when we pass the dynamo generated graph onto make_fx via torchdynamo.export(aten_graph=True), there is no way to pass this flag. We enable this by directly passing the fake tensors dynamo used to make_fx and call make_fx with "real" mode with fake tensors from dynamo. Note that this is modified version of (https://github.com/pytorch/pytorch/pull/96143) Differential Revision: [D43994693](https://our.internmc.facebook.com/intern/diff/D43994693) Pull Request resolved: https://github.com/pytorch/pytorch/pull/96437 Approved by: https://github.com/jansel, https://github.com/ezyang	2023-03-29 22:34:37 +00:00
Aaron Gokaslan	597b558c51	[BE]: Update flake8 and plugins and fix bugs (#97795 ) Update flake8 and flake8-plugins in lintrunner to a modern version. Enables more checks and makes flake8 checks significantly faster. Added a few additional rule ignores that will need to be fixed in the future. Pull Request resolved: https://github.com/pytorch/pytorch/pull/97795 Approved by: https://github.com/alexsio27444, https://github.com/janeyx99, https://github.com/ezyang	2023-03-28 23:51:55 +00:00
Xia, Weiwen	08766b23de	[Quant][FX] lower ConvTranspose3d (#97125 ) Summary Enable quantization and lowering of `ConvTranspose3d`. Add test cases for `ConvTranspose1d`, `ConvTranspose2d` and `ConvTranspose3d` since there were no such test cases. Test plan python test/test_quantization.py -k test_conv_transpose_not_reference python test/test_quantization.py -k test_conv_transpose_reference Pull Request resolved: https://github.com/pytorch/pytorch/pull/97125 Approved by: https://github.com/jgong5, https://github.com/jerryzh168	2023-03-28 11:58:29 +00:00
leslie-fang-intel	a6d8c70933	Init quantization backend config for inductor (#96476 ) Summary Init the backend config file with quantization recipes for quantization 2.0 inductor path. In this PR, we only init the recipe for `convolution` and `convolution_relu`. Test Plan ``` clear && python -m pytest test_quantization.py -k test_inductor_backend_config_conv ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/96476 Approved by: https://github.com/jgong5, https://github.com/EikanWang, https://github.com/jerryzh168	2023-03-22 07:56:56 +00:00
Xia, Weiwen	e8be6d813b	[Quant][FX] Fix issue of lowering weighted functional ops with kwargs (#95865 ) Fixes #95492 Summary This PR fixes the issue that weighted functional ops with kwargs are not lowered correctly since kwargs are ignored. These kwargs should be moved from the functional op to its corresponding prepack op, e.g., from `F.conv2d` to `quantized.conv2d_prepack`. Test plan python test/test_quantization.py -k test_lowering_functional_conv_with_kwargs python test/test_quantization.py -k test_lowering_functional_conv_transpose_with_kwargs python test/test_quantization.py -k test_lowering_functional_linear_with_kwargs Pull Request resolved: https://github.com/pytorch/pytorch/pull/95865 Approved by: https://github.com/jgong5, https://github.com/supriyar	2023-03-21 05:29:03 +00:00
Nitin Jain	40df3b41aa	[AO] Update qLSTM implementation to remove unsupported backend ops (#96436 ) Summary: The reference quantized LSTM implementation uses unbind and in-place squeeze, both of which are not supported when building BoltNN's Espresso IR graph. This change adjusts the reference AO Quantizable LSTM implementation without affecting numerics while enabling removal of unsupported ops in BoltNN. Modifications & Adjustments 1. Unbind ops appear when unstacking a tensor in a loop. Replaced this by getting the first dim from shape and looping using a ranged index. 2. Removed unbind op calls: the pattern `[x = t.unbind(0) -> x[i]]` can be replaced by `t[i]`, as creating a tuple from unbind is unnecessary. 3. Uses of in-place `squeeze_` that were not required have been replaced by `squeeze`. See notebook N3235193, which was used for testing the quantization flow and inspecting the torch scripted quantized model for the set of ops used (see last cell). Test Plan: N3235193 Reviewed By: andrewor14 Differential Revision: D43935389 Pull Request resolved: https://github.com/pytorch/pytorch/pull/96436 Approved by: https://github.com/andrewor14	2023-03-14 17:58:34 +00:00
andrewor14	ca7e53324f	[Quant][fx] Remove unused is_qat args in prepare_fx (#96631 ) Test Plan: python test/test_quantization.py TestQuantizeFx Reviewers: vkuzo, jcaip Subscribers: vkuzo, jcaip Pull Request resolved: https://github.com/pytorch/pytorch/pull/96631 Approved by: https://github.com/vkuzo	2023-03-14 00:33:18 +00:00
chenxujun	6a492908cc	Update conv_fused.py (#95551 ) Fix typos in conv_fused.py Pull Request resolved: https://github.com/pytorch/pytorch/pull/95551 Approved by: https://github.com/Skylion007, https://github.com/kit1980, https://github.com/malfet	2023-03-13 23:42:34 +00:00
yiliu30	2ea0cb1207	Fix the typo for the docstring of args in the observer (#95887 ) This PR fixes the typo in `torch.ao.quantization.observer.py` Pull Request resolved: https://github.com/pytorch/pytorch/pull/95887 Approved by: https://github.com/kit1980	2023-03-13 23:03:57 +00:00
Vasiliy Kuznetsov	cdab1d676c	pt2e short term quant: respect qmin/qmax for linear weight (#96232 ) Summary: Makes the `nnqr.Linear` module respect the qmin/qmax attributes of weight observer. This is to unblock some customer teams who are depending on non-default values of these attributes. Test plan: ``` python test/test_quantization.py -k TestReferenceQuantizedModule.test_linear_decomposed ``` Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/96232 Approved by: https://github.com/andrewor14	2023-03-10 04:46:20 +00:00
andrewor14	faa4cb29b2	[Quant][fx] Create new FX-based LSTM reference module (#96343 ) Summary: The previous LSTM reference module implementation did not handle dtypes other than quint8 correctly. This is because the internal LSTM custom module quantization used eager mode, which did not insert the q-dq ops properly. E.g., we want the following reference quantized model: ``` [dq -> linear1_fp32 -> q_to_qint32] -> dq -> q_to_quint8 -> [dq - linear2_fp32 -> q_to_quint8] -> dq -> ... ``` This requires two sets of `q - dq` pairs between two adjacent ops that have different dtypes (linear1 and linear2). However, these `q - dq` pairs were not inserted in the old flow, because eager mode required users to insert Quant/DeQuantStubs manually. This commit changes the internal LSTM custom module quantization to use FX graph mode quantization, which automatically inserts the `q - dq` ops that convert the dtypes between adjacent ops correctly. However, using FX graph mode quantization here comes with its own set of challenges that required some hacks to get the end-to-end flow to work. These hacks are detailed in the comments in the util functions. Test Plan: python test/test_quantization.py TestQuantizeFx.test_static_lstm_with_custom_fixed_qparams This commit also updates the corresponding test to verify the dtypes as well as the qparams in the reference quantized graph. This test case should serve as an example for users to set up their own LSTM reference module flows. Reviewers: vkuzo, supriyar, jcaip Subscribers: vkuzo, supriyar, jcaip Pull Request resolved: https://github.com/pytorch/pytorch/pull/96343 Approved by: https://github.com/vkuzo	2023-03-09 23:23:48 +00:00
Jiaxu Zhu	08fb13db65	[Quant] Add lowering for pixel_unshuffle/narrow (#96160 ) Summary: torch.nn.functional.pixel_unshuffle and torch.narrow accept both float and quantized inputs. However, previously we would unnecessarily dequantize quantized inputs into floats before passing them to the function. This commit fixes this by lowering the patterns [dequant - pixel_unshuffle - quant] and [dequant - narrow - quant]. Test Plan: ``` python test/test_quantization.py TestQuantizeFxOps.test_pixel_unshuffle ``` ``` python test/test_quantization.py TestQuantizeFxOps.test_narrow ``` Differential Revision: D43858199 Pull Request resolved: https://github.com/pytorch/pytorch/pull/96160 Approved by: https://github.com/andrewor14	2023-03-08 05:25:03 +00:00
Xia, Weiwen	f3c25cd348	[Quant][PT2.0] fix issues for rearranging weight observer for decomposed linear (#94296 ) Summary Linear is decomposed to `t - addmm/mm` after `dynamo.export`. And weight's observer is inserted between `t` and `addmm/mm` in the first place. `_rearrange_weight_observer_for_addmm()` is then called to move the observer between weight and `t`. ``` before: weight - t - observer \ input - observer - addmm/mm after: weight - observer - t \ input - observer - addmm/mm ``` We found two issues of `_rearrange_weight_observer_for_addmm()`: - It does not call `m.recompile()` in the end, so it does not function correctly. - It does not support `aten.mm.default` which is from decomposed linear without bias. This PR fixes the two issues and renames the function to `_rearrange_weight_observer_for_decomposed_linear`. Test plan python test/test_quantization.py -k test_rearrange_weight_observer_for_decomposed_linear Pull Request resolved: https://github.com/pytorch/pytorch/pull/94296 Approved by: https://github.com/jgong5, https://github.com/andrewor14	2023-03-03 15:54:11 +00:00
Kazuaki Ishizaki	b3d8fae042	Fix typos in documents under torch directory (#95709 ) This PR fixes typo in `.md` files under `torch` directory Pull Request resolved: https://github.com/pytorch/pytorch/pull/95709 Approved by: https://github.com/Skylion007, https://github.com/kit1980	2023-03-01 23:43:35 +00:00
Xuehai Pan	ef731cdaf0	[2/3] Update `.pyi` Python stub files: Prettify `rnn.py` by using type annotated `NamedTuple` (#95267 ) Changes: - #95200 1. Recognize `.py.in` and `.pyi.in` files as Python in VS Code for a better development experience. 2. Fix deep setting merge in `tools/vscode_settings.py`. - => this PR: #95267 3. Use `NamedTuple` rather than `namedtuple + __annotations__` for `torch.nn.utils.rnn.PackedSequence_`: `namedtuple + __annotations__`: ```python PackedSequence_ = namedtuple('PackedSequence_', ['data', 'batch_sizes', 'sorted_indices', 'unsorted_indices']) # type annotation for PackedSequence_ to make it compatible with TorchScript PackedSequence_.__annotations__ = {'data': torch.Tensor, 'batch_sizes': torch.Tensor, 'sorted_indices': Optional[torch.Tensor], 'unsorted_indices': Optional[torch.Tensor]} ``` `NamedTuple`: Python 3.6+ ```python class PackedSequence_(NamedTuple): data: torch.Tensor batch_sizes: torch.Tensor sorted_indices: Optional[torch.Tensor] unsorted_indices: Optional[torch.Tensor] ``` - #95268 4. Sort import statements and remove unnecessary imports in `.pyi`, `.pyi.in` files. 5. Format `.pyi`, `.pyi.in` files and remove unnecessary ellipsis `...` in type stubs. Pull Request resolved: https://github.com/pytorch/pytorch/pull/95267 Approved by: https://github.com/janeyx99	2023-03-01 19:37:23 +00:00
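The two declaration styles contrasted in #95267 can be tried without PyTorch. The following is a minimal sketch, assuming plain `List[...]` types as stand-ins for `torch.Tensor`; the class names `PackedOld` and `PackedNew` are hypothetical and do not appear in the PR.

```python
from collections import namedtuple
from typing import List, NamedTuple, Optional

# Old style: an untyped namedtuple with annotations attached afterwards.
PackedOld = namedtuple("PackedOld", ["data", "batch_sizes", "sorted_indices"])
PackedOld.__annotations__ = {
    "data": List[float],
    "batch_sizes": List[int],
    "sorted_indices": Optional[List[int]],
}


# New style: typing.NamedTuple declares field names and types in one place.
class PackedNew(NamedTuple):
    data: List[float]
    batch_sizes: List[int]
    sorted_indices: Optional[List[int]]


old = PackedOld([1.0, 2.0], [2], None)
new = PackedNew([1.0, 2.0], [2], None)
```

Both objects behave as ordinary tuples with the same field names; the class-based form simply keeps the names and their types together.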
Kevin Zheng (FRL)	f1dbfe2f2a	[ao][fx] Enable observed -> quantized float for static quantized MultiheadAttention (#95636 ) Test Plan: Sandcastle cc andrewor14 any suggestions here? Differential Revision: D43631794 Pull Request resolved: https://github.com/pytorch/pytorch/pull/95636 Approved by: https://github.com/andrewor14	2023-02-28 20:50:19 +00:00
Jacob Szwejbka	fc324d3485	[quant][pt2e] Add support for dynamic quantization with symmetric quant for input (#94854 ) Summary: Previously we assumed asymmetric quantization for dynamic quantization, this diff adds the support of symmetric quantization for the input in dynamic quantization Test Plan: buck run executorch/exir/tests:quant_lowering_custom_backend_pass -- "executorch.exir.tests.test_quant_lowering_custom_backend_pass.TestQuantLoweringCustomBackendPass.test_quantized_linear_dynamic" Reviewed By: digantdesai Differential Revision: D43134794 Pull Request resolved: https://github.com/pytorch/pytorch/pull/94854 Approved by: https://github.com/digantdesai	2023-02-28 19:39:31 +00:00
andrewor14	a3b505c55e	[Quant] Fix setting fixed qparams for inner LSTM ops (#95537 ) Summary: The existing util function did not quantize all inner ops in the quantizable LSTM module, resulting in the error "Could not run X with arguments from the 'QuantizedCPU' backend." This commit fixes this by ensuring that all the other ops whose qparams were not specifically configured are still quantized as before, as in `torch.ao.nn.quantizable.LSTM.from_float`. Test Plan: This commit also adds an additional check in the test to ensure that the final converted model is in fact quantized, in addition to just checking the qparams in the observers have the right values. python test/test_quantization.py TestQuantizeFx.test_static_lstm_with_custom_fixed_qparams Reviewers: vkuzo Subscribers: vkuzo, supriyar Pull Request resolved: https://github.com/pytorch/pytorch/pull/95537 Approved by: https://github.com/vkuzo	2023-02-27 19:08:51 +00:00
Kazuaki Ishizaki	31ce32b03d	Fix typos in documents under torch (#95597 ) This PR fixes typos of documents in `.md` files under `torch` directory. Pull Request resolved: https://github.com/pytorch/pytorch/pull/95597 Approved by: https://github.com/ezyang	2023-02-27 19:07:47 +00:00
leslie-fang-intel	d89bfa16e7	[quant] add serialization method for quantized hardswish (#94486 ) Summary Fix the issue: https://github.com/pytorch/pytorch/issues/91877. The root cause is that the serialization and deserialization methods for `state_dict` are not enabled for `QuantizedHardswish`. This PR adds these methods. Test plan ``` python -m pytest quantization/core/test_quantized_module.py -k test_hard_swish ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/94486 Approved by: https://github.com/jgong5, https://github.com/vkuzo	2023-02-24 04:43:27 +00:00
Jesse Cai	cba8b12fa7	[quant][bug fix] Fix qrange_len in `torch.ao.quantization.utils.py` (#95297 ) Summary: It looks like there is a typo and qrange_len should be 2^32 instead of 2^31, as it is currently set. Test Plan: ``` python test/test_quantization.py TestObserver.test_per_tensor_observers ``` Reviewers: Subscribers: Tasks: https://github.com/pytorch/pytorch/issues/95295 Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/95297 Approved by: https://github.com/vkuzo	2023-02-23 20:23:45 +00:00
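The arithmetic behind the 2^32 figure can be checked directly. This is a sketch assuming a signed two's-complement range; `signed_range_len` is a hypothetical helper written for illustration, not the code in `torch.ao.quantization.utils`.

```python
def signed_range_len(num_bits: int) -> int:
    """Count the distinct integers in a signed num_bits-wide range."""
    quant_min = -(2 ** (num_bits - 1))      # e.g. -2**31 for 32 bits
    quant_max = 2 ** (num_bits - 1) - 1     # e.g. 2**31 - 1 for 32 bits
    return quant_max - quant_min + 1        # inclusive on both ends


len_32 = signed_range_len(32)
len_8 = signed_range_len(8)
```

The range spans 2^31 negative values plus 2^31 non-negative values, so its length is 2^32; using 2^31 would cover only half of it.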
Sergii Dymchenko	f98733e976	Fix disbale typos (#95322 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/95322 Approved by: https://github.com/clee2000	2023-02-23 02:08:45 +00:00
ydwu4	4d753b5045	[WIP][dynamo] simplify module_key creation logic (#94945 ) After some thought, I find it difficult to come up with a robust naming convention that satisfies the following constraints at the same time: 1. the new name should be a valid nn.Module attribute (as required by minifier and it's a good thing to have in general) 2. it can cover various cases such as GetItemSource, GetAttrSource 3. it's easy to recover the original path 4. robust to users' naming scheme. Thanks to @yanboliang for pointing out the original access path is preserved in Source, now we just need to add an additional value source.name() to node.meta["nn_module_stack"] to get the access path in original module. We also address some TODO in quantization, which relies on the original naming convention in nn_module_stack. Pull Request resolved: https://github.com/pytorch/pytorch/pull/94945 Approved by: https://github.com/jansel, https://github.com/yanboliang	2023-02-20 07:28:04 +00:00
andrewor14	4fc277c338	[Quant] Add lowering for pixel_shuffle (#94769 ) Summary: `torch.nn.functional.pixel_shuffle` accepts both float and quantized inputs. However, previously we would unnecessarily dequantize quantized inputs into floats before passing them to the function. This commit fixes this by lowering the pattern [dequant - pixel_shuffle - quant]. Test Plan: python test/test_quantization.py TestQuantizeFxOps.test_pixel_shuffle Reviewers: vkuzo Subscribers: vkuzo, supriyar Pull Request resolved: https://github.com/pytorch/pytorch/pull/94769 Approved by: https://github.com/vkuzo	2023-02-17 23:11:17 +00:00
PyTorch MergeBot	641dc0b844	Revert "[quant] Add quantize and dequantize operators to decomposition table (#93312 )" This reverts commit `782e4f5c02`. Reverted https://github.com/pytorch/pytorch/pull/93312 on behalf of https://github.com/jeanschmidt due to this commits breaks internal builds: https://fburl.com/sandcastle/dw0rqcbv	2023-02-13 09:20:37 +00:00
Jacob Szwejbka	2628901033	[Executorch][Quant] Add Choose_qparams_symmetric (#94685 ) Summary: needed for symmetric dynamic quant flow Test Plan: todo Reviewed By: jerryzh168 Differential Revision: D43134117 Pull Request resolved: https://github.com/pytorch/pytorch/pull/94685 Approved by: https://github.com/larryliu0820	2023-02-13 07:27:48 +00:00
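For orientation, symmetric quantization parameters are commonly chosen so that zero maps exactly to a zero point of 0. The sketch below uses a textbook formulation under stated assumptions (a signed int8 range, a small `eps` floor on the scale); it is not the operator added in #94685, and `choose_qparams_symmetric_sketch` is a hypothetical name.

```python
from typing import Sequence, Tuple


def choose_qparams_symmetric_sketch(
    values: Sequence[float],
    quant_min: int = -128,
    quant_max: int = 127,
    eps: float = 1e-12,
) -> Tuple[float, int]:
    """Pick (scale, zero_point) so the range is centered on zero."""
    max_abs = max(abs(v) for v in values)
    # Half the integer range covers one side of the symmetric float range.
    scale = max_abs / ((quant_max - quant_min) / 2)
    scale = max(scale, eps)  # guard against an all-zero input
    return scale, 0          # symmetric: zero point is fixed at 0


scale, zero_point = choose_qparams_symmetric_sketch([-1.0, 0.25, 0.5])
```

An asymmetric scheme would instead derive a nonzero zero point from the observed minimum and maximum.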
Xuehai Pan	046e88a291	[BE] [3/3] Rewrite `super()` calls in test (#94592 ) Rewrite Python built-in class `super()` calls. Only non-semantic changes should be applied. - #94587 - #94588 - #94592 Also, methods with only a `super()` call are removed: ```diff class MyModule(nn.Module): - def __init__(self): - super().__init__() - def forward(self, ...): ... ``` Some cases that change the semantics should be kept unchanged. E.g.: `f152a79be9/caffe2/python/net_printer.py (L184-L190)` `f152a79be9/test/test_jit_fuser_te.py (L2628-L2635)` Pull Request resolved: https://github.com/pytorch/pytorch/pull/94592 Approved by: https://github.com/ezyang, https://github.com/seemethere	2023-02-12 22:20:53 +00:00
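The kind of rewrite described in this series can be illustrated as follows. This is a sketch assuming the change is from the explicit two-argument `super(Class, self)` form to the zero-argument form; the class names are hypothetical.

```python
class Base:
    def __init__(self):
        self.ready = True


class OldStyle(Base):
    def __init__(self):
        # Explicit two-argument form.
        super(OldStyle, self).__init__()


class NewStyle(Base):
    def __init__(self):
        # Zero-argument form, equivalent in this ordinary case.
        super().__init__()


class NoInit(Base):
    # An __init__ that only calls super().__init__() can be dropped;
    # Base.__init__ is inherited and runs automatically.
    pass


flags = [cls().ready for cls in (OldStyle, NewStyle, NoInit)]
```

The two forms differ when the first argument is deliberately a class other than the enclosing one, which is the sort of case the message says should be kept unchanged.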
Aaron Gokaslan	67d9790985	[BE] Apply almost all remaining flake8-comprehension checks (#94676 ) Applies the remaining flake8-comprehension fixes and checks. This change replaces all remaining unnecessary generator expressions with list/dict/set comprehensions, which are more succinct, performant, and better supported by our torch.jit compiler. It also removes useless generators such as `set(a for a in b)`, resolving it into just the set call. Pull Request resolved: https://github.com/pytorch/pytorch/pull/94676 Approved by: https://github.com/ezyang	2023-02-12 01:01:25 +00:00
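The rewrites described here are small but mechanical. A minimal sketch of the three cases mentioned, using made-up data:

```python
b = [3, 1, 3, 2]

# Useless generator: the expression adds nothing, so call set() directly.
before_set = set(a for a in b)
after_set = set(b)

# Generator passed to list(): a list comprehension is the direct form.
before_list = list(a * 2 for a in b)
after_list = [a * 2 for a in b]

# Generator of pairs passed to dict(): a dict comprehension is the direct form.
before_dict = dict((a, a * a) for a in b)
after_dict = {a: a * a for a in b}
```

Each pair produces equal results; only the construction differs.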
Aaron Gokaslan	3d82d8d0ed	[BE] Enable more flake8-comprehensions checks (#94601 ) I applied some flake8 fixes and enabled checking for them in the linter. I also enabled some checks for my previous comprehensions PR. This is a follow up to #94323 where I enable the flake8 checkers for the fixes I made and fix a few more of them. Pull Request resolved: https://github.com/pytorch/pytorch/pull/94601 Approved by: https://github.com/ezyang	2023-02-10 23:40:29 +00:00
Xuehai Pan	5b1cedacde	[BE] [2/3] Rewrite `super()` calls in functorch and torch (#94588 ) Rewrite Python built-in class `super()` calls. Only non-semantic changes should be applied. - #94587 - #94588 - #94592 Also, methods with only a `super()` call are removed: ```diff class MyModule(nn.Module): - def __init__(self): - super().__init__() - def forward(self, ...): ... ``` Some cases that change the semantics should be kept unchanged. E.g.: `f152a79be9/caffe2/python/net_printer.py (L184-L190)` `f152a79be9/test/test_jit_fuser_te.py (L2628-L2635)` Pull Request resolved: https://github.com/pytorch/pytorch/pull/94588 Approved by: https://github.com/ezyang, https://github.com/albanD	2023-02-10 21:16:33 +00:00
Jerry Zhang	782e4f5c02	[quant] Add quantize and dequantize operators to decomposition table (#93312 ) Summary: This PR tries to decompose the operators in torch.ops.quantized_decomposed namespace to more primitive aten operators, this would free us from maintaining the semantics of the quantize/dequantize operators, which can be expressed more precisely in terms of underlying aten operators. Note: this PR just adds them to the decomposition table, we haven't enabled this by default yet Test Plan: python test/test_quantization.py TestQuantizePT2E.test_q_dq_decomposition Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/93312 Approved by: https://github.com/vkuzo, https://github.com/SherlockNoMad	2023-02-10 01:40:12 +00:00
Jerry Zhang	2394e6baa9	[quant][fx] Change prepare_fx and convert_fx to preserve the GraphModule type of input (#94412 ) Summary: Previously prepare_fx returns an ObservedGraphModule and convert_fx returns a QuantizedGraphModule, this is to preserve the attributes since torch.fx.GraphModule did not preserve them, after https://github.com/pytorch/pytorch/pull/92062 we are preserving `model.meta`, so we can store the attributes in model.meta now to preserve them. With this, we don't need to create a new type of GraphModule in these functions and can use GraphModule directly, this is useful for quantization in pytorch 2.0 flow, if other transformations are using GraphModule as well, the quantization passes will be composable with them Test Plan: python test/test_quantization.py TestQuantizeFx python test/test_quantization.py TestQuantizeFxOps python test/test_quantization.py TestQuantizeFxModels python test/test_quantization.py TestQuantizePT2E Imported from OSS Differential Revision: D42979722 Pull Request resolved: https://github.com/pytorch/pytorch/pull/94412 Approved by: https://github.com/vkuzo	2023-02-09 23:03:23 +00:00
Xuehai Pan	a229b4526f	[BE] Prefer dash over underscore in command-line options (#94505 ) Preferring dash over underscore in command-line options. Add `--command-arg-name` to the argument parser. The old arguments with underscores `--command_arg_name` are kept for backward compatibility. Both dashes and underscores are used in the PyTorch codebase. Some argument parsers only have dashes or only have underscores in arguments. For example, the `torchrun` utility for distributed training only accepts underscore arguments (e.g., `--master_port`). The dashes are more common in other command-line tools. And it looks to be the default choice in the Python standard library: `argparse.BooleanOptionalAction`: `4a9dff0e5a/Lib/argparse.py (L893-L895)` ```python class BooleanOptionalAction(Action): def __init__(...): if option_string.startswith('--'): option_string = '--no-' + option_string[2:] _option_strings.append(option_string) ``` It adds `--no-argname`, not `--no_argname`. Also, typing `_` requires pressing the shift or caps-lock key, unlike `-`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/94505 Approved by: https://github.com/ezyang, https://github.com/seemethere	2023-02-09 20:16:49 +00:00
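Keeping both spellings is straightforward with `argparse`, since one argument can carry several option strings. A minimal sketch, reusing the placeholder name `--command-arg-name` from the message above (the value `"x"` is made up):

```python
import argparse

parser = argparse.ArgumentParser()
# argparse derives dest from the first long option and converts dashes to
# underscores, so both spellings populate args.command_arg_name.
parser.add_argument("--command-arg-name", "--command_arg_name", default=None)

dashed = parser.parse_args(["--command-arg-name", "x"])
underscored = parser.parse_args(["--command_arg_name", "x"])
```

Listing the dashed spelling first also makes it the one shown first in `--help` output.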
Jacob Szwejbka	bb48d90b00	[Executorch][Quant][BE] Refactor Choose_Qparams (#94338 ) Summary: Refactor so that it can be decomposed Test Plan: ci Differential Revision: D42681268 Pull Request resolved: https://github.com/pytorch/pytorch/pull/94338 Approved by: https://github.com/jerryzh168	2023-02-09 01:20:17 +00:00
Aaron Gokaslan	1e2d82b8e4	[BE] Merge isinstance calls together (#94419 ) Simplifies and speeds up isinstance calls by checking for multiple types at the same time. Pull Request resolved: https://github.com/pytorch/pytorch/pull/94419 Approved by: https://github.com/ezyang	2023-02-09 00:47:26 +00:00
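`isinstance` accepts a tuple of types as its second argument, which is what makes this merge possible. A minimal sketch with hypothetical helper names:

```python
def is_number_before(x) -> bool:
    # Two separate calls joined with `or`.
    return isinstance(x, int) or isinstance(x, float)


def is_number_after(x) -> bool:
    # One call that checks against a tuple of types.
    return isinstance(x, (int, float))


samples = [1, 2.5, "3", None]
before = [is_number_before(s) for s in samples]
after = [is_number_after(s) for s in samples]
```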
PyTorch MergeBot	3a5a762443	Revert "[quant] Add quantize and dequantize operators to decomposition table (#93312 )" This reverts commit `3fd46a2f9c`. Reverted https://github.com/pytorch/pytorch/pull/93312 on behalf of https://github.com/huydhn due to Sorry for reverting your PR, but it breaks trunk due to a landrace `3fd46a2f9c`. Please rebase and re-land it	2023-02-08 18:29:10 +00:00
Jerry Zhang	3fd46a2f9c	[quant] Add quantize and dequantize operators to decomposition table (#93312 ) Summary: This PR tries to decompose the operators in torch.ops.quantized_decomposed namespace to more primitive aten operators, this would free us from maintaining the semantics of the quantize/dequantize operators, which can be expressed more precisely in terms of underlying aten operators. Note: this PR just adds them to the decomposition table, we haven't enabled this by default yet Test Plan: python test/test_quantization.py TestQuantizePT2E.test_q_dq_decomposition Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/93312 Approved by: https://github.com/vkuzo, https://github.com/SherlockNoMad	2023-02-08 17:26:01 +00:00
Jerry Zhang	cd057390b5	[quant][fx][pt2e] cleanup the args for some helper functions (#94352 ) Summary: att Test Plan: python test/test_quantization.py TestQuantizeFx Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/94352 Approved by: https://github.com/vkuzo	2023-02-08 08:39:21 +00:00
Aaron Gokaslan	3ce1ebb6fb	Apply some safe comprehension optimizations (#94323 ) Optimize unnecessary collection cast calls, unnecessary calls to list, tuple, and dict, and simplify calls to the sorted builtin. This should strictly improve speed and improve readability. Pull Request resolved: https://github.com/pytorch/pytorch/pull/94323 Approved by: https://github.com/albanD	2023-02-07 23:53:46 +00:00
Aaron Gokaslan	8fce9a09cd	[BE]: pyupgrade Python to 3.8 - imports and object inheritance only (#94308 ) Apply parts of pyupgrade to torch (starting with the safest changes). This PR only does two things: removes the need to inherit from object and removes unused future imports. Pull Request resolved: https://github.com/pytorch/pytorch/pull/94308 Approved by: https://github.com/ezyang, https://github.com/albanD	2023-02-07 21:10:56 +00:00
Jerry Zhang	59c1b5025f	[quant][fx][pt2e] Refactor prepare so it's aligned better with the new API plan in pt2e (#94011 ) Summary: There are three things that happen in the current prepare code: (1). users express their intention of how they want the model to be quantized with QConfigMapping, and we translate that to node.meta["target_dtype_info"] (2). we validate the setting against BackendConfig (3). we insert observers based on the validated node.meta["target_dtype_info"]. Previously (2) and (3) were mixed together; this PR tries to move (2) closer to (1), with one edge case left. This refactor moves us closer to our target design for quantization in the pytorch 2.0 export path. This is a follow up PR for https://github.com/pytorch/pytorch/pull/92641 Test Plan: python test/test_quantization.py TestQuantizeFx python test/test_quantization.py TestQuantizeFxOps python test/test_quantization.py TestQuantizeFxModels Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/94011 Approved by: https://github.com/vkuzo	2023-02-07 08:23:56 +00:00
Vasiliy Kuznetsov	f15ab8a7f2	AO migration: replace torch internal callsites (#94170 ) Summary: Do the following renames: `torch.quantization` -> `torch.ao.quantization` `torch.nn.quantized` -> `torch.ao.nn.quantized` `torch.nn.quantizable` -> `torch.ao.nn.quantizable` `torch.nn.qat` -> `torch.ao.nn.qat` `torch.nn.intrinsic` -> `torch.ao.nn.intrinsic` And then, do `torch.ao.nn.quantized._reference` -> `torch.ao.nn.quantized.reference` to clean up the aftermath of https://github.com/pytorch/pytorch/pull/84974 Then, manually update `test/test_module_init.py` to fix hanging whitespace due to the replace. Run this script to do the replacements: https://gist.github.com/vkuzo/7f7afebf8c31b9ba48306223e68a1c82 This is for https://github.com/pytorch/pytorch/issues/81667 Test plan: CI Pull Request resolved: https://github.com/pytorch/pytorch/pull/94170 Approved by: https://github.com/jerryzh168	2023-02-07 02:32:23 +00:00
Vasiliy Kuznetsov	f84f89b1c3	ns: add compare_weights API with a single model (#92058 ) Summary: Adds a compare weights NS API using a single model. Note: this is not intended for wide usage, so testing is limited to specific functions our customers care about. The main reason for adding this is because existing customers of NS are using the old `compare_weights` API, and we'd like to move everyone to a single-model API style. Once all the customers are moved over, we can delete all the old NS code. Test plan: ``` python test/test_quantization.py -k NShadows.test_extract_weights_linear ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/92058 Approved by: https://github.com/jerryzh168	2023-02-03 01:17:19 +00:00
Vasiliy Kuznetsov	660bea10ba	add add_loggers implementation using PNP (#91639 ) Summary: This PR reimplements the old `add_loggers(name_a, model_a, name_b, model_b)` API in a single-model API style, similar to PNP. This allows for memory efficiency savings of not having to load two models. Test plan: ``` python test/test_quantization.py -k NShadows.test_add_loggers ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/91639 Approved by: https://github.com/jerryzh168	2023-02-03 01:17:19 +00:00
Jesse Cai	86ab4d49d4	[pruning][core][feature] LSTM Structured Pruning prune_functions + pattern (#90801 ) Summary: This PR adds in support for LSTM Structured Pruning. - Adds in LSTMSaliencyPruner, an implemented pruner that splits the packed weights, finds the appropriate mask for each piece individually based on saliency, and then combines to create an overall mask for the LSTM. - Adds in pruning functions for LSTM pruning, which will split the weights, apply the masks, and then recombine the pruned weights. Works for both single and multiple-layer LSTMs. Also added a basic pattern to the default set of patterns for LSTM -> Linear pruning LSTM -> LayerNorm -> Linear pruning Adds in test to check that LSTM pruning works, as well as for LSTMSaliencyPruner Test Plan: `python test/test_ao_sparsity.py -- TestBaseStructuredSparsifier.test_prune_lstm_linear_single_layer` `python test/test_ao_sparsity.py -- TestBaseStructuredSparsifier.test_prune_lstm_linear_multiple_layer` `python test/test_ao_sparsity.py -- TestBaseStructuredSparsifier.test_prune_lstm_layernorm_linear_single_layer` `python test/test_ao_sparsity.py -- TestBaseStructuredSparsifier.test_prune_lstm_layernorm_linear_multiple_layer` `python test/test_ao_sparsity.py -- TestSaliencyPruner.test_lstm_saliency_pruner_update_mask` Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D42199001](https://our.internmc.facebook.com/intern/diff/D42199001) Pull Request resolved: https://github.com/pytorch/pytorch/pull/90801 Approved by: https://github.com/jerryzh168	2023-02-01 19:29:03 +00:00
Vasiliy Kuznetsov	6fe234ecc4	pnp: move shadow loggers to parent module (#91428 ) Summary: Before this PR, PNP added shadow loggers to insides of the shadow wrapper modules. This PR moves those loggers to the parent module. There are a couple of benefits: 1. this will unbreak features of quantization API which don't support loggers (such as hardcoding model output to be quantized) 2. this makes it easier to look at the parent graph and visualize what is logged, since now all the logging is in the same graph 3. this will make it easier to implement features such as propagation error calculation in the future Test plan: ``` python test/test_quantization.py -k NShadows ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/91428 Approved by: https://github.com/jerryzh168	2023-02-01 18:34:04 +00:00
leslie-fang-intel	0f802eedc2	[Quant][FX] Lower QConvAddReLU2d for onednn backend (#91155 ) Summary Add quantization mappings for QConvAddReLU2d for int8 inference for onednn backend. The fusion and lowering is supported only in FX mode. Test plan ``` python -m pytest test_quantization.py -k test_fuse_conv_bn_add_relu_onednn python -m pytest test_quantization.py -k test_fuse_conv_bn_add_relu_by_default python -m pytest test_quantization.py -k test_fuse_conv_bn_add_relu_lowering ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/91155 Approved by: https://github.com/jgong5, https://github.com/jerryzh168	2023-02-01 01:18:52 +00:00
leslie-fang-intel	e77f28a03d	[Quant] Add fused ConvAddReLU2d module for onednn backend (#91154 ) Summary Post op fusion can reduce data movement overhead and improve inference performance. This PR adds fused ConvAddReLU2d module for onednn backend, which will be used for int8 inference with onednn backend. Cannot call this module with other quantization backends otherwise an error is thrown. Test plan ``` python -m pytest test_quantization.py -k test_conv2d_add_relu ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/91154 Approved by: https://github.com/jgong5, https://github.com/jerryzh168	2023-02-01 01:16:23 +00:00
leslie-fang-intel	ef4118e435	[Quant][FX] Lower QConvAdd2d for onednn backend (#91153 ) Summary Add quantization mappings for QConvAdd2d for int8 inference for onednn backend. The fusion and lowering is supported only in FX mode. Test plan ``` python -m pytest test_quantization.py -k test_fuse_conv_bn_add_relu_onednn python -m pytest test_quantization.py -k test_fuse_conv_bn_add_relu_by_default python -m pytest test_quantization.py -k test_fuse_conv_bn_add_relu_lowering ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/91153 Approved by: https://github.com/jgong5, https://github.com/jerryzh168	2023-02-01 01:14:12 +00:00
leslie-fang-intel	53c3555a6a	[Quant] Add fused ConvAdd2d module for onednn backend (#91152 ) Summary Post op fusion can reduce data movement overhead and improve inference performance. This PR adds fused `ConvAdd2d` module for onednn backend, which will be used for int8 inference with onednn backend. Cannot call this module with other quantization backends otherwise an error is thrown. Test plan ``` python -m pytest test_quantization.py -k test_conv2d_add ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/91152 Approved by: https://github.com/jgong5, https://github.com/jerryzh168	2023-02-01 01:11:25 +00:00
Jerry Zhang	ae79f95cb8	[quant][fx][pt2e][refactor] Refactor prepare.py for upcoming quantize_pt2e changes (#92641 ) Summary: Changes node.meta["target_dtype_info"] to store observer/fake_quant constructors instead of (dtype, is_dynamic), so that in the future users can configure this by themselves. Follow up refactors: (1). generalized structure for "target_dtype_info": right now, we have "input_act_obs_or_fq_ctr", "weight_obs_or_fq_ctr", "bias_obs_or_fq_ctr", "output_obs_or_fq_ctr" this works OK for current use cases, and users are using a different config to specify which input is weight and which input is bias, to generalize it we should just expose an api that allows users to specify either a dictionary from input_index to obs_or_fq_ctr, and output_index to obs_or_fq_ctr, e.g. out1, (out2, out3) = op(arg0, (arg1, arg2)) "input_act_obs_or_fq_ctr" = {0: obs1, 1: obs2} "output_act_obs_or_fq_ctr" = {0: obs3, 1: obs4} note that this would not allow configuring obs/fq for nested structures or have a config that mimics the structure of arguments and output, e.g. out1, (out2, out3) = op(arg0, (arg1, arg2)), we can have "input_act_obs_or_fq_ctr" = (obs1, (obs2, obs3)) "output_act_obs_or_fq_ctr" = (obs4, (obs5, obs6)) (2). use these observer/fq directly for inserting observers instead of using qconfig (3). clean up the TODOs in the code base Test Plan: python test/test_quantization.py TestQuantizeFx Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/92641 Approved by: https://github.com/jcaip	2023-01-30 22:57:20 +00:00
Jerry Zhang	61457671a5	[quant][fx][be] Remove _input_output_observed from backend_config (#92589 ) Summary: This is no longer needed, we can use dtype to decide whether an observer is needed or not Test Plan: python test/test_quantization.py TestQuantizeFx python test/test_quantization.py TestQuantizeFxOps Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/92589 Approved by: https://github.com/jcaip	2023-01-27 22:17:05 +00:00
Xia, Weiwen	6fa84fdea2	[FX][Quant] Enable FX quant for patterns like x.view(x.size(...), ...) (#90001 ) Summary This work continues with https://github.com/pytorch/pytorch/pull/83784 by @vkuzo and includes all the changes in that PR. Quote from https://github.com/pytorch/pytorch/pull/83784: > Issue #83658 reports that ops followed by a certain pattern of `view` and `size` ops were not quantized correctly by FX graph mode quantization. Before this PR, the "size" op was in the "op shares qparams with input" category, and the code assumed that the input of this op has the same dtype as its output. This led to incorrectly propagating the `int` dtype as the output of whichever op was preceding the `view` op, which in turn made that op blocklisted from quantization. > The fix is to create a new category of ops which work on different dtypes of tensors but are not observed. This PR does so for `size`, and also for `shape` since it works the same way. Note: This PR needs https://github.com/pytorch/pytorch/pull/91297 to be landed first otherwise there is a UT failure. Test plan ``` python test/test_quantization.py -k test_linear_size_view python test/test_quantization.py -k test_linear_shape_view ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/90001 Approved by: https://github.com/jgong5, https://github.com/jerryzh168	2023-01-27 07:56:29 +00:00
Xia, Weiwen	1d03a6a901	[Quant][Fx] Fix issue: qconfig_mappings of onednn backend are not correctly set for fused modules (#91297 ) Summary For onednn quantization backend only. Currently, FX fusion requires that all separate ops in a fused module/op have the same `qconfig`. To support `linear - leaky_relu` and `linear - tanh` fusion with onednn backend, we previously explicitly set the same `qconfig` to `linear`, `leaky_relu` and `tanh`. However, this brings two problems: - It breaks fusion of `linear - relu` since `relu` does not have the same `qconfig` as `linear` does. And it does not look good if we set `qconfig` to all these ops. They should use a global `qconfig` by default. - `Tanh` requires `fixed_qparams_qconfig` otherwise it is not quantized. So, we cannot set another `qconfig` to `tanh`. Looks like there is not a straightforward way to solve the problems. This PR fixes them by the following: - Do not set `qconfig` to these ops so that these ops use a global `qconfig` and `linear - relu` and `linear - leaky_relu` can be fused correctly. - Set the same `qconfig` to `linear` and `tanh` manually by users when they want to fuse `linear - tanh` with onednn backend. A known issue still exists: users cannot fuse `linear - tanh` and quantize standalone `tanh` at the same time. Test plan python test/test_quantization.py -k test_qconfig_dict_with_fused_modules Pull Request resolved: https://github.com/pytorch/pytorch/pull/91297 Approved by: https://github.com/jgong5, https://github.com/jerryzh168	2023-01-26 09:55:34 +00:00
Michael Gschwind	7265f60ad0	Regularize mask handling for attn_mask and key_padding_mask (#92733 ) Summary: Regularize mask handling for attn_mask and key_padding_mask * Update documentation to remove reference to byte masks (which were deprecated long ago) * Introduce check and warn about deprecation if attn_mask and key_padding_mask types mismatch * Convert all masks to float before combining * Combine by adding Test Plan: sandcastle & github CI Differential Revision: D42653215 Pull Request resolved: https://github.com/pytorch/pytorch/pull/92733 Approved by: https://github.com/ngimel, https://github.com/drisspg	2023-01-24 14:12:05 +00:00
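The mask regularization described in the commit above (convert every mask to float, then combine by adding) can be sketched in plain Python. This is a minimal illustration and not the PyTorch implementation: the helper names `to_float_mask` and `combine_masks` are hypothetical, masks are reduced to a single row of a list, and it assumes the usual convention that `True` in a boolean mask marks a position that must not be attended to.

```python
NEG_INF = float("-inf")


def to_float_mask(mask):
    """Convert one mask row to an additive float mask.

    Boolean entries: True (masked out) -> -inf, False (keep) -> 0.0.
    Numeric entries are assumed to already be additive and are kept as floats.
    """
    return [(NEG_INF if m else 0.0) if isinstance(m, bool) else float(m) for m in mask]


def combine_masks(attn_mask, key_padding_mask):
    """Combine two mask rows by converting both to float and adding them."""
    a = to_float_mask(attn_mask)
    k = to_float_mask(key_padding_mask)
    return [x + y for x, y in zip(a, k)]


# A boolean attn_mask combined with a float key_padding_mask (hypothetical values).
combined = combine_masks([False, True, False], [0.0, 0.0, NEG_INF])
print(combined)
```

A position stays attendable only when both masks contribute 0.0; any -inf contribution keeps the sum at -inf, which is why plain addition is enough once both masks are floats.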
Jacob Szwejbka	eb32bb2ca6	[Executorch][Quantization] Backend Config for functional embedding (#92700 ) Summary: title Test Plan: ci Differential Revision: D42643985 Pull Request resolved: https://github.com/pytorch/pytorch/pull/92700 Approved by: https://github.com/jerryzh168	2023-01-24 03:12:56 +00:00
Nikita Shulga	c0dd9b3b67	Revert "[Executorch][Quantization][BE] Refactor Choose Qparams (#92592 )" This reverts commit `59071ab1e7`. It breaks `quantization.jit.test_ondevice_quantization.TestOnDeviceDynamicPTQFinalize`, which is not run in OSS, but is mandatory for internal CI.	2023-01-23 09:13:02 -08:00
Jerry Zhang	6016e4c707	[quant][fx][refactor] Rename modules to named_modules (#92575 ) Summary: att Test Plan: python test/test_quantization.py TestQuantizeFx Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/92575 Approved by: https://github.com/jcaip	2023-01-22 04:53:03 +00:00
Jerry Zhang	a74c8df7cd	[quant][fx][pt2e][be] Store node_name_to_target_dtype to node.meta["target_dtype_info"] (#92574 ) Summary: This is in preparation for quantize_pt2e API where we allow programmability for users to set how they want to quantize their model Test Plan: python test/test_quantization.py TestQuantizeFx python test/test_quantization.py TestQuantizePT2E Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/92574 Approved by: https://github.com/jcaip	2023-01-21 00:27:15 +00:00
Jerry Zhang	1464db08b4	[quant][pt2e] Support setting qconfig by module_type (#92355 ) Summary: This PR supports the following feature for QConfigMapping: ``` qconfig_mapping = QConfigMapping().set_object_type(torch.nn.Conv2d, qconfig) backend_config = get_qnnpack_pt2e_backend_config() m = prepare_pt2e(m, qconfig_mapping, example_inputs, backend_config) ``` which means users want to set the qconfig for all calls to `torch.nn.Conv2d` to use `qconfig`, note this is only verified for the case when the module is broken down to a single aten op right now, e.g. torch.nn.Conv2d will be torch.ops.aten.convolution op when traced through. will need to support more complicated modules that is broken down to multiple operators later, e.g. (MaxPool) Test Plan: python test/test_quantization.py TestQuantizePT2E.test_qconfig_module_type Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/92355 Approved by: https://github.com/jcaip	2023-01-20 03:18:21 +00:00
Jacob Szwejbka	59071ab1e7	[Executorch][Quantization][BE] Refactor Choose Qparams (#92592 ) Summary: Should hopefully be a little faster. Definitely cleaner to not create an observer inside the op Test Plan: ci Differential Revision: D42154677 Pull Request resolved: https://github.com/pytorch/pytorch/pull/92592 Approved by: https://github.com/jerryzh168	2023-01-20 01:36:47 +00:00
Alex Settle	f8a07ca422	Reland 2nd attempt "Add heirachical module names to torchFX graph.node" (#91721 ) Fixes #87659 Reland of PR #87742 and PR #90205 PR #90205 was reverted due to BC issues Pull Request resolved: https://github.com/pytorch/pytorch/pull/91721 Approved by: https://github.com/jerryzh168	2023-01-18 23:00:36 +00:00
Xia, Weiwen	61a7618f3c	[Quant][Eager] Copy MHA's batch_first attribute in prepare() (#91680 ) Summary Fixes #91571 MHA's batch_first attribute is not copied after `torch.quantization.prepare()`. Now we copy MHA's batch_first attribute in torch/ao/nn/quantizable/modules/activation.py: `MultiheadAttention.from_float()`. Test plan python test/test_quantization.py -k test_mha_batch_first_attr_is_copied_in_prepare Pull Request resolved: https://github.com/pytorch/pytorch/pull/91680 Approved by: https://github.com/jgong5, https://github.com/jerryzh168	2023-01-18 10:49:05 +00:00
Jerry Zhang	ec3941ada6	[quant][fx] Add support for GRU in fx graph mode quantization (#91976 ) Summary: might be needed by a meta-internal use case Test Plan: python test/test_quantization.py TestQuantizeFxOps.test_rnn Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/91976 Approved by: https://github.com/jcaip	2023-01-13 07:00:12 +00:00
andrewor14	0bd3fa3d22	[Quant][docs] Move parts of BackendConfig tutorial (#91999 ) Summary: This commit moves the API specification section of the BackendConfig tutorial to the docstrings, which is a more suitable place for this content. This change also reduces some duplication. There is no new content added in this change. Reviewers: jerryzh168, vkuzo Subscribers: jerryzh168, vkuzo Pull Request resolved: https://github.com/pytorch/pytorch/pull/91999 Approved by: https://github.com/vkuzo, https://github.com/jerryzh168	2023-01-13 05:59:22 +00:00
Harshit Khaitan	ffbd13b654	Fix for swap_custom_module_to_observer doing duplicate swaps on the same node.target (#91905 ) Summary: This is a fix for the following issue: "When two nodes in a model have the same dTypes / node.target, the torch quantization prepare_fx flow does not check for duplicates and tries to do a custom module swap twice. When it attempts the swap the same target for a second time, the swap_custom_module_to_observed detects the observed module instead of the float module class on the target, and fails on an assertion. " The added unit test demonstrates a simple example where it fails in absence of this fix. Test Plan: buck test mode/dev //caffe2/test:quantization_fx -- --exact 'caffe2/test:quantization_fx - test_custom_module_class_input_has_duplicate_nodes (quantization.fx.test_quantize_fx.TestQuantizeFx)' Reviewed By: vkuzo Differential Revision: D42023273 Pull Request resolved: https://github.com/pytorch/pytorch/pull/91905 Approved by: https://github.com/jerryzh168	2023-01-12 05:24:38 +00:00
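The failure mode described in the commit above, where two nodes share one `node.target` and the second swap hits an already observed module, can be illustrated with a small stand-alone sketch. Everything here is hypothetical (the classes `FloatCustom` and `ObservedCustom`, the function `swap_custom_modules`, and the use of a `set` to skip duplicates); the actual fix in the PR may be implemented differently.

```python
class FloatCustom:
    """Stand-in for a user's float custom module."""


class ObservedCustom:
    """Stand-in for the observed version that replaces the float module."""

    def __init__(self, float_module):
        self.float_module = float_module


def swap_custom_modules(node_targets, modules):
    """Swap each float custom module for its observed version exactly once.

    node_targets: targets of the graph nodes, possibly with duplicates.
    modules: mapping from target name to module instance (mutated in place).
    """
    swapped = set()
    for target in node_targets:
        if target in swapped:
            # A second node points at the same target: already swapped, skip it.
            continue
        # Without the duplicate check, this assertion fails on the second visit.
        assert isinstance(modules[target], FloatCustom), "expected a float module"
        modules[target] = ObservedCustom(modules[target])
        swapped.add(target)
    return modules


# Two nodes that call the same submodule, hence the same target.
modules = swap_custom_modules(["custom", "custom"], {"custom": FloatCustom()})
print(type(modules["custom"]).__name__)
```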
Jesse Cai	32e9b29ce9	[pruning][core][feature] Add in SaliencyPruner to pruner._experimental (#91814 ) Summary: This PR adds in SaliencyPruner, an implementation of L1 norm pruning for structured pruning, as well as additional tests for the SaliencyPruner The README.md references this file but I forgot to add it in earlier when writing the tutorial. Test Plan: ``` python test/test_ao_sparsity.py -- TestSaliencyPruner ``` Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/91814 Approved by: https://github.com/jerryzh168	2023-01-10 04:04:55 +00:00
leslie-fang-intel	aab55d6d0d	[Quant] Remove all the dequant nodes when the ref module has multi input args (#90157 ) Summary: When converting a ref module into a quant module, `_lower_static_weighted_ref_module` pass assumes the `ref_node` only has 1 input node, and only remove the first `dequant` node. We add a check in this PR to ensure this is the case for `_lower_static_weighted_ref_module` pass. Test Plan: We only add a check in this PR, there is no new added test case. Pull Request resolved: https://github.com/pytorch/pytorch/pull/90157 Approved by: https://github.com/Xia-Weiwen, https://github.com/jgong5, https://github.com/jerryzh168	2023-01-05 23:58:45 +00:00
Vasiliy Kuznetsov	ebb7f20afc	quant: make various configs printable (#91419 ) Summary: Makes various quantization configs print out human readable values instead of just the class name. This is useful when printing these configs out when debugging. Test plan: test script ``` conf_1 = torch.ao.quantization.backend_config.backend_config.DTypeConfig() print(conf_1) conf_2 = torch.ao.quantization.backend_config.backend_config.BackendConfig() print(conf_2) conf_3 = torch.ao.quantization.backend_config.backend_config.BackendPatternConfig() print(conf_3) conf_4 = torch.ao.quantization.fx.custom_config.PrepareCustomConfig()\ .set_input_quantized_indexes([0]) print(conf_4) conf_5 = torch.ao.quantization.fx.custom_config.ConvertCustomConfig()\ .set_preserved_attributes(['foo']) print(conf_5) conf_6 = torch.ao.quantization.fx.custom_config.FuseCustomConfig()\ .set_preserved_attributes(['foo']) print(conf_6) ``` test script output ``` DTypeConfig(input_dtype_with_constraints=DTypeWithConstraints(dtype=None, quant_min_lower_bound=None, quant_max_upper_bound=None, scale_min_lower_bound=None, scale_max_upper_bound=None, scale_exact_match=None, zero_point_exact_match=None), output_dtype_with_constraints=DTypeWithConstraints(dtype=None, quant_min_lower_bound=None, quant_max_upper_bound=None, scale_min_lower_bound=None, scale_max_upper_bound=None, scale_exact_match=None, zero_point_exact_match=None), weight_dtype_with_constraints=DTypeWithConstraints(dtype=None, quant_min_lower_bound=None, quant_max_upper_bound=None, scale_min_lower_bound=None, scale_max_upper_bound=None, scale_exact_match=None, zero_point_exact_match=None), bias_dtype=None, is_dynamic=None) BackendConfig({'name': '', '_pattern_complex_format_to_config': {}}) BackendPatternConfig({'observation_type': <ObservationType.OUTPUT_USE_DIFFERENT_OBSERVER_AS_INPUT: 0>}) PrepareCustomConfig({'input_quantized_indexes': [0]}) ConvertCustomConfig({'preserved_attributes': ['foo']}) FuseCustomConfig({'preserved_attributes': ['foo']}) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/91419 Approved by: https://github.com/andrewor14	2023-01-04 04:52:20 +00:00
joncrall	ad782ff7df	Enable xdoctest runner in CI for real this time (#83816 ) Builds on #83317 and enables running the doctests. Just need to figure out what is causing the failures. Pull Request resolved: https://github.com/pytorch/pytorch/pull/83816 Approved by: https://github.com/ezyang, https://github.com/malfet	2022-12-29 05:32:42 +00:00
Jerry Zhang	2a23dfe8ed	[quant] Support lowering for quantized embedding byte operator (#91159 ) Summary: This PR adds lowering for embedding in quantization in executorch flow Test Plan: buck run executorch/exir/tests:quant_fusion_pass -- "executorch.exir.tests.test_quant_fusion_pass.TestQuantFusionPass.test_embedding_byte" Reviewed By: qihqi Differential Revision: D41673139 Pull Request resolved: https://github.com/pytorch/pytorch/pull/91159 Approved by: https://github.com/vkuzo	2022-12-21 22:52:24 +00:00
Jesse Cai	48511eca82	[pruning][docs] Update README.md for structured pruning (#90403 ) Summary: I wrote a tutorial of how to use structured pruning flow as part of BE week Test Plan: Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/90403 Approved by: https://github.com/HDCharles	2022-12-21 20:07:06 +00:00
Xia, Weiwen	a5eb564ba4	[Quant] lower fused LinearTanh for onednn backend (#89188 ) Summary Add fuser method and quantization mappings for `QLinearLeakyReLU` for int8 inference for onednn backend. The fusion and lowering are supported only in FX mode. Test plan python test_quantization.py TestFuseFx TestQuantizeFx Pull Request resolved: https://github.com/pytorch/pytorch/pull/89188 Approved by: https://github.com/jgong5, https://github.com/jerryzh168	2022-12-20 01:30:21 +00:00
Xia, Weiwen	6686e9bc07	[Quant] Add fused LinearTanh module for onednn backend (#88923 ) Summary This PR adds fused `QLinearTanh` module for onednn backend, which will be used for int8 inference with onednn backend. Cannot call this module with other quantization backends otherwise an error is thrown. Test plan python test_quantization.py TestStaticQuantizedModule Pull Request resolved: https://github.com/pytorch/pytorch/pull/88923 Approved by: https://github.com/jgong5, https://github.com/jerryzh168	2022-12-19 13:42:25 +00:00
Xia, Weiwen	9ca41a986c	[Quant][FX] Lower QLinearLeakyReLU for onednn backend (#88668 ) Summary Add quantization mappings for `QLinearLeakyReLU` for int8 inference for onednn backend. The fusion and lowering is supported only in FX mode. Test plan python test_quantization.py TestQuantizeFx Pull Request resolved: https://github.com/pytorch/pytorch/pull/88668 Approved by: https://github.com/jgong5, https://github.com/jerryzh168	2022-12-19 00:44:24 +00:00
Xia, Weiwen	7b0ec67e34	[Quant][FX] Add backend config for onednn backend and fuse Linear-LeakyReLU (#88665 ) Summary Add backend config for onednn backend so that it can support more post op fusion for int8 inference. First `Linear - LeakyReLU` fusion is implemented based on previous PRs. Test plan python test_quantization.py TestFuseFx Pull Request resolved: https://github.com/pytorch/pytorch/pull/88665 Approved by: https://github.com/jgong5, https://github.com/jerryzh168	2022-12-17 03:33:08 +00:00
Jerry Zhang	f7b384cc46	[reland][quant][pt2e] Add early prototype top level quantize_pt2e APIs (#91035 ) Summary: This PR introduces the top level APIs for quantization support in PyTorch 2.0 Export stack * torch.ao.quantization.quantize_pt2e.prepare_pt2e Takes a model that is captured by the PyTorch 2.0 export (torchdynamo full graph mode) and prepares the model for calibration for post training quantization * torch.ao.quantization.quantize_pt2e.convert_pt2e Takes a calibrated model and converts that to a reference quantized model that can be lowered later to quantized operator libraries or delegation modules Also added a backend config for the qnnpack_pt2e backend: * torch.ao.quantization.backend_config.get_qnnpack_pt2e_backend_config Note: everything related to quantize_pt2e are experimental (prototype), and we don't have any bc guarantees Test Plan: python test/test_quantization.py TestQuantizePT2EModels Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/91035 Approved by: https://github.com/HDCharles	2022-12-17 02:15:53 +00:00
Sergii Dymchenko	4438b019a8	Fix non-existing parameters in docstrings in torch/ao (#90875 ) This is a continuation of https://github.com/pytorch/pytorch/pull/90505 Pull Request resolved: https://github.com/pytorch/pytorch/pull/90875 Approved by: https://github.com/clee2000	2022-12-16 22:34:33 +00:00
Michael Gschwind	512ec181ec	Introduce causal mask (#90508 ) Summary: Introduce causal mask This PR introduces a causal mask option _causal_mask (as well as causal mask detection if attn_mask is provided), since current custom kernels do not support arbitrary masks. Test Plan: sandcastle & github ci/cd Differential Revision: D41723137 Pull Request resolved: https://github.com/pytorch/pytorch/pull/90508 Approved by: https://github.com/albanD	2022-12-16 21:39:42 +00:00
Jacob Szwejbka	bd94ee66ea	[quantized] [executorch] typo (#89960 ) Summary: Inefficient impl in python Test Plan: buck2 test mode/dev //caffe2/test/quantization:test_quantization -- --exact 'caffe2/test/quantization:test_quantization - test_quantized_embedding_byte (caffe2.test.quantization.core.test_quantized_tensor.TestQuantizedTensor)' Differential Revision: D41627744 Pull Request resolved: https://github.com/pytorch/pytorch/pull/89960 Approved by: https://github.com/jerryzh168	2022-12-16 19:49:09 +00:00
PyTorch MergeBot	ad1b04c4a9	Revert "[reland][quant][pt2e] Add early prototype top level quantize_pt2e APIs (#90971 )" This reverts commit `7dd5e55497`. Reverted https://github.com/pytorch/pytorch/pull/90971 on behalf of https://github.com/ezyang due to still broke tons of master jobs sorry	2022-12-16 09:29:39 +00:00
HDCharles	a01c1ee594	[ao] making _is_activation_post_process private with BC (#90554 ) same function in observer and quantize, consolidated to a single function note: this is a recreation of D40709276 which caused several breakages due to not maintaining BC for models with cached code with calls to the old function name Differential Revision: [D41793604](https://our.internmc.facebook.com/intern/diff/D41793604/) NOTE FOR REVIEWERS: This PR has internal Meta-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D41793604/)! Pull Request resolved: https://github.com/pytorch/pytorch/pull/90554 Approved by: https://github.com/jcaip	2022-12-16 08:09:33 +00:00
Xia, Weiwen	6ea93b2295	[Quant] Add fused LinearLeakyReLU module for onednn backend (#88661 ) Summary Post op fusion can reduce data movement overhead and improve inference performance. This PR adds fused `QLinearLeakyReLU` module for onednn backend, which will be used for int8 inference with onednn backend. Cannot call this module with other quantization backends otherwise an error is thrown. Test plan python test_quantization.py TestStaticQuantizedModule Pull Request resolved: https://github.com/pytorch/pytorch/pull/88661 Approved by: https://github.com/jgong5, https://github.com/jerryzh168	2022-12-16 07:28:13 +00:00
Jerry Zhang	7dd5e55497	[reland][quant][pt2e] Add early prototype top level quantize_pt2e APIs (#90971 ) Summary: This PR introduces the top level APIs for quantization support in PyTorch 2.0 Export stack * torch.ao.quantization.quantize_pt2e.prepare_pt2e Takes a model that is captured by the PyTorch 2.0 export (torchdynamo full graph mode) and prepares the model for calibration for post training quantization * torch.ao.quantization.quantize_pt2e.convert_pt2e Takes a calibrated model and converts that to a reference quantized model that can be lowered later to quantized operator libraries or delegation modules Also added a backend config for the qnnpack_pt2e backend: * torch.ao.quantization.backend_config.get_qnnpack_pt2e_backend_config Note: everything related to quantize_pt2e are experimental (prototype), and we don't have any bc guarantees Test Plan: python test/test_quantization.py TestQuantizePT2EModels Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/90971 Approved by: https://github.com/HDCharles	2022-12-16 06:24:28 +00:00
PyTorch MergeBot	d9d263efb9	Revert "[Quant] Add fused LinearLeakyReLU module for onednn backend (#88661 )" This reverts commit `353c2e7d39`. Reverted https://github.com/pytorch/pytorch/pull/88661 on behalf of https://github.com/Xia-Weiwen due to This is breaking tests. Need to rebase.	2022-12-16 02:58:26 +00:00
Xia, Weiwen	353c2e7d39	[Quant] Add fused LinearLeakyReLU module for onednn backend (#88661 ) Summary Post op fusion can reduce data movement overhead and improve inference performance. This PR adds fused `QLinearLeakyReLU` module for onednn backend, which will be used for int8 inference with onednn backend. Cannot call this module with other quantization backends otherwise an error is thrown. Test plan python test_quantization.py TestStaticQuantizedModule Pull Request resolved: https://github.com/pytorch/pytorch/pull/88661 Approved by: https://github.com/jgong5, https://github.com/jerryzh168	2022-12-16 01:54:10 +00:00
HDCharles	9259933edd	[ao][fx] fixing public v private prepare.py (#88398 ) Summary: made _DO_NOT_OBS_DTYPE_LIST, _add_matched_node_name_to_set, _get_arg_target_is_dynamic_as_input_to_node, _get_arg_target_is_dynamic_as_input_to_node, _get_arg_target_dtype_as_input_to_node, _get_arg_target_dtype_as_output, _get_target_activation_dtype_for_node, _get_standalone_module_configs, _insert_observer, _is_activation_post_process_node, _is_input_arg_dtype_supported_by_backend, _is_observer_in_same_graph, _is_output_dtype_supported_by_backend, _maybe_insert_input_equalization_observers_for_node, _maybe_insert_input_observer_for_arg_or_kwarg, _maybe_insert_input_observers_for_node, _maybe_insert_observers_before_graph_output, _maybe_insert_output_observer_for_node, _maybe_make_input_output_share_observers, _maybe_propagate_dtype_for_node, _qat_swap_modules, _remove_output_observer, _run_prepare_fx_on_standalone_modules, _save_state, _swap_custom_module_to_observed private Test Plan: python test/test_public_bindings.py Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D41015542](https://our.internmc.facebook.com/intern/diff/D41015542) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88398 Approved by: https://github.com/jcaip	2022-12-16 00:30:41 +00:00
PyTorch MergeBot	9c912c7dd0	Revert "[quant][pt2e] Add early prototype top level quantize_pt2e APIs (#90802 )" This reverts commit `a66af1feba`. Reverted https://github.com/pytorch/pytorch/pull/90802 on behalf of https://github.com/malfet due to somehow broke test_resnet18 (quantization.fx.test_quantize_pt2e.TestQuantizePT2EModels), see `a66af1feba`	2022-12-15 23:28:21 +00:00
Jerry Zhang	a66af1feba	[quant][pt2e] Add early prototype top level quantize_pt2e APIs (#90802 ) Summary: This PR introduces the top level APIs for quantization support in PyTorch 2.0 Export stack * torch.ao.quantization.quantize_pt2e.prepare_pt2e Takes a model that is captured by the PyTorch 2.0 export (torchdynamo full graph mode) and prepares the model for calibration for post training quantization * torch.ao.quantization.quantize_pt2e.convert_pt2e Takes a calibrated model and converts that to a reference quantized model that can be lowered later to quantized operator libraries or delegation modules Also added a backend config for the qnnpack_pt2e backend: * torch.ao.quantization.backend_config.get_qnnpack_pt2e_backend_config Note: everything related to quantize_pt2e are experimental (prototype), and we don't have any bc guarantees Test Plan: python test/test_quantization.py TestQuantizePT2EModels Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/90802 Approved by: https://github.com/qihqi	2022-12-15 21:50:29 +00:00
HDCharles	173accd1c1	[ao][fx] fixing public v private qconfig_mapping_utils.py (#88399 ) Summary: made _check_is_valid_config_dict, _compare_prepare_convert_qconfig_mappings, _generate_node_name_to_qconfig, _is_qconfig_supported_by_dtype_configs, _maybe_adjust_qconfig_for_module_name_object_type_order, _update_qconfig_for_fusion private Test Plan: python test/test_public_bindings.py Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D41015544](https://our.internmc.facebook.com/intern/diff/D41015544) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88399 Approved by: https://github.com/jcaip	2022-12-15 17:48:34 +00:00
HDCharles	6a866c3ed1	[ao] fixing public v private for torch.ao.nn.X (#87883 ) Summary: this mostly consisted of adding __all__ to files without them. A few functions in X.utils were made private too Test Plan: python test/test_public_bindings.py Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D40814548](https://our.internmc.facebook.com/intern/diff/D40814548) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87883 Approved by: https://github.com/jcaip, https://github.com/anjali411	2022-12-15 03:03:07 +00:00
HDCharles	f286cbebce	[ao][fx] fixing public v private graph_module.py (#88395 ) Summary: made _is_observed_module, _is_observed_standalone_module private Test Plan: python test/test_public_bindings.py Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D41015545](https://our.internmc.facebook.com/intern/diff/D41015545) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88395 Approved by: https://github.com/jcaip	2022-12-15 02:15:04 +00:00
HDCharles	1ca9d43d4e	[ao] quantize.py fixing public v private (#87521 ) Summary: made _register_activation_post_process_hook, _add_observer, _get_unique_devices_, _get_observer_dict private Test Plan: python test/test_public_bindings.py Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D40709277](https://our.internmc.facebook.com/intern/diff/D40709277) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87521 Approved by: https://github.com/jerryzh168	2022-12-14 22:50:39 +00:00
andrewor14	691a44f403	[Quant][fx][bc-breaking] Add simpler BackendConfig pattern format (#90698 ) Summary: The existing BackendConfig fusion pattern uses a "reversed nested tuple" format that is highly unintuitive. For example, ``` linear-relu -> (nn.ReLU, nn.Linear) conv-bn-relu -> (nn.ReLU, (nn.BatchNorm2d, nn.Conv2d)) ``` This pattern format also complicates the signatures of the user specified "fuser methods", which needed to accept arguments in reverse nested order to match the patterns: ``` def fuse_linear_relu(is_qat, relu, linear): ... def fuse_conv_bn_relu(is_qat, relu, bn_conv): (bn, conv) = bn_conv ... ``` Instead, this commit introduces a new pattern format that simply specifies the ops in forward order with no nesting: ``` linear-relu -> (nn.Linear, nn.ReLU) conv-bn-relu -> (nn.Conv2d, nn.BatchNorm2d, nn.ReLU) def fuse_linear_relu(is_qat, linear, relu): ... def fuse_conv_bn_relu(is_qat, conv, bn, relu): ... ``` Note that the legacy "reversed nested tuple" is still used internally since it is more general. In the future, we should replace it with the format used in the subgraph rewriter in `torch.fx`, and simplify the existing pattern matching code to handle the new format added in this commit. BC-breaking Notes: Before: ``` import torch.nn as nn import torch.ao.nn.intrinsic as nni from torch.ao.quantization.backend_config import BackendPatternConfig def fuse_conv_bn_relu(is_qat, relu, bn_conv): (bn, conv) = bn_conv return nni.ConvBnReLU2d(conv, bn, relu) config = BackendPatternConfig((nn.ReLU, (nn.BatchNorm2d, nn.Conv2d))) \ .set_dtype_configs(...) \ .set_fuser_method(fuse_conv_bn_relu) \ .set_fused_module(nni.ConvBnReLU2d) ``` After: ``` def fuse_conv_bn_relu(is_qat, conv, bn, relu): return nni.ConvBnReLU2d(conv, bn, relu) config = BackendPatternConfig((nn.Conv2d, nn.BatchNorm2d, nn.ReLU)) \ .set_dtype_configs(...) \ .set_fuser_method(fuse_conv_bn_relu) \ .set_fused_module(nni.ConvBnReLU2d) ``` OR (for backward-compatibility) ``` def fuse_conv_bn_relu(is_qat, relu, bn_conv): (bn, conv) = bn_conv return nni.ConvBnReLU2d(conv, bn, relu) config = BackendPatternConfig() \ ._set_pattern_complex_format((nn.ReLU, (nn.BatchNorm2d, nn.Conv2d))) \ .set_dtype_configs(...) \ .set_fuser_method(fuse_conv_bn_relu) \ .set_fused_module(nni.ConvBnReLU2d) \ ._set_use_legacy_pattern_format(True) ``` Before: ``` backend_config.configs # returns Dict[Pattern, BackendPatternConfig] ``` After: ``` backend_config.configs # returns List[BackendPatternConfig] ``` Test Plan: python test/test_quantization.py TestQuantizeFx python test/test_quantization.py TestQuantizeFxOps python test/test_quantization.py TestBackendConfig Reviewers: jerryzh168, vkuzo Subscribers: jerryzh168, vkuzo Differential Revision: [D41954553](https://our.internmc.facebook.com/intern/diff/D41954553) Pull Request resolved: https://github.com/pytorch/pytorch/pull/90698 Approved by: https://github.com/vkuzo, https://github.com/jerryzh168	2022-12-14 22:44:29 +00:00
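The relationship between the two pattern formats in the commit above can be made concrete with a short sketch. The helper `to_reversed_nested` is hypothetical and is not PyTorch's internal conversion code; plain strings stand in for the `nn.Module` classes so the example runs without PyTorch installed.

```python
def to_reversed_nested(pattern):
    """Convert a forward-order pattern into the legacy reversed nested tuple.

    ("conv", "bn", "relu") -> ("relu", ("bn", "conv"))
    """
    if not pattern:
        raise ValueError("pattern must contain at least one op")
    nested = pattern[0]
    for op in pattern[1:]:
        # Each later op wraps everything that came before it.
        nested = (op, nested)
    return nested


linear_relu = to_reversed_nested(("Linear", "ReLU"))
conv_bn_relu = to_reversed_nested(("Conv2d", "BatchNorm2d", "ReLU"))
print(linear_relu)
print(conv_bn_relu)
```

The nesting grows with every additional op, which is one way to see why the forward-order tuple is easier to read and why fuser methods written against it can take flat arguments.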
HDCharles	258860fa3a	[ao][fx] fixing public v private for pattern_utils.py (#88397 ) Summary: made _DEFAULT_FUSION_PATTERNS, _register_fusion_pattern, _DEFAULT_QUANTIZATION_PATTERNS, _DEFAULT_OUTPUT_FAKE_QUANTIZE_MAP, _DEFAULT_OUTPUT_OBSERVER_MAP, _register_quant_pattern, _sorted_patterns_dict private Test Plan: python test/test_public_bindings.py Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D41015537](https://our.internmc.facebook.com/intern/diff/D41015537) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88397 Approved by: https://github.com/jcaip	2022-12-14 03:40:02 +00:00
HDCharles	79156c11c3	[ao][fx] fixing public v private match_utils.py (#88396 ) Summary: made _is_match, _find_matches, _MatchResult private also added __all__ to lower_to_qnnpack.py Test Plan: python test/test_public_bindings.py Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D41015540](https://our.internmc.facebook.com/intern/diff/D41015540) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88396 Approved by: https://github.com/jcaip	2022-12-13 20:16:55 +00:00
HDCharles	a856557b3a	[ao][fx] public v private convert.py (#88394 ) Summary: made _restore_state, _has_none_qconfig, _run_weight_observers, _maybe_recursive_remove_dequantize, _get_module_path_and_prefix, _insert_dequantize_node, _maybe_get_observer_for_node, _remove_previous_dequantize_in_custom_module private Test Plan: python test/test_public_bindings.py Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D41015547](https://our.internmc.facebook.com/intern/diff/D41015547) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88394 Approved by: https://github.com/jcaip	2022-12-13 20:10:12 +00:00
PyTorch MergeBot	1119d2fa54	Revert "Reland "Add heirachical module names to torchFX graph.node" (#90205 )" This reverts commit `6b7efac3c9`. Reverted https://github.com/pytorch/pytorch/pull/90205 on behalf of https://github.com/seemethere due to Reverting since this caused failures in internal systems, see https://fb.workplace.com/groups/802176577445480/posts/894284641568006 for discussion	2022-12-13 17:47:07 +00:00
Jerry Zhang	94b9bb324f	[quant] Add example for lowering quantized dynamic linear pattern through delegation (#90640 ) Summary: Only the pattern part, will leave the delegation example to Chen Test Plan: buck run executorch/exir/tests:quant_lowering_custom_backend_pass -- "executorch.exir.tests.test_quant_lowering_custom_backend_pass.TestQuantLoweringCustomBackendPass.test_quantized_linear_dynamic" Reviewed By: cccclai Pull Request resolved: https://github.com/pytorch/pytorch/pull/90640 Approved by: https://github.com/cccclai	2022-12-13 00:57:33 +00:00
HDCharles	e11650887e	[ao] fix incorrect integer cast on histogram observer bounds (#90355 ) Summary: A cast to int was added in https://github.com/pytorch/pytorch/pull/45630 to make mypy not complain. However this leads to unexpected behavior where the histogram doesn't actually capture the full range of activation values. note1: the test_histogram_observer_against_reference test was secretly broken, on master. The random parameters that normally get run apparently don't cause a test failure but if you make a loop repeatedly run the test, it would eventually fail. This was due to in some cases sum(<tensor>)!=torch.sum(<tensor>).item(). I was not able to reproduce this with a toy example but running this test in a loop and editing either observer to print the calculation for 'total' would break the test and show different behaviors. Fixing this test was necessary to land this PR since the changing histogram bounds changed things enough that this test would error. note2: updating histogram observer breaks some BC tests unless I regenerate the model using the HistogramObserver from this PR Test Plan: python test/test_quantization.py TestHistogramObserver.test_histogram_observer_correct_numel python test/test_quantization -k histogram Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/90355 Approved by: https://github.com/vkuzo	2022-12-12 20:30:44 +00:00
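The effect of casting histogram bounds to `int`, as described in the commit above, can be reproduced with a small pure-Python histogram. This is only a sketch under stated assumptions: the `histogram` helper is hypothetical, it drops values outside `[lo, hi]` (mirroring how out-of-range elements are ignored when explicit bounds are passed), and the sample values are made up.

```python
def histogram(values, bins, lo, hi):
    """Count values into equal-width bins over [lo, hi]; out-of-range values are dropped."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        if v < lo or v > hi:
            continue  # outside the bounds: not captured at all
        idx = min(int((v - lo) / width), bins - 1)  # clamp hi itself into the last bin
        counts[idx] += 1
    return counts


values = [-2.7, -1.0, 0.5, 3.9]
min_val, max_val = min(values), max(values)

# Float bounds cover the full observed range.
full = histogram(values, 4, min_val, max_val)
# int() truncates toward zero, shrinking the range to [-2, 3].
truncated = histogram(values, 4, int(min_val), int(max_val))

print(sum(full), sum(truncated))
```

With the truncated bounds, the two extreme activations fall outside the range and are never counted, which matches the commit's observation that the histogram did not capture the full range of activation values.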
Sergii Dymchenko	f51f6aa387	Fix non-existing parameters in docstrings (#90505 )

Continuation after https://github.com/pytorch/pytorch/pull/90163.

Here is a script I used to find all the non-existing arguments in the docstrings (the script can give false positives in presence of *args/**kwargs or decorators):

_Edit:_ I've realized that the indentation is wrong for the last `break` in the script, so the script only gives output for a function if the first docstring argument is wrong. I'll create a separate PR if I find more issues with corrected script.

``` python
import ast
import os
import docstring_parser

for root, dirs, files in os.walk('.'):
    for name in files:
        if root.startswith("./.git/") or root.startswith("./third_party/"):
            continue
        if name.endswith(".py"):
            full_name = os.path.join(root, name)
            with open(full_name, "r") as source:
                tree = ast.parse(source.read())
                for node in ast.walk(tree):
                    if isinstance(node, ast.FunctionDef):
                        all_node_args = node.args.args
                        if node.args.vararg is not None:
                            all_node_args.append(node.args.vararg)
                        if node.args.kwarg is not None:
                            all_node_args.append(node.args.kwarg)
                        if node.args.posonlyargs is not None:
                            all_node_args.extend(node.args.posonlyargs)
                        if node.args.kwonlyargs is not None:
                            all_node_args.extend(node.args.kwonlyargs)
                        args = [a.arg for a in all_node_args]
                        docstring = docstring_parser.parse(ast.get_docstring(node))
                        doc_args = [a.arg_name for a in docstring.params]
                        clean_doc_args = []
                        for a in doc_args:
                            clean_a = ""
                            for c in a.split()[0]:
                                if c.isalnum() or c == '_':
                                    clean_a += c
                            if clean_a:
                                clean_doc_args.append(clean_a)
                        doc_args = clean_doc_args
                        for a in doc_args:
                            if a not in args:
                                print(full_name, node.lineno, args, doc_args)
                            # Mis-indented `break` from the edit note above:
                            # only the first docstring argument is checked.
                            break
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90505 Approved by: https://github.com/malfet, https://github.com/ZainRizvi	2022-12-09 21:43:09 +00:00
Alex Settle	6b7efac3c9	Reland "Add heirachical module names to torchFX graph.node" (#90205 ) Fixes #87659 Reland of PR #87742 Resolves errors that caused the changes to be backed out. Pull Request resolved: https://github.com/pytorch/pytorch/pull/90205 Approved by: https://github.com/jerryzh168	2022-12-09 06:20:31 +00:00
HDCharles	c71b12851d	[ao] public vs private for ao.quantization._X (#88392 ) Summary: added `__all__` for these modules without altering names since they tend to be experimental Test Plan: python test/test_public_bindings.py Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D41015543](https://our.internmc.facebook.com/intern/diff/D41015543) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88392 Approved by: https://github.com/jcaip	2022-12-09 05:39:29 +00:00
HDCharles	6050a7a3d9	[ao] backend_config moving all to top (#88391 ) Summary: moved `__all__` to top of functions, removed private functions from `__all__` Test Plan: python test/test_public_bindings.py Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D41015538](https://our.internmc.facebook.com/intern/diff/D41015538) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88391 Approved by: https://github.com/jcaip	2022-12-09 05:39:29 +00:00
Jerry Zhang	f978a8b026	[quant][be] Remove special casing for getitem in prepare (#90393 ) Summary: This PR cleans up previous special casing for getitem, it should be configured through BackendConfig Test Plan: python test/test_quantization.py TestQuantizeFx Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D41846185](https://our.internmc.facebook.com/intern/diff/D41846185) Pull Request resolved: https://github.com/pytorch/pytorch/pull/90393 Approved by: https://github.com/andrewor14	2022-12-09 01:59:02 +00:00
Jesse Cai	de016b3799	[pruning][core][feature] Implement prune for structured pruning (#89777 )

Summary: This PR implements `prune` in BaseStructuredSparsifier.

`prune` is a function that takes in a model with structured sparsity parametrizations (the result of `prepare`) and returns a resized model with the masked-out weights removed.

`prune` is defined by a mapping from patterns to different pruning functions.
- patterns are just sequences of operations, for example `(nn.Linear, activation, nn.Linear)`
- pruning functions are functions that take in a matched pattern as args and resize the appropriate layer sizes and weights.
```
def prune_linear_activation_linear(linear1, activation, linear2):
    pass
```
- This is one line in the pattern config `(nn.Linear, activation, nn.Linear): prune_linear_activation_linear`

At a high level, `prune` works by finding instances of the graph that match different patterns and then calling the mapped pruning functions on those matched patterns. This is unlike the previous code, which attempted to do both at the same time.

There may be some gaps in the patterns compared to the previous implementation, but the conversion functionality support should be the same.

Currently we have pruning functions for the following patterns:
- linear -> linear
- linear -> activation -> linear
- conv2d -> conv2d
- conv2d -> activation -> conv2d
- conv2d -> activation -> pool -> conv2d
- conv2d -> pool -> activation -> conv2d
- conv2d -> adaptive pool -> flatten -> linear

Added MyPy type hints as well for the prune_functions.

Test Plan: Reviewers: Subscribers: Tasks: Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89777 Approved by: https://github.com/vkuzo	2022-12-08 07:13:24 +00:00
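The two-phase idea in the commit above (first find pattern matches, then call the mapped pruning function) can be sketched in plain Python. This is a rough illustration only, not the BaseStructuredSparsifier implementation: `Linear`, `ReLU`, the `mask` field, `PATTERNS`, `find_matches`, and this `prune` are all hypothetical stand-ins, and matching is done on a flat list of layers rather than on a traced graph.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Linear:
    in_features: int
    out_features: int
    # Hypothetical stand-in for a structured-sparsity mask:
    # True means "keep this output channel".
    mask: Optional[List[bool]] = None


@dataclass
class ReLU:
    pass


def prune_linear_linear(linear1, linear2):
    # Shrink linear1's outputs to the kept channels and
    # shrink linear2's inputs to match.
    kept = sum(linear1.mask)
    linear1.out_features = kept
    linear1.mask = None
    linear2.in_features = kept


def prune_linear_activation_linear(linear1, activation, linear2):
    # An elementwise activation does not change channel count.
    prune_linear_linear(linear1, linear2)


# Pattern config: sequence of layer types -> pruning function.
PATTERNS = {
    (Linear, Linear): prune_linear_linear,
    (Linear, ReLU, Linear): prune_linear_activation_linear,
}


def find_matches(layers, pattern):
    # Return every contiguous window whose layer types equal the pattern.
    n = len(pattern)
    return [
        layers[i:i + n]
        for i in range(len(layers) - n + 1)
        if all(type(l) is cls for l, cls in zip(layers[i:i + n], pattern))
    ]


def prune(layers):
    # Phase 1: collect all matches before modifying anything.
    matches = [
        (fn, match)
        for pattern, fn in PATTERNS.items()
        for match in find_matches(layers, pattern)
    ]
    # Phase 2: call the mapped pruning function on each match.
    for fn, match in matches:
        if match[0].mask is not None:
            fn(*match)
    return layers


model = [Linear(8, 4, mask=[True, False, True, False]), ReLU(), Linear(4, 2)]
prune(model)
print(model[0].out_features, model[2].in_features)
```

Keeping matching separate from resizing means a new pattern only needs one more entry in the pattern config plus its pruning function.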
Jerry Zhang	47071c3d47	[quant] Add support for symmetric quant in executorch (#90304 ) Summary: This PR adds symmetric quant in the backend config for executorch Test Plan: NA, will be tested in meta internal flow Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/90304 Approved by: https://github.com/cccclai, https://github.com/jcaip, https://github.com/andrewor14	2022-12-08 01:03:00 +00:00
PyTorch MergeBot	9f7bc7bc24	Revert "[Quant][fx][bc-breaking] Make convert.py smaller (#90189 )" This reverts commit `824641b083`. Reverted https://github.com/pytorch/pytorch/pull/90189 on behalf of https://github.com/seemethere due to Fails internal tests due to potential circular import, see https://www.internalfb.com/diff/D41817429?dst_version_fbid=1453307181865235&transaction_fbid=899728221278938	2022-12-08 00:51:13 +00:00
PyTorch MergeBot	1b1301f16a	Revert "[pruning][core][feature] Implement prune for structured pruning (#89777 )" This reverts commit `3531e44307`. Reverted https://github.com/pytorch/pytorch/pull/89777 on behalf of https://github.com/clee2000 due to breaking test_ao_sparcity due to import `3531e44307` https://github.com/pytorch/pytorch/actions/runs/3641476330/jobs/6147830487, probably a landrace with 824641b083860df4d7ffef06a798ea2702bc4bde?	2022-12-07 19:41:15 +00:00
Jesse Cai	3531e44307	[pruning][core][feature] Implement prune for structured pruning (#89777 )

Summary: This PR implements `prune` in BaseStructuredSparsifier.

`prune` is a function that takes in a model with structured sparsity parametrizations (the result of `prepare`) and returns a resized model with the masked-out weights removed.

`prune` is defined by a mapping from patterns to different pruning functions.
- patterns are just sequences of operations, for example `(nn.Linear, activation, nn.Linear)`
- pruning functions are functions that take in a matched pattern as args and resize the appropriate layer sizes and weights.
```
def prune_linear_activation_linear(linear1, activation, linear2):
    pass
```
- This is one line in the pattern config `(nn.Linear, activation, nn.Linear): prune_linear_activation_linear`

At a high level, `prune` works by finding instances of the graph that match different patterns and then calling the mapped pruning functions on those matched patterns. This is unlike the previous code, which attempted to do both at the same time.

There may be some gaps in the patterns compared to the previous implementation, but the conversion functionality support should be the same.

Currently we have pruning functions for the following patterns:
- linear -> linear
- linear -> activation -> linear
- conv2d -> conv2d
- conv2d -> activation -> conv2d
- conv2d -> activation -> pool -> conv2d
- conv2d -> pool -> activation -> conv2d
- conv2d -> adaptive pool -> flatten -> linear

Added MyPy type hints as well for the prune_functions.

Test Plan: Reviewers: Subscribers: Tasks: Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89777 Approved by: https://github.com/vkuzo	2022-12-07 17:52:01 +00:00
Jesse Cai	d680ea7e36	[quant]Fix public bindings for DTypeWithConstraint (#90315 ) Summary: Need this to fix `test_public_bindings`. Test Plan: `python test/test_public_bindings.py` Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/90315 Approved by: https://github.com/HDCharles	2022-12-07 17:52:01 +00:00
andrewor14	824641b083	[Quant][fx][bc-breaking] Make convert.py smaller (#90189 )

Summary: This commit moves helper functions that are not core to the convert logic out of convert.py, which was more than 1000 lines. This helps with readability since a new developer won't have to scroll through hundreds of lines of util functions to understand the core logic. There should be no change in functionality in this commit.

BC-breaking notes: The following helper functions that were previously exposed under the `torch.ao.quantization.fx.convert` namespace are now made private. Many of these are moved to the new convert_utils.py
```
convert_custom_module
convert_standalone_module
convert_weighted_module
get_module_path_and_prefix
has_none_qconfig
insert_dequantize_node
is_conversion_supported
maybe_recursive_remove_dequantize
replace_observer_or_dequant_stub_with_dequantize_node
restore_state
run_weight_observers
```

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Reviewers: jerryzh168, vkuzo
Subscribers: jerryzh168, vkuzo
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90189 Approved by: https://github.com/jerryzh168	2022-12-07 16:16:25 +00:00

1019 Commits