Summary:
Previously we could only use native PyTorch int dtypes that have corresponding quantized dtypes (e.g. quint8, qint8). This
PR removes that assumption in observers/fake_quants so that users can use all PyTorch native dtypes (except for int64, which we can add later if needed);
the main addition here is int16.
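For illustration, a minimal sketch of what this enables (not the exact test from this PR; the observer choice and range values are assumptions):
```
import torch
from torch.ao.quantization.observer import MinMaxObserver

# after this change, a plain integer dtype such as torch.int16 can be used
# directly, together with an explicit quant_min/quant_max range
obs = MinMaxObserver(
    dtype=torch.int16,
    quant_min=-(2 ** 15),
    quant_max=2 ** 15 - 1,
)
obs(torch.randn(4, 4))                      # observe some calibration data
scale, zero_point = obs.calculate_qparams()
```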
Test Plan:
python test/test_quantization.py TestQuantizePT2E
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108453
Approved by: https://github.com/kimishpatel
This updates ruff to 0.285, which is faster, better, and fixes a bunch of false negatives with regard to f-strings.
I also enabled RUF017, which looks for accidental quadratic list summation. Luckily, there seem to be no instances of it in our codebase, so I'm enabling it so that it stays that way. :)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107519
Approved by: https://github.com/ezyang
Summary:
Previously if we have:
```
conv1 -> cat
conv2 /
```
and configure the outputs of conv1/conv2 to be int8 quantized, and cat to also be int8 quantized with shared inputs,
it will not produce the expected result (the inputs of cat will not be shared).
The problem is that some checks were missing when inserting observers for the inputs of cat.
This PR fixes the problem; see the sketch below for what the expected sharing looks like.
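A hedged illustration of what "shared" means here (hypothetical graph/module access, not the actual test): after prepare, both inputs of cat should be fed by the same observer instance so they end up with identical quantization parameters.
```
import torch

# hypothetical: locate the cat node in the prepared graph and compare the
# observer modules attached to its two inputs
cat_node = next(
    n for n in prepared_model.graph.nodes
    if n.target == torch.ops.aten.cat.default
)
input_observers = [
    getattr(prepared_model, arg.target)    # observer call_module feeding cat
    for arg in cat_node.args[0]
]
assert input_observers[0] is input_observers[1], "cat inputs should share one observer"
```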
Fixes: https://github.com/pytorch/pytorch/issues/106760
Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_shared_qspec
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106922
Approved by: https://github.com/kimishpatel
Summary:
As title.
There's a corner case where both CPU and GPU are available: although the model is moved to CPU, the newly created PTQ weight observer is still on GPU. Therefore, during convert, this line will fail https://fburl.com/4rhipfvb
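A minimal sketch of the failure mode and the kind of fix applied (illustrative only; the module and observer names are hypothetical):
```
import torch

model = model.to("cpu")                    # model weights now live on CPU
weight_observer = weight_observer_ctr()    # a freshly created observer may default to CUDA
# keep the observer on the same device as the tensor it observes before convert
weight_observer = weight_observer.to(model.linear.weight.device)
weight_observer(model.linear.weight)
```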
Test Plan: CI
Differential Revision: D48141494
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106755
Approved by: https://github.com/jerryzh168
Summary: Move the quantizer to torch.ao.quantization to make it a public API, since pt2e is a folder for implementations
Test Plan:
CIs
sanity check: "buck test //executorch/backends/xnnpack/test:test_xnnpack_quantized_models -- test_resnet18"
Differential Revision: D47727838
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105885
Approved by: https://github.com/andrewor14
This PR re-lands
- [Typing] Fix PEP 484 Violation (#105022)
- Update mypy to 1.4.1 (#91983)
These were reverted due to a conflict with the internal source repo.
Mostly fixes for PEP 484 violations (i.e. when a default arg is set to None but the type is not annotated as Optional)
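A generic example of the PEP 484 pattern being fixed (not a specific line from this PR):
```
from typing import Optional

# before: implicit Optional; newer mypy rejects a None default on a non-Optional type
def load(path: str = None): ...

# after: the None default is reflected in the annotation
def load_fixed(path: Optional[str] = None): ...
```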
Plus a few real fixes:
- Add missing `_get_upgraders_entry_map` to `torch/_C/__init__.pyi`
- Add missing return statement to `torch._export.deserialize_graph`
- Fix error message in `torch.ao.ns.fx.weight_utils.get_lstm_mod_weights`
- Add assert in `torch/optim/optimizer.py` that the Optional list is not None
TODO (in followup PR):
- Fix erroneous `isinstance` check in `torch/ao/quantization/_pt2e/qat_utils.py`
Unrelated, to bypass CI failures due to the gcc9 dependency update in Ubuntu-18.04:
- Add hack to `.ci/docker/install_conda.sh` to squash the older libstdc++ from the conda environment in favor of the one from the OS
- Update bazel cuda builds to focal, as with libstdc++-6.0.32 bazel builds lose the ability to catch exceptions (probably because they link with cupti statically, but I could not find where that is done)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105227
Approved by: https://github.com/atalman, https://github.com/albanD, https://github.com/Skylion007
Summary: Similar to quantized add, in this PR we added the reference representation for the quantize/dequantize operators
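A hedged sketch of what such a reference representation looks like, written with plain tensor ops (the exact rewrite used by convert_pt2e may differ in details such as rounding and dtype handling):
```
import torch

def quantize_per_tensor_ref(x_fp32, scale, zero_point, quant_min, quant_max, dtype):
    # scale, round, offset by zero_point, then clamp into the target integer range
    x_int = torch.round(x_fp32 / scale) + zero_point
    return torch.clamp(x_int, quant_min, quant_max).to(dtype)

def dequantize_per_tensor_ref(x_int, scale, zero_point):
    # map the integer representation back to floating point
    return (x_int.to(torch.float32) - zero_point) * scale
```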
Test Plan:
buck2 test caffe2/test:quantization_pt2e -- --exact 'caffe2/test:quantization_pt2e - test_representation_quantize (quantization.pt2e.test_quantize_pt2e.TestQuantizePT2E)'
buck2 test caffe2/test:quantization_pt2e -- --exact 'caffe2/test:quantization_pt2e - test_representation_dequantize (quantization.pt2e.test_quantize_pt2e.TestQuantizePT2E)'
Reviewed By: kimishpatel
Differential Revision: D46959928
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104395
Approved by: https://github.com/andrewor14
Summary:
The planned e2e for quantization in pytorch 2.0 export is the following:
float_model -> prepare_pt2e -> calibration -> convert_pt2e -> ...
Inside convert_pt2e, we will first produce a q/dq representation of the quantized model, similar to the previous output of
convert_to_reference_fx in fx graph mode quantization:
```
torch.ops.quantized_decomposed.dequantize_per_tensor -> torch.ops.aten.add -> torch.ops.quantized_decomposed.quantize_per_tensor
torch.ops.quantized_decomposed.dequantize_per_tensor /
```
Then we'll rewrite the above into a representation that expresses the intent more precisely: since we actually want to do int8 addition
instead of simulating the int8 addition with fp32 operations, the representation for
quantized add is:
```
def quantized_add(x_i8, x_scale, x_zero_point, y_i8, y_scale, y_zero_point, out_scale, out_zero_point):
    x = (x_scale / out_scale) * x_i8
    y = (y_scale / out_scale) * y_i8
    out = x + y
    out -= (x_zero_point * x_scale + y_zero_point * y_scale) / out_scale
    out += out_zero_point
    return out
```
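A quick numeric sanity sketch for the formula above (made-up quantization parameters; the real rewrite additionally rounds and clamps back into the int8 range):
```
import torch

x_i8 = torch.randint(-128, 128, (8,)).float()
y_i8 = torch.randint(-128, 128, (8,)).float()
x_s, x_zp, y_s, y_zp, out_s, out_zp = 0.02, 3, 0.03, -1, 0.05, 2

# simulate with fp32 ops: dequantize both inputs, add, then requantize (no clamp)
fp = (x_i8 - x_zp) * x_s + (y_i8 - y_zp) * y_s
simulated = fp / out_s + out_zp

out = quantized_add(x_i8, x_s, x_zp, y_i8, y_s, y_zp, out_s, out_zp)
assert torch.allclose(out, simulated)
```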
Test Plan:
```
buck2 test caffe2/test:quantization_pt2e -- --exact 'caffe2/test:quantization_pt2e - test_representation_add (quantization.pt2e.test_quantize_pt2e.TestQuantizePT2E)'
```
Reviewed By: kimishpatel
Differential Revision: D45628032
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104130
Approved by: https://github.com/kimishpatel
Summary:
Special qspecs like `SharedQuantizationSpec` and
`DerivedQuantizationSpec` refer to other nodes in the graph.
However, after subgraph rewriting in QAT, the nodes referred
to in these special qspecs may be replaced by new nodes.
This could lead to the following error when inserting
observers according to these qspecs:
```
AssertionError: please make sure only refer to edge or node
that has observer/fake_quant inserted: 'getitem' not in
dict_keys([(arg0, convolution_default_1), (mul_tensor, convolution_default_1), getitem_3])
```
This commit fixes this by keeping track of the nodes that
are replaced during subgraph rewriting in QAT, and using
this mapping to update the dangling references used in these
special qspecs.
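A hedged sketch of the remapping idea (helper and field names are assumptions, not the exact implementation):
```
import dataclasses

def _remap_edge_or_node(edge_or_node, original_to_replacement):
    # an edge is a (source_node, user_node) tuple; a plain node stands for its output
    if isinstance(edge_or_node, tuple):
        return tuple(original_to_replacement.get(n, n) for n in edge_or_node)
    return original_to_replacement.get(edge_or_node, edge_or_node)

def _update_shared_qspec(qspec, original_to_replacement):
    # retarget a SharedQuantizationSpec whose referenced node was replaced during rewriting
    return dataclasses.replace(
        qspec,
        edge_or_node=_remap_edge_or_node(qspec.edge_or_node, original_to_replacement),
    )
```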
Test Plan: python test/test_quantization.py TestQuantizePT2E.test_qat_update_shared_qspec
Reviewed By: jerryzh168
Differential Revision: D46606614
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103970
Approved by: https://github.com/jerryzh168
Summary:
Dynamo tracing, via dynamo.export with aten_graph, generates a graph whose nodes
have targets that are instances of torch._ops.OpOverload. The quantization workflow is
a bit inconsistent: it inserts quantize/dequantize ops that are sometimes instances of
torch._ops.OpOverload (quantize_per_tensor.tensor) and other times instances
of torch._ops.OpOverloadPacket (quantize_per_tensor).
It is also unclear whether an exported model is valid if it has nodes with targets
of type torch._ops.OpOverloadPacket.
Without the op overload name attached to the target, it fails during executorch
tracing, because executorch tracing expects node targets to be
instances of torch._ops.OpOverload and not torch._ops.OpOverloadPacket.
So, for consistency and tracing reasons, this fixes the convert pass to insert ops that
are torch._ops.OpOverload.
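For reference, the distinction being enforced (using aten.add to keep the snippet self-contained; quantize_per_tensor vs. quantize_per_tensor.tensor follows the same pattern):
```
import torch

packet = torch.ops.aten.add            # an OpOverloadPacket (no overload name)
overload = torch.ops.aten.add.Tensor   # a specific OpOverload

assert isinstance(packet, torch._ops.OpOverloadPacket)
assert isinstance(overload, torch._ops.OpOverload)
# executorch tracing expects node.target to be the OpOverload form
```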
Test Plan: CI
Reviewed By: jerryzh168
Differential Revision: D46342822
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103251
Approved by: https://github.com/andrewor14
This commit changes ModelReportObserver variables to buffers, similar to other observers. This allows gathering data on devices other than CPU.
It also updates InputWeightEqualizationDetector to compute weight stats for weights that are on a GPU.
Tested by running the tests in `test/quantization/fx/test_model_report_fx.py`
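A hedged sketch of the underlying change (a toy observer, not the actual ModelReportObserver): storing state with register_buffer lets it follow the module across devices.
```
import torch

class TinyObserver(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # buffers move with .to(device)/.cuda() and are saved in state_dict,
        # unlike plain tensor attributes
        self.register_buffer("min_val", torch.tensor(float("inf")))
        self.register_buffer("max_val", torch.tensor(float("-inf")))

    def forward(self, x):
        self.min_val = torch.min(self.min_val, x.min())
        self.max_val = torch.max(self.max_val, x.max())
        return x
```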
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97971
Approved by: https://github.com/vkuzo
Summary:
Previously, the test for the convert flow in Conv + BN
QAT fusion was mistakenly left disabled. However, re-enabling this
test uncovered several bugs:
(1) The replaced nodes returned by subgraph rewriter were not
handled correctly. This is because a recent change in the subgraph
rewriter (#100556) fixed only the prepare case but not the convert
case. This commit brings this fix to the convert case as well and
deduplicates some code between the two cases.
(2) When folding BN into conv, we used the wrong arg index to get
the BN eps value. This resulted in an incorrect conv weight (a sketch of the folding math follows below).
(3) In FX, we currently do a hack for weighted modules where we
observe the weights once in convert in order to ensure we get the
right shapes for these weight observers. This caused the numerics
to diverge between PT2 and FX. This commit fixes this by skipping
this unnecessary hack for `_convert_to_reference_decomposed_fx`.
(4) Per channel support was simply missing. This commit adds
support for this by matching the quantize_per_channel and
dequantize_per_channel ops in addition to the existing ones.
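A hedged sketch of the conv + BN weight folding math referenced in (2); reading the wrong arg index for eps corrupts the denominator and hence the folded weight:
```
import torch

def fold_bn_weight(conv_w, bn_running_var, bn_gamma, bn_eps):
    # scale each output channel of the conv weight by gamma / sqrt(running_var + eps)
    scale = bn_gamma / torch.sqrt(bn_running_var + bn_eps)
    return conv_w * scale.reshape(-1, 1, 1, 1)

conv_w = torch.randn(8, 4, 3, 3)
folded = fold_bn_weight(conv_w, torch.rand(8) + 0.5, torch.randn(8), 1e-5)
```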
Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_qat_conv_bn_numerics
Reviewed By: jerryzh168
Differential Revision: D46097783
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102224
Approved by: https://github.com/jerryzh168
Summary:
```
"""
4. DerivedQuantizationSpec
this is the quantization spec for the Tensors whose quantization parameters are derived from other Tensors
"""
class DerivedQuantizationSpec(QuantizationSpecBase):
    # specifies which Tensors the quantization parameters are derived from;
    # each entry can either be an edge from an argument to a node, or a node
    derived_from: List[EdgeOrNode]
    derive_qparams_fn: Callable[[List[ObserverOrFakeQuantize]], Tuple[Tensor, Tensor]]
    ...
```
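A hedged usage sketch (the node objects are hypothetical and the import path is an assumption based on where the class lives today): deriving conv bias qparams from the input activation and weight observers.
```
from typing import List, Tuple
import torch
from torch.ao.quantization.quantizer import DerivedQuantizationSpec  # assumed path

def derive_bias_qparams(obs_or_fqs: List) -> Tuple[torch.Tensor, torch.Tensor]:
    act_obs, weight_obs = obs_or_fqs
    act_scale, _ = act_obs.calculate_qparams()
    weight_scale, _ = weight_obs.calculate_qparams()
    # conventional choice: bias scale = input scale * weight scale, zero_point = 0
    return act_scale * weight_scale, torch.zeros_like(weight_scale, dtype=torch.int32)

bias_qspec = DerivedQuantizationSpec(
    derived_from=[(input_node, conv_node), (weight_node, conv_node)],  # hypothetical edges
    derive_qparams_fn=derive_bias_qparams,
    dtype=torch.int32,
    quant_min=-(2 ** 31),
    quant_max=2 ** 31 - 1,
    qscheme=torch.per_tensor_symmetric,
)
```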
Test Plan:
```
buck2 test mode/opt caffe2/test:quantization_pt2e -- 'caffe2/test:quantization_pt2e'
buck2 test mode/opt caffe2/test:quantization_pt2e -- --exact 'caffe2/test:quantization_pt2e - test_resnet18_with_quantizer_api (quantization.pt2e.test_quantize_pt2e.TestQuantizePT2EModels)'
```
Reviewed By: kimishpatel
Differential Revision: D46097855
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102282
Approved by: https://github.com/andrewor14
Summary:
This PR adds support for SharedQuantizationSpec, which is used to express sharing between
two Tensors in the prepared graph. Each Tensor is either the input of some node (expressed as a tuple of fx nodes, i.e. an edge) or
the output of some node (expressed as an fx Node).
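A hedged annotation sketch (the node objects and act_qspec are hypothetical; the import path is an assumption based on where these classes live today): making the second input and the output of an add share quantization parameters with the first input.
```
from torch.ao.quantization.quantizer import (
    QuantizationAnnotation,
    SharedQuantizationSpec,
)

# share with the edge (first_input_node, add_node); passing an fx Node instead
# would share with that node's output
share_with_first_input = SharedQuantizationSpec((first_input_node, add_node))

add_node.meta["quantization_annotation"] = QuantizationAnnotation(
    input_qspec_map={
        first_input_node: act_qspec,
        second_input_node: share_with_first_input,
    },
    output_qspec=share_with_first_input,
)
```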
Test Plan:
```
buck2 test mode/opt caffe2/test:quantization_pt2e -- 'caffe2/test:quantization_pt2e'
buck2 test mode/opt caffe2/test:quantization_pt2e -- --exact 'caffe2/test:quantization_pt2e - test_resnet18_with_quantizer_api (quantization.pt2e.test_quantize_pt2e.TestQuantizePT2EModels)'
```
Differential Revision: D46043026
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102184
Approved by: https://github.com/kimishpatel, https://github.com/leslie-fang-intel
Similar to https://github.com/pytorch/pytorch/pull/96160 but for the modules
nn.PixelShuffle and nn.PixelUnshuffle.
torch.nn.PixelShuffle and torch.nn.PixelUnshuffle accept both float and quantized inputs.
However, previously we would unnecessarily dequantize quantized inputs into floats
before passing them to these modules. This commit fixes this by lowering the patterns
[dequant - PixelShuffle - quant] and
[dequant - PixelUnshuffle - quant].
Test Plan:
python test/test_quantization.py TestQuantizeFxOps.test_pixel_shuffle_module
python test/test_quantization.py TestQuantizeFxOps.test_pixel_unshuffle_module
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101926
Approved by: https://github.com/jerryzh168
Summary:
In this PR we aligned with the design of the annotation API and use QuantizationSpec directly for annotation.
The main change is in prepare: we consume the quantization_spec object directly instead of the observer or fake quant constructor, and create the constructor
inside prepare. After this PR, annotation API users only need to interact with QuantizationSpec objects.
Test Plan:
```
buck2 test mode/opt caffe2/test:quantization_pt2e -- --exact 'caffe2/test:quantization_pt2e - test_resnet18_with_quantizer_api (quantization.pt2e.test_quantize_pt2e.TestQuantizePT2EModels)'
```
Reviewed By: kimishpatel
Differential Revision: D45934088
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102054
Approved by: https://github.com/kimishpatel
Summary:
This diff adds QuantizationAnnotation and also refactors the existing annotation to use this object
```
@dataclass
class QuantizationAnnotation:
    # How some input nodes should be quantized, expressed as QuantizationSpec;
    # a map from torch.fx.Node to QuantizationSpec
    input_qspec_map: Dict[Node, QuantizationSpec]
    # How the output of this node is quantized, expressed as QuantizationSpec
    output_qspec: QuantizationSpec

class QuantizationSpec:
    dtype: torch.dtype
    is_dynamic: bool = False
    quant_min: Optional[int] = None
    quant_max: Optional[int] = None
    qscheme: Optional[torch.qscheme] = None
    ch_axis: Optional[int] = None
    # TODO: follow up PR will add this
    # Kind of observer such as MinMaxObserver, PerChannelHistogramObserver etc.
    # observer_or_fake_quant_type: Union[ObserverBase, FakeQuantizeBase]
```
Example after full refactor:
```
int8_qspec = QuantizationSpec(dtype=torch.int8, ...)
weight_qspec = QuantizationSpec(dtype=torch.int8, ...)
conv_node.meta["quantization_annotation"] = QuantizationAnnotation(
    input_qspec_map={input_node: int8_qspec, weight_node: weight_qspec},
    output_qspec=int8_qspec,
)
```
Note: right now input_qspec_map and output_qspec are still using observer and fake quant constructors.
Follow-up PR: change input_qspec_map and output_qspec to use QuantizationSpec directly.
Test Plan:
```
buck2 test mode/opt caffe2/test:quantization_pt2e -- --exact 'caffe2/test:quantization_pt2e - test_resnet18_with_quantizer_api (quantization.pt2e.test_quantize_pt2e.TestQuantizePT2EModels)'
```
Differential Revision: D45895027
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101708
Approved by: https://github.com/andrewor14
Summary: We found that `_get_lstm_with_individually_observed_parts()` was missing a setup step that initializes the weights and biases of the LSTM layer. This diff fixes the numerical discrepancy seen by the CTRL team when using the above API.
Test Plan: N3358643
Differential Revision: D45821681
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101299
Approved by: https://github.com/andrewor14
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101005
Previously the node annotation looked like the following:
```
node.meta["..."] = {
"input_act_obs_or_fq_ctr": ...,
"weight_obs_or_fq_ctr": ...,
"weight_index": 1,
}
```
Basically we needed to specify the index for the weight and also have a separate key for the weight config; in this PR we changed that to:
```
node.meta["..."] = {
"input_act_obs_or_fq_ctr_map": {input_node: ..., weight_node: ...},
}
```
This can support specifying the observer/fake quant constructor for any argument of the node
Test Plan: buck2 test @//mode/opt //caffe2/test:quantization_pt2e -- --exact 'caffe2/test:quantization_pt2e - test_resnet18_with_quantizer_api (quantization.pt2e.test_quantize_pt2e.TestQuantizePT2EModels)'
Differential Revision: D45719781
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101041
Approved by: https://github.com/andrewor14
Summary:
Previously the node annotation looked like the following:
```
node.meta["..."] = {
"input_act_obs_or_fq_ctr": ...,
"weight_obs_or_fq_ctr": ...,
"weight_index": 1,
}
```
Basically we needed to specify the index for the weight and also have a separate key for the weight config; in this PR we changed that to:
```
node.meta["..."] = {
"input_act_obs_or_fq_ctr_map": {input_node: ..., weight_node: ...},
}
```
This can support specifying the observer/fake quant constructor for any argument of the node
Test Plan: buck2 test @//mode/opt //caffe2/test:quantization_pt2e -- --exact 'caffe2/test:quantization_pt2e - test_resnet18_with_quantizer_api (quantization.pt2e.test_quantize_pt2e.TestQuantizePT2EModels)'
Reviewed By: kimishpatel
Differential Revision: D45553195
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101005
Approved by: https://github.com/kimishpatel
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99220
Previously we had two places where we needed to decide whether to insert an observer or fake quantizer:
(1) the input arguments of a node and (2) the output of a node, and we had separate code for each.
In this PR, the logic is unified in a `_needs_obs_or_fq` helper function that takes the target_dtype and is_dynamic from the previous output,
and the target_dtype and is_dynamic for the current Tensor we are looking at.
Let's use a conv node as an example:
```
conv = convolution(input, weight, bias, ...)
```
Let's say we have an `input_node` object for argument `input`, and a `conv_node` for the `conv` node in the graph.
(1) input arguments, e.g. `input`:
the target_dtype/is_dynamic from the previous output come from the node that produces `input`; we get them from
input_node.meta["target_dtype_info"]["output_act_obs_or_fq"].
The target_dtype/is_dynamic for the current argument `input` come from conv_node.meta["target_dtype_info"]["input_act_obs_or_fq"];
similarly, for the weight they come from conv_node.meta["target_dtype_info"]["weight_obs_or_fq"], etc.
(2) output of the conv node:
the target_dtype/is_dynamic from the previous output is the floating point output of the fp32 convolution operator, so it
is hardcoded to (torch.float, False). Technically we should get this from node.meta["val"], but since the
current code base is shared by fx graph mode quantization and pytorch 2.0 export quantization, we cannot do that; we can revisit
after we decide to deprecate fx graph mode quantization.
The target_dtype/is_dynamic for the current output comes from conv_node.meta["target_dtype_info"]["output_act_obs_or_fq"].
There is one caveat here about dynamic quantization that is explained in a code comment, so I won't repeat it here.
Note: also fixed some places in `_get_arg_target_dtype_as_input_to_node` and `_get_arg_target_is_dynamic_as_input_to_node` so that "not specified" is treated the same as specifying a fp32 placeholder observer.
Next: we can merge the two functions that get the target dtype and is_dynamic to reduce code duplication.
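A rough sketch of the kind of decision the unified helper makes (simplified; the real `_needs_obs_or_fq` handles more cases than this):
```
import torch

def needs_obs_or_fq(prev_dtype, prev_is_dynamic, target_dtype, target_is_dynamic):
    if target_is_dynamic:
        # dynamic quantization: qparams are computed at runtime from a float input,
        # so an observer is only needed when coming from a float producer
        return prev_dtype == torch.float
    # static case: re-observe whenever the dtype changes
    return prev_dtype != target_dtype
```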
Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
python test/test_quantization.py TestQuantizeFxModels
python test/test_quantization.py TestQuantizePT2E
python test/test_quantization.py TestQuantizePT2EModels
Imported from OSS
Differential Revision: D45198323
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99767
Approved by: https://github.com/kimishpatel
Summary:
Previously we had two places where we needed to decide whether to insert an observer or fake quantizer:
(1) the input arguments of a node and (2) the output of a node, and we had separate code for each.
In this PR, the logic is unified in a `_needs_obs_or_fq` helper function that takes the target_dtype and is_dynamic from the previous output,
and the target_dtype and is_dynamic for the current Tensor we are looking at.
Let's use a conv node as an example:
```
conv = convolution(input, weight, bias, ...)
```
Let's say we have an `input_node` object for argument `input`, and a `conv_node` for the `conv` node in the graph.
(1) input arguments, e.g. `input`:
the target_dtype/is_dynamic from the previous output come from the node that produces `input`; we get them from
input_node.meta["target_dtype_info"]["output_act_obs_or_fq"].
The target_dtype/is_dynamic for the current argument `input` come from conv_node.meta["target_dtype_info"]["input_act_obs_or_fq"];
similarly, for the weight they come from conv_node.meta["target_dtype_info"]["weight_obs_or_fq"], etc.
(2) output of the conv node:
the target_dtype/is_dynamic from the previous output is the floating point output of the fp32 convolution operator, so it
is hardcoded to (torch.float, False). Technically we should get this from node.meta["val"], but since the
current code base is shared by fx graph mode quantization and pytorch 2.0 export quantization, we cannot do that; we can revisit
after we decide to deprecate fx graph mode quantization.
The target_dtype/is_dynamic for the current output comes from conv_node.meta["target_dtype_info"]["output_act_obs_or_fq"].
There is one caveat here about dynamic quantization that is explained in a code comment, so I won't repeat it here.
Note: also fixed some places in `_get_arg_target_dtype_as_input_to_node` and `_get_arg_target_is_dynamic_as_input_to_node` so that "not specified" is treated the same as specifying a fp32 placeholder observer.
Next: we can merge the two functions that get the target dtype and is_dynamic to reduce code duplication.
Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
python test/test_quantization.py TestQuantizeFxModels
python test/test_quantization.py TestQuantizePT2E
python test/test_quantization.py TestQuantizePT2EModels
Reviewers:
Subscribers:
Tasks:
Tags:
Differential Revision: [D45167585](https://our.internmc.facebook.com/intern/diff/D45167585)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99220
Approved by: https://github.com/kimishpatel
Summary:
This PR changes prepare to use a default observer/fq constructor when "target_dtype_info" is not set, which allows users to not initialize all nodes with a default
observer/fq constructor. Note we may still need to annotate intermediate nodes after this PR; there will be a follow-up PR to allow users to only annotate the things they
want to quantize.
Test Plan:
python test/test_quantization.py TestQuantizePT2E
python test/test_quantization.py TestQuantizePT2EModels
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99001
Approved by: https://github.com/kimishpatel, https://github.com/andrewor14
Summary:
This PR added a quantizer API to prepare_pt2e_quantizer, which enables users to annotate the nodes in the graph
directly to configure quantization, instead of relying on QConfigMapping; please see the test cases in
test_quantize_pt2e.py for examples. Also added a prototype for QNNPackQuantizer, which will be modified later
to fully support the different quantization capabilities of QNNPack/XNNPack.
The goal of introducing the quantizer is to add flexibility to the quantization API so that modeling users and backend developers can express their quantization intentions programmatically, which will free the architecture optimization team from supporting different use cases in the core API in the future. As a concrete example, we used to have https://pytorch.org/docs/master/generated/torch.ao.quantization.qconfig_mapping.QConfigMapping.html#torch.ao.quantization.qconfig_mapping.QConfigMapping as the API for users to express their intent for quantization in fx graph mode quantization, and it has some fancy options like `set_module_name_regex` and `set_module_name_object_type_order`. These are not needed by all backends and add a maintenance burden to the AO team. In the quantizer API we will move these options into a backend-specific `Quantizer` that needs the feature, and every backend, or even an advanced modeling user, can implement its own quantizer to express its quantization intent by annotating the nodes. For example, to express the intention of quantizing a convolution node, a user would find the convolution node in the graph and do:
```
operator_spec = qnnpack_quantizer.get_default_per_channel_symmetric_qnnpack_operator_spec()
conv_node.meta["target_dtype_info"] = {
    "input_act_obs_or_fq_ctr": _get_act_obs_or_fq_ctr(operator_spec),
    "weight_obs_or_fq_ctr": _get_weight_obs_or_fq_ctr(operator_spec),
    "bias_obs_or_fq_ctr": _get_bias_obs_or_fq_ctr(operator_spec),
    "output_act_obs_or_fq_ctr": _get_act_obs_or_fq_ctr(operator_spec),
    # TODO: validate that weight_index is set if weight_obs_or_fq_ctr is set
    "weight_index": 1,
    # TODO: validate that bias_index is set if bias_obs_or_fq_ctr is set
    "bias_index": 2,
}
```
Each backend will introduce its own quantizer, e.g. QNNPackQuantizer, which may expose more convenient APIs for modeling users to configure the annotation, and different quantizers can compose with each other to annotate the graph correctly for quantization.
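A hedged end-to-end sketch of the quantizer-based flow (the module paths and the bare QNNPackQuantizer() config are assumptions based on the experimental layout around this PR; the APIs have since been renamed and moved):
```
import torch
import torch._dynamo as torchdynamo
from torch.ao.quantization._quantize_pt2e import prepare_pt2e_quantizer, convert_pt2e  # assumed path
from torch.ao.quantization._pt2e.quantizer import QNNPackQuantizer  # assumed path

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = torch.nn.Conv2d(3, 8, 3)

    def forward(self, x):
        return self.conv(x)

example_inputs = (torch.randn(1, 3, 32, 32),)
m, _ = torchdynamo.export(M().eval(), *example_inputs, aten_graph=True)

quantizer = QNNPackQuantizer()             # a real run would also set its global config
m = prepare_pt2e_quantizer(m, quantizer)   # the quantizer annotates nodes inside prepare
m(*example_inputs)                         # calibration
m = convert_pt2e(m)
```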
Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_simple_quantizer
python test/test_quantization.py TestQuantizePT2E.test_qnnpack_quantizer_conv
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97994
Approved by: https://github.com/vkuzo