pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-06 12:20:52 +01:00

Author	SHA1	Message	Date
Maggie Moss	b13cd141b3	Add pyrefly suppressions (#164748 ) Adds suppressions to pyrefly will typecheck clean: https://github.com/pytorch/pytorch/issues/163283 Test plan: dmypy restart && python3 scripts/lintrunner.py -a pyrefly check step 1: delete lines in the pyrefly.toml file from the `project-excludes` field step 2: run pyrefly check step 3: add suppressions, clean up unused suppressions before: https://gist.github.com/maggiemoss/4b3bf2037014e116bc00706a16aef199 after: 0 errors (4,263 ignored) Pull Request resolved: https://github.com/pytorch/pytorch/pull/164748 Approved by: https://github.com/oulgen	2025-10-07 17:31:18 +00:00
PyTorch MergeBot	5d7360bb03	Revert "Enable all SIM rules except disabled ones (#164645 )" This reverts commit `321e602692`. Reverted https://github.com/pytorch/pytorch/pull/164645 on behalf of https://github.com/izaitsevfb due to causes lint failures ([comment](https://github.com/pytorch/pytorch/pull/164645#issuecomment-3369274351))	2025-10-05 19:32:21 +00:00
Yuanyuan Chen	321e602692	Enable all SIM rules except disabled ones (#164645 ) `SIM` rules are useful for simplifying boolean expressions and enhances code readability. Pull Request resolved: https://github.com/pytorch/pytorch/pull/164645 Approved by: https://github.com/ezyang	2025-10-05 07:38:25 +00:00
Naveen Suda	afdd4247a2	[torchao][pt2e] Make prepare and convert faster by caching (#162550 ) Summary: D79674759 tried to fix the expensive prepare and convert steps, as `assert_and_get_unique_device` was called multiple times. This change fixes that issue by using `functools.cache` decorator. Test Plan: Verified on llm export to QNN. LLM Quantization prepare time of ~20min reduced to ~3min. Rollback Plan: Differential Revision: D82073679 Pull Request resolved: https://github.com/pytorch/pytorch/pull/162550 Approved by: https://github.com/andrewor14	2025-09-11 07:59:22 +00:00
andrewor14	f4c33cd44a	[pt2e] Avoid getting model device once per node (#159901 ) Summary: Previously, we call `assert_and_get_unqiue_device` once per node in both prepare and convert. This is expensive and unnecessary since the model device is the same across all nodes, so we should just call this once in the beginning and reuse the same model device across all the nodes. Test Plan: python test/test_quantization.py -k TestQuantizePT2E Pull Request resolved: https://github.com/pytorch/pytorch/pull/159901 Approved by: https://github.com/jerryzh168	2025-09-03 19:29:00 +00:00
Xuehai Pan	f8293116f5	[BE][13/16] fix typos in torch/ (torch/ao/) (#156603 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/156603 Approved by: https://github.com/msaroufim	2025-06-29 04:34:04 +00:00
Xuehai Pan	279cae52e7	[BE][PYFMT] migrate PYFMT for `torch/ao/` to `ruff format` (#148185 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/148185 Approved by: https://github.com/ezyang	2025-06-14 16:47:04 +00:00
Aaron Orenstein	9e0437a04a	PEP585 update - torch/ao/quantization (#145140 ) See #145101 for details. Pull Request resolved: https://github.com/pytorch/pytorch/pull/145140 Approved by: https://github.com/bobrenjc93	2025-01-19 10:20:00 +00:00
bobrenjc93	a55977f763	Migrate from Tuple -> tuple in torch/ao (#144265 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/144265 Approved by: https://github.com/aorenste	2025-01-10 00:12:06 +00:00
zeshengzong	cb71bcc542	Replace clone.detach with detach.clone (#140264 ) Fixes #64532 As state in issue, replace `clone.detach` by `detach.clone` Pull Request resolved: https://github.com/pytorch/pytorch/pull/140264 Approved by: https://github.com/soulitzer	2024-11-13 07:01:02 +00:00
Aaron Orenstein	d95aedf5fd	[BE] typing for decorators - fx/_compatibility (part 1) (#134202 ) Part of #134054. This corresponds to the pytorch mypy changes from D61493706. Updating takes so long and touches so many files that it's impossible to land as a whole without conflicting with some other intermediate change. So landing these 'type: ignore' for pytorch in advance of them actually being needed. Pull Request resolved: https://github.com/pytorch/pytorch/pull/134202 Approved by: https://github.com/Skylion007	2024-08-22 17:07:33 +00:00
PyTorch MergeBot	945bf78894	Revert "[BE] typing for decorators - fx/_compatibility (#131568 )" This reverts commit `193f62fde9`. Reverted https://github.com/pytorch/pytorch/pull/131568 on behalf of https://github.com/clee2000 due to same as https://github.com/pytorch/pytorch/pull/131572#issuecomment-2254328359 but I clicked the wrong link by accident. This is where it actually starts ([comment](https://github.com/pytorch/pytorch/pull/131568#issuecomment-2254330781))	2024-07-28 03:43:39 +00:00
Aaron Orenstein	193f62fde9	[BE] typing for decorators - fx/_compatibility (#131568 ) See #131429 Pull Request resolved: https://github.com/pytorch/pytorch/pull/131568 Approved by: https://github.com/justinchuby, https://github.com/oulgen, https://github.com/zou3519	2024-07-25 22:24:19 +00:00
Xuehai Pan	2ce734cee9	[BE] enable UFMT for `torch/ao/quantization/` (#128863 ) Part of #123062 - #123062 Pull Request resolved: https://github.com/pytorch/pytorch/pull/128863 Approved by: https://github.com/ezyang ghstack dependencies: #128861, #128862	2024-07-25 04:17:54 +00:00
Aaron Orenstein	62bcdc0ac9	Flip default value for mypy disallow_untyped_defs [4/11] (#127841 ) See #127836 for details. Pull Request resolved: https://github.com/pytorch/pytorch/pull/127841 Approved by: https://github.com/oulgen	2024-06-08 18:36:48 +00:00
Xuehai Pan	8b08b0f340	[BE] enable ruff rule `Q` from flake8-quotes (#127713 ) Enable [ruff rule `Q`](https://docs.astral.sh/ruff/rules/#flake8-quotes-q) from flake8-quotes. Fixes: - [avoidable-escaped-quote (Q003)](https://docs.astral.sh/ruff/rules/avoidable-escaped-quote/#avoidable-escaped-quote-q003) - [unnecessary-escaped-quote (Q004)](https://docs.astral.sh/ruff/rules/unnecessary-escaped-quote/#unnecessary-escaped-quote-q004) Pull Request resolved: https://github.com/pytorch/pytorch/pull/127713 Approved by: https://github.com/ezyang	2024-06-02 23:25:26 +00:00
PyTorch MergeBot	7ffa5558ee	Revert "[FX] Update type hints in `torch.fx._compatibility.py` (#125469 )" This reverts commit `235b4d6ec2`. Reverted https://github.com/pytorch/pytorch/pull/125469 on behalf of https://github.com/izaitsevfb due to breaks pyre in dependent projects (internal: see D56986361) ([comment](https://github.com/pytorch/pytorch/pull/125469#issuecomment-2096665396))	2024-05-06 18:36:43 +00:00
Xuehai Pan	235b4d6ec2	[FX] Update type hints in `torch.fx._compatibility.py` (#125469 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/125469 Approved by: https://github.com/Skylion007 ghstack dependencies: #125468	2024-05-05 19:30:22 +00:00
andrewor14	85b28ffc3a	[quant][pt2e] Move batch norm op between eval/train for cuda (#123957 ) Summary: Before in `move_exported_model_to_train/eval`, we only switched the CPU versions of the batch norm op. This commit adds support for the cuda versions of the op too. Note that this fix is temporary; we won't have to differentiate between these two cases once we have batch norm consolidation. Test Plan: python test/test_quantization.py -k test_move_exported_model_bn Reviewers: jerryzh168 Subscribers: jerryzh168, leslie-fang-intel, supriyar Differential Revision: [D56070054](https://our.internmc.facebook.com/intern/diff/D56070054) Pull Request resolved: https://github.com/pytorch/pytorch/pull/123957 Approved by: https://github.com/jerryzh168	2024-04-24 22:01:50 +00:00
PyTorch MergeBot	e739a2d59e	Revert "[quant][pt2e] Move batch norm op between eval/train for cuda (#123957 )" This reverts commit `4efb28c900`. Reverted https://github.com/pytorch/pytorch/pull/123957 on behalf of https://github.com/jeanschmidt due to reverting to check if it will fix rocm jobs on main ([comment](https://github.com/pytorch/pytorch/pull/123957#issuecomment-2075158146))	2024-04-24 15:02:11 +00:00
andrewor14	4efb28c900	[quant][pt2e] Move batch norm op between eval/train for cuda (#123957 ) Summary: Before in `move_exported_model_to_train/eval`, we only switched the CPU versions of the batch norm op. This commit adds support for the cuda versions of the op too. Note that this fix is temporary; we won't have to differentiate between these two cases once we have batch norm consolidation. Test Plan: python test/test_quantization.py -k test_move_exported_model_bn Reviewers: jerryzh168 Subscribers: jerryzh168, leslie-fang-intel, supriyar Differential Revision: [D56070054](https://our.internmc.facebook.com/intern/diff/D56070054) Pull Request resolved: https://github.com/pytorch/pytorch/pull/123957 Approved by: https://github.com/jerryzh168	2024-04-24 01:02:59 +00:00
Aaron Gokaslan	c5fafe9f48	[BE]: TRY002 - Ban raising vanilla exceptions (#124570 ) Adds a ruff lint rule to ban raising raw exceptions. Most of these should at the very least be runtime exception, value errors, type errors or some other errors. There are hundreds of instance of these bad exception types already in the codebase, so I have noqa'd most of them. Hopefully this error code will get commiters to rethink what exception type they should raise when they submit a PR. I also encourage people to gradually go and fix all the existing noqas that have been added so they can be removed overtime and our exception typing can be improved. Pull Request resolved: https://github.com/pytorch/pytorch/pull/124570 Approved by: https://github.com/ezyang	2024-04-21 22:26:40 +00:00
Snabel Kabiya	c21320b3b1	CPU Publish: Fix Assign device error, when module has multiple devices (#109149 ) (#113509 ) Summary: new version of this: https://www.internalfb.com/diff/D49110166?dst_version_fbid=252052334533986 Fix Assign device error, when module has multiple devices If fc_fp16_quantization enabled for CPU model. And module REMOTE_OTHER has multiple devices: {device(type='meta'), device(type='cpu')} We fail on this assertion: fbcode/caffe2/torch/ao/quantization/fx/utils.py 232 assert len(devices) <= 1, ( Since CPU models work on CPU devices, added a condition before the assertion. In case, we have CPU in module list of devices. Set device as CPU. Please see debug details: https://docs.google.com/document/d/1pMPCeJyMPA15NhFc2uAyNDkS9azR40uaNyOP0DIgHjU/edit Test Plan: AIMP_DISAGG_CPU=true buck run mode/opt -c python.package_style=inplace -c fbcode.enable_gpu_sections=true lego/scripts:lego_cli -- run-locally --model_entity_id 959168967 --config_version 28 --publish_context OFFLINE_PUBLISH --lego_pipeline aiplatform.modelstore.model_generation.lego.lego_pipeline_builder.gmpp_lego_pipeline --gmpp_config '{"gmpp_pipeline_descriptor": "aiplatform.modelstore.model_generation.v1.ads_pipelines.aimp_pyper_pipeline.model_generation_pipeline", "worker_process_number":12, "worker_thread_per_process_number": 6, "use_work_assignment": true}' 2>&1 \| tee /tmp/gmpp_lc.txt Snapshot: https://www.internalfb.com/manifold/explorer/ads_storage_fblearner/tree/user/facebook/fblearner/predictor/959168967/47 Differential Revision: D51226114 Pull Request resolved: https://github.com/pytorch/pytorch/pull/113509 Approved by: https://github.com/jerryzh168	2023-11-14 06:15:32 +00:00
Justin Chu	c0d8a4af0a	[BE] Enable ruff's UP rules and autoformat ao/ (#105430 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/105430 Approved by: https://github.com/albanD, https://github.com/malfet	2023-07-19 13:44:37 +00:00
Aaron Gokaslan	2f95a3d0fc	[BE]: Apply ruff PERF fixes to torch (#104917 ) Applies automated ruff fixes in the PERF modules and enables all automatic ones. I also updated ruff which applied some additional fixes. Pull Request resolved: https://github.com/pytorch/pytorch/pull/104917 Approved by: https://github.com/ezyang, https://github.com/albanD	2023-07-11 20:45:21 +00:00
Kazuaki Ishizaki	a13a63ae9a	Fix typos under torch/ao directory (#97679 ) This PR fixes typos in comments and messages of `.py` files under `torch/ao` directory Pull Request resolved: https://github.com/pytorch/pytorch/pull/97679 Approved by: https://github.com/janeyx99, https://github.com/kit1980	2023-04-10 22:25:15 +00:00
Jerry Zhang	b109083098	[quant][pt2e][refactor] Remove `backend_config` from `_maybe_insert_input_observers_for_node` (#98094 ) Summary: The goal is to remove the need to use backend_config when pt2e flow code call this function Test Plan: python test/test_quantization.py TestQuantizeFx Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/98094 Approved by: https://github.com/jcaip	2023-04-04 03:18:24 +00:00
Xia, Weiwen	e61b842001	[Quant][FX] lower functional conv_transpose ops (#97126 ) Summary Support quantizing and lowering functional `conv_transpose1d`, `conv_transpose2d` and `conv_transpose3d`. Please note that - `conv_tranpose + relu` fusion is not supported. Remember to keep `relu` node in graph when lowering. - `conv_tranpose` requires `per-tensor` scheme for weight. Use default `qconfig_mappings` instead of deprecated `qconfig_dict` for test cases. Test plan python test/test_quantization.py -k test_conv_transpose_not_reference python test/test_quantization.py -k test_conv_transpose_reference python test/test_quantization.py -k test_conv_transpose_relu_not_reference python test/test_quantization.py -k test_conv_transpose_relu_reference Pull Request resolved: https://github.com/pytorch/pytorch/pull/97126 Approved by: https://github.com/jgong5, https://github.com/jerryzh168	2023-03-31 07:17:29 +00:00
Kevin Zheng (FRL)	f1dbfe2f2a	[ao][fx] Enable observed -> quantized float for static quantized MultiheadAttention (#95636 ) Test Plan: Sandcastle cc andrewor14 any suggestions here? Differential Revision: D43631794 Pull Request resolved: https://github.com/pytorch/pytorch/pull/95636 Approved by: https://github.com/andrewor14	2023-02-28 20:50:19 +00:00
Jerry Zhang	2394e6baa9	[quant][fx] Change prepare_fx and convert_fx to preserve the GraphModule type of input (#94412 ) Summary: Previously prepare_fx returns an ObservedGraphModule and convert_fx returns a QuantizedGraphModule, this is to preserve the attributes since torch.fx.GraphModule did not preserve them, after https://github.com/pytorch/pytorch/pull/92062 we are preserving `model.meta`, so we can store the attributes in model.meta now to preserve them. With this, we don't need to create a new type of GraphModule in these functions and can use GraphModule directly, this is useful for quantization in pytorch 2.0 flow, if other transformations are using GraphModule as well, the quantization passes will be composable with them Test Plan: python test/test_quantization.py TestQuantizeFx python test/test_quantization.py TestQuantizeFxOps python test/test_quantization.py TestQuantizeFxModels python test/test_quantization.py TestQuantizePT2E Imported from OSS Differential Revision: D42979722 Pull Request resolved: https://github.com/pytorch/pytorch/pull/94412 Approved by: https://github.com/vkuzo	2023-02-09 23:03:23 +00:00
HDCharles	a01c1ee594	[ao] making _is_activation_post_process private with BC (#90554 ) same function in observer and quantize, consolidated to a single function note: this is a recreation of D40709276 which caused severa breakages due to not maintaining BC for models with cached code with calls to the old function name Differential Revision: [D41793604](https://our.internmc.facebook.com/intern/diff/D41793604/) NOTE FOR REVIEWERS: This PR has internal Meta-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D41793604/)! Pull Request resolved: https://github.com/pytorch/pytorch/pull/90554 Approved by: https://github.com/jcaip	2022-12-16 08:09:33 +00:00
andrewor14	691a44f403	[Quant][fx][bc-breaking] Add simpler BackendConfig pattern format (#90698 ) Summary: The existing BackendConfig fusion pattern uses a "reversed nested tuple" format that is highly unintuitive. For example, ``` linear-relu -> (nn.ReLU, nn.Linear) conv-bn-relu -> (nn.ReLU, (nn.BatchNorm2d, nn.Conv2d)) ``` This pattern format also complicates the signatures of the user specified "fuser methods", which needed to accept arguments in reverse nested order to match the patterns: ``` def fuse_linear_relu(is_qat, relu, linear): ... def fuse_conv_bn_relu(is_qat, relu, bn_conv): (bn, conv) = bn_conv ... ``` Instead, this commit introduces a new pattern format that simply specifies the ops in forward order with no nesting: ``` linear-relu -> (nn.Linear, nn.ReLU) conv-bn-relu -> (nn.Conv2d, nn.BatchNorm2d, nn.ReLU) def fuse_linear_relu(is_qat, linear, relu): ... def fuse_conv_bn_relu(is_qat, conv, bn, relu): ... ``` Note that the legacy "reversed nested tuple" is still used internally since it is more general. In the future, we should replace it with the format used in the subgraph rewriter in `torch.fx`, and simplify the existing pattern matching code to handle the new format added in this commit. BC-breaking Notes: Before: ``` import torch as nn import torch.ao.nn.intrinsic as nni from torch.ao.quantization.backend_config import BackendPatternConfig def fuse_linear_relu(is_qat, relu, bn_conv): (bn, conv) = bn_conv return nni.ConvBnReLU2d(conv, bn, relu) config = BackendPatternConfig((nn.ReLU, (nn.BatchNorm2d, nn.Conv2d))) \ .set_dtype_configs(...) \ .set_fuser_method(fuse_conv_bn_relu) \ .set_fused_module(nni.ConvBnReLU2d) ``` After: ``` def fuse_linear_relu(is_qat, conv, bn, relu): return nni.ConvBnReLU2d(conv, bn, relu) config = BackendPatternConfig((nn.Conv2d, nn.BatchNorm2d, nn.ReLU)) \ .set_dtype_configs(...) \ .set_fuser_method(fuse_conv_bn_relu) \ .set_fused_module(nni.ConvBnReLU2d) ``` OR (for backward-compatibility) ``` def fuse_linear_relu(is_qat, relu, bn_conv): (bn, conv) = bn_conv return nni.ConvBnReLU2d(conv, bn, relu) config = BackendPatternConfig() \ ._set_pattern_complex_format((nn.ReLU, (nn.BatchNorm2d, nn.Conv2d))) \ .set_dtype_configs(...) \ .set_fuser_method(fuse_conv_bn_relu) \ .set_fused_module(nni.ConvBnReLU2d) \ ._set_use_legacy_pattern_format(True) ``` Before: ``` backend_config.configs # returns Dict[Pattern, BackendPatternConfig] ``` After: ``` backend_config.configs # returns List[BackendPatternConfig] ``` Test Plan: python test/test_quantization.py TestQuantizeFx python test/test_quantization.py TestQuantizeFxOps python test/test_quantization.py TestBackendConfig Reviewers: jerryzh168, vkuzo Subscribers: jerryzh168, vkuzo Differential Revision: [D41954553](https://our.internmc.facebook.com/intern/diff/D41954553) Pull Request resolved: https://github.com/pytorch/pytorch/pull/90698 Approved by: https://github.com/vkuzo, https://github.com/jerryzh168	2022-12-14 22:44:29 +00:00
andrewor14	13fcc412be	[Quant][fx][bc-breaking] Remove unused functions in fx/utils.py (#90025 ) Summary and BC-breaking notes: This commit removes the following unused functions from both the `torch.quantization` and the `torch.ao.quantization` namespaces: ``` graph_pretty_str get_per_tensor_qparams quantize_node get_qconv_op create_qparam_nodes node_return_type_is_int is_get_tensor_info_node ``` Test Plan: python test/test_quantization.py TestQuantizeFx python test/test_quantization.py TestQuantizeFxOps python test/test_quantization.py TestAOMigrationQuantizationFx Reviewers: jerryzh168, vkuzo Subscribers: jerryzh168, vkuzo Pull Request resolved: https://github.com/pytorch/pytorch/pull/90025 Approved by: https://github.com/HDCharles	2022-12-07 01:31:28 +00:00
andrewor14	d80056312a	[Quant][fx][bc-breaking] Rename fx/patterns.py (#89872 ) Summary: This commit renames fx/quantization_patterns.py to fx/quantize_handler.py, and fx/fusion_patterns.py to fx/fuse_handler.py. This is because these files contain only QuantizeHandler and FuseHandler respectively, so the new names are more descriptive. A future commit will further break BC by removing all the empty QuantizeHandler classes. BC-breaking notes: The following classes under the `torch.ao.quantization.fx.quantization_patterns` namespace are migrated to the `torch.ao.quantization.fx.quantize_handler` namespace: ``` QuantizeHandler BinaryOpQuantizeHandler CatQuantizeHandler ConvReluQuantizeHandler LinearReLUQuantizeHandler BatchNormQuantizeHandler EmbeddingQuantizeHandler RNNDynamicQuantizeHandler DefaultNodeQuantizeHandler FixedQParamsOpQuantizeHandler CopyNodeQuantizeHandler GeneralTensorShapeOpQuantizeHandler CustomModuleQuantizeHandler StandaloneModuleQuantizeHandler ``` The following classes under the `torch.ao.quantization.fx.fusion_patterns` namespace are migrated to the `torch.ao.quantization.fx.fuse_handler` namespace: ``` DefaultFuseHandler FuseHandler ``` Test Plan: python test/test_quantization.py TestQuantizeFx python test/test_quantization.py TestQuantizeFxOps Reviewers: jerryzh168, vkuzo Subscribers: jerryzh168, vkuzo Pull Request resolved: https://github.com/pytorch/pytorch/pull/89872 Approved by: https://github.com/jerryzh168	2022-12-01 17:37:07 +00:00
Jerry Zhang	95474e00a9	[quant][be] Remove unused util code (#89272 ) Summary: att Test Plan: python test/test_quantization.py TestQuantizeFx Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/89272 Approved by: https://github.com/andrewor14	2022-11-23 18:27:41 +00:00
PyTorch MergeBot	9d209e7834	Revert "[ao] making _is_activation_post_process private (#87520 )" This reverts commit `45c62a3377`. Reverted https://github.com/pytorch/pytorch/pull/87520 on behalf of https://github.com/bigfootjon due to Diff reverted internally	2022-11-21 16:48:26 +00:00
HDCharles	45c62a3377	[ao] making _is_activation_post_process private (#87520 ) Summary: same function in observer and quantize, consolidated to a single function. Note the definitions were slightly different, I've changed the definition to be maximally inclusive so that the name of the function is more accurate Test Plan: python test/test_public_bindings.py python test/test_quantization.py Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D40709276](https://our.internmc.facebook.com/intern/diff/D40709276) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87520 Approved by: https://github.com/jcaip	2022-11-16 21:31:57 +00:00
andrewor14	61801799a0	[Quant][bc-breaking] Remove overwrite_output_observer (#88620 ) Summary: When the BackendConfig was first introduced, `overwrite_output_observer` and `overwrite_output_fake_quantize` were added to ensure fixed qparams ops like `torch.nn.Sigmoid` and `torch.nn.Tanh` used the correct observers and fake quantizes. However, this is hacky because the BackendConfig should not set the observer constructors themselves, but should instead specify only requirements on the observers. Later, https://github.com/pytorch/pytorch/pull/80184 added the correct observers to `get_default_qconfig_mapping` along with validation logic that throws an error if incorrect observers were specified. With this change, we no longer need to overwrite the observers from the BackendConfig, since we expect the user to pass in the correct observers for these ops. This commit removes these overwrite observer settings in the BackendConfig. Instead, we represent the observer constraints for fixed qparams ops through the existing DTypeWithConstraints mechanism. Note that, however, to be consistent with other DTypeWithConstraints checks, we no longer throw an error if an incorrect observer is specified, but simply ignore the offending QConfig and log a warning instead. This is the BC-breaking part of the change. BC-breaking notes: ``` from torch.ao.quantization.qconfig import default_qconfig from torch.ao.quantization.quantize_fx import prepare_fx model = ModelWithFixedQParamsOps() qconfig_mapping = QConfigMapping().set_global(default_qconfig) example_inputs = ... prepare_fx(model, qconfig_mapping, example_inputs) ``` Before this commit, running the above leads to an exception because the wrong observers are used for fixed qparams ops. After this commit, the above will only encounter a warning, and the fixed qparams ops will not be quantized. In both cases, switching to `get_default_qconfig_mapping` will cause the fixed qparams ops to be quantized. Test Plan: python test/test_quantization.py TestQuantizeFx python test/test_quantization.py TestQuantizeFxOps Reviewers: jerryzh168, vkuzo Subscribers: jerryzh168, vkuzo Pull Request resolved: https://github.com/pytorch/pytorch/pull/88620 Approved by: https://github.com/jerryzh168	2022-11-16 18:44:12 +00:00
Jiaxu Zhu	2cd05a2818	Support torch.qint32 in Convert (#88871 ) Enable the `torch.qint32` when creating `quantize_per_tensor` function call in `convert_fx` Pull Request resolved: https://github.com/pytorch/pytorch/pull/88871 Approved by: https://github.com/jerryzh168	2022-11-12 01:20:52 +00:00
Jerry Zhang	2e1199d171	[quant][fx] Fix a typo in utils.py (#88024 ) Summary: att Test Plan: python test/test_quantization.py TestQuantizeFx.test__convert_to_reference_decomposed_fx Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/88024 Approved by: https://github.com/HDCharles, https://github.com/z-a-f	2022-10-31 17:31:58 +00:00
Jerry Zhang	195a13f48c	[quant][be] Remove unused function `quantize_node` (#87153 ) Summary: att Test Plan: python test/test_quantization.py TestQuantizeFx Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/87153 Approved by: https://github.com/andrewor14	2022-10-27 01:50:00 +00:00
Jerry Zhang	0e3b5ea026	[quant][fx] Add _convert_to_reference_decomposed (#87094 ) Summary: _convert_to_reference_decomposed is a private convert function in fx graph mode quantization flow to convert a calibrated/trained model to a reference quantized model with decomposed quantized tensor representations. Test Plan: python test/test_quantization.py TestQuantizeFx.test__convert_to_reference_decomposed_fx Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/87094 Approved by: https://github.com/andrewor14	2022-10-27 01:22:08 +00:00
andrewor14	24fc680ee4	[Quant] Enable XNNPACK ops in QNNPACK BackendConfig (#85863 ) Summary: This commit enforces the following constraints on the QNNPACK BackendConfig: - `quant_min_lower_bound` = -127 for qint8 weight - `quant_max_upper_bound` = 127 for qint8 weight - `scale_min_lower_bound` = 2 -12 for qint8 activations and weight These constraints will enable users to use this BackendConfig with faster XNNPACK quantized ops. They are also consistent with the existing settings in `default_symmetric_qnnpack_qconfig` and its per_channel and QAT variants. For more detail on why these exact values were chosen, please see the description of https://github.com/pytorch/pytorch/pull/74396. Note that there are currently no restrictions on the qscheme in DTypeConfig. This should be added in the future to further enforce the restriction that the weights must be quantized with either per_tensor_symmetric or per_channel_symmetric. Existing default QConfigs such as `get_default_qconfig("qnnpack")` and `get_default_qat_qconfig("qnnpack")` will continue to be supported, but only for the existing dtypes, e.g. quint8 activations for weighted ops like linear and conv. In the future, we should revisit whether to enable XNNPACK ops using these QConfigs as well. Test Plan: python test/test_quantization.py TestQuantizeFx.test_qnnpack_backend_config Reviewers: jerryzh168, vkuzo Subscribers:** jerryzh168, vkuzo Pull Request resolved: https://github.com/pytorch/pytorch/pull/85863 Approved by: https://github.com/jerryzh168	2022-09-30 22:53:38 +00:00
Xia, Weiwen	3a3e2002d8	[Quant] Add unified x86 quant backend (#84329 ) ## Description Implement unified quantization backend 'X86' for x86 platforms. It combines the advantages of FBGEMM and ONEDNN. It selects kernels during weight prepacking and hide the details from end users. It will be the default backend in place of FBGEMM. For details, please refer to this RFC: [[RFC] Unified quantization backend for x86 CPU platforms](https://github.com/pytorch/pytorch/issues/83888) ## Validation Correctness Covered by UT Accuracy By running torchvision models on imagenet, no accuracy difference is found between FBGEMM and the unified X86 backend: [torchvision_accuracy_comparison_fbgemm_vs_x86.xlsx](https://github.com/pytorch/pytorch/files/9598114/torchvision_accuracy_comparison_fbgemm_vs_x86.xlsx) Performance Depends on https://github.com/pytorch/pytorch/pull/84470 which improves performance. For early PoC results, please refer to https://github.com/pytorch/pytorch/files/9399202/unified_qengine_poc_performance_bechmark.xlsx With the two PRs combined, we collected some data on Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz Method: Run multi-instances with 4 cores per instance on whole socket. Using JeMalloc and Intel OMP. Models/throughput \| fbgemm \| x86 \| improvement -- \| -- \| -- \| -- wide_resnet101_2 \| 173.5675 \| 241.815 \| 39.32% resnext101_32x8d \| 174.365 \| 339.8175 \| 94.89% resnet50 \| 573.155 \| 1174.14 \| 104.86% vgg19_bn \| 260.335 \| 337.92 \| 29.80% vgg19 \| 257.935 \| 333.265 \| 29.21% inception_v3 \| 601.1175 \| 1309.33 \| 117.82% densenet161 \| 296.645 \| 435.5625 \| 46.83% mnasnet1_0 \| 1216.7 \| 4057.515 \| 233.49% squeezenet1_0 \| 1220.085 \| 5153.3875 \| 322.38% alexnet \| 2294.91 \| 2624.6375 \| 14.37% fbnetc_100 \| 976.2825 \| 3110.1825 \| 218.57% shufflenet_v2_x0_5 \| 1555.76 \| 3026.125 \| 94.51% spnasnet_100 \| 1059.065 \| 3502.0975 \| 230.68% pytorch-unet \| 192.76 \| 246.77 \| 28.02% acgan \| 257.32 \| 333.7325 \| 29.70% cgan \| 7790.6925 \| 7803.1025 \| 0.16% sgan \| 257.565 \| 338.8875 \| 31.57% se_resnet50 \| 492.3725 \| 916.5175 \| 86.14% vggm \| 300.2875 \| 316.2075 \| 5.30% Environment: - PyTorch version: 1.13.0a0+gitcdd625b - Is debug build: False - CUDA used to build PyTorch: None - ROCM used to build PyTorch: N/A - OS: Ubuntu 20.04.3 LTS (x86_64) - GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0 - Clang version: Could not collect - CMake version: version 3.22.5 - Libc version: glibc-2.31 - Python version: 3.9.12 (main, Jun 1 2022, 11:38:51) [GCC 7.5.0] (64-bit runtime) - Python platform: Linux-5.11.0-27-generic-x86_64-with-glibc2.31 - Is CUDA available: False - CUDA runtime version: No CUDA - GPU models and configuration: No CUDA - Nvidia driver version: No CUDA - cuDNN version: No CUDA - HIP runtime version: N/A - MIOpen runtime version: N/A - Is XNNPACK available: True Versions of relevant libraries: - [pip3] intel-extension-for-pytorch==1.13.0+cpu - [pip3] numpy==1.23.3 - [pip3] pytorch-widedeep==0.3.7 - [pip3] torch==1.13.0a0+git48b423b - [pip3] torchvision==0.14.0a0+ebb68f3 - [conda] blas 1.0 mkl - [conda] intel-extension-for-pytorch 1.13.0+cpu pypi_0 pypi - [conda] mkl 2021.4.0 h06a4308_640 - [conda] mkl-include 2022.1.0 pypi_0 pypi - [conda] mkl-service 2.4.0 py39h7f8727e_0 - [conda] mkl-static 2022.1.0 pypi_0 pypi - [conda] mkl_fft 1.3.1 py39hd3c417c_0 - [conda] mkl_random 1.2.2 py39h51133e4_0 - [conda] numpy 1.23.3 pypi_0 pypi - [conda] numpy-base 1.22.3 py39hf524024_0 - [conda] torch 1.13.0a0+git48b423b pypi_0 pypi - [conda] torchvision 0.14.0a0+ebb68f3 pypi_0 pypi Pull Request resolved: https://github.com/pytorch/pytorch/pull/84329 Approved by: https://github.com/jerryzh168	2022-09-29 00:44:40 +00:00
andrewor14	4ca125a9e1	[Quant][fx] Add quant and scale ranges to BackendConfig (#85200 ) Summary: This commit adds the following constraints to BackendConfig: quant_min_lower_bound quant_max_upper_bound scale_min_lower_bound scale_max_upper_bound This is motivated by QNNPACK constraints on qint8 weight values and the min scale value. Actually enforcing these constraints in the QNNPACK BackendConfig will follow in a future commit. Today, users can also specify the above constraints through QConfigs, and these settings may not necessarily match the ones specified in the BackendConfig. In this case, we will handle the discrepancy as follows: (1) Require QConfig quant ranges to fall within the backend's (2) Require QConfig min scale value (eps) >= backend's (3) Require QConfig to specify quant range if the backend specified one (4) Require QConfig to specify min scale value (eps) if the backend specified one Public API changes: * Previous API, still supported after this commit: ``` dtype_config = DTypeConfig( input_dtype=torch.quint8, output_dtype=torch.quint8, weight_dtype=torch.qint8, bias_dtype=torch.float, ) ``` * New API: ``` dtype_config = DTypeConfig( input_dtype=DTypeWithConstraints( dtype=torch.quint8, quant_min_lower_bound=0, quant_max_upper_bound=127, scale_min_lower_bound=2 -12, ), output_dtype=DTypeWithConstraints( dtype=torch.quint8, quant_min_lower_bound=0, quant_max_upper_bound=127, scale_min_lower_bound=2 -12, ), weight_dtype=DTypeWithConstraints( dtype=torch.qint8, quant_min_lower_bound=-128, quant_max_upper_bound=127, scale_min_lower_bound=2 ** -12, ), bias_dtype=torch.float, ) ``` * Additionally, the following `DTypeConfig` attributes have new types with helper getters: ``` # These have type DTypeWithConstraints dtype_config.input_dtype dtype_config.output_dtype dtype_config.weight_dtype # These return Optional[torch.dtype] dtype_config.get_input_dtype() dtype_config.get_output_dtype() dtype_config.get_weight_dtype() ``` Note that scale_max is currently not used because there is no existing mechanism to enforce this on the observer. In the future, we can validate this as well if there is a use case. Test Plan: python test/test_quantization.py TestBackendConfig.test_dtype_with_constraints python test/test_quantization.py TestQuantizeFx.test_backend_config_scale_min python test/test_quantization.py TestQuantizeFx.test_backend_config_quantization_range Reviewers: jerryzh168, vkuzo Subscribers: jerryzh168, vkuzo Pull Request resolved: https://github.com/pytorch/pytorch/pull/85200 Approved by: https://github.com/jerryzh168	2022-09-28 00:33:29 +00:00
andrewor14	034f2b4d23	[Quant][fx] Enable FX static quantization for LSTM (#85068 ) Summary: This commit enables the custom module LSTM path for FX graph mode static quantization. This has the same flow as eager mode, which was already previously supported: ``` torch.nn.LSTM \| (prepare_fx) v torch.ao.nn.quantizable.LSTM \| (convert_fx) v torch.ao.nn.quantized.LSTM ``` The main reason why custom module LSTM is not supported in FX graph mode quantization today is because its inputs and outputs are nested tuples, and existing constructs such as observers, "quantize" nodes, and "dequantize" nodes do not understand how to handle complex structures. Note that the approach taken in this commit is only intended to be a short-term solution highly tailored to the input and output formats of custom module LSTM. In the future, for the longer-term solution, we should design a more general QConfig that allows users to specify complex input and output formats, and enable FX graph mode quantization to understand arbitrary nested structures and automatically infer how to transform the graph accordingly. Context: Today, in FX graph mode static quantization, custom modules are assumed to have quantized inputs and quantized outputs, with the exact dtypes derived from the associated QConfig (default quint8). Since custom modules are currently not handled through the reference model flow, their observer replacement logic are a little different from normal operators: ``` # (1) Original model input -> custom_module -> output # (2) Observed model (after prepare) input -> obs0 -> custom_module -> obs1 -> output # (3) Quantized model (after convert) input -> quant -> quantized_custom_module -> dequant -> output ``` In the last step, input observers are replaced with "quantize" and output observers are replaced with "dequantize", in contrast to other non-custom-module patterns where observers are replaced with "quantize-dequantize" pairs instead. Note that, conceptually, the output observer `obs1` is really just a DeQuantStub, since no observation is actually needed. Custom module LSTM: The reason why custom module LSTM cannot be handled in the same way is because, unlike other custom modules, its inputs and outputs are nested tuples instead of single tensors. This is how the existing custom module code would try to handle LSTMs: ``` # (1) Original model # input format: (input, (hidden0, hidden1)) # output format: (output, (hidden0, hidden1)) input -> lstm -> output hidden0 -/ \-> hidden0 hidden1 -/ \-> hidden1 # (2) Observed model (after prepare) input -> obs0 -> lstm -> obs1 # fails hidden0 -/ # missing observer hidden1 -/ # missing observer ``` However, this fails today because 1) we assume there is only one input to the custom module, and so we never end up quantizing `hidden0` and `hidden1`, and 2) the output observer `obs1` is fed a tuple, which it does not understand how to handle. Short-term fix: This commit addresses the above by specifically handling the input and output structures used by custom module LSTM. For the inputs, we manually insert observers for `hidden0` and `hidden1` to ensure all input tensors are quantized. For the outputs, we split the tuple into its internal nodes, attach a DeQuantStub to each node, and recombine these DeQuantStubs according to the original structure. Finally, we must also reroute consumers of the original LSTM tuple (and its internal nodes, e.g. `lstm[0]`) to these DeQuantStubs: ``` # (1) Original model input -> lstm -> output -> linear0 hidden0 -/ \-> hidden0 -> linear1 hidden1 -/ \-> hidden1 -> linear2 # (2) Observed model (after prepare) input -> obs0 -> lstm -> output -> dqstub -> linear0 -> obs3 hidden0 -> obs1 -/ \-> hidden0 -> dqstub -> linear1 -> obs4 hidden1 -> obs2 -/ \-> hidden1 -> dqstub -> linear2 -> obs5 # (3) Reference model (after convert) input -> quant -> qlstm -> output -> dequant -> linear0 -> quant -> dequant hidden0 -> quant -/ \-> hidden0 -> dequant -> linear1 -> quant -> dequant hidden1 -> quant -/ \-> hidden1 -> dequant -> linear2 -> quant -> dequant # (4) Quantized model (after lowering) input -> quant -> qlstm -> output -> quantized_linear0 -> dequant hidden0 -> quant -/ \-> hidden0 -> quantized_linear1 -> dequant hidden1 -> quant -/ \-> hidden1 -> quantized_linear2 -> dequant ``` Note that we choose to insert DeQuantStubs here instead of observers because these will ultimately be replaced by "dequantize" nodes. This matches the general custom module behavior, where output observers are replaced only with "dequantize" nodes (as opposed to the normal "quantize-dequantize" pair), since custom module outputs are assumed to already be quantized. Using DeQuantStubs instead of observers also simplifies the "dequantize" insertion logic. In the future, we should use DeQuantStubs in place of output observers for custom modules in general. Test plan: python test/test_quantization.py TestQuantizeFx.test_static_lstm python test/test_quantization.py TestQuantizeFx.test_static_lstm_consume_tuple Reviewers: jerryzh168, vkuzo Subscribers: jerryzh168, vkuzo Pull Request resolved: https://github.com/pytorch/pytorch/pull/85068 Approved by: https://github.com/jerryzh168	2022-09-23 13:53:39 +00:00
Vasiliy Kuznetsov	09965957cd	quantization: align observer dtype with reference model spec (#85345 ) Summary: Before this PR, the `dtype` attribute of observers was not clearly defined. It originally meant `interface_dtype` in the eager mode workflow, which is how the codebase before this PR is using it. In the new reference model spec, `dtype` attribute of an observer represents the `dtype` value which needs to be passed into a `quantize` function in the reference model spec. This PR aligns the codebase to this definition of dtype. In detail: 1. change util functions to interpret `dtype` using the reference model definition 2. change `prepare` to interpret `dtype` using the reference model definition 3. change observers for dynamic quantization to interpret `dtype` using the reference model definition. A future PR (left out of this one to keep LOC small) will deprecate the `compute_dtype` field and instead expose `is_dynamic` on observers. " Test plan: ``` python test/test_quantization.py TestQuantizeFx python test/test_quantization.py TestQuantizeFxOps ``` Differential Revision: [D39675209](https://our.internmc.facebook.com/intern/diff/D39675209) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85345 Approved by: https://github.com/z-a-f, https://github.com/jerryzh168	2022-09-21 06:34:26 +00:00
Jesse Cai	d144594512	[Quant][fx] Remove WEIGHT_INDEX_DICT and BIAS_INDEX_DICT (Part 2) (#83853 ) Summary: - Finishes the second part of https://github.com/pytorch/pytorch/pull/83263 - Removes WEIGHT_INDEX_DICT and BIAS_INDEX_DICT from utils.py - Moves two funcitons, `node_arg_is_weight` and `node_arg_is_bias` into utils.py from prepare.py convert.py and _equalize.py now use node_arg_is_weight instead of the dictionaries - Adds in quantization support for `F.groupnorm`. Add in missing BackendPatternConfigs for layernorm, instancenorm, and groupnorm Test Plan: ``` python test/test_quantization.py TestQuantizeFx python test/test_quantization.py TestQuantizeFxOps ``` Reviewers: Subscribers: Tasks: Tags: ghstack-source-id: 2b157e0dc4f1553be1f4813b4693db952e6fc558 Pull Request resolved: https://github.com/pytorch/pytorch/pull/83848 Fixes #83093 Pull Request resolved: https://github.com/pytorch/pytorch/pull/83853 Approved by: https://github.com/jerryzh168, https://github.com/andrewor14	2022-08-29 18:08:36 +00:00
PyTorch MergeBot	caee732aa1	Revert "[quant][fx] Support keyword arguments for functional linear (#79095 )" This reverts commit `d71fb40d98`. Reverted https://github.com/pytorch/pytorch/pull/79095 on behalf of https://github.com/jerryzh168 due to broken master	2022-07-09 21:45:01 +00:00
Jerry Zhang	d71fb40d98	[quant][fx] Support keyword arguments for functional linear (#79095 ) Summary: Fixes: https://github.com/pytorch/pytorch/issues/78117 Fixes: https://github.com/pytorch/pytorch/issues/73463 This PR adds a normalization pass that normalizes all the args to keyword args in positional order and fixes lowering code that previously only uses node.args to use both args and kwargs instead. Also tried to add a test for F.conv2d, but since conv2d matches multiple schemas we are doing an extra schema match, and because we are using symbolic values in `transform`, we don't have a schema match, so F.conv2d still fails with runtime errors. we can resolve this issue later when there is a need. Another thing I'm considering is to do the normalization with real inputs instead of symbolic inputs and not rely on operator_schemas (which is based on torchscript), and rely on inspect.signature, I tried this briefly but didn't get too far, it looks like we cannot get the python signature for `torch._C._nn.linear`, it might be possible to fix as well, but will need follow up discussions. The goal for this PR is just to introduce normalization in our codebase so that we can adapt some downstream code to this, and also fix the F.linear issue. Test Plan: python test/test_quantization.py TestQuantizeFx.test_normalize_args Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D37163228](https://our.internmc.facebook.com/intern/diff/D37163228) Pull Request resolved: https://github.com/pytorch/pytorch/pull/79095 Approved by: https://github.com/andrewor14	2022-07-09 20:01:09 +00:00

1 2

63 Commits