Summary:
- Make the default DCE pass check the schema.
- Need to rebase onto https://github.com/pytorch/pytorch/pull/131651 after it's in Phabricator (for now the change is manually added).
- Mark Proxy dump as NotImplemented for a better error message.
- Remove Proxy from tensors when dumping models, as Proxy cannot be dumped.
More details in https://docs.google.com/document/d/1G5vmTXjzxoyVGRI2kpA1gQukK_Glyg2NrE0Oh6Nlg9A/edit?usp=sharing.
Test Plan:
CI
```
- buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r qat_conv2d
- test_export.py
- buck2 run 'fbcode//mode/dev-nosan' fbcode//modai/test:test_modai -- -r test_qat_stinson_htp_export
- buck2 run 'fbcode//mode/dev-nosan' fbcode//vizard_projects/ml_depth/tests:test_model -- -r test_qat_model_et
- buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test:fx -- -r dce
- buck2 run 'fbcode//mode/dev-nosan' fbcode//bolt/nn/executorch/backends/tests:qnn_test -- -r test_qat_bias=False,use_3d_input=False
- buck2 run 'fbcode//mode/dev-nosan' fbcode//bolt/nn/executorch/backends/tests:qnn_test -- -r test_qat_bias=True,use_3d_input=False
- buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r test_fold_bn_erases_bn_node
```
Reviewed By: angelayi
Differential Revision: D60319175
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132764
Approved by: https://github.com/angelayi
Summary: Previously, when folding BN into conv, we relied on DCE
to clean up the unused BN node from the graph. This works if
the model is already in eval mode, but fails if the model is
still in train mode because DCE doesn't remove nodes with
potential side effects (in this case `_native_batch_norm_legit`).
This required users to move the model to eval mode before calling
convert in order to get a properly DCE'd graph.
To solve this, we manually erase the BN node after folding
instead of relying on DCE. This relaxes the ordering constraints
between `move_exported_model_to_eval` and `convert_pt2e`.
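For illustration, a minimal torch.fx sketch of the approach, assuming the folded BN op's tuple result is consumed through getitem nodes; the function and argument names are illustrative, not the actual pass:
```python
import torch.fx as fx

# Minimal sketch: reroute the getitem users of the folded BN node to the conv
# output, then erase the now-dead nodes that DCE would otherwise keep because
# the BN op is treated as potentially side-effecting.
def erase_folded_bn(gm: fx.GraphModule, bn_node: fx.Node, conv_node: fx.Node) -> None:
    for getitem_node in list(bn_node.users):
        getitem_node.replace_all_uses_with(conv_node)
        gm.graph.erase_node(getitem_node)
    gm.graph.erase_node(bn_node)
    gm.recompile()
```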
Test Plan:
python test/test_quantization.py TestQuantizePT2EQAT_ConvBn1d.test_fold_bn_erases_bn_node
python test/test_quantization.py TestQuantizePT2EQAT_ConvBn2d.test_fold_bn_erases_bn_node
Reviewers: jerryzh168, yushangdi
Subscribers: jerryzh168, yushangdi, supriyar
Differential Revision: [D60520149](https://our.internmc.facebook.com/intern/diff/D60520149)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131651
Approved by: https://github.com/yushangdi, https://github.com/leslie-fang-intel
Summary:
Occasionally we run into a partition that looks like this for Add:
```
SourcePartition(nodes=[_constant2, add_2], source=<built-in function add>, input_nodes=[x], output_nodes=[_constant2, add_2], params=[_constant2])
```
In this case we are adding a constant to an input and reusing the constant later down the line, which causes the constant to be an output of our SourcePartition. The assumption that:
```
add_node = add_partition.output_nodes[0]
```
will not necessarily hold. As a result, we must check that the output node is indeed a call function and not a constant.
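A hedged sketch of such a check (the helper name is illustrative):
```python
from torch.fx import Node
from torch.fx.passes.utils.source_matcher_utils import SourcePartition

# Pick the partition output that is an actual call_function node rather than a
# constant output; asserts there is exactly one such output (sketch only).
def get_add_output_node(add_partition: SourcePartition) -> Node:
    call_fn_outputs = [n for n in add_partition.output_nodes if n.op == "call_function"]
    assert len(call_fn_outputs) == 1, "expected exactly one call_function output"
    return call_fn_outputs[0]
```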
Test Plan: buck test mode/dev-nosan //executorch/backends/xnnpack/test:test_xnnpack_ops -- test_qs8_add_constant
Differential Revision: D60413221
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132092
Approved by: https://github.com/jerryzh168
This PR fixes a bug in `test_correct_module_names` introduced in #130497. It also addresses post-fix test failures in:
* `torch/ao/quantization/__init__.py` - set the correct `__module__` for several public API helpers
* `torch/library.py` - add `register_vmap` to `__all__`
* `torch/nn/attention/flex_attention.py` - make `round_up_to_multiple` private by prepending an underscore
* `torch/storage.py` - introduce `__all__` to avoid `Self` being re-exported as a public API
* `torch/distributed/pipelining/schedules.py` - add `ZeroBubbleAlgorithm` to `__all__`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131386
Approved by: https://github.com/albanD
Reland https://github.com/pytorch/pytorch/pull/126704
#### Fixes the issue where the type of `nn.Module._state_dict_hooks` was changed in that PR, which was problematic:
Instead of using `Tuple(Callable, bool)` to keep track of whether the private `_register_state_dict_hook` or the public `register_state_dict_post_hook` API was used to register the hook and toggling the behavior accordingly, I set an attribute on the Callable in the private API; the attribute is never cleaned up.
If a callable previously registered using the private API is registered via the public API, a RuntimeError will be raised.
#### Copied from previous PR description
Fixes https://github.com/pytorch/pytorch/issues/75287 and https://github.com/pytorch/pytorch/issues/117437
- `nn.Module._register_state_dict_hook` --> add public `nn.Module.register_state_dict_post_hook`
- Add a test as this API was previously untested
- `nn.Module._register_load_state_dict_pre_hook` --> add public `nn.Module.register_load_state_dict_pre_hook` (remove the `with_module` flag, defaulting it to `True`)
~- For consistency with optimizer `load_state_dict_pre_hook` raised by @janeyx99, allow the pre-hook to return a new `state_dict`~
- For the issue pointed out by https://github.com/pytorch/pytorch/issues/117437 regarding the `_register_state_dict_hook` semantics of returning a new state_dict only being respected for the root module for the private hook:
- Document this for private `_register_state_dict_hook`
- Remove this for the public `register_state_dict_post_hook`
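A hedged usage sketch of the two public APIs above; the exact hook signatures are assumptions based on the private variants (the module is passed first, followed by the usual state_dict arguments):
```python
import torch.nn as nn

def post_hook(module, state_dict, prefix, local_metadata):
    # Mutate state_dict in place; unlike the private hook at the root, the public
    # post-hook's return value is not used to replace the state_dict.
    state_dict.pop(prefix + "some_scratch_buffer", None)  # hypothetical key

def pre_hook(module, state_dict, prefix, *args):
    # Runs before the module loads its portion of the state_dict.
    print(f"loading into {prefix or '<root>'}: {len(state_dict)} entries")

m = nn.Linear(2, 2)
m.register_state_dict_post_hook(post_hook)
m.register_load_state_dict_pre_hook(pre_hook)
m.load_state_dict(m.state_dict())
```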
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131690
Approved by: https://github.com/albanD
Summary: Previously, when folding BN into conv, we relied on DCE
to clean up the unused BN node from the graph. This works if
the model is already in eval mode, but fails if the model is
still in train mode because DCE doesn't remove nodes with
potential side effects (in this case `_native_batch_norm_legit`).
This required users to move the model to eval mode before calling
convert in order to get a properly DCE'd graph.
To solve this, we manually erase the BN node after folding
instead of relying on DCE. This relaxes the ordering constraints
between `move_exported_model_to_eval` and `convert_pt2e`.
Test Plan:
python test/test_quantization.py TestQuantizePT2EQAT_ConvBn1d.test_fold_bn_erases_bn_node
python test/test_quantization.py TestQuantizePT2EQAT_ConvBn2d.test_fold_bn_erases_bn_node
Reviewers: jerryzh168, yushangdi
Subscribers: jerryzh168, yushangdi, supriyar
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131651
Approved by: https://github.com/yushangdi
Earlier, the signature of the dequantize ops for the decomposed quantized Tensor representation was changed for wider use cases where the output dtype can be different from torch.float and needs to be passed during dequantization.
Please refer: https://github.com/pytorch/pytorch/pull/121450
However, setting the correct output dtype for the dequantize ops was still missing in the convert_pt2e flow.
This change enables users to use the PT2E quantization flow with a non-torch.float unquantized dtype, such as torch.bfloat16.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/128953
Approved by: https://github.com/jgong5, https://github.com/jerryzh168
Summary:
Add three top-level APIs for the numeric debugger in the pt2e flow, which can log intermediate outputs in the model and calculate summary metrics for comparisons between nodes in two graphs (a usage sketch follows the list):
* `prepare_for_propagation_comparison`
* `extract_results_from_loggers`
* `compare_results`
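A rough usage sketch of how the three APIs compose; the import location and exact signatures are assumptions for illustration only:
```python
# Assumed import location; adjust to wherever the APIs are exported.
from torch.ao.quantization.pt2e._numeric_debugger import (
    prepare_for_propagation_comparison,
    extract_results_from_loggers,
    compare_results,
)

def debug_numerics(float_model, quant_model, example_inputs):
    # Insert loggers that record intermediate outputs keyed by debug handle.
    ref = prepare_for_propagation_comparison(float_model)
    actual = prepare_for_propagation_comparison(quant_model)
    ref(*example_inputs)
    actual(*example_inputs)
    # Collect logged values and summarize per-node comparison metrics.
    ref_results = extract_results_from_loggers(ref)
    actual_results = extract_results_from_loggers(actual)
    return compare_results(ref_results, actual_results)
```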
Test Plan:
python test/test_quantization.py -k test_prepare_for_propagation_comparison
python test/test_quantization.py -k test_extract_results_from_loggers
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130643
Approved by: https://github.com/dulinriley, https://github.com/tarun292
Summary:
* Added support for preserving `_numeric_debug_handle` during deepcopy; we need to remap the args since `_numeric_debug_handle` refers to the nodes in the graph.
TODO: fully support re-export; currently the metadata for the output node is not preserved.
Test Plan:
python test/test_quantization.py -k test_deepcopy_preserve_handle
python test/test_quantization.py -k test_copy_preserve_handle
all related tests:
python test/test_quantization.py -k TestGenerateNumericDebugHandle
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129287
Approved by: https://github.com/zhxchen17
Summary:
Previously we stored an edge id in numeric_debug_handle to support operator fusion and operator decomposition throughout the stack.
However, according to customer feedback, people prefer the simpler per-node id; they are fine with not having the additional support for numerical debugging of inputs and are willing to hack around to achieve it.
This PR changes the structure of numeric_debug_handle to store a unique_id for each node instead.
e.g.
graph:
```
node = op(input_node, weight_node)
```
Before:
```
node.meta[NUMERIC_DEBUG_HANDLE_KEY] = {input_node: id1, weight_node: id2, "output": id3}
```
After:
```
node.meta[NUMERIC_DEBUG_HANDLE_KEY] = id1
```
Test Plan:
python test/test_quantization.py -k TestGenerateNumericDebugHandle
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129811
Approved by: https://github.com/tarun292
Summary:
There were two problems with the HistogramObserver:
1. It does not work when someone passes a batch_size-1, tensor_size-1 data point.
2. The histogram doesn't actually update if the range of the new input falls within the old one.
These issues were both fixed.
On top of this, I greatly simplified the logic for the histogram updating. It no longer does the downsampling, which saves a ton of memory and code. The accuracy can still be controlled with the upsampling ratio; this ratio was also higher than the accuracy we generally need here, so I reduced its default.
The code is also cleaner now and much easier to follow.
`test_histogram_observer_same_inputs` was likely wrong: if I pass 0s and 1s to my HistogramObserver, I want them to actually count! The current test assumes it's fine to discard and ignore these values.
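A tiny repro-style sketch of the two cases above (values are arbitrary):
```python
import torch
from torch.ao.quantization.observer import HistogramObserver

obs = HistogramObserver(bins=256)
obs(torch.tensor([[0.5]]))        # (1) batch_size 1, tensor_size 1 data point
obs(torch.tensor([0.0, 1.0]))     # establish an initial range
obs(torch.tensor([0.25, 0.75]))   # (2) new range inside the old one must still update the histogram
scale, zero_point = obs.calculate_qparams()
print(scale, zero_point)
```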
Test Plan: You can run the included tests.
Differential Revision: D58931336
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129387
Approved by: https://github.com/jerryzh168
Summary:
(1) Make the code work when the first layer does not have a bias.
(2) Make it possible to provide both modules and module names as input.
(3) Allow sequences of contiguous layers as input, which then get split into pairs.
(4) Fix the documentation to be clearer about the inputs to be provided.
Test Plan:
Run this new version of the algorithm on a network and see if it throws errors.
There's also this notebook to run and test: N5199827
If you tell me where I can find the tests for this code, I can add some simple unit tests as well.
Differential Revision: D55895862
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124632
Approved by: https://github.com/jerryzh168
Summary:
Added `set_module_name_qconfig` support to allow users to set configurations based on module name in `X86InductorQuantizer`.
For example, only quantize the `sub`:
```python
class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(5, 5)
        self.sub = Sub()

    def forward(self, x):
        x = self.linear(x)
        x = self.sub(x)
        return x

m = M().eval()
example_inputs = (torch.randn(3, 5),)
# Set config for a specific submodule.
quantizer = X86InductorQuantizer()
quantizer.set_module_name_qconfig("sub", xiq.get_default_x86_inductor_quantization_config())
```
- Added `set_module_name_qconfig` to allow users to set the configuration at the `module_name` level.
- Unified the annotation process to follow this order: `module_name_qconfig`, `operator_type_qconfig`, and `global_config`.
- Added `config_checker` to validate all user configurations and prevent mixing of static/dynamic or QAT/non-QAT configs.
- Moved `_get_module_name_filter` from `xnnpack_quantizer.py` into `utils.py` as it is common to all quantizers.
Test Plan
```bash
python -m pytest quantization/pt2e/test_x86inductor_quantizer.py -k test_set_module_name
```
@Xia-Weiwen @leslie-fang-intel @jgong5
Pull Request resolved: https://github.com/pytorch/pytorch/pull/126044
Approved by: https://github.com/jgong5, https://github.com/leslie-fang-intel, https://github.com/jerryzh168
Fixes https://github.com/pytorch/pytorch/issues/75287 and https://github.com/pytorch/pytorch/issues/117437
- `nn.Module._register_state_dict_hook` --> add public `nn.Module.register_state_dict_post_hook`
- Add a test as this API was previously untested
- `nn.Module._register_load_state_dict_pre_hook` --> add public `nn.Module.register_load_state_dict_pre_hook` (remove the `with_module` flag, defaulting it to `True`)
~- For consistency with optimizer `load_state_dict_pre_hook` raised by @janeyx99, allow the pre-hook to return a new `state_dict`~
- Document the issue pointed out by https://github.com/pytorch/pytorch/issues/117437 regarding the `_register_state_dict_hook` semantics of returning a new state_dict only being respected for the root module for the private hook
- Remove this for the public `register_state_dict_post_hook`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/126704
Approved by: https://github.com/albanD
ghstack dependencies: #126906
Use `typing_extensions.deprecated` for deprecation annotation if possible. Otherwise, add `category=FutureWarning` to `warnings.warn("message")` if the category is missing.
Note that only warnings that their messages contain `[Dd]eprecat(ed|ion)` are updated in this PR.
Resolves #126888
- #126888
This PR is split from PR #126898.
- #126898
------
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127689
Approved by: https://github.com/Skylion007
Summary:
For the Speech sequential model, there could be a case where model(data) does not work correctly for the feed-forward pass:
the Speech model uses a different type of Criterion (a.k.a. loss function) to feed data to individual components like the encoder, predictor, and joiner.
Hence we need an extra parameter to pass a feed-forward wrapper.
Differential Revision: D57680391
Pull Request resolved: https://github.com/pytorch/pytorch/pull/126891
Approved by: https://github.com/jerryzh168
Summary:
After QAT is completed, or when a pre-tuned weight observer is provided via a tunable PTQ algorithm, the weight observer should not be overwritten again with a given weight; for static QAT this should never happen.
Dynamic QAT also does not require re-running the weight observer, by design.
This is a fix for that.
Test Plan: Signals
Differential Revision: D57747749
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127309
Approved by: https://github.com/jerryzh168
Use `typing_extensions.deprecated` for deprecation annotation if possible. Otherwise, add `category=FutureWarning` to `warnings.warn("message")` if the category is missing.
Note that only warnings that their messages contain `[Dd]eprecat(ed|ion)` are updated in this PR.
UPDATE: Use `FutureWarning` instead of `DeprecationWarning`.
Resolves #126888
- #126888
Pull Request resolved: https://github.com/pytorch/pytorch/pull/126898
Approved by: https://github.com/albanD
The `usort` config in `pyproject.toml` has no effect due to a typo. Fixing the typo makes `usort` do more and generates the changes in this PR. Except for `pyproject.toml`, all changes are generated by `lintrunner -a --take UFMT --all-files`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127126
Approved by: https://github.com/kit1980
ghstack dependencies: #127122, #127123, #127124, #127125
Summary: The PT2E quantization flow does not support unquantized
outputs yet. To work around this, users may wish to remove the
output observer from their graphs. However, this fails currently
in some cases because the `PortNodeMetaForQDQ` pass is too
restrictive, for example:
```
conv -> obs -------> output0
\\-> add -> output1
```
Previously we expected conv to always have exactly 1 user,
which is the observer. When the observer is removed, however,
conv now has 2 users, and this fails the check.
```
conv -------> output0
\\-> add -> output1
```
This commit relaxes the error into a warning to enable
this workaround.
Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_multi_users_without_output_observer
Reviewers: jerryzh168
Subscribers: jerryzh168, supriyar
Differential Revision: [D57472601](https://our.internmc.facebook.com/intern/diff/D57472601)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/126487
Approved by: https://github.com/tarun292
The current call passes the string `['/actual/path']` to os.walk; since that string is not a real path, it silently leads to an empty traversal.
There is an unused function just above that handles that, so I guess this is what was supposed to be called.
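A quick illustration of the failure mode (paths are placeholders):
```python
import os

# os.walk on the *string* "['/actual/path']" (the repr of a list, not a real path)
# raises nothing and simply yields no entries, hence the silent empty traversal.
print(list(os.walk("['/actual/path']")))  # []
```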
Pull Request resolved: https://github.com/pytorch/pytorch/pull/126103
Approved by: https://github.com/suo
Summary: Follow-up to https://github.com/pytorch/ao/pull/229.
This resolves the difference between `input.div(scales)` and
`input.mul(1.0 / scales)`, which results in small numerical
discrepancies on some inputs.
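A quick illustration of the discrepancy, assuming nothing about the quantization ops themselves; division by the scale and multiplication by its reciprocal round differently:
```python
import torch

x = torch.randn(1024, dtype=torch.float32)
scales = torch.full((1024,), 3.0, dtype=torch.float32)
a = x.div(scales)
b = x.mul(1.0 / scales)
# For some inputs the two differ in the last bits, which is the discrepancy
# the per-channel-group and per-token quantize ops were hitting.
print((a - b).abs().max())
```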
Test Plan:
python test/test_quantization.py TestQuantizedTensor.test_decomposed_quantize_per_channel_group
python test/test_quantization.py TestQuantizedTensor.test_decomposed_quantize_per_token
Reviewers: jerryzh168
Subscribers: jerryzh168, supriyar
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125781
Approved by: https://github.com/jerryzh168
Summary: This commit fixes the pattern matching for conv-bn
during QAT fusion where both weight and bias are quantized per
channel. Previously this failed because weights and biases used
the same example kwargs for their scales and zero points,
causing these qparams to be tied during pattern matching.
Test Plan:
python test/test_quantization.py TestQuantizePT2EQAT_ConvBn1d.test_qat_conv_bn_per_channel_weight_bias
python test/test_quantization.py TestQuantizePT2EQAT_ConvBn2d.test_qat_conv_bn_per_channel_weight_bias
Reviewers: jerryzh168, angelayi
Subscribers: jerryzh168, angelayi, supriyar
Differential Revision: [D56740694](https://our.internmc.facebook.com/intern/diff/D56740694)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125208
Approved by: https://github.com/angelayi
Summary: Before in `move_exported_model_to_train/eval`, we only
switched the CPU versions of the batch norm op. This commit adds
support for the cuda versions of the op too. Note that this fix
is temporary; we won't have to differentiate between these two
cases once we have batch norm consolidation.
Test Plan:
python test/test_quantization.py -k test_move_exported_model_bn
Reviewers: jerryzh168
Subscribers: jerryzh168, leslie-fang-intel, supriyar
Differential Revision: [D56070054](https://our.internmc.facebook.com/intern/diff/D56070054)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123957
Approved by: https://github.com/jerryzh168
Summary: Avoid the situation where the graph traversal finds a matmul node with a `get_attr` as its `args[0]` and incorrectly propagates the `get_attr`'s meta to everything downstream.
Test Plan: CI
Differential Revision: D56219120
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124415
Approved by: https://github.com/jerryzh168
This commit enables float8_e5m2 and float8_e4m3fn dtypes in fx quantization and PT2E.
Motivation for using fp8 quantization instead of int8:
- it works better to run inference with the same datatype the model was trained with,
- fp8 can handle outliers better, which is one of the problems in LLMs activations.
The numerical recipe we want to use it for is fp8 inference:
- bgemms/gemms running in float8_e4m3fn,
- Per-Tensor-Quantization/Scaling,
- amax observer for measurement with input_backoff and weight_backoff.
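A conceptual sketch of the per-tensor fp8 round trip described above; the actual observer/qconfig wiring in the PT2E flow is not shown, and the backoff factors are omitted:
```python
import torch

x = torch.randn(64, 64)
fp8_max = torch.finfo(torch.float8_e4m3fn).max   # 448 for float8_e4m3fn
scale = x.abs().amax() / fp8_max                 # amax-based per-tensor scale
x_fp8 = (x / scale).to(torch.float8_e4m3fn)      # quantize
x_deq = x_fp8.to(torch.float32) * scale          # dequantize
print((x - x_deq).abs().max())                   # quantization error
```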
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123161
Approved by: https://github.com/jgong5, https://github.com/jerryzh168
Adds a ruff lint rule to ban raising raw exceptions. Most of these should at the very least be runtime errors, value errors, type errors, or some other more specific error. There are hundreds of instances of these bad exception types already in the codebase, so I have noqa'd most of them. Hopefully this error code will get committers to rethink what exception type they should raise when they submit a PR.
I also encourage people to gradually go and fix all the existing noqas that have been added so they can be removed overtime and our exception typing can be improved.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124570
Approved by: https://github.com/ezyang
Update ruff to 0.4.1.
This version fixes a lot false negatives/false positives, is 20-40% faster, and has various other bug fixes.
Below is a before and after table showing the execution time of ruff lint and ruff format in milliseconds courtesy of https://astral.sh/blog/ruff-v0.4.0
| Repository | Linter (v0.3) | Linter (v0.4) | Formatter (v0.3) | Formatter (v0.4) |
|----------------------------------------------------|---------------|---------------|------------------|------------------|
| [pytorch/pytorch](https://github.com/pytorch/pytorch) | 328.7 | 251.8 | 351.1 | 274.9 |
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124549
Approved by: https://github.com/ezyang
Summary: https://github.com/pytorch/pytorch/pull/123452 added
backward support to this op by turning it into
CompositeImplicitAutograd, which meant it gets decomposed during
export/compile. However, this is not desirable behavior for the
PTQ case when we try to lower the model. This commit enables
QAT without breaking PTQ by refactoring the impl into a separate
op that does have backward support.
Test Plan:
python test/test_quantization.py -k test_decomposed_choose_qparams_per_token_asymmetric_backward
Reviewers: jerryzh168, digantdesai, zou3519
Subscribers: jerryzh168, digantdesai, zou3519, supriyar
Differential Revision: [D56192116](https://our.internmc.facebook.com/intern/diff/D56192116)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124178
Approved by: https://github.com/digantdesai
Automatic fixes that replaces certain list comprehensions with generator ones where appropriate so that they are immediately consumed. This is preview functionality in ruff for rule C419 and it was automatically applied.
Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123960
Approved by: https://github.com/malfet
Summary: When running the backward for this op, we get the error:
```
RuntimeError: derivative for aten::aminmax is not implemented
```
This commit replaces this call with separate amin and amax
calls instead, which do have implemented derivatives.
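A minimal illustration of the replacement (sketch only):
```python
import torch

x = torch.randn(8, requires_grad=True)
# min_val, max_val = torch.aminmax(x)  # backward raised "derivative ... is not implemented"
min_val, max_val = x.amin(), x.amax()  # separate calls with implemented derivatives
(max_val - min_val).backward()
print(x.grad)
```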
Test Plan:
python test/test_quantization.py -k test_decomposed_choose_qparams_per_token_asymmetric_backward
Reviewers: jerryzh168, digantdesai
Subscribers: jerryzh168, digantdesai, supriyar
Differential Revision: [D55805170](https://our.internmc.facebook.com/intern/diff/D55805170)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123452
Approved by: https://github.com/digantdesai, https://github.com/jerryzh168
We are taking API feedback. Changes:
- I removed some of the default values (they weren't being used).
- I was unable to convert the last op (which is essentially an
autograd.Function registered as CompositeImplicitAutograd). That one
is "incorrectly registered"; I punt fixing it to the future.
Test Plan:
- existing tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123454
Approved by: https://github.com/andrewor14
ghstack dependencies: #123453, #123578
**Summary**
Add `matmul` in the quantization recipes, noting that it's not a general recipe but tailored to meet accuracy criteria for specific models. `matmul` recipe is disabled by default.
**Test Plan**
```
python -m pytest quantization/pt2e/test_x86inductor_quantizer.py -k test_attention_block
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/122776
Approved by: https://github.com/jgong5, https://github.com/jerryzh168
ghstack dependencies: #122775
**Summary**
Default recipes are enabled in `X86InductorQuantizer`, and requests have come in to customize recipes based on these defaults.
- Avoid annotation propagation and restrict annotation only to annotate `conv`/`linear`.
- Add `matmul` in the quantization recipes, noting that it's not a general recipe but tailored to meet accuracy criteria for specific models.
To meet these requests, this PR introduces the interfaces `set_function_type_qconfig` and `set_module_type_qconfig` (a usage sketch follows this list):
- `set_function_type_qconfig` accepts functional inputs such as `torch.nn.functional.linear` or `torch.matmul`; `set_module_type_qconfig` accepts nn.Module inputs such as `torch.nn.Conv2d`.
- To disable the recipe for an operator, users can simply exclude it from the list of operations via `quantizer.set_function_type_qconfig(op, None)`.
- To modify or extend the default recipe for an operator, users can customize it via `quantizer.set_function_type_qconfig(op, config)`.
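The usage sketch referenced above; the two method names and `xiq.get_default_x86_inductor_quantization_config()` come from this description, while everything else is an assumption for illustration:
```python
import torch
import torch.ao.quantization.quantizer.x86_inductor_quantizer as xiq
from torch.ao.quantization.quantizer.x86_inductor_quantizer import X86InductorQuantizer

quantizer = X86InductorQuantizer()
quantizer.set_global(xiq.get_default_x86_inductor_quantization_config())
# Disable the default recipe for a functional op:
quantizer.set_function_type_qconfig(torch.nn.functional.linear, None)
# Customize/extend the recipe for a module type:
quantizer.set_module_type_qconfig(
    torch.nn.Conv2d, xiq.get_default_x86_inductor_quantization_config()
)
```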
**Test Plan**
```
python -m pytest quantization/pt2e/test_x86inductor_quantizer.py -k test_filter_conv2d_recipe
python -m pytest quantization/pt2e/test_x86inductor_quantizer.py -k test_filter_linear_recipe
python -m pytest quantization/pt2e/test_x86inductor_quantizer.py -k test_filter_maxpool2d_recipe
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/122775
Approved by: https://github.com/jgong5, https://github.com/jerryzh168
Summary:
Even with the changes in D55347133, it is still possible to OOM in the histogram observer, because the size of the allocated tensor also depends on the *downsample_rate*.
For example, I still see OOMs due to attempts to allocate a 10GB+ histogram tensor in a multi-task model.
To better fix the OOM issue, we use a *try-catch* clause to avoid OOM.
Empirically, we cap the size of a single histogram tensor at 1 GB.
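A hedged sketch of the guard; the 1 GB cap and the structure are illustrative, not the observer's exact code:
```python
import torch

MAX_HISTOGRAM_BYTES = 1 << 30  # ~1 GB cap on a single histogram tensor

def try_update_histogram(x, bins, min_val, max_val):
    if bins * 4 > MAX_HISTOGRAM_BYTES:  # 4 bytes per float32 bin
        return None                      # would be too large; skip this update
    try:
        return torch.histc(x, bins=bins, min=min_val, max=max_val)
    except RuntimeError:
        return None                      # allocation failed (OOM); skip instead of crashing
```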
Test Plan: Test the change for Multi-Task model (depth + segmentation)
Differential Revision: D55567292
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123309
Approved by: https://github.com/jerryzh168
Summary: We probably don't need
`torch._C._AutoDispatchBelowAutograd()`, which is to prevent
infinite recursion if the implementation calls itself. Let's
remove it and see if anything breaks. The other major change
is registering the op to the more general Autograd dispatch
key so it can be used on cuda as well.
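A hedged sketch of registering an op under the general `Autograd` dispatch key via `torch.library`; the namespace, op name, and implementation are hypothetical:
```python
import torch
from torch.library import Library, impl

lib = Library("my_sketch", "DEF")  # hypothetical namespace
lib.define("fake_quant(Tensor x, float scale) -> Tensor")

@impl(lib, "fake_quant", "Autograd")  # one registration serves CPU and CUDA autograd
def fake_quant_autograd(x, scale):
    # Straight-through-style sketch: the inner ops are differentiable, so no
    # explicit backward or AutoDispatchBelowAutograd guard is needed here.
    return x + (torch.round(x / scale) * scale - x).detach()

x = torch.randn(4, requires_grad=True)
y = torch.ops.my_sketch.fake_quant(x, 0.5)
y.sum().backward()
```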
Test Plan:
python test/inductor/test_cpu_repro.py -k test_decomposed_fake_quant_per_channel
Reviewers: zou3519, bdhirsh
Subscribers: zou3519, bdhirsh, jerryzh168, leslie-fang-intel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123186
Approved by: https://github.com/zou3519, https://github.com/leslie-fang-intel
Summary: When we migrate to torch.export, we won't put L['self'] as the prefix for all the fqn in nn_module_stack. This diff adds the branch to handle the new case.
Test Plan: buck test mode/opt caffe2/test/quantization:test_quantization -- -r set_module_name
Differential Revision: D55436617
Pull Request resolved: https://github.com/pytorch/pytorch/pull/122819
Approved by: https://github.com/tugsbayasgalan
Summary:
Right now we don't insert additional observers (i.e. we share observers) if qspec.dtype and qspec.is_dynamic match exactly.
Since the fixed qparams quantization spec and the derived quantization spec do not currently have an is_dynamic field, observer sharing does not happen between them and QuantizationSpec. In this PR we fix the issue by adding is_dynamic to all quantization specs.
Note: SharedQuantizationSpec should probably be its own type in the future
TODO later:
(1). move all these fields (dtype, is_dynamic, quant_min, quant_max etc.) to QuantizationSpecBase,
(2). make SharedQuantizationSpec a separate type
(3). add quant_min/quant_max in observer sharing checking in pt2e/prepare.py
Test Plan:
python test/test_quantization.py -k test_fixed_qparams_qspec_observer_dedup
Differential Revision: [D55396546](https://our.internmc.facebook.com/intern/diff/D55396546)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/122734
Approved by: https://github.com/andrewor14
Summary:
also added some utils in xnnpack_quantizer_utils.py:
* annotate_conv_transpose_bn_relu and annotate_conv_transpose_bn -> these are for QAT
* annotate_conv_transpose_relu
conv_transpose + bn weight fusion is performed automatically and cannot currently be disabled;
we can add support for disabling this fusion later if needed.
Test Plan:
python test/test_quantization.py -k test_conv_transpose_bn_fusion
Pull Request resolved: https://github.com/pytorch/pytorch/pull/122046
Approved by: https://github.com/andrewor14
Summary:
X-link: https://github.com/pytorch/executorch/pull/2308
Note: The initial purpose of this PR is to draw suggestions and feedback regarding a better alternative, if any.
At present, the dequantize ops for the decomposed quantized Tensor representation, e.g. dequantize_per_tensor(), assume the output dtype is torch.float and hence do not have the output dtype in their operator argument list. However, this op signature becomes unusable when the assumption breaks, because if the output dtype is different from torch.float, there is no way to specify it during dequantization.
This change is aimed at generalizing the signature of dequantize ops like dequantize_per_tensor() for wider use cases where the output dtype can be different from torch.float and needs to be passed during dequantization. The proposal is to use an additional argument named 'output_dtype' to solve the problem. However, we would also like suggestions and feedback regarding any better alternative that could be used instead.
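For illustration only, a Python-level sketch of the generalized signature proposed here; the argument name 'output_dtype' follows the proposal text, and the math is the usual affine dequantization, not the actual operator implementation:
```python
import torch

def dequantize_per_tensor_sketch(x, scale, zero_point, quant_min, quant_max, dtype,
                                 output_dtype=torch.float):
    # Compute in fp32, then cast to the requested unquantized dtype instead of
    # hard-coding torch.float as the return type.
    res = (x.to(torch.float32) - zero_point) * scale
    return res.to(output_dtype)

q = torch.randint(-128, 127, (4,), dtype=torch.int8)
print(dequantize_per_tensor_sketch(q, 0.1, 0, -128, 127, torch.int8,
                                   output_dtype=torch.bfloat16).dtype)  # torch.bfloat16
```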
cc jerryzh168 jianyuh raghuramank100 jamesr66a vkuzo jgong5 Xia-Weiwen leslie-fang-intel
Reviewed By: digantdesai
Differential Revision: D53590486
Pulled By: manuelcandales
Co-authored-by: kausik <kmaiti@habana.ai>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121450
Approved by: https://github.com/jerryzh168
**Summary**
Add the `quantized_decomposed.fake_quant_per_channel` operator and test the forward and backward of this op by comparing against ATen.
**Test Plan**
```
python -u -m pytest -s -v test_cpu_repro.py -k test_decomposed_fake_quant_per_channel
```
**Next Step**
Optimize the performance: the generated code for the forward and backward graphs does not vectorize.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121297
Approved by: https://github.com/jerryzh168, https://github.com/jgong5
Summary: Today we don't allow free functions as the tracing callable in torch.export. As part of migrating capture_preautograd_graph usages to torch.export, we need to ban free functions in capture_preautograd_graph as well.
Test Plan: CI
Differential Revision: D54319597
Pull Request resolved: https://github.com/pytorch/pytorch/pull/120817
Approved by: https://github.com/zhxchen17, https://github.com/andrewor14
Summary: This commit adds the `model_is_exported` util function
for users to be able to easily tell what APIs to call to move
their models between train and eval modes. This has the
additional advantage of hiding the implementation of how we
detect a model is exported, in case the metadata format changes
in the future.
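A hedged usage sketch; the import path is an assumption, and torch.export is used here only to produce an exported module:
```python
import torch
from torch.ao.quantization.pt2e.export_utils import model_is_exported  # assumed path

m = torch.nn.Linear(4, 4)
ep = torch.export.export(m, (torch.randn(2, 4),))
print(model_is_exported(ep.module()))  # expected True for an exported GraphModule
print(model_is_exported(m))            # expected False for a regular eager module
```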
Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_model_is_exported
Differential Revision: [D53812972](https://our.internmc.facebook.com/intern/diff/D53812972)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119726
Approved by: https://github.com/tugsbayasgalan, https://github.com/albanD
Summary:
There was a bug in the module name filter for modules that already had an underscore in their name, because the underscore was replaced with "dot" notation.
It was assumed that underscores always meant a module separator, but this isn't the case for modules whose names contain an underscore.
Test Plan:
Added a unit test. Before this change, that test failed (due to applying the wrong
qscheme). Now it passes.
Differential Revision: D53502771
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119344
Approved by: https://github.com/jerryzh168
Summary:
This is a follow-up to https://github.com/pytorch/pytorch/pull/118605 to remove the `fold_quantize` flag from `convert_pt2e`.
Test Plan: CI
Differential Revision: D53247301
BC-Breaking Note:
The `fold_quantize` flag of `convert_pt2e` is now set to True by default, so we'll fold the quantize op into the weight by default; users will see a model size reduction by default after pt2e quantization.
2.2
```
folded_model = convert_pt2e(model, fold_quantize=True)
non_folded_model = convert_pt2e(model)
```
2.3
```
folded_model = convert_pt2e(model)
non_folded_model = convert_pt2e(model, fold_quantize=False)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118701
Approved by: https://github.com/andrewor14, https://github.com/leslie-fang-intel
Summary: This commit adds a util for PT2E quantization users
to call `model.train()` and `model.eval()` without error.
Instead, these will automatically call the equivalent
`move_exported_model_to_train/eval` for the user, which only
switch behavior for special ops like dropout and batchnorm.
This enables users to onboard to the PT2E flow more easily.
Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_allow_exported_model_train_eval
Reviewers: jerryzh168, tugsbayasgalan, zhxchen17
Subscribers: jerryzh168, tugsbayasgalan, zhxchen17, supriyar
Differential Revision: [D53426636](https://our.internmc.facebook.com/intern/diff/D53426636)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119091
Approved by: https://github.com/jerryzh168, https://github.com/tugsbayasgalan, https://github.com/zhxchen17
Summary: This is the equivalent API to `model.train()` for
exported models, analogous to `move_exported_model_to_eval`.
Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_move_exported_model_dropout
python test/test_quantization.py TestQuantizePT2E.test_move_exported_model_dropout_inplace
python test/test_quantization.py TestQuantizePT2E.test_move_exported_model_dropout_bn
Reviewers: jerryzh168, kimishpatel
Subscribers: jerryzh168, kimishpatel, supriyar
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113492
Approved by: https://github.com/jerryzh168, https://github.com/tugsbayasgalan
Summary:
When `version` is missing in the metadata, use `min_val/max_val` as keys instead of `max_vals/min_vals`
## Reasons
1. It's been almost 2 years since this change (D30003700), which means most checkpoints are now using the `max_val/min_val` keys.
2. Most checkpoint dumps using `model.state_dict()` don't have version info, which leads to a spurious `missing keys` error when loading the state_dict.
Test Plan: CI
Differential Revision: D53233012
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118659
Approved by: https://github.com/jerryzh168
Summary:
Previously, by default we did not generate quantized weights; that is, we had fp32 weights and
`fp32 weight -> q -> dq -> linear -> ...` in the quantized model.
After this PR, we'll produce a graph with int8 weight by default after convert_pt2e:
`int8 weight -> dq -> linear -> ...`
We'll remove the fold_quantize flag in the next PR
Test Plan: CI
Differential Revision: D51730862
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118605
Approved by: https://github.com/andrewor14
Fixes https://github.com/pytorch/pytorch/issues/118129
Suppressions automatically added with
```
import re

with open("error_file.txt", "r") as f:
    errors = f.readlines()

error_lines = {}
for error in errors:
    match = re.match(r"(.*):(\d+):\d+: error:.*\[(.*)\]", error)
    if match:
        file_path, line_number, error_type = match.groups()
        if file_path not in error_lines:
            error_lines[file_path] = {}
        error_lines[file_path][int(line_number)] = error_type

for file_path, lines in error_lines.items():
    with open(file_path, "r") as f:
        code = f.readlines()
    for line_number, error_type in sorted(lines.items(), key=lambda x: x[0], reverse=True):
        code[line_number - 1] = code[line_number - 1].rstrip() + f" # type: ignore[{error_type}]\n"
    with open(file_path, "w") as f:
        f.writelines(code)
```
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Co-authored-by: Catherine Lee <csl@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118533
Approved by: https://github.com/Skylion007, https://github.com/zou3519
This is a lot of files changed! Don't panic! Here's how it works:
* Previously, we set `follow_imports = silent` for our mypy.ini configuration. Per https://mypy.readthedocs.io/en/stable/running_mypy.html#follow-imports, what this does is whenever we have an import to a module which is not listed as a file to be typechecked in mypy, we typecheck it as normal but suppress all errors that occurred in that file.
* When mypy is run inside lintrunner, the list of files is precisely the files covered by the glob in lintrunner.toml, but with files in excludes excluded.
* The top-level directive `# mypy: ignore-errors` instructs mypy to typecheck the file as normal, but ignore all errors.
* Therefore, it should be equivalent to set `follow_imports = normal`, if we put `# mypy: ignore-errors` on all files that were previously excluded from the file list.
* Having done this, we can remove the exclude list from .lintrunner.toml, since excluding a file from typechecking is baked into the files themselves.
* torch/_dynamo and torch/_inductor were previously in the exclude list, because they were covered by MYPYINDUCTOR. It is not OK to mark these as `# mypy: ignore-errors` as this will impede typechecking on the alternate configuration. So they are temporarily being checked twice, but I am suppressing the errors in these files as the configurations are not quite the same. I plan to unify the configurations so this is only a temporary state.
* There were some straggler type errors after these changes somehow, so I fixed them as needed. There weren't that many.
In the future, to start type checking a file, just remove the ignore-errors directive from the top of the file.
The codemod was done with this script authored by GPT-4:
```
import glob

exclude_patterns = [
    ...
]

for pattern in exclude_patterns:
    for filepath in glob.glob(pattern, recursive=True):
        if filepath.endswith('.py'):
            with open(filepath, 'r+') as f:
                content = f.read()
                f.seek(0, 0)
                f.write('# mypy: ignore-errors\n\n' + content)
```
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118414
Approved by: https://github.com/thiagocrepaldi, https://github.com/albanD
**Description**
Add a dynamic quantization config for the x86 inductor backend.
To support the QKV structure in self-attention, we removed an assertion in the port-metadata pass that requires a single dequantize node after a quantize node.
**Test plan**
```
python test/test_quantization.py -k TestQuantizePT2EX86Inductor.test_dynamic_quant_linear
python test/test_quantization.py -k TestQuantizePT2EX86Inductor.test_qat_dynamic_quant_linear
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115337
Approved by: https://github.com/jgong5, https://github.com/jerryzh168
Updates flake8 to v6.1.0 and fixes a few lints using sed and some ruff tooling.
- Replace `assert(0)` with `raise AssertionError()`
- Remove extraneous parentheses, i.e.
- `assert(a == b)` -> `assert a == b`
- `if(x > y or y < z):`->`if x > y or y < z:`
- And `return('...')` -> `return '...'`
Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/116591
Approved by: https://github.com/albanD, https://github.com/malfet
Summary:
We introduced `node.meta["numeric_debug_handle"]` in https://github.com/pytorch/pytorch/pull/114315 to
indicate the numeric debug handle for values in the graph. In this PR we support preserving this field
in prepare and convert so that we can use it for numerical debugging.
Next: we also want to preserve it in deepcopy of GraphModule as well.
Test Plan:
python test/test_quantization.py -k test_quantize_pt2e_preserve_handle
Pull Request resolved: https://github.com/pytorch/pytorch/pull/116477
Approved by: https://github.com/tugsbayasgalan
Summary:
`_fold_conv_bn_qat` currently takes a long time, so we skip it when it's not necessary;
we can have follow-up fixes to actually reduce the patterns or cache them if possible.
Test Plan:
uncomment the print in `test_speed`, run
python test/test_quantization.py -k test_speed
and make sure the convert time is low, e.g. 0.1s instead of 8-9 seconds
Pull Request resolved: https://github.com/pytorch/pytorch/pull/116440
Approved by: https://github.com/andrewor14