Commit Graph

28 Commits

Author SHA1 Message Date
Vasiliy Kuznetsov
0db4324ea9 dbr quant function fusion [2/x]: use fusion for observation and inference (#71781)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71781

The previous PR added information about fusions found in the subgraphs.

This PR uses that information for:
1. inserting observers at the end of fusions and not in the middle
2. during inference, replacing the original op with the fused op. This is
implemented by replacing the base op with the fused op and replacing all
other ops in the pattern with identity functions.
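
A minimal sketch of the replacement idea, assuming a matched pattern such as
`(F.linear, F.relu)`; `replacements_for_fusion` and `_identity` are invented
names for illustration, not the actual DBR internals:
```
import torch
import torch.nn.functional as F

def _identity(x, *args, **kwargs):
    # stand-in for the non-base ops in a matched pattern
    return x

def replacements_for_fusion(pattern):
    # base op -> fused kernel, every other op in the pattern -> identity
    if pattern == (F.linear, F.relu):
        return {
            F.linear: torch.ops.quantized.linear_relu,  # fused kernel, assumed here for illustration
            F.relu: _identity,
        }
    return {}
```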

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR.test_fusion_functions
```

Reviewed By: jerryzh168

Differential Revision: D33775097

Pulled By: vkuzo

fbshipit-source-id: 12249b85b2f7ba7545a54872aeb5f1ff2fc928cf
2022-02-07 05:59:03 -08:00
Vasiliy Kuznetsov
4ad1ca1abc dbr quant function fusion [1/x]: record matches for functions (#71764)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71764

For DBR quant, adds the code for matching seen ops to function fusion
patterns. After the full DAG is recorded, a separate pass walks the DAG and
adds matched fusion patterns to the seen op data structure.

This is the first PR in the stack which implements matching and
recording the match results. Future PRs in this stack will use
the match results to modify observer insertion and inference.
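
A rough sketch of what such a matching pass could look like, assuming the
recorded ops are available as an ordered list; `FUSION_PATTERNS` and
`match_fusion_patterns` are hypothetical names, not the real implementation:
```
import torch.nn.functional as F

FUSION_PATTERNS = [(F.linear, F.relu)]  # illustrative pattern set

def match_fusion_patterns(seen_ops):
    """seen_ops: op callables in the order they were traced."""
    matches = []
    idx = 0
    while idx < len(seen_ops):
        for pattern in FUSION_PATTERNS:
            window = tuple(seen_ops[idx:idx + len(pattern)])
            if window == pattern:
                matches.append((idx, pattern))
                idx += len(pattern) - 1
                break
        idx += 1
    return matches
```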

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR.test_fusion_functions
```

Reviewed By: jerryzh168

Differential Revision: D33775098

Pulled By: vkuzo

fbshipit-source-id: 488aac902bf568d41c863ee49248990411ed9c53
2022-02-07 05:58:57 -08:00
Vasiliy Kuznetsov
b0d48a8e66 dbr quant: record dag of non-quantizeable ops (#71551)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71551

This PR makes DBR quant record the DAG of non-quantizeable
ops. Having this will enable us to analyze the entire traced
graph of PyTorch ops, regardless of whether they support quantization
or not.  That, in turn, will enable analysis of uses of each op,
allowing us to safely determine whether a subgraph of ops can be
fused or not.

In future PRs, this functionality will be used to implement function
fusion.
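
One kind of analysis this enables, sketched with invented structures: a
candidate subgraph is only safe to fuse if every intermediate output has a
single user, which requires knowing about non-quantizeable users as well.
```
def can_fuse(op_users, candidate_op_ids):
    # op_users: {op_id: [ids of ops consuming this op's output]}, built from
    # the full DAG (quantizeable and non-quantizeable ops alike)
    intermediates = candidate_op_ids[:-1]
    return all(len(op_users[op_id]) == 1 for op_id in intermediates)
```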

Test Plan:
```
python test/test_quantization.py -k DBR
```

Reviewed By: jerryzh168

Differential Revision: D33684130

Pulled By: vkuzo

fbshipit-source-id: 497d9882f0670a36eef2a0900ea2517c82addf66
2022-02-07 05:58:54 -08:00
Vasiliy Kuznetsov
6d85745810 dbr quant: rename seen_op_info to seen_q_op_info (#71312)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71312

Renames `seen_op_info` to `seen_q_op_info` and `SeenOpInfo` to `SeenQOpInfo`.

This is to make it clear that the op here is a quantizeable op.
This is useful for a future PR where we will start recording the DAG
of non-quantizeable ops. That will be needed to properly support
function fusion.

Test Plan: CI and mypy

Reviewed By: albanD

Differential Revision: D33584751

Pulled By: vkuzo

fbshipit-source-id: 0b659d4ecefc96d532c451abac410c638e457dcb
2022-02-07 05:57:25 -08:00
Peter Bell
f7b1884845 [quant][fx] Don't assume bias is a keyword-argument (#71426)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71426

DBR quantization makes faulty assumptions about which arguments are
passed as keyword arguments and which are passed as positional
arguments. This happens to work currently due to a quirk of how
`__torch_function__` is implemented for Python functions, but will
break when the operators are moved to C++.
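
A sketch of the general fix, using `F.linear` as an example; the helper name
is invented for illustration:
```
import torch.nn.functional as F

def get_linear_bias(args, kwargs):
    # F.linear(input, weight, bias=None): bias may be passed positionally or
    # by keyword, so check both instead of assuming kwargs
    if len(args) > 2:
        return args[2]
    return kwargs.get('bias', None)
```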

Test Plan: Imported from OSS

Reviewed By: george-qi

Differential Revision: D33754262

Pulled By: albanD

fbshipit-source-id: 63515d7a166449726e1beaba6659443b6261742d
2022-02-01 08:56:16 -08:00
Vasiliy Kuznetsov
7fdec92b9c dbr quant: make SeenOpInfo a dataclass (#71267)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71267

Refactors `SeenOpInfo` to be a dataclass, to be consistent with
`QTensorInfo`, so we can get real typing. Fixes the type errors. No logic change.
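
A minimal sketch of the namedtuple-to-dataclass shape of the change; the
fields shown here are illustrative, not the real `SeenOpInfo` definition:
```
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class SeenOpInfo:
    idx: int                   # order in which the op was seen during tracing
    type: Callable             # the op itself, e.g. torch.add or F.linear
    fqn: Optional[str] = None  # fully qualified name of the owning module, if any
```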

Test Plan:
```
python test/test_quantization.py -k DBR
```

Reviewed By: HDCharles

Differential Revision: D33567129

Pulled By: vkuzo

fbshipit-source-id: 55f89d7a497b6db1fd9956255d964663032a0401
2022-01-24 06:23:31 -08:00
Vasiliy Kuznetsov
2985849916 dbr quant: split observer insertion to a separate pass (#71253)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71253

Before this PR, observers were inserted at the same time as
we recorded ops seen while tracing with example input. This is not
ideal because for function fusion (not yet implemented),
we need to be able to look ahead from the current op to properly
insert observers.

This PR refactors observer insertion in DBR quantization to happen
in a separate pass after the ops are recorded.  There is no functionality
change in this diff, but this PR will make it easier to implement
function fusion in a future PR.

Note: the qconfig is still used during tracing to assign each
op an inference dtype. This is not ideal; in the future we may move this
step to a separate pass as well. We keep it as is in this PR because more
refactoring would be needed for it to both happen in a separate pass and
survive module boundaries.
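
Conceptually, the prepare flow now looks like the two-pass sketch below; the
helper names are invented stand-ins for the real code:
```
def prepare_sketch(model, example_inputs):
    # pass 1: trace with example input and record the quantizeable ops seen
    seen_q_op_infos = trace_and_record_ops(model, example_inputs)
    # pass 2: walk the recorded ops afterwards, with full look-ahead
    # available, and decide where observers are inserted
    insert_observers(model, seen_q_op_infos)
    return model
```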

Test Plan:
```
python test/test_quantization.py -k DBR
```

Reviewed By: wenleix

Differential Revision: D33558280

Pulled By: vkuzo

fbshipit-source-id: 54e9cea6ad05317a8c7c92be005d33653617bed6
2022-01-24 06:23:28 -08:00
Vasiliy Kuznetsov
3c8db24360 dbr quant: make QTensorInfo a dataclass and add orig_dtype (#71245)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71245

This is a refactor to make it easier for a future PR to move observer
insertion into a separate pass.
1. adds orig_dtype, so we always record what was seen while tracing
2. switches from namedtuple to dataclass, so we can have more explicit types
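
A minimal sketch of what the dataclass might look like after this change;
field names beyond `orig_dtype` are illustrative:
```
from dataclasses import dataclass
import torch

@dataclass
class QTensorInfo:
    id: int                  # unique tensor id assigned during tracing
    orig_dtype: torch.dtype  # dtype actually seen while tracing
    inf_dtype: torch.dtype   # dtype the tensor should have at inference
```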

Test Plan: CI and mypy

Reviewed By: HDCharles

Differential Revision: D33558281

Pulled By: vkuzo

fbshipit-source-id: b9f87e25a3538fee145f020916a31699046a9c11
2022-01-24 06:23:24 -08:00
Vasiliy Kuznetsov
9c455d7086 dbr quant: add limited support for torch.nn.ModuleList (#70372)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70372

Enables basic support for `torch.nn.ModuleList` in DBR quant
by stopping it from being a leaf.  For now, we
require the user to check for `AutoQuantizationState` if they are
looping over the contents without any bounds checking.

In future PRs, we can explore how to solve this without requiring
user code changes.
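
A sketch of the user-side check mentioned above, written against the class
name only to avoid pinning an import path:
```
def forward(self, x):
    for child in self.module_list:
        # skip the quantization state object DBR quant attaches as a child
        if type(child).__name__ == 'AutoQuantizationState':
            continue
        x = child(x)
    return x
```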

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR.test_module_list
```

Reviewed By: VitalyFedyunin

Differential Revision: D33302329

Pulled By: vkuzo

fbshipit-source-id: 1604748d4b6c2b9d14b50df46268246da807d539
2022-01-06 13:25:13 -08:00
Vasiliy Kuznetsov
b12852eb41 dbr quant: support for custom leaf modules, part 1/x (#70330)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70330

Starts adding support for custom leaf modules, part 1/x.
In this PR, we ensure that leaf modules and all of their children
do not get `AutoQuantizationState` objects attached to them.
The API is matching prepare_fx, using the `prepare_custom_config_dict`
argument and the `non_traceable_module_class` key within that dict.

The next couple of PRs will ensure that modules and functions in
leaves do not get quantized, keeping it separate to make PRs smaller.
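
The dict format matches prepare_fx; a hedged usage sketch is below. The DBR
`prepare` call itself is shown schematically, since its exact signature is
not part of this log:
```
import torch

class MyNonTraceableModule(torch.nn.Module):
    ...

prepare_custom_config_dict = {
    'non_traceable_module_class': [MyNonTraceableModule],
}
# hypothetical call shape; the key point is that MyNonTraceableModule and all
# of its children will not get AutoQuantizationState objects attached
model = prepare(model, example_inputs,
                prepare_custom_config_dict=prepare_custom_config_dict)
```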

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR.test_prepare_custom_config_dict_non_traceable_module_class
```

Reviewed By: jerryzh168

Differential Revision: D33285310

Pulled By: vkuzo

fbshipit-source-id: 532025fda5532b420fad0a4a0847074d1ac4ad93
2022-01-06 13:25:04 -08:00
Vasiliy Kuznetsov
dfb807d65e dbr quant: do not attach auto_quant_state to observers (#70256)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70256

At some point in previous PRs we started attaching `AutoQuantizationState`
to observers. This PR removes that, as it has no purpose and makes model
debugging more complicated.

Test Plan:
```
python test/test_quantization.py -k DBR
```

Reviewed By: jerryzh168

Differential Revision: D33262299

Pulled By: vkuzo

fbshipit-source-id: a3543b44c517325d57f5ed03b961a8955049e682
2022-01-06 13:23:43 -08:00
Vasiliy Kuznetsov
adaf383837 dbr quant: better fix for bug with recursion on dequantize (#70128)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70128

Previous code disabled `__torch_function__` when dequantizing arguments
to an unquantizeable function.  This PR blocklists the `dequantize`
method from the dequantize hook instead, so the previous hack can be
removed.

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```

Reviewed By: ejguan

Differential Revision: D33194396

Pulled By: vkuzo

fbshipit-source-id: 6175c2da637c1d0c93b3fea0ef8218eaee6a2872
2021-12-21 06:25:37 -08:00
Vasiliy Kuznetsov
a4173fc887 dbr quant: extend qconfig_dict support to functions, part 1 (#69758)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69758

Extends DBR quant `qconfig_dict['object_type']` support to function types,
with the restriction that a parent module must have a qconfig.

A future PR will remove the restriction above (it is due to some technical
debt), to keep PR sizes small.
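
An example of the `qconfig_dict['object_type']` form for functions described
above; the entry format mirrors FX graph mode quantization:
```
import torch
import torch.nn.functional as F

qconfig_dict = {
    # the parent module needs a qconfig for the function entry to take effect
    '': torch.quantization.default_qconfig,
    'object_type': [
        (F.linear, torch.quantization.default_qconfig),
    ],
}
```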

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```

Reviewed By: jerryzh168

Differential Revision: D33020217

Pulled By: vkuzo

fbshipit-source-id: ce8a8185f9c87d437e1319ff6f19e8f6adf41e02
2021-12-17 05:59:52 -08:00
Vasiliy Kuznetsov
4f450f44bf dbr quant: initial support of qconfig_dict for modules (#69719)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69719

This PR changes the API signature of DBR quant to use `qconfig_dict`,
similar to FX graph mode quantization.  In this first PR, only basic
functionality is implemented:
* only qconfig=None and static quantization with quint8 are tested
* non-default qconfigs are tested for modules only
* targeting ops by order is not implemented

Expanding this support will be done in future PRs.
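
A sketch of the basic usage covered by this PR, using the same dict shape as
FX graph mode quantization; only the pieces listed above are exercised:
```
import torch

qconfig_dict = {
    '': torch.quantization.default_qconfig,  # global static quint8 qconfig
    'object_type': [
        # per-module-type qconfig
        (torch.nn.Conv2d, torch.quantization.default_qconfig),
        # qconfig=None turns quantization off for a type
        (torch.nn.Linear, None),
    ],
}
```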

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```

Reviewed By: jerryzh168

Differential Revision: D33003475

Pulled By: vkuzo

fbshipit-source-id: f5af81e29c34ea57c2e23333650e44e1758102e4
2021-12-17 05:59:44 -08:00
Vasiliy Kuznetsov
5b37ac54cb dbr quant overhead [14/x]: cache whether an op is a module (#68877)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68877

Saves whether an op type is a module during tracing, so we
can avoid recalculating this when validating the op during inference.
This leads to a small speedup.

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```

```
// MobileNetV2, 1x3x224x224, function level profiling

// before
validate_cur_op - 1.77%

// after
validate_cur_op - 1.41%

```

Reviewed By: jerryzh168

Differential Revision: D32646149

Pulled By: vkuzo

fbshipit-source-id: 03ebc4fedceb84bb885939dff8dec81d30ba6892
2021-11-30 06:13:06 -08:00
Vasiliy Kuznetsov
f253370bb9 dbr quant overhead [13/x]: cache results of get_module_hook_type (#68841)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68841

Caches the current module's hook type as an attribute on the module.
This relies on the assumption that the current module's hook type
does not change during inference, an assumption we can commit to.
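
A sketch of the caching pattern being described; the attribute name and the
fallback helper are invented:
```
def get_module_hook_type_cached(parent_module, cur_module):
    cached = getattr(cur_module, '_cached_hook_type', None)
    if cached is not None:
        return cached
    # original (slower) computation, now run once per module
    hook_type = compute_module_hook_type(parent_module, cur_module)
    cur_module._cached_hook_type = hook_type
    return hook_type
```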

Test Plan:
correctness
```
python test/test_quantization.py TestQuantizeDBR
```

performance
```
// MobileNetV2, 1x3x224x224, function profiling

// before
get_module_hook_type -> 2.58%

// after
get_module_hook_type -> 0.73%
```

Reviewed By: jerryzh168

Differential Revision: D32630881

Pulled By: vkuzo

fbshipit-source-id: 667f2667ef9c5514e5d82e4e7e4c02b8238edc65
2021-11-29 16:10:24 -08:00
Vasiliy Kuznetsov
e1c449ff34 dbr quant overhead[9/x]: precalculate when to skip op_convert_after_hook (#68432)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68432

Speeds up `op_convert_after_hook` by precalculating, from information
gathered while tracing, when this hook is a no-op, and skipping execution
when this flag is true.

```
MobileNetV2, function level profiling, 1x3x224x224

// before
op_convert_before_hook = 3.25%

// after
op_convert_before_hook = 1.35%
```
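
A sketch of the early-exit shape being described; the flag and helper names
are invented stand-ins, not the real attributes:
```
def op_convert_after_hook_sketch(self, op, output):
    info = self._get_cur_seen_q_op_info()  # stand-in lookup of trace-time info
    if info.convert_after_hook_is_noop:    # flag precomputed while tracing
        return output
    return self._convert_output_dtype(op, output, info)  # original conversion path
```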

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```

Reviewed By: jerryzh168

Differential Revision: D32463752

Pulled By: vkuzo

fbshipit-source-id: b0c3d37909ddc8c254fe53f90954f625ae874e3b
2021-11-21 07:08:29 -08:00
Vasiliy Kuznetsov
f1021bcf38 dbr quant overhead[8/x]: small speedup in op_needs_quantization (#68373)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68373

Removes redundant logic in `op_needs_quantization`, for a small speedup.

Test Plan:
```
// MobileNetV2, 1x3x224x224 input, % of time spent by function during DBR convert

// before
cur_op_needs_hooks - 0.76%
op_needs_quantization - 0.41%

// after
cur_op_needs_hooks - 0.70%
op_needs_quantization - 0.36%
```

Reviewed By: jerryzh168

Differential Revision: D32463762

Pulled By: vkuzo

fbshipit-source-id: 334591c514dfa5af6fabc1390005088e8c5ca952
2021-11-21 07:08:17 -08:00
Vasiliy Kuznetsov
16a6e0612d dbr quant: clean up key types in AutoQuantizationState mappings (#68369)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68369

`AutoQuantizationState` has various mappings keyed on IDs. Only
`tensor_id_to_observer` actually needs string keys because it is a
`torch.nn.ModuleDict`.  This PR changes the other mappings to have
integer keys, for simplicity and performance.
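
The constraint being worked around, shown concretely (the observer class is
just an example, and the second mapping name is illustrative):
```
import torch

# torch.nn.ModuleDict requires string keys, so observers stay keyed by str(id)
tensor_id_to_observer = torch.nn.ModuleDict()
tensor_id_to_observer[str(0)] = torch.quantization.MinMaxObserver()

# ordinary dicts have no such restriction, so the other mappings can use ints
tensor_id_to_other_info = {0: 'some per-tensor metadata'}
```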

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```

Reviewed By: jerryzh168

Differential Revision: D32463765

Pulled By: vkuzo

fbshipit-source-id: 5a9bf2a1102859097eedf1e536761084cd408856
2021-11-21 07:08:06 -08:00
Vasiliy Kuznetsov
3fc9bc43c6 dbr quant overhead[4/x]: speed up hook type calculations (#68351)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68351

Speeds up `get_module_hook_type` and `get_torch_function_hook_type` by
bypassing the expensive `torch.nn.Module` getters and setters and
fetching `_auto_quant_state` directly.
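
A sketch of the bypass, assuming the state object is registered as a child
module named `_auto_quant_state` (the exact storage location is an assumption
here):
```
def get_auto_quant_state(module):
    # direct dict lookup; module.__getattr__ would walk _parameters, _buffers
    # and _modules on every access, which is what this avoids
    return module._modules.get('_auto_quant_state', None)
```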

Test Plan:
Model level benchmarking is noisy.  Individual `cProfile` results:

```
// MobileNetV2, 1x3x224x224 input, % of time spent by function during DBR convert

// before
get_module_hook_type - 5.96%
get_torch_function_hook_type - 2.24%

// after
get_module_hook_type - 2.10%
get_torch_function_hook_type - 0.57%
```

Reviewed By: jerryzh168

Differential Revision: D32463756

Pulled By: vkuzo

fbshipit-source-id: 6eb199052ddf8d78f1c123a427e7437fc7c4fe58
2021-11-21 07:08:03 -08:00
Vasiliy Kuznetsov
9fba8971a7 dbr quant: move model level utils into own file (#68346)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68346

Some utility functions for DBR quant need to be aware
of `AutoQuantizationState`.  This PR moves them into their own file, so they
can use the type directly without circular imports, and removes the mypy
ignores which are no longer necessary after this change.

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```

Reviewed By: jerryzh168

Differential Revision: D32463763

Pulled By: vkuzo

fbshipit-source-id: e2c367de0d5887c61e6d2c3a73d82f7d76af3de1
2021-11-20 15:17:10 -08:00
Vasiliy Kuznetsov
52cc9cb0ee dbr quant: refactor AutoQuantizationState._get_packed_param_name (#68344)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68344

Makes `AutoQuantizationState._get_packed_param_name` use `seen_op_info`
instead of the current op. This will make future performance improvements
easier.

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```

Reviewed By: albanD

Differential Revision: D32463758

Pulled By: vkuzo

fbshipit-source-id: 0c16fe4bc989cb66180ad674ec55060cd970e32e
2021-11-20 15:17:04 -08:00
Vasiliy Kuznetsov
2755cf457c dbr quant: refactor AutoQuantizationState._get_input_args_quant_dequant_info (#68343)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68343

Refactors `AutoQuantizationState._get_input_args_quant_dequant_info` to
use less internal state, makes the function have no side effects by passing
the state in the arguments, and moves the function to the utils file.

This will help with a future refactor to cache this info at runtime.

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```

Reviewed By: jerryzh168

Differential Revision: D32463760

Pulled By: vkuzo

fbshipit-source-id: bdd50b0772f128755f9b734b5eeb0a9f4bc4970b
2021-11-20 15:17:02 -08:00
Vasiliy Kuznetsov
57472ec414 dbr quant: refactor get_quantized_op to only use seen_op_info (#68342)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68342

Before this PR, `get_quantized_op` required the current callable.

After this PR, `get_quantized_op` only requires `seen_op_info`.
The signature was changed slightly to return `None` if the original
callable does not need replacement for quantization.

This will make it easier to make performance improvements in a
future PR.
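
A sketch of the resulting shape of the function; the mapping contents are
illustrative only:
```
import torch
import torch.nn.functional as F

def get_quantized_op_sketch(seen_op_info):
    fp32_to_int8 = {
        F.conv2d: torch.ops.quantized.conv2d,
        torch.add: torch.ops.quantized.add,
    }
    # None means "the original callable does not need to be replaced"
    return fp32_to_int8.get(seen_op_info.type, None)
```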

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```

Reviewed By: jerryzh168

Differential Revision: D32463768

Pulled By: vkuzo

fbshipit-source-id: 5db2c4199f6c0529817f4c058f81fd1d32b9fa9f
2021-11-20 15:16:59 -08:00
Vasiliy Kuznetsov
9cf4779ec9 dbr quant: refactor get_func_output_obs_type to only use seen_op_info (#68341)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68341

Before this PR, `get_func_output_obs_type` used information from the
incoming op and its arguments, which made it hard to cache.

This PR refactors `get_func_output_obs_type` to only use information
collected during tracing. This will make it easier to make performance
improvements in a future PR.

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```

Reviewed By: jerryzh168

Differential Revision: D32463755

Pulled By: vkuzo

fbshipit-source-id: 25a220de652f0285685d43aedf7392082104b26c
2021-11-20 15:16:56 -08:00
Vasiliy Kuznetsov
ed6ef0eec4 dbr quantization: inline scale and zp (#68251)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68251

Before this PR, DBR quantization used to recalculate scale and zero_point
in the converted model every time it was needed, which is slow.
This PR creates a pass during the convert function to go through every
observer in the model and cache its scale and zero_point.

Note: restricting this to only the observers which correspond to int8
operations is left for a future PR.
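
A sketch of what such a caching pass could look like; `tensor_id_to_scale_zp`
is an assumed attribute name used only for illustration:
```
def cache_scale_zp(model):
    for _, module in model.named_modules():
        if type(module).__name__ != 'AutoQuantizationState':
            continue
        for tensor_id, obs in module.tensor_id_to_observer.items():
            # calculate_qparams() is called here, once at convert time,
            # instead of on every inference call
            scale, zp = obs.calculate_qparams()
            module.tensor_id_to_scale_zp[int(tensor_id)] = (scale, zp)
```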

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```

Reviewed By: VitalyFedyunin

Differential Revision: D32463769

Pulled By: vkuzo

fbshipit-source-id: d1d2e598e2bccc1958e5023096b451d69dc34e29
2021-11-20 15:16:51 -08:00
Vasiliy Kuznetsov
ca499567d2 barebones numeric suite for quantization with dynamic tracing (#67776)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67776

This adds a barebones `add_loggers` and `extract_logger_info` API
to analyze intermediate activations of models using quantization
with dynamic tracing.  The API generally matches the NS for FX tool,
with some omissions.  For now, this is moving fast to help us
debug real models; the API will be fully aligned with the FX tool in
future PRs, before it is marketed to users.

Note: the current approach couples Numeric Suite with the quantization
logic. This is not the best for composability, and may be changed
at a future time.
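
A hedged usage sketch following the NS for FX shape of the API; the exact
argument lists of the DBR versions are an assumption, not shown in this log:
```
# assumed call shapes, mirroring the FX numeric suite
m_fp32_logged, m_int8_logged = add_loggers('fp32', model_fp32, 'int8', model_int8)
m_fp32_logged(example_input)
m_int8_logged(example_input)
results = extract_logger_info(m_fp32_logged, m_int8_logged, 'int8')
```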

Test Plan:
```
python test/test_quantization.py TestAutoTracing.test_numeric_suite
```

Differential Revision: D32231332

Reviewed By: jerryzh168

Pulled By: vkuzo

fbshipit-source-id: 8adfb50cd8b7836c391669afe2e2ff6acae6d40a
2021-11-20 15:15:48 -08:00
Vasiliy Kuznetsov
4466ba8f30 Working POC of define-by-run quantization (#64676)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64676

We implement a working eager mode quantization flow which uses
tracing and `__torch_function__` and `torch.nn.Module.__call__` overrides to automate the model modifications needed for quantization.  Partial program capture (instead of full program capture) is used, allowing this scheme to target a wide variety of user programs.  Control flow over quantizeable ops is not supported, but general control flow is supported.

In particular:
* `auto_trace.py` contains the machinery to override `__torch_function__` and `torch.nn.Module.__call__` and call hooks before and after each quantizeable module or function
* `quantization_state.py` contains the state needed to use the hooks to implement quantization logic such as adding quants/dequants, observers, etc.
* please see `README.md` for more details
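
A minimal sketch of the interception mechanism itself (not the DBR
implementation): a `__torch_function__` override sees every op call, which is
the hook point used to run logic before and after each quantizeable op.
```
import torch

class InterceptedTensor(torch.Tensor):
    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        kwargs = kwargs or {}
        # a "before" hook (e.g. observation, quant/dequant insertion) would run here
        out = super().__torch_function__(func, types, args, kwargs)
        # an "after" hook (e.g. observing the output) would run here
        return out
```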

Test Plan:
```
python test/test_quantization.py TestAutoTracing
python test/test_quantization.py TestAutoTracingModels
```

Differential Revision: D31992281

Reviewed By: HDCharles

Pulled By: vkuzo

fbshipit-source-id: 6d40e855f3c96b9a4b637a0e677388a7b92f7967
2021-11-11 06:25:24 -08:00