Commit Graph

14 Commits

Author SHA1 Message Date
Vasiliy Kuznetsov
5b37ac54cb dbr quant overhead [14/x]: cache whether an op is a module (#68877)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68877

Caches whether an op's type is a module during tracing, so we
can avoid recalculating this when validating the op during inference.
This leads to a small speedup.
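
A minimal sketch of the pattern (names such as `SeenOpInfo.op_is_module`
are hypothetical, not the PR's actual code):
```
import torch
import torch.nn.functional as F

class SeenOpInfo:
    """Hypothetical per-op record captured during tracing."""
    def __init__(self, op):
        self.op = op
        # computed once at trace time, reused on every inference call
        self.op_is_module = isinstance(op, torch.nn.Module)

def validate_cur_op(cur_op, seen_op_info):
    # read the cached flag instead of redoing the isinstance check
    if seen_op_info.op_is_module:
        assert type(cur_op) is type(seen_op_info.op)
    else:
        assert cur_op is seen_op_info.op

validate_cur_op(F.relu, SeenOpInfo(F.relu))
conv = torch.nn.Conv2d(3, 8, 1)
validate_cur_op(conv, SeenOpInfo(conv))
```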

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```

```
// MobileNetV2, 1x3x224x224, function level profiling

// before
validate_cur_op - 1.77%

// after
validate_cur_op - 1.41%

```

Reviewed By: jerryzh168

Differential Revision: D32646149

Pulled By: vkuzo

fbshipit-source-id: 03ebc4fedceb84bb885939dff8dec81d30ba6892
2021-11-30 06:13:06 -08:00
Vasiliy Kuznetsov
f253370bb9 dbr quant overhead [13/x]: cache results of get_module_hook_type (#68841)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68841

Caches the current module's hook type as an attribute on the module.
This relies on the assumption that a module's hook type does not
change during inference, which is an assumption we can commit to.
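
A sketch of the memoization (the attribute name `_cached_hook_type`
and the enum are hypothetical, not the PR's actual code):
```
import enum
import torch

class HookType(enum.Enum):
    OP_HOOKS = 0
    NONE = 1

def get_module_hook_type(mod: torch.nn.Module) -> HookType:
    # first call computes the hook type and stashes it on the module;
    # later calls hit the cache, relying on the hook type being stable
    cached = getattr(mod, '_cached_hook_type', None)
    if cached is None:
        cached = (HookType.OP_HOOKS if hasattr(mod, '_auto_quant_state')
                  else HookType.NONE)
        mod._cached_hook_type = cached
    return cached
```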

Test Plan:
correctness
```
python test/test_quantization.py TestQuantizeDBR
```

performance
```
// MobileNetV2, 1x3x224x224, function profiling

// before
get_module_hook_type -> 2.58%

// after
get_module_hook_type -> 0.73%
```

Reviewed By: jerryzh168

Differential Revision: D32630881

Pulled By: vkuzo

fbshipit-source-id: 667f2667ef9c5514e5d82e4e7e4c02b8238edc65
2021-11-29 16:10:24 -08:00
Vasiliy Kuznetsov
e1c449ff34 dbr quant overhead[9/x]: precalculate when to skip op_convert_after_hook (#68432)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68432

Speeds up `op_convert_after_hook` by precalculating, from information
gathered while tracing, whether this hook is a no-op, and skipping
execution when the precalculated flag is set.
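
The shape of the fast path, sketched (the flag name is hypothetical):
```
class SeenOpInfo:
    """Hypothetical trace-time record for one op."""
    def __init__(self, op_convert_hook_is_noop):
        # decided once during tracing: True when this op's convert-time
        # hook has no quantization work to do
        self.op_convert_hook_is_noop = op_convert_hook_is_noop

def op_convert_after_hook(output, seen_op_info):
    # early return: skip all hook work when tracing proved it a no-op
    if seen_op_info.op_convert_hook_is_noop:
        return output
    # ... otherwise run the usual quantize/dequantize logic on `output`
    return output
```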

```
MobileNetV2, function level profiling, 1x3x224x224

// before
op_convert_before_hook = 3.25%

// after
op_convert_before_hook = 1.35%
```

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```

Reviewed By: jerryzh168

Differential Revision: D32463752

Pulled By: vkuzo

fbshipit-source-id: b0c3d37909ddc8c254fe53f90954f625ae874e3b
2021-11-21 07:08:29 -08:00
Vasiliy Kuznetsov
f1021bcf38 dbr quant overhead[8/x]: small speedup in op_needs_quantization (#68373)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68373

Removes redundant logic in `op_needs_quantization`, for a small speedup.

Test Plan:
```
// MobileNetV2, 1x3x224x224 input, % of time spent by function during DBR convert

// before
cur_op_needs_hooks - 0.76%
op_needs_quantization - 0.41%

// after
cur_op_needs_hooks - 0.70%
op_needs_quantization - 0.36%
```

Reviewed By: jerryzh168

Differential Revision: D32463762

Pulled By: vkuzo

fbshipit-source-id: 334591c514dfa5af6fabc1390005088e8c5ca952
2021-11-21 07:08:17 -08:00
Vasiliy Kuznetsov
16a6e0612d dbr quant: clean up key types in AutoQuantizationState mappings (#68369)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68369

`AutoQuantizationState` has various mappings keyed on IDs. Only
`tensor_id_to_observer` actually needs string keys, because it is a
`torch.nn.ModuleDict`.  This PR changes the other mappings to have
integer keys, for simplicity and performance.
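
Why `tensor_id_to_observer` is the exception, sketched (the
scale/zero_point mapping is illustrative):
```
import torch
from torch.ao.quantization import MinMaxObserver

class AutoQuantizationState(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # torch.nn.ModuleDict only accepts string keys, so observer
        # lookups have to go through str(tensor_id)
        self.tensor_id_to_observer = torch.nn.ModuleDict(
            {'0': MinMaxObserver()})
        # a plain dict can key on int directly: simpler and faster
        self.tensor_id_to_scale_zp = {0: (0.1, 128)}

state = AutoQuantizationState()
obs = state.tensor_id_to_observer[str(0)]   # string key required
scale, zp = state.tensor_id_to_scale_zp[0]  # int key, no str() call
```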

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```

Reviewed By: jerryzh168

Differential Revision: D32463765

Pulled By: vkuzo

fbshipit-source-id: 5a9bf2a1102859097eedf1e536761084cd408856
2021-11-21 07:08:06 -08:00
Vasiliy Kuznetsov
3fc9bc43c6 dbr quant overhead[4/x]: speed up hook type calculations (#68351)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68351

Speeds up `get_module_hook_type` and `get_torch_function_hook_type` by
bypassing the expensive `torch.nn.Module` getters and setters and
fetching `_auto_quant_state` directly.
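
The idea, sketched (not the exact PR code): a failed normal attribute
lookup on a `torch.nn.Module` falls through to `__getattr__`, which
searches `_parameters`, `_buffers`, and `_modules` in turn, while
`_modules` itself lives in the instance `__dict__`:
```
import torch

def get_auto_quant_state(mod: torch.nn.Module):
    # one dict lookup, instead of the linear search performed by
    # torch.nn.Module.__getattr__ when evaluating
    # getattr(mod, '_auto_quant_state', None)
    return mod._modules.get('_auto_quant_state', None)
```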

Test Plan:
Model level benchmarking is noisy.  Individual `cProfile` results:

```
// MobileNetV2, 1x3x224x224 input, % of time spent by function during DBR convert

// before
get_module_hook_type - 5.96%
get_torch_function_hook_type - 2.24%

// after
get_module_hook_type - 2.10%
get_torch_function_hook_type - 0.57%
```

Reviewed By: jerryzh168

Differential Revision: D32463756

Pulled By: vkuzo

fbshipit-source-id: 6eb199052ddf8d78f1c123a427e7437fc7c4fe58
2021-11-21 07:08:03 -08:00
Vasiliy Kuznetsov
9fba8971a7 dbr quant: move model level utils into own file (#68346)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68346

Some utility functions for DBR quant need to be aware
of `AutoQuantizationState`.  This PR moves them into their own file, so they
can use the type directly without circular imports, and removes the mypy
ignores which are no longer necessary after this change.
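
The resulting layout, sketched with illustrative file and function
names:
```
# quantization_state.py
#     class AutoQuantizationState(torch.nn.Module): ...
#
# model_utils.py   <- new file from this PR
#     from quantization_state import AutoQuantizationState
#     def has_quant_state(mod) -> bool:
#         # the real type is importable here, so annotations work
#         # and the mypy ignores can be dropped
#         return isinstance(getattr(mod, '_auto_quant_state', None),
#                           AutoQuantizationState)
#
# auto_trace.py
#     import model_utils   # nothing imports back up: no cycle
```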

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```

Reviewed By: jerryzh168

Differential Revision: D32463763

Pulled By: vkuzo

fbshipit-source-id: e2c367de0d5887c61e6d2c3a73d82f7d76af3de1
2021-11-20 15:17:10 -08:00
Vasiliy Kuznetsov
52cc9cb0ee dbr quant: refactor AutoQuantizationState._get_packed_param_name (#68344)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68344

Makes `AutoQuantizationState._get_packed_param_name` use `seen_op_info`
instead of the current op. This will make future performance improvements
easier.

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```

Reviewed By: albanD

Differential Revision: D32463758

Pulled By: vkuzo

fbshipit-source-id: 0c16fe4bc989cb66180ad674ec55060cd970e32e
2021-11-20 15:17:04 -08:00
Vasiliy Kuznetsov
2755cf457c dbr quant: refactor AutoQuantizationState._get_input_args_quant_dequant_info (#68343)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68343

Refactors `AutoQuantizationState._get_input_args_quant_dequant_info` to
use less internal state, removes its side effects by passing the state
in through the arguments, and moves the function to the utils file.

This will help with a future refactor to cache this info at runtime.
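
The property that enables the later caching, sketched (field names are
hypothetical):
```
from types import SimpleNamespace

def get_input_args_quant_dequant_info(seen_op_info, tensor_id_to_scale_zp):
    # pure function: all state arrives through the arguments and none
    # of it is mutated, so the result can be cached per seen_op_info
    return [tensor_id_to_scale_zp.get(tid)  # (scale, zp), or None
            for tid in seen_op_info.input_tensor_ids]

info = SimpleNamespace(input_tensor_ids=[0, 1])
print(get_input_args_quant_dequant_info(info, {0: (0.05, 128)}))
# [(0.05, 128), None]
```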

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```

Reviewed By: jerryzh168

Differential Revision: D32463760

Pulled By: vkuzo

fbshipit-source-id: bdd50b0772f128755f9b734b5eeb0a9f4bc4970b
2021-11-20 15:17:02 -08:00
Vasiliy Kuznetsov
57472ec414 dbr quant: refactor get_quantized_op to only use seen_op_info (#68342)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68342

Before this PR, `get_quantized_op` required the current callable.

After this PR, `get_quantized_op` only requires `seen_op_info`.
The signature was changed slightly to return `None` if the original
callable does not need replacement for quantization.

This will make it easier to make performance improvements in a
future PR.
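
A sketch of the new contract (the mapping and field names are
hypothetical):
```
from types import SimpleNamespace
from typing import Callable, Optional
import torch

# hypothetical trace-time mapping of fp32 callables to quantized kernels
fp32_to_int8_fun_mapping = {torch.add: torch.ops.quantized.add}

def get_quantized_op(seen_op_info) -> Optional[Callable]:
    # None means the original callable needs no replacement
    return fp32_to_int8_fun_mapping.get(seen_op_info.type)

assert get_quantized_op(SimpleNamespace(type=torch.add)) is not None
assert get_quantized_op(SimpleNamespace(type=torch.relu)) is None
```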

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```

Reviewed By: jerryzh168

Differential Revision: D32463768

Pulled By: vkuzo

fbshipit-source-id: 5db2c4199f6c0529817f4c058f81fd1d32b9fa9f
2021-11-20 15:16:59 -08:00
Vasiliy Kuznetsov
9cf4779ec9 dbr quant: refactor get_func_output_obs_type to only use seen_op_info (#68341)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68341

Before this PR, `get_func_output_obs_type` used information from the
incoming op and its arguments, which made the result hard to cache.

This PR refactors `get_func_output_obs_type` to only use information
collected during tracing. This will make it easier to make performance
improvements in a future PR.
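
The contract after the refactor, sketched (enum values and field names
are hypothetical):
```
import enum
from types import SimpleNamespace

class FuncOutputObsType(enum.Enum):
    NONE = 0
    NEW_OBS = 1
    REUSES_FIRST_INPUT_OBS = 2

def get_func_output_obs_type(seen_op_info) -> FuncOutputObsType:
    # depends only on the trace-time record, never on the live op or
    # its runtime arguments, so the result is cacheable
    if not seen_op_info.output_is_quantized:
        return FuncOutputObsType.NONE
    if seen_op_info.op_is_inplace:
        return FuncOutputObsType.REUSES_FIRST_INPUT_OBS
    return FuncOutputObsType.NEW_OBS

info = SimpleNamespace(output_is_quantized=True, op_is_inplace=False)
assert get_func_output_obs_type(info) is FuncOutputObsType.NEW_OBS
```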

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```

Reviewed By: jerryzh168

Differential Revision: D32463755

Pulled By: vkuzo

fbshipit-source-id: 25a220de652f0285685d43aedf7392082104b26c
2021-11-20 15:16:56 -08:00
Vasiliy Kuznetsov
ed6ef0eec4 dbr quantization: inline scale and zp (#68251)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68251

Before this PR, DBR quantization recalculated scale and zero_point
in the converted model every time they were needed, which is slow.
This PR adds a pass to the convert function that goes through every
observer in the model and caches its scale and zero_point.

Note: restricting this to only the observers which correspond to int8
operations is left for a future PR.
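
The pass, sketched (the cached attribute names are hypothetical):
```
import torch
from torch.ao.quantization import ObserverBase

def cache_qparams(model: torch.nn.Module) -> None:
    # one pass at convert time: compute each observer's qparams once,
    # instead of calling calculate_qparams() on every inference
    for mod in model.modules():
        if isinstance(mod, ObserverBase):
            scale, zero_point = mod.calculate_qparams()
            mod._cached_scale = scale
            mod._cached_zero_point = zero_point
```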

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```

Reviewed By: VitalyFedyunin

Differential Revision: D32463769

Pulled By: vkuzo

fbshipit-source-id: d1d2e598e2bccc1958e5023096b451d69dc34e29
2021-11-20 15:16:51 -08:00
Vasiliy Kuznetsov
ca499567d2 barebones numeric suite for quantization with dynamic tracing (#67776)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67776

This adds a barebones `add_loggers` and `extract_logger_info` API
to analyze intermediate activations of models using quantization
with dynamic tracing.  The API generally matches the NS for FX tool,
with some omissions.  For now, we are moving fast to help us debug
real models; the API will be fully aligned with NS for FX in future
PRs, before this is marketed to users.

Note: the current approach couples Numeric Suite with the quantization
logic. This is not the best for composability, and may be changed
at a future time.
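
A toy illustration of what the loggers record (this is not the PR's
implementation, just the concept):
```
import torch

class Logger(torch.nn.Module):
    """Toy stand-in for a numeric suite logger: records activations
    flowing through one point of a model for later comparison."""
    def __init__(self, name):
        super().__init__()
        self.name = name
        self.stats = []

    def forward(self, x):
        self.stats.append(x.detach())
        return x

# add_loggers presumably attaches loggers at matching points of the
# fp32 and quantized models; extract_logger_info then collates their
# recorded `stats`, keyed by layer name
fp32_log, int8_log = Logger('fp32'), Logger('int8')
x = torch.randn(1, 4)
fp32_log(torch.relu(x))
int8_log(torch.relu(x))  # the quantized op's output, in practice
```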

Test Plan:
```
python test/test_quantization.py TestAutoTracing.test_numeric_suite
```

Differential Revision: D32231332

Reviewed By: jerryzh168

Pulled By: vkuzo

fbshipit-source-id: 8adfb50cd8b7836c391669afe2e2ff6acae6d40a
2021-11-20 15:15:48 -08:00
Vasiliy Kuznetsov
4466ba8f30 Working POC of define-by-run quantization (#64676)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64676

We implement a working eager mode quantization flow which uses
tracing and `__torch_function__` and `torch.nn.Module.__call__`
overrides to automate the model modifications needed for quantization.
Partial program capture (instead of full program capture) is used,
allowing this scheme to target a wide variety of user programs.
Control flow over quantizeable ops is not supported, but general
control flow is supported.

In particular:
* `auto_trace.py` contains the machinery to override `__torch_function__` and `torch.nn.Module.__call__` and call hooks before and after each quantizeable module or function (a minimal interception sketch follows this list)
* `quantization_state.py` contains the state needed to use the hooks to implement quantization logic such as adding quants/dequants, observers, etc.
* please see `README.md` for more details
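
A minimal sketch of the interception mechanism, using the documented
tensor-subclass `__torch_function__` pattern (this is not the PR's
actual code, which also overrides `torch.nn.Module.__call__`):
```
import torch

class InterceptedTensor(torch.Tensor):
    """Every torch function applied to this subclass is routed
    through __torch_function__, where quantization hooks can run."""

    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        if kwargs is None:
            kwargs = {}
        print(f'before hook: {func.__name__}')  # e.g. observe inputs
        out = super().__torch_function__(func, types, args, kwargs)
        print(f'after hook: {func.__name__}')   # e.g. observe outputs
        return out

x = torch.randn(1, 3).as_subclass(InterceptedTensor)
y = torch.relu(x)  # prints the before/after hooks around relu
```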

Test Plan:
```
python test/test_quantization.py TestAutoTracing
python test/test_quantization.py TestAutoTracingModels
```

Differential Revision: D31992281

Reviewed By: HDCharles

Pulled By: vkuzo

fbshipit-source-id: 6d40e855f3c96b9a4b637a0e677388a7b92f7967
2021-11-11 06:25:24 -08:00