Summary:
Before this PR, the `dtype` attribute of observers was not clearly
defined. It originally meant `interface_dtype` in the eager mode
workflow, which is how the codebase used it before this PR.
In the new reference model spec, the `dtype` attribute of an observer
represents the `dtype` value that needs to be passed into the `quantize`
function of the reference model spec. This PR aligns the codebase
with that definition of `dtype`. In detail:
1. change util functions to interpret `dtype` using the reference model definition
2. change `prepare` to interpret `dtype` using the reference model definition
3. change observers for dynamic quantization to interpret `dtype` using the reference
model definition.
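As a minimal sketch of the reference-model interpretation (the observer choice and values below are illustrative only, not part of this PR): the observer's `dtype` is the dtype handed to the quantize op in the `dequantize(quantize(x, scale, zero_point, dtype))` reference pattern, rather than the eager-mode interface dtype.
```
import torch

# Illustrative sketch: the observer's dtype is what the reference-model
# quantize op receives, not the eager-mode interface dtype.
obs = torch.ao.quantization.MinMaxObserver(dtype=torch.quint8)
obs(torch.randn(4, 4))
scale, zero_point = obs.calculate_qparams()

x = torch.randn(4, 4)
xq = torch.quantize_per_tensor(x, float(scale), int(zero_point), obs.dtype)
x_ref = xq.dequantize()  # reference pattern: quantize -> dequantize
```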
A future PR (left out of this one to keep LOC small) will deprecate the
`compute_dtype` field and instead expose `is_dynamic` on observers.
Test plan:
```
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
```
Differential Revision: [D39675209](https://our.internmc.facebook.com/intern/diff/D39675209)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85345
Approved by: https://github.com/z-a-f, https://github.com/jerryzh168
Summary:
Adds `extra_repr` to `HistogramObserver`. This is useful when debugging
PTQ models because it lets you quickly check whether a `HistogramObserver`
has received data or not.
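A minimal sketch of what such an `extra_repr` could look like (illustrative, not necessarily the exact implementation in this PR):
```
# Sketch only; the actual method may format or select values differently.
def extra_repr(self):
    return f"min_val={self.min_val}, max_val={self.max_val}"
```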
Test plan:
```
>>> import torch
>>> obs = torch.ao.quantization.HistogramObserver()
>>> obs(torch.randn(1, 3, 224, 224))
...
>>> print(obs)
// before - hard to tell if observer has seen data
HistogramObserver()
// after
HistogramObserver(min_val=-4.778339862823486, max_val=4.311892986297607)
>>>
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84760
Approved by: https://github.com/andrewor14
Summary:
After inserting quant-dequant nodes in the graph, we need to:
1. Insert packed param creation and the quantized op.
2. Create a packed_params attribute in the top module. For this we need the
graph to be inlined except for calculate_qparams method calls. But those
can be inlined too, so perhaps we need to make sure no other CallMethod
nodes exist.
3. Insert SetAttr for the packed param.
4. Insert GetAttr for the packed param.
5. Use the GetAttr output for the quantized op where applicable, e.g.
linear_dynamic.
The above is added to the quantize_<method-name> method created in the previous
step. Once the above steps are done, the method is cloned into
quantized_<method-name>.
Modify quantize_<method-name>:
1. Remove all outputs from the method.
2. Run DCE.
3. Remove all inputs from the method except self.
Modify quantized_<method-name>:
1. Remove all packed_param SetAttr nodes.
2. Run DCE.
This should result in the removal of all nodes that generate the packed param.
Test Plan: To be written
Differential Revision: [D38771416](https://our.internmc.facebook.com/intern/diff/D38771416)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83571
Approved by: https://github.com/jerryzh168
Motivation: each quantization observer only supports a limited set of qschemes, so we need to perform this check at initialization rather than at run time. For example, if a MinMaxObserver is created with the qscheme set to **torch.per_channel_affine**, there is currently a runtime error during the calibration step:
```
AttributeError: 'MinMaxObserver' object has no attribute 'ch_axis'
```
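For illustration, a sketch of the intended behavior after adding the check (the exact exception type raised at construction is an assumption):
```
import torch
from torch.ao.quantization.observer import MinMaxObserver

# MinMaxObserver is per-tensor only; a per-channel qscheme should be rejected
# at construction instead of surfacing later as the AttributeError above.
try:
    obs = MinMaxObserver(qscheme=torch.per_channel_affine)
    obs(torch.randn(2, 3))  # without the early check, this is where it failed
except (NotImplementedError, AttributeError) as e:
    print(f"unsupported qscheme rejected: {e}")
```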
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80126
Approved by: https://github.com/jerryzh168
This is a new version of #15648 based on the latest master branch.
Unlike the previous PR where I fixed a lot of the doctests in addition to integrating xdoctest, I'm going to reduce the scope here. I'm simply going to integrate xdoctest, and then I'm going to mark all of the failing tests as "SKIP". This will let xdoctest run on the dashboards, provide some value, and still let the dashboards pass. I'll leave fixing the doctests themselves to another PR.
In my initial commit, I do the bare minimum to get something running with failing dashboards. The few tests that I marked as skip are causing segfaults. Running xdoctest results in 293 failed, 201 passed tests. The next commits will be to disable those tests. (unfortunately I don't have a tool that will insert the `#xdoctest: +SKIP` directive over every failing test, so I'm going to do this mostly manually.)
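For reference, a sketch of the directive in a docstring (the function and expected output are made up; only the `# xdoctest: +SKIP` line is the point):
```
import torch

def add_one(x):
    """
    Example:
        >>> # xdoctest: +SKIP
        >>> add_one(torch.tensor([1.0]))
        tensor([2.])
    """
    return x + 1
```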
Fixes https://github.com/pytorch/pytorch/issues/71105
@ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82797
Approved by: https://github.com/ezyang
Summary: This commit adds qconfigs with special observers for fixed
qparams ops in get_default_qconfig_mapping and
get_default_qat_qconfig_mapping. For correctness, we also require
users to use these special observers if we detect these fixed
qparams ops in prepare.
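A rough usage sketch (the model, backend string, and example input here are assumptions for illustration): ops like sigmoid have a known output range, so the default mapping assigns them fixed-qparams observers, and prepare checks that a compatible qconfig is used for them.
```
import torch
from torch.ao.quantization import get_default_qconfig_mapping
from torch.ao.quantization.quantize_fx import prepare_fx

# Sigmoid has a fixed output range, so the default mapping gives it a
# fixed-qparams qconfig; prepare verifies that such ops use it.
model = torch.nn.Sequential(torch.nn.Linear(4, 4), torch.nn.Sigmoid()).eval()
qconfig_mapping = get_default_qconfig_mapping("fbgemm")
example_inputs = (torch.randn(1, 4),)
prepared = prepare_fx(model, qconfig_mapping, example_inputs)
```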
Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
Reviewers: jerryzh168, vkuzo
Subscribers: jerryzh168, vkuzo
Differential Revision: [D37396379](https://our.internmc.facebook.com/intern/diff/D37396379)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80184
Approved by: https://github.com/jerryzh168
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76637
The previous names `default_affine_fixed_qparams_observer`
and `default_symmetric_fixed_qparams_observer` were uninformative, and users had to read
the definition in order to understand what these observers are. The new
naming convention reveals information about the range of the observers.
Analogous changes were also made for
`default_symmetric_fixed_qparams_fake_quant` and
`default_affine_fixed_qparams_fake_quant`.
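For reference, a quick check of the renamed observers, assuming the new range-based names are `default_fixed_qparams_range_0to1_observer` and `default_fixed_qparams_range_neg1to1_observer` (my reading of the renaming; see the diff for the exact identifiers):
```
from torch.ao.quantization.observer import (
    default_fixed_qparams_range_0to1_observer,     # was default_affine_fixed_qparams_observer
    default_fixed_qparams_range_neg1to1_observer,  # was default_symmetric_fixed_qparams_observer
)

obs = default_fixed_qparams_range_0to1_observer()
print(obs.calculate_qparams())  # fixed scale/zero_point covering [0, 1]
```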
Test Plan:
```
python test/test_quantization.py
```
Differential Revision: D36054169
Reviewed By: vkuzo
Pulled By: dzdang
fbshipit-source-id: 215f7786a4b7abda7327f17cc61735697ec5cca9
(cherry picked from commit 21a4e6eda4467c8adca7fd534a506a14e975f9cf)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76461
Renaming, as the old name was confusing. The new name better represents
what this class does.
Test Plan: CI
Reviewed By: jerryzh168
Differential Revision: D35976350
Pulled By: vkuzo
fbshipit-source-id: 6da6c1767cec729c3959b13ae9dd939d0b2f622c
(cherry picked from commit 065608ef42c599525bfad4603af74c5bdf0881c3)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76460
`RecordingObserver` inherits from `_ObserverBase` but does not use any functionality
from it. Making it inherit from `ObserverBase` instead.
This will make it simpler to rename `_ObserverBase` to something more meaningful in the next PR.
Test Plan: CI
Reviewed By: jerryzh168
Differential Revision: D35976351
Pulled By: vkuzo
fbshipit-source-id: 19c106bf0d48607c231702e2e048f42a7f48a5c6
(cherry picked from commit 4fd44123b0e9bcdcae546aecabe80d7642129cf5)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74507
* These are the default symmetric QAT qconfigs for qnnpack.
* Support for symmetric quantization is not available from other backends.
* The observers are similar to those in the symmetric PTQ qconfigs for qnnpack.
Reviewed By: jerryzh168
Differential Revision: D34804808
fbshipit-source-id: 22c11b89242a98f54029ac195f7b984e42809164
(cherry picked from commit ea751ded1174ba2c2f061bafc81573faaf248a9a)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74396
# New qconfig `default_symmetric_qnnpack_qconfig`
Returns a qconfig with signed activations and symmetric weights with range restrictions. Also adds a per_channel variant of the same.
## Restrictions on weights
Restrictions on weights include:
1. the weight zero point is forced to zero, and
2. weight 8-bit signed quantized values are limited to [-127, +127], excluding the value -128.
This is driven, in part, by the desire to achieve better performance from XNNPACK ops.
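As a sketch of a weight observer satisfying these restrictions (the observer class and arguments are assumptions for illustration, not necessarily what the qconfig uses):
```
import torch
from torch.ao.quantization.observer import MinMaxObserver

# Signed 8-bit, symmetric (zero point forced to 0), quantized values limited
# to [-127, 127] so -128 is never produced; eps raised per the section below.
restricted_weight_observer = MinMaxObserver.with_args(
    dtype=torch.qint8,
    qscheme=torch.per_tensor_symmetric,
    quant_min=-127,
    quant_max=127,
    eps=2**-12,
)
```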
## qengine/backend = `qnnpack` and XNNPACK ops
The qconfig returned by this function allows us to use the faster XNNPACK quantized ops on CPUs, with the restrictions above. Although we are using XNNPACK ops, the qengine is still `qnnpack`, and there are no plans to introduce a new qengine for XNNPACK ops. Support for using XNNPACK ops with the asymmetric qconfig (returned by get_default_qconfig()) is WIP.
## Updated EPS value:
* From PyTorch, eps:
```
>>> import torch
>>> torch.finfo(torch.float32).eps
1.1920928955078125e-07
>>> torch.finfo(torch.float32).eps.hex()
'0x1.0000000000000p-23'
```
All scale values are float32 and `scale = max(scale, eps)`
* Requirement from XNNPACK:
For both the fp32 and rndnu requantization schemes, `0x1p-32 <= requantization_scale < 256.0`,
where requantization_scale = (input_scale * kernel_scale) / output_scale.
* New minimum allowed scale value
With the current float32 eps (=0x1p-23) as the minimum, the XNNPACK lower bound is the problem. We have not observed upper-bound issues so far when assuming a max scale value of 256. So, focusing on the lower bound, to conservatively cover all possible requantization values, we must choose the minimum possible scale value such that:
```
minimum_requantization_value = xnnpack_lower_threshold
input_scale * kernel_scale / output_scale = 0x1p-32
min_scale_value * min_scale_value / max_scale_value = 0x1p-32
min_scale_value * new_eps / 256 = 0x1p-32
min_scale_value**2 = 0x1p-24
min_scale_value = 0x1p-12
```
With `scale_value >= 0x1p-12`, we should be able to stay above the lower threshold on the requantization scale imposed by XNNPACK kernels.
Obviously this worst case is very unlikely to happen, so in practice we should be able to get away with a much smaller EPS value than `0x1p-12`, but it is not easy to choose a smaller value empirically.
* Impact on accuracy is unclear as of writing this.
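A quick numeric check of the derivation above (sketch; the 0x1p-32 threshold and 256 upper bound come from the XNNPACK requirement quoted earlier):
```
# Python has no hex float literals, so use float.fromhex for the 0x1p-* values.
xnnpack_lower_threshold = float.fromhex("0x1p-32")
max_scale_value = 256.0

min_scale_value = (xnnpack_lower_threshold * max_scale_value) ** 0.5
assert min_scale_value == float.fromhex("0x1p-12")

# Worst-case requantization scale with this eps still clears the threshold.
worst_case = min_scale_value * min_scale_value / max_scale_value
assert worst_case >= xnnpack_lower_threshold
```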
Reviewed By: kimishpatel
Differential Revision: D34625300
fbshipit-source-id: 005e6757ed1185b3940b58ac55246cba8b267828
(cherry picked from commit 61ed1a2a308a1792ccbfc316153a6dc39798f02a)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73947
The original implementation of memoryless observers used MinMaxObservers and
a memoryless argument to manipulate the behavior of the observer so that it would not
keep track of previously observed mins and maxes. It was later pointed
out that this is equivalent to a MovingAverage observer with averaging_constant=1,
which requires less overhead and no one-off args (memoryless), so this PR removes
the memoryless arg and uses MovingAverage observers instead. Although the memoryless
adjective is still used, a complete definition was also added to clarify error
messages given these changes.
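A small sketch of the equivalence described above (observer class chosen for illustration):
```
import torch
from torch.ao.quantization.observer import MovingAverageMinMaxObserver

# With averaging_constant=1 each forward pass overwrites min_val/max_val,
# so no history of earlier batches is kept ("memoryless").
obs = MovingAverageMinMaxObserver(averaging_constant=1)
obs(torch.randn(8) * 10)   # wide-range batch
obs(torch.randn(8) * 0.1)  # narrow-range batch; stats now reflect only this one
print(obs.min_val, obs.max_val)
```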
Test Plan:
python test/test_quantization.py TestQuantizeEagerQAT
python test/test_quantization.py TestObserver
Test Plan: Imported from OSS
Reviewed By: andrewor14
Differential Revision: D34732080
Pulled By: HDCharles
fbshipit-source-id: 227a1ab29d18adae55093a684ea35ac34523d07a
(cherry picked from commit 5238e70e8f90f3219c36f9c64b647951dcf64b5a)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71027
Fixes issue #61054: removes the warning triggered by
reduce_range=True, which produced the message "UserWarning: Please use quant_min and quant_max to specify the range for observers".
Test Plan:
python test/test_quantization.py TestFakeQuantizeOps
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D33484341
fbshipit-source-id: 97c3d4658926183f88a0c4665451dd7f913d30e6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70107
Histogram observer used floor division on tensors, which is a deprecated
behavior. There was a warning printed:
```
/Users/vasiliy/pytorch/torch/ao/quantization/observer.py:905: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
```
This PR fixes the warning.
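The fix follows the replacement pattern suggested by the warning (sketch of the pattern, not the exact diff):
```
import torch

# Explicit rounding mode avoids the deprecated tensor __floordiv__ behavior.
a, b = torch.tensor([-3.0, 7.0]), torch.tensor(2.0)
result = torch.div(a, b, rounding_mode="floor")  # tensor([-2., 3.])
```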
Test Plan:
```
python test/test_quantization.py TestHistogramObserver
```
Reviewed By: ejguan
Differential Revision: D33187926
Pulled By: vkuzo
fbshipit-source-id: 9c37de4c6d6193bee9047b6a28ff37ee1b019753
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69249
This PR adds default_replay_qconfig and default_replay_observer, which are used
when we want to configure an operator to reuse the observer from its input: if the input
Tensor of the operator is not observed, we will not observe the output of this operator either;
if the input Tensor is observed, we will observe the output of the operator with the same observer.
e.g.
```
x1 = x0.reshape()
```
if reshape is configured with default_replay_qconfig:
1. if x0 is observed with observer_0, we'll observe x1 with the same observer instance
2. if x0 is not observed, we won't observe x1 either
Test Plan:
```
python test/test_quantization.py TestQuantizeFx.test_replay_qconfig
```
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D32774723
fbshipit-source-id: 26862b2bc181d0433e2243daeb3b8f7ec3dd33b2
Summary:
**Summary**: FixedQParams operators do not need fake quantization
in the prepare step. This commit introduces FixedQParamsObserver
and makes FixedQParamsFakeQuantize a simple wrapper around this
observer. It also removes the fake quantize logic in forward.
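A minimal sketch of using such an observer (the scale/zero_point values are made up):
```
import torch
from torch.ao.quantization.observer import FixedQParamsObserver

# The qparams are fixed up front, so calculate_qparams ignores observed data.
obs = FixedQParamsObserver(scale=1.0 / 256.0, zero_point=0, dtype=torch.quint8)
obs(torch.randn(4))
print(obs.calculate_qparams())  # returns the configured (scale, zero_point)
```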
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68143
Test Plan:
Added two tests:
python3 test/test_quantization.py TestQuantizeFx.test_fixed_qparams_patterns
python3 test/test_quantization.py TestQuantizeFx.test_register_patterns
**Reviewers**: Jerry Zhang
**Subscribers**: Jerry Zhang, Supriya Rao
**Tasks**: T104942885
**Tags**: pytorch
Reviewed By: albanD
Differential Revision: D32484427
Pulled By: andrewor14
fbshipit-source-id: 5a048b90eb4da79074c5ceffa3c8153f8d8cd662
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66379
Description:
Creates a quantization API reference and fixes all the docblock errors.
This is #66122 to #66210 squashed together
Test Plan:
```
cd docs
make html
python -m http.server
// open webpage, inspect it, looks good
```
Reviewed By: ejguan
Differential Revision: D31543172
Pulled By: vkuzo
fbshipit-source-id: 9131363d6528337e9f100759654d3f34f02142a9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66125
Before this PR, the documentation for observers and fake_quants was inlined in the
Eager mode quantization page. This was hard to discover, especially
since that page is really long, and we now have FX graph mode quantization reusing
all of this code.
This PR moves observers and fake_quants into their own documentation pages. It also
adds docstrings to all user facing module attributes such as the default observers
and fake_quants, so people can discover them from documentation without having
to inspect the source code.
For now, this enables autoformatting (which means all public classes, functions, and members
with docstrings will get docs). If we need to exclude something in these files from
the docs in the future, we can go back to manual docs.
Test Plan:
```
cd docs
make html
python -m http.server
// inspect docs on localhost, renders correctly
```
Reviewed By: dagitses
Differential Revision: D31447613
Pulled By: vkuzo
fbshipit-source-id: 63b4cf518badfb29ede583a5c2ca823f572c8599
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65674
Before this PR, users had to use the eager mode static quantization APIs to quantize Embedding/EmbeddingBag modules.
With this PR they can use either the static or dynamic quantization APIs for Embedding quantization.
The only qconfig supported for embedding quantization is float_qparams_weight_only_qconfig, which is currently enforced in the from_float
method of the quantized Embedding/EmbeddingBag modules.
To combine embedding quantization with Linear dynamic quantization, users can use the qconfig_dict to specify a different qconfig for each module type.
The prepare/convert APIs can still be used to quantize Embeddings, with the caveat that users need to ensure the inputs to Embedding ops are FP32.
Addresses Issue #65185
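A rough sketch of the dynamic path described above (the dict form of qconfig_spec and the module choices are assumptions for illustration):
```
import torch
from torch.ao.quantization import (
    default_dynamic_qconfig,
    float_qparams_weight_only_qconfig,
    quantize_dynamic,
)

# Embedding weights get the float-qparams weight-only qconfig; the Linear
# is dynamically quantized with its usual qconfig.
model = torch.nn.Sequential(torch.nn.Embedding(10, 12), torch.nn.Linear(12, 4))
quantized = quantize_dynamic(
    model,
    qconfig_spec={
        torch.nn.Embedding: float_qparams_weight_only_qconfig,
        torch.nn.Linear: default_dynamic_qconfig,
    },
)
```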
ghstack-source-id: 139935419
Test Plan:
python test/test_quantization.py
Imported from OSS
Reviewed By: gchanan
Differential Revision: D31211199
fbshipit-source-id: 8c747881caee5ccbf8b93c6704b08d132049dea4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66058
After the initial migration from `torch.quantization` to `torch.ao.quantization`, some of the files did not change.
This happened because the migration was done in parallel, and some of the files were landed while the others were still in the original location.
This is the last fix in the AO migration phase 1, which completely enables the ao.quantization namespace.
Test Plan: `python test/test_quantization.py`
Reviewed By: vkuzo
Differential Revision: D31366066
Pulled By: z-a-f
fbshipit-source-id: bf4a74885be89d098df2d87e685795a2a64026c5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65699
related to: https://github.com/pytorch/pytorch/pull/65443#discussion_r715132425
The QAT and PAT (pruning aware training) support for embedding bags needs a memoryless observer to work properly. This is necessitated by the changing pruned/non-pruned weights during training, which can significantly change the quantization parameters.
This PR adds a memoryless flag to the simpler observer classes (not the moving average ones, since those explicitly have memory).
In addition to the above, I altered the reset_min_max_vals
function of MinMaxObserver so that it preserves the device of the
existing self.min_val and self.max_val, which was previously not
preserved relative to how they are initialized (using factory_kwargs).
Test Plan:
python test/test_quantization.py TestObserver
(added test_memoryless_minmaxobserver, test_memoryless_per_channel_minmaxobserver, test_memoryless_histogramobserver)
Imported from OSS
Reviewed By: supriyar
Differential Revision: D31209773
fbshipit-source-id: 44a63298e44880fbd3576f49ac568e781f3fd79a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64919
AO Team is migrating the existing torch.quantization into torch.ao.quantization. We are doing it one file at a time to make sure that the internal callsites are updated properly. This migrates the quantization utilities.
ghstack-source-id: 138303325
Test Plan: `buck test mode/dev //caffe2/test:quantization`
Reviewed By: jerryzh168
Differential Revision: D30899082
fbshipit-source-id: 85eb38c419e417147e71758b682cd095308dd0c9