Summary:
Before this PR, the `dtype` attribute of observers was not clearly
defined. It originally meant `interface_dtype` in the eager mode
workflow, which is how the codebase used it before this PR.
In the new reference model spec, the `dtype` attribute of an observer
represents the `dtype` value that needs to be passed into the `quantize`
function of the reference model spec. This PR aligns the codebase
with that definition of `dtype`. In detail:
1. change util functions to interpret `dtype` using the reference model definition
2. change `prepare` to interpret `dtype` using the reference model definition
3. change observers for dynamic quantization to interpret `dtype` using the reference
model definition.
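As a minimal sketch of the reference-model interpretation (the observer choice and values below are illustrative only, not part of this PR): the observer's `dtype` is the dtype handed to the quantize op in the `dequantize(quantize(x, scale, zero_point, dtype))` reference pattern, rather than the eager-mode interface dtype.
```
import torch

# Illustrative sketch: the observer's dtype is what the reference-model
# quantize op receives, not the eager-mode interface dtype.
obs = torch.ao.quantization.MinMaxObserver(dtype=torch.quint8)
obs(torch.randn(4, 4))
scale, zero_point = obs.calculate_qparams()

x = torch.randn(4, 4)
xq = torch.quantize_per_tensor(x, float(scale), int(zero_point), obs.dtype)
x_ref = xq.dequantize()  # reference pattern: quantize -> dequantize
```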
A future PR (left out of this one to keep LOC small) will deprecate the
`compute_dtype` field and instead expose `is_dynamic` on observers.
Test plan:
```
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
```
Differential Revision: [D39675209](https://our.internmc.facebook.com/intern/diff/D39675209)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85345
Approved by: https://github.com/z-a-f, https://github.com/jerryzh168
Summary:
Adds `extra_repr` to `HistogramObserver`. This is useful when debugging
PTQ models because it lets you quickly check whether a `HistogramObserver`
has received data or not.
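A minimal sketch of what such an `extra_repr` could look like (illustrative, not necessarily the exact implementation in this PR):
```
# Sketch only; the actual method may format or select values differently.
def extra_repr(self):
    return f"min_val={self.min_val}, max_val={self.max_val}"
```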
Test plan:
```
>>> import torch
>>> obs = torch.ao.quantization.HistogramObserver()
>>> obs(torch.randn(1, 3, 224, 224))
...
>>> print(obs)
// before - hard to tell if observer has seen data
HistogramObserver()
// after
HistogramObserver(min_val=-4.778339862823486, max_val=4.311892986297607)
>>>
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84760
Approved by: https://github.com/andrewor14
Summary:
After inserting quant-dequant nodes in the graph, we need to:
1. Insert packed param creation and the quantized op.
2. Create a packed_params attribute in the top module. For this we need the
graph to be inlined except for calculate_qparams method calls. But those
can be inlined too, so perhaps we need to make sure no other CallMethod
nodes exist.
3. Insert SetAttr for the packed param.
4. Insert GetAttr for the packed param.
5. Use the GetAttr output for the quantized op where applicable, e.g.
linear_dynamic.
The above is added to the quantize_<method-name> method created in the previous
step. Once the above steps are done, the method is cloned into
quantized_<method-name>.
Modify quantize_<method-name>:
1. Remove all outputs from the method.
2. Run DCE.
3. Remove all inputs from the method except self.
Modify quantized_<method-name>:
1. Remove all packed_param SetAttr nodes.
2. Run DCE.
This should result in the removal of all nodes that generate the packed param.
Test Plan: To be written
Differential Revision: [D38771416](https://our.internmc.facebook.com/intern/diff/D38771416)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83571
Approved by: https://github.com/jerryzh168
Motivation: each quantization observer only supports a limited set of qschemes, so we need to perform this check at initialization rather than at run time. For example, if a MinMaxObserver is created with the qscheme set to **torch.per_channel_affine**, there is currently a runtime error during the calibration step:
```
AttributeError: 'MinMaxObserver' object has no attribute 'ch_axis'
```
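For illustration, a sketch of the intended behavior after adding the check (the exact exception type raised at construction is an assumption):
```
import torch
from torch.ao.quantization.observer import MinMaxObserver

# MinMaxObserver is per-tensor only; a per-channel qscheme should be rejected
# at construction instead of surfacing later as the AttributeError above.
try:
    obs = MinMaxObserver(qscheme=torch.per_channel_affine)
    obs(torch.randn(2, 3))  # without the early check, this is where it failed
except (NotImplementedError, AttributeError) as e:
    print(f"unsupported qscheme rejected: {e}")
```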
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80126
Approved by: https://github.com/jerryzh168
This is a new version of #15648 based on the latest master branch.
Unlike the previous PR where I fixed a lot of the doctests in addition to integrating xdoctest, I'm going to reduce the scope here. I'm simply going to integrate xdoctest, and then I'm going to mark all of the failing tests as "SKIP". This will let xdoctest run on the dashboards, provide some value, and still let the dashboards pass. I'll leave fixing the doctests themselves to another PR.
In my initial commit, I do the bare minimum to get something running with failing dashboards. The few tests that I marked as skip are causing segfaults. Running xdoctest results in 293 failed, 201 passed tests. The next commits will be to disable those tests. (unfortunately I don't have a tool that will insert the `#xdoctest: +SKIP` directive over every failing test, so I'm going to do this mostly manually.)
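For reference, a sketch of the directive in a docstring (the function and expected output are made up; only the `# xdoctest: +SKIP` line is the point):
```
import torch

def add_one(x):
    """
    Example:
        >>> # xdoctest: +SKIP
        >>> add_one(torch.tensor([1.0]))
        tensor([2.])
    """
    return x + 1
```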
Fixes https://github.com/pytorch/pytorch/issues/71105
@ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82797
Approved by: https://github.com/ezyang
Summary: This commit adds qconfigs with special observers for fixed
qparams ops in get_default_qconfig_mapping and
get_default_qat_qconfig_mapping. For correctness, we also require
users to use these special observers if we detect these fixed
qparams ops in prepare.
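A rough usage sketch (the model, backend string, and example input here are assumptions for illustration): ops like sigmoid have a known output range, so the default mapping assigns them fixed-qparams observers, and prepare checks that a compatible qconfig is used for them.
```
import torch
from torch.ao.quantization import get_default_qconfig_mapping
from torch.ao.quantization.quantize_fx import prepare_fx

# Sigmoid has a fixed output range, so the default mapping gives it a
# fixed-qparams qconfig; prepare verifies that such ops use it.
model = torch.nn.Sequential(torch.nn.Linear(4, 4), torch.nn.Sigmoid()).eval()
qconfig_mapping = get_default_qconfig_mapping("fbgemm")
example_inputs = (torch.randn(1, 4),)
prepared = prepare_fx(model, qconfig_mapping, example_inputs)
```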
Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
Reviewers: jerryzh168, vkuzo
Subscribers: jerryzh168, vkuzo
Differential Revision: [D37396379](https://our.internmc.facebook.com/intern/diff/D37396379)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80184
Approved by: https://github.com/jerryzh168
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76637
The previous names `default_affine_fixed_qparams_observer`
and `default_symmetric_fixed_qparams_observer` were uninformative, and users had to read
the definition in order to understand what these observers are. The new
naming convention reveals information about the range of the observers.
Analogous changes were also made for
`default_symmetric_fixed_qparams_fake_quant` and
`default_affine_fixed_qparams_fake_quant`.
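For reference, a quick check of the renamed observers, assuming the new range-based names are `default_fixed_qparams_range_0to1_observer` and `default_fixed_qparams_range_neg1to1_observer` (my reading of the renaming; see the diff for the exact identifiers):
```
from torch.ao.quantization.observer import (
    default_fixed_qparams_range_0to1_observer,     # was default_affine_fixed_qparams_observer
    default_fixed_qparams_range_neg1to1_observer,  # was default_symmetric_fixed_qparams_observer
)

obs = default_fixed_qparams_range_0to1_observer()
print(obs.calculate_qparams())  # fixed scale/zero_point covering [0, 1]
```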
Test Plan:
```
python test/test_quantization.py
```
Differential Revision: D36054169
Reviewed By: vkuzo
Pulled By: dzdang
fbshipit-source-id: 215f7786a4b7abda7327f17cc61735697ec5cca9
(cherry picked from commit 21a4e6eda4467c8adca7fd534a506a14e975f9cf)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76461
Renaming, as the old name was confusing. The new name better represents
what this class does.
Test Plan: CI
Reviewed By: jerryzh168
Differential Revision: D35976350
Pulled By: vkuzo
fbshipit-source-id: 6da6c1767cec729c3959b13ae9dd939d0b2f622c
(cherry picked from commit 065608ef42c599525bfad4603af74c5bdf0881c3)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76460
`RecordingObserver` inherits from `_ObserverBase` but does not use any functionality
from it. Making it inherit from `ObserverBase` instead.
This will make it simpler to rename `_ObserverBase` to something more meaningful in the next PR.
Test Plan: CI
Reviewed By: jerryzh168
Differential Revision: D35976351
Pulled By: vkuzo
fbshipit-source-id: 19c106bf0d48607c231702e2e048f42a7f48a5c6
(cherry picked from commit 4fd44123b0e9bcdcae546aecabe80d7642129cf5)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74507
* These are the default symmetric QAT qconfigs for qnnpack.
* Support for symmetric quantization is not available from other backends.
* The observers are similar to those in the symmetric PTQ qconfigs for qnnpack.
Reviewed By: jerryzh168
Differential Revision: D34804808
fbshipit-source-id: 22c11b89242a98f54029ac195f7b984e42809164
(cherry picked from commit ea751ded1174ba2c2f061bafc81573faaf248a9a)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74396
# New qconfig `default_symmetric_qnnpack_qconfig`
Returns a qconfig with signed activations and symmetric weights with range restrictions. Also adds a per_channel variant of the same.
## Restrictions on weights
Restrictions on weights include:
1. the weight zero point is forced to zero, and
2. weight 8-bit signed quantized values are limited to [-127, +127], excluding the value -128.
This is driven, in part, by the desire to achieve better performance from XNNPACK ops.
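As a sketch of a weight observer satisfying these restrictions (the observer class and arguments are assumptions for illustration, not necessarily what the qconfig uses):
```
import torch
from torch.ao.quantization.observer import MinMaxObserver

# Signed 8-bit, symmetric (zero point forced to 0), quantized values limited
# to [-127, 127] so -128 is never produced; eps raised per the section below.
restricted_weight_observer = MinMaxObserver.with_args(
    dtype=torch.qint8,
    qscheme=torch.per_tensor_symmetric,
    quant_min=-127,
    quant_max=127,
    eps=2**-12,
)
```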
## qengine/backend = `qnnpack` and XNNPACK ops
The qconfig returned by this function allows us to use the faster XNNPACK quantized ops on CPUs, with the restrictions above. Although we are using XNNPACK ops, the qengine is still `qnnpack`, and there are no plans to introduce a new qengine for XNNPACK ops. Support for using XNNPACK ops with the asymmetric qconfig (returned by get_default_qconfig()) is WIP.
## Updated EPS value:
* From PyTorch, eps:
```
>>> import torch
>>> torch.finfo(torch.float32).eps
1.1920928955078125e-07
>>> torch.finfo(torch.float32).eps.hex()
'0x1.0000000000000p-23'
```
All scale values are float32 and `scale = max(scale, eps)`
* Requirement from XNNPACK:
For both the fp32 and rndnu requantization schemes, `0x1p-32 <= requantization_scale < 256.0`,
where requantization_scale = (input_scale * kernel_scale) / output_scale.
* New minimum allowed scale value
With the current float32 eps (=0x1p-23) as the minimum, the XNNPACK lower bound is the problem. We have not observed upper-bound issues so far when assuming a max scale value of 256. So, focusing on the lower bound, to conservatively cover all possible requantization values, we must choose the minimum possible scale value such that:
```
minimum_requantization_value = xnnpack_lower_threshold
input_scale * kernel_scale / output_scale = 0x1p-32
min_scale_value * min_scale_value / max_scale_value = 0x1p-32
min_scale_value * new_eps / 256 = 0x1p-32
min_scale_value**2 = 0x1p-24
min_scale_value = 0x1p-12
```
With `scale_value >= 0x1p-12`, we should be able to stay above the lower threshold on the requantization scale imposed by XNNPACK kernels.
Obviously this worst case is very unlikely to happen, so in practice we should be able to get away with a much smaller EPS value than `0x1p-12`, but it is not easy to choose a smaller value empirically.
* Impact on accuracy is unclear as of writing this.
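A quick numeric check of the derivation above (sketch; the 0x1p-32 threshold and 256 upper bound come from the XNNPACK requirement quoted earlier):
```
# Python has no hex float literals, so use float.fromhex for the 0x1p-* values.
xnnpack_lower_threshold = float.fromhex("0x1p-32")
max_scale_value = 256.0

min_scale_value = (xnnpack_lower_threshold * max_scale_value) ** 0.5
assert min_scale_value == float.fromhex("0x1p-12")

# Worst-case requantization scale with this eps still clears the threshold.
worst_case = min_scale_value * min_scale_value / max_scale_value
assert worst_case >= xnnpack_lower_threshold
```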
Reviewed By: kimishpatel
Differential Revision: D34625300
fbshipit-source-id: 005e6757ed1185b3940b58ac55246cba8b267828
(cherry picked from commit 61ed1a2a308a1792ccbfc316153a6dc39798f02a)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73947
The original implementation of memoryless observers used MinMaxObservers and
a memoryless argument to manipulate the behavior of the observer so that it would not
keep track of previously observed mins and maxes. It was later pointed
out that this is equivalent to a MovingAverage observer with averaging_constant=1,
which requires less overhead and no one-off args (memoryless), so this PR removes
the memoryless arg and uses MovingAverage observers instead. Although the memoryless
adjective is still used, a complete definition was also added to clarify error
messages given these changes.
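A small sketch of the equivalence described above (observer class chosen for illustration):
```
import torch
from torch.ao.quantization.observer import MovingAverageMinMaxObserver

# With averaging_constant=1 each forward pass overwrites min_val/max_val,
# so no history of earlier batches is kept ("memoryless").
obs = MovingAverageMinMaxObserver(averaging_constant=1)
obs(torch.randn(8) * 10)   # wide-range batch
obs(torch.randn(8) * 0.1)  # narrow-range batch; stats now reflect only this one
print(obs.min_val, obs.max_val)
```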
Test Plan:
python test/test_quantization.py TestQuantizeEagerQAT
python test/test_quantization.py TestObserver
Test Plan: Imported from OSS
Reviewed By: andrewor14
Differential Revision: D34732080
Pulled By: HDCharles
fbshipit-source-id: 227a1ab29d18adae55093a684ea35ac34523d07a
(cherry picked from commit 5238e70e8f90f3219c36f9c64b647951dcf64b5a)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71027
Fixes issue #61054: removes the warning triggered by
reduce_range=True, which produced the message "UserWarning: Please use quant_min and quant_max to specify the range for observers".
Test Plan:
python test/test_quantization.py TestFakeQuantizeOps
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D33484341
fbshipit-source-id: 97c3d4658926183f88a0c4665451dd7f913d30e6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70107
Histogram observer used floor division on tensors, which is a deprecated
behavior. There was a warning printed:
```
/Users/vasiliy/pytorch/torch/ao/quantization/observer.py:905: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
```
This PR fixes the warning.
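The fix follows the replacement pattern suggested by the warning (sketch of the pattern, not the exact diff):
```
import torch

# Explicit rounding mode avoids the deprecated tensor __floordiv__ behavior.
a, b = torch.tensor([-3.0, 7.0]), torch.tensor(2.0)
result = torch.div(a, b, rounding_mode="floor")  # tensor([-2., 3.])
```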
Test Plan:
```
python test/test_quantization.py TestHistogramObserver
```
Reviewed By: ejguan
Differential Revision: D33187926
Pulled By: vkuzo
fbshipit-source-id: 9c37de4c6d6193bee9047b6a28ff37ee1b019753
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69249
This PR adds default_replay_qconfig and default_replay_observer, which are used
when we want to configure an operator to reuse the observer from its input: if the input
Tensor of the operator is not observed, we will not observe the output of this operator either;
if the input Tensor is observed, we will observe the output of the operator with the same observer.
e.g.
```
x1 = x0.reshape()
```
if reshape is configured with default_replay_qconfig:
1. if x0 is observed with observer_0, we'll observe x1 with the same observer instance
2. if x0 is not observed, we won't observe x1 either
Test Plan:
```
python test/test_quantization.py TestQuantizeFx.test_replay_qconfig
```
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D32774723
fbshipit-source-id: 26862b2bc181d0433e2243daeb3b8f7ec3dd33b2
Summary:
**Summary**: FixedQParams operators do not need fake quantization
in the prepare step. This commit introduces FixedQParamsObserver
and makes FixedQParamsFakeQuantize a simple wrapper around this
observer. It also removes the fake quantize logic in forward.
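A minimal sketch of using such an observer (the scale/zero_point values are made up):
```
import torch
from torch.ao.quantization.observer import FixedQParamsObserver

# The qparams are fixed up front, so calculate_qparams ignores observed data.
obs = FixedQParamsObserver(scale=1.0 / 256.0, zero_point=0, dtype=torch.quint8)
obs(torch.randn(4))
print(obs.calculate_qparams())  # returns the configured (scale, zero_point)
```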
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68143
Test Plan:
Added two tests:
python3 test/test_quantization.py TestQuantizeFx.test_fixed_qparams_patterns
python3 test/test_quantization.py TestQuantizeFx.test_register_patterns
**Reviewers**: Jerry Zhang
**Subscribers**: Jerry Zhang, Supriya Rao
**Tasks**: T104942885
**Tags**: pytorch
Reviewed By: albanD
Differential Revision: D32484427
Pulled By: andrewor14
fbshipit-source-id: 5a048b90eb4da79074c5ceffa3c8153f8d8cd662
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66379
Description:
Creates a quantization API reference and fixes all the docblock errors.
This is #66122 to #66210 squashed together
Test Plan:
```
cd docs
make html
python -m http.server
// open webpage, inspect it, looks good
```
Reviewed By: ejguan
Differential Revision: D31543172
Pulled By: vkuzo
fbshipit-source-id: 9131363d6528337e9f100759654d3f34f02142a9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66125
Before this PR, the documentation for observers and fake_quants was inlined in the
Eager mode quantization page. This was hard to discover, especially
since that page is really long, and we now have FX graph mode quantization reusing
all of this code.
This PR moves observers and fake_quants into their own documentation pages. It also
adds docstrings to all user facing module attributes such as the default observers
and fake_quants, so people can discover them from documentation without having
to inspect the source code.
For now, this enables autoformatting (which means all public classes, functions, and members
with docstrings will get docs). If we need to exclude something in these files from
the docs in the future, we can go back to manual docs.
Test Plan:
```
cd docs
make html
python -m http.server
// inspect docs on localhost, renders correctly
```
Reviewed By: dagitses
Differential Revision: D31447613
Pulled By: vkuzo
fbshipit-source-id: 63b4cf518badfb29ede583a5c2ca823f572c8599
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65674
Before this PR, users had to use the eager mode static quantization APIs to quantize Embedding/EmbeddingBag modules.
With this PR they can use either the static or dynamic quantization APIs for Embedding quantization.
The only qconfig supported for embedding quantization is float_qparams_weight_only_qconfig, which is currently enforced in the from_float
method of the quantized Embedding/EmbeddingBag modules.
To combine embedding quantization with Linear dynamic quantization, users can use the qconfig_dict to specify a different qconfig for each module type.
The prepare/convert APIs can still be used to quantize Embeddings, with the caveat that users need to ensure the inputs to Embedding ops are FP32.
Addresses Issue #65185
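A rough sketch of the dynamic path described above (the dict form of qconfig_spec and the module choices are assumptions for illustration):
```
import torch
from torch.ao.quantization import (
    default_dynamic_qconfig,
    float_qparams_weight_only_qconfig,
    quantize_dynamic,
)

# Embedding weights get the float-qparams weight-only qconfig; the Linear
# is dynamically quantized with its usual qconfig.
model = torch.nn.Sequential(torch.nn.Embedding(10, 12), torch.nn.Linear(12, 4))
quantized = quantize_dynamic(
    model,
    qconfig_spec={
        torch.nn.Embedding: float_qparams_weight_only_qconfig,
        torch.nn.Linear: default_dynamic_qconfig,
    },
)
```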
ghstack-source-id: 139935419
Test Plan:
python test/test_quantization.py
Imported from OSS
Reviewed By: gchanan
Differential Revision: D31211199
fbshipit-source-id: 8c747881caee5ccbf8b93c6704b08d132049dea4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66058
After the initial migration from `torch.quantization` to `torch.ao.quantization`, some of the files did not change.
This happened because the migration was done in parallel, and some of the files were landed while the others were still in the original location.
This is the last fix in the AO migration phase 1, which completely enables the ao.quantization namespace.
Test Plan: `python test/test_quantization.py`
Reviewed By: vkuzo
Differential Revision: D31366066
Pulled By: z-a-f
fbshipit-source-id: bf4a74885be89d098df2d87e685795a2a64026c5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65699
related to: https://github.com/pytorch/pytorch/pull/65443#discussion_r715132425
The QAT and PAT (pruning aware training) support for embedding bags needs a memoryless observer to work properly. This is necessitated by the changing pruned/non-pruned weights during training, which can significantly change the quantization parameters.
This PR adds a memoryless flag to the simpler observer classes (not the moving average ones, since those explicitly have memory).
In addition to the above, I altered the reset_min_max_vals
function of MinMaxObserver so that it preserves the device of the
existing self.min_val and self.max_val, which was previously not
preserved relative to how they are initialized (using factory_kwargs).
Test Plan:
python test/test_quantization.py TestObserver
(added test_memoryless_minmaxobserver, test_memoryless_per_channel_minmaxobserver, test_memoryless_histogramobserver)
Imported from OSS
Reviewed By: supriyar
Differential Revision: D31209773
fbshipit-source-id: 44a63298e44880fbd3576f49ac568e781f3fd79a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64919
AO Team is migrating the existing torch.quantization into torch.ao.quantization. We are doing it one file at a time to make sure that the internal callsites are updated properly. This migrates the quantization utilities.
ghstack-source-id: 138303325
Test Plan: `buck test mode/dev //caffe2/test:quantization`
Reviewed By: jerryzh168
Differential Revision: D30899082
fbshipit-source-id: 85eb38c419e417147e71758b682cd095308dd0c9