Summary: This PR does two things:
1) Previously this would simply error; now it will ignore any
torch.inf values that it receives. Note: the code checks for torch.inf after
aminmax, so that if no torch.inf values are found, perf is
relatively unchanged (see the sketch after this list).
2) As mentioned in https://github.com/pytorch/pytorch/issues/100051,
values close to (but not quite at) the maximum/minimum float value could
overflow to infinity in the course of _adjust_min_max(), when such a large
value is multiplied mid-calculation in an expression that would otherwise
produce a non-inf result. This was fixed by
rearranging the order of operations for the lines in question without
altering the actual equations. Specifically, where the operations on lines
1095, 1098, and 1100 multiply and divide large values,
it is better to divide the two large values before multiplying, rather
than multiplying the two large values together (creating overflow) before dividing, as the code had been doing (also illustrated in the sketch below).
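A minimal sketch of both ideas (illustrative, not the literal observer code; `_min_max_ignoring_inf` is a hypothetical helper name):
```
import torch

# (1) run aminmax first; pay for masking out inf only on the rare path
# where an inf was actually observed
def _min_max_ignoring_inf(x: torch.Tensor):
    min_val, max_val = torch.aminmax(x)
    if min_val.isinf() or max_val.isinf():
        finite = x[~x.isinf()]  # assumes at least one finite value
        min_val, max_val = torch.aminmax(finite)
    return min_val, max_val

# (2) divide before multiplying when the operands are large, so the
# intermediate product cannot overflow to inf
a = torch.tensor(3e38)  # near torch.finfo(torch.float32).max
b, c = torch.tensor(10.0), torch.tensor(100.0)
print(a * b / c)  # tensor(inf): a * b overflows first
print(a / c * b)  # tensor(3.0000e+37): same equation, reordered
```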
Test Plan: python test/test_quantization.py
TestObserver.test_histogram_observer_ignore_infinity
python test/test_quantization.py TestObserver.test_histogram_observer_handle_close_to_infinity
Reviewers:
Subscribers:
Tasks:
Tags:
Differential Revision: [D51489345](https://our.internmc.facebook.com/intern/diff/D51489345)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103467
Approved by: https://github.com/andrewor14
Summary:
The FX graph mode quantization workflow and the PT2E flow rely on the `is_dynamic` flag in the observer/QuantizationSpec to
convert an observer to the dynamic quantization pattern (choose_qparams -> q -> dq). This PR adds the is_dynamic flag
to all observers so that it is possible to convert these observers to the pattern.
However, this dynamic quantization pattern (choose_qparams -> q -> dq) is actually only valid for MovingAverageObserver(averaging_constant=1),
since that is what makes the computation before and after convert match in the context of QAT. So we add some sanity
checks in the other observers to make sure is_dynamic is False.
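A hypothetical sketch of such a sanity check (the class and message are illustrative, not the exact PyTorch code):
```
class MinMaxObserverSketch:
    def __init__(self, is_dynamic: bool = False, **kwargs):
        if is_dynamic:
            raise NotImplementedError(
                "MinMaxObserver does not support dynamic quantization; use "
                "a MovingAverage observer with averaging_constant=1 instead"
            )
        self.is_dynamic = is_dynamic
```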
Test Plan:
python test/test_quantization.py TestXNNPACKQuantizer.test_qat_dynamic_linear
Reviewers:
Subscribers:
Tasks:
Tags:
Differential Revision: [D51124725](https://our.internmc.facebook.com/intern/diff/D51124725)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113288
Approved by: https://github.com/kimishpatel
Summary:
Previously we could only use native PyTorch integer dtypes that have corresponding quantized dtypes (e.g. quint8, qint8). This
PR removes that assumption in observers/fake_quants so that users can use all PyTorch native dtypes (except int64; we can add it later if needed).
The main addition here is int16.
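A usage sketch under the new behavior (assuming the observer accepts a plain integer dtype together with explicit quant_min/quant_max):
```
import torch
from torch.ao.quantization.observer import MinMaxObserver

obs = MinMaxObserver(dtype=torch.int16, quant_min=-(2 ** 15), quant_max=2 ** 15 - 1)
obs(torch.randn(16))
scale, zero_point = obs.calculate_qparams()
```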
Test Plan:
python test/test_quantization.py TestQuantizePT2E
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108453
Approved by: https://github.com/kimishpatel
Summary: D41985889 removed the cast to int for the inputs to torch.histc below, allowing the inputs to still be tensors. These tensors still have requires_grad set to True, causing issues with the call to torch.histc.
Test Plan: buck2 test 'fbcode//mode/opt' fbcode//dper3/dper3/modules/low_level_modules/tests:stat_collector_test -- --exact 'dper3/dper3/modules/low_level_modules/tests:stat_collector_test - test_scripted_module (dper3.dper3.modules.low_level_modules.tests.stat_collector_test.StatCollectorTest_1)'
Differential Revision: D48800879
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108232
Approved by: https://github.com/jerryzh168
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99220
Previously there were two places where we needed to decide whether or not to insert an observer or fake quantizer:
(1) the input arguments of a node and (2) the output of a node, and we had separate code for each.
In this PR the logic is unified in the `_needs_obs_or_fq` helper function, which takes the target_dtype and is_dynamic from the previous output,
and the target_dtype and is_dynamic for the current Tensor we are looking at.
Let's use an example for a conv node:
```
conv = convolution(input, weight, bias, ...)
```
Say we have an `input_node` object for the argument `input`, and a `conv_node` for the `conv` node in the graph.
(1) Input arguments, e.g. `input`:
the target_dtype/is_dynamic from the previous output belongs to the node that produces `input`; we get this from
input_node.meta["target_dtype_info"]["output_act_obs_or_fq"].
The target_dtype/is_dynamic for the current argument `input` comes from conv_node.meta["target_dtype_info"]["input_act_obs_or_fq"];
similarly, for weight it comes from conv_node.meta["target_dtype_info"]["weight_obs_or_fq"], etc.
(2) Output of the conv node:
the target_dtype/is_dynamic from the previous output will be the floating point output of the fp32 convolution operator, so it
is hardcoded to (torch.float, False). Technically we should get this from node.meta["val"], but since the
current code base is shared by FX graph mode quantization and PyTorch 2.0 export quantization, we cannot do that; we can revisit
after we decide to deprecate FX graph mode quantization.
The target_dtype/is_dynamic for the current output comes from conv_node.meta["target_dtype_info"]["output_act_obs_or_fq"].
There is one caveat here about dynamic quantization, which is explained in a code comment, so I won't repeat it here.
Note: this PR also fixes some places in `_get_arg_target_dtype_as_input_to_node` and `_get_arg_target_is_dynamic_as_input_to_node` to make sure "not specified" == specifying a fp32 placeholder observer.
Next: we can merge the two functions that get target_dtype and is_dynamic to reduce code duplication.
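To make the unified logic concrete, here is a minimal sketch of the decision (an assumption-laden simplification, not the actual helper; in particular the dynamic branch glosses over the caveat mentioned above):
```
import torch

def _needs_obs_or_fq(prev_dtype, prev_is_dynamic, cur_dtype, cur_is_dynamic):
    if cur_is_dynamic:
        # dynamic quantization: a choose_qparams -> q -> dq chain is only
        # inserted on a floating point input
        return prev_dtype == torch.float
    # static case: observe/fake-quantize whenever the dtype needs to change
    return prev_dtype != cur_dtype
```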
Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
python test/test_quantization.py TestQuantizeFxModels
python test/test_quantization.py TestQuantizePT2E
python test/test_quantization.py TestQuantizePT2EModels
Imported from OSS
Differential Revision: D45198323
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99767
Approved by: https://github.com/kimishpatel
Summary:
Previously there were two places where we needed to decide whether or not to insert an observer or fake quantizer:
(1) the input arguments of a node and (2) the output of a node, and we had separate code for each.
In this PR the logic is unified in the `_needs_obs_or_fq` helper function, which takes the target_dtype and is_dynamic from the previous output,
and the target_dtype and is_dynamic for the current Tensor we are looking at.
Let's use an example for a conv node:
```
conv = convolution(input, weight, bias, ...)
```
Say we have an `input_node` object for the argument `input`, and a `conv_node` for the `conv` node in the graph.
(1) Input arguments, e.g. `input`:
the target_dtype/is_dynamic from the previous output belongs to the node that produces `input`; we get this from
input_node.meta["target_dtype_info"]["output_act_obs_or_fq"].
The target_dtype/is_dynamic for the current argument `input` comes from conv_node.meta["target_dtype_info"]["input_act_obs_or_fq"];
similarly, for weight it comes from conv_node.meta["target_dtype_info"]["weight_obs_or_fq"], etc.
(2) Output of the conv node:
the target_dtype/is_dynamic from the previous output will be the floating point output of the fp32 convolution operator, so it
is hardcoded to (torch.float, False). Technically we should get this from node.meta["val"], but since the
current code base is shared by FX graph mode quantization and PyTorch 2.0 export quantization, we cannot do that; we can revisit
after we decide to deprecate FX graph mode quantization.
The target_dtype/is_dynamic for the current output comes from conv_node.meta["target_dtype_info"]["output_act_obs_or_fq"].
There is one caveat here about dynamic quantization, which is explained in a code comment, so I won't repeat it here.
Note: this PR also fixes some places in `_get_arg_target_dtype_as_input_to_node` and `_get_arg_target_is_dynamic_as_input_to_node` to make sure "not specified" == specifying a fp32 placeholder observer.
Next: we can merge the two functions that get target_dtype and is_dynamic to reduce code duplication.
Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
python test/test_quantization.py TestQuantizeFxModels
python test/test_quantization.py TestQuantizePT2E
python test/test_quantization.py TestQuantizePT2EModels
Reviewers:
Subscribers:
Tasks:
Tags:
Differential Revision: [D45167585](https://our.internmc.facebook.com/intern/diff/D45167585)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99220
Approved by: https://github.com/kimishpatel
Summary:
Previously we assumed asymmetric quantization for dynamic quantization; this diff adds support for symmetric quantization
of the input in dynamic quantization.
Test Plan: buck run executorch/exir/tests:quant_lowering_custom_backend_pass -- "executorch.exir.tests.test_quant_lowering_custom_backend_pass.TestQuantLoweringCustomBackendPass.test_quantized_linear_dynamic"
Reviewed By: digantdesai
Differential Revision: D43134794
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94854
Approved by: https://github.com/digantdesai
This reverts commit 59071ab1e7.
It breaks `quantization.jit.test_ondevice_quantization.TestOnDeviceDynamicPTQFinalize`, which is not run in OSS, but is mandatory for internal CI.
Summary: A cast to int was added in
https://github.com/pytorch/pytorch/pull/45630 to silence a mypy complaint.
However this leads to unexpected behavior where the histogram doesn't
actually capture the full range of activation values.
Note 1: the test_histogram_observer_against_reference test was silently
broken on master. The random parameters it normally generates apparently don't cause a test failure, but running the test repeatedly in a loop would
eventually fail. This was because, in some cases,
sum(<tensor>) != torch.sum(<tensor>).item(). I was not able to reproduce
this with a toy example, but running this test in a loop and editing
either observer to print the calculation for 'total' would break the
test and show the differing behaviors (see the illustrative snippet below). Fixing this test was necessary to
land this PR, since changing the histogram bounds changed things enough
that this test would error.
Note 2: updating HistogramObserver breaks some BC tests unless the
model is regenerated using the HistogramObserver from this PR.
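For illustration (a minimal repro was elusive, as noted above), the discrepancy comes from differing accumulation orders: Python's builtin sum() adds tensor elements one at a time, while torch.sum() runs a fused reduction, so the two can disagree in the low-order float32 bits on some inputs:
```
import torch

t = torch.rand(100000)
a = sum(t).item()        # sequential, element-by-element accumulation
b = torch.sum(t).item()  # fused reduction kernel
print(a == b, a - b)     # may print False, with a tiny difference
```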
Test Plan: python test/test_quantization.py TestHistogramObserver.test_histogram_observer_correct_numel
python test/test_quantization.py -k histogram
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90355
Approved by: https://github.com/vkuzo
Summary:
This PR deprecates the `compute_dtype` field on observers and replaces
it with an `is_dynamic` field. This is better aligned
with the reference model spec.
Test plan:
```
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85431
Approved by: https://github.com/jerryzh168
Summary:
This is needed for choose_qparams, but previously it was not configurable; in the reference quantization flow
with decomposed Tensors, we are making this explicit.
Test Plan:
tested in future PR
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89267
Approved by: https://github.com/vkuzo
Summary: the same function existed in observer and quantize; this consolidates them into a
single function. Note the definitions were slightly different; I've
changed the definition to be maximally inclusive so that the name of the
function is more accurate.
Test Plan: python test/test_public_bindings.py
python test/test_quantization.py
Reviewers:
Subscribers:
Tasks:
Tags:
Differential Revision: [D40709276](https://our.internmc.facebook.com/intern/diff/D40709276)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87520
Approved by: https://github.com/jcaip
Summary:
As titled, HistogramObserver may fail in a certain scenario.
Specifically, we originally compute `hist_bin_width` as `(self.max_val - self.min_val) / (self.bins * upsample_rate)`. It's possible that the numerator is close to the smallest positive FP32 value (1.4e-45), in which case the division underflows to zero and breaks downstream computations that divide by `hist_bin_width`.
This PR introduces some redundant computation to avoid that scenario.
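A numeric illustration of the failure mode, with assumed values for bins and upsample_rate:
```
import torch

span = torch.tensor(1.4e-45)          # max_val - min_val, near the FP32 threshold
hist_bin_width = span / (2048 * 128)  # bins * upsample_rate (assumed values)
print(hist_bin_width)                 # tensor(0.) -- the division underflows
print(span / hist_bin_width)          # tensor(inf) -- downstream use blows up
```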
Test Plan: https://pxl.cl/2ggD4 (04490e90ea)
Differential Revision: D40149594
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86522
Approved by: https://github.com/jerryzh168
Summary: `per_channel_weight_observer_range_neg_127_to_127` now correctly uses `PerChannelMinMaxObserver` instead of `MinMaxObserver`
Test Plan:
Adds a new test, `quantization.core.test_top_level_apis`, to instantiate and run `forward()` on all `default` observers.
Differential Revision: D39916482
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85883
Approved by: https://github.com/salilsdesai
Summary:
Before this PR, the `dtype` attribute of observers was not clearly
defined. It originally meant `interface_dtype` in the eager mode
workflow, which is how the codebase used it before this PR.
In the new reference model spec, the `dtype` attribute of an observer
represents the `dtype` value which needs to be passed into a `quantize`
function in the reference model spec. This PR aligns the codebase
to this definition of dtype. In detail:
1. change util functions to interpret `dtype` using the reference model definition
2. change `prepare` to interpret `dtype` using the reference model definition
3. change observers for dynamic quantization to interpret `dtype` using the reference
model definition
A future PR (left out of this one to keep LOC small) will deprecate the
`compute_dtype` field and instead expose `is_dynamic` on observers.
Test plan:
```
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
```
Differential Revision: [D39675209](https://our.internmc.facebook.com/intern/diff/D39675209)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85345
Approved by: https://github.com/z-a-f, https://github.com/jerryzh168
Summary:
Adds `extra_repr` to `HistogramObserver`. This is useful when debugging
PTQ models because it makes it quick to check whether a `HistogramObserver`
has received data or not.
Test plan:
```
>>> import torch
>>> obs = torch.ao.quantization.HistogramObserver()
>>> obs(torch.randn(1, 3, 224, 224))
...
>>> print(obs)
# before - hard to tell if the observer has seen data
HistogramObserver()
# after
HistogramObserver(min_val=-4.778339862823486, max_val=4.311892986297607)
>>>
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84760
Approved by: https://github.com/andrewor14
Summary:
After inserting quant-dequant nodes in the graph, we need to:
1. Insert packed param creation and the quantized op.
2. Create a packed_params attribute in the top module. For this we need a
graph that is inlined except for calculate_qparams method calls. But those
can be inlined too, so perhaps we just need to make sure no other CallMethods
exist.
3. Insert SetAttr for the packed param.
4. Insert GetAttr for the packed param.
5. Use the GetAttr output for the quantized op where applicable, e.g.
linear_dynamic.
The above is added to the quantize_<method-name> method created in the previous
step. Once the above steps are done, clone the method into
quantized_<method-name>.
Modify quantize_<method-name>:
1. Remove all outputs from the method.
2. Run DCE.
3. Remove all inputs from the method except self.
Modify quantized_<method-name>:
1. Remove all packed_param SetAttr nodes.
2. Run DCE.
This should result in removal of all nodes that generate packed params.
Test Plan: To be written
Reviewers:
Subscribers:
Tasks:
Tags:
Differential Revision: [D38771416](https://our.internmc.facebook.com/intern/diff/D38771416)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83571
Approved by: https://github.com/jerryzh168
Motivation: each quantization observer supports only a limited set of qschemes, and we need to check this at initialization rather than at run time. For example, for a MinMaxObserver with its qscheme set to **torch.per_channel_affine**, there is currently a runtime error during the calibration step:
```
AttributeError: 'MinMaxObserver' object has no attribute 'ch_axis'
```
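A hypothetical sketch of such an init-time check (the names and the supported list are illustrative):
```
import torch

class MinMaxObserverSketch:
    _SUPPORTED_QSCHEMES = (torch.per_tensor_affine, torch.per_tensor_symmetric)

    def __init__(self, qscheme=torch.per_tensor_affine):
        if qscheme not in self._SUPPORTED_QSCHEMES:
            # fail at construction time with a clear message, instead of an
            # AttributeError later during calibration
            raise NotImplementedError(
                f"MinMaxObserver does not support qscheme {qscheme}"
            )
        self.qscheme = qscheme
```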
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80126
Approved by: https://github.com/jerryzh168
This is a new version of #15648 based on the latest master branch.
Unlike the previous PR where I fixed a lot of the doctests in addition to integrating xdoctest, I'm going to reduce the scope here. I'm simply going to integrate xdoctest, and then I'm going to mark all of the failing tests as "SKIP". This will let xdoctest run on the dashboards, provide some value, and still let the dashboards pass. I'll leave fixing the doctests themselves to another PR.
In my initial commit, I do the bare minimum to get something running with failing dashboards. The few tests that I marked as skip are causing segfaults. Running xdoctest results in 293 failed, 201 passed tests. The next commits will be to disable those tests. (unfortunately I don't have a tool that will insert the `#xdoctest: +SKIP` directive over every failing test, so I'm going to do this mostly manually.)
Fixes https://github.com/pytorch/pytorch/issues/71105
@ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82797
Approved by: https://github.com/ezyang
Summary: This commit adds qconfigs with special observers for fixed
qparams ops in get_default_qconfig_mapping and
get_default_qat_qconfig_mapping. For correctness, we also require
users to use these special observers if we detect these fixed
qparams ops in prepare.
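An illustrative way to inspect the result (assuming torch.nn.Sigmoid is among the detected fixed-qparams ops and that the mapping exposes per-type qconfigs via object_type_qconfigs):
```
import torch
from torch.ao.quantization import get_default_qconfig_mapping

qconfig_mapping = get_default_qconfig_mapping()
# fixed-qparams ops such as torch.nn.Sigmoid should map to a qconfig whose
# observers carry fixed scale/zero_point
print(qconfig_mapping.object_type_qconfigs.get(torch.nn.Sigmoid))
```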
Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
Reviewers: jerryzh168, vkuzo
Subscribers: jerryzh168, vkuzo
Differential Revision: [D37396379](https://our.internmc.facebook.com/intern/diff/D37396379)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80184
Approved by: https://github.com/jerryzh168
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76637
The previous naming conventions `default_affine_fixed_qparams_observer`
and `default_symmetric_fixed_qparams_observer` were uninformative, and users had to read
the definition in order to understand what these observers are. The new
naming convention reveals information about the range of the observers.
Analogous changes were also made for
`default_symmetric_fixed_qparams_fake_quant` and
`default_affine_fixed_qparams_fake_quant`.
Test Plan:
```
python test/test_quantization.py
```
Differential Revision: D36054169
Reviewed By: vkuzo
Pulled By: dzdang
fbshipit-source-id: 215f7786a4b7abda7327f17cc61735697ec5cca9
(cherry picked from commit 21a4e6eda4467c8adca7fd534a506a14e975f9cf)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76461
Renaming, as the old name was confusing. The new name better
represents what this class is doing.
Test Plan: CI
Reviewed By: jerryzh168
Differential Revision: D35976350
Pulled By: vkuzo
fbshipit-source-id: 6da6c1767cec729c3959b13ae9dd939d0b2f622c
(cherry picked from commit 065608ef42c599525bfad4603af74c5bdf0881c3)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76460
`RecordingObserver` inherits from `_ObserverBase` but does not use any functionality
from it. Making it inherit from `ObserverBase` instead.
This will make it simpler to rename `_ObserverBase` to something more meaningful in the next PR.
Test Plan: CI
Reviewed By: jerryzh168
Differential Revision: D35976351
Pulled By: vkuzo
fbshipit-source-id: 19c106bf0d48607c231702e2e048f42a7f48a5c6
(cherry picked from commit 4fd44123b0e9bcdcae546aecabe80d7642129cf5)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74507
* These are the default symmetric QAT qconfigs for qnnpack.
* Support for symmetric quantization is not available from other backends.
* Observers are similar to symmetric PTQ qconfigs for qnnpack.
Reviewed By: jerryzh168
Differential Revision: D34804808
fbshipit-source-id: 22c11b89242a98f54029ac195f7b984e42809164
(cherry picked from commit ea751ded1174ba2c2f061bafc81573faaf248a9a)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74396
# New qconfig `default_symmetric_qnnpack_qconfig`
Returns a qconfig with signed activations and symmetric weights with range restrictions. Also adds a per_channel variant of the same.
## Restrictions on weights
The restrictions on weights include:
1. the weight zero point is forced to zero, and
2. weight 8-bit signed quantized values are limited to [-127, +127], excluding the value -128.
This is driven, in part, by the desire to achieve better performance with XNNPACK ops. (A sketch of a matching observer configuration follows.)
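A sketch of a weight observer matching these restrictions (parameter choices assumed from the description above; the eps value is the one discussed below):
```
import torch
from torch.ao.quantization.observer import PerChannelMinMaxObserver

weight_observer = PerChannelMinMaxObserver.with_args(
    dtype=torch.qint8,
    qscheme=torch.per_channel_symmetric,  # forces the zero point to 0
    quant_min=-127,                       # excludes -128
    quant_max=127,
    eps=2 ** -12,
)
```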
## qengine/backend = `qnnpack` and XNNPACK ops
The qconfig returned by this function allows us to use faster XNNPACK quantized ops on CPU, with said restrictions. Although we are using XNNPACK ops, the qengine is still `qnnpack`, and there are no plans to introduce a new qengine for XNNPACK ops. Support for using XNNPACK ops with the asymmetric qconfig (returned by get_default_qconfig()) is WIP.
## Updated EPS value:
* From PyTorch:
eps:
```
>>> import torch
>>> torch.finfo(torch.float32).eps
1.1920928955078125e-07
>>> torch.finfo(torch.float32).eps.hex()
'0x1.0000000000000p-23'
```
All scale values are float32 and `scale = max(scale, eps)`
* Requirement from XNNPACK
For both the fp32 and rndnu requantization schemes, `0x1p-32 <= requantization_scale < 256.0`,
where requantization_scale = (input_scale * kernel_scale) / output_scale.
* New minimum allowed scale value
With the current float32 eps (=0x1p-23) as the minimum, the xnnpack lower bound is the problem. We haven't observed upper-bound issues so far when assuming a max scale value of 256. So, focusing on the lower bound: to cover all possible requantization values, we must, conservatively, have the minimum possible scale value satisfy:
```
minimum_requantization_value = xnnpack_lower_threshold
input_scale * kernel_scale / output_scale = 0x1p-32
min_scale_value * min_scale_value / max_scale_value = 0x1p-32
min_scale_value * new_eps / 256 = 0x1p-32
min_scale_value**2 = 0x1p-24
min_scale_value = 0x1p-12
```
With `scale_value >= 0x1p-12`, we should be able to stay above the xnnpack kernels' lower threshold on the requantization scale.
Hitting this bound is of course very unlikely in practice, so we could probably get away with a much smaller EPS value than `0x1p-12`, but it is not easy to choose a smaller value empirically.
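A quick numeric check of the bound derived above:
```
# with input_scale = kernel_scale = 0x1p-12 and output_scale = 256, the
# requantization scale lands exactly on XNNPACK's lower threshold 0x1p-32
eps = 2.0 ** -12
requantization_scale = eps * eps / 256.0
assert requantization_scale == 2.0 ** -32
```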
* Impact on accuracy is unclear as of writing this.
Reviewed By: kimishpatel
Differential Revision: D34625300
fbshipit-source-id: 005e6757ed1185b3940b58ac55246cba8b267828
(cherry picked from commit 61ed1a2a308a1792ccbfc316153a6dc39798f02a)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73947
The original implementation of memoryless observers used MinMaxObservers and
a `memoryless` argument that changed the observer's behavior so that it wouldn't
keep track of previously observed mins and maxes. It was later pointed
out that this is equivalent to a MovingAverage observer with averaging_constant=1,
which requires less overhead and no one-off args (memoryless), so this PR removes
the `memoryless` arg and uses MovingAverage observers instead. Although the memoryless
adjective is still used, a complete definition was also added to clarify error
messages given these changes.
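A small demonstration of the equivalence: the moving-average update new_min = old_min + c * (observed_min - old_min) reduces to new_min = observed_min when c == 1, so prior observations are discarded.
```
import torch

obs = torch.ao.quantization.MovingAverageMinMaxObserver(averaging_constant=1)
obs(torch.tensor([-10.0, 10.0]))
obs(torch.tensor([-1.0, 1.0]))   # overwrites, rather than merges, the range
print(obs.min_val, obs.max_val)  # tensor(-1.), tensor(1.)
```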
Test Plan:
python test/test_quantization.py TestQuantizeEagerQAT
python test/test_quantization.py TestObserver
Test Plan: Imported from OSS
Reviewed By: andrewor14
Differential Revision: D34732080
Pulled By: HDCharles
fbshipit-source-id: 227a1ab29d18adae55093a684ea35ac34523d07a
(cherry picked from commit 5238e70e8f90f3219c36f9c64b647951dcf64b5a)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71027
Fixes issue #61054: removes the warning triggered by reduce_range=True, i.e. the message "UserWarning: Please use quant_min and quant_max to specify the range for observers".
Test Plan:
python test/test_quantization.py TestFakeQuantizeOps
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D33484341
fbshipit-source-id: 97c3d4658926183f88a0c4665451dd7f913d30e6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70107
Histogram observer used floor division on tensors, which is a deprecated
behavior. There was a warning printed:
```
/Users/vasiliy/pytorch/torch/ao/quantization/observer.py:905: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
```
This PR fixes the warning.
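For reference, the non-deprecated spelling of tensor floor division:
```
import torch

a = torch.tensor([7.0, -7.0])
b = torch.tensor(2.0)
print(torch.div(a, b, rounding_mode='floor'))  # tensor([ 3., -4.])
```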
Test Plan:
```
python test/test_quantization.py TestHistogramObserver
```
Reviewed By: ejguan
Differential Revision: D33187926
Pulled By: vkuzo
fbshipit-source-id: 9c37de4c6d6193bee9047b6a28ff37ee1b019753
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69249
This PR added default_replay_qconfig and default_replay_observer, which are used
when we want to configure an operator to reuse the observer from its input: if the input
Tensor of the operator is not observed, we will not observe the output of the operator either;
if the input Tensor is observed, we will observe the output of the operator with the same observer.
e.g.
```
x1 = x0.reshape()
```
if reshape is configured with default_replay_qconfig:
1. if x0 is observed with observer_0, we'll observe x1 with the same observer instance
2. if x0 is not observed, we won't observe x1 either
Test Plan:
```
python test/test_quantization.py TestQuantizeFx.test_replay_qconfig
```
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D32774723
fbshipit-source-id: 26862b2bc181d0433e2243daeb3b8f7ec3dd33b2
Summary:
**Summary**: FixedQParams operators do not need fake quantization
in the prepare step. This commit introduces FixedQParamsObserver
and makes FixedQParamsFakeQuantize a simple wrapper around this
observer. It also removes the fake-quantize logic in forward. (A simplified sketch of the relationship is below.)
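A simplified sketch (assumed structure, not the actual class) of an observer whose qparams are constants rather than statistics collected from data:
```
import torch

class FixedQParamsObserverSketch(torch.nn.Module):
    def __init__(self, scale: float, zero_point: int):
        super().__init__()
        self.scale = scale
        self.zero_point = zero_point

    def forward(self, x):
        return x  # observers pass activations through unchanged

    def calculate_qparams(self):
        # qparams are fixed up front instead of derived from observed data
        return torch.tensor([self.scale]), torch.tensor([self.zero_point])
```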
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68143
Test Plan:
Added two tests:
python3 test/test_quantization.py TestQuantizeFx.test_fixed_qparams_patterns
python3 test/test_quantization.py TestQuantizeFx.test_register_patterns
**Reviewers**: Jerry Zhang
**Subscribers**: Jerry Zhang, Supriya Rao
**Tasks**: T104942885
**Tags**: pytorch
Reviewed By: albanD
Differential Revision: D32484427
Pulled By: andrewor14
fbshipit-source-id: 5a048b90eb4da79074c5ceffa3c8153f8d8cd662