Commit Graph

58 Commits

HDCharles
b5d3d3ebf0 [ao] making hist_obs handle torch.inf and closeby values (#103467)
Summary: This PR does 2 things:

1) Previously this would simply error; now it will ignore any
torch.inf values that it receives. Note: the code checks for torch.inf after
aminmax, so that if no torch.inf values are found, the perf is
relatively unchanged.

2) As mentioned in https://github.com/pytorch/pytorch/issues/100051,
values close to (but not quite at) the maximum/minimum float value could
overflow to infinity in the course of _adjust_min_max() (when such a large
value gets multiplied by something in the middle of a calculation
that would otherwise produce a non-inf result). This was fixed by
rearranging the order of operations on the lines in question without
altering the actual equations. Specifically, where the operations on lines
1095, 1098 and 1100 multiply and divide large values,
it is better to divide the two large values before multiplying, rather
than multiplying the two large values together (creating overflow) before dividing, as the code previously did.
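As a minimal illustration of the idea (not the literal expressions from _adjust_min_max()), dividing the two large values before multiplying keeps the intermediate result representable:

```python
import torch

big = torch.tensor(torch.finfo(torch.float32).max / 2)  # close to the float32 maximum

# Multiplying the two large values first overflows to inf before the division:
print(big * big / big)  # tensor(inf)

# Dividing first keeps the intermediate finite and gives the expected result:
print(big / big * big)  # tensor(1.7014e+38)
```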

Test Plan: python test/test_quantization.py TestObserver.test_histogram_observer_ignore_infinity

python test/test_quantization.py TestObserver.test_histogram_observer_handle_close_to_infinity

Differential Revision: [D51489345](https://our.internmc.facebook.com/intern/diff/D51489345)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103467
Approved by: https://github.com/andrewor14
2023-12-08 21:41:31 +00:00
Jerry Zhang
1474dad28c [quant][pt2e][xnnpack] Add support for QAT dynamic quantization for linear in XNNPACKQuantizer (#113288)
Summary:
The FX graph mode quant workflow and the pt2e flow both rely on the `is_dynamic` flag in the observer/QuantizationSpec to
convert an observer to the dynamic quantization pattern (choose_qparams -> q -> dq). This PR adds an is_dynamic flag
to all observers so that it's possible to convert these observers to the pattern.

However, this dynamic quantization pattern (choose_qparams -> q -> dq) is only valid for MovingAverageObserver(averaging_constant=1),
so that the computation before and after convert matches in the context of QAT. We therefore add sanity
checks in the other observers to make sure is_dynamic is False.
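For reference, a minimal sketch of the choose_qparams -> q -> dq computation a dynamic observer is converted to, written with plain tensor ops for illustration (the exported graph uses dedicated decomposed ops rather than a helper like this):

```python
import torch

def dynamic_quant_dequant(x: torch.Tensor, quant_min: int = -128, quant_max: int = 127):
    # choose_qparams: qparams are recomputed from the current input on every call
    min_val, max_val = torch.aminmax(x)
    min_val = torch.clamp(min_val, max=0.0)
    max_val = torch.clamp(max_val, min=0.0)
    scale = (max_val - min_val) / float(quant_max - quant_min)
    scale = torch.clamp(scale, min=torch.finfo(torch.float32).eps)
    zero_point = int(quant_min - torch.round(min_val / scale))
    zero_point = max(quant_min, min(quant_max, zero_point))
    # q -> dq
    q = torch.clamp(torch.round(x / scale) + zero_point, quant_min, quant_max)
    return (q - zero_point) * scale

print(dynamic_quant_dequant(torch.randn(4, 8)).shape)  # torch.Size([4, 8])
```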

Test Plan:
python test/test_quantization.py TestXNNPACKQuantizer.test_qat_dynamic_linear


Differential Revision: [D51124725](https://our.internmc.facebook.com/intern/diff/D51124725)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113288
Approved by: https://github.com/kimishpatel
2023-12-04 23:06:38 +00:00
Aaron Gokaslan
d9f2cf9974 [BE]: Enable ruff rule PIE800 - unnecessary nested dict expansion (#113880)
Enables an additional ruff rule that flags unnecessary dict literal unpacking, and applies the fixes.
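For reference, PIE800 flags patterns like the following (illustrative example, not taken from the diff):

```python
# Before: unnecessary dict literal unpacking flagged by PIE800
config = {**{"lr": 0.01}, "momentum": 0.9}

# After: write the keys directly
config = {"lr": 0.01, "momentum": 0.9}
```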

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113880
Approved by: https://github.com/albanD
2023-11-16 22:34:38 +00:00
Paul Zhang
51c2b587c9 Back out "[PyPer][BE] Fix test_scripted_module in StatCollector" (#108588)
Differential Revision: D48908507

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108588
Approved by: https://github.com/jerryzh168
2023-09-08 14:33:58 +00:00
Jerry Zhang
32a16d4999 [quant][pt2e] Support int16 quantization (#108453)
Summary:
Previously we could only use native PyTorch integer dtypes that have corresponding quantized dtypes (e.g. quint8, qint8); this
PR removes this assumption in observers/fake_quants so that users can use all PyTorch native dtypes (except for int64, which we can add later if needed).
The main addition here is int16.
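A minimal sketch of what this enables, assuming the observer accepts torch.int16 together with an explicit quant range after this PR:

```python
import torch
from torch.ao.quantization.observer import MinMaxObserver

# int16 observer configuration (quant range chosen to match a signed 16-bit integer)
obs = MinMaxObserver(dtype=torch.int16, quant_min=-(2**15), quant_max=2**15 - 1)
obs(torch.randn(16, 32))
scale, zero_point = obs.calculate_qparams()
print(scale, zero_point)
```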

Test Plan:
python test/test_quantization.py TestQuantizePT2E

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108453
Approved by: https://github.com/kimishpatel
2023-09-06 19:31:20 +00:00
Paul Zhang
4a9c6f1b73 [PyPer][BE] Fix test_scripted_module in StatCollector (#108232)
Summary: D41985889 removed the cast to int for the inputs to torch.histc below, allowing the inputs to still be tensors. These tensors still have requires_grad set to True, causing issues with the call to torch.histc.

Test Plan: buck2 test 'fbcode//mode/opt' fbcode//dper3/dper3/modules/low_level_modules/tests:stat_collector_test -- --exact 'dper3/dper3/modules/low_level_modules/tests:stat_collector_test - test_scripted_module (dper3.dper3.modules.low_level_modules.tests.stat_collector_test.StatCollectorTest_1)'

Differential Revision: D48800879

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108232
Approved by: https://github.com/jerryzh168
2023-09-01 04:23:57 +00:00
Justin Chu
c0d8a4af0a [BE] Enable ruff's UP rules and autoformat ao/ (#105430)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105430
Approved by: https://github.com/albanD, https://github.com/malfet
2023-07-19 13:44:37 +00:00
Xuan Xie
6261055471 dst_bin_of_end_center is defined twice (#102755)
(line 995 and line 1011)
Both definitions are the same; delete one of them.

Fixes

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102755
Approved by: https://github.com/janeyx99
2023-06-06 21:17:07 +00:00
Jerry Zhang
df3455b716 [reland][quant][pt2e][refactor] Cleanup the logic for deciding whether to insert observer/fq or not (#99220) (#99767)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99220

Previously there were two places where we need to decide whether or not to insert an observer or fake quantizer:
(1) the input arguments of a node and (2) the output of a node, and we had separate code for each.
In this PR, the logic is unified in the `_needs_obs_or_fq` helper function, which takes the target_dtype and is_dynamic from the previous output
and the target_dtype and is_dynamic for the current Tensor we are looking at.

Let's use a conv node as an example:
```
conv = convolution(input, weight, bias, ...)
```

Let's say we have an `input_node` object for the argument `input`, and a `conv_node` for the `conv` node in the graph.

(1) Input arguments, e.g. `input`:
the target_dtype/is_dynamic from the previous output comes from the node that produces `input`; we get this from
input_node.meta["target_dtype_info"]["output_act_obs_or_fq"].

The target_dtype/is_dynamic for the current argument `input` comes from conv_node.meta["target_dtype_info"]["input_act_obs_or_fq"];
similarly, for weight it comes from conv_node.meta["target_dtype_info"]["weight_obs_or_fq"], etc.

(2) Output of the conv node:
the target_dtype/is_dynamic from the previous output will be the floating point output of the fp32 convolution operator, so it
is hardcoded to be (torch.float, False). Technically we should get this from node.meta["val"], but since the
current code base is shared by FX graph mode quantization and PyTorch 2.0 export quantization, we cannot do that; we can revisit
after we decide to deprecate FX graph mode quantization.

The target_dtype/is_dynamic for the current output comes from conv_node.meta["target_dtype_info"]["output_act_obs_or_fq"].

There is one caveat here about dynamic quantization that is explained in a code comment, so I won't repeat it here.

Note: also fixed some places in `_get_arg_target_dtype_as_input_to_node` and `_get_arg_target_is_dynamic_as_input_to_node` to make sure that "not specified" is treated the same as specifying a fp32 placeholder observer.

Next: we can merge the two functions that get the target dtype and is_dynamic to reduce code duplication.
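A rough sketch of the decision the unified helper makes (names and rules are simplified here; this is not the actual implementation):

```python
def _needs_obs_or_fq(prev_output_dtype, prev_output_is_dynamic,
                     target_dtype, target_is_dynamic) -> bool:
    # Dynamic quantization: always observe the input Tensor, regardless of what
    # produced it (this is the caveat mentioned above).
    if target_is_dynamic:
        return True
    # Static case: insert an observer/fq only when the dtype needs to change.
    return prev_output_dtype != target_dtype
```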

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
python test/test_quantization.py TestQuantizeFxModels
python test/test_quantization.py TestQuantizePT2E
python test/test_quantization.py TestQuantizePT2EModels

Imported from OSS

Differential Revision: D45198323

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99767
Approved by: https://github.com/kimishpatel
2023-04-25 16:53:02 +00:00
PyTorch MergeBot
75e754800f Revert "[quant][pt2e][refactor] Cleanup the logic for deciding whether to insert observer/fq or not (#99220)"
This reverts commit d56adb1b54.

Reverted https://github.com/pytorch/pytorch/pull/99220 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally
2023-04-21 18:04:21 +00:00
Jerry Zhang
d56adb1b54 [quant][pt2e][refactor] Cleanup the logic for deciding whether to insert observer/fq or not (#99220)
Summary:
Previously there were two places where we need to decide whether or not to insert an observer or fake quantizer:
(1) the input arguments of a node and (2) the output of a node, and we had separate code for each.
In this PR, the logic is unified in the `_needs_obs_or_fq` helper function, which takes the target_dtype and is_dynamic from the previous output
and the target_dtype and is_dynamic for the current Tensor we are looking at.

Let's use a conv node as an example:
```
conv = convolution(input, weight, bias, ...)
```

Let's say we have an `input_node` object for the argument `input`, and a `conv_node` for the `conv` node in the graph.

(1) Input arguments, e.g. `input`:
the target_dtype/is_dynamic from the previous output comes from the node that produces `input`; we get this from
input_node.meta["target_dtype_info"]["output_act_obs_or_fq"].

The target_dtype/is_dynamic for the current argument `input` comes from conv_node.meta["target_dtype_info"]["input_act_obs_or_fq"];
similarly, for weight it comes from conv_node.meta["target_dtype_info"]["weight_obs_or_fq"], etc.

(2) Output of the conv node:
the target_dtype/is_dynamic from the previous output will be the floating point output of the fp32 convolution operator, so it
is hardcoded to be (torch.float, False). Technically we should get this from node.meta["val"], but since the
current code base is shared by FX graph mode quantization and PyTorch 2.0 export quantization, we cannot do that; we can revisit
after we decide to deprecate FX graph mode quantization.

The target_dtype/is_dynamic for the current output comes from conv_node.meta["target_dtype_info"]["output_act_obs_or_fq"].

There is one caveat here about dynamic quantization that is explained in a code comment, so I won't repeat it here.

Note: also fixed some places in `_get_arg_target_dtype_as_input_to_node` and `_get_arg_target_is_dynamic_as_input_to_node` to make sure that "not specified" is treated the same as specifying a fp32 placeholder observer.

Next: we can merge the two functions that get the target dtype and is_dynamic to reduce code duplication.

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
python test/test_quantization.py TestQuantizeFxModels
python test/test_quantization.py TestQuantizePT2E
python test/test_quantization.py TestQuantizePT2EModels


Differential Revision: [D45167585](https://our.internmc.facebook.com/intern/diff/D45167585)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99220
Approved by: https://github.com/kimishpatel
2023-04-21 16:58:35 +00:00
Kazuaki Ishizaki
a13a63ae9a Fix typos under torch/ao directory (#97679)
This PR fixes typos in comments and messages of `.py` files under the `torch/ao` directory.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97679
Approved by: https://github.com/janeyx99, https://github.com/kit1980
2023-04-10 22:25:15 +00:00
yiliu30
2ea0cb1207 Fix the typo for the docstring of args in the observer (#95887)
This PR fixes a typo in `torch/ao/quantization/observer.py`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95887
Approved by: https://github.com/kit1980
2023-03-13 23:03:57 +00:00
Jacob Szwejbka
fc324d3485 [quant][pt2e] Add support for dynamic quantization with symmetric quant for input (#94854)
Summary:
Previously we assumed asymmetric quantization for dynamic quantization; this diff adds support for symmetric quantization
of the input in dynamic quantization.
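To illustrate the difference, a simplified sketch of how the input qparams are chosen in the two modes (not the actual kernel code):

```python
import torch

def choose_qparams_symmetric(x: torch.Tensor, quant_min: int = -128, quant_max: int = 127):
    # Symmetric: zero_point is fixed at 0, scale is derived from the absolute max
    scale = x.abs().max() / (float(quant_max - quant_min) / 2)
    return scale, 0

def choose_qparams_asymmetric(x: torch.Tensor, quant_min: int = -128, quant_max: int = 127):
    # Asymmetric: both min and max contribute, and the zero_point shifts accordingly
    min_val, max_val = x.min(), x.max()
    scale = (max_val - min_val) / float(quant_max - quant_min)
    zero_point = quant_min - int(torch.round(min_val / scale))
    return scale, zero_point

x = torch.randn(8, 16)
print(choose_qparams_symmetric(x), choose_qparams_asymmetric(x))
```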

Test Plan: buck run executorch/exir/tests:quant_lowering_custom_backend_pass -- "executorch.exir.tests.test_quant_lowering_custom_backend_pass.TestQuantLoweringCustomBackendPass.test_quantized_linear_dynamic"

Reviewed By: digantdesai

Differential Revision: D43134794

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94854
Approved by: https://github.com/digantdesai
2023-02-28 19:39:31 +00:00
Xuehai Pan
5b1cedacde [BE] [2/3] Rewrite super() calls in functorch and torch (#94588)
Rewrite Python built-in class `super()` calls. Only non-semantic changes should be applied.

- #94587
- #94588
- #94592

Also, methods with only a `super()` call are removed:

```diff
class MyModule(nn.Module):
-   def __init__(self):
-       super().__init__()
-
    def forward(self, ...):
        ...
```

Cases where the rewrite would change the semantics are kept unchanged. E.g.:

f152a79be9/caffe2/python/net_printer.py (L184-L190)

f152a79be9/test/test_jit_fuser_te.py (L2628-L2635)
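For reference, the core mechanical change is rewriting Python 2-style `super()` calls into the Python 3 zero-argument form, e.g.:

```diff
 class MyModule(nn.Module):
     def __init__(self):
-        super(MyModule, self).__init__()
+        super().__init__()
```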

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94588
Approved by: https://github.com/ezyang, https://github.com/albanD
2023-02-10 21:16:33 +00:00
Jacob Szwejbka
bb48d90b00 [Executorch][Quant][BE] Refactor Choose_Qparams (#94338)
Summary: Refactor so that it can be decomposed

Test Plan: ci

Differential Revision: D42681268

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94338
Approved by: https://github.com/jerryzh168
2023-02-09 01:20:17 +00:00
Aaron Gokaslan
1e2d82b8e4 [BE] Merge isinstance calls together (#94419)
Simplifies and speeds up isinstance calls by checking for multiple types at the same time.
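For example (illustrative, not taken from the diff):

```python
x = 3.0

# Before: two separate isinstance calls
if isinstance(x, int) or isinstance(x, float):
    print("number")

# After: one call checks a tuple of types
if isinstance(x, (int, float)):
    print("number")
```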

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94419
Approved by: https://github.com/ezyang
2023-02-09 00:47:26 +00:00
Aaron Gokaslan
8fce9a09cd [BE]: pyupgrade Python to 3.8 - imports and object inheritance only (#94308)
Apply parts of pyupgrade to torch (starting with the safest changes).
This PR only does two things: removes the need to inherit from object and removes unused future imports.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94308
Approved by: https://github.com/ezyang, https://github.com/albanD
2023-02-07 21:10:56 +00:00
Nikita Shulga
c0dd9b3b67 Revert "[Executorch][Quantization][BE] Refactor Choose Qparams (#92592)"
This reverts commit 59071ab1e7.

It breaks `quantization.jit.test_ondevice_quantization.TestOnDeviceDynamicPTQFinalize`, which is not run in OSS, but is mandatory for internal CI.
2023-01-23 09:13:02 -08:00
Jacob Szwejbka
59071ab1e7 [Executorch][Quantization][BE] Refactor Choose Qparams (#92592)
Summary: Should hopefully be a little faster. Definitely cleaner to not create an observer inside the op

Test Plan: ci

Differential Revision: D42154677

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92592
Approved by: https://github.com/jerryzh168
2023-01-20 01:36:47 +00:00
HDCharles
a01c1ee594 [ao] making _is_activation_post_process private with BC (#90554)
The same function existed in observer and quantize; consolidated into a
single function.

Note: this is a recreation of D40709276, which caused several breakages due to not maintaining BC for models with cached code calling the old function name.

Differential Revision: [D41793604](https://our.internmc.facebook.com/intern/diff/D41793604/)

**NOTE FOR REVIEWERS**: This PR has internal Meta-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D41793604/)!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90554
Approved by: https://github.com/jcaip
2022-12-16 08:09:33 +00:00
HDCharles
e11650887e [ao] fix incorrect integer cast on histogram observer bounds (#90355)
Summary: A cast to int was added in
https://github.com/pytorch/pytorch/pull/45630 to make mypy not complain.
However this leads to unexpected behavior where the histogram doesn't
actually capture the full range of activation values.
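A small illustration of why casting the bounds to int loses range (not the original code):

```python
# Observed activation range
min_val, max_val = -0.7, 0.9

# Casting to int truncates toward zero, collapsing the histogram range to [0, 0]
print(int(min_val), int(max_val))  # 0 0

# Keeping the float bounds preserves the full range of activation values
print(min_val, max_val)  # -0.7 0.9
```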

note1: the test_histogram_observer_against_reference test was secretly
broken on master. The random parameters that normally get used apparently don't cause a test failure, but if you run the test repeatedly in a loop it
eventually fails. This was because, in some cases,
sum(<tensor>) != torch.sum(<tensor>).item(). I was not able to reproduce
this with a toy example, but running this test in a loop and editing
either observer to print the calculation for 'total' would break the
test and show different behaviors. Fixing this test was necessary to
land this PR since the changed histogram bounds changed things enough
that this test would error.

note2: updating histogram observer breaks some BC tests unless I regenerate the
model using the HistogramObserver from this PR

Test Plan: python test/test_quantization.py TestHistogramObserver.test_histogram_observer_correct_numel

python test/test_quantization.py -k histogram


Pull Request resolved: https://github.com/pytorch/pytorch/pull/90355
Approved by: https://github.com/vkuzo
2022-12-12 20:30:44 +00:00
Vasiliy Kuznetsov
22a1b5e243 quantization: deprecate observer compute_dtype and replace with is_dynamic (#85431)
Summary:

This PR deprecates the `compute_dtype` field on observers, and replaces
it with the `is_dynamic` field on observers.  This is better aligned
with the reference model spec.

Test plan:

```
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/85431
Approved by: https://github.com/jerryzh168
2022-11-24 07:07:34 +00:00
PyTorch MergeBot
9d209e7834 Revert "[ao] making _is_activation_post_process private (#87520)"
This reverts commit 45c62a3377.

Reverted https://github.com/pytorch/pytorch/pull/87520 on behalf of https://github.com/bigfootjon due to Diff reverted internally
2022-11-21 16:48:26 +00:00
Jerry Zhang
940959ebbf [quant][fix] Add quant_min/quant_max for default dynamic quantization observer (#89267)
Summary:
This is needed for choose_qparams, but previously it was not configurable; in the reference quantization flow
with decomposed Tensors, we are making this explicit.

Test Plan:
tested in future PR


Pull Request resolved: https://github.com/pytorch/pytorch/pull/89267
Approved by: https://github.com/vkuzo
2022-11-19 16:08:31 +00:00
Kazuaki Ishizaki
1cd6ebe095 Fix typos in messages under torch (#89049)
This PR fixes typos in messages in `.py` files under the torch directory.
Only in `torch/onnx/symbolic_opset16.py` does it also fix a typo in a comment, to make the operator name correct.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89049
Approved by: https://github.com/lezcano
2022-11-17 04:18:14 +00:00
HDCharles
45c62a3377 [ao] making _is_activation_post_process private (#87520)
Summary: the same function existed in observer and quantize; consolidated into a
single function. Note the definitions were slightly different; I've
changed the definition to be maximally inclusive so that the name of the
function is more accurate.

Test Plan: python test/test_public_bindings.py
python test/test_quantization.py


Differential Revision: [D40709276](https://our.internmc.facebook.com/intern/diff/D40709276)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/87520
Approved by: https://github.com/jcaip
2022-11-16 21:31:57 +00:00
Alan Lin
4fc0d5341c [PyTorch][Fix] Improve numerical stability of HistogramObserver (#86522)
Summary:
As titled, HistogramObserver may fail in a certain scenario.
Specifically, we originally compute `hist_bin_width` as `(self.max_val - self.min_val) / (self.bins * upsample_rate)`. It's possible that the numerator is close to the FP32 threshold (1.4e-45), and conducting the division will cause overflow.

This brings in some redundant computation to avoid such a scenario.

Test Plan: https://pxl.cl/2ggD4 (04490e90ea)

Differential Revision: D40149594

Pull Request resolved: https://github.com/pytorch/pytorch/pull/86522
Approved by: https://github.com/jerryzh168
2022-10-11 01:21:16 +00:00
Digant Desai
071f875046 [quant] Fix per channel weight observer (#85883)
Summary: `per_channel_weight_observer_range_neg_127_to_127` now correctly uses `PerChannelMinMaxObserver` instead of `MinMaxObserver`

Test Plan:
Adds a new test, `quantization.core.test_top_level_apis`, to instantiate and run `forward()` on all `default` observers.

Differential Revision: D39916482

Pull Request resolved: https://github.com/pytorch/pytorch/pull/85883
Approved by: https://github.com/salilsdesai
2022-09-30 22:02:44 +00:00
Vasiliy Kuznetsov
09965957cd quantization: align observer dtype with reference model spec (#85345)
Summary:

Before this PR, the `dtype` attribute of observers was not clearly
defined.  It originally meant `interface_dtype` in the eager mode
workflow, which is how the codebase before this PR is using it.

In the new reference model spec, the `dtype` attribute of an observer
represents the `dtype` value which needs to be passed into a `quantize`
function in the reference model spec. This PR aligns the codebase
to this definition of dtype.  In detail:
1. change util functions to interpret `dtype` using the reference model definition
2. change `prepare` to interpret `dtype` using the reference model definition
3. change observers for dynamic quantization to interpret `dtype` using the reference
   model definition.

A future PR (left out of this one to keep LOC small) will deprecate the
`compute_dtype` field and instead expose `is_dynamic` on observers.
"

Test plan:

```
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
```

Differential Revision: [D39675209](https://our.internmc.facebook.com/intern/diff/D39675209)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85345
Approved by: https://github.com/z-a-f, https://github.com/jerryzh168
2022-09-21 06:34:26 +00:00
Vasiliy Kuznetsov
1dabb51a16 quant: add extra_repr to HistogramObserver (#84760)
Summary:

Adds `extra_repr` to `HistogramObserver`. This is useful when debugging
PTQ models because it makes it quick to check whether a `HistogramObserver`
has received data or not.

Test plan:
```
>>> import torch
>>> obs = torch.ao.quantization.HistogramObserver()
>>> obs(torch.randn(1, 3, 224, 224))
  ...
>>> print(obs)
// before - hard to tell if observer has seen data
HistogramObserver()
// after
HistogramObserver(min_val=-4.778339862823486, max_val=4.311892986297607)
>>>
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84760
Approved by: https://github.com/andrewor14
2022-09-09 21:21:03 +00:00
Kimish Patel
5c7e801c50 [pytorch][on device quant] Finalize method for ondevice quant (#83571)
Summary:
After inserting quant-dequant nodes in the graph, we need to:
1. Insert packed param creation and the quantized op.
2. Create a packed_params attribute in the top module. For this we need the
graph to be inlined except for calculate_qparams method calls. But those
can be inlined too, so perhaps we need to make sure no other call methods
exist.
3. Insert SetAttr for the packed param.
4. Insert GetAttr for the packed param.
5. Use the GetAttr output for the quantized op where applicable, e.g.
linear_dynamic.

The above is added to the quantize_<method-name> method created in the previous
step. Once the above steps are done, clone the method into
quantized_<method-name>.

Modify quantize_<method-name>:
1. Remove all outputs from the method.
2. Run dce
3. Remove all inputs from the method except self.

Modify quantized_<method-name>:
1. Remove all packed_param setAttr nodes.
2. Run dce.

This should result in the removal of all nodes that generate the packed param.

Test Plan: To be written


Differential Revision: [D38771416](https://our.internmc.facebook.com/intern/diff/D38771416)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83571
Approved by: https://github.com/jerryzh168
2022-08-29 17:53:11 +00:00
XiaobingSuper
31f151767b add qscheme check for quantization observer (#80126)
Motivation: each quantization observer only supports a limited set of qschemes; we need to do this check at initialization time rather than at run time. For example, for a MinMaxObserver constructed with qscheme set to **torch.per_channel_affine**, there is currently a runtime error only when the calibration step runs:

```
AttributeError: 'MinMaxObserver' object has no attribute 'ch_axis'
```
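A minimal sketch of the kind of init-time validation this adds (hypothetical class and constant names, not the actual code):

```python
import torch

_SUPPORTED_QSCHEMES = (torch.per_tensor_affine, torch.per_tensor_symmetric)

class MinMaxObserverLike:
    def __init__(self, qscheme=torch.per_tensor_affine):
        # Fail fast at construction instead of erroring during calibration
        if qscheme not in _SUPPORTED_QSCHEMES:
            raise NotImplementedError(
                f"MinMaxObserver does not support qscheme {qscheme}"
            )
        self.qscheme = qscheme
```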
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80126
Approved by: https://github.com/jerryzh168
2022-08-25 10:03:19 +00:00
joncrall
4618371da5 Integrate xdoctest - Rebased (#82797)
This is a new version of #15648 based on the latest master branch.

Unlike the previous PR where I fixed a lot of the doctests in addition to integrating xdoctest, I'm going to reduce the scope here. I'm simply going to integrate xdoctest, and then I'm going to mark all of the failing tests as "SKIP". This will let xdoctest run on the dashboards, provide some value, and still let the dashboards pass. I'll leave fixing the doctests themselves to another PR.

In my initial commit, I do the bare minimum to get something running with failing dashboards. The few tests that I marked as skip are causing segfaults. Running xdoctest results in 293 failed, 201 passed tests. The next commits will be to disable those tests. (unfortunately I don't have a tool that will insert the `#xdoctest: +SKIP` directive over every failing test, so I'm going to do this mostly manually.)

Fixes https://github.com/pytorch/pytorch/issues/71105

@ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82797
Approved by: https://github.com/ezyang
2022-08-12 02:08:01 +00:00
Andrew Or
c44317704a [Quant][fx] Add default configs for fixed qparams ops (#80184)
Summary: This commit adds qconfigs with special observers for fixed
qparams ops in get_default_qconfig_mapping and
get_default_qat_qconfig_mapping. For correctness, we also require
users to use these special observers if we detect these fixed
qparams ops in prepare.

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Reviewers: jerryzh168, vkuzo

Subscribers: jerryzh168, vkuzo

Differential Revision: [D37396379](https://our.internmc.facebook.com/intern/diff/D37396379)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80184
Approved by: https://github.com/jerryzh168
2022-06-29 23:07:26 +00:00
dzdang
e2aa28a2d0 [quant][fx][improvement] Renamed default_affine_fixed_qparams_observer and default_symmetric_fixed_qparams_observer (#76637)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76637

The previous names, `default_affine_fixed_qparams_observer`
and `default_symmetric_fixed_qparams_observer`, were uninformative, and users had to read
the definition in order to understand what these observers are. The new
naming convention reveals information about the range of the observers.

The analogous changes were also made for
`default_symmetric_fixed_qparams_fake_quant` and
`default_affine_fixed_qparams_fake_quant`

Test Plan:
```
python test/test_quantization.py
```

Differential Revision: D36054169

Reviewed By: vkuzo

Pulled By: dzdang

fbshipit-source-id: 215f7786a4b7abda7327f17cc61735697ec5cca9
(cherry picked from commit 21a4e6eda4467c8adca7fd534a506a14e975f9cf)
2022-05-04 02:39:20 +00:00
Vasiliy Kuznetsov
04369f637c quant: rename _ObserverBase to UniformQuantizationObserverBase (#76461)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76461

Renaming, as the old name was confusing. The new name better represents
what this class is doing.

Test Plan: CI

Reviewed By: jerryzh168

Differential Revision: D35976350

Pulled By: vkuzo

fbshipit-source-id: 6da6c1767cec729c3959b13ae9dd939d0b2f622c
(cherry picked from commit 065608ef42c599525bfad4603af74c5bdf0881c3)
2022-05-03 05:53:54 +00:00
Vasiliy Kuznetsov
31d5a300ac quant: make RecordingObserver inherit from ObserverBase (#76460)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76460

`RecordingObserver` inherits from `_ObserverBase` but does not use any functionality
from it. Making it inherit from `ObserverBase` instead.

This will make it simpler to rename `_ObserverBase` to something more meaningful in the next PR.

Test Plan: CI

Reviewed By: jerryzh168

Differential Revision: D35976351

Pulled By: vkuzo

fbshipit-source-id: 19c106bf0d48607c231702e2e048f42a7f48a5c6
(cherry picked from commit 4fd44123b0e9bcdcae546aecabe80d7642129cf5)
2022-05-03 05:53:54 +00:00
lkct
9fae0762b0 fix typing in Module.state_dict and load_state_dict
Fixes #72707

Pull Request resolved: https://github.com/pytorch/pytorch/pull/73483
Approved by: https://github.com/albanD, https://github.com/jbschlosser
2022-05-02 17:27:54 +00:00
Digant Desai
09f32eba7a [quant] Add default symmetric qat qconfig for qnnpack (#74507)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74507

* These are the default symmetric QAT qconfigs for qnnpack.
* Support for symmetric quantization is not available from other backends.
* The observers are similar to those in the symmetric PTQ qconfigs for qnnpack.

Reviewed By: jerryzh168

Differential Revision: D34804808

fbshipit-source-id: 22c11b89242a98f54029ac195f7b984e42809164
(cherry picked from commit ea751ded1174ba2c2f061bafc81573faaf248a9a)
2022-03-24 16:19:28 +00:00
Digant Desai
cfe1a41b01 [quant] Add default symmetric qconfig for qnnpack (#74396)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74396

# New qconfig `default_symmetric_qnnpack_qconfig`

Returns a qconfig with signed activations and symmetric weights with range restrictions. Also adds a per_channel variant of the same.

## Restrictions on weights

Restrictions on weights include:
1. the weight zero point is forced to zero, and
2. weight 8-bit signed quantized values are limited to [-127, +127], excluding the value -128.

This is driven, in part, by the desire to achieve better performance by XNNPACK ops.

## qengine/backend = `qnnpack` and XNNPACK ops

The qconfig returned by this function allows us to use faster XNNPACK quantized ops on CPUs with said restrictions. Although we are using XNNPACK ops, the qengine is still `qnnpack`, and there are no plans to introduce a new qengine for XNNPACK ops. Support for using XNNPACK ops with the asymmetric qconfig (returned by get_default_qconfig()) is WIP.

## Updated EPS value:
* From PyTorch:

eps:
```
>>> import torch
>>> torch.finfo(torch.float32).eps
1.1920928955078125e-07
>>> torch.finfo(torch.float32).eps.hex()
'0x1.0000000000000p-23'
```
All scale values are float32 and `scale = max(scale, eps)`

* Requirement from XNNPACK

For both the fp32 and rndnu requantization schemes, `0x1p-32 <= requantization_scale < 256.0`,
where requantization_scale = (input_scale * kernel_scale) / (output_scale).

* New minimum allowed scale value

With the current float32 eps (=0x1p-23) as the minimum, the XNNPACK lower bound is the problem. We haven’t observed upper-bound issues so far assuming a max scale value of 256. So, focusing on the lower bound, to conservatively cover all possible requantization values, we must have a minimum possible requantization scale value such that,

```
minimum_requantization_value = xnnpack_lower_threshold
input_scale * kernel_scale / output_scale = 0x1p-32
min_scale_value * min_scale_value / max_scale_value = 0x1p-32
min_scale_value * new_eps / 256 = 0x1p-32
min_scale_value**2 = 0x1p-24
min_scale_value = 0x1p-12
```

With `scale_value >= 0x1p-12`, we should be able to avoid the lower threshold on the requantization scale imposed by XNNPACK kernels.
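A quick numeric check of the derivation (plain Python, for illustration):

```python
eps = 2.0 ** -12          # proposed minimum scale value (0x1p-12)
lower_bound = 2.0 ** -32  # XNNPACK lower bound on the requantization scale
max_scale = 256.0         # assumed maximum scale value

# Worst case: input_scale = kernel_scale = eps, output_scale = max_scale
assert (eps * eps) / max_scale >= lower_bound
```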

Obviously this is very unlikely to happen. So practically, we should be able to get away with a much smaller value than `0x1p-12` as EPS, but it is not easy to choose a smaller value empirically.

* Impact on accuracy is unclear as of writing this.

Reviewed By: kimishpatel

Differential Revision: D34625300

fbshipit-source-id: 005e6757ed1185b3940b58ac55246cba8b267828
(cherry picked from commit 61ed1a2a308a1792ccbfc316153a6dc39798f02a)
2022-03-18 13:42:41 +00:00
Charles David Hernandez
39605a5632 [ao] Removing memoryless observer args for MovingAverage (#73947)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73947

The original implementation of memoryless observers used MinMaxObservers and
a memoryless argument to manipulate the behavior of the observer so that it wouldn't
keep track of previously observed mins and maxes. It was later pointed
out that this is equivalent to a MovingAverage observer with averaging_constant=1,
which requires less overhead and no one-off args (memoryless), so this PR removes
the memoryless arg and uses MovingAverage observers instead. Although the memoryless
adjective is still used, a complete definition was also added to clarify error
messages given these changes.
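In other words, the memoryless behavior is now expressed as (illustrative):

```python
import torch
from torch.ao.quantization.observer import MovingAverageMinMaxObserver

# averaging_constant=1 means each forward call fully overwrites min_val/max_val,
# i.e. the observer keeps no memory of previously observed batches.
obs = MovingAverageMinMaxObserver(averaging_constant=1)
obs(torch.randn(8, 16))
print(obs.min_val, obs.max_val)
```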

Test Plan:
python test/test_quantization.py TestQuantizeEagerQAT
python test/test_quantization.py TestObserver

Test Plan: Imported from OSS

Reviewed By: andrewor14

Differential Revision: D34732080

Pulled By: HDCharles

fbshipit-source-id: 227a1ab29d18adae55093a684ea35ac34523d07a
(cherry picked from commit 5238e70e8f90f3219c36f9c64b647951dcf64b5a)
2022-03-11 00:21:49 +00:00
Terry Chen
f67cf03526 [Quant] Add qint32 quantization support (#72472)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72472

Add dtype=int32 support for observer

Test Plan:
python3 test/test_quantization.py TestObserver.test_per_tensor_observers

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D34056640

fbshipit-source-id: 4fa15a7274cfbb6a7dd4e698e3989cc0c0626e7b
(cherry picked from commit bf4351de45)
2022-02-16 03:45:15 +00:00
Mike Ruberry
7680a0ae9d Deprecates _aminmax (#71576)
Summary:
Replaces https://github.com/pytorch/pytorch/pull/62432. Existing callsites are updated.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71576

Reviewed By: ngimel

Differential Revision: D33689960

Pulled By: mruberry

fbshipit-source-id: fad1ba78347ecec7fd48f21862c3eb606662b8f4
(cherry picked from commit 6cd438e9a1)
2022-01-21 09:23:29 +00:00
Terry Chen
33a5905cc6 [quant] fix reduce_range warning (#71027)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71027

Fixes issue #61054: removes the warning for reduce_range=True, which caused the message "UserWarning: Please use quant_min and quant_max to specify the range for observers".

Test Plan:
python test/test_quantization.py TestFakeQuantizeOps

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D33484341

fbshipit-source-id: 97c3d4658926183f88a0c4665451dd7f913d30e6
2022-01-10 20:05:36 -08:00
Vasiliy Kuznetsov
574dbb584d quant tests: fix log spew for HistogramObserver (#70107)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70107

Histogram observer used floor division on tensors, which is a deprecated
behavior.  There was a warning printed:

```
/Users/vasiliy/pytorch/torch/ao/quantization/observer.py:905: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
```

This PR fixes the warning.
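For reference, the deprecated pattern and the replacements the warning suggests (which rounding mode is appropriate depends on the intended semantics):

```python
import torch

a = torch.tensor([7.0, -7.0])
b = torch.tensor(2.0)

# Deprecated: a // b on tensors triggered the UserWarning above
# Explicit replacements:
print(torch.div(a, b, rounding_mode="floor"))  # tensor([ 3., -4.])
print(torch.div(a, b, rounding_mode="trunc"))  # tensor([ 3., -3.])
```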

Test Plan:
```
python test/test_quantization.py TestHistogramObserver
```

Reviewed By: ejguan

Differential Revision: D33187926

Pulled By: vkuzo

fbshipit-source-id: 9c37de4c6d6193bee9047b6a28ff37ee1b019753
2021-12-28 06:27:51 -08:00
Charles David Hernandez
fc2614537b Updating quantization documentation (#68907)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68907

Added information about symmetric
qschemes and corrected an error in reference to https://github.com/pytorch/pytorch/issues/68540

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D32662033

fbshipit-source-id: 9052c597f61991934b86850fea8b6eab78397450
2021-12-08 08:32:33 -08:00
Jerry Zhang
ca945d989a [quant][graphmode][fx] Add default_replay_qconfig for ops like reshape (#69249)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69249

This PR adds default_replay_qconfig and default_replay_observer, which are used
when we want to configure an operator to reuse the observer from its input: if the input
Tensor of the operator is not observed, we will not observe the output of this operator either;
if the input Tensor is observed, we will observe the output of the operator with the same observer.

e.g.

```
x1 = x0.reshape()
```
if reshape is configured with default_replay_qconfig:
1. if x0 is observed with observer_0, we'll observe x1 with the same observer instance
2. if x0 is not observed, we won't observe x1 either

Test Plan:
```
python test/test_quantization.py TestQuantizeFx.test_replay_qconfig
```

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D32774723

fbshipit-source-id: 26862b2bc181d0433e2243daeb3b8f7ec3dd33b2
2021-12-06 22:56:14 -08:00
andrewor
79b67d9a4a [Quant] Refactor handling of FixedQParams operators (#68143)
Summary:
**Summary**: FixedQParams operators do not need fake quantization
in the prepare step. This commit introduces FixedQParamsObserver
and makes FixedQParamsFakeQuantize a simple wrapper around this
observer. It also removes the fake quantize logic in forward.
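A minimal sketch of the new observer, assuming the constructor takes the fixed scale and zero_point (the values here, suitable for a 0-to-1 output range such as sigmoid, are illustrative):

```python
import torch
from torch.ao.quantization.observer import FixedQParamsObserver

# qparams are fixed up front; forward is a pass-through and calculate_qparams
# simply returns the fixed values
obs = FixedQParamsObserver(scale=1.0 / 256.0, zero_point=0)
obs(torch.sigmoid(torch.randn(4, 4)))
print(obs.calculate_qparams())  # always the fixed (scale, zero_point)
```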

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68143

Test Plan:
Added two tests:
python3 test/test_quantization.py TestQuantizeFx.test_fixed_qparams_patterns
python3 test/test_quantization.py TestQuantizeFx.test_register_patterns

**Reviewers**: Jerry Zhang

**Subscribers**: Jerry Zhang, Supriya Rao

**Tasks**: T104942885

**Tags**: pytorch

Reviewed By: albanD

Differential Revision: D32484427

Pulled By: andrewor14

fbshipit-source-id: 5a048b90eb4da79074c5ceffa3c8153f8d8cd662
2021-11-23 15:26:10 -08:00
Charles David Hernandez
f455030931 Adding a docstring for memoryless in observer args (#67690)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67690

see title [skip ci]

Test Plan:
python setup.py develop

Imported from OSS

Reviewed By: ejguan

Differential Revision: D32107512

fbshipit-source-id: da5668339716d44720672f7b71a991b23530461e
2021-11-03 12:46:44 -07:00