Commit Graph

1079 Commits

Author SHA1 Message Date
Andrew Or
c7b4eec233 [Quant][fx][bc-breaking] Replace qconfig_dict with a config object (#78452)
**Summary:** Previously, FX graph mode quantization configurations
were specified through a dictionary of qconfigs. However, this
API was not in line with other core APIs in PyTorch. This commit
replaces the dictionary with a config object that users create
and pass to prepare and convert. This leads to better type safety
and a better user experience in notebook settings due to improved
autocompletion.

The new API is as follows:

```
from torch.ao.quantization import QConfigMapping
from torch.ao.quantization.quantize_fx import prepare_fx

qconfig_mapping = (QConfigMapping()
    .set_global(qconfig)
    .set_object_type(torch.nn.Linear, qconfig)
    .set_module_name_regex("foo.*bar", qconfig)
    .set_module_name("mod", qconfig))

# example_inputs is required as of https://github.com/pytorch/pytorch/pull/77608
prepare_fx(model, qconfig_mapping, example_inputs)
```

For backwards compatibility, `prepare_fx`, `prepare_qat_fx`,
and `convert_fx` will continue to accept qconfig_dicts, which
will be converted to `QConfigMapping`s internally.
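
As a hedged illustration (not from the original commit message), the legacy dict form looks like this; the field names follow the pre-existing qconfig_dict convention, and `example_inputs` is assumed per the current prepare_fx signature:

```
# legacy qconfig_dict, still accepted but deprecated; converted to a
# QConfigMapping internally
qconfig_dict = {
    "": qconfig,                                   # global
    "object_type": [(torch.nn.Linear, qconfig)],
    "module_name_regex": [("foo.*bar", qconfig)],
    "module_name": [("mod", qconfig)],
}
prepare_fx(model, qconfig_dict, example_inputs)
```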

Note that this commit does not modify existing tests to use the
new API; they will continue to pass in qconfig_dict as before,
which still works but triggers a deprecation warning. This will
be handled in a future commit.

**Test Plan:**
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

**Reviewers:** jerryzh168, vkuzo

**Subscribers:** jerryzh168, vkuzo

Differential Revision: D36747998

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78452
Approved by: https://github.com/jerryzh168
2022-05-30 18:30:07 +00:00
Jerry Zhang
8225f42a8a [quant][fx][equalization] Fix example_inputs follow ups in test_equalize_fx
Summary:
As a follow-up to https://github.com/pytorch/pytorch/pull/76496, we defined model-specific example_inputs
for the test models in common_quantization.py and used these in test_equalize_fx.

Test Plan:
python test/test_quantization.py TestEqualizeFx

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78314

Approved by: https://github.com/vkuzo
2022-05-26 01:42:24 +00:00
Jerry Zhang
7ea5fa3dd4 [reland][quant] Add utility function get_fqn_to_example_inputs
Summary:
After https://github.com/pytorch/pytorch/pull/77608, `example_inputs` is a required input for `prepare_fx` and `prepare_qat_fx`.
This makes quantizing submodules harder, so we added this utility function to get a dictionary mapping each submodule's fqn to its example_inputs.

Example Call:

```
example_inputs = (tensor0,)
get_fqn_to_example_inputs(m, example_inputs)
```

Example output:
```
{
   "linear1": (tensor1,),
   "linear2": (tensor2,),
   "sub": (tensor3,),
   "sub.linear1": (tensor4,),
   ...
}
```
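
As a hedged usage sketch (the import path and the `qconfig_dict` here are assumptions, not from the original message), the returned dictionary can be used to prepare a single submodule:

```
from torch.ao.quantization.quantize_fx import prepare_fx
# assumed import location for the new utility:
from torch.ao.quantization.utils import get_fqn_to_example_inputs

fqn_to_example_inputs = get_fqn_to_example_inputs(m, (tensor0,))
# quantize just the "sub" submodule with its derived example inputs
m.sub = prepare_fx(m.sub, qconfig_dict,
                   example_inputs=fqn_to_example_inputs["sub"])
```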

Test Plan:
python test/test_quantization.py TestUtils

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78286

Approved by: https://github.com/dzdang
2022-05-25 23:31:51 +00:00
Jerry Zhang
716f76716a [quant] Skip some broken tests due to hypothesis
Summary:
Some quantization tests failed even though no code related to them was touched. All of them
use hypothesis, so hypothesis is likely the problem. We will skip these tests for now, and
gradually either remove all hypothesis tests from the quantization test code or skip running the hypothesis tests in CI.

Test Plan:
ossci

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78302

Approved by: https://github.com/suo, https://github.com/dzdang
2022-05-25 21:46:11 +00:00
Vasiliy Kuznetsov
53e05ad4b2 ns for fx: remove restriction on nodes with no args and only kwargs
Summary:

Removes the restriction in NS for FX on handling nodes which have
no positional arguments, such as `F.linear(input=x, weight=w, bias=b)`.

In order to achieve this, we delete all places in the code which
were doing things like

```
node.args[0]
```

And replace them with

```
_get_normalized_nth_input(node, gm, 0)
```

The `_get_normalized_nth_input` function is a best-effort way to
get the nth normalized input.

This is needed because some FX tools output nodes normalized to
be kwargs only, and we need to be able to handle this in NS.
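
A hedged sketch of what such a best-effort lookup might do (illustrative; not the actual implementation in torch.ao.ns):

```
import inspect
import torch.fx

def nth_input_best_effort(node: torch.fx.Node, n: int):
    # positional args win outright
    if n < len(node.args):
        return node.args[n]
    # otherwise, map position n to a kwarg name via the target's signature,
    # e.g. F.linear(input=x, weight=w, bias=b) -> position 0 is "input"
    if node.op == "call_function":
        try:
            params = list(inspect.signature(node.target).parameters)
            return node.kwargs.get(params[n])
        except (ValueError, IndexError):
            pass  # signature unavailable for some torch overloads
    return None
```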

Test plan:

```
python test/test_quantization.py -k test_linear_kwargs_shadow
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78181

Approved by: https://github.com/z-a-f, https://github.com/hx89
2022-05-25 17:00:39 +00:00
PyTorch MergeBot
87148f2b59 Revert "[quant] Add utility function get_fqn_to_example_inputs"
This reverts commit 50a44fe461.

Reverted https://github.com/pytorch/pytorch/pull/78146 on behalf of https://github.com/suo because it broke master
2022-05-25 06:37:32 +00:00
Jerry Zhang
50a44fe461 [quant] Add utility function get_fqn_to_example_inputs
Summary:
After https://github.com/pytorch/pytorch/pull/77608, `example_inputs` is a required input for `prepare_fx` and `prepare_qat_fx`.
This makes quantizing submodules harder, so we added this utility function to get a dictionary mapping each submodule's fqn to its example_inputs.

Example Call:

```
example_inputs = (tensor0,)
get_fqn_to_example_inputs(m, example_inputs)
```

Example output:
```
{
   "linear1": (tensor1,),
   "linear2": (tensor2,),
   "sub": (tensor3,),
   "sub.linear1": (tensor4,),
   ...
}
```

Test Plan:
python test/test_quantization.py TestUtils

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78146

Approved by: https://github.com/vkuzo
2022-05-25 03:07:16 +00:00
dzdang
2aad28a539 [quant][core][gpu][feature] Implemented quantized cuda gelu
Summary:
Support for quantized cuda gelu has been provided by using
`dequantize -> fp32 cuda gelu kernel -> quantize`. Mathematically, this
is not equivalent to doing int8 gelu, so we have opted for this approach
for now. It might be possible to write a variant of the int8 gelu that's
equivalent to `dequantize -> fp32 cuda gelu kernel -> quantize`, which
can be a topic for future work.
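
A hedged reference sketch of the pattern described above (illustrative only; reusing the input's qparams for the output is a simplification, and this is not the actual kernel):

```
import torch
import torch.nn.functional as F

def quantized_gelu_reference(qx: torch.Tensor) -> torch.Tensor:
    x = qx.dequantize()   # int8 -> fp32
    y = F.gelu(x)         # existing fp32 cuda gelu kernel
    # fp32 -> int8; reusing the input's quantization parameters is a
    # simplification for illustration
    return torch.quantize_per_tensor(y, qx.q_scale(), qx.q_zero_point(), qx.dtype)
```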

Test function `test_qgelu` was amended to test gelu for quantized cuda
backends.

Test Plan:
```
python test/test_quantization.py -k test_qgelu
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77212

Approved by: https://github.com/jerryzh168
2022-05-24 22:31:45 +00:00
Jerry Zhang
416899d1a9 [quant][fx][bc-breaking] Add required example_args argument to prepare_fx and prepare_qat_fx (#249) (#77608)
Summary:
X-link: https://github.com/facebookresearch/d2go/pull/249

X-link: https://github.com/fairinternal/ClassyVision/pull/104

X-link: https://github.com/pytorch/benchmark/pull/916

X-link: https://github.com/facebookresearch/ClassyVision/pull/791

X-link: https://github.com/facebookresearch/mobile-vision/pull/68

FX Graph Mode Quantization needs to know whether an fx node is a floating point Tensor before it can decide whether to
insert observer/fake_quantize module or not, since we only insert observer/fake_quantize module for floating point Tensors.
Currently we have some hacks to support this by defining some rules like NON_OBSERVABLE_ARG_DICT (https://github.com/pytorch/pytorch/blob/master/torch/ao/quantization/fx/utils.py#L496), but this approach is fragile and we do not plan to maintain it long term in the pytorch code base.

As we discussed in the design review, we'd need to ask users to provide sample args and sample keyword args
so that we can infer the types in a more robust way. This PR starts by changing the prepare_fx and prepare_qat_fx APIs to require users to provide
example arguments through example_inputs. Note this API doesn't support kwargs; kwargs could make https://github.com/pytorch/pytorch/pull/76496#discussion_r861230047 (comment) simpler, but
that case will be rare, and even then we can still work around it with positional arguments. Also, torch.jit.trace (https://pytorch.org/docs/stable/generated/torch.jit.trace.html) and ShapeProp (https://github.com/pytorch/pytorch/blob/master/torch/fx/passes/shape_prop.py#L140) take only positional args, so we'll use a single example_inputs argument for now.

If needed, we can extend the API with an optional example_kwargs, e.g. for cases where forward takes many arguments and it makes more sense to
pass them by keyword.

BC-breaking Note:
Before:
```python
m = resnet18(...)
m = prepare_fx(m, qconfig_dict)
# or
m = prepare_qat_fx(m, qconfig_dict)
```
After:
```python
m = resnet18(...)
m = prepare_fx(m, qconfig_dict, example_inputs=(torch.randn(1, 3, 224, 224),))
# or
m = prepare_qat_fx(m, qconfig_dict, example_inputs=(torch.randn(1, 3, 224, 224),))
```

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
python test/test_quantization.py TestQuantizeFxModels

Imported from OSS

Reviewed By: vkuzo, andrewor14

Differential Revision: D35984526

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77608
Approved by: https://github.com/dzdang
2022-05-21 21:03:48 +00:00
Zafar
44c91383d3 [quant][ao_migration] Base package in tests
Adding a base package as an argument to the testing routines.
That will allow us to test other locations that are being migrated.
For example

```
AOMigrationTestCase._test_package_import('my_package', base='quantization')
```

would check if `torch.quantization.my_package` and `torch.ao.quantization.my_package` are the same.
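
A hedged sketch of what such a check might look like (names illustrative, not the actual AOMigrationTestCase code):

```
import importlib

def check_migrated_package(name: str, base: str = "quantization"):
    old = importlib.import_module(f"torch.{base}.{name}")
    new = importlib.import_module(f"torch.ao.{base}.{name}")
    # the legacy location should re-export the same objects as the ao one
    for attr in dir(new):
        if not attr.startswith("_"):
            assert getattr(old, attr) is getattr(new, attr)
```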

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77064

Approved by: https://github.com/jerryzh168
2022-05-20 18:43:37 +00:00
Xiang Gao
f274558018 Bitwise ops improvements (#77621)
- Bitwise shift: remove floating point support
- Bitwise and, or, xor: add (Scalar, Tensor) overloads
- Use `test_ops.py` to test these ops, including error cases
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77621
Approved by: https://github.com/ngimel
2022-05-17 21:16:42 +00:00
PyTorch MergeBot
981719fe5a Revert "[quant][core][gpu][feature] Implemented quantized cuda gelu"
This reverts commit b892b85b88.

Reverted https://github.com/pytorch/pytorch/pull/77212 on behalf of https://github.com/facebook-github-bot
2022-05-14 00:17:51 +00:00
dzdang
b892b85b88 [quant][core][gpu][feature] Implemented quantized cuda gelu
Summary:
Support for quantized cuda gelu has been provided by using
`dequantize -> fp32 cuda gelu kernel -> quantize`. Mathematically, this
is not equivalent to doing int8 gelu, so we have opted for this approach
for now. It might be possible to write a variant of the int8 gelu that's
equivalent to `dequantize -> fp32 cuda gelu kernel -> quantize`, which
can be a topic for future work.

Test function `test_qgelu` was amended to test gelu for quantized cuda
backends.

Test Plan:
```
python test/test_quantization.py -k test_qgelu
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77212

Approved by: https://github.com/jerryzh168
2022-05-13 20:59:24 +00:00
Vasiliy Kuznetsov
d8479098a6 ns for fx: remove quantized ReLU6 from mapping
Summary:

This module is no longer swapped by FX graph mode quantization,
because it can take quantized inputs. Removing it from NS for FX
mappings.

Test plan:

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76992

Approved by: https://github.com/jerryzh168
2022-05-13 20:38:31 +00:00
Vasiliy Kuznetsov
6a33b80191 ns for fx: remove GroupNorm from mapping
Summary:

GroupNorm quantization is defined but it looks like FX graph
mode quantization does not have it enabled.

Removing it from NS for FX.

Test plan:

```
python test/test_quantization.py -k FXGraphMatcher
python test/test_quantization.py -k FXNumericSuite
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76991

Approved by: https://github.com/jerryzh168
2022-05-13 20:33:27 +00:00
Vasiliy Kuznetsov
20b75e3e5f ns for fx: clean up convtranspose mappings
Summary:

Fixes a couple of problems with `ConvTranspose` in NS mappings:
1. deletes the dynamic versions, as they do not work yet
2. deletes `ConvTranspose3d`, as it's not swapped yet in the quantization workflow
3. removes a duplicate set

Test plan:

```
python test/test_quantization.py -k FXGraphMatcher
python test/test_quantization.py -k FXNumericSuite
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76980

Approved by: https://github.com/jerryzh168
2022-05-13 20:22:42 +00:00
Jiayi Sun
e867831b84 extend replaceConvolutionWithAtenConv to handle conv_transpose3d (#76888)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76888
Approved by: https://github.com/eellison
2022-05-13 16:40:12 +00:00
dzdang
af80329ca9 [quant][core][gpu][feature] Implemented quantized conv1d cudnn op
Summary:
Previously, only the quantized conv2d cudnn op had been implemented. This PR
implements the 1d variant. Because cuDNN does not have direct support
for conv1d, we cast the 1d case to a 2d case by adding a dummy
dimension of size 1 to the input and weight tensors. This is
analogous to how it was done for quantized cpu conv1d (e.g., see
`quantized/cpu/qconv.cpp`)
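
A hedged sketch of the 1d-to-2d trick in plain (unquantized) terms, for illustration:

```
import torch
import torch.nn.functional as F

def conv1d_via_conv2d(x, weight, bias=None, stride=1, padding=0, dilation=1):
    x2 = x.unsqueeze(2)       # (N, C, L)  -> (N, C, 1, L)
    w2 = weight.unsqueeze(2)  # (O, I, K)  -> (O, I, 1, K)
    y2 = F.conv2d(x2, w2, bias, stride=(1, stride),
                  padding=(0, padding), dilation=(1, dilation))
    return y2.squeeze(2)      # (N, O, 1, L') -> (N, O, L')
```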

A corresponding test case was added in `test_quantized_op.py`. This
function should ideally be merged with `test_qconv1d` when cuDNN flags are
enabled and available in pytorch.

Test Plan:
```
python test/test_quantization.py -k test_qconv1d_cudnn
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77175

Approved by: https://github.com/jerryzh168
2022-05-12 03:25:30 +00:00
dzdang
1d7b294574 [quant][better-engineering][bc-breaking] Removed quant_min/quant_max from fake_quant modules
Summary:
FakeQuantize class has quant_min/quant_max and activation_post_process
attributes, the latter of which already includes quant_min/max. As such,
we can remove quant_min/quant_max from FakeQuantize and use
FakeQuantize.activation_post_process.quant_m* directly.
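
A hedged sketch of reading the values through `activation_post_process` after this change (the constructor arguments shown are assumptions based on the existing FakeQuantize API):

```
import torch
from torch.ao.quantization.fake_quantize import FakeQuantize
from torch.ao.quantization.observer import MovingAverageMinMaxObserver

fq = FakeQuantize(observer=MovingAverageMinMaxObserver,
                  quant_min=0, quant_max=255, dtype=torch.quint8)
# quant_min/quant_max now live only on the observer instance
print(fq.activation_post_process.quant_min)  # 0
print(fq.activation_post_process.quant_max)  # 255
```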

Test plan:
```
python test/test_quantization.py
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76674

Approved by: https://github.com/vkuzo
2022-05-11 14:23:05 +00:00
Vasiliy Kuznetsov
3a8752db86 ns for fx: skip shadowing ops if copy subgraph is not implemented (#76663)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76663

Subgraph copy does not handle all edge cases. Handling them all would
take significant engineering time, and currently an unhandled edge case crashes
the script.

This PR adds a function to check if the subgraph copy is supported,
and skips shadowing if it is not supported. This way the model
can still go through the shadowing APIs without an exception.

Test Plan:
```
python test/test_quantization.py -k FXNumericSuite
```

Reviewed By: hx89

Differential Revision: D36069304

Pulled By: vkuzo

fbshipit-source-id: 6b38b8d8e43396a4cf2373b247223a19d451d096
(cherry picked from commit e2322ca0635c51a4701e60fa90f77915a3c46d0f)
2022-05-05 13:19:53 +00:00
Vasiliy Kuznetsov
d3e338935a ns for fx: skip shadowing for torch.cat, and also for nodes with only kwargs (#76561)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76561

User model had syntax like `torch.cat(tensors=[x])`. This PR fixes two errors
to unbreak this in the NS shadow model (a sketch of the resulting skip check follows the list):
1. skip nodes which only have kwargs (instead of throwing an exception)
2. explicitly skip shadowing of `torch.cat` (since it's not supported anyway)
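
A hedged sketch of the combined skip condition (illustrative; not the actual NS code):

```
import torch

UNSHADOWABLE_FUNCTIONS = {torch.cat}  # illustrative set

def should_skip_shadowing(node) -> bool:
    # (1) kwargs-only calls such as torch.cat(tensors=[x]);
    # (2) ops for which shadowing is not implemented
    return len(node.args) == 0 or node.target in UNSHADOWABLE_FUNCTIONS
```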

Test Plan:
```
python test/test_quantization.py -k test_op_with_only_kwargs_skips_shadowing
python test/test_quantization.py -k test_op_mul_add_cat_skips_shadowing
```

Reviewed By: hx89

Differential Revision: D36017356

Pulled By: vkuzo

fbshipit-source-id: 0da4840a62c2dac183f8294c2cec4fce262474b3
(cherry picked from commit 88409c1576e7f690708957b2baa285fc7961e9d6)
2022-05-05 13:19:53 +00:00
dzdang
e2aa28a2d0 [quant][fx][improvement] Renamed default_affine_fixed_qparams_observer and default_symmetric_fixed_qparams_observer (#76637)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76637

The previous names `default_affine_fixed_qparams_observer`
and `default_symmetric_fixed_qparams_observer` were uninformative: users had to read
the definitions in order to understand what these observers are. The new
names reveal information about the range of the observers.

The analogous changes were also made for
`default_symmetric_fixed_qparams_fake_quant` and
`default_affine_fixed_qparams_fake_quant`

Test Plan:
```
python test/test_quantization.py
```

Differential Revision: D36054169

Reviewed By: vkuzo

Pulled By: dzdang

fbshipit-source-id: 215f7786a4b7abda7327f17cc61735697ec5cca9
(cherry picked from commit 21a4e6eda4467c8adca7fd534a506a14e975f9cf)
2022-05-04 02:39:20 +00:00
Vasiliy Kuznetsov
e155e2584a ns for fx: skip operator.add and operator.mul when shadowing (#76504)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76504

Shadowing for add and mul is not implemented, this PR fixes the skipping
logic to also skip the `operator.add` and `operator.mul` flavor of these
operators.

Test Plan:
```
python test/test_quantization.py -k test_mul_add_skips_shadowing
```

Reviewed By: dzdang

Differential Revision: D35985997

Pulled By: vkuzo

fbshipit-source-id: f832e54a5461d3b182df4bb905357d6c66742e98
(cherry picked from commit 93ae9592f68873865ebfdc438bffb1c9486dd1c1)
2022-05-03 05:58:46 +00:00
Vasiliy Kuznetsov
31d5a300ac quant: make RecordingObserver inherit from ObserverBase (#76460)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76460

`RecordingObserver` inherits from `_ObserverBase` but does not use any functionality
from it. Making it inherit from `ObserverBase` instead.

This will make it simpler to rename `_ObserverBase` to something more meaningful in the next PR.

Test Plan: CI

Reviewed By: jerryzh168

Differential Revision: D35976351

Pulled By: vkuzo

fbshipit-source-id: 19c106bf0d48607c231702e2e048f42a7f48a5c6
(cherry picked from commit 4fd44123b0e9bcdcae546aecabe80d7642129cf5)
2022-05-03 05:53:54 +00:00
dzdang
8c47e9dc81 [quant][core][gpu][improvement] Added support for padding quantized cudnn conv2d operator
Summary:
cudnn v8.4.0 expects the input channels for conv2d to be a multiple of 4. If
they are not, we need to explicitly pad them to a multiple of 4 ourselves, as
cudnn does not currently support padding intrinsically.
The padding implemented here is limited to groups=1; however, this
should be a straightforward adaptation to groups > 1 since we're only
padding a single dimension.

When cudnn enables support for padding, we can remove the padding on our
end.
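
A hedged sketch of padding the channel dimension of an NCHW tensor up to a multiple of 4 (illustrative, groups == 1):

```
import torch.nn.functional as F

def pad_channels_to_multiple_of_4(x):
    pad_c = (4 - x.shape[1] % 4) % 4
    # F.pad pads from the last dim backwards:
    # (W_left, W_right, H_top, H_bottom, C_front, C_back)
    return F.pad(x, (0, 0, 0, 0, 0, pad_c)) if pad_c else x
```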

Test plan:
```
python test/test_quantization.py -k test_qconv2d_cudnn
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76184

Approved by: https://github.com/jerryzh168
2022-04-29 00:13:48 +00:00
dzdang
bbc263eb5d [quant][core][gpu][feature] Implemented quantized cuda adaptive average pool2d op (#76081)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76081

The current implementation of quantized cuda adaptive average pooling uses the following:
dequant -> fp32 adaptive average pooling -> quant. This is numerically the same as quantized adaptive average pooling, but it is not the ideal implementation, as we would rather operate on the quantized values directly. However, we are currently blocked on this as we are waiting for cudnn's 8.5.0 release, which is anticipated to support adaptive average pooling. When that support is made available, we will use it directly.

Test Plan:
```
python test/test_quantization.py TestQuantizedOps.test_adaptive_avg_pool
```

Differential Revision: D35768751

Reviewed By: jerryzh168

Pulled By: dzdang

fbshipit-source-id: ad06fd06d6941b92105bcabf0fd54b9e27a029d5
(cherry picked from commit 4e1805dd62a9d5e94c61340ac46bcd7aa4e49dd9)
2022-04-28 12:37:20 +00:00
dzdang
ad88816c86 [quant][core][gpu][feature] Added support for float->quantized cuda tensor copying (#76177)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76177

Previously, support for copying an fp tensor to a quantized tensor was
limited to CPU tensors. This PR extends the support to GPU tensors.
A corresponding test for cuda tensors was added to
test_qtensor_float_assignment.
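
A hedged sketch of the behavior under test (assumes a CUDA device is available):

```
import torch

qx = torch.quantize_per_tensor(torch.zeros(4, device="cuda"),
                               scale=0.1, zero_point=0, dtype=torch.quint8)
# copying a float tensor into a quantized tensor quantizes on the fly;
# previously this path was CPU-only
qx.copy_(torch.randn(4, device="cuda"))
```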

Test Plan:
```
python test/test_quantization.py -k test_qtensor_float_assignment
```
Imported from OSS

Differential Revision: D35817832

Reviewed By: jerryzh168

Pulled By: dzdang

fbshipit-source-id: e5a4a0bb2d8a56f3f1a88806a534b5cb38275cf2
(cherry picked from commit 9173e07b51bb1b853244b205ddf3e36000f01b64)
2022-04-28 02:23:14 +00:00
Vasiliy Kuznetsov
35545d85dc fx quant: add quantized Softmax workflow integration (#75106)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75106

In https://github.com/pytorch/pytorch/pull/75017 a quantized softmax
kernel was added. This PR adds the FX graph mode quantization workflow
integration to swap `nn.Softmax` to `nnq.Softmax`.

Test Plan:
```
python test/test_quantization.py TestQuantizeFxOps.test_fixed_qparams_ops
```

Reviewed By: kimishpatel, andrewor14

Differential Revision: D35324817

Pulled By: vkuzo

fbshipit-source-id: 710ae3bedf8a6ad1dc411cd9808fdd0ce743e757
(cherry picked from commit d67603c0fbb1d3469d97bd538cec38aa8b03324b)
2022-04-20 21:54:26 +00:00
dzdang
e20793b054 [quant][core][gpu][cudnn] Added support for nhwc tensors in quantized cudnn add_relu op (#75806)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75806

When using the quantized cudnn add operator, if the input tensors are 4D,
cudnn requires NHWC format in v8.4.0 (older versions may have relaxed this constraint).
Previously, all tensors defaulted to NCHW format.

Test Plan:
```
python test/test_quantization.py -k test_qadd_relu_cudnn
```

Reviewed By: vkuzo

Differential Revision: D35651368

Pulled By: dzdang

fbshipit-source-id: b6ce49cf100b88c6fa29513ec50b38d445c3c02f
(cherry picked from commit 5936fe6783a02827bd93feb80d137da508d6facc)
2022-04-20 13:48:40 +00:00
Salil Desai
c358c5d7d8 [PyTorch Edge] Using Qnnpack in Quantized Softmax Op (#75799)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75799

Use Qnnpack's quantized softmax in quantized::softmax op when available

Test Plan:
From fbcode
```buck test caffe2/test:quantization -- test_qsoftmax```

# Benchmarking

(Naive Quantized from D35469257 v1, rest D34996486 v14)

|Shape|Fp32|Naive Quantized|Qnnpack Quantized|Qnnpack Quantized with Permute|
|---|---|---|---|---|
|(1, 5, 49, 49)|[6.6757](https://www.internalfb.com/intern/aibench/details/916894241135767)|[7.5981](https://www.internalfb.com/intern/aibench/details/504937774229694)|[1.5579](https://www.internalfb.com/intern/aibench/details/197716001861453)|[2.8446](https://www.internalfb.com/intern/aibench/details/59311708375203)|
|(1, 9, 16, 128)|[7.8485](https://www.internalfb.com/intern/aibench/details/135980349949180)|[9.0499](https://www.internalfb.com/intern/aibench/details/10150813869685)|[1.8865](https://www.internalfb.com/intern/aibench/details/58396904565184)|[3.5282](https://www.internalfb.com/intern/aibench/details/24583753477273)|
|(1, 5, 49, 64)|[7.0626](https://www.internalfb.com/intern/aibench/details/232201930202347)|[8.1091](https://www.internalfb.com/intern/aibench/details/57639118425406)|[1.801](https://www.internalfb.com/intern/aibench/details/656994017385942)|[3.2989](https://www.internalfb.com/intern/aibench/details/518979104130992)|
|(1, 3, 196, 64)|[16.4717](https://www.internalfb.com/intern/aibench/details/895795134460898)|[18.1987](https://www.internalfb.com/intern/aibench/details/909875420196348)|[3.5657](https://www.internalfb.com/intern/aibench/details/206864227381228)|[8.4519](https://www.internalfb.com/intern/aibench/details/84462467166362)|
|(1, 6, 49, 128)|[15.9872](https://www.internalfb.com/intern/aibench/details/417436371026264)|[17.4556](https://www.internalfb.com/intern/aibench/details/183113464145486)|[3.3912](https://www.internalfb.com/intern/aibench/details/616978041358188)|[8.019](https://www.internalfb.com/intern/aibench/details/849820562672950)|
|(1, 3, 196, 196)|[47.3636](https://www.internalfb.com/intern/aibench/details/633568439089073)|[52.0079](https://www.internalfb.com/intern/aibench/details/742080402804069)|[8.5009](https://www.internalfb.com/intern/aibench/details/685773806433926)|[13.5807](https://www.internalfb.com/intern/aibench/details/871998384861927)|
|(1, 6, 16, 64)|[4.0205](https://www.internalfb.com/intern/aibench/details/380419433454222)|[4.5973](https://www.internalfb.com/intern/aibench/details/923432861470595)|[1.0569](https://www.internalfb.com/intern/aibench/details/176718883676884)|[2.0519](https://www.internalfb.com/intern/aibench/details/303780226597723)|
|(1, 6, 16, 16)|[1.8299](https://www.internalfb.com/intern/aibench/details/599824935422385)|[2.3109](https://www.internalfb.com/intern/aibench/details/669753943440643)|[0.808](https://www.internalfb.com/intern/aibench/details/956331973568963)|[1.6406](https://www.internalfb.com/intern/aibench/details/924887465284668)|
|(1, 9, 16, 49)|[4.5134](https://www.internalfb.com/intern/aibench/details/946070183169117)|[5.2282](https://www.internalfb.com/intern/aibench/details/623403709385332)|[2.8195](https://www.internalfb.com/intern/aibench/details/635876531473203)|[2.2251](https://www.internalfb.com/intern/aibench/details/507256033953952)|
|(1, 6, 49, 196)|[23.9811](https://www.internalfb.com/intern/aibench/details/605021113223196)|[26.2834](https://www.internalfb.com/intern/aibench/details/991778071254930)|[4.5338](https://www.internalfb.com/intern/aibench/details/626603993142478)|[9.3877](https://www.internalfb.com/intern/aibench/details/962263658487065)|

*table made with https://www.internalfb.com/intern/anp/view/?id=1714217&revision_id=686803042569716*

Reviewed By: kimishpatel

Differential Revision: D34953197

fbshipit-source-id: 57418757fce17903583c04dffd51c886f9e1bc0e
(cherry picked from commit 8978222623f0cbacdb0373c405136ec94c035da6)
2022-04-19 22:29:57 +00:00
arindamroy-eng
7478ce187a ROCm: Unskip more tests for ROCm 5.0
Re-enabling more tests which are now working on ROCm 5.0.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75353
Approved by: https://github.com/ezyang
2022-04-19 19:45:55 +00:00
dzdang
982be19638 [quant][core][gpu][improvement] Supported int8 matmul for quantized linear cudnn op
Summary:
This PR requires cudnn v8.4.0, which enables support for int8 matmul.
The previous implementation of the quantized linear cudnn operator used cudnn v8.3.3,
which did not have support for int8 matmul (we had to convert our int8 matmul to fp matmul).

Test plan:
```
python test/test_quantization.py -k test_qlinear_cudnn
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75418

Approved by: https://github.com/jerryzh168
2022-04-19 17:24:34 +00:00
Jerry Zhang
74454bdb46 [quant][fx] Move backend_config folder to torch.ao.quantization
Summary:
Following https://github.com/pytorch/rfcs/blob/master/RFC-0019-Extending-PyTorch-Quantization-to-Custom-Backends.md, we implemented
the backend configuration for the fbgemm/qnnpack backend. Currently it lives under the fx folder, but we'd like to use it for all
workflows, including eager, fx graph, and define-by-run quantization, so this PR moves it to the torch.ao.quantization namespace
where it can be shared by the different workflows.
It also moves some fx-specific utility functions to fx/backend_config_utils.py; some files are kept in the fx folder (quantize_handler.py and fuse_handler.py).

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
python test/test_quantization.py TestQuantizeFxModels
python test/test_quantization.py TestAOMigrationQuantization
python test/test_quantization.py TestAOMigrationQuantizationFx

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75823

Approved by: https://github.com/vkuzo
2022-04-19 15:38:57 +00:00
dzdang
6dc71461e1 [quant][core][gpu][bug-fix] Added additional caching support in quantized cudnn add_relu op
Summary:
The previous caching strategy for the quantized cudnn add_relu operator was insufficient,
as it did not properly record all the necessary information. This PR adds several
items to the CacheKey (e.g., input sizes, input dimensions, etc.) to enable
proper caching.

Test plan:
```
python test/test_quantization.py -k test_qadd_relu_cudnn
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75772

Approved by: https://github.com/jerryzh168
2022-04-18 16:53:18 +00:00
dzdang
7d8b366223 [quant][improvement][gpu] Fixed errors in test_qlinear_cudnn
Summary:
Previously, test_qlinear_cudnn had some hard-coded parameters, which are now removed;
bias and relu are now enabled.

Test plan:
```
python test/test_quantization.py TestQuantizedLinear.test_qlinear_cudnn
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75446

Approved by: https://github.com/jerryzh168
2022-04-15 23:14:41 +00:00
Andrew Or
5dcbcc6de8 [Quant][fx] Fix get_default_qconfig_dict for fused modules
Summary: Calling `prepare_fx` with `get_default_qconfig_dict`
failed for models with fused modules, such as `ConvReLU2d`.
This commit fixes this by adding qconfig entries for ReLU
and BatchNorm as well.

Test Plan:
python test/test_quantization.py TestQuantizeFx.test_qconfig_dict_with_fused_modules

Reviewers: jerryzh168

Subscribers: jerryzh168, vkuzo

Issue: https://github.com/pytorch/pytorch/issues/75825

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75838

Approved by: https://github.com/jerryzh168
2022-04-15 22:37:26 +00:00
dzdang
515d61f2fc [quant][core][bug fix] Corrected at::to(memory_format=...) support for quantized tensors
Summary:
Previously, at::to did not work properly for quantized tensors,
and we had to use at::contiguous instead. This PR allows us to use
at::to(memory_format=...) or torch.Tensor.to(memory_format=...)
on both the back- and front-ends.
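
A hedged sketch of what now works from the Python side:

```
import torch

qx = torch.quantize_per_tensor(torch.randn(1, 3, 4, 4), 0.1, 0, torch.quint8)
# memory-format conversion directly on a quantized tensor, instead of
# going through .contiguous(memory_format=...)
qx_nhwc = qx.to(memory_format=torch.channels_last)
assert qx_nhwc.is_contiguous(memory_format=torch.channels_last)
```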

Test plan:
python test/test_quantization.py -k test_qtensor_to_memory_format

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75540

Approved by: https://github.com/jerryzh168
2022-04-15 03:20:50 +00:00
Thiago Crepaldi
9bbe1d632e Fix ONNX ATen fallback for non-caffe2 engines
This PR introduces 3 BC changes:

First, this PR propagates the `BUILD_CAFFE2` flag to `libtorch` and `libtorch_python`, which is necessary for non-caffe2 ONNX runtimes when using the `ONNX_ATEN_FALLBACK` operator export type.

Second, as a complement of https://github.com/pytorch/pytorch/pull/68490, this PR refactors Caffe2's Aten ops symbolics to consider not only the `operator_export_type` (aka `ONNX_ATEN_FALLBACK`) to emit Caffe2 Aten ops, but also whether `BUILD_CAFFE2` (which is called `torch.onnx._CAFFE2_ATEN_FALLBACK` in python binding) is set.

Lastly, it renames `onnx::ATen` to `aten::ATen` for ONNX spec consistency in a BC fashion.
ONNX doesn't have an `ATen` op in its spec, but the PyTorch ONNX converter emits them. Non-Caffe2 backend engines would be misled by such an operator's name/domain. A non-ideal workaround would be to handle ATen ops based on name and ignore the (non-compliant) domain. Moreover, users could incorrectly file bugs against either ONNX or ONNX Runtime when they inspect the model and notice the presence of an unspecified ONNX operator.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73954
Approved by: https://github.com/BowenBao, https://github.com/malfet, https://github.com/garymm, https://github.com/jiafatom
2022-04-14 23:18:45 +00:00
Jerry Zhang
0c08fcff32 [quant][fx] Cleanup some unused states and args
Summary:
* Removed "patterns" from observed module since it's no longer needed
* Removed an arg from insert_observer
* Removed some unused keys in checking the validity of qconfig_dict

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
python test/test_quantization.py TestQuantizeFxModels

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75521

Approved by: https://github.com/andrewor14
2022-04-14 13:18:00 +00:00
Vasiliy Kuznetsov
63c6209d09 ns for fx: reenable tests disabled by #62608
Summary:

In https://github.com/pytorch/pytorch/pull/62608 various tests in FX NS
were disabled due to lack of dtype inference.

https://github.com/pytorch/pytorch/pull/75471 fixes some of these issues;
the issue fixed there is probably why the tests were disabled.

This PR reenables the tests and adjusts them for the new behavior in
https://github.com/pytorch/pytorch/pull/62608.

Test plan:

```
python test/test_quantization.py -k NumericSuite
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75511

Approved by: https://github.com/jerryzh168
2022-04-13 19:44:47 +00:00
Vasiliy Kuznetsov
f1f185f6f9 ns for fx: fix bug to enable again on torchvision models
Summary:

The tests were disabled by https://github.com/pytorch/pytorch/pull/61687, but
this specific behavior broke at some point while the tests were disabled.

The issue was that:
1. `torch.add` is present in these models
2. In the common codepath of comparing fp32 to int8, torch.ops.quantized.add was already filtered out because it did not have a dtype specified
3. In the less common codepath of comparing fp32 to fp32, torch.add was eligible for shadowing, but the logic was broken

This PR fixes (3) by disabling shadowing on ops which do not support it, by op type.
The support may be built later, if needed.

Test plan:

```
python test/test_quantization.py TestFXNumericSuiteCoreAPIsModels.test_resnet18
python test/test_quantization.py TestFXNumericSuiteCoreAPIsModels.test_mobilenet_v2
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75472

Approved by: https://github.com/jerryzh168
2022-04-13 19:44:46 +00:00
Vasiliy Kuznetsov
ae3210420e ns for fx: fix issue with shadowing nodes of unknown dtype
Summary:

In https://github.com/pytorch/pytorch/pull/61687, a couple of FX Numeric Suite
tests were disabled.

This PR reenables one of these tests. We update the dtype inference logic
of NS to always return a specific type instead of sometimes returning
"fp32 or int8". When the type cannot be deduced by the current logic,
we do not shadow the node.

As a better version of dtype inference becomes available in FX Graph Mode Quantization,
we could migrate this code to use it.

Future PRs in the stack will unbreak other things to enable NS for FX to
work on torchvision again.

Test plan:

```
python test/test_quantization.py -k NumericSuite
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75471

Approved by: https://github.com/jerryzh168
2022-04-13 19:44:46 +00:00
Jerry Zhang
761bb06292 [quant][fx] Use native backend_config_dict in convert
Summary:
Previously the lists of qat modules, fused modules, etc. were hardcoded in the convert code; in this PR we get this information
from backend_config_dict instead.

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
python test/test_quantization.py TestQuantizeFxModels
python test/test_quantization.py TestFXNumericSuiteCoreAPIs

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75520

Approved by: https://github.com/vkuzo
2022-04-12 17:59:24 +00:00
Jerry Zhang
f83d047338 [quant][fx] Use native backend_config_dict in prepare
Summary:
Previously we were still relying on the registration mechanism to get the default quantize handlers that are registered;
now that we have moved all registrations to backend_config_dict, we can get all quant patterns from backend_config_dict directly.

This PR enables using the native backend_config_dict everywhere in prepare when backend_config_dict is None; we'll make
similar changes in convert as well.

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
python test/test_quantization.py TestFXNumericSuiteCoreAPIs

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75469

Approved by: https://github.com/vkuzo
2022-04-12 17:05:31 +00:00
Salil Desai
ca0ef52382 [PyTorch Edge] Add Quantized Softmax Op (Naive Implementation) (Re-land)
Summary: Reland of D34943147 (8d7242a18b) + Revert of D35404312, after mitigation of S267077

Test Plan: ```buck test caffe2/test:quantization -- test_qsoftmax```

Differential Revision: D35432475

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75415
Approved by: https://github.com/kimishpatel
2022-04-11 22:39:50 +00:00
Yulv-git
ac2d2e3a3d Fix some typos.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75561
Approved by: https://github.com/albanD
2022-04-11 21:55:59 +00:00
Jerry Zhang
72d3d160fb [quant][fx] Remove additional_object_mapping from the docs (#75389)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75389

This seems to have been removed before, so we won't mark this PR as bc-breaking; this use case
is now enabled with the backend_config_dict API.

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D35451960

fbshipit-source-id: 21a8f19c1968af44bf4fa603f16ee8c6f5080e5a
(cherry picked from commit 2862f17b57f846b55736bc6b5d10df4256567adf)
2022-04-11 10:40:11 +00:00
Jerry Zhang
dd667b6e97 [quant][fx] Move all fusion registrations to backend_config_dict (#75318)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75318

This PR moves the registrations for fusion patterns to backend_config_dict.

It also fixes one issue in the numeric suite graph matcher: now that (torch.nn.ReLU, torch.nn.BatchNorm3d)
appears in quant patterns (previously only in fusion patterns), we need to make sure (torch.nn.ReLU, (torch.nn.BatchNorm3d, torch.nn.Conv3d))
matches before (torch.nn.ReLU, torch.nn.BatchNorm3d). Previously, (torch.nn.ReLU, (torch.nn.BatchNorm3d, torch.nn.Conv3d)) was not
really matched, since `end_node_matches_reversed_fusion` expects a flattened pattern like (torch.nn.ReLU, torch.nn.BatchNorm3d, torch.nn.Conv3d).
For now we manually flatten this pattern, but in the future we might want to use the matching function `is_match` under torch.ao.quantization.fx.match_utils
to do this matching.
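
A hedged sketch of the kind of manual flattening described above (illustrative helper, not the actual code):

```
def flatten_pattern(pattern):
    # e.g. (ReLU, (BatchNorm3d, Conv3d)) -> (ReLU, BatchNorm3d, Conv3d)
    flat = []
    for item in pattern:
        if isinstance(item, tuple):
            flat.extend(flatten_pattern(item))
        else:
            flat.append(item)
    return tuple(flat)
```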

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
python test/test_quantization.py TestFXNumericSuiteCoreAPIs

Imported from OSS

Reviewed By: vkuzo, andrewor14

Differential Revision: D35423788

fbshipit-source-id: a54093ccebae9c59aeee9399669ddb2c48bfb9aa
(cherry picked from commit 6a55ea8eb2740cedafb9972888fedf68e927586d)
2022-04-09 05:08:37 +00:00
Andrew Or
0bdf9a9833 [Quant][fx] Decouple prepare_*fx from training/eval modes (#75401)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75401

This commit removes asserts that require prepare_fx to
be run in eval mode and prepare_qat_fx to be run in training mode.

Test Plan:
python test/test_quantization.py TestQuantizeFx.test_prepare_mode

Imported from OSS

Reviewed By: vkuzo, jerryzh168

Differential Revision: D35457100

fbshipit-source-id: 13a55b13d9e389991f69c06c6a70bc51cdebba36
(cherry picked from commit fb0685e0873dc8e807da3213be403b51e8b4a687)
2022-04-08 15:34:08 +00:00
Jerry Zhang
9905b1f29a [quant][fx] Move rnn ops to backend_config_dict (#75316)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75316

As titled; similar to previous PRs, this one moves the dynamically quantized rnn ops
to backend_config_dict. The dtype check is not yet enabled, so we provide the dtype_configs but they are not really used yet;
we will enable the check a bit later, after everything has been moved to backend_config_dict.

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
python test/test_quantization.py TestFXNumericSuiteCoreAPIs

Imported from OSS

Reviewed By: malfet

Differential Revision: D35423792

fbshipit-source-id: ef862ea1be5bfb4c28130775c3b2158df28d3e22
(cherry picked from commit 0247f3a768a2c165f482a66c4225b3357e33e966)
2022-04-08 08:58:50 +00:00