Commit Graph

681 Commits

Author SHA1 Message Date
macandro96
f87d8c2f62 [ao][sparsity] Basic implementation of activation sparsifier (#80886)
The Activation sparsifier class aims to sparsify/prune activations in a neural
network. The idea is to attach the sparsifier to a layer (or layers) and it
zeroes out the activations based on the mask_fn (or sparsification function)
input by the user.
The mask_fn is applied once all the inputs are aggregated and reduced i.e.
mask = mask_fn(reduce_fn(aggregate_fn(activations)))

Note::
    The sparsification mask is computed on the input **before it goes through the attached layer**.

Test Plan:
```python test/test_ao_sparsity.py TestActivationSparsifier```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80886
Approved by: https://github.com/HDCharles
2022-07-22 21:43:33 +00:00
vspenubarthi
75aab6540e [ao] Update DynamicStatic Detector to account for Conv (#81972)
Summary: This updates the DynamicStatic Detector to also provide insight
into whether Conv layers should use dynamic or static quantization.
Before, this was not included because as of now, Dynamic quantization is
not supported for Conv layers, but this adds a check for Conv layers and
if dynamic is recommended, it will also give a disclaimer that it is not
currently supported but will be in the future.

Test Plan: python test/test_quantization.py TestFxModelReportDetectDynamicStatic

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81972
Approved by: https://github.com/jerryzh168
2022-07-22 21:00:29 +00:00
vspenubarthi
0cacaf070f [ao] Fix to InputWeightEqualization detector to handle Conv groups (#81971)
Summary: The current implementation of the InputWeightEqualization
detector broke when it was tested on MobileNetV2, and the reason for
this is that it wasn't able to properly handle groups in Conv layers,
and there also had to be some minor reshaping of the weights to handle
this as well.

In addition, the output was correspondingly tuned so that instead of
giving on output for each channel on each layer, it gives a single
suggestion per module and just lets it know how many of the channels
could benefit from input-weight equalization, and suggests it if it's
more than half.

There was also the realization that the test class didn't do a good job
of testing different dimensions for the batch vs. height vs. width, so
this was updated to be more comprehensive as well.

Test Plan: python test/test_quantization TestFxDetectInputWeightEqualization

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81971
Approved by: https://github.com/jerryzh168
2022-07-22 20:56:15 +00:00
macandro96
e66986421d [ao][sparsity] Training-aware data sparsity callback for lightning (#80371)
This callback aims to sparsify the model inside lightning module after training.
**Note that the model is copied and then sparsified, so the existing model is not modified**

The sparsified model can be used for comparison and can be accessed using
<callback_obj>.sparsified

Test Plan:
```python torch/ao/sparsity/_experimental/data_sparsifier/lightning/tests/test_callbacks.py TestTrainingAwareCallback```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80371
Approved by: https://github.com/z-a-f
2022-07-21 16:41:43 +00:00
macandro96
eecf34fbe7 [ao][sparsity] Post training data sparsifier callback for lightning (#80370)
Lightning callback that enables post-training sparsity.

This callback aims to sparsify the model inside lightning module after training.
**Note that the model is copied and then sparsified, so the existing model is not modified**

The sparsified model can be used for comparison and can be accessed using <callback_obj>.sparsified

Test Plan
```python torch/ao/sparsity/_experimental/data_sparsifier/lightning/tests/test_callbacks.py TestPostTrainingCallback```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80370
Approved by: https://github.com/z-a-f
2022-07-21 16:39:13 +00:00
Weiwen Xia
2edd6aaeaa Add prelu op and module for quantized CPU backend (#73491)
Add prelu op and module for quantized CPU backend.
The PR includes:
- Quantized version of prelu op
- Native prelu kernel for quantized CPU
- Prelu modules in `nn` and `nn.quantized`
- FX support for prelu
- Unit tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73491
Approved by: https://github.com/jerryzh168
2022-07-20 07:48:15 +00:00
vspenubarthi
589e8a1da5 [ao] Get feature and module names from ModelReportVisualizer class (#81647)
Summary: Added the functionality to be able to get the feature names and
module_fqns from the ModelReportVisualizer class. The purpose of this
addition is so that users can see the exact set of module_fqns or
feature names that they can filter based on, and use this information to
perform their filtering.

Test Plan: python test/test_quantization.py
TestFxModelReportVisualizer.test_get_modules_and_features

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81647
Approved by: https://github.com/andrewor14
2022-07-20 03:03:03 +00:00
vspenubarthi
1d3935a77d [ao] Add method in ModelReport to generate visualizer (#81589)
Summary: We created a ModelReportVisualizer class, and the primary
way it is envisioned that it is accessed is:

```
model_report_visualizer = model_reporter.generate_visualizer()
```

This method only works after reports have been generated and it takes in
the generated reports and reformats them to be ordered by module, into
the format required by the ModelReportVisualization. It then generates
the visualizer instance and returns that to the user.

Test Plan: python test/test_quantization.py TestFxModelReportClass.test_generate_visualizer

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81589
Approved by: https://github.com/andrewor14
2022-07-20 02:58:52 +00:00
vspenubarthi
d0ce1fbbe2 [ao] Created Skeleton for ModelReportVisualizer class (#81523)
Summary: This introduces the skeleton for the ModelReportVisualizer
class. This class helps visualize the information generated by the
ModelReport class `generate_report()` output. This class aims to provide
visualizations in a table, plot (line graph) and histogram view.

This also introduces an empty test class for testing visualizations. As
implementations start occuring for this class, tests will also be
approrpriately added.

This includes the high level descriptions for each of the methods as
well. Expected use cases will be added to the class description in a
future commit as that gets finalized.

Test Plan: python test/test_quantization.py TestFxModelReportVisualizer

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81523
Approved by: https://github.com/andrewor14
2022-07-20 02:39:14 +00:00
vspenubarthi
8a6d1289d8 [ao] Revised ModelReport API to take in model at initialization (#81588)
Summary: Currently, the ModelReport API only takes in detectors at the
beginning and for each of its methods, you have to pass in the model
each time, which doesn't really make sense because:

1. you will always want to be working on the same model
2. passing in a different model could break things, so more
fault-tolerant if we keep the model internally and make calls on it

Therefore, now the model will be passed in in intialization, and will
just be used for the rest of the operations with the local link.

All the ModelReport tests have been adjusted to account for this, and
this change must pass all the tests to ensure a successful API
transition.

If you wish to see how the updated API looks, the Expected Usage in the
ModelReport clas description has been updated to reflect the changes.

The README has also been updated with these changes as well.

Test Plan: python test/test_quantization.py TestFxModelReportClass

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81588
Approved by: https://github.com/jerryzh168
2022-07-19 16:11:46 +00:00
vspenubarthi
e907a8d966 [ao] Updated dict keys of detectors to have consistent naming scheme (#81587)
Summary: Currently, all the detectors have pretty accurate naming
schemes that give an idea of what they do. However, since now there are
more and more detectors being developed, there is a need to make sure
that the naming scheme for detectors are consistent for their keys.

This updates the keys of the returned dictionary keys to better
highlight if something is an activation stat or weight stat, etc.

Test Plan:

python test/test_quantization.py TestFxModelReportDetector

python test/test_quantization.py TestFxModelReportObserver

python test/test_quantization.py TestFxModelReportDetectDynamicStatic

python test/test_quantization.py TestFxModelReportClass

python test/test_quantization.py TestFxDetectInputWeightEqualization

python test/test_quantization.py TestFxDetectOutliers

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81587
Approved by: https://github.com/jerryzh168
2022-07-19 08:50:30 +00:00
asl3
368018530e [quant] Implement forward and backward autograd functions for fake quantize (#81438)
### Summary:
This PR implements custom autograd functions for forward and backward to be used in APoT fake quantization. The implementation follows this doc about custom autograd functions: https://pytorch.org/tutorials/beginner/examples_autograd/polynomial_custom_function.html

### Test Plan:
Run tests with: `python test/quantization/core/experimental/test_fake_quantize.py`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81438
Approved by: https://github.com/jerryzh168
2022-07-19 02:15:30 +00:00
vspenubarthi
8a3f88b5e0 [ao] Standardized InputWeightEqualizationDetector output to single level (#81586)
Summary: Currently the InputWeightEqualizationDetector has a
multi-layered output.

Example
```
{'block1.linear': {'channel_axis_selected': 1,
                   'channel_comparison_metrics': tensor([0.8736, 0.6594, 0.2916], grad_fn=<DivBackward0>),
                   'input_range_info': {'global_max': tensor(9.),
                                        'global_min': tensor(-10.),
                                        'per_channel_max': tensor([9., 9., 9.]),
                                        'per_channel_min': tensor([-10., -10., -10.])},
                   'input_weight_equalization_recommended': [True,
                                                             False,
                                                             False],
                   'threshold': 0.8,
                   'weight_range_info': {'global_max': tensor(0.5618, grad_fn=<UnbindBackward0>),
                                         'global_min': tensor(-0.2211, grad_fn=<UnbindBackward0>),
                                         'per_channel_max': tensor([0.3764, 0.5618, 0.2894], grad_fn=<NotImplemented>),
                                         'per_channel_min': tensor([-0.2211,  0.2213,  0.2228], grad_fn=<NotImplemented>)}},
}
```

With all the levels, it can be hard to parse the information for
anything, especially the planned visualization feature where the data
has to be reorganized. Therefore, to make it standardized across all
detectors, all outputs will be limited to one level.

The new format is:
```
{'block1.linear': { 'channel_axis_selected': 1,
                    'channel_comparison_metrics': tensor([0.5705, 0.9457, 0.8891], grad_fn=<DivBackward0>),
                    'activation_global_max': tensor(9.),
                    'activation_global_min': tensor(-10.),
                    'activation_per_channel_max': tensor([9., 9., 9.]),
                    'activation_per_channel_min': tensor([-10., -10., -10.]),
                    'input_weight_equalization_recommended': [False, True, True],
                    'threshold': 0.8,
                    'weight_global_max': tensor(0.4258, grad_fn=<UnbindBackward0>),
                    'weight_global_min': tensor(-0.4958, grad_fn=<UnbindBackward0>),
                    'weight_per_channel_max': tensor([0.1482, 0.3285, 0.4258], grad_fn=<NotImplemented>),
                    'weight_per_channel_min': tensor([-0.1517, -0.4958, -0.3027], grad_fn=<NotImplemented>)},
}
```

The README will also be updated to reflect this change.

Test Plan: python test/test_quantization.py TestFxDetectInputWeightEqualization

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81586
Approved by: https://github.com/jerryzh168
2022-07-19 01:00:40 +00:00
vspenubarthi
2ddb722bc6 [ao] Standardize PerChannelDetector Output to be single level (#81585)
Summary: Currently, the PerChannelDetector has a multi-layered output.

Example:
```
{'backend': 'qnnpack',
 'per_channel_status': {'block1.linear': {'per_channel_supported': True,
                                          'per_channel_used': False},
                        'block2.linear': {'per_channel_supported': True,
                                          'per_channel_used': False}}}
```

The issue with this is that when it comes to future features such as
visualizations where we need to go through this dictionary, it can be
hard because of the variable number of layers.

This changes the output format of the PerChannelDetector to have a
standard format.

Ex.)
```
{'block1.linear': {'backend': 'qnnpack',
                   'per_channel_supported': True,
                   'per_channel_used': False},
 'block2.linear': {'backend': 'qnnpack',
                   'per_channel_supported': True,
                   'per_channel_used': False}}
```

Test Plan: python test/test_quantization.py TestFxModelReportDetector

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81585
Approved by: https://github.com/HDCharles
2022-07-18 22:16:08 +00:00
vspenubarthi
845792db3c [ao] Fix for extra lines after return in Outlier Detector (#81499)
Summary: There were accidently two lines added after a return statement
in the OutlierDetecor insertion that was not caught by either the linter
nor the tests nor i, that were harmless, but some odd merge issue. This
removes those two lines.

Test Plan: python test/test_quantization.py TestFxDetectOutliers

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81499
Approved by: https://github.com/kit1980
2022-07-15 00:10:59 +00:00
vspenubarthi
0f3c8c939f [ao] Added README for ModelReport functionality (#81369)
Summary: This adds a README for the ModelReport functionality that
contains an overview of the class, what it does,
and how it works, an example of usage, information on how to implement a
new detector (since this is how core functionality is added), folder
structure information, and finally information on tests and where they
are located.

The ModelReport class is still in development and will, in the future,
get additional features such as visualizations, and the README will be
updated with this information as it is added.

Test Plan: Just a new README, no code is added, README will be reviewed
for accuracy and ease of use/ easiness to read.

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81369
Approved by: https://github.com/jerryzh168
2022-07-14 19:17:52 +00:00
vspenubarthi
8f743d7a70 [ao] Updated detector observer insert args to be vars not strings (#81382)
Summary: Before for the detectors, the
determine_observer_insert_points() function for all of them would have
hard coded strings as the keys for the dictionary that would be returned
to the ModelReport instance, and those same hard-coded keys would be
used to actually extract information from them. Since all detectors used
the same string keys, these were just made default variables at the top
of the detector.py file, and all detectors just used those. The same
ones are imported and now used in ModelReport file as well. This way,
there is less of a chance of an error because of incorrectly typed
strings.

The test plan primarily tests the ModelReport class because this uses
the same new vars as well for the strings and is the primary one calling
each of the detector instances' determine_observer_insert_points()

Test Plan: python test/test_quantization.py TestFxModelReportClass

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81382
Approved by: https://github.com/jerryzh168
2022-07-14 19:17:28 +00:00
Jerry Zhang
446edadd95 [quant][fx] Follow up fixes for qconfig validations for fixedqparams ops (#81010)
Summary:
This adds a few things on top of https://github.com/pytorch/pytorch/pull/80184,
1). node.target was assumed to be "tanh", torch.nn.Tanh etc. this PR handles that properly
2). adds FixedQParamsFakeQuantize support
3). extends the comparison function _partial_wrapper_equals to work with FakeQuantize.with_args(observer=...)

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Reviewers:

Subscribers:

Tasks:

Tags:

Differential Revision: [D37735193](https://our.internmc.facebook.com/intern/diff/D37735193)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81010
Approved by: https://github.com/andrewor14
2022-07-14 18:06:23 +00:00
Andrew Or
c657c3d3ab [Quant][fx] Rename convert_to_reference to convert_to_reference_fx (#81326)
Summary: This commit renames the convert_to_reference function to
convert_to_reference_fx, which is more descriptive and matches
prepare_fx and prepare_qat_fx better.

Test Plan:
python test/test_quantization.py TestQuantizeFx

Reviewers: jerryzh168

Subscribers: jerryh168

Differential Revision: [D37787876](https://our.internmc.facebook.com/intern/diff/D37787876)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81326
Approved by: https://github.com/jerryzh168
2022-07-13 22:18:46 +00:00
vspenubarthi
a25df29cc4 [ao] Updated ModelReport function calls to show not dependent on Fx GraphMode (#81252)
Summary: Before, all the function calls for the ModelReport object were
dependent on the Fx Graph Mode workflow. However, in reality, this was
not true and the only requirement that was needed was for the model to
be a traceable GraphModule. This also helped keep the ModelReport class
as detached from the Fx Workflow as possible so that it can be used as a
more all purpose tool in the future.

This updated all the references to make sure that it wasn't specifically
referencing that a Fx Graph Mode workflow is needed, and is instead more
general since all we really need is a traceable model.

Test Plan: python test/test_quantization.py TestFxModelReportClass

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81252
Approved by: https://github.com/jerryzh168
2022-07-13 20:24:37 +00:00
vspenubarthi
5eec908700 [ao] Update ModelReport class with class usage in description. (#81251)
Summary: This adds a example usage description to the ModelReport class
so that people can see how it can be used right in the class
documentation without having to consult external sources. The example
usage depicts how it can be used using the QuantizationTracer, which was
a decision taken to illustrate how there is no strict requirement on
using this tool with only Fx Graph Mode workflow.

Test Plan: python test/test_quantization.py TestFxModelReportClass

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81251
Approved by: https://github.com/jerryzh168
2022-07-13 20:21:37 +00:00
vspenubarthi
6366c99e5b [ao] Added Collab link for Outlier Detector ratio val choice (#81250)
Summary: A huge part of the work for the Outlier detector was figuring
out what a good nth percentile to compare against the 100th percentile
was while also figuring out what a good comparision ratio would be. This
commit adds the link to a collab to the documentation of the function so
that people can go and see what the calculations used to determine those
values are and realize that they are not just randomly thrown in there.

At a high level, this collab contains work that includes:
- Figuring out whether to use interpolation or lower as the rule for
finding quantile between two indices
- Figuring out what a good value for reference_percentile is
- Figuring out what a good value for ratio_threshold is

Test Plan: python test/test_quantization.py TestFxDetectOutliers

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81250
Approved by: https://github.com/jerryzh168
2022-07-13 20:19:24 +00:00
vspenubarthi
9c298fff2e [ao] Added constant channel check to Outlier Detector (#81249)
Summary: The current Outlier detector does a good job of finding whether
data distributions passing through layers have outliers. However,
suppose we have a completely constant channel. The outlier detector
would not detect it as an outlier, but that is still something we want
to highlight because a constant channel usually is a result of a bad
configuration or something really wrong with the data.

To address this there are two additions to the outlier detector that
this commit makes:
- The first is to add whether there are any constant batches at all and
let the user know in the text report
- The second is to let the user know the number of total constant
batches found for each channel, so they can figure out if there are any
unnecessary channels present.

The exisiting outlier detector tests were modified to do a quick check
for this feature.

Test Plan: python test/test_quantization.py TestFxDetectOutliers

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81249
Approved by: https://github.com/andrewor14
2022-07-13 20:16:33 +00:00
vspenubarthi
229762dcd9 [ao] Added statistical threshold arg in Outlier Detector (#81174)
Summary: The outlier detector has a feature where it's able to notify
the user if below the whole set of batches that passed through were used
in Outlier calculation, which mainly happens as a result of 0-errors.
This changes the code so that instead of comparing against a value like
30 as we were before, we now let the user pass in an optional fractional
value and if the ratio of the batches used was below that value, the
detector alerts the user.

Test Plan: python test/test_quantization.py TestFxDetectOutliers

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81174
Approved by: https://github.com/andrewor14
2022-07-13 20:13:46 +00:00
vspenubarthi
893d763276 [ao] Implemented Outlier Detection Report Generation (#80937)
Summary: This adds the implementation for the report generation for the
Outlier Detector class. This includes both the generation of a
dictionary containing each module that had an observer attached and any
relavent stats collected by the observer that can help shed light on
outlier relavent data or computed metrics. It also includes a string
denoting specific modules that had outliers and gives a bit of insight
into what channels they are contained in.

This contains both the implementation for the report generation for the
outlier detector as well as a test class to test the report generation
functionality.

Test Plan: python test/test_quantization.py TestFxDetectOutliers

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80937
Approved by: https://github.com/andrewor14
2022-07-12 19:56:33 +00:00
zaf
55d1b376ea [ao][sparsity] Vectorized WeightNormSparsifier (#80059)
The previous implementation was using loops to compute the sparsity within a block in a mask, as well as across the mask blocks. This implements the vectorized version.

## Vectorization:

A high level overview of the vectorization procedure falls into a two step process:

### Tensor-level masking

A tensor-level masking is a mask generation routine that has a granularity of `sparse_block_shape`. That means that only patches of that shape can be considered sparse/dense. To vectorize:

1. Reshape the data such that one of the dimensions represents the patches of sparse_block_shape.
2. Create a mask of the same shape as the reshaped data
3. Find the smallest `k` elements in the the data, given the dimension of the sparse "patches". `k` represents a derived paramter specifying the sparsity level.
4. Apply the 0/1 to the patches in the mask
5. Reshape the mask back to the original dimensions

Note: because the shape of the mask might not be multiple of the sparse_block_shape, we nudge the sshape of the mask, and truncate it afterwards.

## Block-level masking

A block-level masking is a mask generation routine that concerns itself only with sparsity within a patch of shape `sparse_block_shape`. This is useful when block sparsity allows partial block sparsification.

To vectorize:

Overall the block-level masking follows the same routine as the tensor-level algorithm described above. One distinction is that when reshaping the data/mask tensors we aim for creating a dimension that captures the internals of each patch. For example, if a `sparse_block_shape` is `(2, 2)`, we want to reshape the data/mask into `(2, 2, -1)`. That allows us to sort the internal elements on the last axis, and zero-out the ones that obey the sparse logic.

Differential Revision: [D37352494](https://our.internmc.facebook.com/intern/diff/D37352494/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D37352494/)!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80059
Approved by: https://github.com/jerryzh168
2022-07-12 19:16:44 +00:00
PyTorch MergeBot
caee732aa1 Revert "[quant][fx] Support keyword arguments for functional linear (#79095)"
This reverts commit d71fb40d98.

Reverted https://github.com/pytorch/pytorch/pull/79095 on behalf of https://github.com/jerryzh168 due to broken master
2022-07-09 21:45:01 +00:00
Jerry Zhang
d71fb40d98 [quant][fx] Support keyword arguments for functional linear (#79095)
Summary:
Fixes: https://github.com/pytorch/pytorch/issues/78117
Fixes: https://github.com/pytorch/pytorch/issues/73463

This PR adds a normalization pass that normalizes all the args to keyword args in positional order and fixes lowering code that previously
only uses node.args to use both args and kwargs instead.

Also tried to add a test for F.conv2d, but since conv2d matches multiple schemas we are doing an extra schema match, and because we are using symbolic values
in `transform`, we don't have a schema match, so F.conv2d still fails with runtime errors. we can resolve this issue later when there is a need.

Another thing I'm considering is to do the normalization with real inputs instead of symbolic inputs and not rely on operator_schemas (which is based on torchscript),
and rely on inspect.signature, I tried this briefly but didn't get too far, it looks like we cannot get the python signature for `torch._C._nn.linear`, it might be possible to fix as well, but will need follow up discussions.

The goal for this PR is just to introduce normalization in our codebase so that we can adapt some downstream code to this, and also fix the F.linear issue.

Test Plan:
python test/test_quantization.py TestQuantizeFx.test_normalize_args

Reviewers:

Subscribers:

Tasks:

Tags:

Differential Revision: [D37163228](https://our.internmc.facebook.com/intern/diff/D37163228)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79095
Approved by: https://github.com/andrewor14
2022-07-09 20:01:09 +00:00
Zafar
68ec793cfd [ao] Moving the sparsity/experimental to sparsity/_experimental (#81149)
The experimental code in the sparsity does not have user-facing api,
and should reside under the proivate package. This involves pruner and
base_sparsifier.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/81149
Approved by: https://github.com/macandro96
2022-07-09 03:00:11 +00:00
Andrew Or
8fab682e47 [Quant][fx][bc-breaking] Do not move models to CPU in convert (#80555)
Summary: Previously, we automatically moved the model to CPU in
torch.ao.quantization.fx.convert to work around the issue where
certain functions called by convert expect CPU arguments. This
commit pushes this responsibility to the caller since it is the
user's decision of which device to use.

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

BC-breaking Notes:

Before:
```
model = resnet18(...)
model = prepare_fx(model, qconfig_mapping, example_inputs)
... # calibrate
model = convert_fx(model)
```
After:
```
model = resnet18(...)
model.cpu()
model = prepare_fx(model, qconfig_mapping, example_inputs)
... # calibrate
model = convert_fx(model)
```

Reviewers: jerryzh168

Subscribers: jerryzh168

Differential Revision: [D37528830](https://our.internmc.facebook.com/intern/diff/D37528830)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80555
Approved by: https://github.com/jerryzh168
2022-07-08 19:23:57 +00:00
vspenubarthi
6a7ed56d79 [ao] Added OutlierDetector observer insert implementation (#80880)
Summary: This adds the implementation for observer insertion point
selection for the OutlierDetector. For this detector, the insertion
points are to insert a ModelReportObserver before any leaf level module
to study the distribution of data that passes into the module to detect
outliers.

This commit contains the implementation of the observer insertion as
well as the relavent test case. Some code from the
InputWeightEqualization was abstracted and made more modular so the same
helper function could be used for multiple outlier class tests.

As a part of the work for this, there was testing done to determine what
a good default ratio threshold and reference percentile would be, and
the work to determine this (based on a normal distribution) was then
analyzed to find good paramters.

We still want to keep thresholds and reference percentile as something
the user can input because these were based on a normal distribution,
and it can definately vary depending on the type of data a user has.

Test Plan: python test/test_quantization.py TestFxDetectOutliers

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80880
Approved by: https://github.com/andrewor14
2022-07-08 15:36:20 +00:00
Salil Desai
5c12cd224f [PyTorch Edge] Add qnnpack bcsr matrix unpacking and use unpacking in Linear module (#80475)
Having unpacking removes the need to store the original dense weights in the python Linear module

Differential Revision: [D34699287](https://our.internmc.facebook.com/intern/diff/D34699287/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D34699287/)!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80475
Approved by: https://github.com/qihqi
2022-07-07 15:32:21 +00:00
Salil Desai
eaf817df3a [PyTorch Edge] Add serialization/deserialization of Sparse Quantize Linear Packed Params (#80474)
Packed Params are serialized/deserialized in sparse form

Differential Revision: [D34392761](https://our.internmc.facebook.com/intern/diff/D34392761/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D34392761/)!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80474
Approved by: https://github.com/qihqi
2022-07-07 15:30:02 +00:00
Salil Desai
523b081a64 [PyTorch Edge] Remove Original Weight Tensor from QNNPack Sparse Quantized Linear Packed Params (#80473)
We plan to add serialization/deserialization wihout the original weight tensor, so we no longer need to store it

Differential Revision: [D34617321](https://our.internmc.facebook.com/intern/diff/D34617321/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80473
Approved by: https://github.com/qihqi
2022-07-07 15:11:44 +00:00
macandro96
daf00e843a [ao][sparsity] Bug Fix: data norm sparsifier not working with 1D tensors/parameters (#80465)
Issue:
Previously, the L1/L2 norm data sparsifier was not supported with
1D tensors or parameters.

Fix:
If the tensor is 1D, then unsqueeze it to make it look 2D and
perform the rest as usual. Also, added some 1D tensor in the
unit test to test this issue.

Test Plan:
```python test/test_ao_sparsity.py TestNormDataSparsifiers```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80465
Approved by: https://github.com/z-a-f
2022-07-06 21:04:19 +00:00
macandro96
ec594dd305 [ao][sparsity] Bug fix: data not correctly attached to the sparsifier (#80394)
Issue:
Previously, the data was not "attached" to the data sparsifier. Meaning
the data sparsifier created a copy of the actual data inside it's container. So,
when the data was modified outside of the sparsifier, the changes was not reflected
in the sparsifier.

Fix:
Use register_buffer() instead of nn.Parameter(..) to store the data inside the container.
Also, added a unit-test to reference this issue.

Test Plan:
```python test/test_ao_sparsity.py TestBaseDataSparsifier```
```python test/test_ao_sparsity.py TestNormDataSparsifiers```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80394
Approved by: https://github.com/z-a-f
2022-07-06 20:57:32 +00:00
Vasiliy Kuznetsov
ce0786add2 fx quant: fix warning in util function when cloning tensors (#80883)
Summary:

Some of the util functions in FX graph mode quantization throw warnings
such as:

```
/Users/vasiliy/pytorch/torch/ao/quantization/fx/utils.py:410: UserWarning: To copy construct from
a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().
requires_grad_(True), rather than torch.tensor(sourceTensor).
```

This PR fixes the warnings by moving the code to the recommended syntax if the
value is a tensor.

Test plan:

```
python test/test_quantization.py -k test_conv_linear_reference
// warning appeared before this PR and disappeared after this PR
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80883
Approved by: https://github.com/jerryzh168
2022-07-06 12:44:10 +00:00
Jiaxu Zhu
280f4704b7 [torch.fx] Check node type before fetching .users (#80166)
Summary:
as title
currently it fails when `node` is actually a constant instead of `fx.Node`

Test Plan: existing unit tests

Differential Revision: D37389003

Pull Request resolved: https://github.com/pytorch/pytorch/pull/80166
Approved by: https://github.com/jerryzh168
2022-07-05 23:32:22 +00:00
asl3
5b493ba18b [quant] Refactor quantize clamping into float_to_apot util function (#80885)
### Summary:
This PR moves the clamping functionality from `quantize` to `float_to_apot` util function to align with the uniform quantize workflow in the codebase.

### Test Plan:
Run unit tests with:
python pytorch/test/quantization/core/experimental/test_quantizer.py
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80885
Approved by: https://github.com/dzdang
2022-07-05 19:28:37 +00:00
vspenubarthi
e5162dcfa7 [ao] Added framework for ModelReport Outlier Detector (#80743)
Summary: This adds the class framework for the ModelReport
OutlierDetector. This detector will be in charge of looking at
activation data and figuring out whether there are significant oultiers
present in them. It will average this data across batches to make a
recommendation / warning if significant outliers are found.

This commit contains just the class framework and a base test class.
Implementations will follow in following commits.

Test Plan: python test/test_quantization.py TestFxDetectOutliers

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80743
Approved by: https://github.com/HDCharles
2022-07-01 01:03:31 +00:00
PyTorch MergeBot
b64096a264 Revert "Add prelu op and module for quantized CPU backend (#73491)"
This reverts commit 3a6d6bc3cc.

Reverted https://github.com/pytorch/pytorch/pull/73491 on behalf of https://github.com/malfet due to Broke Windows builds, see 3a6d6bc3cc
2022-06-30 12:54:39 +00:00
Weiwen Xia
3a6d6bc3cc Add prelu op and module for quantized CPU backend (#73491)
Add prelu op and module for quantized CPU backend.
The PR includes:
- Quantized version of prelu op
- Native prelu kernel for quantized CPU
- Prelu modules in `nn` and `nn.quantized`
- FX support for prelu
- Unit tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73491
Approved by: https://github.com/jerryzh168
2022-06-30 06:50:22 +00:00
Jerry Zhang
1a7e560ade [quant] Refactor quantization tracer to a separate file (#80268)
Summary:
att, since we need to reuse the tracer in some other places

Test Plan:
python test/test_quantization.py TestQuantizeFx

Reviewers:

Subscribers:

Tasks:

Tags:

Differential Revision: [D37435748](https://our.internmc.facebook.com/intern/diff/D37435748)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80268
Approved by: https://github.com/vkuzo
2022-06-30 00:49:57 +00:00
HDCharles
fa6b6842e1 [ao][sparsity] removing leading '.' from fqn in utils (#79774)
Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom):
* __->__ #79774
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79774
Approved by: https://github.com/z-a-f
2022-06-30 00:00:56 +00:00
HDCharles
3f1dc7ec00 [quant] Create default custom modules for LSTM and MHA (#79960)
Summary:
Currently we expect the users to provide custom modules for LSTM and MHA. However, as we almost always ask the users to use those modules in the custom context, it is better to make this behavior default. In this case we try to align with the base quantization API, if the user specifies a custom_config_dict then that is used, however if the value is left as None then the default is used. If a user would like to both use the default and modify it, they have to do so manually, however the default is accessible by get_default_custom_config_dict

Additionally, the NS which uses prepare to insert custom observers for
its purposes had to be slightly modified to pass in an empty
custom_config_dict in order to avoid modifying the custom modules.

due to weird CI issues with previous PR,
previous discussion can be found: https://github.com/pytorch/pytorch/pull/71192

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79960
Approved by: https://github.com/z-a-f
2022-06-30 00:00:46 +00:00
Andrew Or
c44317704a [Quant][fx] Add default configs for fixed qparams ops (#80184)
Summary: This commit adds qconfigs with special observers for fixed
qparams ops in get_default_qconfig_mapping and
get_default_qat_qconfig_mapping. For correctness, we also require
users to use these special observers if we detect these fixed
qparams ops in prepare.

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Reviewers: jerryzh168, vkuzo

Subscribers: jerryzh168, vkuzo

Differential Revision: [D37396379](https://our.internmc.facebook.com/intern/diff/D37396379)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80184
Approved by: https://github.com/jerryzh168
2022-06-29 23:07:26 +00:00
Andrew Or
17104d3d7f [Quant][fx][bc-breaking] Replace is_reference with convert_to_reference (#80091)
Summary: This PR removes the is_reference flag from the  existing
convert_fx API and replaces it with a new convert_to_reference
function. This separates (1) converting the prepared model to a
reference model from (2) lowering the reference model to a quantized
model, enabling users to call their custom lowering function for
custom backends. For the native fbgemm backend, for example, the
following are equivalent:

```
from torch.ao.quantization.quantize_fx import prepare_fx, convert_fx

prepared = prepare_fx(model, ...)
quantized = convert_fx(prepared, ...)
```

```
from torch.ao.quantization.fx import lower_to_fbgemm
from torch.ao.quantization.quantize_fx import (
    prepare_fx,
    convert_to_reference
)

prepared = prepare_fx(model, ...)
reference = convert_to_reference(prepared, ...)
quantized = lower_to_fbgemm(reference, ...)
```

Note that currently `lower_to_fbgemm` takes in two other arguments
that are difficult for users to provide. A future commit will remove
these arguments to make the helper function more user friendly.

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Reviewers: jerryzh168, vkuzo

Subscribers: jerryzh168, vkuzo

Differential Revision: [D37359946](https://our.internmc.facebook.com/intern/diff/D37359946)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80091
Approved by: https://github.com/jerryzh168
2022-06-29 23:01:27 +00:00
asl3
5070f5d18f [quant] Implement APoT fake quantization (#79845)
### Summary:
This PR implements APoT fake quantization for the purpose of quantization aware training. This implements `calculate_qparams` and `forward `methods to be used in fake quantization.

### Test Plan:
Run unit tests with: `python pytorch/test/quantization/core/experimental/test_fake_quantize.py`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79845
Approved by: https://github.com/dzdang
2022-06-28 18:15:26 +00:00
zaf
cb5ef130b6 [ao][sparsity] Fixing failing internal pruner tests (#80111)
After a recent change in the base_sparsifier API, the internal pruner started failing. This adopts the testcases to the change:

1. Changed `module_groups` to `groups`
2. Changed the fusion logic from taking care of the whole fused module to handling the submodules individually.

Differential Revision: [D37364801](https://our.internmc.facebook.com/intern/diff/D37364801/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D37364801/)!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80111
Approved by: https://github.com/macandro96
2022-06-28 04:38:58 +00:00
Andrew Or
8aedd8fb25 [Quant][fx] Hide equalization_config from prepare APIs (#80164)
Summary: This PR hides the equalization_config argument from
prepare_fx. This is a private API that we do not wish to expose
to users and have to maintain backward compatibility for.

Test Plan:
python test/test_quantization.py TestEqualizeFx

Reviewers: jerryzh168

Subscribers: jerryzh168

Differential Revision: [D37394353](https://our.internmc.facebook.com/intern/diff/D37394353)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80164
Approved by: https://github.com/jerryzh168
2022-06-28 04:20:34 +00:00
vspenubarthi
d4f8a6d05b [ao] Implemented report generation for InputWeightEqualization Detector (#80191)
Summary: This adds the implementation for the InputWeightEqualization
detector. This includes both the implementation and the relavent test
cases. This detector is meant to be added to initialize a ModelReport
instance and it will keep track of the necessary statistics to decide if
for certain layers of interest (linear and conv for now), it makes sense
to use input weight equalization and gives the suggestion to the user.

This includes the implementation and subsequent tests for the report
generation functionality of the detector. The full detector should now
be fleshed out and complete with this addition. This included
modifications to the ModelReportObserver class as well to capture min
and max per channel values. In addition, instead of passing in the
observer class to instantiate, the detectors now pass the ModelReport
instance the observer instance that they themselves instantiate.

Test Plan: python test/test_quantization.py TestFxDetectInputWeightEqualization

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80191
Approved by: https://github.com/HDCharles, https://github.com/andrewor14
2022-06-28 01:30:20 +00:00
asl3
2727d88569 [quant] Modify APoT global methods to align with uniform API (#80364)
### Summary:
This PR updates the APoT global API method signatures and parameters for `dequantize_APoT` and `calculate_qparams` to align with their uniform counterparts in the codebase.

### Test Plan:
Run unit tests with:
`python pytorch/test/quantization/core/experimental/test_nonuniform_observer.py`
`python pytorch/test/quantization/core/experimental/test_quantizer.py`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80364
Approved by: https://github.com/jerryzh168
2022-06-27 22:48:09 +00:00
vspenubarthi
4f94c8e039 [ao] Implemented InputWeightEqualization Detector observer insertion (#79962)
Summary: This adds the implementation for the InputWeightEqualization
detector. This includes both the implementation and the relavent test
cases. This detector is meant to be added to initialize a ModelReport
instance and it will keep track of the necessary statistics to decide if
for certain layers of interest (linear and conv for now), it makes sense
to use input weight equalization and gives the suggestion to the user.

This implements the functionality of adding observer points for the
input-weight equalization detector and contains the relavent tests for
this functionality. The full Detector functionality will be fleshed out
in a later commit.

Test Plan: python test/test_quantization.py TestFxDetectInputWeightEqualization

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79962
Approved by: https://github.com/HDCharles, https://github.com/andrewor14
2022-06-27 22:45:42 +00:00
anjali411
f68f77610a Add __all__ to torch.nn.quantized, fx.passes, ao.nn and amp submodules (#80376)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80376
Approved by: https://github.com/albanD
2022-06-27 21:36:27 +00:00
anjali411
3bcc19b29a Add __all__ to various submodules in torch.fx, distributions, distributed, package (#80367)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80367
Approved by: https://github.com/albanD
2022-06-27 21:27:30 +00:00
asl3
777c12f2df [quant] Modify APoT nonuniform quantization workflow (#80075)
### Summary:
This PR updates the design of APoT Observer, Quantizer, and Tensor to be more consistent with their uniform counterparts in the PyTorch framework. APoT Observer now calculates alpha as the max between the absolute values of the max and min values in the input tensor. APoT Quantizer is modified so its instance methods quantize_APoT and dequantize_APoT are called by their global method counterparts. APoT Tensor is modified to account for the new method definition of the `quantize_APoT` from APoT Quantizer.

### Test Plan:
Run APoT Observer class unit tests with: `python pytorch/test/quantization/core/experimental/test_nonuniform_observer.py`
Run APoT Quantize class unit tests with: `python pytorch/test/quantization/core/experimental/test_quantizer.py`
Run APoT Tensor class unit tests with: `python pytorch/test/quantization/core/experimental/test_quantized_tensor.py`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80075
Approved by: https://github.com/jerryzh168
2022-06-27 14:54:06 +00:00
asl3
0eee81aaad [quant] Modify APoT qparam quantization levels calculation (#80303)
### Summary:
This PR updates an error in the the computation for APoT quantization levels to match the formula defined in the APoT paper: https://arxiv.org/pdf/1909.13144.pdf.

### Test Plan:
Run unit tests with:` python pytorch/test/quantization/core/experimental/test_nonuniform_observer.py`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80303
Approved by: https://github.com/dzdang
2022-06-27 13:34:05 +00:00
macandro96
524d181267 [ao][sparsity] Implemented state_dict() and load_state_dict() functions (#79883)
The state_dict() of the DataScheduler contains all the class attributes of
the scheduler other than the data_sparsifier.
Remember to store and restore the state_dict() of the data_sparsifier along
with the scheduler

Test Plan:
```python test/test_ao_sparsity.py TestBaseDataScheduler```

Differential Revision: [D37358607](https://our.internmc.facebook.com/intern/diff/D37358607)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79883
Approved by: https://github.com/HDCharles, https://github.com/z-a-f
2022-06-24 16:55:06 +00:00
macandro96
af4e2b2c42 [ao][sparsity] Implemented the step() function (#79822)
The step() function calls the implemented get_schedule_param() and updates
the data_group dictionary of the data sparsifier.

Test Plan:
```python test/test_ao_sparsity.py TestBaseDataScheduler```

Differential Revision: [D37358609](https://our.internmc.facebook.com/intern/diff/D37358609)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79822
Approved by: https://github.com/z-a-f
2022-06-24 16:53:33 +00:00
macandro96
70b7bca423 [ao][sparsity] Base scheduler class for Data Schedulers (#79817)
The BaseDataScheduler is the abstract scheduler class specifically for the
BaseDataSparsifier class. This class controls a specific hyperparameter of
the sparsifier class and varies it across the training process (or across time).

Args:
    data_sparsifier (instance of BaseDataSparsifier)
        Implemented class data sparsifier class wherein the update_mask is implemented
    schedule_param (str)
        A specific hyperparameter of the passed sparsifier that needs to be scheduled/varied
    last_epoch (int, default=-1)
        This is specifically is passed when training needs to be resumed from a particular
        point.
    verbose (bool, default=False)
        Verbosity of the BaseDataScheduler

The *get_schedule_param()* function needs to be implemented by the user.

Test Plan:
```python test/test_ao_sparsity.py TestBaseDataScheduler```

Differential Revision: [D37358608](https://our.internmc.facebook.com/intern/diff/D37358608)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79817
Approved by: https://github.com/jerryzh168, https://github.com/z-a-f
2022-06-24 16:51:52 +00:00
vspenubarthi
845021db2c [ao] Adds framework for InputWeightEqualization Detector (#79916)
Summary: This adds the framework (method signatures and descriptors) for
the InputWeightEqualization Detector. There is no code implemenation yet
so the test suite for this is a simple pass. This Detector will be used
to determine whether input weight equalization should be recommended.

Test Plan: python test/test_quantization.py TestFxDetectInputWeightEqualization

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79916
Approved by: https://github.com/HDCharles
2022-06-24 14:51:15 +00:00
HDCharles
9cbc692ba8 [ao][sparsity] adding type hints to sparsifier and utils
Summary: no functional changes to the code, only type hints/formatting

Test Plan: python test/test_ao_sparsity.py

Reviewers:

Subscribers:

Tasks:

Tags:

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79598

Approved by: https://github.com/z-a-f
2022-06-23 04:39:28 +00:00
HDCharles
655419fc59 [ao][sparsity] removing old sparsity API and updating tests
Summary: per our design discussion about sparsity API, we're
discontinuing the old API in favor of the new tensor_fqn based one.

The pruning class has not been updated mostly because this change
doesn't cause any further knock on effects.

Test Plan: python test/test_ao_sparsity.py

Reviewers:

Subscribers:

Tasks:

Tags:

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79597

Approved by: https://github.com/z-a-f
2022-06-22 23:55:20 +00:00
vspenubarthi
70be6f8470 [ao] Added generate report capability to ModelReport class
Summary: The ModelReport class in model_report.py combines the
functionality of the detectors and the ModelReportObserver. It creates
an end-to-end system where a user can pass in a prepared Graph Model to
insert the ModelReportObservers, then after the user callibrates their
model, the callibrated model can then be used by the ModelReport class
to generate reports based on what the user wished to gather information
on.

This contains the implementation and the tests for the generate_report
method which is used on a callibrated fx model to generate reports based
on data collected by the inserted observers during the callibration
phase and also potentially remove those observers if desired.

This also addresses and fixes a revert issue that has been fixed.

Test Plan: python test/test_quantization.py TestFxModelReportClass

Reviewers:

Subscribers:

Tasks:

Tags:

Pull Request resolved: https://github.com/pytorch/pytorch/pull/80054

Approved by: https://github.com/HDCharles
2022-06-22 21:19:29 +00:00
vspenubarthi
f714d8f574 [ao] Added insertion of observer capability to ModelReport class
Summary: The ModelReport class in model_report.py combines the
functionality of the detectors and the ModelReportObserver. It creates
an end-to-end system where a user can pass in a prepared Graph Model to
insert the ModelReportObservers, then after the user callibrates their
model, the callibrated model can then be used by the ModelReport class
to generate reports based on what the user wished to gather information
on.

This contains the implementation and tests for the
prepare_detailed_calibration method which is used on a prepared fx model
to insert the desired observers for the different detectors.

This also fixes a revert issue with the applied fix.

Test Plan: python test/test_quantization.py TestFxModelReportClass

Reviewers:

Subscribers:

Tasks:

Tags:

Pull Request resolved: https://github.com/pytorch/pytorch/pull/80053

Approved by: https://github.com/HDCharles
2022-06-22 21:16:17 +00:00
vspenubarthi
01720ae3b6 [ao] Added ModelReport class outline for Fx Graph Modules
Summary: The ModelReport class in model_report.py combines the
functionality of the detectors and the ModelReportObserver. It creates
an end-to-end system where a user can pass in a prepared Graph Model to
insert the ModelReportObservers, then after the user callibrates their
model, the callibrated model can then be used by the ModelReport class
to generate reports based on what the user wished to gather information
on.

This contains the init method and the signatures and docs for each
of the proposed helper functions.

This also address and fixes a revert issue.

Test Plan: python test/test_quantization.py TestFxModelReportClass

Reviewers:

Subscribers:

Tasks:

Tags:

Pull Request resolved: https://github.com/pytorch/pytorch/pull/80052

Approved by: https://github.com/HDCharles
2022-06-22 21:12:58 +00:00
asl3
82a1961129 [quant] Implement APoT_tensor class
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79940

Approved by: https://github.com/dzdang
2022-06-22 18:18:39 +00:00
asl3
184443f1b4 [quant] Modify APoT utility function comments
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80028

Approved by: https://github.com/dzdang
2022-06-22 16:10:48 +00:00
PyTorch MergeBot
ea6fa8dc95 Revert "[ao] Added ModelReport class outline for Fx Graph Modules"
This reverts commit 0f95e1846c.

Reverted https://github.com/pytorch/pytorch/pull/79595 on behalf of https://github.com/malfet due to Broke tests on MacOS, see 0f95e1846c
2022-06-22 12:43:07 +00:00
PyTorch MergeBot
4bed8a23e2 Revert "[ao] Added insertion of observer capability to ModelReport class"
This reverts commit 1cff414784.

Reverted https://github.com/pytorch/pytorch/pull/79752 on behalf of https://github.com/malfet due to Part of the stack that broke tests on MacOS, see 0f95e1846c
2022-06-22 12:41:22 +00:00
PyTorch MergeBot
4e8e817dde Revert "[ao] Added generate report capability to ModelReport class"
This reverts commit 7751ed41a6.

Reverted https://github.com/pytorch/pytorch/pull/79792 on behalf of https://github.com/malfet due to Part of the stack that broke  tests on MacOS, see 0f95e1846c
2022-06-22 12:38:55 +00:00
vspenubarthi
7751ed41a6 [ao] Added generate report capability to ModelReport class
Summary: The ModelReport class in model_report.py combines the
functionality of the detectors and the ModelReportObserver. It creates
an end-to-end system where a user can pass in a prepared Graph Model to
insert the ModelReportObservers, then after the user callibrates their
model, the callibrated model can then be used by the ModelReport class
to generate reports based on what the user wished to gather information
on.

This contains the implementation and the tests for the generate_report
method which is used on a callibrated fx model to generate reports based
on data collected by the inserted observers during the callibration
phase and also potentially remove those observers if desired.

Test Plan: python test/test_quantization.py TestFxModelReportClass

Reviewers:

Subscribers:

Tasks:

Tags:

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79792

Approved by: https://github.com/HDCharles
2022-06-22 06:27:17 +00:00
vspenubarthi
1cff414784 [ao] Added insertion of observer capability to ModelReport class
Summary: The ModelReport class in model_report.py combines the
functionality of the detectors and the ModelReportObserver. It creates
an end-to-end system where a user can pass in a prepared Graph Model to
insert the ModelReportObservers, then after the user callibrates their
model, the callibrated model can then be used by the ModelReport class
to generate reports based on what the user wished to gather information
on.

This contains the implementation and tests for the
prepare_detailed_calibration method which is used on a prepared fx model
to insert the desired observers for the different detectors.

Test Plan: python test/test_quantization.py TestFxModelReportClass

Reviewers:

Subscribers:

Tasks:

Tags:

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79752

Approved by: https://github.com/HDCharles
2022-06-22 06:03:14 +00:00
asl3
0b349f7e69 [quant] Dequantize apot tensor
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79530

Approved by: https://github.com/dzdang, https://github.com/jerryzh168
2022-06-22 05:15:06 +00:00
asl3
d6ec8398a9 [quant] Implement quantize APoT method
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79499

Approved by: https://github.com/dzdang, https://github.com/jerryzh168
2022-06-22 05:15:06 +00:00
asl3
f89e640810 [quant] Add quantizer class skeleton
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79936

Approved by: https://github.com/jerryzh168
2022-06-22 05:11:15 +00:00
vspenubarthi
0f95e1846c [ao] Added ModelReport class outline for Fx Graph Modules
Summary: The ModelReport class in model_report.py combines the
functionality of the detectors and the ModelReportObserver. It creates
an end-to-end system where a user can pass in a prepared Graph Model to
insert the ModelReportObservers, then after the user callibrates their
model, the callibrated model can then be used by the ModelReport class
to generate reports based on what the user wished to gather information
on.

This contains the init method and the signatures and docs for each
of the proposed helper functions.

Test Plan: python test/test_quantization.py TestFxModelReportClass

Reviewers:

Subscribers:

Tasks:

Tags:

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79595

Approved by: https://github.com/andrewor14
2022-06-22 02:47:24 +00:00
vspenubarthi
0656e9e595 [ao] Adding model report detector base class and implemented detectors
Summary: the goal is to add a base class to the model report detectors
so that they can contain a lot more specific information compared to the
primary model report class related to the observers and where they are
inserted etc.

Since this is just a base class, the testing will be with the
implemenations of the classes that derive from the base class

The two current detector methods were turned into Detector classes and
the tests were modified to reflect this, but the same functionality was
tested.

As a result, _detector.py was changed to detector.py

Test Plan: python test/test_quantization.py TestFxModelReportDetector
python test/test_quantization.py TestFxModelReportDetectDynamicStatic

Reviewers:

Subscribers:

Tasks:

Tags:

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79671

Approved by: https://github.com/andrewor14
2022-06-21 00:21:50 +00:00
HDCharles
644c3cfa0a [ao][sparsity] add option for tensor_fqn to sparsity API
Summary: updated sparsity api to accept tensor_fqn as primary
specification method, i.e. [{'tensor_fqn':
'linear.weight'}]

Pruning API also updated due to knock on changes.

left old api for accepting module_fqns but changed 'fqn' to 'module_fqn'
for clarity (this will break BC)

updated variables in code to use module rather than layer

updated state dict to use tensor_fqn rather than 'fqn' or 'module_fqn'

Test Plan: python test/test_ao_sparsity.py

Reviewers:

Subscribers:

Tasks:

Tags:

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79113

Approved by: https://github.com/z-a-f
2022-06-20 16:08:34 +00:00
asl3
228e082ca9 [quant] Refactor nonuniform quantization mapping functions
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79790

Approved by: https://github.com/dzdang
2022-06-20 13:06:22 +00:00
Vasiliy Kuznetsov
7b4e92acef fx quant: refactor qconfig setting out of find_matches
Summary:

Refactors `find_matches` function to only find subgraph
matches and not assign qconfigs to them. Moves the qconfig assignment
outside of the function. No logic change.

This will useful for prototyping future tools for quantizing
parts of the model. These tools will need to know the matches
and will reuse the `find_matches` function,
but they will assign their own qconfigs to them using a different
strategy.

Test plan:

```
python test/test_quantization.py -k Fx
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79713

Approved by: https://github.com/jerryzh168
2022-06-17 18:52:00 +00:00
Andrew Or
78144b9f35 [Quant][fx][bc-breaking] Replace *custom_config_dict with config objects
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79066

Following https://github.com/pytorch/pytorch/pull/78452,
this commit replaces the following config dicts with python objects:

- prepare_custom_config_dict -> PrepareCustomConfig
- convert_custom_config_dict -> ConvertCustomConfig
- fuse_custom_config_dict -> FuseCustomConfig

This leads to better type safety and better user experience in
notebook settings due to improved auto completion. The new APIs
are as follows:

```
from torch.ao.quantization.fx.custom_config import PrepareCustomConfig

prepare_custom_config = PrepareCustomConfig() \
    .set_float_to_observed_mapping(float_class, observed_class) \
    .set_non_traceable_module_names(["mod1", "mod2"]) \
    .set_non_traceable_module_classes([class1, class2]) \
    .set_input_quantized_indexes([0, 1]) \
    .set_output_quantized_indexes([0]) \
    .set_preserved_attributes(["attr1", "attr2"])

convert_custom_config = ConvertCustomConfig() \
    .set_observed_to_quantized_mapping(observed_class, quantized_class) \
    .set_preserved_attributes(["attr1", "attr2"])

model = prepare_fx(
    model,
    qconfig_mapping,
    example_inputs,
    prepare_custom_config=prepare_custom_config)

model(data)

model = convert_fx(model, convert_custom_config=convert_custom_config)
```

For backwards compatibility, prepare_fx, prepare_qat_fx, and
convert_fx will continue to accept Dicts, which will be converted
to the relevant *CustomConfig object internally.

Note that this commit does not modify existing tests to use the
new API; they will continue to pass in Dicts as before, which still
works but triggers a deprecation warning. This will be handled in
a future commit.

Differential Revision: [D37088095](https://our.internmc.facebook.com/intern/diff/D37088095/)

Approved by: https://github.com/jerryzh168
2022-06-16 17:50:07 +00:00
macandro96
751fbc4ce4 [ao][sparsity] Support for L2 norm based block data sparsifier
L2-Norm Sparsifier
This sparsifier computes the *L2-norm* of every sparse block and "zeroes-out" the
ones with the lowest norm. The level of sparsity defines how many of the
blocks is removed.
This sparsifier is controlled by three variables:
1. `sparsity_level` defines the number of *sparse blocks* that are zeroed-out
2. `sparse_block_shape` defines the shape of the sparse blocks. Note that
    the sparse blocks originate at the zero-index of the tensor.
3. `zeros_per_block` is the number of zeros that we are expecting in each
    sparse block. By default we assume that all elements within a block are
    zeroed-out. However, setting this variable sets the target number of
    zeros per block. The zeros within each block are chosen as the *smallest
    absolute values*.

Test Plan:
```python test/test_ao_sparsity.py TestNormDataSparsifiers```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79535

Approved by: https://github.com/z-a-f
2022-06-16 17:44:57 +00:00
macandro96
419b82e0fa [ao][sparsity] L1 norm based block data sparsifier
L1-Norm Sparsifier
This sparsifier computes the *L1-norm* of every sparse block and "zeroes-out" the
ones with the lowest norm. The level of sparsity defines how many of the
blocks is removed.
This sparsifier is controlled by three variables:
1. `sparsity_level` defines the number of *sparse blocks* that are zeroed-out
2. `sparse_block_shape` defines the shape of the sparse blocks. Note that
    the sparse blocks originate at the zero-index of the tensor.
3. `zeros_per_block` is the number of zeros that we are expecting in each
    sparse block. By default we assume that all elements within a block are
    zeroed-out. However, setting this variable sets the target number of
    zeros per block. The zeros within each block are chosen as the *smallest
    absolute values*.

Test Plan:
```python test/test_ao_sparsity.py TestNormDataSparsifiers```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79534

Approved by: https://github.com/z-a-f
2022-06-16 17:43:22 +00:00
macandro96
70fc865237 [ao][sparsity] Support for embeddings and embedding bags in BaseDataSparsifier
Added the embedding and embedding bags in the supported data types. Currently, the base data sparsifier extracts the weight
and stores it as a parameter in the internal module container whose requires_grad=False. The embeddings inside the data sparsifier
are non-trainable

Test Plan:
```python test/test_ao_sparsity.py TestBaseDataSparsifier```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79254

Approved by: https://github.com/z-a-f
2022-06-16 17:41:38 +00:00
macandro96
490631aaf3 [ao][sparsity] Support for the nn.Parameter in BaseDataSparsifier
Users can now just pass in a nn.Parameter (or layer.weight) to the Data Sparsifier.
Note: The data sparsifier stores the passed nn.Parameter as a new parameter in the internal container module whose requires_grad=False.
So, essentialy when the parameter is trained, it's new values are not reflected inside the data sparsifier class

Test Plan:
```python test/test_ao_sparsity.py TestBaseDataSparsifier```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79253

Approved by: https://github.com/z-a-f
2022-06-16 17:40:06 +00:00
macandro96
69347e9f64 [ao][sparsity] Implemented state dict and serialization functionalities
The state of the data sparsifier object contains the name->mask mapping, name -> config mapping and the state_dict() of the container.
The load_state_dict() and __set_state__() automatically creates a container moduie and loads the named data internally without having the user to intervene.

Test Plan:
```python test/test_ao_sparsity.py TestBaseDataSparsifier```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79427

Approved by: https://github.com/z-a-f
2022-06-16 17:38:25 +00:00
macandro96
596095dc82 [ao][sparsity] Support for sparsifying data operations on raw torch tensors.
The users can now pass in raw torch tensors and the base class handles all the parametrizations and masking

Example -
    >>> data_list = [('tensor_1', torch.randn(3,3)), ('tensor_2', torch.randn(4,4))]
    >>> defaults = {'sparsity_level': 0.7}
    >>> sparsifier = DerivedDataSparsifier(data_list = data_list, **defaults) # Some sparsifier that inherits BaseDataSparsifier
    >>> new_tensor_to_add = {'name': 'tensor_3', 'data': torch.randn(5,5), 'sparsity_level': 0.3}
    >>> sparsifier.add_data(**new_tensor_to_add)
    >>> # tensor_1 and tensor_2 will have sparsity_level of 0.7 but tensor_3 will have sparsity_level=0.3

Test Plan:
```python test/test_ao_sparsity.py TestBaseDataSparsifier```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79252

Approved by: https://github.com/HDCharles, https://github.com/z-a-f
2022-06-16 17:36:45 +00:00
macandro96
15828bcfd7 [ao][sparsity] Base class for Data Sparsifier
Base Data Sparsifier class for all Data sparsifiers.
The abstract class accepts raw torch tensors / embedding / embedding bags (refer to SUPPORTED_TYPES above)
to prepare for sparsification.
In this case, mask (and parametrizations) is owned by the class and not by the user.
Specifically, the container object inside the class maintains the mask and parametrizations of the input data

Test Plan:
```python test/test_ao_sparsity.py TestBaseDataSparsifier```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79251

Approved by: https://github.com/z-a-f, https://github.com/HDCharles
2022-06-16 17:31:22 +00:00
Andrew Or
61a1eef7fc [Quant][fx] Add get_default_qconfig_mapping
Summary: This follows https://github.com/pytorch/pytorch/pull/78452,
which replaced the qconfig_dict with QConfigMapping. This PR
additionally replaces get_default_*qconfig_dict with
get_default_*qconfig_mapping. For backward compatibility, we
deprecate the old functions instead of removing them.

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Reviewers: jerryzh168, vkuzo

Subscribers: jerryzh168, vkuzo, supriyar

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79618

Approved by: https://github.com/jerryzh168
2022-06-16 16:10:14 +00:00
asl3
afc037ae38 [quant] Add quantized levels visualization
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79198

Approved by: https://github.com/HDCharles
2022-06-16 06:10:34 +00:00
asl3
81f277002e [quant] Add param calcs and tests for APoT observers
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78905

Approved by: https://github.com/dzdang
2022-06-15 23:24:48 +00:00
vspenubarthi
38952d9350 [ao] Added function to inform dynamic vs static appropriate
Summary: The _detect_dynamic_vs_static function was added to take in a
prepared fx graph model that already had ModelReportObservers built into
it and uses the collected information to determine whether input and
output are stationary or non-stationary and provides feedback on whether
to make linear modules static or dynamic based on this information.

This PR will be followed up soon with another PR that will more
rigoursly test the whole end to end performance of this system, which is
primarily how the function in this PR will be tested for functionality,
which is why this one only has 1 test.

Test Plan: python test/quantization/fx/test_model_report_fx.py TestModelReportDetectDynamicStatic

Reviewers:

Subscribers:

Tasks:

Tags:

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79326

Approved by: https://github.com/HDCharles
2022-06-15 02:51:27 +00:00
vspenubarthi
8e05513152 [ao] Added ModelReportObserver to inform on dynamic vs static
Summary: The purpose of this is to add to the module report functioality
by creating an observer that will take a prepared fx module and suggest
whether static or dynamic quantization is more appropriate. The tests
for this have been written and included in the location indicated by the
Test Plan

Test Plan: python test/quantization/fx/test_model_report_fx.py TestModelReportObserver

Reviewers:

Subscribers:

Tasks:

Tags:

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79243

Approved by: https://github.com/jerryzh168, https://github.com/andrewor14
2022-06-14 19:08:40 +00:00
vspenubarthi
28c541776c [ao] Added fx model report per_channel detector
Summary: This code is meant to be a tool to help people get the most out
of their backend by hinting them to use per_channel quantization if it's
supported, which will help increase accuracy significantly. The code is
completed and ready to be reviewed.

Test Plan: test/quantization/fx/test_model_report_fx.py

Reviewers:

Subscribers:

Tasks:

Tags:

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79104

Approved by: https://github.com/HDCharles
2022-06-10 08:09:59 +00:00
asl3
6fa202847e Add TODO comment
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79068

Approved by: https://github.com/dzdang
2022-06-09 17:30:52 +00:00
Vasiliy Kuznetsov
d4aa204f11 ns for fx: further fixes for kwargs-only
Summary:

This is a follow-up to https://github.com/pytorch/pytorch/pull/78181,
apparently that PR did not fix all errors in a Meta model using
the NS shadow APIs.

We do not have an OSS repro, so putting the PR up so we can test in fbcode.

Test plan:

```
python test/test_quantization.py -k FXNumericSuite -f
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79015

Approved by: https://github.com/dzdang
2022-06-08 15:29:48 +00:00
Vasiliy Kuznetsov
71e1992b0d quantization: remove most fp16 configs from fbgemm/qnnpack
Summary:

The fbgemm and qnnpack backends mostly support ops with quint8 activations.
Historically, the default backend config has included ops with fp16 activations
for other backends. This PR keeps the old config under a different name to keep
the functionality tested, and makes the default config match fbgemm/qnnpack ops.

Test plan:

```
python test/test_quantization.py -k TestQuantizeFx
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78528

Approved by: https://github.com/andrewor14
2022-06-06 19:02:53 +00:00
macandro96
a3468b7d4a [ao][sparsity] Added the Nearly Diagonal Sparsifier
This sparsifier creates a nearly diagonal mask to be applied to the weight matrix.
    Nearly Diagonal Matrix is a matrix that contains non-zero elements near the diagonal and the rest are zero.
    An example of a nearly diagonal matrix with degree (or nearliness) 3 and 5 are follows respectively.
    1 1 0 0       1 1 1 0
    1 1 1 0       1 1 1 1
    0 1 1 1       1 1 1 1
    0 0 1 1       0 1 1 1
    Note that a nearly diagonal matrix with degree 1 is just a matrix with main diagonal populated

    This sparsifier is controlled by one variable:
    1. `nearliness` defines the number of non-zero diagonal lines that are closest to the main diagonal.
        Currently - supports only odd number

    Note:
        This can be accelerated (vectorized) once the Spdiagonal feature (PR: #78439) is landed or the banded matrix
        feature is landed: https://stackoverflow.com/questions/52463972/generating-banded-matrices-using-numpy

Test Plan:

```
python test/test_ao_sparsity.py TestNearlyDiagonalSparsifier
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78448

Approved by: https://github.com/z-a-f, https://github.com/HDCharles
2022-06-04 04:30:32 +00:00
Jerry Zhang
063c93665c [quant] follow up fixes for prepare_fx/prepare_qat_fx calls in classyvision (#105) (#78660)
Summary:
X-link: https://github.com/fairinternal/ClassyVision/pull/105

As follow up for https://github.com/pytorch/pytorch/pull/76496, we fixes the TODOs in quantization tests
by providing correct example_inputs in the tests

Test Plan:
classyvision sandcastle and ossci

**Static Docs Preview: classyvision**
|[Full Site](https://our.intern.facebook.com/intern/staticdocs/eph/D36818665/V1/classyvision/)|

|**Modified Pages**|

Differential Revision: D36818665

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78660
Approved by: https://github.com/vkuzo
2022-06-03 01:08:45 +00:00
asl3
308d813d45 Add nonuniform observer class and tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78680

Approved by: https://github.com/dzdang
2022-06-02 16:29:21 +00:00
Jerry Zhang
22fd2f2e05 [quant] Factor out common operator configs from native.py (#78407)
Summary:
Some helper functions that generate operator configs based on dtype_configs are reused in native backend and tensorrt, so we
factor out this part to a util file: common_operator_configs.py

Test Plan: buck test mode/opt deeplearning/trt/fx2trt_oss/test/quant:test_quant_trt

Differential Revision: D36728359

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78407
Approved by: https://github.com/vkuzo, https://github.com/andrewor14
2022-06-01 22:24:36 +00:00
asl3
7390658e80 Add APoT tensor class and tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78577

Approved by: https://github.com/dzdang
2022-06-01 18:14:06 +00:00
Andrew Or
e41389f84b [Quant][docs] Replace qconfig_dict with QConfigMapping in docs
Summary: https://github.com/pytorch/pytorch/pull/78452 replaced
qconfig_dict with QConfigMapping as the default API for prepare_fx,
prepare_qat_fx, and convert_fx. We should update the docs to reflect
this change as well.

Test Plan:
```
cd docs
make html
cd build/html
python -m server.http
```

Reviewers: jerryzh168, vkuzo

Subscribers: jerryzh168, vkuzo

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78533

Approved by: https://github.com/vkuzo
2022-06-01 15:10:48 +00:00
Nikita Shulga
8f7e3791ef Make PyTorch importable on python-3.7.0 (#78500)
By stringifying "typing.OrderedDict", as [`typing.OrderedDict`](https://docs.python.org/3.10/library/typing.html#typing.OrderedDict) were introduced by Python-3.7.2+

See similar fix in 21a82fb519

Partially addresses https://github.com/pytorch/pytorch/issues/78499

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78500
Approved by: https://github.com/atalman
2022-05-31 06:11:30 +00:00
Andrew Or
c7b4eec233 [Quant][fx][bc-breaking] Replace qconfig_dict with a config object (#78452)
**Summary:** Previously, FX graph mode quantization configurations
were specified through a dictionary of qconfigs. However, this
API was not in line with other core APIs in PyTorch. This commit
replaces this dictionary with a config object that users will
create and pass to prepare and convert. This leads to better
type safety and better user experience in notebook settings
due to improved auto completion.

The new API is as follows:

```
from torch.ao.quantization import QConfigMapping
from torch.ao.quantization.quantize_fx import prepare_fx

qconfig_mapping = QConfigMapping()
    .set_global(qconfig)
    .set_object_type(torch.nn.Linear, qconfig)
    .set_module_name_regex("foo.*bar", qconfig)
    .set_module_name("mod", qconfig)

prepare_fx(model, qconfig_mapping)
```

For backwards compatibility, `prepare_fx`, `prepare_qat_fx`,
and `convert_fx` will continue to accept qconfig_dicts, which
will be converted to QuantizationConfigs internally.

Note that this commit does not modify existing tests to use the
new API; they will continue to pass in qconfig_dict as before,
which still works but triggers a deprecation warning. This will
be handled in a future commit.

**Test Plan:**
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

**Reviewers:** jerryzh168, vkuzo

**Subscribers:** jerryzh168, vkuzo

Differential Revision: D36747998

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78452
Approved by: https://github.com/jerryzh168
2022-05-30 18:30:07 +00:00
Jerry Zhang
85f308275e [fx2trt] Fix dummy weight initialization in conv1d converter (#78402)
Summary:
att, currently it errors out with the following error:
```
---> 72         dummy_weight = trt.Weights(weight_shape)
     73         layer = network.add_convolution_nd(
     74             input=input_val,
TypeError: __init__(): incompatible constructor arguments. The following argument types are supported:
    1. tensorrt.tensorrt.Weights(type: tensorrt.tensorrt.DataType = <DataType.FLOAT: 0>)
    2. tensorrt.tensorrt.Weights(a: numpy.ndarray)
```
full error: https://www.internalfb.com/phabricator/paste/view/P503598381
we need to pass arond a numpy ndarray instead of a shape here.

and support conv1d in backend_config_dict for tensorrt

Test Plan:
```
buck test mode/opt deeplearning/trt/fx2trt_oss/test/converters:test_convolution
```

```
buck test mode/opt deeplearning/trt/fx2trt_oss/test/quant:test_quant_trt
```

Differential Revision: D36721313

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78402
Approved by: https://github.com/842974287
2022-05-27 04:48:45 +00:00
Jerry Zhang
7ea5fa3dd4 [reland][quant] Add utility function get_fqn_to_example_inputs
Summary:
After https://github.com/pytorch/pytorch/pull/77608 `example_inputs` is required input for `prepare_fx` and `prepare_qat_fx`.
This makes quantizing submodules harder, so we added this utility function to get a dictionary from fqn to submodule example_inputs

Example Call:

```
example_inputs = (tensor0,)
get_fqn_to_example_inputs(m, example_inputs)
```

Example output:
```
{
   "linear1": (tensor1,),
   "linear2": (tensor2,),
   "sub": (tensor3,),
   "sub.linear1": (tensor4,),
   ...
}
```

Test Plan:
python test/test_quantization.py TestUtils

Reviewers:

Subscribers:

Tasks:

Tags:

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78286

Approved by: https://github.com/dzdang
2022-05-25 23:31:51 +00:00
Vasiliy Kuznetsov
53e05ad4b2 ns for fx: remove restriction on nodes with no args and only kwargs
Summary:

Removes the restriction from NS for FX on handling nodes which have
no positional arguments, such as `F.linear(input=x, weight=w, bias=b).

In order to achieve this, we delete all places in the code which
were doing things like

```
node.args[0]
```

And replace them with

```
_get_normalized_nth_input(node, gm, 0)
```

The `_get_normalized_nth_input` function is a best effort way to
get the n'th normalized input.

This is needed because some FX tools output nodes normalized to
be kwargs only, and we need to be able to handle this in NS.

Test plan:

```
python test/test_quantization.py -k test_linear_kwargs_shadow
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78181

Approved by: https://github.com/z-a-f, https://github.com/hx89
2022-05-25 17:00:39 +00:00
PyTorch MergeBot
87148f2b59 Revert "[quant] Add utility function get_fqn_to_example_inputs"
This reverts commit 50a44fe461.

Reverted https://github.com/pytorch/pytorch/pull/78146 on behalf of https://github.com/suo due to as it broke master
2022-05-25 06:37:32 +00:00
Jerry Zhang
50a44fe461 [quant] Add utility function get_fqn_to_example_inputs
Summary:
After https://github.com/pytorch/pytorch/pull/77608 `example_inputs` is required input for `prepare_fx` and `prepare_qat_fx`.
This makes quantizing submodules harder, so we added this utility function to get a dictionary from fqn to submodule example_inputs

Example Call:

```
example_inputs = (tensor0,)
get_fqn_to_example_inputs(m, example_inputs)
```

Example output:
```
{
   "linear1": (tensor1,),
   "linear2": (tensor2,),
   "sub": (tensor3,),
   "sub.linear1": (tensor4,),
   ...
}
```

Test Plan:
python test/test_quantization.py TestUtils

Reviewers:

Subscribers:

Tasks:

Tags:

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78146

Approved by: https://github.com/vkuzo
2022-05-25 03:07:16 +00:00
Jerry Zhang
416899d1a9 [quant][fx][bc-breaking] Add required example_args argument to prepare_fx and prepare_qat_fx (#249) (#77608)
Summary:
X-link: https://github.com/facebookresearch/d2go/pull/249

X-link: https://github.com/fairinternal/ClassyVision/pull/104

X-link: https://github.com/pytorch/benchmark/pull/916

X-link: https://github.com/facebookresearch/ClassyVision/pull/791

X-link: https://github.com/facebookresearch/mobile-vision/pull/68

FX Graph Mode Quantization needs to know whether an fx node is a floating point Tensor before it can decide whether to
insert observer/fake_quantize module or not, since we only insert observer/fake_quantize module for floating point Tensors.
Currently we have some hacks to support this by defining some rules like NON_OBSERVABLE_ARG_DICT (https://github.com/pytorch/pytorch/blob/master/torch/ao/quantization/fx/utils.py#L496), but this approach is fragile and we do not plan to maintain it long term in the pytorch code base.

As we discussed in the design review, we'd need to ask users to provide sample args and sample keyword args
so that we can infer the type in a more robust way. This PR starts with changing the prepare_fx and prepare_qat_fx api to require user to either provide
example arguments thrugh example_inputs, Note this api doesn't support kwargs, kwargs can make https://github.com/pytorch/pytorch/pull/76496#discussion_r861230047 (comment) simpler, but
it will be rare, and even then we can still workaround with positional arguments, also torch.jit.trace(https://pytorch.org/docs/stable/generated/torch.jit.trace.html) and ShapeProp: https://github.com/pytorch/pytorch/blob/master/torch/fx/passes/shape_prop.py#L140 just have single positional args, we'll just use a single example_inputs argument for now.

If needed, we can extend the api with an optional example_kwargs. e.g. in case when there are a lot of arguments for forward and it makes more sense to
pass the arguments by keyword

BC-breaking Note:
Before:
```python
m = resnet18(...)
m = prepare_fx(m, qconfig_dict)
# or
m = prepare_qat_fx(m, qconfig_dict)
```
After:
```python
m = resnet18(...)
m = prepare_fx(m, qconfig_dict, example_inputs=(torch.randn(1, 3, 224, 224),))
# or
m = prepare_qat_fx(m, qconfig_dict, example_inputs=(torch.randn(1, 3, 224, 224),))
```

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
python test/test_quantization.py TestQuantizeFxModels

Imported from OSS

**Static Docs Preview: classyvision**
|[Full Site](https://our.intern.facebook.com/intern/staticdocs/eph/D35984526/V30/classyvision/)|

|**Modified Pages**|

Reviewed By: vkuzo, andrewor14

Differential Revision: D35984526

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77608
Approved by: https://github.com/dzdang
2022-05-21 21:03:48 +00:00
Vasiliy Kuznetsov
c15fca1137 quant doc: improve rendered documentation for backend_config_dict
Summary:

This improves the documentation page for backend_config_dict to render
the configurations in a human readable format, such as

```
{
  'pattern': torch.nn.modules.pooling.AdaptiveAvgPool1d,
  'dtype_configs': [
    {
      'input_dtype': torch.quint8,
      'output_dtype': torch.quint8,
    },
    {
      'input_dtype': torch.float16,
      'weight_dtype': torch.float16,
      'bias_dtype': torch.float16,
      'output_dtype': torch.float16,
    },
  ],
  'observation_type': ObservationType.OUTPUT_SHARE_OBSERVER_WITH_INPUT,
},
```

The results are also now sorted alphabetically by the normalized name of
the root op in the pattern.

A couple of utility functions are created to help with this. If in the future
we convert backend_config_dict to use typed objects, we can move this logic
to the objects at that time.

Test plan:

```
cd docs
make html
cd build
python -m server.http
// renders correctly, example: https://gist.github.com/vkuzo/76adfc7c89e119c59813a733fa2cd56f
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77535

Approved by: https://github.com/andrewor14
2022-05-18 11:46:07 +00:00
Vasiliy Kuznetsov
b49a4be7ac ns for fx: remove duplicated BNReLU mappings
Summary:

These mappings are already defined for `BatchNorm{n}d` as the root
node, we don't need to specify them again. Removing to clean
up the code.

Test plan:

```
python test/test_quantization.py -k FXNumericSuite
python test/test_quantization.py -k FXGraphMatcher
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76993

Approved by: https://github.com/jerryzh168
2022-05-13 20:41:42 +00:00
Vasiliy Kuznetsov
d8479098a6 ns for fx: remove quantized ReLU6 from mapping
Summary:

This module is no longer swapped by FX graph mode quantization,
because it can take quantized inputs. Removing it from NS for FX
mappings.

Test plan:

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76992

Approved by: https://github.com/jerryzh168
2022-05-13 20:38:31 +00:00
Vasiliy Kuznetsov
6a33b80191 ns for fx: remove GroupNorm from mapping
Summary:

GroupNorm quantization is defined but it looks like FX graph
mode quantization does not have it enabled.

Removing it from NS for FX.

Test plan:

```
python test/test_quantization.py -k FXGraphMatcher
python test/test_quantization.py -k FXNumericSuite
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76991

Approved by: https://github.com/jerryzh168
2022-05-13 20:33:27 +00:00
Vasiliy Kuznetsov
6e05f76089 ns for fx: clean up linear in relationship mapping
Summary:

More cleanups in mappings:
1. makes the `nnqatd.Linear` entry be looked up dynamically
2. moves the `NonDynamicallyQuantizableLinear` down and marks it as edge case

Test plan:

```
python test/test_quantization.py -k FXGraphMatcher
python test/test_quantization.py -k FXNumericSuite
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76990

Approved by: https://github.com/jerryzh168
2022-05-13 20:27:53 +00:00
Vasiliy Kuznetsov
9265cc2097 ns for fx: make torch.ops.quantized.dropout mapping dynamic
Summary:

Instead of hardcoding the relationship between `F.dropout` and `toq.dropout`,
read it from the mapping.

The mapping itself might need to be in the lowering file, but that's a separate
issue.

Test plan:

```
python test/test_quantization.py -k FXGraphMatcher
python test/test_quantization.py -k FXNumericSuite
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76989

Approved by: https://github.com/jerryzh168
2022-05-13 20:26:06 +00:00
Vasiliy Kuznetsov
cc59641acb ns for fx: remove torch.ops.quantized.cat
Summary:

FX graph mode quantization no longer uses `torch.ops.quantized.cat`,
instead `torch.cat` can use quantized inputs.

This PR removes the outdated mapping from NS for FX.

Test plan:

```
python test/test_quantization.py -k FXNumericSuite
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76988

Approved by: https://github.com/jerryzh168
2022-05-13 20:24:16 +00:00
Vasiliy Kuznetsov
20b75e3e5f ns for fx: clean up convtranspose mappings
Summary:

Fixes a couple of problems with `ConvTranspose` in NS mappings:
1. deletes the dynamic versions, as they do not work yet
2. deletes `ConvTranspose3d`, as it's not swapped yet in the quantization workflow
3. removes a duplicate set

Test plan:

```
python test/test_quantization.py -k FXGraphMatcher
python test/test_quantization.py -k FXNumericSuite
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76980

Approved by: https://github.com/jerryzh168
2022-05-13 20:22:42 +00:00
Vasiliy Kuznetsov
0e067e4cc9 ns for fx to backend_config_dict [2/x]: native lowering mappings
Summary:

NS for FX mappings were originally hardcoded, because quantization op
mappings were not easily reusable. Now that we have `backend_config_dict`,
we can start moving NS for FX to use them and delete the hardcoded mappings.

This PR deletes the hardcoded mappings from NS about the lowering step,
and instead reads them from the lowering configs.

Note: for now, there is no way to configure the tool to use lowering
configs from a different lowering pass. That may be added at some
future point, but it's not important now.

Test plan:

```
python test/test_quantization.py -k FXNumericSuite
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76978

Approved by: https://github.com/jerryzh168
2022-05-13 20:21:01 +00:00
Vasiliy Kuznetsov
5419236946 ns for fx to backend_config_dict [1/x]: fused and qat modules
Summary:

NS for FX mappings were originally hardcoded, because quantization op
mappings were not easily reusable. Now that we have `backend_config_dict`,
we can start moving NS for FX to use them and delete the hardcoded mappings.

This first PR deletes the hardcoded mappings for `nni` and `nniqat` modules,
and instead reads these mappings from `backend_config_dict`.

Future PRs will incrementally move more of the mappings over.

Test plan:

```
python test/test_quantization.py -k FXNumericSuite
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76958

Approved by: https://github.com/jerryzh168
2022-05-13 20:14:39 +00:00
dzdang
1d7b294574 [quant][better-engineering][bc-breaking] Removed quant_min/quant_max from fake_quant modules
Summary:
FakeQuantize class has quant_min/quant_max and activation_post_process
attributes, the latter of which already includes quant_min/max. As such,
we can remove quant_min/quant_max from FakeQuantize and use
FakeQuantize.activation_post_process.quant_m* directly.

Test plan:
```
python test/test_quantization.py
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76674

Approved by: https://github.com/vkuzo
2022-05-11 14:23:05 +00:00
Mengchi Zhang
d7035c1cbb [FX qconfig] add weighted_op_qint8_dtype_config for int8 TRT and import linear config to get_tensorrt_backend_config_dict (#76877)
Summary:
- Add weighted_op_qint8_dtype_config
- import import linear config to get_tensorrt_backend_config_dict()
- Add qconfig for _get_linear_configs()

Test Plan: CI

Differential Revision: D36152319

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76877
Approved by: https://github.com/jerryzh168
2022-05-10 16:26:20 +00:00
Vasiliy Kuznetsov
3a8752db86 ns for fx: skip shadowing ops if copy subgraph is not implemented (#76663)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76663

Subgraph copy does not handle all edge cases. It's high eng time
to handle them all, and currently an unhandled edge case crashes
the script.

This PR adds a function to check if the subgraph copy is supported,
and skips shadowing if it is not supported. This way the model
can still go through the shadowing APIs without an exception.

Test Plan:
```
python test/test_quantization.py -k FXNumericSuite
```

Reviewed By: hx89

Differential Revision: D36069304

Pulled By: vkuzo

fbshipit-source-id: 6b38b8d8e43396a4cf2373b247223a19d451d096
(cherry picked from commit e2322ca0635c51a4701e60fa90f77915a3c46d0f)
2022-05-05 13:19:53 +00:00
Vasiliy Kuznetsov
d3e338935a ns for fx: skip shadowing for torch.cat, and also for nodes with only kwargs (#76561)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76561

User model had syntax like `torch.cat(tensors=[x])`. This PR fixes two errors
to unbreak this in NS shadow model:
1. skip nodes which only have kwargs (instead of throwing an exception)
2. explicitly skip shadowing of `torch.cat` (since it's not supported anyways)

Test Plan:
```
python test/test_quantization.py -k test_op_with_only_kwargs_skips_shadowing
python test/test_quantization.py -k test_op_mul_add_cat_skips_shadowing
```

Reviewed By: hx89

Differential Revision: D36017356

Pulled By: vkuzo

fbshipit-source-id: 0da4840a62c2dac183f8294c2cec4fce262474b3
(cherry picked from commit 88409c1576e7f690708957b2baa285fc7961e9d6)
2022-05-05 13:19:53 +00:00
dzdang
e2aa28a2d0 [quant][fx][improvement] Renamed default_affine_fixed_qparams_observer and default_symmetric_fixed_qparams_observer (#76637)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76637

The previous naming convention `default_affine_fixed_qparams_observer`
and `default_symmetric_fixed_qparams_observer` were uninformative, and users had to read
the definition in order to understand what these observers are. The new
naming convention reveals information about the range of the observers

The analogous changes were also made for
`default_symmetric_fixed_qparams_fake_quant` and
`default_affine_fixed_qparams_fake_quant`

Test Plan:
```
python test/test_quantization.py
```

```
python test/test_quantization.py
```

Differential Revision:
D36054169
D36054169

Reviewed By: vkuzo

Pulled By: dzdang

fbshipit-source-id: 215f7786a4b7abda7327f17cc61735697ec5cca9
(cherry picked from commit 21a4e6eda4467c8adca7fd534a506a14e975f9cf)
2022-05-04 02:39:20 +00:00
Michael Suo
fb0f285638 [lint] upgrade mypy to latest version
Fixes https://github.com/pytorch/pytorch/issues/75927.

Had to fix some bugs and add some ignores.

To check if clean:
```
lintrunner --paths-cmd='git grep -Il .' --take MYPY,MYPYSTRICT
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76753

Approved by: https://github.com/malfet
2022-05-03 20:51:34 +00:00
PyTorch MergeBot
3d7428d9ac Revert "[lint] upgrade mypy to latest version"
This reverts commit 9bf18aab94.

Reverted https://github.com/pytorch/pytorch/pull/76753 on behalf of https://github.com/suo
2022-05-03 20:01:18 +00:00
Michael Suo
9bf18aab94 [lint] upgrade mypy to latest version
Fixes https://github.com/pytorch/pytorch/issues/75927.

Had to fix some bugs and add some ignores.

To check if clean:
```
lintrunner --paths-cmd='git grep -Il .' --take MYPY,MYPYSTRICT
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76753

Approved by: https://github.com/malfet
2022-05-03 19:43:28 +00:00
Vasiliy Kuznetsov
e155e2584a ns for fx: skip operator.add and operator.mul when shadowing (#76504)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76504

Shadowing for add and mul is not implemented, this PR fixes the skipping
logic to also skip the `operator.add` and `operator.mul` flavor of these
operators.

Test Plan:
```
python test/test_quantization.py -k test_mul_add_skips_shadowing
```

Reviewed By: dzdang

Differential Revision: D35985997

Pulled By: vkuzo

fbshipit-source-id: f832e54a5461d3b182df4bb905357d6c66742e98
(cherry picked from commit 93ae9592f68873865ebfdc438bffb1c9486dd1c1)
2022-05-03 05:58:46 +00:00
Vasiliy Kuznetsov
385e5ba561 ns for fx: more meaningful error message when creating shadow model (#76468)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76468

This makes the error message when copying an unsupported node more verbose.
This is useful to debug where specifically in a user model this is failing.

Test Plan:
1. hardcode this condition to hit
2. run NS tests
3. verify the exception now prints details about the offending node

Reviewed By: jerryzh168

Differential Revision: D35978652

Pulled By: vkuzo

fbshipit-source-id: 9cc93dfa46469bf6ef60aa38d4011041b6709df9
(cherry picked from commit c6e382c2a69aba6ba66740f238bc14446521a433)
2022-05-03 05:58:46 +00:00
Vasiliy Kuznetsov
04369f637c quant: rename _ObserverBase to UniformQuantizationObserverBase (#76461)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76461

Renaming as the old name was confusing. The name represents
better what this class is doing.

Test Plan: CI

Reviewed By: jerryzh168

Differential Revision: D35976350

Pulled By: vkuzo

fbshipit-source-id: 6da6c1767cec729c3959b13ae9dd939d0b2f622c
(cherry picked from commit 065608ef42c599525bfad4603af74c5bdf0881c3)
2022-05-03 05:53:54 +00:00
Vasiliy Kuznetsov
31d5a300ac quant: make RecordingObserver inherit from ObserverBase (#76460)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76460

`RecordingObserver` inherits from `_ObserverBase` but does not use any functionality
from it. Making it inherit from `ObserverBase` instead.

This will make it simpler to rename `_ObserverBase` to something more meaningful in the next PR.

Test Plan: CI

Reviewed By: jerryzh168

Differential Revision: D35976351

Pulled By: vkuzo

fbshipit-source-id: 19c106bf0d48607c231702e2e048f42a7f48a5c6
(cherry picked from commit 4fd44123b0e9bcdcae546aecabe80d7642129cf5)
2022-05-03 05:53:54 +00:00
lkct
9fae0762b0 fix typing in Module.state_dict and load_state_dict
Fixes #72707

Pull Request resolved: https://github.com/pytorch/pytorch/pull/73483
Approved by: https://github.com/albanD, https://github.com/jbschlosser
2022-05-02 17:27:54 +00:00
Jerry Zhang
b69d44daa5 [quant] Fix tensorrt config after the backend_config_dict refactor (#76414)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76414

Previously we refactored FX Graph Mode Quantization code base to use a native backend config dict for fbgemmq/qnnpack,
because of this, we need to defien the backend config dict for tensorrt properly as well (previously it was relying on
fbgemm/qnnpack configs), this PR added some configs to enable uru10x10 again

Test Plan: buck run mode/dev-nosan -c fbcode.split-dwarf=true -c fbcode.platform=platform009 accelerators/workloads/models/uru10x10:uru_10x10_to_trt_eval -- --int8

Reviewed By: vkuzo

Differential Revision: D35939944

fbshipit-source-id: c64ade5074f5a8ee74a833bb990cd7a91c2cb152
(cherry picked from commit 02855a5ef8c196fb5b0defdfff58d6f2b94c693e)
2022-04-28 06:12:26 +00:00
Jerry Zhang
8326af0117 [quant] Fix TensorRT tests (#76148)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76148

X-link: https://github.com/pytorch/fx2trt/pull/60

Recently, we landed some PRs to enable backend_config_dict by default in the quantization codebase, we also changes
the config to include "fused_module" for a pattern, but we didn't update tensorrt backend config dict,
this PR adds that configuration. Also adds the config for binary ops in TensorRT, since it was relying on fbgemm backend
config dict previously

Test Plan: Facebook internal tests

Reviewed By: andrewor14, frankgt40

Differential Revision: D35789709

fbshipit-source-id: 9dc93b9f454eff6baefb38c4c1567f88da2a1506
(cherry picked from commit 7d30e5ecbfd096c32cdb1b68abde394bcba45f94)
2022-04-21 17:27:05 -07:00
Vasiliy Kuznetsov
35545d85dc fx quant: add quantized Softmax workflow integration (#75106)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75106

In https://github.com/pytorch/pytorch/pull/75017 a quantized softmax
kernel was added. This PR adds the FX graph mode quantization workflow
integration to swap `nn.Softmax` to `nnq.Softmax`.

Test Plan:
```
python test/test_quantization.py TestQuantizeFxOps.test_fixed_qparams_ops
```

Reviewed By: kimishpatel, andrewor14

Differential Revision: D35324817

Pulled By: vkuzo

fbshipit-source-id: 710ae3bedf8a6ad1dc411cd9808fdd0ce743e757
(cherry picked from commit d67603c0fbb1d3469d97bd538cec38aa8b03324b)
2022-04-20 21:54:26 +00:00
Jerry Zhang
74454bdb46 [quant][fx] Move backend_config folder to torch.ao.quantization
Summary:
Following https://github.com/pytorch/rfcs/blob/master/RFC-0019-Extending-PyTorch-Quantization-to-Custom-Backends.md we implemented
the backend configuration for fbgemm/qnnpack backend, currently it was under fx folder, but we'd like to use this for all different
workflows, including eager, fx graph and define by run quantization, this PR moves it to torch.ao.quantization namespace so that
it can be shared by different workflows
Also moves some utility functions specific to fx to fx/backend_config_utils.py and some files are kept in fx folder (quantize_handler.py and fuse_handler.py)

Test Plan:
python test/teset_quantization.py TestQuantizeFx
python test/teset_quantization.py TestQuantizeFxOps
python test/teset_quantization.py TestQuantizeFxModels
python test/test_quantization.py TestAOMigrationQuantization
python test/test_quantization.py TestAOMigrationQuantizationFx

Reviewers:

Subscribers:

Tasks:

Tags:

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75823

Approved by: https://github.com/vkuzo
2022-04-19 15:38:57 +00:00
Andrew Or
5dcbcc6de8 [Quant][fx] Fix get_default_qconfig_dict for fused modules
Summary: Calling `prepare_fx` with `get_default_qconfig_dict`
failed for models with fused modules, such as `ConvReLU2d`.
This commit fixes this by adding qconfig entries for ReLU
and BatchNorm as well.

Test Plan:
python test/test_quantization.py TestQuantizeFx.test_qconfig_dict_with_fused_modules

Reviewers: jerryzh168

Subscribers: jerryzh168, vkuzo

Issue: https://github.com/pytorch/pytorch/issues/75825

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75838

Approved by: https://github.com/jerryzh168
2022-04-15 22:37:26 +00:00
Jerry Zhang
0c08fcff32 [quant][fx] Cleanup some unused states and args
Summary:
* Removed "patterns" from observed module since it's no longer needed
* Removed an arg from insert_observer
* Removed some unused keys in checking the validity of qconfig_dict

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
python test/test_quantization.py TestQuantizeFxModels

Reviewers:

Subscribers:

Tasks:

Tags:

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75521

Approved by: https://github.com/andrewor14
2022-04-14 13:18:00 +00:00
Vasiliy Kuznetsov
f1f185f6f9 ns for fx: fix bug to enable again on torchvision models
Summary:

The tests were disabled by https://github.com/pytorch/pytorch/pull/61687, but
this specific behavior broke some time after while these tests were disabled.

The issue was that:
1. `torch.add` is present in these models
2. In the common codepath of comparing fp32 to int8, torch.ops.quantized.add was already filtered out because it did not have a dtype specified
3. In the less common codepath of comparing fp32 to fp32, torch.add was eligible for shadowing, but the logic was broken

This PR fixes (3) by disabling shadowing on ops which do not support it, by op type.
The support may be built later, if needed.

Test plan:

```
python test/test_quantization.py TestFXNumericSuiteCoreAPIsModels.test_resnet18
python test/test_quantization.py TestFXNumericSuiteCoreAPIsModels.test_mobilenet_v2
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75472

Approved by: https://github.com/jerryzh168
2022-04-13 19:44:46 +00:00
Vasiliy Kuznetsov
ae3210420e ns for fx: fix issue with shadowing nodes of unknown dtype
Summary:

In https://github.com/pytorch/pytorch/pull/61687, a couple of FX Numeric Suite
tests were disabled.

This PR reenables one of these tests. We update the dtype inference logic
of NS to always return a specific type instead of sometimes returning
"fp32 or int8". When the type cannot be deduced by the current logic,
we do not shadow the node.

As a better version of dtype inference becomes available in FX Graph Mode Quantization,
we could migrate this code to use it.

Future PRs in the stack will unbreak other things to enable NS for FX to
work on torchvision again.

Test plan:

```
python test/test_quantization.py -k NumericSuite
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75471

Approved by: https://github.com/jerryzh168
2022-04-13 19:44:46 +00:00
Jerry Zhang
bc371a2cd0 [quant][fx][fix] Add additional checks when tracing back during maybe share output observer function
Summary:
Currently in `maybe_make_input_output_share_observers`  we trace back from a node to find the activation_post_process
of the input node, we have internal use case which would error out during tracing back, this PR is adding a guard
during this process to return False early when the node doesn't have any input

Test Plan:
not sure when this would happen, verify within the internal test case

Reviewers:

Subscribers:

Tasks:

Tags:

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75650

Approved by: https://github.com/vkuzo
2022-04-13 00:33:49 +00:00
Jerry Zhang
761bb06292 [quant][fx] Use native backend_config_dict in convert
Summary:
Previously the list of qat modules, fused modules etc. are hardcoded in the convert code, in this PR we get these information
from backend_config_dict instead

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
python test/test_quantization.py TestQuantizeFxModels
python test/test_quantization.py TestFXNumericSuiteCoreAPIs

Reviewers:

Subscribers:

Tasks:

Tags:

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75520

Approved by: https://github.com/vkuzo
2022-04-12 17:59:24 +00:00
Jerry Zhang
f83d047338 [quant][fx] Use native backend_config_dict in prepare
Summary:
Previously we are still relying on the registration mechnism and get the default quantize handlers that are registered,
now we have moved all registration to backend_config_dict we can get all quant patterns just from backend_config_dict now.

This PR enables using native backend_config_dict everywhere in prepare when the backend_config_dict is None, we'll also
do similar changes in convert as well

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
python test/test_quantization.py TestFXNumericSuiteCoreAPIs

Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75469

Approved by: https://github.com/vkuzo
2022-04-12 17:05:31 +00:00
Jerry Zhang
72d3d160fb [quant][fx] Remove additional_object_mapping from the docs (#75389)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75389

This seems to be removed before, so won't mark this PR as bc-breaking, this use case
is now enabled with backend_config_dict api

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D35451960

fbshipit-source-id: 21a8f19c1968af44bf4fa603f16ee8c6f5080e5a
(cherry picked from commit 2862f17b57f846b55736bc6b5d10df4256567adf)
2022-04-11 10:40:11 +00:00
Jerry Zhang
bcf6974c20 [qunat][fx] Remove "additional_fuser_method_mapping" key from prepare_custom_config_dict (#75388)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75388

This is now replaced with backend_config_dict, we don't want to expose the implementation detail to
users. We'll have docs for backend_config_dict later

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D35451958

fbshipit-source-id: 86e482d0782470ea02408836755cfc8531b8f66e
(cherry picked from commit 072541824b454e30df2b48758f465ebd814b436e)
2022-04-11 05:30:52 +00:00
Jerry Zhang
55d479aca5 [qunat][fx][bc-breaking] Remove "additional_qat_mapping" key from prepare_custom_config_dict (#75387)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75387

This is now replaced with backend_config_dict, we don't want to expose the implementation detail to
users. We'll have docs for backend_config_dict later

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D35451955

fbshipit-source-id: 77ede61f1d8f169dc1e1e6d847244ba99a97ab76
(cherry picked from commit 953576259fdc8827437acb6f5d04e584e37a7d64)
2022-04-11 05:03:49 +00:00
Jerry Zhang
f42bdff016 [qunat][fx][bc-breaking] Remove "additional_quant_pattern" key from prepare_custom_config_dict (#75386)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75386

This is now replaced with backend_config_dict, we don't want to expose the implementation detail to
users. We'll have docs for backend_config_dict later

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: ezyang

Differential Revision: D35451957

fbshipit-source-id: 52ebb5fb20cd96c1f21410b07c3d0c448c58cdba
(cherry picked from commit ccb38026f14644f9eb43335b7a7de5568c556394)
2022-04-09 16:43:41 +00:00
Jerry Zhang
f26891c8b7 [quant][fx] Using native backend_config_dict in fusion (#75378)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75378

Previously we are still relying on the registration mechnism and get the default fusion patterns that are registered,
now we have moved all registration to backend_config_dict we can get all fusion patterns just from backend_config_dict now.

This PR enables using native backend_config_dict everywhere in fusion when the backend_config_dict is None,
we'll do similar changes for prepare and convert in the future, to fully enable backend_config_dict in
quantization code base.

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
python test/test_quantization.py TestFXNumericSuiteCoreAPIs

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D35451962

fbshipit-source-id: 31d51850c669e061b67d6d9e0efec994f7ea79ed
(cherry picked from commit 60cc2dcadce705a923f9279465e3fb0e8fddad48)
2022-04-09 16:12:28 +00:00
Jerry Zhang
689cec9493 [quant][fx] Remove "additional_fusion_pattern" from prepare_custom_config_dict (#75377)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75377

This is in `prepare_custom_config_dict` but we never talked about them before, and we didn't find use cases internally
So it should be OK to remove.

We can now serve the same use case with `backend_config_dict` api

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D35451961

fbshipit-source-id: 8a44c4518eecd50fab7ea2ff06697527b1cdb049
(cherry picked from commit 964183ed26bd8f367a4cf7fcc991eb519dc31a58)
2022-04-09 05:31:19 +00:00
Jerry Zhang
dd667b6e97 [quant][fx] Move all fusion registrations to backend_config_dict (#75318)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75318

This PR moves the registrations for fusion patterns to backend_config_dict

Also fixed one issue in numeric suite graph matcher, since now (torch.nn.ReLU, torch.nn.BatchNorm3d)
would appear in quant patterns, (previously only in fusion pattern), and we need to match sure (torch.nn.ReLU, (torch.nn.BatchNorm3d, torch.nn.Conv3d))
can match before (torch.nn.ReLU, torch.nn.BatchNorm3d), but previously, it looks like (torch.nn.ReLU, (torch.nn.BatchNorm3d, torch.nn.Conv3d)) is not
really matched since `end_node_matches_reversed_fusion` is expecting a flattened pattern like (torch.nn.ReLU, torch.nn.BatchNorm3d, torch.nn.Conv3d),
for now we'll manually flatten this pattern, but in the future I think we might want to use the matching function `is_match` under torch.ao.quantization.fx.match_utils
to do this matching.

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
python test/test_quantization.py TestFXNumericSuiteCoreAPIs

Imported from OSS

Reviewed By: vkuzo, andrewor14

Differential Revision: D35423788

fbshipit-source-id: a54093ccebae9c59aeee9399669ddb2c48bfb9aa
(cherry picked from commit 6a55ea8eb2740cedafb9972888fedf68e927586d)
2022-04-09 05:08:37 +00:00
Andrew Or
0bdf9a9833 [Quant][fx] Decouple prepare_*fx from training/eval modes (#75401)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75401

This commit removes asserts that require prepare_fx to
be run in eval mode and prepare_qat_fx to be run in training mode.

Test Plan:
python test/test_quantization.py TestQuantizeFx.test_prepare_mode

Imported from OSS

Reviewed By: vkuzo, jerryzh168

Differential Revision: D35457100

fbshipit-source-id: 13a55b13d9e389991f69c06c6a70bc51cdebba36
(cherry picked from commit fb0685e0873dc8e807da3213be403b51e8b4a687)
2022-04-08 15:34:08 +00:00
Jerry Zhang
e4817e6c13 [quant][fx] Move embedding ops to backend_config_dict (#75317)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75317

att, similar to previous PRs, this one moves dynamically quantized rnn ops
to backend_config_dict

we have some temporary configs in backend_config_dict, but it will be removed soon, we want to migrate
everything to backend_config_dict so that we can enable this path for all the code in the code base, starting
from prepare, then to convert. We can start this process after this PR

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
python test/test_quantization.py TestFXNumericSuiteCoreAPIs

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D35423789

fbshipit-source-id: 9391bde6f4cbceb45de4ce9aaee136c9bfde8ab7
(cherry picked from commit 909edb9f131e9ba047b49d51a6c300da77988cb3)
2022-04-08 15:10:12 +00:00
Jerry Zhang
9905b1f29a [quant][fx] Move rnn ops to backend_config_dict (#75316)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75316

att, similar to previous PRs, this one moves dynamically quantized rnn ops
to backend_config_dict
Currently the dtype check is not yet enabled, so we provided the dtype_configs but it is not really used yet,
we will enable it a bit later after we moved everything to backend_config_dict

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
python test/test_quantization.py TestFXNumericSuiteCoreAPIs

Imported from OSS

Reviewed By: malfet

Differential Revision: D35423792

fbshipit-source-id: ef862ea1be5bfb4c28130775c3b2158df28d3e22
(cherry picked from commit 0247f3a768a2c165f482a66c4225b3357e33e966)
2022-04-08 08:58:50 +00:00
Terry Chen
37dea0454d [quant] add checking number of args when checking observer in same graph (#75460)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75460

add checking for number of args checking observer in same graph

Test Plan:
python3 test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: malfet

Differential Revision: D35479504

fbshipit-source-id: d7dc38a27fdf8e0b236b6976d484b0701c61184c
(cherry picked from commit 45542f796f5e6f6259f3ec647dbd2a9fa69ababc)
2022-04-08 03:56:03 +00:00
Charles David Hernandez
8ac4729105 [ao][sparsity] Composability of fusion and sparsity (#74847)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74847

Similar to the other PRs in this stack, the main problem was
that fusion needed to detect the original module type of parametrized
module when sparse prepare was called before fusion. In addition, there
was a potential issue with fusion before sparse_prepare but after the
sparse_config is created. However, in practice fusion moves the
references to the original modules into the fused module without issue.
Thus the original sparse_config that pointed to the original modules
gets automatically updated. If the fusion method changes this may cause
an issue since no explicit handling or updating of these pointers was
needed.

Test Plan:
python test/test_ao_sparsity.py TestComposability

Imported from OSS

Reviewed By: vkuzo, andrewor14, jerryzh168

Differential Revision: D35240273

fbshipit-source-id: 62ed66689b285c3fa68f1e149266ab877f1cdd8e
(cherry picked from commit 2adb002c43f702fa1f18637157264fcbc545002a)
2022-04-08 00:44:12 +00:00
Jerry Zhang
7f0d79625b [quant][fx] Move output share qparam with input ops to backend_config_dict (#75315)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75315

att, similar to previous PRs, this one moves the ops whose output tensor shares qparams with input
to backend_config_dict

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
python test/test_quantization.py TestFXNumericSuiteCoreAPIs

Imported from OSS

Reviewed By: malfet

Differential Revision: D35423791

fbshipit-source-id: b24efc31b79bf6f0c98709a760fb9fba55610c0a
(cherry picked from commit 7535d8da1d052b490566ee60eebce26e68f35ea2)
2022-04-07 20:13:54 +00:00
Jerry Zhang
e167244aa4 [quant][fx] Move the remaining fixed qparam ops to backend_config_dict (#75314)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75314

this is a refactor to use backend_config_dict for operators with fixed quantization parameters
api is not final yet, we'll update the api after we moved everything to backend_config_dict

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
python test/test_quantization.py TestFXNumericSuiteCoreAPIs

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D35423790

fbshipit-source-id: a69ce19340e2e3c996f1435b887ba122de85f22f
(cherry picked from commit 5d35983a3bac4281f8636f69ffb68adb358e9a5f)
2022-04-06 16:11:14 -07:00
Jerry Zhang
7adf59a3bd [quant][fx] Add BatchNorm ops to backend_config_dict (#75260)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75260

This PR adds bn2d, bn3d and corresponding bn - relu to backend_config_dict

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: andrewor14

Differential Revision: D35403587

fbshipit-source-id: 0611acd54478030ca8a6dc08e8552b8e04e1777c
(cherry picked from commit 1f6e83dc51810aaafbc0a45812d210f6fe2112ed)
2022-04-06 16:11:13 -07:00
Jerry Zhang
2f3a94996c [quant][fx] Add cat to backend_config_dict (#75259)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75259

att

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
python test/test_quantization.py TestFXNumericSuiteCoreAPIs

Imported from OSS

Reviewed By: andrewor14

Differential Revision: D35403586

fbshipit-source-id: 066ace239a7ca5a49463f6fcc2fa10e3efef8794
(cherry picked from commit e4b7d91cc48f3a4c913940bf292272c5418c5cb0)
2022-04-06 16:11:13 -07:00
Jerry Zhang
86485f61c5 [quant][fx] Remove the remaining registrations in BinaryOpQuantizeHandler (#75258)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75258

att, the remaining registrations are for fp16 ops which are no longer used

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: andrewor14

Differential Revision: D35403588

fbshipit-source-id: fc328d42f4cb80901ed545a11fdde49ee7ff8b2e
(cherry picked from commit fbe2db090cf8d1221dd37d19636058d8dd44c728)
2022-04-06 16:11:13 -07:00
Jerry Zhang
53f7233004 [quant][fx] Move all binary op configs to backend_config_dict (#75241)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75241

We have a previous PR that enabled operator.add in backend_config_dict, this
PR moved the rest binary ops to backend_config_dict.
There are some ops left, which are not needed (previously fp16 ops), we
will move them in the following PR

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
 python test/test_quantization.py TestFXNumericSuiteCoreAPIs

Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D35403589

fbshipit-source-id: 663703b310944a6b7c5ade6d07a4d938a6ca082b
(cherry picked from commit 5a76ce031872c4fed5fcab5bb3c84a9394b01118)
2022-04-06 16:11:13 -07:00
Jerry Zhang
ff7051781f [quant][fx] Remove Standalone and CustomModule QuantizeHandler type checks in prepare (#75202)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75202

Instead of checking the type we use a method in the QuantizeHandler to check if a module
is a standalone or custom module, not user facing

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D35379641

fbshipit-source-id: c2f970c7e27f74793fa67f8fd5a16a43525e35aa
(cherry picked from commit 251500f06359c9046dd9067543cc80be24ddee33)
2022-04-06 17:47:33 +00:00
Jerry Zhang
a90bcd2066 [quant][fx] Support override observers and fake quantize module in backend_config_dict (#75135)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75135

Some operators have fixed quantization parameters, this PR adds the support to override the
qconfig in the backend_config_dict

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D35334279

fbshipit-source-id: 390510bd8fc2d61004c36c54390989583e6519ce
(cherry picked from commit ccf9bcd7eb4564ec97c5e0548b8ee926f640360b)
2022-04-06 07:00:32 +00:00
Jerry Zhang
9817875729 [quant][fx] Add support for BinarOpQuantizeHandler in backend_config_dict (#74882)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74882

This PR adds support for ops like add/mul in backend_config_dict, these ops have different
observation_type based on the number of tensor inputs, when number of tensor inputs is 1,
we will share the output observer with input, otherwise we'll have a new observer.

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo, andrewor14

Differential Revision: D35236032

fbshipit-source-id: 7077f3ccee8a5d8d19b40107cf8ff16cceafc535
(cherry picked from commit a6f7a37f99fc727269d022d35cc5c0157b70c656)
2022-04-06 06:42:03 +00:00
Charles David Hernandez
9bb21fac95 [ao][sparsity] make sparsity compose with PTQ convert (#74846)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74846

This PR primarily allows the PTQ convert function to work with
parametrized modules. Given that the parametrized weight is what is used
by default in convert, as long as sparsifier.step() has already been
called, the converted model will use the sparisified weights. There is
currently no way to handle things if sparsifier.step() has not been
called. Lastly, added the is_leaf_or_only_parametrized function because
parametrized modules no longer look like leaves due to the
parametrizations module attached to them

Test Plan:
python test/test_ao_sparsity.py TestComposability

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D35240275

fbshipit-source-id: 48529f2a83edfe6d8a2d2dff8ca3d08a3fb0d553
(cherry picked from commit 9d6361482e2885db964e02b0222cd23c9f4d469e)
2022-04-06 04:27:16 +00:00
Jerry Zhang
23bcab19a9 [quant][refactor] Refactor find_matches for easier future extension (#74878)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74878

Previously we record the matched node as a list of nodes: `List[Node]`, this does not generalize
to a graph, which is needed for future use cases, in this PR we changed the recorded node as
NodePattern instead, currently defined as
```
NodePattern = Union[Tuple[Node, Node], Tuple[Node, Tuple[Node, Node]], Any]
```
but can be more general.

This will allow us to support more general patterns with backend_config_dict api, and is also needed
for BinaryOpQuantizeHandler refactor

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D35203616

fbshipit-source-id: f4bf5b056cfc0955455eea9c2bf1ac9f6dde3974
(cherry picked from commit b290c047e1861bbb62fb1bb576761e801b210220)
2022-04-05 06:53:35 +00:00
Charles David Hernandez
02e30a09f7 [ao][sparsity] make sparsity and PTQ compose (#74845)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74845

This PR adds support for quantization flow to detect
parametrized modules and match them using their original module types.
This mainly involved using the new type_before_parametrizations function rather than
type to check for module mathcing

Test Plan:
python test/test_ao_sparsity.py TestComposability

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D35240274

fbshipit-source-id: 7294d89c9c2e069e51d8b9bafa45c15f92bed124
(cherry picked from commit ed5cdb7b636c42e040d1b4a67b6b94604d06e1ff)
2022-04-05 03:35:41 +00:00
Jerry Zhang
88edc21828 [quant][fx] Fix lowering pass for cases when to is not called with positional args (#75146)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75146

Previously we assume `to` must be called with positioanl args, but this may not be the case,
e.g. we can do `to(dtype=?)` or `to(memory_format=?)`

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: ejguan

Differential Revision: D35342088

fbshipit-source-id: 22bfe78ae84e74141ae6560285c5c38bc068c999
(cherry picked from commit a3593c0bb658a4615559c951ee68c9a6f55074d5)
2022-04-04 23:52:15 +00:00
Andrew Or
ee9335a608 [Quant][fx] Define native backend_config_dict for linear and conv (#74636)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74636

This commit changes how quantization patterns for linear
and conv are set up in prepare. Previously, these were set up
through ConvReluQuantizeHandler and LinearReLUQuantizeHandler.
After this commit, however, these were set up through the
corresponding entries in the native backend_config_dict,
rendering the above quantize handlers no longer necessary.
In future commits, we will do the same for the remaining ops.

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: jerryzh168, ngimel

Differential Revision: D35225680

fbshipit-source-id: 4a79f63a11fce46701eb17aaf3619c1e827d72a4
(cherry picked from commit 475f599821cd32d3ba71ba086885ecdc4cbee755)
2022-04-04 14:07:15 +00:00
Jerry Zhang
bd032cd8d6 [quant][fx] Remove is_output_quantized from QuantizeHandler (#74843)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74843

is_output_quantized is used to check if we should quantize the op based on the dtype configuration in qconfig and what
is supported by the backend, we'll skip inserting observer if the dtype configuration is not supported by the backend,
this is now supported by backend_config_dict, and we can remove this function now.

Also we previously supported fp16 static quantization for some ops for one of our internal use case, and now it is not required, so
we can remove them

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: andrewor14

Differential Revision: D35190541

fbshipit-source-id: 623d961810737ec01e1f8b269ec48a6a99bb284a
(cherry picked from commit a405998c60c0146dbd5feef60e2d5cb3b0aa289c)
2022-04-02 16:21:54 +00:00
Jiaxu Zhu
8a7c9a5e01 [quant] Always match the first matchable pattern in fuse (#75047)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75047

As title, For instance,

We match two patterns
```
(add, (bn, conv), matchallnode)
(add, matchallnode, (bn, conv))
```

Against the model
```
conv1 -> bn1 |
conv2 -> bn2 + add
```

For the add node, both two patterns passes `is_match` and `apply_match` is executed twice. As a result, both `conv1 -> bn1` and `conv2 -> bn2` will be matched as `(bn, conv)` instead of one `(bn, conv)` one `matchallnode`.

To fix this, stop trying all the other pattners once a pattern is matched.

Test Plan: verified in D35252100

Reviewed By: jerryzh168

Differential Revision: D35300191

fbshipit-source-id: 383b2eb971d436072e1c28597c5b6a01d0f49c5a
(cherry picked from commit 89d08ea2d2840e01ec3dd40da3f58405577c78fc)
2022-04-01 17:22:36 +00:00
Nikita Shulga
0b845bb645 Revert D35258695: [quant][fx] Cleanup unused to_fp16 check code in lowering
Test Plan: revert-hammer

Differential Revision:
D35258695 (ec6f767097)

Original commit changeset: 2297696493fe

Original Phabricator Diff: D35258695 (ec6f767097)

fbshipit-source-id: 8c2d608f3c585b8c00275f64a82478cb1af25b50
(cherry picked from commit b0b5f4f3e3414cb538ab35bc2968d39408157c7b)
2022-03-31 10:55:55 +00:00
Terry Chen
b82df92c33 [quant] Fix qmin/qmax when using customized qrange (#74717)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74717

currently the weight map to 0 and max_float to 65535 due to incorrect qmin/qmax in qin16 customized qrange
the expectation from the set observers is the integer representation is supposed to be a signed int16 i.e -32768 to 32767.

Test Plan: Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D35129924

fbshipit-source-id: 924902dd7e64c1218971422ba2451c2a484fd2f4
(cherry picked from commit 95659cdeeec7b3a01a64355244847e211c6dd2a6)
2022-03-31 07:49:17 +00:00
Terry Chen
3c701468dc [quant][ns] Fix ns tool bug for mobilenetv2/v3 (#74149)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74149

mobilenet v2/v3 failed when using ns tool to analysis the model
due to the empty the tensor, fixed it by filtering the empty tensor

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D34851886

fbshipit-source-id: db94fd5cef7d4a7a128d46bfe3f5ff4e532845fe
(cherry picked from commit 4616a75105abf187a178d95165249cd33345515d)
2022-03-31 05:31:51 +00:00
Jerry Zhang
ec6f767097 [quant][fx] Cleanup unused to_fp16 check code in lowering (#74969)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74969

We can remove the check for fp16 ops now since we confirmed that fp16 ops are not
used

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: andrewor14

Differential Revision: D35258695

fbshipit-source-id: 2297696493feb62a4c959e7fbdd6123f59615ef1
(cherry picked from commit a1b4658e661ce610e264e083dfa738c31859ec1a)
2022-03-31 04:25:43 +00:00
Charles David Hernandez
bf091f78a6 [AO][bugfix] Fixing FX QAT but for untraceable modules (#74277)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74277

see issue: https://github.com/pytorch/pytorch/issues/74240

this fixes that issue by skipping the children of untraceable modules during
propagate_qconfig. This required extending said function to take the
prepare_custom_config_dict as an optional argument.

Test Plan:
python test/test_quantization.py
python test/test_quantization.py TestQuantizeFx.test_qat_skip_untraced

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D34916074

fbshipit-source-id: 11caba2cbf78566fb51adf698b01bbba0275de28
(cherry picked from commit 5324c48e4c3277bb12a716a4408151c86006ee47)
2022-03-30 15:08:45 +00:00
Jerry Zhang
5f94eea495 [quant][fx] Remove input_output_observed from BinaryOpQuantizeHandler (#74776)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74776

when both inputs are scalars, fx tracing will directly calculate the result, instead of generating an op in the fx graph
so num_tensor_args will always be greater than 1 for binary ops, so the input_output_observed will always return True
for BinaryQuantizeHandler

We will remove input_output_observed method after dynamic quantization in qconfig is properly supported

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: albanD

Differential Revision: D35153531

fbshipit-source-id: fa777429eeb64a6a78a98f8d8dcd9e0903c8b209
(cherry picked from commit 676becb650daf29977dbfeb8307de1b19a8d9243)
2022-03-28 16:41:21 +00:00
Jerry Zhang
550d50ed0a [quant][fx] Remove should_insert_output_observers (#74775)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74775

We have simplified the way we insert observers, for add_scalar it now behaves the same way
as general_tensor_value ops, which means we only need to keep is_general_tensor_value_op now,
the other methods can be removed

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D35153532

fbshipit-source-id: 2d17189e167a9932bdbf5ae46b3ced25b7128c2f
(cherry picked from commit 7cf7c8a522171f58954b227917e5c75cdfdddb1c)
2022-03-28 16:22:45 +00:00
Andrew Or
ea2d58a3df [Quant][fx] Refactor lowering code (part 2) (#74619)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74619

This commit is part 2 of the effort to refactor the
lowering code in _lower_to_native_backend.py. The main change
included in this commit is generalizing the pattern matching
code across different lowering functions. There should be no
change in behavior with this PR.

A future commit will further merge the static and dynamic
lowering code paths.

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D35082210

fbshipit-source-id: 7f0347c9449cc9ca68fee5a807c792222f0d1749
(cherry picked from commit 16d34c13c7eb0553680713878b52ece9c8884a1f)
2022-03-28 14:40:31 +00:00
Jerry Zhang
0747bdbf11 [quant][fx] Removing more unused code (#74603)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74603

att

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: andrewor14

Differential Revision: D35071546

fbshipit-source-id: 273a7f0cb2a8f306864eb118916056fad3bb1399
(cherry picked from commit 9c31a50a2bccb2e5b7a5db833085a75e5ebda707)
2022-03-25 16:39:48 +00:00
Jerry Zhang
66e07f2aef [quant][fx] Merge is_general_tensor_shape_op into is_general_tensor_value_op in QuantizeHandler (#74601)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74601

Currently the behavior for general tensor shape op and general tensor value op are the same, so we can remove
this flag and merge with the is_general_tensor_value_op flag.

is_general_tensor_value_op flag is used in two places in prepare:
(1). dtype propgation: we only do dtype propgation when this flag is true (this will be refactor in the future to be more systematic)
(2). observer sharing, we'll use the input observer instance as output observer for an op if this flag is True

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: george-qi

Differential Revision: D35071438

fbshipit-source-id: 5e8f5fd84e37db0433a63fe0a0e212ce3c5908d6
(cherry picked from commit b4bbc9fa0e65f3768eb97ca8e84b7cbd7e840b67)
2022-03-25 11:10:44 +00:00
Jerry Zhang
b347b8c191 [quant][fx] Support some default ops in the native backend config (#74600)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74600

Following https://github.com/pytorch/pytorch/pull/74210, this PR adds the support for some ops
using the DefaultNodeQuantizeHandler in the backend_config_dict defintion for pytorch native backend

TODO: There is still a few ops we didn't handle with backend_config_dict path: gelu and softmax, need to discuss if we still need them, if so we can change the test
to use backend_config_dict and remove the DefaultNodeQuantizeHandler after that

Test Plan:
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: andrewor14

Differential Revision: D35071437

fbshipit-source-id: 70351d2810ca1ac7dc09d4a9c239f6757ccb51ca
(cherry picked from commit 5e68f755a32ba7d90d6c73db9c2017f9c58d7fa5)
2022-03-25 02:59:36 +00:00
Jiaxu Zhu
7c1f3cc89e [quant] Populate FakeQuantize quant_min/quant_max to observer (#74581)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74581

As title, currently the quant_min/quant_max of the FakeQuantize are not populated to the observer. We plan to populate when they are both not None.

To do this we need to do
1. Remove the current default quant_min/quant_max value (0/255) as it's not universal for various dtype.
2. Move the upper bound/lower bound check before creating the observer.

Test Plan:
```
[jiaxuzhu@devvm3400.frc0 /data/users/jiaxuzhu/fbsource/fbcode] buck test mode/dev //caffe2/test:quantization -- --exact 'caffe2/test:quantization - test_quant_min_max_override (quantization.core.test_workflow_module.TestFakeQuantize)'
Parsing buck files: finished in 0.8 sec
Downloaded 0/2 artifacts, 0.00 bytes, 100.0% cache miss (for updated rules)
Building: finished in 9.5 sec (100%) 18535/84579 jobs, 2/84579 updated
  Total time: 10.3 sec
More details at https://www.internalfb.com/intern/buck/build/1cab97ef-0788-4d06-92ed-a828995e3bde
BUILD SUCCEEDED
Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details.
Running with tpx session id: 24be645e-eebc-45d6-8111-052ef1225fa0
Trace available for this run at /tmp/tpx-20220323-094106.724238-24be645e-eebc-45d6-8111-052ef1225fa0/trace.log
RemoteExecution session id: reSessionID-24be645e-eebc-45d6-8111-052ef1225fa0-tpx
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/5066549674998735
    ✓ ListingSuccess: caffe2/test:quantization : 483 tests discovered (20.179)
    ✓ Pass: caffe2/test:quantization - test_quant_min_max_override (quantization.core.test_workflow_module.TestFakeQuantize) (18.896)
Summary
  Pass: 1
  ListingSuccess: 1
If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/5066549674998735
```

Reviewed By: jerryzh168

Differential Revision: D34971236

fbshipit-source-id: 4407fd03116a296053256b333f7ce6d28dcc9c42
(cherry picked from commit f6980bccea802f220cc5b6dfe1bf3a3a3eef0a34)
2022-03-24 18:23:40 +00:00
Digant Desai
09f32eba7a [quant] Add default symmetric qat qconfig for qnnpack (#74507)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74507

* This is the default symmetric qat qconfigs for qnnpack.
* Support for symmetric quantization is not available from other backends.
* Observers are similar to symmetric PTQ qconfigs for qnnpack.

Reviewed By: jerryzh168

Differential Revision: D34804808

fbshipit-source-id: 22c11b89242a98f54029ac195f7b984e42809164
(cherry picked from commit ea751ded1174ba2c2f061bafc81573faaf248a9a)
2022-03-24 16:19:28 +00:00
Jerry Zhang
93a1068d09 [quant][fx] Relax the constraint for input of custom module nodes (#74510)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74510

Previously we require the dequantize before custom module to have one user, this is because we are removing the dequantize node
before custom module while we transform an observed custom module to a quantized custom module, but actually we don't need to remove it,
we can just change the input of custom module with quantize node instead. If the dequantize node only has one user, it will be removed
by the dead code elimination pass that was added recently.

Test Plan:
python test/test_quantization.py TestQuantizeFx.test_custom_module_class_input_has_multiple_users

Imported from OSS

Reviewed By: dzdang

Differential Revision: D35034626

fbshipit-source-id: eea9fbf9fb34c61f114c6431377be347632ce36d
(cherry picked from commit 2878085a56bc529afef5e533bc5f49079d4adc52)
2022-03-23 18:50:49 +00:00
Jerry Zhang
e9776fe58c [quant][fx] Support conv1d and its fusion variants in QAT (#74506)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74506

This PR supports qat Conv1d, ConvBn1d, ConvBnReLU1d, ConvReLU1d in qat in FX Graph Mode Quantization

Test Plan:
python test/test_quantization.py TestQuantizeFx.test_conv_bn_relu

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D35032995

fbshipit-source-id: 645da33f0d893aa44f35ee1384fd1539a9c788e7
(cherry picked from commit 6b583baa74c5a4fd2f50270d633f277e2fc94716)
2022-03-23 18:43:53 +00:00
Jerry Zhang
56f218edb0 [quant][fx] Remove unused method from QuantizeHandler (#74408)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74408

Removed
* should_mark_output_quantized_from_input_quantized_status
* _maybe_get_last_node_only_observer

since they are used in the previous convert code, which arep no logner needed

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D34984301

fbshipit-source-id: 0c46126576bd4ef633f4de530d01364e68f7ed39
(cherry picked from commit d14d094c4de308f08181920cd0611ea1bc664605)
2022-03-22 07:51:32 +00:00
Jerry Zhang
ae23ad19f8 [quant][fx] Cleanup quantization_patterns.py (#74407)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74407

The convert method of QuantizeHandler is no longer used after the convert refactor, this PR removes them

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D34983830

fbshipit-source-id: cf9a6a19bd0ae035ba33497eecf74e98658dd5c7
(cherry picked from commit d85eb0f77513ef5f5f10543df6dec8b65b4985a3)
2022-03-21 18:46:02 +00:00
Jerry Zhang
b86554abed [quant][fx] Fix dynamic weighted op lowering when input is used multiple times (#74364)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74364

if a input is used multiple times in modules that are dynamically quantized:
```
x -- linear1
  \-- linear2
```
we'll insert quantize_per_tensor_dynamic and dequantize for input, and we'll have a duplicate pass
to duplicate dequantize ops for pattern matching:
```
x - quantize_per_tensor_dynamic - dequantize1 - linear1
                     \----- dequantize2 - linear2
```

But we also have a check in the lowering code that if quantize_per_tensor_dynamic is used by multiple nodes
we'll skip the pattern, so the pattern is not recognized, we need to duplicate quantize_per_tensor_dynamic as well in this case
to recover both patterns:
```
x - quantize_per_tensor_dynamic1 -- dequantize1 -- linear1
   \- quantize_per-tensor_dynamic2 -- dequantize2 -- linear2
```
so that they can be fused into dynamic linear:
```
x - linear_dynamic1
\-- linear_dynamic2
```

Test Plan:
python test/test_quantization.py TestQuantizeFx.test_dynamic_linear_input_multiple_use

Imported from OSS

Reviewed By: yixin94

Differential Revision: D34952755

fbshipit-source-id: a950159fd6a661e84faf0baf1692f6783904cfb3
(cherry picked from commit 8a6896801fdd96a55476faca4ccb7ba0b0bdb058)
2022-03-18 23:09:33 +00:00
Jerry Zhang
dbf43d621d [quant][fx] Only do reference moduel swapping for floating point fused modules (#74231)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74231

Add a check to make sure the weighted modules we swap is actually a float fused module,
since the reference fused module like reference version of linear - relu would have the same
fused type as the floating point linear - relu (and the linear submodule will have different types)

Test Plan: phabricator diff for now, can add a test case after we know exactly what the problem is

Reviewed By: andrewor14

Differential Revision: D34888290

fbshipit-source-id: a7f53368a7c17f7d1a82afaa50d14d569b4923df
(cherry picked from commit 458dac9fdf8b4f0d786bf9c815c2f2fe8df13bb4)
2022-03-18 22:20:16 +00:00
Digant Desai
cfe1a41b01 [quant] Add default symmetric qconfig for qnnpack (#74396)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74396

# New qconfig `default_symmetric_qnnpack_qconfig`

Returns a qconfig with signed activation and symmetric weights with range restrictions. Also adds per_channel variant for the same.

## Restrictions on weights

Restrictions on weights include,
1. weight zero point is force zero. and
2. weight 8-bit signed quantized value are limited to [-127, +127] excluding the value +128.

This is driven, in part, by the desire to achieve better performance by XNNPACK ops.

## qengine/backend = `qnnpack` and XNNPACK ops

Qconfig returned by this function allows us to use faster XNNPACK quantized ops for CPUs w/ said restrictions. Although we are using XNNPACK ops the qengine is still `qnnpack`, and there are no plans to introduce a new qengine for XNNPACK ops. Support to use XNNPACK ops with asymmetric (returned by get_default_qconfig()) qconfig is WIP.

## Updated EPS value:
* From PyTorch:

eps:
```
>>> import torch
>>> torch.finfo(torch.float32).eps
1.1920928955078125e-07
>>> torch.finfo(torch.float32).eps.hex()
'0x1.0000000000000p-23'
```
All scale values are float32 and `scale = max(scale, eps)`

* Requirement from XNNPACK

For both fp32 as well as rndnu requantization schema, `0x1p-32 <= requantization_scale < 256.0`
Where, requantization_scale = (input_scale * kernel_scale) / (output_scale)

* New minimum allowed scale value

With current float32 eps (=0x1p-23) as minimum, xnnpack lower bound is the problem. We haven’t observed upper bound issues so far with assuming the max scale value of 256. So focusing on the lower bound, to cover all possible cases of requantization value, conservatively, we must have the minimum possible requantization scale value such that,

```
minimum_requantization_value = xnnpack_lower_threshold
input_scale * kernel_scale / output_scale = 0x1p-32
min_scale_value * min_scale_value / max_scale_value = 0x1p-32
min_scale_value * new_eps / 256 = 0x1p-32
min_scale_value**2 = 0x1p-24
min_scale_value = 0x1p-12
```

With `scale_value >= 0x1p-12`, we should be able to avoid the lower threshold on requantization scale by xnnpack kernels.

Obviously this is a very unlikely to happen. So practically, we should be get away with much smaller value than `0x1p-12` as EPS, but it is not easy to choose a smaller value empirically.

* Impact on accuracy is unclear as of writing this.

Reviewed By: kimishpatel

Differential Revision: D34625300

fbshipit-source-id: 005e6757ed1185b3940b58ac55246cba8b267828
(cherry picked from commit 61ed1a2a308a1792ccbfc316153a6dc39798f02a)
2022-03-18 13:42:41 +00:00
Jiaxu Zhu
dc0c94910f [quant] Don't regard MatchAllNode as node matched (#74198)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74198

As title, currently in the (add, X, MatchAllNode) pattnern, the node matched with MatchAllNode is regard as part of the pattern instead of the input. As a result, the possible patterns ends with that node will not be matched.

For instance, we have two patterns
1. (nn.ReLU, (torch.add, MatchAllNode, (nn.BatchNorm2d, nn.Conv2d)))
2. (nn.ReLU, (nn.BatchNorm2d, nn.Conv2d))

And we wanna fuse the following model

Conv2d -> BatchNorm2d -> ReLU +
Conv2d -> BatchNorm2d ------ Add -> ReLU

The pattern in the first row cannot be matched becaues the end node ReLU is recorded as MatchAllNode already.

Test Plan:
new unit test
```

[jiaxuzhu@devvm3400.frc0 /data/users/jiaxuzhu/fbsource/fbcode] buck test mode/dev //caffe2/test:quantization_fx -- --exact 'caffe2/test:quantization_fx - test_fusion_pattern_with_matchallnode (quantization.fx.test_quantize_fx.TestFuseFx)'
Parsing buck files: finished in 0.9 sec
Downloaded 0/2 artifacts, 0.00 bytes, 100.0% cache miss (for updated rules)
Building: finished in 12.6 sec (100%) 18546/84011 jobs, 2/84011 updated
  Total time: 13.5 sec
More details at https://www.internalfb.com/intern/buck/build/9d2decdb-d01e-4332-84f5-1728a65d4f7b
BUILD SUCCEEDED
Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details.
Running with tpx session id: d92e10b8-9209-4e9e-95a6-2fcac02db251
Trace available for this run at /tmp/tpx-20220314-161230.347672-d92e10b8-9209-4e9e-95a6-2fcac02db251/trace.log
RemoteExecution session id: reSessionID-d92e10b8-9209-4e9e-95a6-2fcac02db251-tpx
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/3377699814955263
    ✓ ListingSuccess: caffe2/test:quantization_fx : 365 tests discovered (19.275)
    ✓ Pass: caffe2/test:quantization_fx - test_fusion_pattern_with_matchallnode (quantization.fx.test_quantize_fx.TestFuseFx) (17.760)
Summary
  Pass: 1
  ListingSuccess: 1
If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/3377699814955263
```

Reviewed By: jerryzh168

Differential Revision: D34873730

fbshipit-source-id: dc78455c7233ba33e9ab215f50754b1656b7dbc7
(cherry picked from commit 1cc74cadd7dc725be97064f57c910ef9d1bbe1a8)
2022-03-17 20:12:35 +00:00
Jerry Zhang
975c9f15bd [quant] Rename _convert_do_not_use.py to convert.py (#74322)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74322

att, also change all references to _convert_do_not_use

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestAOMigrationQuantizationFx

Imported from OSS

Reviewed By: andrewor14

Differential Revision: D34936430

fbshipit-source-id: c96fb887847383bf47f0ec4219127e96e2b63b2d
(cherry picked from commit 8ad5a9e031e6ca4ede2656d9b2f7906a82b57c1c)
2022-03-17 18:57:08 +00:00
Jerry Zhang
a6bed4deaa [quant][fx] Remove convert.py since it is not used now (#74276)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74276

Removing convert.py since we have rerouted the traffic to _convert_do_not_use, we'll do a rename in the follow up PR

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D34914261

fbshipit-source-id: 09ad520d95fa91c525222a69474930efb3571088
(cherry picked from commit 8aeb33206f3572132356fe78395aa3ce6aff11cd)
2022-03-17 18:57:08 +00:00
andrewor14
a705486915 [Quant][fx] Refactor lowering code (part 1) (#74128)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74128

**Summary:** This commit is the first step towards refactoring the
lowering code in _lower_to_native_backend.py. The main changes
included in this commit are:

(1) Remove the use of the subgraph rewriter in lowering
(2) Replace the use of `is_match` with manual pattern matching

The motivation behind (2) is it simplifies the lowering code
significantly; previously we had many different but similar
patterns for slightly different models. There should be no
change in behavior with this PR.

Note that this is only part 1 of the refactoring. Part 2
will merge the static and dynamic lowering code paths
and refactor the currently duplicate pattern matching /
cleanup code into common helper functions.

**Test Plan:**
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

**Reviewers:** jerryzh168

**Subscribers:** jerryzh168

Test Plan: Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D34910597

Pulled By: andrewor14

fbshipit-source-id: c6fea0c538ce5efc5afaf53e072922528988dda7
(cherry picked from commit fa05cb9fc0909fe6e199a6b50ea2001c9e9ac0ee)
2022-03-17 03:30:22 +00:00
Charles David Hernandez
c1d070d0f0 [ao] Fixing obs insertion through dtype propagation (#73274)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73274

As noticed in https://discuss.pytorch.org/t/calibration-of-model-in-post-training-static-quantization-using-fx-api/143661/6
and related to https://github.com/pytorch/pytorch/issues/72698 when using fx quantizaiton, if an op like view was used in a
model and the index parameters were passed in to the ops with a
variable rather than
hard coded, fx would mistakenly insert observers for them, leading to an
error when the observer tried to do tensor only operations on a
non-tensor. To fix this, an API was added to specify non tensor
arguments for various ops to enable better dtype propagation.
NON_TENSOR_ARG_DICT is a nested dict whose first key is a named tuple
which contains matching parameters for ops with nontensor args, the
inner dict's keys are dtypes and the values are a list of those arg indices that
take use such dtypes. Alternatively, instead of a list, the inner dict
value can also be a function that takes the node as an argument and
returns the list of arg indices.

Theoretically this api can support arbitrary functions but the current
implmentation is limited to simpler functions given the particular
issue this fixes seems to be rare.

Note: although torch.unsqueeze and torch.transpose are listed in
quantization_patterns.py, those ops appear to be untraceable by fx. I've
included tests for their cases but fixing this issue is beyond the scope
of this PR

Test Plan:
python test/test_quantization.py test_non_reference_size
...
python test/test_quantization.py test_non_reference_<op>

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D34410122

fbshipit-source-id: fc09949ca8a2d6473876a4b6c214eb91e9a9dae2
(cherry picked from commit 3a1375d677b7c98d62b1f5c839645698c39b32b9)
2022-03-16 01:41:17 +00:00
Jerry Zhang
ca4348f628 [quant][fx] Allow incrementally remove the items in quantization_patterns.py (#74210)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74210

This PR added a codepath for getting patterns (quantize handlers) for the backend_config_dict for native backend when
backend_config_dict is None. This would allow us to incrementally define the backend_config_dict for
pytorch native backend and gradually remove the entries in quantization_patterns.py

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: dzdang

Differential Revision: D34899783

fbshipit-source-id: 7f31292948d7fc4566e51e175b41511f52d0a880
(cherry picked from commit a9f6ebd6478f362d5bb9c5ae04e02369e00f550c)
2022-03-16 00:20:52 +00:00
Jerry Zhang
9a0b7b4723 [quant] Fix implementation for output_quantized_idxs in convert (#74140) (#74229)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74229

Previously we did not successfully remove the dequantize node for `dict`, this PR fixes that, tested with
meta-only tests right now but we should follow up with oss tests (with dict output)

since we called dead code elimination pass, some of the inplace operators are removed in the TestQuantizeFx.test_fixed_qparams_ops,
in this PR we also just removed the calls to the inplace ops, and changed the expected results in the test case,
in the future PR we can remove the support for inplace operators, since it is not really supported in fx, and it's OK
for us to skip them as well

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D34888140

fbshipit-source-id: 48cea842b49e52baa8eee3ce0f4bfb4a3625ab2a
(cherry picked from commit ef790315ebcf954930deb6b9d1c384992c1f1ec8)
2022-03-16 00:00:13 +00:00
Natalia Gimelshein
1e64c8a8e3 Revert D34846005: [quant] Fix implementation for output_quantized_idxs in convert
Test Plan: revert-hammer

Differential Revision:
D34846005 (a7f9fb997a)

Original commit changeset: 4313ed6adff4

Original Phabricator Diff: D34846005 (a7f9fb997a)

fbshipit-source-id: c5719b0ad2514277b6ea026cbc0153613bf52d0c
(cherry picked from commit 84ed43a6d185879209bedca8e8ed8dc5b0a24ded)
2022-03-15 05:04:31 +00:00
Jerry Zhang
a7f9fb997a [quant] Fix implementation for output_quantized_idxs in convert (#74140)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74140

Previously we did not successfully remove the dequantize node for `dict`, this PR fixes that, tested with
meta-only tests right now but we should follow up with oss tests (with dict output)

Reviewed By: andrewor14

Differential Revision: D34846005

fbshipit-source-id: 4313ed6adff425d73ad19aabedde1200a98f1915
(cherry picked from commit 682abe9ecbd42c4ac1b41891bbc3b79ad522b78a)
2022-03-15 03:35:53 +00:00
Weiwen Xia
060f1b822a Add onednn quant backend (#74137)
Summary:
Resolve the conflicts in https://github.com/pytorch/pytorch/pull/69820
jerryzh168 Please review. Thanks.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/74137

Reviewed By: samdow

Differential Revision: D34840477

Pulled By: jerryzh168

fbshipit-source-id: 8aa60981ff7be211a1609644f273b16d18efd425
(cherry picked from commit de76bb808b315e9a2e45d8c5f1c1233a47d669c4)
2022-03-15 01:28:21 +00:00
Jerry Zhang
5a897536f3 Revert D33716039: [pytorch][PR] Add ONEDNN quantization backend
Test Plan: revert-hammer

Differential Revision:
D33716039 (989b24855e)

Original commit changeset: 6f7bb807e857

Original Phabricator Diff: D33716039 (989b24855e)

fbshipit-source-id: ed233c5b99d4edb7d5a9d6c600825c78555f16d0
(cherry picked from commit d3e1f825b06ef67adb13623ccb7cbf1b700c1dd5)
2022-03-11 22:06:25 +00:00
Xia Weiwen
989b24855e Add ONEDNN quantization backend (#69820)
Summary:
This PR adds a new quantization backend, ONEDNN, with quantized conv and linear kernels in the same code path as the FBGEMM backend

The ONEDNN backend is an alternative of FBGEMM and QNNPACK backends. It takes advantage of features of the latest Intel® CPU products. It supports VNNI on Cascade Lake and the AMX instruction set to be available on Sapphire Rapids which has 8X int8 peak TOPS over VNNI.

ONEDNN demonstrates better performance on conv kernels of popular CNN models than FBGEMM. It also supports more fused ops, such as convolution-add-ReLU, than FBGEMM and QNNPACK.
To use this backend, users only need to set the quantization backend to 'onednn' before any calculation without a single change to models.
```python
torch.backends.quantized.engine = 'onednn'
```

## Design docs
https://github.com/pytorch/pytorch/issues/21120#issuecomment-562371983
https://github.com/pytorch/pytorch/pull/67177#issuecomment-963787096

## File changes
**Add ONEDNN to qengine list**
- aten/src/ATen/Context.cpp
- c10/core/QEngine.h
- torch/ao/quantization/qconfig.py
- torch/backends/quantized/\_\_init\_\_.py

**Implement qconv & qlinear for ONEDNN backend**
- aten/src/ATen/native/quantized/cpu/conv_serialization.h
- aten/src/ATen/native/quantized/cpu/fbgemm_utils.cpp
- aten/src/ATen/native/quantized/cpu/onednn_utils.h
- aten/src/ATen/native/quantized/cpu/qconv.cpp
- aten/src/ATen/native/quantized/cpu/qconv_dynamic.cpp
- aten/src/ATen/native/quantized/cpu/qconv_prepack.cpp
- aten/src/ATen/native/quantized/cpu/qconv_unpack.cpp
- aten/src/ATen/native/quantized/cpu/qlinear.cpp
- aten/src/ATen/native/quantized/cpu/qlinear_dynamic.cpp
- aten/src/ATen/native/quantized/cpu/qlinear_prepack.cpp
- aten/src/ATen/native/quantized/cpu/qlinear_unpack.cpp

**Skip tests that are not supported by ONEDNN**
- test/ao/sparsity/test_kernels.py
- test/quantization/core/test_quantized_module.py
- test/quantization/core/test_quantized_op.py

## Validation results
This PR has passed `test_quantization.py` and `test_mkldnn.py`.
Below are performance data of int8 2d convolution and linear on the Cascade Lake Xeon® platform:
(Note: Tested with single instance on single core. Using the latest oneDNN library.)

**Table 1. Performance comparison of int8 2d convolution operator**
|No.|	Shape|	FBGEMM|	ONEDNN|	Gain|
|-|-|-|-|-|
|1|	IC=128, OC=128, kernel=3, stride=1, N=4, H=32, W=32, G=1, pad=0|	668.310us|	535.630us|	24.8%|
|2|	IC=128, OC=128, kernel=3, stride=2, N=4, H=32, W=32, G=1, pad=0|	290.630us|	281.810us|	3.1%|
|3|	IC=128, OC=256, kernel=3, stride=1, N=4, H=32, W=32, G=1, pad=0|	1.045ms|	893.010us|	17.0%|
|4|	IC=128, OC=256, kernel=3, stride=2, N=4, H=32, W=32, G=1, pad=0|	385.320us|	373.720us|	3.1%|
|5|	IC=256, OC=256, kernel=3, stride=1, N=4, H=32, W=32, G=1, pad=0|	1.876ms|	1.641ms|	14.3%|
|6|	IC=256, OC=256, kernel=3, stride=2, N=4, H=32, W=32, G=1, pad=0|	660.460us|	638.470us|	3.4%|

**Table 2. Performance comparison of int8 linear operator**
|No.|	Shape (m, n, k)|	FBGEMM|	ONEDNN|	Gap|
|-|-|-|-|-|
|1|	64, 800, 320|	80.550us|	96.770us|	20.10%|
|2|	64, 768, 512|	101.230us|	130.720us|	29.10%|
|3|	16, 256, 512|	30.230us|	51.450us|	70.20%|
|4|	128, 128, 128|	33.810us|	50.480us|	49.30%|
|5|	256, 512, 256|	154.490us|	195.050us|	26.30%|
|6|	1024, 1024, 1024|	3.134ms|	3.514ms|	12.10%|

ONEDNN showed advantages over FBGEMM for convolution. However, it has performance gap to FBGEMM for Linear ops. The gap is a known issue and further optimization is in progress in the oneDNN library. On the latest platforms, better performance of ONEDNN is achieved for both conv and linear.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69820

Reviewed By: HDCharles

Differential Revision: D33716039

Pulled By: jerryzh168

fbshipit-source-id: 6f7bb807e85798142dfcffccfca8b8bd652fb3dd
(cherry picked from commit 91526b373560f42ba0ad307f9cccfc0eb5218b1f)
2022-03-11 20:31:49 +00:00
Jerry Zhang
7ddf212f33 [quant][fx] Fully align convert with the reference model design and simplify the implementation (#73863)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73863

This PR fully aligns the convert function with the design: https://github.com/pytorch/rfcs/blob/master/RFC-0019-Extending-PyTorch-Quantization-to-Custom-Backends.md
and simplifies the implementation of convert function by always produce a reference quantized model (with reference patterns) first,
and then lower the model to a quantized model that is runnable with PyTorch native backend (fbgemm/qnnpack).

This PR makes the convert.py much easier to understand than the previous implementation, and we are able to remove majority of code
in quantization_patterns.py as well (in followup PRs).

Test Plan:
```
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
python test/test_quantization.py TestFXNumericSuiteCoreAPIs
python test/test_quantization.py TestFXNumericSuiteCoreAPIsModels
```
and other internal/oss regression tests

Imported from OSS

Reviewed By: andrewor14

Differential Revision: D34778506

fbshipit-source-id: 0678b66addf736039a8749b352f6f569caca962b
(cherry picked from commit 33ec9caf23f3ab373d827117efbd9db0668b2437)
2022-03-11 17:11:30 +00:00
Charles David Hernandez
39605a5632 [ao] Removing memoryless observer args for MovingAverage (#73947)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73947

The original implementation of memoryless observers used MinMaxObservers and
a memoryless argument to manipulate the behavior of the observer such that it wouldn't
keep track of previously observed min and max's. It was later pointed
out that this was equivalent to a movingaverageobserver with averaging_constant=1
which is requires less overhead and no 1 off args (memoryless) so this PR refactors
the memoryless arg and uses MovingAverage observers instead, although the memoryless
adjective is still used, a complete definintion was also added to clarify error
messages given these changes.

TestPlan
python test/test_quantization.py TestQuantizeEagerQAT
python test/test_quantization.py TestObserver

Test Plan: Imported from OSS

Reviewed By: andrewor14

Differential Revision: D34732080

Pulled By: HDCharles

fbshipit-source-id: 227a1ab29d18adae55093a684ea35ac34523d07a
(cherry picked from commit 5238e70e8f90f3219c36f9c64b647951dcf64b5a)
2022-03-11 00:21:49 +00:00
Terry Chen
4e6aefaf72 [Qunat] Refactor reference module mapping (#72755)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72755

Add is_refernece flag in convert function

Test Plan:
python3 test/test_quantization.py TestQuantizeEagerOps.test_conv_transpose_2d

Imported from OSS

Reviewed By: mruberry

Differential Revision: D34188856

fbshipit-source-id: 291014a7b3b4d4b40ca0ca76a80711097dcc4b58
(cherry picked from commit cfba3b8dc0373708712c0d847d590f0d587df002)
2022-03-08 06:48:04 +00:00
Andrew Or
f3c6e8f720 [Quant][fx] Add lowering for functional conv (#73708)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73708

This adds functionality to lower reference models
involving functional conv in FX.

Test Plan:
python test/test_quantization.py TestQuantizeFxOps.test_functional_conv

Imported from OSS

Reviewed By: mruberry

Differential Revision: D34648870

fbshipit-source-id: d1c8afdb9787c36639d5ee5762ae71e7e8ab3769
(cherry picked from commit 7a28617faf4b8aad152076239927e94ed3f0169e)
2022-03-07 15:32:54 +00:00
Andrew Or
cedce3be20 [Quant][fx] Add lowering for Linear-Bn1d in QAT mode (#73509)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73509

This adds functionality to lower reference models
involving the Linear-Bn1d pattern in FX QAT mode. This follows
https://github.com/pytorch/pytorch/pull/72431 and https://github.com/pytorch/pytorch/pull/72796, which add Linear-Bn1d fusion functionality
to eager QAT mode.

Test Plan:
python test/test_quantization.py TestQuantizeFxOps.test_linear_module

Imported from OSS

Reviewed By: dagitses

Differential Revision: D34591251

fbshipit-source-id: 39144485f9954ee1830c8b414e724560fd7e47bf
(cherry picked from commit b97a39b4d9df00e045fab4c01eca88e562ca2c02)
2022-03-07 15:32:54 +00:00
Vasiliy Kuznetsov
6f2dad24d3 ns for fx: add ability for fp16 model to shadow fp32 model (#73785)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73785

The conversion from fp32 to fp16 is easily defined, we just did not
have it in NS code yet. This PR adds it.This is needed for some customer models.

Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_fp16_shadows_fp32
```

Reviewed By: jerryzh168

Differential Revision: D34642873

Pulled By: vkuzo

fbshipit-source-id: 9df505b1ea3f3d3cdb3a5f2409ef3a66f40b7eff
(cherry picked from commit 679cd8a5e24b1cfd7f871dcba3ce8a90de980556)
2022-03-05 00:27:48 +00:00
Terry Chen
5167e9d59d [quant][fix] Fix bug for ave pooling in FX quant (#73054)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73054

Fix bug for ave pooling in FX quant

Test Plan:
python3 test/test_quantization.py TestQuantizeFxOps.test_ave_pool_with_custom_cfg

Imported from OSS

Reviewed By: george-qi

Differential Revision: D34334059

fbshipit-source-id: a2ddad4fa3abf250f5dc20486c966fff3a9098a6
(cherry picked from commit d0f6ea680427a454200735075d557fb0b145a625)
2022-03-04 23:29:18 +00:00
Jerry Zhang
8b8fac91bf [quant][fx] Refactor _convert_fx_do_not_use (#73777)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73777

As titled; this is to prepare for the migration of the current convert to this function

Test Plan:
Regression tests to make sure the refactor doesn't break anything.
Internal only, since the TensorRT tests were moved to a separate repo.

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D34636000

fbshipit-source-id: 9850904e3b834345abbeedc8bccaf107397db59d
(cherry picked from commit a8c87d4592237c247989e7419bb165c96b8e90db)
2022-03-04 18:29:36 +00:00
Vasiliy Kuznetsov
727debb18e dbr quant: enable reference module support for torch.qint32 (#73493)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73493

This PR enables basic support for reference modules in DBR quant.
For now, the support is limited to:
1. modules that have reference versions defined only (no functions)
2. torch.qint32 dtype only

Currently, the reference module logic is enabled whenever the dtype is
torch.qint32. This is done because qint32 support is needed earliest for
the first use case. A future PR will support more dtypes and also
add the `is_reference` flag to the API.

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR.test_conv_int32_reference_model
```

Reviewed By: jerryzh168

Differential Revision: D34520759

Pulled By: vkuzo

fbshipit-source-id: 363db715315c5c7c20962a1818330ce288948778
(cherry picked from commit 6ccdfe2889c252211f191edc49f4147f66e803a4)
2022-03-04 17:35:31 +00:00
Vasiliy Kuznetsov
5787a36e30 dbr quant: insert activation obs explicitly, instead of relying on hooks (#73492)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73492

Before this PR, DBR quant reused the Eager mode quantization machinery
to insert activation observers. This was done for speed of developing the
prototype. A drawback of this is that the activation observers are not
present in DBR's data structures and live on the modules instead.

This PR refactors DBR quant to stop using Eager mode quantization
observer insertion for activations, and instead create and track the
activation observers in DBR's data structures. This has a few benefits:
1. activation observers are now created the same way in DBR for modules and functions
2. we can remove some technical debt due to fixing (1)
3. this will make it easier to support reference modules in a future PR

The reason (3) is true is because the current design of reference modules
assumes that the activation observer lives on the framework (like in FX
graph mode quantization). This PR starts to adhere to that assumption.

Test Plan:
```
python test/test_quantization.py -k DBR
```

Reviewed By: jerryzh168

Differential Revision: D34520758

Pulled By: vkuzo

fbshipit-source-id: 2f6448dce021024cb2fa112d8691c94128c43123
(cherry picked from commit cfc1a0eaf6579cea2c710c1c2b4c86d28ee799eb)
2022-03-04 17:35:31 +00:00
Haixin Liu
3042f0ce22 [NS] Mark output logger impure to avoid being removed in acc tracer (#73745)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73745

Mark output logger as impure, which will help prevent it and the shadow ops from being removed in acc tracer.

Test Plan: Tested in N1611591

Reviewed By: jerryzh168

Differential Revision: D34616990

fbshipit-source-id: ccc93e30f9cbf3eb69f49fc2d0f02fd4d083c507
(cherry picked from commit e40fcbd1bc543eb64fa692776c34f26e2a0a05ff)
2022-03-04 11:30:10 +00:00
Jerry Zhang
f5c7e5406b [quant][fx] Add lowering support for qat and fused convs (#73527)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73527

This includes:
```
torch.nn.qat.Conv2d,
torch.nn.qat.Conv3d,
torch.nn.intrinsic.qat.ConvBn1d,
torch.nn.intrinsic.qat.ConvBn2d,
torch.nn.intrinsic.qat.ConvBn3d,
torch.nn.intrinsic.qat.ConvBnReLU1d,
torch.nn.intrinsic.qat.ConvBnReLU2d,
torch.nn.intrinsic.qat.ConvBnReLU3d,
torch.nn.intrinsic.qat.ConvReLU2d,
torch.nn.intrinsic.qat.ConvReLU3d,
torch.nn.intrinsic.ConvReLU1d,
torch.nn.intrinsic.ConvReLU2d,
torch.nn.intrinsic.ConvReLU3d,
```
We first produce the reference pattern and then lower the reference pattern to quantized modules

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: andrewor14

Differential Revision: D34583206

fbshipit-source-id: d298114d1906ea44c071b0eee52730dadf67fd3e
(cherry picked from commit 6498af35b5aa6104cadb68ca48dff4e443bee7d6)
2022-03-04 06:29:03 +00:00
dzdang
a39e8e8f5e [Quant][fx] Added explicit entries for functional and module conv & linear support into get_default_qconfig_dict & get_default_qat_qconfig_dict (#73528)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/73528

Test Plan: Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D34535572

Pulled By: dzdang

fbshipit-source-id: 883f46e014e47aeba3ea6f9fb401c54e3792b2ac
(cherry picked from commit 66713d518295b2e7306561030aa6b7ca049a708c)
2022-03-04 03:29:20 +00:00
Vasiliy Kuznetsov
bf896a2988 dbr quant: add torchscript pass to remove redundant aliases (#71230)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71230

DBR quantization uses `torch.Tensor.as_subclass` frequently. When
the quantized model is traced with `torch.jit.trace`, these calls appear
in the resulting graph as `aten::alias`. This PR adds a pass to remove
these calls from the graph, for two reasons:
1. ease of debugging (these calls do nothing)
2. less work for downstream passes (for example, converting to ONNX currently breaks if these alias calls are present)
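
A minimal hedged sketch of where these nodes come from (the toy subclass here is hypothetical, purely to make the tracing reproducible):

```
import torch

class ToyTensor(torch.Tensor):
    pass

def f(x):
    # DBR quant calls as_subclass frequently; under torch.jit.trace this
    # surfaces as an aten::alias node in the graph
    return x.as_subclass(ToyTensor) + 1

traced = torch.jit.trace(f, torch.randn(2))
print(traced.graph)  # before the new pass, an aten::alias call appears here
```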

For now, we have to inline the graph in order for `aliasDb` to determine
safety properly. In the future, we may choose to relax this if there is
a need for it.

Test Plan:
Test plan is pretty basic for now, it can be improved in future PRs.
```
python test/test_quantization.py TestQuantizeDBR.test_jit_tracing_removes_aliases
```

Reviewed By: eellison

Differential Revision: D33552387

Pulled By: vkuzo

fbshipit-source-id: 681a33ddfff394a91e971263ac593afd93c5ea78
(cherry picked from commit 0f8412725d0c6fd9ef1072a50d4203465aa5d1f9)
2022-03-03 15:31:53 +00:00
Vasiliy Kuznetsov
eb8d06591c quantization: fix bug in QuantWrapper with DeQuant qconfig (#73671)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73671

QuantWrapper did not correctly apply qconfig to the dequant.
Therefore, if the user first applied qconfig to their module and
then wrapped it with `QuantWrapper`, the dequant would not get
swapped during the convert step.

The fix is to properly apply the qconfig to the dequant.
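
A minimal sketch of the failing scenario, assuming the eager mode APIs of this era:

```
import torch
import torch.nn as nn
from torch.ao.quantization import QuantWrapper, get_default_qconfig, prepare, convert

m = nn.Conv2d(1, 1, 1)
m.qconfig = get_default_qconfig("fbgemm")  # user applies qconfig first...
wrapped = QuantWrapper(m)                  # ...then wraps the module
prepared = prepare(wrapped)
prepared(torch.randn(1, 1, 4, 4))          # calibrate
quantized = convert(prepared)  # with the fix, the DeQuantStub is swapped here too
```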

Test Plan:
```
python test/test_quantization.py TestQuantizeEagerPTQStatic.test_quantwrapper_attaches_qconfig_to_dequant
```

Reviewed By: MaigoAkisame

Differential Revision: D34585260

Pulled By: vkuzo

fbshipit-source-id: 82055a9fa7fc13a714fe460deb461c2e87e76b39
(cherry picked from commit c9f392333dd1c005d893bdc2fbafe8a82b317c88)
2022-03-03 15:31:53 +00:00
Andrew Or
b7a7cdd00a [Quant][fx] Add lowering for functional linear (#72855)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72855

This adds functionality to lower reference models
involving functional linear in FX.
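
A minimal sketch of a model this enables, using the prepare_fx signature of this era (later releases also take example_inputs):

```
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.ao.quantization import get_default_qconfig
from torch.ao.quantization.quantize_fx import prepare_fx, convert_fx

class M(nn.Module):
    def __init__(self):
        super().__init__()
        self.w = nn.Parameter(torch.randn(4, 4))
        self.b = nn.Parameter(torch.randn(4))

    def forward(self, x):
        return F.linear(x, self.w, self.b)

m = M().eval()
prepared = prepare_fx(m, {"": get_default_qconfig("fbgemm")})
prepared(torch.randn(2, 4))       # calibrate
quantized = convert_fx(prepared)  # F.linear is lowered to a quantized linear op
```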

Test Plan:
python test/test_quantization.py TestQuantizeFxOps.test_functional_linear

Imported from OSS

Reviewed By: albanD

Differential Revision: D34514127

fbshipit-source-id: 7af4f37bdeda710dc7197ede9d46f66227d7932c
(cherry picked from commit a14cbc04dea4e578643c4183f0c8ea43fbdaf5c7)
2022-03-02 18:34:35 +00:00
Jerry Zhang
81437e66c1 [quant][fx] Add RNN reference module (#73386)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73386

This PR adds support for RNN reference module, following https://github.com/pytorch/rfcs/blob/master/RFC-0019-Extending-PyTorch-Quantization-to-Custom-Backends.md
This includes: RNNCell, LSTMCell, GRUCell, LSTM

Test Plan:
will be tested in the lowering flow in a separate PR

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D34469445

fbshipit-source-id: 71a13d7d056f7aaccdd98fb477c8a3a38aecc249
(cherry picked from commit 0b10f0d127515556b677eae3150f026ac8cd9acd)
2022-03-02 10:30:37 +00:00
Jerry Zhang
bea075f305 [quant] Add support for multiple inputs in fusion pattern (#73572)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73572

Previously we couldn't specify how to get extra inputs for fused ops in backend_config_dict,
for example for patterns like:
(torch.add, (nn.BatchNorm2d, nn.Conv2d), MatchAllNode)

where nn.Conv2d is the root node, the extra MatchAllNode (the input of the original torch.add) would be lost.
This PR adds an "extra_inputs_getter" key to the backend_config_dict, which allows the user to provide a function
that returns a list of extra input nodes for the fused op given the matched node pattern. In this case,
we need a function that returns the node matched by `MatchAllNode`; it would be something like the following:

```
def extra_inputs_getter(pattern):
    add, conv_bn, extra_input = pattern
    return [extra_input]
```
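
A hedged sketch of how the getter might be wired in; only the "extra_inputs_getter" key name comes from this PR, while the surrounding entry shape and the MatchAllNode import path are assumptions:

```
import torch
import torch.nn as nn
from torch.ao.quantization.utils import MatchAllNode  # assumed import path

config_entry = {
    "pattern": (torch.add, (nn.BatchNorm2d, nn.Conv2d), MatchAllNode),
    "extra_inputs_getter": extra_inputs_getter,  # the function defined above
}
```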

Test Plan:
python test/test_quantization.py TestFuseFx.test_fusion_pattern_with_multiple_inputs

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D34553210

fbshipit-source-id: 748f8ce20974438458a39dbe9eae75281156c227
(cherry picked from commit be748526480e811874dbca64b1cf3bf4950f0393)
2022-03-02 08:37:07 +00:00
Andrew Or
fb2fe11ce4 [Quant][improvement] Rename ReferenceableQuantizedModule (#72717)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72717

This will be renamed to WeightedQuantizedModule to
minimize confusion with reference modules.

Test Plan:
python test/test_quantization.py TestQuantizeFx

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D34172554

fbshipit-source-id: 4cd77d6048fde4875218386f7e55f864a73d5bd3
(cherry picked from commit b7af4cedb4275b6f9c06c0773f2997bc4e61578a)
2022-03-01 17:43:16 +00:00
Jerry Zhang
d39ad0543a [quant][fx] Remove Fuser class in fusion implementation (#73470)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73470

As titled; this does not affect user APIs since we only expose fuse_fx as a public API

Test Plan:
python test/test_quantization.py TestFuseFx

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D34495260

fbshipit-source-id: 3aa253bc7190e50acc7229186f210901ebc5481b
(cherry picked from commit a88517ff6feff7abbece2234d82fd53e33702237)
2022-03-01 09:29:21 +00:00
Jerry Zhang
ad1078a21e [quant] Enable reference path by default for CopyNodeQuantizeHandler (#73233)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73233

This PR makes CopyNodeQuantizeHandler always produce reference patterns, and we have
a custom lowering pass to rewrite the reference quantized patterns to quantized ops.

The lowering passes were implemented previously; we just need to enable the reference path here
and clean up the previous code to allowlist some of the ops (`check_node`).
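
As a hedged illustration of what such a lowering does for a "copy node" op like flatten (a sketch, not the actual pass):

```
import torch

xq = torch.quantize_per_tensor(torch.randn(2, 3), 0.1, 0, torch.quint8)

# reference pattern: dequantize -> op -> quantize
ref = torch.quantize_per_tensor(torch.flatten(xq.dequantize()), 0.1, 0, torch.quint8)

# lowered pattern: the op runs directly on the quantized tensor
lowered = torch.flatten(xq)

assert torch.equal(ref.int_repr(), lowered.int_repr())
```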

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: mrshenli

Differential Revision: D34469446

fbshipit-source-id: b9d9c5f793fbb735839199056c197ae98969cc4b
(cherry picked from commit af0cf4e79e11e7343d57e6ff7766c80e72ec60f3)
2022-03-01 01:33:30 +00:00
Jerry Zhang
5613527ef9 [quant][fx] Add lowering support for functional ops using DefaultNodeQuantizeHandler (#73120)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73120

As titled.
This is to align our implementation with https://github.com/pytorch/rfcs/blob/master/RFC-0019-Extending-PyTorch-Quantization-to-Custom-Backends.md

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D34354038

fbshipit-source-id: 873a867e62bd541ef236974c697fac2334bf02ea
(cherry picked from commit 3fce7cade2f057b985833659c2cb365ee4d6d9f3)
2022-02-26 19:29:58 +00:00
Jerry Zhang
45a042037f [quant][fx] Add root_node_getter in backend_config_dict (#73345)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73345

For complex patterns we need to identify which node is the root, so that we can eliminate all other nodes and only preserve the root,
e.g. for (torch.add, MatchAllNode, (torch.nn.ReLU, torch.nn.Conv2d)), we can preserve the torch.nn.Conv2d as the root node and remove the other nodes.

Previously we assumed the root_node of a pattern is the "last node" of the pattern, computed by:
```
def default_root_node_getter(node_pattern):
    while not isinstance(node_pattern[-1], Node):
        node_pattern = node_pattern[-1]
    return node_pattern[-1]
```
This PR enables users to define their own root_node_getter, which means we can define the root_node for patterns like:
(torch.add, (torch.nn.ReLU, torch.nn.Conv2d), MatchAllNode)
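
A hedged sketch of what a user-defined root_node_getter for that pattern could look like (names are illustrative):

```
def my_root_node_getter(node_pattern):
    # node_pattern mirrors (torch.add, (torch.nn.ReLU, torch.nn.Conv2d), MatchAllNode)
    add, relu_conv, _extra_input = node_pattern
    relu, conv = relu_conv
    return conv  # preserve the Conv2d node as the root
```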

Test Plan:
python test/test_quantize_fx.py TestFuseFx.test_root_node_getter

Imported from OSS

Reviewed By: VitalyFedyunin

Differential Revision: D34442193

fbshipit-source-id: 2f6da69a5b6527b49710ae32820e8e2915d9af37
(cherry picked from commit 8b49bf0d7d53cdcf2c9f40f8e25bc843e8814026)
2022-02-26 06:34:22 +00:00
Jerry Zhang
186ef8b22d Fix test missing target (#73415)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73415

Fix the missing TARGETS file issue for the test.

Test Plan: buck test mode/opt deeplearning/trt/fx2trt_oss/test/quant:test_quant_trt

Reviewed By: yinghai

Differential Revision: D34400710

fbshipit-source-id: e68145b4e70db5333f4a8d11a2d240a2f38b4077
(cherry picked from commit fe78e63f0c646409b1cdab91d3b139f1b0a97b9e)
2022-02-26 03:31:31 +00:00
Jerry Zhang
16554bec1b [quant][fx][fix] Fix get_module_type for fusion (#72735)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72735

We use `get_matched_types` to get the (type) pattern from matched modules,
and we need to use MatchAllNode instead of type(MatchAllNode) to query the fuser_method for the pattern.

Test Plan:
TODO

Imported from OSS

Reviewed By: raghuramank10000

Differential Revision: D34180705

fbshipit-source-id: db9b6e791a9f26b70079fddc95fce033052199ab
(cherry picked from commit 01d38afabcb1bfc207dee7d49ee13df500d32fdf)
2022-02-25 18:37:31 +00:00
Jerry Zhang
ee5b8f0c64 [quant][fx] Move MatchAllNode from match_utils.py to utils.py under quantization (#73344)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73344

Not user facing as of now, since we haven't advertised the backend_config_dict API.
We need this in fuser_method_mapping.py to avoid a circular dependency.

Test Plan:
python test/test_quantization.py TestQuantizeFx

Imported from OSS

Reviewed By: VitalyFedyunin

Differential Revision: D34441778

fbshipit-source-id: 7a01c359e4b21e9e98345dc7781f735628209a20
(cherry picked from commit 758537094c5a98a17a8825b3f240c8d5acdd72b0)
2022-02-25 17:36:14 +00:00
Jerry Zhang
9db0e0e76e [quant][graphmode] produce reference pattern for binary ops and then rewrite to quantized op (#72953)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72953

This PR makes BinaryOpQuantizeHandler always produce reference patterns, and we have
a custom lowering pass to rewrite the reference quantized patterns to quantized ops.
It includes rewrites for
torch.ops.quantized.add, torch.ops.quantized.mul, and torch.ops.quantized.matmul.
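
For reference, a minimal sketch of the lowered form these rewrites target (the qparams are illustrative):

```
import torch

a = torch.quantize_per_tensor(torch.randn(4), 0.1, 0, torch.quint8)
b = torch.quantize_per_tensor(torch.randn(4), 0.1, 0, torch.quint8)

# the rewritten graph calls the quantized op directly, with output qparams attached
out = torch.ops.quantized.add(a, b, 0.2, 0)
```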

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: gchanan

Differential Revision: D34292408

fbshipit-source-id: 9872a5098249bc77db15e9fb614416958e62b9b2
(cherry picked from commit dbdc61ee8b5dde2e54a34a370a3af887e5117398)
2022-02-25 17:36:14 +00:00
Terry Chen
16e2f5d291 [quant] Add ConvTranspose reference module - Reland #73031 (#73094)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73094

Add ConvTranspose reference module

Test Plan:
python3 test/test_quantization.py TestQuantizeEagerOps.test_conv_transpose_2d

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D34352228

fbshipit-source-id: 03062d6b441bc5a3298ec094f421a69c4c3d5c40
(cherry picked from commit 2f2bdd4fcf)
2022-02-23 02:31:42 +00:00
Vasiliy Kuznetsov
6d86dc5390 dbr quant: store auto_quant_state on the top level model (#72934)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72934

Before this PR, DBR quantization had a limitation on handling user
code which iterates over all module children. For example, imagine
a forward function such as

```
def forward(self, x):
    for module in self:
        x = module(x)
    return x
```

Before this PR, this code would break with DBR quantization, because
we attach `AutoQuantizationState` objects to each child, and those
objects live in the child's module hierarchy and will appear in
these kinds of iterations, changing the meaning of the user program.

This PR reduces the scope of this problem to just the top level module.
Instead of attaching `AutoQuantizationState` objects to each child,
we register them in a map on the parent. Here is a before and after:

```
// toy model
model
 |--> child1

// toy model with AutoQuantizationState objects, before this PR
model
 |--> child1
 |  |--> _auto_quant_state
 |--> _auto_quant_state

// toy model with AutoQuantizationState objects, after this PR
model
 |--> child1
 |--> _fqn_to_auto_quant_state_map
    |--> ( ) --> _auto_quant_state // of `model`
    |--> (child1) --> _auto_quant_state // of `model.child1`
```

Note: `child1._auto_quant_state` works as before for convenience,
but the `child1` object now stores a soft link to its `_auto_quant_state`
instead of properly registering it in its module hierarchy. This is
somewhat hacky. If we need to improve this in the future, we could
remove this soft link and refactor the code to call the FQN map
instead.

Note: if the top level module iterates over its children, things will
still be broken. This is less likely, and we will recommend that the
user work around this by wrapping their model, or checking for the
`AutoQuantizationStateModuleDict` type in their iteration loop.
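
A minimal sketch of that recommended check (the class is DBR-internal, so it is matched by name here):

```
def forward(self, x):
    for module in self:
        # skip DBR quantization bookkeeping, preserving the user program's semantics
        if type(module).__name__ == "AutoQuantizationStateModuleDict":
            continue
        x = module(x)
    return x
```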

The impact of this change should be an improvement of coverage
of user models. In fact, we expect this to drive our coverage of
torchbenchmark models from 89% to 100%.

Test Plan:
```
// previously disabled test cases with user code iterating
// over module children are now enabled, with wrappers
python test/test_quantization.py -k test_module_calls_items
python test/test_quantization.py -k test_vovnet_sequential
```

Reviewed By: dzdang

Differential Revision: D34281074

Pulled By: vkuzo

fbshipit-source-id: 0e25fc1ec529c47f72478a1875fe43219feac6b1
(cherry picked from commit 4008f89967)
2022-02-22 17:31:32 +00:00
Jane Xu
477d1bd6cf Revert D34313425: [quant] Add ConvTranspose reference module
Test Plan: revert-hammer

Differential Revision:
D34313425 (710f12f58e)

Original commit changeset: 3eeec1b24a51

Original Phabricator Diff: D34313425 (710f12f58e)

fbshipit-source-id: aecf9113d2e4cef3ccf4e1a9c4c33b07dc2ad385
(cherry picked from commit 3fcb9cd14d)
2022-02-18 17:31:20 +00:00
Vasiliy Kuznetsov
1c0df26597 eager quant: convert mapping for fused QAT Linear-Bn1d (#72796)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72796

Adds the eager mode convert mapping for the fused QAT Linear-Bn1d module.

Test Plan:
```
python test/test_quantization.py TestQuantizeEagerQATNumerics.test_linear_bn_workflow
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D34213150

fbshipit-source-id: c08b5eb843dea673fd07c6b7b93dcd3ba03eaec2
(cherry picked from commit 722edfe676)
2022-02-18 13:14:56 +00:00
Vasiliy Kuznetsov
e73eaffd3b quant: add QAT fused Linear-Bn1d [1/x]: prepared module (#72431)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72431

Adds support for a fused QAT observed module for `Linear` followed by
`BatchNorm1d`. In this PR, only the support for prepared module with
fake_quants in the right places is added.

A future PR will add support for `convert`, and tests for eager and FX
graph mode workflows.

Similar to conv-bn, we rescale the weight before applying the fake
quant, and undo the rescaling after the linear operation.
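
A minimal numeric sketch of that rescaling trick, assuming the same scheme as fused conv-bn QAT (the fake quant is stubbed out as identity):

```
import torch
import torch.nn.functional as F

def qat_linear_bn(x, weight, bn_gamma, bn_running_var, bn_eps, fake_quant):
    scale_factor = bn_gamma / torch.sqrt(bn_running_var + bn_eps)
    scaled_weight = fake_quant(weight * scale_factor.reshape(-1, 1))
    out = F.linear(x, scaled_weight)
    return out / scale_factor.reshape(1, -1)  # undo the rescaling; bn runs on this

x, w = torch.randn(2, 4), torch.randn(3, 4)
out = qat_linear_bn(x, w, torch.ones(3), torch.ones(3), 1e-5, lambda t: t)
```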

Test Plan:
```
python test/test_quantization.py TestQuantizeEagerQATNumerics.test_linear_bn
```

Imported from OSS

Reviewed By: jerryzh168, raghuramank10000

Differential Revision: D34044427

fbshipit-source-id: 47a519173939ca4824d2c6e6ea7a599764a8ed10
(cherry picked from commit bfc75fe078)
2022-02-18 13:14:56 +00:00
Terry Chen
710f12f58e [quant] Add ConvTranspose reference module (#73031)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73031

Add ConvTranspose reference module

Test Plan:
python3 test/test_quantization.py TestQuantizeEagerOps.test_conv_transpose_2d

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D34313425

fbshipit-source-id: 3eeec1b24a51c7951c4d4b0c7dca43a012468b85
(cherry picked from commit 0ee7c1cc39)
2022-02-18 06:29:12 +00:00
Nikita Shulga
e6fd28fb05 Revert D34126542: [Quant] Add ConvTranspose reference module
Test Plan: revert-hammer

Differential Revision:
D34126542 (7a031ec17f)

Original commit changeset: 7da167695a1f

Original Phabricator Diff: D34126542 (7a031ec17f)

fbshipit-source-id: 14e40884807b9908017ae30af83a8dea23ff1f0f
(cherry picked from commit f99a7f5a69)
2022-02-16 22:24:15 +00:00
Terry Chen
f67cf03526 [Quant] Add qint32 quantization support (#72472)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72472

Add dtype=int32 support for observer
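
A minimal usage sketch of the new dtype, assuming the standard observer API:

```
import torch
from torch.ao.quantization.observer import MinMaxObserver

obs = MinMaxObserver(dtype=torch.qint32)  # previously limited to the int8-family dtypes
obs(torch.randn(4, 4))
scale, zero_point = obs.calculate_qparams()
```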

Test Plan:
python3 test/test_quantization.py TestObserver.test_per_tensor_observers

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D34056640

fbshipit-source-id: 4fa15a7274cfbb6a7dd4e698e3989cc0c0626e7b
(cherry picked from commit bf4351de45)
2022-02-16 03:45:15 +00:00
Terry Chen
7a031ec17f [Quant] Add ConvTranspose reference module (#72473)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72473

Add ConvTranspose reference module

Test Plan:
python3 test/test_quantization.py TestQuantizeEagerOps.test_conv_transpose_op

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D34126542

fbshipit-source-id: 7da167695a1fd9c141059bce14cce4f0608b086c
(cherry picked from commit dee22dcf48)
2022-02-16 01:56:28 +00:00
Jerry Zhang
3d377fb4a3 [quant][fx][improvement] Add lowering support for BatchNormQuantizeHandler (#72490)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72490

This is an effort to move the current implementation towards the reference quantized model design:
https://github.com/pytorch/rfcs/blob/master/RFC-0019-Extending-PyTorch-Quantization-to-Custom-Backends.md
so that we use reference model in the default fbgemm/qnnpack path

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps.test_qbatch_norm

Imported from OSS

Reviewed By: vkuzo, andrewor14

Differential Revision: D34062365

fbshipit-source-id: ed015c61f5b969554a6477f92cf6be2358cb558c
(cherry picked from commit 9498421ddd)
2022-02-15 21:34:17 +00:00
Jerry Zhang
8b67b83c6e [quant][fx][improvement] Add lowering support for FixedQParamsOpQuantizeHandler (#72488)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72488

This is an effort to move the current implementation towards the reference quantized model design:
https://github.com/pytorch/rfcs/blob/master/RFC-0019-Extending-PyTorch-Quantization-to-Custom-Backends.md
so that we use reference model in the default fbgemm/qnnpack path

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D34062364

fbshipit-source-id: 50c4a86644c3f5f6fb03d2a98aa7376895c0fc84
(cherry picked from commit ed8122e44d)
2022-02-11 18:13:29 +00:00
Vasiliy Kuznetsov
decc79e541 fx quant: add workflow support for torch.matmul quantization (#72444)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72444

In https://github.com/pytorch/pytorch/pull/71783 support was added for
quantized matmul.

In this PR, the FX graph mode quantization workflow support for this
operator is added, for int8 dtypes.

Test Plan:
```
python test/test_quantization.py TestQuantizeFxOps.test_qmatmul
```

Imported from OSS

Reviewed By: andrewor14

Differential Revision: D34047310

fbshipit-source-id: 781219047419ce621a4deb46ea04881818bf4209
(cherry picked from commit 7e039fa3a1)
2022-02-09 18:43:58 +00:00
Jerry Zhang
ac0cac7724 [quant][fx][devs] Add lowering support for torch.cat (#72487)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72487

This is an effort to move the current implementation towards the reference quantized model design:
https://github.com/pytorch/rfcs/blob/master/RFC-0019-Extending-PyTorch-Quantization-to-Custom-Backends.md
so that we use reference model in the default fbgemm/qnnpack path

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D34062366

fbshipit-source-id: 86673bead79180a7509b51bd577f328e90f24893
(cherry picked from commit de3e443384)
2022-02-09 06:09:57 +00:00
Jerry Zhang
4b69a2373f [quant][fx] Add lowering support for ops in GeneralTensorShapeOpQuantizeHandler (#72387)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72387

Also make GeneralTensorShapeOpQuantizeHandler produce reference patterns by default

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: albanD, terrychenism

Differential Revision: D34025005

fbshipit-source-id: 01ca62cce727bbf4579ba8fb2b8c40198f327b86
(cherry picked from commit 7f3a9ab4c5)
2022-02-09 02:10:20 +00:00
Andrew Or
9d08318aa3 DBR Quantization: Add support for functional conv variants (#71795)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71795

This commit expands the API coverage of functional conv
ops in DBR quantization from F.conv2d to all conv variants.

Test Plan:
python test/test_quantization.py TestQuantizeDBRIndividualOps.test_conv_functional

Imported from OSS

Reviewed By: albanD

Differential Revision: D33907099

fbshipit-source-id: f459c219482822f64c7c9d22cd316c6e9ef44405
(cherry picked from commit acf4548e8d)
2022-02-08 22:52:27 +00:00
Vasiliy Kuznetsov
998a5adf8a dbr quant function fusion [2/x]: use fusion for observation and inference (#71781)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71781

The previous PR added information about fusions found in the subgraphs.

This PR uses that information for:
1. inserting observers at the end of fusions and not in the middle
2. during inference, replacing the original op with the fused op. The
way this is implemented is that the base op is replaced with the fused op,
and all other ops are replaced with identity functions.
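
An illustrative toy of that replacement (not the actual DBR implementation):

```
import torch
import torch.nn.functional as F

def fused_linear_relu(x, w, b):
    # stands in for the fused op that replaces the base op
    return F.relu(F.linear(x, w, b))

def identity(x):
    # stands in for what the remaining ops in the fusion become
    return x

x, w, b = torch.randn(2, 3), torch.randn(4, 3), torch.randn(4)
out = identity(fused_linear_relu(x, w, b))
```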

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR.test_fusion_functions
```

Reviewed By: jerryzh168

Differential Revision: D33775097

Pulled By: vkuzo

fbshipit-source-id: 12249b85b2f7ba7545a54872aeb5f1ff2fc928cf
(cherry picked from commit 0db4324ea9)
2022-02-07 14:00:26 +00:00
Vasiliy Kuznetsov
d672bbd0a9 fx quant: add fusion matching for operator.add and torch.relu (#71780)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71780

Adds support for matching operator.add -> torch.relu in FX graph
mode quantization.
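
For example, a toy module containing the newly matched pattern (`+` on tensors traces to operator.add in FX):

```
import torch
import torch.nn as nn

class AddRelu(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 3, 3, padding=1)

    def forward(self, x):
        return torch.relu(self.conv(x) + x)  # add -> relu, now matched as a fusion
```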

It would be nice to support torch.relu better in general, but we are
saving that for a future PR to keep PRs small.

This is useful for DBR quant because we have some test cases in DBR
quant which use add-relu, and we'd like to match them to FX.

Test Plan:
```
python test/test_quantization.py TestQuantizeFxOps.test_add_relu
python test/test_quantization.py TestQuantizeFxOps.test_mul_relu
```

Reviewed By: jerryzh168

Differential Revision: D33775096

Pulled By: vkuzo

fbshipit-source-id: 889d9b41d3758ecbbb6d7eab67f64ce3d4892d24
(cherry picked from commit c1f9f38ca1)
2022-02-07 14:00:26 +00:00