The ActivationSparsifier class aims to sparsify/prune activations in a neural
network. The idea is to attach the sparsifier to a layer (or layers) so that it
zeroes out the activations based on the mask_fn (or sparsification function)
provided by the user.
The mask_fn is applied once all the inputs are aggregated and reduced, i.e.,
mask = mask_fn(reduce_fn(aggregate_fn(activations)))
Note:
The sparsification mask is computed on the input **before it goes through the attached layer**.
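As a rough illustration (not the actual ActivationSparsifier API; the aggregate/reduce/mask functions below are made-up stand-ins), the same idea can be sketched with a plain forward pre-hook:
```
import torch
import torch.nn as nn

# Made-up stand-ins for the user-supplied functions; the real class lets the
# user pass these in, but the exact signatures here are assumptions.
def aggregate_fn(agg, x):
    return x if agg is None else agg + x      # running sum of observed inputs

def reduce_fn(agg):
    return agg.mean(dim=0)                    # reduce over the batch dimension

def mask_fn(reduced):
    return (reduced.abs() > 0.1).float()      # keep only "large" activations

state = {"agg": None}

def pre_hook(module, inputs):
    x = inputs[0]
    state["agg"] = aggregate_fn(state["agg"], x)
    # The mask is computed on the aggregated/reduced input *before* the layer runs.
    mask = mask_fn(reduce_fn(state["agg"]))
    return (x * mask,)

layer = nn.Linear(8, 4)
layer.register_forward_pre_hook(pre_hook)
out = layer(torch.randn(2, 8))
```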
Test Plan:
```python test/test_ao_sparsity.py TestActivationSparsifier```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80886
Approved by: https://github.com/HDCharles
Summary: This updates the DynamicStatic Detector to also provide insight
into whether Conv layers should use dynamic or static quantization.
Previously, Conv layers were not included because dynamic quantization is
not yet supported for them. This adds a check for Conv layers, and if
dynamic quantization is recommended, the detector also gives a disclaimer
that it is not currently supported but will be in the future.
Test Plan: python test/test_quantization.py TestFxModelReportDetectDynamicStatic
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81972
Approved by: https://github.com/jerryzh168
Summary: The current implementation of the InputWeightEqualization
detector broke when tested on MobileNetV2 because it could not properly
handle groups in Conv layers; fixing this also required some minor
reshaping of the weights.
In addition, the output was tuned so that instead of giving one output per
channel per layer, it gives a single suggestion per module, reports how
many of the channels could benefit from input-weight equalization, and
recommends it if that is more than half of them.
The test class also did not do a good job of covering different sizes for
the batch, height, and width dimensions, so it was updated to be more
comprehensive as well.
Test Plan: python test/test_quantization TestFxDetectInputWeightEqualization
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81971
Approved by: https://github.com/jerryzh168
This callback aims to sparsify the model inside the lightning module after training.
**Note that the model is copied and then sparsified, so the existing model is not modified.**
The sparsified model can be used for comparison and can be accessed via
`<callback_obj>.sparsified`.
Test Plan:
```python torch/ao/sparsity/_experimental/data_sparsifier/lightning/tests/test_callbacks.py TestTrainingAwareCallback```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80371
Approved by: https://github.com/z-a-f
Lightning callback that enables post-training sparsity.
This callback aims to sparsify the model inside the lightning module after training.
**Note that the model is copied and then sparsified, so the existing model is not modified.**
The sparsified model can be used for comparison and can be accessed via `<callback_obj>.sparsified`.
Test Plan:
```python torch/ao/sparsity/_experimental/data_sparsifier/lightning/tests/test_callbacks.py TestPostTrainingCallback```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80370
Approved by: https://github.com/z-a-f
Add prelu op and module for quantized CPU backend.
The PR includes:
- Quantized version of prelu op
- Native prelu kernel for quantized CPU
- Prelu modules in `nn` and `nn.quantized`
- FX support for prelu
- Unit tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73491
Approved by: https://github.com/jerryzh168
Summary: Added the ability to get the feature names and
module_fqns from the ModelReportVisualizer class. The purpose of this
addition is to let users see the exact set of module_fqns or
feature names they can filter on, and use this information to
perform their filtering.
Test Plan: python test/test_quantization.py
TestFxModelReportVisualizer.test_get_modules_and_features
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81647
Approved by: https://github.com/andrewor14
Summary: We created a ModelReportVisualizer class; the primary
envisioned way to access it is:
```
model_report_visualizer = model_reporter.generate_visualizer()
```
This method only works after reports have been generated. It takes the
generated reports and reorders them by module into
the format required by the ModelReportVisualizer, then creates
the visualizer instance and returns it to the user.
Test Plan: python test/test_quantization.py TestFxModelReportClass.test_generate_visualizer
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81589
Approved by: https://github.com/andrewor14
Summary: This introduces the skeleton for the ModelReportVisualizer
class. This class helps visualize the information in the ModelReport
class's `generate_report()` output. It aims to provide
visualizations in a table, plot (line graph), and histogram view.
This also introduces an empty test class for testing visualizations. As
implementations start occurring for this class, tests will also be
appropriately added.
This includes the high level descriptions for each of the methods as
well. Expected use cases will be added to the class description in a
future commit as that gets finalized.
Test Plan: python test/test_quantization.py TestFxModelReportVisualizer
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81523
Approved by: https://github.com/andrewor14
Summary: Currently, the ModelReport API only takes in detectors at
construction, and each of its methods requires the model to be passed in
every time. This doesn't really make sense because:
1. you will always want to be working on the same model
2. passing in a different model could break things, so it is more
fault-tolerant to keep the model internally and make calls on it
Therefore, the model is now passed in at initialization and is used
internally for the rest of the operations.
All the ModelReport tests have been adjusted to account for this, and this
change must pass all of them to ensure a successful API transition.
If you wish to see how the updated API looks, the Expected Usage in the
ModelReport class description has been updated to reflect the changes, and
the README has been updated as well.
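A toy sketch of the design change (not the actual ModelReport code; the class and method below are illustrative only): the model becomes internal state set once at construction, so later calls cannot accidentally operate on a different model.
```
import torch.nn as nn

class Reporter:  # hypothetical stand-in for ModelReport
    def __init__(self, model, detectors):
        self._model = model          # stored once at initialization
        self._detectors = detectors

    def generate_report(self):
        # always operates on the internally stored model
        return {d: f"report for {type(self._model).__name__}" for d in self._detectors}

reporter = Reporter(nn.Linear(4, 2), ["per_channel"])
print(reporter.generate_report())
```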
Test Plan: python test/test_quantization.py TestFxModelReportClass
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81588
Approved by: https://github.com/jerryzh168
Summary: Currently, all the detectors have fairly descriptive names that
give an idea of what they do. However, as more and more detectors are
developed, the keys they return need a consistent naming scheme.
This updates the keys of the returned dictionaries to better
highlight whether something is an activation stat, a weight stat, etc.
Test Plan:
python test/test_quantization.py TestFxModelReportDetector
python test/test_quantization.py TestFxModelReportObserver
python test/test_quantization.py TestFxModelReportDetectDynamicStatic
python test/test_quantization.py TestFxModelReportClass
python test/test_quantization.py TestFxDetectInputWeightEqualization
python test/test_quantization.py TestFxDetectOutliers
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81587
Approved by: https://github.com/jerryzh168
Summary: Currently the InputWeightEqualizationDetector has a
multi-layered output.
Example:
```
{'block1.linear': {'channel_axis_selected': 1,
                   'channel_comparison_metrics': tensor([0.8736, 0.6594, 0.2916], grad_fn=<DivBackward0>),
                   'input_range_info': {'global_max': tensor(9.),
                                        'global_min': tensor(-10.),
                                        'per_channel_max': tensor([9., 9., 9.]),
                                        'per_channel_min': tensor([-10., -10., -10.])},
                   'input_weight_equalization_recommended': [True, False, False],
                   'threshold': 0.8,
                   'weight_range_info': {'global_max': tensor(0.5618, grad_fn=<UnbindBackward0>),
                                         'global_min': tensor(-0.2211, grad_fn=<UnbindBackward0>),
                                         'per_channel_max': tensor([0.3764, 0.5618, 0.2894], grad_fn=<NotImplemented>),
                                         'per_channel_min': tensor([-0.2211, 0.2213, 0.2228], grad_fn=<NotImplemented>)}},
}
```
With all these levels, the information can be hard to parse,
especially for the planned visualization feature where the data
has to be reorganized. Therefore, to standardize across all
detectors, all outputs will be limited to one level.
The new format is:
```
{'block1.linear': {'channel_axis_selected': 1,
                   'channel_comparison_metrics': tensor([0.5705, 0.9457, 0.8891], grad_fn=<DivBackward0>),
                   'activation_global_max': tensor(9.),
                   'activation_global_min': tensor(-10.),
                   'activation_per_channel_max': tensor([9., 9., 9.]),
                   'activation_per_channel_min': tensor([-10., -10., -10.]),
                   'input_weight_equalization_recommended': [False, True, True],
                   'threshold': 0.8,
                   'weight_global_max': tensor(0.4258, grad_fn=<UnbindBackward0>),
                   'weight_global_min': tensor(-0.4958, grad_fn=<UnbindBackward0>),
                   'weight_per_channel_max': tensor([0.1482, 0.3285, 0.4258], grad_fn=<NotImplemented>),
                   'weight_per_channel_min': tensor([-0.1517, -0.4958, -0.3027], grad_fn=<NotImplemented>)},
}
```
The README will also be updated to reflect this change.
Test Plan: python test/test_quantization.py TestFxDetectInputWeightEqualization
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81586
Approved by: https://github.com/jerryzh168
Summary: Currently, the PerChannelDetector has a multi-layered output.
Example:
```
{'backend': 'qnnpack',
 'per_channel_status': {'block1.linear': {'per_channel_supported': True,
                                          'per_channel_used': False},
                        'block2.linear': {'per_channel_supported': True,
                                          'per_channel_used': False}}}
```
The issue is that for future features such as visualizations, where we
need to traverse this dictionary, the variable number of nesting levels
makes it hard to work with.
This changes the output format of the PerChannelDetector to a standard
format.
Example:
```
{'block1.linear': {'backend': 'qnnpack',
                   'per_channel_supported': True,
                   'per_channel_used': False},
 'block2.linear': {'backend': 'qnnpack',
                   'per_channel_supported': True,
                   'per_channel_used': False}}
```
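One benefit of the flattened format is that consumers can iterate it with a single loop; a small sketch (the `report` dict mirrors the example above):
```
report = {
    'block1.linear': {'backend': 'qnnpack',
                      'per_channel_supported': True,
                      'per_channel_used': False},
    'block2.linear': {'backend': 'qnnpack',
                      'per_channel_supported': True,
                      'per_channel_used': False},
}

for module_fqn, stats in report.items():
    if stats['per_channel_supported'] and not stats['per_channel_used']:
        print(f"{module_fqn}: per-channel quantization is supported but unused")
```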
Test Plan: python test/test_quantization.py TestFxModelReportDetector
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81585
Approved by: https://github.com/HDCharles
Summary: Two lines were accidentally added after a return statement in the
OutlierDetector observer insertion, apparently from an odd merge issue.
They were harmless and were not caught by the linter, the tests, or me.
This removes those two lines.
Test Plan: python test/test_quantization.py TestFxDetectOutliers
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81499
Approved by: https://github.com/kit1980
Summary: This adds a README for the ModelReport functionality that
contains an overview of the class, what it does, and how it works; an
example of usage; information on how to implement a new detector (since
this is how core functionality is added); folder structure information;
and finally information on the tests and where they are located.
The ModelReport class is still in development and will, in the future,
get additional features such as visualizations, and the README will be
updated with this information as it is added.
Test Plan: Just a new README, no code is added; the README will be reviewed
for accuracy, ease of use, and readability.
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81369
Approved by: https://github.com/jerryzh168
Summary: Previously, every detector's determine_observer_insert_points()
function used hard-coded strings as the keys of the dictionary returned to
the ModelReport instance, and those same hard-coded keys were used to
extract information from it. Since all detectors used the same string keys,
these were made default variables at the top of the detector.py file, and
all detectors now use those. The same constants are imported and used in
the ModelReport file as well, so there is less chance of an error caused by
a mistyped string.
The test plan primarily exercises the ModelReport class because it uses the
same new variables for the strings and is the primary caller of each
detector instance's determine_observer_insert_points().
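As a small illustration of the pattern (the constant names below are hypothetical, not the actual ones defined in detector.py):
```
# Hypothetical key constants; the real ones live at the top of detector.py.
DETECTOR_TARGET_INFO_KEY = "target_info"
DETECTOR_OBS_TO_INSERT_KEY = "observer_to_insert"

def observer_insert_points(observer_instance):
    # The detector and ModelReport import the same constants, so a typo surfaces
    # as a NameError instead of silently producing a mismatched dictionary key.
    return {"block1.linear": {DETECTOR_TARGET_INFO_KEY: "input",
                              DETECTOR_OBS_TO_INSERT_KEY: observer_instance}}

points = observer_insert_points(observer_instance="<observer>")
```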
Test Plan: python test/test_quantization.py TestFxModelReportClass
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81382
Approved by: https://github.com/jerryzh168
Summary: Before, all the function calls for the ModelReport object were
described as depending on the Fx Graph Mode workflow. In reality, the only
requirement is for the model to be a traceable GraphModule. Keeping the
ModelReport class as detached from the Fx workflow as possible also lets it
be used as a more all-purpose tool in the future.
This updates all the references so they no longer state that an Fx Graph
Mode workflow is needed and are instead more general, since all we really
need is a traceable model.
Test Plan: python test/test_quantization.py TestFxModelReportClass
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81252
Approved by: https://github.com/jerryzh168
Summary: This adds an example usage description to the ModelReport class
so that people can see how to use it right in the class documentation
without consulting external sources. The example uses the
QuantizationTracer, a deliberate choice to illustrate that the tool is not
strictly tied to the Fx Graph Mode workflow.
Test Plan: python test/test_quantization.py TestFxModelReportClass
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81251
Approved by: https://github.com/jerryzh168
Summary: A huge part of the work for the Outlier detector was figuring
out a good nth percentile to compare against the 100th percentile,
as well as a good comparison ratio. This
commit adds a link to a Colab notebook in the function's documentation so
that people can see the calculations used to determine those
values and understand that they were not chosen arbitrarily.
At a high level, this Colab contains work that includes:
- Figuring out whether to use interpolation or lower as the rule for
finding quantile between two indices
- Figuring out what a good value for reference_percentile is
- Figuring out what a good value for ratio_threshold is
Test Plan: python test/test_quantization.py TestFxDetectOutliers
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81250
Approved by: https://github.com/jerryzh168
Summary: The current Outlier detector does a good job of finding whether
data distributions passing through layers have outliers. However,
suppose we have a completely constant channel. The outlier detector
would not flag it as an outlier, but that is still something we want
to highlight, because a constant channel is usually the result of a bad
configuration or something really wrong with the data.
To address this there are two additions to the outlier detector that
this commit makes:
- The first is to add whether there are any constant batches at all and
let the user know in the text report
- The second is to let the user know the number of total constant
batches found for each channel, so they can figure out if there are any
unnecessary channels present.
The existing outlier detector tests were modified to do a quick check
for this feature.
Test Plan: python test/test_quantization.py TestFxDetectOutliers
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81249
Approved by: https://github.com/andrewor14
Summary: The outlier detector has a feature where it can notify
the user if fewer than the full set of batches that passed through were
used in the outlier calculation, which mainly happens as a result of
zero-errors.
Instead of comparing against a hard-coded value like 30 as before, the code
now lets the user pass in an optional fractional value, and if the ratio of
batches used falls below that value, the detector alerts the user.
Test Plan: python test/test_quantization.py TestFxDetectOutliers
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81174
Approved by: https://github.com/andrewor14
Summary: This adds the report-generation implementation for the
Outlier Detector class. This includes generating a
dictionary containing each module that had an observer attached, along with
any relevant stats collected by the observer that can shed light on
outlier-relevant data or computed metrics. It also includes a string
highlighting specific modules that had outliers and giving some insight
into which channels they are contained in.
This contains both the report-generation implementation for the outlier
detector and a test class for the report-generation functionality.
Test Plan: python test/test_quantization.py TestFxDetectOutliers
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80937
Approved by: https://github.com/andrewor14
The previous implementation used loops to compute the sparsity within a block of a mask, as well as across the mask blocks. This implements the vectorized version.
## Vectorization:
At a high level, the vectorization procedure falls into a two-step process:
### Tensor-level masking
Tensor-level masking is a mask-generation routine with a granularity of `sparse_block_shape`: only whole patches of that shape can be considered sparse/dense. To vectorize:
1. Reshape the data such that one of the dimensions represents the patches of sparse_block_shape.
2. Create a mask of the same shape as the reshaped data
3. Find the smallest `k` elements in the data along the dimension of the sparse "patches". `k` is a derived parameter specifying the sparsity level.
4. Apply the 0/1 values to the patches in the mask
5. Reshape the mask back to the original dimensions
Note: because the shape of the mask might not be a multiple of the sparse_block_shape, we nudge the shape of the mask and truncate it afterwards.
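A minimal sketch of the tensor-level routine under these assumptions (L2 norm per patch, `torch.topk` over the smallest-norm patches); it is illustrative only, not the exact implementation:
```
import torch
import torch.nn.functional as F

def tensor_level_mask(data, sparse_block_shape=(1, 4), sparsity_level=0.5):
    bh, bw = sparse_block_shape
    h, w = data.shape
    # 1. Nudge the shape up to a multiple of the block shape, then view as patches.
    ph, pw = -(-h // bh) * bh, -(-w // bw) * bw            # ceil to multiples
    padded = F.pad(data, (0, pw - w, 0, ph - h))
    patches = padded.reshape(ph // bh, bh, pw // bw, bw).permute(0, 2, 1, 3)
    patches = patches.reshape(-1, bh * bw)                 # one row per patch
    # 2/3. Create the mask and find the k smallest-norm patches (k derived from sparsity).
    norms = patches.norm(dim=1)
    k = int(round(sparsity_level * norms.numel()))
    mask_flat = torch.ones_like(norms)
    if k > 0:
        _, idx = torch.topk(norms, k, largest=False)
        mask_flat[idx] = 0
    # 4/5. Broadcast the 0/1 decision back to elements and restore the original shape.
    mask = mask_flat.reshape(ph // bh, pw // bw)
    mask = mask.repeat_interleave(bh, dim=0).repeat_interleave(bw, dim=1)
    return mask[:h, :w]                                    # truncate the nudged shape

mask = tensor_level_mask(torch.randn(6, 10), sparse_block_shape=(2, 2), sparsity_level=0.5)
```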
### Block-level masking
Block-level masking is a mask-generation routine that concerns itself only with sparsity within a patch of shape `sparse_block_shape`. This is useful when block sparsity allows partial block sparsification.
To vectorize:
Overall the block-level masking follows the same routine as the tensor-level algorithm described above. One distinction is that when reshaping the data/mask tensors we aim for creating a dimension that captures the internals of each patch. For example, if a `sparse_block_shape` is `(2, 2)`, we want to reshape the data/mask into `(2, 2, -1)`. That allows us to sort the internal elements on the last axis, and zero-out the ones that obey the sparse logic.
Differential Revision: [D37352494](https://our.internmc.facebook.com/intern/diff/D37352494/)
**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D37352494/)!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80059
Approved by: https://github.com/jerryzh168
Summary:
Fixes: https://github.com/pytorch/pytorch/issues/78117
Fixes: https://github.com/pytorch/pytorch/issues/73463
This PR adds a normalization pass that normalizes all args to keyword args in positional order, and fixes lowering code that previously
only used node.args to use both args and kwargs.
Also tried to add a test for F.conv2d, but since conv2d matches multiple schemas we do an extra schema match, and because we are using symbolic values
in `transform`, we don't get a schema match, so F.conv2d still fails with runtime errors. We can resolve this issue later if there is a need.
Another option I'm considering is to do the normalization with real inputs instead of symbolic inputs and rely on inspect.signature rather than operator_schemas (which is based on TorchScript).
I tried this briefly but didn't get far; it looks like we cannot get the Python signature for `torch._C._nn.linear`. That might be fixable as well, but will need follow-up discussions.
The goal for this PR is just to introduce normalization in our codebase so that we can adapt some downstream code to this, and also fix the F.linear issue.
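For reference, the kind of rewrite the normalization pass performs on a call site looks like this (illustrative only):
```
import torch
import torch.nn.functional as F

x, w, b = torch.randn(2, 4), torch.randn(3, 4), torch.randn(3)

# Before normalization: positional args, so lowering code reading node.args works.
y1 = F.linear(x, w, b)

# After normalization: all args become kwargs in positional order, so lowering
# code also has to look at node.kwargs (hence the fix in this PR).
y2 = F.linear(input=x, weight=w, bias=b)

assert torch.allclose(y1, y2)
```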
Test Plan:
python test/test_quantization.py TestQuantizeFx.test_normalize_args
Reviewers:
Subscribers:
Tasks:
Tags:
Differential Revision: [D37163228](https://our.internmc.facebook.com/intern/diff/D37163228)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79095
Approved by: https://github.com/andrewor14
Summary: Previously, we automatically moved the model to CPU in
torch.ao.quantization.fx.convert to work around the issue where
certain functions called by convert expect CPU arguments. This
commit pushes this responsibility to the caller, since it is the
user's decision which device to use.
Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
BC-breaking Notes:
Before:
```
model = resnet18(...)
model = prepare_fx(model, qconfig_mapping, example_inputs)
... # calibrate
model = convert_fx(model)
```
After:
```
model = resnet18(...)
model.cpu()
model = prepare_fx(model, qconfig_mapping, example_inputs)
... # calibrate
model = convert_fx(model)
```
Reviewers: jerryzh168
Subscribers: jerryzh168
Differential Revision: [D37528830](https://our.internmc.facebook.com/intern/diff/D37528830)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80555
Approved by: https://github.com/jerryzh168
Summary: This adds the implementation of observer insertion-point
selection for the OutlierDetector. For this detector, a ModelReportObserver
is inserted before any leaf-level module to study the distribution of data
passing into the module and detect outliers.
This commit contains the implementation of the observer insertion as
well as the relevant test case. Some code from the
InputWeightEqualization tests was abstracted and made more modular so the
same helper function could be used for multiple outlier class tests.
As part of this work, testing was done to determine good default values
for the ratio threshold and reference percentile, and that work (based on
a normal distribution) was analyzed to find good parameters.
We still want to keep the threshold and reference percentile as user
inputs, because the defaults were based on a normal distribution and the
right values can definitely vary depending on the type of data a user has.
Test Plan: python test/test_quantization.py TestFxDetectOutliers
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80880
Approved by: https://github.com/andrewor14
Issue:
Previously, the L1/L2-norm data sparsifier did not support
1D tensors or parameters.
Fix:
If the tensor is 1D, unsqueeze it so it looks 2D and
perform the rest as usual. Also added some 1D tensors to the
unit tests to cover this issue.
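A sketch of the fix (hedged; the helper name below is illustrative, not the actual code):
```
import torch

def _as_2d(tensor):
    # View 1D tensors as a single-row 2D tensor so the existing block-based
    # L1/L2 norm masking logic can run unchanged.
    return tensor.unsqueeze(0) if tensor.dim() == 1 else tensor

data = torch.randn(10)              # 1D parameter, previously unsupported
was_1d = data.dim() == 1
data_2d = _as_2d(data)              # shape (1, 10)
mask = torch.ones_like(data_2d)     # placeholder for the real norm-based mask
out = data_2d * mask
if was_1d:
    out = out.squeeze(0)            # back to the original 1D shape
```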
Test Plan:
```python test/test_ao_sparsity.py TestNormDataSparsifiers```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80465
Approved by: https://github.com/z-a-f
Issue:
Previously, the data was not "attached" to the data sparsifier: the
data sparsifier created a copy of the actual data inside its container. So,
when the data was modified outside of the sparsifier, the changes were not
reflected in the sparsifier.
Fix:
Use register_buffer() instead of nn.Parameter(..) to store the data inside the container.
Also added a unit test referencing this issue.
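A small sketch of why a buffer behaves as intended here (a toy container, not the sparsifier's actual one): register_buffer stores the tensor itself, so in-place changes made outside stay visible.
```
import torch
import torch.nn as nn

weight = torch.randn(4, 4)

container = nn.Module()
# Store the tensor itself as a non-trainable buffer; the container then sees
# in-place updates made to the original tensor.
container.register_buffer("emb_weight", weight)

weight.mul_(0)                                      # modify the tensor in place
assert torch.equal(container.emb_weight, weight)    # the container sees the change
```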
Test Plan:
```python test/test_ao_sparsity.py TestBaseDataSparsifier```
```python test/test_ao_sparsity.py TestNormDataSparsifiers```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80394
Approved by: https://github.com/z-a-f
Summary:
Some of the util functions in FX graph mode quantization throw warnings
such as:
```
/Users/vasiliy/pytorch/torch/ao/quantization/fx/utils.py:410: UserWarning: To copy construct from
a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().
requires_grad_(True), rather than torch.tensor(sourceTensor).
```
This PR fixes the warnings by moving the code to the recommended syntax if the
value is a tensor.
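Concretely, the change follows the pattern suggested by the warning itself:
```
import torch

source = torch.tensor([1.0, 2.0, 3.0])

scale_old = torch.tensor(source)          # triggers the UserWarning when source is a tensor
scale_new = source.clone().detach()       # recommended form used when the value is a tensor

assert torch.equal(scale_old, scale_new)
```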
Test plan:
```
python test/test_quantization.py -k test_conv_linear_reference
// warning appeared before this PR and disappeared after this PR
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80883
Approved by: https://github.com/jerryzh168
### Summary:
This PR moves the clamping functionality from `quantize` to `float_to_apot` util function to align with the uniform quantize workflow in the codebase.
### Test Plan:
Run unit tests with:
python pytorch/test/quantization/core/experimental/test_quantizer.py
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80885
Approved by: https://github.com/dzdang
Summary: This adds the class framework for the ModelReport
OutlierDetector. This detector will be in charge of looking at
activation data and figuring out whether there are significant outliers
present in it. It will average this data across batches and make a
recommendation / warning if significant outliers are found.
This commit contains just the class framework and a base test class.
Implementations will follow in subsequent commits.
Test Plan: python test/test_quantization.py TestFxDetectOutliers
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80743
Approved by: https://github.com/HDCharles
Summary:
Currently we expect users to provide custom modules for LSTM and MHA. However, since we almost always ask users to use those modules in the custom context, it is better to make this behavior the default. Here we try to align with the base quantization API: if the user specifies a custom_config_dict then that is used, but if the value is left as None then the default is used. If a user would like to both use the default and modify it, they have to do so manually, but the default is accessible via get_default_custom_config_dict.
Additionally, NS, which uses prepare to insert custom observers for
its purposes, had to be slightly modified to pass in an empty
custom_config_dict in order to avoid modifying the custom modules.
Due to weird CI issues with the previous PR,
the previous discussion can be found at: https://github.com/pytorch/pytorch/pull/71192
Test Plan:
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79960
Approved by: https://github.com/z-a-f
Summary: This commit adds qconfigs with special observers for fixed
qparams ops in get_default_qconfig_mapping and
get_default_qat_qconfig_mapping. For correctness, we also require
users to use these special observers if we detect these fixed
qparams ops in prepare.
Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
Reviewers: jerryzh168, vkuzo
Subscribers: jerryzh168, vkuzo
Differential Revision: [D37396379](https://our.internmc.facebook.com/intern/diff/D37396379)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80184
Approved by: https://github.com/jerryzh168
Summary: This PR removes the is_reference flag from the existing
convert_fx API and replaces it with a new convert_to_reference
function. This separates (1) converting the prepared model to a
reference model from (2) lowering the reference model to a quantized
model, enabling users to call their custom lowering function for
custom backends. For the native fbgemm backend, for example, the
following are equivalent:
```
from torch.ao.quantization.quantize_fx import prepare_fx, convert_fx
prepared = prepare_fx(model, ...)
quantized = convert_fx(prepared, ...)
```
```
from torch.ao.quantization.fx import lower_to_fbgemm
from torch.ao.quantization.quantize_fx import (
prepare_fx,
convert_to_reference
)
prepared = prepare_fx(model, ...)
reference = convert_to_reference(prepared, ...)
quantized = lower_to_fbgemm(reference, ...)
```
Note that currently `lower_to_fbgemm` takes in two other arguments
that are difficult for users to provide. A future commit will remove
these arguments to make the helper function more user friendly.
Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
Reviewers: jerryzh168, vkuzo
Subscribers: jerryzh168, vkuzo
Differential Revision: [D37359946](https://our.internmc.facebook.com/intern/diff/D37359946)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80091
Approved by: https://github.com/jerryzh168
### Summary:
This PR implements APoT fake quantization for the purpose of quantization-aware training. It implements the `calculate_qparams` and `forward` methods to be used in fake quantization.
### Test Plan:
Run unit tests with: `python pytorch/test/quantization/core/experimental/test_fake_quantize.py`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79845
Approved by: https://github.com/dzdang
Summary: This adds the implementation of the InputWeightEqualization
detector, including both the implementation and the relevant test
cases. This detector is meant to be used to initialize a ModelReport
instance; it keeps track of the necessary statistics to decide whether,
for certain layers of interest (linear and conv for now), it makes sense
to use input-weight equalization, and gives that suggestion to the user.
This includes the implementation and tests for the detector's
report-generation functionality; the full detector should now
be fleshed out and complete with this addition. This also required
modifying the ModelReportObserver class to capture per-channel min
and max values. In addition, instead of passing in an observer class to
instantiate, the detectors now pass the ModelReport
instance the observer instances that they themselves instantiate.
Test Plan: python test/test_quantization.py TestFxDetectInputWeightEqualization
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80191
Approved by: https://github.com/HDCharles, https://github.com/andrewor14
### Summary:
This PR updates the APoT global API method signatures and parameters for `dequantize_APoT` and `calculate_qparams` to align with their uniform counterparts in the codebase.
### Test Plan:
Run unit tests with:
`python pytorch/test/quantization/core/experimental/test_nonuniform_observer.py`
`python pytorch/test/quantization/core/experimental/test_quantizer.py`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80364
Approved by: https://github.com/jerryzh168
Summary: This adds the implementation of the InputWeightEqualization
detector, including both the implementation and the relevant test
cases. This detector is meant to be used to initialize a ModelReport
instance; it keeps track of the necessary statistics to decide whether,
for certain layers of interest (linear and conv for now), it makes sense
to use input-weight equalization, and gives that suggestion to the user.
This implements the functionality of adding observer points for the
input-weight equalization detector and contains the relevant tests for
it. The full detector functionality will be fleshed out
in a later commit.
Test Plan: python test/test_quantization.py TestFxDetectInputWeightEqualization
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79962
Approved by: https://github.com/HDCharles, https://github.com/andrewor14
### Summary:
This PR updates the design of APoT Observer, Quantizer, and Tensor to be more consistent with their uniform counterparts in the PyTorch framework. APoT Observer now calculates alpha as the max between the absolute values of the max and min values in the input tensor. APoT Quantizer is modified so its instance methods quantize_APoT and dequantize_APoT are called by their global method counterparts. APoT Tensor is modified to account for the new method definition of the `quantize_APoT` from APoT Quantizer.
### Test Plan:
Run APoT Observer class unit tests with: `python pytorch/test/quantization/core/experimental/test_nonuniform_observer.py`
Run APoT Quantize class unit tests with: `python pytorch/test/quantization/core/experimental/test_quantizer.py`
Run APoT Tensor class unit tests with: `python pytorch/test/quantization/core/experimental/test_quantized_tensor.py`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80075
Approved by: https://github.com/jerryzh168
The BaseDataScheduler is the abstract scheduler class specifically for the
BaseDataSparsifier class. This class controls a specific hyperparameter of
the sparsifier class and varies it across the training process (or across time).
Args:
    data_sparsifier (instance of BaseDataSparsifier)
        The implemented data sparsifier class (i.e. one in which update_mask is implemented)
    schedule_param (str)
        A specific hyperparameter of the passed sparsifier that needs to be scheduled/varied
    last_epoch (int, default=-1)
        Passed when training needs to be resumed from a particular point
    verbose (bool, default=False)
        Verbosity of the BaseDataScheduler
The *get_schedule_param()* function needs to be implemented by the user.
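A toy sketch of the scheduling pattern (not the actual BaseDataScheduler API; the class names and the data_groups layout below are assumptions):
```
class ToySparsifier:
    def __init__(self):
        self.data_groups = {"tensor_1": {"sparsity_level": 0.8}}

class ToyScheduler:
    def __init__(self, data_sparsifier, schedule_param, last_epoch=-1, verbose=False):
        self.data_sparsifier = data_sparsifier
        self.schedule_param = schedule_param
        self.last_epoch = last_epoch
        self.base = {name: cfg[schedule_param]
                     for name, cfg in data_sparsifier.data_groups.items()}

    def get_schedule_param(self):
        # to be implemented by the user: returns name -> new hyperparameter value
        raise NotImplementedError

    def step(self):
        self.last_epoch += 1
        for name, value in self.get_schedule_param().items():
            self.data_sparsifier.data_groups[name][self.schedule_param] = value

class LinearRampScheduler(ToyScheduler):
    def get_schedule_param(self):
        # ramp the scheduled hyperparameter linearly to its base value over 10 epochs
        frac = min(1.0, (self.last_epoch + 1) / 10)
        return {name: base * frac for name, base in self.base.items()}

sched = LinearRampScheduler(ToySparsifier(), "sparsity_level")
for _ in range(3):
    sched.step()
```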
Test Plan:
```python test/test_ao_sparsity.py TestBaseDataScheduler```
Differential Revision: [D37358608](https://our.internmc.facebook.com/intern/diff/D37358608)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79817
Approved by: https://github.com/jerryzh168, https://github.com/z-a-f
Summary: This adds the framework (method signatures and descriptions) for
the InputWeightEqualization detector. There is no code implementation yet,
so the test suite for this is a simple pass. This detector will be used
to determine whether input-weight equalization should be recommended.
Test Plan: python test/test_quantization.py TestFxDetectInputWeightEqualization
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79916
Approved by: https://github.com/HDCharles
Summary: Per our design discussion about the sparsity API, we're
discontinuing the old API in favor of the new tensor_fqn-based one.
The pruning class has not been updated, mostly because this change
doesn't cause any further knock-on effects.
Test Plan: python test/test_ao_sparsity.py
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79597
Approved by: https://github.com/z-a-f
Summary: The ModelReport class in model_report.py combines the
functionality of the detectors and the ModelReportObserver. It creates
an end-to-end system where a user can pass in a prepared Graph Model to
insert the ModelReportObservers; then, after the user calibrates their
model, the calibrated model can be used by the ModelReport class
to generate reports based on what the user wished to gather information
about.
This contains the implementation and the tests for the generate_report
method, which is used on a calibrated fx model to generate reports based
on data collected by the inserted observers during the calibration
phase, and can also optionally remove those observers if desired.
This also fixes the issue that caused the previous revert.
Test Plan: python test/test_quantization.py TestFxModelReportClass
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80054
Approved by: https://github.com/HDCharles
Summary: The ModelReport class in model_report.py combines the
functionality of the detectors and the ModelReportObserver. It creates
an end-to-end system where a user can pass in a prepared Graph Model to
insert the ModelReportObservers; then, after the user calibrates their
model, the calibrated model can be used by the ModelReport class
to generate reports based on what the user wished to gather information
about.
This contains the implementation and tests for the
prepare_detailed_calibration method, which is used on a prepared fx model
to insert the desired observers for the different detectors.
This also applies a fix for the issue that caused the previous revert.
Test Plan: python test/test_quantization.py TestFxModelReportClass
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80053
Approved by: https://github.com/HDCharles
Summary: The ModelReport class in model_report.py combines the
functionality of the detectors and the ModelReportObserver. It creates
an end-to-end system where a user can pass in a prepared Graph Model to
insert the ModelReportObservers; then, after the user calibrates their
model, the calibrated model can be used by the ModelReport class
to generate reports based on what the user wished to gather information
about.
This contains the init method and the signatures and docs for each
of the proposed helper functions.
This also addresses and fixes the issue that caused the previous revert.
Test Plan: python test/test_quantization.py TestFxModelReportClass
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80052
Approved by: https://github.com/HDCharles
Summary: The ModelReport class in model_report.py combines the
functionality of the detectors and the ModelReportObserver. It creates
an end-to-end system where a user can pass in a prepared Graph Model to
insert the ModelReportObservers; then, after the user calibrates their
model, the calibrated model can be used by the ModelReport class
to generate reports based on what the user wished to gather information
about.
This contains the implementation and the tests for the generate_report
method, which is used on a calibrated fx model to generate reports based
on data collected by the inserted observers during the calibration
phase, and can also optionally remove those observers if desired.
Test Plan: python test/test_quantization.py TestFxModelReportClass
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79792
Approved by: https://github.com/HDCharles
Summary: The ModelReport class in model_report.py combines the
functionality of the detectors and the ModelReportObserver. It creates
an end-to-end system where a user can pass in a prepared Graph Model to
insert the ModelReportObservers; then, after the user calibrates their
model, the calibrated model can be used by the ModelReport class
to generate reports based on what the user wished to gather information
about.
This contains the implementation and tests for the
prepare_detailed_calibration method, which is used on a prepared fx model
to insert the desired observers for the different detectors.
Test Plan: python test/test_quantization.py TestFxModelReportClass
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79752
Approved by: https://github.com/HDCharles
Summary: The ModelReport class in model_report.py combines the
functionality of the detectors and the ModelReportObserver. It creates
an end-to-end system where a user can pass in a prepared Graph Model to
insert the ModelReportObservers; then, after the user calibrates their
model, the calibrated model can be used by the ModelReport class
to generate reports based on what the user wished to gather information
about.
This contains the init method and the signatures and docs for each
of the proposed helper functions.
Test Plan: python test/test_quantization.py TestFxModelReportClass
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79595
Approved by: https://github.com/andrewor14
Summary: The goal is to add a base class for the model-report detectors
so that, compared to the primary ModelReport class, they can hold much more
specific information about the observers and where they are
inserted, etc.
Since this is just a base class, the testing will happen through the
implementations of the classes that derive from it.
The two current detector methods were turned into Detector classes and
the tests were modified to reflect this, but the same functionality is
tested.
As a result, _detector.py was renamed to detector.py.
Test Plan: python test/test_quantization.py TestFxModelReportDetector
python test/test_quantization.py TestFxModelReportDetectDynamicStatic
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79671
Approved by: https://github.com/andrewor14
Summary: Updated the sparsity API to accept tensor_fqn as the primary
specification method, i.e. [{'tensor_fqn':
'linear.weight'}].
The pruning API was also updated due to knock-on changes.
Kept the old API for accepting module_fqns, but changed 'fqn' to 'module_fqn'
for clarity (this will break BC).
Updated variables in the code to use module rather than layer.
Updated the state dict to use tensor_fqn rather than 'fqn' or 'module_fqn'.
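A short sketch of the updated config format (the surrounding WeightNormSparsifier calls are assumed to match the existing sparsifier API rather than quoted from this PR):
```
import torch.nn as nn
from torch.ao.sparsity import WeightNormSparsifier  # import path assumed for this release

model = nn.Sequential(nn.Linear(8, 8))

# Old style: [{'module_fqn': '0'}] (previously keyed as 'fqn')
# New primary style: point at the exact tensor to sparsify.
sparse_config = [{"tensor_fqn": "0.weight"}]

sparsifier = WeightNormSparsifier(sparsity_level=0.5)
sparsifier.prepare(model, sparse_config)
sparsifier.step()
```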
Test Plan: python test/test_ao_sparsity.py
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79113
Approved by: https://github.com/z-a-f
Summary:
Refactors `find_matches` function to only find subgraph
matches and not assign qconfigs to them. Moves the qconfig assignment
outside of the function. No logic change.
This will be useful for prototyping future tools for quantizing
parts of the model. These tools will need to know the matches
and will reuse the `find_matches` function,
but they will assign their own qconfigs to them using a different
strategy.
Test plan:
```
python test/test_quantization.py -k Fx
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79713
Approved by: https://github.com/jerryzh168
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79066
Following https://github.com/pytorch/pytorch/pull/78452,
this commit replaces the following config dicts with python objects:
- prepare_custom_config_dict -> PrepareCustomConfig
- convert_custom_config_dict -> ConvertCustomConfig
- fuse_custom_config_dict -> FuseCustomConfig
This leads to better type safety and better user experience in
notebook settings due to improved auto completion. The new APIs
are as follows:
```
from torch.ao.quantization.fx.custom_config import ConvertCustomConfig, PrepareCustomConfig

prepare_custom_config = PrepareCustomConfig() \
    .set_float_to_observed_mapping(float_class, observed_class) \
    .set_non_traceable_module_names(["mod1", "mod2"]) \
    .set_non_traceable_module_classes([class1, class2]) \
    .set_input_quantized_indexes([0, 1]) \
    .set_output_quantized_indexes([0]) \
    .set_preserved_attributes(["attr1", "attr2"])

convert_custom_config = ConvertCustomConfig() \
    .set_observed_to_quantized_mapping(observed_class, quantized_class) \
    .set_preserved_attributes(["attr1", "attr2"])

model = prepare_fx(
    model,
    qconfig_mapping,
    example_inputs,
    prepare_custom_config=prepare_custom_config)
model(data)
model = convert_fx(model, convert_custom_config=convert_custom_config)
```
For backwards compatibility, prepare_fx, prepare_qat_fx, and
convert_fx will continue to accept Dicts, which will be converted
to the relevant *CustomConfig object internally.
Note that this commit does not modify existing tests to use the
new API; they will continue to pass in Dicts as before, which still
works but triggers a deprecation warning. This will be handled in
a future commit.
Differential Revision: [D37088095](https://our.internmc.facebook.com/intern/diff/D37088095/)
Approved by: https://github.com/jerryzh168
L2-Norm Sparsifier
This sparsifier computes the *L2-norm* of every sparse block and "zeroes-out" the
ones with the lowest norm. The level of sparsity defines how many of the
blocks are removed.
This sparsifier is controlled by three variables:
1. `sparsity_level` defines the number of *sparse blocks* that are zeroed-out
2. `sparse_block_shape` defines the shape of the sparse blocks. Note that
the sparse blocks originate at the zero-index of the tensor.
3. `zeros_per_block` is the number of zeros that we are expecting in each
sparse block. By default we assume that all elements within a block are
zeroed-out. However, setting this variable sets the target number of
zeros per block. The zeros within each block are chosen as the *smallest
absolute values*.
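A tiny sketch of the `zeros_per_block` behaviour on a single block (illustrative only):
```
import torch

block = torch.tensor([0.9, -0.1, 0.4, -0.05])   # one sparse block of shape (1, 4)
zeros_per_block = 2

# zero out the `zeros_per_block` entries with the smallest absolute values
_, idx = torch.topk(block.abs(), zeros_per_block, largest=False)
mask = torch.ones_like(block)
mask[idx] = 0
sparse_block = block * mask      # the two smallest-magnitude entries become zero
```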
Test Plan:
```python test/test_ao_sparsity.py TestNormDataSparsifiers```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79535
Approved by: https://github.com/z-a-f
L1-Norm Sparsifier
This sparsifier computes the *L1-norm* of every sparse block and "zeroes-out" the
ones with the lowest norm. The level of sparsity defines how many of the
blocks are removed.
This sparsifier is controlled by three variables:
1. `sparsity_level` defines the number of *sparse blocks* that are zeroed-out
2. `sparse_block_shape` defines the shape of the sparse blocks. Note that
the sparse blocks originate at the zero-index of the tensor.
3. `zeros_per_block` is the number of zeros that we are expecting in each
sparse block. By default we assume that all elements within a block are
zeroed-out. However, setting this variable sets the target number of
zeros per block. The zeros within each block are chosen as the *smallest
absolute values*.
Test Plan:
```python test/test_ao_sparsity.py TestNormDataSparsifiers```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79534
Approved by: https://github.com/z-a-f
Added embeddings and embedding bags to the supported data types. Currently, the base data sparsifier extracts the weight
and stores it as a parameter in the internal module container with requires_grad=False. The embeddings inside the data sparsifier
are therefore non-trainable.
Test Plan:
```python test/test_ao_sparsity.py TestBaseDataSparsifier```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79254
Approved by: https://github.com/z-a-f
Users can now just pass an nn.Parameter (or layer.weight) to the Data Sparsifier.
Note: The data sparsifier stores the passed nn.Parameter as a new parameter in the internal container module with requires_grad=False.
So, essentially, when the parameter is trained, its new values are not reflected inside the data sparsifier class.
Test Plan:
```python test/test_ao_sparsity.py TestBaseDataSparsifier```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79253
Approved by: https://github.com/z-a-f
The state of the data sparsifier object contains the name -> mask mapping, the name -> config mapping, and the state_dict() of the container.
load_state_dict() and __set_state__() automatically create a container module and load the named data internally without requiring the user to intervene.
Test Plan:
```python test/test_ao_sparsity.py TestBaseDataSparsifier```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79427
Approved by: https://github.com/z-a-f
Users can now pass in raw torch tensors, and the base class handles all the parametrizations and masking.
Example:
>>> data_list = [('tensor_1', torch.randn(3,3)), ('tensor_2', torch.randn(4,4))]
>>> defaults = {'sparsity_level': 0.7}
>>> sparsifier = DerivedDataSparsifier(data_list = data_list, **defaults) # Some sparsifier that inherits BaseDataSparsifier
>>> new_tensor_to_add = {'name': 'tensor_3', 'data': torch.randn(5,5), 'sparsity_level': 0.3}
>>> sparsifier.add_data(**new_tensor_to_add)
>>> # tensor_1 and tensor_2 will have sparsity_level of 0.7 but tensor_3 will have sparsity_level=0.3
Test Plan:
```python test/test_ao_sparsity.py TestBaseDataSparsifier```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79252
Approved by: https://github.com/HDCharles, https://github.com/z-a-f
Base Data Sparsifier class for all data sparsifiers.
The abstract class accepts raw torch tensors / embeddings / embedding bags (refer to SUPPORTED_TYPES above)
to prepare for sparsification.
In this case, the mask (and parametrizations) is owned by the class and not by the user.
Specifically, the container object inside the class maintains the mask and parametrizations of the input data.
Test Plan:
```python test/test_ao_sparsity.py TestBaseDataSparsifier```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79251
Approved by: https://github.com/z-a-f, https://github.com/HDCharles
Summary: This follows https://github.com/pytorch/pytorch/pull/78452,
which replaced the qconfig_dict with QConfigMapping. This PR
additionally replaces get_default_*qconfig_dict with
get_default_*qconfig_mapping. For backward compatibility, we
deprecate the old functions instead of removing them.
Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
Reviewers: jerryzh168, vkuzo
Subscribers: jerryzh168, vkuzo, supriyar
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79618
Approved by: https://github.com/jerryzh168
Summary: The _detect_dynamic_vs_static function was added to take in a
prepared fx graph model that already has ModelReportObservers built into
it, use the collected information to determine whether the input and
output are stationary or non-stationary, and provide feedback on whether
to make linear modules static or dynamic based on this information.
This PR will be followed soon by another PR that more
rigorously tests the end-to-end performance of this system, which is
primarily how the function in this PR will be tested for functionality;
that is why this one has only one test.
Test Plan: python test/quantization/fx/test_model_report_fx.py TestModelReportDetectDynamicStatic
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79326
Approved by: https://github.com/HDCharles
Summary: The purpose of this is to add to the model report functionality
by creating an observer that takes a prepared fx module and suggests
whether static or dynamic quantization is more appropriate. The tests
for this have been written and are included in the location indicated by
the Test Plan.
Test Plan: python test/quantization/fx/test_model_report_fx.py TestModelReportObserver
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79243
Approved by: https://github.com/jerryzh168, https://github.com/andrewor14
Summary: This code is meant to be a tool that helps people get the most out
of their backend by hinting that they should use per_channel quantization
if it is supported, which can increase accuracy significantly. The code is
complete and ready to be reviewed.
Test Plan: test/quantization/fx/test_model_report_fx.py
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79104
Approved by: https://github.com/HDCharles
Summary:
The fbgemm and qnnpack backends mostly support ops with quint8 activations.
Historically, the default backend config has included ops with fp16 activations
for other backends. This PR keeps the old config under a different name to keep
the functionality tested, and makes the default config match fbgemm/qnnpack ops.
Test plan:
```
python test/test_quantization.py -k TestQuantizeFx
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78528
Approved by: https://github.com/andrewor14
This sparsifier creates a nearly diagonal mask to be applied to the weight matrix.
A nearly diagonal matrix is a matrix whose non-zero elements are near the diagonal and the rest are zero.
Examples of nearly diagonal matrices with degree (or nearliness) 3 and 5, respectively:
1 1 0 0    1 1 1 0
1 1 1 0    1 1 1 1
0 1 1 1    1 1 1 1
0 0 1 1    0 1 1 1
Note that a nearly diagonal matrix with degree 1 is just a matrix with only the main diagonal populated.
This sparsifier is controlled by one variable:
1. `nearliness` defines the number of non-zero diagonals closest to the main diagonal.
Currently, only odd numbers are supported.
Note:
This can be accelerated (vectorized) once the Spdiagonal feature (PR: #78439) is landed or the banded matrix
feature is landed: https://stackoverflow.com/questions/52463972/generating-banded-matrices-using-numpy
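A minimal (loop-based) sketch of generating such a mask, matching the examples above; the vectorized version would follow once the features mentioned land:
```
import torch

def nearly_diagonal_mask(rows, cols, nearliness=3):
    # keep entries within (nearliness - 1) // 2 of the main diagonal
    dist = (nearliness - 1) // 2
    mask = torch.zeros(rows, cols)
    for i in range(rows):
        for j in range(cols):
            if abs(i - j) <= dist:
                mask[i, j] = 1
    return mask

print(nearly_diagonal_mask(4, 4, nearliness=3))   # matches the degree-3 example above
```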
Test Plan:
```
python test/test_ao_sparsity.py TestNearlyDiagonalSparsifier
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78448
Approved by: https://github.com/z-a-f, https://github.com/HDCharles
Summary:
Some of the helper functions that generate operator configs based on dtype_configs are reused in the native backend and TensorRT, so we
factor this part out into a util file: common_operator_configs.py
Test Plan: buck test mode/opt deeplearning/trt/fx2trt_oss/test/quant:test_quant_trt
Differential Revision: D36728359
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78407
Approved by: https://github.com/vkuzo, https://github.com/andrewor14
Summary: https://github.com/pytorch/pytorch/pull/78452 replaced
qconfig_dict with QConfigMapping as the default API for prepare_fx,
prepare_qat_fx, and convert_fx. We should update the docs to reflect
this change as well.
Test Plan:
```
cd docs
make html
cd build/html
python -m server.http
```
Reviewers: jerryzh168, vkuzo
Subscribers: jerryzh168, vkuzo
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78533
Approved by: https://github.com/vkuzo
**Summary:** Previously, FX graph mode quantization configurations
were specified through a dictionary of qconfigs. However, this
API was not in line with other core APIs in PyTorch. This commit
replaces this dictionary with a config object that users will
create and pass to prepare and convert. This leads to better
type safety and better user experience in notebook settings
due to improved auto completion.
The new API is as follows:
```
from torch.ao.quantization import QConfigMapping
from torch.ao.quantization.quantize_fx import prepare_fx
qconfig_mapping = QConfigMapping() \
    .set_global(qconfig) \
    .set_object_type(torch.nn.Linear, qconfig) \
    .set_module_name_regex("foo.*bar", qconfig) \
    .set_module_name("mod", qconfig)

prepare_fx(model, qconfig_mapping)
```
For backwards compatibility, `prepare_fx`, `prepare_qat_fx`,
and `convert_fx` will continue to accept qconfig_dicts, which
will be converted to QuantizationConfigs internally.
Note that this commit does not modify existing tests to use the
new API; they will continue to pass in qconfig_dict as before,
which still works but triggers a deprecation warning. This will
be handled in a future commit.
**Test Plan:**
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
**Reviewers:** jerryzh168, vkuzo
**Subscribers:** jerryzh168, vkuzo
Differential Revision: D36747998
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78452
Approved by: https://github.com/jerryzh168
Summary:
att, currently it errors out with the following error:
```
---> 72 dummy_weight = trt.Weights(weight_shape)
73 layer = network.add_convolution_nd(
74 input=input_val,
TypeError: __init__(): incompatible constructor arguments. The following argument types are supported:
1. tensorrt.tensorrt.Weights(type: tensorrt.tensorrt.DataType = <DataType.FLOAT: 0>)
2. tensorrt.tensorrt.Weights(a: numpy.ndarray)
```
Full error: https://www.internalfb.com/phabricator/paste/view/P503598381
We need to pass around a numpy ndarray instead of a shape here,
and support conv1d in backend_config_dict for TensorRT.
Test Plan:
```
buck test mode/opt deeplearning/trt/fx2trt_oss/test/converters:test_convolution
```
```
buck test mode/opt deeplearning/trt/fx2trt_oss/test/quant:test_quant_trt
```
Differential Revision: D36721313
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78402
Approved by: https://github.com/842974287
Summary:
After https://github.com/pytorch/pytorch/pull/77608 `example_inputs` is required input for `prepare_fx` and `prepare_qat_fx`.
This makes quantizing submodules harder, so we added this utility function to get a dictionary from fqn to submodule example_inputs
Example Call:
```
example_inputs = (tensor0,)
get_fqn_to_example_inputs(m, example_inputs)
```
Example output:
```
{
"linear1": (tensor1,),
"linear2": (tensor2,),
"sub": (tensor3,),
"sub.linear1": (tensor4,),
...
}
```
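A hedged usage sketch of the returned dictionary (import path assumed; `m`, `tensor0`, and a `qconfig_mapping` are taken from context), showing how it can feed per-submodule quantization:
```python
from torch.ao.quantization.quantize_fx import prepare_fx
from torch.ao.quantization.utils import get_fqn_to_example_inputs  # path assumed

example_inputs = (tensor0,)
fqn_to_example_inputs = get_fqn_to_example_inputs(m, example_inputs)

# Quantize only the "sub" submodule, using the inputs it actually receives.
prepared_sub = prepare_fx(
    m.sub, qconfig_mapping, example_inputs=fqn_to_example_inputs["sub"]
)
```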
Test Plan:
python test/test_quantization.py TestUtils
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78286
Approved by: https://github.com/dzdang
Summary:
Removes the restriction in NS for FX on handling nodes which have
no positional arguments, such as `F.linear(input=x, weight=w, bias=b)`.
In order to achieve this, we delete all places in the code which
were doing things like
```
node.args[0]
```
And replace them with
```
_get_normalized_nth_input(node, gm, 0)
```
The `_get_normalized_nth_input` function is a best effort way to
get the n'th normalized input.
This is needed because some FX tools output nodes normalized to
be kwargs only, and we need to be able to handle this in NS.
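A minimal sketch (not the actual implementation) of what such a best-effort lookup can do when a node was normalized to be kwargs-only:
```python
import torch.fx

def get_nth_input_best_effort(node: torch.fx.Node, n: int):
    # Prefer positional args when available.
    if len(node.args) > n:
        return node.args[n]
    # Fall back to kwargs for kwargs-only nodes, e.g.
    # F.linear(input=x, weight=w, bias=b); insertion order is used here as a
    # rough stand-in for positional order.
    kwarg_values = list(node.kwargs.values())
    if len(kwarg_values) > n:
        return kwarg_values[n]
    raise AssertionError(f"could not get input {n} of node {node.format_node()}")
```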
Test plan:
```
python test/test_quantization.py -k test_linear_kwargs_shadow
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78181
Approved by: https://github.com/z-a-f, https://github.com/hx89
Summary:
After https://github.com/pytorch/pytorch/pull/77608 `example_inputs` is required input for `prepare_fx` and `prepare_qat_fx`.
This makes quantizing submodules harder, so we added this utility function to get a dictionary from fqn to submodule example_inputs
Example Call:
```
example_inputs = (tensor0,)
get_fqn_to_example_inputs(m, example_inputs)
```
Example output:
```
{
"linear1": (tensor1,),
"linear2": (tensor2,),
"sub": (tensor3,),
"sub.linear1": (tensor4,),
...
}
```
Test Plan:
python test/test_quantization.py TestUtils
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78146
Approved by: https://github.com/vkuzo
Summary:
This improves the documentation page for backend_config_dict to render
the configurations in a human readable format, such as
```
{
'pattern': torch.nn.modules.pooling.AdaptiveAvgPool1d,
'dtype_configs': [
{
'input_dtype': torch.quint8,
'output_dtype': torch.quint8,
},
{
'input_dtype': torch.float16,
'weight_dtype': torch.float16,
'bias_dtype': torch.float16,
'output_dtype': torch.float16,
},
],
'observation_type': ObservationType.OUTPUT_SHARE_OBSERVER_WITH_INPUT,
},
```
The results are also now sorted alphabetically by the normalized name of
the root op in the pattern.
A couple of utility functions are created to help with this. If in the future
we convert backend_config_dict to use typed objects, we can move this logic
to the objects at that time.
Test plan:
```
cd docs
make html
cd build
python -m http.server
// renders correctly, example: https://gist.github.com/vkuzo/76adfc7c89e119c59813a733fa2cd56f
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77535
Approved by: https://github.com/andrewor14
Summary:
These mappings are already defined for `BatchNorm{n}d` as the root
node, we don't need to specify them again. Removing to clean
up the code.
Test plan:
```
python test/test_quantization.py -k FXNumericSuite
python test/test_quantization.py -k FXGraphMatcher
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76993
Approved by: https://github.com/jerryzh168
Summary:
GroupNorm quantization is defined but it looks like FX graph
mode quantization does not have it enabled.
Removing it from NS for FX.
Test plan:
```
python test/test_quantization.py -k FXGraphMatcher
python test/test_quantization.py -k FXNumericSuite
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76991
Approved by: https://github.com/jerryzh168
Summary:
More cleanups in mappings:
1. makes the `nnqatd.Linear` entry be looked up dynamically
2. moves the `NonDynamicallyQuantizableLinear` down and marks it as edge case
Test plan:
```
python test/test_quantization.py -k FXGraphMatcher
python test/test_quantization.py -k FXNumericSuite
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76990
Approved by: https://github.com/jerryzh168
Summary:
Instead of hardcoding the relationship between `F.dropout` and `toq.dropout`,
read it from the mapping.
The mapping itself might need to be in the lowering file, but that's a separate
issue.
Test plan:
```
python test/test_quantization.py -k FXGraphMatcher
python test/test_quantization.py -k FXNumericSuite
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76989
Approved by: https://github.com/jerryzh168
Summary:
FX graph mode quantization no longer uses `torch.ops.quantized.cat`,
instead `torch.cat` can use quantized inputs.
This PR removes the outdated mapping from NS for FX.
Test plan:
```
python test/test_quantization.py -k FXNumericSuite
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76988
Approved by: https://github.com/jerryzh168
Summary:
Fixes a couple of problems with `ConvTranspose` in NS mappings:
1. deletes the dynamic versions, as they do not work yet
2. deletes `ConvTranspose3d`, as it's not swapped yet in the quantization workflow
3. removes a duplicate set
Test plan:
```
python test/test_quantization.py -k FXGraphMatcher
python test/test_quantization.py -k FXNumericSuite
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76980
Approved by: https://github.com/jerryzh168
Summary:
NS for FX mappings were originally hardcoded, because quantization op
mappings were not easily reusable. Now that we have `backend_config_dict`,
we can start moving NS for FX to use them and delete the hardcoded mappings.
This PR deletes the hardcoded mappings from NS about the lowering step,
and instead reads them from the lowering configs.
Note: for now, there is no way to configure the tool to use lowering
configs from a different lowering pass. That may be added at some
future point, but it's not important now.
Test plan:
```
python test/test_quantization.py -k FXNumericSuite
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76978
Approved by: https://github.com/jerryzh168
Summary:
NS for FX mappings were originally hardcoded, because quantization op
mappings were not easily reusable. Now that we have `backend_config_dict`,
we can start moving NS for FX to use them and delete the hardcoded mappings.
This first PR deletes the hardcoded mappings for `nni` and `nniqat` modules,
and instead reads these mappings from `backend_config_dict`.
Future PRs will incrementally move more of the mappings over.
Test plan:
```
python test/test_quantization.py -k FXNumericSuite
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76958
Approved by: https://github.com/jerryzh168
Summary:
FakeQuantize class has quant_min/quant_max and activation_post_process
attributes, the latter of which already includes quant_min/max. As such,
we can remove quant_min/quant_max from FakeQuantize and use
FakeQuantize.activation_post_process.quant_m* directly.
Test plan:
```
python test/test_quantization.py
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76674
Approved by: https://github.com/vkuzo
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76663
Subgraph copy does not handle all edge cases. It's high eng time
to handle them all, and currently an unhandled edge case crashes
the script.
This PR adds a function to check if the subgraph copy is supported,
and skips shadowing if it is not supported. This way the model
can still go through the shadowing APIs without an exception.
Test Plan:
```
python test/test_quantization.py -k FXNumericSuite
```
Reviewed By: hx89
Differential Revision: D36069304
Pulled By: vkuzo
fbshipit-source-id: 6b38b8d8e43396a4cf2373b247223a19d451d096
(cherry picked from commit e2322ca0635c51a4701e60fa90f77915a3c46d0f)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76561
User model had syntax like `torch.cat(tensors=[x])`. This PR fixes two errors
to unbreak this in NS shadow model:
1. skip nodes which only have kwargs (instead of throwing an exception)
2. explicitly skip shadowing of `torch.cat` (since it's not supported anyways)
Test Plan:
```
python test/test_quantization.py -k test_op_with_only_kwargs_skips_shadowing
python test/test_quantization.py -k test_op_mul_add_cat_skips_shadowing
```
Reviewed By: hx89
Differential Revision: D36017356
Pulled By: vkuzo
fbshipit-source-id: 0da4840a62c2dac183f8294c2cec4fce262474b3
(cherry picked from commit 88409c1576e7f690708957b2baa285fc7961e9d6)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76637
The previous names `default_affine_fixed_qparams_observer`
and `default_symmetric_fixed_qparams_observer` were uninformative, and users had to read
the definition in order to understand what these observers are. The new
naming convention reveals information about the range of the observers.
The analogous changes were also made for
`default_symmetric_fixed_qparams_fake_quant` and
`default_affine_fixed_qparams_fake_quant`
Test Plan:
```
python test/test_quantization.py
```
Differential Revision: D36054169
Reviewed By: vkuzo
Pulled By: dzdang
fbshipit-source-id: 215f7786a4b7abda7327f17cc61735697ec5cca9
(cherry picked from commit 21a4e6eda4467c8adca7fd534a506a14e975f9cf)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76504
Shadowing for add and mul is not implemented, this PR fixes the skipping
logic to also skip the `operator.add` and `operator.mul` flavor of these
operators.
Test Plan:
```
python test/test_quantization.py -k test_mul_add_skips_shadowing
```
Reviewed By: dzdang
Differential Revision: D35985997
Pulled By: vkuzo
fbshipit-source-id: f832e54a5461d3b182df4bb905357d6c66742e98
(cherry picked from commit 93ae9592f68873865ebfdc438bffb1c9486dd1c1)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76468
This makes the error message when copying an unsupported node more verbose.
This is useful to debug where specifically in a user model this is failing.
Test Plan:
1. hardcode this condition to hit
2. run NS tests
3. verify the exception now prints details about the offending node
Reviewed By: jerryzh168
Differential Revision: D35978652
Pulled By: vkuzo
fbshipit-source-id: 9cc93dfa46469bf6ef60aa38d4011041b6709df9
(cherry picked from commit c6e382c2a69aba6ba66740f238bc14446521a433)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76461
Renaming as the old name was confusing. The name represents
better what this class is doing.
Test Plan: CI
Reviewed By: jerryzh168
Differential Revision: D35976350
Pulled By: vkuzo
fbshipit-source-id: 6da6c1767cec729c3959b13ae9dd939d0b2f622c
(cherry picked from commit 065608ef42c599525bfad4603af74c5bdf0881c3)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76460
`RecordingObserver` inherits from `_ObserverBase` but does not use any functionality
from it. Making it inherit from `ObserverBase` instead.
This will make it simpler to rename `_ObserverBase` to something more meaningful in the next PR.
Test Plan: CI
Reviewed By: jerryzh168
Differential Revision: D35976351
Pulled By: vkuzo
fbshipit-source-id: 19c106bf0d48607c231702e2e048f42a7f48a5c6
(cherry picked from commit 4fd44123b0e9bcdcae546aecabe80d7642129cf5)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76414
Previously we refactored the FX Graph Mode Quantization code base to use a native backend config dict for fbgemm/qnnpack.
Because of this, we need to define the backend config dict for tensorrt properly as well (previously it was relying on the
fbgemm/qnnpack configs). This PR adds some configs to enable uru10x10 again.
Test Plan: buck run mode/dev-nosan -c fbcode.split-dwarf=true -c fbcode.platform=platform009 accelerators/workloads/models/uru10x10:uru_10x10_to_trt_eval -- --int8
Reviewed By: vkuzo
Differential Revision: D35939944
fbshipit-source-id: c64ade5074f5a8ee74a833bb990cd7a91c2cb152
(cherry picked from commit 02855a5ef8c196fb5b0defdfff58d6f2b94c693e)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76148
X-link: https://github.com/pytorch/fx2trt/pull/60
Recently, we landed some PRs to enable backend_config_dict by default in the quantization codebase, and we also changed
the config to include "fused_module" for a pattern, but we didn't update the tensorrt backend config dict;
this PR adds that configuration. It also adds the config for binary ops in TensorRT, since it was previously relying on the fbgemm backend
config dict.
Test Plan: Facebook internal tests
Reviewed By: andrewor14, frankgt40
Differential Revision: D35789709
fbshipit-source-id: 9dc93b9f454eff6baefb38c4c1567f88da2a1506
(cherry picked from commit 7d30e5ecbfd096c32cdb1b68abde394bcba45f94)
Summary:
Following https://github.com/pytorch/rfcs/blob/master/RFC-0019-Extending-PyTorch-Quantization-to-Custom-Backends.md we implemented
the backend configuration for the fbgemm/qnnpack backend. Currently it lives under the fx folder, but we'd like to use it for all the different
workflows, including eager, fx graph, and define-by-run quantization, so this PR moves it to the torch.ao.quantization namespace so that
it can be shared by different workflows.
It also moves some fx-specific utility functions to fx/backend_config_utils.py, while some files are kept in the fx folder (quantize_handler.py and fuse_handler.py).
Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
python test/test_quantization.py TestQuantizeFxModels
python test/test_quantization.py TestAOMigrationQuantization
python test/test_quantization.py TestAOMigrationQuantizationFx
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75823
Approved by: https://github.com/vkuzo
Summary: Calling `prepare_fx` with `get_default_qconfig_dict`
failed for models with fused modules, such as `ConvReLU2d`.
This commit fixes this by adding qconfig entries for ReLU
and BatchNorm as well.
Test Plan:
python test/test_quantization.py TestQuantizeFx.test_qconfig_dict_with_fused_modules
Reviewers: jerryzh168
Subscribers: jerryzh168, vkuzo
Issue: https://github.com/pytorch/pytorch/issues/75825
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75838
Approved by: https://github.com/jerryzh168
Summary:
The tests were disabled by https://github.com/pytorch/pytorch/pull/61687, but
this specific behavior broke some time after while these tests were disabled.
The issue was that:
1. `torch.add` is present in these models
2. In the common codepath of comparing fp32 to int8, torch.ops.quantized.add was already filtered out because it did not have a dtype specified
3. In the less common codepath of comparing fp32 to fp32, torch.add was eligible for shadowing, but the logic was broken
This PR fixes (3) by disabling shadowing on ops which do not support it, by op type.
The support may be built later, if needed.
Test plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIsModels.test_resnet18
python test/test_quantization.py TestFXNumericSuiteCoreAPIsModels.test_mobilenet_v2
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75472
Approved by: https://github.com/jerryzh168
Summary:
In https://github.com/pytorch/pytorch/pull/61687, a couple of FX Numeric Suite
tests were disabled.
This PR reenables one of these tests. We update the dtype inference logic
of NS to always return a specific type instead of sometimes returning
"fp32 or int8". When the type cannot be deduced by the current logic,
we do not shadow the node.
As a better version of dtype inference becomes available in FX Graph Mode Quantization,
we could migrate this code to use it.
Future PRs in the stack will unbreak other things to enable NS for FX to
work on torchvision again.
Test plan:
```
python test/test_quantization.py -k NumericSuite
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75471
Approved by: https://github.com/jerryzh168
Summary:
Currently in `maybe_make_input_output_share_observers` we trace back from a node to find the activation_post_process
of the input node. We have an internal use case which would error out during this trace-back, so this PR adds a guard
that returns False early when the node doesn't have any inputs.
Test Plan:
not sure when this would happen, verify within the internal test case
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75650
Approved by: https://github.com/vkuzo
Summary:
Previously the lists of qat modules, fused modules, etc. were hardcoded in the convert code; in this PR we get this information
from backend_config_dict instead.
Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
python test/test_quantization.py TestQuantizeFxModels
python test/test_quantization.py TestFXNumericSuiteCoreAPIs
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75520
Approved by: https://github.com/vkuzo
Summary:
Previously we were still relying on the registration mechanism to get the default quantize handlers that are registered;
now that we have moved all registration to backend_config_dict, we can get all quant patterns from backend_config_dict directly.
This PR enables using the native backend_config_dict everywhere in prepare when backend_config_dict is None; we'll
make similar changes in convert as well.
Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
python test/test_quantization.py TestFXNumericSuiteCoreAPIs
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75469
Approved by: https://github.com/vkuzo
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75389
This seems to have been removed before, so we won't mark this PR as bc-breaking; this use case
is now enabled with the backend_config_dict api.
Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D35451960
fbshipit-source-id: 21a8f19c1968af44bf4fa603f16ee8c6f5080e5a
(cherry picked from commit 2862f17b57f846b55736bc6b5d10df4256567adf)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75388
This is now replaced with backend_config_dict, we don't want to expose the implementation detail to
users. We'll have docs for backend_config_dict later
Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D35451958
fbshipit-source-id: 86e482d0782470ea02408836755cfc8531b8f66e
(cherry picked from commit 072541824b454e30df2b48758f465ebd814b436e)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75387
This is now replaced with backend_config_dict, we don't want to expose the implementation detail to
users. We'll have docs for backend_config_dict later
Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D35451955
fbshipit-source-id: 77ede61f1d8f169dc1e1e6d847244ba99a97ab76
(cherry picked from commit 953576259fdc8827437acb6f5d04e584e37a7d64)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75386
This is now replaced with backend_config_dict, we don't want to expose the implementation detail to
users. We'll have docs for backend_config_dict later
Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
Imported from OSS
Reviewed By: ezyang
Differential Revision: D35451957
fbshipit-source-id: 52ebb5fb20cd96c1f21410b07c3d0c448c58cdba
(cherry picked from commit ccb38026f14644f9eb43335b7a7de5568c556394)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75378
Previously we were still relying on the registration mechanism to get the default fusion patterns that are registered;
now that we have moved all registration to backend_config_dict, we can get all fusion patterns from backend_config_dict directly.
This PR enables using the native backend_config_dict everywhere in fusion when backend_config_dict is None;
we'll make similar changes for prepare and convert in the future, to fully enable backend_config_dict in the
quantization code base.
Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
python test/test_quantization.py TestFXNumericSuiteCoreAPIs
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D35451962
fbshipit-source-id: 31d51850c669e061b67d6d9e0efec994f7ea79ed
(cherry picked from commit 60cc2dcadce705a923f9279465e3fb0e8fddad48)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75377
This is in `prepare_custom_config_dict` but we never documented it before, and we didn't find use cases internally,
so it should be OK to remove.
We can now serve the same use case with the `backend_config_dict` api.
Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D35451961
fbshipit-source-id: 8a44c4518eecd50fab7ea2ff06697527b1cdb049
(cherry picked from commit 964183ed26bd8f367a4cf7fcc991eb519dc31a58)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75318
This PR moves the registrations for fusion patterns to backend_config_dict.
It also fixes one issue in the numeric suite graph matcher: now (torch.nn.ReLU, torch.nn.BatchNorm3d)
appears in the quant patterns (previously only in the fusion patterns), and we need to make sure (torch.nn.ReLU, (torch.nn.BatchNorm3d, torch.nn.Conv3d))
is matched before (torch.nn.ReLU, torch.nn.BatchNorm3d). Previously, (torch.nn.ReLU, (torch.nn.BatchNorm3d, torch.nn.Conv3d)) was not
really matched, since `end_node_matches_reversed_fusion` expects a flattened pattern like (torch.nn.ReLU, torch.nn.BatchNorm3d, torch.nn.Conv3d).
For now we manually flatten this pattern (see the sketch below), but in the future we might want to use the matching function `is_match` under torch.ao.quantization.fx.match_utils
to do this matching.
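For illustration, a small hypothetical helper that performs the manual flattening mentioned above:
```python
import torch.nn as nn

def flatten_pattern(pattern):
    # (nn.ReLU, (nn.BatchNorm3d, nn.Conv3d)) -> (nn.ReLU, nn.BatchNorm3d, nn.Conv3d)
    flat = []
    for item in pattern:
        if isinstance(item, tuple):
            flat.extend(flatten_pattern(item))
        else:
            flat.append(item)
    return tuple(flat)

assert flatten_pattern((nn.ReLU, (nn.BatchNorm3d, nn.Conv3d))) == \
    (nn.ReLU, nn.BatchNorm3d, nn.Conv3d)
```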
Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
python test/test_quantization.py TestFXNumericSuiteCoreAPIs
Imported from OSS
Reviewed By: vkuzo, andrewor14
Differential Revision: D35423788
fbshipit-source-id: a54093ccebae9c59aeee9399669ddb2c48bfb9aa
(cherry picked from commit 6a55ea8eb2740cedafb9972888fedf68e927586d)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75401
This commit removes asserts that require prepare_fx to
be run in eval mode and prepare_qat_fx to be run in training mode.
Test Plan:
python test/test_quantization.py TestQuantizeFx.test_prepare_mode
Imported from OSS
Reviewed By: vkuzo, jerryzh168
Differential Revision: D35457100
fbshipit-source-id: 13a55b13d9e389991f69c06c6a70bc51cdebba36
(cherry picked from commit fb0685e0873dc8e807da3213be403b51e8b4a687)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75317
att, similar to previous PRs, this one moves the dynamically quantized rnn ops
to backend_config_dict.
We have some temporary configs in backend_config_dict, but they will be removed soon; we want to migrate
everything to backend_config_dict so that we can enable this path for all the code in the code base, starting
from prepare and then convert. We can start this process after this PR.
Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
python test/test_quantization.py TestFXNumericSuiteCoreAPIs
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D35423789
fbshipit-source-id: 9391bde6f4cbceb45de4ce9aaee136c9bfde8ab7
(cherry picked from commit 909edb9f131e9ba047b49d51a6c300da77988cb3)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75316
att, similar to previous PRs, this one moves the dynamically quantized rnn ops
to backend_config_dict.
Currently the dtype check is not yet enabled, so we provide the dtype_configs, but they are not really used yet;
we will enable the check a bit later, after we have moved everything to backend_config_dict.
Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
python test/test_quantization.py TestFXNumericSuiteCoreAPIs
Imported from OSS
Reviewed By: malfet
Differential Revision: D35423792
fbshipit-source-id: ef862ea1be5bfb4c28130775c3b2158df28d3e22
(cherry picked from commit 0247f3a768a2c165f482a66c4225b3357e33e966)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75460
Add a check on the number of args when checking whether an observer is in the same graph.
Test Plan:
python3 test/test_quantization.py TestQuantizeFxOps
Imported from OSS
Reviewed By: malfet
Differential Revision: D35479504
fbshipit-source-id: d7dc38a27fdf8e0b236b6976d484b0701c61184c
(cherry picked from commit 45542f796f5e6f6259f3ec647dbd2a9fa69ababc)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74847
Similar to the other PRs in this stack, the main problem was
that fusion needed to detect the original module type of a parametrized
module when sparse prepare was called before fusion. In addition, there
was a potential issue with fusion before sparse_prepare but after the
sparse_config is created. However, in practice fusion moves the
references to the original modules into the fused module without issue.
Thus the original sparse_config that pointed to the original modules
gets automatically updated. If the fusion method changes this may cause
an issue since no explicit handling or updating of these pointers was
needed.
Test Plan:
python test/test_ao_sparsity.py TestComposability
Imported from OSS
Reviewed By: vkuzo, andrewor14, jerryzh168
Differential Revision: D35240273
fbshipit-source-id: 62ed66689b285c3fa68f1e149266ab877f1cdd8e
(cherry picked from commit 2adb002c43f702fa1f18637157264fcbc545002a)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75314
This is a refactor to use backend_config_dict for operators with fixed quantization parameters.
The api is not final yet; we'll update it after we have moved everything to backend_config_dict.
Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
python test/test_quantization.py TestFXNumericSuiteCoreAPIs
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D35423790
fbshipit-source-id: a69ce19340e2e3c996f1435b887ba122de85f22f
(cherry picked from commit 5d35983a3bac4281f8636f69ffb68adb358e9a5f)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75258
att, the remaining registrations are for fp16 ops which are no longer used
Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
Imported from OSS
Reviewed By: andrewor14
Differential Revision: D35403588
fbshipit-source-id: fc328d42f4cb80901ed545a11fdde49ee7ff8b2e
(cherry picked from commit fbe2db090cf8d1221dd37d19636058d8dd44c728)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75241
A previous PR enabled operator.add in backend_config_dict; this
PR moves the rest of the binary ops to backend_config_dict.
There are some ops left which are not needed (previously fp16 ops); we
will move them in a following PR.
Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
python test/test_quantization.py TestFXNumericSuiteCoreAPIs
Imported from OSS
Reviewed By: bdhirsh
Differential Revision: D35403589
fbshipit-source-id: 663703b310944a6b7c5ade6d07a4d938a6ca082b
(cherry picked from commit 5a76ce031872c4fed5fcab5bb3c84a9394b01118)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75202
Instead of checking the type, we use a method in the QuantizeHandler to check whether a module
is a standalone or custom module. Not user facing.
Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
Imported from OSS
Reviewed By: bdhirsh
Differential Revision: D35379641
fbshipit-source-id: c2f970c7e27f74793fa67f8fd5a16a43525e35aa
(cherry picked from commit 251500f06359c9046dd9067543cc80be24ddee33)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75135
Some operators have fixed quantization parameters, this PR adds the support to override the
qconfig in the backend_config_dict
Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D35334279
fbshipit-source-id: 390510bd8fc2d61004c36c54390989583e6519ce
(cherry picked from commit ccf9bcd7eb4564ec97c5e0548b8ee926f640360b)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74882
This PR adds support for ops like add/mul in backend_config_dict. These ops have a different
observation_type based on the number of tensor inputs: when the number of tensor inputs is 1,
we share the output observer with the input; otherwise we use a new observer.
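A hedged sketch of that rule (the import path and enum member names are assumptions based on the backend_config_dict snippet shown earlier in this document):
```python
from torch.ao.quantization.backend_config import ObservationType  # path assumed

def observation_type_for_binary_op(num_tensor_args: int):
    if num_tensor_args == 1:
        # e.g. torch.add(x, 2.0): share the output observer with the input
        return ObservationType.OUTPUT_SHARE_OBSERVER_WITH_INPUT
    # e.g. torch.add(x, y): insert a new observer for the output
    return ObservationType.OUTPUT_USE_DIFFERENT_OBSERVER_AS_INPUT
```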
Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
Imported from OSS
Reviewed By: vkuzo, andrewor14
Differential Revision: D35236032
fbshipit-source-id: 7077f3ccee8a5d8d19b40107cf8ff16cceafc535
(cherry picked from commit a6f7a37f99fc727269d022d35cc5c0157b70c656)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74846
This PR primarily allows the PTQ convert function to work with
parametrized modules. Given that the parametrized weight is what is used
by default in convert, as long as sparsifier.step() has already been
called, the converted model will use the sparsified weights. There is
currently no way to handle things if sparsifier.step() has not been
called. Lastly, we added the is_leaf_or_only_parametrized function because
parametrized modules no longer look like leaves due to the
parametrizations module attached to them.
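A rough sketch (not the actual implementation) of the idea behind such a check:
```python
import torch.nn as nn

def is_leaf_or_only_parametrized(module: nn.Module) -> bool:
    # A parametrized module gains a child named "parametrizations", so it no
    # longer looks like a leaf; treat it as one anyway if that is its only child.
    child_names = {name for name, _ in module.named_children()}
    return len(child_names) == 0 or child_names == {"parametrizations"}
```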
Test Plan:
python test/test_ao_sparsity.py TestComposability
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D35240275
fbshipit-source-id: 48529f2a83edfe6d8a2d2dff8ca3d08a3fb0d553
(cherry picked from commit 9d6361482e2885db964e02b0222cd23c9f4d469e)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74878
Previously we recorded the matched nodes as a list of nodes: `List[Node]`. This does not generalize
to a graph, which is needed for future use cases, so in this PR we change the recorded node to a
NodePattern instead, currently defined as
```
NodePattern = Union[Tuple[Node, Node], Tuple[Node, Tuple[Node, Node]], Any]
```
but can be more general.
This will allow us to support more general patterns with backend_config_dict api, and is also needed
for BinaryOpQuantizeHandler refactor
Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D35203616
fbshipit-source-id: f4bf5b056cfc0955455eea9c2bf1ac9f6dde3974
(cherry picked from commit b290c047e1861bbb62fb1bb576761e801b210220)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74845
This PR adds support for the quantization flow to detect
parametrized modules and match them using their original module types.
This mainly involved using the new type_before_parametrizations function rather than
type to check for module matching (see the sketch below).
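A minimal sketch of the difference, assuming type_before_parametrizations is exposed from torch.nn.utils.parametrize (the no-op parametrization is purely illustrative):
```python
import torch.nn as nn
from torch.nn.utils import parametrize

class NoopParametrization(nn.Module):
    def forward(self, w):
        return w

linear = nn.Linear(8, 8)
parametrize.register_parametrization(linear, "weight", NoopParametrization())

# type(linear) now reports a dynamically generated "Parametrized..." class,
# so module matching should rely on the original type instead:
assert parametrize.type_before_parametrizations(linear) is nn.Linear
```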
Test Plan:
python test/test_ao_sparsity.py TestComposability
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D35240274
fbshipit-source-id: 7294d89c9c2e069e51d8b9bafa45c15f92bed124
(cherry picked from commit ed5cdb7b636c42e040d1b4a67b6b94604d06e1ff)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75146
Previously we assumed `to` must be called with positional args, but this may not be the case;
e.g. we can do `to(dtype=?)` or `to(memory_format=?)`.
Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
Imported from OSS
Reviewed By: ejguan
Differential Revision: D35342088
fbshipit-source-id: 22bfe78ae84e74141ae6560285c5c38bc068c999
(cherry picked from commit a3593c0bb658a4615559c951ee68c9a6f55074d5)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74636
This commit changes how quantization patterns for linear
and conv are set up in prepare. Previously, these were set up
through ConvReluQuantizeHandler and LinearReLUQuantizeHandler.
After this commit, however, these were set up through the
corresponding entries in the native backend_config_dict,
rendering the above quantize handlers no longer necessary.
In future commits, we will do the same for the remaining ops.
Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
Imported from OSS
Reviewed By: jerryzh168, ngimel
Differential Revision: D35225680
fbshipit-source-id: 4a79f63a11fce46701eb17aaf3619c1e827d72a4
(cherry picked from commit 475f599821cd32d3ba71ba086885ecdc4cbee755)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74843
is_output_quantized is used to check whether we should quantize the op based on the dtype configuration in qconfig and what
is supported by the backend; we skip inserting an observer if the dtype configuration is not supported by the backend.
This is now supported by backend_config_dict, so we can remove this function.
Also, we previously supported fp16 static quantization for some ops for one of our internal use cases, and now it is not required, so
we can remove those entries as well.
Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
Imported from OSS
Reviewed By: andrewor14
Differential Revision: D35190541
fbshipit-source-id: 623d961810737ec01e1f8b269ec48a6a99bb284a
(cherry picked from commit a405998c60c0146dbd5feef60e2d5cb3b0aa289c)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75047
As title. For instance, we match the two patterns
```
(add, (bn, conv), matchallnode)
(add, matchallnode, (bn, conv))
```
Against the model
```
conv1 -> bn1 |
conv2 -> bn2 + add
```
For the add node, both patterns pass `is_match` and `apply_match` is executed twice. As a result, both `conv1 -> bn1` and `conv2 -> bn2` will be matched as `(bn, conv)` instead of one as `(bn, conv)` and one as `matchallnode`.
To fix this, we stop trying the other patterns once a pattern is matched.
Test Plan: verified in D35252100
Reviewed By: jerryzh168
Differential Revision: D35300191
fbshipit-source-id: 383b2eb971d436072e1c28597c5b6a01d0f49c5a
(cherry picked from commit 89d08ea2d2840e01ec3dd40da3f58405577c78fc)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74717
Currently the weights map to 0 and max_float to 65535 due to an incorrect qmin/qmax in the qint16 customized qrange;
the expectation from the set observers is that the integer representation is a signed int16, i.e. -32768 to 32767.
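For reference, the ranges involved (a small check grounded in the summary above):
```python
# Signed int16 (expected):   qmin, qmax = -2**15, 2**15 - 1  -> (-32768, 32767)
# Unsigned 16-bit (the bug): qmin, qmax = 0, 2**16 - 1       -> (0, 65535)
signed = (-(2 ** 15), 2 ** 15 - 1)
assert signed == (-32768, 32767)
```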
Test Plan: Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D35129924
fbshipit-source-id: 924902dd7e64c1218971422ba2451c2a484fd2f4
(cherry picked from commit 95659cdeeec7b3a01a64355244847e211c6dd2a6)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74149
mobilenet v2/v3 failed when using the NS tool to analyze the model
due to an empty tensor; fixed by filtering out the empty tensors.
Test Plan: Imported from OSS
Reviewed By: vkuzo
Differential Revision: D34851886
fbshipit-source-id: db94fd5cef7d4a7a128d46bfe3f5ff4e532845fe
(cherry picked from commit 4616a75105abf187a178d95165249cd33345515d)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74969
We can remove the check for fp16 ops now since we confirmed that fp16 ops are not
used
Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
Imported from OSS
Reviewed By: andrewor14
Differential Revision: D35258695
fbshipit-source-id: 2297696493feb62a4c959e7fbdd6123f59615ef1
(cherry picked from commit a1b4658e661ce610e264e083dfa738c31859ec1a)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74277
see issue: https://github.com/pytorch/pytorch/issues/74240
this fixes that issue by skipping the children of untraceable modules during
propagate_qconfig. This required extending said function to take the
prepare_custom_config_dict as an optional argument.
Test Plan:
python test/test_quantization.py
python test/test_quantization.py TestQuantizeFx.test_qat_skip_untraced
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D34916074
fbshipit-source-id: 11caba2cbf78566fb51adf698b01bbba0275de28
(cherry picked from commit 5324c48e4c3277bb12a716a4408151c86006ee47)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74776
When both inputs are scalars, fx tracing will directly calculate the result instead of generating an op in the fx graph,
so num_tensor_args will always be greater than 0 for the binary ops that remain in the graph, and input_output_observed will always return True
for BinaryOpQuantizeHandler.
We will remove the input_output_observed method after dynamic quantization in qconfig is properly supported.
Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
Imported from OSS
Reviewed By: albanD
Differential Revision: D35153531
fbshipit-source-id: fa777429eeb64a6a78a98f8d8dcd9e0903c8b209
(cherry picked from commit 676becb650daf29977dbfeb8307de1b19a8d9243)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74775
We have simplified the way we insert observers, for add_scalar it now behaves the same way
as general_tensor_value ops, which means we only need to keep is_general_tensor_value_op now,
the other methods can be removed
Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D35153532
fbshipit-source-id: 2d17189e167a9932bdbf5ae46b3ced25b7128c2f
(cherry picked from commit 7cf7c8a522171f58954b227917e5c75cdfdddb1c)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74619
This commit is part 2 of the effort to refactor the
lowering code in _lower_to_native_backend.py. The main change
included in this commit is generalizing the pattern matching
code across different lowering functions. There should be no
change in behavior with this PR.
A future commit will further merge the static and dynamic
lowering code paths.
Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D35082210
fbshipit-source-id: 7f0347c9449cc9ca68fee5a807c792222f0d1749
(cherry picked from commit 16d34c13c7eb0553680713878b52ece9c8884a1f)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74601
Currently the behavior for general tensor shape ops and general tensor value ops is the same, so we can remove
this flag and merge it with the is_general_tensor_value_op flag.
The is_general_tensor_value_op flag is used in two places in prepare:
(1) dtype propagation: we only do dtype propagation when this flag is true (this will be refactored in the future to be more systematic)
(2) observer sharing: we'll use the input observer instance as the output observer for an op if this flag is True
Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
Imported from OSS
Reviewed By: george-qi
Differential Revision: D35071438
fbshipit-source-id: 5e8f5fd84e37db0433a63fe0a0e212ce3c5908d6
(cherry picked from commit b4bbc9fa0e65f3768eb97ca8e84b7cbd7e840b67)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74600
Following https://github.com/pytorch/pytorch/pull/74210, this PR adds the support for some ops
using the DefaultNodeQuantizeHandler in the backend_config_dict defintion for pytorch native backend
TODO: There is still a few ops we didn't handle with backend_config_dict path: gelu and softmax, need to discuss if we still need them, if so we can change the test
to use backend_config_dict and remove the DefaultNodeQuantizeHandler after that
Test Plan:
python test/test_quantization.py TestQuantizeFxOps
Imported from OSS
Reviewed By: andrewor14
Differential Revision: D35071437
fbshipit-source-id: 70351d2810ca1ac7dc09d4a9c239f6757ccb51ca
(cherry picked from commit 5e68f755a32ba7d90d6c73db9c2017f9c58d7fa5)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74581
As title, currently the quant_min/quant_max of the FakeQuantize are not populated to the observer. We plan to populate them when they are both not None (see the sketch after this list).
To do this we need to:
1. Remove the current default quant_min/quant_max values (0/255), as they are not universal across dtypes.
2. Move the upper bound/lower bound check before creating the observer.
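A hedged sketch of the intended behavior (the class names come from torch.ao.quantization; the exact propagation mechanics are as described above):
```python
import torch
from torch.ao.quantization import FakeQuantize, MovingAverageMinMaxObserver

# When quant_min/quant_max are both provided, they should end up on the
# attached observer instead of being silently replaced by a 0/255 default.
fq = FakeQuantize(observer=MovingAverageMinMaxObserver,
                  quant_min=0, quant_max=127, dtype=torch.quint8)
assert fq.activation_post_process.quant_min == 0
assert fq.activation_post_process.quant_max == 127
```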
Test Plan:
```
[jiaxuzhu@devvm3400.frc0 /data/users/jiaxuzhu/fbsource/fbcode] buck test mode/dev //caffe2/test:quantization -- --exact 'caffe2/test:quantization - test_quant_min_max_override (quantization.core.test_workflow_module.TestFakeQuantize)'
Parsing buck files: finished in 0.8 sec
Downloaded 0/2 artifacts, 0.00 bytes, 100.0% cache miss (for updated rules)
Building: finished in 9.5 sec (100%) 18535/84579 jobs, 2/84579 updated
Total time: 10.3 sec
More details at https://www.internalfb.com/intern/buck/build/1cab97ef-0788-4d06-92ed-a828995e3bde
BUILD SUCCEEDED
Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details.
Running with tpx session id: 24be645e-eebc-45d6-8111-052ef1225fa0
Trace available for this run at /tmp/tpx-20220323-094106.724238-24be645e-eebc-45d6-8111-052ef1225fa0/trace.log
RemoteExecution session id: reSessionID-24be645e-eebc-45d6-8111-052ef1225fa0-tpx
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/5066549674998735
✓ ListingSuccess: caffe2/test:quantization : 483 tests discovered (20.179)
✓ Pass: caffe2/test:quantization - test_quant_min_max_override (quantization.core.test_workflow_module.TestFakeQuantize) (18.896)
Summary
Pass: 1
ListingSuccess: 1
If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/5066549674998735
```
Reviewed By: jerryzh168
Differential Revision: D34971236
fbshipit-source-id: 4407fd03116a296053256b333f7ce6d28dcc9c42
(cherry picked from commit f6980bccea802f220cc5b6dfe1bf3a3a3eef0a34)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74507
* This is the default symmetric qat qconfigs for qnnpack.
* Support for symmetric quantization is not available from other backends.
* Observers are similar to symmetric PTQ qconfigs for qnnpack.
Reviewed By: jerryzh168
Differential Revision: D34804808
fbshipit-source-id: 22c11b89242a98f54029ac195f7b984e42809164
(cherry picked from commit ea751ded1174ba2c2f061bafc81573faaf248a9a)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74510
Previously we required the dequantize before a custom module to have exactly one user. This is because we were removing the dequantize node
before the custom module while transforming an observed custom module into a quantized custom module. Actually we don't need to remove it;
we can just change the input of the custom module to the quantize node instead. If the dequantize node only has one user, it will be removed
by the dead code elimination pass that was added recently.
Test Plan:
python test/test_quantization.py TestQuantizeFx.test_custom_module_class_input_has_multiple_users
Imported from OSS
Reviewed By: dzdang
Differential Revision: D35034626
fbshipit-source-id: eea9fbf9fb34c61f114c6431377be347632ce36d
(cherry picked from commit 2878085a56bc529afef5e533bc5f49079d4adc52)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74408
Removed
* should_mark_output_quantized_from_input_quantized_status
* _maybe_get_last_node_only_observer
since they were only used by the previous convert code and are no longer needed
Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D34984301
fbshipit-source-id: 0c46126576bd4ef633f4de530d01364e68f7ed39
(cherry picked from commit d14d094c4de308f08181920cd0611ea1bc664605)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74407
The convert method of QuantizeHandler is no longer used after the convert refactor, so this PR removes it.
Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D34983830
fbshipit-source-id: cf9a6a19bd0ae035ba33497eecf74e98658dd5c7
(cherry picked from commit d85eb0f77513ef5f5f10543df6dec8b65b4985a3)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74364
if an input is used multiple times by modules that are dynamically quantized:
```
x -- linear1
\-- linear2
```
we'll insert quantize_per_tensor_dynamic and dequantize for the input, and we have a pass
that duplicates dequantize ops for pattern matching:
```
x - quantize_per_tensor_dynamic - dequantize1 - linear1
\----- dequantize2 - linear2
```
But we also have a check in the lowering code that skips the pattern if quantize_per_tensor_dynamic is used by multiple nodes,
so the pattern is not recognized; we need to duplicate quantize_per_tensor_dynamic as well in this case
to recover both patterns:
```
x - quantize_per_tensor_dynamic1 -- dequantize1 -- linear1
\- quantize_per-tensor_dynamic2 -- dequantize2 -- linear2
```
so that they can be fused into dynamic linear:
```
x - linear_dynamic1
\-- linear_dynamic2
```
Test Plan:
python test/test_quantization.py TestQuantizeFx.test_dynamic_linear_input_multiple_use
Imported from OSS
Reviewed By: yixin94
Differential Revision: D34952755
fbshipit-source-id: a950159fd6a661e84faf0baf1692f6783904cfb3
(cherry picked from commit 8a6896801fdd96a55476faca4ccb7ba0b0bdb058)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74231
Add a check to make sure the weighted modules we swap are actually float fused modules,
since a reference fused module, like the reference version of linear - relu, would have the same
fused type as the floating-point linear - relu (while the linear submodule has a different type).
Test Plan: phabricator diff for now, can add a test case after we know exactly what the problem is
Reviewed By: andrewor14
Differential Revision: D34888290
fbshipit-source-id: a7f53368a7c17f7d1a82afaa50d14d569b4923df
(cherry picked from commit 458dac9fdf8b4f0d786bf9c815c2f2fe8df13bb4)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74396
# New qconfig `default_symmetric_qnnpack_qconfig`
Returns a qconfig with signed activation and symmetric weights with range restrictions. Also adds per_channel variant for the same.
## Restrictions on weights
Restrictions on weights include,
1. weight zero point is force zero. and
2. weight 8-bit signed quantized value are limited to [-127, +127] excluding the value +128.
This is driven, in part, by the desire to achieve better performance by XNNPACK ops.
## qengine/backend = `qnnpack` and XNNPACK ops
Qconfig returned by this function allows us to use faster XNNPACK quantized ops for CPUs w/ said restrictions. Although we are using XNNPACK ops the qengine is still `qnnpack`, and there are no plans to introduce a new qengine for XNNPACK ops. Support to use XNNPACK ops with asymmetric (returned by get_default_qconfig()) qconfig is WIP.
## Updated EPS value:
* From PyTorch:
eps:
```
>>> import torch
>>> torch.finfo(torch.float32).eps
1.1920928955078125e-07
>>> torch.finfo(torch.float32).eps.hex()
'0x1.0000000000000p-23'
```
All scale values are float32 and `scale = max(scale, eps)`
* Requirement from XNNPACK
For both fp32 as well as rndnu requantization schema, `0x1p-32 <= requantization_scale < 256.0`
Where, requantization_scale = (input_scale * kernel_scale) / (output_scale)
* New minimum allowed scale value
With current float32 eps (=0x1p-23) as minimum, xnnpack lower bound is the problem. We haven’t observed upper bound issues so far with assuming the max scale value of 256. So focusing on the lower bound, to cover all possible cases of requantization value, conservatively, we must have the minimum possible requantization scale value such that,
```
minimum_requantization_value = xnnpack_lower_threshold
input_scale * kernel_scale / output_scale = 0x1p-32
min_scale_value * min_scale_value / max_scale_value = 0x1p-32
min_scale_value * new_eps / 256 = 0x1p-32
min_scale_value**2 = 0x1p-24
min_scale_value = 0x1p-12
```
With `scale_value >= 0x1p-12`, we should be able to avoid the lower threshold on requantization scale by xnnpack kernels.
Obviously this is very unlikely to happen. So practically, we should be able to get away with a much smaller value than `0x1p-12` as EPS, but it is not easy to choose a smaller value empirically.
* Impact on accuracy is unclear as of writing this.
Reviewed By: kimishpatel
Differential Revision: D34625300
fbshipit-source-id: 005e6757ed1185b3940b58ac55246cba8b267828
(cherry picked from commit 61ed1a2a308a1792ccbfc316153a6dc39798f02a)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74198
As title: currently in the (add, X, MatchAllNode) pattern, the node matched with MatchAllNode is regarded as part of the pattern instead of as an input. As a result, possible patterns ending with that node will not be matched.
For instance, we have two patterns
1. (nn.ReLU, (torch.add, MatchAllNode, (nn.BatchNorm2d, nn.Conv2d)))
2. (nn.ReLU, (nn.BatchNorm2d, nn.Conv2d))
And we want to fuse the following model
Conv2d -> BatchNorm2d -> ReLU +
Conv2d -> BatchNorm2d ------ Add -> ReLU
The pattern in the first row cannot be matched because the end node ReLU is recorded as a MatchAllNode already.
Test Plan:
new unit test
```
[jiaxuzhu@devvm3400.frc0 /data/users/jiaxuzhu/fbsource/fbcode] buck test mode/dev //caffe2/test:quantization_fx -- --exact 'caffe2/test:quantization_fx - test_fusion_pattern_with_matchallnode (quantization.fx.test_quantize_fx.TestFuseFx)'
Parsing buck files: finished in 0.9 sec
Downloaded 0/2 artifacts, 0.00 bytes, 100.0% cache miss (for updated rules)
Building: finished in 12.6 sec (100%) 18546/84011 jobs, 2/84011 updated
Total time: 13.5 sec
More details at https://www.internalfb.com/intern/buck/build/9d2decdb-d01e-4332-84f5-1728a65d4f7b
BUILD SUCCEEDED
Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details.
Running with tpx session id: d92e10b8-9209-4e9e-95a6-2fcac02db251
Trace available for this run at /tmp/tpx-20220314-161230.347672-d92e10b8-9209-4e9e-95a6-2fcac02db251/trace.log
RemoteExecution session id: reSessionID-d92e10b8-9209-4e9e-95a6-2fcac02db251-tpx
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/3377699814955263
✓ ListingSuccess: caffe2/test:quantization_fx : 365 tests discovered (19.275)
✓ Pass: caffe2/test:quantization_fx - test_fusion_pattern_with_matchallnode (quantization.fx.test_quantize_fx.TestFuseFx) (17.760)
Summary
Pass: 1
ListingSuccess: 1
If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/3377699814955263
```
Reviewed By: jerryzh168
Differential Revision: D34873730
fbshipit-source-id: dc78455c7233ba33e9ab215f50754b1656b7dbc7
(cherry picked from commit 1cc74cadd7dc725be97064f57c910ef9d1bbe1a8)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74276
Removing convert.py since we have rerouted the traffic to _convert_do_not_use, we'll do a rename in the follow up PR
Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D34914261
fbshipit-source-id: 09ad520d95fa91c525222a69474930efb3571088
(cherry picked from commit 8aeb33206f3572132356fe78395aa3ce6aff11cd)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74128
**Summary:** This commit is the first step towards refactoring the
lowering code in _lower_to_native_backend.py. The main changes
included in this commit are:
(1) Remove the use of the subgraph rewriter in lowering
(2) Replace the use of `is_match` with manual pattern matching
The motivation behind (2) is it simplifies the lowering code
significantly; previously we had many different but similar
patterns for slightly different models. There should be no
change in behavior with this PR.
Note that this is only part 1 of the refactoring. Part 2
will merge the static and dynamic lowering code paths
and refactor the currently duplicate pattern matching /
cleanup code into common helper functions.
**Test Plan:**
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
**Reviewers:** jerryzh168
**Subscribers:** jerryzh168
Test Plan: Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D34910597
Pulled By: andrewor14
fbshipit-source-id: c6fea0c538ce5efc5afaf53e072922528988dda7
(cherry picked from commit fa05cb9fc0909fe6e199a6b50ea2001c9e9ac0ee)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73274
As noticed in https://discuss.pytorch.org/t/calibration-of-model-in-post-training-static-quantization-using-fx-api/143661/6
and related to https://github.com/pytorch/pytorch/issues/72698, when using fx quantization, if an op like view was used in a
model and the index parameters were passed to the op via a
variable rather than
hard coded, fx would mistakenly insert observers for them, leading to an
error when the observer tried to do tensor-only operations on a
non-tensor. To fix this, an API was added to specify non-tensor
arguments for various ops to enable better dtype propagation.
NON_TENSOR_ARG_DICT is a nested dict whose first key is a named tuple
which contains matching parameters for ops with nontensor args, the
inner dict's keys are dtypes and the values are a list of those arg indices that
take use such dtypes. Alternatively, instead of a list, the inner dict
value can also be a function that takes the node as an argument and
returns the list of arg indices.
Theoretically this api can support arbitrary functions, but the current
implementation is limited to simpler functions, given that the particular
issue this fixes seems to be rare.
Note: although torch.unsqueeze and torch.transpose are listed in
quantization_patterns.py, those ops appear to be untraceable by fx. I've
included tests for their cases but fixing this issue is beyond the scope
of this PR
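To make the shape of that API concrete, here is an illustrative (not verbatim) sketch of what entries in such a dict can look like; the actual keys and named-tuple fields live in quantization_patterns.py and may differ:
```python
from collections import namedtuple

import torch

# Named tuple used as the outer key to match ops that take non-tensor args.
NonTensorArgPattern = namedtuple("NonTensorArgPattern", ["op", "target"])

NON_TENSOR_ARG_DICT = {
    # x.view(a, b, ...): every positional arg after the tensor is an int size,
    # so those args must never receive observers.
    NonTensorArgPattern(op="call_method", target="view"): {
        torch.int: lambda node: list(range(1, len(node.args))),
    },
    # x.transpose(dim0, dim1): args 1 and 2 are int dims.
    NonTensorArgPattern(op="call_method", target="transpose"): {
        torch.int: [1, 2],
    },
}
```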
Test Plan:
python test/test_quantization.py test_non_reference_size
...
python test/test_quantization.py test_non_reference_<op>
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D34410122
fbshipit-source-id: fc09949ca8a2d6473876a4b6c214eb91e9a9dae2
(cherry picked from commit 3a1375d677b7c98d62b1f5c839645698c39b32b9)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74210
This PR added a codepath for getting patterns (quantize handlers) for the backend_config_dict for native backend when
backend_config_dict is None. This would allow us to incrementally define the backend_config_dict for
pytorch native backend and gradually remove the entries in quantization_patterns.py
Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
Imported from OSS
Reviewed By: dzdang
Differential Revision: D34899783
fbshipit-source-id: 7f31292948d7fc4566e51e175b41511f52d0a880
(cherry picked from commit a9f6ebd6478f362d5bb9c5ae04e02369e00f550c)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74229
Previously we did not successfully remove the dequantize node for `dict`; this PR fixes that. It is tested with
meta-only tests right now, but we should follow up with oss tests (with dict output).
Since we call the dead code elimination pass, some of the inplace operators are removed in TestQuantizeFx.test_fixed_qparams_ops,
so in this PR we also removed the calls to the inplace ops and changed the expected results in the test case.
In a future PR we can remove the support for inplace operators, since it is not really supported in fx, and it's OK
for us to skip them as well.
Test Plan: Imported from OSS
Reviewed By: vkuzo
Differential Revision: D34888140
fbshipit-source-id: 48cea842b49e52baa8eee3ce0f4bfb4a3625ab2a
(cherry picked from commit ef790315ebcf954930deb6b9d1c384992c1f1ec8)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74140
Previously we did not successfully remove the dequantize node for `dict`; this PR fixes that. It is tested with
meta-only tests right now, but we should follow up with OSS tests (with dict output).
Reviewed By: andrewor14
Differential Revision: D34846005
fbshipit-source-id: 4313ed6adff425d73ad19aabedde1200a98f1915
(cherry picked from commit 682abe9ecbd42c4ac1b41891bbc3b79ad522b78a)
Summary:
This PR adds a new quantization backend, ONEDNN, with quantized conv and linear kernels in the same code path as the FBGEMM backend.
The ONEDNN backend is an alternative to the FBGEMM and QNNPACK backends. It takes advantage of features of the latest Intel® CPU products: it supports VNNI on Cascade Lake as well as the AMX instruction set, to be available on Sapphire Rapids, which offers 8X the int8 peak TOPS of VNNI.
ONEDNN demonstrates better performance than FBGEMM on the conv kernels of popular CNN models. It also supports more fused ops, such as convolution-add-ReLU, than FBGEMM and QNNPACK.
To use this backend, users only need to set the quantization engine to 'onednn' before any calculation; no changes to models are required.
```python
torch.backends.quantized.engine = 'onednn'
```
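For context, a minimal eager-mode post-training static quantization flow with the ONEDNN engine might look like the sketch below (not part of this PR; `model_fp32` and `calib_loader` are assumed to exist, the model is assumed to contain QuantStub/DeQuantStub where needed, and an 'onednn' default qconfig may only be available in newer releases):
```python
import torch
from torch.ao.quantization import get_default_qconfig, prepare, convert

torch.backends.quantized.engine = 'onednn'           # select the ONEDNN kernels
model_fp32.eval()
model_fp32.qconfig = get_default_qconfig('onednn')   # may require a recent release
prepared = prepare(model_fp32)                       # insert observers
for data, _ in calib_loader:                         # calibrate on sample data
    prepared(data)
model_int8 = convert(prepared)                       # swap to quantized modules
```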
## Design docs
- https://github.com/pytorch/pytorch/issues/21120#issuecomment-562371983
- https://github.com/pytorch/pytorch/pull/67177#issuecomment-963787096
## File changes
**Add ONEDNN to qengine list**
- aten/src/ATen/Context.cpp
- c10/core/QEngine.h
- torch/ao/quantization/qconfig.py
- torch/backends/quantized/\_\_init\_\_.py
**Implement qconv & qlinear for ONEDNN backend**
- aten/src/ATen/native/quantized/cpu/conv_serialization.h
- aten/src/ATen/native/quantized/cpu/fbgemm_utils.cpp
- aten/src/ATen/native/quantized/cpu/onednn_utils.h
- aten/src/ATen/native/quantized/cpu/qconv.cpp
- aten/src/ATen/native/quantized/cpu/qconv_dynamic.cpp
- aten/src/ATen/native/quantized/cpu/qconv_prepack.cpp
- aten/src/ATen/native/quantized/cpu/qconv_unpack.cpp
- aten/src/ATen/native/quantized/cpu/qlinear.cpp
- aten/src/ATen/native/quantized/cpu/qlinear_dynamic.cpp
- aten/src/ATen/native/quantized/cpu/qlinear_prepack.cpp
- aten/src/ATen/native/quantized/cpu/qlinear_unpack.cpp
**Skip tests that are not supported by ONEDNN**
- test/ao/sparsity/test_kernels.py
- test/quantization/core/test_quantized_module.py
- test/quantization/core/test_quantized_op.py
## Validation results
This PR has passed `test_quantization.py` and `test_mkldnn.py`.
Below are performance data of int8 2d convolution and linear on the Cascade Lake Xeon® platform:
(Note: Tested with single instance on single core. Using the latest oneDNN library.)
**Table 1. Performance comparison of int8 2d convolution operator**
|No.| Shape| FBGEMM| ONEDNN| Gain|
|-|-|-|-|-|
|1| IC=128, OC=128, kernel=3, stride=1, N=4, H=32, W=32, G=1, pad=0| 668.310us| 535.630us| 24.8%|
|2| IC=128, OC=128, kernel=3, stride=2, N=4, H=32, W=32, G=1, pad=0| 290.630us| 281.810us| 3.1%|
|3| IC=128, OC=256, kernel=3, stride=1, N=4, H=32, W=32, G=1, pad=0| 1.045ms| 893.010us| 17.0%|
|4| IC=128, OC=256, kernel=3, stride=2, N=4, H=32, W=32, G=1, pad=0| 385.320us| 373.720us| 3.1%|
|5| IC=256, OC=256, kernel=3, stride=1, N=4, H=32, W=32, G=1, pad=0| 1.876ms| 1.641ms| 14.3%|
|6| IC=256, OC=256, kernel=3, stride=2, N=4, H=32, W=32, G=1, pad=0| 660.460us| 638.470us| 3.4%|
**Table 2. Performance comparison of int8 linear operator**
|No.| Shape (m, n, k)| FBGEMM| ONEDNN| Gap|
|-|-|-|-|-|
|1| 64, 800, 320| 80.550us| 96.770us| 20.10%|
|2| 64, 768, 512| 101.230us| 130.720us| 29.10%|
|3| 16, 256, 512| 30.230us| 51.450us| 70.20%|
|4| 128, 128, 128| 33.810us| 50.480us| 49.30%|
|5| 256, 512, 256| 154.490us| 195.050us| 26.30%|
|6| 1024, 1024, 1024| 3.134ms| 3.514ms| 12.10%|
ONEDNN showed advantages over FBGEMM for convolution. However, it has a performance gap relative to FBGEMM for linear ops. The gap is a known issue, and further optimization is in progress in the oneDNN library. On the latest platforms, ONEDNN achieves better performance for both conv and linear.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69820
Reviewed By: HDCharles
Differential Revision: D33716039
Pulled By: jerryzh168
fbshipit-source-id: 6f7bb807e85798142dfcffccfca8b8bd652fb3dd
(cherry picked from commit 91526b373560f42ba0ad307f9cccfc0eb5218b1f)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73863
This PR fully aligns the convert function with the design: https://github.com/pytorch/rfcs/blob/master/RFC-0019-Extending-PyTorch-Quantization-to-Custom-Backends.md
and simplifies the implementation of the convert function by always producing a reference quantized model (with reference patterns) first,
and then lowering the model to a quantized model that is runnable with the PyTorch native backend (fbgemm/qnnpack).
This PR makes convert.py much easier to understand than the previous implementation, and we are able to remove the majority of the code
in quantization_patterns.py as well (in follow-up PRs).
Test Plan:
```
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
python test/test_quantization.py TestFXNumericSuiteCoreAPIs
python test/test_quantization.py TestFXNumericSuiteCoreAPIsModels
```
and other internal/oss regression tests
Imported from OSS
Reviewed By: andrewor14
Differential Revision: D34778506
fbshipit-source-id: 0678b66addf736039a8749b352f6f569caca962b
(cherry picked from commit 33ec9caf23f3ab373d827117efbd9db0668b2437)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73947
The original implementation of memoryless observers used MinMaxObservers and
a `memoryless` argument to alter the observer's behavior so that it wouldn't
keep track of previously observed mins and maxes. It was later pointed
out that this is equivalent to a MovingAverage observer with averaging_constant=1,
which requires less overhead and no one-off args (memoryless). This PR therefore refactors
away the memoryless arg and uses MovingAverage observers instead. Although the memoryless
adjective is still used, a complete definition was also added to clarify error
messages given these changes.
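For reference, a small hedged sketch (not from this PR) of the equivalence described above: with averaging_constant=1 the moving-average observer keeps only the statistics of the latest batch.
```python
import torch
from torch.ao.quantization.observer import MovingAverageMinMaxObserver

obs = MovingAverageMinMaxObserver(averaging_constant=1.0)
obs(torch.randn(100) * 10)   # a wide-range batch
obs(torch.randn(100))        # with averaging_constant=1, stats now reflect only this batch
print(obs.min_val, obs.max_val)
```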
Test Plan:
python test/test_quantization.py TestQuantizeEagerQAT
python test/test_quantization.py TestObserver
Test Plan: Imported from OSS
Reviewed By: andrewor14
Differential Revision: D34732080
Pulled By: HDCharles
fbshipit-source-id: 227a1ab29d18adae55093a684ea35ac34523d07a
(cherry picked from commit 5238e70e8f90f3219c36f9c64b647951dcf64b5a)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73509
This adds functionality to lower reference models
involving the Linear-Bn1d pattern in FX QAT mode. This follows
https://github.com/pytorch/pytorch/pull/72431 and https://github.com/pytorch/pytorch/pull/72796, which add Linear-Bn1d fusion functionality
to eager QAT mode.
Test Plan:
python test/test_quantization.py TestQuantizeFxOps.test_linear_module
Imported from OSS
Reviewed By: dagitses
Differential Revision: D34591251
fbshipit-source-id: 39144485f9954ee1830c8b414e724560fd7e47bf
(cherry picked from commit b97a39b4d9df00e045fab4c01eca88e562ca2c02)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73785
The conversion from fp32 to fp16 is easily defined; we just did not
have it in the NS code yet. This PR adds it. This is needed for some customer models.
Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_fp16_shadows_fp32
```
Reviewed By: jerryzh168
Differential Revision: D34642873
Pulled By: vkuzo
fbshipit-source-id: 9df505b1ea3f3d3cdb3a5f2409ef3a66f40b7eff
(cherry picked from commit 679cd8a5e24b1cfd7f871dcba3ce8a90de980556)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73777
As titled, this is to prepare for the migration of the current convert to this function.
Test Plan:
Regression tests to make sure the refactor doesn't break anything.
Internal only, since the TensorRT tests were moved to a separate repo.
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D34636000
fbshipit-source-id: 9850904e3b834345abbeedc8bccaf107397db59d
(cherry picked from commit a8c87d4592237c247989e7419bb165c96b8e90db)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73493
This PR enables basic support for reference modules in DBR quant.
For now, the support is limited to:
1. modules that have reference versions defined only (no functions)
2. torch.qint32 dtype only
Currently, the reference module logic is enabled whenever dtype is
torch.qint32. This is done because torch.qint32 support is needed earliest for
the first use case. A future PR will support more dtypes and also
add the `is_reference` flag to the API.
Test Plan:
```
python test/test_quantization.py TestQuantizeDBR.test_conv_int32_reference_model
```
Reviewed By: jerryzh168
Differential Revision: D34520759
Pulled By: vkuzo
fbshipit-source-id: 363db715315c5c7c20962a1818330ce288948778
(cherry picked from commit 6ccdfe2889c252211f191edc49f4147f66e803a4)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73492
Before this PR, DBR quant reused the Eager mode quantization machinery
to insert activation observers. This was done for speed of developing the
prototype. A drawback of this is that the activation observers are not
present in DBR's data structures and live on the modules instead.
This PR refactors DBR quant to stop using Eager mode quantization
observer insertion for activations, and instead create and track the
activation observers in DBR's data structures. This has a couple of benefits:
1. activation observers are now created the same way in DBR for modules and functions
2. we can remove some technical debt due to fixing (1)
3. this will make it easier to support reference modules in a future PR
The reason (3) is true is that the current design of reference modules
assumes the activation observer lives in the framework (as in FX
graph mode quantization). This PR starts to adhere to that assumption.
Test Plan:
```
python test/test_quantization.py -k DBR
```
Reviewed By: jerryzh168
Differential Revision: D34520758
Pulled By: vkuzo
fbshipit-source-id: 2f6448dce021024cb2fa112d8691c94128c43123
(cherry picked from commit cfc1a0eaf6579cea2c710c1c2b4c86d28ee799eb)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73745
Mark output logger as impure, which will help prevent it and the shadow ops from being removed in acc tracer.
Test Plan: Tested in N1611591
Reviewed By: jerryzh168
Differential Revision: D34616990
fbshipit-source-id: ccc93e30f9cbf3eb69f49fc2d0f02fd4d083c507
(cherry picked from commit e40fcbd1bc543eb64fa692776c34f26e2a0a05ff)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71230
DBR quantization uses `torch.Tensor.as_subclass` frequently. When
the quantized model is traced with `torch.jit.trace`, these calls appear
in the resulting graph as `aten::alias`. This PR adds a pass to remove
these calls from the graph, for two reasons:
1. ease of debugging (these calls do nothing)
2. less work for downstream passes (for example, converting to ONNX currently breaks if these alias calls are present)
For now, we have to inline the graph in order for `aliasDb` to determine
safety properly. In the future, we may choose to relax this if there is
a need for it.
Test Plan:
Test plan is pretty basic for now, it can be improved in future PRs.
```
python test/test_quantization.py TestQuantizeDBR.test_jit_tracing_removes_aliases
```
Reviewed By: eellison
Differential Revision: D33552387
Pulled By: vkuzo
fbshipit-source-id: 681a33ddfff394a91e971263ac593afd93c5ea78
(cherry picked from commit 0f8412725d0c6fd9ef1072a50d4203465aa5d1f9)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73671
QuantWrapper did not correctly apply qconfig to the dequant.
Therefore, if the user first applied qconfig to their module and
then wrapped it with `QuantWrapper`, the dequant would not get
swapped during the convert step.
The fix is to properly apply the qconfig to the dequant.
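A hedged illustration of the workflow described above (not the PR's test code):
```python
import torch.nn as nn
from torch.ao.quantization import QuantWrapper, get_default_qconfig

m = nn.Linear(4, 4)
m.qconfig = get_default_qconfig('fbgemm')  # user applies the qconfig first...
wrapped = QuantWrapper(m)                  # ...then wraps; with the fix, wrapped.dequant
                                           # also carries the qconfig, so convert() swaps it
```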
Test Plan:
```
python test/test_quantization.py TestQuantizeEagerPTQStatic.test_quantwrapper_attaches_qconfig_to_dequant
```
Reviewed By: MaigoAkisame
Differential Revision: D34585260
Pulled By: vkuzo
fbshipit-source-id: 82055a9fa7fc13a714fe460deb461c2e87e76b39
(cherry picked from commit c9f392333dd1c005d893bdc2fbafe8a82b317c88)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73572
Previously we couldn't specify how to get extra inputs for fused ops in backend_config_dict.
For example, for patterns like:
(torch.add, (nn.BatchNorm2d, nn.Conv2d), MatchAllNode)
where nn.Conv2d is the root node, the extra MatchAllNode (the input of the original torch.add) would be lost.
This PR added an "extra_inputs_getter" key to the backend_config_dict, which allows the user to provide a function
that returns a list of extra input nodes for the fused op given the matched node pattern. In this case,
we need a function that returns the node that matched `MatchAllNode`; it would be something like the following:
```
def extra_inputs_getter(pattern):
    add, conv_bn, extra_input = pattern
    return [extra_input]
```
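Below is a hedged sketch of how this getter might be attached to a pattern entry in backend_config_dict (the surrounding keys and the MatchAllNode import path are illustrative and may differ across releases):
```python
import torch
import torch.nn as nn
from torch.ao.quantization.utils import MatchAllNode  # import path may vary by release

fused_add_config = {
    "pattern": (torch.add, (nn.BatchNorm2d, nn.Conv2d), MatchAllNode),
    "extra_inputs_getter": extra_inputs_getter,  # the function defined above
    # ... fuser_method / dtype configuration as in other entries ...
}
```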
Test Plan:
python test/test_quantization.py TestFuseFx.test_fusion_pattern_with_multiple_inputs
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D34553210
fbshipit-source-id: 748f8ce20974438458a39dbe9eae75281156c227
(cherry picked from commit be748526480e811874dbca64b1cf3bf4950f0393)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72717
This will be renamed to WeightedQuantizedModule to
minimize confusion with reference modules.
Test Plan:
python test/test_quantization.py TestQuantizeFx
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D34172554
fbshipit-source-id: 4cd77d6048fde4875218386f7e55f864a73d5bd3
(cherry picked from commit b7af4cedb4275b6f9c06c0773f2997bc4e61578a)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73470
As titled, this does not affect user APIs since we are only exposing fuse_fx as a public API.
Test Plan:
python test/test_quantization.py TestFuseFx
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D34495260
fbshipit-source-id: 3aa253bc7190e50acc7229186f210901ebc5481b
(cherry picked from commit a88517ff6feff7abbece2234d82fd53e33702237)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73233
This PR makes CopyNodeQuantizeHandler always produce reference patterns, and we have
custom lowering passes that rewrite the reference quantized patterns to quantized ops.
Lowering passes have been implemented previously; we just need to enable the reference path here
and clean up the previous code to allowlist some of the ops (`check_node`).
Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
Imported from OSS
Reviewed By: mrshenli
Differential Revision: D34469446
fbshipit-source-id: b9d9c5f793fbb735839199056c197ae98969cc4b
(cherry picked from commit af0cf4e79e11e7343d57e6ff7766c80e72ec60f3)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73345
For complex patterns we need to identify which node is the root, so that we can eliminate all other nodes and only preserve the root.
E.g. for (torch.add, MatchAllNode, (torch.nn.ReLU, torch.nn.Conv2d)), we can preserve the torch.nn.Conv2d as the root node and remove the other nodes.
Previously we assumed the root_node of a pattern is the "last node" of the pattern, computed by:
```
def default_root_node_getter(node_pattern):
    while not isinstance(node_pattern[-1], Node):
        node_pattern = node_pattern[-1]
    return node_pattern[-1]
```
This PR lets users configure their own root_node_getter, which means we can define the root node for patterns like:
(torch.add, (torch.nn.ReLU, torch.nn.Conv2d), MatchAllNode)
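For instance, a user-defined getter for that pattern could return the Conv2d node (a hedged sketch, not the PR's test code):
```python
def my_root_node_getter(node_pattern):
    # pattern: (torch.add, (torch.nn.ReLU, torch.nn.Conv2d), MatchAllNode)
    add, relu_conv, _extra_input = node_pattern
    relu, conv = relu_conv
    return conv  # treat the Conv2d node as the root of the fused pattern
```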
Test Plan:
python test/test_quantize_fx.py TestFuseFx.test_root_node_getter
Imported from OSS
Reviewed By: VitalyFedyunin
Differential Revision: D34442193
fbshipit-source-id: 2f6da69a5b6527b49710ae32820e8e2915d9af37
(cherry picked from commit 8b49bf0d7d53cdcf2c9f40f8e25bc843e8814026)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72735
We use `get_matched_types` to get the (type) pattern from the matched modules,
and we need to use MatchAllNode instead of type(MatchAllNode) to query the fuser_method for the pattern.
Test Plan:
TODO
Imported from OSS
Reviewed By: raghuramank10000
Differential Revision: D34180705
fbshipit-source-id: db9b6e791a9f26b70079fddc95fce033052199ab
(cherry picked from commit 01d38afabcb1bfc207dee7d49ee13df500d32fdf)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73344
Not user facing as of now, since we haven't advertised the backend_config_dict API.
We need this in fuser_method_mapping.py to avoid a circular dependency.
Test Plan:
python test/test_quantization.py TestQuantizeFx
Imported from OSS
Reviewed By: VitalyFedyunin
Differential Revision: D34441778
fbshipit-source-id: 7a01c359e4b21e9e98345dc7781f735628209a20
(cherry picked from commit 758537094c5a98a17a8825b3f240c8d5acdd72b0)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72953
This PR makes BinaryOpQuantizeHandler always produce reference patterns, and we have
custom lowering passes that rewrite the reference quantized patterns to quantized ops.
It includes rewrites for
torch.ops.quantized.add, torch.ops.quantized.mul, and torch.ops.quantized.matmul.
Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
Imported from OSS
Reviewed By: gchanan
Differential Revision: D34292408
fbshipit-source-id: 9872a5098249bc77db15e9fb614416958e62b9b2
(cherry picked from commit dbdc61ee8b5dde2e54a34a370a3af887e5117398)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72934
Before this PR, DBR quantization had a limitation on handling user
code which iterates over all module children. For example, imagine
a forward function such as
```
def forward(self, x):
for module in self:
x = module(x)
return x
```
Before this PR, this code would break with DBR quantization, because
we attach `AutoQuantizationState` objects to each child, and those
objects live in the child's module hierarchy and will appear in
these kinds of iterations, changing the meaning of the user program.
This PR reduces the scope of this problem to just the top level module.
Instead of attaching `AutoQuantizationState` objects to each child,
we register them in a map on the parent. Here is a before and after:
```
// toy model
model
|--> child1
// toy model with AutoQuantizationState objects, before this PR
model
|--> child1
| |--> _auto_quant_state
|--> _auto_quant_state
// toy model with AutoQuantizationState objects, after this PR
model
|--> child1
|--> _fqn_to_auto_quant_state_map
|--> ( ) --> _auto_quant_state // of `model`
|--> (child1) --> _auto_quant_state // of `model.child1`
```
Note: `child1._auto_quant_state` works as before for convenience,
but the `child1` object now stores a soft link to its `_auto_quant_state`
instead of properly registering it in its module hierarchy. This is
somewhat hacky. If we need to improve this in the future, we could
remove this soft link and refactor the code to call the FQN map
instead.
Note: if the top level module iterates over its children, things will
still be broken. This is less likely, and we will recommend that the
user work around this by wrapping their model, or checking for the
`AutoQuantizationStateModuleDict` type in their iteration loop.
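A hedged sketch of that second workaround (the type name is taken from the description above; exact details may differ):
```python
def forward(self, x):
    for module in self:
        # skip DBR quantization bookkeeping when iterating over children
        if type(module).__name__ == 'AutoQuantizationStateModuleDict':
            continue
        x = module(x)
    return x
```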
The impact of this change should be an improvement of coverage
of user models. In fact, we expect this to drive our coverage of
torchbenchmark models from 89% to 100%.
Test Plan:
```
// previously disabled test cases with user code iterating
// over module children are now enabled, with wrappers
python test/test_quantization.py -k test_module_calls_items
python test/test_quantization.py -k test_vovnet_sequential
```
Reviewed By: dzdang
Differential Revision: D34281074
Pulled By: vkuzo
fbshipit-source-id: 0e25fc1ec529c47f72478a1875fe43219feac6b1
(cherry picked from commit 4008f89967)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72431
Adds support for a fused QAT observed module for `Linear` followed by
`BatchNorm1d`. In this PR, only support for the prepared module, with
fake_quants in the right places, is added.
A future PR will add support for `convert`, and tests for eager and FX
graph mode workflows.
Similar to conv-bn, we rescale the weight before applying the fake
quant, and undo the rescaling after the linear operation.
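A simplified, hedged sketch of that rescaling trick (mirroring the conv-bn QAT approach; not the PR's actual implementation):
```python
import torch
import torch.nn.functional as F

def qat_linear_bn1d_forward(x, linear, bn, weight_fake_quant):
    running_std = torch.sqrt(bn.running_var + bn.eps)
    scale = bn.weight / running_std
    # fold the BN scale into the weight before fake-quantizing it
    scaled_weight = weight_fake_quant(linear.weight * scale.reshape(-1, 1))
    out = F.linear(x, scaled_weight)
    out = out / scale              # undo the rescaling after the linear op
    if linear.bias is not None:
        out = out + linear.bias
    return bn(out)                 # BatchNorm1d then runs as usual
```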
Test Plan:
```
python test/test_quantization.py TestQuantizeEagerQATNumerics.test_linear_bn
```
Imported from OSS
Reviewed By: jerryzh168, raghuramank10000
Differential Revision: D34044427
fbshipit-source-id: 47a519173939ca4824d2c6e6ea7a599764a8ed10
(cherry picked from commit bfc75fe078)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72490
This is an effort to move the current implementation towards the reference quantized model design:
https://github.com/pytorch/rfcs/blob/master/RFC-0019-Extending-PyTorch-Quantization-to-Custom-Backends.md
so that we use the reference model in the default fbgemm/qnnpack path.
Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps.test_qbatch_norm
Imported from OSS
Reviewed By: vkuzo, andrewor14
Differential Revision: D34062365
fbshipit-source-id: ed015c61f5b969554a6477f92cf6be2358cb558c
(cherry picked from commit 9498421ddd)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72444
In https://github.com/pytorch/pytorch/pull/71783 support was added for
quantized matmul.
In this PR, the FX graph mode quantization workflow support for this
operator is added, for int8 dtypes.
Test Plan:
```
python test/test_quantization.py TestQuantizeFxOps.test_qmatmul
```
Imported from OSS
Reviewed By: andrewor14
Differential Revision: D34047310
fbshipit-source-id: 781219047419ce621a4deb46ea04881818bf4209
(cherry picked from commit 7e039fa3a1)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71795
This commit expands the API coverage of functional conv
ops in DBR quantization from F.conv2d to all conv variants.
Test Plan:
python test/test_quantization.py TestQuantizeDBRIndividualOps.test_conv_functional
Imported from OSS
Reviewed By: albanD
Differential Revision: D33907099
fbshipit-source-id: f459c219482822f64c7c9d22cd316c6e9ef44405
(cherry picked from commit acf4548e8d)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71781
The previous PR added information about fusions found in the subgraphs.
This PR uses that information for:
1. inserting observers at the end of fusions and not in the middle
2. during inference, replacing the original op with the fused op. The
way this is implemented is that the base op is replaced with the fused op,
and all other ops are replaced with identity functions.
Test Plan:
```
python test/test_quantization.py TestQuantizeDBR.test_fusion_functions
```
Reviewed By: jerryzh168
Differential Revision: D33775097
Pulled By: vkuzo
fbshipit-source-id: 12249b85b2f7ba7545a54872aeb5f1ff2fc928cf
(cherry picked from commit 0db4324ea9)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71780
Adds support for matching operator.add -> torch.relu in FX graph
mode quantization.
It would be nice to support torch.relu better in general, but
saving that for a future PR to keep PRs small.
This is useful for DBR quant because we have some test cases in DBR
quant which use add-relu, and we'd like to match them to FX.
Test Plan:
```
python test/test_quantization.py TestQuantizeFxOps.test_add_relu
python test/test_quantization.py TestQuantizeFxOps.test_mul_relu
```
Reviewed By: jerryzh168
Differential Revision: D33775096
Pulled By: vkuzo
fbshipit-source-id: 889d9b41d3758ecbbb6d7eab67f64ce3d4892d24
(cherry picked from commit c1f9f38ca1)