Commit Graph

207 Commits

Author SHA1 Message Date
vasiliy
dc70e8175f Add various uninterpreted bit tensor data types (try 2) (#95860)
Summary:

This is a retry of https://github.com/pytorch/pytorch/pull/94992 which was reverted due to CI issues.

This PR adds a set of uninterpreted data types to PyTorch which can be used to implement experimental functionality outside of core (think fp8, int4, int16 quant, etc.).
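
To make "uninterpreted" concrete: the storage holds raw bits, and out-of-core code decides what they mean. A minimal pure-Python sketch, assuming a hypothetical int4 packing scheme (two 4-bit values per byte); the helper names are illustrative and are not part of this PR:

```python
def pack_int4(values):
    """Pack unsigned 4-bit values (0..15) two per byte, low nibble first."""
    values = list(values)
    if len(values) % 2:
        values.append(0)  # pad to an even count
    out = bytearray()
    for lo, hi in zip(values[0::2], values[1::2]):
        if not (0 <= lo < 16 and 0 <= hi < 16):
            raise ValueError("int4 values must be in 0..15")
        out.append(lo | (hi << 4))
    return bytes(out)


def unpack_int4(raw, count):
    """Reinterpret raw bytes as `count` unsigned 4-bit values."""
    values = []
    for byte in raw:
        values.append(byte & 0x0F)
        values.append(byte >> 4)
    return values[:count]


raw = pack_int4([1, 15, 7])     # the container only sees opaque bytes
restored = unpack_int4(raw, 3)  # the caller supplies the interpretation
```

The container never needs to know that the bytes encode int4 values, which is the property an uninterpreted bit dtype is meant to provide.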

@bypass-github-export-checks

Test Plan:

```
python test/test_quantization.py -k TestBits
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95860
Approved by: https://github.com/atalman
2023-03-04 03:35:59 +00:00
PyTorch MergeBot
3bafecf719 Revert "Add various uninterpreted bit tensor data types (#94992)"
This reverts commit 9dbfca7840.

Reverted https://github.com/pytorch/pytorch/pull/94992 on behalf of https://github.com/atalman due to breaks libtorch windows nightly builds see: https://github.com/pytorch/pytorch/pull/95406
2023-02-23 23:54:23 +00:00
vasiliy
9dbfca7840 Add various uninterpreted bit tensor data types (#94992)
Summary:

This PR adds a set of uninterpreted data types to PyTorch which can be used to implement experimental functionality outside of core (think fp8, int4, int16 quant, etc.).

Note: this is a copy-paste of https://github.com/pytorch/pytorch/pull/89990 with a bug fix for clang9. It was easier to just put up another PR since I'm not sure how commandeering works with Meta-only changes.

@bypass-github-export-checks

Test Plan:

```
python test/test_quantization.py -k TestBits
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94992
Approved by: https://github.com/angelayi
2023-02-18 00:04:30 +00:00
Jerry Zhang
8fa66a6337 [quant][pt2e] Add a test to confirm we can set qconfig according to module_name (#91977)
Summary:
As the title says.

Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_qconfig_none

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91977
Approved by: https://github.com/jcaip
2023-01-12 21:59:02 +00:00
Jerry Zhang
f7b384cc46 [reland][quant][pt2e] Add early prototype top level quantize_pt2e APIs (#91035)
Summary:
This PR introduces the top level APIs for quantization support in PyTorch 2.0 Export stack
* torch.ao.quantization.quantize_pt2e.prepare_pt2e
Takes a model that is captured by the PyTorch 2.0 export (torchdynamo full graph mode) and prepares the model for calibration
for post training quantization

* torch.ao.quantization.quantize_pt2e.convert_pt2e
Takes a calibrated model and converts that to a reference quantized model that can be lowered later to quantized operator libraries or delegation modules

Also added a backend config for the qnnpack_pt2e backend:
* torch.ao.quantization.backend_config.get_qnnpack_pt2e_backend_config

Note: everything related to quantize_pt2e is experimental (prototype), and we don't provide any backward-compatibility (BC) guarantees
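
The prepare, calibrate, convert flow described above can be sketched without PyTorch. This is a minimal illustration, assuming a simple min/max observer and affine uint8 quantization; the class and function names are made up for the example and are not the quantize_pt2e API:

```python
class MinMaxObserver:
    """Records the running min and max of the values it sees (assumed design)."""

    def __init__(self):
        self.min_val = float("inf")
        self.max_val = float("-inf")

    def observe(self, values):
        self.min_val = min(self.min_val, min(values))
        self.max_val = max(self.max_val, max(values))

    def qparams(self, qmin=0, qmax=255):
        # Include 0.0 in the range so that zero is exactly representable.
        lo, hi = min(self.min_val, 0.0), max(self.max_val, 0.0)
        scale = (hi - lo) / (qmax - qmin) or 1.0
        zero_point = round(qmin - lo / scale)
        return scale, zero_point


def quantize(values, scale, zero_point, qmin=0, qmax=255):
    return [min(qmax, max(qmin, round(v / scale) + zero_point)) for v in values]


def dequantize(qvalues, scale, zero_point):
    return [(q - zero_point) * scale for q in qvalues]


# "prepare": attach an observer; "calibrate": run representative data through it.
observer = MinMaxObserver()
for batch in ([-1.0, 0.5], [0.25, 3.0]):
    observer.observe(batch)

# "convert": freeze the observed range into quantization parameters.
scale, zero_point = observer.qparams()
q = quantize([0.0, 3.0], scale, zero_point)
```

In the real flow the observers are inserted into the captured graph rather than called by hand, and the converted model is a reference quantized model that a backend lowers later.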

Test Plan:
python test/test_quantization.py TestQuantizePT2EModels

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91035
Approved by: https://github.com/HDCharles
2022-12-17 02:15:53 +00:00
PyTorch MergeBot
ad1b04c4a9 Revert "[reland][quant][pt2e] Add early prototype top level quantize_pt2e APIs (#90971)"
This reverts commit 7dd5e55497.

Reverted https://github.com/pytorch/pytorch/pull/90971 on behalf of https://github.com/ezyang due to still broke tons of master jobs sorry
2022-12-16 09:29:39 +00:00
Jerry Zhang
7dd5e55497 [reland][quant][pt2e] Add early prototype top level quantize_pt2e APIs (#90971)
Summary:
This PR introduces the top level APIs for quantization support in PyTorch 2.0 Export stack
* torch.ao.quantization.quantize_pt2e.prepare_pt2e
Takes a model that is captured by the PyTorch 2.0 export (torchdynamo full graph mode) and prepares the model for calibration
for post training quantization

* torch.ao.quantization.quantize_pt2e.convert_pt2e
Takes a calibrated model and converts that to a reference quantized model that can be lowered later to quantized operator libraries or delegation modules

Also added a backend config for the qnnpack_pt2e backend:
* torch.ao.quantization.backend_config.get_qnnpack_pt2e_backend_config

Note: everything related to quantize_pt2e is experimental (prototype), and we don't provide any backward-compatibility (BC) guarantees

Test Plan:
python test/test_quantization.py TestQuantizePT2EModels

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90971
Approved by: https://github.com/HDCharles
2022-12-16 06:24:28 +00:00
PyTorch MergeBot
9c912c7dd0 Revert "[quant][pt2e] Add early prototype top level quantize_pt2e APIs (#90802)"
This reverts commit a66af1feba.

Reverted https://github.com/pytorch/pytorch/pull/90802 on behalf of https://github.com/malfet due to somehow broke test_resnet18 (quantization.fx.test_quantize_pt2e.TestQuantizePT2EModels), see a66af1feba
2022-12-15 23:28:21 +00:00
Jerry Zhang
a66af1feba [quant][pt2e] Add early prototype top level quantize_pt2e APIs (#90802)
Summary:
This PR introduces the top level APIs for quantization support in PyTorch 2.0 Export stack
* torch.ao.quantization.quantize_pt2e.prepare_pt2e
Takes a model that is captured by the PyTorch 2.0 export (torchdynamo full graph mode) and prepares the model for calibration
for post training quantization

* torch.ao.quantization.quantize_pt2e.convert_pt2e
Takes a calibrated model and converts that to a reference quantized model that can be lowered later to quantized operator libraries or delegation modules

Also added a backend config for the qnnpack_pt2e backend:
* torch.ao.quantization.backend_config.get_qnnpack_pt2e_backend_config

Note: everything related to quantize_pt2e is experimental (prototype), and we don't provide any backward-compatibility (BC) guarantees

Test Plan:
python test/test_quantization.py TestQuantizePT2EModels

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90802
Approved by: https://github.com/qihqi
2022-12-15 21:50:29 +00:00
AllenTiTaiWang
bdb14238ec [Reland][ONNX] Move all torch.onnx.export related tests to test/onnx (#87292)
Moving torch.onnx.export related tests to test/onnx integrates ONNX tests to the same CI machine, so the testing environment can be better managed.

Fixes https://github.com/pytorch/pytorch/issues/87320
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87292
Approved by: https://github.com/thiagocrepaldi, https://github.com/BowenBao, https://github.com/kit1980, https://github.com/malfet
2022-11-01 14:22:46 +00:00
Vasiliy Kuznetsov
237316aa1d PNP: early FX numeric suite tool to quantize each layer N times (#80521)
Summary:

This PR is an early prototype of a tool to quantize each layer of a model
N times, with N qconfigs each. We follow the design agreed upon in
https://fburl.com/gdoc/e1gaq3ih .

Current API:

```
m = M().eval()
example_input = (torch.randn(2, 2),)
qconfig_mappings = [
    QConfigMapping().set_global(torch.quantization.default_qconfig),
    QConfigMapping().set_global(torch.quantization.default_dynamic_qconfig),
]
backend_config = get_native_backend_config()

msp = prepare_n_shadows_model(
    m, example_input, qconfig_mappings, backend_config)

for _ in range(2):
    msp(*example_input)

msq = convert_n_shadows_model(msp)
msq(*example_input)

results = extract_results_n_shadows_model(msq)
print_comparisons_n_shadows_model(results)

# example output

subgraph_idx    ref_node_name      best_idx        1        2
--------------  ---------------  ----------  -------  -------
subgraph_0      fc1                       2  42.0834  42.6279
subgraph_1      fc2                       2  43.7259  50.0593
```
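
The idea of scoring each layer under N candidate configs can be sketched in pure Python. This is a rough illustration only: it assumes SQNR (in dB) as the comparison metric and uses bit widths as stand-ins for qconfigs; none of the names below come from the tool itself:

```python
import math


def fake_quantize(values, num_bits):
    """Symmetric quantize-then-dequantize to `num_bits` (illustrative only)."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = max(abs(v) for v in values) / qmax or 1.0
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in values]


def sqnr_db(reference, approx):
    """Signal-to-quantization-noise ratio in dB; higher means closer."""
    signal = sum(r * r for r in reference)
    noise = sum((r - a) ** 2 for r, a in zip(reference, approx))
    return float("inf") if noise == 0 else 10 * math.log10(signal / noise)


# Hypothetical float outputs of two layers, and two candidate configs.
layer_outputs = {"fc1": [0.11, -0.52, 0.93, 0.27], "fc2": [1.7, -0.4, 0.05, 2.3]}
candidate_bits = [4, 8]

results = {}
for name, reference in layer_outputs.items():
    scores = [sqnr_db(reference, fake_quantize(reference, b)) for b in candidate_bits]
    best_idx = max(range(len(scores)), key=scores.__getitem__) + 1  # 1-based
    results[name] = {"best_idx": best_idx, "scores": scores}
```

Each entry of `results` plays the role of one row in the comparison table: a per-layer score for every candidate plus the index of the best one.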

Test plan:

```
python test/test_quantization.py -k test_n_shadows
```

Differential Revision: [D37650332](https://our.internmc.facebook.com/intern/diff/D37650332)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80521
Approved by: https://github.com/jerryzh168, https://github.com/andrewor14
2022-10-06 02:30:45 +00:00
zaf
d542aab5c1 [quant][ao_migration] nn.intrinsic migration to ao (#84842)
All quantization-related modules are being migrated to `torch.ao`. This migrates the `nn.intrinsic.modules`. Please, see the [tracker](https://github.com/pytorch/pytorch/issues/81667) for the timeline.

Differential Revision: [D39419733](https://our.internmc.facebook.com/intern/diff/D39419733/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D39419733/)!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84842
Approved by: https://github.com/jerryzh168
2022-09-28 23:54:29 +00:00
Vasiliy Kuznetsov
58170fb8aa Remove DBR quantization from the codebase (#83642)
Summary:

DBR quantization is a no-go for now because it does not align well with
PyTorch 2.0 plans and we do not want to build yet another tracing system.

Deleting it from the codebase for now since there are no plans to develop
this in the near future. We can bring it back at a later time if necessary.

Test plan:

CI

Differential Revision: [D38839556](https://our.internmc.facebook.com/intern/diff/D38839556)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83642
Approved by: https://github.com/andrewor14, https://github.com/jerryzh168
2022-08-23 15:18:40 +00:00
zaf
78c8a0d752 [quant][ao_migration] torch.nn.quantized.functional → torch.ao.nn.quantized.functional (#78712)
Context: In order to avoid the cluttering of the `torch.nn` namespace
the quantized modules namespace is moved to `torch.ao.nn`.

The list of the `nn.quantized` files that are being migrated:

- [ ] `torch.nn.quantized` → `torch.ao.nn.quantized`
  - [X] [Current PR] `torch.nn.quantized.functional` → `torch.ao.nn.quantized.functional`
  - [ ] `torch.nn.quantized.modules` → `torch.ao.nn.quantized.modules`
  - [ ] `torch.nn.quantized.dynamic` → `torch.ao.nn.quantized.dynamic`
  - [ ] `torch.nn.quantized._reference` → `torch.ao.nn.quantized._reference`
- [ ] `torch.nn.quantizable` → `torch.ao.nn.quantizable`
- [ ] `torch.nn.qat` → `torch.ao.nn.qat`
  - [ ] `torch.nn.qat.modules` → `torch.ao.nn.qat.modules`
  - [ ] `torch.nn.qat.dynamic` → `torch.ao.nn.qat.dynamic`
- [ ] `torch.nn.intrinsic` → `torch.ao.nn.intrinsic`
  - [ ] `torch.nn.intrinsic.modules` → `torch.ao.nn.intrinsic.modules`
  - [ ] `torch.nn.intrinsic.qat` → `torch.ao.nn.intrinsic.qat`
  - [ ] `torch.nn.intrinsic.quantized` → `torch.ao.nn.intrinsic.quantized`
    - [ ] `torch.nn.intrinsic.quantized.modules` → `torch.ao.nn.intrinsic.quantized.modules`
    - [ ] `torch.nn.intrinsic.quantized.dynamic` → `torch.ao.nn.intrinsic.quantized.dynamic`

Majority of the files are just moved to the new location.
However, specific files need to be double checked:

- [Documentation](docs/source/quantization-support.rst) @vkuzo
- [Public API test list](test/allowlist_for_publicAPI.json) @peterbell10

Differential Revision: [D36792967](https://our.internmc.facebook.com/intern/diff/D36792967/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D36792967/)!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78712
Approved by: https://github.com/jerryzh168
2022-08-18 17:51:54 +00:00
mikey dagitses
3f612b58be fix quantization/core/test_docs for Buck2 (#83341)
Summary:
We extract the test to its own target, fixing the relative path to the
quantization docs. This allows us to find the docs with a more simple
implementation.

Test Plan: Tested locally with buck1 and buck2.

Differential Revision: D38662169

Pull Request resolved: https://github.com/pytorch/pytorch/pull/83341
Approved by: https://github.com/huydhn, https://github.com/seemethere, https://github.com/ZainRizvi
2022-08-18 13:03:00 +00:00
Andrew Or
194255bb56 [Quant][fx] Implement BackendConfig (part 1) (#81469)
Summary: Following https://github.com/pytorch/pytorch/pull/78452
and https://github.com/pytorch/pytorch/pull/79066, this commit
is part 1 of the broader effort to replace `backend_config_dict`
with a python config object, a more formal and robust API that
leads to better user experience. Note that there is no change in
behavior in this commit by itself. A future commit (part 2) will
replace all existing usages of `backend_config_dict` with the
`BackendConfig` object added in this commit.
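
As a rough illustration of why a config object can be more robust than a raw dict, here is a sketch using dataclasses. The class names, fields, and keys are hypothetical and do not mirror the actual `BackendConfig` API:

```python
from dataclasses import dataclass, field


@dataclass
class PatternConfig:
    """Hypothetical per-pattern settings; field names are illustrative."""
    pattern: str
    dtypes: list = field(default_factory=list)


@dataclass
class SketchBackendConfig:
    name: str = ""
    configs: list = field(default_factory=list)

    def add(self, config):
        self.configs.append(config)
        return self  # allow chaining

    @classmethod
    def from_dict(cls, d):
        unknown = set(d) - {"name", "configs"}
        if unknown:
            # A typed object can reject typos that a plain dict silently accepts.
            raise ValueError(f"unknown keys: {sorted(unknown)}")
        return cls(d.get("name", ""), [PatternConfig(**c) for c in d.get("configs", [])])

    def to_dict(self):
        return {
            "name": self.name,
            "configs": [{"pattern": c.pattern, "dtypes": list(c.dtypes)} for c in self.configs],
        }


legacy = {"name": "native", "configs": [{"pattern": "linear", "dtypes": ["quint8"]}]}
config = SketchBackendConfig.from_dict(legacy)
```

Round-tripping through `from_dict` and `to_dict` is one way to keep existing dict-based callers working while usages are migrated.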

Test Plan:
python test/test_quantization.py TestBackendConfig

Reviewers: jerryzh168

Subscribers: jerryzh168
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81469
Approved by: https://github.com/jerryzh168
2022-07-24 00:34:48 +00:00
vspenubarthi
d0ce1fbbe2 [ao] Created Skeleton for ModelReportVisualizer class (#81523)
Summary: This introduces the skeleton for the ModelReportVisualizer
class. This class helps visualize the information generated by the
ModelReport class `generate_report()` output. This class aims to provide
visualizations in a table, plot (line graph) and histogram view.

This also introduces an empty test class for testing visualizations. As
implementations start occurring for this class, tests will also be
appropriately added.

This includes the high level descriptions for each of the methods as
well. Expected use cases will be added to the class description in a
future commit as that gets finalized.

Test Plan: python test/test_quantization.py TestFxModelReportVisualizer

Pull Request resolved: https://github.com/pytorch/pytorch/pull/81523
Approved by: https://github.com/andrewor14
2022-07-20 02:39:14 +00:00
vspenubarthi
e5162dcfa7 [ao] Added framework for ModelReport Outlier Detector (#80743)
Summary: This adds the class framework for the ModelReport
OutlierDetector. This detector will be in charge of looking at
activation data and figuring out whether there are significant outliers
present in them. It will average this data across batches to make a
recommendation / warning if significant outliers are found.
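
One way such a check could work, sketched in pure Python. The statistic (max magnitude divided by a high percentile), the averaging across batches, and both default values are assumptions made for illustration; the commit only adds the class framework:

```python
def percentile(sorted_values, fraction):
    """Nearest-rank style percentile of an already sorted list."""
    index = min(len(sorted_values) - 1, int(fraction * len(sorted_values)))
    return sorted_values[index]


def has_significant_outliers(batches, fraction=0.9, ratio_threshold=3.5):
    """Average max/percentile ratios across batches and compare to a threshold.

    Both `fraction` and `ratio_threshold` are made-up defaults for illustration.
    """
    ratios = []
    for batch in batches:
        magnitudes = sorted(abs(v) for v in batch)
        reference = percentile(magnitudes, fraction) or 1e-12  # avoid dividing by zero
        ratios.append(magnitudes[-1] / reference)
    return sum(ratios) / len(ratios) > ratio_threshold


calm = [[0.1 * i for i in range(1, 21)] for _ in range(3)]
spiky = [[0.1 * i for i in range(1, 20)] + [50.0] for _ in range(3)]
```

Here `has_significant_outliers(calm)` is false and `has_significant_outliers(spiky)` is true, since a single large activation dominates each spiky batch.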

This commit contains just the class framework and a base test class.
Implementations will follow in subsequent commits.

Test Plan: python test/test_quantization.py TestFxDetectOutliers

Pull Request resolved: https://github.com/pytorch/pytorch/pull/80743
Approved by: https://github.com/HDCharles
2022-07-01 01:03:31 +00:00
vspenubarthi
845021db2c [ao] Adds framework for InputWeightEqualization Detector (#79916)
Summary: This adds the framework (method signatures and descriptors) for
the InputWeightEqualization Detector. There is no code implementation yet
so the test suite for this is a simple pass. This Detector will be used
to determine whether input weight equalization should be recommended.

Test Plan: python test/test_quantization.py TestFxDetectInputWeightEqualization

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79916
Approved by: https://github.com/HDCharles
2022-06-24 14:51:15 +00:00
HDCharles
ffdc5eebc7 [ao][docs] tests for quantization docs (#79923)
Summary: Per https://github.com/pytorch/pytorch/issues/79135, the code
snippets in the docs don't run. This is a recurring problem because
previously there was no unit test to check that these code snippets
actually ran. This PR adds support for such a test: it imports the
snippet as a string and evaluates it to make sure that it actually runs.
If the code snippet relies on user-defined code, you can pass in dummy
versions using global_inputs. Sometimes the imports of the code snippets
behave oddly, but you can pass them in as well, as in
test_quantization_doc_custom, where nnq is passed in.
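
The mechanism described above (evaluate a snippet string, injecting dummy globals) can be sketched as follows. The helper name `run_snippet` and the dummy model are illustrative; only the `global_inputs` idea is taken from the summary:

```python
def run_snippet(source, global_inputs=None):
    """Execute a documentation snippet and return the namespace it produced.

    `global_inputs` supplies dummy stand-ins for names the snippet expects
    the user to define. Any exception propagates, so a broken snippet fails.
    """
    namespace = dict(global_inputs or {})
    exec(compile(source, "<doc snippet>", "exec"), namespace)
    return namespace


snippet = """
model = UserModel()
result = model(21)
"""


class DummyModel:
    def __call__(self, x):
        return 2 * x


namespace = run_snippet(snippet, {"UserModel": DummyModel})
```

Running the same snippet without `global_inputs` raises `NameError`, which is how a test built this way would surface a doc snippet that no longer runs.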

Test Plan: python test/test_quantization.py TestQuantizationDocs
also see https://github.com/pytorch/pytorch/pull/79994 to see what shows up in CI when the docs get broken

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79923
Approved by: https://github.com/z-a-f, https://github.com/vspenubarthi
2022-06-23 20:50:31 +00:00
vspenubarthi
01720ae3b6 [ao] Added ModelReport class outline for Fx Graph Modules
Summary: The ModelReport class in model_report.py combines the
functionality of the detectors and the ModelReportObserver. It creates
an end-to-end system where a user can pass in a prepared Graph Model to
insert the ModelReportObservers, then after the user calibrates their
model, the calibrated model can then be used by the ModelReport class
to generate reports based on what the user wished to gather information
on.

This contains the init method and the signatures and docs for each
of the proposed helper functions.

This also addresses and fixes a revert issue.

Test Plan: python test/test_quantization.py TestFxModelReportClass

Pull Request resolved: https://github.com/pytorch/pytorch/pull/80052

Approved by: https://github.com/HDCharles
2022-06-22 21:12:58 +00:00
PyTorch MergeBot
ea6fa8dc95 Revert "[ao] Added ModelReport class outline for Fx Graph Modules"
This reverts commit 0f95e1846c.

Reverted https://github.com/pytorch/pytorch/pull/79595 on behalf of https://github.com/malfet due to Broke tests on MacOS, see 0f95e1846c
2022-06-22 12:43:07 +00:00
vspenubarthi
0f95e1846c [ao] Added ModelReport class outline for Fx Graph Modules
Summary: The ModelReport class in model_report.py combines the
functionality of the detectors and the ModelReportObserver. It creates
an end-to-end system where a user can pass in a prepared Graph Model to
insert the ModelReportObservers, then after the user calibrates their
model, the calibrated model can then be used by the ModelReport class
to generate reports based on what the user wished to gather information
on.

This contains the init method and the signatures and docs for each
of the proposed helper functions.

Test Plan: python test/test_quantization.py TestFxModelReportClass

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79595

Approved by: https://github.com/andrewor14
2022-06-22 02:47:24 +00:00
vspenubarthi
38952d9350 [ao] Added function to inform dynamic vs static appropriate
Summary: The _detect_dynamic_vs_static function was added to take in a
prepared fx graph model that already had ModelReportObservers built into
it and uses the collected information to determine whether input and
output are stationary or non-stationary and provides feedback on whether
to make linear modules static or dynamic based on this information.
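
A sketch of how a stationarity check might drive the recommendation. The statistic (average per-batch range divided by the range over all batches) and the threshold are assumptions for illustration, not the logic of `_detect_dynamic_vs_static`:

```python
def range_ratio(batches):
    """Average per-batch range divided by the range over all batches."""
    per_batch = [max(b) - min(b) for b in batches]
    overall = max(map(max, batches)) - min(map(min, batches))
    if overall == 0:
        return 1.0
    return (sum(per_batch) / len(per_batch)) / overall


def recommend(batches, threshold=0.5):
    # `threshold` is an assumed cutoff, not a value taken from the PR.
    return "static" if range_ratio(batches) >= threshold else "dynamic"


steady = [[-1.0, 1.0], [-0.9, 1.1], [-1.1, 0.9]]
drifting = [[0.0, 1.0], [10.0, 11.0], [20.0, 21.0]]
```

When every batch covers roughly the same range (`steady`), fixed quantization parameters fit well and the sketch recommends static; when the range moves between batches (`drifting`), it recommends dynamic.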

This PR will be followed up soon with another PR that will more
rigorously test the end-to-end performance of this system. That is
primarily how the function in this PR will be tested for functionality,
which is why this one has only 1 test.

Test Plan: python test/quantization/fx/test_model_report_fx.py TestModelReportDetectDynamicStatic

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79326

Approved by: https://github.com/HDCharles
2022-06-15 02:51:27 +00:00
vspenubarthi
8e05513152 [ao] Added ModelReportObserver to inform on dynamic vs static
Summary: The purpose of this is to add to the module report functionality
by creating an observer that will take a prepared fx module and suggest
whether static or dynamic quantization is more appropriate. The tests
for this have been written and included in the location indicated by the
Test Plan

Test Plan: python test/quantization/fx/test_model_report_fx.py TestModelReportObserver

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79243

Approved by: https://github.com/jerryzh168, https://github.com/andrewor14
2022-06-14 19:08:40 +00:00
vspenubarthi
28c541776c [ao] Added fx model report per_channel detector
Summary: This code is meant to be a tool to help people get the most out
of their backend by suggesting that they use per_channel quantization if it's
supported, which will help increase accuracy significantly. The code is
completed and ready to be reviewed.
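
To illustrate why per_channel quantization can help accuracy, the following pure-Python sketch compares round-trip error for one shared scale against one scale per output channel. It assumes symmetric int8 quantization and a made-up weight matrix; it is not the detector's code:

```python
def quant_error(values, scale):
    """Sum of squared round-trip errors for symmetric int8 with a given scale."""
    total = 0.0
    for v in values:
        q = max(-127, min(127, round(v / scale)))
        total += (v - q * scale) ** 2
    return total


def scale_for(values):
    return max(abs(v) for v in values) / 127 or 1.0


# Two output channels with very different magnitudes.
weight = [[0.010, -0.020, 0.015], [8.0, -12.0, 10.0]]

per_tensor_scale = scale_for([v for row in weight for v in row])
per_tensor_error = sum(quant_error(row, per_tensor_scale) for row in weight)
per_channel_error = sum(quant_error(row, scale_for(row)) for row in weight)
```

With a shared scale the small channel rounds entirely to zero, so `per_channel_error` comes out lower than `per_tensor_error` for this weight.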

Test Plan: test/quantization/fx/test_model_report_fx.py

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79104

Approved by: https://github.com/HDCharles
2022-06-10 08:09:59 +00:00
Jerry Zhang
7ea5fa3dd4 [reland][quant] Add utility function get_fqn_to_example_inputs
Summary:
After https://github.com/pytorch/pytorch/pull/77608 `example_inputs` is required input for `prepare_fx` and `prepare_qat_fx`.
This makes quantizing submodules harder, so we added this utility function to get a dictionary from fqn to submodule example_inputs

Example Call:

```
example_inputs = (tensor0,)
get_fqn_to_example_inputs(m, example_inputs)
```

Example output:
```
{
   "linear1": (tensor1,),
   "linear2": (tensor2,),
   "sub": (tensor3,),
   "sub.linear1": (tensor4,),
   ...
}
```
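
How such a mapping can be collected is sketched below with a tiny stand-in module tree (not `torch.nn.Module`). Recording the first call per submodule and dropping the root entry are choices of this sketch, not confirmed details of the utility:

```python
class Module:
    """Tiny stand-in for a module tree; not torch.nn.Module."""

    def __init__(self, fn=None, **children):
        self.fn = fn
        self.children = children
        self.recorder = None  # set while capturing
        self.fqn = ""

    def __call__(self, *args):
        if self.recorder is not None and self.fqn not in self.recorder:
            self.recorder[self.fqn] = args  # first call wins
        return self.forward(*args)

    def forward(self, x):
        if self.fn is not None:
            return self.fn(x)
        for child in self.children.values():
            x = child(x)
        return x

    def named_modules(self, prefix=""):
        yield prefix, self
        for name, child in self.children.items():
            yield from child.named_modules(f"{prefix}.{name}" if prefix else name)


def fqn_to_example_inputs(root, example_inputs):
    """Run the model once and record the positional inputs each submodule saw."""
    recorded = {}
    modules = list(root.named_modules())
    for fqn, module in modules:
        module.recorder, module.fqn = recorded, fqn
    try:
        root(*example_inputs)
    finally:
        for _, module in modules:
            module.recorder = None
    recorded.pop("", None)  # drop the root entry
    return recorded


model = Module(
    linear1=Module(lambda x: x + 1),
    sub=Module(linear1=Module(lambda x: x * 2)),
)
mapping = fqn_to_example_inputs(model, (3,))
```

Each value in `mapping` can then be passed as `example_inputs` when quantizing the corresponding submodule on its own.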

Test Plan:
python test/test_quantization.py TestUtils

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78286

Approved by: https://github.com/dzdang
2022-05-25 23:31:51 +00:00
PyTorch MergeBot
87148f2b59 Revert "[quant] Add utility function get_fqn_to_example_inputs"
This reverts commit 50a44fe461.

Reverted https://github.com/pytorch/pytorch/pull/78146 on behalf of https://github.com/suo due to as it broke master
2022-05-25 06:37:32 +00:00
Jerry Zhang
50a44fe461 [quant] Add utility function get_fqn_to_example_inputs
Summary:
After https://github.com/pytorch/pytorch/pull/77608 `example_inputs` is required input for `prepare_fx` and `prepare_qat_fx`.
This makes quantizing submodules harder, so we added this utility function to get a dictionary from fqn to submodule example_inputs

Example Call:

```
example_inputs = (tensor0,)
get_fqn_to_example_inputs(m, example_inputs)
```

Example output:
```
{
   "linear1": (tensor1,),
   "linear2": (tensor2,),
   "sub": (tensor3,),
   "sub.linear1": (tensor4,),
   ...
}
```

Test Plan:
python test/test_quantization.py TestUtils

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78146

Approved by: https://github.com/vkuzo
2022-05-25 03:07:16 +00:00
Jerry Zhang
81437e66c1 [quant][fx] Add RNN reference module (#73386)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73386

This PR adds support for RNN reference module, following https://github.com/pytorch/rfcs/blob/master/RFC-0019-Extending-PyTorch-Quantization-to-Custom-Backends.md
This includes: RNNCell, LSTMCell, GRUCell, LSTM

Test Plan:
will be tested in the lowering flow in a separate PR

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D34469445

fbshipit-source-id: 71a13d7d056f7aaccdd98fb477c8a3a38aecc249
(cherry picked from commit 0b10f0d127515556b677eae3150f026ac8cd9acd)
2022-03-02 10:30:37 +00:00
Vasiliy Kuznetsov
4e90fa6a8c dbr quant: break up test class into multiple classes (#70246)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70246

Breaks up the large `TestQuantizeDBR` test case into
1. `TestQuantizeDBRIndividualOps` for testing functionality of ops
2. `TestQuantizeDBRMultipleOps` for testing non-fusion interactions between ops
3. `TestQuantizeDBR` for everything else

We may need to refactor this more in the future, but this should
unblock things for the near future.

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
python test/test_quantization.py TestQuantizeDBRIndividualOps
python test/test_quantization.py TestQuantizeDBRMultipleOps
```

Reviewed By: jerryzh168

Differential Revision: D33255925

Pulled By: vkuzo

fbshipit-source-id: 82db1a644867e9303453cfedffed2d81d083c9cd
2022-01-05 06:36:41 -08:00
Jerry Zhang
ef6f776e82 [quant][be] Cleanup test cases for eager mode workflow (#69880)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69880

Making the test cases more standardized, in general we would like to have
```
TestQuantizeEager,
TestQuantizeEagerOps,
TestQuantizeEagerModels,
```

but since we currently have separate ptq static, ptq dynamic and qat static apis, we only partially cleaned
up the test cases. We can merge all of them later when we merge all the apis.

Test Plan:
python test/test_quantization.py

Imported from OSS

Reviewed By: supriyar

Differential Revision: D33081418

fbshipit-source-id: fcb96559b76bbc51eb1b0625e0d4b193dbb37532
2021-12-16 17:47:30 -08:00
Jerry Zhang
1940cc028e [quant][graphmode][fx] Fork subgraph_rewriter from torch.fx to quantization (#68228)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68228

Forking this for now so that we can make changes as we need, the changes can be merged back to torch.fx
later

Test Plan:
```
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
```

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D32537713

fbshipit-source-id: 326598d13645fcc28ef2c66baaac6a077b80fd0c
2021-11-24 10:49:05 -08:00
Jerry Zhang
a6d862c50a [quant][graphmode][fx] Add support for weight and bias dtype in backend_config_dict (#68602)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68602

This PR adds support for configuring weight/bias dtype in backend_config_dict
and refactors the current code that checks when to insert observers

Test Plan:
```
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
```

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D32537712

fbshipit-source-id: 28eb7c61a8dcad8c1f3f6622d490a34cff0c59e2
2021-11-19 13:01:50 -08:00
Charles David Hernandez
7ee84ad321 Refactoring quantized op tests to combine test classes (#68282)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68282

Combined 3 Dynamic quantized op test classes into 1

Test Plan:
python test/test_quantization.py TestDynamicQuantizedOps

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D32402163

fbshipit-source-id: 696b7ef5d823632941dc7afc95161501445d0e18
2021-11-15 20:47:02 -08:00
Charles David Hernandez
09615cd0b0 Adding Dynamic Conv and ConvT ops/modules (#68176)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68176

Note that for the modules, reduce_range is set to
true by default, in a similar fashion to linear_dynamic.
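
For context, reduce_range shrinks the quantized range by one bit (0..127 instead of 0..255 for quint8). A minimal sketch of what that means for dynamically computed quantization parameters; the function name and the affine scheme are assumptions for illustration:

```python
def dynamic_qparams(values, reduce_range=True):
    """Compute quint8 scale/zero_point from the tensor's own min and max."""
    qmin, qmax = (0, 127) if reduce_range else (0, 255)
    # Include 0.0 in the range so that zero is exactly representable.
    lo, hi = min(min(values), 0.0), max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin) or 1.0
    zero_point = max(qmin, min(qmax, round(qmin - lo / scale)))
    return scale, zero_point, (qmin, qmax)


activations = [-2.0, 0.0, 6.0]
reduced = dynamic_qparams(activations, reduce_range=True)
full = dynamic_qparams(activations, reduce_range=False)
```

The reduced range gives a coarser scale (roughly twice as large), trading resolution for headroom in backends whose integer kernels can otherwise overflow.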

Test Plan:
python test/test_quantization.py TestDynamicQuantizedModule
python test/test_quantization.py TestDynamicQuantizedConv
python test/test_quantization.py TestQuantizedConv

Imported from OSS

Reviewed By: kimishpatel

Differential Revision: D32374003

fbshipit-source-id: 011562bd0f4d817387d53bb113df2600aa60a7a3
2021-11-15 16:42:25 -08:00
Vasiliy Kuznetsov
4466ba8f30 Working POC of define-by-run quantization (#64676)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64676

We implement a working eager mode quantization flow which uses
tracing and `__torch_function__` and `torch.nn.Module.__call__` overrides to automate the model modifications needed for quantization.  Partial program capture (instead of full program capture) is used, allowing this scheme to target a wide variety of user programs.  Control flow over quantizeable ops is not supported, but general control flow is supported.

In particular:
* `auto_trace.py` contains the machinery to override `__torch_function__` and `torch.nn.Module.__call__` and call hooks before and after each quantizeable module or function
* `quantization_state.py` contains the state needed to use the hooks to implement quantization logic such as adding quants/dequants, observers, etc.
* please see `README.md` for more details
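
The hook mechanism can be sketched without PyTorch: a base class overrides `__call__` and runs hooks before and after `forward`. Everything below is a simplified illustration with made-up names; the real flow also intercepts functions through `__torch_function__`:

```python
class HookedModule:
    """Base class whose __call__ runs hooks around forward (illustrative)."""

    before_hooks = []
    after_hooks = []

    def __call__(self, *args):
        for hook in HookedModule.before_hooks:
            args = hook(self, args)
        output = self.forward(*args)
        for hook in HookedModule.after_hooks:
            output = hook(self, output)
        return output


class Scale(HookedModule):
    def forward(self, x):
        return 0.3 * x


trace = []


def record_input(module, args):
    trace.append((type(module).__name__, "in", args))
    return args


def fake_quant_output(module, output, scale=0.25):
    # Snap the output to a grid, standing in for quant/dequant insertion.
    snapped = round(output / scale) * scale
    trace.append((type(module).__name__, "out", snapped))
    return snapped


HookedModule.before_hooks.append(record_input)
HookedModule.after_hooks.append(fake_quant_output)
result = Scale()(3.0)
```

Because the hooks run as the program executes, ordinary Python control flow around the modules keeps working, which matches the partial program capture described above.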

Test Plan:
```
python test/test_quantization.py TestAutoTracing
python test/test_quantization.py TestAutoTracingModels
```

Differential Revision: D31992281

Reviewed By: HDCharles

Pulled By: vkuzo

fbshipit-source-id: 6d40e855f3c96b9a4b637a0e677388a7b92f7967
2021-11-11 06:25:24 -08:00
Ben Koopman
3aadff651c [quant][embedding qat][bugfix] Fix and test QAT EmbeddingBag from_float error message (#66989)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66989

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D31961773

Pulled By: b-koopman

fbshipit-source-id: 0d28728c87751ffc696ac221c3e8e75ac923cc57
2021-10-28 06:29:20 -07:00
Jerry Zhang
a7bbf8814c [quant][graphmode][fx] Move quant-fx2trt unittests to test_quantize_fx.py (#67064)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67064

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D31849075

fbshipit-source-id: 9c5e8aad7c88070830d853faf3106491726e77ff
2021-10-22 14:36:36 -07:00
Jane Xu
6a224b3370 Set test owners for quantization tests (#66832)
Summary:
Action following https://github.com/pytorch/pytorch/issues/66232

cc jerryzh168 jianyuh raghuramank100 jamesr66a vkuzo

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66832

Reviewed By: saketh-are

Differential Revision: D31842880

Pulled By: janeyx99

fbshipit-source-id: 8aee760e4203045c12e7548a21ed5b71c557e3ee
2021-10-21 16:04:41 -07:00
Jerry Zhang
a89851a0d9 [quant][fx][graphmode] Adding a new convert function that produces reference pattern by default (#66925)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66925

Current convert_fx implementation is using "The Interpreter Pattern" in https://pytorch.org/docs/stable/fx.html
Two things have changed that make the approach in this PR possible and needed:
1). the original convert implementation was developed during the initial prototype, when fx did not allow mutations; fx now
supports mutations
2). the original convert needs to handle a lot of fbgemm/qnnpack specific logic, which is not needed for reference patterns

Therefore it makes sense for us to write a new convert function just for reference patterns, the implementation
is significantly easier to understand than the original convert implementation
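
For readers unfamiliar with the term, a reference pattern expresses a quantized op as dequantize, float op, quantize. A minimal scalar sketch, assuming affine uint8 quantization; the function names are illustrative and unrelated to the convert implementation:

```python
def quantize(x, scale, zero_point):
    return max(0, min(255, round(x / scale) + zero_point))


def dequantize(q, scale, zero_point):
    return (q - zero_point) * scale


def reference_relu(q_in, in_qparams, out_qparams):
    """Reference pattern: dequantize -> float op -> quantize."""
    x = dequantize(q_in, *in_qparams)
    y = max(0.0, x)
    return quantize(y, *out_qparams)


qparams = (0.1, 128)  # (scale, zero_point)
outputs = [reference_relu(q, qparams, qparams) for q in (100, 128, 200)]
```

A backend can later recognize the dequantize, op, quantize sequence and replace it with a fused integer kernel, which is why the reference form carries no fbgemm/qnnpack specific logic.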

Current support:
* we should be able to support all non-weighted ops like relu, add etc.

Missing:
* linear and conv
* some advanced features like standalone modules, input_quantized_idxs etc.

will add linear and conv support and start defining the backend_config_dict based on this version of convert

Test Plan:
python test/test_quantization.py TestQuantizeFxOpsNew

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D31786241

fbshipit-source-id: 2a32156eb6d3c5271cb44906cd863055785fb5d4
2021-10-20 18:54:30 -07:00
Vasiliy Kuznetsov
1d9a6862cd fx quant: add a BC test for loading old torch.package models (#65538)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65538

Adds a test which verifies that `prepare_fx` and `convert_fx` work
on models created by `torch.package` in the past.  In detail:

1. (one time) create a model and save it with torch.package. Also save input,
expected output, and names of quantization related get_attrs added by
our passes.
2. (every time) load the model from (1), and verify that expected output
matches current output, and that get_attr targets did not change.
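
The two-phase structure of such a test can be sketched with the standard library. JSON stands in for `torch.package` here, and the model, file name, and attribute names are all hypothetical:

```python
import json
import os
import tempfile


def model_v1(x):
    return [2 * v + 1 for v in x]


def save_artifact(path, model, example_input, attr_names):
    """(One time) store the input, expected output, and attribute names."""
    with open(path, "w") as f:
        json.dump({"input": example_input,
                   "expected": model(example_input),
                   "attr_names": sorted(attr_names)}, f)


def check_artifact(path, model, attr_names):
    """(Every time) recompute and compare against the stored expectations."""
    with open(path) as f:
        saved = json.load(f)
    return (model(saved["input"]) == saved["expected"]
            and sorted(attr_names) == saved["attr_names"])


with tempfile.TemporaryDirectory() as tmp:
    artifact = os.path.join(tmp, "linear_relu.json")
    save_artifact(artifact, model_v1, [1, 2, 3], {"scale_0", "zero_point_0"})
    still_compatible = check_artifact(artifact, model_v1, {"zero_point_0", "scale_0"})
    renamed_attr_detected = not check_artifact(artifact, model_v1, {"scale_0", "zp_0"})
```

In the real test the artifact is checked into the repository once, so a later change that renames an attribute or alters the numerics fails the "every time" step.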

Test Plan:
```
python test/test_quantization.py TestSerialization.test_linear_relu_package_quantization_transforms
```

Imported from OSS

Reviewed By: supriyar

Differential Revision: D31512939

fbshipit-source-id: 718ad5fb66e09b6b31796ebe0dc698186e9a659f
2021-10-11 08:23:38 -07:00
Jerry Zhang
508845f2b5 [quant] AO migration of the torch/quantization/quantize_fx.py and torch/quantization/fx/* (#65033)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65033

1. Move the file:
```
hg mv caffe2/torch/quantization/fx caffe2/torch/ao/quantization/fx
hg mv caffe2/torch/quantization/quantize_fx.py caffe2/torch/ao/quantization/quantize_fx.py
```
2. Create new files
```
touch caffe2/torch/quantization/quantize_fx.py
touch caffe2/torch/quantization/fx/__init__.py
```
3. Import things in the new files.
4. Add tests to test/quantization/ao_migration/test_quantization_fx.py,
because there are some fx imports in quantize_fx and fx/*.py.
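
Step 3 amounts to leaving a thin re-export module at the old path. A self-contained sketch of that idea, using in-memory modules under a hypothetical `demo_pkg` package and a toy `prepare_fx` function (none of these names are the real PyTorch files):

```python
import importlib
import sys
import types

# Hypothetical module paths standing in for the new and the old location.
NEW = "demo_pkg.ao.quantization.quantize_fx"
OLD = "demo_pkg.quantization.quantize_fx"


def register(name):
    # Create empty in-memory modules for `name` and all of its parents.
    parts = name.split(".")
    for i in range(1, len(parts) + 1):
        sub = ".".join(parts[:i])
        if sub not in sys.modules:
            module = types.ModuleType(sub)
            module.__path__ = []  # mark as a package
            sys.modules[sub] = module
    return sys.modules[name]


def prepare_fx(model):
    # Toy implementation that lives only at the new location.
    return ("prepared", model)


new_module = register(NEW)
new_module.prepare_fx = prepare_fx

# The file left at the old path contains nothing but re-exports.
old_module = register(OLD)
exec(f"from {NEW} import prepare_fx", old_module.__dict__)

old_api = importlib.import_module(OLD)
new_api = importlib.import_module(NEW)
same_object = old_api.prepare_fx is new_api.prepare_fx
```

A migration test can then assert that each name imported through the old path is the same object as the one at the new path.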

Test Plan: buck test mode/dev //caffe2/test:quantization

Reviewed By: vkuzo, z-a-f

Differential Revision: D30949749

fbshipit-source-id: 9e5d4d039c8a0a0820bc9040e224f0d2c26886d3
2021-09-22 09:29:15 -07:00
Zafar Takhirov
425f173f9d [quant][refactor] Change the structure of the ao migration tests (#64912)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64912

The test naming was confusing and ambiguous. The file name was changed to reflect the framework that is being migrated ("quantization" instead of "quantize"). Also, the common testing class was extracted.
ghstack-source-id: 138157450

Test Plan: `buck test mode/dev //caffe2/test:quantization -- TestAOMigrationQuantization`

Reviewed By: vkuzo

Differential Revision: D30898214

fbshipit-source-id: 017f95995271d35bcdf6ff6a1b3974b837543e84
2021-09-15 13:15:43 -07:00
Zafar Takhirov
9cc44aad21 [quant] AO migration of the quantize.py (resubmission) (#64445)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64445

The AO team is migrating the existing torch.quantization into torch.ao.quantization. We are doing it one file at a time to make sure that the internal callsites are updated properly.
This migrates quantize.py from torch.quantization to torch.ao.quantization.
At this point both locations are supported. Eventually torch.quantization will be deprecated.

Test Plan: `buck test mode/dev //caffe2/test:quantization`

Reviewed By: HDCharles

Differential Revision: D30734870

fbshipit-source-id: dc204f3cc46bff2cc81c95159eab9d333b43bb4b
2021-09-08 04:58:47 -07:00
Zafar Takhirov
046ed57a4d Revert D30055886: [quant] AO migration of the quantize.py
Test Plan: revert-hammer

Differential Revision:
D30055886 (44e3ed88c9)

Original commit changeset: 8ef7470f9fa6

fbshipit-source-id: c5bd3ead43a2d44b9e56872ec5bd7a195bdac725
2021-09-02 16:59:59 -07:00
Zafar Takhirov
44e3ed88c9 [quant] AO migration of the quantize.py (#64086)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64086

The AO team is migrating the existing torch.quantization into torch.ao.quantization. We are doing it one file at a time to make sure that the internal callsites are updated properly.

This migrates `quantize.py` from torch.quantization to `torch.ao.quantization`.

At this point both locations are supported. Eventually torch.quantization will be deprecated.

Test Plan: `buck test mode/opt //caffe2/test:quantization`

Reviewed By: jerryzh168, raghuramank100

Differential Revision: D30055886

fbshipit-source-id: 8ef7470f9fa640c0042bef5bb843e7a05ecd0b9f
2021-08-29 20:30:01 -07:00
Shen Li
1022443168 Revert D30279364: [codemod][lint][fbcode/c*] Enable BLACK by default
Test Plan: revert-hammer

Differential Revision:
D30279364 (b004307252)

Original commit changeset: c1ed77dfe43a

fbshipit-source-id: eab50857675c51e0088391af06ec0ecb14e2347e
2021-08-12 11:45:01 -07:00
Zsolt Dollenstein
b004307252 [codemod][lint][fbcode/c*] Enable BLACK by default
Test Plan: manual inspection & sandcastle

Reviewed By: zertosh

Differential Revision: D30279364

fbshipit-source-id: c1ed77dfe43a3bde358f92737cd5535ae5d13c9a
2021-08-12 10:58:35 -07:00
Supriya Rao
b8386f5d72 [quant] Create FusedMovingAvgObsFakeQuantize for QAT (#61691)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61691

Create a new module for QAT that performs a fused MovingAvgMinMaxObserver and FakeQuantize operation.
The module currently supports only per-tensor quantization (affine/symmetric). A follow-up PR will add support for per-channel quantization.
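
To illustrate what "fused" combines here, a pure-Python sketch of a moving-average min/max observer followed by per-tensor affine fake quantization in a single call. This is not the PyTorch kernel; the class name, the averaging constant of 0.01, the 0..255 quantized range, and the use of plain lists are all assumptions made for the example:

```python
class FusedMovingAvgFakeQuant:
    """Toy per-tensor observer + fake-quantize step (illustrative only)."""

    def __init__(self, averaging_constant=0.01, quant_min=0, quant_max=255):
        self.c = averaging_constant
        self.quant_min = quant_min
        self.quant_max = quant_max
        self.min_val = None
        self.max_val = None

    def __call__(self, xs):
        # Observer part: update the running min/max with a moving average.
        cur_min, cur_max = min(xs), max(xs)
        if self.min_val is None:
            self.min_val, self.max_val = cur_min, cur_max
        else:
            self.min_val += self.c * (cur_min - self.min_val)
            self.max_val += self.c * (cur_max - self.max_val)

        # Affine qparams; the represented range is forced to include zero.
        lo = min(self.min_val, 0.0)
        hi = max(self.max_val, 0.0)
        scale = max((hi - lo) / (self.quant_max - self.quant_min), 1e-8)
        zero_point = self.quant_min - round(lo / scale)
        zero_point = max(self.quant_min, min(self.quant_max, zero_point))

        # Fake-quantize part: quantize, clamp, and dequantize each value.
        out = []
        for x in xs:
            q = round(x / scale) + zero_point
            q = max(self.quant_min, min(self.quant_max, q))
            out.append((q - zero_point) * scale)
        return out


fq = FusedMovingAvgFakeQuant()
first = fq([-1.0, 0.0, 2.0])   # initializes the running range to [-1, 2]
second = fq([-3.0, 0.0, 4.0])  # nudges the range toward [-3, 4]
```

Doing both steps in one call is what lets a real implementation avoid the many small intermediate ops that appear in the baseline profile.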

Results on running QAT with MobileNetV2 (Obs enabled/fake_quant enabled)
Original FQ module
PyTorchObserver {"type": "_", "metric": "qnnpack_fp_latency_ms", "unit": "ms", "value": "242.80261993408203"}
PyTorchObserver {"type": "_", "metric": "qnnpack_qat0_latency_ms", "unit": "ms", "value": "505.7964324951172"}
PyTorchObserver {"type": "_", "metric": "fbgemm_fp_latency_ms", "unit": "ms", "value": "235.80145835876465"}
PyTorchObserver {"type": "_", "metric": "fbgemm_qat0_latency_ms", "unit": "ms", "value": "543.8144207000732"}

Fused FakeQuant module (~50% improvement in latency)
PyTorchObserver {"type": "_", "metric": "qnnpack_fp_latency_ms", "unit": "ms", "value": "232.1624755859375"}
PyTorchObserver {"type": "_", "metric": "qnnpack_qat0_latency_ms", "unit": "ms", "value": "263.8866901397705"}
PyTorchObserver {"type": "_", "metric": "fbgemm_fp_latency_ms", "unit": "ms", "value": "236.9832992553711"}
PyTorchObserver {"type": "_", "metric": "fbgemm_qat0_latency_ms", "unit": "ms", "value": "292.1590805053711"}
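
For reference, the relative change in QAT latency can be recomputed from the `qat0` values reported above. The snippet only restates those numbers; the variable names are illustrative:

```python
# QAT latencies in milliseconds, copied from the PyTorchObserver lines above.
baseline_ms = {"qnnpack": 505.7964324951172, "fbgemm": 543.8144207000732}
fused_ms = {"qnnpack": 263.8866901397705, "fbgemm": 292.1590805053711}

# Percentage reduction in latency per backend, relative to the original module.
reduction_pct = {
    backend: 100.0 * (baseline_ms[backend] - fused_ms[backend]) / baseline_ms[backend]
    for backend in baseline_ms
}

for backend, pct in reduction_pct.items():
    print(f"{backend}: {pct:.1f}% lower QAT latency")
```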

Individual module benchmark result (>5x improvement in latency)
===> Baseline FakeQuantize module
```
---------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                               Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls
---------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
              aten::fake_quantize_per_tensor_affine         0.77%       1.210ms         4.92%       7.730ms     154.596us     718.528us         0.45%       9.543ms     190.862us            50
    aten::fake_quantize_per_tensor_affine_cachemask         2.41%       3.792ms         4.15%       6.520ms     130.402us       8.825ms         5.58%       8.825ms     176.492us            50
                                     aten::_aminmax         3.25%       5.105ms         4.43%       6.955ms     139.102us       8.193ms         5.18%       8.193ms     163.868us            50
                                   aten::zeros_like         1.87%       2.939ms         6.95%      10.922ms     109.218us       5.992ms         3.79%      10.844ms     108.442us           100
                                        aten::zeros         0.97%       1.527ms         3.11%       4.885ms      97.702us       2.383ms         1.51%       4.800ms      96.010us            50
                                         aten::rsub         1.34%       2.106ms         2.94%       4.614ms      92.277us       2.063ms         1.30%       4.559ms      91.173us            50
                                        aten::clamp         2.79%       4.381ms         5.42%       8.519ms      85.190us       5.385ms         3.41%       8.438ms      84.381us           100
                                           aten::eq        11.70%      18.384ms        21.31%      33.479ms      83.280us      22.465ms        14.21%      33.310ms      82.861us           402
                                         aten::ones         1.05%       1.656ms         2.57%       4.038ms      80.751us       2.494ms         1.58%       3.951ms      79.028us            50
                                           aten::le         2.52%       3.955ms         4.84%       7.607ms      76.071us       4.998ms         3.16%       7.702ms      77.016us           100
                                          aten::min         0.69%       1.087ms         2.32%       3.641ms      72.827us       1.017ms         0.64%       3.603ms      72.055us            50
                                          aten::max         1.40%       2.195ms         4.62%       7.260ms      72.597us       2.008ms         1.27%       7.140ms      71.404us           100
                                   aten::is_nonzero         2.68%       4.207ms        11.35%      17.829ms      71.033us       4.062ms         2.57%      17.225ms      68.625us           251
                                       aten::detach         1.17%       1.831ms         3.65%       5.736ms      57.360us       1.680ms         1.06%       5.634ms      56.340us           100
                                          aten::mul         3.36%       5.278ms         3.36%       5.278ms      53.862us       5.215ms         3.30%       5.215ms      53.216us            98
                                          aten::div         3.42%       5.376ms         3.42%       5.376ms      53.759us       5.320ms         3.36%       5.320ms      53.196us           100
                                          aten::sub         6.79%      10.672ms         6.79%      10.672ms      53.901us      10.504ms         6.64%      10.504ms      53.050us           198
                                         aten::item         4.06%       6.380ms        12.02%      18.883ms      53.798us       6.127ms         3.87%      18.322ms      52.198us           351
                                          aten::add         3.28%       5.147ms         3.28%       5.147ms      52.518us       5.113ms         3.23%       5.113ms      52.171us            98
                                      aten::minimum         1.63%       2.555ms         1.63%       2.555ms      51.092us       2.585ms         1.64%       2.585ms      51.708us            50
                                      aten::maximum         3.22%       5.065ms         3.22%       5.065ms      50.646us       5.133ms         3.25%       5.133ms      51.329us           100
                                        aten::round         1.61%       2.529ms         1.61%       2.529ms      50.578us       2.528ms         1.60%       2.528ms      50.552us            50
                                        aten::zero_         1.99%       3.125ms         4.72%       7.422ms      49.481us       2.835ms         1.79%       7.269ms      48.462us           150
                                        aten::copy_         6.62%      10.394ms         6.62%      10.394ms      41.576us      10.252ms         6.48%      10.252ms      41.010us           250
                                             detach         2.49%       3.905ms         2.49%       3.905ms      39.049us       3.954ms         2.50%       3.954ms      39.539us           100
                                       aten::select         2.01%       3.154ms         2.47%       3.876ms      38.759us       3.866ms         2.44%       3.866ms      38.658us           100
                          aten::_local_scalar_dense         7.96%      12.503ms         7.96%      12.503ms      35.621us      12.195ms         7.71%      12.195ms      34.743us           351
                                           aten::to         2.31%       3.625ms         4.16%       6.530ms      32.650us       4.320ms         2.73%       6.270ms      31.348us           200
                                        aten::fill_         3.70%       5.808ms         3.70%       5.808ms      29.039us       5.892ms         3.73%       5.892ms      29.459us           200
                                   aten::as_strided         0.79%       1.244ms         0.79%       1.244ms       6.221us       0.000us         0.00%       0.000us       0.000us           200
                                        aten::empty         3.55%       5.579ms         3.55%       5.579ms      11.137us       0.000us         0.00%       0.000us       0.000us           501
                                      aten::resize_         2.36%       3.712ms         2.36%       3.712ms      12.332us       0.000us         0.00%       0.000us       0.000us           301
                                   aten::empty_like         1.45%       2.284ms         3.68%       5.776ms      28.878us       0.000us         0.00%       0.000us       0.000us           200
                                aten::empty_strided         2.80%       4.398ms         2.80%       4.398ms      17.592us       0.000us         0.00%       0.000us       0.000us           250
---------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 157.108ms
Self CUDA time total: 158.122ms
```

===> FusedFakeQuant
```
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                   fb::fused_fake_quant        23.42%       6.408ms       100.00%      27.361ms     547.215us       7.887ms        27.20%      28.996ms     579.925us            50
                  aten::fake_quantize_per_tensor_affine         4.25%       1.162ms        27.65%       7.565ms     151.298us     686.176us         2.37%      10.217ms     204.336us            50
aten::_fake_quantize_per_tensor_affine_cachemask_ten...        14.11%       3.860ms        23.40%       6.403ms     128.068us       9.531ms        32.87%       9.531ms     190.612us            50
                                         aten::_aminmax        20.57%       5.628ms        27.47%       7.515ms     150.305us       8.218ms        28.34%       8.218ms     164.367us            50
                                             aten::item         3.65%     999.522us        10.27%       2.810ms      56.202us     931.904us         3.21%       2.674ms      53.481us            50
                              aten::_local_scalar_dense         6.62%       1.811ms         6.62%       1.811ms      36.212us       1.742ms         6.01%       1.742ms      34.843us            50
                                            aten::empty        10.85%       2.969ms        10.85%       2.969ms      14.843us       0.000us         0.00%       0.000us       0.000us           200
                                       aten::as_strided         1.92%     524.365us         1.92%     524.365us       5.244us       0.000us         0.00%       0.000us       0.000us           100
                                       aten::empty_like         6.48%       1.774ms        14.62%       4.000ms      26.670us       0.000us         0.00%       0.000us       0.000us           150
                                    aten::empty_strided         8.14%       2.226ms         8.14%       2.226ms      14.842us       0.000us         0.00%       0.000us       0.000us           150
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 27.361ms
Self CUDA time total: 28.996ms
```
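
The per-module speedup can be checked by parsing the two summary lines of each profile. A small sketch, with the summary text copied from the tables above and a hypothetical helper name:

```python
import re

baseline_summary = """Self CPU time total: 157.108ms
Self CUDA time total: 158.122ms"""

fused_summary = """Self CPU time total: 27.361ms
Self CUDA time total: 28.996ms"""


def parse_totals(text):
    # Extract the total time in milliseconds for each device.
    pattern = re.compile(r"Self (CPU|CUDA) time total: ([\d.]+)ms")
    return {device: float(ms) for device, ms in pattern.findall(text)}


baseline = parse_totals(baseline_summary)
fused = parse_totals(fused_summary)

# Speedup = baseline time / fused time, per device.
speedup = {device: baseline[device] / fused[device] for device in baseline}
```

Both ratios come out above 5, consistent with the ">5x" figure quoted for the individual module benchmark.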

Test Plan:
python test/test_quantization.py TestFusedObsFakeQuantModule

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D29706889

fbshipit-source-id: ae3f9fb1fc559920459bf6e8663e8299bf7d21e1
2021-07-21 10:13:04 -07:00