Commit Graph

1019 Commits

Author SHA1 Message Date
Vasiliy Kuznetsov
b3a7d696b3 dbr quant overhead[5/x]: remove unnecessary asserts (#68370)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68370

Removes asserts which are duplicate (the same condition is checked
when calculating the hook type, so there is no need to check it again).
For the assert in `validate_is_at_last_seen_idx`, rewrites it to
raise an Error instead to ensure it does not get stripped in
production environments.
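
For illustration, a minimal sketch of the assert-to-raise change (names and message format assumed, not the actual DBR code):

```
# Unlike `assert`, an explicit raise is not stripped when Python runs with -O,
# so the check still fires in production environments.
def validate_is_at_last_seen_idx(cur_idx: int, last_seen_idx: int) -> None:
    if cur_idx != last_seen_idx:
        raise AssertionError(
            f"Cur idx: {cur_idx}, expected idx: {last_seen_idx}")
```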

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```

Reviewed By: jerryzh168

Differential Revision: D32463766

Pulled By: vkuzo

fbshipit-source-id: 8a7b7e0bf270bc327f49bd3e5bd156339e846381
2021-11-21 07:08:09 -08:00
Vasiliy Kuznetsov
16a6e0612d dbr quant: clean up key types in AutoQuantizationState mappings (#68369)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68369

`AutoQuantizationState` has various mappings keyed on IDs. Only
`tensor_id_to_observer` actually needs string keys because it is an
`torch.nn.ModuleDict`.  This PR changes the other mappings to have
integer keys, for simplicity and performance.
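
A hedged sketch of the key-type distinction behind this change (the observer type is chosen only for illustration):

```
import torch
from torch.ao.quantization.observer import MinMaxObserver

# torch.nn.ModuleDict requires string keys, so the observer mapping keeps them:
tensor_id_to_observer = torch.nn.ModuleDict({"0": MinMaxObserver()})

# Plain Python dicts have no such restriction, so the other mappings can use
# integer tensor IDs directly and skip int -> str conversions on each access:
tensor_id_to_scale_zp = {0: (torch.tensor(1.0), torch.tensor(0))}
```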

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```

Reviewed By: jerryzh168

Differential Revision: D32463765

Pulled By: vkuzo

fbshipit-source-id: 5a9bf2a1102859097eedf1e536761084cd408856
2021-11-21 07:08:06 -08:00
Vasiliy Kuznetsov
3fc9bc43c6 dbr quant overhead[4/x]: speed up hook type calculations (#68351)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68351

Speeds up `get_module_hook_type` and `get_torch_function_hook_type` by
bypassing the expensive `torch.nn.Module` getters and setters and
fetching `_auto_quant_state` directly.
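
A hedged sketch of the kind of bypass described above (attribute layout assumed; `nn.Module` stores child modules in `_modules`):

```
import torch

# nn.Module.__getattr__ walks _parameters, _buffers and _modules with extra
# checks on every access; reading the child module straight out of __dict__
# avoids that machinery on the hot path (illustrative, not the PR's exact code).
def get_auto_quant_state(module: torch.nn.Module):
    return module.__dict__["_modules"].get("_auto_quant_state", None)
```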

Test Plan:
Model level benchmarking is noisy.  Individual `cProfile` results:

```
// MobileNetV2, 1x3x224x224 input, % of time spent by function during DBR convert

// before
get_module_hook_type - 5.96%
get_torch_function_hook_type - 2.24%

// after
get_module_hook_type - 2.10%
get_torch_function_hook_type - 0.57%
```

Reviewed By: jerryzh168

Differential Revision: D32463756

Pulled By: vkuzo

fbshipit-source-id: 6eb199052ddf8d78f1c123a427e7437fc7c4fe58
2021-11-21 07:08:03 -08:00
Vasiliy Kuznetsov
c72ffee497 dbr quant overhead[3/x]: speed up AutoQuantizationState.mark_cur_op_complete (#68350)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68350

`torch.nn.Module` has overhead for getting and setting attributes because
it does various type checks on the attribute.

This PR explicitly gets and sets the right thing for this particular
function, avoiding the type checks. Model level benchmarks are too noisy,
function, avoiding the type checks. Model level benchmarks are too noisy,
but according to function level profiling this reduces the time spent in
this function in a quantized model from 2.60% to 0.53%, on MobileNetV2 with
input size 1x3x224x224.
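
A hedged sketch of the same idea on the write path (the counter name is an assumption):

```
# nn.Module.__setattr__ checks every assigned value against Parameter, Module
# and Tensor on each assignment; writing a plain integer counter through
# __dict__ skips those checks entirely (illustrative only).
def mark_cur_op_complete(qstate) -> None:
    qstate.__dict__["idx"] = qstate.__dict__["idx"] + 1
```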

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```

Reviewed By: albanD

Differential Revision: D32463751

Pulled By: vkuzo

fbshipit-source-id: a29beed2a2b87ca4df675a30dd591f797c8a1dbe
2021-11-21 07:06:42 -08:00
Vasiliy Kuznetsov
c7ecf1498d dbr quant overhead[2/x]: precalculate op_convert_info (#68347)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68347

Moves `op_convert_info` to be precalculated in the convert step
instead of calculated dynamically.  This should help with framework
overhead.

Test Plan:
Noisy benchmark:

```
// before

fp32: 0.016103 seconds avg
fx_prepared: 0.019841 seconds avg, 0.811601 speedup vs fp32
fx_quantized: 0.011907 seconds avg, 1.352346 speedup vs fp32
dt_prepared: 0.035055 seconds avg, 0.459357 speedup vs fp32
dt_quantized: 0.018891 seconds avg, 0.852417 speedup vs fp32

// after

fp32: 0.020535 seconds avg
fx_prepared: 0.023071 seconds avg, 0.890070 speedup vs fp32
fx_quantized: 0.011693 seconds avg, 1.756206 speedup vs fp32
dt_prepared: 0.038691 seconds avg, 0.530734 speedup vs fp32
dt_quantized: 0.021109 seconds avg, 0.972793 speedup vs fp32
```

The benchmark is too noisy to rely on, but according to `cProfile`
this removes about 5% of overhead.

Reviewed By: jerryzh168

Differential Revision: D32463761

Pulled By: vkuzo

fbshipit-source-id: e2ad0d7eeff7dbadf3aa379604bfe9bec0c228fe
2021-11-20 15:17:12 -08:00
Vasiliy Kuznetsov
9fba8971a7 dbr quant: move model level utils into own file (#68346)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68346

Some utility functions for DBR quant need to be aware
of `AutoQuantizationState`.  This PR moves them into their own file, so they
can use the type directly without circular imports, and removes the mypy
ignores which are no longer necessary after this change.

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```

Reviewed By: jerryzh168

Differential Revision: D32463763

Pulled By: vkuzo

fbshipit-source-id: e2c367de0d5887c61e6d2c3a73d82f7d76af3de1
2021-11-20 15:17:10 -08:00
Vasiliy Kuznetsov
629f9a5532 dbr quant: clean up AutoQuantizationState.get_op_convert_info flag (#68345)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68345

Removes a flag to unwrap scale and zp which was only needed by
the FX rewriter. Moves the logic to happen in the FX tracer instead.
This resolves a technical debt TODO.

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```

Reviewed By: jerryzh168

Differential Revision: D32463764

Pulled By: vkuzo

fbshipit-source-id: ba7c976664c95111174fb65488bdac62b4f4984d
2021-11-20 15:17:07 -08:00
Vasiliy Kuznetsov
52cc9cb0ee dbr quant: refactor AutoQuantizationState._get_packed_param_name (#68344)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68344

Makes `AutoQuantizationState._get_packed_param_name` use `seen_op_info`
instead of the current op. This will make future performance improvements
easier.

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```

Reviewed By: albanD

Differential Revision: D32463758

Pulled By: vkuzo

fbshipit-source-id: 0c16fe4bc989cb66180ad674ec55060cd970e32e
2021-11-20 15:17:04 -08:00
Vasiliy Kuznetsov
2755cf457c dbr quant: refactor AutoQuantizationState._get_input_args_quant_dequant_info (#68343)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68343

Refactors `AutoQuantizationState._get_input_args_quant_dequant_info` to
use less internal state, makes the function have no side effects by passing
the state in the arguments, and moves the function to utils file.

This will help with a future refactor to cache this info at runtime.

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```

Reviewed By: jerryzh168

Differential Revision: D32463760

Pulled By: vkuzo

fbshipit-source-id: bdd50b0772f128755f9b734b5eeb0a9f4bc4970b
2021-11-20 15:17:02 -08:00
Vasiliy Kuznetsov
57472ec414 dbr quant: refactor get_quantized_op to only use seen_op_info (#68342)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68342

Before this PR, `get_quantized_op` required the current callable.

After this PR, `get_quantized_op` only requires `seen_op_info`.
The signature was changed slightly to return `None` if the original
callable does not need replacement for quantization.

This will make it easier to make performance improvements in a
future PR.
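
A rough sketch of the changed signature (the lookup table and the `.type` field access are hypothetical stand-ins for illustration, not the real implementation):

```
from typing import Any, Callable, Dict, Optional

import torch
import torch.nn.functional as F

# Hypothetical replacement table for illustration only.
_quantized_counterparts: Dict[Callable, Callable] = {
    F.conv2d: torch.ops.quantized.conv2d,
}

def get_quantized_op(seen_op_info: Any) -> Optional[Callable]:
    # Returns None when the original callable needs no quantized replacement.
    return _quantized_counterparts.get(seen_op_info.type, None)
```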

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```

Reviewed By: jerryzh168

Differential Revision: D32463768

Pulled By: vkuzo

fbshipit-source-id: 5db2c4199f6c0529817f4c058f81fd1d32b9fa9f
2021-11-20 15:16:59 -08:00
Vasiliy Kuznetsov
9cf4779ec9 dbr quant: refactor get_func_output_obs_type to only use seen_op_info (#68341)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68341

Before this PR, `get_func_output_obs_type` used information from the
incoming op and its arguments, which makes it hard to cache.

This PR refactors `get_func_output_obs_type` to only use information
collected during tracing. This will make it easier to make performance
improvements in a future PR.

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```

Reviewed By: jerryzh168

Differential Revision: D32463755

Pulled By: vkuzo

fbshipit-source-id: 25a220de652f0285685d43aedf7392082104b26c
2021-11-20 15:16:56 -08:00
Vasiliy Kuznetsov
f8b084c563 dbr quant overhead[1/x]: remove expensive calls to named_modules (#68309)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68309

This is the first of a series of PRs to reduce overhead of DBR quantization
prototype. For now, the measurement of this work is not super scientific as
there are a lot of low hanging fruit.  As we speed up the prototype, we
might need to invest in better benchmarking.

Current benchmarking setup:
* mac OS laptop with OMP_NUM_THREADS=1
* torchvision's mobilenet_v2
* input size 1x3x224x224
* we measure fp32 forward, prepared and quantized forward with FX quant vs DBR quant

Note that due to small input size, this benchmark is pretty noisy.
The goal here is to measure overhead of DBR quant logic (not the kernels),
so small input is good as we want the kernels to take as little % of overall
time as possible.

High level goal is for DBR quant convert forward to approach the FX time.

This first PR removes the expensive named_modules calls and resets the op
counter in the op instead. According to cProfile, this should be a 2 to 3 percent win.

Test Plan:
```
benchmark: https://gist.github.com/vkuzo/1a4f98ca541161704ee3c305d7740d4a

// before

fp32: 0.020101 seconds avg
fx_prepared: 0.020915 seconds avg, 0.961083 speedup vs fp32
fx_quantized: 0.012037 seconds avg, 1.670005 speedup vs fp32
dt_prepared: 0.037506 seconds avg, 0.535953 speedup vs fp32
dt_quantized: 0.022688 seconds avg, 0.885988 speedup vs fp32

// after

fp32: 0.020722 seconds avg
fx_prepared: 0.023417 seconds avg, 0.884893 speedup vs fp32
fx_quantized: 0.014834 seconds avg, 1.396942 speedup vs fp32
dt_prepared: 0.039120 seconds avg, 0.529700 speedup vs fp32
dt_quantized: 0.020063 seconds avg, 1.032831 speedup vs fp32
```

Reviewed By: albanD

Differential Revision: D32463753

Pulled By: vkuzo

fbshipit-source-id: 1d7de7d9c4837e2b0ec815f0f67014c7600bb16c
2021-11-20 15:16:53 -08:00
Vasiliy Kuznetsov
ed6ef0eec4 dbr quantization: inline scale and zp (#68251)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68251

Before this PR, DBR quantization used to recalculate scale and zero_point
in the converted model every time it was needed, which is slow.
This PR creates a pass during the convert function to go through every
observer in the model and cache its scale and zero_point.

Note: restricting this to only the observers which correspond to int8
operations is left for a future PR.
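
A minimal sketch of such a caching pass (not the PR's actual implementation; it assumes every observer exposes calculate_qparams, which the standard observer classes do):

```
import torch

def cache_qparams(model: torch.nn.Module):
    # Walk every observer once at convert time and record its quantization
    # parameters, instead of recomputing scale/zero_point on every forward.
    cached = {}
    for name, mod in model.named_modules():
        if hasattr(mod, "calculate_qparams"):
            scale, zero_point = mod.calculate_qparams()
            cached[name] = (scale, zero_point)
    return cached
```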

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```

Reviewed By: VitalyFedyunin

Differential Revision: D32463769

Pulled By: vkuzo

fbshipit-source-id: d1d2e598e2bccc1958e5023096b451d69dc34e29
2021-11-20 15:16:51 -08:00
Vasiliy Kuznetsov
ca499567d2 barebones numeric suite for quantization with dynamic tracing (#67776)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67776

This adds a barebones `add_loggers` and `extract_logger_info` API
to analyze intermediate activations of models using quantization
with dynamic tracing.  The API generally matches the NS for FX tool,
with some omissions.  For now, this is moving fast to help us
debug real models; the API will be fully aligned in future PRs before
this is marketed to users.

Note: the current approach couples Numeric Suite with the quantization
logic. This is not the best for composability, and may be changed
at a future time.

Test Plan:
```
python test/test_quantization.py TestAutoTracing.test_numeric_suite
```

Differential Revision: D32231332

Reviewed By: jerryzh168

Pulled By: vkuzo

fbshipit-source-id: 8adfb50cd8b7836c391669afe2e2ff6acae6d40a
2021-11-20 15:15:48 -08:00
Jerry Zhang
a545a409f8 [quant][graphmode][fx] Support input_quantized_idxs and output_quantized_idxs in the new convert (#68042)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68042

As titled.

Also added test cases from TestQuantizeFx which test all combinations of {fp32, int8} input and output overrides.

Test Plan:
```
python test/fx2trt/test_quant_trt.py TestConvertFxDoNotUse
```

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D32271511

fbshipit-source-id: 87ffc00069aaff7d1c455cdd97fac82b11aa4527
2021-11-19 15:12:54 -08:00
Jerry Zhang
875ba3dddb [quant][trt] Add support for torch.addmm in TensorRT (#67537)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67537

This PR adds support for quantizing torch.addmm to produce a reference quantized pattern,
and also adds support in the backend_config_dict API for specifying which input index corresponds to the input, weight and bias of each op:

```
    addmm_config = {
        "pattern": torch.addmm,
        "observation_type": ObservationType.OUTPUT_USE_DIFFERENT_OBSERVER_AS_INPUT,
        "dtype_configs": [
            weighted_op_qint8_dtype_config,
        ],
        # a map from input type to input index
        "input_type_to_index": {
            "bias": 0,
            "input": 1,
            "weight": 2,
        }
    }
```
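
For reference, the positional layout of torch.addmm that the input_type_to_index map above encodes (bias first, then activation, then weight):

```
import torch

# torch.addmm(input, mat1, mat2) computes input + mat1 @ mat2, so a linear
# layer written as addmm puts the bias at positional index 0, the activation
# at index 1, and the (transposed) weight at index 2.
bias = torch.zeros(4)
x = torch.randn(2, 3)
weight = torch.randn(4, 3)
out = torch.addmm(bias, x, weight.t())   # shape (2, 4)
```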

This requires some changes to how weight_dtype and bias_dtype are determined in the type inference stage of prepare, which is added in the previous PR in the stack.

Test Plan:
```
python test/fx2trt/test_quant_trt.py TestQuantizeFxTRT.test_addmm
```

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D32014998

fbshipit-source-id: 8d96c1e8b7ebb2ab385c08a5b1e43f2d5a2cbcbe
2021-11-19 13:19:28 -08:00
Jerry Zhang
a6d862c50a [quant][graphmode][fx] Add support for weight and bias dtype in backend_config_dict (#68602)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68602

This PR adds support for configuring weight/bias dtype in backend_config_dict
and refactor the current code that checks when to insert observers

Test Plan:
```
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
```

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D32537712

fbshipit-source-id: 28eb7c61a8dcad8c1f3f6622d490a34cff0c59e2
2021-11-19 13:01:50 -08:00
Ben Koopman
6c9cf5e6ea [quant][embedding qat] eager mode QAT for Embeddings (#66429)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66429

Test Plan: Imported from OSS

Reviewed By: HDCharles, supriyar

Differential Revision: D31618284

Pulled By: b-koopman

fbshipit-source-id: 0c0e2e86b98da9f29e9b2fc2a35c59424f94cbba
2021-11-18 05:57:11 -08:00
Jerry Zhang
2f37a39a5c [quant][graphmode][fx] Refactor node_name_to_target_dtype to make it more clear (#68317)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68317

We use node_name_to_target_dtype to store the target dtype of the output activation for each node, computed from the node's qconfig.
There are two problems with node_name_to_target_dtype that make it hard to work with:
1. we mutate node_name_to_target_dtype when we insert observers, which is confusing because it is unexpected
to change a data structure that stores the "target" dtype
2. it currently only stores the target dtype of output activations, while we also need target dtypes for input activation, weight and bias

This PR fixes both problems by removing mutation from node_name_to_target_dtype and expanding the per-node target dtype to include
the missing target dtypes for input activation, weight and bias. A future refactor will simplify the observation of weight and bias
dtypes.

Please see comments for the updated structure of node_name_to_target_dtype

TODO: we may want to rename node_name_to_target_dtype to node_name_to_target_dtype_info in a separate PR.
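
An illustrative shape for the expanded structure (the key names here are assumptions for illustration, not the PR's exact ones; see the code comments for the real layout):

```
import torch

node_name_to_target_dtype = {
    "linear_1": {
        "input_activation_dtype": torch.quint8,
        "weight_dtype": torch.qint8,
        "bias_dtype": torch.float,
        "output_activation_dtype": torch.quint8,
    },
}
```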

Test Plan:
```
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
```

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D32411858

fbshipit-source-id: 3d76dd65056920ff8642899517bc1b95d43fc1de
2021-11-17 11:21:25 -08:00
Charles David Hernandez
09615cd0b0 Adding Dynamic Conv and ConvT ops/modules (#68176)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68176

Note that for the modules, reduce_range is set to True by default,
in a similar fashion to linear_dynamic.

Test Plan:
python test/test_quantization.py TestDynamicQuantizedModule
python test/test_quantization.py TestDynamicQuantizedConv
python test/test_quantization.py TestQuantizedConv

Imported from OSS

Reviewed By: kimishpatel

Differential Revision: D32374003

fbshipit-source-id: 011562bd0f4d817387d53bb113df2600aa60a7a3
2021-11-15 16:42:25 -08:00
Ben Koopman
f6e45102d2 [quant][embedding qat] Support non-partial functions in qconfig comparison (#68067)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68067

Embedding QAT uses a NoopObserver class for activation,
and a FakeQuant for weight, make sure that qconfig comparison
functions properly for a mix of partial function and class in
qconfig.

Test Plan:
`pytest test/quantization/eager/test_quantize_eager_qat.py  -v -k "test_embedding_qat_qconfig_equal"`

Imported from OSS

Reviewed By: HDCharles

Differential Revision: D32318434

fbshipit-source-id: c036eef9cbabe7c247745930501328e9c75a8cb0
2021-11-12 12:48:00 -08:00
Charles David Hernandez
e795315c63 Changes and fixes to prepare for dynamic conv (#68175)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68175

This slightly alters the way from_float works so it will work
with placeholder observers. It also fixes a bug with ConvTranspose3d and
ConvTranspose1d where parameters like kernel_size, stride, etc.
weren't set properly. New tests were added to check for this type of
issue as well.

Test Plan:
python test/test_quantization.py TestQuantizedOps
python test/test_quantization.py TestStaticQuantizedModule

Imported from OSS

Reviewed By: z-a-f

Differential Revision: D32374004

fbshipit-source-id: caaa548d12d433d9c1fa0abc8597a7d31bb4e8af
2021-11-11 23:55:04 -08:00
Vasiliy Kuznetsov
4466ba8f30 Working POC of define-by-run quantization (#64676)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64676

We implement a working eager mode quantization flow which uses
tracing and `__torch_function__` and `torch.nn.Module.__call__` overrides to automate the model modifications needed for quantization.  Partial program capture (instead of full program capture) is used, allowing this scheme to target a wide variety of user programs.  Control flow over quantizeable ops is not supported, but general control flow is supported.

In particular:
* `auto_trace.py` contains the machinery to override `__torch_function__` and `torch.nn.Module.__call__` and call hooks before and after each quantizeable module or function
* `quantization_state.py` contains the state needed to use the hooks to implement quantization logic such as adding quants/dequants, observers, etc.
* please see `README.md` for more details

Test Plan:
```
python test/test_quantization.py TestAutoTracing
python test/test_quantization.py TestAutoTracingModels
```

Differential Revision: D31992281

Reviewed By: HDCharles

Pulled By: vkuzo

fbshipit-source-id: 6d40e855f3c96b9a4b637a0e677388a7b92f7967
2021-11-11 06:25:24 -08:00
andrewor
4a8f27445d [Quant] Add dynamic QAT Linear module (#67325)
Summary:
**Summary:** This commit adds the `torch.nn.qat.dynamic.modules.Linear`
module, the dynamic counterpart to `torch.nn.qat.modules.Linear`.
Functionally these are very similar, except the dynamic version
expects a memoryless observer and is converted into a dynamically
quantized module before inference.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67325

Test Plan:
`python3 test/test_quantization.py TestQuantizationAwareTraining.test_dynamic_qat_linear`

**Reviewers:** Charles David Hernandez, Jerry Zhang

**Subscribers:** Charles David Hernandez, Supriya Rao, Yining Lu

**Tasks:** 99696812

**Tags:** pytorch

Reviewed By: malfet, jerryzh168

Differential Revision: D32178739

Pulled By: andrewor14

fbshipit-source-id: 5051bdd7e06071a011e4e7d9cc7769db8d38fd73
2021-11-08 10:24:25 -08:00
Jerry Zhang
10411e3561 [quan][fusion] Fix a additional_fuser_method method for fuse_fx (#67876)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67876

Previously we missed it when calling obj.convert, so this argument did not affect the fusion.
This PR fixes that and adds a test for it.

Test Plan:
python test/test_quantization.py TestFuseFx

Imported from OSS

Reviewed By: malfet

Differential Revision: D32191364

fbshipit-source-id: 566bd39461010d70a21de71f611bb929976fe01d
2021-11-05 14:51:15 -07:00
Charles David Hernandez
f455030931 Adding a docstring for memoryless in observer args (#67690)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67690

see title [skip ci]

Test Plan:
python setup.py develop

Imported from OSS

Reviewed By: ejguan

Differential Revision: D32107512

fbshipit-source-id: da5668339716d44720672f7b71a991b23530461e
2021-11-03 12:46:44 -07:00
Jerry Zhang
54241a9cfa [quant][fx] Add support for fused modules in _convert_do_not_use (#67245)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67245

Add support for fused modules in the new convert path, including linear-relu, conv{1-3}d-relu and their qat versions,
also tested with trt (conv2d-relu and linear-relu)

Test Plan:
```
python test/fx2trt/test_quantize_fx.py TestQuantizeFxTRTOps.test_linear_relu_module
python test/fx2trt/test_quantize_fx.py TestQuantizeFxTRTOps.test_conv_relu_module
```

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D31919724

fbshipit-source-id: 7e5c96eba30706f7989da680aa3443159847bdfd
2021-11-02 19:21:54 -07:00
Jerry Zhang
acdc754918 [quant][graphmode][fx] Add support for ObservationType.OUTPUT_SHARE_OBSERVE_WITH_INPUT in backend_config_dict (#67210)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67210

`OUTPUT_SHARE_OBSERVE_WITH_INPUT` is an observation type for operators whose output shares the same observer/fake_quant instance
as the input. When quantized, these ops take a quantized Tensor as input and output a quantized Tensor with the same quantization parameters (scale/zero_point, etc.) as the input.
This PR uses cat as an example. Other ops can be added gradually later (together with tests).
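
A hedged sketch of what a cat entry could look like, mirroring the addmm config shown earlier in this log (the enum below is a placeholder standing in for the real ObservationType, and dtype_configs are omitted):

```
import enum
import torch

# Placeholder enum for illustration only; the real one lives in observation_type.py.
class ObservationType(enum.Enum):
    OUTPUT_SHARE_OBSERVE_WITH_INPUT = 0

cat_config = {
    "pattern": torch.cat,
    "observation_type": ObservationType.OUTPUT_SHARE_OBSERVE_WITH_INPUT,
    # dtype_configs omitted here; see the addmm example earlier in this log.
}
```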

Test Plan:
python test/fx2trt/test_quantize_fx.py TestQuantizeFxTRTOps.test_cat

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D31907243

fbshipit-source-id: 2c7af4a456deb5e6597b0b9cd4e32c5fcdec580b
2021-10-29 02:37:48 -07:00
Onyiee
eea20bfa15 fixed type checking errors in fuse.py (#66799)
Summary:
Fixes [Issue#70](https://github.com/MLH-Fellowship/pyre-check/issues/70)
This PR fixes the type checking error that was found in fuse.py as follows:

torch/quantization/fx/fuse.py:34:13 Incompatible variable type [9]: fuse_custom_config_dict is declared to have type `Dict[str, typing.Any]` but is used as type `None`.

Signed-off-by: Onyemowo Agbo

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66799

Reviewed By: 0xedward

Differential Revision: D31961462

Pulled By: onionymous

fbshipit-source-id: 7481afc07152ba13f3224e4ad198fd8e2c34c880
2021-10-28 07:45:28 -07:00
Jerry Zhang
0117ada47c [quant][graphmode][fx] Add input_idx_to_dtype and ouptut_idx_to_dtype to backend_config_dict (#67067)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67067

We plan to gradually add features to backend_config_dict; this PR adds support
for specifying the dtype of the input and output of a given pattern.

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D31849074

fbshipit-source-id: ca2fbb873176fe72e08ea79ed1bc659bf27cbd8a
2021-10-27 22:10:12 -07:00
Supriya Rao
1cfdb6f4c6 [quant][fx] add pass to duplicate dequant nodes with multi use (#67118)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67118

Fixes a bug in the reference pattern support for nn.Linear when the same quantized input is shared across multiple Linear nodes.

This PR adds a pass to duplicate the dequant nodes for each use, so that a case like
```
x -> quant -> dequant -> linear1 -> quant1
                   \
                    -> linear2 -> quant2
```
becomes
```
x -> quant -> dequant1 -> linear1 -> quant1
         \
          -> dequant2 -> linear2 -> quant2
```
so that we can match each (dequant -> linear -> quant) pattern in the lowering step.

We also add a pass to remove extra/duplicate dequant nodes that may be left over from the above pass when a pattern is not lowered based on a pattern match.
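
A hedged FX sketch of the duplication idea (not the PR's actual pass): give every user of a dequantize node its own copy so each chain can be matched independently.

```
import torch.fx as fx

def duplicate_dequant_nodes(gm: fx.GraphModule) -> fx.GraphModule:
    for node in list(gm.graph.nodes):
        if node.op == "call_method" and node.target == "dequantize":
            users = list(node.users)
            # Keep the first user on the original node, copy for the rest.
            for user in users[1:]:
                with gm.graph.inserting_before(user):
                    new_node = gm.graph.node_copy(node)
                user.replace_input_with(node, new_node)
    gm.recompile()
    return gm
```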

Test Plan:
python test/test_quantization.py test_ref_pattern_multi_use

Imported from OSS

Reviewed By: mrshenli

Differential Revision: D31873511

fbshipit-source-id: aea0819222f084635157426743a50e065e6503c3
2021-10-27 18:25:35 -07:00
Jerry Zhang
4ac8d06911 [quant] Remove unused print in quantization_patterns.py (#67191)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67191

Test Plan:
sandcastle and ossci

Imported from OSS

Reviewed By: supriyar

Differential Revision: D31899784

fbshipit-source-id: 31ad63c0b2a5328fff80c38dc4e527e0399e802e
2021-10-25 15:07:18 -07:00
Jerry Zhang
adc21f1966 [quant] Fix docs build (#67169)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67169

Looks like the doc error only appears after it's landed

Test Plan: Imported from OSS

Reviewed By: seemethere

Differential Revision: D31890431

fbshipit-source-id: d40cba082712c4b35704ea15d82fbc4749f85aec
2021-10-25 08:02:26 -07:00
Jerry Zhang
364c4959c3 [quant] Fix docs error in convert_fx (#67152)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67152

Test Plan:
```
cd docs
make html
```

Imported from OSS

Reviewed By: supriyar

Differential Revision: D31884570

fbshipit-source-id: 2b521f617c93f6fa08da3387df2d25497293eee6
2021-10-24 19:26:45 -07:00
Jerry Zhang
313939c9c6 [quant] Fix lint errors (#67138)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67138

Test Plan:
ossci

Imported from OSS

Reviewed By: supriyar

Differential Revision: D31879558

fbshipit-source-id: 271905d3d254c906aa78bae9f2bd411f9d57e1e8
2021-10-23 11:26:25 -07:00
Jerry Zhang
2d81d5ab0a [quant][graphmode][fx] Remove fbgemm_backend_config_dict for now (#67066)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67066

We'll add it later when the api is ready

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D31849079

fbshipit-source-id: 0c00d08510166b2d897cf1562c7276527319b05c
2021-10-22 21:57:56 -07:00
Supriya Rao
8460fa5707 [quant][fx] Add an option in convert_fx to accept qconfig_dict to skip quantization (#66878)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66878

Currently convert_fx quantizes all layers that were prepared, based on the prepare qconfig_dict.
This PR adds support for passing a variation of qconfig_dict to convert_fx to specify which layers to skip quantizing.

This makes it possible to prepare/observe all operators once and then quantize only a subset of them (e.g. based on quantization error), avoiding multiple prepare passes.

The qconfig_dict passed to convert_fx may only have values set to `None`, with the same keys that are allowed in the prepare qconfig_dict.
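
A hedged usage sketch based on the description above (the argument name, key format, and module path are assumptions; `prepared_model` is the output of prepare_fx with the full prepare qconfig_dict):

```
from torch.ao.quantization.quantize_fx import convert_fx

convert_qconfig_dict = {
    "module_name": [
        ("sub.linear2", None),   # None means: skip quantizing this layer at convert time
    ],
}
quantized_model = convert_fx(prepared_model, qconfig_dict=convert_qconfig_dict)
```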

Test Plan:
python test/test_quantization.py TestQuantizeFx.test_convert_qconfig_dict

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D31808247

fbshipit-source-id: a4f5dca1090f0083fc3fea14aff56924033eb24f
2021-10-22 21:18:15 -07:00
Supriya Rao
d13829e6be [quant][[fx] update observer_fqn to not depend on node.name (#66767)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66767

Make observer fqn in prepare step independent of input_node/observed_node name.
This change names the observers as `{input/output}_activation_post_process_{idx}` where idx will be incremented for each new observer instance and is guaranteed to be unique.

Test Plan:
python test/test_quantization.py test_observer_fqn

Imported from OSS

Reviewed By: anjali411

Differential Revision: D31752052

fbshipit-source-id: e0995b1ef33a99d5b012133fe92d303d55a73b7d
2021-10-22 21:16:24 -07:00
Jerry Zhang
a7bbf8814c [quant][graphmode][fx] Move quant-fx2trt unittests to test_quantize_fx.py (#67064)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67064

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D31849075

fbshipit-source-id: 9c5e8aad7c88070830d853faf3106491726e77ff
2021-10-22 14:36:36 -07:00
Jerry Zhang
e8742f15cf [quant][graphmode][fx] Add observation_type.py (#67063)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67063

Adding ObservationType Enum for `backend_config_dict`

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D31849078

fbshipit-source-id: e9e7225d564b51fa9454f7f087dd134152c069a0
2021-10-22 12:17:54 -07:00
Jerry Zhang
8ea985f240 [quant][fx][graphmode] Rename files and functions for convert and add do_not_use suffix (#66955)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66955

The new convert function is not meant to be used by users; it is a temporary function that
we use to build up the new convert path. We will bring it to feature parity with the old path
and deprecate the old path after that.

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D31810488

fbshipit-source-id: 2f65a110506683123350e619c48df090a15570fc
2021-10-21 22:17:28 -07:00
Jerry Zhang
f8f04d5424 [quant][graphmode][fx] Add support for single linear and conv2d (#66950)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66950

Just to show that this works for weighted operations as well; QAT/fused ops are not supported yet.
We can start developing the backend_config_dict and work towards more complete support afterwards.

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D31801782

fbshipit-source-id: 8491bab7939a7a1c23ffa87c351844b82e390027
2021-10-20 19:13:27 -07:00
Jerry Zhang
a89851a0d9 [quant][fx][graphmode] Adding a new convert function that produces reference pattern by default (#66925)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66925

The current convert_fx implementation uses "The Interpreter Pattern" from https://pytorch.org/docs/stable/fx.html.
Two things have changed which make the approach in this PR possible and needed:
1) the original convert implementation was developed for the initial prototype, when FX did not allow mutations; FX now
supports mutations
2) the original convert needs to handle a lot of fbgemm/qnnpack-specific logic, which is not needed for reference patterns

Therefore it makes sense to write a new convert function just for reference patterns; the implementation
is significantly easier to understand than the original convert implementation.

Current support:
* we should be able to support all non-weighted ops like relu, add etc.

Missing:
* linear and conv
* some advanced features like standalone modules, input_quantized_idxs etc.

will add linear and conv support and start defining the backend_config_dict based on this version of convert

Test Plan:
python test/test_quantization.py TestQuantizeFxOpsNew

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D31786241

fbshipit-source-id: 2a32156eb6d3c5271cb44906cd863055785fb5d4
2021-10-20 18:54:30 -07:00
Eshika Shah
17f07c310b Fix type checking errors in torch/ao/quantization/quantize_fx.py (#66804)
Summary:
- [x] Fix the Pyre type checking errors in `torch/ao/quantization/quantize_fx.py`
```
torch/quantization/quantize_fx.py:41:8 Incompatible variable type [9]: fuse_custom_config_dict is declared to have type `Dict[str, typing.Any]` but is used as type `None`.
torch/quantization/quantize_fx.py:143:16 Incompatible variable type [9]: prepare_custom_config_dict is declared to have type `Dict[str, typing.Any]` but is used as type `None`.
torch/quantization/quantize_fx.py:144:16 Incompatible variable type [9]: equalization_qconfig_dict is declared to have type `Dict[str, typing.Any]` but is used as type `None`.
torch/quantization/quantize_fx.py:206:8 Incompatible variable type [9]: prepare_custom_config_dict is declared to have type `Dict[str, typing.Any]` but is used as type `None`.
torch/quantization/quantize_fx.py:230:12 Incompatible variable type [9]: fuse_custom_config_dict is declared to have type `Dict[str, typing.Any]` but is used as type `None`.
torch/quantization/quantize_fx.py:268:8 Incompatible variable type [9]: prepare_custom_config_dict is declared to have type `Dict[str, typing.Any]` but is used as type `None`.
torch/quantization/quantize_fx.py:269:8 Incompatible variable type [9]: equalization_qconfig_dict is declared to have type `Dict[str, typing.Any]` but is used as type `None`.
torch/quantization/quantize_fx.py:427:8 Incompatible variable type [9]: prepare_custom_config_dict is declared to have type `Dict[str, typing.Any]` but is used as type `None`.
torch/quantization/quantize_fx.py:464:8 Incompatible variable type [9]: convert_custom_config_dict is declared to have type `Dict[str, typing.Any]` but is used as type `None`.
torch/quantization/quantize_fx.py:486:8 Incompatible variable type [9]: convert_custom_config_dict is declared to have type `Dict[str, typing.Any]` but is used as type `None`.
torch/quantization/quantize_fx.py:547:8 Incompatible variable type [9]: convert_custom_config_dict is declared to have type `Dict[str, typing.Any]` but is used as type `None`.
```
Fixes the issue: [MLH-Fellowship/pyre-check/issues/76](https://github.com/MLH-Fellowship/pyre-check/issues/76)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66804

Reviewed By: onionymous

Differential Revision: D31738171

Pulled By: 0xedward

fbshipit-source-id: 00d4c5749c469aff39a1531365461ced747e52fc
2021-10-19 09:45:18 -07:00
Jerry Zhang
06e49ea088 [not4land][quant][fx][graphmode] lower reference linear module example (#65723)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65723

Example lowering reference linear module to fbgemm/qnnpack quantized linear module

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D31567461

fbshipit-source-id: 0b8fffaf8e742ec15cb07bf6a4672cf3e856db2d
2021-10-18 13:14:39 -07:00
Jerry Zhang
d777e490a5 [bc-breaking][quant][graphmode][fx] Produce reference patterns for GeneralShapeOps (#66647)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66647

Missed in the last round.
This adds reference patterns for general shape ops like view when is_reference is True.

bc-breaking:
getitem no longer supports quantized ops here; we may support it later in fbgemm.

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
python test/test_quantization.py TestQuantizeFxModels

Imported from OSS

Reviewed By: H-Huang

Differential Revision: D31680379

fbshipit-source-id: 6a3a7128514baf6d92b1607308c40339469d0066
2021-10-18 11:09:17 -07:00
Vasiliy Kuznetsov
d549c8de78 fx quant: enable linear-bn1d fusion for PTQ (#66484)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66484

https://github.com/pytorch/pytorch/pull/50748 added linear - bn1d fusion
in Eager mode, for PTQ only. This PR also enables this in FX graph mode.

We reuse the existing conv-bn-relu fusion handler, renaming `conv` to
`conv_or_linear` for readability.

The QAT version is saved for a future PR, for both eager and FX graph.
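
A minimal eval-mode example of the pattern this enables (fuse_fx shown as the entry point; exact fusion coverage depends on the release):

```
import torch
from torch.ao.quantization.quantize_fx import fuse_fx

class LinearBn(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(8, 8)
        self.bn = torch.nn.BatchNorm1d(8)

    def forward(self, x):
        return self.bn(self.linear(x))

m = LinearBn().eval()   # PTQ / eval mode only, per the note above
fused = fuse_fx(m)      # linear + bn1d folded together
```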

Test Plan:
```
python test/test_quantization.py TestFuseFx.test_fuse_linear_bn_eval
```

Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D31575392

fbshipit-source-id: f69d80ef37c98cbc070099170e335e250bcdf913
2021-10-18 10:14:28 -07:00
Ben Koopman
aa7da7b09c [quant][embedding qat] Enable quint4 in EmbeddingBag QAT workflow (#66348)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66348

Test Plan: Imported from OSS

Reviewed By: HDCharles

Differential Revision: D31691300

Pulled By: b-koopman

fbshipit-source-id: 11bd75b608b972394fe9f7c9b7bf034af42f28b5
2021-10-18 08:51:39 -07:00
Teng Zhang
f8f9a47b02 PR3: add a workaround for reference path (#66535)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66535

Test Plan: Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D31676400

Pulled By: rahxephon89

fbshipit-source-id: fd4c8e9bbc82930cc1255fb8bf8d8ac7f0934c3f
2021-10-15 11:56:11 -07:00
Teng Zhang
17e79bc76c remove is_reference from all is_output_quantized (#66456)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66456

Test Plan: Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D31562633

Pulled By: rahxephon89

fbshipit-source-id: 85c73a23e90ba9c1406f4027d447fbbe4576e39a
2021-10-12 10:43:52 -07:00
Vasiliy Kuznetsov
565cf47abf Quantization docs: add pages for Numeric Suite (Eager and FX) (#66380)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66380

Description:
1. creates doc pages for Eager and FX numeric suites
2. adds a link from main quantization doc to (1)
3. formats docblocks in Eager NS to render well
4. adds example code and docblocks to FX numeric suite

Test Plan:
```
cd docs
make html
python -m http.server
// renders well
```

Reviewed By: jerryzh168

Differential Revision: D31543173

Pulled By: vkuzo

fbshipit-source-id: feb291bcbe92747495f45165f738631fa5cbffbd
2021-10-11 18:47:58 -07:00
Vasiliy Kuznetsov
8b1258698e Improve quantization API docs (#66379)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66379

Description:

Creates a quantization API reference and fixes all the docblock errors.

This is #66122 to #66210 squashed together

Test Plan:
```
cd docs
make html
python -m http.server
// open webpage, inspect it, looks good
```

Reviewed By: ejguan

Differential Revision: D31543172

Pulled By: vkuzo

fbshipit-source-id: 9131363d6528337e9f100759654d3f34f02142a9
2021-10-11 18:46:11 -07:00
Eshika Shah
88ed93c2ca Fix type checking errors in torch/quantization/fx/qconfig_utils.py (#66428)
Summary:
- [x] Fix the Pyre type checking errors in `torch/quantization/fx/qconfig_utils.py`
```
torch/quantization/fx/qconfig_utils.py:241:46 Incompatible variable type [9]: prepare_custom_config_dict is declared to have type `Dict[str, typing.Any]` but is used as type `None`.
torch/quantization/fx/qconfig_utils.py:267:46 Incompatible variable type [9]: convert_custom_config_dict is declared to have type `Dict[str, typing.Any]` but is used as type `None`.
torch/quantization/fx/qconfig_utils.py:284:43 Incompatible variable type [9]: fuse_custom_config_dict is declared to have type `Dict[str, typing.Any]` but is used as type `None`.
```
Fixes the issue: [MLH-Fellowship/pyre-check/issues/73](https://github.com/MLH-Fellowship/pyre-check/issues/73)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66428

Reviewed By: grievejia

Differential Revision: D31545215

Pulled By: 0xedward

fbshipit-source-id: 767ae7888854c2eec2ecf14855a5b011110b9271
2021-10-11 16:48:11 -07:00
Mike Ruberry
9971113340 Revert D31447612: Create a documentation page for FX graph mode quantization APIs
Test Plan: revert-hammer

Differential Revision:
D31447612 (a89ac3138e)

Original commit changeset: 07d0a6137f15

fbshipit-source-id: f2cba7d835011500580b4ab9cff72171280ee18b
2021-10-10 01:51:13 -07:00
Mike Ruberry
b85fd4c54f Revert D31447613: Create separate documentation pages for quantization observers and fake_quants
Test Plan: revert-hammer

Differential Revision:
D31447613 (f0fa3d1110)

Original commit changeset: 63b4cf518bad

fbshipit-source-id: 67de592d1e12a5149cdb22b0725caad063f94476
2021-10-10 01:51:11 -07:00
Mike Ruberry
10633460ce Revert D31447614: Create a documentation page for torch.ao.quantization.QConfig
Test Plan: revert-hammer

Differential Revision:
D31447614 (7332ed13ed)

Original commit changeset: 5d9dd2a4e864

fbshipit-source-id: 6ac15a956222ca61f7fbb75ed36bcc58b23f0f36
2021-10-10 01:51:09 -07:00
Mike Ruberry
ad0accdecd Revert D31447610: Quantization docs: add pages for Numeric Suite (Eager and FX)
Test Plan: revert-hammer

Differential Revision:
D31447610 (9539e6216b)

Original commit changeset: 441170c4a6c3

fbshipit-source-id: b49bff54405cdb8465397077e38506a36b277921
2021-10-10 01:49:19 -07:00
Vasiliy Kuznetsov
9539e6216b Quantization docs: add pages for Numeric Suite (Eager and FX) (#66222)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66222

Description:
1. creates doc pages for Eager and FX numeric suites
2. adds a link from main quantization doc to (1)
3. formats docblocks in Eager NS to render well
4. adds example code and docblocks to FX numeric suite

Test Plan:
```
cd docs
make html
python -m http.server
// renders well
```

Reviewed By: jerryzh168

Differential Revision: D31447610

Pulled By: vkuzo

fbshipit-source-id: 441170c4a6c3ddea1e7c7c5cc2f1e1cd5aa65f2f
2021-10-09 06:46:06 -07:00
Vasiliy Kuznetsov
7332ed13ed Create a documentation page for torch.ao.quantization.QConfig (#66129)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66129

Adds a documentation page for `torch.ao.quantization.QConfig`. It is useful
for this to have a separate page since it shared between Eager and FX graph
mode quantization.

Also, ensures that all important functions and module attributes in this
module have docstrings, so users can discover these without reading the
source code.
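
For context, a small example of the object this page documents (constructed with commonly available observers):

```
import torch
from torch.ao.quantization import QConfig
from torch.ao.quantization.observer import MinMaxObserver, default_weight_observer

# A QConfig pairs an observer (or fake-quant) factory for activations with one
# for weights; Eager and FX graph mode quantization both consume this object.
my_qconfig = QConfig(
    activation=MinMaxObserver.with_args(dtype=torch.quint8),
    weight=default_weight_observer,
)
```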

Test Plan:
```
cd docs
make html
python -m http.server
// open webpage, inspect it, renders correctly
```

Reviewed By: jerryzh168

Differential Revision: D31447614

Pulled By: vkuzo

fbshipit-source-id: 5d9dd2a4e8647fa17b96cefbaae5299adede619c
2021-10-09 06:45:58 -07:00
Vasiliy Kuznetsov
f0fa3d1110 Create separate documentation pages for quantization observers and fake_quants (#66125)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66125

Before this PR, the documentation for observers and fake_quants was inlined in the
Eager mode quantization page.  This was hard to discover, especially
since that page is really long, and we now have FX graph mode quantization reusing
all of this code.

This PR moves observers and fake_quants into their own documentation pages. It also
adds docstrings to all user facing module attributes such as the default observers
and fake_quants, so people can discover them from documentation without having
to inspect the source code.

For now, enables autoformatting (which means all public classes, functions, members
with docstrings will get docs).  If we need to exclude something in these files from
docs in the future, we can go back to manual docs.

Test Plan:
```
cd docs
make html
python -m server.http
// inspect docs on localhost, renders correctly
```

Reviewed By: dagitses

Differential Revision: D31447613

Pulled By: vkuzo

fbshipit-source-id: 63b4cf518badfb29ede583a5c2ca823f572c8599
2021-10-09 06:45:56 -07:00
Vasiliy Kuznetsov
a89ac3138e Create a documentation page for FX graph mode quantization APIs (#66122)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66122

Description:

Adds a documentation page for FX graph mode quantization APIs which
reads from the docstrings in `quantize_fx`, and links it from the main
quantization documentation page.

Also, updates the docstrings in `quantize_fx` to render well with reStructuredText.

Test Plan:
```
cd docs
make html
python -m http.server
// open webpage, inspect it, looks good
```

Reviewed By: dagitses

Differential Revision: D31447612

Pulled By: vkuzo

fbshipit-source-id: 07d0a6137f1537af82dce0a729f9617efaa714a0
2021-10-09 06:44:38 -07:00
Eshika Shah
85b562dd2b Fix type checking errors in fx/utils.py (#66311)
Summary:
- [x] Fix the Pyre type checking errors in `torch/quantization/fx/utils.py`
```
torch/quantization/fx/utils.py:490:4 Incompatible variable type [9]: target_module_type is declared to have type `Type[nn.modules.module.Module]` but is used as type `None`.
```
Fixes the issue: [MLH-Fellowship/pyre-check/issues/75](https://github.com/MLH-Fellowship/pyre-check/issues/75)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66311

Reviewed By: pradeep90

Differential Revision: D31506399

Pulled By: 0xedward

fbshipit-source-id: 3d866fba6005452378d4a2613b8689fa2d7a8b67
2021-10-08 19:14:22 -07:00
Ben Koopman
a58ff186e8 [quant][embedding qat] Add basic EmbeddingBag QAT fakeQuant workflow (#65443)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65443

Test Plan: Imported from OSS

Reviewed By: dagitses, supriyar

Differential Revision: D31456445

Pulled By: b-koopman

fbshipit-source-id: 0edda6e272d9005fce65f2ba6a5e6abc831836de
2021-10-07 20:19:29 -07:00
Supriya Rao
8a974a482c [quant] Add support for quantization of Embedding{Bag} in dynamic quant APIs (#65674)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65674

Before this PR user had to use the eager mode static quantization APIs to quantize Embedding/EmbeddingBag modules.
With this PR they can use either the static or dynamic quantization APIs for Embedding quantization

The only qconfig supported for embedding quantization is float_qparams_weight_only_qconfig, which is currently enforced in the from_float
method of the quantized Embedding/EmbeddingBag modules.

To combine embedding quantization with Linear dynamic quantization, users can use the qconfig_dict to specify a different qconfig for each module type.

The prepare/convert APIs can still be used to quantize Embeddings, with the caveat that users need to ensure inputs to Embedding ops are FP32.

Addresses Issue #65185
ghstack-source-id: 139935419
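
A hedged sketch of the combined use case described above (the toy module and the exact exported names are assumptions and may vary slightly by release; qconfig_spec maps module types to qconfigs):

```
import torch
from torch.ao.quantization import (
    default_dynamic_qconfig,
    float_qparams_weight_only_qconfig,
    quantize_dynamic,
)

class EmbLinear(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = torch.nn.EmbeddingBag(100, 16)
        self.fc = torch.nn.Linear(16, 4)

    def forward(self, x, offsets):
        return self.fc(self.emb(x, offsets))

qconfig_spec = {
    torch.nn.EmbeddingBag: float_qparams_weight_only_qconfig,
    torch.nn.Linear: default_dynamic_qconfig,
}
qmodel = quantize_dynamic(EmbLinear().eval(), qconfig_spec)
```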

Test Plan:
python test/test_quantization.py

Imported from OSS

Reviewed By: gchanan

Differential Revision: D31211199

fbshipit-source-id: 8c747881caee5ccbf8b93c6704b08d132049dea4
2021-10-06 23:19:38 -07:00
Peter Bell
747a5782e3 [quant][fx] Don't assume bias is a keyword argument (#61647)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61647

`prepare_fx` currently assumes that bias is always a positional argument to
convolutions, and only a keyword argument to other functions. This happens to work
today due to a quirk in how `__torch_function__` is handled for python
functions but shouldn't be considered stable.

Instead, we should support `bias` for both positional and keyword forms.
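
Both call forms that should be handled, shown directly with the functional conv (a simple illustration, not the tracer code):

```
import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 8, 8)
w = torch.randn(4, 3, 3, 3)
b = torch.randn(4)

y_positional = F.conv2d(x, w, b)        # bias passed positionally
y_keyword = F.conv2d(x, w, bias=b)      # bias passed as a keyword argument
assert torch.equal(y_positional, y_keyword)
```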

cc jerryzh168 jianyuh raghuramank100 jamesr66a vkuzo

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D31401360

Pulled By: albanD

fbshipit-source-id: 1e2f53d80e2176b870f326dc498e251e2386136e
2021-10-06 07:25:47 -07:00
Zafar
0d020effab [quant] Fix the parts that were missing after initial migration (#66058)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66058

After the initial migration from `torch.quantization` to `torch.ao.quantization`, some of the files did not change.
This happened because the migration was done in parallel, and some of the files were landed while the others were still in the original location.
This is the last fix in the AO migration phase 1, which completely enables the ao.quantization namespace.

Test Plan: `python test/test_quantization.py`

Reviewed By: vkuzo

Differential Revision: D31366066

Pulled By: z-a-f

fbshipit-source-id: bf4a74885be89d098df2d87e685795a2a64026c5
2021-10-05 11:45:37 -07:00
Supriya Rao
458a00bacb Back out "[quant] update fused_obs_fake_quant op to accept output_fake_quant argument" (#66063)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66063

Original commit changeset: bffe776216d0

Test Plan: CI

Reviewed By: vkuzo

Differential Revision: D31347042

fbshipit-source-id: f56f628dc4690187bf284a8f2fda4c6aae10c1d6
2021-10-05 11:02:54 -07:00
Zafar
c27b427cd9 [sparsity] Add m-out-of-n support in the WeightNormSparsifier (#65295)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65295

The m-out-of-n is implemented as follows:

1. Compute the blocks that need to be sparsified using the weight-norm criterion
2. Within each block below the threshold find the smallest absolute value elements
3. Zero out only the smallest values within each block

m-out-of-n describes a sparsification scheme where, in a block with "n" elements, only "m" of them are zeroed out.
Block sparsity, with the whole block being all zeros, is a special case of m-out-of-n: if m==n, the whole block is reset.

This echoes the implementation described in https://github.com/pytorch/pytorch/issues/59835,
and also meets the requirements of NVIDIA's cuSPARSELt.
To support CUDA 2:4 sparsity, one would set the sparsity_level to 1.0,
which means every 1x4 block within a tensor is sparsified with a 2-out-of-4 scheme.
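
A toy illustration of the 2-out-of-4 step on a single 1x4 block (not the sparsifier's code):

```
import torch

block = torch.tensor([0.3, -0.05, 0.8, 0.1])   # n = 4 elements
m = 2                                          # zero out the m smallest |values|
smallest = block.abs().argsort()[:m]           # indices of the 2 smallest magnitudes
mask = torch.ones_like(block)
mask[smallest] = 0.0
sparse_block = block * mask                    # tensor([0.3, 0.0, 0.8, 0.0])
```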

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D31186828

Pulled By: z-a-f

fbshipit-source-id: 7bd3e2707915b90f4831859781fc6e25f716c618
2021-10-01 03:19:15 -07:00
Zafar
8b1aa85388 [sparsity] Change API to take FQNs as configuration (#65296)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65296

The original API described in https://github.com/pytorch/pytorch/issues/59835
assumed that the per-layer configuration would take a module/layer
reference. However, a more useful approach is to refer to the layers
by their fully qualified names (FQN). That allows us to store the
configuration in a file without serializing the models.

We define a layer's FQN as its "path" within a model. For example,
if a layer can be referred to as `model.layer0.sublayerX`, the FQN
of sublayerX is `'layer0.sublayerX'`.
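
A hedged example of what an FQN-keyed configuration entry might look like (the key names are illustrative, not the exact API):

```
sparsifier_config = [
    {"fqn": "layer0.sublayerX", "sparsity_level": 0.5},
]
```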

Test Plan:
```
python test/test_ao_sparsity.py -- TestBaseSparsifier
buck test mode/opt //caffe2:test -- TestBaseSparsifier
```

Reviewed By: gchanan

Differential Revision: D31186830

Pulled By: z-a-f

fbshipit-source-id: d8d87f1c054e5c10d470e67837476a11e0a9b1d4
2021-10-01 03:17:31 -07:00
Supriya Rao
4666e3f192 [quant] update fused_obs_fake_quant op to accept output_fake_quant argument (#65621)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65621

Add a new attribute to FusedMovingAvgObsFakeQuantize that controls whether the fake-quant operation should be applied at the output of a particular layer. The motivation is to give users additional control over the numerics of the fake_quant operators during training. It defaults to always fake-quantizing the output (True).

Note: we still observe the tensors as before (only the fake_quant operation is controlled by this flag).

For example
```
input model
x -> fc1 -> fc2 -> non_quantizable_op -> fc3

After fake_quant
x -> fake_quant(x) -> fc1 -> fake_quant(fc1) -> fc2 -> fake_quant(fc2) -> non_quantizable_op -> fake_quant() -> fc3 -> fake_quantize(fc3)

With output_fake_quant disabled at the output of fc2 and fc3 (since their outputs are non-quantizable)
x -> fake_quant(x) -> fc1 -> fake_quant(fc1) -> fc2 -> non_quantizable_op -> fake_quant() -> fc3
```

Test Plan: ./buck-out/gen/caffe2/test/quantization_fx\#binary.par -r test_disable_output_fake_quant

Reviewed By: jerryzh168

Differential Revision: D31174526

fbshipit-source-id: bffe776216d041fb09133a6fb09bfc2c0bb46b89
2021-09-30 01:08:01 -07:00
Charles David Hernandez
6d4b93bd96 [quant] adding memoryless observers for embeddingbag QAT work (#65699)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65699

related to: https://github.com/pytorch/pytorch/pull/65443#discussion_r715132425

The QAT and PAT (pruning aware training) support for embedding bags needs a memoryless observer to work properly. This is necessitated by the changing pruned/non-pruned weights during training which can significantly change the quantization parameters.

This PR adds a memoryless flag to the simpler observer classes (not moving average since those explicitly have memory)

In addition to the above, I altered MinMaxObserver's reset_min_max_vals
function so that it preserves the device of the existing self.min_val and
self.max_val; previously the device was not preserved, unlike at
initialization (which uses factory_kwargs).

Test Plan:
python test/test_quantization.py TestObserver

(added test_memoryless_minmaxobserver, test_memoryless_per_channel_minmaxobserver, test_memoryless_histogramobserver)

Imported from OSS

Reviewed By: supriyar

Differential Revision: D31209773

fbshipit-source-id: 44a63298e44880fbd3576f49ac568e781f3fd79a
2021-09-30 00:55:32 -07:00
Zafar Takhirov
c7ef620a14 [quant] Add imports to the torch/ao/quantization/__init__.py (#64911)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64911

The import statements involving `quantize.py` were not added to the module-level __init__ file. Those imports are necessary to mimic the behavior of the old import locations. Otherwise, the user would need to change their import statements to `from torch.ao.quantization.quantize import quantize` (instead of `from torch.ao.quantization import quantize`).

Another change in this diff is that we no longer use `__all__`. The all dunder was never used in quantization anyway, and just creates a potential bug when using `from ... import *`.
ghstack-source-id: 139342483
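
The two import spellings that this keeps equivalent:

```
# Re-exported at the package level by the new __init__.py:
from torch.ao.quantization import quantize

# Direct import from the submodule, which always worked:
from torch.ao.quantization.quantize import quantize
```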

Test Plan: `buck test mode/dev //caffe2/test:quantization`

Reviewed By: vkuzo

Differential Revision: D30897663

fbshipit-source-id: a7b4919a191755e3ba690a79ce3362889f416689
2021-09-29 19:08:45 -07:00
Zafar
609384c056 [sparsity][doc] Docstring for WeightNormSparsifier (#65294)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65294

This adds the docstring documentation to the WeightNormSparsifier and adds the typehints for the constructor args.
Note, this does not require testing as only the doc is changed.

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D31186827

Pulled By: z-a-f

fbshipit-source-id: c5010c9bba25b074c4cc6c88f251474b758f950d
2021-09-28 14:14:51 -07:00
Zafar
92ee5cc2e2 [sparsity] Fix for accumulation bug in WeightNormSparsifier (#65293)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65293

This fixes a bug in the WeightNormSparsifier, where the mask is being multiplied by the newly computed mask.
Because the mask elements are binary 0/1, this accumulates the mask over every iteration, eventually collapsing the mask to zero.
This bug accidentally bled through from old versions.

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D31186829

Pulled By: z-a-f

fbshipit-source-id: 3f5b2c833148ab0bd8084e7410ce398f1252e65e
2021-09-28 14:14:49 -07:00
Zafar
a90912ecc5 [sparsity] Remove the pack_param from the sparsifier state_dict (#65292)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65292

Packing was part of the original design, which we decided to simplify by removing the packing from the sparsifier.
The state of the sparsifier is now saved directly; the old behavior had accidentally bled through to the current version.
This change removes the `_pack_params` method and changes the state_dict to include the state directly.
We don't have to change load_state_dict, as it works with either the old or the new format.

The main reason for this PR is simplification: the original design didn't achieve anything useful by packing the sparsification parameters.

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D31186826

Pulled By: z-a-f

fbshipit-source-id: 4ad72a7e669f048d2f2d269269ee11b63fa169db
2021-09-28 14:12:52 -07:00
Jerry Zhang
b77c979102 [quant][fx][graphmode] Make FixedQParam ops work for dtypes other than quint8 (#65484)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65484

This PR makes sure we only use FixedQParamFakeQuantize for the quint8 dtype and allows users
to use other dtypes for ops like sigmoid. This is useful for producing reference patterns for
these ops that can be used in other backends like TensorRT.

Test Plan:
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D31120377

fbshipit-source-id: 3b529d588e2b6ff0377a89c181f6237f8f0cc2f5
2021-09-23 18:29:56 -07:00
Supriya Rao
767a104698 [quant] change observer FQNs generated in prepare step (#65420)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65420

Context: In some FB use cases we need to map observer stats from a train model checkpoint to the inference model. We observed that some buffer names are different because the intermediate activation tensors
are generated differently between the train and inference models. More details in https://fb.quip.com/PtGcAR0S5CQP

Currently, for each observer (activation_post_process), the FQN of the module inserted is determined based on the FQN of the input tensor it is observing.

In this change we make the observer FQN include the FQN of the op/module it is observing, along with the “input”/“output” detail, rather than tensor/intermediate op names.

Before
```
def forward(self, x):
    x_activation_post_process_0 = self.x_activation_post_process_0(x);  x = None
    mods1_w = self.mods1.w
    mods1_w_activation_post_process_0 = self.mods1_w_activation_post_process_0(mods1_w);  mods1_w = None
    mods1_b = self.mods1.b
    linear = torch.nn.functional.linear(x_activation_post_process_0, mods1_w_activation_post_process_0, bias = mods1_b);  x_activation_post_process_0 = mods1_w_activation_post_process_0 = mods1_b = None
    linear_activation_post_process_0 = self.linear_activation_post_process_0(linear);  linear = None
    return linear_activation_post_process_0
```

After
```
def forward(self, x):
    mods1_input_activation_post_process_0 = self.mods1_input_activation_post_process_0(x);  x = None
    mods1_w = self.mods1.w
    mods1_w_activation_post_process_0 = self.mods1_w_activation_post_process_0(mods1_w);  mods1_w = None
    mods1_b = self.mods1.b
    linear = torch.nn.functional.linear(mods1_input_activation_post_process_0, mods1_w_activation_post_process_0, bias = mods1_b);  x_activation_post_process_0 = mods1_w_activation_post_process_0 = mods1_b = None
    mods1_output_activation_post_process_0 = self.mods1_output_activation_post_process_0(linear);  linear = None
    return mods1_output_activation_post_process_0
```

Test Plan:
python test/test_quantization.py test_observer_fqn

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D31088652

fbshipit-source-id: 2f1526f578a13000b34cfd30d11f16f402fd3447
2021-09-23 09:08:10 -07:00
Jerry Zhang
508845f2b5 [quant] AO migration of the torch/quantization/quantize_fx.py and torch/quantization/fx/* (#65033)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65033

1. Move the file:
```
hg mv caffe2/torch/quantization/fx caffe2/torch/ao/quantization/fx
hg mv caffe2/torch/quantization/quantize_fx.py caffe2/torch/ao/quantization/quantize_fx.py
```
2. Create new files
```
touch caffe2/torch/quantization/quantize_fx.py
touch caffe2/torch/quantization/fx/__init__.py
```
3. import things in the new files (see the shim sketch after this list)
4. add tests to test/quantization/ao_migration/test_quantization_fx.py;
this is needed because we have some fx imports in quantize_fx and fx/*.py
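
For reference, the new file at the old location is essentially a thin import shim; a sketch of the pattern (the exact set of re-exported names is an assumption and is not reproduced here):

```
# torch/quantization/quantize_fx.py -- backward-compatibility shim (sketch).
# Existing call sites like `from torch.quantization.quantize_fx import prepare_fx`
# keep working because the names are re-imported from the new location.
from torch.ao.quantization.quantize_fx import (  # noqa: F401
    prepare_fx,
    prepare_qat_fx,
    convert_fx,
)
```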

Test Plan: buck test mode/dev //caffe2/test:quantization

Reviewed By: vkuzo, z-a-f

Differential Revision: D30949749

fbshipit-source-id: 9e5d4d039c8a0a0820bc9040e224f0d2c26886d3
2021-09-22 09:29:15 -07:00
Yuan Shangguan (June)
3f5f721ab3 Pass through allow-list from prepare_qat into propagate_qconfig_ to allow custom mapping and custom QAT module (#65119)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65119

PyTorch Quantization: allow `prepare_qat` to include custom modules by passing `allow_list` into `prepare_qat`.

When implementing a custom module and custom mapping for Quantization Aware Training (QAT), we need to add the custom module to the mappings and to the `allow_list` during `prepare_qat`. The `allow_list` needs to be surfaced to `propagate_qconfig_`.

Test Plan: relying on general unit test

Reviewed By: supriyar

Differential Revision: D30982060

fbshipit-source-id: 1114115b6a3b853238d33d72b5cbaafc60f463e0
2021-09-21 17:15:25 -07:00
Zafar Takhirov
02dec91212 [quant] AO migration of the torch/quantization/utils.py (phase 1) (#64919)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64919

AO Team is migrating the existing torch.quantization into torch.ao.quantization. We are doing it one file at a time to make sure that the internal callsites are updated properly. This migrates the quantization utilities.
ghstack-source-id: 138303325

Test Plan: `buck test mode/dev //caffe2/test:quantization`

Reviewed By: jerryzh168

Differential Revision: D30899082

fbshipit-source-id: 85eb38c419e417147e71758b682cd095308dd0c9
2021-09-16 21:30:18 -07:00
Charles David Hernandez
8a094e3270 [quant]ao migration for quantization mappings and fuser method mappings hg mv (#64985)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64985

Moving quantization_mappings.py and fuser_method_mappings.py to the ao folder while retaining backwards compatibility.

Also added a dict test.

ghstack-source-id: 138215312

Test Plan:
buck test mode/dev //caffe2/test:quantization

https://www.internalfb.com/intern/testinfra/testrun/7036874471986444

buck test mode/dev //caffe2/test:quantization -- TestAOMigrationQuantization

https://www.internalfb.com/intern/testinfra/testrun/5348024625792701

Reviewed By: z-a-f

Differential Revision: D30982551

fbshipit-source-id: 00f53bd44009d6012a7de852000aad6885131edb
2021-09-16 12:59:20 -07:00
Charles David Hernandez
f309f8fbd4 [quant] ao migration of observer and qconfig (#64982)
Summary:
(Had to recreate this diff so it wasn't dependent on the stack)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64982

Migration of qconfig.py and observer.py to torch/ao/quantization using the new test format
ghstack-source-id: 138215256

Test Plan:
buck test mode/opt //caffe2/test:quantization

https://www.internalfb.com/intern/testinfra/testconsole/testrun/8444249354294701/

buck test mode/dev //caffe2/test:quantization -- TestAOMigrationQuantization

https://www.internalfb.com/intern/testinfra/testrun/3940649742829796

Reviewed By: z-a-f

Differential Revision: D30982534

fbshipit-source-id: 48d08969b1984311ceb036eac0877c811cd6add9
2021-09-16 10:33:16 -07:00
Zafar Takhirov
e0ecd09011 [quant] AO migration of the _correct_bias.py, _equalize.py, and _learnable_fake_quantize.py (#64917)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64917

AO Team is migrating the existing torch.quantization into torch.ao.quantization. We are doing it one file at a time to make sure that the internal callsites are updated properly.
This migrates the following files from torch.quantization to torch.ao.quantization:
- `_correct_bias.py`
- `_equalize.py`
- `_learnable_fake_quantize.py`

**Note:** These files are migrated completely without any warning. The old location is thus silently deprecated.

Test Plan: `buck test mode/dev //caffe2/test:quantization -- TestBiasCorrection`

Reviewed By: vkuzo

Differential Revision: D30898565

fbshipit-source-id: 1d39be2539dd1adfcb42e16bdcc0daf5c8316bbd
2021-09-15 18:15:39 -07:00
Zafar Takhirov
c151d62f45 [quant] AO migration of the quant_types.py (phase 1) (#64916)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64916

AO Team is migrating the existing torch.quantization into torch.ao.quantization. We are doing it one file at a time to make sure that the internal callsites are updated properly.
This migrates the quant_type.py from torch.quantization to torch.ao.quantization.
At this point both locations will be supported. Eventually the torch.quantization will be deprecated.

Test Plan: `buck test mode/dev //caffe2/test:quantization -- TestAOMigrationQuantization`

Reviewed By: vkuzo

Differential Revision: D30898422

fbshipit-source-id: 3e6126b49f0565a4136d6928cea9eb25368927ff
2021-09-15 17:30:00 -07:00
Zafar Takhirov
a42996f16e [quant] AO migration of the fuse_modules.py (phase 1) (#64913)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64913

AO Team is migrating the existing torch.quantization into torch.ao.quantization. We are doing it one file at a time to make sure that the internal callsites are updated properly.
This migrates fuse_modules.py from torch.quantization to torch.ao.quantization.
At this point both locations will be supported. Eventually the torch.quantization will be deprecated.

Test Plan: `buck test mode/dev //caffe2/test:quantization`

Reviewed By: vkuzo

Differential Revision: D30882819

fbshipit-source-id: 1926ad6aa49136aceb5b625dcef4bfde3a2860d4
2021-09-15 17:28:47 -07:00
Vasiliy Kuznetsov
6101cbcedb torch.ao migration: fake_quantize.py, phase 1 (#64814)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64814

1. move the file
```
hg mv caffe2/torch/quantization/fake_quantize.py caffe2/torch/ao/quantization/
```

2. create a new file in the old location and copy the imports
3. fix all callsites inside `torch`

Test Plan:
```
buck test mode/dev //caffe2/test:quantization
```

Reviewed By: z-a-f

Differential Revision: D30866792

fbshipit-source-id: 7a221cb46c0ab01f1c5de9be061f09ecc83ce23e
2021-09-13 15:22:28 -07:00
Supriya Rao
3d976d9ceb torch.ao migration: quantize_jit.py phase1 (#64860)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64860

ghstack-source-id: 137885395

Test Plan: buck test mode/dev //caffe2/test:quantization

Reviewed By: jerryzh168

Differential Revision: D30880574

fbshipit-source-id: 9629027dd3b00bb8d45633e1564fc03a866f8c31
2021-09-13 08:41:48 -07:00
Supriya Rao
9d52651d4e torch.ao migration: stubs.py phase 1 (#64861)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64861

1. move the file
```
hg mv caffe2/torch/quantization/stubs.py caffe2/torch/ao/quantization/
```

2. create a new file in the old location and copy the imports
3. fix all call sites inside `torch`
ghstack-source-id: 137885365

Test Plan: buck test mode/dev //caffe2/test:quantization

Reviewed By: jerryzh168

Differential Revision: D30879678

fbshipit-source-id: a2d24f25d01064212aca15e94e8c78240ba48953
2021-09-13 08:40:29 -07:00
Vasiliy Kuznetsov
1577c106dc torch.ao migration: numeric suite, eager and fx (#64817)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64817

This migrates `torch.quantization._numeric_suite` to `torch.ao.ns._numeric_suite`, and `torch.quantization._numeric_suite_fx` to `torch.ao.ns._numeric_suite_fx`.

1. move the files
```
HG: move eager mode
hg mv caffe2/torch/quantization/_numeric_suite.py caffe2/torch/ao/ns/
HG: move fx
hg mv caffe2/torch/quantization/_numeric_suite_fx.py caffe2/torch/ao/ns/
hg mv caffe2/torch/quantization/ns/* caffe2/torch/ao/ns/fx/
```

2. create new versions of `_numeric_suite.py` and `_numeric_suite_fx.py` with
imports

3. update all FB callsites

Test Plan: buck test mode/dev //caffe2/test:quantization

Reviewed By: z-a-f

Differential Revision: D30867538

fbshipit-source-id: 120ee830434ca490c1183a187a518eebcbbaf22c
2021-09-12 12:00:45 -07:00
Zafar Takhirov
9cc44aad21 [quant] AO migration of the quantize.py (resubmission) (#64445)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64445

AO Team is migrating the existing torch.quantization into torch.ao.quantization. We are doing it one file at a time to make sure that the internal callsites are updated properly.
This migrates the quantize.py from torch.quantization to torch.ao.quantization.
At this point both locations will be supported. Eventually the torch.quantization will be deprecated.

Test Plan: `buck test mode/dev //caffe2/test:quantization`

Reviewed By: HDCharles

Differential Revision: D30734870

fbshipit-source-id: dc204f3cc46bff2cc81c95159eab9d333b43bb4b
2021-09-08 04:58:47 -07:00
Zafar Takhirov
046ed57a4d Revert D30055886: [quant] AO migration of the quantize.py
Test Plan: revert-hammer

Differential Revision:
D30055886 (44e3ed88c9)

Original commit changeset: 8ef7470f9fa6

fbshipit-source-id: c5bd3ead43a2d44b9e56872ec5bd7a195bdac725
2021-09-02 16:59:59 -07:00
Zafar Takhirov
44e3ed88c9 [quant] AO migration of the quantize.py (#64086)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64086

AO Team is migrating the existing torch.quantization into torch.ao.quantization. We are doing it one file at a time to make sure that the internal callsites are updated properly.

This migrates the `quantize.py` from torch.quantization to `torch.ao.quantization`.

At this point both locations will be supported. Eventually the torch.quantization will be deprecated.

Test Plan: `buck test mode/opt //caffe2/test:quantization`

Reviewed By: jerryzh168, raghuramank100

Differential Revision: D30055886

fbshipit-source-id: 8ef7470f9fa640c0042bef5bb843e7a05ecd0b9f
2021-08-29 20:30:01 -07:00
Karen Zhou
6257f5b168 [pruner] add README to repo (#64099)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64099

Adding a README for the pruner in OSS
ghstack-source-id: 136867516

Test Plan: should not affect behavior

Reviewed By: z-a-f

Differential Revision: D30608045

fbshipit-source-id: 3e9899a853395b2e91e8a69a5d2ca5f3c2acc646
2021-08-27 11:52:59 -07:00
Karen Zhou
eebac46282 [pruner] add getter for pruned outputs in base pruner (#63520)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63520

Rather than having to call `module.parametrizations.weight[0].pruned_outputs` each time we need to access the set of pruned indices, we add a getter `get_module_pruned_outputs` which takes the module as an argument and returns the set.

This is used for testing.
ghstack-source-id: 136561130

Test Plan:
` buck test mode/dev-nosan //caffe2/test:ao -- TestBasePruner`

https://pxl.cl/1N4gK

Reviewed By: z-a-f

Differential Revision: D30374558

fbshipit-source-id: e38dfee0879cadde52b942e899a3d8d7151ee493
2021-08-25 09:57:29 -07:00
Karen Zhou
83b132b112 [pruner] add support for pruning BatchNorm2d (#63519)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63519

If the pruner is pruning biases along with weights and the model has BatchNorm2d layers following the pruned Conv2d layers, then the corresponding channels of the BatchNorm must also be pruned.

Specifically, they need to be zeroed out, rather than fully removed, since in eager mode the dimensions between layers need to be preserved.

To do this, we add a pruning parametrization called `ZeroesParametrization`, which zeroes out pruned channels rather than removing them.

The user must provide, in the config, a tuple of the Conv2d and BatchNorm layers that go together. The `prepare` method will add the tuple to the `module_groups`; then it will add a PruningParametrization to the Conv2d layer and a ZeroesParametrization to the BatchNorm, and set their pruned sets to be the same set. That way, during `step`, both masks are updated with the same pruned indices.
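
An illustrative config sketch based on the description above (the exact config schema and the pruner construction are assumptions, so the `prepare` call is left commented out):

```
import torch
from torch import nn

class SmallModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 8, 3)
        self.bn1 = nn.BatchNorm2d(8)
        self.conv2 = nn.Conv2d(8, 8, 3)

    def forward(self, x):
        return self.conv2(self.bn1(self.conv1(x)))

model = SmallModel()

# Conv2d layers followed by BatchNorm2d are listed as (conv, bn) tuples so that
# both masks end up sharing the same set of pruned channel indices.
config = [
    (model.conv1, model.bn1),
    model.conv2,
]
# pruner.prepare(model, config)  # `pruner` would be a BasePruner instance
```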

ghstack-source-id: 136562278

Test Plan:
`buck test mode/dev-nosan //caffe2/test:ao -- TestBasePruner`

https://pxl.cl/1N1P6

Reviewed By: z-a-f

Differential Revision: D30349855

fbshipit-source-id: 3199d3688d5a70963f9b32d7a8fdac3962ae6a65
2021-08-25 09:56:19 -07:00
Karen Zhou
1256dcd509 [pruner] modify base pruner to prune bias by default (#63202)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63202

By default, the pruner will also prune biases, such that the whole output channel is removed. The user can manually set `also_prune_bias` to False when calling `prepare` if they don't want the bias to be pruned.
ghstack-source-id: 136466671

Test Plan:
`buck test mode/dev-nosan //caffe2/test:ao -- TestBasePruner`

https://pxl.cl/1MV32

modify `fusion_tests` according to API change
`buck test mode/opt //scripts/kazhou:fusion_tests`

https://pxl.cl/1NbKz

Reviewed By: z-a-f

Differential Revision: D30294494

fbshipit-source-id: c84655648bee0035559195ca855b98fb7edaa134
2021-08-24 10:25:45 -07:00
Karen Zhou
16ba20507a [pruner] amend base pruner API to match base sparsifier (#63178)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63178

Update base pruner API to match base sparsifier API as defined in D28970960 / PR58955

Changes include:
- `enable_mask_update = True` in `__init__`
- `prepare` takes the model and config instead of the constructor
- convert functionality renamed to `squash_mask`; calling `convert` now raises an error
- `activation_handles` and `bias_handles` initialized in `_prepare` instead of the constructor
ghstack-source-id: 136467595

Test Plan:
Function name updates according to the changes

`buck test mode/dev-nosan //caffe2/test:ao -- TestBasePruner`

https://pxl.cl/1MTgH

TODO will need to modify `fbcode/scripts/kazhou/fusion_tests.py` to use new API

Reviewed By: z-a-f

Differential Revision: D30287179

fbshipit-source-id: d4727bea1873b500f2d4bb784db26d532bf26cce
2021-08-24 10:25:43 -07:00
Karen Zhou
5dee15401c [pruner] refactor ActivationReconstruction forward hooks (#63158)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63158

Combined the `ActivationReconstruction` functionality for both Linear and Conv2d into one class. The only difference between the old classes was the size and indexing of the reconstructed tensor -- that logic can be generalized by iterating over the size of `output`.
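
A sketch of the generalized hook (class and attribute names are assumptions; the real class tracks more state):

```
class ActivationReconstructionSketch:
    """Single forward hook covering both Linear (2-D) and Conv2d (4-D) outputs."""
    def __init__(self, parametrization):
        self.param = parametrization  # exposes the set of pruned output channels

    def __call__(self, module, inputs, output):
        pruned = sorted(self.param.pruned_outputs)
        total = output.size(1) + len(pruned)          # original channel count
        kept = [i for i in range(total) if i not in pruned]
        sizes = list(output.size())
        sizes[1] = total
        reconstructed = output.new_zeros(sizes)
        reconstructed[:, kept] = output               # dim 1 works for 2-D and 4-D
        return reconstructed
```
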
ghstack-source-id: 136467465

Test Plan:
`buck test mode/dev-nosan //caffe2/test:ao -- TestBasePruner`

https://pxl.cl/1MSSv

Reviewed By: raghuramank100

Differential Revision: D30282765

fbshipit-source-id: 08a1e4e0650511019fff85cf52b41dd818b0c7f8
2021-08-24 10:24:29 -07:00
Karen Zhou
d45291613c [pruner] generalize bias hook for conv2d (#62430)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62430

The bias hook is a forward hook that is part of the pruning parametrization; it is attached after the activation reconstruction forward hook, so adding the bias occurs after zeros are reinserted into the pruned activation.

This diff/PR amends the bias hook to work for Conv2d layers, in addition to Linear layers. The reshaping of the ._bias parameter ensures that it is added to the right dimension of the output.
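
A sketch of the reshape this refers to (the hook name is illustrative; `module._bias` follows the description above):

```
def add_pruned_bias(module, inputs, output):
    # Runs after activation reconstruction, so the bias is added to the
    # full-size (reconstructed) output.
    bias = module._bias
    if output.dim() == 4:                 # Conv2d: broadcast over (N, C, H, W)
        bias = bias.reshape(1, -1, 1, 1)
    return output + bias
```
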
ghstack-source-id: 135097700

Test Plan:
Added tests for `Conv2dB()`, a model with Conv2d layers that have `bias=True`.

`buck test mode/dev-nosan //caffe2/test:ao -- TestBasePruner`

https://pxl.cl/1MfgL

Reviewed By: jerryzh168

Differential Revision: D29979571

fbshipit-source-id: c1a7e9fabc8b3c9d0050bd6b6c6a631ddfdf2a68
2021-08-05 09:27:17 -07:00
Karen Zhou
3687bbb1ed [pruner] add Conv2d support (#61778)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61778

Adding Conv2d as a supported module for the pruner. Previously the pruner only supported Linear layers. This addition includes:
- adding a Conv2d activation reconstruction forward hook to match Conv2d weight shapes
- in `prepare`, checking the type of the module and using the corresponding activation forward hook
ghstack-source-id: 134143557

Test Plan:
Added conv2d tests
`buck test mode/dev-nosan //caffe2/test:ao -- TestBasePruner`

https://pxl.cl/1LLf3

Reviewed By: jerryzh168

Differential Revision: D29719045

fbshipit-source-id: 6a9f91b96992c552fff32f0e5a6e22f16eb7077b
2021-07-22 23:00:31 -07:00
Karen Zhou
9b3cbeaf7d [pruner] fix activation handles logic (#61592)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61592

Add activation handles for each layer (stored in a list), so they can each be removed.

We don't remove them in `convert` in eager mode because we aren't modifying input/output layer dimensions. We will need this in FX mode, though.
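
A small sketch of the bookkeeping (names are illustrative): the handles returned by `register_forward_hook` are kept in a list so each hook can be detached later.

```
from torch import nn

activation_handles = []  # one RemovableHandle per hooked layer

def reconstruction_hook(module, inputs, output):
    return output  # placeholder for the real activation reconstruction

layer = nn.Linear(4, 4)
handle = layer.register_forward_hook(reconstruction_hook)
activation_handles.append(handle)

# Later (e.g. in an FX-mode convert), remove each hook individually:
for h in activation_handles:
    h.remove()
```
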
ghstack-source-id: 133497376

Test Plan:
Added some tests to make sure `model(x)` runs without error.

`buck test mode/dev-nosan //caffe2/test:ao -- TestBasePruner`

https://pxl.cl/1LBf4

Reviewed By: z-a-f

Differential Revision: D29682789

fbshipit-source-id: 9185702736e5f7f4320754ffef441610738ac154
2021-07-14 11:07:23 -07:00
Karen Zhou
962c9fbf85 [pruner] add handles for hooks (#61425)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61425

Adding handles for the activation reconstruction and bias forward hooks so they can be removed later
ghstack-source-id: 133244536

Test Plan:
This change should not affect behavior yet, but to double check:

`buck test mode/dev-nosan //caffe2/test:ao -- TestBasePruner`

https://pxl.cl/1LpM9

Reviewed By: z-a-f

Differential Revision: D29619720

fbshipit-source-id: c7428d2d0325cd11ce7919e0b67321e8cc196041
2021-07-09 11:28:35 -07:00
Karen Zhou
21ad978d4f [sparsity] rename sparsity_pattern to sparse_block_shape (#59898)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59898

In `weight_norm_sparsifier`, the name of the argument `sparsity_pattern` is not intuitive for an argument describing the shape of the sparse block. It has been changed to `sparse_block_shape`.

Test Plan:
`buck test mode/dev-nosan //caffe2/test:ao -- TestWeightNormSparsifier`
https://pxl.cl/1LhRM

Reviewed By: z-a-f

Differential Revision: D29077045

fbshipit-source-id: 0cf9c5387d41ca8e839ee050d71f4fe477374143
2021-07-07 15:27:16 -07:00
Zafar
05c1e5b655 [sparsity] Lambda Scheduler (#59771)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59771

Implements a specific sparsity scheduler that uses user-provided lambdas to change the sparsity levels.
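
A toy illustration of the idea (not the class added in this PR; attribute and argument names are assumptions): the scheduler scales each group's base sparsity level by a user-provided lambda of the epoch.

```
class ToyLambdaScheduler:
    """Toy stand-in: set each group's sparsity_level to base * sl_lambda(epoch)."""
    def __init__(self, sparsifier, sl_lambda):
        self.sparsifier = sparsifier
        self.sl_lambda = sl_lambda
        self.base_levels = [g["sparsity_level"] for g in sparsifier.module_groups]
        self.epoch = 0

    def step(self):
        self.epoch += 1
        for group, base in zip(self.sparsifier.module_groups, self.base_levels):
            group["sparsity_level"] = base * self.sl_lambda(self.epoch)
```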

Test Plan:
```
python test/test_ao_sparsity.py
```
Imported from OSS

Differential Revision: D29070604

Reviewed By: raghuramank100

Pulled By: z-a-f

fbshipit-source-id: c7ccbe63fe4cd6a0c3563541b7fcf93a99d0e62f
2021-07-02 21:39:38 -07:00
Zafar
37ebf2e3cd [sparsity] Base sparsity level scheduler class (#59770)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59770

Implements the base scheduler class for changing the sparsity levels in the sparsifier.

Test Plan:
```
python test/test_ao_sparsity.py
```
Imported from OSS

Differential Revision: D29070603

Reviewed By: raghuramank100

Pulled By: z-a-f

fbshipit-source-id: 0b160e4eb0a2a303d2d19e6a3beb4784002b2cb7
2021-07-02 21:38:24 -07:00
Zafar
d42f1751d4 [sparsity] WeightNormSparsifier (#58955)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58955

Implements the weight norm sparsifier.
This type of sparsifier computes the norm of the weights, sorts them, and zeroes out the target fraction of them.

The main implemented method is `update_mask`, which holds the main logic of changing the masks.
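
A minimal per-element sketch of the `update_mask` idea (the real implementation works on sparse blocks and stores the mask via a parametrization):

```
import torch

def update_mask(weight, sparsity_level=0.5):
    # Zero out the smallest-magnitude `sparsity_level` fraction of the entries.
    k = int(sparsity_level * weight.numel())
    if k == 0:
        return torch.ones_like(weight)
    threshold = weight.abs().flatten().kthvalue(k).values
    return (weight.abs() > threshold).to(weight.dtype)

mask = update_mask(torch.randn(8, 8), sparsity_level=0.25)
```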

Test Plan:
```
python test/test_ao_sparsity.py
```
Imported from OSS

Differential Revision: D28970960

Reviewed By: raghuramank100

Pulled By: z-a-f

fbshipit-source-id: 8f2a4360ad877f430cdc1065c6777106938b58d5
2021-07-02 17:35:27 -07:00
Zafar
7ab2729481 [sparsity][refactor] Import factoring out (#58707)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58707

Minor refactor that changes the format of the import.
This is done to avoid accidental circular dependencies.

Test Plan:
```
python test/test_ao_sparsity.py
```
Imported from OSS

Differential Revision: D28970961

Reviewed By: raghuramank100

Pulled By: z-a-f

fbshipit-source-id: c312742f5e218c435a1a643532f5842116bfcfff
2021-07-02 16:32:39 -07:00
Zafar
973e9266ff [sparsity] Sparsifier class (#58704)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58704

Implements the base sparsifier class based on the #59835 RFC document.

This PR implements the base class for sparsification; specifically, the `prepare` method is implemented.

Test Plan:
```
python test/test_ao_sparsity.py
```
Imported from OSS

Differential Revision: D28970958

Reviewed By: raghuramank100

Pulled By: z-a-f

fbshipit-source-id: 0ef98a445c0a0aca22ce5708e34a9f94606d0e2b
2021-07-02 16:31:21 -07:00
Zafar
80cab10534 [sparsity] Sparsity parametrization (#58705)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58705

The basic demo for this particular implementation can be found here:
https://gist.github.com/z-a-f/1d06ae8d5a509d3c9c1596dcb924afe0

Test Plan:
```
python test/test_ao_sparsity.py
```
Imported from OSS

Differential Revision: D28970959

Reviewed By: raghuramank100

Pulled By: z-a-f

fbshipit-source-id: 2a0bea1e0a81816690e05f83051d607c90925d32
2021-07-02 11:12:31 -07:00
Zafar
5d34b7955b [sparsity][refactor] Changing linear row/col control (#60850)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60850

Test Plan:
```
python test/test_ao_sparsity.py
```

Differential Revision: D29465900

Reviewed By: raghuramank100

Pulled By: z-a-f

fbshipit-source-id: 412f50da857f377898fea79d378ae54a049b81fe
2021-07-02 11:12:30 -07:00
Karen Zhou
ca2702a776 [pruner] Make bias hook stateless (#61077)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61077

Removing the `BiasHook` class and using a function instead.
ghstack-source-id: 132899223

Test Plan:
` buck test mode/dev-nosan //caffe2/test:ao -- TestBasePruner`

https://pxl.cl/1L7Tg

Reviewed By: z-a-f

Differential Revision: D29504119

fbshipit-source-id: 6dd9689d18b17ac64e8a461f466e2c9018bc530b
2021-07-01 14:59:00 -07:00
Karen Zhou
0a7875231b [pruner] Add bias support (#60970)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60970

Support adding bias in eager mode
ghstack-source-id: 132695883

Test Plan:
`buck test mode/dev-nosan //caffe2/test:ao -- TestBasePruner`

https://pxl.cl/1L3K3

Reviewed By: z-a-f

Differential Revision: D29441499

fbshipit-source-id: 47e0fff5b3014612bd021e145160ea54e2645e24
2021-07-01 14:57:09 -07:00
Karen Zhou
007ba37c9a [pruning] Speedup activation reconstruction (#60683)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60683

Vectorized reconstruction without for loops
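
A small sketch of the kind of change this refers to (shapes and names are illustrative):

```
import torch

output = torch.randn(8, 3)                    # activations with pruned channels removed
pruned = [1, 3]                               # channels that were pruned away
kept = [i for i in range(5) if i not in pruned]

# Loop-free reconstruction: a single advanced-indexing assignment instead of
# writing the kept channels back one index at a time in a Python for loop.
reconstructed = output.new_zeros(output.size(0), 5)
reconstructed[:, kept] = output
```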

Test Plan:
`buck test mode/dev-nosan //caffe2/test:ao -- TestBasePruner`

https://pxl.cl/1KSJQ

Reviewed By: z-a-f

Differential Revision: D29370805

fbshipit-source-id: 75402437654a0b6f6391c8590bbe3f6fe3f43d8f
2021-06-28 12:58:21 -07:00
Karen Zhou
8d4a6ef962 [pruning] Activation reconstruction (#60292)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60292

Added activation reconstruction in the `reconstruct` method

Test Plan:
`buck test mode/dev-nosan //caffe2/test:ao -- TestBasePruner`

https://pxl.cl/1KLl1

Reviewed By: z-a-f

Differential Revision: D29236569

fbshipit-source-id: 1ad085f4143eb9fa3efca51e00d810e0fdb7e9b1
2021-06-28 12:58:18 -07:00
Karen Zhou
71b83c27e2 [pruning] Move pruning directory into experimental folder (#60395)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60395

Experimental folder so other developers know this is work in progress

Test Plan:
`buck test mode/dev-nosan //caffe2/test:ao -- TestBasePruner`

https://pxl.cl/1KGJD

Reviewed By: z-a-f

Differential Revision: D29272319

fbshipit-source-id: 93eeeceba0376753efc9a5bb69a155278ceb2fca
2021-06-22 11:08:48 -07:00
Karen Zhou
f75ea51e67 [pruning] Move pruning files to their own directory (#60293)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60293

Move pruning files to their own directory

Test Plan:
`buck test mode/dev-nosan //caffe2/test:ao -- TestBasePruner`
https://pxl.cl/1KCfz

Reviewed By: z-a-f

Differential Revision: D29238159

fbshipit-source-id: 0173a278b39ff5ee4cbd54f333f558b6fe412be5
2021-06-22 11:08:47 -07:00
Karen Zhou
b25db5251a [pruning] Base pruner class (#60278)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60278

Implemented `PruningParametrization`, which removes pruned rows, and `BasePruner`, which is the base class for structured pruning.
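
A minimal sketch of the parametrization half of this (simplified; the real class carries more bookkeeping, and `BasePruner` is omitted):

```
from torch import nn

class PruningParametrizationSketch(nn.Module):
    """Toy version: drop the pruned output rows of a weight on the fly."""
    def __init__(self, original_outputs):
        super().__init__()
        self.original_outputs = original_outputs
        self.pruned_outputs = set()

    def forward(self, weight):
        kept = [i for i in range(self.original_outputs) if i not in self.pruned_outputs]
        return weight[kept]
```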

Test Plan:
`buck test mode/dev-nosan //caffe2/test:ao -- TestBasePruner`

https://pxl.cl/1KC2n

Reviewed By: z-a-f

Differential Revision: D29208349

fbshipit-source-id: f34e8e258bf13fa80292c2bd64d56f5ad1e72b6a
2021-06-22 11:07:31 -07:00
Zafar
b0fd3ca542 [sparse] Add the AO namespace to torch (#58703)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58703

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D28970962

Pulled By: z-a-f

fbshipit-source-id: 0d0f62111a0883af4143a933292dfaaf8fae220d
2021-06-09 19:47:21 -07:00
Zafar Takhirov
375687839e [sparsity] Moving the sparsity python files to OSS (#56617)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56617

This migrates the sparsity code to open source

Test Plan: `buck test mode/opt //caffe2/test:ao`

Reviewed By: raghuramank100

Differential Revision: D27812207

fbshipit-source-id: cc87d9d2b486269901a4ad9b483615741a1cd712
2021-04-22 14:07:31 -07:00