Commit Graph

27 Commits

Jerry Zhang
7ddf212f33 [quant][fx] Fully align convert with the reference model design and simplify the implementation (#73863)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73863

This PR fully aligns the convert function with the design: https://github.com/pytorch/rfcs/blob/master/RFC-0019-Extending-PyTorch-Quantization-to-Custom-Backends.md
and simplifies the implementation of the convert function by always producing a reference quantized model (with reference patterns) first,
and then lowering the model to a quantized model that is runnable with the PyTorch native backends (fbgemm/qnnpack).

This PR makes convert.py much easier to understand than the previous implementation, and we are able to remove the majority of the code
in quantization_patterns.py as well (in follow-up PRs).
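
A sketch of the two-step flow (reference model first, then lowering), written against the API names of later releases; at the time of this PR the reference path was exposed as `convert_fx(..., is_reference=True)`:
```
import copy

import torch
from torch.ao.quantization import QConfigMapping, get_default_qconfig
from torch.ao.quantization.quantize_fx import (
    convert_fx,
    convert_to_reference_fx,
    prepare_fx,
)

model = torch.nn.Sequential(torch.nn.Linear(16, 16)).eval()
qconfig_mapping = QConfigMapping().set_global(get_default_qconfig("fbgemm"))
example_inputs = (torch.randn(1, 16),)

prepared = prepare_fx(model, qconfig_mapping, example_inputs)
prepared(*example_inputs)  # calibrate

# Step 1 only: a reference model with quantize/dequantize ops around float ops.
reference = convert_to_reference_fx(copy.deepcopy(prepared))
# Both steps: build the reference model, then lower it to fbgemm/qnnpack kernels.
quantized = convert_fx(prepared)
```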

Test Plan:
```
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
python test/test_quantization.py TestFXNumericSuiteCoreAPIs
python test/test_quantization.py TestFXNumericSuiteCoreAPIsModels
```
and other internal/oss regression tests

Imported from OSS

Reviewed By: andrewor14

Differential Revision: D34778506

fbshipit-source-id: 0678b66addf736039a8749b352f6f569caca962b
(cherry picked from commit 33ec9caf23f3ab373d827117efbd9db0668b2437)
2022-03-11 17:11:30 +00:00
andrewor
4a8f27445d [Quant] Add dynamic QAT Linear module (#67325)
Summary:
**Summary:** This commit adds the `torch.nn.qat.dynamic.modules.Linear`
module, the dynamic counterpart to `torch.nn.qat.modules.Linear`.
Functionally these are very similar, except the dynamic version
expects a memoryless observer and is converted into a dynamically
quantized module before inference.
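
For illustration, a memoryless observer only tracks the current batch, mirroring how dynamic quantization recomputes qparams from each input at runtime. A rough sketch of such a qconfig (illustrative names and settings, not necessarily the shipped default):
```
import torch
from torch.ao.quantization import (
    FakeQuantize,
    MovingAverageMinMaxObserver,
    QConfig,
)

# averaging_constant=1.0 makes the activation observer memoryless -- it
# forgets previous batches, matching dynamic quantization, where qparams
# are recomputed from each input at runtime.
activation = FakeQuantize.with_args(
    observer=MovingAverageMinMaxObserver,
    averaging_constant=1.0,
    dtype=torch.quint8,
)
weight = FakeQuantize.with_args(
    observer=MovingAverageMinMaxObserver,
    dtype=torch.qint8,
    qscheme=torch.per_tensor_symmetric,
)
dynamic_qat_qconfig = QConfig(activation=activation, weight=weight)
```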

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67325

Test Plan:
`python3 test/test_quantization.py TestQuantizationAwareTraining.test_dynamic_qat_linear`

**Reviewers:** Charles David Hernandez, Jerry Zhang

**Subscribers:** Charles David Hernandez, Supriya Rao, Yining Lu

**Tasks:** 99696812

**Tags:** pytorch

Reviewed By: malfet, jerryzh168

Differential Revision: D32178739

Pulled By: andrewor14

fbshipit-source-id: 5051bdd7e06071a011e4e7d9cc7769db8d38fd73
2021-11-08 10:24:25 -08:00
Zafar Takhirov
b23709df03 [ao_migration] torch/nn/quantized: torch.quantization -> torch.ao.quantization (#65900)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65900

This changes the imports in `caffe2/torch/nn/quantized` to use the new import locations.

```
codemod -d torch/nn/quantized --extensions py 'torch.quantization' 'torch.ao.quantization'
```
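
The net effect on any affected file:
```
# Before the migration:
# from torch.quantization.qconfig import QConfig
# After the migration:
from torch.ao.quantization.qconfig import QConfig
```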

Test Plan: `python test/run_test.py`

Reviewed By: jerryzh168

Differential Revision: D31301193

fbshipit-source-id: 58efb1ad51a8b441e2a3bd5b91af11eab6b9331f
2021-10-08 16:19:53 -07:00
Supriya Rao
c7027f19ef [quant][fx] Add support for dynamic linear + relu fusion (INT8) (#63799)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63799

Adds a new module that can be swapped in for the nni.LinearReLU module in the convert function.
Supports INT8 only for now (the FP16 op doesn't have ReLU fusion yet).

Fixes #55393
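
A sketch of how the fused module might be exercised with the eager-mode APIs (whether the default dynamic mapping covers nni.LinearReLU can vary by release; the test below is the authoritative check):
```
import torch
import torch.nn as nn
import torch.nn.intrinsic as nni

model = nn.Sequential(nn.Linear(16, 8), nn.ReLU()).eval()
# Fuse Linear + ReLU into an nni.LinearReLU so that convert can swap in
# the dynamically quantized fused module (INT8 only, per this PR).
fused = torch.quantization.fuse_modules(model, [["0", "1"]])
quantized = torch.quantization.quantize_dynamic(
    fused, {nni.LinearReLU}, dtype=torch.qint8
)
print(quantized)
```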

Test Plan:
python test/test_quantization.py test_dynamic_fusion

Imported from OSS

Reviewed By: heitorschueroff

Differential Revision: D30502812

fbshipit-source-id: 3668e4f001a0626d469e17ac323acf582ee28a51
2021-08-26 21:10:46 -07:00
Basil Hosmer
58d1b3639b fix nn.MHA scriptability (#58727)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58727

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D28593830

Pulled By: bhosmer

fbshipit-source-id: 37dee9efededaea9985a2bf040df1ba4b46f6580
2021-05-26 15:29:49 -07:00
Zafar
e12008d110 [quant] Mapping for the _LinearWithBias (#49964)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49964

`torch.nn.modules.linear._LinearWithBias` is only used in transformer modules and is identical to `torch.nn.Linear`.
This PR creates a mapping so that the module is treated the same as Linear.
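
A sketch of the idea against the 1.7-era API (`_LinearWithBias` was removed in later releases, so this won't run on current PyTorch):
```
import torch.nn.quantized.dynamic as nnqd
from torch.nn.modules.linear import _LinearWithBias  # 1.7/1.8 era only
from torch.quantization.quantization_mappings import (
    get_default_dynamic_quant_module_mappings,
)

# Route _LinearWithBias through the same dynamic mapping as nn.Linear.
mapping = dict(get_default_dynamic_quant_module_mappings())
mapping[_LinearWithBias] = nnqd.Linear
```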

Test Plan:
```
python test/test_quantization.py TestDynamicQuantizedModule TestStaticQuantizedModule
```

Differential Revision: D25731589

Reviewed By: jerryzh168

Pulled By: z-a-f

fbshipit-source-id: 1b2697014e250e97d3010cdb542f9d130b71fbc3
2021-01-07 13:57:29 -08:00
Jerry Zhang
be2e3dd2a1 [quant][graphmode][fx][fix] Linear work with float_qparam_dynamic_qconfig (#47068)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47068

Filters the dtype config before performing quantization in linear.

Test Plan: Imported from OSS

Reviewed By: supriyar

Differential Revision: D24627907

fbshipit-source-id: 162fa47b3fcf6648049f8bc0438e41ee97ac19e9
2020-11-02 16:28:33 -08:00
Gao, Xiang
37658b144b Remove useless py2 compatibility import __future__, part 1 (#43808)
Summary:
To avoid conflicts, this PR does not remove all imports. More are coming in further PRs.
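
The kind of boilerplate being removed:
```
# Typical py2 compatibility lines dropped by this PR:
# from __future__ import absolute_import, division, print_function, unicode_literals
```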

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43808

Reviewed By: wanchaol

Differential Revision: D23436675

Pulled By: ailzhang

fbshipit-source-id: ccc21a1955c244f0804277e9e47e54bfd23455cd
2020-09-02 19:15:11 -07:00
Raghuraman Krishnamoorthi
480851ad2c Docstring changes for dynamic quantized classes (#40931)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40931

Fix docstrings for dynamic quantized Linear/LSTM and associated classes
ghstack-source-id: 107064446

Test Plan: Docs show up correctly.

Differential Revision: D22360787

fbshipit-source-id: 8e357e081dc59ee42fd7f12ea5079ce5d0cc9df2
2020-07-03 21:04:12 -07:00
Supriya Rao
25a6c5f60f [quant] Dynamic Linear module to use reduce_range (#39125)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39125

Switches to setting reduce_range to True for versions > 3.
Models serialized with an older state_dict will have version <= 3, so they will run with reduce_range=False.

Verified with backward-compatibility tests (works with no changes to those tests).
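
For context, reduce_range trades one bit of precision to keep fbgemm's 16-bit accumulation on x86 from overflowing. A minimal illustration of the observer flag:
```
import torch
from torch.quantization.observer import MinMaxObserver

# reduce_range=True restricts the quantized range to 7 bits (e.g.
# [0, 127] for quint8 instead of [0, 255]), so that fbgemm's int16
# accumulation cannot overflow, at the cost of one bit of precision.
obs = MinMaxObserver(dtype=torch.quint8, reduce_range=True)
obs(torch.randn(100))
scale, zero_point = obs.calculate_qparams()
print(scale, zero_point)
```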

Test Plan:
python test/test_quantization.py

Imported from OSS

Differential Revision: D21769689

fbshipit-source-id: 131f2ae736e31705222e82bdc77480f2f1826fe8
2020-05-29 18:21:57 -07:00
Jianyu Huang
8224398c14 [pytorch] Fix the extra_repr print message for float16 dynamic quantization (#36044)
Summary:
When applying float16 dynamic quantization with
```
        model = torch.quantization.quantize_dynamic(
            model, {torch.nn.Linear}, dtype=torch.float16
        )
        print(model)
```
there is an issue when we try to print the model: we cannot print the `qscheme` information for a float16 weight (per-tensor and per-channel quantization schemes are only defined for int8 dynamic quantization).

Before this PR:
```
Traceback (most recent call last):
  File "dlrm_s_pytorch.py", line 860, in <module>
    print(dlrm)
  File "/home/jianyuhuang/miniconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1142, in __repr__
    mod_str = repr(module)
  File "/home/jianyuhuang/miniconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1142, in __repr__
    mod_str = repr(module)
  File "/home/jianyuhuang/miniconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1136, in __repr__
    extra_repr = self.extra_repr()
  File "/home/jianyuhuang/miniconda3/lib/python3.7/site-packages/torch/nn/quantized/dynamic/modules/linear.py", line 55, in extra_repr
    self.in_features, self.out_features, self.weight().qscheme()
RuntimeError: Could not run 'aten::qscheme' with arguments from the 'CPUTensorId' backend. 'aten::qscheme' is only available for these backends: [QuantizedCPUTensorId, VariableTensorId].
```

After this PR:
```
    (4): DynamicQuantizedLinear(
      in_features=2, out_features=1, dtype=torch.float16
      (_packed_params): LinearPackedParams()
    )
```
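
The fix presumably guards the qscheme lookup on the packed weight's dtype; a sketch of such an `extra_repr` for the dynamic Linear module (assumed shape, not the verbatim patch):
```
def extra_repr(self):
    extra_repr_str = 'in_features={}, out_features={}, dtype={}'.format(
        self.in_features, self.out_features, self._packed_params.dtype
    )
    # qscheme is only defined for quantized (int8) weights, so skip it
    # for float16 dynamic quantization.
    if self._packed_params.dtype == torch.qint8:
        extra_repr_str += ', qscheme={}'.format(self.weight().qscheme())
    return extra_repr_str
```
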
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36044

Differential Revision: D20860811

Pulled By: jianyuh

fbshipit-source-id: d1405a185f46a8110e6d27982b40534c854f4d1c
2020-04-05 14:27:42 -07:00
James Reed
812b1ad869 [quantization] FP16 dynamic quantized Linear
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/32331

Test Plan: Imported from OSS

Differential Revision: D19441158

Pulled By: jamesr66a

fbshipit-source-id: c04247ffe707be68718c486c31bc6c6040f7dc11
2020-01-27 15:45:32 -08:00
Jianyu Huang
0bebfe2143 Add the explicit per-tensor/per-channel quant info when we print the module (#30591)
Summary:
As the title says: we would like to explicitly distinguish per-tensor and per-channel schemes when printing the module.

Here is an example for LeNet after applying per-channel dynamic quantization:

Before this PR:
```
FloatModel(
  (conv1): Conv2d(1, 20, kernel_size=(5, 5), stride=(1, 1))
  (conv2): Conv2d(20, 50, kernel_size=(5, 5), stride=(1, 1))
  (fc1): DynamicQuantizedLinear(
    in_features=800, out_features=500
    (_packed_params): LinearPackedParams()
  )
  (fc2): DynamicQuantizedLinear(
    in_features=500, out_features=10
    (_packed_params): LinearPackedParams()
  )
)
```

After this PR:
```
FloatModel(
  (conv1): Conv2d(1, 20, kernel_size=(5, 5), stride=(1, 1))
  (conv2): Conv2d(20, 50, kernel_size=(5, 5), stride=(1, 1))
  (fc1): DynamicQuantizedLinear(
    in_features=800, out_features=500, qscheme=torch.per_channel_affine
    (_packed_params): LinearPackedParams()
  )
  (fc2): DynamicQuantizedLinear(
    in_features=500, out_features=10, qscheme=torch.per_channel_affine
    (_packed_params): LinearPackedParams()
  )
)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30591

Differential Revision: D18764366

Pulled By: jianyuh

fbshipit-source-id: e897ab42ace6b82b2a90729ba788313c7873de1a
2019-12-02 20:14:46 -08:00
James Reed
97fae401f0 Use LinearPackedParams everywhere
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30198

Test Plan: Imported from OSS

Differential Revision: D18628003

Pulled By: jamesr66a

fbshipit-source-id: 76ff0248fd859e805a15cde555d26dd2138636fa
2019-11-22 11:31:17 -08:00
Lingyi Liu
7d3afc4186 enable the per channel dynamic quantization (#30122)
Summary:
This PR enables per-channel (row-wise) dynamic quantization for the linear operator. Given that we have seen some accuracy drop with per-tensor quantization, we expect per-channel quantization to help recover accuracy.
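
A hedged usage sketch with the eager-mode API (qconfig name as shipped in current releases):
```
import torch
import torch.nn as nn
from torch.quantization import quantize_dynamic, per_channel_dynamic_qconfig

model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10)).eval()
# Per-channel (row-wise) weight quantization: one scale/zero_point per
# output row instead of one per tensor, which usually recovers most of
# the accuracy lost to per-tensor quantization.
qmodel = quantize_dynamic(
    model, qconfig_spec={nn.Linear: per_channel_dynamic_qconfig}
)
print(qmodel)
```
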
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30122

Differential Revision: D18630541

Pulled By: lly-zero-one

fbshipit-source-id: d52685deec5e7de46cd686ae649a8c8765b9cacf
2019-11-21 10:12:05 -08:00
Jianyu Huang
b1ea19ca17 Update the misleading comments for zero_points and scale in dynamic quant linear module [1/2] (#28767)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28767

The scale and zero_point are for the output activation tensor, not for the weight tensor. We removed them here because dynamic quantization does not need scales and zero points for output tensors.
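
A minimal repro of that point, assuming the eager model-level API:
```
import torch

model = torch.nn.Sequential(torch.nn.Linear(8, 4)).eval()
qmodel = torch.quantization.quantize_dynamic(model, {torch.nn.Linear})
# Weight qparams are baked into the packed weight; activation qparams
# are computed from each batch inside the op, so the module itself has
# no meaningful output scale/zero_point to store or print.
print(qmodel)
```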

ghstack-source-id: 92807318

Test Plan: CI

Differential Revision: D18164949

fbshipit-source-id: 0f9172bfef615c30dc28e1dd4448a9f3cc897c2e
2019-10-29 17:20:32 -07:00
Jianyu Huang
ef5a6b2262 Avoid the misleading zero_point and scale [2/2] (#28827)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28827

When we print the `DynamicLinear` module, we don't want to print the scale and zero point, as they are not needed for dynamic quantization.

Let's take the output of RoBERTa model as an example:

Before this PR:
```
      (19): TransformerEncoderLayer(
        (dropout): Dropout(p=0.1, inplace=False)
        (attention): MultiheadAttention(
          (dropout): Dropout(p=0.1, inplace=False)
          (input_projection): DynamicQuantizedLinear(in_features=1024, out_features=3072, scale=1.0, zero_point=0)
          (output_projection): DynamicQuantizedLinear(in_features=1024, out_features=1024, scale=1.0, zero_point=0)
        )
        (residual_mlp): ResidualMLP(
          (mlp): Sequential(
            (0): DynamicQuantizedLinear(in_features=1024, out_features=4096, scale=1.0, zero_point=0)
            (1): GeLU()
            (2): Dropout(p=0.1, inplace=False)
            (3): DynamicQuantizedLinear(in_features=4096, out_features=1024, scale=1.0, zero_point=0)
            (4): Dropout(p=0.1, inplace=False)
          )
        )
        (attention_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
      )
      (20): TransformerEncoderLayer(
        (dropout): Dropout(p=0.1, inplace=False)
        (attention): MultiheadAttention(
          (dropout): Dropout(p=0.1, inplace=False)
          (input_projection): DynamicQuantizedLinear(in_features=1024, out_features=3072, scale=1.0, zero_point=0)
          (output_projection): DynamicQuantizedLinear(in_features=1024, out_features=1024, scale=1.0, zero_point=0)
        )
        (residual_mlp): ResidualMLP(
          (mlp): Sequential(
            (0): DynamicQuantizedLinear(in_features=1024, out_features=4096, scale=1.0, zero_point=0)
            (1): GeLU()
            (2): Dropout(p=0.1, inplace=False)
            (3): DynamicQuantizedLinear(in_features=4096, out_features=1024, scale=1.0, zero_point=0)
            (4): Dropout(p=0.1, inplace=False)
          )
        )
        (attention_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
      )
```

After this PR:
```
      (19): TransformerEncoderLayer(
        (dropout): Dropout(p=0.1, inplace=False)
        (attention): MultiheadAttention(
          (dropout): Dropout(p=0.1, inplace=False)
          (input_projection): DynamicQuantizedLinear(in_features=1024, out_features=3072)
          (output_projection): DynamicQuantizedLinear(in_features=1024, out_features=1024)
        )
        (residual_mlp): ResidualMLP(
          (mlp): Sequential(
            (0): DynamicQuantizedLinear(in_features=1024, out_features=4096)
            (1): GeLU()
            (2): Dropout(p=0.1, inplace=False)
            (3): DynamicQuantizedLinear(in_features=4096, out_features=1024)
            (4): Dropout(p=0.1, inplace=False)
          )
        )
        (attention_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
      )
      (20): TransformerEncoderLayer(
        (dropout): Dropout(p=0.1, inplace=False)
        (attention): MultiheadAttention(
          (dropout): Dropout(p=0.1, inplace=False)
          (input_projection): DynamicQuantizedLinear(in_features=1024, out_features=3072)
          (output_projection): DynamicQuantizedLinear(in_features=1024, out_features=1024)
        )
        (residual_mlp): ResidualMLP(
          (mlp): Sequential(
            (0): DynamicQuantizedLinear(in_features=1024, out_features=4096)
            (1): GeLU()
            (2): Dropout(p=0.1, inplace=False)
            (3): DynamicQuantizedLinear(in_features=4096, out_features=1024)
            (4): Dropout(p=0.1, inplace=False)
          )
        )
        (attention_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
      )
```
ghstack-source-id: 92807317

Test Plan: CI

Differential Revision: D18197022

fbshipit-source-id: e41635330cfdfb008a0468d6a8ff67a06f7e1c59
2019-10-29 12:02:45 -07:00
Zafar Takhirov
dc8785a022 Refactoing names for consistency
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27670

Test Plan: Imported from OSS

Differential Revision: D17846269

Pulled By: z-a-f

fbshipit-source-id: ed3c7441c185bf11b2e62879aa3ecbc654aa2d4e
2019-10-16 12:18:26 -07:00
James Reed
4d7bec5f3e Improve repr for quantized modules
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27008

Test Plan: Imported from OSS

Differential Revision: D17649174

Pulled By: jamesr66a

fbshipit-source-id: e3e6c4bb31e1ad8ed1ebe27f803f90d564ecfe53
2019-09-28 15:15:14 -07:00
Dmytro Dzhulgakov
128a65e2e0 Use noop observer to pass dtype for dynamic quantization (#26709)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26709

Polishes the implementation from #25975. Primarily, we use NoopObserver to communicate that weights need to be quantized to float16. The very top-level API (quantize_dynamic) stays the same with the `dtype` argument, but the implementation follows the common flow.

One can argue that dynamic fp16 quantization doesn't really fit the 'observer' mechanism. It is indeed not ideal, but it's better to have the same flow than to branch on both dtype and qconfig.
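
A sketch of the equivalence described above (eager-mode APIs):
```
import torch
from torch.quantization import quantize_dynamic, float16_dynamic_qconfig

model = torch.nn.Sequential(torch.nn.Linear(16, 16)).eval()
# The dtype-based top-level API...
q1 = quantize_dynamic(model, {torch.nn.Linear}, dtype=torch.float16)
# ...maps to a qconfig whose weight "observer" is a NoopObserver that
# merely carries dtype=torch.float16 through the common flow.
q2 = quantize_dynamic(
    model, qconfig_spec={torch.nn.Linear: float16_dynamic_qconfig}
)
```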

Test Plan: Imported from OSS

Differential Revision: D17544103

Pulled By: dzhulgakov

fbshipit-source-id: 6af3f18c35929a1a53ea734079c005f656e4925f
2019-09-24 09:24:39 -07:00
Jerry Zhang
254122dd4e quantize_linear -> quantize_per_tensor (#26574)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26574

Since we also have `quantized::linear`, `quantize_linear` sounds
confusing, so we plan to rename it before the branch cut.
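
The renamed API in use (current spelling):
```
import torch

x = torch.randn(4)
# Formerly torch.quantize_linear(...); renamed to avoid confusion with
# the quantized::linear op.
xq = torch.quantize_per_tensor(x, scale=0.1, zero_point=0, dtype=torch.quint8)
```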

Test Plan:
ci

Imported from OSS

Differential Revision: D17514876

fbshipit-source-id: 01d9005e6ec8cb9950b9d8bba122109c389641d3
2019-09-20 21:58:48 -07:00
Supriya Rao
9d2d31e626 Store bias in PackedLinearWeight struct in fbgemm (#25428)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25428

Adds bias as an optional param to the quantized_linear_prepack function.
Bias is quantized at runtime using the input scale and weight scale.
ghstack-source-id: 89601399
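
A sketch of the arithmetic implied by the summary (assumed from the description, not the fbgemm source): bias is quantized to int32 with scale = input_scale * weight_scale and zero_point = 0:
```
import torch

input_scale, weight_scale = 0.02, 0.01
bias = torch.randn(8)
# int32 bias with scale = input_scale * weight_scale, zero_point = 0,
# so it adds directly into the int32 accumulator of the matmul.
bias_q = torch.quantize_per_tensor(
    bias, input_scale * weight_scale, 0, torch.qint32
)
```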

Test Plan: python test/run_test.py --exclude nn --verbose --bring-to-front quantization quantized quantized_tensor quantized_nn_mods quantizer

Differential Revision: D17121304

fbshipit-source-id: 8adb0e55e4aed0a5430aaa2c8639c8ad1639c85a
2019-09-06 08:37:34 -07:00
Jianyu Huang
6cf14361f4 Add the default_weight_observer for the dynamic quantization path (#24231)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24231

As suggested in https://github.com/pytorch/pytorch/pull/23128#discussion_r309528932, we will add a default weight observer for the dynamic quantization path.

We need to move `observer` and `qconfig` to a separate namespace.
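
A quick way to inspect the default weight observer this introduces (using present-day import paths):
```
from torch.quantization import default_dynamic_qconfig

# The dynamic qconfig carries a real weight observer (per-tensor
# symmetric int8 by default); activation qparams are computed at runtime.
weight_observer = default_dynamic_qconfig.weight()
print(weight_observer)
```
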
ghstack-source-id: 88583658

Differential Revision: D16781092

fbshipit-source-id: 5cd59c881a7f98b82704ca318b1e63650d73062a
2019-08-19 14:54:22 -07:00
James Reed
a1b111709d Assert weight_observer has the correct dtype
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/24436

Test Plan: Imported from OSS

Differential Revision: D16847141

Pulled By: jamesr66a

fbshipit-source-id: 1dde5c26449115b53e71d410b41204d743787c44
2019-08-15 19:40:14 -07:00
Jerry Zhang
754bf383b1 Change return type of observer to two tensors (#24339)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24339

Att
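
In other words, `calculate_qparams()` now returns a `(scale, zero_point)` pair of tensors; a minimal check:
```
import torch
from torch.quantization import MinMaxObserver

obs = MinMaxObserver()
obs(torch.randn(10))
# After this change, calculate_qparams() returns two tensors.
scale, zero_point = obs.calculate_qparams()
```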

Differential Revision: D16820813

fbshipit-source-id: 3e7301f1700176e19f46e8677a644ba167209254
2019-08-15 10:26:44 -07:00
James Reed
a919fc3704 test {__init__,from_float} on nnq{,d}.Linear
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/24364

Test Plan: Imported from OSS

Reviewed By: zdevito

Differential Revision: D16812543

Pulled By: jamesr66a

fbshipit-source-id: be05a658fa4562f3fcf3548e30b1fe9a77d1151c
2019-08-14 17:42:23 -07:00
Jianyu Huang
e94ba742b0 Dynamic Quantized Linear Module (#23128)
Summary:
- ~~Add a unit test for the Dynamic Quantized Linear operator (```torch.fbgemm_linear_quantize_weight```, ```torch.fbgemm_pack_quantized_matrix```, and ```torch.fbgemm_linear_int8_weight```) in ```test_quantized.py```.~~ Moved to D16404027 for a separate review.
- Add the Dynamic Quantized Linear module in ```torch/nn/quantized/modules/linear.py```. ~~This is in a rudimentary stage. Will add more functions later~~.
- Add the torch.quantize logic (prepare, eval, convert) for dynamic quantization.
- Add a unit test for the Dynamic Quantized Linear module in ```test_nn_quantized.py```.
- Add a unit test for the Model-level Quantization API

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23128
ghstack-source-id: 88257232

Differential Revision: D16258664

fbshipit-source-id: 4be3ac39ee27c088b341c741d3f09f51d5a23ef0
2019-08-13 21:01:23 -07:00