Commit Graph

829 Commits

Author SHA1 Message Date
Lingyi Liu
68758b2fa0 Add the quantized batch_norm3d and also batch_norm3d fused with relu operators (#34702)
Summary:
As titled, for bringing up the quantized video model. Will add the batch_norm_relu test in another PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34702

Differential Revision: D20436092

Pulled By: lly-zero-one

fbshipit-source-id: 116bd306f7880bfd763d8575654fbd6c92818338
2020-03-13 20:30:28 -07:00
Jerry Zhang
90ca7a1feb [quant][graphmode] Add Finalize function that inlines graph and produce quantized ops (#33927)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33927

Test Plan:
test will be added in later PRs

Imported from OSS

Differential Revision: D20354879

fbshipit-source-id: 03976f4b86c46dbdc4e45764a1e72f1a3855a404
2020-03-12 14:52:58 -07:00
Supriya Rao
434af5d94a [quant] Speed up per-channel min-max observer (#34118)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34118

Previously calc_per_channel_qparams used for loops and Python primitives, which called `item` many times and caused a slowdown during training. These changes use torch primitives on the tensor to speed up the operation by over 60x.
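For illustration, a minimal sketch of the vectorized pattern (the helper name `per_channel_min_max` is made up; this is not the code from this PR):

```python
import torch

def per_channel_min_max(x: torch.Tensor, ch_axis: int = 0):
    # Move the channel axis to the front and flatten the rest, then reduce
    # once with tensor ops instead of looping over channels and calling .item().
    flat = x.transpose(0, ch_axis).reshape(x.size(ch_axis), -1)
    return flat.min(dim=1).values, flat.max(dim=1).values

# e.g. per-channel ranges of a conv weight of shape (out_channels, in_channels, kH, kW)
mins, maxs = per_channel_min_max(torch.randn(8, 3, 3, 3), ch_axis=0)
```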

    Perf results on MobileNetV2 during training using autograd profiler

    FP32 forward call -
    Self CPU time total: 47.222ms
    CUDA time total: 124.001ms

    before change
    FakeQuant Model -
    Self CPU time total: 19.107s
    CUDA time total: 27.177s

    after change
    FakeQuant Model -
    Self CPU time total: 404.667ms
    CUDA time total: 446.344ms

Test Plan:
python test/test_quantization.py

Imported from OSS

Differential Revision: D20287841

fbshipit-source-id: 6b706b8206e0d0da3c3c217b014e8da5b71b870d
2020-03-05 18:29:41 -08:00
Supriya Rao
1cf12b7e53 [quant] Fix histogram observer to work with QAT on GPU (#34232)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34232

By default `torch.zeros` creates the tensor on CPU. Need to specify the device argument to get it to work correctly on GPU during QAT.
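A minimal sketch of the device-aware pattern (illustrative only, not the exact code in this diff):

```python
import torch

def make_histogram(x: torch.Tensor, bins: int = 2048) -> torch.Tensor:
    # Allocate the observer's histogram state on the same device as the
    # observed tensor instead of relying on the default device.
    return torch.zeros(bins, device=x.device)
```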

Test Plan:
1. Tested by running QAT on GPU

2. python test/test_quantization.py

Imported from OSS

Differential Revision: D20286351

fbshipit-source-id: 745723c85d902870c56c1c7492f26cb027ae9dc6
2020-03-05 17:19:12 -08:00
Dmytro Dzhulgakov
a8fc3d8c2a Fix HistogramObserver to not do detach on input (#34114)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/33545, added a unittest
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34114

Differential Revision: D20224719

Pulled By: dzhulgakov

fbshipit-source-id: 053d3b3b0c86340027ba1b95b5f3c247aa151aee
2020-03-03 13:15:22 -08:00
Gao, Xiang
45e4b614d1 Per channel quantization performance improvement (#33772)
Summary:
Benchmark:
NVIDIA GTX 1650 + AMD Ryzen Threadripper 3970X
```python
import torch
print(torch.__version__)

for i in range(1000):
    torch.randn(1024 * 128, device='cuda')

def cuda(e):
    a = torch.randn(2 ** e, 32, device='cuda')
    s = torch.randn(32, device='cuda')
    z = torch.randn(32, device='cuda')
    torch.cuda.synchronize()
    %timeit torch.fake_quantize_per_channel_affine(a, s, z, 1, -999, 999); torch.cuda.synchronize()

def cpu(e):
    a = torch.randn(2 ** e, 32, device='cpu')
    s = torch.randn(32, device='cpu')
    z = torch.randn(32, device='cpu')
    %timeit torch.fake_quantize_per_channel_affine(a, s, z, 1, -999, 999);

for i in range(10, 24):
    cuda(i)
print()
for i in range(10, 32):
    cpu(i)
```
Before
```
1.5.0a0+9bc922d
849 µs ± 44.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
817 µs ± 30.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
814 µs ± 2.93 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.11 ms ± 1.32 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.19 ms ± 4.19 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.6 ms ± 5.58 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
2.44 ms ± 14.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
4.14 ms ± 2.55 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
7.41 ms ± 2.46 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
13.9 ms ± 2.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
26.9 ms ± 254 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
52.6 ms ± 260 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
104 ms ± 176 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
207 ms ± 1.24 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

249 µs ± 158 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
420 µs ± 230 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
766 µs ± 391 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.45 ms ± 574 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
2.84 ms ± 34.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
5.69 ms ± 83 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
7.29 ms ± 2.58 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
7.32 ms ± 13.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
17.4 ms ± 38.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
47.5 ms ± 264 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
187 ms ± 1.19 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
379 ms ± 5.05 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
652 ms ± 11.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
1.22 s ± 4.58 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
2.34 s ± 8.77 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
4.56 s ± 7.15 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
8.97 s ± 33.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
17.8 s ± 32.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
35.2 s ± 167 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```
After
```
1.5.0a0+a7ec8cc
92.5 µs ± 2.03 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
97.7 µs ± 469 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
109 µs ± 4.73 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
119 µs ± 6.17 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
146 µs ± 1.84 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
211 µs ± 2.45 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
347 µs ± 4.18 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
624 µs ± 14.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.17 ms ± 16.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
2.25 ms ± 48.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
4.43 ms ± 220 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
8.51 ms ± 44.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
16.9 ms ± 30.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
33.7 ms ± 7.64 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

201 µs ± 234 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
285 µs ± 465 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
287 µs ± 214 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
287 µs ± 221 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
287 µs ± 761 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
347 µs ± 399 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
675 µs ± 213 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.34 ms ± 643 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
4.82 ms ± 34.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
10.7 ms ± 88.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
20.3 ms ± 25.6 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
39.4 ms ± 242 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
78.8 ms ± 2 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
153 ms ± 786 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
285 ms ± 911 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
541 ms ± 1.09 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
1.03 s ± 1.67 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
1.97 s ± 8.59 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
3.81 s ± 10.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```

Fixes https://github.com/pytorch/pytorch/issues/33647
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33772

Differential Revision: D20112531

Pulled By: ngimel

fbshipit-source-id: f90e3ef1b5be8276851637f3e1251cb8f1af411f
2020-02-26 10:19:25 -08:00
Supriya Rao
996c0adb53 [quant] Regsiter fake_quant and observer attributes as buffers (#33626)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33626

For DDP we require the attributes to be registered as buffers. By doing this, the value is broadcast from one device to the rest.
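A minimal sketch of the pattern, using a made-up `ObserverSketch` class rather than the actual observer code:

```python
import torch
import torch.nn as nn

class ObserverSketch(nn.Module):
    def __init__(self):
        super().__init__()
        # Buffers (unlike plain Python attributes) are part of the module's
        # state and get broadcast by DistributedDataParallel from one replica
        # to the others.
        self.register_buffer('min_val', torch.tensor(float('inf')))
        self.register_buffer('max_val', torch.tensor(float('-inf')))
```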

Test Plan:
Tested on actual model on GPU

Imported from OSS

Differential Revision: D20038839

fbshipit-source-id: 82e829fc3baca0b3262c3894a283c375eb08a4a4
2020-02-24 14:16:03 -08:00
Raghuraman Krishnamoorthi
243cc20451 Enable inplace relu fusion for training (#33105)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33105

Support inplace relu for Conv+BN+Relu fusion during training.
ghstack-source-id: 97944659

Test Plan: buck test caffe2/test:quantization --  'test_fuse_module_train \(test_quantization\.FusionTest\)' --print-passing-details

Differential Revision: D19795221

fbshipit-source-id: 056dc06050d145750c4d0044c0fc1c3febcfdafc
2020-02-14 12:15:58 -08:00
Supriya Rao
2e88d3d703 [quant] Add Quantized BatchNorm2d module (#33109)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33109

Test Plan:
python test/test_quantized_nn_mods.py ModuleAPITest.test_batch_norm

Imported from OSS

Differential Revision: D19861926

fbshipit-source-id: 67315e49b4b3577b965d422ca707d927d977feeb
2020-02-13 12:15:43 -08:00
Jerry Zhang
8ddd5bb0e9 Don't serialize None values in observer (#32733)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32733

Similar to https://github.com/pytorch/pytorch/pull/32318, we should stop serializing None values since they can't be broadcasted

Test Plan: Imported from OSS

Differential Revision: D19611586

Pulled By: jerryzh168

fbshipit-source-id: 369881de0567ed8eb25bdada892227f49bb5b29d
2020-01-31 13:28:43 -08:00
James Reed
812b1ad869 [quantization] FP16 dynamic quantized Linear
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/32331

Test Plan: Imported from OSS

Differential Revision: D19441158

Pulled By: jamesr66a

fbshipit-source-id: c04247ffe707be68718c486c31bc6c6040f7dc11
2020-01-27 15:45:32 -08:00
Jerry Zhang
91f10a1de1 [quant][graphmode][refactor] Better API for fold_convbn (#32380)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32380

We'll clone the module first and then fold conv bn and return a new
module

Test Plan:
.

Imported from OSS

Differential Revision: D19508033

fbshipit-source-id: 328e91a2c9420761c904a7f2b62dab4cfaaa31ac
2020-01-24 15:46:47 -08:00
Jerry Zhang
d2bda53f6d [quant][graphmode] Call _jit_pass_dedup_module_ueses in quantize_script (#32303)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32303

att

Test Plan:
.

Imported from OSS

Differential Revision: D19508029

fbshipit-source-id: 468ed53fc8bb3c8fdf5d79aea186949e64be711a
2020-01-24 13:34:40 -08:00
Jerry Zhang
fe3eb09da5 [quant] Re-enable fold_convbn in quantize_script (#32302)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32302

att

Test Plan:
.

Imported from OSS

Differential Revision: D19508035

fbshipit-source-id: 2ac26585396ec8a115acd0e1d7ccb84098a76824
2020-01-24 13:03:53 -08:00
Jerry Zhang
583bb97618 [quant][graphmode] Default to non-inplace in graph mode quantization API (#32204)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32204

att

Test Plan:
.

Imported from OSS

Differential Revision: D19508030

fbshipit-source-id: 94814c3c126a196f3938f944abfa5ae2a24d8dde
2020-01-23 10:39:46 -08:00
Jerry Zhang
8c1268aad3 Use default scale/zero_point in fake_quantize module instead of None (#32318)
Summary:
Distributed data parallel cannot broadcast None, so when we prepare the model for QAT and try to save it, it errors out.
fixes: https://github.com/pytorch/pytorch/issues/32082
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32318

Differential Revision: D19434801

Pulled By: jerryzh168

fbshipit-source-id: ee70abe4c3dcdd3506fb7dd0316aee2fb1705469
2020-01-17 11:04:08 -08:00
Jerry Zhang
f995ec2076 Remove qconfig_dict in top level eager mode quantization API (#31972)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31972

Since eager mode quantization requires many user modifications, we can't
consistently quantize a given model by just changing qconfig_dict, so the
top-level `qconfig_dict` argument is not that useful.
fixes: https://github.com/pytorch/pytorch/issues/31549

Test Plan:
.

Imported from OSS

Differential Revision: D19330691

fbshipit-source-id: 8aee6e5249e0c14e8a363ac1a83836e88887cd7d
2020-01-10 11:04:37 -08:00
Jerry Zhang
3a02ed822b Remove insert_prepack_unpack and fold_prepack for now (#30909)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30909

`fold_prepack` doesn't work anymore after we changed `scale` and `zero_point`
to be attributes, and since the freeze API is coming up, I don't want to
spend time making this work, as it will be thrown away later.

Test Plan:
.

Imported from OSS

Differential Revision: D18864537

fbshipit-source-id: 649e6b91f2b04b8babacc0afb6bc1530ed7259d3
2019-12-12 07:44:31 -08:00
Michael Suo
62b10721fb Actually make flake8 do something (#30892)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30892

Fixes all outstanding lints and actually installs a properly configured
flake8

Test Plan: Imported from OSS

Differential Revision: D18862825

Pulled By: suo

fbshipit-source-id: 08e9083338a7309272e17bb803feaa42e348aa85
2019-12-06 17:50:50 -08:00
Jerry Zhang
7023e13fbb Fix mapping white list (#30636)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30636

Currently DeQuantStub is still in the whitelist because set union has
lower precedence than set difference.
Fixes issue: https://github.com/pytorch/pytorch/issues/29646
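A toy illustration of the precedence pitfall (the values are made up, not the real mapping):

```python
base = {"Linear", "Conv2d", "DeQuantStub"}
extra = {"ReLU"}

# '-' binds tighter than '|', so DeQuantStub is never removed here:
wrong = base | extra - {"DeQuantStub"}     # == base | (extra - {"DeQuantStub"})
right = (base | extra) - {"DeQuantStub"}   # DeQuantStub actually excluded

print("DeQuantStub" in wrong)   # True
print("DeQuantStub" in right)   # False
```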

Test Plan:
verified locally that we don't attach qconfig for DeQuantStub

Imported from OSS

Differential Revision: D18775275

fbshipit-source-id: 8da07e40963555671b3d4326c9291706103f858e
2019-12-03 11:34:28 -08:00
Raghuraman Krishnamoorthi
eccf42fd15 Bug fix: Handle missing keys in observer state dict during load (#30357)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30357

Fix issue https://github.com/pytorch/pytorch/issues/29032 in loading from state dict for observers and fake quant.
ghstack-source-id: 94468814

Test Plan: Ensures that load/save of fake quant and observers with missing keys works correctly.

Differential Revision: D18668517

fbshipit-source-id: 0eda6f47c39102e55977fc548b9a03664f123ad7
2019-11-26 06:53:45 -08:00
Jerry Zhang
661a6c8ef2 Add get_qparams and revert the changes to calculate_qparams (#30262)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30262

`get_qparams` returns all the parameters needed to call the quantize function

Test Plan:
python test/test_jit.py

Imported from OSS

Differential Revision: D18645047

fbshipit-source-id: e57c11a66dac2d589778d412a996796ad5b6f86a
2019-11-26 06:53:26 -08:00
Xiaomeng Yang
c12f9a12a8 Fix quantized ConvReLU3d test (#30266)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30266

Fix quantized ConvReLU3d test

Test Plan: buck test mode/dev-nosan //caffe2/test:quantized -- "conv"

Reviewed By: hl475

Differential Revision: D18645717

fbshipit-source-id: bbe93f9daf5046f2aa05363efc7d0e59eaff37bf
2019-11-25 14:52:32 -08:00
Chris Gottbrath
7c4b9042ab Updates to quantization documentation (#30288)
Summary:
This pull request includes fixes for six quantization doc bugs.

https://github.com/pytorch/pytorch/issues/30283 - Rendering issue on QConfig
https://github.com/pytorch/pytorch/issues/26305 - Minor doc issue on fuse_modules()
https://github.com/pytorch/pytorch/issues/27451 - Issues with ConvReLU2d, ConvReLU3d, and LinearReLU doc issues
https://github.com/pytorch/pytorch/issues/26899 - Missing docstrings in torch.nn.intrinsic fused functions
https://github.com/pytorch/pytorch/issues/29735 - add discussion of QNNPack to quantization doc page
https://github.com/pytorch/pytorch/issues/27938 - some of the quantized functions lack documentation
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30288

Differential Revision: D18653368

Pulled By: gottbrath

fbshipit-source-id: 410b3dd81ff10909a7f1a7736ca42d7cabf0beb1
2019-11-23 09:29:30 -08:00
James Reed
97fae401f0 Use LinearPackedParams everywhere
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30198

Test Plan: Imported from OSS

Differential Revision: D18628003

Pulled By: jamesr66a

fbshipit-source-id: 76ff0248fd859e805a15cde555d26dd2138636fa
2019-11-22 11:31:17 -08:00
Lingyi Liu
7d3afc4186 enable the per channel dynamic quantization (#30122)
Summary:
This PR enables per-channel (row-wise) dynamic quantization for the linear operator. Given that we have seen some accuracy drop with per-tensor quantization, we expect per-channel quantization to help improve accuracy.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30122

Differential Revision: D18630541

Pulled By: lly-zero-one

fbshipit-source-id: d52685deec5e7de46cd686ae649a8c8765b9cacf
2019-11-21 10:12:05 -08:00
Raghuraman Krishnamoorthi
67b77afcdf Fast histogram observer
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29790

Test Plan:
import torch
import time
import numpy as np
from torch.quantization.observer import HistogramObserver

X = torch.randn(1, 1, 224, 224)

obs = HistogramObserver(2048)
acc_time = 0
# Time 100 forward passes through the histogram observer.
for i in range(100):
    X = torch.randn(10, 1, 320, 320)
    start = time.time()
    obs(X)
    # obs.forward_new(X)
    acc_time = acc_time + time.time() - start
print(acc_time)

Imported from OSS

Differential Revision: D18508562

fbshipit-source-id: 456e82360ce1b3f9d8b6e1832d23f1339655011a
2019-11-20 11:14:41 -08:00
Jerry Zhang
f2b851a9e5 Returning axis from calculate_qparams (#29494)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29494

`calculate_qparams` for per-channel quantization should return the axis; this
PR adds that and also adds the corresponding support in graph mode

Test Plan:
python test/test_jit.py

Imported from OSS

Differential Revision: D18580905

fbshipit-source-id: f9691c1f043f8bca39f81716a4d0b10f60a65396
2019-11-20 11:06:48 -08:00
Jerry Zhang
b2291d4600 Make PerChannelMinMaxObserver scriptable using torch.jit.ignore (#29416)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29416

att

Test Plan:
python test/test_quantization.py

Imported from OSS

Differential Revision: D18580906

fbshipit-source-id: 5370300b89e26c2b4662b17e51284e8708cb5843
2019-11-19 19:12:55 -08:00
Vitaly Fedyunin
877c96cddf explicitly provide memory format when calling to *_like operators
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30008

Test Plan: Imported from OSS

Differential Revision: D18575981

Pulled By: VitalyFedyunin

fbshipit-source-id: ec3418257089ad57913932be1a8608cd20ce054c
2019-11-19 16:19:29 -08:00
Xiaomeng Yang
510ef4b63a Add nn.quantized.Conv3d (#29813)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29813

Add nn.quantized.Conv3d

Test Plan: buck test mode/dev-nosan //caffe2/test:quantized -- "conv"

Reviewed By: jianyuh

Differential Revision: D18467749

fbshipit-source-id: 892f708179e9e836ad902851ac1838847009da15
2019-11-15 04:33:40 -08:00
Zafar Takhirov
09d359dfd9 Changed default args in quantization observers
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29640

Test Plan: Imported from OSS

Differential Revision: D18447297

Pulled By: z-a-f

fbshipit-source-id: 7c86a5bb467a2fad8fe30c935d9c031c69868296
2019-11-12 23:32:05 -08:00
Jianyu Huang
bbff06ee96 Convert conv_prepack to conv2d_prepack and conv_unpack to conv2d_unpack (#29529)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29529

Pull Request resolved: https://github.com/pytorch/glow/pull/3771

We would like to replace `conv_prepack` with `conv2d_prepack` and  `conv_unpack` with `conv2d_unpack`.

This makes the naming consistent between 2D and 3D conv:
```
torch.ops.quantized.conv2d_prepack
torch.ops.quantized.conv2d_unpack
torch.ops.quantized.conv2d
torch.ops.quantized.conv3d_prepack
torch.ops.quantized.conv3d_unpack
torch.ops.quantized.conv3d
```

For better engineering, we should do this sooner rather than later, before the quantized conv2d ops gain more users.

The replacement bash commands are as follows:
```
find ./ -type f -exec sed -i -e 's/quantized::conv_prepack/quantized::conv2d_prepack/g' {} \;
find ./ -type f -exec sed -i -e 's/quantized::conv_unpack/quantized::conv2d_unpack/g' {} \;
find ./ -type f -exec sed -i -e 's/torch.ops.quantized.conv_prepack/torch.ops.quantized.conv2d_prepack/g' {} \;
find ./ -type f -exec sed -i -e 's/torch.ops.quantized.conv_unpack/torch.ops.quantized.conv2d_unpack/g' {} \;
```
ghstack-source-id: 93661879

Test Plan: CI

Reviewed By: jackm321

Differential Revision: D18421079

fbshipit-source-id: 17ae8b1ee79223bd2c5d4bbccd57af6580c4ab12
2019-11-11 21:54:10 -08:00
Jerry Zhang
4bcf4796aa Make HistogramObserver scriptable with @torch.jit.ignore (#27950)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27950

att

Test Plan:
python test/test_quantization.py

Imported from OSS

Differential Revision: D18360139

fbshipit-source-id: 5459ae49c087886e4990de136198773a75b1c572
2019-11-07 18:02:44 -08:00
Jerry Zhang
5ac3df7712 Minor fix and turn off fold_convbn (#27403)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27403

In the fold_convbn pass, we need to recompute the parameters (weight, bias) for
conv, update the attributes of conv, and update the access of bias in conv,
because if the original conv has no bias, the `self.bias` access will be
inlined and replaced by the constant node `None = prim::Constant()`; we need to
update this to use `GetAttr[name="bias"]` to make this work. But there is
also some work going on to handle constants, so we'll fix this pass after
that is done.

Test Plan:
.

Imported from OSS

Differential Revision: D18182918

fbshipit-source-id: bba510bc41ab58e0eb76f7b77335b6e3ffe2862d
2019-11-01 12:15:38 -07:00
Jerry Zhang
0eeda56632 Add nn.ReLU6 to default mapping (#28516)
Summary:
https://discuss.pytorch.org/t/quantized-hard-sigmoid/59013
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28516

Differential Revision: D18128717

Pulled By: jerryzh168

fbshipit-source-id: 4d06d1b54cf9f84a610d79fbadde2c8ef38c33f8
2019-10-25 14:52:44 -07:00
なるみ
d83389d327 Ignore F401 in all __init__.py without putting noqa (#25823)
Summary:
By adding `per-file-ignores = __init__.py: F401` to `.flake8` with `flake8>=3.7`, we can ignore F401 in all `__init__.py` files without putting `# noqa: F401` on each line.

http://flake8.pycqa.org/en/latest/user/options.html?highlight=per-file-ignores#cmdoption-flake8-per-file-ignores
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25823

Differential Revision: D17252182

Pulled By: soumith

fbshipit-source-id: 87b174075b79e4078953a7521bd1a8f82405646b
2019-10-23 15:28:13 -07:00
Jerry Zhang
e280f93e31 Prepack folding for conv2d (#27119)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27119

att

Test Plan:
python test/test_jit.py 'TestJit.test_fold_prepack'

Imported from OSS

Differential Revision: D17717636

fbshipit-source-id: 97e9f8d927f7eacedf09f47b8ae1bf8216b8cad4
2019-10-23 09:03:14 -07:00
Raghuraman Krishnamoorthi
94757e035d Do not insert observers for empty sequential modules (#28384)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28384

ghstack-source-id: 92340259

Test Plan:
buck test caffe2/test:quantization -- 'test_fusion_sequential_model_train \(test_quantization\.FusionTest\)' --print-passing-details

 buck test caffe2/test:quantization -- 'test_fusion_sequential_model_eval \(test_quantization\.FusionTest\)' --print-passing-details

Differential Revision: D18047293

fbshipit-source-id: 7e18b1aa76cc0fd26e8ee48a70c3a45688e73549
2019-10-21 20:32:13 -07:00
Zafar Takhirov
783c9c8445 Adding docstring to the observers (#27791)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27791

This is the first part of the change. The next ones will amend more :)

Test Plan: Imported from OSS

Differential Revision: D17889913

Pulled By: z-a-f

fbshipit-source-id: ff74007903dd789d4c68684e83b50c0c86a25149
2019-10-21 19:09:50 -07:00
Zafar Takhirov
07b5666a87 Add default arg to prepare_qat mapping. (#28193)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28193

Fixes #28015

Test Plan: Imported from OSS

Differential Revision: D17973121

Pulled By: z-a-f

fbshipit-source-id: 03b3f70c70b89060c1f03d7ed8ab6002fe60bd49
2019-10-17 14:11:54 -07:00
Zafar Takhirov
a5ac7f6387 Changing observer name
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27779

Test Plan: Imported from OSS

Differential Revision: D17886605

Pulled By: z-a-f

fbshipit-source-id: 68c50b482e65015336ff27171fd730da493525b6
2019-10-17 11:36:03 -07:00
Zafar Takhirov
dc8785a022 Refactoing names for consistency
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27670

Test Plan: Imported from OSS

Differential Revision: D17846269

Pulled By: z-a-f

fbshipit-source-id: ed3c7441c185bf11b2e62879aa3ecbc654aa2d4e
2019-10-16 12:18:26 -07:00
zou3519
e5d6b75319 Bag of documentation fixes; fix more sphinx warnings (#27850)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27850

Many of these are real problems in the documentation (i.e., link or
bullet point doesn't display correctly).

Test Plan: - built and viewed the documentation for each change locally.

Differential Revision: D17908123

Pulled By: zou3519

fbshipit-source-id: 65c92a352c89b90fb6b508c388b0874233a3817a
2019-10-15 07:31:14 -07:00
zou3519
23bffc4f14 Fix most documentation warnings (#27782)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27782

Warnings show up when running `make html` to build documentation. All of
the warnings are very reasonable and point to bugs in our docs. This PR
attempts to fix most of those warnings.

In the future we will add something to the CI that asserts that there
are no warnings in our docs.

Test Plan: - build and view changes locally

Differential Revision: D17887067

Pulled By: zou3519

fbshipit-source-id: 6bf4d08764759133b20983d6cd7f5d27e5ee3166
2019-10-13 10:34:01 -07:00
Michael Suo
341262754f module dedupe (#26666)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26666

Changes:
- Introduce a `ConcreteModuleType` concept. This acts both as the key into the type
  cache, and as the source of truth for `ModuleValue::attr` queries. It needs
  to do both jobs because that's how we ensure correctness (if the types are
  different, it's because `ModuleValue::attr` would return different things).
- Now `recursive_script` will first construct a `ConcreteModuleType` and search for a
  pre-existing type before starting compilation.
- All previous paths to creating a `ScriptModule` (including inheriting from
  `ScriptModule`) are now rewritten to go through `create_script_module`, so
  that we have only a single place where construction happens.

Behavioral changes:
- Big change to `torch.jit.ScriptModule` inheritance: all attributes are now
  recursively scripted if possible, matching recursive scripting semantics.
  This makes it hard to keep something from being scripted (for example, a
  Python submodule). Possibly we'll need an `ignore()` type thing for
  attributes. In particular, this adds `self.training` to *every* ScriptModule, since
  it's present on every `nn.Module`.
- I believe this change to be transparent to existing users of the inheritance API, since if you had an attribute that is unscriptable that you never used, there is no error. In some cases, we will create new attributes (even if they are unused), which will increase serialized model size from before.

Test Plan: Imported from OSS

Differential Revision: D17551196

Pulled By: suo

fbshipit-source-id: b476d1c9feb3ddfd63406d90989aaf9dfe890591
2019-10-12 09:51:57 -07:00
Chris Gottbrath
a96b003b39 docstring only formatting changes: quantize.py, fake_quantize.py, observer.py
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27415

Reviewed By: zafartahirov

Differential Revision: D17783101

Pulled By: gottbrath

fbshipit-source-id: a7acbc55edfaa75fdbd17fd30d530710a401b22f
2019-10-08 09:21:03 -07:00
davidriazati
0046092178 Reduce special casing around 'training' (#27109)
Summary:
Most of this was old cruft left over from special handling of `training` before we had a `bool` type. This makes all modules have a `training` attribute that is true by default and removes all other special handling.

Fixes #26884
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27109

Pulled By: driazati

Differential Revision: D17728129

fbshipit-source-id: 8ddc9fbb07a953dd05529538bfdd01ed88b5cb57
2019-10-07 13:52:59 -07:00
Raghuraman Krishnamoorthi
ac0f18437f MovingAverage Observer (#27396)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27396

Observer that estimates moving averages of the per-batch min and max values; better suited for quantization-aware training than min/max observers, which track extremal values across batches.
ghstack-source-id: 91369018
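A minimal sketch of the moving-average update (the constant name and value are illustrative, not necessarily what this diff uses):

```python
import torch

averaging_constant = 0.01  # illustrative smoothing factor

def update_range(min_val: torch.Tensor, max_val: torch.Tensor, x: torch.Tensor):
    # Exponential moving average of the per-batch extrema, instead of
    # tracking the global min/max across all batches.
    batch_min, batch_max = x.min(), x.max()
    min_val = min_val + averaging_constant * (batch_min - min_val)
    max_val = max_val + averaging_constant * (batch_max - max_val)
    return min_val, max_val
```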

Test Plan:
buck test caffe2/test:quantization -- 'test_per_tensor_observers \(test_quantization\.ObserverTest\)' --print-passing-details

buck test caffe2/test:quantization -- 'test_per_channel_observers \(test_quantization\.ObserverTest\)' --print-passing-details

Differential Revision: D17727213

fbshipit-source-id: 024a890bf3dd0bf269d8bfe61f19871d027326f0
2019-10-04 16:28:59 -07:00
Zafar Takhirov
6bb7433ad5 Replacing the skip_list with white_list in the qconfig propagation
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27183

Test Plan: Imported from OSS

Differential Revision: D17700548

Pulled By: zafartahirov

fbshipit-source-id: 18e6ffbda496b14ac1da1783f928ad539cdb1d16
2019-10-03 20:40:17 -07:00
Zafar Takhirov
111da77912 Factored out the default mappings
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27164

Test Plan: Imported from OSS

Differential Revision: D17694475

Pulled By: zafartahirov

fbshipit-source-id: df8df5f7d66062ed35da957064a31344e1d3c961
2019-10-03 11:52:21 -07:00
James Reed
a423817055 Fix reprs for _intrinsic modules
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27184

Test Plan: Imported from OSS

Differential Revision: D17717481

Pulled By: jamesr66a

fbshipit-source-id: 4bd72bcd42191d9b21d03f5bb6698198dbffffda
2019-10-02 19:55:49 -07:00
James Reed
1affa7c32c Allow set for qconfig for dynamic_quantize
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27181

Test Plan: Imported from OSS

Differential Revision: D17717482

Pulled By: jamesr66a

fbshipit-source-id: f3930fc87831cbdcf4390cd769c594bb13f5cd81
2019-10-02 19:55:45 -07:00
Zafar Takhirov
27dc595215 Rename _intrinsic to intrinsic
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27194

Test Plan: Imported from OSS

Differential Revision: D17704957

Pulled By: zafartahirov

fbshipit-source-id: 46f02d129aa77c3047b2a6c606bfadd831a6b0fc
2019-10-02 18:53:06 -07:00
Raghuraman Krishnamoorthi
4abfb5493e Handle uninitialized min/max values in histogram observer (#27151)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27151

We need to be able to handle observers with no min/max data correctly, as models sometimes have modules that do not receive any data.
ghstack-source-id: 91113403

Test Plan:
buck test caffe2/test:quantization -- test_minmax_observer

buck test caffe2/test:quantization -- test_per_channel_minmax_observer

buck test caffe2/test:quantization -- test_histogram_observer

Reviewed By: csummersea

Differential Revision: D17690828

fbshipit-source-id: e95709333ea0f66d79ddb8141b7cba5a83347dbd
2019-10-01 14:56:37 -07:00
Jerry Zhang
98c02e6df3 Enable tests (#27103)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27103

att

Test Plan:
python test/test_quantization.py 'GraphModePostTrainingQuantTest'

Imported from OSS

Differential Revision: D17678261

fbshipit-source-id: 5caa7512c6ff4a613980c86b5b221e0cfbe0a173
2019-10-01 12:10:21 -07:00
Jerry Zhang
f742ceaa46 API - add more passes to graph mode (#27093)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27093

Add `insert_prepack_unpack` and `fold_prepack` to `convert_script`

Test Plan:
.

Imported from OSS

Differential Revision: D17678262

fbshipit-source-id: 4bfd6681af6fce226cc77aed8dd84066cbd8ed17
2019-10-01 11:26:02 -07:00
Raghuraman Krishnamoorthi
dddae3f854 Fuse module enhancements (#26457)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26457

Enhancement to the fuse module to support Sequentials; the fuse list can now use state_dict-style names.
Also add support for Conv-ReLU and Linear-ReLU fusion.
Also support inplace and out-of-place fusion of models.
ghstack-source-id: 91076386
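A hedged usage sketch of the enhanced API (the model and fuse list below are made up; only the `fuse_modules` call itself is assumed from the description above):

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8), nn.ReLU(),
    nn.Linear(8, 4), nn.ReLU(),
).eval()

# Fuse list entries use state_dict-style names, so submodules nested inside
# Sequential containers can be addressed directly.
fused = torch.quantization.fuse_modules(
    model,
    [['0', '1', '2'],   # Conv + BN + ReLU
     ['3', '4']],       # Linear + ReLU
    inplace=False)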

Test Plan:
buck test caffe2/test:quantization -- 'test_fusion_sequential_model_train \(test_quantization\.FusionTest\)' --print-passing-details
buck test caffe2/test:quantization -- 'test_fusion_sequential_model_eval \(test_quantization\.FusionTest\)' --print-passing-details

Differential Revision: D17466382

fbshipit-source-id: 0a548f8f4c366f3ecc59db693bac725ccd62328e
2019-09-30 22:00:20 -07:00
Raghuraman Krishnamoorthi
9e3ba35500 Add control for observers in Fake-quantize module (#27113)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27113

Fix bug in fake quant control of observer and fake-quantize operations.
Add test to ensure that features work as expected
ghstack-source-id: 91071181

Test Plan: buck test mode/dev-nosan caffe2/test:fake_quant -- test_fake_quant_control

Differential Revision: D17678875

fbshipit-source-id: 2912ad8b6e674daa1d129f7a7c6f27d8c1b4f93b
2019-09-30 18:23:26 -07:00
Raghuraman Krishnamoorthi
d5298b6e66 Default observer and fake-quant for backends (#26627)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26627

ghstack-source-id: 91008337

Test Plan: buck test caffe2/test:quantization -- --print-passing-details

Differential Revision: D17518194

fbshipit-source-id: 1eb8a7a85dc811c4ee5228d68563abb157613ceb
2019-09-30 00:37:11 -07:00
Raghuraman Krishnamoorthi
32b0e8c980 Emulate weight and activation only quant with fake quant, numerics test (#26625)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26625

ghstack-source-id: 91008296

Test Plan: buck test caffe2/test:quantized -- 'test_weight_only_activation_only_fakequant \(test_quantized_models\.ModelNumerics\)' --print-passing-details

Differential Revision: D17520342

fbshipit-source-id: 26e148d3299afcfdfb1187aff6ab80687ed8df47
2019-09-30 00:37:07 -07:00
Raghuraman Krishnamoorthi
7dc7075795 Per channel fake quant (#26623)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26623

Per-channel fake quant cpu and cuda operators,
per-channel support in fake quant module,
tests for per-channel fake-quant and serializability of fake quant modules

ghstack-source-id: 91008299

Test Plan:
buck test mode/dev caffe2/test:fake_quant  --
 Started new test run: https://our.intern.facebook.com/intern/testinfra/testrun/1970324848875929
      ✓ caffe2/test:fake_quant - test_backward_per_tensor (test_fake_quant.TestFakeQuantizePerTensor) 0.242 1/10 (passed)
      ✓ caffe2/test:fake_quant - test_numerical_consistency_per_tensor (test_fake_quant.TestFakeQuantizePerTensor) 0.204 2/10 (passed)
      ✓ caffe2/test:fake_quant - test_fq_serializable (test_fake_quant.TestFakeQuantizePerTensor) 0.174 3/10 (passed)
      ✓ caffe2/test:fake_quant - test_numerical_consistency_per_channel (test_fake_quant.TestFakeQuantizePerChannel) 0.279 4/10 (passed)
      ✓ caffe2/test:fake_quant - test_forward_per_tensor (test_fake_quant.TestFakeQuantizePerTensor) 0.241 5/10 (passed)
      ✓ caffe2/test:fake_quant - test_forward_per_channel (test_fake_quant.TestFakeQuantizePerChannel) 0.353 6/10 (passed)
      ✓ caffe2/test:fake_quant - test_fq_module (test_fake_quant.TestFakeQuantizePerTensor) 0.354 7/10 (passed)
      ✓ caffe2/test:fake_quant - test_backward_per_channel (test_fake_quant.TestFakeQuantizePerChannel) 0.334 8/10 (passed)
      ✓ caffe2/test:fake_quant - test_fq_serializable (test_fake_quant.TestFakeQuantizePerChannel) 0.168 9/10 (passed)
      ✓ caffe2/test:fake_quant - test_fq_module (test_fake_quant.TestFakeQuantizePerChannel) 0.429 10/10 (passed)
      ✓ caffe2/test:fake_quant - main 0.000 (passed)

Differential Revision: D17439406

fbshipit-source-id: 64bfff5e4f40bc2ab8af2b432c7bc33805418077
2019-09-30 00:21:25 -07:00
Raghuraman Krishnamoorthi
2ccbdb79c8 Per-channel baseline (#26516)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26516

ghstack-source-id: 90982010

Test Plan:
Integrate per-channel support into conv and linear modules.
The following tests pass:
buck test caffe2/test:quantized -- 'test_linear_api \(test_quantized_nn_mods\.ModuleAPITest\)' --print-passing-details

buck test caffe2/test:quantized -- 'test_conv_api \(test_quantized_nn_mods\.ModuleAPITest\)' --print-passing-details

buck test caffe2/test:quantized -- 'test_float_quant_compare_per_channel \(test_quantized_models\.ModelNumerics\)' --print-passing-details

Differential Revision: D17342622

fbshipit-source-id: f0d618928e3d9348672c589a6b7a47049c372a2e
2019-09-28 14:05:06 -07:00
Jerry Zhang
09f0e949cd PyTorch Graph Mode Quantization API (#26390)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26390

`quantize_script`: top level API for graph mode quantization

Test Plan:
There are some known issues; we can enable the tests after all known issues are fixed.

Imported from OSS

Differential Revision: D17645132

fbshipit-source-id: 61f261d5607409d493b39a2f4e05ebd017279f6b
2019-09-27 19:23:51 -07:00
Raghuraman Krishnamoorthi
8fa9900c28 control of observer/fake-quant operations (#26520)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26520

Hooks to enable control of observer and fake quant that can be used by model.apply() to control fake quant during QAT
ghstack-source-id: 90897063

Test Plan: buck test caffe2/test:quantization --  --print-passing-details

Differential Revision: D17491155

fbshipit-source-id: 80ff0d7a1ac35c96e054b4f0165a73c56c2f53cc
2019-09-27 11:01:34 -07:00
Raghuraman Krishnamoorthi
102a148641 Default histogram observer (#26622)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26622

ghstack-source-id: 90897064

Test Plan: buck test caffe2/test:quantization --  --print-passing-details

Differential Revision: D17508787

fbshipit-source-id: ae733ab35ec9b0233264014b8054d4d870fb05e1
2019-09-27 10:39:21 -07:00
Raghuraman Krishnamoorthi
b0a2f6f2f5 Serialization and range reduction support for Fake Quant/Observer (#26519)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26519

ghstack-source-id: 90895631

Test Plan:
buck test caffe2/test:quantization -- 'test_histogram_observer \(test_quantization\.ObserverTest\)' --print-passing-details
and
buck test caffe2/test:fake_quant -- 'test_fq_serializable \(test_fake_quant\.TestFakeQuantizePerTensorAffine\)' --print-passing-details

Differential Revision: D17217408

fbshipit-source-id: 0da7efdcdae0c065dd035c5dd2b6a78231545ece
2019-09-27 10:09:39 -07:00
Raghuraman Krishnamoorthi
9a5e2e80b8 Fake quantization enhancements for QAT/PTQ support- fix tests (#26876)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26876

Add the ability to turn fake quantization and observers on/off independently.
ghstack-source-id: 90892132

Test Plan: buck test caffe2/test:quantized -- 'test_conv_bn_relu \(test_qat\.IntrinsicQATModuleTest\)' --print-passing-details

Differential Revision: D17592961

fbshipit-source-id: 24c60c94ed7c6c9fa55c634a8545731614e4f52f
2019-09-27 08:59:29 -07:00
Dmytro Dzhulgakov
0a8a779abe Add more inplace arguments to quantization top level API (#26782)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26782

At least we should be consistent on top-level APIs and prepare/convert/etc.

Logic is inplace=False by default but top-level APIs take care of doing fewer copies.

Also renames always-inplace methods like add_observer to end with an underscore.

One fix for MinMaxObserver was triggered by deepcopy surfacing that we were accidentally keeping autograd around.

Test Plan: Imported from OSS

Differential Revision: D17595956

Pulled By: dzhulgakov

fbshipit-source-id: 801f9f5536b553f24c7a660064dd6fce685edd65
2019-09-26 00:07:07 -07:00
Richard Zou
be93d30e37 Revert D17458232: Fake quantization enhancements for QAT/PTQ support
Test Plan: revert-hammer

Differential Revision:
D17458232

Original commit changeset: f44380c60f1a

fbshipit-source-id: 64a244c720b61fa912bacbb23fcbf9faed0757c2
2019-09-25 04:56:30 -07:00
Raghuraman Krishnamoorthi
e2c3d7e52c Fake quantization enhancements for QAT/PTQ support (#26420)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26420

Flags for enabling/disabling observer and fake quant independently. Improve repr for fake quant.
ghstack-source-id: 90704254

Test Plan:
buck test caffe2/test:fake_quant --  --print-passing-details
buck test caffe2/test:quantization -- --print-passing-details

Differential Revision: D17458232

fbshipit-source-id: f44380c60f1a10a8ea09bca8ab79ba5d1867ed62
2019-09-25 02:02:00 -07:00
Raghuraman Krishnamoorthi
bc4519dc27 Handle DeQuantStub() for QAT (#26518)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26518

Skip Dequantize() modules for QAT alone. For fake quant insertion, DeQuantize() is a no-op and we should not be inserting fake-quant.
ghstack-source-id: 90704220

Test Plan:
buck test caffe2/test:quantization -- --print-passing-details

Tests in test_quantization pass with changes:
Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/281475121296989
Summary (total time 73.03s):
  PASS: 28
  FAIL: 0
  SKIP: 0
  FATAL: 0
  TIMEOUT: 0
  OMIT: 0

Differential Revision: D17439333

fbshipit-source-id: f716c23500324ae08c8d104ee2c9587fa6926571
2019-09-25 00:35:34 -07:00
Dmytro Dzhulgakov
128a65e2e0 Use noop observer to pass dtype for dynamic quantization (#26709)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26709

Polishes implementation from #25975. Primarily, we use NoopObserver to communicate that weights need to be quantized to float16. The very top-level API (quantize_dynamic) stays the same with `dtype` argument but the implementation follows the common flow.

One can argue that dynamic fp16 quantization doesn't really fit the 'observer' mechanism. It's in fact not ideal, but it's better to have the same flow than to branch on both dtype and qconfig.
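A hedged usage sketch of the top-level API mentioned above (the model is a stand-in):

```python
import torch
import torch.nn as nn

float_model = nn.Sequential(nn.Linear(16, 16), nn.ReLU(), nn.Linear(16, 4))

# dtype selects fp16 dynamic quantization; internally the weight qconfig
# routes through a NoopObserver carrying torch.float16, as described above.
qmodel = torch.quantization.quantize_dynamic(
    float_model, {nn.Linear}, dtype=torch.float16)
```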

Test Plan: Imported from OSS

Differential Revision: D17544103

Pulled By: dzhulgakov

fbshipit-source-id: 6af3f18c35929a1a53ea734079c005f656e4925f
2019-09-24 09:24:39 -07:00
Dmytro Dzhulgakov
a79b3685db Simplify observers declaration with functools.partial (#26492)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26492

The previous definition of observers was quite clumsy, with things like `default_observer()()`. This PR strips away a lot of cruft and allows passing class names directly. To override default arguments, either `functools.partial` can be used or the convenient wrapper `MyObserver.with_args(x=1)` is provided.

Also rename `QConfig_dynamic` to `QConfigDynamic`, because the old name violates the naming convention.
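A minimal sketch of the two styles mentioned above, using `MinMaxObserver` as an example and `reduce_range` as an illustrative argument:

```python
import functools
from torch.quantization.observer import MinMaxObserver

# Override default arguments either with functools.partial ...
partial_factory = functools.partial(MinMaxObserver, reduce_range=True)
# ... or with the with_args convenience wrapper.
with_args_factory = MinMaxObserver.with_args(reduce_range=True)

obs_a = partial_factory()     # both factories produce fresh observer instances
obs_b = with_args_factory()
```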

Test Plan: Imported from OSS

Differential Revision: D17521265

Pulled By: dzhulgakov

fbshipit-source-id: ba9df19b368641acf4093c43df9990796284fd9e
2019-09-23 10:15:59 -07:00
Lingyi Liu
11f9fe2433 Fix the API for record observer (#26413)
Summary:
Mainly want to resolve comments from https://github.com/pytorch/pytorch/pull/25830.

Overall, we want to provide a recording observer that records the runtime tensor values along the activation path, in order to debug numerical accuracy loss offline.

According to the feedback on https://github.com/pytorch/pytorch/issues/25830, it might be better to record all the observers in a dict and query the dict to get the corresponding tensor values. hx89 is working on how to insert the recording observers into the model under debug.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26413

Differential Revision: D17506502

Pulled By: llyfacebook

fbshipit-source-id: 3ab90dc78920e7ec3fa572c2a07327a9991c530a
2019-09-20 14:27:56 -07:00
Jianyu Huang
f433ee1499 Add the FP16 weight support for LSTM in dynamic_quantize (#25975)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25975

We would like to add the FP16 weight support for the dynamic quantized LSTM.

Test Plan:
buck test mode/dev caffe2/test:quantization -- 'test_quantized_rnn \(test_quantization\.PostTrainingDynamicQuantTest\)'  --print-passing-details

```
[jianyuhuang@devvm794.ftw3.facebook.com: ~/fbsource/fbcode/caffe2/test] $ buck test mode/dev caffe2/test:quantization
-- 'test_quantized_rnn \(test_quantization\.PostTrainingDynamicQuantTest\)'  --print-passing-details
Building: finished in 13.4 sec (100%) 8134/8134 jobs, 81 updated
  Total time: 13.9 sec
Trace available for this run at /tmp/testpilot.20190910-210241.2092790.log
TestPilot test runner for Facebook. See https://fburl.com/testpilot for details.
Testpilot build revision c86e65add357582accb6ec0be23b92c8a2c510bd fbpkg ca46e8f5b26c451a8b0b2462c11bb61d at Mon Sep  9
22:16:37 2019 by twsvcscm from /usr/local/fbprojects/packages/testinfra.testpilot/696/t.par
Discovering tests
Running 1 tests
Started new test run: https://our.intern.facebook.com/intern/testinfra/testrun/1125900050322971
      ✓ caffe2/test:quantization - test_quantized_rnn (test_quantization.PostTrainingDynamicQuantTest) 0.183 1/1 (passed)
Test output:
> test_quantized_rnn (test_quantization.PostTrainingDynamicQuantTest) ... ok
>
> ----------------------------------------------------------------------
> Ran 1 test in 0.184s
>
> OK
Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/1125900050322971
Summary (total time 4.35s):
  PASS: 1
  FAIL: 0
  SKIP: 0
  FATAL: 0
  TIMEOUT: 0
  OMIT: 0
```

Differential Revision: D17299116

fbshipit-source-id: 7fe91ece25867f2c0496f1b63fb1041e6b815166
2019-09-19 22:19:22 -07:00
Haixin Liu
dcbfc3bdbf Add per channel observer (#25887)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25887

ghstack-source-id: 90383258

Add per channel observer to compute the qparams for each channel.

Test Plan:
buck test mode/dev caffe2/test:quantization -- 'test_per_channel_minmax_observer'

buck test mode/dev caffe2/test:quantization -- 'test_per_channel_minmax_observer_scriptable'

Differential Revision: D17137226

fbshipit-source-id: 0b1c93e3cbcda86f5c4e30f7cd94c670f2665063
2019-09-18 22:16:45 -07:00
Haixin Liu
f2e9622ed8 Add l2 norm minimization (#24022)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24022

In the histogram observer, add an approximation of L2 error minimization for selecting min/max.
By selecting a new min/max, we filter out outliers in the input distribution.

This follows the implementation of NormMinimization::NonlinearQuantizationParamsSearch in caffe2/quantization/server/norm_minimization.cc
ghstack-source-id: 90298789
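A rough sketch of the idea only (a brute-force search with a made-up helper name, not the actual NormMinimization algorithm):

```python
import torch

def search_clip_range(x: torch.Tensor, n_bits: int = 8, steps: int = 20):
    # Try progressively tighter clipping ranges and keep the one whose
    # quantize/dequantize round trip has the smallest L2 error.
    levels = 2 ** n_bits - 1
    best_err, best = float('inf'), (x.min(), x.max())
    for i in range(1, steps + 1):
        lo, hi = x.min() * i / steps, x.max() * i / steps
        scale = ((hi - lo) / levels).clamp(min=1e-8)
        q = ((x - lo) / scale).round().clamp(0, levels)
        err = ((q * scale + lo - x) ** 2).mean()
        if err < best_err:
            best_err, best = err.item(), (lo, hi)
    return best
```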

Test Plan: buck test mode/dev caffe2/test:quantization -- 'test_histogram_observer'

Differential Revision: D16713239

fbshipit-source-id: 82631ba47974e25689c9c66bc3088117090e26d4
2019-09-18 00:07:10 -07:00
Sebastian Messmer
9f6b6b8101 Back out "[quant][observer] Add histogram observer" (#26236)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26236

Original diff broke oss CI. Reverting.

Original commit changeset: 0f047d3349cb
ghstack-source-id: 90125990

Test Plan: testinprod

Reviewed By: hx89

Differential Revision: D17385490

fbshipit-source-id: 4258502bbc0e3a6dd6852c8ce01ed05eee618b1a
2019-09-14 12:48:46 -07:00
Haixin Liu
1563fdb591 Add histogram observer (#23959)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23959

Add histogram observer that records the running histogram of tensor values along with min/max values.
ghstack-source-id: 90076996

Test Plan:
Added a test test_histogram_observer
buck test mode/dev caffe2/test:quantization -- 'test_histogram_observer'

buck test mode/dev caffe2/test:quantization -- 'test_observer_scriptable'

Differential Revision: D16692835

fbshipit-source-id: 0f047d3349cb9770fad4a2b6cb346c51d9e99cd4
2019-09-13 19:24:04 -07:00
Lingyi Liu
62767077c3 add the tensor_observer to record the runtime tensor for quantization … (#25830)
Summary:
…accuracy analysis
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25830

Differential Revision: D17327147

Pulled By: llyfacebook

fbshipit-source-id: 095d5537a31b8d7541081000eaeb8b8474dfb8d0
2019-09-11 13:36:28 -07:00
Jerry Zhang
c475ef72f9 Change order of activation and weight in QConfig (#25950)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25950

I feel that is a more natural order

Test Plan:
python test/test_quantizer.py

Imported from OSS

Differential Revision: D17294963

fbshipit-source-id: ed8ffdfe788a5e81966bda856e8d046ab68ee229
2019-09-11 09:51:01 -07:00
Jianyu Huang
9b4f3fd7d3 Add torch.nn.LSTM into the default dynamic quantize mappings (#25954)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25954

Add torch.nn.LSTM to the default dynamic quantization mappings. We will dynamically quantize LSTM by default when the quantize_dynamic API is applied.
ghstack-source-id: 89839673

Test Plan: CI

Differential Revision: D17294958

fbshipit-source-id: 824aceef821276b3e28c52ce3bebafaf9b0a0833
2019-09-10 21:03:12 -07:00
Haixin Liu
9c10f729de Add Dropout to blacklist (#25881)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25881

Add Dropout to blacklist to avoid the error in eager mode quantization.
ghstack-source-id: 89759536

Test Plan: Test locally in python notebook.

Reviewed By: jianyuh

Differential Revision: D17270826

fbshipit-source-id: bcf43483976740564d7f407838f25c2dbb67b016
2019-09-10 10:57:38 -07:00
Raghuraman Krishnamoorthi
17c1b2c715 Relax scale to prevent saturation in conv/linear. Add test to verify precision of numerics of quantized model with updated observer. This test catches errors in (#25667)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25667

Relax the scale and zero-point for activations to ensure that the fbgemm implementations of conv and linear do not saturate due to 16-bit intermediate accumulation.

Add a test to verify the numerical precision of the quantized model with the updated observer. This test catches errors in
handling layouts for quantized ops, in addition to saturation/quantization errors.
ghstack-source-id: 89587942
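An illustrative calculation (numbers made up, not from this diff): shrinking the quantized range used by the activation observer roughly doubles the scale, trading a little resolution for headroom against 16-bit accumulation overflow.

```python
def scale_for(min_val, max_val, qmin, qmax):
    # Affine quantization scale for a given observed range and quant range.
    return (max_val - min_val) / float(qmax - qmin)

full = scale_for(0.0, 6.0, 0, 255)      # ~0.0235, full quint8 range
relaxed = scale_for(0.0, 6.0, 0, 127)   # ~0.0472, relaxed (reduced) range
print(full, relaxed)
```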

Test Plan:
buck test caffe2/test:quantized -- 'test_float_quant_compare \(test_quantized_models\.ModelNumerics\)' --print-passing-details

Passes when SQNR > 35 dB

buck test caffe2/test:quantization -- 'test_minmax_observer \(test_quantization\.ObserverTest\)' --print-passing-details
Passes with additional coverage for observer changes

Differential Revision: D17140498

fbshipit-source-id: 42c58e726bb0b0f51890590ee2525428f9a8d24e
2019-09-06 17:18:01 -07:00
Jianyu Huang
0483d537ab Add the dynamic quantized LSTM module (#25157)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25157

Add the dynamic quantized LSTM module.

TODO (separate PRs):
- Serialization.
- Bias can be Null.

ghstack-source-id: 89443731

Test Plan:
buck test mode/dev caffe2/test:quantization -- 'test_quantized_rnn \(test_quantization\.PostTrainingDynamicQuantTest\)'  --print-passing-details
```
[jianyuhuang@devvm2816.prn3.facebook.com: ~/fbsource/fbcode/caffe2/test] $ buck test mode/dev caffe2/test:quantization -- 'test_quantized_rnn \(test_q
uantization\.PostTrainingDynamicQuantTest\)'  --print-passing-details
Action graph will be rebuilt because files have been added or removed.
Parsing buck files: finished in 1.4 sec
Building: finished in 4.0 sec (100%) 8122/8122 jobs, 2 updated
  Total time: 5.5 sec
Trace available for this run at /tmp/testpilot.20190902-164918.1275502.log
TestPilot test runner for Facebook. See https://fburl.com/testpilot for details.
Testpilot build revision b61bc0e3b71033578eddfe0a28b0739bc685663f fbpkg 3b1c1aed1c534c0cb161a981eca6e2f0 at Sun Sep  1 20:58:52 2019 by twsvcscm from /usr/local/fbprojects/packages/testinfra.testpilot/690/t.par
Discovering tests
Running 1 tests
Started new test run: https://our.intern.facebook.com/intern/testinfra/testrun/2251799823877227
      ✓ caffe2/test:quantization - test_quantized_rnn (test_quantization.PostTrainingDynamicQuantTest) 1.048 1/1 (passed)
Test output:
> test_quantized_rnn (test_quantization.PostTrainingDynamicQuantTest) ... ok
>
> ----------------------------------------------------------------------
> Ran 1 test in 1.049s
>
> OK
Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/2251799823877227
Summary (total time 5.53s):
  PASS: 1
  FAIL: 0
  SKIP: 0
  FATAL: 0
  TIMEOUT: 0
  OMIT: 0
```

Differential Revision: D16955662

fbshipit-source-id: 61cf1a74913105fa02e44b3941813eabac0006b5
2019-09-03 19:18:28 -07:00
Haixin Liu
c59540b7b1 Change exception to warning (#25408)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25408

Change exception to warning so that observer can be called with no data and still provide a scale and zero-point.
ghstack-source-id: 89267768

Test Plan:
buck test mode/dev caffe2/test:quantization -- 'test_minmax_observer'

buck test mode/dev caffe2/test:quantization -- 'test_observer_scriptable'

Differential Revision: D17116524

fbshipit-source-id: db4d76e882b57f23161dced846df3a0760194a41
2019-08-29 20:12:57 -07:00
Zafar Takhirov
e44c09ecae making quant utilities inplace
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25054

Test Plan: Imported from OSS

Differential Revision: D16974198

Pulled By: zafartahirov

fbshipit-source-id: 54befc8429990adafe746d1255d117fca5f12e11
2019-08-29 16:03:13 -07:00
Haixin Liu
06757acb30 Refactor MinMax observer (#23902)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23902

Copied from Daya's diff in pytorch/pytorch #23191

Refactor MinMax observer and create the base observer class to prepare for future observers such as histogram observer.
ghstack-source-id: 89146014

Test Plan:
Added a test test_minmax_observer

buck test mode/dev caffe2/test:quantization -- 'test_minmax_observer'

```
Running 1 tests
Started new test run: https://our.intern.facebook.com/intern/testinfra/testrun/2533274797931635
      ✓ caffe2/test:quantization - test_minmax_observer (test_quantization.ObserverTest) 0.055 1/1 (passed)
Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/2533274797931635
Summary (total time 4.26s):
  PASS: 1
  FAIL: 0
  SKIP: 0
  FATAL: 0
  TIMEOUT: 0
  OMIT: 0
```

buck test mode/dev caffe2/test:quantization -- 'test_observer_scriptable'
```
Running 1 tests
Started new test run: https://our.intern.facebook.com/intern/testinfra/testrun/5348024563344195
      ✓ caffe2/test:quantization - test_observer_scriptable (test_quantization.ObserverTest) 1.762 1/1 (passed)
Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/5348024563344195
Summary (total time 6.02s):
  PASS: 1
  FAIL: 0
  SKIP: 0
  FATAL: 0
  TIMEOUT: 0
  OMIT: 0
```

Differential Revision: D16663221

fbshipit-source-id: 3d0e1aa9e4d27808e61b10604782606de067a34a
2019-08-28 13:12:38 -07:00
Raghuraman Krishnamoorthi
f5a3d59254 Handle empty qconfig for functional Modules (#25215)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25215

ghstack-source-id: 89044252

Test Plan: Test implemented in D16879132/

Differential Revision: D17064670

fbshipit-source-id: 08d3d566aa123bedf318ab5a8bc9b71457930ff2
2019-08-27 12:31:26 -07:00
Raghuraman Krishnamoorthi
c142dbf876 Fix scriptability for Observer (#25219)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25219

Ensure that observer code remains scriptable after addition of warnings
ghstack-source-id: 89055664

Test Plan: buck test caffe2/test:quantization -- 'test_observer_scriptable \(test_quantization\.ObserverTest\)' --print-passing-details

Differential Revision: D17065218

fbshipit-source-id: b3599613b4835bf1c5241aff191b40ba5f40d7be
2019-08-27 08:54:40 -07:00
Raghuraman Krishnamoorthi
f622ec8084 Update mapping dictionary to support functionalmodules and pooling operations (#25216)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25216

ghstack-source-id: 89045562

Test Plan: buck test mode/dev caffe2/test:quantization -- 'test_resnet_base\ \(test_quantization.PostTrainingQuantTest\)' --print-passing-details

Differential Revision: D17065029

fbshipit-source-id: b248abf6de162f38e35e6bace17bde1be9e38c57
2019-08-26 23:00:01 -07:00
Raghuraman Krishnamoorthi
17f69eff22 Revert D16879133: Handle empty qconfig for functional Modules
Test Plan: revert-hammer

Differential Revision:
D16879133

Original commit changeset: 230f5204cfbd

fbshipit-source-id: 29b4bfe066b173797f3d9f2fcf7cbf5ee21ff8fb
2019-08-26 16:25:29 -07:00
Raghuraman Krishnamoorthi
a9fdc1923b Revert D16879132: Update mapping dictionary to support functionalmodules and pooling operations
Test Plan: revert-hammer

Differential Revision:
D16879132

Original commit changeset: cd8c10182aa7

fbshipit-source-id: 9b67ccf73f43d15ef50bf0331d3df4d57835931b
2019-08-26 16:25:25 -07:00
Raghuraman Krishnamoorthi
77ee1f5f3c Revert D16923660: Support observer without any data calibration
Test Plan: revert-hammer

Differential Revision:
D16923660

Original commit changeset: 9927ed4e4ee9

fbshipit-source-id: 31a2b28584aae3808df6508b4caedb54de32156d
2019-08-26 15:36:26 -07:00
Raghuraman Krishnamoorthi
ff30201fff Revert D17059486: Fix scriptability for Observer
Test Plan: revert-hammer

Differential Revision:
D17059486

Original commit changeset: 70ea9ee39f0b

fbshipit-source-id: 6f39057b264e4d4213cf07496929274240bce917
2019-08-26 15:32:21 -07:00
Raghuraman Krishnamoorthi
85d1ebd26e Fix scriptability for Observer (#25197)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25197

Ensure that observer code remains scriptable after addition of warnings
ghstack-source-id: 89022474

Test Plan: buck test caffe2/test:quantization -- 'test_observer_scriptable \(test_quantization\.ObserverTest\)' --print-passing-details

Differential Revision: D17059486

fbshipit-source-id: 70ea9ee39f0b896c7801e168666f88c156dbf15b
2019-08-26 15:27:27 -07:00
Raghuraman Krishnamoorthi
a5710e2303 Support observer without any data calibration (#24923)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24923

Replace the exception with a warning for uninitialized min/max values to support creation of quantized models without observers.
ghstack-source-id: 89003800

Test Plan: Replace error message with warning for observers

Differential Revision: D16923660

fbshipit-source-id: 9927ed4e4ee977c1388595ddef042204f71076a4
2019-08-26 12:16:53 -07:00
Raghuraman Krishnamoorthi
794f63fe92 Update mapping dictionary to support functionalmodules and pooling operations (#24804)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24804

ghstack-source-id: 89003799

Test Plan: buck test mode/dev caffe2/test:quantization -- 'test_resnet_base\ \(test_quantization.PostTrainingQuantTest\)' --print-passing-details

Differential Revision: D16879132

fbshipit-source-id: cd8c10182aa732ddf655bcda17f72ea08033a633
2019-08-26 12:16:49 -07:00
Raghuraman Krishnamoorthi
d7f6ac1dbb Handle empty qconfig for functional Modules (#24803)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24803

ghstack-source-id: 89003797

Test Plan: Test implemented in D16879132/

Differential Revision: D16879133

fbshipit-source-id: 230f5204cfbd149fea1c0985578a2572a0e0f2a8
2019-08-26 12:16:46 -07:00
James Reed
049284e14d Make observer scriptable
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/24996

Test Plan: Imported from OSS

Differential Revision: D16952938

Pulled By: jamesr66a

fbshipit-source-id: 3d08e0c746603d0fe090fb3dbf13c5fc9dc022f4
2019-08-22 11:28:45 -07:00
James Reed
a0b13b4fa5 extra_repr for quantized modules (#24443)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24443

This gives us useful information about the Module when we print it, like so:

```
FloatModule(
  (quant): Quantize()
  (conv1): Conv2d(1, 20, kernel_size=(5, 5), stride=(1, 1), scale=0.08209919929504395, zero_point=128)
  (conv2): Conv2d(20, 50, kernel_size=(5, 5), stride=(1, 1), scale=0.16885940730571747, zero_point=128)
  (fc1): Linear(in_features=800, out_features=500, bias=True, scale=0.12840059399604797, zero_point=128)
  (fc2): Linear(in_features=500, out_features=10, bias=True, scale=0.260015606880188, zero_point=128)
  (dequant): DeQuantize()
)
```
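
For illustration, a minimal sketch (module and attribute names are hypothetical, not the modules changed in this PR) of how extra_repr surfaces scale and zero_point in the printed form shown above:

```python
import torch.nn as nn

class QuantizedLinearRepr(nn.Module):
    """Hypothetical quantized-style module showing extra_repr for qparams."""
    def __init__(self, in_features, out_features, scale=1.0, zero_point=0):
        super().__init__()
        self.in_features = in_features
        self.out_features = out_features
        self.scale = scale
        self.zero_point = zero_point

    def extra_repr(self):
        # The returned string is rendered inside the parentheses when this
        # module (or a container holding it) is printed.
        return (f"in_features={self.in_features}, out_features={self.out_features}, "
                f"scale={self.scale}, zero_point={self.zero_point}")

print(QuantizedLinearRepr(800, 500, scale=0.128, zero_point=128))
# QuantizedLinearRepr(in_features=800, out_features=500, scale=0.128, zero_point=128)
```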

Test Plan: Imported from OSS

Differential Revision: D16847140

Pulled By: jamesr66a

fbshipit-source-id: 8c995108f17ed1b086d1fb30471a41c532c68080
2019-08-16 22:38:45 -07:00
Zafar Takhirov
1a74bd407d Fixes the adding of the observer to the FloatFunctional (#24418)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24418

Fixes #24394

The observer was not added correctly because one of the conditions was not met.

Test Plan: Imported from OSS

Differential Revision: D16833951

Pulled By: zafartahirov

fbshipit-source-id: bb4699e6a1cf6368c7278272a68e5e7c6d3f59a8
2019-08-15 17:27:00 -07:00
Raghuraman Krishnamoorthi
696cabae9b Baseline observer module, ensuring that (min,max) range includes zero. (#24297)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24297

ghstack-source-id: 88252409
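
For illustration, a standalone sketch (not the observer code itself) of forcing the observed (min, max) range to include zero before deriving qparams:

```python
import torch

def qparams_with_zero_in_range(min_val: float, max_val: float, qmin: int = 0, qmax: int = 255):
    # Extend the observed range so it always contains 0.0; this guarantees that
    # the real value zero maps exactly onto an integer zero_point.
    min_val, max_val = min(min_val, 0.0), max(max_val, 0.0)
    scale = max((max_val - min_val) / float(qmax - qmin), torch.finfo(torch.float32).eps)
    zero_point = int(min(max(qmin - round(min_val / scale), qmin), qmax))
    return scale, zero_point

# An all-positive range such as (2.0, 6.0) is widened to (0.0, 6.0), so zero_point == qmin.
print(qparams_with_zero_in_range(2.0, 6.0))
```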

Differential Revision: D16635637

fbshipit-source-id: fcef20b9c88b2c3bd97e311514e5b2d0339ff28a
2019-08-15 15:25:23 -07:00
James Reed
f03700b997 Fix QConfig_dynamic typename (#24431)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24431

Pickle's fully-qualified name lookup would fail when trying to serialize QConfig_dynamic, since the __name__ on the instance would refer to the wrong class name.
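
For illustration, a minimal standalone sketch (not the actual torch.quantization source; field names are assumptions) of why the typename given to namedtuple has to match the name pickle resolves:

```python
from collections import namedtuple
import pickle

# Broken: the class's __name__/__qualname__ would be "QConfig", but the module
# binds it as QConfig_dynamic, so pickle's fully-qualified lookup cannot find it.
# QConfig_dynamic = namedtuple('QConfig', ['weight', 'activation'])

# Fixed: the typename matches the attribute name the class is bound to.
QConfig_dynamic = namedtuple('QConfig_dynamic', ['weight', 'activation'])

cfg = QConfig_dynamic(weight=None, activation=None)
assert pickle.loads(pickle.dumps(cfg)) == cfg
```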

Test Plan: Imported from OSS

Differential Revision: D16835705

Pulled By: jamesr66a

fbshipit-source-id: e146835cbe10b08923d77298bc93b0f5b0ba37c5
2019-08-15 15:25:19 -07:00
Jerry Zhang
754bf383b1 Change return type of observer to two tensors (#24339)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24339

Att

Differential Revision: D16820813

fbshipit-source-id: 3e7301f1700176e19f46e8677a644ba167209254
2019-08-15 10:26:44 -07:00
Jerry Zhang
761ae8e9b6 Add intrinsic module mappings (#23753)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23753

Add intrinsic (fused) module mappings in quantize.py to enable mapping fused modules
in both QAT and post-training quantization (PTQ).

Differential Revision: D16820749

fbshipit-source-id: 07de76a4f09b44bde8b193c103eac02c22b875b6
2019-08-15 09:37:24 -07:00
Jianyu Huang
0f64043b49 Remove the activation observer for default_qconfig (#24299)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24299

As suggested in https://github.com/pytorch/pytorch/pull/24232, we will remove the activation observer for the dynamic quantization path.
ghstack-source-id: 88287094

Differential Revision: D16798590

fbshipit-source-id: 07a245d5584b5b15c6895d9b09deef4a0605073a
2019-08-14 17:21:50 -07:00
Jianyu Huang
e8d2ddc2c4 Make the default qconfig_dict (#24232)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24232

As suggested in https://github.com/pytorch/pytorch/pull/23128#discussion_r306650311, we will use module types such as `torch.nn.Linear` as the keys of default_qconfig_dict. That is, we will apply dynamic quantization to `torch.nn.Linear` by default if the user just specifies `torch.quantize_dynamic(model)`.
ghstack-source-id: 88287089
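
For illustration, a hedged usage sketch of the type-keyed default (the public entry point is torch.quantization.quantize_dynamic; exact defaults may differ from this commit's snapshot):

```python
import torch
import torch.nn as nn
from torch.quantization import quantize_dynamic

model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))

# Keying the spec on the module type quantizes every nn.Linear without naming
# each layer; omitting the spec falls back to the same type-based default.
qmodel = quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)
print(qmodel)
```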

Differential Revision: D16781191

fbshipit-source-id: 991a5e151a9ea32b879d6897cd9862855d747135
2019-08-14 15:12:55 -07:00
Jianyu Huang
584c6986fd Add the type matching rule for qconfig_dict (#23212)
Summary:
We want to use the Module type as the key of the qconfig_dict for module replacement during quantization.

Before this Diff, to dynamically quantize the BERT model, we had to specify each layer:
```
qconfig_dict = {
    'encoder.layer.0.attention.self.query': default_qconfig,
    'encoder.layer.0.attention.self.key': default_qconfig,
    'encoder.layer.0.attention.self.value': default_qconfig,
    'encoder.layer.0.attention.output.dense': default_qconfig,
    'encoder.layer.0.intermediate.dense': default_qconfig,
    'encoder.layer.0.output.dense': default_qconfig,
    'encoder.layer.1.attention.self.query': default_qconfig,
    'encoder.layer.1.attention.self.key': default_qconfig,
    'encoder.layer.1.attention.self.value': default_qconfig,
    'encoder.layer.1.attention.output.dense': default_qconfig,
    'encoder.layer.1.intermediate.dense': default_qconfig,
    'encoder.layer.1.output.dense': default_qconfig,
   ...
}
```
After this Diff, we only need the following
```
qconfig_dict = {
     torch.nn.Linear : default_qconfig
}
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23212
ghstack-source-id: 88287091

Reviewed By: zafartahirov

Differential Revision: D16436542

fbshipit-source-id: 11fbe68ee460560c1a7cdded63581eb7a00e5a89
2019-08-14 13:07:36 -07:00
Jianyu Huang
e94ba742b0 Dynamic Quantized Linear Module (#23128)
Summary:
- ~~Add a unit test for the Dynamic Quantized Linear operator (```torch.fbgemm_linear_quantize_weight```, ```torch.fbgemm_pack_quantized_matrix```, and ```torch.fbgemm_linear_int8_weight```) in ```test_quantized.py```.~~ Move this to D16404027 for a separate review.
- Add the Dynamic Quantized Linear module in ```torch/nn/quantized/modules/linear.py```. ~~This is in a rudimentary stage. Will add more functions later~~.
- Add the torch.quantize logic (prepare, eval, convert) for dynamic quantization.
- Add a unit test for the Dynamic Quantized Linear module  in ```test_nn_quantized.py```.
- Add a unit test for the Model-level Quantization API

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23128
ghstack-source-id: 88257232

Differential Revision: D16258664

fbshipit-source-id: 4be3ac39ee27c088b341c741d3f09f51d5a23ef0
2019-08-13 21:01:23 -07:00
Raghuraman Krishnamoorthi
1c5e48bbd0 Observer returns original tensor for post training quantization (#24196)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24196

The observer now returns its input unchanged for post-training quantization. This unifies observer semantics for QAT and PTQ.
ghstack-source-id: 88140887
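
A short usage check (using the MinMaxObserver exported from torch.quantization in recent releases) of the pass-through contract this change establishes:

```python
import torch
from torch.quantization import MinMaxObserver

obs = MinMaxObserver()
x = torch.randn(4, 8)
y = obs(x)  # records min/max as a side effect
# The tensor comes back untouched, so the same observer works in both the
# PTQ calibration path and inside QAT graphs.
assert torch.equal(y, x)
scale, zero_point = obs.calculate_qparams()
```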

Differential Revision: D16768277

fbshipit-source-id: fae7c94e3dc0eeda363e9982b3865a15113e11bd
2019-08-13 14:01:37 -07:00
Zafar Takhirov
4cc16782f3 Removing the make_module script. (#23635)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23635

It appears that adding new modules via a base class is no more complex than using a generation script.

Test Plan: Imported from OSS

Differential Revision: D16593364

Pulled By: zafartahirov

fbshipit-source-id: 852dcf41f3dfa2a89152042b8e61d0b6defa8feb
2019-08-13 09:58:28 -07:00
Jerry Zhang
89956374c3 Remove qconfig_dict from API (#23465)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23465

We decided not to allow users to drive quantization through qconfig_dict,
since that API is not robust.

Differential Revision: D16611504

fbshipit-source-id: b0d1d311b32c990a165c480f50e9ce3d68b785b5
2019-08-02 10:28:48 -07:00
Jerry Zhang
6cf9ed4a54 ConvBn2d/ConvBnReLU2d (#23357)
Summary:
Added _intrinsic.qat.ConvBn2d/_intrinsic.qat.ConvBnReLU2d.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23357
ghstack-source-id: 87519573

Differential Revision: D16295500

fbshipit-source-id: 81e6d1d10d05bf6e343721fc5701d3d6bd7e07e6
2019-08-01 10:07:00 -07:00
Zafar Takhirov
9c549dfdc1 make_module: First version
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23288

Test Plan: Imported from OSS

Differential Revision: D16455390

Pulled By: zafartahirov

fbshipit-source-id: 4352f0a17cd0382b48502b93e51574cc3acdfdcc
2019-07-30 22:14:44 -07:00
Jerry Zhang
bc64324da9 Change condition in swap module
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23561

Test Plan:
python test/test_quantization.py

Imported from OSS

Differential Revision: D16570928

Pulled By: jerryzh168

fbshipit-source-id: 70f36f577ac657d015f3d7738819867742088e5a
2019-07-30 17:25:02 -07:00
Jerry Zhang
7364aa796d skip nn.Identity in add_observer
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23500

Test Plan:
e2e test quantizing ResNeXt-101

Imported from OSS

Differential Revision: D16550190

Pulled By: jerryzh168

fbshipit-source-id: 6128d7c3419235152b43739fcc5cade34342ba3d
2019-07-30 11:00:36 -07:00
Zafar Takhirov
058645acb1 Fusion and _intrinsic modules (#23003)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23003

torch.quantization.fuse_module and torch.nn._intrinsic convRelu and LinearRelu

Adds a fusion function to combine specific module sequences: (conv, bn) and (conv, bn, relu).
In all cases, modules are replaced in place: the first module is replaced with the _intrinsic fused module and the remaining modules are replaced by nn.Identity.
Both training and eval are supported. For training, the modules are "fused" into a sequential container; this allows further module swaps for quantization-aware training.
Also adds torch.nn._intrinsic modules for convRelu and LinearRelu.

TODO: Add tests for _intrinsic modules.

Conv BN fusion code is based on DsKhudia's implementation
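
For illustration, a hedged usage sketch of the fusion flow described above (in current releases the entry point is torch.quantization.fuse_modules; the attribute names in the list are just this example's):

```python
import torch
import torch.nn as nn
from torch.quantization import fuse_modules

class Block(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, 3)
        self.bn = nn.BatchNorm2d(8)
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.relu(self.bn(self.conv(x)))

m = Block().eval()
# 'conv' is replaced by the fused module; 'bn' and 'relu' become nn.Identity,
# so the original forward() above still runs unchanged.
fused = fuse_modules(m, [["conv", "bn", "relu"]])
print(fused)
```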

Differential Revision: D16199720

fbshipit-source-id: 95fb9ffe72b361d280313b2ec57de2acd4f9dda2
2019-07-23 14:54:19 -07:00
Jerry Zhang
d7448c7812 quantized conv module (#23178)
Summary:
att

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23178
ghstack-source-id: 86973164

Differential Revision: D16426871

fbshipit-source-id: a2ebb38997acfeb61b7dfd6b11dd8ee9b3a7a8ed
2019-07-22 20:47:40 -07:00
Jerry Zhang
77353636de Conv module (#23084)
Summary:
Added Conv module for qat

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23084
ghstack-source-id: 86862445

Differential Revision: D16379417

fbshipit-source-id: 742cc8b8e0f132070ca4943a1c2e3db60c2b5bdc
2019-07-19 18:49:52 -07:00
Jerry Zhang
7cc029cb75 Quantization aware training in eager mode (#23082)
Summary:
Add support for quantization aware training in eager mode

Modifications to the post-training flow:
## Prepare
* Fusion: e.g. (Conv, Bn) → ConvBn (float)
* Swapping: To insert fake_quant for the weight, we need to swap the float modules that have weights with the corresponding qat modules, e.g. Conv → torch.nn.qat.Conv, ConvBn → torch.nn._intrinsic.qat.ConvBn
```
* Previously we were thinking about modifying the weight in a forward_pre_hook and changing it back in a forward_hook:

    def forward_pre_hook(self, input):
        self.float_weight = self.weight
        self.weight = self.fake_quantize(self.float_weight)

    def forward_hook(self, input):
        self.weight = self.float_weight
```

* Assignments to self.weight are needed because we can't change the forward function, and the forward function uses self.weight.
* But we would need to keep two copies of the weight in this case, so it's better to just swap the module.
* So we want to just swap Conv to torch.nn.qat.Conv and Linear to torch.nn.qat.Linear.
* The qat modules have fake_quant for outputs and weights inserted in their forward function.

## Convert
* The flow is identical to PTQ, but the swapping dictionary is slightly different since modules were already changed in the prepare step (see the sketch below).
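
For illustration, a hedged end-to-end sketch of the eager-mode QAT flow above, using the torch.quantization entry points (exact defaults and module mappings may differ from this commit's snapshot):

```python
import torch
import torch.nn as nn
from torch import quantization

class M(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = quantization.QuantStub()
        self.conv = nn.Conv2d(3, 8, 3)
        self.relu = nn.ReLU()
        self.dequant = quantization.DeQuantStub()

    def forward(self, x):
        return self.dequant(self.relu(self.conv(self.quant(x))))

m = M().train()
m.qconfig = quantization.get_default_qat_qconfig('fbgemm')
quantization.prepare_qat(m, inplace=True)   # prepare: swap nn.Conv2d -> nn.qat.Conv2d with fake_quant
# ... run a few training iterations so the fake_quant observers see real data ...
m.eval()
quantized = quantization.convert(m)         # convert: swap qat modules -> quantized modules
```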

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23082
ghstack-source-id: 86824650

Differential Revision: D16379374

fbshipit-source-id: 7d16d1acd87025065a24942ff92abf18e9fc8070
2019-07-19 14:57:25 -07:00
Soumith Chintala
84c2c89e2c Revert D16199356: [qat] Quantization aware training in eager mode
Differential Revision:
D16199356

Original commit changeset: 62aeaf47c12c

fbshipit-source-id: d06a96b0a617ae38029ffb246173ec065454b666
2019-07-19 03:18:48 -07:00
Soumith Chintala
f19aa12ae5 Revert D16274792: [qat] Conv module
Differential Revision:
D16274792

Original commit changeset: 1da10194123b

fbshipit-source-id: 71b34774b463f2350289bd39b8cfd798e095ffa5
2019-07-19 03:18:45 -07:00
Jerry Zhang
12d9d768b8 Conv module (#22899)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22899

Added Conv module for qat

Reviewed By: zafartahirov

Differential Revision: D16274792

fbshipit-source-id: 1da10194123b2759a6a35c60d1c2d2c0b569ccdc
2019-07-18 18:58:07 -07:00
Jerry Zhang
65ef671d11 Quantization aware training in eager mode (#22732)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22732

Add support for quantization aware training in eager mode

Modifications to the post-training flow:
## Prepare
* Fusion: e.g. (Conv, Bn) → ConvBn (float)
* Swapping: To insert fake_quant for the weight, we need to swap the float modules that have weights with the corresponding qat modules, e.g. Conv → torch.nn.qat.Conv, ConvBn → torch.nn._intrinsic.qat.ConvBn
```
* Previously we were thinking about modifying the weight in a forward_pre_hook and changing it back in a forward_hook:

    def forward_pre_hook(self, input):
        self.float_weight = self.weight
        self.weight = self.fake_quantize(self.float_weight)

    def forward_hook(self, input):
        self.weight = self.float_weight
```

* Assignments to self.weight are needed because we can't change the forward function, and the forward function uses self.weight.
* But we would need to keep two copies of the weight in this case, so it's better to just swap the module.
* So we want to just swap Conv to torch.nn.qat.Conv and Linear to torch.nn.qat.Linear.
* The qat modules have fake_quant for outputs and weights inserted in their forward function.

## Convert
* The flow is identical to PTQ, but the swapping dictionary is slightly different since modules were already changed in the prepare step.

Reviewed By: zafartahirov

Differential Revision: D16199356

fbshipit-source-id: 62aeaf47c12c62a87d9cac208f25f7592e245d6c
2019-07-18 18:58:03 -07:00
Jerry Zhang
f7de9be3c0 Add FakeQuantize Module (#21767)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21767

Adding FakeQuantize Module
for quantization aware training

Reviewed By: dzhulgakov

Differential Revision: D15728503

fbshipit-source-id: 2a9a6a362812ede3deac42b93dddca35987bd8e6
2019-07-15 14:08:55 -07:00
Jerry Zhang
b984b0ab4b fix print (#22689)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22689

att

Reviewed By: Lucaskabela

Differential Revision: D16184260

fbshipit-source-id: 1a6ad51a37918d0c81d6e3baa0ca0baa32cb9673
2019-07-10 11:26:34 -07:00
Jerry Zhang
5040d52a5a torch.quantization conversion utilities, observers for eager mode quantization (#22010)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22010

torch.quantization module with observers and conversion routines
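
For illustration, a hedged sketch of the post-training (eager mode) flow these utilities enable, assuming an x86 build with the fbgemm backend available:

```python
import torch
import torch.nn as nn
from torch import quantization

class M(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = quantization.QuantStub()
        self.fc = nn.Linear(16, 4)
        self.dequant = quantization.DeQuantStub()

    def forward(self, x):
        return self.dequant(self.fc(self.quant(x)))

m = M().eval()
m.qconfig = quantization.get_default_qconfig('fbgemm')
quantization.prepare(m, inplace=True)    # insert observers at the points marked by the stubs
m(torch.randn(8, 16))                    # calibration pass with representative data
quantized = quantization.convert(m)      # swap observed modules for quantized ones
```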

Reviewed By: zafartahirov

Differential Revision: D15554183

fbshipit-source-id: 05a3fabe28dd701978b8ecebf5bfc3a4c044ba5c
2019-07-09 10:51:38 -07:00