Commit Graph

38 Commits

Author SHA1 Message Date
Vasiliy Kuznetsov
5977d1d864 FixedQParamsFakeQuantize: adjust default quant_min and quant_max (#47423)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47423

Since the dtype of this fake_quant is `quint8`, the output range should be
from 0 to 255. This fixes that, and should address the numerical inaccuracies of
sigmoid and hardsigmoid with `FixedQParamsFakeQuantize` attached compared
to their quantized counterparts.

In a future PR it might be safer to also make the activation functions
that use `FixedQParamsFakeQuantize` explicitly specify their expected
output range and zero_point.  Leaving that for later, as this bugfix
should be landed urgently.
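For illustration, a minimal sketch of constructing the module with the corrected range (the constructor arguments follow the module's signature around this release; treat them as an assumption):

```python
import torch
from torch.quantization import FixedQParamsFakeQuantize

# sigmoid's output lives in [0, 1); with quint8 the fixed qparams are
# scale = 1/256, zero_point = 0, and the grid must span quant_min=0..quant_max=255
fq = FixedQParamsFakeQuantize(scale=1.0 / 256.0, zero_point=0,
                              dtype=torch.quint8, quant_min=0, quant_max=255)

x = torch.rand(4)
print(fq(x))  # values snapped to the 256-level grid
```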

Test Plan:
Manual script which gives low SQNR before this PR and high SQNR after
this PR: https://gist.github.com/vkuzo/9906bae29223da72b10d6b6aafadba42

https://github.com/pytorch/pytorch/pull/47376, which can be landed after
this, adds a proper test.

Imported from OSS

Reviewed By: ayush29feb, jerryzh168

Differential Revision: D24751497

fbshipit-source-id: 4c32e22a30116caaceeedb4cd47146d066054a89
2020-11-05 09:06:55 -08:00
Jerry Zhang
6b50ccc41c [quant][graphmode][fx] Support sigmoid/hardsigmoid/tanh in qat (#46738) (#46871)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46871

Test Plan:
Imported from OSS

Reviewed By: vkuzo

Differential Revision: D24547180

fbshipit-source-id: d2eb9aa74c6e5436204376b1a2ebcc6188d3562f
2020-10-26 23:52:07 -07:00
Alban Desmaison
25db74bf5e Revert D24486972: [quant][graphmode][fx] Support sigmoid/hardsigmoid/tanh in qat
Test Plan: revert-hammer

Differential Revision:
D24486972 (e927b62e73)

Original commit changeset: c9f139bfdd54

fbshipit-source-id: 2a75f5ec93d55a62b40d1cdd49adcf65436058f7
2020-10-26 12:47:05 -07:00
Jerry Zhang
e927b62e73 [quant][graphmode][fx] Support sigmoid/hardsigmoid/tanh in qat (#46738)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46738

Test Plan: Imported from OSS

Reviewed By: raghuramank100

Differential Revision: D24486972

fbshipit-source-id: c9f139bfdd54973da1a93a45e32937595dbe67fc
2020-10-26 12:04:42 -07:00
Jerry Zhang
13decddae2 [reland][quant] Add FixedQParamsFakeQuantize module (#45538) (#46657)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46657

This is used to simulate the fake quantize operation for ops with fixed quantization parameters,
e.g. hardsigmoid.

Test Plan:
Imported from OSS

Reviewed By: vkuzo

Differential Revision: D24451406

fbshipit-source-id: 26cc140c00f12bdec9a8f9dc880f4c425f4d4074
2020-10-21 16:47:11 -07:00
Ashkan Aliabadi
2181449068 Revert D24004795: [quant] Add FixedQParamsFakeQuantize module
Test Plan: revert-hammer

Differential Revision:
D24004795 (253918ec55)

Original commit changeset: fc4797f80842

fbshipit-source-id: 663169e90a2f58e5a89e4d382291ae41c24d0fee
2020-10-20 19:40:21 -07:00
Jerry Zhang
253918ec55 [quant] Add FixedQParamsFakeQuantize module (#45538)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45538

This is used to simulate the fake quantize operation for ops with fixed quantization parameters,
e.g. hardsigmoid.
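As a rough sketch (not code from the PR), the simulation boils down to a quantize-dequantize round trip with the fixed parameters:

```python
import torch

def fake_quant_fixed(x, scale=1.0 / 256.0, zero_point=0, quant_min=0, quant_max=255):
    # round to the fixed grid, clamp, then map back to float
    q = torch.clamp(torch.round(x / scale) + zero_point, quant_min, quant_max)
    return (q - zero_point) * scale

print(fake_quant_fixed(torch.rand(4)))
```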

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D24004795

fbshipit-source-id: fc4797f80842daacd3b3584c5b72035774634edd
2020-10-20 17:43:25 -07:00
Sam Estep
24187a0b42 Enable type check for torch.quantization.fake_quantize (#45701)
Summary:
Addresses part of https://github.com/pytorch/pytorch/issues/42969.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45701

Reviewed By: walterddr

Differential Revision: D24066672

Pulled By: samestep

fbshipit-source-id: 53bb5e7b4703738d3de86fa89fb0980f1d6251f3
2020-10-02 09:27:34 -07:00
Supriya Rao
1fde54d531 [quant][qat] Ensure fake_quant and observer can be disabled on scriptmodule (#44773)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44773

The model is created and prepared using the fx APIs and then scripted for training.
To test QAT on a scripted model, we need to be able to disable/enable the fake_quant
and observer modules on it.
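A minimal sketch of the intended usage (`prepared` stands in for a QAT-prepared model; the toggles are the existing `torch.quantization` helper functions):

```python
import torch

scripted = torch.jit.script(prepared)  # `prepared` is a QAT-prepared model

# toggling must keep working after scripting
scripted.apply(torch.quantization.disable_observer)
scripted.apply(torch.quantization.disable_fake_quant)
scripted.apply(torch.quantization.enable_fake_quant)
```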

Test Plan:
python test/test_quantization.py TestQuantizeFx.test_qat_and_script

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D23741354

fbshipit-source-id: 3fee7aa9b049d9901313b977710f4dc1c4501532
2020-09-17 10:21:52 -07:00
Xiang Gao
20ac736200 Remove py2 compatible future imports (#44735)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44735

Reviewed By: mruberry

Differential Revision: D23731306

Pulled By: ezyang

fbshipit-source-id: 0ba009a99e475ddbe22981be8ac636f8a1c8b02f
2020-09-16 12:55:57 -07:00
Supriya Rao
3f512b0de2 [quant][qat] Ensure observers and fq modules are scriptable (#44749)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44749

Ensure the fx module is scriptable after calling prepare_qat on it.
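Sketch of the property being guarded (module defaults assumed):

```python
import torch
from torch.quantization import FakeQuantize, MovingAverageMinMaxObserver

# both the fake quant module and its observer must survive scripting
torch.jit.script(FakeQuantize())
torch.jit.script(MovingAverageMinMaxObserver())
```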

Test Plan:
python test/test_quantization.py TestQuantizeFx.test_qat_and_script

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D23718380

fbshipit-source-id: abf63ffb21e707f7def8f6c88246877f5aded58c
2020-09-16 09:30:07 -07:00
Jerry Zhang
85752b989d [quant][doc] Print more info for fake quantize module (#43031)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43031

fixes: https://github.com/pytorch/pytorch/issues/43023

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D23116200

fbshipit-source-id: faa90ce8711da0785d635aacd0362c45717cfacc
2020-08-13 20:27:36 -07:00
Vasiliy Kuznetsov
94dfc76e3f graph mode qat: make fake_quantize scriptable (#39750)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39750

Add a test to make the default QAT qconfig scriptable, and fix
all the errors.
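Roughly what the test exercises — a sketch, assuming the qconfig factory API:

```python
import torch

qconfig = torch.quantization.get_default_qat_qconfig('fbgemm')
torch.jit.script(qconfig.activation())  # fake quant for activations
torch.jit.script(qconfig.weight())      # fake quant for weights
```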

Test Plan:
```
python test/test_quantization.py TestQATScript.fake_quant_scriptable
```

Imported from OSS

Differential Revision: D21975879

fbshipit-source-id: 8c48ad9f24b2c941d2267cb53eb70ebecd103744
2020-06-10 21:34:18 -07:00
Vasiliy Kuznetsov
8292742ba0 fake_quant: move observer and fake_quant flags into buffers (#38368)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38368

Some customers need to enable/disable these flags
in the middle of QAT.  To make this work properly with DDP,
we implement them as buffers so that they are replicated
properly to all the nodes.

This should solve issue https://github.com/pytorch/pytorch/issues/38081
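An illustrative sketch of the pattern (not the real class): storing the flags as buffers puts them in state_dict, so DDP replicates them.

```python
import torch
import torch.nn as nn

class ToggleableFakeQuant(nn.Module):
    def __init__(self):
        super().__init__()
        # buffers instead of plain Python attributes: DDP broadcasts buffers
        self.register_buffer('fake_quant_enabled', torch.tensor([1], dtype=torch.uint8))
        self.register_buffer('observer_enabled', torch.tensor([1], dtype=torch.uint8))

    def enable_fake_quant(self, enabled=True):
        # in-place write keeps the same tensor object on every replica
        self.fake_quant_enabled[0] = 1 if enabled else 0
        return self
```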

Test Plan:
CI

Imported from OSS

Differential Revision: D21537607

fbshipit-source-id: 8c9da022beb7aaa44c658268f02f99dd5aee93fd
2020-05-18 09:30:07 -07:00
Vasiliy Kuznetsov
b57c8b720e [wip] Make quantization modules work with DataParallel (#37032)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37032

DataParallel requires all params and buffers of child modules to be updated
in place because of how it implements model replication during the
forward pass (see https://github.com/pytorch/pytorch/pull/12671 for
context). Any params or buffers not updated in place are lost and not
propagated back to the master.

This diff updates some quantized modules (TBD: all quantized modules? determine a good
cut point) to do their parameter updates in place. This will enable static
quant and QAT to work correctly with DataParallel.

TODO: https://github.com/pytorch/pytorch/pull/32684 needs to land before we can fix the graph mode test failures on this PR.
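The in-place pattern looks roughly like this (a hypothetical helper, not code from the diff):

```python
import torch

def update_qparams_inplace(mod, new_scale, new_zero_point):
    # rebinding (mod.scale = new_scale) would create a fresh tensor on the
    # replica that never reaches the master; copy_ writes through instead
    with torch.no_grad():
        mod.scale.copy_(new_scale)
        mod.zero_point.copy_(new_zero_point)
```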

Test Plan:
script failed before and passes after the diff:
https://gist.github.com/vkuzo/78b06c01f23f98ee2aaaeb37e55f8d40

TODO before land: add integration testing

Imported from OSS

Differential Revision: D21206454

fbshipit-source-id: df6b4b04d0ae0f7ef582c82d81418163019e96f7
2020-05-05 13:06:43 -07:00
Gao, Xiang
45e4b614d1 Per channel quantization performance improvement (#33772)
Summary:
Benchmark:
NVIDIA GTX 1650 + AMD Ryzen Threadripper 3970X
```python
# run under IPython (%timeit below is an IPython magic)
import torch
print(torch.__version__)

# warm up the CUDA context and allocator
for i in range(1000):
    torch.randn(1024 * 128, device='cuda')

def cuda(e):
    # time per-channel fake quant on a (2**e, 32) CUDA tensor, channel axis 1
    a = torch.randn(2 ** e, 32, device='cuda')
    s = torch.randn(32, device='cuda')
    z = torch.randn(32, device='cuda')
    torch.cuda.synchronize()
    %timeit torch.fake_quantize_per_channel_affine(a, s, z, 1, -999, 999); torch.cuda.synchronize()

def cpu(e):
    # same measurement on CPU
    a = torch.randn(2 ** e, 32, device='cpu')
    s = torch.randn(32, device='cpu')
    z = torch.randn(32, device='cpu')
    %timeit torch.fake_quantize_per_channel_affine(a, s, z, 1, -999, 999);

for i in range(10, 24):
    cuda(i)
print()
for i in range(10, 32):
    cpu(i)
```
Before
```
1.5.0a0+9bc922d
849 µs ± 44.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
817 µs ± 30.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
814 µs ± 2.93 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.11 ms ± 1.32 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.19 ms ± 4.19 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.6 ms ± 5.58 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
2.44 ms ± 14.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
4.14 ms ± 2.55 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
7.41 ms ± 2.46 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
13.9 ms ± 2.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
26.9 ms ± 254 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
52.6 ms ± 260 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
104 ms ± 176 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
207 ms ± 1.24 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

249 µs ± 158 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
420 µs ± 230 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
766 µs ± 391 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.45 ms ± 574 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
2.84 ms ± 34.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
5.69 ms ± 83 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
7.29 ms ± 2.58 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
7.32 ms ± 13.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
17.4 ms ± 38.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
47.5 ms ± 264 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
187 ms ± 1.19 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
379 ms ± 5.05 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
652 ms ± 11.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
1.22 s ± 4.58 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
2.34 s ± 8.77 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
4.56 s ± 7.15 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
8.97 s ± 33.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
17.8 s ± 32.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
35.2 s ± 167 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```
After
```
1.5.0a0+a7ec8cc
92.5 µs ± 2.03 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
97.7 µs ± 469 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
109 µs ± 4.73 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
119 µs ± 6.17 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
146 µs ± 1.84 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
211 µs ± 2.45 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
347 µs ± 4.18 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
624 µs ± 14.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.17 ms ± 16.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
2.25 ms ± 48.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
4.43 ms ± 220 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
8.51 ms ± 44.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
16.9 ms ± 30.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
33.7 ms ± 7.64 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

201 µs ± 234 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
285 µs ± 465 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
287 µs ± 214 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
287 µs ± 221 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
287 µs ± 761 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
347 µs ± 399 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
675 µs ± 213 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.34 ms ± 643 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
4.82 ms ± 34.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
10.7 ms ± 88.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
20.3 ms ± 25.6 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
39.4 ms ± 242 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
78.8 ms ± 2 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
153 ms ± 786 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
285 ms ± 911 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
541 ms ± 1.09 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
1.03 s ± 1.67 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
1.97 s ± 8.59 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
3.81 s ± 10.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```

Fixes https://github.com/pytorch/pytorch/issues/33647
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33772

Differential Revision: D20112531

Pulled By: ngimel

fbshipit-source-id: f90e3ef1b5be8276851637f3e1251cb8f1af411f
2020-02-26 10:19:25 -08:00
Supriya Rao
996c0adb53 [quant] Register fake_quant and observer attributes as buffers (#33626)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33626

For DDP we require the attributes to be registered as buffers. By doing this the value is broadcast from one device to the rest.
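Quick way to see the effect (exact key names vary by PyTorch version):

```python
import torch
from torch.quantization import FakeQuantize

fq = FakeQuantize()
# buffers, unlike plain tensor attributes, appear in state_dict and are
# therefore broadcast by DDP from one device to the rest
print(list(fq.state_dict().keys()))
```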

Test Plan:
Tested on actual model on GPU

Imported from OSS

Differential Revision: D20038839

fbshipit-source-id: 82e829fc3baca0b3262c3894a283c375eb08a4a4
2020-02-24 14:16:03 -08:00
Jerry Zhang
8c1268aad3 Use default scale/zero_point in fake_quantize module instead of None (#32318)
Summary:
Distributed data parallel cannot broadcast None, so when we prepare the model for QAT and try to save the model, it errors out.
Fixes: https://github.com/pytorch/pytorch/issues/32082
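Sketch of the fix's shape (hypothetical class; the point is concrete default tensors rather than None):

```python
import torch
import torch.nn as nn

class QParamHolder(nn.Module):
    def __init__(self):
        super().__init__()
        # DDP cannot broadcast None, so give the buffers usable defaults
        self.register_buffer('scale', torch.tensor([1.0]))
        self.register_buffer('zero_point', torch.tensor([0]))
```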
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32318

Differential Revision: D19434801

Pulled By: jerryzh168

fbshipit-source-id: ee70abe4c3dcdd3506fb7dd0316aee2fb1705469
2020-01-17 11:04:08 -08:00
Raghuraman Krishnamoorthi
eccf42fd15 Bug fix: Handle missing keys in observer state dict during load (#30357)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30357

Fix issue https://github.com/pytorch/pytorch/issues/29032 in loading from state dict for observers and fake quant.
ghstack-source-id: 94468814
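An illustrative sketch of the load-time pattern (a standalone toy class, not the actual observer code):

```python
import torch
import torch.nn as nn

class TolerantObserver(nn.Module):
    def __init__(self):
        super().__init__()
        self.register_buffer('min_val', torch.tensor(float('inf')))
        self.register_buffer('max_val', torch.tensor(float('-inf')))

    def _load_from_state_dict(self, state_dict, prefix, local_metadata, strict,
                              missing_keys, unexpected_keys, error_msgs):
        # load our buffers by hand so checkpoints written before these buffers
        # existed still load instead of raising on the missing keys
        for name in ('min_val', 'max_val'):
            key = prefix + name
            if key in state_dict:
                setattr(self, name, state_dict.pop(key))
            elif strict:
                missing_keys.append(key)
        super()._load_from_state_dict(state_dict, prefix, local_metadata, False,
                                      missing_keys, unexpected_keys, error_msgs)
```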

Test Plan: Ensures that load/save of fake quant and observers with missing keys works correctly.

Differential Revision: D18668517

fbshipit-source-id: 0eda6f47c39102e55977fc548b9a03664f123ad7
2019-11-26 06:53:45 -08:00
Jerry Zhang
661a6c8ef2 Add get_qparams and revert the changes to calculate_qparams (#30262)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30262

`get_qparams` returns all the parameters needed to call the quantize function.
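For context, a sketch of the closely related flow (`get_qparams` bundles what the quantize call below consumes):

```python
import torch

obs = torch.quantization.MinMaxObserver(dtype=torch.quint8)
x = torch.randn(16)
obs(x)                                      # observe the tensor
scale, zero_point = obs.calculate_qparams()
qx = torch.quantize_per_tensor(x, float(scale), int(zero_point), torch.quint8)
```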

Test Plan:
python test/test_jit.py

Imported from OSS

Differential Revision: D18645047

fbshipit-source-id: e57c11a66dac2d589778d412a996796ad5b6f86a
2019-11-26 06:53:26 -08:00
Jerry Zhang
f2b851a9e5 Returning axis from calculate_qparams (#29494)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29494

`calculate_qparams` for per-channel quantization should return the axis; this
PR adds that and also adds corresponding support in graph mode.
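Why the axis matters — the per-channel quantize call needs it alongside scale/zero_point (a sketch, assuming the per-channel observer API):

```python
import torch

obs = torch.quantization.PerChannelMinMaxObserver(ch_axis=0, dtype=torch.qint8)
w = torch.randn(8, 4)
obs(w)
scale, zero_point = obs.calculate_qparams()
# the quantize call needs the channel axis as well, which is why
# returning it from calculate_qparams is convenient
qw = torch.quantize_per_channel(w, scale, zero_point, axis=0, dtype=torch.qint8)
```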

Test Plan:
python test/test_jit.py

Imported from OSS

Differential Revision: D18580905

fbshipit-source-id: f9691c1f043f8bca39f81716a4d0b10f60a65396
2019-11-20 11:06:48 -08:00
Zafar Takhirov
a5ac7f6387 Changing observer name
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27779

Test Plan: Imported from OSS

Differential Revision: D17886605

Pulled By: z-a-f

fbshipit-source-id: 68c50b482e65015336ff27171fd730da493525b6
2019-10-17 11:36:03 -07:00
Chris Gottbrath
a96b003b39 docstring only formatting changes: quantize.py, fake_quantize.py, observer.py
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27415

Reviewed By: zafartahirov

Differential Revision: D17783101

Pulled By: gottbrath

fbshipit-source-id: a7acbc55edfaa75fdbd17fd30d530710a401b22f
2019-10-08 09:21:03 -07:00
Raghuraman Krishnamoorthi
ac0f18437f MovingAverage Observer (#27396)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27396

Observer that estimates moving averages of the min and max values per batch; better suited for quantization aware training than minmax observers, which track extremal values across batches.
ghstack-source-id: 91369018
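The update rule, roughly (c is the averaging constant, e.g. 0.01):

```python
import torch

def moving_average_minmax(x, min_val, max_val, c=0.01):
    # exponential moving average of the per-batch extrema
    min_val = min_val + c * (torch.min(x) - min_val)
    max_val = max_val + c * (torch.max(x) - max_val)
    return min_val, max_val
```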

Test Plan:
buck test caffe2/test:quantization -- 'test_per_tensor_observers \(test_quantization\.ObserverTest\)' --print-passing-details

buck test caffe2/test:quantization -- 'test_per_channel_observers \(test_quantization\.ObserverTest\)' --print-passing-details

Differential Revision: D17727213

fbshipit-source-id: 024a890bf3dd0bf269d8bfe61f19871d027326f0
2019-10-04 16:28:59 -07:00
Raghuraman Krishnamoorthi
9e3ba35500 Add control for observers in Fake-quantize module (#27113)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27113

Fix a bug in fake quant control of observer and fake-quantize operations.
Add a test to ensure the feature works as expected.
ghstack-source-id: 91071181

Test Plan: buck test mode/dev-nosan caffe2/test:fake_quant -- test_fake_quant_control

Differential Revision: D17678875

fbshipit-source-id: 2912ad8b6e674daa1d129f7a7c6f27d8c1b4f93b
2019-09-30 18:23:26 -07:00
Raghuraman Krishnamoorthi
7dc7075795 Per channel fake quant (#26623)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26623

Per-channel fake quant CPU and CUDA operators, per-channel support in the
fake quant module, and tests for per-channel fake quant and serializability
of fake quant modules.

ghstack-source-id: 91008299

Test Plan:
buck test mode/dev caffe2/test:fake_quant  --
 Started new test run: https://our.intern.facebook.com/intern/testinfra/testrun/1970324848875929
      ✓ caffe2/test:fake_quant - test_backward_per_tensor (test_fake_quant.TestFakeQuantizePerTensor) 0.242 1/10 (passed)
      ✓ caffe2/test:fake_quant - test_numerical_consistency_per_tensor (test_fake_quant.TestFakeQuantizePerTensor) 0.204 2/10 (passed)
      ✓ caffe2/test:fake_quant - test_fq_serializable (test_fake_quant.TestFakeQuantizePerTensor) 0.174 3/10 (passed)
      ✓ caffe2/test:fake_quant - test_numerical_consistency_per_channel (test_fake_quant.TestFakeQuantizePerChannel) 0.279 4/10 (passed)
      ✓ caffe2/test:fake_quant - test_forward_per_tensor (test_fake_quant.TestFakeQuantizePerTensor) 0.241 5/10 (passed)
      ✓ caffe2/test:fake_quant - test_forward_per_channel (test_fake_quant.TestFakeQuantizePerChannel) 0.353 6/10 (passed)
      ✓ caffe2/test:fake_quant - test_fq_module (test_fake_quant.TestFakeQuantizePerTensor) 0.354 7/10 (passed)
      ✓ caffe2/test:fake_quant - test_backward_per_channel (test_fake_quant.TestFakeQuantizePerChannel) 0.334 8/10 (passed)
      ✓ caffe2/test:fake_quant - test_fq_serializable (test_fake_quant.TestFakeQuantizePerChannel) 0.168 9/10 (passed)
      ✓ caffe2/test:fake_quant - test_fq_module (test_fake_quant.TestFakeQuantizePerChannel) 0.429 10/10 (passed)
      ✓ caffe2/test:fake_quant - main 0.000 (passed)

Differential Revision: D17439406

fbshipit-source-id: 64bfff5e4f40bc2ab8af2b432c7bc33805418077
2019-09-30 00:21:25 -07:00
Raghuraman Krishnamoorthi
2ccbdb79c8 Per-channel baseline (#26516)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26516

ghstack-source-id: 90982010

Test Plan:
Integrate per-channel support into conv and linear modules.
The following tests pass:
buck test caffe2/test:quantized -- 'test_linear_api \(test_quantized_nn_mods\.ModuleAPITest\)' --print-passing-details

buck test caffe2/test:quantized -- 'test_conv_api \(test_quantized_nn_mods\.ModuleAPITest\)' --print-passing-details

buck test caffe2/test:quantized -- 'test_float_quant_compare_per_channel \(test_quantized_models\.ModelNumerics\)' --print-passing-details

Differential Revision: D17342622

fbshipit-source-id: f0d618928e3d9348672c589a6b7a47049c372a2e
2019-09-28 14:05:06 -07:00
Raghuraman Krishnamoorthi
8fa9900c28 control of observer/fake-quant operations (#26520)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26520

Hooks to control observer and fake quant operations, usable via model.apply() to control fake quant during QAT.
ghstack-source-id: 90897063
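Typical usage during a QAT schedule (a sketch; `model`, `train_one_epoch`, `num_epochs`, and the epoch threshold are placeholders):

```python
import torch

for epoch in range(num_epochs):
    if epoch == 3:
        # freeze qparams: stop observing, keep fake quant running
        model.apply(torch.quantization.disable_observer)
    train_one_epoch(model)
```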

Test Plan: buck test caffe2/test:quantization --  --print-passing-details

Differential Revision: D17491155

fbshipit-source-id: 80ff0d7a1ac35c96e054b4f0165a73c56c2f53cc
2019-09-27 11:01:34 -07:00
Raghuraman Krishnamoorthi
b0a2f6f2f5 Serialization and range reduction support for Fake Quant/Observer (#26519)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26519

ghstack-source-id: 90895631

Test Plan:
buck test caffe2/test:quantization -- 'test_histogram_observer \(test_quantization\.ObserverTest\)' --print-passing-details
and
buck test caffe2/test:fake_quant -- 'test_fq_serializable \(test_fake_quant\.TestFakeQuantizePerTensorAffine\)' --print-passing-details

Differential Revision: D17217408

fbshipit-source-id: 0da7efdcdae0c065dd035c5dd2b6a78231545ece
2019-09-27 10:09:39 -07:00
Raghuraman Krishnamoorthi
9a5e2e80b8 Fake quantization enhancements for QAT/PTQ support- fix tests (#26876)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26876

Add the ability to turn fake quantization and observers on and off independently.
ghstack-source-id: 90892132

Test Plan: buck test caffe2/test:quantized -- 'test_conv_bn_relu \(test_qat\.IntrinsicQATModuleTest\)' --print-passing-details

Differential Revision: D17592961

fbshipit-source-id: 24c60c94ed7c6c9fa55c634a8545731614e4f52f
2019-09-27 08:59:29 -07:00
Richard Zou
be93d30e37 Revert D17458232: Fake quantization enhancements for QAT/PTQ support
Test Plan: revert-hammer

Differential Revision:
D17458232

Original commit changeset: f44380c60f1a

fbshipit-source-id: 64a244c720b61fa912bacbb23fcbf9faed0757c2
2019-09-25 04:56:30 -07:00
Raghuraman Krishnamoorthi
e2c3d7e52c Fake quantization enhancements for QAT/PTQ support (#26420)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26420

Flags for enabling/disabling observer and fake quant independently. Improve repr for fake quant.
ghstack-source-id: 90704254

Test Plan:
buck test caffe2/test:fake_quant --  --print-passing-details
buck test caffe2/test:quantization -- --print-passing-details

Differential Revision: D17458232

fbshipit-source-id: f44380c60f1a10a8ea09bca8ab79ba5d1867ed62
2019-09-25 02:02:00 -07:00
Dmytro Dzhulgakov
a79b3685db Simplify observers declaration with functools.partial (#26492)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26492

The previous definition of observers was quite clumsy, with things like `default_observer()()`. This PR strips away a lot of cruft and allows passing class names directly. To override default arguments, either `functools.partial` can be used or the convenience wrapper `MyObserver.with_args(x=1)` is provided.

Also renames `QConfig_dynamic` to `QConfigDynamic` because it violates the naming convention.
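The two spellings, sketched:

```python
import functools
import torch
from torch.quantization import MinMaxObserver, QConfig

# override observer defaults either with functools.partial ...
act_factory = functools.partial(MinMaxObserver, reduce_range=True)
# ... or with the with_args convenience wrapper
act_factory = MinMaxObserver.with_args(reduce_range=True)

qconfig = QConfig(activation=act_factory,
                  weight=MinMaxObserver.with_args(dtype=torch.qint8))
```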

Test Plan: Imported from OSS

Differential Revision: D17521265

Pulled By: dzhulgakov

fbshipit-source-id: ba9df19b368641acf4093c43df9990796284fd9e
2019-09-23 10:15:59 -07:00
Jerry Zhang
754bf383b1 Change return type of observer to two tensors (#24339)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24339

Att

Differential Revision: D16820813

fbshipit-source-id: 3e7301f1700176e19f46e8677a644ba167209254
2019-08-15 10:26:44 -07:00
Jerry Zhang
7cc029cb75 Quantization aware training in eager mode (#23082)
Summary:
Add support for quantization aware training in eager mode

Modifications to the post-training flow:
## Prepare
* Fusion: e.g. (Conv, Bn) → ConvBn (float)
* Swapping: To insert fake_quant for the weight, we need to swap the float modules that have weight with the corresponding qat modules, e.g. Conv → torch.nn.qat.Conv, ConvBn → torch.nn._intrinsic.qat.ConvBn
```
# previously we were thinking about modifying the weight in a forward_pre
# hook and changing it back in a forward hook:
def forward_pre_hook(self, input):
    self.float_weight = self.weight
    self.weight = self.fake_quantize(self.float_weight)

def forward_hook(self, input):
    self.weight = self.float_weight
```

* Assignments to self.weight are needed because we can't change the forward function, and the forward function uses self.weight.
* But we would need to keep two copies of the weight in this case, so it's probably better to just swap the module.
* So we want to just swap Conv to torch.nn.qat.Conv and Linear to torch.nn.qat.Linear.
* qat modules will have fake_quant for outputs and weights inserted in the forward function.

## Convert
* The flow should be identical to ptq, but the swapping dictionary is slightly different since modules were changed in the prepare step (see the sketch below).

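How the flow is driven, sketched with the eager-mode API names that shipped (the module names to fuse are placeholders):

```python
import torch

model.train()
model.qconfig = torch.quantization.get_default_qat_qconfig('fbgemm')
torch.quantization.fuse_modules(model, [['conv', 'bn', 'relu']], inplace=True)
torch.quantization.prepare_qat(model, inplace=True)  # swap in qat modules + fake quant
# ... training loop ...
model.eval()
quantized = torch.quantization.convert(model)        # swap qat modules for quantized ones
```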
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23082
ghstack-source-id: 86824650

Differential Revision: D16379374

fbshipit-source-id: 7d16d1acd87025065a24942ff92abf18e9fc8070
2019-07-19 14:57:25 -07:00
Soumith Chintala
84c2c89e2c Revert D16199356: [qat] Quantization aware training in eager mode
Differential Revision:
D16199356

Original commit changeset: 62aeaf47c12c

fbshipit-source-id: d06a96b0a617ae38029ffb246173ec065454b666
2019-07-19 03:18:48 -07:00
Jerry Zhang
65ef671d11 Quantization aware training in eager mode (#22732)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22732

Add support for quantization aware training in eager mode

Modifications to the post-training flow:
## Prepare
* Fusion: e.g. (Conv, Bn) → ConvBn (float)
* Swapping: To insert fake_quant for the weight, we need to swap the float modules that have weight with the corresponding qat modules, e.g. Conv → torch.nn.qat.Conv, ConvBn → torch.nn._intrinsic.qat.ConvBn
```
# previously we were thinking about modifying the weight in a forward_pre
# hook and changing it back in a forward hook:
def forward_pre_hook(self, input):
    self.float_weight = self.weight
    self.weight = self.fake_quantize(self.float_weight)

def forward_hook(self, input):
    self.weight = self.float_weight
```

* Assignments to self.weight are needed because we can't change the forward function, and the forward function uses self.weight.
* But we would need to keep two copies of the weight in this case, so it's probably better to just swap the module.
* So we want to just swap Conv to torch.nn.qat.Conv and Linear to torch.nn.qat.Linear.
* qat modules will have fake_quant for outputs and weights inserted in the forward function.

## Convert
* The flow should be identical to ptq, but the swapping dictionary is slightly different since modules were changed in the prepare step.

Reviewed By: zafartahirov

Differential Revision: D16199356

fbshipit-source-id: 62aeaf47c12c62a87d9cac208f25f7592e245d6c
2019-07-18 18:58:03 -07:00
Jerry Zhang
f7de9be3c0 Add FakeQuantize Module (#21767)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21767

Adding FakeQuantize Module
for quantization aware training
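Basic usage of the module, sketched with today's import path:

```python
import torch
from torch.quantization import FakeQuantize, MovingAverageMinMaxObserver

fq = FakeQuantize(observer=MovingAverageMinMaxObserver,
                  quant_min=0, quant_max=255, dtype=torch.quint8)
x = torch.randn(4, 4)
y = fq(x)  # updates the observer's stats, then quantize-dequantizes x
```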

Reviewed By: dzhulgakov

Differential Revision: D15728503

fbshipit-source-id: 2a9a6a362812ede3deac42b93dddca35987bd8e6
2019-07-15 14:08:55 -07:00