Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27850
Many of these are real problems in the documentation (e.g., a link or
bullet point that doesn't display correctly).
Test Plan: - built and viewed the documentation for each change locally.
Differential Revision: D17908123
Pulled By: zou3519
fbshipit-source-id: 65c92a352c89b90fb6b508c388b0874233a3817a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26782
At a minimum, the top-level APIs and prepare/convert/etc. should be consistent about in-place behavior.
The default is inplace=False, but the top-level APIs take care of making fewer copies.
Also renames always-inplace methods such as add_observer to end with an underscore.
One fix for MinMaxObserver was triggered by deepcopy, which surfaced that we were accidentally keeping autograd around.
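A minimal sketch of the convention described above, in plain Python rather than the actual torch.quantization code. ToyModule and this prepare are hypothetical stand-ins: prepare copies by default, while the trailing-underscore helper add_observer_ always mutates its argument.

```python
import copy


class ToyModule:
    """Hypothetical stand-in for an nn.Module; not the real PyTorch class."""

    def __init__(self):
        self.observers = []


def add_observer_(module):
    # Trailing underscore: this helper always mutates its argument in place.
    module.observers.append("observer")
    return module


def prepare(module, inplace=False):
    # inplace=False by default: work on a deep copy and leave the input untouched.
    if not inplace:
        module = copy.deepcopy(module)
    return add_observer_(module)


original = ToyModule()
prepared = prepare(original)                # copy; original is unchanged
in_place = prepare(original, inplace=True)  # same object, now mutated
```

A top-level API can copy once and then call the steps with inplace=True, which is one way to read "doing fewer copies".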
Test Plan: Imported from OSS
Differential Revision: D17595956
Pulled By: dzhulgakov
fbshipit-source-id: 801f9f5536b553f24c7a660064dd6fce685edd65
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26518
Skip DeQuantize() modules for QAT only. For fake-quant insertion, DeQuantize() is a no-op, so we should not insert fake-quant for it.
ghstack-source-id: 90704220
Test Plan:
buck test caffe2/test:quantization -- --print-passing-details
Tests in test_quantization pass with changes:
Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/281475121296989
Summary (total time 73.03s):
PASS: 28
FAIL: 0
SKIP: 0
FATAL: 0
TIMEOUT: 0
OMIT: 0
Differential Revision: D17439333
fbshipit-source-id: f716c23500324ae08c8d104ee2c9587fa6926571
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26709
Polishes the implementation from #25975. Primarily, we use NoopObserver to communicate that weights need to be quantized to float16. The very top-level API (quantize_dynamic) stays the same with the `dtype` argument, but the implementation follows the common flow.
One can argue that dynamic fp16 quantization doesn't really fit into the 'observer' mechanism. It's in fact not ideal, but it's better to have the same flow than branching on both dtype and qconfig.
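A rough sketch of the idea, assuming nothing about the real NoopObserver beyond what is stated above. ToyNoopObserver and convert_weights are hypothetical names, and the stdlib struct 'e' format stands in for the float16 conversion.

```python
import struct


class ToyNoopObserver:
    """Hypothetical observer: collects no statistics, only carries a dtype."""

    def __init__(self, dtype):
        self.dtype = dtype

    def __call__(self, x):
        return x  # pass-through, nothing is recorded


def to_float16(value):
    # Round-trip through IEEE half precision using the stdlib 'e' format.
    return struct.unpack("e", struct.pack("e", value))[0]


def convert_weights(weights, observer):
    # Single flow: the dtype carried by the observer decides the conversion.
    if observer.dtype == "float16":
        return [to_float16(w) for w in weights]
    raise NotImplementedError("only the float16 path is sketched here")


observer = ToyNoopObserver(dtype="float16")
weights = observer([1.0, 0.1])
fp16_weights = convert_weights(weights, observer)
```

The point of the sketch is that the conversion step reads the dtype from the observer attached through the qconfig, so there is no separate branch on dtype outside the common flow.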
Test Plan: Imported from OSS
Differential Revision: D17544103
Pulled By: dzhulgakov
fbshipit-source-id: 6af3f18c35929a1a53ea734079c005f656e4925f
Summary:
Mainly want to resolve comments from https://github.com/pytorch/pytorch/pull/25830.
Overall, we want to provide a recording observer that records the runtime tensor values on the activation path, so that numerical accuracy loss can be debugged offline.
According to the feedback from https://github.com/pytorch/pytorch/issues/25830, it might be better to record all the observers in a dict and query the dict to get the corresponding tensor values. hx89 is working on how to insert the recording observers into the model being debugged.
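A minimal sketch of the dict-based approach suggested in the feedback, in plain Python with lists in place of tensors. ToyRecordingObserver, get_observer, and the key "fc1.activation" are hypothetical and not taken from the PR.

```python
class ToyRecordingObserver:
    """Hypothetical observer that stores every value it sees."""

    def __init__(self):
        self.tensor_val = []

    def __call__(self, x):
        self.tensor_val.append(list(x))  # keep a copy for offline inspection
        return x                         # pass the activation through unchanged


observers = {}  # module name -> observer, all kept in one dict


def get_observer(name):
    # Create the observer on first use, then reuse it.
    if name not in observers:
        observers[name] = ToyRecordingObserver()
    return observers[name]


out = get_observer("fc1.activation")([0.5, 1.5])
get_observer("fc1.activation")([2.0])

# Query the dict to get the recorded values per observer.
recorded = {name: obs.tensor_val for name, obs in observers.items()}
```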
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26413
Differential Revision: D17506502
Pulled By: llyfacebook
fbshipit-source-id: 3ab90dc78920e7ec3fa572c2a07327a9991c530a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25975
Add FP16 weight support for the dynamic quantized LSTM.
Test Plan:
buck test mode/dev caffe2/test:quantization -- 'test_quantized_rnn \(test_quantization\.PostTrainingDynamicQuantTest\)' --print-passing-details
```
[jianyuhuang@devvm794.ftw3.facebook.com: ~/fbsource/fbcode/caffe2/test] $ buck test mode/dev caffe2/test:quantization -- 'test_quantized_rnn \(test_quantization\.PostTrainingDynamicQuantTest\)' --print-passing-details
Building: finished in 13.4 sec (100%) 8134/8134 jobs, 81 updated
Total time: 13.9 sec
Trace available for this run at /tmp/testpilot.20190910-210241.2092790.log
TestPilot test runner for Facebook. See https://fburl.com/testpilot for details.
Testpilot build revision c86e65add357582accb6ec0be23b92c8a2c510bd fbpkg ca46e8f5b26c451a8b0b2462c11bb61d at Mon Sep 9 22:16:37 2019 by twsvcscm from /usr/local/fbprojects/packages/testinfra.testpilot/696/t.par
Discovering tests
Running 1 tests
Started new test run: https://our.intern.facebook.com/intern/testinfra/testrun/1125900050322971
✓ caffe2/test:quantization - test_quantized_rnn (test_quantization.PostTrainingDynamicQuantTest) 0.183 1/1 (passed)
Test output:
> test_quantized_rnn (test_quantization.PostTrainingDynamicQuantTest) ... ok
>
> ----------------------------------------------------------------------
> Ran 1 test in 0.184s
>
> OK
Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/1125900050322971
Summary (total time 4.35s):
PASS: 1
FAIL: 0
SKIP: 0
FATAL: 0
TIMEOUT: 0
OMIT: 0
```
Differential Revision: D17299116
fbshipit-source-id: 7fe91ece25867f2c0496f1b63fb1041e6b815166
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25954
Add torch.nn.LSTM to the default dynamic quantize mappings. LSTM will be dynamically quantized by default when the quantize_dynamic API is applied.
ghstack-source-id: 89839673
Test Plan: CI
Differential Revision: D17294958
fbshipit-source-id: 824aceef821276b3e28c52ce3bebafaf9b0a0833
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25881
Add Dropout to the blacklist to avoid an error in eager mode quantization.
ghstack-source-id: 89759536
Test Plan: Test locally in python notebook.
Reviewed By: jianyuh
Differential Revision: D17270826
fbshipit-source-id: bcf43483976740564d7f407838f25c2dbb67b016
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25157
Add the dynamic quantized LSTM module.
TODO (separate PRs):
- Serialization.
- Bias can be Null.
ghstack-source-id: 89443731
Test Plan:
buck test mode/dev caffe2/test:quantization -- 'test_quantized_rnn \(test_quantization\.PostTrainingDynamicQuantTest\)' --print-passing-details
```
[jianyuhuang@devvm2816.prn3.facebook.com: ~/fbsource/fbcode/caffe2/test] $ buck test mode/dev caffe2/test:quantization -- 'test_quantized_rnn \(test_quantization\.PostTrainingDynamicQuantTest\)' --print-passing-details
Action graph will be rebuilt because files have been added or removed.
Parsing buck files: finished in 1.4 sec
Building: finished in 4.0 sec (100%) 8122/8122 jobs, 2 updated
Total time: 5.5 sec
Trace available for this run at /tmp/testpilot.20190902-164918.1275502.log
TestPilot test runner for Facebook. See https://fburl.com/testpilot for details.
Testpilot build revision b61bc0e3b71033578eddfe0a28b0739bc685663f fbpkg 3b1c1aed1c534c0cb161a981eca6e2f0 at Sun Sep 1 20:58:52 2019 by twsvcscm from /usr/local/fbprojects/packages/testinfra.testpilot/690/t.par
Discovering tests
Running 1 tests
Started new test run: https://our.intern.facebook.com/intern/testinfra/testrun/2251799823877227
✓ caffe2/test:quantization - test_quantized_rnn (test_quantization.PostTrainingDynamicQuantTest) 1.048 1/1 (passed)
Test output:
> test_quantized_rnn (test_quantization.PostTrainingDynamicQuantTest) ... ok
>
> ----------------------------------------------------------------------
> Ran 1 test in 1.049s
>
> OK
Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/2251799823877227
Summary (total time 5.53s):
PASS: 1
FAIL: 0
SKIP: 0
FATAL: 0
TIMEOUT: 0
OMIT: 0
```
Differential Revision: D16955662
fbshipit-source-id: 61cf1a74913105fa02e44b3941813eabac0006b5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24418
Fixes #24394
The observer is not added correctly, because one of the conditions is not met.
Test Plan: Imported from OSS
Differential Revision: D16833951
Pulled By: zafartahirov
fbshipit-source-id: bb4699e6a1cf6368c7278272a68e5e7c6d3f59a8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23753
Add intrinsic (fused) module mappings in quantize.py to enable mapping fused modules
in both QAT and PTQ.
Differential Revision: D16820749
fbshipit-source-id: 07de76a4f09b44bde8b193c103eac02c22b875b6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24232
As suggested in https://github.com/pytorch/pytorch/pull/23128#discussion_r306650311, we will use `torch.nn.Linear` as the key of default_qconfig_dict. That is, dynamic quantization is applied to `torch.nn.Linear` by default if the user just specifies `torch.quantize_dynamic(model)`.
ghstack-source-id: 88287089
Differential Revision: D16781191
fbshipit-source-id: 991a5e151a9ea32b879d6897cd9862855d747135
Summary:
We want to use the Module type as the key of the qconfig_dict for module replacement during quantization.
Before this Diff, to dynamically quantize the BERT model, we had to specify each layer:
```
qconfig_dict = {
    'encoder.layer.0.attention.self.query': default_qconfig,
    'encoder.layer.0.attention.self.key': default_qconfig,
    'encoder.layer.0.attention.self.value': default_qconfig,
    'encoder.layer.0.attention.output.dense': default_qconfig,
    'encoder.layer.0.intermediate.dense': default_qconfig,
    'encoder.layer.0.output.dense': default_qconfig,
    'encoder.layer.1.attention.self.query': default_qconfig,
    'encoder.layer.1.attention.self.key': default_qconfig,
    'encoder.layer.1.attention.self.value': default_qconfig,
    'encoder.layer.1.attention.output.dense': default_qconfig,
    'encoder.layer.1.intermediate.dense': default_qconfig,
    'encoder.layer.1.output.dense': default_qconfig,
    ...
}
```
After this Diff, we only need the following:
```
qconfig_dict = {
    torch.nn.Linear: default_qconfig
}
```
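A rough sketch of how a type-keyed lookup can replace the per-layer entries, in plain Python. ToyLinear, ToyDropout, assign_qconfig, and the dropout module name are hypothetical, and letting a name entry take precedence over a type entry is an assumption of this sketch, not something stated in the Diff.

```python
class ToyLinear:
    """Hypothetical stand-in for torch.nn.Linear."""


class ToyDropout:
    """Hypothetical stand-in for a module type with no qconfig entry."""


named_modules = {
    "encoder.layer.0.attention.self.query": ToyLinear(),
    "encoder.layer.0.attention.dropout": ToyDropout(),
    "encoder.layer.1.output.dense": ToyLinear(),
}


def assign_qconfig(named_modules, qconfig_dict):
    assigned = {}
    for name, module in named_modules.items():
        if name in qconfig_dict:
            # Assumption: an explicit per-name entry wins if present.
            assigned[name] = qconfig_dict[name]
        elif type(module) in qconfig_dict:
            # One type-keyed entry covers every module of that type.
            assigned[name] = qconfig_dict[type(module)]
    return assigned


assigned = assign_qconfig(named_modules, {ToyLinear: "default_qconfig"})
```

With a single ToyLinear entry, both linear layers receive the qconfig and the dropout module is left alone.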
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23212
ghstack-source-id: 88287091
Reviewed By: zafartahirov
Differential Revision: D16436542
fbshipit-source-id: 11fbe68ee460560c1a7cdded63581eb7a00e5a89
Summary:
- ~~Add a unit test for the Dynamic Quantized Linear operator (```torch.fbgemm_linear_quantize_weight```, ```torch.fbgemm_pack_quantized_matrix```, and ```torch.fbgemm_linear_int8_weight```) in ```test_quantized.py```.~~ Move this to D16404027 for a separate review.
- Add the Dynamic Quantized Linear module in ```torch/nn/quantized/modules/linear.py```. ~~This is in a rudimentary stage. Will add more functions later~~.
- Add the torch.quantize logic (prepare, eval, convert) for dynamic quantization.
- Add a unit test for the Dynamic Quantized Linear module in ```test_nn_quantized.py```.
- Add a unit test for the Model-level Quantization API
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23128
ghstack-source-id: 88257232
Differential Revision: D16258664
fbshipit-source-id: 4be3ac39ee27c088b341c741d3f09f51d5a23ef0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23635
It appears it is the same complexity to add new modules using a base class and using a generation script.
Test Plan: Imported from OSS
Differential Revision: D16593364
Pulled By: zafartahirov
fbshipit-source-id: 852dcf41f3dfa2a89152042b8e61d0b6defa8feb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23465
We decided not to allow users to use qconfig_dict to do quantization,
since that API is not robust.
Differential Revision: D16611504
fbshipit-source-id: b0d1d311b32c990a165c480f50e9ce3d68b785b5
Summary:
Add support for quantization aware training in eager mode
Modifications to Post training flow:
## Prepare
* Fusion: e.g. (Conv, Bn) → ConvBn (float)
* Swapping: To insert fake_quant on the weight, we need to swap the float modules that have weights with the corresponding qat modules, e.g. Conv → torch.nn.qat.Conv, ConvBn → torch.nn._intrinsic.qat.ConvBn
* Previously we were thinking about modifying the weight in forward_pre_hook and changing it back in forward_hook:
```
def forward_pre_hook(self, input):
    # Stash the float weight and run forward with the fake-quantized one.
    self.float_weight = self.weight
    self.weight = self.fake_quantize(self.float_weight)

def forward_hook(self, input, output):
    # Restore the float weight after forward.
    self.weight = self.float_weight
```
* Assignments to self.weight are needed because we can’t change the forward function, and the forward function uses self.weight.
* But we would need to keep two copies of the weight in this case, so it’s probably better to just swap the module.
* So we want to just swap Conv to torch.nn.qat.Conv and Linear to torch.nn.qat.Linear.
* qat modules will have fake_quant for output and weights inserted in the forward function.
## Convert
* flow should be identical to ptq, but the swapping dictionary is slightly different since modules are changed in prepare step.
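The swapping step in Prepare can be sketched roughly as follows, in plain Python with lists in place of tensors. ToyConv, ToyQATConv, fake_quantize, the fixed scale, and swap_modules are all hypothetical and do not reflect the actual torch.nn.qat implementation.

```python
def fake_quantize(values, scale=0.5):
    # Snap each value to the nearest multiple of scale; the result stays float.
    return [round(v / scale) * scale for v in values]


class ToyConv:
    """Hypothetical float module with a weight."""

    def __init__(self, weight):
        self.weight = weight

    def forward(self, x):
        return [w * x for w in self.weight]


class ToyQATConv(ToyConv):
    """Hypothetical qat module: fake_quant on weight and output in forward."""

    @classmethod
    def from_float(cls, mod):
        return cls(mod.weight)  # a single float copy of the weight is kept

    def forward(self, x):
        weight = fake_quantize(self.weight)
        return fake_quantize([w * x for w in weight])


SWAP_MAPPING = {ToyConv: ToyQATConv}


def swap_modules(modules, mapping):
    # Replace each module whose exact type appears in the mapping.
    return {
        name: mapping[type(m)].from_float(m) if type(m) in mapping else m
        for name, m in modules.items()
    }


model = {"conv": ToyConv([0.26, 0.74])}
qat_model = swap_modules(model, SWAP_MAPPING)
result = qat_model["conv"].forward(1.0)
```

Because fake quantization happens inside forward, the module stores only the float weight, which avoids the two-copy problem of the hook-based idea.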
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23082
ghstack-source-id: 86824650
Differential Revision: D16379374
fbshipit-source-id: 7d16d1acd87025065a24942ff92abf18e9fc8070
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22732
Add support for quantization aware training in eager mode
Modifications to Post training flow:
## Prepare
* Fusion: e.g. (Conv, Bn) → ConvBn (float)
* Swapping: To insert fake_quant on the weight, we need to swap the float modules that have weights with the corresponding qat modules, e.g. Conv → torch.nn.qat.Conv, ConvBn → torch.nn._intrinsic.qat.ConvBn
* Previously we were thinking about modifying the weight in forward_pre_hook and changing it back in forward_hook:
```
def forward_pre_hook(self, input):
    # Stash the float weight and run forward with the fake-quantized one.
    self.float_weight = self.weight
    self.weight = self.fake_quantize(self.float_weight)

def forward_hook(self, input, output):
    # Restore the float weight after forward.
    self.weight = self.float_weight
```
* Assignments to self.weight are needed because we can’t change the forward function, and the forward function uses self.weight.
* But we would need to keep two copies of the weight in this case, so it’s probably better to just swap the module.
* So we want to just swap Conv to torch.nn.qat.Conv and Linear to torch.nn.qat.Linear.
* qat modules will have fake_quant for output and weights inserted in the forward function.
## Convert
* flow should be identical to ptq, but the swapping dictionary is slightly different since modules are changed in prepare step.
Reviewed By: zafartahirov
Differential Revision: D16199356
fbshipit-source-id: 62aeaf47c12c62a87d9cac208f25f7592e245d6c