Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46657
This is used to simulate fake quantize operation for ops with fixed quantization parameters
e.g. hardsigmoid
Test Plan:
Imported from OSS
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D24451406
fbshipit-source-id: 26cc140c00f12bdec9a8f9dc880f4c425f4d4074
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45538
This is used to simulate fake quantize operation for ops with fixed quantization parameters
e.g. hardsigmoid
Test Plan: Imported from OSS
Reviewed By: vkuzo
Differential Revision: D24004795
fbshipit-source-id: fc4797f80842daacd3b3584c5b72035774634edd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45712
Eager mode will still be able to use functional leaky relu, but it will be less accurate than
LeakyReLU module.
FX graph mode will support both leaky relu functional and module
Test Plan: Imported from OSS
Reviewed By: z-a-f
Differential Revision: D24069961
fbshipit-source-id: 8d91c3c50c0bcd068ba3072378ebb4da9549be3b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43587
Add tests for graph mode quantization on torchvision and make sure it matches
current eager mode quantization
Test Plan:
Imported from OSS
Imported from OSS
Reviewed By: z-a-f
Differential Revision: D23331253
fbshipit-source-id: 0445a44145d99837a2c975684cd0a0b7d965c8f9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43526
Add tests for graph mode quantization on torchvision and make sure it matches
current eager mode quantization
Test Plan: Imported from OSS
Reviewed By: vkuzo
Differential Revision: D23306683
fbshipit-source-id: 30d27e225d4557bfc1d9aa462086e416aa9a9c0e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43175
This PR added graph mode quantization on fx: https://github.com/pytorch/pytorch/pull/42741
Currently it matches eager mode quantization for torchvision with static/dynamic/qat
ddp/synbn test is still wip
Test Plan:
python test/test_quantization.py TestQuantizeFx
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D23178602
fbshipit-source-id: 8e7e0322846fbda2cfa79ad188abd7235326f879
Summary:
This diff adds FakeQuantizeWithBackward. This works the same way as the regular FakeQuantize module, allowing QAT to occur in the forward pass, except it has an additional quantize_backward parameter. When quantize_backward is enabled, the gradients are fake quantized as well (dynamically, using hard-coded values). This allows the user to see whether there would be a significant loss of accuracy if the gradients were quantized in their model.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40532
Test Plan: The relevant test for this can be run using `python test/test_quantization.py TestQATBackward.test_forward_and_backward`
Reviewed By: supriyar
Differential Revision: D22217029
Pulled By: durumu
fbshipit-source-id: 7055a2cdafcf022f1ea11c3442721ae146d2b3f2
Summary:
The goal is to implement cross layer equalization as described in section 4.1 in this paper: https://arxiv.org/pdf/1906.04721.pdf
Given two adjacent submodules in a trained model, A,B quantization might hurt one of the submodules more than the other. The paper poses the idea that a loss in accuracy from quantizing can be due to a difference in the channel ranges between the two submodules (the output channel range of A can be small, while the input channel range of B can be large). To minimize this source of error, we want to scale the tensors of A,B s.t. their channel ranges are equal (them being equal means no difference in ranges and minimizes this source of error).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41685
Test Plan: Imported from OSS
Reviewed By: z-a-f
Differential Revision: D22630219
Pulled By: edmundw314
fbshipit-source-id: ccc91ba12c10b652d7275222da8b85455b8a7cd5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40076
Pull Request resolved: https://github.com/pytorch/glow/pull/4606
[PyPer][quant] Add quantized embedding operators to OSS.
This is the first step in supporting Graph Mode Quantization for EmbeddingBag.
At a high level, the next steps would be
a) Implementation of Embedding prepack/unpack operators,
b) Implementation of torch.nn.quantized.dynamic.EmbeddingBag Module,
c) Implementation of torch.nn.quantized.EmbeddingBag Module,
d) Implementation (modification) of IR passes to support graph quantization of EmbeddingBag module.
More in-depth details regarding each step will be in the follow up diffs. Consider this as an initial diff that moves operators to respective places that's required for us to proceed.
Test Plan: ```buck test mode/no-gpu caffe2/test:quantization -- --stress-runs 100 test_embedding_bag```
Reviewed By: supriyar
Differential Revision: D21949828
fbshipit-source-id: cad5ed0a855db7583bddb1d93e2da398c128024a
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/39677
Test Plan:
Moved a test class suite between files, wanted to have same functionality (simple code refactor) so tested to make sure the test output was the same before/after the refactor.
Image below shows the output of TestGraphModePostTrainingStatic before refactor
{F239676498}
This image shows the output of TestQuantizeScript (renamed version that is in test_quantize_script.py instead of test_quantize.py)
{F239676509}
Differential Revision: D21940638
Pulled By: edmundw314
fbshipit-source-id: 54160a5151aadf3a34bdac2bcaeb52904e6653ed
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40144
as title, split remaining quantization tests out of test_jit to reduce
the size of test_jit
Test Plan: Imported from OSS
Differential Revision: D22085034
Pulled By: wanchaol
fbshipit-source-id: 0c8639da01ffc3e6a72e6f470837786c73a6b3f0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40127
Reland PR.
Similar to static quant, break it up into op level tests and tests for jit passes
Test Plan:
python test/test_quantization.py TestQuantizeScriptPTDQOps
python test/test_quantization.py TestDynamicQuantizeScriptJitPasses
Imported from OSS
Differential Revision: D22081259
fbshipit-source-id: cef8f78f89ef8789683b52508379ae1b9ad00700
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40039
Similar to static quant, break it up into op level tests and tests for jit passes
Test Plan:
python test/test_quantization.py TestQuantizeScriptPTDQOps
python test/test_quantization.py TestDynamicQuantizeScriptJitPasses
Imported from OSS
Differential Revision: D22071278
fbshipit-source-id: 54292addcfbc00f7af960fb333921db2ff9fda04
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37032
DataParallel requires all params and buffers of child modules to be updated
in place because of how it implements model replication during the
forward pass (see https://github.com/pytorch/pytorch/pull/12671 for
context). Any params or buffers not updated in place are lost and not
propagated back to the master.
This diff updates (some quantized modules) (TBD: all quantized modules? determine a good cut
point) to do their parameter update in-place. This will enable static
quant and QAT to work correctly with DataParallel.
TODO: https://github.com/pytorch/pytorch/pull/32684 needs to land before we can fix the graph mode test failures on this PR.
Test Plan:
script failed before and passes after the diff:
https://gist.github.com/vkuzo/78b06c01f23f98ee2aaaeb37e55f8d40
TODO before land: add integration testing
Imported from OSS
Differential Revision: D21206454
fbshipit-source-id: df6b4b04d0ae0f7ef582c82d81418163019e96f7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37366
- we can put both fake quant module and observer module tests in the test_workflow_module.py
- added test_quantized_functional.py
- moved tests in test_numerics.py to test_quantize.py and removed test_numerics.py
Test Plan:
python test/test_quantization.py
Imported from OSS
Differential Revision: D21282198
fbshipit-source-id: 60107cee7d1ed2cd14a45650e91ec28b8a262c52
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35586
This pass fuses the choose_qparams-quant-dequant sequence
Fusion for weight tensor is the same as static quant.
Test Plan:
python test/test_quantize_script.py
Imported from OSS
Differential Revision: D20755680
fbshipit-source-id: b7443770642b6e6fa0fa9da8a44637e9b2d4df70
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35455
In graph mode we need to observer the activation tensor for dynamic quantization. This observer should behave the same way as the quantization functions called in the dynamic operator.
Currently for qlinear_dynamic we call quant_utils::ChooseQuantizationParams which has its own logic for calculating scale and zero_point.
We mimic those calculations in the new observer.
Test Plan:
python test/test_quantization.py ObserverTest
Imported from OSS
Differential Revision: D20664586
fbshipit-source-id: e987ea71fff777c21e00c498504e6586e92568a2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35265
In graph mode we need to observer the activation tensor for dynamic quantization. This observer should behave the same way as the quantization functions called in the dynamic operator.
Currently for qlinear_dynamic we call quant_utils::ChooseQuantizationParams which has its own logic for calculating scale and zero_point.
We mimic those calculations in the new observer.
Test Plan:
python test/test_quantization.py ObserverTest
Imported from OSS
Differential Revision: D20630988
fbshipit-source-id: 7e7aca77590f965dcb423a705e68d030aaf98550
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33927
Test Plan:
test will be added in later PRs
Imported from OSS
Differential Revision: D20354879
fbshipit-source-id: 03976f4b86c46dbdc4e45764a1e72f1a3855a404
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33852
This fixes an issue for QAT models. During eval if we call `prepare_qat` and `convert` before calling `load_state_dict` it throws an error because the weight info (num channels) is not updated in the observer module.
It is not an issue for per-tensor case
Fixes issue #33830
Test Plan:
python test/test_quantization.py EagerModePostTrainingQuantTest.test_eval_after_train
python test/test_quantization.py EagerModeQuantizationAwareTrainingTest.test_eval_after_train
Imported from OSS
Differential Revision: D20212996
fbshipit-source-id: a04af8fe4df2e555270ae4d6693f5777d86f8a46
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32757
This PR updates the main quantize_dynamic API to use QNNPACK backend for mobile
Test Plan:
python test/test_quantization.py PostTrainingDynamicQuantTest.test_quantized_rnn
Imported from OSS
Differential Revision: D19632220
fbshipit-source-id: b4c51485c281d088524101b97c84dd806438b597
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30445
Create distributed and rpc directories under caffe/test for better management
of unit tests.
Differential Revision: D18702786
fbshipit-source-id: e9daeed0cfb846ef68806f6decfcb57c0e0e3606
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31970
Now that the ClassType can be shared among different module instances, we'll
preserve the sharing in clone as well, that is if the original module has
a ClassType that is shared, we'll clone this ClassType once and share it between
different module instances as well.
Test Plan:
build/test/test_jit
Imported from OSS
Differential Revision: D19406251
fbshipit-source-id: 2881c695f6e718e5432040a3817cf187a62017bf
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30892
Fixes all outstanding lints and actually installs a properly configured
flake8
Test Plan: Imported from OSS
Differential Revision: D18862825
Pulled By: suo
fbshipit-source-id: 08e9083338a7309272e17bb803feaa42e348aa85
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30890
We've received way too many complaints about this functionality making tests flaky, and it's not providing value to us anyway. Let's cut the shit and kill deadline testing
Test Plan: Imported from OSS
Differential Revision: D18857597
Pulled By: jamesr66a
fbshipit-source-id: 67e3412795ef2fb7b7ee896169651084e434d2f6
Summary:
In the PR, we enhance the graph-mode quantization for aten::_convolution, which could be generated from tracing path.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30245
Differential Revision: D18671597
Pulled By: lly-zero-one
fbshipit-source-id: 78a2470fbb0fe0def55d63c6bda7cbb5c89f7848
Summary:
The PR tried to enable the per-channel(row-wise) dynamic quantization for linear operator. Given we have seen some accuracy drop due to the per-tensor quantization, we expect the per-channel could help improve the accuracy.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30122
Differential Revision: D18630541
Pulled By: lly-zero-one
fbshipit-source-id: d52685deec5e7de46cd686ae649a8c8765b9cacf
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29331Closes#27954
This fixes the hard-coding of packed parameter values for the dynamic quantized LSTM by orchestrating the following dance:
1) Each variadic parameter on the module has its own Module. That Module defines the `__getstate__` and __setstate__` method s.t. packed weights are properly re-done on model load.
2) Each of these modules is wrapped into a `torch.nn.ModuleList`, s.t. the parameters appear as attributes in the hierarchy. Then, `gatherParametersAndBuffers` (9c43b16df9/torch/csrc/jit/tracer.cpp (L285)) can see these parameters and create a `Value*` for them in the traced graph.
3) In forward, we need to convert from ModuleList -> Module -> Parameter to a simple TensorList of the parameters. We just use a loop here. In tracing, we simply record a `ListConstruct` with each of the proper parameter values. In scripting, the `ModuleList` is const, so it can be unrolled into the graph and a subsequent `ListConstruct` does its business.
The `forward` of the traced LSTM before and after this change are as follows:
Before
```
def forward(self,
input: Tensor,
argument_2: Tuple[Tensor, Tensor]) -> Tuple[Tensor, Tuple[Tensor, Tensor]]:
hx, hx0, = argument_2
_0, _1, _2 = torch.quantized_lstm(input, [hx, hx0], [CONSTANTS.c0, CONSTANTS.c1], True, 1, 0., True, False, False, dtype=12, use_dynamic=True)
return (_0, (_1, _2))
```
After
```
def forward(self,
input: Tensor,
argument_2: Tuple[Tensor, Tensor]) -> Tuple[Tensor, Tuple[Tensor, Tensor]]:
_0 = self.cell._all_weight_values
_1 = getattr(_0, "0").param
_2 = getattr(_0, "1").param
hx, hx0, = argument_2
_3, _4, _5 = torch.quantized_lstm(input, [hx, hx0], [_1, _2], True, 1, 0., True, False, False, dtype=12, use_dynamic=True)
return (_3, (_4, _5))
```
Test Plan: Imported from OSS
Differential Revision: D18374904
Pulled By: jamesr66a
fbshipit-source-id: f1a9b58998bc365b9baad38c21fd4bb510dd639c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29331Closes#27954
This fixes the hard-coding of packed parameter values for the dynamic quantized LSTM by orchestrating the following dance:
1) Each variadic parameter on the module has its own Module. That Module defines the `__getstate__` and __setstate__` method s.t. packed weights are properly re-done on model load.
2) Each of these modules is wrapped into a `torch.nn.ModuleList`, s.t. the parameters appear as attributes in the hierarchy. Then, `gatherParametersAndBuffers` (9c43b16df9/torch/csrc/jit/tracer.cpp (L285)) can see these parameters and create a `Value*` for them in the traced graph.
3) In forward, we need to convert from ModuleList -> Module -> Parameter to a simple TensorList of the parameters. We just use a loop here. In tracing, we simply record a `ListConstruct` with each of the proper parameter values. In scripting, the `ModuleList` is const, so it can be unrolled into the graph and a subsequent `ListConstruct` does its business.
The `forward` of the traced LSTM before and after this change are as follows:
Before
```
def forward(self,
input: Tensor,
argument_2: Tuple[Tensor, Tensor]) -> Tuple[Tensor, Tuple[Tensor, Tensor]]:
hx, hx0, = argument_2
_0, _1, _2 = torch.quantized_lstm(input, [hx, hx0], [CONSTANTS.c0, CONSTANTS.c1], True, 1, 0., True, False, False, dtype=12, use_dynamic=True)
return (_0, (_1, _2))
```
After
```
def forward(self,
input: Tensor,
argument_2: Tuple[Tensor, Tensor]) -> Tuple[Tensor, Tuple[Tensor, Tensor]]:
_0 = self.cell._all_weight_values
_1 = getattr(_0, "0").param
_2 = getattr(_0, "1").param
hx, hx0, = argument_2
_3, _4, _5 = torch.quantized_lstm(input, [hx, hx0], [_1, _2], True, 1, 0., True, False, False, dtype=12, use_dynamic=True)
return (_3, (_4, _5))
```
Test Plan: Imported from OSS
Differential Revision: D18359880
Pulled By: jamesr66a
fbshipit-source-id: 0ff2cad294a1871123015dfc704eaf73a7ac1d9e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27347
it's already done in the op, we don't need to permute again
Test Plan:
test_jit.py
we'll test in e2e tests
Imported from OSS
Differential Revision: D18182919
fbshipit-source-id: 04dd2a19a719828fbc7b62e451b81752187e0fcb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27269
Remove `test_quantizer.py`, add and rewrite one of the tests in `test_quantizer`
in `test_quantization.py`
The conv test is removed for now since conv pattern is still broken, we'll add another test
later
ghstack-source-id: 92869823
Test Plan:
python test/test_quantization.py
Imported from OSS
Differential Revision: D18182916
fbshipit-source-id: 325b5d8e877228d6a513e3ddf52c974479250d42
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27396
Observer that estimates moving averages of min and max values per batch, more suited for quantization aware training instead of minmax observers that track extremal values across batches
ghstack-source-id: 91369018
Test Plan:
buck test caffe2/test:quantization -- 'test_per_tensor_observers \(test_quantization\.ObserverTest\)' --print-passing-details
buck test caffe2/test:quantization -- 'test_per_channel_observers \(test_quantization\.ObserverTest\)' --print-passing-details
Differential Revision: D17727213
fbshipit-source-id: 024a890bf3dd0bf269d8bfe61f19871d027326f0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27151
We need to be ab le to handle observers with no min/max data correctly as models sometimes have modules that do not get any data.
ghstack-source-id: 91113403
Test Plan:
buck test caffe2/test:quantization -- test_minmax_observer
buck test caffe2/test:quantization -- test_per_channel_minmax_observer
buck test caffe2/test:quantization --test_histogram_observer
Reviewed By: csummersea
Differential Revision: D17690828
fbshipit-source-id: e95709333ea0f66d79ddb8141b7cba5a83347dbd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26457
Enhancement to fuse module to support sequentials, fuse list can now be just like the state dict.
Also add support for Conv-Relu and linear-relu fusion
Also support inplace and out of place fusion of models.
ghstack-source-id: 91076386
Test Plan:
buck test caffe2/test:quantization -- 'test_fusion_sequential_model_train \(test_quantization\.FusionTest\)' --print-passing-details
buck test caffe2/test:quantization -- 'test_fusion_sequential_model_eval \(test_quantization\.FusionTest\)' --print-passing-details
Differential Revision: D17466382
fbshipit-source-id: 0a548f8f4c366f3ecc59db693bac725ccd62328e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26612
Add support for add relu functional module, this allows for fusion of add and relu quantized operations
ghstack-source-id: 91055976
Test Plan: buck test caffe2/test:quantization -- 'test_functional_module \(test_quantization\.FunctionalModuleTest\)' --print-passing-details
Differential Revision: D17518268
fbshipit-source-id: e1e8b4655d6b32405863ab9d1c7da111fb4343cc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26516
ghstack-source-id: 90982010
Test Plan:
Integrate per-channel support into conv and linear modules.
The following tests pass:
buck test caffe2/test:quantized -- 'test_linear_api \(test_quantized_nn_mods\.ModuleAPITest\)' --print-passing-details
buck test caffe2/test:quantized -- 'test_conv_api \(test_quantized_nn_mods\.ModuleAPITest\)' --print-passing-details
buck test caffe2/test:quantized -- 'test_float_quant_compare_per_channel \(test_quantized_models\.ModelNumerics\)' --print-passing-details
Differential Revision: D17342622
fbshipit-source-id: f0d618928e3d9348672c589a6b7a47049c372a2e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26390
`quantize_script`: top level API for graph mode quantization
Test Plan:
there are some known issues, we can enable test after all known issues are fixed.
Imported from OSS
Differential Revision: D17645132
fbshipit-source-id: 61f261d5607409d493b39a2f4e05ebd017279f6b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26782
At least we should be consistent on top-level APIs and prepare/convert/etc.
Logic is inplace=False by default but top-level APIs take care of doing fewer copies.
Also renames always-inplace methods like add_observer to have underscore in the end.
One fix for MinMaxObserver was triggered by deepcopy surfacing that we were accidentally keeping autograd around
Test Plan: Imported from OSS
Differential Revision: D17595956
Pulled By: dzhulgakov
fbshipit-source-id: 801f9f5536b553f24c7a660064dd6fce685edd65
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26709
Polishes implementation from #25975. Primarily, we use NoopObserver to communicate that weights need to be quantized to float16. The very top-level API (quantize_dynamic) stays the same with `dtype` argument but the implementation follows the common flow.
One can argue that dynamic fp16 quantization doesn't really fit into the 'observer' mechanism. It's in fact not ideal, but it's better to have the same flow than branching on both dtype and qconfig.
Test Plan: Imported from OSS
Differential Revision: D17544103
Pulled By: dzhulgakov
fbshipit-source-id: 6af3f18c35929a1a53ea734079c005f656e4925f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26492
Previous definition of observers was quite clumsy - with things like `default_observer()()`. This PR strips a way a lot of craft and allows to pass just class names directly. In order to override default arguments either `functools.partial` can be used or convenient wrapper `MyObserver.with_args(x=1)` is provided.
Also rename `QConfig_dynamic` to `QConfigDynamic` because it violates the naming convention.
Test Plan: Imported from OSS
Differential Revision: D17521265
Pulled By: dzhulgakov
fbshipit-source-id: ba9df19b368641acf4093c43df9990796284fd9e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26574
Since we also have `quantized::linear`, `quantize_linear` sounds
confusing, so we plan to rename it before the branch cut
Test Plan:
ci
Imported from OSS
Differential Revision: D17514876
fbshipit-source-id: 01d9005e6ec8cb9950b9d8bba122109c389641d3
Summary:
Mainly want to resolve comments from https://github.com/pytorch/pytorch/pull/25830.
Overall, we want to provide a recording observer for recording the runtime tensor values of activation path in order to debug the numerical accuracy loss offline.
According to the feedback from https://github.com/pytorch/pytorch/issues/25830, it might be better to record all the observers in a dict and query the dict to get corresponding tensor values. hx89 is working on how to insert the recording observers into model under debug.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26413
Differential Revision: D17506502
Pulled By: llyfacebook
fbshipit-source-id: 3ab90dc78920e7ec3fa572c2a07327a9991c530a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25975
We would like to add the FP16 weight support for the dynamic quantized LSTM.
Test Plan:
buck test mode/dev caffe2/test:quantization -- 'test_quantized_rnn \(test_quantization\.PostTrainingDynamicQuantTest\)' --print-passing-details
```
[jianyuhuang@devvm794.ftw3.facebook.com: ~/fbsource/fbcode/caffe2/test] $ buck test mode/dev caffe2/test:quantization
-- 'test_quantized_rnn \(test_quantization\.PostTrainingDynamicQuantTest\)' --print-passing-details
Building: finished in 13.4 sec (100%) 8134/8134 jobs, 81 updated
Total time: 13.9 sec
Trace available for this run at /tmp/testpilot.20190910-210241.2092790.log
TestPilot test runner for Facebook. See https://fburl.com/testpilot for details.
Testpilot build revision c86e65add357582accb6ec0be23b92c8a2c510bd fbpkg ca46e8f5b26c451a8b0b2462c11bb61d at Mon Sep 9
22:16:37 2019 by twsvcscm from /usr/local/fbprojects/packages/testinfra.testpilot/696/t.par
Discovering tests
Running 1 tests
Started new test run: https://our.intern.facebook.com/intern/testinfra/testrun/1125900050322971
✓ caffe2/test:quantization - test_quantized_rnn (test_quantization.PostTrainingDynamicQuantTest) 0.183 1/1 (passed)
Test output:
> test_quantized_rnn (test_quantization.PostTrainingDynamicQuantTest) ... ok
>
> ----------------------------------------------------------------------
> Ran 1 test in 0.184s
>
> OK
Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/1125900050322971
Summary (total time 4.35s):
PASS: 1
FAIL: 0
SKIP: 0
FATAL: 0
TIMEOUT: 0
OMIT: 0
```
Differential Revision: D17299116
fbshipit-source-id: 7fe91ece25867f2c0496f1b63fb1041e6b815166
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25887
ghstack-source-id: 90383258
Add per channel observer to compute the qparams for each channel.
Test Plan:
buck test mode/dev caffe2/test:quantization -- 'test_per_channel_minmax_observer'
buck test mode/dev caffe2/test:quantization -- 'test_per_channel_minmax_observer_scriptable'
Differential Revision: D17137226
fbshipit-source-id: 0b1c93e3cbcda86f5c4e30f7cd94c670f2665063
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24022
In histogram observer add an approximation for L2 error minimization for selecting min/max.
By selecting new min/max, we filter out outliers in input distribution.
This follows the implementation of NormMinimization::NonlinearQuantizationParamsSearch in caffe2/quantization/server/norm_minimization.cc
ghstack-source-id: 90298789
Test Plan: buck test mode/dev caffe2/test:quantization -- 'test_histogram_observer'
Differential Revision: D16713239
fbshipit-source-id: 82631ba47974e25689c9c66bc3088117090e26d4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23959
Add histogram observer that records the running histogram of tensor values along with min/max values.
ghstack-source-id: 90076996
Test Plan:
Added a test test_histogram_observer
buck test mode/dev caffe2/test:quantization -- 'test_histogram_observer'
buck test mode/dev caffe2/test:quantization -- 'test_observer_scriptable'
Differential Revision: D16692835
fbshipit-source-id: 0f047d3349cb9770fad4a2b6cb346c51d9e99cd4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25976
As recommended in https://github.com/pytorch/pytorch/pull/25877/files#r322956051:
> We should move more of these toward using BytesIO. Using files in tests is generally considered bad practice because it introduces syscalls and dependencies on the execution environment, and thus can cause test flakiness/instability.
ghstack-source-id: 89929947
Test Plan: CI
Differential Revision: D17310441
fbshipit-source-id: ba97cce4224225df45ff44062f1bc8ebefb25922
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25667
Relax scale and zero-point for activations to ensure that fbgemm implementations of conv and linear do not saturate due to 16 bit intermediate accumulation.
Add test to verify precision of numerics of quantized model with updated observer. This test catches errors in
handling layouts for quantized ops in addition to saturation/quantization errors.
ghstack-source-id: 89587942
Test Plan:
buck test caffe2/test:quantized -- 'test_float_quant_compare \(test_quantized_models\.ModelNumerics\)' --print-passing-details
Passes when SQNR > 35 dB
buck test caffe2/test:quantization -- 'test_minmax_observer \(test_quantization\.ObserverTest\)' --print-passing-details
Passes with additional coverage for observer changes
Differential Revision: D17140498
fbshipit-source-id: 42c58e726bb0b0f51890590ee2525428f9a8d24e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25157
Add the dynamic quantized LSTM module.
TODO (separate PRs):
- Serialization.
- Bias can be Null.
ghstack-source-id: 89443731
Test Plan:
buck test mode/dev caffe2/test:quantization -- 'test_quantized_rnn \(test_quantization\.PostTrainingDynamicQuantTest\)' --print-passing-details
```
[jianyuhuang@devvm2816.prn3.facebook.com: ~/fbsource/fbcode/caffe2/test] $ buck test mode/dev caffe2/test:quantization -- 'test_quantized_rnn \(test_q
uantization\.PostTrainingDynamicQuantTest\)' --print-passing-details
Action graph will be rebuilt because files have been added or removed.
Parsing buck files: finished in 1.4 sec
Building: finished in 4.0 sec (100%) 8122/8122 jobs, 2 updated
Total time: 5.5 sec
Trace available for this run at /tmp/testpilot.20190902-164918.1275502.log
TestPilot test runner for Facebook. See https://fburl.com/testpilot for details.
Testpilot build revision b61bc0e3b71033578eddfe0a28b0739bc685663f fbpkg 3b1c1aed1c534c0cb161a981eca6e2f0 at Sun Sep 1 20:58:52 2019 by twsvcscm from /usr/local/fbprojects/packages/testinfra.testpilot/690/t.par
Discovering tests
Running 1 tests
Started new test run: https://our.intern.facebook.com/intern/testinfra/testrun/2251799823877227
✓ caffe2/test:quantization - test_quantized_rnn (test_quantization.PostTrainingDynamicQuantTest) 1.048 1/1 (passed)
Test output:
> test_quantized_rnn (test_quantization.PostTrainingDynamicQuantTest) ... ok
>
> ----------------------------------------------------------------------
> Ran 1 test in 1.049s
>
> OK
Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/2251799823877227
Summary (total time 5.53s):
PASS: 1
FAIL: 0
SKIP: 0
FATAL: 0
TIMEOUT: 0
OMIT: 0
```
Differential Revision: D16955662
fbshipit-source-id: 61cf1a74913105fa02e44b3941813eabac0006b5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23753
Add intrinsic(fused) module mappings in quantize.py to enable mapping fused modules
in both QAT and post PTQ
Differential Revision: D16820749
fbshipit-source-id: 07de76a4f09b44bde8b193c103eac02c22b875b6
Summary:
We want to use the Module type as the key for the qconfig_dict for the module replacement during the quantization.
Before this Diff, to dynamic quantize the BERT model, we have to specify each layer:
```
qconfig_dict = {
'encoder.layer.0.attention.self.query': default_qconfig,
'encoder.layer.0.attention.self.key': default_qconfig,
'encoder.layer.0.attention.self.value': default_qconfig,
'encoder.layer.0.attention.output.dense': default_qconfig,
'encoder.layer.0.intermediate.dense': default_qconfig,
'encoder.layer.0.output.dense': default_qconfig,
'encoder.layer.1.attention.self.query': default_qconfig,
'encoder.layer.1.attention.self.key': default_qconfig,
'encoder.layer.1.attention.self.value': default_qconfig,
'encoder.layer.1.attention.output.dense': default_qconfig,
'encoder.layer.1.intermediate.dense': default_qconfig,
'encoder.layer.1.output.dense': default_qconfig,
...
}
```
After this Diff, we only need the following
```
qconfig_dict = {
torch.nn.Linear : default_qconfig
}
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23212
ghstack-source-id: 88287091
Reviewed By: zafartahirov
Differential Revision: D16436542
fbshipit-source-id: 11fbe68ee460560c1a7cdded63581eb7a00e5a89
Summary:
- ~~Add a unit test for the Dynamic Quantized Linear operator (```torch.fbgemm_linear_quantize_weight```, ```torch.fbgemm_pack_quantized_matrix```, and ```torch.fbgemm_linear_int8_weight```) in ```test_quantized.py```.~~ Move this to D16404027 for a separate review.
- Add the Dynamic Quantized Linear module in ```torch/nn/quantized/modules/linear.py```. ~~This is in a rudimentary stage. Will add more functions later~~.
- Add the torch.quantize logic (prepare, eval, convert) for dynamic quantization.
- Add a unit test for the Dynamic Quantized Linear module in ```test_nn_quantized.py```.
- Add a unit test for the Model-level Quantization API
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23128
ghstack-source-id: 88257232
Differential Revision: D16258664
fbshipit-source-id: 4be3ac39ee27c088b341c741d3f09f51d5a23ef0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23635
It appears it is the same complexity to add new modules using a base class and using a generation script.
Test Plan: Imported from OSS
Differential Revision: D16593364
Pulled By: zafartahirov
fbshipit-source-id: 852dcf41f3dfa2a89152042b8e61d0b6defa8feb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23891
This adds an initial set of testing coverage for quantization that checks if the modules can be scripted. Testing for tracing and serialization is forthcoming
Test Plan: Imported from OSS
Differential Revision: D16698045
Pulled By: jamesr66a
fbshipit-source-id: 96d80d938b816220af72359165a7b96d998a30c9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23858
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23718
Changes:
- Enable tests for quantization test files in `run_tests.py`
- Remove `__future__` imports from `torch/nn/qat/modules/__init__.py`, since `unicode_literals` messes up imports on python2 because the elements in `__all__` will be Unicode and not string
- Skip PostTrainingQuantTests if the build doesn't have FBGEMM (only a small subset of targets in tests) or if testing under UBSAN (the suppression file doesn't seem to work)
Test Plan: Imported from OSS
Reviewed By: ZolotukhinM
Differential Revision: D16639467
Pulled By: jamesr66a
fbshipit-source-id: 532766797c216976dd7e07d751f768ff8e0fc207
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23465
We decided not to allow user to use qconfig_dict to do quantization
since that API is not robust.
Differential Revision: D16611504
fbshipit-source-id: b0d1d311b32c990a165c480f50e9ce3d68b785b5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23003
torch.quantization.fuse_module and torch.nn._intrinsic convRelu and LinearRelu
Fusion function to combine specific modules: (conv,bn) and (conv,bn,relu).
In all cases, replace modules in place. The first module is replaced with the _intrinsic fused module and the remaining modules are replaced by nn.Identity.
Support both training and eval. For training, the modules are "fused" with a sequential container. This is to allow for further module swaps for quantization aware training.
Also add: torch.nn._intrinsic for convRelu and LinearRelu.
TODO: Add tests for _intrinsic modules.
Conv BN fusion code is based on DsKhudia's implementation
Differential Revision: D16199720
fbshipit-source-id: 95fb9ffe72b361d280313b2ec57de2acd4f9dda2
Summary:
Add support for quantization aware training in eager mode
Modifications to Post training flow:
## Prepare
* Fusion: e.g. (Conv, Bn) → ConvBn (float)
* Swapping: To insert fake_quant to weight, we need to swap the float modules that has weight with different qat modules, e.g. Conv → torch.nn.qat.Conv , ConvBn → torch.nn._intrinsic.qat.ConvBn
```
* previously we were thinking about modify the weight in forward_pre hook and change it back in forward_hook:
* def forward_pre_hook(self, input):
self.float_weight = self.weight
self.weight = self.fake_quantize(self.float_weight)
def forward_hook(self, input):
self.weight = self.float_weight
```
* Assignments to self.weight are needed because we can’t change forward function and in forward function they are using self.weight.
* But we will need to keep two copies of weight in this case, so it’s probably better to just swap the module
* So we want to just swap Conv to torch.nn.qat.Conv and Linear to torch.nn.qat.Linear
* qat modules will have fake_quant for output and weights inserted in forward function
## Convert
* flow should be identical to ptq, but the swapping dictionary is slightly different since modules are changed in prepare step.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23082
ghstack-source-id: 86824650
Differential Revision: D16379374
fbshipit-source-id: 7d16d1acd87025065a24942ff92abf18e9fc8070
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22732
Add support for quantization aware training in eager mode
Modifications to Post training flow:
## Prepare
* Fusion: e.g. (Conv, Bn) → ConvBn (float)
* Swapping: To insert fake_quant to weight, we need to swap the float modules that has weight with different qat modules, e.g. Conv → torch.nn.qat.Conv , ConvBn → torch.nn._intrinsic.qat.ConvBn
```
* previously we were thinking about modify the weight in forward_pre hook and change it back in forward_hook:
* def forward_pre_hook(self, input):
self.float_weight = self.weight
self.weight = self.fake_quantize(self.float_weight)
def forward_hook(self, input):
self.weight = self.float_weight
```
* Assignments to self.weight are needed because we can’t change forward function and in forward function they are using self.weight.
* But we will need to keep two copies of weight in this case, so it’s probably better to just swap the module
* So we want to just swap Conv to torch.nn.qat.Conv and Linear to torch.nn.qat.Linear
* qat modules will have fake_quant for output and weights inserted in forward function
## Convert
* flow should be identical to ptq, but the swapping dictionary is slightly different since modules are changed in prepare step.
Reviewed By: zafartahirov
Differential Revision: D16199356
fbshipit-source-id: 62aeaf47c12c62a87d9cac208f25f7592e245d6c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22706
Moved the models used for quantization test from the test_quantization.py file to common_quantization.py
Reviewed By: jerryzh168
Differential Revision: D16189865
fbshipit-source-id: 409b43454b6b3fe278ac16b1affb9085d6ed6835
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22694
Move quantization and quantized utility functions for testing to common_quantized.py and common_quantization.py. Addditionally, add a quantized test case base class which contains common methods for checking the results of quantization on modules. As a consequence of the move, fixed the import at the top of test_quantized.py, and test_quantization to use the new utility
Reviewed By: jerryzh168
Differential Revision: D16172012
fbshipit-source-id: 329166af5555fc829f26bf1383d682c25c01a7d9