Summary:
The goal is to implement cross-layer equalization as described in section 4.1 of this paper: https://arxiv.org/pdf/1906.04721.pdf
Given two adjacent submodules A, B in a trained model, quantization might hurt one of the submodules more than the other. The paper poses the idea that a loss in accuracy from quantizing can be due to a mismatch between the channel ranges of the two submodules (the output channel range of A can be small while the corresponding input channel range of B is large). To minimize this source of error, we rescale the tensors of A and B so that their channel ranges are equal: equal ranges mean no mismatch, which minimizes this source of error.
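As a rough illustration, here is a minimal sketch of the rescaling from section 4.1 for a Linear -> ReLU -> Linear pair, assuming the per-channel range is taken as the max-abs of the corresponding weight rows/columns; the function name is hypothetical and this is not the implementation added in this PR:
```
import torch

def cross_layer_equalize(linear_a, linear_b, eps=1e-8):
    # Hypothetical sketch for a Linear -> ReLU -> Linear pair.
    # Range of output channel i of A: max-abs of row i of A's weight.
    # Range of input channel i of B: max-abs of column i of B's weight.
    range_a = linear_a.weight.detach().abs().max(dim=1).values   # [out_A]
    range_b = linear_b.weight.detach().abs().max(dim=0).values   # [in_B] == [out_A]
    # Per-channel scale from the paper: s_i = sqrt(r_A_i * r_B_i) / r_B_i.
    s = (torch.sqrt(range_a * range_b) / (range_b + eps)).clamp(min=eps)
    with torch.no_grad():
        # A_hat = S^{-1} A (and bias), B_hat = B S. Both now have channel
        # range sqrt(r_A_i * r_B_i), i.e. the ranges are equalized.
        linear_a.weight.div_(s.unsqueeze(1))
        if linear_a.bias is not None:
            linear_a.bias.div_(s)
        linear_b.weight.mul_(s.unsqueeze(0))
```
For example, `cross_layer_equalize(model.fc1, model.fc2)` would equalize an `fc1 -> ReLU -> fc2` pair without changing its output, since ReLU commutes with positive per-channel scaling.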
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41685
Test Plan: Imported from OSS
Reviewed By: z-a-f
Differential Revision: D22630219
Pulled By: edmundw314
fbshipit-source-id: ccc91ba12c10b652d7275222da8b85455b8a7cd5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40076
Pull Request resolved: https://github.com/pytorch/glow/pull/4606
[PyPer][quant] Add quantized embedding operators to OSS.
This is the first step in supporting Graph Mode Quantization for EmbeddingBag.
At a high level, the next steps would be
a) Implementation of Embedding prepack/unpack operators,
b) Implementation of torch.nn.quantized.dynamic.EmbeddingBag Module,
c) Implementation of torch.nn.quantized.EmbeddingBag Module,
d) Implementation (modification) of IR passes to support graph quantization of EmbeddingBag module.
More in-depth details regarding each step will be in the follow-up diffs. Consider this an initial diff that moves the operators to the respective places required for us to proceed.
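For context, the 8-bit row-wise scheme commonly used for EmbeddingBag weights can be sketched in plain torch as follows (a hedged illustration of the math only, not the operator implementations moved in this diff):
```
import torch

# Each row of the embedding table gets its own scale derived from its min/max.
weight = torch.randn(10, 16)                       # [num_embeddings, embedding_dim]
row_min = weight.min(dim=1, keepdim=True).values
row_max = weight.max(dim=1, keepdim=True).values
scale = (row_max - row_min) / 255.0
scale = torch.where(scale > 0, scale, torch.ones_like(scale))
q_weight = torch.clamp(torch.round((weight - row_min) / scale), 0, 255).to(torch.uint8)
# A lookup dequantizes rows back as q * scale + row_min before pooling.
dequantized = q_weight.float() * scale + row_min
```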
Test Plan: ```buck test mode/no-gpu caffe2/test:quantization -- --stress-runs 100 test_embedding_bag```
Reviewed By: supriyar
Differential Revision: D21949828
fbshipit-source-id: cad5ed0a855db7583bddb1d93e2da398c128024a
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/39677
Test Plan:
Moved a test class suite between files. Since this is a simple code refactor that should preserve functionality, I verified that the test output was the same before and after the refactor.
Image below shows the output of TestGraphModePostTrainingStatic before refactor
{F239676498}
This image shows the output of TestQuantizeScript (renamed version that is in test_quantize_script.py instead of test_quantize.py)
{F239676509}
Differential Revision: D21940638
Pulled By: edmundw314
fbshipit-source-id: 54160a5151aadf3a34bdac2bcaeb52904e6653ed
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40144
As titled: split the remaining quantization tests out of test_jit to reduce its size.
Test Plan: Imported from OSS
Differential Revision: D22085034
Pulled By: wanchaol
fbshipit-source-id: 0c8639da01ffc3e6a72e6f470837786c73a6b3f0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40127
Reland PR.
Similar to static quant, break dynamic quantization tests up into op-level tests and tests for JIT passes.
Test Plan:
python test/test_quantization.py TestQuantizeScriptPTDQOps
python test/test_quantization.py TestDynamicQuantizeScriptJitPasses
Imported from OSS
Differential Revision: D22081259
fbshipit-source-id: cef8f78f89ef8789683b52508379ae1b9ad00700
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40039
Similar to static quant, break dynamic quantization tests up into op-level tests and tests for JIT passes.
Test Plan:
python test/test_quantization.py TestQuantizeScriptPTDQOps
python test/test_quantization.py TestDynamicQuantizeScriptJitPasses
Imported from OSS
Differential Revision: D22071278
fbshipit-source-id: 54292addcfbc00f7af960fb333921db2ff9fda04
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37032
DataParallel requires all params and buffers of child modules to be updated
in place because of how it implements model replication during the
forward pass (see https://github.com/pytorch/pytorch/pull/12671 for
context). Any params or buffers not updated in place are lost and not
propagated back to the master.
This diff updates some quantized modules (TBD: all quantized modules? determine a good cut point) to do their parameter updates in-place. This will enable static quant and QAT to work correctly with DataParallel.
TODO: https://github.com/pytorch/pytorch/pull/32684 needs to land before we can fix the graph mode test failures on this PR.
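To make the in-place requirement concrete, here is a hedged sketch with an illustrative observer-style module (the class name and buffers are made up, not the modules touched by this diff):
```
import torch

class MinMaxBuffer(torch.nn.Module):
    # Illustrative observer-style module with made-up buffer names.
    def __init__(self):
        super().__init__()
        self.register_buffer('min_val', torch.tensor(float('inf')))
        self.register_buffer('max_val', torch.tensor(float('-inf')))

    def forward(self, x):
        # Reassigning the attribute (e.g. `self.min_val = x.min()`) creates a
        # new tensor on the replica, which is discarded after the forward pass.
        # The in-place update below mutates the existing buffer, which is the
        # pattern this diff switches the quantized modules to.
        self.min_val.copy_(torch.min(self.min_val, x.min()))
        self.max_val.copy_(torch.max(self.max_val, x.max()))
        return x
```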
Test Plan:
script failed before and passes after the diff:
https://gist.github.com/vkuzo/78b06c01f23f98ee2aaaeb37e55f8d40
TODO before land: add integration testing
Imported from OSS
Differential Revision: D21206454
fbshipit-source-id: df6b4b04d0ae0f7ef582c82d81418163019e96f7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37366
- put both fake quant module and observer module tests in test_workflow_module.py
- added test_quantized_functional.py
- moved tests in test_numerics.py to test_quantize.py and removed test_numerics.py
Test Plan:
python test/test_quantization.py
Imported from OSS
Differential Revision: D21282198
fbshipit-source-id: 60107cee7d1ed2cd14a45650e91ec28b8a262c52
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35586
This pass fuses the choose_qparams-quant-dequant sequence
Fusion for the weight tensor is the same as in static quant.
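In eager terms, the activation-side pattern being fused looks roughly like the following (a hedged sketch; the actual pass works on the TorchScript IR and replaces the sequence with a single dynamic quantized op):
```
import torch
import torch.nn.functional as F

def unfused_dynamic_linear(x, weight, bias=None):
    # choose_qparams: pick scale/zero_point from the observed range of x,
    # forcing zero into the range (same math as the dynamic observer below).
    x_min, x_max = min(x.min().item(), 0.0), max(x.max().item(), 0.0)
    scale = (x_max - x_min) / 255.0
    if scale == 0.0:
        scale = 1.0
    zero_point = int(min(max(round(-x_min / scale), 0), 255))
    # quant -> dequant on the activation feeding a float op; the weight side
    # is handled the same way as in static quant.
    x_dq = torch.quantize_per_tensor(x, scale, zero_point, torch.quint8).dequantize()
    return F.linear(x_dq, weight, bias)
```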
Test Plan:
python test/test_quantize_script.py
Imported from OSS
Differential Revision: D20755680
fbshipit-source-id: b7443770642b6e6fa0fa9da8a44637e9b2d4df70
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35455
In graph mode we need to observe the activation tensor for dynamic quantization. This observer should behave the same way as the quantization functions called in the dynamic operator.
Currently for qlinear_dynamic we call quant_utils::ChooseQuantizationParams which has its own logic for calculating scale and zero_point.
We mimic those calculations in the new observer.
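A hedged sketch of what such an observer computes (the class name is illustrative, not the observer added in this diff): it tracks min/max with zero forced into the range and derives scale/zero_point for quint8 the way ChooseQuantizationParams does.
```
import torch

class DynamicQuantObserverSketch(torch.nn.Module):
    # Illustrative only; mirrors the asymmetric quint8 math described above.
    def __init__(self, qmin=0, qmax=255):
        super().__init__()
        self.qmin, self.qmax = qmin, qmax
        self.register_buffer('min_val', torch.tensor(0.0))
        self.register_buffer('max_val', torch.tensor(0.0))

    def forward(self, x):
        # Include zero in the observed range, as the dynamic operator does.
        self.min_val.copy_(torch.clamp(x.detach().min(), max=0.0))
        self.max_val.copy_(torch.clamp(x.detach().max(), min=0.0))
        return x

    def calculate_qparams(self):
        scale = (self.max_val - self.min_val).item() / (self.qmax - self.qmin)
        scale = max(scale, torch.finfo(torch.float32).eps)
        # Nudge zero_point so that real 0.0 maps exactly to an integer value.
        zero_point = self.qmin - round(self.min_val.item() / scale)
        zero_point = max(self.qmin, min(self.qmax, zero_point))
        return scale, zero_point
```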
Test Plan:
python test/test_quantization.py ObserverTest
Imported from OSS
Differential Revision: D20664586
fbshipit-source-id: e987ea71fff777c21e00c498504e6586e92568a2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35265
In graph mode we need to observe the activation tensor for dynamic quantization. This observer should behave the same way as the quantization functions called in the dynamic operator.
Currently for qlinear_dynamic we call quant_utils::ChooseQuantizationParams which has its own logic for calculating scale and zero_point.
We mimic those calculations in the new observer.
Test Plan:
python test/test_quantization.py ObserverTest
Imported from OSS
Differential Revision: D20630988
fbshipit-source-id: 7e7aca77590f965dcb423a705e68d030aaf98550
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33927
Test Plan:
Tests will be added in later PRs.
Imported from OSS
Differential Revision: D20354879
fbshipit-source-id: 03976f4b86c46dbdc4e45764a1e72f1a3855a404
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33852
This fixes an issue for QAT models. During eval, if we call `prepare_qat` and `convert` before calling `load_state_dict`, it throws an error because the weight info (number of channels) is not yet updated in the observer module.
This is not an issue for the per-tensor case.
Fixes issue #33830
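A hedged sketch of the eval-time flow this unblocks (the toy model and qconfig here are assumptions for illustration, not the test added in this PR):
```
import torch
from torch import nn
from torch.quantization import (QuantStub, DeQuantStub,
                                get_default_qat_qconfig, prepare_qat, convert)

def make_qat_model():
    m = nn.Sequential(QuantStub(), nn.Conv2d(3, 8, 3), nn.ReLU(), DeQuantStub())
    m.qconfig = get_default_qat_qconfig('fbgemm')
    return prepare_qat(m)

# "Training": run some data through the QAT model, then convert and save.
trained = make_qat_model()
trained(torch.randn(1, 3, 8, 8))
state_dict = convert(trained.eval()).state_dict()

# Eval: rebuild the quantized model structure first, load the state dict after.
# This ordering used to fail for per-channel observers (issue #33830).
evaluator = convert(make_qat_model().eval())
evaluator.load_state_dict(state_dict)
```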
Test Plan:
python test/test_quantization.py EagerModePostTrainingQuantTest.test_eval_after_train
python test/test_quantization.py EagerModeQuantizationAwareTrainingTest.test_eval_after_train
Imported from OSS
Differential Revision: D20212996
fbshipit-source-id: a04af8fe4df2e555270ae4d6693f5777d86f8a46
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32757
This PR updates the main quantize_dynamic API to use the QNNPACK backend for mobile.
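A hedged usage sketch: select the QNNPACK engine explicitly (availability depends on the build), then run the standard dynamic quantization API.
```
import torch
from torch import nn
from torch.quantization import quantize_dynamic

if 'qnnpack' in torch.backends.quantized.supported_engines:
    torch.backends.quantized.engine = 'qnnpack'
model = nn.Sequential(nn.Linear(16, 8), nn.ReLU())
qmodel = quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)
print(qmodel(torch.randn(2, 16)).shape)
```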
Test Plan:
python test/test_quantization.py PostTrainingDynamicQuantTest.test_quantized_rnn
Imported from OSS
Differential Revision: D19632220
fbshipit-source-id: b4c51485c281d088524101b97c84dd806438b597
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30445
Create distributed and rpc directories under caffe2/test for better management
of unit tests.
Differential Revision: D18702786
fbshipit-source-id: e9daeed0cfb846ef68806f6decfcb57c0e0e3606
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31970
Now that a ClassType can be shared among different module instances, we preserve
that sharing in clone as well: if the original module has a ClassType that is
shared, we clone this ClassType once and share it between the cloned module
instances as well.
Test Plan:
build/test/test_jit
Imported from OSS
Differential Revision: D19406251
fbshipit-source-id: 2881c695f6e718e5432040a3817cf187a62017bf
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30892
Fixes all outstanding lints and actually installs a properly configured
flake8
Test Plan: Imported from OSS
Differential Revision: D18862825
Pulled By: suo
fbshipit-source-id: 08e9083338a7309272e17bb803feaa42e348aa85
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30890
We've received way too many complaints about this functionality making tests flaky, and it's not providing value to us anyway. Let's cut the shit and kill deadline testing.
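Assuming "deadline testing" here refers to hypothesis's per-example deadline, disabling it boils down to registering a settings profile with `deadline=None` (a hedged sketch, not the exact change):
```
import hypothesis

# Register and load a settings profile with deadline=None so slow examples
# no longer fail with DeadlineExceeded.
hypothesis.settings.register_profile("no_deadline", deadline=None)
hypothesis.settings.load_profile("no_deadline")
```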
Test Plan: Imported from OSS
Differential Revision: D18857597
Pulled By: jamesr66a
fbshipit-source-id: 67e3412795ef2fb7b7ee896169651084e434d2f6
Summary:
In this PR, we enhance graph-mode quantization to handle aten::_convolution, which can be generated by the tracing path.
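A hedged illustration: tracing a conv module records the lower-level convolution call in the graph (the exact node depends on the PyTorch version), which is the form this change teaches the quantization passes to handle.
```
import torch

conv = torch.nn.Conv2d(3, 8, kernel_size=3)
traced = torch.jit.trace(conv, torch.randn(1, 3, 16, 16))
print(traced.graph)  # inspect the convolution node produced by tracing
```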
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30245
Differential Revision: D18671597
Pulled By: lly-zero-one
fbshipit-source-id: 78a2470fbb0fe0def55d63c6bda7cbb5c89f7848
Summary:
This PR enables per-channel (row-wise) dynamic quantization for the linear operator. Given that we have seen some accuracy drop due to per-tensor quantization, we expect per-channel quantization to help improve accuracy.
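A hedged sketch of row-wise (per output channel) weight quantization for a linear layer, showing one scale per output row instead of a single scale for the whole tensor as in the per-tensor path:
```
import torch

w = torch.randn(4, 8)                              # [out_features, in_features]
scales = w.abs().max(dim=1).values / 127.0         # symmetric qint8, one per row
zero_points = torch.zeros(4, dtype=torch.int64)
wq = torch.quantize_per_channel(w, scales, zero_points, axis=0, dtype=torch.qint8)
print(wq.q_per_channel_scales())
```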
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30122
Differential Revision: D18630541
Pulled By: lly-zero-one
fbshipit-source-id: d52685deec5e7de46cd686ae649a8c8765b9cacf
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29331
Closes #27954
This fixes the hard-coding of packed parameter values for the dynamic quantized LSTM by orchestrating the following dance:
1) Each variadic parameter on the module has its own Module. That Module defines the `__getstate__` and `__setstate__` methods s.t. packed weights are properly re-done on model load.
2) Each of these modules is wrapped into a `torch.nn.ModuleList`, s.t. the parameters appear as attributes in the hierarchy. Then, `gatherParametersAndBuffers` (9c43b16df9/torch/csrc/jit/tracer.cpp (L285)) can see these parameters and create a `Value*` for them in the traced graph.
3) In forward, we need to convert from ModuleList -> Module -> Parameter to a simple TensorList of the parameters. We just use a loop here. In tracing, we simply record a `ListConstruct` with each of the proper parameter values. In scripting, the `ModuleList` is const, so it can be unrolled into the graph and a subsequent `ListConstruct` does its business.
The `forward` of the traced LSTM before and after this change are as follows:
Before
```
def forward(self,
input: Tensor,
argument_2: Tuple[Tensor, Tensor]) -> Tuple[Tensor, Tuple[Tensor, Tensor]]:
hx, hx0, = argument_2
_0, _1, _2 = torch.quantized_lstm(input, [hx, hx0], [CONSTANTS.c0, CONSTANTS.c1], True, 1, 0., True, False, False, dtype=12, use_dynamic=True)
return (_0, (_1, _2))
```
After
```
def forward(self,
input: Tensor,
argument_2: Tuple[Tensor, Tensor]) -> Tuple[Tensor, Tuple[Tensor, Tensor]]:
_0 = self.cell._all_weight_values
_1 = getattr(_0, "0").param
_2 = getattr(_0, "1").param
hx, hx0, = argument_2
_3, _4, _5 = torch.quantized_lstm(input, [hx, hx0], [_1, _2], True, 1, 0., True, False, False, dtype=12, use_dynamic=True)
return (_3, (_4, _5))
```
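As a companion to step 1 above, a hedged sketch of the per-parameter wrapper (names are illustrative, not the actual classes in torch.nn.quantized.dynamic; requires a quantized engine such as FBGEMM): pickling stores the unpacked weight/bias, and unpickling re-packs them, so packed weights are rebuilt correctly on model load.
```
import torch

class PackedLinearWeight(torch.nn.Module):
    def __init__(self, qweight, bias=None):
        super().__init__()
        self.param = torch.ops.quantized.linear_prepack(qweight, bias)

    def __getstate__(self):
        # Serialize the unpacked (plain quantized tensor) form.
        return torch.ops.quantized.linear_unpack(self.param)

    def __setstate__(self, state):
        super().__init__()  # re-initialize nn.Module internals after unpickling
        self.param = torch.ops.quantized.linear_prepack(*state)
```
Each such wrapper is then placed in an `nn.ModuleList` (step 2) so the tracer sees the packed params as module attributes.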
Test Plan: Imported from OSS
Differential Revision: D18374904
Pulled By: jamesr66a
fbshipit-source-id: f1a9b58998bc365b9baad38c21fd4bb510dd639c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29331
Closes #27954
This fixes the hard-coding of packed parameter values for the dynamic quantized LSTM by orchestrating the following dance:
1) Each variadic parameter on the module has its own Module. That Module defines the `__getstate__` and `__setstate__` methods s.t. packed weights are properly re-done on model load.
2) Each of these modules is wrapped into a `torch.nn.ModuleList`, s.t. the parameters appear as attributes in the hierarchy. Then, `gatherParametersAndBuffers` (9c43b16df9/torch/csrc/jit/tracer.cpp (L285)) can see these parameters and create a `Value*` for them in the traced graph.
3) In forward, we need to convert from ModuleList -> Module -> Parameter to a simple TensorList of the parameters. We just use a loop here. In tracing, we simply record a `ListConstruct` with each of the proper parameter values. In scripting, the `ModuleList` is const, so it can be unrolled into the graph and a subsequent `ListConstruct` does its business.
The `forward` of the traced LSTM before and after this change are as follows:
Before
```
def forward(self,
input: Tensor,
argument_2: Tuple[Tensor, Tensor]) -> Tuple[Tensor, Tuple[Tensor, Tensor]]:
hx, hx0, = argument_2
_0, _1, _2 = torch.quantized_lstm(input, [hx, hx0], [CONSTANTS.c0, CONSTANTS.c1], True, 1, 0., True, False, False, dtype=12, use_dynamic=True)
return (_0, (_1, _2))
```
After
```
def forward(self,
input: Tensor,
argument_2: Tuple[Tensor, Tensor]) -> Tuple[Tensor, Tuple[Tensor, Tensor]]:
_0 = self.cell._all_weight_values
_1 = getattr(_0, "0").param
_2 = getattr(_0, "1").param
hx, hx0, = argument_2
_3, _4, _5 = torch.quantized_lstm(input, [hx, hx0], [_1, _2], True, 1, 0., True, False, False, dtype=12, use_dynamic=True)
return (_3, (_4, _5))
```
Test Plan: Imported from OSS
Differential Revision: D18359880
Pulled By: jamesr66a
fbshipit-source-id: 0ff2cad294a1871123015dfc704eaf73a7ac1d9e