Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44956
Makes the buffers of HistogramObserver have the
same shapes in the uninitialized and initialized states.
This is useful because the detectron2 checkpointer assumes
these shapes stay the same, so it removes the
need for manual hacks around the shapes changing.
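For illustration, a minimal check of the described behavior (a sketch, assuming a recent torch.quantization build; buffer names follow the observer's state_dict keys):
```
import torch
from torch.quantization import HistogramObserver

obs = HistogramObserver()
before = {k: v.shape for k, v in obs.state_dict().items()}
obs(torch.randn(16, 8))  # initialize the buffers with real data
after = {k: v.shape for k, v in obs.state_dict().items()}
# with this change, the buffer shapes match between the two states
assert all(before[k] == after[k] for k in before)
```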
Test Plan:
```
python test/test_quantization.py TestObserver.test_histogram_observer_consistent_buffer_shape
```
Imported from OSS
Reviewed By: raghuramank100
Differential Revision: D23785382
fbshipit-source-id: 1a83fd4f39b244b00747c368d5d305a07d877c92
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44773
The model is created and prepared using the FX APIs and then scripted for training.
In order to test QAT on the scripted model we need to be able to disable/enable the fake_quant
and observer modules on it.
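A hedged sketch of that flow (API names as in the torch.quantization namespace of this era; exact signatures have shifted in later releases, e.g. prepare_qat_fx now requires example_inputs):
```
import torch
from torch.quantization import (
    get_default_qat_qconfig,
    disable_fake_quant,
    enable_fake_quant,
    disable_observer,
)
from torch.quantization.quantize_fx import prepare_qat_fx

model = torch.nn.Sequential(torch.nn.Linear(4, 4), torch.nn.ReLU()).train()
prepared = prepare_qat_fx(model, {"": get_default_qat_qconfig("fbgemm")})
scripted = torch.jit.script(prepared)

# the toggles must also work on the scripted model
scripted.apply(disable_fake_quant)
scripted.apply(disable_observer)
scripted.apply(enable_fake_quant)
```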
Test Plan:
python test/test_quantization.py TestQuantizeFx.test_qat_and_script
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D23741354
fbshipit-source-id: 3fee7aa9b049d9901313b977710f4dc1c4501532
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44749
Ensure fx module is scriptable after calling prepare_qat on it
Test Plan:
python test/test_quantization.py TestQuantizeFx.test_qat_and_script
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D23718380
fbshipit-source-id: abf63ffb21e707f7def8f6c88246877f5aded58c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44537
Originally, the `min_val`, `max_val`, `min_vals`, `max_vals`
attributes of observers were Tensors but not buffers. They had custom
state_dict save/load code to ensure their state was saved.
At some point, these attributes became buffers, and the custom
save/load code remained. This introduced a subtle bug:
* create model A, move it to a device (cpu/cuda) and save its state_dict
* create model B, load its state dict.
* `min_val|min_vals|max_val|max_vals` would always be loaded to model A's device, even if the rest of model B was on a different device
* the above is inconsistent with how save/load on different devices is expected to work (see https://pytorch.org/tutorials/beginner/saving_loading_models.html#saving-loading-model-across-devices)
In practice, the case people would sometimes hit is:
* model A is on CPU, state dict is saved
* model B is created and moved to GPU, state_dict from model A is loaded
* assertions throw when operations are attempted across different devices
This PR fixes the behavior by removing the custom save/load where
possible and letting the default `nn.Module` save/load code handle
device assignment. We special case `PerChannelMinMaxObserver` and its
children to allow loading buffers of different sizes, which is expected.
There are some followups to also enable this for HistogramObserver
and FakeQuantize, which can be done in separate PRs due to higher
complexity.
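A minimal sketch of the expected behavior after the fix (assumes a CUDA device is available; MinMaxObserver is used here for brevity):
```
import torch
from torch.quantization import MinMaxObserver

obs_a = MinMaxObserver()              # "model A", stays on CPU
obs_a(torch.randn(4, 4))
torch.save(obs_a.state_dict(), "obs.pt")

obs_b = MinMaxObserver().cuda()       # "model B", moved to GPU before loading
obs_b.load_state_dict(torch.load("obs.pt"))
# min_val/max_val now follow model B's device instead of model A's
assert obs_b.min_val.device.type == "cuda"
```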
Test Plan:
```
python test/test_quantization.py TestObserver.test_state_dict_respects_device_affinity
```
Imported from OSS
Reviewed By: raghuramank100
Differential Revision: D23644493
fbshipit-source-id: 0dbb6aa309ad569a91a663b9ee7e44644080032e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44217
Move the corresponding tests to the static quantization test suite as well
Test Plan:
python test/test_quantization.py TestStaticQuantizedModule.test_embedding_bag_api
Imported from OSS
Reviewed By: raghuramank100
Differential Revision: D23547386
fbshipit-source-id: 41f81c31e1613098ecf6a7eff601c7dcd4b09c76
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44208
Add a quantized Embedding module in the static quantization namespace. Embedding
quantization only requires the weights to be quantized, so it is static.
Internally it calls the embedding_bag_byte op with offsets set so that each bag
contains a single index.
A future PR will move EmbeddingBag quantization from dynamic to static as well.
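A small illustration of the offsets trick described above: an embedding lookup of N indices is equivalent to an embedding_bag over N single-element bags.
```
import torch
import torch.nn.functional as F

weight = torch.randn(10, 4)
indices = torch.tensor([1, 3, 7])
offsets = torch.arange(indices.numel())   # one bag per index

emb = F.embedding(indices, weight)
bagged = F.embedding_bag(indices, weight, offsets, mode="sum")
assert torch.allclose(emb, bagged)
```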
Test Plan:
python test/test_quantization.py test_embedding_api
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D23547384
fbshipit-source-id: eddc6fb144b4a771060e7bab5853656ccb4443f0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44042
Missed one case last time
Test Plan: Imported from OSS
Reviewed By: vkuzo
Differential Revision: D23479345
fbshipit-source-id: 30e6713120c494e9fab5584de4df9b25bec83d32
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44125
In `Quantizer._prepare`, `observed` was used for two different variables
with different types. Making the names a bit cleaner and removing the
name conflict.
Test Plan:
```
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
```
Imported from OSS
Reviewed By: dskhudia
Differential Revision: D23504109
fbshipit-source-id: 0f73eac3d6dd5f72ad5574a4d47d33808a70174a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44092
Instead of being looked up through a separately stored `root` module, submodules
and weights are installed directly on the graph_module by transferring the
original modules. This makes it more likely that scripting will succeed (since
we no longer have submodules that are not used in the trace). It also prevents
layered transforms from having to special-case handling of the `root` module.
GraphModules can now be re-traced as part of the input to other transforms.
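A small sketch of the re-tracing this enables, using the public torch.fx API:
```
import torch
from torch.fx import symbolic_trace

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(4, 4)

    def forward(self, x):
        return self.linear(x).relu()

gm = symbolic_trace(M())
retraced = symbolic_trace(gm)    # a GraphModule can itself be traced again
scripted = torch.jit.script(gm)  # and scripted, since params live on gm itself
```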
Test Plan: Imported from OSS
Reviewed By: jamesr66a
Differential Revision: D23504210
Pulled By: zdevito
fbshipit-source-id: f79e5c4cbfc52eb0ffb5d6ed89b37ce35a7dc467
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44115
Fixes device affinity in the FX prepare pass for QAT. Before this PR, observers
were always created on CPU. After this PR, observers are created on the
same device as the rest of the model. This will enable QAT prepare to
work regardless of whether users move the model to cuda before or after
calling this pass.
Test Plan:
```
python test/test_quantization.py TestQuantizeFx.test_qat_prepare_device_affinity
```
Imported from OSS
Reviewed By: supriyar
Differential Revision: D23502291
fbshipit-source-id: ec4ed20c21748a56a25e3395b35ab8640d71b5a8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43927
Adds uninitialized placeholders for various state
used throughout the Quantizer object, with documentation
on what they are. No logic change.
Test Plan:
```
python test/test_quantization.py TestQuantizeFxOps
python test/test_quantization.py TestQuantizeFx
```
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D23439473
fbshipit-source-id: d4ae83331cf20d81a7f974f88664ccddca063ffc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43914
Renames the `matches` function to `is_match`, since there is also
a list named `matches` that we pass around in `Quantizer`,
and it would be good to reduce name conflicts.
Test Plan:
```
python test/test_quantization.py TestQuantizeFxOps
```
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D23435601
fbshipit-source-id: 394af11e0120cfb07dedc79d5219247330d4dfd6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43910
Adds a debug function to get a representation of all nodes in the
graph, such as
```
name op target args kwargs
x plchdr x () {}
linear_weight gt_prm linear.weight () {}
add_1 cl_fun <bi_fun add> (x, linear_weight) {}
linear_1 cl_mod linear (add_1,) {}
relu_1 cl_meth relu (linear_1,) {}
sum_1 cl_fun <bi_meth sum> (relu_1,) {'dim': -1}
topk_1 cl_fun <bi_meth topk> (sum_1, 3) {}
```
using only the Python standard library. This is useful for printing the internal
state of graphs when working on FX code.
Has some on-by-default logic to shorten things so that node reprs for
toy models and unit tests fit into 80 chars.
I'm flexible on the function name and location; I care more that this is
accessible both from inside PyTorch and from debug scripts which
are not checked in.
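A rough sketch of this kind of debug print using only the public torch.fx surface (the helper added here may format and abbreviate differently):
```
import torch
from torch.fx import GraphModule, symbolic_trace

def print_nodes(gm: GraphModule) -> None:
    # one row per node: name, op, target, args, kwargs
    for n in gm.graph.nodes:
        print(f"{n.name:16} {n.op:16} {str(n.target):24} {n.args} {n.kwargs}")

gm = symbolic_trace(torch.nn.Sequential(torch.nn.Linear(4, 4), torch.nn.ReLU()))
print_nodes(gm)
```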
Test Plan:
see
https://gist.github.com/vkuzo/ed0a50e5d6dc7442668b03bb417bd603 for
example usage
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D23435029
fbshipit-source-id: 1a2df797156a19cedd705e9e700ba7098b5a1376
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43892
Run the weight observer in the convert function, so the user does not need to run calibration
Test Plan: Imported from OSS
Reviewed By: vkuzo
Differential Revision: D23429758
fbshipit-source-id: 5bc222e3b731789ff7a86463c449690a58dffb7b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43902
Trace back from the weight node until we hit a getattr, reconstruct a graph module from the traced nodes,
run it to pack the weight, and then replace the original chain of ops with the packed weight.
Test Plan:
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D23432431
fbshipit-source-id: 657f21a8287494f7f87687a9d618ca46376d3aa3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43901
Add APIs similar to the eager mode and TorchScript graph mode ones (a brief usage sketch follows the list):
- fuse_fx
- quantize_fx (for both post training static and qat)
- quantize_dynamic_fx (for post training dynamic)
- prepare_fx (for both post training static and qat)
- prepare_dynamic_fx (for post training dynamic)
- convert_fx (for all modes)
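A hedged usage sketch of the post-training static flow with these APIs (signatures per this era; later releases require an example_inputs argument and moved the entry points to torch.ao.quantization):
```
import torch
from torch.quantization import get_default_qconfig
from torch.quantization.quantize_fx import prepare_fx, convert_fx

float_model = torch.nn.Sequential(torch.nn.Linear(4, 4), torch.nn.ReLU()).eval()
qconfig_dict = {"": get_default_qconfig("fbgemm")}

prepared = prepare_fx(float_model, qconfig_dict)   # insert observers
prepared(torch.randn(8, 4))                        # calibrate
quantized = convert_fx(prepared)                   # produce the quantized module
```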
Test Plan:
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D23432430
fbshipit-source-id: fc99eb75cbecd6ee7a3aa6c8ec71cd499ff7e3c1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43789
Since it's a single element; in some cases we may not be able to resize the
buffers.
Test Plan: unit tests
Reviewed By: supriyar
Differential Revision: D23393108
fbshipit-source-id: 46cd7f73ed42a05093662213978a01ee726433eb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43728
Trace back from the weight node until we hit a getattr, reconstruct a graph module from the traced nodes,
run it to pack the weight, and then replace the original chain of ops with the packed weight.
Test Plan: Imported from OSS
Reviewed By: vkuzo
Differential Revision: D23385090
fbshipit-source-id: 11341f0af525a02ecec36f163a9cd35dee3744a1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43581
Add APIs similar to the eager mode and TorchScript graph mode ones:
- fuse_fx
- quantize_fx (for both post training static and qat)
- quantize_dynamic_fx (for post training dynamic)
- prepare_fx (for both post training static and qat)
- prepare_dynamic_fx (for post training dynamic)
- convert_fx (for all modes)
Test Plan: Imported from OSS
Reviewed By: vkuzo
Differential Revision: D23385091
fbshipit-source-id: b789e54e1a0f3af6b026fd568281984e253e0433
Summary:
It's useful if we add additional attributes to nodes in the graph - it's easier to set the attribute on all nodes, even if the value would happen to be None.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43432
Reviewed By: jamesr66a
Differential Revision: D23276433
Pulled By: dzhulgakov
fbshipit-source-id: c69e7cb723bbbb4dba3b508a3d6c0e456fe610df
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43587
Add tests for graph mode quantization on torchvision and make sure it matches
current eager mode quantization
Test Plan:
Imported from OSS
Reviewed By: z-a-f
Differential Revision: D23331253
fbshipit-source-id: 0445a44145d99837a2c975684cd0a0b7d965c8f9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43526
Add tests for graph mode quantization on torchvision and make sure it matches
current eager mode quantization
Test Plan: Imported from OSS
Reviewed By: vkuzo
Differential Revision: D23306683
fbshipit-source-id: 30d27e225d4557bfc1d9aa462086e416aa9a9c0e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43286
We need to use this in graph mode quantization on fx
Test Plan: Imported from OSS
Reviewed By: vkuzo
Differential Revision: D23221734
fbshipit-source-id: 7c3c3840ce5bdc185b962e081aff1618f4c58e85
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43175
This PR adds graph mode quantization on FX (originally https://github.com/pytorch/pytorch/pull/42741).
Currently it matches eager mode quantization for torchvision with static/dynamic/qat.
The DDP/SyncBN test is still WIP.
Test Plan:
python test/test_quantization.py TestQuantizeFx
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D23178602
fbshipit-source-id: 8e7e0322846fbda2cfa79ad188abd7235326f879
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43151
Using `torch.all` instead of `torch.sum` plus a length check.
It's unclear whether the increase in perf (~5% for small inputs) is
real, but it should be a net benefit, especially for larger channel inputs.
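A minimal illustration of the pattern being simplified (the exact expressions in the observer code differ):
```
import torch

a, b = torch.randn(1000), torch.randn(1000)
old_style = (a == b).sum() == len(a)   # reduce, then compare against the length
new_style = torch.all(a == b)          # single reduction
assert bool(old_style) == bool(new_style)
```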
Test Plan: Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D23170426
fbshipit-source-id: ee5c25eb93cee1430661128ac9458a9c525df8e5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43150
The current logic was expensive because it created tensors on CUDA.
Switching to clamp since it can work without needing to create tensors.
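A hedged sketch of the kind of change (the real code operates on the observer's min_val/max_val buffers):
```
import torch

min_val = torch.randn(())

# before: materializes an extra zero tensor on min_val's device
clamped_old = torch.min(min_val, torch.zeros_like(min_val))
# after: clamp needs no extra tensor on the device
clamped_new = torch.clamp(min_val, max=0.0)
assert torch.equal(clamped_old, clamped_new)
```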
Test Plan:
benchmarks
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D23170427
fbshipit-source-id: 6fe3a728e737aca9f6c2c4d518c6376738577e21
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43149
This value doesn't change, so make it a buffer in order to pay
the cost of creating the tensor only once.
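A hedged sketch of the pattern (the observers register a similar `eps` buffer):
```
import torch

class TinyObserver(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # created once; follows .to()/.cuda() and is saved in the state_dict
        self.register_buffer("eps", torch.tensor(torch.finfo(torch.float32).eps))

    def forward(self, x):
        # use the pre-built buffer instead of constructing a tensor per call
        return torch.max(x.abs().max(), self.eps)
```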
Test Plan: Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D23170428
fbshipit-source-id: 6b963951a573efcc5b5a57649c814590b448dd72
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42602
In this diff, clearer semantics and naming are introduced by splitting the original `init_dynamic_qrange` into two separate `Optional[int]` parameters, `qmin` and `qmax`, to avoid confusing these parameters with dynamic quantization.
The `qmin` and `qmax` parameters allow users to specify a custom quantization range and enable specific use cases for lower-bit quantization.
Test Plan:
To assert the correctness and compatibility of the changes with existing observers, on a devvm, execute the following command to run the unit tests:
`buck test //caffe2/test:quantization -- observer`
Reviewed By: vkuzo, raghuramank100
Differential Revision: D22948334
fbshipit-source-id: 275bc8c9b5db4ba76fc2e79ed938376ea4f5a37c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43015
Currently activation_post_process modules are inserted by default in QAT modules, which is not
friendly to automatic quantization tools; this PR removes them.
Test Plan:
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D23105059
fbshipit-source-id: 3439ac39e718ffb0390468163bcbffd384802b57
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42343
Currently activation_post_process modules are inserted by default in QAT modules, which is not
friendly to automatic quantization tools; this PR removes them.
Test Plan: Imported from OSS
Reviewed By: raghuramank100
Differential Revision: D22856816
fbshipit-source-id: 988a43bce46a992b38fd0d469929f89e5b046131
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42576
Previously we had a qconfig propagation list and only attached a qconfig to modules
in the list. This works when everything is quantized in the form of modules,
but now that we are expanding quantization to functional/torch ops, we need to attach a qconfig
to all modules.
Test Plan: Imported from OSS
Reviewed By: vkuzo
Differential Revision: D22939453
fbshipit-source-id: 7d6a1f73ff9bfe461b3afc75aa266fcc8f7db517
Summary:
This diff adds FakeQuantizeWithBackward. This works the same way as the regular FakeQuantize module, allowing QAT to occur in the forward pass, except it has an additional quantize_backward parameter. When quantize_backward is enabled, the gradients are fake quantized as well (dynamically, using hard-coded values). This allows the user to see whether there would be a significant loss of accuracy if the gradients were quantized in their model.
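For illustration, a rough sketch of the general idea (not the module added in this diff): a custom autograd Function that fake-quantizes in the forward pass and additionally fake-quantizes the incoming gradient in the backward pass, with dynamically chosen gradient qparams.
```
import torch

class FakeQuantWithQuantizedGrad(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, scale, zero_point):
        return torch.fake_quantize_per_tensor_affine(x, scale, zero_point, 0, 255)

    @staticmethod
    def backward(ctx, grad_out):
        # dynamically pick a scale for the gradient, then fake-quantize it
        g_scale = float(grad_out.abs().max().clamp(min=1e-12)) / 127.0
        g = torch.fake_quantize_per_tensor_affine(grad_out, g_scale, 0, -128, 127)
        return g, None, None
```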
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40532
Test Plan: The relevant test for this can be run using `python test/test_quantization.py TestQATBackward.test_forward_and_backward`
Reviewed By: supriyar
Differential Revision: D22217029
Pulled By: durumu
fbshipit-source-id: 7055a2cdafcf022f1ea11c3442721ae146d2b3f2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42348
Use the dtype info in PlaceholderObserver to decide what ops to insert in the graph.
In the next PR we can delete NoopObserver.
Test Plan:
python test/test_quantization.py
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D22859457
fbshipit-source-id: a5c618f22315534ebd9a2df77b14a0aece196989
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42222
This change adds the necessary passes to perform FP16 dynamic quantization.
We skip inserting observers for activations based on the dtype (torch.float16) and only insert the Fp16Observer for weights
Test Plan:
python test/test_quantization.py TestQuantizeJitOps
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D22849220
fbshipit-source-id: 2c53594ecd2485e9e3dd0b380eceaf7c5ab5fc50
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42221
Adds a new observer that emits a warning if the range of the tensor is beyond the fp16 range. This will be further used in graph mode quantization to insert cast-to-fp16 ops in the graph
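A minimal sketch of the kind of check described above (the warning text here is illustrative, not the one emitted by the observer):
```
import warnings
import torch

def warn_if_outside_fp16_range(x: torch.Tensor) -> None:
    fp16_max = torch.finfo(torch.float16).max  # 65504.0
    if x.abs().max() > fp16_max:
        warnings.warn("tensor values exceed the fp16 range and would overflow on cast")

warn_if_outside_fp16_range(torch.tensor([1e5]))  # triggers the warning
```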
Test Plan:
python test/test_quantization.py TestObserver.test_fp16_observer
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D22849222
fbshipit-source-id: a301281ce38ba4d4e7a009308400d34a08c113d2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41969
In this diff, the `_LearnableFakeQuantize` module is extended to provide support for gradient scaling where the gradients for both scale and zero point are multiplied by a constant `g` (in some cases, can help with quicker convergence). In addition, it is also augmented to provide a factory method via `_with_args` such that a partial constructor of the module can be built.
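A hedged sketch of the factory-method pattern that `_with_args` provides, shown here via the public observer equivalent (`with_args`); the learnable fake quantize module exposes the same idiom:
```
import torch
from torch.quantization import MinMaxObserver, QConfig

# with_args returns a callable that remembers its keyword arguments, so a
# qconfig can carry a pre-configured constructor instead of an instance.
act_factory = MinMaxObserver.with_args(dtype=torch.quint8)
wt_factory = MinMaxObserver.with_args(dtype=torch.qint8, qscheme=torch.per_tensor_symmetric)
qconfig = QConfig(activation=act_factory, weight=wt_factory)

act_observer = qconfig.activation()   # constructed later, one per module
```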
Test Plan:
For correctness of the fake quantizer operators, on a devvm, enter the following command:
```
buck test //caffe2/torch:quantization -- learnable_py_module
```
Reviewed By: z-a-f
Differential Revision: D22715629
fbshipit-source-id: ff8e5764f81ca7264bf9333789f57e0b0cec7a72
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42034
In this diff, scale and zero point gradient calculations are updated to correctly reflect the actual backpropagation equation (instead of `dScale * dX`, the near-final output should be `dScale * dY`; the same applies to zero point).
Test Plan:
To execute the unit tests for all affected learnable fake quantize modules and kernels, on a devvm, execute the following command:
`buck test //caffe2/test:quantization -- learnable`
To enable the `cuda` tests, execute the following command:
`buck test mode/dev-nosan //caffe2/test:quantization -- learnable`
Reviewed By: jerryzh168
Differential Revision: D22735668
fbshipit-source-id: 45c1e0fd38cbb2d8d5e60be4711e1e989e9743b4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42033
In this diff, the Python `_LearnableFakeQuantize` module is updated where the gradient with respect to the input `x` is actually computed instead of passed through. Argument naming is also updated for better clarity; and unit tests on the `PerTensor` and `PerChannel` operators are added for asserting correctness.
Test Plan:
On a devvm, execute the command:
`buck test //caffe2/test:quantization -- learnable_py_module`
To include `cuda` tests as well, run:
`buck test mode/dev-nosan //caffe2/test:quantization -- learnable_py_module`
Reviewed By: jerryzh168
Differential Revision: D22735580
fbshipit-source-id: 66bea7e9f8cb6422936e653500f917aa597c86de
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41930
As title
ghstack-source-id: 108517079
Test Plan: CI
Reviewed By: jerryzh168
Differential Revision: D22698386
fbshipit-source-id: 4f748c9bae4a0b615aa69c7cc8d8e451e5d26863
Summary:
Added logic so that if a prehook is passed into the prepare method during quantization, the hook will be added as a forward pre-hook to all leaf modules (and modules specified in the non_leaf_module_list).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41863
Test Plan:
Small demo: made a simple module, then called prepare with the prehook parameter set to the Numeric Suite logger, and printed the results to verify it's what we wanted.
Reviewed By: jerryzh168
Differential Revision: D22671288
Pulled By: edmundw314
fbshipit-source-id: ce65a00830ff03360a82c0a075b3b6d8cbc4362e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41612
This change adds preliminary support to quantize the EmbeddingBag operators. We currently support 4-bit and 8-bit quantization+packing of the weights.
To quantize these operators, specify the operator name in the `custom_op_name` field of the NoopObserver. Based on the op name (4bit or 8bit) we call the corresponding quantization functions.
Refer to the test plan for how to invoke the qconfig for the embedding_bag ops.
Future versions of this will support 4-bit and 2-bit qtensors with native support to observe and quantize it.
NB - This version assumes that the weights in the EmbeddingBag Module reside on the same device.
Test Plan:
python test/test_quantization.py TestQuantizeDynamicJitOps.test_embedding_bag
Imported from OSS
Reviewed By: vkuzo, jerryzh168
Differential Revision: D22609342
fbshipit-source-id: 23e33f44a451c26719e6e283e87fbf09b584c0e6
Summary:
The goal is to implement cross layer equalization as described in section 4.1 in this paper: https://arxiv.org/pdf/1906.04721.pdf
Given two adjacent submodules A, B in a trained model, quantization might hurt one of the submodules more than the other. The paper poses the idea that a loss in accuracy from quantizing can be due to a difference in the channel ranges between the two submodules (the output channel range of A can be small, while the input channel range of B can be large). To minimize this source of error, we want to scale the tensors of A and B such that their channel ranges are equal (equalizing the ranges minimizes this source of error).
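A hedged sketch of the equalization scale from section 4.1: with W1 rescaled by 1/s per output channel and W2 by s per input channel, the ranges match when s_i = sqrt(r1_i / r2_i), where r1_i is the range of output channel i of A and r2_i the range of input channel i of B.
```
import torch

W1 = torch.randn(8, 4)   # A: (out_channels, in_channels)
W2 = torch.randn(6, 8)   # B consumes A's 8 output channels

r1 = W1.abs().max(dim=1).values   # per-output-channel range of A
r2 = W2.abs().max(dim=0).values   # per-input-channel range of B
s = torch.sqrt(r1 / r2)

W1_eq = W1 / s.unsqueeze(1)
W2_eq = W2 * s.unsqueeze(0)
# both sets of ranges are now sqrt(r1 * r2)
assert torch.allclose(W1_eq.abs().max(dim=1).values, W2_eq.abs().max(dim=0).values)
```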
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41685
Test Plan: Imported from OSS
Reviewed By: z-a-f
Differential Revision: D22630219
Pulled By: edmundw314
fbshipit-source-id: ccc91ba12c10b652d7275222da8b85455b8a7cd5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41815
**All are minor changes to enable better simulations.**
The constructors of MinMaxObserver, MovingAverageMinMaxObserver, PerChannelMinMaxObserver, and MovingAveragePerChannelMinMaxObserver are augmented so they can utilize the dynamic quantization range support in the _ObserverBase class.
In addition, minor adjustments are made to the enable_static_observation function, which allows the observer to update its parameters but does not fake quantize the output (for constructing a baseline).
Test Plan:
To ensure this modification is still backward compatible with past usages, numerics are verified by running the quantization unit test suite, which contains various observer tests. The following command executes the test suite, which also verifies the observer numerics:
```
buck test //caffe2/test:quantization -- observer
```
Reviewed By: z-a-f
Differential Revision: D22649128
fbshipit-source-id: 32393b706f9b69579dc2f644fb4859924d1f3773
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41535
A generalized fake quantization module is built to support lower-bit fake quantization with back propagation on the scale and zero point. The module supports both per tensor and per channel fake quantization.
Test Plan:
Please see diff D22337313 for a related experiment performed on the fake quantizer module.
The `_LearnableFakeQuantize` module supports the following use cases:
- Per Tensor Fake Quantization or Per Channel Fake Quantization
- Static Estimation from Observers or Quantization Parameter Learning through Back Propagation
By default, the module assumes per tensor affine fake quantization. To switch to per channel, during initialization, declare `channel_size` with the appropriate length. To toggle between utilizing static estimation and parameter learning with back propagation, you can invoke the call `enable_param_learning` or `enable_static_estimate`. For more information on the flags that support these operations, please see the doc string of the `_LearnableFakeQuantize` module.
The `_LearnableFakeQuantize` module relies on 2 operators for its forward and backward paths: `_LearnableFakeQuantizePerTensorOp` and `_LearnableFakeQuantizePerChannelOp`. The backpropagation routine is developed based on the following literature (a sketch of the scale gradient follows the list):
- Learned Step Size Quantization: https://openreview.net/pdf?id=rkgO66VKDS
- Trained Quantization Thresholds: https://arxiv.org/pdf/1903.08066.pdf
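For reference, a rough sketch of the per-element d(x_hat)/d(scale) term from the LSQ paper (straight-through rounding; (qmin, qmax) is the integer range) - the operators above additionally apply the incoming gradient and a gradient scale factor:
```
import torch

def lsq_scale_grad(x: torch.Tensor, s: float, qmin: int, qmax: int) -> torch.Tensor:
    # inside the range: round(x/s) - x/s; clipped below/above: qmin / qmax
    q = x / s
    grad = torch.round(q) - q
    grad = torch.where(q <= qmin, torch.full_like(q, float(qmin)), grad)
    grad = torch.where(q >= qmax, torch.full_like(q, float(qmax)), grad)
    return grad
```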
Reviewed By: z-a-f
Differential Revision: D22573645
fbshipit-source-id: cfd9ece8a959ae31c00d9beb1acf9dfed71a7ea1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41113
In this diff, the `ObserverBase` class is augmented with 2 additional optional arguments qmin and qmax. Correspondingly the calculation of qmin and qmax and the related quantization parameters are modified to accommodate this additional flexibility should the number of bits for quantization be lower than 8 (the default value).
Additional logic in the base class `_calculate_qparams` function has also been modified to provide support for dynamic quantization range.
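A hedged sketch of the standard affine qparam computation that `_calculate_qparams` generalizes to an arbitrary (qmin, qmax) pair (the real implementation also handles symmetric and per-channel schemes):
```
import torch

def calc_affine_qparams(min_val: float, max_val: float, qmin: int, qmax: int):
    # include zero so that it is exactly representable
    min_val, max_val = min(min_val, 0.0), max(max_val, 0.0)
    scale = (max_val - min_val) / float(qmax - qmin)
    scale = max(scale, torch.finfo(torch.float32).eps)
    zero_point = int(round(qmin - min_val / scale))
    zero_point = min(qmax, max(qmin, zero_point))
    return scale, zero_point

scale, zp = calc_affine_qparams(-1.0, 2.0, 0, 15)   # e.g. an unsigned 4-bit range
```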
Test Plan:
To ensure this modification is still backward compatible with past usages, numerics are verified by running the quantization unit test suite, which contains various observer tests. The following command executes the test suite, which also verifies the observer numerics:
`buck test //caffe2/test:quantization -- observer`
This modified observer script can be tested within the experiments for lower bit fake quantization. Please see the following diffs for reference.
- Single Fake Quantizer: D22337447
- Single Conv Layer: D22338532
Reviewed By: z-a-f
Differential Revision: D22427134
fbshipit-source-id: f405e633289322078b0f4a417f54b684adff2549
Summary:
1. During convert(), preserve the module's **pre and post forward** hooks.
2. During fusion, preserve only the module's **pre forward** hooks (because after fusion the output is no longer the same).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37233
Differential Revision: D22425141
Pulled By: jerryzh168
fbshipit-source-id: e69b81821d507dcd110d2ff3594ba94b9593c8da
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40475
As title
ghstack-source-id: 106474870
Test Plan: CI
Differential Revision: D22200640
fbshipit-source-id: 1f4c7bbf54be8c4187c9338fefdf14b501597d98
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40396
Removes activation and normalization modules from eager mode QAT.
These were incorrectly added, but we don't actually need them.
Test Plan:
```
python test/test_quantization.py TestQuantizationAwareTraining
```
Imported from OSS
Differential Revision: D22169768
fbshipit-source-id: b5bd753dafe92e90e226fb773eb18c6aae179703