Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41969
In this diff, the `_LearnableFakeQuantize` module is extended to support gradient scaling, where the gradients for both the scale and the zero point are multiplied by a constant `g` (which, in some cases, can help with quicker convergence). In addition, it is augmented with a factory method, `_with_args`, so that a partial constructor of the module can be built.
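The snippet below is a minimal sketch of the two ideas above: a `_with_args`-style factory returning a partial constructor, and a `grad_factor` constant standing in for `g`. The class and argument names here are illustrative placeholders, not the actual module internals.
```
import functools

# Illustrative sketch only: the class and argument names below are placeholders,
# not the actual _LearnableFakeQuantize internals.
class _LearnableFakeQuantizeSketch:
    def __init__(self, quant_min=0, quant_max=255, grad_factor=1.0):
        # grad_factor plays the role of the constant `g` that multiplies
        # the scale and zero point gradients in the backward pass.
        self.quant_min = quant_min
        self.quant_max = quant_max
        self.grad_factor = grad_factor

    @classmethod
    def with_args(cls, **kwargs):
        # Returns a partial constructor so the module can be instantiated
        # later (e.g. once per layer) with the remaining arguments.
        return functools.partial(cls, **kwargs)

# Build a partial constructor now, instantiate later.
fq_ctor = _LearnableFakeQuantizeSketch.with_args(quant_min=0, quant_max=15)
fq = fq_ctor(grad_factor=1e-3)
```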
Test Plan:
To verify the correctness of the fake quantizer operators, on a devvm, run the following command:
```
buck test //caffe2/torch:quantization -- learnable_py_module
```
Reviewed By: z-a-f
Differential Revision: D22715629
fbshipit-source-id: ff8e5764f81ca7264bf9333789f57e0b0cec7a72
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42034
In this diff, the scale and zero point gradient calculations are updated to correctly reflect the actual backpropagation equation (instead of `dScale * dX`, the near-final output should be `dScale * dY`; the same applies to the zero point).
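As an illustration of the corrected chain rule, the sketch below shows a simplified per-tensor backward in which the local derivatives with respect to scale and zero point are multiplied by the incoming gradient of the output (`grad_output`, i.e. `dY`) rather than by the input gradient. The function and variable names are assumptions for this sketch, not the actual kernels.
```
import torch

class _FakeQuantScaleGradSketch(torch.autograd.Function):
    # Simplified per-tensor sketch (not the actual kernel): the backward pass
    # multiplies the local derivatives by grad_output (dY), not by dX.
    @staticmethod
    def forward(ctx, x, scale, zero_point, quant_min, quant_max):
        ctx.save_for_backward(x, scale, zero_point)
        ctx.quant_min, ctx.quant_max = quant_min, quant_max
        q = torch.clamp(torch.round(x / scale + zero_point), quant_min, quant_max)
        return (q - zero_point) * scale

    @staticmethod
    def backward(ctx, grad_output):  # grad_output is dY
        x, scale, zero_point = ctx.saved_tensors
        v = x / scale + zero_point
        q = torch.round(v)
        inside = (q >= ctx.quant_min) & (q <= ctx.quant_max)
        q_clamped = torch.clamp(q, ctx.quant_min, ctx.quant_max)
        # Local derivative of the output w.r.t. scale (straight-through round).
        dscale_local = torch.where(inside, q - v, q_clamped - zero_point)
        # Local derivative of the output w.r.t. zero point.
        dzp_local = torch.where(inside, torch.zeros_like(x), -scale * torch.ones_like(x))
        grad_scale = (dscale_local * grad_output).sum().reshape(scale.shape)
        grad_zp = (dzp_local * grad_output).sum().reshape(zero_point.shape)
        grad_x = grad_output * inside.to(grad_output.dtype)
        return grad_x, grad_scale, grad_zp, None, None

# Usage of the sketch: gradients reach scale and zero_point through dY.
x = torch.randn(8, requires_grad=True)
scale = torch.tensor([0.1], requires_grad=True)
zero_point = torch.tensor([0.0], requires_grad=True)
y = _FakeQuantScaleGradSketch.apply(x, scale, zero_point, 0, 255)
y.sum().backward()
```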
Test Plan:
To run the unit tests for all affected learnable fake quantize modules and kernels, on a devvm, execute the following command:
`buck test //caffe2/test:quantization -- learnable`
To enable the `cuda` tests, execute the following command:
`buck test mode/dev-nosan //caffe2/test:quantization -- learnable`
Reviewed By: jerryzh168
Differential Revision: D22735668
fbshipit-source-id: 45c1e0fd38cbb2d8d5e60be4711e1e989e9743b4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42033
In this diff, the Python `_LearnableFakeQuantize` module is updated so that the gradient with respect to the input `x` is actually computed rather than passed through unchanged. Argument naming is also updated for clarity, and unit tests on the `PerTensor` and `PerChannel` operators are added to assert correctness.
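For reference, a minimal sketch of the new input-gradient behavior (the helper name is assumed for illustration): the incoming gradient is masked to the non-clamped region instead of being passed through unconditionally.
```
import torch

# Minimal sketch (assumed helper name): compute dL/dx from grad_output by
# zeroing the gradient wherever the quantized value falls outside
# [quant_min, quant_max], rather than returning grad_output unchanged.
def _input_grad_sketch(x, scale, zero_point, quant_min, quant_max, grad_output):
    q = torch.round(x / scale + zero_point)
    mask = (q >= quant_min) & (q <= quant_max)
    return grad_output * mask.to(grad_output.dtype)
```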
Test Plan:
On a devvm, execute the command:
`buck test //caffe2/test:quantization -- learnable_py_module`
To include `cuda` tests as well, run:
`buck test mode/dev-nosan //caffe2/test:quantization -- learnable_py_module`
Reviewed By: jerryzh168
Differential Revision: D22735580
fbshipit-source-id: 66bea7e9f8cb6422936e653500f917aa597c86de
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41815
**All are minor changes to enable better simulations.**
The constructors of `MinMaxObserver`, `MovingAverageMinMaxObserver`, `PerChannelMinMaxObserver`, and `MovingAveragePerChannelMinMaxObserver` are augmented so they can utilize the dynamic quantization range support in the `_ObserverBase` class.
In addition, minor adjustments are made to the `enable_static_observation` function, which allows the observer to update the quantization parameters without fake quantizing the output (useful for constructing a baseline).
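As a rough illustration of the dynamic quantization range support, the observers can be constructed with a reduced range along the following lines; the exact keyword names and import paths are assumptions here and may differ between versions.
```
import torch
from torch.quantization.observer import MinMaxObserver, MovingAverageMinMaxObserver

# Sketch only: quant_min/quant_max follow the _ObserverBase dynamic range
# support described above; exact argument names may vary by version.
observer = MinMaxObserver(dtype=torch.quint8, quant_min=0, quant_max=15)  # 4-bit range
ma_observer = MovingAverageMinMaxObserver(averaging_constant=0.01,
                                          dtype=torch.quint8,
                                          quant_min=0, quant_max=15)

x = torch.randn(4, 8)
observer(x)  # record min/max statistics
scale, zero_point = observer.calculate_qparams()
```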
Test Plan:
To ensure this modification remains backward compatible with existing usage, the quantization unit test suite, which contains various observer tests, is run to verify the numerics. The following command executes the suite:
```
buck test //caffe2/test:quantization -- observer
```
Reviewed By: z-a-f
Differential Revision: D22649128
fbshipit-source-id: 32393b706f9b69579dc2f644fb4859924d1f3773
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41535
A generalized fake quantization module is built to support lower-bit fake quantization with backpropagation on the scale and zero point. The module supports both per-tensor and per-channel fake quantization.
Test Plan:
Please see diff D22337313 for a related experiment performed on the fake quantizer module.
The `_LearnableFakeQuantize` module supports the following use cases:
- Per Tensor Fake Quantization or Per Channel Fake Quantization
- Static Estimation from Observers or Quantization Parameter Learning through Back Propagation
By default, the module assumes per-tensor affine fake quantization. To switch to per-channel fake quantization, declare `channel_size` with the appropriate length during initialization. To toggle between static estimation and parameter learning through backpropagation, invoke `enable_static_estimate` or `enable_param_learning`. For more information on the flags that support these operations, please see the docstring of the `_LearnableFakeQuantize` module.
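A hedged usage sketch of the flow described above follows; the import path and exact constructor arguments are assumptions and may not match the module's precise signature.
```
import torch
from torch.quantization import MovingAverageMinMaxObserver
from torch.quantization._learnable_fake_quantize import _LearnableFakeQuantize

# Sketch only: constructor arguments are illustrative and may differ from the
# exact signature of the module.
fq = _LearnableFakeQuantize(
    observer=MovingAverageMinMaxObserver,
    quant_min=0,
    quant_max=255,
    scale=1.0,
    zero_point=0.0,
)

x = torch.randn(16, 32)

fq.enable_static_estimate()   # scale/zero point come from observer statistics
y = fq(x)

fq.enable_param_learning()    # scale/zero point are learned via backpropagation
y = fq(x)
y.sum().backward()            # gradients flow to the learnable scale/zero point
```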
The `_LearnableFakeQuantize` module relies on two operators for its forward and backward paths: `_LearnableFakeQuantizePerTensorOp` and `_LearnableFakeQuantizePerChannelOp`. The backpropagation routine is developed based on the following literature:
- Learned Step Size Quantization: https://openreview.net/pdf?id=rkgO66VKDS
- Trained Quantization Thresholds: https://arxiv.org/pdf/1903.08066.pdf
Reviewed By: z-a-f
Differential Revision: D22573645
fbshipit-source-id: cfd9ece8a959ae31c00d9beb1acf9dfed71a7ea1