Summary:
This is a reland of D74910193.
We change the dtype to torch.float8_e5m2 in unit test since it is not supported.
Test Plan:
```
buck2 test 'fbcode//mode/dev-nosan' fbcode//caffe2/test/inductor:quantization
```
Differential Revision: D75169792
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154057
Approved by: https://github.com/Mingming-Ding
Summary:
Previously, we only support non-scaling quantization, which may lead to overflow, here we support scaling quantization, and set it as the default version.
Here, we quantize activation nodes based on the size_in_mb, the default value is 100, i.e., as long as the node has at least 100MB size, we will quantize it.
Test Plan:
### how to enable
```
torch._inductor.config.post_grad_fusion_options = {
"activation_quantization_aten_pass": {
"quant_type": "torch.float8_e5m2", -> default is this type to quantize, you can change the type
"use_scaling": False, -> default is False, if you want to use scaling verison, set it to True
"size_in_mb": 0.0, -> default is 100, you can tune the value.
"exclude_primals": False, -> whether want to exclude quantize parameters, default is False
"allowed_dtypes": "torch.float16;torch.bfloat16;torch.float32", -> dtype you consider to quant, use ";" to separate, default is torch.bfloat16
},
}
```
### toy model
```
buck2 run mode/opt //scripts/qyz/autoac:quantization
```
```
Epoch [80/200], Loss: 19227.2109
Epoch [100/200], Loss: 1353.5272
Epoch [120/200], Loss: 38630.6758
Epoch [140/200], Loss: 6239.9155
Epoch [160/200], Loss: 6039.1567
Epoch [180/200], Loss: 3994.3569
Epoch [200/200], Loss: 146.3966
```
Differential Revision: D73015996
Pull Request resolved: https://github.com/pytorch/pytorch/pull/151770
Approved by: https://github.com/Mingming-Ding
Summary: We enable the activation quantization in the forward pass, and users can customize the dtype they want to quantize.
Test Plan:
# unit test
```
buck2 test 'fbcode//mode/dev-nosan' fbcode//caffe2/test/inductor:quantization -- test_activation_quantization_aten
```
Buck UI: https://www.internalfb.com/buck2/776d3911-bb86-4ac8-a527-540cf1510b9d
Test UI: https://www.internalfb.com/intern/testinfra/testrun/4785074873051017
Network: Up: 4.3MiB Down: 42MiB (reSessionID-fef7e727-68b1-4645-a519-5652854df38d)
Executing actions. Remaining 0/4 6.7s exec time total
Command: test. Finished 2 local
Time elapsed: 3:11.5s
Tests finished: Pass 2. Fail 0. Fatal 0. Skip 0. Build failure 0
# E2E
### how to enable (you can overrite the dtype, if nothing given, the default is fp8)
```
post_grad_fusion_options={
"activation_quantization_aten_pass": {"quant_type": "torch.float8_e5m2"}
},
```
Differential Revision: D70522237
Pull Request resolved: https://github.com/pytorch/pytorch/pull/148380
Approved by: https://github.com/Mingming-Ding, https://github.com/Hahu803