pytorch/test/quantization
Xia, Weiwen c2185dc4a5 [Quant][CPU] Enable fp8 qlinear (#155678)
**Summary**
Enable fp8 qlinear on CPU as part of the plan to enable fp8 static quantization on CPU. This PR only adds FP8 support to the existing int8 qlinear op; it does not add a new op, nor does it affect the frontend or the quantization flow. The schema of the qlinear op is unchanged.

The FP8 qlinear therefore shares the same op as the INT8 qlinear; the only difference is that the src/wei dtype is fp8 instead of int8. The output dtype can be fp8, float32, or bfloat16. The implementation uses the oneDNN library.
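As a rough illustration of the dtype flow described above, here is a minimal pure-PyTorch reference (not the oneDNN kernel and not the actual qlinear op); the per-tensor scale computation and the helper name are assumptions for illustration only.

```python
import torch
import torch.nn.functional as F

# Illustrative per-tensor fp8 (e4m3fn) quantization: the scale maps the observed
# amax onto the fp8 dynamic range (torch.finfo(torch.float8_e4m3fn).max == 448).
def quantize_fp8(t: torch.Tensor):
    scale = t.abs().amax().clamp(min=1e-12) / torch.finfo(torch.float8_e4m3fn).max
    return (t / scale).to(torch.float8_e4m3fn), scale

x = torch.randn(4, 64)      # activation (src)
w = torch.randn(128, 64)    # weight (wei)
x_fp8, x_scale = quantize_fp8(x)
w_fp8, w_scale = quantize_fp8(w)

# Reference computation: dequantize to fp32, run the linear, then cast to the
# requested output dtype (fp8, float32, or bfloat16 as described above).
y = F.linear(x_fp8.to(torch.float32) * x_scale,
             w_fp8.to(torch.float32) * w_scale)
y_bf16 = y.to(torch.bfloat16)
```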

Qlinear differs from `_scaled_mm` in two ways (see the sketch after this list):
- Qlinear supports post op fusion while `_scaled_mm` does not
- Weights are prepacked for qlinear
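
To make the first bullet above concrete, the sketch below emulates post op fusion at the reference level: the activation is applied as part of the same call, before the output-dtype cast, which is how the fused oneDNN primitive behaves. The function name and the choice of ReLU as the post op are illustrative assumptions; a non-fused path (as with a bare scaled matmul) would need a separate pass over the output for the activation.

```python
import torch
import torch.nn.functional as F

def qlinear_fp8_fused_relu(x_fp8, x_scale, w_fp8, w_scale, bias=None,
                           out_dtype=torch.bfloat16):
    """Reference fp8 qlinear with a ReLU post op fused before the output cast.

    Illustrative only: the real op runs a single oneDNN primitive on prepacked
    weights, while this emulation dequantizes and computes in fp32.
    """
    y = F.linear(x_fp8.to(torch.float32) * x_scale,
                 w_fp8.to(torch.float32) * w_scale, bias)
    return F.relu(y).to(out_dtype)
```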

**Test plan**
```
pytest test/quantization/core/test_quantized_op.py -k "qlinear and fp8"
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/155678
Approved by: https://github.com/leslie-fang-intel, https://github.com/jerryzh168
2025-06-25 10:01:08 +00:00
| Directory / file | Last commit | Date |
| --- | --- | --- |
| ao_migration | Add `__main__` guards to quantization tests (#154728) | 2025-06-10 19:46:07 +00:00 |
| bc | Add `__main__` guards to quantization tests (#154728) | 2025-06-10 19:46:07 +00:00 |
| core | [Quant][CPU] Enable fp8 qlinear (#155678) | 2025-06-25 10:01:08 +00:00 |
| eager | Add `__main__` guards to quantization tests (#154728) | 2025-06-10 19:46:07 +00:00 |
| fx | Add `__main__` guards to quantization tests (#154728) | 2025-06-10 19:46:07 +00:00 |
| jit | Add `__main__` guards to quantization tests (#154728) | 2025-06-10 19:46:07 +00:00 |
| pt2e | Typo fixes for "overridden" in comments and function names (#155944) | 2025-06-14 03:37:38 +00:00 |
| serialized | | |
| `__init__.py` | | |