**Summary**

Enable fp8 qlinear on CPU. This is part of the plan to enable fp8 static quantization on CPU. This PR only adds FP8 support to the existing int8 qlinear op; it does not add a new op, and it does not affect the frontend or the quantization flow. The schema of the qlinear op is unchanged. The FP8 qlinear therefore shares the same op as the INT8 qlinear; the difference is that the src/wei dtype is fp8 instead of int8. The output dtype can be fp8, float32, or bfloat16. The implementation uses the oneDNN library.

The differences between qlinear and `_scaled_mm` are:
- Qlinear supports post-op fusion while `_scaled_mm` does not
- Weights are prepacked for qlinear

**Test plan**
```
pytest test/quantization/core/test_quantized_op.py -k "qlinear and fp8"
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/155678
Approved by: https://github.com/leslie-fang-intel, https://github.com/jerryzh168
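Below is a minimal reference sketch of the semantics the fused fp8 qlinear computes (dequantize fp8 src/weight with per-tensor scales, run a linear, optionally apply a fused post op, cast to the requested output dtype). It is not the oneDNN kernel and does not reproduce the real op's schema; the function name `qlinear_fp8_reference`, its arguments, and the per-tensor scale layout are illustrative assumptions only.

```python
from typing import Optional

import torch


def qlinear_fp8_reference(
    src_fp8: torch.Tensor,        # activation, dtype torch.float8_e4m3fn
    src_scale: float,
    weight_fp8: torch.Tensor,     # weight, dtype torch.float8_e4m3fn
    weight_scale: float,
    bias: Optional[torch.Tensor],
    out_dtype: torch.dtype = torch.bfloat16,  # fp8 / float32 / bfloat16
) -> torch.Tensor:
    # Dequantize: like the int8 path, fp8 values carry a scale.
    src = src_fp8.to(torch.float32) * src_scale
    weight = weight_fp8.to(torch.float32) * weight_scale
    out = torch.nn.functional.linear(src, weight, bias)
    # A fused post op (e.g. ReLU) would be applied here before the final cast.
    return out.to(out_dtype)


# Usage: quantize random fp32 inputs to fp8 and run the reference computation.
x = (torch.randn(4, 16) / 4).to(torch.float8_e4m3fn)
w = (torch.randn(8, 16) / 4).to(torch.float8_e4m3fn)
y = qlinear_fp8_reference(x, 1.0, w, 1.0, None, torch.bfloat16)
```

In the actual op, the weight would already be prepacked into oneDNN's blocked layout rather than passed as a plain fp8 tensor, and the post op is fused into the matmul instead of applied afterwards.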