mirror of
https://github.com/zebrajr/pytorch.git
synced 2025-12-07 12:21:27 +01:00
# Motivation
This PR enables the quantized fusions `qconv+add` and `qconv+add+relu` on the Intel GPU backend.
At the backend level, we register the op via the schema `TORCH_SELECTIVE_NAME("onednn::qconv2d_pointwise.binary")`, which is the one already defined in `x86InductorQuantizer`.
At the Inductor level, we make a small modification to `torch/_inductor/fx_passes/quantization.py` to allow the signed int8 data type (s8) during op lowering. For pattern matching, we largely reuse the existing x86InductorQuantizer code.
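To make the fused semantics concrete, here is a minimal numerical sketch in plain Python of what a `qconv+add(+relu)` fusion with an s8 output computes for a single element. It is an illustration under stated assumptions, not the oneDNN kernel or the Inductor lowering: the function names (`quantize_s8`, `dequantize`, `fused_conv_add_relu`) are hypothetical, the convolution result is taken as an already-computed float, and per-tensor affine quantization is assumed.

```python
def quantize_s8(x: float, scale: float, zero_point: int) -> int:
    """Affine-quantize a float to signed int8, clamping to [-128, 127].

    Note: Python's round() rounds halves to even; real kernels may differ.
    """
    q = round(x / scale) + zero_point
    return max(-128, min(127, q))


def dequantize(q: int, scale: float, zero_point: int) -> float:
    """Map a quantized integer back to a float value."""
    return (q - zero_point) * scale


def fused_conv_add_relu(
    conv_out_fp32: float,  # assumed: conv accumulator already converted to float
    other_q: int,          # the quantized tensor element being added
    other_scale: float,
    other_zp: int,
    out_scale: float,
    out_zp: int,
    relu: bool = True,
) -> int:
    """Hypothetical single-element model of qconv + add (+ relu) with s8 output."""
    acc = conv_out_fp32 + dequantize(other_q, other_scale, other_zp)
    if relu:
        acc = max(acc, 0.0)
    return quantize_s8(acc, out_scale, out_zp)
```

For example, with `conv_out_fp32=1.0`, an addend of `4` at scale `0.25`, and an output scale of `0.5`, the fused result is `round((1.0 + 1.0) / 0.5) = 4`; a negative sum is clipped to the zero point when `relu=True`.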
# UT verification
```bash
python test/inductor/test_mkldnn_pattern_matcher.py -v \
-k test_qconv2d_add_xpu \
-k test_qconv2d_add_relu_xpu 2>&1
```
# Runtime exemplification
The following oneDNN verbose output was collected from the UT:
```text
onednn_verbose,primitive,exec,gpu:0,convolution,jit:ir,forward_training,src_s8::blocked:acdb::f0 wei_s8::blocked:abcd::f0 bia_f32::blocked:a::f0 dst_s8::blocked:acdb::f0,attr-scratchpad:user attr-scales:src0:0:f32+dst:0:f32+wei:1:f32 attr-zero-points:src0:0:s32+dst:0:s32 attr-post-ops:eltwise_linear:1:0.337704+sum:0.0241217+eltwise_relu,alg:convolution_direct,mb1_ic3oc6_ih8oh6kh3sh1dh0ph0_iw8ow6kw3sw1dw0pw0,0.151123
```
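One way to check that the fusion actually happened is to read the `attr-post-ops` entry of the verbose line. The sketch below is a small, hypothetical helper (`parse_post_ops` is not part of oneDNN or PyTorch) that extracts the post-op names from the line above. It assumes the comma-separated field layout seen in this log and that post-op arguments contain no `+` characters.

```python
from typing import List

# The verbose line collected from the UT, copied verbatim.
VERBOSE_LINE = (
    "onednn_verbose,primitive,exec,gpu:0,convolution,jit:ir,forward_training,"
    "src_s8::blocked:acdb::f0 wei_s8::blocked:abcd::f0 bia_f32::blocked:a::f0 "
    "dst_s8::blocked:acdb::f0,"
    "attr-scratchpad:user attr-scales:src0:0:f32+dst:0:f32+wei:1:f32 "
    "attr-zero-points:src0:0:s32+dst:0:s32 "
    "attr-post-ops:eltwise_linear:1:0.337704+sum:0.0241217+eltwise_relu,"
    "alg:convolution_direct,"
    "mb1_ic3oc6_ih8oh6kh3sh1dh0ph0_iw8ow6kw3sw1dw0pw0,0.151123"
)


def parse_post_ops(verbose_line: str) -> List[str]:
    """Return the post-op names found in the attr-post-ops entry, in order."""
    prefix = "attr-post-ops:"
    for field in verbose_line.split(","):
        # Attributes within one field are separated by spaces.
        for attr in field.split(" "):
            if attr.startswith(prefix):
                ops = attr[len(prefix):].split("+")
                # Keep only the op name, dropping its ':'-separated arguments.
                return [op.split(":")[0] for op in ops]
    return []


if __name__ == "__main__":
    print(parse_post_ops(VERBOSE_LINE))
```

For the line above, the helper returns `eltwise_linear`, `sum`, and `eltwise_relu`, which corresponds to the `qconv+add+relu` pattern executing as a single convolution primitive with s8 source, weight, and destination tensors.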
Pull Request resolved: https://github.com/pytorch/pytorch/pull/135189
Approved by: https://github.com/liangan1, https://github.com/EikanWang, https://github.com/guangyey, https://github.com/jerryzh168
ghstack dependencies: #133307
Co-authored-by: guangyey <guangye.yu@intel.com>
| Name |
|---|
| `__init__.py` |
| `composable_quantizer.py` |
| `embedding_quantizer.py` |
| `quantizer.py` |
| `utils.py` |
| `x86_inductor_quantizer.py` |
| `xnnpack_quantizer_utils.py` |
| `xnnpack_quantizer.py` |
| `xpu_inductor_quantizer.py` |