Xia, Weiwen
6fa84fdea2
[FX][Quant] Enable FX quant for patterns like x.view(x.size(...), ...) ( #90001 )
...
**Summary**
This work continues with https://github.com/pytorch/pytorch/pull/83784 by @vkuzo and includes all the changes in that PR.
Quote from https://github.com/pytorch/pytorch/pull/83784 :
> Issue #83658 reports that ops followed by a certain pattern of `view` and `size` ops were not quantized correctly by FX graph mode quantization.
Before this PR, the "size" op was in the "op shares qparams with input" category, and the code assumed that the input of this op has the same dtype as its output. This led to incorrectly propagating the `int` dtype as the output of whichever op was preceding the `view` op, which in turn made that op blocklisted from quantization.
> The fix is to create a new category of ops which work on different dtypes of tensors but are not observed. This PR does so for `size`, and also for `shape` since it works the same way.
**Note**: This PR needs https://github.com/pytorch/pytorch/pull/91297 to be landed first otherwise there is a UT failure.
**Test plan**
```
python test/test_quantization.py -k test_linear_size_view
python test/test_quantization.py -k test_linear_shape_view
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90001
Approved by: https://github.com/jgong5 , https://github.com/jerryzh168
2023-01-27 07:56:29 +00:00
Xia, Weiwen
3a3e2002d8
[Quant] Add unified x86 quant backend ( #84329 )
...
## Description
Implement unified quantization backend 'X86' for x86 platforms. It combines the advantages of FBGEMM and ONEDNN. It selects kernels during weight prepacking and hide the details from end users. It will be the default backend in place of FBGEMM.
For details, please refer to this RFC: [[RFC] Unified quantization backend for x86 CPU platforms](https://github.com/pytorch/pytorch/issues/83888 )
## Validation
**Correctness**
Covered by UT
**Accuracy**
By running torchvision models on imagenet, no accuracy difference is found between FBGEMM and the unified X86 backend:
[torchvision_accuracy_comparison_fbgemm_vs_x86.xlsx](https://github.com/pytorch/pytorch/files/9598114/torchvision_accuracy_comparison_fbgemm_vs_x86.xlsx )
**Performance**
Depends on https://github.com/pytorch/pytorch/pull/84470 which improves performance.
For early PoC results, please refer to https://github.com/pytorch/pytorch/files/9399202/unified_qengine_poc_performance_bechmark.xlsx
With the two PRs combined, we collected some data on Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz
Method: Run multi-instances with 4 cores per instance on whole socket. Using JeMalloc and Intel OMP.
Models/throughput | fbgemm | x86 | improvement
-- | -- | -- | --
wide_resnet101_2 | 173.5675 | 241.815 | 39.32%
resnext101_32x8d | 174.365 | 339.8175 | 94.89%
resnet50 | 573.155 | 1174.14 | 104.86%
vgg19_bn | 260.335 | 337.92 | 29.80%
vgg19 | 257.935 | 333.265 | 29.21%
inception_v3 | 601.1175 | 1309.33 | 117.82%
densenet161 | 296.645 | 435.5625 | 46.83%
mnasnet1_0 | 1216.7 | 4057.515 | 233.49%
squeezenet1_0 | 1220.085 | 5153.3875 | 322.38%
alexnet | 2294.91 | 2624.6375 | 14.37%
fbnetc_100 | 976.2825 | 3110.1825 | 218.57%
shufflenet_v2_x0_5 | 1555.76 | 3026.125 | 94.51%
spnasnet_100 | 1059.065 | 3502.0975 | 230.68%
pytorch-unet | 192.76 | 246.77 | 28.02%
acgan | 257.32 | 333.7325 | 29.70%
cgan | 7790.6925 | 7803.1025 | 0.16%
sgan | 257.565 | 338.8875 | 31.57%
se_resnet50 | 492.3725 | 916.5175 | 86.14%
vggm | 300.2875 | 316.2075 | 5.30%
Environment:
- PyTorch version: 1.13.0a0+gitcdd625b
- Is debug build: False
- CUDA used to build PyTorch: None
- ROCM used to build PyTorch: N/A
- OS: Ubuntu 20.04.3 LTS (x86_64)
- GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
- Clang version: Could not collect
- CMake version: version 3.22.5
- Libc version: glibc-2.31
- Python version: 3.9.12 (main, Jun 1 2022, 11:38:51) [GCC 7.5.0] (64-bit runtime)
- Python platform: Linux-5.11.0-27-generic-x86_64-with-glibc2.31
- Is CUDA available: False
- CUDA runtime version: No CUDA
- GPU models and configuration: No CUDA
- Nvidia driver version: No CUDA
- cuDNN version: No CUDA
- HIP runtime version: N/A
- MIOpen runtime version: N/A
- Is XNNPACK available: True
Versions of relevant libraries:
- [pip3] intel-extension-for-pytorch==1.13.0+cpu
- [pip3] numpy==1.23.3
- [pip3] pytorch-widedeep==0.3.7
- [pip3] torch==1.13.0a0+git48b423b
- [pip3] torchvision==0.14.0a0+ebb68f3
- [conda] blas 1.0 mkl
- [conda] intel-extension-for-pytorch 1.13.0+cpu pypi_0 pypi
- [conda] mkl 2021.4.0 h06a4308_640
- [conda] mkl-include 2022.1.0 pypi_0 pypi
- [conda] mkl-service 2.4.0 py39h7f8727e_0
- [conda] mkl-static 2022.1.0 pypi_0 pypi
- [conda] mkl_fft 1.3.1 py39hd3c417c_0
- [conda] mkl_random 1.2.2 py39h51133e4_0
- [conda] numpy 1.23.3 pypi_0 pypi
- [conda] numpy-base 1.22.3 py39hf524024_0
- [conda] torch 1.13.0a0+git48b423b pypi_0 pypi
- [conda] torchvision 0.14.0a0+ebb68f3 pypi_0 pypi
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84329
Approved by: https://github.com/jerryzh168
2022-09-29 00:44:40 +00:00