pytorch/torch
Xia, Weiwen 3a3e2002d8 [Quant] Add unified x86 quant backend (#84329)
## Description

Implement a unified quantization backend, 'X86', for x86 platforms. It combines the advantages of FBGEMM and ONEDNN. It selects kernels during weight prepacking and hides the details from end users. It will replace FBGEMM as the default backend.

For details, please refer to this RFC: [[RFC] Unified quantization backend for x86 CPU platforms](https://github.com/pytorch/pytorch/issues/83888)
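
As a rough illustration of how an end user might opt into the backend described above, here is a minimal sketch. It assumes the engine is exposed under the lowercase name `"x86"` through `torch.backends.quantized.engine` and that `get_default_qconfig` accepts the same name. The helper `choose_qengine` and its fallback order are hypothetical and not part of this PR.

```python
from typing import Iterable, Optional, Sequence


def choose_qengine(
    supported: Iterable[str],
    preferred: Sequence[str] = ("x86", "fbgemm"),
) -> Optional[str]:
    """Return the first engine in `preferred` that is in `supported`, else None.

    Hypothetical helper: the preference order is an assumption for illustration.
    """
    available = set(supported)
    for name in preferred:
        if name in available:
            return name
    return None


try:
    import torch
except ImportError:  # keep the sketch importable where PyTorch is not installed
    torch = None

if torch is not None:
    engine = choose_qengine(torch.backends.quantized.supported_engines)
    if engine is not None:
        from torch.ao.quantization import get_default_qconfig

        # Select the quantized kernel backend for this process.
        torch.backends.quantized.engine = engine
        # Default qconfig matching the selected backend (assumed to accept "x86").
        qconfig = get_default_qconfig(engine)
        print(f"quantized engine: {engine}")
```

On builds that do not list `"x86"` among the supported engines, the sketch falls back to `"fbgemm"`, and it leaves the engine untouched if neither is available.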

## Validation
**Correctness**
Covered by unit tests.

**Accuracy**
Running torchvision models on ImageNet showed no accuracy difference between FBGEMM and the unified X86 backend:
[torchvision_accuracy_comparison_fbgemm_vs_x86.xlsx](https://github.com/pytorch/pytorch/files/9598114/torchvision_accuracy_comparison_fbgemm_vs_x86.xlsx)

**Performance**
Depends on https://github.com/pytorch/pytorch/pull/84470 which improves performance.
For early PoC results, please refer to https://github.com/pytorch/pytorch/files/9399202/unified_qengine_poc_performance_bechmark.xlsx

With the two PRs combined, we collected throughput data on an Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz.
Method: run multiple instances with 4 cores per instance across the whole socket, using jemalloc and Intel OpenMP.
Model | fbgemm throughput | x86 throughput | improvement
-- | -- | -- | --
wide_resnet101_2 | 173.5675 | 241.815 | 39.32%
resnext101_32x8d | 174.365 | 339.8175 | 94.89%
resnet50 | 573.155 | 1174.14 | 104.86%
vgg19_bn | 260.335 | 337.92 | 29.80%
vgg19 | 257.935 | 333.265 | 29.21%
inception_v3 | 601.1175 | 1309.33 | 117.82%
densenet161 | 296.645 | 435.5625 | 46.83%
mnasnet1_0 | 1216.7 | 4057.515 | 233.49%
squeezenet1_0 | 1220.085 | 5153.3875 | 322.38%
alexnet | 2294.91 | 2624.6375 | 14.37%
fbnetc_100 | 976.2825 | 3110.1825 | 218.57%
shufflenet_v2_x0_5 | 1555.76 | 3026.125 | 94.51%
spnasnet_100 | 1059.065 | 3502.0975 | 230.68%
pytorch-unet | 192.76 | 246.77 | 28.02%
acgan | 257.32 | 333.7325 | 29.70%
cgan | 7790.6925 | 7803.1025 | 0.16%
sgan | 257.565 | 338.8875 | 31.57%
se_resnet50 | 492.3725 | 916.5175 | 86.14%
vggm | 300.2875 | 316.2075 | 5.30%
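
The improvement column is consistent with the relative throughput gain of x86 over fbgemm, that is, `(x86 / fbgemm - 1) * 100`. The sketch below recomputes it for three rows of the table. The function name `improvement_percent` is illustrative only, and the formula is inferred from the numbers rather than stated in the PR.

```python
def improvement_percent(baseline: float, candidate: float) -> float:
    """Relative throughput gain of `candidate` over `baseline`, in percent."""
    return (candidate / baseline - 1.0) * 100.0


# (fbgemm throughput, x86 throughput) copied from the table above.
rows = {
    "resnet50": (573.155, 1174.14),
    "squeezenet1_0": (1220.085, 5153.3875),
    "cgan": (7790.6925, 7803.1025),
}

for model, (fbgemm, x86) in rows.items():
    print(f"{model}: {improvement_percent(fbgemm, x86):.2f}%")
```

Rounded to two decimals, the results match the table entries for these rows (104.86%, 322.38%, and 0.16%).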

Environment:
- PyTorch version: 1.13.0a0+gitcdd625b
- Is debug build: False
- CUDA used to build PyTorch: None
- ROCM used to build PyTorch: N/A
- OS: Ubuntu 20.04.3 LTS (x86_64)
- GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
- Clang version: Could not collect
- CMake version: version 3.22.5
- Libc version: glibc-2.31
- Python version: 3.9.12 (main, Jun  1 2022, 11:38:51)  [GCC 7.5.0] (64-bit runtime)
- Python platform: Linux-5.11.0-27-generic-x86_64-with-glibc2.31
- Is CUDA available: False
- CUDA runtime version: No CUDA
- GPU models and configuration: No CUDA
- Nvidia driver version: No CUDA
- cuDNN version: No CUDA
- HIP runtime version: N/A
- MIOpen runtime version: N/A
- Is XNNPACK available: True

Versions of relevant libraries:
- [pip3] intel-extension-for-pytorch==1.13.0+cpu
- [pip3] numpy==1.23.3
- [pip3] pytorch-widedeep==0.3.7
- [pip3] torch==1.13.0a0+git48b423b
- [pip3] torchvision==0.14.0a0+ebb68f3
- [conda] blas                      1.0                         mkl
- [conda] intel-extension-for-pytorch 1.13.0+cpu               pypi_0    pypi
- [conda] mkl                       2021.4.0           h06a4308_640
- [conda] mkl-include               2022.1.0                 pypi_0    pypi
- [conda] mkl-service               2.4.0            py39h7f8727e_0
- [conda] mkl-static                2022.1.0                 pypi_0    pypi
- [conda] mkl_fft                   1.3.1            py39hd3c417c_0
- [conda] mkl_random                1.2.2            py39h51133e4_0
- [conda] numpy                     1.23.3                   pypi_0    pypi
- [conda] numpy-base                1.22.3           py39hf524024_0
- [conda] torch                     1.13.0a0+git48b423b          pypi_0    pypi
- [conda] torchvision               0.14.0a0+ebb68f3          pypi_0    pypi

Pull Request resolved: https://github.com/pytorch/pytorch/pull/84329
Approved by: https://github.com/jerryzh168
2022-09-29 00:44:40 +00:00
_C Add mechanism to disable the "saved tensors hooks" feature (#85553) 2022-09-28 22:49:28 +00:00
_C_flatbuffer
_decomp Turn on aliasing tests for fake backwards, Fix Batch norm running mean/var decomp aliasing (#85471) 2022-09-28 23:06:59 +00:00
_dispatch New calling convention for Python dispatcher (#85133) 2022-09-16 20:38:21 +00:00
_lazy Add step closures (#84300) 2022-09-06 20:55:34 +00:00
_prims [Modes] remove enable and rewrite mode stack (squashed) (#84774) 2022-09-27 01:04:35 +00:00
_prims_common Make Python reference for permute accept varargs (#85460) 2022-09-28 03:50:42 +00:00
_refs [primTorch] Add ref for huber_loss and error inputs (#85041) 2022-09-28 19:56:17 +00:00
_subclasses Turn on aliasing tests for fake backwards, Fix Batch norm running mean/var decomp aliasing (#85471) 2022-09-28 23:06:59 +00:00
amp
ao [Quant] Add unified x86 quant backend (#84329) 2022-09-29 00:44:40 +00:00
autograd Add mechanism to disable the "saved tensors hooks" feature (#85553) 2022-09-28 22:49:28 +00:00
backends [Quant] Add unified x86 quant backend (#84329) 2022-09-29 00:44:40 +00:00
contrib
cpu
csrc Add mechanism to disable the "saved tensors hooks" feature (#85553) 2022-09-28 22:49:28 +00:00
cuda removed compile cache and static argnums (#85783) 2022-09-28 08:33:59 +00:00
distributed [FSDP] Add FSDPExtensions for TP support (#85039) 2022-09-28 18:34:17 +00:00
distributions Add __all__ to torch.{fx, distributed, backends} submodules (#85079) 2022-09-20 12:51:08 +00:00
fft
futures
fx Augment errors raised in fx.Interpreter with Node info (#85810) 2022-09-28 16:42:41 +00:00
jit [JIT] support freezing modules that don't have a forward method (#85779) 2022-09-28 17:05:01 +00:00
legacy
lib
linalg
masked [maskedtensor] port torch/_masked into torch/masked (#85515) 2022-09-26 23:41:13 +00:00
monitor
multiprocessing
nested Add python nested_tensor and as_nested_tensor constructors in torch.nested (#85593) 2022-09-28 20:15:02 +00:00
nn [quant][ao_migration] nn.intrinsic migration to ao (#84842) 2022-09-28 23:54:29 +00:00
onnx [ONNX] Deprecate setter functions for global variables (#85165) 2022-09-28 22:43:43 +00:00
optim [Profiler] tracking Optimizer (part 2 of Record Optimizer) (#84920) 2022-09-28 02:48:07 +00:00
package fix typo in torch/package/_mock.py (#84508) 2022-09-05 16:48:34 +00:00
profiler add itt unit test and docstrings (#84848) 2022-09-28 01:39:58 +00:00
quantization
sparse
special Adding multigammaln ref and fix arange (#85153) 2022-09-20 17:52:56 +00:00
testing [Quant] Add unified x86 quant backend (#84329) 2022-09-29 00:44:40 +00:00
utils [DataLoader] Replacing traverse function with traverse_datapipes (#85667) 2022-09-27 19:58:15 +00:00
__config__.py
__future__.py
__init__.py [maskedtensor] port torch/_masked into torch/masked (#85515) 2022-09-26 23:41:13 +00:00
_appdirs.py
_classes.py
_deploy.py
_jit_internal.py
_linalg_utils.py Remove deprecated torch.lstsq (#70980) 2022-09-23 00:16:55 +00:00
_lobpcg.py
_lowrank.py
_meta_registrations.py Registered _like metas (#85793) 2022-09-28 17:23:07 +00:00
_namedtensor_internals.py
_ops.py [Modes] remove enable and rewrite mode stack (squashed) (#84774) 2022-09-27 01:04:35 +00:00
_python_dispatcher.py [PolishComment] Polish code comment, revelant->relevant (#85238) 2022-09-19 19:43:14 +00:00
_six.py
_sources.py
_storage_docs.py
_tensor_docs.py [doc] Add pin_memory and layout to new_{zeros, ones, full} (#85605) 2022-09-25 22:23:23 +00:00
_tensor_str.py Fix printing regular tensors inside functorch transforms (#85556) 2022-09-26 15:35:47 +00:00
_tensor.py Remove deprecated torch.lstsq (#70980) 2022-09-23 00:16:55 +00:00
_torch_docs.py Revert "Update amax/amin/norm/count_nonzero signatures with int[*]? dim (#83300)" 2022-09-28 17:04:53 +00:00
_utils_internal.py
_utils.py
_VF.py
_vmap_internals.py
abi-check.cpp
CMakeLists.txt [CMake] Add functorch target (#83464) 2022-09-14 00:05:33 +00:00
custom_class_detail.h
custom_class.h
deploy.h
extension.h
functional.py Add path optimize kwarg to einsum (#84890) 2022-09-24 03:47:36 +00:00
hub.py
library.h
library.py Disable torch.library.Library with PYTORCH_DISABLE_LIBRARY (#85190) 2022-09-17 03:05:43 +00:00
overrides.py Add python nested_tensor and as_nested_tensor constructors in torch.nested (#85593) 2022-09-28 20:15:02 +00:00
py.typed
quasirandom.py
random.py
README.txt
return_types.py Add __all__ to torch.utils submodules (#85331) 2022-09-27 14:45:26 +00:00
script.h
serialization.py
storage.py
torch_version.py
types.py New calling convention for Python dispatcher (#85133) 2022-09-16 20:38:21 +00:00

Note [TH abstraction violation]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

TH/THC provide some hpp headers, which are proper C++ headers rather than
C headers.  These headers serve double duty as *internal implementation
detail* headers, whose contents should largely not be used by external
clients.

Ideally, we would not install these headers at all; instead, you should
use public functions (in headers like `THTensor.h`, NOT `THTensor.hpp`)
to manipulate these structs.  However, there are a few places
in torch/csrc where we violate this abstraction.  They are marked with
a pointer to this note.  Each of those sites will have to be refactored
when we refactor the guts of THTensor and related structures.