pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-06 12:20:52 +01:00

Xia, Weiwen 3a3e2002d8 [Quant] Add unified x86 quant backend (#84329)

## Description

Implement unified quantization backend 'X86' for x86 platforms. It combines the advantages of FBGEMM and ONEDNN. It selects kernels during weight prepacking and hides the details from end users. It will be the default backend in place of FBGEMM.

For details, please refer to this RFC: [[RFC] Unified quantization backend for x86 CPU platforms](https://github.com/pytorch/pytorch/issues/83888)

## Validation

Correctness

Covered by UT

Accuracy

By running torchvision models on imagenet, no accuracy difference is found between FBGEMM and the unified X86 backend: [torchvision_accuracy_comparison_fbgemm_vs_x86.xlsx](https://github.com/pytorch/pytorch/files/9598114/torchvision_accuracy_comparison_fbgemm_vs_x86.xlsx)

Performance

Depends on https://github.com/pytorch/pytorch/pull/84470 which improves performance. For early PoC results, please refer to https://github.com/pytorch/pytorch/files/9399202/unified_qengine_poc_performance_bechmark.xlsx

With the two PRs combined, we collected some data on Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz.

Method: Run multi-instances with 4 cores per instance on whole socket. Using JeMalloc and Intel OMP.

| Models/throughput | fbgemm | x86 | improvement |
| -- | -- | -- | -- |
| wide_resnet101_2 | 173.5675 | 241.815 | 39.32% |
| resnext101_32x8d | 174.365 | 339.8175 | 94.89% |
| resnet50 | 573.155 | 1174.14 | 104.86% |
| vgg19_bn | 260.335 | 337.92 | 29.80% |
| vgg19 | 257.935 | 333.265 | 29.21% |
| inception_v3 | 601.1175 | 1309.33 | 117.82% |
| densenet161 | 296.645 | 435.5625 | 46.83% |
| mnasnet1_0 | 1216.7 | 4057.515 | 233.49% |
| squeezenet1_0 | 1220.085 | 5153.3875 | 322.38% |
| alexnet | 2294.91 | 2624.6375 | 14.37% |
| fbnetc_100 | 976.2825 | 3110.1825 | 218.57% |
| shufflenet_v2_x0_5 | 1555.76 | 3026.125 | 94.51% |
| spnasnet_100 | 1059.065 | 3502.0975 | 230.68% |
| pytorch-unet | 192.76 | 246.77 | 28.02% |
| acgan | 257.32 | 333.7325 | 29.70% |
| cgan | 7790.6925 | 7803.1025 | 0.16% |
| sgan | 257.565 | 338.8875 | 31.57% |
| se_resnet50 | 492.3725 | 916.5175 | 86.14% |
| vggm | 300.2875 | 316.2075 | 5.30% |

Environment:

- PyTorch version: 1.13.0a0+gitcdd625b
- Is debug build: False
- CUDA used to build PyTorch: None
- ROCM used to build PyTorch: N/A
- OS: Ubuntu 20.04.3 LTS (x86_64)
- GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
- Clang version: Could not collect
- CMake version: version 3.22.5
- Libc version: glibc-2.31
- Python version: 3.9.12 (main, Jun 1 2022, 11:38:51) [GCC 7.5.0] (64-bit runtime)
- Python platform: Linux-5.11.0-27-generic-x86_64-with-glibc2.31
- Is CUDA available: False
- CUDA runtime version: No CUDA
- GPU models and configuration: No CUDA
- Nvidia driver version: No CUDA
- cuDNN version: No CUDA
- HIP runtime version: N/A
- MIOpen runtime version: N/A
- Is XNNPACK available: True

Versions of relevant libraries:

- [pip3] intel-extension-for-pytorch==1.13.0+cpu
- [pip3] numpy==1.23.3
- [pip3] pytorch-widedeep==0.3.7
- [pip3] torch==1.13.0a0+git48b423b
- [pip3] torchvision==0.14.0a0+ebb68f3
- [conda] blas 1.0 mkl
- [conda] intel-extension-for-pytorch 1.13.0+cpu pypi_0 pypi
- [conda] mkl 2021.4.0 h06a4308_640
- [conda] mkl-include 2022.1.0 pypi_0 pypi
- [conda] mkl-service 2.4.0 py39h7f8727e_0
- [conda] mkl-static 2022.1.0 pypi_0 pypi
- [conda] mkl_fft 1.3.1 py39hd3c417c_0
- [conda] mkl_random 1.2.2 py39h51133e4_0
- [conda] numpy 1.23.3 pypi_0 pypi
- [conda] numpy-base 1.22.3 py39hf524024_0
- [conda] torch 1.13.0a0+git48b423b pypi_0 pypi
- [conda] torchvision 0.14.0a0+ebb68f3 pypi_0 pypi

Pull Request resolved: https://github.com/pytorch/pytorch/pull/84329
Approved by: https://github.com/jerryzh168

2022-09-29 00:44:40 +00:00
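
The commit message does not state how the improvement column in the throughput comparison is derived. The figures appear consistent with a relative throughput gain, `(x86 - fbgemm) / fbgemm`, expressed as a percentage. The sketch below checks that reading against a few rows copied from the commit message. The function name `improvement_pct` and the choice of rows are illustrative assumptions, not part of the PR.

```python
# Minimal sketch: recompute the "improvement" column from the throughput
# figures quoted in the commit message. The formula is an assumption
# inferred from the numbers; the commit does not spell it out.

def improvement_pct(fbgemm: float, x86: float) -> float:
    """Relative throughput gain of x86 over fbgemm, in percent."""
    return (x86 - fbgemm) / fbgemm * 100.0

# (model, fbgemm throughput, x86 throughput) copied from the commit message.
ROWS = [
    ("wide_resnet101_2", 173.5675, 241.815),
    ("resnet50", 573.155, 1174.14),
    ("cgan", 7790.6925, 7803.1025),
]

if __name__ == "__main__":
    for model, fbgemm, x86 in ROWS:
        print(f"{model}: {improvement_pct(fbgemm, x86):.2f}%")
```

Under this reading, a value above 100% means the x86 backend more than doubled throughput relative to FBGEMM on that model.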
..
_C	Add mechanism to disable the "saved tensors hooks" feature (#85553 )	2022-09-28 22:49:28 +00:00
_C_flatbuffer
_decomp	Turn on aliasing tests for fake backwards, Fix Batch norm running mean/var decomp aliasing (#85471 )	2022-09-28 23:06:59 +00:00
_dispatch	New calling convention for Python dispatcher (#85133 )	2022-09-16 20:38:21 +00:00
_lazy	Add step closures (#84300 )	2022-09-06 20:55:34 +00:00
_prims	[Modes] remove enable and rewrite mode stack (squashed) (#84774 )	2022-09-27 01:04:35 +00:00
_prims_common	Make Python reference for permute accept varargs (#85460 )	2022-09-28 03:50:42 +00:00
_refs	[primTorch] Add ref for `huber_loss` and error inputs (#85041 )	2022-09-28 19:56:17 +00:00
_subclasses	Turn on aliasing tests for fake backwards, Fix Batch norm running mean/var decomp aliasing (#85471 )	2022-09-28 23:06:59 +00:00
amp
ao	[Quant] Add unified x86 quant backend (#84329 )	2022-09-29 00:44:40 +00:00
autograd	Add mechanism to disable the "saved tensors hooks" feature (#85553 )	2022-09-28 22:49:28 +00:00
backends	[Quant] Add unified x86 quant backend (#84329 )	2022-09-29 00:44:40 +00:00
contrib
cpu
csrc	Add mechanism to disable the "saved tensors hooks" feature (#85553 )	2022-09-28 22:49:28 +00:00
cuda	removed compile cache and static argnums (#85783 )	2022-09-28 08:33:59 +00:00
distributed	[FSDP] Add `FSDPExtensions` for TP support (#85039 )	2022-09-28 18:34:17 +00:00
distributions	Add __all__ to torch.{fx, distributed, backends} submodules (#85079 )	2022-09-20 12:51:08 +00:00
fft
futures
fx	Augment errors raised in fx.Interpreter with Node info (#85810 )	2022-09-28 16:42:41 +00:00
jit	[JIT] support freezing modules that don't have a forward method (#85779 )	2022-09-28 17:05:01 +00:00
legacy
lib
linalg
masked	[maskedtensor] port torch/_masked into torch/masked (#85515 )	2022-09-26 23:41:13 +00:00
monitor
multiprocessing
nested	Add python `nested_tensor` and `as_nested_tensor` constructors in `torch.nested` (#85593 )	2022-09-28 20:15:02 +00:00
nn	[quant][ao_migration] nn.intrinsic migration to ao (#84842 )	2022-09-28 23:54:29 +00:00
onnx	[ONNX] Deprecate setter functions for global variables (#85165 )	2022-09-28 22:43:43 +00:00
optim	[Profiler] tracking Optimizer (part 2 of Record Optimizer) (#84920 )	2022-09-28 02:48:07 +00:00
package	fix typo in torch/package/_mock.py (#84508 )	2022-09-05 16:48:34 +00:00
profiler	add itt unit test and docstrings (#84848 )	2022-09-28 01:39:58 +00:00
quantization
sparse
special	Adding multigammaln ref and fix arange (#85153 )	2022-09-20 17:52:56 +00:00
testing	[Quant] Add unified x86 quant backend (#84329 )	2022-09-29 00:44:40 +00:00
utils	[DataLoader] Replacing `traverse` function with `traverse_datapipes` (#85667 )	2022-09-27 19:58:15 +00:00
__config__.py
__future__.py
__init__.py	[maskedtensor] port torch/_masked into torch/masked (#85515 )	2022-09-26 23:41:13 +00:00
_appdirs.py
_classes.py
_deploy.py
_jit_internal.py
_linalg_utils.py	Remove deprecated torch.lstsq (#70980 )	2022-09-23 00:16:55 +00:00
_lobpcg.py
_lowrank.py
_meta_registrations.py	Registered _like metas (#85793 )	2022-09-28 17:23:07 +00:00
_namedtensor_internals.py
_ops.py	[Modes] remove enable and rewrite mode stack (squashed) (#84774 )	2022-09-27 01:04:35 +00:00
_python_dispatcher.py	[PolishComment] Polish code comment, revelant->relevant (#85238 )	2022-09-19 19:43:14 +00:00
_six.py
_sources.py
_storage_docs.py
_tensor_docs.py	[doc] Add pin_memory and layout to new_{zeros, ones, full} (#85605 )	2022-09-25 22:23:23 +00:00
_tensor_str.py	Fix printing regular tensors inside functorch transforms (#85556 )	2022-09-26 15:35:47 +00:00
_tensor.py	Remove deprecated torch.lstsq (#70980 )	2022-09-23 00:16:55 +00:00
_torch_docs.py	Revert "Update `amax/amin/norm/count_nonzero` signatures with `int[*]? dim` (#83300 )"	2022-09-28 17:04:53 +00:00
_utils_internal.py
_utils.py
_VF.py
_vmap_internals.py
abi-check.cpp
CMakeLists.txt	[CMake] Add functorch target (#83464 )	2022-09-14 00:05:33 +00:00
custom_class_detail.h
custom_class.h
deploy.h
extension.h
functional.py	Add path optimize kwarg to einsum (#84890 )	2022-09-24 03:47:36 +00:00
hub.py
library.h
library.py	Disable torch.library.Library with PYTORCH_DISABLE_LIBRARY (#85190 )	2022-09-17 03:05:43 +00:00
overrides.py	Add python `nested_tensor` and `as_nested_tensor` constructors in `torch.nested` (#85593 )	2022-09-28 20:15:02 +00:00
py.typed
quasirandom.py
random.py
README.txt
return_types.py	Add __all__ to torch.utils submodules (#85331 )	2022-09-27 14:45:26 +00:00
script.h
serialization.py
storage.py
torch_version.py
types.py	New calling convention for Python dispatcher (#85133 )	2022-09-16 20:38:21 +00:00

README.txt

Note [TH abstraction violation]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

TH/THC provide some hpp headers, which are proper C++ headers rather than
C headers.  These headers serve double duty as *internal implementation
detail* headers, whose contents should largely not be used by external
clients.

Ideally, we would not install these headers at all; instead, you should
use public functions (in headers like `THTensor.h`, NOT `THTensor.hpp`)
to manipulate these structs.  However, there are a few places
in torch/csrc where we violate this abstraction.  They are marked with
a pointer to this note.  Each of those sites will have to be refactored
when we refactor the guts of THTensor and related structures.