Commit Graph

12 Commits

Author SHA1 Message Date
cyy
1544c37520 [7/N] Fixes clang-tidy warnings in c10/{core,util}/*.h (#115495)
This PR continues to fix clang-tidy warnings for headers in c10/core and c10/util.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115495
Approved by: https://github.com/malfet
2023-12-19 02:14:30 +00:00
Xia, Weiwen
3a3e2002d8 [Quant] Add unified x86 quant backend (#84329)
## Description

Implement a unified quantization backend, 'X86', for x86 platforms. It combines the advantages of FBGEMM and ONEDNN: kernels are selected during weight prepacking, and the details are hidden from end users. It will become the default backend in place of FBGEMM (a usage sketch follows below).

For details, please refer to this RFC: [[RFC] Unified quantization backend for x86 CPU platforms](https://github.com/pytorch/pytorch/issues/83888)
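
For illustration, here is a minimal eager-mode post-training static quantization sketch using the unified backend. The model, shapes, and calibration data are assumptions rather than part of this PR, and the `'x86'` engine/qconfig names require a sufficiently recent PyTorch build:

```python
import torch
import torch.ao.quantization as tq

# Minimal hedged sketch: post-training static quantization with the unified
# 'x86' backend. Model and input shapes are illustrative only.
torch.backends.quantized.engine = "x86"             # select the unified x86 backend

model_fp32 = torch.nn.Sequential(
    tq.QuantStub(),
    torch.nn.Conv2d(3, 16, 3),
    torch.nn.ReLU(),
    tq.DeQuantStub(),
).eval()
model_fp32.qconfig = tq.get_default_qconfig("x86")  # observers matching this backend

prepared = tq.prepare(model_fp32)                   # insert observers
prepared(torch.randn(1, 3, 32, 32))                 # calibration pass on sample data
quantized = tq.convert(prepared)                    # kernels are chosen at weight prepacking
print(quantized(torch.randn(1, 3, 32, 32)).shape)
```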

## Validation
**Correctness**
Covered by unit tests (UT).

**Accuracy**
Running torchvision models on ImageNet shows no accuracy difference between FBGEMM and the unified X86 backend:
[torchvision_accuracy_comparison_fbgemm_vs_x86.xlsx](https://github.com/pytorch/pytorch/files/9598114/torchvision_accuracy_comparison_fbgemm_vs_x86.xlsx)

**Performance**
Depends on https://github.com/pytorch/pytorch/pull/84470 which improves performance.
For early PoC results, please refer to https://github.com/pytorch/pytorch/files/9399202/unified_qengine_poc_performance_bechmark.xlsx

With the two PRs combined, we collected the data below on an Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz.
Method: run multiple instances with 4 cores per instance on a whole socket, using JeMalloc and Intel OMP.
Models/throughput | fbgemm | x86 | improvement
-- | -- | -- | --
wide_resnet101_2 | 173.5675 | 241.815 | 39.32%
resnext101_32x8d | 174.365 | 339.8175 | 94.89%
resnet50 | 573.155 | 1174.14 | 104.86%
vgg19_bn | 260.335 | 337.92 | 29.80%
vgg19 | 257.935 | 333.265 | 29.21%
inception_v3 | 601.1175 | 1309.33 | 117.82%
densenet161 | 296.645 | 435.5625 | 46.83%
mnasnet1_0 | 1216.7 | 4057.515 | 233.49%
squeezenet1_0 | 1220.085 | 5153.3875 | 322.38%
alexnet | 2294.91 | 2624.6375 | 14.37%
fbnetc_100 | 976.2825 | 3110.1825 | 218.57%
shufflenet_v2_x0_5 | 1555.76 | 3026.125 | 94.51%
spnasnet_100 | 1059.065 | 3502.0975 | 230.68%
pytorch-unet | 192.76 | 246.77 | 28.02%
acgan | 257.32 | 333.7325 | 29.70%
cgan | 7790.6925 | 7803.1025 | 0.16%
sgan | 257.565 | 338.8875 | 31.57%
se_resnet50 | 492.3725 | 916.5175 | 86.14%
vggm | 300.2875 | 316.2075 | 5.30%

Environment:
- PyTorch version: 1.13.0a0+gitcdd625b
- Is debug build: False
- CUDA used to build PyTorch: None
- ROCM used to build PyTorch: N/A
- OS: Ubuntu 20.04.3 LTS (x86_64)
- GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
- Clang version: Could not collect
- CMake version: version 3.22.5
- Libc version: glibc-2.31
- Python version: 3.9.12 (main, Jun  1 2022, 11:38:51)  [GCC 7.5.0] (64-bit runtime)
- Python platform: Linux-5.11.0-27-generic-x86_64-with-glibc2.31
- Is CUDA available: False
- CUDA runtime version: No CUDA
- GPU models and configuration: No CUDA
- Nvidia driver version: No CUDA
- cuDNN version: No CUDA
- HIP runtime version: N/A
- MIOpen runtime version: N/A
- Is XNNPACK available: True

Versions of relevant libraries:
- [pip3] intel-extension-for-pytorch==1.13.0+cpu
- [pip3] numpy==1.23.3
- [pip3] pytorch-widedeep==0.3.7
- [pip3] torch==1.13.0a0+git48b423b
- [pip3] torchvision==0.14.0a0+ebb68f3
- [conda] blas                      1.0                         mkl
- [conda] intel-extension-for-pytorch 1.13.0+cpu               pypi_0    pypi
- [conda] mkl                       2021.4.0           h06a4308_640
- [conda] mkl-include               2022.1.0                 pypi_0    pypi
- [conda] mkl-service               2.4.0            py39h7f8727e_0
- [conda] mkl-static                2022.1.0                 pypi_0    pypi
- [conda] mkl_fft                   1.3.1            py39hd3c417c_0
- [conda] mkl_random                1.2.2            py39h51133e4_0
- [conda] numpy                     1.23.3                   pypi_0    pypi
- [conda] numpy-base                1.22.3           py39hf524024_0
- [conda] torch                     1.13.0a0+git48b423b          pypi_0    pypi
- [conda] torchvision               0.14.0a0+ebb68f3          pypi_0    pypi

Pull Request resolved: https://github.com/pytorch/pytorch/pull/84329
Approved by: https://github.com/jerryzh168
2022-09-29 00:44:40 +00:00
Weiwen Xia
060f1b822a Add onednn quant backend (#74137)
Summary:
Resolve the conflicts in https://github.com/pytorch/pytorch/pull/69820
jerryzh168 Please review. Thanks.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/74137

Reviewed By: samdow

Differential Revision: D34840477

Pulled By: jerryzh168

fbshipit-source-id: 8aa60981ff7be211a1609644f273b16d18efd425
(cherry picked from commit de76bb808b315e9a2e45d8c5f1c1233a47d669c4)
2022-03-15 01:28:21 +00:00
Jerry Zhang
5a897536f3 Revert D33716039: [pytorch][PR] Add ONEDNN quantization backend
Test Plan: revert-hammer

Differential Revision:
D33716039 (989b24855e)

Original commit changeset: 6f7bb807e857

Original Phabricator Diff: D33716039 (989b24855e)

fbshipit-source-id: ed233c5b99d4edb7d5a9d6c600825c78555f16d0
(cherry picked from commit d3e1f825b06ef67adb13623ccb7cbf1b700c1dd5)
2022-03-11 22:06:25 +00:00
Xia Weiwen
989b24855e Add ONEDNN quantization backend (#69820)
Summary:
This PR adds a new quantization backend, ONEDNN, with quantized conv and linear kernels implemented in the same code path as the FBGEMM backend.

The ONEDNN backend is an alternative to the FBGEMM and QNNPACK backends. It takes advantage of features of the latest Intel® CPU products: it supports VNNI on Cascade Lake and the AMX instruction set available on Sapphire Rapids, which offers 8X the int8 peak TOPS of VNNI.

ONEDNN demonstrates better performance than FBGEMM on the conv kernels of popular CNN models. It also supports more fused ops, such as convolution-add-ReLU, than FBGEMM and QNNPACK.
To use this backend, users only need to set the quantization backend to 'onednn' before any computation, without making a single change to their models. (A matching-qconfig sketch follows the snippet below.)
```python
torch.backends.quantized.engine = 'onednn'
```
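
As a hedged illustration (not part of this PR), the engine is typically paired with a matching default qconfig so that observers are configured for the ONEDNN kernels; this assumes a PyTorch build where `get_default_qconfig('onednn')` is available:

```python
import torch
import torch.ao.quantization as tq

# Hedged sketch: select the ONEDNN engine and fetch its matching default
# qconfig. Availability of 'onednn' depends on how PyTorch was compiled.
torch.backends.quantized.engine = "onednn"
my_qconfig = tq.get_default_qconfig("onednn")   # assign to model.qconfig before prepare()
```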

## Design docs
https://github.com/pytorch/pytorch/issues/21120#issuecomment-562371983
https://github.com/pytorch/pytorch/pull/67177#issuecomment-963787096

## File changes
**Add ONEDNN to qengine list**
- aten/src/ATen/Context.cpp
- c10/core/QEngine.h
- torch/ao/quantization/qconfig.py
- torch/backends/quantized/\_\_init\_\_.py

**Implement qconv & qlinear for ONEDNN backend**
- aten/src/ATen/native/quantized/cpu/conv_serialization.h
- aten/src/ATen/native/quantized/cpu/fbgemm_utils.cpp
- aten/src/ATen/native/quantized/cpu/onednn_utils.h
- aten/src/ATen/native/quantized/cpu/qconv.cpp
- aten/src/ATen/native/quantized/cpu/qconv_dynamic.cpp
- aten/src/ATen/native/quantized/cpu/qconv_prepack.cpp
- aten/src/ATen/native/quantized/cpu/qconv_unpack.cpp
- aten/src/ATen/native/quantized/cpu/qlinear.cpp
- aten/src/ATen/native/quantized/cpu/qlinear_dynamic.cpp
- aten/src/ATen/native/quantized/cpu/qlinear_prepack.cpp
- aten/src/ATen/native/quantized/cpu/qlinear_unpack.cpp

**Skip tests that are not supported by ONEDNN**
- test/ao/sparsity/test_kernels.py
- test/quantization/core/test_quantized_module.py
- test/quantization/core/test_quantized_op.py

## Validation results
This PR has passed `test_quantization.py` and `test_mkldnn.py`.
Below are performance data for the int8 2d convolution and linear operators on a Cascade Lake Xeon® platform.
(Note: tested with a single instance on a single core, using the latest oneDNN library.)

**Table 1. Performance comparison of int8 2d convolution operator**
|No.|	Shape|	FBGEMM|	ONEDNN|	Gain|
|-|-|-|-|-|
|1|	IC=128, OC=128, kernel=3, stride=1, N=4, H=32, W=32, G=1, pad=0|	668.310us|	535.630us|	24.8%|
|2|	IC=128, OC=128, kernel=3, stride=2, N=4, H=32, W=32, G=1, pad=0|	290.630us|	281.810us|	3.1%|
|3|	IC=128, OC=256, kernel=3, stride=1, N=4, H=32, W=32, G=1, pad=0|	1.045ms|	893.010us|	17.0%|
|4|	IC=128, OC=256, kernel=3, stride=2, N=4, H=32, W=32, G=1, pad=0|	385.320us|	373.720us|	3.1%|
|5|	IC=256, OC=256, kernel=3, stride=1, N=4, H=32, W=32, G=1, pad=0|	1.876ms|	1.641ms|	14.3%|
|6|	IC=256, OC=256, kernel=3, stride=2, N=4, H=32, W=32, G=1, pad=0|	660.460us|	638.470us|	3.4%|

**Table 2. Performance comparison of int8 linear operator**
|No.|	Shape (m, n, k)|	FBGEMM|	ONEDNN|	Gap|
|-|-|-|-|-|
|1|	64, 800, 320|	80.550us|	96.770us|	20.10%|
|2|	64, 768, 512|	101.230us|	130.720us|	29.10%|
|3|	16, 256, 512|	30.230us|	51.450us|	70.20%|
|4|	128, 128, 128|	33.810us|	50.480us|	49.30%|
|5|	256, 512, 256|	154.490us|	195.050us|	26.30%|
|6|	1024, 1024, 1024|	3.134ms|	3.514ms|	12.10%|

ONEDNN shows advantages over FBGEMM for convolution. However, it has a performance gap relative to FBGEMM for linear ops. The gap is a known issue, and further optimization is in progress in the oneDNN library. On the latest platforms, ONEDNN achieves better performance for both conv and linear.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69820

Reviewed By: HDCharles

Differential Revision: D33716039

Pulled By: jerryzh168

fbshipit-source-id: 6f7bb807e85798142dfcffccfca8b8bd652fb3dd
(cherry picked from commit 91526b373560f42ba0ad307f9cccfc0eb5218b1f)
2022-03-11 20:31:49 +00:00
Scott Wolchok
44cc873fba [PyTorch] Autoformat c10 (#56830)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56830

Opt into formatting on GitHub and format everything. This is a trial run before turning on formatting for more and eventually all of the codebase.

Test Plan: CI

Reviewed By: zertosh

Differential Revision: D27979080

fbshipit-source-id: a80f0c48691c08ae8ca0af06377b87e6a2351151
2021-04-30 21:23:28 -07:00
Pavel Belevich
62b06b9fae Rename TensorTypeId to DispatchKey (#32154)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32154

TensorTypeId -> DispatchKey
	c10/core/TensorTypeId.h -> c10/core/DispatchKey.h
	c10/core/TensorTypeId.cpp -> c10/core/DispatchKey.cpp
	TensorTypeId::* -> DispatchKey::*
	TensorTypeId type_id -> DispatchKey dispatch_key
		type_id -> dispatch_key
	TensorTypeId::NumTensorIds -> DispatchKey::NumDispatchKeys
	RealTensorTypeId -> RealDispatchKey
TensorTypeSet -> DispatchKeySet
	TensorTypeIds -> DispatchKeys
	c10/core/TensorTypeSet.h -> c10/core/DispatchKeySet.h
	c10/core/TensorTypeSet.cpp -> c10/core/DispatchKeySet.cpp
	type_set() -> key_set()
	type_set_ -> key_set_
	typeSet -> keySet
ExcludeTensorTypeIdGuard -> ExcludeDispatchKeyGuard
IncludeTensorTypeIdGuard -> IncludeDispatchKeyGuard
LocalTensorTypeSet -> LocalDispatchKeySet
	c10/core/impl/LocalTensorTypeSet.h -> c10/core/impl/LocalDispatchKeySet.h
	c10/core/impl/LocalTensorTypeSet.cpp -> c10/core/impl/LocalDispatchKeySet.cpp
	tls_local_tensor_type_set -> tls_local_dispatch_key_set
	tls_is_tensor_type_id_excluded -> tls_is_dispatch_key_excluded
	tls_set_tensor_type_id_excluded -> tls_set_dispatch_key_excluded
	tls_is_tensor_type_id_included -> tls_is_dispatch_key_included
	tls_set_tensor_type_id_included -> tls_set_dispatch_key_included
MultiDispatchTensorTypeSet -> MultiDispatchKeySet
	multi_dispatch_tensor_type_set -> multi_dispatch_key_set
tensorTypeIdToBackend -> dispatchKeyToBackend
backendToTensorTypeId -> backendToDispatchKey
initForTensorTypeSet -> initForDispatchKeySet
inferred_type_set -> inferred_key_set
computeTensorTypeId -> computeDispatchKey
PODLocalTensorTypeSet raw_local_tensor_type_set -> PODLocalDispatchKeySet raw_local_dispatch_key_set
get_default_tensor_type_id -> get_default_dispatch_key
inferred_type_id -> inferred_dispatch_key
actual_type_id -> actual_dispatch_key
typeSetToDispatchKey_ -> dispatchKeySetToDispatchKey_
get_type_id() -> get_dispatch_key()
legacyExtractTypeId -> legacyExtractDispatchKey
extractTypeId -> extractDispatchKey

Test Plan: Imported from OSS

Differential Revision: D19398900

Pulled By: pbelevich

fbshipit-source-id: 234ad19f93d33e00201b61e153b740a339035776
2020-01-15 11:16:08 -08:00
Supriya Rao
45391ccecb Update qengine flag in python to string (#26620)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26620

This change updates torch.backends.quantized.engine to accept a string ("fbgemm"/"qnnpack"/"none" for now).
set_qengine and get_qengine return an int which represents the at::QEngine enum.
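
A minimal usage sketch of the string-valued flag; the set of valid engine names depends on how PyTorch was built:

```python
import torch

# The Python-level flag takes and returns plain strings.
torch.backends.quantized.engine = "fbgemm"   # setter accepts a string
print(torch.backends.quantized.engine)       # getter returns the current engine as a string
```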

Test Plan:
python test/test_torch.py

Imported from OSS

Differential Revision: D17533582

fbshipit-source-id: 5103263d0d59ff37d43dec27243cb76ba8ba633f
2019-09-23 17:56:50 -07:00
Jerry Zhang
8f50ea0f5c Add NoQEngine to QEngine and refactor the name of set/get qengine (#26471)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26471

att

Test Plan:
.

Imported from OSS

Differential Revision: D17491215

fbshipit-source-id: 5790aa0113bfdbeeb838f3d1530397606ccaa1e9
2019-09-19 17:42:09 -07:00
Ailing Zhang
b1ecf4bc82 Revert D17464904: Add NoQEngine to QEngine and refactor the name of set/get qengine
Test Plan: revert-hammer

Differential Revision:
D17464904

Original commit changeset: d8f2cebb978f

fbshipit-source-id: 8feb86f7347f455eb51538ce7893d4a096ba0ba4
2019-09-18 20:04:58 -07:00
Jerry Zhang
4f7292f7ee Add NoQEngine to QEngine and refactor the name of set/get qengine (#26330)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26330

att

Test Plan:
.

Imported from OSS

Differential Revision: D17464904

fbshipit-source-id: d8f2cebb978fcbc478bc7e111ba24bc71a6f8915
2019-09-18 19:38:59 -07:00
Supriya Rao
24d5b5f5f9 Add Runtime flag for quantized backend. (#25680)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25680

Add a runtime flag to choose between FBGEMM and QNNPACK when compiled with both.

The flag can be set by using torch.backends.quantized.engine = torch.fbgemm/torch.qnnpack from Python, or ctx::setPreferredQuantizedEngine(at::QEngine) from C++ (a hedged sketch follows below).
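
A hedged sketch of picking a backend at runtime when PyTorch is compiled with both; it uses the string-valued flag and `supported_engines` from the current API (see #26620 above) rather than the enum-style `torch.fbgemm`/`torch.qnnpack` attributes described in this commit:

```python
import torch

# Prefer QNNPACK when the build provides it, otherwise fall back to FBGEMM.
engines = torch.backends.quantized.supported_engines
torch.backends.quantized.engine = "qnnpack" if "qnnpack" in engines else "fbgemm"
```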
ghstack-source-id: 89935643

Test Plan: Verified torch.backends.quantized.engine works

Differential Revision: D17198233

fbshipit-source-id: e5449d06f4136385e0e6d18bd4237f8654a61672
2019-09-11 21:37:36 -07:00