Summary:
This PR upgrades oneDNN to v2.5.2 and adds the build support that v2.5.2 requires.
v2.4 changes:
- Improved performance for future Intel Xeon Scalable processor (code name Sapphire Rapids). The functionality is disabled by default and should be enabled via CPU dispatcher control.
- Improved binary primitive performance for cases when one of the tensors is broadcasted.
- Improved performance of reduction, reorder, and shuffle primitives.
- Improved performance of depthwise convolution forward propagation for processors with Intel AVX-512 support.
- Improved performance of forward inner product primitive for the shapes with minibatch equal to 1 for processors with Intel AVX-512 support.
- Improved performance of int8 matmul and inner product primitives for processors with Intel AVX2 and Intel DL Boost support.
v2.5 changes:
- Improved performance for future Intel Xeon Scalable processors (code name Sapphire Rapids). The functionality is now enabled by default and requires Linux kernel 5.16.
- Improved performance of matmul primitive for processors with Intel AVX-512 support.
v2.5.2 changes:
- Fixed performance regression in binary primitive with broadcast.
- Fixed segmentation fault in depthwise convolution primitive for shapes with huge spatial size for processors with Intel AVX-512 support.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71546
Reviewed By: george-qi
Differential Revision: D33827108
Pulled By: VitalyFedyunin
fbshipit-source-id: 8f5a19b331c82af5b0783f081e061e1034a93952
(cherry picked from commit
| File |
|---|
| FindARM.cmake |
| FindAtlas.cmake |
| FindAVX.cmake |
| FindBenchmark.cmake |
| FindBLAS.cmake |
| FindBLIS.cmake |
| FindCUB.cmake |
| FindFFmpeg.cmake |
| FindFlexiBLAS.cmake |
| FindGloo.cmake |
| FindHiredis.cmake |
| FindLAPACK.cmake |
| FindLevelDB.cmake |
| FindLMDB.cmake |
| FindMAGMA.cmake |
| FindMatlabMex.cmake |
| FindMKL.cmake |
| FindMKLDNN.cmake |
| FindNCCL.cmake |
| FindNuma.cmake |
| FindNumPy.cmake |
| FindOpenBLAS.cmake |
| FindOpenMP.cmake |
| Findpybind11.cmake |
| FindRocksDB.cmake |
| FindSnappy.cmake |
| FindvecLib.cmake |
| FindVSX.cmake |
| FindZMQ.cmake |
| FindZVECTOR.cmake |
| README.md |

This folder contains various custom cmake modules for finding libraries and packages. Details about some of them are listed below.
FindOpenMP.cmake
This is modified from the file included in the CMake 3.13 release, with the following changes:
- Replace `VERSION_GREATER_EQUAL` with `NOT ... VERSION_LESS`, as `VERSION_GREATER_EQUAL` is not supported in CMake 3.5 (our min supported version).
- Update the `separate_arguments` commands to not use `NATIVE_COMMAND`, which is not supported in CMake 3.5 (our min supported version).
- Make it respect the `QUIET` flag so that, when it is set, `try_compile` failures are not reported.
- For `AppleClang` compilers, use `-Xpreprocessor` instead of `-Xclang`, as the latter is not documented.
- For `AppleClang` compilers, an extra flag option is tried, which is `-Xpreprocessor -openmp -I${DIR_OF_omp_h}`, where `${DIR_OF_omp_h}` is obtained using `find_path` on `omp.h` with `brew`'s default include directory as a hint. Without this, the compiler will complain about missing headers, as they are not natively included in Apple's LLVM.
- For non-GNU compilers, whenever we try a candidate OpenMP flag, first try it with directly linking MKL's `libomp` if it has one. Otherwise, we may end up linking two `libomp`s and end up with this nasty error:

  > OMP: Error #15: Initializing libomp.dylib, but found libiomp5.dylib already initialized.
  > OMP: Hint This means that multiple copies of the OpenMP runtime have been linked into the program. That is dangerous, since it can degrade performance or cause incorrect results. The best thing to do is to ensure that only a single OpenMP runtime is linked into the process, e.g. by avoiding static linking of the OpenMP runtime in any library. As an unsafe, unsupported, undocumented workaround you can set the environment variable KMP_DUPLICATE_LIB_OK=TRUE to allow the program to continue to execute, but that may cause crashes or silently produce incorrect results. For more information, please see http://openmp.llvm.org/

  See NOTE [ Linking both MKL and OpenMP ] for details.
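
The CMake 3.5 compatibility changes and the `AppleClang` header lookup listed above can be illustrated with a short fragment. This is only a sketch, not the code in `FindOpenMP.cmake`: every variable prefixed `EXAMPLE_` is hypothetical, the version number being compared is arbitrary, and `/usr/local/include` is assumed here to stand in for `brew`'s default include directory.

```cmake
# Sketch only: EXAMPLE_* names are hypothetical, not taken from FindOpenMP.cmake.

# VERSION_GREATER_EQUAL is unavailable in CMake 3.5, so negate VERSION_LESS.
# This reads as "CMAKE_VERSION >= 3.9" (3.9 is an arbitrary example value).
if(NOT CMAKE_VERSION VERSION_LESS "3.9")
  message(STATUS "Running CMake 3.9 or newer")
endif()

# NATIVE_COMMAND is unavailable in CMake 3.5; UNIX_COMMAND works there.
# The flag string follows the README's description of the AppleClang option.
set(EXAMPLE_OPENMP_FLAGS "-Xpreprocessor -openmp")
separate_arguments(EXAMPLE_OPENMP_FLAG_LIST UNIX_COMMAND "${EXAMPLE_OPENMP_FLAGS}")

# Look for omp.h, passing an assumed Homebrew include directory as a hint.
find_path(EXAMPLE_DIR_OF_omp_h NAMES omp.h HINTS /usr/local/include)
if(EXAMPLE_DIR_OF_omp_h)
  list(APPEND EXAMPLE_OPENMP_FLAG_LIST "-I${EXAMPLE_DIR_OF_omp_h}")
endif()

message(STATUS "Candidate OpenMP flags: ${EXAMPLE_OPENMP_FLAG_LIST}")
```

Saved to a file, the fragment should run in script mode with `cmake -P`. If `omp.h` is not found, the `-I` entry is simply left out of the candidate flag list.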