pytorch/cmake/Modules
sanchitintel 4ee29d6033 [Reland take-2] Add JIT graph fuser for oneDNN Graph API (v0.5)
Re-landing #68111/#74596

## Description
v0.5 PR of this [RFC](https://github.com/pytorch/pytorch/issues/49444).

Building on #50256, this PR includes the following improvements:

 * The [v0.5 release branch](https://github.com/oneapi-src/oneDNN/releases/tag/graph-v0.5) of the oneDNN Graph API is used
 * The fuser now works with the profiling graph executor. We have inserted type check nodes to guard the profiled tensor properties.

 ### User API:
The optimization pass is disabled by default. Users can enable it with:

```
 torch.jit.enable_onednn_fusion(True)
```
`torch.jit.freeze` should be used after tracing (recommended) or scripting a model.

 ### Performance:
 The [pytorch/benchmark](https://github.com/pytorch/benchmark) tool was used to compare performance:

 * SkyLake 8180 (1 socket of 28 cores):
   ![image](https://user-images.githubusercontent.com/65992142/151162305-05e44425-a24e-4d5e-94e1-743b40b87a8c.png)
* SkyLake 8180 (single thread):
   ![image](https://user-images.githubusercontent.com/65992142/151162528-69f90b79-d08d-46b8-8775-d80a6ccbce8a.png)
   * By mapping hardswish to oneDNN Graph, it’s 8% faster than PyTorch JIT (NNC + OFI)
   ** We expect a performance gain after mapping transpose, contiguous & view to oneDNN Graph ops

 ### Directory structure of the integration code
 Fuser-related code is placed under:

 ```
 torch/csrc/jit/codegen/onednn/
 ```

 Optimization pass registration is done in:

 ```
 torch/csrc/jit/passes/onednn_graph_fuser.h
 ```

 CMake for the integration code is in:

 ```
 caffe2/CMakeLists.txt
 cmake/public/mkldnn.cmake
 cmake/Modules/FindMKLDNN.cmake
 ```

 ## Limitations
 * In this PR, PyTorch-oneDNN-Graph integration is supported only on Linux. Support for Windows and macOS will be enabled as a next step.
 * We have only optimized the inference use-case.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76622
Approved by: https://github.com/eellison
2022-05-05 16:57:03 +00:00
FindARM.cmake Make PyTorch partially cross-compilable for Apple M1 (#49701) 2020-12-22 09:33:12 -08:00
FindAtlas.cmake Lint trailing newlines (#54737) 2021-03-30 13:09:52 -07:00
FindAVX.cmake Add AVX512 support in ATen & remove AVX support (#61903) 2021-07-22 08:51:49 -07:00
FindBenchmark.cmake cmake: stop including files from the install directory 2017-09-01 23:33:14 -07:00
FindBLAS.cmake Modify "gemm" code to enable access to "sbgemm_" routine in OpenBLAS (#58831) 2021-11-03 08:53:27 -07:00
FindBLIS.cmake Adding a new include directory in BLIS search path (#58166) 2021-05-24 08:57:02 -07:00
FindCUB.cmake Update CMake and use native CUDA language support (#62445) 2021-10-11 09:05:48 -07:00
FindFFmpeg.cmake Fix compilation error when buildng with FFMPEG (#27589) 2020-02-13 11:23:48 -08:00
FindFlexiBLAS.cmake Add FlexiBLAS build support per #64752 (#64815) 2021-10-28 11:28:00 -07:00
FindGloo.cmake [c10d] NCCL Process Group implementation (#8182) 2018-06-08 10:33:27 -07:00
FindHiredis.cmake Lint trailing newlines (#54737) 2021-03-30 13:09:52 -07:00
FindLAPACK.cmake Add FlexiBLAS build support per #64752 (#64815) 2021-10-28 11:28:00 -07:00
FindLevelDB.cmake Initial building with deps 2016-12-13 09:29:01 -05:00
FindLMDB.cmake Added Ninja generator support on Windows 2017-07-26 00:32:20 -07:00
FindMAGMA.cmake CMake: Clean up unused definitions (#69216) 2022-01-31 22:49:11 +00:00
FindMatlabMex.cmake Initial building with deps 2016-12-13 09:29:01 -05:00
FindMKL.cmake CMake option for using static MKL libraries 2022-03-07 19:32:33 +00:00
FindMKLDNN.cmake [Reland take-2] Add JIT graph fuser for oneDNN Graph API (v0.5) 2022-05-05 16:57:03 +00:00
FindNCCL.cmake Fix NCCL version check when nccl.h in non-standard location. (#40982) 2020-07-17 13:54:17 -07:00
FindNuma.cmake Lint trailing newlines (#54737) 2021-03-30 13:09:52 -07:00
FindNumPy.cmake Lint trailing newlines (#54737) 2021-03-30 13:09:52 -07:00
FindOpenBLAS.cmake Lint trailing newlines (#54737) 2021-03-30 13:09:52 -07:00
FindOpenMP.cmake Allow linking against vcomp on Windows (#54132) 2021-03-19 14:36:07 -07:00
Findpybind11.cmake Convert all tabs to spaces, add CI. (#18959) 2019-04-09 08:12:26 -07:00
FindRocksDB.cmake Lint trailing newlines (#54737) 2021-03-30 13:09:52 -07:00
FindSnappy.cmake Lint trailing newlines (#54737) 2021-03-30 13:09:52 -07:00
FindvecLib.cmake Forbid trailing whitespace (#53406) 2021-03-05 17:22:55 -08:00
FindVSX.cmake Lint trailing newlines (#54737) 2021-03-30 13:09:52 -07:00
FindZMQ.cmake Forbid trailing whitespace (#53406) 2021-03-05 17:22:55 -08:00
FindZVECTOR.cmake ibm z14/15 SIMD support (#66407) 2022-01-04 09:40:18 -08:00
README.md Update the cmake build configuration for AppleClang compiler (#15820) 2019-02-04 08:53:47 -08:00

This folder contains various custom cmake modules for finding libraries and packages. Details about some of them are listed below.

FindOpenMP.cmake

This is modified from the file included in CMake 3.13 release, with the following changes:

  • Replace VERSION_GREATER_EQUAL with NOT ... VERSION_LESS as VERSION_GREATER_EQUAL is not supported in CMake 3.5 (our min supported version).

  • Update the separate_arguments commands to not use NATIVE_COMMAND which is not supported in CMake 3.5 (our min supported version).

  • Make it respect the QUIET flag so that, when it is set, try_compile failures are not reported.

  • For AppleClang compilers, use -Xpreprocessor instead of -Xclang as the latter is not documented.

  • For AppleClang compilers, an extra flag option is tried: -Xpreprocessor -openmp -I${DIR_OF_omp_h}, where ${DIR_OF_omp_h} is obtained using find_path on omp.h with brew's default include directory as a hint. Without this, the compiler will complain about missing headers, as they are not natively included in Apple's LLVM.

  • For non-GNU compilers, whenever we try a candidate OpenMP flag, first try it while linking directly against MKL's libomp, if it has one. Otherwise, we may link two copies of libomp and hit this nasty error:

    OMP: Error #15: Initializing libomp.dylib, but found libiomp5.dylib already
    initialized.
    
    OMP: Hint This means that multiple copies of the OpenMP runtime have been
    linked into the program. That is dangerous, since it can degrade performance
    or cause incorrect results. The best thing to do is to ensure that only a
    single OpenMP runtime is linked into the process, e.g. by avoiding static
    linking of the OpenMP runtime in any library. As an unsafe, unsupported,
    undocumented workaround you can set the environment variable
    KMP_DUPLICATE_LIB_OK=TRUE to allow the program to continue to execute, but
    that may cause crashes or silently produce incorrect results. For more
    information, please see http://openmp.llvm.org/
    

    See NOTE [ Linking both MKL and OpenMP ] for details.
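
The CMake 3.5 compatibility changes and the QUIET handling listed above can be sketched roughly as follows. This is a minimal illustration, not code taken from FindOpenMP.cmake: the `example_*` variable names and the flag string are hypothetical, and choosing between UNIX_COMMAND and WINDOWS_COMMAND is only one possible way to avoid NATIVE_COMMAND. The `OpenMP_FIND_QUIETLY` variable is the standard one that `find_package(OpenMP QUIET)` sets for a find module.

```cmake
# Illustrative sketch only; not copied from FindOpenMP.cmake.
# All example_* names and the flag string below are hypothetical.
cmake_minimum_required(VERSION 3.5)

# VERSION_GREATER_EQUAL only exists in CMake 3.7 and later, so a test like
#   if(CMAKE_VERSION VERSION_GREATER_EQUAL "3.9")
# is written with the negated VERSION_LESS comparison instead:
if(NOT CMAKE_VERSION VERSION_LESS "3.9")
  set(example_have_cmake_3_9 TRUE)
else()
  set(example_have_cmake_3_9 FALSE)
endif()

# NATIVE_COMMAND (added in CMake 3.9) selects the host's quoting rules
# automatically. One way to do without it is to pick the mode explicitly.
set(example_flags "-Xpreprocessor -fopenmp")
if(CMAKE_HOST_WIN32)
  separate_arguments(example_flag_list WINDOWS_COMMAND "${example_flags}")
else()
  separate_arguments(example_flag_list UNIX_COMMAND "${example_flags}")
endif()

# find_package(OpenMP QUIET) sets OpenMP_FIND_QUIETLY for the find module,
# so diagnostics can be suppressed by gating them on that variable.
if(NOT OpenMP_FIND_QUIETLY)
  message(STATUS "Candidate flags: ${example_flag_list}")
endif()
```

Saved to a file, the sketch can be run in script mode with `cmake -P <file>`. The negated comparison is slightly harder to read than VERSION_GREATER_EQUAL, but it behaves the same and is understood by every CMake version the project supports.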