pytorch/cmake
Nikita Shulga a1f854f270 [MPS] Compile kernels into Metallib (#138636)
The PyTorch MPS backend relies for the most part on MPSGraph to provide specific operations, but more and more often one has had to implement custom kernels, which were simply embedded in the operator codebase and compiled directly using [`- id<MTLLibrary>newLibraryWithSource:options:error:`](https://developer.apple.com/documentation/metal/mtldevice/1433431-newlibrarywithsource) (the first Metal kernel was added to the MPS backend in https://github.com/pytorch/pytorch/pull/82307 ).
Later on, as the number of operators grew, those were refactored into the `MetalShaderLibrary` convenience class (see https://github.com/pytorch/pytorch/pull/125550 ).

But as the number of kernels keeps growing, it's time to take the next step and properly compile them into `.metallib`

This PR does exactly that by:
 - Moving shader sources into separate `.metal` files
 - Adding a check on whether full Xcode is installed or just DeveloperTools
 - If full Xcode is installed, compiling and linking shaders into `.metallib` for the Metal-3.0 (available on macOS 13) and Metal-3.1 (available on macOS 14, can use bfloat) standards, and bundling both using the `-sectcreate` linker option so that they can be retrieved with the `getsectiondata` API call. The `metallib_dummy.cpp` file is used to properly express dependencies between the metallib build and torch_cpu link stages. The logic for generating the metallibraries is loosely based on https://github.com/ml-explore/mlx/blob/main/mlx/backend/metal/kernels/CMakeLists.txt.
 - If only the DeveloperTools CLI is installed, automatically wrapping each `.metal` file into a `_metallib.h` header that contains the shader source wrapped in `MetalShaderLibrary`
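
To illustrate the two-library bundling, here is a minimal sketch of how a runtime could choose between the Metal-3.0 and Metal-3.1 builds based on the macOS major version. The names `MetalStd`, `pick_metal_std`, `section_name_for` and both section names are hypothetical and not taken from this PR; only the version thresholds come from the description above.

```cpp
#include <cassert>
#include <string>

// Hypothetical tag for the two bundled metallib flavours.
enum class MetalStd { Metal30, Metal31 };

// Metal-3.1 (with bfloat) is available on macOS 14 and newer;
// otherwise fall back to the Metal-3.0 build (macOS 13).
inline MetalStd pick_metal_std(int macos_major) {
  assert(macos_major > 0);  // caller must pass a real version number
  return macos_major >= 14 ? MetalStd::Metal31 : MetalStd::Metal30;
}

// Assumed section names (the ones used by the actual build are not
// stated here). Mach-O section names are limited to 16 characters.
inline std::string section_name_for(MetalStd std_version) {
  return std_version == MetalStd::Metal31 ? "metal_bfloat" : "metal_basic";
}
```

On macOS, the returned name would then be passed as the section name to `getsectiondata` to obtain the bytes embedded via `-sectcreate`.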

The bulk of the changes introduced in this PR just move code around, i.e. for every file in the `aten/src/ATen/native/mps/operators` folder that contains a non-templated shader definition, a corresponding `.metal` file is created in the `aten/src/ATen/native/mps/kernels` folder and the embedded shader definition is replaced with the following:
```cpp
#ifndef PYTORCH_JIT_COMPILE_SHADERS
static auto& lib = MetalShaderLibrary::getBundledLibrary();
#else
#include <ATen/native/mps/OpName_metallib.h>
#endif
```
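
For the JIT branch above, the generated `OpName_metallib.h` could look like the output of the following sketch, which wraps shader source in a raw string literal passed to `MetalShaderLibrary`. The helper `wrap_metal_source`, the `METAL` delimiter and the exact shape of the emitted header are assumptions for illustration; the real header is produced by the build scripts and may differ.

```cpp
#include <cassert>
#include <string>

// Hypothetical generator: turns the contents of a .metal file into header
// text that defines `lib` from a C++ raw string literal.
inline std::string wrap_metal_source(const std::string& source) {
  const std::string delim = "METAL";  // assumed raw-string delimiter
  // A raw string literal ends at `)METAL"`, so the shader source must not
  // contain that sequence.
  assert(source.find(")" + delim + "\"") == std::string::npos);
  return "static MetalShaderLibrary lib(R\"" + delim + "(\n" +
         source + "\n)" + delim + "\");\n";
}
```

Because the emitted header defines a variable named `lib`, operator code can refer to `lib` in the same way in both branches of the `#ifndef`.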

Some historical stats:
| PyTorch Version  | Number of shaders in MPS | Ops added |
| ------------- | ------------- | ---- |
| 1.12  | 0  | |
| 1.13  | 2  | bitwise_ops and  index.out |
| 2.0  | 4  | cross, repeat and view  |
| 2.1  | 9   | unary_ops, histogram, renorm, binary_ops |
| 2.2  | 11   | gamma and bucketization |
| 2.3  | 12  | naive_matmul (to workaround crash) |
| 2.4 | 13 | quantized_mm |
| 2.5 | 14 | fused_adam |

Pros:
  - Better code structure/readability
  - Eventually allows one to use shared headers (and implement something like `TensorIterator`)
  - Faster runtime (as compilation is done ahead of time) and perhaps better optimized compiled kernels

Cons:
  - Build process is a bit more complicated than it used to be
  - Need to maintain two code paths (as our CI builders only have DeveloperTools installed)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/138636
Approved by: https://github.com/manuelcandales
2024-11-01 21:47:20 +00:00
..
External Remove legacy Caffe2 pthreadpool from CMake (#134936) 2024-10-17 05:22:08 +00:00
Modules Enable Windows Arm64 (#133088) 2024-10-24 16:10:44 +00:00
Modules_CUDA_fix Allow building for sm90a (#125523) 2024-05-06 20:03:12 +00:00
public Add USE_SYSTEM_NVTX option (#138287) 2024-10-19 04:26:01 +00:00
Allowlist.cmake
BuildVariables.cmake
Caffe2Config.cmake.in [Submodule] Remove third-party onnx-tensorrt (#126542) 2024-05-19 22:34:24 +00:00
CheckAbi.cmake
cmake_uninstall.cmake.in
Codegen.cmake Extending the Pytorch vec backend for SVE (ARM) (#119571) 2024-09-18 18:59:10 +00:00
DebugHelper.cmake
Dependencies.cmake Enable Windows Arm64 (#133088) 2024-10-24 16:10:44 +00:00
FlatBuffers.cmake
GoogleTestPatch.cmake
IncludeSource.cpp.in
iOS.cmake [executorch] Update iOS toolchain with a modern cmake syntax. (#115799) 2023-12-15 00:51:30 +00:00
Metal.cmake [MPS] Compile kernels into Metallib (#138636) 2024-11-01 21:47:20 +00:00
MiscCheck.cmake Add SVE implementation of embedding_lookup_idx (#133995) 2024-10-15 18:52:44 +00:00
prioritized_text.txt [Build] Add linker script optimization (#121975) 2024-04-09 20:22:25 +00:00
ProtoBuf.cmake
ProtoBufPatch.cmake
Summary.cmake [MPS] Compile kernels into Metallib (#138636) 2024-11-01 21:47:20 +00:00
TorchConfig.cmake.in Remove legacy Caffe2 pthreadpool from CMake (#134936) 2024-10-17 05:22:08 +00:00
TorchConfigVersion.cmake.in
VulkanCodegen.cmake [BE][CMake] Use FindPython module (#124613) 2024-05-29 13:17:35 +00:00
VulkanDependencies.cmake