pytorch/cmake
Xinya Zhang a37e22de70 Add Flash Attention support on ROCM (#121561)
This patch addresses the major limitations of our previous [PR #115981](https://github.com/pytorch/pytorch/pull/115981) through the new dedicated repository [AOTriton](https://github.com/ROCm/aotriton).

- [x] Only supports MI200 series GPUs (i.e., `gcnArchName == gfx90a:sramecc+:xnack-`).
    * MI300X is now supported as well. More architectures will be added once Triton supports them.
- [x] Only supports power-of-two sequence lengths.
    * Arbitrary sequence lengths are now supported.
- [ ] No support for varlen APIs.
    * varlen APIs will be supported in the next release of AOTriton.
- [x] Only supports head dimensions 16, 32, 64, and 128.
    * Arbitrary head dimensions <= 256 are now supported.
- [x] Performance is still being optimized.
    * Kernels are selected according to autotuning information from Triton.
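The constraints in the checklist above can be summarized as a small eligibility check. This is a minimal sketch, not code from PyTorch or AOTriton. `base_arch`, `flash_attention_eligible`, and `SUPPORTED_ARCHS` are hypothetical names. Mapping MI300X to the `gfx942` architecture is an assumption that this commit does not state. The sketch also ignores varlen, which remains unsupported.

```python
# Hypothetical helper (not a PyTorch or AOTriton API) that mirrors the
# constraints described in the commit message for the AOTriton flash-attention path.

# Assumption: MI200 series -> gfx90a, MI300X -> gfx942.
SUPPORTED_ARCHS = {"gfx90a", "gfx942"}
MAX_HEAD_DIM = 256  # "arbitrary head dimensions <= 256"


def base_arch(gcn_arch_name: str) -> str:
    """Strip target features, e.g. 'gfx90a:sramecc+:xnack-' -> 'gfx90a'."""
    return gcn_arch_name.split(":", 1)[0]


def flash_attention_eligible(gcn_arch_name: str, head_dim: int, seq_len: int) -> bool:
    """Return True if the inputs fall within the constraints listed in the commit."""
    if base_arch(gcn_arch_name) not in SUPPORTED_ARCHS:
        return False
    if not 0 < head_dim <= MAX_HEAD_DIM:
        return False
    # Sequence length no longer has to be a power of two; any positive length is accepted.
    return seq_len > 0


if __name__ == "__main__":
    print(flash_attention_eligible("gfx90a:sramecc+:xnack-", head_dim=96, seq_len=1000))
```

On a ROCm build of PyTorch, the architecture string would presumably come from `torch.cuda.get_device_properties(0).gcnArchName`. The sketch deliberately drops target features such as `sramecc+` and `xnack-` so that it matches on the base architecture only. This is a design choice of the example, not a documented rule.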

Other improvements from AOTriton include:
* More flexible tensor storage layouts
* A more flexible API

This is a more extensive fix for #112997.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/121561
Approved by: https://github.com/malfet, https://github.com/atalman
2024-03-12 01:16:53 +00:00

| Name | Last commit | Date |
|---|---|---|
| External | Add Flash Attention support on ROCM (#121561) | 2024-03-12 01:16:53 +00:00 |
| Modules | enable mkl_gemm_f16f16f32 in cpublas::gemm (#118367) | 2024-01-31 18:37:42 +00:00 |
| Modules_CUDA_fix | fix CMake FindCUDA module for cross-compiling (#121590) | 2024-03-11 20:09:52 +00:00 |
| public | [cuDNN] Cleanup cuDNN < 8.1 ifdefs (#120862) | 2024-03-07 01:46:25 +00:00 |
| Allowlist.cmake | | |
| BuildVariables.cmake | | |
| Caffe2Config.cmake.in | [2/4] Intel GPU Runtime Upstreaming for Device (#116833) | 2024-01-18 05:02:42 +00:00 |
| CheckAbi.cmake | remove abi uncertainty and potential abi conflict (#94306) | 2023-02-09 09:54:04 +00:00 |
| cmake_uninstall.cmake.in | | |
| Codegen.cmake | [Cmake] Check that gcc-9.4 or newer is used (#112858) | 2023-11-06 17:19:53 +00:00 |
| DebugHelper.cmake | | |
| Dependencies.cmake | Add Flash Attention support on ROCM (#121561) | 2024-03-12 01:16:53 +00:00 |
| FlatBuffers.cmake | | |
| GoogleTestPatch.cmake | Simplify cmake code (#91546) | 2023-02-08 01:05:19 +00:00 |
| IncludeSource.cpp.in | | |
| iOS.cmake | [executorch] Update iOS toolchain with a modern cmake syntax. (#115799) | 2023-12-15 00:51:30 +00:00 |
| Metal.cmake | [CI] Compile on M1 natively (#95719) | 2023-03-01 04:20:42 +00:00 |
| MiscCheck.cmake | [BE] Cleanup CMake flag suppressions (#97584) | 2023-03-27 18:46:09 +00:00 |
| ProtoBuf.cmake | [BE] Cleanup CMake flag suppressions (#97584) | 2023-03-27 18:46:09 +00:00 |
| ProtoBufPatch.cmake | Migrate PyTorch to C++17 (#85969) | 2022-12-08 02:27:48 +00:00 |
| Summary.cmake | [1/4] Intel GPU Runtime Upstreaming for Device (#116019) | 2024-01-12 07:36:25 +00:00 |
| TorchConfig.cmake.in | Revert "[Reland2] Update NVTX to NVTX3 (#109843)" | 2023-12-05 16:10:20 +00:00 |
| TorchConfigVersion.cmake.in | | |
| VulkanCodegen.cmake | [pt-vulkan] Enable Python code blocks in shader templates and upgrade shader template generation (#115948) | 2023-12-20 05:47:33 +00:00 |
| VulkanDependencies.cmake | [Vulkan] Remove GLSL Code Gen (#91912) | 2023-01-10 20:29:47 +00:00 |