pytorch/cmake/External
Xinya Zhang e3ca7346ce Re-add initial Flash Attention support on ROCM (#115981)
Note about the updates:

This PR:
1. Skips more Flash Attention-related unit tests on MI200.
2. Fixes additional ATen compilation errors after hipification.
3. Fixes the author "root" of a specific commit.
4. Includes the patch from Nikita in favor of block-level static initialization.

CAVEAT: This revised PR contains a commit that modifies the CI to force it to run on MI200 nodes. That commit must be reverted before merging.

Original PR (https://github.com/pytorch/pytorch/pull/114309) Note:

This pull request adds initial Flash Attention support for the AMD/ROCm platform. It adds a specialized Triton repository/branch as a compile-time dependency for the Flash Attention math library on AMD/ROCm. This Triton submodule is not used at runtime and will not be shipped in the final PyTorch package. We plan to release this specialized Triton as a separate project.

Known limitations:

- Only supports MI200 series GPU (i.e., `gcnArchName == gfx90a:sramecc+:xnack-`.
- Only supports power of two sequence lengths.
- No support for varlen APIs.
- Only support head dimension 16,32,64,128.
- Performance is still being optimized.
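
A minimal sketch, not part of this PR, of how a caller might gate on these limits before requesting the flash backend via `torch.nn.functional.scaled_dot_product_attention`. The helper name `rocm_flash_attention_supported` is hypothetical, and the `gcnArchName` property is assumed to be exposed by ROCm builds of PyTorch:

```python
import torch
import torch.nn.functional as F

def rocm_flash_attention_supported() -> bool:
    # Hypothetical helper: checks for a ROCm build running on an MI200-class
    # GPU (gfx90a), per the limitations above. `gcnArchName` is assumed to be
    # present on ROCm builds of PyTorch; guard with getattr just in case.
    if torch.version.hip is None or not torch.cuda.is_available():
        return False
    props = torch.cuda.get_device_properties(0)
    return getattr(props, "gcnArchName", "").startswith("gfx90a")

# Shapes chosen to satisfy the limits: power-of-two sequence length (1024)
# and a supported head dimension (64); fixed-length batches only (no varlen).
q = torch.randn(2, 8, 1024, 64, device="cuda", dtype=torch.float16)
k, v = torch.randn_like(q), torch.randn_like(q)

if rocm_flash_attention_supported():
    # Restrict SDPA to the flash kernel so an unsupported configuration
    # errors out instead of silently falling back to the math path.
    with torch.backends.cuda.sdp_kernel(
        enable_flash=True, enable_math=False, enable_mem_efficient=False
    ):
        out = F.scaled_dot_product_attention(q, k, v)
else:
    out = F.scaled_dot_product_attention(q, k, v)  # default backend selection
```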

Fixes #112997

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115981
Approved by: https://github.com/malfet
2024-01-04 22:21:31 +00:00
EigenBLAS.cmake Build EigenBlas as static library (#44747) 2020-09-16 10:25:26 -07:00
nccl.cmake Update submodule NCCL to v2.18.3 (#104993) 2023-08-18 23:43:01 +00:00
nnpack.cmake [BC BREAKING] Remove outdated python submodules (#108236) 2023-09-02 06:24:20 +00:00
oort.cmake Re-add initial Flash Attention support on ROCM (#115981) 2024-01-04 22:21:31 +00:00
rccl.cmake find rccl properly (#42072) 2020-08-05 21:46:38 -07:00
ucc.cmake UCC PG build in CI (#81583) 2022-08-10 00:23:47 +00:00