pytorch/cmake/External
Xinya Zhang e3ca7346ce Re-add initial Flash Attention support on ROCM (#115981)
Note about the updates:

This PR:
1. Skips more Flash Attention-related unit tests on MI200.
2. Fixes additional ATen compilation errors after hipification.
3. Fixes the author "root" of a specific commit.
4. Includes the patch from Nikita in favor of block-level static initialization.

CAVEAT: This revised PR contains a commit that modifies the CI to force it to run on MI200 nodes. That commit must be reverted before merging.

Original PR (https://github.com/pytorch/pytorch/pull/114309) Note:

This pull request adds initial Flash Attention support for the AMD/ROCm platform. It adds a specialized Triton repository/branch as a compile-time dependency for the Flash Attention math library on AMD/ROCm. This Triton submodule is not used at runtime and will not be shipped in the final PyTorch package. We plan to release this specialized Triton as a separate project.

Known limitations:

- Only supports MI200 series GPU (i.e., `gcnArchName == gfx90a:sramecc+:xnack-`.
- Only supports power of two sequence lengths.
- No support for varlen APIs.
- Only support head dimension 16,32,64,128.
- Performance is still being optimized.
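
A minimal sketch, not part of this PR, of how a caller might gate on these limits before requesting the flash backend via `torch.nn.functional.scaled_dot_product_attention`. The helper name `rocm_flash_attention_supported` is hypothetical, and the `gcnArchName` property is assumed to be exposed by ROCm builds of PyTorch:

```python
import torch
import torch.nn.functional as F

def rocm_flash_attention_supported() -> bool:
    # Hypothetical helper: checks for a ROCm build running on an MI200-class
    # GPU (gfx90a), per the limitations above. `gcnArchName` is assumed to be
    # present on ROCm builds of PyTorch; guard with getattr just in case.
    if torch.version.hip is None or not torch.cuda.is_available():
        return False
    props = torch.cuda.get_device_properties(0)
    return getattr(props, "gcnArchName", "").startswith("gfx90a")

# Shapes chosen to satisfy the limits: power-of-two sequence length (1024)
# and a supported head dimension (64); fixed-length batches only (no varlen).
q = torch.randn(2, 8, 1024, 64, device="cuda", dtype=torch.float16)
k, v = torch.randn_like(q), torch.randn_like(q)

if rocm_flash_attention_supported():
    # Restrict SDPA to the flash kernel so an unsupported configuration
    # errors out instead of silently falling back to the math path.
    with torch.backends.cuda.sdp_kernel(
        enable_flash=True, enable_math=False, enable_mem_efficient=False
    ):
        out = F.scaled_dot_product_attention(q, k, v)
else:
    out = F.scaled_dot_product_attention(q, k, v)  # default backend selection
```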

Fixes #112997

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115981
Approved by: https://github.com/malfet
2024-01-04 22:21:31 +00:00
EigenBLAS.cmake Build EigenBlas as static library (#44747) 2020-09-16 10:25:26 -07:00
nccl.cmake Update submodule NCCL to v2.18.3 (#104993) 2023-08-18 23:43:01 +00:00
nnpack.cmake [BC BREAKING] Remove outdated python submodules (#108236) 2023-09-02 06:24:20 +00:00
oort.cmake Re-add initial Flash Attention support on ROCM (#115981) 2024-01-04 22:21:31 +00:00
rccl.cmake find rccl properly (#42072) 2020-08-05 21:46:38 -07:00
ucc.cmake UCC PG build in CI (#81583) 2022-08-10 00:23:47 +00:00