pytorch/cmake/External
Xinya Zhang a37e22de70 Add Flash Attention support on ROCM (#121561)
This patch addresses the major limitations of our previous [PR #115981](https://github.com/pytorch/pytorch/pull/115981) through the new dedicated repository [AOTriton](https://github.com/ROCm/aotriton).

- [x] Only supports MI200-series GPUs (i.e., `gcnArchName == gfx90a:sramecc+:xnack-`).
    * MI300X is now supported. More architectures will be added once Triton supports them.
- [x] Only supports power-of-two sequence lengths.
    * It now supports arbitrary sequence lengths.
- [ ] No support for varlen APIs.
    * The varlen APIs will be supported in the next release of AOTriton.
- [x] Only supports head dimensions 16, 32, 64, and 128.
    * It now supports arbitrary head dimensions <= 256 (see the sketch after this list).
- [x] Performance is still being optimized.
    * Kernels are now selected according to autotuning information from Triton.
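
Below is a minimal usage sketch, not part of this PR, of what the lifted restrictions mean at the `scaled_dot_product_attention` level on a ROCm build: a non-power-of-two sequence length together with a head dimension outside {16, 32, 64, 128}. The shapes are illustrative; forcing the Flash backend via `torch.backends.cuda.sdp_kernel` just keeps a silent fallback from masking a dispatch miss.

```python
import torch
import torch.nn.functional as F

# ROCm builds expose HIP devices through the "cuda" device type.
device = "cuda"

# seq_len=1000 (not a power of two), head_dim=80 (not in {16, 32, 64, 128}).
q = torch.randn(2, 8, 1000, 80, device=device, dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

# Force the Flash Attention backend so a fallback cannot hide a miss.
with torch.backends.cuda.sdp_kernel(
    enable_flash=True, enable_math=False, enable_mem_efficient=False
):
    out = F.scaled_dot_product_attention(q, k, v)

print(out.shape)  # torch.Size([2, 8, 1000, 80])
```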

Other improvements from AOTriton include:
* More flexible Tensor storage layouts (see the sketch after this list)
* A more flexible API
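
As an illustration of the storage-layout point, here is a hedged sketch, again not from the PR itself: q/k/v that are strided views (the usual `[B, S, H, D] -> [B, H, S, D]` transpose produced by a typical attention module) can be passed to `scaled_dot_product_attention` without an explicit `.contiguous()` copy. The packed-QKV setup and the names `B, S, H, D` are illustrative.

```python
import torch
import torch.nn.functional as F

B, S, H, D = 2, 512, 8, 64
# A packed QKV projection, as a typical attention module would produce it.
qkv = torch.randn(B, S, 3, H, D, device="cuda", dtype=torch.float16)
q, k, v = (t.squeeze(2).transpose(1, 2) for t in qkv.split(1, dim=2))
assert not q.is_contiguous()  # a strided view, not a packed copy

# The strided layout is accepted directly.
out = F.scaled_dot_product_attention(q, k, v)
```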

This is a more extensive fix for #112997.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/121561
Approved by: https://github.com/malfet, https://github.com/atalman
2024-03-12 01:16:53 +00:00
| File | Last commit | Date |
| --- | --- | --- |
| aotriton.cmake | Add Flash Attention support on ROCM (#121561) | 2024-03-12 01:16:53 +00:00 |
| EigenBLAS.cmake | | |
| nccl.cmake | Update submodule NCCL to v2.18.3 (#104993) | 2023-08-18 23:43:01 +00:00 |
| nnpack.cmake | [BC BREAKING] Remove outdated python submodules (#108236) | 2023-09-02 06:24:20 +00:00 |
| rccl.cmake | | |
| ucc.cmake | UCC PG build in CI (#81583) | 2022-08-10 00:23:47 +00:00 |