pytorch/cmake/External
Xinya Zhang a37e22de70 Add Flash Attention support on ROCM (#121561)
This patch addresses the major limitations of our previous [PR #115981](https://github.com/pytorch/pytorch/pull/115981) through the new dedicated repository [AOTriton](https://github.com/ROCm/aotriton).

- [x] Only supports MI200-series GPUs (i.e., `gcnArchName == gfx90a:sramecc+:xnack-`).
    * MI300X is now supported. More architectures will be added once Triton supports them.
- [x] Only supports power-of-two sequence lengths.
    * It now supports arbitrary sequence lengths.
- [ ] No support for varlen APIs.
    * The varlen APIs will be supported in the next release of AOTriton.
- [x] Only supports head dimensions 16, 32, 64, and 128.
    * It now supports arbitrary head dimensions <= 256 (see the sketch after this list).
- [x] Performance is still being optimized.
    * Kernels are now selected according to autotuning information from Triton.
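
Below is a minimal usage sketch, not part of this PR, of what the lifted restrictions mean at the `scaled_dot_product_attention` level on a ROCm build: a non-power-of-two sequence length together with a head dimension outside {16, 32, 64, 128}. The shapes are illustrative; forcing the Flash backend via `torch.backends.cuda.sdp_kernel` just keeps a silent fallback from masking a dispatch miss.

```python
import torch
import torch.nn.functional as F

# ROCm builds expose HIP devices through the "cuda" device type.
device = "cuda"

# seq_len=1000 (not a power of two), head_dim=80 (not in {16, 32, 64, 128}).
q = torch.randn(2, 8, 1000, 80, device=device, dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

# Force the Flash Attention backend so a fallback cannot hide a miss.
with torch.backends.cuda.sdp_kernel(
    enable_flash=True, enable_math=False, enable_mem_efficient=False
):
    out = F.scaled_dot_product_attention(q, k, v)

print(out.shape)  # torch.Size([2, 8, 1000, 80])
```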

Other improvements from AOTriton include:
* More flexible Tensor storage layouts (see the sketch after this list)
* A more flexible API
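
As an illustration of the storage-layout point, here is a hedged sketch, again not from the PR itself: q/k/v that are strided views (the usual `[B, S, H, D] -> [B, H, S, D]` transpose produced by a typical attention module) can be passed to `scaled_dot_product_attention` without an explicit `.contiguous()` copy. The packed-QKV setup and the names `B, S, H, D` are illustrative.

```python
import torch
import torch.nn.functional as F

B, S, H, D = 2, 512, 8, 64
# A packed QKV projection, as a typical attention module would produce it.
qkv = torch.randn(B, S, 3, H, D, device="cuda", dtype=torch.float16)
q, k, v = (t.squeeze(2).transpose(1, 2) for t in qkv.split(1, dim=2))
assert not q.is_contiguous()  # a strided view, not a packed copy

# The strided layout is accepted directly.
out = F.scaled_dot_product_attention(q, k, v)
```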

This is a more extensive fix for #112997.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/121561
Approved by: https://github.com/malfet, https://github.com/atalman
2024-03-12 01:16:53 +00:00
| File | Last commit | Date |
| --- | --- | --- |
| aotriton.cmake | Add Flash Attention support on ROCM (#121561) | 2024-03-12 01:16:53 +00:00 |
| EigenBLAS.cmake | | |
| nccl.cmake | Update submodule NCCL to v2.18.3 (#104993) | 2023-08-18 23:43:01 +00:00 |
| nnpack.cmake | [BC BREAKING] Remove outdated python submodules (#108236) | 2023-09-02 06:24:20 +00:00 |
| rccl.cmake | | |
| ucc.cmake | UCC PG build in CI (#81583) | 2022-08-10 00:23:47 +00:00 |