pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 00:21:07 +01:00

Xinya Zhang a37e22de70 Add Flash Attention support on ROCM (#121561)

This patch addresses the major limitations in our previous [PR #115981](https://github.com/pytorch/pytorch/pull/115981) through the new dedicated repository [AOTriton](https://github.com/ROCm/aotriton)

- [x] Only supports MI200 series GPU (i.e., `gcnArchName == gfx90a:sramecc+:xnack-`).
    * MI300X is supported. More architectures will be added once Triton supports them.
- [x] Only supports power of two sequence lengths.
    * Now it supports arbitrary sequence lengths.
- [ ] No support for varlen APIs.
    * The varlen API will be supported in the next release of AOTriton.
- [x] Only supports head dimension 16, 32, 64, 128.
    * Now it supports arbitrary head dimension <= 256.
- [x] Performance is still being optimized.
    * The kernel is selected according to autotune information from Triton.

Other improvements from AOTriton include
* Allow more flexible Tensor storage layout
* More flexible API

This is a more extensive fix to #112997

Pull Request resolved: https://github.com/pytorch/pytorch/pull/121561
Approved by: https://github.com/malfet, https://github.com/atalman

2024-03-12 01:16:53 +00:00
External	Add Flash Attention support on ROCM (#121561)	2024-03-12 01:16:53 +00:00
Modules	enable mkl_gemm_f16f16f32 in cpublas::gemm (#118367)	2024-01-31 18:37:42 +00:00
Modules_CUDA_fix	fix CMake FindCUDA module for cross-compiling (#121590)	2024-03-11 20:09:52 +00:00
public	[cuDNN] Cleanup cuDNN < 8.1 ifdefs (#120862)	2024-03-07 01:46:25 +00:00
Allowlist.cmake
BuildVariables.cmake
Caffe2Config.cmake.in	[2/4] Intel GPU Runtime Upstreaming for Device (#116833)	2024-01-18 05:02:42 +00:00
CheckAbi.cmake	remove abi uncertainty and potential abi conflict (#94306)	2023-02-09 09:54:04 +00:00
cmake_uninstall.cmake.in
Codegen.cmake	[Cmake] Check that gcc-9.4 or newer is used (#112858)	2023-11-06 17:19:53 +00:00
DebugHelper.cmake
Dependencies.cmake	Add Flash Attention support on ROCM (#121561)	2024-03-12 01:16:53 +00:00
FlatBuffers.cmake
GoogleTestPatch.cmake	Simplify cmake code (#91546)	2023-02-08 01:05:19 +00:00
IncludeSource.cpp.in
iOS.cmake	[executorch] Update iOS toolchain with a modern cmake syntax. (#115799)	2023-12-15 00:51:30 +00:00
Metal.cmake	[CI] Compile on M1 natively (#95719)	2023-03-01 04:20:42 +00:00
MiscCheck.cmake	[BE] Cleanup CMake flag suppressions (#97584)	2023-03-27 18:46:09 +00:00
ProtoBuf.cmake	[BE] Cleanup CMake flag suppressions (#97584)	2023-03-27 18:46:09 +00:00
ProtoBufPatch.cmake	Migrate PyTorch to C++17 (#85969)	2022-12-08 02:27:48 +00:00
Summary.cmake	[1/4] Intel GPU Runtime Upstreaming for Device (#116019)	2024-01-12 07:36:25 +00:00
TorchConfig.cmake.in	Revert "[Reland2] Update NVTX to NVTX3 (#109843)"	2023-12-05 16:10:20 +00:00
TorchConfigVersion.cmake.in
VulkanCodegen.cmake	[pt-vulkan] Enable Python code blocks in shader templates and upgrade shader template generation (#115948)	2023-12-20 05:47:33 +00:00
VulkanDependencies.cmake	[Vulkan] Remove GLSL Code Gen (#91912)	2023-01-10 20:29:47 +00:00