Mirror of https://github.com/zebrajr/pytorch.git, synced 2025-12-07 00:21:07 +01:00
This patch addresses the major limitations of our previous [PR #115981](https://github.com/pytorch/pytorch/pull/115981) through the new dedicated repository [AOTriton](https://github.com/ROCm/aotriton):

- [x] Only supports MI200 series GPU (i.e., `gcnArchName == gfx90a:sramecc+:xnack-`).
  * MI300X is supported. More architectures will be added once Triton supports them.
- [x] Only supports power-of-two sequence lengths.
  * Arbitrary sequence lengths are now supported.
- [ ] No support for varlen APIs.
  * The varlen API will be supported in the next release of AOTriton.
- [x] Only supports head dimensions 16, 32, 64, 128.
  * Arbitrary head dimensions <= 256 are now supported.
- [x] Performance is still being optimized.
  * The kernel is selected according to autotune information from Triton.

Other improvements from AOTriton include:

* Allow more flexible Tensor storage layout
* More flexible API

This is a more extensive fix to #112997.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/121561
Approved by: https://github.com/malfet, https://github.com/atalman

| | | |
|---|---|---|
| External | ||
| Modules | ||
| Modules_CUDA_fix | ||
| public | ||
| Allowlist.cmake | ||
| BuildVariables.cmake | ||
| Caffe2Config.cmake.in | ||
| CheckAbi.cmake | ||
| cmake_uninstall.cmake.in | ||
| Codegen.cmake | ||
| DebugHelper.cmake | ||
| Dependencies.cmake | ||
| FlatBuffers.cmake | ||
| GoogleTestPatch.cmake | ||
| IncludeSource.cpp.in | ||
| iOS.cmake | ||
| Metal.cmake | ||
| MiscCheck.cmake | ||
| ProtoBuf.cmake | ||
| ProtoBufPatch.cmake | ||
| Summary.cmake | ||
| TorchConfig.cmake.in | ||
| TorchConfigVersion.cmake.in | ||
| VulkanCodegen.cmake | ||
| VulkanDependencies.cmake | ||
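
The shape limits listed in the commit message can be written down as a small check. This is only a sketch of the stated constraints: the helper names `supported_before` and `supported_after` are hypothetical and are not part of PyTorch or AOTriton, and the GPU architecture and varlen limits are deliberately left out because the message does not give enough detail to encode them.

```python
# Hypothetical helpers that encode only the shape limits stated in the
# commit message. They are NOT part of PyTorch or AOTriton.

OLD_HEAD_DIMS = {16, 32, 64, 128}  # head dimensions accepted by the previous PR
MAX_HEAD_DIM = 256                 # "arbitrary head dimension <= 256"


def is_power_of_two(n: int) -> bool:
    """Return True if n is a positive power of two."""
    return n > 0 and (n & (n - 1)) == 0


def supported_before(seq_len: int, head_dim: int) -> bool:
    """Previous limits: power-of-two sequence length and a fixed set of head dims."""
    return is_power_of_two(seq_len) and head_dim in OLD_HEAD_DIMS


def supported_after(seq_len: int, head_dim: int) -> bool:
    """Limits after this patch: any positive sequence length, head dim up to 256."""
    return seq_len > 0 and 0 < head_dim <= MAX_HEAD_DIM


if __name__ == "__main__":
    # A sequence length of 1000 with head dimension 80 was rejected before
    # and falls within the limits described for this patch.
    print(supported_before(1000, 80), supported_after(1000, 80))
```

Under these assumptions, a shape such as sequence length 1000 with head dimension 80 moves from unsupported to supported, while head dimensions above 256 remain outside the stated range.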