Mirror of https://github.com/zebrajr/pytorch.git, synced 2025-12-06 12:20:52 +01:00
Notable new features and optimizations for SDPA operators on AMD systems from AOTriton 0.9b:

* Optimized the non-power-of-two head dimensions 48, 80, 96, 160, 192, and 224. Inputs with these head dimensions no longer need padding to a power of two.
* `is_causal=True` cases are now supported with a persistent dynamic algorithm, which requires an atomic tensor but load-balances between different CTAs.
* `dropout_p > 0.0` cases now support full 64-bit offsets and use all i64x4 PRNG outputs.
* The precise AOTriton shared-library version can now be identified with `readelf -p .comment libaotriton_v2.so`.
  + However, this does not guarantee that the GPU images stored under `aotriton.images` have the same version, since they can be overwritten.
* The newly added fused backward kernel is used for smaller workloads, due to lower kernel-invocation overhead.
* Support for gfx1201 (RX 9070 XT). This needs to be enabled at runtime with `TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL=1`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/148433
Approved by: https://github.com/jeffdaily
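As a minimal sketch of the workload these kernels serve, the following calls PyTorch's public SDPA API with one of the newly optimized non-power-of-two head dimensions (48) and `is_causal=True`. The shapes here are illustrative, not taken from the PR; on machines without a supported AMD GPU, the call simply falls through to a reference backend.

```python
import torch
import torch.nn.functional as F

# batch=2, heads=4, seq_len=128, head_dim=48: with AOTriton 0.9b on AMD GPUs,
# head dimension 48 no longer requires padding to a power of two.
q = torch.randn(2, 4, 128, 48)
k = torch.randn(2, 4, 128, 48)
v = torch.randn(2, 4, 128, 48)

# is_causal=True is now handled by the persistent dynamic algorithm on ROCm.
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)  # torch.Size([2, 4, 128, 48])
```

The output shape always matches the query tensor, regardless of which backend (AOTriton, flash, or the math fallback) ends up dispatched.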
| File |
|---|
| aotriton.cmake |
| EigenBLAS.cmake |
| nccl.cmake |
| nnpack.cmake |
| rccl.cmake |
| ucc.cmake |