pytorch/torch/csrc/distributed/c10d/cuda
Chien-Chin Huang 5b90e85112 [AsyncTP] Fixes AsyncMM (#162040)
The original implementation set beta to 1, which causes the out (C) to be added to the output. Thus, if the output is not initialized to zero beforehand, the result can be incorrect.

Removing alpha and beta fixes the issue.
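The bug follows from the standard GEMM epilogue, D = alpha * (A @ B) + beta * C: with beta = 1 the prior contents of the output buffer are accumulated into the result. A minimal NumPy sketch (not the CUTLASS kernel itself; `gemm` here is a hypothetical stand-in) illustrates why an uninitialized output buffer corrupts the result:

```python
import numpy as np

def gemm(a, b, c, alpha=1.0, beta=0.0):
    # Standard GEMM epilogue semantics: D = alpha * (A @ B) + beta * C.
    # beta = 1 accumulates whatever is already in the output buffer C;
    # beta = 0 ignores it.
    return alpha * (a @ b) + beta * c

rng = np.random.default_rng(0)
a = rng.standard_normal((4, 3))
b = rng.standard_normal((3, 2))
stale = rng.standard_normal((4, 2))  # stands in for an uninitialized output buffer

buggy = gemm(a, b, stale, beta=1.0)  # stale data leaks into the result
fixed = gemm(a, b, stale, beta=0.0)  # output buffer contents are ignored

assert np.allclose(fixed, a @ b)
assert not np.allclose(buggy, a @ b)
```

Dropping the alpha/beta terms entirely, as the fix does, is equivalent to beta = 0: the kernel writes a pure product and no longer depends on the output buffer being zeroed first.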

Thanks @ngimel for figuring out the root cause.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/162040
Approved by: https://github.com/danielvegamyhre
2025-09-08 10:53:59 +00:00
cutlass/gemm/kernel [BE][11/16] fix typos in torch/ (torch/csrc/distributed/) (#156321) 2025-06-23 02:57:50 +00:00
AsyncMM.cu [AsyncTP] Fixes AsyncMM (#162040) 2025-09-08 10:53:59 +00:00
AsyncMM.cuh
CUDAEventCache.cpp [cca] [c10d] Refactor CUDAEventCache into separate files (#158616) 2025-07-19 02:51:28 +00:00
CUDAEventCache.hpp [cca] [c10d] Refactor CUDAEventCache into separate files (#158616) 2025-07-19 02:51:28 +00:00
StreamBlock.cpp Work: block_current_stream API (#156883) 2025-07-08 23:55:46 +00:00
StreamBlock.cu [c10d] block_current_stream: correctness fixes (#158757) 2025-07-21 22:23:44 +00:00
StreamBlock.cuh [c10d] block_current_stream: correctness fixes (#158757) 2025-07-21 22:23:44 +00:00
StreamBlock.hpp Work: block_current_stream API (#156883) 2025-07-08 23:55:46 +00:00
utils.cpp [SymmMem] Find NVSHMEM from system installation (#157513) 2025-07-04 03:34:44 +00:00
utils.hpp