Mirror of https://github.com/zebrajr/pytorch.git (synced 2025-12-06 12:20:52 +01:00)

The original implementation set beta to 1, which causes out (C) to be added to the output. Thus, if the output is not initialized to zero beforehand, the result can be incorrect. Removing alpha and beta fixes the issue. Thanks to @ngimel for figuring out the root cause.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/162040
Approved by: https://github.com/danielvegamyhre

| Name |
|---|
| cutlass/gemm/kernel |
| AsyncMM.cu |
| AsyncMM.cuh |
| CUDAEventCache.cpp |
| CUDAEventCache.hpp |
| StreamBlock.cpp |
| StreamBlock.cu |
| StreamBlock.cuh |
| StreamBlock.hpp |
| utils.cpp |
| utils.hpp |
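
The commit message above attributes the bug to a GEMM epilogue of the form `out = alpha * (A @ B) + beta * out` running with beta set to 1 on an output buffer that was never zeroed. The following is a minimal CPU sketch of that effect, not the CUTLASS kernel in this directory. The names `gemm_reference` and `run_with_stale_output` are hypothetical, and the value 100 merely stands in for whatever an uninitialized buffer might contain.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference for a GEMM with a linear-combination epilogue:
//   out = alpha * (A @ B) + beta * out
// A is M x K, B is K x N, out is M x N, all stored row-major.
void gemm_reference(const std::vector<float>& A,
                    const std::vector<float>& B,
                    std::vector<float>& out,
                    std::size_t M, std::size_t N, std::size_t K,
                    float alpha, float beta) {
  assert(A.size() == M * K && B.size() == K * N && out.size() == M * N);
  for (std::size_t i = 0; i < M; ++i) {
    for (std::size_t j = 0; j < N; ++j) {
      float acc = 0.0f;
      for (std::size_t k = 0; k < K; ++k) {
        acc += A[i * K + k] * B[k * N + j];
      }
      // BLAS convention: when beta == 0 the old contents of out are not read.
      const float prior = (beta == 0.0f) ? 0.0f : beta * out[i * N + j];
      out[i * N + j] = alpha * acc + prior;
    }
  }
}

// Multiplies two fixed 2x2 matrices into a buffer pre-filled with `stale`,
// which stands in for uninitialized output memory (an assumption of this sketch).
std::vector<float> run_with_stale_output(float beta, float stale) {
  const std::vector<float> A = {1, 2, 3, 4};
  const std::vector<float> B = {5, 6, 7, 8};
  std::vector<float> out(4, stale);
  gemm_reference(A, B, out, 2, 2, 2, 1.0f, beta);
  return out;
}
```

Calling `run_with_stale_output(1.0f, 100.0f)` yields `{119, 122, 143, 150}`, the true product `{19, 22, 43, 50}` shifted by the stale contents, while `run_with_stale_output(0.0f, 100.0f)` returns the true product. An epilogue that never reads the previous output behaves like the beta = 0 case in this sketch.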