pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 00:21:07 +01:00

PaulZhang12 3ed5f1fb77 [CUDA][cuBLAS] Aten GEMM overload for FP32 output from FP16/BF16 inputs (#150812)		2025-04-18 01:53:26 +00:00

Enable FP32 output from FP16/BF16 GEMMs in aten with cuBLAS. Accumulation for these GEMMs is generally already done in FP32. Adds the functionality to the following aten operators:
* mm
* bmm
* addmm
* baddmm

Follow-up to customer issue: https://github.com/pytorch/pytorch/issues/146241#issuecomment-2781889390
Differential Revision: [D73126191](https://our.internmc.facebook.com/intern/diff/D73126191)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/150812
Approved by: https://github.com/ngimel, https://github.com/eqy
aoti_abi_check	[AOTI] Fix complex64 not defined (#132810)	2024-08-08 18:08:23 +00:00
aoti_inference	[AOTInductor] Add states for constant folding process (#151273)	2025-04-17 16:41:38 +00:00
api	Set requires grad in TensorMaker::make_tensor() (#148255)	2025-03-29 08:06:42 +00:00
c10d	[c10d] Add `_allgather_base`, `reduce_scatter`, and `_reduce_scatter_base` into ProcessGroupMPI to enable FSDP with MPI backend (#150162)	2025-04-14 19:31:38 +00:00
common
dist_autograd	Set RUNPATH so installed tests can find the required shared libraries (#136627)	2024-10-25 09:38:08 +00:00
jit	[Profiler/Easy] Remove temp flag for on-demand Memory Snapshot (#151068)	2025-04-11 18:50:25 +00:00
lazy	Introduce cache clearing APIs for the lazy graph executor (#144489)	2025-01-29 17:38:01 +00:00
lite_interpreter_runtime	Add None return type to init -- tests (#132352)	2024-08-01 15:44:51 +00:00
monitor
profiler	[codemod] Fix a few unused-variable issues in pytorch (#143517)	2024-12-19 00:18:08 +00:00
rpc	[rpc] Fix unit test after c10::nullopt removal (#143690)	2024-12-20 23:36:07 +00:00
tensorexpr	[CUDA][cuBLAS] Aten GEMM overload for FP32 output from FP16/BF16 inputs (#150812)	2025-04-18 01:53:26 +00:00
__init__.py