Mirror of https://github.com/zebrajr/pytorch.git (synced 2025-12-07 00:21:07 +01:00)
Enable FP32 output from FP16/BF16 GEMMs in aten with cuBLAS. Accumulation for these GEMMs is generally already done in FP32. Adds this functionality to the following aten operators:

* mm
* bmm
* addmm
* baddbmm

Follow-up to customer issue: https://github.com/pytorch/pytorch/issues/146241#issuecomment-2781889390

Differential Revision: [D73126191](https://our.internmc.facebook.com/intern/diff/D73126191)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/150812

Approved by: https://github.com/ngimel, https://github.com/eqy
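For context, a minimal sketch of the numerical motivation behind the change. It does not use the new code path or assume its exact API surface; it only emulates an FP32-output GEMM with explicit casts around standard `torch.mm` calls to show the extra rounding that an FP16 output dtype introduces on top of cuBLAS's FP32 accumulation.

```python
# Sketch only: emulates FP16-input / FP32-output GEMM with explicit casts.
# The commit aims to provide this result natively, without the upcasts.
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

a = torch.randn(256, 512, device=device)
b = torch.randn(512, 128, device=device)
a16, b16 = a.half(), b.half()

# Emulated FP32 output: FP16-quantized inputs, FP32 accumulation, FP32 output.
fp32_out = torch.mm(a16.float(), b16.float())

# Standard FP16 GEMM: same FP32 accumulation inside cuBLAS,
# but the result is rounded to FP16 before being returned.
fp16_out = torch.mm(a16, b16).float()

print("extra error from FP16 output rounding:",
      (fp16_out - fp32_out).abs().max().item())
```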
Directory listing:

* aoti_abi_check
* aoti_inference
* api
* c10d
* common
* dist_autograd
* jit
* lazy
* lite_interpreter_runtime
* monitor
* profiler
* rpc
* tensorexpr
* __init__.py