Summary: https://github.com/pytorch/pytorch/issues/67578 disabled reduced-precision reductions for FP16 GEMMs. After benchmarking, we've found that this has substantial performance impacts for common GEMM shapes (e.g., those found in popular instantiations of multi-headed attention) on architectures such as Volta. As these performance regressions may come as a surprise to current users, this PR adds a toggle, `torch.backends.cuda.matmul.allow_fp16_reduced_precision_reduction`, to disable reduced-precision reductions rather than making that the default behavior.

CC ngimel ptrblck stas00

Note that the behavior after the previous PR can be replicated with `torch.backends.cuda.matmul.allow_fp16_reduced_precision_reduction = False`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67946

Reviewed By: zou3519

Differential Revision: D32289896

Pulled By: ngimel

fbshipit-source-id: a1ea2918b77e27a7d9b391e030417802a0174abe
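A minimal sketch of the toggle described above, assuming a CUDA build of PyTorch that includes this PR:

```python
import torch

# Setting the flag to False disables reduced-precision (FP16) accumulation
# in FP16 GEMM reductions, replicating the behavior after #67578
# (full FP32 accumulation: more accurate, but slower on e.g. Volta).
torch.backends.cuda.matmul.allow_fp16_reduced_precision_reduction = False

# Leaving it at its default (True) keeps the faster reduced-precision
# reductions that this PR preserves as the default behavior.
torch.backends.cuda.matmul.allow_fp16_reduced_precision_reduction = True
```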
| File |
|---|
| amp_examples.rst |
| autograd.rst |
| broadcasting.rst |
| cpu_threading_runtimes.svg |
| cpu_threading_torchscript_inference.rst |
| cpu_threading_torchscript_inference.svg |
| cuda.rst |
| ddp.rst |
| extending.rst |
| faq.rst |
| gradcheck.rst |
| hip.rst |
| large_scale_deployments.rst |
| modules.rst |
| multiprocessing.rst |
| numerical_accuracy.rst |
| randomness.rst |
| serialization.rst |
| windows.rst |