Commit Graph

6 Commits

Author SHA1 Message Date
Natalia Gimelshein
37c6087334 Add split-K control to cuBLAS reduced-precision settings (#164766)
## Summary
- add a CuBLASReductionOption enum so the CUDA context can track reduced-precision and split-K options
- extend the Python bindings, backend helpers, and docs to accept an optional allow_splitk argument for fp16/bf16 matmul controls
- update cuBLAS/cuBLASLt call sites plus dynamo guards and tests to respect the new combinations

## Testing
- python test/test_cuda.py TestCuda.test_cublas_allow_fp16_reduced_precision_reduction_get_set -v *(fails: ModuleNotFoundError: No module named 'psutil')*

------
https://chatgpt.com/codex/tasks/task_e_68e404623178832f8a3e1d34e1e175da

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164766
Approved by: https://github.com/malfet, https://github.com/albanD
2025-10-08 18:48:45 +00:00
PyTorch MergeBot
9f6e1b8730 Revert "[ROCm] SDPA fix mem fault when dropout is enabled (#154864)"
This reverts commit 3caddd4daa.

Reverted https://github.com/pytorch/pytorch/pull/154864 on behalf of https://github.com/atalman due to reverted internally ([comment](https://github.com/pytorch/pytorch/pull/154864#issuecomment-3225554119))
2025-08-26 20:03:59 +00:00
Andy Lugo
3caddd4daa [ROCm] SDPA fix mem fault when dropout is enabled (#154864)
Fixes issue that exhibited a device side memory access fault due to incorrect tensor life management

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154864
Approved by: https://github.com/jeffdaily

Co-authored-by: Jeff Daily <jeff.daily@amd.com>
2025-08-21 14:23:13 +00:00
Jeff Daily
9b29166f57 [ROCm] add flag torch.backends.miopen.immediate (#158951)
The MIOpen integration has changed over the years.  In the past, the MIOpen default for benchmark was True and if it were set to False it would use MIOpen Immediate Mode.  But with #145294 the MIOpen benchmark default changed to False and to activate immediate mode you would set the deterministic flag to True.  This has proved too restrictive because benchmark and deterministic flags are independent from immediate mode.  Thus, immediate mode needs its own flag.  Though MIOpen still masquerades behind torch.backends.cudnn and its flags, it seemed inappropriate to add an miopen-exclusive flag to the set of cudnn flags.  This PR adds the first miopen-only flag to control its immediate mode.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/158951
Approved by: https://github.com/jeffdaily

Co-authored-by: Jeff Daily <jeff.daily@amd.com>
2025-07-25 04:01:51 +00:00
Jiang, Yanbing
900fba4c07 Update warning of TF32 (#158209)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/158209
Approved by: https://github.com/jansel
2025-07-16 01:28:50 +00:00
Kazuaki Ishizaki
b00b641ff1 [Docs] Convert to markdown: accelerator.rst, amp.rst, autograd.rst, backends.rst, benchmark_utils.rst (#155762)
Fixes #155013

Pull Request resolved: https://github.com/pytorch/pytorch/pull/155762
Approved by: https://github.com/svekars
2025-06-12 02:55:06 +00:00