pytorch/torch/cuda
Nichols A. Romero a99332eb25 [ROCM] Support Multi-GPU offline tuning in TunableOp (#139673)
This PR enhances offline tuning to support multiple GPUs.

High-level description of algorithm:
- Duplicate GEMMs are first eliminated
- The remaining GEMMs are distributed across the available GPUs for tuning
- Results are gathered into a file with `_full` in the filename
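
A minimal sketch of the three steps above, in plain Python. It is illustrative only and is not the code from this PR: the helper names (`dedupe_gemms`, `distribute`, `gather`), the round-robin assignment, and the dict-based result format are all assumptions, and no real tuning or `torch.cuda.tunable` call is made.

```python
from collections import OrderedDict


def dedupe_gemms(lines):
    """Step 1 (hypothetical helper): drop duplicate GEMM descriptions,
    keeping first-seen order and ignoring blank lines."""
    stripped = (line.strip() for line in lines)
    return list(OrderedDict.fromkeys(s for s in stripped if s))


def distribute(gemms, num_gpus):
    """Step 2 (hypothetical helper): assign GEMMs to GPU indices.
    Round-robin is an assumption; the PR may balance work differently."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    buckets = [[] for _ in range(num_gpus)]
    for i, gemm in enumerate(gemms):
        buckets[i % num_gpus].append(gemm)
    return buckets


def gather(per_gpu_results):
    """Step 3 (hypothetical helper): merge per-GPU result dicts into one
    mapping, standing in for the combined `_full` results file."""
    merged = {}
    for results in per_gpu_results:
        merged.update(results)
    return merged


if __name__ == "__main__":
    recorded = ["gemm_A\n", "gemm_B\n", "gemm_A\n", "gemm_C\n"]
    unique = dedupe_gemms(recorded)
    buckets = distribute(unique, num_gpus=2)
    # Pretend each GPU "tuned" its GEMMs and reported a placeholder solution.
    fake_results = [{g: f"solution_from_gpu{i}" for g in b} for i, b in enumerate(buckets)]
    print(gather(fake_results))
```

Removing duplicates before distribution means each unique GEMM is tuned only once, so the per-GPU work lists never overlap and the merge step needs no conflict handling.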

Also adds offline tuning support for GemmAndBias and ScaledGemm.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/139673
Approved by: https://github.com/jeffdaily, https://github.com/hongxiayang
2024-11-26 19:07:41 +00:00
amp [BE][Easy][17/19] enforce style for empty lines in import segments in torch/[a-c]*/ and torch/[e-n]*/ (#129769) 2024-08-04 10:24:09 +00:00
__init__.py [ROCm] AMDSMI memory usage unification (#139900) 2024-11-21 21:11:39 +00:00
_gpu_trace.py Refactor gpu trace to be device-agnostic (#121794) 2024-03-30 13:04:38 +00:00
_memory_viz.py Remove unused Python variables in torch/[b-z]* (#136963) 2024-10-19 16:45:22 +00:00
_sanitizer.py Add proper handling for view and factory function for csan (#138236) 2024-10-18 14:04:18 +00:00
_utils.py
comm.py [BE][Easy][17/19] enforce style for empty lines in import segments in torch/[a-c]*/ and torch/[e-n]*/ (#129769) 2024-08-04 10:24:09 +00:00
error.py
gds.py [Reland] Add wrappers for synchronous GPUDirect Storage APIs (#133489) 2024-08-15 17:11:52 +00:00
graphs.py Remove unused Python variables in torch/[b-z]* (#136963) 2024-10-19 16:45:22 +00:00
jiterator.py [BE][Easy][17/19] enforce style for empty lines in import segments in torch/[a-c]*/ and torch/[e-n]*/ (#129769) 2024-08-04 10:24:09 +00:00
memory.py fix: Add type annotation to _record_memory_history (#140545) 2024-11-14 17:44:46 +00:00
nccl.py Flip default value for mypy disallow_untyped_defs [4/11] (#127841) 2024-06-08 18:36:48 +00:00
nvtx.py [BE][Easy][17/19] enforce style for empty lines in import segments in torch/[a-c]*/ and torch/[e-n]*/ (#129769) 2024-08-04 10:24:09 +00:00
profiler.py [BE][Easy][17/19] enforce style for empty lines in import segments in torch/[a-c]*/ and torch/[e-n]*/ (#129769) 2024-08-04 10:24:09 +00:00
random.py [BE]: Apply PERF401 autofixes from ruff (#140980) 2024-11-20 17:52:07 +00:00
sparse.py
streams.py Use torch.Stream&torch.Event for Dynamo capature (#134850) 2024-10-02 14:15:33 +00:00
tunable.py [ROCM] Support Multi-GPU offline tuning in TunableOp (#139673) 2024-11-26 19:07:41 +00:00