pytorch/docs/source/cuda.tunable.rst
Jin Zhou 5516ac5c21 [ROCm] Tunableop record untuned (#128813)
With TunableOp enabled, it is easy to run out of memory (OOM), because applications such as LLM inference already need a large amount of video memory. An offline mode for tuning the GEMMs is therefore needed. This PR provides an offline mode for TunableOp:

- Record untuned GEMMs to a file.

- Add a Python API, tune_gemm_in_file, that reads the untuned file and tunes the GEMMs it lists.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/128813
Approved by: https://github.com/jeffdaily, https://github.com/hongxiayang, https://github.com/naromero77amd

Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
2024-10-09 21:59:03 +00:00

.. currentmodule:: torch.cuda.tunable

TunableOp
=========

.. note::
    This is a prototype feature, which means it is at an early stage
    for feedback and testing, and its components are subject to change.

Overview
--------

.. automodule:: torch.cuda.tunable

API Reference
-------------

.. autofunction:: enable
.. autofunction:: is_enabled
.. autofunction:: tuning_enable
.. autofunction:: tuning_is_enabled
.. autofunction:: record_untuned_enable
.. autofunction:: record_untuned_is_enabled
.. autofunction:: set_max_tuning_duration
.. autofunction:: get_max_tuning_duration
.. autofunction:: set_max_tuning_iterations
.. autofunction:: get_max_tuning_iterations
.. autofunction:: set_filename
.. autofunction:: get_filename
.. autofunction:: get_results
.. autofunction:: get_validators
.. autofunction:: write_file_on_exit
.. autofunction:: write_file
.. autofunction:: read_file
.. autofunction:: tune_gemm_in_file
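
The offline mode described in the commit message (record untuned GEMMs first, tune them later from the file) might be driven roughly as in the sketch below. It uses only functions listed in the API reference above. The helper names ``start_recording`` and ``tune_recorded_gemms`` are hypothetical, as is the choice to pass the module in as a parameter, which is done only so the sketch can be exercised without a GPU. The call order and the assumption that tuning must be enabled before ``tune_gemm_in_file`` is called are not confirmed by this page.

```python
def start_recording(tunable):
    """Phase 1 (hypothetical helper): record untuned GEMMs, do not tune yet.

    `tunable` is expected to be the torch.cuda.tunable module.
    """
    tunable.enable(True)                 # turn TunableOp on
    tunable.tuning_enable(False)         # skip online tuning to save memory
    tunable.record_untuned_enable(True)  # record GEMMs that have no tuned entry


def tune_recorded_gemms(tunable, untuned_file):
    """Phase 2 (hypothetical helper): tune the GEMMs listed in `untuned_file`.

    Assumption: TunableOp and tuning must both be enabled beforehand.
    """
    tunable.enable(True)
    tunable.tuning_enable(True)
    tunable.record_untuned_enable(False)  # stop recording while tuning
    tunable.tune_gemm_in_file(untuned_file)
```

In a real run, the first process would ``import torch.cuda.tunable as tunable``, call ``start_recording(tunable)``, and then execute the workload. A second process with more free memory would call ``tune_recorded_gemms(tunable, path)``, where ``path`` is wherever the untuned GEMMs were recorded. The name and location of that file are not given on this page, so check them in your installation.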