.. currentmodule:: torch.cuda.tunable

TunableOp
=========

.. note::
    This is a prototype feature, which means it is at an early stage
    for feedback and testing, and its components are subject to change.

Overview
--------

.. automodule:: torch.cuda.tunable
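
In practice, TunableOp is often driven through environment variables rather
than Python calls, so that an unmodified workload can be tuned. A minimal
sketch is below; the variable names are assumptions drawn from the module
documentation and may differ across PyTorch versions, so verify them against
the docs for your build::

    # Hedged sketch: configure TunableOp via environment variables
    # (names are assumptions; check your PyTorch version's docs).
    export PYTORCH_TUNABLEOP_ENABLED=1   # master switch for TunableOp
    export PYTORCH_TUNABLEOP_TUNING=0    # disable online tuning to avoid OOM
    export PYTORCH_TUNABLEOP_RECORD_UNTUNED=1  # record untuned GEMMs to a file
    export PYTORCH_TUNABLEOP_FILENAME=tunableop_results.csv  # tuning results file

With these set, a single inference run records the GEMMs it encounters instead
of tuning them in place; the recorded file can then be tuned offline.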

API Reference
-------------

.. autofunction:: enable
.. autofunction:: is_enabled
.. autofunction:: tuning_enable
.. autofunction:: tuning_is_enabled
.. autofunction:: record_untuned_enable
.. autofunction:: record_untuned_is_enabled
.. autofunction:: set_max_tuning_duration
.. autofunction:: get_max_tuning_duration
.. autofunction:: set_max_tuning_iterations
.. autofunction:: get_max_tuning_iterations
.. autofunction:: set_filename
.. autofunction:: get_filename
.. autofunction:: get_results
.. autofunction:: get_validators
.. autofunction:: write_file_on_exit
.. autofunction:: write_file
.. autofunction:: read_file
.. autofunction:: tune_gemm_in_file
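
Because online tuning of a GEMM-heavy workload (for example, LLM inference)
can exhaust device memory, the functions above can be combined into a
two-phase offline flow: record the untuned GEMMs during one pass, then tune
them from the recorded file. The sketch below is illustrative only; the
``offline_tune`` helper is hypothetical, the default untuned-GEMM filename
is an assumption, and a TunableOp-capable PyTorch build is required::

    # Hedged sketch: record GEMMs during one inference pass, then tune offline.
    # The helper name and the untuned-file default are assumptions.

    def offline_tune(untuned_file: str = "tunableop_untuned0.csv") -> bool:
        """Tune recorded GEMMs offline; returns False if TunableOp is unavailable."""
        try:
            import torch.cuda.tunable as tunable
        except ImportError:
            return False  # PyTorch (or TunableOp) not available in this build
        tunable.enable(True)                 # master switch for TunableOp
        tunable.tuning_enable(False)         # do not tune online (avoids OOM)
        tunable.record_untuned_enable(True)  # record encountered GEMMs instead
        # ... run the model's inference pass here so GEMM shapes are recorded ...
        tunable.tune_gemm_in_file(untuned_file)  # tune the recorded GEMMs offline
        return True

The key design point is that tuning is disabled while recording, so the
workload runs with its normal memory footprint; the expensive tuning search
happens later, against the recorded file, outside the inference process.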