torch.cuda
===================================

.. automodule:: torch.cuda
.. currentmodule:: torch.cuda

.. autosummary::
    :toctree: generated
    :nosignatures:

    StreamContext
    can_device_access_peer
    current_blas_handle
    current_device
    current_stream
    cudart
    default_stream
    device
    device_count
    device_of
    get_arch_list
    get_device_capability
    get_device_name
    get_device_properties
    get_gencode_flags
    get_sync_debug_mode
    init
    ipc_collect
    is_available
    is_initialized
    memory_usage
    set_device
    set_stream
    set_sync_debug_mode
    stream
    synchronize
    utilization
    temperature
    power_draw
    clock_rate
    OutOfMemoryError

Random Number Generator
-------------------------
.. autosummary::
    :toctree: generated
    :nosignatures:

    get_rng_state
    get_rng_state_all
    set_rng_state
    set_rng_state_all
    manual_seed
    manual_seed_all
    seed
    seed_all
    initial_seed

Communication collectives
-------------------------

.. autosummary::
    :toctree: generated
    :nosignatures:

    comm.broadcast
    comm.broadcast_coalesced
    comm.reduce_add
    comm.scatter
    comm.gather

Streams and events
------------------
.. autosummary::
    :toctree: generated
    :nosignatures:

    Stream
    ExternalStream
    Event

Graphs (beta)
-------------
.. autosummary::
    :toctree: generated
    :nosignatures:

    is_current_stream_capturing
    graph_pool_handle
    CUDAGraph
    graph
    make_graphed_callables

.. _cuda-memory-management-api:
Memory management
-----------------
.. autosummary::
    :toctree: generated
    :nosignatures:

    empty_cache
    list_gpu_processes
    mem_get_info
    memory_stats
    memory_summary
    memory_snapshot
    memory_allocated
    max_memory_allocated
    reset_max_memory_allocated
    memory_reserved
    max_memory_reserved
    set_per_process_memory_fraction
    memory_cached
    max_memory_cached
    reset_max_memory_cached
    reset_peak_memory_stats
    caching_allocator_alloc
    caching_allocator_delete
    get_allocator_backend
    CUDAPluggableAllocator
    change_current_allocator
    MemPool
    MemPoolContext

.. FIXME The following doesn't seem to exist. Is it supposed to?
   https://github.com/pytorch/pytorch/issues/27785
   .. autofunction:: reset_max_memory_reserved

NVIDIA Tools Extension (NVTX)
-----------------------------

.. autosummary::
    :toctree: generated
    :nosignatures:

    nvtx.mark
    nvtx.range_push
    nvtx.range_pop
    nvtx.range

Jiterator (beta)
-----------------------------
.. autosummary::
    :toctree: generated
    :nosignatures:

    jiterator._create_jit_fn
    jiterator._create_multi_output_jit_fn

TunableOp
---------

Some operations can be implemented using more than one library or more than
one technique. For example, a GEMM could be implemented for CUDA or ROCm using
either the cublas/cublasLt libraries or the hipblas/hipblasLt libraries,
respectively. How does one know which implementation is the fastest and should
be chosen? That's what TunableOp provides. Certain operators have been
implemented using multiple strategies as Tunable Operators. At runtime, all
strategies are profiled and the fastest is selected for all subsequent
operations.

See the :doc:`documentation <cuda.tunable>` for information on how to use it.
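
As a minimal sketch (the ``enable``, ``tuning_enable``, and ``write_file`` calls
below are assumed from the ``torch.cuda.tunable`` API described there, and the
filename is arbitrary), tuning is switched on before running the operations of
interest:

.. code-block:: python

    import torch
    import torch.cuda.tunable as tunable

    # Assumed torch.cuda.tunable API: turn on TunableOp and online tuning so
    # each tunable operator is profiled the first time it runs.
    tunable.enable(True)
    tunable.tuning_enable(True)

    a = torch.randn(1024, 1024, device="cuda", dtype=torch.float16)
    b = torch.randn(1024, 1024, device="cuda", dtype=torch.float16)
    c = a @ b  # first call profiles the strategies; later calls reuse the winner

    # Optionally persist the selected strategies for future runs.
    tunable.write_file("tunableop_results.csv")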

.. toctree::
    :hidden:

    cuda.tunable

Stream Sanitizer (prototype)
----------------------------

CUDA Sanitizer is a prototype tool for detecting synchronization errors between streams in PyTorch.
See the :doc:`documentation <cuda._sanitizer>` for information on how to use it.
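
As a minimal sketch (``enable_cuda_sanitizer`` is assumed from the
``torch.cuda._sanitizer`` module documented there), the sanitizer is enabled
once at startup, before any streams are used:

.. code-block:: python

    import torch
    import torch.cuda._sanitizer as csan

    # Assumed entry point from torch.cuda._sanitizer: once enabled, subsequent
    # CUDA operations are checked for unsynchronized cross-stream tensor access.
    csan.enable_cuda_sanitizer()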

.. toctree::
    :hidden:

    cuda._sanitizer

.. This module needs to be documented. Adding here in the meantime
.. for tracking purposes
.. py:module:: torch.cuda.comm
.. py:module:: torch.cuda.error
.. py:module:: torch.cuda.gds
.. py:module:: torch.cuda.graphs
.. py:module:: torch.cuda.jiterator
.. py:module:: torch.cuda.memory
.. py:module:: torch.cuda.nccl
.. py:module:: torch.cuda.nvtx
.. py:module:: torch.cuda.profiler
.. py:module:: torch.cuda.random
.. py:module:: torch.cuda.sparse
.. py:module:: torch.cuda.streams