Multiprocessing package - torch.multiprocessing
===============================================

.. automodule:: torch.multiprocessing
.. currentmodule:: torch.multiprocessing

.. warning::

    If the main process exits abruptly (e.g. because of an incoming signal),
    Python's ``multiprocessing`` sometimes fails to clean up its children.
    It's a known caveat, so if you're seeing any resource leaks after
    interrupting the interpreter, it probably means that this has just
    happened to you.

Strategy management
-------------------

.. autofunction:: get_all_sharing_strategies
.. autofunction:: get_sharing_strategy
.. autofunction:: set_sharing_strategy

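For example, a minimal sketch of inspecting and switching the sharing
strategy (assuming the ``file_system`` strategy is available on your
platform)::

    import torch.multiprocessing as mp

    # Strategies available on this platform,
    # e.g. {'file_descriptor', 'file_system'}
    print(mp.get_all_sharing_strategies())

    # Strategy currently in use
    print(mp.get_sharing_strategy())

    # Switch strategies (see "Sharing strategies" below for the trade-offs)
    mp.set_sharing_strategy('file_system')
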
Sharing CUDA tensors
--------------------

Sharing CUDA tensors between processes is supported only in Python 3, using
the ``spawn`` or ``forkserver`` start methods. :mod:`python:multiprocessing` in
Python 2 can only create subprocesses using ``fork``, and it's not supported
by the CUDA runtime.

.. warning::

    The CUDA API requires that an allocation exported to other processes
    remains valid for as long as it's used by them. You should be careful and
    ensure that CUDA tensors you shared don't go out of scope for as long as
    they're needed. This shouldn't be a problem for sharing model parameters,
    but passing other kinds of data should be done with care. Note that this
    restriction doesn't apply to shared CPU memory.

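A minimal sketch of sending a CUDA tensor to a subprocess, assuming a
CUDA-capable machine (the ``consumer`` function and queue layout are
illustrative)::

    import torch
    import torch.multiprocessing as mp

    def consumer(queue):
        # The received tensor is a view onto the same CUDA memory;
        # no copy is made.
        tensor = queue.get()
        print(tensor.sum().item())

    if __name__ == '__main__':
        # A spawn (or forkserver) start method is required for CUDA tensors.
        mp.set_start_method('spawn')
        queue = mp.Queue()
        tensor = torch.ones(4).cuda()
        process = mp.Process(target=consumer, args=(queue,))
        process.start()
        queue.put(tensor)
        # Keep `tensor` alive in the parent until the child is done with it.
        process.join()
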
Sharing strategies
------------------

This section provides a brief overview of how the different sharing
strategies work. Note that it applies only to CPU tensors - CUDA tensors will
always use the CUDA API, as that's the only way they can be shared.

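For example, a CPU tensor can be moved to shared memory explicitly with
:meth:`~torch.Tensor.share_memory_` (a minimal sketch; tensors put into a
:mod:`torch.multiprocessing` queue are moved to shared memory automatically)::

    import torch

    tensor = torch.zeros(5)
    print(tensor.is_shared())  # False - backed by ordinary memory
    tensor.share_memory_()     # moves the underlying storage to shared memory
    print(tensor.is_shared())  # True - ready to be sent to other processes
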
File descriptor - ``file_descriptor``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. note::

    This is the default strategy (except for macOS, where it's not
    supported).

This strategy will use file descriptors as shared memory handles. Whenever a
storage is moved to shared memory, a file descriptor obtained from
``shm_open`` is cached with the object, and when it's going to be sent to
another process, the file descriptor will be transferred (e.g. via UNIX
sockets) to it. The receiver will also cache the file descriptor and ``mmap``
it, to obtain a shared view onto the storage data.

Note that if a lot of tensors are shared, this strategy will keep a large
number of file descriptors open most of the time. If your system has low
limits for the number of open file descriptors, and you can't raise them, you
should use the ``file_system`` strategy.

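A minimal sketch of checking the limit and falling back, assuming a Unix
system (the ``4096`` threshold is illustrative - tune it to how many tensors
you expect to share)::

    import resource

    import torch.multiprocessing as mp

    # Soft limit on open file descriptors for the current process.
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft < 4096:
        mp.set_sharing_strategy('file_system')
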
File system - ``file_system``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This strategy will use file names given to ``shm_open`` to identify the
shared memory regions. This has the benefit of not requiring the
implementation to cache the file descriptors obtained from it, but at the
same time is prone to shared memory leaks. The file can't be deleted right
after its creation, because other processes need to access it to open their
views. If the processes fatally crash, or are killed, and don't call the
storage destructors, the files will remain in the system. This is very
serious, because they keep using up the memory until the system is restarted,
or they're freed manually.

To counter the problem of shared memory file leaks, :mod:`torch.multiprocessing`
will spawn a daemon named ``torch_shm_manager`` that will isolate itself from
the current process group, and will keep track of all shared memory allocations.
Once all processes connected to it exit, it will wait a moment to ensure there
will be no new connections, and will iterate over all shared memory files
allocated by the group. If it finds that any of them still exist, they will be
deallocated. We've tested this method and it proved to be robust to various
failures. Still, if your system has high enough limits, and ``file_descriptor``
is a supported strategy, we do not recommend switching to this one.