Multiprocessing package - torch.multiprocessing
===============================================

.. automodule:: torch.multiprocessing
.. currentmodule:: torch.multiprocessing

.. warning::

    If the main process exits abruptly (e.g. because of an incoming signal),
    Python's ``multiprocessing`` sometimes fails to clean up its children.
    It's a known caveat, so if you're seeing any resource leaks after
    interrupting the interpreter, it probably means that this has just
    happened to you.

Strategy management
-------------------

.. autofunction:: get_all_sharing_strategies
.. autofunction:: get_sharing_strategy
.. autofunction:: set_sharing_strategy

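For example, a minimal sketch of inspecting and switching the sharing
strategy (assuming the ``file_system`` strategy is available on your
platform)::

    import torch.multiprocessing as mp

    # Strategies available on this platform,
    # e.g. {'file_descriptor', 'file_system'}
    print(mp.get_all_sharing_strategies())

    # Strategy currently in use
    print(mp.get_sharing_strategy())

    # Switch strategies (see "Sharing strategies" below for the trade-offs)
    mp.set_sharing_strategy('file_system')
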
Sharing CUDA tensors
--------------------

Sharing CUDA tensors between processes is supported only in Python 3, using
the ``spawn`` or ``forkserver`` start methods. :mod:`python:multiprocessing` in
Python 2 can only create subprocesses using ``fork``, and it's not supported
by the CUDA runtime.

.. warning::

    The CUDA API requires that an allocation exported to other processes
    remains valid for as long as it's used by them. You should be careful and
    ensure that CUDA tensors you shared don't go out of scope for as long as
    they're needed. This shouldn't be a problem for sharing model parameters,
    but passing other kinds of data should be done with care. Note that this
    restriction doesn't apply to shared CPU memory.

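A minimal sketch of sending a CUDA tensor to a subprocess, assuming a
CUDA-capable machine (the ``consumer`` function and queue layout are
illustrative)::

    import torch
    import torch.multiprocessing as mp

    def consumer(queue):
        # The received tensor is a view onto the same CUDA memory;
        # no copy is made.
        tensor = queue.get()
        print(tensor.sum().item())

    if __name__ == '__main__':
        # A spawn (or forkserver) start method is required for CUDA tensors.
        mp.set_start_method('spawn')
        queue = mp.Queue()
        tensor = torch.ones(4).cuda()
        process = mp.Process(target=consumer, args=(queue,))
        process.start()
        queue.put(tensor)
        # Keep `tensor` alive in the parent until the child is done with it.
        process.join()
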
Sharing strategies
------------------

This section provides a brief overview of how the different sharing
strategies work. Note that it applies only to CPU tensors - CUDA tensors will
always use the CUDA API, as that's the only way they can be shared.

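For example, a CPU tensor can be moved to shared memory explicitly with
:meth:`~torch.Tensor.share_memory_` (a minimal sketch; tensors put into a
:mod:`torch.multiprocessing` queue are moved to shared memory automatically)::

    import torch

    tensor = torch.zeros(5)
    print(tensor.is_shared())  # False - backed by ordinary memory
    tensor.share_memory_()     # moves the underlying storage to shared memory
    print(tensor.is_shared())  # True - ready to be sent to other processes
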
File descriptor - ``file_descriptor``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. note::

    This is the default strategy (except for macOS, where it's not
    supported).

This strategy will use file descriptors as shared memory handles. Whenever a
storage is moved to shared memory, a file descriptor obtained from
``shm_open`` is cached with the object, and when it's going to be sent to
another process, the file descriptor will be transferred (e.g. via UNIX
sockets) to it. The receiver will also cache the file descriptor and ``mmap``
it, to obtain a shared view onto the storage data.

Note that if a lot of tensors are shared, this strategy will keep a large
number of file descriptors open most of the time. If your system has low
limits for the number of open file descriptors, and you can't raise them, you
should use the ``file_system`` strategy.

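A minimal sketch of checking the limit and falling back, assuming a Unix
system (the ``4096`` threshold is illustrative - tune it to how many tensors
you expect to share)::

    import resource

    import torch.multiprocessing as mp

    # Soft limit on open file descriptors for the current process.
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft < 4096:
        mp.set_sharing_strategy('file_system')
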
File system - ``file_system``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This strategy will use file names given to ``shm_open`` to identify the
shared memory regions. This has the benefit of not requiring the
implementation to cache the file descriptors obtained from it, but at the
same time is prone to shared memory leaks. The file can't be deleted right
after its creation, because other processes need to access it to open their
views. If the processes fatally crash, or are killed, and don't call the
storage destructors, the files will remain in the system. This is very
serious, because they keep using up the memory until the system is restarted,
or they're freed manually.

To counter the problem of shared memory file leaks, :mod:`torch.multiprocessing`
will spawn a daemon named ``torch_shm_manager`` that will isolate itself from
the current process group, and will keep track of all shared memory allocations.
Once all processes connected to it exit, it will wait a moment to ensure there
will be no new connections, and will iterate over all shared memory files
allocated by the group. If it finds that any of them still exist, they will be
deallocated. We've tested this method and it proved to be robust to various
failures. Still, if your system has high enough limits, and ``file_descriptor``
is a supported strategy, we do not recommend switching to this one.