Multiprocessing package - torch.multiprocessing
===============================================

.. automodule:: torch.multiprocessing
.. currentmodule:: torch.multiprocessing

.. warning::

    If the main process exits abruptly (e.g. because of an incoming signal),
    Python's ``multiprocessing`` sometimes fails to clean up its children.
    It's a known caveat, so if you're seeing any resource leaks after
    interrupting the interpreter, it probably means that this has just happened
    to you.

Strategy management
-------------------

.. autofunction:: get_all_sharing_strategies
.. autofunction:: get_sharing_strategy
.. autofunction:: set_sharing_strategy
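
For reference, a minimal sketch of how these functions fit together (the set of
available strategies depends on the platform):

::

    import torch.multiprocessing as mp

    # Inspect the strategies supported on this system and the one currently in use.
    print(mp.get_all_sharing_strategies())
    print(mp.get_sharing_strategy())

    # Switch the strategy used when CPU tensors are moved to shared memory.
    mp.set_sharing_strategy('file_system')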

Sharing CUDA tensors
--------------------

Sharing CUDA tensors between processes is supported only in Python 3, using
the ``spawn`` or ``forkserver`` start methods. :mod:`python:multiprocessing` in
Python 2 can only create subprocesses using ``fork``, which is not supported
by the CUDA runtime.

Unlike CPU tensors, the sending process is required to keep the original tensor
as long as the receiving process retains a copy of the tensor.
This shouldn't be a problem for sharing model parameters (which stay live
for the entire execution of the model), but passing other
kinds of data should be done with care.

Here is an example program which handles these requirements correctly:

::

    import torch
    import torch.multiprocessing as mp

    # make newly constructed tensors CUDA tensors by default
    torch.set_default_tensor_type(torch.cuda.FloatTensor)

    def sender(q, e):
        for i in range(10):
            s_sample = [torch.zeros(1), torch.ones(1)]
            q.put(s_sample)
            e.wait()      # wait until the receiver has dropped its reference
            del s_sample  # only now is it safe to release the sender's copy
            e.clear()

    if __name__ == "__main__":
        ctx = mp.get_context("spawn")
        q = ctx.Queue()
        e = ctx.Event()
        p = ctx.Process(target=sender, args=(q, e))
        p.start()

        for i in range(10):
            print('=== ITER {} ==='.format(i))
            r_sample = q.get()
            del r_sample  # drop the reference before signalling the sender
            e.set()

        p.join()

In the example above, calling `e.wait()`
on the sender side ensures the tensor `s_sample` doesn't get deleted while
the receiver is working on it. The receiver signals when it is done
with the tensor using `e.set()`, being careful to `del` its reference
to the received tensor first. It is INSUFFICIENT to promise never to use
`r_sample` again; while `r_sample` is live, it may be confused with
any subsequent tensors allocated by the source process at the same address.

If a receiver wants to save the data of `r_sample` for future use while
letting the source process deallocate the original, it must
`clone()` it.
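
For instance, continuing the receiver loop from the example above, the data
could be kept along these lines (a sketch; ``saved`` is an illustrative name):

::

    saved = [t.clone() for t in r_sample]  # copies owned outright by the receiver
    del r_sample                           # drop the reference to the shared tensors
    e.set()                                # now the sender may release the originals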

This behavior is very confusing, and we are tracking a fix for it
at https://github.com/pytorch/pytorch/issues/16141

Sharing strategies
------------------

This section provides a brief overview of how different sharing strategies
work. Note that it applies only to CPU tensors - CUDA tensors will always use
the CUDA API, as that's the only way they can be shared.

File descriptor - ``file_descriptor``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. note::

    This is the default strategy (except for macOS, where it's not
    supported).

This strategy will use file descriptors as shared memory handles. Whenever a
storage is moved to shared memory, a file descriptor obtained from ``shm_open``
is cached with the object, and when it's going to be sent to other processes,
the file descriptor will be transferred (e.g. via UNIX sockets) to it. The
receiver will also cache the file descriptor and ``mmap`` it, to obtain a shared
view onto the storage data.

Note that if a lot of tensors are shared, this strategy will keep a
large number of file descriptors open most of the time. If your system has low
limits for the number of open file descriptors, and you can't raise them, you
should use the ``file_system`` strategy.
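
For example, a minimal sketch of such a fallback (the threshold is arbitrary,
and the ``resource`` module is POSIX-only):

::

    import resource
    import torch.multiprocessing as mp

    soft_limit, _hard_limit = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft_limit < 4096:  # illustrative threshold; tune it to your workload
        mp.set_sharing_strategy('file_system')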

File system - ``file_system``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This strategy will use file names given to ``shm_open`` to identify the shared
memory regions. This has the benefit of not requiring the implementation to cache
the file descriptors obtained from it, but at the same time it is prone to shared
memory leaks. The file can't be deleted right after its creation, because other
processes need to access it to open their views. If the processes fatally
crash, or are killed, and don't call the storage destructors, the files will
remain in the system. This is very serious, because they keep using up the
memory until the system is restarted, or they're freed manually.

To counter the problem of shared memory file leaks, :mod:`torch.multiprocessing`
will spawn a daemon named ``torch_shm_manager`` that will isolate itself from
the current process group, and will keep track of all shared memory allocations.
Once all processes connected to it exit, it will wait a moment to ensure there
will be no new connections, and will iterate over all shared memory files
allocated by the group. If it finds that any of them still exist, they will be
deallocated. We've tested this method and it proved to be robust to various
failures. Still, if your system has high enough limits, and ``file_descriptor``
is a supported strategy, we do not recommend switching to this one.

Spawning subprocesses
---------------------

.. note::

    Available for Python >= 3.4.

    This depends on the ``spawn`` start method in Python's
    ``multiprocessing`` package.

Spawning a number of subprocesses to perform some function can be done
by creating ``Process`` instances and calling ``join`` to wait for
their completion. This approach works fine when dealing with a single
subprocess but presents potential issues when dealing with multiple
processes.

Namely, joining processes sequentially implies they will terminate
sequentially. If they don't, and the first process does not terminate,
the termination of the others will go unnoticed. Also, there are no native
facilities for error propagation.
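
For illustration, the manual pattern looks roughly like this (a sketch;
``work`` is a placeholder for the real per-process function):

::

    import torch.multiprocessing as mp

    def work(rank):
        pass  # placeholder

    if __name__ == '__main__':
        ctx = mp.get_context('spawn')
        processes = [ctx.Process(target=work, args=(i,)) for i in range(4)]
        for p in processes:
            p.start()
        for p in processes:
            # blocks on each process in order; a failure in a later process
            # goes unnoticed while waiting on an earlier one
            p.join()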

The ``spawn`` function below addresses these concerns and takes care
of error propagation, out of order termination, and will actively
terminate processes upon detecting an error in one of them.

.. autofunction:: spawn
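
As a quick illustration of the blocking form (a minimal sketch; ``worker`` is a
placeholder function):

::

    import torch.multiprocessing as mp

    def worker(rank, total):
        # each subprocess receives its index as the first argument,
        # followed by the entries of ``args``
        print('process {} of {}'.format(rank, total))

    if __name__ == '__main__':
        mp.spawn(worker, args=(4,), nprocs=4)  # blocks until all workers exit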

.. class:: SpawnContext

    Returned by :func:`~spawn` when called with ``join=False``.

    .. automethod:: join
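
With ``join=False`` the call returns immediately, and the parent is responsible
for joining the context later (a sketch; ``worker`` is a placeholder function):

::

    import torch.multiprocessing as mp

    def worker(rank):
        print('hello from process', rank)

    if __name__ == '__main__':
        ctx = mp.spawn(worker, nprocs=2, join=False)
        # ... the parent is free to do other work here ...
        while not ctx.join():  # join() returns True once all processes have exited
            pass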