50 Commits

Author SHA1 Message Date
Aliaksandr Ivanou
3ffd2af8cd Add exception classification to torch.multiprocessing.spawn (#45174)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45174

Introduce different types of exceptions that map to different failures
of torch.multiprocessing.spawn. The change introduces the following exception types:
- ProcessRaisedException: occurs when the process initiated by spawn raises an exception
- ProcessExitedException: occurs when the process initiated by spawn exits
This change allows frameworks that use mp.spawn to categorize failures.
This can be helpful for tracking metrics and enhancing logs.
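
As a rough sketch of the categorization described above, a framework could map each failure type to a metric label. The two classes below are local stand-ins that only borrow their names from this commit message so the snippet runs without PyTorch. Their constructor arguments, the `exit_code` attribute, and the `classify_failure` helper are assumptions for illustration, not the library's API.

```python
class ProcessRaisedException(Exception):
    """Stand-in: the spawned process raised an exception."""


class ProcessExitedException(Exception):
    """Stand-in: the spawned process exited (exit_code is a hypothetical attribute)."""

    def __init__(self, msg, exit_code):
        super().__init__(msg)
        self.exit_code = exit_code


def classify_failure(exc):
    """Return a short label suitable for a metrics counter or a log field."""
    if isinstance(exc, ProcessRaisedException):
        return "child_raised"
    if isinstance(exc, ProcessExitedException):
        return f"child_exited_{exc.exit_code}"
    return "unknown"
```

A caller would typically wrap the spawn call in `try`/`except` and pass the caught exception to a helper of this kind.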

Test Plan: Imported from OSS

Reviewed By: taohe

Differential Revision: D23889400

Pulled By: tierex

fbshipit-source-id: 8849624c616230a6a81158c52ce0c18beb437330
2020-10-09 12:59:41 -07:00
Xiang Gao
20ac736200 Remove py2 compatible future imports (#44735)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44735

Reviewed By: mruberry

Differential Revision: D23731306

Pulled By: ezyang

fbshipit-source-id: 0ba009a99e475ddbe22981be8ac636f8a1c8b02f
2020-09-16 12:55:57 -07:00
David Reiss
e75fb4356b Remove (most) Python 2 support from Python code (#35615)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35615

Python 2 has reached end-of-life and is no longer supported by PyTorch.
Now we can clean up a lot of cruft that we put in place to support it.
These changes were all done manually, and I skipped anything that seemed
like it would take more than a few seconds, so I think it makes sense to
review it manually as well (though using side-by-side view and ignoring
whitespace change might be helpful).

Test Plan: CI

Differential Revision: D20842886

Pulled By: dreiss

fbshipit-source-id: 8cad4e87c45895e7ce3938a88e61157a79504aed
2020-04-22 09:23:14 -07:00
Kiuk Chung
7314f1c281 [torch/multiprocessing] Update documentation indicating that start_method is ignored for mp.spawn() (#33070)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33070

`start_method` parameter is intentionally ignored for `mp.spawn()`. Document this fact and point the user to `start_processes` if they want to use a different `start_method`.

Test Plan:
Warning message looks like:
```
main.py:8: UserWarning: This method only supports start_method=spawn (got: fork).
To use a different start_method use:
         torch.multiprocessing.start_process(...)
  warnings.warn(msg)
```

Reviewed By: ailzhang

Differential Revision: D19780235

fbshipit-source-id: 4599cd18c3ba6cc401810efe4f390290ffa8023b
2020-02-07 15:26:00 -08:00
Brian Wignall
f326045b37 Fix typos, via a Levenshtein-type corrector (#31523)
Summary:
Should be non-semantic.

Uses https://en.wikipedia.org/wiki/Wikipedia:Lists_of_common_misspellings/For_machines to find likely typos, with https://github.com/bwignall/typochecker to help automate the checking.

Uses an updated version of the tool used in https://github.com/pytorch/pytorch/pull/30606 .
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31523

Differential Revision: D19216749

Pulled By: mrshenli

fbshipit-source-id: 7fd489cb9a77cd7e4950c1046f925d57524960ea
2020-01-17 16:03:19 -08:00
peterjc123
6486bdfb90 Fix os.register_at_fork not defined on Windows (#30809)
Summary:
According to https://docs.python.org/3.8/library/os.html#os.register_at_fork, this function is only available on Unix platforms.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30809
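
One way to guard against the missing function is sketched below. The helper name `register_child_hook` is hypothetical and is not taken from the patch; only `os.register_at_fork` and its `after_in_child` keyword come from the standard library.

```python
import os


def register_child_hook(hook):
    """Register `hook` to run in forked children when the platform allows it.

    Returns True if the hook was registered, False if os.register_at_fork
    is unavailable (for example on Windows).
    """
    register = getattr(os, "register_at_fork", None)
    if register is None:
        return False
    register(after_in_child=hook)
    return True
```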

Differential Revision: D18828777

Pulled By: bddppq

fbshipit-source-id: 3325a984da488bb0a80a5c27131553fbcf78921f
2019-12-05 13:36:53 -08:00
Peter Bell
dcd1216efe Force early initialization of OpenMP in forked children (#29006)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/28389

Intel's OpenMP implementation sets the thread affinity on the first call to an OpenMP function after a fork. By adding an atfork handler we can force this to happen before a user tries to set the affinity in their own DataLoader `worker_init_fn`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29006

Differential Revision: D18782456

Pulled By: ezyang

fbshipit-source-id: ce0b515256da0cf18ceb125e0cdec99a3311bbd3
2019-12-03 15:23:31 -08:00
Ailing Zhang
a997f224ac Add torch.multiprocessing.create_processes
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28493

Differential Revision: D18766066

Pulled By: ailzhang

fbshipit-source-id: 7f424c8fae3012be2416cf9bc72ee2dde40c1f89
2019-12-03 10:38:19 -08:00
Brian Wignall
e7fe64f6a6 Fix typos (#30606)
Summary:
Should be non-semantic.

Uses https://en.wikipedia.org/wiki/Wikipedia:Lists_of_common_misspellings/For_machines to find likely typos.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30606

Differential Revision: D18763028

Pulled By: mrshenli

fbshipit-source-id: 896515a2156d062653408852e6c04b429fc5955c
2019-12-02 20:17:42 -08:00
なるみ
d83389d327 Ignore F401 in all __init__.py without putting noqa (#25823)
Summary:
By adding `per-file-ignores = __init__.py: F401` into `.flake8` with `flake8>=3.7`, we can ignore F401 in all `__init__.py` without putting `# noqa: F401` line by line.

http://flake8.pycqa.org/en/latest/user/options.html?highlight=per-file-ignores#cmdoption-flake8-per-file-ignores
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25823

Differential Revision: D17252182

Pulled By: soumith

fbshipit-source-id: 87b174075b79e4078953a7521bd1a8f82405646b
2019-10-23 15:28:13 -07:00
Richard Zou
277d442d18 Rename torch.namedtensor -> torch._namedtensor_internals (#26349)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26349

The directory holds a lot of private helper functions that help
implement named tensor functionality. Instead of naming each helper
function with a leading underscore, I change the name of the import to
`_namedtensor_internals` to signal it should not be used directly.

Test Plan: - [namedtensor ci]

Differential Revision: D17424178

Pulled By: zou3519

fbshipit-source-id: 8f7b74346765759303480e581038a661021acf53
2019-09-18 05:47:09 -07:00
Richard Zou
2513ca66ca Add guards for using named tensor with serialization and multiprocessing (#25345)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25345

Test Plan
- New tests [namedtensor ci]

Test Plan: Imported from OSS

Differential Revision: D17101486

Pulled By: zou3519

fbshipit-source-id: 58e803b042056ee6abab8551517f74078f2b81d5
2019-08-29 14:10:33 -07:00
SsnL
eb756746ab Fix possible deadlock in SharedCache inside a forked child proc (#25158)
Summary:
Related: https://github.com/pytorch/pytorch/issues/24927#issuecomment-524608021

`fork` inherits lock state, so if we happen to fork while the `SharedCache` lock is held, we could deadlock in the child process when some code tries to acquire it.

Following the pytorch multiprocessing library design, this patch resets the lock to a new object after fork. A similar example from the Python core library, for `multiprocessing.Queue`, is:

```py
class Queue(object):
    def __init__(self, ...):
        ...
        self._after_fork()
        if sys.platform != 'win32':
            register_after_fork(self, Queue._after_fork)

    def _after_fork(self):
        debug('Queue._after_fork()')
        self._notempty = threading.Condition(threading.Lock())
        self._buffer = collections.deque()
        self._thread = None
        self._jointhread = None
        self._joincancelled = False
        self._closed = False
        self._close = None
        self._send_bytes = self._writer.send_bytes
        self._recv_bytes = self._reader.recv_bytes
        self._poll = self._reader.poll
```

d4d60134b2/Lib/multiprocessing/queues.py (L54-L78)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25158

Differential Revision: D17091227

Pulled By: soumith

fbshipit-source-id: ee7130f47d7bbd42fc34a2598f1f6974d8d7cdb7
2019-08-28 13:34:03 -07:00
SsnL
e982e46de3 Add multiprocessing_context= argument to DataLoader (#22990)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/22131
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22990

Differential Revision: D16539052

Pulled By: colesbury

fbshipit-source-id: b1c48ae2fb54065dd96a67be263254129e02eaa2
2019-07-29 12:58:40 -07:00
Tongzhou Wang
bc6281028c rebuild_storage_fd retry on EINTR (#21723)
Summary:
Some data loader tests are flaky on py 2 with the following error
```
Jun 12 22:17:31 Traceback (most recent call last):
Jun 12 22:17:31   File "test_dataloader.py", line 798, in test_iterable_dataset
Jun 12 22:17:31     fetched = sorted([d.item() for d in dataloader_iter])
Jun 12 22:17:31   File "/opt/python/2.7.9/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 697, in __next__
Jun 12 22:17:31     idx, data = self._get_data()
Jun 12 22:17:31   File "/opt/python/2.7.9/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 664, in _get_data
Jun 12 22:17:31     success, data = self._try_get_data()
Jun 12 22:17:31   File "/opt/python/2.7.9/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 617, in _try_get_data
Jun 12 22:17:31     data = self.data_queue.get(timeout=timeout)
Jun 12 22:17:31   File "/opt/python/2.7.9/lib/python2.7/multiprocessing/queues.py", line 135, in get
Jun 12 22:17:31     res = self._recv()
Jun 12 22:17:31   File "/opt/python/2.7.9/lib/python2.7/site-packages/torch/multiprocessing/queue.py", line 22, in recv
Jun 12 22:17:31     return pickle.loads(buf)
Jun 12 22:17:31   File "/opt/python/2.7.9/lib/python2.7/pickle.py", line 1382, in loads
Jun 12 22:17:31     return Unpickler(file).load()
Jun 12 22:17:31   File "/opt/python/2.7.9/lib/python2.7/pickle.py", line 858, in load
Jun 12 22:17:31     dispatch[key](self)
Jun 12 22:17:31   File "/opt/python/2.7.9/lib/python2.7/pickle.py", line 1133, in load_reduce
Jun 12 22:17:31     value = func(*args)
Jun 12 22:17:31   File "/opt/python/2.7.9/lib/python2.7/site-packages/torch/multiprocessing/reductions.py", line 274, in rebuild_storage_fd
Jun 12 22:17:31     fd = multiprocessing.reduction.rebuild_handle(df)
Jun 12 22:17:31   File "/opt/python/2.7.9/lib/python2.7/multiprocessing/reduction.py", line 157, in rebuild_handle
Jun 12 22:17:31     new_handle = recv_handle(conn)
Jun 12 22:17:31   File "/opt/python/2.7.9/lib/python2.7/multiprocessing/reduction.py", line 83, in recv_handle
Jun 12 22:17:31     return _multiprocessing.recvfd(conn.fileno())
Jun 12 22:17:31 OSError: [Errno 4] Interrupted system call
```

Apparently, Python 2.7's `recvfd` calls `recvmsg` without EINTR retry: https://github.com/python/cpython/blob/2.7/Modules/_multiprocessing/multiprocessing.c#L174
So we should wrap the call in an outer retry loop that catches the error and tries again.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21723
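
A minimal sketch of such a retry loop follows. The helper name `retry_on_eintr` and the `max_retries` parameter are assumptions for illustration, not the code from the patch. On Python 3.5 and later the interpreter already retries most interrupted system calls itself (PEP 475), so a wrapper like this mainly matters for the Python 2.7 case described above.

```python
import errno


def retry_on_eintr(func, *args, max_retries=5):
    """Call func(*args), retrying when it fails with EINTR.

    Any other OSError, or an EINTR on the final attempt, is re-raised.
    """
    for attempt in range(max_retries + 1):
        try:
            return func(*args)
        except OSError as exc:
            if exc.errno != errno.EINTR or attempt == max_retries:
                raise
```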

Differential Revision: D15806247

Pulled By: ezyang

fbshipit-source-id: 16cb661cc0fb418fd37353a1fef7ceeb634f02b7
2019-06-14 09:10:00 -07:00
Soumith Chintala
2e029db2f9 fixes multiprocessing serialization for integer nn.Parameter (#18639)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/17345
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18639

Differential Revision: D14711565

Pulled By: soumith

fbshipit-source-id: 0063ed138a215b95d6571dcd68b18569714abe19
2019-04-01 17:15:42 -07:00
Edward Yang
173f224570 Turn on F401: Unused import warning. (#18598)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18598
ghimport-source-id: c74597e5e7437e94a43c163cee0639b20d0d0c6a

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18598 Turn on F401: Unused import warning.**

This was requested by someone at Facebook; this lint is turned
on for Facebook by default.  "Sure, why not."

I had to noqa a number of imports in __init__.  Hypothetically
we're supposed to use __all__ in this case, but I was too lazy
to fix it.  Left for future work.

Be careful!  flake8-2 and flake8-3 behave differently with
respect to import resolution for # type: comments.  flake8-3 will
report an import unused; flake8-2 will not.  For now, I just
noqa'd all these sites.

All the changes were done by hand.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: D14687478

fbshipit-source-id: 30d532381e914091aadfa0d2a5a89404819663e3
2019-03-30 09:01:17 -07:00
Vitaly Fedyunin
5653a914f7 Implement reference counting for shared IPC CUDA tensors (#16854)
Summary:
This is to fix #16141 and similar issues.

The idea is to track a reference to every shared CUDA Storage and deallocate memory only after a consumer process deallocates received Storage.

ezyang Done with cleanup. Same (insignificantly better) performance as in file-per-share solution, but handles millions of shared tensors easily. Note [ ] documentation in progress.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16854

Differential Revision: D13994490

Pulled By: VitalyFedyunin

fbshipit-source-id: 565148ec3ac4fafb32d37fde0486b325bed6fbd1
2019-03-25 10:24:38 -07:00
hysts
cbefd0323b Fix typo
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17521

Differential Revision: D14237482

Pulled By: soumith

fbshipit-source-id: 636e0fbe2c667d15fcb649136a65ae64937fa0cb
2019-02-26 20:23:34 -08:00
Shen Li
24f4d3987e Move all Stream and Event Python implementation to C++ (#15937)
Summary:
1. Added `torch/csrc/cuda/Event.h` and `torch/csrc/cuda/Event.cpp` to bind Python Event class to C++ implementation.
2. Move all CUDA runtime invocations from `torch/cuda/streams.py` to C++
3. Added tests to cover Stream and Event APIs. ~(event IPC handle tests is introduced in #15974)~
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15937

Differential Revision: D13649001

Pulled By: mrshenli

fbshipit-source-id: 84ca58f35f6ba679a4ba33150ceba678d760d240
2019-01-17 07:29:22 -08:00
Ailing Zhang
be47470c91 Fix cuda multiprocessing cached memory (#14736)
Summary:
This PR fixes #11422

In the old world of CUDA IPC, when we wanted to share a tensor T from A to B, we had to share the whole CUDA memory allocation in which T's storage sits, and we cast it to the same storage type as T's.

This causes problems when storages of two different types are allocated in the same CUDA memory block. When we try to reconstruct the second tensor, it complains about the wrong storage type.

In this PR we reconstruct only the storage (not the entire memory block). However, since CUDA allows a memHandle to be opened only once per process, we have to save the device pointer in a global cache so that we can reconstruct tensors as they come.

Thanks a ton to ezyang who helped design the solution and debugged the issue!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14736

Differential Revision: D13335899

Pulled By: ailzhang

fbshipit-source-id: cad69db392ed6f8fdc2b93a9dc2899f6d378c371
2018-12-05 10:55:43 -08:00
Pieter Noordhuis
220ce8046e Binding for prctl(PR_SET_PDEATHSIG) (#14491)
Summary:
If torch.multiprocessing.spawn is used to launch non-daemonic
processes (the default since #14391), the spawned children won't be
automatically terminated when the parent terminates.

On Linux, we can address this by setting PR_SET_PDEATHSIG, which
delivers a configurable signal to child processes when their parent
terminates.

Fixes #14394.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14491

Differential Revision: D13270374

Pulled By: pietern

fbshipit-source-id: 092c9d3c3cea2622c3766b467957bc27a1bd500c
2018-11-29 20:09:19 -08:00
Teng Li
ffbc3905a1 Fixed torch.multiprocessing.spawn for not being able to spawn like dataloader workers (#14391)
Summary:
Should fix: https://github.com/pytorch/pytorch/issues/14390

Now imagenet example works fine with multiprocessing and more than 1 dataloader worker
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14391

Reviewed By: calebho

Differential Revision: D13209800

Pulled By: teng-li

fbshipit-source-id: e8abc0fb38d4436cf3474dcbba0e28f4290e4d29
2018-11-27 12:37:41 -08:00
Teng Li
778e23606b multiprocessing.spawn python version check (#14039)
Summary:
This will be super helpful to the user
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14039

Differential Revision: D13089200

Pulled By: teng-li

fbshipit-source-id: 29e7507bd8fe5a0c58a85c52f976bfca282b4c1b
2018-11-16 18:53:23 -08:00
Pieter Noordhuis
1caa341c68 Add torch.multiprocessing.spawn docs
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13846

Differential Revision: D13029595

Pulled By: pietern

fbshipit-source-id: b733b00f7070c18535c31801f20e6e717eec7748
2018-11-12 14:39:52 -08:00
Pieter Noordhuis
be424de869 Add torch.multiprocessing.spawn helper (#13518)
Summary:
This helper addresses a common pattern where one spawns N processes to
work on some common task (e.g. parallel preprocessing or multiple
training loops).

A straightforward approach is to use the multiprocessing API directly
and then consecutively call join on the resulting processes.

This pattern breaks down in the face of errors. If one of the
processes terminates with an exception or via some signal, and it is
not the first process that was launched, the join call on the first
process won't be affected. This helper seeks to solve this by waiting
on termination from any of the spawned processes. When any process
terminates with a non-zero exit status, it terminates the remaining
processes, and raises an exception in the parent process. If the
process terminated with an exception, it is propagated to the parent.
If the process terminated via a signal (e.g. SIGINT, SIGSEGV), this is
mentioned in the exception as well.

Requires Python >= 3.4.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13518

Reviewed By: orionr

Differential Revision: D12929045

Pulled By: pietern

fbshipit-source-id: 00df19fa16a568d1e22f37a2ba65677ab0cce3fd
2018-11-06 14:08:37 -08:00
Edward Yang
3bfa7258b3 Don't serialize hooks (#11705)
Summary:
Fixes #11683.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11705

Differential Revision: D9833057

Pulled By: ezyang

fbshipit-source-id: 18af9bcd77b088326738d567100fbe4a4c869dd6
2018-10-16 20:11:03 -07:00
Sam Gross
0b63d12db6 Don't call into Python during Storage destruction. (#10407)
Summary:
```
This removes PyObjectFinalizer. We were seeing SIGSEGV at exit in some
programs that use multiprocessing. The backtrace pointed to
StorageRef.__del__ being called from subtype_dealloc. My guess is that
the Python interpreter was shutdown before all C++ Storage objects were
deallocated. Deallocating the C++ Storage called the finalizer which
called back into Python after it was no longer safe to do so.

This avoids a callback from C++ into Python during Storage finalization.
Instead, dead Storage objects (expired weak references) are collected
periodically when shared_cache exceeds a limit. The limit is scaled with
2x the number of live references, which places an upper bound on the
amount of extra memory held by dead Storage objects. In practice, this
should be very small.
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10407
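
The collection strategy described above can be sketched roughly as follows. The class name `SharedCacheSketch`, the `min_limit` default, and the method name are assumptions for illustration. The sketch is not the actual `shared_cache` implementation.

```python
import weakref


class SharedCacheSketch(dict):
    """Maps keys to weak references; dead entries are purged lazily."""

    def __init__(self, min_limit=128):
        super().__init__()
        self.min_limit = min_limit
        self.limit = min_limit

    def __setitem__(self, key, ref):
        super().__setitem__(key, ref)
        # Sweep only when the cache grows past the current limit.
        if len(self) > self.limit:
            self.free_dead_references()

    def free_dead_references(self):
        live = 0
        for key, ref in list(self.items()):
            if ref() is None:
                del self[key]  # the referent has been collected
            else:
                live += 1
        # Let the cache grow to twice the live count before the next sweep.
        self.limit = max(self.min_limit, live * 2)
```

Because the sweep runs from ordinary Python code on insertion, nothing has to call back into Python while a storage is being destroyed.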

Differential Revision: D9272400

Pulled By: colesbury

fbshipit-source-id: ecb14d9c6d54ffc91e134c34a4e770a4d09048a2
2018-08-13 11:20:07 -07:00
Edward Yang
674f7a9778 Correctly share CUDA Parameters. (#10220)
Summary:
```
    Correctly share CUDA Parameters, requires_grad and hooks.

    Previously, the following was true:

    - If you put a Parameter for a CUDA tensor
      in multiprocessing queue (or otherwise tried to transfer it),
      this failed, saying that we cannot pickle CUDA storage.
      This is issue #9996.

    - If you put a leaf Tensor that requires_grad=True through the
      multiprocessing queue, it would come out the other end as
      requires_grad=False (It should have come out the other end
      as requires_grad=True).  Similarly, backwards hooks were
      lost.

    - If you put a non-leaf Tensor that requires_grad=True through
      the multiprocessing queue, it would come out the other end
      as requires_grad=False.

    The root cause for the first issue was that implementation of
    reductions for Parameter used the superclass implementation
    (tensor) in __reduce_ex__, but this always picks up the
    non-ForkingPickler reduction, which doesn't work with CUDA tensors.
    So, we registered a new ForkingPickler specifically for Parameter,
    and adjusted the code to correctly rewrap a Tensor in a Parameter
    if it was originally a parameter.

    While working on this, we realized that requires_grad and backwards
    hooks would not be preserved in the ForkingPickler reduction
    implementation.  We fixed the reducer to save these parameters.
    However, Adam Paszke pointed out that we shouldn't allow sending
    requires_grad=True, non-leaf Tensors over a multiprocessing
    queue, since we don't actually support autograd over process
    boundaries.  We now throw an error in this case; this may cause
    previously working code to fail, but this is easy enough to fix;
    just detach() the tensor before sending it.  The error message says
    so.

    Fixes #9996.
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10220

Differential Revision: D9160746

Pulled By: ezyang

fbshipit-source-id: a39c0dbc012ba5afc7a9e646da5c7f325b3cf05c
2018-08-10 13:54:56 -07:00
Edward Yang
976f9253a5 Eliminate storage views. (#9466)
Summary:
Storage views were previously used to implement CUDA IPC sharing,
but they weren't necessary.  The new strategy is described in
Note [CUDA IPC and the caching allocator].

This also fixes an unrelated bug, where we weren't actually using
the Tensor forking pickler, because we didn't register a pickler
for torch.Tensor.

Fixes #9447.  Fixes #46.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

CC apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9466

Reviewed By: apaszke

Differential Revision: D8859698

Pulled By: ezyang

fbshipit-source-id: 3362cb92f6ae4aa37084c57d79b31004bd0b4a97
2018-07-16 15:40:24 -07:00
Edward Yang
d0d1820814 Add weak pointer and finalizer support directly to THStorage. (#9148)
Summary:
The underlying use-case is the file descriptor to storage cache in
torch.multiprocessing.reductions.  Previously, this was implemented by wrapping
an existing allocator with a "weak ref" allocator which also knew to null out
the weak reference when the storage died.  This is terribly oblique, and
prevents us from refactoring the allocators to get rid of per-storage allocator
state.

So instead of going through this fiasco, we instead directly implement weak
pointers and finalizers in THStorage.  Weak pointers to THStorage retain the
THStorage struct, but not the data_ptr.  When all strong references die,
data_ptr dies and the finalizers get invoked.

There is one major hazard in this patch, which is what happens if you
repeatedly call _weak_ref on a storage.  For cleanliness, we no longer
shove our grubby fingers into the finalizer struct to see if there is already
a Python object for the weak reference and return it; we just create a new one
(no one is checking these Python objects for identity).  This means if you
keep calling it, we'll keep piling on finalizers.  That's bad! But I am
not going to fix it until it is actually a problem for someone, because
then we need to add another caching layer.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9148

Differential Revision: D8729106

Pulled By: ezyang

fbshipit-source-id: 69710ca3b7c7e05069090e1b263f8b6b9f1cf72f
2018-07-10 06:25:33 -07:00
Richard Zou
e831ad6204 Fix sharing of empty tensor in multiprocessing (#6229)
Fixes #5719

Previously, the following would error out with an "Invalid file
descriptor" error:
```
import torch
import torch.multiprocessing as mp

q = mp.Queue()
t = torch.tensor([])
q.put(t)
```
on some OSes. Because one cannot mmap data of size 0 and an empty
tensor has a storage of size 0, the file descriptor for the storage
(referencing shared memory) was never set. The multiprocessing sharing
code then called DupFD on that uninitialized file descriptor, leading
to an error.

This PR special cases sharing an empty tensor on the CPU. CUDA does not
have this problem.

Adds unit tests for both CPU and CUDA empty tensors.
2018-04-03 11:49:40 -04:00
peterjc123
77ea2f26d8 Add build support for Python 2.7 using MSVC (#4226) 2017-12-20 15:07:25 +01:00
peterjc123
aa911939a3 Improve Windows Compatibility (for csrc/scripts) (#2941) 2017-11-08 19:51:35 +01:00
Adam Lerer
e71cf20192 improved serialization (no tar copy) (#713) 2017-02-22 22:24:20 +01:00
Sam Gross
bd5303010d Refactor autograd package to separate Python dependencies. (#662)
The core autograd Variable, Function, and Engine no longer depend on the
Python API. This lets us implement functions in C++. In the future, we
can also multithread the engine and release the GIL for most of the
non-Python backwards.
2017-02-13 16:00:16 -08:00
Luke Yeager
3ed720079e [pep8] Fix most remaining lint manually 2017-01-28 01:15:51 +01:00
Luke Yeager
e7c1e6a8e3 [pep8] Fix most lint automatically with autopep8
Here's the command I used to invoke autopep8 (in parallel!):

    git ls-files | grep '\.py$' | xargs -n1 -P`nproc` autopep8 -i

Several rules are ignored in setup.cfg. The goal is to let autopep8
handle everything which it can handle safely, and to disable any rules
which are tricky or controversial to address. We may want to come back
and re-enable some of these rules later, but I'm trying to make this
patch as safe as possible.

Also configures flake8 to match pep8's behavior.

Also configures TravisCI to check the whole project for lint.
2017-01-28 01:15:51 +01:00
Sam Gross
0c69fd559a Fix CUDA sharing across processes (#530) 2017-01-20 18:28:39 -05:00
Adam Paszke
58320d5082 Add multiprocessing docs 2017-01-03 18:31:08 -05:00
Adam Paszke
f908432eb3 Ensure that Variable's grad is shared between processes 2016-12-31 16:25:39 -05:00
Sam Gross
24af02154c Use ForkingPickler for sharing tensor/storages across processes (#344)
This hooks into the (internal) ForkingPickler class in multiprocessing
to reduce tensors, storages, and CUDA events instead of our queue from
joblib. This makes it easier to use the standard multiprocessing classes
in later versions of Python.

This also exposes:

 - Tensor/Storage.share_memory_()
 - Module.share_memory()

These methods move the CPU tensors and storages to shared memory. If
you're using the "fork" method of multiprocessing, these objects can be
directly inherited instead of serialized through a queue.
2016-12-28 20:34:23 -05:00
Sam Gross
bb72ccf1a5 Support CUDA IPC in Python 3 (#203)
CUDA IPC only works with Python 3 using the "spawn" start method. You
can select the start method using the get_context method:

    import torch.multiprocessing as mp
    ctx = mp.get_context('spawn')
    queue = ctx.Queue()
    event = ctx.Event()
2016-12-19 20:42:53 -05:00
Sam Gross
551a7c72f3 Fix multiprocess serialization with "spawn" or "forkserver" (#198) 2016-11-02 17:44:36 -04:00
Sam Gross
f2d7e94948 Use torch.Size for Tensor sizes and tuple for strides
See issue #20

The torch.Size class is a tuple subclass which distinguishes sizes from
other tuples so that torch.Tensor(size) is interpreted as size instead
of data.
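
The idea can be illustrated with a small stand-alone sketch. The `Size` class and the `describe` function below are hypothetical stand-ins and do not use the real `torch.Size` or `torch.Tensor`. They only show how a tuple subclass lets a constructor tell a shape from data.

```python
class Size(tuple):
    """Hypothetical stand-in: a tuple subclass used only to tag shapes."""


def describe(arg):
    """Report how a constructor might interpret its single argument."""
    if isinstance(arg, Size):
        return ("shape", tuple(arg))
    return ("data", list(arg))
```

With this, `describe(Size((2, 3)))` is treated as a shape while `describe((2, 3))` is treated as data, even though the two arguments compare equal as tuples.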
2016-10-28 19:37:09 +02:00
Sam Gross
8a09c45f28 Fix typo 2016-10-18 09:29:19 -07:00
Adam Paszke
8fdec15a55 Codemod to remove camel case method naming 2016-09-20 08:40:28 -07:00
Adam Paszke
e223564a55 Fix multiprocessing on OS X 2016-09-16 18:27:07 -04:00
Adam Paszke
58f507f9e3 Add file descriptor sharing mode to multiprocessing 2016-09-08 11:23:33 -07:00
Adam Paszke
f9d186d33a Add initial version of multiprocessing module 2016-08-31 19:46:08 -07:00