Commit Graph

57 Commits

Author SHA1 Message Date
Edward Yang
173f224570 Turn on F401: Unused import warning. (#18598)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18598
ghimport-source-id: c74597e5e7437e94a43c163cee0639b20d0d0c6a

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18598 Turn on F401: Unused import warning.**

This was requested by someone at Facebook; this lint is turned
on for Facebook by default.  "Sure, why not."

I had to noqa a number of imports in __init__.  Hypothetically
we're supposed to use __all__ in this case, but I was too lazy
to fix it.  Left for future work.

Be careful!  flake8-2 and flake8-3 behave differently with
respect to import resolution for # type: comments.  flake8-3 will
report an import as unused; flake8-2 will not.  For now, I just
noqa'd all these sites.

All the changes were done by hand.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: D14687478

fbshipit-source-id: 30d532381e914091aadfa0d2a5a89404819663e3
2019-03-30 09:01:17 -07:00
Vitaly Fedyunin
5653a914f7 Implement reference counting for shared IPC CUDA tensors (#16854)
Summary:
This is to fix #16141 and similar issues.

The idea is to track a reference to every shared CUDA Storage and deallocate memory only after a consumer process deallocates received Storage.

ezyang Done with cleanup. Same (insignificantly better) performance as the file-per-share solution, but it handles millions of shared tensors easily. Note: documentation in progress.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16854

Differential Revision: D13994490

Pulled By: VitalyFedyunin

fbshipit-source-id: 565148ec3ac4fafb32d37fde0486b325bed6fbd1
2019-03-25 10:24:38 -07:00
Edward Yang
ba81074c40 Fix B902 lint error: invalid first argument. (#18181)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18181
ghimport-source-id: 9c23551584a1a1b0b7ac246367f3a7ae1c50b315

Stack from [ghstack](https://github.com/ezyang/ghstack):
* #18184 Fix B903 lint: save memory for data classes with slots/namedtuple
* **#18181 Fix B902 lint error: invalid first argument.**
* #18178 Fix B006 lint errors: using mutable structure in default argument.
* #18177 Fix lstrip bug revealed by B005 lint

A variety of sins were committed:
- Some code was dead
- Some code was actually a staticmethod
- Some code just named it the wrong way
- Some code was purposely testing the omitted case
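The "sins" above map to concrete B902 fixes. A small illustrative class (ours, not from the PR) showing the corrected patterns:

```python
# Illustrative examples of the B902 "invalid first argument" fixes
# described above. Names here are made up for demonstration.

class Counter:
    def __init__(self, start=0):
        # Instance method: first argument must be self.
        self.value = start

    def increment(self, by=1):
        # "Some code just named it the wrong way": e.g. an instance
        # method whose first argument was called cls. Renamed to self.
        self.value += by
        return self.value

    @staticmethod
    def describe():
        # "Some code was actually a staticmethod": it never touched
        # self, so the decorator makes the intent explicit.
        return "a simple counter"

    @classmethod
    def from_string(cls, s):
        # Classmethod: first argument must be cls.
        return cls(int(s))
```

With the decorators and argument names in place, the B902 lint passes without suppressions.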

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: D14530876

fbshipit-source-id: 292a371d9a76ddc7bfcfd38b6f0da9165290a58e
2019-03-21 09:10:28 -07:00
Edward Yang
84c30398c7 Fix lint in test_multiprocessing.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18016

Reviewed By: eellison

Differential Revision: D14458177

fbshipit-source-id: f17b3e06223ab399e9ce24be6988effe04dad636
2019-03-14 09:58:13 -07:00
Stefan Krah
5ea6344c54 Skip test_event_handle_multi_gpu() on a single GPU system (#17402)
Summary:
This fixes the following test failure:

```
======================================================================
ERROR: test_event_handle_multi_gpu (__main__.TestMultiprocessing)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_multiprocessing.py", line 445, in test_event_handle_multi_gpu
    with torch.cuda.device(d1):
  File "/home/stefan/rel/lib/python3.7/site-packages/torch/cuda/__init__.py", line 229, in __enter__
    torch._C._cuda_setDevice(self.idx)
RuntimeError: cuda runtime error (10) : invalid device ordinal at /home/stefan/pytorch/torch/csrc/cuda/Module.cpp:33
```
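The usual shape of such a fix is a `skipIf` guard on the device count. A torch-free sketch (the `device_count` stub below stands in for `torch.cuda.device_count()` so the example runs anywhere):

```python
import unittest

def device_count():
    # Stand-in for torch.cuda.device_count(); hardcoded so this
    # sketch runs without CUDA. Assume a single visible device.
    return 1

class TestMultiprocessing(unittest.TestCase):
    @unittest.skipIf(device_count() < 2, "found only 1 GPU")
    def test_event_handle_multi_gpu(self):
        pass  # the real body exercises events across two devices

suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestMultiprocessing)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

On a single-GPU box the test is recorded as skipped instead of erroring with "invalid device ordinal".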
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17402

Differential Revision: D14195190

Pulled By: soumith

fbshipit-source-id: e911f3782875856de3cfbbd770b6d0411d750279
2019-02-23 08:29:36 -08:00
Shen Li
898329c3f9 Unify device() return type in Stream, Event, and Tensor (#16150)
Summary:
Addresses one future work item in #15937
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16150

Differential Revision: D13732299

Pulled By: mrshenli

fbshipit-source-id: 4d0b35df573a3bf92dea6e2e7eb42fe8bac77b18
2019-01-19 23:01:31 -08:00
Shen Li
24f4d3987e Move all Stream and Event Python implementation to C++ (#15937)
Summary:
1. Added `torch/csrc/cuda/Event.h` and `torch/csrc/cuda/Event.cpp` to bind Python Event class to C++ implementation.
2. Move all CUDA runtime invocations from `torch/cuda/streams.py` to C++
3. Added tests to cover Stream and Event APIs. ~(event IPC handle tests are introduced in #15974)~
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15937

Differential Revision: D13649001

Pulled By: mrshenli

fbshipit-source-id: 84ca58f35f6ba679a4ba33150ceba678d760d240
2019-01-17 07:29:22 -08:00
Edward Yang
1989157eb6 Disable test_leaf_variable_sharing on ASAN runs
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15001

Reviewed By: orionr

Differential Revision: D13399119

fbshipit-source-id: 6b1d098e55a67b1f5bc6d08a8ee3c1be8234a654
2018-12-10 10:43:05 -08:00
Ailing Zhang
be47470c91 Fix cuda multiprocessing cached memory (#14736)
Summary:
This PR fixes #11422

In the old world of CUDA IPC, when we want to share a tensor T from A to B, we have to share the whole CUDA memory allocation in which T's storage sits, and we cast it to the same storage type as T's.

This causes a problem when two different types of storage get allocated in the same CUDA memory block: when we try to reconstruct the second tensor, it complains about the wrong storage type.

In this PR we reconstruct the storage only (not the entire memory block). However, since CUDA allows a memory handle to be opened only once per process, we have to save the device pointer in a global cache so that we can reconstruct tensors as they arrive.

Thanks a ton to ezyang who helped design the solution and debugged the issue!
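The open-once-per-process constraint plus the global cache amounts to a refcounted handle table. A pure-Python sketch (class and method names are ours, not PyTorch's):

```python
# Refcounted cache of opened IPC memory handles, per the design above:
# open each handle at most once per process, hand out the base pointer
# to every consumer, and close only when the last consumer releases it.

class IpcHandleCache:
    def __init__(self, open_fn, close_fn):
        # open_fn/close_fn stand in for cudaIpcOpenMemHandle /
        # cudaIpcCloseMemHandle.
        self._open, self._close = open_fn, close_fn
        self._entries = {}  # handle -> [base_ptr, refcount]

    def acquire(self, handle):
        entry = self._entries.get(handle)
        if entry is None:
            entry = self._entries[handle] = [self._open(handle), 0]
        entry[1] += 1
        return entry[0]

    def release(self, handle):
        entry = self._entries[handle]
        entry[1] -= 1
        if entry[1] == 0:
            self._close(entry[0])
            del self._entries[handle]
```

Two received storages backed by the same handle trigger one open and, after both are freed, one close; this is what lets millions of shared tensors coexist without a file descriptor per share.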
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14736

Differential Revision: D13335899

Pulled By: ailzhang

fbshipit-source-id: cad69db392ed6f8fdc2b93a9dc2899f6d378c371
2018-12-05 10:55:43 -08:00
Tongzhou Wang
8ad69a80e3 Test scripts only run cases defined in the running script (#13250)
Summary:
1. Refactors `TestTorch` into `TestTorchMixin` (subclass of `object`) and `TestTorch` (subclass of `TestCase`, MRO `(TestCase, TestTorchMixin)`, only defined if `__name__ == '__main__'`). So other scripts won't accidentally run it.
2. Adds an assertion in `load_tests` that each script only runs cases defined in itself.
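The `load_tests` assertion in point 2 can be sketched with the stdlib `load_tests` protocol; this is an illustration of the idea, not the PR's code:

```python
import unittest

class TestLocal(unittest.TestCase):
    def test_ok(self):
        self.assertTrue(True)

def load_tests(loader, tests, pattern):
    # The guard described above: every collected case must be defined
    # in the running script itself, not dragged in by a star import.
    for suite in tests:
        for test in suite:
            assert type(test).__module__ == __name__, (
                "picked up foreign test from %s" % type(test).__module__)
    return tests

# unittest.main() would invoke load_tests automatically; here we
# exercise the guard directly with this module's own tests.
own = unittest.defaultTestLoader.loadTestsFromTestCase(TestLocal)
checked = load_tests(unittest.defaultTestLoader,
                     unittest.TestSuite([own]), None)
```

A test class imported from another script would fail the `__module__` check, which is exactly the accident the refactor prevents.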

cc yf225 ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13250

Differential Revision: D12823734

Pulled By: SsnL

fbshipit-source-id: 7a169f35fe0794ce76e310d8a137d9a3265c012b
2018-10-29 13:57:40 -07:00
Zachary DeVito
dae7616078 Shard all of tests based on how many tests exist. (#13160)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13160

Reduces pytorch_core build from 2 hours to 30 minutes

Reviewed By: soumith, dzhulgakov

Differential Revision: D10524261

fbshipit-source-id: 97270ac73404b5ea4c264cd0e9d8d4b1be79b0e9
2018-10-26 18:20:34 -07:00
James Sun
f4944f0f8a Rename test/common.py to test/common_utils.py (#12794)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12794

common.py is used in the base module for almost all tests in test/. The
name of this file is so common that it can easily conflict with other dependencies
if they happen to have another common.py in the base module. Rename the file to
avoid the conflict.

Reviewed By: orionr

Differential Revision: D10438204

fbshipit-source-id: 6a996c14980722330be0a9fd3a54c20af4b3d380
2018-10-17 23:04:29 -07:00
Edward Yang
3bfa7258b3 Don't serialize hooks (#11705)
Summary:
Fixes #11683.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11705

Differential Revision: D9833057

Pulled By: ezyang

fbshipit-source-id: 18af9bcd77b088326738d567100fbe4a4c869dd6
2018-10-16 20:11:03 -07:00
Edward Yang
674f7a9778 Correctly share CUDA Parameters. (#10220)
Summary:
```
    Correctly share CUDA Parameters, requires_grad and hooks.

    Previously, the following was true:

    - If you put a Parameter for a CUDA tensor
      in multiprocessing queue (or otherwise tried to transfer it),
      this failed, saying that we cannot pickle CUDA storage.
      This is issue #9996.

    - If you put a leaf Tensor that requires_grad=True through the
      multiprocessing queue, it would come out the other end as
      requires_grad=False (It should have come out the other end
      as requires_grad=True).  Similarly, backwards hooks were
      lost.

    - If you put a non-leaf Tensor that requires_grad=True through
      the multiprocessing queue, it would come out the other end
      as requires_grad=False.

    The root cause for the first issue was that implementation of
    reductions for Parameter used the superclass implementation
    (tensor) in __reduce_ex__, but this always picks up the
    non-ForkingPickler reduction, which doesn't work with CUDA tensors.
    So, we registered a new ForkingPickler specifically for Parameter,
    and adjusted the code to correctly rewrap a Tensor in a Parameter
    if it was originally a parameter.

    While working on this, we realized that requires_grad and backwards
    hooks would not be preserved in the ForkingPickler reduction
    implementation.  We fixed the reducer to save these parameters.
    However, Adam Paszke pointed out that we shouldn't allow sending
    requires_grad=True, non-leaf Tensors over a multiprocessing
    queue, since we don't actually support autograd over process
    boundaries.  We now throw an error in this case; this may cause
    previously working code to fail, but it is easy enough to fix:
    just detach() the tensor before sending it.  The error message
    says so.

    Fixes #9996.
```
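The reducer policy described above (preserve `requires_grad` for leaves, reject non-leaf tensors that require grad) can be simulated without torch; the `Tensor` dataclass and the error text here are stand-ins, not PyTorch's:

```python
# Torch-free sketch of the reducer policy from this commit: leaf
# tensors round-trip with requires_grad preserved; requires_grad=True
# non-leaf tensors are refused with a hint to call detach().

from dataclasses import dataclass

@dataclass
class Tensor:          # stand-in for torch.Tensor
    data: list
    requires_grad: bool = False
    is_leaf: bool = True

def reduce_tensor(t):
    if t.requires_grad and not t.is_leaf:
        raise RuntimeError(
            "refusing to serialize a non-leaf tensor that requires "
            "grad; autograd does not cross process boundaries -- "
            "call .detach() before sending")
    # Real code returns (rebuild_fn, args) carrying shared storage
    # plus the saved requires_grad flag and hooks.
    return (Tensor, (t.data, t.requires_grad, True))
```

Anything reconstructed on the receiving side is a leaf, which is why `requires_grad=True` survives only for leaves.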
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10220

Differential Revision: D9160746

Pulled By: ezyang

fbshipit-source-id: a39c0dbc012ba5afc7a9e646da5c7f325b3cf05c
2018-08-10 13:54:56 -07:00
Edward Yang
e5678794ed Reenable multiprocessing preserve sharing tests on ASAN. (#9498)
Summary:
This issue was fixed in 976f9253a5

Fixes #5311.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9498

Differential Revision: D8875605

Pulled By: ezyang

fbshipit-source-id: 449ffe975d35c959f92874437ba9be37d4d3a1f2
2018-07-17 11:10:21 -07:00
Edward Yang
976f9253a5 Eliminate storage views. (#9466)
Summary:
Storage views were previously used to implement CUDA IPC sharing,
but they weren't necessary.  The new strategy is described in
Note [CUDA IPC and the caching allocator].

This also fixes an unrelated bug, where we weren't actually using
the Tensor forking pickler, because we didn't register a pickler
for torch.Tensor.

Fixes #9447.  Fixes #46.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

CC apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9466

Reviewed By: apaszke

Differential Revision: D8859698

Pulled By: ezyang

fbshipit-source-id: 3362cb92f6ae4aa37084c57d79b31004bd0b4a97
2018-07-16 15:40:24 -07:00
Will Feng
ff501c30af Turn on UBSAN in the OSS build (#8813)
Summary:
Copy of https://github.com/pytorch/pytorch/pull/8802
Closes https://github.com/pytorch/pytorch/pull/8813

Differential Revision: D8707364

Pulled By: yf225

fbshipit-source-id: bc201980b50e9fb44c42a17f898b50d3558fc417
2018-07-05 15:55:49 -07:00
Will Feng
90fd4df695 Add flag for disabling tests with multiprocessing spawn start method (#9061)
Summary:
This will resolve some of the timeout issues in CPU and GPU tests internally.
Closes https://github.com/pytorch/pytorch/pull/9061

Reviewed By: ezyang

Differential Revision: D8707471

Pulled By: yf225

fbshipit-source-id: 9dc82a2c9da0c540ae015442f74b9b2b1a67a246
2018-06-30 14:39:11 -07:00
gchanan
a6bfa16c17
torch.arange: add numpy-style type inference. (#7016)
* torch.arange: add numpy-style type inference.

This is a backwards-compatibility breaking change.

* Fix flake8.

* Use at::optional.

* Remove unneeded header files.

* Use reference wrapper.

* Update arange for test.

* Address review comments.
2018-04-27 15:11:45 -04:00
gchanan
749d51414a
Separate cuda-ness from dtype. (#6470)
* Separate cuda-ness from dtype.

There are no longer torch.cuda.int64, etc; only torch.int64 that correspond to at::ScalarType.
At the python arg parser level, the corresponding ATen type is selected from the combination of (ScalarType, Layout, Device).

There is also currently unused code in here for supporting ScalarType in native_functions; this will be used for specifying aggregate types
on reduction functions.

* Fix test_autograd.

* Add defaults to randint_like.

* Track is_cuda in py tensor types.

* Fix test_sparse.

* Fix multiprocessing.

* Fix rnn.

* Fix test_nn.

* Fix flake8.
2018-04-12 14:05:44 -04:00
Richard Zou
e831ad6204 Fix sharing of empty tensor in multiprocessing (#6229)
Fixes #5719

Previously, the following would error out with an "Invalid file
descriptor" error:
```
import torch
import torch.multiprocessing as mp

q = mp.Queue()
t = torch.tensor([])
q.put(t)
```
on some OSes. The problem was that because one cannot mmap data of size
0, and an empty tensor has a storage of size 0, the file descriptor
for the storage (referencing shared memory) was never set. The
multiprocessing sharing code then calls DupFD on that uninitialized file
descriptor, leading to an error.

This PR special cases sharing an empty tensor on the CPU. CUDA does not
have this problem.

Unit tests for both cpu and cuda empty tensors
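The underlying constraint is easy to reproduce in plain Python: mapping zero bytes fails, which is why a size-0 storage never had a valid shared-memory mapping behind it. A minimal stdlib sketch (assuming CPython's `mmap` behavior of rejecting empty files):

```python
# Mapping an empty file raises ValueError in CPython's mmap module --
# the same "cannot mmap size 0" constraint that left the empty
# tensor's file descriptor uninitialized.

import mmap
import tempfile

with tempfile.TemporaryFile() as f:
    try:
        m = mmap.mmap(f.fileno(), 0)  # empty file: nothing to map
        m.close()
        outcome = "mapped"
    except ValueError as e:
        outcome = "error: %s" % e
```

Special-casing empty tensors, as this PR does, sidesteps the mapping entirely instead of handing an invalid descriptor to DupFD.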
2018-04-03 11:49:40 -04:00
Edward Z. Yang
40ea24cc54 Skip test_backwards_fork test as flaky. (#5839)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-03-17 10:40:27 -04:00
Edward Z. Yang
6f9dc115e8 Mark test_fs_sharing as hanging in ASAN. (#5451)
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
2018-02-28 00:15:53 -05:00
Vedanuj Goswami
7f1b3d12e1 Fix ASAN alloc-dealloc-mismatch in TestMultiprocessing (#5428) 2018-02-27 03:14:52 -05:00
Edward Z. Yang
40d79e4447
Turn on ASAN in continuous integration. (#5271)
I know this works because I had to squelch a bunch of ASAN
errors in multiprocessing.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-02-24 17:04:25 -05:00
Sam Gross
30ec06c140
Merge Variable and Tensor classes (#5225)
This replaces the torch.Tensor constructors with factories that produce
Variables. Similarly, functions on the torch module (e.g. torch.randn)
now return Variables.

To keep the PR to a reasonable size, I've left most of the unused tensor
code in place. Subsequent PRs will remove the dead code, clean up calls to
torch.autograd.Variable, and rename Variable to Tensor everywhere.

There are some breaking changes because Variable and Tensors had
slightly different semantics. There's a list of those changes here:

 https://github.com/pytorch/pytorch/wiki/Breaking-Changes-from-Variable-and-Tensor-merge
2018-02-23 18:03:31 -05:00
peterjc123
2dd7039b6b Fix multiprocessing and dataloader tests on Windows (#4453) 2018-01-06 17:41:36 +01:00
Will Feng
1681d07199 Disable tests and fix issues with Windows CUDA build (#4251) 2017-12-20 11:30:21 +01:00
Sam Gross
d605058212
Replace Variable.volatile with torch.no_grad() (#3970)
This removes volatile from Variable. The functionality is mostly
replaced by a global (thread-local) flag, which is controlled by
torch.set_grad_enabled() and the context manager torch.no_grad().

In C++, the flag is exposed through GradMode::is_enabled() and GradMode::set_enabled()

Fixes #3627
2017-12-18 15:46:13 -05:00
Sam Gross
b79d74aa81 Re-initialize autograd engine in child processes (#4158)
* Re-initialize autograd engine in child processes

The autograd engine uses threads for backwards. These don't exist after
forks and they were not being re-initialized because the
Engine::start_threads_flag was already set. This re-initializes the
engine in child processes, which will cause it to re-create threads when
backwards() is called in the child process.

Note that we only attempt to handle the common case where fork() is
called while the backwards threads are idle.

Fixes #3966

* Avoid non-async-signal-safe functions in fork handler
2017-12-18 01:51:27 -05:00
Scott Stevenson
a9ef76b9c6 Reflect renaming of OS X to macOS (#3795) 2017-11-20 16:52:10 -05:00
peterjc123
aa911939a3 Improve Windows Compatibility (for csrc/scripts) (#2941) 2017-11-08 19:51:35 +01:00
Soumith Chintala
43eaa28b9f fix empty Tensor mmap 2017-07-14 02:55:05 -04:00
Sam Gross
85a95d8a23 Fix sharing of CUDA tensors on non-current devices
The correct device must be set when getting the base allocation and when
calling cudaIpcCloseMemHandle. Store the device in the allocator's
context, which was previously always NULL.

Fixes #1707
2017-06-05 13:01:19 -04:00
Soumith Chintala
ed679fc43c disabling fd leakchecker test (#1593) 2017-05-19 01:20:50 -04:00
Adam Paszke
91c4ba7980 Add torch.arange and deprecate torch.range 2017-04-03 10:38:58 -04:00
Sam Gross
77fbc12f23 Fix some deadlocks when torch_shm_manager is not found (#1030)
- Add additional timeouts to test_multiprocessing to reduce the chances of
   hanging indefinitely on failure
 - Add missing header guards
 - Fix typo
 - Check that torch_shm_manager exists in torch/__init__.py
2017-03-17 18:28:39 -04:00
Martin Raison
f17cfe4293 sparse tensor operations (#735) 2017-03-03 18:37:03 +01:00
Adam Paszke
2822013437 Fix flaky tests 2017-02-14 21:28:50 +01:00
Adam Paszke
63edca44f2 Add tests for non-contiguous inputs and gradients 2017-02-14 21:28:50 +01:00
Sam Gross
bd5303010d Refactor autograd package to separate Python dependencies. (#662)
The core autograd Variable, Function, and Engine no longer depend on the
Python API. This lets us implement functions in C++. In the future, we
can also multithread the engine and release the GIL for most of the
non-Python backwards.
2017-02-13 16:00:16 -08:00
Adam Paszke
13e34b4679 Fix multiprocessing tests 2017-01-28 01:18:42 +01:00
Luke Yeager
e7c1e6a8e3 [pep8] Fix most lint automatically with autopep8
Here's the command I used to invoke autopep8 (in parallel!):

    git ls-files | grep '\.py$' | xargs -n1 -P`nproc` autopep8 -i

Several rules are ignored in setup.cfg. The goal is to let autopep8
handle everything which it can handle safely, and to disable any rules
which are tricky or controversial to address. We may want to come back
and re-enable some of these rules later, but I'm trying to make this
patch as safe as possible.

Also configures flake8 to match pep8's behavior.

Also configures TravisCI to check the whole project for lint.
2017-01-28 01:15:51 +01:00
Adam Paszke
a1fa995044 Fixes and improvements (#593)
* Fix error in ELU backward

* Add --seed flag for tests

* Add test for BatchNorm eval

* Fix autograd.backward docs

* Support cc flags in cuDNN search

* Fix IndexSelect backward formula
2017-01-25 22:21:49 -05:00
Sam Gross
0c69fd559a Fix CUDA sharing across processes (#530) 2017-01-20 18:28:39 -05:00
Adam Paszke
95f0fa8a92 Change .grad attribute of Variables to be a Variable 2017-01-16 12:59:47 -05:00
Adam Paszke
0633c08ec9 Add is_shared() method for storages and tensors 2016-12-31 16:25:39 -05:00
Adam Paszke
f908432eb3 Ensure that Variable's grad is shared between processes 2016-12-31 16:25:39 -05:00
Adam Paszke
1bd291c57c Fix multiprocessing tests on macOS 2016-12-31 16:25:39 -05:00
Sam Gross
24af02154c Use ForkingPickler for sharing tensor/storages across processes (#344)
This hooks into the (internal) ForkingPickler class in multiprocessing
to reduce tensors, storages, and CUDA events instead of our queue from
joblib. This makes it easier to use the standard multiprocessing classes
in later versions of Python.

This also exposes:

 - Tensor/Storage.share_memory_()
 - Module.share_memory()

These methods move the CPU tensors and storages to shared memory. If
you're using the "fork" method of multiprocessing, these objects can be
directly inherited instead of serialized through a queue.
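Hooking a custom type into multiprocessing's pickler, the mechanism this commit adopts, looks roughly like the following (`PointCloud` and its reducer are made up for illustration; real tensor reducers move storage to shared memory and send a handle rather than the data):

```python
# Registering a custom reducer with the ForkingPickler that
# multiprocessing queues use, in the style this commit introduces
# for tensors, storages, and CUDA events.

import pickle
from multiprocessing.reduction import ForkingPickler

class PointCloud:
    def __init__(self, points):
        self.points = points

def rebuild_pointcloud(points):
    # Runs in the receiving process to reconstruct the object.
    return PointCloud(points)

def reduce_pointcloud(pc):
    # A tensor reducer would move pc's storage to shared memory here
    # and send a handle instead of the raw payload.
    return (rebuild_pointcloud, (pc.points,))

ForkingPickler.register(PointCloud, reduce_pointcloud)

# Round-trip through the same pickler multiprocessing queues use.
blob = bytes(ForkingPickler.dumps(PointCloud([(0, 0), (1, 2)])))
clone = pickle.loads(blob)
```

Because the reducer only fires inside the ForkingPickler, ordinary `pickle.dumps` of the same object is unaffected; that separation is what lets standard multiprocessing classes work unmodified.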
2016-12-28 20:34:23 -05:00