Commit Graph

202 Commits

Author SHA1 Message Date
Tongzhou Wang
4563e190c4 Use THC cached CUDA device property when get_device_name and get_device_capability (#6027)
Getting the CUDA device property struct with cudaGetDeviceProperties is expensive. THC caches the device properties, which are exposed through THCState_getDeviceProperties, then through at::globalContext().getDeviceProperties(device), and finally through torch.cuda.get_device_properties. This PR changes the two methods that previously called cudaGetDeviceProperties to use torch.cuda.get_device_properties directly in Python.

Also fixes an ATen compile error when it can't find CUDA.

Fixes #4908. Using the script from that issue, we get roughly an 18x speed-up.

[ssnl@ ~] python dev.py  # master
0.2826697587966919
0.00034999847412109375
0.0003493785858154297
0.000356292724609375
0.00036025047302246094
0.0003629922866821289
0.00036084651947021484
0.00035686492919921874
0.00036056041717529296
0.0003606319427490234
[ssnl@ ~] python dev.py  # this PR
0.27275662422180175
2.1147727966308594e-05
1.9598007202148438e-05
1.94549560546875e-05
1.9359588623046876e-05
1.938343048095703e-05
2.0074844360351563e-05
1.952648162841797e-05
1.9311904907226562e-05
1.938343048095703e-05
2018-03-30 16:39:22 -04:00
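A minimal usage sketch of the two methods above plus the cached-property accessor they now delegate to (public torch.cuda API; the device index 0 is illustrative):

    import torch

    if torch.cuda.is_available():
        # Both calls now read THC's cached property struct instead of
        # invoking cudaGetDeviceProperties on every call.
        props = torch.cuda.get_device_properties(0)
        print(props.name, props.total_memory)
        print(torch.cuda.get_device_name(0))
        print(torch.cuda.get_device_capability(0))   # e.g. (7, 0)
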
Sam Gross
48a3349c29
Delete dead Tensor code paths (#5417)
This deletes most of the dead Tensor code paths, including the TensorMethods cwrap and generic/Tensor.cpp.

This also moves the THNN.cwrap/.cpp generation to generate_code, which can use ninja if installed.
2018-02-27 17:58:09 -05:00
Carl Lemaire
6b95ca4eda DataParallel: GPU imbalance warning (#5376) 2018-02-27 21:30:41 +01:00
Sam Gross
406c9f9c28
Remove two uses of the old Tensor class (#5413) 2018-02-26 15:00:51 -05:00
Sam Gross
30ec06c140
Merge Variable and Tensor classes (#5225)
This replaces the torch.Tensor constructors with factories that produce
Variables. Similarly, functions on the torch module (e.g. torch.randn)
now return Variables.

To keep the PR to a reasonable size, I've left most of the unused tensor
code. Subsequent PRs will remove the dead code, clean up calls to
torch.autograd.Variable, and rename Variable to Tensor everywhere.

There are some breaking changes because Variables and Tensors had
slightly different semantics. There's a list of those changes here:

 https://github.com/pytorch/pytorch/wiki/Breaking-Changes-from-Variable-and-Tensor-merge
2018-02-23 18:03:31 -05:00
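A hedged sketch of the post-merge behavior described above; the details of the breaking changes are in the linked wiki page:

    import torch

    # Factory functions now return Variables (the merged class), so
    # autograd state lives directly on the result of torch.randn.
    x = torch.randn(2, 2)
    print(type(x))           # the merged Tensor/Variable class
    print(x.requires_grad)   # False unless explicitly enabled
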
Adam Paszke
1061d7970d Move broadcast and broadcast_coalesced to C++ 2018-01-18 11:16:45 +01:00
Adam Paszke
de5f7b725e Base for pure C++ NCCL interface 2018-01-18 11:16:45 +01:00
Tongzhou Wang
5918243b0c Methods for checking CUDA memory usage (#4511)
* gpu mem allocated

* add test

* addressed some of @apaszke's comments

* cache stats

* add more comments about test
2018-01-09 11:47:48 -05:00
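A usage sketch of the memory-inspection methods this PR adds (assuming the torch.cuda.memory_allocated / max_memory_allocated / memory_cached names; later versions rename the cache counters):

    import torch

    if torch.cuda.is_available():
        x = torch.randn(1024, 1024).cuda()
        print(torch.cuda.memory_allocated())      # bytes held by live tensors
        print(torch.cuda.memory_cached())         # bytes held by the caching allocator
        del x
        print(torch.cuda.max_memory_allocated())  # peak counter survives the free
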
peterjc123
77ea2f26d8 Add build support for Python 2.7 using MSVC (#4226) 2017-12-20 15:07:25 +01:00
Sam Gross
bcfe259f83
Add streams and comms as optional arguments (#3968)
Adds streams and comms as optional arguments to the NCCL calls in
torch.cuda.nccl. Also exposes ncclUniqueId and ncclCommInitRank for
multi-process mode.

Moves Py_RETURN_NONE statements to after the GIL is re-acquired.
2017-12-04 13:51:22 -05:00
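A rough single-process, multi-GPU sketch of the torch.cuda.nccl surface this touches (streams and comms simply default to None per this PR; exact signatures may differ across versions):

    import torch
    import torch.cuda.nccl as nccl

    if torch.cuda.device_count() >= 2:
        # One tensor per device; all_reduce sums them in place by default.
        tensors = [torch.ones(4).cuda(i) for i in range(2)]
        nccl.all_reduce(tensors)
        print(tensors[0])   # tensor([2., 2., 2., 2.])
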
Sam Gross
4bce69be22
Implement Variable.storage() (#3765)
This still uses THPStorage, but avoids touching THPTensor.
2017-11-20 14:18:07 -05:00
Soumith Chintala
50009144c0
add warnings if device capability is less than ideal (#3601) 2017-11-09 11:48:59 -05:00
peterjc123
aa911939a3 Improve Windows Compatibility (for csrc/scripts) (#2941) 2017-11-08 19:51:35 +01:00
SsnL
bb1b826cdc Exposing emptyCache from allocator (#3518)
* Add empty_cache binding

* document cuda.empty_cache

* update docs
2017-11-07 17:00:38 -05:00
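Usage sketch of the new binding:

    import torch

    if torch.cuda.is_available():
        x = torch.randn(1024, 1024).cuda()
        del x
        # Return cached blocks to the driver so the memory shows as
        # free to other processes (e.g. in nvidia-smi).
        torch.cuda.empty_cache()
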
Adam Paszke
cc3058bdac Fix macOS build (with CUDA) (#3071) 2017-10-11 19:04:15 +02:00
Soumith Chintala
e9dccb3156 implement all_reduce, broadcast, all_gather, reduce_scatter 2017-10-09 22:24:18 -04:00
Soumith Chintala
4d62933529 add initial NCCL C bindings 2017-10-09 22:24:18 -04:00
Soumith Chintala
b3bc5fe302 refactor THCP method defs into cuda/Module.cpp 2017-09-30 13:14:35 -07:00
Justin Johnson
94b5990201 Add torch.cuda.get_device_name function (#2540) 2017-08-26 15:06:37 -04:00
Adam Paszke
8ab3d214d5 Fixes for DistributedDataParallel (#2168) 2017-07-21 16:00:46 -04:00
Edward Z. Yang
72e9e7abf7 Warning squash.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-07-21 14:13:11 -04:00
Zach DeVito
9d8cff9bc1 initialize aten and pytorch to share the same THCState 2017-07-11 10:35:03 -04:00
Adam Paszke
12813b88f6 Add DistributedDataParallel 2017-06-12 22:00:22 -04:00
Trevor Killeen
05bc877a05 make THPPointer have explicit constructors (#1636) 2017-05-25 15:35:54 -04:00
Sam Gross
4c1cdb6148 Refactor Python string utility function 2017-04-28 21:25:26 +02:00
Sam Gross
aab30d4ea2 Fix errors when no CUDA devices are available (#1334)
Fixes #1267

This fixes a number of issues when PyTorch was compiled with CUDA
support but run on a machine without any GPUs. Now, we treat all errors
from cudaGetDeviceCount() as if the machine has no devices.
2017-04-23 14:45:27 +02:00
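With this fix, CUDA-enabled builds can guard on device availability instead of erroring out on GPU-less machines; a minimal sketch:

    import torch

    # device_count() reports 0 (rather than raising) when no GPU is usable.
    if torch.cuda.is_available() and torch.cuda.device_count() > 0:
        x = torch.randn(3).cuda()
    else:
        x = torch.randn(3)   # CPU fallback
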
Sergey Zagoruyko
8dc5d2a22e export current_blas_handle 2017-03-23 23:32:45 +01:00
Sam Gross
b9379cfab7 Use cuDNN and NCCL symbols from _C library (#1017)
This ensures that we use the same library at the C++ level and with
Python ctypes. It moves the search for the correct library from
run-time to compile-time.
2017-03-16 16:10:17 -04:00
soumith
7ad948ffa9 fix tests to not sys.exit(), also fix fatal error on THC initialization 2017-03-01 17:37:04 -05:00
Sam Gross
fc6fcf23f7 Lock the cudaFree mutex. (#880)
Prevents NCCL calls from overlapping with cudaFree(), which can lead to
deadlocks.
2017-03-01 11:29:25 -05:00
Adam Paszke
19a65d2bea Expose stateless methods for torch.cuda.HalfTensor 2017-02-26 20:02:42 +01:00
Sam Gross
bd5303010d Refactor autograd package to separate Python dependencies. (#662)
The core autograd Variable, Function, and Engine no longer depend on the
Python API. This lets us implement functions in C++. In the future, we
can also multithread the engine and release the GIL for most of the
non-Python backward passes.
2017-02-13 16:00:16 -08:00
Zeming Lin
59d66e6963 Sparse Library (#333) 2017-01-05 00:43:41 +01:00
Sam Gross
20fffc8bb7 Fix torch.is_tensor for half tensors (#322)
Fixes #311
2016-12-19 15:27:47 +01:00
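Sketch of the fixed behavior:

    import torch

    h = torch.randn(4).half()   # a half-precision (float16) tensor
    print(torch.is_tensor(h))   # True (half tensors were not recognized before this fix)
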
Sam Gross
1af9a9637f Refactor copy and release GIL during copy (#286) 2016-12-11 21:54:58 +01:00
Sam Gross
0d7d29fa57 Enable caching allocator for CUDA pinned memory (#275)
Also add binding for CUDA "sleep" kernel
2016-12-02 01:33:56 -05:00
Adam Paszke
ebc70f7919 Look for libcudart in default CUDA installation paths (#195) 2016-11-02 19:36:10 -04:00
Sam Gross
ad5fdef6ac Make every user-visible Tensor have a Storage (#179) 2016-10-31 12:12:22 -04:00
Sam Gross
79ead42ade Add CUDA Stream and Event API (#133) 2016-10-18 12:15:57 -04:00
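A minimal sketch of the stream and event API (shown with the modern torch.cuda names, which are assumed to track the surface added here):

    import torch

    if torch.cuda.is_available():
        s = torch.cuda.Stream()
        start = torch.cuda.Event(enable_timing=True)
        end = torch.cuda.Event(enable_timing=True)
        with torch.cuda.stream(s):   # work below is queued on stream s
            start.record()
            a = torch.randn(1000, 1000).cuda()
            b = a @ a
            end.record()
        end.synchronize()
        print(start.elapsed_time(end))   # elapsed milliseconds
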
Sam Gross
8d39fb4094 Use new THC API for device allocator 2016-10-17 09:35:41 -07:00
Sam Gross
ee14cf9438 Add support for pinned memory: (#127)
torch.Storage/Tensor.pin_memory()
torch.Storage/Tensor.is_pinned()
2016-10-15 18:38:26 -04:00
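Usage sketch:

    import torch

    if torch.cuda.is_available():
        t = torch.randn(1024)
        p = t.pin_memory()                    # copy into page-locked host memory
        print(t.is_pinned(), p.is_pinned())   # False True
        # Pinned sources allow asynchronous host-to-device copies.
        d = p.cuda(non_blocking=True)
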
Sam Gross
c20828478e Update Module.cpp for THC changes 2016-09-30 11:13:14 -07:00
Adam Paszke
3f7ab95890 Finish implementation of prng related functions 2016-09-29 11:33:25 -07:00
Sam Gross
4e9f0a8255 Use CUDA caching allocator 2016-09-26 13:12:39 -07:00
Adam Paszke
06ab3f962f Refactor _C extension to export some utilities 2016-09-21 08:36:54 -07:00
Adam Paszke
3ea1da3b2c Minor fix in CUDA module 2016-09-14 11:09:03 -04:00
soumith
1f2695e875 adding cuda driver check functions for runtime checking 2016-09-13 10:34:13 -07:00
Adam Paszke
8d933cbfc4 Fixes for OS X 2016-08-22 22:45:35 -04:00
Adam Paszke
12bed8dc0d Add CUDA device selection 2016-08-12 07:46:46 -07:00
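The selection API survives in today's torch.cuda; a sketch (device indices illustrative):

    import torch

    if torch.cuda.device_count() >= 2:
        torch.cuda.set_device(0)        # process-wide default device
        with torch.cuda.device(1):      # temporary override
            y = torch.randn(3).cuda()   # allocated on device 1
        print(y.get_device())           # 1
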
Adam Paszke
92e983a489 Fixes for Linux and new cutorch 2016-08-02 09:20:18 -07:00
Adam Paszke
c574295012 Various fixes 2016-07-19 10:45:59 -04:00
Adam Paszke
3a44259b32 Add support for CUDA 2016-07-19 10:45:59 -04:00