Commit Graph

202 Commits

Author SHA1 Message Date
Tongzhou Wang
4563e190c4 Use THC cached CUDA device property when get_device_name and get_device_capability (#6027)
Getting the CUDA device property struct with cudaGetDeviceProperties is expensive. THC caches the device properties, which are exposed through THCState_getDeviceProperties, then through at::globalContext().getDeviceProperties(device), and finally through torch.cuda.get_device_properties. This PR changes the two methods that previously called cudaGetDeviceProperties to use torch.cuda.get_device_properties directly in Python.

Also fixes an ATen compile error when it can't find CUDA.

Fixes #4908. Using the script from that issue, we get roughly an 18x speed-up.

[ssnl@ ~] python dev.py  # master
0.2826697587966919
0.00034999847412109375
0.0003493785858154297
0.000356292724609375
0.00036025047302246094
0.0003629922866821289
0.00036084651947021484
0.00035686492919921874
0.00036056041717529296
0.0003606319427490234
[ssnl@ ~] python dev.py  # this PR
0.27275662422180175
2.1147727966308594e-05
1.9598007202148438e-05
1.94549560546875e-05
1.9359588623046876e-05
1.938343048095703e-05
2.0074844360351563e-05
1.952648162841797e-05
1.9311904907226562e-05
1.938343048095703e-05
2018-03-30 16:39:22 -04:00
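A minimal usage sketch of the two methods above plus the cached-property accessor they now delegate to (public torch.cuda API; the device index 0 is illustrative):

    import torch

    if torch.cuda.is_available():
        # Both calls now read THC's cached property struct instead of
        # invoking cudaGetDeviceProperties on every call.
        props = torch.cuda.get_device_properties(0)
        print(props.name, props.total_memory)
        print(torch.cuda.get_device_name(0))
        print(torch.cuda.get_device_capability(0))   # e.g. (7, 0)
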
Sam Gross
48a3349c29
Delete dead Tensor code paths (#5417)
This deletes most of the dead Tensor code paths, including the TensorMethods cwrap and generic/Tensor.cpp.

This also moves the THNN.cwrap/.cpp generation to generate_code, which can use ninja if installed.
2018-02-27 17:58:09 -05:00
Carl Lemaire
6b95ca4eda DataParallel: GPU imbalance warning (#5376) 2018-02-27 21:30:41 +01:00
Sam Gross
406c9f9c28
Remove two uses of the old Tensor class (#5413) 2018-02-26 15:00:51 -05:00
Sam Gross
30ec06c140
Merge Variable and Tensor classes (#5225)
This replaces the torch.Tensor constructors with factories that produce
Variables. Similarly, functions on the torch module (e.g. torch.randn)
now return Variables.

To keep the PR to a reasonable size, I've left most of the unused tensor
code. Subsequent PRs will remove the dead code, clean up calls to
torch.autograd.Variable, and rename Variable to Tensor everywhere.

There are some breaking changes because Variables and Tensors had
slightly different semantics. There's a list of those changes here:

 https://github.com/pytorch/pytorch/wiki/Breaking-Changes-from-Variable-and-Tensor-merge
2018-02-23 18:03:31 -05:00
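A hedged sketch of the post-merge behavior described above; the details of the breaking changes are in the linked wiki page:

    import torch

    # Factory functions now return Variables (the merged class), so
    # autograd state lives directly on the result of torch.randn.
    x = torch.randn(2, 2)
    print(type(x))           # the merged Tensor/Variable class
    print(x.requires_grad)   # False unless explicitly enabled
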
Adam Paszke
1061d7970d Move broadcast and broadcast_coalesced to C++ 2018-01-18 11:16:45 +01:00
Adam Paszke
de5f7b725e Base for pure C++ NCCL interface 2018-01-18 11:16:45 +01:00
Tongzhou Wang
5918243b0c Methods for checking CUDA memory usage (#4511)
* gpu mem allocated

* add test

* addressed some of @apaszke's comments

* cache stats

* add more comments about test
2018-01-09 11:47:48 -05:00
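A usage sketch of the memory-inspection methods this PR adds (assuming the torch.cuda.memory_allocated / max_memory_allocated / memory_cached names; later versions rename the cache counters):

    import torch

    if torch.cuda.is_available():
        x = torch.randn(1024, 1024).cuda()
        print(torch.cuda.memory_allocated())      # bytes held by live tensors
        print(torch.cuda.memory_cached())         # bytes held by the caching allocator
        del x
        print(torch.cuda.max_memory_allocated())  # peak counter survives the free
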
peterjc123
77ea2f26d8 Add build support for Python 2.7 using MSVC (#4226) 2017-12-20 15:07:25 +01:00
Sam Gross
bcfe259f83
Add streams and comms as optional arguments (#3968)
Adds streams and comms as optional arguments to the NCCL calls in
torch.cuda.nccl. Also exposes ncclUniqueId and ncclCommInitRank for
multi-process mode.

Moves Py_RETURN_NONE statements to after the GIL is re-acquired.
2017-12-04 13:51:22 -05:00
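A rough single-process, multi-GPU sketch of the torch.cuda.nccl surface this touches (streams and comms simply default to None per this PR; exact signatures may differ across versions):

    import torch
    import torch.cuda.nccl as nccl

    if torch.cuda.device_count() >= 2:
        # One tensor per device; all_reduce sums them in place by default.
        tensors = [torch.ones(4).cuda(i) for i in range(2)]
        nccl.all_reduce(tensors)
        print(tensors[0])   # tensor([2., 2., 2., 2.])
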
Sam Gross
4bce69be22
Implement Variable.storage() (#3765)
This still uses THPStorage, but avoids touching THPTensor.
2017-11-20 14:18:07 -05:00
Soumith Chintala
50009144c0
add warnings if device capability is less than ideal (#3601) 2017-11-09 11:48:59 -05:00
peterjc123
aa911939a3 Improve Windows Compatibility (for csrc/scripts) (#2941) 2017-11-08 19:51:35 +01:00
SsnL
bb1b826cdc Exposing emptyCache from allocator (#3518)
* Add empty_cache binding

* document cuda.empty_cache

* update docs
2017-11-07 17:00:38 -05:00
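Usage sketch of the new binding:

    import torch

    if torch.cuda.is_available():
        x = torch.randn(1024, 1024).cuda()
        del x
        # Return cached blocks to the driver so the memory shows as
        # free to other processes (e.g. in nvidia-smi).
        torch.cuda.empty_cache()
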
Adam Paszke
cc3058bdac Fix macOS build (with CUDA) (#3071) 2017-10-11 19:04:15 +02:00
Soumith Chintala
e9dccb3156 implement all_reduce, broadcast, all_gather, reduce_scatter 2017-10-09 22:24:18 -04:00
Soumith Chintala
4d62933529 add initial NCCL C bindings 2017-10-09 22:24:18 -04:00
Soumith Chintala
b3bc5fe302 refactor THCP method defs into cuda/Module.cpp 2017-09-30 13:14:35 -07:00
Justin Johnson
94b5990201 Add torch.cuda.get_device_name function (#2540) 2017-08-26 15:06:37 -04:00
Adam Paszke
8ab3d214d5 Fixes for DistributedDataParallel (#2168) 2017-07-21 16:00:46 -04:00
Edward Z. Yang
72e9e7abf7 Warning squash.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-07-21 14:13:11 -04:00
Zach DeVito
9d8cff9bc1 initialize aten and pytorch to share the same THCState 2017-07-11 10:35:03 -04:00
Adam Paszke
12813b88f6 Add DistributedDataParallel 2017-06-12 22:00:22 -04:00
Trevor Killeen
05bc877a05 make THPPointer have explicit constructors (#1636) 2017-05-25 15:35:54 -04:00
Sam Gross
4c1cdb6148 Refactor Python string utility function 2017-04-28 21:25:26 +02:00
Sam Gross
aab30d4ea2 Fix errors when no CUDA devices are available (#1334)
Fixes #1267

This fixes a number of issues when PyTorch was compiled with CUDA
support but run on a machine without any GPUs. Now, we treat all errors
from cudaGetDeviceCount() as if the machine has no devices.
2017-04-23 14:45:27 +02:00
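With this fix, CUDA-enabled builds can guard on device availability instead of erroring out on GPU-less machines; a minimal sketch:

    import torch

    # device_count() reports 0 (rather than raising) when no GPU is usable.
    if torch.cuda.is_available() and torch.cuda.device_count() > 0:
        x = torch.randn(3).cuda()
    else:
        x = torch.randn(3)   # CPU fallback
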
Sergey Zagoruyko
8dc5d2a22e export current_blas_handle 2017-03-23 23:32:45 +01:00
Sam Gross
b9379cfab7 Use cuDNN and NCCL symbols from _C library (#1017)
This ensures that we use the same library at the C++ level and with
Python ctypes. It moves the search for the correct library from
run-time to compile-time.
2017-03-16 16:10:17 -04:00
soumith
7ad948ffa9 fix tests to not sys.exit(), also fix fatal error on THC initialization 2017-03-01 17:37:04 -05:00
Sam Gross
fc6fcf23f7 Lock the cudaFree mutex. (#880)
Prevents NCCL calls from overlapping with cudaFree(), which can lead to
deadlocks.
2017-03-01 11:29:25 -05:00
Adam Paszke
19a65d2bea Expose stateless methods for torch.cuda.HalfTensor 2017-02-26 20:02:42 +01:00
Sam Gross
bd5303010d Refactor autograd package to separate Python dependencies. (#662)
The core autograd Variable, Function, and Engine no longer depend on the
Python API. This lets us implement functions in C++. In the future, we
can also multithread the engine and release the GIL for most of the
non-Python backward passes.
2017-02-13 16:00:16 -08:00
Zeming Lin
59d66e6963 Sparse Library (#333) 2017-01-05 00:43:41 +01:00
Sam Gross
20fffc8bb7 Fix torch.is_tensor for half tensors (#322)
Fixes #311
2016-12-19 15:27:47 +01:00
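Sketch of the fixed behavior:

    import torch

    h = torch.randn(4).half()   # a half-precision (float16) tensor
    print(torch.is_tensor(h))   # True (half tensors were not recognized before this fix)
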
Sam Gross
1af9a9637f Refactor copy and release GIL during copy (#286) 2016-12-11 21:54:58 +01:00
Sam Gross
0d7d29fa57 Enable caching allocator for CUDA pinned memory (#275)
Also add binding for CUDA "sleep" kernel
2016-12-02 01:33:56 -05:00
Adam Paszke
ebc70f7919 Look for libcudart in default CUDA installation paths (#195) 2016-11-02 19:36:10 -04:00
Sam Gross
ad5fdef6ac Make every user-visible Tensor have a Storage (#179) 2016-10-31 12:12:22 -04:00
Sam Gross
79ead42ade Add CUDA Stream and Event API (#133) 2016-10-18 12:15:57 -04:00
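A minimal sketch of the stream and event API (shown with the modern torch.cuda names, which are assumed to track the surface added here):

    import torch

    if torch.cuda.is_available():
        s = torch.cuda.Stream()
        start = torch.cuda.Event(enable_timing=True)
        end = torch.cuda.Event(enable_timing=True)
        with torch.cuda.stream(s):   # work below is queued on stream s
            start.record()
            a = torch.randn(1000, 1000).cuda()
            b = a @ a
            end.record()
        end.synchronize()
        print(start.elapsed_time(end))   # elapsed milliseconds
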
Sam Gross
8d39fb4094 Use new THC API for device allocator 2016-10-17 09:35:41 -07:00
Sam Gross
ee14cf9438 Add support for pinned memory: (#127)
torch.Storage/Tensor.pin_memory()
torch.Storage/Tensor.is_pinned()
2016-10-15 18:38:26 -04:00
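Usage sketch:

    import torch

    if torch.cuda.is_available():
        t = torch.randn(1024)
        p = t.pin_memory()                    # copy into page-locked host memory
        print(t.is_pinned(), p.is_pinned())   # False True
        # Pinned sources allow asynchronous host-to-device copies.
        d = p.cuda(non_blocking=True)
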
Sam Gross
c20828478e Update Module.cpp for THC changes 2016-09-30 11:13:14 -07:00
Adam Paszke
3f7ab95890 Finish implementation of prng related functions 2016-09-29 11:33:25 -07:00
Sam Gross
4e9f0a8255 Use CUDA caching allocator 2016-09-26 13:12:39 -07:00
Adam Paszke
06ab3f962f Refactor _C extension to export some utilities 2016-09-21 08:36:54 -07:00
Adam Paszke
3ea1da3b2c Minor fix in CUDA module 2016-09-14 11:09:03 -04:00
soumith
1f2695e875 adding cuda driver check functions for runtime checking 2016-09-13 10:34:13 -07:00
Adam Paszke
8d933cbfc4 Fixes for OS X 2016-08-22 22:45:35 -04:00
Adam Paszke
12bed8dc0d Add CUDA device selection 2016-08-12 07:46:46 -07:00
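The selection API survives in today's torch.cuda; a sketch (device indices illustrative):

    import torch

    if torch.cuda.device_count() >= 2:
        torch.cuda.set_device(0)        # process-wide default device
        with torch.cuda.device(1):      # temporary override
            y = torch.randn(3).cuda()   # allocated on device 1
        print(y.get_device())           # 1
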
Adam Paszke
92e983a489 Fixes for Linux and new cutorch 2016-08-02 09:20:18 -07:00
Adam Paszke
c574295012 Various fixes 2016-07-19 10:45:59 -04:00
Adam Paszke
3a44259b32 Add support for CUDA 2016-07-19 10:45:59 -04:00