Summary:
Background: we run PyTorch in embedded C++ pipelines, inside C++ GUIs in https://github.com/Kitware/VIAME, and without this addition the call was failing with the error below, but only on certain Windows platforms/configurations:
OSError: [WinError 6] The handle is invalid
At:
C:\Program Files\VIAME\Python36\site-packages\torch\cuda\__init__.py(162): _lazy_init
C:\Program Files\VIAME\Python36\site-packages\torch\nn\modules\module.py(249): <lambda>
C:\Program Files\VIAME\Python36\site-packages\torch\nn\modules\module.py(182): _apply
C:\Program Files\VIAME\Python36\site-packages\torch\nn\modules\module.py(176): _apply
C:\Program Files\VIAME\Python36\site-packages\torch\nn\modules\module.py(249): cuda
C:\Program Files\VIAME\lib\python3.6None\site-packages\kwiver\arrows\pytorch\pytorch_resnet_f_extractor.py(74): __init__
C:\Program Files\VIAME\lib\python3.6None\site-packages\kwiver\processes\resnet_descriptors.py(132): _configure
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10379
Differential Revision: D9330772
Pulled By: ezyang
fbshipit-source-id: 657ae7590879004558158d3c4abef2ec11d9ed57
Summary:
As I try to replicate DataParallel in C++, I need to move some functions from Python into C++. This PR ports the scatter and gather primitives from Python in torch/cuda/comm.py to C++ in torch/csrc/cuda/comm.cpp. The basic infrastructure was already there, since apaszke had already rewritten broadcast in C++.
I'm not very familiar with this code, so let me know if I'm doing something wrong. I largely just literally translated the code.
I don't know how "public" `torch.cuda.comm` is, but I feel the `destination_index` parameter for `gather` should be changed so that `None` (rather than `-1`) indicates the CPU, with `-1` indicating the default CUDA device. That would make the code clearer IMO.
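For reference, here is a minimal Python-level sketch of the two primitives being ported (assuming at least two visible CUDA devices); the C++ versions in torch/csrc/cuda/comm.cpp are intended to mirror this behavior:

```python
import torch
import torch.cuda.comm as comm

# Split a tensor along dim 0 and place the chunks on devices 0 and 1.
x = torch.randn(8, 4, device="cuda:0")
chunks = comm.scatter(x, devices=[0, 1])

# Concatenate the chunks along dim 0 onto device 0.
# (As discussed above, destination=-1 currently means "gather to CPU".)
y = comm.gather(chunks, dim=0, destination=0)
assert y.shape == x.shape
```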
apaszke colesbury teng-li pietern
Closes https://github.com/pytorch/pytorch/pull/9117
Differential Revision: D8721729
Pulled By: goldsborough
fbshipit-source-id: 1844a488079d21fa209b32e2c73e48632cbe9e68
* fix type mismatch when calling torch._C._cuda_setDevice
* fix type mismatch in scatter
Getting the CUDA device property struct with cudaGetDeviceProperties is expensive. THC caches the device properties, which are available via THCState_getDeviceProperties, exposed through at::globalContext().getDeviceProperties(device), and in Python as torch.cuda.get_device_properties. This PR changes the two methods that previously called cudaGetDeviceProperties to use torch.cuda.get_device_properties directly in Python.
Also fixes ATen compile error when it can't find CUDA.
Fixes #4908. Using the script from that issue, we get roughly an 18x speed-up.
[ssnl@ ~] python dev.py # master
0.2826697587966919
0.00034999847412109375
0.0003493785858154297
0.000356292724609375
0.00036025047302246094
0.0003629922866821289
0.00036084651947021484
0.00035686492919921874
0.00036056041717529296
0.0003606319427490234
[ssnl@ ~] python dev.py # this PR
0.27275662422180175
2.1147727966308594e-05
1.9598007202148438e-05
1.94549560546875e-05
1.9359588623046876e-05
1.938343048095703e-05
2.0074844360351563e-05
1.952648162841797e-05
1.9311904907226562e-05
1.938343048095703e-05
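For context, a minimal sketch of the cached lookup on the Python side (device index 0 assumed):

```python
import torch

if torch.cuda.is_available():
    # Served from THC's cached device property struct, so repeated
    # calls avoid the expensive cudaGetDeviceProperties round trip.
    props = torch.cuda.get_device_properties(0)
    print(props.name, props.major, props.minor, props.total_memory)
```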
This deletes most of the dead Tensor code paths, including the TensorMethods cwrap and generic/Tensor.cpp.
This also moves the THNN.cwrap/.cpp generation to generate_code, which can use ninja if installed.
This replaces the torch.Tensor constructors with factories that produce
Variables. Similarly, functions on the torch module (e.g. torch.randn)
now return Variables.
To keep the PR to a reasonable size, I've left most of the unused tensor
code. Subsequent PRs will remove the dead code, clean-up calls to
torch.autograd.Variable, and rename Variable to Tensor everywhere.
There are some breaking changes because Variable and Tensors had
slightly different semantics. There's a list of those changes here:
https://github.com/pytorch/pytorch/wiki/Breaking-Changes-from-Variable-and-Tensor-merge
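As a small illustration of the new factory behavior (a sketch against the post-merge API):

```python
import torch

# torch.randn now returns a Variable (to be renamed Tensor), so it can
# participate in autograd directly without wrapping it ourselves.
x = torch.randn(3, requires_grad=True)
y = (x * 2).sum()
y.backward()
print(x.grad)  # tensor([2., 2., 2.])
```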
The Tensor and Variable classes are being merged.
autograd.Function.forward is now called on Variables, but with "no-grad"
mode (torch.no_grad()) enabled.
One benefit is that we no longer have to explicitly track shared
storages.
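To illustrate the forward/no-grad point, here is a sketch of a custom autograd.Function under the merged API (the Scale example itself is made up for illustration):

```python
import torch
from torch.autograd import Function

class Scale(Function):
    @staticmethod
    def forward(ctx, x, factor):
        # forward now receives ordinary (former Variable) tensors, and per
        # this change it runs with grad recording disabled (torch.no_grad()),
        # so nothing done here is tracked by autograd.
        ctx.factor = factor
        return x * factor

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output * ctx.factor, None

x = torch.randn(3, requires_grad=True)
y = Scale.apply(x, 2.0)
y.sum().backward()
print(x.grad)  # all elements equal 2.0
```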
* Replace async with non_blocking for Python 3.7 upgrade
* Remove trailing whitespace
* Give _cuda and _type kwargs and accept async for compatibility
* Rename async to non_blocking in all C++ code
* Add entries for async in python_variable_methods
* Friendlier backward compatibility for cuda and type
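A quick sketch of the renamed keyword (the old async spelling is kept only for backward compatibility, as noted above); assumes a CUDA-capable machine:

```python
import torch

if torch.cuda.is_available():
    x = torch.randn(1024, 1024).pin_memory()
    # New spelling: "async" is a reserved word as of Python 3.7.
    y = x.cuda(non_blocking=True)
    z = x.to("cuda", non_blocking=True)
```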
Adds streams and comms as optional arguments to the NCCL calls in
torch.cuda.nccl. Also exposes ncclUniqueId and ncclCommInitRank for
multi-process mode.
Moves Py_RETURN_NONE statements after the GIL is re-acquired.
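A minimal single-process sketch of one of these entry points with the new optional arguments left at their defaults (assumes at least two GPUs; explicit streams/comms follow the same call shape):

```python
import torch
import torch.cuda.nccl as nccl

# One tensor per device; with no outputs given, all_reduce writes the
# summed result back into the inputs.
tensors = [torch.ones(4, device=f"cuda:{i}") for i in range(2)]
nccl.all_reduce(tensors, streams=None, comms=None)
print(tensors[0])  # tensor([2., 2., 2., 2.], device='cuda:0')
```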
* Avoid casting integer params and buffers to float(), double() and half()
* Add test for immune integer buffers
* Fix documentation for float(), double() and half()
* Fix test
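A short sketch of the intended behavior (the module and buffer names are made up for illustration):

```python
import torch
import torch.nn as nn

class Counter(nn.Module):
    def __init__(self):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(4))
        self.register_buffer("steps", torch.zeros(1, dtype=torch.long))

m = Counter().half()
print(m.weight.dtype)  # torch.float16 -- floating point params are cast
print(m.steps.dtype)   # torch.int64   -- integer buffers are left untouched
```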
Instead of initializing CUDA immediately and executing the seeding/state calls, we now queue them and wait until CUDA is actually initialized before executing them.
To keep things debuggable, we also keep track of the original
backtrace when these functions are called, so we can inform
users where they actually called the seeding/state functions
(as opposed to the first time they actually initialized the
RNG).
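The resulting behavior, sketched (torch.cuda.is_initialized is used purely to illustrate and may postdate this change):

```python
import torch

# Seeding no longer forces CUDA initialization; the call is queued.
torch.cuda.manual_seed_all(123)
print(torch.cuda.is_initialized())  # False -- nothing has touched the GPU yet

# The first real CUDA use initializes the context and replays the queued seed.
x = torch.randn(2, device="cuda")
print(torch.cuda.is_initialized())  # True
```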
Fixes #2517
Signed-off-by: Edward Z. Yang <ezyang@fb.com>