pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
Shen Li	292edfb087	Change current device in stream context manager if necessary (#16128 ) Summary: Fixes #16019 Pull Request resolved: https://github.com/pytorch/pytorch/pull/16128 Differential Revision: D13721850 Pulled By: mrshenli fbshipit-source-id: 422c6c0b97c1cd46e127e265b532cb8c74a3aac5	2019-01-18 12:39:51 -08:00
Derek Kim	fbdafb006e	Fix trivial typos in torch.cuda._utils (#16026 ) Summary: Trivial typo fixes. Maybe the indefinite article "an" is needed before each "specified index", but I'm not entirely sure. Pull Request resolved: https://github.com/pytorch/pytorch/pull/16026 Differential Revision: D13709499 Pulled By: ezyang fbshipit-source-id: 698b000bb8aa063afd81db6e67046456a439b2ce	2019-01-17 10:40:43 -08:00
Shen Li	24f4d3987e	Move all Stream and Event Python implementation to C++ (#15937 ) Summary: 1. Added `torch/csrc/cuda/Event.h` and `torch/csrc/cuda/Event.cpp` to bind the Python Event class to its C++ implementation. 2. Moved all CUDA runtime invocations from `torch/cuda/streams.py` to C++. 3. Added tests to cover the Stream and Event APIs. ~(event IPC handle tests are introduced in #15974)~ Pull Request resolved: https://github.com/pytorch/pytorch/pull/15937 Differential Revision: D13649001 Pulled By: mrshenli fbshipit-source-id: 84ca58f35f6ba679a4ba33150ceba678d760d240	2019-01-17 07:29:22 -08:00
SsnL	300dcc3b96	Add cuda.reset_max_memory_* (#15985 ) Summary: Addresses #15968 Pull Request resolved: https://github.com/pytorch/pytorch/pull/15985 Differential Revision: D13649916 Pulled By: soumith fbshipit-source-id: a207aea5709a79dba7a6fc541d0a70103f49efff	2019-01-14 07:31:51 -08:00
Shen Li	7b9f794580	Wrap C10 CUDAStream instead of cudaStream_t in THCPStream Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15833 Differential Revision: D13608337 Pulled By: mrshenli fbshipit-source-id: 4c66ef89fad0dc14a11ddb69da92907797cd2828	2019-01-09 15:12:48 -08:00
Shen Li	99d2743863	Move Stream.query() implementation down to C++ (#15737 ) Summary: See #15682 Pushing up this small PR to check if I am doing the right thing. If correct, more will follow for other Stream APIs. Questions will be added inline. Pull Request resolved: https://github.com/pytorch/pytorch/pull/15737 Differential Revision: D13581400 Pulled By: mrshenli fbshipit-source-id: 24afed7847b89b62f0692c79a101ec7ff9d9ee4d	2019-01-07 20:58:07 -08:00
Shen Li	1e9a6d7192	A quick fix for Stream operation errors on non-current device (#15689 ) Summary: see #15682
This is a quick fix that implements the simpler solution suggested by colesbury. As the benchmark results show, it slows down `Stream.query()` by ~20%. I would be happy to pursue a more complex solution by implementing this in C++/ATen, but I would still vote for merging this quick fix first to get rid of the bug sooner.
~Test TBA~ Added
FYI jeffreyksmithjr
now
```python
In [1]: def f():
   ...:     d0 = torch.device('cuda:0')
   ...:     d1 = torch.device('cuda:1')
   ...:     with torch.cuda.device(d0):
   ...:         s0 = torch.cuda.current_stream()
   ...:     with torch.cuda.device(d1):
   ...:         s1 = torch.cuda.current_stream()
   ...:     s0.query()
   ...:     s1.query()

In [4]: %timeit f()
38.1 µs ± 4.2 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [5]: %timeit f()
37.6 µs ± 2.7 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
```
before
```python
In [4]: %timeit f()
28.5 µs ± 1.74 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [5]: %timeit f()
35.3 µs ± 2.91 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15689 Differential Revision: D13571697 Pulled By: mrshenli fbshipit-source-id: 4fe697f91248c6419136d37bb5b7147e612e2f4c	2019-01-03 15:14:58 -08:00
SsnL	e4477feb15	Update cuda.get/set_rng_state doc (#14324 ) Summary: Now that `cuda.get/set_rng_state` accept `device` objects, the default value should be a device object, and the doc should mention so. Pull Request resolved: https://github.com/pytorch/pytorch/pull/14324 Reviewed By: ezyang Differential Revision: D13528707 Pulled By: soumith fbshipit-source-id: 32fdac467dfea6d5b96b7e2a42dc8cfd42ba11ee	2018-12-27 14:09:25 -08:00
David Riazati	59d71b9664	Bicubic interpolation for nn.functional.interpolate (#9849 ) Summary: Addresses #918; interpolation results should be similar to tf. * Adds a bicubic interpolation operator to `nn.functional.interpolate` * Adds a corresponding test in `test_nn.py`. The operator is added in legacy `TH` to be aligned with the other upsampling operators; they can be refactored/moved to ATen all at once when #10482 is resolved. Pull Request resolved: https://github.com/pytorch/pytorch/pull/9849 Differential Revision: D9007525 Pulled By: driazati fbshipit-source-id: 93ef49a34ce4e5ffd4bda94cd9a6ddc939f0a4cc	2018-12-17 15:31:48 -08:00
Krishna Kalyan	5e09c7bc80	Record unit of time in torch.cuda.Event (#15221 ) Summary: Record the unit of time for torch.cuda.Event's elapsed_time. Differential Revision: D13467646 Pulled By: zou3519 fbshipit-source-id: 4f1f4ef5fa4bc5a1b4775dfcec6ab155e5bf8d6e	2018-12-14 15:29:06 -08:00
SsnL	fab8085111	_get_device_index supports parsing device strings Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14929 Reviewed By: weiyangfb Differential Revision: D13394498 Pulled By: soumith fbshipit-source-id: 948c6118abdf6c1e1a8a17709333954cafb2345e	2018-12-09 21:12:46 -08:00
Tongzhou Wang	2448a83d30	Give broadcast_coalesced tensors different version counters (#13594 ) Summary: In `broadcast_coalesced`, since multiple variables can be "views" of a big flattened tensor, they can share the same version counter. However, this base flat tensor is not exposed and they don't share any memory locations, so this is not necessary. Furthermore, it can cause problems, e.g., when two buffers are broadcast together in `DataParallel` and one of them is modified in-place during `forward` but the other is needed in backward, autograd engine will complain. Fixing the bug discovered at https://github.com/pytorch/pytorch/pull/13350#issuecomment-436011370 edit: This is a very real problem. E.g., consider using Spectral Norm + Batch Norm together. Pull Request resolved: https://github.com/pytorch/pytorch/pull/13594 Differential Revision: D12967311 Pulled By: SsnL fbshipit-source-id: 52998dbabe149f575cf0fb79e7016f0b95e4b9e5	2018-11-07 21:49:35 -08:00
Evan Klitzke	189c1e1afb	Rewrite http://pytorch.org -> https://pytorch.org throughout project (#12636 ) Summary: The pytorch.org site redirects all of the http:// requests to the https:// site anyway, so the comments and error messages might as well refer directly to the https:// site. The GitHub project description should also be updated to point to https://pytorch.org Pull Request resolved: https://github.com/pytorch/pytorch/pull/12636 Differential Revision: D10377099 Pulled By: soumith fbshipit-source-id: f47eaba1dd3eecc5dbe62afaf7022573dc3fd039	2018-10-15 13:03:27 -07:00
Tongzhou Wang	8e33451e2e	Make torch.cuda.* take device objects; Update distributed docs (#10833 ) Summary: Commits: 1. Make `torch.cuda.*` take device objects 2. Update `torch.distributed` docs to emphasize calling `torch.cuda.set_device` before `init_process_group` Pull Request resolved: https://github.com/pytorch/pytorch/pull/10833 Differential Revision: D9514241 Pulled By: SsnL fbshipit-source-id: 2497464305fb1e63d6c495291a5744aaa7e2696e	2018-08-27 15:24:42 -07:00
Matt Dawkins	e41528a5cc	Also set stdin to subprocess pipe in FindCUDA windows popen call (#10379 ) Summary: Background: we run pytorch in embedded C++ pipelines, running in C++ GUIs in https://github.com/Kitware/VIAME, and without this addition the call was failing with the error below, but only on certain Windows platforms/configurations:
OSError: [WinError 6] The handle is invalid
At:
  C:\Program Files\VIAME\Python36\site-packages\torch\cuda\__init__.py(162): _lazy_init
  C:\Program Files\VIAME\Python36\site-packages\torch\nn\modules\module.py(249): <lambda>
  C:\Program Files\VIAME\Python36\site-packages\torch\nn\modules\module.py(182): _apply
  C:\Program Files\VIAME\Python36\site-packages\torch\nn\modules\module.py(176): _apply
  C:\Program Files\VIAME\Python36\site-packages\torch\nn\modules\module.py(249): cuda
  C:\Program Files\VIAME\lib\python3.6None\site-packages\kwiver\arrows\pytorch\pytorch_resnet_f_extractor.py(74): __init__
  C:\Program Files\VIAME\lib\python3.6None\site-packages\kwiver\processes\resnet_descriptors.py(132): _configure
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10379 Differential Revision: D9330772 Pulled By: ezyang fbshipit-source-id: 657ae7590879004558158d3c4abef2ec11d9ed57	2018-08-14 23:10:20 -07:00
Peter Goldsborough	f1ce15b50c	Move nccl scatter and gather to C++ (#9117 ) Summary: As I try to replicate DP in C++, I need to move some functions into C++ from Python. This PR ports the scatter and gather primitives from Python in torch/cuda/comm.py to C++ in torch/csrc/cuda/comm.cpp. The basic infrastructure was already there, since apaszke had rewritten broadcast in C++ already. I'm not very familiar with this code, so let me know if I'm doing something wrong. I largely just literally translated the code. I don't know how "public" `torch.cuda.comm` is, but I feel like the `destination_index` parameter for `gather` should be changed from -1 indicating CPU to `None` indicating CPU, and `-1` indicating the default CUDA device. That would make the code clearer IMO. apaszke colesbury teng-li pietern Closes https://github.com/pytorch/pytorch/pull/9117 Differential Revision: D8721729 Pulled By: goldsborough fbshipit-source-id: 1844a488079d21fa209b32e2c73e48632cbe9e68	2018-07-06 11:10:33 -07:00
LaiyuanGong	f5cd479b59	fix type mismatch when calling torch._C._cuda_setDevice (#8065 ) * fix type mismatch when calling torch._C._cuda_setDevice * fix type mismatch in scatter * fix type mismatch in scatter * fix type mismatch when calling torch._C._cuda_setDevice * fix type mismatch when calling torch._C._cuda_setDevice * fix type mismatch when calling torch._C._cuda_setDevice	2018-06-05 09:53:22 -04:00
Soumith Chintala	50e92a3085	Static linkage for CUDA (#6807 ) * add static linkage option for CUDA libs * add CuFFT linking via fakelink * remove warning for 5.0 cuda architecture	2018-04-22 13:57:17 -04:00
Tongzhou Wang	4563e190c4	Use THC cached CUDA device property when get_device_name and get_device_capability (#6027 ) Getting the CUDA device property struct with cudaGetDeviceProperties is expensive. THC caches CUDA device properties, which are available via THCState_getDeviceProperties, which is available via at::globalContext().getDeviceProperties(device), which is available via torch.cuda.get_device_properties. This PR changes the two methods that previously called cudaGetDeviceProperties to use torch.cuda.get_device_properties directly in Python. Also fixes an ATen compile error when it can't find CUDA. Fixes #4908. Using the script from that issue, we get roughly an 18x speed-up. [ssnl@ ~] python dev.py # master 0.2826697587966919 0.00034999847412109375 0.0003493785858154297 0.000356292724609375 0.00036025047302246094 0.0003629922866821289 0.00036084651947021484 0.00035686492919921874 0.00036056041717529296 0.0003606319427490234 [ssnl@ ~] python dev.py # this PR 0.27275662422180175 2.1147727966308594e-05 1.9598007202148438e-05 1.94549560546875e-05 1.9359588623046876e-05 1.938343048095703e-05 2.0074844360351563e-05 1.952648162841797e-05 1.9311904907226562e-05 1.938343048095703e-05	2018-03-30 16:39:22 -04:00
Sam Gross	48a3349c29	Delete dead Tensor code paths (#5417 ) This deletes most of the dead Tensor code paths, including the TensorMethods cwrap and generic/Tensor.cpp. This also moves the THNN.cwrap/.cpp generation to generate_code which can use ninja if installed.	2018-02-27 17:58:09 -05:00
Carl Lemaire	6b95ca4eda	DataParallel: GPU imbalance warning (#5376 )	2018-02-27 21:30:41 +01:00
Sam Gross	30ec06c140	Merge Variable and Tensor classes (#5225 ) This replaces the torch.Tensor constructors with factories that produce Variables. Similarly, functions on the torch module (e.g. torch.randn) now return Variables. To keep the PR to a reasonable size, I've left most of the unused tensor code. Subsequent PRs will remove the dead code, clean-up calls to torch.autograd.Variable, and rename Variable to Tensor everywhere. There are some breaking changes because Variable and Tensors had slightly different semantics. There's a list of those changes here: https://github.com/pytorch/pytorch/wiki/Breaking-Changes-from-Variable-and-Tensor-merge	2018-02-23 18:03:31 -05:00
Soumith Chintala	2d84cb4b04	warn that CUDA capability 3.0 and 5.0 are no longer supported (#5125 )	2018-02-08 00:07:53 -05:00
Sam Gross	895aebac08	Use Variable instead of Tensor in Function.forward (#4786 ) The Tensor and Variable classes are being merged. autograd.Function.forward is now called on Variables, but with "no-grad" mode (torch.no_grad()) enabled. One benefit is that we no longer have to explicitly track shared storages.	2018-02-06 17:24:27 -05:00
Peter Goldsborough	86fd5fd524	Replace async with non_blocking for Python 3.7 (#4999 ) * Replace async with non_blocking for Python 3.7 upgrade * Remove trailing whitespace * Give _cuda and _type kwargs and accept async for compatibility * Rename async to non_blocking in all C++ code * Add entries for async in python_variable_methods * Friendlier backward compatibility for cuda and type	2018-02-02 09:23:51 -05:00
Christian Sarofeen	ef4cf860ac	Lazy init in set device, also should not be called in getDevCount (#4918 )	2018-01-30 16:24:31 +01:00
albanD	ee8bcdca79	make torch.cuda.empty_cache() a no-op when cuda is not initialized (#4936 )	2018-01-30 16:22:17 +01:00
albanD	7a47790c27	Add missing _lazy_init in cuda python functions	2018-01-29 18:19:03 +01:00
SsnL	3ecd25b065	fix indentation	2018-01-28 20:56:57 +01:00
Tongzhou Wang	6420c6b224	Improve `torch.cuda.empty_cache` documentation (#4879 ) * add doc noting that empty_cache won't increase the amount of memory available * typo	2018-01-27 04:54:25 -05:00
Yongjik Kim	dd5c195646	More documentation for CUDA stream functions. (#4756 )	2018-01-21 12:58:51 +01:00
Sam Gross	f1c616418d	Fix Python docs for broadcast and broadcast_coalesced (#4727 )	2018-01-19 10:57:20 -05:00
Adam Paszke	1061d7970d	Move broadcast and broadcast_coalesced to C++	2018-01-18 11:16:45 +01:00
Tongzhou Wang	5918243b0c	Methods for checking CUDA memory usage (#4511 ) * gpu mem allocated * add test * addressed some of @apaszke 's comments * cache stats * add more comments about test	2018-01-09 11:47:48 -05:00
Edward Z. Yang	c6381c6d44	Add function to explicitly initialize PyTorch CUDA state. (#4180 ) Signed-off-by: Edward Z. Yang <ezyang@fb.com>	2017-12-14 17:48:05 -05:00
Richard Zou	d450895a74	fix typo (#4175 )	2017-12-14 12:31:58 -05:00
Sam Gross	bcfe259f83	Add streams and comms as optional arguments (#3968 ) Adds streams and comms as optional arguments to the NCCL calls in torch.cuda.nccl. Also exposes ncclUniqueId and ncclCommInitRank for multi-process mode. Moves Py_RETURN_NONE statements after the GIL is re-acquired.	2017-12-04 13:51:22 -05:00
Luca Antiga	af58bfbb1b	Make integer parameters and buffers immune to float(), double() and half() (#3820 ) * Avoid casting integer params and buffers to float(), double() and half() * Add test for immune integer buffers * Fix documentation for float(), double() and half() * Fix test	2017-11-22 18:34:53 -05:00
Soumith Chintala	50009144c0	add warnings if device capability is less than ideal (#3601 )	2017-11-09 11:48:59 -05:00
Ozan Çağlayan	dd6d04ddf2	doc: Normalize all true/false in docstrings to ``True|False`` (#3593 ) * doc: Normalize all true/false in docstrings to ``True|False`` This makes them more apparent in the documentation. * doc: fix flake8	2017-11-09 08:12:29 -05:00
peterjc123	aa911939a3	Improve Windows Compatibility (for csrc/scripts) (#2941 )	2017-11-08 19:51:35 +01:00
SsnL	bb1b826cdc	Exposing emptyCache from allocator (#3518 ) * Add empty_cache binding * cuda.empty_cache document * update docs	2017-11-07 17:00:38 -05:00
SsnL	fa5efab669	comments and case where not all sparse (#3370 )	2017-11-01 06:05:17 -04:00
SsnL	01be4d6b20	sparse broadcast_coalesce and reduce_add_coalesced	2017-10-28 18:52:35 -04:00
Adam Paszke	76abc06b1f	Fix nvprof mode in autograd profiler	2017-10-20 10:22:54 -04:00
SsnL	fce3ed19e5	Change device_id to device in python land (#3133 ) * change device_id to device in python land * cuda/random.py	2017-10-17 00:54:26 +02:00
Soumith Chintala	efe91fb9c1	delete redundant python nccl code	2017-10-09 22:24:18 -04:00
Soumith Chintala	e9dccb3156	implement all_reduce, broadcast, all_gather, reduce_scatter	2017-10-09 22:24:18 -04:00
Soumith Chintala	4d62933529	add initial NCCL C bindings	2017-10-09 22:24:18 -04:00
Edward Z. Yang	2dcaa40425	Add get_rng_state_all and set_rng_state_all. Signed-off-by: Edward Z. Yang <ezyang@fb.com>	2017-09-30 16:21:04 -04:00
Adam Paszke	833bedc77d	Add CUDA profiler bindings	2017-09-25 23:21:30 -04:00
Edward Z. Yang	450379256c	Don't call is_available() in manual_seed, it initializes CUDA. Signed-off-by: Edward Z. Yang <ezyang@fb.com>	2017-09-22 12:37:06 -04:00
Edward Z. Yang	b17dfa07ba	Make CUDA seeding/RNG state functions even lazier Instead of initializing CUDA immediately and executing them, we wait until we actually initialize CUDA before executing. To keep things debuggable, we also keep track of the original backtrace when these functions are called, so we can inform users where they actually called the seeding/state functions (as opposed to the first time they actually initialized the RNG). Fixes #2517 Signed-off-by: Edward Z. Yang <ezyang@fb.com>	2017-09-22 12:37:06 -04:00
Edward Z. Yang	06d7a0b1bc	Write docs for RNG seeding on GPU more carefully. Signed-off-by: Edward Z. Yang <ezyang@fb.com>	2017-09-22 12:37:06 -04:00
ngimel	3d7459ff6c	fix indices for data_parallel and add parameter gradient tests (#2632 )	2017-09-05 17:29:27 -04:00
Justin Johnson	94b5990201	Add torch.cuda.get_device_name function (#2540 )	2017-08-26 15:06:37 -04:00
Zhou Mo	2c07f88ea3	Fix typos.	2017-08-25 14:27:07 -04:00
Christian Sarofeen	ec86d0b2ba	Updates for CUDA 9	2017-08-25 07:32:05 -04:00
Gregory Chanan	50c208a50b	Revert "Fix typos." This reverts commit `4622b33952`.	2017-08-10 13:57:00 -04:00
Zhou Mo	4622b33952	Fix typos.	2017-08-08 11:05:38 -04:00
Adam Paszke	8ab3d214d5	Fixes for DistributedDataParallel (#2168 )	2017-07-21 16:00:46 -04:00
Alykhan Tejani	f814a892cf	don't re-seed cuda device if in bad fork (#1923 )	2017-06-27 13:24:52 -04:00
Adam Paszke	12813b88f6	Add DistributedDataParallel	2017-06-12 22:00:22 -04:00
Adam Paszke	8db8716c7c	Support non-default streams in NCCL reduce	2017-06-12 21:58:38 -04:00
Gregory Chanan	69287250d1	Add a broadcast parameter to copy_, use it in the library in cases where there are non-broadcasting calls exposed by the tests.	2017-06-11 05:37:59 -04:00
Edward Z. Yang	ba690d5607	Add support for NVTX functions. (#1748 )	2017-06-10 18:26:58 +02:00
Sam Gross	7f6cd7c7ea	Fix error message in CUDA forked subprocess (#1585 ) We need to re-call _lazy_init in _CudaBase.__new__ in the subprocess.	2017-05-19 12:36:08 -04:00
Sam Gross	aab30d4ea2	Fix errors when no CUDA devices are available (#1334 ) Fixes #1267 This fixes a number of issues when PyTorch was compiled with CUDA support but run on a machine without any GPUs. Now, we treat all errors from cudaGetDeviceCount() as if the machine has no devices.	2017-04-23 14:45:27 +02:00
Adam Paszke	01a35dcace	Fix coalesced CUDA collectives for nonhomogeneous lists	2017-04-11 14:48:54 -07:00
Sergey Zagoruyko	8dc5d2a22e	export current_blas_handle	2017-03-23 23:32:45 +01:00
Sam Gross	b9379cfab7	Use cuDNN and NCCL symbols from _C library (#1017 ) This ensures that we use the same library at the C++ level and with Python ctypes. It moves the searching for the correct library from run-time to compile-time.	2017-03-16 16:10:17 -04:00
Sam Gross	e50a1f19b3	Use streams in scatter to overlap copy with compute	2017-03-14 22:46:07 +01:00
Sam Gross	704ee3ca68	Use cudart symbols from the main program. Our extension library links against cudart and pulls in the symbols. Use LoadLibrary(None) to use the same symbols as the _C extension. This fixes the PyTorch wheel when you don't have system CUDA installed.	2017-03-13 19:45:34 -04:00
陈云	c7c4778af6	modify docs of `broadcast` to fix issue #940 (#970 )	2017-03-10 09:54:43 -05:00
Sam Gross	15a9fbdedb	Merge pull request #881 from colesbury/parallelize_backwards Parallelize autograd backwards	2017-03-06 16:57:19 -05:00
Sam Gross	65b66264d4	Improve broadcast/reduce performance by coalescing tensors	2017-03-06 12:47:53 -08:00
Christian Sarofeen	b1ae7f90d5	Added functionality for data parallel table (#843 )	2017-03-05 02:35:46 +01:00
Sam Gross	34ce58c909	Parallelize backwards	2017-03-03 11:26:00 -08:00
Martin Raison	f17cfe4293	sparse tensor operations (#735 )	2017-03-03 18:37:03 +01:00
soumith	7ad948ffa9	fix tests to not sys.exit(), also fix fatal error on THC initialization	2017-03-01 17:37:04 -05:00
Sam Gross	fc6fcf23f7	Lock the cudaFree mutex. (#880 ) Prevents NCCL calls from overlapping with cudaFree() which can lead to deadlocks.	2017-03-01 11:29:25 -05:00
Eli Stevens	b87c113cf4	CUDA documentation enhancement and docs versioning (#848 ) * Add more detail to CUDA documentation Also adds better cross-linking to the pages that discuss relevant topics. * Adds recommendation to torch.save docs * Make the version numbers for the docs dynamic Might need tweaks for beta, 1.0, etc.	2017-02-26 08:33:26 -05:00
Alykhan Tejani	01bd43037d	add docs to torch/cuda/random	2017-02-20 20:43:47 -05:00
Luke Yeager	e7c1e6a8e3	[pep8] Fix most lint automatically with autopep8 Here's the command I used to invoke autopep8 (in parallel!): git ls-files | grep '\.py$' | xargs -n1 -P`nproc` autopep8 -i Several rules are ignored in setup.cfg. The goal is to let autopep8 handle everything which it can handle safely, and to disable any rules which are tricky or controversial to address. We may want to come back and re-enable some of these rules later, but I'm trying to make this patch as safe as possible. Also configures flake8 to match pep8's behavior. Also configures TravisCI to check the whole project for lint.	2017-01-28 01:15:51 +01:00
Adam Paszke	15c1dad340	Minor fixes and torch.cuda docs	2017-01-16 20:38:14 -05:00
Natalia Gimelshein	2290798a83	if nccl is available, do not compile it and load system version	2017-01-14 10:09:48 +01:00
Sam Gross	24af02154c	Use ForkingPickler for sharing tensor/storages across processes (#344 ) This hooks into the (internal) ForkingPickler class in multiprocessing to reduce tensors, storages, and CUDA events instead of our queue from joblib. This makes it easier to use the standard multiprocessing classes in later versions of Python. This also exposes: - Tensor/Storage.share_memory_() - Module.share_memory() These methods move the CPU tensors and storages to shared memory. If you're using the "fork" method of multiprocessing, these objects can be directly inherited instead of serialized through a queue.	2016-12-28 20:34:23 -05:00
Sam Gross	bb72ccf1a5	Support CUDA IPC in Python 3 (#203 ) CUDA IPC only works with Python 3 using the "spawn" start method. You can select the start method using the get_context method:
    import torch.multiprocessing as mp
    ctx = mp.get_context('spawn')
    queue = ctx.Queue()
    event = ctx.Event()	2016-12-19 20:42:53 -05:00
Sam Gross	20fffc8bb7	Fix torch.is_tensor for half tensors (#322 ) Fixes #311	2016-12-19 15:27:47 +01:00
Sam Gross	ffcc38cf05	Deterministic ordering of parameters and buffers. (#317 ) Uses the assignment syntax to get deterministic ordering of parameters. The ordering of parameters using the constructor syntax is non-deterministic because kwargs use dict() in Python 3.5 and earlier.	2016-12-16 14:45:56 -05:00
Adam Paszke	767c96850d	Return False from torch.cuda.is_available() when no devices are visible	2016-12-15 00:47:55 +01:00
Sam Gross	0d7d29fa57	Enable caching allocator for CUDA pinned memory (#275 ) Also add binding for CUDA "sleep" kernel	2016-12-02 01:33:56 -05:00
Adam Paszke	88d9fdec2e	Add torch.cuda.set_device	2016-12-01 23:14:41 +01:00
Adam Paszke	ebc70f7919	Look for libcudart in default CUDA installation paths (#195 )	2016-11-02 19:36:10 -04:00
Sam Gross	0cb5943be8	Fix NCCL reduce_scatter in Python 2.7 (#183 )	2016-10-30 17:58:02 -04:00
Sam Gross	a9c14a5306	Remove unused code	2016-10-28 15:28:22 -07:00
Sam Gross	f2d7e94948	Use torch.Size for Tensor sizes and tuple for strides See issue #20 The torch.Size class is a tuple subclass which distinguishes sizes from other tuples so that torch.Tensor(size) is interpreted as size instead of data.	2016-10-28 19:37:09 +02:00
Adam Paszke	4c17098bb8	Fix platform detection in torch.cuda	2016-10-24 22:29:43 +02:00
Francisco Massa	b85fc35f9a	Fix for versions compiled without CUDA support (#155 ) * Fix pytorch when compiling without CUDA support * Skip print test with CUDA types if CUDA is not available	2016-10-23 13:03:10 +02:00
Sam Gross	79ead42ade	Add CUDA Stream and Event API (#133 )	2016-10-18 12:15:57 -04:00
Sam Gross	ee14cf9438	Add support for pinned memory: (#127 ) torch.Storage/Tensor.pin_memory() torch.Storage/Tensor.is_pinned()	2016-10-15 18:38:26 -04:00
Sam Gross	f30081a313	Use NCCL bcast and reduce functions in comm	2016-10-14 14:16:32 -07:00
Adam Paszke	0325e2f646	Major autograd refactor Improves autograd performance by more than 2x and fixes a couple of bugs. All core functions have been moved to C.	2016-10-13 17:17:49 -07:00
Adam Paszke	93b8b5631f	Improve CUDA tensor constructor speed	2016-10-13 17:16:39 -07:00
Adam Paszke	60ab1ce0c1	Stop using contextlib for device and device_of	2016-10-13 17:16:39 -07:00
Sam Gross	2bc9da4f5e	Support "device" keyword argument (#79 ) Adds the optional "device" keyword argument to Tensor and Storage constructors and .new methods.	2016-10-01 19:32:55 -04:00
Adam Paszke	11b38a6895	Add more functions to autograd	2016-09-30 16:37:07 -04:00
Adam Paszke	3f7ab95890	Finish implementation of prng related functions	2016-09-29 11:33:25 -07:00
Sam Gross	cb5d4e836f	Lazy load CUDA and THNN modules (#64 )	2016-09-28 19:29:53 -04:00
Soumith Chintala	412019dbe4	fixing CPU builds by making cuda imports optional	2016-09-28 11:56:18 -04:00
Adam Paszke	f9d9c92560	Fix type conversions in autograd	2016-09-27 15:45:52 -07:00
Adam Paszke	3eac7164f4	Add data parallel functions to nn	2016-09-27 15:45:45 -07:00
Adam Paszke	1828e7c42f	Add async CUDA copy	2016-09-27 15:12:48 -07:00
Adam Paszke	2c89ae4e8a	Rename getDevice to get_device	2016-09-27 15:12:48 -07:00
Sam Gross	779a460030	Add cuDNN support for convolutions (#36 )	2016-09-27 17:55:04 -04:00
Adam Paszke	8fdec15a55	Codemod to remove camel case method naming	2016-09-20 08:40:28 -07:00
Adam Paszke	da5bb373e6	Type conversions now use auto gpu	2016-09-15 18:48:27 -07:00
soumith	1f2695e875	adding cuda driver check functions for runtime checking	2016-09-13 10:34:13 -07:00
Adam Paszke	774a6f1093	Add in-place operations to autograd and nn	2016-08-25 09:34:54 -07:00
Adam Paszke	ff785e5f17	Make optimizers accept a closure	2016-08-25 09:23:39 -07:00
Adam Paszke	1e905eb4d5	copy -> copy_	2016-08-12 09:26:33 -07:00
Adam Paszke	12bed8dc0d	Add CUDA device selection	2016-08-12 07:46:46 -07:00
Adam Paszke	3a44259b32	Add support for CUDA	2016-07-19 10:45:59 -04:00


323 Commits