pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
Richard Zou	e60a7c2c88	codemod tensor.type().is_cuda(), tensor.type().is_sparse() (#13590) Summary: Followup to #12841 Changed these to not require type dispatch: tensor.type().is_cuda() -> tensor.is_cuda() tensor.type().is_sparse() -> tensor.is_sparse() isVariable(tensor.type()) -> tensor.is_variable() This probably does not affect performance very much in most cases but it is nice to have. Pull Request resolved: https://github.com/pytorch/pytorch/pull/13590 Reviewed By: ezyang Differential Revision: D12929301 Pulled By: zou3519 fbshipit-source-id: 8ac5c6200c579dd7a44fb4ee58fc9bb170feb1d7	2018-11-07 07:27:42 -08:00
Edward Yang	0aaff5eaf9	Replace CUDA-specific set_index(_from) method from DeviceGuard with set_device. (#13275) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13275 This resulted in a bunch of knock-on changes, which I will now describe: - s/original_index/original_device/ - s/last_index/last_device/ - A bunch of places that used set_index now use CUDAGuard (which does have set_index) because they were CUDA-specific code. Major caveat: DeviceGuard doesn't actually work on non-CUDA/CPU devices. To make that happen, I plan on totally replacing the implementation of DeviceGuard; what I mostly care about here is wrangling the API into an acceptable state. Reviewed By: gchanan Differential Revision: D12832080 fbshipit-source-id: 7de068c7cec35663dc8a533026a626331336e61d	2018-10-31 07:55:13 -07:00
Teng Li	e475d3ede3	DDP multi-GPU segfault fix (#13291) Summary: Fix https://github.com/pytorch/pytorch/issues/13200 Tested on 8-GPU machines, since CI doesn't have this many GPUs and so the multi-GPU test won't be triggered there ``` tengli@learnfair096:~/pytorch/test$ python run_test.py -i distributed --verbose Selected tests: distributed Running test_distributed ... [2018-10-29 20:32:46.355858] /public/apps/openmpi/2.1.1/gcc.5.4.0/bin/mpiexec Running distributed tests for the gloo backend test_DistBackend (__main__.TestDistBackend) ... ok test_DistributedDataParallel (__main__.TestDistBackend) ... ok test_DistributedDataParallelCPU (__main__.TestDistBackend) ... ok ``` Also, I would like to increase the broadcast bucket size for performance reasons. Pull Request resolved: https://github.com/pytorch/pytorch/pull/13291 Differential Revision: D12842840 Pulled By: teng-li fbshipit-source-id: e8c50f15ebf2ab3e2cd1b51d365e41a6106b98fe	2018-10-31 00:43:42 -07:00
Edward Yang	1a4473bbd7	Rewrite THPUtils_PySequence_to_CUDAStreamList to return vector<optional<CUDAStream>> (#13125) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13125 Previously, it returned a vector of THCStream*, which we eventually turned into CUDAStream. No need to spatter the conversion code everywhere: just do it correctly to begin with. An important side effect of doing it this way is that we no longer pass nullptr to CUDAStream; instead, we create the default stream. I will rely on this in a later patch. Reviewed By: gchanan Differential Revision: D10853224 fbshipit-source-id: f6bd6594eba4626eb41a4a5e67fc64c9bbb46a1a	2018-10-29 08:27:23 -07:00
Teng Li	b4d0dc77be	Eliminate CUDAStream nullptr in NCCL (#13089) Summary: As the title says, we should always use the current stream on the device in NCCL. This can unblock ezyang on his further work. Pull Request resolved: https://github.com/pytorch/pytorch/pull/13089 Reviewed By: ezyang Differential Revision: D10847172 Pulled By: teng-li fbshipit-source-id: 7fc7c4248b5efa1971d2af4d43f62d3379debfe4	2018-10-24 20:04:41 -07:00
Teng Li	8d3e7e2fcb	Move DDP queue_reduction to C++ (#12852) Summary: Fully working version, continuing from goldsborough's initial version. Waiting on the stream guard to be merged before adding more stream performance logic to the C++ version. Pull Request resolved: https://github.com/pytorch/pytorch/pull/12852 Differential Revision: D10468696 Pulled By: teng-li fbshipit-source-id: 8e46d408796973817abfd9dbd6566e0ca5b7a13f	2018-10-22 16:07:46 -07:00
Edward Yang	1d399a80a0	Handle pollution of MAX, MIN and CHECK macros. (#11805) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11805 Some of our headers in Caffe2 pollute the macro namespace with things like MAX, MIN, CHECK, so I renamed these in places where this is a problem. This patch courtesy of gchanan, extracted out of #11721 Reviewed By: Yangqing Differential Revision: D9917757 fbshipit-source-id: 17fc692ca04b208dcb8ae00731ed60e393284f7c	2018-09-18 13:18:31 -07:00
Tongzhou Wang	70d93f4777	Check for maximum numel in NCCL broadcasting (#11466) Summary: NCCL1 uses `int` as its numerical type for fields like `count`, which makes broadcasting tensors with more than `2^31 - 1` elements impossible and raises an opaque `invalid arguments` error. NCCL2 greatly increases the limit on many platforms by using `size_t`. This patch statically detects this type and raises a proper error if the broadcast tensor exceeds the limit. No test because I don't think our test suite should broadcast big tensors. Pull Request resolved: https://github.com/pytorch/pytorch/pull/11466 Differential Revision: D9754753 Pulled By: SsnL fbshipit-source-id: 73506450cae047e06b5b225b39efdb42d5d26685	2018-09-10 14:39:15 -07:00
Peter Goldsborough	7ddc6f84c4	NULL -> nullptr (#11047) Summary: How did we get so many uses of `NULL` again? ezyang Pull Request resolved: https://github.com/pytorch/pytorch/pull/11047 Differential Revision: D9566799 Pulled By: goldsborough fbshipit-source-id: 83469f352ac69aa65bdaf1a1a21f922d892e0db3	2018-08-30 16:25:42 -07:00
Edward Yang	6bdbad93b9	Refactor Device to not depend on Backend. (#10478) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10478 - Removed Backend constructor from Device, and fixed all use-sites to use DeviceType::CPU instead of kCPU, or use a new function backendToDeviceType to perform the conversion. - New method device_type() on Type; it gives you the underlying device type, e.g., CPU for SparseCPU. - We add backward compatibility for kCPU/kCUDA uses, by introducing a new special type which is implicitly convertible to both DeviceType and Backend. As long as you don't define a function that's overloaded on both DeviceType and Backend (but not on BackendOrDeviceType), the implicit conversions will ensure that uses of at::Device(at::kCPU) keep working. We fixed use-sites in the library, but did NOT fix sites in the test code, so that we can exercise this BC code. Reviewed By: Yangqing Differential Revision: D9301861 fbshipit-source-id: 9a9d88620500715c7b37e655b4fd761f6dd72716	2018-08-18 17:39:14 -07:00
mruberry	d6f21fc663	Ports Streams to ATen (#8997) Summary: This PR moves the THCStream logic (from both the THCStream and THCState APIs) to ATen. In particular, it: + Creates a new (THC-free) at::CUDAStream class and API + Extends the at::Context API to expose it + Stubs the current THCStream and THCState APIs to use it + Updates THC to no longer violate stream encapsulation (stream.hpp is dead) + Adds an ATen cpp test of the API + Bonus: Removes some debug spew in test_nn.py The new API has several advantages over the old one: (1) It comes with an easy-to-use RAII, the CUDAStream. CUDAStreams have the expected copy and move semantics and are implicitly convertible to cudaStream_t. (2) It does not depend on THCState, THCThreadLocal, or CUDA (thanks to goldsborough for suggesting the dynamic registration technique) (3) It provides one consistent API/place for all stream operations, instead of having them split between THCStream and THCState (4) The internals are completely encapsulated, unlike the historic THCStream (5) It has getAndRetain semantics, which are safer than the historic gets (which allowed a gap between acquisition and retention) There are a couple of things this PR does not do, however, which are left for future work: - It leaves the c10d::CUDAStream class as a THCStream wrapper (which now really wraps an at::CUDAStream). - It leaves historic users of THCStream mostly untouched, except where they violated encapsulation (by using stream.hpp). A couple of forward declarations were also changed. I hope this PR allows easy usage of streams from ATen and is a useful pattern for porting more of the THCState API. Pull Request resolved: https://github.com/pytorch/pytorch/pull/8997 Differential Revision: D8683375 Pulled By: soumith fbshipit-source-id: 2e48ad85f1f9c8817684fe63a267938e80eafdcf	2018-07-08 16:25:09 -07:00
Peter Goldsborough	372d1d6735	Create ATen tensors via TensorOptions (#7869) * Created TensorOptions Storing the type in TensorOptions to solve the Variable problem Created convenience creation functions for TensorOptions and added tests Converted zeros to TensorOptions Converted rand to TensorOptions Fix codegen for TensorOptions and multiple arguments Put TensorOptions convenience functions into torch namespace too All factory functions except _like support TensorOptions Integrated with recent JIT changes Support _like functions Fix in place modification Some cleanups and fixes Support sparse_coo_tensor Fix bug in Type.cpp Fix .empty calls in C++ API Fix bug in Type.cpp Trying to fix device placement Make AutoGPU CPU compatible Remove some auto_gpu.h uses Fixing some headers Fix some remaining CUDA/AutoGPU issues Fix some AutoGPU uses Fixes to dispatch_tensor_conversion Reset version of new variables to zero Implemented parsing device strings Random fixes to tests Self review cleanups flake8 Undo changes to variable.{h,cpp} because they fail on gcc7.2 Add [cuda] tag to tensor_options_cuda.cpp Move AutoGPU::set_index_from into .cpp file because Windows is stupid and sucks Fix linker error in AutoGPU.cpp Fix bad merge conflict in native_functions.yaml Fixed caffe2/contrib/aten Fix new window functions added to TensorFactories.cpp * Removed torch::TensorOptions Added code to generate wrapper functions for factory methods Add implicit constructor from Backend to TensorOptions Remove Var() from C++ API and use torch:: functions Use torch:: functions more subtly in C++ API Make AutoGPU::set_device more exception safe Check status directly in DynamicCUDAHooksInterface Rename AutoGPU to DeviceGuard Removed set_requires_grad from python_variables.h and warn appropriately in Variable::set_requires_grad remove python_default_init: self.type() Add back original factory functions, but with deprecation warnings Disable DeviceGuard for a couple functions in ATen Remove print statement Fix DeviceGuard construction from undefined tensor Fixing CUDA device compiler issues Moved as many methods as possible into header files Dont generate python functions for deprecated factories Remove merge conflict artefact Fix tensor_options_cuda.cpp Fix set_requires_grad not being checked Fix tensor_new.h TEMPORARILY put some methods in .cpp files to see if it solves issues on windows and mac Fix bug in DeviceGuard.h Missing includes TEMPORARILY moving a few more methods into .cpp to see if it fixes windows Fixing linker errors * Fix up SummaryOps to use new factories Undo device agnostic behavior of DeviceGuard Use -1 instead of optional for default device index Also move DeviceGuard methods into header Fixes around device index after optional -> int32_t switch Fix use of DeviceGuard in new_with_tensor_copy Fix tensor_options.cpp * Fix Type::copy( * Remove test_non_float_params from ONNX tests * Set requires_grad=False in ONNX tests that use ints * Put layout/dtype/device on Tensor * Post merge fixes * Change behavior of DeviceGuard to match AutoGPU * Fix C++ API integration tests * Fix flip functions	2018-06-16 00:40:35 -07:00
Soumith Chintala	dc186cc9fe	Remove NO_* and WITH_* across codebase, except in setup.py (#8555) * remove legacy options from CMakeLists * codemod WITH_ to USE_ for WITH_CUDA, WITH_CUDNN, WITH_DISTRIBUTED, WITH_DISTRIBUTED_MW, WITH_GLOO_IBVERBS, WITH_NCCL, WITH_ROCM, WITH_NUMPY * cover SYSTEM_NCCL, MKLDNN, NNPACK, C10D, NINJA * removed NO_* variables and hotpatch them only in setup.py * fix lint	2018-06-15 12:29:48 -04:00
Xiaodong Wang	922adf8d09	Skip calling ncclCommDestroy in destructor (#8352) There is a bug in NCCL that causes segfaults when ncclCommDestroy() is called in the destructor during program exit. According to Nvidia, "Whether the NCCL destructor will be called before or after the CUDA runtime destructor is undefined, which can lead to crashes." As an immediate workaround, skip calling ncclCommDestroy in the NCCL destructor. This is UGLY and we'll follow up with Nvidia to solve this ASAP.	2018-06-12 13:11:09 -04:00
Edward Z. Yang	4caea64d72	Make all of TH and THC C++. (#6913) Changelist: - Move .c to .cpp - Change includes of ".c" to ".cpp" - A bunch of cmake configuration modifying CMAKE_C_FLAGS changed to CMAKE_CXX_FLAGS or add_compile_options, because if you do CMAKE_C_FLAGS it only applies when you compile C code - Explicitly cast void* to T* in a number of places - Delete extern "C" { ... } blocks; instead, properly apply TH_API to everything that should have it (TH_API handles extern "C") - Stop using stdatomic.h, instead, use <atomic>. This resulted in a bunch of placement-new/delete to be "totally properly correct" - Refactor of THLongStorageView to not have static constructor methods (since it no longer has a copy/move constructor) - Documentation about how the TH C interface (and extern C business) works - Note that THD master_worker mode is dead - C++ headers in TH libraries are given .hpp suffix, to make it less likely that you'll confuse them with the C-compatible headers (now suffixed .h) - New function THCStream_stream and THCStream_device to project out fields of THCStream instead of accessing fields directly - New function THStorage_(retainIfLive), which is equivalent to a retain but only if the refcount is greater than zero. - In general, I tried to avoid using hpp headers outside of ATen/TH. However, there were a few places where I gave up and depended on the headers for my own sanity. See Note [TH abstraction violation] for all the sites where this occurred. All other sites were refactored to use functions - Some extra Werror fixes (char* versus const char*)	2018-04-28 07:45:02 -04:00
Sam Gross	23fc2b7e06	Define CHECK in torch/csrc/cuda/nccl.h (#4721) The CHECK function was used but not defined in the nccl.h header file.	2018-01-18 13:08:06 -05:00
Adam Paszke	de5f7b725e	Base for pure C++ NCCL interface	2018-01-18 11:16:45 +01:00
Sam Gross	bcfe259f83	Add streams and comms as optional arguments (#3968) Adds streams and comms as optional arguments to the NCCL calls in torch.cuda.nccl. Also exposes ncclUniqueId and ncclCommInitRank for multi-process mode. Moves Py_RETURN_NONE statements after the GIL is re-acquired.	2017-12-04 13:51:22 -05:00
Gregory Chanan	6dee02923c	[ATen] Rename isSparse -> is_sparse.	2017-11-15 18:33:07 -08:00
Gregory Chanan	9a2b54e08b	[ATen] Rename isCuda -> is_cuda.	2017-11-15 18:33:07 -08:00
soumith	5a96037810	skip ncclCommDestroy if CUDA driver is already unloaded	2017-10-13 08:50:00 -07:00
Soumith Chintala	e9dccb3156	implement all_reduce, broadcast, all_gather, reduce_scatter	2017-10-09 22:24:18 -04:00
Soumith Chintala	4d62933529	add initial NCCL C bindings	2017-10-09 22:24:18 -04:00


123 Commits