pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
Roy Li	80a7eac79e	Remove Type::elementSizeInBytes Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17785 Reviewed By: ezyang Differential Revision: D14379074 fbshipit-source-id: 60727f187d61eb571b144bd6eed4dd4908da0b51	2019-03-15 12:56:02 -07:00
Roy Li	7aae51cded	Replace tensor.type().scalarType() calls with tensor.scalar_type() Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17515 Reviewed By: ezyang Differential Revision: D14233250 fbshipit-source-id: 6c7af8d2291c0c2b148001b30cf03834f34366c0	2019-03-08 14:08:18 -08:00
Zachary DeVito	21193bf123	try to get rid of tmp_install (#16414 ) Summary: Rehash of previous attempts. This tries a different approach where we accept the install as specified in cmake (leaving bin/ include/ and lib/ alone), and then try to adjust the rest of the files to this more standard layout. Pull Request resolved: https://github.com/pytorch/pytorch/pull/16414 Differential Revision: D13863635 Pulled By: zdevito fbshipit-source-id: 23725f5c64d7509bf3ca8f472dcdcad074de9828	2019-01-29 17:29:40 -08:00
Edward Yang	e936a69085	Move THCCachingAllocator to c10_cuda. (#16119 ) Summary: Some renaming and renamespacing also took place. I was originally planning not to do anything, but it turns out that it was easier to make HIPify work by using a namespace CUDACachingAllocator:: rather than THCCachingAllocator_, since :: is a word boundary but _ is not. Pull Request resolved: https://github.com/pytorch/pytorch/pull/16119 Reviewed By: smessmer Differential Revision: D13718768 fbshipit-source-id: 884a481d99027fd3e34471c020f826aa12225656	2019-01-24 12:06:56 -08:00
Edward Yang	24b50f1411	Remove unnecessary includes and headers from THCCachingAllocator, move to at::cuda:: namespace (#16117 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16117 This means I can move it to c10_cuda with minimal fuss. Reviewed By: smessmer Differential Revision: D13717836 fbshipit-source-id: a94c7dc649af64542480fc1c226b289588886c00	2019-01-24 12:06:54 -08:00
Teng Li	fc5b79cd1c	CUDA event should only be recorded after NCCL group (#8219 ) Summary: Otherwise, it won't work if we sync on this event. Pull Request resolved: https://github.com/pytorch/pytorch/pull/8219 Reviewed By: pietern Differential Revision: D13788657 Pulled By: teng-li fbshipit-source-id: 8c96e9691ed2441d7a685fb7ae8fece906f58daf	2019-01-23 14:18:26 -08:00
Edward Yang	2d485ffb17	Move CUDAGuard, CUDAStream and CUDAGuardImpl to c10/cuda (#14248 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14248 This diff also introduces a horrifying hack to override CUDA's DeviceGuardImpl with a HIPGuardImplMasqueradingAsCUDA, to accommodate PyTorch's current behavior of pretending CUDA is HIP when you build with ROCm enabled. Reviewed By: bddppq Differential Revision: D13145293 fbshipit-source-id: ee0e207b6fd132f0d435512957424a002d588f02	2018-12-12 11:24:26 -08:00
Edward Yang	517c7c9861	Canonicalize all includes in PyTorch. (#14849 ) Summary: Anywhere we used #include "foo.h", we now say #include <foo.h> Paths are adjusted to be rooted out of aten/src, torch/lib, or the root level directory. I modified CMakeLists.txt by hand to remove TH and THC from the include paths. I used the following script to do the canonicalization: ``` import subprocess import re import os.path files = subprocess.check_output(['git', 'ls-files']).decode('utf-8').rstrip().split('\n') for fn in files: if not any(fn.endswith(suff) for suff in ['.cu', '.cpp', '.in', '.h', '.hpp', '.cu', '.cuh', '.cc']): continue if not any(fn.startswith(pref) for pref in ["aten/", "torch/"]): continue with open(fn, 'r') as f: c = f.read() def fmt(p): return "#include <{}>".format(p) def repl(m): p = m.group(1) if p in ["dlfcn.h", "unistd.h", "nvrtc.h", "cuda.h", "cuda_runtime.h", "cstdint", "cudnn.h", "Python.h", "cusparse.h", "cuda_runtime_api.h", "cuda_fp16.h", "cublas_v2.h", "stdint.h", "curand_kernel.h"]: return fmt(p) if any(p.startswith(pref) for pref in ["torch/csrc", "c10/", "ATen/", "caffe2/", "TH/", "THC/", "Eigen/", "gtest/", "zdl/", "gloo/", "onnx/", "miopen/"]): return fmt(p) for root in ["aten/src", "torch/lib", ""]: for bad_root in [os.path.dirname(fn), "aten/src/TH", "aten/src/THC", "torch/csrc"]: new_p = os.path.relpath(os.path.join(bad_root, p), root) if not new_p.startswith("../") and (os.path.exists(os.path.join(root, new_p)) or os.path.exists(os.path.join(root, new_p + ".in"))): return fmt(new_p) print("ERROR: ", fn, p) return m.group(0) new_c = re.sub(r'#include "([^"]+)"', repl, c) if new_c != c: print(fn) with open(fn, 'w') as f: f.write(new_c) ``` Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/14849 Reviewed By: dzhulgakov Differential Revision: D13363445 Pulled By: ezyang fbshipit-source-id: 52361f878a672785f9306c9e9ab2513128092b68	2018-12-08 19:38:30 -08:00
Teng Li	20e395a130	Fixed uninitialized warning (#14001 ) Summary: Fixing: https://github.com/pytorch/pytorch/issues/12014 Pull Request resolved: https://github.com/pytorch/pytorch/pull/14001 Differential Revision: D13078583 Pulled By: teng-li fbshipit-source-id: 6c8d663da81bc3e564f0643926d67260df828dd8	2018-11-14 21:37:11 -08:00
Edward Yang	72da09bb4d	Canonicalize THD includes with .. in them Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13980 Reviewed By: jerryzh168 Differential Revision: D13062706 fbshipit-source-id: 100e10d1bae7efc3e13f029708c2c1dd053ce074	2018-11-14 17:43:56 -08:00
Edward Yang	a440629f14	Remove defunct build.sh/THConfig.cmake (#13953 ) Summary: Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/13953 Differential Revision: D13056128 Pulled By: ezyang fbshipit-source-id: 9fd17f4fe000ac06144b04be996ef6849de2bafa	2018-11-14 07:42:09 -08:00
Edward Yang	e35418b3be	New implementations of DeviceGuard, StreamGuard and MultiStreamGuard (with CUDA specializations) (#13342 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13342 This PR introduces a few new concepts: - DeviceGuardImplInterface, and implementations for CPU and CUDA, which provide a generic interface for interfacing with device and stream state, without requiring a direct dependency on the code in question. - InlineDeviceGuard, a general template for generating both specialized and dynamically dispatched device guard implementations. Dynamic dispatch is done by specializing it on a VirtualGuardImpl. - Provide a device-independent DeviceGuard class, which can be used even from CPU code. It uses the aforementioned dynamic dispatch. - CUDA-specialized CUDAGuard class, which doesn't have a dynamic dispatch but can only be used from CUDA. - StreamGuard, which is the same as above, but for streams rather than devices. - Optional variants of all the aforementioned guards, which are a no-op if no device/stream is specified - CUDAMultiStreamGuard, specifically for the case when we want to set a device on every guard. There are some subtle semantic changes, which have been thoroughly documented in the class definition. BC-breaking changes: - Move constructor/assignment have been removed from all device guard implementations. - In some cases where you previously wrote 'set_device' (or 'set_stream'), you now must write 'reset_device', because if you switch devices/device types, the stream/device on the previous device is unset. This is different from previous behavior. - CUDAGuard no longer handles streams, or multiple streams. Use CUDAStreamGuard or CUDAMultiStreamGuard as appropriate for your use case. Reviewed By: dzhulgakov Differential Revision: D12849620 fbshipit-source-id: f61956256f0b12be754b3234fcc73c2abc1be04e	2018-11-11 12:11:10 -08:00
Richard Zou	e60a7c2c88	codemod tensor.type().is_cuda(), tensor.type().is_sparse() (#13590 ) Summary: Followup to #12841 Changed these to not require type dispatch: tensor.type().is_cuda() -> tensor.is_cuda() tensor.type().is_sparse() -> tensor.is_sparse() isVariable(tensor.type()) -> tensor.is_variable() This probably does not affect performance very much in most cases but it is nice to have. Pull Request resolved: https://github.com/pytorch/pytorch/pull/13590 Reviewed By: ezyang Differential Revision: D12929301 Pulled By: zou3519 fbshipit-source-id: 8ac5c6200c579dd7a44fb4ee58fc9bb170feb1d7	2018-11-07 07:27:42 -08:00
Teng Li	ce6edbfbd9	Fixed NCCL backend not being built (#13653 ) Summary: An regression caused by NCCL build refactoring earlier. CC fmassa Fixing: https://github.com/facebookresearch/maskrcnn-benchmark/issues/122 Pull Request resolved: https://github.com/pytorch/pytorch/pull/13653 Differential Revision: D12952555 Pulled By: teng-li fbshipit-source-id: b42e2a88fff83c9ddd58eeb33e933f1f59f51c52	2018-11-06 19:33:49 -08:00
Edward Yang	0aaff5eaf9	Replace CUDA-specific set_index(_from) method from DeviceGuard with set_device. (#13275 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13275 This resulted in a bunch of knock-on changes, which I will now describe: - s/original_index/original_device/ - s/last_index/last_device/ - A bunch of places that used set_index, now use CUDAGuard (which does have set_index) because they were CUDA-specific code. Major caveat: DeviceGuard doesn't actually work non-CUDA/CPU devices, To make that happen, I plan on totally replacing the implementation of DeviceGuard; what I mostly care about here is wrangling the API into an acceptable state. Reviewed By: gchanan Differential Revision: D12832080 fbshipit-source-id: 7de068c7cec35663dc8a533026a626331336e61d	2018-10-31 07:55:13 -07:00
Edward Yang	e5d56659ec	Delete DeviceGuard(int64_t) constructor. (#13232 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13232 DeviceGuard should be device agnostic, which means that it shouldn't assume that int64_t means select the CUDA device. Reviewed By: gchanan Differential Revision: D10858024 fbshipit-source-id: b40e8337e4046906fd8f83a95e6206367fb29dbe	2018-10-31 07:55:11 -07:00
Anders Papitto	380d2dfb27	absorb nccl (#13150 ) Summary: always build nccl from within the main cmake build, rather than via a separate invocation in build_pytorch_libs.sh. Use the existing caffe2 codepaths Pull Request resolved: https://github.com/pytorch/pytorch/pull/13150 Differential Revision: D12815674 Pulled By: anderspapitto fbshipit-source-id: a710b6f242d159b9816911a25ee2c4b8c3f855aa	2018-10-29 12:04:32 -07:00
Teng Li	4f94d82c7f	clang-format on c10d and THD (#13138 ) Summary: clang-format-6 run on all cpp,cc,c,cu,cxx,hpp,hxx,h files under /c10d and /thd Pull Request resolved: https://github.com/pytorch/pytorch/pull/13138 Differential Revision: D10857742 Pulled By: teng-li fbshipit-source-id: f99bc62f56019c05acdfa8e8c4f0db34d23b4c52	2018-10-25 14:16:47 -07:00
Anders Papitto	69906afaee	absorb THD into main cmake build (#12775 ) Summary: We want to move _C into the same cmake invocation that builds libcaffe2 and libtorch. However, _C depends on THD and c10d, which in turn depend on libcaffe2. That means that we can't move _C into that cmake file unless we do these two first. This change does so. Pull Request resolved: https://github.com/pytorch/pytorch/pull/12775 Differential Revision: D10457374 Pulled By: anderspapitto fbshipit-source-id: 2c1aa3b8a418a73d2112e93c7da53a2e70cf7bba	2018-10-24 21:28:37 -07:00
Anders Papitto	2dacf28b66	link libgloo_cuda.a explictly from setup.py (#12951 ) Summary: rather than pass a list through a text file Pull Request resolved: https://github.com/pytorch/pytorch/pull/12951 Differential Revision: D10528309 Pulled By: anderspapitto fbshipit-source-id: d94befcd61b6304815859694b623046f256462df	2018-10-24 13:19:46 -07:00
Gregory Chanan	0947712e5d	Move Factory functions from Type to TypeExtendedInterface. (#12025 ) Summary: This makes a few changes wrt Type, with the ultimate goal of removing Type from the public Methods/Functions. In particular: 1) Removes factory functions from Type, into TypeExtendedInterface. 2) sparse_coo_tensor is now a first class at:: namespace function, with TensorOptions overloads. 3) We move from Type-based sparse_coo_tensor dispatch to function-based. Note we still require a number of changes to get rid of tType in the public interface, in particular TensorOptions needs to support CUDA vs non-CUDA dispatch. That is coming in a future patch. Pull Request resolved: https://github.com/pytorch/pytorch/pull/12025 Reviewed By: ezyang Differential Revision: D10017205 Pulled By: gchanan fbshipit-source-id: 00807a37b09ed33f0656aaa165bb925abb026320	2018-09-25 09:40:17 -07:00
Yangqing Jia	c6a14b1edd	Revert D9985212: [pytorch][PR] [minor] remove a remaining todo line deletion in THD cmake Differential Revision: D9985212 Original commit changeset: 5f8e7ac94101 fbshipit-source-id: 1783cbfc91008ab3db36bad7c1bf51e16da7fb2d	2018-09-21 11:25:53 -07:00
Yangqing Jia	0d9be2135f	remove a remaining todo line deletion in THD cmake (#11920 ) Summary: TSIA Pull Request resolved: https://github.com/pytorch/pytorch/pull/11920 Differential Revision: D9985212 Pulled By: Yangqing fbshipit-source-id: 5f8e7ac94101177740e791f44eaa8c8ec55a908c	2018-09-21 00:40:20 -07:00
Edward Yang	227635142f	Delete THD master_worker (#10731 ) Summary: Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/10731 Differential Revision: D9423675 Pulled By: ezyang fbshipit-source-id: 37221e11d84cc3672b944af598ea229a1d4c38cc	2018-08-22 08:54:36 -07:00
Edward Yang	19031c68dc	Use intrusive_ptr in Storage; replace unique_ptr<Storage> with Storage (#10488 ) Summary: ``` Use intrusive_ptr in Storage; replace unique_ptr<Storage> with Storage This patch does two major changes: - It replaces the use of Retainable in Storage with a new implementation based on intrusive_ptr. This will be necessary because Caffe2 will be using this class to implement intrusive_ptrs, and we need to line these up for the merge. One good thing about the new implementation is that the default copy/move constructors/assignment operators and destructor work automatically, instead of needing to be hardcoded into Storage/Tensor. - It replaces all places where we returned std::unique_ptr<Storage> with Storage, collapsing an unnecessary double indirection that is no longer necessary now that we have correctly working copy/move constructors. I didn't initially want to do step (2), but it was very important to eliminate all bare uses of new Storage and new StorageImpl, and this making the API change was the most straightforward way to do this. HOW TO FIX YOUR CODE IN THE NEW API - You no longer need to dereference the result of tensor.storage() to pass it to set. So, instead of: x.set_(*y.storage()); just write: x.set_(y.storage()); - If you were accessing methods on StorageImpl via the pImpl() method, you must use the dot operator to run pImpl(). Even better; just drop pImpl, we now have method forwarding. So, instead of: storage->pImpl()->data(); just do: storage->data(); // storage.pImpl()->data() works too but is not as recommended - storage->getDevice() is no more; instead use storage->device().index() MISC CODE UPDATES - retain, release, weak_retain, weak_release and weak_lock are now reimplemented using the "blessed API", and renamed to make it clearer that their use is discouraged. - nvcc OS X and general OS X portability improvements to intrusive_ptr - A new comment in intrusive_ptr describing how stack allocated intrusive_ptr_targets work differently than heap allocated ones from c10::make_intrusive CAVEAT EMPTOR - THStorage_weakRetain used to work on strong pointers, but it NO LONGER works with intrusive_ptr. You must reclaim the strong pointer into a real strong pointer, construct a weak pointer from it, and then release the strong and weak pointers. See StorageSharing.cpp for an example. ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/10488 Reviewed By: gchanan Differential Revision: D9306134 Pulled By: ezyang fbshipit-source-id: 02d58ef62dab8e4da6131e1a24834a65c21048e2	2018-08-21 21:39:55 -07:00
Adam Lerer	18e298305e	Increase TCP listen queue size from 64 to 1024 (#10268 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10268 Running torch.distributed.init_process_group fails with more than ~64 processes, with various errors like connection refused or connection reset by peer. After some digging, it looks like the root cause is that all workers have to connect to master via TCP (both in Zeus init and in DataChannelTCP - look for `connect()`), and the listening socket only has a backlog of 64. I increased the backlog to 1024, that seems like enough for reasonable purposes (the hard limit is 65535 in /proc/sys/net/core/somaxconn). There's probably a more correct way to do this that involves retries when connection is refused. Reviewed By: soumith Differential Revision: D9182216 fbshipit-source-id: 2f71c4995841db26c670cec344f1e3c7a80a7936	2018-08-07 08:26:06 -07:00
Ailing Zhang	371a786b18	Errors out when Openmpi < 2.x.x with distributed. (#10015 ) Summary: This PR fixes #9418 . Openmpi 1.10 segfaults in MPI_Bcast with CUDA buffer. And it's a retired openmpi version. I've tested on 2.1.1 and 3.0.0 and they work well. Pull Request resolved: https://github.com/pytorch/pytorch/pull/10015 Reviewed By: soumith Differential Revision: D9088103 Pulled By: ailzhang fbshipit-source-id: fc0a45e5cd016093ef0dbb9f371cbf67170d7045	2018-07-31 12:24:40 -07:00
Gregory Chanan	1af1b0c2a5	Remove THTensor::_dim, temporarily remove THTensor_nDimension. (#9895 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9895 The primary goal here was to remove THTensor::_dim, which isn't part of the API moving forward. Instead, we provide 3 options for getting the dimensionality (this is temporary although non-trivial to remove!): ``` nDimension corresponds to the "true" ATen dimension. TODO: implement. nDimensionLegacyNoScalars correpsonds to the ATen dimension, except scalars are viewed as 1-dimensional tensors. nDimensionLegacyAll corresponds to the ATen dimension, except scalars are viewed as 1-dimensional tensors and tensors with a dimension of size zero are collapsed to 0-dimensional tensors. ``` So in this patch, nDimension -> nDimensionLegacyNoScalars and _dim/_nDimension goes to nDimensionLegacyAll. These are just codemods. Pull Request resolved: https://github.com/pytorch/pytorch/pull/9835 Reviewed By: ezyang Differential Revision: D8999338 Pulled By: gchanan fbshipit-source-id: a4d676ac728f6f36ca09604a41e888d545ae9311	2018-07-27 08:56:38 -07:00
idansc	029cf1d78a	Improve error messages of wrong dimensions (#9694 ) Summary: Updated the error message terms _matrices_ and _vectors_ to _2D tensors_ and _1D tensors_ respectively. Pull Request resolved: https://github.com/pytorch/pytorch/pull/9694 Differential Revision: D8949589 Pulled By: ezyang fbshipit-source-id: 2cdcd72e0e9a4459f3691c133bb16ef218b5cf3f	2018-07-23 10:10:55 -07:00
Christian Puhrsch	c1ee8835b6	Constructors and member functions for THStorage (#9357 ) Summary: Added on top of ezyang's https://github.com/pytorch/pytorch/pull/9278 Pull Request resolved: https://github.com/pytorch/pytorch/pull/9357 Reviewed By: ezyang Differential Revision: D8863934 Pulled By: cpuhrsch fbshipit-source-id: a45c955c0b1e9e0866749b3a7e8a36de931bdff1	2018-07-18 15:56:26 -07:00
Edward Yang	d2d43824cd	Delete flag from THTensor. (#9494 ) Summary: It was only used to toggle refcounting, but we ALWAYS refcount tensors. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/9494 Differential Revision: D8875169 Pulled By: ezyang fbshipit-source-id: 3a8618fb288334e62942bbaf388f3c9e473e7524	2018-07-17 11:25:41 -07:00
Will Feng	ad74006ffa	Pass THDRequest as void* pointer to THDRequest_free (#9398 ) Summary: This fixes https://github.com/pytorch/pytorch/issues/9054. Pull Request resolved: https://github.com/pytorch/pytorch/pull/9398 Reviewed By: ezyang Differential Revision: D8827778 Pulled By: yf225 fbshipit-source-id: 862287802cb69c6ac71ff4df19cadb89b1face1d	2018-07-16 19:25:22 -07:00
Edward Yang	976f9253a5	Eliminate storage views. (#9466 ) Summary: Storage views were previously used to implement CUDA IPC sharing, but they weren't necessary. The new strategy is described in Note [CUDA IPC and the caching allocator]. This also fixes an unrelated bug, where we weren't actually using the Tensor forking pickler, because we didn't register a pickler for torch.Tensor. Fixes #9447. Fixes #46. Signed-off-by: Edward Z. Yang <ezyang@fb.com> CC apaszke Pull Request resolved: https://github.com/pytorch/pytorch/pull/9466 Reviewed By: apaszke Differential Revision: D8859698 Pulled By: ezyang fbshipit-source-id: 3362cb92f6ae4aa37084c57d79b31004bd0b4a97	2018-07-16 15:40:24 -07:00
Edward Yang	cffca2926b	Introduce SupervisedPtr, delete THAllocator and THCDeviceAllocator (#9358 ) Summary: See Note [Supervisor deleter] for how SupervisedPtr works. This design is not the obvious one, but there were a lot of constraints feeding into it: - It must support the reallocation usage-pattern, where, given an existing Storage, we allocate a new region of memory, copy the existing data to it, and then deallocate the old region of memory. - Creation of a deleter for memory MUST avoid dynamic allocations in the common case. We've done some benchmarking in Caffe2 where dynamic allocation for deleters is ruinously expensive, and it's really hard to avoid these performance tarpits in very general function wrappers like std::function or folly::Function (while benchmarking this, we discovered that folly::Function's move constructor was way more expensive than it should be). - We need to be able to deallocate data that comes from external sources, e.g., dlpack and numpy tensors. Most notably, you often cannot deallocate these with merely the void* data pointer; you need some extra, out-of-band information (e.g., the managing struct) to deallocate it. Sometimes, you may even want to resize data living in an external source! - The "core" allocators need to support being wrapped in a Thrust allocator, so you need to be implement the following two functions: char* allocate(size_t); void deallocate(char, size_t); - We need to support tensors which contain non-POD, non-trivially copyable data; specifically tensors of std::string. This is an upcoming requirement from Caffe2. It's dirty AF, but it's really useful. - It should use C++ standard library types like std::unique_ptr (which is hugely problematic because std::unique_ptr doesn't call the deleter when the pointer is null.) Here is the billing of changes: - Built-in support for realloc() has been DROPPED ENTIRELY. Instead, you're expected to allocate and then copy from the old memory to the new memory if you want to do a reallocation. This is what you'd generally have expected to occur; and axing realloc() from the design lets us avoid some tricky correctness issues with std::realloc(), namely the fact that we must refuse the realloc if the type of the elements are not trivially copyeable. If it really matters, we can add this back, but there really needs to be a good explanation WHY you need fast resizing reallocations (by in large, people don't resize their storages, and it should be acceptable to have a performance degradation when they do). - TH_STORAGE_FREEMEM is no more; instead, if you want a storage which doesn't free its result, you just give it an empty deleter. - What we used to call an "allocator" (really, a combined object for allocating/deleting) has been split into two concepts, an allocator, and a smart pointer (SupervisedPtr) which knows how to delete data. - Unlike previously, where THAllocator/THCDeviceAllocator could have a per-tensor context storing extra information (e.g., a pointer to the metadata you need to actually free the tensor), there is no context in the allocator or the deleter of the smart pointer; instead, the smart pointer directly holds an owning reference to the metadata necessary to free the data. This metadata is freshly manufactured* upon every allocation, which permits us to resize tensors even in the absence of built-in support for realloc(). - By default, allocators don't support "raw" allocations and deallocations with raw pointers. This is because some allocations may return a different context every time, in which case you need to reconstruct the context at delete time (because all you got was a void, not a unique_ptr that carries the deleter). - The diff between at::Allocator and THCDeviceAllocator is a bit larger: - It used to return a cudaError_t. Now, allocators are expected to check the error status immediately and throw an exception if there was an error. It turns out that this is what was immediately done after all occurrences of allocate/release, so it wasn't a big deal (although some subsidiary interfaces had to themselves be converted to not return cudaError_t). There is one notable exception to this, and it is how we handle CUDA OOM: if this occurs, we attempt to return unused memory to the system and try again. This is now handled by a catch-all try-catch block. The cost of catching the exception is probably the least of your worries if you're about to OOM. - It used to take the CUDA stream to perform the allocation on as an argument. However, it turned out that all call sites, this stream was the stream for the current device. So we can push this into the allocator (and the choice, in the future, could be made explicitly by twiddling thread local state.) - It held two extra methods, emptyCache and cacheInfo, specifically for interacting with some state in THCCachingAllocator. But this "generality" was a lie, since THCCachingAllocator was the only allocator that actually implemented these methods, and there is actually a bunch of code in THC which assumes that it is the caching allocator that is the underlying allocator for CUDA allocations. So I folded these two methods into this interface as THCCachingAllocator_emptyCache and THCCachingAllocator_cacheInfo. - It held its context directly inside the THCDeviceAllocator struct. This context has been moved out into whatever is holding the at::Allocator. - The APIs for getting at allocators/deleters is now a little different. - Previously there were a bunch of static variables you could get the address of (e.g., &THDefaultAllocator); now there is a function getTHDefaultAllocator(). - Some "allocators" didn't actually know how to allocate (e.g., the IPC "allocator"). These have been deleted; instead, you can wrap the produced pointers into SupervisedPtr using an appropriate makeSupervisedPtr() static method. - Storage sharing was a lot of work to wrangle, but I think I've tamed the beast. - THMapAllocator and its "subclasses" have been refactored to be proper, honest to goodness C++ classes. I used the enum argument trick to get "named" constructors. We use inheritance to add refcounting and management (in libshm). What we previously called the "Context" class (Context has been dropped from the name) is now the supervisor for the data. - Sometimes, we need to pull out the file descriptor from a tensor. Previously, it was pulled out of the allocator context. Now, we pull it out of the supervisor of the SupervisorPtr, using the static method fromSupervisedPtr(), which uses the deleter as the typeid, and refines the type if it matches. - I renamed the std::function deleter into InefficientStdFunctionSupervisor, to emphasize the fact that it does a dynamic allocation to save the std::function deleter. TODO: - Windows libshm is in shambles and needs to be fixed. Perhaps for the future: - newFromFd is now unconditionally calling cudaPointerGetAttributes even though this is unnecessary, because we know what the device is from higher up in the callstack. We can fix this by making newWithDataAndAllocator also take an explicit device argument. - Consider statically distinguishing between allocators that support raw_allocate/raw_deallocate, and those which don't. The Thrust constraint applies only to the CUDA device allocator; you never need to allocate CPU memory this way - Really want to get rid of storage views. Ugh. Nontrivial bugs I noticed when preparing this patch: - I forgot to placement-new unique pointers and attempted to assign them directly on uninitialized memory; very bad! Sam Gross has encouraged me to replace this with a proper constructor but I keep putting it off, because once everything goes in StorageImpl there really will be a proper constructor. - I rewrote a number of APIs to use newWithDataAndAllocator instead of newWithAllocator, calling the allocator at the call site (because they required "allocation context" which we no longer give to "allocators"). When I did this, I forgot to insert the multiplication with sizeof(real) to scale from numels to number of bytes. - The implementation of swap on storages was missing it for scalarType and backend. It was benign (because the only case we call swap is when these are the same), but I fixed it anyway. - I accidentally returned a nullptr unique_ptr with no deleter, even though there was a legitimate one. This matters, because some code still shoves its hands in the deleter context to get extra metadata about the function. - I used std::move() on a unique_ptr, and then did a boolean test on the pointer aftewards (always false!) Pull Request resolved: https://github.com/pytorch/pytorch/pull/9358 Reviewed By: SsnL Differential Revision: D8811822 Pulled By: ezyang fbshipit-source-id: 4befe2d12c3e7fd62bad819ff52b054a9bf47c75	2018-07-15 15:11:18 -07:00
Yu Feng	22ba8726da	Mention MPICH_MAX_THREAD_SAFETY=multiple. (#8580 ) Currently, this is a common step to enable level 3 support on MPICH based systems.	2018-06-26 12:40:48 -04:00
Peter Goldsborough	4f37a6481d	Fix DeviceGuard usage in THD (#8622 )	2018-06-18 18:33:54 -04:00
Peter Goldsborough	372d1d6735	Create ATen tensors via TensorOptions (#7869 ) * Created TensorOptions Storing the type in TensorOptions to solve the Variable problem Created convenience creation functions for TensorOptions and added tests Converted zeros to TensorOptions Converted rand to TensorOptions Fix codegen for TensorOptions and multiple arguments Put TensorOptions convenience functions into torch namespace too All factory functions except _like support TensorOptions Integrated with recent JIT changes Support _like functions Fix in place modification Some cleanups and fixes Support sparse_coo_tensor Fix bug in Type.cpp Fix .empty calls in C++ API Fix bug in Type.cpp Trying to fix device placement Make AutoGPU CPU compatible Remove some auto_gpu.h uses Fixing some headers Fix some remaining CUDA/AutoGPU issues Fix some AutoGPU uses Fixes to dispatch_tensor_conversion Reset version of new variables to zero Implemented parsing device strings Random fixes to tests Self review cleanups flake8 Undo changes to variable.{h,cpp} because they fail on gcc7.2 Add [cuda] tag to tensor_options_cuda.cpp Move AutoGPU::set_index_from into .cpp file because Windows is stupid and sucks Fix linker error in AutoGPU.cpp Fix bad merge conflict in native_functions.yaml Fixed caffe2/contrib/aten Fix new window functions added to TensorFactories.cpp * Removed torch::TensorOptions Added code to generate wrapper functions for factory methods Add implicit constructor from Backend to TensorOptions Remove Var() from C++ API and use torch:: functions Use torch:: functions more subtly in C++ API Make AutoGPU::set_device more exception safe Check status directly in DynamicCUDAHooksInterface Rename AutoGPU to DeviceGuard Removed set_requires_grad from python_variables.h and warn appropriately in Variable::set_requires_grad remove python_default_init: self.type() Add back original factory functions, but with deprecation warnings Disable DeviceGuard for a couple functions in ATen Remove print statement Fix DeviceGuard construction from undefined tensor Fixing CUDA device compiler issues Moved as many methods as possible into header files Dont generate python functions for deprecated factories Remove merge conflict artefact Fix tensor_options_cuda.cpp Fix set_requires_grad not being checked Fix tensor_new.h TEMPORARILY put some methods in .cpp files to see if it solves issues on windows and mac Fix bug in DeviceGuard.h Missing includes TEMPORARILY moving a few more methods into .cpp to see if it fixes windows Fixing linker errors * Fix up SummaryOps to use new factories Undo device agnostic behavior of DeviceGuard Use -1 instead of optional for default device index Also move DeviceGuard methods into header Fixes around device index after optional -> int32_t switch Fix use of DeviceGuard in new_with_tensor_copy Fix tensor_options.cpp * Fix Type::copy( * Remove test_non_float_params from ONNX tests * Set requires_grad=False in ONNX tests that use ints * Put layout/dtype/device on Tensor * Post merge fixes * Change behavior of DeviceGuard to match AutoGPU * Fix C++ API integration tests * Fix flip functions	2018-06-16 00:40:35 -07:00
Soumith Chintala	dc186cc9fe	Remove NO_* and WITH_* across codebase, except in setup.py (#8555 ) * remove legacy options from CMakeLists * codemod WITH_ to USE_ for WITH_CUDA, WITH_CUDNN, WITH_DISTRIBUTED, WITH_DISTRIBUTED_MW, WITH_GLOO_IBVERBS, WITH_NCCL, WITH_ROCM, WITH_NUMPY * cover SYSTEM_NCCL, MKLDNN, NNPACK, C10D, NINJA * removed NO_* variables and hotpatch them only in setup.py * fix lint	2018-06-15 12:29:48 -04:00
Soumith Chintala	7251d70c5b	fixed THD NO_CUDA (#8539 )	2018-06-15 09:09:23 -04:00
Yangqing Jia	991bdd7f13	[build] remove the use of NO_CUDA (#8300 ) * Only remove NO_CUDA from CMakeLists.txt * @ezyang's catch	2018-06-12 12:14:36 -04:00
Teng Li	80b6f9edd6	[THD] fix broken THD build with NCCL (#8323 )	2018-06-10 23:48:10 -04:00
Yangqing Jia	37073f8be0	[build] Remove /torch/lib/THD/cmake in favor of /cmake (#7159 ) * Remove /torch/lib/THD/cmake in favor of /cmake * path fix * Explicitly marking gloo to use cuda * Fix gloo path in THD	2018-06-08 17:55:12 -04:00
Peter Goldsborough	04a3616de0	Replace std::size_t with size_t (#8093 )	2018-06-04 11:10:44 -04:00
Orion Reblitz-Richardson	4bf0202cac	[build] Have PyTorch depend on minimal libcaffe2.so instead of libATen.so (#7399 ) * Have PyTorch depend on minimal libcaffe2.so instead of libATen.so * Build ATen tests as a part of Caffe2 build * Hopefully cufft and nvcc fPIC fixes * Make ATen install components optional * Add tests back for ATen and fix TH build * Fixes for test_install.sh script * Fixes for cpp_build/build_all.sh * Fixes for aten/tools/run_tests.sh * Switch ATen cmake calls to USE_CUDA instead of NO_CUDA * Attempt at fix for aten/tools/run_tests.sh * Fix typo in last commit * Fix valgrind call after pushd * Be forgiving about USE_CUDA disable like PyTorch * More fixes on the install side * Link all libcaffe2 during test run * Make cuDNN optional for ATen right now * Potential fix for non-CUDA builds * Use NCCL_ROOT_DIR environment variable * Pass -fPIC through nvcc to base compiler/linker * Remove THCUNN.h requirement for libtorch gen * Add Mac test for -Wmaybe-uninitialized * Potential Windows and Mac fixes * Move MSVC target props to shared function * Disable cpp_build/libtorch tests on Mac * Disable sleef for Windows builds * Move protos under BUILD_CAFFE2 * Remove space from linker flags passed with -Wl * Remove ATen from Caffe2 dep libs since directly included * Potential Windows fixes * Preserve options while sleef builds * Force BUILD_SHARED_LIBS flag for Caffe2 builds * Set DYLD_LIBRARY_PATH and LD_LIBRARY_PATH for Mac testing * Pass TORCH_CUDA_ARCH_LIST directly in cuda.cmake * Fixes for the last two changes * Potential fix for Mac build failure * Switch Caffe2 to build_caffe2 dir to not conflict * Cleanup FindMKL.cmake * Another attempt at Mac cpp_build fix * Clear cpp-build directory for Mac builds * Disable test in Mac build/test to match cmake	2018-05-24 07:47:27 -07:00
Deyu Fu	20041e2704	better cache for nccl resourse (#6970 ) allow more than 1 device list to be stored	2018-05-10 19:42:36 +02:00
gchanan	36a3f0995b	Remove THDTensorDescriptor_newFromTH{X}Tensor. (#7287 ) They don't seem to be used and we are moving to a single TensorImpl model.	2018-05-04 12:22:19 -07:00
Pieter Noordhuis	ebebfce681	Minor THD cleanup (#7161 ) * Remove stale THD README * Move common THD dependency into THD/base The master_worker directory now no longer contains files that are needed for building other parts of THD.	2018-05-02 07:29:27 -07:00
Paul Jesse Hellemn	7968ee0f59	Removing references to CUDA_SDK_ROOT_DIR to see if it breaks anything (#7125 )	2018-05-01 07:52:16 -07:00
Luca Antiga	0703357723	Don't build THD/master_worker if not explicitly requested (#7081 )	2018-04-29 13:17:09 -04:00
Edward Z. Yang	4caea64d72	Make all of TH and THC C++. (#6913 ) Changelist: - Move .c to .cpp - Change includes of ".c" to ".cpp" - A bunch of cmake configuration modifying CMAKE_C_FLAGS changed to CMAKE_CXX_FLAGS or add_compile_options, because if you do CMAKE_C_FLAGS it only applies when you compile C code - Explicitly cast void* to T* in a number of places - Delete extern "C" { ... } blocks; instead, properly apply TH_API to everything that should have it (TH_API handles extern "C") - Stop using stdatomic.h, instead, use <atomic>. This resulted in a bunch of placement-new/delete to be "totally properly correct" - Refactor of THLongStorageView to not have static constructor methods (since it no longer has a copy/move constructor) - Documentation about how the TH C interface (and extern C business) works - Note that THD master_worker mode is dead - C++ headers in TH libraries are given .hpp suffix, to make it less likely that you'll confuse them with the C-compatible headers (now suffixed .h) - New function THCStream_stream and THCStream_device to project out fields of THCStream instead of accessing fields directly - New function THStorage_(retainIfLive), which is equivalent to a retain but only if the refcount is greater than zero. - In general, I tried to avoid using hpp headers outside of ATen/TH. However, there were a few places where I gave up and depended on the headers for my own sanity. See Note [TH abstraction violation] for all the sites where this occurred. All other sites were refactored to use functions - Some extra Werror fixes (char* versus const char*)	2018-04-28 07:45:02 -04:00

1 2 3 4 5

224 Commits