pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-06 12:20:52 +01:00

Author	SHA1	Message	Date
cyy	f4dcf2ae93	[1/N] Change #include <c10/util/Optional.h> to #include <optional> (#128301 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/128301 Approved by: https://github.com/ezyang, https://github.com/r-barnes	2024-07-08 07:03:53 +00:00
PyTorch MergeBot	846bb30e13	Revert "[1/N] Change #include <c10/util/Optional.h> to #include <optional> (#128301 )" This reverts commit `bd72e28314`. Reverted https://github.com/pytorch/pytorch/pull/128301 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it fails XLA build `bd72e28314`. Please rebase your PR before relanding because I think the failure is hidden by an unrelated broken trunk XLA failure from your current base commit ([comment](https://github.com/pytorch/pytorch/pull/128301#issuecomment-2169035822))	2024-06-15 01:58:20 +00:00
cyy	bd72e28314	[1/N] Change #include <c10/util/Optional.h> to #include <optional> (#128301 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/128301 Approved by: https://github.com/ezyang	2024-06-14 23:21:01 +00:00
Richard Barnes	ed327876f5	[codemod] `c10:optional` -> `std::optional` (#126135 ) Generated by running the following from PyTorch root: ``` find . -regex ".*\.\(cpp\|h\|cu\|hpp\|cc\|cxx\)$" | grep -v "build/" | xargs -n 50 -P 4 perl -pi -e 's/c10::optional/std::optional/' ``` `c10::optional` is just an alias for `std::optional`. This removes usages of that alias in preparation for eliminating it entirely. Pull Request resolved: https://github.com/pytorch/pytorch/pull/126135 Approved by: https://github.com/Skylion007, https://github.com/malfet, https://github.com/albanD, https://github.com/aaronenyeshi	2024-05-14 19:35:51 +00:00
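The codemod in the commit above is a plain textual substitution. As a rough illustration only, the sketch below applies the same rewrite to an in-memory string. The function name `replace_c10_optional` is hypothetical and not part of PyTorch. Because the perl expression has no `g` flag, it rewrites only the first occurrence on each line, and the sketch mirrors that behavior.

```python
import re

# Literal pattern used by the perl one-liner in the commit message.
PATTERN = re.compile(r"c10::optional")


def replace_c10_optional(source: str) -> str:
    """Rewrite the first `c10::optional` on each line (hypothetical helper)."""
    lines = source.splitlines(keepends=True)
    # count=1 mirrors perl's s/// without the /g flag.
    return "".join(PATTERN.sub("std::optional", line, count=1) for line in lines)


if __name__ == "__main__":
    sample = (
        "c10::optional<int> a;\n"
        "void f(c10::optional<int> x, c10::optional<int> y);\n"
    )
    print(replace_c10_optional(sample))
```

A line that mentions the alias twice would need a second pass, or a global substitution, to be fully converted.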
cyy	6da0e7f84b	[Clang-tidy header][17/N] Apply clang-tidy on headers in torch/csrc/cuda (#117829 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/117829 Approved by: https://github.com/albanD	2024-01-26 13:33:24 +00:00
cyy	2b5a201aa6	[Exception] [3/N] Replace torch::NotImplementedError and torch::LinAlgError with C10 counterparts. (#116824 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/116824 Approved by: https://github.com/albanD	2024-01-11 11:27:04 +00:00
cyy	91bbcf8c71	[1/N] replace THPUtils_assert with TORCH_CHECK (#116675 ) This PR replaces THPUtils_assert with TORCH_CHECK. Pull Request resolved: https://github.com/pytorch/pytorch/pull/116675 Approved by: https://github.com/albanD	2024-01-04 11:15:33 +00:00
Wes Bland	9d765d28ca	[pytorch] Add binding to get nccl version suffix (#112884 ) Summary: Adds a Python to C binding to get the NCCL_SUFFIX value for more accurate NCCL version information and add that to the NCCL version tuple. Differential Revision: D50978181 Pull Request resolved: https://github.com/pytorch/pytorch/pull/112884 Approved by: https://github.com/kwen2501	2023-11-08 02:51:22 +00:00
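To make the idea of a version tuple with an optional suffix concrete, here is a minimal sketch. It assumes the numeric version arrives as one packed integer with major, minor and patch in separate bit fields. That layout, the function name `nccl_version_tuple`, and the sample suffix are assumptions of this sketch, not something the commit message states.

```python
from typing import Tuple, Union

VersionTuple = Union[Tuple[int, int, int], Tuple[int, int, int, str]]


def nccl_version_tuple(packed: int, suffix: bytes = b"") -> VersionTuple:
    """Unpack a version integer and append the suffix when one is present.

    Assumed layout (hypothetical): major in bits 32 and up, minor in
    bits 16-31, patch in bits 0-15.
    """
    major = packed >> 32
    minor = (packed >> 16) & 0xFFFF
    patch = packed & 0xFFFF
    if not suffix:
        return (major, minor, patch)
    return (major, minor, patch, suffix.decode("utf-8"))


if __name__ == "__main__":
    packed = (2 << 32) | (19 << 16) | 3
    print(nccl_version_tuple(packed))
    print(nccl_version_tuple(packed, b"-rc1"))  # made-up suffix for illustration
```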
PyTorch MergeBot	b3603f8129	Revert "Deduplicate c10 error and PyTorchError hierarchy (#87855 )" This reverts commit `34f2d3e6ae`. Reverted https://github.com/pytorch/pytorch/pull/87855 on behalf of https://github.com/osalpekar due to perf regression in quantization tests	2023-01-06 19:56:35 +00:00
William Phetsinorath	34f2d3e6ae	Deduplicate c10 error and PyTorchError hierarchy (#87855 ) Fixes #53370 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87855 Approved by: https://github.com/albanD	2023-01-02 15:53:36 +00:00
Aaron Gokaslan	12671fe620	Reserve space for std::vector output in extract_tensors for nccl python bindings (#88203 ) Optimizes the nccl python bindings to reserve space when converting PythonObject* into Tensors. This should reduce the number of unnecessary allocations in the nccl bindings as the std::vector grows. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88203 Approved by: https://github.com/ezyang	2022-12-10 20:28:19 +00:00
Edward Z. Yang	df69660832	Revert "Revert "Add a lint rule for torch/csrc/util/pybind.h include (#82552 )"" (#82599 ) This reverts commit `532b8a9e00`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/82599 Approved by: https://github.com/albanD	2022-08-02 19:37:02 +00:00
PyTorch MergeBot	532b8a9e00	Revert "Add a lint rule for torch/csrc/util/pybind.h include (#82552 )" This reverts commit `9465c0e0b5`. Reverted https://github.com/pytorch/pytorch/pull/82552 on behalf of https://github.com/zengk95 due to This seems to be breaking windows binary wheels	2022-08-01 20:25:35 +00:00
Edward Z. Yang	9465c0e0b5	Add a lint rule for torch/csrc/util/pybind.h include (#82552 ) We define specializations for pybind11 defined templates (in particular, PYBIND11_DECLARE_HOLDER_TYPE) and consequently it is important that these specializations always be #include'd when making use of pybind11 templates whose behavior depends on these specializations, otherwise we can cause an ODR violation. The easiest way to ensure that all the specializations are always loaded is to designate a header (in this case, torch/csrc/util/pybind.h) that ensures the specializations are defined, and then add a lint to ensure this header is included whenever pybind11 headers are included. The existing grep linter didn't have enough knobs to do this conveniently, so I added some features. I'm open to suggestions for how to structure the features better. The main changes: - Added an --allowlist-pattern flag, which turns off the grep lint if some other line exists. This is used to stop the grep lint from complaining about pybind11 includes if the util include already exists. - Added --match-first-only flag, which lets grep only match against the first matching line. This is because, even if there are multiple includes that are problematic, I only need to fix one of them. We don't /really/ need this, but when I was running lintrunner -a to fixup the preexisting codebase it was annoying without this, as the lintrunner overall driver fails if there are multiple edits on the same file. I excluded any files that didn't otherwise have a dependency on torch/ATen, this was mostly caffe2 and the valgrind wrapper compat bindings. Note the grep replacement is kind of crappy, but clang-tidy lint cleaned it up in most cases. See also https://github.com/pybind/pybind11/issues/4099 Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/82552 Approved by: https://github.com/albanD	2022-08-01 17:16:58 +00:00
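The two linter features described in the commit above can be sketched as follows. This is not the actual grep linter from the repository. The function `grep_lint` and its parameters are hypothetical stand-ins for the `--allowlist-pattern` and `--match-first-only` behavior the message describes, and the header path is spelled as in the commit message.

```python
import re
from typing import List, Optional


def grep_lint(
    lines: List[str],
    pattern: str,
    allowlist_pattern: Optional[str] = None,
    match_first_only: bool = False,
) -> List[int]:
    """Return 1-based numbers of lines matching `pattern` (hypothetical helper)."""
    # Allowlist: if any line matches it, the lint is turned off for this file.
    if allowlist_pattern is not None and any(
        re.search(allowlist_pattern, line) for line in lines
    ):
        return []
    hits: List[int] = []
    for number, line in enumerate(lines, start=1):
        if re.search(pattern, line):
            hits.append(number)
            if match_first_only:
                break  # report only the first offending line
    return hits


if __name__ == "__main__":
    source = [
        "#include <pybind11/pybind11.h>",
        "#include <pybind11/stl.h>",
        "int x;",
    ]
    print(grep_lint(source, r"#include <pybind11/", match_first_only=True))
```

Reporting only the first match keeps a fix-up tool from proposing several edits to the same file, which is the motivation the commit message gives for the flag.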
Michael Suo	30fb2c4aba	[lint] autoformat test/cpp and torch/csrc Let's have some fun. Pull Request resolved: https://github.com/pytorch/pytorch/pull/78828 Approved by: https://github.com/ezyang	2022-06-11 21:11:16 +00:00
Edward Yang	6c65ce8ee1	Use THPVariable_Unpack in python_nccl (#56016 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/56016 Missed these because I don't build on CUDA Signed-off-by: Edward Z. Yang <ezyang@fb.com> Test Plan: Imported from OSS Reviewed By: bdhirsh Differential Revision: D27765124 Pulled By: ezyang fbshipit-source-id: aa202f594659d53c903b88c9d4a4cbb0e1c0b40a	2021-04-15 08:57:06 -07:00
Richard Barnes	d690973295	irange on int64_t (#55148 ) Summary: Converts loops of the form: ``` for(int64_t VAR=0;VAR<LIMIT;VAR++) ``` to the form ``` for(const auto VAR : c10::irange(LIMIT)) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/55148 Test Plan: Sandcastle Reviewed By: ngimel Differential Revision: D27447811 fbshipit-source-id: 6311a094ec4a81a0b57383aaee0ba1b1dc2445c4	2021-04-05 16:14:00 -07:00
Nikita Shulga	8ab2ad306d	Enable `torch.cuda.nccl` typechecking (#45344 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/45336 Pull Request resolved: https://github.com/pytorch/pytorch/pull/45344 Reviewed By: walterddr Differential Revision: D23935306 Pulled By: malfet fbshipit-source-id: dd09d4f8ff7a327131764487158675027a13bf69	2020-09-25 17:02:47 -07:00
Rong Rong	3eb31325fc	refactor torch/cuda/nccl.h to remove direct dependency on NCCL in libtorch_python (#42687 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42687 Reviewed By: malfet Differential Revision: D23145834 Pulled By: walterddr fbshipit-source-id: c703a953a54a638852f6e5a1479ca95ae6a10529	2020-08-19 20:16:53 -07:00
Alban Desmaison	02ae9a1583	add TypeError to c10 and fix segfault in error checking in Tensor constructor (#40106 ) Summary: As per title. Pull Request resolved: https://github.com/pytorch/pytorch/pull/40106 Differential Revision: D22137193 Pulled By: albanD fbshipit-source-id: 11d059263c00a834211f016bd9a9e18fdc0437ef	2020-06-22 13:42:44 -07:00
SsnL	d5236f8517	Avoid initializing unnecessary tensors in nccl.reduce (#39688 ) Summary: While working on https://github.com/pytorch/pytorch/issues/38911, I realized that `nccl.reduce` only needs a single output tensor, while our current implementation requires a list of output tensors. This, along with a TODO I fixed in reduce_add, should have some speed up for data parallel. Pull Request resolved: https://github.com/pytorch/pytorch/pull/39688 Differential Revision: D22034547 Pulled By: mrshenli fbshipit-source-id: e74d54d673ebbb062474b1bb5cc93a095a3a5f6c	2020-06-14 10:11:32 -07:00
Nikita Shulga	2458f6c63e	Move all nccl from torch_python to torch_cuda (#36193 ) Summary: Because `torch_python` is supposed to be a thin wrapper around `torch`, in this PR all invocations of functions from the nccl library are moved from python_nccl.cpp (which is part of torch_python) to nccl.cpp (which is part of torch_cuda). Pull Request resolved: https://github.com/pytorch/pytorch/pull/36193 Test Plan: CI Differential Revision: D20930047 Pulled By: malfet fbshipit-source-id: 7f278610077df6ac5dc3471c1a1b5d51e653ef9c	2020-04-08 18:01:47 -07:00
Edward Yang	1111a6b810	Use pybind11::gil_scoped_* functions instead of AutoGIL/AutoNoGIL (#30274 ) Summary: Reland of https://github.com/pytorch/pytorch/pull/29095 Pull Request resolved: https://github.com/pytorch/pytorch/pull/30274 Differential Revision: D18762293 Pulled By: ezyang fbshipit-source-id: d3d50c2dd12bcb678ab25fa708eb6587cc4b66f9	2019-12-02 12:19:58 -08:00
Mike Ruberry	eff4c4d7c1	Revert D18301806: Use pybind11::gil_scoped_* functions instead of AutoGIL/AutoNoGIL Test Plan: revert-hammer Differential Revision: D18301806 Original commit changeset: 03da6a26c41e fbshipit-source-id: c1324ee8d154e7e16f5dd4f1cf3625aaa566cd39	2019-11-21 14:50:07 -08:00
Alan Du	f4b9690f2d	Use pybind11::gil_scoped_* functions instead of AutoGIL/AutoNoGIL (#29095 ) Summary: Given that pybind11 implements these gil functions, I don't think it makes sense for Pytorch to have its own bespoke versions. Fixes https://github.com/pytorch/pytorch/issues/29065 Pull Request resolved: https://github.com/pytorch/pytorch/pull/29095 Differential Revision: D18301806 Pulled By: ezyang fbshipit-source-id: 03da6a26c41ee65aaadf7b67b9f0b14d2def2a5a	2019-11-21 13:44:40 -08:00
Michael Carilli	314066bd74	Making torch/csrc/cuda nccl usage safe for nccl 2.5 (#29014 ) Summary: Thanks to AddyLaddy ptrblck for tracking this fix down. In torch/csrc/cuda/nccl.cpp and torch/csrc/cuda/python_nccl.cpp, construction of the `AutoNcclGroup` guard (which calls `ncclGroupStart()`) [precedes](https://github.com/pytorch/pytorch/pull/29014/files#diff-3b6a42619dd44000cf58c0328b679a1cL239-L241) a possible call to `get_communicators`, which may call `ncclCommInitAll()`. Calling `ncclCommInitAll()` within a `ncclGroupStart()/End()` is incorrect according to our Nccl people. It seemed ok (relevant tests were silently passing) as long as Pytorch was compiled/linked against Nccl 2.4.x (which is currently what's locked into your third_party/nccl subrepo). However, when we tried to compile and link against Nccl 2.5.x in internal builds, we began to see test hangs (TestAutogradDeviceTypeCUDA.test_unused_output_device_cuda was what initially brought it to our attention). The present PR fixes those hangs, as far as we know, and will prevent a nasty future surprise when you start building against nccl 2.5. The backend affected by this PR is exposed via https://github.com/pytorch/pytorch/blob/master/torch/cuda/nccl.py. I'm not sure if the exposure is actually used anywhere (I think the distributed frontend is now backed by ProcessGroupNCCL in torch/lib/c10d). So this PR may affect code that is already dead or dying, but still tested, it seems. I skimmed ProcessGroupNCCL.cpp for potential similar vulnerabilities and didn't spot anything obvious. Pull Request resolved: https://github.com/pytorch/pytorch/pull/29014 Differential Revision: D18274799 Pulled By: ezyang fbshipit-source-id: c5f88cf187960d61736be14458be01e3675c6702	2019-11-01 13:53:31 -07:00
Pritam Damania	24242e86fa	Ensure NCCL error handling code is disabled for NCCL versions < 2.4 (#27124 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27124 ncclCommAbort() and ncclGetAsyncError() were two APIs added in NCCL 2.4 to detect errors in NCCL communicators. These were used as part of ProcessGroupNCCL and we also enforced that only NCCL versions 2.4+ were supported. However, there are still legitimate uses for older NCCL versions, so we should still support them. For that purpose, in this change I've ensured we disable NCCL error checking for versions < 2.4. ghstack-source-id: 91452959 Test Plan: 1) Test with 2.4.8 2) Test with 2.2.13 3) unit tests. Differential Revision: D17178988 fbshipit-source-id: 5dc44b5f7b4b00466c67fd452315f1d4f5c47698	2019-10-07 17:39:32 -07:00
Jeff Daily	8bc28cc898	Remove cuda free mutex (#23040 ) Summary: Revision of https://github.com/pytorch/pytorch/issues/22173 to address CI failure after merging. Pull Request resolved: https://github.com/pytorch/pytorch/pull/23040 Differential Revision: D16366872 Pulled By: mrshenli fbshipit-source-id: 747b6ecf2dc195c25f82b8f732ae9ff52cd3a394	2019-07-22 07:58:29 -07:00
Will Feng	a5e6586618	Revert D16357177: [pytorch][PR] Fix race condition, bad lock hierarchy. Move getFreeMutex() into AutoNcclGroup. Differential Revision: D16357177 Original commit changeset: f4ca9cd46cc6 fbshipit-source-id: 49e66e7e59df6cbc7f5d847bacc07da134067956	2019-07-18 12:28:46 -07:00
Jeff Daily	29347cc9cf	Fix race condition, bad lock hierarchy. Move getFreeMutex() into AutoNcclGroup. (#22173 ) Summary: There are two mutexes within CUDACachingAllocator that cause a deadlock. One of the mutexes was added in order to work around the issue of NCCL interacting poorly with cudaFree. See - `68ff58d771` - https://github.com/pytorch/pytorch/pull/880 As of NCCL version 2 and its new group start/end APIs, the protection surrounding cudaFree() is no longer needed. The PyTorch code was updated to use the NCCL2 group start/end API, but the corresponding cuda_free_mutex and its getter getFreeMutex() were not revised. This PR removes the use of the getFreeMutex() when NCCL2 is used by moving calls to getFreeMutex() into the AutoNcclGroup. That way, depending on the NCCL version used, we either use the mutex or we use the new group APIs. The race condition is as follows, thanks to skeelyamd: The deadlock occurs between hip_free_mutex (aka cuda_free_mutex in github) (https://github.com/pytorch/pytorch/blob/master/c10/cuda/CUDACachingAllocator.cpp#L165) and mutex (https://github.com/pytorch/pytorch/blob/master/c10/cuda/CUDACachingAllocator.cpp#L162). hip_free_mutex is exported from THCCachingAllocator in getFreeMutex (https://github.com/pytorch/pytorch/blob/master/c10/cuda/CUDACachingAllocator.cpp#L660) and is acquired in ProcessGroupNCCL::collective (https://github.com/pytorch/pytorch/blob/master/torch/lib/c10d/ProcessGroupNCCL.cpp#L397), which then calls back into THCCachingAllocator via c10::cuda::CUDACachingAllocator::recordStream (https://github.com/pytorch/pytorch/blob/master/torch/lib/c10d/ProcessGroupNCCL.cpp#L416 to https://github.com/pytorch/pytorch/blob/master/c10/cuda/CUDACachingAllocator.cpp#L655 to https://github.com/pytorch/pytorch/blob/master/c10/cuda/CUDACachingAllocator.cpp#L379). At this point it acquires mutex (https://github.com/pytorch/pytorch/blob/master/c10/cuda/CUDACachingAllocator.cpp#L384). 
This requires hip_free_mutex to be locked before mutex. However, in free_blocks (https://github.com/pytorch/pytorch/blob/master/c10/cuda/CUDACachingAllocator.cpp#L505) THCCachingAllocator locks hip_free_mutex. Free_blocks is called from emptyCache (https://github.com/pytorch/pytorch/blob/master/c10/cuda/CUDACachingAllocator.cpp#L328) which locks mutex. That requires mutex to be locked before hip_free_mutex. emptyCache and ProcessGroupNCCL::collective may not be executed concurrently but this is occurring and deadlocking the CPU. free_blocks is also called by malloc (via cuda_malloc_retry -> free_cached_blocks -> free_blocks) which also locks mutex first and so malloc must not execute concurrent with ProcessGroupNCCL::collective. Pull Request resolved: https://github.com/pytorch/pytorch/pull/22173 Differential Revision: D16357177 Pulled By: pietern fbshipit-source-id: f4ca9cd46cc6d5e15290d99577d88be3f4fa8972	2019-07-18 07:31:02 -07:00
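The lock-hierarchy problem described in the commit above can be checked mechanically: record the order in which each code path acquires its locks, then look for a cycle in the resulting "held before" graph. The sketch below is a generic illustration under that assumption. The function `has_lock_order_cycle` and the dictionary of paths are hypothetical; only the mutex and function names are taken from the commit message.

```python
from typing import Dict, List, Set


def has_lock_order_cycle(paths: Dict[str, List[str]]) -> bool:
    """Return True if the acquisition orders of the given paths can deadlock."""
    # Edge a -> b means some path acquires b while still holding a.
    edges: Dict[str, Set[str]] = {}
    for order in paths.values():
        for held, wanted in zip(order, order[1:]):
            edges.setdefault(held, set()).add(wanted)

    visiting: Set[str] = set()
    done: Set[str] = set()

    def visit(node: str) -> bool:
        if node in done:
            return False
        if node in visiting:
            return True  # back edge: the lock order is cyclic
        visiting.add(node)
        found = any(visit(nxt) for nxt in edges.get(node, ()))
        visiting.discard(node)
        done.add(node)
        return found

    return any(visit(node) for node in list(edges))


if __name__ == "__main__":
    before = {
        "ProcessGroupNCCL::collective": ["hip_free_mutex", "mutex"],
        "emptyCache": ["mutex", "hip_free_mutex"],
    }
    # With the NCCL2 group API, collective no longer takes the free mutex.
    after = {
        "ProcessGroupNCCL::collective": ["mutex"],
        "emptyCache": ["mutex", "hip_free_mutex"],
    }
    print(has_lock_order_cycle(before), has_lock_order_cycle(after))
```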
Will Feng	8cde4c4d22	Remove Variable::Impl and DifferentiableViewImpl (#17072 ) Summary: As part of the Variable/Tensor merge work: https://github.com/pytorch/pytorch/issues/13638, we make the following changes in this PR: 1. Remove the `Variable::Impl` class and the `DifferentiableViewImpl` class 2. Change all `Variable.data()` call sites to either use `Variable` directly, or use `Variable.tensor_data()` 3. Remove `Variable.data()` API 4. Add `Variable.variable_data()` that matches `tensor.data` in Python API, which creates a new `Variable` that shares the same storage and tensor metadata with the original `Variable`, but with a completely new autograd history. After this PR, Variable doesn't wrap a Tensor internally anymore, and both Variable and Tensor use the same TensorImpl class as its `impl_`. The only difference is that Variable always has AutogradMeta in its TensorImpl, but Tensor doesn't. Note that this PR is BC-breaking in the following use cases: Use Case 1: Previously, `x.data = y` works even if `x` and `y` are of different TensorImpl type (e.g. `x` is a CPU dense tensor whose impl is of type TensorImpl, while `y` is a CPU sparse tensor whose impl is of type SparseTensorImpl). However, after this PR, `x.data = y` doesn't work anymore if `x` and `y` are of different TensorImpl type, because the underlying implementation `variable.set_data(tensor)` no longer works if `variable` and `tensor` have different TensorImpl type. Use Case 2: If a tensor `x`'s `grad` is sparse, accumulating dense gradients to `x` will change the tensor that `x.grad` is pointing to. 
This is better illustrated with the following example: ```python params = torch.tensor([1.5, 1.5]).requires_grad_() with torch.no_grad(): # Change gradient to a sparse tensor params.grad = torch.sparse_coo_tensor(torch.tensor([[1, 1]]).long(), torch.tensor([1., 1.])) grad_saved = params.grad params.backward(torch.tensor([1.5, 1.5])) assert id(grad_saved) == id(params.grad) # This will fail after this PR ``` The assertion in the last line will fail after this PR, because adding dense gradients to sparse gradients will change the `params.grad` tensor reference. Pull Request resolved: https://github.com/pytorch/pytorch/pull/17072 Differential Revision: D14075257 Pulled By: yf225 fbshipit-source-id: 0e681df641270dea586042dd26db59f2e76b5957	2019-05-23 21:09:04 -07:00
Roy Li	65b00aa597	Remove some simple use cases of Type::ScalarType() Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17529 Reviewed By: ezyang Differential Revision: D14237932 fbshipit-source-id: be633a1fc19215d53cfe083fdd7196acf2b7dd2f	2019-03-08 16:42:05 -08:00
Mikhail Zolotukhin	1905bbb01d	Include ATen/core/functional.h directly instead of torch/csrc/utils/functional.h. (#16377 ) Summary: One more shim removed. Pull Request resolved: https://github.com/pytorch/pytorch/pull/16377 Differential Revision: D13821816 Pulled By: ZolotukhinM fbshipit-source-id: 007f014d404de51841437db7eef28367a2f6e46b	2019-01-30 14:02:34 -08:00
Edward Yang	e936a69085	Move THCCachingAllocator to c10_cuda. (#16119 ) Summary: Some renaming and renamespacing also took place. I was originally planning not to do anything, but it turns out that it was easier to make HIPify work by using a namespace CUDACachingAllocator:: rather than THCCachingAllocator_, since :: is a word boundary but _ is not. Pull Request resolved: https://github.com/pytorch/pytorch/pull/16119 Reviewed By: smessmer Differential Revision: D13718768 fbshipit-source-id: 884a481d99027fd3e34471c020f826aa12225656	2019-01-24 12:06:56 -08:00
Edward Yang	24b50f1411	Remove unnecessary includes and headers from THCCachingAllocator, move to at::cuda:: namespace (#16117 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16117 This means I can move it to c10_cuda with minimal fuss. Reviewed By: smessmer Differential Revision: D13717836 fbshipit-source-id: a94c7dc649af64542480fc1c226b289588886c00	2019-01-24 12:06:54 -08:00
Edward Yang	2d485ffb17	Move CUDAGuard, CUDAStream and CUDAGuardImpl to c10/cuda (#14248 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14248 This diff also introduces a horrifying hack to override CUDA's DeviceGuardImpl with a HIPGuardImplMasqueradingAsCUDA, to accommodate PyTorch's current behavior of pretending CUDA is HIP when you build with ROCm enabled. Reviewed By: bddppq Differential Revision: D13145293 fbshipit-source-id: ee0e207b6fd132f0d435512957424a002d588f02	2018-12-12 11:24:26 -08:00
Edward Yang	517c7c9861	Canonicalize all includes in PyTorch. (#14849 ) Summary: Anywhere we used #include "foo.h", we now say #include <foo.h> Paths are adjusted to be rooted out of aten/src, torch/lib, or the root level directory. I modified CMakeLists.txt by hand to remove TH and THC from the include paths. I used the following script to do the canonicalization: ``` import subprocess import re import os.path files = subprocess.check_output(['git', 'ls-files']).decode('utf-8').rstrip().split('\n') for fn in files: if not any(fn.endswith(suff) for suff in ['.cu', '.cpp', '.in', '.h', '.hpp', '.cu', '.cuh', '.cc']): continue if not any(fn.startswith(pref) for pref in ["aten/", "torch/"]): continue with open(fn, 'r') as f: c = f.read() def fmt(p): return "#include <{}>".format(p) def repl(m): p = m.group(1) if p in ["dlfcn.h", "unistd.h", "nvrtc.h", "cuda.h", "cuda_runtime.h", "cstdint", "cudnn.h", "Python.h", "cusparse.h", "cuda_runtime_api.h", "cuda_fp16.h", "cublas_v2.h", "stdint.h", "curand_kernel.h"]: return fmt(p) if any(p.startswith(pref) for pref in ["torch/csrc", "c10/", "ATen/", "caffe2/", "TH/", "THC/", "Eigen/", "gtest/", "zdl/", "gloo/", "onnx/", "miopen/"]): return fmt(p) for root in ["aten/src", "torch/lib", ""]: for bad_root in [os.path.dirname(fn), "aten/src/TH", "aten/src/THC", "torch/csrc"]: new_p = os.path.relpath(os.path.join(bad_root, p), root) if not new_p.startswith("../") and (os.path.exists(os.path.join(root, new_p)) or os.path.exists(os.path.join(root, new_p + ".in"))): return fmt(new_p) print("ERROR: ", fn, p) return m.group(0) new_c = re.sub(r'#include "([^"]+)"', repl, c) if new_c != c: print(fn) with open(fn, 'w') as f: f.write(new_c) ``` Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/14849 Reviewed By: dzhulgakov Differential Revision: D13363445 Pulled By: ezyang fbshipit-source-id: 52361f878a672785f9306c9e9ab2513128092b68	2018-12-08 19:38:30 -08:00
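The heart of the canonicalization script in the commit above is a single regular-expression substitution. The sketch below isolates that step on an in-memory string. It deliberately omits the path re-rooting and the file traversal that the original script performs, and the function name `canonicalize_includes` is hypothetical.

```python
import re

# Same pattern as the script in the commit message: a quoted include path.
INCLUDE_RE = re.compile(r'#include "([^"]+)"')


def canonicalize_includes(source: str) -> str:
    """Rewrite `#include "foo.h"` as `#include <foo.h>` without touching paths."""
    return INCLUDE_RE.sub(lambda m: "#include <{}>".format(m.group(1)), source)


if __name__ == "__main__":
    sample = '#include "torch/csrc/cuda/nccl.h"\n#include <vector>\n'
    print(canonicalize_includes(sample))
```

Includes that already use angle brackets do not match the pattern and pass through unchanged.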
Edward Yang	e35418b3be	New implementations of DeviceGuard, StreamGuard and MultiStreamGuard (with CUDA specializations) (#13342 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13342 This PR introduces a few new concepts: - DeviceGuardImplInterface, and implementations for CPU and CUDA, which provide a generic interface for interfacing with device and stream state, without requiring a direct dependency on the code in question. - InlineDeviceGuard, a general template for generating both specialized and dynamically dispatched device guard implementations. Dynamic dispatch is done by specializing it on a VirtualGuardImpl. - Provide a device-independent DeviceGuard class, which can be used even from CPU code. It uses the aforementioned dynamic dispatch. - CUDA-specialized CUDAGuard class, which doesn't have a dynamic dispatch but can only be used from CUDA. - StreamGuard, which is the same as above, but for streams rather than devices. - Optional variants of all the aforementioned guards, which are a no-op if no device/stream is specified - CUDAMultiStreamGuard, specifically for the case when we want to set a device on every guard. There are some subtle semantic changes, which have been thoroughly documented in the class definition. BC-breaking changes: - Move constructor/assignment have been removed from all device guard implementations. - In some cases where you previously wrote 'set_device' (or 'set_stream'), you now must write 'reset_device', because if you switch devices/device types, the stream/device on the previous device is unset. This is different from previous behavior. - CUDAGuard no longer handles streams, or multiple streams. Use CUDAStreamGuard or CUDAMultiStreamGuard as appropriate for your use case. Reviewed By: dzhulgakov Differential Revision: D12849620 fbshipit-source-id: f61956256f0b12be754b3234fcc73c2abc1be04e	2018-11-11 12:11:10 -08:00
Edward Yang	0aaff5eaf9	Replace CUDA-specific set_index(_from) method from DeviceGuard with set_device. (#13275 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13275 This resulted in a bunch of knock-on changes, which I will now describe: - s/original_index/original_device/ - s/last_index/last_device/ - A bunch of places that used set_index now use CUDAGuard (which does have set_index) because they were CUDA-specific code. Major caveat: DeviceGuard doesn't actually work on non-CUDA/CPU devices. To make that happen, I plan on totally replacing the implementation of DeviceGuard; what I mostly care about here is wrangling the API into an acceptable state. Reviewed By: gchanan Differential Revision: D12832080 fbshipit-source-id: 7de068c7cec35663dc8a533026a626331336e61d	2018-10-31 07:55:13 -07:00
Edward Yang	1a4473bbd7	Rewrite THPUtils_PySequence_to_CUDAStreamList to return vector<optional<CUDAStream>> (#13125 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13125 Previously, it returned a vector of THCStream*, which we eventually turned into CUDAStream. No need to spatter the conversion code everywhere: just do it correctly to begin with. An important side effect of doing it this way is that we no longer pass nullptr to CUDAStream; instead, we create the default stream. I will rely on this in a later patch. Reviewed By: gchanan Differential Revision: D10853224 fbshipit-source-id: f6bd6594eba4626eb41a4a5e67fc64c9bbb46a1a	2018-10-29 08:27:23 -07:00
Teng Li	b4d0dc77be	Eliminate CUDAStream nullptr in NCCL (#13089 ) Summary: As the title says, we should always use the current stream on device in NCCL. This can unblock ezyang on his further work Pull Request resolved: https://github.com/pytorch/pytorch/pull/13089 Reviewed By: ezyang Differential Revision: D10847172 Pulled By: teng-li fbshipit-source-id: 7fc7c4248b5efa1971d2af4d43f62d3379debfe4	2018-10-24 20:04:41 -07:00
Teng Li	8d3e7e2fcb	Move DDP queue_reduction to C++ (#12852 ) Summary: Fully working version, continuing on goldsborough's initial version. Waiting on the stream guard to be merged before adding more stream perf logic into the C++ version. Pull Request resolved: https://github.com/pytorch/pytorch/pull/12852 Differential Revision: D10468696 Pulled By: teng-li fbshipit-source-id: 8e46d408796973817abfd9dbd6566e0ca5b7a13f	2018-10-22 16:07:46 -07:00
Edward Yang	1d399a80a0	Handle pollution of MAX, MIN and CHECK macros. (#11805 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11805 Some of our headers in Caffe2 pollute the macro namespace with things like MAX, MIN, CHECK, so I renamed these in places where this is a problem. This patch courtesy of gchanan, extracted out of #11721 Reviewed By: Yangqing Differential Revision: D9917757 fbshipit-source-id: 17fc692ca04b208dcb8ae00731ed60e393284f7c	2018-09-18 13:18:31 -07:00
Peter Goldsborough	7ddc6f84c4	NULL -> nullptr (#11047 ) Summary: How did we get so many uses of `NULL` again? ezyang Pull Request resolved: https://github.com/pytorch/pytorch/pull/11047 Differential Revision: D9566799 Pulled By: goldsborough fbshipit-source-id: 83469f352ac69aa65bdaf1a1a21f922d892e0db3	2018-08-30 16:25:42 -07:00
Peter Goldsborough	372d1d6735	Create ATen tensors via TensorOptions (#7869 ) * Created TensorOptions Storing the type in TensorOptions to solve the Variable problem Created convenience creation functions for TensorOptions and added tests Converted zeros to TensorOptions Converted rand to TensorOptions Fix codegen for TensorOptions and multiple arguments Put TensorOptions convenience functions into torch namespace too All factory functions except _like support TensorOptions Integrated with recent JIT changes Support _like functions Fix in place modification Some cleanups and fixes Support sparse_coo_tensor Fix bug in Type.cpp Fix .empty calls in C++ API Fix bug in Type.cpp Trying to fix device placement Make AutoGPU CPU compatible Remove some auto_gpu.h uses Fixing some headers Fix some remaining CUDA/AutoGPU issues Fix some AutoGPU uses Fixes to dispatch_tensor_conversion Reset version of new variables to zero Implemented parsing device strings Random fixes to tests Self review cleanups flake8 Undo changes to variable.{h,cpp} because they fail on gcc7.2 Add [cuda] tag to tensor_options_cuda.cpp Move AutoGPU::set_index_from into .cpp file because Windows is stupid and sucks Fix linker error in AutoGPU.cpp Fix bad merge conflict in native_functions.yaml Fixed caffe2/contrib/aten Fix new window functions added to TensorFactories.cpp * Removed torch::TensorOptions Added code to generate wrapper functions for factory methods Add implicit constructor from Backend to TensorOptions Remove Var() from C++ API and use torch:: functions Use torch:: functions more subtly in C++ API Make AutoGPU::set_device more exception safe Check status directly in DynamicCUDAHooksInterface Rename AutoGPU to DeviceGuard Removed set_requires_grad from python_variables.h and warn appropriately in Variable::set_requires_grad remove python_default_init: self.type() Add back original factory functions, but with deprecation warnings Disable DeviceGuard for a couple functions in ATen Remove print 
statement Fix DeviceGuard construction from undefined tensor Fixing CUDA device compiler issues Moved as many methods as possible into header files Dont generate python functions for deprecated factories Remove merge conflict artefact Fix tensor_options_cuda.cpp Fix set_requires_grad not being checked Fix tensor_new.h TEMPORARILY put some methods in .cpp files to see if it solves issues on windows and mac Fix bug in DeviceGuard.h Missing includes TEMPORARILY moving a few more methods into .cpp to see if it fixes windows Fixing linker errors * Fix up SummaryOps to use new factories Undo device agnostic behavior of DeviceGuard Use -1 instead of optional for default device index Also move DeviceGuard methods into header Fixes around device index after optional -> int32_t switch Fix use of DeviceGuard in new_with_tensor_copy Fix tensor_options.cpp * Fix Type::copy( * Remove test_non_float_params from ONNX tests * Set requires_grad=False in ONNX tests that use ints * Put layout/dtype/device on Tensor * Post merge fixes * Change behavior of DeviceGuard to match AutoGPU * Fix C++ API integration tests * Fix flip functions	2018-06-16 00:40:35 -07:00
Xiaodong Wang	922adf8d09	Skip calling ncclCommDestroy in destructor (#8352 ) There is a bug in NCCL that causes seg faults while calling ncclCommDestroy() in the destructor during program exit. According to Nvidia, "Whether the NCCL destructor will be called before or after the CUDA runtime destructor is undefined, which can lead to crashes." For the immediate workaround, skip calling ncclCommDestroy in the NCCL destructor. This is UGLY and we'll follow up with Nvidia to solve this ASAP.	2018-06-12 13:11:09 -04:00
Edward Z. Yang	4caea64d72	Make all of TH and THC C++. (#6913 ) Changelist: - Move .c to .cpp - Change includes of ".c" to ".cpp" - A bunch of cmake configuration modifying CMAKE_C_FLAGS changed to CMAKE_CXX_FLAGS or add_compile_options, because if you do CMAKE_C_FLAGS it only applies when you compile C code - Explicitly cast void* to T* in a number of places - Delete extern "C" { ... } blocks; instead, properly apply TH_API to everything that should have it (TH_API handles extern "C") - Stop using stdatomic.h, instead, use <atomic>. This resulted in a bunch of placement-new/delete to be "totally properly correct" - Refactor of THLongStorageView to not have static constructor methods (since it no longer has a copy/move constructor) - Documentation about how the TH C interface (and extern C business) works - Note that THD master_worker mode is dead - C++ headers in TH libraries are given .hpp suffix, to make it less likely that you'll confuse them with the C-compatible headers (now suffixed .h) - New function THCStream_stream and THCStream_device to project out fields of THCStream instead of accessing fields directly - New function THStorage_(retainIfLive), which is equivalent to a retain but only if the refcount is greater than zero. - In general, I tried to avoid using hpp headers outside of ATen/TH. However, there were a few places where I gave up and depended on the headers for my own sanity. See Note [TH abstraction violation] for all the sites where this occurred. All other sites were refactored to use functions - Some extra Werror fixes (char* versus const char*)	2018-04-28 07:45:02 -04:00
Sam Gross	48a3349c29	Delete dead Tensor code paths (#5417 ) This deletes most of the dead Tensor code paths, including the TensorMethods cwrap and generic/Tensor.cpp. This also moves the THNN.cwrap/.cpp generation to generate_code which can use ninja if installed.	2018-02-27 17:58:09 -05:00
Sam Gross	93f49667d0	Allow Variables in calls to NCCL bindings. (#4725 ) The Tensor and Variable classes are being merged in Python. This means that all interfaces to C++ must accept Variables where they previously accepted Tensors.	2018-01-18 15:25:41 -05:00
Sam Gross	23fc2b7e06	Define CHECK in torch/csrc/cuda/nccl.h (#4721 ) The CHECK function was used but not defined in the nccl.h header file.	2018-01-18 13:08:06 -05:00


52 Commits