pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-06 12:20:52 +01:00

Author	SHA1	Message	Date
cyy	c31fcdaa4f	[3/N] Add -Wdeprecated and related fixes (#109698 ) This PR follows #108626. Hopefully we can enable the warning in the next PR. Pull Request resolved: https://github.com/pytorch/pytorch/pull/109698 Approved by: https://github.com/Skylion007, https://github.com/ezyang	2023-10-03 22:50:53 +00:00
Aaron Gokaslan	a34a9c3471	Perf: Apply more clang-tidy fixups to torch headers (#91445 ) Applies some more fixes to headers that may have been missed before for performance optimization. cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @EikanWang @ezyang since this is more in the series of the clang-tidy fixup. This PR fixes 3 main issues: 1. Use emplacement more in headers 2. Avoid unnecessary copies and use const ref when possible 3. Default any special functions when possible to make them potentially trivial and more readable. There is also one change in this PR that tries to prevent unnecessary math promotion; the rest of these changes are in another PR. Pull Request resolved: https://github.com/pytorch/pytorch/pull/91445 Approved by: https://github.com/ezyang	2022-12-29 23:43:45 +00:00
Richard Barnes	72e4aab74b	Eliminate unused parameters in PyTorch (#73749 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/73749 Unused parameters cause compiler warnings which distract from real issues. Let's remove unused parameters! Test Plan: Sandcastle Reviewed By: swolchok, ngimel Differential Revision: D34567731 fbshipit-source-id: 2e42301a29a8e1014ac8ab429588bb773db58850 (cherry picked from commit 3eda4743991328d532194efd0fe3d127a294343d)	2022-03-04 02:31:37 +00:00
Luca Wehrstedt	e7cccc23b9	Add query and synchronize to c10::Stream (#59560 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59560 `at::cuda::CUDAStream` has the `query` and `synchronize` methods, but `c10::Stream` does not, and I couldn't find any generic way to accomplish this. Hence I added helpers to do this to the DeviceGuardImpl interface, and then defined these methods on `c10::Stream`. (I had to do it out-of-line to circumvent a circular dependency). ghstack-source-id: 130932249 Test Plan: CI Reviewed By: ezyang Differential Revision: D28931377 fbshipit-source-id: cd0c19cf021e305d0c0cf9af364afb445d010248	2021-06-10 01:42:40 -07:00
Luca Wehrstedt	0c3e79b5b9	Rename DeviceGuardImplInterface's getStreamFromPool method (#57345 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57345 Already back in https://github.com/pytorch/pytorch/pull/57046 we realized that calling this method `getStreamFromPool` could cause issues because that name gets HIPified and thus in some callsites we'd end up calling a method that doesn't exist. In the end we got away with it because the places where we were calling that method weren't HIPified. However, in the next PR we'll use this method inside RPC, and that will start causing problems, hence here I rename it to something that should not cause conflicts. This is a private API (since it's inside `impl`), thus there are no backwards compatibility concerns. ghstack-source-id: 127916484 Test Plan: CI Reviewed By: mrshenli Differential Revision: D28114923 fbshipit-source-id: e027ad08a8e02090c08c6407c2db5a7fde104812	2021-05-01 16:12:53 -07:00
Scott Wolchok	44cc873fba	[PyTorch] Autoformat c10 (#56830 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/56830 Opt into formatting on GitHub and format everything. This is a trial run before turning on formatting for more and eventually all of the codebase. Test Plan: CI Reviewed By: zertosh Differential Revision: D27979080 fbshipit-source-id: a80f0c48691c08ae8ca0af06377b87e6a2351151	2021-04-30 21:23:28 -07:00
Luca Wehrstedt	ea64c90ecc	Add recordDataPtrOnStream to DeviceGuardImplInterface (#57047 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57047 We intend to merge CUDAFuture into ivalue::Future by using DeviceGuardImplInterface to avoid explicitly referring to CUDA. For that we need to add two methods to DeviceGuardImplInterface. In this PR, we add a method to record a DataPtr onto a stream with the caching allocator. ghstack-source-id: 127713135 (Note: this ignores all push blocking failures!) Test Plan: Used later in this stack Reviewed By: ezyang Differential Revision: D28029161 fbshipit-source-id: ff337ab8ccc98437b5594b2f263476baa1ae93e7	2021-04-29 09:31:43 -07:00
Luca Wehrstedt	6fdf092cad	Add getStreamFromPool to DeviceGuardImplInterface (#57046 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57046 We intend to merge CUDAFuture into ivalue::Future by using DeviceGuardImplInterface to avoid explicitly referring to CUDA. For that we need to add two methods to DeviceGuardImplInterface. In this PR, we add a method to get a stream from the global ATen pool. ghstack-source-id: 127713137 (Note: this ignores all push blocking failures!) Test Plan: Used later in this stack Reviewed By: ezyang Differential Revision: D28029159 fbshipit-source-id: 5055d84c1f3c2a4d86442f3149455c5ebd976dea	2021-04-29 09:30:41 -07:00
Edward Yang	fd3004d3ee	Add NoOpDeviceGuardImpl (#53142 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/53142 It turns out to make Meta a device I need to substantively reuse the CPUGuardImpl implementation. It's pretty parametrizable so just move this over to DeviceGuardImplInterface templated over the DeviceType. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Test Plan: Imported from OSS Reviewed By: anjali411, samestep Differential Revision: D26763553 Pulled By: ezyang fbshipit-source-id: 464fb3e3a72ba7c55a12adffe01c18171ce3e857	2021-03-03 11:24:08 -08:00
Samuel Marks	8aad66a7bd	[c10/**] Fix typos (#49815 ) Summary: All pretty minor. I avoided renaming `class DestructableMock` to `class DestructibleMock` and similar such symbol renames (in this PR). Pull Request resolved: https://github.com/pytorch/pytorch/pull/49815 Reviewed By: VitalyFedyunin Differential Revision: D25734507 Pulled By: mruberry fbshipit-source-id: bbe8874a99d047e9d9814bf92ea8c036a5c6a3fd	2021-01-01 02:11:56 -08:00
Scott Wolchok	4c9eb57914	[PyTorch] Narrow Device to 2 bytes by narrowing DeviceType and DeviceIndex (#47023 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47023 DeviceType pretty clearly only needs 1 byte. DeviceIndex only needs 1 byte given that machines don't have anywhere near 255 GPUs in them as far as I know. ghstack-source-id: 116901430 Test Plan: Existing tests, added assertion to catch if my assumption about DeviceIndex is incorrect Reviewed By: dzhulgakov Differential Revision: D24605460 fbshipit-source-id: 7c9a89027fcf8eebd623b7cdbf6302162c981cd2	2020-11-18 19:39:40 -08:00
Xiaodong Wang	2fbe5971b3	[pytorch/cuda] apply 16-bit mask to the index for device guard registry (#45485 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45485 Essentially this is the problem reported by ezyang: https://fb.workplace.com/groups/llvm.gcc/permalink/4053565044692080. There are two proposed fixes: * https://github.com/pytorch/pytorch/pull/44883: this doesn't work because it fails some static assert at compile time ``` caffe2/c10/core/TensorOptions.h:553:1: error: static_assert failed due to requirement 'sizeof(c10::TensorOptions) <= sizeof(long) * 2' "TensorOptions must fit in 128-bits" static_assert( sizeof(TensorOptions) <= sizeof(int64_t) * 2, ^ ``` * https://github.com/pytorch/pytorch/pull/44885: to be tested This diff is a temporary hack to work around the problem. Without this patch: ``` volatile size_t device_type = static_cast<size_t>(type); auto p = device_guard_impl_registry[device_type].load(); C10_LOG_FIRST_N(WARNING, 10) << "XDW-fail: " << cntr << ", Device type: " << type << ", type cast: " << device_type << ", guard: " << p; // output XDW-fail: 1129, Device type: cuda, type cast: 65537, guard: 0 ``` Another workaround is D23788441, which changes -O3 to -O2. So this seems to be a miscompilation by nvcc or the host compiler. Reviewed By: ezyang Differential Revision: D23972356 fbshipit-source-id: ab91fbbfccb6389052de216f95cf9a8265445aea	2020-10-05 22:37:47 -07:00
Edward Yang	b141754b7f	Give a better error message when people accidentally use unsupported devices (#29409 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29409 Fixes #27875 Signed-off-by: Edward Z. Yang <ezyang@fb.com> Test Plan: Imported from OSS Differential Revision: D18396828 Pulled By: ezyang fbshipit-source-id: 3f53cbbe620cd3445852273be90ff5744aa7a8cb	2019-11-11 08:10:53 -08:00
Mike Ruberry	87a2c92615	Updates autograd engine to respect streams set in forward (#8354 ) Summary: This PR addresses issue https://github.com/pytorch/pytorch/issues/7601. Currently models that use streams explicitly in forward have to do a lot of extra work to make backwards respect those streams. This PR extends the (recently added) input tracing (see TypeAndShape) to record the devices and streams of inputs. The autograd engine then uses this metadata to enact the expected stream parallelism without extra work from the user. For example, a model with forward declared like (original example courtesy of ngimel): ``` def forward(self,x): x0 = x.clone() torch._C._cuda_setStream(self.stream1._cdata) y0 = self.fc1(x0) self.event1.record(stream = torch.cuda.current_stream()) torch._C._cuda_setStream(self.stream2._cdata) y1 = self.fc2(x) self.event2.record(stream = torch.cuda.current_stream()) self.stream2.wait_event(self.event1) return y0 + y1 ``` currently will backward on a single stream. With this change the kernels will go on the streams they are assigned in forward and both forward and backward will (for appropriate sizes) run the fc1 and fc2 kernels simultaneously. The crux of this change is, as mentioned, an expansion of the TypeAndShape tracing and a relatively simple change to the autograd engine to use cuda events for stream synchronization. To make this efficient I also added a new AutoGPUAndStream class, exposed getting and setting streams on devices, and removed InputBuffer's AutoGPU (it's now redundant). While making these modifications I also fixed AutoGPU to check before setting the GPU when it's destroyed and to use THCudaCheck instead of its custom error handler. These changes mean that an often excessive cudaSetDevice() is not being called when inputs are added to a buffer. 
In addition to allowing users to easily set and use streams that are respected in both forward and backward, this change may encourage modules to do the same and the expanded tracing might allow further optimizations in the autograd engine. (apaszke, for example, now after initial enumeration we know the number of devices that will be used by a graph task, which might help provide a sense of the "level of parallelism" we should expect.) Pull Request resolved: https://github.com/pytorch/pytorch/pull/8354 Test Plan: Two tests were added specifically for this behavior. Differential Revision: D17275980 Pulled By: mruberry fbshipit-source-id: 92bd50ac782ffa973b159fcbbadb7a083802e45d	2019-09-10 23:46:51 -07:00
Mike Ruberry	a024e1e091	Creates Torch-friendly Event class and adds Stream tracking to autograd (#25130 ) Summary: Resubmission of https://github.com/pytorch/pytorch/issues/23424 because previous PR was borked. Pull Request resolved: https://github.com/pytorch/pytorch/pull/25130 Test Plan: Two tests were added to cuda_stream_test for this functionality. Differential Revision: D17145538 Pulled By: mruberry fbshipit-source-id: 2546c5907c038412e03aa0d3328a972b0164c455	2019-09-01 12:37:52 -07:00
Edward Yang	529bb859b2	Revert D17052534: [pytorch][PR] Creates Torch-friendly Event class and adds Stream tracking to autograd Test Plan: revert-hammer Differential Revision: D17052534 Original commit changeset: d91b308ad0f7 fbshipit-source-id: dacc7e70a835a8fa6ae71246999b4eff3383f3f3	2019-08-28 08:24:43 -07:00
Mike Ruberry	433fe47d95	Creates Torch-friendly Event class and adds Stream tracking to autograd (#25130 ) Summary: Resubmission of https://github.com/pytorch/pytorch/issues/23424 because previous PR was borked. Pull Request resolved: https://github.com/pytorch/pytorch/pull/25130 Differential Revision: D17052534 Pulled By: mruberry fbshipit-source-id: d91b308ad0f730646bb7b3492a601cd9b05c72d8	2019-08-26 15:19:06 -07:00
Edward Yang	515238e0a5	Unify cudaGetDeviceCount implementations. (#18445 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18445 ghimport-source-id: 30d018737bf6989bc68b7e3676f44e0ca6141fde Stack from [ghstack](https://github.com/ezyang/ghstack): * #18242 Test running a CUDA build on CPU machine. * #18445 Unify cudaGetDeviceCount implementations. I went about doing this by searching for calls to cudaGetDeviceCount, and then methodically replacing them with references to c10::cuda::device_count() or at::cuda::device_count(). There is a point to doing this: the various implementations wildly differed in their handling of what to do when cudaGetDeviceCount returns an error. The final standardized behavior is that all errors are swallowed and we return device count of zero. This indirectly fixes running CUDA builds on CPU, which was broken in #17847. I added 'noexcept' to the 'deviceCount' virtual method on DeviceGuardImpl. This is a BC-breaking change for anyone inheriting from DeviceGuardImpl but all you need to do is put 'noexcept' on your method and it is backwards compatible with older libtorch. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Differential Revision: D14612189 fbshipit-source-id: 3c8d186e3dd623c0e27625212c7ce30f75d943cb	2019-03-26 09:50:14 -07:00
Davide Libenzi	272a48f6fe	Enable autograd to recognize the XLA backend as one providing multiple devices (#17847 ) Summary: …e devices, while not being CUDA/HIP. Pull Request resolved: https://github.com/pytorch/pytorch/pull/17847 Differential Revision: D14545634 Pulled By: ezyang fbshipit-source-id: 417181bf2ff4f8978544afe2fb6b042e787854ed	2019-03-20 13:58:36 -07:00
Sebastian Messmer	d408324350	Move files to/from c10/core and c10/util (#15316 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15316 This starts cleaning up the files in c10 according to the module structure we decided on. Move to c10/util: - Half.h, Half-inl.h, Half.cpp, bitcasts.h Move to c10/core: - Device.h, Device.cpp - DeviceType.h, DeviceType.cpp i-am-not-moving-c2-to-c10 Reviewed By: dzhulgakov Differential Revision: D13498493 fbshipit-source-id: dfcf1c490474a12ab950c72ca686b8ad86428f63	2019-01-10 16:22:22 -08:00
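
Note on commit 2fbe5971b3 (16-bit mask for the device guard registry index): the log output in that commit message shows a device type of `cuda` being cast to the index 65537 instead of a small enum value. The sketch below is only a Python model of the masking idea named in the commit title, not the actual C++ change. The names `REGISTRY_INDEX_MASK` and `registry_index` are hypothetical and chosen for illustration; the value 65537 is taken from the commit message.

```python
# Illustrative model only: the real fix lives in C++ inside c10.
# REGISTRY_INDEX_MASK and registry_index are hypothetical names.

REGISTRY_INDEX_MASK = 0xFFFF  # keep only the low 16 bits of the index


def registry_index(raw_device_type: int) -> int:
    """Map a possibly corrupted device-type value to a registry slot.

    Any stray bits above bit 15 are discarded by the mask, so a value
    whose low 16 bits are intact still selects the intended slot.
    """
    return raw_device_type & REGISTRY_INDEX_MASK


if __name__ == "__main__":
    corrupted = 65537  # "type cast" value reported in the commit's log output
    print(hex(corrupted), "->", registry_index(corrupted))
```

The mask hides the symptom rather than its cause, which matches the commit's own description of the change as a temporary workaround for a suspected miscompilation.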

20 Commits