pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
cyy	fb10e13000	[Clang-tidy header][24/N] Fix clang-tidy warnings on c10/cuda/*.{cpp,h} (#120781 ) This PR begins to clean clang-tidy warnings of code in c10/cuda. Pull Request resolved: https://github.com/pytorch/pytorch/pull/120781 Approved by: https://github.com/ezyang	2024-03-15 05:03:22 +00:00
Nikita Shulga	53e32d12c4	[c10] Use nested namespace in c10/cuda (#116464 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/116464 Approved by: https://github.com/Skylion007	2023-12-27 23:14:00 +00:00
cyy	f9bf104c64	[2/N] Fixes clang-tidy warnings in header files (#113727 ) This PR fixes more clang-tidy warnings in common headers. Pull Request resolved: https://github.com/pytorch/pytorch/pull/113727 Approved by: https://github.com/Skylion007	2023-11-16 13:21:15 +00:00
Jeff Daily	50e3f9cbbb	[ROCm] HIP stream priority fix post #101956 (#106157 ) PR #101956 introduced additional stream priorities for cuda streams. HIP streams have slightly different semantics. - HIP: 1=low, 0=default, -1=high - CUDA: 0=default, -1=high, -2=higher, etc. This PR forces HIP stream priority to just 0 and -1 to match the PyTorch semantics. This fixes a broken unit test. ``` python3 test_cuda_multigpu.py TestCudaMultiGPU.test_streams_priority -v Test results will be stored in test-reports/python-unittest/test_cuda_multigpu Running tests... ---------------------------------------------------------------------- test_streams_priority (__main__.TestCudaMultiGPU) ... ERROR (0.200s) ====================================================================== ERROR [0.200s]: test_streams_priority (__main__.TestCudaMultiGPU) ---------------------------------------------------------------------- Traceback (most recent call last): File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/testing/_internal/common_utils.py", line 2354, in wrapper method(*args, **kwargs) File "test_cuda_multigpu.py", line 656, in test_streams_priority low, high = torch.cuda.Stream.priority_range() RuntimeError: least_priority == 0 INTERNAL ASSERT FAILED at "/var/lib/jenkins/pytorch-upstream/c10/hip/HIPStream.h":184, please report a bug to PyTorch. Unexpected HIP stream priority range ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/106157 Approved by: https://github.com/malfet	2023-07-31 16:57:20 +00:00
Natalia Gimelshein	ecd79b1fef	add additional stream priority for cuda streams (#101956 ) Changes the StreamID encoding to use the last bit to distinguish between external and internal streams, 4 bits for IdType (DEFAULT, EXT or user-created streams possibly with high priority), and 5 bits for index. This allows us to have more stream priorities exposed to the user (I'm currently setting 4, but that's easy to change now). Note, we are pre-creating all 32 streams in the pool for each allowed priority; I don't know if it's a problem in practice. Currently cuda 11.8/A100 GPUs allow 6 different stream priorities; the number may be different for different cards/different cuda versions. Previous callsites explicitly requesting a high priority stream (`isHighPriority=true`) are now getting the highest priority stream. Pull Request resolved: https://github.com/pytorch/pytorch/pull/101956 Approved by: https://github.com/ezyang	2023-05-27 02:36:16 +00:00
PyTorch MergeBot	6c9b94dcda	Revert "add additional stream priority for cuda streams (#101956 )" This reverts commit `5da497cabb`. Reverted https://github.com/pytorch/pytorch/pull/101956 on behalf of https://github.com/osalpekar due to Broke internal builds that used -Wunused-function since this PR removed the call to StreamIdType::<< ([comment](https://github.com/pytorch/pytorch/pull/101956#issuecomment-1563875493))	2023-05-26 06:35:23 +00:00
Natalia Gimelshein	5da497cabb	add additional stream priority for cuda streams (#101956 ) Changes the StreamID encoding to use the last bit to distinguish between external and internal streams, 4 bits for IdType (DEFAULT, EXT or user-created streams possibly with high priority), and 5 bits for index. This allows us to have more stream priorities exposed to the user (I'm currently setting 4, but that's easy to change now). Note, we are pre-creating all 32 streams in the pool for each allowed priority; I don't know if it's a problem in practice. Currently cuda 11.8/A100 GPUs allow 6 different stream priorities; the number may be different for different cards/different cuda versions. Previous callsites explicitly requesting a high priority stream (`isHighPriority=true`) are now getting the highest priority stream. Pull Request resolved: https://github.com/pytorch/pytorch/pull/101956 Approved by: https://github.com/ezyang	2023-05-24 23:26:47 +00:00
cyy	3c6bc58f63	use C10_API in libc10.so (#94171 ) MSVC emits several C4273 warnings when compiling c10. I think the offending files should use C10_API instead of TORCH_API. If the tests pass, the changes should be safe. Pull Request resolved: https://github.com/pytorch/pytorch/pull/94171 Approved by: https://github.com/Skylion007	2023-02-06 20:16:22 +00:00
cyy	bfe5e1258b	avoid unnecessary static_cast (#93898 ) avoid unnecessary static_cast Pull Request resolved: https://github.com/pytorch/pytorch/pull/93898 Approved by: https://github.com/Skylion007	2023-02-03 03:44:43 +00:00
Eddie Yan	e096d2db5a	[BC-Breaking] Separate `stream_id`, `device_index`, and `device_type` in `pack` and `unpack` for `Streams` (#81596 ) #75854 A naive attempt at working around the limitations of using a single 64-bit integer to pack `stream_id`, `device_index`, and `device_type`. Still needs sanity checks, testing, and minimization of BC-breaking changes. Currently a Holder for the `StreamData3` struct is used for `IValue` compatibility. While doing this seems to work for `ivalue.h` and `ivalue_inl.h`, this doesn't seem to be naively working for the JIT CUDA stream wrapper? (Something about ambiguous calls if an `intrusive_ptr` to `c10::ivalue::StreamData3Holder` is used as the return type for `pack()`.) It turns out that the methods required to access the fields for rematerializing a CUDA Stream are basically already present anyway, so `pack` is simply removed in the wrapper for now and the methods to access the required fields are called directly. CC @ptrblck Pull Request resolved: https://github.com/pytorch/pytorch/pull/81596 Approved by: https://github.com/ezyang	2023-01-12 14:16:49 +00:00
Richard Barnes	2793cf85ec	Check all CUDA API calls for errors in caffe2/c10/ (#74918 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/74918 Test Plan: Sandcastle Reviewed By: ngimel Differential Revision: D35194795 fbshipit-source-id: 8490e5497c37bab0055925ed520c2fd0c37a554c (cherry picked from commit 52697ab670e2f53c580cfd4ca82c5468ed3bb06c)	2022-03-30 17:13:02 +00:00
Jeff Daily	b7391f44df	cast return of cudaGetLastError() to void when discarding (#62518 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/62511. Pull Request resolved: https://github.com/pytorch/pytorch/pull/62518 Reviewed By: walterddr, janeyx99 Differential Revision: D30029858 Pulled By: malfet fbshipit-source-id: d47ce4e507ac800b4e5a5e0a8d9a6fabdfd28e6d	2021-08-03 11:17:22 -07:00
Natalia Gimelshein	6284d2a82b	wrap cudaStreamSynchronize calls (#61889 ) Summary: This is a first step towards creating context manager that errors out on synchronizing calls. Pull Request resolved: https://github.com/pytorch/pytorch/pull/61889 Reviewed By: albanD Differential Revision: D29805280 Pulled By: ngimel fbshipit-source-id: b66400fbe0941b7daa51e6b30abe27b9cccd4e8a	2021-07-21 19:30:52 -07:00
Jeff Daily	15210f3b82	ignore and clear not ready errors (#61554 ) Summary: Follow-up to https://github.com/pytorch/pytorch/issues/18584. This PR covers the remaining places where event or stream query might result in not ready errors. Pull Request resolved: https://github.com/pytorch/pytorch/pull/61554 Reviewed By: mrshenli Differential Revision: D29763973 Pulled By: ezyang fbshipit-source-id: 41d988d1826b2309cc6b01a81144094b353abdf9	2021-07-19 16:03:04 -07:00
Emilio Castillo	f9ec86a6c6	External stream (#59527 ) Summary: The previous PR is https://github.com/pytorch/pytorch/issues/57781 We now add two CUDA bindings to avoid using ctypes, which fixes a Windows issue. However, we use ctypes to allocate the stream and create its pointer (we can do this with a 0-dim tensor too if it feels better). CC. ezyang rgommers ngimel mruberry Pull Request resolved: https://github.com/pytorch/pytorch/pull/59527 Reviewed By: albanD Differential Revision: D29053062 Pulled By: ezyang fbshipit-source-id: 661e7e58de98b1bdb7a0871808cd41d91fe8f13f	2021-06-14 13:46:11 -07:00
Rong Rong (AI Infra)	689a5edd0a	Revert D28326365: [pytorch][PR] Add `torch.cuda.streams.ExternalStream` Test Plan: revert-hammer Differential Revision: D28326365 (`d7ef9b73fb`) Original commit changeset: b67858c80339 fbshipit-source-id: 337588d40b96cf04e46e554fa481ae7fd4254478	2021-06-04 11:19:36 -07:00
Emilio Castillo	d7ef9b73fb	Add `torch.cuda.streams.ExternalStream` (#57781 ) Summary: This is required in https://github.com/pytorch/pytorch/pull/57110#issuecomment-828357947 We need to provide means to synchronize on externally allocated streams for dlpack support in python array data api. cc mruberry rgommers leofang asi1024 kmaehashi Pull Request resolved: https://github.com/pytorch/pytorch/pull/57781 Reviewed By: mrshenli Differential Revision: D28326365 Pulled By: ezyang fbshipit-source-id: b67858c8033949951b49a3d319f649884dfd0a91	2021-06-04 08:47:09 -07:00
Scott Wolchok	44cc873fba	[PyTorch] Autoformat c10 (#56830 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/56830 Opt into formatting on GitHub and format everything. This is a trial run before turning on formatting for more and eventually all of the codebase. Test Plan: CI Reviewed By: zertosh Differential Revision: D27979080 fbshipit-source-id: a80f0c48691c08ae8ca0af06377b87e6a2351151	2021-04-30 21:23:28 -07:00
Samuel Marks	8aad66a7bd	[c10/**] Fix typos (#49815 ) Summary: All pretty minor. I avoided renaming `class DestructableMock` to `class DestructibleMock` and similar such symbol renames (in this PR). Pull Request resolved: https://github.com/pytorch/pytorch/pull/49815 Reviewed By: VitalyFedyunin Differential Revision: D25734507 Pulled By: mruberry fbshipit-source-id: bbe8874a99d047e9d9814bf92ea8c036a5c6a3fd	2021-01-01 02:11:56 -08:00
Jane Xu	71ca600af9	Renaming CAFFE2_API to TORCH_API (#49496 ) Summary: Since caffe2 and torch have been consolidated, CAFFE2_API should be merged with TORCH_API. Addresses a TODO. Manually edited some references of the removed `CAFFE2_API`: * `CONTRIBUTING.md` * `caffe2/proto/CMakeLists.txt` * `cmake/ProtoBuf.cmake` * `c10/macros/Export.h` * `torch/csrc/WindowsTorchApiMacro.h` Pull Request resolved: https://github.com/pytorch/pytorch/pull/49496 Reviewed By: malfet, samestep Differential Revision: D25600726 Pulled By: janeyx99 fbshipit-source-id: 7e068d959e397ac183c097d7e9a9afeca5ddd782	2020-12-18 10:54:50 -08:00
Jeff Daily	6906701bde	[ROCm] enable stream priorities (#47136 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47136 Reviewed By: mruberry Differential Revision: D24672457 Pulled By: ngimel fbshipit-source-id: 54f60c32df87cbd40fccd7fb1ecf0437905f01a3	2020-11-02 11:25:44 -08:00
Gao, Xiang	5a4911834d	Add CUDA11 build and test (#40452 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40452 Differential Revision: D22316007 Pulled By: malfet fbshipit-source-id: 94f4b4ba2a46ff3d3042ba842a615f8392cdc350	2020-06-30 13:50:44 -07:00
Edward Yang	73a97387c1	Replace AT_CHECK with TORCH_CHECK [shard 9/10] Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20435 Reviewed By: jerryzh168 Differential Revision: D15318877 fbshipit-source-id: 4d83571187ea14a604fef83ac355d328b46d93e1	2019-05-15 08:05:59 -07:00
Shen Li	24f4d3987e	Move all Stream and Event Python implementation to C++ (#15937 ) Summary: 1. Added `torch/csrc/cuda/Event.h` and `torch/csrc/cuda/Event.cpp` to bind Python Event class to C++ implementation. 2. Move all CUDA runtime invocations from `torch/cuda/streams.py` to C++ 3. Added tests to cover Stream and Event APIs. ~(event IPC handle tests is introduced in #15974)~ Pull Request resolved: https://github.com/pytorch/pytorch/pull/15937 Differential Revision: D13649001 Pulled By: mrshenli fbshipit-source-id: 84ca58f35f6ba679a4ba33150ceba678d760d240	2019-01-17 07:29:22 -08:00
Sebastian Messmer	d408324350	Move files to/from c10/core and c10/util (#15316 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15316 This starts cleaning up the files in c10 according to the module structure we decided on. Move to c10/util: - Half.h, Half-inl.h, Half.cpp, bitcasts.h Move to c10/core: - Device.h, Device.cpp - DeviceType.h, DeviceType.cpp i-am-not-moving-c2-to-c10 Reviewed By: dzhulgakov Differential Revision: D13498493 fbshipit-source-id: dfcf1c490474a12ab950c72ca686b8ad86428f63	2019-01-10 16:22:22 -08:00
Shen Li	99d2743863	Move Stream.query() implementation down to C++ (#15737 ) Summary: See #15682 Pushing up this small PR to check if I am doing the right thing. If correct, more will follow for other Stream APIs. Questions will be added inline. Pull Request resolved: https://github.com/pytorch/pytorch/pull/15737 Differential Revision: D13581400 Pulled By: mrshenli fbshipit-source-id: 24afed7847b89b62f0692c79a101ec7ff9d9ee4d	2019-01-07 20:58:07 -08:00
Edward Yang	2d485ffb17	Move CUDAGuard, CUDAStream and CUDAGuardImpl to c10/cuda (#14248 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14248 This diff also introduces a horrifying hack to override CUDA's DeviceGuardImpl with a HIPGuardImplMasqueradingAsCUDA, to accommodate PyTorch's current behavior of pretending CUDA is HIP when you build with ROCm enabled. Reviewed By: bddppq Differential Revision: D13145293 fbshipit-source-id: ee0e207b6fd132f0d435512957424a002d588f02	2018-12-12 11:24:26 -08:00

27 Commits
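
The StreamID layout described in the message for #101956 (one bit for external vs. internal, 4 bits for IdType, 5 bits for index) can be illustrated with a small bit-packing sketch. This is not the c10 implementation: the function names `pack_stream_id` and `unpack_stream_id` are hypothetical, and the exact bit positions (external flag in the lowest bit, IdType above it, index above that) are an assumption based only on the wording of the commit message.

```python
# Hypothetical sketch of the StreamID layout from the #101956 commit message.
# Assumed bit positions (not confirmed by the log above):
#   bit 0      -> external (1) vs. internal (0) stream
#   bits 1..4  -> IdType (4 bits)
#   bits 5..9  -> index into the per-priority stream pool (5 bits -> 32 streams)

EXT_BITS = 1
TYPE_BITS = 4
INDEX_BITS = 5


def pack_stream_id(index, id_type, is_external=False):
    """Pack (index, id_type, is_external) into a single integer."""
    if not 0 <= index < (1 << INDEX_BITS):
        raise ValueError("index does not fit in 5 bits")
    if not 0 <= id_type < (1 << TYPE_BITS):
        raise ValueError("id_type does not fit in 4 bits")
    return (
        (index << (TYPE_BITS + EXT_BITS))
        | (id_type << EXT_BITS)
        | int(is_external)
    )


def unpack_stream_id(stream_id):
    """Recover (index, id_type, is_external) from a packed integer."""
    is_external = bool(stream_id & 1)
    id_type = (stream_id >> EXT_BITS) & ((1 << TYPE_BITS) - 1)
    index = (stream_id >> (TYPE_BITS + EXT_BITS)) & ((1 << INDEX_BITS) - 1)
    return index, id_type, is_external


if __name__ == "__main__":
    packed = pack_stream_id(index=31, id_type=3)
    print(packed, unpack_stream_id(packed))
```

Five index bits give 32 distinct values, which matches the 32 pre-created streams per priority mentioned in the commit message; the 4-bit IdType field leaves room for more priority levels than the 4 the author says are currently set.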