pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
Jeff Daily	b7391f44df	cast return of cudaGetLastError() to void when discarding (#62518) Summary: Fixes https://github.com/pytorch/pytorch/issues/62511. Pull Request resolved: https://github.com/pytorch/pytorch/pull/62518 Reviewed By: walterddr, janeyx99 Differential Revision: D30029858 Pulled By: malfet fbshipit-source-id: d47ce4e507ac800b4e5a5e0a8d9a6fabdfd28e6d	2021-08-03 11:17:22 -07:00
Jeff Daily	15210f3b82	ignore and clear not-ready errors (#61554) Summary: Follow-up to https://github.com/pytorch/pytorch/issues/18584. This PR covers the remaining places where an event or stream query might result in not-ready errors. Pull Request resolved: https://github.com/pytorch/pytorch/pull/61554 Reviewed By: mrshenli Differential Revision: D29763973 Pulled By: ezyang fbshipit-source-id: 41d988d1826b2309cc6b01a81144094b353abdf9	2021-07-19 16:03:04 -07:00
Christy Lee	b8dca04f73	Add error message if CUDA startup fails (#29670) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29670 This is the entry point to loading CUDA code, so improve the error message to prompt users to check that GPU code is included. Test Plan: Build without GPU code. Run the binary. Check that the new error message exists. Reviewed By: yfeldblum Differential Revision: D18453798 fbshipit-source-id: 63d9ec50acdf57ef4baf3f7d99c836c56bc1435e	2019-11-13 16:48:40 -08:00
Edward Yang	1e6acc676f	Replace caffe2::DeviceGuard with c10::cuda::CUDAGuard (#17623) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17623 Despite its generic-sounding name, caffe2::DeviceGuard actually only worked on CUDA devices. Rename it to something that more clearly spells out its applicability. I'm not sure if it's the right call, but in this patch I added 'using CUDAGuard = c10::cuda::CUDAGuard', as this seems to be more in line with how the Caffe2 codebase is currently written. More idiomatic c10 namespace style would be to say cuda::CUDAGuard. Willing to change this if people shout. This is a respin of D13156470 (#14284) Reviewed By: dzhulgakov Differential Revision: D14285504 fbshipit-source-id: 93b8ab938b064572b3b010c307e1261fde0fff3d	2019-03-06 10:48:15 -08:00
Junjie Bai	883da952be	Hipify caffe2/core (#13148) Summary: petrex ashishfarmer iotamudelta Pull Request resolved: https://github.com/pytorch/pytorch/pull/13148 Reviewed By: xw285cornell Differential Revision: D10862276 Pulled By: bddppq fbshipit-source-id: 1754834ec50f7dd2f752780e20b2a9cf19d03fc4	2018-10-26 15:27:32 -07:00
Junjie Bai	f54ab540af	Rename cuda_gpu_id to device_id in DeviceOption (#12456) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12456 codemod with 'Yes to all' codemod -d . --extensions h,cc,cpp,cu,py,proto,pbtxt,pb.txt,config cuda_gpu_id device_id Overload TextFormat::ParseFromString to do string replace when parsing from protobuf format Reviewed By: Yangqing Differential Revision: D10240535 fbshipit-source-id: 5e6992bec961214be8dbe26f16f5794154a22b25	2018-10-09 15:54:04 -07:00
Junjie Bai	ff608a9ff3	Back out "Revert D10123245: Back out "codemod cuda_gpu_id to device_id"" (#12232) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12232 Original commit changeset: fca91fea58b7 This adds proper modifications to the DeviceType <-> DeviceOption conversion code added in D10033396 Reviewed By: jerryzh168 Differential Revision: D10132473 fbshipit-source-id: 801ef777e2950982cb47b48051b1471a0a91e64b	2018-10-01 21:54:52 -07:00
Rick Ratmansky	3010dc4208	Revert D10123245: Back out "codemod cuda_gpu_id to device_id" Differential Revision: D10123245 Original commit changeset: d83da8e00a12 fbshipit-source-id: fca91fea58b7df208edc2e218a1d514f9821ec7b	2018-10-01 12:22:36 -07:00
Yang Liu	7d7d336c45	Back out "codemod cuda_gpu_id to device_id" Summary: Original commit changeset: f5614a5d2607 D9986213 is causing Multifeed Aggregator a [huge performance difference](https://our.intern.facebook.com/intern/ads/analyze_canary/412951953278781781/) and has been blocking aggregator push since last Friday night: https://fburl.com/feedtools/b6izvwjz We need to land this revert ASAP to unblock aggregator push. Reviewed By: orionr Differential Revision: D10123245 fbshipit-source-id: d83da8e00a1250f5d09811a0a587c127e377aab2	2018-10-01 11:31:14 -07:00
Junjie Bai	3eb5940cf5	codemod cuda_gpu_id to device_id (#12022) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12022 codemod -d . --extensions h,cc,cpp,cu,py,proto,pbtxt,pb.txt,config cuda_gpu_id device_id codemod with 'Yes to all' Reviewed By: orionr Differential Revision: D9986213 fbshipit-source-id: f5614a5d26078817aee8caf79a494abfd6a95ff1	2018-09-27 20:24:53 -07:00
Jerry Zhang	9f4bcdf075	caffe2::DeviceType -> at::DeviceType (#11254) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11254 Previously we used DeviceType in caffe2.proto directly, but it's an `enum` and has implicit conversion to int, which does not provide type safety, e.g. we have to explicitly check that a device type is valid in event.h: ``` template <int d> struct EventCreateFunctionRegisterer { explicit EventCreateFunctionRegisterer(EventCreateFunction f) { static_assert(d < MaxDeviceTypes, ""); Event::event_creator_[d] = f; } }; ``` at::DeviceType is an `enum class`, which does not implicitly convert to int and so provides better type safety guarantees. In this diff we have done the following refactor (taking CPU as an example): 1. caffe2::DeviceType → caffe2::DeviceTypeProto 2. caffe2::CPU → caffe2::PROTO_CPU 3. caffe2::DeviceType = at::DeviceType 4. caffe2::CPU = at::DeviceType::CPU codemod -d caffe2/caffe2 --extensions h,cc,cpp 'device_type\(\), ' 'device_type(), PROTO_' + some manual changes In short, after this diff, in C++, caffe2::CPU refers to at::DeviceType::CPU and the old proto caffe2::CPU will be caffe2::PROTO_CPU. On the Python side, we have a temporary workaround that aliases `caffe2_pb2.CPU = caffe2_pb2.PROTO_CPU` to make the change easier to review; this will be removed later. Reviewed By: ezyang Differential Revision: D9545704 fbshipit-source-id: 461a28a4ca74e616d3ee183a607078a717fd38a7	2018-09-05 16:28:09 -07:00
Orion Reblitz-Richardson	1d5780d42c	Remove Apache headers from source. * LICENSE file contains details, so removing from individual source files.	2018-03-27 13:10:18 -07:00
Orion Reblitz-Richardson	c18f9b4dea	Back out "[codemod] - comment out unused parameters" Original commit changeset: 8e10b1f1e2ae @allow-large-files	2018-02-26 10:26:25 -08:00
Orion Reblitz-Richardson	7e9f8af018	[codemod] - comment out unused parameters	2018-02-26 10:26:25 -08:00
Ilia Cherniavskii	38f166c13a	Async executor with less polling Summary: Async executor based on async_polling (D5985110): - Tasks scheduling other tasks, using polling only when necessary (e.g. CUDA->CPU case) - Fully async, i.e. RunAsync immediately returns Reviewed By: azzolini Differential Revision: D6281681 fbshipit-source-id: 06e3723e1424ffab652c38ca7b279cf76e43fa44	2017-11-28 18:50:32 -08:00
Ilia Cherniavskii	1149b9bbb5	Polling async net executor Summary: Implementation of polling async net executor. Notes: - New net executor async_polling - schedules CPU and GPU ops asynchronously, uses single polling thread - Events: update to Caffe2 events to support async CPU events, adding new methods: Query() - non-blocking checking of event states: INITIALIZED -> RECORDED -> SUCCESS/FAILED ErrorMessage() - when an operation runs asynchronously and fails, calling this on the event will give the error message - Tasks: using existing DAGNet's algorithm to compute CPU and GPU chains, a separate task for each chain - Polling: using single thread to query state of events - for CPU tasks atomically queries task state, for GPU task - uses cudaEventQuery; using Event - Scheduling of CPU ops: using global thread pools - Scheduling of GPU ops: using GPU thread pool per GPU device Reviewed By: dzhulgakov Differential Revision: D5985110 fbshipit-source-id: a9de7fcbb71d046a3aa1b573072b89a65dfeee8c	2017-11-03 07:27:44 -07:00
Yangqing Jia	8286ce1e3a	Re-license to Apache Summary: Closes https://github.com/caffe2/caffe2/pull/1260 Differential Revision: D5906739 Pulled By: Yangqing fbshipit-source-id: e482ba9ba60b5337d9165f28f7ec68d4518a0902	2017-09-28 16:22:00 -07:00
Yangqing Jia	26f0943130	Do CaffeCudaSetDevice and CaffeCudaGetDevice Summary: These are wrapper functions so that if we run in a Caffe2-only mode, we can turn the flag on and get some small speedup on CUDA device switches. The purpose of the diff is to allow us to quickly assess the overhead of CUDA device switch functions. Ideally, the caching behavior should live in the CUDA driver, which is the only safe place to ensure correctness. If other code is running alongside Caffe2 and does not properly use a device guard, this functionality will fail, as separate cudaSetDevice() calls will not update Caffe2's thread-local device id. As a result, the functionality is only enabled when/if one explicitly sets the flag. This might not be safe, so use with caution. - cudaGetDevice can go from 90ns to 2ns - when setting the same device, we can go from 100ns to 2ns - when setting a different device, things are the same (1ns overhead on top of 143ns) Reviewed By: azzolini Differential Revision: D5709398 fbshipit-source-id: 6255f17a3d41f59a30327436383f306a2287896e	2017-08-25 18:20:14 -07:00
Yangqing Jia	0b363fd9de	Add event as a first-class citizen of the OperatorBase interface. Summary: This adds Event as a new member object to OperatorBase, hence allowing us to do async computation more easily. Will send a fix for proper RunAsync() for SimpleNet. In principle this should have no functionality change yet - the only difference is that async_dag net now delegates to the operators for holding the event objects. Reviewed By: harouwu Differential Revision: D5668627 fbshipit-source-id: 55f994074be6b85d6c66f09795dcbe2b93aba300	2017-08-21 13:30:53 -07:00
Yangqing Jia	5d24a4eeef	Early design for a general Event abstraction across devices. Summary: There are ad-hoc efforts to avoid excessive device synchronizations, such as async_dag, singlethread_async, etc. This diff aims to provide an early design for a general Event class that can achieve the following: (1) It is device-agnostic, essentially using a vtable to do cross-device record, wait and synchronization. (2) Created new functions WaitEvent and Record in the Context class for interacting with Events. (3) Exposed the corresponding WaitEvent and Record functions in the OperatorBase class as well. An example use case is that, after potential future refactoring, one can achieve a real async execution per operator by running op.WaitEvent(previous_event); op.RunAsync(); op.RecordEvent(this_op_event); and the next op can do next_op.WaitEvent(this_op_event); Right now, I changed the async_dag net implementation so that it uses the general event design. The old Event class is assimilated into the general Event class and the old Stream class is now essentially taken over by the Context class itself. Reviewed By: harouwu Differential Revision: D5648463 fbshipit-source-id: 58bd84d06e4a9977b0b835110ddb2f18be3b7cbc	2017-08-18 15:46:51 -07:00

20 Commits