Commit Graph

20 Commits

Author SHA1 Message Date
Jeff Daily
b7391f44df cast return of cudaGetLastError() to void when discarding (#62518)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/62511.
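A minimal sketch of the pattern this fix applies, assuming a call site that intentionally discards the result (the wrapper name here is illustrative):

```
#include <cuda_runtime.h>

// cudaGetLastError() can be declared so that ignoring its return value
// triggers -Wunused-result; the explicit void cast marks the discard as
// intentional while still clearing the sticky error state.
void clearLastCudaError() {
  (void)cudaGetLastError();
}
```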

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62518

Reviewed By: walterddr, janeyx99

Differential Revision: D30029858

Pulled By: malfet

fbshipit-source-id: d47ce4e507ac800b4e5a5e0a8d9a6fabdfd28e6d
2021-08-03 11:17:22 -07:00
Jeff Daily
15210f3b82 ignore and clear not ready errors (#61554)
Summary:
Follow-up to https://github.com/pytorch/pytorch/issues/18584. This PR covers the remaining places where an event or stream query might result in not-ready errors.
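A hedged sketch of the pattern, assuming a helper that polls an event (the helper name is illustrative): a query may return cudaErrorNotReady while work is still pending, and that value also becomes the thread's last error, so it is cleared before continuing.

```
#include <cuda_runtime.h>

// Returns true once the event's captured work has completed.
bool eventComplete(cudaEvent_t event) {
  cudaError_t err = cudaEventQuery(event);
  if (err == cudaErrorNotReady) {
    (void)cudaGetLastError();  // ignore and clear the not-ready error
    return false;
  }
  return err == cudaSuccess;
}
```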

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61554

Reviewed By: mrshenli

Differential Revision: D29763973

Pulled By: ezyang

fbshipit-source-id: 41d988d1826b2309cc6b01a81144094b353abdf9
2021-07-19 16:03:04 -07:00
Christy Lee
b8dca04f73 Add error message if CUDA startup fails (#29670)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29670

This is the entry point for loading CUDA code; improve the error message to prompt users to check that GPU code is included.
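A hypothetical sketch of such a startup check (the function name and message wording are illustrative, not the actual patch):

```
#include <cuda_runtime.h>
#include <stdexcept>
#include <string>

// Entry-point-style check: fail with an actionable message instead of a
// bare CUDA error code when no GPU code or driver is available.
int deviceCountOrThrow() {
  int count = 0;
  cudaError_t err = cudaGetDeviceCount(&count);
  if (err != cudaSuccess) {
    throw std::runtime_error(
        std::string("CUDA initialization failed: ") + cudaGetErrorString(err) +
        ". Check that the binary was built with GPU code included.");
  }
  return count;
}
```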

Test Plan: Build without GPU code. Run the binary. Check that the new error message exists.

Reviewed By: yfeldblum

Differential Revision: D18453798

fbshipit-source-id: 63d9ec50acdf57ef4baf3f7d99c836c56bc1435e
2019-11-13 16:48:40 -08:00
Edward Yang
1e6acc676f Replace caffe2::DeviceGuard with c10::cuda::CUDAGuard (#17623)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17623

Despite its generic-sounding name, caffe2::DeviceGuard actually
worked only on CUDA devices.  Rename it to something that more
clearly spells out its applicability.

I'm not sure if it's the right call, but in this patch I added
'using CUDAGuard = c10::cuda::CUDAGuard', as this seems more
in line with how the Caffe2 codebase is currently written.  More
idiomatic c10 namespace style would be to say cuda::CUDAGuard.
Willing to change this if people shout.
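A minimal usage sketch of the renamed guard (the surrounding function is illustrative):

```
#include <c10/cuda/CUDAGuard.h>

void withDevice(int device_index) {
  // RAII: switches the current CUDA device now and restores the previous
  // device when `guard` goes out of scope, even on exceptions.
  c10::cuda::CUDAGuard guard(device_index);
  // ... launch work on device_index here ...
}
```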

This is a respin of D13156470 (#14284)

Reviewed By: dzhulgakov

Differential Revision: D14285504

fbshipit-source-id: 93b8ab938b064572b3b010c307e1261fde0fff3d
2019-03-06 10:48:15 -08:00
Junjie Bai
883da952be Hipify caffe2/core (#13148)
Summary:
petrex ashishfarmer iotamudelta
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13148

Reviewed By: xw285cornell

Differential Revision: D10862276

Pulled By: bddppq

fbshipit-source-id: 1754834ec50f7dd2f752780e20b2a9cf19d03fc4
2018-10-26 15:27:32 -07:00
Junjie Bai
f54ab540af Rename cuda_gpu_id to device_id in DeviceOption (#12456)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12456

codemod with 'Yes to all'
codemod -d . --extensions h,cc,cpp,cu,py,proto,pbtxt,pb.txt,config cuda_gpu_id device_id

Overload TextFormat::ParseFromString to do a string replace when parsing from the protobuf format.

Reviewed By: Yangqing

Differential Revision: D10240535

fbshipit-source-id: 5e6992bec961214be8dbe26f16f5794154a22b25
2018-10-09 15:54:04 -07:00
Junjie Bai
ff608a9ff3 Back out "Revert D10123245: Back out "codemod cuda_gpu_id to device_id"" (#12232)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12232

Original commit changeset: fca91fea58b7

This adds the proper modifications to the DeviceType <-> DeviceOption conversion code added in D10033396.

Reviewed By: jerryzh168

Differential Revision: D10132473

fbshipit-source-id: 801ef777e2950982cb47b48051b1471a0a91e64b
2018-10-01 21:54:52 -07:00
Rick Ratmansky
3010dc4208 Revert D10123245: Back out "codemod cuda_gpu_id to device_id"
Differential Revision:
D10123245

Original commit changeset: d83da8e00a12

fbshipit-source-id: fca91fea58b7df208edc2e218a1d514f9821ec7b
2018-10-01 12:22:36 -07:00
Yang Liu
7d7d336c45 Back out "codemod cuda_gpu_id to device_id"
Summary:
Original commit changeset: f5614a5d2607

D9986213 is causing a [huge performance difference](https://our.intern.facebook.com/intern/ads/analyze_canary/412951953278781781/) for Multifeed Aggregator and has been blocking the aggregator push since last Friday night: https://fburl.com/feedtools/b6izvwjz
We need to land this revert ASAP to unblock the aggregator push.

Reviewed By: orionr

Differential Revision: D10123245

fbshipit-source-id: d83da8e00a1250f5d09811a0a587c127e377aab2
2018-10-01 11:31:14 -07:00
Junjie Bai
3eb5940cf5 codemod cuda_gpu_id to device_id (#12022)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12022

codemod -d . --extensions h,cc,cpp,cu,py,proto,pbtxt,pb.txt,config cuda_gpu_id device_id

codemod with 'Yes to all'

Reviewed By: orionr

Differential Revision: D9986213

fbshipit-source-id: f5614a5d26078817aee8caf79a494abfd6a95ff1
2018-09-27 20:24:53 -07:00
Jerry Zhang
9f4bcdf075 caffe2::DeviceType -> at::DeviceType (#11254)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11254
Previously we used DeviceType from caffe2.proto directly, but it's an `enum` with an implicit conversion to int, which does not provide type safety; e.g., we have to explicitly check that a device type is valid in event.h:
```
template <int d>
struct EventCreateFunctionRegisterer {
  explicit EventCreateFunctionRegisterer(EventCreateFunction f) {
    static_assert(d < MaxDeviceTypes, "");
    Event::event_creator_[d] = f;
  }
};
```
at::DeviceType is an `enum class`; it has no implicit conversion to int and provides better type-safety guarantees (see the sketch after the list below). In this diff we have done the following refactor (taking CPU as an example):

    1. caffe2::DeviceType → caffe2::DeviceTypeProto
    2. caffe2::CPU → caffe2::PROTO_CPU
    3. caffe2::DeviceType = at::DeviceType
    4. caffe2::CPU = at::DeviceType::CPU
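A small sketch contrasting the two enum styles, with illustrative names standing in for the real proto and at:: types:

```
#include <cstdint>

enum ProtoDeviceType { PROTO_CPU = 0 };      // plain enum: decays to int
enum class DeviceType : int8_t { CPU = 0 };  // enum class: no implicit conversion

void typeSafetyDemo() {
  int a = PROTO_CPU;                          // compiles silently
  // int b = DeviceType::CPU;                 // error: requires an explicit cast
  int c = static_cast<int>(DeviceType::CPU);  // intent is visible
  (void)a;
  (void)c;
}
```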

codemod -d caffe2/caffe2 --extensions h,cc,cpp 'device_type\(\), ' 'device_type(), PROTO_'
+ some manual changes

In short, after this diff, in C++ caffe2::CPU refers to at::DeviceType::CPU, and the old proto caffe2::CPU becomes caffe2::PROTO_CPU.
On the Python side, we have a temporary workaround that aliases `caffe2_pb2.CPU = caffe2_pb2.PROTO_CPU` to make the change easier to review; this will be removed later.

Reviewed By: ezyang

Differential Revision: D9545704

fbshipit-source-id: 461a28a4ca74e616d3ee183a607078a717fd38a7
2018-09-05 16:28:09 -07:00
Orion Reblitz-Richardson
1d5780d42c Remove Apache headers from source.
* LICENSE file contains details, so removing from individual source files.
2018-03-27 13:10:18 -07:00
Orion Reblitz-Richardson
c18f9b4dea Back out "[codemod] - comment out unused parameters"
Original commit changeset: 8e10b1f1e2ae

@allow-large-files
2018-02-26 10:26:25 -08:00
Orion Reblitz-Richardson
7e9f8af018 [codemod] - comment out unused parameters 2018-02-26 10:26:25 -08:00
Ilia Cherniavskii
38f166c13a Async executor with less polling
Summary:
Async executor based on async_polling (D5985110):
- Tasks scheduling other tasks, using polling only when necessary (e.g.
  CUDA->CPU case)
- Fully async, i.e. RunAsync immediately returns

Reviewed By: azzolini

Differential Revision: D6281681

fbshipit-source-id: 06e3723e1424ffab652c38ca7b279cf76e43fa44
2017-11-28 18:50:32 -08:00
Ilia Cherniavskii
1149b9bbb5 Polling async net executor
Summary:
Implementation of the polling async net executor.
Notes:
- New net executor async_polling: schedules CPU and GPU ops asynchronously, using a single polling thread
- Events: updates Caffe2 events to support async CPU events, adding new methods:
 Query() - non-blocking check of the event state: INITIALIZED -> RECORDED -> SUCCESS/FAILED
 ErrorMessage() - when an operation runs asynchronously and fails, calling this on the event returns the error message
- Tasks: uses the existing DAGNet algorithm to compute CPU and GPU chains, with a separate task for each chain
- Polling: uses a single thread to query the state of events - for CPU tasks it atomically queries the task state; for GPU tasks it uses cudaEventQuery via Event (see the sketch below)
- Scheduling of CPU ops: uses global thread pools
- Scheduling of GPU ops: uses a GPU thread pool per GPU device
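An illustrative sketch of one polling step for a GPU task, under the state machine named above (the enum and helper are hypothetical, not the executor's actual code):

```
#include <cuda_runtime.h>

enum class EventState { INITIALIZED, RECORDED, SUCCESS, FAILED };

// One non-blocking poll of a recorded GPU event.
EventState pollGpuEvent(cudaEvent_t ev) {
  cudaError_t err = cudaEventQuery(ev);
  if (err == cudaSuccess) return EventState::SUCCESS;
  if (err == cudaErrorNotReady) {
    (void)cudaGetLastError();  // clear the not-ready error; still running
    return EventState::RECORDED;
  }
  return EventState::FAILED;   // surfaced via ErrorMessage() in the real design
}
```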

Reviewed By: dzhulgakov

Differential Revision: D5985110

fbshipit-source-id: a9de7fcbb71d046a3aa1b573072b89a65dfeee8c
2017-11-03 07:27:44 -07:00
Yangqing Jia
8286ce1e3a Re-license to Apache
Summary: Closes https://github.com/caffe2/caffe2/pull/1260

Differential Revision: D5906739

Pulled By: Yangqing

fbshipit-source-id: e482ba9ba60b5337d9165f28f7ec68d4518a0902
2017-09-28 16:22:00 -07:00
Yangqing Jia
26f0943130 Do CaffeCudaSetDevice and CaffeCudaGetDevice
Summary:
These are wrapper functions so that if we run in a Caffe2-only mode, we can
turn the flag on and get a small speedup on CUDA device switches.

The purpose of this diff is to let us quickly assess the overhead of the CUDA
device switch functions. Ideally, the caching behavior should live in the CUDA
driver, which is the only safe place to ensure correctness.

If other code runs alongside Caffe2 and does not properly do device guarding,
this functionality will fail, because separate cudaSetDevice() calls will not update
Caffe2's thread-local device id. As a result, the functionality is only enabled
when/if one explicitly sets the flag.

This might not be safe, so use with caution.

- cudaGetDevice can go from 90ns to 2ns
- when setting the same device, we can go from 100ns to 2ns
- when setting a different device, things are the same (1ns overhead on top of 143ns)
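A hedged sketch of the caching idea, assuming the flag is on (only the two function names come from the commit; the bodies are illustrative):

```
#include <cuda_runtime.h>

namespace {
thread_local int g_cached_device = -1;  // -1 means "not yet queried"
}

int CaffeCudaGetDevice() {
  if (g_cached_device < 0) {
    (void)cudaGetDevice(&g_cached_device);  // slow path, once per thread
  }
  return g_cached_device;                   // fast path: a thread-local read
}

void CaffeCudaSetDevice(int device) {
  if (device == g_cached_device) return;    // same-device set becomes a no-op
  (void)cudaSetDevice(device);
  g_cached_device = device;                 // stale if callers bypass this wrapper
}
```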

Reviewed By: azzolini

Differential Revision: D5709398

fbshipit-source-id: 6255f17a3d41f59a30327436383f306a2287896e
2017-08-25 18:20:14 -07:00
Yangqing Jia
0b363fd9de Add event as a first-class citizen of the OperatorBase interface.
Summary:
This adds Event as a new member object to OperatorBase, hence allowing us to do
async computation more easily. Will send a fix for proper RunAsync() for
SimpleNet.

In principle this should have no functionality change yet - the only difference
is that async_dag net now delegates to the operators for holding the event
objects.

Reviewed By: harouwu

Differential Revision: D5668627

fbshipit-source-id: 55f994074be6b85d6c66f09795dcbe2b93aba300
2017-08-21 13:30:53 -07:00
Yangqing Jia
5d24a4eeef Early design for a general Event abstraction cross-devices.
Summary:
There are ad-hoc efforts to avoid excessive device synchronizations, such as
async_dag, singlethread_async, etc. This diff aims to provide an early design
for a general Event class that can achieve the following:

(1) It is device agnostic, essentially using a vtable to do cross-device record,
wait, and synchronization.
(2) Created new functions WaitEvent and Record in the Context class for
interacting with Events.
(3) Exposed the corresponding WaitEvent and Record functions in the OperatorBase
class as well.
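A hypothetical sketch of the vtable idea in (1): one record/wait entry per device type, so Event itself carries no device-specific code (all names are illustrative).

```
#include <array>

constexpr int kMaxDeviceTypes = 8;

struct Event;
using EventRecordFn = void (*)(Event*, void* context);
using EventWaitFn = void (*)(const Event*, void* context);

struct EventDispatch {
  EventRecordFn record = nullptr;
  EventWaitFn wait = nullptr;
};

// Filled in by per-backend registerers at static-initialization time.
std::array<EventDispatch, kMaxDeviceTypes> event_dispatch;

struct Event {
  int device_type = 0;
  void Record(void* ctx) { event_dispatch[device_type].record(this, ctx); }
  void Wait(void* ctx) const { event_dispatch[device_type].wait(this, ctx); }
};
```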

An example use case is that, after potential future refactoring, one can achieve
truly async execution per operator by running

```
op.WaitEvent(previous_event);
op.RunAsync();
op.RecordEvent(this_op_event);
```

and the next op can do

```
next_op.WaitEvent(this_op_event);
```

Right now, I changed the async_dag net implementation so that it uses the general
event design. The old Event class is assimilated into the general Event class, and
the old Stream class is now essentially taken over by the Context class itself.

Reviewed By: harouwu

Differential Revision: D5648463

fbshipit-source-id: 58bd84d06e4a9977b0b835110ddb2f18be3b7cbc
2017-08-18 15:46:51 -07:00