Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17623
Despite it's generic sounding name, caffe2::DeviceGuard actually
only worked on CUDA devices. Rename it to something that more
clearly spells out its applicability.
I'm not sure if it's the right call, but in this patch I added
'using CUDAGuard = c10::cuda::CUDAGuard', as this seems to be more
in-line with how the Caffe2 codebase is currently written. More
idiomatic c10 namespace style would be to say cuda::CUDAGuard.
Willing to change this if people shout.
This is a respin of D13156470 (#14284)
Reviewed By: dzhulgakov
Differential Revision: D14285504
fbshipit-source-id: 93b8ab938b064572b3b010c307e1261fde0fff3d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15692
It was leading to ocassional crashes with dynamically linked CUDA because runtime was already destroyed.
Also, unique_ptr<T[]> is more suitable than deque<T> for the purpose.
Reviewed By: Yangqing
Differential Revision: D13571988
fbshipit-source-id: 37eb26dfbe361c49160367b53f87bd037c6c0e46
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12029
In order to remove New() function in StaticContext(to remove StaticContext) and converge to the Allocator design, we'll first change the return type of New to at::DataPtr.
Reviewed By: ezyang
Differential Revision: D9889990
fbshipit-source-id: 3257c763530b987025f428741bdd2e089d11bad4
Summary:
Original commit changeset: f5614a5d2607
D9986213 is causing Multifeed Aggregator a [huge performance different](https://our.intern.facebook.com/intern/ads/analyze_canary/412951953278781781/) and is blocking aggregator push since last Friday night: https://fburl.com/feedtools/b6izvwjz
We need to land this revert ASAP to unblock aggregator push.
Reviewed By: orionr
Differential Revision: D10123245
fbshipit-source-id: d83da8e00a1250f5d09811a0a587c127e377aab2
Summary:
TSIA. Right now we should basically use C10_EXPORT and C10_IMPORT for explicitly marking dllexport and dllimport, as a continued effort of the C10 unification.
This is a codemod by mechanically doing the following change:
CAFFE2_{EXPORT,IMPORT} -> C10_{EXPORT,IMPORT}
AT_CORE_{EXPORT,IMPORT} -> C10_{EXPORT,IMPORT}
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12019
Reviewed By: ezyang, teng-li
Differential Revision: D10016276
Pulled By: Yangqing
fbshipit-source-id: a420d62c43d1110105fc88f9e9076e28a3203164
Summary:
Add CUDAContext::cudnn_handle() for easier integration of single
cudnn routines into operators without requiring the weight
of CuDNNWrapper or similar, or needing to spin out a separate CuDNN*Op
version of an operator.
It was necessary to split out the cuDNN wrapper code from the base cuDNN helpers in order to resolve a circular dependency between context_gpu.h and common_cudnn.h when handles and cuDNN `#define` were added.
Closes https://github.com/caffe2/caffe2/pull/1376
Reviewed By: pietern
Differential Revision: D6162034
Pulled By: akyrola
fbshipit-source-id: 95687e55b3e1e921e1f5e0f016f43b586f5f3350