pytorch/torch/csrc/cuda/python_comm.cpp
Mike Ruberry 1003ccfa15 Creates CUDAContext (#9435)
Summary:
ezyang noticed that the CUDAStream files lived under ATen/ despite being CUDA-specific, and suggested porting them to ATen/cuda and exposing them with a new CUDAContext. This PR does that. It also:

- Moves ATen's CUDA-specific exceptions for ATen/cudnn to ATen/cuda for consistency
- Moves getDeviceProperties() and getCurrentCUDASparseHandle() to CUDAContext from CUDAHooks

The separation between CUDAContext and CUDAHooks is straightforward. Files that appear only in CUDA builds should rely on CUDAContext, while CUDAHooks is for runtime dispatch in files that can be included in CPU-only builds. A comment in CUDAContext.h explains this pattern. Acquiring device properties and CUDA-specific handles, for example, only happens in builds with CUDA, so I moved those functions from CUDAHooks to CUDAContext.

This PR will conflict with #9277 and I will merge with master after #9277 goes in.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9435

Reviewed By: soumith

Differential Revision: D8917236

Pulled By: ezyang

fbshipit-source-id: 219718864234fdd21a2baff1dd3932ff289b5751
2018-07-20 12:56:15 -07:00


#include "torch/csrc/utils/pybind.h"
#include "torch/csrc/cuda/comm.h"
#include "torch/csrc/cuda/Stream.h"
#include "torch/csrc/cuda/THCP.h"
#include "torch/csrc/utils/auto_gil.h"
#include "torch/csrc/utils/functional.h"
#include <ATen/ATen.h>
#include <THC/THC.h>
#include <cstddef>
#include <vector>
namespace torch { namespace cuda { namespace python {

void initCommMethods(PyObject *module) {
  auto m = py::cast<py::module>(module);
  m.def("_broadcast_coalesced",
        [](std::vector<at::Tensor>& tensors,
           std::vector<int64_t> devices,
           size_t buffer_size) {
          return broadcast_coalesced(tensors, devices, buffer_size);
        },
        py::arg("tensors"), py::arg("devices"), py::arg("buffer_size"),
        py::call_guard<py::gil_scoped_release>())
   .def("_broadcast", [](at::Tensor& tensor, std::vector<int64_t> devices) {
     return broadcast(tensor, devices);
   }, py::call_guard<py::gil_scoped_release>())
   .def("_scatter", [](
       at::Tensor& tensor,
       std::vector<int64_t>& devices,
       at::optional<std::vector<int64_t>> chunk_sizes,
       int64_t dim,
       at::optional<py::object> py_streams) {
     at::optional<std::vector<at::cuda::CUDAStream>> streams;
     if (py_streams) {
       py::handle handle = *py_streams;
       streams = fmap(
           THPUtils_PySequence_to_THCStreamList(handle.ptr()),
           [](THCStream* stream) {
             at::cuda::detail::CUDAStream_retain(stream);
             return at::cuda::CUDAStream(stream);
           });
     }
     // Note: We're holding the GIL up to here.
     AutoNoGIL no_gil;
     return scatter(tensor, devices, chunk_sizes, dim, streams);
   },
   py::arg("tensor"),
   py::arg("devices"),
   py::arg("chunk_sizes"),
   py::arg("dim"),
   py::arg("streams"))
   .def("_gather", [](
       std::vector<at::Tensor>& tensors,
       int64_t dim,
       at::optional<int32_t> destination_index) {
     return gather(tensors, dim, destination_index);
   },
   py::arg("tensors"),
   py::arg("dim"),
   py::arg("destination_index"),
   py::call_guard<py::gil_scoped_release>());
}

}}} // namespace torch::cuda::python