pytorch/torch/csrc/DeviceAccelerator.cpp
Yu, Guangye 40c098f731 Introduce a device-agnostic runtime API design (#132204)
# Motivation
According to [[RFC]A device-agnostic Python runtime API design for stream-based accelerators](https://github.com/pytorch/pytorch/issues/128403), this PR intends to introduce a device-agnostic runtime API design.
I personally prefer the **Simple Version** APIs that no longer accept the device type as an input argument. It means we will leverage `getAccelerator` to fetch the current accelerator. And it is flexible to expand these APIs to handle multiple types of accelerator scenarios. The design does **NOT** break the previous design philosophies.
I also believe that namespace torch.accelerator is better. It lets users know that the APIs they are calling are running on an accelerator rather than CPU. This is important. Meanwhile, we can follow a simple API design principle:
1. Device-agnostic APIs should be placed under the torch.accelerator namespace and not accept a device_type optional parameter.
2. Device-specific APIs should be placed under device-specific submodules.
3. APIS required by both CPU and accelerators should be placed under the torch namespace and accept a device_type optional parameter.

Also, I list the pros and cons of **Simple Version** here:
Pros:
- `torch.accelerator.foo` will have the same input argument as `torch.xxx.foo`, bringing a better user experience;
- more concise, facilitate the developer to write a device-agnostic code.

Cons:
- no obvious drawbacks.

# Additional Context
I list the new APIs here:
```python
torch.accelerator.is_available() -> bool:
torch.accelerator.current_accelerator() -> torch.device:
torch.accelerator.device_count() -> int:
torch.accelerator.current_device_idx() -> int:
torch.accelerator.set_device_idx(device: Union[torch.device, str, int, None]) -> None:
torch.accelerator.current_stream(device: Union[torch.device, str, int, None]) -> torch.Stream:
torch.accelerator.set_stream(stream: torch.Stream) -> None:
torch.accelerator.synchronize(device: Union[torch.device, str, int, None]) -> None:
```
According to the discussion with Alban, we decide to change the API name `set_device` to `set_device_idx` and `current_device` to `current_device_idx` for more explicit. And will submit other PR to support device and stream context manager.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132204
Approved by: https://github.com/EikanWang, https://github.com/abhilash1910, https://github.com/gujinghui, https://github.com/albanD
2024-10-27 10:37:09 +00:00

83 lines
2.9 KiB
C++

#include <c10/core/DeviceGuard.h>
#include <torch/csrc/DeviceAccelerator.h>
#include <torch/csrc/utils/device_lazy_init.h>
namespace torch::accelerator {
void initModule(PyObject* module) {
auto m = py::handle(module).cast<py::module>();
m.def("_accelerator_getAccelerator", []() {
// If no accelerator is currently available, raise an exception.
return c10::Device(at::getAccelerator(true).value());
});
m.def("_accelerator_deviceCount", []() {
const auto device_type = at::getAccelerator(false);
if (!device_type.has_value()) {
return static_cast<c10::DeviceIndex>(0);
}
torch::utils::maybe_initialize_device(device_type.value());
c10::impl::VirtualGuardImpl impl(device_type.value());
return static_cast<c10::DeviceIndex>(impl.deviceCount());
});
m.def("_accelerator_setDeviceIndex", [](c10::DeviceIndex device_index) {
const auto device_type = at::getAccelerator(true).value();
// If device index is negative, no-op
if (device_index < 0) {
return;
}
torch::utils::maybe_initialize_device(device_type);
c10::impl::VirtualGuardImpl impl(device_type);
impl.setDevice({device_type, device_index});
});
m.def("_accelerator_getDeviceIndex", []() {
const auto device_type = at::getAccelerator(true).value();
torch::utils::maybe_initialize_device(device_type);
c10::impl::VirtualGuardImpl impl(device_type);
return static_cast<c10::DeviceIndex>(impl.getDevice().index());
});
m.def("_accelerator_setStream", [](c10::Stream stream) {
const auto device_type = at::getAccelerator(true).value();
TORCH_CHECK(
device_type == stream.device_type(),
"stream's device type ",
c10::DeviceTypeName(stream.device_type()),
" doesn't match the current accelerator ",
c10::DeviceTypeName(device_type));
torch::utils::maybe_initialize_device(device_type);
c10::impl::VirtualGuardImpl impl(device_type);
// Set the current device to the device of stream
if (impl.getDevice().index() != stream.device_index()) {
impl.setDevice(stream.device());
}
impl.exchangeStream(stream);
});
m.def("_accelerator_getStream", [](c10::DeviceIndex device_index) {
const auto device_type = at::getAccelerator(true).value();
torch::utils::maybe_initialize_device(device_type);
c10::impl::VirtualGuardImpl impl(device_type);
return impl.getStream({device_type, device_index});
});
m.def("_accelerator_synchronizeDevice", [](c10::DeviceIndex device_index) {
const auto device_type = at::getAccelerator(true).value();
if (!torch::utils::is_device_initialized(device_type)) {
return;
}
torch::utils::maybe_initialize_device(device_type);
c10::impl::VirtualGuardImpl impl(device_type);
// impl.synchronizeDevice should can be safely called from any device
{
py::gil_scoped_release no_gil;
impl.synchronizeDevice(device_index);
}
});
}
} // namespace torch::accelerator