pytorch/torch/csrc/jit/python/python_ivalue.h
Luca Wehrstedt a688b29750 Support custom Python classes in CUDAFuture (#56516)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56516

One problem with CUDAFuture's extraction of DataPtrs from IValues is that it only supported Python objects that could be converted to "regular" IValues (e.g., lists/dicts/tuples of ints/strings/tensors/...). One notable exception is custom Python classes, which are in fact a very common data type transferred over RPC. The only solution we found for those is to use the Python pickler to extract the tensors contained in them.

We can't insert a Python dependency directly into CUDAFuture, so instead I'm proposing to use the same indirection technique used to support `getSubValues` on Python objects: define some methods on the abstract class `PyObjectHolder` (which can be used by CUDAFuture) but only implement them in the concrete subclass `ConcretePyObjectHolder` (which is only built when Python support is enabled).
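
As a rough illustration of that indirection, here is a minimal, dependency-free sketch. All names in it (`FakeTensor`, `Holder`, `ConcreteHolder`, `countTensors`, `makeHolder`, `example`) are hypothetical stand-ins, not the actual PyTorch types: `Holder` plays the role of `PyObjectHolder`, `ConcreteHolder` the role of `ConcretePyObjectHolder`, and `countTensors` the role of a caller such as CUDAFuture that only sees the abstract interface.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for at::Tensor, so the sketch needs neither PyTorch
// nor Python.
struct FakeTensor {
  int id;
};

// Abstract interface: the only type that Python-free code gets to see.
struct Holder {
  virtual ~Holder() = default;
  virtual std::vector<FakeTensor> extractTensors() = 0;
};

// Concrete implementation. In the real code this part would live in a file
// that is only built when Python support is enabled.
struct ConcreteHolder final : Holder {
  explicit ConcreteHolder(std::vector<FakeTensor> tensors)
      : tensors_(std::move(tensors)) {}

  std::vector<FakeTensor> extractTensors() override {
    // The real implementation would call into Python here; the sketch just
    // returns the stored values.
    return tensors_;
  }

 private:
  std::vector<FakeTensor> tensors_;
};

// Factory that hands out the abstract type, mirroring a create() function.
std::unique_ptr<Holder> makeHolder(std::vector<FakeTensor> tensors) {
  return std::make_unique<ConcreteHolder>(std::move(tensors));
}

// Consumer that depends on the abstract interface only.
std::size_t countTensors(Holder& holder) {
  return holder.extractTensors().size();
}

// Example usage: the caller never names ConcreteHolder.
inline void example() {
  auto holder = makeHolder({FakeTensor{1}, FakeTensor{2}});
  assert(countTensors(*holder) == 2);
}
```

The consumer compiles against `Holder` alone, so the Python-dependent code can stay in a separately built component.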

I am a bit worried about the performance toll of this (pickling isn't exactly known to be cheap) but I think we should start by providing a functionally complete API. We already have ideas on how to make this faster if needed, for example by having users provide a custom DataPtr extractor tailored to their class via a decorator. (Or just use TorchScript).
ghstack-source-id: 127295014

Test Plan: Added a test later in the stack

Reviewed By: mrshenli

Differential Revision: D27887189

fbshipit-source-id: 9d27e4e62390b836e5bb4f06f401cc002f0cf95b
2021-04-24 07:06:28 -07:00

99 lines
3.1 KiB
C++

#pragma once

#include <ATen/core/ivalue.h>
#include <pybind11/pybind11.h>
#include <torch/csrc/jit/python/pybind_utils.h>
#include <torch/csrc/python_headers.h>

namespace py = pybind11;

namespace c10 {
namespace ivalue {

// Concrete IValue holder that holds a py::object.
struct C10_EXPORT ConcretePyObjectHolder final : PyObjectHolder {
 public:
  static c10::intrusive_ptr<PyObjectHolder> create(py::object py_obj) {
    return c10::make_intrusive<ConcretePyObjectHolder>(std::move(py_obj));
  }

  static c10::intrusive_ptr<PyObjectHolder> create(const py::handle& handle) {
    py::gil_scoped_acquire ag;
    return c10::make_intrusive<ConcretePyObjectHolder>(
        handle.cast<py::object>());
  }

  PyObject* getPyObject() override {
    return py_obj_.ptr();
  }

  InferredType tryToInferType() override {
    pybind11::gil_scoped_acquire ag;
    return torch::jit::tryToInferType(py_obj_);
  }

  IValue toIValue(const TypePtr& type, c10::optional<int32_t> N = c10::nullopt)
      override {
    pybind11::gil_scoped_acquire ag;
    return torch::jit::toIValue(py_obj_, type, N);
  }

  std::string toStr() override {
    pybind11::gil_scoped_acquire ag;
    return py::str(py_obj_);
  }

  std::vector<at::Tensor> extractTensors() override {
    // We could implement this entirely in C++ via pybind11 but it turns out to
    // be substantially slower. Namely, the total time taken by markCompleted on
    // a CUDAFuture is 21.5us with this implementation, but goes up to 58.7us
    // when using C++. The reason is unclear.
    try {
      pybind11::gil_scoped_acquire ag;
      return py::module::import("torch._jit_internal")
          .attr("_extract_tensors")(py_obj_)
          .cast<std::vector<at::Tensor>>();
    } catch (py::error_already_set& e) {
      auto err = std::runtime_error(
          c10::str("Cannot extract tensors from value: ", e.what()));
      {
        pybind11::gil_scoped_acquire ag;
        e.restore();
        PyErr_Clear();
      }
      throw err;
    }
  }

  // Note [Destructing py::object]
  // ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  //
  // (1) Why `py_obj_ = py::none();` does not work: destroying a py::object
  // that holds None still decrements None's reference count, so the GIL
  // must be held for that as well.
  // https://docs.python.org/3/c-api/none.html#c.Py_RETURN_NONE
  //
  // https://stackoverflow.com/questions/15287590/why-should-py-increfpy-none-be-required-before-returning-py-none-in-c
  //
  // (2) Why we need to call dec_ref() explicitly: a py::object that holds
  // nullptr effectively does nothing on destruction, because it calls
  // Py_XDECREF(NULL) under the hood.
  // https://docs.python.org/3/c-api/refcounting.html#c.Py_XDECREF
  ~ConcretePyObjectHolder() {
    pybind11::gil_scoped_acquire ag;
    py_obj_.dec_ref();
    // Explicitly set the PyObject* to nullptr to prevent py::object's dtor
    // from decref-ing the PyObject again.
    py_obj_.ptr() = nullptr;
  }

  // Explicit constructor to avoid erroneous implicit conversion and
  // copy-initialization.
  explicit ConcretePyObjectHolder(py::object py_obj)
      : py_obj_(std::move(py_obj)) {}

 private:
  py::object py_obj_;
};

} // namespace ivalue
} // namespace c10
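
The reasoning in Note [Destructing py::object] can be sketched without Python. This is a simplified model under stated assumptions, not the pybind11 implementation: `g_lock` stands in for the GIL, `Counted` for a `PyObject`, `Handle` for `py::object`, and `Holder` for `ConcretePyObjectHolder`; all of these names (and `g_lock_held`) are hypothetical. The point it tries to show is that a member's destructor runs after the enclosing destructor body has finished, which is after the lock guard has been released, so the reference has to be dropped by hand inside the body and the pointer cleared.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-ins: a global lock playing the role of the GIL, plus a
// flag that records whether the lock is currently held.
std::mutex g_lock;
bool g_lock_held = false;

// Plays the role of a reference-counted PyObject.
struct Counted {
  int refs = 1;
};

// Mimics py::object: the destructor drops one reference, and is a no-op when
// the pointer is null (like Py_XDECREF(NULL)).
struct Handle {
  Counted* ptr = nullptr;

  explicit Handle(Counted* p) : ptr(p) {}
  Handle(const Handle&) = delete;
  Handle& operator=(const Handle&) = delete;

  void decRef() {
    if (ptr != nullptr) {
      // Dropping a reference is only legal while the lock is held.
      assert(g_lock_held && "reference dropped without the lock");
      --ptr->refs;
    }
  }

  ~Handle() {
    decRef();
  }
};

struct Holder {
  explicit Holder(Counted* p) : handle_(p) {}

  ~Holder() {
    std::lock_guard<std::mutex> guard(g_lock);
    g_lock_held = true;
    // Drop the reference now, while the lock is held.
    handle_.decRef();
    // Clear the pointer so that ~Handle, which runs after this body and
    // therefore after the guard is released, does nothing.
    handle_.ptr = nullptr;
    g_lock_held = false;
  }

 private:
  Handle handle_;
};
```

If the destructor body skipped the two `handle_` statements, `~Handle` would drop the reference after the guard had been released and the assertion in `decRef()` would fire, which is the situation the note guards against.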