pytorch/torch/csrc/utils/python_arg_parser.h
Edward Z. Yang 711e5a6ceb
Port THS to ATen. (#8409)
* Port THS to ATen.

The basic structure of the patch:

- All kernels in aten/src/THS got rewritten as native
  functions in aten/src/ATen/native/sparse

  I took the liberty of renaming some of the kernels,
  opting for longer, more transparent names than
  things like 'spaddcmul'.

- Instead of holding fields for sparse tensor in the TH
  C struct THSTensor, they are now held in a C++ class
  SparseTensorImpl (this explains why I had to do this
  all in one go; I can't have *two* reps for sparse
  tensors!)

  Along the way, we change a key internal representation
  invariant: an "empty" sparse tensor has dimI == 1 and
  dimV == 0 (this is different from dimI == 0 and dimV == 0
  we had before); this ensures that we maintain the invariant
  that dim == dimI + dimV.  "Scalar" sparse tensors are
  made illegal, because there really is no way to properly
  express them in COO format.

- Because we haven't ported THCS or any of the traditional
  dense TH implementations, there is a new set of adapter
  functions in native/LegacyBridge.cpp exclusively devoted
  to deciding whether or not to go to the new native implementation
  or back to the legacy TH binding (prefixed with th_).
  The intent is that when everything gets ported, we can
  delete this file.

- I've kept the stubs for all the THS functions, but they now all
  error if you try to actually call them.  Eventually, we should
  replace these with calls to ATen so that everything keeps
  working.

- I gobbled up SparseMM (SparseMM.cpp is no more). It was tasty.
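
The representation invariant described above (dim == dimI + dimV, an "empty" sparse tensor with dimI == 1 and dimV == 0, no "scalar" sparse tensors) can be sketched as a small validity check. This is only an illustration, not code from the patch: `CooShape`, `make_coo_shape`, and `empty_coo_shape` are hypothetical names, and giving the empty tensor the sizes {0} is an assumption.

```cpp
#include <cstdint>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical stand-in for the shape bookkeeping of a COO sparse tensor.
// None of these names come from ATen.
struct CooShape {
  std::vector<int64_t> sizes;  // full logical shape
  int64_t dimI;                // number of sparse (indexed) dimensions
  int64_t dimV;                // number of dense (value) dimensions
};

// Build a shape, enforcing dim == dimI + dimV and rejecting "scalar"
// sparse tensors, which have no dimension left to index.
inline CooShape make_coo_shape(std::vector<int64_t> sizes, int64_t dimI, int64_t dimV) {
  if (dimI < 1) {
    throw std::invalid_argument("a COO tensor needs at least one sparse dimension");
  }
  if (dimV < 0 || static_cast<int64_t>(sizes.size()) != dimI + dimV) {
    throw std::invalid_argument("dim must equal dimI + dimV");
  }
  return CooShape{std::move(sizes), dimI, dimV};
}

// The "empty" sparse tensor under the new invariant: one sparse dimension
// (assumed here to have size zero) and no dense dimensions.
inline CooShape empty_coo_shape() {
  return make_coo_shape(std::vector<int64_t>{0}, 1, 0);
}
```

With the old dimI == 0, dimV == 0 encoding, the empty tensor would fail the first check, which is the point of the change.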

There are some miscellaneous improvements which were needed for other
changes in this patch:

- There is now AT_FORALL_SCALAR_TYPES_EXCEPT_HALF, which does what
  it says on the tin.

- axpy templated function moved to TH/BlasUtils.h; there's a new macro
  which lets you easily forward to all of the TH functions. We also expose
  THBlas_copy.  I'm not terribly pleased with these functions but
  they seem to serve a needed purpose.

- New method on Tensor to get TensorImpl*, unsafeGetTensorImpl

- accessor() is now this-const, since const-correctness on Tensor is a lie

- New toSparse()/toDense() methods on Type; now you can call these
  directly without having to manually apply at::toSparse/toDense
  on the Backend and then running toBackend yourself.

Changes to the kernels:

- Previously, the whole body of all kernels was compiled for
  every supported scalar type.  In our new implementation,
  the scalar dispatch has been pushed into the smallest extent
  which (1) is not in a type loop and (2) requires statically
  knowing the scalar type.  These sites all use
  AT_DISPATCH_ALL_TYPES.  I tried to use lambdas as much as
  possible, but sometimes it was not possible when an OpenMP
  pragma was used.

- Anywhere we tested if the nDimension of a tensor was zero,
  we replaced it with a test that numel is zero, because, as we
  know, nDimension of zero-size tensors in TH is zero, and
  that's wrong wrong wrong (and not done this way in ATen).

Some subtleties:

- Places where previously fastget1d was used, I now use a
  TensorAccessor.  However, you have to be careful about grabbing
  the accessor, because sometimes you will be accessor'ing
  indices/values and they are empty, which means they will
  be *1D* ("oh, aren't indices always 2D?" Nope. Nyet.)
  So, essentially, it is only safe to grab an accessor *after*
  you have checked that nnz != 0.  All of these shenanigans
  will go away when we properly support zero-size dimensions.

  A few places, we test for this case just by wrapping the loop
  in a conditional on nnz.  Some other places this is not so easy,
  so we instead short-circuit the function with a special case for
  when nnz == 0 (usually, these implementations are degenerate).

- There is a very subtle but important difference between
  _sparse_get_impl(self)->indices() and self._indices();
  the latter may return a view!  This is because nnz is
  not guaranteed to match the dimensions of indices/values;
  you can "truncate" a sparse tensor by setting the nnz.
  Actually, I think this is not a good idea and we should
  enforce a stronger invariant, but for this patch I slavishly
  adhere to the old ways, and as such I have to be very
  careful: if I want to resize something, I had better use
  the former and not the latter.

- I had to reimplement broadcasting by hand (thus the s_
  and non-s_ functions in the sparse native files).  There
  is a very important distinction between foo_out and foo_,
  so it is important that the LegacyBridge function always
  call to the lower layer, and not try to avoid boilerplate
  by calling to another LegacyBridge function first.
  I did NOT put broadcasting in LegacyBridge (even though,
  ultimately, that's where it must live), because the th_
  functions which are invoked from LegacyBridge handle
  broadcasting themselves, and I don't want to broadcast
  twice.

- Sparse functions MUST explicitly specify the Type they
  dispatch from, otherwise Variable wrapping/unwrapping will
  not work correctly.  If you use _get_sparse_impl, that is
  sufficient to satisfy this requirement.

- The "has native" tests in LegacyBridge.cpp are not 100%,
  because some of the functions are mixed dense-sparse functions,
  and so you can't just say, "Oh, if it's sparse and CPU, call
  the native sparse implementation."  This is handled on a
  case by case basis.  There is some especially complex
  logic for add(), which has dense-dense, sparse-sparse
  and dense-sparse implementations.

- I added some uses of SparseTensorRef in native_functions.yaml,
  but you will notice that these are all on native_* functions,
  and not the actual, top-level functions.  So the SparseTensorRef
  is purely documentary (helping you not call the wrong overload)
  but there is no magic; we do the wrapping ourselves the hard
  way. (This is in contrast to the TH binding code which is magical.)
  Except for _sparse_mask; _sparse_mask is magical.

- There is a raw_copy_sparse_ method, which is really my way of
  getting around the fact that copy_ has never been implemented
  for sparse tensors (even before this patch), but there IS a
  super secret, internal way of doing these copies that the THS
  code used, and which I needed to get my hands on when I did this
  port.  We should refactor so that either (a) copy_ does support
  sparse-sparse copy natively, or (b) we do this other ways.

- Irritatingly, I must explicitly resize_as_ before copy_ into
  a tensor.  This was not the case with THTensor_(copy) but I don't
  have any direct binding that doesn't have this requirement.

- For some reason, the sparse tensor constructor accepts a scalar
  tensor for the values tensor.  This is kind of weird because
  you always need an nnz-dimension.  However, the old code supported
  this and just expanded it into a 1D size 0 tensor; so we need some
  explicit code to do this.
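
The accessor subtlety above (only grab a 2-D accessor once nnz != 0 is known) can be sketched with a toy COO structure. This is not the ATen TensorAccessor API: `CooTensor`, `Accessor2D`, and `sum_row` are hypothetical, and the row-major [sparse_dims x nnz] index layout is an assumption made for the example.

```cpp
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical, simplified COO storage; names are illustrative only.
struct CooTensor {
  int64_t nnz = 0;
  int64_t sparse_dims = 0;
  std::vector<int64_t> indices;  // row-major [sparse_dims x nnz] when nnz > 0
  std::vector<double> values;    // one value per nonzero
};

// Toy 2-D view that refuses to be built over an empty buffer, mimicking
// the fact that empty indices are not really 2-D.
class Accessor2D {
 public:
  Accessor2D(const std::vector<int64_t>& buf, int64_t rows, int64_t cols)
      : data_(buf.data()), cols_(cols) {
    if (rows <= 0 || cols <= 0 ||
        static_cast<int64_t>(buf.size()) != rows * cols) {
      throw std::logic_error("buffer is not a non-empty rows x cols matrix");
    }
  }
  int64_t operator()(int64_t r, int64_t c) const { return data_[r * cols_ + c]; }

 private:
  const int64_t* data_;
  int64_t cols_;
};

// Sum the values whose first sparse coordinate equals `row`.
inline double sum_row(const CooTensor& t, int64_t row) {
  // Short-circuit the degenerate case *before* creating the accessor.
  if (t.nnz == 0) return 0.0;
  Accessor2D idx(t.indices, t.sparse_dims, t.nnz);
  double acc = 0.0;
  for (int64_t k = 0; k < t.nnz; ++k) {
    if (idx(0, k) == row) acc += t.values[k];
  }
  return acc;
}
```

Moving the `Accessor2D` construction above the nnz check would make the empty case throw, which is the failure mode the short-circuit avoids.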

There are maybe a few more AT_ASSERTs in some of the kernels
than is wise.  I added them all when I was debugging and was
loath to remove them.

Some last mile fixes after this commit went into PR:

- Move expand outside of dispatch so autograd works (it used to be inside and then we lost all of the recorded broadcasts).
- Hack to duplicate the derivatives for our now two definitions TH and native. Mercifully the derivatives are short.
- Apparently, TH has a special case to make foo_ functions method only, and if you don't do this the Python arg parsing is wrong. We carefully work around this in the native bindings.
- Apply DCE to a test_jit case, fixes wobbling due to DCE trick in tracing
- Update test_function's output
- Some last mile fixes for dispatch confusion in sparse_coo_tensor functions.
- New simplified regression test based on failures I saw in ONNX
- Increase tolerance on super resolution test
- More robust dynamic_type normalization, fixes ONNX bug.
  The dynamic_type situation is very delicate; probably need
  to stop having both Scalar and real.
- Make new_with_tensor_sparse more CUDA safe
- Note about CUDA-safety in SparseTensorImpl
- Rename dimI/dimV to sparseDims/denseDims.
- Make localScalar on SparseTensorImpl work.
- Make numel uniformly supported on all types, not just dense
  types
- Add tests for is_nonzero() method (which exercises localScalar)
- Disable constant JIT autogenerated tests, which are fragile and broken
  by this change, but being fixed in a parallel track.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-06-15 17:52:21 -04:00

#pragma once
// Parse arguments to Python functions implemented in C++
// This is similar to PyArg_ParseTupleAndKeywords(), but specifically handles
// the types relevant to PyTorch and distinguishes between overloaded function
// signatures.
//
// Example:
//
//   static PythonArgParser parser({
//     "norm(Scalar p, int64_t dim, bool keepdim=False)",
//     "norm(Scalar p=2)",
//   });
//   ParsedArgs<3> parsed_args;
//   auto r = parser.parse(args, kwargs, parsed_args);
//   if (r.idx == 0) {
//     norm(r.scalar(0), r.toInt64(1), r.toBool(2));
//   } else {
//     norm(r.scalar(0));
//   }
//
// We auto-generate most uses of PythonArgParser; the generated files
// are torch/csrc/autograd/generated/python_*.cpp
//
// Some gotchas that you should watch out for:
//
// - Note [Order of overloads matters]
//   Order of overloads matters.  A set of input arguments may
//   bind to multiple argument specs; we will always pick the
//   first one in PythonArgParser.  However, when you are writing
//   overloads in, e.g., native_functions.yaml, you don't have to
//   worry about what order you write them, because the code
//   generation logic always gives the overloads a canonical
//   order, where Tensor overloads come first, before Scalar overloads.
//   This logic is in sort_declarations in
//   tools/autograd/gen_python_functions.py
//
// - Zero-dim tensors (e.g., torch.tensor(2)) bind to both
//   Scalar and Tensor, UNLESS they require grad (in which case
//   they only bind to Tensor).

#include "torch/csrc/python_headers.h"
#include <array>
#include <memory>
#include <stdexcept>
#include <string>
#include <sstream>
#include <vector>
#include <ATen/ATen.h>
#include "torch/csrc/Device.h"
#include "torch/csrc/Dtype.h"
#include "torch/csrc/DynamicTypes.h"
#include "torch/csrc/Exceptions.h"
#include "torch/csrc/Generator.h"
#include "torch/csrc/autograd/python_variable.h"
#include "torch/csrc/autograd/generated/VariableType.h"
#include "torch/csrc/jit/tracer.h"
#include "torch/csrc/tensor/python_tensor.h"
#include "torch/csrc/utils/device.h"
#include "torch/csrc/utils/object_ptr.h"
#include "torch/csrc/utils/python_numbers.h"
#include "torch/csrc/utils/python_strings.h"
#include "torch/csrc/utils/numpy_stub.h"

namespace torch {

enum class ParameterType {
  TENSOR, SCALAR, INT64, DOUBLE, TENSOR_LIST, INT_LIST, GENERATOR,
  BOOL, STORAGE, PYOBJECT, SCALARTYPE, LAYOUT, DEVICE, STRING
};

struct FunctionParameter;
struct FunctionSignature;
struct PythonArgs;

// Contains bound Python arguments in declaration order
template<int N>
struct ParsedArgs {
  PyObject* args[N];
};

struct PythonArgParser {
  explicit PythonArgParser(std::vector<std::string> fmts, bool traceable=false);

  template<int N>
  inline PythonArgs parse(PyObject* args, PyObject* kwargs, ParsedArgs<N>& dst);

private:
  [[noreturn]]
  void print_error(PyObject* args, PyObject* kwargs, PyObject* dst[]);
  PythonArgs raw_parse(PyObject* args, PyObject* kwargs, PyObject* dst[]);

  std::vector<FunctionSignature> signatures_;
  std::string function_name;
  ssize_t max_args;
  bool traceable;
};

struct PythonArgs {
  PythonArgs(int idx, bool traceable, const FunctionSignature& signature, PyObject** args)
    : idx(idx)
    , traceable(traceable)
    , signature(signature)
    , args(args) {}

  int idx;
  bool traceable;
  const FunctionSignature& signature;
  PyObject** args;

  inline at::Tensor tensor(int i);
  inline at::Scalar scalar(int i);
  inline at::Scalar scalarWithDefault(int i, at::Scalar default_scalar);
  inline std::vector<at::Tensor> tensorlist(int i);
  template<int N>
  inline std::array<at::Tensor, N> tensorlist_n(int i);
  inline std::vector<int64_t> intlist(int i);
  inline std::vector<int64_t> intlistWithDefault(int i, std::vector<int64_t> default_intlist);
  inline at::Generator* generator(int i);
  inline std::unique_ptr<at::Storage> storage(int i);
  inline at::ScalarType scalartype(int i);
  inline at::ScalarType scalartypeWithDefault(int i, at::ScalarType default_scalartype);
  inline at::optional<at::ScalarType> scalartypeOptional(int i);
  inline const THPLayout& layout(int i);
  inline const THPLayout& layoutWithDefault(int i, const THPLayout& default_layout);
  inline Device device(int i);
  inline Device deviceWithDefault(int i, const Device& default_device);
  inline int64_t deviceInt64(int i);
  inline at::optional<Device> deviceOptional(int i);
  inline std::string string(int i);
  inline PyObject* pyobject(int i);
  inline int64_t toInt64(int i);
  inline int64_t toInt64WithDefault(int i, int64_t default_int);
  inline double toDouble(int i);
  inline double toDoubleWithDefault(int i, double default_double);
  inline bool toBool(int i);
  inline bool toBoolWithDefault(int i, bool default_bool);
  inline bool isNone(int i);
};

struct FunctionSignature {
  explicit FunctionSignature(const std::string& fmt);

  bool parse(PyObject* args, PyObject* kwargs, PyObject* dst[], bool raise_exception);
  std::string toString() const;

  std::string name;
  std::vector<FunctionParameter> params;
  ssize_t min_args;
  ssize_t max_args;
  ssize_t max_pos_args;
  bool hidden;
  bool deprecated;
};

struct FunctionParameter {
  FunctionParameter(const std::string& fmt, bool keyword_only);

  bool check(PyObject* obj);
  void set_default_str(const std::string& str);
  std::string type_name() const;

  ParameterType type_;
  bool optional;
  bool allow_none;
  bool keyword_only;
  int size;
  std::string name;
  // having this as a raw PyObject * will presumably leak it, but these are only held by static objects
  // anyway, and Py_Finalize can already be called when this is destructed.
  PyObject *python_name;
  at::Scalar default_scalar;
  std::vector<int64_t> default_intlist;
  union {
    bool default_bool;
    int64_t default_int;
    double default_double;
    at::ScalarType default_scalartype;
    THPLayout* default_layout;
  };
};

template<int N>
inline PythonArgs PythonArgParser::parse(PyObject* args, PyObject* kwargs, ParsedArgs<N>& dst) {
  if (N < max_args) {
    throw ValueError("PythonArgParser: dst ParsedArgs buffer does not have enough capacity, expected %d (got %d)",
        (int)max_args, N);
  }
  return raw_parse(args, kwargs, dst.args);
}

inline at::Tensor PythonArgs::tensor(int i) {
  if (!args[i]) return at::Tensor();
  if (!THPVariable_Check(args[i])) {
    // NB: Are you here because you passed None to a Variable method,
    // and you expected an undefined tensor to be returned?   Don't add
    // a test for Py_None here; instead, you need to mark the argument
    // as *allowing none*; you can do this by writing 'Tensor?' instead
    // of 'Tensor' in the ATen metadata.
    throw TypeError("expected Tensor as argument %d, but got %s", i,
        Py_TYPE(args[i])->tp_name);
  }
  return reinterpret_cast<THPVariable*>(args[i])->cdata;
}

inline at::Scalar PythonArgs::scalar(int i) {
  return scalarWithDefault(i, signature.params[i].default_scalar);
}

inline at::Scalar PythonArgs::scalarWithDefault(int i, at::Scalar default_scalar) {
  if (!args[i]) return default_scalar;
  // Zero-dim tensors are converted to Scalars as-is. Note this doesn't currently
  // handle most NumPy scalar types except np.float64.
  if (THPVariable_Check(args[i])) {
    return at::Scalar(((THPVariable*)args[i])->cdata);
  }
  if (THPUtils_checkLong(args[i])) {
    return at::Scalar(static_cast<int64_t>(THPUtils_unpackLong(args[i])));
  }
  return at::Scalar(THPUtils_unpackDouble(args[i]));
}

inline std::vector<at::Tensor> PythonArgs::tensorlist(int i) {
  if (!args[i]) return std::vector<at::Tensor>();
  PyObject* arg = args[i];
  auto tuple = PyTuple_Check(arg);
  auto size = tuple ? PyTuple_GET_SIZE(arg) : PyList_GET_SIZE(arg);
  std::vector<at::Tensor> res(size);
  for (int idx = 0; idx < size; idx++) {
    PyObject* obj = tuple ? PyTuple_GET_ITEM(arg, idx) : PyList_GET_ITEM(arg, idx);
    if (!THPVariable_Check(obj)) {
      // Report the type of the offending element, not of the container.
      throw TypeError("expected Tensor as element %d in argument %d, but got %s",
          idx, i, Py_TYPE(obj)->tp_name);
    }
    res[idx] = reinterpret_cast<THPVariable*>(obj)->cdata;
  }
  return res;
}

template<int N>
inline std::array<at::Tensor, N> PythonArgs::tensorlist_n(int i) {
  auto res = std::array<at::Tensor, N>();
  PyObject* arg = args[i];
  if (!arg) return res;
  auto tuple = PyTuple_Check(arg);
  auto size = tuple ? PyTuple_GET_SIZE(arg) : PyList_GET_SIZE(arg);
  if (size != N) {
    throw TypeError("expected tuple of %d elements but got %d", N, (int)size);
  }
  for (int idx = 0; idx < size; idx++) {
    PyObject* obj = tuple ? PyTuple_GET_ITEM(arg, idx) : PyList_GET_ITEM(arg, idx);
    if (!THPVariable_Check(obj)) {
      // Report the type of the offending element, not of the container.
      throw TypeError("expected Tensor as element %d in argument %d, but got %s",
          idx, i, Py_TYPE(obj)->tp_name);
    }
    res[idx] = reinterpret_cast<THPVariable*>(obj)->cdata;
  }
  return res;
}

inline std::vector<int64_t> PythonArgs::intlist(int i) {
  return intlistWithDefault(i, signature.params[i].default_intlist);
}

inline std::vector<int64_t> PythonArgs::intlistWithDefault(int i, std::vector<int64_t> default_intlist) {
  if (!args[i]) return default_intlist;
  PyObject* arg = args[i];
  auto size = signature.params[i].size;
  if (size > 0 && THPUtils_checkLong(arg)) {
    return std::vector<int64_t>(size, THPUtils_unpackIndex(arg));
  }
  auto tuple = PyTuple_Check(arg);
  size = tuple ? PyTuple_GET_SIZE(arg) : PyList_GET_SIZE(arg);
  std::vector<int64_t> res(size);
  for (int idx = 0; idx < size; idx++) {
    PyObject* obj = tuple ? PyTuple_GET_ITEM(arg, idx) : PyList_GET_ITEM(arg, idx);
    try {
      // Elements of torch.Size are tensors during tracing, and we need to record extra
      // information before they are turned into an IntList
      if (traceable && THPVariable_Check(obj)) {
        auto & var = THPVariable_Unpack(obj);
        jit::tracer::ArgumentStash::stashIntListElem(
            signature.params[i].name, size, idx, var);
        res[idx] = var.toCLong();
        continue;
      } else {
        res[idx] = THPUtils_unpackIndex(obj);
      }
    } catch (std::runtime_error &e) {
      throw TypeError("%s(): argument '%s' must be %s, but found element of type %s at pos %d",
          signature.name.c_str(), signature.params[i].name.c_str(),
          signature.params[i].type_name().c_str(), Py_TYPE(obj)->tp_name, idx + 1);
    }
  }
  return res;
}

inline at::ScalarType PythonArgs::scalartypeWithDefault(int i, at::ScalarType default_scalartype) {
  if (!args[i]) return default_scalartype;
  return scalartype(i);
}

inline at::ScalarType PythonArgs::scalartype(int i) {
  if (!args[i]) {
    auto scalartype = signature.params[i].default_scalartype;
    return (scalartype == at::ScalarType::Undefined) ?
            torch::tensor::get_default_tensor_type().scalarType() : scalartype;
  }
  return reinterpret_cast<THPDtype*>(args[i])->scalar_type;
}

inline at::optional<at::ScalarType> PythonArgs::scalartypeOptional(int i) {
  if (!args[i]) return at::nullopt;
  return scalartype(i);
}

inline const THPLayout& PythonArgs::layout(int i) {
  if (!args[i]) return *signature.params[i].default_layout;
  return *reinterpret_cast<THPLayout*>(args[i]);
}

inline const THPLayout& PythonArgs::layoutWithDefault(int i, const THPLayout& default_layout) {
  if (!args[i]) return default_layout;
  return layout(i);
}

static std::string cuda_str = "cuda";
static std::string cpu_str = "cpu";
static std::string cuda_prefix = "cuda:";
static std::string cpu_prefix = "cpu:";

inline Device PythonArgs::device(int i) {
  if (!args[i]) {
    const auto& default_tensor_type = torch::tensor::get_default_tensor_type();
    const auto device_type = torch::getDeviceType(default_tensor_type);
    return Device(device_type, -1, true);
  }
  if (THPDevice_Check(args[i])) {
    auto device = reinterpret_cast<THPDevice*>(args[i]);
    return device->device;
  }
  if (THPUtils_checkLong(args[i])) {
    auto index = THPUtils_unpackLong(args[i]);
    return Device(DeviceType::CUDA, index, index == -1);
  }
  std::string device_str = THPUtils_unpackString(args[i]);
  if (device_str == cpu_str) {
    return Device(DeviceType::CPU, -1, true);
  } else if (device_str == cuda_str) {
    return Device(DeviceType::CUDA, -1, true);
  } else if (device_str.compare(0, cpu_prefix.length(), cpu_prefix) == 0) {
    auto device_index = std::stoi(device_str.substr(cpu_prefix.length()));
    return Device(DeviceType::CPU, device_index, false);
  } else if (device_str.compare(0, cuda_prefix.length(), cuda_prefix) == 0) {
    auto device_index = std::stoi(device_str.substr(cuda_prefix.length()));
    return Device(DeviceType::CUDA, device_index, false);
  }
  throw torch::TypeError("only \"cuda\" and \"cpu\" are valid device types, got %s", device_str.c_str());
}

inline Device PythonArgs::deviceWithDefault(int i, const Device& default_device) {
  if (!args[i]) return default_device;
  return device(i);
}

inline int64_t PythonArgs::deviceInt64(int i) {
  auto dev = device(i);
  return dev.deviceInt64();
}

inline at::optional<Device> PythonArgs::deviceOptional(int i) {
  if (!args[i]) return at::nullopt;
  return device(i);
}

inline std::string PythonArgs::string(int i) {
  if (!args[i]) return "";
  return THPUtils_unpackString(args[i]);
}

inline int64_t PythonArgs::toInt64(int i) {
  if (!args[i]) return signature.params[i].default_int;
  return THPUtils_unpackLong(args[i]);
}

inline int64_t PythonArgs::toInt64WithDefault(int i, int64_t default_int) {
  if (!args[i]) return default_int;
  return toInt64(i);
}

inline double PythonArgs::toDouble(int i) {
  if (!args[i]) return signature.params[i].default_double;
  return THPUtils_unpackDouble(args[i]);
}

inline double PythonArgs::toDoubleWithDefault(int i, double default_double) {
  if (!args[i]) return default_double;
  return toDouble(i);
}

inline bool PythonArgs::toBool(int i) {
  if (!args[i]) return signature.params[i].default_bool;
  return args[i] == Py_True;
}

inline bool PythonArgs::toBoolWithDefault(int i, bool default_bool) {
  if (!args[i]) return default_bool;
  return toBool(i);
}

inline bool PythonArgs::isNone(int i) {
  return args[i] == nullptr;
}

inline at::Generator* PythonArgs::generator(int i) {
  if (!args[i]) return nullptr;
  return reinterpret_cast<THPGenerator*>(args[i])->cdata;
}

inline std::unique_ptr<at::Storage> PythonArgs::storage(int i) {
  if (!args[i]) return nullptr;
  return createStorage(args[i]);
}

inline PyObject* PythonArgs::pyobject(int i) {
  if (!args[i]) return Py_None;
  return args[i];
}

} // namespace torch
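
The two gotchas in the header comment (the first matching overload wins, and zero-dim tensors bind to Scalar unless they require grad) can be illustrated with a toy matcher. This is a sketch under stated assumptions, not how PythonArgParser is implemented: `ArgKind`, `Arg`, `Overload`, `binds_tensor`, `binds_scalar`, and `first_match` are all hypothetical, and real signatures have many parameters rather than one.

```cpp
#include <string>
#include <vector>

// Hypothetical classification of a single Python argument.
enum class ArgKind { Tensor, ZeroDimTensor, Number };

struct Arg {
  ArgKind kind;
  bool requires_grad;
};

// A one-parameter overload: a display name plus a binding predicate.
struct Overload {
  std::string name;
  bool (*accepts)(const Arg&);
};

// Any tensor, zero-dim or not, binds to a Tensor parameter.
inline bool binds_tensor(const Arg& a) {
  return a.kind == ArgKind::Tensor || a.kind == ArgKind::ZeroDimTensor;
}

// Numbers bind to Scalar; zero-dim tensors do too, unless they require grad.
inline bool binds_scalar(const Arg& a) {
  if (a.kind == ArgKind::Number) return true;
  return a.kind == ArgKind::ZeroDimTensor && !a.requires_grad;
}

// Return the index of the first overload that accepts the argument, or -1.
inline int first_match(const std::vector<Overload>& overloads, const Arg& a) {
  for (int i = 0; i < static_cast<int>(overloads.size()); ++i) {
    if (overloads[i].accepts(a)) return i;
  }
  return -1;
}
```

With the Tensor overload listed first, a zero-dim tensor that does not require grad resolves to the Tensor overload; list the Scalar overload first and the same argument resolves to Scalar instead, which is why the code generator imposes a canonical order.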