pytorch/c10/core/TensorImpl.cpp
Edward Yang aa49aa856c Tensor type set (#25308)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25308

Instead of storing a single TensorTypeId in a Tensor, we store a bitset of tensor type IDs in a Tensor, TensorTypeSet. This class comes with some unit tests.  This is in preparation for making Variable a TensorTypeId. In order to help flush out places where this makes a semantic difference, we rename `Tensor::type_id()` to `Tensor::type_set()` and smoke out all of the locations where this was semantically meaningful.

Because the new tensor type set is 64 bits, this increases the size of Tensor by one word.
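
For intuition, the new representation is essentially a 64-bit bitset keyed by TensorTypeId. The following is only a rough sketch of the idea (it is not the actual c10 class; the enum values shown and the exact member names are assumptions for illustration):

```
#include <cstdint>

// Illustrative sketch of a 64-bit set of tensor type IDs; the real c10 class
// has more machinery. Undefined is conventionally represented by the empty set.
enum class TensorTypeId : std::uint8_t {
  UndefinedTensorId = 0,
  CPUTensorId,
  CUDATensorId,
  SparseCPUTensorId,
  SparseCUDATensorId,
  VariableTensorId,
  // ...
};

class TensorTypeSet {
 public:
  TensorTypeSet() : repr_(0) {}
  explicit TensorTypeSet(TensorTypeId t)
      : repr_(t == TensorTypeId::UndefinedTensorId
                  ? 0
                  : 1ULL << (static_cast<std::uint8_t>(t) - 1)) {}
  // Set-inclusion test: is type ID `t` a member of this set?
  bool has(TensorTypeId t) const {
    return (repr_ & TensorTypeSet(t).repr_) != 0;
  }
  bool empty() const { return repr_ == 0; }
  // Union, e.g. to add VariableTensorId on top of CPUTensorId.
  TensorTypeSet operator|(TensorTypeSet other) const {
    TensorTypeSet r;
    r.repr_ = repr_ | other.repr_;
    return r;
  }

 private:
  std::uint64_t repr_;
};
```

A Tensor then stores one of these sets in place of its single TensorTypeId, which is where the extra word comes from.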

Listing of semantic changes:
* Many TensorImpl-related constructors just propagate TensorTypeId to a parent constructor. These are pretty simple to adjust.
  * Backend extensions are now in the business of explicitly constructing a TensorTypeSet and then passing it in. This is probably OK for now but when Variable drops, these dispatch IDs may get immediately overwritten to have Variable set.
* `sparseTensorSetToDeviceType` and similar functions previously did an equality test against a TensorTypeId to determine the appropriate device type. This equality test is now replaced with a set inclusion test (see the sketch after this list). This is valid under the assumption that we never have weird sets like "this tensor is simultaneously a sparse CPU tensor and a sparse CUDA tensor", which remains true under the short-term plan of adding Variable to the dispatch IDs.
* `impl::dispatchTypeId` was generally introduced for cases where we legitimately need to convert from `TensorTypeSet -> TensorTypeId` in a dispatch-related manner. At the moment, the implementation is trivial, but it will soon be adjusted to handle TLS. I've tried to make these call sites as forward compatible as possible:
  * `checked_tensor_unwrap` and co now use `dispatchTypeId`. When Variable is added to the type set, these will always be called in a context where the Variable type ID is disabled, so we will get the correct underlying tensor type ID.
  * Uses of `Backend` in dispatch are now replaced with `TensorTypeSet`. The general heuristic here for whether or not to accept a `TensorTypeId` or `TensorTypeSet` is that we want to make the generated code as simple as possible. It is easier to retrieve a `TensorTypeSet`, so that's a more appropriate API in these cases.
* In some cases, I could not conveniently switch an implementation to the new semantics because it was blocked on some other refactor. In those cases, I introduced `legacyExtractTypeId`, a BC-compatible `TensorTypeSet` to `TensorTypeId` conversion that continues to report the same values it would have reported prior to this change. This is **different** from `dispatchTypeId`, because `legacyExtractTypeId` does NOT respect TLS; it always ignores Variable type IDs.
  * c10 dispatcher tests, which are oblivious to Variable dispatch, use this BC function (actually, they use `extractTypeId`, an overload for Tensor).
  * The implementation of the `new_*` methods relies heavily on the tensor type ID; I chose not to unwind this here. A PR to refactor this is at https://github.com/pytorch/pytorch/pull/25475
  * Slicing also relies on the tensor type ID; see `torch/csrc/autograd/python_variable_indexing.cpp` (though in some cases in this file, I was able to replace uses of the tensor type ID with TensorOptions).
* In some cases, there is an equality test on tensor type ID which would be better done by testing "tensor axes". In those cases, I simply replaced those equality tests with more equality tests rather than introducing a proper axis test.
  * Example: `torch/csrc/nn/type_checks.h`
  * There is a total punt in `torch/csrc/tensor/python_tensor.cpp` where "instance of" checking is done via dispatch IDs. In general, the Variable-ness of a tensor doesn't participate in instanceof testing. It's not entirely clear what to do here.
  * Instead of storing `Backend` in `VariableInfo`, we now just store Layout.
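
As referenced in the `sparseTensorSetToDeviceType` item above, here is a hedged before/after illustration of the equality-to-set-inclusion change. It builds on the `TensorTypeSet` sketch earlier, uses a stand-in `DeviceType` enum rather than the real `c10::DeviceType`, and the real function's exact shape and error handling differ:

```
#include <stdexcept>

enum class DeviceType { CPU, CUDA };  // stand-in for the real c10::DeviceType

// Before this PR: the tensor carried exactly one type ID, so an equality test
// sufficed, e.g.
//   if (type_id == TensorTypeId::SparseCPUTensorId) return DeviceType::CPU;

// After this PR: the tensor carries a set of type IDs, so we test membership
// instead. This stays correct as long as no set ever contains both the sparse
// CPU and the sparse CUDA bits at the same time.
DeviceType sparseTensorSetToDeviceType(TensorTypeSet ts) {
  if (ts.has(TensorTypeId::SparseCPUTensorId)) {
    return DeviceType::CPU;
  }
  if (ts.has(TensorTypeId::SparseCUDATensorId)) {
    return DeviceType::CUDA;
  }
  throw std::runtime_error("cannot infer a device type from this tensor type set");
}
```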

c10 dispatcher test updates were done with:

```
:%s/\([^ ]\+\)\.type_id()/extractTypeId(\1)/g
:%s/\([^( ]\+\)->type_id()/extractTypeId(*\1)/g
```
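
For example, these substitutions rewrite `t.type_id()` as `extractTypeId(t)` and `t_ptr->type_id()` as `extractTypeId(*t_ptr)` (the variable names here are just placeholders).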

Pull Request resolved: https://github.com/pytorch/pytorch/pull/25308

Differential Revision: D17092791

Test Plan: sandcastle and ossci

Reviewed By: bwasti

Pulled By: ezyang

fbshipit-source-id: 22207d14fe62dd31ee19cc5011af22e3d9aabb5b
2019-09-10 10:30:54 -07:00


#include <c10/core/TensorImpl.h>

#include <c10/core/Backend.h>
#include <c10/core/WrapDimMinimal.h>
#include <c10/util/Optional.h>

C10_DEFINE_bool(
    caffe2_keep_on_shrink,
    true,
    "If set, keeps memory when a tensor is shrinking its size.");

C10_DEFINE_int64(
    caffe2_max_keep_on_shrink_memory,
    LLONG_MAX,
    "The maximum memory in bytes to keep on shrink, if the difference between "
    "tensor sizes is bigger than this then tensor will be reset.");
namespace c10 {

const char * const TensorImpl::err_msg_tensor_metadata_change_not_allowed =
    "is not allowed on a Tensor created from .data or .detach().\n"
    "If your intent is to change the metadata of a Tensor (such as sizes / strides / storage / storage_offset)\n"
    "without autograd tracking the change, remove the .data / .detach() call and wrap the change in a `with torch.no_grad():` block.\n"
    "For example, change:\n"
    "    x.data.set_(y)\n"
    "to:\n"
    "    with torch.no_grad():\n"
    "        x.set_(y)";

at::Tensor& TensorImpl::grad() {
  if (autograd_meta()) {
    return autograd_meta()->grad();
  } else {
    AT_ERROR("grad is not implemented for Tensor");
  }
}

const at::Tensor& TensorImpl::grad() const {
  if (autograd_meta()) {
    return autograd_meta()->grad();
  } else {
    AT_ERROR("grad is not implemented for Tensor");
  }
}

TensorImpl::TensorImpl(Storage&& storage, TensorTypeSet type_set)
    : TensorImpl(std::move(storage), type_set, storage.dtype(), storage.device()) {}

TensorImpl::TensorImpl(TensorTypeSet type_set, const caffe2::TypeMeta& data_type, c10::optional<c10::Device> device_opt)
    : TensorImpl({}, type_set, data_type, std::move(device_opt)) {}

TensorImpl::TensorImpl(Storage&& storage, TensorTypeSet type_set, const caffe2::TypeMeta& data_type,
                       c10::optional<c10::Device> device_opt)
    : storage_(std::move(storage)),
      sizes_{0},
      storage_offset_(0),
      numel_(0),
      data_type_(data_type),
      device_opt_(device_opt),
      type_set_(type_set) {
  if (!type_set.empty()) {
    AT_ASSERT(data_type.id() == caffe2::TypeIdentifier::uninitialized() ||
              device_opt_.has_value());
    // UndefinedTensorImpl is a singleton, so we skip logging it
    C10_LOG_API_USAGE_ONCE("tensor.create");
  }
  // we would also like to check that non-cpu devices have an index, but some Caffe2 operators create
  // Storages with default devices.
  strides_.push_back(1);
}

IntArrayRef TensorImpl::sizes() const {
  return sizes_;
}

IntArrayRef TensorImpl::strides() const {
  return strides_;
}
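
// True iff the tensor is laid out as a dense, row-major block: walking the
// dimensions from innermost to outermost, each stride must equal the running
// product of the inner sizes (dimensions of size 1 are skipped).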
bool TensorImpl::compute_contiguous() const {
  bool is_contiguous = true;
  if (is_empty())
    return is_contiguous;
  int64_t z = 1;
  for (int64_t d = dim() - 1; d >= 0; d--) {
    if (size(d) != 1) {
      if (stride(d) == z) {
        z *= size(d);
      } else {
        is_contiguous = false;
        break;
      }
    }
  }
  return is_contiguous;
}
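
// True iff this is a 4-D tensor whose strides are contiguous in channels-last
// (NHWC) order: the dimensions are checked innermost-to-outermost as C (1),
// W (3), H (2), N (0), skipping dimensions of size 1.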
bool TensorImpl::compute_channels_last_contiguous() const {
  if (dim() == 4) {
    int64_t expected = 1;
    for (auto& d : {1, 3, 2, 0}) {
      if (size(d) != 1) {
        if (stride(d) == expected) {
          expected *= size(d);
        } else {
          return false;
        }
      }
    }
    return true;
  }
  return false;
}
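
// True iff the strides of a 4-D tensor are strictly increasing when the
// dimensions are visited in channels-last order (C, W, H, N), ignoring
// dimensions of size 1; i.e. the layout is "NHWC-like" even if it is not
// exactly channels-last contiguous.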
bool TensorImpl::compute_strides_like_channels_last() const {
  if (dim() == 4) {
    int64_t min = 0;
    for (auto& d : {1, 3, 2, 0}) {
      if (size(d) != 1) {
        if (stride(d) > min) {
          min = stride(d);
        } else {
          return false;
        }
      }
    }
    return true;
  }
  return false;
}

void TensorImpl::release_resources() {
  autograd_meta_.reset();
  if (storage_) {
    storage_ = {};
  }
}

int64_t TensorImpl::dim() const {
  return sizes_.size();
}

int64_t TensorImpl::size(int64_t d) const {
  d = at::maybe_wrap_dim(d, dim(), false);
  return sizes_[d];
}

int64_t TensorImpl::stride(int64_t d) const {
  d = at::maybe_wrap_dim(d, dim(), false);
  return strides_[d];
}
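
// If `condition_when_zero_dim` holds and this is a 1-D tensor with a single
// element, reinterpret it as a 0-dimensional (scalar) tensor; otherwise leave
// the tensor unchanged. Returns `this` either way.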
TensorImpl* TensorImpl::maybe_zero_dim(bool condition_when_zero_dim) {
  bool set_zero_dim = condition_when_zero_dim && this->sizes().size() == 1 && this->size(0) == 1;
  if (set_zero_dim) {
    resize_dim(0);
  }
  return this;
}

bool TensorImpl::has_storage() const {
  return storage_;
}

bool TensorImpl::is_contiguous(at::MemoryFormat memory_format) const {
#ifdef DEBUG
  AT_ASSERT(compute_contiguous() == is_contiguous_);
#endif
  if (memory_format == at::MemoryFormat::ChannelsLast) {
    return is_channels_last_contiguous_;
  }
  return is_contiguous_;
}

const Storage& TensorImpl::storage() const {
  return storage_;
}
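
// PlacementDeleteContext pairs a DataPtr with a placement destructor. When the
// DataPtr returned by makeDataPtr() below is destroyed, the deleter frees the
// context, whose own destructor runs the placement destructor over the
// allocation before the underlying DataPtr releases the memory.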
static void deletePlacementDeleteContext(void* ptr) {
  delete static_cast<PlacementDeleteContext*>(ptr);
}

at::DataPtr PlacementDeleteContext::makeDataPtr(
    at::DataPtr&& data_ptr,
    PlacementDtor placement_dtor,
    size_t size,
    at::Device device) {
  auto* ptr = data_ptr.get();
  return {ptr,
          new PlacementDeleteContext(std::move(data_ptr), placement_dtor, size),
          &deletePlacementDeleteContext,
          device};
}

AutogradMetaInterface::~AutogradMetaInterface() {}

#ifdef BUILD_NAMEDTENSOR
NamedTensorMetaInterface::~NamedTensorMetaInterface() {}

std::unique_ptr<NamedTensorMetaInterface> NamedTensorMetaInterface::clone() const {
  TORCH_INTERNAL_ASSERT(
      false,
      "Attempting to clone a NamedTensorMetaInterface instance.");
}

int64_t NamedTensorMetaInterface::slow_dim() const {
  TORCH_INTERNAL_ASSERT(
      false,
      "NamedTensorMetaInterface::slow_dim not implemented.");
}
#endif

/// NOTE [ Treating Variables as non-Variables in type dispatch ]
///
/// Previously, in VariableType_*.cpp (generated by gen_variable_type.py), when
/// a function is using the 'use_derived' strategy, we call its implementation
/// on the base non-Variable type (`baseType`), passing unwrapped tensors to the
/// call so that any `.dispatch_type()` calls in the implementation can treat the passed
/// tensors as non-Variables and won't dispatch back to functions in VariableType.
///
/// However, after the Variable/Tensor merge, there is no concept of unwrapping
/// a tensor anymore, and directly passing variables to the base type calls will
/// cause the `.dispatch_type()` dispatch in the implementation to treat the tensor as a
/// variable, and any function dispatch based on `.dispatch_type()` will dispatch back to
/// VariableType, which is not what we want.
///
/// The solution to the above problem is to add `at::NonVariableTypeMode`, which
/// when enabled will cause `legacyTensorType()` and `getType()` to always return
/// non-Variable type, even if the tensor being called on is a variable.
///
/// TODO: Since `torch::NoGradGuard` serves the same purpose in libtorch, we should
/// merge these two thread-local guards.
/// In the CAFFE2_FB_LIMITED_MOBILE_CAPABILITY build setting,
/// thread_local is not supported. In that case, we don't provide
/// `at::NonVariableTypeMode`.
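///
/// An illustrative (not actual library code) use of the guard by dispatch code:
///
///   bool prev = at::NonVariableTypeMode::is_enabled();
///   at::NonVariableTypeMode::set_enabled(true);   // treat tensors as non-Variables
///   // ... call the base (non-Variable) implementation ...
///   at::NonVariableTypeMode::set_enabled(prev);   // restore the previous mode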
#ifndef CAFFE2_FB_LIMITED_MOBILE_CAPABILITY

thread_local bool NonVariableTypeMode_enabled = false;

bool NonVariableTypeMode::is_enabled() {
  return NonVariableTypeMode_enabled;
}

void NonVariableTypeMode::set_enabled(bool enabled) {
  NonVariableTypeMode_enabled = enabled;
}

#else // defined(CAFFE2_FB_LIMITED_MOBILE_CAPABILITY)

bool NonVariableTypeMode::is_enabled() {
  throw std::runtime_error("NonVariableTypeMode is not supported on mobile");
}

void NonVariableTypeMode::set_enabled(bool enabled) {
  throw std::runtime_error("NonVariableTypeMode is not supported on mobile");
}

#endif

} // namespace c10