pytorch/c10/core/UndefinedTensorImpl.cpp
Scott Wolchok 059ee85ca4 [PyTorch] Devirtualize TensorImpl::storage() (#51050)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51050

Subclasses want to be able to make storage() calls throw. Rather than
making storage() virtual, we use some free space in TensorImpl to add a
flag that subclasses can set to get that behavior, so storage() stays
non-virtual and should still be inlineable.
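
Roughly, the idea looks like the following sketch (the names and layout here are illustrative stand-ins, not the exact code that landed):

```cpp
#include <stdexcept>

// Illustrative stand-in; the real c10::Storage and error plumbing differ.
struct Storage {};

struct TensorImpl {
  // Non-virtual and tiny, so call sites can still inline it.
  const Storage& storage() const {
    if (storage_access_should_throw_) {
      throw std::runtime_error("Cannot access storage of this TensorImpl");
    }
    return storage_;
  }

 protected:
  // Subclasses such as UndefinedTensorImpl flip the flag in their constructor.
  void set_storage_access_should_throw() {
    storage_access_should_throw_ = true;
  }

 private:
  Storage storage_;
  bool storage_access_should_throw_ = false; // tucked into existing free space
};
```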
ghstack-source-id: 121819684

Test Plan:
Compared `perf stat` results for 1M iterations of the AdIndexer benchmark before/after this change.

Before:
```
         74,483.15 msec task-clock                #    0.999 CPUs utilized            ( +-  0.14% )
            16,637      context-switches          #    0.223 K/sec                    ( +- 11.97% )
                 3      cpu-migrations            #    0.000 K/sec                    ( +-  7.20% )
           107,085      page-faults               #    0.001 M/sec                    ( +-  2.39% )
   147,356,440,831      cycles                    #    1.978 GHz                      ( +-  0.14% )  (50.06%)
   278,678,430,378      instructions              #    1.89  insn per cycle           ( +-  0.01% )  (50.05%)
    43,540,698,177      branches                  #  584.571 M/sec                    ( +-  0.01% )  (50.05%)
       141,028,843      branch-misses             #    0.32% of all branches          ( +-  1.00% )  (50.05%)

```

After:
```
         74,178.77 msec task-clock                #    0.999 CPUs utilized            ( +-  0.31% )
            17,125      context-switches          #    0.231 K/sec                    ( +-  3.41% )
                 3      cpu-migrations            #    0.000 K/sec
           109,535      page-faults               #    0.001 M/sec                    ( +-  1.04% )
   146,803,364,372      cycles                    #    1.979 GHz                      ( +-  0.30% )  (50.03%)
   277,726,600,254      instructions              #    1.89  insn per cycle           ( +-  0.02% )  (50.03%)
    43,299,659,815      branches                  #  583.720 M/sec                    ( +-  0.03% )  (50.03%)
       130,504,094      branch-misses             #    0.30% of all branches          ( +-  1.14% )  (50.03%)

```

Looks like an approximately 0.3% instruction-count win (cycles improve similarly, but that difference is within noise).

Reviewed By: ezyang

Differential Revision: D26013815

fbshipit-source-id: 07939957929070e18b9981d492d8279c9bb33c55
2021-02-17 11:48:06 -08:00

#include <c10/core/UndefinedTensorImpl.h>
#include <c10/util/Exception.h>

namespace c10 {

// should this use the globalContext? Can it get a context passed in somehow?
UndefinedTensorImpl::UndefinedTensorImpl()
    : TensorImpl(DispatchKey::Undefined, caffe2::TypeMeta(), c10::nullopt) {
  set_storage_access_should_throw();
}

int64_t UndefinedTensorImpl::size(int64_t d) const {
  AT_ERROR("size(dim) called on an undefined Tensor");
}

int64_t UndefinedTensorImpl::stride(int64_t d) const {
  AT_ERROR("stride(dim) called on an undefined Tensor");
}

#ifdef DEBUG
bool UndefinedTensorImpl::has_storage() const {
  TORCH_INTERNAL_ASSERT_DEBUG_ONLY(
      !storage_, "UndefinedTensorImpl assumes that storage_ is never set");
  return false;
}
#endif

void UndefinedTensorImpl::set_storage_offset(int64_t) {
  AT_ERROR("set_storage_offset() called on an undefined Tensor");
}

IntArrayRef UndefinedTensorImpl::strides() const {
  AT_ERROR("strides() called on undefined Tensor");
}

const char* UndefinedTensorImpl::tensorimpl_type_name() const {
  return "UndefinedTensorImpl";
}

UndefinedTensorImpl UndefinedTensorImpl::_singleton;

} // namespace c10
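
For reference, a small usage sketch (assumes a libtorch/ATen build; not part of this file): a default-constructed at::Tensor is backed by the UndefinedTensorImpl singleton, so storage() is expected to report an error via the flag set in the constructor above rather than through a virtual override.

```cpp
#include <ATen/ATen.h>
#include <iostream>

int main() {
  at::Tensor t; // undefined: backed by the UndefinedTensorImpl singleton
  std::cout << "defined: " << t.defined() << "\n"; // prints "defined: 0"
  try {
    t.storage(); // storage access is flagged to throw for undefined tensors
  } catch (const c10::Error&) {
    std::cout << "storage() threw as expected\n";
  }
  return 0;
}
```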