pytorch/torch/csrc/jit/mobile/interpreter.cpp
Kimish Patel d6d726f781 [Pytorch Backend delegation] Add api for backend lowering to query debug (#55462)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55462

handles and symbolicate exception callstack thrown from backend.

Objective of this diff is to achieve improve error reporting when
exceptions are raised from lowered backend. We would effectively like to
get the same model level stack trace that you would get without having
lowered some module to backend.

For example:
```
class AA(nn.Module):
  def forward(self, x, y):
    return x + y

class A(nn.Module):
  def __init__(...):
    self.AA0 = AA()
  def forward(self, x, y):
    return self.AA0.forward(x, y) + 3

class B(nn.Module):
  def forward(self, x):
    return x + 2

class C(nn.Module):
  def __init__(...):
    self.A0 = A()
    self.B0 = B()
  def forward(self, x, y):
    return self.A0.forward(x, y) + self.B0.forward(x)
```
If the we then do C().forward(torch.rand((2,3)), torch.rand(14,2))) we
will likely see error stack like:
```
C++ exception with description "The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
  File "<string>", line 3, in forward

    def forward(self, x, y):
      return self.A0.forward(x, y) + self.B0.forward(x)
             ~~~~~~~~~~~~~~~ <--- HERE

  File "<string>", line 3, in forward

    def forward(self, x, y):
      return self.AA0.forward(x, y) + 3
             ~~~~~~~~~~~~~~~~ <--- HERE

  File "<string>", line 3, in forward

    def forward(self, x, y):
      return x + y
             ~~~~~ <--- HERE
```

We would like to see the same error stack if we lowered C.A0 to some
backend.

With this diff we get something like:
```
  Module hierarchy:top(C).A0(backend_with_compiler_demoLoweredModule).AA0(AA)
Traceback of TorchScript (most recent call last):
  File "<string>", line 3, in FunctionName_UNKNOWN

    def forward(self, x, y):
      return self.A0.forward(x, y) + self.B0.forward(x)
             ~~~~~~~~~~~~~~~ <--- HERE

  File "<string>", line 5, in FunctionName_UNKNOWN
                typed_inputs: List[Any] = [x, y, ]
                if self.__backend.is_available() :
                  _0, = self.__backend.execute(self.__handles["forward"], typed_inputs)
                        ~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
                  assert isinstance(_0, Tensor)
                  return _0
  File "<string>", line 3, in FunctionName_UNKNOWN

    def forward(self, x, y):
      return self.AA0.forward(x, y) + 3
             ~~~~~~~~~~~~~~~~ <--- HERE

  File "<string>", line 3, in FunctionName_UNKNOWN

    def forward(self, x, y):
      return x + y
             ~~~~~ <--- HERE
```
This is achieved in 3 parts:
Part 1:
A. BackendDebugInfoRecorder:
   During backend lowering, in `to_backend`, before calling the preprocess
   function corresponding to the backend. This will facilitate recording of
   debug info (such as source range + inlined callstack) for the lowered module.
B. Instantiate WithBackendDebugInfoRecorder with BackendDebugInfoRecorder.
   This initializes thread local pointer to BackendDebugInfoRecorder.
C. generate_debug_handles:
   In preprocess function, the backend will call generate_debug_handles
   for each method being lowered separately. generate_debug_handles
   takes `Graph` of the method being lowered and returns a map
   of Node*-to-debug_handles. Backend is responsible for storing debug
   handles appropriately so as to raise exception (and later profiling)
   using debug handles when the exception being raised corresponds to
   particular Node that was lowered.
   Inside generate_debug_handles, we will query the current
   BackendDebugHandleInfoRecorder, that is issuing debug handles. This debug
   handle manager will issue debug handles as well as record
   debug_handles-to-<source range, inlined callstack> map.
D. Back in `to_backend`, once the preprocess function is has finished
   lowering the module, we will call `stopRecord` on
   BackendDebugInfoRecorder. This will return the debug info map. This
   debug info is then stored inside the lowered module.

Part 2:
Serialization:
During serialization for bytecode (lite interpreter), we will do two
things:
1. Extract all the source ranges that are contained inside
debug_handles-to-<source range, inlined callstack> map for lowered
module. This will be source range corresponding to debug handles,
including what is there is inlined callstack. Since we replaced original
module with lowered module, we wont be serializing code for the original
module and thus no source range. That is why the source range will have
to be stored separately. We will lump all the source ranges for all the
lowered modules in one single debug_pkl file.
2. Then we will serialize debug_handles-to-<source range, inlined
callstack> map.

Now during deserialization we will be able to reconstruct
debug_handles-to-<source range, inlined callstack> map. Given all
debug_handles are unique we would not need any module information.

Test Plan:
Tests are added in test_backend.cpp

Tests are added in test_backend.cpp

Imported from OSS

Differential Revision:
D27621330
D27621330

Reviewed By: raziel

Pulled By: kimishpatel

fbshipit-source-id: 0650ec68cda0df0a945864658cab226a97ba1890
2021-05-22 08:33:07 -07:00

266 lines
8.3 KiB
C++

#include <torch/csrc/jit/mobile/interpreter.h>
#include <ATen/core/function.h>
#include <ATen/core/jit_type.h>
#include <ATen/core/operator_name.h>
#include <torch/csrc/jit/mobile/function.h>
#include <torch/csrc/jit/runtime/jit_exception.h>
#include <torch/csrc/jit/runtime/vararg_functions.h>
#include <ATen/record_function.h>
#include <torch/csrc/jit/mobile/observer.h>
#include <torch/csrc/jit/backends/backend_exception.h>
namespace torch {
namespace jit {
char const* toString(OpCode op);
std::ostream& operator<<(std::ostream& out, Instruction inst);
namespace mobile {
InterpreterState::InterpreterState(std::shared_ptr<Code> code)
: code_(std::move(code)) {
registers_.resize(code_->register_size_);
}
namespace {
// NOLINTNEXTLINE(cppcoreguidelines-avoid-non-const-global-variables)
static thread_local int64_t exception_pc_{-1};
void createObject(Stack& stack, const at::ClassTypePtr& type) {
auto userObj = c10::ivalue::Object::create(
c10::StrongTypePtr(type->compilation_unit(), type),
type->numAttributes());
push(stack, std::move(userObj));
}
void isinstance(Stack& stack, at::ArrayRef<at::TypePtr> types) {
at::TypePtr ty = pop(stack).type();
for (const at::TypePtr& candidate : types) {
if (ty->isSubtypeOf(candidate)) {
push(stack, true);
return;
}
}
push(stack, false);
}
} // namespace
using namespace at;
int64_t getInterpretersExceptionPC() {
return exception_pc_;
}
bool InterpreterState::run(Stack& stack) {
size_t pc = 0;
while (true) {
try {
Instruction inst = code_->instructions_[pc];
// std::cout << "RUNNING " << pc << " " << code_->instructions_[pc];
// if (inst.op == OP) {
// std::cout << ", " << code_->op_names_[inst.X].name;
// if (!code_->op_names_[inst.X].overload_name.empty()) {
// std::cout << "." << code_->op_names_[inst.X].overload_name;
// }
// }
// std::cout << std::endl;
switch (inst.op) {
case OP: {
if (at::hasGlobalCallbacks()) {
if (auto* mobile_debug_info = static_cast<MobileDebugInfo*>(
c10::ThreadLocalDebugInfo::get(
c10::DebugInfoKind::MOBILE_RUNTIME_INFO))) {
mobile_debug_info->setOpIdx(pc);
}
}
// TODO(iliacher): remove the workaround after RecordFunction is in
// Dispatcher
bool prev_value = isRecordFunctionEnabled();
if (!prev_value) {
// enable only for the RecordFunction
enableRecordFunction(true);
}
RECORD_USER_SCOPE_WITH_INPUTS(code_->op_names_[inst.X].name, stack);
if (!prev_value) {
enableRecordFunction(false);
}
code_->operators_[inst.X](stack);
++pc;
} break;
case OPN: {
stack.push_back(inst.N);
code_->operators_[inst.X](stack);
++pc;
} break;
case INTERFACE_CALL: {
torch::jit::Function& method =
peek(stack, 0, inst.N)
.toObject()
->type()
->getMethod(code_->constants_[inst.X].toStringRef());
method.run(stack);
++pc;
} break;
case LOAD:
stack.emplace_back(reg(inst.X));
++pc;
break;
case MOVE:
stack.emplace_back(std::move(reg(inst.X)));
++pc;
break;
case STORE:
reg(inst.X) = pop(stack);
++pc;
break;
case STOREN:
for (size_t i = inst.N; i > 0; --i) {
reg(inst.X + i - 1) = pop(stack);
}
++pc;
break;
case DROP:
pop(stack);
++pc;
break;
case DROPR:
reg(inst.X) = IValue();
++pc;
break;
case LOADC:
stack.emplace_back(code_->constants_[inst.X]);
++pc;
break;
case GET_ATTR: {
auto userObj = pop(stack).toObject();
auto value = userObj->getSlot(inst.X);
push(stack, std::move(value));
++pc;
} break;
case SET_ATTR: {
auto v = pop(stack);
auto userObj = pop(stack).toObject();
// Mobile only: since the number of slots is not known, resize the
// numAttributes before setSlot.
// NOLINTNEXTLINE(clang-diagnostic-sign-compare)
while (userObj->type()->numAttributes() <= inst.X) {
std::stringstream ss;
ss << userObj->type()->numAttributes();
userObj->type()->addAttribute(ss.str(), c10::NoneType::get());
}
userObj->setSlot(inst.X, std::move(v));
++pc;
} break;
case JF:
pc += (pop(stack).toBool()) ? 1 : inst.X;
break;
case JMP:
pc += inst.X;
break;
case LOOP: {
// stack: iteration_count, max_iter, cond, loop_carried_deps...
auto frame = stack.end() - (inst.N + 1);
int64_t trip_count = frame[0].toInt();
int64_t max_trip_count = frame[1].toInt();
bool cond = frame[2].toBool();
if (trip_count < max_trip_count && cond) {
frame[2] = trip_count;
frame[0] = trip_count + 1;
++pc;
} else {
size_t n_loop_carried = inst.N - 2;
for (size_t i = 0; i < n_loop_carried; ++i) {
frame[i] = std::move(frame[i + 3]);
}
drop(stack, 3); // iteration_count, max_iter, cond
pc += inst.X;
}
} break;
case RET:
return false;
case LIST_CONSTRUCT: {
const auto& type = code_->types_[inst.X]->expectRef<at::ListType>();
listConstruct(stack, type, inst.N);
++pc;
} break;
case LIST_UNPACK: {
listUnpack(stack, inst.X);
++pc;
} break;
case TUPLE_CONSTRUCT: {
tupleConstruct(stack, inst.X);
++pc;
} break;
case TUPLE_SLICE: {
tupleSlice(stack, inst.X, inst.X + inst.N);
++pc;
} break;
case DICT_CONSTRUCT: {
const auto& type = code_->types_[inst.X]->expectRef<at::DictType>();
dictConstruct(stack, type, inst.N);
++pc;
} break;
case NAMED_TUPLE_CONSTRUCT: {
namedTupleConstruct(
stack, code_->types_[inst.X]->expect<at::TupleType>(), inst.N);
++pc;
} break;
case CREATE_OBJECT: {
auto type = code_->types_[inst.X]->expect<c10::ClassType>();
createObject(stack, type);
++pc;
} break;
case ISINSTANCE: {
at::ArrayRef<TypePtr> types(
&(code_->types_[inst.X]), &(code_->types_[inst.X + inst.N]));
isinstance(stack, types);
++pc;
} break;
case WARN: {
drop(stack, 1);
// Note: Please don't move the pop(stack) code below into the
// TORCH_WARN macro since TORCH_WARN fails to evaluate its arguments
// when STRIP_ERROR_MESSAGES is defined (which happens for production
// mobile builds). This will cause the stack to be in an inconsistent
// state. It has previously resulted in a SEV (S22350).
const auto& sref = stack.back().toStringRef();
TORCH_WARN(sref);
stack.pop_back();
++pc;
} break;
default:
AT_ERROR(toString(inst.op), " is invalid.");
}
// This exception must be caught first as it derived from c10::Error
} catch (c10::BackendRuntimeException& e) {
exception_pc_ = pc;
TORCH_RETHROW(e);
} catch (c10::Error& error) {
// Reason for catching and rethrowing the error is so that we can
// set the exception pc that is queried later
exception_pc_ = pc;
TORCH_RETHROW(error);
} catch (...) {
exception_pc_ = pc;
throw;
}
// for (auto val : stack) {
// if (val.isTensor()) {
// std::cout << val.toTensor().sizes() << std::endl;
// } else {
// std::cout << val << std::endl;
// }
// }
}
return false;
}
IValue& InterpreterState::reg(size_t reg) {
return *(registers_.end() - reg);
}
} // namespace mobile
} // namespace jit
} // namespace torch