pytorch/c10/core/impl/PythonDispatcherTLS.h
Edward Z. Yang 490727a35f New calling convention for Python dispatcher (#85133)
Instead of calling into the Python dispatcher for EVERY dispatcher
call, we now have a two-step process.  First, we
getattr(op: OpOverload, dispatch_key) to "load" the handler for the
function.  This can either be a conventional function (in which
case we will call it, in the same way the old Python dispatcher
worked), or it can be a DispatchKey, in which case we will directly
call that DispatchKey in C++, bypassing marshalling between Python
and C++ entirely.  OpOverload.__getattr__ is carefully written so
that it will cache the

A further optimization would be to define __slots__ on OpOverload
and to ensure that the DispatchKey strings are interned.
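
As a rough illustration of the two-step lookup described above, here is a
minimal C++ analog.  It is not the actual implementation (which lives in
Python, in OpOverload.__getattr__); `Key`, `PyHandler`, `Handler` and `Op`
are hypothetical stand-ins invented for this sketch, and the "kernels" are
toy integer functions.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <variant>

// Hypothetical stand-ins for illustration; none of these are PyTorch APIs.
enum class Key { CPU, Autograd };
using PyHandler = std::function<int(int)>;     // models a Python-side function
using Handler = std::variant<PyHandler, Key>;  // a callable, or "call this key directly"

struct Op {
  // Step 1: resolve the handler for a key once, then reuse the cached result.
  const Handler& load(Key k) {
    auto it = cache_.find(k);
    if (it == cache_.end()) {
      ++resolutions;  // counts how often the slow resolution path runs
      it = cache_.emplace(k, resolve(k)).first;
    }
    return it->second;
  }

  // Step 2: call the function, or go straight to the kernel for the key.
  int call(Key k, int x) {
    const Handler& h = load(k);
    if (const auto* fn = std::get_if<PyHandler>(&h)) {
      return (*fn)(x);  // "conventional function" path
    }
    return kernel(std::get<Key>(h), x);  // direct path, no callback involved
  }

  int resolutions = 0;

 private:
  // Toy policy: Autograd gets a custom function, everything else is direct.
  static Handler resolve(Key k) {
    if (k == Key::Autograd) {
      return PyHandler([](int x) { return x + 1; });
    }
    return Handler{k};
  }
  static int kernel(Key, int x) { return x * 2; }

  std::map<Key, Handler> cache_;
};
```

Calling `op.call(Key::Autograd, ...)` twice resolves the handler only once,
which mirrors the trade-off noted below: later changes to the resolution
policy are not observed once an entry is cached.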

The resulting Python dispatcher is less flexible: after the first
lookup, the handler is cached and we won't recompute it.  Furthermore,
by default, dispatches will not go into Python, so you won't
get stack frames for the Python dispatcher.  But we get
a huge performance improvement: on the following microbenchmark
we go from 2.5s to 1.9s.

```
import time
import torch
from functorch import make_fx

def f(x):
    for i in range(1000):
        x = x * x
    return x

begin = time.time()
res = make_fx(f, tracing_mode="symbolic")(torch.randn(10, 20))
print(time.time()-begin)
```

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85133
Approved by: https://github.com/wconstab
2022-09-16 20:38:21 +00:00


#pragma once

#include <c10/core/SafePyObject.h>
#include <c10/macros/Macros.h>
#include <c10/util/Optional.h>

namespace c10 {
namespace impl {

struct C10_API PythonDispatcherTLS {
  static void set_state(PyInterpreter* state);
  static PyInterpreter* get_state();
  static void reset_state();
};

struct C10_API DisablePythonDispatcher {
  DisablePythonDispatcher() : old_(PythonDispatcherTLS::get_state()) {
    PythonDispatcherTLS::set_state({});
  }
  ~DisablePythonDispatcher() {
    PythonDispatcherTLS::set_state(old_);
  }
  PyInterpreter* old_;
};

} // namespace impl
} // namespace c10
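
The header only declares the TLS accessors, so the following self-contained
sketch shows how they and the guard might behave together.  It assumes the
state is a single `thread_local` pointer that is null when the Python
dispatcher is off; the corresponding .cpp file is not shown here, so treat
that as an assumption.  `PyInterpreter` is replaced by an empty stand-in
struct so the sketch compiles without PyTorch, and the `sketch` namespace
and `guard_round_trips` helper are hypothetical.

```cpp
#include <cassert>

// Empty stand-in for c10::impl::PyInterpreter (assumption for this sketch).
struct PyInterpreter {};

namespace sketch {

// Assumed storage: one pointer per thread, null when the dispatcher is off.
thread_local PyInterpreter* python_dispatcher_state = nullptr;

struct PythonDispatcherTLS {
  static void set_state(PyInterpreter* state) { python_dispatcher_state = state; }
  static PyInterpreter* get_state() { return python_dispatcher_state; }
  static void reset_state() { python_dispatcher_state = nullptr; }
};

// RAII guard with the same shape as the one in the header: clear the state
// on entry, put the previous value back on exit.
struct DisablePythonDispatcher {
  DisablePythonDispatcher() : old_(PythonDispatcherTLS::get_state()) {
    PythonDispatcherTLS::set_state(nullptr);
  }
  ~DisablePythonDispatcher() { PythonDispatcherTLS::set_state(old_); }
  PyInterpreter* old_;
};

// Hypothetical helper: returns true when the guard hid the dispatcher inside
// the scope and restored it afterwards.  Leaves the state reset on return.
inline bool guard_round_trips(PyInterpreter* interp) {
  PythonDispatcherTLS::set_state(interp);
  bool hidden = false;
  {
    DisablePythonDispatcher guard;
    hidden = (PythonDispatcherTLS::get_state() == nullptr);
  }
  bool restored = (PythonDispatcherTLS::get_state() == interp);
  PythonDispatcherTLS::reset_state();
  return hidden && restored;
}

} // namespace sketch
```

Because the guard saves and restores the previous value rather than
unconditionally re-enabling the dispatcher, nested guards compose: an inner
guard restores null, and only the outermost one restores the interpreter.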