Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42845
- In server builds, always allow the compiler to inline the kernel into the unboxing wrapper, i.e. optimize for perf.
- In mobile builds, never inline the kernel into the unboxing wrapper, i.e. optimize for binary size.
Note that this only applies to registration API calls where we can actually inline the kernel, i.e. calls using `TORCH_FN` or some of the old API calls.
Registrations that pass a runtime function pointer to the registration API can't be inlined, and won't be on server builds either.
Note also that in server builds, all we do is **allow** the compiler to inline. We don't force inlining.
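For illustration (the `myops::my_relu` op and kernel below are made up, not part of this diff), this is roughly what the two registration styles look like; only the `TORCH_FN` form hands the dispatcher a compile-time function pointer that the compiler could inline into the unboxing wrapper:
```
#include <ATen/ATen.h>
#include <torch/library.h>

// Hypothetical kernel, used only to illustrate the two registration styles.
at::Tensor my_relu(const at::Tensor& self) {
  return self.clamp_min(0);
}

TORCH_LIBRARY(myops, m) {
  m.def("my_relu(Tensor self) -> Tensor");
}

TORCH_LIBRARY_IMPL(myops, CPU, m) {
  // TORCH_FN passes the kernel as a compile-time function pointer: the
  // unboxing wrapper may inline it (server) or keep it out of line (mobile).
  m.impl("my_relu", TORCH_FN(my_relu));

  // By contrast, a plain runtime function pointer can't be inlined into the
  // wrapper on either build type:
  //   m.impl("my_relu", &my_relu);
}
```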
ghstack-source-id: 114177591
Test Plan:
waitforsandcastle
https://www.internalfb.com/intern/fblearner/details/225217260/
Reviewed By: ezyang
Differential Revision: D23045772
fbshipit-source-id: f74fd600eaa3f5cfdf0da47ea080801a03db7917
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44132
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43985
Added
```
def(detail::SelectiveStr<true>, ...)
impl(detail::SelectiveStr<true>, ...)
```
in torch/library, which can also be used for other templated selective registration.
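For illustration only (the operator and kernel names are made up), a registration using the `TORCH_SELECTIVE_SCHEMA` / `TORCH_SELECTIVE_NAME` macros from torch/library.h, which expand to a `detail::SelectiveStr<true>` or `<false>`; the `<false>` overloads of def()/impl() are no-ops, so unselected registrations compile away:
```
#include <ATen/ATen.h>
#include <torch/library.h>

// Hypothetical kernel for illustration.
at::Tensor my_relu_cpu(const at::Tensor& self) {
  return self.clamp_min(0);
}

TORCH_LIBRARY(myops, m) {
  // Expands to detail::SelectiveStr<true> or <false> depending on whether
  // myops::my_relu is selected for this build.
  m.def(TORCH_SELECTIVE_SCHEMA("myops::my_relu(Tensor self) -> Tensor"));
}

TORCH_LIBRARY_IMPL(myops, CPU, m) {
  // The SelectiveStr<false> overload of impl() does nothing, so unselected
  // kernels can be stripped from the binary.
  m.impl(TORCH_SELECTIVE_NAME("myops::my_relu"), TORCH_FN(my_relu_cpu));
}
```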
Size savings for this diff:
fbios-pika: 78 KB
igios: 87 KB
Test Plan: Imported from OSS
Reviewed By: ljk53, smessmer
Differential Revision: D23459774
Pulled By: iseeyuan
fbshipit-source-id: 86d34cfe8e3f852602f203db06f23fa99af2c018
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39401
This uses the technique proposed by smessmer in D16451848 to selectively
register operators without codegen. See the Note inside for more
details.
This PR has feature parity with the old selective build apparatus:
it can whitelist schema def()s and impl()s, including on a
per-dispatch-key basis. It also expands dispatch key whitelisting:
previously, manually written registrations were not whitelisted at
all. (This means we may now be dropping dispatch keys where we
weren't previously!)
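A minimal, self-contained sketch of the compile-time whitelisting idea (the names here are illustrative, not PyTorch's actual internals): the selected-operator list is baked in as a preprocessor string, and a constexpr membership check decides per registration whether anything gets registered at all:
```
#include <string_view>

// Hypothetical build-time whitelist; in a real build this would be
// injected by the build system.
#ifndef OP_WHITELIST
#define OP_WHITELIST "aten::add;aten::mul"
#endif

// constexpr membership test over a ';'-separated list of operator names.
constexpr bool op_in_whitelist(std::string_view op,
                               std::string_view list = OP_WHITELIST) {
  while (!list.empty()) {
    const auto pos = list.find(';');
    if (list.substr(0, pos) == op) {
      return true;
    }
    if (pos == std::string_view::npos) {
      break;
    }
    list.remove_prefix(pos + 1);
  }
  return false;
}

// The boolean result can then select between "register it" and "do nothing"
// overloads at compile time, so no codegen is needed.
static_assert(op_in_whitelist("aten::add"), "selected in this build");
static_assert(!op_in_whitelist("aten::conv2d"), "not selected in this build");
```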
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: pbelevich
Differential Revision: D21905593
Pulled By: ezyang
fbshipit-source-id: d4870f800c66be5ce57ec173c9b6e14a52c4a48b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40251
Rather than segfaulting, we should show a good error message when the Return type or Args types in op.call<Return, Args...>(...) mismatch the kernel.
This adds an assertion comparing two std::type_index values to the call path, but that should be fast. Hashing the function signature is also in the call path and not strictly constexpr, but I checked on godbolt that GCC >= 5 and Clang >= 3.8 optimize it away and make it effectively constexpr, i.e. it's not part of the assembly.
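A rough sketch of the mechanism (illustrative names, not the dispatcher's real internals): the kernel remembers the std::type_index of the signature it was created from, and the call path asserts the caller's Return(Args...) against it before dispatching:
```
#include <stdexcept>
#include <string>
#include <typeindex>
#include <typeinfo>

template <class Return, class... Args>
void assert_signature_matches(std::type_index registered_signature,
                              const std::string& op_name) {
  // typeid(Return(Args...)) identifies the full unboxed signature; comparing
  // std::type_index values is cheap at call time.
  const std::type_index requested{typeid(Return(Args...))};
  if (requested != registered_signature) {
    throw std::runtime_error(
        "Tried to call " + op_name +
        " with a Return/Args signature that doesn't match its registered kernel.");
  }
}
```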
ghstack-source-id: 106194240
Test Plan: waitforsandcastle
Differential Revision: D22126701
fbshipit-source-id: 6c908a822e295757bcc0014f78f51e6a560f221f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38361
Rather than segfaulting, we should show a good error message when the Return type or Args types in op.call<Return, Args...>(...) mismatch the kernel.
This adds an assertion comparing two std::type_index values to the call path, but that should be fast. Hashing the function signature is also in the call path and not strictly constexpr, but I checked on godbolt that GCC >= 5 and Clang >= 3.8 optimize it away and make it effectively constexpr, i.e. it's not part of the assembly.
supersedes D17485438
ghstack-source-id: 106178820
Test Plan: waitforsandcastle
Differential Revision: D21534052
fbshipit-source-id: 6be436a3f20586277a051d764af29e21d5567da0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39823
Add a compile-time function pointer that can be used to pass function pointers as template arguments.
This is very useful for metaprogramming function wrappers.
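A minimal sketch of the pattern (not the exact c10 implementation): making the function pointer a non-type template argument lets wrapper templates see the callee at compile time, so the call can be inlined:
```
#include <iostream>
#include <type_traits>

// The function pointer is a non-type template parameter, so every
// instantiation carries the callee in its type.
template <class FuncType, FuncType* func_ptr>
struct CompileTimeFunctionPtr {
  static constexpr FuncType* pointer() { return func_ptr; }
};

// Convenience macro, analogous in spirit to TORCH_FN.
#define MAKE_COMPILE_TIME_FN(func) \
  CompileTimeFunctionPtr<std::remove_pointer_t<decltype(&func)>, &func>()

// A wrapper that receives the function through the template argument rather
// than as a runtime pointer, allowing the compiler to inline the calls.
template <class CTFP>
int call_twice(CTFP) {
  return CTFP::pointer()(1) + CTFP::pointer()(2);
}

int add_one(int x) { return x + 1; }

int main() {
  std::cout << call_twice(MAKE_COMPILE_TIME_FN(add_one)) << "\n";  // prints 5
}
```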
ghstack-source-id: 105944072
Test Plan: waitforsandcastle
Differential Revision: D21986243
fbshipit-source-id: a123571c18aa0e65908cbb131f28922ceb59061c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38739
Instead of codegenning the named tensor support checks into
CPUType/CUDAType, we add a new dispatch key that is put into a
tensor's dispatch key set whenever it has names. By default, the
fallback implementation says that named tensors are not supported,
but for operators that do support them, we register a fallthrough
which lets us through to the true backend implementation (see the
sketch below).
There are a bunch of small pieces which are necessary to make this
happen:
- NameMode now also excludes DispatchKey::Named from the dispatch set
- To avoid bad error messages, we add a teensy special case to
the dispatcher for named_not_supported_kernel: if the boxed kernel
we need to invoke from an unboxed call site is this kernel, and we
don't support boxing, but the kernel is known not to need the stack,
we just pass in nullptr for the stack.
The special case here is very nice: it doesn't affect the fast
path and only gets exercised when named tensors are not supported.
- I need to add support for per-operator fallthrough registration.
This is done similarly to how we support the fallthrough fallback,
by keeping track of whether the registered kernel for an operator
is a fallthrough.
It is possible we could go even further down this path, and move
the named tensor logic itself into this key. I leave this
up to future work.
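A hedged sketch of the registration side using the torch::Library API (the kernel body and the choice of add.Tensor are illustrative): a boxed fallback on the Named key rejects named tensors by default, and an op that supports names registers a per-operator fallthrough:
```
#include <ATen/core/dispatch/Dispatcher.h>
#include <c10/util/Exception.h>
#include <torch/library.h>

// Illustrative boxed kernel: reject any op reached through the Named key.
void named_not_supported_kernel(const c10::OperatorHandle& op,
                                torch::jit::Stack* /*stack*/) {
  TORCH_CHECK(false, op.schema().name(),
              " is not yet supported with named tensors.");
}

// Default behavior for every operator on the Named dispatch key.
TORCH_LIBRARY_IMPL(_, Named, m) {
  m.fallback(torch::CppFunction::makeFromBoxedFunction<&named_not_supported_kernel>());
}

// An operator that supports names registers a fallthrough, so dispatch
// skips the Named key and lands on the true backend implementation.
TORCH_LIBRARY_IMPL(aten, Named, m) {
  m.impl("add.Tensor", torch::CppFunction::makeFallthrough());
}
```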
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D21662643
Pulled By: ezyang
fbshipit-source-id: 5bc6ae14a1f600189bd8bf865f74dd1700d932f7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36742
Now, you can define a custom class inside a TORCH_LIBRARY block.
It looks very similar to what you did before. Instead of
```
static auto m = torch::class_<Class>("Namespace", "Class").def("foo", foo);
```
you write
```
TORCH_LIBRARY(Namespace, m) {
  m.class_<Class>("Class")
      .def("foo", foo);
}
```
All the old usages still work, but we should start updating the
tutorials once we're ready to go 100% live with the new
pybind11-style API.
The custom class API previously lived in the torch/ folder and in the
torch namespace, so for consistency, the new TORCH_LIBRARY also got
moved to torch/library.h. The definition of Library::class_ is at the
bottom of that header because it needs all of the class_ constructors
available, but there is a circular dependency between the two headers.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Differential Revision: D21089648
Test Plan: Imported from OSS
Pulled By: ezyang
fbshipit-source-id: 8d54329c125242605336c22fa1642aae6940b507