Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48308
The original regex that I added didn't correctly match namespaces that started with an underscore (e.g. `_test`), which caused a master-only test to fail.
The only change from the previous commit is that I updated the regex like so:
before: `^.*TORCH_LIBRARY_IMPL_init_([^_]+)_([^_]+)_[0-9]+(\(.*)?$`
after: `^.*TORCH_LIBRARY_IMPL_init_([_]*[^_]+)_([^_]+)_[0-9]+(\(.*)?$`
I added a `[_]*` at the beginning of the namespace capture group, and did the same for the `_FRAGMENT` regex.
Verified that running `ANALYZE_TEST=1 tools/code_analyzer/build.sh` (as the master-only test does) produces no diff in the output.
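For illustration, here's a minimal C++ sketch (not the analyzer's actual code; the symbol name below is just an assumed example of a generated initializer name) showing the behavioral difference: the old pattern's `[^_]+` refuses the leading underscore of a namespace like `_test`, so the symbol isn't matched at all, while the new `[_]*[^_]+` matches it and captures the full namespace.
```
#include <cassert>
#include <regex>
#include <string>

int main() {
  // assumed example of a generated registration symbol for namespace "_test"
  const std::string sym = "TORCH_LIBRARY_IMPL_init__test_CPU_0";
  const std::regex before(R"(^.*TORCH_LIBRARY_IMPL_init_([^_]+)_([^_]+)_[0-9]+(\(.*)?$)");
  const std::regex after(R"(^.*TORCH_LIBRARY_IMPL_init_([_]*[^_]+)_([^_]+)_[0-9]+(\(.*)?$)");

  std::smatch m;
  assert(!std::regex_match(sym, m, before));  // old pattern: no match at all
  assert(std::regex_match(sym, m, after));    // new pattern: matches
  assert(m[1] == "_test" && m[2] == "CPU");   // namespace and dispatch key captured
  return 0;
}
```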
Fixing regex pattern to allow for underscores at the beginning of the
namespace
This reverts commit 3c936ecd3c.
Test Plan: Imported from OSS
Reviewed By: ezyang
Differential Revision: D25123295
Pulled By: bdhirsh
fbshipit-source-id: 54bd1e3f0c8e28145e736142ad62a18806bb9672
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47322
Updating all call-sites of the legacy dispatcher registration API in fbcode to the new API.
I migrated all call sites that used the legacy dispatcher registration API (`RegisterOperators()`) to use the new API (`TORCH_LIBRARY`...). I found the call sites by running `fbgs RegisterOperators()`; they span several places, including other OSS code (nestedtensor, torchtext, torchvision). A few things to call out:
For simple ops that had only one registered kernel and no dispatch key, I replaced the registration with:
```
TORCH_LIBRARY_FRAGMENT(ns, m) {
m.def("opName", fn_name);
}
```
For ops that registered to a specific dispatch key / had multiple kernels registered, I registered the common kernel (math/cpu) directly inside a `TORCH_LIBRARY_FRAGMENT` block, and registered any additional kernels from other files (e.g. cuda) in a separate `TORCH_LIBRARY_IMPL` block.
```
// cpu file
TORCH_LIBRARY_FRAGMENT(ns, m) {
m.def("opName(schema_inputs) -> schema_outputs");
m.impl("opName", torch::dispatch(c10::DispatchKey::CPU, TORCH_FN(cpu_kernel)));
}
// cuda file
TORCH_LIBRARY_IMPL(ns, CUDA, m) {
m.impl("opName", torch::dispatch(c10::DispatchKey::CUDA, TORCH_FN(cuda_kernel)));
}
```
Special cases:
I found a few ops that used a (legacy) `CPUTensorId`/`CUDATensorId` dispatch key. I updated those to use CPU/CUDA; this seems safe because the keys are aliased to one another in `DispatchKey.h`.
There were a handful of ops that registered a functor (function class) to the legacy API. As far as I could tell we don't allow this case in the new API, mainly because you can accomplish the same thing more cleanly with lambdas. Rather than delete the class, I wrote a wrapper function on top of it, which I passed to the new API.
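A minimal sketch of that workaround, with made-up class/op names (the real ones vary per call site): the functor stays as-is, and a thin wrapper function is what gets handed to the new API.
```
#include <torch/library.h>
#include <ATen/ATen.h>

namespace {

// existing functor (kernel logic is a stand-in here)
struct MyOpFunctor {
  at::Tensor operator()(const at::Tensor& input) {
    return input.clone();
  }
};

// wrapper function that the new registration API accepts
at::Tensor my_op_wrapper(const at::Tensor& input) {
  return MyOpFunctor()(input);
}

} // namespace

TORCH_LIBRARY_FRAGMENT(ns, m) {
  m.def("my_op(Tensor input) -> Tensor", my_op_wrapper);
}
```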
There were a handful of ops that were registered only to a CUDA dispatch key. I put them inside a `TORCH_LIBRARY_FRAGMENT` block and used `def()` and `impl()` calls like in the second case above.
Test Plan: Imported from OSS
Reviewed By: ezyang
Differential Revision: D24714803
Pulled By: bdhirsh
fbshipit-source-id: c809aad8a698db3fd0d832f117f833e997b159e1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42845
- In server builds, always allow the compiler to inline the kernel into the unboxing wrapper, i.e. optimize for perf.
- In mobile builds, never inline the kernel into the unboxing wrapper, i.e. optimize for binary size.
Note that this only applies for registration API calls where we can actually inline it, i.e. calls with `TORCH_FN` or some of the old API calls.
Registrations that pass a runtime function pointer to the registration API can't be inlined, and won't be inlined on server either.
Note also that in server builds, all we do is **allow** the compiler to inline. We don't force inlining.
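The distinction is easiest to see in a small standalone sketch (not PyTorch's actual wrapper code): when the kernel travels as a template argument, as with `TORCH_FN`, the wrapper's call target is statically known and the compiler may inline it; a plain runtime function pointer hides the target, so it generally can't be inlined.
```
#include <iostream>

// kernel passed as a compile-time (template) value: target known statically,
// so the compiler is allowed to inline it into the wrapper
template <int (*kernel)(int)>
int wrapper_compile_time(int x) {
  return kernel(x);
}

// kernel passed as a runtime function pointer: target unknown at compile
// time, so it's generally called indirectly and not inlined
int wrapper_runtime(int (*kernel)(int), int x) {
  return kernel(x);
}

int add_one(int x) { return x + 1; }

int main() {
  std::cout << wrapper_compile_time<add_one>(1) << "\n";  // prints 2
  std::cout << wrapper_runtime(add_one, 2) << "\n";       // prints 3
  return 0;
}
```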
ghstack-source-id: 114177591
Test Plan:
waitforsandcastle
https://www.internalfb.com/intern/fblearner/details/225217260/
Reviewed By: ezyang
Differential Revision: D23045772
fbshipit-source-id: f74fd600eaa3f5cfdf0da47ea080801a03db7917
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44132
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43985
Added
```
def(detail::SelectiveStr<true>, ...)
impl(detail::SelectiveStr<true>, ...)
```
in torch/library, which can also be used for other templated selective registration.
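As a hedged usage sketch (op and kernel names are assumed): the `TORCH_SELECTIVE_SCHEMA`/`TORCH_SELECTIVE_NAME` macros in torch/library.h run an allowlist check at compile time and produce `detail::SelectiveStr<true>` when the op is selected for the build and `SelectiveStr<false>` otherwise; the `<false>` overloads are no-ops, so unselected registrations can be compiled away.
```
#include <torch/library.h>
#include <ATen/ATen.h>

at::Tensor my_op_kernel(const at::Tensor& input);  // assumed kernel, defined elsewhere

TORCH_LIBRARY_FRAGMENT(ns, m) {
  // each call resolves to the SelectiveStr<true> or SelectiveStr<false> overload
  m.def(TORCH_SELECTIVE_SCHEMA("ns::my_op(Tensor input) -> Tensor"));
  m.impl(TORCH_SELECTIVE_NAME("ns::my_op"), TORCH_FN(my_op_kernel));
}
```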
Size savings for this diff:
fbios-pika: 78 KB
igios: 87 KB
Test Plan: Imported from OSS
Reviewed By: ljk53, smessmer
Differential Revision: D23459774
Pulled By: iseeyuan
fbshipit-source-id: 86d34cfe8e3f852602f203db06f23fa99af2c018
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39401
This uses the technique proposed by smessmer in D16451848 to selectively
register operators without codegen. See the Note inside for more
details.
This PR has feature parity with the old selective build apparatus: it can whitelist schema `def()`s and `impl()`s, including on a per-dispatch-key basis. It also expands dispatch key whitelisting to manually written registrations, which previously were not whitelisted at all. (This means we may be dropping dispatch keys where we weren't previously!)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: pbelevich
Differential Revision: D21905593
Pulled By: ezyang
fbshipit-source-id: d4870f800c66be5ce57ec173c9b6e14a52c4a48b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40251
Rather than segfaulting, we should show a good error message when the Return type or Args types in `op.call<Return, Args...>(...)` don't match the kernel.
This adds an assertion comparing two `std::type_index` values to the call path, but that should be fast. Hashing the function signature is also in the call path and not strictly constexpr, but I checked on godbolt that GCC >= 5 and Clang >= 3.8 optimize it away and make it constexpr, i.e. it's not part of the assembly.
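A standalone sketch of the idea (not the dispatcher's actual code): record a `std::type_index` for the kernel's unboxed signature at registration time, and assert at the call site that the requested `Return(Args...)` signature matches before casting and calling.
```
#include <cassert>
#include <typeindex>
#include <typeinfo>

// type-erased entry: signature identity plus a generic function pointer
struct KernelEntry {
  std::type_index signature;
  void (*kernel)();
};

template <class FuncSignature>
KernelEntry make_entry(FuncSignature* fn) {
  return {std::type_index(typeid(FuncSignature)),
          reinterpret_cast<void (*)()>(fn)};
}

template <class Return, class... Args>
Return call(const KernelEntry& entry, Args... args) {
  // cheap check in the call path: an equality test on two std::type_index values
  assert(entry.signature == std::type_index(typeid(Return(Args...))));
  auto* fn = reinterpret_cast<Return (*)(Args...)>(entry.kernel);
  return fn(args...);
}

int add(int a, int b) { return a + b; }

int main() {
  KernelEntry entry = make_entry(add);
  assert((call<int, int, int>(entry, 1, 2) == 3));  // matching signature: fine
  // call<double, int, int>(entry, 1, 2);           // mismatched Return: trips the assert
  return 0;
}
```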
ghstack-source-id: 106194240
Test Plan: waitforsandcastle
Differential Revision: D22126701
fbshipit-source-id: 6c908a822e295757bcc0014f78f51e6a560f221f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38361
Rather than segfaulting, we should show a good error message when the Return type or Args types in `op.call<Return, Args...>(...)` don't match the kernel.
This adds an assertion comparing two `std::type_index` values to the call path, but that should be fast. Hashing the function signature is also in the call path and not strictly constexpr, but I checked on godbolt that GCC >= 5 and Clang >= 3.8 optimize it away and make it constexpr, i.e. it's not part of the assembly.
supersedes D17485438
ghstack-source-id: 106178820
Test Plan: waitforsandcastle
Differential Revision: D21534052
fbshipit-source-id: 6be436a3f20586277a051d764af29e21d5567da0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39823
Add a compile-time function pointer that can be used to pass function pointers as template args.
This is very useful for metaprogramming function wrappers.
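A minimal standalone sketch of the idea, using made-up names rather than the actual PyTorch definition: wrapping the pointer in an empty struct type lets it travel through template arguments, so a metaprogrammed wrapper can name the kernel at compile time without storing a runtime pointer.
```
#include <type_traits>

// made-up stand-in for the real compile-time function pointer wrapper
template <class FuncType, FuncType* func>
struct CompileTimeFnPtr {
  static constexpr FuncType* func_ptr() { return func; }
};
#define MAKE_FN(fn) \
  CompileTimeFnPtr<std::remove_pointer_t<decltype(&fn)>, &fn>()

// a wrapper that receives the kernel as a *type*: no runtime pointer is stored
template <class FnPtr>
int call_twice(int x) {
  return FnPtr::func_ptr()(FnPtr::func_ptr()(x));
}

int add_one(int x) { return x + 1; }

int main() {
  auto fn = MAKE_FN(add_one);                       // value of an empty type
  return call_twice<decltype(fn)>(1) == 3 ? 0 : 1;  // kernel known at compile time
}
```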
ghstack-source-id: 105944072
Test Plan: waitforsandcastle
Differential Revision: D21986243
fbshipit-source-id: a123571c18aa0e65908cbb131f28922ceb59061c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38739
Instead of codegenning the named tensor support checks into CPUType/CUDAType, we add a new dispatch key that is put into a tensor's key set whenever it has names. By default, the fallback implementation says that named tensors are not supported; if they are supported, we register a fallthrough which lets us through to the true backend implementation.
There are a bunch of small pieces which are necessary to make this
happen:
- NameMode now also excludes DispatchKey::Named from the dispatch set
- To avoid bad error messages, we add a teensy special case to the dispatcher for named_not_supported_kernel: if we see that the boxed kernel we need to invoke from unboxed is this kernel, and we don't support boxing, but it's a kernel which is known not to need boxing, we just pass in nullptr for the stack. The special case here is very nice: it doesn't affect the fast path and only gets exercised when things are not supported.
- I need to add support for per-operator fallthrough registration. This is done similarly to how we support the fallthrough fallback, by keeping track of whether the registered kernel for an operator is a fallthrough.
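As a hedged sketch of what the per-operator piece enables (the op name is assumed; this is not code from the PR): a backend that does support named tensors for a given op can register a fallthrough for it on the Named key, so dispatch skips the "not supported" fallback and lands on the real backend kernel.
```
#include <torch/library.h>

TORCH_LIBRARY_IMPL(aten, Named, m) {
  // this op opts out of the named-tensors-not-supported fallback;
  // dispatch falls through to the next key (e.g. CPU/CUDA)
  m.impl("add.Tensor", torch::CppFunction::makeFallthrough());
}
```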
It is possible we could go even further down this path, and move
the named tensor logic itself into this key. I leave this
up to future work.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D21662643
Pulled By: ezyang
fbshipit-source-id: 5bc6ae14a1f600189bd8bf865f74dd1700d932f7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36742
Now, you can define a custom class inside a TORCH_LIBRARY block.
It looks very similar to what you did before. Instead of
```
static auto m = torch::class_<Class>("Namespace", "Class").def("foo", foo);
```
you write
```
TORCH_LIBRARY(Namespace, m) {
  m.class_<Class>("Class")
      .def("foo", foo);
}
```
All the old usages still work, but at some point we should start
updating the tutorials when we're ready to go 100% live with the
new pybind11 style API.
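For a fuller picture, here's a hedged, self-contained sketch (class, namespace, and method names are made up) of what a custom class defined this way can look like, including a constructor and a couple of methods:
```
#include <torch/custom_class.h>
#include <torch/library.h>
#include <cstdint>
#include <vector>

// custom classes still derive from torch::CustomClassHolder as before
struct MyStack : torch::CustomClassHolder {
  std::vector<int64_t> values;
  void push(int64_t v) { values.push_back(v); }
  int64_t pop() {
    auto v = values.back();
    values.pop_back();
    return v;
  }
};

TORCH_LIBRARY(my_namespace, m) {
  m.class_<MyStack>("MyStack")
      .def(torch::init<>())
      .def("push", &MyStack::push)
      .def("pop", &MyStack::pop);
}
```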
The custom class API previously lived in the torch/ folder and in the torch namespace, so for consistency, the new TORCH_LIBRARY also got moved to torch/library.h. The definition of Library::class_ is at the bottom of that header because I need all of the class_ constructors available, but there is a circular dependency between the two headers.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Differential Revision: D21089648
Test Plan: Imported from OSS
Pulled By: ezyang
fbshipit-source-id: 8d54329c125242605336c22fa1642aae6940b507