This updates ruff to 0.285, which is faster, better, and fixes a bunch of false negatives with regard to f-strings.
I also enabled RUF017, which looks for accidental quadratic list summation. Luckily, there seem to be no instances of it in our codebase, so I'm enabling the rule to keep it that way. :)
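For reference, the pattern RUF017 flags looks roughly like this (an illustrative sketch, not code from our tree):
```
import functools
import itertools
import operator

lists = [[1, 2], [3, 4], [5, 6]]

# Flagged by RUF017: each `+` builds a brand-new list, so summing n lists
# is quadratic in the total number of elements.
flat_quadratic = sum(lists, [])

# Linear alternatives (the latter is, as far as I know, the fix ruff itself suggests):
flat_chain = list(itertools.chain.from_iterable(lists))
flat_iadd = functools.reduce(operator.iadd, lists, [])
```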
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107519
Approved by: https://github.com/ezyang
Summary:
Context
-------
This PR adds a new fallback to the Autograd dispatch keys.
If you would prefer the old behavior:
- A quick (unsupported) way to get the previous behavior is to call
`torch._C._set_autograd_fallback("nothing")`
- Register "torch::CppFunction::makeFallthrough()" to your Autograd key,
like in https://gist.github.com/zou3519/d09a5f4b1afe2430af09fea67c6ff2c8
It is possible that this PR regresses performance of overhead-bound
models. If this is the case, please reach out (and apply one of the
temporary fixes in the previous section).
Description for reviewers
-------------------------
In order to deprecate registering autograd kernels at a key that is not
an Autograd key, we add a fallback to the Autograd dispatch keys. This
fallback raises a warning if the user attempts to backprop through the
operator, and it is configurable to either warn or not warn.
The goals of this PR are to:
- preserve as much BC as possible
- raise a warning that whatever the user is doing is potentially wrong
- be as performant as possible
There are roughly two cases:
- if the post-autograd kernels return a Tensor that requires grad, then
we install an autograd hook that raises a warning. We are preserving BC
in that it is possible that the user has a torch::autograd::Function
registered to their CPU key.
- if the post-autograd kernels return Tensors that do not require grad,
then we make them require_grad and install a WarnNotImplemented grad fn
that warns in the backward pass. This is mildly BC-breaking (see next
section).
Test Plan:
- bunch of new tests
BC-Breaking Note
----------------
This PR adds a new fallback to the Autograd dispatch keys. It affects
custom operators that do not have a kernel registered to the Autograd
keys (e.g. AutogradCPU and AutogradCUDA).
Previously, such a custom operator could return Tensors that do not
require grad even when the inputs do require grad. After this PR, all
floating-point and complex returns do require grad. See the "Context"
section above for how to get the old behavior.
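To make the note above concrete, here is a hedged sketch using a hypothetical `mylib::my_op` registered only to CPU through the Python `torch.library` API (illustrative, not code from this PR):
```
import torch

# Hypothetical custom op with a CPU kernel but no Autograd kernel.
lib = torch.library.Library("mylib", "DEF")
lib.define("my_op(Tensor x) -> Tensor")

def my_op_cpu(x):
    # detach() stands in for a kernel that does not set up autograd itself
    return x.detach().clone()

lib.impl("my_op", my_op_cpu, "CPU")

x = torch.randn(3, requires_grad=True)
y = torch.ops.mylib.my_op(x)

# Before this PR: y came back with requires_grad=False and gradients simply
# did not flow to x.
# After this PR: the new Autograd fallback makes floating-point/complex
# returns require grad and warns when you backprop through the operator.
print(y.requires_grad)

# Unsupported escape hatch to restore the old behavior (see "Context" above):
# torch._C._set_autograd_fallback("nothing")
```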
Differential Revision: D47408353
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105078
Approved by: https://github.com/soulitzer
Context
-------
This PR adds a new fallback to the Autograd dispatch keys.
If you would prefer the old behavior:
- A quick (unsupported) way to get the previous behavior is to call
`torch._C._set_autograd_fallback("nothing")`
- Register "torch::CppFunction::makeFallthrough()" to your Autograd key,
like in https://gist.github.com/zou3519/d09a5f4b1afe2430af09fea67c6ff2c8
It is possible that this PR regresses performance of overhead-bound
models. If this is the case, please reach out (and apply one of the
temporary fixes in the previous section).
Description for reviewers
-------------------------
In order to deprecate registering autograd kernels at a key that is not
an Autograd key, we add a fallback to the Autograd dispatch keys. This
fallback raises a warning if the user attempts to backprop through the
operator, and it is configurable to either warn or not warn.
The goals of this PR are to:
- preserve as much BC as possible
- raise a warning that whatever the user is doing is potentially wrong
- be as performant as possible
There are roughly two cases:
- if the post-autograd kernels return a Tensor that requires grad, then
we install an autograd hook that raises a warning. We are preserving BC
in that it is possible that the user has a torch::autograd::Function
registered to their CPU key.
- if the post-autograd kernels return Tensors that do not require grad,
then we make them require_grad and install a WarnNotImplemented grad fn
that warns in the backward pass. This is mildly BC-breaking (see next
section).
Test Plan:
- bunch of new tests
BC-Breaking Note
----------------
This PR adds a new fallback to the Autograd dispatch keys. It affects
custom operators that do not have a kernel registered to the Autograd
keys (e.g. AutogradCPU and AutogradCUDA).
Previously, such a custom operator could return Tensors that do not
require grad even when the inputs do require grad. After this PR, all
floating-point and complex returns do require grad. See the "Context"
section above for how to get the old behavior.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104481
Approved by: https://github.com/soulitzer
Our dangling impls were:
- positive_ (the in-place op just never existed)
- unique (something happened to this op, maybe it was renamed)
Test Plan:
- `import functorch; torch._C._dispatch_find_dangling_impls`
- It's difficult to write a test for this because the number of dangling
impls depends on whether `test_dispatch` has been run already or not
(test_dispatch adds a dangling impl)
- Can't remove the torchdynamo skip for this yet either
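Spelled out, that manual check looks roughly like this (a sketch; the exact output format of the binding may differ):
```
# Import functorch so its registrations are loaded, then ask the dispatcher
# for impls whose operators were never defined (should be none after this PR,
# modulo the test_dispatch caveat above).
import functorch  # noqa: F401
import torch

dangling = torch._C._dispatch_find_dangling_impls()
print(f"found {len(dangling)} dangling impls")
for entry in dangling:
    print(entry)
```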
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85299
Approved by: https://github.com/ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77926
There is a discrepancy between the string representations of the C++ FunctionSchema and torchgen.model.FunctionSchema.
The latter does not add parentheses around the return type when there is a single return,
but the C++ FunctionSchema always adds them (e.g., torchgen prints `foo(Tensor x) -> Tensor` while the C++ schema prints `foo(Tensor x) -> (Tensor)`).
Make them consistent so we can convert one type to the other via its string representation and parse method.
Differential Revision: [D36535924](https://our.internmc.facebook.com/intern/diff/D36535924/)
Approved by: https://github.com/bdhirsh
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74560
This PR adds support for quantized tensors with "unknown quantizer",
which means that we can use standard APIs like torch.empty to allocate
quantized tensors, with the understanding that we will set the
quantizer later. This makes meta functions applicable to quantized
tensors (they will allocate with unknown quantizer and the kernel
will set the quantizer later) and fixes a bug David Dang reported
where structured kernels give a weird error message when you call them
with quantized inputs.
This is not a complete support for quantized structured kernels because
I haven't actually tried porting any of the quantized implementations
to structured; qadd is probably a good choice to try first as it
does its broadcasting implementation using TensorIterator. My goal
here is just to show that the error message is better.
See also https://github.com/pytorch/pytorch/issues/52680
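Concretely, the kind of call this enables looks like the following minimal sketch (assuming the factory path accepts quantized dtypes as described above):
```
import torch

# With "unknown quantizer" support, a plain factory call can allocate a
# quantized tensor; the quantizer is expected to be set later by the kernel.
q = torch.empty(2, 3, dtype=torch.qint8)
print(q.dtype)          # torch.qint8
print(q.is_quantized)   # True
```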
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: mruberry
Differential Revision: D35317441
Pulled By: dzdang
fbshipit-source-id: ffb85b0e06ccbcc2b01052ca6760517684048b39
(cherry picked from commit 2a54b8b7bf15912240dc2f12d2cd71dc620001e1)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74963
This is a re-land of D35192346 (9872a06d77) and D35192317 (a9216cde6c), which together are a diff that changes the internal representation of `DispatchKeySet` in pytorch core to free up the number of dispatch keys that we have available. See a more detailed description of the design in the original PR: https://github.com/pytorch/pytorch/pull/69633.
The original PR broke Milan workflows, which use a pytorch mobile build, and manifested as a memory corruption bug inside of `liboacrmerged.so`.
**Background: Existing Mobile Optimization**
Pytorch mobile builds have an existing optimization (here cc23725e89/c10/core/DispatchKey.h (L382) and here cc23725e89/aten/src/ATen/core/dispatch/OperatorEntry.h (L214)), which works as follows:
Every operator in pytorch has a "dispatch table" of function pointers, corresponding to all of the (up to 64) different kernels that we might dispatch to when we run an operator in pytorch (autograd, cpu, cuda, complex number support, etc).
In mobile builds, the size of that table is shrunk from 64 to 8 to save a bunch of space, because mobile doesn't end up using the functionality associated with most dispatch keys.
The dispatcher also has a notion of "fallback kernels", which are kernels that you can register to a particular dispatch key, but should be able to work for "any operator". The array of fallback kernels is defined here: cc23725e89/aten/src/ATen/core/dispatch/Dispatcher.h (L294).
The mobile-optimization currently does **not** extend to this array (it wouldn't be that useful anyway because there is only one array of fallback kernels globally - vs. there is a separate dispatch table of function pointers per operator). So the per-operator tables on mobile are size 8, while the fallback table is size 64.
**The Bug**
This PR actually makes it difficult to enable that optimization separately for the per-operator arrays vs. the fallback array, and it incidentally shrank the size of the fallback array from 64 to 8 for mobile (that happened on this line: https://github.com/pytorch/pytorch/pull/69633/files#diff-f735cd7aa68f15b624100cbc4bb3b5ea76ffc7c9d3bec3b0ccabaa09609e5319R294).
That isn't a problem by itself (since mobile doesn't actually use any of the fallbacks that can no longer be stored). However, pytorch core will still register all of those fallback kernels on startup in mobile builds, even if they aren't used. When we tried to register one of those fallbacks on startup, it would try to dump the kernel somewhere in memory past the bounds of the (now smaller) array inside of the `Dispatcher` object, `backendFallbackKernels_`.
**Why didn't this problem show up in OSS CI? Why didn't it break other internal mobile workflows aside from Milan?**
Ideally, this failure would show up as part of the OSS signal on GitHub, since we already have mobile OSS builds. Given that it was another memory corruption issue that only affected Milan (subset of mobile), I'm not sure what's specific about Milan's builds that caused it only to manifest there. dreiss I wonder if there's another flavor of mobile builds we could run in OSS CI that could potentially help catch this?
**The debugging experience was pretty difficult**
Debugging the Milan-specific failure was made difficult by the following:
(1) lack of CI
- the original Milan failure didn't surface on my original diff, because the Milan job(s) that failed weren't triggered to run on pytorch changes. There's probably a balance to strike here, since those jobs will only be useful if they aren't flaky, and if they can produce reliable failure logs for debugging.
(2) It's difficult to get a repro.
- my work laptop doesn't have the right specs to run the Milan development workflow (not enough disk space)
- There is an existing OnDemand workflow for Milan, but it appears to be relatively new, and after a bunch of help from MarcioPorto, we ran into issues forwarding the log output from Milan tests on the emulator back to the terminal (see the original discussion here: https://fb.workplace.com/groups/OnDemandFRL/permalink/1424937774645433/)
(3) Lack of stack-traces.
- Most Milan failures didn't include actionable stack traces. phding generously helped me debug by running my suggested patches locally, and reporting back if there were any failures. The failing test didn't include a stack trace though (just the line where the crash appeared), so I ended up making some educated guesses about what the issue was based on the area of the crash.
ghstack-source-id: 152688542
Test Plan: Confirmed with phding that the broken Milan workflow from the previous version of this diff is now passing.
Reviewed By: phding, albanD
Differential Revision: D35222806
fbshipit-source-id: 0ad115a0f768bc8ea5d4c203b2990254c7092d30
(cherry picked from commit 002b91966f11fd55ab3fa3801b636fa39a6dd12c)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72402
The original PR had an array-out-of-bounds access in `DispatchKeyExtractor.cpp`, that wasn't caught by ASAN and appeared to only manifest in a subset of android internal tests. After fixing the OOB access (and adding more asserts), I confirmed that the android internal test passes.
Reland of D33255193 (20b8653dfa)
ghstack-source-id: 148830728
Test Plan:
Steps to test:
(1) connect to a mobile OD
(2) run `one_world android emulator android-29` in a terminal to start the android emulator
(3) In a separate terminal, run the test: `buck test //fbandroid/instrumentation_tests/com/facebook/pytorch/bi_xray:instrumentation_test -c test.external_runner=tpx -- --regex 'testBIXRayModel.*PyTorchBIXRayInstrumentationTest' --force-remote-execution --run-disabled`
I also ran `buck test fbandroid/mode/dbg //fbandroid/instrumentation_tests/com/facebook/pytorch/bi_xray:instrumentation_test`, which failed before and passed after the PR.
Reviewed By: albanD
Differential Revision: D34034848
fbshipit-source-id: 9677ee2c0a1afd1183896f7055009445712523c5
(cherry picked from commit 9ab9b12d35)
Summary: I think this diff stack broke all the related tasks below.
Test Plan:
For our failing tests:
buck test //fbandroid/instrumentation_tests/com/facebook/pytorch/bi_xray:instrumentation_test -c test.external_runner=tpx -- --regex 'testBIXRayModel.*PyTorchBIXRayInstrumentationTest' --force-remote-execution --run-disabled
For the ubn:
Not really sure what to do, trying to build the app and see if I can use an effect?
Reviewed By: shoumikhin
Differential Revision: D34018849
fbshipit-source-id: 3571718cb6621931af931b494e0a70d6e0164e65
(cherry picked from commit 3cc63cb2ea)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61104
This patch added a new test case for findDanglingImpls. The test case introduces a C++ extension which has a dangling impl such that findDanglingImpls can find it and output its information.
Test Plan:
python test/test_dispatch.py TestDispatch.test_find_dangling_impls_ext
Imported from OSS
Reviewed By: ezyang
Differential Revision: D29512520
fbshipit-source-id: 6883fb8f065f2c0ae0e7a1adf6fd298591497e2b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61126
adding skeleton implementations of quantized embedding tables with zeroes
Test Plan:
compilation, farm test, and ran test_find_dangling_impls and passed
did a manual negative test and verified the message is printed properly
```
======================================================================
FAIL: test_find_dangling_impls (test_dispatch.TestPythonDispatcher)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/data/users/hyz/fbsource/fbcode/buck-out/opt/gen/caffe2/test/others#binary,link-tree/test_dispatch.py", line 892, in test_find_dangling_impls
self.assertEqual(
File "/data/users/hyz/fbsource/fbcode/buck-out/opt/gen/caffe2/test/others#binary,link-tree/torch/testing/_internal/common_utils.py", line 1498, in assertEqual
super().assertTrue(result, msg=self._get_assert_msg(msg, debug_msg=debug_msg))
AssertionError: False is not true : Scalars failed to compare as equal! 0 != 1
Expect zero dangling impls, but found: ['name: quantized::qembedding_bag_4bit_unpack\nschema: (none)\nCUDA: registered at caffe2/aten/src/ATen/native/quantized/cuda/embedding_bag.cu:394 :: (Tensor _0) -> (Tensor _0) [ boxed unboxed ]\n']
```
Reviewed By: walterddr
Differential Revision: D29518274
fbshipit-source-id: d0cb81c8bf51cdc4b83038758131ccf61e4360f5
Summary:
As this diff shows, currently there are a couple hundred instances of raw `noqa` in the codebase, which just ignore all errors on a given line. That isn't great, so this PR changes all existing instances of that antipattern to qualify the `noqa` with respect to a specific error code, and adds a lint to prevent more of this from happening in the future.
Interestingly, some of the examples the `noqa` lint catches are genuine attempts to qualify the `noqa` with a specific error code, such as these two:
```
test/jit/test_misc.py:27: print(f"{hello + ' ' + test}, I'm a {test}") # noqa E999
test/jit/test_misc.py:28: print(f"format blank") # noqa F541
```
However, those are still wrong because they are [missing a colon](https://flake8.pycqa.org/en/3.9.1/user/violations.html#in-line-ignoring-errors), which actually causes the error code to be completely ignored:
- If you change those error codes to anything else, the warnings will still be suppressed.
- If you add the necessary colons then it is revealed that `E261` was also being suppressed, unintentionally:
```
test/jit/test_misc.py:27:57: E261 at least two spaces before inline comment
test/jit/test_misc.py:28:35: E261 at least two spaces before inline comment
```
I did try using [flake8-noqa](https://pypi.org/project/flake8-noqa/) instead of a custom `git grep` lint, but it didn't seem to work. This PR is definitely missing some of the functionality that flake8-noqa is supposed to provide, though, so if someone can figure out how to use it, we should do that instead.
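For the record, the form we want going forward is a colon plus a specific code; an illustrative line (not from the diff):
```
# With the colon, flake8 ignores only the named code on this line (F541,
# "f-string without placeholders") instead of suppressing everything.
print(f"format blank")  # noqa: F541
```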
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56272
Test Plan:
CI should pass on the tip of this PR, and we know that the lint works because the following CI run (before this PR was finished) failed:
- https://github.com/pytorch/pytorch/runs/2365189927
Reviewed By: janeyx99
Differential Revision: D27830127
Pulled By: samestep
fbshipit-source-id: d6dcf4f945ebd18cd76c46a07f3b408296864fcb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54470
```
git grep -l 'DefaultBackend' | xargs sed -i 's/DefaultBackend/CompositeExplicitAutograd/g'
```
Plus a quick fixup in native/README.md
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: bdhirsh
Differential Revision: D27253240
Pulled By: ezyang
fbshipit-source-id: 964df951ea8b52fa72937f3cc66aeaf49a702e6f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54466
I had to very carefully audit all the use sites since there are a lot
of other uses of the string Math; I did most of the conversion by
grepping for all occurrences of Math and then doing a search
replace.
I also updated documentation for clarity.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D27253239
Pulled By: ezyang
fbshipit-source-id: afb485d07ff39575742a4f0e1e205179b60bc953
Summary:
`test_dispatch.py` has many asserts about error messages. When running with `TORCH_SHOW_CPP_STACKTRACES=1`, the error messages differ from the `TORCH_SHOW_CPP_STACKTRACES=0` case, which makes many tests in `test_dispatch.py` fail. This PR fixes those failures when running with `TORCH_SHOW_CPP_STACKTRACES=1`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50509
Reviewed By: ngimel
Differential Revision: D25956853
Pulled By: ezyang
fbshipit-source-id: 3b3696742a7dfb8f52f23a364838ec96945c5662
Summary:
This PR moves `DispatchKey::Autograd` to an alias dispatch key mapping to `AutogradCPU, AutogradCUDA, AutogradXLA, AutogradOther, AutogradPrivate*` keys.
A few things are handled in this PR:
- Update alias dispatch key mapping and precompute dispatchTable logic
- Move `Autograd` key from `always_included` set to TensorImpl constructor.
- Update `dummyTensor` constructor to take `requires_grad` as an optional argument so that it's closer to the real application in op_registration_test.
- Use `BackendSelect` key for both backend select before and after autograd layer. (1 liner in backend_select codegen)
A few planned followups ordered by priority:
- [cleanup] Update `test_dispatch.py` to include testing `Autograd`.
- [cleanup] Add Math alias key and move catchAll to Math. (to remove 2.2 in `computeDispatchTableEntryWithDebug`)
- [new feature] Add support for Math in native_functions.yaml
- [cleanup] Add iterator like functionality to DispatchKeySet
- [cleanup/large] Only add Autograd backend keys when tensor requires grad. (cc: ljk53 ?)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43070
Reviewed By: ezyang
Differential Revision: D23281535
Pulled By: ailzhang
fbshipit-source-id: 9ad00b17142e9b83304f63cf599f785500f28f71
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40512
Fixes https://github.com/pytorch/pytorch/issues/32454
The heart of this diff is changing this:
```
inline const KernelFunction& Dispatcher::dispatch_(const DispatchTable& dispatchTable, DispatchKey dispatchKey) const {
  const KernelFunction* backendKernel = dispatchTable.lookup(dispatchKey);
  if (nullptr != backendKernel) {
    return *backendKernel;
  }
  const auto& backendFallbackKernel = backendFallbackKernels_[dispatchKey];
  if (backendFallbackKernel.isValid()) {
    return backendFallbackKernel;
  }
  const KernelFunction* catchallKernel = dispatchTable.lookupCatchallKernel();
  if (C10_LIKELY(nullptr != catchallKernel)) {
    return *catchallKernel;
  }
  reportError(dispatchTable, dispatchKey);
}
```
to this:
```
const KernelFunction& OperatorEntry::lookup(DispatchKey k) const {
  const auto& kernel = dispatchTable_[static_cast<uint8_t>(k)];
  if (C10_UNLIKELY(!kernel.isValid())) {
    reportError(k);
  }
  return kernel;
}
```
The difference is that instead of checking a bunch of places to find the
right kernel to use for an operator, the kernel for each dispatch key is
precomputed into dispatchTable_ itself (so you don't have to consult
anything else at runtime). OperatorEntry::computeDispatchTableEntry
contains that computation (which is exactly the same as it was before.)
By doing this, we are able to substantially simplify many runtime
components of dispatch.
The diff is fairly large, as there are also some refactors interspersed
with the substantive change:
- I deleted the DispatchTable abstraction, folding it directly into
OperatorEntry. It might make sense to have some sort of DispatchTable
abstraction (if only to let you do operator[] on DispatchKey without
having to cast it to integers first), but I killed DispatchTable to
avoid having to design a new abstraction; the old abstraction wasn't
appropriate for the new algorithm.
- I renamed OperatorEntry::KernelEntry to AnnotatedKernel, and use it
to store backend fallbacks as well as regular kernel registrations
(this improves error messages when you incorrectly register a backend
fallback twice).
- I moved schema_ and debug_ into an AnnotatedSchema type, to make the
invariant clearer that these are set together, or not at all.
- I moved catch-all kernels out of kernels_ into their own property
(undoing a refactor I did before). The main reason I did this was
because our intended future state is to not have a single catch-all,
but rather possibly multiple catch-alls which fill-in different
portions of the dispatch table. This may change some more in
the future: if we allow registrations for multiple types of
catch alls, we will need a NEW data type (representing bundles
of dispatch keys) which can represent this case, or perhaps
overload DispatchKey to also record these types.
The key changes for precomputation:
- OperatorEntry::updateDispatchTable_ is now updated to fill in the
entry at a DispatchKey, considering both kernels (what it did
before) as well as catch-all and backend fallback. There is also
OperatorEntry::updateDispatchTableFull_ which will update the
entire dispatch table (which is necessary when someone sets a
catch-all kernel). OperatorEntry::computeDispatchTableEntry
holds the canonical algorithm specifying how we decide what
function will handle a dispatch key for the operator.
- Because dispatch table entry computation requires knowledge of
what backend fallbacks are (which is recorded in Dispatcher,
not OperatorEntry), several functions on OperatorEntry now
take Dispatcher as an argument so they can query this information.
- I modified the manual boxing wrapper invariant: previously, kernels
stored in kernels_ did NOT have manual boxing wrappers and this
was maintained by DispatchTable. Now, we just ALWAYS maintain
manual boxing wrappers for all KernelFunctions we store.
- DispatchKeyExtractor is greatly simplified: we only need to maintain
a single per-operator bitmask of what entries are fallthrough
(we don't need the global bitmask anymore).
- Introduced a new debugging 'dumpComputedTable' method, which prints
out the computed dispatch table and how each entry was computed.
This was helpful for debugging cases where the dispatch table and
the canonical metadata were not in sync.
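From Python, the same table can be inspected through a binding; this is a hedged sketch that assumes `torch._C._dispatch_dump_table` is the wrapper around this debugging method:
```
import torch

# Dump the computed dispatch table for one operator, one line per dispatch
# key. The assumption here is that torch._C._dispatch_dump_table wraps the
# 'dumpComputedTable' method described above.
print(torch._C._dispatch_dump_table("aten::add"))
```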
Things that I didn't do but would be worth doing at some point:
- I really wanted to get rid of the C10_UNLIKELY branch for
whether or not the KernelFunction is valid, but it looks like
I cannot easily do this while maintaining good error messages.
In principle, I could always populate a KernelFunction which
errors, but the KernelFunction needs to know what the dispatch
key that is missing is (this is not passed in from the
calling convention). Actually, it might be possible to do
something with functors, but I didn't do it here.
- If we are going to get serious about catchalls for subsets of
operators, we will need to design a new API for them. This diff
is agnostic to this question; we don't change public API at all.
- Precomputation opens up the possibility of subsuming DispatchStub
by querying CPU capability when filling in the dispatch table.
This is not implemented yet. (There is also a mild blocker here,
which is that DispatchStub is also used to share TensorIterator
configuration, and this cannot be directly supported by the
regular Dispatcher.)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D22236352
Pulled By: ezyang
fbshipit-source-id: d6d90f267078451816b1899afc3f79737b4e128c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40469
- The old testing interface C._dispatch_import was based off the old
c10::import variation, which meant the API lined up in a strange
way with the actual torch/library.h. This diff reduces the
differences by letting you program the Library constructor directly.
- Using this newfound flexibility, we add a test for backend fallbacks
from Python; specifically testing that we disallow registering a
backend fallback twice.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D22236351
Pulled By: ezyang
fbshipit-source-id: f8365e3033e9410c7e6eaf9f78aa32e1f7d55833
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36258
Previously we had a && chaining style API. There are some downsides to
this API:
- It's easy to forget the 'static' qualifier in front, leading to
subtle ODR bugs.
- It is not compatible with torchbind class_ definitions, as these
need multiple levels of chaining. So in practice people end
up having to define multiple static initializers, one per class.
- It's not like pybind11.
- There's no way to conveniently get the file and line number of
the registration, as there is no macro point in the API.
- The old API doesn't really encourage people to put all of their
definitions for a library in one place, and to give a custom
namespace for it. Similarly, the old API wasn't very DRY, because
you had to keep repeating the namespace/dispatch key you
were writing implementations for.
The new API is modeled exactly off of the PYBIND11_MODULE macro:
you write:
```
TORCH_LIBRARY(aten, m) {
m.def("aten::add(Tensor self, Tensor other) -> Tensor");
...
}
```
in a non-chaining fashion, and under the hood the macro expands to
define a function, and define a static initializer that allocates
c10::Library (previously called c10::Module, but we renamed it
to avoid confusion with the existing NN module concept), passes
it to your function, and then retains it for the rest of the lifetime
of the program. Specification of the namespace is mandatory,
and in a later commit I plan to make it a hard error to TORCH_LIBRARY
the same library name twice.
If you are specifying an implementation for an existing operator
(e.g., you're the XLA backend, or even if you're just putting
registrations for implementations at the implementation site),
you should use TORCH_LIBRARY_IMPL, which instead takes a backend
argument (instead of namespace) and can be used to specify an
implementation for a backend. Unlike TORCH_LIBRARY, you can do
as many of these as you want for a backend.
This needs updates to the mobile code analyzer.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D20929257
Pulled By: ezyang
fbshipit-source-id: ba04d78492e8c93ae7190165fb936f6872896ada