Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45277
Implements structured kernels as per https://github.com/pytorch/rfcs/pull/9 and ports upsample_nearest1d to use the framework.
The general structure of this diff:
- Define a new syntax for specifying structured kernels in `native_functions.yaml`. You put `structured: True` on the `out` function (that's what you implement) and `structured_delegate: foo.out` on the functional/inplace variants to define them in terms of the `out` function. There's a bunch of new consistency checking to see if you've done this right, though the error messages are of varying quality. This is most of what's going on in tools.codegen.model
- NativeFunctionGroup turns into StructuredNativeFunctions. Previously I thought that maybe we would use this grouping mechanism for both structured and unstructured kernels, but it turned out that Jiakai needed to make his own grouping structure. So now I've specialized it for structured kernels, which also means I get to add a bunch of invariants, like requiring structured kernels to have both a functional and an out variant. This is the lower bundle of changes in tools.codegen.model
- When you make an out kernel structured, this induces us to generate a new meta function signature for you to write shape checking and output allocation code. The signatures of these are defined by `tools.codegen.api.meta` and generated into `MetaFunctions.h`. Coverage here is very bare bones and will be driven by actual operators we port as we go.
- The meaty part of code generation is what we do when we have some grouped StructuredNativeFunctions. We continue to generate a wrapper per function type, but they're a bit different as they call your meta functions, and make reference to the actual implementations in out.
- Then there's a port of `upsample_nearest1d`; easiest to review by just looking at what the final code looks like.
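The kind of consistency checking described in the first bullet can be sketched roughly as follows. This is not the code in `tools.codegen.model`: the dictionary layout and the `check_structured` helper are hypothetical, and only the `structured` and `structured_delegate` keys are taken from the description above.

```python
from typing import Dict, List

# Hypothetical, heavily simplified stand-ins for parsed native_functions.yaml
# entries (real entries carry a full schema string, not just a name).
entries: List[Dict[str, object]] = [
    {"func": "upsample_nearest1d.out", "structured": True},
    {"func": "upsample_nearest1d", "structured_delegate": "upsample_nearest1d.out"},
]


def check_structured(entries: List[Dict[str, object]]) -> List[str]:
    """Return error messages; an empty list means the entries are consistent."""
    by_name = {e["func"]: e for e in entries}
    errors = []
    for e in entries:
        delegate = e.get("structured_delegate")
        if delegate is None:
            continue  # not a delegating functional/inplace variant
        target = by_name.get(delegate)
        if target is None:
            errors.append(f"{e['func']}: delegate {delegate} does not exist")
        elif not target.get("structured", False):
            errors.append(f"{e['func']}: delegate {delegate} is not marked structured: True")
    return errors


print(check_structured(entries))
```

An empty result means every `structured_delegate` points at an `out` function that is itself marked `structured: True`.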
Missing pieces:
- Stride calculation in TensorMeta
- Sufficient sanity checking for inplace/out variants
- Enough rope to make TensorIterator work
This PR improves instruction counts on `upsample_nearest1d` because it eliminates an extra redispatch. Testing `at::upsample_nearest1d(x, {10});`
* Functional: before 1314105, after 1150705
* Out: before 915705, after 838405
These numbers may be jittered up to +-16400 (which is the difference when I tested against an unaffected operator `at::upsample_linear1d`), though that may also be because unrelated changes affected all operators globally.
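To put the counts in proportion to the quoted jitter, the arithmetic can be worked through as below. Only the numbers come from the measurements above; the variable names are illustrative.

```python
# Instruction counts quoted above, as (before, after).
counts = {
    "functional": (1314105, 1150705),
    "out": (915705, 838405),
}
jitter = 16400  # noise bound quoted above

for name, (before, after) in counts.items():
    saved = before - after
    print(f"{name}: saved {saved} instructions ({saved / before:.1%}), "
          f"{saved / jitter:.1f}x the jitter bound")
```

Both savings (163400 and 77300 instructions) are several times larger than the jitter bound.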
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Differential Revision: D24253555
Test Plan: Imported from OSS
Reviewed By: smessmer
Pulled By: ezyang
fbshipit-source-id: 4ef58dd911991060f13576864c8171f9cc614456
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47011
smessmer has complained about how it is difficult to find generated
code. Well, hopefully this diff helps a bit with that.
There are three components to this refactor:
- Rename TypeDerived (CPUType) to RegisterDispatchKey (RegisterCPU).
The 'Type' nomenclature is vestigial and I think Register says
what these files do a lot more clearly. I also got rid of
the CPUType namespace; everything just goes in anonymous
namespace now, less moving parts this way.
- Give Math and DefaultBackend their own files (RegisterMath and
RegisterDefaultBackend)
- Restructure code generation so that schema definition is done
completely separately from RegisterDispatchKey
I decided to name the files RegisterCPU rather than the old convention
BackendSelectRegister, because it seems better to me if these
files clump together in an alphabetical listing rather than being
spread out everywhere. There are a few manual registration files
which should probably get similar renaming.
I also did a little garden cleaning about how we identify if a
dispatch key is a cuda key or a generic key (previously called
KEYWORD_ALL_BACKENDS but I like my naming better).
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Differential Revision: D24600806
Test Plan: Imported from OSS
Reviewed By: smessmer
Pulled By: ezyang
fbshipit-source-id: c1b510dd7515bd95e3ad25b8edf961b2fb30a25a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47008
bhosmer has been complaining about how it is difficult to distinguish
between local variables and closed over variables in the higher order
functions. Well, closures and objects do basically the same thing, so
just convert all these HOFs into objects.
The decoder ring:
- Higher order function => Constructor for object
- Access to closed over variable => Access to member variable on object
- with_native_function => method_with_native_function (because it's
hard writing decorators that work for both functions and methods)
I didn't even have to change indentation (much).
When there is no need for closed over variables (a few functions), I
kept them as plain old functions, no need for an object with no
members.
While I was at it, I also deleted the kwargs, since the types are
enough to prevent mistakes.
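The decoder ring above can be illustrated with a toy example. The names `make_renderer` and `Renderer` are hypothetical and do not appear in the codegen; the sketch only shows the general closure-to-object translation.

```python
from dataclasses import dataclass
from typing import Callable, List


# Before: a higher order function; `prefix` is a closed over variable.
def make_renderer(prefix: str) -> Callable[[str], str]:
    def render(name: str) -> str:
        return f"{prefix}::{name}"
    return render


# After: the constructor replaces the higher order function, and the
# closed over variable becomes an explicit member variable.
@dataclass(frozen=True)
class Renderer:
    prefix: str

    def __call__(self, name: str) -> str:
        return f"{self.prefix}::{name}"


names: List[str] = ["add", "mul"]
from_closure = list(map(make_renderer("at"), names))
from_object = list(map(Renderer("at"), names))
print(from_closure == from_object)
```

In the object form, every access to captured state is spelled `self.prefix`, which makes it easy to tell apart from a local variable.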
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D24600805
Pulled By: ezyang
fbshipit-source-id: 7e3ce8cb2446e3788f934ddcc17f7da6e9299511
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47002
There was no good reason for TypeDerived.h (CPUType.h) codegen
to exist after static dispatch was deleted, and now that we
have the Math alias key, the TypeDefault.h header is not needed either.
Sorry to anyone who was using these out of tree.
I didn't entirely delete TypeDefault.h as it has a use in
a file that I can't conveniently compile test locally. Will
kill it entirely in a follow up.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D24596583
Pulled By: ezyang
fbshipit-source-id: b5095d3509098ff74f836c5d0c272db0b2d226aa
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46991
This change is motivated by a problem bdhirsh observed which is
that in internal builds that include both SchemaRegister.cpp
and TypeDefault.cpp, some operators have their schemas defined
multiple times. Instead of dumping schema registrations in
multiple files, it seems better to just toggle how many schemas
we write into TypeDefault.cpp.
ljk53 observes that technically SchemaRegister.cpp is only needed by
full-JIT frontend, and not by light interpreter (to resolve schema
lookups). However, in practice, the registration file seems to be
unconditionally loaded. This change will make it harder to do the
optimization where we drop schemas in the light interpreter, but you
probably want to architect this differently (similar to per-op
registrations, DON'T do any registrations in ATen, and then write out
the schema registrations in a separate library.)
I took this opportunity to also simplify the TypeDefault generation
logic by reworking things so that we only ever call with None argument
when registering. Soon, we should be able to just split these
files up entirely.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: ljk53
Differential Revision: D24593704
Pulled By: ezyang
fbshipit-source-id: f01ea22a3999493da77b6e254d188da0ce9adf2f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46970
Now that catchall declarations are reinterpreted as registrations to
dispatch key Math, we can now simplify code generation logic by directly
generating to Math, and bypassing logic for catchall. This also helps
avoid bugs where we incorrectly classify some kernels as Math and others
as not, even though they get registered in the same way.
Bill of changes:
- Give Math its own unique TORCH_LIBRARY_IMPL
- Make it so NativeFunction.dispatch is always non-None. Simplify
downstream conditionals accordingly
- When parsing NativeFunction, fill in missing dispatch with a
singleton Math entry (pointing to the cpp.name!)
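The last two bullets amount to a small normalization step at parse time. A minimal sketch, assuming a dictionary-based dispatch table; `normalize_dispatch` is a hypothetical helper, not the actual parsing code:

```python
from typing import Dict, Optional


def normalize_dispatch(cpp_name: str, dispatch: Optional[Dict[str, str]]) -> Dict[str, str]:
    """Fill a missing dispatch table with a single Math entry pointing at cpp_name."""
    if dispatch is None:
        return {"Math": cpp_name}
    return dict(dispatch)


print(normalize_dispatch("foo", None))
print(normalize_dispatch("foo", {"CPU": "foo_cpu", "CUDA": "foo_cuda"}))
```

Because the result is never `None`, downstream code can iterate over the table without a separate catchall branch.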
One thing that is a little big about this change is a lot of kernels
which previously didn't report as "math" now report as math. I picked
a setting for these booleans that made sense to me, but I'm not sure
if e.g. XLA will handle it 100% correctly.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D24592391
Pulled By: ezyang
fbshipit-source-id: 2e3355f19f9525698864312418df08411f30a85d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46938
It turns out that after https://github.com/pytorch/pytorch/pull/42194
landed we no longer actually generate any registrations into this
file. That means it's completely unnecessary.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: IvanKobzarev
Differential Revision: D24573518
Pulled By: ezyang
fbshipit-source-id: b41ada9e394b780f037f5977596a36b896b5648c
Summary:
Follow-up of https://github.com/pytorch/pytorch/issues/46461 with a similar goal
Makes them more readable and possibly faster. Care has to be taken because a list comprehension (like `list(map(...))`) applies the function immediately, while `(x for x in xs)` is a generator expression which gets evaluated later. This is a benefit in some cases where it is not required to actually create the list of values in memory (e.g. when passing to `tuple` or `extend` or `join`).
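The eager versus lazy distinction can be seen in a short, self-contained sketch. The `record` helper is purely illustrative.

```python
calls = []


def record(x: int) -> int:
    calls.append(x)
    return x * 2


xs = [1, 2, 3]

eager = [record(x) for x in xs]          # list comprehension: runs immediately
calls_after_eager = list(calls)

calls.clear()
lazy = (record(x) for x in xs)           # generator expression: nothing has run yet
calls_before_consume = list(calls)

as_tuple = tuple(lazy)                   # consuming the generator runs record()
joined = ", ".join(str(x) for x in xs)   # no intermediate list is built

print(calls_after_eager, calls_before_consume, as_tuple, joined)
```

If the function has side effects or the input is mutated before the generator is consumed, the two forms can behave differently, which is why each replacement needs to be checked.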
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46462
Reviewed By: zou3519
Differential Revision: D24422343
Pulled By: ezyang
fbshipit-source-id: 252e33499c92ac0b15238f2df32681dbbda2b237
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45722
This diff does a bunch of things:
1. Introduces some abstractions as detailed in https://fb.quip.com/2oEzAR5MKqbD to help with selective build related codegen in multiple files.
2. Adds helper methods to combine operators, debug info, operator lists, etc...
3. Currently, the selective build machinery queries `op_registration_whitelist` directly at various places in the code. `op_registration_whitelist` is a list of allowed operator names (without overload name). We want to move to a world where the overload names are also included so that we can be more selective about which operators we include. To that effect, it makes sense to hide the checking logic in a separate abstraction and have the build use that abstraction instead of putting all this selective build specific logic in the code-generator itself. This change is attempting to do just that.
4. Updates generate_code, unboxing-wrapper codegen, and autograd codegen to accept the operator selector paradigm as opposed to a selected operator list.
5. Update `tools/code_analyzer/gen_op_registration_allowlist.py` to expose providing an actual structured operator dependency graph in addition to a serialized string.
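The idea in item 3, hiding the allowlist check behind an abstraction that can also understand overload names, might look roughly like this. `OperatorSelector` and its fields are hypothetical and are not the class introduced by this diff.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class OperatorSelector:
    """Hypothetical selector; callers ask it instead of inspecting a raw list."""
    # Names without an overload, e.g. "aten::mul": every overload is allowed.
    base_names: FrozenSet[str] = frozenset()
    # Fully qualified names, e.g. "aten::add.Tensor": only that overload.
    overload_names: FrozenSet[str] = frozenset()

    def is_selected(self, name: str) -> bool:
        base = name.split(".", 1)[0]
        return name in self.overload_names or base in self.base_names


selector = OperatorSelector(
    base_names=frozenset({"aten::mul"}),
    overload_names=frozenset({"aten::add.Tensor"}),
)
print(selector.is_selected("aten::add.Tensor"), selector.is_selected("aten::add.Scalar"))
```

With the check in one place, moving from operator-level to overload-level selection does not require touching each call site in the code generator.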
There are a bunch of structural changes as well:
1. `root_op_list.yaml` and `combined_op_list.yaml` are now actual YAML files (not a space separated list of operator names)
2. `generate_code.py` accepts only paths to operator list YAML files (both old style as well as new style) and not list of operator names on the command line as arguments
3. `gen.py` optionally also accepts a custom build related operators YAML path (this file has information about which operators to register in the generated library).
ghstack-source-id: 114578753
(Note: this ignores all push blocking failures!)
Test Plan:
`buck test caffe2/test:selective_build`
Generated YAML files after the change:
{P143981979}
{P143982025}
{P143982056}
Ensure that the generated files are same before and after the change:
```
[dhruvbird@devvm2490 /tmp/TypeDefault.cpp] find -name "*.cpp" | xargs md5sum
d72c3d125baa7b77e4c5581bbc7110d2 ./after_change/gen_aten/TypeDefault.cpp
42353036c83ebc7620a7159235b9647f ./after_change/lite_predictor_lib_aten/TypeDefault.cpp
d72c3d125baa7b77e4c5581bbc7110d2 ./before_change/gen_aten/TypeDefault.cpp
42353036c83ebc7620a7159235b9647f ./before_change/lite_predictor_lib_aten/TypeDefault.cpp
```
`VariableType_N.cpp` files are generated the same both before and after the change:
```
[dhruvbird@devvm2490 /tmp/VariableType] find -name "*.cpp" | xargs -n 1 md5sum | sort
3be89f63fd098291f01935077a60b677 ./after/VariableType_2.cpp
3be89f63fd098291f01935077a60b677 ./before/VariableType_2.cpp
40a3e59d64e9dbe86024cf314f127fd6 ./after/VariableType_4.cpp
40a3e59d64e9dbe86024cf314f127fd6 ./before/VariableType_4.cpp
a4911699ceda3c3a430f08c64e8243fd ./after/VariableType_1.cpp
a4911699ceda3c3a430f08c64e8243fd ./before/VariableType_1.cpp
ca9aa611fcb2a573a8cba4e269468c99 ./after/VariableType_0.cpp
ca9aa611fcb2a573a8cba4e269468c99 ./before/VariableType_0.cpp
e18f639ed23d802dc4a31cdba40df570 ./after/VariableType_3.cpp
e18f639ed23d802dc4a31cdba40df570 ./before/VariableType_3.cpp
```
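The same before/after comparison can be scripted instead of eyeballing checksums. A minimal sketch; the helper name and the throwaway directory layout are assumptions for illustration only:

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict


def md5_by_relative_path(root: Path) -> Dict[str, str]:
    """Map each .cpp file under root (by relative path) to its MD5 hex digest."""
    return {
        str(p.relative_to(root)): hashlib.md5(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*.cpp"))
    }


# Demonstration on throwaway directories; real use would point at the
# before/after output directories of the code generator.
with tempfile.TemporaryDirectory() as tmp:
    before, after = Path(tmp, "before"), Path(tmp, "after")
    for d in (before, after):
        d.mkdir()
        (d / "VariableType_0.cpp").write_text("// generated\n")
    same = md5_by_relative_path(before) == md5_by_relative_path(after)
    print("identical" if same else "differs")
```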
Reviewed By: ljk53
Differential Revision: D23837010
fbshipit-source-id: ad06b1756af5be25baa39fd801dfdf09bc565442
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45990
In #45890 we introduced the concept of a CppSignature, which bundled
up all of the information necessary to declare a C++ signature for
the cpp API. This PR introduces analogous concepts for dispatcher
and native: DispatcherSignature and NativeSignature.
The three interfaces are not particularly well coupled right now,
but they do have some duck typing coincidences:
- defn() which renders the C++ definition "bool f(int x)"
- decl() which renders the C++ declaration "bool f(int x = 2)"
- type() which renders the C++ function type "bool(int)"
Maybe at some point we'll introduce a Protocol, or a supertype.
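If a Protocol were introduced, it might look something like the sketch below. `ToySignature`, `Arg`, and `render_all` are hypothetical and only model the three rendering methods listed above; they are not CppSignature, DispatcherSignature, or NativeSignature.

```python
from dataclasses import dataclass
from typing import Optional, Protocol, Tuple


class Signature(Protocol):
    def defn(self) -> str: ...
    def decl(self) -> str: ...
    def type(self) -> str: ...


@dataclass(frozen=True)
class Arg:
    type: str
    name: str
    default: Optional[str] = None


@dataclass(frozen=True)
class ToySignature:
    returns: str
    name: str
    args: Tuple[Arg, ...]

    def defn(self) -> str:
        # Definitions never repeat default values.
        params = ", ".join(f"{a.type} {a.name}" for a in self.args)
        return f"{self.returns} {self.name}({params})"

    def decl(self) -> str:
        # Declarations carry default values where present.
        params = ", ".join(
            f"{a.type} {a.name}" + (f" = {a.default}" if a.default is not None else "")
            for a in self.args
        )
        return f"{self.returns} {self.name}({params})"

    def type(self) -> str:
        return f"{self.returns}({', '.join(a.type for a in self.args)})"


def render_all(sig: Signature) -> Tuple[str, str, str]:
    """Works for any object that structurally satisfies Signature."""
    return sig.defn(), sig.decl(), sig.type()


sig = ToySignature("bool", "f", (Arg("int", "x", "2"),))
print(render_all(sig))
```

A structural Protocol would let the three signature classes stay unrelated by inheritance while still being checked by mypy wherever only these three methods are needed.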
Many other methods (like arguments()) have varying types. These
signatures also have some helper methods that forward back to real
implementations in the api modules. Something to think about is
whether or not we should attempt to reduce boilerplate here or
not; I'm not too sure about it yet.
The net effect is we get to reduce the number of variables we
have to explicitly write out in the codegen, since now these are all
bundled together into a signature. Something extra special happens
in BackendSelect, where we now dynamically select between dispatcher_sig
and native_sig as "how" the backend select is implemented.
A little bit of extra cleanup:
- Some places where we previously advertised Sequence, we now advertise
a more informative Tuple.
- defn() may take an optional positional parameter overriding the entire
name, or a kwarg-only prefix parameter to just add a prefix to the
name.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: smessmer
Differential Revision: D24223100
Pulled By: ezyang
fbshipit-source-id: f985eced08af4a60ba9641d125d0f260f8cda9eb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45975
I reordered declarations in the faithful API reimplementation to
make sure the diffs lined up nicely; they're not necessary now.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: smessmer
Differential Revision: D24223102
Pulled By: ezyang
fbshipit-source-id: 77c6ae40c9a3dac36bc184dd6647d6857c63a50c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45974
The term "legacy dispatcher" caused a bunch of confusion between
me and Sebastian when discussing what the intended semantics of
legacy dispatcher argument is. Legacy dispatcher argument implies
that you ought NOT to use it when you have use_c10_dispatcher: full;
but that's not really what's going on; legacy dispatcher API describes
the API that you write native:: functions (NativeFunctions.h) to.
Renaming it here makes this more clear.
I applied these seds:
```
git grep -l 'legacy_dispatcher' | xargs sed -i 's/legacy_dispatcher/native/g'
git grep -l 'legacydispatcher' | xargs sed -i 's/legacydispatcher/native/g'
git grep -l 'LegacyDispatcher' | xargs sed -i 's/LegacyDispatcher/Native/g'
```
and also grepped for "legacy" in tools/codegen and fixed documentation.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: smessmer
Differential Revision: D24223101
Pulled By: ezyang
fbshipit-source-id: d1913b8b823b3b95e4546881bc0e876acfa881eb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45918
This groups together related native functions (functional, inplace, out)
into a single group. It's not used by anything but Jiakai said this
would be useful for his stuff so I'm putting it in immediately.
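A toy version of such a grouping is sketched below. It classifies purely by name (a trailing underscore for inplace, an `out` overload for out), which is a simplifying assumption; `classify` and `group` are hypothetical helpers and this is not how the codegen actually determines the groups.

```python
from typing import Dict, Iterable, Tuple


def classify(func: str) -> Tuple[str, str]:
    """Return (base name, kind) for an operator name like 'abs_' or 'abs.out'."""
    base, _, overload = func.partition(".")
    if overload == "out":
        return base, "out"
    if base.endswith("_"):
        return base[:-1], "inplace"
    return base, "functional"


def group(funcs: Iterable[str]) -> Dict[str, Dict[str, str]]:
    groups: Dict[str, Dict[str, str]] = {}
    for f in funcs:
        base, kind = classify(f)
        groups.setdefault(base, {})[kind] = f
    return groups


groups = group(["abs", "abs_", "abs.out", "upsample_nearest1d", "upsample_nearest1d.out"])
print(groups)
```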
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: smessmer
Differential Revision: D24163526
Pulled By: ezyang
fbshipit-source-id: 9979b0fe9249c78e4a64a50c5ed0e2ab99f499b9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45890
This rewrite is as per my comments at https://github.com/pytorch/pytorch/pull/44087#issuecomment-701664506
I did the rewrite by reverting #44087 and then reimplementing it on top.
You may find it easier to review by diffing against master with only #44087
reverted.
There are two main ideas.
First, we now factor cpp argument processing into two phases operating
on three representations of data:
1. `FunctionSchema` - this is the source from native_functions.yaml
2. `Union[Argument, ThisArgument, TensorOptionsArgument]` - this is
the arguments after doing some basic semantic analysis to group
them (for TensorOptions) or identify the this argument (if this
is a method). There is only ever one of these per function.
3. `Union[CppArgument, CppThisArgument, CppTensorOptionsArgument]` -
this is the arguments after we've elaborated them to C++. There
may be multiple of these per actual C++ signature.
You can think of (2) as common processing, whereas (3) bakes in specific
assumptions about whether or not you have a faithful or non-faithful
signature.
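The two phases can be sketched with toy dataclasses. The class and function names echo the ones in this message, but the implementations are illustrative only: `to_cpp` is hypothetical, `ThisArgument` is left out, and the argument types are assumed to be already spelled as C++ types.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Argument:
    name: str
    type: str  # assumed to already be a C++ type, for brevity


@dataclass(frozen=True)
class TensorOptionsArguments:
    dtype: Argument
    layout: Argument
    device: Argument
    pin_memory: Argument


Grouped = Union[Argument, TensorOptionsArguments]
TENSOR_OPTIONS_NAMES = ["dtype", "layout", "device", "pin_memory"]


def group_arguments(args: List[Argument]) -> List[Grouped]:
    """(1) -> (2): bundle consecutive dtype/layout/device/pin_memory arguments."""
    out: List[Grouped] = []
    i = 0
    while i < len(args):
        window = args[i:i + 4]
        if [a.name for a in window] == TENSOR_OPTIONS_NAMES:
            out.append(TensorOptionsArguments(*window))
            i += 4
        else:
            out.append(args[i])
            i += 1
    return out


def to_cpp(grouped: List[Grouped], faithful: bool) -> List[str]:
    """(2) -> (3): faithful keeps the options scattered, otherwise gathers them."""
    cpp: List[str] = []
    for a in grouped:
        if isinstance(a, TensorOptionsArguments):
            if faithful:
                parts = (a.dtype, a.layout, a.device, a.pin_memory)
                cpp.extend(f"c10::optional<{x.type}> {x.name}" for x in parts)
            else:
                cpp.append("const TensorOptions & options")
        else:
            cpp.append(f"{a.type} {a.name}")
    return cpp


schema = [
    Argument("size", "IntArrayRef"),
    Argument("dtype", "ScalarType"),
    Argument("layout", "Layout"),
    Argument("device", "Device"),
    Argument("pin_memory", "bool"),
]
grouped = group_arguments(schema)
print(to_cpp(grouped, faithful=False))
print(to_cpp(grouped, faithful=True))
```

The grouping step runs once per function, while the elaboration step can be run once per desired C++ signature.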
Second, we now have CppSignature and CppSignatureGroup representing
the *total* public C++ API signature. So those dataclasses are what
know how to render definitions/declarations, and you no longer have
to manually type it out in the Functions/TensorMethods codegen.
Here is an exhaustive accounting of the changes.
tools.codegen.api.types
- CppSignature and CppSignatureGroup got moved to tools.codegen.api.types
- Add new CppThisArgument and CppTensorOptionsArguments (modeled off
of ThisArgument and TensorOptionsArguments) so that we can retain
high level semantic structure even after elaborating terms with C++
API information. Once this is done, we can refine
CppArgument.argument to no longer contain a ThisArgument (ThisArgument
is always translated to CppThisArgument. Note that this doesn't
apply to TensorOptionsArguments, as those may be expanded or not
expanded, and so you could get a single CppArgument for 'options')
- Add no_default() functional mutator to easily remove default arguments
from CppArgument and friends
- Add an explicit_arguments() method to CppArgument and friends to
extract (flat) argument list that must be explicitly written in the signature.
This is everything except (Cpp)ThisArgument, and is also convenient
when you don't care about the extra structure of
CppTensorOptionsArguments
tools.codegen.api.cpp
- group_arguments is back, and it doesn't send things directly to a
CppSignatureGroup; instead, it moves us from representation (1) to (2)
(perhaps it should live in model). Here I changed my mind from my
PR comment; I discovered it was not necessary to do classification at
grouping time, and it was simpler and easier to do it later.
- argument got split into argument_not_this/argument/argument_faithful.
argument and argument_faithful are obvious enough what they do,
and I needed argument_not_this as a more refined version of argument
so that I could get the types to work out on TensorOptionsArguments
tools.codegen.api.dispatcher
- Here we start seeing the payoff. The old version of this code had a
"scatter" mode and a "gather" mode. We don't need that anymore:
cppargument_exprs is 100% type-directed via the passed in cpp
arguments. I am able to write the functions without any reference
to use_c10_dispatcher
tools.codegen.gen
- Instead of having exprs_str and types_str functions, I moved these to
live directly on CppSignature, since it seemed pretty logical.
- The actual codegen for TensorMethods/Functions is greatly simplified,
since (1) all of the heavy lifting is now happening in
CppSignature(Group) construction, and (2) I don't need to proxy one
way or another, the new dispatcher translation code is able to handle
both cases no problem. There is a little faffing about with ordering
to reduce the old and new diff which could be removed afterwards.
Here are codegen diffs. For use_c10_dispatcher: full:
```
+// aten::_cudnn_init_dropout_state(float dropout, bool train, int dropout_seed, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=False) -> Tensor
Tensor _cudnn_init_dropout_state(double dropout, bool train, int64_t dropout_seed, const TensorOptions & options) {
- return _cudnn_init_dropout_state(dropout, train, dropout_seed, optTypeMetaToScalarType(options.dtype_opt()), options.layout_opt(), options.device_opt(), options.pinned_memory_opt());
+ static auto op = c10::Dispatcher::singleton()
+ .findSchemaOrThrow("aten::_cudnn_init_dropout_state", "")
+ .typed<Tensor (double, bool, int64_t, c10::optional<ScalarType>, c10::optional<Layout>, c10::optional<Device>, c10::optional<bool>)>();
+ return op.call(dropout, train, dropout_seed, optTypeMetaToScalarType(options.dtype_opt()), options.layout_opt(), options.device_opt(), options.pinned_memory_opt());
}
```
Otherwise:
```
+// aten::empty_meta(int[] size, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None, MemoryFormat? memory_format=None) -> Tensor
Tensor empty_meta(IntArrayRef size, c10::optional<ScalarType> dtype, c10::optional<Layout> layout, c10::optional<Device> device, c10::optional<bool> pin_memory, c10::optional<MemoryFormat> memory_format) {
- return empty_meta(size, TensorOptions().dtype(dtype).layout(layout).device(device).pinned_memory(pin_memory), memory_format);
+ static auto op = c10::Dispatcher::singleton()
+ .findSchemaOrThrow("aten::empty_meta", "")
+ .typed<Tensor (IntArrayRef, const TensorOptions &, c10::optional<MemoryFormat>)>();
+ return op.call(size, TensorOptions().dtype(dtype).layout(layout).device(device).pinned_memory(pin_memory), memory_format);
}
```
Things that I probably did not get right:
- The Union[Argument, TensorOptionsArguments, ThisArgument] and
the Cpp variants are starting to get a little unwieldy. Not sure if
this means I should add a supertype (or at the very least an
alias); in some cases I do purposely omit one of these from the Union
- Code may not necessarily live in the most logical files. There isn't
very much rhyme or reason to it.
- The fields on CppSignature. They're not very well constrained and
it will be better if people don't use them directly.
- Disambiguation. We should do this properly in #44087 and we don't
need special logic for deleting defaulting for faithful signatures;
there is a more general story here.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: smessmer
Differential Revision: D24144035
Pulled By: ezyang
fbshipit-source-id: a185f8bf9df8b44ca5718a7a44dac23cefd11c0a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45742
Add a new flag to native_functions.yaml: `use_c10_dispatcher: hacky_wrapper_for_legacy_signatures`
and the codegen only wraps kernels in the aforementioned wrapper if that flag is set.
Apart from that, `use_c10_dispatcher: hacky_wrapper_for_legacy_signatures` is equivalent to `full`,
i.e. it has full boxing and unboxing support.
This greatly reduces the number of ops we apply the hacky_wrapper to, i.e. all ops marked as `use_c10_dispatcher: full` don't have it anymore.
ghstack-source-id: 113982139
Test Plan:
waitforsandcastle
vs fbcode:
https://www.internalfb.com/intern/fblearner/details/214511705/
vs base diff:
https://www.internalfb.com/intern/fblearner/details/214693207/
Reviewed By: ezyang
Differential Revision: D23328718
fbshipit-source-id: be120579477b3a05f26ca5f75025bfac37617620
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45665
Fixes #43944
Note that the codegen doesn't use a proper parser so, in the same way as with lists, the string `, ` cannot appear in defaults or it will be interpreted as a splitting point between arguments.
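The limitation is easy to reproduce with a naive splitter. The function below is a stand-in for illustration, not the actual codegen parser, and the schema fragments are made up.

```python
from typing import List


def split_arguments(schema_args: str) -> List[str]:
    """Naive splitting on ', ' with no awareness of quoting."""
    return schema_args.split(", ")


ok = split_arguments("Tensor self, int dim=0")
broken = split_arguments('Tensor self, str sep="a, b"')
print(ok)
print(broken)
```

In the second call the default value is cut in two at the embedded `, `, so what should be two arguments comes back as three fragments.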
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D24141835
Pulled By: ezyang
fbshipit-source-id: 578127861fd2504917f4486c44100491a2c40343
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44087
Each op taking a TensorOptions argument now has an additional overload in the C++ frontend where it takes scattered ScalarType, Layout, Device, bool instead of one TensorOptions argument.
If it is a c10-full op, then the scattered version calls into the dispatcher and the gathered version is a proxy calling into the scattered version.
If it is a non-c10-full op, then the gathered version calls into the dispatcher and the scattered version is a proxy calling into the gathered version.
This should minimize the amount of gathering and scattering needed.
This PR is also a prerequisite to remove the re-gathering of arguments that is currently happening in VariableKernel. Currently, VariableKernels gather arguments into a TensorOptions object
to call into the C++ API. In a PR stacked on top of this, VariableKernel will just directly call into the scattered C++ API introduced here and avoid the gathering step.
ghstack-source-id: 113355689
Test Plan:
waitforsandcastle
vs master: https://www.internalfb.com/intern/fblearner/details/216169815/
vs previous diff: https://www.internalfb.com/intern/fblearner/details/216169957/
Reviewed By: ezyang
Differential Revision: D23492188
fbshipit-source-id: 3e84c467545ad9371e98e09075a311bd18411c5a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44062
Previously, BackendSelect kernels were still written in the legacy way, i.e. they took one TensorOptions argument instead of scattered dtype, layout, device, pin_memory, and they used hacky_wrapper to be callable. This caused a re-wrapping step. Calling into a BackendSelect kernel required taking the individual scattered arguments, packing them into a TensorOptions, and the kernel itself then gathered them again for redispatch.
Now with this PR, BackendSelect kernels are written in the new way and no hacky_wrapper or rewrapping is needed for them.
ghstack-source-id: 112825789
Test Plan:
vs master: https://www.internalfb.com/intern/fblearner/details/216117032/
vs previous diff: https://www.internalfb.com/intern/fblearner/details/216170194/
Reviewed By: ezyang
Differential Revision: D23484192
fbshipit-source-id: e8fb49c4692404b6b775d18548b990c4cdddbada
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45134
Per-Op-Registration was a mechanism used for mobile selective build v0. Since then, a new dispatching mechanism has been built for PyTorch, and this code path isn't used any more. Remove it to simplify understanding/updating the code-generator's code-flow.
ghstack-source-id: 112723942
Test Plan: `buck build` and sandcastle.
Reviewed By: ezyang
Differential Revision: D23806632
fbshipit-source-id: d93cd324650c541d9bfc8eeff2ddb2833b988ecc
Summary:
This PR moves `DispatchKey::Autograd` to an alias dispatch key mapping to `AutogradCPU, AutogradCUDA, AutogradXLA, AutogradOther, AutogradPrivate*` keys.
A few things are handled in this PR:
- Update alias dispatch key mapping and precompute dispatchTable logic
- Move `Autograd` key from `always_included` set to TensorImpl constructor.
- Update `dummyTensor` constructor to take `requires_grad` as optional argument so that it's closer to the real application in op_registration_test.
- Use `BackendSelect` key for both backend select before and after autograd layer. (1 liner in backend_select codegen)
A few planned followups ordered by priority:
- [cleanup] Update `test_dispatch.py` to include testing `Autograd`.
- [cleanup] Add Math alias key and move catchAll to Math. (to remove 2.2 in `computeDispatchTableEntryWithDebug`)
- [new feature] Add support for Math in native_functions.yaml
- [cleanup] Add iterator like functionality to DispatchKeySet
- [cleanup/large] Only add Autograd backend keys when tensor requires grad. (cc: ljk53 ?)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43070
Reviewed By: ezyang
Differential Revision: D23281535
Pulled By: ailzhang
fbshipit-source-id: 9ad00b17142e9b83304f63cf599f785500f28f71
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42629
How to approach reviewing this diff:
- The new codegen itself lives in `tools/codegen`. Start with `gen.py`, then read `model.py` and then the `api/` folder. The comments at the top of the files describe what is going on. The CLI interface of the new codegen is similar to the old one, but (1) it is no longer necessary to explicitly specify cwrap inputs (and now we will error if you do so) and (2) the default settings for source and install dir are much better; to the extent that if you run the codegen from the root source directory as just `python -m tools.codegen.gen`, something reasonable will happen.
- The old codegen is (nearly) entirely deleted; every Python file in `aten/src/ATen` was deleted except for `common_with_cwrap.py`, which now permanently finds its home in `tools/shared/cwrap_common.py` (previously cmake copied the file there), and `code_template.py`, which now lives in `tools/codegen/code_template.py`. We remove the copying logic for `common_with_cwrap.py`.
- All of the inputs to the old codegen are deleted.
- Build rules now have to be adjusted to not refer to files that no longer exist, and to abide by the (slightly modified) CLI.
- LegacyTHFunctions files have been generated and checked in. We expect these to be deleted as these final functions get ported to ATen. The deletion process is straightforward; just delete the functions of the ones you are porting. There are 39 more functions left to port.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: bhosmer
Differential Revision: D23183978
Pulled By: ezyang
fbshipit-source-id: 6073ba432ad182c7284a97147b05f0574a02f763