Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58889
fixes https://github.com/pytorch/pytorch/issues/58796
Planning on re-testing locally tomorrow morning to confirm, but this change should fix the non-determinism in the codegen output that was causing `ccache` not to re-use its cached output.
I built from the commit referenced in https://github.com/pytorch/pytorch/issues/58796 a few times and ran `diff -Naur` on the codegen output in `build/aten/src/ATen`. After a few tries, `NativeFunctions.h` had a few diffs. The diffs were all related to the ordering of functional/inplace/out variants of a NativeFunctionGroup, which looked non-deterministic.
That looks like it's coming from my calling `set()` to filter out duplicate NativeFunction declarations. The earlier version of the codegen also called `set()` to filter out duplicates, but it did so individually for each `NativeFunction` object, before merging the groups (I'm not too sure why this didn't introduce non-determinism before, though). With the refactor from https://github.com/pytorch/pytorch/pull/57361, we're calling `set()` on the declarations from every operator for a given DispatchKey, which is probably what introduced the non-determinism.
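A minimal sketch of the failure mode and one order-preserving alternative. The declaration strings are made up for illustration, and the actual fix in this PR may look different; the point is only that `set()` iteration order is not a safe basis for generated output.

```python
# Hypothetical declarations; the last entry duplicates the first, mimicking
# the duplicates that the codegen needs to filter out.
decls = [
    "Tensor add(const Tensor & self, const Tensor & other);",
    "Tensor & add_(Tensor & self, const Tensor & other);",
    "Tensor & add_out(Tensor & out, const Tensor & self, const Tensor & other);",
    "Tensor add(const Tensor & self, const Tensor & other);",
]

# list(set(decls)) would also drop the duplicate, but string hashing is
# randomized per process (see PYTHONHASHSEED), so the resulting order can
# differ from one codegen run to the next.

# dict preserves insertion order (guaranteed since Python 3.7), so this
# removes duplicates while keeping the first-seen order.
unique_decls = list(dict.fromkeys(decls))

for decl in unique_decls:
    print(decl)
```

Sorting the set would give stable output as well; keeping insertion order just has the side benefit of matching the order in which the operators were read.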
Test Plan: Imported from OSS
Reviewed By: gchanan
Differential Revision: D28675941
Pulled By: bdhirsh
fbshipit-source-id: bb66de00aafeeb9720d85e8156ac9f7539aed0d6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57510
This is a re-write of https://github.com/pytorch/pytorch/pull/56835, which is significantly shorter thanks to the data model change in the PR below this one in the stack. See the original description in the linked PR for details.
The functional changes in this PR are the same as in the above linked one, so the description is the same with a few small changes:
- I don't bother generating `at::xla::{op}` entries for CPU fallbacks. After looking around, I see precedent for that. For example, we don't have `at::cpu::{op}` entries for composite ops: if you really want to bypass the dispatcher you need to call `at::compositeimplicitautograd::{op}`. Maybe we should revisit that later if we find an important use case for having full namespace coverage, but that doesn't seem worth half-fixing for external backends in this PR.
Test Plan: Imported from OSS
Reviewed By: navahgar
Differential Revision: D28474364
Pulled By: bdhirsh
fbshipit-source-id: 4d58b60e5debad6f1ff06420597d8df8505b2876
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57361
Data model change in the codegen, which splits backend-specific information out of `NativeFunction`
### Overview
Currently in the codegen, native_functions.yaml has backend-specific information about each operator that is encoded directly into the data model, in the `NativeFunction` object. That's reasonable, since the native_functions.yaml is the source of truth for information about an operator, and the data model encodes that information into types.
Now that external backends can use the codegen though, that information is technically incomplete/inaccurate. In another PR, I tried patching the information on the `NativeFunction` object with the additional external information, by updating the `dispatch` entry to contain the external backend kernel name and dispatch key.
Instead, this PR tries to split out that information. The `NativeFunction` class contains all information about an operator from native_functions.yaml that's backend-independent and is known never to change regardless of what extra information backends provide. We also build up a backend "index", which is basically a mapping from [backend] -> [backend-specific-metadata]. Reading in an external backend yaml just involves updating that index with the new backend.
There were a few places where `NativeFunction` used the dispatch table directly, that I encoded as properties directly on the NativeFunction object (e.g. `is_abstract`). They were mostly around whether or not the operator has a composite kernel, which isn't something that's going to change for any external backends.
This has a few advantages:
- We can more easily re-use the existing logic in `native_function.py` and `register_dispatch_key.py` for both native and external backends, since they both involve a NativeFunction + a particular backend index
- The data in the data model will be the same regardless of how the codegen is run. Running the codegen with a new external backend doesn't change the data inside of NativeFunction or an existing backend index. It just adds a new index for that backend.
- There are several codegen areas that don't care about backend-specific information: mostly the tracing and autograd codegen. We can reason about the codegen there more easily, knowing that backend-specific info is entirely uninvolved.
An alternative to this split would be to augment the NativeFunction objects with external backend information at the time that we create them. So the external codegen could read both native_functions.yaml and the external backend's yaml at the same time, and construct a NativeFunction with a full dispatch table (including the XLA entry), and the correct setting of structured (taking into account both yamls). One disadvantage to this approach is that NativeFunction objects now contain different stuff depending on how you ran the codegen, and you have to make sure that any changes to the codegen can properly handle all the different variants.
### Data Model Changes
Removed 3 classes, which were used by the external codegen:
- ExternalBackendFunction
- ExternalBackendFunctionsGroup
- ExternalBackendMetadata
And added two new ones:
- BackendIndex
- BackendMetadata
`BackendIndex` contains any info that's specific to that backend, plus a mapping from operator names to backend specific metadata about the operator. One example of backend-specific info that's not operator-dependent is the fact that XLA prefers to implement functional kernels instead of out kernels (and so when they eventually mark an op as structured, they're going to mark the functional op and not the out op).
`BackendMetadata` contains info specific to an (operator, backend) pair. Right now, that's just (a) the name of the kernel, and (b) whether or not that operator is structured.
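A rough sketch of how the two classes described above could fit together. The field names (`kernel`, `structured`, `use_out_as_primary`, `index`) and the `get_kernel` helper are assumptions chosen for illustration, not necessarily what the codegen uses.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class BackendMetadata:
    # Info specific to one (operator, backend) pair.
    kernel: str        # (a) the name of the kernel
    structured: bool   # (b) whether the operator is structured for this backend


@dataclass
class BackendIndex:
    # Info specific to the backend as a whole.
    dispatch_key: str
    # Hypothetical flag for "prefers out kernels over functional kernels".
    use_out_as_primary: bool
    # Mapping from operator name to backend-specific metadata.
    index: Dict[str, BackendMetadata] = field(default_factory=dict)

    def get_kernel(self, op_name: str) -> Optional[BackendMetadata]:
        # None means this backend has no kernel for the operator.
        return self.index.get(op_name)


cpu = BackendIndex("CPU", True, {"add.out": BackendMetadata("add_out", True)})

# Reading an external backend's yaml only adds a new index; existing ones
# are left untouched.
xla = BackendIndex("XLA", False)
xla.index["add.Tensor"] = BackendMetadata("add", False)

print(cpu.get_kernel("add.out"))
print(xla.get_kernel("add.out"))
```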
### Questions
I wanted to get this PR up earlier so I could get feedback, but there are a few things I want to call out:
**Dealing with `structured`.**
This PR separates out the notion of `structured` into two bits of information:
- Does [operator] have a meta() function. This is backend-agnostic, and is represented by the `structured` property on `NativeFunction`, same as before. This is used, e.g., to decide what signatures to add to `MetaFunctions.h`.
- Does [operator, backend] have an impl() function. This is backend dependent; even though technically all in-tree backends are forced to write impl() functions for an operator when we port the op to structured in native_functions.yaml, out-of-tree backends can decide to opt in independently. This is represented as a property on `BackendMetadata`. This is used in most other cases, e.g. in `RegisterDispatchKey` when we're deciding whether or not to gen a structured or unstructured wrapper.
I also baked `is_structured_dispatch_key` directly into each BackendIndex. So for operators marked "structured" in native_functions.yaml, their corresponding CPU/CUDA BackendIndex entries will be marked structured, and all others (except for potentially external backends) will not.
I ended up trying to deal with `structured` in this change since it's technically backend dependent (XLA can opt kernels into structured separately from in-tree ops), but that may have been too ambitious: it's technically not relevant until we actually add support for structured external kernels. If it's not clear that this is the right path for dealing with structured and we want to push that off, I'm fine with backing out the bits of this PR that make `structured` backend-dependent. I don't see anything *too* controversial related to structured in the change, but I tried to call out any areas in the comments.
**Localizing the fact that external backends follow Dispatcher convention.**
Another thing that's sort of backend specific that I didn't totally address in this PR is the fact that in-tree backends follow the Native API while external backends follow the Dispatcher API. I painted over that in `native_functions.py` by adding a helper, `kernel_signature`, that takes in a native function and gives you the "correct" signature for the specified backend: NativeSignature for in-tree backends, and DispatcherSignature for out-of-tree backends. In order to make that fully usable though, we'll need `NativeSignature` and `DispatcherSignature` to have matching interfaces. I didn't bother with that in this PR, which is why `gen_external_aten_fallbacks.py` still has a bunch of direct references to the dispatcher API. Thinking of adding it in a later PR but wanted to see if anyone has other opinions.
Maybe `is_external()` shouldn't even be a property on the BackendMetadata, and anything the codegen does that requires asking for that information should just be better abstracted away.
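To make the "matching interfaces" point concrete, here is a toy sketch. The two signature classes below are stand-ins with a single made-up `describe()` method, and the `external` flag is a hypothetical parameter; the real classes carry far more information.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class NativeSignature:
    op_name: str

    def describe(self) -> str:
        return f"{self.op_name} (Native API)"


@dataclass(frozen=True)
class DispatcherSignature:
    op_name: str

    def describe(self) -> str:
        return f"{self.op_name} (Dispatcher API)"


def kernel_signature(
    op_name: str, *, external: bool
) -> Union[NativeSignature, DispatcherSignature]:
    # In-tree backends follow the Native API, external ones the Dispatcher API.
    return DispatcherSignature(op_name) if external else NativeSignature(op_name)


# Because both classes expose the same method, callers never have to branch
# on which kind of backend they are generating code for.
for is_external in (False, True):
    print(kernel_signature("add", external=is_external).describe())
```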
**Thoughts on the `BackendIndex` / `BackendMetadata` breakdown.**
One thing that's annoying right now is that to query for various pieces of metadata, you call helper functions like `backend_index.structured(f)`, which queries that particular backend and tells you if that specific NativeFunctionGroup is structured for that backend. It has to return an `Optional[bool]` though, since you have to handle the case where that operator doesn't have a kernel for that backend at all. So users of those helpers end up with a bunch of optionals that they need to unpack, even if they know at some point that the result isn't None. I think it would be easier instead to just store the NativeFunction object as a field directly on the BackendMetadata. Curious if there are any other opinions on a better way to model it though.
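A small sketch of the `Optional[bool]` friction described above. The dictionary and both function names are hypothetical stand-ins for a single backend's index.

```python
from typing import Dict, Optional

# Hypothetical stand-in for one backend's index: operator name -> structured?
xla_structured: Dict[str, bool] = {"add.Tensor": False}


def structured(op_name: str) -> Optional[bool]:
    # None means "this backend has no kernel for the operator at all".
    return xla_structured.get(op_name)


def wrapper_kind(op_name: str) -> str:
    s = structured(op_name)
    # Every caller has to deal with the None case, even when it already
    # knows that a kernel exists.
    if s is None:
        return "no kernel"
    return "structured" if s else "unstructured"


print(wrapper_kind("add.Tensor"))
print(wrapper_kind("mul.Tensor"))
```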
Test Plan: Imported from OSS
Reviewed By: navahgar
Differential Revision: D28474362
Pulled By: bdhirsh
fbshipit-source-id: 41a00821acf172467d764cb41e771e096542f661
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56601
Updating it to ensure that RegistrationDeclarations.yaml is completely
unchanged
This reverts commit 90e532f3ef.
Test Plan: Imported from OSS
Reviewed By: ailzhang
Differential Revision: D27915305
Pulled By: bdhirsh
fbshipit-source-id: 491a025c44221690dad849f9a2166934130c0fec
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55351
We incorrectly used `Tensor&` to mean "the underlying
TensorImpl cannot be changed", as explained in
https://github.com/zdevito/ATen/issues/27#issuecomment-330717839 .
This diff gets us on the path to fixing this problem: we have an
incremental way to fix individual native functions so that we can
apply any handwritten fixes a few at a time. It gets the migration
started with the `resize` family of native functions.
ghstack-source-id: 127092677
Test Plan: fitsships
Reviewed By: ezyang
Differential Revision: D27583983
fbshipit-source-id: 4eeeec85f5d268e9d0f1645eb9396914a9f9557f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56307
This should fix https://github.com/pytorch/pytorch/issues/56273. I tested these changes locally by making them directly on top of https://github.com/pytorch/pytorch/pull/56151, and running the xla tests (`xla/test/cpp/build/test_ptxla`).
**Current state:** For ops that are ported to structured, if external backends like XLA have implemented the `out` op but not the `functional` version, they will call into our code-generated `CompositeExplicitAutograd` kernel, which calls the structured operator's `meta()` function and then redispatches to the external backend's `out` function.
If XLA has registered their own kernel to the `functional` variant of the op, it'll override our codegen'd composite kernel. XLA has logic to code-generate "CPU fallback" kernels for "required" ops. It gets this information based off of `RegistrationDeclarations.yaml`. That info was technically incorrect up until this PR, since we were code-generating `inplace/functional` composite kernels for structured ops, but not updating `RegistrationDeclarations.yaml` with that information.
Test Plan: Imported from OSS
Reviewed By: bhosmer
Differential Revision: D27883950
Pulled By: bdhirsh
fbshipit-source-id: fe896b0d2bbd4369490dcdf7a87f227fd3d8b8b3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55047
Added namespaces to all of the `CTypes` printed in the codegen. This is pretty much required if we want to use codegen externally, since we can no longer assume that we're inside of the `at::` namespace.
Important changes are in `types.py`.
How do we add the notion of namespaces to C++ types without people having to write "at::Tensor" everywhere? Before this PR, `CType` held a raw string representing the type, i.e. `BaseCType("Tensor", binds)`. This PR introduces a set of singleton base C++ types in `types.py`, that know how to print their namespace. Instead, we'd write `BaseCType(tensorT, binds)`, where printing `tensorT` will properly print out "at::Tensor".
This also means that you can't create arbitrary `CTypes`. If we need a new C++ type in the codegen, we need to add it to the list in `types.py`.
One blip in the design: we don't want to change `RegistrationDeclarations.yaml`, since that'll break external backends that ingest it. I added separate functions to display types without the namespace that are used to create `RegistrationDeclarations.yaml`. With an external codegen API though, we can eventually kill it :)
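A simplified sketch of the singleton idea, including a namespace-free rendering for the `RegistrationDeclarations.yaml` case. The class layout, the `strip_ns` flag, and every singleton other than `tensorT` are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BaseCppType:
    ns: Optional[str]   # namespace, or None for types without one
    name: str

    def __str__(self) -> str:
        if not self.ns:
            return self.name
        return f"{self.ns}::{self.name}"


# The fixed list of known C++ types; a new type has to be added here.
tensorT = BaseCppType("at", "Tensor")
intT = BaseCppType(None, "int64_t")


@dataclass(frozen=True)
class BaseCType:
    type: BaseCppType
    binds: str   # the argument name this type binds to

    def cpp_type(self, *, strip_ns: bool = False) -> str:
        # strip_ns=True reproduces the old, namespace-free spelling.
        return self.type.name if strip_ns else str(self.type)


self_arg = BaseCType(tensorT, "self")
print(self_arg.cpp_type())               # namespaced spelling
print(self_arg.cpp_type(strip_ns=True))  # legacy spelling
```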
I also didn't realize until this PR that `Declarations.yaml` is still directly in use, by some python/autograd codegen. Rather than keep that yaml byte-for-byte compatible, I just updated the callsites in the autograd codegen to work with namespaces. In the NEXT pr, I try to clean up some of the autograd codegen to stop using raw strings to match against C++ types.
Test Plan: Imported from OSS
Reviewed By: bhosmer
Differential Revision: D27708349
Pulled By: bdhirsh
fbshipit-source-id: 56a4f81fc101795bcb9ee1f722121480fb2356ad
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55046
Updating `returns` in the codegen to return a CType instead of a raw string.
This has the benefit of putting all stringifying logic through CType, which is useful in the followup PR when I add namespaces.
I also added new CTypes for other templated C++ types: array, vector and tuple. Mostly because it makes the namespacing logic in the next PR significantly easier. It also seems more natural to me that `BaseCType` shouldn't represent specializations of templated types.
There's a little bit of weirdness: some types are currently *only* used for returns, i.e. `TupleCType`. Returns aren't named, so I opted not to give it a name; we can add one later if we discover that we need it.
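A stripped-down sketch of what separate `CType` classes for templated types might look like. Only `BaseCType` and `TupleCType` are named in the text above; `VectorCType`, the field names, and the use of plain strings for base types are assumptions.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class BaseCType:
    type: str   # simplified: a raw string instead of a richer type object

    def cpp_type(self) -> str:
        return self.type


@dataclass(frozen=True)
class VectorCType:
    elem: "CType"

    def cpp_type(self) -> str:
        return f"std::vector<{self.elem.cpp_type()}>"


@dataclass(frozen=True)
class TupleCType:
    # Only used for returns, which are unnamed, so there is no name field.
    elems: Tuple["CType", ...]

    def cpp_type(self) -> str:
        inner = ",".join(e.cpp_type() for e in self.elems)
        return f"std::tuple<{inner}>"


CType = Union[BaseCType, VectorCType, TupleCType]

returns = TupleCType((BaseCType("Tensor"), VectorCType(BaseCType("Tensor"))))
print(returns.cpp_type())
```

Since all stringification goes through `cpp_type()`, adding namespaces later only requires touching the leaf types.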
Test Plan: Imported from OSS
Reviewed By: bhosmer
Differential Revision: D27708348
Pulled By: bdhirsh
fbshipit-source-id: 230b210c3e53be1bd362105fbea8451055dc59a8
Summary:
Generally wildcard imports are bad for the reasons described here: https://www.flake8rules.com/rules/F403.html
This PR replaces wildcard imports with an explicit list of imported items where possible, and adds a `# noqa: F403` comment in the other cases (mostly re-exports in `__init__.py` files).
This is a prerequisite for https://github.com/pytorch/pytorch/issues/55816, because currently [`tools/codegen/dest/register_dispatch_key.py` simply fails if you sort its imports](https://github.com/pytorch/pytorch/actions/runs/742505908).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55838
Test Plan: CI. You can also run `flake8` locally.
Reviewed By: jbschlosser
Differential Revision: D27724232
Pulled By: samestep
fbshipit-source-id: 269fb09cb4168f8a51fd65bfaacc6cda7fb87c34
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54470
```
git grep -l 'DefaultBackend' | xargs sed -i 's/DefaultBackend/CompositeExplicitAutograd/g'
```
Plus a quick fixup in native/README.md
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: bdhirsh
Differential Revision: D27253240
Pulled By: ezyang
fbshipit-source-id: 964df951ea8b52fa72937f3cc66aeaf49a702e6f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54466
I had to very carefully audit all the use sites since there are a lot
of other uses of the string Math; I did most of the conversion by
grepping for all occurrences of Math and then doing a search
replace.
I also updated documentation for clarity.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D27253239
Pulled By: ezyang
fbshipit-source-id: afb485d07ff39575742a4f0e1e205179b60bc953
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54427
A StructuredNativeFunctions is no longer guaranteed to actually
be structured (test structured property for that), so we rename
this to a more neutral name.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: ailzhang
Differential Revision: D27235380
Pulled By: ezyang
fbshipit-source-id: 2b438d615bf06a47fc9c7bf6eb66fd8b4df31bc8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54426
Previously, we only put NativeFunctions in StructuredNativeFunctions
if the out variant advertised that the kernel was structured. However,
there are a few code generation things that can take advantage of
this trio structure, even if the kernel itself hasn't been ported
to be structured. So better to always group things when they are
related, and then let clients decide whether or not to use the
structure or throw it away.
While doing this, I had hoped that there weren't any functional/inplace
pairs that didn't also have an out variant. This turned out to not
be true. These are probably all oversights and should get fixed at
some point.
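An illustrative sketch of grouping related variants unconditionally and then checking for missing `out` variants. Grouping by name suffix is a simplification made up for this example; the codegen groups by schema, and the helper names here are hypothetical.

```python
from typing import Dict, Iterable, List


def variant_kind(name: str) -> str:
    # Simplified naming convention, used here only for illustration.
    if name.endswith("_out"):
        return "out"
    if name.endswith("_"):
        return "inplace"
    return "functional"


def base_name(name: str) -> str:
    if name.endswith("_out"):
        return name[: -len("_out")]
    if name.endswith("_"):
        return name[:-1]
    return name


def group_variants(names: Iterable[str]) -> Dict[str, Dict[str, str]]:
    # Always group related variants; clients decide whether to use the group.
    groups: Dict[str, Dict[str, str]] = {}
    for n in names:
        groups.setdefault(base_name(n), {})[variant_kind(n)] = n
    return groups


def missing_out(groups: Dict[str, Dict[str, str]]) -> List[str]:
    # Functional/inplace pairs that have no out variant.
    return [b for b, g in groups.items() if "out" not in g and len(g) > 1]


groups = group_variants(["add", "add_", "add_out", "relu", "relu_"])
print(groups)
print(missing_out(groups))
```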
Bill of changes:
- The actual operational change happens in
StructuredNativeFunctions.from_dict; then I need to relax some
__post_init__ invariants. To tell if a StructuredNativeFunctions
is actually structured, there is a new structured property, which
is queried from a few new locations in code
- Refactor native_functions.py into gen_structured/gen_unstructured
functions so I can easily call gen_unstructured from two contexts
I intend to s/StructuredNativeFunctions/NativeFunctionsGroup/ but
for ease of review this rename hasn't been done in this PR.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: ailzhang
Differential Revision: D27235379
Pulled By: ezyang
fbshipit-source-id: d8a15de9abb75b365348ab94e67b830704e30cf0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54419
I'm planning to break it into some helper functions, so let's put it in its own module first.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: ailzhang
Differential Revision: D27235378
Pulled By: ezyang
fbshipit-source-id: c03c5440d2d753859e2c5ec2b2c8b1b82870f03a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53859
The redispatch API wasn't linking properly when static dispatch is enabled. I'm still not sure why this wasn't caught by the static dispatch test in CI- maybe, as swolchok pointed out, we have a flag set somewhere that defers undefined symbols until runtime.
Before, building with static dispatch enabled locally + running `import torch` gave me this error:
```
>>> import torch
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/raid/hirsheybar/pytorch/torch/__init__.py", line 197, in <module>
from torch._C import *
ImportError: /raid/hirsheybar/pytorch/torch/lib/libtorch_cpu.so: undefined symbol: _ZN2at10redispatch11logical_or_EN3c1014DispatchKeySetERNS_6TensorERKS3_
>>>
```
Printing the symbol:
```
(pytorch) hirsheybar@devfair017:/scratch/hirsheybar/pytorch$ c++filt _ZN2at10redispatch11logical_or_EN3c1014DispatchKeySetERNS_6TensorERKS3_
at::redispatch::logical_or_(c10::DispatchKeySet, at::Tensor&, at::Tensor const&)
```
Sure enough, the functions defined in `RedispatchFunctions.cpp` don't have the DispatchKeySet argument included. Adding them in this PR.
Test Plan: Imported from OSS
Reviewed By: ljk53
Differential Revision: D26998735
Pulled By: bdhirsh
fbshipit-source-id: c6c1104e42d13b7ec9d964b7e08d2adc8b344b78
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51957
This is a simplified version of #51554.
Compared to #51554, this version only supports statically dispatching to
a specific backend. The benefit is that it skips the dispatch key
computation logic and thus has less framework overhead. The downside is that
if input tensors do not match the specified backend it will throw an error
instead of falling back to regular dispatch.
Sample code:
```
Tensor empty(IntArrayRef size, TensorOptions options, c10::optional<MemoryFormat> memory_format) {
return at::cpu::empty(size, options, memory_format);
}
// aten::conj(Tensor(a) self) -> Tensor(a)
Tensor conj(const Tensor & self) {
return at::math::conj(self);
}
// aten::conj.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!)
Tensor & conj_out(Tensor & out, const Tensor & self) {
return at::cpu::conj_out(out, self);
}
// aten::conj.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!)
Tensor & conj_outf(const Tensor & self, Tensor & out) {
return at::cpu::conj_out(out, self);
}
// aten::_conj(Tensor self) -> Tensor
Tensor _conj(const Tensor & self) {
return at::defaultbackend::_conj(self);
}
```
For ops without the specific backend dispatch, it will throw an error:
```
// aten::_use_cudnn_ctc_loss(Tensor log_probs, Tensor targets, int[] input_lengths, int[] target_lengths, int blank) -> bool
bool _use_cudnn_ctc_loss(const Tensor & log_probs, const Tensor & targets, IntArrayRef input_lengths, IntArrayRef target_lengths, int64_t blank) {
TORCH_CHECK(false, "Static dispatch does not support _use_cudnn_ctc_loss for CPU.");
}
```
Differential Revision: D26337857
Test Plan: Imported from OSS
Reviewed By: bhosmer
Pulled By: ljk53
fbshipit-source-id: a8e95799115c349de3c09f04a26b01d21a679364
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51585
Some payoff from the stack of refactors. When I initially landed
at::cpu, Brian asked me why I couldn't just separate the anonymous
and namespaced definitions. Well, it used to be annoying. Now it's
not annoying anymore, so go ahead and split them up.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: zou3519
Differential Revision: D26209873
Pulled By: ezyang
fbshipit-source-id: 63057d22acfaa0c17229947d9e65ec1193e360ec
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51508
No substantive changes. The codegen for this file was getting a
bit long so I moved it off into tools.codegen.dest submodule (I
wanted to do tools.codegen.gen but that conflicts with the existing
module; oy vey!) To do this I had to move some other functions around
so that they were more generally accessible. Otherwise
self-explanatory.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: ljk53
Differential Revision: D26187856
Pulled By: ezyang
fbshipit-source-id: fd3784571d03d01c4acb7ca589fcde4492526408
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51500
I'm going to add some new Target types shortly, so having tighter
types for the individual unions will make it clearer which ones
are valid.
This is also the first use of typing_extensions in the codegen,
and I want to make sure it works.
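A sketch of what tighter per-use-site types for a `Target` enum could look like. The enum members and alias names are hypothetical, and this uses `typing.Literal` (Python 3.8+) rather than `typing_extensions` so that it runs without extra dependencies.

```python
from enum import Enum, auto
from typing import Literal


class Target(Enum):
    # Hypothetical members, for illustration only.
    DEFINITION = auto()
    DECLARATION = auto()
    REGISTRATION = auto()


# Each generator advertises exactly which targets it accepts, so a type
# checker can reject calls with a target that makes no sense there.
HeaderTarget = Literal[Target.DECLARATION]
SourceTarget = Literal[Target.DEFINITION, Target.REGISTRATION]


def gen_header_line(target: HeaderTarget, name: str) -> str:
    assert target is Target.DECLARATION
    return f"Tensor {name}(const Tensor & self);"


def gen_source_line(target: SourceTarget, name: str) -> str:
    if target is Target.DEFINITION:
        return f"Tensor {name}(const Tensor & self) {{ /* body */ }}"
    return f'm.impl("{name}", TORCH_FN({name}));'


print(gen_header_line(Target.DECLARATION, "relu"))
print(gen_source_line(Target.REGISTRATION, "relu"))
```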
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: bhosmer
Differential Revision: D26187854
Pulled By: ezyang
fbshipit-source-id: 6a9842f19b3f243b90b210597934db902b816c21
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51499
I'm going to turn at::cpu signatures on for all operators; before
I do it I want to make sure I'm at feature parity everywhere.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: bhosmer
Differential Revision: D26187855
Pulled By: ezyang
fbshipit-source-id: 8fdfd9d843fc98435b1f1df8b475d3184d87dc96
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51490
Mutable Tensor ref is a source of endless confusion for kernel writers;
if we're going to make everyone rewrite their kernels, might as well
also get rid of mutable Tensor& while we're at it.
This is a refactor-then-small-update double whammy. The refactor
is to separate tools.codegen.api.structured from api.native for
describing the type signatures of structured kernels (previously,
I was naughtily reusing native for this purpose--now I need it to
behave differently as Tensor). This started off as a copy paste, but
since there are not that many structured kernels so far I could delete
all of the legacy logic from native that didn't make sense (without
having to go out and fix all the use sites all at once).
One more small addition was teaching translate to convert Tensor& to const Tensor&.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: bhosmer
Differential Revision: D26182413
Pulled By: ezyang
fbshipit-source-id: ed636866add3581179669cf9283f9835fcaddc06
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51245
Splitting this out from #51164 (D26069629) to allow it to
land separately; I'm sure this is a good idea but I'm less sure about
#51164.
ghstack-source-id: 120697499
Test Plan:
double-check effect on empty benchmark with perf stat;
didn't move
Reviewers: ezyang, messmer
Reviewed By: ezyang
Differential Revision: D26112627
fbshipit-source-id: 50d4418d351527bcedd5ccdc49106bc642699870
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51115
Add enum type for dispatch key. Prepare to implement the DispatchTable
computation logic in python for static dispatch.
Verified byte-for-byte compatibility of the codegen output.
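A minimal sketch of an enum-typed dispatch key. The member list is a small sample taken from keys mentioned elsewhere in this log, and the `parse` helper is an assumption about how yaml strings might be converted.

```python
from enum import Enum, auto


class DispatchKey(Enum):
    # A small sample of keys; the real list is much longer.
    CPU = auto()
    CUDA = auto()
    QuantizedCPU = auto()
    Math = auto()
    DefaultBackend = auto()

    def __str__(self) -> str:
        # Print the bare name so generated output matches the old raw strings.
        return self.name

    @staticmethod
    def parse(value: str) -> "DispatchKey":
        # Turn a string from the yaml into an enum member, failing loudly
        # on typos instead of silently carrying a bad string around.
        try:
            return DispatchKey[value]
        except KeyError:
            raise AssertionError(f"unknown dispatch key {value}")


print(DispatchKey.parse("CPU"))
```

Keeping `__str__` equal to the member name is one way to stay byte-for-byte compatible with output that previously used plain strings.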
Test Plan: Imported from OSS
Reviewed By: bhosmer
Differential Revision: D26077430
Pulled By: ljk53
fbshipit-source-id: 86e74f3eb32266f31622a2ff6350b91668c8ff42
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50794
Original commit changeset: b4a7948088c0
There are some subtle extra tweaks on top of the original. I can unbundle them, but I've opted to keep it with the port because it's the easiest way to make sure the changes are exercised.
* There's a bugfix in the codegen to test if a dispatch key is structured *before* short circuiting because the dispatch key was missing in the table. This accounts for mixed structured-nonstructured situations where the dispatch table is present, but the relevant structured key isn't (because the dispatch table only exists to register, e.g., QuantizedCPU)
* Dispatch tables for functions which delegate to structured kernels don't have Math entries generated for them.
* It's now illegal to specify a structured dispatch key in a delegated structured kernel (it will be ignored!). add is now fixed to follow this.
* There are some extra sanity checks for NativeFunctions validation
* Finally, unlike the original PR, I switched the .vec variant of upsample_nearest2d to also be DefaultBackend, bringing it in line with upsample_nearest1d.
ghstack-source-id: 120038038
Test Plan:
```
buck test mode/dev //coreai/tiefenrausch:python_tests -- --exact 'coreai/tiefenrausch:python_tests - test_can_run_local_async_inference_cpu (coreai.tiefenrausch.tests.python_test.TiefenrauschPY)' --run-disabled
```
Reviewed By: ngimel
Differential Revision: D25962873
fbshipit-source-id: d29a9c97f15151db3066ae5efe7a0701e6dc05a3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49505
I have a problem which is that static runtime needs a way to bypass
dispatch and call into kernels directly. Previously, it used
native:: bindings to do this; but these bindings no longer exist
for structured kernels! Enter at::cpu: a namespace of exactly
at:: compatible functions that assume all of their arguments are
CPU and non-autograd! The header looks like this:
```
namespace at {
namespace cpu {
CAFFE2_API Tensor & add_out(Tensor & out, const Tensor & self, const Tensor & other, Scalar alpha=1);
CAFFE2_API Tensor add(const Tensor & self, const Tensor & other, Scalar alpha=1);
CAFFE2_API Tensor & add_(Tensor & self, const Tensor & other, Scalar alpha=1);
CAFFE2_API Tensor & upsample_nearest1d_out(Tensor & out, const Tensor & self, IntArrayRef output_size, c10::optional<double> scales=c10::nullopt);
CAFFE2_API Tensor upsample_nearest1d(const Tensor & self, IntArrayRef output_size, c10::optional<double> scales=c10::nullopt);
CAFFE2_API Tensor & upsample_nearest1d_backward_out(Tensor & grad_input, const Tensor & grad_output, IntArrayRef output_size, IntArrayRef input_size, c10::optional<double> scales=c10::nullopt);
CAFFE2_API Tensor upsample_nearest1d_backward(const Tensor & grad_output, IntArrayRef output_size, IntArrayRef input_size, c10::optional<double> scales=c10::nullopt);
}}
```
This slows down static runtime because these are not the "allow
resize of nonzero tensor" variant binding (unlike the ones I had manually
written). We can restore this: it's a matter of adding codegen smarts to
do this, but I haven't done it just yet since it's marginally more
complicated.
In principle, non-structured kernels could get this treatment too.
But, like an evil mastermind, I'm withholding it from this patch, as an extra
carrot to get people to migrate to structured muahahahaha.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: smessmer
Differential Revision: D25616105
Pulled By: ezyang
fbshipit-source-id: 84955ae09d0b373ca1ed05e0e4e0074a18d1a0b5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49164
This PR removes the logic paths in codegen that were responsible for handling non-c10-full ops.
This only goes through our basic codegen. It does not simplify C++ code yet and it does not remove the codegen for generated unboxing wrappers yet.
ghstack-source-id: 119450487
Test Plan: waitforsandcastle
Reviewed By: ezyang
Differential Revision: D25462977
fbshipit-source-id: 7e70d14bea96948f5056d98125f3e6ba6bd78285
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49498
In the near future, I want to code generate some functions that are
visible externally to this compilation unit. I cannot easily do this
if all the codegen code is wrapped in a global anonymous namespace,
so push the namespace in.
Registration has to stay in an anonymous namespace to avoid name conflicts.
This could also have been solved by making the wrapper functions have
more unique names but I didn't do this in the end.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: albanD, smessmer
Differential Revision: D25616104
Pulled By: ezyang
fbshipit-source-id: 323c0dda05a081502aab702f359a08dfac8c41a4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49735
This is the final wave of autograd codegen data model migration.
After this PR:
- autograd codegen no longer depends on Declarations.yaml;
- autograd codegen sources are fully type annotated and pass mypy-strict check;
To avoid potential merge conflicts with other pending PRs, some structural
changes are intentionally avoided, e.g. I didn't move inner methods out, didn't
change all inner methods to avoid reading the outer function's variables, etc.
Confirmed byte-for-byte compatible with the old codegen:
```
# Run it before and after this PR:
.jenkins/pytorch/codegen-test.sh <baseline_output_dir>
.jenkins/pytorch/codegen-test.sh <test_output_dir>
# Then run diff to compare the generated files:
diff -Naur <baseline_output_dir> <test_output_dir>
```
Confirmed clean mypy-strict run:
```
mypy --config mypy-strict.ini
```
Test Plan: Imported from OSS
Reviewed By: ezyang, bhosmer
Differential Revision: D25678879
Pulled By: ljk53
fbshipit-source-id: ba6e2eb6b9fb744208f7f79a922d933fcc3bde9f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49490
No reason to let people do the legacy thing for the brand new kernel.
This simplifies the codegen. I have to port the two structured kernels
to this new format.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: smessmer
Differential Revision: D25595406
Pulled By: ezyang
fbshipit-source-id: b5931873379afdd0f3b00a012e0066af05de0a69
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49489
Previously, it was done at a use site, but that meant other use
sites don't get the right logic. Pushing it in makes sure everyone
gets it.
I also fixed one case of confusion where defn() was used to define a decl().
If you want to define a declaration with no defaults, say no_default().decl()
which is more direct and will give us code reviewers a clue if you should
have pushed this logic in.
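
As a rough illustration of the defn()/decl() distinction described above, here is a
minimal sketch. The `Binding` class below is a hypothetical, simplified stand-in
(its fields and string formats are assumptions, not the real codegen class); it only
shows why `no_default().decl()` states the intent more directly than calling `defn()`.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Binding:
    # Hypothetical, simplified stand-in for the codegen's Binding class.
    type: str
    name: str
    default: Optional[str] = None

    def decl(self) -> str:
        # A declaration carries the default value, if there is one.
        suffix = f"={self.default}" if self.default is not None else ""
        return f"{self.type} {self.name}{suffix}"

    def defn(self) -> str:
        # A definition never repeats the default.
        return f"{self.type} {self.name}"

    def no_default(self) -> "Binding":
        # Return a copy of this binding with the default stripped.
        return replace(self, default=None)


b = Binding(type="bool", name="keepdim", default="false")
print(b.decl())               # bool keepdim=false
print(b.defn())               # bool keepdim
print(b.no_default().decl())  # bool keepdim
```

In this sketch the last two calls print the same string, but the second spelling
makes it explicit that a declaration without defaults is what was wanted.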
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: smessmer
Differential Revision: D25595407
Pulled By: ezyang
fbshipit-source-id: 89c664f0ed4d95699794a0d3123d54d0f7e4cba4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49348
This is a redux of #45666 post refactor, based off of
d534f7d4c5
Credit goes to peterbell10 for the implementation.
Fixes #43945.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: smessmer
Differential Revision: D25594004
Pulled By: ezyang
fbshipit-source-id: c8eb876bb3348308d6dc8ba7bf091a2a3389450f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49341
I noticed that #49097 was using manual_kernel_registration incorrectly,
so this diff tightens up the testing so that:
1. We don't generate useless wrapper functions when manual_kernel_registration
is on (the wrapper is not going to be registered, so it does nothing).
2. manual_kernel_registration shouldn't affect generation of functions in
Functions.h; if you need to stop bindings, use manual_cpp_binding.
3. Combining structured and manual_kernel_registration is a hard error.
4. We raise an error if you set dispatch and manual_kernel_registration at the
same time.
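
The four rules above could be sketched roughly as follows. This is only an
illustration: `OpEntry`, `validate`, `needs_wrapper` and `needs_functions_h_binding`
are hypothetical names, and the real checks live in the codegen's own data model.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class OpEntry:
    # Hypothetical, simplified view of one operator entry.
    name: str
    manual_kernel_registration: bool = False
    manual_cpp_binding: bool = False
    structured: bool = False
    dispatch: Dict[str, str] = field(default_factory=dict)


def validate(op: OpEntry) -> None:
    # Rule 3: structured together with manual_kernel_registration is an error.
    if op.manual_kernel_registration and op.structured:
        raise ValueError(f"{op.name}: structured ops cannot use manual_kernel_registration")
    # Rule 4: dispatch together with manual_kernel_registration is an error.
    if op.manual_kernel_registration and op.dispatch:
        raise ValueError(f"{op.name}: cannot set both dispatch and manual_kernel_registration")


def needs_wrapper(op: OpEntry) -> bool:
    # Rule 1: no wrapper when the kernel is registered manually.
    return not op.manual_kernel_registration


def needs_functions_h_binding(op: OpEntry) -> bool:
    # Rule 2: only manual_cpp_binding suppresses the binding.
    return not op.manual_cpp_binding


op = OpEntry(name="foo", manual_kernel_registration=True)
validate(op)
print(needs_wrapper(op), needs_functions_h_binding(op))
```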
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: smessmer
Differential Revision: D25594003
Pulled By: ezyang
fbshipit-source-id: 655b10e9befdfd8bc95f1631b2f48f995a31a59a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49245
This will make it easier to implement the POC in
d534f7d4c5.
See also https://github.com/pytorch/pytorch/pull/45666
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: smessmer
Differential Revision: D25594005
Pulled By: ezyang
fbshipit-source-id: e458d3dc3a765ec77425761b9b17f23769cecf9e
Summary:
Since caffe2 and torch have been consolidated, CAFFE2_API should be merged with TORCH_API. Addresses a TODO.
Manually edited some references of the removed `CAFFE2_API`:
* `CONTRIBUTING.md`
* `caffe2/proto/CMakeLists.txt`
* `cmake/ProtoBuf.cmake`
* `c10/macros/Export.h`
* `torch/csrc/WindowsTorchApiMacro.h`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49496
Reviewed By: malfet, samestep
Differential Revision: D25600726
Pulled By: janeyx99
fbshipit-source-id: 7e068d959e397ac183c097d7e9a9afeca5ddd782
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49510
Adding old-style operators with out arguments will break XLA. This prevents that. For background, see: https://fb.workplace.com/groups/pytorch.dev/permalink/809934446251704/
This is a temporary change that will prevent this breakage for the next couple of days until the problem is resolved for good.
It will be deleted in https://github.com/pytorch/pytorch/pull/49164 then.
ghstack-source-id: 118756437
(Note: this ignores all push blocking failures!)
Test Plan: waitforsandcastle
Reviewed By: bhosmer
Differential Revision: D25599112
fbshipit-source-id: 6b0ca4da4b55da8aab9d1b332cd9f68e7602301e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49122
cpparguments_exprs has induced a lot of head scratching in many recent PRs over how to structure the code in a good way. This PR replaces the old algorithm with an entirely new algorithm inspired by logic programming. The net result is shorter, cleaner and should be more robust to future changes.
This PR is a bit of a whopper. Here is the order to review it.
- tools/codegen/api/types.py
- Deleted CppArgument, CppArgumentPackIface (and subclasses), CppExpr, DispatcherExpr, DispatcherArgument, NativeExpr, NativeArgument, MetaArgument. All things previously called XArgument are now Binding. All things previously called XExpr are now Expr. I deleted the `__str__` implementation on Binding and fixed all call sites not to use it. On Binding, I renamed `str_no_default` and `str_default` to `defn` and `decl` for better symmetry with the corresponding signature concepts, although I'm open to naming them back to their original versions.
- Obviously, things are less type safe without the class distinctions. So I introduce a new ADT called CType. CType represents the *semantic C++ type* of a binding: it is both the C++ type (e.g., `const Tensor&`) as well as the argument name that specifies what the binding denotes (e.g., `other`). Every binding now records its CType. The key observation here is that you don't actually care if a given expression is from the cpp or dispatcher or native API; what you care about is having enough information to know what the expression means, so you can use it appropriately. CType has this information. For the most part, ArgNames are just the string names of the arguments as you see them in JIT schema, but there is one case (`possibly_redundant_memory_format`) where we encode a little extra information. Unlike the plain strings we previously used to represent C++ types, CTypes have a little bit of structure around optional and references, because the translation code needs to work around these concepts.
- I took the opportunity to kill all of the private fields like `_arguments` and `_returns_type` (since the argument types don't make sense anymore). Everything is computed for you on the fly. If this is a perf problem in codegen we can start using `cached_property` decorator.
- All of the heavy lifting in CppSignature.argument_packs has been moved to the cpp module. We'll head over there next. Similarly, all of the exprs methods now call translate, the new functionality which we haven't gotten to yet.
- tools/codegen/api/cpp.py
- We refactor all of the type computation functions to return CType instead of str. Because CTypes need to know the denotation, there is a new `binds: ArgName` argument to most functions that provides the denotation, so we can slot it in. (An alternative would have been to construct CTypes without denotations and then fill them in post-facto, but I didn't do it this way. One downside is there are some places where I need a CType without denotation, so I fill these in with `__placeholder__` whenever this happens).
- `argument` and `arguments` are now extremely simple. There is no more Pack business, just produce one or more Bindings. The one thing of note is that when both a `memory_format` and `options` are in scope, we label the memory format as `possibly_redundant_memory_format`. This will be used in translation
- tools/codegen/api/dispatcher.py and tools/codegen/api/native.py - same deal as cpp.py. One thing is that `cpparguments_exprs` is deleted; that is in the translator
- tools/codegen/api/translate.py - the translator! It uses a very simple backwards deduction engine to work out how to fill in the arguments of functions. There are comments in the file that explain how it works.
- Everything else: just some small call site tweaks for places where I changed the API.
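
To give a feel for the backwards deduction idea mentioned for the translator, here is
a minimal sketch. It is not the code in tools/codegen/api/translate.py: the `CType`
class is a simplified stand-in, and the two deduction rules and the generated
expression strings are assumptions chosen purely for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class CType:
    # Hypothetical, simplified: a C++ type plus the argument name it denotes.
    cpp_type: str
    name: str


def translate(bindings: Dict[CType, str], goals: List[CType]) -> List[str]:
    def solve(goal: CType) -> str:
        # Base case: something in scope already has this type and meaning.
        if goal in bindings:
            return bindings[goal]
        # Illustrative rule: a const reference can be produced from a value
        # with the same denotation.
        if goal.cpp_type.startswith("const ") and goal.cpp_type.endswith("&"):
            inner = goal.cpp_type[len("const "):-1].strip()
            return solve(CType(inner, goal.name))
        # Illustrative rule: gather TensorOptions from its scattered parts.
        if goal == CType("TensorOptions", "options"):
            parts = [
                ("dtype", CType("optional<ScalarType>", "dtype")),
                ("layout", CType("optional<Layout>", "layout")),
                ("device", CType("optional<Device>", "device")),
                ("pinned_memory", CType("optional<bool>", "pin_memory")),
            ]
            calls = "".join(f".{method}({solve(part)})" for method, part in parts)
            return f"TensorOptions(){calls}"
        raise LookupError(f"cannot produce {goal.cpp_type} {goal.name}")

    return [solve(goal) for goal in goals]


in_scope = {
    CType("Tensor", "self"): "self",
    CType("optional<ScalarType>", "dtype"): "dtype",
    CType("optional<Layout>", "layout"): "layout",
    CType("optional<Device>", "device"): "device",
    CType("optional<bool>", "pin_memory"): "pin_memory",
}
wanted = [CType("const Tensor&", "self"), CType("TensorOptions", "options")]
exprs = translate(in_scope, wanted)
print(exprs)
```

The point of the sketch is that each goal is worked out from what it needs, rather
than from which API the in-scope bindings came from; a goal that no rule can satisfy
fails loudly instead of silently producing a wrong expression.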
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: ljk53
Differential Revision: D25455887
Pulled By: ezyang
fbshipit-source-id: 90dc58d420d4cc49281aa8647987c69f3ed42fa6