pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-06 12:20:52 +01:00

Author	SHA1	Message	Date
Guo Yejun	68ea9e9df5	gen_backend_stubs.py: fix typo for supported_autograd (#68562 ) Summary: Fixes #{issue number} Pull Request resolved: https://github.com/pytorch/pytorch/pull/68562 Reviewed By: jbschlosser Differential Revision: D32758608 Pulled By: bdhirsh fbshipit-source-id: 496e1ec831edaa6fcc586f3c8f0361c31cad4e78	2022-02-07 17:26:04 -08:00
Brian Hirsh	7b8c43cd7c	Revert "Revert D32498570: make codegen'd device guards not cuda-specific. Allow them to be used in external codegen" (#69951 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69951 This reverts commit `0ef523633f`. Test Plan: Imported from OSS Reviewed By: albanD Differential Revision: D33113543 Pulled By: bdhirsh fbshipit-source-id: b28073ee0870b413ea9f617f27671ae5c6f3c696	2022-01-04 14:53:21 -08:00
Brian Hirsh	bb5b4cceb6	Revert "Revert D32498569: allow external backend codegen to toggle whether to generate out= and inplace kernels" (#69950 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69950 This reverts commit `f6cad53443`. Test Plan: Imported from OSS Reviewed By: albanD Differential Revision: D33113545 Pulled By: bdhirsh fbshipit-source-id: d6590294662588d36c09662dea65919ad4e1e288	2022-01-04 14:52:00 -08:00
Brian Hirsh	5e222d08a1	Revert "Revert D32498572: allow external backend codegen to be used without autograd kernels" (#69949 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69949 This reverts commit `33363cea64`. Test Plan: Imported from OSS Reviewed By: H-Huang Differential Revision: D33113544 Pulled By: bdhirsh fbshipit-source-id: e219f10d52776498c9ad273e97bca3e3406cf702	2021-12-21 08:19:37 -08:00
Peter Bell	7bb4b683b5	Codegen: Registration now only includes the functions used (#68689 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68689 Currently Register{DispatchKey}.cpp includes all of `NativeFunctions.h`, so any operator signature change requires all backend registration to be recompiled. However, most backends only have registrations for a small fraction of operators so it makes sense to only include the specific functions required. Test Plan: Imported from OSS Reviewed By: jbschlosser Differential Revision: D32596273 Pulled By: albanD fbshipit-source-id: 11d511f47937fbd5ff9f677c9914277b5d015c25	2021-12-15 14:29:32 -08:00
Brian Hirsh	33363cea64	Revert D32498572: allow external backend codegen to be used without autograd kernels Test Plan: revert-hammer Differential Revision: D32498572 (`b83b6f7424`) Original commit changeset: 3e7159c633f6 Original Phabricator Diff: D32498572 (`b83b6f7424`) fbshipit-source-id: f93fa444c95a2423eef5975a2ecdb96f14e0c535	2021-12-14 15:28:49 -08:00
Brian Hirsh	f6cad53443	Revert D32498569: allow external backend codegen to toggle whether to generate out= and inplace kernels Test Plan: revert-hammer Differential Revision: D32498569 (`aa0cf68c17`) Original commit changeset: ebd932d042b9 Original Phabricator Diff: D32498569 (`aa0cf68c17`) fbshipit-source-id: 21a393fa339510d926512a7983d33ece327b743d	2021-12-14 15:27:24 -08:00
Brian Hirsh	0ef523633f	Revert D32498570: make codegen'd device guards not cuda-specific. Allow them to be used in external codegen Test Plan: revert-hammer Differential Revision: D32498570 (`2e7a91c45f`) Original commit changeset: 0ce6a5614417 Original Phabricator Diff: D32498570 (`2e7a91c45f`) fbshipit-source-id: 7c64ce1b5e51a680b4aeae8721e0c9e15c793289	2021-12-14 15:04:10 -08:00
Brian Hirsh	2e7a91c45f	make codegen'd device guards not cuda-specific. Allow them to be used in external codegen (#68531 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68531 Test Plan: Imported from OSS Reviewed By: albanD Differential Revision: D32498570 Pulled By: bdhirsh fbshipit-source-id: 0ce6a5614417671313b4d274ea84742c5b81d1b0	2021-12-14 10:25:04 -08:00
Brian Hirsh	aa0cf68c17	allow external backend codegen to toggle whether to generate out= and inplace kernels (#68530 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68530 Test Plan: Imported from OSS Reviewed By: albanD Differential Revision: D32498569 Pulled By: bdhirsh fbshipit-source-id: ebd932d042b988e19c71aa04a21677db9bdc9f04	2021-12-14 10:25:02 -08:00
Brian Hirsh	b83b6f7424	allow external backend codegen to be used without autograd kernels (#68529 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68529 Test Plan: Imported from OSS Reviewed By: wconstab Differential Revision: D32498572 Pulled By: bdhirsh fbshipit-source-id: 3e7159c633f6a80b60faa068436a4c49ebe731ca	2021-12-14 10:23:12 -08:00
Will Constable	397183f44c	Add Lazy Tensor codegen infra (#69020 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69020 Merges the lazy tensor codegen infra which has already been used on lazy_tensor_staging. Test Plan: Test via lazy_tensor_staging branch Reviewed By: alanwaketan, bdhirsh Differential Revision: D32570613 fbshipit-source-id: 2cd5698644398bda69669683f8de79fd3b6639b5	2021-12-02 07:51:52 -08:00
Sean Silva	2e156f649e	Sort output of *NativeFunctions.h (#67046 ) Summary: This ensures deterministic output, allowing systems like ccache to be more effective. cc ezyang bhosmer bdhirsh Pull Request resolved: https://github.com/pytorch/pytorch/pull/67046 Reviewed By: VitalyFedyunin Differential Revision: D31896114 Pulled By: bdhirsh fbshipit-source-id: d29ef0cf6c7e3408b104c5239b620eaa24327088	2021-10-29 09:03:39 -07:00
Brian Hirsh	665c148e42	move some codegen utilities into utils.py (#63094 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63094 This PR: - Moves `FileManager` and its dependencies (`assert_never` and other imports) to `utils.py`, and updates all of the call-sites with the fresh imports - Passes the list of NativeFunction objects into `gen_trace_type` directly, instead of requiring the function to regenerate it (we already have it) The purpose of the reshuffling is to avoid circular dependencies in the next PR, where I add codegen for the functionalization pass, which gets called from `gen.py` (but depends on some stuff from the autograd codegen - in partulcar, the list of view ops). Test Plan: Imported from OSS Reviewed By: albanD Differential Revision: D31942096 Pulled By: bdhirsh fbshipit-source-id: 36118facae61f25f8922bb43ad2818c80b53504e	2021-10-28 10:49:17 -07:00
Peter Bell	a9983ac09c	Refactor structured set_output in Register{DispatchKey}.cpp (#62188 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62188 These parts of the `set_output` code are identical for all operators in the kernel registration files. So, this moves them from being copied into every class to two helper functions at the top of the file. Test Plan: Imported from OSS Reviewed By: soulitzer Differential Revision: D29962045 Pulled By: albanD fbshipit-source-id: 753b8aac755f3c91b77ffa2c30a89ac91a84b7c4	2021-08-27 09:38:27 -07:00
Peter Bell	d454c9e76e	Migrate THCTensor_copyIgnoringOverlaps to ATen (#63505 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63505 This isn't a public operator, just a helper function used in CUDA_tensor_apply. Test Plan: Imported from OSS Reviewed By: mruberry Differential Revision: D30441305 Pulled By: ngimel fbshipit-source-id: 84fabc701cbd8479e02d80f373a3dd62d70df2ce	2021-08-24 18:50:28 -07:00
Brian Hirsh	8bc2ba3fe3	detect missing kernels from external backends in codegen (#60737 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60737 Test Plan: Imported from OSS Reviewed By: ezyang, jdonald Differential Revision: D29392615 Pulled By: bdhirsh fbshipit-source-id: d49d013243dbc8c8b55fbdb0b9b3eed38df52255	2021-07-08 15:33:04 -07:00
Brian Hirsh	7318747a3b	move all external kernels into a class for better compiler error messages (#59839 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59839 Test Plan: Imported from OSS Reviewed By: ezyang Differential Revision: D29047680 Pulled By: bdhirsh fbshipit-source-id: 18cf4124be440a0a343b5983e1a4165db808e7c1	2021-07-08 15:31:02 -07:00
Brian Hirsh	9134b0e42f	add a boxed CPU fallback kernel (#58065 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58065 This PR replaces the existing code-generated CPU fallback kernels that XLA uses with a single boxed CPU fallback. Current state: there are a couple different design ideas that I want to point out, but the logic for the actually kernel is mostly done and passing tests. ### Design To preface, I'm not 100% tied to the current design and I'm putting the PR up now for opinions and totally open to alternatives, some of which I listed below. Actually after writing this description, I'm leaning toward the following changes: * Confirm whether or not we can remove all C++ logging info directly in the yaml. Current Design All of the CPU fallback codegen is deleted. In its place, XLA (and other external backends, later) can choose to opt into a CPU fallback by adding the following code in a C++ file. I have an corresponding [xla-side PR with the xla changes](https://github.com/pytorch/xla/pull/2945/files#diff-1a005c10039f0cb11130a3b740f5de716d2f10acaea121017016025861886798R1). There's no actual requirement to split up the code into a .h and .cpp file, but that's necessary in the XLA case because they sometimes need to call the fallback directly from their handcrafted kernels. ``` // xla_cpu_fallback.h #include <ATen/native/CPUFallback.h> ... void xla_cpu_fallback(const c10::OperatorHandle& op, torch::jit::Stack* stack); ... ``` ``` // xla_cpu_fallback.cpp #include "torch_xla/csrc/aten_cpu_fallback.h" ... void xla_cpu_fallback(const c10::OperatorHandle& op, torch::jit::Stack* stack) { // Do custom logging here ... // Call the actual boxed CPU fallback. at::native::cpu_fallback(op, stack); } TORCH_LIBRARY_IMPL(_, XLA, m) { m.fallback(torch::CppFunction::makeFromBoxedFunction<&xla_cpu_fallback>()); } ``` Now that the fallback is exposed in the backend, they can call it directly. Doing so requires converting from an unboxed to a boxed context, which we provide a utility function before. E.g.: ``` #include <ATen/native/CPUFallback.h> at::Tensor addmm(const at::Tensor& self,const at::Tensor& mat1,const at::Tensor& mat2,const at::Scalar& beta,const at::Scalar& alpha) { .... if (...call_fallback...) { return at::native::call_fallback_fn<&xla_cpu_fallback, decltype(at::addmm)>::call("aten::addmm", self, mat1, mat2, beta, alpha); } ... } ``` That `decltype(at::addmm)` logic isn't actually used everywhere in the xla-side PR yet, since you hit issues with overloads. I could use it everywhere once #58092 lands. Alternatives: The API for calling the CPU fallback directly is ugly, can we make it nicer? We could change the api to use `at::redispatch`, which would make it look something like this: ``` at::Tensor addmm(const at::Tensor& self,const at::Tensor& mat1,const at::Tensor& mat2,const at::Scalar& beta,const at::Scalar& alpha) { .... if (...call_fallback...) { return at::redispatch::addmm(c10::DispatchKeySet(c10::DispatchKey::CPUFallback), self, mat1, mat2, beta, alpha); } ... } ``` Which definitely feels cleaner, but also requires adding a new DispatchKey just for this use case. Conditionally calling the CPU fallback doesn't sound like a hugely important use case, so I don't know if giving up one of our 64 dispatch key slots is worth the API improvement. Totally open to other opinions though! Another more mild improvement that would avoid having to pass operator string names (including overloads) around would be to codegen (yet another) namespaced API. Something like this: ``` at::Tensor addmm(const at::Tensor& self,const at::Tensor& mat1,const at::Tensor& mat2,const at::Scalar& beta,const at::Scalar& alpha) { .... if (...call_fallback...) { return at::fallback::addmm<&xla_cpu_fallback>(self, mat1, mat2, beta, alpha); } ... } ``` Writing that out actually I actually like it more (I think it'll let us get rid of `decltype(...)`). Maybe that is nice enough to warrant a new codegen API - I haven't tried adding that yet, but if people like it I'm happy to try it out. More alternatives The current design also involves the backend manually writing and registering the boxed fallback themselves, but an alternative would be for us to do it in codegen too: they would just need to pass in all of the C++ logging that they want done in the fallback, directly through the yaml. The main downsides: * Backend code that wants to call the fallback needs to abide by whatever convention our codegen uses to name the generated boxed fallback. * Passing custom C++ logging through yaml is just more fragile: right now xla uses an `iostream` to log each tensor arg in the operator, so we'd have to either force other backends into the same convention or figure something else out later. To be fair, we actually already do that: XLA has custom per-tensor-arg logging for all of the generated `out` wrappers in the codegen, which we do by passing their C++ logging info through the yaml. This seems unnecessary though, since `out` wrappers just call into a functional kernel, which is hand written with its own custom logging. So my take is: try to remove custom C++ logging from the yaml, and if it turns out to be really necessary, then we may as well take advantage of that to codegen the fallback. ### Performance impact While ops that fall back to CPU aren't exactly hot path, we probably don't want to use a boxed fallback if it turns out to be an absolute perf killer. I ran my benchmarks using callgrind, benchmarking both `at::add` and `at::add_out` run on XLA. My callgrind benchmark for `at::add` can be found here (the add_out benchmark looks basically the same): https://www.internalfb.com/phabricator/paste/view/P415418587. I created the benchmark by hacking the existing xla C++ test build scripts and throwing in a reference to callgrind. I also attached the full callgrind output for each benchmark; the full output is actually pretty noise and hard to parse, but I focused on everything underneath the `at::add()` call in the output, which was much more stable. My guess is that it's due to some heavyweight async startup processing that xla does. `at::add`: before: 88,505,130 instructions. Full output: https://www.internalfb.com/phabricator/paste/view/P415421001 after: 102,185,654 instructions. Full output: https://www.internalfb.com/phabricator/paste/view/P415421273 delta: ~15.5% increase `at::add_out`: before: 63,897,395 instructions. Full output: https://www.internalfb.com/intern/everpaste/?handle=GBrrKwtAPlix9wUEAOZtrFXpdO5UbsIXAAAz after: 73,170,346 instructions. Full output: https://www.internalfb.com/phabricator/paste/view/P415423227 delta: ~14.5% increase High level takeaway: A framework overhead increase of 10-20% doesn't seem too horrible for the CPU fallback use case. For structured, functional ops that requires a CPU fallback, we're actually in an unfortunate situation: we're doing even more work than necessary. Our codegen automatically creates a `CompositeExplicitAutograd` kernel which calls into the `out` operator. So the extra work that we end up doing is: * An extra dispatcher hop: (at::add -> CompositeExplicitAutograd -> CPUFallback -> at::native::add) instead of (at::add -> CPUFallback -> at::native::add) * An unnecessary tensor allocation (the CompositeExplicitAutograd kernel uses at::empty() to create an output tensor, which is immediately overwritten by the CPU fallback) * An unnecessary meta() call (the CompositeExplicitAutograd kernel calls it to create the output tensor, but we call it again in the CPU kernel). * unboxing->boxing->unboxing logic (this is the only strictly required piece) There are definitely ways to avoid the unnecessary work explained above: one would be to give the boxed fallback higher priority than composite keys (there's [an issue for it here](https://github.com/pytorch/pytorch/issues/55104)), and codegen fallthroughs for all composite ops. It'll require more infra to set up, so I see it as more of a perf knob that we can apply if we need it later. Unfortunately I couldn't dig much deeper into the differences aside from the aggregate change in instructions, since it looks like callgrind fudged some of the instruction attribution (`at::to_cpu` takes up a ton of instructions, but I don't see any attribution for the `at::native::add` kernel anywhere). Test Plan: Imported from OSS Reviewed By: jbschlosser Differential Revision: D28833085 Pulled By: bdhirsh fbshipit-source-id: 537ebd5d7fb5858f1158764ff47132d503c3b92b	2021-06-25 16:26:50 -07:00
Brian Hirsh	e341bab8ae	bugfix: ensure that at::{dispatch_key}:: API gets external linkage (#58569 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58569 This should allow external C++ files that aren't compiled into `libtorch.so`/`libtorch_cpu.so` (including all of fbcode) to use fast path functions like `at::cpu::add()`, which skip the dispatcher. So, after spending way too much time trying to figure out why I was getting linker errors when calling `at::meta::{op}` and `at::cpu::{op}` from C++ test files, I realized that we're not including the header files for C++ for the namespaced operator definitions. I.e. `RegisterCPU.cpp`, which provides definitions for the `at::cpu::{op}` fast path functions, wasn't including the `CPUFunctions.h` header. Why that breaks stuff: the `CPUFunctions.h` header file is what marks each function with the `TORCH_API` macro, so without including it, when we build `libtorch.so` and `libtorch_cpu.so`, the compiler will look at the definition in `RegisterCPU.cpp`, not see a `TORCH_API`, and decide that the function should get internal linkage. An alternative would be to directly mark the function definitions in `RegisterCPU.cpp` with `TORCH_API`, but this seemed cleaner. Test Plan: Imported from OSS Reviewed By: pbelevich Differential Revision: D28711300 Pulled By: bdhirsh fbshipit-source-id: 535f245c20e977ff566d6da0757b3cefa137040b	2021-06-15 16:53:22 -07:00
albanD	30a18fe318	refactor yaml loader import, no runtime change (#59850 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59850 This whole stack does not change anything to the codegened code Test Plan: Imported from OSS Reviewed By: ailzhang Differential Revision: D29063816 Pulled By: albanD fbshipit-source-id: ca3067443d8e6282c1077d3dafa3b4f330d43b28	2021-06-12 06:58:34 -07:00
albanD	504ec30109	avoid error string formatting aten codegen 28s -> 23s (#59848 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59848 This whole stack does not change anything to the codegened code Test Plan: Imported from OSS Reviewed By: ailzhang Differential Revision: D29063818 Pulled By: albanD fbshipit-source-id: c68734672eeacd212d7bd9bebe3d53aaa20c3c24	2021-06-12 06:58:31 -07:00
Brian Hirsh	be47060af9	[remove xla from codegen] rename aten_xla_type.h -> DispatchKeyNativeFunctions.h (#58568 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58568 I split out the file rename into a separate commit to make the diff easier. The template file name is `aten_xla_type.h` -> `{DispatchKey}NativeFunctions.h` Test Plan: Imported from OSS Reviewed By: pbelevich Differential Revision: D28711298 Pulled By: bdhirsh fbshipit-source-id: 2fa7d2abede560a2c577300f0b5a1f7de263d897	2021-05-26 12:53:19 -07:00
Brian Hirsh	86ce2950f6	remove xla-specific stuff from codegen (minus CPU fallback) (#58064 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58064 Summary This PR tries to remove all xla-specific logic from the codegen except for two places: - renaming the `aten_xla_type.h/cpp` template files; Going to do that in a separate PR just to make the diff easier to understand - CPU fallback logic (everything in `aten_xla_type_default.h/cpp` and `gen_external_aten_fallbacks.py`). I'm trying to kill all of that logic in a subsequent PR by making the CPU fallback a boxed kernel, so it felt unnecessary to go through it all and remove the xla references here. Notable changes The xla codegen includes some custom logging in each kernel wrapper, so I added a few new knobs to the external yaml, that we now test. I have a corresponding [xla-side PR](https://github.com/pytorch/xla/pull/2944) with the new yaml changes, which look like this: ``` per_op_log: XLA_FN_TRACK(3) per_argument_log: TF_VLOG(3) cpu_fallback_counter: XLA_COUNTER("aten::{name}", 1) extra_headers: > #include <tensorflow/compiler/xla/xla_client/debug_macros.h> #include <tensorflow/compiler/xla/xla_client/metrics.h> #include <tensorflow/compiler/xla/xla_client/tf_logging.h> #include <torch_xla/csrc/function_call_tracker.h> #include <torch_xla/csrc/aten_xla_type.h> #include <torch_xla/csrc/aten_xla_type_default.h> ``` Test Plan: Imported from OSS Reviewed By: anjali411 Differential Revision: D28711095 Pulled By: bdhirsh fbshipit-source-id: 90a48440f2e865a948184e2fb167ea240ada47bb	2021-05-26 12:52:13 -07:00
Brian Hirsh	1a9efbbc92	generate inplace/out kernels for xla (#57510 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57510 This is a re-write of https://github.com/pytorch/pytorch/pull/56835, which is significantly shorter thanks to the data model change in the PR below this one in the stack. See the original description in the linked PR for details. The functional changes in this PR are the same as in the above linked one, so the description is the same with a few small changes: - I don't bother generating `at::xla::{op}` entries for CPU fallbacks. After looking around, I see precedent for that. For example, we don't have `at::cpu::{op}` entries for composite ops- if you really want to bypass the dispatcher you need to call `at::compositeimplicitautograd::{op}`. Maybe we should revisit that later if we find an important use case for having full namespace coverage, but that doesn't seem worth half-fixing for external backends in this PR. Test Plan: Imported from OSS Reviewed By: navahgar Differential Revision: D28474364 Pulled By: bdhirsh fbshipit-source-id: 4d58b60e5debad6f1ff06420597d8df8505b2876	2021-05-17 12:25:38 -07:00
Brian Hirsh	9354a68e7d	[codegen] split out backend-specific information from NativeFunction in the model (#57361 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57361 Data model change in the codegen, which splits backend-specific information out of `NativeFunction` ### Overview Currently in the codegen, native_functions.yaml has backend-specific information about each operator that is encoded directly into the data model, in the `NativeFunction` object. That's reasonable, since the native_functions.yaml is the source of truth for information about an operator, and the data model encodes that information into types. Now that external backends can use the codegen though, that information is technically incomplete/inaccurate. In another PR, I tried patching the information on the `NativeFunction` object with the additional external information, by updating the `dispatch` entry to contain the external backend kernel name and dispatch key. Instead, this PR tries to split out that information. The `NativeFunction` class contains all information about an operator from native_functions.yaml that's backend-independent and is known never to change regardless of what extra information backends provide. We also build up a backend "index", which is basically a mapping from [backend] -> [backend-specific-metadata]. Reading in an external backend yaml just involves updating that index with the new backend. There were a few places where `NativeFunction` used the dispatch table directly, that I encoded as properties directly on the NativeFunction object (e.g. `is_abstract`). They were mostly around whether or not the operator has a composite kernel, which isn't something that's going to change for any external backends. This has a few advantages: - We can more easily re-use the existing logic in `native_function.py` and `register_dispatch_key.py` for both native and external backends, since they both involve a NativeFunction + a particular backend index - The data in the data model will be the same regardless of how the codegen is run. Running the codegen with a new external backend doesn't change the data inside of NativeFunction or an existing backend index. It just adds a new index for that backend. - There are several of codegen areas that don't care about backend-specific information: mostly the tracing and autograd codegen. We can reason about the codegen there more easily, knowing that backend-specific info is entirely uninvolved. An alternative to this split would be to augment the NativeFunction objects with external backend information at the time that we create them. So the external codegen could read both native_functions.yaml and the external backend's yaml at the same time, and construct a NativeObject with a full dispatch table (including the XLA entry), and the correct setting of structured (taking into account both yamls). One disadvantage to this approach is that NativeFunction objects now contain different stuff depending on how you ran the codegen, and you have to make sure that any changes to the codegen can properly handle all the different variants. ### Data Model Changes Removed 3 classes, which are used by the external codegen: - ExternalBackendFunction - ExternalBackendFunctionsGroup - ExternalBackendMetadata And added two new ones: - BackendIndex - BackendMetadata `BackendIndex` contains any info that's specific to that backend, plus a mapping from operator names to backend specific metadata about the operator. One example of backend-specific info that's not operator-dependent is the fact that XLA prefers to implement functional kernels instead of out kernels (and so when they eventually mark an op as structured, they're going to mark the functional op and not the out op). `BackendMetadata` contains info specific to an (operator, backend) pair. Right now, that's just (a) the name of the kernel, and (b) whether or not that operator is structured. ### Questions I wanted to get this PR up earlier so I could get feedback, but there are a few things I want to call out: Dealing with `structured`. This PR separates out the notion of `structured` into two bits of information: - Does [operator] have a meta() function. This is backend-agnostic, and is represented by the `structured` property on `NativeFunction`, same as before. This is used, e.g., to decide what signatures to add to `MetaFunctions.h`. - Does [operator, backend] have an impl() function. This is backend dependent; even though technically all in-tree backends are forced to write impl() functions for an operator when we port the op to structured in native_functions.yaml, out-of-tree backends can decide to opt in independently. This is represented as a property on `BackendMetadata`. This is used in most other cases, e.g. in `RegisterDispatchKey` when we're deciding whether or not to gen a structured or unstructured wrapper. I also baked `is_structured_dispatch_key` directly into each BackendIndex. So for operators marked "structured" in native_functions.yaml, their corresponding CPU/CUDA BackendIndex entries will be marked structured, and all others (except for potentially external backends) will not. I ended up trying to deal with `structured` in this change since it's technically backend dependent (XLA can opt kernels into structured separately from in-tree ops), but that may have been too ambitious: it's technically not relevant until we actually add support for structured external kernels. If it's not clear that this is the right path for dealing with structured and we want to push that off, I'm fine with backing out the bits of this PR that make `structured` backend-dependent. I don't see anything too controversial related to structured in the change, but I tried to call out any areas in the comments Localizing the fact that external backends follow Dispatcher convention. Another thing that's sort of backend specific that I didn't totally address in this PR is the fact the fact that in-tree backends follow the Native API while external backends follow the Dispatcher API. I painted over that in `native_functions.py` by adding a helper, `kernel_signature`, that takes in a native function and gives you the "correct" signature for the specified backend- NativeSignature for in-tree backends, and DispatcherSignature for out-of-tree backends. In order to make that fully useable though, we'll need `NativeSignature` and `DispatcherSignature` to have matching interfaces. I didn't bother with that in this PR, which is why `gen_external_aten_fallbacks.py` still has a bunch of direct references to the dispatcher API. Thinking of adding it in a later PR but wanted to see if anyone has other opinions. Maybe `is_external()` shouldn't even be a property on the BackendMetadata, and anything the codegen does that requires asking for that information should just be better abstracted away. Thoughts on the `BackendIndex` / `BackendMetadata` breakdown. One thing that's annoying right now is that to query for various pieces of metadata, you call helper functions like `backend_index.structured(f)`, which queries that particular backend and tells you if that specific NativeFunctionGroup is structured for that backend. It has to return an `Optional[bool]` though, since you have to handle the case where that operator doesn't have a kernel for that backend at all. So users of those helpers end up with a bunch of optionals that they need to unpack, even if they know at some point that the result isn't None. I think it would be easier instead to just store the NativeFunction object as a field directly on the BackendMetadata. Curious if there are any other opinions on a better way to model it though. Test Plan: Imported from OSS Reviewed By: navahgar Differential Revision: D28474362 Pulled By: bdhirsh fbshipit-source-id: 41a00821acf172467d764cb41e771e096542f661	2021-05-17 12:25:35 -07:00
Brian Hirsh	3d9f10f530	[external codegen] better yaml error messaging, added explicit error message tests (#56597 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/56597 3 small changes, all centered around error messaging. 1) Improved error messages when `gen_backend_stubs.py` receives invalid yaml 2) Added error message tests. I wasn't sure if there was a canonical way to do this, so I just wrote a test that takes in a list of (yaml input, expected error message) pairs and runs the codegen pipeline on each of them. 3) I also removed the LineLoader from the yaml parsing bit that reads in the external backend yaml file. Two reasons that I took it out: - The main reason we use it with native_functions.yaml is to easily pinpoint problems with new ops as they're added, that the codegen can pick up. 99% of these problems have to do with schema, which is irrelevant to the external yaml since it pulls the schema from native_functions - Not all operators have to appear in the external yaml. We could do something like "line: -1", but that's kind of weird. If you think the line numbers would actually be of more use than I'm thinking of in the external yaml though, let me know! Test Plan: Imported from OSS Reviewed By: navahgar Differential Revision: D28474363 Pulled By: bdhirsh fbshipit-source-id: 8b5ec804b388dbbc0350a20c053da657fad0474f	2021-05-17 12:25:29 -07:00
Brian Hirsh	76fbd755c1	Reland of "D27708346: generate xla codegen in-tree" (#56601 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/56601 Updating it to ensure that RegistrationDeclarations.yaml is completely unchanged This reverts commit `90e532f3ef`. Test Plan: Imported from OSS Reviewed By: ailzhang Differential Revision: D27915305 Pulled By: bdhirsh fbshipit-source-id: 491a025c44221690dad849f9a2166934130c0fec	2021-04-21 19:36:31 -07:00
Brian Hirsh	90e532f3ef	Revert D27708346: generate xla codegen in-tree Test Plan: revert-hammer Differential Revision: D27708346 (`51d0212d0f`) Original commit changeset: 2289edd641f3 fbshipit-source-id: 86711c07db19833b9e772c558e12accba1432499	2021-04-21 11:07:45 -07:00
Brian Hirsh	51d0212d0f	generate xla codegen in-tree (#55050 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55050 not ready for review yet Test Plan: Imported from OSS Reviewed By: ezyang Differential Revision: D27708346 Pulled By: bdhirsh fbshipit-source-id: 2289edd641f30277d7561cf2d48ec69c6a2137a9	2021-04-21 08:19:08 -07:00

30 Commits