Summary:
This PR implements the necessary hooks/stubs/enums/etc for complete ONNX Runtime (ORT) Eager Mode integration. The actual extension will live out of tree at https://github.com/pytorch/ort.
We have been [working on this at Microsoft](https://github.com/microsoft/onnxruntime-pytorch/tree/eager-ort/torch_onnxruntime) for the last few months, and are finally ready to contribute the PyTorch core changes upstream (nothing major or exciting, just the usual boilerplate for adding new backends).
The ORT backend will allow us to ferry [almost] all torch ops into granular ONNX kernels that ORT will eagerly execute against any devices it supports (therefore, we only need a single ORT backend from a PyTorch perspective).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58248
Reviewed By: astaff
Differential Revision: D30344992
Pulled By: albanD
fbshipit-source-id: 69082b32121246340d686e16653626114b7714b2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58661
I removed `dispatch: CompositeImplicitAutograd: math_silu_backward`.
This is definitely not right, but I don't know how it works with structured kernels in core.
Keeping it will trigger an assertion failure:
```
assert dispatch.keys() != {DispatchKey.CompositeImplicitAutograd}, \
    f"unexpected name for singleton CompositeImplicitAutograd dispatch entry: expected {cpp.name(func)} " \
    f"but got {dispatch[DispatchKey.CompositeImplicitAutograd]}. Rename your implementation to the expected " \
    "name, then delete the dispatch table"
```
Test Plan: Imported from OSS
Reviewed By: soulitzer
Differential Revision: D28572530
Pulled By: ezyang
fbshipit-source-id: 410f03bddf79cda7c9f0fd66f697383ee2925d32
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60214
Relanding this PR, but with a fix for Windows CUDA builds (example failure in master here: https://github.com/pytorch/pytorch/runs/2852662871)
This is identical to the original PR except for one change in `tools/codegen/gen.py`: `static constexpr` -> `static CONSTEXPR_EXCEPT_WIN_CUDA`
This actually took a while to figure out, until I tracked down a previous pytorch PR that encountered a similar issue: https://github.com/pytorch/pytorch/pull/40675
This reverts commit 6d0fb85a62.
Test Plan: Imported from OSS
Reviewed By: ezyang
Differential Revision: D29213932
Pulled By: bdhirsh
fbshipit-source-id: b90c7c10e5a51f8d6173ddca673b418e5774c248
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59115
This PR beefs up the `at::_ops::` API as a source of truth for compile-time information about each operator.
### Changes
Previously, for every op defined in native_functions.yaml, `at::_ops::` exposed an unambiguous function (e.g. `at::_ops::add_Tensor`): effectively an unambiguously named version of the C++ API that you could decltype() successfully because it had no overloads, accessed through a user-facing macro: `decltype(ATEN_FN2(add, Tensor)) // expands to decltype(at::_ops::add_Tensor)`.
Now, `at::_ops::add_Tensor` is a struct containing a few static fields and methods (declared in `Operators.h`, defined in `Operators.cpp`):
```
struct TORCH_API add_Tensor {
  using schema = at::Tensor (const at::Tensor &, const at::Tensor &, const at::Scalar &);
  using ptr_schema = at::Tensor (*)(const at::Tensor &, const at::Tensor &, const at::Scalar &);
  static constexpr const char* name = "aten::add";
  static constexpr const char* overload_name = "Tensor";
  static constexpr const char* schema_str = "add.Tensor(Tensor self, Tensor other, *, Scalar alpha=1) -> Tensor";
  static at::Tensor call(const at::Tensor & self, const at::Tensor & other, const at::Scalar & alpha);
  static at::Tensor redispatch(c10::DispatchKeySet dispatchKeySet, const at::Tensor & self, const at::Tensor & other, const at::Scalar & alpha);
};
```
What used to be the function `at::_ops::add_Tensor` can now be accessed as `at::_ops::add_Tensor::call`, and I've added a new macro to access the entire struct (naming suggestions welcome) - `ATEN_OP2(add, Tensor)`.
### Motivation
There were two motivations for this change:
**Codegen refactor**
The `at::_ops::` API as it exists now is (yet another) C++ entry point into the dispatcher, in addition to the Function, Method, and Redispatch APIs. Instead, after this PR, the existing three APIs are all inline-able wrapper APIs that call into the `at::_ops` API to do the real work. The function and method APIs call into `at::_ops::{op}::call`, while the redispatch API calls into `at::_ops::{op}::redispatch`.
This will hopefully make it easier to pile in any future C++ APIs that we want to code-generate. It also means that stuff like the string name, overload name, and schema of each operator is consolidated in a single place, rather than having the codegen hardcode various strings in multiple codegen output files.
**Extra compile-time metadata**
In the [boxed CPU fallback PR](https://github.com/pytorch/pytorch/pull/58065/files#diff-c9b55f0d692a9bea8019c6f19bc46877f1efa0f9d4fc2086cf299b52768343b4R31) above this in the stack, I added a new API that external backends can use to call directly into their boxed fallback from an unboxed context. Adding extra metadata to `at::_ops` means that XLA's usage of that API doesn't require passing in the string name and overload name of each operator as arguments; we can just infer them.
The updated API looks like this (see [the XLA-side PR](https://github.com/pytorch/xla/pull/2945/files#diff-5e65c3c1d847191cb691d1874732e971f09fa1aad7a980a555c3b0504a5b6470R250) for more examples):
```
return at::native::call_fallback_fn<&xla_cpu_fallback, ATEN_OP2(add, Tensor)>::call(a, b, 1.0);
```
**Characteristics of the `at::_ops` API**
(I also commented this in the codegen)
(1) It follows the Dispatcher API.
This means, e.g., that it takes in the expanded arguments rather than `TensorOptions`. This is kind of necessary for perf, if we want `at::_ops` to serve as the main implementation of the existing C++ APIs. For example: if it followed the C++ API, then all of the faithful C++ factory functions would need to wrap their arguments into TensorOptions only to unwrap them again.
(2) Overload names are disambiguated.
This is the same as before; it's helpful for PyTorch extenders who would like to decltype() an ATen operator that has overloads, e.g. decltype(at::_ops::mul_Tensor::call).
(3) No argument defaulting is allowed.
This is more of an implementation detail to avoid #include cycles, since TensorBody.h (which defines the Tensor class) needs to include this file. The #include situation is precarious though!
(4) manual_cpp_bindings and faithful names are not included in the API.
I think this is one where we have a choice. This applies to stuff like `__dispatch__is_complex()` and `add_outf()`. These aren't "real native_functions.yaml ops"; they're just additional functions provided by the C++ API. They're implemented as wrappers in Functions.h that call into the actual operators defined here, i.e. at::_ops::is_complex::call() and at::_ops::add_out::call(). This means that ATEN_OP(is_complex) will not fastpath, and will go through the dispatcher. It also means that `ATEN_OP2(add, out)` is automatically faithful and takes its out argument at the end (this is just because it follows the dispatcher API).
**Details**
Instead of codegen'ing the existing three APIs in `Functions.cpp`, `TensorMethods.cpp` and `RedispatchFunctions.cpp`, I codegen them directly into the headers: `Functions.h`, `TensorBody.h`, and `RedispatchFunctions.h`. I mostly did this for perf, since we want to avoid introducing an extra function call in the hot path of every operator. These functions are also now all one-liners that call into `at::_ops`, so the compiler should just inline them all anyway.
The main downside in doing that though was that I had to bend over backwards in a few cases to avoid cyclical #include statements. The issue is that `TensorBody.h` now includes `Operators.h` (because the codegen'd method API is implemented by calling into `at::_ops`), but `TensorBody.h` also includes the definition of the Tensor class. That means that `Operators.h` can't be aware of the Tensor class; it needs to forward declare everything and avoid using the Tensor class directly. To fix cyclic includes, I had to:
- Not allow defaulting in the `at::_ops` API
- Move some code that was called when translating from the C++ API to the Dispatcher API directly into the codegen template (`check_tensor_options_and_extract_memory_format`)
It's not great, but I don't think this specific include cycle will break down in the near future; the only code that we need to call before getting to `Operators.cpp` is the translations from various APIs to the dispatcher API; there aren't many of them, and there's no major reason for them to live in an external utils file somewhere.
Moving the code into the headers also meant that the codegen no longer needs to deal with `Functions.cpp`/`TensorMethods.cpp`/`RedispatchFunctions.cpp`. All of the functions that used to be defined in `TensorMethods.cpp` seemed small enough for me to lump into `TensorBody.h`, but some of the functions in `Functions.cpp` looked too big to put in a header, so I moved the file to `aten/src/ATen/native/Functions.cpp`.
It might be worth keeping `TensorMethods.cpp` there too, in case we have any beefy hand-written tensor methods that we don't want to put in a header.
**Perf**
I ran a few benchmarks in callgrind, and didn't see a noticeable instruction count change when calling `at::add()`. I also saw in the output that `at::add()` was successfully getting inlined.
There's also probably a light risk of binary size increase; I think that there's a binary size regression test that I can run in phabricator (going to try it). I can also try inspecting `libtorch.so` directly and seeing if it's any bigger, but my hope is that the inlining means we aren't generating separate symbols for `at::add` and `at::_ops::add_Tensor::call`.
Test Plan: Imported from OSS
Reviewed By: ezyang
Differential Revision: D28833086
Pulled By: bdhirsh
fbshipit-source-id: 55f322a8378cb9a3cb6642f72aa291be381dd95b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57510
This is a re-write of https://github.com/pytorch/pytorch/pull/56835, which is significantly shorter thanks to the data model change in the PR below this one in the stack. See the original description in the linked PR for details.
The functional changes in this PR are the same as in the above linked one, so the description is the same with a few small changes:
- I don't bother generating `at::xla::{op}` entries for CPU fallbacks. After looking around, I see precedent for that. For example, we don't have `at::cpu::{op}` entries for composite ops; if you really want to bypass the dispatcher you need to call `at::compositeimplicitautograd::{op}`. Maybe we should revisit that later if we find an important use case for having full namespace coverage, but that doesn't seem worth half-fixing for external backends in this PR.
Test Plan: Imported from OSS
Reviewed By: navahgar
Differential Revision: D28474364
Pulled By: bdhirsh
fbshipit-source-id: 4d58b60e5debad6f1ff06420597d8df8505b2876
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57361
Data model change in the codegen, which splits backend-specific information out of `NativeFunction`
### Overview
Currently in the codegen, native_functions.yaml has backend-specific information about each operator that is encoded directly into the data model, in the `NativeFunction` object. That's reasonable, since the native_functions.yaml is the source of truth for information about an operator, and the data model encodes that information into types.
Now that external backends can use the codegen though, that information is technically incomplete/inaccurate. In another PR, I tried patching the information on the `NativeFunction` object with the additional external information, by updating the `dispatch` entry to contain the external backend kernel name and dispatch key.
Instead, this PR tries to split out that information. The `NativeFunction` class contains all information about an operator from native_functions.yaml that's backend-independent and is known never to change regardless of what extra information backends provide. We also build up a backend "index", which is basically a mapping from [backend] -> [backend-specific-metadata]. Reading in an external backend yaml just involves updating that index with the new backend.
There were a few places where `NativeFunction` used the dispatch table directly, which I encoded as properties directly on the `NativeFunction` object (e.g. `is_abstract`). They were mostly around whether or not the operator has a composite kernel, which isn't something that's going to change for any external backends.
This has a few advantages:
- We can more easily re-use the existing logic in `native_function.py` and `register_dispatch_key.py` for both native and external backends, since they both involve a NativeFunction + a particular backend index
- The data in the data model will be the same regardless of how the codegen is run. Running the codegen with a new external backend doesn't change the data inside of NativeFunction or an existing backend index. It just adds a new index for that backend.
- There are several codegen areas that don't care about backend-specific information: mostly the tracing and autograd codegen. We can reason about the codegen there more easily, knowing that backend-specific info is entirely uninvolved.
An alternative to this split would be to augment the NativeFunction objects with external backend information at the time that we create them. So the external codegen could read both native_functions.yaml and the external backend's yaml at the same time, and construct a `NativeFunction` object with a full dispatch table (including the XLA entry) and the correct setting of structured (taking into account both yamls). One disadvantage of this approach is that NativeFunction objects would then contain different stuff depending on how you ran the codegen, and you have to make sure that any changes to the codegen can properly handle all the different variants.
### Data Model Changes
Removed three classes, which were used by the external codegen:
- ExternalBackendFunction
- ExternalBackendFunctionsGroup
- ExternalBackendMetadata
And added two new ones:
- BackendIndex
- BackendMetadata
`BackendIndex` contains any info that's specific to that backend, plus a mapping from operator names to backend specific metadata about the operator. One example of backend-specific info that's not operator-dependent is the fact that XLA prefers to implement functional kernels instead of out kernels (and so when they eventually mark an op as structured, they're going to mark the functional op and not the out op).
`BackendMetadata` contains info specific to an (operator, backend) pair. Right now, that's just (a) the name of the kernel, and (b) whether or not that operator is structured.
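For intuition, here is a minimal sketch of the shape of these two classes (field names follow the description above; the real definitions in tools/codegen/model.py use richer types like OperatorName and DispatchKey, so treat this as an approximation):
```
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass(frozen=True)
class BackendMetadata:
    kernel: str        # name of this backend's kernel for the operator
    structured: bool   # does this (operator, backend) pair have a structured impl()?

@dataclass(frozen=True)
class BackendIndex:
    dispatch_key: str                    # e.g. "CPU", "CUDA", "XLA"
    external: bool                       # True for out-of-tree backends like XLA
    index: Dict[str, BackendMetadata]    # operator name -> backend-specific metadata

    def get_kernel(self, op_name: str) -> Optional[BackendMetadata]:
        # None means this backend has no kernel for the operator at all
        return self.index.get(op_name)
```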
### Questions
I wanted to get this PR up earlier so I could get feedback, but there are a few things I want to call out:
**Dealing with `structured`.**
This PR separates out the notion of `structured` into two bits of information:
- Does [operator] have a meta() function. This is backend-agnostic, and is represented by the `structured` property on `NativeFunction`, same as before. This is used, e.g., to decide what signatures to add to `MetaFunctions.h`.
- Does [operator, backend] have an impl() function. This is backend dependent; even though technically all in-tree backends are forced to write impl() functions for an operator when we port the op to structured in native_functions.yaml, out-of-tree backends can decide to opt in independently. This is represented as a property on `BackendMetadata`. This is used in most other cases, e.g. in `RegisterDispatchKey` when we're deciding whether or not to gen a structured or unstructured wrapper.
I also baked `is_structured_dispatch_key` directly into each BackendIndex. So for operators marked "structured" in native_functions.yaml, their corresponding CPU/CUDA BackendIndex entries will be marked structured, and all others (except for potentially external backends) will not.
I ended up trying to deal with `structured` in this change since it's technically backend dependent (XLA can opt kernels into structured separately from in-tree ops), but that may have been too ambitious: it's technically not relevant until we actually add support for structured external kernels. If it's not clear that this is the right path for dealing with structured and we want to push that off, I'm fine with backing out the bits of this PR that make `structured` backend-dependent. I don't see anything *too* controversial related to structured in the change, but I tried to call out any areas in the comments
**Localizing the fact that external backends follow Dispatcher convention.**
Another thing that's sort of backend-specific that I didn't totally address in this PR is the fact that in-tree backends follow the Native API while external backends follow the Dispatcher API. I painted over that in `native_functions.py` by adding a helper, `kernel_signature`, that takes in a native function and gives you the "correct" signature for the specified backend: NativeSignature for in-tree backends, and DispatcherSignature for out-of-tree backends. In order to make that fully usable though, we'll need `NativeSignature` and `DispatcherSignature` to have matching interfaces. I didn't bother with that in this PR, which is why `gen_external_aten_fallbacks.py` still has a bunch of direct references to the dispatcher API. I'm thinking of adding it in a later PR but wanted to see if anyone has other opinions.
Maybe `is_external()` shouldn't even be a property on the BackendMetadata, and anything the codegen does that requires asking for that information should just be better abstracted away.
**Thoughts on the `BackendIndex` / `BackendMetadata` breakdown.**
One thing that's annoying right now is that to query for various pieces of metadata, you call helper functions like `backend_index.structured(f)`, which queries that particular backend and tells you if that specific NativeFunctionGroup is structured for that backend. It has to return an `Optional[bool]` though, since you have to handle the case where that operator doesn't have a kernel for that backend at all. So users of those helpers end up with a bunch of optionals that they need to unpack, even if they know at some point that the result isn't None. I think it would be easier instead to just store the NativeFunction object as a field directly on the BackendMetadata. Curious if there are any other opinions on a better way to model it though.
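To make the Optional-unpacking concrete, continuing the sketch above (`get_kernel` is the hypothetical lookup helper from that sketch, not the exact helper in the codegen):
```
from typing import Optional

def kernel_name_for(op_name: str, index: BackendIndex) -> Optional[str]:
    meta = index.get_kernel(op_name)   # Optional[BackendMetadata]
    if meta is None:
        return None                    # this backend has no kernel for the op
    return meta.kernel                 # only safe to unpack after the None check
```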
Test Plan: Imported from OSS
Reviewed By: navahgar
Differential Revision: D28474362
Pulled By: bdhirsh
fbshipit-source-id: 41a00821acf172467d764cb41e771e096542f661
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56871
foreach kernels fall back to the slow path when tensors are on different devices
Generated by codemod:
```
fastmod '(- func: _foreach.*)' '${1}
  device_check: NoCheck # foreach kernels fall back to slow path when tensor are on different devices' aten/src/ATen/native/native_functions.yaml
```
ghstack-source-id: 127914017
Test Plan: autotest
Reviewed By: ezyang
Differential Revision: D27986560
fbshipit-source-id: b0cd963cdba04b4e1589bbf369eb26b48d523968
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56601
Updating it to ensure that RegistrationDeclarations.yaml is completely
unchanged
This reverts commit 90e532f3ef.
Test Plan: Imported from OSS
Reviewed By: ailzhang
Differential Revision: D27915305
Pulled By: bdhirsh
fbshipit-source-id: 491a025c44221690dad849f9a2166934130c0fec
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55351
We incorrectly used `Tensor&` to mean "the underlying
TensorImpl cannot be changed", as explained in
https://github.com/zdevito/ATen/issues/27#issuecomment-330717839 .
This diff gets us on the path to fixing this problem: we have an
incremental way to fix individual native functions so that we can
apply any handwritten fixes a few at a time. It gets the migration
started with the `resize` family of native functions.
ghstack-source-id: 127092677
Test Plan: fitsships
Reviewed By: ezyang
Differential Revision: D27583983
fbshipit-source-id: 4eeeec85f5d268e9d0f1645eb9396914a9f9557f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56082
The native_functions.yaml changes were done by codemod using the
following script:
```
import ruamel.yaml
from ruamel.yaml.tokens import CommentToken
from ruamel.yaml.error import CommentMark
from tools.codegen.model import * # noqa: F403
with open("aten/src/ATen/native/native_functions.yaml", "r") as f:
contents = f.read()
yaml = ruamel.yaml.YAML()
yaml.preserve_quotes = True
yaml.width = 1000
yaml.boolean_representation = ['False', 'True']
r = yaml.load(contents)
convert = '''\
acos
acosh
asin
asinh
atan
atanh
cos
cosh
digamma
erf
erfc
erfinv
exp
expm1
exp2
lgamma
log
log10
log1p
log2
reciprocal
sigmoid
sin
sinc
sinh
special_entr
sqrt
tan
tanh'''.split()
for e in r:
    f = NativeFunction.from_yaml(e, Location("", 0))
    if f.structured or f.structured_delegate is not None:
        continue
    n = f.func.name.name.base
    if n not in convert:
        continue
    # mutate e to make changes
    if f.func.kind() == SchemaKind.out:
        e.insert(1, 'structured', True)
        e.insert(2, 'structured_inherits', 'TensorIteratorBase')
    else:
        # TODO: The .out overload assumption is not sound in general
        e.insert(1, 'structured_delegate', f'{n}.out')
        e['dispatch'].pop('CPU', None)
        e['dispatch'].pop('CUDA', None)
        e['dispatch'].pop('CPU, CUDA', None)
        e['dispatch'].pop('CompositeExplicitAutograd', None)
    *_, last_k = e.keys()
    needs_fixup = False
    if not e['dispatch']:
        if last_k == 'dispatch':
            needs_fixup = True
        del e['dispatch']
    # Manually fix up newlines at the end, because ruamel
    # made some bad life choices about where to associate trailing
    # whitespace for nested dicts; see
    # https://stackoverflow.com/questions/42172399/modifying-yaml-using-ruamel-yaml-adds-extra-new-lines
    if needs_fixup:
        *_, last_k = e.keys()
        # post_key, pre_key, post_value, pre_value
        e.ca.items[last_k] = [None, None, CommentToken('\n\n', CommentMark(0), None), None]

with open("aten/src/ATen/native/native_functions.yaml.new", "w") as f:
    yaml.dump(r, f)
```
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: bhosmer
Differential Revision: D27777769
Pulled By: ezyang
fbshipit-source-id: 1ecbac7cb3e0093167bb61c7d2b1ecb95b8ae17c
Summary:
Generally wildcard imports are bad for the reasons described here: https://www.flake8rules.com/rules/F403.html
This PR replaces wildcard imports with an explicit list of imported items where possible, and adds a `# noqa: F403` comment in the other cases (mostly re-exports in `__init__.py` files).
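For illustration, the flavor of change this makes (the module and imported names below are just an example):
```
# before: everything is pulled into the namespace, and flake8 flags F403
from tools.codegen.model import *  # noqa: F403

# after: only the names that are actually used are imported
from tools.codegen.model import NativeFunction, SchemaKind, Location
```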
This is a prerequisite for https://github.com/pytorch/pytorch/issues/55816, because currently [`tools/codegen/dest/register_dispatch_key.py` simply fails if you sort its imports](https://github.com/pytorch/pytorch/actions/runs/742505908).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55838
Test Plan: CI. You can also run `flake8` locally.
Reviewed By: jbschlosser
Differential Revision: D27724232
Pulled By: samestep
fbshipit-source-id: 269fb09cb4168f8a51fd65bfaacc6cda7fb87c34
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54403
A few important points about InferenceMode behavior:
1. All tensors created in InferenceMode are inference tensors except for view ops.
- view ops produce outputs with the same is_inference_tensor property as their input.
Namely, a view of a normal tensor inside InferenceMode produces a normal tensor, which is
exactly the same as creating a view inside NoGradMode. And a view of an
inference tensor outside InferenceMode produces an inference tensor as output.
2. All ops are allowed inside InferenceMode, and they run faster than in normal mode.
3. Inference tensors cannot be saved for backward.
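A minimal sketch of these behaviors from the Python side, assuming a build where `torch.inference_mode` and `Tensor.is_inference()` are available:
```
import torch

x = torch.ones(2, 3, requires_grad=True)
with torch.inference_mode():
    y = x * 2          # (1) y is an inference tensor
    v = x.view(3, 2)   # view of a normal tensor stays a normal tensor
    print(y.is_inference(), v.is_inference())  # True False

# (3) inference tensors cannot be saved for backward: using y later in an
# op that records a graph for autograd raises a RuntimeError.
```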
Test Plan: Imported from OSS
Reviewed By: ezyang
Differential Revision: D27316483
Pulled By: ailzhang
fbshipit-source-id: e03248a66d42e2d43cfe7ccb61e49cc4afb2923b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54969
With all uses of the hacky wrapper removed, all kernels are now
dispatched through the full c10 dispatcher.
ghstack-source-id: 125434790
Test Plan: buck build //caffe2/aten/...
Reviewed By: ezyang, walterddr
Differential Revision: D27436596
fbshipit-source-id: 7a146d1f4a983b4a81f8552be4eec6c482b6bea2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54470
```
git grep -l 'DefaultBackend' | xargs sed -i 's/DefaultBackend/CompositeExplicitAutograd/g'
```
Plus a quick fixup in native/README.md
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: bdhirsh
Differential Revision: D27253240
Pulled By: ezyang
fbshipit-source-id: 964df951ea8b52fa72937f3cc66aeaf49a702e6f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54466
I had to very carefully audit all the use sites since there are a lot
of other uses of the string Math; I did most of the conversion by
grepping for all occurrences of Math and then doing a search
replace.
I also updated documentation for clarity.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D27253239
Pulled By: ezyang
fbshipit-source-id: afb485d07ff39575742a4f0e1e205179b60bc953
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54436
An operator entry with no dispatch table implicitly generates a Math
entry, so you don't need to define one yourself. I also added
some asserts in the codegen to fail on these cases.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: ailzhang
Differential Revision: D27235381
Pulled By: ezyang
fbshipit-source-id: f8c905090b863120f4f3656c37e2b7f26e8bb9ef
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54427
A StructuredNativeFunctions is no longer guaranteed to actually
be structured (test the structured property for that), so we rename
this to a more neutral name.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: ailzhang
Differential Revision: D27235380
Pulled By: ezyang
fbshipit-source-id: 2b438d615bf06a47fc9c7bf6eb66fd8b4df31bc8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54426
Previously, we only put NativeFunctions in StructuredNativeFunctions
if the out variant advertised that the kernel was structured. However,
there are a few code generation things that can take advantage of
this trio structure, even if the kernel itself hasn't been ported
to be structured. So better to always group things when they are
related, and then let clients decide whether or not to use the
structure or throw it away.
While doing this, I had hoped that there weren't any functional/inplace
pairs that didn't also have an out variant. This turned out to not
be true. These are probably all oversights and should get fixed at
some point.
Bill of changes:
- The actual operational change happens in
StructuredNativeFunctions.from_dict; then I need to relax some
__post_init__ invariants. To tell if a StructuredNativeFunctions
is actually structured, there is a new structured property, which
is queried from a few new locations in code
- Refactor native_functions.py into gen_structured/gen_unstructured
functions so I can easily call gen_unstructured from two contexts
I intend to s/StructuredNativeFunctions/NativeFunctionsGroup/ but
for ease of review this rename hasn't been done in this PR.
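A minimal sketch of the new structured property mentioned in the bill of changes (fields simplified; the real class lives in tools/codegen/model.py):
```
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class StructuredNativeFunctions:
    functional: 'NativeFunction'
    inplace: Optional['NativeFunction']
    out: 'NativeFunction'

    @property
    def structured(self) -> bool:
        # the trio only counts as structured if its out= variant actually
        # advertises a structured kernel in native_functions.yaml
        return self.out.structured
```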
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: ailzhang
Differential Revision: D27235379
Pulled By: ezyang
fbshipit-source-id: d8a15de9abb75b365348ab94e67b830704e30cf0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51508
No substantive changes. The codegen for this file was getting a
bit long, so I moved it off into the tools.codegen.dest submodule (I
wanted to do tools.codegen.gen but that conflicts with the existing
module; oy vey!) To do this I had to move some other functions around
so that they were more generally accessible. Otherwise
self-explanatory.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: ljk53
Differential Revision: D26187856
Pulled By: ezyang
fbshipit-source-id: fd3784571d03d01c4acb7ca589fcde4492526408
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51115
Add an enum type for dispatch keys. This prepares for implementing the DispatchTable
computation logic in Python for static dispatch.
Verified byte-for-byte compatibility of the codegen output.
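A rough sketch of the shape of this change (the member list below is illustrative, not the full set of keys):
```
import enum

class DispatchKey(enum.Enum):
    Undefined = 0
    CPU = enum.auto()
    CUDA = enum.auto()
    XLA = enum.auto()
    Math = enum.auto()            # later renamed to CompositeImplicitAutograd
    DefaultBackend = enum.auto()  # later renamed to CompositeExplicitAutograd

    def __str__(self) -> str:
        return self.name

# parsing a dispatch key name from native_functions.yaml text
assert DispatchKey["CUDA"] is DispatchKey.CUDA
```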
Test Plan: Imported from OSS
Reviewed By: bhosmer
Differential Revision: D26077430
Pulled By: ljk53
fbshipit-source-id: 86e74f3eb32266f31622a2ff6350b91668c8ff42
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50794
Original commit changeset: b4a7948088c0
There are some subtle extra tweaks on top of the original. I can unbundle them, but I've opted to keep it with the port because it's the easiest way to make sure the changes are exercised.
* There's a bugfix in the codegen to test if a dispatch key is structured *before* short circuiting because the dispatch key was missing in the table. This accounts for mixed structured-nonstructured situations where the dispatch table is present, but the relevant structured key isn't (because the dispatch table only exists to register, e.g., QuantizedCPU)
* Dispatch tables for functions which delegate to structured kernels don't have Math entries generated for them.
* It's now illegal to specify a structured dispatch key in a delegated structured kernel (it will be ignored!); `add` is now fixed to follow this.
* There are some extra sanity checks for NativeFunctions validation
* Finally, unlike the original PR, I switched the .vec variant of upsample_nearest2d to also be DefaultBackend, bringing it in line with upsample_nearest1d.
ghstack-source-id: 120038038
Test Plan:
```
buck test mode/dev //coreai/tiefenrausch:python_tests -- --exact 'coreai/tiefenrausch:python_tests - test_can_run_local_async_inference_cpu (coreai.tiefenrausch.tests.python_test.TiefenrauschPY)' --run-disabled
```
Reviewed By: ngimel
Differential Revision: D25962873
fbshipit-source-id: d29a9c97f15151db3066ae5efe7a0701e6dc05a3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50769
A couple of new instances of these lines were added in the last couple of days, but they're not necessary anymore.
This PR removes them and also adds an assertion to make sure we don't add any more.
ghstack-source-id: 120133715
Test Plan: waitforsandcastle
Reviewed By: bhosmer
Differential Revision: D25961316
fbshipit-source-id: e2befc5b6215b42decb2acedcacfb50734857e2f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49164
This PR removes the logic paths in codegen that were responsible for handling non-c10-full ops.
This only goes through our basic codegen. It does not simplify C++ code yet and it does not remove the codegen for generated unboxing wrappers yet.
ghstack-source-id: 119450487
Test Plan: waitforsandcastle
Reviewed By: ezyang
Differential Revision: D25462977
fbshipit-source-id: 7e70d14bea96948f5056d98125f3e6ba6bd78285
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49735
This is the final wave of autograd codegen data model migration.
After this PR:
- autograd codegen no longer depends on Declarations.yaml;
- autograd codegen sources are fully type annotated and pass mypy-strict check;
To avoid potential merge conflicts with other pending PRs, some structural
changes are intentionally avoided, e.g. didn't move inner methods out, didn't
change all inner methods to avoid reading the outer function's variables, etc.
Confirmed byte-for-byte compatible with the old codegen:
```
Run it before and after this PR:
.jenkins/pytorch/codegen-test.sh <baseline_output_dir>
.jenkins/pytorch/codegen-test.sh <test_output_dir>
Then run diff to compare the generated files:
diff -Naur <baseline_output_dir> <test_output_dir>
```
Confirmed clean mypy-strict run:
```
mypy --config mypy-strict.ini
```
Test Plan: Imported from OSS
Reviewed By: ezyang, bhosmer
Differential Revision: D25678879
Pulled By: ljk53
fbshipit-source-id: ba6e2eb6b9fb744208f7f79a922d933fcc3bde9f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49490
No reason to let people do the legacy thing for the brand new kernel.
This simplifies the codegen. I have to port the two structured kernels
to this new format.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: smessmer
Differential Revision: D25595406
Pulled By: ezyang
fbshipit-source-id: b5931873379afdd0f3b00a012e0066af05de0a69
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49348
This is a redux of #45666 post refactor, based off of
d534f7d4c5
Credit goes to peterbell10 for the implementation.
Fixes #43945.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: smessmer
Differential Revision: D25594004
Pulled By: ezyang
fbshipit-source-id: c8eb876bb3348308d6dc8ba7bf091a2a3389450f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49341
I noticed that #49097 was using manual_kernel_registration incorrectly,
so this diff tightens up the testing so that:
1. We don't generate useless wrapper functions when manual_kernel_registration
is on (it's not going to be registered, so it does nothing).
2. manual_kernel_registration shouldn't affect generation of functions in
Functions.h; if you need to stop bindings, use manual_cpp_binding
3. Structured and manual_kernel_registration are a hard error
4. We raise an error if you set dispatch and manual_kernel_registration at the
same time.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: smessmer
Differential Revision: D25594003
Pulled By: ezyang
fbshipit-source-id: 655b10e9befdfd8bc95f1631b2f48f995a31a59a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49619
This is a minimal-change PR that enforces that all operators are c10-full by making it the default.
This does not clean up any code yet, that will happen in PRs stacked on top. But this PR already ensures
that there are no non-c10-full ops left and there will be no non-c10-full ops introduced anymore.
ghstack-source-id: 119269182
Test Plan: waitforsandcastle
Reviewed By: bhosmer
Differential Revision: D25650198
fbshipit-source-id: efc53e884cb53193bf58a4834bf148453e689ea1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49092
Functions which specify manual_cpp_binding don't automatically
get C++ bindings generated for them in TensorBody.h or
Functions.h. This lets end users manually define the bindings
themselves, which may be helpful if there is a way to
short circuit the dispatcher entirely. contiguous() is switched
to use this mechanism.
Although manual_cpp_binding suggests that we don't generate the
binding at all, it is often the case that there is some "fast
path", but when this path is not satisfied, we should go back
to the slow dispatch. So we still generate a fallback method/function
which the user-defined binding can call into in case that we
have to go slowpath.
The correctness conditions for bindings manually written in this
way are subtle. Here are the ones I can think of off the top
of my head:
- Whatever condition is tested in the C++ body, must ALSO be
tested again in the native:: implementation on the other
side of the dispatcher. This is because you are NOT GUARANTEED
to hit the native:: implementation through the C++ binding,
you may go straight to the implementation via a boxed call.
- If a binding is written in this way, it is only safe to
skip dispatch if you would have returned the same tensor as
before. In any situation you would return a fresh tensor,
you MUST go to the slow path, because you need to actually
get to the autograd kernel.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: bhosmer
Differential Revision: D25428440
Pulled By: swolchok
fbshipit-source-id: 6e71767cb8d1086d56cd827c1d2d56cac8f6f5fe
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49043
Previously, this function had nontrivial algorithmic content,
but after #48195, this was just a swiss army knife for pasting
together arguments while maintaining structure. I added some
more properties for Arguments for convenient access in this way,
and then inlined the implementation of group_arguments into all of its call
sites, simplifying where the context allowed. This might be controversial, but I
think the resulting code is easier to understand.
You may notice that there is some modest code duplication between
dispatcher.cpparguments_exprs and CppSignature.argument_packs.
This is a known problem and I will be attempting to fix it in
a follow up PR.
Confirmed to be byte-for-byte compatible.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: H-Huang
Differential Revision: D25455885
Pulled By: ezyang
fbshipit-source-id: 8fbe066e8c3cb7ee8adb5b87296ec5bd7b49e01f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49042
I want the names positional and kwarg_only to give the unflat
representation (e.g., preserving TensorOptionsArguments in the
returned Union). So I regret my original naming choice when
I moved grouping to model. This renames them to have a flat_ prefix
and also adds a flat_non_out argument for cases where you just
want to look at non-out arguments.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: H-Huang
Differential Revision: D25455884
Pulled By: ezyang
fbshipit-source-id: f923f8881267a3e3e8e9521519412f7cc25034fc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48718
This PR rewrites structured kernels to do the class-based mechanism (instead of defining a meta and impl function, they are methods on a class), and adds enough customizability on the class to support TensorIterator. To show it works, add is made a structured kernel. Don't forget to check https://github.com/pytorch/rfcs/pull/9 for a mostly up-to-date high level description of what's going on here.
High level structure of this PR (the order you should review files):
* TensorMeta.h - TensorMeta is deleted entirely; instead, meta functions will call `set_output` to allocate/resize their outputs. MetaBase gets a new `maybe_get_output` virtual method for retrieving the (possibly non-existent) output tensor in a meta function; this makes it easier to do special promotion behavior, e.g., as in TensorIterator.
* TensorIterator.cpp - Two major changes: first, we add TensorIteratorBase::set_output, which is a "light" version of TensorIterator::set_output; it sets up the internal data structures in TensorIterator, but it doesn't do allocation (that is assumed to have been handled by the structured kernels framework). The control flow here is someone will call the subclassed set_output, which will allocate output, and then we will call the parent class (TensorIteratorBase) to populate the fields in TensorIterator so that other TensorIterator phases can keep track of it. Second, we add some tests for meta tensors, and skip parts of TensorIterator which are not necessary when data is not available.
* tools/codegen/model.py - One new field in native_functions.yaml, structured_inherits. This lets you override the parent class of a structured meta class; normally it's MetaBase, but you can make it point at TensorIteratorBase instead for TensorIterator based kernels
* tools/codegen/gen.py - Now generate all of the classes we promised. It's kind of hairy because this is the first draft. Check the RFC for what the output looks like, and then follow the logic here. There are some complications: I need to continue to generate old style wrapper functions even if an operator is structured, because SparseCPU/SparseCUDA/etc won't actually use structured kernels to start. The most complicated code generation is the instantiation of `set_output`, which by and large replicates the logic in `TensorIterator::set_output`. This will continue to live in codegen for the foreseeable future as we would like to specialize this logic per device.
* aten/src/ATen/native/UpSampleNearest1d.cpp - The previous structured kernel is ported to the new format. The changes are very modest.
* aten/src/ATen/native/BinaryOps.cpp - Add is ported to structured.
TODO:
* Work out an appropriate entry point for static runtime, since native:: function stubs are no longer generated
* Refactor TensorIteratorConfig construction into helper functions, like before
* Make Tensor-Scalar addition structured to fix perf regression
* Fix `verify_api_visibility.cpp`
* Refactor tools/codegen/gen.py for clarity
* Figure out why header changes resulted in undefined reference to `at::Tensor::operator[](long) const`
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: bhosmer
Differential Revision: D25278031
Pulled By: ezyang
fbshipit-source-id: 57c43a6e5df21929b68964d485995fbbae4d1f7b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48195
The general approach is to change Arguments, splitting `positional`, `kwarg_only` and `out`, into `pre_self_positional`, `self_arg`, `post_self_positional`, and `pre_tensor_options_kwarg_only`, `tensor_options` and `post_tensor_options_kwarg_only`. The splits are as you'd expect: we extract out the self argument and the tensor options arguments, and record the other arguments that came before and after. To do this, we move the logic in `group_arguments` to the parsing process.
Some fuzz in the process:
* I renamed `ThisArgument` to `SelfArgument`, since we don't actually use the terminology "this" outside of C++ (and the model is Python biased)
* I kept the `group_arguments` function, which now just reads out the arguments from the structured model in the correct order. In the long term, we should get rid of this function entirely, but for now I kept it as is to reduce churn.
* I decided to arbitrarily say that when self is missing, everything goes in "post-self", but when tensor options is missing, everything goes in "pre-tensor-options". This was based on where you typically find the argument in question: self is usually at front (so most args are after it), while tensor options are typically at the end (so most args go before it).
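A minimal sketch of the resulting grouping (field names are taken from the description above; element types are simplified stand-ins):
```
from dataclasses import dataclass
from typing import Optional, Sequence

@dataclass(frozen=True)
class Arguments:
    # positional arguments, with `self` split out
    pre_self_positional: Sequence['Argument']
    self_arg: Optional['SelfArgument']
    post_self_positional: Sequence['Argument']
    # keyword-only arguments, with TensorOptions split out
    pre_tensor_options_kwarg_only: Sequence['Argument']
    tensor_options: Optional['TensorOptionsArguments']
    post_tensor_options_kwarg_only: Sequence['Argument']
    # out= arguments
    out: Sequence['Argument']
```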
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: zhangguanheng66
Differential Revision: D25231166
Pulled By: ezyang
fbshipit-source-id: 25d77ad8319c4ce0bba4ad82e451bf536ef823ad
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48182
I'm planning to add a bunch more argument fields following
https://github.com/pytorch/pytorch/pull/45890#discussion_r503646917 and
it will be a lot more convenient if the arguments get to live
in their own dedicated struct. The type checker will tell you if
I've done it wrong. No change to output.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: ljk53
Differential Revision: D25057897
Pulled By: ezyang
fbshipit-source-id: dd377181dad6ab0c894d19d83408b7812775a691