pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 00:21:07 +01:00

Author	SHA1	Message	Date
Scott Wolchok	1211bccc65	[PyTorch] Fix const correctness for resize native functions (#55351 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55351 We incorrectly used `Tensor&` to mean "the underlying TensorImpl cannot be changed", as explained in https://github.com/zdevito/ATen/issues/27#issuecomment-330717839 . This diff gets us on the path to fixing this problem: we have an incremental way to fix individual native functions so that we can apply any handwritten fixes a few at a time. It gets the migration started with the `resize` family of native functions. ghstack-source-id: 127092677 Test Plan: fitsships Reviewed By: ezyang Differential Revision: D27583983 fbshipit-source-id: 4eeeec85f5d268e9d0f1645eb9396914a9f9557f	2021-04-21 14:51:41 -07:00
Brian Hirsh	90e532f3ef	Revert D27708346: generate xla codegen in-tree Test Plan: revert-hammer Differential Revision: D27708346 (`51d0212d0f`) Original commit changeset: 2289edd641f3 fbshipit-source-id: 86711c07db19833b9e772c558e12accba1432499	2021-04-21 11:07:45 -07:00
Brian Hirsh	51d0212d0f	generate xla codegen in-tree (#55050 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55050 not ready for review yet Test Plan: Imported from OSS Reviewed By: ezyang Differential Revision: D27708346 Pulled By: bdhirsh fbshipit-source-id: 2289edd641f30277d7561cf2d48ec69c6a2137a9	2021-04-21 08:19:08 -07:00
Brian Hirsh	eca98fedb5	split out NamedCType from CType. Remove direct string comparison from autograd codegen (#55334 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55334 The goal of this PR is to clean up some of the autograd codegen to compare C++ types using `CType` objects instead of raw strings. My last PR in the stack made that string comparison a little more fragile, since the raw C++ strings needed to be namespace-aware. I confirmed byte-for-byte no codegen changes vs. the last PR (which added namespaces to the codegen) by running `diff -qr ../pytorch-common_test/torch/csrc/autograd/generated/ ../pytorch-callgrind_test_after2/torch/csrc/autograd/generated/` and `diff -qr ../pytorch-common_test/build/aten/src/ATen/ ../pytorch-callgrind_test_after2/build/aten/src/ATen/` Note that a better end-state for the autograd codegen would be to do all of its type pattern matching directly off of JIT types, instead of off of CType’s (which are really just generated from JIT types, incorporating C++ specific semantics). That looks like it’ll require a pretty substantial change though, so I’m not doing it in this PR. As part of this change (and after talking with ezyang), I split off the `CType` data class into a separate `NamedCType` class, which holds a name and a `CType`. This way, `CType` only knows about actual C++ types, making it easier to compare CType’s to each other in the codegen when we only care about the type. The core change is in `types.py`, but it required a bunch of downstream changes to update all of the places where we create `CType`s to create `NamedCType`s instead. The main change in the autograd codegen was that I updated `SavedAttribute` to store a `NamedCType`. The other autograd changes all pretty much came from that change. Test Plan: Imported from OSS Reviewed By: bhosmer Differential Revision: D27708347 Pulled By: bdhirsh fbshipit-source-id: 3e07c80569c7b229c638f389e76e319bff6315f9	2021-04-16 11:43:08 -07:00
Brian Hirsh	947c7a8215	add C++ namespacing logic to ctypes (#55047 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55047 Added namespaces to all of the `CTypes` printed in the codegen. This is pretty much required if we want to use codegen externally, since we can no longer assume that we're inside of the `at::` namespace. Important changes are in `types.py`. How do we add the notion of namespaces to C++ types without people having to write "at::Tensor" everywhere? Before this PR, `CType` held a raw string representing the type, i.e. `BaseCType("Tensor", binds)`. This PR introduces a set of singleton base C++ types in `types.py`, that know how to print their namespace. Instead, we'd write `BaseCType(tensorT, binds)`, where printing `tensorT` will properly print out "at::Tensor". This also means that you can't create arbitrary `CTypes`. If we need a new C++ type in the codegen, we need to add it to the list in `types.py`. One blip in the design: we don't want to change `RegistrationDeclarations.yaml`, since that'll break external backends that ingest it. I added separate functions to display types without the namespace that are used to create RegistrationDeclarations.yaml`. With an external codegen API though, we can eventually kill it :) I also didn't realize until this PR that `Declarations.yaml` is still directly in use, by some python/autograd codegen. Rather than keep that yaml byte-for-byte compatible, I just updated the callsites in the autograd codegen to work with namespaces. In the NEXT pr, I try to clean up some of the autograd codegen to stop using raw strings to match against C++ types. Test Plan: Imported from OSS Reviewed By: bhosmer Differential Revision: D27708349 Pulled By: bdhirsh fbshipit-source-id: 56a4f81fc101795bcb9ee1f722121480fb2356ad	2021-04-16 11:43:06 -07:00
Brian Hirsh	164bee1d09	Return a CType instead of a string for returns, beef up CType (#55046 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55046 Updating `returns` in the codegen to return a CType instead of a raw string. This has benefit of putting all stringifying logic through CType, which is useful in the followup PR when I add namespaces. I also added new CTypes for other templated C++ types: array, vector and tuple. Mostly because it makes the namespacing logic in the next PR significantly easier. It also seems more natural to me that `BaseCType` shouldn't represent specializations of templated types. There's a little bit of weirdness, types that are currently only used for returns, i.e. `TupleCType`. Returns aren't named, so I opted not to give it one- so we can add it in later if we discover that we need it. Test Plan: Imported from OSS Reviewed By: bhosmer Differential Revision: D27708348 Pulled By: bdhirsh fbshipit-source-id: 230b210c3e53be1bd362105fbea8451055dc59a8	2021-04-16 11:41:46 -07:00
Sam Estep	4753100a3b	Un-ignore F403 in .flake8 (#55838 ) Summary: Generally wildcard imports are bad for the reasons described here: https://www.flake8rules.com/rules/F403.html This PR replaces wildcard imports with an explicit list of imported items where possible, and adds a `# noqa: F403` comment in the other cases (mostly re-exports in `__init__.py` files). This is a prerequisite for https://github.com/pytorch/pytorch/issues/55816, because currently [`tools/codegen/dest/register_dispatch_key.py` simply fails if you sort its imports](https://github.com/pytorch/pytorch/actions/runs/742505908). Pull Request resolved: https://github.com/pytorch/pytorch/pull/55838 Test Plan: CI. You can also run `flake8` locally. Reviewed By: jbschlosser Differential Revision: D27724232 Pulled By: samestep fbshipit-source-id: 269fb09cb4168f8a51fd65bfaacc6cda7fb87c34	2021-04-13 09:24:07 -07:00
Wenlei Xie	70af5db7ca	Remove use_c10_dispatcher option (#54969 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/54969 With all use cases to hacky wrapper removed, all kernels will be dispatched with c10 full dispatcher. ghstack-source-id: 125434790 Test Plan: buck build //caffe2/aten/... Reviewed By: ezyang, walterddr Differential Revision: D27436596 fbshipit-source-id: 7a146d1f4a983b4a81f8552be4eec6c482b6bea2	2021-03-31 16:24:24 -07:00
Wenlei Xie	2ecb2c7931	Pass Scalar by reference (#53583 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/53583 `Scalar` takes 32 bytes due to `c10::complex<double>` requires aligning to 16 bytes. Passing Scalar by reference shows about 1% improvements on instruction count. All the changes in this commit are codemoded except for the following 4 files (which code-gen signatures): ``` tools/codegen/api/cpp.py tools/codegen/api/native.py tools/codegen/api/structured.py caffe2/contrib/aten/gen_op.py ``` # Codemode ## Main Step For the codemod part, here is the main command used: ``` fastmod --extensions h '([a-zA-Z_+]\([^)],?\s)Scalar (\w+)' '${1}const Scalar& ${2}' fastmod --extensions h '([a-zA-Z_+]\([^)],?\s)optional<Scalar> (\w+)' '${1}const optional<Scalar>& ${2}' fastmod --extensions cpp '([a-zA-Z_+]\([^)],?\s)Scalar (\w+)' '${1}const Scalar& ${2}' fastmod --extensions cpp '([a-zA-Z_+]\([^)],?\s)optional<Scalar> (\w+)' '${1}const optional<Scalar>& ${2}' ``` As you can tell, it codemods both `Scalar` and `optional<Scalar>`. Apply these commands iteratively until reaching a fix-point (since one method signature might contain multiple `Scalar` parameter). In retrospect, excluding `thrid_party` and `torch/csrc/jit` would be a good idea. (I revert it manually later, see https://github.com/pytorch/pytorch/pull/53479 as an reference). ## Pre-Step Prior to applying the main command, as some `Scalar` are presented as `at::Scalar` or `c10::Scalar`, so I codemod some of them in advance. Here is an incomplete list: ``` fastmod --extensions h '([a-zA-Z_+]\([^)],?\s)at::Scalar (\w+)' '${1}const at::Scalar& ${2}' fastmod --extensions cpp '([a-zA-Z_+]\([^)],?\s)at::Scalar (\w+)' '${1}const at::Scalar& ${2}' fastmod --extensions h '([a-zA-Z_+]\([^)],?\s)c10::optional<Scalar> (\w+)' '${1}const c10::optional<Scalar>& ${2}' fastmod --extensions cpp '([a-zA-Z_+]\([^)],?\s)c10::optional<Scalar> (\w+)' '${1}const c10::optional<Scalar>& ${2}' ``` ## Fixup There are a couple of post codemod fixup. For example, `const Scalar` will be codemoded into `const const Scalar&`. `at:Scalar` will be codemoded into `at::const Scalar&` (if `Pre-step` is not done comprehensively). Here is an incomplete list: ``` fastmod --extensions cpp 'const const Scalar' 'const Scalar' fastmod --extensions h 'const const c10::optional<Scalar>' 'const c10::optional<Scalar>' fastmod --extensions cpp 'const const c10::optional<Scalar>' 'const c10::optional<Scalar>' fastmod 'at::const Scalar&' 'const at::Scalar&' ``` ## Supplementary `cu` and `mm` files also need to be codemoded, for example: ``` fastmod --extensions cu 'at::const Scalar&' 'const at::Scalar&' fastmod --extensions mm '([a-zA-Z_+]$[^)],?\s)Scalar (\w+)' '${1}const Scalar& ${2}' ``` Function pointers are not codemoded. Here is an incomplete list: ``` # Cover case: using index_fill_fn = void()(TensorIterator & iter, int64_t dim, int64_t self_dim_size, int64_t self_dim_stride, Scalar source); fastmod --extensions h '(void\s\(\s\\s$$[^)],?\s)Scalar (\w+)' '${1}const Scalar& ${2}' # Cover case: using softplus_fn = void ()(TensorIterator&, Scalar, Scalar); fastmod --extensions h '(void\s\(\s\\s$$[^)],?\s)Scalar([, $])' '${1}const Scalar&${2}' fastmod --extensions cpp '(void\s$\s\\s$$[^)],?\s)Scalar([, $])' '${1}const Scalar&${2}' fastmod --extensions h '(void\s$\s\\s$$[^)],?\s)optional<Scalar>([, $])' '${1}const optional<Scalar>&${2}' ``` Some corner cases needs to be manually fixed. ghstack-source-id: 123970306 Test Plan: Imported from OSS Reviewed By: smessmer Differential Revision: D26904445 fbshipit-source-id: 8d8a002af4b5125f153a32f03c6956be7ae5671d	2021-03-15 23:17:06 -07:00
Scott Wolchok	4495b49ffa	[PyTorch] Pass TensorOptions by value (#51165 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51165 `TensorOptions` does not have a non-trivial copy, move, or destroy operation and is small enough to fit in a register, so it seems like we should pass it by value. ghstack-source-id: 120697498 Test Plan: Measured timing for empty framework overhead benchmark before & after this change: Before: ``` I0126 16:02:50.662864 2137574 bench.cpp:139] Mean 0.268645 I0126 16:02:50.662891 2137574 bench.cpp:140] Median 0.267485 I0126 16:02:50.662896 2137574 bench.cpp:141] Min 0.266485 I0126 16:02:50.662901 2137574 bench.cpp:142] stddev 0.00219359 I0126 16:02:50.662915 2137574 bench.cpp:143] stddev / mean 0.00816537 2,968.37 msec task-clock # 0.997 CPUs utilized ( +- 0.03% ) 250 context-switches # 0.084 K/sec ( +- 2.21% ) 1 cpu-migrations # 0.000 K/sec 11,403 page-faults # 0.004 M/sec ( +- 0.28% ) 5,898,481,882 cycles # 1.987 GHz ( +- 0.03% ) (50.05%) 16,169,242,938 instructions # 2.74 insn per cycle ( +- 0.03% ) (50.06%) 3,076,546,626 branches # 1036.443 M/sec ( +- 0.05% ) (50.05%) 2,531,859 branch-misses # 0.08% of all branches ( +- 0.89% ) (50.03%) ``` After: ``` I0126 16:23:20.010062 2244624 bench.cpp:139] Mean 0.266814 I0126 16:23:20.010092 2244624 bench.cpp:140] Median 0.265759 I0126 16:23:20.010099 2244624 bench.cpp:141] Min 0.260291 I0126 16:23:20.010107 2244624 bench.cpp:142] stddev 0.00548279 I0126 16:23:20.010118 2244624 bench.cpp:143] stddev / mean 0.0205491 2,983.75 msec task-clock # 0.995 CPUs utilized ( +- 0.36% ) 243 context-switches # 0.082 K/sec ( +- 1.26% ) 1 cpu-migrations # 0.000 K/sec 11,422 page-faults # 0.004 M/sec ( +- 0.18% ) 5,928,639,486 cycles # 1.987 GHz ( +- 0.36% ) (50.02%) 16,105,928,210 instructions # 2.72 insn per cycle ( +- 0.05% ) (50.02%) 3,150,273,453 branches # 1055.809 M/sec ( +- 0.03% ) (50.05%) 3,713,617 branch-misses # 0.12% of all branches ( +- 0.83% ) (50.07%) ``` It looked close to neutral, so I used `perf stat` to confirm it's about a 1% instruction count win. For deciding whether this stack is worth it, I went back and ran `perf stat` on the baseline diff before I started touching the dispatcher: ``` 2,968.37 msec task-clock # 0.997 CPUs utilized ( +- 0.03% ) 250 context-switches # 0.084 K/sec ( +- 2.21% ) 1 cpu-migrations # 0.000 K/sec 11,403 page-faults # 0.004 M/sec ( +- 0.28% ) 5,898,481,882 cycles # 1.987 GHz ( +- 0.03% ) (50.05%) 16,169,242,938 instructions # 2.74 insn per cycle ( +- 0.03% ) (50.06%) 3,076,546,626 branches # 1036.443 M/sec ( +- 0.05% ) (50.05%) 2,531,859 branch-misses # 0.08% of all branches ( +- 0.89% ) (50.03%) ``` If I've done the arithmetic correctly, we have an 0.39% instruction count win. Reviewed By: ezyang Differential Revision: D25983863 fbshipit-source-id: 87d1451a01ead25738ea6b80db270d344bc583b2	2021-02-01 12:40:08 -08:00
Sebastian Messmer	e4c41b6936	Remove codegen logic to support non-c10-full ops (#49164 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49164 This PR removes the logic paths in codegen that were responsible for handling non-c10-full ops. This only goes through our basic codegen. It does not simplify C++ code yet and it does not remove the codegen for generated unboxing wrappers yet. ghstack-source-id: 119450487 Test Plan: waitforsandcastle Reviewed By: ezyang Differential Revision: D25462977 fbshipit-source-id: 7e70d14bea96948f5056d98125f3e6ba6bd78285	2021-01-06 14:17:36 -08:00
Jiakai Liu	e71a13e8a3	[pytorch][codegen] migrate gen_variable_type to new data model (#49735 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49735 This is the final wave of autograd codegen data model migration. After this PR: - autograd codegen no longer depends on Declarations.yaml; - autograd codegen sources are fully type annotated and pass mypy-strict check; To avoid potential merge conflicts with other pending PRs, some structural changes are intentionally avoided, e.g. didn't move inner methods out, didn't change all inner methods to avoid reading outer function's variables, and etc. Confirmed byte-for-byte compatible with the old codegen: ``` Run it before and after this PR: .jenkins/pytorch/codegen-test.sh <baseline_output_dir> .jenkins/pytorch/codegen-test.sh <test_output_dir> Then run diff to compare the generated files: diff -Naur <baseline_output_dir> <test_output_dir> ``` Confirmed clean mypy-strict run: ``` mypy --config mypy-strict.ini ``` Test Plan: Imported from OSS Reviewed By: ezyang, bhosmer Differential Revision: D25678879 Pulled By: ljk53 fbshipit-source-id: ba6e2eb6b9fb744208f7f79a922d933fcc3bde9f	2021-01-05 14:12:39 -08:00
Edward Yang	8eee8460f8	codegen: Resolve overload ambiguities created by defaulted arguments (#49348 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49348 This is a redux of #45666 post refactor, based off of `d534f7d4c5` Credit goes to peterbell10 for the implementation. Fixes #43945. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Test Plan: Imported from OSS Reviewed By: smessmer Differential Revision: D25594004 Pulled By: ezyang fbshipit-source-id: c8eb876bb3348308d6dc8ba7bf091a2a3389450f	2021-01-04 11:59:16 -08:00
Sebastian Messmer	c7e9abb66a	Making ops c10-full: list of optional tensors (#49138 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49138 See for details: https://fb.quip.com/QRtJAin66lPN We need to model optional types explicitly, mostly for schema inference. So we cannot pass a `Tensor?[]` as `ArrayRef<Tensor>`, instead we need to pass it as an optional type. This PR changes it to `torch::List<c10::optional<Tensor>>`. It also makes the ops c10-full that were blocked by this. ## Backwards Compatibility - This should not break the Python API because the representation in Python is the same and python_arg_parser just transforms the python list into a `List<optional<Tensor>>` instead of into a `List<Tensor>`. - This should not break serialized models because there's some logic that allows loading a serialized `List<Tensor>` as `List<optional<Tensor>>`, see https://github.com/pytorch/pytorch/pull/49138/files#diff-9315f5dd045f47114c677174dcaa2f982721233eee1aa19068a42ff3ef775315R57 - This will break backwards compatibility for the C++ API. There is no implicit conversion from `ArrayRef<Tensor>` (which was the old argument type) to `List<optional<Tensor>>`. One common call pattern is `tensor.index({indices_tensor})`, where indices_tensor is another `Tensor`, and that will continue working because the `{}` initializer_list constructor for `List<optional<Tensor>>` can take `Tensor` elements that are implicitly converted to `optional<Tensor>`, but another common call pattern was `tensor.index(indices_tensor)`, where previously, the `Tensor` got implicitly converted to an `ArrayRef<Tensor>`, and to implicitly convert `Tensor -> optional<Tensor> -> List<optional<Tensor>>` would be two implicit conversions. C++ doesn't allow chaining. two implicit conversions. So those call sites have to be rewritten to `tensor.index({indices_tensor})`. ghstack-source-id: 119269131 Test Plan: ## Benchmarks (C++ instruction counts): ### Forward #### Script ```py from torch.utils.benchmark import Timer counts = Timer( stmt=""" auto t = {{op call to measure}}; """, setup=""" using namespace torch::indexing; auto x = torch::ones({4, 4, 4}); """, language="cpp", ).collect_callgrind(number=1_000) print(counts) ``` #### Results \| Op call \|before \|after \|delta \| \| \|------------------------------------------------------------------------\|---------\|--------\|-------\|------\| \|x[0] = 1 \|11566015 \|11566015\|0 \|0.00% \| \|x.index({0}) \|6807019 \|6801019 \|-6000 \|-0.09%\| \|x.index({0, 0}) \|13529019 \|13557019\|28000 \|0.21% \| \|x.index({0, 0, 0}) \|10677004 \|10692004\|15000 \|0.14% \| \|x.index({"..."}) \|5512015 \|5506015 \|-6000 \|-0.11%\| \|x.index({Slice(None, None, None)}) \|6866016 \|6936016 \|70000 \|1.02% \| \|x.index({None}) \|8554015 \|8548015 \|-6000 \|-0.07%\| \|x.index({false}) \|22400000 \|22744000\|344000 \|1.54% \| \|x.index({true}) \|27624088 \|27264393\|-359695\|-1.30%\| \|x.index({"...", 0, true, Slice(1, None, 2), torch::tensor({1, 2})})\|123472000\|123463306\|-8694\|-0.01%\| ### Autograd #### Script ```py from torch.utils.benchmark import Timer counts = Timer( stmt=""" auto t = {{op call to measure}}; """, setup=""" using namespace torch::indexing; auto x = torch::ones({4, 4, 4}, torch::requires_grad()); """, language="cpp", ).collect_callgrind(number=1_000) print(counts) ``` Note: the script measures the forward path of an op call with autograd enabled (i.e. calls into VariableType). It does not measure the backward path. #### Results \| Op call \|before \|after \|delta \| \| \|------------------------------------------------------------------------\|---------\|--------\|-------\|------\| \|x.index({0}) \|14839019\|14833019\|-6000\| 0.00% \| \|x.index({0, 0}) \|28342019\|28370019\|28000\| 0.00% \| \|x.index({0, 0, 0}) \|24434004\|24449004\|15000\| 0.00% \| \|x.index({"..."}) \|12773015\|12767015\|-6000\| 0.00% \| \|x.index({Slice(None, None, None)}) \|14837016\|14907016\|70000\| 0.47% \| \|x.index({None}) \|15926015\|15920015\|-6000\| 0.00% \| \|x.index({false}) \|36958000\|37477000\|519000\| 1.40% \| \|x.index({true}) \|41971408\|42426094\|454686\| 1.08% \| \|x.index({"...", 0, true, Slice(1, None, 2), torch::tensor({1, 2})}) \|168184392\|164545682\|-3638710\| -2.16% \| Reviewed By: bhosmer Differential Revision: D25454632 fbshipit-source-id: 28ab0cffbbdbdff1c40b4130ca62ee72f981b76d	2021-01-04 05:04:02 -08:00
Edward Yang	3efd5d8f01	Introduce tools.codegen.api.translate (#49122 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49122 cpparguments_exprs has induced a lot of head scratching in many recent PRs for how to structure the code in a good way. This PR eliminates the old algorithm for an entirely new algorithm inspired by logic programming. The net result is shorter, cleaner and should be more robust to future changes. This PR is a bit of a whopper. Here is the order to review it. - tools/codegen/api/types.py - Deleted CppArgument, CppArgumentPackIface (and subclasses), CppExpr, DispatcherExpr, DispatcherArgument, NativeExpr, NativeArgument, MetaArgument. All things previously called XArgument are now Binding. All things previously called XExpr are now Expr. I deleted the `__str__` implementation on Binding and fixed all call sites not to use it. On Binding, I renamed `str_no_default` and `str_default` to `defn` and `decl` for better symmetry with the corresponding signature concepts, although I'm open to naming them back to their original versions. - Obviously, things are less type safe without the class distinctions. So I introduce a new ADT called CType. CType represents the semantic C++ type of a binding: it is both the C++ type (e.g., `const Tensor&`) as well as the argument name that specifies what the binding denotes (e.g., `other`). Every binding now records its CType. The key observation here is that you don't actually care if a given expression is from the cpp or dispatcher or native API; what you care is having enough information to know what the expression means, so you can use it appropriately. CType has this information. For the most part, ArgNames are just the string names of the arguments as you see them in JIT schema, but there is one case (`possibly_redundant_memory_format`) where we encode a little extra information. Unlike the plain strings we previously used to represent C++ types, CType have a little bit of structure around optional and references, because the translation code needs to work around these concepts. - I took the opportunity to kill all of the private fields like `_arguments` and `_returns_type` (since the argument types don't make sense anymore). Everything is computed for you on the fly. If this is a perf problem in codegen we can start using `cached_property` decorator. - All of the heavy lifting in CppSignature.argument_packs has been moved to the cpp module. We'll head over there next. Similarly, all of the exprs methods are now calling translate, the new functionality which we haven't gotten to yet - tools/codegen/api/cpp.py - We refactor all of the type computation functions to return CType instead of str. Because CTypes need to know the denotation, there is a new `binds: ArgName` argument to most functions that provides the denotation, so we can slot it in. (An alternative would have been to construct CTypes without denotations and then fill them in post-facto, but I didn't do it this way. One downside is there are some places where I need a CType without denotation, so I fill these in with `__placeholder__` whenever this happens). - `argument` and `arguments` are now extremely simple. There is no more Pack business, just produce one or more Bindings. The one thing of note is that when both a `memory_format` and `options` are in scope, we label the memory format as `possibly_redundant_memory_format`. This will be used in translation - tools/codegen/api/dispatcher.py and tools/codegen/api/native.py - same deal as cpp.py. One thing is that `cpparguments_exprs` is deleted; that is in the translator - tools/codegen/api/translate.py - the translator! It uses a very simple backwards deduction engine to work out how to fill in the arguments of functions. There are comments in the file that explain how it works. - Everything else: just some small call site tweaks for places when I changed API. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Test Plan: Imported from OSS Reviewed By: ljk53 Differential Revision: D25455887 Pulled By: ezyang fbshipit-source-id: 90dc58d420d4cc49281aa8647987c69f3ed42fa6	2020-12-16 16:18:40 -08:00
Edward Yang	9b0ffb9fb3	Delete cpp.group_arguments (#49043 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49043 Previously, this function had nontrivial algorithmic content, but after #48195, this was just a swiss army knife for pasting together arguments while maintaining structure. I added some more properties for Arguments for convenient access in this way, and then inlined the implementation of group_arguments into all of its call sites, simplifying whenever contextual. This might be controversial, but I think the resulting code is easier to understand. You may notice that there is some modest code duplication between dispatcher.cpparguments_exprs and CppSignature.argument_packs. This is a known problem and I will be attempting to fix it in a follow up PR. Confirmed to be byte-for-byte compatible. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Test Plan: Imported from OSS Reviewed By: H-Huang Differential Revision: D25455885 Pulled By: ezyang fbshipit-source-id: 8fbe066e8c3cb7ee8adb5b87296ec5bd7b49e01f	2020-12-10 18:20:46 -08:00
Sebastian Messmer	3ef36dca8e	Faithful out arguments (#47712 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47712 This adds a faithful API for ops with out arguments, as described in https://docs.google.com/document/d/1h7nBibRwkRLQ8rsPhfALlwWR0QbkdQm30u4ZBwmaps8/edit# . After this, an op will generate the following overloads for the C++ API: ```cpp // Generated from the aten::abs operator (NOT from aten::abs.out) Tensor at::abs(Tensor& self) // Generated from the aten::abs.out operator Tensor& at::abs(Tensor& self, Tensor& out) Tensor& at::abs_out(Tensor& out, Tensor& self) ``` This is an important step towards making those ops c10-full (it allows VariableType, XLA and other backends to ignore reordering and just call through with the same argument order), but this does not make any of those ops c10-full yet. It enables the faithful API independent from c10-fullness. That means the API is more consistent with the same API for all ops and making an op c10-full in the future will not trigger future C++ API changes. ghstack-source-id: 118068091 Test Plan: waitforsandcastle Reviewed By: ezyang Differential Revision: D24835252 fbshipit-source-id: dedfabd07140fc8347bbf16ff219aad3b20f2870	2020-12-08 03:48:42 -08:00
Edward Yang	742903c0df	Move argument grouping into FunctionSchema (#48195 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48195 The general approach is to change Arguments, splitting `positional`, `kwarg_only` and `out`, into `pre_self_positional`, `self_arg`, `post_self_positional`, and `pre_tensor_options_kwarg_only`, `tensor_options` and `post_tensor_options_kwarg_only`. The splits are as you'd expect: we extract out the self argument and the tensor options arguments, and record the other arguments that came before and after. To do this, we move the logic in `group_arguments` to the parsing process. Some fuzz in the process: * I renamed `ThisArgument` to `SelfArgument`, since we don't actually use the terminology "this" outside of C++ (and the model is Python biased) * I kept the `group_arguments` function, which now just reads out the arguments from the structured model in the correct order. In the long term, we should get rid of this function entirely, but for now I kept it as is to reduce churn. * I decided to arbitrarily say that when self is missing, everything goes in "post-self", but when tensor options is missing, everything goes in "pre-tensor-options". This was based on where you typically find the argument in question: self is usually at front (so most args are after it), while tensor options are typically at the end (so most args go before it). Signed-off-by: Edward Z. Yang <ezyang@fb.com> Test Plan: Imported from OSS Reviewed By: zhangguanheng66 Differential Revision: D25231166 Pulled By: ezyang fbshipit-source-id: 25d77ad8319c4ce0bba4ad82e451bf536ef823ad	2020-12-02 07:57:11 -08:00
Edward Yang	ba5686f8c5	Refactor argument fields in FunctionSchema to Arguments (#48182 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48182 I'm planning to add a bunch more argument fields following https://github.com/pytorch/pytorch/pull/45890#discussion_r503646917 and it will be a lot more convenient if the arguments get to live in their own dedicated struct. Type checker will tell you if I've done it wrong. No change to output. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Test Plan: Imported from OSS Reviewed By: ljk53 Differential Revision: D25057897 Pulled By: ezyang fbshipit-source-id: dd377181dad6ab0c894d19d83408b7812775a691	2020-12-02 07:57:06 -08:00
Jiakai Liu	499d2fad98	[pytorch] factor out return_names api (#47437 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47437 Test Plan: Imported from OSS Reviewed By: bhosmer Differential Revision: D24808213 Pulled By: ljk53 fbshipit-source-id: 8ec6d58952fd677ab2d97e63b060cafda052411a	2020-11-09 12:39:37 -08:00
Ansley Ussery	6c5f634657	Fix grammar and spelling errors (#46713 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46713 Test Plan: Imported from OSS Reviewed By: Lilyjjo Differential Revision: D24477771 Pulled By: ansley fbshipit-source-id: bc39b63ab2158a5233e48b89bfaa97a4cfb1f7a1	2020-10-23 01:31:17 -07:00
chengjun	5741de883a	Define the record_stream method in native_functions.yaml (#44301 ) Summary: The record_stream method was hard coded for CUDA device. Define the record_stream in the native_functions.yaml to enable the dynamic dispatch to different end device. Fixes https://github.com/pytorch/pytorch/issues/36556 Pull Request resolved: https://github.com/pytorch/pytorch/pull/44301 Reviewed By: glaringlee Differential Revision: D23763954 Pulled By: ezyang fbshipit-source-id: e6d24f5e7892b56101fa858a6cad2abc5cdc4293	2020-10-13 09:15:22 -07:00
Edward Yang	8d5c899b19	Rename legacy_dispatcher to native. (#45974 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45974 The term "legacy dispatcher" caused a bunch of confusion between me and Sebastian when discussing what the intended semantics of legacy dispatcher argument is. Legacy dispatcher argument implies that you ought NOT to use it when you have use_c10_dispatcher: full; but that's not really what's going on; legacy dispatcher API describes the API that you write native:: functions (NativeFunctions.h) to. Renaming it here makes this more clear. I applied these seds: ``` git grep -l 'legacy_dispatcher' \| xargs sed -i 's/legacy_dispatcher/native/g' git grep -l 'legacydispatcher' \| xargs sed -i 's/legacydispatcher/native/g' git grep -l 'LegacyDispatcher' \| xargs sed -i 's/LegacyDispatcher/Native/g' ``` and also grepped for "legacy" in tools/codegen and fixed documentation. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Test Plan: Imported from OSS Reviewed By: smessmer Differential Revision: D24223101 Pulled By: ezyang fbshipit-source-id: d1913b8b823b3b95e4546881bc0e876acfa881eb	2020-10-13 08:34:43 -07:00
Edward Yang	9079aea1ac	Rewrite implementation of faithful cpp signatures (#45890 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45890 This rewrite is as per my comments at https://github.com/pytorch/pytorch/pull/44087#issuecomment-701664506 I did the rewrite by reverting #44087 and then reimplementing it on top. You may find it easier to review by diffing against master with only #44087 reverted. There are two main ideas. First, we now factor cpp argument processing into two phases operating on three representations of data: 1. `FunctionSchema` - this is the source from native_functions.yaml 2. `Union[Argument, ThisArgument, TensorOptionsArgument]` - this is the arguments after doing some basic semantic analysis to group them (for TensorOptions) or identify the this argument (if this is a method). There is only ever one of these per functions. 3. `Union[CppArgument, CppThisArgument, CppTensorOptionsArgument]` - this is the arguments after we've elaborated them to C++. There may be multiple of these per actual C++ signature. You can think of (2) as common processing, whereas (3) bakes in specific assumptions about whether or not you have a faithful or non-faithful signature. Second, we now have CppSignature and CppSignatureGroup representing the total public C++ API signature. So those dataclasses are what know how to render definitions/declarations, and you no longer have to manually type it out in the Functions/TensorMethods codegen. Here is an exhaustive accounting of the changes. tools.codegen.api.types - CppSignature and CppSignatureGroup got moved to tools.codegen.api.types - Add new CppThisArgument and CppTensorOptionsArguments (modeled off of ThisArgument and TensorOptionsArguments) so that we can retain high level semantic structure even after elaborating terms with C++ API information. Once this is done, we can refine CppArgument.argument to no longer contain a ThisArgument (ThisArgument is always translated to CppThisArgument. Note that this doesn't apply to TensorOptionsArguments, as those may be expanded or not expanded, and so you could get a single CppArgument for 'options') - Add no_default() functional mutator to easily remove default arguments from CppArgument and friends - Add an explicit_arguments() method to CppArgument and friends to extract (flat) argument list that must be explicitly written in the signature. This is everything except (Cpp)ThisArgument, and is also convenient when you don't care about the extra structure of CppTensorOptionsArguments tools.codegen.api.cpp - group_arguments is back, and it doesn't send things directly to a CppSignatureGroup; instead, it moves us from representation (1) to (2) (perhaps it should live in model). Here I changed my mind from my PR comment; I discovered it was not necessary to do classification at grouping time, and it was simpler and easier to do it later. - argument got split into argument_not_this/argument/argument_faithful. argument and argument_faithful are obvious enough what they do, and I needed argument_not_this as a more refined version of argument so that I could get the types to work out on TensorOptionsArguments tools.codegen.api.dispatcher - Here we start seeing the payoff. The old version of this code had a "scatter" mode and a "gather" mode. We don't need that anymore: cppargument_exprs is 100% type-directed via the passed in cpp arguments. I am able to write the functions without any reference to use_c10_dispatcher tools.codegen.gen - Instead of having exprs_str and types_str functions, I moved these to live directly on CppSignature, since it seemed pretty logical. - The actual codegen for TensorMethods/Functions is greatly simplified, since (1) all of the heavy lifting is now happening in CppSignature(Group) construction, and (2) I don't need to proxy one way or another, the new dispatcher translation code is able to handle both cases no problem. There is a little faffing about with ordering to reduce the old and new diff which could be removed afterwards. Here are codegen diffs. For use_c10_dispatcher: full: ``` +// aten::_cudnn_init_dropout_state(float dropout, bool train, int dropout_seed, , ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=False) -> Tensor Tensor _cudnn_init_dropout_state(double dropout, bool train, int64_t dropout_seed, const TensorOptions & options) { - return _cudnn_init_dropout_state(dropout, train, dropout_seed, optTypeMetaToScalarType(options.dtype_opt()), options.layout_opt(), options.device_opt(), options.pinned_memory_opt()); + static auto op = c10::Dispatcher::singleton() + .findSchemaOrThrow("aten::_cudnn_init_dropout_state", "") + .typed<Tensor (double, bool, int64_t, c10::optional<ScalarType>, c10::optional<Layout>, c10::optional<Device>, c10::optional<bool>)>(); + return op.call(dropout, train, dropout_seed, optTypeMetaToScalarType(options.dtype_opt()), options.layout_opt(), options.device_opt(), options.pinned_memory_opt()); } ``` Otherwise: ``` +// aten::empty_meta(int[] size, , ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None, MemoryFormat? memory_format=None) -> Tensor Tensor empty_meta(IntArrayRef size, c10::optional<ScalarType> dtype, c10::optional<Layout> layout, c10::optional<Device> device, c10::optional<bool> pin_memory, c10::optional<MemoryFormat> memory_format) { - return empty_meta(size, TensorOptions().dtype(dtype).layout(layout).device(device).pinned_memory(pin_memory), memory_format); + static auto op = c10::Dispatcher::singleton() + .findSchemaOrThrow("aten::empty_meta", "") + .typed<Tensor (IntArrayRef, const TensorOptions &, c10::optional<MemoryFormat>)>(); + return op.call(size, TensorOptions().dtype(dtype).layout(layout).device(device).pinned_memory(pin_memory), memory_format); } ``` Things that I probably did not get right: - The Union[Argument, TensorOptionsArguments, ThisArgument] and the Cpp variants are starting to get a little unwieldy. Not sure if this means I should add a supertype (or at the very least an alias); in some cases I do purposely omit one of these from the Union - Code may not necessarily live in the most logical files. There isn't very much rhyme or reason to it. - The fields on CppSignature. They're not very well constrained and it will be better if people don't use them directly. - Disambiguation. We should do this properly in #44087 and we don't need special logic for deleting defaulting for faithful signatures; there is a more general story here. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Test Plan: Imported from OSS Reviewed By: smessmer Differential Revision: D24144035 Pulled By: ezyang fbshipit-source-id: a185f8bf9df8b44ca5718a7a44dac23cefd11c0a	2020-10-13 08:31:54 -07:00
Sebastian Messmer	6ba6ecb048	Only use hacky_wrapper_for_legacy_signatures if an op needs it (#45742 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45742 Add a new flag to native_functions.yaml: `use_c10_dispatcher: hacky_wrapper_for_legacy_signatures` and the codegen only wraps kernels in the aforementioned wrapper if that flag is set. Apart from that, `use_c10_dispatcher: hacky_wrapper_for_legacy_signatures` is equivalent to `full`, i.e. it has full boxing and unboxing support. This greatly reduces the number of ops we apply the hacky_wrapper to, i.e. all ops marked as `use_c10_dispatcher: full` don't have it anymore. ghstack-source-id: 113982139 Test Plan: waitforsandcastle vs fbcode: https://www.internalfb.com/intern/fblearner/details/214511705/ vs base diff: https://www.internalfb.com/intern/fblearner/details/214693207/ Reviewed By: ezyang Differential Revision: D23328718 fbshipit-source-id: be120579477b3a05f26ca5f75025bfac37617620	2020-10-12 09:39:18 -07:00
Peter Bell	8d14b50e94	codegen: Improve array default handing (#45163 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45163 Test Plan: Imported from OSS Reviewed By: ngimel Differential Revision: D24132279 Pulled By: mruberry fbshipit-source-id: 77069e7526b35cf8d13ba448e313c90f20cc67cf	2020-10-07 22:27:28 -07:00
Peter Bell	8b39498a23	codegen: Allow string arguments to have defaults (#45665 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45665 Fixes #43944 Note that the codegen doesn't use a proper parser so, in the same way as with lists, the string `, ` cannot appear in defaults or it will be interpreted as a splitting point between arguments. Test Plan: Imported from OSS Reviewed By: albanD Differential Revision: D24141835 Pulled By: ezyang fbshipit-source-id: 578127861fd2504917f4486c44100491a2c40343	2020-10-06 21:53:56 -07:00
Sebastian Messmer	6e2eee2b9d	Add faithful C++ API (#44087 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44087 Each op taking a TensorOptions argument now has an additional overload in the C++ frontend where it takes scattered ScalarType, Layout, Device, bool instead of one TensorOptions argument. If it is a c10-full op, then the scattered version calls into the dispatcher and the gathered version is a proxy calling into the scattered version. If it is a non-c10-full op, then the gathered version calls into the dispatcher and the scattered version is a proxy calling into the gathered version. This should minimize the amount of gathering and scattering needed. This PR is also a prerequisite to remove the re-gathering of arguments that is currently happening in VariableKernel. Currently, VariableKernels gather arguments into a TensorOptions object to call into the C++ API. In a PR stacked on top of this, VariableKernel will just directly call into the scattered C++ API introduced here and avoid the gathering step. ghstack-source-id: 113355689 Test Plan: waitforsandcastle vs master: https://www.internalfb.com/intern/fblearner/details/216169815/ vs previous diff: https://www.internalfb.com/intern/fblearner/details/216169957/ Reviewed By: ezyang Differential Revision: D23492188 fbshipit-source-id: 3e84c467545ad9371e98e09075a311bd18411c5a	2020-10-02 04:08:53 -07:00
Brian Hirsh	2739a7c599	Byte-for-byte compatibility fixes in codegen (#44879 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44879 Test Plan: Imported from OSS Reviewed By: ezyang Differential Revision: D23825163 Pulled By: bdhirsh fbshipit-source-id: 4d8028274f82c401b393c4fe1b9e32de3f4909c6	2020-09-25 08:06:50 -07:00
Edward Yang	6ea89166bd	Rewrite of ATen code generator (#42629 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42629 How to approach reviewing this diff: - The new codegen itself lives in `tools/codegen`. Start with `gen.py`, then read `model.py` and them the `api/` folder. The comments at the top of the files describe what is going on. The CLI interface of the new codegen is similar to the old one, but (1) it is no longer necessary to explicitly specify cwrap inputs (and now we will error if you do so) and (2) the default settings for source and install dir are much better; to the extent that if you run the codegen from the root source directory as just `python -m tools.codegen.gen`, something reasonable will happen. - The old codegen is (nearly) entirely deleted; every Python file in `aten/src/ATen` was deleted except for `common_with_cwrap.py`, which now permanently finds its home in `tools/shared/cwrap_common.py` (previously cmake copied the file there), and `code_template.py`, which now lives in `tools/codegen/code_template.py`. We remove the copying logic for `common_with_cwrap.py`. - All of the inputs to the old codegen are deleted. - Build rules now have to be adjusted to not refer to files that no longer exist, and to abide by the (slightly modified) CLI. - LegacyTHFunctions files have been generated and checked in. We expect these to be deleted as these final functions get ported to ATen. The deletion process is straightforward; just delete the functions of the ones you are porting. There are 39 more functions left to port. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Test Plan: Imported from OSS Reviewed By: bhosmer Differential Revision: D23183978 Pulled By: ezyang fbshipit-source-id: 6073ba432ad182c7284a97147b05f0574a02f763	2020-08-31 09:00:22 -07:00

30 Commits