Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55334
The goal of this PR is to clean up some of the autograd codegen to compare C++ types using `CType` objects instead of raw strings. My last PR in the stack made that string comparison a little more fragile, since the raw C++ strings needed to be namespace-aware.
I confirmed byte-for-byte no codegen changes vs. the last PR (which added namespaces to the codegen) by running `diff -qr ../pytorch-common_test/torch/csrc/autograd/generated/ ../pytorch-callgrind_test_after2/torch/csrc/autograd/generated/` and `diff -qr ../pytorch-common_test/build/aten/src/ATen/ ../pytorch-callgrind_test_after2/build/aten/src/ATen/`
Note that a better end-state for the autograd codegen would be to do all of its type pattern matching directly off of JIT types, instead of off of CTypes (which are really just generated from JIT types, incorporating C++-specific semantics). That looks like it'll require a pretty substantial change though, so I'm not doing it in this PR.
As part of this change (and after talking with ezyang), I split off the `CType` data class into a separate `NamedCType` class, which holds a name and a `CType`. This way, `CType` only knows about actual C++ types, making it easier to compare CTypes to each other in the codegen when we only care about the type. The core change is in `types.py`, but it required a bunch of downstream changes to update all of the places where we create `CType`s to create `NamedCType`s instead.
The main change in the autograd codegen was that I updated `SavedAttribute` to store a `NamedCType`. The other autograd changes all pretty much came from that change.
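To make the split concrete, here is a minimal sketch (class and field names simplified; the real definitions in `tools/codegen/api/types.py` are richer and use singleton type objects rather than raw strings):
```py
from dataclasses import dataclass

@dataclass(frozen=True)
class BaseCType:
    # A pure C++ type, with no notion of what it binds to.
    cpp_type_str: str  # e.g. "at::Tensor"

    def cpp_type(self) -> str:
        return self.cpp_type_str

@dataclass(frozen=True)
class NamedCType:
    # A C++ type plus the name it binds to (its denotation).
    name: str        # e.g. "self"
    type: BaseCType

    def cpp_type(self) -> str:
        return self.type.cpp_type()

# Type comparisons in the codegen can now ignore the binding name:
assert NamedCType("self", BaseCType("at::Tensor")).type == \
       NamedCType("other", BaseCType("at::Tensor")).type
```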
Test Plan: Imported from OSS
Reviewed By: bhosmer
Differential Revision: D27708347
Pulled By: bdhirsh
fbshipit-source-id: 3e07c80569c7b229c638f389e76e319bff6315f9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55047
Added namespaces to all of the `CTypes` printed in the codegen. This is pretty much required if we want to use codegen externally, since we can no longer assume that we're inside of the `at::` namespace.
Important changes are in `types.py`.
How do we add the notion of namespaces to C++ types without people having to write "at::Tensor" everywhere? Before this PR, `CType` held a raw string representing the type, i.e. `BaseCType("Tensor", binds)`. This PR introduces a set of singleton base C++ types in `types.py` that know how to print their namespace. Instead, we now write `BaseCType(tensorT, binds)`, where printing `tensorT` will properly print out "at::Tensor".
This also means that you can't create arbitrary `CTypes`. If we need a new C++ type in the codegen, we need to add it to the list in `types.py`.
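As a rough sketch of the singleton idea (details simplified; the actual definitions live in `types.py`):
```py
from dataclasses import dataclass

@dataclass(frozen=True)
class BaseCppType:
    ns: str    # namespace, e.g. "at"
    name: str  # unqualified type name, e.g. "Tensor"

    def __str__(self) -> str:
        return f"{self.ns}::{self.name}" if self.ns else self.name

# The fixed list of base types the codegen is allowed to use:
tensorT = BaseCppType("at", "Tensor")
scalarT = BaseCppType("at", "Scalar")
boolT = BaseCppType("", "bool")

# Codegen now writes BaseCType(tensorT, binds) instead of BaseCType("Tensor", binds),
# and stringifying the type yields the namespaced form:
print(str(tensorT))  # at::Tensor
print(str(boolT))    # bool
```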
One blip in the design: we don't want to change `RegistrationDeclarations.yaml`, since that'll break external backends that ingest it. I added separate functions that display types without the namespace, which are used to create `RegistrationDeclarations.yaml`. With an external codegen API though, we can eventually kill it :)
I also didn't realize until this PR that `Declarations.yaml` is still directly in use by some python/autograd codegen. Rather than keep that yaml byte-for-byte compatible, I just updated the callsites in the autograd codegen to work with namespaces. In the next PR, I clean up some of the autograd codegen to stop using raw strings to match against C++ types.
Test Plan: Imported from OSS
Reviewed By: bhosmer
Differential Revision: D27708349
Pulled By: bdhirsh
fbshipit-source-id: 56a4f81fc101795bcb9ee1f722121480fb2356ad
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55046
Updating `returns` in the codegen to return a CType instead of a raw string.
This has the benefit of putting all stringifying logic through CType, which is useful in the followup PR when I add namespaces.
I also added new CTypes for other templated C++ types: array, vector and tuple. Mostly because it makes the namespacing logic in the next PR significantly easier. It also seems more natural to me that `BaseCType` shouldn't represent specializations of templated types.
There's a little bit of weirdness with types that are currently *only* used for returns, e.g. `TupleCType`. Returns aren't named, so I opted not to give it a name; we can add one later if we discover that we need it.
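For illustration, a hedged sketch of what the templated CTypes look like (simplified; element types are plain strings here, whereas the real classes wrap other CTypes):
```py
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class VectorCType:
    elem: str
    def cpp_type(self) -> str:
        return f"::std::vector<{self.elem}>"

@dataclass(frozen=True)
class TupleCType:
    elems: Tuple[str, ...]
    def cpp_type(self) -> str:
        return f"::std::tuple<{','.join(self.elems)}>"

# e.g. a (Tensor, Tensor) return:
print(TupleCType(("at::Tensor", "at::Tensor")).cpp_type())
# ::std::tuple<at::Tensor,at::Tensor>
```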
Test Plan: Imported from OSS
Reviewed By: bhosmer
Differential Revision: D27708348
Pulled By: bdhirsh
fbshipit-source-id: 230b210c3e53be1bd362105fbea8451055dc59a8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55799
I'm going to change the implementation of cdata soon so I need to
abstract over cdata access with a function. Additionally, many
users are manually casting to THPVariable to access
the member, so I can remove these unsafe casts in the client code
(the implementation, of course, is still doing an unsafe cast.)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D27712130
Pulled By: ezyang
fbshipit-source-id: 95fcc013bf3913d67f2c634068eb5b3aab144cb3
Summary:
Reland of https://github.com/pytorch/pytorch/pull/49098
See original issue for details.
The only difference from the previous PR is the fix of the _embedding_bag_dense_backward formula to stop declaring a backward formula for an argument that does not exist.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56083
Reviewed By: samestep
Differential Revision: D27778221
Pulled By: albanD
fbshipit-source-id: 159ef91ca931ef2ccfbc3d1c46c7880c32919dc9
Summary:
Generally wildcard imports are bad for the reasons described here: https://www.flake8rules.com/rules/F403.html
This PR replaces wildcard imports with an explicit list of imported items where possible, and adds a `# noqa: F403` comment in the other cases (mostly re-exports in `__init__.py` files).
This is a prerequisite for https://github.com/pytorch/pytorch/issues/55816, because currently [`tools/codegen/dest/register_dispatch_key.py` simply fails if you sort its imports](https://github.com/pytorch/pytorch/actions/runs/742505908).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55838
Test Plan: CI. You can also run `flake8` locally.
Reviewed By: jbschlosser
Differential Revision: D27724232
Pulled By: samestep
fbshipit-source-id: 269fb09cb4168f8a51fd65bfaacc6cda7fb87c34
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54969
With all uses of the hacky wrapper removed, all kernels will be
dispatched through the c10 full dispatcher.
ghstack-source-id: 125434790
Test Plan: buck build //caffe2/aten/...
Reviewed By: ezyang, walterddr
Differential Revision: D27436596
fbshipit-source-id: 7a146d1f4a983b4a81f8552be4eec6c482b6bea2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54427
A StructuredNativeFunctions is no longer guaranteed to actually
be structured (test the structured property for that), so we rename
this to a more neutral name.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: ailzhang
Differential Revision: D27235380
Pulled By: ezyang
fbshipit-source-id: 2b438d615bf06a47fc9c7bf6eb66fd8b4df31bc8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54074
I don't see why this shouldn't work.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: ljk53
Differential Revision: D27086594
Pulled By: ezyang
fbshipit-source-id: 1d5f1997017ec48c4140f43e44f0d8a3df28ac7f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53583
`Scalar` takes 32 bytes because `c10::complex<double>`
requires 16-byte alignment. Passing Scalar by reference
shows about a 1% improvement in instruction count.
All the changes in this commit are codemodded except for
the following 4 files (which generate signatures):
```
tools/codegen/api/cpp.py
tools/codegen/api/native.py
tools/codegen/api/structured.py
caffe2/contrib/aten/gen_op.py
```
# Codemod
## Main Step
For the codemod part, here is the main command used:
```
fastmod --extensions h '([a-zA-Z_+]\([^)]*,?\s*)Scalar (\w+)' '${1}const Scalar& ${2}'
fastmod --extensions h '([a-zA-Z_+]\([^)]*,?\s*)optional<Scalar> (\w+)' '${1}const optional<Scalar>& ${2}'
fastmod --extensions cpp '([a-zA-Z_+]\([^)]*,?\s*)Scalar (\w+)' '${1}const Scalar& ${2}'
fastmod --extensions cpp '([a-zA-Z_+]\([^)]*,?\s*)optional<Scalar> (\w+)' '${1}const optional<Scalar>& ${2}'
```
As you can tell, it codemods both `Scalar` and `optional<Scalar>`. Apply these commands iteratively until reaching a fixed point (since one method signature might contain multiple `Scalar` parameters).
In retrospect, excluding `third_party` and `torch/csrc/jit` would have been a good idea. (I reverted those changes manually later; see https://github.com/pytorch/pytorch/pull/53479 as a reference.)
## Pre-Step
Prior to applying the main command, since some `Scalar` parameters are written as `at::Scalar` or `c10::Scalar`, I codemodded some of them in advance. Here is an incomplete list:
```
fastmod --extensions h '([a-zA-Z_+]\([^)]*,?\s*)at::Scalar (\w+)' '${1}const at::Scalar& ${2}'
fastmod --extensions cpp '([a-zA-Z_+]\([^)]*,?\s*)at::Scalar (\w+)' '${1}const at::Scalar& ${2}'
fastmod --extensions h '([a-zA-Z_+]\([^)]*,?\s*)c10::optional<Scalar> (\w+)' '${1}const c10::optional<Scalar>& ${2}'
fastmod --extensions cpp '([a-zA-Z_+]\([^)]*,?\s*)c10::optional<Scalar> (\w+)' '${1}const c10::optional<Scalar>& ${2}'
```
## Fixup
There are a couple of post-codemod fixups. For example, `const Scalar` will be codemodded into `const const Scalar&`, and `at::Scalar` will be codemodded into `at::const Scalar&` (if the `Pre-Step` is not done comprehensively). Here is an incomplete list:
```
fastmod --extensions cpp 'const const Scalar' 'const Scalar'
fastmod --extensions h 'const const c10::optional<Scalar>' 'const c10::optional<Scalar>'
fastmod --extensions cpp 'const const c10::optional<Scalar>' 'const c10::optional<Scalar>'
fastmod 'at::const Scalar&' 'const at::Scalar&'
```
## Supplementary
`cu` and `mm` files also need to be codemodded, for example:
```
fastmod --extensions cu 'at::const Scalar&' 'const at::Scalar&'
fastmod --extensions mm '([a-zA-Z_+]\([^)]*,?\s*)Scalar (\w+)' '${1}const Scalar& ${2}'
```
Function pointers are not covered by the main commands above, so they need separate handling. Here is an incomplete list:
```
# Cover case: using index_fill_fn = void(*)(TensorIterator & iter, int64_t dim, int64_t self_dim_size, int64_t self_dim_stride, Scalar source);
fastmod --extensions h '(void\s*\(\s*\*\s*\)\([^)]*,?\s*)Scalar (\w+)' '${1}const Scalar& ${2}'
# Cover case: using softplus_fn = void (*)(TensorIterator&, Scalar, Scalar);
fastmod --extensions h '(void\s*\(\s*\*\s*\)\([^)]*,?\s*)Scalar([, \)])' '${1}const Scalar&${2}'
fastmod --extensions cpp '(void\s*\(\s*\*\s*\)\([^)]*,?\s*)Scalar([, \)])' '${1}const Scalar&${2}'
fastmod --extensions h '(void\s*\(\s*\*\s*\)\([^)]*,?\s*)optional<Scalar>([, \)])' '${1}const optional<Scalar>&${2}'
```
Some corner cases need to be fixed manually.
ghstack-source-id: 123970306
Test Plan: Imported from OSS
Reviewed By: smessmer
Differential Revision: D26904445
fbshipit-source-id: 8d8a002af4b5125f153a32f03c6956be7ae5671d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53037
As remarked in #52277 it is easy to give an (inefficient, due to extra
redispatches) DefaultBackend implementation of foo and foo_ in terms of
foo_out. This patch enables code generation for DefaultBackend in these
cases by default for all structured kernels. You can see the payoff
in the MSNPU extension: it only has to register a kernel for add.out, and it
gets the add and add_ kernels automatically.
The actual code changes are very modest:
- When DefaultBackend, call the dispatched (not direct native::)
functions to allocate tensors, change device guard, etc
- Don't call impl() for DefaultBackend (as it doesn't exist); instead,
directly generate a call to at::foo_out to do the actual work.
- Do NOT generate DefaultBackend implementation for foo_out. Actually,
there is a case to be made for this being a good idea with more infra;
see comments inside.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: bdhirsh
Differential Revision: D26731225
Pulled By: ezyang
fbshipit-source-id: 939da7cb69f694722ec293e5e42e74a755dd0985
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53032
Previously, you could get this error message:
```
Failed to synthesize the expression "Tensor & out".
When I failed, the following bindings were available in the context:
const Tensor & self;
const Tensor & other;
Scalar alpha;
const Tensor & op.outputs_[0];
```
There's a problem with this error message: it doesn't seem like there
is any 'out' argument available, but actually there is: the last
binding in the context is it. We printed the *expression*, not
the *ctype name*.
After this patch, the context now prints as:
```
const Tensor & self; // self
const Tensor & other; // other
Scalar alpha; // alpha
const Tensor & out; // op.outputs_[0]
```
Now it becomes clear that it's a const mismatch. Maybe we could also
beef up the error message so it points out near misses, but I'll leave
that to future work.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: ljk53
Differential Revision: D26729768
Pulled By: ezyang
fbshipit-source-id: adb363551a7145eac788943c20969c86b1f8a81b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51957
This is a simplified version of #51554.
Compared to #51554, this version only supports statically dispatching to
a specific backend. The benefit is that it skips the dispatch key
computation logic and thus has less framework overhead. The downside is that
if input tensors do not match the specified backend it will throw error
instead of falling back to regular dispatch.
Sample code:
```
Tensor empty(IntArrayRef size, TensorOptions options, c10::optional<MemoryFormat> memory_format) {
return at::cpu::empty(size, options, memory_format);
}
// aten::conj(Tensor(a) self) -> Tensor(a)
Tensor conj(const Tensor & self) {
return at::math::conj(self);
}
// aten::conj.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!)
Tensor & conj_out(Tensor & out, const Tensor & self) {
return at::cpu::conj_out(out, self);
}
// aten::conj.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!)
Tensor & conj_outf(const Tensor & self, Tensor & out) {
return at::cpu::conj_out(out, self);
}
// aten::_conj(Tensor self) -> Tensor
Tensor _conj(const Tensor & self) {
return at::defaultbackend::_conj(self);
}
```
For ops without the specific backend dispatch, it will throw error:
```
// aten::_use_cudnn_ctc_loss(Tensor log_probs, Tensor targets, int[] input_lengths, int[] target_lengths, int blank) -> bool
bool _use_cudnn_ctc_loss(const Tensor & log_probs, const Tensor & targets, IntArrayRef input_lengths, IntArrayRef target_lengths, int64_t blank) {
TORCH_CHECK(false, "Static dispatch does not support _use_cudnn_ctc_loss for CPU.");
}
```
Differential Revision: D26337857
Test Plan: Imported from OSS
Reviewed By: bhosmer
Pulled By: ljk53
fbshipit-source-id: a8e95799115c349de3c09f04a26b01d21a679364
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51590
This PR backports a subset of Jiakai's changes from
https://github.com/pytorch/pytorch/pull/51554 that add support
for at::cpu in non-structured kernels.
The unusual bits:
- Need to add a new forward inference rule for doing conversions
of const optional<Tensor>& to const Tensor&
- Need to give the wrapper functions a prefix so that the call to
wrapper is not ambiguous
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: ljk53
Differential Revision: D26209871
Pulled By: ezyang
fbshipit-source-id: 8162686039675ab92a2af7a14f6b18941f8944df
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51490
Mutable Tensor ref is a source of endless confusion for kernel writers;
if we're going to make everyone rewrite their kernels, might as well
also get rid of mutable Tensor& while we're at it.
This is a refactor-then-small-update double whammy. The refactor
is to separate tools.codegen.api.structured from api.native for
describing the type signatures of structured kernels (previously,
I was naughtily reusing native for this purpose--now I need it to
behave differently as Tensor). This started off as a copy paste, but
since there are not that many structured kernels so far I could delete
all of the legacy logic from native that didn't make sense (without
having to go out and fix all the use sites all at once).
One more small addition was teaching translate to convert Tensor& to const Tensor&.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: bhosmer
Differential Revision: D26182413
Pulled By: ezyang
fbshipit-source-id: ed636866add3581179669cf9283f9835fcaddc06
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51477
Passing in a full binding is still OK, but if you have less
(e.g., an Expr/CType), that will do too. I'll need this for
some codegen patches I'm doing later.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: bhosmer
Differential Revision: D26179560
Pulled By: ezyang
fbshipit-source-id: 5730dfb2c91bf5325496e57b0c91eb6823c9194d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49505
I have a problem which is that static runtime needs a way to bypass
dispatch and call into kernels directly. Previously, it used
native:: bindings to do this; but these bindings no longer exist
for structured kernels! Enter at::cpu: a namespace of exactly
at:: compatible functions that assume all of their arguments are
CPU and non-autograd! The header looks like this:
```
namespace at {
namespace cpu {
CAFFE2_API Tensor & add_out(Tensor & out, const Tensor & self, const Tensor & other, Scalar alpha=1);
CAFFE2_API Tensor add(const Tensor & self, const Tensor & other, Scalar alpha=1);
CAFFE2_API Tensor & add_(Tensor & self, const Tensor & other, Scalar alpha=1);
CAFFE2_API Tensor & upsample_nearest1d_out(Tensor & out, const Tensor & self, IntArrayRef output_size, c10::optional<double> scales=c10::nullopt);
CAFFE2_API Tensor upsample_nearest1d(const Tensor & self, IntArrayRef output_size, c10::optional<double> scales=c10::nullopt);
CAFFE2_API Tensor & upsample_nearest1d_backward_out(Tensor & grad_input, const Tensor & grad_output, IntArrayRef output_size, IntArrayRef input_size, c10::optional<double> scales=c10::nullopt);
CAFFE2_API Tensor upsample_nearest1d_backward(const Tensor & grad_output, IntArrayRef output_size, IntArrayRef input_size, c10::optional<double> scales=c10::nullopt);
}}
```
This slows down static runtime because these are not the "allow
resize of nonzero tensor" variant binding (unlike the ones I had manually
written). We can restore this: it's a matter of adding codegen smarts to
do this, but I haven't done it just yet since it's marginally more
complicated.
In principle, non-structured kernels could get this treatment too.
But, like an evil mastermind, I'm withholding it from this patch, as an extra
carrot to get people to migrate to structured muahahahaha.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: smessmer
Differential Revision: D25616105
Pulled By: ezyang
fbshipit-source-id: 84955ae09d0b373ca1ed05e0e4e0074a18d1a0b5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50742
Fixed the other usage of `BaseCType('const ...&')` from #49138.
Checked byte-for-byte compatibility of the codegen output.
Test Plan: Imported from OSS
Reviewed By: ezyang
Differential Revision: D25955565
Pulled By: ljk53
fbshipit-source-id: 83ebd6b039892b805444867ed97a6e2fa6e72225
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50767
The native signature for optional tensor arguments wrongly produced "Tensor" instead of "optional<Tensor>". We didn't notice this because all internal ops currently use hacky_wrapper, and for hacky_wrapper, "Tensor" is correct.
This PR fixes that and ports one op (batch_norm) to not use hacky_wrapper anymore as a proof of fix.
ghstack-source-id: 120017543
Test Plan: waitforsandcastle
Reviewed By: bhosmer
Differential Revision: D25960941
fbshipit-source-id: ca6fe133109b5d85cff52390792cf552f12d9590
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49164
This PR removes the logic paths in codegen that were responsible for handling non-c10-full ops.
This only goes through our basic codegen. It does not simplify C++ code yet and it does not remove the codegen for generated unboxing wrappers yet.
ghstack-source-id: 119450487
Test Plan: waitforsandcastle
Reviewed By: ezyang
Differential Revision: D25462977
fbshipit-source-id: 7e70d14bea96948f5056d98125f3e6ba6bd78285
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49735
This is the final wave of autograd codegen data model migration.
After this PR:
- autograd codegen no longer depends on Declarations.yaml;
- autograd codegen sources are fully type annotated and pass mypy-strict check;
To avoid potential merge conflicts with other pending PRs, some structural
changes are intentionally avoided: e.g., inner methods weren't moved out, inner
methods weren't changed to avoid reading the outer function's variables, etc.
Confirmed byte-for-byte compatible with the old codegen:
```
Run it before and after this PR:
.jenkins/pytorch/codegen-test.sh <baseline_output_dir>
.jenkins/pytorch/codegen-test.sh <test_output_dir>
Then run diff to compare the generated files:
diff -Naur <baseline_output_dir> <test_output_dir>
```
Confirmed clean mypy-strict run:
```
mypy --config mypy-strict.ini
```
Test Plan: Imported from OSS
Reviewed By: ezyang, bhosmer
Differential Revision: D25678879
Pulled By: ljk53
fbshipit-source-id: ba6e2eb6b9fb744208f7f79a922d933fcc3bde9f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49489
Previously, it was done at a use site, but that meant other use
sites didn't get the right logic. Pushing it in makes sure everyone
gets it.
I also fixed one case of confusion where defn() was used to define a decl().
If you want to define a declaration with no defaults, say no_default().decl(),
which is more direct and will give us code reviewers a clue as to whether you
should have pushed this logic in.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: smessmer
Differential Revision: D25595407
Pulled By: ezyang
fbshipit-source-id: 89c664f0ed4d95699794a0d3123d54d0f7e4cba4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49348
This is a redux of #45666 post refactor, based off of
d534f7d4c5
Credit goes to peterbell10 for the implementation.
Fixes #43945.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: smessmer
Differential Revision: D25594004
Pulled By: ezyang
fbshipit-source-id: c8eb876bb3348308d6dc8ba7bf091a2a3389450f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49245
This will make it easier to implement the POC in
d534f7d4c5
see also https://github.com/pytorch/pytorch/pull/45666
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: smessmer
Differential Revision: D25594005
Pulled By: ezyang
fbshipit-source-id: e458d3dc3a765ec77425761b9b17f23769cecf9e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49138
See for details: https://fb.quip.com/QRtJAin66lPN
We need to model optional types explicitly, mostly for schema inference. So we cannot pass a `Tensor?[]` as `ArrayRef<Tensor>`, instead we need to pass it as an optional type. This PR changes it to `torch::List<c10::optional<Tensor>>`. It also makes the ops c10-full that were blocked by this.
## Backwards Compatibility
- This should not break the Python API because the representation in Python is the same and python_arg_parser just transforms the python list into a `List<optional<Tensor>>` instead of into a `List<Tensor>`.
- This should not break serialized models because there's some logic that allows loading a serialized `List<Tensor>` as `List<optional<Tensor>>`, see https://github.com/pytorch/pytorch/pull/49138/files#diff-9315f5dd045f47114c677174dcaa2f982721233eee1aa19068a42ff3ef775315R57
- This will break backwards compatibility for the C++ API. There is no implicit conversion from `ArrayRef<Tensor>` (which was the old argument type) to `List<optional<Tensor>>`. One common call pattern is `tensor.index({indices_tensor})`, where indices_tensor is another `Tensor`, and that will continue working because the `{}` initializer_list constructor for `List<optional<Tensor>>` can take `Tensor` elements that are implicitly converted to `optional<Tensor>`, but another common call pattern was `tensor.index(indices_tensor)`, where previously, the `Tensor` got implicitly converted to an `ArrayRef<Tensor>`, and to implicitly convert `Tensor -> optional<Tensor> -> List<optional<Tensor>>` would be two implicit conversions, and C++ doesn't allow chaining two implicit conversions. So those call sites have to be rewritten to `tensor.index({indices_tensor})`.
ghstack-source-id: 119269131
Test Plan:
## Benchmarks (C++ instruction counts):
### Forward
#### Script
```py
from torch.utils.benchmark import Timer
counts = Timer(
stmt="""
auto t = {{op call to measure}};
""",
setup="""
using namespace torch::indexing;
auto x = torch::ones({4, 4, 4});
""",
language="cpp",
).collect_callgrind(number=1_000)
print(counts)
```
#### Results
| Op call |before |after |delta | |
|------------------------------------------------------------------------|---------|--------|-------|------|
|x[0] = 1 |11566015 |11566015|0 |0.00% |
|x.index({0}) |6807019 |6801019 |-6000 |-0.09%|
|x.index({0, 0}) |13529019 |13557019|28000 |0.21% |
|x.index({0, 0, 0}) |10677004 |10692004|15000 |0.14% |
|x.index({"..."}) |5512015 |5506015 |-6000 |-0.11%|
|x.index({Slice(None, None, None)}) |6866016 |6936016 |70000 |1.02% |
|x.index({None}) |8554015 |8548015 |-6000 |-0.07%|
|x.index({false}) |22400000 |22744000|344000 |1.54% |
|x.index({true}) |27624088 |27264393|-359695|-1.30%|
|x.index({"...", 0, true, Slice(1, None, 2), torch::tensor({1, 2})})|123472000|123463306|-8694|-0.01%|
### Autograd
#### Script
```py
from torch.utils.benchmark import Timer
counts = Timer(
stmt="""
auto t = {{op call to measure}};
""",
setup="""
using namespace torch::indexing;
auto x = torch::ones({4, 4, 4}, torch::requires_grad());
""",
language="cpp",
).collect_callgrind(number=1_000)
print(counts)
```
Note: the script measures the **forward** path of an op call with autograd enabled (i.e. calls into VariableType). It does not measure the backward path.
#### Results
| Op call |before |after |delta | |
|------------------------------------------------------------------------|---------|--------|-------|------|
|x.index({0}) |14839019|14833019|-6000| 0.00% |
|x.index({0, 0}) |28342019|28370019|28000| 0.00% |
|x.index({0, 0, 0}) |24434004|24449004|15000| 0.00% |
|x.index({"..."}) |12773015|12767015|-6000| 0.00% |
|x.index({Slice(None, None, None)}) |14837016|14907016|70000| 0.47% |
|x.index({None}) |15926015|15920015|-6000| 0.00% |
|x.index({false}) |36958000|37477000|519000| 1.40% |
|x.index({true}) |41971408|42426094|454686| 1.08% |
|x.index({"...", 0, true, Slice(1, None, 2), torch::tensor({1, 2})}) |168184392|164545682|-3638710| -2.16% |
Reviewed By: bhosmer
Differential Revision: D25454632
fbshipit-source-id: 28ab0cffbbdbdff1c40b4130ca62ee72f981b76d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49122
cpparguments_exprs has induced a lot of head scratching in many recent PRs about how to structure the code well. This PR replaces the old algorithm with an entirely new algorithm inspired by logic programming. The net result is shorter, cleaner and should be more robust to future changes.
This PR is a bit of a whopper. Here is the order to review it.
- tools/codegen/api/types.py
- Deleted CppArgument, CppArgumentPackIface (and subclasses), CppExpr, DispatcherExpr, DispatcherArgument, NativeExpr, NativeArgument, MetaArgument. All things previously called XArgument are now Binding. All things previously called XExpr are now Expr. I deleted the `__str__` implementation on Binding and fixed all call sites not to use it. On Binding, I renamed `str_no_default` and `str_default` to `defn` and `decl` for better symmetry with the corresponding signature concepts, although I'm open to naming them back to their original versions.
- Obviously, things are less type safe without the class distinctions. So I introduce a new ADT called CType. CType represents the *semantic C++ type* of a binding: it is both the C++ type (e.g., `const Tensor&`) as well as the argument name that specifies what the binding denotes (e.g., `other`). Every binding now records its CType. The key observation here is that you don't actually care if a given expression is from the cpp or dispatcher or native API; what you care about is having enough information to know what the expression means, so you can use it appropriately. CType has this information. For the most part, ArgNames are just the string names of the arguments as you see them in JIT schema, but there is one case (`possibly_redundant_memory_format`) where we encode a little extra information. Unlike the plain strings we previously used to represent C++ types, CTypes have a little bit of structure around optionals and references, because the translation code needs to work around these concepts.
- I took the opportunity to kill all of the private fields like `_arguments` and `_returns_type` (since the argument types don't make sense anymore). Everything is computed for you on the fly. If this is a perf problem in codegen we can start using `cached_property` decorator.
- All of the heavy lifting in CppSignature.argument_packs has been moved to the cpp module. We'll head over there next. Similarly, all of the exprs methods are now calling translate, the new functionality which we haven't gotten to yet
- tools/codegen/api/cpp.py
- We refactor all of the type computation functions to return CType instead of str. Because CTypes need to know the denotation, there is a new `binds: ArgName` argument to most functions that provides the denotation, so we can slot it in. (An alternative would have been to construct CTypes without denotations and then fill them in post-facto, but I didn't do it this way. One downside is there are some places where I need a CType without denotation, so I fill these in with `__placeholder__` whenever this happens).
- `argument` and `arguments` are now extremely simple. There is no more Pack business, just produce one or more Bindings. The one thing of note is that when both a `memory_format` and `options` are in scope, we label the memory format as `possibly_redundant_memory_format`. This will be used in translation
- tools/codegen/api/dispatcher.py and tools/codegen/api/native.py - same deal as cpp.py. One thing is that `cpparguments_exprs` is deleted; that is in the translator
- tools/codegen/api/translate.py - the translator! It uses a very simple backwards deduction engine to work out how to fill in the arguments of functions. There are comments in the file that explain how it works, and a toy illustration of the idea follows this list.
- Everything else: just some small call site tweaks for places when I changed API.
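The backwards deduction mentioned for translate.py can be illustrated with a toy solver (this is not the real API, just the flavor of the algorithm: given a goal type, either find a binding of that type in the context or apply a rewrite rule and recurse):
```py
from typing import Dict, Optional

def solve(goal: str, ctx: Dict[str, str]) -> Optional[str]:
    """Return a C++ expression of type `goal`, or None if no derivation exists."""
    if goal in ctx:
        # Direct hit: some binding in the context already has exactly this type.
        return ctx[goal]
    if goal == "const at::Tensor &":
        # Rule: a mutable reference also satisfies a const reference.
        return solve("at::Tensor &", ctx)
    return None  # the real translator raises a descriptive error here

# Context: types -> expressions available at the call site.
ctx = {"const at::Tensor &": "self", "at::Tensor &": "op.outputs_[0]"}
print(solve("const at::Tensor &", ctx))  # self
print(solve("at::Tensor &", ctx))        # op.outputs_[0]
print(solve("at::Scalar", ctx))          # None
```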
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: ljk53
Differential Revision: D25455887
Pulled By: ezyang
fbshipit-source-id: 90dc58d420d4cc49281aa8647987c69f3ed42fa6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49055
Removed the majority of the TODO hacks that I added to the original pyi PR to maintain byte-for-byte compatibility.
I left a few of the divergences between pyi deprecated vs. native signatures, since (a) they're smaller and (b) it might make more sense to kill the deprecated functions at some point entirely.
Test Plan: Imported from OSS
Reviewed By: ljk53
Differential Revision: D25410847
Pulled By: bdhirsh
fbshipit-source-id: cf07cdda92f7492cd83d363cbb810e3810f6b8c8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49054
These are some followups from the first pyi codegen PR. Still maintaining byte-for-byte compatibility in this one.
- Separated `argument_str()` (which took a pyi flag) into two functions, `argument_str()` and `argument_str_pyi()`
- Added a notes section for pyi at the top of `python.py`
- Added a `Python Interface` section that I moved the free-standing pyi functions to
Test Plan: Imported from OSS
Reviewed By: ljk53
Differential Revision: D25410848
Pulled By: bdhirsh
fbshipit-source-id: db83a80af900c32b5e32d67ce27767f6e7c2adfb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49092
Functions which specify manual_cpp_binding don't automatically
get C++ bindings generated for them in TensorBody.h or
Functions.h. This lets end users manually define the bindings
themselves, which may be helpful if there is a way to
short circuit the dispatcher entirely. contiguous() is switched
to use this mechanism.
Although manual_cpp_binding suggests that we don't generate the
binding at all, it is often the case that there is some "fast
path", but when this path is not satisfied, we should go back
to the slow dispatch. So we still generate a fallback method/function
which the user-defined binding can call into in case we
have to take the slow path.
The correctness conditions for bindings manually written in this
way are subtle. Here are the ones I can think of off the top
of my head:
- Whatever condition is tested in the C++ body, must ALSO be
tested again in the native:: implementation on the other
side of the dispatcher. This is because you are NOT GUARANTEED
to hit the native:: implementation through the C++ binding,
you may go straight to the implementation via a boxed call.
- If a binding is written in this way, it is only safe to
skip dispatch if you would have returned the same tensor as
before. In any situation you would return a fresh tensor,
you MUST go to the slow path, because you need to actually
get to the autograd kernel.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: bhosmer
Differential Revision: D25428440
Pulled By: swolchok
fbshipit-source-id: 6e71767cb8d1086d56cd827c1d2d56cac8f6f5fe
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49043
Previously, this function had nontrivial algorithmic content,
but after #48195, this was just a swiss army knife for pasting
together arguments while maintaining structure. I added some
more properties for Arguments for convenient access in this way,
and then inlined the implementation of group_arguments into all of its call
sites, simplifying whenever contextual. This might be controversial, but I
think the resulting code is easier to understand.
You may notice that there is some modest code duplication between
dispatcher.cpparguments_exprs and CppSignature.argument_packs.
This is a known problem and I will be attempting to fix it in
a follow up PR.
Confirmed to be byte-for-byte compatible.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: H-Huang
Differential Revision: D25455885
Pulled By: ezyang
fbshipit-source-id: 8fbe066e8c3cb7ee8adb5b87296ec5bd7b49e01f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49042
I want the names positional and kwarg_only to give the unflat
representation (e.g., preserving TensorOptionsArguments in the
returned Union). So I regret my original naming choice when
I moved grouping to model. This renames them to have a flat_ prefix
and also adds a flat_non_out argument for cases where you just
want to look at non-out arguments.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: H-Huang
Differential Revision: D25455884
Pulled By: ezyang
fbshipit-source-id: f923f8881267a3e3e8e9521519412f7cc25034fc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48718
This PR rewrites structured kernels to do the class-based mechanism (instead of defining a meta and impl function, they are methods on a class), and adds enough customizability on the class to support TensorIterator. To show it works, add is made a structured kernel. Don't forget to check https://github.com/pytorch/rfcs/pull/9 for a mostly up-to-date high level description of what's going on here.
High level structure of this PR (the order you should review files):
* TensorMeta.h - TensorMeta is deleted entirely; instead, meta functions will call `set_output` to allocate/resize their outputs. MetaBase gets a new `maybe_get_output` virtual method for retrieving the (possibly non-existent) output tensor in a meta function; this makes it easier to do special promotion behavior, e.g., as in TensorIterator.
* TensorIterator.cpp - Two major changes: first, we add TensorIteratorBase::set_output, which is a "light" version of TensorIterator::set_output; it sets up the internal data structures in TensorIterator, but it doesn't do allocation (that is assumed to have been handled by the structured kernels framework). The control flow here is someone will call the subclassed set_output, which will allocate output, and then we will call the parent class (TensorIteratorBase) to populate the fields in TensorIterator so that other TensorIterator phases can keep track of it. Second, we add some tests for meta tensors, and skip parts of TensorIterator which are not necessary when data is not available.
* tools/codegen/model.py - One new field in native_functions.yaml, structured_inherits. This lets you override the parent class of a structured meta class; normally it's MetaBase, but you can make it point at TensorIteratorBase instead for TensorIterator based kernels
* tools/codegen/gen.py - Now generates all of the classes we promised. It's kind of hairy because this is the first draft. Check the RFC for what the output looks like, and then follow the logic here. There are some complications: I need to continue to generate old style wrapper functions even if an operator is structured, because SparseCPU/SparseCUDA/etc won't actually use structured kernels to start. The most complicated code generation is the instantiation of `set_output`, which by and large replicates the logic in `TensorIterator::set_output`. This will continue to live in codegen for the foreseeable future as we would like to specialize this logic per device.
* aten/src/ATen/native/UpSampleNearest1d.cpp - The previous structured kernel is ported to the new format. The changes are very modest.
* aten/src/ATen/native/BinaryOps.cpp - Add is ported to structured.
TODO:
* Work out an appropriate entry point for static runtime, since native:: function stubs no longer are generated
* Refactor TensorIteratorConfig construction into helper functions, like before
* Make Tensor-Scalar addition structured to fix perf regression
* Fix `verify_api_visibility.cpp`
* Refactor tools/codegen/gen.py for clarity
* Figure out why header changes resulted in undefined reference to `at::Tensor::operator[](long) const`
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: bhosmer
Differential Revision: D25278031
Pulled By: ezyang
fbshipit-source-id: 57c43a6e5df21929b68964d485995fbbae4d1f7b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47712
This adds a faithful API for ops with out arguments, as described in https://docs.google.com/document/d/1h7nBibRwkRLQ8rsPhfALlwWR0QbkdQm30u4ZBwmaps8/edit# .
After this, an op will generate the following overloads for the C++ API:
```cpp
// Generated from the aten::abs operator (NOT from aten::abs.out)
Tensor at::abs(Tensor& self)
// Generated from the aten::abs.out operator
Tensor& at::abs(Tensor& self, Tensor& out)
Tensor& at::abs_out(Tensor& out, Tensor& self)
```
This is an important step towards making those ops c10-full (it allows VariableType, XLA and other backends to ignore reordering and just call through with the same argument order), but this does not make any of those ops c10-full yet.
It enables the faithful API independently of c10-fullness. That means the API is more consistent (the same API shape for all ops), and making an op c10-full in the future will not trigger further C++ API changes.
ghstack-source-id: 118068091
Test Plan: waitforsandcastle
Reviewed By: ezyang
Differential Revision: D24835252
fbshipit-source-id: dedfabd07140fc8347bbf16ff219aad3b20f2870
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48754
The goal of this PR is to kill Declarations.yaml in the pyi codegen, in favor of native_functions + the existing python object model.
**High-level design**
Since the python signatures used by the `python_arg_parser` are “supposed” to resemble the corresponding pyi type hint signatures, I re-used the existing python object model that Jiakai defined in `tools/codegen/api/python.py`. This means that the pyi codegen now reads `native_functions.yaml`, parses it into a bunch of `PythonSignatureGroup` objects, and emits corresponding method + function variants of type-hint signatures for each one, respectively into `__init__.pyi` and `_VariableFunctions.pyi`.
What makes this uglier is that pyi and the python arg parser have a number of differences in how they’re emitted. I expressed that through a `pyi` flag on the `PythonSignature` dataclass, that tells it whether or not to print itself as a pyi vs. arg_parser signature.
One thing worth noting is how pyi generates signatures differently for native / deprecated op signatures.
For native ops:
- The pyi codegen fuses functional and out variants of each op into a single signature with an optional `out` argument. Ops without an `out` variant just get an ordinary functional signature.
- Some ops that fit certain criteria also get a second “varargs” signature - basically ops with a single positional argument of type List[int].
For deprecated signatures:
- Functional and out variants are not fused - they each get their own signature entry
- There are no varargs signatures
This is currently implemented through the `signature_str()` and `signature_str_vararg()` methods on the `PythonSignature`/`PythonSignatureDeprecated` classes. `signature_str()` knows how to print itself with/without out arguments, differently for native/deprecated ops. `signature_str_vararg()` optionally returns a vararg variant of the signature if one exists.
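As a toy illustration of the `pyi` flag idea (heavily simplified; the real `PythonSignature` lives in `tools/codegen/api/python.py` and builds its argument strings from structured objects):
```py
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class PythonSignature:
    name: str
    args: Tuple[str, ...]  # pre-rendered "name: type=default" strings
    returns: str
    pyi: bool              # print as a .pyi stub or as a python_arg_parser schema

    def signature_str(self) -> str:
        arg_list = ", ".join(self.args)
        if self.pyi:
            return f"def {self.name}({arg_list}) -> {self.returns}: ..."
        # python_arg_parser format: no "def", no return annotation
        return f"{self.name}({arg_list})"

sig = PythonSignature("svd", ("input: Tensor", "some: _bool=True"), "namedtuple_U_S_V", pyi=True)
print(sig.signature_str())
# def svd(input: Tensor, some: _bool=True) -> namedtuple_U_S_V: ...
```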
**Calling out the gap between python_arg_parser vs. pyi**
The two formats are notably different, so I don't think we can expect to unify them completely. That said, I encountered a number of differences in the pyi codegen that looked wrong; I tried to call them out in the PR, to be removed later. Just as an example, looking at the `svd` signature in the python_arg_parser vs. the pyi type hint:
python_arg_parser
```
static PythonArgParser parser({
  "svd(Tensor input, bool some=True, bool compute_uv=True, *, TensorList[3] out=None)",
}, /*traceable=*/true);
```
Pyi
```
def svd(input: Tensor, some: _bool=True, compute_uv: _bool=True, *, out: Optional[Tensor]=None) -> namedtuple_U_S_V: ...
```
The two have obvious syntactic differences that we probably don’t plan on changing: the python_arg_parser doesn’t include `def` or return types, and it includes the type hint before the variable name. But the type of `out` in pyi is probably wrong, since `svd` has multiple output params. I tried to clearly call out any instances of the pyi codegen diverging in a way that looks buggy, so we can clean it up in a later PR (see the comments for details).
Another particularly ugly "bug" that I kept in to maintain byte-for-byte compatibility is the fact that the pyi codegen groups operator overloads together. It turns out that the only reason it does this (as far as I can tell) is because it tacks on an out argument to signatures that don't have one, if ANY overloads of that op have an out variant.
E.g. consider the pyi type hints generated for `nanmedian` in `_VF.pyi`:
```
@overload
def nanmedian(input: Tensor, *, out: Optional[Tensor]=None) -> Tensor: ...
@overload
def nanmedian(input: Tensor, dim: _int, keepdim: _bool=False, *, out: Optional[Tensor]=None) -> namedtuple_values_indices: ...
@overload
def nanmedian(input: Tensor, dim: Union[str, ellipsis, None], keepdim: _bool=False, *, out: Optional[Tensor]=None) -> namedtuple_values_indices: ...
```
And the corresponding native_functions.yaml entries:
```
- func: nanmedian(Tensor self) -> Tensor
- func: nanmedian.dim(Tensor self, int dim, bool keepdim=False) -> (Tensor values, Tensor indices)
- func: nanmedian.dim_values(Tensor self, int dim, bool keepdim=False, *, Tensor(a!) values, Tensor(b!) indices) -> (Tensor(a!) values, Tensor(b!) indices)
- func: nanmedian.names_dim(Tensor self, Dimname dim, bool keepdim=False) -> (Tensor values, Tensor indices)
- func: nanmedian.names_dim_values(Tensor self, Dimname dim, bool keepdim=False, *, Tensor(a!) values, Tensor(b!) indices) -> (Tensor(a!) values, Tensor(b!) indices)
```
Signature 2 corresponds to entries 2 and 3 in native_functions, and Signature 3 corresponds to entries 4 and 5. But signature 1 has an optional out argument, even though entry 1 in native_functions.yaml has no out variant.
I’d like to delete that logic in a later PR- that will also have the added benefit no longer requiring to group overloads together in the pyi codegen. We can just operate independently on each PythonSignatureGroup.
**More detailed accounting of the changes**
Per file:
gen_python_functions.py
- `load_signatures()` can now skip deprecated signatures. Needed because pyi only includes deprecated functions, and skips their method variants (maybe we should add them in...?)
- Moved `namedtuple_fieldnames` into python.py
- `group_overloads()` can now opt to not sort the overloads (needed for byte-for-byte compat; pyi doesn't sort for some reason)
python.py:
- Gave `PythonSignature` and `PythonSignatureDeprecated` a `pyi` flag that tells it whether or not to print itself in pyi vs. python_arg_parser format
- Added a `PythonReturns` dataclass, which is now a member of PythonSignature. It is only used by pyi. I found this useful because python returns need to know how to deal with named tuple returns properly. I also moved `namedtuple_fieldnames` into this file from gen_python_functions
gen_pyi.py
- Merged `get_py_torch_functions` and `get_py_variable_methods` into a single function, since they’re very similar
- Lifted out all of the pyi type hint type-mapping mess and dropped it into python.py. This required updating the mapping to deal with NativeFunction objects instead of the outputs of Declarations.yaml (this was most of the logic in `type_to_python`, `arg_to_type_hint`, and `generate_type_hints`). `generate_type_hints` is now a small orchestration function that gathers the different signatures for each PythonSignatureGroup.
- NamedTuples are now generated by calling `PythonReturn.named_tuple()` (in `generate_named_tuples()`), rather than appending to a global list
A lot of hardcoded pyi signatures still live in `gen_pyi.py`. I didn't look too closely into whether or not any of that can be removed as part of this PR.
Test Plan: Imported from OSS
Reviewed By: ljk53
Differential Revision: D25343802
Pulled By: bdhirsh
fbshipit-source-id: f73e99e1afef934ff41e4aca3dabf34273459a52
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48195
The general approach is to change Arguments, splitting `positional`, `kwarg_only` and `out`, into `pre_self_positional`, `self_arg`, `post_self_positional`, and `pre_tensor_options_kwarg_only`, `tensor_options` and `post_tensor_options_kwarg_only`. The splits are as you'd expect: we extract out the self argument and the tensor options arguments, and record the other arguments that came before and after. To do this, we move the logic in `group_arguments` to the parsing process.
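Roughly, the regrouped structure looks like this (a sketch only, with argument objects simplified to plain strings and out arguments omitted):
```py
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class Arguments:
    pre_self_positional: Tuple[str, ...]
    self_arg: Optional[str]
    post_self_positional: Tuple[str, ...]
    pre_tensor_options_kwarg_only: Tuple[str, ...]
    tensor_options: Optional[str]
    post_tensor_options_kwarg_only: Tuple[str, ...]

    @property
    def positional(self) -> Tuple[str, ...]:
        # group_arguments can now simply read the fields back out in order.
        self_part = (self.self_arg,) if self.self_arg is not None else ()
        return self.pre_self_positional + self_part + self.post_self_positional
```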
Some fuzz in the process:
* I renamed `ThisArgument` to `SelfArgument`, since we don't actually use the terminology "this" outside of C++ (and the model is Python biased)
* I kept the `group_arguments` function, which now just reads out the arguments from the structured model in the correct order. In the long term, we should get rid of this function entirely, but for now I kept it as is to reduce churn.
* I decided to arbitrarily say that when self is missing, everything goes in "post-self", but when tensor options is missing, everything goes in "pre-tensor-options". This was based on where you typically find the argument in question: self is usually at front (so most args are after it), while tensor options are typically at the end (so most args go before it).
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: zhangguanheng66
Differential Revision: D25231166
Pulled By: ezyang
fbshipit-source-id: 25d77ad8319c4ce0bba4ad82e451bf536ef823ad
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48182
I'm planning to add a bunch more argument fields following
https://github.com/pytorch/pytorch/pull/45890#discussion_r503646917 and
it will be a lot more convenient if the arguments get to live
in their own dedicated struct. Type checker will tell you if
I've done it wrong. No change to output.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: ljk53
Differential Revision: D25057897
Pulled By: ezyang
fbshipit-source-id: dd377181dad6ab0c894d19d83408b7812775a691
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48249
Introduced autograd related data models at tools.codegen.api.autograd.
Migrated load_derivatives.py to produce the new data models from derivatives.yaml.
It has a clean mypy-strict result.
Changed both gen_autograd_functions.py and gen_variable_type.py to consume
the new data model.
Added type annotations to gen_autograd_functions.py - it has a clean mypy-strict
result except for the .gen_autograd import (so I haven't added it to the strict
config in this PR).
To limit the scope of the PR, gen_variable_type.py is not refactored, and the
main structure of load_derivatives.py / gen_autograd_functions.py is kept. We
only make necessary changes to make it work.
Confirmed byte-for-byte compatible with the old codegen:
```
Run it before and after this PR:
.jenkins/pytorch/codegen-test.sh <baseline_output_dir>
.jenkins/pytorch/codegen-test.sh <test_output_dir>
Then run diff to compare the generated files:
diff -Naur <baseline_output_dir> <test_output_dir>
```
Test Plan: Imported from OSS
Reviewed By: ezyang
Differential Revision: D25086561
Pulled By: ljk53
fbshipit-source-id: 1f43ab0931d9814c24683b9a48ca497c5fc3d729
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47977
Avoid calling the CppSignatureGroup API - the python signature shouldn't depend on
the cpp signature. Still use cpp.group_arguments() to group TensorOptions.
Confirmed byte-for-byte compatible with the old codegen:
```
Run it before and after this PR:
.jenkins/pytorch/codegen-test.sh <baseline_output_dir>
.jenkins/pytorch/codegen-test.sh <test_output_dir>
Then run diff to compare the generated files:
diff -Naur <baseline_output_dir> <test_output_dir>
```
Test Plan: Imported from OSS
Reviewed By: ezyang
Differential Revision: D24976334
Pulled By: ljk53
fbshipit-source-id: 5df5a7bbfd2b8cb460153e5bea4d91e65f716390
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45277
Implements structured kernels as per https://github.com/pytorch/rfcs/pull/9 and ports upsample_nearest1d to use the framework.
The general structure of this diff:
- Define a new syntax for specifying structured kernels in `native_functions.yaml`. You put `structured: True` on the `out` function (that's what you implement) and `structured_delegate: foo.out` on the functional/inplace variants to define them in terms of the `out` function. There's a bunch of new consistency checking to see if you've done this right, though the error messages are of varying quality. This is most of what's going on in tools.codegen.model
- NativeFunctionGroup turns into StructuredNativeFunctions. Previously I thought that maybe we would use this grouping mechanism for both structured and unstructured kernels, but it turned out that Jiakai needed to make his own grouping structure. So now I've specialized it for structured kernels, which also means I get to add a bunch of invariants, like requiring structured kernels to have both a functional and an out variant. This is the lower bundle of changes in tools.codegen.model
- When you make an out kernel structured, this induces us to generate a new meta function signature for you to write shape checking and output allocation code. The signatures of these are defined by `tools.codegen.api.meta` and generated into `MetaFunctions.h`. Coverage here is very bare bones and will be driven by actual operators we port as we go.
- The meaty part of code generation is what we do when we have some grouped StructuredNativeFunctions. We continue to generate a wrapper per function type, but they're a bit different, as they call your meta functions and make reference to the actual implementations in out.
- Then there's a port of `upsample_nearest1d`; easiest to review by just looking at what the final code looks like.
Missing pieces:
- Stride calculation in TensorMeta
- Sufficient sanity checking for inplace/out variants
- Enough rope to make TensorIterator work
This PR improves instruction counts on `upsample_nearest1d` because it eliminates an extra redispatch. Testing `at::upsample_nearest1d(x, {10});`
* Functional: before 1314105, after 1150705
* Out: before 915705, after 838405
These numbers may be jittered up to +-16400 (which is the difference when I tested against an unaffected operator `at::upsample_linear1d`), though that may also be because unrelated changes affected all operators globally.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Differential Revision: D24253555
Test Plan: Imported from OSS
Reviewed By: smessmer
Pulled By: ezyang
fbshipit-source-id: 4ef58dd911991060f13576864c8171f9cc614456
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47745
This is a relatively small codegen. Reintroduced 'simple_type' to preserve
old codegen output.
It depends on some methods defined in gen_python_functions.py - next PR will
clean up the remaining Declarations.yaml methods in gen_python_functions.py.
Confirmed byte-for-byte compatible with the old codegen:
```
Run it before and after this PR:
.jenkins/pytorch/codegen-test.sh <baseline_output_dir>
.jenkins/pytorch/codegen-test.sh <test_output_dir>
Then run diff to compare the generated files:
diff -Naur <baseline_output_dir> <test_output_dir>
```
Differential Revision: D24885068
Test Plan: Imported from OSS
Reviewed By: ezyang
Pulled By: ljk53
fbshipit-source-id: c0fbd726bcc450c3c7fe232c23e5b31779d0b65f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46978
Refactored and added type annotations to most of the file.
Some top-level codegen functions are called by other codegen scripts.
Will migrate them in subsequent PRs.
Test Plan: Imported from OSS
Reviewed By: ezyang
Differential Revision: D24589210
Pulled By: ljk53
fbshipit-source-id: e0c7e5b3672b41983f321400c2e2330d1462e76e