pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
francescocastelli	5e6f296612	Structured Kernel Precompute codegen handle fields without replacement (#71368 ) Summary: I've added the parsing of an optional first line in native_functions.yaml after the precomputed keyword for arguments that will be precomputed without replacement. This line is optional, must be the first and does not contain any arrow. These new fields are precomputed as before in the meta function and added to the precompute struct returned by the meta function. For now I've put them as last args of the impl function where they can be reused. example: native_function.yaml: ``` ... precomputed: - int numBatch, int numPlanes, int inputT, int inputH, int inputW <- new - kernel_size -> int poolSizeT, int poolSizeH, int poolSizeW - output_size -> int outputT, int outputH, int outputW ``` meta: ``` TORCH_PRECOMPUTE_META_FUNC(fractional_max_pool3d)( const at::Tensor& input_, IntArrayRef pool_size, IntArrayRef output_size, const at::Tensor& randomSamples ) { ... return TORCH_PRECOMPUTE_STRUCT(fractional_max_pool3d)().set_numBatch(numBatch).set_numPlanes(numPlanes).set_inputT(inputT).set_inputH(inputH).set_inputW(inputW) .set_poolSizeT(poolSizeT) ... } ``` impl: ``` TORCH_IMPL_FUNC(fractional_max_pool3d_out_cpu)( const at::Tensor& input_, int64_t poolSizeT, int64_t poolSizeH, int64_t poolSizeW, int64_t outputT, int64_t outputH, int64_t outputW, const at::Tensor& randomSamples, const at::Tensor& output, const at::Tensor& indices, int64_t numBatch, <- for now I've put them here int64_t numPlanes, int64_t inputT, int64_t inputH, int64_t inputW) { ``` Fixes https://github.com/pytorch/pytorch/issues/71314 Pull Request resolved: https://github.com/pytorch/pytorch/pull/71368 Reviewed By: zou3519 Differential Revision: D33683984 Pulled By: bdhirsh fbshipit-source-id: 33066dd92b8743aadf0dc8102f6bf0689f843242 (cherry picked from commit `64e46af6a4`)	2022-02-08 03:56:56 +00:00
Brian Hirsh	665c148e42	move some codegen utilities into utils.py (#63094 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63094 This PR: - Moves `FileManager` and its dependencies (`assert_never` and other imports) to `utils.py`, and updates all of the call-sites with the fresh imports - Passes the list of NativeFunction objects into `gen_trace_type` directly, instead of requiring the function to regenerate it (we already have it) The purpose of the reshuffling is to avoid circular dependencies in the next PR, where I add codegen for the functionalization pass, which gets called from `gen.py` (but depends on some stuff from the autograd codegen - in particular, the list of view ops). Test Plan: Imported from OSS Reviewed By: albanD Differential Revision: D31942096 Pulled By: bdhirsh fbshipit-source-id: 36118facae61f25f8922bb43ad2818c80b53504e	2021-10-28 10:49:17 -07:00
Meghan Lele	968d7ee46a	[structured] Preserve computed elements from meta func to impl (#61746 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61746 Summary This commit introduces a new feature for structured kernels that allows kernels to declare quantities as "precomputed" in `native_functions.yaml`, compute them once in the `meta` function and reuse them again in the `impl`. The names and types of these quantities are used to generate code for a struct containing them that the `meta` function must return. In the case of a handful of surveyed kernels (`all`, `any`, `avg_pool2d`), these quantities that are used both in the `meta` and `impl` have the same meaning as certain kernel arguments and in fact supersede them. Accordingly, the correspondence between a kernel argument and the precomputed elements that supersede it is also captured in `native_functions.yaml`. This information is used to unpack the struct returned by `meta` and pass its contents correctly to the `impl` function. The primary goal is to avoid recomputation and enhance the developer experience (e.g. sometimes people can forget to compute these elements while porting a kernel). Test Plan: Imported from OSS Reviewed By: tugsbayasgalan Differential Revision: D30407831 Pulled By: SplitInfinity fbshipit-source-id: 00975525ea373721fe52d06f75cd4ac91f3dc556	2021-09-01 14:34:25 -07:00
Meghan Lele	1d2ea76afb	`clamp`: port to structured kernel (#61361 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61361 This PR ports the `clamp` kernel to the structured format. In addition, it introduces `OptionalScalarRef` as a replacement for `c10::optional<Scalar>&`. The latter, although it is a reference type, can still involve copying the contained `Scalar` (e.g. if the actual parameter is a `Scalar` or if a `c10::optional<Scalar>` is constructed just to call a kernel). `OptionalScalarRef` contains only a `const Scalar&`, and stores a flag about whether the instance contains something inside the `Scalar` itself using a new tag. For more information, see #55070. Test Plan: Imported from OSS Reviewed By: albanD Differential Revision: D29821533 Pulled By: SplitInfinity fbshipit-source-id: 88d55df5a4b2c14b68a57e4905d90eea1b088d99	2021-07-23 02:02:07 -07:00
Meghan Lele	1c80b5220b	`nll_loss_forward`: port to structured kernel (#61443 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61443 For more information, see #55070. This PR also adds a new type, `OptionalTensorRef` as a replacement for `c10::optional<Tensor>&` in order to avoid the reference count manipulations that are inevitable with the latter. I have confirmed using Godbolt/Compiler Explorer that this class does indeed avoid manipulating the reference count of the `intrusive_ptr` inside the `Tensor` it refers to: 1. [P429709479](https://www.internalfb.com/phabricator/paste/view/P429709479) - Given a `const Tensor&` in scope, an `OptionalTensorRef` can be constructed without bumping refcount. 2. [P429709883](https://www.internalfb.com/phabricator/paste/view/P429709883) - Given an `OptionalTensorRef`, a `const Tensor&` can be produced without bumping refcount. 3. [P429710335](https://www.internalfb.com/phabricator/paste/view/P429710335) - When `OptionalTensorRef` is destructed, the refcount should not be decremented. 4. [P429769525](https://www.internalfb.com/phabricator/paste/view/P429769525) - `OptionalTensorRef` can be assigned without refcount manipulation. 5. [P429769882](https://www.internalfb.com/phabricator/paste/view/P429769882) - `OptionalTensorRef` can be move assigned without refcount manipulation. Test Plan: Imported from OSS Reviewed By: jamesr66a Differential Revision: D29780666 Pulled By: SplitInfinity fbshipit-source-id: 7af157215300e9254d635433cbd583f7329fe064	2021-07-20 11:45:44 -07:00
Brian Hirsh	eca98fedb5	split out NamedCType from CType. Remove direct string comparison from autograd codegen (#55334 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55334 The goal of this PR is to clean up some of the autograd codegen to compare C++ types using `CType` objects instead of raw strings. My last PR in the stack made that string comparison a little more fragile, since the raw C++ strings needed to be namespace-aware. I confirmed byte-for-byte no codegen changes vs. the last PR (which added namespaces to the codegen) by running `diff -qr ../pytorch-common_test/torch/csrc/autograd/generated/ ../pytorch-callgrind_test_after2/torch/csrc/autograd/generated/` and `diff -qr ../pytorch-common_test/build/aten/src/ATen/ ../pytorch-callgrind_test_after2/build/aten/src/ATen/` Note that a better end-state for the autograd codegen would be to do all of its type pattern matching directly off of JIT types, instead of off of CType’s (which are really just generated from JIT types, incorporating C++ specific semantics). That looks like it’ll require a pretty substantial change though, so I’m not doing it in this PR. As part of this change (and after talking with ezyang), I split off the `CType` data class into a separate `NamedCType` class, which holds a name and a `CType`. This way, `CType` only knows about actual C++ types, making it easier to compare CType’s to each other in the codegen when we only care about the type. The core change is in `types.py`, but it required a bunch of downstream changes to update all of the places where we create `CType`s to create `NamedCType`s instead. The main change in the autograd codegen was that I updated `SavedAttribute` to store a `NamedCType`. The other autograd changes all pretty much came from that change. Test Plan: Imported from OSS Reviewed By: bhosmer Differential Revision: D27708347 Pulled By: bdhirsh fbshipit-source-id: 3e07c80569c7b229c638f389e76e319bff6315f9	2021-04-16 11:43:08 -07:00
Brian Hirsh	947c7a8215	add C++ namespacing logic to ctypes (#55047 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55047 Added namespaces to all of the `CTypes` printed in the codegen. This is pretty much required if we want to use codegen externally, since we can no longer assume that we're inside of the `at::` namespace. Important changes are in `types.py`. How do we add the notion of namespaces to C++ types without people having to write "at::Tensor" everywhere? Before this PR, `CType` held a raw string representing the type, i.e. `BaseCType("Tensor", binds)`. This PR introduces a set of singleton base C++ types in `types.py`, that know how to print their namespace. Instead, we'd write `BaseCType(tensorT, binds)`, where printing `tensorT` will properly print out "at::Tensor". This also means that you can't create arbitrary `CTypes`. If we need a new C++ type in the codegen, we need to add it to the list in `types.py`. One blip in the design: we don't want to change `RegistrationDeclarations.yaml`, since that'll break external backends that ingest it. I added separate functions to display types without the namespace that are used to create `RegistrationDeclarations.yaml`. With an external codegen API though, we can eventually kill it :) I also didn't realize until this PR that `Declarations.yaml` is still directly in use, by some python/autograd codegen. Rather than keep that yaml byte-for-byte compatible, I just updated the callsites in the autograd codegen to work with namespaces. In the NEXT pr, I try to clean up some of the autograd codegen to stop using raw strings to match against C++ types. Test Plan: Imported from OSS Reviewed By: bhosmer Differential Revision: D27708349 Pulled By: bdhirsh fbshipit-source-id: 56a4f81fc101795bcb9ee1f722121480fb2356ad	2021-04-16 11:43:06 -07:00
Brian Hirsh	164bee1d09	Return a CType instead of a string for returns, beef up CType (#55046 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55046 Updating `returns` in the codegen to return a CType instead of a raw string. This has the benefit of putting all stringifying logic through CType, which is useful in the followup PR when I add namespaces. I also added new CTypes for other templated C++ types: array, vector and tuple. Mostly because it makes the namespacing logic in the next PR significantly easier. It also seems more natural to me that `BaseCType` shouldn't represent specializations of templated types. There's a little bit of weirdness: types that are currently only used for returns, i.e. `TupleCType`. Returns aren't named, so I opted not to give it one, so we can add it in later if we discover that we need it. Test Plan: Imported from OSS Reviewed By: bhosmer Differential Revision: D27708348 Pulled By: bdhirsh fbshipit-source-id: 230b210c3e53be1bd362105fbea8451055dc59a8	2021-04-16 11:41:46 -07:00
Sam Estep	4753100a3b	Un-ignore F403 in .flake8 (#55838 ) Summary: Generally wildcard imports are bad for the reasons described here: https://www.flake8rules.com/rules/F403.html This PR replaces wildcard imports with an explicit list of imported items where possible, and adds a `# noqa: F403` comment in the other cases (mostly re-exports in `__init__.py` files). This is a prerequisite for https://github.com/pytorch/pytorch/issues/55816, because currently [`tools/codegen/dest/register_dispatch_key.py` simply fails if you sort its imports](https://github.com/pytorch/pytorch/actions/runs/742505908). Pull Request resolved: https://github.com/pytorch/pytorch/pull/55838 Test Plan: CI. You can also run `flake8` locally. Reviewed By: jbschlosser Differential Revision: D27724232 Pulled By: samestep fbshipit-source-id: 269fb09cb4168f8a51fd65bfaacc6cda7fb87c34	2021-04-13 09:24:07 -07:00
Edward Yang	6e8c4ad7fd	s/StructuredNativeFunctions/NativeFunctionsGroup/ (#54427 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/54427 A StructuredNativeFunctions is no longer guaranteed to actually be structured (test structured property for that), so we rename this to a more neutral name. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Test Plan: Imported from OSS Reviewed By: ailzhang Differential Revision: D27235380 Pulled By: ezyang fbshipit-source-id: 2b438d615bf06a47fc9c7bf6eb66fd8b4df31bc8	2021-03-23 00:43:57 -07:00
Wenlei Xie	2ecb2c7931	Pass Scalar by reference (#53583 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/53583 `Scalar` takes 32 bytes because `c10::complex<double>` requires 16-byte alignment. Passing Scalar by reference shows about a 1% improvement in instruction count. All the changes in this commit are codemoded except for the following 4 files (which code-gen signatures): ``` tools/codegen/api/cpp.py tools/codegen/api/native.py tools/codegen/api/structured.py caffe2/contrib/aten/gen_op.py ``` # Codemod ## Main Step For the codemod part, here is the main command used: ``` fastmod --extensions h '([a-zA-Z_+]\([^)],?\s)Scalar (\w+)' '${1}const Scalar& ${2}' fastmod --extensions h '([a-zA-Z_+]\([^)],?\s)optional<Scalar> (\w+)' '${1}const optional<Scalar>& ${2}' fastmod --extensions cpp '([a-zA-Z_+]\([^)],?\s)Scalar (\w+)' '${1}const Scalar& ${2}' fastmod --extensions cpp '([a-zA-Z_+]\([^)],?\s)optional<Scalar> (\w+)' '${1}const optional<Scalar>& ${2}' ``` As you can tell, it codemods both `Scalar` and `optional<Scalar>`. Apply these commands iteratively until reaching a fix-point (since one method signature might contain multiple `Scalar` parameters). In retrospect, excluding `third_party` and `torch/csrc/jit` would have been a good idea. (I reverted those manually later, see https://github.com/pytorch/pytorch/pull/53479 as a reference). ## Pre-Step Prior to applying the main command, some `Scalar` appear as `at::Scalar` or `c10::Scalar`, so I codemoded some of them in advance. 
Here is an incomplete list: ``` fastmod --extensions h '([a-zA-Z_+]\([^)],?\s)at::Scalar (\w+)' '${1}const at::Scalar& ${2}' fastmod --extensions cpp '([a-zA-Z_+]\([^)],?\s)at::Scalar (\w+)' '${1}const at::Scalar& ${2}' fastmod --extensions h '([a-zA-Z_+]\([^)],?\s)c10::optional<Scalar> (\w+)' '${1}const c10::optional<Scalar>& ${2}' fastmod --extensions cpp '([a-zA-Z_+]\([^)],?\s)c10::optional<Scalar> (\w+)' '${1}const c10::optional<Scalar>& ${2}' ``` ## Fixup There are a couple of post-codemod fixups. For example, `const Scalar` will be codemoded into `const const Scalar&`. `at::Scalar` will be codemoded into `at::const Scalar&` (if `Pre-step` is not done comprehensively). Here is an incomplete list: ``` fastmod --extensions cpp 'const const Scalar' 'const Scalar' fastmod --extensions h 'const const c10::optional<Scalar>' 'const c10::optional<Scalar>' fastmod --extensions cpp 'const const c10::optional<Scalar>' 'const c10::optional<Scalar>' fastmod 'at::const Scalar&' 'const at::Scalar&' ``` ## Supplementary `cu` and `mm` files also need to be codemoded, for example: ``` fastmod --extensions cu 'at::const Scalar&' 'const at::Scalar&' fastmod --extensions mm '([a-zA-Z_+]$[^)],?\s)Scalar (\w+)' '${1}const Scalar& ${2}' ``` Function pointers are not codemoded. Here is an incomplete list: ``` # Cover case: using index_fill_fn = void()(TensorIterator & iter, int64_t dim, int64_t self_dim_size, int64_t self_dim_stride, Scalar source); fastmod --extensions h '(void\s\(\s\\s$$[^)],?\s)Scalar (\w+)' '${1}const Scalar& ${2}' # Cover case: using softplus_fn = void ()(TensorIterator&, Scalar, Scalar); fastmod --extensions h '(void\s\(\s\\s$$[^)],?\s)Scalar([, $])' '${1}const Scalar&${2}' fastmod --extensions cpp '(void\s$\s\\s$$[^)],?\s)Scalar([, $])' '${1}const Scalar&${2}' fastmod --extensions h '(void\s$\s\\s$$[^)],?\s)optional<Scalar>([, $])' '${1}const optional<Scalar>&${2}' ``` Some corner cases need to be manually fixed. 
ghstack-source-id: 123970306 Test Plan: Imported from OSS Reviewed By: smessmer Differential Revision: D26904445 fbshipit-source-id: 8d8a002af4b5125f153a32f03c6956be7ae5671d	2021-03-15 23:17:06 -07:00
Ailing Zhang	9f75de278f	Move common autograd utils functions from gen_variable_type.py to api/autograd.py. (#53340 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/53340 Test Plan: Imported from OSS Reviewed By: nikithamalgifb Differential Revision: D26973914 Pulled By: ailzhang fbshipit-source-id: 8367a08b27b25808782c77aadc3c67d07c354957	2021-03-11 19:58:45 -08:00
Edward Yang	81c7c3bae5	Add api.structured; switch structured kernels to use const Tensor& everywhere (#51490 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51490 Mutable Tensor ref is a source of endless confusion for kernel writers; if we're going to make everyone rewrite their kernels, might as well also get rid of mutable Tensor& while we're at it. This is a refactor-then-small-update double whammy. The refactor is to separate tools.codegen.api.structured from api.native for describing the type signatures of structured kernels (previously, I was naughtily reusing native for this purpose--now I need it to behave differently as Tensor). This started off as a copy paste, but since there are not that many structured kernels so far I could delete all of the legacy logic from native that didn't make sense (without having to go out and fix all the use sites all at once). One more small addition was teaching translate to convert Tensor& to const Tensor&. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Test Plan: Imported from OSS Reviewed By: bhosmer Differential Revision: D26182413 Pulled By: ezyang fbshipit-source-id: ed636866add3581179669cf9283f9835fcaddc06	2021-02-03 14:03:46 -08:00

13 Commits
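
The `precomputed` entries described in commits 968d7ee46a and 5e6f296612 follow a small grammar: an optional first line with no arrow lists fields that are added without replacing anything, and every other line maps one kernel argument to the fields that supersede it. The following is a minimal Python sketch of how such lines could be parsed. It is an illustration only, not the actual codegen implementation; the names `Precompute`, `parse_args`, and `parse_precompute` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Arg = Tuple[str, str]  # (C++ type, name), e.g. ("int", "poolSizeT")


def parse_args(text: str) -> List[Arg]:
    """Split 'int a, int b' into [('int', 'a'), ('int', 'b')]."""
    args = []
    for part in text.split(","):
        ctype, name = part.strip().rsplit(" ", 1)
        args.append((ctype.strip(), name))
    return args


@dataclass
class Precompute:
    # Hypothetical container: kernel argument -> fields that replace it.
    replace: Dict[str, List[Arg]] = field(default_factory=dict)
    # Fields precomputed without replacing any kernel argument.
    add: List[Arg] = field(default_factory=list)


def parse_precompute(lines: List[str]) -> Precompute:
    result = Precompute()
    rest = lines
    # The arrow-free line is optional and, if present, must come first.
    if lines and "->" not in lines[0]:
        result.add = parse_args(lines[0])
        rest = lines[1:]
    for line in rest:
        if "->" not in line:
            raise ValueError(f"only the first line may omit '->': {line!r}")
        kernel_arg, replacements = line.split("->", 1)
        result.replace[kernel_arg.strip()] = parse_args(replacements)
    return result


# Entries taken from the example in commit 5e6f296612.
parsed = parse_precompute([
    "int numBatch, int numPlanes, int inputT, int inputH, int inputW",
    "kernel_size -> int poolSizeT, int poolSizeH, int poolSizeW",
    "output_size -> int outputT, int outputH, int outputW",
])
print(parsed.add)
print(parsed.replace["kernel_size"])
```

In this sketch the added fields are kept separate from the replacements, mirroring the commit's description that they are appended as the last arguments of the impl function rather than substituted for an existing argument.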