Previously, we introduced new SymInt overloads for every function we wanted. This led to a lot of boilerplate, and also a lot of confusion about how the overloads needed to be implemented.
This PR takes a simpler but riskier approach: just take the original function and change its ints to SymInts.
This is BC-breaking in the following ways:
* The C++ API for registering implementations for aten operators changes from int64_t to SymInt whenever a function's schema is switched over. Code-generated registrations in PyTorch do not change, as codegen handles the translation automatically, but manual registrations will need to follow the change. Typically, if you now accept a SymInt where you previously only took int64_t, you have to convert it back manually. This will definitely break XLA; see the companion PR https://github.com/pytorch/xla/pull/3914. Note that not all dispatch keys get the automatic translation; all the composite keys and Meta keys are modified to take SymInt directly (because they should handle them directly), and there are adjustments for this.
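A minimal sketch of what this means for a manual registration. The kernel name is made up, and the SymInt-to-int conversion helper shown (`expect_int()`) is an assumption about the conversion API, not something this PR prescribes:
```
// Before (commented out): the backend kernel was registered taking int64_t.
// at::Tensor my_backend_narrow(const at::Tensor& self, int64_t dim,
//                              int64_t start, int64_t length);

// After: symint'ed arguments arrive as c10::SymInt, and a backend that only
// handles concrete sizes has to convert them back itself.
at::Tensor my_backend_narrow(
    const at::Tensor& self,
    int64_t dim,
    c10::SymInt start,
    c10::SymInt length) {
  // Conversion helper name is an assumption; the point is that the backend
  // must materialize concrete integers (erroring if the value is symbolic).
  const int64_t start_ = start.expect_int();
  const int64_t length_ = length.expect_int();
  return self.narrow(dim, start_, length_);  // delegate to an int-only path
}
```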
This is not BC-breaking in the following ways:
* The user-facing C++ API remains compatible. Even if a function changes from int to SymInt, the default C++ binding still takes only ints (e.g., at::empty(IntArrayRef, ...)). To call with SymInts, you must call at::empty_symint instead (a usage sketch follows this list). This involved adding two more signatures to CppSignatureGroup; in many cases I refactored code to iterate over all signatures in the group instead of hard-coding the two that previously existed.
* This is TorchScript compatible; internally we treat SymInts as ints, so there is no change to what happens at runtime in TorchScript. In particular, it's OK to reference an empty schema by its old type (using int types); as long as you're not doing string equality (which you shouldn't be), these parse to the same underlying type.
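Returning to the first bullet above, a minimal usage sketch of `at::empty` vs. `at::empty_symint`. Constructing SymInts from integer literals is only for illustration (in practice they come from tracing), and the TensorOptions-style `empty_symint` signature shown is an assumption based on the generated C++ API:
```
// Existing call sites keep compiling: the default binding still takes ints.
at::Tensor a = at::empty({2, 3}, at::kFloat);

// Passing symbolic sizes requires the explicit *_symint entry point.
std::vector<c10::SymInt> sizes = {c10::SymInt(2), c10::SymInt(3)};
at::Tensor b = at::empty_symint(c10::SymIntArrayRef(sizes), at::kFloat);
```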
Structure of the PR:
* The general strategy of this PR is that, even when you write `SymInt` inside `native_functions.yaml`, we will sometimes treat it *as if* it were an `int`. This idea pervades the codegen changes, where we have a translation from SymInt to c10::SymInt or int64_t, controlled by a `symint` kwarg that I added; I then audited all call sites to decide which variant each one wanted. Here are some of the major places where we pick one or the other:
* The C++ FunctionSchema representation represents `SymInt` as `int`. There are a few places we do need to know that we actually have a SymInt and we consult `real_type()` to get the real type in this case. In particular:
* When we do schema validation of C++ operator registration, we must compare against the true schema (as the C++ API will provide `c10::SymInt`, and this will only be accepted if the schema is `SymInt`). This is handled with cloneWithRealTypes before we check for schema differences.
* In `toIValue` argument parsing, we parse against the true schema value. For backwards compatibility reasons, I do still accept ints in many places where Layout/SymInt/etc were expected. (Well, accepting int where SymInt is expected is not BC, it's just the right logic!)
* In particular, because SymInt never shows up as type() in FunctionSchema, this means that we no longer need a dedicated Tag::SymInt. This is good, because SymInts never show up in mobile anyway.
* Changes to functorch/aten are mostly about tracking changes to the C++ API registration convention. Additionally, since SymInt overloads no longer exist, registrations for SymInt implementations are deleted. In many cases, the old implementations did not properly support SymInts; I did not add any new functionality with this PR, but I did try to annotate with TODOs where there is work to do. Finally, because the signature of the `native::` API changed from int to SymInt, I needed to find alternative APIs for people who were directly calling these functions. Typically, I insert a new dispatch call when perf doesn't matter, or use the `at::compositeexplicitautograd` namespace to handle other cases.
* The change to `make_boxed_from_unboxed_functor.h` is so that we accept a plain IntList IValue anywhere a SymIntList is expected; these are read-only arguments so covariant typing is OK.
* I changed how the unboxing logic works slightly. Previously, we interpreted the C++ type for Layout/etc. directly as the IntType JIT type, which works well because the incoming IValue is tagged as an integer. Now, we interpret the C++ type for Layout as its true type, e.g., LayoutType (change to `jit_type.h`), but then we accept an int IValue for it anyway. This makes it symmetric with SymInt, where we interpret the C++ type as SymIntType, and then accept both SymInt and int IValues for it.
* I renamed the `empty.names` overload to `empty_names` to make it less confusing (I kept mixing it up with the real empty overload).
* I deleted the `empty.SymInt` overload, which ended up killing a pile of functions. (This was originally a separate PR but the profiler expect test was giving me grief so I folded it in.)
* I deleted the LazyDynamicOpsTest tests. These were failing after these changes, and I couldn't figure out why they used to be passing: they make use of `narrow_copy`, which didn't actually support SymInts; the SymInts were immediately converted to ints.
* I bashed LTC into working. The patches made here are not the end of the story. The big problem is that SymInt translates into Value, but what if you have a list of SymInt? This cannot be conveniently represented in the IR today, since variadic Values are not supported. To work around this, I translate SymInt[] into plain int[] (this is fine for tests because LTC dynamic shapes never actually worked); but this will need to be fixed for proper LTC SymInt support. The LTC codegen also looked somewhat questionable; I added comments based on my code reading.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83628
Approved by: https://github.com/albanD, https://github.com/bdhirsh
Per many C++ code-style guides (for [example](https://google.github.io/styleguide/cppguide.html#Enumerator_Names)), members of an `enum` should be CamelCased, and only defines should be ALL_CAPS.
Changes the `MemOverlap`, `MemOverlapStatus`, and `CmpEvalResult` enum values accordingly.
Also, `YES`, `NO`, `TRUE`, and `FALSE` are often system defines.
Fixes, among other things, the current iOS build regression, which manifests as follows (see [this](6e90572bb9)):
```
/Users/runner/work/pytorch/pytorch/aten/src/ATen/MemoryOverlap.h:19:29: error: expected identifier
enum class MemOverlap { NO, YES, TOO_HARD };
^
/Applications/Xcode_12.4.app/Contents/Developer/Platforms/iPhoneSimulator.platform/Developer/SDKs/iPhoneSimulator14.4.sdk/usr/include/objc/objc.h:89:13: note: expanded from macro 'YES'
#define YES __objc_yes
```
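A minimal sketch of the shape of the rename (illustrative, not an exhaustive list of the renamed enumerators):
```
// Before: ALL_CAPS enumerators collide with system macros such as YES/NO.
// enum class MemOverlap { NO, YES, TOO_HARD };

// After: CamelCase enumerators sidestep the macro collision.
enum class MemOverlap { No, Yes, TooHard };
```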
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79772
Approved by: https://github.com/drisspg, https://github.com/kulinseth
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78297
Clone following expand/expand_as crashes due to the memory-overlap check in the native copy_ method. Refer to T118519310 for more details.
Crashing test case:
a = tensor(3,1) // strides = (1,1)
b = tensor(3,2) // strides = (2,1)
temp = a.expand_as(b) // creates temp with shape (3,2) and strides (1,0)
temp.clone() // crashes on copy_ due to memory overlap
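The same sequence sketched with eager ATen calls (illustrative only; the crash itself surfaces through Static Runtime's out-variant clone, which copies into a preallocated output):
```
// Sketch of the tensors involved.
at::Tensor a = at::randn({3, 1});   // strides (1, 1)
at::Tensor b = at::randn({3, 2});   // strides (2, 1)
at::Tensor temp = a.expand_as(b);   // shape (3, 2), strides (1, 0)
at::Tensor c = temp.clone();        // the out-variant copy_ path trips the overlap check here
```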
Fix: Disable the out variant for the expanded tensor.
- Calls native clone instead of out variant for clone dealing with expanded tensors
- Added test case for both clone variants (out and native clones)
- Increased the tensor size for memory planner test case to trigger dynamic allocation
Test Plan:
buck test caffe2/benchmarks/static_runtime/fb:test_fb_operators
buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest
Differential Revision: D36672180
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78322
Approved by: https://github.com/mikeiovine
Summary: Previously the op was auto-generated, but it only covered the pointwise overload of aten::max. This adds support for the reduction overloads, both overall and along a dim.
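For reference, a minimal sketch of the three aten::max flavors the out variant now needs to handle:
```
at::Tensor a = at::randn({4, 5});
at::Tensor b = at::randn({4, 5});
at::Tensor pointwise = at::max(a, b);            // elementwise max
at::Tensor overall = at::max(a);                 // reduce to a single value
auto [values, indices] = at::max(a, /*dim=*/1);  // reduce along a dimension
```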
Test Plan: Added a unit test
Differential Revision: D36656378
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78271
Approved by: https://github.com/mikeiovine
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76473
Avoid some extra heap allocations by using DimVector
ghstack-source-id: 155569314
Test Plan: Existing unit tests
Reviewed By: navahgar, huiguoo
Differential Revision: D35972439
fbshipit-source-id: 971998d6bcaaf9bb598772f1e2ca6b13f29f92a4
(cherry picked from commit f2b70c38fffe6355cd8b2f0eb36f299c0d50e5d8)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75993
Strobelight shows copy_ in embedding_bag taking up a lot of time in adfinder_story_post_ad_session_exit_model 334827604_0
More details in https://fb.quip.com/MKumAjz1YD4a#temp:C:FPD3e5a0871ae5d481286b511ef7
The last 3 outputs of embedding_bag are unused in the graph: P495814049. (The op's outputs are sketched right after this list.)
* max_indices output isn't necessary for the main output, so remove it when it's not used in the graph.
* offset2bag is used as an intermediate to calculate the main output, so we don't remove this output even though it's unused in the graph.
* bag_size is used as an intermediate to calculate the main output for MODE_MEAN, so we don't remove this for now.
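For orientation, a minimal sketch of the four outputs referred to above, using an eager ATen call with default arguments (input values are illustrative only):
```
at::Tensor weight = at::randn({10, 4});
at::Tensor indices = at::arange(6, at::kLong);        // 6 lookups
at::Tensor offsets = at::arange(0, 6, 3, at::kLong);  // 2 bags of 3
// aten::embedding_bag returns four tensors; only the first is consumed
// by the graph discussed here.
auto [output, offset2bag, bag_size, max_indices] =
    at::embedding_bag(weight, indices, offsets);
```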
Test Plan:
`./caffe2/caffe2/fb/predictor/scripts/run_disagg_model_benchmarks.sh 334827604 0 /data/users/ansha/tmp/ads_tail sr_only`
Inputs uploaded to `/mnt/persistent-public/ansha/ads_tail/334827604`
Before:
I0414 10:53:12.261133 1070948 PyTorchPredictorBenchLib.cpp:305] PyTorch run finished. Milliseconds per iter: 0.121318. Iters per second: 8242.78
0.11156 ms. 99.0457%. aten::embedding_bag (52 nodes, out variant)
After:
I0418 13:05:10.837378 2354604 PyTorchPredictorBenchLib.cpp:305] PyTorch run finished. Milliseconds per iter: 0.0881273. Iters per second: 11347.2
0.0789221 ms. 98.7096%. static_runtime::embedding_bag (52 nodes, out variant)
* Ads prod canary:
https://www.internalfb.com/intern/ads/canary/443002539593035806/
* 4M test: `servicelab create cogwheel_pyper_inference_fullsync_ads_inline_cvr_post_imp -a D35726594`
https://www.internalfb.com/intern/servicelab/602875732/
* 4M test: `servicelab create cogwheel_pyper_inference_fullsync_ads_10x_ctr_mbl_feed_non_mimo -a D35726594`
https://www.internalfb.com/intern/servicelab/1002874745/
Reviewed By: mikeiovine
Differential Revision: D35726594
fbshipit-source-id: 3b71a0822657bf7a23ce37ca899baef9997b011a
(cherry picked from commit fd5e3098c047a1e7d4348e1c97341eecb892536e)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74255
This change fixes a bug where `aten::full_like` reuses a previously allocated tensor that does not match the requested one when the arguments to `aten::full_like` change dynamically.
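A minimal sketch of the guard this kind of fix needs. The helper name, parameters, and exact conditions are assumptions for illustration, not the actual Static Runtime code:
```
// Hypothetical helper: only reuse the cached output if it still matches
// what the current arguments request; otherwise allocate a fresh tensor.
at::Tensor& full_like_out_reuse(
    at::Tensor& out,
    const at::Tensor& self,
    const at::Scalar& fill_value,
    at::ScalarType requested_dtype) {
  if (!out.defined() || out.scalar_type() != requested_dtype ||
      !out.sizes().equals(self.sizes())) {
    out = at::empty_like(self, self.options().dtype(requested_dtype));
  }
  out.fill_(fill_value);  // the fill value may also change between iterations
  return out;
}
```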
Test Plan: - Enhanced `StaticRuntime.FullLike` to cover the modified code path.
Reviewed By: mikeiovine
Differential Revision: D34863639
fbshipit-source-id: ca6d4ee3c039e263cc3a4f643d949cea59381608
(cherry picked from commit ae7db0af5e7d95d866027abc968afcb162fd2ef8)
Summary:
The implementation of `PackedLinearWeightFp16::apply_dynamic_impl` [here](https://www.internalfb.com/code/fbsource/[b1ef7c31f022]/fbcode/caffe2/aten/src/ATen/native/quantized/cpu/qlinear_dynamic.cpp?lines=393) does not handle `relu`. It completely ignores the `ReluFused` boolean template parameter.
At this point, callers of that function handle `relu` explicitly. While the correct thing to do would be to handle the `ReluFused` parameter in that implementation, it is not clear whether that semantics is relied upon elsewhere in this code. So we are handling this in SR's out-variant implementation until the owner fixes that issue.
This issue resulted in incorrect results when Static Runtime was enabled for the MRS video model.
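A minimal sketch of the workaround described above. The linear helper is a hypothetical stand-in (the real call goes through PackedLinearWeightFp16::apply_dynamic); the point is only the clamp-after-matmul:
```
// Hypothetical stand-in for the packed fp16 dynamic linear kernel.
at::Tensor run_fp16_dynamic_linear(const at::Tensor& input, const at::Tensor& packed_weight);

at::Tensor fp16_linear_maybe_relu(
    const at::Tensor& input,
    const at::Tensor& packed_weight,
    bool relu_fused) {
  at::Tensor y = run_fp16_dynamic_linear(input, packed_weight);
  if (relu_fused) {
    y.relu_();  // apply the activation the kernel itself ignores
  }
  return y;
}
```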
Test Plan:
```
buck run mode/opt //caffe2/benchmarks/static_runtime:static_runtime_cpptest -- --gtest_filter=StaticRuntime.QuantizedLinearReluDynamicFp16
```
Reviewed By: mikeiovine
Differential Revision: D35366309
fbshipit-source-id: e60126e3590d52681ceaee5583b81c4c0b5404d9
(cherry picked from commit cabeb96a792339e7dbfd16cb51a3ac9039812137)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73606
The single-output overload of `layer_norm` internally allocates two tensors. As an optimization, we previously added `static_runtime::layer_norm`. This variant of layer norm had two extra outputs to make the memory planner aware of these extra tensors. But these outputs were unused; it's actually better for us to avoid the allocation and associated computations entirely.
ghstack-source-id: 151394116
Test Plan: Existing unit tests
Reviewed By: hlu1
Differential Revision: D34562131
fbshipit-source-id: c6a6560e60db43b0b100aedc54ea4265acb347de
(cherry picked from commit 3bed52b6f688b93b9b032c3d2b4be68d08d8eb76)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73990
This change fixes a bug where `aten::full` reuses a previously allocated tensor that does not match the requested one when the arguments to `aten::full` change dynamically.
The same fix applies to multiple other out-variant wrappers added to Static Runtime; their fixes follow in subsequent changes.
Test Plan: - Added a unittest.
Reviewed By: mikeiovine
Differential Revision: D34768718
fbshipit-source-id: b6958d6601d36253dd5d4f93596fb14055cca9c9
(cherry picked from commit 42acb40d3a1e9359c0f1a3c25481854e5ad344b6)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73945
This change adds an out variant wrapper for aten::ones_like.
Test Plan:
- Added a unittest.
- Checked that the op execution got switched to its added out variant (P485330978).
Reviewed By: hlu1
Differential Revision: D34727057
fbshipit-source-id: 5022a7f547d53b0c00459d3959ad3c6e6a8a62d5
(cherry picked from commit 1bec4680e8173654400b165d720a0902136dba0f)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73946
This change adds an out variant wrapper for aten::zeros.
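For context, a rough sketch of what such a wrapper looks like. The registration macro and ProcessedNode accessors follow Static Runtime's usual pattern, but the body here is illustrative (schema check and dtype/layout handling omitted), not the code added by this change:
```
REGISTER_OPERATOR_FUNCTOR(aten::zeros, aten_zeros, [](Node* n) -> SROperator {
  return [](ProcessedNode* p_node) {
    const auto size = p_node->Input(0).toDimVector();
    if (p_node->Output(0).isNone()) {
      // First run: allocate the output tensor.
      p_node->Output(0) = at::zeros(size);
    } else {
      // Later runs: reuse the memory-planned output.
      at::Tensor out = p_node->Output(0).toTensor();
      out.resize_(size);
      out.zero_();
    }
  };
});
```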
Test Plan:
- Added a unittest.
- Confirmed that the added out variant gets executed by the unittest (P485324923).
Reviewed By: mikeiovine
Differential Revision: D34725843
fbshipit-source-id: 3ac02ba1914c4a51969381e610d4243df65071ed
(cherry picked from commit 368836d51709b7f96c79114984a95606b29766b1)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73704
Empty inputs are invalid for these ops. But while looking for optimizations, I noticed that these ops just segfault when that happens, which is not helpful for users. Added a check/error message.
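A minimal sketch of the kind of guard added (the variable name and exact message wording are assumptions):
```
// Fail with an actionable error instead of segfaulting on empty input lists.
TORCH_CHECK(
    !inputs.empty(),
    "Expected a non-empty list of input tensors");
```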
ghstack-source-id: 150812721
Test Plan: New unit tests
Reviewed By: hlu1
Differential Revision: D34596954
fbshipit-source-id: 6b22a3a255273920210dcd41f54a9d238bbbcc14
(cherry picked from commit 9e950bfffef36c320638662bdb72f19eb805a228)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73851
This change adds an out variant wrapper for aten::ones
Test Plan: Added a unittest
Reviewed By: mikeiovine
Differential Revision: D34557095
fbshipit-source-id: 0d2ac8d0ad6f73067e28c2cebd3b4a018a9b17ae
(cherry picked from commit cc1dda957b8c3acd71de3aa6054c11a9aab5cfa6)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73450
This change uses `SROperator` for operators' function type
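For reference, `SROperator` is Static Runtime's alias for the per-node execution function, sketched here from memory:
```
// Static Runtime executes each ProcessedNode through a function of this shape.
using SROperator = std::function<void(ProcessedNode*)>;
```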
Test Plan: N/A
Reviewed By: mikeiovine
Differential Revision: D34483246
fbshipit-source-id: ed544bb91b676ed08983dc8dc78cedd0f77d499f
(cherry picked from commit eb9de3ad8de043990c02f30ffa48a29c8e5e81f2)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69029
This change optimizes the execution of `prim::TupleConstruct` & `prim::ListConstruct` by performing case analysis at op loading time, not op execution time.
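A minimal sketch of the idea (the helper name is hypothetical; the point is that the branch on input arity happens once, when the SROperator is built, and the returned lambda is already specialized):
```
SROperator make_tuple_construct_op(Node* n) {   // hypothetical helper
  const size_t num_inputs = n->inputs().size(); // known at load time
  if (num_inputs == 2) {
    // Specialized lambda: no per-execution branching or vector build-up.
    return [](ProcessedNode* p) {
      p->Output(0) = c10::ivalue::Tuple::create(p->Input(0), p->Input(1));
    };
  }
  // Generic fallback, still chosen once at load time.
  return [num_inputs](ProcessedNode* p) {
    std::vector<c10::IValue> elems;
    elems.reserve(num_inputs);
    for (size_t i = 0; i < num_inputs; ++i) {
      elems.push_back(p->Input(i));
    }
    p->Output(0) = c10::ivalue::Tuple::create(std::move(elems));
  };
}
```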
Test Plan:
- Existing unittests
- Ran inline_cvr nets via ptvsc2_predictor_bench with compare_result=1
Reviewed By: swolchok
Differential Revision: D32518670
fbshipit-source-id: 575b29b06eadf77ba9f1be306119fa194d4f21bf
(cherry picked from commit 88cc2253b927267cad33063284e9cc66e0d31e2f)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71573
Many ops (`gather_ranges_to_dense`, `sigrid_transforms`, etc) are implemented like this:
```
void op_out_(std::vector<Tensor>& outputs) {
  // actual op implementation writes into `outputs`
}

std::vector<Tensor> op() {
  std::vector<Tensor> outputs;
  // populate outputs with empty tensors
  op_out_(outputs);
  return outputs;
}
```
This pattern is not ideal for ops that are fused with `ListUnpack` - it would be better if we wrote to the outputs directly.
This diff extends the ideas from `VarStackNodeWrapper` to allow for this. The changes are:
* `s/VarStackNodeWrapper/ProcessedNodeInputWrapper`. The old name was bad because the class is more general than the `VarStack` use case. Also moved the class to `processed_node_wrapper.h`
* Add a `ProcessedNodeOutputWrapper`; it's essentially the same idea as `ProcessedNodeInputWrapper`, but it allows non-const access to the underlying tensors.
* These classes are very similar, so CRTP is used to facilitate code re-use
ghstack-source-id: 148825800
Test Plan: `buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest -- Stack`
Reviewed By: swolchok
Differential Revision: D33687965
fbshipit-source-id: 5fa0107211116867bb2b63968c126550d32fbea6
(cherry picked from commit 75c263d960)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71247
Most uses of toIntVector() were for a Tensor shape. We have DimVector to avoid heap allocations in those cases, so let's use it.
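A minimal before/after sketch of the pattern (`size_ivalue` stands for a node's shape argument and is illustrative):
```
// Before: materializes a heap-allocated std::vector<int64_t> just to hold a shape.
std::vector<int64_t> sizes_vec = size_ivalue.toIntVector();
at::Tensor a = at::empty(sizes_vec);

// After: DimVector is a SmallVector sized for typical tensor ranks,
// so small shapes stay on the stack.
at::DimVector sizes = size_ivalue.toDimVector();
at::Tensor b = at::empty(sizes);
```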
ghstack-source-id: 146933314
Test Plan: CI -- if we think DimVector is good in general then I think we have to think this change is good?
Reviewed By: mikeiovine
Differential Revision: D33556198
fbshipit-source-id: cf2ad92c2d0b99ab1df4da0f6843e6ccb9a6320b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70508
We can create the code object at compile time instead of at runtime to speed it up. This also makes the compilation cache unnecessary. TODO: figure out if there's a way to cache the InterpreterState object
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D33458648
Pulled By: eellison
fbshipit-source-id: 710389741e7c6210528f2f96ab496fcd533d942a