Per many C++ code-style guides (for [example](https://google.github.io/styleguide/cppguide.html#Enumerator_Names)), members of `enum` should be CamelCased, and only defines should be ALL_CAPS.
Changes `MemOverlap`, `MemOverlapStatus` and `CmpEvalResult` enum values
Also, `YES`, `NO`, `TRUE` and `FALSE` are often system defines
Fixes, among other things, the current iOS build regression, which manifests as follows (see [this](6e90572bb9)):
```
/Users/runner/work/pytorch/pytorch/aten/src/ATen/MemoryOverlap.h:19:29: error: expected identifier
enum class MemOverlap { NO, YES, TOO_HARD };
^
/Applications/Xcode_12.4.app/Contents/Developer/Platforms/iPhoneSimulator.platform/Developer/SDKs/iPhoneSimulator14.4.sdk/usr/include/objc/objc.h:89:13: note: expanded from macro 'YES'
#define YES __objc_yes
```
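For illustration, the change is along these lines (a sketch; the exact enumerator spellings in the tree may differ slightly):
```cpp
// Before: ALL_CAPS enumerators collide with system macros such as
// Objective-C's `#define YES __objc_yes` shown above.
//   enum class MemOverlap { NO, YES, TOO_HARD };

// After: CamelCase enumerators cannot collide with those macros.
enum class MemOverlap { No, Yes, TooHard };
```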
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79772
Approved by: https://github.com/drisspg, https://github.com/kulinseth
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78297
A clone following expand/expand_as crashes due to the memory overlap check on the `copy_` native method. Refer to T118519310 for more details.
Crashing test case:
a = tensor(3,1)       // strides = (1,1)
b = tensor(3,2)       // strides = (2,1)
temp = a.expand_as(b) // creates temp with shape (3,2) and strides (1,0)
temp.clone()          // crashes on copy_ due to memory overlap
Fix: Disable the out variant for the expanded tensor.
- Calls the native clone instead of the out variant when dealing with expanded tensors (see the sketch after this list)
- Added a test case for both clone variants (out and native)
- Increased the tensor size in the memory planner test case to trigger dynamic allocation
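A minimal sketch of the fallback (hypothetical helper name; the actual static runtime code differs in detail):
```cpp
#include <ATen/ATen.h>

// Illustrative only: use the out variant only when the source is safe to copy
// into preallocated storage. An expanded tensor (e.g. the (3,2) view with
// strides (1,0) above) has internal overlap, so it falls back to native clone.
at::Tensor clone_maybe_out(const at::Tensor& src, at::Tensor& out) {
  if (!src.is_non_overlapping_and_dense()) {
    return src.clone(at::MemoryFormat::Preserve);  // native clone path
  }
  out.resize_(src.sizes());
  out.copy_(src);  // out-variant path reuses planner-managed memory
  return out;
}
```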
Test Plan:
buck test caffe2/benchmarks/static_runtime/fb:test_fb_operators
buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest
Differential Revision: D36672180
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78322
Approved by: https://github.com/mikeiovine
Summary: Previously the op was auto-generated, but it only covered the pointwise overload of `aten::max`. This adds support for the reduction overloads, both overall and along a dim.
Test Plan: Added a unit test
Differential Revision: D36656378
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78271
Approved by: https://github.com/mikeiovine
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76473
Avoid some extra heap allocations by using DimVector
ghstack-source-id: 155569314
Test Plan: Existing unit tests
Reviewed By: navahgar, huiguoo
Differential Revision: D35972439
fbshipit-source-id: 971998d6bcaaf9bb598772f1e2ca6b13f29f92a4
(cherry picked from commit f2b70c38fffe6355cd8b2f0eb36f299c0d50e5d8)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75993
Strobelight shows copy_ in embedding_bag taking up a lot of time in adfinder_story_post_ad_session_exit_model 334827604_0
{F723683014}
More details in https://fb.quip.com/MKumAjz1YD4a#temp:C:FPD3e5a0871ae5d481286b511ef7
The last 3 outputs of embedding_bag are unused in the graph: P495814049.
* max_indices output isn't necessary for the main output, so remove it when it's not used in the graph.
* offset2bag is used as an intermediate to calculate the main output, so we don't remove this output even though it's unused in the graph.
* bag_size is used as an intermediate to calculate the main output for MODE_MEAN, so we don't remove this for now.
Test Plan:
`./caffe2/caffe2/fb/predictor/scripts/run_disagg_model_benchmarks.sh 334827604 0 /data/users/ansha/tmp/ads_tail sr_only`
Inputs uploaded to `/mnt/persistent-public/ansha/ads_tail/334827604`
Before:
I0414 10:53:12.261133 1070948 PyTorchPredictorBenchLib.cpp:305] PyTorch run finished. Milliseconds per iter: 0.121318. Iters per second: 8242.78
0.11156 ms. 99.0457%. aten::embedding_bag (52 nodes, out variant)
After:
I0418 13:05:10.837378 2354604 PyTorchPredictorBenchLib.cpp:305] PyTorch run finished. Milliseconds per iter: 0.0881273. Iters per second: 11347.2
0.0789221 ms. 98.7096%. static_runtime::embedding_bag (52 nodes, out variant)
* Ads prod canary:
https://www.internalfb.com/intern/ads/canary/443002539593035806/
* 4M test: `servicelab create cogwheel_pyper_inference_fullsync_ads_inline_cvr_post_imp -a D35726594`
https://www.internalfb.com/intern/servicelab/602875732/
* 4M test: `servicelab create cogwheel_pyper_inference_fullsync_ads_10x_ctr_mbl_feed_non_mimo -a D35726594`
https://www.internalfb.com/intern/servicelab/1002874745/
Reviewed By: mikeiovine
Differential Revision: D35726594
fbshipit-source-id: 3b71a0822657bf7a23ce37ca899baef9997b011a
(cherry picked from commit fd5e3098c047a1e7d4348e1c97341eecb892536e)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74255
This change fixes a bug where `aten::full_like` reuses a previously allocated tensor that does not match the requested one when the arguments to `aten::full_like` change dynamically.
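A hedged sketch of the shape of the fix (hypothetical helper, not the actual static runtime code): only reuse the planner-provided tensor when it still matches what the current invocation asks for.
```cpp
#include <ATen/ATen.h>

// Illustrative only: reuse `planned_out` only if its dtype still matches the
// request; otherwise allocate a fresh result.
at::Tensor full_like_out_sketch(const at::Tensor& self,
                                const at::Scalar& fill_value,
                                c10::optional<at::ScalarType> dtype,
                                c10::optional<at::Tensor>& planned_out) {
  const auto want = dtype.value_or(self.scalar_type());
  if (!planned_out.has_value() || planned_out->scalar_type() != want) {
    planned_out = at::full_like(self, fill_value, self.options().dtype(want));
  } else {
    planned_out->resize_(self.sizes());
    planned_out->fill_(fill_value);
  }
  return *planned_out;
}
```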
Test Plan: - Enhanced `StaticRuntime.FullLike` to cover the modified code path.
Reviewed By: mikeiovine
Differential Revision: D34863639
fbshipit-source-id: ca6d4ee3c039e263cc3a4f643d949cea59381608
(cherry picked from commit ae7db0af5e7d95d866027abc968afcb162fd2ef8)
Summary:
The implementation of `PackedLinearWeightFp16::apply_dynamic_impl` [here](https://www.internalfb.com/code/fbsource/[b1ef7c31f022]/fbcode/caffe2/aten/src/ATen/native/quantized/cpu/qlinear_dynamic.cpp?lines=393) does not handle `relu`. It completely ignores the `ReluFused` boolean template parameter.
At this point, callers of that function handle `relu` explicitly. While the correct fix would be to honor the `ReluFused` parameter in that implementation, it is not clear whether that convention is followed consistently in this code, so we handle it in SR's out-variant implementation until the owner fixes the issue.
This issue resulted in incorrect results when Static Runtime was enabled for the MRS video model.
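An illustrative workaround only (a plain `at::linear` stands in for the packed fp16 kernel; this is not the actual SR code):
```cpp
#include <ATen/ATen.h>

// Since the underlying implementation ignores its ReluFused flag, the caller
// applies relu explicitly when the fused op was requested.
at::Tensor linear_relu_dynamic_fp16_sketch(const at::Tensor& input,
                                           const at::Tensor& weight,
                                           const at::Tensor& bias,
                                           bool relu_fused) {
  auto out = at::linear(input, weight, bias);  // stand-in for the packed fp16 kernel
  if (relu_fused) {
    out.relu_();  // what the fused kernel would have done had it honored ReluFused
  }
  return out;
}
```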
Test Plan:
```
buck run mode/opt //caffe2/benchmarks/static_runtime:static_runtime_cpptest -- --gtest_filter=StaticRuntime.QuantizedLinearReluDynamicFp16
```
Reviewed By: mikeiovine
Differential Revision: D35366309
fbshipit-source-id: e60126e3590d52681ceaee5583b81c4c0b5404d9
(cherry picked from commit cabeb96a792339e7dbfd16cb51a3ac9039812137)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73606
The single-output overload of `layer_norm` internally allocates two tensors. As an optimization, we previously added `static_runtime::layer_norm`. This variant of layer norm had two extra outputs to make the memory planner aware of these extra tensors. But these outputs were unused; it's actually better for us to avoid the allocation and associated computations entirely.
ghstack-source-id: 151394116
Test Plan: Existing unit tests
Reviewed By: hlu1
Differential Revision: D34562131
fbshipit-source-id: c6a6560e60db43b0b100aedc54ea4265acb347de
(cherry picked from commit 3bed52b6f688b93b9b032c3d2b4be68d08d8eb76)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73990
This change fixes a bug where `aten::full` reuses a previously allocated tensor that does not match the requested one when the arguments to `aten::full` change dynamically.
The same fix is applied to multiple other out variant wrappers added to Static Runtime; their fixes follow in subsequent changes.
Test Plan: - Added a unittest.
Reviewed By: mikeiovine
Differential Revision: D34768718
fbshipit-source-id: b6958d6601d36253dd5d4f93596fb14055cca9c9
(cherry picked from commit 42acb40d3a1e9359c0f1a3c25481854e5ad344b6)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73945
This change adds an out variant wrapper for `aten::ones_like`.
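A hedged sketch of the usual static runtime out-variant pattern, assuming the static runtime headers (`SROperator`, `ProcessedNode`) are available; the real wrapper also has to respect `ones_like`'s dtype/layout/device arguments.
```cpp
#include <ATen/ATen.h>
#include <torch/csrc/jit/runtime/static/ops.h>

torch::jit::SROperator make_ones_like_op_sketch() {
  return [](torch::jit::ProcessedNode* p_node) {
    const auto& self = p_node->Input(0).toTensor();
    if (p_node->Output(0).isNone()) {
      // First run: allocate the output tensor.
      p_node->Output(0) = at::ones_like(self);
    } else {
      // Later runs: reuse the memory-planner-managed tensor.
      auto& out = p_node->Output(0).toTensor();
      out.resize_(self.sizes());
      out.fill_(1);
    }
  };
}
```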
Test Plan:
- Added a unittest.
- Checked that the op execution got switched to its added out variant (P485330978).
Reviewed By: hlu1
Differential Revision: D34727057
fbshipit-source-id: 5022a7f547d53b0c00459d3959ad3c6e6a8a62d5
(cherry picked from commit 1bec4680e8173654400b165d720a0902136dba0f)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73946
This change adds an out variant wrapper for aten::zeros.
Test Plan:
- Added a unittest.
- Confirmed that the added out variant gets executed by the unittest (P485324923).
Reviewed By: mikeiovine
Differential Revision: D34725843
fbshipit-source-id: 3ac02ba1914c4a51969381e610d4243df65071ed
(cherry picked from commit 368836d51709b7f96c79114984a95606b29766b1)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73704
Empty inputs are invalid for these ops. But while looking for optimizations, I noticed that these ops just segfault when that happens, which is not helpful for users. Added a check/error message.
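A minimal sketch of the kind of guard added (hypothetical helper; the actual messages are op-specific):
```cpp
#include <ATen/ATen.h>
#include <c10/util/Exception.h>
#include <vector>

// Fail with a readable error instead of segfaulting on an empty input list.
void check_nonempty_inputs(const std::vector<at::Tensor>& inputs,
                           const char* op_name) {
  TORCH_CHECK(!inputs.empty(), op_name, ": expected at least one input tensor");
}
```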
ghstack-source-id: 150812721
Test Plan: New unit tests
Reviewed By: hlu1
Differential Revision: D34596954
fbshipit-source-id: 6b22a3a255273920210dcd41f54a9d238bbbcc14
(cherry picked from commit 9e950bfffef36c320638662bdb72f19eb805a228)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73851
This change adds an out variant wrapper for aten::ones
Test Plan: Added a unittest
Reviewed By: mikeiovine
Differential Revision: D34557095
fbshipit-source-id: 0d2ac8d0ad6f73067e28c2cebd3b4a018a9b17ae
(cherry picked from commit cc1dda957b8c3acd71de3aa6054c11a9aab5cfa6)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73450
This change uses `SROperator` for operators' function type
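For context, the function type in question is (approximately) a `std::function` over a `ProcessedNode*`, so every out-variant implementation can be written as a plain lambda of one shape:
```cpp
#include <functional>

// Approximate shape of the alias referred to above; the authoritative
// definition lives in the static runtime sources.
class ProcessedNode;  // stand-in forward declaration, for illustration only
using SROperator = std::function<void(ProcessedNode*)>;

// An operator implementation is then just a value of this type:
// SROperator op = [](ProcessedNode* p_node) { /* read inputs, write outputs */ };
```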
Test Plan: N/A
Reviewed By: mikeiovine
Differential Revision: D34483246
fbshipit-source-id: ed544bb91b676ed08983dc8dc78cedd0f77d499f
(cherry picked from commit eb9de3ad8de043990c02f30ffa48a29c8e5e81f2)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69029
This change optimizes the execution of `prim::TupleConstruct` & `prim::ListConstruct` by performing case analysis at op loading time, not op execution time.
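In miniature, the technique is to pick the specialized callable once when the node is loaded, rather than branching on every execution; a hedged sketch (not the actual static runtime code):
```cpp
#include <functional>

// Load-time case analysis: the branch on the construct kind happens once,
// and the returned callable is branch-free at execution time.
enum class ConstructKind { Tuple, List };

std::function<void()> make_construct_op(ConstructKind kind) {
  if (kind == ConstructKind::Tuple) {
    return [] { /* build a tuple from the node's inputs */ };
  }
  return [] { /* build a list from the node's inputs */ };
}
```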
Test Plan:
- Existing unittests
- Ran inline_cvr nets via ptvsc2_predictor_bench with compare_result=1
Reviewed By: swolchok
Differential Revision: D32518670
fbshipit-source-id: 575b29b06eadf77ba9f1be306119fa194d4f21bf
(cherry picked from commit 88cc2253b927267cad33063284e9cc66e0d31e2f)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71573
Many ops (`gather_ranges_to_dense`, `sigrid_transforms`, etc) are implemented like this:
```
void op_out_(std::vector<Tensor>& output) {
// actual op implementation
}
std::vector<Tensor> op() {
std::vector<Tensor> outputs;
// populate outputs with empty tensors
  op_out_(outputs);
return outputs;
}
```
This pattern is not ideal for ops that are fused with `ListUnpack` - it would be better if we wrote to the outputs directly.
This diff extends the ideas from `VarStackNodeWrapper` to allow for this. The changes are:
* `s/VarStackNodeWrapper/ProcessedNodeInputWrapper`. The old name was bad because the class is more general than the `VarStack` use case. Also moved the class to `processed_node_wrapper.h`
* Add a `ProcessedNodeOutputWrapper`; it's essentially the same idea as `ProcessedNodeInputWrapper`, but it allows non-const access to the underlying tensors.
* These classes are very similar, so CRTP is used to facilitate code reuse (a miniature of the pattern follows below)
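A miniature of that arrangement (illustrative names and element types; not the actual classes):
```cpp
#include <ATen/ATen.h>
#include <cstddef>
#include <vector>

// The CRTP base supplies the shared interface; each derived wrapper decides
// whether it hands out const or mutable tensor references.
template <typename Derived, typename TensorRef>
class NodeTensorsBase {
 public:
  size_t size() const { return static_cast<const Derived&>(*this).size_impl(); }
  TensorRef operator[](size_t i) const {
    return static_cast<const Derived&>(*this).at_impl(i);
  }
};

// Read-only view of a node's inputs.
class InputsWrapper : public NodeTensorsBase<InputsWrapper, const at::Tensor&> {
 public:
  explicit InputsWrapper(const std::vector<at::Tensor>& t) : t_(t) {}
  size_t size_impl() const { return t_.size(); }
  const at::Tensor& at_impl(size_t i) const { return t_[i]; }
 private:
  const std::vector<at::Tensor>& t_;
};

// Mutable view of a node's outputs.
class OutputsWrapper : public NodeTensorsBase<OutputsWrapper, at::Tensor&> {
 public:
  explicit OutputsWrapper(std::vector<at::Tensor>& t) : t_(t) {}
  size_t size_impl() const { return t_.size(); }
  at::Tensor& at_impl(size_t i) const { return t_[i]; }
 private:
  std::vector<at::Tensor>& t_;
};
```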
ghstack-source-id: 148825800
Test Plan: `buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest -- Stack`
Reviewed By: swolchok
Differential Revision: D33687965
fbshipit-source-id: 5fa0107211116867bb2b63968c126550d32fbea6
(cherry picked from commit 75c263d960)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71247
Most uses of toIntVector() were for a Tensor shape. We have DimVector to avoid heap allocations in those cases, so let's use it.
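A hedged illustration of the substitution (the function name is made up):
```cpp
#include <ATen/ATen.h>
#include <ATen/DimVector.h>

// DimVector is a SmallVector with inline storage sized for typical tensor
// ranks, so copying a shape out of sizes() avoids the heap allocation that
// toIntVector() / std::vector<int64_t> would incur.
void shape_copy_example(const at::Tensor& t) {
  at::DimVector shape(t.sizes().begin(), t.sizes().end());
  shape.push_back(1);              // manipulate the shape like any container
  auto reshaped = t.reshape(shape);
  (void)reshaped;
}
```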
ghstack-source-id: 146933314
Test Plan: CI -- if we think DimVector is good in general then I think we have to think this change is good?
Reviewed By: mikeiovine
Differential Revision: D33556198
fbshipit-source-id: cf2ad92c2d0b99ab1df4da0f6843e6ccb9a6320b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70508
We can create the code object at compile time instead of at runtime to speed it up. This also makes the compilation cache unnecessary. TODO: figure out if there's a way to cache the InterpreterState object
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D33458648
Pulled By: eellison
fbshipit-source-id: 710389741e7c6210528f2f96ab496fcd533d942a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69922
D32596934 (65f54bc000) made the serial stack implementation a bit brittle. It introduced a new container type: `VarStackNodeWrapper`. This type was used as a template parameter in the serial stack implementation.
The other type used in the serial stack implementation is `at::ArrayRef<at::Tensor>`. Ideally, the interface of `VarStackNodeWrapper` should be as close as possible to this other type. However, because the new container type did not have an iterator, expressions like this would fail to compile:
```
for (const auto& tensor : tensors) {
// do something
}
```
Introducing this iterator will make the code easier to maintain going forward.
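A minimal illustration of why the iterator helps (illustrative; the real iterator is richer):
```cpp
#include <cstddef>

// A thin iterator over an index is enough to make range-for and
// ArrayRef-style generic code work with a wrapper type.
template <typename Wrapper>
class WrapperIterator {
 public:
  WrapperIterator(const Wrapper& w, size_t idx) : w_(&w), idx_(idx) {}
  decltype(auto) operator*() const { return (*w_)[idx_]; }
  WrapperIterator& operator++() { ++idx_; return *this; }
  bool operator!=(const WrapperIterator& other) const { return idx_ != other.idx_; }
 private:
  const Wrapper* w_;
  size_t idx_;
};

// With begin()/end() returning WrapperIterator, this now compiles:
//   for (const auto& tensor : tensors) { /* do something */ }
```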
Test Plan:
`buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest -- Stack`
I consider this a `VarStack` implementation detail, so I'd prefer not to test it directly. We can test it implicitly by adding some code to the serial stack implementation that uses the iterator.
Reviewed By: swolchok
Differential Revision: D33101489
fbshipit-source-id: 7cf44c072d230c41bd9113cf2393bc6a6645a5b5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67223
ghstack-source-id: 146482215
Test Plan:
See perf measurements on ctr_mobile_feed local_ro net for this stack: P467203421
(local is neutral: P467267554)
Reviewed By: hlu1
Differential Revision: D31776259
fbshipit-source-id: f84fcaa05029577213f3bf2ae9d4b987b68480b3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67394
ghstack-source-id: 146464294
Test Plan:
Added new test, which failed but now passes.
Checked perf on ctr_mobile_feed local net (still not on recordio inputs yet), looks neutral
```
Stable, local
========================================
I1027 13:40:23.411118 2156917 PyTorchPredictorBenchLib.cpp:131] PyTorch predictor: number of prediction threads 1
I1027 13:40:48.708222 2156917 PyTorchPredictorBenchLib.cpp:249] PyTorch run finished. Milliseconds per iter: 6.16975. Iters per second: 162.081
I1027 13:41:13.915948 2156917 PyTorchPredictorBenchLib.cpp:249] PyTorch run finished. Milliseconds per iter: 6.1487. Iters per second: 162.636
I1027 13:41:38.984462 2156917 PyTorchPredictorBenchLib.cpp:249] PyTorch run finished. Milliseconds per iter: 6.11408. Iters per second: 163.557
I1027 13:42:04.138948 2156917 PyTorchPredictorBenchLib.cpp:249] PyTorch run finished. Milliseconds per iter: 6.13566. Iters per second: 162.982
I1027 13:42:29.342630 2156917 PyTorchPredictorBenchLib.cpp:249] PyTorch run finished. Milliseconds per iter: 6.14269. Iters per second: 162.795
I1027 13:42:29.342669 2156917 PyTorchPredictorBenchLib.cpp:264] Mean milliseconds per iter: 6.14218, standard deviation: 0.0202164
0
FixToDtypeChanges, local
========================================
I1027 13:44:59.632668 2176333 PyTorchPredictorBenchLib.cpp:249] PyTorch run finished. Milliseconds per iter: 6.11023. Iters per second: 163.66
I1027 13:45:24.894635 2176333 PyTorchPredictorBenchLib.cpp:249] PyTorch run finished. Milliseconds per iter: 6.16308. Iters per second: 162.257
I1027 13:45:50.275280 2176333 PyTorchPredictorBenchLib.cpp:249] PyTorch run finished. Milliseconds per iter: 6.17868. Iters per second: 161.847
I1027 13:46:15.637431 2176333 PyTorchPredictorBenchLib.cpp:249] PyTorch run finished. Milliseconds per iter: 6.18688. Iters per second: 161.632
I1027 13:46:40.670816 2176333 PyTorchPredictorBenchLib.cpp:249] PyTorch run finished. Milliseconds per iter: 6.10549. Iters per second: 163.787
I1027 13:46:40.670863 2176333 PyTorchPredictorBenchLib.cpp:264] Mean milliseconds per iter: 6.14887, standard deviation: 0.03843706
```
Reviewed By: hlu1
Differential Revision: D31972722
fbshipit-source-id: 7a445b325a29020b31dd2bd61e4171ecc2793b15
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70210
Add a fast-path for `VarStack` nodes for when the inputs are scalars.
Test Plan: `buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest -- VarStack`
Reviewed By: hlu1
Differential Revision: D33177498
fbshipit-source-id: 922ab76a6808fbfdb8eb6091163a380344e38de6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69479
This diff adds support for out-variant optimization for `TensorExprDynamicGroup` op, which will be used for TensorExpr based fusion in Static Runtime.
ghstack-source-id: 146107008
Test Plan:
```
buck run mode/opt //caffe2/caffe2/fb/predictor:pytorch_predictor_test
```
Completed accuracy test on inline_cvr model 294738512 v0. Results:
```
get 1012 prediction values
get 1012 prediction values
pyper_inference_e2e_local_replayer_test.out.132ea03c2 pyper_inference_e2e_local_replayer_test.out.1858bbeb0
max_error: 0 % total: 0
```
Reviewed By: d1jang, mikeiovine
Differential Revision: D32768463
fbshipit-source-id: a3e6c1ea9ff5f3b57eb89095aa79a6d426fbb52a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68750
There was some room for optimization in static runtime's `prim::VarStack`:
* Avoid refcount bumps - constructing the `std::vector<at::Tensor>` can be avoided by writing a custom version of `stack_out` that takes a `std::vector<at::Tensor*>` (see the sketch after this list)
* Skip the memory overlap check
* Avoid device dispatcher overhead in a few places (e.g. `tensor.unsqueeze -> at::native::unsqueeze`)
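A rough sketch of the direction (illustrative only, not the actual kernel):
```cpp
#include <ATen/ATen.h>
#include <vector>

// Take Tensor* to avoid the refcount bumps of building a
// std::vector<at::Tensor>, and write each input straight into a view of the
// preallocated output. The real kernel additionally calls at::native::
// functions directly to skip dispatcher overhead.
void var_stack_out_sketch(const std::vector<const at::Tensor*>& inputs,
                          at::Tensor& out, int64_t dim) {
  for (size_t i = 0; i < inputs.size(); ++i) {
    // select() gives a view of slice i along `dim`; copy_ fills it in place.
    out.select(dim, static_cast<int64_t>(i)).copy_(*inputs[i]);
  }
}
```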
Test Plan: `buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest -- Stack`
Reviewed By: swolchok
Differential Revision: D32596934
fbshipit-source-id: e8f0ccea37c48924cb4fccbfdac4e1e11da95ee0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69935
Didn't realize that `AT_DISPATCH_ALL_TYPES` should really be called `AT_DISPATCH_MOST_TYPES`.
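For context, a hedged illustration of the gap (the actual op and kernel in this diff are different): `AT_DISPATCH_ALL_TYPES` covers the standard integral and floating-point types but not `bool`, so the extra type has to be requested explicitly.
```cpp
#include <ATen/ATen.h>
#include <ATen/Dispatch.h>

// Illustrative kernel only: AT_DISPATCH_ALL_TYPES alone would not instantiate
// a bool specialization; the _AND variant adds the missing type.
void fill_with_one_example(at::Tensor& t) {
  AT_DISPATCH_ALL_TYPES_AND(
      at::ScalarType::Bool, t.scalar_type(), "fill_with_one_example", [&] {
        auto* data = t.data_ptr<scalar_t>();
        for (int64_t i = 0; i < t.numel(); ++i) {
          data[i] = static_cast<scalar_t>(1);
        }
      });
}
```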
ghstack-source-id: 145661358
Test Plan:
Added test for dtype bool.
Ran CMF local_ro net:
before:
```
I1215 12:33:49.300174 1606538 PyTorchPredictorBenchLib.cpp:279] PyTorch run finished. Milliseconds per iter: 0.966491. Iters per second: 1034.67
I1215 12:33:49.825570 1606538 PyTorchPredictorBenchLib.cpp:279] PyTorch run finished. Milliseconds per iter: 0.94867. Iters per second: 1054.11
I1215 12:33:50.349246 1606538 PyTorchPredictorBenchLib.cpp:279] PyTorch run finished. Milliseconds per iter: 0.947926. Iters per second: 1054.93
I1215 12:33:50.870433 1606538 PyTorchPredictorBenchLib.cpp:279] PyTorch run finished. Milliseconds per iter: 0.943779. Iters per second: 1059.57
I1215 12:33:51.393702 1606538 PyTorchPredictorBenchLib.cpp:279] PyTorch run finished. Milliseconds per iter: 0.947185. Iters per second: 1055.76
I1215 12:33:51.915666 1606538 PyTorchPredictorBenchLib.cpp:279] PyTorch run finished. Milliseconds per iter: 0.945672. Iters per second: 1057.45
I1215 12:33:52.438475 1606538 PyTorchPredictorBenchLib.cpp:279] PyTorch run finished. Milliseconds per iter: 0.948407. Iters per second: 1054.4
I1215 12:33:52.965337 1606538 PyTorchPredictorBenchLib.cpp:279] PyTorch run finished. Milliseconds per iter: 0.95472. Iters per second: 1047.43
I1215 12:33:53.494563 1606538 PyTorchPredictorBenchLib.cpp:279] PyTorch run finished. Milliseconds per iter: 0.967083. Iters per second: 1034.04
I1215 12:33:54.017879 1606538 PyTorchPredictorBenchLib.cpp:279] PyTorch run finished. Milliseconds per iter: 0.948945. Iters per second: 1053.8
I1215 12:33:54.017930 1606538 PyTorchPredictorBenchLib.cpp:290] Mean milliseconds per iter: 0.951888, standard deviation: 0.0083367
```
after:
```
I1215 12:32:35.820874 1594955 PyTorchPredictorBenchLib.cpp:279] PyTorch run finished. Milliseconds per iter: 0.999845. Iters per second: 1000.15
I1215 12:32:36.343147 1594955 PyTorchPredictorBenchLib.cpp:279] PyTorch run finished. Milliseconds per iter: 0.944363. Iters per second: 1058.91
I1215 12:32:36.863806 1594955 PyTorchPredictorBenchLib.cpp:279] PyTorch run finished. Milliseconds per iter: 0.942542. Iters per second: 1060.96
I1215 12:32:37.385459 1594955 PyTorchPredictorBenchLib.cpp:279] PyTorch run finished. Milliseconds per iter: 0.944677. Iters per second: 1058.56
I1215 12:32:37.905436 1594955 PyTorchPredictorBenchLib.cpp:279] PyTorch run finished. Milliseconds per iter: 0.941135. Iters per second: 1062.55
I1215 12:32:38.424907 1594955 PyTorchPredictorBenchLib.cpp:279] PyTorch run finished. Milliseconds per iter: 0.939748. Iters per second: 1064.11
I1215 12:32:38.944643 1594955 PyTorchPredictorBenchLib.cpp:279] PyTorch run finished. Milliseconds per iter: 0.941764. Iters per second: 1061.84
I1215 12:32:39.463791 1594955 PyTorchPredictorBenchLib.cpp:279] PyTorch run finished. Milliseconds per iter: 0.938946. Iters per second: 1065.02
I1215 12:32:39.987567 1594955 PyTorchPredictorBenchLib.cpp:279] PyTorch run finished. Milliseconds per iter: 0.95437. Iters per second: 1047.81
I1215 12:32:40.511204 1594955 PyTorchPredictorBenchLib.cpp:279] PyTorch run finished. Milliseconds per iter: 0.959139. Iters per second: 1042.6
I1215 12:32:40.511242 1594955 PyTorchPredictorBenchLib.cpp:290] Mean milliseconds per iter: 0.950653, standard deviation: 0.0184761
```
Reviewed By: hlu1
Differential Revision: D33106675
fbshipit-source-id: 5bb581f8d0ed22ef08df1936dc8d67045e44e862