pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
Mike Iovine	7b0650d5cf	Back out "[static-runtime] change the backend for permute_copy" (#89463 ) Summary: This permute copy change seems to be causing huge regressions on machines without AVX512. Revert to mitigate. This shouldn't be problematic since the improvement from changing it was super small anyways. Differential Revision: D41450088 Pull Request resolved: https://github.com/pytorch/pytorch/pull/89463 Approved by: https://github.com/hlu1	2022-11-22 06:26:10 +00:00
Huy Do	8cb5c5543e	Revive static_runtime_benchmark build and test (#87660 ) This build uses the wrong BUILD_ENVIRONMENT `pytorch-linux-focal-py3`, so it hasn't been run for a long time (it was forgotten). The name was probably the old name of the build environment we used in the past; the convention today doesn't have the `pytorch-` prefix. There is a TODO for this: > TODO: this condition is never (BUILD_ENVIRONMENT doesn't start with pytorch-), need to fix this. This is done as part of [T131829540](https://www.internalfb.com/intern/tasks/?t=131829540), where we want `static_runtime_benchmark` build and test jobs to run in OSS CI to avoid breaking internal. * I also fixed some compiler warnings that are treated as errors (`-Werror=sign-compare`, `-Werror,-Wunused-const-variable`) and a gcc7 compatibility issue along the way, because this hasn't been run for a long time. * Reviving this test also revealed a small bug in the `PrepackWeights` test in `test_static_runtime.cc`, added recently in https://github.com/pytorch/pytorch/pull/85289. The test refers to an internal op and should only be run internally. This has been fixed by https://github.com/pytorch/pytorch/pull/87799 (to be merged). Pull Request resolved: https://github.com/pytorch/pytorch/pull/87660 Approved by: https://github.com/malfet	2022-11-08 08:32:45 +00:00
Mike Iovine	dd43903fa9	[Static Runtime] Fix tensor_split sections overload (#88113 ) Summary: D40798763 broke this op. Unfortunately, it wasn't caught at land time due to the recent OSS Static Runtime test problems. The problem is C++ overload resolution. After D40798763, the int that we were passing to `at::native::tensor_split` was getting implicitly converted to `IntArrayRef`. Fix this by converting the int to a `SymInt` and calling the correct overload. Test Plan: ``` buck2 test caffe2/benchmarks/static_runtime:static_runtime_cpptest -- Tensor_Split --run-disabled ``` Differential Revision: D40862394 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88113 Approved by: https://github.com/hlu1	2022-11-07 14:36:39 +00:00
Mike Iovine	23fe6c8ca1	[Static Runtime] Fix ReplaceWithMaybeCopy test in OSS (#88099 ) Summary: `ReplaceWithMaybeCopy` is guarded by `FBCODE_CAFFE` in `OptimizeGraph`. Run the pass manually to ensure it does the replacement. Test Plan: Existing tests Differential Revision: D40858743 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88099 Approved by: https://github.com/huydhn	2022-11-01 09:58:26 +00:00
Mike Iovine	81c4049f4d	[Static Runtime] Move PrepackWeights to internal-only graph passes (#87799 ) Summary: The pass introduces an `fb::` operator and thus cannot be used in OSS. The test failure was not exposed because the Static Runtime tests have been disabled in OSS for a while. The Dev Infra folks encountered this failure when re-enabling the tests. Test Plan: Existing tests Differential Revision: D40724547 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87799 Approved by: https://github.com/huydhn	2022-10-28 01:28:34 +00:00
Mike Iovine	ed7a8ab436	[Static Runtime] Make canEnableStaticRuntime examine sub-blocks (#87396 ) Summary: Someone was running into problems where 1) Static Runtime enablement would fail 2) We would try to fall back to the JIT interpreter after trying to create `StaticModule` 3) The fallback fails because Static Runtime mangled the graph. We don't want to prevent Static Runtime from mutating its input due to memory concerns. The intent of `canEnableStaticRuntime` is to catch issues in the module before Static Runtime messes with it. With this diff, `StaticModule` instantiation can be avoided by querying `canEnableStaticRuntime` and the issue is fixed. Test Plan: New unit test Differential Revision: D40564452 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87396 Approved by: https://github.com/tenpercent	2022-10-26 14:34:29 +00:00
Max Podkorytov	4a168e9941	[static-runtime] run codegen (#87534 ) Summary: ``` buck run //caffe2/torch/fb/jit:gen_static_runtime_ops ``` Test Plan: CI Differential Revision: D40612521 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87534 Approved by: https://github.com/mikeiovine	2022-10-25 23:48:16 +00:00
Mike Iovine	ddec1eea05	[Static Runtime] Block linalg_svdvals codegen & run codegen script (#85983 ) Summary: The test is causing issues: ``` terminate called after throwing an instance of 'std::runtime_error' what(): The following operation failed in the TorchScript interpreter. Traceback of TorchScript (most recent call last): graph(%A: Tensor, %driver: str?): %bias: None = prim::Constant() %ret = aten::linalg_svdvals(%A, %driver) ~~~~ <--- HERE %cloned = aten::clone(%ret, %bias) return (%cloned) RuntimeError: torch.linalg.svd: keyword argument `driver=` is only supported on CUDA inputs with cuSOLVER backend. ``` Just block the op and re-run the codegen script to remove everything and update the generated ops. Test Plan: Existing tests Differential Revision: D39973860 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85983 Approved by: https://github.com/xuzhao9, https://github.com/tenpercent	2022-10-06 01:07:40 +00:00
Mike Iovine	7d8ee38a5c	[Static Runtime] Fix prim::If tuple corner case (#85446 ) Summary: We currently assume that a tuple output implies that the prim::If node returns multiple unpacked outputs, but this is not guaranteed to be the case. Add some logic to return the wrapped tuple if necessary Test Plan: New unit test Differential Revision: D39712050 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85446 Approved by: https://github.com/tenpercent	2022-09-24 01:01:34 +00:00
Mike Iovine	63c1f2fef9	[Static Runtime] Fold linear prepack ops (#85289 ) Summary: Split `quantized_linear_unpacked_weight_v2` into `linear_prepack` and `quantized_linear` so that the prepacking operation may be eliminated by constant folding. Test Plan: Fixes a huge regression in an internal model: ``` Before 89.6141 ms. 99.0923%. fb::quantized_linear_unpacked_weight_v2 (12 nodes) After 0.806852 ms. 53.5365%. quantized::linear (12 nodes, out variant) (prepacking eliminated) ``` Differential Revision: D39622530 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85289 Approved by: https://github.com/davidberard98	2022-09-22 20:23:07 +00:00
Mike Iovine	e4899764b2	[Static Runtime] Fix aten::index_put list conversions (#85298 ) Summary: Apparently static runtime's list construct return value is always a `GenericList`, so we cannot use the `toOptionalTensorList` method in the general case -- we must convert each item individually. Test Plan: New unit test Differential Revision: D39628979 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85298 Approved by: https://github.com/tenpercent	2022-09-22 20:21:52 +00:00
Max Podkorytov	7f90606309	[static-runtime] update generator for the modified tests; re-run autogen script (#84437 ) Test Plan: CI Reviewed By: mikeiovine Differential Revision: D39183148 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84437 Approved by: https://github.com/mikeiovine	2022-09-06 20:07:56 +00:00
Max Podkorytov	bf62ece536	[static-runtime] add schema checks to most of the ops where these checks are missing (#84163 ) Test Plan: existing unit tests; also fix some failing ones along the way Differential Revision: D39074902 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84163 Approved by: https://github.com/mikeiovine	2022-09-01 17:21:22 +00:00
Mike Iovine	db7784e722	[Static Runtime] Schema checks for index_put (#84152 ) Summary: `index_put` can take a list of tensors, but Static Runtime always tries to convert its argument to a list of optional tensors. This was causing crashes for some users. Add some schema checks to prevent this, and add a new overload for the new case. Also, I found a clear bug in the JIT interpreter (mutating the argument when it's not supposed to), so I fixed that too. Test Plan: New unit test Differential Revision: D39072214 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84152 Approved by: https://github.com/tenpercent	2022-08-31 01:20:14 +00:00
Mike Iovine	09157c76c0	[Static Runtime] Add schema checks for aten::list (#83753 ) Summary: The previous implementation assumed that there was only one overload and unconditionally tried to convert its input into a string. Some users were running into crashes because of this. Added a new overload for the list overload and schema checks. Also, I managed to uncover another bug when writing tests for this case (yikes). Returning inputs didn't work because the input cleanup process would destroy the output. Extended `CreateOwnedRefsForSpecialIValues` to fix that. Test Plan: CI + new unit tests Differential Revision: D38870803 Pull Request resolved: https://github.com/pytorch/pytorch/pull/83753 Approved by: https://github.com/tenpercent, https://github.com/albanD	2022-08-22 13:42:47 +00:00
Max Podkorytov	68d2d7866d	[static-runtime] change the backend for permute_copy (#83532 ) Summary: Testing wrappable dims Differential Revision: D38717563 Pull Request resolved: https://github.com/pytorch/pytorch/pull/83532 Approved by: https://github.com/mikeiovine	2022-08-17 18:10:36 +00:00
Will Constable	4f34cd6d1e	Replace all CHECK_ and DCHECK_ with TORCH_* macros (#82032 ) Avoid exposing defines that conflict with google logging, since this blocks external usage of libtorch in certain cases. All the 'interesting' changes should be in these two files, and the rest should just be mechanical changes via sed. c10/util/logging_is_not_google_glog.h c10/util/logging_is_google_glog.h Fixes https://github.com/pytorch/pytorch/issues/81415 cc @miladm @malfet Pull Request resolved: https://github.com/pytorch/pytorch/pull/82032 Approved by: https://github.com/soumith, https://github.com/miladm	2022-07-26 01:20:44 +00:00
Akshay Parashar	38169c2287	[Static Runtime] Fix precision error in test cases (#80935 ) Summary: - Test cases related to DeepAndWideSciptModel() were crashing at random due to a precision issue - Affected test cases: DeepWide, KWargsAPI_1, KWargsAPI_2, KWargsAPI_Optional, FusionPass - The failure was not always observed because the model input is random (via torch::randn) - Increase the absolute tolerance for these test cases Differential Revision: D37639067 Pull Request resolved: https://github.com/pytorch/pytorch/pull/80935 Approved by: https://github.com/mikeiovine	2022-07-06 16:31:18 +00:00
Hui Guo	a622e3e14d	[static runtime] Fix linalg_solve test case (#79971 ) Summary: Added the third bool argument to the linalg_solve test case to fix the runtime error. Test Plan: buck run mode/opt caffe2/benchmarks/static_runtime:static_runtime_cpptest Reviewed By: mikeiovine Differential Revision: D37324419 Pull Request resolved: https://github.com/pytorch/pytorch/pull/79971 Approved by: https://github.com/tenpercent	2022-06-22 16:31:25 +00:00
Max Podkorytov	bf75708ce4	[static-runtime] add nnc codegen for aten::div (#76903 ) Differential Revision: D36151087 Pull Request resolved: https://github.com/pytorch/pytorch/pull/76903 Approved by: https://github.com/mikeiovine	2022-06-22 05:47:44 +00:00
Hui Guo	0545c85f74	[static runtime] Add JIT prim ops: aten::cpu, aten::list, aten::numel, aten::__range_length (#79111 ) Summary: This adds the missing JIT prim ops that appear in the non-ads models for the c2->pt mitigation: aten::cpu, aten::list, aten::numel, aten::__range_length Test Plan: static runtime unit tests Differential Revision: D36984960 Pull Request resolved: https://github.com/pytorch/pytorch/pull/79111 Approved by: https://github.com/davidberard98	2022-06-18 16:38:58 +00:00
Hui Guo	aee9762a51	[static runtime] Disable unit test for linalg_svdvals (#79574 ) Summary: The test throws a JIT alias analysis "not supported" error. Disabling it for now. Test Plan: buck run mode/opt caffe2/benchmarks/static_runtime:static_runtime_cpptest Reviewed By: mikeiovine Differential Revision: D37056032 Pull Request resolved: https://github.com/pytorch/pytorch/pull/79574 Approved by: https://github.com/mikeiovine	2022-06-15 20:35:19 +00:00
Hui Guo	8d7fcfa8f1	[static runtime] Add native ops: aten::index_put, aten::item, aten::tensor_split (#79065 ) Summary: This adds the pytorch operators that are currently missing in non-ads models from c2->pt mitigation: aten::index_put, aten::item, aten::tensor_split Test Plan: buck run mode/opt caffe2/benchmarks/static_runtime:static_runtime_cpptest Differential Revision: D36984961 Pull Request resolved: https://github.com/pytorch/pytorch/pull/79065 Approved by: https://github.com/davidberard98	2022-06-15 19:15:34 +00:00
Akshay Parashar	28f87b9cf9	[Static Runtime] Fix aten::clone out variant (#78297 ) (#78322 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/78297 The clone out variant crashes when clone follows expand/expand_as, due to the memoryOverlap check in the native copy_ method. Refer to T118519310 for more details. Crashing test case: `a = tensor(3,1)` (strides (1,1)); `b = tensor(3,2)` (strides (2,1)); `temp = a.expand_as(b)` creates temp with shape (3,2) and strides (1,0); `temp.clone()` crashes in copy_ due to memoryOverlap. Fix: Disable the out variant for expanded tensors. - Call native clone instead of the out variant when cloning expanded tensors - Added test cases for both clone variants (out and native) - Increased the tensor size in the memory planner test case to trigger dynamic allocation Test Plan: buck test caffe2/benchmarks/static_runtime/fb:test_fb_operators buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest Differential Revision: D36672180 Pull Request resolved: https://github.com/pytorch/pytorch/pull/78322 Approved by: https://github.com/mikeiovine	2022-06-02 21:06:59 +00:00
Max Podkorytov	ebfc70f37a	[static-runtime] out variant for aten::mean (#78161 ) Summary: As subject Test Plan: Added unit tests Differential Revision: D36614633 Pull Request resolved: https://github.com/pytorch/pytorch/pull/78161 Approved by: https://github.com/mikeiovine	2022-06-02 20:56:42 +00:00
Max Podkorytov	2679755bdc	[static-runtime] out variant for aten::max (#78271 ) Summary: Previously the op was auto-generated but it only covered the pointwise overload of aten::max. This adds support for reduction, overall and along a dim Test Plan: Added a unit test Differential Revision: D36656378 Pull Request resolved: https://github.com/pytorch/pytorch/pull/78271 Approved by: https://github.com/mikeiovine	2022-05-26 23:29:27 +00:00
Hui Guo	d12bf9fd75	[static_runtime] Add auto-generated view ops (#77106 ) Summary: This includes the generated view ops from D36258767. Test Plan: buck run mode/opt //caffe2/benchmarks/static_runtime:static_runtime_cpptest Differential Revision: D36258968 Pull Request resolved: https://github.com/pytorch/pytorch/pull/77106 Approved by: https://github.com/alanwaketan, https://github.com/tenpercent	2022-05-26 03:13:59 +00:00
mikeiovine	56c23f5633	[SR] Out variant for embedding_bag_byte_unpack Pull Request resolved: https://github.com/pytorch/pytorch/pull/77661 Add an out variant and wrapper in static runtime. I just added the declaration with the others in `qembeddingbag.h` for now (rather than properly adding the out variant to the torch library). This can be fixed in a followup. Differential Revision: [D36449840](https://our.internmc.facebook.com/intern/diff/D36449840/) NOTE FOR REVIEWERS: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D36449840/)! Approved by: https://github.com/tenpercent	2022-05-25 23:24:11 +00:00
mikeiovine	2ae3c59e4b	[SR] Remove linear/relu fusion Pull Request resolved: https://github.com/pytorch/pytorch/pull/77620 Apparently, this is not implemented in fbgemm, so it's strictly worse than using NNC. Differential Revision: [D36431811](https://our.internmc.facebook.com/intern/diff/D36431811/) Approved by: https://github.com/hlu1	2022-05-23 21:46:27 +00:00
Hao Lu	c60d2ef4eb	[StaticRuntime] Replace Permute with copy version only when it's followed by reshape or flatten (#77832 ) Reviewed By: mikeiovine Differential Revision: D36466622 Pull Request resolved: https://github.com/pytorch/pytorch/pull/77832 Approved by: https://github.com/mikeiovine	2022-05-20 03:14:01 +00:00
mikeiovine	02713221e3	[SR] Fuse clamp/nan_to_num Pull Request resolved: https://github.com/pytorch/pytorch/pull/77094 Fuse `clamp` and `nan_to_num` in an NNC kernel. This leads to a big speed up on many models. We can avoid comparisons since clamp potentially gets rid of all of the `inf`s in the input tensor. Differential Revision: [D36220967](https://our.internmc.facebook.com/intern/diff/D36220967/) NOTE FOR REVIEWERS: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D36220967/)! Approved by: https://github.com/navahgar	2022-05-10 23:33:59 +00:00
Mike Iovine	849984a2cd	[SR] Sigmoid out variant calls fast_sigmoid (#75661 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/75661 `fast_sigmoid` is a variant of sigmoid in NNC that is implemented in terms of `fast_tanh` (which is a fast rational function approximation). ghstack-source-id: 155604086 Reviewed By: navahgar, hlu1 Differential Revision: D35481390 fbshipit-source-id: 1d64b5c375539f3b2461a1f3d9b86cd696eae7a1 (cherry picked from commit 8106c2512b8d7b373cb6545a43c3e8fc04805c4b)	2022-05-06 00:14:30 +00:00
Mike Iovine	1fed6b7559	[SR] Eliminate extra permutes around softmax calls (#76391 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/76391 I've seen this pattern in many important internal models: `x = torch.permute(a, [0, 2, 1]); y = torch.softmax(x, 2); z = torch.permute(y, [0, 2, 1])`. This is equivalent to `z = torch.softmax(a, 1)`. The `permute` ops can degrade performance, especially if copy variants are on. Add another pattern to our `EliminateExtraPermuteOpsPass` to handle this. ghstack-source-id: 155466506 Test Plan: New unit tests Reviewed By: navahgar, huiguoo Differential Revision: D35938289 fbshipit-source-id: 398b5528077b0b3f1c6fc5544e483803e96d68e9 (cherry picked from commit d742abd094d1fef23ca6a34703d97a6da2d14bd1)	2022-05-04 23:08:49 +00:00
Mike Iovine	cac2733af1	[SR] Codegen for aten::clamp (#76340 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/76340 NNC kernel for `clamp` scalar case ghstack-source-id: 155466507 Reviewed By: navahgar, huiguoo Differential Revision: D35904019 fbshipit-source-id: e4115757f7e2cbdf364b88be3f599dfc3028750f (cherry picked from commit bdc4b918bc5a14490f46c79793f764b28c18388f)	2022-05-04 23:08:49 +00:00
Hui Guo	bcddd4ab3e	[Static Runtime] Add auto generated unstructured ops (#76398 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/76398 This diff adds the large files that include the newly generated ops from D34913736. Refer to the base diff for more details. Test Plan: buck run mode/opt //caffe2/benchmarks/static_runtime:static_runtime_cpptest Reviewed By: mikeiovine, tenpercent Differential Revision: D35945633 fbshipit-source-id: 53497bd5c490a57ea1521837762f740deb42bfd8 (cherry picked from commit e0fbdcb0bf09f5c192430f95f450c0a946c80074)	2022-05-04 19:34:19 +00:00
Mike Iovine	fc64dbdc01	[SR] Fuse quantized linear/relu (#75775 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/75775 fbgemm kernels already implement the fused kernel, no reason not to use it ghstack-source-id: 155450342 Test Plan: New unit tests Reviewed By: navahgar Differential Revision: D35633297 fbshipit-source-id: a744a33a65ce7dbb9ce8900dbe091b6d56dd4e48 (cherry picked from commit b1361b349862715aa17e6318c5e658cd6401a464)	2022-05-04 19:01:14 +00:00
Mike Iovine	b02b3f25db	[SR] Quick hack to eliminate no-op slice (#75774 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/75774 `list[0:]` is a no-op. This should really be eliminated on the modeling side; implement it as a graph pass for now until we can get this into prod models. Test Plan: New unit tests Reviewed By: navahgar Differential Revision: D35632947 fbshipit-source-id: 0c564193c35039130e99172e0185e124ea24f62d (cherry picked from commit e01d5273185e39a563c7acb15662d9c1549d4b58)	2022-05-03 19:29:46 +00:00
Mike Iovine	3fa77fa51a	[SR] Fix quantized linear tests not managing outputs (#75776 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/75776 The output was returned directly instead of a clone, so the output of the relevant op would not be managed. ghstack-source-id: 154935103 Test Plan: CI Reviewed By: navahgar Differential Revision: D35633469 fbshipit-source-id: 7b08b7368e0349a12abf8802a4c625ffecdc5abb (cherry picked from commit 24bed9ba4da39cff7f3b40f5e49dfded2552b373)	2022-04-27 16:38:54 +00:00
Ansha Yu	ee636e2fd1	[sr] remove max_indices argument of embedding_bag when unnecessary (#75993 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/75993 Strobelight shows copy_ in embedding_bag taking up a lot of time in adfinder_story_post_ad_session_exit_model 334827604_0 {F723683014} More details in https://fb.quip.com/MKumAjz1YD4 (`1f47a80e88`)a#temp:C:FPD3 (`ecd5567980`)e5a0871ae5d481286b511ef7 The last 3 outputs of embedding_bag are unused in the graph: P495814049. * max_indices output isn't necessary for the main output, so remove it when it's not used in the graph. * offset2bag is used as an intermediate to calculate the main output, so we don't remove this output even though it's unused in the graph. * bag_size is used as an intermediate to calculate the main output for MODE_MEAN, so we don't remove this for now. Test Plan: `./caffe2/caffe2/fb/predictor/scripts/run_disagg_model_benchmarks.sh 334827604 0 /data/users/ansha/tmp/ads_tail sr_only` Inputs uploaded to `/mnt/persistent-public/ansha/ads_tail/334827604` Before: I0414 10:53:12.261133 1070948 PyTorchPredictorBenchLib.cpp:305] PyTorch run finished. Milliseconds per iter: 0.121318. Iters per second: 8242.78 0.11156 ms. 99.0457%. aten::embedding_bag (52 nodes, out variant) After: I0418 13:05:10.837378 2354604 PyTorchPredictorBenchLib.cpp:305] PyTorch run finished. Milliseconds per iter: 0.0881273. Iters per second: 11347.2 0.0789221 ms. 98.7096%. static_runtime::embedding_bag (52 nodes, out variant) * Ads prod canary: https://www.internalfb.com/intern/ads/canary/443002539593035806/ * 4M test: `servicelab create cogwheel_pyper_inference_fullsync_ads_inline_cvr_post_imp -a D35726594` https://www.internalfb.com/intern/servicelab/602875732/ * 4M test: `servicelab create cogwheel_pyper_inference_fullsync_ads_10x_ctr_mbl_feed_non_mimo -a D35726594` https://www.internalfb.com/intern/servicelab/1002874745/ Reviewed By: mikeiovine Differential Revision: D35726594 fbshipit-source-id: 3b71a0822657bf7a23ce37ca899baef9997b011a (cherry picked from commit fd5e3098c047a1e7d4348e1c97341eecb892536e)	2022-04-22 15:36:35 +00:00
Mike Iovine	b6a4234090	[SR] Fix broken unit test build (#76111 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/76111 https://github.com/pytorch/pytorch/pull/68640 broke our build by porting `cat` structured kernels, not sure how CI didn't catch this ghstack-source-id: 154335722 Test Plan: CI Reviewed By: navahgar, ajyu Differential Revision: D35780296 fbshipit-source-id: 0a262eb06a8d619227e5db10b6a775bf0b2e17c1 (cherry picked from commit aea6fbf9365391011df5211164e3978075d7a5cb)	2022-04-20 18:36:31 +00:00
mikeiovine	98b4a4100d	[SR] Add a copy variant for fused_split_and_squeeze Pull Request resolved: https://github.com/pytorch/pytorch/pull/75660 The outputs of `split_and_squeeze` are passed to `VarStack` in models we care about. `VarStack` has a [fast path](https://www.internalfb.com/code/fbsource/[893193f5277184fd17f4ea3f28fe415a4df37707]/fbcode/caffe2/aten/src/ATen/native/TensorShape.cpp?lines=296-298) for when all of its inputs have the same strides. Hitting the slow path adds a ton of extra overhead - so much that it's worth it to copy in `split_and_squeeze` and force all of `VarStack`'s inputs to be contiguous so we can take advantage of the fast path. Differential Revision: [D35513777](https://our.internmc.facebook.com/intern/diff/D35513777/) NOTE FOR REVIEWERS: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D35513777/)! Approved by: https://github.com/hlu1	2022-04-13 20:02:01 +00:00
Mike Iovine	2f98fa9147	[SR] Do not manage tensors that escape scope via container (#74966 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/74966 It's clear that we don't want to manage tensors that escape their scope. Previously, we handled this by checking whether the tensor aliased the graph outputs. But there's actually another way to escape scope: by aliasing the wildcard set. The following graph demonstrates this: ``` def forward(self, cond: bool, a, b): lst = [] if cond: res = a + b # res should not be managed!!! lst.append(res) return lst ``` The `if cond:` sub-block returns nothing, but `res` escapes the scope through `lst`. The fix is simple: we simply have to mark values that alias the wildcard set as an `external_alias_` in `ValueGroup`. This diff also exposed another issue (via unit tests) in `checkOutputTensorMemoryLeaks`: it assumes that, if a node's `Value*` is managed, the underlying `IValue` must be a tensor. But this is not true after the addition of `to_maybe_copy_out`; TMCO does not produce a tensor in its first output slot if it does not copy. ghstack-source-id: 153288188 Test Plan: New unit tests cover the problematic case Reviewed By: navahgar Differential Revision: D35257087 fbshipit-source-id: 853a761dffe51f2c70720759664dd8dfcd56d1d7 (cherry picked from commit 2c7f519354041975f33626eab6b7f16c2494bbf8)	2022-04-07 19:57:57 +00:00
Mike Iovine	4055d1f653	[SR] Fix StaticRuntime move ctor (#74927 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/74927 The move ctor was broken because `BlockRunner` stores a reference to `values_`. When moving runtime instances, the pointer to the root block would be moved, but the reference inside it would not be updated. Pass `BlockRunner` a raw pointer to the heap-allocated IValues instead to avoid this issue. ghstack-source-id: 153168602 Test Plan: New unit test/CI Reviewed By: navahgar Differential Revision: D35228467 fbshipit-source-id: 04e198b39f898b82677a0e41e1cdf00c2b0c09f3 (cherry picked from commit 03e2c591ac3a907d68025eae9500ed7226dec17e)	2022-04-07 02:16:37 +00:00
Don Jang	85e163c56b	[Static Runtime] Fix a bug that `aten::full_like` reuses a tensor that does not match arguments (#74255 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/74255 This change fixes a bug in which `aten::full_like` reuses a previously allocated tensor that does not match the requested one when the arguments to `aten::full_like` change dynamically. Test Plan: - Enhanced `StaticRuntime.FullLike` to cover the modified code path. Reviewed By: mikeiovine Differential Revision: D34863639 fbshipit-source-id: ca6d4ee3c039e263cc3a4f643d949cea59381608 (cherry picked from commit ae7db0af5e7d95d866027abc968afcb162fd2ef8)	2022-04-05 22:30:41 +00:00
Raghavan Raman	60bda4d06b	[Static Runtime] Fix handling relu in quantized linear relu dynamic op Summary: The implementation of `PackedLinearWeightFp16::apply_dynamic_impl` [here](https://www.internalfb.com/code/fbsource/[b1ef7c31f022]/fbcode/caffe2/aten/src/ATen/native/quantized/cpu/qlinear_dynamic.cpp?lines=393) does not handle `relu`. It completely ignores the `ReluFused` boolean template parameter. At this point, callers of that function handle `relu` explicitly. While the correct thing to do would be to handle the `ReluFused` parameter in that implementation, it is not clear if that semantics is being followed in this code. So, we are handling this in SR's out-variant implementation, until the owner fixes that issue. This issue resulted in incorrect results when Static Runtime was enabled for the MRS video model. Test Plan: ``` buck run mode/opt //caffe2/benchmarks/static_runtime:static_runtime_cpptest -- --gtest_filter=StaticRuntime.QuantizedLinearReluDynamicFp16 ``` Reviewed By: mikeiovine Differential Revision: D35366309 fbshipit-source-id: e60126e3590d52681ceaee5583b81c4c0b5404d9 (cherry picked from commit cabeb96a792339e7dbfd16cb51a3ac9039812137)	2022-04-04 22:16:22 +00:00
Max Podkorytov	11c412a8ec	[static-runtime] optimize empty if blocks at runtime (#74987 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/74987 Add specializations to the `prim::If` operator at runtime to save resources when some of the sub-blocks are empty Test Plan: `buck build //caffe2:torch-cpp-cpu` `buck test //caffe2/benchmarks/static_runtime/...` Add unit test: `buck test //caffe2/benchmarks/static_runtime:static_runtime_cpptest -- StaticRuntime.EmptyIfBlock` Reviewed By: mikeiovine Differential Revision: D35262952 fbshipit-source-id: 324f88471f33f035f4d8a9b212716530d8e59df2 (cherry picked from commit 2db1b1a6833b1376fa376f54791effc8e12fb77f)	2022-04-01 05:43:33 +00:00
Mike Iovine	2ca66ffb7d	[SR] Force split_and_squeeze usage via graph transformation (#74274 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/74274 Reviewed By: navahgar Differential Revision: D34913889 fbshipit-source-id: 655d3f1e5f4c027cb94758b74826a4b4882e9458 (cherry picked from commit bc94d30b69888ca6633a27090a3b87a08919231a)	2022-03-29 19:13:40 +00:00
Mike Iovine	3f37337ed0	[SR] Native implementation for reshape_as (#74585 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/74585 Native static runtime for `aten::reshape_as` ghstack-source-id: 152340038 Test Plan: New unit test Reviewed By: hlu1 Differential Revision: D35060895 fbshipit-source-id: c4e6f8a04c7df3821c7e654bfaf584e5a72ea701 (cherry picked from commit 6fa596cd866a024b6653239e0e30ddad42de242f)	2022-03-28 17:02:14 +00:00
Mike Iovine	9f2344aa40	[SR] Native implementation for select (#74568 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/74568 Native static runtime implementation for `aten::select(Tensor, int, int)` overload ghstack-source-id: 152340037 Test Plan: New unit test Reviewed By: hlu1 Differential Revision: D35053900 fbshipit-source-id: c315d4202a4dfca3360325547af805aea33ecc9f (cherry picked from commit 8683f214dbd8c081365bad727007bbff969b64d0)	2022-03-28 17:02:14 +00:00
Mike Iovine	facdbe6d72	[SR] Native implementation for IntImplicit (#74562 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/74562 Add a native implementation for `aten::IntImplicit`, which is similar to `aten::Int` except for a few extra checks it must do ghstack-source-id: 152340039 Test Plan: New unit tests Reviewed By: hlu1 Differential Revision: D35052997 fbshipit-source-id: cb2f0faf7c62382e3f13750d8e1280c49c6b9e42 (cherry picked from commit 359c7493f8deaeccebc27e1b6e6e9777850010c1)	2022-03-28 17:02:14 +00:00


241 Commits
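The crashing case in 28f87b9cf9 comes down to a zero stride: after `expand_as`, several logical elements of `temp` share one storage slot. The minimal pure-Python sketch below assumes only the shapes and strides quoted in that commit. It enumerates every element's storage offset and reports overlap when two elements land on the same offset. The helpers `element_offsets` and `has_internal_overlap` are hypothetical illustrations; this brute-force check is not ATen's own overlap test.

```python
from itertools import product

def element_offsets(shape, strides):
    """Storage offset of every logical element of a strided view, in row-major index order."""
    return [sum(i * s for i, s in zip(idx, strides))
            for idx in product(*(range(n) for n in shape))]

def has_internal_overlap(shape, strides):
    """True if two distinct logical elements map to the same storage offset."""
    offsets = element_offsets(shape, strides)
    return len(offsets) != len(set(offsets))

# Shapes and strides from the commit's crashing test case.
a_layout = ((3, 1), (1, 1))     # a = tensor(3, 1)
b_layout = ((3, 2), (2, 1))     # b = tensor(3, 2)
temp_layout = ((3, 2), (1, 0))  # temp = a.expand_as(b)

for name, (shape, strides) in [("a", a_layout), ("b", b_layout), ("temp", temp_layout)]:
    print(name, has_internal_overlap(shape, strides))
# prints: a False / b False / temp True
```

Enumerating offsets costs O(numel), which is fine for illustration but is not how a kernel would decide overlap for large tensors.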
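Commit 02713221e3 notes that fusing `clamp` with `nan_to_num` lets the kernel skip comparisons, because clamp can remove the `inf`s. The scalar sketch below illustrates why, under two assumptions the commit does not spell out: `clamp` runs first with finite bounds, and `clamp` propagates NaN (as `torch.clamp` does). Under those assumptions the clamped value is either finite or NaN, so the fused form only needs a NaN check. All three functions are hypothetical stand-ins, not the NNC kernel.

```python
import math
import sys

def clamp(x, lo, hi):
    """Scalar clamp; assumed to propagate NaN the way torch.clamp does."""
    if math.isnan(x):
        return x
    return min(max(x, lo), hi)

def nan_to_num(x, nan=0.0, posinf=sys.float_info.max, neginf=-sys.float_info.max):
    """Scalar nan_to_num: replace NaN, +inf and -inf with finite values."""
    if math.isnan(x):
        return nan
    if x == math.inf:
        return posinf
    if x == -math.inf:
        return neginf
    return x

def clamp_nan_to_num_fused(x, lo, hi, nan=0.0):
    """Hypothetical fused form; only valid when lo and hi are finite.

    With finite bounds the clamped value can never be +/-inf, so the
    infinity comparisons of nan_to_num are dropped and only NaN is checked.
    """
    if math.isnan(x):
        return nan
    return min(max(x, lo), hi)
```

If either bound could be infinite, the shortcut no longer holds and the infinity checks would have to stay.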
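Commit 849984a2cd says NNC's `fast_sigmoid` is implemented in terms of `fast_tanh`. Any such construction rests on the identity sigmoid(x) = (1 + tanh(x / 2)) / 2. The sketch below checks that identity numerically, using `math.tanh` in place of the rational approximation, whose coefficients the commit does not give. The `tanh` parameter is a hypothetical hook where an approximation could be plugged in.

```python
import math

def sigmoid(x):
    """Reference logistic sigmoid, written to avoid overflow in math.exp."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def sigmoid_via_tanh(x, tanh=math.tanh):
    """sigmoid(x) = 0.5 * (1 + tanh(x / 2)).

    `tanh` is a hypothetical hook: pass an approximation here to mimic a
    fast_sigmoid built on a fast_tanh.
    """
    return 0.5 * (1.0 + tanh(0.5 * x))
```

For large negative x, the `1 + tanh(x / 2)` term suffers cancellation, so the tanh form loses relative accuracy there even with an exact tanh.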
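The rewrite in 1fed6b7559 relies on a chain of `permute([0, 2, 1])`, softmax over dim 2, and `permute([0, 2, 1])` being the same as softmax over dim 1 of the original tensor `a`. The pure-Python sketch below checks this on a small nested-list "tensor" of shape (B, N, M) without PyTorch. `permute_021`, `softmax_dim2`, and `softmax_dim1` are hypothetical helpers, not torch APIs.

```python
import math

def softmax(xs):
    """Numerically stable softmax of a flat list of floats."""
    m = max(xs)
    exps = [math.exp(v - m) for v in xs]
    total = sum(exps)
    return [e / total for e in exps]

def permute_021(t):
    """Swap the last two dims of a B x N x M nested list, like permute([0, 2, 1])."""
    return [[[mat[i][j] for i in range(len(mat))] for j in range(len(mat[0]))]
            for mat in t]

def softmax_dim2(t):
    """Softmax along the innermost dim of a 3-D nested list."""
    return [[softmax(row) for row in mat] for mat in t]

def softmax_dim1(t):
    """Softmax along dim 1, computed column by column without any permute."""
    out = []
    for mat in t:
        n, m = len(mat), len(mat[0])
        cols = [softmax([mat[i][j] for i in range(n)]) for j in range(m)]
        out.append([[cols[j][i] for j in range(m)] for i in range(n)])
    return out

# The pattern from the commit: permute, softmax over dim 2, permute back.
a = [[[0.1, 2.0, -1.0], [0.5, 0.3, 3.0]],
     [[1.0, 1.0, 1.0], [-2.0, 0.0, 4.0]]]  # shape (2, 2, 3)
z = permute_021(softmax_dim2(permute_021(a)))
ref = softmax_dim1(a)  # what a single softmax(a, dim=1) computes
```

The two permutes cancel because softmax over the last dim of the transposed matrix normalizes exactly the columns of the original, which is why a graph pass can replace the three ops with one.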