pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
Harut Movsisyan	1f16c22dc8	[Static Runtime] Implement aten::cumsum out variant (#64159 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64159 Test Plan: Confirm out variant is called for both versions: ``` > buck run //caffe2/benchmarks/static_runtime:static_runtime_cpptest -- --v=1 ``` Reviewed By: mikeiovine Differential Revision: D30622819 fbshipit-source-id: a2c8c7f969dae5f507718fb3d513e1fb4f026736	2021-08-30 16:18:22 -07:00
Harut Movsisyan	e24c3644d8	[Static Runtime] aten::cat out version when it is not being replaced by prim::VarConcat (#64157 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64157 UseVariadicCat optimization is not applied to aten::cat if list input to the op can not be moved to the position before op (https://fburl.com/diffusion/l6kweimu). For these cases we will need out version for SR. Test Plan: Confirm out variant is called: ``` > buck run //caffe2/benchmarks/static_runtime:static_runtime_cpptest -- --v=1 ``` Reviewed By: d1jang Differential Revision: D30598574 fbshipit-source-id: 74cfa8291dc8b5df4aef58adfb1ab2a16f10d90a	2021-08-30 09:42:38 -07:00
Harut Movsisyan	8af1407eab	[Static Runtime] Out version for torch.linalg.norm (#64070 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64070 Test Plan: Confirm out variant is called for both versions: ``` > buck run //caffe2/benchmarks/static_runtime:static_runtime_cpptest -- --v=1 ``` Reviewed By: d1jang Differential Revision: D30595816 fbshipit-source-id: e88d88d4fc698774e83a98efce66b8fa4e281563	2021-08-29 21:00:11 -07:00
Mike Iovine	07c5cb8c48	[Static Runtime] Optimize memory planner initialization (#64101 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64101 Checking `getOutOfPlaceOperation(n)` is a very expensive operation, especially in multithreaded environments, due to a lock acquisition when the NNC cache is queried. This slows down the memory planner initialization time, and by extension, the latency for the first static runtime inference. There are two optimizations in this diff: * Cache the result of `p_node->has_out_variant()` to avoid the call to `getOutOfPlaceOperation`. This speeds up calls to `canReuseInputOutputs`, which in turn speeds up `isOptimizableContainerType` * Precompute all `isOptimizableContainerType` during static runtime initialization to avoid a pass over all of each node's inputs. Test Plan: All unit tests pass: `buck test caffe2/benchmarks/static_runtime/...` Reviewed By: movefast1990 Differential Revision: D30595579 fbshipit-source-id: 70aaa7af9589c739c672788bf662f711731864f2	2021-08-27 17:40:43 -07:00
Don Jang	9f1f22b9bc	[Static Runtime] Add out variant of quantized::embedding_bag_byte_prepack (#64081 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64081 This change add an out variant of `quantized::embedding_bag_byte_prepack`. Test Plan: - Added `ShapeInferenceTest.QEmbeddingBagByteUnpack`. - Observed ``` V0824 13:38:49.723708 1322143 impl.cpp:1394] Switch to out variant for node: %2 : Tensor = quantized::embedding_bag_byte_prepack(%input) ``` Reviewed By: hlu1 Differential Revision: D30504216 fbshipit-source-id: 1d9d428e77a15bcc7da373d65e7ffabaf9c6caf2	2021-08-27 10:53:23 -07:00
Harut Movsisyan	f2c47cf4db	[Static Runtime] Out version for fmod (#64046 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64046 Test Plan: Confirm out variant is used: ``` > //caffe2/benchmarks/static_runtime:static_runtime_cpptest -- --v=1 V0826 23:31:30.321382 193428 impl.cpp:1395] Switch to out variant for node: %4 : Tensor = aten::fmod(%a.1, %b.1) ``` Reviewed By: mikeiovine Differential Revision: D30581228 fbshipit-source-id: dfab9a16ff8afd40b29338037769f938f154bf74	2021-08-27 03:05:06 -07:00
Don Jang	c90b3cb1da	[Static Runtime] Manage temporary Tensors for aten::layer_norm (#64078 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64078 This change converts `aten::layer_norm -> output Tensor` to `static_runtime::layer_norm -> (output Tensor, temp1 Tensor, tmp2 Tensor)` to manage `tmp1` and `tmp2` Tensors by the static runtime. Currently the out-variant of `aten::layer_norm` creates two temporary Tensors inside it: ``` at::Tensor mean = create_empty_from({M}, X); at::Tensor rstd = create_empty_from({M}, X); ``` that the static runtime misses an opportunity to manage. This change puts them into (unused) output Tensors of a new placeholder op `static_runtime::layer_norm` so that the static runtime can mange them since the static runtime as of now chooses to manage only output tensors. Test Plan: - Enhanced `StaticRuntime.LayerNorm` to ensure that `static_runtime::layer_norm` gets activated. - Confirmed that the new op gets activated during testing: ``` V0825 12:51:50.017890 2265227 impl.cpp:1396] Switch to out variant for node: %8 : Tensor, %9 : Tensor, %10 : Tensor = static_runtime::layer_norm(%input.1, %normalized_shape.1, %4, %4, %5, %3) ``` Reviewed By: hlu1 Differential Revision: D30486475 fbshipit-source-id: 5121c44ab58c2d8a954aa0bbd9dfeb7468347a2d	2021-08-27 02:44:43 -07:00
Hao Lu	3c3bba4169	[Static Runtime] Use F14FastMap/F14FastSet (#63999 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63999 Use folly::F14FastMap/F14FastSet instead of std::unordered_map/unordered_set in the Static Runtime code base. folly::F14FastMap/F14FastSet implements the same APIs as std::unordered_map/unordered_set but faster. For details see https://github.com/facebook/folly/blob/master/folly/container/F14.md Reviewed By: d1jang Differential Revision: D30566149 fbshipit-source-id: 20a7fa2519e4dde96fb3fc61ef6c92bf6d759383	2021-08-27 01:40:41 -07:00
Ansha Yu	3f1c809470	[static runtime] port c2 argmin kernel (#63632 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63632 Local benchmarking with 1 input repeated 10k iter on 290331537_4 local net. Reduces argmin runtime by about 80% and and local net execution by about ~0.71-0.77ms. Before: ``` I0826 17:25:53.972786 1104614 PyTorchPredictorBenchLib.cpp:313] PyTorch run finished. Milliseconds per iter: 7.37599. Iters per second: 135.57 ``` ``` Static runtime ms per iter: 8.22086. Iters per second: 121.642 Time per node type: 4.13527 ms. 50.9157%. fb::sigrid_transforms_torch_bind (1 nodes, out variant) 0.868506 ms. 10.6935%. aten::argmin (1 nodes, out variant) ... ``` After: ``` I0826 17:17:54.165174 1064079 PyTorchPredictorBenchLib.cpp:313] PyTorch run finished. Milliseconds per iter: 6.66724. Iters per second: 149.987 ``` ``` Static runtime ms per iter: 7.68172. Iters per second: 130.179 Time per node type: 4.1452 ms. 54.0612%. fb::sigrid_transforms_torch_bind (1 nodes, out variant) 0.656778 ms. 8.56562%. fb::quantized_linear (8 nodes) 0.488229 ms. 6.36741%. static_runtime::to_copy (827 nodes, out variant) 0.372678 ms. 4.86042%. aten::argmin (1 nodes, out variant) ...Time per node type: 3.39387 ms. 53.5467%. fb::sigrid_transforms_torch_bind (1 nodes, out variant) 0.636216 ms. 10.0379%. fb::quantized_linear (8 nodes, out variant) 0.410535 ms. 6.47721%. fb::clip_ranges_to_gather_to_offsets (304 nodes, out variant) 0.212721 ms. 3.3562%. fb::clip_ranges_gather_sigrid_hash_precompute_v3 (157 nodes, out variant) 0.173736 ms. 2.74111%. aten::matmul (1 nodes, out variant) 0.150514 ms. 2.37474%. aten::argmin (1 nodes, out variant) ``` P447422384 Test Plan: Test with local replayer sending traffic to `ansha_perf_test_0819.test`, and compare outputs to jit interpreter. Start compute tier: ``` RUN_UUID=ansha_perf_test_0819.test.storage JOB_EXPIRE_TIME=864000 MODEL_ID=290331537_4 PREDICTOR_TAG= PREDICTOR_VERSION=405 PREDICTOR_TYPE=CPU ADDITIONAL_FLAGS="--enable_disagg_file_split=true --enable_adx=false --load_remote_file_locally=true --pytorch_predictor_static_runtime_whitelist_by_id=290331537" GFLAGS_CONFIG_PATH=sigrid/predictor/gflags/predictor_gflags_ads_perf_cpu_pyper SMC_TIER_NAME=sigrid.predictor.perf.ansha_per_test_0819.test.storage CLUSTER=tsp_rva ENTITLEMENT_NAME=ads_ranking_infra_test_t6 PREDICTOR_LOCAL_DIRECTORY= ICET_CONFIG_PATH= NNPI_COMPILATION_CONFIG_FILE= NUM_TASKS=1 NNPI_NUM_WORKERS=0 tw job start /data/users/ansha/fbsource/fbcode/tupperware/config/admarket/sigrid/predictor/predictor_perf_canary.tw ``` Start nnpi tier: ``` RUN_UUID=ansha_perf_test_0819.test JOB_EXPIRE_TIME=247200 MODEL_ID=290331537_4 PREDICTOR_TAG= PREDICTOR_VERSION=343 PREDICTOR_TYPE=NNPI_TWSHARED ADDITIONAL_FLAGS="--torch_glow_min_fusion_group_size=30 --pytorch_storage_tier_replayer_sr_connection_options=overall_timeout:1000000,processing_timeout:1000000 --predictor_storage_smc_tier=sigrid.predictor.perf.ansha_perf_test_0819.test.storage --pytorch_predictor_static_runtime_whitelist_by_id=290331537" GFLAGS_CONFIG_PATH=sigrid/predictor/gflags/predictor_gflags_ads_perf_glow_nnpi_pyper_v1 SMC_TIER_NAME=sigrid.predictor.perf.ansha_perf_test_0819.test CLUSTER=tsp_rva ENTITLEMENT_NAME=ads_ranking_infra_test_t17 PREDICTOR_LOCAL_DIRECTORY= ICET_CONFIG_PATH= NNPI_COMPILATION_CONFIG_FILE= NUM_TASKS=1 NNPI_NUM_WORKERS=0 tw job start /data/users/ansha/fbsource/fbcode/tupperware/config/admarket/sigrid/predictor/predictor_perf_canary.tw ``` ```buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest -- StaticRuntime.IndividualOps_Argmin --print-passing-details``` Compared outputs to jit interpreter to check for no differences greater than 1e-3 (with nnc on) https://www.internalfb.com/intern/diff/view-version/136824794/ Reviewed By: hlu1 Differential Revision: D30445635 fbshipit-source-id: 048de8867ac72f764132295d1ebfa843cde2fa27	2021-08-26 23:19:19 -07:00
Don Jang	fbe7133b58	[Static Runtime] Disable out variant of aten::clone (#63980 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63980 The out variant implementation of `aten::clone` causes a crash, which needs further investigation. This change disables it until the problem gets fixed. Note that `inline_cvr` doesn't use `aten::clone` as of now, so no perf implication: https://www.internalfb.com/phabricator/paste/view/P446858755?lines=121 Test Plan: N/A Reviewed By: hlu1 Differential Revision: D30544149 fbshipit-source-id: facb334d67473f622b36862fbdb2633358556fdf	2021-08-26 08:10:13 -07:00
Raghavan Raman	dde07cad6f	[Static Runtime] Added a variable for clamp in the NNC code for Logit. (#63839 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63839 Replaced the use of a constant for clamp in the NNC code for Logit with a variable. This makes it easier to enable caching for Logit. There is no performance difference with this change, as shown in the micro-benchmarks below. ``` Logit NNC Benchmark Time (ns) const-clamp var-clamp logit_nnc_sleef/64 550 543 logit_nnc_sleef/512 3514 3517 logit_nnc_sleef/8192 85537 82900 logit_nnc_sleef/32768 347635 337016 logit_nnc_fast/64 173 167 logit_nnc_fast/512 829 866 logit_nnc_fast/8192 13286 13069 logit_nnc_fast/32768 51116 53429 logit_nnc_vml/64 146 164 logit_nnc_vml/512 773 783 logit_nnc_vml/8192 11556 11563 logit_nnc_vml/32768 44815 46720 ``` Test Plan: SR unit tests and the inline_cvr model. Reviewed By: bertmaher Differential Revision: D30405466 fbshipit-source-id: adb891fdae5746439931ce5f43165291fec08f52	2021-08-25 11:19:55 -07:00
Raghavan Raman	a2399a76e1	[Static Runtime] Moved NNC operator definitions to separate files. (#63838 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63838 Refactored NNC operator definitions code into separate files. Made `TEWrapper` a class with a fixed set of methods and added separate definitions for them based on `TORCH_ENABLE_LLVM` to keep the same functionality as before. Test Plan: Build and ran Static Runtime tests. Reviewed By: hlu1 Differential Revision: D30405467 fbshipit-source-id: 606ef852bb820d5e23a0f8af1bf5dc122e90bceb	2021-08-25 11:18:32 -07:00
Mike Iovine	7774a4e95b	[Static Runtime] Implement prim::VarStack out variant (#63579 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63579 Provide a static runtime out variant implementation for the new op introduced in D30426232 (`1385f9fb12`). Test Plan: `buck test //caffe2/benchmarks/static_runtime:static_runtime_cpptest -- IndividualOps_VarStack` Reviewed By: navahgar Differential Revision: D30410525 fbshipit-source-id: bc59a3d8ad23e3d94561ec2dca9cc20687dbadf8	2021-08-24 09:44:29 -07:00
Mike Iovine	1385f9fb12	[JIT] Add variadic stack op (#63578 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63578 Added a new op `prim::VarStack` and a pass that transforms instances of `aten::stack(list, dim)` into `prim::VarStack(list[0], ..., list[n], dim)`. Also provided a JIT interpreter implementation. Most of the implementation/tests are the same as `prim::VarConcat`. Test Plan: `buck test caffe2/test/cpp/jit:jit -- TestStackOpt` Reviewed By: navahgar Differential Revision: D30426232 fbshipit-source-id: 9829a7db6e0a5038c9b7528c43c25b0c221aa2ce	2021-08-24 08:20:54 -07:00
Mikhail Zolotukhin	f0d274294d	[TensorExpr] Nuke KernelArena and KernelScope. (#63587 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63587 Now that there is no classes using KernelArena for memory management we can remove it. Differential Revision: D30429115 D30429115 Test Plan: Imported from OSS Reviewed By: navahgar Pulled By: ZolotukhinM fbshipit-source-id: 375f6f9294d27790645eeb7cb5a8e87047a57544	2021-08-24 00:32:16 -07:00
Mikhail Zolotukhin	62d02f2b57	[TensorExpr] Make 'Tensor' a value type. (#63586 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63586 This is another commit in transition from KernelArena memory management. Tensor is essentially just a pair of <BufPtr, StmtPtr> and we don't need to dynamically allocate it at all - it's cheap to pass it by value, and that's what we're switching to in this commit. After this change nothing uses KernelScope/KernelArena and they can be safely removed. Differential Revision: D30429114 D30429114 Test Plan: Imported from OSS Reviewed By: navahgar Pulled By: ZolotukhinM fbshipit-source-id: f90b859cfe863692b7beffbe9bd0e4143df1e819	2021-08-24 00:32:13 -07:00
Don Jang	84890aae35	[Static Runtime] Add an out variant op for aten::abs (#63675 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63675 This change adds an out variant implementation for `aten::abs`. Test Plan: - Observed `V0820 14:14:08.880342 101788 impl.cpp:1394] Switch to out variant for node: %3 : Tensor = aten::abs(%a.1)` - Perf impact: TBD Reviewed By: hlu1 Differential Revision: D30461317 fbshipit-source-id: 0c0230bd40afe463ae1ccb222c2a1207ebcf4191	2021-08-23 16:25:10 -07:00
Hao Lu	b2a601ffe5	[Static Runtime] Implement out variant for fb::quantized_linear (#63635 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63635 Reviewed By: ajyu Differential Revision: D30446234 fbshipit-source-id: 1ef014186ff725930a97d0159626f9233ee74030	2021-08-20 21:42:22 -07:00
Mikhail Zolotukhin	1dc2b52764	[TensorExpr] Add a wrapper for all expr and stmt pointers. (#63195 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63195 This helps us to later switch from using KernelArena with raw pointers to shared pointers without having to change all our source files at once. The changes are mechanical and should not affect any functionality. With this PR, we're changing the following: * `Add` --> `AddPtr` `new Add(...)` --> `alloc<Add>(...)` * `dynamic_cast<Add>` --> `to<Add>` `static_cast<Add>` --> `static_to<Add>` Due to some complications with args forwarding, some places became more verbose, e.g.: `new Block({})` --> `new Block(std::vector<ExprPtr>())` Test Plan: Imported from OSS Reviewed By: navahgar Differential Revision: D30292779 Pulled By: ZolotukhinM fbshipit-source-id: 150301c7d2df56b608b035827b6a9a87f5e2d9e9	2021-08-17 13:44:45 -07:00
Yukio Siraichi	32b6104f37	Port `norm` kernel to structured kernels. (#62711 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62711 Tracking issue: #55070 Test Plan: Imported from OSS Reviewed By: anjali411 Differential Revision: D30109866 Pulled By: ezyang fbshipit-source-id: 894c9496894d059c7690a174b75bbd4db7ed6016	2021-08-13 08:27:48 -07:00
Raghavan Raman	8b54b14f92	[Static Runtime] Added a cache for NNC generated code across different calls to the same ops (#62921 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62921 Added a cache for NNC generated code across different calls to the same ops. Before this diff: ``` ProcessedNode time 13402.9 ms Static Module initialization took 30964.8 ms ``` After this diff: ``` ProcessedNode time 85.4195 ms Static Module initialization took 4348.42 ms ``` There is one global cache for all the ops. It is guarded with a reader-writer lock. This is necessary because we could have multiple threads loading different models in parallel. Note that this locking does not guarantee that there will be exactly one code generated for each op. There could be more than one thread generating code for the same op simultaneously and all of them will update the cache in some order. But that should be small number bounded by the number of threads. Also, there is no correctness issue, since the generated code is always the same and the one generated by the last thread is retained in the cache and reused later while running the model. Test Plan: Tested inline_cvr model Reviewed By: hlu1 Differential Revision: D30104017 fbshipit-source-id: 32e9af43d7e724ed54b661dfe58a73a14e443ff7	2021-08-09 09:30:07 -07:00
Yukio Siraichi	4c4c5b14e4	Port `sum.dim_IntList` kernel to structured kernels. (#61642 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61642 Tracking issue: #55070 Test Plan: Imported from OSS Reviewed By: ejguan Differential Revision: D29783865 Pulled By: ezyang fbshipit-source-id: 375d4cd5f915812108367601a610a428762e606d	2021-08-09 08:46:16 -07:00
Hao Lu	a27a0b1ef5	[SR] Disable NNC temporarily (#62746 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62746 Disable NNC temporarily until a code cache is implemented to reduce the compilation time. Reviewed By: ajyu Differential Revision: D30080326 fbshipit-source-id: ef8bb3ac3a6947614f4a03a3d52774b6933d3ea8	2021-08-04 17:33:07 -07:00
Raghavan Raman	7b6d569a2b	[jit] Renamed prim::Concat as prim::VarConcat (#61983 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61983 Trial #2. The previous PR (https://github.com/pytorch/pytorch/pull/61498) was reverted because this caused a failure in `pytorch_linux_backward_compatibility_check_test`. Fixed that now by adding to the exception list in `check_backward_compatibility.py`. Test Plan: Imported from OSS Reviewed By: eellison Differential Revision: D29828830 Pulled By: navahgar fbshipit-source-id: 947a7b1622ff6e3e575c051b8f34a789e105bcee	2021-07-29 10:28:59 -07:00
Don Jang	68efa186cc	[static runtime] Implement aten::full (#62227 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62227 Test Plan: Added `StaticRuntime.IndividualOps_Full` to cover the newly added code path. Reviewed By: hlu1 Differential Revision: D29923649 fbshipit-source-id: 722950137c35ae325590a670b97f03b395e8eac3	2021-07-28 09:50:27 -07:00
Mike Iovine	e1bee3eb30	[Static Runtime] Add missing unit tests for static runtime ops (#62238 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62238 Added tests for the following ops: * `aten::mul` * `aten::nan_to_num` * `aten::stack` * `aten::relu` * `aten::tanh` Reviewed By: hlu1 Differential Revision: D29914217 fbshipit-source-id: 6a6c39629310e7131127e24fdce7253ccdf80340	2021-07-27 14:12:21 -07:00
Mike Iovine	79eb8bb299	[Static Runtime] Enforce proper output dtype for many ops (re-land) (#62267 ) Summary: Re-land of D29935444 We previously had lots of ops with implementations like this: ``` if (p_node->Output(0).isNone()) { p_node->Output(0) = create_empty_like(input_0); } ... auto& out = p_node->Output(0); some_func_out(inputs, out); ``` This would make the output have the correct shape. But it would also take the dtype of `input_0`, which is not always correct. This change transforms these blocks to: ``` if (p_node->Output(0).isNone()) { p_node->Output(0) = some_func(inputs) } else { ... auto& out = p_node->Output(0); some_func_out(inputs, out); } ``` This gives the output the correct shape and dtype. Pull Request resolved: https://github.com/pytorch/pytorch/pull/62267 Reviewed By: ejguan Differential Revision: D29937253 Pulled By: malfet fbshipit-source-id: d91ca5d5703490d7d349a1de2ad3bb09b0c33967	2021-07-27 08:54:09 -07:00
Erjia Guan	a3be2ecc3a	Revert D29887367: [Static Runtime] Enforce proper output dtype for many ops Test Plan: revert-hammer Differential Revision: D29887367 (`f4136c5efc`) Original commit changeset: cef04bfa52ec fbshipit-source-id: 32e89f2b6381930559dd746b535904c3e90fd52b	2021-07-27 07:29:09 -07:00
Mike Iovine	f4136c5efc	[Static Runtime] Enforce proper output dtype for many ops Summary: We previously had lots of ops with implementations like this: ``` if (p_node->Output(0).isNone()) { p_node->Output(0) = create_empty_like(input_0); } ... auto& out = p_node->Output(0); some_func_out(inputs, out); ``` This would make the output have the correct shape. But it would also take the dtype of `input_0`, which is not always correct. This change transforms these blocks to: ``` if (p_node->Output(0).isNone()) { p_node->Output(0) = some_func(inputs) } else { ... auto& out = p_node->Output(0); some_func_out(inputs, out); } ``` This gives the output the correct shape and dtype. Test Plan: `buck test //caffe2/benchmarks/static_runtime:static_runtime_cpptest` Reviewed By: hlu1 Differential Revision: D29887367 fbshipit-source-id: cef04bfa52ec082ad3a9a32aa27c44e275c6b24c	2021-07-26 13:27:02 -07:00
Hao Lu	78f7d8ccfa	[Static Runtime] Remove wrappers for aten::cat (#62067 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62067 The wrapper for aten::cat is no longer needed after the variadic cat change in D29565344 (`ae58a4c45d`) . Also added a simple test to test dynamic shapes, i.e., input tensors in args2 are larger than in args1. Reviewed By: navahgar, mikeiovine Differential Revision: D29864600 fbshipit-source-id: 44a712c2e776815c09e0bf5631412149b81274b2	2021-07-23 20:33:41 -07:00
Meghan Lele	1d2ea76afb	`clamp`: port to structured kernel (#61361 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61361 This PR ports the `clamp` kernel to the structured format. In addition, it introduces `OptionalScalarRef` as a replacement for `c10::optional<Scalar>&`. The latter, although it is a reference type, can still involve copying the contained `Scalar` (e.g. if the actual parameter is a `Scalar` or if a `c10::optional<Scalar>` is constructed just to call a kernel). `OptionalScalarRef` contains only a `const Scalar&`, and stores flag about whether the instance contains something inside the `Scalar` itself using a new tag. For more information, see #55070. Test Plan: Imported from OSS Reviewed By: albanD Differential Revision: D29821533 Pulled By: SplitInfinity fbshipit-source-id: 88d55df5a4b2c14b68a57e4905d90eea1b088d99	2021-07-23 02:02:07 -07:00
Nikita Shulga	a9b0a921d5	Disable `avoid-non-const-global-variables` lint check (#62008 ) Summary: As GoogleTest `TEST` macro is non-compliant with it as well as `DEFINE_DISPATCH` All changes but the ones to `.clang-tidy` are generated using following script: ``` for i in `find . -type f -iname ".c" -or -iname "*.h"\|xargs grep cppcoreguidelines-avoid-non-const-global-variables\|cut -f1 -d:\|sort\|uniq`; do sed -i "/\/\/ NOLINTNEXTLINE(cppcoreguidelines-avoid-non-const-global-variables)/d" $i; done ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/62008 Reviewed By: driazati, r-barnes Differential Revision: D29838584 Pulled By: malfet fbshipit-source-id: 1b2f8602c945bd4ce50a9bfdd204755556e31d13	2021-07-22 18:04:40 -07:00
Raghavan Raman	ae58a4c45d	[Static Runtime] Added a variadic cat operator (#61302 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61302 Test Plan: Imported from OSS Reviewed By: hlu1 Differential Revision: D29565344 Pulled By: navahgar fbshipit-source-id: 96f5f4546ec0e61eb7f87e016e026e7b62576248	2021-07-21 15:58:20 -07:00
Mike Iovine	28150fd0c8	[static_runtime] Implement aten::linear (#61595 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61595 Add out variant wrapper for `aten::linear` in the static runtime Test Plan: `buck test //caffe2/benchmarks/static_runtime:static_runtime_cpptest` Reviewed By: hlu1 Differential Revision: D29684236 fbshipit-source-id: 94df6d7267b3f269b2cadf065f207648777147df	2021-07-16 08:55:43 -07:00
Hao Lu	a07b08136f	[Static Runtime] Check unsupported up when enabling static runtime (#61613 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61613 Reviewed By: ajyu, movefast1990 Differential Revision: D29663466 fbshipit-source-id: d819903b7227f534c0a4fffa5eeea2b5c0c04750	2021-07-14 02:13:51 -07:00
Hao Lu	ccd0977060	[Static Runtime] Support prim::GetAttr/SetAttr (#61505 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61505 The handling of `self` in static runtime was previously incorrect. This diff fixed that issue, since self is essential to prim::GetAttr/SetAttr. After all, most of the time we're getting and setting attributes from self, the torch script module. Reviewed By: ajyu Differential Revision: D29350173 fbshipit-source-id: 6e62add4cda517ef8cd6c315d4cb0595e7d531fb	2021-07-10 14:06:06 -07:00
Don Jang	a74516d699	[static runtime] implement aten::log (#61393 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61393 Test Plan: Added `StaticRuntime.IndividualOps_Log` ``` ... [ RUN ] StaticRuntime.IndividualOps_Log V0701 12:10:50.829100 3708165 impl.cpp:455] StaticModuleOptions: cleanup_activations 1, enable_out_variant 1, optimize_memory1, optimize_graph_output_memory0 V0701 12:10:50.888468 3708165 impl.cpp:1279] Switch to out variant for node: %3 : Tensor = aten::log(%inp.1) V0701 12:10:50.889098 3708165 impl.cpp:1279] Switch to out variant for node: %a.1 : Tensor = aten::clone(%3, %2) ``` Reviewed By: hlu1 Differential Revision: D29511622 fbshipit-source-id: 819fd7d90c084609a060efeadb3015e35acac517	2021-07-08 18:25:35 -07:00
Don Jang	c2b0af2560	[static runtime] Implement aten::sign (#61154 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61154 Test Plan: Added `StaticRuntime.IndividualOps_Sign` ``` [djang@devvm861.prn0 ~/local/fbsource/fbcode/caffe2] buck run //caffe2/benchmarks/static_runtime:static_runtime_cpptest -- -v 1 ... [ RUN ] StaticRuntime.IndividualOps_Sign V0701 12:05:31.836099 3679080 impl.cpp:455] StaticModuleOptions: cleanup_activations 1, enable_out_variant 1, optimize_memory1, optimize_graph_output_memory0 V0701 12:05:31.898192 3679080 impl.cpp:1279] Switch to out variant for node: %3 : Tensor = aten::sign(%input.1) V0701 12:05:31.898849 3679080 impl.cpp:1279] Switch to out variant for node: %4 : Tensor = aten::clone(%3, %2) ``` Reviewed By: hlu1 Differential Revision: D29518603 fbshipit-source-id: e47b96d037fea639c41052f3849c82bbfa5f482a	2021-07-07 12:29:25 -07:00
Mike Guo	6ecc1a4c4f	Make pytorch clang-tidy clean (#60649 ) Summary: This PR suppresses clang-tidy warnings in the codebase (for now) so that we can re-enable clang-tidy checks on master. I ran this script to add the `NOLINTNEXTLINE` comments (on a devserver): ```bash python3 setup.py develop # Uses same script that's run on CI and adds the -j (parallel), -s (add comments), -k (continue if diagnostic errors are found) options python3 tools/clang_tidy.py \ -j \ -s \ -k \ -v \ --paths torch/csrc/ \ -g"-torch/csrc/jit/passes/onnx/helper.cpp" \ -g"-torch/csrc/jit/passes/onnx/shape_type_inference.cpp" \ -g"-torch/csrc/jit/serialization/onnx.cpp" \ -g"-torch/csrc/jit/serialization/export.cpp" \ -g"-torch/csrc/jit/serialization/import.cpp" \ -g"-torch/csrc/jit/serialization/import_legacy.cpp" \ -g"-torch/csrc/onnx/init.cpp" \ -g"-torch/csrc/cuda/nccl." \ -g"-torch/csrc/cuda/python_nccl.cpp" \ -g"-torch/csrc/autograd/FunctionsManual.cpp" \ -g"-torch/csrc/generic/.cpp" \ -g"-torch/csrc/jit/codegen/cuda/runtime/*" \ -g"-torch/csrc/deploy/interpreter/interpreter.cpp" \ -g"-torch/csrc/deploy/interpreter/interpreter.h" \ -g"-torch/csrc/deploy/interpreter/interpreter_impl.h" \ -g"-torch/csrc/deploy/interpreter/test_main.cpp" ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/60649 Test Plan: Verified changes by re-running the script (without the `-s` option) and seeing no warnings/errors. Reviewed By: walterddr, janeyx99 Differential Revision: D29504258 Pulled By: 1ntEgr8 fbshipit-source-id: 78310b30ee8213b73ddb4771ad874665323e7a4e	2021-07-01 12:21:07 -07:00
Hao Lu	46595a9623	[Static Runtime] Add gflag to disable nnc and caffe2 math library (#61090 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61090 Reviewed By: ajyu Differential Revision: D29479860 fbshipit-source-id: 2b53405f41d319f074c75d8923d97fd6a45fee4b	2021-07-01 00:01:37 -07:00
Yukio Siraichi	b099f5429c	Port `argmin` kernel to structured kernels. (#60364 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60364 Tracking issue: #55070 This PR was openned so as to solve the CI failures in main when merging: #59371 #59372 #59373 #59937 #59938. Test Plan: Imported from OSS Reviewed By: anjali411 Differential Revision: D29265855 Pulled By: ezyang fbshipit-source-id: ccee3810940542f8b370596105826c96b32231ec	2021-06-29 14:16:59 -07:00
Bert Maher	ddb1f293b6	Fix the NNC-disabled path in static runtime for perf comparisons Summary: The path which has NNC/LLVM disabled still constructs a tensor expression, even though `supports()` will always return false, so a `KernelScope` is necessary to manage those memory allocations. I guess we could avoid building the TEs at all in this case, but it's pretty clean this way. Test Plan: ``` scripts/bertrand/static_runtime/run.sh ``` Reviewed By: hlu1 Differential Revision: D29415909 fbshipit-source-id: dde43de8516b9a2cf9f5f7f3699962bf9ccd8c30	2021-06-28 15:39:07 -07:00
Hao Lu	1e31d26b1d	[Static Runtime] Fix bugs in static_runtime::to_copy (#60503 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60503 Fixed a few issues in the static_runtime::to_copy impl: - fixed a bug with memory_format - copy strides when appropriate. This is necessary to make sure that the fbgemm path in the copy kernel gets hit. - fix the schema in the `ReplaceWithCopy` pass - add registration of `static_runtime::to_copy.other` Add more unit tests: - test dynamic shapes - test strided input tensor to `aten::to` - test alias case (same input/output) - test `to.other` Reviewed By: ajyu Differential Revision: D26838933 fbshipit-source-id: ec0d1a2deebe998fcfe8858e772e1ef429cb4522	2021-06-23 19:57:17 -07:00
Ansha Yu	0baad214b0	[static runtime][fix] resize to the input tensor size for full_like (#60229 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60229 Fix bug where we did not resize to the input tensor size, causing the output to be incorrect Test Plan: Test on replayer, rebased on D29217781, with model 278203319_26. Verify with jit outputs (D28583950) `./buck-out/gen/admarket/lib/ranking/prediction_replayer/replayer --model_inference_type_target=DISAGG_ACCELERATOR --prediction_replayer_force_model_type=inline_cvr_post_imp_model --prediction_replayer_force_model=278203319_26 --prediction_replayer_target_tier=sigrid.predictor.perf.dianshi_staticruntime_debug_0604.test --prediction_replayer_input_stream_filename=/data/users/ansha/tmp/adfinder/filtered_requests_inline_cvr_100 --ignore_model_id_mismatch --check_performance --fully_remote_sr_connection_options="overall_timeout:10000000,processing_timeout:10000000" --use_new_encoding_for_ads_services --use_new_encoding_from_model_id_to_shard_id --sigrid_force_model_dir=/data/users/ansha/tmp/adfinder/278203319_26/ --sigrid_predictor_model_suffix=.predictor.disagg.local —use_new_encoding_from_model_id_to_shard_id=true --prediction_replayer_force_model_kind=19 --pytorch_predictor_static_runtime_enable=true --prediction_replayer_target_qps=1` Reviewed By: hlu1, movefast1990 Differential Revision: D29218918 fbshipit-source-id: dab4bbbabeaa8367174ed90edca43d6204c65409	2021-06-18 09:56:25 -07:00
Brian Hirsh	6b5e77904f	Revert D29104396: Port `argmin` kernel to structured kernels. Test Plan: revert-hammer Differential Revision: D29104396 (`226d745a0b`) Original commit changeset: 39c59bcc0446 fbshipit-source-id: 82de26f925a885f65572a785fa45a9980d3a974b	2021-06-17 10:31:06 -07:00
Yukio Siraichi	226d745a0b	Port `argmin` kernel to structured kernels. (#59938 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59938 Tracking issue: #55070 Test Plan: Imported from OSS Reviewed By: soulitzer Differential Revision: D29104396 Pulled By: ezyang fbshipit-source-id: 39c59bcc044649c1ec9c9685366c4dda87f76aa7	2021-06-17 08:18:13 -07:00
Hao Lu	eda2ddb5b0	[ATen] Fix aten::to schema (#60001 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60001 Fix the aten::to schema to reflect that the output may alias input. Test Plan: Added new unit tests. Reviewed By: ezyang Differential Revision: D29121620 fbshipit-source-id: c29b6aa22d367ffedf06e47116bc46b3e188c39c	2021-06-15 20:04:20 -07:00
Hao Lu	cbd1e8c335	[Static Runtime] Fix bug in aten::to (#59995 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59995 Reviewed By: ajyu Differential Revision: D29083106 fbshipit-source-id: 687ffb121af2716d606c145474942650a2d9ac7e	2021-06-14 22:54:43 -07:00
Hao Lu	2112074f25	[Static Runtime] Add schema check to several aten ops (#59603 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59603 D28698997 (`10345010f7`) was reverted because I forgot to replace the ``` VLOG(1) << "Found schema mismatch"; n->schema().dump(); ``` block in `aten::clamp_min` with `LogAndDumpSchema(n)` and that led to the bazel build to fail. I don't know why it makes the bazel build though. Test Plan: OSS CI. Reviewed By: ajyu Differential Revision: D28950177 fbshipit-source-id: 9bb1c6619e6b68415a3349f04933c2fcd24cc9a2	2021-06-10 23:39:00 -07:00
Rong Rong (AI Infra)	91eb831422	Revert D28698997: [Static Runtime] Add schema check to aten ops Test Plan: revert-hammer Differential Revision: D28698997 (`10345010f7`) Original commit changeset: 232fc60c0321 fbshipit-source-id: e351df62779fea85b7afe5160d3c40c4e7cee4ed	2021-06-05 07:48:49 -07:00

1 2 3 4

153 Commits