This PR does two things:
1. It moves some Windows warning suppressions from various CMake files into the main CMakeLists.txt, following the conventions already used for gcc and clang.
2. It fixes some Windows warnings in the source code. Most importantly, it fixes many dll-linkage warnings by changing C10_API to TORCH_API or TORCH_PYTHON_API (see the sketch below). Some dll warnings remain because some TORCH_API functions are actually built as part of libtorch_python.
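For illustration, here is a minimal sketch of the macro mechanics behind those warnings (simplified from the shape of `c10/macros/Export.h`; the `is_enabled` declaration is hypothetical):
```
// On Windows, the export macros expand to dllexport while building the owning
// library and to dllimport everywhere else (simplified):
#ifdef C10_BUILD_MAIN_LIB
#define C10_API __declspec(dllexport)
#else
#define C10_API __declspec(dllimport)
#endif

// A function defined in libtorch but declared with C10_API gets inconsistent
// dll linkage, and MSVC warns. Switching the declaration to the macro of the
// library that actually defines the symbol fixes the mismatch:
//   before: C10_API bool is_enabled();
//   after:  TORCH_API bool is_enabled();
```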
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94927
Approved by: https://github.com/malfet
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76473
Avoid some extra heap allocations by using DimVector
ghstack-source-id: 155569314
Test Plan: Existing unit tests
Reviewed By: navahgar, huiguoo
Differential Revision: D35972439
fbshipit-source-id: 971998d6bcaaf9bb598772f1e2ca6b13f29f92a4
(cherry picked from commit f2b70c38fffe6355cd8b2f0eb36f299c0d50e5d8)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73450
This change uses `SROperator` as the operators' function type.
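For context, a hedged sketch of the alias in question (this matches how SR's headers spell it, modulo exact location):
```
#include <functional>

namespace torch::jit {

struct ProcessedNode; // SR's per-node execution record

// One uniform functor type for all SR operator implementations.
using SROperator = std::function<void(ProcessedNode*)>;

} // namespace torch::jit
```
Using the alias keeps registrations and lookup tables on a single function type instead of each call site spelling out `std::function<void(ProcessedNode*)>`.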
Test Plan: N/A
Reviewed By: mikeiovine
Differential Revision: D34483246
fbshipit-source-id: ed544bb91b676ed08983dc8dc78cedd0f77d499f
(cherry picked from commit eb9de3ad8de043990c02f30ffa48a29c8e5e81f2)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71247
Most uses of toIntVector() were for a Tensor shape. We have DimVector to avoid heap allocations in those cases, so let's use it.
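A small runnable illustration of the difference (variable names invented; this is not the diff itself):
```
#include <ATen/ATen.h>
#include <vector>

int main() {
  at::Tensor t = at::rand({2, 3, 4});
  // Heap-allocating copy of the shape:
  std::vector<int64_t> v = t.sizes().vec();
  // DimVector is a SmallVector sized for typical tensor ranks, so common
  // shapes stay on the stack:
  at::DimVector d(t.sizes().begin(), t.sizes().end());
  return v.size() == d.size() ? 0 : 1;
}
```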
ghstack-source-id: 146933314
Test Plan: CI -- if we think DimVector is good in general, then this change should be good too.
Reviewed By: mikeiovine
Differential Revision: D33556198
fbshipit-source-id: cf2ad92c2d0b99ab1df4da0f6843e6ccb9a6320b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69274
`impl.h` is the main header file that defines the interface of Static Runtime to its clients.
However, it is currently filled with implementation details that should not be leaked to our clients: 1) this unnecessarily exposes our internals, which can make them hard to change later, and 2) it causes needless merge conflicts when multiple people are touching this enormous impl.cpp file.
To alleviate the situation, this change moves the implementation details from impl.h into a new file, internal.h, which is kept internal without leaking the details to our clients.
This change will be followed by another one renaming `impl.h` to `runtime.h` (or something better), since `impl.h` is currently not about implementation but about SR's interface.
Note that this change is NOT complete, since the remaining declarations in impl.h still contain a lot of implementation details. Therefore, we should keep working on minimizing the interface to prevent our API from bloating unnecessarily. We also need to modularize our implementations into separate files in the near future.
Test Plan: Existing unittests
Reviewed By: donaldong
Differential Revision: D32780415
fbshipit-source-id: 119b7aedbf563b195641c5674572a9348732145f
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64075
Test Plan:
Before:
`I0826 17:17:54.165174 1064079 PyTorchPredictorBenchLib.cpp:313] PyTorch run finished. Milliseconds per iter: 6.66724. Iters per second: 149.987`
After:
`I0826 17:13:07.464485 1040300 PyTorchPredictorBenchLib.cpp:313] PyTorch run finished. Milliseconds per iter: 6.46362. Iters per second: 154.712`
Profile after: P453143683
Accuracy tested by comparing against the jit interpreter; no differences above 1e-3 (nnc ops turned on) https://www.internalfb.com/intern/diff/view-version/136824794/
======
With 100-request recordio inputs (211 inputs)
Before:
`I1101 12:43:13.558375 742187 PyTorchPredictorBenchLib.cpp:251] PyTorch run finished. Milliseconds per iter: 11.7882. Iters per second: 84.8309`
After:
`I1101 13:50:41.087644 1126186 PyTorchPredictorBenchLib.cpp:251] PyTorch run finished. Milliseconds per iter: 11.6763. Iters per second: 85.6438`
Profile after: P465977010
Constituent ops before (total is 0.5646):
```
0.187392 ms. 1.61737%. fb::clip_ranges_gather (309 nodes, out variant)
0.174101 ms. 1.50266%. fb::lengths_to_offsets (464 nodes, out variant)
0.203126 ms. 1.75317%. static_runtime::to_copy (805 nodes, out variant)
```
Constituent ops after (total is 0.4985):
```
0.376559 ms. 3.25614%. fb::clip_ranges_to_gather_to_offsets (305 nodes, out variant)
0.0614349 ms. 0.531235%. fb::lengths_to_offsets (159 nodes, out variant)
0.0573315 ms. 0.495751%. static_runtime::to_copy (195 nodes, out variant)
0.00325543 ms. 0.0281501%. fb::gather_ranges (4 nodes, out variant)
```
Compare with jit interpreter inside benchmark:
`I1101 13:55:53.013602 1149446 PtVsBlackBoxPredictorBenchLib.cpp:175] Finished comparing PT static runtime and jit interpreter results`
======
Casting on the fly:
a. Static runtime off
```
Static runtime ms per iter: 11.4658. Iters per second: 87.2159
0.220367 ms. 1.94726%. static_runtime::to_copy (805 nodes, out variant)
0.172585 ms. 1.52504%. fb::clip_ranges_gather (309 nodes, out variant)
0.157836 ms. 1.39471%. fb::lengths_to_offsets (464 nodes, out variant)
```
b. Casting on the fly, using explicit allocation+to_copy (which has the fast pass for certain cases, but we'll always call empty):
```
I1115 09:08:35.711972 1925508 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 11.6732. Iters per second: 85.6662
0.599439 ms. 5.25098%. fb::clip_ranges_to_gather_to_offsets (305 nodes, out variant)
0.0552475 ms. 0.483958%. fb::lengths_to_offsets (159 nodes, out variant)
0.0576032 ms. 0.504593%. static_runtime::to_copy (195 nodes, out variant)
0.00299026 ms. 0.0261941%. fb::gather_ranges (4 nodes, out variant)
```
c. Casting on the fly with native::to (no explicit allocation, but no fast pass):
```
Static runtime ms per iter: 11.5627. Iters per second: 86.4849
0.454356 ms. 3.9652%. fb::clip_ranges_to_gather_to_offsets (305 nodes, out variant)
0.06315 ms. 0.551115%. static_runtime::to_copy (195 nodes, out variant)
0.0590741 ms. 0.515544%. fb::lengths_to_offsets (159 nodes, out variant)
0.00359182 ms. 0.031346%. fb::clip_ranges_gather (4 nodes, out variant)
```
d. Removal of the to() call in question from the fusion pattern:
```
Static runtime ms per iter: 11.3658. Iters per second: 87.9836
0.29591 ms. 2.6479%. fb::clip_ranges_to_gather_to_offsets (305 nodes, out variant)
0.154612 ms. 1.38352%. static_runtime::to_copy (500 nodes, out variant)
0.0567151 ms. 0.507505%. fb::lengths_to_offsets (159 nodes, out variant)
0.0051115 ms. 0.0457394%. fb::clip_ranges_gather (4 nodes, out variant)
```
Reviewed By: hlu1
Differential Revision: D30515441
fbshipit-source-id: 53acee10619ac2be7dc8982e929e3210c4bb6d21
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64101
Checking `getOutOfPlaceOperation(n)` is a very expensive operation, especially in multithreaded environments, due to a lock acquisition when the NNC cache is queried. This slows down the memory planner initialization time, and by extension, the latency for the first static runtime inference.
There are two optimizations in this diff:
* Cache the result of `p_node->has_out_variant()` to avoid the call to `getOutOfPlaceOperation`. This speeds up calls to `canReuseInputOutputs`, which in turn speeds up `isOptimizableContainerType` (see the sketch below).
* Precompute `isOptimizableContainerType` for every node during static runtime initialization to avoid a pass over each node's inputs.
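A minimal sketch of the first optimization, with member and constructor details assumed for illustration:
```
#include <torch/csrc/jit/ir/ir.h>
#include <torch/csrc/jit/runtime/static/ops.h> // getOutOfPlaceOperation

// Hypothetical: cache the expensive registry lookup once at construction so
// hot paths like canReuseInputOutputs() never re-acquire the NNC cache lock.
class ProcessedNode {
 public:
  explicit ProcessedNode(torch::jit::Node* node)
      : node_(node),
        has_out_variant_(torch::jit::getOutOfPlaceOperation(node) != nullptr) {}

  bool has_out_variant() const {
    return has_out_variant_; // cheap cached read, no lock acquisition
  }

 private:
  torch::jit::Node* node_;
  bool has_out_variant_;
};
```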
Test Plan: All unit tests pass: `buck test caffe2/benchmarks/static_runtime/...`
Reviewed By: movefast1990
Differential Revision: D30595579
fbshipit-source-id: 70aaa7af9589c739c672788bf662f711731864f2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62335
This change ensures that unittests only use out variants or native ops.
- Our unittests currently assume that a graph fed to the static runtime correctly replaces interpreter ops with their corresponding out variants / native ops, but this was never actually checked. This change ensures that it is (see the sketch after this list).
- We relied on manual inspection of log messages to see if an out variant is used for a specific workload even for unittesting. This change frees us from doing that.
- `aten::add` is excluded from this check since it's only enabled for an internal workload. Also some unittests are excluded by using `expect_interpreter_op = true` since they are written to use interpreter ops by design.
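A rough sketch of the kind of assertion this adds (the helper and accessor names are assumptions, not the actual test code):
```
#include <gtest/gtest.h>

// Hypothetical post-setup check over the prepared graph:
void checkOpsWereReplaced(StaticRuntime& runtime, bool expect_interpreter_op) {
  if (expect_interpreter_op) {
    return; // some tests exercise interpreter ops by design
  }
  for (const auto& pnode : runtime.nodes()) {
    // Every node must have an out variant or an SR native implementation
    // (aten::add excluded, as noted above).
    EXPECT_TRUE(pnode.has_out_variant() || isNativeOp(pnode.node()));
  }
}
```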
Test Plan: Ran `buck run //caffe2/benchmarks/static_runtime:static_runtime_cpptest` successfully.
Reviewed By: mikeiovine, hlu1
Differential Revision: D29952381
fbshipit-source-id: e60e70b80ccf45e91c6654b4ad53f92ffd5ab702
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61505
The handling of `self` in static runtime was previously incorrect. This diff fixes that, which matters because `self` is essential to prim::GetAttr/SetAttr: most of the time we're getting and setting attributes on `self`, the TorchScript module.
Reviewed By: ajyu
Differential Revision: D29350173
fbshipit-source-id: 6e62add4cda517ef8cd6c315d4cb0595e7d531fb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59603
D28698997 (10345010f7) was reverted because I forgot to replace the
```
VLOG(1) << "Found schema mismatch";
n->schema().dump();
```
block in `aten::clamp_min` with `LogAndDumpSchema(n)`, and that caused the bazel build to fail. I don't know why it makes the bazel build fail, though.
Test Plan: OSS CI.
Reviewed By: ajyu
Differential Revision: D28950177
fbshipit-source-id: 9bb1c6619e6b68415a3349f04933c2fcd24cc9a2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58067
- Use expect_contiguous in layer_norm to avoid unnecessary refcount bumps when the tensors are contiguous
- Clean up some leftovers from the hacky wrapper removal: use c10::MaybeOwned<Tensor> for bias tensors
- Skip dispatcher for at::empty in the layer_norm impl in Static Runtime
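A small sketch of the `expect_contiguous` pattern (the wrapping function is invented; the mechanism is real):
```
#include <ATen/ATen.h>

at::Tensor use_contiguous(const at::Tensor& input) {
  // MaybeOwned borrows `input` when it is already contiguous (no refcount
  // bump); otherwise it owns a freshly materialized contiguous tensor.
  c10::MaybeOwned<at::Tensor> contig = input.expect_contiguous();
  return contig->mul(2); // dereference like a pointer to use the tensor
}
```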
Test Plan: CI
Reviewed By: swolchok
Differential Revision: D28214298
fbshipit-source-id: 73150fa62d5c18f41a2264f8e56bbe5e377ad045
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58100
aten::clone has a second arg, memory_format, which was not previously supported.
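For reference, the schema in question and a usage sketch (the 4-D tensor is just to make the memory format meaningful):
```
// aten::clone(Tensor self, *, MemoryFormat? memory_format=None) -> Tensor
#include <ATen/ATen.h>

int main() {
  at::Tensor t = at::rand({1, 3, 8, 8});
  at::Tensor c = t.clone(at::MemoryFormat::ChannelsLast); // the second arg
  return c.is_contiguous(at::MemoryFormat::ChannelsLast) ? 0 : 1;
}
```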
Reviewed By: ajyu
Differential Revision: D28347171
fbshipit-source-id: e083cc24c3228048429bba3497326415bc3d1f5a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57521
When an op is added to static runtime, we manually check the schema (not with the jit schema check, but with IValue::isTensor()/isInt() etc.) and make sure it's one we support. If the schema doesn't match, SR throws an exception via TORCH_CHECK, which makes the entire graph invalid for SR.
This diff makes ops with unsupported schemas take the fallback path and go through the dispatcher instead:
```
if (node->kind() != prim::ListConstruct &&
node->kind() != prim::TupleConstruct &&
node->kind() != prim::DictConstruct && node->kind() != prim::ListUnpack) {
const Operator& op = node->getOperator();
TORCH_CHECK(op.hasOperation());
op_ = op.getOperation(node);
VLOG(1) << "Fallback interpreter for node: " << PrintNode(node);
}
```
The 2-arg `torch.norm`, which the SR `torch.norm` impl doesn't support (only the 3-, 4-, and 5-arg forms are supported), can now run in static runtime in fallback mode.
Reviewed By: ajyu
Differential Revision: D27531447
fbshipit-source-id: 0a9c2662ac73ed0393a23cc3a2c7df45fdb00fdd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57282
Added support for `fb::expand_dims` to SR.
Test Plan:
buck test caffe2/torch/fb/sparsenn:gpu_test -- test_expand_dims
buck test caffe2/benchmarks/static_runtime/fb:test_fb_operators
Reviewed By: hlu1
Differential Revision: D28043049
fbshipit-source-id: 01f59db7b507f027b220f044d6ff23602adbdb06
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56447
MemoryPlanner shouldn't manage StorageImpls; instead, it should manage TensorImpls, because a Tensor's StorageImpl can change.
Test Plan: CI
Reviewed By: ajyu
Differential Revision: D27840361
fbshipit-source-id: f22165d167c70165be2934c6717b5057a8bb4d29
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55337
`static_runtime::permute_copy` lives in an fb-only folder. Because `caffe2/test/test_static_runtime.py` is in OSS, we can't load the fb-only operator library there. The workaround is to check at runtime whether the op is registered.
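A hedged C++ mirror of the idea (the actual workaround lives in the Python test; this sketch is illustrative only):
```
#include <torch/csrc/jit/runtime/operator.h>

// True iff any overload of the fb-only op is registered in this binary.
bool isPermuteCopyRegistered() {
  const auto ops = torch::jit::getAllOperatorsFor(
      c10::Symbol::fromQualString("static_runtime::permute_copy"));
  return !ops.empty();
}
```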
Test Plan:
This fixed two of the broken tests:
```
✓ Pass: caffe2/test:static_runtime - test_multihead_attention_layer (test_static_runtime.TestStaticModule) (10.316)
✓ Pass: caffe2/test:static_runtime - test_mlp (test_static_runtime.TestStaticModule) (16.134)
```
Reviewed By: ajyu
Differential Revision: D27577066
fbshipit-source-id: ac87dcde71f0d5140ccde448bb49aaebbbb5908a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53588
Remove `SRViewOperatorRegistry` and related code now that it's no longer needed.
Reviewed By: swolchok
Differential Revision: D26901367
fbshipit-source-id: fa73501cd785d4b89466cda81481aea892f8241f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51564
Constructor logic was spread across InferenceModule and StaticRuntime; this diff unifies the two. After a lot of discussion on D25961626, it became apparent that `clone` is uglier than a cheap StaticRuntime.
This means the old StaticRuntime is effectively StaticModule, and the only code in the new StaticRuntime is the `run` functions.
```
graph, schema = PrepareForStaticModule(torchscript_module)
sm = StaticModule(graph, schema, options)
sm(inputs)
// or create many cheap runtimes with the module
sr = StaticRuntime(sm)
sr(inputs)
```
Changelist:
- Rename InferenceModule to StaticModule
- Move all logic for construction into StaticModule
- Create a new StaticRuntime that only has a unique memory planner (everything else is in StaticModule)
- Update comments with explanation
- Propagate all changes to predictor integration
- Propagate all changes to python integration
- Change semantics to be a bit more PyTorch-standard (no "run" calls, no "get_" getters).
Test Plan:
buck test //caffe2/test:static_runtime
buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest
Reviewed By: hlu1
Differential Revision: D25592967
fbshipit-source-id: 8233bed03137ce129137af2d44bce0095033ef0f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53333
- Add more variants to `create_empty_from` to take more args, like dtype/layout/device.
- Clean up stray at::empty uses, mostly in the out variants.
Reviewed By: ajyu
Differential Revision: D26799900
fbshipit-source-id: 6676d8043fead63208913ef3a28cabbae76e46bb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53216
- at::native::empty_cpu calls at::detail::empty_cpu without any changes to the arguments, so we can call at::detail::empty_cpu directly.
- There is no need to create a TensorOptions object first since we can get all the relevant information from the tensor directly.
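A sketch of the resulting call shape (treat the exact overload and argument order as assumptions; `self` stands in for the tensor whose properties we copy):
```
// Instead of building a TensorOptions and going through at::native::empty_cpu,
// read the fields off the existing tensor and call the detail helper directly:
at::Tensor out = at::detail::empty_cpu(
    /*size=*/{0},
    self.scalar_type(),
    self.layout(),
    self.device(),
    /*pin_memory=*/c10::nullopt,
    /*memory_format=*/c10::nullopt);
```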
Reviewed By: bertmaher, swolchok
Differential Revision: D26792255
fbshipit-source-id: 7a4e368a19cea79e136e34dab854cb1d37dbeb58
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52684
With alias analysis we get much more powerful registration and we can start removing "native" and fallback interpreted implementations. `inputsOutOfPlace` is an artifact of the hardcoded "native" and lax fallback implementations. Ideally every node will run out of place every time. Afaik, there's never a reason to disable it and we may want to remove that functionality.
This diff does introduce a "leak" in the memory management: containers are not cleaned up. This only happens when out variants are enabled.
Test Plan: buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest -- --run-disabled
Reviewed By: maratsubkhankulov, hlu1
Differential Revision: D26515801
fbshipit-source-id: 7391d66b9d36e15fc2955a5c34a04d027d18fe78
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51342
There is a subtle bug with the MemoryPlanner with regard to view ops with out variant.
```
def forward(self, a: Tensor, shape: List[int]):
b = a.reshape(shape)
return b + b
```
In this case, if we replace reshape with its out variant, `b` is managed by the MemoryPlanner, and its storage is set to nullptr right after inference when opts.cleanup_activations is true. Because `b` is a view of `a`, the storage of `a` is also set to nullptr, which violates the API's promise that `a` is const.
To fix this bug, I changed the MemoryPlanner so that it puts `b` in the unmanaged part.
Test Plan:
Add unit test to enforce the constness of inputs
```
buck test //caffe2/benchmarks/static_runtime:static_runtime_cpptest
```
Reviewed By: ajyu
Differential Revision: D26144203
fbshipit-source-id: 2dbacccf7685d0fe0f0b1195166e0510b2069fe3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51249
- Add out variants for reshape and flatten. These ops only create tensor views when they can; when they can't, they do a copy. The out variant reuses the TensorImpl in both cases; the difference is that the TensorImpl is a view in the first case and a normal TensorImpl in the second.
- Create a separate registry for the view ops with out variants. Because Tensor views can't participate in memory reuse (memonger), we need to track these ops separately.
- The MemoryPlanner does not track the StorageImpl of tensor views because they don't own the storage; however, in cases where reshape does not create a view, the MemoryPlanner does manage the output tensor. An illustration of the underlying view-or-copy behavior follows.
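For illustration, reshape's view-or-copy behavior in plain ATen (this is the behavior the out variant mirrors, not the out variant itself):
```
#include <ATen/ATen.h>

int main() {
  at::Tensor a = at::rand({2, 3});
  at::Tensor v = a.reshape({6}); // contiguous input: reshape returns a view
  at::Tensor t = a.t();          // transposed, non-contiguous
  at::Tensor c = t.reshape({6}); // no compatible strides: reshape copies
  bool viewed = v.data_ptr() == a.data_ptr();
  bool copied = c.data_ptr() != t.data_ptr();
  return (viewed && copied) ? 0 : 1;
}
```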
Reviewed By: ajyu
Differential Revision: D25992202
fbshipit-source-id: dadd63b78088c129e491d78abaf8b33d8303ca0d
Summary:
We do a lot of resize_({0}) to force `out` operators to properly
resize their results, and `resize_` does a fair bit of extraneous work
(e.g. trip through dispatch, checks for memory_format and named tensors, etc.).
If we strip it down to the bare minimum it's just setting the sizes to 0, so
let's do that directly.
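A minimal sketch of the stripped-down version (helper name assumed):
```
#include <ATen/ATen.h>

// Skips dispatch and resize_'s memory_format / named-tensor checks: for the
// "force the out op to resize" idiom we only need the sizes set to zero.
inline void fastResizeToZero(at::Tensor& t) {
  t.unsafeGetTensorImpl()->set_sizes_contiguous({0});
}
```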
Test Plan:
Perf results suggest maybe a 1% win:
```
batch 20: P163138256 (large win, 1.7%, mostly in fb_fc_out)
batch 1: P163139591 (smaller win, 0.88%, mostly in resize_)
```
Reviewed By: swolchok
Differential Revision: D25932595
fbshipit-source-id: d306a0a15c0e1be12fde4a7f149e3ed35665e3c0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50395
There's no need for these to be `std::function`.
ghstack-source-id: 119684828
Test Plan: CI
Reviewed By: hlu1
Differential Revision: D25874187
fbshipit-source-id: e9fa3fbc0dca1219ed13904ca704670ce24f7cc3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50050
Every node will now own its outputs.
I don't expect any big perf improvements from this diff; the only eliminated code is from deallocate_registers.
Largely, this is to enable more optimizations going forward.
Test Plan:
buck test mode/dev //caffe2/benchmarks/static_runtime:static_runtime_cpptest
buck test //caffe2/test:static_runtime
Reviewed By: hlu1
Differential Revision: D25571181
fbshipit-source-id: 91fcfbd5cd968af963ba89c45656997650ca6d18
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48616
This adds a couple of _out variants and registers them in the registry.
I also added the concept of "canReuse{Input,Output}" so that we can annotate tensors that are not optimizable (specifically, non-float tensors).
In the future we can change this (see D25062301).
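A hypothetical sketch of the annotation's shape (signatures assumed for illustration):
```
// Only float tensors participate in the planner's buffer reuse.
bool canReuseInput(const at::Tensor& t) {
  return t.scalar_type() == at::kFloat;
}
bool canReuseOutput(const at::Tensor& t) {
  return t.scalar_type() == at::kFloat;
}
```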
After removing `RecordFunction`, we see these results:
```
BS=20
---
caffe2: 0.651617 ~ 0.666354
static runtime: 0.753481
pytorch: 0.866658
BS=1
---
caffe2: 0.0858684 ~ 0.08633
static runtime: 0.209897
pytorch: 0.232694
```
Test Plan: standard internal test of ads model against caffe2 reference (see the scripts in this quip: https://fb.quip.com/ztERAYjuzdlr)
Reviewed By: hlu1
Differential Revision: D25066823
fbshipit-source-id: 25ca181c62209a4c4304f7fe73832b13e314df80