pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
Mikhail Zolotukhin	e5ed037529	[StaticRuntime] Add a 'speed of light' benchmark. (#46308 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46308 This PR adds a hand optimized version of DeepAndWide model with the goal of estimating overheads of static runtime. While static runtime is currently much faster than the existing JIT interpreter, it would be useful to understand how close we are to an absolutely 0-overhead system. Currently, this "ideal" implementation is 2x faster than the static runtime on batchsize=1. Full benchmark results: ``` Running build/bin/static_runtime_bench Run on (24 X 2394.71 MHz CPU s) CPU Caches: L1 Data 32K (x24) L1 Instruction 32K (x24) L2 Unified 4096K (x24) L3 Unified 16384K (x24) ------------------------------------------------------------------------------ Benchmark Time CPU Iterations ------------------------------------------------------------------------------ BM_deep_wide_base/1 59518 ns 59500 ns 10909 BM_deep_wide_base/8 74635 ns 74632 ns 9317 BM_deep_wide_base/20 82186 ns 82147 ns 9119 BM_deep_wide_fast/1 13851 ns 13851 ns 49825 << new BM_deep_wide_fast/8 22497 ns 22497 ns 32089 << new BM_deep_wide_fast/20 23868 ns 23841 ns 31184 << new BM_deep_wide_jit_graph_executor/1 62786 ns 62786 ns 10835 BM_deep_wide_jit_graph_executor/8 76730 ns 76718 ns 7529 BM_deep_wide_jit_graph_executor/20 78886 ns 78883 ns 8769 BM_deep_wide_jit_profiling_executor/1 69504 ns 69490 ns 10309 BM_deep_wide_jit_profiling_executor/8 75718 ns 75715 ns 9199 BM_deep_wide_jit_profiling_executor/20 75364 ns 75364 ns 9010 BM_deep_wide_static/1 40324 ns 40318 ns 17232 BM_deep_wide_static/8 50327 ns 50319 ns 13335 BM_deep_wide_static/20 53075 ns 53071 ns 12855 BM_deep_wide_static_threaded/threads:8 6258 ns 49873 ns 14008 ``` PS: The implementation could probably be optimized even more. Differential Revision: D24300702 Test Plan: Imported from OSS Reviewed By: dzhulgakov Pulled By: ZolotukhinM fbshipit-source-id: 7870bdef127c39d11bcaa4f03a60eb80a46be58e	2020-10-19 23:35:55 -07:00
Hao Lu	ea4fbb2e5e	[StaticRuntime] Replace hashtable based workspace with vector<IValue> (#45892 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45892 Previously we were using hashtable (`std::unordered_map` in OSS, `folly::F14FastMap` in fb) for workspace, a container for all the IValues in the graph. Hashtable based lookups can be expensive. This diff replaces the hashtable with `std::vector` and extra bookkeepings are introduced to keep track of the indices of graph inputs/outputs in `StaticRuntime` and op inputs/outputs in `ProcessedNode`. Reviewed By: dzhulgakov Differential Revision: D24098763 fbshipit-source-id: 337f835ee144985029b5fa2ab98f9bcc5e3606b6	2020-10-08 09:50:30 -07:00
Hao Lu	e8d8de32b4	[StaticRuntime] Implement StaticRuntime::benchmark (#45639 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45639 `StaticRuntime::run_individual` is to mimic the caffe2 operator benchmark `SimpleNet::TEST_Benchmark`, so we can accurate information on the operator breakdown. We found that the PyTorch AutogradProfiler adds a lot of overhead to small models, such as the adindexer precomputation_merge net, 100% for batch_size 1, 33% for batch_size 20. This implementation adds very little overhead, as shown in the test plan. Test Plan: Test results are fb internal only. Reviewed By: yinghai, dzhulgakov Differential Revision: D24012088 fbshipit-source-id: f32eb420aace93e2de421a15e4209fce6a3d90f0	2020-10-06 20:54:43 -07:00
Hao Lu	2b48dd168d	[StaticRuntime] Integrate Static Runtime into PyTorchPredictor (#45640 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45640 Reviewed By: dzhulgakov Differential Revision: D23996656 fbshipit-source-id: 63d88c89d1df61a04deadc472319607ed83867e5	2020-10-02 23:03:05 -07:00
Bram Wasti	87b356d093	[static runtime] Split out graph preparation from runtime (#44131 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44131 Test Plan: Imported from OSS Reviewed By: hlu1 Differential Revision: D23604305 Pulled By: bwasti fbshipit-source-id: 7b47da4961d99074199417ef1407a788c7d80ee6	2020-09-28 13:01:23 -07:00
generatedunixname89002005325676	7818a214c5	[AutoAccept][Codemod][FBSourceClangFormatLinter] Daily `arc lint --take CLANGFORMAT` Reviewed By: zertosh Differential Revision: D23959094 fbshipit-source-id: 6caa046d263114bff38a38d756099aac357e4f04	2020-09-28 05:08:46 -07:00
Bram Wasti	e5f6e5af13	Add Deep and wide to test and flatten/tranpose for good measure (#44129 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44129 Test Plan: Imported from OSS Reviewed By: hlu1 Differential Revision: D23604302 Pulled By: bwasti fbshipit-source-id: 5787f6f32a80b22b1b712c4116f70370dad98f12	2020-09-25 11:05:41 -07:00
Bram Wasti	d1a11618f5	[static runtime] Add _out variants and reuse memory (#44128 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44128 Test Plan: Imported from OSS Reviewed By: hlu1 Differential Revision: D23604304 Pulled By: bwasti fbshipit-source-id: 06a23cb75700a0fc733069071843b7b498e7b9e9	2020-09-25 11:03:06 -07:00
Bram Wasti	a475613d1d	[static runtime] Swap to out-variant compatible nodes (#44127 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44127 Test Plan: Imported from OSS Reviewed By: hlu1 Differential Revision: D23604306 Pulled By: bwasti fbshipit-source-id: 18ccfb9b466b822e28130be3d5c4fae36c76820b	2020-09-14 12:38:25 -07:00
Hao Lu	8538a79bfe	[jit][static] Basic executor (#43647 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43647 Nothing fancy, just a basic implementation of the graph executor without using stack machine. Reviewed By: bwasti Differential Revision: D23208413 fbshipit-source-id: e483bb6ad7ba8591bbe1767e669654d82f42c356	2020-08-28 23:20:07 -07:00
Hao Lu	25dcc28cd6	[jit][static] Replace deepcopy with copy (#43182 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43182 We should avoid using `deepcopy` on the module because it involves copying the weights. Comparing the implementation of `c10::ivalue::Object::copy()` vs `c10::ivalue::Object::deepcopy()`, the only difference is `deepcopy` copies the attributes (slots) while `copy` does not. Reviewed By: bwasti Differential Revision: D23171770 fbshipit-source-id: 3cd711c6a2a19ea31d1ac1ab2703a0248b5a4ef3	2020-08-26 11:15:49 -07:00
Hao Lu	8864148823	[jit] DeepAndWide benchmark (#43096 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43096 Add benchmark script for deep and wide model. Reviewed By: bwasti, yinghai Differential Revision: D23099925 fbshipit-source-id: aef09d8606eba1eccc0ed674dfea59b890d3648b	2020-08-15 01:27:12 -07:00
Bram Wasti	523b2ce9c6	[jit][static runtime] Simplify the graph and add operator whitelist (#43024 ) Summary: This PR whitelists and simplifies graphs to help with development later on. Key to note in this PR is the use of both a pattern substitution and the registration of custom operators. This will likely be one of the main optimization types done in this folder. Pull Request resolved: https://github.com/pytorch/pytorch/pull/43024 Reviewed By: hlu1 Differential Revision: D23114262 Pulled By: bwasti fbshipit-source-id: e25aa3564dcc8a2b48cfd1561b3ee2a4780ae462	2020-08-13 20:19:55 -07:00
Bram Wasti	ada8404f2d	[jit] Scaffold a static runtime (#42753 ) Summary: The premise of this approach is that a small subset of neural networks are well represented by a data flow graph. The README contains more information. The name is subject to change, but I thought it was a cute reference to fire. suo let me know if you'd prefer this in a different spot. Since it lowers a JIT'd module directly I assumed the JIT folder would be appropriate. There is no exposed Python interface yet (but is mocked up in `test_accelerant.py`) Pull Request resolved: https://github.com/pytorch/pytorch/pull/42753 Reviewed By: zou3519 Differential Revision: D23043771 Pulled By: bwasti fbshipit-source-id: 5353731e3aae31c08b5b49820815da98113eb551	2020-08-12 13:05:27 -07:00

14 Commits