# PyTorch JIT
This folder contains (most of) the C++ code for the PyTorch JIT, a language and compiler stack for executing PyTorch models portably and efficiently. To learn more about the JIT from a user perspective, please consult our reference documentation and tutorials.
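From the Python side, the usual entry points into this stack are `torch.jit.script` and `torch.jit.trace`. The snippet below is a minimal illustrative sketch, not part of this README: `ToyModule` is a made-up example module, and the only assumption is a standard PyTorch install.

```python
import torch

class ToyModule(torch.nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Data-dependent control flow is preserved by scripting;
        # tracing would instead bake in the branch taken on the example input.
        if x.sum() > 0:
            return x.relu()
        return -x

# Compile the module through the frontend into the JIT IR.
scripted = torch.jit.script(ToyModule())

# Alternatively, record the ops executed on an example input.
traced = torch.jit.trace(torch.nn.ReLU(), torch.randn(3))

print(scripted(torch.randn(2, 2)))
print(traced(torch.randn(3)))
```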
A brief summary of the source tree:
- OVERVIEW.md: High-level technical overview of the JIT.
- frontend/: Taking PyTorch modules in Python and translating them into the JIT IR.
- ir/: Core IR abstractions (see the sketch after this list for a quick look at the IR from Python).
- runtime/: Interpreter, graph execution, and JIT operators.
- codegen/: Generating efficient, hardware-specific code for JIT subgraphs.
- serialization/: Saving and loading modules.
- api/: Any user-facing C++ or Python interfaces.
- python/: Binding stuff into Python or accessing information from the Python environment.
- testing/: Utilities and helpers for testing.
- mobile/: Mobile-specific implementations of runtime components.
- passes/: IR-to-IR passes, generally for optimization and lowering.
- generated/: This folder is generated by the PyTorch build, and contains bindings for native PyTorch operators into the JIT.
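As a purely illustrative example of how a few of these pieces fit together when driven from Python, the sketch below scripts a small function, prints the IR produced by the frontend, and round-trips it through the save/load path. `scaled_add` is a made-up name; the sketch assumes only a standard PyTorch install.

```python
import io
import torch

@torch.jit.script
def scaled_add(x: torch.Tensor, y: torch.Tensor, alpha: float = 1.0) -> torch.Tensor:
    return x + alpha * y

# The IR built by the frontend (see frontend/ and ir/) can be inspected directly.
print(scaled_add.graph)

# Round-trip through the save/load path implemented under serialization/.
buf = io.BytesIO()
torch.jit.save(scaled_add, buf)
buf.seek(0)
reloaded = torch.jit.load(buf)

print(reloaded(torch.ones(2), torch.ones(2), 0.5))
```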
Refer to each folder for more in-depth documentation.
Other relevant parts of the codebase not contained here:
- aten/src/ATen/core: contains JIT code re-used by other elements of the runtime system (eager, mobile, etc.)