pytorch/torch/csrc/jit
Ansha Yu 46321cb937 [static runtime] binding for aten::norm_out (#56636)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/56636

Test Plan:
Verify that the model runs on the aug_1x model, which contains aten::norm, and that the JIT and Static Runtime results match:
```
./buck-out/opt/gen/caffe2/caffe2/fb/predictor/ptvsc2_predictor_bench \
  --scripted_model=/data/users/ansha/tmp/adfinder/aug_1x/210616848_0.predictor.disagg.local.local.pt \
  --pt_inputs=/data/users/ansha/tmp/adfinder/aug_1x/210616848_0.predictor.disagg.input_data.container.pt \
  --iters=500 --warmup_iters=500 --num_threads=1 \
  --pt_enable_static_runtime=1 --pt_cleanup_activations=true \
  --pt_enable_out_variant=1 --pt_optimize_memory=1 \
  --compare_results=1 --do_profile=1 --adsfinder_compatibility=1
```
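The `--pt_enable_out_variant=1` flag above is what this change hooks `aten::norm` into. The idea behind an "out variant" can be sketched in plain Python (no PyTorch dependency; the function names here are illustrative, not the real `aten::norm_out` signature):

```python
import math

def norm_functional(values):
    """Functional form: allocates and returns a fresh result on every call."""
    return math.sqrt(sum(v * v for v in values))

def norm_out(values, out):
    """Out variant: writes the result into a caller-provided buffer,
    so Static Runtime can reuse output storage across iterations."""
    out[0] = math.sqrt(sum(v * v for v in values))
    return out

buf = [0.0]                       # allocated once, reused every iteration
for batch in ([3.0, 4.0], [6.0, 8.0]):
    norm_out(batch, buf)          # no per-call output allocation
```

Binding the out variant lets the memory planner include the op's output in its managed arena instead of hitting the allocator each iteration.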

```
Time per node type:
        1.53159 ms.    35.8619%. fb::sigrid_transforms_torch_bind (1 nodes)
         0.9481 ms.    22.1996%. aten::linear (6 nodes)
       0.704806 ms.    16.5029%. aten::argmin (1 nodes)
       0.252252 ms.    5.90643%. aten::matmul (1 nodes)
       0.140869 ms.    3.29842%. fb::clip_ranges_gather_sigrid_hash_v3 (77 nodes)
       0.100014 ms.    2.34181%. fb::clip_ranges_gather (263 nodes)
      0.0880838 ms.    2.06247%. aten::sub (1 nodes)
      0.0553556 ms.    1.29614%. aten::repeat (1 nodes)
      0.0438464 ms.    1.02665%. aten::norm (1 nodes)
      0.0395956 ms.   0.927124%. fb::batch_box_cox (1 nodes)
       0.035834 ms.   0.839045%. aten::__getitem__ (506 nodes)
      0.0345233 ms.   0.808357%. prim::TupleUnpack (254 nodes)
      0.0316876 ms.   0.741959%. aten::sigmoid (2 nodes)
      0.0293246 ms.   0.686629%. aten::mul (3 nodes)
      0.0287696 ms.   0.673635%. fb::offsets_to_ranges (253 nodes)
      0.0242373 ms.   0.567511%. aten::pow (1 nodes)
      0.0224204 ms.    0.52497%. fb::simple_embedding_bag_sum (3 nodes)
      0.0200074 ms.   0.468469%. fb::casted_batch_one_hot_lengths (1 nodes)
      0.0190264 ms.   0.445499%. fb::concat_add_mul_replacenan_clip (1 nodes)
      0.0167253 ms.    0.39162%. prim::TupleConstruct (1 nodes)
      0.0164962 ms.   0.386255%. aten::sum (3 nodes)
      0.0158986 ms.   0.372262%. prim::DictConstruct (2 nodes)
      0.0109372 ms.   0.256093%. aten::div (1 nodes)
     0.00910563 ms.   0.213207%. prim::ListConstruct (4 nodes)
     0.00876917 ms.   0.205328%. static_runtime::to_copy (8 nodes)
     0.00822567 ms.   0.192603%. fb::sigrid_hash_precompute (1 nodes)
     0.00622559 ms.   0.145771%. aten::contiguous (1 nodes)
     0.00460064 ms.   0.107723%. aten::narrow (4 nodes)
     0.00297164 ms.  0.0695804%. static_runtime::reshape_copy (2 nodes)
     0.00287099 ms.  0.0672237%. aten::logit (1 nodes)
     0.00277557 ms.  0.0649894%. aten::add (1 nodes)
     0.00264978 ms.  0.0620441%. aten::clamp_min (1 nodes)
     0.00215832 ms.  0.0505366%. aten::relu (1 nodes)
     0.00213779 ms.   0.050056%. fb::gather_ranges (4 nodes)
     0.00195846 ms.  0.0458571%. aten::full (1 nodes)
     0.00177333 ms.  0.0415222%. aten::stack (1 nodes)
     0.00147449 ms.   0.034525%. aten::size (3 nodes)
    0.000762524 ms.  0.0178544%. aten::expand_as (1 nodes)
    0.000757406 ms.  0.0177345%. fb::clip_ranges (2 nodes)
    0.000614798 ms.  0.0143954%. fb::lengths_to_offsets (3 nodes)
    0.000407952 ms. 0.00955212%. static_runtime::flatten_copy (1 nodes)
    0.000159918 ms. 0.00374445%. prim::device (1 nodes)
         4.2708 ms. in Total
StaticRuntime setup time: 0.000407 ms
Memory allocation time: 0.0089714 ms
Memory deallocation time: 0.0592135 ms
Outputs deallocation time: 0.0458097 ms
Total memory managed: 947328 bytes
Total number of reused tensors: 28
```
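The "Total memory managed" and "reused tensors" lines come from Static Runtime's memory planner, which recycles a tensor's buffer once its last use has passed. A toy greedy version of that idea (not the actual planner; tensor names and sizes below are made up):

```python
def plan(tensor_lifetimes):
    """tensor_lifetimes: dict name -> (size_bytes, first_use, last_use).
    Returns (total_bytes_allocated, reused_count, name -> buffer_id)."""
    by_start = sorted(tensor_lifetimes.items(), key=lambda kv: kv[1][1])
    free = []            # (size, buffer_id) buffers whose tensors are dead
    live = []            # (last_use, size, buffer_id)
    assignment = {}
    total = reused = next_id = 0
    for name, (size, first, last) in by_start:
        # retire buffers whose tensors died before this tensor is produced
        for entry in list(live):
            if entry[0] < first:
                live.remove(entry)
                free.append((entry[1], entry[2]))
        # reuse a released buffer that is big enough, else allocate a new one
        fit = next(((sz, bid) for sz, bid in free if sz >= size), None)
        if fit is not None:
            free.remove(fit)
            bid = fit[1]
            reused += 1
        else:
            bid, next_id = next_id, next_id + 1
            total += size
        assignment[name] = bid
        live.append((last, size, bid))
    return total, reused, assignment

lifetimes = {
    "a": (1024, 0, 2),   # produced at op 0, last used at op 2
    "b": (512,  1, 3),
    "c": (1024, 3, 5),   # can take over a's buffer, since a dies at op 2
}
total, reused, _ = plan(lifetimes)
# total == 1536, reused == 1: c reuses a's 1024-byte buffer
```

The real planner works on the scheduled op order of the graph and handles aliasing, but the liveness-driven reuse is the same principle.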

Reviewed By: hlu1

Differential Revision: D27922070

fbshipit-source-id: 538b39b7fff0638fc994b7983bf32d9e9f15d016
2021-04-28 08:44:10 -07:00
api Remove notion of "level" from Module::dump_to_str. (#52539) 2021-03-09 05:45:57 -08:00
backends [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT 2021-04-07 10:37:27 -07:00
codegen s/AutoDispatchBelowAutograd/AutoDispatchBelowInplaceOrView. (#56657) 2021-04-23 15:50:00 -07:00
cuda Merge CUDA Streams and Events (#53902) 2021-04-05 08:19:55 -07:00
docs Add remaining ToCs to ToC lint (#56487) 2021-04-20 10:28:47 -07:00
frontend s/AutoDispatchBelowAutograd/AutoDispatchBelowInplaceOrView. (#56657) 2021-04-23 15:50:00 -07:00
ir add hardtanh(0,6) to the set of MKLDNN fusible ops for mobilenetv2 (#56203) 2021-04-23 08:08:17 -07:00
mobile [PyTorch Edge] Move torch::jit::mobile::_export_operator_list() from serialization/export_module.cpp to mobile/import.cpp (#56044) 2021-04-15 17:53:36 -07:00
passes [Pytorch Edge] Remove methods_to_optimize arg (#57045) 2021-04-27 14:54:13 -07:00
python [Pytorch Edge] Remove methods_to_optimize arg (#57045) 2021-04-27 14:54:13 -07:00
runtime [static runtime] binding for aten::norm_out (#56636) 2021-04-28 08:44:10 -07:00
serialization Enable forward/backward compatibility in TS mobile (#56079) 2021-04-23 16:55:18 -07:00
tensorexpr [NNC] removed the second run of llvm passmanager - it is repeated and caused a slowdown in the generated code (#56837) 2021-04-27 08:36:04 -07:00
testing Revert D27448156: irange for size_t 2021-04-03 19:14:00 -07:00
jit_log.cpp
jit_log.h added macros in jit logging to check whether loggings are enabled; replaced similar checks in LLVM codegen with such macros (#49121) 2020-12-21 13:01:22 -08:00
jit_opt_limit.cpp Implement optimization bisect (#49031) 2021-01-11 12:25:28 -08:00
jit_opt_limit.h Update JIT_OPT macro for easier use (#50602) 2021-01-20 11:15:20 -08:00
OVERVIEW.md Add remaining ToCs to ToC lint (#56487) 2021-04-20 10:28:47 -07:00
README.md
resource_guard.h [BE] Make torch/csrc/jit/tensorexpr/ clang-tidy clean (#55628) 2021-04-08 19:44:14 -07:00

PyTorch JIT

This folder contains (most of) the C++ code for the PyTorch JIT, a language and compiler stack for executing PyTorch models portably and efficiently. To learn more about the JIT from a user perspective, please consult our reference documentation and tutorials.

A brief summary of the source tree:

  • OVERVIEW.md: High-level technical overview of the JIT.
  • frontend/: Taking PyTorch modules in Python and translating them into the JIT IR.
  • ir/: Core IR abstractions.
  • runtime/: Interpreter, graph execution, and JIT operators.
  • codegen/: Generating efficient, hardware-specific code for JIT subgraphs.
  • serialization/: Saving and loading modules.
  • api/: Any user-facing C++ or Python interfaces.
  • python/: Bindings into Python and utilities for accessing information from the Python environment.
  • testing/: Utilities and helpers for testing.
  • mobile/: Mobile-specific implementations of runtime components.
  • passes/: IR-to-IR passes, generally for optimization and lowering.
  • generated/: This folder is generated by the PyTorch build, and contains bindings for native PyTorch operators into the JIT.

Refer to each folder for more in-depth documentation.
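As a toy illustration of the kind of IR-to-IR transformation that lives in passes/ (the real passes operate on torch::jit::Graph in C++; this standalone Python sketch with a made-up mini-IR just shows the pattern):

```python
# Hypothetical mini-IR: each node is (output_name, op, operands), where an
# operand is either a literal number or the name of an earlier output.
# The pass folds "add"/"mul" nodes whose operands are all constants,
# mirroring the shape of a constant-propagation pass.

def constant_fold(nodes):
    consts = {}   # output name -> known constant value
    folded = []
    for out, op, args in nodes:
        vals = [consts.get(a, a) if isinstance(a, str) else a for a in args]
        if op in ("add", "mul") and all(isinstance(v, (int, float)) for v in vals):
            result = sum(vals) if op == "add" else vals[0] * vals[1]
            consts[out] = result
            folded.append((out, "const", [result]))
        else:
            folded.append((out, op, args))   # leave non-constant nodes alone
    return folded

graph = [
    ("t1", "add", [2, 3]),        # foldable -> const 5
    ("t2", "mul", ["t1", 4]),     # t1 is now known -> const 20
    ("t3", "relu", ["x"]),        # depends on a runtime input, untouched
]
# constant_fold(graph) ==
#   [("t1", "const", [5]), ("t2", "const", [20]), ("t3", "relu", ["x"])]
```

Real passes additionally must respect aliasing and side effects, which is why they consult the alias analysis infrastructure in ir/.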

Other relevant parts of the codebase not contained here:

  • aten/src/ATen/core: Contains JIT code reused by other elements of the runtime system (eager, mobile, etc.).