Commit Graph

118 Commits

Hao Lu
ccd0977060 [Static Runtime] Support prim::GetAttr/SetAttr (#61505)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61505

The handling of `self` in static runtime was previously incorrect. This diff fixes that issue, since `self` is essential to `prim::GetAttr`/`prim::SetAttr`; after all, most of the time we are getting and setting attributes on `self`, the TorchScript module.
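For reference, a minimal TorchScript module of the kind this change targets (illustrative sketch, not code from this PR): reading and writing an attribute on `self` inside `forward` is what lowers to `prim::GetAttr`/`prim::SetAttr` nodes in the scripted graph.

```python
import torch

class Counter(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.calls = 0  # plain int attribute stored on the module

    def forward(self, x):
        self.calls = self.calls + 1  # lowers to prim::SetAttr on %self
        return x + self.calls        # lowers to prim::GetAttr on %self

m = torch.jit.script(Counter())
print(m.graph)  # the graph contains prim::GetAttr/prim::SetAttr nodes taking %self
```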

Reviewed By: ajyu

Differential Revision: D29350173

fbshipit-source-id: 6e62add4cda517ef8cd6c315d4cb0595e7d531fb
2021-07-10 14:06:06 -07:00
Don Jang
a74516d699 [static runtime] implement aten::log (#61393)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61393

Test Plan:
Added `StaticRuntime.IndividualOps_Log`

```
...
[ RUN      ] StaticRuntime.IndividualOps_Log
V0701 12:10:50.829100 3708165 impl.cpp:455] StaticModuleOptions: cleanup_activations 1, enable_out_variant 1, optimize_memory1, optimize_graph_output_memory0
V0701 12:10:50.888468 3708165 impl.cpp:1279] Switch to out variant for node: %3 : Tensor = aten::log(%inp.1)
V0701 12:10:50.889098 3708165 impl.cpp:1279] Switch to out variant for node: %a.1 : Tensor = aten::clone(%3, %2)
```

Reviewed By: hlu1

Differential Revision: D29511622

fbshipit-source-id: 819fd7d90c084609a060efeadb3015e35acac517
2021-07-08 18:25:35 -07:00
Don Jang
c2b0af2560 [static runtime] Implement aten::sign (#61154)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61154

Test Plan:
Added `StaticRuntime.IndividualOps_Sign`

```
[djang@devvm861.prn0 ~/local/fbsource/fbcode/caffe2] buck run //caffe2/benchmarks/static_runtime:static_runtime_cpptest -- -v 1
...
[ RUN      ] StaticRuntime.IndividualOps_Sign
V0701 12:05:31.836099 3679080 impl.cpp:455] StaticModuleOptions: cleanup_activations 1, enable_out_variant 1, optimize_memory1, optimize_graph_output_memory0
V0701 12:05:31.898192 3679080 impl.cpp:1279] Switch to out variant for node: %3 : Tensor = aten::sign(%input.1)
V0701 12:05:31.898849 3679080 impl.cpp:1279] Switch to out variant for node: %4 : Tensor = aten::clone(%3, %2)
```

Reviewed By: hlu1

Differential Revision: D29518603

fbshipit-source-id: e47b96d037fea639c41052f3849c82bbfa5f482a
2021-07-07 12:29:25 -07:00
Mike Guo
6ecc1a4c4f Make pytorch clang-tidy clean (#60649)
Summary:
This PR suppresses clang-tidy warnings in the codebase (for now) so that we can re-enable clang-tidy checks on master.

I ran this script to add the `NOLINTNEXTLINE` comments (on a devserver):
```bash
python3 setup.py develop

# Uses same script that's run on CI and adds the -j (parallel), -s (add comments), -k (continue if diagnostic errors are found) options
python3 tools/clang_tidy.py \
  -j \
  -s \
  -k \
  -v \
  --paths torch/csrc/ \
  -g"-torch/csrc/jit/passes/onnx/helper.cpp" \
  -g"-torch/csrc/jit/passes/onnx/shape_type_inference.cpp" \
  -g"-torch/csrc/jit/serialization/onnx.cpp" \
  -g"-torch/csrc/jit/serialization/export.cpp" \
  -g"-torch/csrc/jit/serialization/import.cpp" \
  -g"-torch/csrc/jit/serialization/import_legacy.cpp" \
  -g"-torch/csrc/onnx/init.cpp" \
  -g"-torch/csrc/cuda/nccl.*" \
  -g"-torch/csrc/cuda/python_nccl.cpp" \
  -g"-torch/csrc/autograd/FunctionsManual.cpp" \
  -g"-torch/csrc/generic/*.cpp" \
  -g"-torch/csrc/jit/codegen/cuda/runtime/*" \
  -g"-torch/csrc/deploy/interpreter/interpreter.cpp" \
  -g"-torch/csrc/deploy/interpreter/interpreter.h" \
  -g"-torch/csrc/deploy/interpreter/interpreter_impl.h" \
  -g"-torch/csrc/deploy/interpreter/test_main.cpp"
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60649

Test Plan: Verified changes by re-running the script (without the `-s` option) and seeing no warnings/errors.

Reviewed By: walterddr, janeyx99

Differential Revision: D29504258

Pulled By: 1ntEgr8

fbshipit-source-id: 78310b30ee8213b73ddb4771ad874665323e7a4e
2021-07-01 12:21:07 -07:00
Hao Lu
46595a9623 [Static Runtime] Add gflag to disable nnc and caffe2 math library (#61090)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61090

Reviewed By: ajyu

Differential Revision: D29479860

fbshipit-source-id: 2b53405f41d319f074c75d8923d97fd6a45fee4b
2021-07-01 00:01:37 -07:00
Yukio Siraichi
b099f5429c Port argmin kernel to structured kernels. (#60364)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60364

Tracking issue: #55070

This PR was opened to resolve the CI failures on main when merging: #59371 #59372 #59373 #59937 #59938.

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D29265855

Pulled By: ezyang

fbshipit-source-id: ccee3810940542f8b370596105826c96b32231ec
2021-06-29 14:16:59 -07:00
Bert Maher
ddb1f293b6 Fix the NNC-disabled path in static runtime for perf comparisons
Summary:
The path which has NNC/LLVM disabled still constructs a tensor
expression, even though `supports()` will always return false, so a
`KernelScope` is necessary to manage those memory allocations.

I guess we could avoid building the TEs at all in this case, but it's pretty
clean this way.

Test Plan:
```
scripts/bertrand/static_runtime/run.sh
```

Reviewed By: hlu1

Differential Revision: D29415909

fbshipit-source-id: dde43de8516b9a2cf9f5f7f3699962bf9ccd8c30
2021-06-28 15:39:07 -07:00
Hao Lu
1e31d26b1d [Static Runtime] Fix bugs in static_runtime::to_copy (#60503)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60503

Fixed a few issues in the static_runtime::to_copy impl:
- fixed a bug with memory_format
- copy strides when appropriate. This is necessary to make sure that the fbgemm path in the copy kernel gets hit.
- fix the schema in the `ReplaceWithCopy` pass
- add registration of `static_runtime::to_copy.other`

Add more unit tests:
- test dynamic shapes
- test strided input tensor to `aten::to`
- test alias case (same input/output)
- test `to.other`

Reviewed By: ajyu

Differential Revision: D26838933

fbshipit-source-id: ec0d1a2deebe998fcfe8858e772e1ef429cb4522
2021-06-23 19:57:17 -07:00
Ansha Yu
0baad214b0 [static runtime][fix] resize to the input tensor size for full_like (#60229)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60229

Fix a bug where we did not resize to the input tensor size, causing the output to be incorrect.
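For reference, the expected semantics in eager mode (illustrative only, not the Static Runtime code path): `full_like` must always produce an output with the same shape as its input, which is what the resize fix restores for the out variant.

```python
import torch

x = torch.randn(3, 4)
y = torch.full_like(x, 7.0)
assert y.shape == x.shape      # output must be resized to the input tensor's size
assert bool((y == 7.0).all())
```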

Test Plan:
Test on replayer, rebased on D29217781, with model 278203319_26.

Verify with jit outputs (D28583950)

`./buck-out/gen/admarket/lib/ranking/prediction_replayer/replayer --model_inference_type_target=DISAGG_ACCELERATOR --prediction_replayer_force_model_type=inline_cvr_post_imp_model --prediction_replayer_force_model=278203319_26 --prediction_replayer_target_tier=sigrid.predictor.perf.dianshi_staticruntime_debug_0604.test --prediction_replayer_input_stream_filename=/data/users/ansha/tmp/adfinder/filtered_requests_inline_cvr_100 --ignore_model_id_mismatch --check_performance --fully_remote_sr_connection_options="overall_timeout:10000000,processing_timeout:10000000" --use_new_encoding_for_ads_services --use_new_encoding_from_model_id_to_shard_id --sigrid_force_model_dir=/data/users/ansha/tmp/adfinder/278203319_26/ --sigrid_predictor_model_suffix=.predictor.disagg.local --use_new_encoding_from_model_id_to_shard_id=true --prediction_replayer_force_model_kind=19 --pytorch_predictor_static_runtime_enable=true --prediction_replayer_target_qps=1`

Reviewed By: hlu1, movefast1990

Differential Revision: D29218918

fbshipit-source-id: dab4bbbabeaa8367174ed90edca43d6204c65409
2021-06-18 09:56:25 -07:00
Brian Hirsh
6b5e77904f Revert D29104396: Port argmin kernel to structured kernels.
Test Plan: revert-hammer

Differential Revision:
D29104396 (226d745a0b)

Original commit changeset: 39c59bcc0446

fbshipit-source-id: 82de26f925a885f65572a785fa45a9980d3a974b
2021-06-17 10:31:06 -07:00
Yukio Siraichi
226d745a0b Port argmin kernel to structured kernels. (#59938)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59938

Tracking issue: #55070

Test Plan: Imported from OSS

Reviewed By: soulitzer

Differential Revision: D29104396

Pulled By: ezyang

fbshipit-source-id: 39c59bcc044649c1ec9c9685366c4dda87f76aa7
2021-06-17 08:18:13 -07:00
Hao Lu
eda2ddb5b0 [ATen] Fix aten::to schema (#60001)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60001

Fix the aten::to schema to reflect that the output may alias input.
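The aliasing behavior the schema has to capture can be seen from eager mode (illustrative sketch, not the schema text itself): when no conversion is needed, `Tensor.to` may return the input tensor rather than a copy.

```python
import torch

x = torch.randn(2, 2)
y = x.to(torch.float32)               # same dtype/device: no copy is required
print(y.data_ptr() == x.data_ptr())   # True: the output aliases the input
z = x.to(torch.float64)               # dtype change forces a real copy
print(z.data_ptr() == x.data_ptr())   # False
```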

Test Plan: Added new unit tests.

Reviewed By: ezyang

Differential Revision: D29121620

fbshipit-source-id: c29b6aa22d367ffedf06e47116bc46b3e188c39c
2021-06-15 20:04:20 -07:00
Hao Lu
cbd1e8c335 [Static Runtime] Fix bug in aten::to (#59995)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59995

Reviewed By: ajyu

Differential Revision: D29083106

fbshipit-source-id: 687ffb121af2716d606c145474942650a2d9ac7e
2021-06-14 22:54:43 -07:00
Hao Lu
2112074f25 [Static Runtime] Add schema check to several aten ops (#59603)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59603

D28698997 (10345010f7) was reverted because I forgot to replace the
```
  VLOG(1) << "Found schema mismatch";
  n->schema().dump();
```
block in `aten::clamp_min` with `LogAndDumpSchema(n)`, and that led the bazel build to fail. I don't know why it makes the bazel build fail, though.

Test Plan: OSS CI.

Reviewed By: ajyu

Differential Revision: D28950177

fbshipit-source-id: 9bb1c6619e6b68415a3349f04933c2fcd24cc9a2
2021-06-10 23:39:00 -07:00
Rong Rong (AI Infra)
91eb831422 Revert D28698997: [Static Runtime] Add schema check to aten ops
Test Plan: revert-hammer

Differential Revision:
D28698997 (10345010f7)

Original commit changeset: 232fc60c0321

fbshipit-source-id: e351df62779fea85b7afe5160d3c40c4e7cee4ed
2021-06-05 07:48:49 -07:00
Hao Lu
10345010f7 [Static Runtime] Add schema check to aten ops (#59426)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59426

Reviewed By: ajyu

Differential Revision: D28698997

fbshipit-source-id: 232fc60c0321b8e68e4f1b6705233485260c281d
2021-06-04 21:38:45 -07:00
Hao Lu
6627c00e63 [Static Runtime] Fix bug in quantized::linear wrapper (#59407)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59407

Reviewed By: ajyu

Differential Revision: D28881307

fbshipit-source-id: 46c169f783cf05c585871c2e074d52255116b9c3
2021-06-03 19:18:04 -07:00
Richard Barnes
3979cb0656 irange for size_t (#55320)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55320

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D27572577

fbshipit-source-id: 97710fd2bb1303006b05828a0d1343b0b59ccb03
2021-06-03 01:04:13 -07:00
Raghavan Raman
e2467cc43e [NNC] Make splitWithTail transform in-place (#58268)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58268

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D28427228

Pulled By: navahgar

fbshipit-source-id: 270b62c4e83739ad21dd68f375120e56881b394f
2021-05-25 11:31:14 -07:00
Kurt Mohler
fe8e5eb260 Change native functions to take c10::string_view args instead of std::string (#57680)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/53546

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57680

Reviewed By: malfet

Differential Revision: D28511799

Pulled By: ezyang

fbshipit-source-id: 43142f994d048b28b3279ccdb7a28cbaa3190973
2021-05-20 18:15:45 -07:00
Ansha Yu
bf1c936e06 [static runtime] out variant for full_like (#58079)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58079

Support full_like

Test Plan:
`buck test mode/dev caffe2/benchmarks/static_runtime:static_runtime_cpptest -- StaticRuntime.IndividualOps_FullLike`

Test on regenerated local inline_cvr model
```
MKL_NUM_THREADS=1 OMP_NUM_THREADS=1 numactl -m 0 -C 3 ./buck-out/opt/gen/caffe2/caffe2/fb/predictor/ptvsc2_predictor_bench --scripted_model=/data/users/ansha/tmp/adfinder/dec_6x/266377643_shrunk.predictor.disagg.local.regenerated.pt --pt_inputs=/data/users/ansha/tmp/adfinder/dec_6x/local_inputs --pt_enable_static_runtime=1 --pt_cleanup_activations=1 --pt_enable_out_variant=1 --compare_results=1 --iters=5000 --warmup_iters=5000 --num_threads=1 --do_profile=0 --do_benchmark=1 --adsfinder_compatibility=1 --v=1
```

`V0511 10:59:57.187054 1911683 impl.cpp:1229] Switch to out variant for node: %5571 : Tensor = aten::full_like(%blob_for_shape.1, %235, %654, %75, %75, %75, %75)`

Reviewed By: hlu1

Differential Revision: D28361997

fbshipit-source-id: 89c41e37ce23d6008cfe4d80536832ee76d3405e
2021-05-20 16:17:40 -07:00
Hao Lu
1981904c8d [Static Runtime] Check input container type in aten::__getitem__ (#58639)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58639

Fix two tests in `//caffe2/test:static_runtime` that were previously broken.

Reviewed By: ajyu, edvgha

Differential Revision: D28561185

fbshipit-source-id: 3cfb0960666c808523d65da267f70bd51e828313
2021-05-20 12:47:01 -07:00
Hao Lu
4d7abdbdad [Quant] Add out variant for int8 quantized::linear (#58282)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58282

Reviewed By: ajyu

Differential Revision: D28428734

fbshipit-source-id: f25243cdbc220e59659605a3a29e2b161dd7c1f2
2021-05-19 00:24:23 -07:00
Freey0
401d0fe8c5 Port leaky_relu to structured (#57621)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57621

Test Plan: Imported from OSS

Reviewed By: VitalyFedyunin

Differential Revision: D28224706

Pulled By: ezyang

fbshipit-source-id: 168b175d0fd9e0cc3335ea00df4c7967fea77819
2021-05-14 00:49:05 -07:00
Hao Lu
993a35a8cb [Static Runtime] Support clamp.Tensor (#58191)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58191

There are two clamp overloads: clamp.Scalar and clamp.Tensor. SR needs to support both, or have checks in place to avoid runtime errors. Supporting both is not too hard, so here we are.
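For context, the two overloads differ only in whether `min`/`max` are scalars (`clamp.Scalar`) or tensors (`clamp.Tensor`). A small eager-mode illustration, assuming a build where the Tensor overload is available (as it is at this revision):

```python
import torch

x = torch.randn(4)
# clamp.Scalar: min/max are Python numbers
a = torch.clamp(x, min=-0.5, max=0.5)
# clamp.Tensor: min/max are (broadcastable) tensors, clamped element-wise
b = torch.clamp(x, min=torch.full_like(x, -0.5), max=torch.full_like(x, 0.5))
assert torch.allclose(a, b)
```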

Reviewed By: edvgha

Differential Revision: D28371949

fbshipit-source-id: 0ec6b8a0b8c6277e50d8e51e4e7a45aa62211e22
2021-05-13 17:46:59 -07:00
Rong Rong (AI Infra)
002ce5c1df port addmm to structure kernel (#57417)
Summary:
Port addmm to structured kernel

Follow-ups:
- migrate `mm` and `addbmm` to structured
- move the TORCH_CHECKs currently in `addmm_cpu_impl_` and `addmm_out_cuda_impl` to meta

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57417

Reviewed By: bdhirsh

Differential Revision: D28291001

Pulled By: walterddr

fbshipit-source-id: 4eafaa30a465e225fbb4d2a69a36f1e037df9122
2021-05-13 08:33:42 -07:00
liuyuanqiang@bytedance
85d64648d3 Port threshold to structure (#57810)
Summary:
Related https://github.com/pytorch/pytorch/issues/55070
Port threshold and threshold_backward to structured

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57810

Reviewed By: agolynski

Differential Revision: D28382716

Pulled By: ezyang

fbshipit-source-id: 8d0702ad074b52e8512524d9807c93bfe04c51d6
2021-05-12 15:04:55 -07:00
Hao Lu
c3d40fdf56 [ATen] Use expect_contiguous in layer_norm (#58067)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58067

- Use expect_contiguous in layer_norm to avoid unnecessary refcount bumps when the tensors are contiguous
- Clean up some leftovers from the hacky wrappers removal cleanup: use c10::MaybeOwned<Tensor> for bias tensors
- Skip dispatcher for at::empty in the layer_norm impl in Static Runtime

Test Plan: CI

Reviewed By: swolchok

Differential Revision: D28214298

fbshipit-source-id: 73150fa62d5c18f41a2264f8e56bbe5e377ad045
2021-05-11 22:56:32 -07:00
Hao Lu
32acc96f78 [Static Runtime] Fix bug in aten::clone (#58100)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58100

aten::clone has a second arg, memory_format, which was not previously supported.
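For reference, the second argument in question, shown in eager mode (illustrative only; the fix itself is in the Static Runtime wrapper, not this Python API):

```python
import torch

x = torch.randn(1, 3, 8, 8)
y = x.clone(memory_format=torch.channels_last)   # the previously ignored 2nd arg
print(y.is_contiguous(memory_format=torch.channels_last))  # True
print(x.is_contiguous())                          # input layout is unchanged
```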

Reviewed By: ajyu

Differential Revision: D28347171

fbshipit-source-id: e083cc24c3228048429bba3497326415bc3d1f5a
2021-05-11 22:47:25 -07:00
Hao Lu
e9e125475e [Static Runtime] Add schema check to aten::repeat and fb::fast_gather (#58106)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58106

Followup for D28047955 (1f83d8eec2).

Reviewed By: ajyu

Differential Revision: D28369472

fbshipit-source-id: 36aa10082589f4b6f0cc2d79f032fe72a19cda57
2021-05-11 22:07:21 -07:00
Hao Lu
1f83d8eec2 [Static Runtime] Return nullptr if the number of input args doesn't match (#58018)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58018

- Add checks for the number of input args and return nullptr if it doesn't match. This is intended to make Static Runtime more robust, so that an op schema change is less likely to break things. Imagine that a new arg is added to an op, or a new overload is added with this extra arg: without the check, SR would simply ignore the added arg. If that arg has a default value, SR would run the model with the default value and give you wrong results, which can be hard to track down.

Reviewed By: ajyu

Differential Revision: D28047955

fbshipit-source-id: 01067059edd5cfea80c4ee121829f7733b11f601
2021-05-11 16:30:45 -07:00
Edvard Ghazaryan
dd876120f9 Out version for aten::repeat (#57683)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57683

Support aten::repeat for static runtime

Test Plan: buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest

Reviewed By: hlu1

Differential Revision: D27639482

fbshipit-source-id: e6e706cb1d52750eea74f19536245f0484e945e6
2021-05-11 13:21:58 -07:00
Hao Lu
8bbe383877 [Static Runtime] Fix bugs in logit (#57578)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57578

The original impl in SR assumes that eps is a constant, which is true most of the time. However, it could be a graph input as well. This diff fixes that issue; unit tests are added as well.
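A small sketch of the case the fix covers (illustrative): when `eps` arrives as a function argument instead of a literal, it shows up as a graph input rather than a `prim::Constant` in the scripted graph.

```python
import torch

@torch.jit.script
def f(x, eps: float):
    # eps is a graph input here, not a constant baked into the graph
    return torch.logit(x, eps)

print(f(torch.rand(4), 1e-6))
```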

Reviewed By: edvgha

Differential Revision: D28207975

fbshipit-source-id: 9a10dec159f3804e43ef74aaa20c3ec6c79548c9
2021-05-05 23:38:15 -07:00
Mikhail Zolotukhin
9e7814d539 Reland: [StaticRuntime] Use NNC's call_raw API to reduce call overheads. (#57553)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57553

Relanding #57329 (the entire stack) which was reverted because I forgot
to guard a new test with `ifdef LLVM`.

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D28195048

Pulled By: ZolotukhinM

fbshipit-source-id: 50052a2f20f84940b83d1dd1241c8659ff06e014
2021-05-05 09:11:38 -07:00
Hao Lu
5439977352 [Static Runtime] Revamp op schema check (#57521)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57521

When an op is added to static runtime, we manually check the schema (not with the JIT schema check, but with `IValue::isTensor()`/`isInt()` etc.) and make sure it's one we support. If the schema doesn't match, SR throws an exception via TORCH_CHECK, which makes the entire graph invalid for SR.

This diff makes ops with unsupported schemas use the fallback path and go through the dispatcher instead:

```
  if (node->kind() != prim::ListConstruct &&
      node->kind() != prim::TupleConstruct &&
      node->kind() != prim::DictConstruct && node->kind() != prim::ListUnpack) {
    const Operator& op = node->getOperator();
    TORCH_CHECK(op.hasOperation());
    op_ = op.getOperation(node);
    VLOG(1) << "Fallback interpreter for node: " << PrintNode(node);
  }
```

The 2-arg `torch.norm`, which the SR `torch.norm` impl doesn't support (only the 3-, 4-, and 5-arg forms are supported), can now run in static runtime in fallback mode.
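The 2-arg form referred to above looks like this in eager mode (illustrative); with this diff, a graph containing such a call runs that node through the fallback/dispatcher path instead of invalidating the whole graph for Static Runtime.

```python
import torch

x = torch.randn(3, 3)
# 2-arg overload: norm(Tensor self, Scalar p) -- not covered by the SR impl
print(torch.norm(x, 2))
```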

(Note: this ignores all push blocking failures!)

Reviewed By: ajyu

Differential Revision: D27531447

fbshipit-source-id: 0a9c2662ac73ed0393a23cc3a2c7df45fdb00fdd
2021-05-04 02:48:04 -07:00
Mike Ruberry
3315f14280 Revert D28110358: [StaticRuntime] Use NNC's call_raw API to reduce call overheads.
Test Plan: revert-hammer

Differential Revision:
D28110358 (400ca7677c)

Original commit changeset: 94b87130a1ff

fbshipit-source-id: 246c0e54b02443c039105f48c4c419fe281150fc
2021-05-01 15:35:34 -07:00
Mikhail Zolotukhin
400ca7677c [StaticRuntime] Use NNC's call_raw API to reduce call overheads. (#57329)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57329

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D28110358

Pulled By: ZolotukhinM

fbshipit-source-id: 94b87130a1ffdb4acf171ddcea3895e8a75c34ac
2021-04-30 15:26:20 -07:00
Edvard Ghazaryan
e62cdae469 Static Runtime support for aten::matmul (#57291)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57291

aten::matmul support for static runtime

Test Plan: buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest -- IndividualOps_Binary_MatMul

Reviewed By: hlu1

Differential Revision: D28099671

fbshipit-source-id: 784035060c8c24953df47ca4227d2bca5094da22
2021-04-30 10:49:55 -07:00
Edvard Ghazaryan
b3e1802439 Static runtime support for fb::expand_dims (#57282)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57282

Added support for fb::expand_dims for SR.

Test Plan:
buck test caffe2/torch/fb/sparsenn:gpu_test -- test_expand_dims

buck test caffe2/benchmarks/static_runtime/fb:test_fb_operators

Reviewed By: hlu1

Differential Revision: D28043049

fbshipit-source-id: 01f59db7b507f027b220f044d6ff23602adbdb06
2021-04-29 22:40:56 -07:00
Nikita Shulga
4cb534f92e Make PyTorch code-base clang-tidy compliant (#56892)
Summary:
This is an automatic change generated by the following script:
```
#!/usr/bin/env python3
from subprocess import check_output, check_call
import os

def get_compiled_files_list():
    import json
    with open("build/compile_commands.json") as f:
        data = json.load(f)
    files = [os.path.relpath(node['file']) for node in data]
    for idx, fname in enumerate(files):
        if fname.startswith('build/') and fname.endswith('.DEFAULT.cpp'):
            files[idx] = fname[len('build/'):-len('.DEFAULT.cpp')]
    return files

def run_clang_tidy(fname):
    check_call(["python3", "tools/clang_tidy.py", "-c", "build", "-x", fname,"-s"])
    changes = check_output(["git", "ls-files", "-m"])
    if len(changes) == 0:
        return
    check_call(["git", "commit","--all", "-m", f"NOLINT stubs for {fname}"])

def main():
    git_files = check_output(["git", "ls-files"]).decode("ascii").split("\n")
    compiled_files = get_compiled_files_list()
    for idx, fname in enumerate(git_files):
        if fname not in compiled_files:
            continue
        if fname.startswith("caffe2/contrib/aten/"):
            continue
        print(f"[{idx}/{len(git_files)}] Processing {fname}")
        run_clang_tidy(fname)

if __name__ == "__main__":
    main()
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56892

Reviewed By: H-Huang

Differential Revision: D27991944

Pulled By: malfet

fbshipit-source-id: 5415e1eb2c1b34319a4f03024bfaa087007d7179
2021-04-28 14:10:25 -07:00
Ansha Yu
46321cb937 [static runtime] binding for aten::norm_out (#56636)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/56636

Test Plan:
Test it runs on the aug_1x model, which has aten::norm, and verify jit/sr results
```
./buck-out/opt/gen/caffe2/caffe2/fb/predictor/ptvsc2_predictor_bench --scripted_model=/data/users/ansha/tmp/adfinder/aug_1x/210616848_0.predictor.disagg.local.local.pt --pt_inputs=/data/users/ansha/tmp/adfinder/aug_1x/210616848_0.predictor.disagg.input_data.container.pt --iters=500 --warmup_iters=500 --num_threads=1 --pt_enable_static_runtime=1 --pt_cleanup_activations=true --pt_enable_out_variant=1 --pt_optimize_memory=1 --compare_results=1 --do_profile=1 --adsfinder_compatibility=1
```

```
Time per node type:
        1.53159 ms.    35.8619%. fb::sigrid_transforms_torch_bind (1 nodes)
         0.9481 ms.    22.1996%. aten::linear (6 nodes)
       0.704806 ms.    16.5029%. aten::argmin (1 nodes)
       0.252252 ms.    5.90643%. aten::matmul (1 nodes)
       0.140869 ms.    3.29842%. fb::clip_ranges_gather_sigrid_hash_v3 (77 nodes)
       0.100014 ms.    2.34181%. fb::clip_ranges_gather (263 nodes)
      0.0880838 ms.    2.06247%. aten::sub (1 nodes)
      0.0553556 ms.    1.29614%. aten::repeat (1 nodes)
      0.0438464 ms.    1.02665%. aten::norm (1 nodes)
      0.0395956 ms.   0.927124%. fb::batch_box_cox (1 nodes)
       0.035834 ms.   0.839045%. aten::__getitem__ (506 nodes)
      0.0345233 ms.   0.808357%. prim::TupleUnpack (254 nodes)
      0.0316876 ms.   0.741959%. aten::sigmoid (2 nodes)
      0.0293246 ms.   0.686629%. aten::mul (3 nodes)
      0.0287696 ms.   0.673635%. fb::offsets_to_ranges (253 nodes)
      0.0242373 ms.   0.567511%. aten::pow (1 nodes)
      0.0224204 ms.    0.52497%. fb::simple_embedding_bag_sum (3 nodes)
      0.0200074 ms.   0.468469%. fb::casted_batch_one_hot_lengths (1 nodes)
      0.0190264 ms.   0.445499%. fb::concat_add_mul_replacenan_clip (1 nodes)
      0.0167253 ms.    0.39162%. prim::TupleConstruct (1 nodes)
      0.0164962 ms.   0.386255%. aten::sum (3 nodes)
      0.0158986 ms.   0.372262%. prim::DictConstruct (2 nodes)
      0.0109372 ms.   0.256093%. aten::div (1 nodes)
     0.00910563 ms.   0.213207%. prim::ListConstruct (4 nodes)
     0.00876917 ms.   0.205328%. static_runtime::to_copy (8 nodes)
     0.00822567 ms.   0.192603%. fb::sigrid_hash_precompute (1 nodes)
     0.00622559 ms.   0.145771%. aten::contiguous (1 nodes)
     0.00460064 ms.   0.107723%. aten::narrow (4 nodes)
     0.00297164 ms.  0.0695804%. static_runtime::reshape_copy (2 nodes)
     0.00287099 ms.  0.0672237%. aten::logit (1 nodes)
     0.00277557 ms.  0.0649894%. aten::add (1 nodes)
     0.00264978 ms.  0.0620441%. aten::clamp_min (1 nodes)
     0.00215832 ms.  0.0505366%. aten::relu (1 nodes)
     0.00213779 ms.   0.050056%. fb::gather_ranges (4 nodes)
     0.00195846 ms.  0.0458571%. aten::full (1 nodes)
     0.00177333 ms.  0.0415222%. aten::stack (1 nodes)
     0.00147449 ms.   0.034525%. aten::size (3 nodes)
    0.000762524 ms.  0.0178544%. aten::expand_as (1 nodes)
    0.000757406 ms.  0.0177345%. fb::clip_ranges (2 nodes)
    0.000614798 ms.  0.0143954%. fb::lengths_to_offsets (3 nodes)
    0.000407952 ms. 0.00955212%. static_runtime::flatten_copy (1 nodes)
    0.000159918 ms. 0.00374445%. prim::device (1 nodes)
         4.2708 ms. in Total
StaticRuntime setup time: 0.000407 ms
Memory allocation time: 0.0089714 ms
Memory deallocation time: 0.0592135 ms
Outputs deallocation time: 0.0458097 ms
Total memory managed: 947328 bytes
Total number of reused tensors: 28
```

Reviewed By: hlu1

Differential Revision: D27922070

fbshipit-source-id: 538b39b7fff0638fc994b7983bf32d9e9f15d016
2021-04-28 08:44:10 -07:00
Edvard Ghazaryan
cea265b8d8 Support layer_norm for static runtime (#56444)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56444

Added out version for layer_norm

Test Plan:
buck test caffe2/aten:math_kernel_test -- NativeLayerNorm

buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest

Reviewed By: hlu1

Differential Revision: D27873846

fbshipit-source-id: 53ee9fec4ff9a4e78198b031e86b5afd013626dd
2021-04-27 12:28:37 -07:00
Ansha Yu
e909ad2dc4 [static runtime] binding for aten::argmin_out (#56638)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/56638

Test Plan:
```
./buck-out/opt/gen/caffe2/caffe2/fb/predictor/ptvsc2_predictor_bench --scripted_model=/data/users/ansha/tmp/adfinder/aug_1x/210616848_0.predictor.disagg.local.local.pt --pt_inputs=/data/users/ansha/tmp/adfinder/aug_1x/210616848_0.predictor.disagg.input_data.container.pt --iters=500 --warmup_iters=500 --num_threads=1 --pt_enable_static_runtime=1 --pt_cleanup_activations=true --pt_enable_out_variant=1 --pt_optimize_memory=1 --compare_results=1 --do_profile=1 --adsfinder_compatibility=1
```

```
Time per node type:
        1.55901 ms.    35.3486%. fb::sigrid_transforms_torch_bind (1 nodes)
       0.986321 ms.    22.3636%. aten::linear (6 nodes)
       0.722277 ms.    16.3767%. aten::argmin (1 nodes)
       0.256231 ms.    5.80971%. aten::matmul (1 nodes)
       0.149653 ms.    3.39319%. fb::clip_ranges_gather_sigrid_hash_v3 (77 nodes)
       0.105381 ms.    2.38938%. fb::clip_ranges_gather (263 nodes)
      0.0911405 ms.    2.06649%. aten::sub (1 nodes)
      0.0605429 ms.    1.37273%. aten::repeat (1 nodes)
      0.0456569 ms.    1.03521%. aten::norm (1 nodes)
      0.0421855 ms.   0.956501%. fb::batch_box_cox (1 nodes)
      0.0370142 ms.   0.839249%. aten::__getitem__ (506 nodes)
      0.0359091 ms.   0.814193%. prim::TupleUnpack (254 nodes)
      0.0338332 ms.   0.767123%. aten::sigmoid (2 nodes)
      0.0315159 ms.   0.714582%. aten::mul (3 nodes)
      0.0297553 ms.   0.674662%. fb::offsets_to_ranges (253 nodes)
      0.0279913 ms.   0.634666%. fb::simple_embedding_bag_sum (3 nodes)
      0.0233521 ms.   0.529478%. aten::pow (1 nodes)
       0.021296 ms.    0.48286%. fb::concat_add_mul_replacenan_clip (1 nodes)
      0.0208991 ms.   0.473861%. fb::casted_batch_one_hot_lengths (1 nodes)
      0.0183163 ms.   0.415298%. aten::sum (3 nodes)
      0.0164318 ms.   0.372571%. prim::DictConstruct (2 nodes)
      0.0160191 ms.   0.363211%. prim::TupleConstruct (1 nodes)
      0.0126953 ms.   0.287849%. aten::div (1 nodes)
      0.0106084 ms.   0.240532%. static_runtime::to_copy (8 nodes)
      0.0092846 ms.   0.210516%. prim::ListConstruct (4 nodes)
     0.00916175 ms.   0.207731%. fb::sigrid_hash_precompute (1 nodes)
     0.00707015 ms.   0.160307%. aten::contiguous (1 nodes)
     0.00621954 ms.    0.14102%. aten::narrow (4 nodes)
     0.00302307 ms.  0.0685441%. aten::add (1 nodes)
     0.00290759 ms.  0.0659259%. aten::full (1 nodes)
     0.00283369 ms.  0.0642503%. aten::logit (1 nodes)
     0.00239244 ms.  0.0542455%. fb::gather_ranges (4 nodes)
     0.00220181 ms.  0.0499232%. aten::relu (1 nodes)
     0.00211563 ms.  0.0479691%. static_runtime::reshape_copy (2 nodes)
      0.0020059 ms.  0.0454812%. aten::stack (1 nodes)
     0.00186682 ms.  0.0423276%. aten::clamp_min (1 nodes)
     0.00172548 ms.   0.039123%. aten::size (3 nodes)
      0.0011853 ms.  0.0268751%. aten::expand_as (1 nodes)
    0.000881784 ms.  0.0199933%. fb::clip_ranges (2 nodes)
    0.000835602 ms.  0.0189462%. fb::lengths_to_offsets (3 nodes)
    0.000444376 ms.  0.0100757%. static_runtime::flatten_copy (1 nodes)
    0.000197078 ms. 0.00446848%. prim::device (1 nodes)
         4.4104 ms. in Total
StaticRuntime setup time: 0.000702 ms
Memory allocation time: 0.00943333 ms
Memory deallocation time: 0.062704 ms
Outputs deallocation time: 0.0477171 ms
Total memory managed: 831744 bytes
Total number of reused tensors: 31
W0421 14:53:04.841202 929500 PyTorchPredictorContainer.cpp:200] Failed to load metadata file
W0421 14:53:04.841315 929500 PyTorchPredictorContainer.cpp:457] Couldn't find model param config file xl_model_weights/model_param_config
I0421 14:53:04.841341 929500 PyTorchPredictorBenchLib.cpp:137] PyTorch predictor: number of prediction threads 1
I0421 14:53:04.971776 929500 PyTorchPredictorBenchLib.cpp:230] PyTorch run finished. Milliseconds per iter: 130.423. Iters per second: 7.66736
I0421 14:53:05.122830 929500 PtVsBlackBoxPredictorBenchLib.cpp:132] Finished comparing PT static runtime and jit interpreter results
```

Reviewed By: hlu1

Differential Revision: D27923172

fbshipit-source-id: 05cf5497fb6ac39dd3ff24f583607a3dff8cae95
2021-04-26 17:28:42 -07:00
Ansha Yu
0888b8726a [static runtime] binding for aten::clamp_min_out (#56635)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/56635

Test Plan:
```
./buck-out/opt/gen/caffe2/caffe2/fb/predictor/ptvsc2_predictor_bench --scripted_model=/data/users/ansha/tmp/adfinder/aug_1x/210616848_0.predictor.disagg.local.local.pt --pt_inputs=/data/users/ansha/tmp/adfinder/aug_1x/210616848_0.predictor.disagg.input_data.container.pt --iters=500 --warmup_iters=500 --num_threads=1 --pt_enable_static_runtime=1 --pt_cleanup_activations=true --pt_enable_out_variant=1 --pt_optimize_memory=1 --compare_results=1 --do_profile=0 --adsfinder_compatibility=1
```

```
Time per node type:
        1.50885 ms.    36.0064%. fb::sigrid_transforms_torch_bind (1 nodes)
        0.92296 ms.    22.0251%. aten::linear (6 nodes)
       0.695455 ms.     16.596%. aten::argmin (1 nodes)
       0.237931 ms.    5.67787%. aten::matmul (1 nodes)
       0.141634 ms.    3.37989%. fb::clip_ranges_gather_sigrid_hash_v3 (77 nodes)
      0.0925469 ms.     2.2085%. fb::clip_ranges_gather (263 nodes)
      0.0886556 ms.    2.11563%. aten::sub (1 nodes)
      0.0549624 ms.     1.3116%. aten::repeat (1 nodes)
       0.043996 ms.     1.0499%. aten::norm (1 nodes)
      0.0403472 ms.   0.962826%. fb::batch_box_cox (1 nodes)
      0.0371137 ms.   0.885664%. aten::sigmoid (2 nodes)
       0.035054 ms.   0.836512%. aten::__getitem__ (506 nodes)
      0.0338771 ms.   0.808427%. prim::TupleUnpack (254 nodes)
      0.0288516 ms.   0.688502%. aten::mul (3 nodes)
       0.026195 ms.   0.625106%. fb::offsets_to_ranges (253 nodes)
      0.0243627 ms.   0.581381%. aten::pow (1 nodes)
      0.0210347 ms.   0.501962%. fb::simple_embedding_bag_sum (3 nodes)
      0.0195358 ms.   0.466192%. fb::casted_batch_one_hot_lengths (1 nodes)
      0.0193484 ms.   0.461722%. fb::concat_add_mul_replacenan_clip (1 nodes)
      0.0164265 ms.   0.391995%. aten::sum (3 nodes)
      0.0157266 ms.   0.375291%. prim::TupleConstruct (1 nodes)
      0.0156512 ms.   0.373493%. prim::DictConstruct (2 nodes)
      0.0114427 ms.   0.273062%. aten::div (1 nodes)
     0.00884876 ms.   0.211163%. static_runtime::to_copy (8 nodes)
     0.00864496 ms.   0.206299%. prim::ListConstruct (4 nodes)
     0.00803458 ms.   0.191734%. fb::sigrid_hash_precompute (1 nodes)
     0.00619933 ms.   0.147938%. aten::contiguous (1 nodes)
     0.00462827 ms.   0.110447%. aten::narrow (4 nodes)
     0.00293105 ms.  0.0699452%. aten::logit (1 nodes)
     0.00287083 ms.  0.0685082%. static_runtime::reshape_copy (2 nodes)
     0.00250605 ms.  0.0598032%. aten::add (1 nodes)
     0.00217015 ms.  0.0517875%. fb::gather_ranges (4 nodes)
     0.00202655 ms.  0.0483607%. aten::full (1 nodes)
     0.00200812 ms.  0.0479208%. aten::relu (1 nodes)
     0.00175433 ms.  0.0418644%. aten::stack (1 nodes)
     0.00174899 ms.   0.041737%. aten::clamp_min (1 nodes)
     0.00134367 ms.  0.0320646%. aten::size (3 nodes)
    0.000811416 ms.  0.0193633%. fb::clip_ranges (2 nodes)
    0.000801096 ms.   0.019117%. aten::expand_as (1 nodes)
    0.000541452 ms.   0.012921%. fb::lengths_to_offsets (3 nodes)
    0.000477838 ms.  0.0114029%. static_runtime::flatten_copy (1 nodes)
    0.000192906 ms. 0.00460342%. prim::device (1 nodes)
        4.19049 ms. in Total
StaticRuntime setup time: 0.000408 ms
Memory allocation time: 0.00895982 ms
Memory deallocation time: 0.0587527 ms
Outputs deallocation time: 0.0430985 ms
Total memory managed: 947328 bytes
Total number of reused tensors: 28
W0421 14:33:55.610956 836281 PyTorchPredictorContainer.cpp:200] Failed to load metadata file
W0421 14:33:55.611043 836281 PyTorchPredictorContainer.cpp:457] Couldn't find model param config file xl_model_weights/model_param_config
I0421 14:33:55.611063 836281 PyTorchPredictorBenchLib.cpp:137] PyTorch predictor: number of prediction threads 1
I0421 14:33:55.736069 836281 PyTorchPredictorBenchLib.cpp:230] PyTorch run finished. Milliseconds per iter: 124.995. Iters per second: 8.0003
I0421 14:33:55.874794 836281 PtVsBlackBoxPredictorBenchLib.cpp:132] Finished comparing PT static runtime and jit interpreter results
```

Reviewed By: hlu1

Differential Revision: D27922570

fbshipit-source-id: 095aa9bd0c425bc73eb48841653441d5c9e45744
2021-04-26 16:39:12 -07:00
Hao Lu
e810bed63f [Static Runtime] Clean up op implementations (#56841)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56841

- Move arg checks to outside the lambda so we can perform these checks at Static Runtime initialization time
- use `optional` where possible
- support the `to.other` overload, the 5-arg form of `torch.to` (see the sketch below).
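A quick eager-mode illustration of the `to.other` overload mentioned above (illustrative only): the target dtype/device are taken from another tensor rather than passed explicitly.

```python
import torch

x = torch.randn(2, 2)
other = torch.zeros(1, dtype=torch.float16)
y = x.to(other)        # to.other: take dtype/device from `other`
print(y.dtype)         # torch.float16
```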

Test Plan:
```
buck run //caffe2/benchmarks/static_runtime:static_runtime_cpptest
buck test mode/opt-clang //caffe2/caffe2/fb/predictor:ptvsc2_predictor_bench_test -- --run-disabled
```

Reviewed By: edvgha

Differential Revision: D27933176

fbshipit-source-id: 49d6249c8784c44146461e286e7a301596172d7c
2021-04-26 15:37:39 -07:00
Ansha Yu
690c8b434f [static runtime] binding for aten::sub_out (#56656)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/56656

Test Plan:
```
./buck-out/opt/gen/caffe2/caffe2/fb/predictor/ptvsc2_predictor_bench --scripted_model=/data/users/ansha/tmp/adfinder/aug_1x/210616848_0.predictor.disagg.local.local.pt --pt_inputs=/data/users/ansha/tmp/adfinder/aug_1x/210616848_0.predictor.disagg.input_data.container.pt --iters=500 --warmup_iters=500 --num_threads=1 --pt_enable_static_runtime=1 --pt_cleanup_activations=true --pt_enable_out_variant=1 --pt_optimize_memory=1 --compare_results=1 --do_profile=1 --adsfinder_compatibility=1
```
```
Time per node type:
        1.85766 ms.    35.7817%. fb::sigrid_transforms_torch_bind (1 nodes)
         1.1238 ms.    21.6464%. aten::linear (6 nodes)
       0.858116 ms.    16.5288%. aten::argmin (1 nodes)
       0.334183 ms.    6.43694%. aten::matmul (1 nodes)
       0.173697 ms.     3.3457%. fb::clip_ranges_gather_sigrid_hash_v3 (77 nodes)
       0.118827 ms.    2.28881%. fb::clip_ranges_gather (263 nodes)
       0.101348 ms.    1.95215%. aten::sub (1 nodes)
      0.0748209 ms.    1.44118%. aten::repeat (1 nodes)
      0.0582576 ms.    1.12214%. aten::norm (1 nodes)
      0.0474353 ms.   0.913686%. fb::batch_box_cox (1 nodes)
      0.0457588 ms.   0.881393%. aten::__getitem__ (506 nodes)
      0.0435175 ms.   0.838222%. prim::TupleUnpack (254 nodes)
      0.0425416 ms.   0.819425%. aten::sigmoid (2 nodes)
      0.0383822 ms.   0.739308%. fb::offsets_to_ranges (253 nodes)
      0.0330187 ms.   0.635996%. aten::mul (3 nodes)
       0.027534 ms.   0.530352%. fb::simple_embedding_bag_sum (3 nodes)
      0.0274914 ms.   0.529532%. aten::pow (1 nodes)
      0.0236733 ms.   0.455989%. fb::casted_batch_one_hot_lengths (1 nodes)
       0.023348 ms.   0.449723%. fb::concat_add_mul_replacenan_clip (1 nodes)
      0.0193511 ms.   0.372735%. aten::sum (3 nodes)
      0.0188839 ms.   0.363737%. prim::DictConstruct (2 nodes)
      0.0183191 ms.   0.352858%. prim::TupleConstruct (1 nodes)
      0.0119029 ms.    0.22927%. aten::div (1 nodes)
      0.0103263 ms.   0.198902%. static_runtime::to_copy (8 nodes)
     0.00977658 ms.   0.188314%. prim::ListConstruct (4 nodes)
     0.00924042 ms.   0.177986%. fb::sigrid_hash_precompute (1 nodes)
     0.00692162 ms.   0.133322%. aten::contiguous (1 nodes)
     0.00567485 ms.   0.109307%. aten::narrow (4 nodes)
     0.00362285 ms.  0.0697823%. aten::logit (1 nodes)
     0.00329995 ms.  0.0635627%. aten::add (1 nodes)
     0.00285633 ms.  0.0550178%. aten::full (1 nodes)
     0.00268469 ms.  0.0517118%. fb::gather_ranges (4 nodes)
     0.00248577 ms.  0.0478803%. aten::stack (1 nodes)
     0.00241782 ms.  0.0465715%. aten::relu (1 nodes)
     0.00233674 ms.  0.0450096%. aten::clamp_min (1 nodes)
     0.00222238 ms.  0.0428068%. static_runtime::reshape_copy (2 nodes)
     0.00171177 ms.  0.0329716%. aten::size (3 nodes)
     0.00120008 ms.  0.0231155%. aten::expand_as (1 nodes)
     0.00112628 ms.  0.0216942%. fb::clip_ranges (2 nodes)
     0.00103193 ms.  0.0198768%. fb::lengths_to_offsets (3 nodes)
    0.000598624 ms.  0.0115305%. static_runtime::flatten_copy (1 nodes)
    0.000236196 ms. 0.00454954%. prim::device (1 nodes)
        5.19164 ms. in Total
StaticRuntime setup time: 0.000868 ms
Memory allocation time: 0.0109619 ms
Memory deallocation time: 0.071791 ms
Outputs deallocation time: 0.0560187 ms
Total memory managed: 1232320 bytes
Total number of reused tensors: 32
W0421 17:40:52.053653 1746499 PyTorchPredictorContainer.cpp:200] Failed to load metadata file
W0421 17:40:52.053757 1746499 PyTorchPredictorContainer.cpp:457] Couldn't find model param config file xl_model_weights/model_param_config
I0421 17:40:52.053779 1746499 PyTorchPredictorBenchLib.cpp:137] PyTorch predictor: number of prediction threads 1
I0421 17:40:52.185776 1746499 PyTorchPredictorBenchLib.cpp:230] PyTorch run finished. Milliseconds per iter: 131.985. Iters per second: 7.57661
I0421 17:40:52.337853 1746499 PtVsBlackBoxPredictorBenchLib.cpp:132] Finished comparing PT static runtime and jit interpreter results
```

Reviewed By: hlu1

Differential Revision: D27929253

fbshipit-source-id: 5a7984ba3ce2d6d4bce0a0ab6c5e09e8c037b44e
2021-04-22 08:40:35 -07:00
Ansha Yu
81b59211d4 [static runtime] binding for aten::div_out (#56653)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/56653

Test Plan:
```
./buck-out/opt/gen/caffe2/caffe2/fb/predictor/ptvsc2_predictor_bench --scripted_model=/data/users/ansha/tmp/adfinder/aug_1x/210616848_0.predictor.disagg.local.local.pt --pt_inputs=/data/users/ansha/tmp/adfinder/aug_1x/210616848_0.predictor.disagg.input_data.container.pt --iters=500 --warmup_iters=500 --num_threads=1 --pt_enable_static_runtime=1 --pt_cleanup_activations=true --pt_enable_out_variant=1 --pt_optimize_memory=1 --compare_results=1 --do_profile=1 --adsfinder_compatibility=1
```

```
Time per node type:
        1.48563 ms.    35.9861%. fb::sigrid_transforms_torch_bind (1 nodes)
        0.92385 ms.    22.3783%. aten::linear (6 nodes)
       0.681066 ms.    16.4974%. aten::argmin (1 nodes)
       0.239311 ms.    5.79679%. aten::matmul (1 nodes)
       0.140157 ms.    3.39501%. fb::clip_ranges_gather_sigrid_hash_v3 (77 nodes)
      0.0951568 ms.    2.30497%. fb::clip_ranges_gather (263 nodes)
      0.0835801 ms.    2.02455%. aten::sub (1 nodes)
       0.054081 ms.       1.31%. aten::repeat (1 nodes)
      0.0424465 ms.    1.02818%. aten::norm (1 nodes)
      0.0389049 ms.   0.942389%. fb::batch_box_cox (1 nodes)
      0.0346992 ms.   0.840514%. aten::__getitem__ (506 nodes)
      0.0341335 ms.    0.82681%. prim::TupleUnpack (254 nodes)
      0.0306839 ms.   0.743252%. aten::sigmoid (2 nodes)
      0.0280489 ms.   0.679426%. aten::mul (3 nodes)
      0.0265321 ms.   0.642684%. fb::offsets_to_ranges (253 nodes)
      0.0207622 ms.    0.50292%. aten::pow (1 nodes)
      0.0202067 ms.   0.489465%. fb::simple_embedding_bag_sum (3 nodes)
      0.0195497 ms.    0.47355%. fb::casted_batch_one_hot_lengths (1 nodes)
      0.0184351 ms.   0.446551%. fb::concat_add_mul_replacenan_clip (1 nodes)
       0.016382 ms.    0.39682%. aten::sum (3 nodes)
      0.0158651 ms.   0.384299%. prim::TupleConstruct (1 nodes)
      0.0150918 ms.   0.365567%. prim::DictConstruct (2 nodes)
     0.00858005 ms.   0.207833%. aten::div (1 nodes)
     0.00810684 ms.   0.196371%. fb::sigrid_hash_precompute (1 nodes)
     0.00796325 ms.   0.192893%. static_runtime::to_copy (8 nodes)
     0.00782038 ms.   0.189432%. prim::ListConstruct (4 nodes)
      0.0057504 ms.   0.139291%. aten::contiguous (1 nodes)
      0.0044688 ms.   0.108247%. aten::narrow (4 nodes)
     0.00284054 ms.   0.068806%. aten::logit (1 nodes)
     0.00265049 ms.  0.0642024%. aten::add (1 nodes)
     0.00216242 ms.    0.05238%. aten::full (1 nodes)
     0.00207732 ms.  0.0503187%. aten::relu (1 nodes)
     0.00198412 ms.   0.048061%. fb::gather_ranges (4 nodes)
     0.00176954 ms.  0.0428632%. aten::stack (1 nodes)
     0.00175913 ms.  0.0426112%. static_runtime::reshape_copy (2 nodes)
      0.0016996 ms.  0.0411692%. aten::clamp_min (1 nodes)
     0.00128528 ms.  0.0311331%. aten::size (3 nodes)
    0.000849156 ms.   0.020569%. aten::expand_as (1 nodes)
    0.000757672 ms.   0.018353%. fb::clip_ranges (2 nodes)
    0.000596224 ms.  0.0144423%. fb::lengths_to_offsets (3 nodes)
    0.000442632 ms.  0.0107218%. static_runtime::flatten_copy (1 nodes)
    0.000196158 ms. 0.00475151%. prim::device (1 nodes)
        4.12833 ms. in Total
StaticRuntime setup time: 0.000451 ms
Memory allocation time: 0.0089336 ms
Memory deallocation time: 0.0578358 ms
Outputs deallocation time: 0.0431742 ms
Total memory managed: 947328 bytes
Total number of reused tensors: 31
W0421 16:56:34.220682 1522800 PyTorchPredictorContainer.cpp:200] Failed to load metadata file
W0421 16:56:34.220772 1522800 PyTorchPredictorContainer.cpp:457] Couldn't find model param config file xl_model_weights/model_param_config
I0421 16:56:34.220791 1522800 PyTorchPredictorBenchLib.cpp:137] PyTorch predictor: number of prediction threads 1
I0421 16:56:34.366667 1522800 PyTorchPredictorBenchLib.cpp:230] PyTorch run finished. Milliseconds per iter: 145.863. Iters per second: 6.85573
I0421 16:56:34.514202 1522800 PtVsBlackBoxPredictorBenchLib.cpp:132] Finished comparing PT static runtime and jit interpreter results
```

Reviewed By: hlu1

Differential Revision: D27927731

fbshipit-source-id: 595883a31ba0cadf6449799d47bf2294a1d05b41
2021-04-22 01:38:24 -07:00
Ansha Yu
7ae45403a1 [static runtime] support aten::__getitem__ natively (#55310)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55310

Test Plan:
Run on the dper generated local/local_ro model
```
./buck-out/opt/gen/caffe2/caffe2/fb/predictor/ptvsc2_predictor_bench --scripted_model=/data/users/ansha/tmp/adfinder/aug_1x/210616848_0.predictor.disagg.local.local_ro.pt --pt_inputs=/data/users/ansha/tmp/adfinder/aug_1x/210616848_0.predictor.disagg.input_data.container.pt --iters=1000 --warmup_iters=10000 --num_threads=1 --pt_enable_static_runtime=1 --pt_cleanup_activations=true --pt_enable_out_variant=1 --pt_optimize_memory=1 --compare_results=0 --do_profile=0 --adsfinder_compatibility=1
```

Reviewed By: hlu1

Differential Revision: D27569662

fbshipit-source-id: df68c2fdd95e39a30aec35ddbaf1f5df0bc3a3da
2021-04-19 23:08:19 -07:00
Edward Yang
f17c9ea2ed Port all unary float functions to structured (#56082)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56082

The native_functions.yaml changes were done by codemod using the
following script:

```
import ruamel.yaml
from ruamel.yaml.tokens import CommentToken
from ruamel.yaml.error import CommentMark
from tools.codegen.model import *  # noqa: F403

with open("aten/src/ATen/native/native_functions.yaml", "r") as f:
    contents = f.read()

yaml = ruamel.yaml.YAML()
yaml.preserve_quotes = True
yaml.width = 1000
yaml.boolean_representation = ['False', 'True']
r = yaml.load(contents)

convert = '''\
acos
acosh
asin
asinh
atan
atanh
cos
cosh
digamma
erf
erfc
erfinv
exp
expm1
exp2
lgamma
log
log10
log1p
log2
reciprocal
sigmoid
sin
sinc
sinh
special_entr
sqrt
tan
tanh'''.split()

for e in r:
    f = NativeFunction.from_yaml(e, Location("", 0))
    if f.structured or f.structured_delegate is not None:
        continue
    n = f.func.name.name.base
    if n not in convert:
        continue
    # mutate e to make changes
    if f.func.kind() == SchemaKind.out:
        e.insert(1, 'structured', True)
        e.insert(2, 'structured_inherits', 'TensorIteratorBase')
    else:
        # TODO: The .out overload assumption is not sound in general
        e.insert(1, 'structured_delegate', f'{n}.out')

        e['dispatch'].pop('CPU', None)
        e['dispatch'].pop('CUDA', None)
        e['dispatch'].pop('CPU, CUDA', None)
        e['dispatch'].pop('CompositeExplicitAutograd', None)

        *_, last_k = e.keys()
        needs_fixup = False

        if not e['dispatch']:
            if last_k == 'dispatch':
                needs_fixup = True
            del e['dispatch']

        # Manually fix up newlines at the end, because ruamel
        # made some bad life choices about where to associate trailing
        # whitespace for nested dicts; see
        # https://stackoverflow.com/questions/42172399/modifying-yaml-using-ruamel-yaml-adds-extra-new-lines
        if needs_fixup:
            *_, last_k = e.keys()
            # post_key, pre_key, post_value, pre_value
            e.ca.items[last_k] = [None, None, CommentToken('\n\n', CommentMark(0), None), None]

with open("aten/src/ATen/native/native_functions.yaml.new", "w") as f:
    yaml.dump(r, f)
```

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: bhosmer

Differential Revision: D27777769

Pulled By: ezyang

fbshipit-source-id: 1ecbac7cb3e0093167bb61c7d2b1ecb95b8ae17c
2021-04-15 16:06:42 -07:00
CodemodService FBSourceClangFormatLinterBot
2f895f790a [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D27789747

fbshipit-source-id: ef4882e92d7755669083573c43ae6c5088bf01ab
2021-04-15 04:27:27 -07:00