Commit Graph

140 Commits

Author SHA1 Message Date
Mike Iovine
1385f9fb12 [JIT] Add variadic stack op (#63578)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63578

Added a new op `prim::VarStack` and a pass that transforms instances of `aten::stack(list, dim)` into `prim::VarStack(list[0], ..., list[n], dim)`. Also provided a JIT interpreter implementation.

Most of the implementation/tests are the same as `prim::VarConcat`.
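
For illustration, a minimal C++ sketch of what a variadic stack boils down to at the interpreter level (the helper name and `main` are made up; this is not the actual `prim::VarStack` registration). Instead of first materializing a `List[Tensor]`, the tensors plus the trailing `dim` argument are forwarded straight to `at::stack`:
```cpp
#include <ATen/ATen.h>
#include <vector>

// Hypothetical helper: inputs arrive as individual arguments plus a trailing
// dim, so no list node has to be constructed before calling the stack kernel.
at::Tensor var_stack_sketch(std::vector<at::Tensor> inputs, int64_t dim) {
  return at::stack(inputs, dim);
}

int main() {
  auto a = at::ones({2, 3});
  auto b = at::zeros({2, 3});
  auto out = var_stack_sketch({a, b}, /*dim=*/0);
  return out.size(0) == 2 ? 0 : 1;  // stacked along a new leading dimension
}
```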

Test Plan: `buck test caffe2/test/cpp/jit:jit -- TestStackOpt`

Reviewed By: navahgar

Differential Revision: D30426232

fbshipit-source-id: 9829a7db6e0a5038c9b7528c43c25b0c221aa2ce
2021-08-24 08:20:54 -07:00
Mikhail Zolotukhin
f0d274294d [TensorExpr] Nuke KernelArena and KernelScope. (#63587)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63587

Now that there are no classes using KernelArena for memory management, we
can remove it.

Differential Revision: D30429115

Test Plan: Imported from OSS

Reviewed By: navahgar

Pulled By: ZolotukhinM

fbshipit-source-id: 375f6f9294d27790645eeb7cb5a8e87047a57544
2021-08-24 00:32:16 -07:00
Mikhail Zolotukhin
62d02f2b57 [TensorExpr] Make 'Tensor' a value type. (#63586)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63586

This is another commit in the transition away from KernelArena memory management.
Tensor is essentially just a pair of <BufPtr, StmtPtr>, so we don't need to
dynamically allocate it at all: it's cheap to pass by value, and that's what
we're switching to in this commit.

After this change nothing uses KernelScope/KernelArena and they can be
safely removed.
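
For illustration, a hedged, self-contained sketch (hypothetical names, not the real NNC classes) of why a pair-of-pointers type is cheap to pass by value:
```cpp
#include <memory>
#include <utility>

// Hypothetical stand-ins for the real IR node types.
struct BufSketch {};
struct StmtSketch {};

// A value type holding two smart-pointer handles: copying it copies two
// pointers, not the IR they point to, so no arena has to own the object.
class TensorSketch {
 public:
  TensorSketch(std::shared_ptr<BufSketch> buf, std::shared_ptr<StmtSketch> stmt)
      : buf_(std::move(buf)), stmt_(std::move(stmt)) {}

  const std::shared_ptr<BufSketch>& buf() const { return buf_; }
  const std::shared_ptr<StmtSketch>& stmt() const { return stmt_; }

 private:
  std::shared_ptr<BufSketch> buf_;    // plays the role of BufPtr
  std::shared_ptr<StmtSketch> stmt_;  // plays the role of StmtPtr
};

// Passing by value is fine: only the two handles get copied.
TensorSketch pass_through(TensorSketch t) { return t; }
```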

Differential Revision: D30429114

Test Plan: Imported from OSS

Reviewed By: navahgar

Pulled By: ZolotukhinM

fbshipit-source-id: f90b859cfe863692b7beffbe9bd0e4143df1e819
2021-08-24 00:32:13 -07:00
Don Jang
84890aae35 [Static Runtime] Add an out variant op for aten::abs (#63675)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63675

This change adds an out variant implementation for `aten::abs`.
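
For illustration, a hedged sketch of the out-variant pattern using the standard `at::abs_out` call (the wrapper name and `main` are made up; this is not the actual Static Runtime registration):
```cpp
#include <ATen/ATen.h>

// Hypothetical wrapper: write into a preallocated output buffer instead of
// allocating a fresh tensor on every call, which is what lets Static Runtime
// reuse output memory.
at::Tensor& abs_out_sketch(const at::Tensor& input, at::Tensor& out) {
  return at::abs_out(out, input);
}

int main() {
  auto in = at::randn({4});
  auto out = at::empty_like(in);  // allocated once, reused on later calls
  abs_out_sketch(in, out);
  return at::allclose(out, in.abs()) ? 0 : 1;
}
```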

Test Plan:
- Observed `V0820 14:14:08.880342 101788 impl.cpp:1394] Switch to out variant for node: %3 : Tensor = aten::abs(%a.1)`

- Perf impact: TBD

Reviewed By: hlu1

Differential Revision: D30461317

fbshipit-source-id: 0c0230bd40afe463ae1ccb222c2a1207ebcf4191
2021-08-23 16:25:10 -07:00
Hao Lu
b2a601ffe5 [Static Runtime] Implement out variant for fb::quantized_linear (#63635)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63635

Reviewed By: ajyu

Differential Revision: D30446234

fbshipit-source-id: 1ef014186ff725930a97d0159626f9233ee74030
2021-08-20 21:42:22 -07:00
Mikhail Zolotukhin
1dc2b52764 [TensorExpr] Add a wrapper for all expr and stmt pointers. (#63195)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63195

This helps us to later switch from using KernelArena with raw pointers
to shared pointers without having to change all our source files at
once.

The changes are mechanical and should not affect any functionality.

With this PR, we're changing the following:
 * `Add*` --> `AddPtr`
 * `new Add(...)` --> `alloc<Add>(...)`
 * `dynamic_cast<Add*>` --> `to<Add>`
 * `static_cast<Add*>` --> `static_to<Add>`

Due to some complications with args forwarding, some places became more
verbose, e.g.:
 * `new Block({})` --> `new Block(std::vector<ExprPtr>())`
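
The pattern can be illustrated with a self-contained sketch (hypothetical `Expr`/`Add` types, not the TensorExpr IR itself): the aliases and helpers below start out as raw pointers, so the later switch to shared pointers only has to touch these definitions rather than every call site.
```cpp
#include <memory>
#include <utility>

// Hypothetical IR node types used only for this sketch.
struct Expr { virtual ~Expr() = default; };
struct Add : Expr {};

// Today these are raw pointers; later they can become std::shared_ptr without
// touching the call sites that use AddPtr / alloc<Add>() / to<Add>().
using ExprPtr = Expr*;
using AddPtr = Add*;

template <class T, class... Args>
T* alloc(Args&&... args) {   // replaces `new T(...)`
  return new T(std::forward<Args>(args)...);
}

template <class T>
T* to(ExprPtr p) {           // replaces `dynamic_cast<T*>`
  return dynamic_cast<T*>(p);
}

template <class T>
T* static_to(ExprPtr p) {    // replaces `static_cast<T*>`
  return static_cast<T*>(p);
}

int main() {
  ExprPtr e = alloc<Add>();
  AddPtr a = to<Add>(e);      // non-null because e really is an Add
  const bool ok = (a != nullptr);
  delete e;
  return ok ? 0 : 1;
}
```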

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D30292779

Pulled By: ZolotukhinM

fbshipit-source-id: 150301c7d2df56b608b035827b6a9a87f5e2d9e9
2021-08-17 13:44:45 -07:00
Yukio Siraichi
32b6104f37 Port norm kernel to structured kernels. (#62711)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62711

Tracking issue: #55070

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D30109866

Pulled By: ezyang

fbshipit-source-id: 894c9496894d059c7690a174b75bbd4db7ed6016
2021-08-13 08:27:48 -07:00
Raghavan Raman
8b54b14f92 [Static Runtime] Added a cache for NNC generated code across different calls to the same ops (#62921)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62921

Added a cache for NNC generated code across different calls to the same ops.

Before this diff:
```
ProcessedNode time 13402.9 ms
Static Module initialization took 30964.8 ms
```

After this diff:
```
ProcessedNode time 85.4195 ms
Static Module initialization took 4348.42 ms
```

There is one global cache for all the ops. It is guarded by a reader-writer lock, which is necessary because multiple threads could be loading different models in parallel. Note that this locking does not guarantee that exactly one piece of code is generated for each op: more than one thread may generate code for the same op simultaneously, and each of them will update the cache in some order. But that should be a small number, bounded by the number of threads, and there is no correctness issue, since the generated code is always the same; the code generated by the last thread is retained in the cache and reused later while running the model.
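
For illustration, a hedged sketch of the caching scheme described above (the names are hypothetical; this is not the code in this diff):
```cpp
#include <memory>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <unordered_map>

struct CompiledKernel {};  // stand-in for NNC-generated code

// One global map from an op key to its generated code, guarded by a
// reader-writer lock so concurrent model loads can share entries. Two threads
// may still compile the same op at the same time; the last writer wins, which
// is harmless because the generated code is identical.
class CodeCache {
 public:
  std::shared_ptr<CompiledKernel> lookup(const std::string& key) const {
    std::shared_lock<std::shared_mutex> lock(mutex_);  // many readers at once
    auto it = cache_.find(key);
    return it == cache_.end() ? nullptr : it->second;
  }

  void insert(const std::string& key, std::shared_ptr<CompiledKernel> kernel) {
    std::unique_lock<std::shared_mutex> lock(mutex_);  // exclusive writer
    cache_[key] = std::move(kernel);                   // last writer wins
  }

 private:
  mutable std::shared_mutex mutex_;
  std::unordered_map<std::string, std::shared_ptr<CompiledKernel>> cache_;
};
```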

Test Plan: Tested inline_cvr model

Reviewed By: hlu1

Differential Revision: D30104017

fbshipit-source-id: 32e9af43d7e724ed54b661dfe58a73a14e443ff7
2021-08-09 09:30:07 -07:00
Yukio Siraichi
4c4c5b14e4 Port sum.dim_IntList kernel to structured kernels. (#61642)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61642

Tracking issue: #55070

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D29783865

Pulled By: ezyang

fbshipit-source-id: 375d4cd5f915812108367601a610a428762e606d
2021-08-09 08:46:16 -07:00
Hao Lu
a27a0b1ef5 [SR] Disable NNC temporarily (#62746)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62746

Disable NNC temporarily until a code cache is implemented to reduce the compilation time.

Reviewed By: ajyu

Differential Revision: D30080326

fbshipit-source-id: ef8bb3ac3a6947614f4a03a3d52774b6933d3ea8
2021-08-04 17:33:07 -07:00
Raghavan Raman
7b6d569a2b [jit] Renamed prim::Concat as prim::VarConcat (#61983)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61983

Trial #2. The previous PR (https://github.com/pytorch/pytorch/pull/61498) was reverted because it caused a failure in `pytorch_linux_backward_compatibility_check_test`. That is fixed now by adding an entry to the exception list in `check_backward_compatibility.py`.

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D29828830

Pulled By: navahgar

fbshipit-source-id: 947a7b1622ff6e3e575c051b8f34a789e105bcee
2021-07-29 10:28:59 -07:00
Don Jang
68efa186cc [static runtime] Implement aten::full (#62227)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62227

Test Plan: Added `StaticRuntime.IndividualOps_Full` to cover the newly added code path.

Reviewed By: hlu1

Differential Revision: D29923649

fbshipit-source-id: 722950137c35ae325590a670b97f03b395e8eac3
2021-07-28 09:50:27 -07:00
Mike Iovine
e1bee3eb30 [Static Runtime] Add missing unit tests for static runtime ops (#62238)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62238

Added tests for the following ops:

* `aten::mul`
* `aten::nan_to_num`
* `aten::stack`
* `aten::relu`
* `aten::tanh`

Reviewed By: hlu1

Differential Revision: D29914217

fbshipit-source-id: 6a6c39629310e7131127e24fdce7253ccdf80340
2021-07-27 14:12:21 -07:00
Mike Iovine
79eb8bb299 [Static Runtime] Enforce proper output dtype for many ops (re-land) (#62267)
Summary:
Re-land of D29935444.
We previously had lots of ops with implementations like this:
```
if (p_node->Output(0).isNone()) {
  p_node->Output(0) = create_empty_like(input_0);
}
...
auto& out = p_node->Output(0);
some_func_out(inputs, out);
```
This would make the output have the correct shape. But it would
also take the dtype of `input_0`, which is not always correct.

This change transforms these blocks to:
```
if (p_node->Output(0).isNone()) {
  p_node->Output(0) = some_func(inputs);
} else {
  ...
  auto& out = p_node->Output(0);
  some_func_out(inputs, out);
}
```
This gives the output the correct shape and dtype.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62267

Reviewed By: ejguan

Differential Revision: D29937253

Pulled By: malfet

fbshipit-source-id: d91ca5d5703490d7d349a1de2ad3bb09b0c33967
2021-07-27 08:54:09 -07:00
Erjia Guan
a3be2ecc3a Revert D29887367: [Static Runtime] Enforce proper output dtype for many ops
Test Plan: revert-hammer

Differential Revision: D29887367 (f4136c5efc)

Original commit changeset: cef04bfa52ec

fbshipit-source-id: 32e89f2b6381930559dd746b535904c3e90fd52b
2021-07-27 07:29:09 -07:00
Mike Iovine
f4136c5efc [Static Runtime] Enforce proper output dtype for many ops
Summary:
We previously had lots of ops with implementations like this:
```
if (p_node->Output(0).isNone()) {
  p_node->Output(0) = create_empty_like(input_0);
}
...
auto& out = p_node->Output(0);
some_func_out(inputs, out);
```
This would make the output have the correct shape. But it would
also take the dtype of `input_0`, which is not always correct.

This change transforms these blocks to:
```
if (p_node->Output(0).isNone()) {
  p_node->Output(0) = some_func(inputs);
} else {
  ...
  auto& out = p_node->Output(0);
  some_func_out(inputs, out);
}
```
This gives the output the correct shape and dtype.

Test Plan: `buck test //caffe2/benchmarks/static_runtime:static_runtime_cpptest`

Reviewed By: hlu1

Differential Revision: D29887367

fbshipit-source-id: cef04bfa52ec082ad3a9a32aa27c44e275c6b24c
2021-07-26 13:27:02 -07:00
Hao Lu
78f7d8ccfa [Static Runtime] Remove wrappers for aten::cat (#62067)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62067

The wrapper for aten::cat is no longer needed after the variadic cat change in D29565344 (ae58a4c45d).
Also added a simple test covering dynamic shapes, i.e., the input tensors in args2 are larger than those in args1.

Reviewed By: navahgar, mikeiovine

Differential Revision: D29864600

fbshipit-source-id: 44a712c2e776815c09e0bf5631412149b81274b2
2021-07-23 20:33:41 -07:00
Meghan Lele
1d2ea76afb clamp: port to structured kernel (#61361)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61361

This PR ports the `clamp` kernel to the structured format. In addition, it introduces `OptionalScalarRef` as a replacement for `c10::optional<Scalar>&`. The latter, although it is a reference type, can still involve copying the contained `Scalar` (e.g. if the actual parameter is a `Scalar`, or if a `c10::optional<Scalar>` is constructed just to call a kernel). `OptionalScalarRef` contains only a `const Scalar&` and stores the flag indicating whether the instance contains a value inside the `Scalar` itself, using a new tag.

For more information, see #55070.
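
For illustration, a simplified, hedged sketch of the optional-reference idea (this is not the real `OptionalScalarRef`, which encodes the empty tag inside the `Scalar` itself rather than using a separate pointer; the names below are made up):
```cpp
#include <cassert>

// Stand-in for c10::Scalar, used only to keep the sketch self-contained.
struct ScalarLike { double value; };

// Carries only a reference to an existing value, so passing it never copies
// the payload the way building a temporary c10::optional<Scalar> can.
class OptionalScalarRefSketch {
 public:
  OptionalScalarRefSketch() : ptr_(nullptr) {}  // "empty"
  explicit OptionalScalarRefSketch(const ScalarLike& s) : ptr_(&s) {}

  bool has_value() const { return ptr_ != nullptr; }
  const ScalarLike& get() const { assert(ptr_ != nullptr); return *ptr_; }

 private:
  const ScalarLike* ptr_;  // the real type stores the "empty" tag in the Scalar
};

// Example use: clamp a value from below only if a minimum was supplied.
double clamp_min_or_passthrough(double x, OptionalScalarRefSketch min) {
  return (min.has_value() && x < min.get().value) ? min.get().value : x;
}
```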

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D29821533

Pulled By: SplitInfinity

fbshipit-source-id: 88d55df5a4b2c14b68a57e4905d90eea1b088d99
2021-07-23 02:02:07 -07:00
Nikita Shulga
a9b0a921d5 Disable avoid-non-const-global-variables lint check (#62008)
Summary:
The GoogleTest `TEST` macro is non-compliant with this check, as is `DEFINE_DISPATCH`.

All changes except the ones to `.clang-tidy` were generated using the following script:
```
for i in `find . -type f -iname "*.c*" -or -iname "*.h"|xargs grep cppcoreguidelines-avoid-non-const-global-variables|cut -f1 -d:|sort|uniq`;  do sed -i "/\/\/ NOLINTNEXTLINE(cppcoreguidelines-avoid-non-const-global-variables)/d" $i; done
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62008

Reviewed By: driazati, r-barnes

Differential Revision: D29838584

Pulled By: malfet

fbshipit-source-id: 1b2f8602c945bd4ce50a9bfdd204755556e31d13
2021-07-22 18:04:40 -07:00
Raghavan Raman
ae58a4c45d [Static Runtime] Added a variadic cat operator (#61302)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61302

Test Plan: Imported from OSS

Reviewed By: hlu1

Differential Revision: D29565344

Pulled By: navahgar

fbshipit-source-id: 96f5f4546ec0e61eb7f87e016e026e7b62576248
2021-07-21 15:58:20 -07:00
Mike Iovine
28150fd0c8 [static_runtime] Implement aten::linear (#61595)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61595

Add an out variant wrapper for `aten::linear` to the static runtime.

Test Plan: `buck test //caffe2/benchmarks/static_runtime:static_runtime_cpptest`

Reviewed By: hlu1

Differential Revision: D29684236

fbshipit-source-id: 94df6d7267b3f269b2cadf065f207648777147df
2021-07-16 08:55:43 -07:00
Hao Lu
a07b08136f [Static Runtime] Check unsupported op when enabling static runtime (#61613)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61613

Reviewed By: ajyu, movefast1990

Differential Revision: D29663466

fbshipit-source-id: d819903b7227f534c0a4fffa5eeea2b5c0c04750
2021-07-14 02:13:51 -07:00
Hao Lu
ccd0977060 [Static Runtime] Support prim::GetAttr/SetAttr (#61505)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61505

The handling of `self` in static runtime was previously incorrect. This diff fixes that issue, since `self` is essential to prim::GetAttr/SetAttr: most of the time we're getting and setting attributes on `self`, the TorchScript module.

Reviewed By: ajyu

Differential Revision: D29350173

fbshipit-source-id: 6e62add4cda517ef8cd6c315d4cb0595e7d531fb
2021-07-10 14:06:06 -07:00
Don Jang
a74516d699 [static runtime] implement aten::log (#61393)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61393

Test Plan:
Added `StaticRuntime.IndividualOps_Log`

```
...
[ RUN      ] StaticRuntime.IndividualOps_Log
V0701 12:10:50.829100 3708165 impl.cpp:455] StaticModuleOptions: cleanup_activations 1, enable_out_variant 1, optimize_memory1, optimize_graph_output_memory0
V0701 12:10:50.888468 3708165 impl.cpp:1279] Switch to out variant for node: %3 : Tensor = aten::log(%inp.1)
V0701 12:10:50.889098 3708165 impl.cpp:1279] Switch to out variant for node: %a.1 : Tensor = aten::clone(%3, %2)
```

Reviewed By: hlu1

Differential Revision: D29511622

fbshipit-source-id: 819fd7d90c084609a060efeadb3015e35acac517
2021-07-08 18:25:35 -07:00
Don Jang
c2b0af2560 [static runtime] Implement aten::sign (#61154)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61154

Test Plan:
Added `StaticRuntime.IndividualOps_Sign`

```
[djang@devvm861.prn0 ~/local/fbsource/fbcode/caffe2] buck run //caffe2/benchmarks/static_runtime:static_runtime_cpptest -- -v 1
...
[ RUN      ] StaticRuntime.IndividualOps_Sign
V0701 12:05:31.836099 3679080 impl.cpp:455] StaticModuleOptions: cleanup_activations 1, enable_out_variant 1, optimize_memory1, optimize_graph_output_memory0
V0701 12:05:31.898192 3679080 impl.cpp:1279] Switch to out variant for node: %3 : Tensor = aten::sign(%input.1)
V0701 12:05:31.898849 3679080 impl.cpp:1279] Switch to out variant for node: %4 : Tensor = aten::clone(%3, %2)
```

Reviewed By: hlu1

Differential Revision: D29518603

fbshipit-source-id: e47b96d037fea639c41052f3849c82bbfa5f482a
2021-07-07 12:29:25 -07:00
Mike Guo
6ecc1a4c4f Make pytorch clang-tidy clean (#60649)
Summary:
This PR suppresses clang-tidy warnings in the codebase (for now) so that we can re-enable clang-tidy checks on master.

I ran this script to add the `NOLINTNEXTLINE` comments (on a devserver):
```bash
python3 setup.py develop

# Uses same script that's run on CI and adds the -j (parallel), -s (add comments), -k (continue if diagnostic errors are found) options
python3 tools/clang_tidy.py \
  -j \
  -s \
  -k \
  -v \
  --paths torch/csrc/ \
  -g"-torch/csrc/jit/passes/onnx/helper.cpp" \
  -g"-torch/csrc/jit/passes/onnx/shape_type_inference.cpp" \
  -g"-torch/csrc/jit/serialization/onnx.cpp" \
  -g"-torch/csrc/jit/serialization/export.cpp" \
  -g"-torch/csrc/jit/serialization/import.cpp" \
  -g"-torch/csrc/jit/serialization/import_legacy.cpp" \
  -g"-torch/csrc/onnx/init.cpp" \
  -g"-torch/csrc/cuda/nccl.*" \
  -g"-torch/csrc/cuda/python_nccl.cpp" \
  -g"-torch/csrc/autograd/FunctionsManual.cpp" \
  -g"-torch/csrc/generic/*.cpp" \
  -g"-torch/csrc/jit/codegen/cuda/runtime/*" \
  -g"-torch/csrc/deploy/interpreter/interpreter.cpp" \
  -g"-torch/csrc/deploy/interpreter/interpreter.h" \
  -g"-torch/csrc/deploy/interpreter/interpreter_impl.h" \
  -g"-torch/csrc/deploy/interpreter/test_main.cpp"
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60649

Test Plan: Verified changes by re-running the script (without the `-s` option) and seeing no warnings/errors.

Reviewed By: walterddr, janeyx99

Differential Revision: D29504258

Pulled By: 1ntEgr8

fbshipit-source-id: 78310b30ee8213b73ddb4771ad874665323e7a4e
2021-07-01 12:21:07 -07:00
Hao Lu
46595a9623 [Static Runtime] Add gflag to disable nnc and caffe2 math library (#61090)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61090

Reviewed By: ajyu

Differential Revision: D29479860

fbshipit-source-id: 2b53405f41d319f074c75d8923d97fd6a45fee4b
2021-07-01 00:01:37 -07:00
Yukio Siraichi
b099f5429c Port argmin kernel to structured kernels. (#60364)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60364

Tracking issue: #55070

This PR was opened to solve the CI failures on main when merging #59371, #59372, #59373, #59937, and #59938.

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D29265855

Pulled By: ezyang

fbshipit-source-id: ccee3810940542f8b370596105826c96b32231ec
2021-06-29 14:16:59 -07:00
Bert Maher
ddb1f293b6 Fix the NNC-disabled path in static runtime for perf comparisons
Summary:
The path that has NNC/LLVM disabled still constructs a tensor
expression, even though `supports()` always returns false, so a
`KernelScope` is necessary to manage those memory allocations.

I guess we could avoid building the TEs at all in this case, but it's pretty
clean this way.
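
For illustration, a hedged, simplified sketch of the scope/arena pattern being relied on here (hypothetical types, not the actual KernelArena/KernelScope): nodes created while the scope is alive are owned by its arena and are released together when the scope dies, so even a tensor expression that is built and then thrown away does not leak.
```cpp
#include <memory>
#include <utility>
#include <vector>

// Hypothetical IR node base class for this sketch.
struct Node { virtual ~Node() = default; };

// The arena owns every node created through it.
class Arena {
 public:
  template <class T, class... Args>
  T* create(Args&&... args) {
    auto owned = std::make_unique<T>(std::forward<Args>(args)...);
    T* raw = owned.get();
    nodes_.push_back(std::move(owned));
    return raw;
  }

 private:
  std::vector<std::unique_ptr<Node>> nodes_;
};

// RAII scope: when it is destroyed, the arena and all nodes it created go with
// it, even if the caller discarded every pointer it got back.
class ScopeSketch {
 public:
  Arena& arena() { return arena_; }

 private:
  Arena arena_;
};

struct Constant : Node { double v; explicit Constant(double v) : v(v) {} };

void build_and_discard() {
  ScopeSketch scope;                    // plays the role of KernelScope
  scope.arena().create<Constant>(1.0);  // freed when `scope` is destroyed
}
```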

Test Plan:
```
scripts/bertrand/static_runtime/run.sh
```

Reviewed By: hlu1

Differential Revision: D29415909

fbshipit-source-id: dde43de8516b9a2cf9f5f7f3699962bf9ccd8c30
2021-06-28 15:39:07 -07:00
Hao Lu
1e31d26b1d [Static Runtime] Fix bugs in static_runtime::to_copy (#60503)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60503

Fixed a few issues in the static_runtime::to_copy impl:
- fixed a bug with memory_format
- copied strides when appropriate; this is necessary to make sure that the fbgemm path in the copy kernel gets hit
- fixed the schema in the `ReplaceWithCopy` pass
- added registration of `static_runtime::to_copy.other`

Add more unit tests:
- test dynamic shapes
- test strided input tensor to `aten::to`
- test alias case (same input/output)
- test `to.other`

Reviewed By: ajyu

Differential Revision: D26838933

fbshipit-source-id: ec0d1a2deebe998fcfe8858e772e1ef429cb4522
2021-06-23 19:57:17 -07:00
Ansha Yu
0baad214b0 [static runtime][fix] resize to the input tensor size for full_like (#60229)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60229

Fix a bug where we did not resize the output to the input tensor's size,
causing the output to be incorrect.
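
For illustration, a hedged sketch of the fix being described (the function name is made up; this is not the actual Static Runtime op implementation): when the output tensor is reused across calls, it has to be resized to the current input's sizes before being filled.
```cpp
#include <ATen/ATen.h>

// Hypothetical out-variant body: resize the reused output to match the input
// before filling it, so a shape change between calls cannot leave the output
// with a stale size.
void full_like_out_sketch(const at::Tensor& input, const at::Scalar& fill_value,
                          at::Tensor& out) {
  out.resize_(input.sizes());  // the step that was missing
  out.fill_(fill_value);
}
```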

Test Plan:
Test on replayer, rebased on D29217781, with model 278203319_26.

Verify with jit outputs (D28583950)

`./buck-out/gen/admarket/lib/ranking/prediction_replayer/replayer --model_inference_type_target=DISAGG_ACCELERATOR --prediction_replayer_force_model_type=inline_cvr_post_imp_model --prediction_replayer_force_model=278203319_26 --prediction_replayer_target_tier=sigrid.predictor.perf.dianshi_staticruntime_debug_0604.test --prediction_replayer_input_stream_filename=/data/users/ansha/tmp/adfinder/filtered_requests_inline_cvr_100 --ignore_model_id_mismatch --check_performance --fully_remote_sr_connection_options="overall_timeout:10000000,processing_timeout:10000000" --use_new_encoding_for_ads_services --use_new_encoding_from_model_id_to_shard_id --sigrid_force_model_dir=/data/users/ansha/tmp/adfinder/278203319_26/ --sigrid_predictor_model_suffix=.predictor.disagg.local --use_new_encoding_from_model_id_to_shard_id=true --prediction_replayer_force_model_kind=19 --pytorch_predictor_static_runtime_enable=true --prediction_replayer_target_qps=1`

Reviewed By: hlu1, movefast1990

Differential Revision: D29218918

fbshipit-source-id: dab4bbbabeaa8367174ed90edca43d6204c65409
2021-06-18 09:56:25 -07:00
Brian Hirsh
6b5e77904f Revert D29104396: Port argmin kernel to structured kernels.
Test Plan: revert-hammer

Differential Revision: D29104396 (226d745a0b)

Original commit changeset: 39c59bcc0446

fbshipit-source-id: 82de26f925a885f65572a785fa45a9980d3a974b
2021-06-17 10:31:06 -07:00
Yukio Siraichi
226d745a0b Port argmin kernel to structured kernels. (#59938)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59938

Tracking issue: #55070

Test Plan: Imported from OSS

Reviewed By: soulitzer

Differential Revision: D29104396

Pulled By: ezyang

fbshipit-source-id: 39c59bcc044649c1ec9c9685366c4dda87f76aa7
2021-06-17 08:18:13 -07:00
Hao Lu
eda2ddb5b0 [ATen] Fix aten::to schema (#60001)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60001

Fix the aten::to schema to reflect that the output may alias input.

Test Plan: Added new unit tests.

Reviewed By: ezyang

Differential Revision: D29121620

fbshipit-source-id: c29b6aa22d367ffedf06e47116bc46b3e188c39c
2021-06-15 20:04:20 -07:00
Hao Lu
cbd1e8c335 [Static Runtime] Fix bug in aten::to (#59995)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59995

Reviewed By: ajyu

Differential Revision: D29083106

fbshipit-source-id: 687ffb121af2716d606c145474942650a2d9ac7e
2021-06-14 22:54:43 -07:00
Hao Lu
2112074f25 [Static Runtime] Add schema check to several aten ops (#59603)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59603

D28698997 (10345010f7) was reverted because I forgot to replace the
```
  VLOG(1) << "Found schema mismatch";
  n->schema().dump();
```
block in `aten::clamp_min` with `LogAndDumpSchema(n)`, and that caused the bazel build to fail. I don't know why it makes the bazel build fail, though.

Test Plan: OSS CI.

Reviewed By: ajyu

Differential Revision: D28950177

fbshipit-source-id: 9bb1c6619e6b68415a3349f04933c2fcd24cc9a2
2021-06-10 23:39:00 -07:00
Rong Rong (AI Infra)
91eb831422 Revert D28698997: [Static Runtime] Add schema check to aten ops
Test Plan: revert-hammer

Differential Revision: D28698997 (10345010f7)

Original commit changeset: 232fc60c0321

fbshipit-source-id: e351df62779fea85b7afe5160d3c40c4e7cee4ed
2021-06-05 07:48:49 -07:00
Hao Lu
10345010f7 [Static Runtime] Add schema check to aten ops (#59426)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59426

Reviewed By: ajyu

Differential Revision: D28698997

fbshipit-source-id: 232fc60c0321b8e68e4f1b6705233485260c281d
2021-06-04 21:38:45 -07:00
Hao Lu
6627c00e63 [Static Runtime] Fix bug in quantized::linear wrapper (#59407)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59407

Reviewed By: ajyu

Differential Revision: D28881307

fbshipit-source-id: 46c169f783cf05c585871c2e074d52255116b9c3
2021-06-03 19:18:04 -07:00
Richard Barnes
3979cb0656 irange for size_t (#55320)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55320

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D27572577

fbshipit-source-id: 97710fd2bb1303006b05828a0d1343b0b59ccb03
2021-06-03 01:04:13 -07:00
Raghavan Raman
e2467cc43e [NNC] Make splitWithTail transform in-place (#58268)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58268

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D28427228

Pulled By: navahgar

fbshipit-source-id: 270b62c4e83739ad21dd68f375120e56881b394f
2021-05-25 11:31:14 -07:00
Kurt Mohler
fe8e5eb260 Change native functions to take c10::string_view args instead of std::string (#57680)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/53546

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57680

Reviewed By: malfet

Differential Revision: D28511799

Pulled By: ezyang

fbshipit-source-id: 43142f994d048b28b3279ccdb7a28cbaa3190973
2021-05-20 18:15:45 -07:00
Ansha Yu
bf1c936e06 [static runtime] out variant for full_like (#58079)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58079

Support full_like

Test Plan:
`buck test mode/dev caffe2/benchmarks/static_runtime:static_runtime_cpptest -- StaticRuntime.IndividualOps_FullLike`

Test on regenerated local inline_cvr model
```
MKL_NUM_THREADS=1 OMP_NUM_THREADS=1 numactl -m 0 -C 3 ./buck-out/opt/gen/caffe2/caffe2/fb/predictor/ptvsc2_predictor_bench --scripted_model=/data/users/ansha/tmp/adfinder/dec_6x/266377643_shrunk.predictor.disagg.local.regenerated.pt --pt_inputs=/data/users/ansha/tmp/adfinder/dec_6x/local_inputs --pt_enable_static_runtime=1 --pt_cleanup_activations=1 --pt_enable_out_variant=1 --compare_results=1 --iters=5000 --warmup_iters=5000 --num_threads=1 --do_profile=0 --do_benchmark=1 --adsfinder_compatibility=1 --v=1
```

`V0511 10:59:57.187054 1911683 impl.cpp:1229] Switch to out variant for node: %5571 : Tensor = aten::full_like(%blob_for_shape.1, %235, %654, %75, %75, %75, %75)`

Reviewed By: hlu1

Differential Revision: D28361997

fbshipit-source-id: 89c41e37ce23d6008cfe4d80536832ee76d3405e
2021-05-20 16:17:40 -07:00
Hao Lu
1981904c8d [Static Runtime] Check input container type in aten::__getitem__ (#58639)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58639

Fix two tests in `//caffe2/test:static_runtime` that were previously broken.

Reviewed By: ajyu, edvgha

Differential Revision: D28561185

fbshipit-source-id: 3cfb0960666c808523d65da267f70bd51e828313
2021-05-20 12:47:01 -07:00
Hao Lu
4d7abdbdad [Quant] Add out variant for int8 quantized::linear (#58282)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58282

Reviewed By: ajyu

Differential Revision: D28428734

fbshipit-source-id: f25243cdbc220e59659605a3a29e2b161dd7c1f2
2021-05-19 00:24:23 -07:00
Freey0
401d0fe8c5 Port leaky_relu to structured (#57621)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57621

Test Plan: Imported from OSS

Reviewed By: VitalyFedyunin

Differential Revision: D28224706

Pulled By: ezyang

fbshipit-source-id: 168b175d0fd9e0cc3335ea00df4c7967fea77819
2021-05-14 00:49:05 -07:00
Hao Lu
993a35a8cb [Static Runtime] Support clamp.Tensor (#58191)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58191

There are two clamp overloads: clamp.Scalar and clamp.Tensor. SR needs to support both, or have checks in place to avoid runtime errors. Supporting both is not too hard, so here we are.
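
For illustration, a hedged sketch of the two overloads via standard `at::clamp` calls (the function names are made up; the Static Runtime wrapper itself is not shown): clamp.Scalar takes scalar bounds, while clamp.Tensor takes (broadcastable) tensor bounds.
```cpp
#include <ATen/ATen.h>

// clamp.Scalar: scalar min/max bounds.
at::Tensor clamp_scalar_example(const at::Tensor& x) {
  return at::clamp(x, /*min=*/0.0, /*max=*/1.0);
}

// clamp.Tensor: tensor min/max bounds, broadcast against the input.
at::Tensor clamp_tensor_example(const at::Tensor& x) {
  return at::clamp(x, at::zeros_like(x), at::ones_like(x));
}
```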

Reviewed By: edvgha

Differential Revision: D28371949

fbshipit-source-id: 0ec6b8a0b8c6277e50d8e51e4e7a45aa62211e22
2021-05-13 17:46:59 -07:00
Rong Rong (AI Infra)
002ce5c1df port addmm to structure kernel (#57417)
Summary:
Port addmm to a structured kernel.

Follow-ups:
- migrate `mm` and `addbmm` to structured kernels
- move the TORCH_CHECKs currently in `addmm_cpu_impl_` and `addmm_out_cuda_impl` to meta

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57417

Reviewed By: bdhirsh

Differential Revision: D28291001

Pulled By: walterddr

fbshipit-source-id: 4eafaa30a465e225fbb4d2a69a36f1e037df9122
2021-05-13 08:33:42 -07:00
liuyuanqiang@bytedance
85d64648d3 Port threshold to structure (#57810)
Summary:
Related: https://github.com/pytorch/pytorch/issues/55070
Port threshold and threshold_backward to structured kernels.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57810

Reviewed By: agolynski

Differential Revision: D28382716

Pulled By: ezyang

fbshipit-source-id: 8d0702ad074b52e8512524d9807c93bfe04c51d6
2021-05-12 15:04:55 -07:00
Hao Lu
c3d40fdf56 [ATen] Use expect_contiguous in layer_norm (#58067)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58067

- Use expect_contiguous in layer_norm to avoid unnecessary refcount bumps when the tensors are already contiguous (a sketch of the pattern follows below)
- Clean up some leftovers from the hacky-wrapper removal: use c10::MaybeOwned<Tensor> for the bias tensors
- Skip the dispatcher for at::empty in the layer_norm impl in Static Runtime
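
A hedged sketch of the `expect_contiguous` pattern from the first bullet (it uses the real `Tensor::expect_contiguous()` API; the helper function is made up, and the layer_norm change itself is not reproduced here):
```cpp
#include <ATen/ATen.h>
#include <c10/util/MaybeOwned.h>

// expect_contiguous() returns a c10::MaybeOwned<Tensor>: it borrows the input
// when it is already contiguous (no refcount bump) and owns a contiguous copy
// only when it is not. Either way, *contig behaves like a contiguous Tensor.
double sum_via_contiguous(const at::Tensor& t) {
  c10::MaybeOwned<at::Tensor> contig = t.expect_contiguous();
  return contig->sum().item<double>();
}
```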

Test Plan: CI

Reviewed By: swolchok

Differential Revision: D28214298

fbshipit-source-id: 73150fa62d5c18f41a2264f8e56bbe5e377ad045
2021-05-11 22:56:32 -07:00