Commit Graph

231 Commits

Author SHA1 Message Date
Priya Ramani
7b55dc8340 Use kernel_func_name from aotCompiler (#66337)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66337

Right now, the assembly code generated for a given method from the model is named `wrapper` or `func` by default. The function name is then replaced with the proper kernel_func_name after the target-specific assembly is generated.
This PR propagates the desired kernel_func_name right from the aotCompiler API, so that the generated function gets the needed name from the start and doesn't have to be renamed later.
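A minimal sketch of the threading idea, with a hypothetical signature that is not the actual aotCompiler API:

```cpp
// Hypothetical sketch only: thread the desired kernel_func_name from the AOT
// entry point down to codegen so the emitted symbol is named correctly up front.
#include <string>

struct CompiledKernel {
  std::string func_name;  // symbol name emitted in the generated assembly
  std::string asm_text;   // target-specific assembly (placeholder here)
};

CompiledKernel aotCompileSketch(const std::string& method_name,
                                const std::string& kernel_func_name) {
  CompiledKernel out;
  out.func_name = kernel_func_name;           // use the requested name directly
  out.asm_text = ".globl " + out.func_name;   // stand-in for real codegen
  (void)method_name;                          // selects which graph to compile
  return out;
}
```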

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D31514095

Pulled By: priyaramani

fbshipit-source-id: b70c8e2c733600a435cd4e8b32092d37b7bf7de5
2021-10-23 02:20:45 -07:00
Mikhail Zolotukhin
60a2a295ce [TensorExpr] Use schema instead of op name in NNC lowerings. (#65843)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65843

Fixes #64963.

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D31282334

Pulled By: ZolotukhinM

fbshipit-source-id: ffd0e1b6433d9360fedd9081c01ef41b21684439
2021-10-12 01:26:32 -07:00
Mikhail Zolotukhin
24b9b304d9 [TensorExpr] Nuke TE shape inference. (#65554)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65554

We're relying on JIT-based shape inference and not using the TE
implementation.

Question to the audience: we set `hasBroadcasts_` in that function, but
this function was almost never invoked. Do we behave correctly in the
presence of rand-calls and broadcasts?

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D31148925

Pulled By: ZolotukhinM

fbshipit-source-id: 2898a57e389ea0950163122089d0fec3d92701c4
2021-10-12 01:25:14 -07:00
Scott Wolchok
2d885ab73d [jit] Reduce refcounting of Types (#65345)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65345

FooType::get() can return a const reference. Inconveniently, converting shared_ptr<FooType> to shared_ptr<Type> requires a copy & refcount bump, so to properly take advantage of this in unshapedType() we need to take a const Type& in isSubtypeOf(), which is good practice anyway -- don't require a shared_ptr if you don't need to take ownership.
ghstack-source-id: 140044165
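A small sketch of the guideline, using simplified types rather than the real c10::Type hierarchy:

```cpp
// Simplified types, not the real c10::Type hierarchy: take a const reference
// when the callee only inspects the object, so callers holding a shared_ptr
// pay no copy or refcount bump.
#include <memory>

struct Type {
  virtual ~Type() = default;
  virtual bool isSubtypeOfImpl(const Type& rhs) const { (void)rhs; return false; }
};

// No ownership is taken, so no shared_ptr is required.
bool isSubtypeOf(const Type& lhs, const Type& rhs) {
  return lhs.isSubtypeOfImpl(rhs);
}

void example(const std::shared_ptr<Type>& t, const Type& u) {
  isSubtypeOf(*t, u);  // dereference once; no refcount traffic at the call site
}
```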

Test Plan:
CI

perf says c10::unshapedType time decreased from 2.8% to 2.2% during static runtime startup, though I expect this to be generally beneficial.

Reviewed By: hlu1

Differential Revision: D31027361

fbshipit-source-id: 676feb81db9f74ad7b8651d8774f4ecb4cfa6ab8
2021-10-08 09:03:04 -07:00
Mikhail Zolotukhin
765b6a90f3 [TensorExpr] Move lowerings registration from kernel.cpp to lowerings.cpp. (#65553)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65553

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D31148921

Pulled By: ZolotukhinM

fbshipit-source-id: 772062155043d4be9e9a25f6259b8e4a6cb762f4
2021-09-30 22:56:22 -07:00
Mikhail Zolotukhin
015e0079e3 [TensorExpr] Move 'compute*' functions to operators/... (#65552)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65552

This PR is mostly a verbatim move of several functions to different
files. The goal is to have more consistency in what resides where.

With this PR:
* All `compute*` functions defining how a given operator needs to be
lowered to TE IR will reside in `operators/*.{cpp,h}`.
* Auxiliary functions for these functions will reside in
`operators/misc.cpp`. `compute*` functions for ops not belonging
anywhere else can also go to that file.
* `operators/unary.*` is renamed to `operators/pointwise.*` and now
includes functions like `computeTwoOperands`.
* `kernel.*` now contains *only JIT-related* logic and implementations of
`TensorExprKernel` methods.

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D31148923

Pulled By: ZolotukhinM

fbshipit-source-id: e36ad8e779b8d30a33b49ea4ebf6d6a7438989f4
2021-09-30 22:56:20 -07:00
Mikhail Zolotukhin
3a0165da49 [TensorExpr] Port NNC lowerings to the new registry mechanism. (#65551)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65551

Previously we had a big switch on Op kind to decide how to lower a given
JIT operator to NNC. This PR changes this switch to a hash table lookup.

Why? This helps us with at least two things:
1) With this approach we can easily check whether we know how to handle a
given node in advance - i.e. we can inspect the entire graph and tell
whether it's possible to compile it or not without actually trying to do
that and dying in the middle. This would allow us to, say, provide
user-friendly error messages in the AOT workflow.
2) We can switch to using the schema instead of the op kind to determine the
correct lowering. Unlike the op schema, the op kind can be ambiguous (see e.g. #64963),
and using it instead of the schema can lead to bugs (see the sketch after this list).
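A minimal sketch of the lookup mechanism, with illustrative names that are not the actual NNC registry API:

```cpp
// Illustrative names only, not the actual NNC registry API: lowerings keyed by
// the op's schema string so we can (1) check support for a whole graph up
// front and (2) dispatch without op-kind ambiguity.
#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

using Lowering = std::function<void()>;  // stand-in for the real lowering signature

std::unordered_map<std::string, Lowering>& loweringRegistry() {
  static std::unordered_map<std::string, Lowering> registry;
  return registry;
}

bool registerLowering(const std::string& schema, Lowering fn) {
  return loweringRegistry().emplace(schema, std::move(fn)).second;
}

// Inspect every node's schema before attempting compilation.
bool canCompileAll(const std::vector<std::string>& node_schemas) {
  for (const auto& s : node_schemas) {
    if (loweringRegistry().count(s) == 0) {
      return false;  // emit a friendly error instead of dying mid-lowering
    }
  }
  return true;
}
```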

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D31148926

Pulled By: ZolotukhinM

fbshipit-source-id: ac12684e2126c899426ef5e4cc1e3f70fa01f704
2021-09-30 22:56:18 -07:00
Mikhail Zolotukhin
eee9ad0fdd [TensorExpr] Add a skeleton for a registry of NNC lowerings. (#65550)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65550

This PR adds the source files and the class for the registry; subsequent
PRs port the existing lowerings to this mechanism.

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D31148922

Pulled By: ZolotukhinM

fbshipit-source-id: 4c087b22ee898d5a5a18a5d2a4bb795aa2ffd655
2021-09-30 22:56:16 -07:00
Mikhail Zolotukhin
d84191fcc6 [TensorExpr] Kernel: make prim::ConstantChunk handled like other ops. (#65549)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65549

Previously it had special handling; with this change it follows the
same mechanism as other ops.

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D31148924

Pulled By: ZolotukhinM

fbshipit-source-id: 572d8ae5e123e7a0e2a656154d7bd0f73c785a06
2021-09-30 22:55:00 -07:00
Ivan Kobzarev
43d47bdcca [tensorexpr] conv2d handle optional bias (#64750)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64750

The conv2d bias is optional and shows up as ArgNone while processing the graph.
The bias is a prim::Constant of NoneType, so we do not know its shape at the moment of constant binding.

This change adds it as a constant zeros Tensor at the moment of graph processing. To make that possible, `std::vector<TensorExprKernel::ConstantDescr>& constants` and `std::vector<at::Tensor>& constant_tensors` are added as parameters to `computeOperandValue`, since it is not a member of `TensorExprKernel`.
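A rough sketch of the idea; the real ArgValue/ArgNone variant and the TensorExprKernel plumbing are not reproduced here:

```cpp
// Simplified sketch; not the actual kernel code. When the optional conv2d bias
// is absent (ArgNone in the graph), bind a constant zeros tensor of the right
// shape so the rest of the lowering can treat the bias as always present.
#include <ATen/ATen.h>
#include <c10/util/Optional.h>

at::Tensor biasOrZeros(const c10::optional<at::Tensor>& bias,
                       int64_t out_channels,
                       at::ScalarType dtype) {
  if (bias.has_value()) {
    return *bias;
  }
  return at::zeros({out_channels}, at::TensorOptions().dtype(dtype));
}
```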

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D30842101

Pulled By: IvanKobzarev

fbshipit-source-id: 88020f6934e43fe606f8eae928b7e21b7c3f15f6
2021-09-27 20:00:53 -07:00
Ivan Kobzarev
31ea4358d8 [tensorexpr] Add Op handling for mobilenetv3 large (#64741)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64741

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D30839110

Pulled By: IvanKobzarev

fbshipit-source-id: d8e89c086c713fbe816dd8c8096cd64c05dc7431
2021-09-27 20:00:51 -07:00
Mikhail Zolotukhin
7e9c599784 [TensorExpr] Add a method for sanitizing Var and Buf names in Stmt. (#65010)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65010

This pass ensures that all names are legal and not duplicated.

Fixes #52727.
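A rough sketch of what such a sanitizing pass can look like (illustrative only, not the actual NNC implementation):

```cpp
// Illustrative only: replace characters that are illegal in generated code and
// append a numeric suffix when two names would otherwise collide.
#include <cctype>
#include <string>
#include <unordered_set>

std::string sanitizeName(const std::string& name,
                         std::unordered_set<std::string>& taken) {
  std::string out = name.empty() ? "v" : name;
  for (char& c : out) {
    if (!std::isalnum(static_cast<unsigned char>(c)) && c != '_') {
      c = '_';  // e.g. "x.1" becomes "x_1"
    }
  }
  if (std::isdigit(static_cast<unsigned char>(out[0]))) {
    out = "v" + out;  // identifiers must not start with a digit
  }
  std::string candidate = out;
  for (int i = 1; taken.count(candidate) != 0; ++i) {
    candidate = out + "_" + std::to_string(i);  // append "_1", "_2", ... until unused
  }
  taken.insert(candidate);
  return candidate;
}
```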

Test Plan: Imported from OSS

Reviewed By: bertmaher, navahgar

Differential Revision: D30939717

Pulled By: ZolotukhinM

fbshipit-source-id: 7dbe7f937de41f22ad49137a5e067d698443ed63
2021-09-15 17:15:06 -07:00
Mikhail Zolotukhin
f23f21dafe [TensorExpr] Remove 'Placeholder' class. (#64887)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64887

BufHandle has exactly the same functionality and should be used instead.

Differential Revision: D30889483

Test Plan: Imported from OSS

Reviewed By: navahgar

Pulled By: ZolotukhinM

fbshipit-source-id: 365fe8e396731b88920535a3de96bd3301aaa3f3
2021-09-14 00:22:44 -07:00
Mikhail Zolotukhin
82ac3f108d [TensorExpr] Move 2 graph passes from kernel.cpp to graph_opt.cpp (#64828)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64828

Also, make `removeUnusedSelfArgument` more consistent with other passes
by mutating the graph in-place rather than returning a copy.

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D30870776

Pulled By: ZolotukhinM

fbshipit-source-id: 4873f01b013921143a5aa43746d655a2d8d620c9
2021-09-11 10:23:15 -07:00
Raghavan Raman
cad7a4b0ea [nnc] Added an implementation of sign op (#64033)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64033

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D30579197

Pulled By: navahgar

fbshipit-source-id: f9f7fa7f2ffa109cf4e441eb1af821b8b891d4d3
2021-09-10 16:49:04 -07:00
Mikhail Zolotukhin
a17d6c7f80 [TensorExpr] Simplify TE IR before applying any transformations. (#64717)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64717

This also exposed several bugs, which are fixed in this PR.

Differential Revision: D30826408

Test Plan: Imported from OSS

Reviewed By: navahgar

Pulled By: ZolotukhinM

fbshipit-source-id: a67ec5739aceed9ffdf0d24f77eb3787cefe4560
2021-09-09 18:50:51 -07:00
Raghavan Raman
652a8bf7d0 [nnc] Updated indices during broadcast to use int64_t (#64627)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64627

This fixes the root cause of S242719

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D30801686

Pulled By: navahgar

fbshipit-source-id: b6d3ebdc7eb57116eaced53c2f35c7798bb17e80
2021-09-09 08:29:37 -07:00
Hui Guo
5c27a580ec [tensorexpr] Allocate intermediate buffers at compile time (#64227)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64227

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D30652220

Pulled By: huiguoo

fbshipit-source-id: cd75005cdfa42751318de7174b44e14a3a01634e
2021-09-08 15:34:44 -07:00
Animesh Jain
18d24bb537 [NNC] Add Softplus operator (#64589)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64589

Adds a softplus operator lowering for NNC and enables element-wise fusion for it as well.
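For reference, a scalar sketch of the pointwise formula such a lowering expresses; this mirrors the usual aten::softplus semantics (beta and threshold defaults), not the NNC IR construction itself:

```cpp
// softplus(x) = (1/beta) * log1p(exp(beta * x)), falling back to the identity
// when beta * x exceeds the threshold for numerical stability.
#include <cmath>

float softplusRef(float x, float beta = 1.0f, float threshold = 20.0f) {
  const float bx = beta * x;
  if (bx > threshold) {
    return x;  // linear region: avoids overflow in exp()
  }
  return std::log1p(std::exp(bx)) / beta;
}
```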

Test Plan: Added a test in test_jit_fuser.py

Reviewed By: bertmaher

Differential Revision: D30736449

fbshipit-source-id: 6c5fc3bceb5cef2322ecd4449f827e4af018ea93
2021-09-08 10:49:58 -07:00
Bert Maher
7f0feafa55 [nnc] Provide helpful error messages about turning off the fuser (#64516)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64516

If fuser compilation fails due to a bug (which should be highly
unlikely at this point), we want to tell the user how to unblock themselves by
disabling fusion, in addition to asking them to report a bug.
ghstack-source-id: 137398537
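For context, this is what "disabling fusion" looks like from C++; the setter name below is my understanding of torch/csrc/jit/passes/tensorexpr_fuser.h and should be verified against your PyTorch version:

```cpp
// Assumed API: setTensorExprFuserEnabled from the TE fuser pass header.
#include <torch/csrc/jit/passes/tensorexpr_fuser.h>

void disableTensorExprFusion() {
  // Falls back to the unfused path globally while the bug gets reported.
  torch::jit::setTensorExprFuserEnabled(false);
}
```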

Test Plan: existing tests

Reviewed By: ZolotukhinM

Differential Revision: D30758051

fbshipit-source-id: 98be89f1b1d4fb3bc816f5b2634c618b9297930e
2021-09-08 08:10:22 -07:00
Hui Guo
9214450b7f [tensorexpr] Wrap error msgs with buildErrorMessages for internal asserts (#64409)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64409

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D30717786

Pulled By: huiguoo

fbshipit-source-id: a3b147d339ff4927f14efa24407cd3b63d80001d
2021-09-02 11:30:34 -07:00
Raghavan Raman
87d8ab6e50 [nnc] Updated generic error message with info about turning off the fuser (#64316)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64316

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D30683942

Pulled By: navahgar

fbshipit-source-id: d86607563672213f99a1436dcf4f5dc28053b713
2021-09-01 10:31:50 -07:00
Bert Maher
e7fb35021a [nnc] Enable fusion of bfloat16 ops (#64196)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64196

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D30643864

Pulled By: bertmaher

fbshipit-source-id: e95edeaf7089464d713ea1d1f951743d3e5f61c5
2021-08-30 20:09:36 -07:00
Raghavan Raman
093a12aaa9 [nnc] Updated internal asserts to include more detailed error messages (#64118)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64118

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D30616944

Pulled By: navahgar

fbshipit-source-id: 35289696cc0e7faa01599304243b86f0febc6daf
2021-08-30 04:40:51 -07:00
Bert Maher
2e6221a232 [nnc] Make 64-bit dimensions work (#64077)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64077

We were assuming kernel dimensions fit in 32 bits (the old fuser made
this assumption too), but we should be able to support 64-bit dimensions.
ghstack-source-id: 136933272

Test Plan: unit tests; new IR level test with huge sizes

Reviewed By: ZolotukhinM

Differential Revision: D30596689

fbshipit-source-id: 23b7e393a2ebaecb0c391a6b1f0c4b05a98bcc94
2021-08-28 19:59:47 -07:00
Raghavan Raman
6d31ba6ddc [nnc] Sanitized the names of constants in the input graph. (#63990)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/63923

The input graph can contain constants whose names include special characters, so all constant names in the input graph need to be sanitized.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63990

Reviewed By: ZolotukhinM

Differential Revision: D30558432

Pulled By: navahgar

fbshipit-source-id: de5b0c23d50ee8997f40f2c0fc605dda3719186f
2021-08-26 09:52:02 -07:00
Bert Maher
ba5f1b1076 [nnc] Fix dtype promotion involving scalars (#64002)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64002

Fixes https://github.com/pytorch/vision/issues/4315

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D30566979

Pulled By: bertmaher

fbshipit-source-id: eaa98b9534a926be7fcd337d46c5a0acb3243179
2021-08-26 09:43:15 -07:00
Bert Maher
8dda299d96 Re-apply: [nnc] Support thread level parallelism in fused kernels (#63776)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63776

I reverted this out of an abundance of caution because some test
failures occurred, but they were all due to precision issues fixed lower in
this stack.  Let's try again.

I've rolled the elimination of the allow-parallelism-in-fusions toggle into
this diff since they're pretty tightly coupled.
ghstack-source-id: 136529847

Test Plan: CI

Reviewed By: huiguoo

Differential Revision: D30484555

fbshipit-source-id: 38fd33520f710585d1130c365a8c60c9ce794a59
2021-08-24 18:56:55 -07:00
Mikhail Zolotukhin
f0d274294d [TensorExpr] Nuke KernelArena and KernelScope. (#63587)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63587

Now that there are no classes using KernelArena for memory management, we
can remove it.

Differential Revision: D30429115

Test Plan: Imported from OSS

Reviewed By: navahgar

Pulled By: ZolotukhinM

fbshipit-source-id: 375f6f9294d27790645eeb7cb5a8e87047a57544
2021-08-24 00:32:16 -07:00
Mikhail Zolotukhin
62d02f2b57 [TensorExpr] Make 'Tensor' a value type. (#63586)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63586

This is another commit in the transition away from KernelArena memory management.
Tensor is essentially just a pair of <BufPtr, StmtPtr> and we don't need
to dynamically allocate it at all - it's cheap to pass it by value, and
that's what we're switching to in this commit.

After this change nothing uses KernelScope/KernelArena and they can be
safely removed.
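A minimal sketch of the value-type shape described above (field and alias names are illustrative):

```cpp
// Illustrative only: Tensor is just a cheap-to-copy pair of pointers, so arena
// allocation is unnecessary and it can be passed around by value.
#include <memory>
#include <utility>

struct Buf;   // buffer descriptor (placeholder)
struct Stmt;  // statement computing the buffer (placeholder)
using BufPtr = std::shared_ptr<Buf>;
using StmtPtr = std::shared_ptr<Stmt>;

class Tensor {
 public:
  Tensor(BufPtr buf, StmtPtr stmt)
      : buf_(std::move(buf)), stmt_(std::move(stmt)) {}
  BufPtr buf() const { return buf_; }
  StmtPtr stmt() const { return stmt_; }

 private:
  BufPtr buf_;
  StmtPtr stmt_;
};

// Passing by value copies only two smart pointers.
Tensor identity(Tensor t) { return t; }
```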

Differential Revision: D30429114

Test Plan: Imported from OSS

Reviewed By: navahgar

Pulled By: ZolotukhinM

fbshipit-source-id: f90b859cfe863692b7beffbe9bd0e4143df1e819
2021-08-24 00:32:13 -07:00
Bert Maher
37d60c08e5 Revert D30360382: [nnc] Support thread level parallelism in fused kernels
Test Plan: revert-hammer

Differential Revision:
D30360382 (d6d86efb1c)

Original commit changeset: 29acf4e932c6

fbshipit-source-id: e0531113135d30eabb172dc1537d5dd6d65dc438
2021-08-21 03:46:43 -07:00
Bert Maher
d6d86efb1c [nnc] Support thread level parallelism in fused kernels (#63386)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63386

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D30360382

Pulled By: bertmaher

fbshipit-source-id: 29acf4e932c669ce0f35823faea9099bcd8119b6
2021-08-20 11:18:17 -07:00
Mikhail Zolotukhin
1dc2b52764 [TensorExpr] Add a wrapper for all expr and stmt pointers. (#63195)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63195

This helps us to later switch from using KernelArena with raw pointers
to shared pointers without having to change all our source files at
once.

The changes are mechanical and should not affect any functionality.

With this PR, we're changing the following:
 * `Add*` --> `AddPtr`
 * `new Add(...)` --> `alloc<Add>(...)`
 * `dynamic_cast<Add*>` --> `to<Add>`
 * `static_cast<Add*>` --> `static_to<Add>`

Due to some complications with args forwarding, some places became more
verbose, e.g.:
 * `new Block({})` --> `new Block(std::vector<ExprPtr>())`
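A rough sketch of what such a wrapper layer can look like; this is simplified, and the actual NNC aliases and helpers may differ (e.g. wrapping raw pointers first and shared pointers later):

```cpp
// Every IR node gets a FooPtr alias, and construction/casting goes through
// helpers, so the underlying pointer type can be swapped in one place.
#include <memory>

struct Expr { virtual ~Expr() = default; };
struct Add : Expr { /* operands omitted */ };

template <typename T>
using NodePtr = std::shared_ptr<T>;
using ExprPtr = NodePtr<Expr>;
using AddPtr  = NodePtr<Add>;

// new Add(...)          -> alloc<Add>(...)
template <typename T, typename... Args>
NodePtr<T> alloc(Args&&... args) {
  return std::make_shared<T>(std::forward<Args>(args)...);
}

// dynamic_cast<Add*>(e) -> to<Add>(e)
template <typename T>
NodePtr<T> to(const ExprPtr& e) {
  return std::dynamic_pointer_cast<T>(e);
}

// static_cast<Add*>(e)  -> static_to<Add>(e)
template <typename T>
NodePtr<T> static_to(const ExprPtr& e) {
  return std::static_pointer_cast<T>(e);
}
```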

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D30292779

Pulled By: ZolotukhinM

fbshipit-source-id: 150301c7d2df56b608b035827b6a9a87f5e2d9e9
2021-08-17 13:44:45 -07:00
Richard Barnes
d1f9c03cef Use const auto with irange (#62990)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62990
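An example of the pattern named in the title, using c10::irange from c10/util/irange.h:

```cpp
// Iterate with c10::irange and bind the index as const auto instead of a
// mutable loop counter.
#include <c10/util/irange.h>
#include <cstdint>
#include <vector>

int64_t sum(const std::vector<int64_t>& v) {
  int64_t total = 0;
  for (const auto i : c10::irange(v.size())) {
    total += v[i];  // i is const and typed to match the range
  }
  return total;
}
```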

Test Plan: Sandcastle

Reviewed By: zhouzhuojie

Differential Revision: D30199748

fbshipit-source-id: 284b208ffa3c6c4749e5ac9b1fccb28914590f2c
2021-08-10 17:59:01 -07:00
Raghavan Raman
59dd12042e [nnc] Removed const from all fields in IR. (#62336)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62336

This PR was generated by removing `const` for all types of nodes in NNC IR, and fixing compilation errors that were the result of this change.

This is the first step in making all NNC mutations in-place.

Test Plan: Imported from OSS

Reviewed By: iramazanli

Differential Revision: D30049829

Pulled By: navahgar

fbshipit-source-id: ed14e2d2ca0559ffc0b92ac371f405579c85dd63
2021-08-03 11:44:36 -07:00
Nikita Shulga
a9b0a921d5 Disable avoid-non-const-global-variables lint check (#62008)
Summary:
The GoogleTest `TEST` macro is non-compliant with this check, as is `DEFINE_DISPATCH`.

All changes but the ones to `.clang-tidy` were generated using the following script:
```
for i in `find . -type f -iname "*.c*" -or -iname "*.h"|xargs grep cppcoreguidelines-avoid-non-const-global-variables|cut -f1 -d:|sort|uniq`;  do sed -i "/\/\/ NOLINTNEXTLINE(cppcoreguidelines-avoid-non-const-global-variables)/d" $i; done
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62008

Reviewed By: driazati, r-barnes

Differential Revision: D29838584

Pulled By: malfet

fbshipit-source-id: 1b2f8602c945bd4ce50a9bfdd204755556e31d13
2021-07-22 18:04:40 -07:00
Mike Guo
6ecc1a4c4f Make pytorch clang-tidy clean (#60649)
Summary:
This PR suppresses clang-tidy warnings in the codebase (for now) so that we can re-enable clang-tidy checks on master.

I ran this script to add the `NOLINTNEXTLINE` comments (on a devserver):
```bash
python3 setup.py develop

# Uses same script that's run on CI and adds the -j (parallel), -s (add comments), -k (continue if diagnostic errors are found) options
python3 tools/clang_tidy.py \
  -j \
  -s \
  -k \
  -v \
  --paths torch/csrc/ \
  -g"-torch/csrc/jit/passes/onnx/helper.cpp" \
  -g"-torch/csrc/jit/passes/onnx/shape_type_inference.cpp" \
  -g"-torch/csrc/jit/serialization/onnx.cpp" \
  -g"-torch/csrc/jit/serialization/export.cpp" \
  -g"-torch/csrc/jit/serialization/import.cpp" \
  -g"-torch/csrc/jit/serialization/import_legacy.cpp" \
  -g"-torch/csrc/onnx/init.cpp" \
  -g"-torch/csrc/cuda/nccl.*" \
  -g"-torch/csrc/cuda/python_nccl.cpp" \
  -g"-torch/csrc/autograd/FunctionsManual.cpp" \
  -g"-torch/csrc/generic/*.cpp" \
  -g"-torch/csrc/jit/codegen/cuda/runtime/*" \
  -g"-torch/csrc/deploy/interpreter/interpreter.cpp" \
  -g"-torch/csrc/deploy/interpreter/interpreter.h" \
  -g"-torch/csrc/deploy/interpreter/interpreter_impl.h" \
  -g"-torch/csrc/deploy/interpreter/test_main.cpp"
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60649

Test Plan: Verified changes by re-running the script (without the `-s` option) and seeing no warnings/errors.

Reviewed By: walterddr, janeyx99

Differential Revision: D29504258

Pulled By: 1ntEgr8

fbshipit-source-id: 78310b30ee8213b73ddb4771ad874665323e7a4e
2021-07-01 12:21:07 -07:00
Mikhail Zolotukhin
3bfe15085d [TensorExpr] Add a mechanism to register custom TS->NNC lowerings in TensorExprKernel. (#60804)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60804

The lowerings are stored as a map from c10::Symbol to std::function, and the
signature of those functions matches the signature of
`computeOperandValue`. Custom lowerings take priority over the
standard ones, i.e. we can redefine how a given op is lowered.

In general this feature is aimed at unblocking users whose models
contain ops that are not yet supported by NNC - it allows them to quickly add
a custom lowering for a given op.
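A minimal sketch of the priority scheme described above, using stand-in types instead of the real c10::Symbol and computeOperandValue signature:

```cpp
// Stand-in types; shows only the lookup order: custom entries are consulted
// before the standard lowerings.
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

using OpSymbol = std::string;                  // stand-in for c10::Symbol
using CustomLowering = std::function<void()>;  // stand-in for the real signature

class KernelSketch {
 public:
  void registerCustomLowering(const OpSymbol& op, CustomLowering fn) {
    custom_lowerings_[op] = std::move(fn);
  }

  bool lower(const OpSymbol& op) {
    auto it = custom_lowerings_.find(op);
    if (it != custom_lowerings_.end()) {
      it->second();            // custom lowering wins over the standard one
      return true;
    }
    return lowerStandard(op);  // fall back to the built-in lowerings
  }

 private:
  bool lowerStandard(const OpSymbol&) { return false; }  // placeholder
  std::unordered_map<OpSymbol, CustomLowering> custom_lowerings_;
};
```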

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D29409580

Pulled By: ZolotukhinM

fbshipit-source-id: e8e8dc9d3cb9155cfbf5c08a4216ba1b5b791a60
2021-06-27 15:27:22 -07:00
Raghavan Raman
d0c4ace00f [jit] Added a transformation to move consumers of aten::cat to its inputs, in the fused subgraphs (#59580)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59580

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D28955318

Pulled By: navahgar

fbshipit-source-id: 7504d5aea441920f4eb9234cdfa17077161ab13c
2021-06-18 14:32:07 -07:00
Mikhail Zolotukhin
d9e7df707b [TensorExpr] Add NNC lowerings for aten::mean, aten::addmm, and aten::adaptive_avg_pool2d. (#59347)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59347

We had external call wrappers for them, but they were not used in NNC.
This PR adds lowerings that use these external calls and fixes some bugs in
them.

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D28853832

Pulled By: ZolotukhinM

fbshipit-source-id: 1718400368e1a9cf3f19180ee2290a4ed9c99d41
2021-06-18 11:56:32 -07:00
Mikhail Zolotukhin
c6bb9409b8 [TensorExpr] Handle not-specified dtypes and strides. (#59346)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59346

Currently JIT has a pass to propagate shapes, but it doesn't have the
capability to fill in strides and dtypes. This PR works around that by
assuming the default dtype to be Float and the strides to correspond to a
contiguous layout, unless otherwise specified. Ideally we won't need
this; it is done simply as a workaround until the corresponding
features are implemented on the JIT side.

This is required for AOT compilation of mobilenet v3 with NNC.
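For clarity, the contiguous-strides assumption spelled out as the standard row-major formula (stride[i] is the product of all sizes to the right of dimension i):

```cpp
#include <cstdint>
#include <vector>

std::vector<int64_t> contiguousStrides(const std::vector<int64_t>& sizes) {
  std::vector<int64_t> strides(sizes.size(), 1);
  for (int64_t i = static_cast<int64_t>(sizes.size()) - 2; i >= 0; --i) {
    strides[i] = strides[i + 1] * sizes[i + 1];
  }
  return strides;
}
// e.g. sizes {2, 3, 4} -> strides {12, 4, 1}
```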

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D28853831

Pulled By: ZolotukhinM

fbshipit-source-id: 81adb59409684f39b444909ab8ec58ee4a39d496
2021-06-18 11:56:30 -07:00
Mikhail Zolotukhin
eb36f67dcc [TensorExpr] Minor cleanup in TensorExprKernel::computeValue (#60041)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60041

Differential Revision: D29146709

Test Plan: Imported from OSS

Reviewed By: navahgar

Pulled By: ZolotukhinM

fbshipit-source-id: 49ac919c18f669d7fda1a26c5a74e62ea752df4f
2021-06-17 01:23:24 -07:00
Bert Maher
842a831f53 [nnc] Move batchnorm to operators library (#59992)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59992

Wrapped batch norm in function `computeBatchNorm`.
ghstack-source-id: 131407851

Test Plan: CI

Reviewed By: ZolotukhinM

Differential Revision: D29116661

fbshipit-source-id: 2873a9a3e70f31db1988787160fc96c388ea3d4a
2021-06-16 05:09:59 -07:00
Bert Maher
bda40639c5 [nnc] Move operator implementations into a subdirectory (#59988)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59988

As we broaden operator support, putting all the implementations into
kernel.cpp is getting unwieldy.  Let's factor them out into the "operators"
subdirectory.

This diff is big but it's entirely code movement; I didn't change anything,
other than to expose a few utilities in kernel.h.
ghstack-source-id: 131405139

Test Plan: CI

Reviewed By: ZolotukhinM

Differential Revision: D29115916

fbshipit-source-id: ba0df1d8dd4a108b584da3baf168407e966b2c78
2021-06-16 05:08:50 -07:00
Richard Barnes
b162d95e46 Fix a number of lint perf and safety issues in torch (#59897)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59897

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D29037012

fbshipit-source-id: 7c16286d5fc2b67964fb65f8374dfff4d1a7aefb
2021-06-15 13:14:51 -07:00
Raghavan Raman
20460b0c05 [nnc] Removed setBufferMap method from LoopNest (#59496)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59496

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D28915958

Pulled By: navahgar

fbshipit-source-id: 71e649c93fc67b36c37373f043c729aa835968a0
2021-06-15 10:37:48 -07:00
Raghavan Raman
b822928e33 [nnc] Removed setGPUBlockIndex and setGPUThreadIndex methods from LoopNest (#59495)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59495

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D28915960

Pulled By: navahgar

fbshipit-source-id: 20a4032b031aba6e43d85433ade5f0680c65fbc0
2021-06-15 10:37:46 -07:00
Raghavan Raman
aa163aeff5 [nnc] Made several LoopNest APIs static (#59494)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59494

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D28915959

Pulled By: navahgar

fbshipit-source-id: bf52e30d893f4d86812219b538a14307f347f10b
2021-06-15 10:36:31 -07:00
Bert Maher
df759a3d9e [nnc] Do not fuse matmul/conv2d if inputs are discontiguous. (#59754)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59754

Also, if inputs are contiguous, use their Placeholders
directly rather than generating contiguous Tensors from them.

The rationale for this change is that aten::matmul and aten::conv2d
support transposed inputs; if NNC generates a physical transpose to
perform an external call, performance will be strictly worse than not
fusing (sometimes dramatically so, as in the attached benchmark).
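A sketch of the kind of contiguity test this implies, written as a generic sizes/strides check rather than the actual fuser code:

```cpp
// A transposed matmul or conv2d input fails this check and is better left to
// the unfused aten call than paying for a physical transpose.
#include <cstdint>
#include <vector>

bool isContiguous(const std::vector<int64_t>& sizes,
                  const std::vector<int64_t>& strides) {
  int64_t expected = 1;
  for (int64_t i = static_cast<int64_t>(sizes.size()) - 1; i >= 0; --i) {
    if (sizes[i] != 1 && strides[i] != expected) {
      return false;
    }
    expected *= sizes[i];
  }
  return true;
}
```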

Test Plan: benchmark

Reviewed By: ZolotukhinM

Differential Revision: D29010209

fbshipit-source-id: da6d71b155c83e8d6e306089042b6b0af8f80900
2021-06-11 02:23:47 -07:00
Mikhail Zolotukhin
daa35141e8 Reland: "[TensorExpr] Fix handling of 0-dim tensors." (#59508)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59508

An assert that was triggering in a previous version is now relaxed to
take 0-dim tensors into account.

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D28918342

Pulled By: ZolotukhinM

fbshipit-source-id: c09b62c9725d1603b0ec11fcc051e7c932af06ae
2021-06-08 22:48:17 -07:00