Commit Graph

123 Commits

Author SHA1 Message Date
Hui Guo
7c4ac9e3ee [NNC] Fix loopnest.cache_accesses for reduce ops (fixed #59002) (#59136)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59136

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D28768598

Pulled By: huiguoo

fbshipit-source-id: 99ab8430bc0ba395e2a041b03a7761de335ddda5
2021-06-03 21:04:14 -07:00
Richard Barnes
3979cb0656 irange for size_t (#55320)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55320
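
For context, `c10::irange` replaces manually indexed loops over `size_t` ranges; a minimal sketch of the pattern (illustrative, not code from this PR):

```
#include <c10/util/irange.h>

// Before: for (size_t i = 0; i < v.size(); ++i) { ... }
// After: irange yields each index in [0, v.size()) with the right type.
for (const auto i : c10::irange(v.size())) {
  // use v[i]
}
```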

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D27572577

fbshipit-source-id: 97710fd2bb1303006b05828a0d1343b0b59ccb03
2021-06-03 01:04:13 -07:00
CodemodService FBSourceClangFormatLinterBot
bbdc428db2 [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D28704311

fbshipit-source-id: f089266771c1ceba127116638a4dd87aa21e2e27
2021-05-26 03:19:49 -07:00
Raghavan Raman
dd7bbe1a63 [NNC] Make splitWithMask transform in-place (#58269)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58269

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D28427227

Pulled By: navahgar

fbshipit-source-id: 4e38a436abcf4752fd7ef6ab3666876eec6ea5ba
2021-05-25 11:32:51 -07:00
Raghavan Raman
e2467cc43e [NNC] Make splitWithTail transform in-place (#58268)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58268

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D28427228

Pulled By: navahgar

fbshipit-source-id: 270b62c4e83739ad21dd68f375120e56881b394f
2021-05-25 11:31:14 -07:00
Raghavan Raman
4b859cbca1 [NNC] Do not optimize conditionals when the corresponding loop is not normalized (#57675)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57675

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D28231375

Pulled By: navahgar

fbshipit-source-id: bcbcebca25577744c7190a0aa9fa376f76dea77d
2021-05-18 14:25:53 -07:00
Raghavan Raman
a71b99b50d [NNC] Add a method to check if a loop is normalized (#57674)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57674

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D28231377

Pulled By: navahgar

fbshipit-source-id: 3d92d532f1e1f78c9d94619980340622b73f99ec
2021-05-18 14:25:50 -07:00
Raghavan Raman
3fe72d30dc [NNC] Optimize conditionals that correspond to the form generated for aten::cat op. (#57673)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57673

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D28231374

Pulled By: navahgar

fbshipit-source-id: 1777a63df4e5ebed6d515683bd772a88be465b3a
2021-05-18 14:23:48 -07:00
Mikhail Zolotukhin
f51798d0dc [TensorExpr] Fix UB in LoopNest::distribute. (#57883)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57883

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D28307300

Pulled By: ZolotukhinM

fbshipit-source-id: 5c35d50759904ed10c54e71b8bcb91572341f991
2021-05-07 22:08:19 -07:00
Raghavan Raman
e795f88d6b [NNC] Make flatten transform in-place (#56629)
Summary:
Partial fix for https://github.com/pytorch/pytorch/issues/56157

This PR updates the `flatten` API in `LoopNest` to perform the flattening transformation in-place. After this transformation, the first loop in the input becomes the flattened loop.
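
A hedged usage sketch of the in-place form (the exact signature is an assumption; `outer` and `inner` are hypothetical loop handles):

```
// Assumed in-place API: static bool flatten(const std::vector<For*>& loops, For** flat);
For* flat = nullptr;
if (LoopNest::flatten({outer, inner}, &flat)) {
  // `flat` is the first input loop, now covering the combined
  // iteration space of `outer` and `inner`.
}
```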

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56629

Reviewed By: H-Huang

Differential Revision: D28004787

Pulled By: navahgar

fbshipit-source-id: 7474ae237fae3fff0cd1c64a276a8831dc5b7db0
2021-04-30 09:51:45 -07:00
Raghavan Raman
5b7317b562 [NNC] API for Buffer Compression (#55853)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/54338

This PR adds the following API in NNC to implement "buffer compression".

```
static void compressBuffer(Buf* buf, Stmt* stmt);
```
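
For intuition, a sketch of the effect (illustrative IR, not from this PR): when a buffer dimension carries no values across iterations of the enclosing loop, that dimension can be compressed away.

```
// Before: A is N x M, but A[i, j] is only consumed within iteration i.
for (int i = 0; i < N; i++) {
  for (int j = 0; j < M; j++) { A[i, j] = f(i, j); }
  for (int j = 0; j < M; j++) { B[i, j] = A[i, j] + 1; }
}
// After compressBuffer(A, stmt): A needs only M elements.
for (int i = 0; i < N; i++) {
  for (int j = 0; j < M; j++) { A[0, j] = f(i, j); }
  for (int j = 0; j < M; j++) { B[i, j] = A[0, j] + 1; }
}
```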

Pull Request resolved: https://github.com/pytorch/pytorch/pull/55853

Reviewed By: ezyang

Differential Revision: D27960986

Pulled By: navahgar

fbshipit-source-id: a69988e607196f3e2db0212313ea5deefb9859ac
2021-04-23 14:12:03 -07:00
Hui Guo
29491f7954 [NNC] Add unroll and flatten APIs which do not require a return stmt pointer (#56420)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/56420

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D27866118

Pulled By: huiguoo

fbshipit-source-id: f7e44fb20ef3a3c43b95d15f7b3b12e9e5cc89c9
2021-04-22 19:59:34 -07:00
Raghavan Raman
d43d6593cd [NNC] Handling conditionals in reorderAxis (#56063)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/53093

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56063

Reviewed By: huiguoo

Differential Revision: D27894772

Pulled By: navahgar

fbshipit-source-id: 403b65f20567c27eab73faf670087cfab9885f84
2021-04-21 09:35:17 -07:00
Raghavan Raman
13ac0019ae [NNC] Update loop-carried dependence check to handle all known dependences (#56354)
Summary:
This PR includes:
 * Update to the loop-carried dependence check API to correctly ignore loop-independent dependences and handle all kinds of loop-carried dependences like RAW, WAR and WAW.
 * Fix for the overlap API to look only for conflicting buffer accesses where at least one of them is a Store.
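
For reference, minimal examples of the three loop-carried hazards (illustrative loops, not from this PR):

```
for (int i = 1; i < N; i++) A[i] = A[i - 1] + 1; // RAW: reads the previous iteration's write
for (int i = 0; i + 1 < N; i++) B[i] = B[i + 1]; // WAR: reads a location a later iteration writes
for (int i = 0; i < N; i++) C[i / 2] = i;        // WAW: two iterations write the same element
```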

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56354

Reviewed By: bertmaher

Differential Revision: D27856202

Pulled By: navahgar

fbshipit-source-id: 206e4ec771fe0f7f2ccf4b11b29e35df7b9b18bc
2021-04-20 17:12:51 -07:00
Raghavan Raman
0d94c04247 [NNC] Change fuseLoops API to return bool flag and not throw any exceptions (#56353)
Summary:
Partial fix for https://github.com/pytorch/pytorch/issues/56357

Changes the `fuseLoops` API to the following form:
```
static bool fuseLoops(const std::vector<For*>& loops, For** fused);
```

Also, adds a new API to check for loop-carried dependences:
```
static bool hasLoopCarriedDependence(For* loop);
```
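
A hedged usage sketch of the new form (loop handles hypothetical):

```
For* fused = nullptr;
if (!LoopNest::fuseLoops({loop1, loop2}, &fused)) {
  // Fusion was not legal or not possible; no exception is thrown.
  // hasLoopCarriedDependence can help diagnose why.
}
```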

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56353

Reviewed By: bertmaher

Differential Revision: D27856214

Pulled By: navahgar

fbshipit-source-id: 443557088692585657faee296602c547a00117dd
2021-04-19 17:20:40 -07:00
Raghavan Raman
b387f7ca47 [NNC] Make normalization transformation in-place (#56158)
Summary:
Partially fixes https://github.com/pytorch/pytorch/issues/56157

This PR changes `normalize` API in `LoopNest` to transform the given `For` statement and not create a new one.

New API:

```
static bool normalize(For* f);
```
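
For intuition, normalization shifts a loop so that it starts at zero (illustrative IR):

```
// Before:
for (int i = 50; i < 100; i++) { A[i] = B[i]; }
// After LoopNest::normalize(loop) returns true:
for (int i = 0; i < 50; i++) { A[i + 50] = B[i + 50]; }
```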

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56158

Reviewed By: agolynski

Differential Revision: D27798361

Pulled By: navahgar

fbshipit-source-id: 57626a5a367bdf94a0efbd9dc8538f5e4e410d6b
2021-04-18 23:54:13 -07:00
Raghavan Raman
29c5cb797d [NNC] Fuse loops that have the same bounds as expressions (#55997)
Summary:
This PR allows fusing loops whose bounds are specified as expressions that are equal.

For example:
```
   for (int j = 0; j < M + N; j++) {
     A[j] = 10 * j;
   }
   for (int k = 0; k < M + N; k++) {
     B[k] = 20 * k;
   }
```
`fuseLoops(j, k)` is possible because the stop bounds of the two loops are equal, even though they are distinct `Expr*` objects. The fusion will result in:
```
   for (int j = 0; j < M + N; j++) {
     A[j] = 10 * j;
     B[j] = 20 * j;
   }
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/55997

Reviewed By: bertmaher

Differential Revision: D27841270

Pulled By: navahgar

fbshipit-source-id: a64e4503b7f8f28bc0c9823225bc923177bb4c2e
2021-04-18 11:14:26 -07:00
Mikhail Zolotukhin
1263448cb2 [TensorExpr] Remove mask field from Load and Store classes. (#55825)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55825

The mask has never been used (in vectorization we generate an explicit
`IfThenElse` construct when we need to mask out some elements). This PR
removes it and cleans up all traces of it from the tests.

Differential Revision: D27717776

Test Plan: Imported from OSS

Reviewed By: navahgar

Pulled By: ZolotukhinM

fbshipit-source-id: 41d1feeea4322da75b3999d661801c2a7f82b9db
2021-04-13 12:08:51 -07:00
Mikhail Zolotukhin
b01a15d3d3 [TensorExpr] Redesign Rfactor loopnest transformation. (#55324)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55324

With this change `rfactor` only affects the passed loop and its body,
never touching anything outside (that was the root cause of a bug in the
previous implementation). Also, we no longer have an `insertion_point`
parameter - its meaning was vague, and its effect should be achievable
with other transformations anyway.

The new `rfactor` semantics are as follows:

```
Requirements:
 * S is the reduction store
 * S is the only statement in the innermost loop
 * There are at least two reduction arguments in S
 * OUTER_REDUCTION_FOR loop corresponds to the outermost reduction variable
 used in the store and all other reduction variables are index variables of
 children loops of OUTER_REDUCTION_FOR
 * OUTER_REDUCTION_FOR is a perfect loop nest, i.e. it has only loops
 corresponding to the other reduction variables and the store, nested into
 each other

What it does:
  * Introduce a new buffer with an extra dimension of a size equal to the
  span of the loop OUTER_REDUCTION_FOR (the new buffer is returned via
  RFAC_BUF_PTR)
  * Insert an initialization store for the new buffer in
  OUTER_REDUCTION_FOR before its nested loop
  * Replace the reduction store to the original buffer with the reduction
  store to the temp buffer, removing the index var of OUTER_REDUCTION_FOR
  from reduction arguments
  * Insert a final reduction store over the extra dimension of the new
  buffer to the original buffer
  * Returns TRUE if the transformation succeeded and FALSE otherwise

Example:
Original IR:
S1: for i        # normal axis
S2:   X[i] = 0
S3:   for j      # reduction axis
S4:     for k    # reduction axis
S5:       X[i] = ReduceOp(X[i] + Y[i,j,k], reduce_axis={j,k})

After RFACTOR(S5, S3)
S1: for i               # normal axis
S2:   X[i] = 0
S3:   for j             # reduction axis for X, normal axis for X_rfac
        X_rfac[i,j] = 0
S4:     for k           # reduction axis
          X_rfac[i,j] = ReduceOp(X_rfac[i,j] + Y[i,j,k], reduce_axis={k})
        X[i] = ReduceOp(X[i] + X_rfac[i,j], reduce_axis={j})
```

Differential Revision: D27694960

Test Plan: Imported from OSS

Reviewed By: navahgar

Pulled By: ZolotukhinM

fbshipit-source-id: 076fa6a1df2c23f5948302aa6b43e82cb222901c
2021-04-13 12:08:48 -07:00
Mikhail Zolotukhin
57f795c27b [TensorExpr] Remove unused LoopNest::hasLoopBodyFor method. (#55323)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55323

Differential Revision: D27694961

Test Plan: Imported from OSS

Reviewed By: SplitInfinity, gmagogsfm

Pulled By: ZolotukhinM

fbshipit-source-id: 367ae212054c3516409a568facc19a19671df488
2021-04-13 12:07:31 -07:00
Raghavan Raman
d805908c34 [NNC] API to reorder multiple loops (#55568)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/52690

This PR adds the following APIs:

```
static bool areLoopsPerfectlyNested(const std::vector<For*>& loops);

static std::vector<For*> reorder(
      const std::vector<For*>& loops,
      const std::vector<size_t>& permutation);
```

The first API checks whether the given loops are perfectly nested. The second API reorders the given loops according to the specified permutation.
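
A hedged usage sketch (loop handles hypothetical; the permutation is assumed to map position i to loops[permutation[i]]):

```
// Reorder a perfectly nested i/j/k nest into k/i/j order.
std::vector<For*> loops = {i_loop, j_loop, k_loop};
if (LoopNest::areLoopsPerfectlyNested(loops)) {
  std::vector<For*> reordered = LoopNest::reorder(loops, {2, 0, 1});
  // reordered[0] is the old k loop, now outermost.
}
```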

Pull Request resolved: https://github.com/pytorch/pytorch/pull/55568

Reviewed By: albanD

Differential Revision: D27689734

Pulled By: navahgar

fbshipit-source-id: dc1bffdbee068c3f401188035772b41847cbc7c6
2021-04-12 18:12:24 -07:00
Nikita Shulga
6a39613f35 [BE] Make torch/csrc/jit/tensorexpr/ clang-tidy clean (#55628)
Summary:
Mostly auto-generated changes using
```
 python3 tools/clang_tidy.py -c build -x torch/csrc/jit/tensorexpr/eval.cpp -s
```
With the following common patterns manually fixed:
- Use ` = default` instead of `{}`
- deleted methods should be public
- Use pass-by-value + std::move instead of pass-by-reference+copy
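
For example, the pass-by-value + std::move pattern looks like this (generic illustration, not code from this PR):

```
#include <string>
#include <utility>

struct Widget {
  // Before: Widget(const std::string& name) : name_(name) {}  // always copies
  // After: callers passing an rvalue avoid the copy entirely.
  explicit Widget(std::string name) : name_(std::move(name)) {}
  std::string name_;
};
```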

Pull Request resolved: https://github.com/pytorch/pytorch/pull/55628

Reviewed By: walterddr

Differential Revision: D27655378

Pulled By: malfet

fbshipit-source-id: 92be87a08113435d820711103ea9b0364182c71a
2021-04-08 19:44:14 -07:00
Mike Ruberry
c0ac0fef4e Revert D27448156: irange for size_t
Test Plan: revert-hammer

Differential Revision:
D27448156 (041b4431b2)

Original commit changeset: 585da57d4de9

fbshipit-source-id: 8e047c29f391c0166e0a1a87c3fb2a0854377365
2021-04-03 19:14:00 -07:00
Richard Barnes
041b4431b2 irange for size_t (#55163)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55163

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D27448156

fbshipit-source-id: 585da57d4de91c692b6360d65f7b8a66deb0f8c1
2021-04-02 23:22:29 -07:00
Mikhail Zolotukhin
bdbfb2a035 [TensorExpr] Nuke BaseCallNode. (#54999)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54999

BaseCallNode was used as the base class for Intrinsics and FunctionCall.
Now that FunctionCall is gone, BaseCallNode can be removed as well.

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D27446411

Pulled By: ZolotukhinM

fbshipit-source-id: be8ce06fbac72bfe355e5e3e1d2aa2267fae79fd
2021-04-01 19:48:02 -07:00
Mikhail Zolotukhin
0b75f862c7 [TensorExpr] Nuke FunctionCall. (#54998)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54998

The only reason we couldn't use Load instead of FunctionCall was
DepTracker. Now that it is gone, we can finally replace FunctionCall
with Load.

Test Plan: Imported from OSS

Reviewed By: bertmaher, pbelevich

Differential Revision: D27446412

Pulled By: ZolotukhinM

fbshipit-source-id: 9183ae5541c2618abc9026b1dc4c4c9fab085d47
2021-04-01 19:47:59 -07:00
Mikhail Zolotukhin
688e350725 [TensorExpr] Nuke DepTracker and findAllNeededTensors. (#54997)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54997

DepTracker was used to automatically pull in dependent computations from
output ones. While it seemed quite convenient, it led to several
architectural issues, which are fixed in this stack.

DepTracker worked on Tensors, where a Tensor is a pair of a Buf and a
Stmt. However, the Stmt could become stale, and there was no way to
reliably update the corresponding tensor. We now use Bufs and Stmts
directly and are moving away from Tensors to avoid these problems.

Removing DepTracker allowed us to unify Loads and FunctionCalls, which
were essentially duplicates of each other.

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D27446414

Pulled By: ZolotukhinM

fbshipit-source-id: a2a32749d5b28beed92a601da33d126c0a2cf399
2021-04-01 19:46:26 -07:00
Hui Guo
967e59e557 [tensorexpr] Add sliceHead/sliceTail APIs with short parameter list (#55115)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55115
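
For context, `sliceHead`/`sliceTail` peel a fixed number of iterations off a loop; a sketch of the short form (the exact signature is an assumption):

```
// Assumed short form: static void sliceHead(For* f, int factor);
// Before:  for (int i = 0; i < 100; i++) { body(i); }
// After sliceHead(loop, 4):
//   for (int i = 0; i < 4; i++)   { body(i); }  // peeled head
//   for (int i = 4; i < 100; i++) { body(i); }  // remainder
```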

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D27488754

Pulled By: huiguoo

fbshipit-source-id: d8a1b39ec891c80f6a9078768d692ac4ebeb5f79
2021-04-01 07:34:33 -07:00
Mikhail Zolotukhin
1ceb90405b [TensorExpr] Add plumbing for conv2d fusion. (#54439)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54439

For now the only way to represent conv2d in TE is via an external call,
and since the aten library doesn't have an out variant for conv2d, the
external call has to perform an extra copy. Because of that, fusing
conv2d currently regresses performance and hence is disabled. However,
in the near future we should have two alternative ways to enable it:
1) represent conv2d natively in TE (without an external call)
2) add an out variant for conv2d

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D27237045

Pulled By: ZolotukhinM

fbshipit-source-id: f5545ff711b75f9f37bc056316d1999a70043b4c
2021-03-24 18:49:07 -07:00
Raghavan Raman
601e79200d [NNC] Implementing LoopFusion (#54461)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/54337

This PR adds a new API to NNC to perform loop fusion.

```
static For* fuseLoops(const std::vector<For*>& loops);
```

Loop fusion is done only when all the conditions below are satisfied.
  * All the loops have the same parent.
  * There are no statements between these loops in their parent body.
  * The start bounds are the same for all loops.
  * The stop bounds are the same for all loops.
  * Fusing the loops does not violate or add any dependencies.
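
When these conditions hold, fusion merges the loop bodies (illustrative IR, not from this PR):

```
// Before:
for (int i = 0; i < N; i++) { A[i] = f(i); }
for (int j = 0; j < N; j++) { B[j] = g(j); }
// After fuseLoops({loop_i, loop_j}):
for (int i = 0; i < N; i++) {
  A[i] = f(i);
  B[i] = g(i);
}
```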

This PR also adds an API to check for partial overlaps in `buffer_inference.h` and fixes a bug in `mem_dependency_checker.cpp`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/54461

Reviewed By: bertmaher

Differential Revision: D27254888

Pulled By: navahgar

fbshipit-source-id: c21b027d738e5022e9cb88f6f72cd9e255bdb15e
2021-03-23 21:20:00 -07:00
Raghavan Raman
4b2abc4b8e [NNC] Adding API to distribute loops (#53865)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/53864

This PR adds the following APIs that perform loop distribution to `LoopNest`:
```
static std::vector<For*> distributeLoop(For* loop, const std::unordered_set<Stmt*>& pivots);
static std::vector<For*> distributeLoop(For* loop);
static std::vector<For*> distributeLoopOverInnerLoops(For* loop);
```

* The first method distributes the given loop over its body by splitting after every given pivot stmt.
* The second method distributes the given loop over every stmt in its body.
* The last method distributes the given loop over its body by splitting after every `For` stmt in its body.
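
For intuition, distributing a loop over every stmt in its body looks like this (illustrative IR):

```
// Before:
for (int i = 0; i < N; i++) {
  A[i] = f(i);
  B[i] = A[i] + 1;
}
// After distributeLoop(loop):
for (int i = 0; i < N; i++) { A[i] = f(i); }
for (int i = 0; i < N; i++) { B[i] = A[i] + 1; }
```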

Pull Request resolved: https://github.com/pytorch/pytorch/pull/53865

Reviewed By: mruberry

Differential Revision: D27075006

Pulled By: navahgar

fbshipit-source-id: 031746aad619fe84c109e78b53387535e7f77cef
2021-03-18 07:27:39 -07:00
Raghavan Raman
ef07a04072 [NNC] New APIs to get loops corresponding to a Buf (#53778)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/53092

This PR adds the following APIs to NNC.
```
// In For:
static For* getParentLoop(const Stmt* st);
static std::vector<For*> getEnclosingLoopNest(const Stmt* st);

// In LoopNest:
std::vector<const Stmt*> getAllWritesToBuf(const Buf*) const;
std::vector<For*> getAllInnermostLoopsWritingToBuf(const Buf*) const;
std::vector<std::vector<For*>> getAllLoopNestsWritingToBuf(const Buf*) const;
```

These APIs are required for use cases that involve multiple transformations, like `splitWithTail` followed by `reorder`, as shown in https://github.com/pytorch/pytorch/issues/53092
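
A hedged sketch of that flow (variable names and the split call are assumptions):

```
// A transformation like splitWithTail can invalidate previously held
// For* handles; re-discover the loops writing to `buf` and reorder them.
loopnest.splitWithTail(loop, 8);
std::vector<For*> loops = loopnest.getAllLoopNestsWritingToBuf(buf).front();
LoopNest::reorder(loops, {1, 0});
```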

Pull Request resolved: https://github.com/pytorch/pytorch/pull/53778

Reviewed By: albanD

Differential Revision: D26987013

Pulled By: navahgar

fbshipit-source-id: 491459eddfff045132d2358631ad069bbcc520df
2021-03-12 18:50:15 -08:00
Horace He
d4602b7e45 [NNC] Fixes case where inlining wouldn't work because dim-size was 1. (#53254)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/52581

The git diff is absolutely atrocious since I also refactored the code to share stuff between `Load` and `FunctionCall`.

Biggest questions I have about this diff are:

1. The asserts I added. From my understanding it's not possible to have a constant index in `Store` that's non-zero, since `Store` always creates a new buffer. Perhaps the user can write this kind of incorrect code, though, so perhaps I should just check for it and not assert it?

2. I don't think(?) I need to do any special handling for `index_vars`, but wasn't totally able to track the logic there.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/53254

Reviewed By: albanD

Differential Revision: D26991064

Pulled By: Chillee

fbshipit-source-id: 0bcd612d5f4b031c0b34e68a72d9c8d12d118be8
2021-03-11 20:53:20 -08:00
Raghavan Raman
a5e19126b6 [NNC] LoopNest cleanup (#53688)
Summary:
* Replacing vector of Tensors with a set of output buffers in `TensorExprKernel`.
* Creating a block statement while compiling in `TensorExprKernel`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/53688

Reviewed By: mrshenli

Differential Revision: D26941222

Pulled By: navahgar

fbshipit-source-id: 9eb81ec2effcdeafbeaa67d1e12475166054f80f
2021-03-10 20:20:03 -08:00
Raghavan Raman
aae188c529 [NNC] Handle non-literal constant bounds in Unroll. (#53029)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/53000

Also added a test to confirm this case works in FlattenLoop as well.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/53029

Reviewed By: bertmaher

Differential Revision: D26742705

Pulled By: navahgar

fbshipit-source-id: d87a0f9698411026b5b6e55eee7c2b9fb123d06b
2021-03-02 00:35:27 -08:00
Mikhail Zolotukhin
e22da0a5c4 [TensorExpr] Add IRVerifier. (#52901)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52901

This PR implements an IR Verifier and adds a call to it in `LoopNest`
constructors. Checks that were previously in expr/stmt constructors are
now moved to the corresponding `::make` functions or to the verifier.
They didn't really help in the constructors anyway, since an exception
thrown from there led to a segfault due to the way our memory
management works (the object was not fully created but was registered
in the kernel arena for destruction anyway).

Fixes #52778.

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D26682928

Pulled By: ZolotukhinM

fbshipit-source-id: c56524015cdffb1ed8bce4394509961a4071dcfa
2021-03-01 20:38:00 -08:00
Mikhail Zolotukhin
88a160dc21 [TensorExpr] LoopNest: Cleanup LoopNest constructors. (#52726)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52726

This change removes `input_bufs_` and `intermediate_bufs_` from the
`LoopNest` class, as they can be deduced from the root stmt and the list
of output bufs. As a result, the constructor of the LoopNest also
becomes simpler, as we now need to pass just one list of bufs.

Note: we might consider passing a list of input bufs for verification
purposes (only input buffers are allowed to not have a definition), but
since we don't really have an IR verifier yet, there is no need for it
now. Once we add an IR verifier, we can reconsider.

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D26629596

Pulled By: ZolotukhinM

fbshipit-source-id: 81f544e9602b6855b7968d540b9ae06bd7c7e6d8
2021-02-24 13:26:22 -08:00
Mikhail Zolotukhin
64847c7f0b [TensorExpr] Properly handle ExternalCalls in LoadStore analysis and Inliner. (#52628)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52628

Prior to this change, ExternalCalls were not considered as Loads or
Stores to/from their buffers, which led to incorrect behavior in
inlining. This PR fixes that.

Differential Revision: D26589378

Test Plan: Imported from OSS

Reviewed By: navahgar

Pulled By: ZolotukhinM

fbshipit-source-id: cd69d5f7075f6dc756aabcf676842b9a250334d6
2021-02-22 21:50:48 -08:00
Mikhail Zolotukhin
b63a1e31d3 [TensorExpr] Inlining: allow inlining into Load exprs. (#52627)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52627

Currently the inliner only inlines into Calls; this PR extends it to
cover Loads too. Eventually we will remove Calls altogether and use
Loads everywhere; this is one step in that direction.

Differential Revision: D26589377

Test Plan: Imported from OSS

Reviewed By: asuhan

Pulled By: ZolotukhinM

fbshipit-source-id: ca28f0df2273eb214f203467c6ba3d8f02a8a3b6
2021-02-22 21:47:24 -08:00
Hui Guo
973e306c84 changed TE 'Allocate' API to take one argument 'Buf' instead of three arguments 'Var', 'dtype', 'dims'. (#50167)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50167
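
For context, a sketch of the API change described in the title (approximate; the exact `make` overloads are assumptions):

```
// Before (assumed): the allocation was described piecewise:
//   Allocate::make(var, dtype, dims)
// After: a single Buf already carries its base var, dtype, and dims:
Stmt* alloc = Allocate::make(buf);
```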

Test Plan:
Imported from OSS

`python test/test_jit_fuser_te.py`
`python test/test_jit_fuser_legacy.py`
`python test/test_jit_fuser.py`
`build/bin/test_tensorexpr`

Reviewed By: ZolotukhinM

Differential Revision: D25814342

Pulled By: huiguoo

fbshipit-source-id: 44cba7f92365b826c9cb1d385a94858934570dee
2021-02-22 15:08:51 -08:00
Raghavan Raman
09c56ef45e Remove DepTracker from LoopNest (#52405)
Summary:
Remove the dependency tracker that works on Tensors, DepTracker, from LoopNest. This is essential to the goal of removing Tensors from LoopNest.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/52405

Reviewed By: heitorschueroff

Differential Revision: D26548621

Pulled By: navahgar

fbshipit-source-id: b20f23d608c19ac71aebd31c14777d653eead36c
2021-02-22 12:48:07 -08:00
Bert Maher
ac121165e2 Remove ReduceOp::accumulator (#52196)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52196

A reduction does not need to know the buffer into which its
result will be written.  This change gets us closer to being able to
create reductions inside Compute, where we have access to the tensor
axes.
ghstack-source-id: 121813071

Test Plan: test_tensorexpr

Reviewed By: ZolotukhinM

Differential Revision: D26420107

Pulled By: bertmaher

fbshipit-source-id: c8d8a99649adfd6de56fe53a728f5aa034a84f13
2021-02-17 23:36:23 -08:00
Bert Maher
a788c2d777 [nnc] Remove output_args from ReduceOp (#52187)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52187

ReduceOp doesn't need to track the indices that its result will be written into.
ghstack-source-id: 121813075

Test Plan:
test_tensorexpr, tensorexpr_bench

Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D26420575

fbshipit-source-id: 7afcfa611515334e36de8039722011687f3b61e4
2021-02-17 23:36:18 -08:00
Bert Maher
62d5f60ad2 Avoid using ReduceOp->output_args() in rfactor (#52177)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52177

I'm trying to get rid of `output_args` for reductions, because they
shouldn't be necessary; a reduction reduces over its reduction axis, so
why does it need to know where its output is going?

Rfactor is probably the trickiest place where we use output_args, but
it looks like it's mostly just carrying around the location of the
store, so use that instead.
ghstack-source-id: 121813072

Test Plan:
build/bin/test_tensorexpr && build/bin/tensorexpr_bench

Imported from OSS

Reviewed By: navahgar

Differential Revision: D26420548

fbshipit-source-id: aeab564c6113fa02eabb14c9b70c7edfd05b264d
2021-02-17 23:36:13 -08:00
Bert Maher
ff73be7e45 [te] Introduce likely/unlikely CompareSelect hint (#51751)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51751

Similar in spirit to the `__builtin_expect` C intrinsic, it's useful
to be able to hint the expected branch direction in a tensor expression.  Using
this flag has a few effects on codegen:

- The CompareSelect is generated using conditional branches, rather than selects
- The conditional branches are strongly hinted (like, 100000:1) in the indicated direction
- A vectorized hinted CompareSelect computes its condition in parallel with a
  mask "reduction" (e.g. a bitcast from `<i1 x 8>` to `<i*>`).  In AVX terms
  this sequence might look like:
```
vpcmpgtd %ymm0, %ymm1, %ymm2
vmovmskps %ymm2, %eax
```
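
A heavily hedged usage sketch (the exact signature and enum names are assumptions based on the description above):

```
// Assumption: CompareSelect::make gained a bias hint (e.g. kLikely/kUnlikely).
// "If x is a special value (denormal/inf/nan), take the slow path" --
// hinted as almost never taken.
auto e = CompareSelect::make(is_special, ExprHandle(1),
                             slow_path, fast_path, kEQ, kUnlikely);
```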

The motivating case for this addition is an attempt I'm making to replicate
fast transcendentals using tensor expressions.  Floating-point numbers have
lots of special cases (denormals, inf, nan) that need special handling, and
it's convenient to be able to punt that handling off to a slow path while
keeping the fast path nice and tight.
ghstack-source-id: 121366315

Test Plan:
I'm not sure how to test this (except I can tell you it works for
the `log` implementation I'm working on right now).  It would be nice to plumb
the LLIR/ASM output through programmatically so it can be used in FileCheck.
Maybe I'll do that in another diff?

Reviewed By: asuhan

Differential Revision: D26246401

fbshipit-source-id: 900f7fa0520010fb9931d6e3efc8680a51f8d844
2021-02-10 02:09:07 -08:00
Bert Maher
c77fc2ee06 [nnc] Vectorize bitwise ops (#51492)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51492

We missed these originally.  This helps vectorize log_fast.
ghstack-source-id: 120783427

Test Plan:
```
buck run mode/opt //caffe2/benchmarks/cpp/tensorexpr:tensorexpr_bench
```

This might have made bench_approx faster but it could be noise.

Before:
```
----------------------------------------------------------------------------
Benchmark                     Time           CPU Iterations UserCounters...
----------------------------------------------------------------------------
log_nnc_fast/64             108 ns        108 ns    5576102 log/s=590.91M/s
log_nnc_fast/512            569 ns        569 ns    1230258 log/s=899.961M/s
log_nnc_fast/8192          8047 ns       8046 ns      89715 log/s=1018.08M/s
log_nnc_fast/32768        31066 ns      31065 ns      22368 log/s=1054.81M/s
logit_nnc_fast/64           149 ns        149 ns    4851520 logit/s=428.646M/s
logit_nnc_fast/512          980 ns        979 ns     712033 logit/s=522.742M/s
logit_nnc_fast/8192       13326 ns      13325 ns      51916 logit/s=614.805M/s
logit_nnc_fast/32768      54743 ns      54739 ns      12844 logit/s=598.624M/s
```

After:
```
----------------------------------------------------------------------------
Benchmark                     Time           CPU Iterations UserCounters...
----------------------------------------------------------------------------
log_nnc_fast/64             100 ns        100 ns    7012963 log/s=640.588M/s
log_nnc_fast/512            496 ns        496 ns    1415357 log/s=1032.26M/s
log_nnc_fast/8192          7600 ns       7595 ns      88258 log/s=1078.62M/s
log_nnc_fast/32768        30300 ns      30298 ns      22442 log/s=1081.52M/s
logit_nnc_fast/64           152 ns        152 ns    4505712 logit/s=420.279M/s
logit_nnc_fast/512          816 ns        816 ns     873834 logit/s=627.267M/s
logit_nnc_fast/8192       12090 ns      12088 ns      58234 logit/s=677.675M/s
logit_nnc_fast/32768      51576 ns      51531 ns      14645 logit/s=635.888M/s
```

Reviewed By: bwasti

Differential Revision: D26155792

fbshipit-source-id: 16724b419c944aa7d4389ae85838018455a5605f
2021-02-01 16:38:57 -08:00
Mikhail Zolotukhin
e975169426 [TensorExpr] Redesign Tensor class. (#50995)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50995

This change makes 'Tensor' a thin wrapper over 'Buf' and 'Stmt', and
merges it with the recently introduced 'CompoundTensor'. A statement for
the tensor is either passed directly to the Tensor constructor (akin to
'CompoundTensor') or is built immediately in the constructor.

LoopNest is no longer responsible for constructing statements from
tensors - it simply stitches together already-constructed statements
contained in Tensors. As a side effect, we can no longer construct
several loopnests from the same tensors - we need to explicitly clone
statements if we want to do that. A special copy constructor was added
to LoopNest to make this more convenient (note: this only affects
tests; we don't usually create multiple loopnests elsewhere).

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D26038223

Pulled By: ZolotukhinM

fbshipit-source-id: 27a2e5900437cfb0c151e8f89815edec53608e17
2021-01-27 16:14:22 -08:00
Mikhail Zolotukhin
b804084428 [TensorExpr] Move 'lowerToStmt' method from 'LoopNest' to 'Tensor'. (#50994)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50994

Eventually, 'Tensor' will be fully responsible for its 'Stmt', and moving
this method to it is one step in that direction.

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D26038222

Pulled By: ZolotukhinM

fbshipit-source-id: 0549f0ae6b46a93ff7608a22e79faa5115eef661
2021-01-27 16:14:18 -08:00
Mikhail Zolotukhin
42aeb68128 [TensorExpr] Move 'initializer' field from 'Tensor' to 'Buf'. (#50993)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50993

This is the first step to making 'Tensor' a thin wrapper over 'Buf' and
'Stmt', which will be finished in subsequent PRs. This change also
allows us to remove 'buf_initializers_' from 'LoopNest', making it "less
stateful".

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D26038224

Pulled By: ZolotukhinM

fbshipit-source-id: f418816e54c62f291fa45812901487394e9b95b5
2021-01-27 16:10:53 -08:00
Mikhail Zolotukhin
5f07b53ec2 [TensorExpr] Add LoopNest::simplify. (#50850)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50850

Test Plan: Imported from OSS

Reviewed By: Chillee

Differential Revision: D25985085

Pulled By: ZolotukhinM

fbshipit-source-id: e51709423c2c12b37b449a9d7bb22be04cda7ef1
2021-01-22 08:43:34 -08:00