Commit Graph

82 Commits

Raghavan Raman
3fe72d30dc [NNC] Optimize conditionals that correspond to the form generated for aten::cat op. (#57673)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57673

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D28231374

Pulled By: navahgar

fbshipit-source-id: 1777a63df4e5ebed6d515683bd772a88be465b3a
2021-05-18 14:23:48 -07:00
CodemodService FBSourceClangFormatLinterBot
cbfce376a8 [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D28319469

fbshipit-source-id: 8295597a8ee16b2fef3f7aacdd6c892cb22db988
2021-05-10 03:39:31 -07:00
Nikita Shulga
3a66a1cb99 [clang-tidy] Exclude cppcoreguidelines-avoid-magic-numbers (#57841)
Summary:
Add a cppcoreguidelines-avoid-magic-numbers exclusion to clang-tidy.
Remove existing NOLINT suppressions using the following script:
```
for file in `git ls-files | grep -v \.py`; do gsed '/^ *\/\/ NOLINTNEXTLINE(cppcoreguidelines-avoid-magic-numbers)/d' -i  $file; done
```
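
For reference, the suppressions being deleted look like this (a representative, hypothetical snippet; the script removes only the comment line and leaves the code itself untouched):
```
// NOLINTNEXTLINE(cppcoreguidelines-avoid-magic-numbers)
constexpr int kDefaultBlockSize = 256;
```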

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57841

Reviewed By: samestep

Differential Revision: D28295045

Pulled By: malfet

fbshipit-source-id: 7c6e8d1213c9593f169ed3df6a916498f1a97163
2021-05-07 20:02:33 -07:00
Raghavan Raman
e795f88d6b [NNC] Make flatten transform in-place (#56629)
Summary:
Partial fix for https://github.com/pytorch/pytorch/issues/56157

This PR updates the `flatten` API in `LoopNest` to perform the flattening transformation in-place. After this transformation, the first loop in the input becomes the flattened loop.
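
A rough before/after sketch of the transformation (the loop nest, bounds, and variable names here are hypothetical):
```
// Before LoopNest::flatten:
for (int i = 0; i < 20; i++) {
  for (int j = 0; j < 30; j++) {
    A[i, j] = i * j;
  }
}
// After flattening in-place, the first loop becomes the flattened loop:
for (int i_flat = 0; i_flat < 600; i_flat++) {
  A[i_flat / 30, i_flat % 30] = (i_flat / 30) * (i_flat % 30);
}
```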

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56629

Reviewed By: H-Huang

Differential Revision: D28004787

Pulled By: navahgar

fbshipit-source-id: 7474ae237fae3fff0cd1c64a276a8831dc5b7db0
2021-04-30 09:51:45 -07:00
Nikita Shulga
4cb534f92e Make PyTorch code-base clang-tidy compliant (#56892)
Summary:
This is an automatic change generated by the following script:
```
#!/usr/bin/env python3
from subprocess import check_output, check_call
import os

def get_compiled_files_list():
    import json
    with open("build/compile_commands.json") as f:
        data = json.load(f)
    files = [os.path.relpath(node['file']) for node in data]
    for idx, fname in enumerate(files):
        if fname.startswith('build/') and fname.endswith('.DEFAULT.cpp'):
            files[idx] = fname[len('build/'):-len('.DEFAULT.cpp')]
    return files

def run_clang_tidy(fname):
    check_call(["python3", "tools/clang_tidy.py", "-c", "build", "-x", fname,"-s"])
    changes = check_output(["git", "ls-files", "-m"])
    if len(changes) == 0:
        return
    check_call(["git", "commit","--all", "-m", f"NOLINT stubs for {fname}"])

def main():
    git_files = check_output(["git", "ls-files"]).decode("ascii").split("\n")
    compiled_files = get_compiled_files_list()
    for idx, fname in enumerate(git_files):
        if fname not in compiled_files:
            continue
        if fname.startswith("caffe2/contrib/aten/"):
            continue
        print(f"[{idx}/{len(git_files)}] Processing {fname}")
        run_clang_tidy(fname)

if __name__ == "__main__":
    main()
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56892

Reviewed By: H-Huang

Differential Revision: D27991944

Pulled By: malfet

fbshipit-source-id: 5415e1eb2c1b34319a4f03024bfaa087007d7179
2021-04-28 14:10:25 -07:00
Raghavan Raman
5b7317b562 [NNC] API for Buffer Compression (#55853)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/54338

This PR adds the following API in NNC to implement "buffer compression".

```
static void compressBuffer(Buf* buf, Stmt* stmt);
```
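
A minimal sketch of what buffer compression does, assuming each element of `A` is produced and consumed within the same iteration of the outer loop (the loop nest and dimensions are hypothetical):
```
// Before compressBuffer(A, stmt): A keeps its full shape.
Allocate(A, float, {100, 200});
for (int i = 0; i < 100; i++) {
  for (int j = 0; j < 200; j++) {
    A[i, j] = i * j;
  }
  for (int j = 0; j < 200; j++) {
    B[i, j] = (A[i, j]) + 1;
  }
}
// After: no value of A is live across iterations of i,
// so the first dimension can be compressed to 1,
// with accesses rewritten to A[0, j].
Allocate(A, float, {1, 200});
```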

Pull Request resolved: https://github.com/pytorch/pytorch/pull/55853

Reviewed By: ezyang

Differential Revision: D27960986

Pulled By: navahgar

fbshipit-source-id: a69988e607196f3e2db0212313ea5deefb9859ac
2021-04-23 14:12:03 -07:00
Raghavan Raman
d43d6593cd [NNC] Handling conditionals in reorderAxis (#56063)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/53093

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56063

Reviewed By: huiguoo

Differential Revision: D27894772

Pulled By: navahgar

fbshipit-source-id: 403b65f20567c27eab73faf670087cfab9885f84
2021-04-21 09:35:17 -07:00
Raghavan Raman
13ac0019ae [NNC] Update loop-carried dependence check to handle all known dependences (#56354)
Summary:
This PR includes:
 * An update to the loop-carried dependence check API to correctly ignore loop-independent dependences and handle all kinds of loop-carried dependences: RAW, WAR, and WAW (illustrated below).
 * A fix for the overlap API to look only for conflicting buffer accesses where at least one of them is a Store.
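
For illustration, minimal hypothetical examples of the three kinds of loop-carried dependences:
```
for (int i = 1; i < N; i++) {
  A[i] = (A[i - 1]) + 1;  // RAW: reads the value stored by the previous iteration
}
for (int i = 0; i < N - 1; i++) {
  B[i] = A[i + 1];        // WAR: iteration i reads A[i + 1], ...
  A[i] = 0;               // ... which iteration i + 1 overwrites
}
for (int i = 0; i < N; i++) {
  C[0] = i;               // WAW: every iteration stores to the same element
}
```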

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56354

Reviewed By: bertmaher

Differential Revision: D27856202

Pulled By: navahgar

fbshipit-source-id: 206e4ec771fe0f7f2ccf4b11b29e35df7b9b18bc
2021-04-20 17:12:51 -07:00
Raghavan Raman
0d94c04247 [NNC] Change fuseLoops API to return bool flag and not throw any exceptions (#56353)
Summary:
Partial fix for https://github.com/pytorch/pytorch/issues/56357

Changes the `fuseLoops` API to the following form:
```
static bool fuseLoops(const std::vector<For*>& loops, For** fused);
```

Also, adds a new API to check for loop-carried dependences:
```
static bool hasLoopCarriedDependence(For* loop);
```
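
A hypothetical usage sketch of the new form (the loop handles are illustrative):
```
For* fused = nullptr;
if (LoopNest::fuseLoops({loop_j, loop_k}, &fused)) {
  // Success: `fused` points to the single fused loop.
} else {
  // Fusion was not legal (e.g. it would violate a dependence);
  // no exception is thrown and the loops are left unchanged.
}
```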

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56353

Reviewed By: bertmaher

Differential Revision: D27856214

Pulled By: navahgar

fbshipit-source-id: 443557088692585657faee296602c547a00117dd
2021-04-19 17:20:40 -07:00
Raghavan Raman
b387f7ca47 [NNC] Make normalization transformation in-place (#56158)
Summary:
Partially fixes https://github.com/pytorch/pytorch/issues/56157

This PR changes the `normalize` API in `LoopNest` to transform the given `For` statement in-place rather than create a new one.

New API:

```
static bool normalize(For* f);
```
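
A minimal before/after sketch (the loop and bounds are hypothetical):
```
// Before normalize(f):
for (int i = 50; i < 100; i++) {
  A[i] = B[i];
}
// After: the same For statement now starts at 0, with accesses shifted.
for (int i = 0; i < 50; i++) {
  A[i + 50] = B[i + 50];
}
```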

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56158

Reviewed By: agolynski

Differential Revision: D27798361

Pulled By: navahgar

fbshipit-source-id: 57626a5a367bdf94a0efbd9dc8538f5e4e410d6b
2021-04-18 23:54:13 -07:00
Raghavan Raman
29c5cb797d [NNC] Fuse loops that have the same bounds as expressions (#55997)
Summary:
This PR allows fusing loops whose bounds are specified as expressions that are equal.

For example:
```
   for (int j = 0; j < M + N; j++) {
     A[j] = 10 * j;
   }
   for (int k = 0; k < M + N; k++) {
     B[k] = 20 * k;
   }
```
`fuseLoops(j, k)` is possible here since the stop bounds of the two loops are equal, even though they are distinct `Expr*` objects, and will result in:
```
   for (int j = 0; j < M + N; j++) {
     A[j] = 10 * j;
     B[j] = 20 * j;
   }
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/55997

Reviewed By: bertmaher

Differential Revision: D27841270

Pulled By: navahgar

fbshipit-source-id: a64e4503b7f8f28bc0c9823225bc923177bb4c2e
2021-04-18 11:14:26 -07:00
Mikhail Zolotukhin
556dfcb0db [TensorExpr] Re-enable "LoopNest.VectorizeUse" test. (#56094)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56094

Now that FunctionCalls are merged with Loads, vectorization for
intermediate values automatically works.

Fixes #53553.

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D27781519

Pulled By: ZolotukhinM

fbshipit-source-id: 1ed68ca2399e9bd4598639bd6dd8f369365f0ef0
2021-04-14 21:39:03 -07:00
Mikhail Zolotukhin
7ab654afd7 [TensorExpr] Rename Tensor::call to Tensor::load to be consistent with Buf and Placeholder. (#55826)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55826

It's a mechanical change.

Differential Revision: D27717777

Test Plan: Imported from OSS

Reviewed By: navahgar

Pulled By: ZolotukhinM

fbshipit-source-id: fbc1bb99602250c706cf2c8c2684119c323e4d51
2021-04-13 12:08:53 -07:00
Mikhail Zolotukhin
1263448cb2 [TensorExpr] Remove mask field from Load and Store classes. (#55825)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55825

The mask was never used (in vectorization we generate an explicit
`IfThenElse` construct when we need to mask out some elements). This PR
removes it and cleans up all traces of it from the tests.

Differential Revision: D27717776

Test Plan: Imported from OSS

Reviewed By: navahgar

Pulled By: ZolotukhinM

fbshipit-source-id: 41d1feeea4322da75b3999d661801c2a7f82b9db
2021-04-13 12:08:51 -07:00
Raghavan Raman
d805908c34 [NNC] API to reorder multiple loops (#55568)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/52690

This PR adds the following APIs:

```
static bool areLoopsPerfectlyNested(const std::vector<For*>& loops);

static std::vector<For*> reorder(
      const std::vector<For*>& loops,
      const std::vector<size_t>& permutation);
```

The first API checks whether the given list of loops is perfectly nested. The second API reorders the given loops according to the specified permutation.
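
A hypothetical usage sketch (the loop handles and permutation are illustrative):
```
// Reorder a perfectly nested (i, j, k) nest into (k, i, j).
std::vector<For*> loops = {i_loop, j_loop, k_loop};
if (LoopNest::areLoopsPerfectlyNested(loops)) {
  std::vector<For*> reordered = LoopNest::reorder(loops, {2, 0, 1});
}
```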

Pull Request resolved: https://github.com/pytorch/pytorch/pull/55568

Reviewed By: albanD

Differential Revision: D27689734

Pulled By: navahgar

fbshipit-source-id: dc1bffdbee068c3f401188035772b41847cbc7c6
2021-04-12 18:12:24 -07:00
Jeffrey Wan
3f9492c8b3 [Hackathon] Modernize API used in NNC C++ tests (1/3) (#55512)
Summary:
Partially fixes https://github.com/pytorch/pytorch/issues/55203

Fixes issues (1) and (2) in the following tests:
tests in test/cpp/tensorexpr/test_loopnest.cpp from the beginning to LoopNestReorderLongStringFull (including)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/55512

Reviewed By: mrshenli

Differential Revision: D27630679

Pulled By: soulitzer

fbshipit-source-id: b581aaea4f5f54b3285f0348aa76e99779418f80
2021-04-08 08:34:25 -07:00
Brian Hirsh
dd2bccafc5 nnc hackathon - use new APIs in tests (#55497)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55497

Migrating some of the NNC API's used in testing, from this issue: https://github.com/pytorch/pytorch/issues/55203

I covered the second half of `test_loopnest.cpp`, and migrated (1) and (2) in the above issue: `LoopNest::getLoopStmtsFor`, `splitWithTail`, and `splitWithMask`

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D27628625

Pulled By: bdhirsh

fbshipit-source-id: ec15efba45fae0bbb442ac3577fb9ca2f8023c2d
2021-04-07 13:03:25 -07:00
Mikhail Zolotukhin
0b75f862c7 [TensorExpr] Nuke FunctionCall. (#54998)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54998

The only reason we couldn't use Load instead of FunctionCall was
DepTracker. Now that it is gone, we can finally replace FunctionCall
with Load.

Test Plan: Imported from OSS

Reviewed By: bertmaher, pbelevich

Differential Revision: D27446412

Pulled By: ZolotukhinM

fbshipit-source-id: 9183ae5541c2618abc9026b1dc4c4c9fab085d47
2021-04-01 19:47:59 -07:00
Mikhail Zolotukhin
688e350725 [TensorExpr] Nuke DepTracker and findAllNeededTensors. (#54997)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54997

DepTracker was used to automatically pull in dependent computations from
the output ones. While it seemed quite convenient, it led to several
architectural issues, which are fixed in this stack.

DepTracker worked on Tensors, each of which is a pair of a Buf and a
Stmt. However, the Stmt could become stale and there was no way to
reliably update the corresponding Tensor. We now use Bufs and Stmts
directly and are moving away from using Tensors to avoid these problems.

Removing DepTracker allowed us to unify Loads and FunctionCalls, which
were essentially duplicates of each other.

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D27446414

Pulled By: ZolotukhinM

fbshipit-source-id: a2a32749d5b28beed92a601da33d126c0a2cf399
2021-04-01 19:46:26 -07:00
Bert Maher
e4d19798f3 [nnc][tests] Convert a bunch of FileCheck to checkIR
Summary:
I added a helper to convert a Stmt to a string and FileCheck it, so I
started using it in a bunch of places. I replaced about half the current uses,
got tired, started to write a Perl script to automate it, realized that was
hard, and decided to give up for a bit. But this cleans up some of the tests,
so it seems easy to review and worth landing.

Test Plan: test_tensorexpr --gtest_filter=LoopNest.*

Reviewed By: navahgar

Differential Revision: D27375866

fbshipit-source-id: 15894b9089dec5cf25f340fe17e6e54546a64257
2021-03-26 20:27:50 -07:00
Bert Maher
24f589df44 [nnc] Disabled test case for failure in implementing conv1d (#54756)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54756

We have multiple bugs here, one relating to index flattening and the
other to computeAt.
ghstack-source-id: 125054729

Test Plan: yikes

Reviewed By: ZolotukhinM

Differential Revision: D27354082

fbshipit-source-id: 8b15bac28e3eba4629881ae0f3bd143636f65ad7
2021-03-26 20:27:48 -07:00
Bert Maher
e542e67253 [nnc] Test case for computeAt with reduction (#54755)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54755

As title.  A step on the way to using computeAt to optimize
convolution.
ghstack-source-id: 125054730

Test Plan: new test

Reviewed By: ZolotukhinM

Differential Revision: D27353663

fbshipit-source-id: 930e09d96d1f74169bf148cd30fc195c6759a3e9
2021-03-26 20:25:18 -07:00
Raghavan Raman
601e79200d [NNC] Implementing LoopFusion (#54461)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/54337

This PR adds a new API to NNC to perform loop fusion.

```
static For* fuseLoops(const std::vector<For*>& loops);
```

Loop fusion is done only when all the conditions below are satisfied.
  * All the loops have the same parent.
  * There are no statements between these loops in their parent body.
  * The start bounds are the same for all loops.
  * The stop bounds are the same for all loops.
  * Fusing the loops does not violate or add any dependencies.
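
A minimal sketch of the transformation when those conditions hold (hypothetical loops):
```
// Before fuseLoops({i_loop, j_loop}):
for (int i = 0; i < 100; i++) {
  A[i] = 10 * i;
}
for (int j = 0; j < 100; j++) {
  B[j] = 20 * j;
}
// After: a single loop containing both bodies.
for (int i = 0; i < 100; i++) {
  A[i] = 10 * i;
  B[i] = 20 * i;
}
```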

This PR also adds an API to check for partial overlaps in `buffer_inference.h` and fixes a bug in `mem_dependency_checker.cpp`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/54461

Reviewed By: bertmaher

Differential Revision: D27254888

Pulled By: navahgar

fbshipit-source-id: c21b027d738e5022e9cb88f6f72cd9e255bdb15e
2021-03-23 21:20:00 -07:00
Raghavan Raman
4b2abc4b8e [NNC] Adding API to distribute loops (#53865)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/53864

This PR adds the following APIs that perform loop distribution to `LoopNest`:
```
static std::vector<For*> distributeLoop(For* loop, const std::unordered_set<Stmt*>& pivots);
static std::vector<For*> distributeLoop(For* loop);
static std::vector<For*> distributeLoopOverInnerLoops(For* loop);
```

* The first method distributes the given loop over its body by splitting after every given pivot stmt.
* The second method distributes the given loop over every stmt in its body.
* The last method distributes the given loop over its body by splitting after every `For` stmt in its body.
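
A minimal sketch of the second form, `distributeLoop(loop)` (the loop and statements are hypothetical):
```
// Before:
for (int i = 0; i < 100; i++) {
  A[i] = i * i;
  B[i] = A[i];
}
// After: the loop is distributed over every stmt in its body.
for (int i = 0; i < 100; i++) {
  A[i] = i * i;
}
for (int i = 0; i < 100; i++) {
  B[i] = A[i];
}
```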

Pull Request resolved: https://github.com/pytorch/pytorch/pull/53865

Reviewed By: mruberry

Differential Revision: D27075006

Pulled By: navahgar

fbshipit-source-id: 031746aad619fe84c109e78b53387535e7f77cef
2021-03-18 07:27:39 -07:00
Bert Maher
a852fdb6b5 [nnc] Test for using int64 dimensions (#54094)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54094

We should be able to use 64-bit integers for loop boundaries and
buffer/tensor indexing.
ghstack-source-id: 124116846

Test Plan: New tests, disabled

Reviewed By: ZolotukhinM

Differential Revision: D27094934

fbshipit-source-id: a53de21a0ef523ea3560d5dd4707df50624896ef
2021-03-17 10:59:26 -07:00
Raghavan Raman
ef07a04072 [NNC] New APIs to get loops corresponding to a Buf (#53778)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/53092

This PR adds the following APIs to NNC.
```
// In For:
static For* getParentLoop(const Stmt* st);
static std::vector<For*> getEnclosingLoopNest(const Stmt* st);

// In LoopNest:
std::vector<const Stmt*> getAllWritesToBuf(const Buf*) const;
std::vector<For*> getAllInnermostLoopsWritingToBuf(const Buf*) const;
std::vector<std::vector<For*>> getAllLoopNestsWritingToBuf(const Buf*) const;
```

These APIs are required for some use cases that involve multiple transformations, like `splitWithTail` followed by `reorder`, as shown in https://github.com/pytorch/pytorch/issues/53092. A usage sketch follows below.
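
A hypothetical sketch of that pattern (the LoopNest `l` and buffer `buf` are illustrative):
```
// After a transformation such as splitWithTail, previously obtained loop
// pointers may be stale; re-discover the loops via the buffer they write to.
std::vector<std::vector<For*>> nests = l.getAllLoopNestsWritingToBuf(buf);
for (const auto& nest : nests) {
  For* innermost = nest.back();  // innermost loop of one nest writing to `buf`
  // ... apply further transformations, e.g. reorder, to this nest.
}
```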

Pull Request resolved: https://github.com/pytorch/pytorch/pull/53778

Reviewed By: albanD

Differential Revision: D26987013

Pulled By: navahgar

fbshipit-source-id: 491459eddfff045132d2358631ad069bbcc520df
2021-03-12 18:50:15 -08:00
Horace He
d4602b7e45 [NNC] Fixes case where inlining wouldn't work because dim-size was 1. (#53254)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/52581

The git diff is absolutely atrocious since I also refactored the code to share stuff between `Load` and `FunctionCall`.

Biggest questions I have about this diff are:

1. The asserts I added. From my understanding it's not possible to have a constant index in `Store` that's non-zero, since `Store` always creates a new buffer. Perhaps the user can write this kind of incorrect code, though, so perhaps I should just check for it and not assert it?

2. I don't think(?) I need to do any special handling for `index_vars`, but wasn't totally able to track the logic there.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/53254

Reviewed By: albanD

Differential Revision: D26991064

Pulled By: Chillee

fbshipit-source-id: 0bcd612d5f4b031c0b34e68a72d9c8d12d118be8
2021-03-11 20:53:20 -08:00
Bert Maher
3bd250fd03 [nnc] Test ability to vectorize reads from an intermediate tensor (#53752)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53752

This test doesn't work today because we don't properly vectorize
"FunctionCall" (which is the way one accesses an intermediate tensor).
ghstack-source-id: 123592860

Test Plan: `buck test //caffe2/test/cpp/tensorexpr -- LoopNest.VectorizeUse`

Reviewed By: ZolotukhinM

Differential Revision: D26895550

fbshipit-source-id: 0798ebf3e6a834bd70181732c81528455d5329fa
2021-03-10 20:32:10 -08:00
Bert Maher
565d8235e5 [nnc] Test cases for uneven split + reorder (#53091)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53091

Split with tail followed by reorder causes a segfault in NNC.
Split with mask followed by reorder generates invalid code that writes out of
bounds.
ghstack-source-id: 122870733

Test Plan: LoopNest.ColReduceSplit*

Reviewed By: navahgar

Differential Revision: D26746254

fbshipit-source-id: f8a0de18531b34d2bf06ccaa35d9c98b81b5c600
2021-03-02 20:36:48 -08:00
Mikhail Zolotukhin
aba33b0042 [TensorExpr] IRVerifier: add index verifier for Store. (#53137)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53137

Also, add casting to Int for Load and Store indices.

Fixes #52773.

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D26760256

Pulled By: ZolotukhinM

fbshipit-source-id: a2d3141b17584724a5feabcabec25d0577b83a30
2021-03-02 19:56:28 -08:00
Raghavan Raman
aae188c529 [NNC] Handle non literal constant bounds in Unroll. (#53029)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/53000

Also added test to confirm this case works in FlattenLoop as well.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/53029

Reviewed By: bertmaher

Differential Revision: D26742705

Pulled By: navahgar

fbshipit-source-id: d87a0f9698411026b5b6e55eee7c2b9fb123d06b
2021-03-02 00:35:27 -08:00
Mikhail Zolotukhin
d3b427a0e3 [TensorExpr] Add an unmasked Load constructor. (#52790)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52790

Fixes #52774.

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D26649542

Pulled By: ZolotukhinM

fbshipit-source-id: ab1c9e55f52e59d0bd00fbde2ec3125f8c7917ee
2021-02-24 22:45:29 -08:00
Mikhail Zolotukhin
88a160dc21 [TensorExpr] LoopNest: Cleanup LoopNest constructors. (#52726)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52726

This change removes `input_bufs_` and `intermediate_bufs_` from the
`LoopNest` class, as they can be deduced from the root stmt and the list
of output bufs. As a result, the constructor of the LoopNest also becomes
simpler, as we now need to pass just one list of bufs.

Note: we might consider passing a list of input bufs for verification
purposes (only input buffers are allowed to lack a definition), but
since we don't really have an IR verifier yet, there is no need for it
now. Once we add an IR verifier, we can reconsider this.

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D26629596

Pulled By: ZolotukhinM

fbshipit-source-id: 81f544e9602b6855b7968d540b9ae06bd7c7e6d8
2021-02-24 13:26:22 -08:00
Mikhail Zolotukhin
b63a1e31d3 [TensorExpr] Inlining: allow inlining into Load exprs. (#52627)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52627

Currently the inliner only inlines into Calls; this PR extends it to
cover Loads too. Eventually we will remove Calls altogether and use
Loads everywhere; this is one step in that direction.

Differential Revision: D26589377

Test Plan: Imported from OSS

Reviewed By: asuhan

Pulled By: ZolotukhinM

fbshipit-source-id: ca28f0df2273eb214f203467c6ba3d8f02a8a3b6
2021-02-22 21:47:24 -08:00
Hui Guo
973e306c84 changed TE 'Allocate' API to take one argument 'Buf' instead of three arguments 'Var', 'dtype', 'dims'. (#50167)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50167

Test Plan:
Imported from OSS

`python test/test_jit_fuser_te.py`
`python test/test_jit_fuser_legacy.py`
`python test/test_jit_fuser.py`
`build/bin/test_tensorexpr`

Reviewed By: ZolotukhinM

Differential Revision: D25814342

Pulled By: huiguoo

fbshipit-source-id: 44cba7f92365b826c9cb1d385a94858934570dee
2021-02-22 15:08:51 -08:00
Mikhail Zolotukhin
e975169426 [TensorExpr] Redesign Tensor class. (#50995)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50995

This change makes 'Tensor' a thin wrapper over 'Buf' and 'Stmt', and
merges it with the recently introduced 'CompoundTensor'. A statement for
the tensor is either passed directly to the Tensor constructor (akin to
'CompoundTensor') or is built immediately in the constructor.

LoopNest is no longer responsible for constructing statements from
tensors - it simply stitches together already constructed statements
contained in Tensors. As a side effect, we can no longer construct
several loopnests from the same tensors - we need to explicitly clone
statements if we want to do that. A special copy constructor was added
to LoopNest to make this more convenient (note: this only affects tests;
we don't usually create multiple loopnests in other places).

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D26038223

Pulled By: ZolotukhinM

fbshipit-source-id: 27a2e5900437cfb0c151e8f89815edec53608e17
2021-01-27 16:14:22 -08:00
Mikhail Zolotukhin
42aeb68128 [TensorExpr] Move 'initializer' field from 'Tensor' to 'Buf'. (#50993)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50993

This is the first step to making 'Tensor' a thin wrapper over 'Buf' and
'Stmt', which will be finished in subsequent PRs. This change also
allows removing 'buf_initializers_' from 'LoopNest', making it "less
stateful".

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D26038224

Pulled By: ZolotukhinM

fbshipit-source-id: f418816e54c62f291fa45812901487394e9b95b5
2021-01-27 16:10:53 -08:00
Andres Suarez
8530c65e25 [codemod][fbcode/caffe2] Apply clang-format update fixes
Test Plan: Sandcastle and visual inspection.

Reviewed By: igorsugak

Differential Revision: D25849205

fbshipit-source-id: ef664c1ad4b3ee92d5c020a5511b4ef9837a09a0
2021-01-09 14:37:36 -08:00
Elias Ellison
efe1fc21fc Don't inline intermediates on cpu (#49565)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49565

Test Plan: Imported from OSS

Reviewed By: Krovatkin, ZolotukhinM

Differential Revision: D25688271

Pulled By: eellison

fbshipit-source-id: 9ea7858e2db4fb31292e04440fc72ee04623c688
2021-01-04 15:46:20 -08:00
Mikhail Zolotukhin
a5b27d7a31 [TensorExpr] Move SimpleIREval implementation from .h to .cpp. (#49697)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49697

A mostly mechanical move. This refactoring helps hide unnecessary
details from the SimpleIREval interface and makes it more similar to a
pure 'codegen'.

Test Plan: Imported from OSS

Reviewed By: nickgg

Differential Revision: D25668696

Pulled By: ZolotukhinM

fbshipit-source-id: 423247bfcdfa88403e8ec92152f00110bb9da19c
2020-12-21 20:20:15 -08:00
Nick Gibson
db2e9c1e7f [NNC] Intermediate allocs flattened and dependency support (#49554)
Summary:
Makes two changes in NNC for intermediate buffer allocations:
1. Flattens dimensions of buffers allocated in LoopNest::prepareForCodegen() to match their flattened usages.
2. Adds support for tracking memory dependencies of Alloc/Free to the MemDependencyChecker, which will allow us to check safety of accesses to intermediate buffers (coming in a future diff).
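
A rough sketch of change (1), with hypothetical dimensions:
```
// Before prepareForCodegen(): the intermediate keeps its logical shape.
Allocate(tmp, float, {20, 30});
// ... tmp[i, j] ...
// After: a single flattened dimension, matching the flattened accesses.
Allocate(tmp, float, {600});
// ... tmp[i * 30 + j] ...
```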

I didn't add any new tests as the mem dependency checker tests already cover it pretty well, particularly the GEMM test.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49554

Reviewed By: VitalyFedyunin

Differential Revision: D25643133

Pulled By: nickgg

fbshipit-source-id: 66be3054eb36f0a4279d0c36562e63aa2dae371c
2020-12-21 10:35:15 -08:00
Elias Ellison
9056173acc [NNC] Don't inline output buffers on cpu (#49488)
Summary:
In https://github.com/pytorch/pytorch/pull/48967/ we enabled output buffer inlining, which results in duplicate computation if one output depends on another. This was done to fix correctness for CUDA, but it is not needed for correctness on CPU and results in a perf slowdown there.

The output buffer inlining solution for CUDA is intended to be an interim solution because it does not work with reductions.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49488

Reviewed By: ezyang

Differential Revision: D25596071

Pulled By: eellison

fbshipit-source-id: bc3d987645da5ce3c603b4abac3586b169656cfd
2020-12-16 16:28:25 -08:00
Nick Gibson
5469aa5e7f [NNC] Add a non functional Tensor kind (#48750)
Summary:
Adds the CompoundTensor, a specialisation of the NNC Tensor which allows arbitrary production statements. This will allow lowering of aten ops into specific NNC IR patterns (which don't need to be functional) - allowing us to shortcut to the optimized form of common patterns.

This is part 1 of trying to clean up the lowering of aten::cat so it is easier to optimize.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48750

Reviewed By: tugsbayasgalan

Differential Revision: D25433517

Pulled By: nickgg

fbshipit-source-id: de13c4719f8f87619ab254e5f324f13b5be1c9da
2020-12-10 19:43:50 -08:00
Nick Gibson
c5bc6b40ab [NNC] Dead Store Elimination (#49030)
Summary:
Adds a new optimization method to LoopNest which eliminates stores that do not contribute to any output. It's unlikely any of the lowerings of aten operators produce these stores yet, but this creates some wiggle room for transformations in the future.
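
A minimal hypothetical example of a store that this pass would remove:
```
for (int i = 0; i < 100; i++) {
  tmp[i] = A[i] * 2;  // `tmp` is never read and is not an output: dead store
  B[i] = A[i] + 1;    // contributes to output `B`, so it is kept
}
```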

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49030

Reviewed By: tugsbayasgalan

Differential Revision: D25434538

Pulled By: nickgg

fbshipit-source-id: fa1ead82e6f7440cc783c6116b23d0b7a5b5db4b
2020-12-09 18:49:53 -08:00
Mikhail Zolotukhin
2b70bcd014 [TensorExpr] Enable inlining for output tensors too. (#48967)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48967

We previously didn't inline output tensors, which resulted in correctness
issues like #48533. This PR allows inlining for output tensors too -
this could result in duplicated computation, but we can address that
later once correctness is ensured.

Performance results on FastRNNS:
Before the fix:
```
Benchmarking LSTMs...
            name          avg_fwd          std_fwd          avg_bwd          std_bwd
           cudnn            10.09          0.05431            17.55           0.2108
            aten            21.52           0.1276             26.7            1.471
             jit            13.25           0.8748            22.47             1.73
      jit_premul            11.43           0.3226            19.43            2.231
 jit_premul_bias            11.84           0.2245            20.33            2.205
      jit_simple            13.27           0.9906            22.15           0.9724
  jit_multilayer            13.38           0.8748            22.82             1.01
              py            33.55            4.837            46.41            6.333
```
After the fix:
```
Benchmarking LSTMs...
            name          avg_fwd          std_fwd          avg_bwd          std_bwd
           cudnn            10.09          0.05979            17.45           0.1987
            aten            21.21            0.144            26.43           0.7356
             jit            13.01           0.2925            23.21           0.8454
      jit_premul             11.4           0.3905            19.62            2.448
 jit_premul_bias            11.85           0.2461            20.29           0.6592
      jit_simple            13.08           0.8533            22.81            1.315
  jit_multilayer            12.93           0.1095            23.57            1.459
              py            31.21            2.783            44.63            6.073
```

Differential Revision: D25383949

Test Plan: Imported from OSS

Reviewed By: SplitInfinity

Pulled By: ZolotukhinM

fbshipit-source-id: 16f5727475109a278499bef7905f6aad18c8527a
2020-12-08 13:24:40 -08:00
Bert Maher
07657b6001 [tensorexpr] Switch cpp tests to pure gtest (#48160)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48160

We no longer use the custom c++ test infra anyways, so move to pure
gtest.

Fixes #45703
ghstack-source-id: 116977283

Test Plan: `buck test //caffe2/test/cpp/tensorexpr`

Reviewed By: navahgar, nickgg

Differential Revision: D25046618

fbshipit-source-id: da34183d87465f410379048148c28e1623618553
2020-11-18 12:23:34 -08:00
Raghavan Raman
fa108bd264 Add flatten loops transformation (#46365)
Summary:
This diff removes the dependency of flattening on tensors by performing flattening on loops instead.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46365

Reviewed By: ailzhang

Differential Revision: D24366347

Pulled By: navahgar

fbshipit-source-id: 4ba182f37212b6e4033cae13f8e75bc5144389f4
2020-10-16 17:05:26 -07:00
Nick Gibson
402abdfdf4 [NNC] cacheAccesses transform (cache_reads + cache_writes) (#45869)
Summary:
Adds a new transform to the NNC compiler that supports buffer access caching. All accesses within a provided scope are redirected to a cache, which is initialized or written back as necessary at the boundaries of that scope. For TVM fans, this is essentially a combination of cache_reads and cache_writes. E.g. it can do this kind of thing:

Before:
```
for (int i = 0; i < 64; i++) {
  for (int j = 0; j < 64; j++) {
    A[i, j] = i * j;
  }
}
for (int i_1 = 0; i_1 < 20; i_1++) {
  for (int j_1 = 0; j_1 < 10; j_1++) {
    B[i_1, j_1] = (A(i_1 + 30, j_1 + 40)) + (A(i_1 + 31, j_1 + 41));
  }
}
```

After `cacheAccesses(A->buf(), "A_local", j_loop);`

```
for (int i = 0; i < 64; i++) {
  for (int j = 0; j < 64; j++) {
    A[i, j] = i * j;
  }
}
for (int i_1 = 0; i_1 < 20; i_1++) {
  for (int i_2 = 0; i_2 < 2; i_2++) {
    for (int j_1 = 0; j_1 < 11; j_1++) {
      A_local[i_2, j_1] = A[(i_2 + i_1) + 30, j_1 + 40];
    }
  }
  for (int j_2 = 0; j_2 < 10; j_2++) {
    B[i_1, j_2] = (A_local[1, j_2 + 1]) + (A_local[0, j_2]);
  }
}
```

Or this reduction:
```
for (int l1 = 0; l1 < 4; l1++) {
  sum[l1] = 0.f;
  for (int n1_1 = 0; n1_1 < 3; n1_1++) {
    for (int m1_1 = 0; m1_1 < 2; m1_1++) {
      sum[l1] = (sum[l1]) + (scale[(6 * l1 + 2 * n1_1) + m1_1]);
    }
  }
}
```

After `l.cacheAccesses(d->buf(), "d_local", n_loop);`:

```
for (int l1 = 0; l1 < 4; l1++) {
  Allocate(d_local, float, {1});
  sum[l1] = 0.f;
  d_local[0] = 0.f;
  for (int n1_1 = 0; n1_1 < 3; n1_1++) {
    for (int m1_1 = 0; m1_1 < 2; m1_1++) {
      d_local[0] = (d_local[0]) + (scale[(6 * l1 + 2 * n1_1) + m1_1]);
    }
  }
  sum[l1] = (sum[l1]) + (d_local[0]);
  Free(d_local);
}
```

I had originally planned to write `cacheReads` and `cacheWrites` wrappers so we could use them just like their TVM cousins, but they just ended up being big masses of checking that reads or writes weren't present. Didn't feel too useful so I removed them, but let me know.

This is based on bounds inference and inherits a few bugs present in that functionality, which I will address in a followup.

While working on this I realized that it overlaps heavily with `computeAt`, which is really just `cacheReads` + `computeInline`. I'm considering refactoring computeAt to be a wrapper around those two transforms. ZolotukhinM, opinions on this?

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45869

Reviewed By: mruberry

Differential Revision: D24195276

Pulled By: nickgg

fbshipit-source-id: 36a58ae265f346903187ebc4923637b628048155
2020-10-08 14:13:28 -07:00
Mikhail Zolotukhin
4aca63d38a [TensorExpr] Change API for creating Load and Store expressions. (#45520)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45520

With this change `Load`s and `Store`s no longer accept `Placeholder`s in
their constructors and `::make` functions, and can only be built with a
`Buf`.
`Placeholder` gets its own `store`, `load`, `storeWithMask`, and
`loadWithMask` methods for more convenient construction.

Test Plan: Imported from OSS

Reviewed By: glaringlee

Differential Revision: D23998789

Pulled By: ZolotukhinM

fbshipit-source-id: 3fe018e00c1529a563553b2b215f403b34aea912
2020-09-29 20:52:38 -07:00
Mikhail Zolotukhin
3c33695a6d [TensorExpr] Rename Buffer to Placeholder. (#45389)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45389

Differential Revision: D23952866

Test Plan: Imported from OSS

Reviewed By: nickgg

Pulled By: ZolotukhinM

fbshipit-source-id: 17eedd3ac17897501403482ac1866c569d247c75
2020-09-29 01:21:54 -07:00