Commit Graph

46 Commits

Author SHA1 Message Date
Xiaoqiang Zheng
9f86b656ba Resubmit: Adding parallel support for the LLVM backend. (#54122)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/54122

Test Plan:
* USE_TBB=1 ATEN_THREADING=TBB python setup.py develop --cmake
* USE_TBB=1 ATEN_THREADING=NATIVE python setup.py develop --cmake
* USE_TBB=1 ATEN_THREADING=OMP python setup.py develop --cmake
* cd build; ninja bin/tensorexpr_bench
* bin/test_tensorexpr --gtest_filter="*Parallel*"

Reviewed By: bertmaher

Differential Revision: D27109802

Pulled By: zheng-xq

fbshipit-source-id: db159466d0b46357bcf0fbefb36094bee312368c
2021-03-18 07:19:37 -07:00
Nikita Shulga
d57ae6c46d Revert D26906509: Adding parallel support for the LLVM backend.
Test Plan: revert-hammer

Differential Revision:
D26906509 (95d2318510)

Original commit changeset: 12c17f2f21af

fbshipit-source-id: cc86d0dfca0dd791b31bda23a0172fc1cfd89760
2021-03-11 17:54:47 -08:00
Xiaoqiang Zheng
95d2318510 Adding parallel support for the LLVM backend. (#53243)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/53243

Test Plan: Imported from OSS

Reviewed By: bertmaher, Chillee

Differential Revision: D26906509

Pulled By: zheng-xq

fbshipit-source-id: 12c17f2f21af11e73fa4c5b5199043a7a15ecdec
2021-03-11 03:27:37 -08:00
Bert Maher
8ba7c4918a [nnc] Test for direct usage of ramp/broadcast
Summary:
I was attempting to experiment with "manual" vectorization, and boy
was it hard.  I finally came up with this, which I want to write down as a test
case.  Eventually the APIs should make this easier...

Test Plan: buck test

Reviewed By: navahgar

Differential Revision: D26631189

fbshipit-source-id: c28794b25d7852890ea843fdbcaf8751648258c0
2021-02-25 15:02:20 -08:00
Bert Maher
74082f0d6f [te][llvm] Generate arithmetic vs logical right shift as appropriate (#51749)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51749

Following C++ semantics, we probably want to distinguish when
it's appropriate to do an arithmetic vs. a logical right shift.

> For negative a, the value of a >> b is implementation-defined (in most
> implementations, this performs arithmetic right shift, so that the result
> remains negative).

If you look at what clang does, if `a` is unsigned, a logical shift is
generated; if signed, an arithmetic shift.  Let's do the same here.  This turns
out to be useful for, e.g., implementing transcendental function
approximations.
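
To illustrate the semantics being mirrored (LLVM IR has distinct `ashr` and
`lshr` instructions for exactly this reason), a minimal C++ sketch:

```
#include <cstdint>
#include <cstdio>

int main() {
  int8_t s = -16;   // signed Char, bit pattern 0xF0
  uint8_t u = 0xF0; // unsigned Byte, same bit pattern
  // Signed: arithmetic shift on mainstream targets, sign bit replicated.
  printf("%d\n", s >> 2); // -4
  // Unsigned: logical shift, zeros shifted in.
  printf("%d\n", u >> 2); // 60
  return 0;
}
```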
ghstack-source-id: 121366317

Test Plan:
Added Byte (unsigned) and Char (signed) right-shift tests to
test_llvm.

Reviewed By: asuhan

Differential Revision: D26245856

fbshipit-source-id: 260ee9bf4b032b9ce216f89acbc273cde0ed688c
2021-02-10 02:05:39 -08:00
Mikhail Zolotukhin
c639513378 [TensorExpr] Resubmit: Introduce ExternalCall nodes to TE IR. (#51594)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51594

ExternalCall nodes represent opaque calls to external functions that fill a
tensor (buffer) with values. They can be used to include operations that are
otherwise not representable in TE, or whose TE representation is currently too
slow.

To make an external function available in NNC as ExternalCall, one needs to
implement a "bridge" function that would take raw (void*) pointers to the data
along with the arrays containing dimension info. This function would then
internally call the desired external function and make sure the results of the
call are correctly placed in the provided raw data buffers.

The reason the PR was previously reverted was that the LLVM-generated
calls to bridge functions were breaking unwind tables. This is now fixed
by requiring bridge functions to never throw and setting the
corresponding attribute in the LLVM-generated code.
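
As a hedged sketch of what such a bridge function looks like (the name and
exact parameter list here are illustrative, not the actual registered
signature):

```
#include <cstdint>

// Hypothetical bridge for an external matmul. Marked noexcept because,
// per this fix, bridge functions must never throw, letting LLVM attach
// the corresponding attribute to the generated call.
extern "C" void nnc_external_matmul(
    int64_t bufs_num,    // number of buffers: output(s) first, then inputs
    void** buf_data,     // raw data pointer of each buffer
    int64_t* buf_ranks,  // rank of each buffer
    int64_t* buf_dims,   // dimensions of all buffers, concatenated
    int8_t* buf_dtypes,  // scalar type tag of each buffer
    int64_t args_num,    // number of extra scalar arguments
    int64_t* extra_args) noexcept {
  // Wrap the raw pointers into real tensors, call the external matmul,
  // and place the result into buf_data[0].
}
```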

Differential Revision: D26213882

Test Plan: Imported from OSS

Reviewed By: pbelevich, ngimel

Pulled By: ZolotukhinM

fbshipit-source-id: db954d8338e2d750c2bf0a41e88e38bd494f2945
2021-02-03 10:22:54 -08:00
Luca Wehrstedt
4f37150f40 Revert D26179083: [TensorExpr] Introduce ExternalCall nodes to TE IR.
Test Plan: revert-hammer

Differential Revision:
D26179083 (f4fc3e3920)

Original commit changeset: 9e44de098ae9

fbshipit-source-id: d15684e04c65c395b4102d4f98a4488482822d1b
2021-02-02 05:29:41 -08:00
Mikhail Zolotukhin
f4fc3e3920 [TensorExpr] Introduce ExternalCall nodes to TE IR. (#51475)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51475

ExternalCall nodes represent opaque calls to external functions that fill a
tensor (buffer) with values. They can be used to include operations that are
otherwise not representable in TE, or whose TE representation is currently too
slow.

To make an external function available in NNC as ExternalCall, one needs to
implement a "bridge" function that would take raw (void*) pointers to the data
along with the arrays containing dimension info. This function would then
internally call the desired external function and make sure the results of the
call are correctly placed in the provided raw data buffers.

Test Plan: Imported from OSS

Reviewed By: pbelevich, Chillee

Differential Revision: D26179083

Pulled By: ZolotukhinM

fbshipit-source-id: 9e44de098ae94d25772cf5e2659d539fa6f3f659
2021-02-02 00:50:46 -08:00
Mikhail Zolotukhin
e975169426 [TensorExpr] Redesign Tensor class. (#50995)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50995

This change makes 'Tensor' a thin wrapper over 'Buf' and 'Stmt', and
merges it with the recently introduced 'CompoundTensor'. A statement for
the tensor is either passed directly to the Tensor constructor (akin to
'CompoundTensor') or built immediately in the constructor.

LoopNest is no longer responsible for constructing statements from
tensors - it simply stitches together the already-constructed statements
contained in Tensors. A side effect is that we can no longer construct
several loopnests from the same tensors - the statements must be cloned
explicitly if we want to do that. A special copy constructor was added
to LoopNest to make this more convenient (note: this only affects tests;
we don't usually create multiple loopnests elsewhere).
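
A minimal sketch of the resulting usage (the compute definition is
illustrative):

```
Placeholder a("a", kFloat, {64});
Tensor* t = Compute("f", {{64, "i"}},
                    [&](const VarHandle& i) { return a.load(i) * 2.f; });
LoopNest l1({t}); // stitches the statement already stored in `t`
LoopNest l2(l1);  // the new copy constructor clones the statements,
                  // since two loopnests can no longer share them
```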

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D26038223

Pulled By: ZolotukhinM

fbshipit-source-id: 27a2e5900437cfb0c151e8f89815edec53608e17
2021-01-27 16:14:22 -08:00
Andres Suarez
8530c65e25 [codemod][fbcode/caffe2] Apply clang-format update fixes
Test Plan: Sandcastle and visual inspection.

Reviewed By: igorsugak

Differential Revision: D25849205

fbshipit-source-id: ef664c1ad4b3ee92d5c020a5511b4ef9837a09a0
2021-01-09 14:37:36 -08:00
Mikhail Zolotukhin
e1f73ced1e [TensorExpr] Change LoopNest::vectorize to accept For* instead of Stmt*. (#49696)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49696

And make it static.

Test Plan: Imported from OSS

Reviewed By: navahgar, nickgg

Differential Revision: D25668695

Pulled By: ZolotukhinM

fbshipit-source-id: 8d7fb507d6f3beca70e868d9e0f4c46247311a99
2020-12-21 20:17:20 -08:00
Bram Wasti
1047957831 [te][reapply] Add fast log approximation based on sleef (#49575)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49575

This is a fast log implementation

benchmark:

```
buck run mode/opt //caffe2/benchmarks/cpp/tensorexpr:tensorexpr_bench -c 'fbcode.caffe2_gpu_type=none'
```
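
For context, a rough sketch of the usual bits-based fast-log recipe (not the
actual sleef polynomial; the truncated series here is deliberately short and
low-accuracy):

```
#include <cstdint>
#include <cstring>

float fast_log_sketch(float x) {
  uint32_t bits;
  std::memcpy(&bits, &x, sizeof(bits));                // bitcast float -> bits
  int e = static_cast<int>((bits >> 23) & 0xff) - 127; // unbiased exponent
  bits = (bits & 0x007fffffu) | 0x3f800000u;           // force mantissa into [1, 2)
  float m;
  std::memcpy(&m, &bits, sizeof(m));
  float t = m - 1.0f;
  // log(1 + t) ~= t - t^2/2 + t^3/3; real implementations use a
  // carefully tuned polynomial for full-range accuracy.
  float p = t - 0.5f * t * t + (t * t * t) / 3.0f;
  return p + static_cast<float>(e) * 0.69314718f;      // + e * ln(2)
}
```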

Test Plan: buck test mode/no-gpu //caffe2/test/cpp/tensorexpr:tensorexpr -- *.fastLogFloat

Reviewed By: bertmaher

Differential Revision: D25627157

fbshipit-source-id: a4920f4f4005ce617d372b375e790ca966275cd9
2020-12-17 17:02:00 -08:00
Edward Yang
ea4ccc730e Revert D25445815: [te] Add fast log approximation based on sleef
Test Plan: revert-hammer

Differential Revision:
D25445815 (1329066b69)

Original commit changeset: 20696eacd12a

fbshipit-source-id: 38830a6abd16260d60e5dd9a5594e65736a9c782
2020-12-17 15:03:17 -08:00
Bram Wasti
1329066b69 [te] Add fast log approximation based on sleef
Summary:
This is a fast log implementation

benchmark:
```
buck run mode/opt //caffe2/benchmarks/cpp/tensorexpr:tensorexpr_bench -c 'fbcode.caffe2_gpu_type=none'
```

Test Plan: buck test mode/no-gpu //caffe2/test/cpp/tensorexpr:tensorexpr -- *.fastLogFloat

Reviewed By: bertmaher

Differential Revision: D25445815

fbshipit-source-id: 20696eacd12a55e797f606f4a6dbbd94c9652888
2020-12-17 14:28:34 -08:00
Bram Wasti
6b78644623 [te] Add BitCast to the IR (#49184)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49184

Adds BitCasting to NNC. This will enable fast approximation algorithms implemented directly in TensorExpressions.
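
A hedged sketch of the kind of expression this unlocks (the exact `::make`
shape and operator overloads are illustrative): reinterpreting float bits as
an integer, the building block of approximations like the fast log above.

```
Placeholder a("a", kFloat, {64});
VarHandle i("i", kInt);
ExprHandle x = a.load(i);                  // a float value
ExprHandle bits = BitCast::make(kInt, x);  // same bits, integer type
ExprHandle exponent = (bits >> 23) & 0xff; // IEEE-754 exponent field
```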

Test Plan: buck test mode/no-gpu //caffe2/test/cpp/tensorexpr:tensorexpr

Reviewed By: bertmaher

Differential Revision: D25466476

fbshipit-source-id: f063ab29ba7bab2dcce463e499f2d4a16bdc1f0e
2020-12-11 16:12:20 -08:00
Bram Wasti
195b92bfa6 Revert D25441716: [te] Add BitCast to the IR
Test Plan: revert-hammer

Differential Revision:
D25441716 (3384145418)

Original commit changeset: c97b871697bc

fbshipit-source-id: e6eff02e28e1ae8c826dd2cfed79f869839ed2ba
2020-12-10 09:31:35 -08:00
Bram Wasti
3384145418 [te] Add BitCast to the IR
Summary: Adds BitCasting to NNC. This will enable fast approximation algorithms implemented directly in TensorExpressions.

Test Plan: buck test mode/no-gpu //caffe2/test/cpp/tensorexpr:tensorexpr

Reviewed By: bertmaher

Differential Revision: D25441716

fbshipit-source-id: c97b871697bc5931d09cda4a9cb0a81bb420f4e2
2020-12-10 09:25:46 -08:00
Bert Maher
07657b6001 [tensorexpr] Switch cpp tests to pure gtest (#48160)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48160

We no longer use the custom C++ test infra anyway, so move to pure
gtest.

Fixes #45703
ghstack-source-id: 116977283

Test Plan: `buck test //caffe2/test/cpp/tensorexpr`

Reviewed By: navahgar, nickgg

Differential Revision: D25046618

fbshipit-source-id: da34183d87465f410379048148c28e1623618553
2020-11-18 12:23:34 -08:00
Nick Gibson
957e45a97c [NNC] Support vectorization of reductions (#47924)
Summary:
Add support for ReduceOp in the Vectorizer, which allows vectorization of reductions. Only non-reduce axes can be vectorized currently; to make vectorizing reduce axes work we'd need either to automatically pull out the RHS of reductions (better as a separate transform, I think) or special handling of vector reduce in the LLVM codegen (tricky, maybe not useful?).

There was a disabled LLVM test for this case which I reenabled with a bit of massaging, and added a few more.
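
A hedged sketch of the supported case, in the style of the tests elsewhere in
this log (API details of the time may differ slightly):

```
Placeholder b("b", kFloat, {8, 64});
Tensor* c = Reduce("sum", {{8, "n"}}, Sum(), b, {{64, "k"}});
LoopNest nest({c});
std::vector<For*> loops = nest.getLoopStmtsFor(c);
nest.vectorize(loops.at(0)); // "n" is a non-reduce axis: supported
// vectorizing loops.at(1), the reduce axis "k", is not supported
```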

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47924

Reviewed By: bertmaher

Differential Revision: D24963464

Pulled By: nickgg

fbshipit-source-id: 91d91e9e2696555ab5690b154984b1ce48359d51
2020-11-16 10:43:53 -08:00
Cheng Chang
f730f2597e [NNC] Implement Cond in LLVM codegen (#47256)
Summary:
Generate LLVM IR for statements such as
```
if (...) {
   ....
} else {
   ....
}
```
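
A hedged sketch of building such a conditional at the TE level (constructor
shapes illustrative):

```
Placeholder b("b", kFloat, {64});
VarHandle i("i", kInt);
ExprHandle x = b.load(i);
Stmt* s = Cond::make(CompareSelect::make(x, 0.f, kGT), // if (x > 0)
                     b.store({i}, x),                  //   b[i] = x;
                     b.store({i}, 0.f));               // else b[i] = 0;
```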

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47256

Test Plan: added unit tests to test_llvm.cpp

Reviewed By: nickgg

Differential Revision: D24699080

Pulled By: cheng-chang

fbshipit-source-id: 83b0cebcd242828263eb6052483f0924b5f091ce
2020-11-03 14:46:30 -08:00
Mikhail Zolotukhin
4aca63d38a [TensorExpr] Change API for creating Load and Store expressions. (#45520)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45520

With this change `Load`s and `Store`s no longer accept `Placeholder`s in
their constructors and `::make` functions; they can only be built with a
`Buf`.
`Placeholder` gets its own `store`, `load`, `storeWithMask`, and
`loadWithMask` methods for more convenient construction.
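
A minimal before/after sketch (names illustrative):

```
Placeholder a("A", kFloat, {64});
VarHandle i("i", kInt);
// Before: Load::make / Store::make could take the Placeholder itself.
// Now they take a Buf; Placeholder offers convenience helpers instead:
ExprHandle v = a.load(i);        // Placeholder::load
Stmt* s = a.store({i}, v + 1.f); // Placeholder::store
```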

Test Plan: Imported from OSS

Reviewed By: glaringlee

Differential Revision: D23998789

Pulled By: ZolotukhinM

fbshipit-source-id: 3fe018e00c1529a563553b2b215f403b34aea912
2020-09-29 20:52:38 -07:00
Mikhail Zolotukhin
b86008ab75 [TensorExpr] Remove buf_ field from class Tensor. (#45390)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45390

Tensor objects should always refer to their Function's bufs. Currently
we never create a Tensor with a buffer different from that of its function,
but keeping it in two places seems incorrect and dangerous.

Differential Revision: D23952865

Test Plan: Imported from OSS

Reviewed By: nickgg

Pulled By: ZolotukhinM

fbshipit-source-id: e63fc26d7078427514649d9ce973b74ea635a94a
2020-09-29 01:21:57 -07:00
Mikhail Zolotukhin
3c33695a6d [TensorExpr] Rename Buffer to Placeholder. (#45389)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45389

Differential Revision: D23952866

Test Plan: Imported from OSS

Reviewed By: nickgg

Pulled By: ZolotukhinM

fbshipit-source-id: 17eedd3ac17897501403482ac1866c569d247c75
2020-09-29 01:21:54 -07:00
Mikhail Zolotukhin
92306b85d5 [TensorExpr] Consolidate {buffer,function,tensor}.{h.cpp} in tensor.{h,cpp}. (#45388)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45388

Classes defined in these files are closely related, so it is reasonable
to have them all in one file. The change is purely a code move.

Differential Revision: D23952867

Test Plan: Imported from OSS

Reviewed By: nickgg

Pulled By: ZolotukhinM

fbshipit-source-id: 12cfaa968bdfc4dff00509e34310a497c7b59155
2020-09-29 01:17:10 -07:00
Alex Suhan
18b77d7d17 [TensorExpr] Add Mod support to the LLVM backend (#44823)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44823

Test Plan: test_tensorexpr --gtest_filter=TensorExprTest.LLVMElemwiseMod_LLVM

Reviewed By: glaringlee

Differential Revision: D23761996

Pulled By: asuhan

fbshipit-source-id: c3c5b2fe0d989dec04f0152ce47c5cae35ed19c9
2020-09-17 15:25:42 -07:00
Alex Suhan
f5b92332c1 [TensorExpr] Fix order comparisons for unsigned types (#44857)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44857

Test Plan: test_tensorexpr --gtest_filter=TensorExprTest.LLVMCompareSelectByte*_LLVM

Reviewed By: glaringlee

Differential Revision: D23762162

Pulled By: asuhan

fbshipit-source-id: 1553429bd2d5292ccda57910326b8c70e4e6ab88
2020-09-17 14:16:54 -07:00
Alex Suhan
5d57025206 [TensorExpr] Add log1p support to the LLVM backend (#44839)
Summary:
Also corrected the Sleef_log1p registrations; the float versions had a redundant 'f'.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44839

Test Plan: test_tensorexpr --gtest_filter=TensorExprTest.LLVMElemwiseLog1pFloat_LLVM

Reviewed By: glaringlee

Differential Revision: D23762113

Pulled By: asuhan

fbshipit-source-id: b5cf003b5c0c1ad549c7f04470352231929ac459
2020-09-17 13:38:35 -07:00
Bert Maher
c14a3613a8 Fix NaN propagation in TE fuser's min/max implementation (#43609)
Summary:
Per eager mode source-of-truth, NaNs shall be propagated by min/max.
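
A minimal sketch of the semantics being matched: a plain compare-select
silently drops a NaN on the left-hand side, so min/max must special-case it:

```
#include <cmath>

float min_propagating_nan(float a, float b) {
  // `a < b ? a : b` returns b whenever a is NaN (every comparison with
  // NaN is false), losing the NaN; eager mode propagates it instead.
  return std::isnan(a) ? a : (a < b ? a : b);
}
```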

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43609

Reviewed By: ZolotukhinM

Differential Revision: D23349184

Pulled By: bertmaher

fbshipit-source-id: 094eb8b89a02b27d5ecf3988d0f473c0f91e4afb
2020-09-01 02:10:13 -07:00
Nick Gibson
944ac133d0 [NNC] Remove VarBinding and go back to Let stmts (#42634)
Summary:
A while back, when commonizing the Let and LetStmt nodes, I ended up removing both and adding a separate VarBinding section to the Block. At the time I couldn't find a counterexample, but I found one today: dependencies between local Vars and Allocations may go in either direction, so we need to support interleaving of those statements.

So, I've removed all the VarBinding logic and reimplemented Let statements. ZolotukhinM I think you get to say "I told you so". No new tests; existing tests should cover this.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42634

Reviewed By: mruberry

Differential Revision: D22969771

Pulled By: nickgg

fbshipit-source-id: a46c5193357902d0f59bf30ab103fe123b1503f1
2020-08-07 10:50:38 -07:00
Nick Gibson
7ffdd765c8 [TensorExpr] more convenient outer Rfactor output (#40050)
Summary:
Automatically fuse the output loops of outer Rfactors, so the result is in a more convenient form for binding GPU axes.

An example:
```
  Tensor* c = Reduce("sum", {}, Sum(), b, {{m, "m"}, {n, "n"}, {k, "k"}});
  LoopNest loop({c});
  std::vector<For*> loops = loop.getLoopStmtsFor(c);
  auto v = loops.at(0)->var();
  loop.rfactor(c->body(), v);
```
Before:
```
{
  Allocate(tmp_buf, float, {m});
  sum[0] = 0.f;
  for (int m_1 = 0; m_1 < m; m_1++) {
    tmp_buf[m_1] = 0.f;
  }
  for (int m_1 = 0; m_1 < m; m_1++) {
    for (int n = 0; n < n_1; n++) {
      for (int k = 0; k < k_1; k++) {
        tmp_buf[m_1] = (tmp_buf[m_1]) + (b[((n_1 * m_1) * k_1 + k) + k_1 * n]);
      }
    }
  }
  for (int m_1 = 0; m_1 < m; m_1++) {
    sum[0] = (sum[0]) + (tmp_buf[m_1]);
  }
  Free(tmp_buf);
}
```

After:
```
{
  sum[0] = 0.f;
  for (int m = 0; m < m_1; m++) {
    Allocate(tmp_buf, float, {m_1});
    tmp_buf[m] = 0.f;
    for (int n = 0; n < n_1; n++) {
      for (int k = 0; k < k_1; k++) {
        tmp_buf[m] = (tmp_buf[m]) + (b[((n_1 * m) * k_1 + k) + k_1 * n]);
      }
    }
    sum[0] = (sum[0]) + (tmp_buf[m]);
    Free(tmp_buf);
  }
}
```

The existing Rfactor tests cover this case, although I did rename a few for clarity. This change broke the LLVMRFactorVectorizedReduction test because it now does what it's intended to do (vectorize a loop with a reduction in it) rather than nothing, and since that doesn't work, it correctly fails. I've disabled it for now.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/40050

Reviewed By: ZolotukhinM

Differential Revision: D22605639

Pulled By: nickgg

fbshipit-source-id: e359be53ea62d9106901cfbbc42d55d0e300e8e0
2020-07-21 14:44:26 -07:00
Nick Gibson
33f4fca1a6 [TensorExpr] remove Let and LetStmt in favour of binding in Block (#37606)
Summary:
Implementation of the less popular proposal for eliminating overlap between LetStmt and Let: removing both and storing a mapping between Var and value Expr in the Block.

This complicates some tests but simplifies the IR by restricting where variable binding can occur.

I used the unit tests & python integration tests to verify this is correct but I'm unsure of coverage, particularly around the dependency checker in loopnest - ZolotukhinM your review would be useful there.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37606

Differential Revision: D21467483

Pulled By: nickgg

fbshipit-source-id: b402d3fce4cacf35d75f300f0a7dca32a43b6688
2020-05-09 16:23:37 -07:00
Nick Gibson
4e2ea6e013 [TensorExpr] Remove the Tensor argument from loopnest.reorderAxis (#37873)
Summary:
Remove the requirement for the axes provided to reorderAxis to come from a Tensor. We were using that to determine the relevant loops, but we can alternatively determine them by traversing the parents of each provided For.
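
A sketch of the updated call (previously a Tensor had to be passed alongside
the two loops):

```
std::vector<For*> loops = nest.getLoopStmtsFor(c);
nest.reorderAxis(loops.at(0), loops.at(1)); // the enclosing loops are
                                            // now found via For parents
```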

resistor does this work for you?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37873

Differential Revision: D21428016

Pulled By: nickgg

fbshipit-source-id: b16b2f41cb443dfc2c6548b7980731d1e7d89a35
2020-05-06 12:02:15 -07:00
Mikhail Zolotukhin
1c0bad25f3 [TensorExpr] Add dtype to class Buf. (#36611)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36611

Buf represents the underlying storage, but until now it didn't have a
dtype. That resulted in dtypes being specified in different places, with
no mechanism to enforce their consistency: e.g. one could have created a
kFloat expression and used a kInt buffer to store its result. Now the
logic regarding storage is centralized, and we can start enforcing
semantic rules.
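
A hedged sketch of the inconsistency this makes checkable (API shapes
illustrative):

```
BufHandle a("A", {ExprHandle(64)}, kInt); // dtype now lives on the Buf
ExprHandle v = FloatImm::make(1.0f);      // a kFloat expression
// Storing `v` into `a` mixes kFloat with kInt; with the dtype on the
// Buf the mismatch can be detected instead of silently miscompiling.
```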

Follow-ups: we can merge Buffer and BufHandle classes as the former is
now a mere wrapper over the latter.

Test Plan: Imported from OSS

Differential Revision: D21027356

Pulled By: ZolotukhinM

fbshipit-source-id: c06aa2c4077fdcde3bb4ca622d324aece79b5a9c
2020-05-05 15:04:37 -07:00
Owen Anderson
564de515f5 Add an iterator to Block. (#37542)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/37542

Differential Revision: D21314421

Pulled By: resistor

fbshipit-source-id: e54d7a8a5c9c1186be59f69b5b8af030fc054b32
2020-05-01 15:12:49 -07:00
Owen Anderson
20ba29d81c Add support for reductions on CPU in tensorexpr (#37333)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/37333

Differential Revision: D21290289

Pulled By: resistor

fbshipit-source-id: ebba11f7af9e22b48c47e2eefb9497fa77acd17d
2020-04-30 10:59:38 -07:00
Nick Gibson
a99b169828 [TensorExpr] fix a bug in LLVM codegen around empty kernels (#36660)
Summary:
LLVM Codegen assumes that the kernel contains real statements, but that is not guaranteed, especially after IR Simplification. This PR adds a catch for the case where no value is generated after recursing the LLVMCodegen visitor through the kernel.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36660

Differential Revision: D21044066

Pulled By: nickgg

fbshipit-source-id: e521c766286b1ff4e26befcec7ff4959db8181a4
2020-04-15 17:45:06 -07:00
Nick Gibson
caa45c8e33 [TensorExpr] fix warnings (#36167)
Summary:
Fix a bunch of minor warnings in jit/tensorexpr, mostly unused variables & signed/unsigned comparison mismatches.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36167

Differential Revision: D20905081

Pulled By: nickgg

fbshipit-source-id: 16fe605a86f08596f64e74e9337c59a2581a4d5a
2020-04-08 15:42:29 -07:00
Mikhail Zolotukhin
3ef5ff6012 [TensorExpr] Make Load and Store multi-dimensional. (#35800)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35800

This PR includes the following changes:
* Introduce a new `Expr` type `Buf`: it plays a role similar to `Var`, but also has dimensions.
* Use the new `Buf` class in `Store` and `Load` instead of `Var` for specifying where to store to or load from. `Buf` contains the dimensions info of the buffer we're loading/storing to and hence we are able to keep N-d indexes without flattening them into a 1-d index ([x,y] vs [x+y*W]).
* Flattening of the indexes is now a separate pass that is executed in `LoopNest::prepareForCodegen` - backends still expect indexes to be flattened, and this PR preserves that.
* `Tensor` now contains a `Buf` instead of `Var`, and thus Tensor now has the dimensions info (previously it was a property of a `Function`, not a `Tensor`). This brings us closer to Tensor being a combination of Buffer + Function, where Buffer specifies iteration domain and the Function defines a computation.

TODOs:
* Consider merging `Buffer` with `Buf` or `BufHandle`. It seems that we don't need all of them.
* Harden the logic of how we create buffers in fuser pass. Currently it seems that sometimes we don't set dimensions.
* Use `Buf` in `Allocate` and `Free`.
* Make it clearer that `Function` doesn't "own" dimensions info and that dimensions are a property of a Tensor, not a Function.
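
A sketch of the N-d form this change preserves (constructor shapes
illustrative):

```
VarHandle x("x", kInt), y("y", kInt);
ExprHandle W(32), H(64);
BufHandle A("A", {H, W}, kFloat);        // Buf carries the dimensions
ExprHandle v = Load::make(A, {y, x}, 1); // 2-d index [y, x], unflattened
// LoopNest::prepareForCodegen() later flattens indices for the backends:
// a load of A at [y * W + x].
```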

Differential Revision: D20789005

Test Plan: Imported from OSS

Reviewed By: zheng-xq

Pulled By: ZolotukhinM

fbshipit-source-id: e04188d1d297f195f1c46669c614557d6bb6cde4
2020-04-02 11:18:28 -07:00
Nikolay Korovaiko
9e22d15f14 Enable tensorexpr cpp tests in CI. try #2 (#35454)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/35454

Differential Revision: D20665160

Pulled By: Krovatkin

fbshipit-source-id: e04cbe92b2ee5a3288f3c4e5c83533bfea85bf85
2020-03-27 12:09:55 -07:00
Suraj Menon
aa01a95c6d Revert D20630760: [pytorch][PR] Enable NNC tests vol. i. add test_tensorexpr.py tests [WIP]
Test Plan: revert-hammer

Differential Revision:
D20630760

Original commit changeset: 7d2f27aca6b1

fbshipit-source-id: 28ac92b3390651a4a67061d6ebf208515b9b9463
2020-03-25 20:34:46 -07:00
Nikolay Korovaiko
f3a5081bd4 Enable NNC tests vol. i. add test_tensorexpr.py tests [WIP] (#34897)
Summary:
This PR adds tensorexpr cpp tests to test_jit.py
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34897

Differential Revision: D20630760

Pulled By: Krovatkin

fbshipit-source-id: 7d2f27aca6b1e23e3ffed1c765d8f590688118e3
2020-03-25 17:23:48 -07:00
Mikhail Zolotukhin
ceb4ed3733 [TensorExpr] Methods name cleanup in LoopNest class. (#35174)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/35174

Differential Revision: D20585575

Test Plan: Imported from OSS

Pulled By: ZolotukhinM

fbshipit-source-id: 0fa8e1e85e1502b9a86cf34608cb791ffb23d395
2020-03-25 11:51:11 -07:00
Mikhail Zolotukhin
95ad94c75b [TensorExpr] Nuke tensorexpr::schedule namespace. (#35126)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/35126

Test Plan: Imported from OSS

Differential Revision: D20569364

Pulled By: ZolotukhinM

fbshipit-source-id: c0d51ecadf411918641cdbdc6d8cb06e207d2c9b
2020-03-20 23:39:14 -07:00
Mikhail Zolotukhin
65cea95777 [TensorExpr] Rename schedule.{cpp,h} to loopnest.{cpp,h}. (#35119)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/35119

Differential Revision: D20567927

Test Plan: Imported from OSS

Pulled By: ZolotukhinM

fbshipit-source-id: 1fb6d03bd4c6e66aca62140d2b537692577f261d
2020-03-20 23:37:51 -07:00
Mikhail Zolotukhin
95833a49e6 [TensorExpr] Pull changes from bertmaher/pytorch_fusion. (#34842)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34842

This PR (hopefully the last one of its kind) merges changes from a
side branch where the tensor-expressions-based fuser work has been done so
far. It is a squashed version of the changes in the side branch,
which is available here: https://github.com/bertmaher/pytorch

Differential Revision: D20478208

Test Plan: Imported from OSS

Pulled By: ZolotukhinM

fbshipit-source-id: 21556e009f1fd88099944732edba72ac40e9b9c0
2020-03-17 11:02:48 -07:00
Mikhail Zolotukhin
ea5c86c276 [TensorExpr] Add LLVM codegen. (#34228)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34228

This PR adds LLVM codegen to tensor expressions. LLVM is added as an
optional build dependency specified with `USE_LLVM=<path_to_llvm>`
variable. If this variable is not set or LLVM is not found in the
specified path, the LLVM codegen is completely disabled.

Differential Revision: D20251832

Test Plan: Imported from OSS

Pulled By: ZolotukhinM

fbshipit-source-id: 77e203ab4421eb03afc64f8da17e0daab277ecc2
2020-03-16 11:49:34 -07:00