Commit Graph

833 Commits

Author SHA1 Message Date
Xiaoqiang Zheng
5e504e83e8 Add sync-point insertions and block/thread local memory allocations (#36563)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/36563

Test Plan: Imported from OSS

Differential Revision: D21014238

Pulled By: zheng-xq

fbshipit-source-id: 4d61ff2f76345ea2825f2d5f60a771f65b24ad69
2020-04-20 18:52:30 -07:00
Xiaoqiang Zheng
32bbf12aa7 Make trivial thread-idx for degenerate statements without thread-idx. (#36480)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/36480

Test Plan: Imported from OSS

Differential Revision: D20992505

Pulled By: zheng-xq

fbshipit-source-id: 3d4e5401b59b9507b5f2db659e511bd1af53f5ab
2020-04-17 02:31:07 -07:00
Mike Ruberry
b45b9673a1 Fixes clang format (#36787)
Summary:
Fixes clang format.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36787

Differential Revision: D21084603

Pulled By: mruberry

fbshipit-source-id: 7e29da135f9a2aa126cb68640e33c1914fd570e3
2020-04-17 00:42:51 -07:00
Wanchao Liang
6d4c509168 [autograd] lower MAX_DEPTH limit according to TSAN limit (#36745)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36745

As we hold a mutex for our custom C++ Node, when calling reentrant
backward from a custom C++ function we end up concurrently holding many
mutexes, up to MAX_DEPTH of them. TSAN only allows 65 mutexes to be held
at once, otherwise it complains. This PR lowers the limit according to TSAN.

TSAN Reference: https://github.com/google/sanitizers/issues/950
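
A minimal sketch of the constraint, with hypothetical constants (the actual names and values live in the autograd engine):

```
// Each reentrant level holds a mutex, so the depth cap must stay below the
// 65 simultaneously held mutexes that TSAN can track.
constexpr int kTsanMaxHeldMutexes = 65;
constexpr int kMaxReentrantDepth = 60;  // lowered to leave headroom

static_assert(kMaxReentrantDepth < kTsanMaxHeldMutexes,
              "reentrant backward depth must stay within TSAN's lock limit");
```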

Test Plan: Imported from OSS

Differential Revision: D21072604

Pulled By: wanchaol

fbshipit-source-id: 99cd1acab41a203d834fa4947f4e6f0ffd2e70f2
2020-04-16 20:43:20 -07:00
Owen Anderson
1fc3556ec9 Teach the tensorexpr vectorizer to handle nested For loops. (#36467)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/36467

Differential Revision: D21013179

Pulled By: resistor

fbshipit-source-id: aa4f3da58cf16934f11e0cf4252a300cbac98f21
2020-04-16 15:40:44 -07:00
Karl Ostmo
4894cba572 Revert D19775659: [WIP] Move profiler to a dispatch wrapper
Test Plan: revert-hammer

Differential Revision:
D19775659

Original commit changeset: 5cbe5f736660

fbshipit-source-id: dcb41d2433697c5d521044a9dbc12c79f31e0929
2020-04-16 14:18:51 -07:00
Nick Gibson
ee3d046f87 [TensorExpr] Add support for Axis reordering in LoopNest (#36540)
Summary:
Adds a capability for reordering axes in the LoopNest. This was fairly straightforward except for handling Reduction initializers, which required more changes. UPDATE: actually the complicated bit was preserving the ordering of statements in the loop nest that should not be reordered.

Usage looks something like this:

```
Tensor* tensor = Compute(
    "f", {{2, "x"}, {3, "y"}}, [](const VarHandle& x, const VarHandle& y) {
      return ExprHandle(1.0f) + cast<float>(x) * x + cast<float>(y) * y;
    });
LoopNest l({tensor});

/* LoopNest looks like:
  for x in ...
    for y  in ...
       f[x,y] = 1 + x * x + y * y;
*/

auto loops = l.getLoopStmtsFor(tensor);
l.reorderAxis(tensor, loops[0], loops[1]);

/* LoopNest looks like:
  for y in ...
    for x  in ...
       f[x,y] = 1 + x * x + y * y;
*/
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36540

Differential Revision: D21068143

Pulled By: nickgg

fbshipit-source-id: f02c29004376df4f5a9bedff366c075772726618
2020-04-16 13:42:47 -07:00
James Reed
a85c835196 [WIP] Move profiler to a dispatch wrapper (#33057)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33057

Test Plan: Imported from OSS

Differential Revision: D19775659

Pulled By: jamesr66a

fbshipit-source-id: 5cbe5f736660c8543764ef62b16550638d9ceb72
2020-04-16 13:36:37 -07:00
Michael Ranieri
3567b881a5 make sure dispatch test works on windows (#36729)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36729

setenv is not available on Windows.
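
A portability shim along these lines (illustrative only; not necessarily what this PR does) would be:

```
// Windows has no POSIX setenv, but the CRT provides _putenv_s.
#include <cstdlib>

int set_env(const char* name, const char* value) {
#ifdef _WIN32
  return _putenv_s(name, value);
#else
  return setenv(name, value, /*overwrite=*/1);
#endif
}
```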

Test Plan: CI green in ovrsource

Reviewed By: stepancheg

Differential Revision: D21067835

fbshipit-source-id: ddbc3285ef88f123dc6a200b661c48cfafc6bf00
2020-04-16 11:36:56 -07:00
Christian Sarofeen
f11c4f90c2 New CUDA Fuser: Unrolling support, interface refactor (#36435)
Summary:
Unrolling support has been added in a way that produces well-performing code on GPUs. Not sure how long this link will last, but an example of a generated unrolled kernel is:
https://godbolt.org/z/i0uAv3

What can be seen there is multiple calls of "ld.global.f32" without any "st.global.f32" in between them (and vice versa). This means we are launching multiple loads that can run in parallel, as well as multiple stores that can run in parallel. This can be a crucial optimization for memory-bound kernels. It was generally a point of concern in TVM: an attempt at a similar kernel from TVM produces https://godbolt.org/z/Vu97vG, which surrounds load-store pairs in conditional branches, preventing the benefits of unrolling.
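
To illustrate the pattern, a hypothetical hand-written kernel (not the fuser's actual output):

```
// With the loop fully unrolled and loads hoisted into registers, the compiler
// can emit the ld.global.f32 instructions back to back, overlapping their
// latencies, before any st.global.f32 is issued.
__global__ void add_one_unrolled(const float* in, float* out) {
  int base = 4 * (blockIdx.x * blockDim.x + threadIdx.x);
  float v[4];
#pragma unroll
  for (int i = 0; i < 4; ++i) {
    v[i] = in[base + i];  // loads issued together
  }
#pragma unroll
  for (int i = 0; i < 4; ++i) {
    out[base + i] = v[i] + 1.0f;  // stores issued together
  }
}
```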
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36435

Reviewed By: ZolotukhinM

Differential Revision: D21024011

Pulled By: soumith

fbshipit-source-id: e852e282fa7a304aba962e1926f756098c011fe0
2020-04-16 09:20:24 -07:00
Nick Gibson
7539ea0207 [TensorExpr] Add simplification of length 0 and 1 For loops to IR Simplifier (#36348)
Summary:
Simplifies loops which can be collapsed down into a single block or removed entirely. E.g.

```
For 0..1 {
  Statements...
}
```

Is now just `Block({Statements...})`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36348

Differential Revision: D21057959

Pulled By: nickgg

fbshipit-source-id: 2f95a19a965c4a6e023680e2cea9ea846e82d62e
2020-04-15 23:56:34 -07:00
Elias Ellison
9cbeb0faed [JIT] Dont optimize shape peepholes on inline (#36404)
Summary:
With https://github.com/pytorch/pytorch/pull/35562, we are running peephole optimization on inlining to reduce the number of nodes that are copied.

The tracer encodes the sizes in the graph like:
```
graph(%0 : Double(7)):
  %1 : Function = prim::Constant[name="tensor_size"]()
  %2 : Tensor = prim::CallFunction(%1, %0)
  return (%2)
```

However, people would like to reuse the graph with different shapes, so running the size peepholes would invalidate that. Long term it might be better for the tracer to not include shape information, but there are downstream users of it.

Separates out FuseAddMM from the peephole pass, so that there is now a single `disable_size_optimizations` parameter, and ONNX explicitly invokes FuseAddMM.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36404

Differential Revision: D20968974

Pulled By: eellison

fbshipit-source-id: 56f8f1699e3b0adeeccdfd5a67bb975fd41a2913
2020-04-15 17:49:48 -07:00
Nick Gibson
a99b169828 [TensorExpr] fix a bug in LLVM codegen around empty kernels (#36660)
Summary:
LLVM codegen assumes that the kernel contains real statements, but that is not guaranteed, especially after IR simplification. This PR adds a catch for the case where no value is generated after the LLVMCodegen visitor recurses through the kernel.
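
A sketch of the guard (hypothetical helper; the real fix lives inside LLVMCodegen):

```
#include "llvm/IR/Constants.h"
#include "llvm/IR/IRBuilder.h"

// If recursing through the kernel produced no value, substitute a harmless
// constant rather than handing back a null llvm::Value*.
llvm::Value* valueOrZero(llvm::Value* v, llvm::IRBuilder<>& builder) {
  return v != nullptr ? v : llvm::ConstantInt::get(builder.getInt32Ty(), 0);
}
```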
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36660

Differential Revision: D21044066

Pulled By: nickgg

fbshipit-source-id: e521c766286b1ff4e26befcec7ff4959db8181a4
2020-04-15 17:45:06 -07:00
Nikita Shulga
6bd6b70a02 Fix clang-format (#36685)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/36685

Differential Revision: D21052657

Pulled By: malfet

fbshipit-source-id: b4ec7eba21864108a1108f8c83b5d33cf31ab89e
2020-04-15 17:02:20 -07:00
Xiaoqiang Zheng
dad25ae47d Add the one-block multi-thread global reduction support. (#36306)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36306

Missing __syncthreads between sections.
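
For illustration, a standalone CUDA sketch (not this PR's generated code) of why `__syncthreads` is needed between sections of a one-block reduction:

```
// Every thread's partial result must be visible in shared memory before any
// thread reads across it. Assumes blockDim.x is a power of two.
__global__ void block_reduce_sum(const float* in, float* out, int n) {
  extern __shared__ float partial[];
  int tid = threadIdx.x;
  partial[tid] = tid < n ? in[tid] : 0.0f;
  __syncthreads();  // section boundary: all writes land before any reads
  for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
    if (tid < stride) {
      partial[tid] += partial[tid + stride];
    }
    __syncthreads();  // and between every reduction step
  }
  if (tid == 0) {
    *out = partial[0];
  }
}
```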

Differential Revision: D20957254

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Pulled By: zheng-xq

fbshipit-source-id: c988f0205b667174b3ee851c28adeec2dbd089f7
2020-04-15 13:05:11 -07:00
Xiaoqiang Zheng
e80813fae3 Add trivial reduce for Cuda (#36293)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36293

Detect non-read-only loads, and do not use __ldg for them.
Resubmitting #36092
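
The rule being enforced, in a hypothetical kernel (not this PR's code):

```
// __ldg routes a load through the read-only cache, so it is only safe for
// buffers the kernel never writes; a buffer that is also stored to must be
// read with a plain load.
__global__ void scale_inplace(const float* __restrict__ weight,
                              float* data, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) {
    float w = __ldg(&weight[i]);  // read-only input: __ldg is safe
    data[i] *= w;                 // data is written, so its load stays plain
  }
}
```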

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D20935933

Pulled By: zheng-xq

fbshipit-source-id: f9280db26aa9c9c8119cea12571bc820f5fbcb61
2020-04-15 13:03:58 -07:00
Mikhail Zolotukhin
317f598103 [TensorExpr] Clang-format test/cpp/tensorexpr/*. (#36615)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/36615

Test Plan: Imported from OSS

Differential Revision: D21027733

Pulled By: ZolotukhinM

fbshipit-source-id: e19cd85c1634f4e40805814ac71eec719d6587f8
2020-04-14 19:08:18 -07:00
Mikhail Zolotukhin
d5ba39c25d [TensorExpr] Postpone insertion of Alloc/Free statements in computeAt. (#36526)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/36526

Test Plan: Imported from OSS

Differential Revision: D21004740

Pulled By: ZolotukhinM

fbshipit-source-id: 8ac8db0d4e31065e4fbd3e0cc27f15a15dcb141c
2020-04-13 22:30:00 -07:00
Mikhail Zolotukhin
df5f0a04ff [TensorExpr] Implement LoopNest::computeAt (#36112)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/36112

Differential Revision: D20885662

Test Plan: Imported from OSS

Pulled By: ZolotukhinM

fbshipit-source-id: 4ea6293b249562fca46739dc36c5483d912e5838
2020-04-11 04:01:14 -07:00
Mikhail Zolotukhin
397aa46a3e [TensorExpr] Bounds inference (#35120)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/35120

Differential Revision: D20567926

Test Plan: Imported from OSS

Pulled By: ZolotukhinM

fbshipit-source-id: 89a2afcddaf23a5c6259c15e4f7194e8649c1c4d
2020-04-11 03:59:34 -07:00
Nick Gibson
42457e634d [TensorExpr] add support for Reduction Ops (#35866)
Summary:
Second attempt at the reduction frontend for the TensorExpr compiler. It has two APIs: a simple version for common reduction types, and a customizable Reducer frontend which allows specifying the initializer, the reduction interaction, and the body (the latter two via lambdas).

Simple API looks like so:
```
Buffer b(BufHandle("b", {10}), kInt);
Tensor* c = Reduce("sum", {}, Sum(b), {{10, "m"}});
```

An example of specializing a Sum to do Matmul:
```
Buffer tA(BufHandle("tA", {M, K}), kFloat);
Buffer tB(BufHandle("tB", {K, N}), kFloat);
Sum matmul([&](ParameterList& v) {
  ExprHandle m = v[0];
  ExprHandle n = v[1];
  ExprHandle k = v[2];
  return tA(m, k) * tB(k, n);
});
Tensor* mm = Reduce("mm", {{M, "m"}, {N, "n"}}, matmul, {{K, "k"}});
```

A fully specialized Reduction:
```
VarHandle searchValue("searchValue", kInt);
Buffer b(BufHandle("b", {4, 10}), kInt);

Reducer anyEqSV(
    ExprHandle(0),
    [](ExprHandle a, ExprHandle b) {
      return CompareSelect::make(a, 1, 1, b, kEQ);
    },
    [&](ParameterList& v) {
      return CompareSelect::make(b.call(v), searchValue, kEQ);
    });

Tensor* any = Reduce("anyEqual", {{4, "i"}}, anyEqSV, {{10, "j"}});
```

 ---

Until lowering, Reductions are held in a compound form for easier optimization:
```
  VarHandle m("m", kInt);
  Buffer b(BufHandle("b", {2, 3, m}), kFloat);

  Tensor* c = Reduce("sum", {{2, "l"}, {3, "n"}}, Sum(b), {{m, "m"}});
  LoopNest loop({c});
  std::cout << *loop.root_stmt() << "\n";
```
```
for (int l = 0; l < 2; l++) {
  for (int n = 0; n < 3; n++) {
    for (int m = 0; m < m_1; m++) {
      sum[l, n] = ReduceOp(sum[l, n] = float(0);, (sum[l, n]) + (b[l, n, m]), {m});
    }
  }
}
```
```
  loop.prepareForCodegen();
  std::cout << *loop.root_stmt() << "\n";
```
```
for (int l = 0; l < 2; l++) {
  for (int n = 0; n < 3; n++) {
    sum[(0 + l * (1 * 3)) + n * 1] = float(0);
    for (int m = 0; m < m_1; m++) {
      sum[(0 + l * (1 * 3)) + n * 1] = (sum[(0 + l * (1 * 3)) + n * 1]) + (b[((0 + l * ((1 * m_1) * 3)) + n * (1 * m_1)) + m * 1]);
    }
  }
}
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35866

Differential Revision: D20965577

Pulled By: nickgg

fbshipit-source-id: afe506c90db794447180056417013bcaf0e2c049
2020-04-10 11:57:10 -07:00
Nick Gibson
477f1c047c [TensorExpr] add simplication of constant branches to IR Simplifier (#36257)
Summary:
Adds handling of constant branches to the TensorExpr IR Simplifier. This covers both IfThenElse and Cond, when the condition expression is a known constant (e.g. `IfThenElse(1, X, Y) => X`) or when both arms of the branch are the same (e.g. `IfThenElse(Y, X, X) => X`).
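
A standalone sketch of the two folds over a toy IR (hypothetical types, not the actual TensorExpr classes):

```
#include <memory>

struct Expr { virtual ~Expr() = default; };
using ExprPtr = std::shared_ptr<Expr>;
struct Const : Expr { int value; explicit Const(int v) : value(v) {} };
struct IfThenElse : Expr { ExprPtr cond, true_arm, false_arm; };

ExprPtr simplify(const std::shared_ptr<IfThenElse>& ite) {
  // IfThenElse(Y, X, X) => X: identical arms, condition irrelevant.
  if (ite->true_arm == ite->false_arm) return ite->true_arm;
  // IfThenElse(1, X, Y) => X (and 0 => Y): known-constant condition.
  if (auto c = std::dynamic_pointer_cast<Const>(ite->cond)) {
    return c->value ? ite->true_arm : ite->false_arm;
  }
  return ite;  // condition unknown and arms differ: leave as-is
}
```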
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36257

Differential Revision: D20947777

Pulled By: nickgg

fbshipit-source-id: 974379e42a6d65ce3e7178622afb62d36ad4e380
2020-04-09 14:45:13 -07:00
Christian Sarofeen
e551bfc8de New CUDA Fuser code lowering refactor (#36199)
Summary:
This PR completely refactors the code lowering process from our IR to CUDA. Before, we had one giant step that went from a relatively high-level IR straight to CUDA; now we first lower into concepts like ForLoop, IfThenElse, TensorIndex, and Allocate. This lowering will allow us to do more complex code lowering like reductions and unrolling. Unrolling will quickly follow this PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36199

Reviewed By: dzhulgakov

Differential Revision: D20925220

Pulled By: soumith

fbshipit-source-id: 8f621c694c68a1aad8653e625d7287fe2d8b35dc
2020-04-09 14:27:05 -07:00
Nick Gibson
1443db8dc3 [TensorExpr] fix bug in IRSimplifier when multiplying by 0 (#36287)
Summary:
In the IR Simplifier we were not treating multiplication by zero specially, which meant some expressions that were actually constant were stored in forms that were not constant.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36287

Differential Revision: D20937497

Pulled By: nickgg

fbshipit-source-id: 528e430313ea048524d7a4a0256eef4a0297438b
2020-04-09 09:55:16 -07:00
Nick Gibson
caa45c8e33 [TensorExpr] fix warnings (#36167)
Summary:
Fix a bunch of minor warnings in jit/tensorexpr, mostly unused variables and signed/unsigned comparison mismatches.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36167

Differential Revision: D20905081

Pulled By: nickgg

fbshipit-source-id: 16fe605a86f08596f64e74e9337c59a2581a4d5a
2020-04-08 15:42:29 -07:00
Nick Gibson
195362d74c [TensorExpr] scalar factorization of Div (#36154)
Summary:
Add support to the TensorExpr IR Simplifier for factoring out common scalar terms on either side of a Div node, e.g. `(8 * x) / (4 * y) => (2 * x) / y`.
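
The scalar arithmetic behind the factorization, as a hypothetical helper (for positive operands, dividing both scalar coefficients by their GCD preserves truncating integer division):

```
#include <numeric>
#include <utility>

// {8, 4} -> {2, 1}, turning (8 * x) / (4 * y) into (2 * x) / y.
std::pair<int, int> factorize_div(int lhs_scalar, int rhs_scalar) {
  int g = std::gcd(lhs_scalar, rhs_scalar);
  return {lhs_scalar / g, rhs_scalar / g};
}
```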
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36154

Differential Revision: D20910580

Pulled By: nickgg

fbshipit-source-id: ee071d93bc4711b1e710be312de599d18ab506f3
2020-04-08 11:56:07 -07:00
Mike Ruberry
3570ef6a0f Revert D20876204: [pytorch][PR] Add trivial reduce for Cuda
Test Plan: revert-hammer

Differential Revision:
D20876204

Original commit changeset: a719f3583cc4

fbshipit-source-id: 6d00afb3a24754d283a7b832c0b784ed9fce36e1
2020-04-06 20:17:04 -07:00
Xiaoqiang Zheng
a81be33a4e Add trivial reduce for Cuda (#36092)
Summary:
Detect non-read-only loads, and do not use __ldg for them.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36092

Reviewed By: ZolotukhinM

Differential Revision: D20876204

Pulled By: zheng-xq

fbshipit-source-id: a719f3583cc4ca30fcfb49d999ca785181354d84
2020-04-06 17:58:50 -07:00
Martin Yuan
82087ee7f6 Add DICT_CONSTRUCT and NAMED_TUPLE_CONSTRUCT to lite interpreter (#36015)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/36015

Test Plan: Imported from OSS

Reviewed By: linbinyu

Differential Revision: D20853995

Pulled By: iseeyuan

fbshipit-source-id: 153f76d223f9ffc71e2259b741a7e5d78ae63f22
2020-04-04 09:52:58 -07:00
Will Feng (FAIAR)
5fab1bf3e4 Use std::abs instead of abs in lbfgs.cpp (#35974)
Summary:
This supersedes https://github.com/pytorch/pytorch/pull/35698.

`abs` is a C-style function that takes only an integral argument.
`std::abs` is polymorphic and can be applied to both integral and floating-point types.

This PR also increases `kBatchSize` in the `test_optimizer_xor` function in `test/cpp/api/optim.cpp` to fix an `OptimTest.XORConvergence_LBFGS` failure under ASAN.
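
A minimal demonstration of the pitfall:

```
// On implementations where unqualified abs resolves to C's int abs(int),
// a double argument is truncated to int before the absolute value is taken.
#include <cmath>
#include <iostream>

int main() {
  double d = -2.7;
  std::cout << std::abs(d) << "\n";  // 2.7: picks the double overload
  // an unqualified abs(d) may print 2 if it binds to the C function
  return 0;
}
```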
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35974

Test Plan: CI

Reviewed By: pbelevich

Differential Revision: D20853570

Pulled By: yf225

fbshipit-source-id: 6135588df2426c5b974e4e097b416955d1907bd4
2020-04-04 09:37:21 -07:00
Ashkan Aliabadi
b7f4b6a6de Support for XNNPACK max pooling operator. (#35354)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/35354

Differential Revision: D20821862

Test Plan: Imported from OSS

Pulled By: AshkanAliabadi

fbshipit-source-id: 156fb8db85ab194919f68fd99599f08f2647b695
2020-04-03 22:53:15 -07:00
Ilia Cherniavskii
a604041a11 Back out "[pytorch][PR] indexing: throw exception for masks with dtype=uint8" (#36013)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36013

Original commit changeset: f4ebaabf427d

Test Plan: CI

Differential Revision: D20853694

fbshipit-source-id: 93deb43f67a385ddfd6853fef6f1dc6de408ec37
2020-04-03 21:40:02 -07:00
Pavel Belevich
4b64dffcb6 Move uniform_() to DistributionTemplates(Migrate uniform_ from TH to ATen) (#35580)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35580

`uniform_kernel_cpu` is based on https://github.com/pytorch/pytorch/pull/30954

Test Plan: Imported from OSS

Differential Revision: D20820221

Pulled By: pbelevich

fbshipit-source-id: 13f9fc8fc75b0e9fb48021f2ac08dcb38212a53f
2020-04-03 16:37:44 -07:00
Nikita Shulga
03a4a4887d Fix clang-format (#35969)
Summary:
Just run `./tools/clang_format.py  --verbose` and `git commit --all`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35969

Test Plan: CI

Differential Revision: D20845626

Pulled By: malfet

fbshipit-source-id: 0ae9a91dfa33417a021e7e9d233baba4188daf81
2020-04-03 14:36:20 -07:00
davidriazati
596153cad1 [jit] Enable type tags in serialization (#35741)
Summary:
This enables the serialization part of this change (the deserialization part already landed in #33255).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35741

Pulled By: driazati

Differential Revision: D20758124

fbshipit-source-id: e2cdefa99c3bec991491e5e967e7f1661ca7ffd9
2020-04-03 11:59:42 -07:00
Song Zhou
dabeff33b9 [pytorch] Fix fblearner flow compiling errors (#35902)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35902

Move operator registration to an anonymous namespace to avoid collisions.
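
The pattern, sketched generically (hypothetical names):

```
// Objects in an anonymous namespace have internal linkage, so two translation
// units can each define a registrar without colliding at link time.
namespace {
struct OperatorRegistrar {
  OperatorRegistrar() { /* register the operator here */ }
};
OperatorRegistrar registrar;  // local to this .cpp file
}  // namespace
```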

Reviewed By: soumith

Differential Revision: D20822382

fbshipit-source-id: 1ab00871491668b8b85e803ac877d96477f1688b
2020-04-02 14:52:48 -07:00
Mikhail Zolotukhin
3ef5ff6012 [TensorExpr] Make Load and Store multi-dimensional. (#35800)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35800

This PR includes the following changes:
* Introduce a new `Expr` type `Buf`: it plays a role similar to `Var`, but also has dimensions.
* Use the new `Buf` class in `Store` and `Load` instead of `Var` for specifying where to store to or load from. `Buf` contains the dimension info of the buffer we're loading from/storing to, and hence we are able to keep N-d indexes without flattening them into a 1-d index ([x,y] vs [x+y*W]).
* Flattening of the indexes is now a separate pass that is executed in `LoopNest::prepareForCodegen` - backends still expect indexes to be flattened, and this PR preserves that (see the sketch after this list).
* `Tensor` now contains a `Buf` instead of a `Var`, and thus Tensor now has the dimension info (previously it was a property of a `Function`, not a `Tensor`). This brings us closer to Tensor being a combination of Buffer + Function, where Buffer specifies the iteration domain and the Function defines the computation.
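
A minimal sketch of that flattening step (hypothetical helper, not the LoopNest API):

```
// The N-d index [x, y] into a buffer of width W only becomes the 1-d index
// x + y * W when prepareForCodegen runs.
int flatten_index(int x, int y, int W) {
  return x + y * W;
}
```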

TODOs:
* Consider merging `Buffer` with `Buf` or `BufHandle`. It seems that we don't need all of them.
* Harden the logic of how we create buffers in the fuser pass. Currently it seems that sometimes we don't set dimensions.
* Use `Buf` in `Allocate` and `Free`.
* Make it clearer that `Function` doesn't "own" dimensions info and that dimensions are a property of a Tensor, not a Function.

Differential Revision: D20789005

Test Plan: Imported from OSS

Reviewed By: zheng-xq

Pulled By: ZolotukhinM

fbshipit-source-id: e04188d1d297f195f1c46669c614557d6bb6cde4
2020-04-02 11:18:28 -07:00
Christian Sarofeen
6d24f8fe21 Infrastructure for a new CUDA Fuser (#34785)
Summary:
**Summary:** This PR contains the infrastructure of a new CUDA fuser. This CUDA fuser is based on many of the same principles as TensorExpressions and Halide, but the implementation is built from the ground up. The fusion pass itself is similar to the default CUDA fuser; however, it has undergone some refactoring and uses the new code generation infrastructure.

For those interested in how the code generation in this PR works, I would recommend reviewing _test/cpp/jit/test_gpu_fusion.cpp_ as well as the long comment section at the beginning of _torch/csrc/jit/codegen/cuda/transform_replay.h_.

One of the largest differences between our approach and that of TVM/Halide is the concept of "TensorView". At a high level, a TensorView should be thought of similarly to how we think of working with Tensors in PyTorch: it's an N-D object which can undergo transformations that change its dimensionality. Dimensionality changes are done through the operations split/merge/reorder/computeAt. These transformations are similar to split/fuse/reorder/compute_at in TVM; they modify how a tensor is iterated over to generate GPU code. Interestingly, in our scheme these transformations are applied to tensors and only impact how that tensor is generated.

**Warning:** This PR is purposefully not feature complete with the current fuser. We wanted to separate out the infrastructure from the fusion capabilities. Once in, smaller incremental PRs will be submitted to expand capabilities of the fuser.

**Short term goals:**

Parity with current CUDA fuser (including performance):
- Dynamic shapes (no recompilation)
- Implicit handling of broadcast (broadcasted tensors are treated as tensors of the broadcasted size in the generated code)
- Dropout

**Mid-term goals:**

- Transposes fused with pointwise operations where transpose involves only 2 axes (across the fused operation).
- 1-D reductions fused with pointwise operations
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34785

Reviewed By: ZolotukhinM

Differential Revision: D20650977

Pulled By: soumith

fbshipit-source-id: ee39c95a880e1b9822e874ed4cc180971572bf63
2020-04-02 09:22:42 -07:00
Nick Gibson
051132f119 [TensorExpr] simplification of round + mod pattern. (#35683)
Summary:
Adds capabilities to the TensorExpr IR Simplifier to simplify Round + Mod patterns (e.g. `(x/y)*y + x%y => x`) by lifting integer rounding into a temporary `RoundOff` node.

This integrates with existing simplification mechanisms (folding, factorization, reordering, etc.) to allow simplification of compound expressions, e.g. `20 * (x / (16 / 2)) * 2 + (11 % 6) * (x % (7+1)) => 5 * x`.
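
The integer identity behind the `RoundOff` fold can be checked directly (C++ division truncates toward zero, and by definition `(x / y) * y + x % y == x`):

```
#include <cassert>

int main() {
  for (int x = -20; x <= 20; ++x) {
    for (int y = 1; y <= 8; ++y) {
      assert((x / y) * y + x % y == x);
    }
  }
  return 0;
}
```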

Tests: ran the tensorexpr cpp and python tests, ran an HPC benchmark, and verified results and time didn't regress.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35683

Differential Revision: D20811316

Pulled By: nickgg

fbshipit-source-id: 0cd6a517fb9548b3bc689768304b97375df5ac58
2020-04-02 00:11:00 -07:00
Ilia Cherniavskii
bc6bd0bb1a Debug Information Guard
Summary: This diff fixes issues with the current handling of debug information passed along the execution of the model. (For example, multiple calls to the debug guard could override each other.)

Test Plan: CI test/cpp/jit

Reviewed By: dzhulgakov

Differential Revision: D20602775

fbshipit-source-id: 4683957954028af81a1a0f1f12b243650230c9bb
2020-04-01 01:55:29 -07:00
Wojciech Baranowski
2f84a07b58 indexing: throw exception for masks with dtype=uint8 (#34418)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/33751
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34418

Differential Revision: D20776164

Pulled By: ngimel

fbshipit-source-id: f4ebaabf427d7967f2f317235562f91c8f9216f0
2020-03-31 20:51:56 -07:00
Ilia Cherniavskii
800d5617c0 Recording of TorchScript functions (#34710)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34710

Extending the RecordFunction API to support new recording scopes (such as TorchScript functions), as well as giving more flexibility in setting the sampling rate.

Test Plan: unit test (test_misc.cpp/testRecordFunction)

Reviewed By: gdankel, dzhulgakov

Differential Revision: D20158523

fbshipit-source-id: a9e0819d21cc06f4952d92d43246587c36137582
2020-03-31 00:33:23 -07:00
Nick Gibson
5b3492df18 [TensorExpr] Extend arithmetic simplifier to work with multi variable expressions (Attempt 2) (#35415)
Summary:
https://github.com/pytorch/pytorch/pull/35127 was landed and reverted because I missed a test failure (oops). I have found and fixed the issue, which was due to zero terms being introduced after the point that filters them out (this usually requires NAN/INF, e.g. `x / INF => 0`).

See https://github.com/pytorch/pytorch/pull/35127 for more info.
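
The late-zero case described, checked directly:

```
// A finite value divided by infinity only becomes a zero term once the INF
// is known, i.e. after earlier zero-filtering has already run.
#include <cassert>
#include <cmath>

int main() {
  double x = 42.0;
  assert(x / INFINITY == 0.0);
  return 0;
}
```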
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35415

Reviewed By: ZolotukhinM

Differential Revision: D20702957

Pulled By: nickgg

fbshipit-source-id: 119eb41e9fa676bd78e3d1df99297a47ae312185
2020-03-28 00:19:55 -07:00
Nikita Shulga
b9adbb5002 Fix/relax CMake linter rules (#35574)
Summary:
Ignore mixed upper-case/lower-case style for now.
Fix violations of the no-space-between-function-and-arguments rule.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35574

Test Plan: CI

Differential Revision: D20712969

Pulled By: malfet

fbshipit-source-id: 0012d430aed916b4518599a0b535e82d15721f78
2020-03-27 16:52:33 -07:00
Nikolay Korovaiko
9e22d15f14 Enable tensorexpr cpp tests in CI. try #2 (#35454)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/35454

Differential Revision: D20665160

Pulled By: Krovatkin

fbshipit-source-id: e04cbe92b2ee5a3288f3c4e5c83533bfea85bf85
2020-03-27 12:09:55 -07:00
anjali411
5371fdb1a0 [C++ API Parity] [Optimizers] Merged Optimizer and LossClosureOptimizer (#34957)
Summary:
1. Removed LossClosureOptimizer, and merged Optimizer into OptimizerBase (renaming the merged class to Optimizer)
2. Merged the LBFGS-specific serialize test function and the generic test_serialize_optimizer function.
3. Added a BC-compatibility serialization test for LBFGS
4. Removed mentions of parameters_ in optimizer.cpp, de-virtualized all functions
5. Made defaults_ an optional argument in all optimizers except SGD

**TODO**: add BC-breaking notes for this PR

Pull Request resolved: https://github.com/pytorch/pytorch/pull/34957

Test Plan: Imported from GitHub, without a `Test Plan:` line.

Differential Revision: D20678162

Pulled By: yf225

fbshipit-source-id: 74e062e42d86dc118f0fbaddd794e438b2eaf35a
2020-03-26 19:53:02 -07:00
Meghan Lele
6384c2d81b [JIT] clang-format JIT code (#35115)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35115

This commit runs the newly added tools/clang_format.py on the JIT
codebase and includes all of the formatting changes thus produced.

Testing:
Ran the script, CI.

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D20568523

Pulled By: SplitInfinity

fbshipit-source-id: e09bdb982ccf090eecfb7c7b461b8d0681eef82b
2020-03-26 11:24:51 -07:00
Edward Yang
843fd740fb Revert D20645945: [pytorch][PR] [C++ API Parity] [Optimizers] Merged Optimizer and LossClosureOptimizer
Test Plan: revert-hammer

Differential Revision:
D20645945

Original commit changeset: 383588065bf1

fbshipit-source-id: 6d7bc5676de64e329d9862889f32033c76b4009c
2020-03-26 06:40:34 -07:00
Suraj Menon
aa01a95c6d Revert D20630760: [pytorch][PR] Enable NNC tests vol. i. add test_tensorexpr.py tests [WIP]
Test Plan: revert-hammer

Differential Revision:
D20630760

Original commit changeset: 7d2f27aca6b1

fbshipit-source-id: 28ac92b3390651a4a67061d6ebf208515b9b9463
2020-03-25 20:34:46 -07:00
anjali411
efbd6b8533 [C++ API Parity] [Optimizers] Merged Optimizer and LossClosureOptimizer (#34957)
Summary:
1. Removed LossClosureOptimizer, and merged Optimizer into OptimizerBase (renaming the merged class to Optimizer)
2. Merged the LBFGS-specific serialize test function and the generic test_serialize_optimizer function.
3. Added a BC-compatibility serialization test for LBFGS
4. Removed mentions of parameters_ in optimizer.cpp, de-virtualized all functions
5. Made defaults_ an optional argument in all optimizers except SGD

**TODO**: add BC-breaking notes for this PR

Pull Request resolved: https://github.com/pytorch/pytorch/pull/34957

Differential Revision: D20645945

Pulled By: yf225

fbshipit-source-id: 383588065bf1859b38f0ad0a25d93d41e153c96e
2020-03-25 18:26:02 -07:00