Commit Graph

922 Commits

Author SHA1 Message Date
Mike Ruberry
ddea6c552f
Ports full dtype inference deprecation to 1.6 (#40799)
* ports full deprecation

* fixtures

* Fixes lint

* Trying to fix phantom lint issue

* nuclear lint option

* Paradoxical linter fix

Co-authored-by: Mike Ruberry <mruberry@devfair044.maas>
2020-07-01 09:27:27 -07:00
Mike Ruberry
75a074abdc
1.6 Port: Dynamic Versioning (#40542)
Co-authored-by: Mike Ruberry <mruberry@devfair044.maas>
2020-06-30 10:18:18 -07:00
Deepali Chourasia
02e091902f Release DistAutogradContainer context for each dist_autograd test case (#38711)
Summary:
this fixes - https://github.com/pytorch/pytorch/issues/38710
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38711

Differential Revision: D22132057

fbshipit-source-id: 894280d164543c63beaec679c18f2059e7055b01
2020-06-18 20:58:55 -07:00
Xiang Gao
954a59a2f5 Add at::tensor(complex) and torch::tensor(complex) overload (#39793)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/39793

Differential Revision: D22067181

Pulled By: anjali411

fbshipit-source-id: 3cec1289a8aa3a9cc6bd1fcdb2974f858f75f7bd
2020-06-18 16:20:27 -07:00
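Not from the PR itself — a minimal C++ sketch of what the new torch::tensor(complex) overload enables; the exact accepted element type (c10::complex<double> here) and the brace-initializer form are assumptions based on the commit title:
```
#include <torch/torch.h>
#include <iostream>

int main() {
  // Hypothetical usage: build a complex tensor directly from complex scalars.
  auto t = torch::tensor({c10::complex<double>(1.0, 2.0),
                          c10::complex<double>(3.0, -4.0)});
  std::cout << t << "\n";  // expected to print a tensor with a complex dtype
}
```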
Sotiris Lamprinidis
41f2dbde31 Add AdamW to C++ frontend (#40009)
Summary:
A slightly modified Adam, following the Python implementation; the `ProducesPyTorchValues` tests pass. I had a problem with another test, though (see commit c1a6241676ab84fc531c1c3a10f964aa5704092e): it seems that optimizing for two steps with the same optimizer vs. optimizing for two steps using freshly initialized objects will produce the same output.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40009

Differential Revision: D22096053

Pulled By: glaringlee

fbshipit-source-id: a31a8f5488cb37c53752ddf15436efabdba67dc4
2020-06-18 15:28:12 -07:00
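A hedged usage sketch (not from the PR) of the new optimizer from the C++ frontend; the option names below are assumed to mirror the Python implementation:
```
#include <torch/torch.h>

int main() {
  auto model = torch::nn::Linear(4, 1);
  torch::optim::AdamW optimizer(
      model->parameters(),
      torch::optim::AdamWOptions(1e-3).weight_decay(1e-2));

  auto input = torch::randn({8, 4});
  auto target = torch::randn({8, 1});
  for (int step = 0; step < 2; ++step) {
    optimizer.zero_grad();
    auto loss = torch::mse_loss(model->forward(input), target);
    loss.backward();
    optimizer.step();
  }
}
```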
Mikhail Zolotukhin
79450edad3 [JIT] IRParser: properly parse negative numbers. (#39981)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/39981

Test Plan: Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D22032786

Pulled By: ZolotukhinM

fbshipit-source-id: b6c5237ac5c1c331d5053a620eb9a37a4c698125
2020-06-15 12:28:41 -07:00
Jeremy Lilley
569c85b45d [futures] Add assert to Future constValue() accessor, add hasValue(). (#39950)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39950

Per the comment in the code, constValue() should only be used in
the case where the future was complete and value was not an error.
Add an assert to enforce this.

Also, add hasValue() accessor for completeness.
ghstack-source-id: 105815597

Test Plan: buck test mode/dev-nosan caffe2/test/cpp/jit:

Differential Revision: D22021776

fbshipit-source-id: b59b6c775eab344068a76f4cd8c3a9dc1f2a174e
2020-06-15 12:11:22 -07:00
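A minimal sketch of the contract described above, assuming the c10::ivalue::Future API of this vintage (make_intrusive construction, markCompleted, and the new hasValue() accessor):
```
#include <ATen/core/ivalue.h>
#include <ATen/core/jit_type.h>
#include <c10/util/Exception.h>

void futureExample() {
  auto fut = c10::make_intrusive<c10::ivalue::Future>(c10::IntType::get());
  TORCH_INTERNAL_ASSERT(!fut->hasValue());
  fut->markCompleted(c10::IValue(42));
  if (fut->hasValue()) {
    c10::IValue v = fut->constValue();  // safe: completed and not an error
  }
}
```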
Kurt Mohler
124cdf2290 Add experimental deterministic flag (#38683)
Summary:
Adds a `torch.experimental.deterministic` flag to enforce deterministic algorithms across all of PyTorch.
Adds `torch.experimental.deterministic_error_level` to allow users to choose between error/warning/silent when determinism for an operation is not available.
Adds `torch.experimental.alert_not_deterministic()`, which should be called within operations that are not deterministic.
Offers both Python and ATen interfaces.

Issue https://github.com/pytorch/pytorch/issues/15359
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38683

Differential Revision: D21998093

Pulled By: ezyang

fbshipit-source-id: 23aabbddd20f6199d846f97764ff24d728163737
2020-06-12 08:44:06 -07:00
Jerry Zhang
004aa089a6 [jit][subgraph_rewriter] Support list of filters (#39867)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39867

Support a list of filters in the subgraph rewriter: the rewrite will execute only
when the match passes all filter checks. This is useful for letting different matches
share the same filter.

Test Plan: Imported from OSS

Differential Revision: D22009855

fbshipit-source-id: 67aab8d6326b2011a9061397699dc62ee9ad4e2d
2020-06-12 08:24:49 -07:00
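A hedged sketch of how a list of filters might be passed, with the filter signature assumed from the pre-existing single-filter overload; the predicates here are placeholders:
```
#include <torch/csrc/jit/ir/ir.h>
#include <torch/csrc/jit/ir/subgraph_matcher.h>
#include <torch/csrc/jit/passes/subgraph_rewrite.h>

#include <functional>
#include <memory>
#include <string>
#include <unordered_map>

using Filter = std::function<bool(
    const torch::jit::Match&,
    const std::unordered_map<std::string, torch::jit::Value*>&)>;

void rewriteWithFilters(std::shared_ptr<torch::jit::Graph> graph,
                        const std::string& pattern,
                        const std::string& replacement) {
  torch::jit::SubgraphRewriter rewriter;
  rewriter.RegisterRewritePattern(pattern, replacement);

  Filter filter_a = [](const torch::jit::Match&,
                       const std::unordered_map<std::string, torch::jit::Value*>&) {
    return true;  // placeholder predicate
  };
  Filter filter_b = filter_a;  // a second, reusable check

  // The rewrite fires only where *both* filters accept the match.
  rewriter.runOnGraph(graph, {filter_a, filter_b});
}
```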
Christian Sarofeen
80e5ebf989 [nvFuser] Transform replay refactor and minor updates (#39579)
Summary:
We've got quite a few things going on, preparing a push back to upstream so we don't get too desynced.

- Major refactor of transform replay. It is now far more robust and fixes bugs discovered in reductions. Preparing for extension to explicit broadcast ops which will be the last major memory pattern for op coverage. Broadcast ops will allow us to express up to and potentially beyond norms and gemms.

- Initial runtime expression evaluator. This allows us to evaluate expressions at runtime. Will be useful for determining our grid/block layout at runtime, so we don't have to manually compute them according to the code we're trying to generate.

- Moving to int64 and double for scalar representations to match PyTorch JIT.

- Improvements in the codegen interface, where we return a Tensor-like object instead of the parent class Val.

- Add `addcmul` and `lerp` ops

- General updates, fixes, test additions, test improvements.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39579

Differential Revision: D21974001

Pulled By: soumith

fbshipit-source-id: 7f7ccc91593466e948f3ce90f8f9b7fbc5c28de2
2020-06-11 23:04:24 -07:00
Nick Gibson
63dc1363e6 [TensorExpr] Eliminate Cond statements when each branch is a different kind of empty (#39754)
Summary:
Fix another simplification edge case: a Cond statement where one branch is nullptr and the other is a zero-statement block. This happens mostly with an if with no else branch where all statements inside the if are removed (e.g. via inlining or simplification). A common case is SplitWithMask -> ComputeInline.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39754

Differential Revision: D21962987

Pulled By: nickgg

fbshipit-source-id: 2461415466fbbab88d2329061f90fcfdfa85e243
2020-06-11 17:08:14 -07:00
Nick Gibson
2b29feace4 [TensorExpr] Fix IRPrinter for function calls with uniqued names (#39753)
Summary:
IRPrinter was using the name_hint rather than the uniqued name when printing FunctionCalls, leading to output that appeared incorrect.

E.g. for the following set of tensorexprs:
```
producer[M, N] = M * N;
chunk[M, N/2] = producer[M, N];
chunk_1[M, N/2] = producer[M, N + N/2];
consumer[M, N] = chunk_1[M, N];
```

Before fix:
```
{
  for (int m = 0; m < 4; m++) {
    for (int n = 0; n < 20; n++) {
      producer[m, n] = m * n;
    }
  }
  for (int m_1 = 0; m_1 < 4; m_1++) {
    for (int n_1 = 0; n_1 < 10; n_1++) {
      chunk[m_1, n_1] = producer(m_1, n_1);
    }
  }
  for (int m_2 = 0; m_2 < 4; m_2++) {
    for (int n_2 = 0; n_2 < 10; n_2++) {
      chunk_1[m_2, n_2] = producer(m_2, n_2 + 10);
    }
  }
  for (int i = 0; i < 4; i++) {
    for (int j = 0; j < 10; j++) {
      consumer[i, j] = i * (chunk(i, j));          <----- HERE!
    }
  }
}
```

After fix:
```
{
  for (int m = 0; m < 4; m++) {
    for (int n = 0; n < 20; n++) {
      producer[m, n] = m * n;
    }
  }
  for (int m_1 = 0; m_1 < 4; m_1++) {
    for (int n_1 = 0; n_1 < 10; n_1++) {
      chunk[m_1, n_1] = producer(m_1, n_1);
    }
  }
  for (int m_2 = 0; m_2 < 4; m_2++) {
    for (int n_2 = 0; n_2 < 10; n_2++) {
      chunk_1[m_2, n_2] = producer(m_2, n_2 + 10);
    }
  }
  for (int i = 0; i < 4; i++) {
    for (int j = 0; j < 10; j++) {
      consumer[i, j] = i * (chunk_1(i, j));          <----- HERE!
    }
  }
}
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39753

Differential Revision: D21962441

Pulled By: nickgg

fbshipit-source-id: caa429caf0df9c7b16e109937412587bff6dc886
2020-06-11 12:13:28 -07:00
Nikolay Korovaiko
7f55197a57 Peel Loop (#39434)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/39434

Differential Revision: D21857037

Pulled By: Krovatkin

fbshipit-source-id: 6583da167fe93d96e93f1c3d71f46f94e7f4e982
2020-06-10 13:48:18 -07:00
Yanan Cao
c22bbb2124 [JIT] Add Type::repr_str to return human-readable str (#39544)
Summary:
Clearly expressing that a type was inferred by PyTorch, rather than explicitly annotated by the user, makes many error messages more user-friendly.

Currently, Type has two string conversion methods: str() for IR printing and python_str() for serialization and error message generation. If we want to include more information in type printing while maintaining serialization/deserialization correctness, we need to split python_str() into annotation_str() and repr_str().

annotation_str() is solely responsible for serialization; it strictly matches the format of a Python type annotation. repr_str() is responsible for generating a human-readable error message that includes information like "this type is inferred, not explicitly annotated".

Closes https://github.com/pytorch/pytorch/issues/39449
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39544

Differential Revision: D21978759

Pulled By: gmagogsfm

fbshipit-source-id: 733566f5a62e748b5ca4bb3c5943ebb6d5b664d0
2020-06-10 12:01:24 -07:00
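An assumed usage sketch (not from the PR) of the resulting split between the two methods:
```
#include <ATen/core/jit_type.h>
#include <iostream>

void typeStrings() {
  c10::TypePtr t = c10::ListType::ofTensors();
  // annotation_str() must stay a valid Python annotation for serialization.
  std::cout << t->annotation_str() << "\n";  // e.g. "List[Tensor]"
  // repr_str() is free to carry extra, human-oriented detail for error messages.
  std::cout << t->repr_str() << "\n";
}
```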
Elias Ellison
2193fa119e [JIT] consider side effects when trying moves in alias analysis (#39497)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39497

Previously, we didn't consider side effects at all when moving nodes in alias analysis. It is never valid to reorder a node with a side effect. This has led to bugs when used with Bailouts.

Unfortunately this might cause regressions, but it wasn't correct before :/

Test Plan: Imported from OSS

Differential Revision: D21963774

Pulled By: eellison

fbshipit-source-id: 656995d1b82534eca65437ed4e397b2bf08a4dec
2020-06-09 19:32:55 -07:00
Jeremy Lilley
be3bbfc917 [futures] Add collectAny() to ivalue::Future for completeness (#39597)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39597

To complement collectAll(), this change adds collectAny(), and writes
up relevant unittest coverage.

We also remove the vector-based helper version of collectAll(), which
was of debatable usefulness in a previous change.
ghstack-source-id: 105527180

Test Plan: buck test mode/dev-nosan caffe2/test/cpp/jit/...

Differential Revision: D21910311

fbshipit-source-id: dbb3ca404672a3d751b1b3cf016e6084a9ff8040
2020-06-09 16:32:52 -07:00
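An assumed usage sketch of the two helpers: collectAll() resolves once every input future has completed, collectAny() as soon as any one has. The List construction with an explicit FutureType element type is an assumption based on the workaround mentioned in the collectAll() change below.
```
#include <ATen/core/ivalue.h>
#include <ATen/core/jit_type.h>
#include <ATen/core/List.h>

void collectExample() {
  auto f1 = c10::make_intrusive<c10::ivalue::Future>(c10::IntType::get());
  auto f2 = c10::make_intrusive<c10::ivalue::Future>(c10::IntType::get());

  c10::List<c10::intrusive_ptr<c10::ivalue::Future>> futs(
      c10::FutureType::create(c10::IntType::get()));
  futs.push_back(f1);
  futs.push_back(f2);

  auto any = c10::collectAny(futs);  // completes as soon as f1 or f2 does
  auto all = c10::collectAll(futs);  // completes once both have completed

  f1->markCompleted(c10::IValue(1));
  f2->markCompleted(c10::IValue(2));
}
```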
Jeremy Lilley
b83fed8d4c [futures] Add c++ ivalue::Future collectAll() helper (#39119)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39119

Add some base C++ unittest coverage for ivalue::Future and, in the
process, add a basic collectAll() primitive, per 38937.

While doing so, I realized that List<Future> is effectively
impossible to construct (since the Future's type is not templated
but rather passed in, getTypePtr_<T>::call() isn't defined),
so I added a workaround in List to make it possible.
ghstack-source-id: 105309650

Test Plan: buck test mode/dev-nosan caffe2/test/cpp/jit/...

Differential Revision: D21756884

fbshipit-source-id: 5d40c8d1c55098de5497655c7b887f4f56508a37
2020-06-08 05:52:09 -07:00
Linbin Yu
b28422d444 add overload name for str cmp (#39607)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39607

Add an overload name for the string comparison macro to prevent duplicated op names in the lite interpreter.

Also reformatted some other files.

Test Plan:
Verified that these op schemas changed:

```
-aten::eq(str a, str b) -> (bool)
+aten::eq.str(str a, str b) -> (bool)

-aten::ne(str a, str b) -> (bool)
+aten::ne.str(str a, str b) -> (bool)

-aten::lt(str a, str b) -> (bool)
+aten::lt.str(str a, str b) -> (bool)

-aten::gt(str a, str b) -> (bool)
+aten::gt.str(str a, str b) -> (bool)

-aten::le(str a, str b) -> (bool)
+aten::le.str(str a, str b) -> (bool)

-aten::ge(str a, str b) -> (bool)
+aten::ge.str(str a, str b) -> (bool)
```

Reviewed By: iseeyuan

Differential Revision: D21913049

fbshipit-source-id: 518db068c8c5b0efd19223f0bd94fc3351335dc4
2020-06-06 23:21:35 -07:00
Jerry Zhang
3669e45736 [jit][subgraph_matcher] Enable regex matching for string attributes of node (#39454)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/39454

Test Plan: Imported from OSS

Differential Revision: D21876224

fbshipit-source-id: c0fdff3a4532d2a73b222353e2cad6cf52444697
2020-06-05 23:03:38 -07:00
Nikolay Korovaiko
97a2918a07 reduce number of bailout nodes (#38281)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/38281

Differential Revision: D21665509

Pulled By: Krovatkin

fbshipit-source-id: c2c34b759aec30d0a161e582030ba994192ee4ec
2020-06-05 13:45:37 -07:00
Nick Gibson
d31e84497c [TensorExpr] some cleanups / fixes for LoopOptions (#39408)
Summary:
Mainly, fix a bug in the HashProvider where it would not include LoopOptions in the hash, meaning two loops would be seen as identical even if they were bound to different thread/block axes. Also added symbolic names for the different axis options.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39408

Differential Revision: D21864494

Pulled By: nickgg

fbshipit-source-id: 9c28729984e7a3375e026c78294c9f75b9015123
2020-06-03 15:11:59 -07:00
Nick Gibson
2ed4ed8733 [TensorExpr] Fix two bugs in Rfactor (#39268)
Summary:
The two bugs were:
* Non-reduction axes were not added when inserting the new ReduceOp, meaning that if a reduction with non-reduce axes was rfactored, we'd produce bad outputs. There were no tests of Rfactor with a non-reduce axis, so I modified a test to cover this.
* The new statements were always prepended to the block, meaning writes to a buffer could be reordered after the usage of that buffer. This mostly happened in the case where we rfactor a previously rfactored reduction. There was a test of this, but since it only tested rfactoring the outer reduction axis, there were never any other statements at the insertion point (the tests of the insertion point argument also do this). I added a new test which covers various rfactor-axis cases.

Also cleaned up tests, removed some helper code we don't need etc.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39268

Differential Revision: D21864489

Pulled By: nickgg

fbshipit-source-id: d314d20997a8472ec96b72f7a9068d6da6d2399c
2020-06-03 14:38:34 -07:00
Ilia Cherniavskii
abe2be2063 [resubmit] Use TensorMethods.cpp (#39385)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39385

see https://github.com/pytorch/pytorch/pull/37639

Test Plan:
https://github.com/pytorch/pytorch/pull/37639

Imported from OSS

Differential Revision: D21833287

fbshipit-source-id: 9928d3f4122903d0de67ad312e349352d5f5c19c
2020-06-02 20:27:51 -07:00
Nick Gibson
36607c85ee [TensorExpr] eliminate zero length Allocations in IRSimplifier (#38794)
Summary:
If the size of a temporary buffer is reduced to zero via binding of a dynamic variable, we still run the alloc, even though it is a no-op. It's easy to strip these out during simplification, so the expr:
```
{
  Allocate(x, int, {0});
  // Stuff...
  Free(x);
}
```
becomes
```
{
  // Stuff...
}
```

I am assuming here that if the allocation size is zero then any usage of the buffer is also eliminated, since there's no safe way to refer to a zero-size buffer.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38794

Differential Revision: D21723656

Pulled By: nickgg

fbshipit-source-id: 3eaa8bd8974a13b0a351be04abe2348498b31b02
2020-06-02 18:24:42 -07:00
Edward Yang
2fe0fc2684 Revert D21374247: Use TensorMethods.cpp
Test Plan: revert-hammer

Differential Revision:
D21374247

Original commit changeset: 076964415079

fbshipit-source-id: 732ec8c561d1f37475c1b5549ba79c718e3a6db8
2020-06-01 08:12:09 -07:00
Nick Gibson
5153cdbe87 [TensorExpr] fix a bug in ReorderAxis when there are trailing loops (#38841)
Summary:
Fixes a bug in reorder axis where we appended the new reordered loops to the enclosing block, even if there were statements after the original loops. E.g., with 3 Computes:
```
for (int m1 ...
  for (int n1 ...
    for (int k1 ...
      Body 1
for (int m2 ...
  for (int n2 ...
    for (int k2 ...
      Body 2
for (int m3 ...
  for (int n3 ...
    for (int k3 ...
      Body 3
```

If we reordered loops m2 and k2, the reordered nest would also be moved after the other statements, like this:

```
for (int m1 ...
  for (int n1 ...
    for (int k1 ...
      Body 1
for (int m3 ...
  for (int n3 ...
    for (int k3 ...
      Body 3
for (int k2 ...
  for (int n2 ...
    for (int m2 ...
      Body 2
```

This is because we always append the new loops to their parent. This PR fixes the logic to replace the old loop root with the new loop, which keeps things consistent.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38841

Differential Revision: D21723670

Pulled By: nickgg

fbshipit-source-id: 1dee8bb153182fcaa2cabd948197577e8e80acd7
2020-05-31 22:22:45 -07:00
Ilia Cherniavskii
68e62b9ab6 Use TensorMethods.cpp (#37639)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37639

Changing TensorMethods.h to .cpp; necessary to avoid incomplete types in the dispatcher.

Test Plan:
CI

Imported from OSS

checked mobile size, no change, small reduction in size in fbios
fbios: Succeeded
Change in Download Size for arm64 + 3x assets variation: -18.2 KiB
Change in Uncompressed Size for arm64 + 3x assets variation: -8.8 KiB

Reran the benchmark; no statistically significant difference.

buck run mode/opt caffe2/caffe2/fb/high_perf_models/pytorch/benchmark_framework_overheads:benchmark_torchscript_model -- --model_file caffe2/caffe2/fb/high_perf_models/pytorch/benchmark_framework_overheads/addmodule.pt --num_runs 3

╷ @  68592d0d  41 minutes ago  iliacher  D21374247
╭─╯  Use TensorMethods.cpp

Created 3 benchmark runs on aibench for caffe2/caffe2/fb/high_perf_models/pytorch/benchmark_framework_overheads/addmodule.pt.
Links to the results:

* Adhoc run: https://our.intern.facebook.com/intern/aibench/details/1729113760

* Adhoc run: https://our.intern.facebook.com/intern/aibench/details/3867976782

* Adhoc run: https://our.intern.facebook.com/intern/aibench/details/2782186766

hg prev

@  7f501b42  Thursday at 14:26  bvaughan  D21764704
╷  short-circuit pow for complex 1 and 0 exponents

Created 3 benchmark runs on aibench for caffe2/caffe2/fb/high_perf_models/pytorch/benchmark_framework_overheads/addmodule.pt.
Links to the results:

* Adhoc run: https://our.intern.facebook.com/intern/aibench/details/2155256332

* Adhoc run: https://our.intern.facebook.com/intern/aibench/details/1802057074

* Adhoc run: https://our.intern.facebook.com/intern/aibench/details/4119590830

Differential Revision: D21374247

fbshipit-source-id: 076964415079cf84fb57f1f7b43d087afed86e1d
2020-05-31 17:11:12 -07:00
Ilia Cherniavskii
a5e023f28a Set RecordFunction id only when needed (#39265)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39265

In this PR we set the id of RecordFunction only when callbacks need it and when
there's at least one active callback.

Test Plan:
testRecordFunction unit test in test_misc.cpp
buck test mode/dev caffe2/test/cpp/jit:jit

https://our.intern.facebook.com/intern/testinfra/testrun/8725724291116413

Reviewed By: dzhulgakov

Differential Revision: D21790421

fbshipit-source-id: 016623d7f1a2a271921a71c0483061e232b40321
2020-05-29 15:34:44 -07:00
lixinyu
a04fb2ab22 [Reland] add xenial + cuda 9.2 + gcc 5.4 CI test (#39036)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/39036

Test Plan: Imported from OSS

Differential Revision: D21731026

Pulled By: glaringlee

fbshipit-source-id: ae678f786f95e3687ed6b3f176fe6736a436c721
2020-05-28 19:48:18 -07:00
Luca Wehrstedt
72f2ff5950 [TensorPipe] Improve serialization (#39010)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39010

The initial version of the serialization for the TensorPipe RPC agent (i.e., the conversion from rpc::Message to tensorpipe::Message) worked around a limitation of TensorPipe of only allowing one payload per message by pickling each tensor separately and storing the pickles as metadata (which is a less efficient way of sending data over, as it goes through more copies). Having now lifted that limitation, we can improve the way we serialize. We now put the type and the id as their own payloads, do a single pickling pass for all the tensors of the message (which allows us to deduplicate them), and store the pickle as a payload. My impression is that pickling is a somewhat costly operation, so reducing the number of times we do it should be beneficial for performance. For this same reason, another change I've made here is to separate the allocation of the buffers from the deserialization. This will allow us (in the future) to perform the allocation on the I/O event loop but perform the unpickling in the worker thread, thus keeping the event loop more responsive.
ghstack-source-id: 104810740

Test Plan: RPC tests

Differential Revision: D21716067

fbshipit-source-id: c1475cc78afdcf0820a485ffd98c91abb35796c7
2020-05-28 10:48:24 -07:00
Luca Antiga
e088902b4a Add type-hint check for default arguments in TorchScript C++ frontend (#39021)
Summary:
This PR fixes https://github.com/pytorch/pytorch/issues/39020 by requiring users to type-hint default arguments to a TorchScript function when using the C++ frontend (the Python frontend inserts those annotations automatically).

Since this is a bit of a niche use case, I opted for the simpler solution of making type-hints mandatory for default arguments, as opposed to trying to type-infer them. I left a comment in the code justifying this choice.

Test is included.

/cc t-vi
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39021

Differential Revision: D21755317

Pulled By: suo

fbshipit-source-id: e007650d3bfb3a4c58c25ad2c3a17759898f303b
2020-05-28 01:42:04 -07:00
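A minimal sketch (not from the PR) of the rule in practice, using torch::jit::compile from the C++ frontend:
```
#include <torch/script.h>

int main() {
  // The default argument carries an explicit ": float" type hint, since there
  // is no Python frontend to infer it when compiling from C++.
  auto cu = torch::jit::compile(R"JIT(
    def scaled(x: float, factor: float = 2.0) -> float:
        return x * factor
  )JIT");
  // Dropping the annotation on `factor` is expected to raise a compilation
  // error rather than silently mis-typing the default.
  (void)cu;
}
```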
Nick Gibson
cf8001d2d0 [TensorExpr] Fix a bug in Rfactor when there are multiple reductions (#38733)
Summary:
In `LoopNest::rfactor` we assume that there is only a single reduction below the insertion point, and when replacing the reduction we recursively replace all reductions below that point. This is not a safe assumption, as a number of transformations can introduce additional ReduceOps - most directly a `splitWithTail` on the innermost reduce axis.

This PR fixes that bug, and adds some unit tests covering the case.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38733

Differential Revision: D21723634

Pulled By: nickgg

fbshipit-source-id: 3ed6ffcdc2c15aef7504f9b2b91e8d827e0b5d88
2020-05-27 16:49:34 -07:00
Nikita Shulga
c6e9e9359f [Codemod][GleanFbcode] Remove dead includes in caffe2/test (#39023)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/39023

Reviewed By: orionr

Differential Revision: D21702529

fbshipit-source-id: 6945bba95609102409850b105a8a091e33b8acc9
2020-05-27 14:07:26 -07:00
Nick Gibson
a25062ab50 [TensorExpr] Fix elimination of For loops with empty bodies (#38883)
Summary:
We do try to eliminate empty For loops, but missed a case where the body Block exists but is empty. In that case we can eliminate the loop as well.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38883

Differential Revision: D21723680

Pulled By: nickgg

fbshipit-source-id: 49610b0524af5b9ec30ef3b4cc0c8461838259c3
2020-05-26 18:58:57 -07:00
Christian Sarofeen
8e69c3be17 [nvFuser] Reduction support in codegen, fp16 support (#38627)
Summary:
Adds reduction support for the code generator. Reductions are fully supported with split/merge/reorder/rfactor/computeAt/unroll operators. There is also cross-thread (intra-block) reduction support.

The two remaining pieces missing for reduction support are:
- Safety: if cross-thread reduction is used, child operators shouldn't be able to bind that thread dim anymore
- Cross-block reduction: we will want inter-block reduction support to reach parity with the tensor iterator

The PR also provides FP16 support for fusions: we insert casts from FP16 inputs to FP32, and casts back to FP16 on FP16 outputs.

Also working towards reductions and shape inference for reductions in the fusion pass.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38627

Reviewed By: albanD

Differential Revision: D21663196

Pulled By: soumith

fbshipit-source-id: 3ff2df563f86c39cd5821ab9c1148149e5172a9e
2020-05-21 17:18:39 -07:00
Jerry Zhang
a8d8fc5532 [quant][graphmode] Different rule for add/add_/mul/mul_ (#38667)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/38667

Test Plan: Imported from OSS

Differential Revision: D21633555

fbshipit-source-id: 03b0298e83bf4dbda41b048c0edc7bb92cd4e1df
2020-05-20 19:43:46 -07:00
Michael Voznesensky
f6f1384811 [JIT] Refactor attributes to support buffers and parameters as first class citizens, add support for iterating over named_buffers() (#37905)
Summary:
First part of https://github.com/pytorch/pytorch/issues/36211 - still a WIP, but asking for commentary to ensure this is the direction we want to go in.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37905

Differential Revision: D21633735

Pulled By: voznesenskym

fbshipit-source-id: f4e4302e40114513776c9e48867a90d72049e2e9
2020-05-18 23:23:43 -07:00
Nick Gibson
2f21dfb541 [TensorExpr] Eager reduction initialization & removal from ReduceOp (#38585)
Summary:
This PR removes the deferred initializer field from ReduceOp in favour of eagerly initializing buffers when they are created (either in the constructor of `LoopNest`, or in `rfactor()`). This allows a pretty good simplification of reduction logic, removing almost all of the reduction expander and the ReduceInitCleaner & unpopular NoOp node added in the last fix.

Eager initialization is better for us anyway because it allows more opportunities to transform the initialization loop.

Added a few more tests; testReduceOverSplitWithTail failed before this change due to a bug in splitWithTail which now can't happen.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38585

Differential Revision: D21621551

Pulled By: nickgg

fbshipit-source-id: 378137e5723b4a6d6e390239efb12adce22a8215
2020-05-18 15:56:43 -07:00
Mikhail Zolotukhin
b29e7f9b9d [TensorExpr] Use couldMoveBefore instead of couldMoveAfter checks in the fuser pass, add CPP tests. (#38592)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38592

I'm not sure that using couldMoveAfter was incorrect, but using
couldMoveBefore is more consistent with other subgraph-extraction
passes (old fuser, create autodiff graphs, etc.), so this change should
make it easier to unify their implementations.

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D21607856

Pulled By: ZolotukhinM

fbshipit-source-id: 970583af7859889d48aacf620ae028258e37a75f
2020-05-18 13:40:59 -07:00
Nick Gibson
8bf3124572 [TensorExpr] Fix bug when splitting inner reduce axis with tail (#38420)
Summary:
Fixes a bug in the following code:
```
    Tensor* c = Reduce("sum", {{10, "m"}}, Sum(), b, {{10, "n"}, {10, "k"}});
    // split N loop with tail:
    loop.splitWithTail(loop.getLoopStmtsFor(c)[1], 8, &outer, &inner, &tail);
```

When this is expanded there are two ReduceOps:

```
for (int m = 0; m < 10; m++) {
  for (int n_outer = 0; n_outer < (10 - 0) / 8; n_outer++) {
    for (int n_inner = 0; n_inner < 8; n_inner++) {
      for (int k = 0; k < 10; k++) {
        sum[m] = ReduceOp(sum, float(0), (sum[m]) + (b[m, n_outer * 8 + n_inner, k]), out_args={m}, reduce_args={n_inner, n_outer, k});
      }
    }
  }
  for (int n_tail = 0; n_tail < (10 - 0) % 8; n_tail++) {
    for (int k = 0; k < 10; k++) {
      sum[m] = ReduceOp(sum, float(0), (sum[m]) + (b[m, n_tail + ((10 - 0) / 8) * 8, k]), out_args={m}, reduce_args={n_tail, k});
    }
  }
}
```

But each ReduceOp will expand its initializer, which in this case will overwrite the sum of the split loop:

```
for (int m = 0; m < 10; m++) {
  sum[m] = 0.f;
  for (int n_inner = 0; n_inner < 8; n_inner++) {
    for (int k = 0; k < 10; k++) {
      sum[m] = (sum[m]) + (b[(100 * m + k) + 10 * n_inner]);
    }
  }
  sum[m] = 0.f;          <------- *HERE*
  for (int n_tail = 0; n_tail < 2; n_tail++) {
    for (int k = 0; k < 10; k++) {
      sum[m] = (sum[m]) + (b[((100 * m + k) + 10 * n_tail) + 80]);
    }
  }
}
```

The simplest fix is to remove the initializer from the tail loop, which requires adding support for Reductions without an initializer (I did this by adding a NoOp Expr rather than handling nullptr). Also moved the ReductionExpander from loopnest.cpp to reduction.h, as loopnest is getting a bit heavy.

Added tests for all kinds of splits on a simple 3D reduction to verify no more problems of this type.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38420

Differential Revision: D21587583

Pulled By: nickgg

fbshipit-source-id: e0766934481917007119612eb60cc76c3242e44a
2020-05-14 22:58:28 -07:00
Michael Suo
0d220ef381 [torchbind] Better error message when missing init. (#37474)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37474

Previously we would segfault

Test Plan: Imported from OSS

Differential Revision: D21297542

Pulled By: suo

fbshipit-source-id: c7e2f828a250c490ec23fb51c6a4a642d3370e52
2020-05-13 17:38:31 -07:00
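A hedged sketch with a hypothetical class (not from the PR) showing the registration this change guards: constructing a torchbind class from script requires an __init__ registered via def(torch::init<...>()); with this change, leaving it out should produce a readable error instead of a segfault.
```
#include <torch/custom_class.h>

struct Counter : torch::CustomClassHolder {
  int64_t value = 0;
  void add(int64_t x) { value += x; }
};

static auto counter_registration =
    torch::class_<Counter>("my_namespace", "Counter")
        .def(torch::init<>())  // omit this line to hit the error path above
        .def("add", &Counter::add);
```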
Mikhail Zolotukhin
6e13146d96 [TensorExpr] TensorExprKernel: don't do any compilation or lowering in run(). (#37948)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37948

The input JIT graph has all the information we need to perform the
entire compilation at construction time. We don't need to postpone
any steps until execution time. Also, from the graph we always know
what device we will be executing on, and thus we don't need a
CodeGen cache in TensorExprKernel - we always have one and only one
CodeGen.

Test Plan: Imported from OSS

Reviewed By: protonu

Differential Revision: D21432145

Pulled By: ZolotukhinM

fbshipit-source-id: 8dc86b891713056b2c62f30170cd4a168912f027
2020-05-13 14:02:23 -07:00
Ilia Cherniavskii
43dd8760d7 Move ThreadLocalDebugInfo to c10 (#37774)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37774

Move ThreadLocalDebugInfo from ATen to C10

Test Plan: Imported from OSS

Differential Revision: D21384249

Pulled By: ilia-cher

fbshipit-source-id: f9b5089a868f84a2ee013695a481fcc883d3c6b2
2020-05-11 19:27:41 -07:00
James Reed
a553935e3c [JIT] Expose magic methods on script::Object (#38167)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/38167

Test Plan: Imported from OSS

Differential Revision: D21486709

Pulled By: jamesr66a

fbshipit-source-id: 17b44d979fc658768b0d64f7d8af6fb684043ea3
2020-05-11 15:01:15 -07:00
Nick Gibson
33f4fca1a6 [TensorExpr] remove Let and LetStmt in favour of binding in Block (#37606)
Summary:
Implementation of the less popular proposal for eliminating overlap between LetStmt and Let: removing both and storing a mapping between Var and value Expr in the Block.

This complicates some tests but simplifies the IR by restricting where variable binding can occur.

I used the unit tests & Python integration tests to verify this is correct, but I'm unsure of coverage, particularly around the dependency checker in loopnest - ZolotukhinM, your review would be useful there.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37606

Differential Revision: D21467483

Pulled By: nickgg

fbshipit-source-id: b402d3fce4cacf35d75f300f0a7dca32a43b6688
2020-05-09 16:23:37 -07:00
Nick Gibson
ad433e2003 [TensorExpr] Fix a bug in the IR Simplifier that could introduce a division by zero (#38055)
Summary:
In the IR Simplifier, when doing partial factorization of Round+Mod patterns we divide by the lower number, which could be zero. Add a quick check against zero to avoid the crash.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38055

Differential Revision: D21478486

Pulled By: nickgg

fbshipit-source-id: c5083f672e91662b7d1271d817cade7fa6c39967
2020-05-08 14:58:53 -07:00
Nick Gibson
f2f8027760 [TensorExpr] simplify trivial adds/subs/muls even in Float (#37960)
Summary:
The IR Simplifier early exits when working with dtypes that are not safe to reorder. There are some cases where we still want to simplify ops in these dtypes: x + 0, x - 0, x * 0 and x * 1. It's safe to eliminate the op here, and it reduces clutter in the expr.

Also added a quick simplification of casts which do nothing (their target type is the same as the underlying type).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37960

Differential Revision: D21457736

Pulled By: nickgg

fbshipit-source-id: 40e20a3b55fc1afb2ec50071812238a08bded2ac
2020-05-07 17:23:47 -07:00
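A sketch of the described simplification, assuming the TensorExpr API shape of this vintage (including the KernelScope arena that owns IR nodes):
```
#include <torch/csrc/jit/tensorexpr/expr.h>
#include <torch/csrc/jit/tensorexpr/ir_simplifier.h>
#include <torch/csrc/jit/tensorexpr/mem_arena.h>

using namespace torch::jit::tensorexpr;

void simplifyExample() {
  KernelScope kernel_scope;  // arena owning the IR nodes (assumed required here)
  VarHandle x("x", kFloat);
  // x + 0 and x * 1 are folded even though float arithmetic is otherwise
  // not freely reorderable.
  ExprHandle body = (x + ExprHandle(0.0f)) * ExprHandle(1.0f);
  ExprHandle simplified = IRSimplifier::simplify(body);
  // `simplified` is expected to reduce to just `x`.
}
```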
Ilia Cherniavskii
facc5e0cc4 Make profiler thread local (#36291)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36291

Move profiler state to be a thread-local property and reuse the existing
thread-local propagation mechanism to ensure correct profiling of async
tasks. This also makes push/pop callbacks thread safe and easier to use
in, e.g., the distributed profiler.

Test Plan:
USE_BLAS=MKL USE_MKLDNN=0 USE_CUDA=0 python setup.py develop install
./build/bin/test_jit

./build/bin/test_jit
python test/test_autograd.py
python test/test_jit.py

Differential Revision: D20938501

Pulled By: ilia-cher

fbshipit-source-id: c0c6c3eddcfea8fc7c14229534b7246a0ad25845
2020-05-07 14:52:49 -07:00
Ilia Cherniavskii
2ef4010593 Propagate TLS callbacks with ThreadLocalState (#37745)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37745

This PR makes it possible to set TLS callbacks and use
them transparently not only in the main thread but also
in any async tasks

Test Plan: Imported from OSS

Differential Revision: D21374873

Pulled By: ilia-cher

fbshipit-source-id: 3be2e121673b32d7694e17e794f3b474826dffe9
2020-05-07 14:52:44 -07:00
Ilia Cherniavskii
2d708cefcc Move RecordFunction into ATen (#37548)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37548

Moving RecordFunction from torch::autograd::profiler into the at namespace.

Test Plan:
CI

Imported from OSS

Differential Revision: D21315852

fbshipit-source-id: 4a4dbabf116c162f9aef0da8606590ec3f3847aa
2020-05-07 14:52:39 -07:00