Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48213
It was completely broken unless the rhs was a constant.
Test Plan: new unit test in test_jit_fuser_te.py
Reviewed By: eellison
Differential Revision: D25071639
fbshipit-source-id: ef1010a9fd551db646b83adfaa961648a5c388ae
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48085
We were treating it as a binary operator, which implies shape
broadcasting, even though the second arg is thrown away aside from its type.
Treating it as a unary op is the proper approach.
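For illustration, a minimal Python sketch (hypothetical, since this message does not name the op; `Tensor.type_as` has exactly these semantics: the second argument is used only for its dtype, so no shape broadcasting should be implied):

```python
import torch

def type_like(x, y):
    # Hypothetical example: the second argument matters only for its dtype,
    # so the result keeps x's shape and no broadcasting is implied.
    return x.type_as(y)

x = torch.randn(4, 5)
y = torch.zeros(3, dtype=torch.int64)  # deliberately a different shape
out = type_like(x, y)
assert out.shape == x.shape and out.dtype == torch.int64
```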
ghstack-source-id: 116873680
Test Plan: new unit test
Reviewed By: ZolotukhinM
Differential Revision: D25017585
fbshipit-source-id: 0cfa89683c9bfd4fbb132617c74b47b268d7f368
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48084
as title
ghstack-source-id: 116870328
Test Plan: new unit test
Reviewed By: Krovatkin
Differential Revision: D25017489
fbshipit-source-id: 0d1998fccad6f509db04b6c67a4e4e4093d96751
Summary:
The NNC lowering of aten::pow assumes that the type of the exponent is either float or int cast to float, which doesn't work well with double (or half, for that matter).
Fixes https://github.com/pytorch/pytorch/issues/47304
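A minimal repro sketch in Python (assuming the scripted form from the linked issue; the actual unit test may differ):

```python
import torch

@torch.jit.script
def pow_double(x, y):
    return torch.pow(x, y)

x = torch.rand(4, dtype=torch.double)
y = torch.rand(4, dtype=torch.double)
# Before this fix, the NNC lowering mishandled a double exponent;
# the fused kernel should now match eager mode.
for _ in range(3):  # run a few times so the TE fuser kicks in
    out = pow_double(x, y)
assert torch.allclose(out, torch.pow(x, y))
```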
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47795
Reviewed By: ZolotukhinM
Differential Revision: D24904201
Pulled By: nickgg
fbshipit-source-id: 43c3ea704399ebb36c33cd222db16c60e5b7ada5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47374
A few small fixes needed to enable unary op CPU testing. If reviewers would prefer that I split them up, let me know.
Test Plan: Imported from OSS
Reviewed By: ansley
Differential Revision: D24805248
Pulled By: eellison
fbshipit-source-id: c2cfe2e3319a633e64da3366e68f5bf21d390cb7
Summary:
Fix an issue with the TensorExpr lowering of aten::remainder with integral inputs. We were always lowering to fmod and never to Mod.
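The distinction matters for negative integral inputs, where the two ops disagree. A small illustration (not the actual unit test):

```python
import torch

a = torch.tensor([-5, 5], dtype=torch.int64)
b = torch.tensor([3, -3], dtype=torch.int64)

# remainder follows the sign of the divisor (Python-style modulo) ...
print(torch.remainder(a, b))  # tensor([ 1, -1])
# ... while fmod follows the sign of the dividend (C-style fmod).
print(torch.fmod(a, b))       # tensor([-2,  2])
```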
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47611
Reviewed By: bertmaher, heitorschueroff
Differential Revision: D24846929
Pulled By: nickgg
fbshipit-source-id: adac4322ced5761a11a8e914debc9abe09cf5637
Summary:
This diff adds support for `log_softmax` op in NNC.
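A minimal sketch of the kind of scripted graph that can now be handled (illustrative only; the actual tests live elsewhere):

```python
import torch
import torch.nn.functional as F

@torch.jit.script
def fn(x):
    # log_softmax over the last dimension; with this diff the op can be
    # lowered by NNC instead of falling back to the interpreter.
    return F.log_softmax(x, dim=-1)

x = torch.randn(8, 16)
for _ in range(3):  # warm up so the profiling executor can fuse
    out = fn(x)
assert torch.allclose(out, F.log_softmax(x, dim=-1))
```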
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47409
Reviewed By: ejguan
Differential Revision: D24750203
Pulled By: navahgar
fbshipit-source-id: c4dacc7f62f9df65ae467f0d578ea03d3698273d
Summary:
This diff enables inlining for all non-output buffers, including the intermediate buffers that are created as part of an op. However, the buffers that correspond to reductions will not be inlined.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47258
Reviewed By: anjali411
Differential Revision: D24707015
Pulled By: navahgar
fbshipit-source-id: ad8b03e38497600cd69980424db6d586bf93db74
Summary:
This diff enables inlining producers into reductions. It also guards against inlining reductions themselves.
Prior to this diff, if there was a reduction in the loopnest, no inlining happened at all. After this change, we inline all non-output buffers that do not correspond to a reduction.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47020
Reviewed By: albanD
Differential Revision: D24644346
Pulled By: navahgar
fbshipit-source-id: ad234a6877b65be2457b734cbb7f3a1800baa6a5
Summary:
This is the second attempt at replacing flattened tensors with flattened loops in `TensorExprKernel::generateStmt`. The first attempt (https://github.com/pytorch/pytorch/pull/46539) resulted in a build failure due to an exception thrown during inlining.
The failure was caused by an inline step that was supposed to operate on the unflattened tensors. That step was necessary earlier because every flattened tensor had a corresponding unflattened tensor that had to be inlined. Since we no longer keep two tensors (flattened and unflattened), the inline step is no longer needed and has been removed.
Checked python and cpp tests on CPU as well as CUDA.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46737
Reviewed By: anjali411, izdeby
Differential Revision: D24534529
Pulled By: navahgar
fbshipit-source-id: 8b131a6be076fe94ed369550d9f54d3879fdfefd
Summary:
This diff changes `TensorExprKernel::generateStmt` to use flattened loops instead of flattened tensors.
Checked all tests on CPU as well as CUDA.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46539
Reviewed By: nickgg
Differential Revision: D24395956
Pulled By: navahgar
fbshipit-source-id: f3792903f2069bda37b571c9f0a840e6fb02f189
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45791
Most of the lowering for log1p and lgamma already existed, add JIT integration.
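A sketch of scripted code that now routes these unary ops through the existing NNC lowerings (illustrative only):

```python
import torch

@torch.jit.script
def fn(x):
    # Both unary ops already had NNC lowerings; this change wires them
    # up in the JIT fuser so graphs like this can be fused.
    return torch.lgamma(torch.log1p(x))

x = torch.rand(1024)
for _ in range(3):  # warm up so the TE fuser compiles the graph
    out = fn(x)
assert torch.allclose(out, torch.lgamma(torch.log1p(x)))
```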
Test Plan: Imported from OSS
Reviewed By: bertmaher
Differential Revision: D24169536
Pulled By: eellison
fbshipit-source-id: a009c77a3471f3b5d378bad5de6d8e0880e9da3c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45520
With this change, `Load`s and `Store`s no longer accept `Placeholder`s in
their constructors and `::make` functions and can only be built with
`Buf`.
`Placeholder` gets its own `store`, `load`, `storeWithMask`, and
`loadWithMask` methods for more convenient construction.
Test Plan: Imported from OSS
Reviewed By: glaringlee
Differential Revision: D23998789
Pulled By: ZolotukhinM
fbshipit-source-id: 3fe018e00c1529a563553b2b215f403b34aea912
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45390
Tensor objects should always refer to their Function's bufs. Currently
we never create a Tensor with a buffer different from that of its function,
but keeping the buffer in two places seems incorrect and dangerous.
Differential Revision: D23952865
Test Plan: Imported from OSS
Reviewed By: nickgg
Pulled By: ZolotukhinM
fbshipit-source-id: e63fc26d7078427514649d9ce973b74ea635a94a
Summary:
For integral types, isnan is meaningless. Provide specializations for
maximum and minimum which don't call it.
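Illustration of the op pattern covered by the test (assuming binary elementwise min/max; the exact test body may differ):

```python
import torch

@torch.jit.script
def minmax(a, b):
    # For integer dtypes there is no NaN, so the lowering must not emit
    # an isnan check when computing elementwise min/max.
    return torch.min(a, b) + torch.max(a, b)

a = torch.randint(-10, 10, (64,), dtype=torch.int32)
b = torch.randint(-10, 10, (64,), dtype=torch.int32)
for _ in range(3):  # warm up so the TE fuser compiles the graph
    out = minmax(a, b)
assert torch.equal(out, torch.min(a, b) + torch.max(a, b))
```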
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44984
Test Plan: python test/test_jit_fuser_te.py -k TestTEFuser.test_minmax_int_ops
Reviewed By: ezyang
Differential Revision: D23885259
Pulled By: asuhan
fbshipit-source-id: 2e6da2c43c0ed18f0b648a2383d510894c574437
Summary:
Arithmetic operations on Bool aren't fully supported in the evaluator. Moreover,
such semantics can be implemented by the client code through insertion of
explicit casts to widen and narrow to the desired types.
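At the user level, the workaround amounts to widening to an integer type before the arithmetic and narrowing back afterwards, for example (a hedged sketch, not code from this change):

```python
import torch

a = torch.tensor([True, False, True])
b = torch.tensor([True, True, False])

# Instead of relying on Bool arithmetic inside the expression evaluator,
# widen to int, do the arithmetic, and narrow back to bool explicitly.
result = (a.to(torch.int32) + b.to(torch.int32)).clamp(max=1).to(torch.bool)
print(result)  # tensor([True, True, True])
```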
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44677
Test Plan:
test_tensorexpr --gtest_filter=TensorExprTest.ExprDisallowBoolArithmetic
python test/test_jit_fuser_te.py
Reviewed By: agolynski
Differential Revision: D23801412
Pulled By: asuhan
fbshipit-source-id: fff5284e3a216655dbf5a9a64d1cb1efda271a36
Summary:
This is a reup of https://github.com/pytorch/pytorch/issues/43885 with an extra commit that should fix the bugs that caused it to be reverted; read that PR for general context.
The issue was that we were still using the side maps `tensor_to_stmt_` and `stmt_to_tensor_`, which get invalidated by any transform of the IR (not just by transforms other than computeInline). I had added a comment about this but didn't actually address our usages of them.
I've removed these maps and changed the `getLoopBodyFor` and `getLoopStatementsFor` helpers to search the root stmt directly.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44231
Reviewed By: albanD
Differential Revision: D23689688
Pulled By: nickgg
fbshipit-source-id: 1c6009a880f8c0cebf2300fd06b5cc9322bffbf9
Summary:
A rework of `computeInline` that makes it work a bit better, particularly when combined with other transformations. Previously we stored the Functions to be inlined and deferred the actual inlining of the function body until `prepareForCodegen` was called. This is problematic when transformations are applied to the LoopNest: the function body can differ from what appears in the root_stmt, resulting in inlining that a) fails, b) reverses other transformations, or c) does some weird, unpredictable combination of the two.
This PR changes that behaviour so that the inlining occurs in the root stmt immediately, which means it reflects any previous transformations, and any future transformations see a true view of the internal IR. It also has the benefit that inspecting the root statement gives an accurate view of it without needing to call `prepareForCodegen`. I also removed the difference between `computeInline` and `computeInlineWithRand`; calls to `rand()` are now handled in all branches.
This is a rework of https://github.com/pytorch/pytorch/issues/38696, with the agreed-upon changes from ZolotukhinM and zheng-xq: we should only inline if the dimensions are trivial (i.e., they are vars, not exprs).
This PR is mostly tests, and I fixed a bunch of bugs I found along the way. Partial list:
* When inlining an expression involving rand, we would create random vars equal to the dimensionality of the enclosing Tensor not the produced Tensor - meaning we'd use an incorrect value if the inlined tensor was smaller. E.g: `X[i] = rand(); A[i, j] = X[i]` would produce a tensor where `A[0, 0] != A[0, 1]`. This is fixed by inserting the Let binding of the random variable at the correct loop body.
* When inlining we'd replace all calls to `rand()` rather than just those present in the Tensor being inlined.
* `rand()` was treated symbolically by the simplifier, and we would aggregate or cancel calls to `rand()`. Fixed the hasher to hash all calls to `rand()` distinctly.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43885
Reviewed By: gmagogsfm
Differential Revision: D23503636
Pulled By: nickgg
fbshipit-source-id: cdbdc902b7a14d269911d978a74a1c11eab004fa
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43972
When debugging, it is useful to be able to disable the NNC backend to see
whether the bug is in NNC or in the fuser logic.
Test Plan: Imported from OSS
Reviewed By: bertmaher
Differential Revision: D23455624
Pulled By: ZolotukhinM
fbshipit-source-id: f7c0452a29b860afc806e2d58acf35aa89afc060
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43097
Boolean arguments weren't promoted, so if you tried to write a comparison with
types such as `Tensor(Bool) == Int` you'd fail typechecking inside the TE
engine.
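A small sketch of the failing pattern (illustrative; the actual regression test is in the PR):

```python
import torch

@torch.jit.script
def fn(x):
    mask = x > 0      # Bool tensor
    return mask == 1  # compare a Bool tensor against an Int scalar

x = torch.randn(16)
for _ in range(3):    # warm up so the TE fuser compiles the graph
    out = fn(x)
assert torch.equal(out, x > 0)
```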
Test Plan: Imported from OSS
Reviewed By: protonu, zheng-xq
Differential Revision: D23167926
Pulled By: bertmaher
fbshipit-source-id: 47091a815d5ae521637142a5c390e8a51a776906
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42202
Currently we use a template in order to accept both
`std::vector<ExprHandle>` and `std::vector<VarHandle>`. However, the
semantics of this function dictate that the only allowed option should be
the former: we are specifying indices for the tensor access we want
to generate. While it can be convenient to avoid converting a
vector of vars to a vector of exprs at the call sites, it makes the code
less explicit and thus more difficult to reason about.
Test Plan: Imported from OSS
Reviewed By: SplitInfinity
Differential Revision: D22806429
Pulled By: ZolotukhinM
fbshipit-source-id: 8403af5fe6947c27213050a033e79a09f7075d4c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41451
Since TE operates on a limited subset of ops with well-defined
semantics, we can easily infer the shapes of intermediate and output tensors
given the shapes of the inputs.
There are a couple of ops that are not yet supported in the shape
inference; once we add them, we can relax the shape info requirements
in the TE fuser: currently it requires all values in the fusion group to
have known shapes, and we can change it to require shapes only for the inputs.
Test Plan: Imported from OSS
Reviewed By: eellison
Differential Revision: D22543470
Pulled By: ZolotukhinM
fbshipit-source-id: 256bae921028cb6ec3af91977f12bb870c385f40
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42201
Previously, we used the operators <, >, ==, et al. and relied on
the dtype of the result being picked automatically. That picked the wrong
dtype for the result, but the choice was overwritten by the type
explicitly specified in the JIT IR we were lowering. Now we are
moving towards using shape inference instead of relying on all types
being specified in the IR, which made this issue immediately pop up.
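For reference, the dtype the lowering must produce for comparisons is Bool regardless of the operand types, e.g.:

```python
import torch

a = torch.randn(4, dtype=torch.float32)
b = torch.randn(4, dtype=torch.float32)

# Comparison ops always produce a Bool tensor; the result dtype must not
# be inferred from the (promoted) operand dtype.
assert (a < b).dtype == torch.bool
assert (a == b).dtype == torch.bool
```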
Test Plan: Imported from OSS
Reviewed By: Krovatkin
Differential Revision: D22806428
Pulled By: ZolotukhinM
fbshipit-source-id: 89d2726340efa2bb3da45d1603bedc53955e14b9