Analyze the loop index range to determine whether a condition can ever be satisfied. If the for-loop body contains an `IfThenElse` or `CompareSelect` whose condition depends on the for-loop index `Var`, we analyze the index range to check whether the condition is always (or never) satisfied. If the outcome is deterministic, the conditional logic is simplified away.
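For illustration, here is a minimal sketch (not part of this PR) of the kind of fold this enables, using the tensorexpr expression API; the surrounding loop is only described in comments and the exact overloads are assumptions:
```
// Sketch only: assumes the torch::jit::tensorexpr headers and namespace.
VarHandle i("i", kInt);
// Inside `for (i = 0; i < 10; i++)` the index lies in [0, 10), so the
// condition i < 20 is always true.
ExprHandle sel = CompareSelect::make(
    i, ExprHandle(20), /*true=*/ExprHandle(1), /*false=*/ExprHandle(0), kLT);
// Range analysis lets the simplifier fold `sel` to the constant 1.
```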
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76793
Approved by: https://github.com/huiguoo
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72390
This class didn't add much value and only caused more boilerplate code.
This change removes the class and replaces all its uses with `ExprHandle`.
A side effect of this change is different names in loop variables, which
caused massive mechanical changes in our tests.
Test Plan: Imported from OSS
Reviewed By: navahgar
Differential Revision: D34030296
Pulled By: ZolotukhinM
fbshipit-source-id: 2ba4e313506a43ab129a10d99e72b638b7d40108
(cherry picked from commit c2ec46a058)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66744
Modified loops in files under fbsource/fbcode/caffe2/ from the format
`for(TYPE var=x0;var<x_max;var++)`
to the format
`for(const auto var: irange(x_max))`
This was achieved by running r-barnes's loop upgrader script (D28874212) with some modifications to exclude all files under /torch/jit, plus a number of hand-written reversions and unused-variable suppression warnings.
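For reference, a minimal before/after sketch of the rewrite (the bound `n` and the loop body are placeholders):
```
#include <c10/util/irange.h>

void example(int64_t n) {
  // Before: for (int64_t i = 0; i < n; i++) { ... }
  for (const auto i : c10::irange(n)) {
    // ... loop body uses i exactly as before ...
    (void)i;
  }
}
```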
Test Plan: Sandcastle
Reviewed By: ngimel
Differential Revision: D31705358
fbshipit-source-id: d6ea350cbaa8f452fc78f238160e5374be637a48
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66234
Modified loops in files under fbsource/fbcode/caffe2/ from the format
`for(TYPE var=x0;var<x_max;var++)`
to the format
`for(const auto var: irange(x_max))`
This was achieved by running r-barnes's loop upgrader script (D28874212) with some modifications to exclude all files under /torch/jit, plus a number of hand-written reversions and unused-variable suppression warnings.
bypass_size_limit
allow-large-files
Test Plan: Sandcastle
Reviewed By: ngimel
Differential Revision: D30652629
fbshipit-source-id: 0ae6c4bbbb554bad42e372792a6430e1acf15e3e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64887
BufHandle has exactly the same functionality and should be used instead.
Differential Revision: D30889483
Test Plan: Imported from OSS
Reviewed By: navahgar
Pulled By: ZolotukhinM
fbshipit-source-id: 365fe8e396731b88920535a3de96bd3301aaa3f3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64763
Simplification pattern:
x/N -> 0; N is a constant positive integer and x is a for-loop index whose range is a subset of [0, N).
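A small sketch of the pattern (tensorexpr API assumed; the loop context is only described in comments):
```
// Sketch only: assumes the torch::jit::tensorexpr headers and namespace.
VarHandle i("i", kInt);
ExprHandle e = i / 8;
// If `i` is the index of `for (i = 0; i < 5; i++)`, its range [0, 5) is a
// subset of [0, 8), so the simplifier can rewrite `e` to the constant 0.
```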
Test Plan: Imported from OSS
Reviewed By: bertmaher
Differential Revision: D30845854
Pulled By: huiguoo
fbshipit-source-id: 814d69ed4be05e57405c222183cc1c6c526721cd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63587
Now that there are no classes using KernelArena for memory management, we
can remove it.
Differential Revision: D30429115
Test Plan: Imported from OSS
Reviewed By: navahgar
Pulled By: ZolotukhinM
fbshipit-source-id: 375f6f9294d27790645eeb7cb5a8e87047a57544
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63586
This is another commit in transition from KernelArena memory management.
Tensor is essentially just a pair of <BufPtr, StmtPtr> and we don't need
to dynamically allocate it at all - it's cheap to pass it by value, and
that's what we're switching to in this commit.
After this change nothing uses KernelScope/KernelArena and they can be
safely removed.
Differential Revision: D30429114
Test Plan: Imported from OSS
Reviewed By: navahgar
Pulled By: ZolotukhinM
fbshipit-source-id: f90b859cfe863692b7beffbe9bd0e4143df1e819
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63197
This solves non-determinism from using hash values in sort methods.
Changes in tests are mostly mechanical.
Test Plan: Imported from OSS
Reviewed By: navahgar
Differential Revision: D30292776
Pulled By: ZolotukhinM
fbshipit-source-id: 74f57b53c3afc9d4be45715fd74781271373e055
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63195
This helps us to later switch from using KernelArena with raw pointers
to shared pointers without having to change all our source files at
once.
The changes are mechanical and should not affect any functionality.
With this PR, we're changing the following:
* `Add*` --> `AddPtr`
* `new Add(...)` --> `alloc<Add>(...)`
* `dynamic_cast<Add*>` --> `to<Add>`
* `static_cast<Add*>` --> `static_to<Add>`
Due to some complications with args forwarding, some places became more
verbose, e.g.:
* `new Block({})` --> `alloc<Block>(std::vector<ExprPtr>())`
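A brief before/after sketch of these conventions (illustrative only; the operands are placeholder integer immediates):
```
// Sketch only: assumes the torch::jit::tensorexpr headers and namespace.
ExprPtr lhs = alloc<IntImm>(1);
ExprPtr rhs = alloc<IntImm>(2);
AddPtr sum = alloc<Add>(lhs, rhs);    // was: new Add(lhs, rhs)
ExprPtr e = sum;
AddPtr a = to<Add>(e);                // was: dynamic_cast<Add*>(e)
AddPtr b = static_to<Add>(e);         // was: static_cast<Add*>(e)
```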
Test Plan: Imported from OSS
Reviewed By: navahgar
Differential Revision: D30292779
Pulled By: ZolotukhinM
fbshipit-source-id: 150301c7d2df56b608b035827b6a9a87f5e2d9e9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62336
This PR was generated by removing `const` from all node types in the NNC IR and fixing the compilation errors that resulted from this change.
This is the first step in making all NNC mutations in-place.
Test Plan: Imported from OSS
Reviewed By: iramazanli
Differential Revision: D30049829
Pulled By: navahgar
fbshipit-source-id: ed14e2d2ca0559ffc0b92ac371f405579c85dd63
Summary:
The GoogleTest `TEST` macro, as well as `DEFINE_DISPATCH`, is non-compliant with the `cppcoreguidelines-avoid-non-const-global-variables` check.
All changes except those to `.clang-tidy` were generated using the following script:
```
for i in `find . -type f -iname "*.c*" -or -iname "*.h" |
          xargs grep cppcoreguidelines-avoid-non-const-global-variables |
          cut -f1 -d: | sort | uniq`; do
  sed -i "/\/\/ NOLINTNEXTLINE(cppcoreguidelines-avoid-non-const-global-variables)/d" $i
done
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62008
Reviewed By: driazati, r-barnes
Differential Revision: D29838584
Pulled By: malfet
fbshipit-source-id: 1b2f8602c945bd4ce50a9bfdd204755556e31d13
Summary:
This is an automatic change generated by the following script:
```
#!/usr/bin/env python3
from subprocess import check_output, check_call
import os
def get_compiled_files_list():
    import json
    with open("build/compile_commands.json") as f:
        data = json.load(f)
    files = [os.path.relpath(node['file']) for node in data]
    for idx, fname in enumerate(files):
        if fname.startswith('build/') and fname.endswith('.DEFAULT.cpp'):
            files[idx] = fname[len('build/'):-len('.DEFAULT.cpp')]
    return files

def run_clang_tidy(fname):
    check_call(["python3", "tools/clang_tidy.py", "-c", "build", "-x", fname, "-s"])
    changes = check_output(["git", "ls-files", "-m"])
    if len(changes) == 0:
        return
    check_call(["git", "commit", "--all", "-m", f"NOLINT stubs for {fname}"])

def main():
    git_files = check_output(["git", "ls-files"]).decode("ascii").split("\n")
    compiled_files = get_compiled_files_list()
    for idx, fname in enumerate(git_files):
        if fname not in compiled_files:
            continue
        if fname.startswith("caffe2/contrib/aten/"):
            continue
        print(f"[{idx}/{len(git_files)}] Processing {fname}")
        run_clang_tidy(fname)

if __name__ == "__main__":
    main()
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56892
Reviewed By: H-Huang
Differential Revision: D27991944
Pulled By: malfet
fbshipit-source-id: 5415e1eb2c1b34319a4f03024bfaa087007d7179
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55825
The mask has never been used (in vectorization we generate an explicit
`IfThenElse` construct when we need to mask out some elements). The PR
removes it and cleans up all its traces from tests.
Differential Revision: D27717776
Test Plan: Imported from OSS
Reviewed By: navahgar
Pulled By: ZolotukhinM
fbshipit-source-id: 41d1feeea4322da75b3999d661801c2a7f82b9db
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54121
It would be nice to do range analysis to determine if a condition
cannot be satisfied. These are some tests that we should be able to turn on
once we have this feature.
ghstack-source-id: 124116847
Test Plan: Simplify.*LoopBounds
Reviewed By: ZolotukhinM
Differential Revision: D27107956
fbshipit-source-id: bb27e3d3bc803f0101c416e4a351ba2278684980
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53861
Replaced the iterators in the for-loops with integer index variables due to
overflow when handling empty vectors.
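A hypothetical illustration of the kind of pattern being avoided (not the actual code from this PR):
```
#include <vector>

void example(const std::vector<int>& v) {
  // Risky: for an empty vector, v.end() - 1 is not a valid iterator:
  //   for (auto it = v.begin(); it != v.end() - 1; ++it) { ... }
  // Safer: an integer index with an overflow-proof bound check.
  for (size_t i = 0; i + 1 < v.size(); ++i) {
    // ... body ...
  }
}
```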
Test Plan: Imported from OSS
Reviewed By: navahgar
Differential Revision: D26998894
Pulled By: huiguoo
fbshipit-source-id: a1f6475c8ba123968ef7247b4f6f38edbf24b9ef
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49697
Mostly mechanical move. This refactoring helps to hide unnecessary
details from the SimpleIREval interface and make it more similar to a
pure 'codegen'.
Test Plan: Imported from OSS
Reviewed By: nickgg
Differential Revision: D25668696
Pulled By: ZolotukhinM
fbshipit-source-id: 423247bfcdfa88403e8ec92152f00110bb9da19c
Summary:
GCD should always return positive integers. When negative values are used, we hit a corner case that results in an infinite recursion during simplification.
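A minimal sketch of the invariant (not the actual NNC helper):
```
#include <cstdlib>

// Returning a non-negative GCD even when an operand is negative, e.g.
// gcd(-4, 6) -> 2 rather than -2, sidesteps the corner case above.
int gcd(int a, int b) {
  while (b != 0) {
    int t = a % b;
    a = b;
    b = t;
  }
  return std::abs(a);
}
```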
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49379
Reviewed By: ezyang
Differential Revision: D25597115
Pulled By: navahgar
fbshipit-source-id: b0e8ac07ee50a5eb775c032628d4840df7424927
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49357
This is a follow-up to PR #48679, which added support for integer inputs to
aten::abs by promoting integers to float and then demoting the result back to
integers. This PR supports integer inputs to aten::abs more efficiently in the
SimpleIREvaluator by implementing integer support for kAbs (renamed from kFabs).
- Rename kFabs to kAbs
- Add support for integer inputs to kAbs in the SimpleIREvaluator (note that
llvm_codegen and cuda_codegen already support integer inputs to kAbs)
Test Plan:
- `PYTORCH_TENSOREXPR_DONT_USE_LLVM=1 python test/test_jit_fuser_te.py
TestTEFuser.test_unary_ops`
- `python test/test_jit_fuser_te.py TestTEFuser.test_unary_ops`
Imported from OSS
Reviewed By: eellison
Differential Revision: D25545791
fbshipit-source-id: e52f51a352d149f66ce8341fb3beb479be08a230
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48160
We no longer use the custom c++ test infra anyways, so move to pure
gtest.
Fixes #45703
ghstack-source-id: 116977283
Test Plan: `buck test //caffe2/test/cpp/tensorexpr`
Reviewed By: navahgar, nickgg
Differential Revision: D25046618
fbshipit-source-id: da34183d87465f410379048148c28e1623618553
Summary:
Adds new rules to the NNC IRSimplifier to take care of the following cases:
* Comparisons which are symbolic but have a constant difference. E.g. this is most useful in cases like `if (x > x + 4) ...` which we can now eliminate.
* Simplification of `Mod` nodes, including simple rules such as `0 % x` and `x % 1`, but also factorization of both sides to find common symbolic multiples. E.g. `(x * y) % x` can be cancelled out to `0`.
See tests for many more examples!
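A short sketch of what these rules fold (tensorexpr API assumed):
```
// Sketch only: assumes the torch::jit::tensorexpr headers and namespace.
VarHandle x("x", kInt), y("y", kInt);
ExprHandle e1 = ExprHandle(0) % x;   // simple rule: 0 % x -> 0
ExprHandle e2 = x % 1;               // simple rule: x % 1 -> 0
ExprHandle e3 = (x * y) % x;         // common factor x cancels -> 0
ExprHandle s = IRSimplifier::simplify(e3);
```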
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46412
Reviewed By: navahgar
Differential Revision: D24396151
Pulled By: nickgg
fbshipit-source-id: abb954dc930867d62010dcbcd8a4701430733715
Summary:
Fixes a crash bug in the IRSimplifier when the LHS is a Term (e.g. 2x) and the RHS is a Polynomial (e.g. 2x+1).
This case crashes 100% of the time so I guess it's not very common in models we've been benchmarking.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46108
Reviewed By: agolynski
Differential Revision: D24226593
Pulled By: nickgg
fbshipit-source-id: ef454c855ff472febaeba16ec34891df932723c0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45520
With this change `Load`s and `Store`s no longer accept `Placeholder`s in
their constructor and `::make` functions and can only be built with
`Buf`.
`Placeholder` gets its own `store`, `load`, `storeWithMask`, and
`loadWithMask` method for more convenient construction.
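A rough sketch of the new construction idiom (the dtype, dims, and exact argument forms here are assumptions):
```
// Sketch only: assumes the torch::jit::tensorexpr headers and namespace.
Placeholder a("A", kFloat, {ExprHandle(64)});
VarHandle i("i", kInt);
// Convenience accessors on Placeholder instead of passing it to Load/Store:
ExprHandle v = a.load(i);
Stmt* s = a.store({i}, v + 1.0f);
```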
Test Plan: Imported from OSS
Reviewed By: glaringlee
Differential Revision: D23998789
Pulled By: ZolotukhinM
fbshipit-source-id: 3fe018e00c1529a563553b2b215f403b34aea912
Summary:
We need to check if dtypes differ in scalar type or lanes to decide between
Cast and Broadcast.
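A small sketch of the distinction (Dtype helpers assumed):
```
// Sketch only: assumes the torch::jit::tensorexpr Dtype API.
Dtype src = kFloat;            // scalar float
Dtype dst = Dtype(kFloat, 4);  // 4-lane float
bool lanes_differ  = src.lanes() != dst.lanes();              // true  -> Broadcast
bool scalar_differ = src.scalar_type() != dst.scalar_type();  // false -> not a Cast
```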
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45179
Test Plan: test_tensorexpr --gtest_filter=TensorExprTest.SimplifyBroadcastTermExpander
Reviewed By: bwasti
Differential Revision: D23873316
Pulled By: asuhan
fbshipit-source-id: ca141be67e10c2b6c5f2ff9c11e42dcfc62ac620
Summary:
combineMultilane used the wrong operand order when the Ramp was on the left-hand side,
which matters for subtraction.
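An illustrative case (tensorexpr API assumed) where the operand order matters:
```
// Sketch only: assumes the torch::jit::tensorexpr headers and namespace.
VarHandle x("x", kInt);
ExprHandle ramp = Ramp::make(ExprHandle(0), ExprHandle(1), 4);  // {0, 1, 2, 3}
ExprHandle bcast = Broadcast::make(x, 4);                       // {x, x, x, x}
ExprHandle d1 = ramp - bcast;  // lanes {0-x, 1-x, 2-x, 3-x}
ExprHandle d2 = bcast - ramp;  // lanes {x-0, x-1, x-2, x-3}; not the same as d1
```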
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45157
Test Plan: test_tensorexpr --gtest_filter=TensorExprTest.SimplifyRampSubBroadcast
Reviewed By: ailzhang
Differential Revision: D23851751
Pulled By: asuhan
fbshipit-source-id: 864d1611e88769fb43327ef226bb3310017bf858
Summary:
A previous fix for masking Cuda dimensions (https://github.com/pytorch/pytorch/issues/44733) changed the behaviour of inserting thread synchronization barriers in the Cuda CodeGen, causing the CudaSharedMemReduce_1 test to be flaky and ultimately disabled.
The issue is working out where these barriers must be inserted - solving this optimally is very hard, and I think not possible without dependency analysis we don't have, so I've changed our logic to be quite pessimistic. We'll insert barriers before and after any blocks that have thread dimensions masked (even between blocks that have no data dependencies). This should be correct, but it's an area we could improve performance. To address this somewhat I've added a simplifier pass that removes obviously unnecessary syncThreads.
To avoid this test being flaky again, I've added a check against the generated code to ensure there is a syncThread in the right place.
Also fixed a couple of non-functional clarity issues in the generated code: added the missing newline after Stores in the CudaPrinter, and prevented the PrioritizeLoad mutator from pulling out loads contained within simple Let statements (such as those produced by the Registerizer).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44909
Reviewed By: agolynski
Differential Revision: D23800565
Pulled By: nickgg
fbshipit-source-id: bddef1f40d8d461da965685f01d00b468d8a2c2f
Summary:
Adds a pass to the IR Simplifier which fuses together the bodies of Cond statements which have identical conditions. e.g.
```
if (i < 10) {
  do_thing_1;
} else {
  do_thing_2;
}
if (i < 10) {
  do_thing_3;
}
```
is transformed into:
```
if (i < 10) {
  do_thing_1;
  do_thing_3;
} else {
  do_thing_2;
}
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44886
Reviewed By: glaringlee
Differential Revision: D23768565
Pulled By: nickgg
fbshipit-source-id: 3fe40d91e82bdfff8dcb8c56a02a4fd579c070df
Summary:
Adds a new optimization to the IRSimplifier which changes this pattern:
```
for ...
  if ...
    do thing;
```
into:
```
if ...
  for ...
    do thing;
```
This should be almost strictly better.
There are many cases where this isn't safe to do (hence the tests), most obviously when the condition depends on something modified within the loop, as in the sketch below.
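A hypothetical example of such an unsafe case (`n` and the body are placeholders):
```
void example(int n) {
  int x = 0;
  for (int i = 0; i < n; i++) {
    if (x < 5) {    // condition reads x, which the loop body modifies,
      // do thing;  // so the `if` cannot be hoisted out of the `for`
    }
    x += i;
  }
}
```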
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44764
Reviewed By: mruberry
Differential Revision: D23734463
Pulled By: nickgg
fbshipit-source-id: 51617e837de96b354fb702d0090ac65ddc523d36
Summary:
This is a reup of https://github.com/pytorch/pytorch/issues/43885 with an extra commit which should fix the bugs that caused it to be reverted. Read that for general context.
The issue here was that we were still using the side maps `tensor_to_stmt_` and `stmt_to_tensor_` which get invalidated by any transform of the IR (rather than just any transform that isn't computeInline). I added a comment about this but didn't actually address our usages of it.
I've removed these maps and changed the `getLoopBodyFor` and `getLoopStatementsFor` helpers to search the root stmt directly.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44231
Reviewed By: albanD
Differential Revision: D23689688
Pulled By: nickgg
fbshipit-source-id: 1c6009a880f8c0cebf2300fd06b5cc9322bffbf9
Summary:
Improve simplification of nested Min and Max patterns.
Specifically, handles the following pattern simplifications:
* `Max(A, Max(A, Const)) => Max(A, Const)`
* `Max(Min(A, B), Min(A, C)) => Min(A, Max(B, C))`
* `Max(Const, Max(A, OtherConst)) => Max(A, Max(Const, OtherConst))`
- This case can have an arbitrarily long chain of Max ops. For example: `Max(5, Max(x, Max(y, Max(z, 8)))) => Max(Max(Max(x, 8), y), z)`
Similar simplifications apply to Min as well; a brief sketch of the second pattern follows.
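This is a minimal sketch (tensorexpr API assumed; the propagate_nans flag is passed as false):
```
// Sketch only: assumes the torch::jit::tensorexpr headers and namespace.
VarHandle a("A", kInt), b("B", kInt), c("C", kInt);
ExprHandle e = Max::make(Min::make(a, b, false), Min::make(a, c, false), false);
ExprHandle s = IRSimplifier::simplify(e);  // expected: Min(A, Max(B, C))
```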
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44142
Reviewed By: albanD
Differential Revision: D23644486
Pulled By: navahgar
fbshipit-source-id: 42bd241e6c2af820566744c8494e5dee172107f4
Summary:
A rework of `computeInline` which makes it work a bit better, particularly when combined with other transformations. Previously we stored Functions that were inlined and then deferred the actual inlining of the function body until prepareForCodegen was called. This has an issue when transformations are applied to the LoopNest: the function body can be different from what appears in the root_stmt and result in inlining that a) fails, b) reverses other transformations, or c) produces a weird, unpredictable combination of the two.
This PR changes that behaviour so that the inlining occurs in the root stmt immediately, which means it reflects any previous transformations and any future transformations have a true view of the internal IR. It also has the benefit that inspecting the root statement gives an accurate view of it without needing to call prepareForCodegen. I also removed the difference between `computeInline` and `computeInlineWithRand` and we handle calls to `rand()` in all branches.
This is a rework of https://github.com/pytorch/pytorch/issues/38696, with the agreed changes from ZolotukhinM and zheng-xq: we should only inline if the dimensions are trivial (i.e. they are vars, not exprs).
This PR is mostly tests, and I fixed a bunch of bugs I found along the way. Partial list:
* When inlining an expression involving rand, we would create random vars equal to the dimensionality of the enclosing Tensor not the produced Tensor - meaning we'd use an incorrect value if the inlined tensor was smaller. E.g: `X[i] = rand(); A[i, j] = X[i]` would produce a tensor where `A[0, 0] != A[0, 1]`. This is fixed by inserting the Let binding of the random variable at the correct loop body.
* When inlining we'd replace all calls to `rand()` rather than just those present in the Tensor being inlined.
* `rand()` was treated symbolically by the simplifier and we would aggregate or cancel calls to `rand()`. Have fixed the hasher to hash all calls to `rand()` distinctly.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43885
Reviewed By: gmagogsfm
Differential Revision: D23503636
Pulled By: nickgg
fbshipit-source-id: cdbdc902b7a14d269911d978a74a1c11eab004fa