Commit Graph

53 Commits

Bert Maher
2e6221a232 [nnc] Make 64-bit dimensions work (#64077)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64077

We were assuming kernel dimensions fit in 32 bits (the old fuser made
this assumption too), but we should be able to support 64-bit sizes.
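
For context, a minimal standalone sketch (not code from this diff) showing how quickly a large tensor's element count overflows a 32-bit index:

```
#include <cstdint>
#include <iostream>
#include <vector>

int main() {
  // A [3, 100000, 100000] tensor has 3e10 elements, far beyond the
  // 2^31 - 1 ceiling of a signed 32-bit index.
  std::vector<int64_t> sizes = {3, 100000, 100000};
  int64_t numel = 1;
  for (int64_t s : sizes) {
    numel *= s;
  }
  std::cout << "numel = " << numel << "\n";                       // 30000000000
  std::cout << "fits in int32? " << (numel <= INT32_MAX) << "\n"; // 0
  return 0;
}
```
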
ghstack-source-id: 136933272

Test Plan: unit tests; new IR level test with huge sizes

Reviewed By: ZolotukhinM

Differential Revision: D30596689

fbshipit-source-id: 23b7e393a2ebaecb0c391a6b1f0c4b05a98bcc94
2021-08-28 19:59:47 -07:00
Raghavan Raman
6d31ba6ddc [nnc] Sanitized the names of constants in the input graph. (#63990)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/63923

The input graph can contain constants whose names include special characters, so all names of constants in the input graph need to be sanitized.
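
A minimal sketch of the kind of sanitization involved (hypothetical helper; the actual NNC code may differ):

```
#include <cctype>
#include <string>

// Replace every character that is invalid in a C-like identifier with '_',
// and prefix names that start with a digit.
std::string sanitizeName(const std::string& name) {
  std::string out;
  out.reserve(name.size() + 1);
  for (char c : name) {
    bool ok = std::isalnum(static_cast<unsigned char>(c)) || c == '_';
    out += ok ? c : '_';
  }
  if (out.empty() || std::isdigit(static_cast<unsigned char>(out[0]))) {
    out.insert(out.begin(), '_');
  }
  return out;  // e.g. "self.conv.weight" -> "self_conv_weight"
}
```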

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63990

Reviewed By: ZolotukhinM

Differential Revision: D30558432

Pulled By: navahgar

fbshipit-source-id: de5b0c23d50ee8997f40f2c0fc605dda3719186f
2021-08-26 09:52:02 -07:00
Bert Maher
8dda299d96 Re-apply: [nnc] Support thread level parallelism in fused kernels (#63776)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63776

I reverted this out of an abundance of caution because some test
failures occurred, but they were all due to precision issues fixed lower in
this stack. Let's try again.

I've rolled the elimination of the allow-parallelism-in-fusions toggle into
this diff since they're pretty tightly coupled.
ghstack-source-id: 136529847

Test Plan: CI

Reviewed By: huiguoo

Differential Revision: D30484555

fbshipit-source-id: 38fd33520f710585d1130c365a8c60c9ce794a59
2021-08-24 18:56:55 -07:00
Mikhail Zolotukhin
f0d274294d [TensorExpr] Nuke KernelArena and KernelScope. (#63587)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63587

Now that there are no classes using KernelArena for memory management,
we can remove it.

Differential Revision: D30429115

Test Plan: Imported from OSS

Reviewed By: navahgar

Pulled By: ZolotukhinM

fbshipit-source-id: 375f6f9294d27790645eeb7cb5a8e87047a57544
2021-08-24 00:32:16 -07:00
Mikhail Zolotukhin
62d02f2b57 [TensorExpr] Make 'Tensor' a value type. (#63586)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63586

This is another commit in transition from KernelArena memory management.
Tensor is essentially just a pair of <BufPtr, StmtPtr> and we don't need
to dynamically allocate it at all - it's cheap to pass it by value, and
that's what we're switching to in this commit.

After this change nothing uses KernelScope/KernelArena and they can be
safely removed.
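
A sketch of the resulting value type (names mirror the description above; the actual definition may differ):

```
#include <memory>
#include <utility>

class Buf;   // IR node: a buffer
class Stmt;  // IR node: a statement
using BufPtr = std::shared_ptr<Buf>;
using StmtPtr = std::shared_ptr<Stmt>;

// Tensor is just a pair of pointers, so it is cheap to copy by value --
// no arena or dynamic allocation of the Tensor itself is needed.
class Tensor {
 public:
  Tensor(BufPtr buf, StmtPtr stmt)
      : buf_(std::move(buf)), stmt_(std::move(stmt)) {}
  BufPtr buf() const { return buf_; }
  StmtPtr stmt() const { return stmt_; }

 private:
  BufPtr buf_;
  StmtPtr stmt_;
};
```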

Differential Revision: D30429114

Test Plan: Imported from OSS

Reviewed By: navahgar

Pulled By: ZolotukhinM

fbshipit-source-id: f90b859cfe863692b7beffbe9bd0e4143df1e819
2021-08-24 00:32:13 -07:00
Bert Maher
37d60c08e5 Revert D30360382: [nnc] Support thread level parallelism in fused kernels
Test Plan: revert-hammer

Differential Revision:
D30360382 (d6d86efb1c)

Original commit changeset: 29acf4e932c6

fbshipit-source-id: e0531113135d30eabb172dc1537d5dd6d65dc438
2021-08-21 03:46:43 -07:00
Bert Maher
d6d86efb1c [nnc] Support thread level parallelism in fused kernels (#63386)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63386

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D30360382

Pulled By: bertmaher

fbshipit-source-id: 29acf4e932c669ce0f35823faea9099bcd8119b6
2021-08-20 11:18:17 -07:00
Mikhail Zolotukhin
1dc2b52764 [TensorExpr] Add a wrapper for all expr and stmt pointers. (#63195)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63195

This helps us to later switch from using KernelArena with raw pointers
to shared pointers without having to change all our source files at
once.

The changes are mechanical and should not affect any functionality.

With this PR, we're changing the following:
 * `Add*` --> `AddPtr`
 * `new Add(...)` --> `alloc<Add>(...)`
 * `dynamic_cast<Add*>` --> `to<Add>`
 * `static_cast<Add*>` --> `static_to<Add>`

Due to some complications with args forwarding, some places became more
verbose, e.g.:
 * `new Block({})` --> `new Block(std::vector<ExprPtr>())`
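
A sketch of the indirection this buys us (illustrative; the alias and `alloc` helper stand in for the real ones):

```
#include <memory>
#include <utility>

class Add { /* an IR node */ };

// Today the alias can be a raw pointer (using AddPtr = Add*;); later it
// can be flipped to a smart pointer without touching any call sites:
using AddPtr = std::shared_ptr<Add>;

// alloc<>() hides the allocation strategy, so call sites never spell out
// `new` and don't need to change when the ownership model changes.
template <class T, class... Args>
std::shared_ptr<T> alloc(Args&&... args) {
  return std::make_shared<T>(std::forward<Args>(args)...);
}
```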

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D30292779

Pulled By: ZolotukhinM

fbshipit-source-id: 150301c7d2df56b608b035827b6a9a87f5e2d9e9
2021-08-17 13:44:45 -07:00
Nikita Shulga
a9b0a921d5 Disable avoid-non-const-global-variables lint check (#62008)
Summary:
The GoogleTest `TEST` macro is non-compliant with this check, as is `DEFINE_DISPATCH`.

All changes but the ones to `.clang-tidy` are generated using following script:
```
for i in `find . -type f -iname "*.c*" -or -iname "*.h"|xargs grep cppcoreguidelines-avoid-non-const-global-variables|cut -f1 -d:|sort|uniq`;  do sed -i "/\/\/ NOLINTNEXTLINE(cppcoreguidelines-avoid-non-const-global-variables)/d" $i; done
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62008

Reviewed By: driazati, r-barnes

Differential Revision: D29838584

Pulled By: malfet

fbshipit-source-id: 1b2f8602c945bd4ce50a9bfdd204755556e31d13
2021-07-22 18:04:40 -07:00
Mikhail Zolotukhin
3bfe15085d [TensorExpr] Add a mechanism to register custom TS->NNC lowerings in TensorExprKernel. (#60804)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60804

The lowerings are stored as a map c10::Symbol -> std::function, and the
signatures of those functions match the signature of
`computeOperandValue`. Custom lowerings take priority over the
standard ones, i.e. we can redefine how a given op is lowered.

In general, this feature is aimed at unblocking users whose models
contain ops that are not yet supported by NNC - it makes it possible to
quickly add a custom lowering for a given op.
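
A sketch of the lookup scheme (types are illustrative stand-ins: `std::string` replaces `c10::Symbol` for self-containment, and the real functions match `computeOperandValue`'s signature, elided here):

```
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

struct LoweredValue {};  // stand-in for what a lowering produces
using LoweringFn = std::function<LoweredValue()>;  // real signature elided

class LoweringRegistry {
 public:
  void registerCustom(const std::string& op, LoweringFn fn) {
    custom_[op] = std::move(fn);
  }
  // Custom lowerings take priority, so a user can redefine how an op
  // that already has a standard lowering gets lowered.
  const LoweringFn* lookup(const std::string& op) const {
    if (auto it = custom_.find(op); it != custom_.end()) return &it->second;
    if (auto it = standard_.find(op); it != standard_.end()) return &it->second;
    return nullptr;
  }

 private:
  std::unordered_map<std::string, LoweringFn> custom_;
  std::unordered_map<std::string, LoweringFn> standard_;
};
```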

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D29409580

Pulled By: ZolotukhinM

fbshipit-source-id: e8e8dc9d3cb9155cfbf5c08a4216ba1b5b791a60
2021-06-27 15:27:22 -07:00
Mikhail Zolotukhin
eb36f67dcc [TensorExpr] Minor cleanup in TensorExprKernel::computeValue (#60041)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60041

Differential Revision: D29146709

Test Plan: Imported from OSS

Reviewed By: navahgar

Pulled By: ZolotukhinM

fbshipit-source-id: 49ac919c18f669d7fda1a26c5a74e62ea752df4f
2021-06-17 01:23:24 -07:00
Mikhail Zolotukhin
daa35141e8 Reland: "[TensorExpr] Fix handling of 0-dim tensors." (#59508)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59508

An assert that was triggering in a previous version is now relaxed to
take 0-dim tensors into account.

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D28918342

Pulled By: ZolotukhinM

fbshipit-source-id: c09b62c9725d1603b0ec11fcc051e7c932af06ae
2021-06-08 22:48:17 -07:00
Nikita Shulga
ba3a90b55e Revert D28819780: [TensorExpr] Fix handling of 0-dim tensors.
Test Plan: revert-hammer

Differential Revision:
D28819780

Original commit changeset: f3feff35a1ce

fbshipit-source-id: 1dca4ac9cea0b67e9f02800f6d5b3c7e4ae1d81a
2021-06-04 19:25:30 -07:00
Mikhail Zolotukhin
d60efd8207 [TensorExpr] Fix handling of 0-dim tensors. (#59279)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59279

There were some issues with how we handle 0-dim cases in lowerings and
also in how we generate reductions in that special case. This PR fixes
those issues and reenables a bunch of tests.

Differential Revision: D28819780

Test Plan: Imported from OSS

Reviewed By: navahgar

Pulled By: ZolotukhinM

fbshipit-source-id: f3feff35a1ce11821ada2f8d04ae9d4be10dc736
2021-06-04 13:58:15 -07:00
Raghavan Raman
3fe72d30dc [NNC] Optimize conditionals that correspond to the form generated for aten::cat op. (#57673)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57673

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D28231374

Pulled By: navahgar

fbshipit-source-id: 1777a63df4e5ebed6d515683bd772a88be465b3a
2021-05-18 14:23:48 -07:00
Nikita Shulga
3a66a1cb99 [clang-tidy] Exclude cppcoreguidelines-avoid-magic-numbers (#57841)
Summary:
Add cppcoreguidelines-avoid-magic-numbers exclusion to clang-tidy
Remove existing nolint warnings using following script:
```
for file in `git ls-files | grep -v \.py`; do gsed '/^ *\/\/ NOLINTNEXTLINE(cppcoreguidelines-avoid-magic-numbers)/d' -i  $file; done
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57841

Reviewed By: samestep

Differential Revision: D28295045

Pulled By: malfet

fbshipit-source-id: 7c6e8d1213c9593f169ed3df6a916498f1a97163
2021-05-07 20:02:33 -07:00
Mikhail Zolotukhin
dedaf4fad7 Reland: [TensorExpr] Add methods for inspecting generated code in TensorExprKernel. (#57560)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57560

The new methods allow peeking into bufferArgs, which describe the
parameters that codegen expects. This description includes whether a
given parameter is a scalar var or a buffer; in the buffer case it
provides the corresponding `Buf*` pointer, from which we can get the
expected sizes.
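
A self-contained miniature of the idea (hypothetical types, not NNC's actual API):

```
#include <cstdint>
#include <string>
#include <vector>

// A kernel parameter is either a scalar var or a buffer with known sizes.
struct BufferArg {
  bool isVar;                 // true: scalar var; false: buffer
  std::string name;
  std::vector<int64_t> dims;  // meaningful only when isVar == false
};

// Query the expected sizes of a buffer parameter by name.
std::vector<int64_t> expectedSizes(const std::vector<BufferArg>& args,
                                   const std::string& name) {
  for (const auto& a : args) {
    if (!a.isVar && a.name == name) {
      return a.dims;
    }
  }
  return {};  // scalar parameter or unknown name
}
```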

Relanding #57074, which was reverted because I forgot to guard a new
test with `ifdef LLVM`.

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D28199048

Pulled By: ZolotukhinM

fbshipit-source-id: 636e838e7e242a3c63e97ec453b8fae9b6380231
2021-05-05 09:11:40 -07:00
Mikhail Zolotukhin
e686c66fe7 Reland: [TensorExpr] Add TensorExprKernel::runFast method. (#57552)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57552

This method uses `CodeGen::call_raw` instead of `CodeGen::call`.
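
A sketch of the distinction (hypothetical types and signatures): `call` accepts wrapped, typed arguments and pays per-call bookkeeping, while `call_raw` takes bare pointers and skips that overhead:

```
#include <cstdio>
#include <vector>

struct CallArg {
  void* ptr;  // the real wrapper also carries scalar/dtype information
};

struct CodeGenLike {
  void call(const std::vector<CallArg>& args) {
    std::printf("checked call, %zu args\n", args.size());  // convenient path
  }
  void call_raw(const std::vector<void*>& args) {
    std::printf("raw call, %zu args\n", args.size());  // low-overhead path
  }
};
```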

Relanding #57328 (the entire stack), which was reverted because I forgot
to guard a new test with `ifdef LLVM`.

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D28195047

Pulled By: ZolotukhinM

fbshipit-source-id: bcfd3cb5b4f33a149b7549515ffd705e2c4f208f
2021-05-05 09:11:37 -07:00
Mike Ruberry
1461859fde Revert D28048289: [TensorExpr] Add methods for inspecting generated code in TensorExprKernel.
Test Plan: revert-hammer

Differential Revision:
D28048289 (6b2cb939c5)

Original commit changeset: 3867e862a0ec

fbshipit-source-id: bdd45dcc4b229673efeb06da411bbf0c58d44026
2021-05-04 11:29:14 -07:00
Mikhail Zolotukhin
6b2cb939c5 [TensorExpr] Add methods for inspecting generated code in TensorExprKernel. (#57074)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57074

The new methods allow peeking into bufferArgs, which describe the
parameters that codegen expects. This description includes whether a
given parameter is a scalar var or a buffer; in the buffer case it
provides the corresponding `Buf*` pointer, from which we can get the
expected sizes.

Differential Revision: D28048289

Test Plan: Imported from OSS

Reviewed By: bertmaher

Pulled By: ZolotukhinM

fbshipit-source-id: 3867e862a0ec3593906820826c2344bd8a8f5c0a
2021-05-03 20:02:28 -07:00
Mike Ruberry
3018093066 Revert D28110359: [TensorExpr] Add TensorExprKernel::runFast method.
Test Plan: revert-hammer

Differential Revision:
D28110359 (f219ed6627)

Original commit changeset: 4fdffc8196d2

fbshipit-source-id: 3c93a058b5dd7a3b71e399341a408ec74949ef56
2021-05-01 16:16:37 -07:00
Mikhail Zolotukhin
f219ed6627 [TensorExpr] Add TensorExprKernel::runFast method. (#57328)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57328

This method uses `CodeGen::call_raw` instead of `CodeGen::call`.

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D28110359

Pulled By: ZolotukhinM

fbshipit-source-id: 4fdffc8196d24fc3300a9b4bc69f67562042a045
2021-04-30 15:26:18 -07:00
Nikita Shulga
4cb534f92e Make PyTorch code-base clang-tidy compliant (#56892)
Summary:
This is an automatic change generated by the following script:
```
#!/usr/bin/env python3
from subprocess import check_output, check_call
import os

def get_compiled_files_list():
    import json
    with open("build/compile_commands.json") as f:
        data = json.load(f)
    files = [os.path.relpath(node['file']) for node in data]
    for idx, fname in enumerate(files):
        if fname.startswith('build/') and fname.endswith('.DEFAULT.cpp'):
            files[idx] = fname[len('build/'):-len('.DEFAULT.cpp')]
    return files

def run_clang_tidy(fname):
    check_call(["python3", "tools/clang_tidy.py", "-c", "build", "-x", fname,"-s"])
    changes = check_output(["git", "ls-files", "-m"])
    if len(changes) == 0:
        return
    check_call(["git", "commit","--all", "-m", f"NOLINT stubs for {fname}"])

def main():
    git_files = check_output(["git", "ls-files"]).decode("ascii").split("\n")
    compiled_files = get_compiled_files_list()
    for idx, fname in enumerate(git_files):
        if fname not in compiled_files:
            continue
        if fname.startswith("caffe2/contrib/aten/"):
            continue
        print(f"[{idx}/{len(git_files)}] Processing {fname}")
        run_clang_tidy(fname)

if __name__ == "__main__":
    main()
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56892

Reviewed By: H-Huang

Differential Revision: D27991944

Pulled By: malfet

fbshipit-source-id: 5415e1eb2c1b34319a4f03024bfaa087007d7179
2021-04-28 14:10:25 -07:00
Mikhail Zolotukhin
f3743f097f [TensorExpr] Nuke tensorexpr::ScalarType and instead use c10::ScalarType directly. (#56825)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/56825

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D27977461

Pulled By: ZolotukhinM

fbshipit-source-id: f8a72938ba395e426e2d9449627113abb1c9c34f
2021-04-26 01:51:21 -07:00
Bert Maher
c91c4a081d [NNC] Horizontally fuse all loops (#56324)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56324

Inlining is great if LLVM's CSE kicks in; but if a kernel has multiple outputs
(and thus multiple loops), CSE has no chance.

So, this pass "horizontally" fuses the output loops together so that CSE can go
to town. Essentially we want to turn
```
for (...) {
  output_1[] = some_complicated_expr...
}
for (...) {
  output_2[] = some_complicated_expr...
}
```

Into:
```
for (...) {
  output_1[] = complicated_expr
  output_2[] = complicated_expr. // llvm cse should take care of this
}
```

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D27841194

Pulled By: bertmaher

fbshipit-source-id: 54153bb59786be87183c636d64f05963c4b1624a
2021-04-20 23:54:40 -07:00
Mikhail Zolotukhin
85126629a5 [TensorExpr] Add support for constant tensors in tensorexpr kernel. (#56319)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56319

With this change, the TorchScript graph can contain constant tensors
and we will still be able to lower it to TE. The constants are
registered (or bound) within the `TensorExprKernel` object, and when the
codegen is called, they are passed along with the usual inputs and outputs.
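
A sketch of the binding scheme (hypothetical types): constants discovered at compile time are stored on the kernel and appended to the caller's arguments on every run:

```
#include <vector>

struct KernelLike {
  // Filled in once, while lowering the graph.
  std::vector<const void*> boundConstants;

  void run(std::vector<const void*> args) {
    // The generated code expects inputs/outputs first, constants last.
    args.insert(args.end(), boundConstants.begin(), boundConstants.end());
    // ... dispatch to the generated code with `args` ...
  }
};
```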

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D27838747

Pulled By: ZolotukhinM

fbshipit-source-id: 4a519d66fcc07fe5fa53f5cf9af28d25611f8437
2021-04-17 11:15:35 -07:00
Raghavan Raman
d3cde6c23c [NNC] Implementation for aten::cat without conditionals. (#53128)
Summary:
This PR adds an implementation for `aten::cat` in NNC without any conditionals. This version is not enabled by default.

Below is the performance of some micro-benchmarks with and without conditionals. There is up to a 50% improvement in performance without conditionals for some of the shapes.

aten::cat implementation in NNC **with** conditionals
```
$ python -m benchmarks.tensorexpr --device cpu --mode fwd --jit_mode trace --cpu_fusion concat
pt: concat2d2input_fwd_cpu_1_160_1_14_1: 5.44 us, SOL 0.26 GB/s, algorithmic 0.51 GB/s
pt: concat2d2input_fwd_cpu_1_580_1_174_1: 5.75 us, SOL 1.05 GB/s, algorithmic 2.10 GB/s
pt: concat2d2input_fwd_cpu_20_160_20_14_1: 6.87 us, SOL 4.05 GB/s, algorithmic 8.11 GB/s
pt: concat2d2input_fwd_cpu_20_580_20_174_1: 14.52 us, SOL 8.31 GB/s, algorithmic 16.62 GB/s
pt: concat2d2input_fwd_cpu_8_512_8_512_1: 9.58 us, SOL 6.84 GB/s, algorithmic 13.68 GB/s
```
aten::cat implementation in NNC **without** conditionals
```
$ python -m benchmarks.tensorexpr --device cpu --mode fwd --jit_mode trace --cpu_fusion --cat_wo_conditionals concat
pt: concat2d2input_fwd_cpu_1_160_1_14_1: 4.67 us, SOL 0.30 GB/s, algorithmic 0.60 GB/s
pt: concat2d2input_fwd_cpu_1_580_1_174_1: 5.65 us, SOL 1.07 GB/s, algorithmic 2.14 GB/s
pt: concat2d2input_fwd_cpu_20_160_20_14_1: 6.10 us, SOL 4.56 GB/s, algorithmic 9.12 GB/s
pt: concat2d2input_fwd_cpu_20_580_20_174_1: 7.44 us, SOL 16.22 GB/s, algorithmic 32.44 GB/s
pt: concat2d2input_fwd_cpu_8_512_8_512_1: 6.46 us, SOL 10.14 GB/s, algorithmic 20.29 GB/s
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/53128

Reviewed By: bertmaher

Differential Revision: D26758613

Pulled By: navahgar

fbshipit-source-id: 00f56b7da630b42bc6e7ddd4444bae0cf3a5780a
2021-03-07 22:57:02 -08:00
Elias Ellison
43f56e19a6 [NNC] Make NNC sanitize input names (#52786)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52786

Previously, NNC did not sanitize input names. I ran into this in the next PR, where making subgraph creation preserve debug names caused a number of NNC CUDA failures. I had also run into this previously with some masked_fill failures internally, which led me to disable that operator.

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D26696699

Pulled By: eellison

fbshipit-source-id: 7c3af4d559d58762fb8332666784a4d5cd6a4167
2021-03-01 21:22:16 -08:00
Raghavan Raman
c7a70eec1b Make LLVM the default backend for TE (#52314)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/52264

When CPU fusion is enabled without LLVM support in PyTorch, it causes a huge slowdown (> 50x). This PR makes the LLVM backend the default backend for TE. Now, an error is reported if CPU fusion is enabled without LLVM support, to avoid this performance regression.

This PR also updates the tests to not use LLVM, so that the old flow continues to be exercised. This is necessary because the tests run in CI do not have LLVM.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/52314

Reviewed By: ejguan

Differential Revision: D26491294

Pulled By: navahgar

fbshipit-source-id: 74561db1207da805d6d28039450db046ba2988fb
2021-02-18 12:00:38 -08:00
Elias Ellison
efe1fc21fc Dont inlinine intermediates on cpu (#49565)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49565

Test Plan: Imported from OSS

Reviewed By: Krovatkin, ZolotukhinM

Differential Revision: D25688271

Pulled By: eellison

fbshipit-source-id: 9ea7858e2db4fb31292e04440fc72ee04623c688
2021-01-04 15:46:20 -08:00
generatedunixname89002005325676
dcd1e3d78d [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D25490983

fbshipit-source-id: b24a11214a485a4a24ccf7da1e72715b450d3a81
2020-12-11 08:43:24 -08:00
Elias Ellison
3b57be176e [NNC] Preserve strided output (#48264)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48264

Preserves the strided representation of NNC Tensor outputs by transforming them into the right layout at the end of the kernel.

Fix for https://github.com/pytorch/pytorch/issues/45604
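
A sketch of the epilogue transform for a 2-D output (illustrative only): compute into a dense buffer, then write each element at its strided offset so the caller sees the original layout:

```
#include <cstdint>

void writeStrided(const float* dense, float* out,
                  int64_t rows, int64_t cols,
                  int64_t stride0, int64_t stride1) {
  for (int64_t i = 0; i < rows; ++i) {
    for (int64_t j = 0; j < cols; ++j) {
      // dense is row-major contiguous; out may have arbitrary strides.
      out[i * stride0 + j * stride1] = dense[i * cols + j];
    }
  }
}
```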

Test Plan: Imported from OSS

Reviewed By: nikithamalgifb

Differential Revision: D25286213

Pulled By: eellison

fbshipit-source-id: 64d94ac463741e2568a1c9d44174e15ea26e511f
2020-12-10 12:19:51 -08:00
Elias Ellison
413caa7fd2 [NNC] Compute Tensor Output Properties in ininitialization (#47813)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47813

We have some code paths that appear to handle dynamic sizes at kernel invocation, but I'm not sure how well they work, because other parts of our code base assume that tensor shapes are always fully specified. https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/tensorexpr/kernel.cpp#L1572

As with some other PRs in the stack, I think it would be good to remove features that aren't enabled or actively being worked on while they are unused.

I initially did this PR to try to improve perf. I couldn't observe much of a speedup, so we can decide to keep or drop this PR as we see fit.

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D25286212

Pulled By: eellison

fbshipit-source-id: 4ae66e0af88d649dd4e592bc78686538c2fdbaeb
2020-12-10 12:19:45 -08:00
Nikolay Korovaiko
0d8ddb5ec2 Make softmax and log_softmax handle negative dims, add tests (#48156)
Summary:
Make softmax and log_softmax handle negative dims, add tests
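
The usual normalization rule, as a sketch: a negative dim counts from the end, so `dim = -1` means the last dimension:

```
#include <cstdint>
#include <stdexcept>

int64_t normalizeDim(int64_t dim, int64_t ndim) {
  if (dim < -ndim || dim >= ndim) {
    throw std::out_of_range("dim out of range");
  }
  return dim < 0 ? dim + ndim : dim;
}
// e.g. normalizeDim(-1, 4) == 3: softmax(x, dim=-1) reduces the last axis.
```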

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48156

Reviewed By: bertmaher

Differential Revision: D25059788

Pulled By: Krovatkin

fbshipit-source-id: 985963e7df400857c9774660c76be7d56201a1ad
2020-11-19 01:38:14 -08:00
Bert Maher
07657b6001 [tensorexpr] Switch cpp tests to pure gtest (#48160)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48160

We no longer use the custom C++ test infra anyway, so move to pure
gtest.

Fixes #45703
ghstack-source-id: 116977283

Test Plan: `buck test //caffe2/test/cpp/tensorexpr`

Reviewed By: navahgar, nickgg

Differential Revision: D25046618

fbshipit-source-id: da34183d87465f410379048148c28e1623618553
2020-11-18 12:23:34 -08:00
Raghavan Raman
8eb228a7f3 Add support for log_softmax (#47409)
Summary:
This diff adds support for `log_softmax` op in NNC.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47409

Reviewed By: ejguan

Differential Revision: D24750203

Pulled By: navahgar

fbshipit-source-id: c4dacc7f62f9df65ae467f0d578ea03d3698273d
2020-11-06 13:29:27 -08:00
Raghavan Raman
2caa3bd453 Inlining all non-output buffers, including intermediate buffers. (#47258)
Summary:
This diff enables inlining for all non-output buffers, including the intermediate buffers that are created as part of an op. However, the buffers that correspond to reductions will not be inlined.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47258

Reviewed By: anjali411

Differential Revision: D24707015

Pulled By: navahgar

fbshipit-source-id: ad8b03e38497600cd69980424db6d586bf93db74
2020-11-03 17:00:32 -08:00
Raghavan Raman
f58842c214 Enable inlining into reductions (#47020)
Summary:
This diff enables inlining producers into reductions. It also guards against inlining reductions themselves.

Prior to this diff, if there was a reduction in the loopnest, no inlining happened at all. After this change, we will inline all non-output buffers that do not correspond to a reduction.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47020

Reviewed By: albanD

Differential Revision: D24644346

Pulled By: navahgar

fbshipit-source-id: ad234a6877b65be2457b734cbb7f3a1800baa6a5
2020-11-02 15:33:38 -08:00
Mikhail Zolotukhin
d6de9d573a [TensorExpr] Properly handle input types promotion and special case of empty inputs for aten::cat. (#46500)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46500

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D24373671

Pulled By: ZolotukhinM

fbshipit-source-id: b3be73a89a9ab6654212cb7094f32bf1c445e876
2020-10-16 20:26:46 -07:00
Mikhail Zolotukhin
0f668d95b6 [TensorExpr] Fix shape inference logic for aten::cat. (#46482)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46482

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D24366778

Pulled By: ZolotukhinM

fbshipit-source-id: 000ff363b11599ba3827cdf2db3d4793878b84ab
2020-10-16 20:24:30 -07:00
Raghavan Raman
a5c0dbc519 Add support for Softmax. (#45286)
Summary:
This PR adds support for Softmax in NNC.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45286

Reviewed By: mrshenli

Differential Revision: D24042901

Pulled By: navahgar

fbshipit-source-id: 120bafe17586d3ecf0918f9aee852a7c3a8f4990
2020-10-08 23:57:02 -07:00
Ansley Ussery
5072728d88 Fix stride printing/parsing formatting (#45156)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45156

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D24078695

Pulled By: ansley

fbshipit-source-id: dab993277d43b31105c38d12098c37653747b42a
2020-10-06 15:06:46 -07:00
Mikhail Zolotukhin
92306b85d5 [TensorExpr] Consolidate {buffer,function,tensor}.{h.cpp} in tensor.{h,cpp}. (#45388)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45388

Classes defined in these files are closely related, so it is reasonable
to have them all in one file. The change is purely a code move.

Differential Revision: D23952867

Test Plan: Imported from OSS

Reviewed By: nickgg

Pulled By: ZolotukhinM

fbshipit-source-id: 12cfaa968bdfc4dff00509e34310a497c7b59155
2020-09-29 01:17:10 -07:00
Alex Suhan
85d91a3230 [TensorExpr] Check statements in test_kernel.cpp (#43911)
Summary:
Check statements and fix all the warnings.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43911

Test Plan: test_tensorexpr

Reviewed By: ZolotukhinM

Differential Revision: D23441092

Pulled By: asuhan

fbshipit-source-id: f671eef4b4eb9b51acb15054131152ae650fedbd
2020-08-31 22:16:25 -07:00
Alex Suhan
deb5fde51c [TensorExpr] Make KernelSumMultipleAxes much faster (#43905)
Summary:
Reduce input size, skip the dtype conversion.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43905

Test Plan: test_tensorexpr --gtest_filter=TensorExprTest.KernelSum*

Reviewed By: ailzhang

Differential Revision: D23433398

Pulled By: asuhan

fbshipit-source-id: 0d95ced3c1382f10595a9e5745bf4bef007cc913
2020-08-31 17:58:43 -07:00
Alex Suhan
60ad7e9c04 [TensorExpr] Make sum available from Python (#43730)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43730

Test Plan:
python test/test_jit_fuser_te.py -k TestTEFuser.test_sum
test_tensorexpr --gtest_filter=TensorExprTest.KernelSum*

Reviewed By: ZolotukhinM

Differential Revision: D23407600

Pulled By: asuhan

fbshipit-source-id: e6da4690ae6d802f9be012e39e61b7467aa5285c
2020-08-29 10:38:21 -07:00
Alex Suhan
de84db2a9d [TensorExpr] Add aten::sum lowering to the kernel (#43585)
Summary:
Handles reductions over all dimensions and over selected dimensions, per PyTorch semantics.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43585

Test Plan: test_tensorexpr

Reviewed By: bertmaher

Differential Revision: D23362382

Pulled By: asuhan

fbshipit-source-id: e8d8f1197a026be0b46603b0807d996a0de5d58c
2020-08-27 02:46:47 -07:00
Mikhail Zolotukhin
b9c49f0e69 [TensorExpr] Support shape inference in TE for aten::cat. (#42387)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42387

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D22879281

Pulled By: ZolotukhinM

fbshipit-source-id: 775e46a4cfd91c63196b378ee587cc4434672c89
2020-08-05 14:11:24 -07:00
Mikhail Zolotukhin
2decccea2e [TensorExpr] Implement shape inference for TE. (#41451)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41451

Since TE operates on a limited subset of ops with well-defined
semantics, we can easily infer the shapes of intermediate and output
tensors given the shapes of the inputs.

There are a couple of ops that are not yet supported in the shape
inference; once we add them, we can relax the shape-info requirements
in the TE fuser: currently it requires all values in the fusion group to
have known shapes, and we can change that to only the inputs.
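
The core rule for elementwise ops, as a sketch: broadcast two known input shapes to get the output shape (PyTorch/NumPy semantics):

```
#include <cstdint>
#include <stdexcept>
#include <utility>
#include <vector>

std::vector<int64_t> broadcastShapes(std::vector<int64_t> a,
                                     std::vector<int64_t> b) {
  if (a.size() < b.size()) std::swap(a, b);  // make `a` the longer shape
  std::vector<int64_t> out(a);
  for (size_t i = 0; i < b.size(); ++i) {
    int64_t& x = out[out.size() - 1 - i];  // align trailing dimensions
    int64_t y = b[b.size() - 1 - i];
    if (x == y || y == 1) continue;   // compatible; keep x
    if (x == 1) { x = y; continue; }  // broadcast the size-1 dim up
    throw std::invalid_argument("shapes are not broadcastable");
  }
  return out;
}
// e.g. broadcastShapes({8, 1, 6}, {7, 1}) == {8, 7, 6}
```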

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D22543470

Pulled By: ZolotukhinM

fbshipit-source-id: 256bae921028cb6ec3af91977f12bb870c385f40
2020-07-31 20:05:21 -07:00
Mikhail Zolotukhin
5d7046522b [JIT] Teach IRPrinter and IRParser to handle 'requires_grad' and 'device' as a part of type info. (#41507)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41507

These fields have always been a part of tensor types; this change just
makes them serializable through IR dumps.
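
An illustrative dump line (format approximate, not verbatim from this change) with both fields carried in the type:

```
%y : Float(2, 3, requires_grad=1, device=cuda:0) = aten::relu(%x)
```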

Test Plan: Imported from OSS

Reviewed By: Krovatkin, ngimel

Differential Revision: D22563661

Pulled By: ZolotukhinM

fbshipit-source-id: f01aaa130b7e0005bf1ff21f65827fc24755b360
2020-07-17 10:27:04 -07:00