Commit Graph

220 Commits

Author SHA1 Message Date
kshitij12345
c9af4c2636 OpInfo: where (#58349)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/54261

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58349

Reviewed By: mrshenli

Differential Revision: D28744220

Pulled By: mruberry

fbshipit-source-id: 893a2fb88a48a60df75c7d6e2f58a42ca949daa7
2021-05-28 18:22:03 -07:00
Kushashwa Ravi Shrimali
0c1420aa3c OpInfo: fmod and remainder (#57941)
Summary:
See https://github.com/pytorch/pytorch/issues/54261

cc: mruberry Lezcano kshitij12345

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57941

Reviewed By: mrshenli

Differential Revision: D28744464

Pulled By: mruberry

fbshipit-source-id: 19847277d4f8d3a39a706c2b3c9eddf0dedcb20c
2021-05-27 20:32:56 -07:00
Bin Bao
7e4e648c2a Enable NNC fusion for relu6 (#58773)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58773

Test Plan:
```
python test/test_ops.py -k relu6
python test/test_jit_fuser_te.py
```

Reviewed By: bertmaher

Differential Revision: D28721791

Pulled By: desertfire

fbshipit-source-id: a94f711977afd080faae052f66eb8dded3cdc79e
2021-05-27 10:54:02 -07:00
Bert Maher
e24362746a [nnc] Concat input shapes must be known to fuse (#58974)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58974

I don't know how we overlooked this for so long...
ghstack-source-id: 129932134

Test Plan:
Predictor test of model 184778294_0 using multiple request replay
threads.  It's not clear to me why multithreading matters, except that perhaps
it makes it easier to get an unknown shape in the profile.

Reviewed By: navahgar

Differential Revision: D28702660

fbshipit-source-id: 565550b1d2e571d62d0c8b21150193f2a7ace334
2021-05-26 11:29:26 -07:00
Horace He
6093161158 Separated out working tests from not working tests for NNC OpInfo (#58788)
Summary:
This gets rid of a lot of the try/else rigamarole.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58788

Reviewed By: ZolotukhinM

Differential Revision: D28621054

Pulled By: Chillee

fbshipit-source-id: d0d8a1b6466eb318d939a1ed172b78f492ee0d5b
2021-05-22 02:24:23 -07:00
Horace He
e56d3b0238 Added OpInfo tests for NNC (#58719)
Summary:
Finds a couple of bugs:

1. permute needs to wrap dimensions
2. slice needs to wrap dimensions
3. frac doesn't work correctly for negative values
4. Permute has some other failures.

This PR also fixes 1 + 2.
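
A quick eager-mode sketch of items 1 and 3 above (tensor values are arbitrary; this just shows the reference behavior NNC has to match):

```
import torch

x = torch.randn(2, 3)
# Negative dims must be wrapped to positive ones: for a 2-D tensor,
# permute(-1, 0) is the same as permute(1, 0).
assert torch.equal(x.permute(-1, 0), x.permute(1, 0))

# Eager frac keeps the sign of its input; bug 3 above was NNC diverging
# from this for negative values.
print(torch.frac(torch.tensor([-1.5, 2.25])))  # tensor([-0.5000, 0.2500])
```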

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58719

Reviewed By: SplitInfinity

Differential Revision: D28590457

Pulled By: Chillee

fbshipit-source-id: a67fce67799602f9396bfeef615e652364918fbd
2021-05-21 01:41:28 -07:00
Edvard Ghazaryan
5211eeb22b Support aten::leaky_relu for TE (#58464)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58464

Test Plan:
./bin/test_tensorexpr

python test/test_jit_fuser_te.py TestTEFuser.test_unary_ops

Reviewed By: Krovatkin

Differential Revision: D28499776

fbshipit-source-id: 20094a1bc78aa485f76aec4e065ff69e43d692d7
2021-05-20 16:12:03 -07:00
Bert Maher
3d20ddfe92 [nnc] Do not fuse unsqueeze with variable dim (#58346)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58346

If `dim` is a variable, NNC doesn't know how to translate the result,
since the shape is unknown.  This issue manifested as a `bad_variant_access`
when we try to pull an int constant out of that arg.

Note that, while the PE will pick up the resultant shape, it won't set guards accordingly.
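
A minimal sketch of the pattern this commit refuses to fuse (the function and shapes are made up for illustration):

```
import torch

@torch.jit.script
def f(x: torch.Tensor, d: int):
    # `d` is only known at runtime, so the output shape of unsqueeze is
    # unknown at compile time; per this commit, NNC will not fuse it.
    return x.unsqueeze(d) * 2.0
```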
ghstack-source-id: 129078971

Test Plan: new fuser test

Reviewed By: navahgar

Differential Revision: D28460956

fbshipit-source-id: 57ef918ef309ee57bfdf86717b910b6549750454
2021-05-18 21:44:37 -07:00
Bert Maher
6b8b591a84 [nnc] Fix output restriding of size-1 dimensions (#58256)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58256

Size-1 dims mess up our output restriding logic, because they're
technically "dense" no matter what stride the dimension has.  In this example a
size-1 dim has stride 1, which causes all the indices to be taken mod 1 (i.e.,
all indices become 0).  We work around this peculiar case by skipping size-1
dims in our layout logic, since they have no impact on the rest of the tensor's indexing.
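
A small illustration of why a size-1 dim is "dense" regardless of stride (values are arbitrary):

```
import torch

# The index along a size-1 dim is always 0, so its stride is never used.
base = torch.arange(4.)
a = base.as_strided((4, 1), (1, 1))
b = base.as_strided((4, 1), (1, 7))  # arbitrary stride on the size-1 dim
print(torch.equal(a, b))  # True
```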
ghstack-source-id: 128932739

Test Plan:
new unit test, plus
```
buck test mode/dev //langtech/mobile/audio_stream_processor:audio_stream_processor_test -- --exact 'langtech/mobile/audio_stream_processor:audio_stream_processor_test - AudioStreamProcessorTest.DemucsReadWriteFloat'
```

Reviewed By: eellison

Differential Revision: D28424388

fbshipit-source-id: e33e39eef2a5bf2797bee78a5987558308b6d110
2021-05-14 00:09:12 -07:00
Nick Korovaiko
c524448dd1 init hardshrink (#57749)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57749

add to a fx test

Test Plan: Imported from OSS

Reviewed By: huiguoo

Differential Revision: D28425974

fbshipit-source-id: 195c7a1944decb7a2a99c2831cab38485f32be17
2021-05-13 19:38:05 -07:00
Mikhail Zolotukhin
470cd64514 [TensorExpr] Remove disabled tests that we do not plan to re-enable. (#58207)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58207

We probably don't even know what these tests check and there are no
plans on re-enabling them - let's just nuke them to keep the code clean.

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D28403251

Pulled By: ZolotukhinM

fbshipit-source-id: fe12e978636a74f309f57e3408ab78d459fe4d29
2021-05-13 09:19:20 -07:00
Mikhail Zolotukhin
a0f4b7cd48 [TensorExpr] Re-enable skipped tests, they seem to be working now. (#58206)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58206

Tested on CUDA with and without `PYTORCH_TENSOREXPR_DONT_USE_LLVM=1`.

Closes #48053.

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D28403250

Pulled By: ZolotukhinM

fbshipit-source-id: 1ae1cfed691e0077a37db646937e580fbd32b23f
2021-05-13 09:18:09 -07:00
Bert Maher
6955d4d0f7 [nnc] Handle only the first argument of aten::to (#58028)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58028

We were trying to translate the device argument and thus throwing an
unsupported dtype.
ghstack-source-id: 128748658

Test Plan: predictor models

Reviewed By: navahgar

Differential Revision: D28347704

fbshipit-source-id: 331a5786339e01f9df1b1878970b0c5983a92980
2021-05-12 12:52:29 -07:00
Bert Maher
f97650e70b [nnc] Fix float->bool conversion on cpu (#57798)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57798

Our instruction sequence was just plain wrong: instead of `fcmp une %x, +0.0`
(unordered-or-not-equal to 0.0) we were doing `fcmp uno`, which is just an
unordered check (i.e., is either side NaN).
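
For reference, a small sketch of the eager semantics being matched: float->bool is `x != 0`, and NaN is unordered with respect to 0, so it maps to True, which is exactly what `fcmp une` computes:

```
import torch

x = torch.tensor([0.0, 1.0, float('nan')])
print(x.to(torch.bool))  # tensor([False,  True,  True])
```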
ghstack-source-id: 128586464

Test Plan: New unit test against the full cross-product of dtypes.

Reviewed By: navahgar

Differential Revision: D28276269

fbshipit-source-id: ba5e59778e07770fb78ef02309f10edde333a800
2021-05-10 18:31:38 -07:00
Elias Ellison
241c2f4496 Add Gelu To NNC (#57753)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57753

I'm not adding symbolic gradient because that is being added in https://github.com/pytorch/pytorch/pull/46785.

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D28262765

Pulled By: eellison

fbshipit-source-id: be365a2d392d7ac4bcc099a184762249ec2e18a6
2021-05-06 16:04:50 -07:00
Elias Ellison
7627dd568a hardswish reland (#57652)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57652

Test Plan: Imported from OSS

Reviewed By: Krovatkin

Differential Revision: D28226724

Pulled By: eellison

fbshipit-source-id: 585a91ffab7a855b5600e79130a37be25ef9b354
2021-05-05 17:21:43 -07:00
Shen Li
887d0e5657 Revert D28197820: [JIT][NNC] add hardswish symbolic gradient and NNC lowering
Test Plan: revert-hammer

Differential Revision:
D28197820 (0142fd0b57)

Original commit changeset: 05305d85c5bb

fbshipit-source-id: 2e1d9699515982ba2a9be06e83a2ce043ec857ee
2021-05-05 07:53:30 -07:00
eellison
0142fd0b57 [JIT][NNC] add hardswish symbolic gradient and NNC lowering (#57383)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57383

Notes: I picked up an activation from https://github.com/pytorch/pytorch/issues/56969. You can look at the [activations.cpp](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/cpu/Activation.cpp#L429) file which has both forward and backward kernel code to help you write the NNC lowering and the symbolic gradient.

I added a test in test_jit_fuser_te for the fusion, and I added an OpInfo and asserted that we expect to see autodiffable nodes to test the symbolic gradient.

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D28197820

Pulled By: eellison

fbshipit-source-id: 05305d85c5bb0847c8f911b95ba47b137dca7e90
2021-05-04 23:39:59 -07:00
Bert Maher
151e81b7bc [nnc][tests] Skip long running tests when using TE interpreter (#57568)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57568

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D28202740

Pulled By: bertmaher

fbshipit-source-id: 3f88aed91cd92c270ea5e6b504ae5ebc6810aa2b
2021-05-04 16:57:48 -07:00
Bert Maher
7c8a7efe3f [nnc] Enable all fuser tests for cpu (#57332)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57332

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D28113481

Pulled By: bertmaher

fbshipit-source-id: b55e4bbcc25a09614b37985873b72337fdefc6b0
2021-04-30 10:11:06 -07:00
Bert Maher
17b8a4db1c [nnc] Support pow on CPU (#56308)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56308

But only for float tensors.  Even on CUDA, int tensors just have weird
behavior with pow, and I bet FP is so much more common that it's just not worth
trying to fuse ints here.
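
A sketch of the kind of int-pow sharp edge alluded to above (one example among several; values are arbitrary):

```
import torch

# Float pow is well-behaved:
print(torch.tensor([2.0]).pow(0.5))  # tensor([1.4142])

# Integer pow is not; e.g. negative exponents are rejected outright:
try:
    torch.tensor([2]).pow(-1)
except RuntimeError as e:
    print(e)  # integers to negative integer powers are not allowed
```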
ghstack-source-id: 126769637

Test Plan: `pytest test_jit_fuser_te.py -k test_binary_pow`

Reviewed By: navahgar

Differential Revision: D27834694

fbshipit-source-id: 7274d72cf02ab95d63574b6c17995b8f34560810
2021-04-20 15:13:03 -07:00
Mikhail Zolotukhin
5f19385588 [TensorExpr] Add aten::matmuls to TE fuser. (#54605)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54605

For small sizes we generate a naive 3-layer loopnest, for bigger sizes
we generate an external call.

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D27298364

Pulled By: ZolotukhinM

fbshipit-source-id: 2ddf275ff68d6fca16a3befca5ce5c26aef462b5
2021-04-16 12:54:38 -07:00
Bert Maher
8e82e932f3 Reland: D27652485: "[nnc] Enable CPU fusion only when num_threads == 1" (#56120)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56120

This reverts commit ad17fadbfc (D27786457).

The big annoyance here is that depending on the threading mode you may not be
able to toggle num_threads at will, so the fusion tests won't fail.

I hate this solution, but I'm adding a secondary override for the TE fuser.
Now you need to turn on fusion (_jit_override_can_fuse_on_cpu); you're then OK
if you're running with 1 thread, or you can call
`_jit_set_texpr_parallel_cpu_enabled` to enable it anyway.

This is (a) mainly for tests, since a real user probably won't fiddle aimlessly
with the thread count, and (b) will go away once NNC's threading support is
fully baked.
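
A hedged usage sketch of the toggles named above (these are private bindings and may move between PyTorch versions):

```
import torch

torch._C._jit_override_can_fuse_on_cpu(True)  # opt in to CPU fusion

# Fusion then kicks in when running single-threaded:
torch.set_num_threads(1)

# Or keep multiple threads and use the secondary override added here:
# torch._C._jit_set_texpr_parallel_cpu_enabled(True)
```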

Test Plan: Imported from OSS

Reviewed By: Krovatkin

Differential Revision: D27788199

Pulled By: bertmaher

fbshipit-source-id: 070d04474f15e9689dbdf8cc1fde43050c6506b1
2021-04-15 15:50:18 -07:00
Bert Maher
b940516061 [nnc] Don't fuse fp16 on CPU (#56119)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56119

There are apparently still more issues with fp16 on LLVM so let's just
nuke it from orbit while we develop a robust workaround.
ghstack-source-id: 126619411

Test Plan: compile

Reviewed By: ZolotukhinM

Differential Revision: D27787080

fbshipit-source-id: 9e771211fe48266f50fca1de8d40295922da5bca
2021-04-15 14:01:29 -07:00
Natalia Gimelshein
ad17fadbfc Revert D27652485: [nnc] Enable CPU fusion only when num_threads == 1
Test Plan: revert-hammer

Differential Revision:
D27652485 (e7e164f9e6)

Original commit changeset: 182580cf758d

fbshipit-source-id: e3c95b06d1eef668095f3cf461485395179d94af
2021-04-14 20:23:15 -07:00
Natalia Gimelshein
506eca24b9 Revert D27752279: [nnc] Do not try to vectorize kernels that use float16
Test Plan: revert-hammer

Differential Revision:
D27752279 (8df5e61fd6)

Original commit changeset: ac115080bf2a

fbshipit-source-id: cbc0aa2dcb7691d9fc9d081c6169dea711cd9fac
2021-04-14 20:21:40 -07:00
Bert Maher
8df5e61fd6 [nnc] Do not try to vectorize kernels that use float16 (#55970)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55970

LLVM's support for float16 is not great, and we were seeing assertion
failures trying to generate code for vectorized uses.  I note that clang
doesn't even try to vectorize operations involving half:
https://gcc.godbolt.org/z/86MW4xr17, so that's a good sign we shouldn't either.

Fixes #55905
ghstack-source-id: 126511474

Test Plan: pytest test_jit_fuser_te.py -k test_isnan

Reviewed By: asuhan

Differential Revision: D27752279

Pulled By: bertmaher

fbshipit-source-id: ac115080bf2a4a73d52b396d64a5bce0cf13abfe
2021-04-14 11:28:34 -07:00
Bert Maher
e7e164f9e6 [nnc] Enable CPU fusion only when num_threads == 1 (#55621)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55621

Fuser support for thread-level parallelism is a work in progress, so
only fuse when the program is running single-threaded.
ghstack-source-id: 126069259

Test Plan: observe fusion groups formed when torch.get_num_threads==1 vs not

Reviewed By: ZolotukhinM

Differential Revision: D27652485

fbshipit-source-id: 182580cf758d99dd499cc4591eb9d080884aa7ef
2021-04-14 09:16:54 -07:00
Nikita Shulga
c47cc30bf5 Skip testing torch.float16 in test_isnan (#55906)
Summary:
See https://github.com/pytorch/pytorch/issues/55905

Pull Request resolved: https://github.com/pytorch/pytorch/pull/55906

Reviewed By: walterddr

Differential Revision: D27737356

Pulled By: malfet

fbshipit-source-id: 39571cfe6f078af8bb7387ed459a5d0f2410bad1
2021-04-13 14:44:43 -07:00
Bert Maher
42486963b2 Integrate NNC conv2d with fuser (#55213)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55213

Adds the integration of conv2d with the TE fuser.  A few things of interest:

- I'm *super* selective about which convs get lowered.  Only 3x3 depthwise, because
  I've benchmarked those to death and I'm pretty sure it's a good change.

- I'm allowing single-node "fusion" groups for supported convs.  (Maybe this is
  a sign that conv2d codegen should go through a different path entirely, but
  it seems to basically work).

I'll share full benchmark results once I clean them up a little.  To
summarize, I tested the following torchvision models containing depthwise
convolutions.  Results are single-core on a skylake-avx512:

mobilenet_v2: 8% improvement
mobilenet_v3: 9% improvement
mnasnet: 10% improvement
shufflenet: 18% improvement

Note that these compare against a baseline with a fast-but-buggy grouped
convolution implementation in MKLDNN.  So perf results will be better if
compared against master, but I'm going to assume the MKLDNN bug will be fixed
and the implementation re-enabled.

Perf results are more complicated when comparing to freezing plus conversion to
mkldnn layout; mobilenet v2/v3 are still faster, but mnasnet and shufflenet are
not.  Landing this doesn't prevent MKLDNN freezing from kicking in though, so
there's no harm (although landing mkldnn freezing will regress mobilenet;
c'est la vie).
ghstack-source-id: 126076112

Test Plan: New unit test, plus torchvision

Reviewed By: ZolotukhinM

Differential Revision: D27530272

fbshipit-source-id: 92153fad234bc9f1eaa4f7624c543168d1294a87
2021-04-08 21:58:27 -07:00
Hui Guo
2a53897114 [jit][tensorexpr] Added aten::batch_norm into fuser when in inference mode (#54204)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/54204

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D27134348

Pulled By: huiguoo

fbshipit-source-id: 5ea7a6c5bc694fcdfc436dba3fa6eb269420324e
2021-03-23 04:41:52 -07:00
Nikolay Korovaiko
d4527b4e16 add a full pipeline test for a TypeCheck (#52933)
Summary:
This tests a simple failure mode for a TypeCheck when a shape changes.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/52933

Reviewed By: H-Huang

Differential Revision: D26727583

Pulled By: Krovatkin

fbshipit-source-id: b277218af9572cd6f89f2ece044f7d84d4c10283
2021-03-01 10:58:08 -08:00
jiej
4d94ee566e Ge v1 (#52136)
Summary:
This is a second attempt to use graph executor to run forward on a gradient. This allows a secondary chance to profile intermediate tensor introduced by autodiff.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/52136

Reviewed By: pbelevich

Differential Revision: D26693978

Pulled By: Krovatkin

fbshipit-source-id: 91dde8009a210950af8e5173668ada241e16dd52
2021-02-28 00:53:13 -08:00
Hui Guo
d8b28579c3 Add NNC support for aten::hardtanh (a hot operation in mobilenet v2/v3) (#52394)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/52394

Test Plan:
Imported from OSS

test/test_tensorexpr.py
test/test_jit_fuser_te.py

Reviewed By: bertmaher

Differential Revision: D26497856

Pulled By: huiguoo

fbshipit-source-id: 8558f89826cad250da6f970bfc49384f2b9d7ee0
2021-02-18 22:56:03 -08:00
Raghavan Raman
c7a70eec1b Make LLVM the default backend for TE (#52314)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/52264

When CPU fusion is enabled without LLVM support in PyTorch, it causes a huge slowdown (> 50x). This PR makes the LLVM backend the default backend for TE. Now, an error is reported if CPU fusion is enabled without LLVM support, to avoid this performance regression.

This PR also updates the tests to not use LLVM, so that the old flow is continued. This is necessary because tests run in CI do not have LLVM.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/52314

Reviewed By: ejguan

Differential Revision: D26491294

Pulled By: navahgar

fbshipit-source-id: 74561db1207da805d6d28039450db046ba2988fb
2021-02-18 12:00:38 -08:00
Alex Suhan
1bde5a216f [TensorExpr] Use wider type for scalars (#50774)
Summary:
Scalars have to be double / 64-bit integers to match eager semantics.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50774

Test Plan: python test/test_jit_fuser_te.py -k TestTEFuser.test_clamp

Reviewed By: ngimel

Differential Revision: D25978214

Pulled By: asuhan

fbshipit-source-id: ba765b7d215239f2bf0f3d467e4dce876f7ccb91
2021-01-20 15:12:27 -08:00
Nikolay Korovaiko
526659db20 whitelist ops we can build shapes for (#49125)
Summary:
Whitelist ops we can build shapes for.
Otherwise, `buildShapeExpressions` assumes that `aten::unsqueeze` is just a regular op.

```
[DUMP tensorexpr_fuser.cpp:329] buildShapeExpressions for
[DUMP tensorexpr_fuser.cpp:329] graph(%1 : float,
[DUMP tensorexpr_fuser.cpp:329]       %3 : Float(50, 28, strides=[28, 1], requires_grad=0, device=cuda:0),
[DUMP tensorexpr_fuser.cpp:329]       %8 : float,
[DUMP tensorexpr_fuser.cpp:329]       %10 : Float(50, strides=[1], requires_grad=0, device=cuda:0)):
[DUMP tensorexpr_fuser.cpp:329]   %11 : int = prim::Constant[value=1]()
[DUMP tensorexpr_fuser.cpp:329]   %12 : Float(50, 1, strides=[1, 1], requires_grad=0, device=cuda:0) = aten::unsqueeze(%10, %11)
[DUMP tensorexpr_fuser.cpp:329]   %9 : Float(50, 1, strides=[1, 1], requires_grad=0, device=cuda:0) = aten::mul(%12, %8)
[DUMP tensorexpr_fuser.cpp:329]   %6 : Float(50, 28, strides=[28, 1], requires_grad=0, device=cuda:0) = aten::add(%3, %9, %11)
[DUMP tensorexpr_fuser.cpp:329]   %2 : Float(50, 28, strides=[28, 1], requires_grad=0, device=cuda:0) = aten::div(%6, %1)
[DUMP tensorexpr_fuser.cpp:329]   return (%2, %6, %9)
[DEBUG tensorexpr_fuser.cpp:347] Adding a mapping for %3 %162 : int[] = aten::size(%27)
[DEBUG tensorexpr_fuser.cpp:347] Adding a mapping for %10 %163 : int[] = aten::size(%23)
[DEBUG tensorexpr_fuser.cpp:402] Building sizes for %12 : Float(50, 1, strides=[1, 1], requires_grad=0, device=cuda:0) = aten::unsqueeze(%10, %11)
[DEBUG tensorexpr_fuser.cpp:405] Getting aten::size for %10
[DEBUG tensorexpr_fuser.cpp:402] Building sizes for %9 : Float(50, 1, strides=[1, 1], requires_grad=0, device=cuda:0) = aten::mul(%12, %8)
[DEBUG tensorexpr_fuser.cpp:405] Getting aten::size for %12
[DEBUG tensorexpr_fuser.cpp:402] Building sizes for %6 : Float(50, 28, strides=[28, 1], requires_grad=0, device=cuda:0) = aten::add(%3, %9, %11)
[DEBUG tensorexpr_fuser.cpp:405] Getting aten::size for %3
[DEBUG tensorexpr_fuser.cpp:405] Getting aten::size for %9
[DEBUG tensorexpr_fuser.cpp:402] Building sizes for %2 : Float(50, 28, strides=[28, 1], requires_grad=0, device=cuda:0) = aten::div(%6, %1)
[DEBUG tensorexpr_fuser.cpp:405] Getting aten::size for %6
[DEBUG tensorexpr_fuser.cpp:907] Inserting a typecheck guard for a node%156 : Float(50, 28, strides=[28, 1], requires_grad=0, device=cuda:0) = prim::TensorExprGroup[Subgraph=<Graph>](%3, %27, %16, %23)
[DUMP tensorexpr_fuser.cpp:463] After guarding fusion groups:
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49125

Reviewed By: albanD

Differential Revision: D25926997

Pulled By: Krovatkin

fbshipit-source-id: f8041bbfc12be16c329754c6d16911d12aa352ef
2021-01-19 16:17:21 -08:00
Richard Barnes
a4383a69d4 Clean up some type annotations in caffe2/test (#49943)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49943

Upgrades type annotations from Python2 to Python3

Test Plan: Sandcastle tests

Reviewed By: xush6528

Differential Revision: D25717534

fbshipit-source-id: 5aedea4db07efca126ffb6daee79617c30a67146
2021-01-13 10:01:55 -08:00
Thomas Viehmann
ea087e2d92 JIT: guard DifferentiableGraph node (#49433)
Summary:
This adds guarding for DifferentiableGraph nodes, and also bails out on
required gradients for the CUDA fuser.

Fixes https://github.com/pytorch/pytorch/issues/49299

I still need to look into a handful of failing tests, but maybe it can be a discussion basis.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49433

Reviewed By: ngimel

Differential Revision: D25681374

Pulled By: Krovatkin

fbshipit-source-id: 8e7be53a335c845560436c0cceeb5e154c9cf296
2021-01-08 20:01:27 -08:00
Elias Ellison
6eee2a0a9f [JIT] disable masked fill (#50147)
Summary:
There is an internal user who is experiencing a bug with masked_fill. While I am almost certain this corresponds to an old PyTorch version that had the bug, the model that is breaking is important and time-sensitive, and we are covering all bases to try to get it working again.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50147

Reviewed By: nhsoukai

Differential Revision: D25806541

Pulled By: eellison

fbshipit-source-id: 131bd71b5db9717a8a9cb97973d0b4f0e96455d6
2021-01-06 11:36:30 -08:00
Elias Ellison
268441c7d8 [NNC] masked fill (#49627)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49627

There was a bug in the test that was hidden by the `If eager mode doesn't support a dtype/op/device combo` try/catch, so CUDA wasn't being tested. The fix is just to rename `aten::masked_fill` to `aten_masked_fill`.

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D25696409

Pulled By: eellison

fbshipit-source-id: 83de1f5a194df54fe317b0035d4a6c1aed1d19a0
2020-12-28 10:37:02 -08:00
Elias Ellison
3659560fba [NNC] Disable masked fill (#49622)
Summary:
There's a bug internally, disable as quick fix before investigation

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49622

Test Plan:
Imported from GitHub, without a `Test Plan:` line.
build

Reviewed By: zheng-xq, PursueHappinessDirectly

Differential Revision: D25651897

Pulled By: eellison

fbshipit-source-id: dd1454f2ef7506d7844016128aa6320d7e69aa6e
2020-12-18 16:28:00 -08:00
Peng Wu
6568572712 Support integral types for kAbs in SimpleIREvaluator (#49357)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49357

This is a follow-up fix for PR #48679, which added support for integer inputs
to aten::abs by promoting integers to float and then demoting the result back
to integers. This PR supports integer inputs to aten::abs more efficiently in
the SimpleIREvaluator by implementing integer inputs for kAbs (renamed from
kFabs).
- Rename kFabs to kAbs
- Add support for integer inputs to kAbs in the SimpleIREvaluator (note:
llvm_codegen and cuda_codegen already support integer inputs to kAbs)

Test Plan:
- `PYTORCH_TENSOREXPR_DONT_USE_LLVM=1 python test/test_jit_fuser_te.py
TestTEFuser.test_unary_ops`
- `python test/test_jit_fuser_te.py TestTEFuser.test_unary_ops`

Imported from OSS

Reviewed By: eellison

Differential Revision: D25545791

fbshipit-source-id: e52f51a352d149f66ce8341fb3beb479be08a230
2020-12-18 07:57:58 -08:00
Elias Ellison
904586271b Add fusion support of aten::to (#48976)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48976

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D25413164

Pulled By: eellison

fbshipit-source-id: 0c31787e8b5e1368b0cba6e23660799b652389cd
2020-12-16 18:36:16 -08:00
Elias Ellison
80b508f207 [NNC] add support for masked_fill (#48974)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48974

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D25413165

Pulled By: eellison

fbshipit-source-id: 8cece1dc3692389be90c0d77bd71b103254d5ad3
2020-12-16 18:36:13 -08:00
Elias Ellison
50386b9988 [NNC] Add Support For is_nan (#48973)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48973

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D25413166

Pulled By: eellison

fbshipit-source-id: 0c79258345df18c60a862373fa16931228fb92ef
2020-12-16 18:31:01 -08:00
Bert Maher
f4e15c4a23 [te] Fix bugs with shift operators (#49396)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49396

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49271

Two things:

1. These throw exceptions in their constructor, which causes a segfault (*), so
   move the exceptions to ::make.
2. They technically support FP types but the rules are complicated so let's not
   bother.

(*) The reason for the segfault: all Exprs including these inherit from
KernelScopedObject, whose constructor adds the object to a list for destruction
at the end of the containing KernelArena's lifetime.  But if the derived-class
constructor throws, the object is deleted even though it's still in the
KernelArena's list.  So when the KernelArena is itself deleted, it double-frees
the pointer and dies.  I've also fixed And, Or, and Xor in this diff.
ghstack-source-id: 118594998

Test Plan: `buck test //caffe2/test:jit`

Reviewed By: bwasti

Differential Revision: D25512052

fbshipit-source-id: 42670b3be0cc1600dc5cda6811f7f270a2c88bba
2020-12-15 12:44:59 -08:00
Bert Maher
626b8c0cf2 [te] Ban uint8 tensors from fusion groups (#49247)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49247

uint8's expose all kinds of corner cases in type promotion.  As an example, consider:
```
>>> torch.tensor([1], dtype=torch.uint8).lt(-1)
tensor([True])
>>> torch.tensor([1], dtype=torch.uint8).lt(torch.tensor(-1))
tensor([True])
>>> torch.tensor([1], dtype=torch.uint8).lt(torch.tensor([-1]))
tensor([False])
```
the difference is how promotions involving scalars (or 0-dim tensors, which are treated like scalars) are prioritized compared to tensor dtypes.
Per eellison, the order is something like:
1. Tensor FP types
2. Scalar FP types
3. Tensor Int types
4. Scalar Int types

The logic for this is here: c73e97033a/aten/src/ATen/native/TypeProperties.cpp (L93)

AFAICT the effects are mainly visible for the unsigned byte type (the only unsigned type, besides bool) since the others degrade more or less gracefully.

It's hard to re-use this logic as is in TensorIterator/TypeProperties, and it's complicated enough that it's not worth re-implementing in TE unless there's evidence that it matters for real models.
ghstack-source-id: 118555597

Test Plan: `buck test //caffe2/test:jit`

Reviewed By: eellison

Differential Revision: D25489035

fbshipit-source-id: db3ab84286d472fd8a247aeb7b36c441293aad85
2020-12-14 17:40:15 -08:00
Bert Maher
ae88d25c23 [te] Fix clamp with uint8 args (#49143)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49143

Riddle me this, Batman: how could `torch.clamp(torch.tensor([0], dtype=torch.uint8), -10, 10)` equal `10`?  The answer: the min/max args are first cast to the dtype of the input, giving min=246 and max=10.  Then you have to apply Min and Max in the right order: `Min(Max(in, min), max)`.  Differ in any way and you're doomed.  Hooray.

This PR makes TE match eager mode for this operator, plus fixes a major facepalm in the llvm min/max codegen where we were always generating signed comparisons.
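
A sketch of the eager semantics described above, spelled out step by step (torch.minimum/torch.maximum stand in for the codegen's Min/Max):

```
import torch

x = torch.tensor([0], dtype=torch.uint8)
lo = torch.tensor(-10).to(torch.uint8)  # -10 cast to uint8 wraps to 246
hi = torch.tensor(10).to(torch.uint8)   # stays 10

# Min(Max(in, min), max): max(0, 246) = 246, then min(246, 10) = 10.
print(torch.minimum(torch.maximum(x, lo), hi))  # tensor([10], dtype=torch.uint8)
```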
ghstack-source-id: 118415318

Test Plan: `buck test //caffe2/test:{jit,tensorexpr}`

Reviewed By: robieta

Differential Revision: D25456366

fbshipit-source-id: dde3c26c2134bdbe803227601fa3d23eaac750fb
2020-12-11 22:36:52 -08:00
Peng Wu
a47a087a43 [NNC] Add missing data type support for abs and frac (#48679)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48679

This addresses the remaining problem reported in issue #48053

Data type support for aten kernels in the SimpleIREvaluator is not
consistent with the aten::native library implementation. In the SimpleIREvaluator,
  - only float/double are supported on aten::abs (integral types and half
are missing)
  - only float/double are supported on aten::frac (half is missing)

It is also not clear from the kernel.cpp source code what the expected
input data types for an aten kernel are, leading to potential missing data
type issues down the road.
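
For reference, a small sketch of the eager behavior the evaluator needs to match (values are arbitrary):

```
import torch

print(torch.abs(torch.tensor([-3, 2], dtype=torch.int32)))  # tensor([3, 2], dtype=torch.int32)
print(torch.frac(torch.tensor([-1.5, 2.25])))               # tensor([-0.5000, 0.2500])
```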

This commit addresses both issues in a limited way by
 - Adding type promotion ops from half/integral input types to float
 - Adding skeleton support for some type checking for aten kernels;
   currently it only checks for valid data types for frac and abs, to limit
   the scope of the change, but the utility function can be used to
   consistently add type checking for all aten functions

Known limitations:
 - abs support for integral types can be made more efficient by invoking
 std::abs for integral tensors (currently kFabs maps to std::fabs).
 Since that change is a bit more involved (e.g., changing the IntrinsicsOp
 kFabs to kAbs and other code generators accordingly), we will leave it to
 another issue
 - other aten kernels may need similar type checking and some scrutiny
 on the use of promoteToFloat to detect invalid data types early on.
 That is also left for another issue

Test Plan:
test_jit_fuser_te.test_unary_ops

Imported from OSS

Reviewed By: asuhan

Differential Revision: D25344839

fbshipit-source-id: 95aca04c99b947dc20f11e4b3bae002f0ae37044
2020-12-10 17:47:15 -08:00
Elias Ellison
3b57be176e [NNC] Preserve strided output (#48264)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48264

Preserves the strided representation of NNC Tensor outputs by transforming them into the right layout at the end of the kernel.

Fix for https://github.com/pytorch/pytorch/issues/45604

Test Plan: Imported from OSS

Reviewed By: nikithamalgifb

Differential Revision: D25286213

Pulled By: eellison

fbshipit-source-id: 64d94ac463741e2568a1c9d44174e15ea26e511f
2020-12-10 12:19:51 -08:00
Bert Maher
2d07d5b50a [te] Don't fuse integer fmod or remainder (#48700)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48700

fmod and remainder on int tensors will raise ZeroDivisionError if their divisors are 0.  I don't think we should try to generate code that raises exceptions.  If at some point we really wanted to fuse these, I might lean towards calling a C++ helper function from the generated code.
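
A sketch of the eager behavior being avoided (on CPU; values are arbitrary):

```
import torch

a = torch.tensor([1, 2, 3])
b = torch.tensor([1, 0, 2])
try:
    torch.fmod(a, b)  # integer fmod with a zero divisor raises in eager mode
except RuntimeError as e:
    print(e)  # ZeroDivisionError
```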
ghstack-source-id: 117845642

Test Plan: `buck test //caffe2/test:jit -- test_binary_ops`

Reviewed By: eellison

Differential Revision: D25265792

fbshipit-source-id: 0be56ba3feafa1dbf3c37f6bb8c1550cb6891e6d
2020-12-04 18:02:29 -08:00
Peng Wu
bc2352e8c3 [NNC] Complete SimpleIREvaluator support for bitwise ops (#48053) (#48179)
Summary:
Add missing types for bitwise_ops in `SimpleIREvaluator`

This is the first part of fixes for issue https://github.com/pytorch/pytorch/issues/48053.
- The original implementation of bitwise_ops supported only int operands; the
fix adds support for all integral types supported by the IR
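
A quick sketch of the eager reference behavior on a non-int dtype (values are arbitrary):

```
import torch

a = torch.tensor([0b1100], dtype=torch.int16)
b = torch.tensor([0b1010], dtype=torch.int16)
print(a & b)  # tensor([8], dtype=torch.int16)
print(a | b)  # tensor([14], dtype=torch.int16)
print(a ^ b)  # tensor([6], dtype=torch.int16)
```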

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48179

Test Plan: `python test/test_jit_fuser_te.py TestTEFuser.test_bitwise_ops`

Reviewed By: ZolotukhinM

Differential Revision: D25126944

Pulled By: penguinwu

fbshipit-source-id: 04dc7fc00c93b2bf1bd9f9cd09f7252357840b85
2020-12-04 08:10:18 -08:00
Mikhail Zolotukhin
d0e9523c4f [TensorExpr] Add more operator tests. (#48677)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48677

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D25258656

Pulled By: ZolotukhinM

fbshipit-source-id: 173b87568f3f29f04d06b8621cbfbd53c38e4771
2020-12-01 17:34:09 -08:00
Bert Maher
adb4fd3f2f [te] Fix comparison ops on booleans (#48384)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48384

As title

Test Plan: buck test //caffe2/test:jit -- test_binary_ops

Reviewed By: asuhan

Differential Revision: D25115773

fbshipit-source-id: c5f8ee21692bcf0d78f099789c0fc7c457a1e4a2
2020-11-30 18:21:35 -08:00
Mikhail Zolotukhin
d9f5ac0805 [TensorExpr] Add a envvar to disable LLVM backend and use IR Eval instead. (#48355)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48355

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D25139668

Pulled By: ZolotukhinM

fbshipit-source-id: 34dfcceadb24446d103710f00526693a53f3750f
2020-11-30 18:16:28 -08:00
Mikhail Zolotukhin
a6f0c3c4f0 [TensorExpr] IREval: fix div for Half dtype. (#48354)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48354

Test Plan: Imported from OSS

Reviewed By: izdeby

Differential Revision: D25139669

Pulled By: ZolotukhinM

fbshipit-source-id: a7eccad883d8b175d7d73db48bd366382eabea53
2020-11-30 18:14:08 -08:00
Mikhail Zolotukhin
b967119906 [TensorExpr] Fix lowering for aten::div. (#48329)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48329

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D25130750

Pulled By: ZolotukhinM

fbshipit-source-id: 7c6345adcaec5f92cd6ce78b01f6a7d5923c0004
2020-11-21 09:20:28 -08:00
Mikhail Zolotukhin
5e1faa1d41 [TensorExpr] Fix aten::atan2 lowering and disable aten::pow lowering on CPU. (#48326)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48326

The PR introduces a set of 'cuda-only' ops into the `isSupported` function.
This is done to disable `pow` lowering on CPU, where it's tricky to support
integer versions.

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D25129211

Pulled By: ZolotukhinM

fbshipit-source-id: c62ae466e1d9ba9b3020519aadaa2a7fe7942d84
2020-11-21 09:15:42 -08:00
Mikhail Zolotukhin
eb49dabe92 [TensorExpr] Add even more operator tests. (#48292)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48292

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D25113397

Pulled By: ZolotukhinM

fbshipit-source-id: a8591006e1fb71b87d50c8a150739a9bca835928
2020-11-19 23:35:19 -08:00
Mikhail Zolotukhin
efd41db32c [TensorExpr] Add more operator tests. (#48282)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48282

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D25108184

Pulled By: ZolotukhinM

fbshipit-source-id: ba8cdf6253533210a92348f475b8b9400d8ecb1a
2020-11-19 23:29:11 -08:00
Nikolay Korovaiko
0d8ddb5ec2 Make softmax and log_softmax handle negative dims, add tests (#48156)
Summary:
Make softmax and log_softmax handle negative dims, add tests
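
A minimal sketch of the dim-wrapping behavior being added (shapes are arbitrary):

```
import torch

x = torch.randn(2, 3)
# dim=-1 wraps to dim=1 for a 2-D tensor.
print(torch.allclose(torch.softmax(x, dim=-1), torch.softmax(x, dim=1)))          # True
print(torch.allclose(torch.log_softmax(x, dim=-1), torch.log_softmax(x, dim=1)))  # True
```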

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48156

Reviewed By: bertmaher

Differential Revision: D25059788

Pulled By: Krovatkin

fbshipit-source-id: 985963e7df400857c9774660c76be7d56201a1ad
2020-11-19 01:38:14 -08:00
Bert Maher
6da26fe79b [te] Fix pow (#48213)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48213

It was completely broken unless the rhs was a constant.

Test Plan: new unit test in test_jit_fuser_te.py

Reviewed By: eellison

Differential Revision: D25071639

fbshipit-source-id: ef1010a9fd551db646b83adfaa961648a5c388ae
2020-11-18 22:44:16 -08:00
Nikita Shulga
06707a7ef8 Fix flake8 failure (#48124)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48124

Reviewed By: walterddr

Differential Revision: D25032696

Pulled By: malfet

fbshipit-source-id: 2519d18de7417721d53f6404dc291fd8f7cc94fe
2020-11-17 13:48:08 -08:00
Bert Maher
736deefc1f [torch][te] aten::type_as is unary, not binary (#48085)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48085

We were treating it as a binary operator, which implies shape
broadcasting, even though the second arg is thrown away aside from the type.
Treating it as a unary is the proper approach.
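
A small sketch of the unary semantics (shapes/dtypes are arbitrary):

```
import torch

x = torch.randn(3, 1)
y = torch.zeros(4, dtype=torch.int32)
# Only y's dtype is used; x's shape (3, 1) is kept, with no broadcast
# against y's shape (4,).
z = x.type_as(y)
print(z.shape, z.dtype)  # torch.Size([3, 1]) torch.int32
```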
ghstack-source-id: 116873680

Test Plan: new unit test

Reviewed By: ZolotukhinM

Differential Revision: D25017585

fbshipit-source-id: 0cfa89683c9bfd4fbb132617c74b47b268d7f368
2020-11-17 12:17:19 -08:00
Bert Maher
bbee0ecbd1 [pytorch][te] Handle negative axis in chunk (#48084)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48084

as title
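
A tiny sketch of the behavior in question (values are arbitrary):

```
import torch

x = torch.randn(2, 6)
# dim=-1 wraps to dim=1.
print([t.shape for t in torch.chunk(x, 3, dim=-1)])
# [torch.Size([2, 2]), torch.Size([2, 2]), torch.Size([2, 2])]
```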
ghstack-source-id: 116870328

Test Plan: new unit test

Reviewed By: Krovatkin

Differential Revision: D25017489

fbshipit-source-id: 0d1998fccad6f509db04b6c67a4e4e4093d96751
2020-11-17 12:12:49 -08:00
Bert Maher
6b8d20c023 [pytorch][te] Don't start TE fusion groups with an unknown-typed result (#47884)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47884

We need to know output types of everything in a fusion group to ensure
that we generate correctly-typed tensors.  We were incorrectly starting a
fusion group with an unknown-typed output.

Test Plan:
New unit tests:
```
buck test //caffe2/test:jit //caffe2/test/cpp/tensorexpr:tensorexpr
```

Reviewed By: eellison

Differential Revision: D24932786

fbshipit-source-id: 83978a951f32c1207bbc3555a7d3bd94fe4e70fb
2020-11-13 10:52:53 -08:00
Elias Ellison
664d2f48cf [NNC] Enable unary op cpu testing (#47374)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47374

A few small fixes needed to enable unary op CPU testing. If reviewers would prefer that I split them up, let me know.

Test Plan: Imported from OSS

Reviewed By: ansley

Differential Revision: D24805248

Pulled By: eellison

fbshipit-source-id: c2cfe2e3319a633e64da3366e68f5bf21d390cb7
2020-11-12 11:14:03 -08:00
Elias Ellison
346a71d29c [NNC] More cpu tests (#47372)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47372

Test Plan: Imported from OSS

Reviewed By: ansley

Differential Revision: D24805254

Pulled By: eellison

fbshipit-source-id: b7e5ee044ef816e024b6fc5c4041fff5f2049bb3
2020-11-12 11:13:57 -08:00
Elias Ellison
450738441b [NNC] Add more CPU Tests (#47371)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47371

Test Plan: Imported from OSS

Reviewed By: ansley

Differential Revision: D24805252

Pulled By: eellison

fbshipit-source-id: 16472960d09f6c981adca2a45b2a4efb75a09d4f
2020-11-12 11:13:54 -08:00
Elias Ellison
e618bd858e [NNC] Fix llvm min lowering for int inputs (#47370)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47370

Test Plan: Imported from OSS

Reviewed By: ansley

Differential Revision: D24805249

Pulled By: eellison

fbshipit-source-id: e13d956899e8651600fab94dab04aa39ca427769
2020-11-12 11:13:50 -08:00
Elias Ellison
fe81faee5f Add more CPU tests (#47369)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47369

Test Plan: Imported from OSS

Reviewed By: ansley

Differential Revision: D24805251

Pulled By: eellison

fbshipit-source-id: f1a8210ffdc3cc88354cb4896652151d83a0345a
2020-11-12 11:13:47 -08:00
Elias Ellison
b8a1070ec0 [TensorExpr][CPU] Fix bool -> int casting (#46951)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46951

If e.g. we're casting from torch.int -> torch.bool, previously we would just truncate from int32 -> i8. Since torch.bool has 8 bits but only uses one of them, we need to make sure that one bit is set.
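
A concrete sketch of the truncation bug described above (256 is chosen because its low 8 bits are all zero):

```
import torch

x = torch.tensor([256], dtype=torch.int32)
# Truncating 256 (0x100) to 8 bits yields 0, i.e. False; eager semantics
# are `x != 0`, so the correct result is True.
print(x.to(torch.bool))  # tensor([True])
```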

Test Plan: Imported from OSS

Reviewed By: ansley

Differential Revision: D24805253

Pulled By: eellison

fbshipit-source-id: af3aa323f10820d189827eb51037adfa7d80fed9
2020-11-12 11:13:44 -08:00
Elias Ellison
ad5be26b2f Small changes/cleanup (#46950)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46950

Make sure that we're actually fusing in the fusion tests, and refactor to a more concise API for checking whether fusions have happened.

Test Plan: Imported from OSS

Reviewed By: ansley

Differential Revision: D24805250

Pulled By: eellison

fbshipit-source-id: f898008a64b74e761bb5fe85f91b3cdf2dbdf878
2020-11-12 11:13:38 -08:00
Elias Ellison
f221a19a7f Force LLVM Compilation for CPU Tests (#46949)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46949

Test Plan: Imported from OSS

Reviewed By: ansley

Differential Revision: D24805247

Pulled By: eellison

fbshipit-source-id: 4fcaf02d8a78cc5cbcbde36940d0a2c85fba3fc5
2020-11-12 11:12:08 -08:00
Bert Maher
c4892c8efe [pytorch][tensorexpr] Promote integer arguments to sin/cos/tan to float (#46776)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46776

Following numpy and (now) eager mode

Fixes #46458

Test Plan: test_jit_fuser_te

Reviewed By: navahgar

Differential Revision: D24509884

fbshipit-source-id: c063030fc609ba4aefcd9abd25b50f082fef1548
2020-10-23 17:32:54 -07:00
kshitij12345
8e13fe6c44 [numpy] torch.sin : support and promote integer inputs to float (#45733)
Summary:
References https://github.com/pytorch/pytorch/issues/42515

> Enable integer -> float unary type promotion for ops like sin

Will follow-up for other such Ops once this PR is merged.
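
A one-liner sketch of the promotion being enabled (input values are arbitrary):

```
import torch

x = torch.arange(4)  # dtype torch.int64
y = torch.sin(x)
print(y.dtype)       # torch.float32: integer input promoted to float
```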

cc: mruberry

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45733

Reviewed By: zou3519

Differential Revision: D24431194

Pulled By: mruberry

fbshipit-source-id: db600bc5de0e535b538d2aa301c3526b7c75ed17
2020-10-22 01:58:57 -07:00
Elias Ellison
1b97ffa07a [1/3] [JIT] Make sure fusion occurs in test_tensorexpr file (#45788)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45788

We were only running the traced graph once, at which point it would not yet have been fused. We should run it num_profiled_runs + 1 times, and also assert that all nodes in the graph were fused.

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D24169537

Pulled By: eellison

fbshipit-source-id: 8499bb1a5bd9d2221b1f1c54d6352558cf07ba9a
2020-10-08 12:02:57 -07:00
Nikolay Korovaiko
993628c74a Build shape expressions and remove outputs that are only used by aten::sizes (#45080)
Summary:
Currently, TE materializes all intermediate results even if they are only used for computing their shapes. This diff ports the approach the OF (Old Fuser) took to deal with this issue. Namely, given the structure of a fusion group, we infer all the sizes outside the fusion group based on the fusion group's inputs.

A simple example would be:

```
        def test_fuse(a, b):
            c = a + b
            d = c + b
            return d
```

Here we don't need to cache `c` as computing a gradient for `b` in `d = c + b` doesn't need it. We do need to compute sizes for all arguments here in case broadcasts happen.

Without this optimization, TE would need to materialize `c` so that we could get its size:

```
[DUMP profiling_graph_executor_impl.cpp:499] Optimized Graph:
[DUMP profiling_graph_executor_impl.cpp:499] graph(%a.1 : Tensor,
[DUMP profiling_graph_executor_impl.cpp:499]       %b.1 : Tensor):
[DUMP profiling_graph_executor_impl.cpp:499]   %11 : Tensor = prim::DifferentiableGraph_0(%b.1, %a.1)
[DUMP profiling_graph_executor_impl.cpp:499]   return (%11)
[DUMP profiling_graph_executor_impl.cpp:499] with prim::DifferentiableGraph_0 = graph(%11 : Tensor,
[DUMP profiling_graph_executor_impl.cpp:499]       %13 : Tensor):
[DUMP profiling_graph_executor_impl.cpp:499]   %59 : int[] = aten::size(%13) # <string>:3:44
[DUMP profiling_graph_executor_impl.cpp:499]   %62 : int[] = aten::size(%11) # <string>:3:93
[DUMP profiling_graph_executor_impl.cpp:499]   %83 : Double(1:1, requires_grad=0, device=cuda:0), %84 : Double(1:1, requires_grad=0, device=cuda:0), %85 : bool = prim::TypeCheck(%11, %13)
[DUMP profiling_graph_executor_impl.cpp:499]   %86 : Tensor, %87 : Tensor = prim::If(%85)
[DUMP profiling_graph_executor_impl.cpp:499]     block0():
[DUMP profiling_graph_executor_impl.cpp:499]       %d.4 : Double(1:1, requires_grad=0, device=cuda:0), %c.4 : Double(1:1, requires_grad=0, device=cuda:0) = prim::TensorExprGroup_0(%83, %84)
[DUMP profiling_graph_executor_impl.cpp:499]       -> (%d.4, %c.4)
[DUMP profiling_graph_executor_impl.cpp:499]     block1():
[DUMP profiling_graph_executor_impl.cpp:499]       %94 : Function = prim::Constant[name="fallback_function", fallback=1]()
[DUMP profiling_graph_executor_impl.cpp:499]       %95 : (Tensor, Tensor) = prim::CallFunction(%94, %11, %13)
[DUMP profiling_graph_executor_impl.cpp:499]       %96 : Tensor, %97 : Tensor = prim::TupleUnpack(%95)
[DUMP profiling_graph_executor_impl.cpp:499]       -> (%96, %97)
[DUMP profiling_graph_executor_impl.cpp:499]   %60 : int[] = aten::size(%87) # <string>:3:55
[DUMP profiling_graph_executor_impl.cpp:499]   %61 : int[]? = aten::_size_if_not_equal(%59, %60) # <string>:3:19
[DUMP profiling_graph_executor_impl.cpp:499]   %64 : int[]? = aten::_size_if_not_equal(%62, %60) # <string>:3:68
[DUMP profiling_graph_executor_impl.cpp:499]   %67 : int[] = aten::size(%86) # <string>:3:55
[DUMP profiling_graph_executor_impl.cpp:499]   %68 : int[]? = aten::_size_if_not_equal(%60, %67) # <string>:3:19
[DUMP profiling_graph_executor_impl.cpp:499]   %71 : int[]? = aten::_size_if_not_equal(%62, %67) # <string>:3:68
[DUMP profiling_graph_executor_impl.cpp:499]   return (%86, %61, %64, %68, %71)
[DUMP profiling_graph_executor_impl.cpp:499] with prim::TensorExprGroup_0 = graph(%1 : Double(1:1, requires_grad=0, device=cuda:0),
[DUMP profiling_graph_executor_impl.cpp:499]       %4 : Double(1:1, requires_grad=0, device=cuda:0)):
[DUMP profiling_graph_executor_impl.cpp:499]   %5 : int = prim::Constant[value=1]()
[DUMP profiling_graph_executor_impl.cpp:499]   %c.3 : Double(1:1, requires_grad=0, device=cuda:0) = aten::add(%4, %1, %5) # /scratch/villedepommes/pytorches/bench/test/test_jit.py:2872:16
[DUMP profiling_graph_executor_impl.cpp:499]   %2 : int = prim::Constant[value=1]()
[DUMP profiling_graph_executor_impl.cpp:499]   %d.3 : Double(1:1, requires_grad=0, device=cuda:0) = aten::add(%c.3, %1, %2) # /scratch/villedepommes/pytorches/bench/test/test_jit.py:2873:16
[DUMP profiling_graph_executor_impl.cpp:499]   return (%d.3, %c.3)
```

With this optimization we use `prim::BroadcastSizes` to compute the size of `c`. No need to materialize it.

```
[DUMP profiling_graph_executor_impl.cpp:499] Optimized Graph:
[DUMP profiling_graph_executor_impl.cpp:499] graph(%a.1 : Tensor,
[DUMP profiling_graph_executor_impl.cpp:499]       %b.1 : Tensor):
[DUMP profiling_graph_executor_impl.cpp:499]   %11 : Tensor = prim::DifferentiableGraph_0(%b.1, %a.1)
[DUMP profiling_graph_executor_impl.cpp:499]   return (%11)
[DUMP profiling_graph_executor_impl.cpp:499] with prim::DifferentiableGraph_0 = graph(%11 : Tensor,
[DUMP profiling_graph_executor_impl.cpp:499]       %13 : Tensor):
[DUMP profiling_graph_executor_impl.cpp:499]   %59 : int[] = aten::size(%13) # <string>:3:44
[DUMP profiling_graph_executor_impl.cpp:499]   %62 : int[] = aten::size(%11) # <string>:3:93
[DUMP profiling_graph_executor_impl.cpp:499]   %88 : Double(1:1, requires_grad=0, device=cuda:0), %89 : Double(1:1, requires_grad=0, device=cuda:0), %90 : bool = prim::TypeCheck(%11, %13)
[DUMP profiling_graph_executor_impl.cpp:499]   %91 : Tensor = prim::If(%90)
[DUMP profiling_graph_executor_impl.cpp:499]     block0():
[DUMP profiling_graph_executor_impl.cpp:499]       %d.4 : Double(1:1, requires_grad=0, device=cuda:0) = prim::TensorExprGroup_0(%88, %89)
[DUMP profiling_graph_executor_impl.cpp:499]       -> (%d.4)
[DUMP profiling_graph_executor_impl.cpp:499]     block1():
[DUMP profiling_graph_executor_impl.cpp:499]       %97 : Function = prim::Constant[name="fallback_function", fallback=1]()
[DUMP profiling_graph_executor_impl.cpp:499]       %98 : (Tensor) = prim::CallFunction(%97, %11, %13)
[DUMP profiling_graph_executor_impl.cpp:499]       %99 : Tensor = prim::TupleUnpack(%98)
[DUMP profiling_graph_executor_impl.cpp:499]       -> (%99)
[DUMP profiling_graph_executor_impl.cpp:499]   %85 : int[] = aten::size(%91)
[DUMP profiling_graph_executor_impl.cpp:499]   %86 : int[] = prim::BroadcastSizes(%59, %62)
[DUMP profiling_graph_executor_impl.cpp:499]   %61 : int[]? = aten::_size_if_not_equal(%59, %86) # <string>:3:19
[DUMP profiling_graph_executor_impl.cpp:499]   %64 : int[]? = aten::_size_if_not_equal(%62, %86) # <string>:3:68
[DUMP profiling_graph_executor_impl.cpp:499]   %68 : int[]? = aten::_size_if_not_equal(%86, %85) # <string>:3:19
[DUMP profiling_graph_executor_impl.cpp:499]   %71 : int[]? = aten::_size_if_not_equal(%62, %85) # <string>:3:68
[DUMP profiling_graph_executor_impl.cpp:499]   return (%91, %61, %64, %68, %71)
[DUMP profiling_graph_executor_impl.cpp:499] with prim::TensorExprGroup_0 = graph(%1 : Double(1:1, requires_grad=0, device=cuda:0),
[DUMP profiling_graph_executor_impl.cpp:499]       %4 : Double(1:1, requires_grad=0, device=cuda:0)):
[DUMP profiling_graph_executor_impl.cpp:499]   %5 : int = prim::Constant[value=1]()
[DUMP profiling_graph_executor_impl.cpp:499]   %c.3 : Double(1:1, requires_grad=0, device=cuda:0) = aten::add(%4, %1, %5) # /scratch/villedepommes/pytorches/bench/test/test_jit.py:2872:16
[DUMP profiling_graph_executor_impl.cpp:499]   %2 : int = prim::Constant[value=1]()
[DUMP profiling_graph_executor_impl.cpp:499]   %d.3 : Double(1:1, requires_grad=0, device=cuda:0) = aten::add(%c.3, %1, %2) # /scratch/villedepommes/pytorches/bench/test/test_jit.py:2873:16
[DUMP profiling_graph_executor_impl.cpp:499]   return (%d.3)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45080

Reviewed By: bertmaher

Differential Revision: D23856410

Pulled By: Krovatkin

fbshipit-source-id: 2956286eb03a4894a5baa151c35e6092466322b1
2020-09-28 10:45:56 -07:00
Nick Gibson
d1d9017a66 [NNC] fix Half conversion of immediates in Cuda backend (#45213)
Summary:
The Cuda HalfChecker casts up all loads and stores of Half to Float, so we do math in Float on the device. It didn't cast up HalfImmediate (i.e. constants), so mixed-size ops could be inserted. The fix is to cast those up as well.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45213

Reviewed By: ezyang

Differential Revision: D23885287

Pulled By: nickgg

fbshipit-source-id: 912991d85cc06ebb282625cfa5080d7525c8eba9
2020-09-25 10:53:36 -07:00
Alex Suhan
3dd0e362db [TensorExpr] Fix min and max for integral inputs in CUDA backend (#44984)
Summary:
For integral types, isnan is meaningless. Provide specializations for
maximum and minimum which don't call it.
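
A quick sketch of why the float path needs the NaN check and the integral path doesn't (values are arbitrary):

```
import torch

f = torch.tensor([float('nan'), 1.0])
print(torch.maximum(f, torch.tensor([0.0, 0.0])))  # tensor([nan, 1.]); NaN propagates
i = torch.tensor([3, -1])
print(torch.maximum(i, torch.tensor([2, 5])))      # tensor([3, 5]); no isnan needed
```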

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44984

Test Plan: python test/test_jit_fuser_te.py -k TestTEFuser.test_minmax_int_ops

Reviewed By: ezyang

Differential Revision: D23885259

Pulled By: asuhan

fbshipit-source-id: 2e6da2c43c0ed18f0b648a2383d510894c574437
2020-09-23 23:19:12 -07:00
Bert Maher
2d00ebd29f Failing test demonstrating problems with mixed output shapes (#44455)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44455

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D23886119

Pulled By: bertmaher

fbshipit-source-id: 41787930f154cf4e8a1766613c4cf33b18246555
2020-09-23 21:15:37 -07:00
Alex Suhan
0495998862 [TensorExpr] Disallow arithmetic binary operations on Bool (#44677)
Summary:
Arithmetic operations on Bool aren't fully supported in the evaluator. Moreover,
such semantics can be implemented by the client code through insertion of
explicit casts to widen and narrow to the desired types.
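
A sketch of the widen/compute/narrow pattern the client code can use instead (values are arbitrary):

```
import torch

a = torch.tensor([True, False])
b = torch.tensor([True, True])
# Instead of doing arithmetic directly on bool, widen, compute, then narrow:
result = (a.to(torch.int32) + b.to(torch.int32)).to(torch.bool)
print(result)  # tensor([True, True])
```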

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44677

Test Plan:
test_tensorexpr --gtest_filter=TensorExprTest.ExprDisallowBoolArithmetic
python test/test_jit_fuser_te.py

Reviewed By: agolynski

Differential Revision: D23801412

Pulled By: asuhan

fbshipit-source-id: fff5284e3a216655dbf5a9a64d1cb1efda271a36
2020-09-23 14:59:11 -07:00
Xiang Gao
20ac736200 Remove py2 compatible future imports (#44735)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44735

Reviewed By: mruberry

Differential Revision: D23731306

Pulled By: ezyang

fbshipit-source-id: 0ba009a99e475ddbe22981be8ac636f8a1c8b02f
2020-09-16 12:55:57 -07:00
Mikhail Zolotukhin
d66520ba08 [TensorExpr] Fuser: try merging adjacent fusion groups. (#43671)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43671

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D23360796

Pulled By: ZolotukhinM

fbshipit-source-id: 60ec318fe77ae9f2c821d9c4d106281845266e0f
2020-09-15 21:31:02 -07:00
Akihiro Nitta
84949672bf Fix exception chaining in test/ (#44193)
Summary:
## Motivation
This PR fixes https://github.com/pytorch/pytorch/issues/43770 and is the continuation of https://github.com/pytorch/pytorch/issues/43836.

## Description of the change
This PR fixes exception chaining only in files under `test/` where appropriate.
To fix exception chaining, I used either:
1. `raise new_exception from old_exception` where `new_exception` itself seems not descriptive enough to debug or `old_exception` delivers valuable information.
2. `raise new_exception from None` where raising both of `new_exception` and `old_exception` seems a bit noisy and redundant.
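
A minimal sketch of the two patterns (function names and messages are made up for illustration):

```
def load_config(path):
    try:
        return open(path).read()
    except OSError as e:
        # Pattern 1: keep the original exception as explicit context.
        raise RuntimeError(f"failed to load config from {path}") from e

def parse_port(s):
    try:
        return int(s)
    except ValueError:
        # Pattern 2: suppress the noisy/redundant original exception.
        raise ValueError(f"expected an integer port, got {s!r}") from None
```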

## List of lines containing `raise` in `except` clause:
I wrote [this simple script](https://gist.github.com/akihironitta/4223c1b32404b36c1b349d70c4c93b4d) using [ast](https://docs.python.org/3.8/library/ast.html#module-ast) to list lines where `raise`ing in `except` clause.

- [x] f8f35fddd4/test/test_cpp_extensions_aot.py (L16)
- [x] f8f35fddd4/test/test_jit.py (L2503)
- [x] f8f35fddd4/test/onnx/model_defs/word_language_model.py (L22)
- [x] f8f35fddd4/test/onnx/verify.py (L73)
- [x] f8f35fddd4/test/onnx/verify.py (L110)
- [x] f8f35fddd4/test/onnx/test_verify.py (L31)
- [x] f8f35fddd4/test/distributed/test_c10d.py (L255)
- [x] f8f35fddd4/test/distributed/test_c10d.py (L2992)
- [x] f8f35fddd4/test/distributed/test_c10d.py (L3025)
- [x] f8f35fddd4/test/distributed/test_c10d.py (L3712)
- [x] f8f35fddd4/test/distributed/test_distributed.py (L3180)
- [x] f8f35fddd4/test/distributed/test_distributed.py (L3198)
- [x] f8f35fddd4/test/distributed/test_data_parallel.py (L752)
- [x] f8f35fddd4/test/distributed/test_data_parallel.py (L776)
- [x] f8f35fddd4/test/test_type_hints.py (L151)
- [x] f8f35fddd4/test/test_jit_fuser.py (L771)
- [x] f8f35fddd4/test/test_jit_fuser.py (L773)
- [x] f8f35fddd4/test/test_dispatch.py (L105)
- [x] f8f35fddd4/test/test_distributions.py (L4738)
- [x] f8f35fddd4/test/test_nn.py (L9824)
- [x] f8f35fddd4/test/test_namedtensor.py (L843)
- [x] f8f35fddd4/test/test_jit_fuser_te.py (L875)
- [x] f8f35fddd4/test/test_jit_fuser_te.py (L877)
- [x] f8f35fddd4/test/test_dataloader.py (L31)
- [x] f8f35fddd4/test/test_dataloader.py (L43)
- [x] f8f35fddd4/test/test_dataloader.py (L365)
- [x] f8f35fddd4/test/test_dataloader.py (L391)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44193

Reviewed By: albanD

Differential Revision: D23681529

Pulled By: malfet

fbshipit-source-id: 7c2256ff17334625081137b35baeb816c1e53e0b
2020-09-14 14:20:16 -07:00
Bert Maher
350130a69d Prevent the TE fuser from getting datatypes it can't handle (#44160)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44160

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D23528508

Pulled By: bertmaher

fbshipit-source-id: 03b22725fb2666f441cb504b35397ea6d155bb85
2020-09-09 11:10:04 -07:00
Bert Maher
960c088a58 [te] Fix casting of unsigned char, and abs(int) (#44157)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44157

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D23528507

Pulled By: bertmaher

fbshipit-source-id: c5ef0422a91a4665b616601bed8b7cd137be39f9
2020-09-09 11:08:36 -07:00
Nikolay Korovaiko
f044b17ae2 Disable a test (#44348)
Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44348

Reviewed By: mrshenli

Differential Revision: D23592524

Pulled By: Krovatkin

fbshipit-source-id: 349057606ce39dd5de24314c9ba8f40516d2ae1c
2020-09-09 08:36:19 -07:00
Nick Gibson
be94dba429 [NNC] fix support for FP16 in CudaCodgen (#44209)
Summary:
Fixes a bug where FP16 values could be incorrectly cast to a half type that doesn't have a cast operator. The fix inserts the CUDA-specific cast to float during handling of the Cast node, rather than as a wrapper around printing Loads and Stores. Two main changes: the HalfChecker now inserts the casts to float explicitly in the IR, and the PrioritizeLoad mutator now consumes both Loads and a Cast that immediately precedes a Load.
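
As a rough illustration of the kind of kernel this path covers, here is a minimal repro-style sketch (it assumes a CUDA device with the TE fuser enabled; the function is illustrative, not from the PR):

```python
import torch

@torch.jit.script
def fused_half_ops(a, b):
    # A pointwise chain the TE fuser can compile into one CUDA kernel.
    # With fp16 inputs, loads must be widened to float for the arithmetic
    # and narrowed back to half on store.
    return (a * b + b).relu()

if torch.cuda.is_available():
    a = torch.randn(8, 8, dtype=torch.half, device="cuda")
    b = torch.randn(8, 8, dtype=torch.half, device="cuda")
    for _ in range(3):  # warm up the profiling executor so fusion can kick in
        out = fused_half_ops(a, b)
    print(out.dtype)  # torch.float16
```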

Tested with test_jit_fuser_te.py and test_tensorexpr.py, plus C++ tests obv.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44209

Reviewed By: izdeby

Differential Revision: D23575577

Pulled By: nickgg

fbshipit-source-id: 808605aeb2af812758f96f9fdc11b07e08053b46
2020-09-08 18:00:39 -07:00
Nikolay Korovaiko
47ac9bb105 Enable temp disabled tests in test_jit_fuser_te.py (#44222)
Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44222

Reviewed By: izdeby

Differential Revision: D23582214

Pulled By: Krovatkin

fbshipit-source-id: 27caa3ea02ce10b163212f6a45a81b446898953d
2020-09-08 14:40:32 -07:00
Bert Maher
98ad5ff41f [te] Disable reductions by default (#44122)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44122

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D23504769

Pulled By: bertmaher

fbshipit-source-id: 1889217cd22da529e46ab30c9319a5646267e4ec
2020-09-03 23:37:45 -07:00
Bert Maher
55ff9aa185 Test TE fuser unary ops and fix sigmoid(half) (#44094)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44094

Test Plan: Imported from OSS

Reviewed By: SplitInfinity

Differential Revision: D23494950

Pulled By: bertmaher

fbshipit-source-id: 676c4e57267c4ad92065ea90b06323918dd5b0de
2020-09-03 12:48:46 -07:00
Mikhail Zolotukhin
40fec4e739 [TensorExpr] Fuser: do not fuse ops with 0-dim tensors. (#44073)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44073

We don't yet have proper support for this in NNC or on the JIT IR->NNC lowering side.
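
For context, a 0-dim tensor is a scalar-shaped tensor, as in this quick sketch:

```python
import torch

s = torch.tensor(3.0)    # 0-dim: s.dim() == 0, s.shape == torch.Size([])
v = torch.tensor([3.0])  # 1-dim tensor of length 1, not the same thing
print(s.dim(), v.dim())  # 0 1
```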

Test Plan: Imported from OSS

Reviewed By: SplitInfinity

Differential Revision: D23487905

Pulled By: ZolotukhinM

fbshipit-source-id: da0da7478fc8ce7b455176c95d8fd610c94352c1
2020-09-02 22:59:04 -07:00
Bert Maher
33d51a9b32 Respect canFuseOn{CPU,GPU} in TE fuser (#43967)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43967

Test Plan: Imported from OSS

Reviewed By: asuhan

Differential Revision: D23469048

Pulled By: bertmaher

fbshipit-source-id: 1005a7ae08974059ff9d467492caa3a388070eeb
2020-09-02 18:00:25 -07:00
Bert Maher
c14a3613a8 Fix NaN propagation in TE fuser's min/max implementation (#43609)
Summary:
Per the eager-mode source of truth, min/max must propagate NaNs.
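
For reference, the eager-mode behavior being matched, as a quick sketch:

```python
import torch

a = torch.tensor([1.0, float("nan"), 3.0])
b = torch.tensor([2.0, 2.0, 2.0])
print(torch.max(a, b))  # tensor([2., nan, 3.]) -- NaN propagates
print(torch.min(a, b))  # tensor([1., nan, 2.])
```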

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43609

Reviewed By: ZolotukhinM

Differential Revision: D23349184

Pulled By: bertmaher

fbshipit-source-id: 094eb8b89a02b27d5ecf3988d0f473c0f91e4afb
2020-09-01 02:10:13 -07:00
Elias Ellison
a7e7981c0b Use prim::TensorExprGroup interned symbol (#43635)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43635

Intern the symbol; no functional changes. Aliasing needs to be looked at, but that should be done in a separate PR; this PR just changes the symbol.

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D23358806

Pulled By: eellison

fbshipit-source-id: f18bcd142a0daf514136f019ae607e4c3f45d9f8
2020-08-31 11:52:16 -07:00
Alex Suhan
60ad7e9c04 [TensorExpr] Make sum available from Python (#43730)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43730

Test Plan:
python test/test_jit_fuser_te.py -k TestTEFuser.test_sum
test_tensorexpr --gtest_filter=TensorExprTest.KernelSum*

Reviewed By: ZolotukhinM

Differential Revision: D23407600

Pulled By: asuhan

fbshipit-source-id: e6da4690ae6d802f9be012e39e61b7467aa5285c
2020-08-29 10:38:21 -07:00
Elias Ellison
a4cf4c2437 refactor tests (#43631)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43631

I added a new test file for just the profiler stuff - I don't think these tests should go in test_jit.py. Maybe they should just go in test_tensorexpr_fuser, but I'm not really testing tensorexpr stuff either... LMK

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D23358810

Pulled By: eellison

fbshipit-source-id: 074238e1b60e4c4a919a052b7a5312b790ad5d82
2020-08-27 14:35:33 -07:00
Mikhail Zolotukhin
3ec24f02af [TensorExpr] Start using typecheck in the fuser. (#43173)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43173

With this change the fuser starts to generate typechecks for the inputs of
a fusion group. For each fusion group we generate a typecheck and an if
node: the true block contains the fused subgraph, and the false block
contains the unoptimized original subgraph.
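
Conceptually, the guarded structure behaves like the following Python sketch (the helper names are hypothetical; the real pass emits prim::TypeCheck and prim::If nodes in JIT IR, not Python):

```python
import torch

def type_check(t: torch.Tensor, dtype, device_type: str, ndim: int) -> bool:
    # Analogue of prim::TypeCheck: does the runtime tensor still match
    # the profiled type the fused kernel was specialized for?
    return t.dtype == dtype and t.device.type == device_type and t.dim() == ndim

def guarded_run(x, fused_fn, fallback_fn):
    if type_check(x, torch.float32, "cpu", 2):
        return fused_fn(x)   # true block: the optimized fused subgraph
    return fallback_fn(x)    # false block: the original unoptimized subgraph
```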

Differential Revision: D23178230

Test Plan: Imported from OSS

Reviewed By: eellison

Pulled By: ZolotukhinM

fbshipit-source-id: f56e9529613263fb3e6575869fdb49973c7a520b
2020-08-25 18:13:32 -07:00
Yujun Zhao
e5adf45dde Add python unittest target to caffe2/test/TARGETS (#42766)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42766

**Summary**
Some Python tests are missing from `caffe2/test/TARGETS`; add them to make the target list more comprehensive.

According to [run_test.py](https://github.com/pytorch/pytorch/blob/master/test/run_test.py#L125), some tests are slower. Slow tests are added as independent targets and the others are put together into one `others` target. The reason is that we want to reduce overhead, especially for code coverage collection. Tests in one target can be run as a bundle, and coverage can then be collected together. Coverage collection is typically time-expensive, so this helps us save time.

Test Plan:
Run all the new test targets locally in dev server and record the time they cost.
**Statistics**

```
# jit target
real    33m7.694s
user    653m1.181s
sys     58m14.160s

--------- Compare to Initial Jit Target runtime: ----------------

real    32m13.057s
user    613m52.843s
sys     54m58.678s

```

```
# others target
real    9m2.920s
user    164m21.927s
sys     12m54.840s
```

```
# serialization target
real    4m21.090s
user    23m33.501s
sys     1m53.308s

```

```
# tensorexpr
real    11m28.187s
user    33m36.420s
sys     1m15.925s
```

```
# type target
real    3m36.197s
user    51m47.912s
sys     4m14.149s
```

Reviewed By: malfet

Differential Revision: D22979219

fbshipit-source-id: 12a30839bb76a64871359bc024e4bff670c5ca8b
2020-08-10 09:48:59 -07:00
Nikolay Korovaiko
47c57e8804 rename TestFuser to TestTEFuser (#41542)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/41542

Reviewed By: jamesr66a

Differential Revision: D22579606

Pulled By: Krovatkin

fbshipit-source-id: f65b2cae996b42d55ef864bc0b424d9d43d8a2e2
2020-07-22 13:37:27 -07:00
Michael Suo
ca1b8ebbcb move misc implementation out of jit/__init__.py (#41154)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/41154

Test Plan: Imported from OSS

Reviewed By: ailzhang

Differential Revision: D22445213

Pulled By: suo

fbshipit-source-id: 200545715c5ef13beb1437f49e01efb21498ddb7
2020-07-13 16:59:55 -07:00
Jeff Daily
ac8c8b028d [ROCm] restore jit tests (#40447)
Summary:
Remove `skipIfRocm` from most jit tests and enable `RUN_CUDA_HALF` tests for ROCm.

These changes passed more than three rounds of CI testing against the ROCm CI.

CC ezyang xw285cornell sunway513
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40447

Differential Revision: D22190711

Pulled By: xw285cornell

fbshipit-source-id: bac44825a2675d247b3abe2ec2f80420a95348a3
2020-06-27 01:03:59 -07:00
Nikolay Korovaiko
5036c94a6e properly skip legacy tests regardless of the default executor (#40381)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40381

Differential Revision: D22173938

Pulled By: Krovatkin

fbshipit-source-id: 305fc4484977e828cc4cee6e053a1e1ab9f0d6c7
2020-06-26 11:13:50 -07:00
Wanchao Liang
27d789500b [test] split tracer related tests out of test_jit (#40142)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40142

test_jit is becoming huge again, which makes it hard for an editor to load
and for us to write new tests; this splits out the tracer-related tests.

Test Plan: Imported from OSS

Reviewed By: ailzhang

Differential Revision: D22085035

Pulled By: wanchaol

fbshipit-source-id: 696bee84985ecfbfeac8e2ee5c27f1bdda8de394
2020-06-17 17:26:33 -07:00
Elias Ellison
daa85cfe2e [JIT] Exit Transform Rewrite (#38282)
Summary:
After an early return, we conditionalize all further execution. This means that currently the pattern of
`if return elif return elif return` generates better code than `if return if return if return`. It's obviously not good to have semantically equivalent code generate worse IR, so we should rewrite the graph to handle this case. This came up in https://github.com/pytorch/pytorch/pull/37171

```
@torch.jit.script
def test_foo(x: bool, y: bool):
    if x:
        return 1
    return 2
print(test_foo.code)
```
generates:
```
def test_foo(x: bool,
    y: bool) -> int:
  _0 = uninitialized(int)
  if x:
    _1, _2 = True, 1
  else:
    _1, _2 = False, _0
  if _1:
    _3 = _2
  else:
    _3 = 2
  return _3
```
while
```
@torch.jit.script
def test_foo(x: bool, y: bool):
    if x:
        return 1
    else:
        return 2
print(test_foo.code)
```
generates:
```
def test_foo(x: bool,
    y: bool) -> int:
  if x:
    _0 = 1
  else:
    _0 = 2
  return _0
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38282

Differential Revision: D21576733

Pulled By: eellison

fbshipit-source-id: 80cf1ad7fbda6d8d58557abbfb21c90eafae7488
2020-05-15 12:22:28 -07:00
Vitaly Fedyunin
57d01be92b Replacing assertEqual with assertEqualIgnoreType wherever types missmatch (#38102)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/38102

Test Plan: Imported from OSS

Differential Revision: D21477060

Pulled By: VitalyFedyunin

fbshipit-source-id: 25e0fd837ca9bfccf0ce994c80f7790c894096d4
2020-05-09 14:48:55 -07:00
Mikhail Zolotukhin
4784af1d78 [TensorExpr] Don't include aten::rand_like to TE fusion groups since we can't handle rand+broadcast case yet. (#38132)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/38132

Test Plan: Imported from OSS

Reviewed By: resistor

Differential Revision: D21479256

Pulled By: ZolotukhinM

fbshipit-source-id: 2678cfd6ad2feea132efb5eec09e5f41bbd54487
2020-05-08 13:37:13 -07:00
Elias Ellison
0e3a05ec00 [JIT] rename enable_profiling_mode to enable_profiling_mode_for_profiling_tests (#37825)
Summary:
The existing context manager only conditionally enabled profiling mode, which was counterintuitive. As a result, when we changed the default executor, it broke internal benchmarking.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37825

Differential Revision: D21404611

Pulled By: eellison

fbshipit-source-id: 306b3c333ef4eb44ab6a6e5ab4e0682e5ce312ce
2020-05-06 11:30:02 -07:00
Nikolay Korovaiko
edc5ef1afb run the simple executor for jit tests by default, add profiling jobs … (#37017)
Summary:
…for fusion tests

fix flake8 warnings

fix ci failures

fix test_determination.py
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37017

Differential Revision: D21238446

Pulled By: Krovatkin

fbshipit-source-id: 393e6135883dc5ac57bdff580de96c66829d454c
2020-04-28 19:16:52 -07:00
Nikolay Korovaiko
a80a438e37 correctly set and restore states in te tests (#37210)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/37210

Differential Revision: D21238634

Pulled By: Krovatkin

fbshipit-source-id: 6462239753399c10c871baa5d5fdff5465cf2544
2020-04-24 20:16:51 -07:00
Mikhail Zolotukhin
af5121f62a Invoke TensorExpr fuser pass from a graph executor. (#35913)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35913

The pass itself is still disabled by default, but with this change we
don't need to register it as a custom pass anymore. It allows us to
control its behavior with env variables more easily.

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D20827189

Pulled By: ZolotukhinM

fbshipit-source-id: e74d90b5e46422e7ab7bc40974a805220da50fbc
2020-04-03 12:20:26 -07:00
Christian Sarofeen
6d24f8fe21 Infrastructure for a new CUDA Fuser (#34785)
Summary:
**Summary:** This PR contains the infrastructure of a new CUDA fuser. This CUDA fuser is based on many of the same principles as TensorExpressions and Halide; however, the implementation is built from the ground up. The fusion pass itself is similar to the default CUDA fuser's, but it has undergone some refactoring and uses the new code-generation infrastructure. For those who are interested in how the code generation in this PR works, I would recommend reviewing _test/cpp/jit/test_gpu_fusion.cpp_ as well as the long comment section at the beginning of _torch/csrc/jit/codegen/cuda/transform_replay.h_. One of the largest differences between our approach and that of TVM/Halide is the concept of a "TensorView". At a high level, a TensorView should be thought of much like a Tensor in PyTorch: it's an N-D object that can undergo transformations changing its dimensionality. Dimensionality changes are done through the operations split/merge/reorder/computeAt. These transformations are similar to split/fuse/reorder/compute_at in TVM; they modify how a tensor is iterated over to generate GPU code. Interestingly, in our scheme these transformations are applied to tensors and only impact how that tensor is generated.
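
To make split/merge concrete, here is a loose sketch (not the fuser's actual API) of what these transformations do to an iteration domain's axis extents:

```python
# Illustrative only: model an iteration domain as a list of axis extents.
def split(extents, axis, factor):
    """Replace the axis with an outer/inner pair: ceil(N / factor) and factor."""
    n = extents[axis]
    outer = (n + factor - 1) // factor
    return extents[:axis] + [outer, factor] + extents[axis + 1:]

def merge(extents, axis):
    """Fuse axes `axis` and `axis + 1` into one axis of their product."""
    return extents[:axis] + [extents[axis] * extents[axis + 1]] + extents[axis + 2:]

domain = [128, 64]
domain = split(domain, 0, 32)  # [4, 32, 64]: axis 0 becomes an outer/inner pair
domain = merge(domain, 1)      # [4, 2048]: axes 1 and 2 fused into one axis
print(domain)
```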

**Warning:** This PR is purposefully not feature complete with the current fuser. We wanted to separate out the infrastructure from the fusion capabilities. Once in, smaller incremental PRs will be submitted to expand capabilities of the fuser.

**Short term goals:**

Parity with current CUDA fuser (including performance):
- Dynamic shapes (no recompilation)
- Implicit handling of broadcast (broadcasted tensors are treated as tensors of the broadcasted size in the generated code)
- Dropout

**Mid-term goals:**

- Transposes fused with pointwise operations where transpose involves only 2 axes (across the fused operation).
- 1-D reductions fused with pointwise operations
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34785

Reviewed By: ZolotukhinM

Differential Revision: D20650977

Pulled By: soumith

fbshipit-source-id: ee39c95a880e1b9822e874ed4cc180971572bf63
2020-04-02 09:22:42 -07:00
Bram Wasti
a3e10d2a17 Expose enablement of TensorExpr fuser as env variable (#35341)
Summary:
This commit allows one to use an environment variable to enable the fuser in torch/csrc/jit/tensorexpr/:

```
PYTORCH_TENSOREXPR=1 python benchmark.py
```

This commit also changes the registration to happen by default, removing the requirement to call the Python-exposed `_jit_register_tensorexpr_fuser`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35341

Reviewed By: ZolotukhinM

Differential Revision: D20676348

Pulled By: bwasti

fbshipit-source-id: 4c997cdc310e7567c03905ebff72b3e8a4c2f464
2020-03-26 14:31:57 -07:00
Johannes M Dieterich
d807292c4a [ROCm] Hotfix disable tests (#35396)
Summary:
Regressions were introduced sometime in the last few days - disable for now.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35396

Differential Revision: D20656744

Pulled By: xw285cornell

fbshipit-source-id: 386e4e5d50fb81a1d44e8f3558b81cb69299fe92
2020-03-26 00:21:40 -07:00
Mikhail Zolotukhin
6bcf0b407b [TensorExpr] Disable fuser-te cuda tests when run on ROCm. (#35388)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/35388

Test Plan: Imported from OSS

Differential Revision: D20648735

Pulled By: ZolotukhinM

fbshipit-source-id: 27bd776fbb84ec81034ace4b874522413d9e5643
2020-03-25 16:04:15 -07:00
Mikhail Zolotukhin
12f0052eee Add TensorExpr Fuser tests (resubmit). (#35085)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/35085

Test Plan: Imported from OSS

Differential Revision: D20552334

Pulled By: ZolotukhinM

fbshipit-source-id: 628fcf4719a879f18978ff8a0a64afbb045df645
2020-03-20 13:19:31 -07:00
Natalia Gimelshein
3c90a90730 Revert D20540599: Add TensorExpr Fuser tests.
Test Plan: revert-hammer

Differential Revision:
D20540599

Original commit changeset: ced9b6657fe7

fbshipit-source-id: e8fa11f20207c35f39b3fbe6f45fc627715377c1
2020-03-19 18:37:32 -07:00
Mikhail Zolotukhin
7b59f41009 Add TensorExpr Fuser tests. (#35052)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/35052

Differential Revision: D20540599

Test Plan: Imported from OSS

Pulled By: ZolotukhinM

fbshipit-source-id: ced9b6657fe72bca61833ab5d59bdaddcacd114b
2020-03-19 14:31:54 -07:00