Commit Graph

91 Commits

Author SHA1 Message Date
Justin Chu
73e1455327 [BE] Enable ruff's UP rules and autoformat test/ (#105434)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105434
Approved by: https://github.com/albanD
2023-07-19 20:36:06 +00:00
Wenzhe Xue
a2fe6953bc Generate nearbyint for Round in tensorexpr llvm codegen, match torch.round result (#104430)
Fixes #103465 by matching the behavior of `torch.round` ([doc](https://pytorch.org/docs/stable/generated/torch.round.html?highlight=round#torch.round)): “round half to even”.

Using the repro code, the output is correct:
```
Using torch version=2.1.0a0+git84fedbc and optimization enabled=True
[cpu ] Python = 2, Torch = 2, Torch traced = 2
Using torch version=2.1.0a0+git84fedbc and optimization enabled=False
[cpu ] Python = 2, Torch = 2, Torch traced = 2
```
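
For reference, a minimal sketch of the "round half to even" rule that `torch.round` follows (values chosen purely for illustration):

```python
import torch

# Halfway cases go to the nearest even integer ("banker's rounding"),
# which is the behavior the NNC-generated nearbyint now matches.
x = torch.tensor([0.5, 1.5, 2.5, 3.5])
print(torch.round(x))  # tensor([0., 2., 2., 4.])
```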

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104430
Approved by: https://github.com/jgong5, https://github.com/davidberard98
2023-07-07 01:47:46 +00:00
Jason Ansel
ae57bd6630 PT2/TorchScript interoperability fix (#94678)
Allows torch.compile() to inline into ScriptFunction
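
A minimal sketch of the interaction (the function names here are illustrative, not from the PR):

```python
import torch

@torch.jit.script
def scripted_mul_add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    # A TorchScript ScriptFunction
    return x * y + y

@torch.compile
def fn(x: torch.Tensor) -> torch.Tensor:
    # With this fix, torch.compile() can inline the ScriptFunction
    # rather than treating it as an opaque call.
    return scripted_mul_add(x, x)

print(fn(torch.randn(4)))
```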

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94678
Approved by: https://github.com/ezyang
2023-02-15 01:21:10 +00:00
Xuehai Pan
046e88a291 [BE] [3/3] Rewrite super() calls in test (#94592)
Rewrite Python built-in class `super()` calls. Only non-semantic changes should be applied.

- #94587
- #94588
- #94592

Also, methods with only a `super()` call are removed:

```diff
class MyModule(nn.Module):
-   def __init__(self):
-       super().__init__()
-
    def forward(self, ...):
        ...
```
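
The rewrite itself looks like the following sketch (class name illustrative):

```diff
 class MyModule(nn.Module):
     def __init__(self):
-        super(MyModule, self).__init__()
+        super().__init__()
         self.linear = nn.Linear(4, 4)
```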

Cases where the rewrite would change the semantics are kept unchanged, e.g.:

f152a79be9/caffe2/python/net_printer.py (L184-L190)

f152a79be9/test/test_jit_fuser_te.py (L2628-L2635)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94592
Approved by: https://github.com/ezyang, https://github.com/seemethere
2023-02-12 22:20:53 +00:00
PyTorch MergeBot
cba96366a2 Revert "remove torch.equal usages (#89527)"
This reverts commit 4095ef8b80.

Reverted https://github.com/pytorch/pytorch/pull/89527 on behalf of https://github.com/clee2000 due to breaking periodic multigpu tests 4095ef8b80 https://github.com/pytorch/pytorch/actions/runs/3592806602/jobs/6049368502
2022-12-02 21:36:13 +00:00
Philip Meier
4095ef8b80 remove torch.equal usages (#89527)
Preparation for the next PR in this stack: #89559.

I replaced

- `self.assertTrue(torch.equal(...))` with `self.assertEqual(..., rtol=0, atol=0, exact_device=True)`,
- the same for `self.assertFalse(...)` with `self.assertNotEqual(...)`, and
- `assert torch.equal(...)` with `torch.testing.assert_close(..., rtol=0, atol=0)` (note that we don't need to set `check_device=True` here since that is the default).

There were a few instances where the result of `torch.equal` is used directly. In those cases I replaced it with `(... == ...).all().item()`, sometimes also dropping the `.item()` depending on the context.
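
A sketch of what the replacements look like in practice (tensor values are made up for illustration):

```python
import torch

a = torch.tensor([1.0, 2.0])
b = torch.tensor([1.0, 2.0])

# Before: assert torch.equal(a, b)
# After: exact comparison with zero tolerances
torch.testing.assert_close(a, b, rtol=0, atol=0)

# Where the boolean result of torch.equal was used directly:
same = (a == b).all().item()
print(same)  # True
```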

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89527
Approved by: https://github.com/mruberry
2022-12-01 11:22:52 +00:00
Wu, Chunyuan
9c867eae1a nnc: fix Store if value is fp32 while buf is bf16 (#86788)
Fixes https://github.com/pytorch/pytorch/issues/86533.
For the below graph:
```bash
[DUMP kernel.cpp:1690] TensorExprKernel graph:
[DUMP kernel.cpp:1690] graph(%x.1 : BFloat16(10, strides=[1], requires_grad=0, device=cpu)):
[DUMP kernel.cpp:1690]   %1 : int = prim::Constant[value=0]()
[DUMP kernel.cpp:1690]   %2 : BFloat16(10, strides=[1], requires_grad=0, device=cpu) = aten::pow(%x.1, %1) # test/test_tensorexpr.py:1330:29
[DUMP kernel.cpp:1690]   %3 : BFloat16(10, strides=[1], requires_grad=0, device=cpu) = aten::sin(%2) # test/test_tensorexpr.py:1330:19
[DUMP kernel.cpp:1690]   return (%3)
```

**Loop stmt before the fix:**
The store value `0.8414709568023682f` is float while the scalar_type of the store buf `aten_sin` is bf16.
```bash
[DEBUG llvm_codegen.cpp:489] After HalfRewriter {
[DEBUG llvm_codegen.cpp:489]   aten_sin[Ramp(0ll, 1ll, 8)] = Broadcast(0.8414709568023682f, 8);
[DEBUG llvm_codegen.cpp:489]   for (int64_t i_1_tail_tail = 0ll; i_1_tail_tail < 2ll; i_1_tail_tail++) {
[DEBUG llvm_codegen.cpp:489]     aten_sin[i_1_tail_tail + 8ll] = 0.8414709568023682f;
[DEBUG llvm_codegen.cpp:489]   }
[DEBUG llvm_codegen.cpp:489] }
```

**Loop stmt after the fix:**
```bash
[DEBUG llvm_codegen.cpp:489] After HalfRewriter {
[DEBUG llvm_codegen.cpp:489]   aten_sin[Ramp(0ll, 1ll, 8)] = bfloat16(Broadcast(0.8414709568023682f, 8));
[DEBUG llvm_codegen.cpp:489]   for (int64_t i_1_tail_tail = 0ll; i_1_tail_tail < 2ll; i_1_tail_tail++) {
[DEBUG llvm_codegen.cpp:489]     aten_sin[i_1_tail_tail + 8ll] = bfloat16(0.8414709568023682f);
[DEBUG llvm_codegen.cpp:489]   }
[DEBUG llvm_codegen.cpp:489] }
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86788
Approved by: https://github.com/EikanWang, https://github.com/kit1980
2022-11-24 02:52:34 +00:00
Wang, Eikan
83261ff9a8 Use high precision accumulation buffer for bf16 accumulation (#84402)
Accumulation is not friendly to BFloat16 because its mantissa is only 7 bits, so an operand cannot affect the final result if it is very small relative to the accumulator.

Take `a += b` as an example: `a` keeps growing as the computation runs, and once the gap between `a` and `b` becomes large, `b` no longer affects `a`.

Hence, the best practice is to accumulate in FP32 and convert back to BF16 once the accumulation is finished. This PR follows that practice.
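
A rough illustration of the problem (a sketch, not the PR's code; exact printed values depend on rounding):

```python
import torch

big = torch.tensor(256.0, dtype=torch.bfloat16)
small = torch.tensor(0.5, dtype=torch.bfloat16)

# In bf16, 0.5 is below the rounding step at 256.0, so the update is lost.
print(big + small)                  # tensor(256., dtype=torch.bfloat16)

# Accumulating in fp32 and converting back at the end keeps the contribution.
print(big.float() + small.float())  # tensor(256.5000)
```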

We extend `ReduceOp` by adding an `accumulation` buffer and recording the result buffer and the `Reducer`'s operand, because we need to replace the original `ReduceOp` with a new one that uses the `accumulation` buffer for reduction.

- Extend `ReduceOp` by adding `accumulation` buffer and recording the result buffer and `Reducer`'s operand - [PR change](https://github.com/pytorch/pytorch/pull/84402/files#diff-0f4be13525117d5c49c69bd18e92eb15dda36b5a59b7a10c7e1114f5cac10afbR225-R229)
- Replace the original `ReduceOp` with a new `ReduceOp` to use `accumulation` buffer for reduction - [PR change](https://github.com/pytorch/pytorch/pull/84402/files#diff-fac6725328dc01e235944c7afc9f29c804488973c02c25ecd93d562884d959b3R26-R36)
- Cast the accumulation buffer from FP32 to BF16 and write back to the result buffer - [PR change](https://github.com/pytorch/pytorch/pull/84402/files#diff-fac6725328dc01e235944c7afc9f29c804488973c02c25ecd93d562884d959b3R62-R67)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/84402
Approved by: https://github.com/frank-wei
2022-09-27 04:40:42 +00:00
Wang, Eikan
a531a604a0 Support BF16ImmPtr (#84041)
- Support BF16 immediate values by converting them to uint16. The behavior is the same as for BF16 tensors.
- Enable BF16 test cases.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84041
Approved by: https://github.com/ZolotukhinM
2022-09-24 11:58:43 +00:00
Wang, Eikan
d3be4245bb Fix the issue that cat result would be incorrect for channels-last (#85076)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85076
Approved by: https://github.com/frank-wei
2022-09-24 09:23:46 +00:00
Wang, Eikan
11b9a81e02 [NNC] channels last propagation within NNC fusion group (#76948)
Decide the memory layout propagation policy and propagate it within the NNC fusion group. The memory layout propagation policy could be `Contiguous` and `Channels-last contiguous`.
 - `Contiguous`: Convert non-contiguous input tensors (including channels-last contiguous ones) to contiguous and generate a contiguous output `Buf` for the lowering function.
 - `Channels-last contiguous`: Convert the input tensors to channels-last contiguous and generate a channels-last contiguous output `Buf` for the lowering function.

Currently, the rule is simple: if all the input and output tensors of the NNC fusion group are channels-last contiguous, then the propagated memory layout is `Channels-last contiguous`. Otherwise, it is always `Contiguous`, which is the same as the current behavior. This means the PR provides a fast path for channels-last, and the optimization is conservative since its trigger conditions are strict.
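
A sketch of the condition in terms of the public memory-format API (illustrative only; the actual check lives in the NNC fusion code):

```python
import torch

x = torch.randn(1, 3, 8, 8).to(memory_format=torch.channels_last)
y = torch.randn(1, 3, 8, 8)  # default NCHW-contiguous

# Only if every input/output of the fusion group is channels-last contiguous
# does the group propagate the `Channels-last contiguous` layout.
print(x.is_contiguous(memory_format=torch.channels_last))  # True
print(y.is_contiguous(memory_format=torch.channels_last))  # False -> fall back to Contiguous
```
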
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76948
Approved by: https://github.com/ZolotukhinM
2022-05-30 18:31:49 +00:00
David Berard
3941b1ab05 [NNC] call super().setUp() & tearDown() in test_tensorexpr.py (#74504)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74504

Same as #73762. This will make these tests obey PYTORCH_TEST_WITH_SLOW
and PYTORCH_TEST_SKIP_FAST
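
A minimal sketch of the pattern (class name is illustrative):

```python
from torch.testing._internal.common_utils import TestCase

class MyTensorExprTest(TestCase):
    def setUp(self):
        super().setUp()  # base setUp honors PYTORCH_TEST_WITH_SLOW / PYTORCH_TEST_SKIP_FAST
        # ... test-specific setup ...

    def tearDown(self):
        # ... test-specific teardown ...
        super().tearDown()
```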

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D35036771

Pulled By: davidberard98

fbshipit-source-id: a456c109cda365839cda56758ca4d6873e9e159c
(cherry picked from commit eeb70f54422dee287391f700bce298f285992704)
2022-03-22 20:17:21 +00:00
David Berard
bbd42c605a [JIT] Opinfo tests for nnc fusion - retry (#72486)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72486

Retry #70465.

Test Plan: Imported from OSS

Reviewed By: mikaylagawarecki

Differential Revision: D34061628

Pulled By: davidberard98

fbshipit-source-id: e27ed315bc4ad57cdbfbc9cedffcbb7886004524
(cherry picked from commit 7937808d2e)
2022-02-09 19:01:22 +00:00
Nikita Shulga
bb101ec78d Revert D33595240: [JIT] Opinfo tests for nnc fusion
Test Plan: revert-hammer

Differential Revision:
D33595240 (0b57bd4c66)

Original commit changeset: e2e17a921bc3

Original Phabricator Diff: D33595240 (0b57bd4c66)

fbshipit-source-id: 172a3ffd19d180b1b3617956b1f881be62f37bc9
(cherry picked from commit 324cfaea86)
2022-02-08 01:28:42 +00:00
David Berard
0b57bd4c66 [JIT] Opinfo tests for nnc fusion (#70465)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70465

These tests check that
(a) the result after NNC fusion (of a single op) is the same as the unfused op, and
(b) for certain ops where fusion is expected to occur, fusion does actually occur.

Test Plan: Imported from OSS

Reviewed By: wenleix

Differential Revision: D33595240

Pulled By: davidberard98

fbshipit-source-id: e2e17a921bc30c313e92e8e5bbc6c1b5fcd14bc1
(cherry picked from commit b1ba221acc)
2022-02-07 20:56:21 +00:00
Jane Xu
49251d05ec [skip ci] Set test owners for NNC tests (#66833)
Summary:
Action following https://github.com/pytorch/pytorch/issues/66232

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66833

Reviewed By: albanD

Differential Revision: D31907812

Pulled By: janeyx99

fbshipit-source-id: 5e5013b4276fd208ac68d61cf787679799695602
2021-10-26 07:46:18 -07:00
Mikhail Zolotukhin
15724bcc03 [TensorExpr] Re-enable a float16 test. (#65632)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65632

Test Plan: Imported from OSS

Reviewed By: huiguoo

Differential Revision: D31181798

Pulled By: ZolotukhinM

fbshipit-source-id: 1a57d0a878d44f8b73f3c24eef7ba707ce18fb70
2021-09-24 15:15:42 -07:00
Raghavan Raman
652a8bf7d0 [nnc] Updated indices during broadcast to use int64_t (#64627)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64627

This fixes the root cause of S242719

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D30801686

Pulled By: navahgar

fbshipit-source-id: b6d3ebdc7eb57116eaced53c2f35c7798bb17e80
2021-09-09 08:29:37 -07:00
Bert Maher
ebc0aacf83 [nnc] Fix half2float conversion and re-enable float16 (#64199)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64199

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D30643865

Pulled By: bertmaher

fbshipit-source-id: 9de6adca53bd08839328cbaf6364f7de9550264b
2021-08-30 18:37:55 -07:00
Bert Maher
8dda299d96 Re-apply: [nnc] Support thread level parallelism in fused kernels (#63776)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63776

I reverted this out of an abundance of caution because some test
failures occurred, but they were all due to precision issues fixed lower in
this stack.  Let's try again.

I've rolled the elimination of the allow-parallelism-in-fusions toggle into
this diff since they're pretty tightly coupled.
ghstack-source-id: 136529847

Test Plan: CI

Reviewed By: huiguoo

Differential Revision: D30484555

fbshipit-source-id: 38fd33520f710585d1130c365a8c60c9ce794a59
2021-08-24 18:56:55 -07:00
Bert Maher
76da46ccdc Revert D30417127: Remove flag to toggle CPU fusion in the presence of parallelism
Test Plan: revert-hammer

Differential Revision:
D30417127 (6600bc9651)

Original commit changeset: b77d7c68364f

fbshipit-source-id: 6b52fb83a84fe241945e3cb3eeb71050d1d9c8f1
2021-08-21 03:38:07 -07:00
Bert Maher
6600bc9651 Remove flag to toggle CPU fusion in the presence of parallelism (#63514)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63514

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D30417127

Pulled By: bertmaher

fbshipit-source-id: b77d7c68364f2af73570740540f3b1152313016e
2021-08-20 11:18:19 -07:00
Philip Meier
99203580a9 Updates internal assert_allclose callsites in favor of assert_close (#61841)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61841

Redo of #60863.

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D30408145

Pulled By: mruberry

fbshipit-source-id: 0b34ebc7f23ba38ecd89640b61d8aca59b7eab58
2021-08-19 12:50:41 -07:00
Shen Li
1022443168 Revert D30279364: [codemod][lint][fbcode/c*] Enable BLACK by default
Test Plan: revert-hammer

Differential Revision:
D30279364 (b004307252)

Original commit changeset: c1ed77dfe43a

fbshipit-source-id: eab50857675c51e0088391af06ec0ecb14e2347e
2021-08-12 11:45:01 -07:00
Zsolt Dollenstein
b004307252 [codemod][lint][fbcode/c*] Enable BLACK by default
Test Plan: manual inspection & sandcastle

Reviewed By: zertosh

Differential Revision: D30279364

fbshipit-source-id: c1ed77dfe43a3bde358f92737cd5535ae5d13c9a
2021-08-12 10:58:35 -07:00
Mikhail Zolotukhin
43a2f7c26a [TensorExpr] Do not fuse float16 values. (#61569)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61569

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D29672564

Pulled By: ZolotukhinM

fbshipit-source-id: fe64ec38209d43f8246bcb6c397b64a28cbd86fa
2021-07-14 12:53:59 -07:00
Bert Maher
93772792e3 [nnc] Get rid of fuser trigger counters (#57334)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57334

Here's a possibly controversial PR.  These counters got in the way of
generalizing the fuser tests to handle arbitrary devices, and I guess I'm just
generally skeptical that they provide much value.  While true that they let us
observe whether fusion groups were created, we already have assertions based on
the shape of the graph, and I'm not sure that I trust those any less than these
counters.

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D29471484

Pulled By: bertmaher

fbshipit-source-id: f6d76f6e72dbfb581acff1d834b0c74500941b57
2021-06-29 22:22:15 -07:00
Mikhail Zolotukhin
daa35141e8 Reland: "[TensorExpr] Fix handling of 0-dim tensors." (#59508)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59508

An assert that was triggering in a previous version is now relaxed to
take 0-dim tensors into account.

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D28918342

Pulled By: ZolotukhinM

fbshipit-source-id: c09b62c9725d1603b0ec11fcc051e7c932af06ae
2021-06-08 22:48:17 -07:00
Nikita Shulga
ba3a90b55e Revert D28819780: [TensorExpr] Fix handling of 0-dim tensors.
Test Plan: revert-hammer

Differential Revision:
D28819780

Original commit changeset: f3feff35a1ce

fbshipit-source-id: 1dca4ac9cea0b67e9f02800f6d5b3c7e4ae1d81a
2021-06-04 19:25:30 -07:00
Mikhail Zolotukhin
d60efd8207 [TensorExpr] Fix handling of 0-dim tensors. (#59279)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59279

There were some issues with how we handle 0-dim cases in lowerings and
also in how we generate reductions in that special case. This PR fixes
those issues and reenables a bunch of tests.

Differential Revision:
D28819780

Test Plan: Imported from OSS

Reviewed By: navahgar

Pulled By: ZolotukhinM

fbshipit-source-id: f3feff35a1ce11821ada2f8d04ae9d4be10dc736
2021-06-04 13:58:15 -07:00
Mikhail Zolotukhin
a0f4b7cd48 [TensorExpr] Re-enable skipped tests, they seem to be working now. (#58206)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58206

Tested on CUDA with and without `PYTORCH_TENSOREXPR_DONT_USE_LLVM=1`.

Closes #48053.

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D28403250

Pulled By: ZolotukhinM

fbshipit-source-id: 1ae1cfed691e0077a37db646937e580fbd32b23f
2021-05-13 09:18:09 -07:00
Bert Maher
151e81b7bc [nnc][tests] Skip long running tests when using TE interpreter (#57568)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57568

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D28202740

Pulled By: bertmaher

fbshipit-source-id: 3f88aed91cd92c270ea5e6b504ae5ebc6810aa2b
2021-05-04 16:57:48 -07:00
Bert Maher
7c8a7efe3f [nnc] Enable all fuser tests for cpu (#57332)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57332

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D28113481

Pulled By: bertmaher

fbshipit-source-id: b55e4bbcc25a09614b37985873b72337fdefc6b0
2021-04-30 10:11:06 -07:00
Bert Maher
17b8a4db1c [nnc] Support pow on CPU (#56308)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56308

But only for float tensors.  Even on CUDA, int tensors just have weird
behavior with pow, and I bet FP is so much more common that it's just not worth
trying to fuse ints here.
ghstack-source-id: 126769637

Test Plan: `pytest test_jit_fuser_te.py -k test_binary_pow`

Reviewed By: navahgar

Differential Revision: D27834694

fbshipit-source-id: 7274d72cf02ab95d63574b6c17995b8f34560810
2021-04-20 15:13:03 -07:00
Bert Maher
8e82e932f3 Reland: D27652485: [nnc] Enable CPU fusion only when num_threads == 1" (#56120)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56120

This reverts commit ad17fadbfc (D27786457).

The big annoyance here is that depending on the threading mode you may not be
able to toggle num_threads at will, so the fusion tests won't fail.

I hate this solution, but I'm adding a secondary override for the TE fuser.
Now you need to turn on fusion (`_jit_override_can_fuse_on_cpu`), and you're OK if you're running with 1 thread; otherwise you can also call `_jit_set_texpr_parallel_cpu_enabled` to enable it anyway.

This is (a) mainly for tests, since a real user probably won't fiddle aimlessly
with the thread count, and (b) will go away once NNC's threading support is
fully baked.
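
A sketch of how the two toggles would be used together in a test (these are internal `torch._C` bindings named in this commit; treat the exact usage as an assumption):

```python
import torch

# Allow the TE fuser on CPU at all.
torch._C._jit_override_can_fuse_on_cpu(True)

# Secondary override from this PR: permit CPU fusion even when
# torch.get_num_threads() > 1 (mainly useful for tests).
torch._C._jit_set_texpr_parallel_cpu_enabled(True)
```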

Test Plan: Imported from OSS

Reviewed By: Krovatkin

Differential Revision: D27788199

Pulled By: bertmaher

fbshipit-source-id: 070d04474f15e9689dbdf8cc1fde43050c6506b1
2021-04-15 15:50:18 -07:00
Natalia Gimelshein
ad17fadbfc Revert D27652485: [nnc] Enable CPU fusion only when num_threads == 1
Test Plan: revert-hammer

Differential Revision:
D27652485 (e7e164f9e6)

Original commit changeset: 182580cf758d

fbshipit-source-id: e3c95b06d1eef668095f3cf461485395179d94af
2021-04-14 20:23:15 -07:00
Bert Maher
e7e164f9e6 [nnc] Enable CPU fusion only when num_threads == 1 (#55621)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55621

Fuser support for thread-level parallelism is a work in progress, so
only fuse when the program is running single-threaded.
ghstack-source-id: 126069259

Test Plan: observe fusion groups formed when torch.get_num_threads==1 vs not

Reviewed By: ZolotukhinM

Differential Revision: D27652485

fbshipit-source-id: 182580cf758d99dd499cc4591eb9d080884aa7ef
2021-04-14 09:16:54 -07:00
Hui Guo
d8b28579c3 Add NNC support for aten::hardtanh (a hot operation in mobilenet v2/v3) (#52394)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/52394

Test Plan:
Imported from OSS

test/test_tensorexpr.py
test/test_jit_fuser_te.py

Reviewed By: bertmaher

Differential Revision: D26497856

Pulled By: huiguoo

fbshipit-source-id: 8558f89826cad250da6f970bfc49384f2b9d7ee0
2021-02-18 22:56:03 -08:00
Raghavan Raman
c7a70eec1b Make LLVM the default backend for TE (#52314)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/52264

When CPU fusion is enabled without LLVM support in PyTorch, it causes a huge slowdown (> 50x). This PR makes the LLVM backend the default backend for TE. Now an error is reported if CPU fusion is enabled without LLVM support, to avoid this performance regression.

This PR also updates the tests to not use LLVM, so that the old flow is still exercised. This is necessary because the tests run in CI do not have LLVM.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/52314

Reviewed By: ejguan

Differential Revision: D26491294

Pulled By: navahgar

fbshipit-source-id: 74561db1207da805d6d28039450db046ba2988fb
2021-02-18 12:00:38 -08:00
Rong Rong (AI Infra)
fc5db4265b [BE] replace unittest.main with run_tests (#50451)
Summary:
fix https://github.com/pytorch/pytorch/issues/50448.

This replaces `unittest.main()` with `run_tests()` in all `test/*.py` files. This PR does not address test files in the subdirectories because they seem unrelated.
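
The per-file change is essentially the following sketch:

```python
from torch.testing._internal.common_utils import run_tests

# before:
#   if __name__ == "__main__":
#       unittest.main()

# after:
if __name__ == "__main__":
    run_tests()
```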

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50451

Reviewed By: janeyx99

Differential Revision: D25899924

Pulled By: walterddr

fbshipit-source-id: f7c861f0096624b2791ad6ef6a16b1c4895cce71
2021-01-13 10:33:08 -08:00
Richard Barnes
a4383a69d4 Clean up some type annotations in caffe2/test (#49943)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49943

Upgrades type annotations from Python2 to Python3

Test Plan: Sandcastle tests

Reviewed By: xush6528

Differential Revision: D25717534

fbshipit-source-id: 5aedea4db07efca126ffb6daee79617c30a67146
2021-01-13 10:01:55 -08:00
Elias Ellison
9056173acc [NNC] Dont inline outputs buffers on cpu (#49488)
Summary:
In https://github.com/pytorch/pytorch/pull/48967/ we enabled output buffer inlining, which results in duplicate computation if one output depends on another. This was done to fix correctness for CUDA, but it is not needed for correctness on CPU and results in a perf slowdown.

The output buffer inlining solution for CUDA is intended to be an interim solution because it does not work with reductions.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49488

Reviewed By: ezyang

Differential Revision: D25596071

Pulled By: eellison

fbshipit-source-id: bc3d987645da5ce3c603b4abac3586b169656cfd
2020-12-16 16:28:25 -08:00
Bert Maher
626b8c0cf2 [te] Ban uint8 tensors from fusion groups (#49247)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49247

uint8s expose all kinds of corner cases in type promotion. As an example, consider:
```
>>> torch.tensor([1], dtype=torch.uint8).lt(-1)
tensor([True])
>>> torch.tensor([1], dtype=torch.uint8).lt(torch.tensor(-1))
tensor([True])
>>> torch.tensor([1], dtype=torch.uint8).lt(torch.tensor([-1]))
tensor([False])
```
the difference is how promotions involving scalars (or 0-dim tensors, which are treated like scalars) are prioritized compared to tensor dtypes.
Per eellison, the order is something like:
1. Tensor FP types
2. Scalar FP types
3. Tensor Int types
4. Scalar Int types

The logic for this is here: c73e97033a/aten/src/ATen/native/TypeProperties.cpp (L93)

AFAICT the effects are mainly visible for the unsigned byte type (the only unsigned type, besides bool) since the others degrade more or less gracefully.

It's hard to re-use this logic as is in TensorIterator/TypeProperties, and it's complicated enough that it's not worth re-implementing in TE unless there's evidence that it matters for real models.
ghstack-source-id: 118555597

Test Plan: `buck test //caffe2/test:jit`

Reviewed By: eellison

Differential Revision: D25489035

fbshipit-source-id: db3ab84286d472fd8a247aeb7b36c441293aad85
2020-12-14 17:40:15 -08:00
Elias Ellison
3b57be176e [NNC] Preserve strided output (#48264)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48264

Preserves the strided representation of NNC Tensor outputs by transforming them into the right layout at the end of the kernel.

Fix for https://github.com/pytorch/pytorch/issues/45604

Test Plan: Imported from OSS

Reviewed By: nikithamalgifb

Differential Revision: D25286213

Pulled By: eellison

fbshipit-source-id: 64d94ac463741e2568a1c9d44174e15ea26e511f
2020-12-10 12:19:51 -08:00
Mikhail Zolotukhin
2b70bcd014 [TensorExpr] Enable inlining for output tensors too. (#48967)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48967

We previously didn't inline output tensors which resulted in correctness
issues like #48533. This PR allows inlining for output tensors too -
this could result in duplicated computations, but we can address that
later once correctness is ensured.

Performance results on FastRNNS:
Before the fix:
```
Benchmarking LSTMs...
            name          avg_fwd          std_fwd          avg_bwd          std_bwd
           cudnn            10.09          0.05431            17.55           0.2108
            aten            21.52           0.1276             26.7            1.471
             jit            13.25           0.8748            22.47             1.73
      jit_premul            11.43           0.3226            19.43            2.231
 jit_premul_bias            11.84           0.2245            20.33            2.205
      jit_simple            13.27           0.9906            22.15           0.9724
  jit_multilayer            13.38           0.8748            22.82             1.01
              py            33.55            4.837            46.41            6.333
```
After the fix:
```
Benchmarking LSTMs...
            name          avg_fwd          std_fwd          avg_bwd          std_bwd
           cudnn            10.09          0.05979            17.45           0.1987
            aten            21.21            0.144            26.43           0.7356
             jit            13.01           0.2925            23.21           0.8454
      jit_premul             11.4           0.3905            19.62            2.448
 jit_premul_bias            11.85           0.2461            20.29           0.6592
      jit_simple            13.08           0.8533            22.81            1.315
  jit_multilayer            12.93           0.1095            23.57            1.459
              py            31.21            2.783            44.63            6.073
```

Differential Revision: D25383949

Test Plan: Imported from OSS

Reviewed By: SplitInfinity

Pulled By: ZolotukhinM

fbshipit-source-id: 16f5727475109a278499bef7905f6aad18c8527a
2020-12-08 13:24:40 -08:00
Bert Maher
2d07d5b50a [te] Don't fuse integer fmod or remainder (#48700)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48700

fmod and remainder on int tensors will raise ZeroDivisionError if their divisors are 0.  I don't think we should try to generate code that raises exceptions.  If at some point we really wanted to fuse these, I might lean towards calling a C++ helper function from the generated code.
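
A sketch of the eager-mode behavior the fuser would otherwise have to reproduce (the exact exception type is deliberately left unspecified here):

```python
import torch

a = torch.tensor([5, 7], dtype=torch.int64)
zero = torch.zeros(2, dtype=torch.int64)

try:
    torch.fmod(a, zero)  # integer fmod with a zero divisor raises
except Exception as e:
    print(type(e).__name__, e)
```
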
ghstack-source-id: 117845642

Test Plan: `buck test //caffe2/test:jit -- test_binary_ops`

Reviewed By: eellison

Differential Revision: D25265792

fbshipit-source-id: 0be56ba3feafa1dbf3c37f6bb8c1550cb6891e6d
2020-12-04 18:02:29 -08:00
Mikhail Zolotukhin
5e1faa1d41 [TensorExpr] Fix aten::atan2 lowering and disable aten::pow lowering on CPU. (#48326)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48326

This PR introduces a set of 'cuda-only' ops into the `isSupported` function. It is done to disable `pow` lowering on CPU, where it's tricky to support integer versions.

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D25129211

Pulled By: ZolotukhinM

fbshipit-source-id: c62ae466e1d9ba9b3020519aadaa2a7fe7942d84
2020-11-21 09:15:42 -08:00
Nikolay Korovaiko
0d8ddb5ec2 Make softmax and log_softmax handle negative dims, add tests (#48156)
Summary:
Make softmax and log_softmax handle negative dims, add tests
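
A quick illustration of the negative-dim convention (values are arbitrary):

```python
import torch

x = torch.randn(2, 5)

# dim=-1 is equivalent to dim=x.dim() - 1
a = torch.softmax(x, dim=-1)
b = torch.softmax(x, dim=1)
print(torch.allclose(a, b))  # True

c = torch.log_softmax(x, dim=-1)
print(torch.allclose(c, a.log()))  # True
```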

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48156

Reviewed By: bertmaher

Differential Revision: D25059788

Pulled By: Krovatkin

fbshipit-source-id: 985963e7df400857c9774660c76be7d56201a1ad
2020-11-19 01:38:14 -08:00
Nick Gibson
b1a4170ab3 [NNC] Fix lowering of aten::pow (#47795)
Summary:
NNC lowering of aten::pow assumes that the type of the exponent is either float or int cast to float, which doesn't work well with double (or half, for that matter).

Fixes https://github.com/pytorch/pytorch/issues/47304

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47795

Reviewed By: ZolotukhinM

Differential Revision: D24904201

Pulled By: nickgg

fbshipit-source-id: 43c3ea704399ebb36c33cd222db16c60e5b7ada5
2020-11-12 12:33:07 -08:00
Nick Gibson
f42cdc2e43 [NNC] Fix printing of integral doubles (#47799)
Summary:
When printing doubles, we don't do anything to distinguish integral doubles (i.e., 1 or 2) from ints. Added decoration of these doubles with `.0` if they are integral (i.e., DoubleImm(1) will print as `1.0`).

This is an issue specifically on CUDA, where some intrinsics do not have type coercion. Added a test which covers this case (without the fix, it tries to look up `pow(double, int)`, which doesn't exist).

Fixes https://github.com/pytorch/pytorch/issues/47304

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47799

Reviewed By: ZolotukhinM

Differential Revision: D24904185

Pulled By: nickgg

fbshipit-source-id: baa38726966c94ee50473cc046b9ded5c4e748f7
2020-11-12 11:02:34 -08:00