Preparation for the next PR in this stack: #89559.
I replaced
- `self.assertTrue(torch.equal(...))` with `self.assertEqual(..., rtol=0, atol=0, exact_device=True)`,
- the same for `self.assertFalse(...)` with `self.assertNotEqual(...)`, and
- `assert torch.equal(...)` with `torch.testing.assert_close(..., rtol=0, atol=0)` (note that we don't need to set `check_device=True` here since that is the default).
There were a few instances where the result of `torch.equal` is used directly. In those cases I've replaced it with `(... == ...).all().item()`, sometimes dropping the `.item()` depending on the context.
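For illustration, a minimal standalone sketch of the new patterns (not taken from the diff itself):
```
import torch
from torch.testing import assert_close

a = torch.tensor([1.0, 2.0])
b = torch.tensor([1.0, 2.0])

# Old pattern: exact comparison via torch.equal
assert torch.equal(a, b)

# New pattern: rtol=0/atol=0 requests exact value equality, and check_device=True
# is already the default for assert_close.
assert_close(a, b, rtol=0, atol=0)

# Where the boolean result of torch.equal was used directly:
same = (a == b).all().item()   # drop .item() where a 0-dim bool tensor is fine
```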
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89527
Approved by: https://github.com/mruberry
Decide the memory layout propagation policy and propagate it within the NNC fusion group. The memory layout propagation policy can be `Contiguous` or `Channels-last contiguous`.
- `Contiguous`: Convert non-contiguous input tensors (including channels-last contiguous ones) to contiguous and generate a contiguous output `Buf` for the lowering function.
- `Channels-last contiguous`: Convert the input tensors to channels-last contiguous and generate a channels-last contiguous output `Buf` for the lowering function.
Currently, the rule is simple: if all the input and output tensors of the NNC fusion group are channels-last contiguous, the propagated memory layout is `Channels-last contiguous`. Otherwise, it is always `Contiguous`, which matches the current behavior. This means the PR provides a fast path to channels-last, and the optimization is conservative since its trigger conditions are strict.
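A hypothetical Python sketch of that rule (the actual propagation logic lives in the NNC C++ kernel code; names here are illustrative only):
```
import torch

def propagated_memory_layout(inputs, outputs):
    # Channels-last is chosen only when every input and output tensor of the
    # fusion group is already channels-last contiguous; otherwise fall back to
    # plain contiguous, matching the pre-existing behavior.
    tensors = list(inputs) + list(outputs)
    if tensors and all(
        t.is_contiguous(memory_format=torch.channels_last) for t in tensors
    ):
        return torch.channels_last
    return torch.contiguous_format
```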
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76948
Approved by: https://github.com/ZolotukhinM
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74504
Same as #73762. This will make these tests obey PYTORCH_TEST_WITH_SLOW
and PYTORCH_TEST_SKIP_FAST.
Test Plan: Imported from OSS
Reviewed By: ZolotukhinM
Differential Revision: D35036771
Pulled By: davidberard98
fbshipit-source-id: a456c109cda365839cda56758ca4d6873e9e159c
(cherry picked from commit eeb70f54422dee287391f700bce298f285992704)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70465
These tests check to ensure that
(a) the result after nnc fusion (of a single op) is the same as the
unfused op
(b) for certain ops where fusion is expected to occur, ensure that
fusion does actually occur
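A rough sketch of what such a check could look like from Python (names and the warm-up count are illustrative, not the actual tests added in this PR):
```
import torch

def check_single_op_fusion(op, example_input):
    eager_out = op(example_input)                     # (a) unfused reference
    scripted = torch.jit.script(op)
    for _ in range(3):                                # warm up the profiling executor
        fused_out = scripted(example_input)
    torch.testing.assert_close(fused_out, eager_out)  # fused result matches eager
    # (b) for ops expected to fuse, check that a fusion group was actually formed
    graph = scripted.graph_for(example_input)
    assert any(n.kind() == "prim::TensorExprGroup" for n in graph.nodes())
```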
Test Plan: Imported from OSS
Reviewed By: wenleix
Differential Revision: D33595240
Pulled By: davidberard98
fbshipit-source-id: e2e17a921bc30c313e92e8e5bbc6c1b5fcd14bc1
(cherry picked from commit b1ba221acc)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64627
This fixes the root cause of S242719
Test Plan: Imported from OSS
Reviewed By: ZolotukhinM
Differential Revision: D30801686
Pulled By: navahgar
fbshipit-source-id: b6d3ebdc7eb57116eaced53c2f35c7798bb17e80
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63776
I reverted this out of an abundance of caution because some test
failures occurred, but they were all due to precision issues fixed lower in
this stack. Let's try again.
I've rolled the elimination of the allow-parallelism-in-fusions toggle into
this diff since they're pretty tightly coupled.
ghstack-source-id: 136529847
Test Plan: CI
Reviewed By: huiguoo
Differential Revision: D30484555
fbshipit-source-id: 38fd33520f710585d1130c365a8c60c9ce794a59
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57334
Here's a possibly controversial PR. These counters got in the way of
generalizing the fuser tests to handle arbitrary devices, and I guess I'm just
generally skeptical that they provide much value. While true that they let us
observe whether fusion groups were created, we already have assertions based on
the shape of the graph, and I'm not sure that I trust those any less than these
counters.
Test Plan: Imported from OSS
Reviewed By: ZolotukhinM
Differential Revision: D29471484
Pulled By: bertmaher
fbshipit-source-id: f6d76f6e72dbfb581acff1d834b0c74500941b57
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59508
An assert that was triggering in a previous version is now relaxed to
take 0-dim tensors into account.
Test Plan: Imported from OSS
Reviewed By: bertmaher
Differential Revision: D28918342
Pulled By: ZolotukhinM
fbshipit-source-id: c09b62c9725d1603b0ec11fcc051e7c932af06ae
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59279
There were some issues with how we handle 0-dim cases in lowerings and
also in how we generate reductions in that special case. This PR fixes
those issues and reenables a bunch of tests.
Differential Revision: D28819780
Test Plan: Imported from OSS
Reviewed By: navahgar
Pulled By: ZolotukhinM
fbshipit-source-id: f3feff35a1ce11821ada2f8d04ae9d4be10dc736
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58206
Tested on CUDA with and without `PYTORCH_TENSOREXPR_DONT_USE_LLVM=1`.
Closes #48053.
Test Plan: Imported from OSS
Reviewed By: bertmaher
Differential Revision: D28403250
Pulled By: ZolotukhinM
fbshipit-source-id: 1ae1cfed691e0077a37db646937e580fbd32b23f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56308
But only for float tensors. Even on CUDA, int tensors just have weird
behavior with pow, and I bet FP is so much more common that it's just not worth
trying to fuse ints here.
ghstack-source-id: 126769637
Test Plan: `pytest test_jit_fuser_te.py -k test_binary_pow`
Reviewed By: navahgar
Differential Revision: D27834694
fbshipit-source-id: 7274d72cf02ab95d63574b6c17995b8f34560810
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56120
This reverts commit ad17fadbfc (D27786457).
The big annoyance here is that depending on the threading mode you may not be
able to toggle num_threads at will, so the fusion tests won't fail.
I hate this solution, but I'm adding a secondary override for the TE fuser.
Now you need to turn on fusion (`_jit_override_can_fuse_on_cpu`); you're OK if you're
running with 1 thread, or you can also set `_jit_set_texpr_parallel_cpu_enabled` to
enable it anyway.
This is (a) mainly for tests, since a real user probably won't fiddle aimlessly
with the thread count, and (b) will go away once NNC's threading support is
fully baked.
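In practice the two toggles mentioned above are reached through the private JIT bindings; a minimal usage sketch (private APIs, subject to change):
```
import torch

torch._C._jit_override_can_fuse_on_cpu(True)          # turn on CPU fusion
torch._C._jit_set_texpr_parallel_cpu_enabled(True)    # allow fusion even with >1 threads
```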
Test Plan: Imported from OSS
Reviewed By: Krovatkin
Differential Revision: D27788199
Pulled By: bertmaher
fbshipit-source-id: 070d04474f15e9689dbdf8cc1fde43050c6506b1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55621
Fuser support for thread-level parallelism is a work in progress, so
only fuse when the program is running single-threaded.
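A minimal illustration of the gating condition described above (sketch only; the real check is inside the fuser):
```
import torch

def nnc_fusion_allowed():
    # Fuse only when running single-threaded, while parallelism support is WIP.
    return torch.get_num_threads() == 1
```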
ghstack-source-id: 126069259
Test Plan: observe fusion groups formed when torch.get_num_threads() == 1 vs. not
Reviewed By: ZolotukhinM
Differential Revision: D27652485
fbshipit-source-id: 182580cf758d99dd499cc4591eb9d080884aa7ef
Summary:
Fixes https://github.com/pytorch/pytorch/issues/52264
When CPU fusion is enabled without LLVM support in PyTorch, it causes huge slowdown (> 50x). This PR makes the LLVM backend the default backend for TE. Now, an error will be reported if CPU fusion is enabled without LLVM support, to avoid this performance regression.
This PR also updates the tests to not use LLVM, so that the old flow continues to be exercised. This is necessary because the tests run in CI do not have LLVM.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52314
Reviewed By: ejguan
Differential Revision: D26491294
Pulled By: navahgar
fbshipit-source-id: 74561db1207da805d6d28039450db046ba2988fb
Summary:
Fixes https://github.com/pytorch/pytorch/issues/50448.
This updates all `test/*.py` files to call run_tests(). This PR does not address test files in the subdirectories because they seem unrelated.
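The standard pattern the test files end up with looks roughly like this:
```
from torch.testing._internal.common_utils import run_tests

if __name__ == "__main__":
    run_tests()
```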
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50451
Reviewed By: janeyx99
Differential Revision: D25899924
Pulled By: walterddr
fbshipit-source-id: f7c861f0096624b2791ad6ef6a16b1c4895cce71
Summary:
In https://github.com/pytorch/pytorch/pull/48967/ we enabled output buffer inlining, which results in duplicate computation if one output depends on another. This was done to fix correctness for CUDA, but it is not needed for correctness on CPU and results in a perf slowdown.
The output buffer inlining approach for CUDA is intended to be an interim solution because it does not work with reductions.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49488
Reviewed By: ezyang
Differential Revision: D25596071
Pulled By: eellison
fbshipit-source-id: bc3d987645da5ce3c603b4abac3586b169656cfd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49247
uint8s expose all kinds of corner cases in type promotion. As an example, consider:
```
>>> torch.tensor([1], dtype=torch.uint8).lt(-1)
tensor([True])
>>> torch.tensor([1], dtype=torch.uint8).lt(torch.tensor(-1))
tensor([True])
>>> torch.tensor([1], dtype=torch.uint8).lt(torch.tensor([-1]))
tensor([False])
```
The difference is how promotions involving scalars (or 0-dim tensors, which are treated like scalars) are prioritized compared to tensor dtypes.
Per eellison, the order is something like:
1. Tensor FP types
2. Scalar FP types
3. Tensor Int types
4. Scalar Int types
The logic for this is here: c73e97033a/aten/src/ATen/native/TypeProperties.cpp (L93)
AFAICT the effects are mainly visible for the unsigned byte type (the only unsigned type, besides bool) since the others degrade more or less gracefully.
It's hard to re-use this logic as-is from TensorIterator/TypeProperties, and it's complicated enough that it's not worth re-implementing in TE unless there's evidence that it matters for real models.
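A small illustration of that ordering via `torch.result_type` (my own sketch, not part of the PR):
```
import torch

u = torch.tensor([1], dtype=torch.uint8)

torch.result_type(u, -1)                  # torch.uint8: tensor int dtype beats a scalar int
torch.result_type(u, torch.tensor(-1))    # torch.uint8: a 0-dim tensor is treated like a scalar
torch.result_type(u, torch.tensor([-1]))  # torch.int64: a 1-dim int tensor participates fully
```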
ghstack-source-id: 118555597
Test Plan: `buck test //caffe2/test:jit`
Reviewed By: eellison
Differential Revision: D25489035
fbshipit-source-id: db3ab84286d472fd8a247aeb7b36c441293aad85
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48264
Preserves the strided representation of NNC Tensor outputs by transforming them into the right layout at the end of the kernel.
Fix for https://github.com/pytorch/pytorch/issues/45604
Test Plan: Imported from OSS
Reviewed By: nikithamalgifb
Differential Revision: D25286213
Pulled By: eellison
fbshipit-source-id: 64d94ac463741e2568a1c9d44174e15ea26e511f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48700
fmod and remainder on int tensors will raise ZeroDivisionError if their divisors are 0. I don't think we should try to generate code that raises exceptions. If at some point we really wanted to fuse these, I might lean towards calling a C++ helper function from the generated code.
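For reference, the eager-mode failure being avoided looks roughly like this (sketch; the exact exception type and message may vary by version):
```
import torch

a = torch.tensor([5, 7], dtype=torch.int64)
b = torch.tensor([2, 0], dtype=torch.int64)

try:
    torch.fmod(a, b)   # an integer divisor of 0 triggers the ZeroDivisionError path
except RuntimeError as e:
    print(e)           # torch surfaces this as a runtime error in eager mode
```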
ghstack-source-id: 117845642
Test Plan: `buck test //caffe2/test:jit -- test_binary_ops`
Reviewed By: eellison
Differential Revision: D25265792
fbshipit-source-id: 0be56ba3feafa1dbf3c37f6bb8c1550cb6891e6d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48326
The PR introduces a set of 'cuda-only' ops into the `isSupported` function.
This is done to disable `pow` lowering on CPU, where it's tricky to support
integer versions.
Test Plan: Imported from OSS
Reviewed By: eellison
Differential Revision: D25129211
Pulled By: ZolotukhinM
fbshipit-source-id: c62ae466e1d9ba9b3020519aadaa2a7fe7942d84
Summary:
NNC lowering of aten::pow assumes that the type of the exponent is either float or int cast to float, which doesn't work great with double (or half, for that matter).
Fixes https://github.com/pytorch/pytorch/issues/47304
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47795
Reviewed By: ZolotukhinM
Differential Revision: D24904201
Pulled By: nickgg
fbshipit-source-id: 43c3ea704399ebb36c33cd222db16c60e5b7ada5
Summary:
When printing doubles, we don't do anything to distinguish integral doubles (i.e., 1 or 2) from ints. Added decoration of these doubles with `.0` if they are integral (i.e., DoubleImm(1) will print as `1.0`).
This is an issue specifically on CUDA, where some intrinsics do not have type coercion. Added a test which covers this case (without the fix it tries to look up pow(double, int), which doesn't exist).
Fixes https://github.com/pytorch/pytorch/issues/47304
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47799
Reviewed By: ZolotukhinM
Differential Revision: D24904185
Pulled By: nickgg
fbshipit-source-id: baa38726966c94ee50473cc046b9ded5c4e748f7