Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58974
I don't know how we overlooked this for so long...
ghstack-source-id: 129932134
Test Plan:
Predictor test of model 184778294_0 using multiple request replay
threads. It's not clear to me why multithreading matters, except that perhaps
it makes it easier to get an unknown shape in the profile.
Reviewed By: navahgar
Differential Revision: D28702660
fbshipit-source-id: 565550b1d2e571d62d0c8b21150193f2a7ace334
Summary:
This gets rid of a lot of the try/else rigamarole.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58788
Reviewed By: ZolotukhinM
Differential Revision: D28621054
Pulled By: Chillee
fbshipit-source-id: d0d8a1b6466eb318d939a1ed172b78f492ee0d5b
Summary:
Finds a few bugs:
1. permute needs to wrap dimensions
2. slice needs to wrap dimensions
3. frac doesn't work correctly for negative values
4. permute has some other failures.
This PR also fixes bugs 1 and 2.
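As an eager-mode reference for the wrapping and sign behavior the fused code needs to match (shapes and values below are illustrative only):
```
import torch

x = torch.randn(2, 3, 4)
# Negative dims must wrap: permute(-1, 0, 1) is the same as permute(2, 0, 1).
assert torch.equal(x.permute(-1, 0, 1), x.permute(2, 0, 1))
# Slicing/narrowing along a negative dim must wrap the same way.
assert torch.equal(x.narrow(-1, 0, 2), x.narrow(2, 0, 2))
# frac keeps the sign of its input: frac(-1.5) is -0.5, not 0.5.
print(torch.frac(torch.tensor([-1.5, 1.5])))  # tensor([-0.5000,  0.5000])
```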
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58719
Reviewed By: SplitInfinity
Differential Revision: D28590457
Pulled By: Chillee
fbshipit-source-id: a67fce67799602f9396bfeef615e652364918fbd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58346
If `dim` is a variable, NNC doesn't know how to translate the result,
since the shape is unknown. This issue manifested as a `bad_variant_access`
when we try to pull an int constant out of that arg.
Note that, while the PE will pick up the resultant shape, it won't set guards accordingly.
ghstack-source-id: 129078971
Test Plan: new fuser test
Reviewed By: navahgar
Differential Revision: D28460956
fbshipit-source-id: 57ef918ef309ee57bfdf86717b910b6549750454
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58256
Size-1 dims mess up our output restriding logic, because they're
technically "dense" no matter what stride the dimension has. In this example a
size-1 dim has stride 1, which causes all the indices to be taken mod 1 (i.e.,
all indices become 0). We work around this peculiar case by skipping size-1 in
our layout logic, since it has no impact on the rest of the tensor's indexing.
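A small eager-mode illustration of why size-1 dims can be skipped (the `as_strided` views here are just for demonstration):
```
import torch

# A size-1 dim is "dense" no matter what stride it carries: both views below
# describe the same memory and both report as contiguous.
base = torch.randn(1, 4)
a = torch.as_strided(base, (1, 4), (4, 1))  # conventional stride in the size-1 dim
b = torch.as_strided(base, (1, 4), (1, 1))  # stride 1 in the size-1 dim
print(a.is_contiguous(), b.is_contiguous())  # True True
```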
ghstack-source-id: 128932739
Test Plan:
new unit test, plus
```
buck test mode/dev //langtech/mobile/audio_stream_processor:audio_stream_processor_test -- --exact 'langtech/mobile/audio_stream_processor:audio_stream_processor_test - AudioStreamProcessorTest.DemucsReadWriteFloat'
```
Reviewed By: eellison
Differential Revision: D28424388
fbshipit-source-id: e33e39eef2a5bf2797bee78a5987558308b6d110
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57749
Add to an FX test.
Test Plan: Imported from OSS
Reviewed By: huiguoo
Differential Revision: D28425974
fbshipit-source-id: 195c7a1944decb7a2a99c2831cab38485f32be17
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58207
We probably don't even know what these tests check and there are no
plans on re-enabling them - let's just nuke them to keep the code clean.
Test Plan: Imported from OSS
Reviewed By: bertmaher
Differential Revision: D28403251
Pulled By: ZolotukhinM
fbshipit-source-id: fe12e978636a74f309f57e3408ab78d459fe4d29
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58206
Tested on CUDA with and without `PYTORCH_TENSOREXPR_DONT_USE_LLVM=1`.
Closes #48053.
Test Plan: Imported from OSS
Reviewed By: bertmaher
Differential Revision: D28403250
Pulled By: ZolotukhinM
fbshipit-source-id: 1ae1cfed691e0077a37db646937e580fbd32b23f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58028
We were trying to translate the device argument and thus throwing an
"unsupported dtype" error.
ghstack-source-id: 128748658
Test Plan: predictor models
Reviewed By: navahgar
Differential Revision: D28347704
fbshipit-source-id: 331a5786339e01f9df1b1878970b0c5983a92980
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57798
Our instruction sequence was just plain wrong, instead of `fcmp une %x, +0.0`
(unordered equal 0.0) we were doing `fcmp uno`, which is just an unordered check
(i.e., is either side NaN).
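In eager-mode terms, the intended semantics are "not equal to zero" (with NaN counting as nonzero), not "is either operand NaN"; a quick sketch of the difference:
```
import torch

x = torch.tensor([0.0, 1.0, float("nan")])
# Intended behavior (fcmp une %x, +0.0): true wherever x != 0; NaN counts as true.
print(x.to(torch.bool))  # tensor([False,  True,  True])
# What fcmp uno computes instead: true only where an operand is NaN.
print(torch.isnan(x))    # tensor([False, False,  True])
```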
ghstack-source-id: 128586464
Test Plan: New unit test against the full cross-product of dtypes.
Reviewed By: navahgar
Differential Revision: D28276269
fbshipit-source-id: ba5e59778e07770fb78ef02309f10edde333a800
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57383
Notes: I picked up an activation from https://github.com/pytorch/pytorch/issues/56969. You can look at the [activations.cpp](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/cpu/Activation.cpp#L429) file which has both forward and backward kernel code to help you write the NNC lowering and the symbolic gradient.
I added a test in test_jit_fuser_te for the fusion, and I added an OpInfo and asserted that we expect to see autodiffable nodes to test the symbolic gradient.
Test Plan: Imported from OSS
Reviewed By: mrshenli
Differential Revision: D28197820
Pulled By: eellison
fbshipit-source-id: 05305d85c5bb0847c8f911b95ba47b137dca7e90
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56308
But only for float tensors. Even on CUDA, int tensors just have weird
behavior with pow, and I bet FP is so much more common that it's just not worth
trying to fuse ints here.
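One of the int-pow corner cases alluded to above, sketched in eager mode:
```
import torch

# Float pow is well behaved and worth fusing.
print(torch.tensor([2.0]).pow(-1))  # tensor([0.5000])
# Integer pow is not: e.g. negative exponents raise in eager mode on CPU.
try:
    torch.tensor([2]).pow(-1)
except RuntimeError as err:
    print(err)  # Integers to negative integer powers are not allowed
```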
ghstack-source-id: 126769637
Test Plan: `pytest test_jit_fuser_te.py -k test_binary_pow`
Reviewed By: navahgar
Differential Revision: D27834694
fbshipit-source-id: 7274d72cf02ab95d63574b6c17995b8f34560810
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54605
For small sizes we generate a naive 3-layer loop nest; for bigger sizes
we generate an external call.
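For reference, the small-size path amounts to a naive triple loop of this shape (a plain-Python sketch, not the generated NNC IR):
```
def naive_matmul(A, B):
    # Reference 3-layer loop nest: C[i][j] += A[i][k] * B[k][j].
    M, K, N = len(A), len(A[0]), len(B[0])
    C = [[0.0] * N for _ in range(M)]
    for i in range(M):
        for j in range(N):
            for k in range(K):
                C[i][j] += A[i][k] * B[k][j]
    return C
```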
Test Plan: Imported from OSS
Reviewed By: bertmaher
Differential Revision: D27298364
Pulled By: ZolotukhinM
fbshipit-source-id: 2ddf275ff68d6fca16a3befca5ce5c26aef462b5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56120
This reverts commit ad17fadbfc (D27786457).
The big annoyance here is that depending on the threading mode you may not be
able to toggle num_threads at will, so the fusion tests won't fail.
I hate this solution, but I'm adding a secondary override for the TE fuser.
Now you need to turn on fusion (`_jit_override_can_fuse_on_cpu`); you're then
OK if you're running with 1 thread, or you can add
`_jit_set_texpr_parallel_cpu_enabled` to enable it anyway.
This is (a) mainly for tests, since a real user probably won't fiddle aimlessly
with the thread count, and (b) will go away once NNC's threading support is
fully baked.
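A sketch of how the two toggles compose, assuming both are exposed under `torch._C` as the commit describes:
```
import torch

# Turn on CPU fusion for the TE fuser.
torch._C._jit_override_can_fuse_on_cpu(True)

# With one thread this is enough; with more threads, opt in explicitly as well.
if torch.get_num_threads() > 1:
    torch._C._jit_set_texpr_parallel_cpu_enabled(True)
```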
Test Plan: Imported from OSS
Reviewed By: Krovatkin
Differential Revision: D27788199
Pulled By: bertmaher
fbshipit-source-id: 070d04474f15e9689dbdf8cc1fde43050c6506b1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56119
There are apparently still more issues with fp16 on LLVM so let's just
nuke it from orbit while we develop a robust workaround.
ghstack-source-id: 126619411
Test Plan: compile
Reviewed By: ZolotukhinM
Differential Revision: D27787080
fbshipit-source-id: 9e771211fe48266f50fca1de8d40295922da5bca
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55970
LLVM's support for float16 is not great, and we were seeing assertion
failures trying to generate code for vectorized uses. I note that clang
doesn't even try to vectorize operations involving half:
https://gcc.godbolt.org/z/86MW4xr17, so that's a good sign we shouldn't either.
Fixes #55905
ghstack-source-id: 126511474
Test Plan: pytest test_jit_fuser_te.py -k test_isnan
Reviewed By: asuhan
Differential Revision: D27752279
Pulled By: bertmaher
fbshipit-source-id: ac115080bf2a4a73d52b396d64a5bce0cf13abfe
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55621
Fuser support for thread-level parallelism is a work in progress, so
only fuse when the program is running single-threaded.
ghstack-source-id: 126069259
Test Plan: observe the fusion groups formed when torch.get_num_threads() == 1 vs. when it is greater than 1
Reviewed By: ZolotukhinM
Differential Revision: D27652485
fbshipit-source-id: 182580cf758d99dd499cc4591eb9d080884aa7ef
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55213
Adds the integration of conv2d with the TE fuser. A few things of interest:
- I'm *super* selective of what convs get lowered. Only 3x3 depthwise, because
I've benchmarked those to death and I'm pretty sure it's a good change.
- I'm allowing single-node "fusion" groups for supported convs. (Maybe this is
a sign that conv2d codegen should go through a different path entirely, but
it seems to basically work).
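Concretely, "3x3 depthwise" here means `groups == in_channels == out_channels` with a 3x3 kernel; a small sketch with illustrative shapes:
```
import torch

# A 3x3 depthwise convolution: each input channel gets its own 3x3 filter.
dw = torch.nn.Conv2d(in_channels=32, out_channels=32, kernel_size=3,
                     padding=1, groups=32)
x = torch.randn(1, 32, 56, 56)
print(dw(x).shape)  # torch.Size([1, 32, 56, 56])
```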
I'll share full benchmark results once I clean them up a little. To
summarize, I tested the following torchvision models containing depthwise
convolutions. Results are single-core on a skylake-avx512:
mobilenet_v2: 8% improvement
mobilenet_v3: 9% improvement
mnasnet: 10% improvement
shufflenet: 18% improvement
Note these are comparing against a baseline with a fast-but-buggy grouped
convolution implementation in MKLDNN. So perf results will be better if
compared on master, but I'm going to assume the MKLDNN bug will be fixed and
re-enabled.
Perf results are more complicated when comparing to freezing plus conversion to
mkldnn layout; mobilenet v2/v3 are still faster, but mnasnet and shufflenet are
not. Landing this doesn't prevent MKLDNN freezing from kicking in though, so
there's no harm (although landing mkldnn freezing will regress mobilenet, but
c'est la vie).
ghstack-source-id: 126076112
Test Plan: New unit test, plus torchvision
Reviewed By: ZolotukhinM
Differential Revision: D27530272
fbshipit-source-id: 92153fad234bc9f1eaa4f7624c543168d1294a87
Summary:
This tests a simple failure mode for a TypeCheck when a shape changes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52933
Reviewed By: H-Huang
Differential Revision: D26727583
Pulled By: Krovatkin
fbshipit-source-id: b277218af9572cd6f89f2ece044f7d84d4c10283
Summary:
This is a second attempt to use the graph executor to run forward on a gradient. This allows a second chance to profile intermediate tensors introduced by autodiff.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52136
Reviewed By: pbelevich
Differential Revision: D26693978
Pulled By: Krovatkin
fbshipit-source-id: 91dde8009a210950af8e5173668ada241e16dd52
Summary:
Fixes https://github.com/pytorch/pytorch/issues/52264
When CPU fusion is enabled without LLVM support in PyTorch, it causes a huge slowdown (>50x). This PR makes the LLVM backend the default backend for TE. Now, an error will be reported if CPU fusion is enabled without LLVM support, to avoid this performance regression.
This PR also updates the tests to not use LLVM, so that the old flow is continued. This is necessary because tests run in CI do not have LLVM.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52314
Reviewed By: ejguan
Differential Revision: D26491294
Pulled By: navahgar
fbshipit-source-id: 74561db1207da805d6d28039450db046ba2988fb
Summary:
This adds guarding for DifferentiableGraph nodes in order to not depend on
Also bailing out on required gradients for the CUDA fuser.
Fixes https://github.com/pytorch/pytorch/issues/49299
I still need to look into a handful of failing tests, but maybe it can be a discussion basis.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49433
Reviewed By: ngimel
Differential Revision: D25681374
Pulled By: Krovatkin
fbshipit-source-id: 8e7be53a335c845560436c0cceeb5e154c9cf296
Summary:
There is an internal user who is experiencing a bug with masked_fill. While I am almost certain this corresponds to an old pytorch version with the bug, the model that is breaking is important and time-sensitive and we are covering all bases to try to get it to work again.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50147
Reviewed By: nhsoukai
Differential Revision: D25806541
Pulled By: eellison
fbshipit-source-id: 131bd71b5db9717a8a9cb97973d0b4f0e96455d6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49627
There was a bug in the test that was hidden by the `If eager mode doesn't support a dtype/op/device combo` try / catch, so cuda wasn't being tested. The fix is just to rename `aten::masked_fill` to `aten_masked_fill`.
Test Plan: Imported from OSS
Reviewed By: navahgar
Differential Revision: D25696409
Pulled By: eellison
fbshipit-source-id: 83de1f5a194df54fe317b0035d4a6c1aed1d19a0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49357
This is a follow-up fix for PR #48679, which added support for integer inputs
to aten::abs by promoting integers to float and then demoting the result back
to integers. This PR supports integer inputs to aten::abs more efficiently in
the SimpleIREvaluator by implementing integer inputs for kAbs (renamed from kFabs).
- Rename kFabs to kAbs
- Add support for integer inputs to kAbs in SimpleIREvaluator (note that
llvm_codegen and cuda_codegen already support integer inputs to kAbs)
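For context, a quick eager-mode check of the behavior the evaluator should reproduce without the float round trip (the large value shows where a float detour would lose precision):
```
import torch

x = torch.tensor([-3, 2, -9223372036854775807], dtype=torch.int64)
# abs on an integer tensor stays integral; promoting to float and back
# would not represent the largest int64 magnitudes exactly.
print(torch.abs(x))        # tensor([3, 2, 9223372036854775807])
print(torch.abs(x).dtype)  # torch.int64
```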
Test Plan:
- `PYTORCH_TENSOREXPR_DONT_USE_LLVM=1 python test/test_jit_fuser_te.py
TestTEFuser.test_unary_ops`
- `python test/test_jit_fuser_te.py TestTEFuser.test_unary_ops`
Imported from OSS
Reviewed By: eellison
Differential Revision: D25545791
fbshipit-source-id: e52f51a352d149f66ce8341fb3beb479be08a230
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49396
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49271
Two things:
1. These throw exceptions in their constructor, which causes a segfault (*), so
move the exceptions to ::make.
2. They technically support FP types but the rules are complicated so let's not
bother.
(*) The reason for the segfault: all Exprs including these inherit from
KernelScopedObject, whose constructor adds the object to a list for destruction
at the end of the containing KernelArena's lifetime. But if the derived-class
constructor throws, the object is deleted even though it's still in the
KernelArena's list. So when the KernelArena is itself deleted, it double-frees
the pointer and dies. I've also fixed And, Or, and Xor in this diff.
ghstack-source-id: 118594998
Test Plan: `buck test //caffe2/test:jit`
Reviewed By: bwasti
Differential Revision: D25512052
fbshipit-source-id: 42670b3be0cc1600dc5cda6811f7f270a2c88bba
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49247
uint8's expose all kind of corner cases in type promotion. As an example, consider:
```
>>> torch.tensor([1], dtype=torch.uint8).lt(-1)
tensor([True])
>>> torch.tensor([1], dtype=torch.uint8).lt(torch.tensor(-1))
tensor([True])
>>> torch.tensor([1], dtype=torch.uint8).lt(torch.tensor([-1]))
tensor([False])
```
the difference is how promotions involving scalars (or 0-dim tensors, which are treated like scalars) are prioritized compared to tensor dtypes.
Per eellison, the order is something like:
1. Tensor FP types
2. Scalar FP types
3. Tensor Int types
4. Scalar Int types
The logic for this is here: c73e97033a/aten/src/ATen/native/TypeProperties.cpp (L93)
AFAICT the effects are mainly visible for the unsigned byte type (the only unsigned type, besides bool) since the others degrade more or less gracefully.
It's hard to re-use this logic as is in TensorIterator/TypeProperties, and it's complicated enough that it's not worth re-implementing in TE unless there's evidence that it matters for real models.
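A few more eager-mode data points illustrating that priority order (assuming the default float dtype is float32):
```
import torch

t = torch.tensor([1], dtype=torch.uint8)
# Scalar FP outranks tensor int: the result promotes to the default float dtype.
print((t * 2.5).dtype)  # torch.float32
# Scalar int ranks below tensor int: the uint8 tensor dtype wins.
print((t * 2).dtype)    # torch.uint8
# A 0-dim tensor is treated like a scalar, so the tensor dtype still wins here,
# which is why .lt(torch.tensor(-1)) differs from .lt(torch.tensor([-1])) above.
print((t * torch.tensor(2, dtype=torch.int64)).dtype)  # torch.uint8
```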
ghstack-source-id: 118555597
Test Plan: `buck test //caffe2/test:jit`
Reviewed By: eellison
Differential Revision: D25489035
fbshipit-source-id: db3ab84286d472fd8a247aeb7b36c441293aad85
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49143
Riddle me this, batman: how could `torch.clamp(torch.tensor([0], dtype=torch.uint8), -10, 10)` equal `10`? The answer: the min/max args are first cast to the dtype of the input, giving min=246 and max 10. Then you have to apply Min and Max in the right order: `Min(Max(in, min), max)`. Differ in any way and you're doomed. Hooray.
This PR makes TE match eager mode for this operator, plus fixes a major facepalm in the llvm min/max codegen where we were always generating signed comparisons.
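A minimal eager-mode reproduction of the behavior described above (relying on the usual uint8 wraparound of -10 to 246):
```
import torch

x = torch.tensor([0], dtype=torch.uint8)
# Eager casts the min/max args to the input dtype first: -10 -> 246, 10 -> 10.
print(torch.clamp(x, -10, 10))  # tensor([10], dtype=torch.uint8)

# The same result with the casts and the Min(Max(in, min), max) order spelled out.
lo = torch.tensor(-10).to(torch.uint8)  # 246
hi = torch.tensor(10).to(torch.uint8)   # 10
print(torch.min(torch.max(x, lo), hi))  # tensor([10], dtype=torch.uint8)
```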
ghstack-source-id: 118415318
Test Plan: `buck test //caffe2/test:{jit,tensorexpr}`
Reviewed By: robieta
Differential Revision: D25456366
fbshipit-source-id: dde3c26c2134bdbe803227601fa3d23eaac750fb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48679
This addresses the remaining problem reported in issue #48053
Data type support for aten kernels in the SimpleIREvaluator is not
consistent with the aten::native library implementation. In SimpleIREvaluator,
- only float/double are supported for aten::abs (integral types and half
are missing)
- only float/double are supported for aten::frac (half is missing)
It is also not clear from the kernel.cpp source code what the expected
input data types for an aten kernel are, leading to potential missing data
type issues down the road.
This commit addresses both issues in a limited way by:
- adding type promotion ops from half/integral input types to float
- adding skeleton support for type checking for aten kernels; currently it
only checks for valid data types for frac and abs to limit the scope of the
change, but the utility function can be used to consistently add type
checking for all aten functions
Known limitations:
- abs support for integral types can be made more effective by invoking
std::abs for integral tensors (currently kFabs maps to std::fabs).
Since that change is a bit more involved (e.g., changing IntrinsicsOp
kFabs to kAbs and other code generators accordingly), will leave it to
another issue
- other aten kernels may need similar type checking and some scrutiny
on the use of promoteToFloat to detect invalid data types early on.
That is also left for another issue
Test Plan:
test_jit_fuser_te.test_unary_ops
Imported from OSS
Reviewed By: asuhan
Differential Revision: D25344839
fbshipit-source-id: 95aca04c99b947dc20f11e4b3bae002f0ae37044
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48264
Preserves the strided representation of NNC Tensor outputs by transforming them into the right layout at the end of the kernel.
Fix for https://github.com/pytorch/pytorch/issues/45604
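A quick eager-mode example of the kind of strided layout being preserved (channels-last is used purely as an illustration):
```
import torch

x = torch.randn(2, 3, 4, 5).to(memory_format=torch.channels_last)
# Eager elementwise ops keep the channels-last strides of the input; the fused
# kernel's output should match that layout rather than being densified.
y = (x + 1).relu()
print(y.is_contiguous(memory_format=torch.channels_last))  # True
```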
Test Plan: Imported from OSS
Reviewed By: nikithamalgifb
Differential Revision: D25286213
Pulled By: eellison
fbshipit-source-id: 64d94ac463741e2568a1c9d44174e15ea26e511f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48700
fmod and remainder on int tensors will raise ZeroDivisionError if their divisors are 0. I don't think we should try to generate code that raises exceptions. If at some point we really wanted to fuse these, I might lean towards calling a C++ helper function from the generated code.
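The eager behavior being sidestepped, sketched below (CPU semantics assumed):
```
import torch

a = torch.tensor([4, 7], dtype=torch.int64)
b = torch.tensor([2, 0], dtype=torch.int64)
try:
    torch.fmod(a, b)  # an integer divisor of 0 raises in eager mode
except RuntimeError as err:
    print(err)  # ZeroDivisionError
```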
ghstack-source-id: 117845642
Test Plan: `buck test //caffe2/test:jit -- test_binary_ops`
Reviewed By: eellison
Differential Revision: D25265792
fbshipit-source-id: 0be56ba3feafa1dbf3c37f6bb8c1550cb6891e6d
Summary:
Add missing types for bitwise_ops in `SimpleIREvaluator`.
This is the first part of the fixes for issue https://github.com/pytorch/pytorch/issues/48053.
- The original implementation of bitwise_ops supported only int operands; the
fix adds support for all integral types supported by the IR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48179
Test Plan: `python test/test_jit_fuser_te.py TestTEFuser.test_bitwise_ops`
Reviewed By: ZolotukhinM
Differential Revision: D25126944
Pulled By: penguinwu
fbshipit-source-id: 04dc7fc00c93b2bf1bd9f9cd09f7252357840b85
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48326
The PR introduces a set of 'cuda-only' ops into the `isSupported` function.
This is done to disable `pow` lowering on CPU, where it's tricky to support
integer versions.
Test Plan: Imported from OSS
Reviewed By: eellison
Differential Revision: D25129211
Pulled By: ZolotukhinM
fbshipit-source-id: c62ae466e1d9ba9b3020519aadaa2a7fe7942d84
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48213
It was completely broken unless the RHS was a constant.
Test Plan: new unit test in test_jit_fuser_te.py
Reviewed By: eellison
Differential Revision: D25071639
fbshipit-source-id: ef1010a9fd551db646b83adfaa961648a5c388ae
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48085
We were treating it as a binary operator, which implies shape
broadcasting, even though the second arg is thrown away aside from the type.
Treating it as a unary is the proper approach.
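The op isn't named in this summary, but `type_as` is a familiar example of the same pattern, used here purely for illustration: the second argument contributes only its dtype, so no broadcasting should be implied.
```
import torch

x = torch.randn(4, 5)
ref = torch.zeros(7, dtype=torch.float64)  # shape deliberately non-broadcastable
y = x.type_as(ref)
print(y.shape, y.dtype)  # torch.Size([4, 5]) torch.float64
```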
ghstack-source-id: 116873680
Test Plan: new unit test
Reviewed By: ZolotukhinM
Differential Revision: D25017585
fbshipit-source-id: 0cfa89683c9bfd4fbb132617c74b47b268d7f368
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48084
as title
ghstack-source-id: 116870328
Test Plan: new unit test
Reviewed By: Krovatkin
Differential Revision: D25017489
fbshipit-source-id: 0d1998fccad6f509db04b6c67a4e4e4093d96751
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47884
We need to know output types of everything in a fusion group to ensure
that we generate correctly-typed tensors. We were incorrectly starting a
fusion group with an unknown-typed output.
Test Plan:
New unit tests:
```
buck test //caffe2/test:jit //caffe2/test/cpp/tensorexpr:tensorexpr
```
Reviewed By: eellison
Differential Revision: D24932786
fbshipit-source-id: 83978a951f32c1207bbc3555a7d3bd94fe4e70fb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47374
A few small fixes needed to enable unary op cpu testing. If reviewers would prefer I split them up let me know.
Test Plan: Imported from OSS
Reviewed By: ansley
Differential Revision: D24805248
Pulled By: eellison
fbshipit-source-id: c2cfe2e3319a633e64da3366e68f5bf21d390cb7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46951
If e.g. we're casting from torch.int -> torch.bool, previously we would just truncate from int32 -> i8. Since torch.bool has 8 bits but only uses one of them, we need to make sure that one bit is set.
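A small eager-mode sketch of why truncation is wrong here:
```
import torch

x = torch.tensor([256, 1, 0], dtype=torch.int32)
# Correct semantics: int -> bool means "value != 0".
print(x.to(torch.bool))  # tensor([ True,  True, False])
# Truncating to the low 8 bits would turn 256 into 0 and flip the first entry.
print((x & 0xFF) != 0)   # tensor([False,  True, False])
```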
Test Plan: Imported from OSS
Reviewed By: ansley
Differential Revision: D24805253
Pulled By: eellison
fbshipit-source-id: af3aa323f10820d189827eb51037adfa7d80fed9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46950
Make sure that we're actually fusing in the fusion tests, and refactor to a more concise API for checking whether fusions have happened.
Test Plan: Imported from OSS
Reviewed By: ansley
Differential Revision: D24805250
Pulled By: eellison
fbshipit-source-id: f898008a64b74e761bb5fe85f91b3cdf2dbdf878
Summary:
References https://github.com/pytorch/pytorch/issues/42515
> Enable integer -> float unary type promotion for ops like sin
Will follow up for other such ops once this PR is merged.
cc: mruberry
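The eager behavior being matched, in brief (assuming the default float dtype is float32):
```
import torch

x = torch.tensor([0, 1, 2], dtype=torch.int32)
# Unary ops like sin promote integer inputs to the default float dtype.
print(torch.sin(x).dtype)  # torch.float32
```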
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45733
Reviewed By: zou3519
Differential Revision: D24431194
Pulled By: mruberry
fbshipit-source-id: db600bc5de0e535b538d2aa301c3526b7c75ed17
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45788
We were only running the traced graph once, which would not yet have been fused at that point. We should run for num_profiled_runs + 1, and also assert that all nodes in the graph were fused.
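A sketch of the testing pattern described above; `num_profiled_runs = 2` is an illustrative value, not necessarily the project's default:
```
import torch

@torch.jit.script
def f(x):
    return (x + 1).relu()

x = torch.randn(8)
num_profiled_runs = 2  # illustrative value
# Run num_profiled_runs + 1 times so the profiling executor has specialized
# and the optimized (fused) graph is the one that actually executed.
for _ in range(num_profiled_runs + 1):
    f(x)
print(torch.jit.last_executed_optimized_graph())
```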
Test Plan: Imported from OSS
Reviewed By: bertmaher
Differential Revision: D24169537
Pulled By: eellison
fbshipit-source-id: 8499bb1a5bd9d2221b1f1c54d6352558cf07ba9a
Summary:
The Cuda HalfChecker casts up all loads and stores of Half to Float, so we do math in Float on the device. It didn't cast up HalfImmediate (i.e. constants), so they could insert mixed-size ops. The fix is to do that.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45213
Reviewed By: ezyang
Differential Revision: D23885287
Pulled By: nickgg
fbshipit-source-id: 912991d85cc06ebb282625cfa5080d7525c8eba9
Summary:
For integral types, isnan is meaningless. Provide specializations for
maximum and minimum which don't call it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44984
Test Plan: python test/test_jit_fuser_te.py -k TestTEFuser.test_minmax_int_ops
Reviewed By: ezyang
Differential Revision: D23885259
Pulled By: asuhan
fbshipit-source-id: 2e6da2c43c0ed18f0b648a2383d510894c574437
Summary:
Arithmetic operations on Bool aren't fully supported in the evaluator. Moreover,
such semantics can be implemented by the client code through insertion of
explicit casts to widen and narrow to the desired types.
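What "explicit casts to widen and narrow" means in practice, roughly (eager-mode sketch):
```
import torch

a = torch.tensor([True, False, True])
b = torch.tensor([True, True, False])
# Instead of doing arithmetic directly on bool, widen to int, compute,
# then narrow back to the desired type.
summed = a.to(torch.int32) + b.to(torch.int32)
print(summed)                 # tensor([2, 1, 1], dtype=torch.int32)
print(summed.to(torch.bool))  # tensor([True, True, True])
```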
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44677
Test Plan:
test_tensorexpr --gtest_filter=TensorExprTest.ExprDisallowBoolArithmetic
python test/test_jit_fuser_te.py
Reviewed By: agolynski
Differential Revision: D23801412
Pulled By: asuhan
fbshipit-source-id: fff5284e3a216655dbf5a9a64d1cb1efda271a36
Summary:
Fixes a bug where FP16 values could be incorrectly cast to a half type that doesn't have a cast operator, by inserting the CUDA-specific cast to float during handling of the Cast node rather than as a wrapper around printing Loads and Stores. Two main changes: the HalfChecker now inserts the casts to float explicitly in the IR, and the PrioritizeLoad mutator now consumes both Loads and a Cast which immediately precedes a Load.
Tested with test_jit_fuser_te.py and test_tensorexpr.py, plus C++ tests obv.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44209
Reviewed By: izdeby
Differential Revision: D23575577
Pulled By: nickgg
fbshipit-source-id: 808605aeb2af812758f96f9fdc11b07e08053b46
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44073
We don't have proper support for it yet on the NNC side or in the JIT IR->NNC lowering.
Test Plan: Imported from OSS
Reviewed By: SplitInfinity
Differential Revision: D23487905
Pulled By: ZolotukhinM
fbshipit-source-id: da0da7478fc8ce7b455176c95d8fd610c94352c1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43635
Intern the symbol; no functional changes. Aliasing needs to be looked at, but that should be done in a separate PR; this PR is just changing the symbol.
Test Plan: Imported from OSS
Reviewed By: bertmaher
Differential Revision: D23358806
Pulled By: eellison
fbshipit-source-id: f18bcd142a0daf514136f019ae607e4c3f45d9f8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43631
I added a new test for just profiler stuff - I don't think the test should go in test_jit.py. Maybe this should just go in test_tensorexpr_fuser, but I'm not really testing tensorexpr stuff either... LMK
Test Plan: Imported from OSS
Reviewed By: bertmaher
Differential Revision: D23358810
Pulled By: eellison
fbshipit-source-id: 074238e1b60e4c4a919a052b7a5312b790ad5d82
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43173
With this change the fuser starts to generate typechecks for the inputs of a
fusion group. For each fusion group we generate a typecheck and an if
node: the true block contains the fused subgraph, the false block
contains the unoptimized original subgraph.
Differential Revision: D23178230
Test Plan: Imported from OSS
Reviewed By: eellison
Pulled By: ZolotukhinM
fbshipit-source-id: f56e9529613263fb3e6575869fdb49973c7a520b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42766
**Summary**
Some python tests are missing in `caffe2/test/TARGETS`; add them to make the coverage more comprehensive.
According to [run_test.py](https://github.com/pytorch/pytorch/blob/master/test/run_test.py#L125), some tests are slower. Slow tests are added as independent targets and the others are put together into one `others` target. The reason is that we want to reduce overhead, especially for code coverage collection. Tests in one target can be run as a bundle, and then coverage can be collected together. Typically the coverage collection procedure is time-expensive, so this helps us save time.
Test Plan:
Run all the new test targets locally in dev server and record the time they cost.
**Statistics**
```
# jit target
real 33m7.694s
user 653m1.181s
sys 58m14.160s
--------- Compare to Initial Jit Target runtime: ----------------
real 32m13.057s
user 613m52.843s
sys 54m58.678s
```
```
# others target
real 9m2.920s
user 164m21.927s
sys 12m54.840s
```
```
# serialization target
real 4m21.090s
user 23m33.501s
sys 1m53.308s
```
```
# tensorexpr
real 11m28.187s
user 33m36.420s
sys 1m15.925s
```
```
# type target
real 3m36.197s
user 51m47.912s
sys 4m14.149s
```
Reviewed By: malfet
Differential Revision: D22979219
fbshipit-source-id: 12a30839bb76a64871359bc024e4bff670c5ca8b
Summary:
Remove `skipIfRocm` from most jit tests and enable `RUN_CUDA_HALF` tests for ROCm.
These changes passed more than three rounds of CI testing against the ROCm CI.
CC ezyang xw285cornell sunway513
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40447
Differential Revision: D22190711
Pulled By: xw285cornell
fbshipit-source-id: bac44825a2675d247b3abe2ec2f80420a95348a3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40142
test_jit is becoming huge again, which makes it hard for editors to load and
for us to write new tests; this splits out the tracer-related tests.
Test Plan: Imported from OSS
Reviewed By: ailzhang
Differential Revision: D22085035
Pulled By: wanchaol
fbshipit-source-id: 696bee84985ecfbfeac8e2ee5c27f1bdda8de394
Summary:
After an early return, we conditionalize all further execution. This means that currently the pattern of
`if return elif return elif return` generates better code than `if return if return if return`. It's obviously not good to have semantically equivalent code generate worse IR, so we should rewrite the graph to handle this case. This came up in https://github.com/pytorch/pytorch/pull/37171
```
@torch.jit.script
def test_foo(x: bool, y: bool):
    if x:
        return 1
    return 2

print(test_foo.code)
```
generates:
```
def test_foo(x: bool,
    y: bool) -> int:
  _0 = uninitialized(int)
  if x:
    _1, _2 = True, 1
  else:
    _1, _2 = False, _0
  if _1:
    _3 = _2
  else:
    _3 = 2
  return _3
```
while
```
@torch.jit.script
def test_foo(x: bool, y: bool):
    if x:
        return 1
    else:
        return 2

print(test_foo.code)
```
generates:
```
def test_foo(x: bool,
    y: bool) -> int:
  if x:
    _0 = 1
  else:
    _0 = 2
  return _0
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38282
Differential Revision: D21576733
Pulled By: eellison
fbshipit-source-id: 80cf1ad7fbda6d8d58557abbfb21c90eafae7488
Summary:
The existing context manager only conditionally enabled profiling mode, which was counterintuitive. When we changed the default executor, it broke internal benchmarking as a result.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37825
Differential Revision: D21404611
Pulled By: eellison
fbshipit-source-id: 306b3c333ef4eb44ab6a6e5ab4e0682e5ce312ce
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35913
The pass itself is still disabled by default, but with this change we
don't need to register it as a custom pass anymore. It allows us to
control its behavior with env variables more easily.
Test Plan: Imported from OSS
Reviewed By: suo
Differential Revision: D20827189
Pulled By: ZolotukhinM
fbshipit-source-id: e74d90b5e46422e7ab7bc40974a805220da50fbc
Summary:
**Summary:** This PR contains the infrastructure of a new CUDA fuser. This CUDA fuser is based on many of the same principles as TensorExpressions and Halide, however the implementation is written from the ground up. The fusion pass itself is similar to the default CUDA fuser; however, it has undergone some refactoring and is using the new code generation infrastructure. For those who are interested in how the code generation in this PR works, I would recommend reviewing _test/cpp/jit/test_gpu_fusion.cpp_ as well as the long comment section at the beginning of _torch/csrc/jit/codegen/cuda/transform_replay.h_.

One of the largest differences between our approach and that of TVM/Halide is the concept of "TensorView". TensorView, from a high level, should be thought of similarly to how we think of working with Tensors in PyTorch. It's an N-D object which can undergo transformations that change its dimensionality. Dimensionality changes are done through the operations split/merge/reorder/computeAt. These transformations are similar to split/fuse/reorder/compute_at of TVM; they modify how a tensor is iterated over to generate GPU code. Interestingly, in our scheme these transformations are applied to tensors and only impact how that tensor is generated.
**Warning:** This PR is purposefully not feature complete with the current fuser. We wanted to separate out the infrastructure from the fusion capabilities. Once in, smaller incremental PRs will be submitted to expand capabilities of the fuser.
**Short term goals:**
Parity with current CUDA fuser (including performance):
- Dynamic shapes (no recompilation)
- Implicit handling of broadcast (broadcasted tensors are treated as tensors of the broadcasted size in the generated code)
- Dropout
**Mid-term goals:**
- Transposes fused with pointwise operations where transpose involves only 2 axes (across the fused operation).
- 1-D reductions fused with pointwise operations
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34785
Reviewed By: ZolotukhinM
Differential Revision: D20650977
Pulled By: soumith
fbshipit-source-id: ee39c95a880e1b9822e874ed4cc180971572bf63
Summary:
This commit allows one to use an environment variable to enable the fuser in torch/csrc/jit/tensorexpr/
```
PYTORCH_TENSOREXPR=1 python benchmark.py
```
This commit also changes the registration to happen by default, removing the requirement for the Python-exposed `_jit_register_tensorexpr_fuser`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35341
Reviewed By: ZolotukhinM
Differential Revision: D20676348
Pulled By: bwasti
fbshipit-source-id: 4c997cdc310e7567c03905ebff72b3e8a4c2f464