Summary:
With the profiling executor enabled, the fuser isn't invoked until the second pass over a script function, so some of these tests weren't correctly comparing the fused output with the interpreter output. I've used the `checkScript` method where applicable, which seems to do the right thing.
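For illustration, a minimal sketch of the warm-up requirement (not one of the actual tests; `checkScript` handles this internally):

```python
import torch

# With the profiling executor, fusion only kicks in on the second invocation
# of a script function, so a correct check warms the function up before
# comparing against eager execution.
def fn(x, y):
    return (x + y) * y

scripted = torch.jit.script(fn)
x, y = torch.randn(4, 4), torch.randn(4, 4)

scripted(x, y)            # first pass: profiling run, no fusion yet
fused = scripted(x, y)    # second pass: the optimized (possibly fused) graph runs
assert torch.allclose(fused, fn(x, y))
```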
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33944
Test Plan: Locally inject obvious errors into the fuser and verify that the updated tests fail when they're supposed to.
Differential Revision: D20162320
Pulled By: bertmaher
fbshipit-source-id: 4a2f3f2d2ff1d81f23db504dc8cd0d5417bdcc50
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30445
Create distributed and rpc directories under caffe/test for better management
of unit tests.
Differential Revision: D18702786
fbshipit-source-id: e9daeed0cfb846ef68806f6decfcb57c0e0e3606
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31071
Previously the profiler would assume Tensors required grad even when the no_grad flag was enabled during execution. This change makes the profiling and guards respect the no_grad flag, which eliminates extra differentiable graphs appearing in the backward graph (where no_grad is typically enabled).
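A small illustration of the behavior being fixed (not the regression test itself):

```python
import torch

# Executions profiled under no_grad should not be recorded as requiring grad.
@torch.jit.script
def f(x):
    return x * 2

x = torch.randn(3, requires_grad=True)
with torch.no_grad():
    out = f(x)                 # profiling runs happen with grad disabled
assert not out.requires_grad   # no differentiable graph is needed here
```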
Test Plan: Imported from OSS
Differential Revision: D18915468
Pulled By: zdevito
fbshipit-source-id: 1ae816a16ab78ae5352825cc6b4a68ed7681a089
Summary:
These unit tests pass after landing all the warp size awareness patches.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25963
Differential Revision: D17319124
Pulled By: bddppq
fbshipit-source-id: 22f5d5f1ca9c67e66a7ccf983b2d2f889a74e729
Summary:
As of ROCm 2.6, we support hiprtc, the HIP runtime compilation API. Enable the JIT fusion feature depending on the existence of this API. This entails:
* new hipification rules for API_RTC
* adding the hiprtc APIs to the shim loader
* updating the cmake infrastructure to find the hiprtc library (it is part of the HIP package)
* enabling unit tests in the jit_fuser test set
* special-casing in the resource strings for HIP - the typedefs CUDA requires would be redundant
* disabling, for now, the occupancy calculation we do not yet support, and hard-coding a value instead
Thanks to t-vi for working with me on getting this integration done!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22872
Differential Revision: D17207425
Pulled By: bddppq
fbshipit-source-id: 93409f3051ad0ea06afacc2239fd6c402152debe
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23799
Before, we inlined as part of the initial IR generation process, which has a few disadvantages:
1. It loses information about which nodes came from which function/method calls. Other parties who want to implement transformations at the function/module level don't have a reliable way of doing so.
2. It duplicates a ton of code if we inline the same function/method many times.
After this PR, inlining is deferred to the optimization stage, so optimizations that rely on inlining still work, but things get serialized with the function/method calls left in place.
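A small sketch of what this looks like from the Python side (function names are illustrative):

```python
import torch

def helper(x):
    return x + 1

@torch.jit.script
def f(x):
    return helper(x) * 2

g = f.graph
print(g)                        # contains a prim::CallFunction for `helper`

torch._C._jit_pass_inline(g)    # inlining is now a later, explicit pass
print(g)                        # the body of `helper` is inlined here
```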
Differential Revision: D16652819
Test Plan: Imported from OSS
Reviewed By: jamesr66a
Pulled By: suo
fbshipit-source-id: a11af82aec796487586f81f5a9102fefb6c246db
Summary:
This PR:
- Moves clamp from autodiff cpp to symbolic script
- Adds an additional tuple lowering pass to the graph executor
- Updates clamp backwards to be maximally gradient preserving
Moving clamp to symbolic script presented two challenges:
- When the backward graph is defined, the branch taken in the conditional is known, but communicating this information to the JIT is a little tricky. It turns out the JIT has a quirk where variables that can be None at the time of graph instantiation are treated as constants, so testing min and max against None lets the JIT instantiate only one branch (a small sketch below illustrates this). It might be more natural to select different backward functions for these cases, but that is not yet supported.
- Moving clamp to symbolic script introduced an extra tuple construction and immediate unpacking that prevented fusion. This was dealt with by adding an additional tuple-removal pass. The issue can appear whenever a symbolic script's return value is defined in an if statement, which makes the JIT see the unpacked tuple as being constructed from an if, not a TupleConstruct. The graph is optimized later, but tuple lowering was not performed again after these optimizations.
Moving clamp to symbolic script also adds some explicit conversions to float in the graphs in which it appears, but these seem harmless.
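To illustrate the None-as-constant quirk, here is a minimal TorchScript sketch (not the actual symbolic script; names are made up). When an Optional argument is known to be None at specialization time, only the branch that is actually taken needs to be instantiated:

```python
import torch
from typing import Optional

@torch.jit.script
def clamp_like(x, lo: Optional[float], hi: Optional[float]):
    # When lo/hi are specialized to None (as in the backward-graph case above),
    # the `is not None` tests collapse and a single branch survives.
    if lo is not None and hi is not None:
        return torch.clamp(x, lo, hi)
    elif lo is not None:
        return torch.clamp_min(x, lo)
    elif hi is not None:
        return torch.clamp_max(x, hi)
    else:
        return x
```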
If clamp were simply moved to symbolic script then its backward graphs would look like this:
```
graph(%0 : Float(*, *),
%1 : AutogradZeroTensor,
%2 : Float(*, *),
%3 : int[]?,
%4 : Scalar?,
%5 : int):
%6 : None = prim::Constant() # <string>:5:31
%7 : float = aten::Float(%5) # <string>:12:37
%8 : Float(*, *) = prim::FusionGroup_0(%0, %2, %7)
%9 : (Float(*, *), None, None) = prim::TupleConstruct(%8, %6, %6)
%10 : Float(*, *), %11 : None, %12 : None = prim::TupleUnpack(%9)
return (%10)
with prim::FusionGroup_0 = graph(%0 : Float(*, *),
%1 : Float(*, *),
%2 : float):
%3 : Bool(*, *) = aten::le(%1, %2) # <string>:12:29
%mask.5 : Float(*, *) = aten::type_as(%3, %1) # <string>:12:29
%5 : Float(*, *) = aten::mul(%0, %mask.5) # <string>:13:28
return (%5)
```
Adding the additional pass to remove tuples eliminates the prim::TupleConstruct and prim::TupleUnpack. Previously, keeping them would cause test_fuser_iou to fail because multiple fusion groups would be created. That test has been disabled since https://github.com/pytorch/pytorch/issues/23372, however. When enabled, the relevant portion of its graph is now:
```
%59 : float = aten::Float(%26) # <string>:314:38
%60 : float = aten::Float(%27) # <string>:314:61
%61 : int[] = aten::size(%14) # <string>:41:99
%62 : int[] = aten::size(%11) # <string>:42:100
%63 : int[] = aten::size(%15) # <string>:41:99
%64 : int[] = aten::size(%12) # <string>:42:100
%65 : Tensor, %66 : Tensor, %67 : Tensor, %68 : Tensor, %69 : Tensor, %70 : Tensor, %71 : Tensor, %72 : Tensor, %73 : Double(*, *) = prim::FusionGroup_0(%w.1, %13, %16, %23, %h.1, %54, %inter.1, %0, %12, %15, %18, %17, %29, %11, %14, %60, %59)
%74 : Tensor = aten::_grad_sum_to_size(%73, %53)
%75 : Tensor = aten::_grad_sum_to_size(%73, %52)
%grad_self.10 : Tensor = aten::_grad_sum_to_size(%65, %61) # <string>:41:30
%grad_other.10 : Tensor = aten::_grad_sum_to_size(%66, %62) # <string>:42:31
%78 : Tensor = prim::FusionGroup_1(%grad_self.10, %74, %36)
%79 : Tensor = prim::FusionGroup_2(%grad_other.10, %75, %44)
%grad_self.14 : Tensor = aten::_grad_sum_to_size(%67, %21) # <string>:33:30
%grad_other.14 : Tensor = aten::_grad_sum_to_size(%68, %22) # <string>:34:31
%grad_self.12 : Tensor = aten::_grad_sum_to_size(%69, %63) # <string>:41:30
%grad_other.12 : Tensor = aten::_grad_sum_to_size(%70, %64) # <string>:42:31
%grad_self.16 : Tensor = aten::_grad_sum_to_size(%71, %19) # <string>:33:30
%grad_other.16 : Tensor = aten::_grad_sum_to_size(%72, %20) # <string>:34:31
%86 : Tensor, %87 : Tensor = prim::FusionGroup_3(%grad_self.12, %grad_self.16, %74, %39)
%88 : Tensor, %89 : Tensor = prim::FusionGroup_4(%grad_other.12, %grad_other.16, %75, %47)
return (%79, %88, %89, %78, %86, %87, %grad_self.14, %grad_other.14)
```
I think this is expected/desired.
Finally, this implementation of clamp backwards is "maximally gradient preserving," which simply means that elements on the boundary now receive gradients. For example, if an element of a tensor is 5 and the clamp is to [2, 5], then that element will now receive a gradient. The prior implementation would zero these gradients. See https://github.com/pytorch/pytorch/issues/7002 for a discussion on preserving gradients.
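As a quick illustration of the boundary behavior (a small check, not the PR's test):

```python
import torch

# Elements exactly equal to the clamp bounds now receive a gradient instead of zero.
x = torch.tensor([1.0, 2.0, 5.0, 7.0], requires_grad=True)
x.clamp(min=2, max=5).sum().backward()
print(x.grad)   # expected: tensor([0., 1., 1., 0.]) -- 2 and 5 sit on the boundary
```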
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23927
Test Plan: Existing tests provided sufficient coverage.
Differential Revision: D16739740
Pulled By: mruberry
fbshipit-source-id: c94291d20e1f3f25197afc7b74dc61aeb204b074
Summary:
Fixes: https://github.com/pytorch/pytorch/issues/22833
grad_sum_to_size does not commute with AutogradAdd after all because it turns the broadcasting AutogradAdd into a broadcasting add.
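A toy illustration (not the actual autodiff code) of the non-commutation: moving the sum-to-size across an add whose other operand still has the broadcast shape changes the result:

```python
import torch

g = torch.ones(3, 4)       # gradient that was broadcast in the forward
other = torch.ones(3, 4)   # gradient arriving through AutogradAdd
size = (4,)                # the original, pre-broadcast size

a = g.sum_to_size(size) + other          # one order: broadcasts back to (3, 4)
b = (g + other).sum_to_size(size)        # the other order: stays (4,)
print(a.shape, b.shape)                  # the two orders disagree
```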
Chillee did most of the work tracking this down to the fusion of grad_sum_to_size, and pinged me when he had found the cause. Thank you!
About the choice of removing the fusion completely instead of being more precise:
- We do have grad_sum_to_size elimination, which works for cases where broadcasting does not actually happen in the forward, so the set of cases where fusing grad_sum_to_size is actually beneficial is much smaller than when it was initially proposed.
- There will be less fusion; in terms of the tests, IOU stops being fully fused. I vaguely think it is a case we could handle with more refined logic.
- Keeping it would add complexity in checking when to merge fusion groups, on top of the complexities this PR removes.
- The future of fusion probably lies in more complete solutions that include reductions (TVM, KeOps, our own, ...).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23372
Differential Revision: D16489930
Pulled By: soumith
fbshipit-source-id: bc0431b0d3eda264c401b634675872c4ce46f0f4
Summary:
This pull request adds the necessary Windows DLL code to be able to support JIT fusion for CUDA. CPU JIT Fusion isn't supported. This also adds all the non-CPU JIT tests back in on Windows.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21861
Differential Revision: D15940939
Pulled By: soumith
fbshipit-source-id: e11f6af1ac258fcfd3a077e6e2f2e6fa38be4ef1
Summary:
When kwargs are specified in a test defined via common_method_invocations, the test doesn't work unless there is also a positional argument: `{'foo':'foo'}` without a positional arg generates a Python call like `self.method(, foo=foo)`, which errors on the leading `,`. I wanted to test something in a different PR and noticed I couldn't.
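For illustration, a minimal sketch (the helper name `format_call` is hypothetical) of building the call string so a kwargs-only invocation doesn't produce the leading comma:

```python
def format_call(method, args, kwargs):
    # Join positional and keyword arguments into one comma-separated list.
    parts = [repr(a) for a in args] + ['{}={!r}'.format(k, v) for k, v in kwargs.items()]
    return 'self.{}({})'.format(method, ', '.join(parts))

print(format_call('clamp', [], {'min': 0}))    # self.clamp(min=0)
print(format_call('clamp', [1], {'max': 2}))   # self.clamp(1, max=2)
```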
Also fixed some flake8 warnings I was seeing locally.
I replaced `lambda x: x` with `ident` since it seems a bit cleaner to me, but I'm happy to revert that if others disagree.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21499
Differential Revision: D15826974
Pulled By: nairbv
fbshipit-source-id: a3f37c80ba2303c7d9ae06241df06c7475b64e36
Summary:
This PR eliminates unneeded grad_sum_to_size calls and in particular speeds up the LSTM backward by allowing better fusion.
It consists of two parts:
- In AutoDiff, record broadcasting sizes only if the broadcast output size is different from the input size, otherwise record None.
- The specialization of Optional arguments (#18407) then allows us to eliminate `_grad_sum_to_size(t, None)` in the peephole optimization step.
Thus, in the LSTM case, no SumToSize remains in the crucial fusion group. The trick here is that we can specialize on the runtime information from the forward.
I'm testing that different broadcasting situations lead to different graphs.
I didn't move all symbolic_script uses of _grad_sum_to_size to the new logic, but it might be better to do this incrementally anyway.
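A conceptual sketch of the idea (not the actual AutoDiff/peephole code): recording None when the forward did not broadcast makes the backward's _grad_sum_to_size an identity that the peephole pass can simply drop:

```python
import torch

def grad_sum_to_size(t, size):
    if size is None:             # forward did not broadcast
        return t                 # identity -> the node can be eliminated
    return t.sum_to_size(size)   # otherwise reduce back to the original size

g = torch.ones(2, 3)
print(grad_sum_to_size(g, None).shape)    # torch.Size([2, 3])
print(grad_sum_to_size(g, (3,)).shape)    # torch.Size([3])
```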
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18697
Differential Revision: D15482076
Pulled By: wanchaol
fbshipit-source-id: 7f89367e35b8729910077c95c02bccefc8678afb
Summary:
I believe the existing check in FuseGraph was only `false` if PyTorch was built with NO_CUDA=1. Otherwise, we would create fusion groups even when running CPU code on a CPU-only machine, which is confusing. Instead, the decision to fuse or not now depends on whether the producer Value is a known CPU tensor. If it is, we skip fusion.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19342
Differential Revision: D15038351
Pulled By: jamesr66a
fbshipit-source-id: fce9d83929309a7bf14346833f84b996f3e7f6db
Summary:
Partially fuse layer_norm by decomposing it into the batchnorm kernel that computes the stats, and then fusing the affine operations that follow the reduce. This is similar to the batchnorm fusion apaszke did, and it likewise only works in inference mode for now.
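For illustration, a conceptual sketch of the decomposition in eager PyTorch (not the fuser's actual code):

```python
import torch

def decomposed_layer_norm(x, weight, bias, eps=1e-5):
    # Reduction part: compute the stats (this is what the batchnorm-style kernel does).
    mean = x.mean(dim=-1, keepdim=True)
    var = x.var(dim=-1, unbiased=False, keepdim=True)
    normalized = (x - mean) / torch.sqrt(var + eps)
    # Elementwise affine part: this is the piece the fuser can pick up.
    return normalized * weight + bias

x = torch.randn(4, 16)
w, b = torch.ones(16), torch.zeros(16)
assert torch.allclose(
    decomposed_layer_norm(x, w, b),
    torch.nn.functional.layer_norm(x, (16,), w, b),
    atol=1e-6,
)
```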
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18266
Differential Revision: D14879877
Pulled By: wanchaol
fbshipit-source-id: 0197d8f2a17ec438d3e53f4c411d759c1ae81efe
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18598
ghimport-source-id: c74597e5e7437e94a43c163cee0639b20d0d0c6a
Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18598 Turn on F401: Unused import warning.**
This was requested by someone at Facebook; this lint is turned
on for Facebook by default. "Sure, why not."
I had to noqa a number of imports in __init__. Hypothetically
we're supposed to use __all__ in this case, but I was too lazy
to fix it. Left for future work.
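For reference, the two usual ways to keep a re-export without tripping F401 (illustrated here with a standard-library module, not the actual PyTorch sites):

```python
# Suppress flake8's unused-import warning inline:
from math import pi  # noqa: F401

# Or declare the public interface explicitly instead of using noqa:
from math import tau
__all__ = ["pi", "tau"]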
Be careful! flake8-2 and flake8-3 behave differently with
respect to import resolution for # type: comments. flake8-3 will
report an import unused; flake8-2 will not. For now, I just
noqa'd all these sites.
All the changes were done by hand.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Differential Revision: D14687478
fbshipit-source-id: 30d532381e914091aadfa0d2a5a89404819663e3
Summary:
Start of breaking up test_jit.py
New files will have the format test_jit_* so they are easily greppable, but they remain in the same directory so we don't have to go through multiple sources for imports.
I am adding a test that's expected to fail to be sure it's running.
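A hypothetical canary of the kind described (the PR's actual deliberately-failing test may differ):

```python
import unittest

class TestJitCanary(unittest.TestCase):
    def test_new_file_is_running(self):
        # Failing loudly proves the new test_jit_* file is actually collected
        # and run, rather than silently skipped.
        self.fail("canary: remove once the new file is confirmed to run in CI")

if __name__ == "__main__":
    unittest.main()
```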
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18590
Reviewed By: wanchaol
Differential Revision: D14677094
Pulled By: eellison
fbshipit-source-id: 9782c6aa9525bb6f332fc75cfff004c83a417522