Summary:
This pull request adds the necessary Windows DLL code to support JIT fusion for CUDA. CPU JIT fusion is still not supported on Windows. It also re-enables all of the non-CPU JIT tests on Windows.
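As an illustrative sanity check (not part of the diff; assumes a Windows build with CUDA available), a scripted pointwise function should now produce a CUDA fusion group:
```python
import torch

@torch.jit.script
def fused(x, y):
    # Chain of pointwise ops that the CUDA fuser can combine.
    return (x + y) * torch.sigmoid(y)

if torch.cuda.is_available():
    a = torch.randn(4, 4, device='cuda')
    b = torch.randn(4, 4, device='cuda')
    fused(a, b)  # warm-up runs let the fusion pass kick in
    fused(a, b)
    print(fused.graph_for(a, b))  # expect a prim::FusionGroup node
```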
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21861
Differential Revision: D15940939
Pulled By: soumith
fbshipit-source-id: e11f6af1ac258fcfd3a077e6e2f2e6fa38be4ef1
Summary:
When kwargs are specified for a test defined via common_method_invocations, the test doesn't work unless there is also a positional argument: `{'foo':'foo'}` without a positional arg generates a Python call like `self.method(, foo=foo)`, which errors on the leading `,`. I wanted to test something in a different PR and noticed I couldn't.
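A minimal sketch of the codegen problem (a hypothetical helper, not the actual common_method_invocations code): naively joining the positional and keyword argument strings produces invalid syntax when the positional part is empty.
```python
def build_call(method, args, kwargs):
    pos = ', '.join(repr(a) for a in args)
    kw = ', '.join('{}={!r}'.format(k, v) for k, v in kwargs.items())
    # Buggy form: 'self.{}({}, {})'.format(method, pos, kw)
    #   -> "self.clamp(, min=0)" when args is empty.
    # Fixed form: only join the non-empty pieces.
    pieces = [p for p in (pos, kw) if p]
    return 'self.{}({})'.format(method, ', '.join(pieces))

print(build_call('clamp', (), {'min': 0}))  # self.clamp(min=0)
```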
Also fixed some flake8 warnings I was seeing locally.
I replaced `lambda x: x` with `ident` since it seems a bit cleaner to me, but happy to revert that if others don't agree?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21499
Differential Revision: D15826974
Pulled By: nairbv
fbshipit-source-id: a3f37c80ba2303c7d9ae06241df06c7475b64e36
Summary:
This PR eliminates unneeded grad_sum_to_size calls and in particular speeds up the LSTM backward by allowing better fusion.
It consists of two parts:
- In AutoDiff, record broadcasting sizes only if the broadcast output size is different from the input size, otherwise record None.
- The specialization of Optional arguments (#18407) then allows us to eliminate `_grad_sum_to_size(t, None)` in the peephole optimization step.
Thus, in the LSTM case, no SumToSize ops remain in the crucial fusion group. The trick here is that we can specialize on the runtime information from the forward pass.
I'm testing that different broadcasting situations lead to different graphs.
I didn't move all symbolic_script `_grad_sum_to_size` uses to the new logic, but it might be better to do this incrementally anyway.
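A minimal Python sketch of the two parts above (the real passes live in the C++ JIT; `Node` here is a hypothetical stand-in):
```python
from collections import namedtuple

Node = namedtuple('Node', ['op', 'args'])

def record_sum_to_size(input_size, broadcast_output_size):
    # Part 1: record a reduction target only if broadcasting changed the shape.
    return None if broadcast_output_size == input_size else input_size

def peephole_eliminate(node):
    # Part 2: _grad_sum_to_size(t, None) is the identity, so forward its input.
    if node.op == '_grad_sum_to_size' and node.args[1] is None:
        return node.args[0]
    return node
```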
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18697
Differential Revision: D15482076
Pulled By: wanchaol
fbshipit-source-id: 7f89367e35b8729910077c95c02bccefc8678afb
Summary:
I believe the existing check in FuseGraph was only `false` if PyTorch was built with NO_CUDA=1. Otherwise, we would create fusion groups even on a CPU-only machine running CPU code, which is confusing. Instead, I've made the decision to fuse depend on whether the producer Value is a known CPU tensor. If it is, we skip fusion.
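In pseudocode, the new gate looks roughly like this (a sketch, not the actual C++ in the FuseGraph pass):
```python
def should_consider_fusion(producer_device):
    # producer_device: 'cpu', 'cuda', or None when the device is unknown.
    if producer_device == 'cpu':
        return False  # known CPU tensor: skip fusion entirely
    return True       # CUDA or unknown: allow the fuser to form groups
```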
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19342
Differential Revision: D15038351
Pulled By: jamesr66a
fbshipit-source-id: fce9d83929309a7bf14346833f84b996f3e7f6db
Summary:
Partially fuse layer_norm by decomposing it into the batchnorm kernel that computes the stats and then fusing the affine operations that come after the reduction. This is similar to the batchnorm fusion that apaszke did, and it likewise only works in inference mode for now.
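A rough sketch of the decomposition in eager-mode Python (illustrative only; the actual change is in the JIT graph fuser):
```python
import torch

def layer_norm_decomposed(x, weight, bias, eps=1e-5):
    # Stats/normalization part, analogous to the batchnorm stats kernel.
    mean = x.mean(dim=-1, keepdim=True)
    var = x.var(dim=-1, unbiased=False, keepdim=True)
    x_hat = (x - mean) / torch.sqrt(var + eps)
    # Affine epilogue: pointwise ops that can be fused with their neighbors.
    return x_hat * weight + bias
```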
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18266
Differential Revision: D14879877
Pulled By: wanchaol
fbshipit-source-id: 0197d8f2a17ec438d3e53f4c411d759c1ae81efe
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18598
ghimport-source-id: c74597e5e7437e94a43c163cee0639b20d0d0c6a
Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18598 Turn on F401: Unused import warning.**
This was requested by someone at Facebook; this lint is turned
on for Facebook by default. "Sure, why not."
I had to noqa a number of imports in __init__. Hypothetically
we're supposed to use __all__ in this case, but I was too lazy
to fix it. Left for future work.
Be careful! flake8-2 and flake8-3 behave differently with
respect to import resolution for `# type:` comments. flake8-3 will
report the import as unused; flake8-2 will not. For now, I just
noqa'd all these sites.
All the changes were done by hand.
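An illustrative example of the two noqa patterns mentioned above (module and variable names are hypothetical):
```python
# Re-export in a package __init__.py: F401 fires unless silenced
# (declaring the name in __all__ would be the cleaner fix).
from .submodule import public_name  # noqa: F401

# flake8-3 also flags imports used only inside "# type:" comments.
import typing  # noqa: F401

values = []  # type: typing.List[int]
```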
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Differential Revision: D14687478
fbshipit-source-id: 30d532381e914091aadfa0d2a5a89404819663e3
Summary:
Start of breaking up test_jit.py
New files will follow the format test_jit_* so they are easy to grep for, but they remain in the same directory so we don't have to go through multiple sources for imports.
I am adding a test that is expected to fail so we can be sure the new file is actually being run.
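A minimal sketch of that sanity check (a hypothetical test, using the standard unittest expectedFailure marker):
```python
import unittest

class TestJitSplitSanity(unittest.TestCase):
    @unittest.expectedFailure
    def test_new_file_is_collected(self):
        # If the new test_jit_* file isn't being run, this expected
        # failure would never show up in the test report.
        self.assertTrue(False)

if __name__ == '__main__':
    unittest.main()
```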
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18590
Reviewed By: wanchaol
Differential Revision: D14677094
Pulled By: eellison
fbshipit-source-id: 9782c6aa9525bb6f332fc75cfff004c83a417522