Commit Graph

63 Commits

Wanchao Liang
02bc06a683 avoid kernel launches for zero-sized tensor inputs
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22790
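A hedged sketch of the idea, using a hypothetical helper rather than the actual ATen change: return early when an input has zero elements so no kernel needs to be dispatched at all.

```python
import torch

# Hypothetical illustration of the idea in this commit (not the real ATen code):
# return early when an input has zero elements so that no kernel is dispatched
# (on CUDA this avoids a kernel launch entirely).
def scale_(t: torch.Tensor, alpha: float) -> torch.Tensor:
    if t.numel() == 0:
        return t          # nothing to compute for a zero-sized tensor
    return t.mul_(alpha)

empty = torch.empty(0, 4)
print(scale_(empty, 2.0).shape)   # torch.Size([0, 4]); no work performed
```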

Test Plan: Imported from OSS

Differential Revision: D16226168

Pulled By: wanchaol

fbshipit-source-id: 081607c9acc1540c753b080c5f727dc4e8c22acc
2019-07-12 12:24:52 -07:00
Gavriel State
b2197ef2b0 Adding support for JIT Fusion on Windows for CUDA (#21861)
Summary:
This pull request adds the Windows DLL code necessary to support JIT fusion for CUDA; CPU JIT fusion isn't supported. It also adds all the non-CPU JIT tests back in on Windows.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21861

Differential Revision: D15940939

Pulled By: soumith

fbshipit-source-id: e11f6af1ac258fcfd3a077e6e2f2e6fa38be4ef1
2019-06-21 09:44:17 -07:00
Brian Vaughan
7284f448ba Fix handling of kwargs from common method invocations (#21499)
Summary:
When kwargs are specified in a test defined via common_method_invocations, the test doesn't work unless there is also a positional argument: `{'foo':'foo'}` without a positional arg generates a Python call like `self.method(, foo=foo)`, which errors on the `,`. I wanted to test something in a different PR and noticed I couldn't.

Also fixed some flake8 warnings I was seeing locally.

I replaced `lambda x: x` with `ident` since it seems a bit cleaner to me, but I'm happy to revert that if others disagree.
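A hypothetical sketch of the failure mode and fix (the names are illustrative, not the real common_method_invocations machinery): naive concatenation leaves a dangling comma when there are no positional args, while joining all pieces into one list handles both cases.

```python
# Illustrative sketch: building the test's call string with and without the fix.
def build_call(method, args, kwargs):
    broken = "self.{}({}, {})".format(
        method,
        ", ".join(args),
        ", ".join("{0}={0}".format(k) for k in kwargs),
    )
    # Fix: collect positional and keyword pieces, then join them once.
    pieces = list(args) + ["{0}={0}".format(k) for k in kwargs]
    fixed = "self.{}({})".format(method, ", ".join(pieces))
    return broken, fixed

print(build_call("clamp", [], {"min": None}))
# ('self.clamp(, min=min)', 'self.clamp(min=min)')   <- broken vs. fixed
print(build_call("clamp", ["0"], {"max": None}))
# ('self.clamp(0, max=max)', 'self.clamp(0, max=max)')
```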
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21499

Differential Revision: D15826974

Pulled By: nairbv

fbshipit-source-id: a3f37c80ba2303c7d9ae06241df06c7475b64e36
2019-06-14 10:47:33 -07:00
Thomas Viehmann
17941f9979 JIT: Eliminate SumToSize by using Optional Lists (#18697)
Summary:
This PR eliminates unneeded grad_sum_to_size calls and, in particular, speeds up the LSTM backward by allowing better fusion.

It consists of two parts:
- In AutoDiff, record broadcasting sizes only if the broadcast output size is different from the input size; otherwise, record None.
- The specialization of Optional arguments (#18407) then allows us to eliminate `_grad_sum_to_size(t, None)` in the peephole optimization step.

Thus, in the LSTM case, no SumToSize ops remain in the crucial fusion group. The trick here is that we can specialize on the runtime information from the forward pass.

I'm testing that different broadcasting situations lead to different graphs.

I didn't move all symbolic_script uses of _grad_sum_to_size to the new logic, but it might be better to do this incrementally anyway.
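A simplified sketch of the peephole idea (the node and op names here are illustrative, not the real JIT IR API): a `_grad_sum_to_size` whose size argument is the None recorded by AutoDiff can be replaced by its input directly.

```python
# Illustrative peephole pass: drop _grad_sum_to_size nodes whose size is None
# (i.e. no broadcasting was recorded) and forward their input to all users.
def peephole_eliminate_sum_to_size(nodes):
    out = []
    env = {}  # maps an eliminated node's output name to its surviving input
    for op, output, inputs in nodes:
        inputs = [env.get(i, i) for i in inputs]
        if op == "_grad_sum_to_size" and inputs[1] is None:
            env[output] = inputs[0]   # forward the tensor, drop the node
            continue
        out.append((op, output, inputs))
    return out

graph = [
    ("mul", "t1", ["grad_out", "gate"]),
    ("_grad_sum_to_size", "t2", ["t1", None]),   # no broadcast recorded
    ("tanh_backward", "t3", ["t2", "cell"]),
]
print(peephole_eliminate_sum_to_size(graph))
# [('mul', 't1', ['grad_out', 'gate']), ('tanh_backward', 't3', ['t1', 'cell'])]
```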
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18697

Differential Revision: D15482076

Pulled By: wanchaol

fbshipit-source-id: 7f89367e35b8729910077c95c02bccefc8678afb
2019-05-24 11:24:17 -07:00
Michael Suo
62af37aa88 dropout symbolic_script should respect the training flag (#20760)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20760
ghimport-source-id: eb667c3549a03a2fc01ffa0a2d3bc7e3a29b78e0
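A rough Python sketch of the behavior this fixes (not the actual symbolic_script code): the decomposed dropout must be the identity when `train` is False instead of always applying the mask.

```python
import torch

# Illustrative decomposed dropout that respects the training flag.
def dropout_decomposed(input: torch.Tensor, p: float, train: bool) -> torch.Tensor:
    if not train:            # eval mode: dropout is a no-op
        return input
    mask = (torch.rand_like(input) > p).to(input.dtype) / (1.0 - p)
    return input * mask

x = torch.ones(4)
print(dropout_decomposed(x, 0.5, train=False))  # tensor([1., 1., 1., 1.])
```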

Reviewed By: jamesr66a

Differential Revision: D15486511

Pulled By: suo

fbshipit-source-id: 56ae930a01b0f6f4305a2a745135d4529b4a1ca0
2019-05-23 18:17:17 -07:00
Wanchao Liang
871c9dcb1d move batchnorm and layernorm fusion to decompose (#20337)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20337
ghimport-source-id: 2196f84f2ef384c1f25587b2fb4bd9dd2f63c2b4

Differential Revision: D15448596

Pulled By: wanchaol

fbshipit-source-id: b66e608f1b72471fc0775aaa4e09f9fa1070fc3c
2019-05-22 18:01:27 -07:00
Natalia Gimelshein
da3e74b21c define use_cuda in dropout backward to allow peephole optimization to work (#20289)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20289

Differential Revision: D15350262

Pulled By: wanchaol

fbshipit-source-id: b457304688524822c1e6f23049e05472130c1ff4
2019-05-15 11:36:06 -07:00
Zachary DeVito
87a6974193 Make it possible for self.forward to return a ScriptMethod (#19217)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19217
ghimport-source-id: 6fdd7f5ac041dae950b47ca316f30682ede0b083

Reviewed By: suo

Differential Revision: D14922120

Pulled By: zdevito

fbshipit-source-id: 5e82e5d7ee72df6f401146d2519c80ea336ff40e
2019-04-24 11:14:34 -07:00
James Reed
5be4bee4ff Don't create FusionGroups for known-CPU producer values (#19342)
Summary:
I believe the existing check in FuseGraph was only `false` if PyTorch was built with NO_CUDA=1. Otherwise, we would create fusion groups even on a CPU-only machine running CPU code, which is confusing. Instead, the decision to fuse now depends on whether the producer Value is a known CPU tensor: if it is, we skip fusion.
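A simplified sketch of the decision rule described above (not the real FuseGraph code): skip fusion whenever the producer value is known to be a CPU tensor; CUDA or unknown devices remain eligible.

```python
from typing import Optional

# Illustrative fusion predicate keyed on the producer's device.
def should_fuse(producer_device: Optional[str]) -> bool:
    if producer_device == "cpu":
        return False   # known-CPU producer: don't create a FusionGroup
    return True        # "cuda" or None (unknown at compile time): still allowed

print(should_fuse("cpu"), should_fuse("cuda"), should_fuse(None))  # False True True
```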
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19342

Differential Revision: D15038351

Pulled By: jamesr66a

fbshipit-source-id: fce9d83929309a7bf14346833f84b996f3e7f6db
2019-04-22 16:57:18 -07:00
Wanchao Liang
a3d3008e73 JIT Layernorm fusion (#18266)
Summary:
Partially fuse layer_norm by decomposing it into the batchnorm kernel that computes the stats and then fusing the affine operations that follow the reduce operations. This is similar to the batchnorm fusion that apaszke did, and it likewise only works in inference mode for now.
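A rough Python sketch of the decomposition described above (inference only, normalizing over the last dimension; this is an illustration, not the fuser's actual kernel split): compute the stats and normalization first, then apply the affine part as separate pointwise ops that a fuser could merge with surrounding element-wise operations.

```python
import torch

# Stats/normalization step followed by a fusible affine (weight/bias) step.
def layer_norm_decomposed(x, weight, bias, eps=1e-5):
    mean = x.mean(dim=-1, keepdim=True)
    var = x.var(dim=-1, unbiased=False, keepdim=True)
    normalized = (x - mean) / torch.sqrt(var + eps)   # stats + normalization
    return normalized * weight + bias                 # pointwise affine part

x = torch.randn(2, 8)
w, b = torch.ones(8), torch.zeros(8)
ref = torch.nn.functional.layer_norm(x, (8,), w, b)
print(torch.allclose(layer_norm_decomposed(x, w, b), ref, atol=1e-5))  # True
```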
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18266

Differential Revision: D14879877

Pulled By: wanchaol

fbshipit-source-id: 0197d8f2a17ec438d3e53f4c411d759c1ae81efe
2019-04-12 14:38:31 -07:00
Wanchao Liang
aa017db59c make test_jit_fuser runnable
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19036

Differential Revision: D14839800

Pulled By: wanchaol

fbshipit-source-id: b52c131b58e1b42a8c3da5d1117217c3dc2e5f5b
2019-04-09 12:36:25 -07:00
Edward Yang
173f224570 Turn on F401: Unused import warning. (#18598)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18598
ghimport-source-id: c74597e5e7437e94a43c163cee0639b20d0d0c6a

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18598 Turn on F401: Unused import warning.**

This was requested by someone at Facebook; this lint is turned
on for Facebook by default.  "Sure, why not."

I had to noqa a number of imports in __init__.  Hypothetically
we're supposed to use __all__ in this case, but I was too lazy
to fix it.  Left for future work.

Be careful! flake8-2 and flake8-3 behave differently with respect to import resolution for `# type:` comments: flake8-3 will report an import as unused; flake8-2 will not. For now, I just noqa'd all these sites.

All the changes were done by hand.
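An illustrative `__init__.py` snippet showing the two options mentioned above (the re-exported names are just an example and assume torch is installed):

```python
# Option used in this change for intentional re-exports: silence F401 per import.
from torch.serialization import load, save  # noqa: F401

# Alternative left for future work: declare the public API via __all__,
# which also tells flake8 the imports are intentionally exposed.
__all__ = ["load", "save"]
```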

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: D14687478

fbshipit-source-id: 30d532381e914091aadfa0d2a5a89404819663e3
2019-03-30 09:01:17 -07:00
Elias Ellison
a5ddecd00c Move fuser to test_jit_fuser (#18590)
Summary:
Start of breaking up test_jit.py

New files will have the format test_jit_* so they are easily greppable, but they remain in the same directory so we don't have to go through multiple sources for imports.

I am adding a test that's expected to fail, to be sure the new file is actually running.
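A minimal example of that sanity check (a generic unittest sketch, not the exact test added here): a test marked as an expected failure proves the new file is actually being collected and run.

```python
import unittest

class TestJitFuserSmoke(unittest.TestCase):
    @unittest.expectedFailure
    def test_file_is_running(self):
        # Deliberately fails; shows up as "expected failure" only if the
        # file is actually picked up by the test runner.
        self.assertTrue(False)

if __name__ == "__main__":
    unittest.main()
```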
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18590

Reviewed By: wanchaol

Differential Revision: D14677094

Pulled By: eellison

fbshipit-source-id: 9782c6aa9525bb6f332fc75cfff004c83a417522
2019-03-29 18:13:26 -07:00