Commit Graph

53 Commits

Author SHA1 Message Date
Nikolay Korovaiko
aea4e61ec3 skip test_jit_legacy (#68129)
Summary:
disables failing tests in https://github.com/pytorch/pytorch/issues/67646

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68129

Reviewed By: suo, janeyx99

Differential Revision: D32326118

Pulled By: Krovatkin

fbshipit-source-id: ca00d2214503f418be45dc756057b990fb6e6370
2021-11-10 23:08:32 -08:00
Jane Xu
09c7771e9c Set test owners for jit tests (#66808)
Summary:
Action following https://github.com/pytorch/pytorch/issues/66232

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66808

Reviewed By: mrshenli

Differential Revision: D31761414

Pulled By: janeyx99

fbshipit-source-id: baf8c49ff9c4bcda7b0ea0f6aafd26380586e72d
2021-10-25 07:51:10 -07:00
Shen Li
1022443168 Revert D30279364: [codemod][lint][fbcode/c*] Enable BLACK by default
Test Plan: revert-hammer

Differential Revision:
D30279364 (b004307252)

Original commit changeset: c1ed77dfe43a

fbshipit-source-id: eab50857675c51e0088391af06ec0ecb14e2347e
2021-08-12 11:45:01 -07:00
Zsolt Dollenstein
b004307252 [codemod][lint][fbcode/c*] Enable BLACK by default
Test Plan: manual inspection & sandcastle

Reviewed By: zertosh

Differential Revision: D30279364

fbshipit-source-id: c1ed77dfe43a3bde358f92737cd5535ae5d13c9a
2021-08-12 10:58:35 -07:00
Nikita Shulga
add49e7e4e Enforce PEP263 for PyTorch python codebase (#55346)
Summary:
All Python files containing non-ASCII characters should be correctly annotated with a `# -*- coding: utf-8 -*-` comment.

Also delete a number of superfluous non-ASCII characters, most commonly the right single quotation mark U+2019 (’) used in place of the ASCII apostrophe ', for example `Module’s` -> `Module's`.
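
For reference, a minimal sketch of a correctly annotated file (the contents are illustrative, not taken from the codebase):

```python
# -*- coding: utf-8 -*-
# PEP 263: this coding declaration must appear on the first or second line of
# any Python source file that contains non-ASCII characters.

GREETING = "naïve café"   # non-ASCII literal, fine once the file is annotated
DOC = "Module's forward"  # plain ASCII apostrophe, not U+2019
```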

Pull Request resolved: https://github.com/pytorch/pytorch/pull/55346

Reviewed By: samestep

Differential Revision: D27582044

Pulled By: malfet

fbshipit-source-id: c1cd89655915858ff3a41f675cdfffff795a8e44
2021-04-06 18:31:38 -07:00
Bert Maher
9db4802184 [fuser] Support bfloat16 (#54571)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54571

Supports bfloat16 using a method similar to half: upconvert inputs to fp32, do the math, then downconvert outputs to bf16.

Resource strings are mostly derived from cuda-11 headers.

Fixes #53918, for the legacy fuser at least.
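
As a rough Python analogue of the upconvert/downconvert pattern described above (the real fuser emits CUDA source; the function and math here are illustrative only):

```python
import torch

def fused_bf16_op(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    # Upconvert bf16 inputs to fp32, do the math in fp32, then downconvert
    # the result back to bf16 -- the same strategy already used for half.
    a32 = a.to(torch.float32)
    b32 = b.to(torch.float32)
    out32 = a32 * b32 + a32
    return out32.to(torch.bfloat16)

x = torch.randn(8).to(torch.bfloat16)
y = torch.randn(8).to(torch.bfloat16)
print(fused_bf16_op(x, y).dtype)  # torch.bfloat16
```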

Test Plan: Imported from OSS

Reviewed By: ailzhang

Differential Revision: D27328987

Pulled By: bertmaher

fbshipit-source-id: 5c0eae44164623faa0c75cb818e8bf0211579fdc
2021-03-25 15:59:15 -07:00
Richard Barnes
a4383a69d4 Clean up some type annotations in caffe2/test (#49943)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49943

Upgrades type annotations from Python2 to Python3

Test Plan: Sandcastle tests

Reviewed By: xush6528

Differential Revision: D25717534

fbshipit-source-id: 5aedea4db07efca126ffb6daee79617c30a67146
2021-01-13 10:01:55 -08:00
Chester Liu
9d8bd216f9 Use Unicode friendly API in fused kernel related code (#49781)
Summary:
See https://github.com/pytorch/pytorch/issues/47422

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49781

Reviewed By: gchanan

Differential Revision: D25847993

Pulled By: ezyang

fbshipit-source-id: e683a8d5841885857ea3037ac801432a1a3eda68
2021-01-10 20:03:00 -08:00
Gao, Xiang
3f5eee666c Adjust TF32 tests (#44240)
Summary:
- The thresholds of some tests are bumped up. Depending on the random generator, these tests sometimes fail with errors like "0.0059 is not smaller than 0.005". I ran `test_nn.py` and `test_torch.py` 10+ times to check that these are no longer flaky.
- Add `tf32_on_and_off` to the new `matrix_exp` tests (a conceptual sketch of the on/off sweep follows this list).
- Disable TF32 on test suites other than `test_nn.py` and `test_torch.py`
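
A conceptual sketch of what such an on/off sweep does, using the public `torch.backends.cuda.matmul.allow_tf32` toggle (tolerances and shapes are illustrative, and a CUDA device is assumed):

```python
import torch

a = torch.randn(64, 64, device="cuda")
b = torch.randn(64, 64, device="cuda")
ref = (a.double() @ b.double()).float()  # high-precision reference

# Run the op with TF32 disabled and enabled, loosening the comparison
# threshold when TF32 is on.
for allow_tf32, tol in [(False, 1e-5), (True, 5e-3)]:
    torch.backends.cuda.matmul.allow_tf32 = allow_tf32
    out = a @ b
    assert (out - ref).abs().max() <= tol * ref.abs().max()
```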

cc: ptrblck

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44240

Reviewed By: mruberry

Differential Revision: D23882498

Pulled By: ngimel

fbshipit-source-id: 44a9ec08802c93a2efaf4e01d7487222478b6df8
2020-09-24 10:25:58 -07:00
Xiang Gao
20ac736200 Remove py2 compatible future imports (#44735)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44735

Reviewed By: mruberry

Differential Revision: D23731306

Pulled By: ezyang

fbshipit-source-id: 0ba009a99e475ddbe22981be8ac636f8a1c8b02f
2020-09-16 12:55:57 -07:00
Akihiro Nitta
84949672bf Fix exception chaining in test/ (#44193)
Summary:
## Motivation
This PR fixes https://github.com/pytorch/pytorch/issues/43770 and is the continuation of https://github.com/pytorch/pytorch/issues/43836.

## Description of the change
This PR fixes exception chaining only in files under `test/` where appropriate.
To fix exception chaining, I used either of the following (a short sketch follows the list):
1. `raise new_exception from old_exception` where `new_exception` itself seems not descriptive enough to debug or `old_exception` delivers valuable information.
2. `raise new_exception from None` where raising both of `new_exception` and `old_exception` seems a bit noisy and redundant.
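
For illustration, a short sketch of the two patterns (the exception type and helpers are hypothetical, not from this PR):

```python
class DataLoaderError(RuntimeError):
    pass

def load(path: str) -> bytes:
    # Pattern 1: chain explicitly when the original exception carries useful context.
    try:
        with open(path, "rb") as f:
            return f.read()
    except OSError as e:
        raise DataLoaderError(f"could not load {path}") from e

def parse_count(s: str) -> int:
    # Pattern 2: suppress the original traceback when chaining would just be noise.
    try:
        return int(s)
    except ValueError:
        raise DataLoaderError(f"{s!r} is not a valid count") from None
```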

## List of lines containing `raise` in `except` clause:
I wrote [this simple script](https://gist.github.com/akihironitta/4223c1b32404b36c1b349d70c4c93b4d) using [ast](https://docs.python.org/3.8/library/ast.html#module-ast) to list lines where `raise`ing in `except` clause.

- [x] f8f35fddd4/test/test_cpp_extensions_aot.py (L16)
- [x] f8f35fddd4/test/test_jit.py (L2503)
- [x] f8f35fddd4/test/onnx/model_defs/word_language_model.py (L22)
- [x] f8f35fddd4/test/onnx/verify.py (L73)
- [x] f8f35fddd4/test/onnx/verify.py (L110)
- [x] f8f35fddd4/test/onnx/test_verify.py (L31)
- [x] f8f35fddd4/test/distributed/test_c10d.py (L255)
- [x] f8f35fddd4/test/distributed/test_c10d.py (L2992)
- [x] f8f35fddd4/test/distributed/test_c10d.py (L3025)
- [x] f8f35fddd4/test/distributed/test_c10d.py (L3712)
- [x] f8f35fddd4/test/distributed/test_distributed.py (L3180)
- [x] f8f35fddd4/test/distributed/test_distributed.py (L3198)
- [x] f8f35fddd4/test/distributed/test_data_parallel.py (L752)
- [x] f8f35fddd4/test/distributed/test_data_parallel.py (L776)
- [x] f8f35fddd4/test/test_type_hints.py (L151)
- [x] f8f35fddd4/test/test_jit_fuser.py (L771)
- [x] f8f35fddd4/test/test_jit_fuser.py (L773)
- [x] f8f35fddd4/test/test_dispatch.py (L105)
- [x] f8f35fddd4/test/test_distributions.py (L4738)
- [x] f8f35fddd4/test/test_nn.py (L9824)
- [x] f8f35fddd4/test/test_namedtensor.py (L843)
- [x] f8f35fddd4/test/test_jit_fuser_te.py (L875)
- [x] f8f35fddd4/test/test_jit_fuser_te.py (L877)
- [x] f8f35fddd4/test/test_dataloader.py (L31)
- [x] f8f35fddd4/test/test_dataloader.py (L43)
- [x] f8f35fddd4/test/test_dataloader.py (L365)
- [x] f8f35fddd4/test/test_dataloader.py (L391)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44193

Reviewed By: albanD

Differential Revision: D23681529

Pulled By: malfet

fbshipit-source-id: 7c2256ff17334625081137b35baeb816c1e53e0b
2020-09-14 14:20:16 -07:00
Elias Ellison
a4cf4c2437 refactor tests (#43631)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43631

I added a new test for just profiler stuff - I don't think the test should go in test_jit.py. Maybe this should just go in test_tensorexpr_fuser, but I'm not really testing tensorexpr stuff either... LMK

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D23358810

Pulled By: eellison

fbshipit-source-id: 074238e1b60e4c4a919a052b7a5312b790ad5d82
2020-08-27 14:35:33 -07:00
Bert Maher
0bf27d64f4 Fix NaN propagation in fuser's min/max implementation (#43590)
Summary:
fmax/fmin return the non-NaN argument when one argument is NaN, which doesn't match eager mode behavior, where NaN propagates.
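
A small illustration of the mismatch (eager mode propagates NaN, while C's fmax/fmin, which the generated fuser code used, return the non-NaN operand):

```python
import torch

a = torch.tensor([float("nan"), 2.0])
b = torch.tensor([1.0, float("nan")])

# Eager mode: elementwise max/min propagate NaN.
print(torch.max(a, b))  # tensor([nan, nan])
print(torch.min(a, b))  # tensor([nan, nan])

# C's fmax/fmin would instead return the non-NaN operand,
# e.g. fmax(NaN, 1.0) == 1.0, so fused results could differ.
```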

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43590

Reviewed By: mruberry

Differential Revision: D23338664

Pulled By: bertmaher

fbshipit-source-id: b0316a6f01fcf8946ba77621efa18f339379b2d0
2020-08-26 17:31:06 -07:00
Michael Suo
ca1b8ebbcb move misc implementation out of jit/__init__.py (#41154)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/41154

Test Plan: Imported from OSS

Reviewed By: ailzhang

Differential Revision: D22445213

Pulled By: suo

fbshipit-source-id: 200545715c5ef13beb1437f49e01efb21498ddb7
2020-07-13 16:59:55 -07:00
Wanchao Liang
27d789500b [test] split tracer related tests out of test_jit (#40142)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40142

test_jit is becoming huge again, which makes it hard for the editor to load and for us to write new tests; this splits out the tracer-related tests.

Test Plan: Imported from OSS

Reviewed By: ailzhang

Differential Revision: D22085035

Pulled By: wanchaol

fbshipit-source-id: 696bee84985ecfbfeac8e2ee5c27f1bdda8de394
2020-06-17 17:26:33 -07:00
Elias Ellison
daa85cfe2e [JIT] Exit Transform Rewrite (#38282)
Summary:
After an early return, we conditionalize all further execution. This means that currently the pattern of
`if return elif return elif return` generates better code than `if return if return if return`. It's obviously not good to have semantically equivalent code generate worse IR, so we should rewrite the graph to handle this case. This came up in https://github.com/pytorch/pytorch/pull/37171

```
@torch.jit.script
def test_foo(x: bool, y: bool):
    if x:
        return 1
    return 2
print(test_foo.code)
```
generates:
```
def test_foo(x: bool,
    y: bool) -> int:
  _0 = uninitialized(int)
  if x:
    _1, _2 = True, 1
  else:
    _1, _2 = False, _0
  if _1:
    _3 = _2
  else:
    _3 = 2
  return _3
```
while
```
@torch.jit.script
def test_foo(x: bool, y: bool):
    if x:
        return 1
    else:
        return 2
print(test_foo.code)
```
generates:
```
def test_foo(x: bool,
    y: bool) -> int:
  if x:
    _0 = 1
  else:
    _0 = 2
  return _0
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38282

Differential Revision: D21576733

Pulled By: eellison

fbshipit-source-id: 80cf1ad7fbda6d8d58557abbfb21c90eafae7488
2020-05-15 12:22:28 -07:00
Elias Ellison
0e3a05ec00 [JIT] rename enable_profiling_mode to enable_profiling_mode_for_profiling_tests (#37825)
Summary:
The existing context manager only conditionally enabled profiling mode, which was counterintuitive. When we changed the default executor, it broke internal benchmarking as a result.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37825

Differential Revision: D21404611

Pulled By: eellison

fbshipit-source-id: 306b3c333ef4eb44ab6a6e5ab4e0682e5ce312ce
2020-05-06 11:30:02 -07:00
Nikolay Korovaiko
6ecb5bb1f0 match old fuser rem to eager (#37196)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/37196

Reviewed By: zdevito

Differential Revision: D21223172

Pulled By: Krovatkin

fbshipit-source-id: 4d4ff1127d5dc69ab73f07ca79c1f5b0b4dd9732
2020-05-01 10:55:06 -07:00
Brian Vaughan
54ed6fd3ee Use both absolute and relative tolerance in testing (#34258)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34258

This PR allows both atol and rtol to be specified, uses defaults based on the prior analysis (spreadsheet attached to https://github.com/pytorch/pytorch/pull/32538), but retains the absolute tolerance behavior in cases where precision was previously specified explicitly.
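
The combined check behind atol/rtol comparisons is essentially `|actual - expected| <= atol + rtol * |expected|`; a minimal sketch (the default values here are illustrative, not the ones chosen in this PR):

```python
def is_close(actual: float, expected: float,
             rtol: float = 1.3e-6, atol: float = 1e-5) -> bool:
    # The allowed error grows with the magnitude of the expected value
    # (rtol term) but never drops below the absolute floor (atol term).
    return abs(actual - expected) <= atol + rtol * abs(expected)

assert is_close(1.000001, 1.0)   # within tolerance of a value near 1
assert not is_close(1e-3, 0.0)   # near zero, only atol bounds the error
```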

Test Plan: Imported from OSS

Differential Revision: D21110255

Pulled By: nairbv

fbshipit-source-id: 57b3a004c7d5ac1be80ee765f03668b1b13f4a7e
2020-04-19 06:16:49 -07:00
Nikolay Korovaiko
b5483b8286 [pytorch][PR] Re-enable a failing test (#36763)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/36763

Differential Revision: D21083309

Pulled By: Krovatkin

fbshipit-source-id: 4fb5b95bd3e01bd83a406d4394f266d7fd168f21
2020-04-16 22:21:47 -07:00
Sebastian Messmer
e9b4580411 Revert D20839674: [pytorch][PR] Re-enable a failing test
Test Plan: revert-hammer

Differential Revision:
D20839674

Original commit changeset: 68f41610a823

fbshipit-source-id: b69ccfd49bbde566870fa53cd3fe2931721db4ea
2020-04-16 15:26:34 -07:00
Nikolay Korovaiko
487dc0f961 Re-enable a failing test (#35847)
Summary:
This test was failing because caching resulted in a function with multiple execution plans rather than multiple functions with a single execution plan each, as the test writer intended.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35847

Differential Revision: D20839674

Pulled By: Krovatkin

fbshipit-source-id: 68f41610a823d94c1e744c85ac72652c741d73ae
2020-04-16 11:46:02 -07:00
Nick Korovaiko
76d5102587 add a cuda/fuser job for legacy graph executor (#35419)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/35419

Differential Revision: D20719013

Pulled By: Krovatkin

fbshipit-source-id: 745d9523a5a9b7b4b556a075351ea58a82501dff
2020-03-28 12:11:18 -07:00
Mikhail Zolotukhin
eef17edaa3 Fix warnings in test/test_jit_fuser.py (#34980)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34980

We were passing sample inputs to `torch.jit.script` (as if it were
`torch.jit.trace`), but that argument was treated as the optional
`optimize` parameter, which is deprecated and therefore caused a
warning.
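
A small sketch of the difference (the function is illustrative; the comment on the extra argument follows the description above):

```python
import torch

def f(x, y):
    return x + y

x, y = torch.randn(4), torch.randn(4)

traced = torch.jit.trace(f, (x, y))  # trace takes example inputs
scripted = torch.jit.script(f)       # script takes only the callable; passing
# (x, y) here would have been swallowed by the deprecated `optimize`
# parameter and produced the warning this commit removes.
```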

Differential Revision: D20520369

Test Plan: Imported from OSS

Pulled By: ZolotukhinM

fbshipit-source-id: 87b40a5e35bfc4a3d7a5d95494632bfe117e40b7
2020-03-18 19:55:25 -07:00
Bert Maher
38b6cb479b Check fuser results when profiling (#33944)
Summary:
With the profiling executor enabled the fuser won't be invoked until the second pass over a script function, so some of these tests weren't correctly comparing the fused output with the interpreter output.  I've used the `checkScript` method where applicable, which seems to do the right thing.
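
A rough sketch of the warmup this implies (assumes a CUDA device with fusion enabled; the function is illustrative):

```python
import torch

@torch.jit.script
def fn(a, b):
    return (a + b).relu()

a = torch.randn(1024, device="cuda")
b = torch.randn(1024, device="cuda")

# Under the profiling executor the first call only records profiling info;
# fusion kicks in on a later call, so warm the function up before comparing
# its output with eager execution.
for _ in range(3):
    out = fn(a, b)
assert torch.allclose(out, (a + b).relu())
```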
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33944

Test Plan: Locally inject obvious errors into the fuser and verify that the updated tests fail when they're supposed to.

Differential Revision: D20162320

Pulled By: bertmaher

fbshipit-source-id: 4a2f3f2d2ff1d81f23db504dc8cd0d5417bdcc50
2020-02-28 17:01:34 -08:00
Pritam Damania
f050b16dd9 Move pytorch distributed tests to separate folder for contbuild. (#30445)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30445

Create distributed and rpc directories under caffe2/test for better management of unit tests.

Differential Revision: D18702786

fbshipit-source-id: e9daeed0cfb846ef68806f6decfcb57c0e0e3606
2020-01-22 21:16:59 -08:00
Nikolay Korovaiko
02c3493a84 Fix an invalid peephole transformation if input/output values are written to (#28455)
Summary:
fixes https://github.com/pytorch/pytorch/issues/28360
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28455

Differential Revision: D19374601

Pulled By: Krovatkin

fbshipit-source-id: 622f24b40aba03e79e55a6b8d25d88417f7d8bad
2020-01-14 16:28:07 -08:00
Zachary DeVito
cc8d6342fc make profiling take no_grad flags into account (#31071)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31071

Previously the profiler would assume Tensors required grad even when the
no_grad flag was enabled during execution. This makes the profiling and
guards respect the no_grad flag, which eliminates the extra differentiable
graphs that appear in the backward graph (where no_grad is typically enabled).
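
A quick illustration of the flag's effect at execution time (sketch only):

```python
import torch

@torch.jit.script
def fn(x):
    return x * 2 + 1

x = torch.randn(4, requires_grad=True)

# The backward pass typically runs under no_grad; with this change the
# profiler records that such outputs do not require grad, so no extra
# differentiable subgraphs are built for them.
with torch.no_grad():
    y = fn(x)
print(y.requires_grad)  # False
```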

Test Plan: Imported from OSS

Differential Revision: D18915468

Pulled By: zdevito

fbshipit-source-id: 1ae816a16ab78ae5352825cc6b4a68ed7681a089
2019-12-17 13:22:16 -08:00
Nikolay Korovaiko
5b702ab52b switching to a simple/full executor
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29230

Differential Revision: D18402229

Pulled By: Krovatkin

fbshipit-source-id: 62f4bc9bc89c0c7369359bba1359c22a2fa80f46
2019-11-11 13:41:35 -08:00
Nikolay Korovaiko
47faee2fae Switching tests to ProfilingExecutor (rebased)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28535

Differential Revision: D18197932

Pulled By: Krovatkin

fbshipit-source-id: 2639b205e899f800787ee57c157447d54e4669c3
2019-10-29 11:41:42 -07:00
Nikolay Korovaiko
db5791d543 autodiff changes to enable profiling
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25397

Differential Revision: D17565747

Pulled By: Krovatkin

fbshipit-source-id: b772437d9e02df99db6e662cb7d1227359959bed
2019-09-25 10:11:44 -07:00
peter
2ce8c83f67 Enable CPU fused kernel on Windows
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25578

Differential Revision: D17397156

Pulled By: ezyang

fbshipit-source-id: b243528c2bfd5a0d401897833048429e67fe40ef
2019-09-17 07:29:40 -07:00
J M Dieterich
5376ee51fd Enable more mGPU tests (#26055)
Summary:
Enable mGPU tests that pass on ROCm as of 2.7.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26055

Differential Revision: D17331484

Pulled By: bddppq

fbshipit-source-id: 51f956a84a6c14a1a41473d322950994fa29c25c
2019-09-11 17:54:35 -07:00
J M Dieterich
00d967c39d enable unit tests (#25963)
Summary:
These unit tests pass after landing all the warp size awareness patches.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25963

Differential Revision: D17319124

Pulled By: bddppq

fbshipit-source-id: 22f5d5f1ca9c67e66a7ccf983b2d2f889a74e729
2019-09-11 12:31:43 -07:00
Johannes M Dieterich
9c5a899773 Enable jit fusion on ROCm (#22872)
Summary:
As of ROCm 2.6, we support hiprtc - the HIP runtime compilation API. Enable the jit fusion feature depending on the existence of such an API. This entails
* new hipification rules for API_RTC
* add hiprtc APIs to the shim loader
* update cmake infrastructure to find the hiprtc library (it is part of the HIP package)
* enabling of unit tests in the jit_fuser test set
* special casing in resource strings for HIP - the typedefs CUDA requires would be redundant
* for now disable the occupancy calculation we do not support yet and hard-code

Thanks to t-vi for working with me on getting this integration done!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22872

Differential Revision: D17207425

Pulled By: bddppq

fbshipit-source-id: 93409f3051ad0ea06afacc2239fd6c402152debe
2019-09-05 18:22:08 -07:00
Michael Suo
755f91b400 serializing function calls (#23799)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23799

Before, we inlined as part of the initial IR generation process, which
has a few disadvantages:

1. It loses information about what nodes came from which function/method
calls. Other parties who want to implement transformations on the
function/module level don't have a reliable way of doing so.
2. It duplicates a ton of code if we are inlining the same function/method many times.

After this PR: inline is deferred to the optimization stage, so
optimizations that rely on inlining will still work. But things get
serialized with the function/method calls in.

Differential Revision: D16652819

Test Plan: Imported from OSS

Reviewed By: jamesr66a

Pulled By: suo

fbshipit-source-id: a11af82aec796487586f81f5a9102fefb6c246db
2019-08-19 18:42:43 -07:00
Mike Ruberry
c21a774076 Moves clamp from autodiff cpp to symbolic script (#23927)
Summary:
This PR:

- Moves clamp from autodiff cpp to symbolic script
- Adds an additional tuple lowering pass to the graph executor
- Updates clamp backwards to be maximally gradient preserving

Moving clamp to symbolic script presented two challenges:

- When the backward graph is defined the branch taken in the conditional is known, but communicating this information to the Jit is a little tricky. It turns out the Jit has a quirk where variables that can be None at the time of graph instantiation are treated as constants, so testing min and max against None lets the Jit instantiate only one path branch. It might be more natural to select different backward functions for these cases, but that is not yet supported.
- Moving clamp to symbolic script introduced an extra tuple construction and immediate unpacking which prevented fusion. This was dealt with by adding an additional tuple removal pass. This issue could appear whenever a symbolic script's return value was defined in an if statement, which made the Jit see the unpacked tuple as being constructed from an if, not a TupleConstruct. The graph is later optimized but tuple lowering was not performed again after these optimizations.

Moving clamp to symbolic script also adds some explicit conversions to float in the graphs in which it appears, but these seem harmless.

If clamp were simply moved to symbolic script then its backward graphs would look like this:

```
graph(%0 : Float(*, *),
      %1 : AutogradZeroTensor,
      %2 : Float(*, *),
      %3 : int[]?,
      %4 : Scalar?,
      %5 : int):
  %6 : None = prim::Constant() # <string>:5:31
  %7 : float = aten::Float(%5) # <string>:12:37
  %8 : Float(*, *) = prim::FusionGroup_0(%0, %2, %7)
  %9 : (Float(*, *), None, None) = prim::TupleConstruct(%8, %6, %6)
  %10 : Float(*, *), %11 : None, %12 : None = prim::TupleUnpack(%9)
  return (%10)
with prim::FusionGroup_0 = graph(%0 : Float(*, *),
      %1 : Float(*, *),
      %2 : float):
  %3 : Bool(*, *) = aten::le(%1, %2) # <string>:12:29
  %mask.5 : Float(*, *) = aten::type_as(%3, %1) # <string>:12:29
  %5 : Float(*, *) = aten::mul(%0, %mask.5) # <string>:13:28
  return (%5)
```

And adding the additional pass to remove tuples eliminates the prim::TupleConstruct and prim::TupleUnpack. Keeping these would previously have caused test_fuser_iou to fail because multiple fusion groups would be created; since https://github.com/pytorch/pytorch/issues/23372, however, this test is disabled. When enabled, the relevant portion of its graph is now:

```
  %59 : float = aten::Float(%26) # <string>:314:38
  %60 : float = aten::Float(%27) # <string>:314:61
  %61 : int[] = aten::size(%14) # <string>:41:99
  %62 : int[] = aten::size(%11) # <string>:42:100
  %63 : int[] = aten::size(%15) # <string>:41:99
  %64 : int[] = aten::size(%12) # <string>:42:100
  %65 : Tensor, %66 : Tensor, %67 : Tensor, %68 : Tensor, %69 : Tensor, %70 : Tensor, %71 : Tensor, %72 : Tensor, %73 : Double(*, *) = prim::FusionGroup_0(%w.1, %13, %16, %23, %h.1, %54, %inter.1, %0, %12, %15, %18, %17, %29, %11, %14, %60, %59)
  %74 : Tensor = aten::_grad_sum_to_size(%73, %53)
  %75 : Tensor = aten::_grad_sum_to_size(%73, %52)
  %grad_self.10 : Tensor = aten::_grad_sum_to_size(%65, %61) # <string>:41:30
  %grad_other.10 : Tensor = aten::_grad_sum_to_size(%66, %62) # <string>:42:31
  %78 : Tensor = prim::FusionGroup_1(%grad_self.10, %74, %36)
  %79 : Tensor = prim::FusionGroup_2(%grad_other.10, %75, %44)
  %grad_self.14 : Tensor = aten::_grad_sum_to_size(%67, %21) # <string>:33:30
  %grad_other.14 : Tensor = aten::_grad_sum_to_size(%68, %22) # <string>:34:31
  %grad_self.12 : Tensor = aten::_grad_sum_to_size(%69, %63) # <string>:41:30
  %grad_other.12 : Tensor = aten::_grad_sum_to_size(%70, %64) # <string>:42:31
  %grad_self.16 : Tensor = aten::_grad_sum_to_size(%71, %19) # <string>:33:30
  %grad_other.16 : Tensor = aten::_grad_sum_to_size(%72, %20) # <string>:34:31
  %86 : Tensor, %87 : Tensor = prim::FusionGroup_3(%grad_self.12, %grad_self.16, %74, %39)
  %88 : Tensor, %89 : Tensor = prim::FusionGroup_4(%grad_other.12, %grad_other.16, %75, %47)
  return (%79, %88, %89, %78, %86, %87, %grad_self.14, %grad_other.14)
```

Which I think is expected/desired.

Finally, this implementation of clamp backwards is "maximally gradient preserving," which simply means that elements on the boundary now receive gradients. For example, if an element of a tensor is 5 and the clamp is to [2, 5], then that element will now receive a gradient. The prior implementation would zero these gradients. See https://github.com/pytorch/pytorch/issues/7002 for a discussion on preserving gradients.
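
A rough Python sketch of a gradient-preserving clamp backward (not the actual symbolic_script code; the inclusive comparisons are what let boundary elements receive gradient):

```python
import torch

def clamp_backward(grad_out: torch.Tensor, x: torch.Tensor,
                   lo: float, hi: float) -> torch.Tensor:
    # Inclusive mask: elements exactly at lo or hi still pass gradient through.
    mask = (x >= lo) & (x <= hi)
    return grad_out * mask.to(grad_out.dtype)

x = torch.tensor([1.0, 2.0, 5.0, 6.0])
print(clamp_backward(torch.ones_like(x), x, 2.0, 5.0))  # tensor([0., 1., 1., 0.])
```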
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23927

Test Plan: Existing tests provided sufficient coverage.

Differential Revision: D16739740

Pulled By: mruberry

fbshipit-source-id: c94291d20e1f3f25197afc7b74dc61aeb204b074
2019-08-09 13:57:03 -07:00
Thomas Viehmann
cf50249bde Disable fusion of grad_sum_to_size (#23372)
Summary:
Fixes: https://github.com/pytorch/pytorch/issues/22833

grad_sum_to_size does not commute with AutogradAdd after all because it turns the broadcasting AutogradAdd into a broadcasting add.

Chillee actually did most of the work tracking this down to the fusion of grad_sum_to_size and pinged me when he had found the cause. Thank you!

About the choice of removing the fusion completely instead of being more precise:
- We do have grad_sum_to_size elimination, which works for cases where broadcasting does not actually happen in the forward, so the set of cases where fusing grad_sum_to_size is actually beneficial is much smaller than when it was initially proposed.
- There will be less fusion; in terms of the tests, IOU stops being fully fused. I vaguely think that it is a case we could handle with more refined logic.
- Keeping it would add complexity in checking when to merge fusion groups, on top of the complexities that this PR removes.
- The future of fusion probably lies more in more complete solutions including reductions (TVM or KeOps or our own or ...).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23372

Differential Revision: D16489930

Pulled By: soumith

fbshipit-source-id: bc0431b0d3eda264c401b634675872c4ce46f0f4
2019-07-25 08:55:33 -07:00
Michael Suo
711be82951 Make optimize a thread_local flag
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23170

Test Plan: Imported from OSS

Differential Revision: D16441912

Pulled By: suo

fbshipit-source-id: a33485178a329d54e41e364c4f14950f88481c55
2019-07-24 23:09:21 -07:00
Michael Suo
b6a88b3344 Make traced fns also go into the global python CU
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22901

Test Plan: Imported from OSS

Differential Revision: D16278160

Pulled By: suo

fbshipit-source-id: f3e7d83b48d5f5b5cb1548ccc5b9bd382a3c411a
2019-07-16 12:04:16 -07:00
Wanchao Liang
02bc06a683 avoid kernel launches for zero-sized tensor inputs
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22790

Test Plan: Imported from OSS

Differential Revision: D16226168

Pulled By: wanchaol

fbshipit-source-id: 081607c9acc1540c753b080c5f727dc4e8c22acc
2019-07-12 12:24:52 -07:00
Gavriel State
b2197ef2b0 Adding support for JIT Fusion on Windows for CUDA (#21861)
Summary:
This pull request adds the necessary Windows DLL code to be able to support JIT fusion for CUDA. CPU JIT Fusion isn't supported. This also adds all the non-CPU JIT tests back in on Windows.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21861

Differential Revision: D15940939

Pulled By: soumith

fbshipit-source-id: e11f6af1ac258fcfd3a077e6e2f2e6fa38be4ef1
2019-06-21 09:44:17 -07:00
Brian Vaughan
7284f448ba Fix handling of kwargs from common method invocations (#21499)
Summary:
When kwargs are specified in a test defined via common_method_invocations, it doesn't work if there isn't also a positional argument (`{'foo':'foo'}` without a positional arg generates a python call like: `self.method(, foo=foo)`, erroring on the `,`). I wanted to test something in a different PR and noticed I couldn't.

Also fixed some flake8 warnings I was seeing locally.

I replaced `lambda x: x` with `ident` since it seems a bit cleaner to me, but happy to revert that if others don't agree?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21499

Differential Revision: D15826974

Pulled By: nairbv

fbshipit-source-id: a3f37c80ba2303c7d9ae06241df06c7475b64e36
2019-06-14 10:47:33 -07:00
Thomas Viehmann
17941f9979 JIT: Eliminate SumToSize by using Optional Lists (#18697)
Summary:
This PR eliminates unneeded grad_sum_to_size and in particular speeds up the LSTM backward by allowing better fusion.

It consists of two parts:
- In AutoDiff, record broadcasting sizes only if the broadcast output size is different from the input size; otherwise record None.
- The specialization of Optional arguments (#18407) then allows us to eliminate `_grad_sum_to_size(t, None)` in the peephole optimization step.

Thus, in the LSTM case, no SumToSize remain in the crucial fusion group. The trick here is that we can specialize on the runtime information from the forward.

I'm testing that different broadcasting situations lead to different graphs.

I didn't move all symbolic_script _grad_sum_to_size to the new logic, but it might be better to do this incrementally, anyway.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18697

Differential Revision: D15482076

Pulled By: wanchaol

fbshipit-source-id: 7f89367e35b8729910077c95c02bccefc8678afb
2019-05-24 11:24:17 -07:00
Michael Suo
62af37aa88 dropout symbolic_script should respect the training flag (#20760)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20760
ghimport-source-id: eb667c3549a03a2fc01ffa0a2d3bc7e3a29b78e0

Reviewed By: jamesr66a

Differential Revision: D15486511

Pulled By: suo

fbshipit-source-id: 56ae930a01b0f6f4305a2a745135d4529b4a1ca0
2019-05-23 18:17:17 -07:00
Wanchao Liang
871c9dcb1d move batchnorm and layernorm fusion to decompose (#20337)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20337
ghimport-source-id: 2196f84f2ef384c1f25587b2fb4bd9dd2f63c2b4

Differential Revision: D15448596

Pulled By: wanchaol

fbshipit-source-id: b66e608f1b72471fc0775aaa4e09f9fa1070fc3c
2019-05-22 18:01:27 -07:00
Natalia Gimelshein
da3e74b21c define use_cuda in dropout backward to allow peephole optimization to… (#20289)
Summary:
… work
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20289

Differential Revision: D15350262

Pulled By: wanchaol

fbshipit-source-id: b457304688524822c1e6f23049e05472130c1ff4
2019-05-15 11:36:06 -07:00
Zachary DeVito
87a6974193 Make it possible for self.forward to return a ScriptMethod (#19217)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19217
ghimport-source-id: 6fdd7f5ac041dae950b47ca316f30682ede0b083

Reviewed By: suo

Differential Revision: D14922120

Pulled By: zdevito

fbshipit-source-id: 5e82e5d7ee72df6f401146d2519c80ea336ff40e
2019-04-24 11:14:34 -07:00
James Reed
5be4bee4ff Don't create FusionGroups for known-CPU producer values (#19342)
Summary:
I believe the existing check in FuseGraph was only `false` if PyTorch was built with NO_CUDA=1. Otherwise, we would create fusion groups even on a CPU-only machine running CPU code, which is confusing. Instead, I've made the decision to fuse dependent on whether the producer Value is a known CPU tensor; if it is, we skip fusion.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19342

Differential Revision: D15038351

Pulled By: jamesr66a

fbshipit-source-id: fce9d83929309a7bf14346833f84b996f3e7f6db
2019-04-22 16:57:18 -07:00
Wanchao Liang
a3d3008e73 JIT Layernorm fusion (#18266)
Summary:
Partially fuse layer_norm by decomposing it into the batchnorm kernel that computes the stats, and then fusing the affine operations that follow the reduction. This is similar to the batchnorm fusion that apaszke did; it also only works in inference mode for now.
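
A minimal sketch of the decomposition idea (eager-mode Python, illustrative only; the actual change operates on the JIT graph):

```python
import torch

def layer_norm_decomposed(x, weight, bias, eps: float = 1e-5):
    # Stage 1: reduction kernel computes the per-row statistics
    # (analogous to the batchnorm stats kernel).
    mean = x.mean(dim=-1, keepdim=True)
    var = x.var(dim=-1, unbiased=False, keepdim=True)
    # Stage 2: purely elementwise normalization + affine part -- this is the
    # portion that can be fused.
    return (x - mean) / torch.sqrt(var + eps) * weight + bias

x = torch.randn(2, 8)
w, b = torch.ones(8), torch.zeros(8)
print(torch.allclose(layer_norm_decomposed(x, w, b),
                     torch.nn.functional.layer_norm(x, (8,), w, b)))  # True (up to tolerance)
```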
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18266

Differential Revision: D14879877

Pulled By: wanchaol

fbshipit-source-id: 0197d8f2a17ec438d3e53f4c411d759c1ae81efe
2019-04-12 14:38:31 -07:00