Commit Graph

150 Commits

Author SHA1 Message Date
Bert Maher
a709ab34a8 [nnc] Re-enable CPU fusion" (#63665)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63665

This reverts commit 125e2d02e5.

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D30471646

Pulled By: bertmaher

fbshipit-source-id: 4189869566f03b5f9ada78d78830f6a34946eed6
2021-08-23 12:42:42 -07:00
Alban Desmaison
125e2d02e5 Revert D30417370: [nnc] Enable CPU fusion
Test Plan: revert-hammer

Differential Revision:
D30417370 (b9fc656cf2)

Original commit changeset: 84ce7a578a36

fbshipit-source-id: cd23774cdc3273fd72f8a05f1900eaf36f373e6b
2021-08-20 12:30:21 -07:00
Bert Maher
b9fc656cf2 [nnc] Enable CPU fusion (#63545)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63545

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D30417370

Pulled By: bertmaher

fbshipit-source-id: 84ce7a578a3678d5562bab99d1dc00330c4f72d1
2021-08-20 11:18:21 -07:00
Nikita Shulga
30214aef2d [BE] irangefy (#62928)
Summary:
Replace for loops with `irange`-based loops. Also fix some unused-variable warnings in range-loop cases.
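
As a rough illustration of the pattern (a sketch, not a line from the actual diff):

```cpp
#include <cstddef>
#include <iostream>
#include <vector>

#include <c10/util/irange.h>  // requires a PyTorch/c10 build

int main() {
  std::vector<int> v{3, 1, 4, 1, 5};

  // before: classic index loop; the index type and bounds are easy to get wrong
  for (std::size_t i = 0; i < v.size(); ++i) {
    std::cout << v[i] << ' ';
  }

  // after: irange-based loop; the index is const and the range is explicit
  for (const auto i : c10::irange(v.size())) {
    std::cout << v[i] << ' ';
  }
  std::cout << '\n';
}
```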

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62928

Reviewed By: driazati

Differential Revision: D30171904

Pulled By: malfet

fbshipit-source-id: 1b437a0f7e3515f4a2e324f3450e93312f1933ae
2021-08-07 13:34:13 -07:00
Richard Barnes
f883ed9095 irange-ify 8b (#62195)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62195

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D29887946

fbshipit-source-id: e3bd44721cf06a34ced47994810212be8460a2bb
2021-07-26 15:38:54 -07:00
Mike Guo
6ecc1a4c4f Make pytorch clang-tidy clean (#60649)
Summary:
This PR suppresses clang-tidy warnings in the codebase (for now) so that we can re-enable clang-tidy checks on master.

I ran this script to add the `NOLINTNEXTLINE` comments (on a devserver):
```bash
python3 setup.py develop

# Uses same script that's run on CI and adds the -j (parallel), -s (add comments), -k (continue if diagnostic errors are found) options
python3 tools/clang_tidy.py \
  -j \
  -s \
  -k \
  -v \
  --paths torch/csrc/ \
  -g"-torch/csrc/jit/passes/onnx/helper.cpp" \
  -g"-torch/csrc/jit/passes/onnx/shape_type_inference.cpp" \
  -g"-torch/csrc/jit/serialization/onnx.cpp" \
  -g"-torch/csrc/jit/serialization/export.cpp" \
  -g"-torch/csrc/jit/serialization/import.cpp" \
  -g"-torch/csrc/jit/serialization/import_legacy.cpp" \
  -g"-torch/csrc/onnx/init.cpp" \
  -g"-torch/csrc/cuda/nccl.*" \
  -g"-torch/csrc/cuda/python_nccl.cpp" \
  -g"-torch/csrc/autograd/FunctionsManual.cpp" \
  -g"-torch/csrc/generic/*.cpp" \
  -g"-torch/csrc/jit/codegen/cuda/runtime/*" \
  -g"-torch/csrc/deploy/interpreter/interpreter.cpp" \
  -g"-torch/csrc/deploy/interpreter/interpreter.h" \
  -g"-torch/csrc/deploy/interpreter/interpreter_impl.h" \
  -g"-torch/csrc/deploy/interpreter/test_main.cpp"
```
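
For reference, the `-s` option inserts per-line suppression comments; a made-up example of the resulting form (not a line from this diff):

```cpp
// NOLINTNEXTLINE silences the named clang-tidy check for the next line only,
// so the check stays active for the rest of the file.
// NOLINTNEXTLINE(cppcoreguidelines-avoid-magic-numbers)
constexpr int kDefaultTimeoutMs = 5000;

int main() {
  return kDefaultTimeoutMs == 5000 ? 0 : 1;
}
```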

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60649

Test Plan: Verified changes by re-running the script (without the `-s` option) and seeing no warnings/errors.

Reviewed By: walterddr, janeyx99

Differential Revision: D29504258

Pulled By: 1ntEgr8

fbshipit-source-id: 78310b30ee8213b73ddb4771ad874665323e7a4e
2021-07-01 12:21:07 -07:00
Richard Barnes
3979cb0656 irange for size_t (#55320)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55320

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D27572577

fbshipit-source-id: 97710fd2bb1303006b05828a0d1343b0b59ccb03
2021-06-03 01:04:13 -07:00
Nikita Shulga
3a66a1cb99 [clang-tidy] Exclude cppcoreguidelines-avoid-magic-numbers (#57841)
Summary:
Add cppcoreguidelines-avoid-magic-numbers exclusion to clang-tidy
Remove existing nolint suppressions using the following script:
```
for file in `git ls-files | grep -v \.py`; do gsed '/^ *\/\/ NOLINTNEXTLINE(cppcoreguidelines-avoid-magic-numbers)/d' -i  $file; done
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57841

Reviewed By: samestep

Differential Revision: D28295045

Pulled By: malfet

fbshipit-source-id: 7c6e8d1213c9593f169ed3df6a916498f1a97163
2021-05-07 20:02:33 -07:00
Nikita Shulga
4cb534f92e Make PyTorch code-base clang-tidy compliant (#56892)
Summary:
This is an automatic change generated by the following script:
```
#!/usr/bin/env python3
from subprocess import check_output, check_call
import os

def get_compiled_files_list():
    import json
    with open("build/compile_commands.json") as f:
        data = json.load(f)
    files = [os.path.relpath(node['file']) for node in data]
    for idx, fname in enumerate(files):
        if fname.startswith('build/') and fname.endswith('.DEFAULT.cpp'):
            files[idx] = fname[len('build/'):-len('.DEFAULT.cpp')]
    return files

def run_clang_tidy(fname):
    check_call(["python3", "tools/clang_tidy.py", "-c", "build", "-x", fname,"-s"])
    changes = check_output(["git", "ls-files", "-m"])
    if len(changes) == 0:
        return
    check_call(["git", "commit","--all", "-m", f"NOLINT stubs for {fname}"])

def main():
    git_files = check_output(["git", "ls-files"]).decode("ascii").split("\n")
    compiled_files = get_compiled_files_list()
    for idx, fname in enumerate(git_files):
        if fname not in compiled_files:
            continue
        if fname.startswith("caffe2/contrib/aten/"):
            continue
        print(f"[{idx}/{len(git_files)}] Processing {fname}")
        run_clang_tidy(fname)

if __name__ == "__main__":
    main()
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56892

Reviewed By: H-Huang

Differential Revision: D27991944

Pulled By: malfet

fbshipit-source-id: 5415e1eb2c1b34319a4f03024bfaa087007d7179
2021-04-28 14:10:25 -07:00
Mike Ruberry
c0ac0fef4e Revert D27448156: irange for size_t
Test Plan: revert-hammer

Differential Revision:
D27448156 (041b4431b2)

Original commit changeset: 585da57d4de9

fbshipit-source-id: 8e047c29f391c0166e0a1a87c3fb2a0854377365
2021-04-03 19:14:00 -07:00
Richard Barnes
041b4431b2 irange for size_t (#55163)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55163

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D27448156

fbshipit-source-id: 585da57d4de91c692b6360d65f7b8a66deb0f8c1
2021-04-02 23:22:29 -07:00
Edward Yang
c782949e17 Make the fuser raise NotImplementedError when unknown device is hit (#54709)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54709

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
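
A minimal sketch of the intended behavior (hypothetical helper, not the actual fuser code; assumes c10's `TORCH_CHECK_NOT_IMPLEMENTED` macro):

```cpp
#include <c10/core/Device.h>
#include <c10/util/Exception.h>

// Hypothetical dispatch: known devices get a codegen path; anything else
// surfaces as NotImplementedError instead of a generic internal assert.
void compileFusionFor(const c10::Device& device) {
  if (device.is_cuda()) {
    // ... CUDA codegen ...
  } else if (device.is_cpu()) {
    // ... CPU codegen ...
  } else {
    TORCH_CHECK_NOT_IMPLEMENTED(
        false, "fuser: no codegen for device ", device);
  }
}
```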

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D27338815

Pulled By: ezyang

fbshipit-source-id: 5cbaf3c19b9b85cc3f171f3b405d0cd98f832e65
2021-03-27 11:55:47 -07:00
chengjun
4a8ef4525e Add new backend type for Intel heterogeneous computation platform. (#49786)
Summary:
Add a new device type, 'XPU' ('xpu' in lower case), to PyTorch. Changes are needed in code related to the device model and kernel dispatch, e.g. DeviceType, Backend, and DispatchKey.

https://github.com/pytorch/pytorch/issues/48246
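
A small usage sketch of the new device type (post-merge behavior; actually executing kernels on XPU still requires a backend extension that registers them):

```cpp
#include <c10/core/Device.h>
#include <c10/core/DeviceType.h>

int main() {
  // 'xpu' now parses like any other device string
  c10::Device d("xpu:0");
  return d.type() == c10::DeviceType::XPU ? 0 : 1;
}
```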

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49786

Reviewed By: mrshenli

Differential Revision: D25893962

Pulled By: ezyang

fbshipit-source-id: 7ff0a316ee34cf0ed6fc7ead08ecdeb7df4b0052
2021-01-20 08:15:18 -08:00
Scott Wolchok
4a0d17ba2d [PyTorch][codemod] Replace immediately-dereferenced expect calls w/expectRef (#50228)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50228

`fastmod -m 'expect(<((at|c10)::)?\w+Type>\(\)\s*)->'
'expectRef${1}.'`
Presuming it builds, this is a safe change: the result of `expect()` wasn't being saved anywhere, so we can take a reference instead of creating a new `shared_ptr`.
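
Spelled out, the pattern looks like this (illustrative; `dim()` stands in for whatever method was being called on the expected type):

```cpp
#include <ATen/core/jit_type.h>

// expect<T>() materializes a shared_ptr<T> that is immediately dereferenced
// and dropped; expectRef<T>() borrows a const reference and skips the
// refcount bump.
void example(const c10::TypePtr& t) {
  // before
  auto dim_before = t->expect<c10::TensorType>()->dim();
  // after
  auto dim_after = t->expectRef<c10::TensorType>().dim();
  (void)dim_before;
  (void)dim_after;
}
```
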
ghstack-source-id: 119782961

Test Plan: CI

Reviewed By: SplitInfinity

Differential Revision: D25837374

fbshipit-source-id: 86757b70b1520e3dbaa141001e7976400cdd3b08
2021-01-13 16:13:55 -08:00
Meghan Lele
8746e1a1cc [JIT] Fix clang-tidy warnings in jit/passes (#47984)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47984

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D25258638

Pulled By: SplitInfinity

fbshipit-source-id: 0ed5ef6984ba988a2c67407efcc77355ca25bbee
2020-12-02 12:35:34 -08:00
generatedunixname89002005325676
671ee71ad4 [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D25158667

fbshipit-source-id: 3b2a7facbfbfaaabc2cb5ac22906673b17fd0f15
2020-11-23 05:03:53 -08:00
Elias Ellison
a00ba63023 Disable old fuser internally (#48322)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48322

Disable the old fuser internally. I would like to find where we are inadvertently enabling the old fuser, but in the meantime I would like to land a diff that I know will 100% prevent it from running, and verify that it fixes the issue.

Test Plan: sandcastle

Reviewed By: ZolotukhinM

Differential Revision: D25126202

fbshipit-source-id: 5a4d0742f5f829e536f50e7ede1256c94dd05232
2020-11-21 00:42:23 -08:00
jiej
4360486346 pass strict_fuser_check for recursive fusion (#47221)
Summary:
We forgot to pass `strict_fuser_check` recursively to nested GraphFuser.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47221

Reviewed By: zhangguanheng66

Differential Revision: D25060095

Pulled By: Krovatkin

fbshipit-source-id: 31fe79c3bc080b637fce9aacc562d60708223321
2020-11-18 16:57:38 -08:00
Nikolay Korovaiko
993628c74a Build shape expressions and remove outputs that are only used by aten::sizes (#45080)
Summary:
Currently, TE materializes all intermediate results even if they are only used for computing their shapes. This diff ports the approach the OF (Old Fuser) took to deal with this issue: given the structure of a fusion group, we infer all the sizes outside the fusion group based on the fusion group's inputs.

A simple example would be:

```
        def test_fuse(a, b):
            c = a + b
            d = c + b
            return d
```

Here we don't need to cache `c`, as computing a gradient for `b` in `d = c + b` doesn't need it. We do need to compute sizes for all arguments here in case broadcasts happen.

Without this optimization, TE would need to materialize `c` so we can get its size:

```
[DUMP profiling_graph_executor_impl.cpp:499] Optimized Graph:
[DUMP profiling_graph_executor_impl.cpp:499] graph(%a.1 : Tensor,
[DUMP profiling_graph_executor_impl.cpp:499]       %b.1 : Tensor):
[DUMP profiling_graph_executor_impl.cpp:499]   %11 : Tensor = prim::DifferentiableGraph_0(%b.1, %a.1)
[DUMP profiling_graph_executor_impl.cpp:499]   return (%11)
[DUMP profiling_graph_executor_impl.cpp:499] with prim::DifferentiableGraph_0 = graph(%11 : Tensor,
[DUMP profiling_graph_executor_impl.cpp:499]       %13 : Tensor):
[DUMP profiling_graph_executor_impl.cpp:499]   %59 : int[] = aten::size(%13) # <string>:3:44
[DUMP profiling_graph_executor_impl.cpp:499]   %62 : int[] = aten::size(%11) # <string>:3:93
[DUMP profiling_graph_executor_impl.cpp:499]   %83 : Double(1:1, requires_grad=0, device=cuda:0), %84 : Double(1:1, requires_grad=0, device=cuda:0), %85 : bool = prim::TypeCheck(%11, %13)
[DUMP profiling_graph_executor_impl.cpp:499]   %86 : Tensor, %87 : Tensor = prim::If(%85)
[DUMP profiling_graph_executor_impl.cpp:499]     block0():
[DUMP profiling_graph_executor_impl.cpp:499]       %d.4 : Double(1:1, requires_grad=0, device=cuda:0), %c.4 : Double(1:1, requires_grad=0, device=cuda:0) = prim::TensorExprGroup_0(%83, %84)
[DUMP profiling_graph_executor_impl.cpp:499]       -> (%d.4, %c.4)
[DUMP profiling_graph_executor_impl.cpp:499]     block1():
[DUMP profiling_graph_executor_impl.cpp:499]       %94 : Function = prim::Constant[name="fallback_function", fallback=1]()
[DUMP profiling_graph_executor_impl.cpp:499]       %95 : (Tensor, Tensor) = prim::CallFunction(%94, %11, %13)
[DUMP profiling_graph_executor_impl.cpp:499]       %96 : Tensor, %97 : Tensor = prim::TupleUnpack(%95)
[DUMP profiling_graph_executor_impl.cpp:499]       -> (%96, %97)
[DUMP profiling_graph_executor_impl.cpp:499]   %60 : int[] = aten::size(%87) # <string>:3:55
[DUMP profiling_graph_executor_impl.cpp:499]   %61 : int[]? = aten::_size_if_not_equal(%59, %60) # <string>:3:19
[DUMP profiling_graph_executor_impl.cpp:499]   %64 : int[]? = aten::_size_if_not_equal(%62, %60) # <string>:3:68
[DUMP profiling_graph_executor_impl.cpp:499]   %67 : int[] = aten::size(%86) # <string>:3:55
[DUMP profiling_graph_executor_impl.cpp:499]   %68 : int[]? = aten::_size_if_not_equal(%60, %67) # <string>:3:19
[DUMP profiling_graph_executor_impl.cpp:499]   %71 : int[]? = aten::_size_if_not_equal(%62, %67) # <string>:3:68
[DUMP profiling_graph_executor_impl.cpp:499]   return (%86, %61, %64, %68, %71)
[DUMP profiling_graph_executor_impl.cpp:499] with prim::TensorExprGroup_0 = graph(%1 : Double(1:1, requires_grad=0, device=cuda:0),
[DUMP profiling_graph_executor_impl.cpp:499]       %4 : Double(1:1, requires_grad=0, device=cuda:0)):
[DUMP profiling_graph_executor_impl.cpp:499]   %5 : int = prim::Constant[value=1]()
[DUMP profiling_graph_executor_impl.cpp:499]   %c.3 : Double(1:1, requires_grad=0, device=cuda:0) = aten::add(%4, %1, %5) # /scratch/villedepommes/pytorches/bench/test/test_jit.py:2872:16
[DUMP profiling_graph_executor_impl.cpp:499]   %2 : int = prim::Constant[value=1]()
[DUMP profiling_graph_executor_impl.cpp:499]   %d.3 : Double(1:1, requires_grad=0, device=cuda:0) = aten::add(%c.3, %1, %2) # /scratch/villedepommes/pytorches/bench/test/test_jit.py:2873:16
[DUMP profiling_graph_executor_impl.cpp:499]   return (%d.3, %c.3)
```

With this optimization we use `prim::BroadcastSizes` to compute the size of `c`. No need to materialize it.

```
[DUMP profiling_graph_executor_impl.cpp:499] Optimized Graph:
[DUMP profiling_graph_executor_impl.cpp:499] graph(%a.1 : Tensor,
[DUMP profiling_graph_executor_impl.cpp:499]       %b.1 : Tensor):
[DUMP profiling_graph_executor_impl.cpp:499]   %11 : Tensor = prim::DifferentiableGraph_0(%b.1, %a.1)
[DUMP profiling_graph_executor_impl.cpp:499]   return (%11)
[DUMP profiling_graph_executor_impl.cpp:499] with prim::DifferentiableGraph_0 = graph(%11 : Tensor,
[DUMP profiling_graph_executor_impl.cpp:499]       %13 : Tensor):
[DUMP profiling_graph_executor_impl.cpp:499]   %59 : int[] = aten::size(%13) # <string>:3:44
[DUMP profiling_graph_executor_impl.cpp:499]   %62 : int[] = aten::size(%11) # <string>:3:93
[DUMP profiling_graph_executor_impl.cpp:499]   %88 : Double(1:1, requires_grad=0, device=cuda:0), %89 : Double(1:1, requires_grad=0, device=cuda:0), %90 : bool = prim::TypeCheck(%11, %13)
[DUMP profiling_graph_executor_impl.cpp:499]   %91 : Tensor = prim::If(%90)
[DUMP profiling_graph_executor_impl.cpp:499]     block0():
[DUMP profiling_graph_executor_impl.cpp:499]       %d.4 : Double(1:1, requires_grad=0, device=cuda:0) = prim::TensorExprGroup_0(%88, %89)
[DUMP profiling_graph_executor_impl.cpp:499]       -> (%d.4)
[DUMP profiling_graph_executor_impl.cpp:499]     block1():
[DUMP profiling_graph_executor_impl.cpp:499]       %97 : Function = prim::Constant[name="fallback_function", fallback=1]()
[DUMP profiling_graph_executor_impl.cpp:499]       %98 : (Tensor) = prim::CallFunction(%97, %11, %13)
[DUMP profiling_graph_executor_impl.cpp:499]       %99 : Tensor = prim::TupleUnpack(%98)
[DUMP profiling_graph_executor_impl.cpp:499]       -> (%99)
[DUMP profiling_graph_executor_impl.cpp:499]   %85 : int[] = aten::size(%91)
[DUMP profiling_graph_executor_impl.cpp:499]   %86 : int[] = prim::BroadcastSizes(%59, %62)
[DUMP profiling_graph_executor_impl.cpp:499]   %61 : int[]? = aten::_size_if_not_equal(%59, %86) # <string>:3:19
[DUMP profiling_graph_executor_impl.cpp:499]   %64 : int[]? = aten::_size_if_not_equal(%62, %86) # <string>:3:68
[DUMP profiling_graph_executor_impl.cpp:499]   %68 : int[]? = aten::_size_if_not_equal(%86, %85) # <string>:3:19
[DUMP profiling_graph_executor_impl.cpp:499]   %71 : int[]? = aten::_size_if_not_equal(%62, %85) # <string>:3:68
[DUMP profiling_graph_executor_impl.cpp:499]   return (%91, %61, %64, %68, %71)
[DUMP profiling_graph_executor_impl.cpp:499] with prim::TensorExprGroup_0 = graph(%1 : Double(1:1, requires_grad=0, device=cuda:0),
[DUMP profiling_graph_executor_impl.cpp:499]       %4 : Double(1:1, requires_grad=0, device=cuda:0)):
[DUMP profiling_graph_executor_impl.cpp:499]   %5 : int = prim::Constant[value=1]()
[DUMP profiling_graph_executor_impl.cpp:499]   %c.3 : Double(1:1, requires_grad=0, device=cuda:0) = aten::add(%4, %1, %5) # /scratch/villedepommes/pytorches/bench/test/test_jit.py:2872:16
[DUMP profiling_graph_executor_impl.cpp:499]   %2 : int = prim::Constant[value=1]()
[DUMP profiling_graph_executor_impl.cpp:499]   %d.3 : Double(1:1, requires_grad=0, device=cuda:0) = aten::add(%c.3, %1, %2) # /scratch/villedepommes/pytorches/bench/test/test_jit.py:2873:16
[DUMP profiling_graph_executor_impl.cpp:499]   return (%d.3)
```
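
For intuition, `prim::BroadcastSizes` just applies the standard broadcasting rule to the input sizes; a minimal standalone sketch (the real logic lives in ATen's size inference):

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Align sizes at the trailing dimension; a dimension broadcasts iff the two
// sizes match or one of them is 1.
std::vector<int64_t> broadcastSizes(
    const std::vector<int64_t>& a,
    const std::vector<int64_t>& b) {
  std::vector<int64_t> out(std::max(a.size(), b.size()));
  for (std::size_t i = 0; i < out.size(); ++i) {
    const std::size_t pad_a = out.size() - a.size();
    const std::size_t pad_b = out.size() - b.size();
    const int64_t da = i < pad_a ? 1 : a[i - pad_a];
    const int64_t db = i < pad_b ? 1 : b[i - pad_b];
    if (da != db && da != 1 && db != 1) {
      throw std::runtime_error("sizes are not broadcastable");
    }
    out[i] = std::max(da, db);
  }
  return out;
}
```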

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45080

Reviewed By: bertmaher

Differential Revision: D23856410

Pulled By: Krovatkin

fbshipit-source-id: 2956286eb03a4894a5baa151c35e6092466322b1
2020-09-28 10:45:56 -07:00
Will Constable
94e8676a70 Initialize uninitialized variable (#42419)
Summary:
Fixes internal T70924595

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42419

Reviewed By: allwu, Krovatkin

Differential Revision: D22889325

Pulled By: wconstab

fbshipit-source-id: 108b6a6c6bb7c98d77e22bae9974a6c00bc296f0
2020-08-04 11:35:54 -07:00
Yanan Cao
bdcf320bed Support custom exception message (#41907)
Summary:
Raise and assert used to have a hard-coded error message, "Exception"; the user-provided error message was ignored. This PR adds support for representing the user's error message in TorchScript.

This breaks backward compatibility because now we actually need to script the user's error message, which can potentially contain unscriptable expressions. Such programs can break when scripting, but saved models still continue to work.

Increased an op count in test_mobile_optimizer.py because we now need aten::format to form the actual exception message.

This is built upon a WIP PR: https://github.com/pytorch/pytorch/pull/34112 by driazati
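
A hedged sketch of what this enables, driven from C++ via `torch::jit::compile` (the script body is ordinary TorchScript; the invocation API is elided):

```cpp
#include <torch/jit.h>

int main() {
  // After this change, the user-written message (including the formatted
  // argument, via aten::format) is scripted and carried into the thrown
  // error instead of being replaced by a bare "Exception".
  auto cu = torch::jit::compile(R"JIT(
def check(x: int) -> int:
    if x < 0:
        raise ValueError("expected a non-negative value, got {}".format(x))
    return x
)JIT");
  return cu != nullptr ? 0 : 1;
}
```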

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41907

Reviewed By: ngimel

Differential Revision: D22778301

Pulled By: gmagogsfm

fbshipit-source-id: 2b94f0db4ae9fe70c4cd03f4048e519ea96323ad
2020-08-01 13:03:45 -07:00
Will Constable
4c7fb8c2b6 make FusionCallback refer to specified GraphFuser context (#41560)
Summary:
Fixes an issue where:
 - the top-level fuser's block_ was captured by the callback due to [&] capture,
 - recursive/nested fusers would erroneously compare against the top-level block_ instead of their own block_ (a reduced sketch follows below).

Closes (https://github.com/pytorch/pytorch/issues/39810)
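
A reduced illustration of the capture problem (a toy, not the fuser's real types): a callback built with `[&]` stays bound to the fuser that created it, so nested fusers need a callback bound to their own context.

```cpp
#include <functional>

struct ToyFuser {
  int block_;

  // buggy: [&] captures this fuser's block_ by reference, so reusing the
  // callback from a nested fuser still compares against the creator's block
  std::function<bool(int)> buggyCallback() {
    return [&](int candidate) { return candidate == block_; };
  }

  // fixed: bind the value for the intended context at construction time
  std::function<bool(int)> fixedCallback() {
    const int own_block = block_;
    return [own_block](int candidate) { return candidate == own_block; };
  }
};
```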

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41560

Reviewed By: Krovatkin

Differential Revision: D22583196

Pulled By: wconstab

fbshipit-source-id: 8f543cd9ea00e116cf3e776ab168cdd9fed69632
2020-07-28 15:01:24 -07:00
Xiaomeng Yang
80d5b3785b Add torch.logit function (#41062)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41062

Add torch.logit function
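
A brief usage sketch: `logit` computes log(p / (1 - p)), the inverse of sigmoid, and the optional eps clamps the input away from 0 and 1 to avoid infinities.

```cpp
#include <torch/torch.h>
#include <iostream>

int main() {
  auto p = torch::tensor({0.25, 0.5, 0.75});
  auto x = torch::logit(p);             // log(p / (1 - p))
  auto x_safe = torch::logit(p, 1e-6);  // clamps p to [eps, 1 - eps] first
  std::cout << x << "\n" << x_safe << "\n";
}
```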

Test Plan: buck test mode/dev-nosan //caffe2/test:torch -- "logit"

Reviewed By: hl475

Differential Revision: D22406912

fbshipit-source-id: b303374f4c68850eb7477eb0645546a24b844606
2020-07-13 19:33:20 -07:00
Michael Suo
9e32a1f5cd [wip] update graph fuser aliasdb in-place (#37106)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37106

Recomputing the aliasdb on every fusion iteration + in every subblock
is hugely expensive. Instead, update it in-place when doing fusion.

The graph fuser pass operates by pushing nodes into a fusion group. So
we start with
```
x, y = f(a, b, c)
```

and end with:
```
x_out, y_out = prim::fusionGroup(a, b, c)
   x_in, y_in = f(a_in, b_in, c_in)
   -> x_in, y_in
```

We destroy the `x` and `y` `Value*`s in the process. This operation is easy to express as an update to the AliasDb: `x_out` just takes on all the aliasing information `x` used to have. In particular, since we know `f` and `prim::fusionGroup` are purely functional, we don't have to mess with any write information.
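
Schematically (toy types and a hypothetical method name; the real API lives on the JIT's AliasDb):

```cpp
#include <string>
#include <unordered_map>
#include <utility>

// Toy model of the in-place update: when fusion replaces value `x` with
// `x_out`, move x's aliasing record onto x_out instead of recomputing the
// whole database.
struct ToyAliasDb {
  std::unordered_map<std::string, std::string> alias_records;

  void replaceWithNewValue(const std::string& existing,
                           const std::string& replacement) {
    alias_records[replacement] = std::move(alias_records.at(existing));
    alias_records.erase(existing);
  }
};
```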

This PR is the bare minimum to get this working, in the interest of
unscrewing the compilation times ASAP.

Followups I want to do:
- We don't have a way of expressing deletion of values in AliasDb. In
`graph_fuser.cpp` we sometimes construct nodes that we end up throwing
away, and we are littering `MemoryDAG` with references to dangling
pointers. Because of the way the pass works, it's fine, but this is
fragile so I want to fix it.
- We should decouple alias analysis from write tracking, to simplify the
job of keeping the write caches consistent as we mutate the aliasing
information.
- The tensorexpr fuser doesn't do this and is thus incorrect today; we need to update it to work.

Test Plan: Imported from OSS

Differential Revision: D21219179

Pulled By: suo

fbshipit-source-id: 8ae5397b3a0ad90edec2fbc555647091f1ad5284
2020-04-30 22:21:35 -07:00
Meghan Lele
6384c2d81b [JIT] clang-format JIT code (#35115)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35115

This commit runs the newly added tools/clang_format.py on the JIT
codebase and includes all of the formatting changes thus produced.

Testing:
Ran the script, CI.

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D20568523

Pulled By: SplitInfinity

fbshipit-source-id: e09bdb982ccf090eecfb7c7b461b8d0681eef82b
2020-03-26 11:24:51 -07:00
Basil Hosmer
ad769d74d9 Collapse _like overloads into a single overload. (#33705)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33705

The fact that there were two overloads appears to be a historical
artifact that dates back to when goldsborough originally added these
bindings in the first place.  If TensorOptions is made optional,
then you only need one overload, not two, as they are exactly redundant
with each other. When MemoryFormat was added, this became a little harder to do, as the C++ syntax at::empty_like(t, memory_format) would not work if the overloads were collapsed; but now it works because TensorOptions supports MemoryFormat.

The upshot is, I can get rid of all the overloads and just have one overload.
Amazingly, this change is backwards compatible, as the test attests.  While
I was at it, I also deleted the overload name from the functions entirely.
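
Both spellings now resolve to the single remaining overload; a small sketch (TensorOptions' MemoryFormat support is what makes the second call compile):

```cpp
#include <torch/torch.h>

int main() {
  auto t = torch::rand({2, 3, 4, 5});
  auto a = torch::empty_like(t);
  auto b = torch::empty_like(t, torch::MemoryFormat::ChannelsLast);
  return a.sizes() == b.sizes() ? 0 : 1;
}
```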

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D20073355

Pulled By: bhosmer

fbshipit-source-id: c6a8908213b32ccf6737ea864d135e2cce34f56b
2020-03-01 19:40:22 -08:00
Michael Suo
dbe850af5b [jit] do the code reorg (#33851)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33851

Rationale and context described in #33828.

Script to reproduce the move:
https://gist.github.com/suo/16cbefaaeb67ca5a7c6caffd49b7f6e9
ghstack-source-id: 99079645

Test Plan: Make sure CI passes

Reviewed By: jamesr66a

Differential Revision: D20133869

fbshipit-source-id: 390e9241a9c85366d9005c492ac31f10aa96488e
2020-02-27 13:02:51 -08:00
Nikolay Korovaiko
a943b0518b strict check for a device type in Fuser (#33025)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33025

Differential Revision: D19975873

Pulled By: Krovatkin

fbshipit-source-id: 57f160bec9e4285dda63611f12665264754aac32
2020-02-20 23:53:27 -08:00
Mikhail Zolotukhin
806e7daa1f Rename TorchScript compiler to IR emitter to better reflect its function. (#33127)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33127

Test Plan: Imported from OSS

Differential Revision: D19806503

Pulled By: ZolotukhinM

fbshipit-source-id: ab78bdbbac5f12dbcc6c2e2573f5862a16ffcf3d
2020-02-12 18:45:13 -08:00
Zachary DeVito
72a00a8a9c Remove Node dependencies from operator.h (#32682)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32682

This moves code around so that operator.h/cpp no longer requires a full
definition of Node* nor does it include alias analysis or the pretty printer.

This should make it possible to include in the mobile build.

Functionality for checking whether operators match a Node, and for looking up an operator for a Node, has moved to the Node object.

Test Plan: Imported from OSS

Differential Revision: D19615386

Pulled By: zdevito

fbshipit-source-id: e38bdf29971183597ef940d061c06ba56e71d9c5
2020-02-12 14:47:26 -08:00
comet
9a2691f2fc Fix spelling errors
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/32673

Differential Revision: D19597118

Pulled By: pietern

fbshipit-source-id: f88c1da7548fcee141ed248f5f49d25c1d639955
2020-01-28 04:46:15 -08:00
Vitaly Fedyunin
4bfe2f0900 Fix jit outplace tracing and reapply changes to *_like operators. (#28839)
Summary:
Reapply reverted and fix files `gen_variable_type.py` `test_jit.py`

https://github.com/pytorch/pytorch/issues/27891 Cleanup testing of _like operators
https://github.com/pytorch/pytorch/issues/27890 Add memory format support to randn_like operator
https://github.com/pytorch/pytorch/issues/27889 Add memory format support to randint_like operator
https://github.com/pytorch/pytorch/issues/27562 Add memory format support to zeros_like operator
https://github.com/pytorch/pytorch/issues/27561 Add memory format support to rand_like operator
https://github.com/pytorch/pytorch/issues/27270 Add memory format support to ones_like operator
https://github.com/pytorch/pytorch/issues/27262 Add memory format support to full_like operator
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28839

Test Plan:
Imported from GitHub, without a `Test Plan:` line.

buck test mode/dev //language_technology/neural_mt/os/pytorch_translate/test:test_onnx -- 'test_forced_decoder_export_vocab_reduction \(language_technology\.neural_mt\.os\.pytorch_translate\.test\.test_onnx\.TestONNX\)'

Differential Revision: D18203397

Pulled By: VitalyFedyunin

fbshipit-source-id: eea41cbd4c232cf5a54172b1e1b16b173798f298
2019-10-31 13:23:08 -07:00
Vitaly Fedyunin
266c1652e6 Back out "Add memory format support to rand_like operator" (#28801)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28801

Original commit changeset: 2a1d47571268
ghstack-source-id: 92748792

Test Plan: buck test language_technology/neural_mt/os/pytorch_translate/test:test_onnx

Reviewed By: ifedan

Differential Revision: D18175304

fbshipit-source-id: ffd61f6e42f256b39b80a6b42d989c238228f25d
2019-10-28 12:44:45 -07:00
Vitaly Fedyunin
04f5325583 Add memory format support to rand_like operator (#27561)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27561

Adds a memory_format keyword argument (positional for C++).

'Preserve' behavior now follows these rules:
1) If the tensor is non-overlapping and dense, the output tensor will have the same strides as the input tensor.
2) Otherwise, if the tensor is stored in the channels-last format, the output tensor will have channels-last format.
3) The output tensor will be contiguous in all other cases.

 ---
A dense tensor is one that stores its values in a contiguous block of memory.
A non-overlapping tensor is one in which each element occupies its own distinct memory location.
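
A sketch of rule (2), using the positional C++ memory_format argument mentioned above:

```cpp
#include <torch/torch.h>

int main() {
  // an NCHW tensor rewritten to channels-last strides
  auto t = torch::empty({2, 3, 4, 5})
               .contiguous(torch::MemoryFormat::ChannelsLast);

  // Preserve: the random output keeps the channels-last layout of its input
  auto r = torch::rand_like(t, torch::MemoryFormat::Preserve);
  return r.is_contiguous(torch::MemoryFormat::ChannelsLast) ? 0 : 1;
}
```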

Test Plan: Imported from OSS

Differential Revision: D17980316

Pulled By: VitalyFedyunin

fbshipit-source-id: 2a1d47571268673de0c6f5ae1b6d4f9110962ab0
2019-10-25 07:29:12 -07:00
Mike Ruberry
ac7996ccd3 Removes SymbolicVariable (#25077)
Summary:
This PR excises the last of SymbolicVariable. There should be no change in functionality. One new test for addmm fusion was added. A case where the peephole optimizer might convert a scalar argument remains untested.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25077

Test Plan: Refactors existing code so mostly covered by current tests. One test for addmm fusion was added.

Differential Revision: D17145334

Pulled By: mruberry

fbshipit-source-id: 6b68faf764f9ee8398b55c43110228ed9faf81eb
2019-08-31 11:19:50 -07:00
Zachary DeVito
bdc57d3833 Merge ProfiledTensorType and TensorType (#24284)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24284

This PR finishes the unification of all Tensor types into a single object.
ProfiledTensorType is renamed to TensorType and the old TensorType is
deleted.

Notes:
* Fixes bug in merge for VaryingShape by changing its representation to an
 optional list of optional ints.
* Removes ProfiledTensorType::create(type) invocations that can now
  simply be expect calls on tensor type.

Test Plan: Imported from OSS

Differential Revision: D16794034

Pulled By: zdevito

fbshipit-source-id: 10362398d0bb166d0d385d74801e95d9b87d9dfc
2019-08-20 13:01:28 -07:00
Nikolay Korovaiko
3d15ee1b34 Remove more uses of DimensionedTensorType
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23060

Differential Revision: D16460391

Pulled By: Krovatkin

fbshipit-source-id: b50ee87d22ad18b8cbfff719b199ea876ef172f1
2019-08-01 21:19:28 -07:00
Thomas Viehmann
cf50249bde Disable fusion of grad_sum_to_size (#23372)
Summary:
Fixes: https://github.com/pytorch/pytorch/issues/22833

grad_sum_to_size does not commute with AutogradAdd after all because it turns the broadcasting AutogradAdd into a broadcasting add.

Chillee actually did most of the work tracking this down to the fusion of grad_sum_to_size, and pinged me when he found the cause. Thank you!

About the choice of removing the fusion completely instead of being more precise:
- We do have grad_sum_to_size elimination which works for cases where broadcasting does not actually happen in the forward, so the cases where the fusing of grad_sum_to_size is actually beneficial is much smaller than when initially proposed.
- There will be less fusion; in terms of the tests, IOU stops being fully fused. I vaguely think it is a case we could handle with refined logic.
- Keeping it would add complexity in checking when to merge fusion groups to the complexities that this PR removes.
- The future of fusion probably lies more in more complete solutions including reductions (TVM or KeOps or our own or ...).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23372

Differential Revision: D16489930

Pulled By: soumith

fbshipit-source-id: bc0431b0d3eda264c401b634675872c4ce46f0f4
2019-07-25 08:55:33 -07:00
jjsjann123
252710262f (#22775)
Summary:
Pass FusionCallback and Symbol to recursive GraphFuser calls. This ensures consistent fusion in nested Blocks.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22775

Differential Revision: D16439979

Pulled By: soumith

fbshipit-source-id: 18d4b13f52b03708b8580c73f75450adbb672ac1
2019-07-25 05:54:03 -07:00
Bram Wasti
05d56bd1b6 Remove hard-coded NVRTC specific constant from fuser header
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22699

Test Plan: Imported from OSS

Differential Revision: D16192290

Pulled By: bwasti

fbshipit-source-id: 4dccaf3e6e0151e86d35474c36e1ddb7f2afb5cf
2019-07-11 13:44:25 -07:00
Thomas Viehmann
17941f9979 JIT: Eliminate SumToSize by using Optional Lists (#18697)
Summary:
This PR eliminates unneeded grad_sum_to_size calls and in particular speeds up the LSTM backward by allowing better fusion.

It consists of two parts:
- In AutoDiff, record broadcasting sizes only if the broadcast output size is different from the input size, otherwise record None.
- The specialization of Optional arguments (#18407) then allows us to eliminate `_grad_sum_to_size(t, None)` in the peephole optimization step.

Thus, in the LSTM case, no SumToSize ops remain in the crucial fusion group. The trick here is that we can specialize on the runtime information from the forward.

I'm testing that different broadcasting situations lead to different graphs.

I didn't move all symbolic_script _grad_sum_to_size to the new logic, but it might be better to do this incrementally, anyway.
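
A sketch of the peephole rule this enables (simplified from the actual pass; the header path is from the current tree, and exact names should be treated as approximate):

```cpp
#include <torch/csrc/jit/ir/ir.h>

using namespace torch::jit;

// _grad_sum_to_size(t, None) is the identity, so forward t and drop the node.
void eliminateTrivialGradSumToSize(Node* node) {
  if (node->kind() == aten::_grad_sum_to_size &&
      node->input(1)->mustBeNone()) {
    node->output()->replaceAllUsesWith(node->input(0));
    node->destroy();
  }
}
```
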
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18697

Differential Revision: D15482076

Pulled By: wanchaol

fbshipit-source-id: 7f89367e35b8729910077c95c02bccefc8678afb
2019-05-24 11:24:17 -07:00
Wanchao Liang
871c9dcb1d move batchnorm and layernorm fusion to decompose (#20337)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20337
ghimport-source-id: 2196f84f2ef384c1f25587b2fb4bd9dd2f63c2b4

Differential Revision: D15448596

Pulled By: wanchaol

fbshipit-source-id: b66e608f1b72471fc0775aaa4e09f9fa1070fc3c
2019-05-22 18:01:27 -07:00
Bram Wasti
7b733e4fc1 Rebase conflict fix for isFusableDevice (#20251)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20251
ghimport-source-id: 0c8c1847a7979fcd77e4f6618730b170b6b8ce25

Differential Revision: D15262850

Pulled By: bwasti

fbshipit-source-id: 17ecc340a310ddbcce141cfa3ee0efa9660194d2
2019-05-08 12:14:12 -07:00
Bram Wasti
4ca325df87 Add Custom graph fusion (#18588)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18588
ghimport-source-id: f40df177af8b87c73f04bf337f478a62133284cf

Differential Revision: D14901297

Pulled By: bwasti

fbshipit-source-id: 1b6371a5175b3d63dad542b7cc22cb82e8c6cfd0
2019-05-06 23:15:16 -07:00
Mikhail Zolotukhin
8b46938355 Cleanup includes in torch/csrc/jit/* (#19922)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19922
ghimport-source-id: 0434c46bf75621ff79ea27a18a2475e7f13e2487

Differential Revision: D15125015

Pulled By: ZolotukhinM

fbshipit-source-id: 5685edfc94067f62e363a85e9badb7f757b1d321
2019-05-06 13:40:26 -07:00
Zachary DeVito
a425e1cbf8 Remove duplicate inlineCallToCode (#19724)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19724
ghimport-source-id: a68d28ac9bbe62dd61f03bfd9d57f4ef1d0ce9c9

Reviewed By: jamesr66a

Differential Revision: D15078532

Pulled By: zdevito

fbshipit-source-id: bebd34ff6105f538395260b027dc169448b5bc96
2019-04-25 15:53:10 -07:00
Wanchao Liang
c571969148 Fix the insert_guard for norm decomposition (#19646)
Summary:
Move the insert_guard all the way up to the beginning of the decomposition. This fixes the case where we lose the insert-point context after decomposeCommonNormalization but still need to modify the graph.
fixes #19502
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19646

Differential Revision: D15058040

Pulled By: wanchaol

fbshipit-source-id: ebdbf8623ebfe4556c461e1b650e94b905791adb
2019-04-24 23:12:37 -07:00
James Reed
e7fc7c732c Bugfix for fusion device check (#19594)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19594

I missed a callsite

Reviewed By: wanchaol

Differential Revision: D15041457

fbshipit-source-id: eef76ad51bee06a56d31b4ab64f19250fe2ad8f0
2019-04-22 20:55:17 -07:00
James Reed
5be4bee4ff Don't create FusionGroups for known-CPU producer values (#19342)
Summary:
I believe the existing check in FuseGraph was only `false` if PyTorch was built with NO_CUDA=1. Otherwise, we would create fusion groups even on a CPU-only machine running CPU code, which is confusing. Instead, I've made the decision to fuse depend on whether the producer Value is a known CPU tensor. If it is, we skip fusion.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19342

Differential Revision: D15038351

Pulled By: jamesr66a

fbshipit-source-id: fce9d83929309a7bf14346833f84b996f3e7f6db
2019-04-22 16:57:18 -07:00
Michael Suo
1e94a3bc4d Turn resolver into a class (#19236)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19236
ghimport-source-id: d36705ea5ecff085d0d84ea57bb96d18d7c260dd

Differential Revision: D14928292

Reviewed By: zdevito

Pulled By: suo

fbshipit-source-id: cd038100ac423fa1c19d0547b9e5487a633a2258
2019-04-19 13:01:59 -07:00