Commit Graph

71 Commits

chunyuan
8b11d81058 [Re-landing 68111] Add JIT graph fuser for oneDNN Graph API (Preview4.1)
Re-landing https://github.com/pytorch/pytorch/pull/68111

## Description
Preview4 PR of this [RFC](https://github.com/pytorch/pytorch/issues/49444).

Building on https://github.com/pytorch/pytorch/pull/50256, the following improvements are included:

- The [preview4 release branch](https://github.com/oneapi-src/oneDNN/releases/tag/graph-v0.4.1) of the oneDNN Graph API is used
- The fuser now works with the profiling graph executor. We have inserted type check nodes to guard the profiled tensor properties.

### User API:
The optimization pass is disabled by default. Users can enable it with:
```
torch.jit.enable_onednn_fusion(True)
```
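
For context, a minimal end-to-end inference sketch (the model, input shape, and warm-up counts are illustrative assumptions; only `torch.jit.enable_onednn_fusion` comes from this PR):

```
import torch
import torchvision

# enable the oneDNN Graph fusion pass (disabled by default)
torch.jit.enable_onednn_fusion(True)

model = torchvision.models.resnet50().eval()
x = torch.rand(1, 3, 224, 224)

with torch.no_grad():
    traced = torch.jit.trace(model, x)
    # warm-up runs let the profiling graph executor record tensor
    # properties before the fuser and its type-check guards kick in
    traced(x)
    traced(x)
    out = traced(x)
```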

### Performance:
The [pytorch/benchmark](https://github.com/pytorch/benchmark) tool is used to compare performance:
- SkyLake 8180 (1 socket of 28 cores):

  ![image](https://user-images.githubusercontent.com/65992142/151162305-05e44425-a24e-4d5e-94e1-743b40b87a8c.png)

- SkyLake 8180 (single thread):

  ![image](https://user-images.githubusercontent.com/65992142/151162528-69f90b79-d08d-46b8-8775-d80a6ccbce8a.png)
  \* By mapping hardswish to oneDNN Graph, it’s 8% faster than PyTorch JIT (NNC + OFI)
  \** We expect performance gains after mapping transpose, contiguous & view to oneDNN Graph ops

### Directory structure of the integration code
Fuser-related code is placed under:
```
torch/csrc/jit/codegen/onednn/
```

Optimization pass registration is done in:
```
torch/csrc/jit/passes/onednn_graph_fuser.h
```

CMake for the integration code is in:
```
caffe2/CMakeLists.txt
```

## Limitations

- In this PR, we only support the optimization on Linux. Support for Windows and macOS will be enabled as a next step.
- We have only optimized the inference use case.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74596
Approved by: https://github.com/malfet
2022-04-29 01:01:33 +00:00
Elias Ellison
e50dd5ba97 [JIT] Allow empty temporary list literals to be matched to arbitrary types (#74768)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74768

As commented in code:
```
    // Empty List Literals that are not assigned to variables
    // may match to any list type in schema matching,
    // but still default to List[Tensor] if assigned to a variable
    // or returned from a function
    // Restricting empty list matching to temporary values
    // avoids difficult to handle cases such as
    // a = []
    // b = a
    // if cond:
    //    b.append(2)
    // else:
    //    a.append("hi")
    // This is also the same behavior that C++ allows with {}
    // (cannot assign to a variable typed as auto)
```

Fix for https://github.com/facebookresearch/torchdynamo/issues/95
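
A small TorchScript sketch of both behaviors (illustrative, not from the PR):

```
import torch
from typing import List

@torch.jit.script
def assigned() -> List[torch.Tensor]:
    a = []  # assigned to a variable: defaults to List[Tensor]
    return a

@torch.jit.script
def temporary() -> torch.Tensor:
    # the temporary [] literal matches int[] in the schema of aten::zeros
    return torch.zeros([])
```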

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D35362760

Pulled By: eellison

fbshipit-source-id: da23e8889312001b60d64a1758da5c578b6fe5ea
(cherry picked from commit 75682f17204d6d444e7e7144472c6e833150c601)
2022-04-06 18:11:23 +00:00
Michael Suo
e5bf87963d Revert D34584878: [pytorch][PR] Add JIT graph fuser for oneDNN Graph API (Preview4)
Test Plan: revert-hammer

Differential Revision:
D34584878 (7dd0823011)

Original commit changeset: ce817aa8cc90

Original Phabricator Diff: D34584878 (7dd0823011)

fbshipit-source-id: a941aaad34f8fe5f0c51f719f9f5c29b811c4d5b
(cherry picked from commit a43262ec7521b1665b02a64d3f279e72ee2344b9)
2022-03-21 23:07:14 +00:00
chunyuan
7dd0823011 Add JIT graph fuser for oneDNN Graph API (Preview4) (#68111)
Summary:
## Description
Preview4 PR of this [RFC](https://github.com/pytorch/pytorch/issues/49444).

Building on https://github.com/pytorch/pytorch/pull/50256, the following improvements are included:

- The [preview4 release branch](https://github.com/oneapi-src/oneDNN/releases/tag/graph-v0.4.1) of the oneDNN Graph API is used
- The fuser now works with the profiling graph executor. We have inserted type check nodes to guard the profiled tensor properties.

### User API:
The optimization pass is disabled by default. Users can enable it with:
```
torch.jit.enable_onednn_fusion(True)
```

### Performance:
The [pytorch/benchmark](https://github.com/pytorch/benchmark) tool is used to compare performance:
- SkyLake 8180 (1 socket of 28 cores):

  ![image](https://user-images.githubusercontent.com/65992142/151162305-05e44425-a24e-4d5e-94e1-743b40b87a8c.png)

- SkyLake 8180 (single thread):

  ![image](https://user-images.githubusercontent.com/65992142/151162528-69f90b79-d08d-46b8-8775-d80a6ccbce8a.png)
  \* By mapping hardswish to oneDNN Graph, it’s 8% faster than PyTorch JIT (NNC + OFI)
  \** We expect performance gains after mapping transpose, contiguous & view to oneDNN Graph ops

### Directory structure of the integration code
Fuser-related code is placed under:
```
torch/csrc/jit/codegen/onednn/
```

Optimization pass registration is done in:
```
torch/csrc/jit/passes/onednn_graph_fuser.h
```

CMake for the integration code is in:
```
caffe2/CMakeLists.txt
```

## Limitations

- In this PR, we only support the optimization on Linux. Support for Windows and macOS will be enabled as a next step.
- We have only optimized the inference use case.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68111

Reviewed By: eellison

Differential Revision: D34584878

Pulled By: malfet

fbshipit-source-id: ce817aa8cc9052ee9ed930c9cf66be83449e61a4
(cherry picked from commit cd17683aa7d9c0947df45a1ab53627feff795587)
2022-03-21 22:12:19 +00:00
Elias Ellison
ab6395fc65 Add api for recursively analyzing function calls (#73329)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73329

There is a quantization use case for having better alias analysis with function calls remaining. This takes the relatively dumb approach of getting the inlined graph of each function call and then analyzing that subgraph. Since we need a single unique analysis of every `Value*`, we make a copy of the graph for every analysis of a function call past the first. This is relatively slow, but given the limited use case here it should work well enough (and is no slower than calling the inlining pass).
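
Roughly, the intended usage from the Python `AliasDb` wrapper; the `descend_function_calls` argument name is an assumption based on the C++ API:

```
import torch

def alias_db_with_calls(graph: torch._C.Graph):
    # analyze function calls recursively (via their inlined subgraphs)
    # instead of treating calls as opaque
    return graph.alias_db(isFrozen=False, descend_function_calls=True)
```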

cc vkuzo

Test Plan: Imported from OSS

Reviewed By: davidberard98

Differential Revision: D34451424

Pulled By: eellison

fbshipit-source-id: b7c7e54679d723f5ded1e11ffb32eb6d2176431d
(cherry picked from commit 81a42b31522b890311a3f512448b372c4ebbefd1)
2022-02-28 17:44:45 +00:00
Scott Wolchok
3ad971798f [PyTorch][JIT] use a better hash table in alias analysis (#69854)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69854

ghstack-source-id: 148315147

Test Plan: Time reported to start up static runtime on ctr_mobile_feed local_ro net is 8.8s instead of 9.5s

Reviewed By: suo, d1jang

Differential Revision: D33039733

fbshipit-source-id: 218dc7ff9aa421a352b71952ec77757368095860
(cherry picked from commit 7586712948)
2022-02-05 02:15:26 +00:00
Scott Wolchok
1bbea3c3a2 [PyTorch][JIT] Support mayContainAlias(Value*, ArrayRef<Value*>) (#69853)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69853

We can implement this overload more efficiently.
ghstack-source-id: 146924693

Test Plan:
patched alias_analysis tests

Time reported to initialize a predictor by static runtime when given ctr_mobile_feed local_ro net is 9.5s instead of 10.5s.

Reviewed By: mikeiovine

Differential Revision: D33039731

fbshipit-source-id: 52559d678e9eb00e335b9e0db304e7a5840ea397
2022-01-12 16:53:54 -08:00
Elias Ellison
4a8d4cde65 Fix for tensor in list return added to wildcard set (#71170)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71170

As with an output returned in a tuple, an output returned in a list will not have any further uses, so adding it directly to the list's contained elements does not give incorrect behavior. This unblocks a use case in op authoring.
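
A tiny sketch of the now-unblocked pattern (illustrative):

```
import torch
from typing import List

@torch.jit.script
def returns_list(x: torch.Tensor) -> List[torch.Tensor]:
    # the list is constructed only to be returned from the graph, so its
    # element no longer needs to be added to the wildcard set
    return [x + 1]
```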

cc Chillee

Test Plan: Imported from OSS

Reviewed By: d1jang

Differential Revision: D33535608

Pulled By: eellison

fbshipit-source-id: 2066d28e98c2f5d1b3d7e0206c7e39a27b3884b1
2022-01-11 22:12:39 -08:00
Elias Ellison
9bccb31306 Remove precise tuple construct flag (#71121)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/71121

Test Plan: Imported from OSS

Reviewed By: d1jang

Differential Revision: D33515234

Pulled By: eellison

fbshipit-source-id: 57cfe171b583a6bb4d3493a34b159061e97a11b8
2022-01-11 22:12:36 -08:00
Raghavan Raman
0bdf4702f6 [jit] Add a new op that composes all of the dynamic shape logic (#69476)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69476

This diff adds a new op, `TensorExprDynamicGroup`, that composes all the logic behind running a dynamically shaped fused node. This includes a guard instruction that checks input conditions, and a conditional that calls either the fused node or the fallback graph depending on the guard.
ghstack-source-id: 146107006

Test Plan:
```
buck test mode/dev-nosan //caffe2/test/cpp/jit:jit
```

Reviewed By: eellison

Differential Revision: D32320082

fbshipit-source-id: 2bd1a43391ca559837d78ddb892d931abe9ebb73
2021-12-22 00:28:57 -08:00
David Berard
96d116fec2 [JIT] Add additional debug output when op cannot be found in AliasDb (#68099)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68099

When an op in the graph cannot be matched to any known ops, alias_analysis.cpp throws an error.

Before:
```
RuntimeError: 0INTERNAL ASSERT FAILED at "../torch/csrc/jit/ir/alias_analysis.cpp":612, please report a bug to PyTorch. We don't have an op for aten::add but it isn't a special case. Argument types: Tensor, float, Tensor,
```

After:
```
RuntimeError: 0INTERNAL ASSERT FAILED at "../torch/csrc/jit/ir/alias_analysis.cpp":612, please report a bug to PyTorch. We don't have an op for aten::add but it isn't a special case.  Argument types: Tensor, float, Tensor,

Candidates:
        aten::add.Tensor(Tensor self, Tensor other, *, Scalar alpha=1) -> (Tensor)
        aten::add.Scalar(Tensor self, Scalar other, Scalar alpha=1) -> (Tensor)
        aten::add.out(Tensor self, Tensor other, *, Scalar alpha=1, Tensor(a!) out) -> (Tensor(a!))
        aten::add.t(t[] a, t[] b) -> (t[])
        aten::add.str(str a, str b) -> (str)
        aten::add.int(int a, int b) -> (int)
        aten::add.complex(complex a, complex b) -> (complex)
        aten::add.float(float a, float b) -> (float)
        aten::add.int_complex(int a, complex b) -> (complex)
        aten::add.complex_int(complex a, int b) -> (complex)
        aten::add.float_complex(float a, complex b) -> (complex)
        aten::add.complex_float(complex a, float b) -> (complex)
        aten::add.int_float(int a, float b) -> (float)
        aten::add.float_int(float a, int b) -> (float)
        aten::add(Scalar a, Scalar b) -> (Scalar)
```

Test Plan:
Run
```
import torch

if __name__ == '__main__':
    ir = """
graph(%x : Tensor,
      %y : Tensor):
  %2 : float = prim::Constant[value=1.2]()
  %result : Tensor= aten::add(%x, %2, %y)
  return (%result)
"""
    x = torch.tensor([[1., 2.], [3., 4.]])
    y = torch.tensor([[2., 1.], [2., 1.]])
    graph = torch._C.parse_ir(ir)
    print(graph)
    graph.alias_db().analyze()
    # print(script(x, y))
```

to get the results above

Imported from OSS

Reviewed By: anjali411

Differential Revision: D32339639

fbshipit-source-id: a79a3c2f157154b5fb1e3f33a23e43b7884e8e38
2021-11-12 08:39:41 -08:00
David Berard
e86d8323cb [JIT] Add special cases for batch_norm, instance_norm in alias_analysis (#66554)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66554

In native_functions.yaml, the schemas for batch_norm and instance_norm
are incorrect: the inputs `running_mean` and `running_var` are mutated,
but are not marked as such in the function schema. Since `(a!)?`
annotations are currently not working (see #65760), this instead adds a
special case to `alias_analysis.cpp`. If the value of `training` or
`use_input_stats` is known to be `false`, then `alias_analysis` will
mark the input as _not_ being written to.
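
For illustration, a pattern the special case targets (a sketch, not code from the PR):

```
import torch
import torch.nn.functional as F

@torch.jit.script
def bn_inference(x: torch.Tensor, mean: torch.Tensor, var: torch.Tensor) -> torch.Tensor:
    # training is the constant False here, so alias analysis can mark
    # the running stats as not written to
    return F.batch_norm(x, mean, var, training=False)
```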

Test Plan:
Removed the `skip` annotation on the following test, and added a special
exception in `check_alias_annotations`:
```
python test/test_ops.py -k test_variant_consistency_jit_nn_functional_batch_norm
```

Also:
```
./build/bin/test_jit --gtest_filter="*BatchAndInstanceNormFixture*"
```

Imported from OSS

Reviewed By: eellison

Differential Revision: D31612339

fbshipit-source-id: 12ca61b782b9e41e06883ba080a276209dc435bb
2021-10-20 10:22:10 -07:00
Scott Wolchok
0aab34c26c [jit] Refcounting spot fixes in alias_analysis (#66295)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66295

Tidying up the top sources of reference count decrements seen during static runtime startup in alias_analysis.cpp specifically.
ghstack-source-id: 140484160

Test Plan:
CI

perf now shows under 2% of time spent in ~__shared_count instead of about 5%.

Reviewed By: suo

Differential Revision: D31490761

fbshipit-source-id: bbdcb7f9065c3aafa7fff7bfea9cea6dbc41f9d9
2021-10-13 14:47:32 -07:00
Scott Wolchok
9767282643 [jit] Add MutableTypePtrHelper::mapTypeToBorrowedAliasTypeSet (#65344)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65344

Callsites that know they are using a cache can borrow AliasTypeSets from the cache instead of copying them.
ghstack-source-id: 140484162

Test Plan: Running perf on static runtime startup seems to show less inclusive time spent in AliasDb::getElements

Reviewed By: ejguan

Differential Revision: D31027363

fbshipit-source-id: b7a1473f4f9e9f14566f56f4b3b4e6317076beeb
2021-10-13 14:47:30 -07:00
Scott Wolchok
94845fc44e [jit] Implement one-argument AliasDb::mayContainAlias more efficiently (#65177)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65177

There is no need to heap-allocate any vectors in this case.
ghstack-source-id: 140052520

Test Plan:
CI

Startup for static runtime on ctr_mobile_feed local net decreased from 7.8s to about 7.0s

Reviewed By: malfet

Differential Revision: D30984194

fbshipit-source-id: 85091e55445f653ec728b27da4b459a2f1873013
2021-10-08 10:29:25 -07:00
Don Jang
7941590a51 [JIT] Selectively enable precise alias analysis for TupleConstruct (#66025)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66025

This change adds an option to selectively enable precise alias analysis for `prim::TupleConstruct` (introduced by D30437737 (cd458fe092)), limiting its exposure to `StaticRuntime` for now.

Test Plan: Modified existing unit tests whose behavior depends on D30437737 (cd458fe092).

Reviewed By: eellison

Differential Revision: D31350285

fbshipit-source-id: 3ce777f07f99650d74634481ad0805192dce55c6
2021-10-01 20:42:22 -07:00
Don Jang
cd458fe092 [JIT] Make output of prim::TupleConstruct alias only with its inputs (#64879)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64879

This change makes the output of `prim::TupleConstruct` alias only with its inputs *when* the created tuple is directly returned from the graph.

The same treatment could be applied to any tuple newly constructed by `prim::TupleConstruct` whose elements do not escape. However, this change focuses only on the simplest, but frequently used, case: tuples constructed solely to be returned from a graph.
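
A minimal example of the targeted pattern (illustrative):

```
import torch
from typing import Tuple

@torch.jit.script
def returns_tuple(x: torch.Tensor, y: torch.Tensor) -> Tuple[torch.Tensor, torch.Tensor]:
    # the tuple is constructed only to be returned from the graph, so its
    # output can alias just its inputs instead of the wildcard set
    return (x + 1, y * 2)
```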

Test Plan:
Added
- `AliasMoveForTupleConstructWithSingleUseAsGraphOutput`
- `WildcardAliasForTupleConstructWithUses`

to cover the newly added code.

Reviewed By: eellison

Differential Revision: D30437737

fbshipit-source-id: 417fbc6bc348062e60e7acdddd340d4754d090eb
2021-09-29 21:56:31 -07:00
Scott Wolchok
ece25c453f [PyTorch] Store Argument::alias_info_ on the heap (#64824)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64824

See comment in function_schema.h for explanation. I claim that this is a good tradeoff because the aliasing information seems to be used only in compiler-ish code paths, where performance isn't as critical as actual execution. If performance is important there too, perhaps we should hoist isWrite into the Argument itself since there are several paths that only care about isWrite.
ghstack-source-id: 138958896

Test Plan: CI, profile schema parsing on startup and see much fewer page faults in createArgumentVector.

Reviewed By: suo

Differential Revision: D30860719

fbshipit-source-id: 1d4d2328f2b8e34f5ddf9d82083fd4dd7b7f738f
2021-09-24 17:00:51 -07:00
Ansley Ussery
c60075d4b5 Preserve types during empty container assignment (#58911)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58911

Stack from [ghstack](https://github.com/ezyang/ghstack):
* __->__ #58911

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D30785623

Pulled By: ansley

fbshipit-source-id: 4e05d6369318974290fea02ad2bc148293c25090
2021-09-10 16:49:21 -07:00
Ansley Ussery
6831d8e379 Support Union in TorchScript (#64234)
Summary:
This PR replaces the https://github.com/pytorch/pytorch/pull/53180 PR stack, which has all the review discussions. A replacement was needed due to a messy Sandcastle issue.
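
A small example of the newly supported annotation (illustrative):

```
import torch
from typing import Union

@torch.jit.script
def describe(x: Union[int, torch.Tensor]) -> str:
    # isinstance checks refine the Union to a concrete type
    if isinstance(x, int):
        return "int: " + str(x)
    return "tensor with dim " + str(x.dim())
```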

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64234

Reviewed By: gmagogsfm

Differential Revision: D30656444

Pulled By: ansley

fbshipit-source-id: 77536c8bcc88162e2c72636026ca3c16891d669a
2021-09-03 06:12:24 -07:00
Matej Sladek
f807229fd4 [ONNX] add support for prim::Uninitialized in lower_tuples pass (#56912)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/56911

The code from the issue generates this TorchScript:
```
graph(%self : __torch__.MyModule,
      %t.1 : Tensor):
  %12 : None = prim::Constant()
  %7 : str = prim::Constant[value="Negative input"]() # /mnt/nvdl/usr/msladek/notes/python_code/unitialized.py:11:28
  %3 : int = prim::Constant[value=0]() # /mnt/nvdl/usr/msladek/notes/python_code/unitialized.py:10:15
  %9 : int = prim::Constant[value=5]() # /mnt/nvdl/usr/msladek/notes/python_code/unitialized.py:13:31
  %33 : (Tensor, Tensor) = prim::Uninitialized()
  %4 : Tensor = aten::lt(%t.1, %3) # /mnt/nvdl/usr/msladek/notes/python_code/unitialized.py:10:11
  %6 : bool = aten::Bool(%4) # /mnt/nvdl/usr/msladek/notes/python_code/unitialized.py:10:11
  %34 : (Tensor, Tensor) = prim::If(%6) # /mnt/nvdl/usr/msladek/notes/python_code/unitialized.py:10:8
    block0():
       = prim::RaiseException(%7) # /mnt/nvdl/usr/msladek/notes/python_code/unitialized.py:11:12
      -> (%33)
    block1():
      %11 : int[] = prim::ListConstruct(%9)
      %16 : Tensor = aten::zeros(%11, %12, %12, %12, %12) # /mnt/nvdl/usr/msladek/notes/python_code/unitialized.py:13:19
      %18 : int[] = prim::ListConstruct(%9)
      %23 : Tensor = aten::zeros(%18, %12, %12, %12, %12) # /mnt/nvdl/usr/msladek/notes/python_code/unitialized.py:13:35
      %24 : (Tensor, Tensor) = prim::TupleConstruct(%16, %23)
      -> (%24)
  return (%34)
```
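
For reference, the graph above corresponds roughly to this module (a reconstruction from the IR; the exact source is in the linked issue):

```
import torch

class MyModule(torch.nn.Module):
    def forward(self, t):
        if bool(t < 0):
            raise Exception("Negative input")
        return torch.zeros(5), torch.zeros(5)
```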

The problem is that the ONNX exporter's lower_tuples pass doesn't support forwarding tuples through prim::Uninitialized.
The solution is to:
1. add prim::Uninitialized to supported_op in the lower_tuples pass
2. since prim::Uninitialized now has multiple outputs, call giveFreshAlias for every output

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56912

Reviewed By: nikithamalgifb

Differential Revision: D29837200

Pulled By: SplitInfinity

fbshipit-source-id: 321fae6fe52b1523df5653dbb9ea73b998ef1cda
2021-08-10 16:21:16 -07:00
Richard Barnes
88f8f2ab94 irange-ify 6 (#62115)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62115

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D29879576

fbshipit-source-id: 63cbf0ab5a52325fa2c3dec0e8239e2eac1ecf72
2021-07-28 13:32:11 -07:00
Richard Barnes
3979cb0656 irange for size_t (#55320)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55320

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D27572577

fbshipit-source-id: 97710fd2bb1303006b05828a0d1343b0b59ccb03
2021-06-03 01:04:13 -07:00
Nikita Shulga
4cb534f92e Make PyTorch code-base clang-tidy compliant (#56892)
Summary:
This is an automatic change generated by the following script:
```
#!/usr/bin/env python3
from subprocess import check_output, check_call
import os

def get_compiled_files_list():
    import json
    with open("build/compile_commands.json") as f:
        data = json.load(f)
    files = [os.path.relpath(node['file']) for node in data]
    for idx, fname in enumerate(files):
        if fname.startswith('build/') and fname.endswith('.DEFAULT.cpp'):
            files[idx] = fname[len('build/'):-len('.DEFAULT.cpp')]
    return files

def run_clang_tidy(fname):
    check_call(["python3", "tools/clang_tidy.py", "-c", "build", "-x", fname,"-s"])
    changes = check_output(["git", "ls-files", "-m"])
    if len(changes) == 0:
        return
    check_call(["git", "commit","--all", "-m", f"NOLINT stubs for {fname}"])

def main():
    git_files = check_output(["git", "ls-files"]).decode("ascii").split("\n")
    compiled_files = get_compiled_files_list()
    for idx, fname in enumerate(git_files):
        if fname not in compiled_files:
            continue
        if fname.startswith("caffe2/contrib/aten/"):
            continue
        print(f"[{idx}/{len(git_files)}] Processing {fname}")
        run_clang_tidy(fname)

if __name__ == "__main__":
    main()
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56892

Reviewed By: H-Huang

Differential Revision: D27991944

Pulled By: malfet

fbshipit-source-id: 5415e1eb2c1b34319a4f03024bfaa087007d7179
2021-04-28 14:10:25 -07:00
Peng Wu
838d3079ad Lazily initialize alias db in remove_mutation opt (#55949)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55949

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D27793881

fbshipit-source-id: eebde5b5142d8fecfee4756604d313b0da809882
2021-04-19 09:45:33 -07:00
Mike Ruberry
c0ac0fef4e Revert D27448156: irange for size_t
Test Plan: revert-hammer

Differential Revision:
D27448156 (041b4431b2)

Original commit changeset: 585da57d4de9

fbshipit-source-id: 8e047c29f391c0166e0a1a87c3fb2a0854377365
2021-04-03 19:14:00 -07:00
Richard Barnes
041b4431b2 irange for size_t (#55163)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55163

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D27448156

fbshipit-source-id: 585da57d4de91c692b6360d65f7b8a66deb0f8c1
2021-04-02 23:22:29 -07:00
Tugsbayasgalan Manlaibaatar
444552e7f9 Optimize alias_analysis node lookup (#54115)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/54115

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D27104047

Pulled By: tugsbayasgalan

fbshipit-source-id: 0ef4e78be9ea7081b63ab2303711746bf09653eb
2021-03-18 07:14:49 -07:00
Thomas Viehmann
fd5c1123e4 wrap AliasDb in Python (#51336)
Summary:
Also added a wrapper for tlemo's Graphviz export to string.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51336

Reviewed By: ezyang

Differential Revision: D26150809

Pulled By: eellison

fbshipit-source-id: 9beafce5cbdc1785b986b71c3cd986c1087faa11
2021-03-17 12:55:22 -07:00
Elias Ellison
32fed3f375 Handle mkldnn broadcasting in mkldnn fuser (#51736)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51736

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D26696694

Pulled By: eellison

fbshipit-source-id: 473cc64c8d9f775e9d06340437aff2eb6c0619b9
2021-03-01 21:22:23 -08:00
Elias Ellison
a2f7e929ef Add MKLDNN fuser (#51600)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51600

Looking for notes on implementation first, will post more notes on benchmarks and overall thoughts/implementation and solicit more input soon.

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D26696702

Pulled By: eellison

fbshipit-source-id: cd612f093fe3859e42fb0b77560ebd1b44fccff7
2021-03-01 21:22:19 -08:00
Elias Ellison
bfae3789ba Move conv to mkldnn (#51483)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51483

This PR moves the conv weights of a frozen model to MKLDNN and reorders the weights ahead of time (AOT). When the weights are already in MKLDNN, even computing a single conv by converting the input and output from/to MKLDNN provides large speedups. I benchmarked the results of the top 200 shapes in predictor [here](https://www.internalfb.com/phabricator/paste/view/P171537938), and also verified that it sped up popular models in torchvision.
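
A sketch of the workflow this enables (`torch.jit.optimize_for_inference` is the later public entry point for the frozen-graph optimizations; treating it as the trigger for this pass is an assumption):

```
import torch
import torchvision

model = torchvision.models.resnet18().eval()
frozen = torch.jit.freeze(torch.jit.script(model))
# applies frozen-graph optimizations, including moving conv weights
# to MKLDNN layout ahead of time
opt = torch.jit.optimize_for_inference(frozen)

with torch.no_grad():
    out = opt(torch.rand(1, 3, 224, 224))
```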

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D26696703

Pulled By: eellison

fbshipit-source-id: 0b4441bee4f6e0890a4540fbca3bb5e58b8c5adf
2021-03-01 21:19:27 -08:00
Bram Wasti
c4a8f0ceaa [torch script] Add pure list producing ops to alias analysis (#51999)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51999

as in title

Test Plan: waiting on CI for now

Reviewed By: eellison

Differential Revision: D26349297

fbshipit-source-id: bd5574ed1f8448ba18a6fda4bdc45f45d8b158e9
2021-02-10 09:00:39 -08:00
Scott Wolchok
7328710cbc [PyTorch][codemod] Replace immediately-dereferenced cast calls w/castRaw (#50229)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50229

`fastmod -m 'cast(<((at|c10)::)?\w+Type>\(\)\s*)->' 'castRaw${1}->'`
Presuming it builds, this is a safe change: the result of `cast()` wasn't being saved anywhere, so we didn't need it and can use a raw pointer instead of a new `shared_ptr`.
ghstack-source-id: 120769170

Test Plan: CI

Reviewed By: SplitInfinity

Differential Revision: D25837494

fbshipit-source-id: 46319100dc0dfc78f6d2b45148207f83481f2ada
2021-02-01 23:12:07 -08:00
jiej
dd1c2a06b7 refactor profiling optional (#47667)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47667

Test Plan: Imported from OSS

Reviewed By: anjali411, ngimel

Differential Revision: D25255572

Pulled By: Krovatkin

fbshipit-source-id: d0152c9ef5b1994e27be9888bcb123dca3ecd88f
2021-01-22 14:45:28 -08:00
Lemo
ca3ce77746 Dump torch::jit::AliasDb objects as Graphviz files (#50452)
Summary:
This PR adds a simple debugging helper which exports the AliasDb state as a [GraphViz](http://www.graphviz.org/) graph definition. The generated files can be viewed with any Graphviz viewer (including online based, for example http://viz-js.com)

Usage:

1. Call `AliasDb::dumpToGraphvizFile()` from a debugger. Using gdb for example:
`call aliasDb_->dumpToGraphvizFile("alias.dot")`

2. Add explicit calls to `AliasDb::dumpToGraphvizFile()`, which returns `true` if it succeeds.

An example output file is attached: [example.zip](https://github.com/pytorch/pytorch/files/5805840/example.zip)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50452

Reviewed By: ngimel

Differential Revision: D25980222

Pulled By: eellison

fbshipit-source-id: 47805a0a81ce73c6ba859340d37b9a806f9000d5
2021-01-22 13:38:47 -08:00
Nikolay Korovaiko
8e60bf9034 add RequiresGradCheck (#50392)
Summary:
This change improves perf by 3-4% on fastrnns.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50392

Reviewed By: izdeby

Differential Revision: D25891392

Pulled By: Krovatkin

fbshipit-source-id: 44d9b6907d3975742c9d77102fe6a85aab2c08c0
2021-01-15 16:50:42 -08:00
Scott Wolchok
4a0d17ba2d [PyTorch][codemod] Replace immediately-dereferenced expect calls w/expectRef (#50228)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50228

`fastmod -m 'expect(<((at|c10)::)?\w+Type>\(\)\s*)->' 'expectRef${1}.'`
Presuming it builds, this is a safe change: the result of `expect()` wasn't being saved anywhere, so we didn't need it and can take a reference instead of a new `shared_ptr`.
ghstack-source-id: 119782961

Test Plan: CI

Reviewed By: SplitInfinity

Differential Revision: D25837374

fbshipit-source-id: 86757b70b1520e3dbaa141001e7976400cdd3b08
2021-01-13 16:13:55 -08:00
Nikitha Malgi
12b73fdbbf Adding JIT support for cuda streams and events (#48020)
Summary:
=======

This PR addresses the following:

 * Adds JIT support for CUDA Streams
 * Adds JIT support for CUDA Events
 * Adds JIT support for CUDA Stream context manager

Testing:
======

python test/test_jit.py -v TestCUDA

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48020

Reviewed By: navahgar

Differential Revision: D25725749

Pulled By: nikithamalgifb

fbshipit-source-id: b0addeb49630f8f0c430ed7badeca43bb9d2535c
2020-12-29 20:24:57 -08:00
Ansley Ussery
c619892482 Fix errata (#49903)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49903

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D25718411

Pulled By: ansley

fbshipit-source-id: 0cc365c5a53077752dc1c5a5c4a65b873baa3604
2020-12-28 20:40:41 -08:00
Bram Wasti
f4226b5c90 [static runtime] add static subgraph fusion pass (#49185)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49185

This diff adds a fusion feature that will let us use static runtime for *parts* of the graph.  This will prove useful in cases where fully eliminating control flow is hard etc.

TODO:
[x] factor out into separate fusion file
[x] add python test case
[x] add graph that isn't fully lowered test case
[x] add graph that has weird list/tuple outputs test case

the loop example looks quite good:
```
graph(%a.1 : Tensor,
      %b.1 : Tensor,
      %iters.1 : int):
  %12 : bool = prim::Constant[value=1]() # /data/users/bwasti/fbsource/fbcode/buck-out/dev/gen/caffe2/test/static_runtime#binary,link-tree/test_static_runtime.py:110:4
  %c.2 : Tensor = prim::StaticSubgraph_0(%a.1, %b.1)
  %c : Tensor = prim::Loop(%iters.1, %12, %c.2) # /data/users/bwasti/fbsource/fbcode/buck-out/dev/gen/caffe2/test/static_runtime#binary,link-tree/test_static_runtime.py:110:4
    block0(%i : int, %c.12 : Tensor):
      %c.10 : Tensor = prim::StaticSubgraph_1(%a.1, %c.12, %b.1)
      -> (%12, %c.10)
  return (%c)
with prim::StaticSubgraph_0 = graph(%0 : Tensor,
      %4 : Tensor):
  %5 : int = prim::Constant[value=2]()
  %6 : Tensor = aten::mul(%4, %5) # /data/users/bwasti/fbsource/fbcode/buck-out/dev/gen/caffe2/test/static_runtime#binary,link-tree/test_static_runtime.py:109:12
  %2 : int = prim::Constant[value=1]()
  %c.2 : Tensor = aten::add(%0, %6, %2) # /data/users/bwasti/fbsource/fbcode/buck-out/dev/gen/caffe2/test/static_runtime#binary,link-tree/test_static_runtime.py:109:8
  return (%c.2)
with prim::StaticSubgraph_1 = graph(%1 : Tensor,
      %7 : Tensor,
      %8 : Tensor):
  %9 : int = prim::Constant[value=1]()
  %c.4 : Tensor = aten::add(%7, %8, %9) # /data/users/bwasti/fbsource/fbcode/buck-out/dev/gen/caffe2/test/static_runtime#binary,link-tree/test_static_runtime.py:111:12
  %5 : int = prim::Constant[value=2]()
  %c.7 : Tensor = aten::mul_(%c.4, %5) # /data/users/bwasti/fbsource/fbcode/buck-out/dev/gen/caffe2/test/static_runtime#binary,link-tree/test_static_runtime.py:112:8
  %2 : int = prim::Constant[value=1]()
  %c.10 : Tensor = aten::sub_(%c.7, %1, %2) # /data/users/bwasti/fbsource/fbcode/buck-out/dev/gen/caffe2/test/static_runtime#binary,link-tree/test_static_runtime.py:113:8
  return (%c.10)
```

(Note: this ignores all push blocking failures!)

Test Plan:
buck test mode/no-gpu //caffe2/benchmarks/static_runtime:static_runtime_cpptest

buck test mode/no-gpu caffe2/test:static_runtime

Reviewed By: bertmaher

Differential Revision: D25385702

fbshipit-source-id: 2f24af4f11d92a959167facd03fbd24f464a6098
2020-12-10 14:03:11 -08:00
jiej
a6fa3b2682 adding profile_ivalue (#47666)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47666

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D25255573

Pulled By: Krovatkin

fbshipit-source-id: 5d8753e4040a3d96105d28d26728125947c7a638
2020-12-09 15:29:15 -08:00
Michael Suo
dc8176356e Various cleanups to ir_emitter and friends (#46686)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46686

I was trying to page this code back in after a while and some things
stuck out as unnecessarily confusing.

1. Improve documentation of closures and fork stuff to be more accurate
to how we use them today.
2. Change `prim::LocalVariableScope` to `prim::ListComprehension`. It is
only ever used for a list comprehensions, and in general the nodes
emitted by `ir_emitter` should correspond to concrete operations or
language features rather than semantic constraints.
3. Change the somewhat mysterious "inputs" and "attributes" argument
names throughout the codebase to be the more obvious "args" and "kwargs"
that they generally represent (I think "inputs" and "attributes" come
from the AST naming).

Test Plan: Imported from OSS

Reviewed By: navahgar, jamesr66a

Differential Revision: D24464197

Pulled By: suo

fbshipit-source-id: 1f4b1475b58b5690a0b204e705caceff969533b4
2020-10-28 16:28:05 -07:00
Elias Ellison
ae286d81e0 [JIT] improve alias analysis for list constructs (#39111)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39111

In our present alias analysis, we consider any Value that enters another container as entering the heap, and thus aliasing all other heap values of the same type. There are a number of advantages to this approach:
- it is not too hard to maintain the AliasDb implementation
- it is much easier from an op schema perspective - there are many composite list ops registered internally and externally that would be tricky to register and get right if we did something more complicated
- It limits the size of the AliasDb, because a container of size 10 only contains a single memory dag element instead of 10 elements.

The downside is that we are unable to handle the simple and extremely common case of a list of tensors being used in an ATen op.

In an example like:

```
 def foo(input):
    x = torch.tensor([1, 2, 3, 4])
    y = [x, x]
    input.add_(1)
    return torch.cat(y)
```

we will consider x to be written to: any write to any wildcard element (an element that enters a tuple, or an element taken from a list) will mark x as written to. This limits our ability to create a functional subset and fuse graphs - as a result, 4 of TorchVision's classification models could not be functionalized.

Test Plan: Imported from OSS

Reviewed By: SplitInfinity

Differential Revision: D23828003

Pulled By: eellison

fbshipit-source-id: 9109fcb6f2ca20ca897cae71683530285da9d537
2020-09-22 09:38:59 -07:00
Wanchao Liang
ab6126b50e [rpc][jit] support remote call in TorchScript (#43046)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43046

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D23621108

Pulled By: wanchaol

fbshipit-source-id: e8152c6cdd3831f32d72d46ac86ce22f3f13c651
2020-09-11 14:59:51 -07:00
Wanchao Liang
3e5df5f216 [rpc][jit] support rpc_sync in TorchScript (#43043)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43043

This adds support for rpc_sync in TorchScript, in a way similar to
rpc_async.

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D23252039

Pulled By: wanchaol

fbshipit-source-id: 8a05329cb8a24079b2863178b73087d47273914c
2020-09-11 14:59:47 -07:00
Elias Ellison
a7e7981c0b Use prim::TensorExprGroup interned symbol (#43635)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43635

Intern the symbol; no functional changes. Aliasing needs to be looked at, but that should be done in a separate PR; this PR just changes the symbol.

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D23358806

Pulled By: eellison

fbshipit-source-id: f18bcd142a0daf514136f019ae607e4c3f45d9f8
2020-08-31 11:52:16 -07:00
Nikolay Korovaiko
000739c31a Function calls for fallback paths (#43274)
Summary:
This PR adds an API to package unoptimized/fallback blocks as function calls. It's mainly meant to be used by the TensorExpressionsFuser and SpecializeAutogradZero passes, as both specialize the original graph but would also like to provide a fallback path in case the assumptions under which the graph was specialized do not hold for some inputs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43274

Reviewed By: malfet

Differential Revision: D23406961

Pulled By: Krovatkin

fbshipit-source-id: ef21fc9ad886953461b09418d02c75c58375490c
2020-08-28 23:31:02 -07:00
Elias Ellison
01f974eb1e Specialize optionals for grad_sum_to_size (#43633)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43633

In the backward graph, _grad_sum_to_size is inserted whenever a possibly broadcasting op is called:
`"aten::_grad_sum_to_size(Tensor(a) self, int[]? size) -> Tensor(a)"`
If a broadcast occurred, a sum is called; otherwise the second input is None and it is a no-op. Most of the time it's a no-op (in the fast RNNs benchmark, > 90% of the time).

We can get rid of this op by profiling the optionality of the second input. I added `prim::profile_optional` to do this, which counts the number of times it saw a None value and the number of times it saw a value present. When specializing the backward graph, we insert checks for values we profiled as None, and in the optimized block can remove the grad_sum_to_size calls that use those values.

In the future we may revisit this when NNC supports reductions and we want to replace grad_sum_to_size with sums as well, but I think this is worth landing now.

Test Plan: Imported from OSS

Reviewed By: bwasti, ZolotukhinM

Differential Revision: D23358809

Pulled By: eellison

fbshipit-source-id: a30a148ca581370789d57ba082d23cbf7ef2cd4d
2020-08-27 14:35:37 -07:00
Nikolay Korovaiko
a97ca93c0e remove prim::profile and special-casing (#43160)
Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43160

Reviewed By: ZolotukhinM

Differential Revision: D23284421

Pulled By: Krovatkin

fbshipit-source-id: 35e97aad299509a682ae7e95d7cef53301625309
2020-08-22 23:52:36 -07:00