Commit Graph

220 Commits

Tugsbayasgalan (Tugsuu) Manlaibaatar
8bdbe94344 Add forward compatibility tests in CI (#64139)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64139

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D30626912

Pulled By: tugsbayasgalan

fbshipit-source-id: 781a88386701b42e2e86daaca0a779d1fc1c4df3
2022-01-05 23:40:06 -08:00
Michael Suo
402f2934bf Revert D33262228: Per-overload torch.ops API
Test Plan: revert-hammer

Differential Revision:
D33262228 (8e6d1738a4)

Original commit changeset: 600dbf511514

Original Phabricator Diff: D33262228 (8e6d1738a4)

fbshipit-source-id: 238fa88ea9c4f26c7511334765c07452fbca9655
2022-01-05 22:10:11 -08:00
anjali411
8e6d1738a4 Per-overload torch.ops API (#67254)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67254

Fixes https://github.com/pytorch/pytorch/issues/65997

TODO: disallow `default` as an overload name for aten operators.

BC breaking:
`output = torch.ops._test.leaky_relu(self=torch.tensor(-1.0))` now fails with the error `TypeError: __call__() got multiple values for argument 'self'` since we call into `OpOverloadBundle`'s `__call__` method that has `self` bound to it as its first argument.
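
A hedged sketch of the new surface and the BC-breaking case (the operator and overload names here are illustrative, not taken from the PR's tests):

```
import torch

t = torch.tensor(-1.0)

# The old keyword call collides with the bound `self` of the overload
# bundle's __call__, so pass the first tensor positionally instead:
out = torch.ops.aten.leaky_relu(t)

# The per-overload API exposes each registered overload as an attribute,
# e.g. a specific overload of aten::add:
add_tensor = torch.ops.aten.add.Tensor
print(add_tensor(torch.ones(2), torch.ones(2)))
```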

cc ezyang gchanan

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D33262228

Pulled By: anjali411

fbshipit-source-id: 600dbf511514ea9b41aea3e6b1bc1102dab08909
2022-01-05 15:17:41 -08:00
Tugsbayasgalan (Tugsuu) Manlaibaatar
4ae71c8d34 Add graph op replacement pass (#69915)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69915

Test Plan: Imported from OSS

Reviewed By: samdow

Differential Revision: D33198158

Pulled By: tugsbayasgalan

fbshipit-source-id: f2b924edf9959aaf51f97db994fae031fa062cf8
2021-12-25 13:03:19 -08:00
jjsjann123
e429a68478 Allow single node fusion for nvfuser (#70000)
Summary:
Setting `PYTORCH_NVFUSER_ONE_OP_FUSION=1` makes nvFuser take every node it supports immediately, instead of waiting for a fusion opportunity.
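
A hedged usage sketch (assumes a CUDA build with nvFuser enabled; exactly when the variable is read relative to process start is not specified here):

```
import os

# Greedily give every nvFuser-supported node its own fusion group.
os.environ["PYTORCH_NVFUSER_ONE_OP_FUSION"] = "1"

import torch

@torch.jit.script
def single_op(x):
    return torch.relu(x)  # even a lone supported op can now be claimed by nvFuser

x = torch.randn(1024, device="cuda")
for _ in range(3):        # warm up the profiling executor
    single_op(x)
print(single_op.graph_for(x))
```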

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70000

Reviewed By: samdow

Differential Revision: D33292195

Pulled By: davidberard98

fbshipit-source-id: 8ed5ce5e82fbb6737e8ab5ce4223b038eaf47756
2021-12-23 17:07:57 -08:00
David Berard
c21169ea41 [JIT] optimize_for_inference on methods other than forward (#69367)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69367

Test Plan: Imported from OSS

Reviewed By: cpuhrsch

Differential Revision: D32835529

Pulled By: davidberard98

fbshipit-source-id: d3066c23d071bc2a3bee59b8ab03b6ab0e43efcf
2021-12-07 12:36:47 -08:00
Nikolay Korovaiko
ab1d879b33 [WIP] forbid aliasing between the outputs of a differentiable graph (#67732)
Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67732

Reviewed By: cpuhrsch

Differential Revision: D32522826

Pulled By: Krovatkin

fbshipit-source-id: 9fdf3509dcd1b885f7c7f06d22b340c0f93bbe12
2021-11-18 15:03:35 -08:00
John Clow
a9c2f11d2a Update Freezing Logic and add new passes (#68024)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68024

Pull Request resolved: #67949

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D32260614

Pulled By: eellison

fbshipit-source-id: 41d7a9b45e33297a17560a22eba8973e2fc48b43
2021-11-09 13:21:52 -08:00
John Clow
ec8a71f9ac Dtype Analysis for Unary and Binary ops with Metatensors (#66898)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66898

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D32175961

Pulled By: Gamrix

fbshipit-source-id: 72721259b900e5a311b6bcb5c350366ba420b734
2021-11-04 19:00:50 -07:00
Natalia Gimelshein
3d4a6ff15d Revert D32154788: Move Concat Linear out of Optimize Numerics
Test Plan: revert-hammer

Differential Revision:
D32154788 (ea94dde573)

Original commit changeset: faa6465c89b3

fbshipit-source-id: 0dcaa65268b68ed01e6a5bc7b73ade1f51163b33
2021-11-04 12:20:02 -07:00
John Clow
ea94dde573 Move Concat Linear out of Optimize Numerics (#67196)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67196

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D32154788

Pulled By: Gamrix

fbshipit-source-id: faa6465c89b3676d6b1ff7c20a677738a7fbdf88
2021-11-04 11:30:39 -07:00
Elias Ellison
2486061c72 [JIT] make x (+ or -) 0 and x (* or /) 1 peepholes type promotion aware (#67688)
Summary:
Some of the "no-ops" are not actually no-ops because they can change the dtype

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67688

Reviewed By: davidberard98

Differential Revision: D32104601

Pulled By: eellison

fbshipit-source-id: ccb99179a4b30fd20b5a9228374584f2cdc8ec21
2021-11-03 20:11:46 -07:00
Nikolay Korovaiko
3db536e55e add jit_trace_module python binding (#67425)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67425

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D31998564

Pulled By: Krovatkin

fbshipit-source-id: f7e38c8c3f560f2c4e5ed62e1acae2c100efebd4
2021-11-02 23:55:23 -07:00
jjsjann123
1ec732bc46 Add fp16/fp32 autocasting to JIT/TorchScript (#63939)
Summary:
Adds mixed-precision autocasting support between fp32/fp16 to TorchScript/JIT. A more in-depth description can be found at [torch/csrc/jit/JIT-AUTOCAST.md](https://github.com/pytorch/pytorch/pull/63939/files#diff-1f1772aaa508841c5bb58b74ab98f49a1e577612cd9ea5c386c8714a75db830b)

This PR implements an autocast optimization pass (torch/csrc/jit/passes/autocast.cpp) that inserts casting ops per the AMP rules, mimicking the behavior of eager autocast. The pass also takes the `torch.cuda.amp.autocast` context into account and only inserts casting ops within an enabled context manager, giving feature parity with eager AMP autocast.

We currently provide JIT AMP autocast as a prototype feature, so it is off by default and can be turned on via `torch._C._jit_set_autocast_mode(True)`.

JIT support for autocast is subject to different constraints than the eager-mode implementation (mostly related to the fact that TorchScript is statically typed); the restrictions on user-facing Python code are described in torch/csrc/jit/JIT-AUTOCAST.md.
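
A minimal usage sketch, assuming a CUDA device (the dtype comment reflects the usual AMP rule for matmuls rather than anything asserted by this PR):

```
import torch

torch._C._jit_set_autocast_mode(True)  # prototype feature, off by default

@torch.jit.script
def gemm(a, b):
    with torch.cuda.amp.autocast():
        return torch.mm(a, b)  # casting ops are inserted inside the enabled context

a = torch.randn(8, 8, device="cuda")
b = torch.randn(8, 8, device="cuda")
print(gemm(a, b).dtype)  # expected: torch.float16
```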

This is a prototype; there are also implementation limitations accepted to keep this PR small and get something functioning quickly upstream, so we can iterate on the design.

A few limitations/challenges that are not properly resolved in this PR:
1. Autocast inserts cast operations, which affect the scalar type of output tensors feeding downstream operations. We are not currently propagating the updated scalar types, which could give wrong results for operations subject to promotion rules.

2. Backward for autodiff in JIT misses the cast of dgrad to the input scalar type that autograd performs in eager mode. This forces us to explicitly mark the casting operation for certain ops (e.g. binary ops); otherwise we might feed a dgrad whose scalar type does not match the input, which could break gradient functions that consume dgrad (e.g. gemm backward, which assumes grad_output has the same scalar type as the input).

3. The `torch.autocast` API has an optional `dtype` argument which is not currently supported in JIT autocast; we require a static value.

Credit goes mostly to:
tlemo
kevinstephano

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63939

Reviewed By: navahgar

Differential Revision: D31093381

Pulled By: eellison

fbshipit-source-id: da6e26c668c38b01e296f304507048d6c1794314
2021-10-27 12:11:36 -07:00
Nikolay Korovaiko
a7ebf76a15 jit trace (#59949)
Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59949

Reviewed By: ZolotukhinM

Differential Revision: D31366787

Pulled By: Krovatkin

fbshipit-source-id: 798cbcd97e8ecfba984f98cd70214954be9309af
2021-10-24 18:04:22 -07:00
Nikita Shulga
6f3f302d9f [ONNX] Deprecate fold_if pass (#65697) (#66145)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66145

Deprecate fold_if pass

Test Plan: Imported from OSS

Reviewed By: jansel

Differential Revision: D31424097

fbshipit-source-id: 25b89679c756393a1065ca6aaa24d29db960cbd4

Co-authored-by: jiafatom <jiafa@microsoft.com>
2021-10-22 13:46:20 -07:00
Nikita Shulga
53a163a015 [ONNX] Export nn.Module call as ONNX local function (#63589) (#66140)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66140

* Add a new argument to the export API that lets users specify `nn.Module` classes they wish to export as local functions in the ONNX model (see the sketch after this list).
* Refactor `torch/csrc/jit/serialization/export.cpp`, and remove redundant `EncoderBase` class.
* ~~Contains changes from #63268~~
* Depends on #63716 to update onnx submodule.
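
A hedged sketch of the intended usage; the argument name `export_modules_as_functions` and the opset-15 requirement are taken from later torch.onnx documentation and may differ at this commit:

```
import io
import torch

class Block(torch.nn.Module):
    def forward(self, x):
        return torch.relu(x) + 1

class Model(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.block = Block()

    def forward(self, x):
        return self.block(x) * 2

# Export every call to Block as an ONNX local function rather than inlining it.
torch.onnx.export(
    Model(),
    (torch.randn(2, 3),),
    io.BytesIO(),
    opset_version=15,
    export_modules_as_functions={Block},
)
```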

Test Plan: Imported from OSS

Reviewed By: jansel

Differential Revision: D31424098

fbshipit-source-id: c949d0b01c206c30b4182c2dd1a5b90e32b7a0d3

Co-authored-by: BowenBao <bowbao@microsoft.com>
2021-10-22 13:44:56 -07:00
Elias Ellison
63b41e1f4d [JIT] Add partial evaluation graph stitching logic (#65377)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65377

When we run symbolic shape analysis on
```
conv = torch.nn.Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
max_pool = torch.nn.MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
mod = nn.Sequential(conv, max_pool)
...
graph(%self : __torch__.torch.nn.modules.container.___torch_mangle_0.Sequential,
      %input.1 : Tensor):
  %18 : bool = prim::Constant[value=0]()
  %30 : int[] = prim::Constant[value=[1, 1]]()
  %29 : int[] = prim::Constant[value=[3, 3]]()
  %28 : int[] = prim::Constant[value=[2, 2]]()
  %6 : int = prim::Constant[value=1]()
  %self.0.bias : NoneType = prim::Constant()
  %self.0.weight : Double(64, 3, 7, 7, strides=[147, 49, 7, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %input.5 : Tensor(SS(-2), 64, SS(-3), SS(-4)) = aten::conv2d(%input.1, %self.0.weight, %self.0.bias, %28, %29, %30, %6)
  %input.9 : Tensor(SS(-2), 64, SS(-5), SS(-6)) = aten::max_pool2d(%input.5, %29, %28, %30, %30, %18)
  return (%input.9)
```
we partially evaluate the shape compute graph of `conv2d`, whose output gets passed in and used to partially evaluate the shape compute graph of `max_pool2d`.

The remaining partially eval'd conv2d graph is [here](https://gist.github.com/eellison/0598bd224a422211efa1a45d2b7560b7), and the partially eval'd maxpool2d graph is [here](https://gist.github.com/eellison/625540b84f650ddbefd3ae5511ab8814). We can take the partially eval'd graphs of a series of operators and stitch them together, which allows us to
a) recover symbolic equivalences via CSE and other optimizations
b) calculate shapes for a whole block of operators from just the input, e.g. for fusing the whole model to NNC with dynamic shapes and then passing along the computed symbolic shapes; the calculation also handles error checking
c) (future-looking) generate inputs on demand for straight-line networks composed only of aten operators

The combined graph of the two gives us compute for the unknown symbolic dimensions - `SS(-2), SS(-3), SS(-4), SS(-5), and SS(-6)`.
```
graph(%input.1 : int[]):
  %42 : bool = prim::Constant[value=0]() # <string>:152:17
  %15 : int = prim::Constant[value=3]()
  %input_batch_size_dim.1 : int = prim::Constant[value=0]() # <string>:417:41
  %13 : int = prim::Constant[value=1]() # <string>:426:61
  %12 : int = prim::Constant[value=4]() # <string>:437:32
  %11 : str = prim::Constant[value="AssertionError: "]()
  %9 : int = prim::Constant[value=2]()
  %8 : int = prim::Constant[value=6]()
  %7 : int = prim::Constant[value=7]()
  %16 : int = aten::len(%input.1) # <string>:438:17
  %17 : bool = aten::eq(%16, %12) # <string>:438:17
   = prim::If(%17) # <string>:438:10
    block0():
      -> ()
    block1():
       = prim::RaiseException(%11) # <string>:438:10
      -> ()
  %18 : int = aten::__getitem__(%input.1, %13) # <string>:407:17
  %19 : bool = aten::eq(%18, %15) # <string>:407:17
   = prim::If(%19) # <string>:407:10
    block0():
      -> ()
    block1():
       = prim::RaiseException(%11) # <string>:407:10
      -> ()
  %20 : int = aten::__getitem__(%input.1, %9) # <string>:411:20
  %21 : int = aten::add(%20, %8) # <string>:411:20
  %22 : bool = aten::ge(%21, %7) # <string>:411:20
   = prim::If(%22) # <string>:411:12
    block0():
      -> ()
    block1():
       = prim::RaiseException(%11) # <string>:411:12
      -> ()
  %23 : int = aten::__getitem__(%input.1, %15) # <string>:411:20
  %24 : int = aten::add(%23, %8) # <string>:411:20
  %25 : bool = aten::ge(%24, %7) # <string>:411:20
   = prim::If(%25) # <string>:411:12
    block0():
      -> ()
    block1():
       = prim::RaiseException(%11) # <string>:411:12
      -> ()
  %26 : int = aten::__getitem__(%input.1, %input_batch_size_dim.1) # <string>:422:29
  %27 : int = aten::sub(%20, %13) # <string>:428:32
  %28 : int = aten::floordiv(%27, %9) # <string>:428:32
  %29 : int = aten::add(%28, %13) # <string>:428:32
  %30 : int = aten::sub(%23, %13) # <string>:428:32
  %31 : int = aten::floordiv(%30, %9) # <string>:428:32
  %32 : int = aten::add(%31, %13) # <string>:428:32
  %48 : int = aten::floordiv(%28, %9) # <string>:133:17
  %outputSize.2 : int = aten::add(%48, %13) # <string>:136:23
  %51 : int = aten::floordiv(%31, %9) # <string>:133:17
  %outputSize.1 : int = aten::add(%51, %13) # <string>:136:23
  %53 : bool = aten::ne(%29, %input_batch_size_dim.1) # <string>:156:41
  %54 : bool = prim::If(%53) # <string>:157:64
    block0():
      %55 : bool = aten::ne(%32, %input_batch_size_dim.1) # <string>:157:93
      -> (%55)
    block1():
      -> (%42)
   = prim::If(%54) # <string>:157:10
    block0():
      -> ()
    block1():
       = prim::RaiseException(%11) # <string>:157:10
      -> ()
  %56 : bool = aten::ge(%outputSize.1, %13) # <string>:160:17
  %57 : bool = prim::If(%56) # <string>:160:17
    block0():
      %58 : bool = aten::ge(%outputSize.2, %13) # <string>:160:38
      -> (%58)
    block1():
      -> (%42)
   = prim::If(%57) # <string>:160:10
    block0():
      -> ()
    block1():
       = prim::RaiseException(%11) # <string>:160:10
      -> ()
  return (%26, %29, %32, %outputSize.2, %outputSize.1)
  ```

This PR runs shape analysis, retains the partially evaluated graphs, and then stitches them together, keeping track of which inputs in the partial-eval graph correspond to which inputs in the encompassing graph IR, and which outputs correspond to which symbolic shape. Adding NNC people as reviewers because this is relevant to dynamic shape fusion.

Question for reviewers: should I make this a separate file?

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D31797472

Pulled By: eellison

fbshipit-source-id: a41ed31fad085d3563e71c815f49af0cd18aaeed
2021-10-20 16:12:58 -07:00
Michael Suo
70c9eb130d Revert D31732419: [JIT] Add partial evaluation graph stitching logic
Test Plan: revert-hammer

Differential Revision:
D31732419 (5db7db667f)

Original commit changeset: 883a55cbeef0

fbshipit-source-id: f5faba69dfb6b54aeb29d1beaeec8c5b0373830f
2021-10-19 20:07:04 -07:00
Elias Ellison
5db7db667f [JIT] Add partial evaluation graph stitching logic (#65377)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65377

When we run symbolic shape analysis on
```
conv = torch.nn.Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
max_pool = torch.nn.MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
mod = nn.Sequential(conv, max_pool)
...
graph(%self : __torch__.torch.nn.modules.container.___torch_mangle_0.Sequential,
      %input.1 : Tensor):
  %18 : bool = prim::Constant[value=0]()
  %30 : int[] = prim::Constant[value=[1, 1]]()
  %29 : int[] = prim::Constant[value=[3, 3]]()
  %28 : int[] = prim::Constant[value=[2, 2]]()
  %6 : int = prim::Constant[value=1]()
  %self.0.bias : NoneType = prim::Constant()
  %self.0.weight : Double(64, 3, 7, 7, strides=[147, 49, 7, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %input.5 : Tensor(SS(-2), 64, SS(-3), SS(-4)) = aten::conv2d(%input.1, %self.0.weight, %self.0.bias, %28, %29, %30, %6)
  %input.9 : Tensor(SS(-2), 64, SS(-5), SS(-6)) = aten::max_pool2d(%input.5, %29, %28, %30, %30, %18)
  return (%input.9)
```
we partially evaluate the shape compute graph of `conv2d`, whose output gets passed in and used to partially evaluate the shape compute graph of `max_pool2d`.

The remaining partially eval'd conv2d graph is [here](https://gist.github.com/eellison/0598bd224a422211efa1a45d2b7560b7), and the partially eval'd maxpool2d graph is [here](https://gist.github.com/eellison/625540b84f650ddbefd3ae5511ab8814). We can take the partially eval'd graphs of a series of operators and stitch them together, which allows us to
a) recover symbolic equivalences via CSE and other optimizations
b) calculate shapes for a whole block of operators from just the input, e.g. for fusing the whole model to NNC with dynamic shapes and then passing along the computed symbolic shapes; the calculation also handles error checking
c) (future-looking) generate inputs on demand for straight-line networks composed only of aten operators

The combined graph of the two gives us compute for the unknown symbolic dimensions - `SS(-2), SS(-3), SS(-4), SS(-5), and SS(-6)`.
```
graph(%input.1 : int[]):
  %42 : bool = prim::Constant[value=0]() # <string>:152:17
  %15 : int = prim::Constant[value=3]()
  %input_batch_size_dim.1 : int = prim::Constant[value=0]() # <string>:417:41
  %13 : int = prim::Constant[value=1]() # <string>:426:61
  %12 : int = prim::Constant[value=4]() # <string>:437:32
  %11 : str = prim::Constant[value="AssertionError: "]()
  %9 : int = prim::Constant[value=2]()
  %8 : int = prim::Constant[value=6]()
  %7 : int = prim::Constant[value=7]()
  %16 : int = aten::len(%input.1) # <string>:438:17
  %17 : bool = aten::eq(%16, %12) # <string>:438:17
   = prim::If(%17) # <string>:438:10
    block0():
      -> ()
    block1():
       = prim::RaiseException(%11) # <string>:438:10
      -> ()
  %18 : int = aten::__getitem__(%input.1, %13) # <string>:407:17
  %19 : bool = aten::eq(%18, %15) # <string>:407:17
   = prim::If(%19) # <string>:407:10
    block0():
      -> ()
    block1():
       = prim::RaiseException(%11) # <string>:407:10
      -> ()
  %20 : int = aten::__getitem__(%input.1, %9) # <string>:411:20
  %21 : int = aten::add(%20, %8) # <string>:411:20
  %22 : bool = aten::ge(%21, %7) # <string>:411:20
   = prim::If(%22) # <string>:411:12
    block0():
      -> ()
    block1():
       = prim::RaiseException(%11) # <string>:411:12
      -> ()
  %23 : int = aten::__getitem__(%input.1, %15) # <string>:411:20
  %24 : int = aten::add(%23, %8) # <string>:411:20
  %25 : bool = aten::ge(%24, %7) # <string>:411:20
   = prim::If(%25) # <string>:411:12
    block0():
      -> ()
    block1():
       = prim::RaiseException(%11) # <string>:411:12
      -> ()
  %26 : int = aten::__getitem__(%input.1, %input_batch_size_dim.1) # <string>:422:29
  %27 : int = aten::sub(%20, %13) # <string>:428:32
  %28 : int = aten::floordiv(%27, %9) # <string>:428:32
  %29 : int = aten::add(%28, %13) # <string>:428:32
  %30 : int = aten::sub(%23, %13) # <string>:428:32
  %31 : int = aten::floordiv(%30, %9) # <string>:428:32
  %32 : int = aten::add(%31, %13) # <string>:428:32
  %48 : int = aten::floordiv(%28, %9) # <string>:133:17
  %outputSize.2 : int = aten::add(%48, %13) # <string>:136:23
  %51 : int = aten::floordiv(%31, %9) # <string>:133:17
  %outputSize.1 : int = aten::add(%51, %13) # <string>:136:23
  %53 : bool = aten::ne(%29, %input_batch_size_dim.1) # <string>:156:41
  %54 : bool = prim::If(%53) # <string>:157:64
    block0():
      %55 : bool = aten::ne(%32, %input_batch_size_dim.1) # <string>:157:93
      -> (%55)
    block1():
      -> (%42)
   = prim::If(%54) # <string>:157:10
    block0():
      -> ()
    block1():
       = prim::RaiseException(%11) # <string>:157:10
      -> ()
  %56 : bool = aten::ge(%outputSize.1, %13) # <string>:160:17
  %57 : bool = prim::If(%56) # <string>:160:17
    block0():
      %58 : bool = aten::ge(%outputSize.2, %13) # <string>:160:38
      -> (%58)
    block1():
      -> (%42)
   = prim::If(%57) # <string>:160:10
    block0():
      -> ()
    block1():
       = prim::RaiseException(%11) # <string>:160:10
      -> ()
  return (%26, %29, %32, %outputSize.2, %outputSize.1)
  ```

This PR runs shape analysis, retains the partially evaluated graphs, and then stitches them together, keeping track of which inputs in the partial-eval graph correspond to which inputs in the encompassing graph IR, and which outputs correspond to which symbolic shape. Adding NNC people as reviewers because this is relevant to dynamic shape fusion.

Question for reviewers: should I make this a separate file?

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D31732419

Pulled By: eellison

fbshipit-source-id: 883a55cbeef0fd5a6068a779ffa89b6f537245b3
2021-10-19 16:41:19 -07:00
John Clow
3bad54069b Concatting multiple linear layers with same input Tensor (different weight/bias) (#63198)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63198

Linear layers that use the same input tensor can be concatenated together as long as the weights and biases are compatible.
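
A small eager-mode sketch of the equivalence the pass relies on (names are illustrative):

```
import torch

x = torch.randn(4, 16)
l1 = torch.nn.Linear(16, 8)
l2 = torch.nn.Linear(16, 8)

# Concatenating weights/biases along the output dimension yields one wider
# linear whose output splits back into the two original results.
w = torch.cat([l1.weight, l2.weight], dim=0)
b = torch.cat([l1.bias, l2.bias], dim=0)
y = torch.nn.functional.linear(x, w, b)

torch.testing.assert_close(y[:, :8], l1(x))
torch.testing.assert_close(y[:, 8:], l2(x))
```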

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D31240642

fbshipit-source-id: 1e78daa6b89822412ba2513d326ee0e072ceff1e
2021-10-08 10:55:46 -07:00
John Clow
6cdea8239e Precomputing Transposes for frozen linear layers (#65631)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65631

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D31314248

Pulled By: Gamrix

fbshipit-source-id: 85611f3ccfe7b91a183d5d12f7fb9aca3c51acb0
2021-10-05 20:08:32 -07:00
jjsjann123
d609957c95 patching graph_for (#55139)
Summary:
Allows an individual DifferentiableGraphOp to display its optimized forward graph. This improves user visibility into graph mutation by optimization passes, especially fusion.
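
A hedged sketch of inspecting the result from Python (assumes the profiling executor has been warmed up so optimized/differentiable graphs exist):

```
import torch

@torch.jit.script
def f(x, y):
    return (x * y + y).relu()

x = torch.randn(4, requires_grad=True)
y = torch.randn(4, requires_grad=True)

for _ in range(3):   # let the profiling executor specialize and optimize
    f(x, y)
print(f.graph_for(x, y))  # optimized graph, including any DifferentiableGraph nodes
```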

Pull Request resolved: https://github.com/pytorch/pytorch/pull/55139

Reviewed By: albanD

Differential Revision: D31330909

Pulled By: dzhulgakov

fbshipit-source-id: c745b482fdc34876dc404cbe3bacd99dcf2ac724
2021-10-04 21:50:22 -07:00
Hariom Narang
2828ce53fd Added jit log stream changing function and some refactor (#65768)
Summary:
Description:
- Only `stdout` and `stderr` have been added as possible options from the Python API for now. We can add file-path passing later.
- Put the `JitLoggingConfig` class in the cpp file, as none of its methods were being used outside of this file.

Python API:
`torch._C._jit_set_logging_stream('stdout|stderr')`
C++ API:
`::torch::jit::set_jit_logging_output_stream(ostream);`
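
A small usage sketch from Python (combined here with `torch._C._jit_set_logging_option` from the related logging-level change; exact log contents will vary):

```
import torch

# Send JIT pass logging to stdout ("stderr" is the other accepted value).
torch._C._jit_set_logging_stream("stdout")
torch._C._jit_set_logging_option(">dead_code_elimination")

@torch.jit.script
def f(x):
    return x + 1

f(torch.randn(2))  # DCE dump/update lines now appear on stdout
```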

Testing:
- Tested the Python API locally.
- A unit test for the C++ API is written.

Fixes https://github.com/pytorch/pytorch/issues/54182

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65768

Reviewed By: mrshenli

Differential Revision: D31291739

Pulled By: ZolotukhinM

fbshipit-source-id: eee72edc20488efad78a01c5b0ed8a132886a08d
2021-09-30 23:25:11 -07:00
Elias Ellison
928a4bbafb [JIT] Fix compilation unit reference link in constant object upon load (#65784)
Summary:
Follow-up to https://github.com/pytorch/pytorch/pull/65442: make sure objects inserted into the graph on load do not hold an owning reference.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65784

Reviewed By: suo

Differential Revision: D31251033

Pulled By: eellison

fbshipit-source-id: 59efe19ce6f70744383de4eebf0f89f79f3eb03a
2021-09-30 09:32:28 -07:00
Pruthvi Madugundu
085e2f7bdd [ROCm] Changes not to rely on CUDA_VERSION or HIP_VERSION (#65610)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65610

- Replace HIP_PLATFORM_HCC with USE_ROCM.
- Don't rely on CUDA_VERSION or HIP_VERSION; use USE_ROCM and ROCM_VERSION instead.

- In the next PR:
   - Remove the mapping from CUDA_VERSION to HIP_VERSION and from CUDA to HIP in hipify.
   - HIP_PLATFORM_HCC is deprecated, so add HIP_PLATFORM_AMD to support HIP host code compilation on gcc.

cc jeffdaily sunway513 jithunnair-amd ROCmSupport amathews-amd

Reviewed By: jbschlosser

Differential Revision: D30909053

Pulled By: ezyang

fbshipit-source-id: 224a966ebf1aaec79beccbbd686fdf3d49267e06
2021-09-29 09:55:43 -07:00
David Berard
8eb21488fd [JIT] Improve BatchMM mutability handling (#65097)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65097

Previously, BatchMM would skip any block containing any mutable
operators. Now it will avoid batching any operation whose inputs or
outputs are ever mutated. Specifically: consider a tree of ADD, T,
and MM nodes rooted at an ADD node.  If any input or output to any
node in the tree is ever mutated, then the entire tree will be ignored
by BatchMM.
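
A hedged sketch of the distinction (TorchScript functions; whether batching actually fires also depends on BatchMM's usual size heuristics):

```
import torch

@torch.jit.script
def batchable(a, b, c, d):
    # a tree of aten::mm nodes feeding aten::add -- a candidate for BatchMM
    return torch.mm(a, b) + torch.mm(c, d)

@torch.jit.script
def not_batchable(a, b, c, d):
    a.add_(1.0)  # `a` is mutated, so the whole mm/add tree is now left alone
    return torch.mm(a, b) + torch.mm(c, d)
```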

Test Plan: python test/test_jit.py TestBatchMM

Reviewed By: eellison

Differential Revision: D30973515

Pulled By: davidberard98

fbshipit-source-id: 9d836faa1ef0c9e3fefe0ffc0bd265f275471f48
2021-09-16 10:46:14 -07:00
James Reed
e1c3e5f830 [resubmit][FX] Prototype for guarding against mutable operations in tracing (#64467)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64467

Test Plan: Imported from OSS

Reviewed By: driazati

Differential Revision: D30744870

Pulled By: jamesr66a

fbshipit-source-id: fc652f8b17748f90dbeb83fabf3bd5bb57d6ff1a
2021-09-02 21:13:21 -07:00
Eli Uriegas
32a93c2424 Revert D30675780: [FX] Prototype for guarding against mutable operations in tracing
Test Plan: revert-hammer

Differential Revision:
D30675780 (795387477f)

Original commit changeset: b2116b51dcc8

fbshipit-source-id: d4f1173f4989556ea54974f4c2739ef85a705fae
2021-09-02 16:07:29 -07:00
James Reed
795387477f [FX] Prototype for guarding against mutable operations in tracing (#64295)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64295

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D30675780

Pulled By: jamesr66a

fbshipit-source-id: b2116b51dcc87357f0c84192c4c336680875e27a
2021-09-02 15:17:04 -07:00
Meghan Lele
95d0b3199b Back out "[ONNX] Fix an issue that optimizations might adjust graph inputs unexpectedly. (#61280)" (#64004)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64004

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63904

Fixes T98808160

Test Plan: T98808160

Reviewed By: msaroufim

Differential Revision: D30527450

fbshipit-source-id: 6262901a78ca929cecda1cf740893139aa26f1b4
2021-08-26 12:49:42 -07:00
Bert Maher
8dda299d96 Re-apply: [nnc] Support thread level parallelism in fused kernels (#63776)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63776

I reverted this out of an abundance of caution because some test
failures occurred, but they were all due to precision issues fixed lower in
this stack.  Let's try again.

I've rolled the elimination of the allow-parallelism-in-fusions toggle into
this diff since they're pretty tightly coupled.
ghstack-source-id: 136529847

Test Plan: CI

Reviewed By: huiguoo

Differential Revision: D30484555

fbshipit-source-id: 38fd33520f710585d1130c365a8c60c9ce794a59
2021-08-24 18:56:55 -07:00
Bert Maher
a709ab34a8 [nnc] Re-enable CPU fusion (#63665)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63665

This reverts commit 125e2d02e5.

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D30471646

Pulled By: bertmaher

fbshipit-source-id: 4189869566f03b5f9ada78d78830f6a34946eed6
2021-08-23 12:42:42 -07:00
Bert Maher
76da46ccdc Revert D30417127: Remove flag to toggle CPU fusion in the presence of parallelism
Test Plan: revert-hammer

Differential Revision:
D30417127 (6600bc9651)

Original commit changeset: b77d7c68364f

fbshipit-source-id: 6b52fb83a84fe241945e3cb3eeb71050d1d9c8f1
2021-08-21 03:38:07 -07:00
BowenBao
8760254911 [ONNX] Fix an issue that optimizations might adjust graph inputs unexpectedly. (#61280) (#62763)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62763

This PR fixes an issue where the graph inputs might be updated when we export the model in inference mode.

When a model is exported in inference mode, some optimizations are applied. One side effect of these optimizations is that the graph inputs might be adjusted. Such optimizations include:

	1. Conv and BatchNorm op fusion.
	2. Constant folding.

If the user sets export_params=False or keep_initializers_as_inputs=True, it's highly likely that they want to provide the corresponding parameters or initializers as inputs of the graph.
In that situation, no matter whether the model is exported in inference or training mode, the exporter needs to prevent the above optimizations from adjusting the graph inputs, so that the graph inputs match the inputs that users provide.

The changes in this PR add a common check for whether the above optimizations should be applied: from the values of the export_params and keep_initializers_as_inputs arguments, infer whether the graph inputs are allowed to be adjusted.
If not, these optimizations are skipped, even if the other requirements are met.
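
A hedged sketch of the export configuration this affects (all three keyword arguments are existing `torch.onnx.export` options):

```
import io
import torch

model = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3), torch.nn.BatchNorm2d(8)).eval()

# The caller intends to supply weights at run time, so Conv+BN fusion and
# constant folding must not rewrite the graph inputs.
torch.onnx.export(
    model,
    (torch.randn(1, 3, 16, 16),),
    io.BytesIO(),
    export_params=False,
    keep_initializers_as_inputs=True,
    do_constant_folding=False,
)
```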

Besides these code changes, the documentation of the parameters below has been updated so that users have more guidance when considering how to use these parameters for different purposes:

	1. export_params
	2. training
	3. do_constant_folding
	4. keep_initializers_as_inputs

Test Plan: Imported from OSS

Reviewed By: SplitInfinity

Differential Revision: D30375183

Pulled By: msaroufim

fbshipit-source-id: 4db8b9695649eb32a3a0fefa950ee2e5651bdba0

Co-authored-by: fatcat-z <jiz@microsoft.com>
2021-08-20 12:46:52 -07:00
Alban Desmaison
125e2d02e5 Revert D30417370: [nnc] Enable CPU fusion
Test Plan: revert-hammer

Differential Revision:
D30417370 (b9fc656cf2)

Original commit changeset: 84ce7a578a36

fbshipit-source-id: cd23774cdc3273fd72f8a05f1900eaf36f373e6b
2021-08-20 12:30:21 -07:00
Bert Maher
b9fc656cf2 [nnc] Enable CPU fusion (#63545)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63545

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D30417370

Pulled By: bertmaher

fbshipit-source-id: 84ce7a578a3678d5562bab99d1dc00330c4f72d1
2021-08-20 11:18:21 -07:00
Bert Maher
6600bc9651 Remove flag to toggle CPU fusion in the presence of parallelism (#63514)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63514

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D30417127

Pulled By: bertmaher

fbshipit-source-id: b77d7c68364f2af73570740540f3b1152313016e
2021-08-20 11:18:19 -07:00
Alban Desmaison
ce61100923 Revert D29399533: Hoisting common expressions out of If blocks
Test Plan: revert-hammer

Differential Revision:
D29399533 (9477211e7d)

Original commit changeset: 9336b9dc48c0

fbshipit-source-id: f081c7280203f40328bcbb0c03a7c6a007acedb7
2021-08-19 06:20:40 -07:00
John Clow
9477211e7d Hoisting common expressions out of If blocks (#59492)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59492

Adding code to find common expressions from the two subblocks of an if operation and hoist them before the if block. This also allows Dead Code Elimination to then eliminate some if blocks.

Also eliminated some dead code in the codebase.
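
A minimal TorchScript sketch of the pattern the pass targets (illustrative only):

```
import torch

@torch.jit.script
def f(x, flag: bool):
    if flag:
        y = x * x + 1  # `x * x` is computed in both branches ...
    else:
        y = x * x - 1  # ... so it can be hoisted above the prim::If
    return y
```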

Test Plan:
python test_jit.py TestIfHoisting

Imported from OSS

Reviewed By: ngimel

Differential Revision: D29399533

fbshipit-source-id: 9336b9dc48c02c38862f98f98cd72fc1767a1802
2021-08-18 16:29:30 -07:00
Elias Ellison
ea808df25d Test shape analysis with opinfos (#59814)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59814

Using OpInfos to test shape analysis. By default, we just check that we don't give incorrect answers; if `assert_jit_shape_analysis` is true, we test that we correctly propagate the full shape. And it found a couple of bugs 😃

Test Plan: Imported from OSS

Reviewed By: Krovatkin

Differential Revision: D30200058

Pulled By: eellison

fbshipit-source-id: 6226be87f5390277cfa5a1fffaa1b072d4bc8803
2021-08-10 09:47:33 -07:00
Richard Barnes
9e77113e85 irange-ify 11 (#62121)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62121

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D29879701

fbshipit-source-id: 5c51879c88fa6a5790db241c8b33ec0dc4b177ca
2021-07-28 13:32:09 -07:00
Meghan Lele
05b802d4e0 [pytorch] Bring back RemoveInplaceOps() (#62200)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62200

This commit brings back the `RemoveInplaceOps` pass removed in D29523283 (dec5aa2260) that apparently had a bunch of internal users.

Test Plan: danthe3rd

Reviewed By: danthe3rd

Differential Revision: D29833316

fbshipit-source-id: 6cf13d463ab0a5e50ba3eb3243f79a9c51623809
2021-07-28 12:00:38 -07:00
Gary Miguel
dec5aa2260 [JIT] clean up (#60390)
Summary:
* Minor: spelling, grammar.
* Add calls to `GRAPH_DUMP()` where they were missing.
* Add or expand a few comments.
* Move a few comments to seemingly more appropriate spots.
* In canonicalize_graph_fuser_ops.cpp inline `runnableInputs()` since it
  was only called in one place and had a misleading comment and
  confusing name.
* In `PeepholeOptimizeImpl::optimizeBlock()`, set `changed = true;` when
  removing `aten::is_complex`. Pretty sure its absence was a bug.
* Delete unused `_jit_pass_remove_inplace_ops` and its
  implementation `RemoveInplaceOps()`.
* In `preprocessCaffe2Ops()`, remove redundant check for nested optional
  types. It was already checked in `checkONNXCompatibility()`.
* In `EncoderBase::AddAttribute`, log the unexpected attribute kind.
  I don't remember the repro case now but I did hit this error at some
  point and this additional logging made it easier to understand.
* In `fuseConvBatchNorm()` in eval_peephole.cpp, consistently use
  camelCase instead of snake_case for local variables.
* Add curly braces around the bodies of if and loops.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60390

Reviewed By: Krovatkin

Differential Revision: D29523283

Pulled By: SplitInfinity

fbshipit-source-id: 4e16c5648616f53da07d68dab7fdf252e06a0752
2021-07-09 16:28:27 -07:00
Bert Maher
93772792e3 [nnc] Get rid of fuser trigger counters (#57334)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57334

Here's a possibly controversial PR.  These counters got in the way of
generalizing the fuser tests to handle arbitrary devices, and I guess I'm just
generally skeptical that they provide much value.  While true that they let us
observe whether fusion groups were created, we already have assertions based on
the shape of the graph, and I'm not sure that I trust those any less than these
counters.

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D29471484

Pulled By: bertmaher

fbshipit-source-id: f6d76f6e72dbfb581acff1d834b0c74500941b57
2021-06-29 22:22:15 -07:00
Lily Johnson
0dd90cceaf [package] track storages across lifetime of PackageExporter (#59735)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59735

1. Fixes the ABA storage-identity problem during serialization for `torch.package` by keeping references to serialized storages for the lifetime of `PackageExporter`, preventing reuse of a memory address. Achieved by extending the logic used to solve the same issue on mobile.
2. Adds determinism to the naming scheme of serialized storages in export code paths that utilize `tensor_cdata_naming_scheme` (introduces a second mapping in `StorageContext`, which now maps `storage cdata ptr` -> `unique id` and `unique id` -> `c10::Storage`).
3. Additionally uses the presence of a storage in the `StorageContext` instance as a marker for whether the storage has been serialized, removing the need to scan the `PythonStreamWriter` for the storage's serialization file.

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D29075276

Pulled By: Lilyjjo

fbshipit-source-id: 15a5c30b1de99c5bd7079388f2db9b6ece2eca12
2021-06-29 14:16:54 -07:00
Hariom Narang
9d1d799034 Added API to change logging levels for JIT (#58821)
Summary:
Description:
- Before this, the logging level could only be changed via the env variable "PYTORCH_JIT_LOG_LEVEL"
    - The level can now be changed from Python
- Stream configuration has not been added for now
- Configuration is stored in a singleton class managing the options

Issue Link: https://github.com/pytorch/pytorch/issues/54188

Gotchas:
- Created separate functions
`::torch::jit::get_jit_logging_levels/set_jit_logging_levels` instead of
using the singleton class's method directly
    - This is because when running test cases, two different instances
    of the singleton are created for the test suite and the actual code
    (`jit_log.cpp`)
    - On using these methods directly, `is_enabled` calls the singleton
    in `jit_log.cpp` while we are setting the config using another
    singleton
    - See: https://stackoverflow.com/questions/55467246/my-singleton-can-be-called-multiple-times

API:
- To set the level: `torch._C._jit_set_logging_option("level")`
- To get the level: `torch._C._jit_get_logging_option()`
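
A small usage sketch (log contents depend on which passes run):

```
import torch

# Same syntax as PYTORCH_JIT_LOG_LEVEL, but switchable at run time.
torch._C._jit_set_logging_option(">dead_code_elimination")
print(torch._C._jit_get_logging_option())

@torch.jit.script
def f(x):
    return x + 1

f(torch.randn(2))  # [DUMP ...]/[UPDATE ...] lines for the DCE pass appear in the logs
```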

Testing:
- UTs were added for C++
- A very simple UT was added for Python to just check that the API is being called correctly
- The API was checked by running trace in a sample Python file
    - Set the env variable to "" and used `_jit_set_logging_option` in Python to set the level to `>dead_code_elimination`
    - The error output had logs of the form [DUMP..], [UPDATE...], etc.

Fixes https://github.com/pytorch/pytorch/issues/54188

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58821

Reviewed By: soulitzer

Differential Revision: D29116712

Pulled By: ZolotukhinM

fbshipit-source-id: 8f2861ee2bd567fb63b405953d035ca657a3200f
2021-06-21 16:10:49 -07:00
Bin Bao
add291cf66 [JIT] Add a phase to perform inplace<->functional conversion for activation operators (#57477)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57477

Currently the conversion only deals with activation operators. The legality check is somewhat strict for now.
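
A hedged sketch of the two forms the pass converts between (shown as user code; the conversion itself happens on the JIT graph):

```
import torch

@torch.jit.script
def inplace_form(x):
    y = x * 2
    y.relu_()             # in-place activation
    return y

@torch.jit.script
def functional_form(x):
    y = x * 2
    return torch.relu(y)  # functional counterpart; the pass rewrites between
                          # the two when the legality check allows it
```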

Test Plan:
```
python test/test_jit.py -k test_functional_to_inplace_activation
python test/test_jit.py -k test_inplace_to_functional_activation
```

Reviewed By: mrshenli

Differential Revision: D28155153

Pulled By: desertfire

fbshipit-source-id: df092830c4dff3ce9578ff76285eb7a566b7d81b
2021-06-03 06:43:23 -07:00
eellison
d8cbba3ee2 [JIT] Disable Complete Shape Inlining For Testing Purposes (#56966)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56966

This PR adds a toggle to shape analysis that prevents inlining complete tensor shapes as constants into the shape compute graph, which is a good stress test of the partial evaluation pipeline.

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D28444664

Pulled By: eellison

fbshipit-source-id: a62e424515a8837a4b596546efa93af5e8e61f10
2021-05-27 17:57:48 -07:00
eellison
f66fbb1e2e Add unary/binary ops necessary for mobilenet (#56828)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/56828

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D28444660

Pulled By: eellison

fbshipit-source-id: 656673e6139550f2752c0d3ac2fb8731f4bf9bbb
2021-05-27 17:56:30 -07:00