Commit Graph

213 Commits

Author SHA1 Message Date
John Clow
a9c2f11d2a Update Freezing Logic and add new passes (#68024)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68024

Pull Request resolved: #67949

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D32260614

Pulled By: eellison

fbshipit-source-id: 41d7a9b45e33297a17560a22eba8973e2fc48b43
2021-11-09 13:21:52 -08:00
John Clow
ec8a71f9ac Dtype Analysis for Unary and Binary ops with Metatensors (#66898)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66898

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D32175961

Pulled By: Gamrix

fbshipit-source-id: 72721259b900e5a311b6bcb5c350366ba420b734
2021-11-04 19:00:50 -07:00
Natalia Gimelshein
3d4a6ff15d Revert D32154788: Move Concat Linear out of Optimize Numerics
Test Plan: revert-hammer

Differential Revision:
D32154788 (ea94dde573)

Original commit changeset: faa6465c89b3

fbshipit-source-id: 0dcaa65268b68ed01e6a5bc7b73ade1f51163b33
2021-11-04 12:20:02 -07:00
John Clow
ea94dde573 Move Concat Linear out of Optimize Numerics (#67196)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67196

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D32154788

Pulled By: Gamrix

fbshipit-source-id: faa6465c89b3676d6b1ff7c20a677738a7fbdf88
2021-11-04 11:30:39 -07:00
Elias Ellison
2486061c72 [JIT] make x (+ or -) 0 and x (* or /) 1 peepholes type promotion aware (#67688)
Summary:
Some of the "no-ops" are not actually no-ops because they can change the dtype
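
As a rough illustration of why these peepholes need to look at types: in eager PyTorch, adding the Python float `0.0` to an integer tensor gives a floating-point result, so `x + 0.0` cannot always be rewritten to `x`. The sketch below models this with a simplified three-category promotion rule (bool < int < float). The names `CATEGORY`, `scalar_category` and `is_safe_noop` are hypothetical, and this is not the logic of the actual pass.

```python
# Simplified model of tensor/scalar type promotion; an illustration only,
# not the logic used by the JIT peephole pass.
CATEGORY = {"bool": 0, "int64": 1, "float32": 2}  # hypothetical ordering


def scalar_category(value):
    """Return the promotion category of a Python scalar."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return 0
    if isinstance(value, int):
        return 1
    return 2


def is_safe_noop(tensor_dtype, op, scalar):
    """True if `tensor op scalar` keeps both the values and the dtype (in this model)."""
    if op not in ("+", "-", "*", "/"):
        return False
    identity = 0 if op in ("+", "-") else 1
    if scalar != identity:
        return False
    if op == "/" and CATEGORY[tensor_dtype] < 2:
        # True division of an integer or bool tensor produces a float result.
        return False
    # A scalar from a higher category promotes the tensor's dtype.
    return scalar_category(scalar) <= CATEGORY[tensor_dtype]
```

Under this model, `is_safe_noop("int64", "+", 0)` holds while `is_safe_noop("int64", "+", 0.0)` does not, which is the kind of case the commit title refers to.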

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67688

Reviewed By: davidberard98

Differential Revision: D32104601

Pulled By: eellison

fbshipit-source-id: ccb99179a4b30fd20b5a9228374584f2cdc8ec21
2021-11-03 20:11:46 -07:00
Nikolay Korovaiko
3db536e55e add jit_trace_module python binding (#67425)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67425

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D31998564

Pulled By: Krovatkin

fbshipit-source-id: f7e38c8c3f560f2c4e5ed62e1acae2c100efebd4
2021-11-02 23:55:23 -07:00
jjsjann123
1ec732bc46 Add fp16/fp32 autocasting to JIT/TorchScript (#63939)
Summary:
Adds mixed precision autocasting support between fp32/fp16 to TorchScript/JIT. A more in-depth description can be found at [torch/csrc/jit/JIT-AUTOCAST.md](https://github.com/pytorch/pytorch/pull/63939/files#diff-1f1772aaa508841c5bb58b74ab98f49a1e577612cd9ea5c386c8714a75db830b)

This PR implements an autocast optimization pass that inserts casting ops per AMP rule (torch/csrc/jit/passes/autocast.cpp), mimicking the behavior of eager autocast. The pass also takes into consideration the context of `torch.cuda.amp.autocast` and only inserts casting ops within the enabled context manager, giving feature parity with eager AMP autocast.

We currently provide JIT AMP autocast as a prototype feature, so it is off by default and can be turned on via `torch._C._jit_set_autocast_mode(True)`.

The JIT support for autocast is subject to different constraints than the eager mode implementation (mostly because TorchScript is statically typed). The restrictions on user-facing Python code are described in torch/csrc/jit/JIT-AUTOCAST.md.

This is a prototype. There are also implementation limitations that are necessary to keep this PR small and get something functioning quickly upstream, so we can iterate on designs.

A few limitations/challenges that are not properly resolved in this PR:
1. Autocast inserts cast operations, which affect the scalar type of output tensors feeding downstream operations. We are not currently propagating the updated scalar types; this can cause issues or wrong results for operations that rely on promotion rules.

2. Backward for autodiff in JIT does not cast dgrad to the input scalar type, as autograd does in eager mode. This forces us to explicitly mark the casting operation for certain operations (e.g. binary ops); otherwise, we might feed dgrad with a mismatched scalar type to the input. This could potentially break gradient functions consuming dgrad (e.g. gemm backward, which assumes grad_output has the same scalar type as the input).

3. The `torch.autocast` API has an optional argument `dtype`, which is not currently supported in JIT autocast; we require a static value.

Credit goes mostly to:
tlemo
kevinstephano

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63939

Reviewed By: navahgar

Differential Revision: D31093381

Pulled By: eellison

fbshipit-source-id: da6e26c668c38b01e296f304507048d6c1794314
2021-10-27 12:11:36 -07:00
Nikolay Korovaiko
a7ebf76a15 jit trace (#59949)
Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59949

Reviewed By: ZolotukhinM

Differential Revision: D31366787

Pulled By: Krovatkin

fbshipit-source-id: 798cbcd97e8ecfba984f98cd70214954be9309af
2021-10-24 18:04:22 -07:00
Nikita Shulga
6f3f302d9f [ONNX] Deprecate fold_if pass (#65697) (#66145)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66145

Deprecate fold_if pass

Test Plan: Imported from OSS

Reviewed By: jansel

Differential Revision: D31424097

fbshipit-source-id: 25b89679c756393a1065ca6aaa24d29db960cbd4

Co-authored-by: jiafatom <jiafa@microsoft.com>
2021-10-22 13:46:20 -07:00
Nikita Shulga
53a163a015 [ONNX] Export nn.Module call as ONNX local function (#63589) (#66140)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66140

* Add new argument to export api to enable users specifying `nn.Module` classes that they wish to be exported as local function in ONNX model.
* Refactor `torch/csrc/jit/serialization/export.cpp`, and remove redundant `EncoderBase` class.
* ~~Contains changes from #63268~~
* Depends on #63716 to update onnx submodule.

Test Plan: Imported from OSS

Reviewed By: jansel

Differential Revision: D31424098

fbshipit-source-id: c949d0b01c206c30b4182c2dd1a5b90e32b7a0d3

Co-authored-by: BowenBao <bowbao@microsoft.com>
2021-10-22 13:44:56 -07:00
Elias Ellison
63b41e1f4d [JIT] Add partial evaluation graph stitching logic (#65377)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65377

When we run symbolic shape analysis on
```
conv = torch.nn.Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
max_pool = torch.nn.MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
mod = nn.Sequential(conv, max_pool)
...
graph(%self : __torch__.torch.nn.modules.container.___torch_mangle_0.Sequential,
      %input.1 : Tensor):
  %18 : bool = prim::Constant[value=0]()
  %30 : int[] = prim::Constant[value=[1, 1]]()
  %29 : int[] = prim::Constant[value=[3, 3]]()
  %28 : int[] = prim::Constant[value=[2, 2]]()
  %6 : int = prim::Constant[value=1]()
  %self.0.bias : NoneType = prim::Constant()
  %self.0.weight : Double(64, 3, 7, 7, strides=[147, 49, 7, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %input.5 : Tensor(SS(-2), 64, SS(-3), SS(-4)) = aten::conv2d(%input.1, %self.0.weight, %self.0.bias, %28, %29, %30, %6)
  %input.9 : Tensor(SS(-2), 64, SS(-5), SS(-6)) = aten::max_pool2d(%input.5, %29, %28, %30, %30, %18)
  return (%input.9)
```
we partially evaluate the shape compute graph of `conv2d`, whose output gets passed in and used to partially evaluate the shape compute graph of `max_pool2d`.

The remaining partially eval'd graph for conv2d is [here](https://gist.github.com/eellison/0598bd224a422211efa1a45d2b7560b7), and the one for maxpool2d is [here](https://gist.github.com/eellison/625540b84f650ddbefd3ae5511ab8814). We can take the partially eval'd graphs of a series of operators and stitch them together, which allows us to
a) recover symbolic equivalences via CSE and other optimizations
b) calculate shapes for a whole block of operators from just the input, such as for fusing the whole model to NNC with dynamic shapes and then passing along the computed symbolic shapes. The calculation also takes care of error handling.
c) (future-looking) generate inputs on demand for straight-line networks that are composed just of aten operators

The combined graph of the two gives us compute for the unknown symbolic dimensions - `SS(-2), SS(-3), SS(-4), SS(-5), and SS(-6)`.
```
graph(%input.1 : int[]):
  %42 : bool = prim::Constant[value=0]() # <string>:152:17
  %15 : int = prim::Constant[value=3]()
  %input_batch_size_dim.1 : int = prim::Constant[value=0]() # <string>:417:41
  %13 : int = prim::Constant[value=1]() # <string>:426:61
  %12 : int = prim::Constant[value=4]() # <string>:437:32
  %11 : str = prim::Constant[value="AssertionError: "]()
  %9 : int = prim::Constant[value=2]()
  %8 : int = prim::Constant[value=6]()
  %7 : int = prim::Constant[value=7]()
  %16 : int = aten::len(%input.1) # <string>:438:17
  %17 : bool = aten::eq(%16, %12) # <string>:438:17
   = prim::If(%17) # <string>:438:10
    block0():
      -> ()
    block1():
       = prim::RaiseException(%11) # <string>:438:10
      -> ()
  %18 : int = aten::__getitem__(%input.1, %13) # <string>:407:17
  %19 : bool = aten::eq(%18, %15) # <string>:407:17
   = prim::If(%19) # <string>:407:10
    block0():
      -> ()
    block1():
       = prim::RaiseException(%11) # <string>:407:10
      -> ()
  %20 : int = aten::__getitem__(%input.1, %9) # <string>:411:20
  %21 : int = aten::add(%20, %8) # <string>:411:20
  %22 : bool = aten::ge(%21, %7) # <string>:411:20
   = prim::If(%22) # <string>:411:12
    block0():
      -> ()
    block1():
       = prim::RaiseException(%11) # <string>:411:12
      -> ()
  %23 : int = aten::__getitem__(%input.1, %15) # <string>:411:20
  %24 : int = aten::add(%23, %8) # <string>:411:20
  %25 : bool = aten::ge(%24, %7) # <string>:411:20
   = prim::If(%25) # <string>:411:12
    block0():
      -> ()
    block1():
       = prim::RaiseException(%11) # <string>:411:12
      -> ()
  %26 : int = aten::__getitem__(%input.1, %input_batch_size_dim.1) # <string>:422:29
  %27 : int = aten::sub(%20, %13) # <string>:428:32
  %28 : int = aten::floordiv(%27, %9) # <string>:428:32
  %29 : int = aten::add(%28, %13) # <string>:428:32
  %30 : int = aten::sub(%23, %13) # <string>:428:32
  %31 : int = aten::floordiv(%30, %9) # <string>:428:32
  %32 : int = aten::add(%31, %13) # <string>:428:32
  %48 : int = aten::floordiv(%28, %9) # <string>:133:17
  %outputSize.2 : int = aten::add(%48, %13) # <string>:136:23
  %51 : int = aten::floordiv(%31, %9) # <string>:133:17
  %outputSize.1 : int = aten::add(%51, %13) # <string>:136:23
  %53 : bool = aten::ne(%29, %input_batch_size_dim.1) # <string>:156:41
  %54 : bool = prim::If(%53) # <string>:157:64
    block0():
      %55 : bool = aten::ne(%32, %input_batch_size_dim.1) # <string>:157:93
      -> (%55)
    block1():
      -> (%42)
   = prim::If(%54) # <string>:157:10
    block0():
      -> ()
    block1():
       = prim::RaiseException(%11) # <string>:157:10
      -> ()
  %56 : bool = aten::ge(%outputSize.1, %13) # <string>:160:17
  %57 : bool = prim::If(%56) # <string>:160:17
    block0():
      %58 : bool = aten::ge(%outputSize.2, %13) # <string>:160:38
      -> (%58)
    block1():
      -> (%42)
   = prim::If(%57) # <string>:160:10
    block0():
      -> ()
    block1():
       = prim::RaiseException(%11) # <string>:160:10
      -> ()
  return (%26, %29, %32, %outputSize.2, %outputSize.1)
  ```

This PR runs shape analysis, retains the partially evaluated graphs, and then stitches them together, keeping track of what inputs in the partial eval graph correspond to what inputs in the encompassing graph IR and what outputs correspond to what symbolic shape. Adding NNC ppl as reviewers because it is relevant to dynamic shape fusion.
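
For readers who find the IR hard to follow, the stitched shape graph for the conv2d + max_pool2d example can be read as the plain-Python function below. This is a hand transcription of the IR shown in this message, not code from the PR; the function name `conv_maxpool_symbolic_dims` and the assertion messages are made up for illustration.

```python
def conv_maxpool_symbolic_dims(input_shape):
    """Mirror the stitched shape graph for the conv2d + max_pool2d example.

    Returns the five computed dimensions in the order the graph returns them:
    (batch, conv height, conv width, pool height, pool width).
    """
    # Checks corresponding to the prim::RaiseException branches in the IR.
    assert len(input_shape) == 4, "expected a 4-D input"
    assert input_shape[1] == 3, "conv expects 3 input channels"
    batch, _, height, width = input_shape
    # The padded size (padding 3 on each side adds 6) must cover the 7x7 kernel.
    assert height + 6 >= 7 and width + 6 >= 7

    # conv2d: kernel 7, stride 2, padding 3 -> (h + 6 - 7) // 2 + 1
    conv_h = (height - 1) // 2 + 1
    conv_w = (width - 1) // 2 + 1
    assert conv_h != 0 and conv_w != 0

    # max_pool2d: kernel 3, stride 2, padding 1 -> (conv_h + 2 - 3) // 2 + 1,
    # which the graph expresses as ((h - 1) // 2) // 2 + 1.
    pool_h = ((height - 1) // 2) // 2 + 1
    pool_w = ((width - 1) // 2) // 2 + 1
    assert pool_h >= 1 and pool_w >= 1

    return batch, conv_h, conv_w, pool_h, pool_w
```

For an input of shape `[1, 3, 224, 224]` this gives `(1, 112, 112, 56, 56)`, i.e. every unknown dimension is computed from the input shape alone.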

Question for reviewers: should I make this a separate file?

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D31797472

Pulled By: eellison

fbshipit-source-id: a41ed31fad085d3563e71c815f49af0cd18aaeed
2021-10-20 16:12:58 -07:00
Michael Suo
70c9eb130d Revert D31732419: [JIT] Add partial evaluation graph stitching logic
Test Plan: revert-hammer

Differential Revision:
D31732419 (5db7db667f)

Original commit changeset: 883a55cbeef0

fbshipit-source-id: f5faba69dfb6b54aeb29d1beaeec8c5b0373830f
2021-10-19 20:07:04 -07:00
Elias Ellison
5db7db667f [JIT] Add partial evaluation graph stitching logic (#65377)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65377

When we run symbolic shape analysis on
```
conv = torch.nn.Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
max_pool = torch.nn.MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
mod = nn.Sequential(conv, max_pool)
...
graph(%self : __torch__.torch.nn.modules.container.___torch_mangle_0.Sequential,
      %input.1 : Tensor):
  %18 : bool = prim::Constant[value=0]()
  %30 : int[] = prim::Constant[value=[1, 1]]()
  %29 : int[] = prim::Constant[value=[3, 3]]()
  %28 : int[] = prim::Constant[value=[2, 2]]()
  %6 : int = prim::Constant[value=1]()
  %self.0.bias : NoneType = prim::Constant()
  %self.0.weight : Double(64, 3, 7, 7, strides=[147, 49, 7, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %input.5 : Tensor(SS(-2), 64, SS(-3), SS(-4)) = aten::conv2d(%input.1, %self.0.weight, %self.0.bias, %28, %29, %30, %6)
  %input.9 : Tensor(SS(-2), 64, SS(-5), SS(-6)) = aten::max_pool2d(%input.5, %29, %28, %30, %30, %18)
  return (%input.9)
```
we partially evaluate the shape compute graph of `conv2d`, whose output gets passed in and used to partially evaluate the shape compute graph of `max_pool2d`.

The remaining partially eval'd graph for conv2d is [here](https://gist.github.com/eellison/0598bd224a422211efa1a45d2b7560b7), and the one for maxpool2d is [here](https://gist.github.com/eellison/625540b84f650ddbefd3ae5511ab8814). We can take the partially eval'd graphs of a series of operators and stitch them together, which allows us to
a) recover symbolic equivalences via CSE and other optimizations
b) calculate shapes for a whole block of operators from just the input, such as for fusing the whole model to NNC with dynamic shapes and then passing along the computed symbolic shapes. The calculation also takes care of error handling.
c) (future-looking) generate inputs on demand for straight-line networks that are composed just of aten operators

The combined graph of the two gives us compute for the unknown symbolic dimensions - `SS(-2), SS(-3), SS(-4), SS(-5), and SS(-6)`.
```
graph(%input.1 : int[]):
  %42 : bool = prim::Constant[value=0]() # <string>:152:17
  %15 : int = prim::Constant[value=3]()
  %input_batch_size_dim.1 : int = prim::Constant[value=0]() # <string>:417:41
  %13 : int = prim::Constant[value=1]() # <string>:426:61
  %12 : int = prim::Constant[value=4]() # <string>:437:32
  %11 : str = prim::Constant[value="AssertionError: "]()
  %9 : int = prim::Constant[value=2]()
  %8 : int = prim::Constant[value=6]()
  %7 : int = prim::Constant[value=7]()
  %16 : int = aten::len(%input.1) # <string>:438:17
  %17 : bool = aten::eq(%16, %12) # <string>:438:17
   = prim::If(%17) # <string>:438:10
    block0():
      -> ()
    block1():
       = prim::RaiseException(%11) # <string>:438:10
      -> ()
  %18 : int = aten::__getitem__(%input.1, %13) # <string>:407:17
  %19 : bool = aten::eq(%18, %15) # <string>:407:17
   = prim::If(%19) # <string>:407:10
    block0():
      -> ()
    block1():
       = prim::RaiseException(%11) # <string>:407:10
      -> ()
  %20 : int = aten::__getitem__(%input.1, %9) # <string>:411:20
  %21 : int = aten::add(%20, %8) # <string>:411:20
  %22 : bool = aten::ge(%21, %7) # <string>:411:20
   = prim::If(%22) # <string>:411:12
    block0():
      -> ()
    block1():
       = prim::RaiseException(%11) # <string>:411:12
      -> ()
  %23 : int = aten::__getitem__(%input.1, %15) # <string>:411:20
  %24 : int = aten::add(%23, %8) # <string>:411:20
  %25 : bool = aten::ge(%24, %7) # <string>:411:20
   = prim::If(%25) # <string>:411:12
    block0():
      -> ()
    block1():
       = prim::RaiseException(%11) # <string>:411:12
      -> ()
  %26 : int = aten::__getitem__(%input.1, %input_batch_size_dim.1) # <string>:422:29
  %27 : int = aten::sub(%20, %13) # <string>:428:32
  %28 : int = aten::floordiv(%27, %9) # <string>:428:32
  %29 : int = aten::add(%28, %13) # <string>:428:32
  %30 : int = aten::sub(%23, %13) # <string>:428:32
  %31 : int = aten::floordiv(%30, %9) # <string>:428:32
  %32 : int = aten::add(%31, %13) # <string>:428:32
  %48 : int = aten::floordiv(%28, %9) # <string>:133:17
  %outputSize.2 : int = aten::add(%48, %13) # <string>:136:23
  %51 : int = aten::floordiv(%31, %9) # <string>:133:17
  %outputSize.1 : int = aten::add(%51, %13) # <string>:136:23
  %53 : bool = aten::ne(%29, %input_batch_size_dim.1) # <string>:156:41
  %54 : bool = prim::If(%53) # <string>:157:64
    block0():
      %55 : bool = aten::ne(%32, %input_batch_size_dim.1) # <string>:157:93
      -> (%55)
    block1():
      -> (%42)
   = prim::If(%54) # <string>:157:10
    block0():
      -> ()
    block1():
       = prim::RaiseException(%11) # <string>:157:10
      -> ()
  %56 : bool = aten::ge(%outputSize.1, %13) # <string>:160:17
  %57 : bool = prim::If(%56) # <string>:160:17
    block0():
      %58 : bool = aten::ge(%outputSize.2, %13) # <string>:160:38
      -> (%58)
    block1():
      -> (%42)
   = prim::If(%57) # <string>:160:10
    block0():
      -> ()
    block1():
       = prim::RaiseException(%11) # <string>:160:10
      -> ()
  return (%26, %29, %32, %outputSize.2, %outputSize.1)
  ```

This PR runs shape analysis, retains the partially evaluated graphs, and then stitches them together, keeping track of what inputs in the partial eval graph correspond to what inputs in the encompassing graph IR and what outputs correspond to what symbolic shape. Adding NNC ppl as reviewers because it is relevant to dynamic shape fusion.

Question for reviewers: should I make this a separate file?

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D31732419

Pulled By: eellison

fbshipit-source-id: 883a55cbeef0fd5a6068a779ffa89b6f537245b3
2021-10-19 16:41:19 -07:00
John Clow
3bad54069b Concatting multiple linear layers with same input Tensor (different weight/bias) (#63198)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63198

Linear layers using the same input tensor can be concatted together
as long as the weights and biases are compatible.
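
The idea can be sketched without any tensor library: stacking the weight rows and biases of several linear layers gives one larger linear layer, and slicing its output recovers the individual results. The helper names `linear` and `concat_linear` are hypothetical, and "compatible" is taken here to mean only that all layers have the same number of input features; the real pass may apply further conditions.

```python
def linear(x, weight, bias):
    """Compute x @ weight^T + bias for a single input vector x."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def concat_linear(x, layers):
    """Run several linear layers that share input x as one larger linear layer.

    `layers` is a list of (weight, bias) pairs; the per-layer outputs are
    recovered by slicing the combined output.
    """
    big_weight = [row for weight, _ in layers for row in weight]
    big_bias = [b for _, bias in layers for b in bias]
    combined = linear(x, big_weight, big_bias)

    outputs, start = [], 0
    for weight, _ in layers:
        outputs.append(combined[start:start + len(weight)])
        start += len(weight)
    return outputs
```

The benefit in a real graph would be one larger matrix multiply in place of several small ones on the same input.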

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D31240642

fbshipit-source-id: 1e78daa6b89822412ba2513d326ee0e072ceff1e
2021-10-08 10:55:46 -07:00
John Clow
6cdea8239e Precomputing Transposes for frozen linear layers (#65631)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65631

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D31314248

Pulled By: Gamrix

fbshipit-source-id: 85611f3ccfe7b91a183d5d12f7fb9aca3c51acb0
2021-10-05 20:08:32 -07:00
jjsjann123
d609957c95 patching graph_for (#55139)
Summary:
Allows an individual DifferentiableGraphOp to display its optimized forward graph. This improves user visibility into graph mutations made by optimization passes, especially fusion.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/55139

Reviewed By: albanD

Differential Revision: D31330909

Pulled By: dzhulgakov

fbshipit-source-id: c745b482fdc34876dc404cbe3bacd99dcf2ac724
2021-10-04 21:50:22 -07:00
Hariom Narang
2828ce53fd Added jit log stream changing function and some refactor (#65768)
Summary:
Description:
- Have only added `stdout` and `stderr` as possible options from python
  API for now. We can do file path passing later maybe.
- Put the class `JitLoggingConfig` in the cpp file as none of its methods were being used outside of this file.

Python API:
`torch._C._jit_set_logging_stream('stdout|stderr')`
C++ API:
`::torch::jit::set_jit_logging_output_stream(ostream);`

Testing:
- Tested python API locally.
- Unit test for the C++ API is written

Fixes https://github.com/pytorch/pytorch/issues/54182

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65768

Reviewed By: mrshenli

Differential Revision: D31291739

Pulled By: ZolotukhinM

fbshipit-source-id: eee72edc20488efad78a01c5b0ed8a132886a08d
2021-09-30 23:25:11 -07:00
Elias Ellison
928a4bbafb [JIT] Fix compilation unit reference link in constant object upon load (#65784)
Summary:
Follow-up to https://github.com/pytorch/pytorch/pull/65442: make sure objects inserted into the graph from load do not hold an owning reference.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65784

Reviewed By: suo

Differential Revision: D31251033

Pulled By: eellison

fbshipit-source-id: 59efe19ce6f70744383de4eebf0f89f79f3eb03a
2021-09-30 09:32:28 -07:00
Pruthvi Madugundu
085e2f7bdd [ROCm] Changes not to rely on CUDA_VERSION or HIP_VERSION (#65610)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65610

- Replace HIP_PLATFORM_HCC with USE_ROCM
- Don't rely on CUDA_VERSION or HIP_VERSION; use USE_ROCM and ROCM_VERSION instead.

- In the next PR
   - Will be removing the mapping from CUDA_VERSION to HIP_VERSION and CUDA to HIP in hipify.
   - HIP_PLATFORM_HCC is deprecated, so will add HIP_PLATFORM_AMD to support HIP host code compilation on gcc.

cc jeffdaily sunway513 jithunnair-amd ROCmSupport amathews-amd

Reviewed By: jbschlosser

Differential Revision: D30909053

Pulled By: ezyang

fbshipit-source-id: 224a966ebf1aaec79beccbbd686fdf3d49267e06
2021-09-29 09:55:43 -07:00
David Berard
8eb21488fd [JIT] Improve BatchMM mutability handling (#65097)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65097

Previously, BatchMM would skip any block containing any mutable
operators. Now it will avoid batching any operation whose inputs or
outputs are ever mutated. Specifically: consider a tree of ADD, T,
and MM nodes rooted at an ADD node.  If any input or output to any
node in the tree is ever mutated, then the entire tree will be ignored
by BatchMM.
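
The new rule can be sketched on a toy node structure: collect every value that is an input or output of a node in the ADD/T/MM tree, and skip the tree if any of them is in the set of mutated values. The `Node` class and the `tree_values` and `can_batch` helpers are hypothetical stand-ins, not the JIT's data structures or alias analysis.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str                      # "ADD", "T", "MM", or anything else
    output: str                    # name of the value this node produces
    inputs: list = field(default_factory=list)  # Node objects or value names


TREE_KINDS = {"ADD", "T", "MM"}


def tree_values(node):
    """Yield every value name that is an input or output of a node in the tree."""
    yield node.output
    for inp in node.inputs:
        if isinstance(inp, Node) and inp.kind in TREE_KINDS:
            yield from tree_values(inp)
        elif isinstance(inp, Node):
            yield inp.output       # produced outside the tree; only an input here
        else:
            yield inp              # graph input referenced by name


def can_batch(root, mutated_values):
    """A tree rooted at an ADD is skipped if any of its values is ever mutated."""
    if root.kind != "ADD":
        return False
    return not any(v in mutated_values for v in tree_values(root))
```

With this rule, a mutation elsewhere in the block no longer prevents batching of a tree whose own values are never mutated.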

Test Plan: python test/test_jit.py TestBatchMM

Reviewed By: eellison

Differential Revision: D30973515

Pulled By: davidberard98

fbshipit-source-id: 9d836faa1ef0c9e3fefe0ffc0bd265f275471f48
2021-09-16 10:46:14 -07:00
James Reed
e1c3e5f830 [resubmit][FX] Prototype for guarding against mutable operations in tracing (#64467)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64467

Test Plan: Imported from OSS

Reviewed By: driazati

Differential Revision: D30744870

Pulled By: jamesr66a

fbshipit-source-id: fc652f8b17748f90dbeb83fabf3bd5bb57d6ff1a
2021-09-02 21:13:21 -07:00
Eli Uriegas
32a93c2424 Revert D30675780: [FX] Prototype for guarding against mutable operations in tracing
Test Plan: revert-hammer

Differential Revision:
D30675780 (795387477f)

Original commit changeset: b2116b51dcc8

fbshipit-source-id: d4f1173f4989556ea54974f4c2739ef85a705fae
2021-09-02 16:07:29 -07:00
James Reed
795387477f [FX] Prototype for guarding against mutable operations in tracing (#64295)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64295

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D30675780

Pulled By: jamesr66a

fbshipit-source-id: b2116b51dcc87357f0c84192c4c336680875e27a
2021-09-02 15:17:04 -07:00
Meghan Lele
95d0b3199b Back out "[ONNX] Fix an issue that optimizations might adjust graph inputs unexpectedly. (#61280)" (#64004)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64004

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63904

Fixes T98808160

Test Plan: T98808160

Reviewed By: msaroufim

Differential Revision: D30527450

fbshipit-source-id: 6262901a78ca929cecda1cf740893139aa26f1b4
2021-08-26 12:49:42 -07:00
Bert Maher
8dda299d96 Re-apply: [nnc] Support thread level parallelism in fused kernels (#63776)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63776

I reverted this out of an abundance of caution because some test
failures occurred, but they were all due to precision issues fixed lower in
this stack.  Let's try again.

I've rolled the elimination of the allow-parallelism-in-fusions toggle into
this diff since they're pretty tightly coupled.
ghstack-source-id: 136529847

Test Plan: CI

Reviewed By: huiguoo

Differential Revision: D30484555

fbshipit-source-id: 38fd33520f710585d1130c365a8c60c9ce794a59
2021-08-24 18:56:55 -07:00
Bert Maher
a709ab34a8 [nnc] Re-enable CPU fusion (#63665)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63665

This reverts commit 125e2d02e5.

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D30471646

Pulled By: bertmaher

fbshipit-source-id: 4189869566f03b5f9ada78d78830f6a34946eed6
2021-08-23 12:42:42 -07:00
Bert Maher
76da46ccdc Revert D30417127: Remove flag to toggle CPU fusion in the presence of parallelism
Test Plan: revert-hammer

Differential Revision:
D30417127 (6600bc9651)

Original commit changeset: b77d7c68364f

fbshipit-source-id: 6b52fb83a84fe241945e3cb3eeb71050d1d9c8f1
2021-08-21 03:38:07 -07:00
BowenBao
8760254911 [ONNX] Fix an issue that optimizations might adjust graph inputs unexpectedly. (#61280) (#62763)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62763

This PR is to fix the issue that the graph inputs might be updated when we export the model in inference mode.

When a model is exported in inference mode, some optimizations are applied. One side effect of these optimizations is that the inputs of the graph might be adjusted. Such optimizations include:

	1. Conv and BatchNorm op fusion.
	2. Constant folding.

If the user sets export_params=False, or sets keep_initializers_as_inputs=True, it is highly likely that the user wants to provide the corresponding parameters or initializers as inputs of the graph.
In that situation, regardless of whether the model is exported in inference mode or training mode, the exporter needs to prevent the above optimizations from adjusting the graph inputs, so that the inputs of the graph match the inputs the user provided.

The changes in this PR add a common check that decides whether the above optimizations should be done. From the values of the export_params and keep_initializers_as_inputs arguments, it infers whether the graph inputs are allowed to be adjusted.
If not, these optimizations are skipped, even when the other requirements are met.

Besides these code changes, the comments for the parameters below have been updated to give users more guidance on how to use them for different purposes:

	1. export_params
	2. training
	3. do_constant_folding
	4. keep_initializers_as_inputs
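
The check described above (whether the graph inputs may be adjusted) can be written as a small predicate. This is a sketch derived only from the wording of this message; the function names `allow_adjust_graph_inputs` and `should_run_input_changing_passes` and the default argument values are assumptions, not the exporter's actual code.

```python
def allow_adjust_graph_inputs(export_params=True, keep_initializers_as_inputs=None):
    """Return True if optimizations may change the exported graph's inputs.

    Follows the rule described above: if parameters are not exported, or
    initializers are kept as inputs, the graph inputs must stay untouched.
    """
    return bool(export_params) and not keep_initializers_as_inputs


def should_run_input_changing_passes(other_requirements_met, export_params=True,
                                     keep_initializers_as_inputs=None):
    """Gate Conv/BatchNorm fusion and constant folding on the check above."""
    return bool(other_requirements_met) and allow_adjust_graph_inputs(
        export_params, keep_initializers_as_inputs)
```

Keeping the decision in one predicate means both optimizations are skipped together whenever the user expects to supply parameters or initializers as graph inputs.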

Test Plan: Imported from OSS

Reviewed By: SplitInfinity

Differential Revision: D30375183

Pulled By: msaroufim

fbshipit-source-id: 4db8b9695649eb32a3a0fefa950ee2e5651bdba0

Co-authored-by: fatcat-z <jiz@microsoft.com>
2021-08-20 12:46:52 -07:00
Alban Desmaison
125e2d02e5 Revert D30417370: [nnc] Enable CPU fusion
Test Plan: revert-hammer

Differential Revision:
D30417370 (b9fc656cf2)

Original commit changeset: 84ce7a578a36

fbshipit-source-id: cd23774cdc3273fd72f8a05f1900eaf36f373e6b
2021-08-20 12:30:21 -07:00
Bert Maher
b9fc656cf2 [nnc] Enable CPU fusion (#63545)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63545

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D30417370

Pulled By: bertmaher

fbshipit-source-id: 84ce7a578a3678d5562bab99d1dc00330c4f72d1
2021-08-20 11:18:21 -07:00
Bert Maher
6600bc9651 Remove flag to toggle CPU fusion in the presence of parallelism (#63514)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63514

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D30417127

Pulled By: bertmaher

fbshipit-source-id: b77d7c68364f2af73570740540f3b1152313016e
2021-08-20 11:18:19 -07:00
Alban Desmaison
ce61100923 Revert D29399533: Hoisting common expressions out of If blocks
Test Plan: revert-hammer

Differential Revision:
D29399533 (9477211e7d)

Original commit changeset: 9336b9dc48c0

fbshipit-source-id: f081c7280203f40328bcbb0c03a7c6a007acedb7
2021-08-19 06:20:40 -07:00
John Clow
9477211e7d Hoisting common expressions out of If blocks (#59492)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59492

Adding code to find common expressions from the two subblocks of an if
operation and hoist them before the if block.
This also allows Dead Code Elimination to
then eliminate some if blocks.
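
A minimal sketch of the hoisting idea on a toy IR, where each block is a list of `(output, op, inputs)` tuples: an expression is hoisted when both branches compute the same op on the same inputs and all of those inputs exist before the if. It assumes ops are side-effect free and value names are unique; `hoist_common` is a hypothetical helper, and the real pass handles more cases.

```python
def hoist_common(true_block, false_block, outer_values):
    """Hoist expressions computed identically in both branches of an if.

    Returns (before_if, new_true_block, new_false_block).
    """
    outer = set(outer_values)
    false_keys = {(op, tuple(inputs)) for _, op, inputs in false_block}

    # (op, inputs) -> name of the hoisted value (taken from the true branch)
    hoisted = {}
    for out, op, inputs in true_block:
        key = (op, tuple(inputs))
        if key in false_keys and key not in hoisted and all(i in outer for i in inputs):
            hoisted[key] = out

    def strip(block):
        """Drop hoisted nodes and redirect their uses to the hoisted value."""
        rename, kept = {}, []
        for out, op, inputs in block:
            key = (op, tuple(inputs))
            if key in hoisted:
                rename[out] = hoisted[key]
            else:
                kept.append((out, op, [rename.get(i, i) for i in inputs]))
        return kept

    before_if = [(out, op, list(inputs)) for (op, inputs), out in hoisted.items()]
    return before_if, strip(true_block), strip(false_block)
```

If both stripped blocks end up empty, a later dead code elimination step has a chance to drop the if entirely, which matches the effect mentioned above.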

Also eliminated some dead code in the codebase.

Test Plan:
python test_jit.py TestIfHoisting

Imported from OSS

Reviewed By: ngimel

Differential Revision: D29399533

fbshipit-source-id: 9336b9dc48c02c38862f98f98cd72fc1767a1802
2021-08-18 16:29:30 -07:00
Elias Ellison
ea808df25d Test shape analysis with opinfos (#59814)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59814

Using opinfos to test shape analysis. By default, we just check that we don't give incorrect answers; if `assert_jit_shape_analysis` is true, we also test that we correctly propagate the full shape. It found a couple of bugs 😃

Test Plan: Imported from OSS

Reviewed By: Krovatkin

Differential Revision: D30200058

Pulled By: eellison

fbshipit-source-id: 6226be87f5390277cfa5a1fffaa1b072d4bc8803
2021-08-10 09:47:33 -07:00
Richard Barnes
9e77113e85 irange-ify 11 (#62121)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62121

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D29879701

fbshipit-source-id: 5c51879c88fa6a5790db241c8b33ec0dc4b177ca
2021-07-28 13:32:09 -07:00
Meghan Lele
05b802d4e0 [pytorch] Bring back RemoveInplaceOps() (#62200)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62200

This commit brings back the `RemoveInplaceOps` pass removed in D29523283 (dec5aa2260) that apparently had a bunch of internal users.

Test Plan: danthe3rd

Reviewed By: danthe3rd

Differential Revision: D29833316

fbshipit-source-id: 6cf13d463ab0a5e50ba3eb3243f79a9c51623809
2021-07-28 12:00:38 -07:00
Gary Miguel
dec5aa2260 [JIT] clean up (#60390)
Summary:
* Minor: spelling, grammar.
* Add calls to `GRAPH_DUMP()` where they were missing.
* Add or expand a few comments.
* Move a few comments to seemingly more appropriate spots.
* In canonicalize_graph_fuser_ops.cpp inline `runnableInputs()` since it
  was only called in one place and had a misleading comment and
  confusing name.
* In `PeepholeOptimizeImpl::optimizeBlock()`, set `changed = true;` when
  removing `aten::is_complex`. Pretty sure its absence was a bug.
* Delete unused `_jit_pass_remove_inplace_ops` and its
  implementation `RemoveInplaceOps()`.
* In `preprocessCaffe2Ops()`, remove redundant check for nested optional
  types. It was already checked in `checkONNXCompatibility()`.
* In `EncoderBase::AddAttribute`, log the unexpected attribute kind.
  I don't remember the repro case now but I did hit this error at some
  point and this additional logging made it easier to understand.
* In `fuseConvBatchNorm()` in eval_peephole.cpp, consistently use
  camelCase instead of snake_case for local variables.
* Add curly braces around the bodies of if and loops.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60390

Reviewed By: Krovatkin

Differential Revision: D29523283

Pulled By: SplitInfinity

fbshipit-source-id: 4e16c5648616f53da07d68dab7fdf252e06a0752
2021-07-09 16:28:27 -07:00
Bert Maher
93772792e3 [nnc] Get rid of fuser trigger counters (#57334)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57334

Here's a possibly controversial PR.  These counters got in the way of
generalizing the fuser tests to handle arbitrary devices, and I guess I'm just
generally skeptical that they provide much value.  While true that they let us
observe whether fusion groups were created, we already have assertions based on
the shape of the graph, and I'm not sure that I trust those any less than these
counters.

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D29471484

Pulled By: bertmaher

fbshipit-source-id: f6d76f6e72dbfb581acff1d834b0c74500941b57
2021-06-29 22:22:15 -07:00
Lily Johnson
0dd90cceaf [package] track storages across lifetime of PackageExporter (#59735)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59735

1. Fixes the ABA storage identity problem during serialization for `torch.package` by keeping a reference to each serialized storage for the lifetime of the `PackageExporter`, which prevents reuse of its memory address. This extends the logic used to solve the same issue on mobile.
2. Adds determinism to the naming scheme of serialized storages in export code paths that use `tensor_cdata_naming_scheme` (introduces a second mapping in `StorageContext`, which now maps `storage cdata ptr` -> `unique id` and `unique id` -> `c10::Storage`).
3. Additionally uses the presence of a storage in the `StorageContext` instance as the marker for whether a storage has been serialized, removing the need to scan the `PythonStreamWriter` for the storage's serialization file.

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D29075276

Pulled By: Lilyjjo

fbshipit-source-id: 15a5c30b1de99c5bd7079388f2db9b6ece2eca12
2021-06-29 14:16:54 -07:00
Hariom Narang
9d1d799034 Added API to change logging levels for JIT (#58821)
Summary:
Description:
- Before this, logging level could only be changed by changing the env
variable "PYTORCH_JIT_LOG_LEVEL"
    - Can change the level from python now
- Have not added stream configuration for now
- Configuration is stored in a singleton class managing the options

Issue Link: https://github.com/pytorch/pytorch/issues/54188

Gotchas:
- Created separate functions
`::torch::jit::get_jit_logging_levels/set_jit_logging_levels` instead of
using the singleton class's method directly
    - This is because when running test cases, two different instances
    of the singleton are created for the test suite and the actual code
    (`jit_log.cpp`)
    - On using these methods directly, `is_enabled` calls the singleton
    in `jit_log.cpp` while we are setting the config using another
    singleton
    - See: https://stackoverflow.com/questions/55467246/my-singleton-can-be-called-multiple-times

API:
- To set the level: `torch._C._jit_set_logging_option("level")`
- To get the level: `torch._C._jit_get_logging_option()`
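
A minimal usage sketch of the two calls listed above, assuming a PyTorch build that exposes them. The wrapper name `set_jit_log_level` is hypothetical, and the option string is the one used in the testing notes of this commit.

```python
def set_jit_log_level(option):
    """Set the JIT logging option and return the value read back.

    Returns None when torch is not installed, so the sketch can be
    imported anywhere.
    """
    try:
        import torch
    except ImportError:
        return None
    torch._C._jit_set_logging_option(option)
    return torch._C._jit_get_logging_option()


# Same option string as in the testing notes of this commit.
current = set_jit_log_level(">dead_code_elimination")
```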

Testing:
- UTs were added for C++
- A very simple UT was added for python to just check if the API is
being called correctly
- The API was checked by running trace in a sample python file
    - Set env variable to "" and used `_jit_set_logging_option` in python to set the variable to `>dead_code_elimination`
    - The error output had logs of form [DUMP..] [UPDATE...] etc

Fixes https://github.com/pytorch/pytorch/issues/54188

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58821

Reviewed By: soulitzer

Differential Revision: D29116712

Pulled By: ZolotukhinM

fbshipit-source-id: 8f2861ee2bd567fb63b405953d035ca657a3200f
2021-06-21 16:10:49 -07:00
Bin Bao
add291cf66 [JIT] Add a phase to perform inplace<->functional conversion for activation operators (#57477)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57477

Currently the conversion only deals with activation operators. The legality check is somewhat strict for now.

Test Plan:
```
python test/test_jit.py -k test_functional_to_inplace_activation
python test/test_jit.py -k test_inplace_to_functional_activation
```

Reviewed By: mrshenli

Differential Revision: D28155153

Pulled By: desertfire

fbshipit-source-id: df092830c4dff3ce9578ff76285eb7a566b7d81b
2021-06-03 06:43:23 -07:00
eellison
d8cbba3ee2 [JIT] Disable Complete Shape Inlining For Testing Purposes (#56966)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56966

This PR adds a toggle to shape analysis that disables inlining complete tensor shapes as constants into the shape compute graph, which is a good stress test of the partial evaluation pipeline.

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D28444664

Pulled By: eellison

fbshipit-source-id: a62e424515a8837a4b596546efa93af5e8e61f10
2021-05-27 17:57:48 -07:00
eellison
f66fbb1e2e Add unary/binary ops necessary for mobilenet (#56828)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/56828

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D28444660

Pulled By: eellison

fbshipit-source-id: 656673e6139550f2752c0d3ac2fb8731f4bf9bbb
2021-05-27 17:56:30 -07:00
Kimish Patel
e067675167 [Pytorch] Provide API to preserve source range and callstack information during graph rewrite (#58300)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58300

Current state: Graph rewriting that fuses or adds nodes produces new
nodes without the debug information that was available in the original
nodes. Thus we lose this information during graph rewrite.

This PR changes the graph rewriting API to let the user specify how the
values in the replacement pattern map to values in the pattern to be matched.
Then the graph rewriting will copy source range and inlined callstack
from the matched nodes onto the nodes being inserted.
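
The idea can be sketched with a toy graph in plain Python. This is not the PyTorch rewriter API: `Node`, `apply_rewrite`, the op name `my::fused_mul_add`, and the string source ranges are all hypothetical, chosen only to show how a replacement-to-pattern value map lets debug information follow the inserted nodes.

```python
from dataclasses import dataclass


@dataclass
class Node:
    op: str
    output: str             # name of the value this node produces
    source_range: str = ""  # debug info, e.g. "model.py:12"


def apply_rewrite(matched, replacement, value_map):
    """Return copies of `replacement` nodes carrying debug info.

    `value_map` maps a value name in the replacement pattern to a value
    name in the matched pattern; the source range of the node that
    produced the matched value is copied onto the new node.
    """
    producers = {n.output: n for n in matched}
    rewritten = []
    for n in replacement:
        origin = producers.get(value_map.get(n.output))
        debug = origin.source_range if origin is not None else ""
        rewritten.append(Node(n.op, n.output, debug))
    return rewritten


matched = [
    Node("aten::mul", "c", "model.py:12"),
    Node("aten::add", "d", "model.py:13"),
]
replacement = [Node("my::fused_mul_add", "out")]

# "out" in the replacement corresponds to "d" in the matched pattern.
new_nodes = apply_rewrite(matched, replacement, {"out": "d"})
```

With an empty map the new node would carry no source range, which is the information loss described above.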

(Note: this ignores all push blocking failures!)

Test Plan:
python test/test_jit.py
TestJit.test_pattern_based_rewrite_with_source_range_preserved

Imported from OSS

Reviewed By: malfet

Differential Revision: D28512465

fbshipit-source-id: 863173c29de726be85b3acbd3ddf3257eea36d13
2021-05-25 09:18:59 -07:00
Elias Ellison
5313bafd31 [JIT] integer value refinement (#56438)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/56438

Test Plan: Imported from OSS

Reviewed By: nikithamalgifb

Differential Revision: D27924239

Pulled By: eellison

fbshipit-source-id: ace54fcb594853f30c242369ea203b0eb5527ac1
2021-05-21 08:51:01 -07:00
Elias Ellison
5cebf29b4e Add list len refinement (#55926)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55926

This is necessary for code like conv2d, where we want to share generic convolution shape-function logic with conv2d but always infer that the conv2d output has 4 dimensions. I'm also hoping the refinement algorithm here could be refactored out and used to support refining tensor types from user annotations. I have a lengthy comment explaining how this works, and the logic outside of data structures is pretty small and contained. Additionally, you might check out https://fb.quip.com/X7EVAdQ99Zzm for a very similar description of how to refine values based on comparison operators.

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D27750997

Pulled By: eellison

fbshipit-source-id: d962415af519ac37ebc9de88f2e1ea60a1374f7c
2021-05-21 08:50:54 -07:00
Elias Ellison
9fd2306036 Add handling of symbolic shapes (#55925)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55925

This sets up the initial handling of symbolic shapes. As in the test, it doesn't work perfectly yet because it needs a couple other optimization passes. The basic description is pretty simple: we resolve tensor dimension indices to the same Value *, and before extracting out the output Tensor shape we substitute in symbolic shapes. We don't substitute during optimization because they are represented as negative numbers so we don't want them inadvertently used in Constant prop or something else.
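
Why the substitution happens last can be shown with a toy example in plain Python. This is only an analogy for the description above, not the pass itself: `SymDim`, `permute`, and `substitute` are hypothetical names, and `-1` is an arbitrary negative encoding picked for illustration.

```python
import math


class SymDim:
    """Opaque placeholder for an unknown dimension (hypothetical)."""

    def __init__(self, name):
        self.name = name


def permute(shape, order):
    # Toy shape function: reorders dimensions without inspecting them.
    return [shape[i] for i in order]


def substitute(shape, symbols):
    # Final step: swap placeholders for their negative-integer encoding.
    return [symbols[d] if isinstance(d, SymDim) else d for d in shape]


n = SymDim("N")
in_shape = [n, 3, n]  # the same placeholder twice: both dims are equal
out = permute(in_shape, [1, 0, 2])
encoded = substitute(out, {n: -1})

# Hazard if the encoding were substituted before optimization: ordinary
# integer arithmetic treats -1 as a real size and folds it to a number.
folded_too_early = math.prod([-1, 3, -1])
```

Here `folded_too_early` evaluates to 3, which looks like a valid element count but is meaningless. Keeping an opaque placeholder until the end avoids that kind of accidental folding.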

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D27750996

Pulled By: eellison

fbshipit-source-id: 6984e7276b578f96b00fc2025cef0e13f594b6e6
2021-05-21 08:50:52 -07:00
Elias Ellison
f39471a171 Initial Symbolic Shape Analysis (#54809)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54809

I'm going to post on dev-discuss soon with a more thorough explanation of the design and advantages of this shape analysis, so I'm leaving out that for now.

There is still a ton left to do; I'm posting this initial version so we can get something on master that multiple people can work on. List of many remaining steps to do:

- [ ] Add symbolic shapes support
- [ ] Bind shape functions for operators in C++
- [ ] Make classes of operators share the same shape function (e.g. pointwise, broadcast two inputs)
- [ ] Refactor APIs
- [ ] Only iteratively optimize shape function while a change has been made
- [ ] Expand coverage to common ops
- [ ] Add shape analysis pass on Graph that handles Ifs and Loops
- [ ] Allow concurrent reads to the operator map
- [ ] Successive applications of same inputs to same shape function (e.g. series of pointwise ops)

For this review, I am mostly looking for comments related to the implementation of symbolic_shape_analysis.cpp, with the caveats listed above. I am not really looking for comments related to api/registration/graph level analysis as those are all planned to be changed. I am fine landing this as is or waiting until necessary components of the TODOs above are finished.

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D27750998

Pulled By: eellison

fbshipit-source-id: 4338b99e8651df076291c6b781c0e36a1bcbec03
2021-05-21 08:49:46 -07:00
Raghavan Raman
3fe72d30dc [NNC] Optimize conditionals that correspond to the form generated for aten::cat op. (#57673)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57673

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D28231374

Pulled By: navahgar

fbshipit-source-id: 1777a63df4e5ebed6d515683bd772a88be465b3a
2021-05-18 14:23:48 -07:00
Luca Wehrstedt
5a238eb96e Fix deadlock in Future due to lock inversion with GIL (#58382)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58382

Calling markCompleted on a Future now first acquires the Future's mutex (as usual) but then sometimes tries to acquire the GIL during the DataPtr extraction while still holding the Future's mutex. (This happens when the value passed to markCompleted is a Python object). This can cause a deadlock if someone else calls any of the other methods of Future while holding the GIL.

There are two solutions to this: avoid holding the Future's mutex when extracting DataPtrs, and avoid holding the GIL while invoking the Future's method. In this PR I'm going for the latter, because it's a very simple immediate fix, but I believe this is brittle and that we should probably also consider the former fix.
ghstack-source-id: 129105358
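
The lock inversion and the fix chosen here can be sketched with two ordinary locks in Python. This is an analogy only: `gil` and `future_mutex` are plain `threading.Lock` objects standing in for the real GIL and the Future's mutex, and `released`, `mark_completed`, and `call_future_method` are hypothetical names.

```python
import threading
from contextlib import contextmanager

gil = threading.Lock()           # stand-in for the Python GIL
future_mutex = threading.Lock()  # stand-in for the Future's mutex


@contextmanager
def released(lock):
    """Temporarily give up a lock the caller already holds."""
    lock.release()
    try:
        yield
    finally:
        lock.acquire()


def mark_completed(results):
    # Order described above: the Future's mutex first, then the GIL.
    with future_mutex:
        with gil:
            results.append("completed")


def call_future_method(results):
    # The caller arrives holding the "GIL". Taking future_mutex while
    # still holding it would invert the order used in mark_completed.
    with gil:
        with released(gil):  # the fix: drop the GIL around the call
            with future_mutex:
                results.append("waited")


results = []
threads = [
    threading.Thread(target=mark_completed, args=(results,)),
    threading.Thread(target=call_future_method, args=(results,)),
]
for t in threads:
    t.start()
for t in threads:
    t.join(timeout=5)
```

Because `call_future_method` never holds both locks at once, no cycle can form. The other fix mentioned above, not holding the Future's mutex while extracting DataPtrs, would remove the nested acquisition in `mark_completed` instead.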

Test Plan: The repro in https://github.com/pytorch/pytorch/issues/58239 now doesn't deadlock.

Reviewed By: mrshenli

Differential Revision: D28472816

fbshipit-source-id: 1bc9bca426dd004f9eb2568db1ffd38f014450e2
2021-05-17 10:53:19 -07:00