Commit Graph

626 Commits

Don Jang
cd458fe092 [JIT] Make output of prim::TupleConstruct alias only with its inputs (#64879)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64879

This change makes the output of `prim::TupleConstruct` alias only with its inputs *when* the created tuple is directly returned from the graph.

The same treatment could be applied to any tuple newly constructed by `prim::TupleConstruct` whose elements do not escape. However, this change focuses on the simplest and most frequently used case: tuples constructed solely to be returned from a graph.
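
For illustration, a minimal scripted function that hits this case (a hypothetical example, not taken from the PR's tests):

```python
import torch

@torch.jit.script
def f(a: torch.Tensor, b: torch.Tensor):
    # The tuple built by prim::TupleConstruct here is returned directly
    # from the graph, so this change lets its output alias only the
    # tuple's inputs instead of the wildcard set.
    return (a + 1, b * 2)

print(f.graph)  # the return value comes straight from prim::TupleConstruct
```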

Test Plan:
Added
- `AliasMoveForTupleConstructWithSingleUseAsGraphOutput`
- `WildcardAliasForTupleConstructWithUses`

to cover the newly added code.

Reviewed By: eellison

Differential Revision: D30437737

fbshipit-source-id: 417fbc6bc348062e60e7acdddd340d4754d090eb
2021-09-29 21:56:31 -07:00
Scott Wolchok
ece25c453f [PyTorch] Store Argument::alias_info_ on the heap (#64824)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64824

See comment in function_schema.h for explanation. I claim that this is a good tradeoff because the aliasing information seems to be used only in compiler-ish code paths, where performance isn't as critical as actual execution. If performance is important there too, perhaps we should hoist isWrite into the Argument itself since there are several paths that only care about isWrite.
ghstack-source-id: 138958896

Test Plan: CI; profile schema parsing on startup and see far fewer page faults in createArgumentVector.

Reviewed By: suo

Differential Revision: D30860719

fbshipit-source-id: 1d4d2328f2b8e34f5ddf9d82083fd4dd7b7f738f
2021-09-24 17:00:51 -07:00
Peter Bell
68e5935498 Remove fgrad_input from slow_conv2d (#64280)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64280

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D30830887

Pulled By: jbschlosser

fbshipit-source-id: 5a3a79ad9d9118177672eabf872f9d9a3313ebe4
2021-09-24 14:27:39 -07:00
XiaobingSuper
1682722152 keep output type after calling SubgraphRewriter (#65453)
Summary:
The JIT **SubgraphRewriter** does not preserve the output type after overwriting the old graph. For example, in profiling mode the old graph carries the old operator's shapes, but after replacing the old operator with a new one via **SubgraphRewriter**, the tensor shape info is eliminated.

The motivation is that I want to replace the PyTorch convolution with a custom convolution. I first register **aten::_convolution** as a profiled node that records the input and output shapes, and then use the graph rewriter to replace it with **aten::conv2d**, whose tensor shape info is eliminated. I would like to use the input sizes to do some pre-processing before replacing **aten::conv2d** with the custom convolution.
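
A minimal Python sketch of the symptom, assuming the `torch._C._jit_pass_custom_pattern_based_rewrite_graph` binding (which wraps the C++ SubgraphRewriter) takes a pattern string, a replacement string, and a graph:

```python
import torch

@torch.jit.script
def f(a: torch.Tensor, b: torch.Tensor):
    return torch.mul(a, b)

pattern = """
graph(%a, %b):
  %c = aten::mul(%a, %b)
  return (%c)"""

replacement = """
graph(%a, %b):
  %c = aten::div(%a, %b)
  return (%c)"""

# After the rewrite, the replacement node's output comes back as a bare
# Tensor: any profiled shape info on the matched node is dropped, which
# is the behavior this PR fixes.
torch._C._jit_pass_custom_pattern_based_rewrite_graph(pattern, replacement, f.graph)
print(f.graph)
```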

Before rewrite:
```
graph(%self.1 : __torch__.MyModule,
      %x.1 : Float(2, 3, 20, 20, strides=[1200, 400, 20, 1], requires_grad=0, device=cpu)):
  %7 : int = prim::Constant[value=1](), scope: __module.conv # /home/xiaobinz/miniconda3/envs/pytorch-master/lib/python3.6/site-packages/torch/nn/modules/conv.py:443:0
  %6 : bool = prim::Constant[value=0](), scope: __module.conv # /home/xiaobinz/miniconda3/envs/pytorch-master/lib/python3.6/site-packages/torch/nn/modules/conv.py:443:0
  %5 : bool = prim::Constant[value=1](), scope: __module.conv # /home/xiaobinz/miniconda3/envs/pytorch-master/lib/python3.6/site-packages/torch/nn/modules/conv.py:443:0
  %4 : NoneType = prim::Constant()
  %3 : int[] = prim::Constant[value=[1, 1]]()
  %2 : int[] = prim::Constant[value=[0, 0]]()
  %conv : __torch__.torch.nn.modules.conv.Conv2d = prim::GetAttr[name="conv"](%self.1)
  %z : Float(2, 3, 20, 20, strides=[1200, 400, 20, 1], requires_grad=0, device=cpu) = aten::clone(%x.1, %4) # jit_test.py:22:0
  %weight : Float(3, 3, 1, 1, strides=[3, 1, 1, 1], requires_grad=0, device=cpu) = prim::GetAttr[name="weight"](%conv)
  %x : Float(2, 3, 20, 20, strides=[1200, 400, 20, 1], requires_grad=0, device=cpu) = aten::_convolution(%x.1, %weight, %4, %3, %2, %3, %6, %2, %7, %6, %6, %5, %5), scope: __module.conv # /home/xiaobinz/miniconda3/envs/pytorch-master/lib/python3.6/site-packages/torch/nn/modules/conv.py:443:0
  %16 : Float(2, 3, 20, 20, strides=[1200, 400, 20, 1], requires_grad=0, device=cpu) = aten::add(%x, %z, %7) # jit_test.py:24:0
  return (%16)
```
After rewriting with **aten::conv2d**:
```
graph(%self.1 : __torch__.MyModule,
      %x.1 : Float(2, 3, 20, 20, strides=[1200, 400, 20, 1], requires_grad=0, device=cpu)):
  %7 : int = prim::Constant[value=1](), scope: __module.conv # /home/xiaobinz/miniconda3/envs/pytorch-master/lib/python3.6/site-packages/torch/nn/modules/conv.py:443:0
  %6 : bool = prim::Constant[value=0](), scope: __module.conv # /home/xiaobinz/miniconda3/envs/pytorch-master/lib/python3.6/site-packages/torch/nn/modules/conv.py:443:0
  %5 : bool = prim::Constant[value=1](), scope: __module.conv # /home/xiaobinz/miniconda3/envs/pytorch-master/lib/python3.6/site-packages/torch/nn/modules/conv.py:443:0
  %4 : NoneType = prim::Constant()
  %3 : int[] = prim::Constant[value=[1, 1]]()
  %2 : int[] = prim::Constant[value=[0, 0]]()
  %conv : __torch__.torch.nn.modules.conv.Conv2d = prim::GetAttr[name="conv"](%self.1)
  %z : Float(2, 3, 20, 20, strides=[1200, 400, 20, 1], requires_grad=0, device=cpu) = aten::clone(%x.1, %4) # jit_test.py:22:0
  %weight : Float(3, 3, 1, 1, strides=[3, 1, 1, 1], requires_grad=0, device=cpu) = prim::GetAttr[name="weight"](%conv)
  %18 : Tensor = aten::conv2d(%x.1, %weight, %4, %3, %2, %3, %7)
  %16 : Float(2, 3, 20, 20, strides=[1200, 400, 20, 1], requires_grad=0, device=cpu) = aten::add(%18, %z, %7) # jit_test.py:24:0
  return (%16)
```

Expected result after replacing **aten::_convolution** with **aten::conv2d**:

```
graph(%self.1 : __torch__.MyModule,
      %x.1 : Float(2, 3, 20, 20, strides=[1200, 400, 20, 1], requires_grad=0, device=cpu)):
  %7 : int = prim::Constant[value=1](), scope: __module.conv # /home/xiaobinz/miniconda3/envs/pytorch-master/lib/python3.6/site-packages/torch/nn/modules/conv.py:443:0
  %6 : bool = prim::Constant[value=0](), scope: __module.conv # /home/xiaobinz/miniconda3/envs/pytorch-master/lib/python3.6/site-packages/torch/nn/modules/conv.py:443:0
  %5 : bool = prim::Constant[value=1](), scope: __module.conv # /home/xiaobinz/miniconda3/envs/pytorch-master/lib/python3.6/site-packages/torch/nn/modules/conv.py:443:0
  %4 : NoneType = prim::Constant()
  %3 : int[] = prim::Constant[value=[1, 1]]()
  %2 : int[] = prim::Constant[value=[0, 0]]()
  %conv : __torch__.torch.nn.modules.conv.Conv2d = prim::GetAttr[name="conv"](%self.1)
  %z : Float(2, 3, 20, 20, strides=[1200, 400, 20, 1], requires_grad=0, device=cpu) = aten::clone(%x.1, %4) # jit_test.py:22:0
  %weight : Float(3, 3, 1, 1, strides=[3, 1, 1, 1], requires_grad=0, device=cpu) = prim::GetAttr[name="weight"](%conv)
  %18 : Float(2, 3, 20, 20, strides=[1200, 400, 20, 1], requires_grad=0, device=cpu) = aten::conv2d(%x.1, %weight, %4, %3, %2, %3, %7)
  %16 : Float(2, 3, 20, 20, strides=[1200, 400, 20, 1], requires_grad=0, device=cpu) = aten::add(%18, %z, %7) # jit_test.py:24:0
  return (%16)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65453

Reviewed By: zdevito

Differential Revision: D31162489

Pulled By: ZolotukhinM

fbshipit-source-id: 0d1c1d607cb612df47c64f173d9f4c9e8b1d6c49
2021-09-24 11:07:40 -07:00
jiej
127c9402d0 Revert "Revert D30752939: [pytorch][PR] nvfuser update" (#65137)
Summary:
This reverts commit 03389dc851.

Attempt again for PR: https://github.com/pytorch/pytorch/issues/63745
Fixes the windows build failure.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65137

Reviewed By: seemethere, dzhulgakov, heitorschueroff

Differential Revision: D30994556

Pulled By: malfet

fbshipit-source-id: f1925b6c5cc1a1a441a96499667c91e8dfc1b53d
2021-09-22 04:54:51 -07:00
Chen Lai
880098a7e3 [PyTorch Edge] Backport function for defaults args with out args, flag on (#63651)
Summary:
1. Enable support for operators with default args and out args. For `torch.add(x, h, out=x)`, the number of specified arguments will be 3 instead of 4 (see the sketch after this list).
2. Bump the bytecode version from 6 to 7.
3. Implement the backport_v7_to_v6 function. Also slightly refactor the local_thread to allow re-emitting operators.
4. Add a unit test to cover the backport function.
5. Update the expected result from 4 to 3 in the unit test DefaultArgsWithOutArg to cover the number of specified arguments.
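
A minimal runnable illustration of item 1 (not taken from the PR itself):

```python
import torch

# With out= aliasing an input and the default alpha omitted, only 3 of
# torch.add's 4 schema arguments are specified, which is what bytecode
# v7 now records.
x = torch.ones(3)
h = torch.ones(3) * 2
torch.add(x, h, out=x)  # alpha stays at its default of 1
print(x)  # tensor([3., 3., 3.])
```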

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63651

ghstack-source-id: 138539912

Test Plan:
```
caffe2/test/cpp/jit:jit - LiteInterpreterTest.DefaultArgsWithOutArg
caffe2/test/cpp/jit:jit - LiteInterpreterTest.DefaultArgsPinvWithOutArg
caffe2/test/cpp/jit:jit - LiteInterpreterTest.BackPortByteCodeModelAllVersions
```

Reviewed By: raziel, tugsbayasgalan

Differential Revision: D30454080

fbshipit-source-id: 357c50b96682430675142d20d688d1f64e1de307
2021-09-20 22:50:30 -07:00
Mengwei Liu
eaf85fad62 [PyTorch] Extract parseOperator() into a standalone source file (#65179)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65179

This is a follow-up to this PR: https://github.com/pytorch/pytorch/pull/61862. The purpose is to modularize operator parsing so that it can be used as needed without pulling the whole `import.cpp` into the build.

Test Plan: Added a unit test in `test_lite_predictor.cpp` called `ParseOperators`, similar to `ParseBytecode`.

Reviewed By: iseeyuan

Differential Revision: D31006555

fbshipit-source-id: c38e221800af4cf72963a353c452c5437f56a0ac
2021-09-17 13:31:59 -07:00
Jane Xu
1ee66a5278 Remove CUDA 9.2 references conditionals and workarounds (#65070)
Summary:
Title says it all

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65070

Reviewed By: malfet

Differential Revision: D30966464

Pulled By: janeyx99

fbshipit-source-id: e454906fd5d7d321d390939ba5d237e1d9b150f8
2021-09-17 12:28:23 -07:00
Eli Uriegas
03389dc851 Revert D30752939: [pytorch][PR] nvfuser update
Test Plan: revert-hammer

Differential Revision:
D30752939 (cfaecaf40b)

Original commit changeset: ce122e80f01b

fbshipit-source-id: 57685df8f9946032a06eff1de8a3d1498500d2d2
2021-09-15 17:38:47 -07:00
jiej
cfaecaf40b nvfuser update (#63745)
Summary:
Syncing the nvfuser code base from the devel branch. Listing a few of our developments since the last sync:

- Extends support to normalization and reduction kernels.
- Multiple kernel launches for a single `CudaFusionGroup`. The hierarchical caching system has been updated to cache graph segmentation.
- profile_ivalue is enabled to convert dynamic scalars into compile-time constants, which are required by the codegen (e.g. reduction axes).

To keep this PR simple and relatively easy to review, we stripped most external changes and submitted them as separate PRs, so this gigantic PR is easier to handle.

Internal updates are in files located in:
1. updates in nvfuser codegen `torch/csrc/jit/codegen/cuda`
2. added nvfuser-specific benchmarks `benchmarks/cpp/nvfuser`
3. nvfuser JIT C++ tests `test/cpp/jit/test_gpu.cpp` `test/cpp/jit/test_gpu_shift.cpp` `test/cpp/jit/test_gpu_validator.h`

Updates affecting integration:

1. profile_ivalue enabled for nvfuser; related changes are in `torch/csrc/jit/runtime/*`
2. exposed a few more symbols in `aten/src/ATen/core/*` used by codegen
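
As a smoke-test sketch (not part of the PR), assuming the `torch._C._jit_set_nvfuser_enabled` binding and a CUDA device are available:

```python
import torch

torch._C._jit_set_nvfuser_enabled(True)  # assumed binding; requires CUDA

@torch.jit.script
def f(x):
    return torch.relu(x) * 2.0

x = torch.randn(1024, device="cuda")
for _ in range(3):  # warm up the profiling executor so fusion can kick in
    f(x)
print(torch.jit.last_executed_optimized_graph())  # look for CudaFusionGroup
```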

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63745

Reviewed By: saketh-are

Differential Revision: D30752939

Pulled By: malfet

fbshipit-source-id: ce122e80f01bcd3865f5bd3c4dfde660665fd84c
2021-09-15 14:42:55 -07:00
Martin Yuan
30a7c768d7 [RFC] Modularize functions of parsing bytecode (#61862)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61862

Modularize the functions for parsing bytecode tables so that they can be used as needed in situations other than the mobile lite interpreter.
* The decoupled functions are re-used by the current lite interpreter loader.
* The bytecode can be serialized/deserialized from other formats.
* The decoupled functions have minimal dependencies on other PyTorch components.

Next:
Build a driver binary that includes the parser and interpreter, but with only the necessary dependencies on other PyTorch components.
ghstack-source-id: 137867287

Test Plan:
As an example, a simple bytecode is parsed to a mobile function, and directly run in the added unit test, `RunTimeTest:ParseBytecode`. It contains basic control flow (if, else) and basic data orchestration (list construction).
CI

Reviewed By: larryliu0820

Differential Revision: D29798382

Pulled By: iseeyuan

fbshipit-source-id: 1c173a5f5d37097e3a97baec3f3e48e1eea1400f
2021-09-11 22:24:05 -07:00
Elias Ellison
3bf93d769c [JIT] Add gradient check in constants (#64613)
Summary:
Fixes an internal issue.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64613

Reviewed By: Gamrix

Differential Revision: D30799016

Pulled By: eellison

fbshipit-source-id: 48ef52d1cac627919e6cd232216d24878a2a8b58
2021-09-09 08:13:57 -07:00
Ansley Ussery
6831d8e379 Support Union in TorchScript (#64234)
Summary:
This PR was created to replace the https://github.com/pytorch/pytorch/pull/53180 PR stack, which has all the review discussions. The reason for needing a replacement is a messy Sandcastle issue.
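
A minimal usage sketch of the feature (hypothetical example, not from the PR's test suite):

```python
import torch
from typing import Union

@torch.jit.script
def describe(x: Union[int, torch.Tensor]) -> int:
    if isinstance(x, int):
        return x            # x is refined to int in this branch
    return int(x.sum())     # and to Tensor in this one

print(describe(3), describe(torch.ones(2, 2)))
```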

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64234

Reviewed By: gmagogsfm

Differential Revision: D30656444

Pulled By: ansley

fbshipit-source-id: 77536c8bcc88162e2c72636026ca3c16891d669a
2021-09-03 06:12:24 -07:00
Chen Lai
8d5b95019d [PyTorch Edge] Support default args with out arg, flag off (#63540)
Summary:
1. Allow consuming operators with default arguments and out arguments. The flag is off to keep the same behavior as v6; in PR 63651, the flag is turned on.
2. Add two unit tests to cover this type of operator.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63540

ghstack-source-id: 137211562

Test Plan:
```
caffe2/test/cpp/jit:jit - LiteInterpreterTest.DefaultArgsWithOutArg
caffe2/test/cpp/jit:jit - LiteInterpreterTest.DefaultArgsPinvWithOutArg
```

Reviewed By: raziel, iseeyuan, tugsbayasgalan

Differential Revision: D30414156

fbshipit-source-id: 0f3a219a22aee10ac53184cbd95940726c459d1f
2021-09-02 01:36:16 -07:00
Kimish Patel
468001600c Back out "Revert D30327514: [Pytorch lite predictor] Use KinetoEdgeCPUProfiler for operator profiling." (#64307)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64307

Original commit changeset: 0b2aa7c57d08

Restores the original changes.

This diff changes the way operator profiling is done in the lite predictor benchmarking binary. Instead of using custom callbacks, it uses KinetoEdgeCPUProfiler to profile events and then generates operator-level metrics from them. Since KinetoEvents do not contain CPU clock time, we now report only wall-clock time.

This unifies the various profiling efforts we have for benchmarking purposes. In production we will still use the observer-based mechanism, but the advantage of using the Kineto profiler is that we get a few other things for free, such as:
- chrome trace generation
- operator-level memory profiling (to be added)
- flop counts (to be added)

Furthermore, we can possibly use a Python post-processing script to parse the chrome trace and generate output similar to torch.profiler. (To be done)

This also removes some tests from test_lite_interpreter.cpp which were testing module hierarchy in debug info; they should be covered by test_mobile_profiler.cpp.

Test Plan:
aibench run
Model without debug info:
https://www.internalfb.com/intern/aibench/details/219598441154763
Model with debug info and --print_module_info true (see Operator summary has now module hierarchy information).
https://www.internalfb.com/intern/aibench/details/617154236292985

Reviewed By: raziel

Differential Revision: D30680354

fbshipit-source-id: b6ba0d59c510c13d13d9935b1d8051cc82ffa4e9
2021-09-01 13:29:35 -07:00
Kimish Patel
67cb131458 Revert D30327514: [Pytorch lite predictor] Use KinetoEdgeCPUProfiler for operator profiling.
Test Plan: revert-hammer

Differential Revision:
D30327514 (bc9277dca3)

Original commit changeset: 3bb2f2daaaed

fbshipit-source-id: 0b2aa7c57d08de77c9aaa75e546a7d0938610f64
2021-08-31 08:30:36 -07:00
Kimish Patel
bc9277dca3 [Pytorch lite predictor] Use KinetoEdgeCPUProfiler for operator profiling. (#63367)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63367

This diff changes the way operator profiling is done in the lite predictor benchmarking binary. Instead of using custom callbacks, it uses KinetoEdgeCPUProfiler to profile events and then generates operator-level metrics from them. Since KinetoEvents do not contain CPU clock time, we now report only wall-clock time.

This unifies the various profiling efforts we have for benchmarking purposes. In production we will still use the observer-based mechanism, but the advantage of using the Kineto profiler is that we get a few other things for free, such as:
- chrome trace generation
- operator-level memory profiling (to be added)
- flop counts (to be added)

Furthermore, we can possibly use a Python post-processing script to parse the chrome trace and generate output similar to torch.profiler. (To be done)

Test Plan:
aibench run
Model without debug info:
https://www.internalfb.com/intern/aibench/details/219598441154763
Model with debug info and `--print_module_info true` (see Operator summary has now module hierarchy information).
https://www.internalfb.com/intern/aibench/details/617154236292985

Reviewed By: raziel

Differential Revision: D30327514

fbshipit-source-id: 3bb2f2daaaedfb04bd6f5d9c91292783f9c4344f
2021-08-30 20:54:51 -07:00
Zhengxu Chen
ac99d63f83 [jit] Make operation call accept Stack& instead Stack* (#63414)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63414

Fixes a misuse of a raw pointer here, where the stack is never nullable.
ghstack-source-id: 136938318

Test Plan:
compiles.

Imported from OSS

Reviewed By: ejguan

Differential Revision: D30375410

fbshipit-source-id: 9d65b620bb76d90d886c800f54308520095d58ee
2021-08-30 11:49:20 -07:00
Tugsbayasgalan (Tugsuu) Manlaibaatar
d0c63e857d Enhancement for smart serialization for out schemas (#63096)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63096

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D30415255

Pulled By: tugsbayasgalan

fbshipit-source-id: eb40440a3b46258394d035479f5fc4a4baa12bcc
2021-08-28 11:46:27 -07:00
Tugsbayasgalan (Tugsuu) Manlaibaatar
19c1b45f25 Detect out argument in the schema (#62755)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62755

After this change, an out argument can be checked by calling is_out().
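
A rough Python-side analog of the new check (the PR adds is_out() on the C++ FunctionSchema argument; whether `kwarg_only` and `alias_info` are exposed on `torch._C.Argument` is an assumption here):

```python
import torch

schema = torch._C.parse_schema(
    "aten::add.out(Tensor self, Tensor other, *, Scalar alpha=1, "
    "Tensor(a!) out) -> Tensor(a!)"
)
for arg in schema.arguments:
    # an out argument is kwarg-only and writes to its alias set
    is_out = bool(arg.kwarg_only and arg.alias_info and arg.alias_info.is_write)
    print(arg.name, is_out)
```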

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D30415256

Pulled By: tugsbayasgalan

fbshipit-source-id: b2e1fa46bab7c813aaede1f44149081ef2df566d
2021-08-27 11:20:33 -07:00
Mike Iovine
1385f9fb12 [JIT] Add variadic stack op (#63578)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63578

Added a new op `prim::VarStack` and a pass that transforms instances of `aten::stack(list, dim)` into `prim::VarStack(list[0], ..., list[n], dim)`. Also provided a JIT interpreter implementation.

Most of the implementation/tests are the same as `prim::VarConcat`.
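
What the pass targets, sketched from the Python side (hypothetical example):

```python
import torch

@torch.jit.script
def f(a: torch.Tensor, b: torch.Tensor, c: torch.Tensor):
    # Before the pass: aten::stack over a prim::ListConstruct.
    # After the pass: a single prim::VarStack(%a, %b, %c, %dim) node,
    # with the intermediate list eliminated.
    return torch.stack([a, b, c], dim=0)

print(f.graph)
```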

Test Plan: `buck test caffe2/test/cpp/jit:jit -- TestStackOpt`

Reviewed By: navahgar

Differential Revision: D30426232

fbshipit-source-id: 9829a7db6e0a5038c9b7528c43c25b0c221aa2ce
2021-08-24 08:20:54 -07:00
Mike Iovine
fc6dd0bc00 [JIT] Move UseVariadicCat internals (#63577)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63577

Since other variadic ops will have an almost identical implementation, we can generalize the `UseVariadicCat` implementation and put it in a common folder.

Also moved some test utilities that other variadic op tests will likely need.

Test Plan: `buck test caffe2/test/cpp/jit:jit -- ConcatOptTest`

Reviewed By: navahgar

Differential Revision: D30409937

fbshipit-source-id: 925c11c27b58ce98cb8368d2a205e26ba66d3db9
2021-08-23 17:30:36 -07:00
Don Jang
e7724bb100 [JIT] Set future's error to current exception as is when --torch_jit_enable_rethrow_caught_exception=true (#63348)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63348

This change addresses singlaiiit's comment on D30241792 (61b49c8e41), and makes the JIT interpreter's behavior consistent between the cases where `future` is set and where it is not.

Test Plan: Enhanced `EnableRethrowCaughtExceptionTest.EnableRethrowCaughtExceptionTestRethrowsCaughtException` to cover the modified code path.

Reviewed By: singlaiiit

Differential Revision: D30347782

fbshipit-source-id: 79ce57283154ca4372e5341217d942398db21ac8
2021-08-16 17:32:13 -07:00
Kimish Patel
38c185189c [Pytorch Edge] Enable kineto profiler on mobile via EdgeKinetoProfiler (#62419)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62419

This diff adds support for a CPU-only Kineto profiler on mobile, thus enabling chrome trace generation on mobile. This brings the C++ API for mobile profiling on par with TorchScript.
This is done via:
1. Utilizing debug handle annotations in KinetoEvent.
2. Adding post-processing capability, via callbacks, to KinetoThreadLocalState.
3. Creating a new RAII-style profiler, KinetoEdgeCPUProfiler, which can be used in the surrounding scope of model execution. This will write a chrome trace to the location specified in the profiler constructor (see the sketch after this list).
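
For reference, the analogous server-side flow in Python (not the mobile C++ API added here) produces the same kind of trace that KinetoEdgeCPUProfiler now writes on-device:

```python
import torch

model = torch.nn.Linear(8, 8)
with torch.profiler.profile() as prof:
    model(torch.randn(2, 8))
prof.export_chrome_trace("trace.json")  # open in chrome://tracing
```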

Test Plan:
MobileProfiler.ModuleHierarchy

Imported from OSS

Reviewed By: raziel

Differential Revision: D29993660

fbshipit-source-id: 0b44f52f9e9c5f5aff81ebbd9273c254c3c03299
2021-08-13 21:40:19 -07:00
Kimish Patel
1b04d99f55 [Pytorch Profiler] Introduce scopes to enableProfiler (#62417)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62417

This diff adds an option to make enableProfiler enable callbacks only for certain RecordScopes.
Why?
Profiling has some overhead when we repeatedly execute callbacks for all scopes. On the mobile side, where we often have small quantized models, this overhead can be large. We observed that by profiling only the top-level op and skipping profiling of the other aten ops called within it, we can limit this overhead. For example, instead of profiling at::conv2d -> at::convolution -> at::convolution_ (and ops like transpose etc. if they are called), we skip profiling of everything below the top-level op. Of course this limits visibility, but at least this way we get a choice.

Test Plan: Imported from OSS

Reviewed By: ilia-cher

Differential Revision: D29993659

fbshipit-source-id: 852d3ae7822f0d94dc6e507bd4019b60d488ef69
2021-08-13 21:40:15 -07:00
Kimish Patel
b00afe135d [Pytorch Profiler] Add debug_handles to KinetoEvent (#62228)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62228

This diff adds debug handles to events and provides a way to use RECORD_FUNCTIONs that pass debug_handles down to the profiler, which records them in the events.

Why add debug_handles?
For PyTorch mobile with the lite interpreter, we generate debug handles that can be used to lazily symbolicate exception traces into model-level stack traces, similar to the model-level stack traces you get in TorchScript models. The debug_handles also enable getting the module hierarchy for a lite interpreter model, support for which was added to KinetoProfiler in previous diffs.

Follow-up plan:
1. Enable scope callbacks such that the lite interpreter can use them to profile only top-level ops.
2. Enable post-processing callbacks that take KinetoEvents and populate the module hierarchy using debug handles.

This will let us use KinetoProfiler for lite interpreter use cases on mobile. The aim is to use an RAII guard to similarly generate chrome traces for mobile use cases as well, although only for top-level ops.

Test Plan:
test_misc : RecordDebugHandles.Basic

Imported from OSS

Reviewed By: ilia-cher

Differential Revision: D29935899

fbshipit-source-id: 4f06dc411b6b5fe0ffaebdd26d3274c96f8f389b
2021-08-13 21:40:14 -07:00
Don Jang
61b49c8e41 [JIT] Add a flag to rethrow caught exception in jit interpreter (#63073)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63073

It turned out to be less than ideal to print a verbose stacktrace in exception messages in high-QPS services (see the related task) with a non-negligible failure rate, because the truncation of a long stacktrace results in losing the original exception message thrown from native code. In such a use case it is actually desirable to retain only the message of the original exception directly thrown from native code.

This change adds a new flag `torch_jit_disable_exception_stacktrace` to the PyTorch JIT interpreter to suppress stacktraces in the messages of exceptions thrown from the interpreter.

Reviewed By: Krovatkin

Differential Revision: D30241792

fbshipit-source-id: c340225c69286663cbd857bd31ba6f1736b1ac4c
2021-08-13 08:44:24 -07:00
Shen Li
1022443168 Revert D30279364: [codemod][lint][fbcode/c*] Enable BLACK by default
Test Plan: revert-hammer

Differential Revision:
D30279364 (b004307252)

Original commit changeset: c1ed77dfe43a

fbshipit-source-id: eab50857675c51e0088391af06ec0ecb14e2347e
2021-08-12 11:45:01 -07:00
Zsolt Dollenstein
b004307252 [codemod][lint][fbcode/c*] Enable BLACK by default
Test Plan: manual inspection & sandcastle

Reviewed By: zertosh

Differential Revision: D30279364

fbshipit-source-id: c1ed77dfe43a3bde358f92737cd5535ae5d13c9a
2021-08-12 10:58:35 -07:00
Nikita Shulga
709ac6853a Fix warnings (#62930)
Summary:
Add `-Wno-writable-strings` (which is clang's flavor of `-Wwrite-strings`) to the list of warnings ignored while compiling torch_python.
Avoid unnecessary copies in range loops.
Fix a number of signed-unsigned comparisons.

Found while building locally on M1

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62930

Reviewed By: albanD

Differential Revision: D30171981

Pulled By: malfet

fbshipit-source-id: 25bd43dab5675f927ca707e32737ed178b04651e
2021-08-11 14:07:10 -07:00
Howard Cheng
fa22f6303f [PyTorch] Add flop count for addmm (#61895)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61895

* Add FLOP count for addmm, should be `2*m*n*k`.

Share the same code path for `addmm` and `mm`.
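
A quick sanity check of the `2*m*n*k` figure (a sketch, not the PR's test): for addmm(bias, a, b) with a of shape (m, k) and b of shape (k, n), each of the m*n outputs takes k multiplies and k adds.

```python
import torch

m, k, n = 64, 128, 32
bias = torch.randn(m, n)
a, b = torch.randn(m, k), torch.randn(k, n)

with torch.profiler.profile(with_flops=True) as prof:
    torch.addmm(bias, a, b)

print(2 * m * n * k)  # 524288; compare with the profiler's FLOPs column
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=5))
```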

Test Plan:
Imported from OSS

`python test/test_profiler.py`
Run a sample profile and check that FLOPS for `aten::addmm` is correct.

`[chowar@devbig053.frc2 ~/local/pytorch/build] ninja bin/test_jit`
`[chowar@devbig053.frc2 ~/local/pytorch/build] ./bin/test_jit --gtest_filter='ComputeFlopsTest*'`

Reviewed By: dskhudia

Differential Revision: D29785671

fbshipit-source-id: d1512036202d7234a981bda897af1f75808ccbfe
2021-08-11 12:33:43 -07:00
Jacob Szwejbka
b746fed164 [Pytorch Edge] Move RuntimeCompatibilityInfo Factory Method (#63005)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63005

Realized I forgot to move the Runtime half of these functions to be within the struct.

Test Plan: ci

Reviewed By: pavithranrao

Differential Revision: D30205521

fbshipit-source-id: ccd87d7d78450dd0dd23ba493bbb9d87be4640a5
2021-08-11 11:15:57 -07:00
Jacob Szwejbka
474d7ec43b [Pytorch Edge] Black Box Compatibility API (#61477)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61477

It would be nice if the compatibility API were simply plug-and-play, with no concern about its internals at all. That's what this diff aims to provide.

The general usage would be something like:

  // On the client:
  RuntimeCompatibilityInfo runtime_info = get_runtime_compatibility_info();

  ...

  // On the server:
  ModelCompatibilityInfo model_info = get_model_compatibility_info(<model_path>);
  bool compatible = is_compatible(runtime_info, model_info);

Currently RuntimeCompatibilityInfo and ModelCompatibilityInfo are exactly the same, but it seems feasible that they may end up diverging as more information is added to the API (such as a minimum supported bytecode version being exposed from the runtime).
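
A Python-side relative of the same compatibility story, assuming the `_get_model_bytecode_version` helper is available ("model.ptl" is a placeholder path for a model saved with `_save_for_mobile`):

```python
from torch.jit.mobile import _get_model_bytecode_version

print(_get_model_bytecode_version("model.ptl"))  # placeholder path
```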

Test Plan: unit test and ci

Reviewed By: dhruvbird, raziel

Differential Revision: D29624080

fbshipit-source-id: 43c1ce15531f6f1a92f357f9cde4e6634e561700
2021-08-03 11:27:28 -07:00
Dhruv Matani
0b3f42fa4f [PyTorch Edge] Add test for lite interpreter operator caching (#62306)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62306

Test to see if caching of operators works as expected. When caching operators during model load, we look them up by operator name. This test ensures that even if there are multiple operators with the same name (in the same model), the caching distinguishes between the ones that have a different number of arguments specified during the call in the serialized bytecode.

In this specific test, there's a model with 3 methods, two of which return a `float32` tensor and one of which returns an `int64` tensor. Please see the comments in the diff for details.

ghstack-source-id: 134634613

Test Plan:
Test command:

```
cd fbsource/fbcode/
buck test mode/dev //caffe2/test/cpp/jit:jit -- --exact 'caffe2/test/cpp/jit:jit - LiteInterpreterTest.OperatorCacheDifferentiatesDefaultArgs'
```

```
cd fbsource/
buck test xplat/caffe2:test_lite_interpreter
```

Reviewed By: raziel

Differential Revision: D29929116

fbshipit-source-id: 1d42bd3e6d33128631e970c477344564b0337325
2021-07-29 20:14:45 -07:00
Dhruv Matani
0bbdf0e1e3 [PyTorch Edge] Add test_lite_interpreter to fbsource xplat BUCK files (#62305)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62305

Currently, it's super time-consuming to run a lite interpreter test from fbcode since it takes > 10 minutes to build. Recently, I haven't been able to do that at all due to low disk space.

Having this test available in fbsource/xplat/ is a great win for productivity since I can re-run it in ~2 minutes even after significant changes!

I've had to disarm some tests that can only run in OSS or fbcode builds (since they need functionality that we don't include in on-device FB builds). They are disarmed using the macro `FB_XPLAT_BUILD`.

ghstack-source-id: 134634611

Test Plan: New test!

Reviewed By: raziel, JacobSzwejbka, cccclai

Differential Revision: D29954943

fbshipit-source-id: e55eab14309472ef6bc9b0afe0af126c561dbdb1
2021-07-29 20:13:06 -07:00
Raghavan Raman
7b6d569a2b [jit] Renamed prim::Concat as prim::VarConcat (#61983)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61983

Trial #2. The previous PR (https://github.com/pytorch/pytorch/pull/61498) was reverted because this caused a failure in `pytorch_linux_backward_compatibility_check_test`. Fixed that now by adding to the exception list in `check_backward_compatibility.py`.

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D29828830

Pulled By: navahgar

fbshipit-source-id: 947a7b1622ff6e3e575c051b8f34a789e105bcee
2021-07-29 10:28:59 -07:00
Laurence Rouesnel
3bdee2bbed [jit] Rewrote DFS graph iterator to remove unnecessary local state (#61326) (#61980)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61980

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D29917766

Pulled By: laurencer

fbshipit-source-id: 536c4806636fe9e709e8bffdefa9320127064dea
2021-07-27 11:50:20 -07:00
Pavithran Ramachandran
d0f430927b [PyTorch][Edge] Serializing sub modules with same names (#61933)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61933

### Issue:

Submodules with the same name are not serialized correctly in bytecode format when using `_save_for_mobile`. These submodules are not distinguished as different modules, even though they have different forward, setstate, etc., if they have the same name.

### Fix:
The mangler creates unique names so that modules and submodules that have the same name can be uniquely identified while saving the module. iseeyuan rightly pointed out that the underlying issue is that the mangler is not used in the process of saving bytecode, and hence unique references for the submodules are not created. Please refer to the notebook to repro the issue: N777224

### Diff:
The above idea of the fix is implemented: the mangled names are used in bytecode, so the files in the `code/` directory now have the right references in `bytecode.pkl`.

Will this have backward compatibility? (iseeyuan, please feel free to correct or update this.)
Yes. This fix impacts only modules with same-name submodules, which were not serialized correctly before. Existing modules should have correct references, and `_load_for_mobile` must not see any change. To confirm this, the existing test cases need to pass for the diff to be approved and shipped.
ghstack-source-id: 134242696

Test Plan:
```
~/fbsource/fbcode > buck test caffe2/test/cpp/jit:jit -- BackendTest.TestCompositeWithSetStates
Downloaded 0/5 artifacts, 0.00 bytes, 100.0% cache miss (for updated rules)
Building: finished in 19.2 sec (100%) 17619/17619 jobs, 3/17619 updated
  Total time: 19.5 sec
More details at https://www.internalfb.com/intern/buck/build/91542d50-25f2-434d-9e1a-b93117f4efe1
Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details.
Running with tpx session id: de9e27cf-4c6c-4980-8bc5-b830b7c9c534
Trace available for this run at /tmp/tpx-20210719-161607.659665/trace.log
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/844425127206388
    ✓ ListingSuccess: caffe2/test/cpp/jit:jit - main (8.140)
    ✓ Pass: caffe2/test/cpp/jit:jit - BackendTest.TestCompositeWithSetStates (0.528)
Summary
  Pass: 1
  ListingSuccess: 1
If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/844425127206388
```

```
~/fbsource/fbcode > buck test caffe2/test/cpp/jit:jit -- BackendTest.TestConsistencyOfCompositeWithSetStates
Building: finished in 4.7 sec (100%) 6787/6787 jobs, 0/6787 updated
  Total time: 5.0 sec
More details at https://www.internalfb.com/intern/buck/build/63d6d871-1dd9-4c72-a63b-ed91900c4dc9
Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details.
Running with tpx session id: 81023cd2-c1a2-498b-81b8-86383d73d23b
Trace available for this run at /tmp/tpx-20210722-160818.436635/trace.log
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/8725724325952153
    ✓ ListingSuccess: caffe2/test/cpp/jit:jit - main (7.867)
    ✓ Pass: caffe2/test/cpp/jit:jit - BackendTest.TestConsistencyOfCompositeWithSetStates (0.607)
Summary
  Pass: 1
  ListingSuccess: 1
If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/8725724325952153
```

To check the `bytecode.pkl` using module inspector please check:
N1007089

Reviewed By: iseeyuan

Differential Revision: D29669831

fbshipit-source-id: 504dfcb5f7446be5e1c9bd31f0bd9c986ce1a647
2021-07-26 16:31:48 -07:00
Kimish Patel
026cfe85b4 Fix InlinedCallStack annotation to account for module calling its own (#61791)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61791

This fixes the InlinedCallStack annotation to account for a module calling its own methods from forward.

During inlining we attach InlinedCallStack to the nodes being inlined. In the process we attach module information as well, such that if a CallMethod is being inlined we know which class instance and class type the method belongs to. However, CallMethod can be calling a method of the same object to which the graph belongs, e.g.:

```
def forward(self, input):
  x = input + 10
  return self.forward_impl_(x, input)
```
Here forward_impl_ is a method defined on the same class in which forward is defined. The existing module hierarchy annotation would mislabel this as an unknown instance, since the method is not associated with the output of a GetAttr node (it would be if we had called self.conv.forward_impl_, for example).
The change in this PR reconciles this by creating a placeholder name "SELF" for the module instance, indicating that you can traverse the InlinedCallStack backwards to find the first node with name != SELF, which will be the name of the object.
e.g.:
TOP(ResNet)::forward.SELF(ResNet)::_forward_impl.layer1(Sequential)::forward.0(BasicBlock)::forward.conv1(Conv2d)::forward.SELF(Conv2d)::_conv_forward

Test Plan:
Add test

Imported from OSS

Reviewed By: larryliu0820

Differential Revision: D29745443

fbshipit-source-id: 1525e41df53913341c4c36a56772454782a0ba93
2021-07-26 15:00:57 -07:00
Nikita Shulga
a9b0a921d5 Disable avoid-non-const-global-variables lint check (#62008)
Summary:
As the GoogleTest `TEST` macro is non-compliant with it, as is `DEFINE_DISPATCH`.

All changes but the ones to `.clang-tidy` are generated using following script:
```
for i in `find . -type f -iname "*.c*" -or -iname "*.h"|xargs grep cppcoreguidelines-avoid-non-const-global-variables|cut -f1 -d:|sort|uniq`;  do sed -i "/\/\/ NOLINTNEXTLINE(cppcoreguidelines-avoid-non-const-global-variables)/d" $i; done
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62008

Reviewed By: driazati, r-barnes

Differential Revision: D29838584

Pulled By: malfet

fbshipit-source-id: 1b2f8602c945bd4ce50a9bfdd204755556e31d13
2021-07-22 18:04:40 -07:00
Vitaly Fedyunin
33db828e52 Revert D29647586: [jit] Renamed prim::Concat as prim::VarConcat
Test Plan: revert-hammer

Differential Revision:
D29647586 (db11619901)

Original commit changeset: cdd34ea5a3c9

fbshipit-source-id: bab5ac4ed67a00ac151fe39463aa3fb56897d7f4
2021-07-21 08:28:26 -07:00
Raghavan Raman
db11619901 [jit] Renamed prim::Concat as prim::VarConcat (#61498)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61498

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D29647586

Pulled By: navahgar

fbshipit-source-id: cdd34ea5a3c986350a813be17e7d428844ea4cbf
2021-07-20 19:30:00 -07:00
Raghavan Raman
429908e540 [jit] Updated the concat common inputs elimination pass to use the variadic cat op instead of aten::cat (#60908)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60908

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D29441865

Pulled By: navahgar

fbshipit-source-id: 2ab08168102eff1f43667ca418bdd94bb2df562a
2021-07-20 19:29:57 -07:00
Raghavan Raman
53668f8bf6 [jit] Added an API to remove list mutations and replace with variadic cat until fixed point (#60776)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60776

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D29406099

Pulled By: navahgar

fbshipit-source-id: e2e69eb6ebff3bc6e25d80f46ce118e52f557fb6
2021-07-20 19:29:55 -07:00
Raghavan Raman
4dd04a8bbe [jit] Handled cases when input list to cat is mutated after cat using AliasDb (#60774)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60774

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D29406100

Pulled By: navahgar

fbshipit-source-id: af6afca65881c18c51b482eb63898a0f1c94d591
2021-07-20 19:28:42 -07:00
Raghavan Raman
593e8f41ca [jit] Fixed a bug in the pass that replaces cat with the variadic op (#61795)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61795

Test Plan: Imported from OSS

Reviewed By: nikithamalgifb

Differential Revision: D29748785

Pulled By: navahgar

fbshipit-source-id: df5b84c35f007718c92a21a0b44a231e6d346918
2021-07-18 21:38:30 -07:00
Nikita Shulga
ee2f2ec9a5 Revert D29687143: [3/N] Nnapi Backend Delegate Preprocess: Basic OSS Test
Test Plan: revert-hammer

Differential Revision:
D29687143 (5798a00aa4)

Original commit changeset: 9ba9e57f7f85

fbshipit-source-id: 6a672c76a04366b35c492698ae5b39fd4dd1785f
2021-07-16 13:32:51 -07:00
Amy He
5798a00aa4 [3/N] Nnapi Backend Delegate Preprocess: Basic OSS Test (#61594)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61594

### Summary:
Added a unit test for the Nnapi delegate's preprocess() function. The
function was previously tested locally, but now a basic test is
added for OSS.

See https://github.com/pytorch/pytorch/pull/61499 for preprocess
implementation. See D29647123 for local testing.

**TODO:**
Add more comprehensive tests.
Add tests for model execution, after the Nnapi delegate's initialization
and execution is implemented T91991928.

**CMakeLists.txt:**
Added a library for the Nnapi delegate
- Explicit linking of torch_python is necessary for the Nnapi delegate's use of pybind

**test_backends.py:**
Added a test for lowering to Nnapi
- Based off https://github.com/pytorch/pytorch/blob/master/test/test_nnapi.py
- Only differences are the loading of the nnapi backend library and the need to change dtype from float64 to float32

### Test Plan:
Running `python test/test_jit.py TestBackendsWithCompiler -v` succeeds. Also saved and examined the model file locally.

Test Plan: Imported from OSS

Reviewed By: iseeyuan

Differential Revision: D29687143

fbshipit-source-id: 9ba9e57f7f856e5ac15e13527f6178d613b32802
2021-07-16 11:00:38 -07:00
Martin Yuan
d8c3d555e4 [Delegate] Support composite of lowered sub modules of the same backend (#59921)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59921

Test Plan: Imported from OSS

Reviewed By: raziel

Differential Revision: D29091143

Pulled By: iseeyuan

fbshipit-source-id: 9ffcd18681917ece8ec73a34866c53701bdee1bc
2021-06-25 07:18:32 -07:00
Raghavan Raman
d3a8505ee1 [jit] Added a pass to transform aten::cat ops to prim::Concat op with variable number of inputs (#59881)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59881

This pass is not included in the JIT flow or anywhere else at this point. The idea is that, once this lands, everyone can use it to test their workflow with this transformation, and once we are convinced it is useful and/or improves performance, we can include it in the appropriate workflow.

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D29277876

Pulled By: navahgar

fbshipit-source-id: b5be7bdcc98dced59295bd7b8f6627619cb58d41
2021-06-24 01:27:41 -07:00