Commit Graph

87 Commits

Author SHA1 Message Date
Bo Wu
bf610f08b0 Back out "Make TorchScript Preserve Fully Qualified Class Name for Python Exceptions"
Summary: as title

Test Plan:
```
buck run mode/opt-split-dwarf -c=python.package_style=inplace //ai_infra/distributed_ai/pyper_test_framework/templates:pyper_release_v2 -- --model inline_cvr_post_imp_deterministic_shrunk_pyper_release_v2 --cluster TSCTestCluster --hpc_identity oncall_pyper_oncall --stage prod_offline_training --test_module training_platform
...
############## Start inline_cvr_post_imp_model Test Results Analysis ##############
I1226 22:03:56.789000 3346280 test_driver.py:139  UNKNOWN     ] Test finished in 808.2743511786684 seconds.
+-------------------------+---------+------------------------+-----------------+
| Test Case               | Status  | Message                | Model Entity ID |
+-------------------------+---------+------------------------+-----------------+
| SmallWorld_release_test | Success | finished successfully. | 987987491       |
+-------------------------+---------+------------------------+-----------------+
I1226 22:03:56.790000 3346280 test_driver.py:143  UNKNOWN     ] test_run_id: 3d085f61-28d1-411d-bd27-940ea2554b23 use this id to find your run in scuba pyper_test_framework
I1226 22:03:56.792000 3346280 test_driver.py:160  UNKNOWN     ] Calling cleanup
I1226 22:03:56.792000 3346280 training_platform_test_launcher.py:385  UNKNOWN     ] Stopping launched jobs 1
I1226 22:03:59.563122 3346280 ClientSingletonManager.cpp:100] Shutting down Manifold ClientSingletonManager
```

Reviewed By: seemethere

Differential Revision: D33325936

fbshipit-source-id: 64414bf7061ad77e8ac12eb8abafee4043e0fa1e
2021-12-27 09:11:46 -08:00
Shunting Zhang
911d527b87 Make TorchScript Preserve Fully Qualified Class Name for Python Exceptions (#70339)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70339

When a python program is translated to TorchScript, the python exception type is dropped. This makes users's life hard when they need to categorize errors based more than only exception message.

Here we make the change so when we raise a python exception, we record the fully qualified class name for the exception. Later on when the TorchScript is interpreted, a special exception CustomJITException is thrown. User can get the python class name from CustomJITException::getPythonClassName .

Note that, this diff does not customize the mapping from C++ exception to Python exception. It's left to the users to do whatever mapping they want.

Code under scripts/shunting are just my own experimental code. I can split them out if requested.
ghstack-source-id: 146221879

Test Plan: buck test mode/opt //caffe2/test:jit

Reviewed By: gmagogsfm

Differential Revision: D33282878

fbshipit-source-id: 910f67a764519f1053a48589d1a34df69001525d
2021-12-24 00:25:40 -08:00
David Berard
aa9fbb9ae9 [JIT] check stack size after calling operator (#68788)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68788

In debug mode, this should throw errors for ops where the wrong number ops is returned (i.e. the number of values left on the stack is different from the number shown in the schema)

Test Plan:
Run this in debug mode and verify that it doesn't throw an assert
```
import torch

class Thing(torch.nn.Module):
    torch.jit.export
    def en(self, x: torch.Tensor):
        return torch.add(x, 2.0)

    def forward(self, x: torch.Tensor, y: torch.Tensor):
        a = torch.mm(x, y)
        b = torch.nn.functional.gelu(a)
        c = self.en(b)
        return c.std_mean()

if __name__ == '__main__':
    unsc = Thing()
    thing = torch.jit.script(unsc)
    x = torch.randn(4, 4)
    y = torch.randn(4, 4)
    std, mean = thing.forward(x, y)
    print(std, mean)
    print(str(thing.forward.graph))
```

Reviewed By: gchanan

Differential Revision: D32625256

Pulled By: davidberard98

fbshipit-source-id: 61d5ec0c5a9f8b43706257119f4f524bb9dbe6f5
2021-12-07 11:43:50 -08:00
Scott Wolchok
3e45739543 [PyTorch][JIT] Use stack.pop_back() instead of pop(stack) for DROP (#69326)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69326

Looks like this really is slightly cheaper (see assembly diff screenshot in internal test plan). The problem is that `pop()` returns the value, so we have to spend instructions moving it out of the stack and then destroying it via a local.
ghstack-source-id: 144641680

Test Plan:
{F684148304}

CI

Reviewed By: zhxchen17

Differential Revision: D32812841

fbshipit-source-id: e9e43458d3364842f67edd43e43575a1f72e3cb0
2021-12-03 11:09:05 -08:00
Scott Wolchok
2c84b010e6 [PyTorch] Use toObjectRef in JIT interpreter (#69324)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69324

This slightly shrinks runImpl.

Before:
- Move pointer out of IValue
- Clear the IValue to none
- Do our thing with the Object
- destroy the intrusive_ptr on the C stack
- destroy the IValue on the C stack (even though it was cleared to None, the destructor has to run anyway)

After:
- Grab the pointer out of IValue
- Do our thing with the Object
- Decref the pointer in the IValue on the JIT stack as we assign over it

We should be saving at least the memory traffic from clearing the IValue and possibly the dtor code as well.
ghstack-source-id: 144638920

Test Plan:
Inspected assembly to verify shorter runImpl

Tried to microbenchmark (D32809454) but can't show a difference.

Reviewed By: gchanan

Differential Revision: D32812252

fbshipit-source-id: a3689f061ee51ef01e4696bd4c6ffcbc41c30af5
2021-12-03 11:07:16 -08:00
Han Qi
4eb772fde6 Refactor saving jit::Module to mobile .pt in 2 steps: (#66494)
Summary:
1. is to convert Function -> mobile::Function
2. is to serialize mobile::Function

This also opens opportunity to create mobile::Module without saving/reloading

Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66494

Reviewed By: zhxchen17

Differential Revision: D32293022

Pulled By: qihqi

fbshipit-source-id: 29b43d47ff86071d5e2f9d6ca4dba4445711ce3d
2021-11-17 12:02:20 -08:00
Scott Wolchok
7cd62621fb [PyTorch] Adopt faster Tuple::create (#65381)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65381

The previous diff adds a way to make Tuples of size 3 or less
more efficiently. This diff makes it easier to hit that path and
updates a bunch of callsites to hit it.
ghstack-source-id: 142065832

Test Plan: CI

Reviewed By: ezyang

Differential Revision: D31069538

fbshipit-source-id: d04da3709594ed68ab1c0a1471f8cffd8d001628
2021-11-02 10:10:31 -07:00
Zhengxu Chen
5ef62c88a9 [jit] Replace get_executor() with call() in abstract Function interface. (#65969)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65969

ghstack-source-id: 141759210

Test Plan: no behavior change.

Reviewed By: anjali411

Differential Revision: D31326151

fbshipit-source-id: 201f6dc4c23fdb2531f6b8c73d26127f9e212de4
2021-10-28 13:11:29 -07:00
Giuseppe Ottaviano
72803dbcfd [caffe2] Fix invalid vector accesses and polar() call (#66757)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66757

`InterpreterStateImpl::run()` gets the number of outputs from the current frame, but by the time the continuation completes, the frame is gone, so we're calling `front()` on an empty vector. This works out in practice (data is still there) but it is technically undefined behavior and could break in the future.

Also, `std::polar()` expects its argument to be non-negative, but `c10::polar()` does not, so implement it explicitly (implementation is the same as libstdc++).

Test Plan: JIT tests pass.

Reviewed By: zhxchen17

Differential Revision: D31715587

fbshipit-source-id: 98abcc10c2742887af866d8e70169a0187c41d33
2021-10-19 00:29:54 -07:00
Chen Lai
8d5b95019d [PyTorch Edge] Support default args with out arg, flag off (#63540)
Summary:
1. Allow consuming operators with defaults arguments and out arguments. Flag is off to keep the same behavior as v6, in pr 63651, turn on the flag.
2. Add two unittests to cover this type of operators.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63540

ghstack-source-id: 137211562

Test Plan:
```
caffe2/test/cpp/jit:jit - LiteInterpreterTest.DefaultArgsWithOutArg
caffe2/test/cpp/jit:jit - LiteInterpreterTest.DefaultArgsPinvWithOutArg
```

Reviewed By: raziel, iseeyuan, tugsbayasgalan

Differential Revision: D30414156

fbshipit-source-id: 0f3a219a22aee10ac53184cbd95940726c459d1f
2021-09-02 01:36:16 -07:00
Zhengxu Chen
ac99d63f83 [jit] Make operation call accept Stack& instead Stack* (#63414)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63414

Misuse of raw pointer in here where stack is never nullable.
ghstack-source-id: 136938318

Test Plan:
compiles.

Imported from OSS

Reviewed By: ejguan

Differential Revision: D30375410

fbshipit-source-id: 9d65b620bb76d90d886c800f54308520095d58ee
2021-08-30 11:49:20 -07:00
Don Jang
e7724bb100 [JIT] Set future's error to current exception as is when --torch_jit_enable_rethrow_caught_exception=true (#63348)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63348

This change addresses singlaiiit's comment on D30241792 (61b49c8e41), which makes the JIT interpreter's behavior consistent between `future` is set and not.

Test Plan: Enhanced `EnableRethrowCaughtExceptionTest.EnableRethrowCaughtExceptionTestRethrowsCaughtException` to cover the modified code path.

Reviewed By: singlaiiit

Differential Revision: D30347782

fbshipit-source-id: 79ce57283154ca4372e5341217d942398db21ac8
2021-08-16 17:32:13 -07:00
Kimish Patel
54f2eb6e7e [Pytorch Profiler] Add support for adding module hierarchy to (#61792)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61792

KinetoEvent

This PR adds module hierarchy information to events.
What is module hierarchy information attached to events?
During profiling a TorchScript module, when events are added, we ask JIT
what is the module hierarchy associated with the node being
executed. At the time of execution of that node, there might be multiple
frames in the stack of interpreter. For each frame, we find
corresponding node and the corresponding module hierarchy is queried.
Module hierarchy corresponding to the node is associated with node's
InlinedCallStack. InlinedCallStack of node tracks the path via which the
node is inlined. Thus during the inlining process we annotate
module information corresponding to the CallMethod nodes being inlined.

With this PR, chrome trace will contain additional metadata:
"Module Hierarchy". This can look like this:
TOP(ResNet)::forward.SELF(ResNet)::_forward_impl.layer1(Sequential)::forward.0(BasicBlock)::forward.conv1(Conv2d)::forward.SELF(Conv2d)::_conv_forward
It contains module instance, type name and the method name in the
callstack.

Test Plan:
test_profiler

Imported from OSS

Reviewed By: raziel, ilia-cher

Differential Revision: D29745442

fbshipit-source-id: dc8dfaf7c5b8ab256ff0b2ef1e5ec265ca366528
2021-08-13 21:39:10 -07:00
Don Jang
61b49c8e41 [JIT] Add a flag to rethrow caught exception in jit interpreter (#63073)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63073

It turned out that it's less than ideal to print out verbose stacktrace in exception messages in high-QPS services (see the related task) with a non-significant failure rate due to the truncation of long stacktrace which results in losing the original exception message thrown from native code. It is actually desirable to retain only the message of the original exception directly thrown from native code in such a usecase.

This change adds a new flag `torch_jit_disable_exception_stacktrace` to the pytorch jit interpreter to suppress stacktrace in the messages of exception thrown from the interpreter.

Reviewed By: Krovatkin

Differential Revision: D30241792

fbshipit-source-id: c340225c69286663cbd857bd31ba6f1736b1ac4c
2021-08-13 08:44:24 -07:00
Richard Barnes
4fdb9579fa irange-ify 12 (#62120)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62120

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D29879713

fbshipit-source-id: 3084a5eacb722f7fb0a630d47bf694f4d6831136
2021-08-09 15:31:51 -07:00
Nikita Shulga
a9b0a921d5 Disable avoid-non-const-global-variables lint check (#62008)
Summary:
As GoogleTest `TEST` macro is non-compliant with it as well as `DEFINE_DISPATCH`

All changes but the ones to `.clang-tidy` are generated using following script:
```
for i in `find . -type f -iname "*.c*" -or -iname "*.h"|xargs grep cppcoreguidelines-avoid-non-const-global-variables|cut -f1 -d:|sort|uniq`;  do sed -i "/\/\/ NOLINTNEXTLINE(cppcoreguidelines-avoid-non-const-global-variables)/d" $i; done
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62008

Reviewed By: driazati, r-barnes

Differential Revision: D29838584

Pulled By: malfet

fbshipit-source-id: 1b2f8602c945bd4ce50a9bfdd204755556e31d13
2021-07-22 18:04:40 -07:00
Zhengxu Chen
6643df2680 [jit] Use computed loop to dispatch to next instruction in interpreter. (#60211)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60211

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D29211283

fbshipit-source-id: 2f87b5a78d4fc00ce11ed509fc15db35332690b6
2021-06-30 17:44:26 -07:00
Richard Barnes
3979cb0656 irange for size_t (#55320)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55320

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D27572577

fbshipit-source-id: 97710fd2bb1303006b05828a0d1343b0b59ccb03
2021-06-03 01:04:13 -07:00
Zhengxu Chen
2b0ec9c3cf Reapply "[jit] Implement ScriptProfile to collect instruction profiles." (#58783)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58783

This reverts commit fc804b5def.

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D28617037

Pulled By: zhxchen17

fbshipit-source-id: 645de2ede20500a5c218d6ec3c7faae94de37a14
2021-05-24 18:23:21 -07:00
Edward Yang
fc804b5def Revert D28133579: [jit] Implement ScriptProfile to collect instruction profiles.
Test Plan: revert-hammer

Differential Revision:
D28133579 (034a238bab)

Original commit changeset: e7e30e961513

fbshipit-source-id: 5a7756468b4f2eeed24d2abb7b52ab46d081a95e
2021-05-21 08:18:40 -07:00
Zhengxu Chen
034a238bab [jit] Implement ScriptProfile to collect instruction profiles. (#57397)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57397

Introduces two main classes in C++ runtime:

ScriptProfile is the implementation for enalbing and disabling interpreter
profiling in C++. This should be only used from Python, and we will add
corresponding Python API in the next diff.

InstructionSpan is a utility class to instrument execution of each single
instruction. A start timestamp is recorded in the consturctor, and an end
timestamp is recorded in the destructor. During destruction, this will send
runtime data to all enabled ScriptProfile instances.

Test Plan:
build/bin/test_jit --gtest_filter='ScriptProfileTest.Basic'

Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D28133579

fbshipit-source-id: e7e30e96151367022793ab3ad323f01c51ad4a3b
2021-05-20 14:11:03 -07:00
Tugsbayasgalan (Tugsuu) Manlaibaatar
fc9c486044 Add enabling default instructions flag for mobile (#57778)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57778

Test Plan: Imported from OSS

Reviewed By: iseeyuan

Differential Revision: D28268997

Pulled By: tugsbayasgalan

fbshipit-source-id: 5571b233d03d3aa80c820ee4245b4d0d3b70f924
2021-05-10 17:26:05 -07:00
Tugsbayasgalan (Tugsuu) Manlaibaatar
b0c27b44cf Enable backward/forward compatibility for TS runtime (#57498)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57498

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D28162448

Pulled By: tugsbayasgalan

fbshipit-source-id: 5c21ced42a22aca7cee089e876e9d98d32f68955
2021-05-07 15:41:45 -07:00
Luca Wehrstedt
36e47af58b Pass reference to parent future in callbacks (#57635)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57635

Note: this PR looks massive, but it's just one simple change, codemodded many times.

In many cases, a callback needs to access the value/error produced by the parent future. In Python this was easy because the callback was invoked with the parent future as argument, and could thus inspect it. In C++ the callbacks didn't take any arguments, thus in many cases we worked around this by capturing the future in its own callback. This is risky (leads to reference cycle and thus memory leak) and must be done carefully (spoiler: sometimes we weren't).
ghstack-source-id: 128296580

Test Plan: CI

Reviewed By: wanchaol

Differential Revision: D28178783

fbshipit-source-id: 6de02c4568be42123372edc008f630d5ddae0081
2021-05-07 03:59:18 -07:00
Zhengxu Chen
8b38458011 [jit] Break interpreter.cpp into smaller files. (#56546)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56546

A code move for CodeImpl and Frame to a subdirectory runtime/interpreter, so
that it's easier to reuse them and navigate the interpreter code.

Test Plan: Imported from OSS

Reviewed By: nikithamalgifb

Differential Revision: D28133580

fbshipit-source-id: 8de89a4e8e637836625e1ac1db95f0a3353da670
2021-05-06 16:43:57 -07:00
Nikita Shulga
4cb534f92e Make PyTorch code-base clang-tidy compliant (#56892)
Summary:
This is an automatic change generated by the following script:
```
#!/usr/bin/env python3
from subprocess import check_output, check_call
import os

def get_compiled_files_list():
    import json
    with open("build/compile_commands.json") as f:
        data = json.load(f)
    files = [os.path.relpath(node['file']) for node in data]
    for idx, fname in enumerate(files):
        if fname.startswith('build/') and fname.endswith('.DEFAULT.cpp'):
            files[idx] = fname[len('build/'):-len('.DEFAULT.cpp')]
    return files

def run_clang_tidy(fname):
    check_call(["python3", "tools/clang_tidy.py", "-c", "build", "-x", fname,"-s"])
    changes = check_output(["git", "ls-files", "-m"])
    if len(changes) == 0:
        return
    check_call(["git", "commit","--all", "-m", f"NOLINT stubs for {fname}"])

def main():
    git_files = check_output(["git", "ls-files"]).decode("ascii").split("\n")
    compiled_files = get_compiled_files_list()
    for idx, fname in enumerate(git_files):
        if fname not in compiled_files:
            continue
        if fname.startswith("caffe2/contrib/aten/"):
            continue
        print(f"[{idx}/{len(git_files)}] Processing {fname}")
        run_clang_tidy(fname)

if __name__ == "__main__":
    main()
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56892

Reviewed By: H-Huang

Differential Revision: D27991944

Pulled By: malfet

fbshipit-source-id: 5415e1eb2c1b34319a4f03024bfaa087007d7179
2021-04-28 14:10:25 -07:00
Tugsbayasgalan Manlaibaatar
2041cd6707 Enable forward/backward compatibility in TS mobile (#56079)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/56079

Test Plan: Imported from OSS

Reviewed By: iseeyuan

Differential Revision: D27828149

Pulled By: tugsbayasgalan

fbshipit-source-id: 9291ddbf01853354fca0fa0a58b8115d5d2294da
2021-04-23 16:55:18 -07:00
Tugsbayasgalan Manlaibaatar
6de1d9b2d0 Fix bug in emitUse to drop all values that are marked as drop (#56652)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56652

Previous code doesn't drop prim::Constant values even when they are marked as drop.

Test Plan: Imported from OSS

Reviewed By: iseeyuan

Differential Revision: D27927413

fbshipit-source-id: 67cd52cf292e111be2830ccf93b0e7b089e49001
2021-04-23 12:42:51 -07:00
Mike Ruberry
c0ac0fef4e Revert D27448156: irange for size_t
Test Plan: revert-hammer

Differential Revision:
D27448156 (041b4431b2)

Original commit changeset: 585da57d4de9

fbshipit-source-id: 8e047c29f391c0166e0a1a87c3fb2a0854377365
2021-04-03 19:14:00 -07:00
Richard Barnes
041b4431b2 irange for size_t (#55163)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55163

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D27448156

fbshipit-source-id: 585da57d4de91c692b6360d65f7b8a66deb0f8c1
2021-04-02 23:22:29 -07:00
Edward Yang
e70f3d1189 Nasty little hack to preserve NotImplementedError raised in interpreter (#54627)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54627

This is the simplest little fix to get interpreter to preserve
NotImplementedError, so that the test suite doesn't start choking
on meta tensors not working in interpreter.  It is sound and correct
but doesn't work for other c10::Error subclasses with special handling.
A more proper fix is requested at
https://github.com/pytorch/pytorch/issues/54612

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: wenleix, ngimel

Differential Revision: D27328666

Pulled By: ezyang

fbshipit-source-id: 483bef062de5a907d20e2d9e25eafe2d5197cf8d
2021-03-27 11:53:06 -07:00
Scott Wolchok
3959d393b8 [PyTorch][JIT] Less shared_ptr use in dictConstruct (#54110)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54110

dictConstruct doesn't need to make its caller have a `shared_ptr<DictType>`. It also doesn't need to do extra `shared_ptr` copies into the `key_type` and `value_type` locals.
ghstack-source-id: 124150642

Test Plan: fitsships

Reviewed By: ezyang

Differential Revision: D27101782

fbshipit-source-id: 3c632ad9d8f1bd7bdf37f517a86aca27bd41548a
2021-03-22 18:31:27 -07:00
Scott Wolchok
4a24c552cc [PyTorch] Fix string copy in WARN path for both interpreters (#54076)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54076

If we don't constrain ourselves to use `torch::jit::pop`, we can avoid copying a string or moving IValues around.
ghstack-source-id: 124040891

Test Plan:
existing tests

spot-checked regular interpreter assembly; seems better

Reviewed By: dhruvbird, walterddr

Differential Revision: D27087204

fbshipit-source-id: 7cf355dbcec31409bdb37afa09d7df85cf2a7e4b
2021-03-17 08:44:08 -07:00
Scott Wolchok
665d5e2a4f [PyTorch][JIT] Audit interpreter for extra copies (#54029)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54029

I found what appear to be some missed moves and/or extra copies in the JIT interpreter.
ghstack-source-id: 123958682

Test Plan:
Existing CI for correctness

Ran AdIndexer inline_cvr local_ro model benchmark with static_runtime off via
`env bin=/tmp/ptvsc2_predictor_bench.StaticDispatchModeFile static_runtime=0 caffe2=0 scripts/swolchok/static_runtime/inline_cvr/run_local_ro.sh`

before:
```
I0315 14:25:23.916893 3075680 PyTorchPredictorBenchLib.cpp:215] PyTorch run finished. Milliseconds per iter: 1.01635. Iters per second: 983.914
I0315 14:26:05.536207 3080560 PyTorchPredictorBenchLib.cpp:215] PyTorch run finished. Milliseconds per iter: 1.01689. Iters per second: 983.395
I0315 14:26:47.510561 3083335 PyTorchPredictorBenchLib.cpp:215] PyTorch run finished. Milliseconds per iter: 1.02697. Iters per second: 973.737
I0315 14:27:29.024830 3086767 PyTorchPredictorBenchLib.cpp:215] PyTorch run finished. Milliseconds per iter: 1.01326. Iters per second: 986.918
I0315 14:28:10.849496 3091323 PyTorchPredictorBenchLib.cpp:215] PyTorch run finished. Milliseconds per iter: 1.023. Iters per second: 977.517
```

after:
```
I0315 14:17:43.280469 3046242 PyTorchPredictorBenchLib.cpp:215] PyTorch run finished. Milliseconds per iter: 0.997838. Iters per second: 1002.17
I0315 14:18:24.244606 3046861 PyTorchPredictorBenchLib.cpp:215] PyTorch run finished. Milliseconds per iter: 1.00173. Iters per second: 998.269
I0315 14:19:05.208899 3051998 PyTorchPredictorBenchLib.cpp:215] PyTorch run finished. Milliseconds per iter: 1.00187. Iters per second: 998.136
I0315 14:19:46.103854 3055392 PyTorchPredictorBenchLib.cpp:215] PyTorch run finished. Milliseconds per iter: 1.00073. Iters per second: 999.27
I0315 14:20:27.011411 3056062 PyTorchPredictorBenchLib.cpp:215] PyTorch run finished. Milliseconds per iter: 0.999121. Iters per second: 1000.88
```

(This was just a convenient workload I had handy; the plan of record is to use static runtime for inline_cvr inference AIUI.)

Reviewed By: dhruvbird, walterddr

Differential Revision: D27060762

fbshipit-source-id: 5567206d7c2d9ae99776ce5524caf09ec2035e87
2021-03-16 15:09:09 -07:00
jiej
4d94ee566e Ge v1 (#52136)
Summary:
This is a second attempt to use graph executor to run forward on a gradient. This allows a secondary chance to profile intermediate tensor introduced by autodiff.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/52136

Reviewed By: pbelevich

Differential Revision: D26693978

Pulled By: Krovatkin

fbshipit-source-id: 91dde8009a210950af8e5173668ada241e16dd52
2021-02-28 00:53:13 -08:00
jiej
dd1c2a06b7 refactor profiling optional (#47667)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47667

Test Plan: Imported from OSS

Reviewed By: anjali411, ngimel

Differential Revision: D25255572

Pulled By: Krovatkin

fbshipit-source-id: d0152c9ef5b1994e27be9888bcb123dca3ecd88f
2021-01-22 14:45:28 -08:00
Scott Wolchok
4a0d17ba2d [PyTorch][codemod] Replace immediately-dereferenced expect calls w/expectRef (#50228)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50228

`fastmod -m 'expect(<((at|c10)::)?\w+Type>\(\)\s*)->'
'expectRef${1}.'`
Presuming it builds, this is a safe change: the result of `expect()`
wasn't being saved anywhere, so we didn't need it, so we can take a
reference instead of a new `shared_ptr`.
ghstack-source-id: 119782961

Test Plan: CI

Reviewed By: SplitInfinity

Differential Revision: D25837374

fbshipit-source-id: 86757b70b1520e3dbaa141001e7976400cdd3b08
2021-01-13 16:13:55 -08:00
Thomas Viehmann
ea087e2d92 JIT: guard DifferentiableGraph node (#49433)
Summary:
This adds guarding for DifferentiableGraph nodes in order to not depend on
Also bailing out on required gradients for the CUDA fuser.

Fixes https://github.com/pytorch/pytorch/issues/49299

I still need to look into a handful of failing tests, but maybe it can be a discussion basis.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49433

Reviewed By: ngimel

Differential Revision: D25681374

Pulled By: Krovatkin

fbshipit-source-id: 8e7be53a335c845560436c0cceeb5e154c9cf296
2021-01-08 20:01:27 -08:00
Scott Wolchok
ef1fa547ba [PyTorch] Use expectRef() when calling listConstruct (#50062)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50062

Avoids creating an extra shared_ptr.
ghstack-source-id: 119325645

Test Plan: CI

Reviewed By: ezyang

Differential Revision: D25766631

fbshipit-source-id: f2ab8349dfea325054820fa2c1055180c740574e
2021-01-06 18:13:38 -08:00
Scott Wolchok
480a756194 [PyTorch] IValue::toTensor can now return const Tensor& (#48868)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48868

Building on the previous diff, we can make `toTensor()` return a
`const Tensor&`, which should make it easier to avoid reference
counting.
ghstack-source-id: 119327372

Test Plan: internal benchmarks.

Reviewed By: bwasti

Differential Revision: D25325379

fbshipit-source-id: ca699632901691bcee432f595f75b0a4416d55dd
2021-01-06 08:40:50 -08:00
Yanan Cao
7518f54611 Add flag torch_jit_disable_warning_prints to allow disabling all warnings.warn (#49313)
Summary:
Adding a flag torch_jit_disable_warning_prints to optimize interpreter performance by suppressing (potentially large amount) of warnings.warn.

This is to work around TorchScript's warning behavior mismatch with Python. Python by default triggers a warning once per location but TorchScript doesn't support it. This causes same warning to trigger and print once per inference run, hurting performance.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49313

Reviewed By: SplitInfinity

Differential Revision: D25534274

Pulled By: gmagogsfm

fbshipit-source-id: eaeb57a335c3e6c7eb259671645db05d781e80a2
2020-12-15 15:22:41 -08:00
Ilia Cherniavskii
db5e5b439c Extra sampling of record function events [resend] (#49114)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49114

resend of https://github.com/pytorch/pytorch/pull/48289

Test Plan: see 48289

Reviewed By: robieta

Differential Revision: D25443365

Pulled By: ilia-cher

fbshipit-source-id: c15ac312222bb4d744e10199ed79801cccae8227
2020-12-11 12:53:37 -08:00
Bram Wasti
f4226b5c90 [static runtime] add static subgraph fusion pass (#49185)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49185

This diff adds a fusion feature that will let us use static runtime for *parts* of the graph.  This will prove useful in cases where fully eliminating control flow is hard etc.

TODO:
[x] factor out into separate fusion file
[x] add python test case
[x] add graph that isn't fully lowered test case
[x] add graph that has weird list/tuple outputs test case

the loop example looks quite good:
```
graph(%a.1 : Tensor,
      %b.1 : Tensor,
      %iters.1 : int):
  %12 : bool = prim::Constant[value=1]() # /data/users/bwasti/fbsource/fbcode/buck-out/dev/gen/caffe2/test/static_runtime#binary,link-tree/test_static_runtime.py:110:4
  %c.2 : Tensor = prim::StaticSubgraph_0(%a.1, %b.1)
  %c : Tensor = prim::Loop(%iters.1, %12, %c.2) # /data/users/bwasti/fbsource/fbcode/buck-out/dev/gen/caffe2/test/static_runtime#binary,link-tree/test_static_runtime.py:110:4
    block0(%i : int, %c.12 : Tensor):
      %c.10 : Tensor = prim::StaticSubgraph_1(%a.1, %c.12, %b.1)
      -> (%12, %c.10)
  return (%c)
with prim::StaticSubgraph_0 = graph(%0 : Tensor,
      %4 : Tensor):
  %5 : int = prim::Constant[value=2]()
  %6 : Tensor = aten::mul(%4, %5) # /data/users/bwasti/fbsource/fbcode/buck-out/dev/gen/caffe2/test/static_runtime#binary,link-tree/test_static_runtime.py:109:12
  %2 : int = prim::Constant[value=1]()
  %c.2 : Tensor = aten::add(%0, %6, %2) # /data/users/bwasti/fbsource/fbcode/buck-out/dev/gen/caffe2/test/static_runtime#binary,link-tree/test_static_runtime.py:109:8
  return (%c.2)
with prim::StaticSubgraph_1 = graph(%1 : Tensor,
      %7 : Tensor,
      %8 : Tensor):
  %9 : int = prim::Constant[value=1]()
  %c.4 : Tensor = aten::add(%7, %8, %9) # /data/users/bwasti/fbsource/fbcode/buck-out/dev/gen/caffe2/test/static_runtime#binary,link-tree/test_static_runtime.py:111:12
  %5 : int = prim::Constant[value=2]()
  %c.7 : Tensor = aten::mul_(%c.4, %5) # /data/users/bwasti/fbsource/fbcode/buck-out/dev/gen/caffe2/test/static_runtime#binary,link-tree/test_static_runtime.py:112:8
  %2 : int = prim::Constant[value=1]()
  %c.10 : Tensor = aten::sub_(%c.7, %1, %2) # /data/users/bwasti/fbsource/fbcode/buck-out/dev/gen/caffe2/test/static_runtime#binary,link-tree/test_static_runtime.py:113:8
  return (%c.10)
```

(Note: this ignores all push blocking failures!)

Test Plan:
buck test mode/no-gpu //caffe2/benchmarks/static_runtime:static_runtime_cpptest

buck test mode/no-gpu caffe2/test:static_runtime

Reviewed By: bertmaher

Differential Revision: D25385702

fbshipit-source-id: 2f24af4f11d92a959167facd03fbd24f464a6098
2020-12-10 14:03:11 -08:00
Elias Ellison
70853c5021 Dont use symbolic shapes check (#47810)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47810

`bindSymbolicShapes` wasn't checking device or dtype at all, so it wasn't correct. It also isn't being used anywhere (num_profiles is always 1 and we don't use symbolic shapes). We shouldn't have it on until we are actually using symoblic shapes.

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D25286214

Pulled By: eellison

fbshipit-source-id: 10fb175d0c75bd0159fb63aafc3b59cc5fd6c5af
2020-12-10 12:14:58 -08:00
jiej
a6fa3b2682 adding profile_ivalue (#47666)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47666

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D25255573

Pulled By: Krovatkin

fbshipit-source-id: 5d8753e4040a3d96105d28d26728125947c7a638
2020-12-09 15:29:15 -08:00
Mike Ruberry
9f7fb54693 Revert D25111515: Extra sampling of record function events
Test Plan: revert-hammer

Differential Revision:
D25111515 (09b974c2d5)

Original commit changeset: 0d572a3636fe

fbshipit-source-id: d558d8052924d937d86db7dd40dc6388e6d28823
2020-12-09 08:37:17 -08:00
Ilia Cherniavskii
09b974c2d5 Extra sampling of record function events (#48289)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48289

Adding extra sampling step when dispatching RecordFunction.

(Note: this ignores all push blocking failures!)

Reviewed By: swolchok

Differential Revision: D25111515

Pulled By: ilia-cher

fbshipit-source-id: 0d572a3636fe649a47ec47901826bbfc08368937
2020-12-09 02:29:13 -08:00
Chen Lai
416dc68341 [Pytorch][Annotation] Update inlined callstack with module instance info (#47416)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47416

Test Plan: Imported from OSS

Reviewed By: kimishpatel

Differential Revision: D24752846

Pulled By: cccclai

fbshipit-source-id: 94d3c18c56161d1de3a16bb7c93502fedf71644c
2020-12-03 10:44:46 -08:00
Meghan Lele
fc1153a8be [JIT] Fix clang-tidy warnings in jit/runtime (#47992)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47992

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D25258645

Pulled By: SplitInfinity

fbshipit-source-id: b3e4576400c101b247e80cb4044fc04471f39a47
2020-12-02 12:35:42 -08:00
Scott Wolchok
3ceec73db9 [PyTorch] Lazily construct guts of RecordFunction (#47550)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47550

I saw over 5% time spent in RecordFunction's ctor during one
of our framework overhead benchmarks in `perf`. Inspecting assembly,
it looks like we just create a lot of RecordFunctions and the
constructor has to initialize a relatively large number of member
variables.

This diff takes advantage of the observation that RecordFunction does
nothing most of the time by moving its state onto the heap and only
allocating it if needed. It does add the requirement that profiling is
actually active to use RecordFunction accessors, which I hope won't be
a problem.
ghstack-source-id: 117498489

Test Plan: Run framework overhead benchmarks. Savings ranging from 3% (InPlace_ndim_1) to 7.5% (empty_ndim_3) wall time.

Reviewed By: ilia-cher

Differential Revision: D24812213

fbshipit-source-id: 823a1e2ca573d9a8d7c5b7bb3972987faaacd11a
2020-12-01 13:07:17 -08:00