Commit Graph

54 Commits

Author SHA1 Message Date
Scott Wolchok
2d885ab73d [jit] Reduce refcounting of Types (#65345)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65345

FooType::get() can return a const reference. Inconveniently, converting shared_ptr<FooType> to shared_ptr<Type> requires a copy & refcount bump, so to properly take advantage of this in unshapedType() we need to take a const Type& in isSubtypeOf(), which is good practice anyway -- don't require a shared_ptr if you don't need to take ownership.
ghstack-source-id: 140044165

Test Plan:
CI

perf says c10::unshapedType time decreased from 2.8% to 2.2% during static runtime startup, though I expect this to be generally beneficial.

Reviewed By: hlu1

Differential Revision: D31027361

fbshipit-source-id: 676feb81db9f74ad7b8651d8774f4ecb4cfa6ab8
2021-10-08 09:03:04 -07:00
Dhruv Matani
64caee1356 [PyTorch Edge] Leave out field for debug_handle if not being built with eager symbolication support (#66131)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66131

Turns out that a model with 72k instructions causes about 0.5MiB of additional memory overhead (if there's an 8 byte memory overhead per instruction). This is not necessary if we're building w/o eager symbolication support. This change eliminates the 8 byte `debug_handle` if the build is w/o eager symbolication support.
ghstack-source-id: 140045478

(Note: this ignores all push blocking failures!)

Test Plan:
```
buck build -c "pt.enable_eager_symbolication"=1 //xplat/caffe2/fb/lite_predictor:lite_predictor
buck build //xplat/caffe2/fb/lite_predictor:lite_predictor
```

Reviewed By: kimishpatel

Differential Revision: D31387784

fbshipit-source-id: af56787ad833b990a46b79ab021e512edaa22143
2021-10-07 20:01:18 -07:00
Kimish Patel
468001600c Back out "Revert D30327514: [Pytorch lite predictor] Use KinetoEdgeCPUProfiler for operator profiling." (#64307)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64307

Original commit changeset: 0b2aa7c57d08

Restores original changes.
This diff changes the way operator profiling is done in lite predictor
benchmarking binary.
Instead of using custom callbacks it uses KinetoEdgeCPUProfiler to profile
events and then generate operator level metric from it.
Since KinetoEvents do not contain cpu clock time, now we report only wallclock
time.
This unifies various profiling effort that we have for benchmarking purpose. In
production we will still use observer based mechanism, but the advantage of
using kineto profiler is that we get few other things for free, such as:
chrome trace generation.
operator level memory profiling (to be added)
flop counts (to be added)
Furthermore possible we can use python post processing script to parse chrome
trace and generate output similar to torch.profiler. (To be done)

Furthermore removes some tests from test_lite_interpreter.cpp which were testing module hierarchy in debug info. They should be covered by test_mobile_profiler.cpp.

Test Plan:
aibench run
Model without debug info:
https://www.internalfb.com/intern/aibench/details/219598441154763
Model with debug info and --print_module_info true (see Operator summary has now module hierarchy information).
https://www.internalfb.com/intern/aibench/details/617154236292985

Reviewed By: raziel

Differential Revision: D30680354

fbshipit-source-id: b6ba0d59c510c13d13d9935b1d8051cc82ffa4e9
2021-09-01 13:29:35 -07:00
Kimish Patel
67cb131458 Revert D30327514: [Pytorch lite predictor] Use KinetoEdgeCPUProfiler for operator profiling.
Test Plan: revert-hammer

Differential Revision:
D30327514 (bc9277dca3)

Original commit changeset: 3bb2f2daaaed

fbshipit-source-id: 0b2aa7c57d08de77c9aaa75e546a7d0938610f64
2021-08-31 08:30:36 -07:00
Kimish Patel
bc9277dca3 [Pytorch lite predictor] Use KinetoEdgeCPUProfiler for operator profiling. (#63367)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63367

This diff changes the way operator profiling is done in lite predictor
benchmarking binary.
Instead of using custom callbacks it uses KinetoEdgeCPUProfiler to profile
events and then generate operator level metric from it.
Since KinetoEvents do not contain cpu clock time, now we report only wallclock
time.
This unifies various profiling effort that we have for benchmarking purpose. In
production we will still use observer based mechanism, but the advantage of
using kineto profiler is that we get few other things for free, such as:
- chrome trace generation.
- operator level memory profiling (to be added)
- flop counts (to be added)

Furthermore possible we can use python post processing script to parse chrome
trace and generate output similar to torch.profiler. (To be done)

Test Plan:
aibench run
Model without debug info:
https://www.internalfb.com/intern/aibench/details/219598441154763
Model with debug info and `--print_module_info true` (see Operator summary has now module hierarchy information).
https://www.internalfb.com/intern/aibench/details/617154236292985

Reviewed By: raziel

Differential Revision: D30327514

fbshipit-source-id: 3bb2f2daaaedfb04bd6f5d9c91292783f9c4344f
2021-08-30 20:54:51 -07:00
Chen Lai
6bb68ba507 Fix interpreter debug logging message (#63499)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63499

https://github.com/pytorch/pytorch/pull/62418 combine the instruction and debug handle. This change fix the debugging message.
ghstack-source-id: 136184053

Test Plan: Uncomment and it works

Reviewed By: kimishpatel, raziel

Differential Revision: D30390699

fbshipit-source-id: e32b7b297ad3b7d8bffebd025d15519083a244c4
2021-08-19 02:14:13 -07:00
Kimish Patel
38c185189c [Pytorch Edge] Enable kineto profiler on mobile via EdgeKinetoProfiler (#62419)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62419

This diff adds support for cpu only kineto profiler on mobile. Thus
enabling chrome trace generation on mobile. This bring cpp API for
mobile profiling on part with Torchscript.
This is done via:
1. Utilizating debug handle annotations in KinetoEvent.
2. Adding post processing capability, via callbacks, to
KinetoThreadLocalState
3. Creating new RAII stype profiler, KinetoEdgeCPUProfiler, which can be
used in surrounding scope of model execution. This will write chrome
trace to the location specified in profiler constructor.

Test Plan:
MobileProfiler.ModuleHierarchy

Imported from OSS

Reviewed By: raziel

Differential Revision: D29993660

fbshipit-source-id: 0b44f52f9e9c5f5aff81ebbd9273c254c3c03299
2021-08-13 21:40:19 -07:00
Kimish Patel
77a6436cac [Pytorch Mobile] Combing instructions and debug hanles in single struct (#62418)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62418

Debug handles have one to one correspondence with instruction, so just
combine them in one.

Test Plan:
CI

Imported from OSS

Reviewed By: raziel

Differential Revision: D29993661

fbshipit-source-id: 125c7163174cf66624dd95f110fdc8208fea8a07
2021-08-13 21:40:17 -07:00
Richard Barnes
b5867a1b34 irange-ify 7 (#62117)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62117

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D29879640

fbshipit-source-id: 189578a57301747a3421742e145bbcdf2ad75c49
2021-07-28 13:30:39 -07:00
Nikita Shulga
a9b0a921d5 Disable avoid-non-const-global-variables lint check (#62008)
Summary:
As GoogleTest `TEST` macro is non-compliant with it as well as `DEFINE_DISPATCH`

All changes but the ones to `.clang-tidy` are generated using following script:
```
for i in `find . -type f -iname "*.c*" -or -iname "*.h"|xargs grep cppcoreguidelines-avoid-non-const-global-variables|cut -f1 -d:|sort|uniq`;  do sed -i "/\/\/ NOLINTNEXTLINE(cppcoreguidelines-avoid-non-const-global-variables)/d" $i; done
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62008

Reviewed By: driazati, r-barnes

Differential Revision: D29838584

Pulled By: malfet

fbshipit-source-id: 1b2f8602c945bd4ce50a9bfdd204755556e31d13
2021-07-22 18:04:40 -07:00
Kimish Patel
d6d726f781 [Pytorch Backend delegation] Add api for backend lowering to query debug (#55462)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55462

handles and symbolicate exception callstack thrown from backend.

Objective of this diff is to achieve improve error reporting when
exceptions are raised from lowered backend. We would effectively like to
get the same model level stack trace that you would get without having
lowered some module to backend.

For example:
```
class AA(nn.Module):
  def forward(self, x, y):
    return x + y

class A(nn.Module):
  def __init__(...):
    self.AA0 = AA()
  def forward(self, x, y):
    return self.AA0.forward(x, y) + 3

class B(nn.Module):
  def forward(self, x):
    return x + 2

class C(nn.Module):
  def __init__(...):
    self.A0 = A()
    self.B0 = B()
  def forward(self, x, y):
    return self.A0.forward(x, y) + self.B0.forward(x)
```
If the we then do C().forward(torch.rand((2,3)), torch.rand(14,2))) we
will likely see error stack like:
```
C++ exception with description "The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
  File "<string>", line 3, in forward

    def forward(self, x, y):
      return self.A0.forward(x, y) + self.B0.forward(x)
             ~~~~~~~~~~~~~~~ <--- HERE

  File "<string>", line 3, in forward

    def forward(self, x, y):
      return self.AA0.forward(x, y) + 3
             ~~~~~~~~~~~~~~~~ <--- HERE

  File "<string>", line 3, in forward

    def forward(self, x, y):
      return x + y
             ~~~~~ <--- HERE
```

We would like to see the same error stack if we lowered C.A0 to some
backend.

With this diff we get something like:
```
  Module hierarchy:top(C).A0(backend_with_compiler_demoLoweredModule).AA0(AA)
Traceback of TorchScript (most recent call last):
  File "<string>", line 3, in FunctionName_UNKNOWN

    def forward(self, x, y):
      return self.A0.forward(x, y) + self.B0.forward(x)
             ~~~~~~~~~~~~~~~ <--- HERE

  File "<string>", line 5, in FunctionName_UNKNOWN
                typed_inputs: List[Any] = [x, y, ]
                if self.__backend.is_available() :
                  _0, = self.__backend.execute(self.__handles["forward"], typed_inputs)
                        ~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
                  assert isinstance(_0, Tensor)
                  return _0
  File "<string>", line 3, in FunctionName_UNKNOWN

    def forward(self, x, y):
      return self.AA0.forward(x, y) + 3
             ~~~~~~~~~~~~~~~~ <--- HERE

  File "<string>", line 3, in FunctionName_UNKNOWN

    def forward(self, x, y):
      return x + y
             ~~~~~ <--- HERE
```
This is achieved in 3 parts:
Part 1:
A. BackendDebugInfoRecorder:
   During backend lowering, in `to_backend`, before calling the preprocess
   function corresponding to the backend. This will facilitate recording of
   debug info (such as source range + inlined callstack) for the lowered module.
B. Instantiate WithBackendDebugInfoRecorder with BackendDebugInfoRecorder.
   This initializes thread local pointer to BackendDebugInfoRecorder.
C. generate_debug_handles:
   In preprocess function, the backend will call generate_debug_handles
   for each method being lowered separately. generate_debug_handles
   takes `Graph` of the method being lowered and returns a map
   of Node*-to-debug_handles. Backend is responsible for storing debug
   handles appropriately so as to raise exception (and later profiling)
   using debug handles when the exception being raised corresponds to
   particular Node that was lowered.
   Inside generate_debug_handles, we will query the current
   BackendDebugHandleInfoRecorder, that is issuing debug handles. This debug
   handle manager will issue debug handles as well as record
   debug_handles-to-<source range, inlined callstack> map.
D. Back in `to_backend`, once the preprocess function is has finished
   lowering the module, we will call `stopRecord` on
   BackendDebugInfoRecorder. This will return the debug info map. This
   debug info is then stored inside the lowered module.

Part 2:
Serialization:
During serialization for bytecode (lite interpreter), we will do two
things:
1. Extract all the source ranges that are contained inside
debug_handles-to-<source range, inlined callstack> map for lowered
module. This will be source range corresponding to debug handles,
including what is there is inlined callstack. Since we replaced original
module with lowered module, we wont be serializing code for the original
module and thus no source range. That is why the source range will have
to be stored separately. We will lump all the source ranges for all the
lowered modules in one single debug_pkl file.
2. Then we will serialize debug_handles-to-<source range, inlined
callstack> map.

Now during deserialization we will be able to reconstruct
debug_handles-to-<source range, inlined callstack> map. Given all
debug_handles are unique we would not need any module information.

Test Plan:
Tests are added in test_backend.cpp

Tests are added in test_backend.cpp

Imported from OSS

Differential Revision:
D27621330
D27621330

Reviewed By: raziel

Pulled By: kimishpatel

fbshipit-source-id: 0650ec68cda0df0a945864658cab226a97ba1890
2021-05-22 08:33:07 -07:00
Kimish Patel
f4a921600a [PyTorch, Mobile] Serialization format change for source range (#54284)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54284

In order to bring mobile deployment, via lite interpreter, on feature
parity with JIT, with respect model level debug information we must make
model level debug information available to mobile runtime.
At the moment, model level debug information is stored in SourceRange
which associates node's of graph to where the come from in original
python source code.
This information is serialized as part of debug_pkl and deserialized
when JIT loads the model and reads the model code.
On lite interpreter, we do not have access to all the functionality of
JIT and hence we cannot load model in the same way as JIT, by reading
code, constructing module hierarchy and graph corresponding module
methods etc. Instead in, lite interpreter, only bytecode corresonding to
the compiled graph, Code, is saved.
Thus in order to annotate OPs in the bytecode with equivalent
SourceRange information we do the following:
1. During model serialization, we create a unique tag for each source
range of the model.
2. Create a map of <SourceRange, tag>
3. During debug_pkl serialization we save tag along with SourceRange, on
top of byte offset.
4. During bytecode generation, the methods of the top module are
lowered. During this process methods are inlined. In the inlined graph,
when the node of a graph is lowered to bytecode, we query node's source
range and look it up against the map.
5. Resulting source range tag is serialized in module_debug_info.
6. During model deserialization, we read all the debug_pkl records in
the archieve and create a map of <tag, SourceRange>
7. This map can be used to find source code information.

During mobile runtime:
1. We read all the debug_pkl records and create <tag=debug_handle,
SourceRange> map.
   1.1 This map, MobileDebugInfo, is a member of mobile Module.
2. Interpreter catches appropriate exceptions and sets the thread local
debug handle and rethrows the exception.
3. In Function's run method we catch exception and query current debug
handle where the exception happened.
4. Query MobileDebugInfo with debug handle to retrieve source range and
augment error with source range info.

This information is still incomplete as it does not contain entire
callstack.

In the following diffs we will serialize InlinedCallStack directly.

Note that compilation is gated by SYMBOLICATE_MOBILE_DEBUG_HANDLE macro,
so that mobile builds can avoid building MobileDebugInfo, source range
and source range pickler/unpickler. Later we will add path where, if
building without debug support stack trace will contain only debug
handles. They can be symbolicated later.

Test Plan:
Ported bunch of source range tests from test_jit.py. Added on more test
in test_lite_interpreter.py

Imported from OSS

Reviewed By: raziel

Differential Revision: D27174722

fbshipit-source-id: a7b7c6088ce16dec37e823c7fefa4f0b61047e12
2021-05-04 09:19:27 -07:00
Scott Wolchok
b87d3fa432 [PyTorch][jit] Don't allow create() on singleton types (#56807)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56807

If I understand correctly, there's no reason to create your own instance of these global singleton types.
ghstack-source-id: 127312270

Test Plan: CI

Reviewed By: SplitInfinity

Differential Revision: D27973447

fbshipit-source-id: f12df69d185f1baaa45f2ac6eac70570a7a65912
2021-04-30 10:28:50 -07:00
Nikita Shulga
4cb534f92e Make PyTorch code-base clang-tidy compliant (#56892)
Summary:
This is an automatic change generated by the following script:
```
#!/usr/bin/env python3
from subprocess import check_output, check_call
import os

def get_compiled_files_list():
    import json
    with open("build/compile_commands.json") as f:
        data = json.load(f)
    files = [os.path.relpath(node['file']) for node in data]
    for idx, fname in enumerate(files):
        if fname.startswith('build/') and fname.endswith('.DEFAULT.cpp'):
            files[idx] = fname[len('build/'):-len('.DEFAULT.cpp')]
    return files

def run_clang_tidy(fname):
    check_call(["python3", "tools/clang_tidy.py", "-c", "build", "-x", fname,"-s"])
    changes = check_output(["git", "ls-files", "-m"])
    if len(changes) == 0:
        return
    check_call(["git", "commit","--all", "-m", f"NOLINT stubs for {fname}"])

def main():
    git_files = check_output(["git", "ls-files"]).decode("ascii").split("\n")
    compiled_files = get_compiled_files_list()
    for idx, fname in enumerate(git_files):
        if fname not in compiled_files:
            continue
        if fname.startswith("caffe2/contrib/aten/"):
            continue
        print(f"[{idx}/{len(git_files)}] Processing {fname}")
        run_clang_tidy(fname)

if __name__ == "__main__":
    main()
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56892

Reviewed By: H-Huang

Differential Revision: D27991944

Pulled By: malfet

fbshipit-source-id: 5415e1eb2c1b34319a4f03024bfaa087007d7179
2021-04-28 14:10:25 -07:00
Scott Wolchok
3959d393b8 [PyTorch][JIT] Less shared_ptr use in dictConstruct (#54110)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54110

dictConstruct doesn't need to make its caller have a `shared_ptr<DictType>`. It also doesn't need to do extra `shared_ptr` copies into the `key_type` and `value_type` locals.
ghstack-source-id: 124150642

Test Plan: fitsships

Reviewed By: ezyang

Differential Revision: D27101782

fbshipit-source-id: 3c632ad9d8f1bd7bdf37f517a86aca27bd41548a
2021-03-22 18:31:27 -07:00
Scott Wolchok
4a24c552cc [PyTorch] Fix string copy in WARN path for both interpreters (#54076)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54076

If we don't constrain ourselves to use `torch::jit::pop`, we can avoid copying a string or moving IValues around.
ghstack-source-id: 124040891

Test Plan:
existing tests

spot-checked regular interpreter assembly; seems better

Reviewed By: dhruvbird, walterddr

Differential Revision: D27087204

fbshipit-source-id: 7cf355dbcec31409bdb37afa09d7df85cf2a7e4b
2021-03-17 08:44:08 -07:00
Scott Wolchok
8f1af02f35 [PyTorch][mobile] Audit mobile interpreter for extra copies (#54031)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54031

Similar to D27060762 (665d5e2a4f), caught some probably-unintended copies.
ghstack-source-id: 124040889

Test Plan: CI?

Reviewed By: walterddr, iseeyuan

Differential Revision: D27061818

fbshipit-source-id: f4a77cb5c21cd3ebce7b7e82764e4361467bab91
2021-03-17 08:42:34 -07:00
Dhruv Matani
17495e0318 [PyTorch Mobile] Fix case when error messages are stripped, and stack value isn't popped off in lite-interpreter (#53201)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53201

This resulted in [S22350](https://www.internalfb.com/intern/sevmanager/view/s/223540), which caused truoble on Android.

1. The Python has a call to `warnings.warn()`, which resulted in code generated to emit the `WARN` instruction on lite-interpreter.
2. The code for handling that instruction/op-code popped off the value in a call to the `TORCH_WARN()` *macro*.
3. This macro conditionally compiled out evaluation of the arguments if `STRIP_ERROR_MESSAGES` was defined, which resulted in the stack not getting popped, and the lite-interpreter returning the last pushed value on to the stack.

I've attempted to re-produce it using this python code: {P243842428}
ghstack-source-id: 122990001

(Note: this ignores all push blocking failures!)

Test Plan:
Created a new unit test to re-produce the failure in the test. Was able to do so locally using the following command:

```
buck test -c pt.strip_error_messages=1 //xplat/caffe2:test_s223540
```

However, since `pt.strip_error_messages=0` for dev and continuous builds, I have had to check in a separate contbuild config to try and trigger this failure on contbuild.

Reviewed By: iseeyuan

Differential Revision: D26765662

fbshipit-source-id: 63c3c96d84ce6a9e5471f13d80165aa3718be9a2
2021-03-04 19:10:07 -08:00
Martin Yuan
b5ae8e69a7 [Lite Interpreter] Support features from to_backend (#52870)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52870

Add the missing parts to support to_backend modules by lite interpreter.
1. Add ISINSTANCE instruction support, which is used in to_backend for output type check.
2. Bypass lite interpreter's type parser by checking the qualified name. If it starts with "torch.jit", use the same type resolver as nn module (starting with "__torch__").

Tests
Mobile module is serialized and loaded in ```BackendTest.TestCompiler```. The results are compared to those from original torchscript module.

Test Plan: Imported from OSS

Reviewed By: raziel

Differential Revision: D26715351

Pulled By: iseeyuan

fbshipit-source-id: ad9d74ee81c6aa692ab9e5dd7a9003bae5d4f01f
2021-03-01 17:56:01 -08:00
Martin Yuan
23c50a4a50 [PyTorch Mobile] Support torchbind custom classes in lite interpreter (#51432)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51432

ghstack-source-id: 120976584

torchbind is a convenient way to include custom class to both python and torchscript. CREATE_OBJECT is used to create an object of custom class.

CREATE_OBJECT was not supported by lite interpreter. The major reason was that for custom class directly defined in Python, there's no language parser in lite interpreter. It's still the case. However, for torchbind classes that are defined in C++, a python/torchscript parser is not needed.

This diff is to support the case of torchbind custom classes.
1. The class type can be resolved at import level.
2. If the class is not the supported torchbind class, an error message is provided at export stage. Workaround is also suggested.
3. Unit tests. C++: ```LiteInterpreterTest::BuiltinClass``` is added as an end-to-end test on supported class. Python: ```test_unsupported_createobject``` is changed to ```test_unsupported_classtype``` to test unsupported classes.

Test Plan: CI

Reviewed By: raziel

Differential Revision: D26168913

fbshipit-source-id: 74e8b6a12682ad8e9c39afdfd2b605c5f8e65427
2021-02-03 21:57:19 -08:00
Andres Suarez
8530c65e25 [codemod][fbcode/caffe2] Apply clang-format update fixes
Test Plan: Sandcastle and visual inspection.

Reviewed By: igorsugak

Differential Revision: D25849205

fbshipit-source-id: ef664c1ad4b3ee92d5c020a5511b4ef9837a09a0
2021-01-09 14:37:36 -08:00
Scott Wolchok
ef1fa547ba [PyTorch] Use expectRef() when calling listConstruct (#50062)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50062

Avoids creating an extra shared_ptr.
ghstack-source-id: 119325645

Test Plan: CI

Reviewed By: ezyang

Differential Revision: D25766631

fbshipit-source-id: f2ab8349dfea325054820fa2c1055180c740574e
2021-01-06 18:13:38 -08:00
Kimish Patel
4d26941a9b Fix lite interpreter record function issue. (#47457)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47457

This fixes two issues.
1. lite interpreter record_function is intended to be used only for root op
profiling. At the moment if RECORD_FUNCTION is enabled via Dispatcher then it
logs not just root ops but all ops.
2. Because interpreter sets op index that later gets picked up elsewhere
(decoupled design), op index that is set in lite interpreter ends up getting
used by all the record function calls not just root op. Thus we dont really get
correct per op profiling. This diff also fixes this issue.

Reviewed By: ilia-cher

Differential Revision: D24763689

fbshipit-source-id: 6c1f8bcaec9fb5ebacb2743a5dcf7090ceb176b9
2020-12-02 11:24:45 -08:00
Scott Wolchok
bef460a803 [PyTorch] Return raw ptr from ThreadLocalDebugInfo::get() (#47796)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47796

`ThreadLocalDebugInfo::get()` is a hot function. For example, it is called by `DefaultCPUAllocator::allocate()`. Most callers do not even bother to keep the returned `shared_ptr` around, proving that they have no lifetime issues currently. For the rest, it appears that the only way that the returned pointer could become invalid is if they then called a function that swapped out `ThreadLocalDebugInfo` using `ThreadLocalStateGuard`. There are very few such paths, and it doesn't look like any current callers of `ThreadLocalDebugInfo::get()` needed a `shared_ptr` at all.
ghstack-source-id: 116979577

Test Plan:
1) reviewers to double-check audit of safety
2) run framework overhead benchmarks

Reviewed By: dzhulgakov

Differential Revision: D24902978

fbshipit-source-id: d684737cc2568534cac7cd3fb8d623b971c2fd28
2020-11-18 20:37:17 -08:00
Linbin Yu
b28422d444 add overload name for str cmp (#39607)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39607

add overload name for strcmp macro to prevent duplicated op names in lite interpreter

also reformatted some other files

Test Plan:
verified these op schema are changed

```
-aten::eq(str a, str b) -> (bool)
+aten::eq.str(str a, str b) -> (bool)

-aten::ne(str a, str b) -> (bool)
+aten::ne.str(str a, str b) -> (bool)

-aten::lt(str a, str b) -> (bool)
+aten::lt.str(str a, str b) -> (bool)

-aten::gt(str a, str b) -> (bool)
+aten::gt.str(str a, str b) -> (bool)

-aten::le(str a, str b) -> (bool)
+aten::le.str(str a, str b) -> (bool)

-aten::ge(str a, str b) -> (bool)
+aten::ge.str(str a, str b) -> (bool)
```

Reviewed By: iseeyuan

Differential Revision: D21913049

fbshipit-source-id: 518db068c8c5b0efd19223f0bd94fc3351335dc4
2020-06-06 23:21:35 -07:00
Xingying Cheng
b08a4aaf3b [PyTorch] Fix operator perf observer index issue.
Summary: Fix operator perf observer index issue.

Test Plan:
make sure that the operator index is populated correctly, ran benchmarking for pytext_mobile_inference, see result:
https://www.internalfb.com/intern/aibench/details/598900068317693

Reviewed By: linbinyu

Differential Revision: D21779222

fbshipit-source-id: 0fc3561d83d10cfabd73e1e6b6ee240ce0bafd80
2020-05-28 21:52:24 -07:00
Xingying Cheng
262f70c986 [PyTorch] Remove module and operator observer macros. (#38489)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38489

Remove module and operator observer macros.
ghstack-source-id: 104290763

Test Plan:
a. Verify that QPL is being sent while testing FB4A BI Cloaking:

{F236982877}

b. Verify that AI Benchmark is working on both module and operator level:
https://our.intern.facebook.com/intern/aibench/details/808056762618979

c. Verify that macosx segmentation effect by running buck run xplat/arfx/tracking/segmentation/tools:person_segmentation_demoAppleMac#macosx-x86_64:

{F236982853}

Reviewed By: ljk53

Differential Revision: D21540838

fbshipit-source-id: 516f84ef5673d4ceed38ae152440a5cbacc6ddaa
2020-05-18 13:28:01 -07:00
Ilia Cherniavskii
43dd8760d7 Move ThreadLocalDebugInfo to c10 (#37774)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37774

Move ThreadLocalDebugInfo from ATen to C10

Test Plan: Imported from OSS

Differential Revision: D21384249

Pulled By: ilia-cher

fbshipit-source-id: f9b5089a868f84a2ee013695a481fcc883d3c6b2
2020-05-11 19:27:41 -07:00
Ilia Cherniavskii
b4946b96c6 Don't use Profiler key in lite interpreter (#37962)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37962

Temporarily re-enable RecordFunction in lite interpreter when profiler key is not set,
this allows the profiler to work without profiled wrappers in the build

Test Plan: CI

Reviewed By: smessmer, linbinyu

Differential Revision: D21409120

fbshipit-source-id: 6f0311c8eb55537a03b8bdac69def18a496ec672
2020-05-08 10:47:10 -07:00
Ilia Cherniavskii
2d708cefcc Move RecordFunction into ATen (#37548)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37548

Moving RecordFunction from torch::autograd::profiler into at namespace

Test Plan:
CI

Imported from OSS

Differential Revision: D21315852

fbshipit-source-id: 4a4dbabf116c162f9aef0da8606590ec3f3847aa
2020-05-07 14:52:39 -07:00
Michael Suo
b53e6bfd49 [jit] normalize getMethod (#37472)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37472

Our convention is for `findX` to return an optional version and `getX`
to assert that the X is there. Fix up `getMethod` to be consistent with
this convention.

Test Plan: Imported from OSS

Differential Revision: D21297543

Pulled By: suo

fbshipit-source-id: b40f56231cc8183e61bbb01fe5c0c113bcb6464d
2020-05-06 15:22:25 -07:00
Xiang Gao
3880f14b64 Canonicalize includes in torch, and add tests for it (#36303)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/36303

Test Plan: Imported from OSS

Differential Revision: D20943003

Pulled By: ezyang

fbshipit-source-id: 81fcbaccc1a7eec422bd8347d196bb66a5467884
2020-04-23 08:09:21 -07:00
Martin Yuan
f999d600d0 Fix the typo in operator name string (#36296)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36296

When there's no overload name, the operator name string should be "name", instead of "name.".

Test Plan: Imported from OSS

Differential Revision: D20966759

Pulled By: iseeyuan

fbshipit-source-id: b4b31923c7ec5cdca8ac919bd6a84ba51afb6cd1
2020-04-10 12:56:16 -07:00
Martin Yuan
82087ee7f6 Add DICT_CONSTRUCT and NAMED_TUPLE_CONSTRUCT to lite interpreter (#36015)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/36015

Test Plan: Imported from OSS

Reviewed By: linbinyu

Differential Revision: D20853995

Pulled By: iseeyuan

fbshipit-source-id: 153f76d223f9ffc71e2259b741a7e5d78ae63f22
2020-04-04 09:52:58 -07:00
Ilia Cherniavskii
bc6bd0bb1a Debug Information Guard
Summary: This diff fixes the issues with current handling of debug information passed along the execution of the model. (For example, it is possible that multiple calls to the debug guard may override each other)

Test Plan: CI test/cpp/jit

Reviewed By: dzhulgakov

Differential Revision: D20602775

fbshipit-source-id: 4683957954028af81a1a0f1f12b243650230c9bb
2020-04-01 01:55:29 -07:00
Meghan Lele
6384c2d81b [JIT] clang-format JIT code (#35115)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35115

This commit runs the newly added tools/clang_format.py on the JIT
codebase and includes all of the formatting changes thus produced.

Testing:
Ran the script, CI.

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D20568523

Pulled By: SplitInfinity

fbshipit-source-id: e09bdb982ccf090eecfb7c7b461b8d0681eef82b
2020-03-26 11:24:51 -07:00
Martin Yuan
b9fbec96e6 Support LIST_UNPACK and TUPLE_SLICE in lite interpreter. (#35241)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/35241

Test Plan: Imported from OSS

Differential Revision: D20609439

Pulled By: iseeyuan

fbshipit-source-id: 4f352b8641c203aaf9f2204e4080876bd4d47b0c
2020-03-23 18:21:26 -07:00
James Reed
ab76a8206f [JIT][mobile] Support built-in Function call in lite interpreter (#34676)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34676

Test Plan: Imported from OSS

Differential Revision: D20427938

Pulled By: jamesr66a

fbshipit-source-id: 79eebfa858776f26da55ffd49d3f78fa7ae0df9b
2020-03-13 18:24:18 -07:00
Hong Xu
027d7f7ba5 Delete AT_WARN and replace all AT_WARN with TORCH_WARN (#34623)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34623

The bandaid of "AT_WARN" keeps introducing new warnings. Let's get rid
of it entirely.

Close #34502

Test Plan: Imported from OSS

Differential Revision: D20420112

Pulled By: albanD

fbshipit-source-id: 7160c113cb4deb2d2f50a375356f423fe5e86f50
2020-03-13 12:27:22 -07:00
Martin Yuan
01edb7450f [Lite Trainer] Add necessary registrations for MNIST model (#33717)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33717

Because of the special treatment of operator names for lite interpreter, all the operators used in lite interpreter are still prepended by "_". Add the necessary registrations for MNIST model. All the ops with autograd capability are included in torch_mobile_train. After rebase the selective build from D19649074 can be utilized to strip the unused ops.

Note that this diff is for feasibility test. The training accuracy are not covered in the test.
ghstack-source-id: 97780066

Test Plan:
```
buck run xplat/caffe2/fb/lite_trainer:lite_trainer -c pt.disable_gen_tracing=1 -c pt.static_dispatch=0 -- --model=/path/MnistModel.bc
```
{F227898221}

Reviewed By: dreiss

Differential Revision: D19743201

fbshipit-source-id: cacadd76f3729faa0018d147a69466bbf54312fd
2020-03-06 15:49:03 -08:00
Martin Yuan
ccf4d69b75 [Lite Interpreter] Enable __setstate__ (#33294)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33294

1. Serialize bytecode of __setstate__ and run it when loading the model.
2. One use case is quantization. To test this use case a few operators are registered temporarily for lite interpreter. The "_" prefix registration will be removed when the operators are all migrated to mobile.

Test Plan: Imported from OSS

Differential Revision: D20162898

Pulled By: iseeyuan

fbshipit-source-id: 7a3180807bf38fbce594d86993896861f12bb58c
2020-03-05 15:24:21 -08:00
Michael Suo
dbe850af5b [jit] do the code reorg (#33851)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33851

Rationale and context described in #33828.

Script to reproduce the move:
https://gist.github.com/suo/16cbefaaeb67ca5a7c6caffd49b7f6e9
ghstack-source-id: 99079645

Test Plan: Make sure CI passes

Reviewed By: jamesr66a

Differential Revision: D20133869

fbshipit-source-id: 390e9241a9c85366d9005c492ac31f10aa96488e
2020-02-27 13:02:51 -08:00
Martin Yuan
758ad516f3 [Lite interpreter] Pass shared_ptr properly (#33667)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33667

Pass shared_ptr properly according to C++ guidances. Thank kimishpatel for pointing it out.

Test Plan: Imported from OSS

Differential Revision: D20111001

Pulled By: iseeyuan

fbshipit-source-id: 213a0f950a7f3b9199d789dc0155911f6102d77a
2020-02-25 21:40:05 -08:00
Martin Yuan
5782758b54 Add instructions and operators for new bytecode format of PyText model (#33555)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33555

A quick fix for the PyText model (in internal production) on the new bytecode format.

Test Plan: Imported from OSS

Differential Revision: D20008266

Pulled By: iseeyuan

fbshipit-source-id: 1916bd0bf41093898713c567c7f6fa546b9ea440
2020-02-20 15:05:37 -08:00
Zachary DeVito
7f2c25b6fa Move special ops into interpreter (#32889)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32889

Common primitive ops that have special inputs make it very hard to
serialize the bytecode for mobile because information about how the
op behaves is hidden in the Node*. This changes how we handle the following
ops so that they are encoded as their own interpreter bytecodes.

```
    USES NODE: prim::TupleUnpack(...) -> (...)
    USES NODE: prim::TupleSlice(...) -> (...)
    USES NODE: prim::TupleConstruct(...) -> (...)
    USES NODE: prim::ListUnpack(...) -> (...)
    USES NODE: prim::ListConstruct(...) -> (...)
    USES NODE: prim::DictConstruct(...) -> (...)
    USES NODE: prim::Constant() -> (...)
    USES NODE: prim::isinstance(...) -> (...)
    USES NODE: prim::CreateObject(...) -> (...)
    USES NODE: prim::fork(...) -> (...)
    USES NODE: aten::warn(str message, *, int stacklevel=2) -> () # need stack level information, so ideally in interpreter so it can look at the stack
```

This leaves a state where the _only_ remaining Node*-consuming builtins
are things that are only introduced during JIT optimization and will
not appear in mobile code.

Serialization of bytecode can now be made to directly write the CodeImpl
object without modification.

Test Plan: Imported from OSS

Differential Revision: D19673157

Pulled By: zdevito

fbshipit-source-id: 7b8c633d38a4c783b250fbdb222705e71a83ad26
2020-02-18 15:07:01 -08:00
Xingying Cheng
4493b10500 [PyTorch] Gate out mobile operator logging observer.
Summary: Introduce separate gating for mobile operator logging observer.

Reviewed By: ljk53

Differential Revision: D19665993

fbshipit-source-id: b81a228c55110a02edb8c2b6f9fd02e750b2ad69
2020-01-31 17:25:53 -08:00
Zachary DeVito
14593f077f remove list specialization from ivalue (#30734)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30734

What are specialized lists?

The IValues that hold List[int], List[Tensor], and List[AnythingElse] are different C++ types.
e.g. List[int] has a std::vector<int> while List[AnythingElse] holds a std::vector<IValue>.

Why do we have specialized lists?

When we first created the JIT we needed to bind the ATen C++ API which has std::vector<int>,
std::vector<Tensor> as inputs. The easiest way to match this API was to make our IValues contain
these same types. Conversion was just unwrapping the IValue, very easy and cheap.

What is the problem with specialized lists?

We end up with significant special cases through the compiler. Other types like Dict are not
specialized. So in the Pickler, for instance, there is a single piece of logic to handle
their serialization. For Lists, we end up with multiple cases. Furthermore, it doesn't
match Python, leading to problems along translation boundaries. Our pickle serialization
is slightly different than python, so it is harder to load objects from our IValue serialization
as Python values.

They also make it harder to provide an easy-to-use user API. We'd like to match pybind11 for C++
bindings to TorchScript. This would entail having a single torch::List class (untemplated)
that can be used to construct inputs. This is made much harder if the underlying ivalue needs
to be different depending on the type inside the list. The ideal case would be to have a constructor like

```
template<typename T>
List(std::vector<T> foo);
```

It would then set up the type tags correctly based on type T, without the need for passing tags.

Do specialized lists improve perf?

Not in a way we have been able to measure. Our major concern initially was having to translate
a std::vector<IValue> to std::vector<int> to call ATen functions. This was especially a concern
for aten::_convolution which takes a number of mostly-constant lists of integers. However,
when we measure the effect of actually having to do this conversion for an aten::_convolution,
it does not take measurable time (benchmark results below).
This is true even if you use a trivial convolution (e.g. 1x1x1), and comment out the actual convolution code.

What are the issues removing them?

This PR removes list specialization but keeps the serialization format, and IValue APIs almost exactly
the same. The only visible change is that toTensorListRef and family have turned into toTensorVector
because they now return by value a copy of the list as a vector.

Further PRs can then clean up the complexity issues that arose from speclization. This will likely
involve removing the isTensorList/isIntList functions, and refactoring the code that used them to
work generically. At some point we will also change serialization to no longer write specialized
lists in the pickle binary. This is forward incompatible, so will go in its own PR.

Benchmark:
```
import torch

import torch.nn as nn
import torch.nn.functional as F
import time

class MnistNet(nn.Module):
    def __init__(self):
        super(MnistNet, self).__init__()
        self.conv1 = nn.Conv2d(1, 1, kernel_size=1)
        self.conv2 = nn.Conv2d(1, 1, kernel_size=1)

    def forward(self, x):
        for i in range(10):
            x = F.relu(self.conv1(x))
            x = F.relu(self.conv2(x))
        return x

model = MnistNet()
x = torch.rand(1, 1, 1, 1)
r = torch.jit.trace(model, x )
r(x)
r(x)
r(x)
r(x)
print(torch.jit.last_executed_optimized_graph())

while True:
    b = time.time()
    for i in range(100):
        r(x)
    e = time.time()
    print(e - b)
```

Results (no observable difference):

```
Before (actual conv)
0.13251137733459473
0.13260436058044434
0.13276338577270508
0.1327497959136963
0.13250041007995605
0.13270330429077148
0.13290190696716309
0.13265132904052734
0.13274288177490234
0.1326758861541748
0.13253355026245117
0.13254785537719727
0.13260746002197266
0.13285017013549805
0.13264012336730957
0.132490873336792
0.13280034065246582
0.13243484497070312
0.1325232982635498
0.1326127052307129
0.13264131546020508
0.13274383544921875
0.13298296928405762
0.1326909065246582
-------------------
After (actual conv)
0.13127517700195312
0.13150334358215332
0.13092470169067383
0.13102364540100098
0.13134360313415527
0.13155555725097656
0.13314104080200195
0.13151955604553223
0.13160037994384766
0.1315293312072754
0.13137340545654297
0.13148093223571777
0.131455659866333
0.1327371597290039
0.13134026527404785
0.13152337074279785
0.13151192665100098
0.13165974617004395
0.13403725624084473
0.13251852989196777
0.13135504722595215
0.1315624713897705
0.1317615509033203
0.1314380168914795
0.13157200813293457
--------------------

The following replace the convolution operator with a no-op, to show
that even if the conv op was made faster, then we still would not see
a difference:

Before (fake conv)
0.0069539546966552734
0.0069522857666015625
0.007120847702026367
0.007344722747802734
0.007689952850341797
0.007932662963867188
0.00761723518371582
0.007501363754272461
0.007532835006713867
0.007141828536987305
0.007174253463745117
0.007114410400390625
0.007071495056152344
------------------
After (fake conv)
0.007458209991455078
0.007337093353271484
0.007268190383911133
0.007313251495361328
0.007306575775146484
0.007468700408935547
0.0073091983795166016
0.007308483123779297
0.007538318634033203
0.007356882095336914
0.007464170455932617
0.007372140884399414
```

Test Plan: Imported from OSS

Differential Revision: D18814702

Pulled By: zdevito

fbshipit-source-id: 0371c73b63068fdc12f24b801371ea90f23531a6
2020-01-12 18:28:25 -08:00
Xingying Cheng
78254eab45 Add mobile operator observer for qpl logging.
Summary: Add mobile operator observer to measure performance of each operator run, the result will also log into QPL event: [MOBILE_OPERATOR_STATS ](https://fburl.com/quicklog/8773a00a).

Test Plan:
Run pytext model through BI cloaking flow on lite-interpreter and verify logs are sent:
1. buck install -r fb4a
2. Go to internal setting and find MobileConfig, search for android_bi_infra_cloaking_iab_models and set the following params:
a. sample_rate: 1.0
b. enabled: true
c. use_bytedoc_pytorch_model: true
d. use_bytedoc_caffe2_model: false
e. use_full_jit: false
3. Go back to new feed and scroll down until find an ads which will direct you to offsite webpage;
4. Click on the ads, wait for the offsite ads loads;
5. Click back to news feed;
6. Go to scuba table: https://fburl.com/scuba/er7t4g9u and see all the operator runs have been logged:

{F223250762}

Reviewed By: ljk53

Differential Revision: D18131224

fbshipit-source-id: 23e2f6e2a9851c04b29511b45dc53f3cce03e8a0
2019-12-06 11:55:32 -08:00
Martin Yuan
6980cb2519 Add overload name to JIT prim operators, version 2 (#29960)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29960

Overload name is required in mobile operators with the same name but different schema. Since it's not used in JIT, it's safe to add overload names for JIT operators.

Test Plan: Imported from OSS

Differential Revision: D18555484

fbshipit-source-id: b451379af24e255d8b0c61b964ae32fd1a64ed34
2019-11-16 23:59:07 -08:00
Zachary DeVito
a5b4d78c6d Revert D18499600: Add overload name to JIT prim operators.
Test Plan: revert-hammer

Differential Revision:
D18499600

Original commit changeset: a1b49e64c908

fbshipit-source-id: 73e27b72f53799c0133850d2352ae8cd8a82d87c
2019-11-15 18:36:17 -08:00