Commit Graph

89 Commits

Author SHA1 Message Date
Zhengxu Chen
d6b15bfcbd [jit][edge] Load interface methods to corresponding ClassTypes. (#65971)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65971

ghstack-source-id: 141842335

We should be able to load methods into their ClassTypes. Right now the mobile runtime only loads data members into ClassTypes, not methods. To support interface calls, we inject methods into their ClassTypes when the methods are loaded.
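For illustration, a minimal sketch of the idea using simplified stand-in types (MobileClassType, MobileMethod, and registerLoadedMethod below are hypothetical, not the actual torch::jit classes):

```
// Hypothetical, simplified stand-ins for the real torch::jit types: the change
// attaches loaded methods to the runtime ClassType so an interface call can be
// resolved by name on the receiver's type.
#include <string>
#include <unordered_map>
#include <vector>

struct MobileMethod {
  std::string name;
  // ... bytecode, schema, etc.
};

struct MobileClassType {
  std::string qualified_name;
  std::vector<std::string> attributes;  // data members: already loaded today
  std::unordered_map<std::string, const MobileMethod*> methods;  // added by this change

  void addMethod(const MobileMethod& m) { methods[m.name] = &m; }
  const MobileMethod* findMethod(const std::string& name) const {
    auto it = methods.find(name);
    return it == methods.end() ? nullptr : it->second;
  }
};

// During loading, after a method's bytecode is parsed, inject it into the
// ClassType it belongs to so interface-call dispatch can go through the type.
void registerLoadedMethod(MobileClassType& cls, const MobileMethod& m) {
  cls.addMethod(m);
}
```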

Test Plan: existing tests should all pass.

Reviewed By: qihqi

Differential Revision: D31326146

fbshipit-source-id: fb1dbea619910ef1f8fa26146da3ebab348fe902
2021-10-29 12:48:57 -07:00
Scott Wolchok
e88d1c4f10 [PyTorch] Add tuple inline storage (#64066)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64066

I noticed a bunch of time being spent heap-allocating Tuples
in the unpickler. 1-, 2-, and 3-element Tuples are apparently common
enough that they get their own bytecode instructions, so I decided to
try also giving them their own representation. We store up to 3
IValues inline in `Tuple` rather than doing a second heap allocation
for a `std::vector<IValue>`.
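A rough sketch of the small-size optimization being described, using a plain int payload for brevity (the real TupleElements stores IValues and avoids the fallback vector member entirely when the elements fit inline):

```
#include <cassert>
#include <cstddef>
#include <vector>

// Stores up to 3 elements inline; larger tuples fall back to a heap vector.
// Payload is int here for brevity; the real code stores IValues.
class SmallTupleElements {
 public:
  explicit SmallTupleElements(std::vector<int> elems) {
    if (elems.size() <= kInline) {
      size_ = elems.size();
      for (size_t i = 0; i < size_; ++i) inline_[i] = elems[i];
    } else {
      size_ = kHeap;
      heap_ = std::move(elems);
    }
  }
  size_t size() const { return size_ == kHeap ? heap_.size() : size_; }
  int operator[](size_t i) const {
    assert(i < size());
    return size_ == kHeap ? heap_[i] : inline_[i];
  }

 private:
  static constexpr size_t kInline = 3;
  static constexpr size_t kHeap = kInline + 1;  // sentinel meaning "use heap_"
  size_t size_;
  int inline_[kInline];
  std::vector<int> heap_;  // the real code avoids even this member when inline
};
```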
ghstack-source-id: 140695395

Test Plan:
Added automated tests for TupleElements.

Pixel 3 before: https://www.internalfb.com/intern/aibench/details/761596366576284
Pixel 3 after: https://www.internalfb.com/intern/aibench/details/591414145082422
We went from 347 ms to 302 ms.

Reviewed By: dhruvbird

Differential Revision: D30592622

fbshipit-source-id: 93625c54c9dca5f765ef6d5c191944179cb281a8
2021-10-15 12:16:51 -07:00
Mengwei Liu
ab25516054 [PyTorch] Remove unused function in import (#65865)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65865

`operator_str` is not used in `import.cpp`, and it is also defined in `parse_operators.cpp`, so this removes it from `import.cpp`.

Test Plan: CI passing

Reviewed By: iseeyuan

Differential Revision: D31293008

fbshipit-source-id: 1c857cbd63c57b8f79c1a068789fc8605605b642
2021-10-06 06:34:51 -07:00
Scott Wolchok
176d3c6fb4 [PyTorch] Fix many Tuple::elements() callsites (#64065)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64065

It is only safe to mutate Tuple elements if you are the sole owner
of the tuple. The most efficient way to do this, then, is
`std::move(*std::move(tupleIValue).toTuple()).elements()` (the
innermost move allows `IValue::toTuple()` to avoid a refcount bump and
the outermost move allows the element vector to be moved out of the
tuple), but many callsites write simply
`tupleIValue.toTuple().elements()`, which incurs many extra refcount
bumps.
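A self-contained illustration of why the nested moves matter, using tiny refcounted stand-ins instead of the real IValue/Tuple types (FakeTuple and FakeIValue are assumptions made for the example):

```
#include <cstdio>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Stand-in for c10::ivalue::Tuple: refcounted, with lvalue/rvalue overloads of
// elements() like the real class.
struct FakeTuple {
  std::vector<std::string> elems;
  const std::vector<std::string>& elements() const& { return elems; }
  std::vector<std::string> elements() && { return std::move(elems); }
};

// Stand-in for IValue::toTuple(): copies the shared_ptr (refcount bump) when
// called on an lvalue, steals it when called on an rvalue.
struct FakeIValue {
  std::shared_ptr<FakeTuple> tup;
  std::shared_ptr<FakeTuple> toTuple() const& { return tup; }         // +1 refcount
  std::shared_ptr<FakeTuple> toTuple() && { return std::move(tup); }  // no bump
};

int main() {
  FakeIValue iv{std::make_shared<FakeTuple>(FakeTuple{{"a", "b", "c"}})};

  // Slow pattern from the commit message: bumps the refcount and copies the
  // element vector.
  //   auto copied = iv.toTuple()->elements();

  // Fast pattern: both moves avoid the refcount bump and move the elements
  // out of the (sole-owner) tuple instead of copying them.
  auto moved = std::move(*std::move(iv).toTuple()).elements();
  std::printf("%zu elements moved out\n", moved.size());
}
```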

ghstack-source-id: 139468088

Test Plan: CI

Reviewed By: ezyang

Differential Revision: D30592621

fbshipit-source-id: e8312de866de09b9ea2a62e5128cbf403ee16f09
2021-10-01 11:36:05 -07:00
Scott Wolchok
38c77539e8 [PyTorch][Edge] Fix inefficiency in objLoaderMobile (#65710)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65710

No need to incur extra refcount bumps, and no need to use a stringstream for what are presumably string keys anyway.
ghstack-source-id: 139325445

Test Plan: CI, reviewers to confirm the keys are supposed to be strings

Reviewed By: dhruvbird

Differential Revision: D31215347

fbshipit-source-id: 82be93cb2e57aefe94edf74d149115cb734112be
2021-09-30 14:53:40 -07:00
Mengwei Liu
eaf85fad62 [PyTorch] Extract parseOperator() into a standalone source file (#65179)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65179

This is following up this PR: https://github.com/pytorch/pytorch/pull/61862. The purpose is to modularize operator parsing so that it can be used as needed without pulling the whole `import.cpp` into build.

Test Plan: Added a unit test in `test_lite_predictor.cpp` called `ParseOperators`, similar to `ParseBytecode`.

Reviewed By: iseeyuan

Differential Revision: D31006555

fbshipit-source-id: c38e221800af4cf72963a353c452c5437f56a0ac
2021-09-17 13:31:59 -07:00
Salil Desai
3727baea6f [PyTorch Edge][Model Loading] Operator Call De-dup at TorchScript Serialization Level [2/2] (#64269)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64269

Revert changes in D29826210 (693d8f2f07) (we don't need operator lambda caching since there aren't duplicate operators anymore)

This diff stack results in an additional approx 12% speedup in model loading time (from 229ms to 200ms) when run against an 87MB speech model that jiatongzhou provided.
ghstack-source-id: 138014904

Test Plan:
**Speech Transducer v25 model (as in D29826210 (693d8f2f07))**

| | Before | After |
|---|---|---|
|Load Time|[229ms](https://www.internalfb.com/intern/aibench/details/160889436133243)|[200ms](https://www.internalfb.com/intern/aibench/details/837884532607514)|
|Save File Size|[86.23 MB](https://lookaside.facebook.com/intern/diff/file/data/?number=658544950)|[86.1 MB](https://lookaside.facebook.com/intern/diff/file/data/?number=658554403)|

The "after" flamegraph shows significantly less time is spent on ```append_operator``` than before.

Steps
- Check out desired commit in devserver (base branch or this diff)
- ```buck build bento/kernels:bento_kernel_pytorch```
- Use N1094068 with pytorch_local kernel to save model for lite interpreter
- Edit ```aibench/specifications/models/pytorch/speech_transducer/v25.json ``` to have new model location and md5
- ```buck run aibench:run_bench -- -b aibench/specifications/models/pytorch/speech_transducer/v25.json --framework pytorch --platform android/arm64 --devices "S8US" --force_profile --remote ```

**Test that saving a model with de-dup ops doesn't change its output**
https://www.internalfb.com/intern/anp/view/?id=1137434

Reviewed By: iseeyuan

Differential Revision: D30615710

fbshipit-source-id: bb4052f0f16eccab386585e94411056f94bce43c
2021-09-14 12:12:46 -07:00
Martin Yuan
30a7c768d7 [RFC] Modularize functions of parsing bytecode (#61862)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61862

Modularize functions of parsing bytecode tables so that they can be used as needed in situations other than mobile lite interpreter.
* The decoupled functions are re-used by current lite interpreter loader.
* The bytecode can be serialized/deserialized from other formats.
* The decoupled functions have minimum dependencies on other PyTorch components.

Next:
Build a driver binary that includes the parser and interpreter, with only the necessary dependencies on other PyTorch components.
ghstack-source-id: 137867287

Test Plan:
As an example, a simple bytecode is parsed to a mobile function, and directly run in the added unit test, `RunTimeTest:ParseBytecode`. It contains basic control flow (if, else) and basic data orchestration (list construction).
CI

Reviewed By: larryliu0820

Differential Revision: D29798382

Pulled By: iseeyuan

fbshipit-source-id: 1c173a5f5d37097e3a97baec3f3e48e1eea1400f
2021-09-11 22:24:05 -07:00
Scott Wolchok
0d0d2f2ac5 [PyTorch] move from input ivalues in ByteCodeDeserializer (#64029)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64029

This should save us a separate pass over the data structure to destroy it.
ghstack-source-id: 137566821

Test Plan:
Pixel3
before:
https://www.internalfb.com/intern/aibench/details/503337445067962
after:
https://our.intern.facebook.com/intern/aibench/details/320277034999340

Overall mean time decreased from 373 ms to 358 ms. In the flame graph, we
can see that some of the time spent destroying a vector of IValues moved
into parseMethods, and the new parseMethods time is less than the old
time plus the recursive destruction time.

Reviewed By: dhruvbird

Differential Revision: D30559530

fbshipit-source-id: d080295a846745ea03ac50f08f4f6c95f4eaf3d8
2021-09-08 18:32:48 -07:00
Kimish Patel
468001600c Back out "Revert D30327514: [Pytorch lite predictor] Use KinetoEdgeCPUProfiler for operator profiling." (#64307)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64307

Original commit changeset: 0b2aa7c57d08

Restores original changes.
This diff changes the way operator profiling is done in the lite predictor
benchmarking binary.
Instead of using custom callbacks, it uses KinetoEdgeCPUProfiler to profile
events and then generates operator-level metrics from them.
Since KinetoEvents do not contain CPU clock time, we now report only wall-clock
time.
This unifies the various profiling efforts we have for benchmarking purposes. In
production we will still use the observer-based mechanism, but the advantage of
using the Kineto profiler is that we get a few other things for free, such as:
- chrome trace generation.
- operator-level memory profiling (to be added)
- flop counts (to be added)

Furthermore, we can possibly use a Python post-processing script to parse the
Chrome trace and generate output similar to torch.profiler. (To be done)

This also removes some tests from test_lite_interpreter.cpp that were testing the module hierarchy in debug info. They should be covered by test_mobile_profiler.cpp.

Test Plan:
aibench run
Model without debug info:
https://www.internalfb.com/intern/aibench/details/219598441154763
Model with debug info and --print_module_info true (the operator summary now includes module hierarchy information).
https://www.internalfb.com/intern/aibench/details/617154236292985

Reviewed By: raziel

Differential Revision: D30680354

fbshipit-source-id: b6ba0d59c510c13d13d9935b1d8051cc82ffa4e9
2021-09-01 13:29:35 -07:00
Kimish Patel
67cb131458 Revert D30327514: [Pytorch lite predictor] Use KinetoEdgeCPUProfiler for operator profiling.
Test Plan: revert-hammer

Differential Revision:
D30327514 (bc9277dca3)

Original commit changeset: 3bb2f2daaaed

fbshipit-source-id: 0b2aa7c57d08de77c9aaa75e546a7d0938610f64
2021-08-31 08:30:36 -07:00
Kimish Patel
bc9277dca3 [Pytorch lite predictor] Use KinetoEdgeCPUProfiler for operator profiling. (#63367)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63367

This diff changes the way operator profiling is done in the lite predictor
benchmarking binary.
Instead of using custom callbacks, it uses KinetoEdgeCPUProfiler to profile
events and then generates operator-level metrics from them.
Since KinetoEvents do not contain CPU clock time, we now report only wall-clock
time.
This unifies the various profiling efforts we have for benchmarking purposes. In
production we will still use the observer-based mechanism, but the advantage of
using the Kineto profiler is that we get a few other things for free, such as:
- chrome trace generation.
- operator-level memory profiling (to be added)
- flop counts (to be added)

Furthermore, we can possibly use a Python post-processing script to parse the
Chrome trace and generate output similar to torch.profiler. (To be done)

Test Plan:
aibench run
Model without debug info:
https://www.internalfb.com/intern/aibench/details/219598441154763
Model with debug info and `--print_module_info true` (the operator summary now includes module hierarchy information).
https://www.internalfb.com/intern/aibench/details/617154236292985

Reviewed By: raziel

Differential Revision: D30327514

fbshipit-source-id: 3bb2f2daaaedfb04bd6f5d9c91292783f9c4344f
2021-08-30 20:54:51 -07:00
Scott Wolchok
9777887f0e [PyTorch] Reduce copies/refcount bumps in BytecodeDeserializer::parseMethods (#63961)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63961

Saw a report that this function was slow and was doing unexplained vector copies. First pass to remove a bunch of copying.
ghstack-source-id: 136760976

Test Plan:
Pixel 3
before: https://our.intern.facebook.com/intern/aibench/details/461850118893980
after: https://www.internalfb.com/intern/aibench/details/48965886029524

MilanBoard failed to return data from simpleperf

Reviewed By: dhruvbird

Differential Revision: D30544551

fbshipit-source-id: 0e2b5471a10c0803d52c923e6fb5625f5542b99d
2021-08-30 09:37:10 -07:00
Priya Ramani
f4496528e3 [Light] Fix error message (#64010)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64010

Fixing typos in an error message.

Test Plan:
Error message before fix:
Lite Interpreter verson number does not match. The model version must be between 3 and 5But the model version is 6

Error message after fix:
Lite Interpreter version number does not match. The model version must be between 3 and 5 but the model version is 6

Reviewed By: larryliu0820

Differential Revision: D30568367

fbshipit-source-id: 205f3278ee8dcf38579dbb828580a9e986ccacc1
2021-08-27 22:54:38 -07:00
Dhruv Matani
693d8f2f07 [PyTorch Edge] Cache operator lambda during model loading [7% faster model loading] (#61996)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61996

A recent post https://fb.workplace.com/groups/pytorch.edge.users/posts/2012215235600341/ about slow model loading with an accompanying perf report (report.html) caused me to look at the report and find hot spots during model loading. It suggested that we spend quite a bit of time looking up operators from the dispatcher. This means that we can probably just cache the operator handler functions (instead of computing them every time an operator name shows up, since a name potentially shows up multiple times in a given model).
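A hedged sketch of the caching idea: resolve each operator name at most once per model load and reuse the handler for repeated occurrences (OpHandler and lookupOperatorFromDispatcher below are hypothetical stand-ins for the dispatcher lookup):

```
#include <functional>
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins: the real code resolves operators through the c10
// dispatcher and stores std::function handlers on the loaded method.
using OpHandler = std::function<void(std::vector<int>& /*stack*/)>;

// Dummy lookup standing in for the (expensive) dispatcher query; here every op
// resolves to a no-op handler so the example is self-contained.
std::optional<OpHandler> lookupOperatorFromDispatcher(const std::string&) {
  return OpHandler([](std::vector<int>&) {});
}

class OperatorCache {
 public:
  // Returns the cached handler if this operator name was already resolved
  // during this model load; otherwise performs the lookup once and caches it.
  std::optional<OpHandler> get(const std::string& qualified_name) {
    auto it = cache_.find(qualified_name);
    if (it != cache_.end()) {
      return it->second;
    }
    auto handler = lookupOperatorFromDispatcher(qualified_name);
    if (handler) {
      cache_.emplace(qualified_name, *handler);
    }
    return handler;
  }

 private:
  std::unordered_map<std::string, OpHandler> cache_;
};
```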

This diff results in an approx 7% speedup in model loading time (from [315ms](https://www.internalfb.com/intern/aibench/details/45077128343028) to [293ms](https://www.internalfb.com/intern/aibench/details/600870874797229)) when run against an 87MB speech model that jiatongzhou provided.

See https://fb.workplace.com/groups/pytorch.dev/posts/855724575006024/ for the previous post from jiatongzhou.
ghstack-source-id: 134634612

Test Plan:
Run using AI Bench.

### Speech Transducer v25 model (87MiB)

Followed up with jiatongzhou and he gave me his speech model. For posterity, here's how to fetch it (you don't need to, since I uploaded it to NMLML and it now has a permanent Everstore handle):

```
cd /tmp/
mkdir speech_model
cd speech_model
fbpkg fetch speech.stella.neural_transducer.on_device.en_us:25
cp pytorchmodel.pt ~/speech_transducer_v25_pytorchmodel.ptl
```

Here's how to build and run the benchmark using AI Bench:

```
buck run aibench:run_bench -- -b aibench/specifications/models/pytorch/speech_transducer/v25.json --framework pytorch --platform android/arm64 --devices "S8US" --force_profile --remote
```

Reviewed By: raziel

Differential Revision: D29826210

fbshipit-source-id: 134b67eb466e73f0e43447b9b966278f13c4b56f
2021-07-29 20:14:47 -07:00
Richard Barnes
b5867a1b34 irange-ify 7 (#62117)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62117

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D29879640

fbshipit-source-id: 189578a57301747a3421742e145bbcdf2ad75c49
2021-07-28 13:30:39 -07:00
Richard Barnes
a91be24e2d Modernize make pointers (#61741)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61741

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D29717385

fbshipit-source-id: 4452b77981e49175f744bdaab12cd225bf75b90e
2021-07-22 15:54:37 -07:00
Tianyi Yu
39ce29efe0 Refactor metadata_map with flattened key/value pair (#61731)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61731

In the previous diff, metadata_map contains mobile_info.json and producer_info.json. We need to parse JSON each time we log the required information. This diff flattens the content of those files into key/value pairs, allowing the logger to directly loop through the metadata_map and log the information.

Test Plan:
Since 3D Photo is disabled in the current FB app, testing was only performed on the CC scanner.

# Test On CC Scanner
**Test content with LOG(WARNING)**
{P429123273}

**Scuba Logger Output**

1. MOBILE_MODULE_LOAD_STATS

{F631884673}

2.  MOBILE_MODULE_STATS

{F631884787}

Reviewed By: xcheng16

Differential Revision: D29690702

fbshipit-source-id: 1db5a1f5c25e98e5b2f1cc254fd880dfdfa025e2
2021-07-16 00:37:17 -07:00
Tianyi Yu
00a7f55b6e Apply for MOBILE_MODULE_STATS Logging (#61600)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61600

This diff changes the module.h constructor and removes metadata_. It refactors all of the constructors' call sites and creates a getter & setter for metadata_. MOBILE_MODULE_STATS reads the metadata from mobile::Module and passes it into the logger.

Test Plan:
Since 3D Photo is disabled in the current FB app, testing was only performed on the CC scanner.

# Test On CC Scanner
**Test content with LOG(WARNING)**
{P428930572}

**Scuba Logger Output**

{F631761194}

Reviewed By: xcheng16

Differential Revision: D29673184

fbshipit-source-id: 962e0d7b06a07caaa0c695a4ac58b885fd1505ea
2021-07-16 00:37:15 -07:00
Tianyi Yu
fc710eecc0 Apply for MOBILE_MODULE_LOAD_STATS Logging (#61480)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61480

Append mobile_info.json and producer_info.json into extra_files and parse the jsons from “model_info.json” in onExitLoadModel.
ghstack-source-id: 133327912

Test Plan:
# Test On CC Scanner
**Test content with LOG(WARNING)**
{P428339274}

**Scuba Logger Output**
{F631024095}

# Test On 3D Photo
**Test content with LOG(WARNING)**
{P428340927}

**Scuba Logger Output**

{F631026739}

Reviewed By: xcheng16, guangy10

Differential Revision: D29608014

fbshipit-source-id: abc39c44b947632fd4349de8a432649e84284a87
2021-07-16 00:36:09 -07:00
David Reiss
a682ff7ef1 Add kMaxSupportedBytecodeVersion for Lite Interpreter (#59472)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59472

Previously, the lite interpreter would refuse to load any model
with a version greater than kProducedBytecodeVersion.  Now, we're
able to independently advance the loading and saving code, so we
can roll out changes without breaking forward compatibility.
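A minimal sketch of the resulting check, with illustrative constant values (the actual constants and their values live in the serialization code and can move independently):

```
#include <cstdint>
#include <stdexcept>
#include <string>

// Illustrative values only; the real constants are defined by the
// serialization code and advance independently of each other.
constexpr int64_t kProducedBytecodeVersion = 4;      // what the writer emits
constexpr int64_t kMinSupportedBytecodeVersion = 3;  // oldest loadable format
constexpr int64_t kMaxSupportedBytecodeVersion = 5;  // newest loadable format

// Before this change the loader effectively required
// version <= kProducedBytecodeVersion; now loading is gated on its own range,
// so the loader can accept formats newer than what it currently produces.
void checkBytecodeVersion(int64_t model_version) {
  if (model_version < kMinSupportedBytecodeVersion ||
      model_version > kMaxSupportedBytecodeVersion) {
    throw std::runtime_error(
        "Lite Interpreter version number does not match. The model version must be between " +
        std::to_string(kMinSupportedBytecodeVersion) + " and " +
        std::to_string(kMaxSupportedBytecodeVersion) + " but the model version is " +
        std::to_string(model_version));
  }
}
```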

Test Plan:
CI.
Loaded a bytecode v5 model even with setting kProducedBytecodeVersion
to v4.

Reviewed By: raziel

Differential Revision: D28904350

fbshipit-source-id: 598c22f0adf47d4ed3e976bcbebdf3959dacb1df
2021-06-04 17:55:02 -07:00
Mike Ruberry
57e452ff5d Revert D28856713: [PyTorch Edge] Add proper error message when loading incompatible model with lite interpreter
Test Plan: revert-hammer

Differential Revision:
D28856713

Original commit changeset: c3f9a3b64459

fbshipit-source-id: cc6ba8ec1047f29e62061107a2e5f245981b8039
2021-06-03 08:40:28 -07:00
Chen Lai
91b7bcf4c0 [PyTorch Edge] Add proper error message when loading incompatible model with lite interpreter (#59354)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59354

Check if the model has `bytecode.pkl` and provide a proper error message before loading the model. Test it by loading a model.pt and a model.ptl.
```
>>> from torch.jit.mobile import _load_for_lite_interpreter
>>> _load_for_lite_interpreter("/Users/chenlai/Documents/pytorch/data/mobilenet_v2.pt")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/chenlai/pytorch/torch/jit/mobile/__init__.py", line 48, in _load_for_lite_interpreter
    cpp_module = torch._C._load_for_lite_interpreter(f, map_location)  # type: ignore[attr-defined]
RuntimeError: The model is not generated from the api _save_for_lite_interpreter. Please regenerate the module by scripted_module._save_for_lite_interpreter('model.ptl'). Refer to https://pytorch.org/tutorials/prototype/lite_interpreter.html for more details.
```

iOS:
![image](https://user-images.githubusercontent.com/16430979/120593077-cbe23180-c3f3-11eb-9745-ee2b04b78c6c.png)

Android:
![image](https://user-images.githubusercontent.com/16430979/120594357-af46f900-c3f5-11eb-9fb0-500a038148e3.png)

Differential Revision:
D28856713

Test Plan: Imported from OSS

Reviewed By: dhruvbird

Pulled By: cccclai

fbshipit-source-id: c3f9a3b64459dda6811d296371c8a2eaf22f8b20
2021-06-03 03:18:14 -07:00
Chen Lai
d3fbb41c61 [PyTorch Edge] share tensors in mobile with new api (#58182)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58182

As title, the v5 model format will be
```
(base) chenlai@chenlai-mp reuse_constant % zipinfo /Users/chenlai/Documents/pytorch/reuse_constant/tmp/zip/script_module_v5_unify.ptl
Archive:  /Users/chenlai/Documents/pytorch/reuse_constant/tmp/zip/script_module_v5_unify.ptl
Zip file size: 3120 bytes, number of entries: 7
-rw----     0.0 fat       77 bl stor 80-000-00 00:00 script_module_v4_unify/data.pkl
-rw----     0.0 fat      240 bl defN 80-000-00 00:00 script_module_v4_unify/code/__torch__/___torch_mangle_5.py
-rw----     0.0 fat      422 bl defN 80-000-00 00:00 script_module_v4_unify/code/__torch__/___torch_mangle_5.py.debug_pkl
-rw----     0.0 fat       64 bl stor 80-000-00 00:00 script_module_v4_unify/constants/140245072983168.storage
-rw----     0.0 fat      172 bl stor 80-000-00 00:00 script_module_v4_unify/constants.pkl
-rw----     0.0 fat      678 bl stor 80-000-00 00:00 script_module_v4_unify/bytecode.pkl
-rw----     0.0 fat        2 bl stor 80-000-00 00:00 script_module_v4_unify/version
7 files, 1655 bytes uncompressed, 1453 bytes compressed:  12.2%
```
bytecode.pkl is:
```
(5,
 ('__torch__.___torch_mangle_5.TestModule.forward',
  (('instructions',
    (('STOREN', 1, 2),
     ('DROPR', 1, 0),
     ('LOADC', 0, 0),
     ('LOADC', 1, 0),
     ('MOVE', 2, 0),
     ('OP', 0, 0),
     ('LOADC', 1, 0),
     ('OP', 1, 0),
     ('RET', 0, 0))),
   ('operators', (('aten::add', 'int'), ('aten::add', 'Scalar'))),
   ('constants',
    (torch._utils._rebuild_tensor_v2(pers.obj(('storage',
          torch.DoubleStorage,
          '140245072983168.storage',
          'cpu',
          8),),
       0,
       (2, 4),
       (4, 1),
       False,
       collections.OrderedDict()),
     1)),
   ('types', ()),
   ('register_size', 2)),
  (('arguments',
    ((('name', 'self'),
      ('type', '__torch__.___torch_mangle_5.TestModule'),
      ('default_value', None)),
     (('name', 'y'), ('type', 'int'), ('default_value', None)))),
   ('returns',
    ((('name', ''), ('type', 'Tensor'), ('default_value', None)),)))))
```

constants.pkl is:
```
(torch._utils._rebuild_tensor_v2(pers.obj(('storage', torch.DoubleStorage, '140245072983168.storage', 'cpu', 8),),
   0,
   (2, 4),
   (4, 1),
   False,
   collections.OrderedDict()),)
```

Both tensors will refer to the tensor at the path `script_module_v4_unify/constants/140245072983168.storage`.

## Note
According to the unified format, all tensors should be written to the `.data` folder; however, torch.jit.load() can't handle the unified format at this moment, so this change writes tensors to the `constants` folder, and mobile will write/read tensors from the `constants` folder, such that the model can be interpreted by both JIT and mobile.
ghstack-source-id: 129010347

Test Plan: buck test mode/dev //caffe2/test/cpp/jit:jit

Reviewed By: raziel, iseeyuan

Differential Revision: D28375257

fbshipit-source-id: 6544472db4c957c5ea037e0bb5112b637dd15897
2021-05-14 14:03:56 -07:00
Lillian Johnson
07de11c26d [torch.Package/TorchScript] TS serialization importer to handle unified format (#54891)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54891

Changed TorchScript's jit/serialization importer logic to handle both the original TS serialization format and the new unified TS format.

Original TS file format:
```
resnet.pt
├── data  # tensor data
│   ├── 94286146172688
│   ├── 94286146172784
│   └── ...
├── code/  # TorchScript code
│   ├── __torch__
│   │   ├── torch
│   │   │   └── nn ...
│   │   └── torchvision ...
│   ├── __torch__.py
│   └── __torch__.py.debug_pkl
├── data.pkl  # the ScriptModule object, pickled by the TS pickler
├── version  # version metadata
├── constants.pkl  # any tensor constants present in the TS code
└── extra
     ├── name_of_file
     └── foo
```

Unified file format:
```
─── package_name.pt
    ├── .data
    │   ├── ts_code # code shared between models
    │   │   ├── 0
    │   │   │   ├── constants.pkl
    │   │   │   └── data.pkl
    │   │   ├── 1
    │   │   │   ├── constants.pkl
    │   │   │   └── data.pkl
    │   │   └── code
    │   │       ├── __torch__
    │   │       │   ├── torch
    │   │       │   │   └── nn ...
    │   │       │   └── torchvision ...
    │   │       ├── __torch__.py
    │   │       └── __torch__.py.debug_pkl
    │   ├── 0.storage
    │   ├── 1.storage
    │   ├── <many more storages>
    │   ├── 201.storage
    │   ├── extern_modules
    │   └── version
    └── res
        ├── mod.pkl  # maps to ts_id 0 and .data/ts_code/0
        └── mod2.pkl # maps to ts_id 1 and .data/ts_code/1
```

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D27832548

Pulled By: Lilyjjo

fbshipit-source-id: 4a6e84c3a9bac8eed6a4e4afc2ac76dd691858b0
2021-05-14 08:20:34 -07:00
Martin Yuan
d833caaf6b [PyTorch Mobile][Forward/backward compatibility] Number of arguments for operators (#56845)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56845

Handle forward/backward compatibility caused by added default arguments in mobile. As an example,

In older version, operator aten::foo's schema is
```
foo(Tensor a, Tensor b) -> Tensor
```
In the new version, the schema is updated to
```
foo(Tensor a, Tensor b, int groups=1) -> Tensor
```

## Model file
Serialize the number of specified arguments for each operator into the bytecode operator table. Before, the operator table contains only the operator name and overload name:
```
('operators', (('aten::foo', ''),))
```
Now the number of specified arguments is added:
```
# bytecode version 6
('operators', (('aten::foo', '', 2),))
```
where "2" means the number of specified arguments.

Since there's a bytecode schema change, the bytecode version number is bumped. This PR is to be landed after #56002, where the version number is bumped from 4 to 5. This PR bumps the version number from 5 to 6.

## Runtime and backward compatibility
When the operator is found (either jit or c10), we have the OperatorHandle, where the operator schema can be accessed by
```
op.value().schema().arguments()
```
Adaptation is implemented to handle backward compatibility. For the example above, the new runtime holds the updated schema:
```
foo(Tensor a, Tensor b, int groups=1) -> Tensor
```
Whereas the model file carries
```
(('aten::foo', ''), 2)
```
We can implement a wrapper around the original function pointer to push the default argument to the stack.
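A hedged sketch of such a wrapper, using a simplified stack type (Value, Stack, and the argument plumbing here are illustrative, not the real IValue-based code):

```
#include <cstddef>
#include <functional>
#include <vector>

// Simplified stand-ins: the real code works with c10::IValue stacks and the
// schema obtained from the OperatorHandle.
using Value = int;
using Stack = std::vector<Value>;
using OpFunction = std::function<void(Stack&)>;

// Wrap the resolved operator so that a model serialized against the old schema
// (fewer arguments) still calls the new kernel correctly: any trailing
// arguments the model did not specify are pushed with their schema defaults.
OpFunction adaptWithDefaults(OpFunction op,
                             std::vector<Value> trailing_defaults,  // defaults beyond what the model specifies
                             size_t num_args_in_schema,
                             size_t num_specified_args /* from bytecode, e.g. 2 */) {
  return [=](Stack& stack) {
    // e.g. schema foo(Tensor a, Tensor b, int groups=1) with 2 specified args:
    // push the default for `groups` before invoking the kernel.
    for (size_t i = num_specified_args; i < num_args_in_schema; ++i) {
      stack.push_back(trailing_defaults[i - num_specified_args]);
    }
    op(stack);
  };
}
```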

## Delivery time and forward compatibility
At model delivery time, two checks can be done:
### Operator check
Two APIs to be provided:
* Runtime: An API to get a runtime’s ops and their schemas (i.e. the # of args). D27920185(WIP)
* Model: An API to get a model’s ops and their schema requirements (i.e. the # of args required).

The APIs can be used to check
* runtime.ops() is a superset of model.ops()
* for each op in model.ops() validate their schemas are compatible with those in runtime.ops() -- i.e. the # args required in a model op are <= # args in the runtime op.

Note that only root ops in the model needs to be checked here. For transient ops it's not necessary. For example, if a root op, "aten::root" calls "aten::foo", it's "aten::root"'s responsibility to adapt to "aten::foo"'s change, or "aten::root" itself needs to be updated too.
### Bytecode version backport (PR coming)
When delivering a model with bytecode v6, if the runtime only works with bytecode v5 and lower, backport is needed.
* The number of arguments is removed from the operator table
* The bytecode version is changed from 6 to 5

Note that this backport is a pure format change; it does not guarantee that the backported model always runs in an old runtime. The operator check mentioned above should be done first, before the model is backported to v5.

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D27986544

Pulled By: iseeyuan

fbshipit-source-id: 143e19d4798cfb96b65095538dd648eead4e3fda
2021-05-13 14:20:47 -07:00
Chen Lai
fb9a32b7b4 [PyTorch][Edge] Add api to get bytecode model version (#56801)
Summary:
Add an API `_get_bytecode_version` to get the version number of a bytecode model, in both C++ and Python; the input can be either a file path or a buffer.
## Test
CI (new added unit test will run as part of `pytorch_core-buck`)

1. run test_lite_interpreter.cpp
2. `python test/mobile/test_bytecode.py`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56801

ghstack-source-id: 128169647

Test Plan:
CI (new added unit test will run as part of `pytorch_core-buck`)

1. run test_lite_interpreter.cpp
2. `python test/mobile/test_bytecode.py`

Reviewed By: iseeyuan

Differential Revision: D27961417

fbshipit-source-id: f786cc9573d855feecff0b4fe8e5363e25f5728c
2021-05-05 09:17:26 -07:00
Dhruv Matani
d728491fc1 [RFC] [PyTorch Edge] Simplify error logging in mobile/import.cpp (#55711)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55711

Currently, there is some complex logic that tries to handle all exceptions but re-throws them as a `c10::Error` so that it can log the error message. I'm looking for context on why this was added. The logic in this diff (after talking with swolchok) seems equivalent, simpler, and also preserves the original stack trace from where the exception was originally thrown. This is useful when viewing the backtrace in logview. Re-throwing an exception using `TORCH_CHECK(false, message)` results in the original exception stack trace getting lost, so we want to avoid that.
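A small generic illustration of the two patterns in plain C++ (not the actual loader code): re-throwing as a new exception type makes the re-throw the new throw site, while letting the original exception propagate preserves its backtrace.

```
#include <stdexcept>
#include <string>

// Stand-in for whatever actually throws deep inside model loading.
void parseModel() { throw std::runtime_error("unexpected opcode"); }

// Old-style pattern: catch everything and re-throw as a different type so a
// message can be logged. The re-throw becomes the new throw site, so the
// backtrace of the original failure is gone.
void loadWithRethrow() {
  try {
    parseModel();
  } catch (const std::exception& e) {
    throw std::runtime_error(std::string("model load failed: ") + e.what());
  }
}

// Simpler pattern favored by this change: let the original exception propagate
// (log at a catch site higher up if needed), keeping the original stack trace.
void loadAndPropagate() {
  parseModel();
}
```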
ghstack-source-id: 128043281

Test Plan: Build.

Reviewed By: iseeyuan

Differential Revision: D27688352

fbshipit-source-id: b7b1a29b652b31da80d72f16d284e48b8623377b
2021-05-04 20:45:32 -07:00
Kimish Patel
bb3c6699a5 [Pytorch Mobile DebugInfo Serialization] Save debug handles for all instructions. (#55252)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55252

Earlier, for bytecode serialization, we were saving debug handles only for OP
instructions and not for all instructions. This PR changes that to save debug
handles for all instructions.

Test Plan:
python test/mobile/test_lite_script_module.py TestLiteScriptModule

Imported from OSS

Reviewed By: dreiss

Differential Revision: D27542502

fbshipit-source-id: cff75118c721ce9f0c2f60d2c9471481f05264ca
2021-05-04 09:21:13 -07:00
Kimish Patel
e0fc473e47 [Pytorch, Mobile] Serialize inlined callstack pointer with debug handle. (#55062)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55062

This diff introduces the following changes:
1. InlinedCallStack pickler/serializer is introduced. It is serialized
as a tuple of {module_instance_info, source range tag, callee:InlinedCallStack}
Module instance info is serialized as tuple of {class_type_name,
instance_name}.
Note that the callee of the serialized inlined callstack points to the tuple
of the already-serialized callstack. This means the first callstack ptr to
serialize will serialize the entire path of the tree, where some callee
nodes might be shared with callstack pointers that will be serialized
subsequently. Pickler supports memoization of pickled objects, so if
a tuple has already been serialized, its object id is emitted instead of
serializing the object again. Thus we still serialize the tree, and not every
path from the root separately. Furthermore, InlinedCallStackSerializer
also uses a cache to look up the pointer and return the serialized IValue.
Note that we must also serialize the source range of the
InlinedCallStack. In order to do this, the serializer requires a map of
source-range tags to source ranges. This was done in the previous
diff, where as part of source range serialization we also generate
unique tags. These are the tags that are serialized in the InlinedCallStack.
Thus during deserialization we have to deserialize source ranges
before deserializing InlinedCallStacks.
2. Furthermore, each serialized InlinedCallStack is serialized with a
unique debug_handle and source range tag.
BackendDebugHandleManager manages generation of
unique debug handles and saves the map of
debug-handles-to-{source_range_tag, inlined-callstack-ptr}.
This map is then serialized as callstack_debug_map.pkl. Note that
inlined callstack is not sufficient to get all the source information
since it contains source information about the nodes which are inlined.
The top-of-the-stack (or bottom) node, which is the actual op node, is
not part of the inlined callstack pointer and thus the source range of
this node is serialized separately using source_range_tag. This is
similar to how JIT creates callstack in
torch/csrc/jit/runtime/interpreter.cpp

Unique debug handles facilitate exception throwing or profiling using
just the debug handle, without any further qualifications such as which
function or module the inlined callstack belongs to.

Furthermore, this diff refactors the old mobile code for tracking
module hierarchy information per op. Mainly, bytecode serialization now
serializes debug handles corresponding to ops/nodes in the graph, and
callstack_debug_map.pkl helps generate:
1. the entire callstack, and
2. module hierarchy information.

Test Plan:
python test/mobile/test_lite_script_module.py TestLiteScriptModule
./build/bin/test_jit --gtest_filter=*ModuleInfo

Imported from OSS

Reviewed By: raziel

Differential Revision: D27468709

fbshipit-source-id: 53e2413e7703ead01c77718b7c333c7c6ff50a23
2021-05-04 09:21:12 -07:00
Kimish Patel
f4a921600a [PyTorch, Mobile] Serialization format change for source range (#54284)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54284

In order to bring mobile deployment, via the lite interpreter, to feature
parity with JIT with respect to model-level debug information, we must make
model-level debug information available to the mobile runtime.
At the moment, model-level debug information is stored in SourceRange,
which associates nodes of the graph with where they come from in the original
Python source code.
This information is serialized as part of debug_pkl and deserialized
when JIT loads the model and reads the model code.
In the lite interpreter, we do not have access to all the functionality of
JIT, and hence we cannot load the model in the same way as JIT, by reading
code, constructing the module hierarchy and the graphs corresponding to module
methods, etc. Instead, in the lite interpreter, only the bytecode corresponding
to the compiled graph, Code, is saved.
Thus in order to annotate OPs in the bytecode with equivalent
SourceRange information we do the following:
1. During model serialization, we create a unique tag for each source
range of the model.
2. Create a map of <SourceRange, tag>
3. During debug_pkl serialization we save tag along with SourceRange, on
top of byte offset.
4. During bytecode generation, the methods of the top module are
lowered. During this process methods are inlined. In the inlined graph,
when the node of a graph is lowered to bytecode, we query node's source
range and look it up against the map.
5. Resulting source range tag is serialized in module_debug_info.
6. During model deserialization, we read all the debug_pkl records in
the archive and create a map of <tag, SourceRange>.
7. This map can be used to find source code information.

During mobile runtime:
1. We read all the debug_pkl records and create <tag=debug_handle,
SourceRange> map.
   1.1 This map, MobileDebugInfo, is a member of mobile Module.
2. Interpreter catches appropriate exceptions and sets the thread local
debug handle and rethrows the exception.
3. In Function's run method we catch the exception and query the current debug
handle where the exception happened.
4. Query MobileDebugInfo with debug handle to retrieve source range and
augment error with source range info.
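A rough sketch of this runtime flow, with a string standing in for SourceRange and simplified types (the real MobileDebugInfo and thread-local plumbing differ in detail):

```
#include <cstdint>
#include <stdexcept>
#include <string>
#include <unordered_map>

// Built at load time from the debug_pkl records: debug handle -> source range.
using MobileDebugInfo = std::unordered_map<int64_t, std::string>;

// Set by the interpreter when it catches an exception at a given instruction.
thread_local int64_t current_debug_handle = -1;

// Function::run()-style wrapper: catch, look up the source range for the
// handle where the failure happened, and rethrow with the augmented message.
void runWithDebugInfo(void (*body)(), const MobileDebugInfo& info) {
  try {
    body();
  } catch (const std::exception& e) {
    auto it = info.find(current_debug_handle);
    std::string where =
        it != info.end() ? it->second : "<no source range for handle>";
    throw std::runtime_error(std::string(e.what()) + "\n  at " + where);
  }
}
```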

This information is still incomplete as it does not contain entire
callstack.

In the following diffs we will serialize InlinedCallStack directly.

Note that compilation is gated by the SYMBOLICATE_MOBILE_DEBUG_HANDLE macro,
so that mobile builds can avoid building MobileDebugInfo, source ranges,
and the source-range pickler/unpickler. Later we will add a path where, if
building without debug support, the stack trace will contain only debug
handles. They can be symbolicated later.

Test Plan:
Ported a bunch of source range tests from test_jit.py. Added one more test
in test_lite_interpreter.py.

Imported from OSS

Reviewed By: raziel

Differential Revision: D27174722

fbshipit-source-id: a7b7c6088ce16dec37e823c7fefa4f0b61047e12
2021-05-04 09:19:27 -07:00
Chen Lai
9486fc3229 [PyTorch][Edge] share readArchiveAndTensors between mobile and jit (#57098)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57098

1. Separate `readArchiveAndTensors()` from `jit/import.cpp` to a new file `jit/import_read.cpp`.
2. Use `readArchiveAndTensors()` in `mobile/import.cpp`
3. Add a util function in cpp that could read .pkl files directly instead of loading the entire module
ghstack-source-id: 127703081

Test Plan: CI

Reviewed By: raziel, iseeyuan

Differential Revision: D28052193

fbshipit-source-id: c8d57f3270bdcf2e52a32f7c111899bd5da7cac2
2021-04-29 10:09:50 -07:00
Nikita Shulga
4cb534f92e Make PyTorch code-base clang-tidy compliant (#56892)
Summary:
This is an automatic change generated by the following script:
```
#!/usr/bin/env python3
from subprocess import check_output, check_call
import os

def get_compiled_files_list():
    import json
    with open("build/compile_commands.json") as f:
        data = json.load(f)
    files = [os.path.relpath(node['file']) for node in data]
    for idx, fname in enumerate(files):
        if fname.startswith('build/') and fname.endswith('.DEFAULT.cpp'):
            files[idx] = fname[len('build/'):-len('.DEFAULT.cpp')]
    return files

def run_clang_tidy(fname):
    check_call(["python3", "tools/clang_tidy.py", "-c", "build", "-x", fname,"-s"])
    changes = check_output(["git", "ls-files", "-m"])
    if len(changes) == 0:
        return
    check_call(["git", "commit","--all", "-m", f"NOLINT stubs for {fname}"])

def main():
    git_files = check_output(["git", "ls-files"]).decode("ascii").split("\n")
    compiled_files = get_compiled_files_list()
    for idx, fname in enumerate(git_files):
        if fname not in compiled_files:
            continue
        if fname.startswith("caffe2/contrib/aten/"):
            continue
        print(f"[{idx}/{len(git_files)}] Processing {fname}")
        run_clang_tidy(fname)

if __name__ == "__main__":
    main()
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56892

Reviewed By: H-Huang

Differential Revision: D27991944

Pulled By: malfet

fbshipit-source-id: 5415e1eb2c1b34319a4f03024bfaa087007d7179
2021-04-28 14:10:25 -07:00
Dhruv Matani
bd3c63aeeb [PyTorch Edge] Move torch::jit::mobile::_export_operator_list() from serialization/export_module.cpp to mobile/import.cpp (#56044)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56044

We want to be able to drop the full-jit deps in the auto-generated unit tests for 2 reasons:

1. Running bloaty on the auto-generated unit tests should be somewhat representative of the actual size.
2. The runtime environment of the auto-generated unit tests should be as close to the production environment as possible to ensure that we are running the tests in a production-like runtime.

Due to the dependence on full-jit, we aren't there yet. For the auto-generated tests, we probably don't need to depend on `_export_operator_list()` eventually, but for now we do, since it is used to decide whether the model being run is a Metal GPU model or a CPU model, and gates whether the test runs that model or not.

Eventually, we can stop doing this in the test and do it in the codegen from PTM-CLI instead (by fetching the operators from that tool, and writing out to the BUCK file which backend(s) this model is targeting). However, that will take some time to land, so in the spirit of expediency, this change is being proposed.

Discussed this offline with iseeyuan
ghstack-source-id: 126656877

Test Plan: Build + BSB.

Reviewed By: iseeyuan

Differential Revision: D27694781

fbshipit-source-id: f31a2dfd40803c02f4fd19c45a3cc6fb9bdf9697
2021-04-15 17:53:36 -07:00
Dhruv Matani
2c5579702a [PyTorch Mobile] Add module size to logged metadata (#53578)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53578

We want to be able to log the loaded module size to the scuba table `qpl_metrics/pytorch`. Hence, adding the `model_size` field to the logged metadata when logging a module load success event.

ghstack-source-id: 123980964

Test Plan: xcheng16 How should this be tested?

Reviewed By: xcheng16, raziel

Differential Revision: D26902971

fbshipit-source-id: a7c2e9120706bd31f76f6572c8503d4acf8a89e2
2021-03-15 21:11:36 -07:00
Dhruv Matani
b26c0bb2b9 [PyTorch Mobile] Allow skipping operator exists check when bytecode model is loaded (#52814)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52814

Currently, there is no way to load a model on a devvm (CPU) if that model has operators that the runtime doesn't support. This ends up happening (currently) for Metal GPU models, and potentially in the future for other backends that have backend-specific operators that don't have a registered implementation (even a dummy one) on CPU.

There are at least a couple reasons for why this is needed:

1. We want to extract operator list directly from the bytecode (instead of looking it up from `mobile_info.json`).
2. We want to be able to trace the quantized operators that are invoked when loading the compressed weights for a model that has prepacked weights. xta0 root-caused this after husthyc discovered that there are untraced operators showing up when loading a Metal GPU model.

If we want to scale out to support different types of models, we absolutely need the ability to load a model on a devvm irrespective of what backend (device/etc...) it is targeted at.

ghstack-source-id: 123284366

Test Plan: The next diff in this stack is using the newly introduced methods.

Reviewed By: iseeyuan

Differential Revision: D26656266

fbshipit-source-id: eed9af2f7b55979e9c18b986b8c3b9a767153297
2021-03-07 02:56:12 -08:00
Martin Yuan
b5ae8e69a7 [Lite Interpreter] Support features from to_backend (#52870)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52870

Add the missing parts to support to_backend modules by lite interpreter.
1. Add ISINSTANCE instruction support, which is used in to_backend for output type check.
2. Bypass lite interpreter's type parser by checking the qualified name. If it starts with "torch.jit", use the same type resolver as nn module (starting with "__torch__").

Tests
Mobile module is serialized and loaded in ```BackendTest.TestCompiler```. The results are compared to those from original torchscript module.

Test Plan: Imported from OSS

Reviewed By: raziel

Differential Revision: D26715351

Pulled By: iseeyuan

fbshipit-source-id: ad9d74ee81c6aa692ab9e5dd7a9003bae5d4f01f
2021-03-01 17:56:01 -08:00
Richard Barnes
26419815af Modernize for-loops (#52330)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/52330

Test Plan: Sandcastle

Reviewed By: mruberry

Differential Revision: D26001961

fbshipit-source-id: e75cc8f1a8d30917b4d55df9e1a3c7836c271820
2021-02-23 17:32:33 -08:00
Martin Yuan
23c50a4a50 [PyTorch Mobile] Support torchbind custom classes in lite interpreter (#51432)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51432

ghstack-source-id: 120976584

torchbind is a convenient way to expose a custom class to both Python and TorchScript; CREATE_OBJECT is used to create an object of a custom class (a minimal registration sketch follows this list).

CREATE_OBJECT was not supported by the lite interpreter. The major reason was that for custom classes defined directly in Python, there's no language parser in the lite interpreter. That is still the case. However, for torchbind classes that are defined in C++, a Python/TorchScript parser is not needed.

This diff supports the case of torchbind custom classes.
1. The class type can be resolved at import level.
2. If the class is not a supported torchbind class, an error message is provided at the export stage. A workaround is also suggested.
3. Unit tests. C++: ```LiteInterpreterTest::BuiltinClass``` is added as an end-to-end test on a supported class. Python: ```test_unsupported_createobject``` is changed to ```test_unsupported_classtype``` to test unsupported classes.
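For reference, a minimal torchbind class defined in C++ looks roughly like the following, following the public custom-class API (the class and method names are made up for the example):

```
#include <torch/custom_class.h>
#include <torch/script.h>

#include <string>
#include <vector>

// A made-up example class; anything deriving from torch::CustomClassHolder and
// registered below becomes visible to both Python and TorchScript.
struct MyStack : torch::CustomClassHolder {
  std::vector<std::string> stack_;
  explicit MyStack(std::vector<std::string> init) : stack_(std::move(init)) {}
  void push(std::string x) { stack_.push_back(std::move(x)); }
  std::string pop() {
    auto v = stack_.back();
    stack_.pop_back();
    return v;
  }
};

// Registration: because the class is defined in C++, the lite interpreter can
// resolve its qualified type name at import time without a Python parser.
TORCH_LIBRARY(my_classes, m) {
  m.class_<MyStack>("MyStack")
      .def(torch::init<std::vector<std::string>>())
      .def("push", &MyStack::push)
      .def("pop", &MyStack::pop);
}
```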

Test Plan: CI

Reviewed By: raziel

Differential Revision: D26168913

fbshipit-source-id: 74e8b6a12682ad8e9c39afdfd2b605c5f8e65427
2021-02-03 21:57:19 -08:00
Frank Seide
87ad77eb4e T66557700 Support default argument values of a method (#48863)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48863

Support default arguments when invoking a module via PyTorch Lite (`mobile::Module`).

Test Plan:
buck test mode/dbg //caffe2/test/cpp/jit:jit -- LiteInterpreterTest.MethodInvocation

buck test mode/dbg caffe2/test:mobile -- test_method_calls_with_optional_arg

Reviewed By: iseeyuan

Differential Revision: D25896212

fbshipit-source-id: 6d7e7fd5f3244a88bd44889024d81ad2e678ffa5
2021-02-01 18:35:13 -08:00
Dhruv Matani
ebe26b81d2 [PyTorch Mobile] Enable partial loading of GPU models on linux CPU machines (#51236)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51236

The problem we currently have with tracing is that GPU models can't load on devvm CPU machines. Here's why:

1. The Metal GPU ops don't exist so the validation that checks for missing ops kicks in and prevents loading
2. Even if the check for missing ops is commented out, the actual model contents can't be successfully loaded (see t83364623 for details)

Hence, to work around these problems and allow tracing to detect GPU models, and skip actual tracing for these (as discussed in the meeting 2 weeks ago and based on recommendations from raziel, iseeyuan, and xta0), we're adding code to detect these GPU models based on the set of operators that show up in the file `extra/mobile_info.json`.

The code then skips tracing, and picks up the root operators from the model itself.

The diff below this one will be removed before landing since we don't want to check in the model - I've kept it here in case anyone wants to patch this diff in and run the command on their devvm locally.
ghstack-source-id: 120638092

Test Plan:
See {P168657729} for a successful run of tracing on a GPU model (person segmentation tier-0, v1001) provided by xta0

Also ran `buck test //xplat/pytorch_models/build/...` successfully.

Reviewed By: ljk53

Differential Revision: D26109526

fbshipit-source-id: 6119b0b59af8aae8b1feca0b8bc29f47a57a1a67
2021-01-29 01:00:08 -08:00
Dhruv Matani
ce0f335515 [PyTorch Mobile] Add an overload for deserialize() that doesn't accept the extra_files map. (#50932)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50932

After the change to split `_load_for_mobile()` into multiple methods, one which takes in the `extra_files` map, and one which doesn't, we can change the implementation of the `deserialize()` method with different overloads as well. Suggested by raziel on D25968216 (bb909d27d5).

ghstack-source-id: 120185089

Test Plan: Build/Sandcastle.

Reviewed By: JacobSzwejbka

Differential Revision: D26014084

fbshipit-source-id: 914142137346a6246def1acf38a3204dd4c4f52f
2021-01-22 21:54:24 -08:00
Dhruv Matani
bb909d27d5 [PyTorch Mobile] Eliminate static default_extra_files_mobile from header import.h (#50795)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50795

There's [a post](https://fb.workplace.com/groups/2148543255442743/permalink/2583012411995823/) about a customer having to pass in `-Wno-global-constructors` to disable warnings related to calling constructors for global objects. This is related to the initialization of `default_extra_files_mobile` in `import.h`.

It requires end users to pass in the compiler flag, since the definition is now in code (.cpp files) that they will be compiling.

In addition, it makes the API for `_load_for_mobile` non-re-entrant (i.e. can not be safely used concurrently from multiple threads without the caller taking a mutex/lock) if the `extra_files_mobile` argument is not explicitly passed in.

Instead, a better option would be to create different overloads; one which requires all 3 parameters, and one that can work with 1-2. This solves the problem without creating a static variable.
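A small generic illustration of the pattern being removed and its replacement (not the actual import.h declarations): a non-trivial static default argument defined in a header drags a global constructor into every client and shares mutable state across callers, while overloads avoid the static object entirely.

```
#include <string>
#include <unordered_map>

using ExtraFilesMap = std::unordered_map<std::string, std::string>;

// Before (in a header): a mutable global with a non-trivial constructor, used
// as a default argument. Clients get a -Wglobal-constructors warning, and one
// shared mutable map is not re-entrant across threads.
//   static ExtraFilesMap default_extra_files;
//   Module load_for_mobile(const std::string& path,
//                          ExtraFilesMap& extra_files = default_extra_files);

// After: two overloads, no static object needed.
struct Module {};

Module load_for_mobile(const std::string& path, ExtraFilesMap& extra_files);

Module load_for_mobile(const std::string& path) {
  ExtraFilesMap ignored;  // local, per-call state
  return load_for_mobile(path, ignored);
}

Module load_for_mobile(const std::string& /*path*/, ExtraFilesMap& /*extra_files*/) {
  return Module{};  // stand-in for the real loader
}
```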

ghstack-source-id: 120127083

Test Plan: Build Lite Interpreter and sandcastle.

Reviewed By: raziel

Differential Revision: D25968216

fbshipit-source-id: fbd80dfcafb8ef7231aca301445c4a2ca9a08995
2021-01-21 21:22:48 -08:00
Chen Lai
e05882d2a4 Back out "reuse consant from jit" (#50521)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50521

Original commit changeset: 9731ec1e0c1d

Test Plan:
- run `arc focus2 -b pp-ios //xplat/arfx/tracking/segmentation:segmentationApple -a ModelRunner --force-with-bad-commit `
- build via Xcode, run it on an iOS device
- Click "Person Segmentation"
- Crash observed without the diff patched, and the segmentation image is able to be loaded with this diff patched

Reviewed By: husthyc

Differential Revision: D25908493

fbshipit-source-id: eef072a8a3434b932cfd0646ee78159f72be5536
2021-01-14 09:50:40 -08:00
Andres Suarez
8530c65e25 [codemod][fbcode/caffe2] Apply clang-format update fixes
Test Plan: Sandcastle and visual inspection.

Reviewed By: igorsugak

Differential Revision: D25849205

fbshipit-source-id: ef664c1ad4b3ee92d5c020a5511b4ef9837a09a0
2021-01-09 14:37:36 -08:00
Chen Lai
d4c1684cf5 reuse consant from jit (#49916)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49916

Test Plan:
1. Build pytorch locally. `MACOSX_DEPLOYMENT_TARGET=10.9 CC=clang CXX=clang++ USE_CUDA=0 DEBUG=1 MAX_JOBS=16 python setup.py develop`
2. Run `python save_lite.py`
```
import torch

# ~/Documents/pytorch/data/dog.jpg
model = torch.hub.load('pytorch/vision:v0.6.0', 'shufflenet_v2_x1_0', pretrained=True)
model.eval()

# sample execution (requires torchvision)
from PIL import Image
from torchvision import transforms
import pathlib
import tempfile
import torch.utils.mobile_optimizer

input_image = Image.open('~/Documents/pytorch/data/dog.jpg')
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
input_tensor = preprocess(input_image)
input_batch = input_tensor.unsqueeze(0) # create a mini-batch as expected by the model

# move the input and model to GPU for speed if available
if torch.cuda.is_available():
    input_batch = input_batch.to('cuda')
    model.to('cuda')

with torch.no_grad():
    output = model(input_batch)
# Tensor of shape 1000, with confidence scores over Imagenet's 1000 classes
print(output[0])
# The output has unnormalized scores. To get probabilities, you can run a softmax on it.
print(torch.nn.functional.softmax(output[0], dim=0))

traced = torch.jit.trace(model, input_batch)
sum(p.numel() * p.element_size() for p in traced.parameters())
tf = pathlib.Path('~/Documents/pytorch/data/data/example_debug_map_with_tensorkey.ptl')

torch.jit.save(traced, tf.name)
print(pathlib.Path(tf.name).stat().st_size)
traced._save_for_lite_interpreter(tf.name)
print(pathlib.Path(tf.name).stat().st_size)
print(tf.name)

```

3. Run `python test_lite.py`
```
import torch
from torch.jit.mobile import _load_for_lite_interpreter
# sample execution (requires torchvision)
from PIL import Image
from torchvision import transforms

input_image = Image.open('~/Documents/pytorch/data/dog.jpg')
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
input_tensor = preprocess(input_image)
input_batch = input_tensor.unsqueeze(0) # create a mini-batch as expected by the model
reload_lite_model = _load_for_lite_interpreter('~/Documents/pytorch/experiment/example_debug_map_with_tensorkey.ptl')

with torch.no_grad():
    output_lite = reload_lite_model(input_batch)
# Tensor of shape 1000, with confidence scores over Imagenet's 1000 classes
print(output_lite[0])
# The output has unnormalized scores. To get probabilities, you can run a softmax on it.
print(torch.nn.functional.softmax(output_lite[0], dim=0))

```
4. Compare the results from PyTorch master and PyTorch built locally with this change, and confirm the same output.
5. The model size was 16.1 MB and becomes 12.9 MB with this change.

Imported from OSS

Reviewed By: kimishpatel, iseeyuan

Differential Revision: D25731596

Pulled By: cccclai

fbshipit-source-id: 9731ec1e0c1d5dc76cfa374d2ad3d5bb10990cf0
2021-01-08 22:39:28 -08:00
Dhruv Matani
4a870f6518 [PyTorch Mobile] Export Operator List from Mobile CompilationUnit instead of from TorchScript Model (#49385)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49385

Currently, the API to export operator lists accepts a `torch::jit::Module` object and spits out an operator list. The operator list is practically used only for mobile. This is not ideal because the set of root operators may change by the time the model is subsequently optimized and exported for mobile.

What we need to do instead is glean the list of operators from the mobile model itself (`bytecode.pkl` specifically), and expose that.

Also updated the logic in `converter`.

### Before this change:
1. Get operator List from Torch Script Model
2. Convert to bytecode mobile model

### After this change:
1. Convert to bytecode mobile model
2. Use this converted mobile model to get the list of operators for each method on the model

ghstack-source-id: 118796752

Test Plan:
Added a unit test in `test_lite_interpreter.cpp` to ensure that all model referenced operators show up in the exported operator list. Also make `test_lite_interpreter.cpp` runnable from `xplat/caffe2/BUCK` since this is where the production code will be built from.

Verified that the list of operators produced before and after this change for an example model (segmentation) are the same.

{P147863234}

Also verified that the operator lists for BI-Xray model is different (we have been having problems with missing operators for this one): {P154903132}

Reviewed By: iseeyuan

Differential Revision: D24690094

fbshipit-source-id: 0426a6ef90456a811010cfe337c415882ae2deff
2020-12-18 11:17:57 -08:00
Martin Yuan
2b61e4d84c Revert D25152559: T66557700 Support default argument values of a method
Test Plan: revert-hammer

Differential Revision:
D25152559 (6bde0ca6d3)

Original commit changeset: bbf52f1fbdbf

fbshipit-source-id: 592fdb3078b1ac86cd394adc6c1bfd6b10d829e1
2020-12-17 14:05:49 -08:00
Frank Seide
6bde0ca6d3 T66557700 Support default argument values of a method (#48863)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48863

Support default arguments when invoking a module via PyTorch Lite (`mobile::Module`).

Test Plan:
buck test mode/dbg //caffe2/test/cpp/jit:jit -- LiteInterpreterTest.MethodInvocation

buck test mode/dbg caffe2/test:mobile -- test_method_calls_with_optional_arg

Reviewed By: raziel, iseeyuan

Differential Revision: D25152559

fbshipit-source-id: bbf52f1fbdbfbc6f8fa8b65ab524b1cd4648f9c0
2020-12-16 15:55:03 -08:00
Martin Yuan
a1fef453b6 Support extra files in _load_for_mobile (#47425)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47425

Extra files can be exported in a lite interpreter model, but they could not be loaded. This PR adds the capability to load extra files from a lite interpreter model. Because extra_files is a default argument, it should not affect the existing usage of _load_for_mobile. It's a simple assembly of a generic unordered_map; no additional dependency should be introduced and the size overhead should be small (to be tested).

Test Plan: Imported from OSS

Reviewed By: kwanmacher

Differential Revision: D24770266

Pulled By: iseeyuan

fbshipit-source-id: 7e8bd301ce734dbbf36ae56c9decb045aeb801ce
2020-11-06 20:26:54 -08:00