Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55462
Handle and symbolicate exception callstacks thrown from a backend.
The objective of this diff is to improve error reporting when
exceptions are raised from a lowered backend. We would effectively like to
get the same model-level stack trace that you would get without having
lowered some module to a backend.
For example:
```
class AA(nn.Module):
    def forward(self, x, y):
        return x + y

class A(nn.Module):
    def __init__(...):
        self.AA0 = AA()

    def forward(self, x, y):
        return self.AA0.forward(x, y) + 3

class B(nn.Module):
    def forward(self, x):
        return x + 2

class C(nn.Module):
    def __init__(...):
        self.A0 = A()
        self.B0 = B()

    def forward(self, x, y):
        return self.A0.forward(x, y) + self.B0.forward(x)
```
If we then call C().forward(torch.rand((2, 3)), torch.rand((14, 2))), we
will likely see an error stack like:
```
C++ exception with description "The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
  File "<string>", line 3, in forward
    def forward(self, x, y):
        return self.A0.forward(x, y) + self.B0.forward(x)
               ~~~~~~~~~~~~~~~ <--- HERE
  File "<string>", line 3, in forward
    def forward(self, x, y):
        return self.AA0.forward(x, y) + 3
               ~~~~~~~~~~~~~~~~ <--- HERE
  File "<string>", line 3, in forward
    def forward(self, x, y):
        return x + y
               ~~~~~ <--- HERE
```
We would like to see the same error stack if we lowered C.A0 to some
backend.
With this diff we get something like:
```
Module hierarchy:top(C).A0(backend_with_compiler_demoLoweredModule).AA0(AA)
Traceback of TorchScript (most recent call last):
  File "<string>", line 3, in FunctionName_UNKNOWN
    def forward(self, x, y):
        return self.A0.forward(x, y) + self.B0.forward(x)
               ~~~~~~~~~~~~~~~ <--- HERE
  File "<string>", line 5, in FunctionName_UNKNOWN
        typed_inputs: List[Any] = [x, y, ]
        if self.__backend.is_available() :
            _0, = self.__backend.execute(self.__handles["forward"], typed_inputs)
                  ~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
            assert isinstance(_0, Tensor)
        return _0
  File "<string>", line 3, in FunctionName_UNKNOWN
    def forward(self, x, y):
        return self.AA0.forward(x, y) + 3
               ~~~~~~~~~~~~~~~~ <--- HERE
  File "<string>", line 3, in FunctionName_UNKNOWN
    def forward(self, x, y):
        return x + y
               ~~~~~ <--- HERE
```
This is achieved in 3 parts:
Part 1:
A. BackendDebugInfoRecorder:
During backend lowering, in `to_backend`, a BackendDebugInfoRecorder is
instantiated before calling the preprocess function corresponding to the
backend. This facilitates recording of debug info (such as source range +
inlined callstack) for the lowered module.
B. Instantiate WithBackendDebugInfoRecorder with the BackendDebugInfoRecorder.
This initializes a thread-local pointer to the BackendDebugInfoRecorder.
C. generate_debug_handles:
In the preprocess function, the backend calls generate_debug_handles
separately for each method being lowered. generate_debug_handles
takes the `Graph` of the method being lowered and returns a map
of Node* to debug_handles. The backend is responsible for storing debug
handles appropriately so that it can raise exceptions (and later do
profiling) using debug handles when the exception being raised corresponds
to a particular Node that was lowered.
Inside generate_debug_handles, we query the current
BackendDebugInfoRecorder, which issues the debug handles. This debug
handle manager issues debug handles as well as records the
debug_handles-to-<source range, inlined callstack> map.
D. Back in `to_backend`, once the preprocess function has finished
lowering the module, we call `stopRecord` on the
BackendDebugInfoRecorder. This returns the debug info map, and the
debug info is then stored inside the lowered module.
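To make the bookkeeping above concrete, here is a minimal Python sketch of a debug-handle recorder; the names (DebugInfoRecorder, the Node fields, the generate_debug_handles arguments) are illustrative stand-ins for the C++ machinery, not the actual API.
```
from collections import namedtuple
import itertools

# Illustrative stand-in for a graph node; only debug-relevant fields are modeled.
Node = namedtuple("Node", ["kind", "source_range", "inlined_callstack"])

class DebugInfoRecorder:
    """Hypothetical sketch: issues debug handles and remembers the debug info for each."""
    def __init__(self):
        self._counter = itertools.count()
        self.handle_to_debug_info = {}  # debug_handle -> (source range, inlined callstack)

    def get_next_debug_handle(self, node):
        handle = next(self._counter)
        self.handle_to_debug_info[handle] = (node.source_range, node.inlined_callstack)
        return handle

def generate_debug_handles(recorder, graph_nodes):
    # One handle per node of the method being lowered; the backend stores these
    # so it can report them when execution of a lowered node fails.
    return {node: recorder.get_next_debug_handle(node) for node in graph_nodes}

recorder = DebugInfoRecorder()
nodes = [Node("aten::add", "<string>:3", ("C.forward", "A.forward", "AA.forward"))]
node_to_handle = generate_debug_handles(recorder, nodes)
```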
Part 2:
Serialization:
During serialization for bytecode (lite interpreter), we do two
things:
1. Extract all the source ranges that are contained inside the
debug_handles-to-<source range, inlined callstack> map for the lowered
module. These are the source ranges corresponding to the debug handles,
including what is in the inlined callstacks. Since we replaced the original
module with the lowered module, we will not be serializing code for the
original module and thus have no source ranges for it. That is why the
source ranges have to be stored separately. We lump all the source ranges
for all the lowered modules into one single debug_pkl file.
2. Then we serialize the debug_handles-to-<source range, inlined
callstack> map.
During deserialization we can then reconstruct the
debug_handles-to-<source range, inlined callstack> map. Given that all
debug_handles are unique, we do not need any module information.
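A hedged Python sketch of this serialization step, assuming the map and the lumped source ranges are pickled into a single debug payload; the payload keys are illustrative and do not reflect the real archive layout.
```
import pickle

def serialize_debug_info(handle_to_debug_info):
    # Lump all source ranges (including those from inlined callstacks) together,
    # then store the handle map itself.
    source_ranges = sorted({sr for sr, _callstack in handle_to_debug_info.values()})
    payload = {
        "source_ranges": source_ranges,      # one debug_pkl-style blob for all lowered modules
        "handle_map": handle_to_debug_info,  # debug_handle -> (source range, inlined callstack)
    }
    return pickle.dumps(payload)

def deserialize_debug_info(blob):
    payload = pickle.loads(blob)
    # Handles are globally unique, so no module information is needed for lookup.
    return payload["handle_map"]

handle_map = {0: ("<string>:3", ("C.forward", "A.forward", "AA.forward"))}
assert deserialize_debug_info(serialize_debug_info(handle_map)) == handle_map
```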
Test Plan:
Tests are added in test_backend.cpp
Imported from OSS
Differential Revision: D27621330
Reviewed By: raziel
Pulled By: kimishpatel
fbshipit-source-id: 0650ec68cda0df0a945864658cab226a97ba1890
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54284
In order to bring mobile deployment, via the lite interpreter, to feature
parity with JIT with respect to model-level debug information, we must make
that debug information available to the mobile runtime.
At the moment, model-level debug information is stored in SourceRange,
which associates nodes of the graph with where they come from in the
original Python source code.
This information is serialized as part of debug_pkl and deserialized
when JIT loads the model and reads the model code.
In the lite interpreter, we do not have access to all the functionality of
JIT and hence we cannot load the model in the same way as JIT, by reading
code, constructing the module hierarchy, building the graphs corresponding
to module methods, etc. Instead, in the lite interpreter, only the bytecode
corresponding to the compiled graph, Code, is saved.
Thus, in order to annotate ops in the bytecode with equivalent
SourceRange information, we do the following:
1. During model serialization, we create a unique tag for each source
range of the model.
2. Create a map of <SourceRange, tag>.
3. During debug_pkl serialization we save the tag along with the
SourceRange, on top of the byte offset.
4. During bytecode generation, the methods of the top module are
lowered. During this process, methods are inlined. In the inlined graph,
when a node of the graph is lowered to bytecode, we query the node's source
range and look it up in the map.
5. The resulting source range tag is serialized in module_debug_info.
6. During model deserialization, we read all the debug_pkl records in
the archive and create a map of <tag, SourceRange>.
7. This map can be used to find source code information (a minimal
sketch of the tag maps follows).
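A minimal Python sketch of steps 1-2 and 6 above, assuming a SourceRange is modeled as a plain string; the function names are illustrative.
```
def build_source_range_tags(source_ranges):
    # Steps 1-2: assign a unique tag to each source range, recorded as <SourceRange, tag>.
    return {sr: tag for tag, sr in enumerate(source_ranges)}

def invert_tags(range_to_tag):
    # Step 6: at load time, rebuild the inverse <tag, SourceRange> map.
    return {tag: sr for sr, tag in range_to_tag.items()}

range_to_tag = build_source_range_tags(["<string>:3 in forward", "<string>:5 in forward"])
tag_to_range = invert_tags(range_to_tag)
assert tag_to_range[range_to_tag["<string>:3 in forward"]] == "<string>:3 in forward"
```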
During mobile runtime:
1. We read all the debug_pkl records and create a <tag=debug_handle,
SourceRange> map.
1.1 This map, MobileDebugInfo, is a member of the mobile Module.
2. The interpreter catches the appropriate exceptions, sets the thread-local
debug handle, and rethrows the exception.
3. In Function's run method we catch the exception and query the current
debug handle where the exception happened.
4. We query MobileDebugInfo with the debug handle to retrieve the source
range and augment the error with source range info (a Python analogue
follows this list).
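A hedged Python analogue of steps 2-4: record the debug handle thread-locally on failure, rethrow, and augment the error with the looked-up source range. The names (MOBILE_DEBUG_INFO, run_function, interpreter_step) are illustrative, not the mobile runtime API.
```
import threading

_tls = threading.local()
# tag=debug_handle -> SourceRange (illustrative content)
MOBILE_DEBUG_INFO = {7: "<string>:3, in forward: return x + y"}

def interpreter_step(debug_handle, op):
    try:
        op()
    except Exception:
        _tls.debug_handle = debug_handle  # step 2: remember where we failed, then rethrow
        raise

def run_function(debug_handle, op):
    try:
        interpreter_step(debug_handle, op)
    except Exception as e:
        # steps 3-4: query the current handle and augment the error with source range info
        source_range = MOBILE_DEBUG_INFO.get(getattr(_tls, "debug_handle", None), "<unknown>")
        raise RuntimeError(f"{e}\n  at {source_range}") from e

def bad_op():
    raise RuntimeError("The size of tensor a must match the size of tensor b")

# run_function(7, bad_op)  # raises with the source range appended to the message
```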
This information is still incomplete as it does not contain the entire
callstack.
In the following diffs we will serialize InlinedCallStack directly.
Note that compilation is gated by the SYMBOLICATE_MOBILE_DEBUG_HANDLE macro,
so that mobile builds can avoid building MobileDebugInfo, source ranges,
and the source range pickler/unpickler. Later we will add a path where, if
building without debug support, the stack trace will contain only debug
handles. They can be symbolicated later.
Test Plan:
Ported a bunch of source range tests from test_jit.py. Added one more test
in test_lite_interpreter.py.
Imported from OSS
Reviewed By: raziel
Differential Revision: D27174722
fbshipit-source-id: a7b7c6088ce16dec37e823c7fefa4f0b61047e12
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56807
If I understand correctly, there's no reason to create your own instance of these global singleton types.
ghstack-source-id: 127312270
Test Plan: CI
Reviewed By: SplitInfinity
Differential Revision: D27973447
fbshipit-source-id: f12df69d185f1baaa45f2ac6eac70570a7a65912
Summary:
This is an automatic change generated by the following script:
```
#!/usr/bin/env python3
from subprocess import check_output, check_call
import os

def get_compiled_files_list():
    import json
    with open("build/compile_commands.json") as f:
        data = json.load(f)
    files = [os.path.relpath(node['file']) for node in data]
    for idx, fname in enumerate(files):
        if fname.startswith('build/') and fname.endswith('.DEFAULT.cpp'):
            files[idx] = fname[len('build/'):-len('.DEFAULT.cpp')]
    return files

def run_clang_tidy(fname):
    check_call(["python3", "tools/clang_tidy.py", "-c", "build", "-x", fname, "-s"])
    changes = check_output(["git", "ls-files", "-m"])
    if len(changes) == 0:
        return
    check_call(["git", "commit", "--all", "-m", f"NOLINT stubs for {fname}"])

def main():
    git_files = check_output(["git", "ls-files"]).decode("ascii").split("\n")
    compiled_files = get_compiled_files_list()
    for idx, fname in enumerate(git_files):
        if fname not in compiled_files:
            continue
        if fname.startswith("caffe2/contrib/aten/"):
            continue
        print(f"[{idx}/{len(git_files)}] Processing {fname}")
        run_clang_tidy(fname)

if __name__ == "__main__":
    main()
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56892
Reviewed By: H-Huang
Differential Revision: D27991944
Pulled By: malfet
fbshipit-source-id: 5415e1eb2c1b34319a4f03024bfaa087007d7179
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54110
dictConstruct doesn't need to make its caller have a `shared_ptr<DictType>`. It also doesn't need to do extra `shared_ptr` copies into the `key_type` and `value_type` locals.
ghstack-source-id: 124150642
Test Plan: fitsships
Reviewed By: ezyang
Differential Revision: D27101782
fbshipit-source-id: 3c632ad9d8f1bd7bdf37f517a86aca27bd41548a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54076
If we don't constrain ourselves to use `torch::jit::pop`, we can avoid copying a string or moving IValues around.
ghstack-source-id: 124040891
Test Plan:
existing tests
spot-checked regular interpreter assembly; seems better
Reviewed By: dhruvbird, walterddr
Differential Revision: D27087204
fbshipit-source-id: 7cf355dbcec31409bdb37afa09d7df85cf2a7e4b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53201
This resulted in [S22350](https://www.internalfb.com/intern/sevmanager/view/s/223540), which caused trouble on Android.
1. The Python code has a call to `warnings.warn()`, which resulted in generated code that emits the `WARN` instruction in the lite interpreter.
2. The code handling that instruction/op-code popped the value off the stack inside a call to the `TORCH_WARN()` *macro*.
3. This macro conditionally compiles out evaluation of its arguments if `STRIP_ERROR_MESSAGES` is defined, which resulted in the stack not getting popped and the lite interpreter returning the last value pushed onto the stack (see the Python analogue below).
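For clarity, here is a small Python analogue of the pitfall (not the internal repro referenced below): a stack pop placed inside a conditionally evaluated warning argument silently stops happening when the warning path is disabled.
```
# Python analogue of the pitfall: the pop lives inside an argument that is only
# evaluated when warnings are enabled, mirroring TORCH_WARN() arguments being
# stripped out by STRIP_ERROR_MESSAGES.
STRIP_ERROR_MESSAGES = True
stack = ["message to warn about"]

def torch_warn(make_message):
    if not STRIP_ERROR_MESSAGES:
        print("WARN:", make_message())

# Buggy handling: the pop happens inside the (possibly skipped) argument.
torch_warn(lambda: stack.pop())
assert len(stack) == 1  # the value is still on the stack and leaks to the caller

# Fixed handling: pop unconditionally, then warn.
value = stack.pop()
torch_warn(lambda: value)
assert len(stack) == 0
```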
I've attempted to re-produce it using this python code: {P243842428}
ghstack-source-id: 122990001
(Note: this ignores all push blocking failures!)
Test Plan:
Created a new unit test to reproduce the failure. Was able to do so locally using the following command:
```
buck test -c pt.strip_error_messages=1 //xplat/caffe2:test_s223540
```
However, since `pt.strip_error_messages=0` for dev and continuous builds, I have had to check in a separate contbuild config to try and trigger this failure on contbuild.
Reviewed By: iseeyuan
Differential Revision: D26765662
fbshipit-source-id: 63c3c96d84ce6a9e5471f13d80165aa3718be9a2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52870
Add the missing parts to support to_backend modules in the lite interpreter.
1. Add ISINSTANCE instruction support, which is used in to_backend for output type checks.
2. Bypass the lite interpreter's type parser by checking the qualified name: if it starts with "torch.jit", use the same type resolver as nn modules (which start with "__torch__"). See the sketch below.
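An illustrative Python sketch of the qualified-name check described in item 2; the resolver callables are placeholders, not the actual C++ type parser.
```
def resolve_type(qualified_name, nn_resolver, primitive_parser):
    # Names produced for backend-lowered modules start with "torch.jit",
    # nn modules start with "__torch__"; both go to the full type resolver.
    if qualified_name.startswith("__torch__") or qualified_name.startswith("torch.jit"):
        return nn_resolver(qualified_name)
    # Everything else is handled by the lite interpreter's simple type parser.
    return primitive_parser(qualified_name)

# Placeholder resolvers, just to make the example runnable.
kind = resolve_type("torch.jit.LoweredModule", lambda n: "class", lambda n: "primitive")
assert kind == "class"
```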
Tests:
The mobile module is serialized and loaded in ```BackendTest.TestCompiler```. The results are compared to those from the original TorchScript module.
Test Plan: Imported from OSS
Reviewed By: raziel
Differential Revision: D26715351
Pulled By: iseeyuan
fbshipit-source-id: ad9d74ee81c6aa692ab9e5dd7a9003bae5d4f01f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51432
ghstack-source-id: 120976584
torchbind is a convenient way to expose a custom class to both Python and TorchScript. CREATE_OBJECT is used to create an object of a custom class.
CREATE_OBJECT was not supported by the lite interpreter. The major reason was that for custom classes defined directly in Python, there is no language parser in the lite interpreter; that is still the case. However, for torchbind classes that are defined in C++, a Python/TorchScript parser is not needed.
This diff supports the case of torchbind custom classes.
1. The class type can be resolved at import level.
2. If the class is not a supported torchbind class, an error message is provided at export stage and a workaround is suggested.
3. Unit tests. C++: ```LiteInterpreterTest::BuiltinClass``` is added as an end-to-end test of a supported class. Python: ```test_unsupported_createobject``` is changed to ```test_unsupported_classtype``` to test unsupported classes.
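For reference, a hedged usage sketch of a torchbind class from TorchScript (so that CREATE_OBJECT is emitted for it); the library path and the my_classes.MyStackClass name follow the custom-class tutorial pattern and are placeholders, not part of this diff.
```
import torch

# Placeholder: a C++ torchbind class registered via torch::class_ and built into this library.
torch.classes.load_library("build/libmy_torchbind_classes.so")

class Wrapper(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.stack = torch.classes.my_classes.MyStackClass(["init"])

    def forward(self, item: str) -> str:
        self.stack.push(item)
        return self.stack.pop()

scripted = torch.jit.script(Wrapper())  # the torchbind type resolves at script/import time
```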
Test Plan: CI
Reviewed By: raziel
Differential Revision: D26168913
fbshipit-source-id: 74e8b6a12682ad8e9c39afdfd2b605c5f8e65427
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47457
This fixes two issues.
1. The lite interpreter's record_function is intended to be used only for root-op
profiling. At the moment, if RECORD_FUNCTION is enabled via the Dispatcher, it
logs not just root ops but all ops.
2. Because the interpreter sets an op index that later gets picked up elsewhere
(decoupled design), the op index set in the lite interpreter ends up getting
used by all the record-function calls, not just the root op. Thus we don't
really get correct per-op profiling. This diff also fixes this issue.
Reviewed By: ilia-cher
Differential Revision: D24763689
fbshipit-source-id: 6c1f8bcaec9fb5ebacb2743a5dcf7090ceb176b9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47796
`ThreadLocalDebugInfo::get()` is a hot function. For example, it is called by `DefaultCPUAllocator::allocate()`. Most callers do not even bother to keep the returned `shared_ptr` around, proving that they have no lifetime issues currently. For the rest, it appears that the only way that the returned pointer could become invalid is if they then called a function that swapped out `ThreadLocalDebugInfo` using `ThreadLocalStateGuard`. There are very few such paths, and it doesn't look like any current callers of `ThreadLocalDebugInfo::get()` needed a `shared_ptr` at all.
ghstack-source-id: 116979577
Test Plan:
1) reviewers to double-check audit of safety
2) run framework overhead benchmarks
Reviewed By: dzhulgakov
Differential Revision: D24902978
fbshipit-source-id: d684737cc2568534cac7cd3fb8d623b971c2fd28
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39607
Add an overload name to the strcmp macro to prevent duplicated op names in the lite interpreter.
Also reformatted some other files.
Test Plan:
Verified that these op schemas are changed:
```
-aten::eq(str a, str b) -> (bool)
+aten::eq.str(str a, str b) -> (bool)
-aten::ne(str a, str b) -> (bool)
+aten::ne.str(str a, str b) -> (bool)
-aten::lt(str a, str b) -> (bool)
+aten::lt.str(str a, str b) -> (bool)
-aten::gt(str a, str b) -> (bool)
+aten::gt.str(str a, str b) -> (bool)
-aten::le(str a, str b) -> (bool)
+aten::le.str(str a, str b) -> (bool)
-aten::ge(str a, str b) -> (bool)
+aten::ge.str(str a, str b) -> (bool)
```
Reviewed By: iseeyuan
Differential Revision: D21913049
fbshipit-source-id: 518db068c8c5b0efd19223f0bd94fc3351335dc4
Summary: Fix operator perf observer index issue.
Test Plan:
Made sure that the operator index is populated correctly; ran benchmarking for pytext_mobile_inference, see result:
https://www.internalfb.com/intern/aibench/details/598900068317693
Reviewed By: linbinyu
Differential Revision: D21779222
fbshipit-source-id: 0fc3561d83d10cfabd73e1e6b6ee240ce0bafd80
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38489
Remove module and operator observer macros.
ghstack-source-id: 104290763
Test Plan:
a. Verify that QPL is being sent while testing FB4A BI Cloaking:
{F236982877}
b. Verify that AI Benchmark is working on both module and operator level:
https://our.intern.facebook.com/intern/aibench/details/808056762618979
c. Verify the macOS segmentation effect by running buck run xplat/arfx/tracking/segmentation/tools:person_segmentation_demoAppleMac#macosx-x86_64:
{F236982853}
Reviewed By: ljk53
Differential Revision: D21540838
fbshipit-source-id: 516f84ef5673d4ceed38ae152440a5cbacc6ddaa
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37962
Temporarily re-enable RecordFunction in the lite interpreter when the profiler key is not set;
this allows the profiler to work without profiled wrappers in the build.
Test Plan: CI
Reviewed By: smessmer, linbinyu
Differential Revision: D21409120
fbshipit-source-id: 6f0311c8eb55537a03b8bdac69def18a496ec672
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37548
Moving RecordFunction from torch::autograd::profiler into the at namespace.
Test Plan:
CI
Imported from OSS
Differential Revision: D21315852
fbshipit-source-id: 4a4dbabf116c162f9aef0da8606590ec3f3847aa
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37472
Our convention is for `findX` to return an optional version and `getX`
to assert that the X is there. Fix up `getMethod` to be consistent with
this convention.
Test Plan: Imported from OSS
Differential Revision: D21297543
Pulled By: suo
fbshipit-source-id: b40f56231cc8183e61bbb01fe5c0c113bcb6464d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36296
When there's no overload name, the operator name string should be "name", instead of "name.".
Test Plan: Imported from OSS
Differential Revision: D20966759
Pulled By: iseeyuan
fbshipit-source-id: b4b31923c7ec5cdca8ac919bd6a84ba51afb6cd1
Summary: This diff fixes issues with the current handling of debug information passed along during the execution of the model. (For example, it is possible that multiple calls to the debug guard may override each other.)
Test Plan: CI test/cpp/jit
Reviewed By: dzhulgakov
Differential Revision: D20602775
fbshipit-source-id: 4683957954028af81a1a0f1f12b243650230c9bb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35115
This commit runs the newly added tools/clang_format.py on the JIT
codebase and includes all of the formatting changes thus produced.
Testing:
Ran the script, CI.
Test Plan: Imported from OSS
Reviewed By: eellison
Differential Revision: D20568523
Pulled By: SplitInfinity
fbshipit-source-id: e09bdb982ccf090eecfb7c7b461b8d0681eef82b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34623
The bandaid of "AT_WARN" keeps introducing new warnings. Let's get rid
of it entirely.
Closes #34502
Test Plan: Imported from OSS
Differential Revision: D20420112
Pulled By: albanD
fbshipit-source-id: 7160c113cb4deb2d2f50a375356f423fe5e86f50
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33717
Because of the special treatment of operator names in the lite interpreter, all the operators used in the lite interpreter are still prefixed with "_". Add the necessary registrations for the MNIST model. All the ops with autograd capability are included in torch_mobile_train. After rebasing, the selective build from D19649074 can be used to strip the unused ops.
Note that this diff is a feasibility test; training accuracy is not covered by the test.
ghstack-source-id: 97780066
Test Plan:
```
buck run xplat/caffe2/fb/lite_trainer:lite_trainer -c pt.disable_gen_tracing=1 -c pt.static_dispatch=0 -- --model=/path/MnistModel.bc
```
{F227898221}
Reviewed By: dreiss
Differential Revision: D19743201
fbshipit-source-id: cacadd76f3729faa0018d147a69466bbf54312fd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33294
1. Serialize the bytecode of __setstate__ and run it when loading the model.
2. One use case is quantization. To test this use case, a few operators are temporarily registered for the lite interpreter. The "_" prefix registrations will be removed when the operators are all migrated to mobile.
Test Plan: Imported from OSS
Differential Revision: D20162898
Pulled By: iseeyuan
fbshipit-source-id: 7a3180807bf38fbce594d86993896861f12bb58c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33667
Pass shared_ptr properly according to C++ guidelines. Thanks to kimishpatel for pointing it out.
Test Plan: Imported from OSS
Differential Revision: D20111001
Pulled By: iseeyuan
fbshipit-source-id: 213a0f950a7f3b9199d789dc0155911f6102d77a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33555
A quick fix for the PyText model (in internal production) on the new bytecode format.
Test Plan: Imported from OSS
Differential Revision: D20008266
Pulled By: iseeyuan
fbshipit-source-id: 1916bd0bf41093898713c567c7f6fa546b9ea440
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32889
Common primitive ops that have special inputs make it very hard to
serialize the bytecode for mobile because information about how the
op behaves is hidden in the Node*. This changes how we handle the following
ops so that they are encoded as their own interpreter bytecodes.
```
USES NODE: prim::TupleUnpack(...) -> (...)
USES NODE: prim::TupleSlice(...) -> (...)
USES NODE: prim::TupleConstruct(...) -> (...)
USES NODE: prim::ListUnpack(...) -> (...)
USES NODE: prim::ListConstruct(...) -> (...)
USES NODE: prim::DictConstruct(...) -> (...)
USES NODE: prim::Constant() -> (...)
USES NODE: prim::isinstance(...) -> (...)
USES NODE: prim::CreateObject(...) -> (...)
USES NODE: prim::fork(...) -> (...)
USES NODE: aten::warn(str message, *, int stacklevel=2) -> () # need stack level information, so ideally in interpreter so it can look at the stack
```
This leaves a state where the _only_ remaining Node*-consuming builtins
are things that are only introduced during JIT optimization and will
not appear in mobile code.
Serialization of bytecode can now be made to directly write the CodeImpl
object without modification.
Test Plan: Imported from OSS
Differential Revision: D19673157
Pulled By: zdevito
fbshipit-source-id: 7b8c633d38a4c783b250fbdb222705e71a83ad26
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30734
What are specialized lists?
The IValues that hold List[int], List[Tensor], and List[AnythingElse] are different C++ types.
e.g. List[int] has a std::vector<int> while List[AnythingElse] holds a std::vector<IValue>.
Why do we have specialized lists?
When we first created the JIT we needed to bind the ATen C++ API which has std::vector<int>,
std::vector<Tensor> as inputs. The easiest way to match this API was to make our IValues contain
these same types. Conversion was just unwrapping the IValue, very easy and cheap.
What is the problem with specialized lists?
We end up with significant special-casing throughout the compiler. Other types like Dict are not
specialized, so in the Pickler, for instance, there is a single piece of logic to handle
their serialization. For Lists, we end up with multiple cases. Furthermore, it doesn't
match Python, leading to problems along translation boundaries. Our pickle serialization
is slightly different from Python's, so it is harder to load objects from our IValue serialization
as Python values.
They also make it harder to provide an easy-to-use user API. We'd like to match pybind11 for C++
bindings to TorchScript. This would entail having a single torch::List class (untemplated)
that can be used to construct inputs. This is made much harder if the underlying ivalue needs
to be different depending on the type inside the list. The ideal case would be to have a constructor like
```
template<typename T>
List(std::vector<T> foo);
```
It would then set up the type tags correctly based on type T, without the need for passing tags.
Do specialized lists improve perf?
Not in a way we have been able to measure. Our major concern initially was having to translate
a std::vector<IValue> to std::vector<int> to call ATen functions. This was especially a concern
for aten::_convolution which takes a number of mostly-constant lists of integers. However,
when we measure the effect of actually having to do this conversion for an aten::_convolution,
it does not take measurable time (benchmark results below).
This is true even if you use a trivial convolution (e.g. 1x1x1), and comment out the actual convolution code.
What are the issues removing them?
This PR removes list specialization but keeps the serialization format and IValue APIs almost exactly
the same. The only visible change is that toTensorListRef and family have turned into toTensorVector,
because they now return by value a copy of the list as a vector.
Further PRs can then clean up the complexity issues that arose from specialization. This will likely
involve removing the isTensorList/isIntList functions, and refactoring the code that used them to
work generically. At some point we will also change serialization to no longer write specialized
lists in the pickle binary. That is forward incompatible, so it will go in its own PR.
Benchmark:
```
import torch
import torch.nn as nn
import torch.nn.functional as F
import time

class MnistNet(nn.Module):
    def __init__(self):
        super(MnistNet, self).__init__()
        self.conv1 = nn.Conv2d(1, 1, kernel_size=1)
        self.conv2 = nn.Conv2d(1, 1, kernel_size=1)

    def forward(self, x):
        for i in range(10):
            x = F.relu(self.conv1(x))
            x = F.relu(self.conv2(x))
        return x

model = MnistNet()
x = torch.rand(1, 1, 1, 1)
r = torch.jit.trace(model, x)
r(x)
r(x)
r(x)
r(x)
print(torch.jit.last_executed_optimized_graph())

while True:
    b = time.time()
    for i in range(100):
        r(x)
    e = time.time()
    print(e - b)
```
Results (no observable difference):
```
Before (actual conv)
0.13251137733459473
0.13260436058044434
0.13276338577270508
0.1327497959136963
0.13250041007995605
0.13270330429077148
0.13290190696716309
0.13265132904052734
0.13274288177490234
0.1326758861541748
0.13253355026245117
0.13254785537719727
0.13260746002197266
0.13285017013549805
0.13264012336730957
0.132490873336792
0.13280034065246582
0.13243484497070312
0.1325232982635498
0.1326127052307129
0.13264131546020508
0.13274383544921875
0.13298296928405762
0.1326909065246582
-------------------
After (actual conv)
0.13127517700195312
0.13150334358215332
0.13092470169067383
0.13102364540100098
0.13134360313415527
0.13155555725097656
0.13314104080200195
0.13151955604553223
0.13160037994384766
0.1315293312072754
0.13137340545654297
0.13148093223571777
0.131455659866333
0.1327371597290039
0.13134026527404785
0.13152337074279785
0.13151192665100098
0.13165974617004395
0.13403725624084473
0.13251852989196777
0.13135504722595215
0.1315624713897705
0.1317615509033203
0.1314380168914795
0.13157200813293457
--------------------
The following replace the convolution operator with a no-op, to show
that even if the conv op was made faster, then we still would not see
a difference:
Before (fake conv)
0.0069539546966552734
0.0069522857666015625
0.007120847702026367
0.007344722747802734
0.007689952850341797
0.007932662963867188
0.00761723518371582
0.007501363754272461
0.007532835006713867
0.007141828536987305
0.007174253463745117
0.007114410400390625
0.007071495056152344
------------------
After (fake conv)
0.007458209991455078
0.007337093353271484
0.007268190383911133
0.007313251495361328
0.007306575775146484
0.007468700408935547
0.0073091983795166016
0.007308483123779297
0.007538318634033203
0.007356882095336914
0.007464170455932617
0.007372140884399414
```
Test Plan: Imported from OSS
Differential Revision: D18814702
Pulled By: zdevito
fbshipit-source-id: 0371c73b63068fdc12f24b801371ea90f23531a6
Summary: Add a mobile operator observer to measure the performance of each operator run; the results are also logged into the QPL event [MOBILE_OPERATOR_STATS](https://fburl.com/quicklog/8773a00a).
Test Plan:
Run pytext model through BI cloaking flow on lite-interpreter and verify logs are sent:
1. buck install -r fb4a
2. Go to internal setting and find MobileConfig, search for android_bi_infra_cloaking_iab_models and set the following params:
a. sample_rate: 1.0
b. enabled: true
c. use_bytedoc_pytorch_model: true
d. use_bytedoc_caffe2_model: false
e. use_full_jit: false
3. Go back to News Feed and scroll down until you find an ad that directs you to an offsite webpage;
4. Click on the ad and wait for the offsite page to load;
5. Click back to News Feed;
6. Go to the scuba table https://fburl.com/scuba/er7t4g9u and see that all the operator runs have been logged:
{F223250762}
Reviewed By: ljk53
Differential Revision: D18131224
fbshipit-source-id: 23e2f6e2a9851c04b29511b45dc53f3cce03e8a0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29960
Overload names are required for mobile operators with the same name but different schemas. Since they are not used in JIT, it is safe to add overload names to JIT operators.
Test Plan: Imported from OSS
Differential Revision: D18555484
fbshipit-source-id: b451379af24e255d8b0c61b964ae32fd1a64ed34
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27546
Add the files in the csrc/jit/mobile folder to torch_core, as a first step toward having the lite interpreter built in BUCK. Next, the files will be made independent of torch_core (T54912812).
ghstack-source-id: 91523987
Test Plan:
buck build -c pytorch.enable_rtti=1 -c project.ignore= -c ndk.app_platform=android-23 -c user.libcxx_cflags=-DFOLLY_USE_LIBCPP=1 -c user.libcxx_cxxflags=-DFOLLY_USE_LIBCPP=1 -c ndk.cxx_runtime=libcxx -c user.ndk_cxxflags=-g0 //xplat/experimental/pytorch/mobile:lite_predictorAndroid#android-armv7 && adb push buck-out/gen/xplat/experimental/pytorch/mobile/lite_predictorAndroid#android-armv7 /data/local/tmp/
In adb shell:
data/local/tmp/lite_predictorAndroid\#android-armv7 add_it.bc
buck build -c project.ignore= @//fbcode/mode/dev-asan //xplat/experimental/pytorch/mobile:lite_predictor
Reviewed By: ljk53
Differential Revision: D17717547
fbshipit-source-id: 4c00a35eb231968d05d0d7b56bcfd5dc0258d4bb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27104
* The use case here is to replace prim::ListConstruct, which requires Node, but Node is not available in the mobile lite interpreter.
* (OPN, X, N): X is the index into the vararg operator-name and operator tables; N is the number of inputs. For the ListConstruct example, the operator name can be "aten::listconstruct" and the overload name is the output type ("int", "float", "bool", "tensor" or "generic").
* A vararg operator table is built with void(int input_size, Stack& stack) functions (a Python analogue is sketched after this list).
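A minimal Python analogue of the (OPN, X, N) dispatch described above; the table layout and function names are illustrative, not the actual C++ implementation.
```
def listconstruct(n, stack):
    # Pop the last n values off the stack and push them back as one list,
    # which is what prim::ListConstruct does without needing a Node.
    args = stack[len(stack) - n:]
    del stack[len(stack) - n:]
    stack.append(list(args))

# Vararg operator table indexed by X from the (OPN, X, N) instruction.
VARARG_OP_TABLE = [listconstruct]

def run_opn(x, n, stack):
    VARARG_OP_TABLE[x](n, stack)

stack = [1, 2, 3]
run_opn(0, 3, stack)   # behaves like prim::ListConstruct over three inputs
print(stack)           # [[1, 2, 3]]
```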
## Unit test
LiteInterpreterConv covers OPN instruction and conv operator.
Test Plan: Imported from OSS
Differential Revision: D17762853
fbshipit-source-id: 475aa0c6678e3760cec805862a78510913a89c83
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25187
The bytecode export flow: dump the bytecode format for the lightweight interpreter.
* The bytecode is generated without input spec optimization. It would be more generic (input independent) with no obvious performance degradation (to be tested).
* Main API: torch::jit::script::Module::save(filename, extra_files, bool *bytecode_format* = false).
* Both bytecode and module object are exported in pickle format.
* The module object (in data.pkl) is the same as the original JIT model.
* The serializer is dependent on pickle only (no protobuf or Json).
* The major functionality is forked in ScriptModuleSerializer2::serialize().
* The test loader is test_bc_export.cpp.
* Simple APIs are added in Code and its implementation to get necessary information (instructions, operators and constants).
* Since there's no dependency on graph/node, GetAttr is promoted from an operator to first-class instruction (https://github.com/pytorch/pytorch/pull/25151) .
* Some definitions (instructions, writeArchive, etc) that are shared by full JIT and bytecode are pulled out of the local namespace (https://github.com/pytorch/pytorch/pull/25148).
The output layout looks like:
* folders of methods.
* In each method folder (for example, forward/):
* bytecode.pkl: instructions and operators
* constants{.pkl,/}: the constant list in constants.pkl. If there are tensors in the constants, the binary tensor files go in the constants/ folder.
* data{.pkl,/}: the module object, with binary tensor files in the data/ folder; the same as in TorchScript (see the inspection sketch below).
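Assuming the exported file is the usual TorchScript zip container, a quick way to eyeball this layout is to list its entries; the file name below is a placeholder and the exact entry paths depend on the method folders described above.
```
import zipfile

# Placeholder file name; any module saved with bytecode_format=True would do.
with zipfile.ZipFile("model.bc") as archive:
    for name in archive.namelist():
        # Expect entries such as .../bytecode.pkl, .../constants.pkl, and data.pkl
        print(name)
```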
Test Plan: Imported from OSS
Differential Revision: D17076411
fbshipit-source-id: 46eb298e7320d1e585b0101effc0fcfd09219046