pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-06 12:20:52 +01:00

Author	SHA1	Message	Date
Bo Wu	bf610f08b0	Back out "Make TorchScript Preserve Fully Qualified Class Name for Python Exceptions" Summary: as title Test Plan: ``` buck run mode/opt-split-dwarf -c=python.package_style=inplace //ai_infra/distributed_ai/pyper_test_framework/templates:pyper_release_v2 -- --model inline_cvr_post_imp_deterministic_shrunk_pyper_release_v2 --cluster TSCTestCluster --hpc_identity oncall_pyper_oncall --stage prod_offline_training --test_module training_platform ... ############## Start inline_cvr_post_imp_model Test Results Analysis ############## I1226 22:03:56.789000 3346280 test_driver.py:139 UNKNOWN ] Test finished in 808.2743511786684 seconds. +-------------------------+---------+------------------------+-----------------+ | Test Case | Status | Message | Model Entity ID | +-------------------------+---------+------------------------+-----------------+ | SmallWorld_release_test | Success | finished successfully. | 987987491 | +-------------------------+---------+------------------------+-----------------+ I1226 22:03:56.790000 3346280 test_driver.py:143 UNKNOWN ] test_run_id: 3d085f61-28d1-411d-bd27-940ea2554b23 use this id to find your run in scuba pyper_test_framework I1226 22:03:56.792000 3346280 test_driver.py:160 UNKNOWN ] Calling cleanup I1226 22:03:56.792000 3346280 training_platform_test_launcher.py:385 UNKNOWN ] Stopping launched jobs 1 I1226 22:03:59.563122 3346280 ClientSingletonManager.cpp:100] Shutting down Manifold ClientSingletonManager ``` Reviewed By: seemethere Differential Revision: D33325936 fbshipit-source-id: 64414bf7061ad77e8ac12eb8abafee4043e0fa1e	2021-12-27 09:11:46 -08:00
Shunting Zhang	911d527b87	Make TorchScript Preserve Fully Qualified Class Name for Python Exceptions (#70339 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70339 When a python program is translated to TorchScript, the python exception type is dropped. This makes users's life hard when they need to categorize errors based more than only exception message. Here we make the change so when we raise a python exception, we record the fully qualified class name for the exception. Later on when the TorchScript is interpreted, a special exception CustomJITException is thrown. User can get the python class name from CustomJITException::getPythonClassName . Note that, this diff does not customize the mapping from C++ exception to Python exception. It's left to the users to do whatever mapping they want. Code under scripts/shunting are just my own experimental code. I can split them out if requested. ghstack-source-id: 146221879 Test Plan: buck test mode/opt //caffe2/test:jit Reviewed By: gmagogsfm Differential Revision: D33282878 fbshipit-source-id: 910f67a764519f1053a48589d1a34df69001525d	2021-12-24 00:25:40 -08:00
David Berard	aa9fbb9ae9	[JIT] check stack size after calling operator (#68788 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68788 In debug mode, this should throw errors for ops where the wrong number ops is returned (i.e. the number of values left on the stack is different from the number shown in the schema) Test Plan: Run this in debug mode and verify that it doesn't throw an assert ``` import torch class Thing(torch.nn.Module): @torch.jit.export def en(self, x: torch.Tensor): return torch.add(x, 2.0) def forward(self, x: torch.Tensor, y: torch.Tensor): a = torch.mm(x, y) b = torch.nn.functional.gelu(a) c = self.en(b) return c.std_mean() if __name__ == '__main__': unsc = Thing() thing = torch.jit.script(unsc) x = torch.randn(4, 4) y = torch.randn(4, 4) std, mean = thing.forward(x, y) print(std, mean) print(str(thing.forward.graph)) ``` Reviewed By: gchanan Differential Revision: D32625256 Pulled By: davidberard98 fbshipit-source-id: 61d5ec0c5a9f8b43706257119f4f524bb9dbe6f5	2021-12-07 11:43:50 -08:00
Scott Wolchok	3e45739543	[PyTorch][JIT] Use stack.pop_back() instead of pop(stack) for DROP (#69326 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69326 Looks like this really is slightly cheaper (see assembly diff screenshot in internal test plan). The problem is that `pop()` returns the value, so we have to spend instructions moving it out of the stack and then destroying it via a local. ghstack-source-id: 144641680 Test Plan: {F684148304} CI Reviewed By: zhxchen17 Differential Revision: D32812841 fbshipit-source-id: e9e43458d3364842f67edd43e43575a1f72e3cb0	2021-12-03 11:09:05 -08:00
Scott Wolchok	2c84b010e6	[PyTorch] Use toObjectRef in JIT interpreter (#69324 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69324 This slightly shrinks runImpl. Before: - Move pointer out of IValue - Clear the IValue to none - Do our thing with the Object - destroy the intrusive_ptr on the C stack - destroy the IValue on the C stack (even though it was cleared to None, the destructor has to run anyway) After: - Grab the pointer out of IValue - Do our thing with the Object - Decref the pointer in the IValue on the JIT stack as we assign over it We should be saving at least the memory traffic from clearing the IValue and possibly the dtor code as well. ghstack-source-id: 144638920 Test Plan: Inspected assembly to verify shorter runImpl Tried to microbenchmark (D32809454) but can't show a difference. Reviewed By: gchanan Differential Revision: D32812252 fbshipit-source-id: a3689f061ee51ef01e4696bd4c6ffcbc41c30af5	2021-12-03 11:07:16 -08:00
Han Qi	4eb772fde6	Refactor saving jit::Module to mobile .pt in 2 steps: (#66494 ) Summary: 1. is to convert Function -> mobile::Function 2. is to serialize mobile::Function This also opens opportunity to create mobile::Module without saving/reloading Fixes #{issue number} Pull Request resolved: https://github.com/pytorch/pytorch/pull/66494 Reviewed By: zhxchen17 Differential Revision: D32293022 Pulled By: qihqi fbshipit-source-id: 29b43d47ff86071d5e2f9d6ca4dba4445711ce3d	2021-11-17 12:02:20 -08:00
Scott Wolchok	7cd62621fb	[PyTorch] Adopt faster Tuple::create (#65381 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65381 The previous diff adds a way to make Tuples of size 3 or less more efficiently. This diff makes it easier to hit that path and updates a bunch of callsites to hit it. ghstack-source-id: 142065832 Test Plan: CI Reviewed By: ezyang Differential Revision: D31069538 fbshipit-source-id: d04da3709594ed68ab1c0a1471f8cffd8d001628	2021-11-02 10:10:31 -07:00
Zhengxu Chen	5ef62c88a9	[jit] Replace get_executor() with call() in abstract Function interface. (#65969 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65969 ghstack-source-id: 141759210 Test Plan: no behavior change. Reviewed By: anjali411 Differential Revision: D31326151 fbshipit-source-id: 201f6dc4c23fdb2531f6b8c73d26127f9e212de4	2021-10-28 13:11:29 -07:00
Giuseppe Ottaviano	72803dbcfd	[caffe2] Fix invalid vector accesses and polar() call (#66757 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66757 `InterpreterStateImpl::run()` gets the number of outputs from the current frame, but by the time the continuation completes, the frame is gone, so we're calling `front()` on an empty vector. This works out in practice (data is still there) but it is technically undefined behavior and could break in the future. Also, `std::polar()` expects its argument to be non-negative, but `c10::polar()` does not, so implement it explicitly (implementation is the same as libstdc++). Test Plan: JIT tests pass. Reviewed By: zhxchen17 Differential Revision: D31715587 fbshipit-source-id: 98abcc10c2742887af866d8e70169a0187c41d33	2021-10-19 00:29:54 -07:00
Chen Lai	8d5b95019d	[PyTorch Edge] Support default args with out arg, flag off (#63540 ) Summary: 1. Allow consuming operators with defaults arguments and out arguments. Flag is off to keep the same behavior as v6, in pr 63651, turn on the flag. 2. Add two unittests to cover this type of operators. Pull Request resolved: https://github.com/pytorch/pytorch/pull/63540 ghstack-source-id: 137211562 Test Plan: ``` caffe2/test/cpp/jit:jit - LiteInterpreterTest.DefaultArgsWithOutArg caffe2/test/cpp/jit:jit - LiteInterpreterTest.DefaultArgsPinvWithOutArg ``` Reviewed By: raziel, iseeyuan, tugsbayasgalan Differential Revision: D30414156 fbshipit-source-id: 0f3a219a22aee10ac53184cbd95940726c459d1f	2021-09-02 01:36:16 -07:00
Zhengxu Chen	ac99d63f83	[jit] Make operation call accept Stack& instead Stack* (#63414 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63414 Misuse of raw pointer in here where stack is never nullable. ghstack-source-id: 136938318 Test Plan: compiles. Imported from OSS Reviewed By: ejguan Differential Revision: D30375410 fbshipit-source-id: 9d65b620bb76d90d886c800f54308520095d58ee	2021-08-30 11:49:20 -07:00
Don Jang	e7724bb100	[JIT] Set future's error to current exception as is when `--torch_jit_enable_rethrow_caught_exception=true` (#63348 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63348 This change addresses singlaiiit's comment on D30241792 (`61b49c8e41`), which makes the JIT interpreter's behavior consistent between `future` is set and not. Test Plan: Enhanced `EnableRethrowCaughtExceptionTest.EnableRethrowCaughtExceptionTestRethrowsCaughtException` to cover the modified code path. Reviewed By: singlaiiit Differential Revision: D30347782 fbshipit-source-id: 79ce57283154ca4372e5341217d942398db21ac8	2021-08-16 17:32:13 -07:00
Kimish Patel	54f2eb6e7e	[Pytorch Profiler] Add support for adding module hierarchy to (#61792 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61792 KinetoEvent This PR adds module hierarchy information to events. What is module hierarchy information attached to events? During profiling a TorchScript module, when events are added, we ask JIT what is the module hierarchy associated with the node being executed. At the time of execution of that node, there might be multiple frames in the stack of interpreter. For each frame, we find corresponding node and the corresponding module hierarchy is queried. Module hierarchy corresponding to the node is associated with node's InlinedCallStack. InlinedCallStack of node tracks the path via which the node is inlined. Thus during the inlining process we annotate module information corresponding to the CallMethod nodes being inlined. With this PR, chrome trace will contain additional metadata: "Module Hierarchy". This can look like this: TOP(ResNet)::forward.SELF(ResNet)::_forward_impl.layer1(Sequential)::forward.0(BasicBlock)::forward.conv1(Conv2d)::forward.SELF(Conv2d)::_conv_forward It contains module instance, type name and the method name in the callstack. Test Plan: test_profiler Imported from OSS Reviewed By: raziel, ilia-cher Differential Revision: D29745442 fbshipit-source-id: dc8dfaf7c5b8ab256ff0b2ef1e5ec265ca366528	2021-08-13 21:39:10 -07:00
Don Jang	61b49c8e41	[JIT] Add a flag to rethrow caught exception in jit interpreter (#63073 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63073 It turned out that it's less than ideal to print out verbose stacktrace in exception messages in high-QPS services (see the related task) with a non-significant failure rate due to the truncation of long stacktrace which results in losing the original exception message thrown from native code. It is actually desirable to retain only the message of the original exception directly thrown from native code in such a usecase. This change adds a new flag `torch_jit_disable_exception_stacktrace` to the pytorch jit interpreter to suppress stacktrace in the messages of exception thrown from the interpreter. Reviewed By: Krovatkin Differential Revision: D30241792 fbshipit-source-id: c340225c69286663cbd857bd31ba6f1736b1ac4c	2021-08-13 08:44:24 -07:00
Richard Barnes	4fdb9579fa	irange-ify 12 (#62120 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62120 Test Plan: Sandcastle Reviewed By: malfet Differential Revision: D29879713 fbshipit-source-id: 3084a5eacb722f7fb0a630d47bf694f4d6831136	2021-08-09 15:31:51 -07:00
Nikita Shulga	a9b0a921d5	Disable `avoid-non-const-global-variables` lint check (#62008 ) Summary: As GoogleTest `TEST` macro is non-compliant with it as well as `DEFINE_DISPATCH` All changes but the ones to `.clang-tidy` are generated using following script: ``` for i in `find . -type f -iname "*.c*" -or -iname "*.h"|xargs grep cppcoreguidelines-avoid-non-const-global-variables|cut -f1 -d:|sort|uniq`; do sed -i "/\/\/ NOLINTNEXTLINE(cppcoreguidelines-avoid-non-const-global-variables)/d" $i; done ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/62008 Reviewed By: driazati, r-barnes Differential Revision: D29838584 Pulled By: malfet fbshipit-source-id: 1b2f8602c945bd4ce50a9bfdd204755556e31d13	2021-07-22 18:04:40 -07:00
Zhengxu Chen	6643df2680	[jit] Use computed loop to dispatch to next instruction in interpreter. (#60211 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60211 Test Plan: Imported from OSS Reviewed By: gmagogsfm Differential Revision: D29211283 fbshipit-source-id: 2f87b5a78d4fc00ce11ed509fc15db35332690b6	2021-06-30 17:44:26 -07:00
Richard Barnes	3979cb0656	irange for size_t (#55320 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55320 Test Plan: Sandcastle Reviewed By: ngimel Differential Revision: D27572577 fbshipit-source-id: 97710fd2bb1303006b05828a0d1343b0b59ccb03	2021-06-03 01:04:13 -07:00
Zhengxu Chen	2b0ec9c3cf	Reapply "[jit] Implement ScriptProfile to collect instruction profiles." (#58783 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58783 This reverts commit `fc804b5def`. Test Plan: Imported from OSS Reviewed By: gmagogsfm Differential Revision: D28617037 Pulled By: zhxchen17 fbshipit-source-id: 645de2ede20500a5c218d6ec3c7faae94de37a14	2021-05-24 18:23:21 -07:00
Edward Yang	fc804b5def	Revert D28133579: [jit] Implement ScriptProfile to collect instruction profiles. Test Plan: revert-hammer Differential Revision: D28133579 (`034a238bab`) Original commit changeset: e7e30e961513 fbshipit-source-id: 5a7756468b4f2eeed24d2abb7b52ab46d081a95e	2021-05-21 08:18:40 -07:00
Zhengxu Chen	034a238bab	[jit] Implement ScriptProfile to collect instruction profiles. (#57397 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57397 Introduces two main classes in C++ runtime: ScriptProfile is the implementation for enabling and disabling interpreter profiling in C++. This should be only used from Python, and we will add corresponding Python API in the next diff. InstructionSpan is a utility class to instrument execution of each single instruction. A start timestamp is recorded in the constructor, and an end timestamp is recorded in the destructor. During destruction, this will send runtime data to all enabled ScriptProfile instances. Test Plan: build/bin/test_jit --gtest_filter='ScriptProfileTest.Basic' Imported from OSS Reviewed By: gmagogsfm Differential Revision: D28133579 fbshipit-source-id: e7e30e96151367022793ab3ad323f01c51ad4a3b	2021-05-20 14:11:03 -07:00
Tugsbayasgalan (Tugsuu) Manlaibaatar	fc9c486044	Add enabling default instructions flag for mobile (#57778 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57778 Test Plan: Imported from OSS Reviewed By: iseeyuan Differential Revision: D28268997 Pulled By: tugsbayasgalan fbshipit-source-id: 5571b233d03d3aa80c820ee4245b4d0d3b70f924	2021-05-10 17:26:05 -07:00
Tugsbayasgalan (Tugsuu) Manlaibaatar	b0c27b44cf	Enable backward/forward compatibility for TS runtime (#57498 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57498 Test Plan: Imported from OSS Reviewed By: albanD Differential Revision: D28162448 Pulled By: tugsbayasgalan fbshipit-source-id: 5c21ced42a22aca7cee089e876e9d98d32f68955	2021-05-07 15:41:45 -07:00
Luca Wehrstedt	36e47af58b	Pass reference to parent future in callbacks (#57635 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57635 Note: this PR looks massive, but it's just one simple change, codemodded many times. In many cases, a callback needs to access the value/error produced by the parent future. In Python this was easy because the callback was invoked with the parent future as argument, and could thus inspect it. In C++ the callbacks didn't take any arguments, thus in many cases we worked around this by capturing the future in its own callback. This is risky (leads to reference cycle and thus memory leak) and must be done carefully (spoiler: sometimes we weren't). ghstack-source-id: 128296580 Test Plan: CI Reviewed By: wanchaol Differential Revision: D28178783 fbshipit-source-id: 6de02c4568be42123372edc008f630d5ddae0081	2021-05-07 03:59:18 -07:00
Zhengxu Chen	8b38458011	[jit] Break interpreter.cpp into smaller files. (#56546 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/56546 A code move for CodeImpl and Frame to a subdirectory runtime/interpreter, so that it's easier to reuse them and navigate the interpreter code. Test Plan: Imported from OSS Reviewed By: nikithamalgifb Differential Revision: D28133580 fbshipit-source-id: 8de89a4e8e637836625e1ac1db95f0a3353da670	2021-05-06 16:43:57 -07:00
Nikita Shulga	4cb534f92e	Make PyTorch code-base clang-tidy compliant (#56892 ) Summary: This is an automatic change generated by the following script: ``` #!/usr/bin/env python3 from subprocess import check_output, check_call import os def get_compiled_files_list(): import json with open("build/compile_commands.json") as f: data = json.load(f) files = [os.path.relpath(node['file']) for node in data] for idx, fname in enumerate(files): if fname.startswith('build/') and fname.endswith('.DEFAULT.cpp'): files[idx] = fname[len('build/'):-len('.DEFAULT.cpp')] return files def run_clang_tidy(fname): check_call(["python3", "tools/clang_tidy.py", "-c", "build", "-x", fname,"-s"]) changes = check_output(["git", "ls-files", "-m"]) if len(changes) == 0: return check_call(["git", "commit","--all", "-m", f"NOLINT stubs for {fname}"]) def main(): git_files = check_output(["git", "ls-files"]).decode("ascii").split("\n") compiled_files = get_compiled_files_list() for idx, fname in enumerate(git_files): if fname not in compiled_files: continue if fname.startswith("caffe2/contrib/aten/"): continue print(f"[{idx}/{len(git_files)}] Processing {fname}") run_clang_tidy(fname) if __name__ == "__main__": main() ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/56892 Reviewed By: H-Huang Differential Revision: D27991944 Pulled By: malfet fbshipit-source-id: 5415e1eb2c1b34319a4f03024bfaa087007d7179	2021-04-28 14:10:25 -07:00
Tugsbayasgalan Manlaibaatar	2041cd6707	Enable forward/backward compatibility in TS mobile (#56079 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/56079 Test Plan: Imported from OSS Reviewed By: iseeyuan Differential Revision: D27828149 Pulled By: tugsbayasgalan fbshipit-source-id: 9291ddbf01853354fca0fa0a58b8115d5d2294da	2021-04-23 16:55:18 -07:00
Tugsbayasgalan Manlaibaatar	6de1d9b2d0	Fix bug in emitUse to drop all values that are marked as drop (#56652 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/56652 Previous code doesn't drop prim::Constant values even when they are marked as drop. Test Plan: Imported from OSS Reviewed By: iseeyuan Differential Revision: D27927413 fbshipit-source-id: 67cd52cf292e111be2830ccf93b0e7b089e49001	2021-04-23 12:42:51 -07:00
Mike Ruberry	c0ac0fef4e	Revert D27448156: irange for size_t Test Plan: revert-hammer Differential Revision: D27448156 (`041b4431b2`) Original commit changeset: 585da57d4de9 fbshipit-source-id: 8e047c29f391c0166e0a1a87c3fb2a0854377365	2021-04-03 19:14:00 -07:00
Richard Barnes	041b4431b2	irange for size_t (#55163 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55163 Test Plan: Sandcastle Reviewed By: ngimel Differential Revision: D27448156 fbshipit-source-id: 585da57d4de91c692b6360d65f7b8a66deb0f8c1	2021-04-02 23:22:29 -07:00
Edward Yang	e70f3d1189	Nasty little hack to preserve NotImplementedError raised in interpreter (#54627 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/54627 This is the simplest little fix to get interpreter to preserve NotImplementedError, so that the test suite doesn't start choking on meta tensors not working in interpreter. It is sound and correct but doesn't work for other c10::Error subclasses with special handling. A more proper fix is requested at https://github.com/pytorch/pytorch/issues/54612 Signed-off-by: Edward Z. Yang <ezyang@fb.com> Test Plan: Imported from OSS Reviewed By: wenleix, ngimel Differential Revision: D27328666 Pulled By: ezyang fbshipit-source-id: 483bef062de5a907d20e2d9e25eafe2d5197cf8d	2021-03-27 11:53:06 -07:00
Scott Wolchok	3959d393b8	[PyTorch][JIT] Less shared_ptr use in dictConstruct (#54110 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/54110 dictConstruct doesn't need to make its caller have a `shared_ptr<DictType>`. It also doesn't need to do extra `shared_ptr` copies into the `key_type` and `value_type` locals. ghstack-source-id: 124150642 Test Plan: fitsships Reviewed By: ezyang Differential Revision: D27101782 fbshipit-source-id: 3c632ad9d8f1bd7bdf37f517a86aca27bd41548a	2021-03-22 18:31:27 -07:00
Scott Wolchok	4a24c552cc	[PyTorch] Fix string copy in WARN path for both interpreters (#54076 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/54076 If we don't constrain ourselves to use `torch::jit::pop`, we can avoid copying a string or moving IValues around. ghstack-source-id: 124040891 Test Plan: existing tests spot-checked regular interpreter assembly; seems better Reviewed By: dhruvbird, walterddr Differential Revision: D27087204 fbshipit-source-id: 7cf355dbcec31409bdb37afa09d7df85cf2a7e4b	2021-03-17 08:44:08 -07:00
Scott Wolchok	665d5e2a4f	[PyTorch][JIT] Audit interpreter for extra copies (#54029 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/54029 I found what appear to be some missed moves and/or extra copies in the JIT interpreter. ghstack-source-id: 123958682 Test Plan: Existing CI for correctness Ran AdIndexer inline_cvr local_ro model benchmark with static_runtime off via `env bin=/tmp/ptvsc2_predictor_bench.StaticDispatchModeFile static_runtime=0 caffe2=0 scripts/swolchok/static_runtime/inline_cvr/run_local_ro.sh` before: ``` I0315 14:25:23.916893 3075680 PyTorchPredictorBenchLib.cpp:215] PyTorch run finished. Milliseconds per iter: 1.01635. Iters per second: 983.914 I0315 14:26:05.536207 3080560 PyTorchPredictorBenchLib.cpp:215] PyTorch run finished. Milliseconds per iter: 1.01689. Iters per second: 983.395 I0315 14:26:47.510561 3083335 PyTorchPredictorBenchLib.cpp:215] PyTorch run finished. Milliseconds per iter: 1.02697. Iters per second: 973.737 I0315 14:27:29.024830 3086767 PyTorchPredictorBenchLib.cpp:215] PyTorch run finished. Milliseconds per iter: 1.01326. Iters per second: 986.918 I0315 14:28:10.849496 3091323 PyTorchPredictorBenchLib.cpp:215] PyTorch run finished. Milliseconds per iter: 1.023. Iters per second: 977.517 ``` after: ``` I0315 14:17:43.280469 3046242 PyTorchPredictorBenchLib.cpp:215] PyTorch run finished. Milliseconds per iter: 0.997838. Iters per second: 1002.17 I0315 14:18:24.244606 3046861 PyTorchPredictorBenchLib.cpp:215] PyTorch run finished. Milliseconds per iter: 1.00173. Iters per second: 998.269 I0315 14:19:05.208899 3051998 PyTorchPredictorBenchLib.cpp:215] PyTorch run finished. Milliseconds per iter: 1.00187. Iters per second: 998.136 I0315 14:19:46.103854 3055392 PyTorchPredictorBenchLib.cpp:215] PyTorch run finished. Milliseconds per iter: 1.00073. Iters per second: 999.27 I0315 14:20:27.011411 3056062 PyTorchPredictorBenchLib.cpp:215] PyTorch run finished. Milliseconds per iter: 0.999121. Iters per second: 1000.88 ``` (This was just a convenient workload I had handy; the plan of record is to use static runtime for inline_cvr inference AIUI.) Reviewed By: dhruvbird, walterddr Differential Revision: D27060762 fbshipit-source-id: 5567206d7c2d9ae99776ce5524caf09ec2035e87	2021-03-16 15:09:09 -07:00
jiej	4d94ee566e	Ge v1 (#52136 ) Summary: This is a second attempt to use graph executor to run forward on a gradient. This allows a secondary chance to profile intermediate tensor introduced by autodiff. Pull Request resolved: https://github.com/pytorch/pytorch/pull/52136 Reviewed By: pbelevich Differential Revision: D26693978 Pulled By: Krovatkin fbshipit-source-id: 91dde8009a210950af8e5173668ada241e16dd52	2021-02-28 00:53:13 -08:00
jiej	dd1c2a06b7	refactor profiling optional (#47667 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47667 Test Plan: Imported from OSS Reviewed By: anjali411, ngimel Differential Revision: D25255572 Pulled By: Krovatkin fbshipit-source-id: d0152c9ef5b1994e27be9888bcb123dca3ecd88f	2021-01-22 14:45:28 -08:00
Scott Wolchok	4a0d17ba2d	[PyTorch][codemod] Replace immediately-dereferenced expect calls w/expectRef (#50228 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50228 `fastmod -m 'expect(<((at\|c10)::)?\w+Type>\s*)->' 'expectRef${1}.'` Presuming it builds, this is a safe change: the result of `expect()` wasn't being saved anywhere, so we didn't need it, so we can take a reference instead of a new `shared_ptr`. ghstack-source-id: 119782961 Test Plan: CI Reviewed By: SplitInfinity Differential Revision: D25837374 fbshipit-source-id: 86757b70b1520e3dbaa141001e7976400cdd3b08	2021-01-13 16:13:55 -08:00
Thomas Viehmann	ea087e2d92	JIT: guard DifferentiableGraph node (#49433 ) Summary: This adds guarding for DifferentiableGraph nodes in order to not depend on Also bailing out on required gradients for the CUDA fuser. Fixes https://github.com/pytorch/pytorch/issues/49299 I still need to look into a handful of failing tests, but maybe it can be a discussion basis. Pull Request resolved: https://github.com/pytorch/pytorch/pull/49433 Reviewed By: ngimel Differential Revision: D25681374 Pulled By: Krovatkin fbshipit-source-id: 8e7be53a335c845560436c0cceeb5e154c9cf296	2021-01-08 20:01:27 -08:00
Scott Wolchok	ef1fa547ba	[PyTorch] Use expectRef() when calling listConstruct (#50062 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50062 Avoids creating an extra shared_ptr. ghstack-source-id: 119325645 Test Plan: CI Reviewed By: ezyang Differential Revision: D25766631 fbshipit-source-id: f2ab8349dfea325054820fa2c1055180c740574e	2021-01-06 18:13:38 -08:00
Scott Wolchok	480a756194	[PyTorch] IValue::toTensor can now return const Tensor& (#48868 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48868 Building on the previous diff, we can make `toTensor()` return a `const Tensor&`, which should make it easier to avoid reference counting. ghstack-source-id: 119327372 Test Plan: internal benchmarks. Reviewed By: bwasti Differential Revision: D25325379 fbshipit-source-id: ca699632901691bcee432f595f75b0a4416d55dd	2021-01-06 08:40:50 -08:00
Yanan Cao	7518f54611	Add flag torch_jit_disable_warning_prints to allow disabling all warnings.warn (#49313 ) Summary: Adding a flag torch_jit_disable_warning_prints to optimize interpreter performance by suppressing (potentially large amount) of warnings.warn. This is to work around TorchScript's warning behavior mismatch with Python. Python by default triggers a warning once per location but TorchScript doesn't support it. This causes same warning to trigger and print once per inference run, hurting performance. Pull Request resolved: https://github.com/pytorch/pytorch/pull/49313 Reviewed By: SplitInfinity Differential Revision: D25534274 Pulled By: gmagogsfm fbshipit-source-id: eaeb57a335c3e6c7eb259671645db05d781e80a2	2020-12-15 15:22:41 -08:00
Ilia Cherniavskii	db5e5b439c	Extra sampling of record function events [resend] (#49114 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49114 resend of https://github.com/pytorch/pytorch/pull/48289 Test Plan: see 48289 Reviewed By: robieta Differential Revision: D25443365 Pulled By: ilia-cher fbshipit-source-id: c15ac312222bb4d744e10199ed79801cccae8227	2020-12-11 12:53:37 -08:00
Bram Wasti	f4226b5c90	[static runtime] add static subgraph fusion pass (#49185 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49185 This diff adds a fusion feature that will let us use static runtime for parts of the graph. This will prove useful in cases where fully eliminating control flow is hard etc. TODO: [x] factor out into separate fusion file [x] add python test case [x] add graph that isn't fully lowered test case [x] add graph that has weird list/tuple outputs test case the loop example looks quite good: ``` graph(%a.1 : Tensor, %b.1 : Tensor, %iters.1 : int): %12 : bool = prim::Constant[value=1]() # /data/users/bwasti/fbsource/fbcode/buck-out/dev/gen/caffe2/test/static_runtime#binary,link-tree/test_static_runtime.py:110:4 %c.2 : Tensor = prim::StaticSubgraph_0(%a.1, %b.1) %c : Tensor = prim::Loop(%iters.1, %12, %c.2) # /data/users/bwasti/fbsource/fbcode/buck-out/dev/gen/caffe2/test/static_runtime#binary,link-tree/test_static_runtime.py:110:4 block0(%i : int, %c.12 : Tensor): %c.10 : Tensor = prim::StaticSubgraph_1(%a.1, %c.12, %b.1) -> (%12, %c.10) return (%c) with prim::StaticSubgraph_0 = graph(%0 : Tensor, %4 : Tensor): %5 : int = prim::Constant[value=2]() %6 : Tensor = aten::mul(%4, %5) # /data/users/bwasti/fbsource/fbcode/buck-out/dev/gen/caffe2/test/static_runtime#binary,link-tree/test_static_runtime.py:109:12 %2 : int = prim::Constant[value=1]() %c.2 : Tensor = aten::add(%0, %6, %2) # /data/users/bwasti/fbsource/fbcode/buck-out/dev/gen/caffe2/test/static_runtime#binary,link-tree/test_static_runtime.py:109:8 return (%c.2) with prim::StaticSubgraph_1 = graph(%1 : Tensor, %7 : Tensor, %8 : Tensor): %9 : int = prim::Constant[value=1]() %c.4 : Tensor = aten::add(%7, %8, %9) # /data/users/bwasti/fbsource/fbcode/buck-out/dev/gen/caffe2/test/static_runtime#binary,link-tree/test_static_runtime.py:111:12 %5 : int = prim::Constant[value=2]() %c.7 : Tensor = aten::mul_(%c.4, %5) # /data/users/bwasti/fbsource/fbcode/buck-out/dev/gen/caffe2/test/static_runtime#binary,link-tree/test_static_runtime.py:112:8 %2 : int = prim::Constant[value=1]() %c.10 : Tensor = aten::sub_(%c.7, %1, %2) # /data/users/bwasti/fbsource/fbcode/buck-out/dev/gen/caffe2/test/static_runtime#binary,link-tree/test_static_runtime.py:113:8 return (%c.10) ``` (Note: this ignores all push blocking failures!) Test Plan: buck test mode/no-gpu //caffe2/benchmarks/static_runtime:static_runtime_cpptest buck test mode/no-gpu caffe2/test:static_runtime Reviewed By: bertmaher Differential Revision: D25385702 fbshipit-source-id: 2f24af4f11d92a959167facd03fbd24f464a6098	2020-12-10 14:03:11 -08:00
Elias Ellison	70853c5021	Dont use symbolic shapes check (#47810 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47810 `bindSymbolicShapes` wasn't checking device or dtype at all, so it wasn't correct. It also isn't being used anywhere (num_profiles is always 1 and we don't use symbolic shapes). We shouldn't have it on until we are actually using symbolic shapes. Test Plan: Imported from OSS Reviewed By: bertmaher Differential Revision: D25286214 Pulled By: eellison fbshipit-source-id: 10fb175d0c75bd0159fb63aafc3b59cc5fd6c5af	2020-12-10 12:14:58 -08:00
jiej	a6fa3b2682	adding profile_ivalue (#47666 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47666 Test Plan: Imported from OSS Reviewed By: ZolotukhinM Differential Revision: D25255573 Pulled By: Krovatkin fbshipit-source-id: 5d8753e4040a3d96105d28d26728125947c7a638	2020-12-09 15:29:15 -08:00
Mike Ruberry	9f7fb54693	Revert D25111515: Extra sampling of record function events Test Plan: revert-hammer Differential Revision: D25111515 (`09b974c2d5`) Original commit changeset: 0d572a3636fe fbshipit-source-id: d558d8052924d937d86db7dd40dc6388e6d28823	2020-12-09 08:37:17 -08:00
Ilia Cherniavskii	09b974c2d5	Extra sampling of record function events (#48289 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48289 Adding extra sampling step when dispatching RecordFunction. (Note: this ignores all push blocking failures!) Reviewed By: swolchok Differential Revision: D25111515 Pulled By: ilia-cher fbshipit-source-id: 0d572a3636fe649a47ec47901826bbfc08368937	2020-12-09 02:29:13 -08:00
Chen Lai	416dc68341	[Pytorch][Annotation] Update inlined callstack with module instance info (#47416 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47416 Test Plan: Imported from OSS Reviewed By: kimishpatel Differential Revision: D24752846 Pulled By: cccclai fbshipit-source-id: 94d3c18c56161d1de3a16bb7c93502fedf71644c	2020-12-03 10:44:46 -08:00
Meghan Lele	fc1153a8be	[JIT] Fix clang-tidy warnings in jit/runtime (#47992 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47992 Test Plan: Imported from OSS Reviewed By: ZolotukhinM Differential Revision: D25258645 Pulled By: SplitInfinity fbshipit-source-id: b3e4576400c101b247e80cb4044fc04471f39a47	2020-12-02 12:35:42 -08:00
Scott Wolchok	3ceec73db9	[PyTorch] Lazily construct guts of RecordFunction (#47550 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47550 I saw over 5% time spent in RecordFunction's ctor during one of our framework overhead benchmarks in `perf`. Inspecting assembly, it looks like we just create a lot of RecordFunctions and the constructor has to initialize a relatively large number of member variables. This diff takes advantage of the observation that RecordFunction does nothing most of the time by moving its state onto the heap and only allocating it if needed. It does add the requirement that profiling is actually active to use RecordFunction accessors, which I hope won't be a problem. ghstack-source-id: 117498489 Test Plan: Run framework overhead benchmarks. Savings ranging from 3% (InPlace_ndim_1) to 7.5% (empty_ndim_3) wall time. Reviewed By: ilia-cher Differential Revision: D24812213 fbshipit-source-id: 823a1e2ca573d9a8d7c5b7bb3972987faaacd11a	2020-12-01 13:07:17 -08:00


87 Commits