Introduced by https://github.com/pytorch/pytorch/pull/153645
A semicolon is not needed after the closing curly brace of a class method definition.
Not sure why CI did not catch it, but my local builds now error out with
```
[19/97] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/dead_code_elimination.cpp.o
In file included from /Users/nshulga/git/pytorch/pytorch/torch/csrc/jit/passes/dead_code_elimination.cpp:4:
/Users/nshulga/git/pytorch/pytorch/torch/csrc/jit/ir/alias_analysis.h:356:64: warning: extra ';' after member function definition [-Wextra-semi]
356 | ValueAndMemoryLocationSet(const AliasDb* db) : aliasDb_(db){};
| ^
```
Fixes #ISSUE_NUMBER
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153887
Approved by: https://github.com/wdvr, https://github.com/davidberard98
Summary:
**TL;DR**: make DCE faster by replacing a Set<Value*> with a MemoryLocations sparse bitset (representing the union of the memory locations of all values in the set).
**Details**
The goal of this PR is to optimize this function from AliasDb:
```
bool AliasDb::writesToAlias(Node* n, const ValueSet& vs) const {
  const auto writtenTo = getWrites(n);
  if (writtenTo.empty()) {
    return false;
  }

  MemoryLocations locs;
  for (const auto v : vs) {
    auto it = elementMap_.find(v);
    if (it != elementMap_.end()) {
      const auto& vlocs = memoryDAG_->getMemoryLocations(it->second);
      if (writtenTo.intersects(vlocs)) {
        return true;
      }
    }
  }

  return false;
}
```
In the DCE use case, we have a ValueSet of live values into which we insert `Value*`s, and we sometimes need to check, via `writesToAlias`, whether a node mutates any of the live values.
Looping through all the values in the ValueSet and indexing into the elementMap_ is slow, so pre-computing the MemoryLocations set speeds up the function. In some large model examples, I see ~15-25x speedups from this change.
**Implementation**: To avoid exposing too many details of AliasDb, I introduce a friend class `ValueAndMemoryLocationSet`, an insert-only set of Values that also maintains the corresponding MemoryLocations.
Then in DCE, I use `ValueAndMemoryLocationSet` when AliasDb is being used for analysis, and otherwise fall back to a plain `Set<Value*>` when AliasDb is not available.
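As an illustration, here is a minimal sketch of such an insert-only set, using std containers as stand-ins for the real `Value*`/`MemoryLocations` types (the names and signatures below are hypothetical, not the actual AliasDb API):
```
#include <cstdint>
#include <unordered_map>
#include <unordered_set>

// Stand-ins for the real types: a value id and its set of memory locations.
using ValueId = int;
using MemoryLocations = std::unordered_set<std::uint64_t>;

// Insert-only set of values that also maintains the union of their memory
// locations, so "does node n write to any live value?" becomes a single
// intersection test instead of a loop over every live value.
class ValueAndMemoryLocationSetSketch {
 public:
  explicit ValueAndMemoryLocationSetSketch(
      const std::unordered_map<ValueId, MemoryLocations>* elementMap)
      : elementMap_(elementMap) {}

  void insert(ValueId v) {
    values_.insert(v);
    auto it = elementMap_->find(v);
    if (it != elementMap_->end()) {
      // Union this value's memory locations into the precomputed set.
      locs_.insert(it->second.begin(), it->second.end());
    }
  }

  // writesToAlias(n, liveValues) then reduces to: writes(n) intersects locs_.
  bool intersects(const MemoryLocations& written) const {
    for (auto loc : written) {
      if (locs_.count(loc)) {
        return true;
      }
    }
    return false;
  }

 private:
  const std::unordered_map<ValueId, MemoryLocations>* elementMap_;
  std::unordered_set<ValueId> values_;
  MemoryLocations locs_; // union of memory locations of all inserted values
};
```
The key point is that the per-value elementMap_ lookups happen once, at insertion time, instead of on every `writesToAlias` query.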
Test Plan: Rely on unit tests.
Differential Revision: D74827086
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153645
Approved by: https://github.com/eellison
We want to make TorchRec sharded models TorchScriptable.
TorchRec sharded models use the generic types Awaitable[W] and LazyAwaitable[W] (https://github.com/pytorch/torchrec/blob/main/torchrec/distributed/types.py#L212).
In a sharded model those types are used in place of the contained type W, holding an initialization function that produces an object of type W.
When the first attribute of W is requested, `LazyAwaitable[W]` calls its initialization function (on the same stack), caches the result inside, and from then on works transparently as an object of W. So we can think of it as delayed object initialization.
To support this behavior in TorchScript, we propose a new TorchScript type: `Await`.
In eager mode it works the same as `LazyAwaitable[W]` in TorchRec, being dynamically typed: it acts as type `W` while it is `Await[W]`.
Within TorchScript it is `Await[W]` and can only be explicitly converted to W using the special function `torch.jit._awaitable_wait(aw)`.
Creation of an `Await[W]` is done via another special function, `torch.jit._awaitable(func, *args)`.
The semantics are close to `torch.jit.Future`, fork, and wait, and use the same JIT mechanics (inlined fork closures), with the difference that the function is not started in parallel on fork. It is only stored as a lambda inside an IValue and is called on the same thread when `torch.jit._awaitable_wait` is called.
For example (more examples can be found in this PR's `test/jit/test_await.py`):
```
def delayed(z: int) -> int:
    return z * 3

@torch.jit.script
def fn(x: Tensor):
    aw: Await[int] = torch.jit._awaitable(delayed, 99)
    a = torch.eye(2)
    b = torch.jit._awaitable_wait(aw)
    return a + b + x
```
Function semantics:
`_awaitable(func -> Callable[Tuple[...], W], *args, **kwargs) -> Await[W]`
Creates an Await object that owns args and kwargs. On the first `_awaitable_wait` call, it executes func and owns the result. Subsequent `_awaitable_wait` calls return that same result from the first function call.
`_awaitable_wait(Await[W]) -> W`
Returns the cached result of W if this is not the first `_awaitable_wait` call on this Await object; otherwise calls the stored function and caches its result.
`_awaitable_nowait(W) -> Await[W]`
Creates a trivial Await[W] wrapper around the specified object, to be type compliant in corner cases.
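As a rough illustration of the call-once-and-cache behavior described above, here is a minimal standalone C++ sketch (this is illustrative only, not the actual IValue/Await implementation):
```
#include <functional>
#include <optional>
#include <utility>

// Sketch of Await[W]: stores a thunk and runs it at most once, on the same
// thread, when wait() is first called; later waits return the cached result.
template <typename W>
class AwaitSketch {
 public:
  explicit AwaitSketch(std::function<W()> fn) : fn_(std::move(fn)) {}

  // Analogue of _awaitable_nowait: wrap an already-computed value.
  static AwaitSketch nowait(W value) {
    AwaitSketch aw([] { return W{}; });
    aw.result_ = std::move(value);
    return aw;
  }

  // Analogue of _awaitable_wait: compute on the first call, then cache.
  const W& wait() {
    if (!result_) {
      result_ = fn_();
    }
    return *result_;
  }

 private:
  std::function<W()> fn_;
  std::optional<W> result_;
};

int main() {
  AwaitSketch<int> aw([] { return 33 * 3; });
  int a = aw.wait(); // runs the thunk on this thread
  int b = aw.wait(); // returns the cached 99 without re-running
  return (a == b) ? 0 : 1;
}
```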
Differential Revision: [D42502706](https://our.internmc.facebook.com/intern/diff/D42502706)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90863
Approved by: https://github.com/davidberard98
- Generalized AnalyzeImpl cases for batchNorm and InstanceNorm in alias_analysis.cpp using schema_info.
- Tested by ensuring all aliasDB special case checks for batchNorm and instanceNorm pass as expected.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81785
Approved by: https://github.com/davidberard98
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73329
There is a quantization use case for having better alias analysis while function calls remain in the graph. This takes the relatively dumb approach of getting the inlined graph of each function call and then analyzing that subgraph. Since we need a single, unique analysis of every `Value*`, for each function call we make a copy of the graph for every analysis past the first. This is relatively slow, but given the limited use case here it should work well enough (and is no slower than calling the inlining pass).
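A sketch of the copy-per-extra-analysis idea, with hypothetical stand-in types (the real pass works over `Node*`/`Graph*` and the AliasDb element map):
```
#include <map>
#include <memory>

// Hypothetical stand-ins; illustrative only.
struct Graph { /* nodes, values, ... */ };
struct Function {
  std::shared_ptr<Graph> inlined_graph;
};

// Every Value* needs exactly one analysis, so a function's graph can only be
// analyzed as-is once; later call sites of the same function analyze a fresh
// copy of that graph.
struct CallSiteAnalysisSketch {
  std::map<const Function*, int> timesAnalyzed;

  std::shared_ptr<Graph> graphForCallSite(const Function& fn) {
    int& n = timesAnalyzed[&fn];
    std::shared_ptr<Graph> g = (n == 0)
        ? fn.inlined_graph                            // first call site
        : std::make_shared<Graph>(*fn.inlined_graph); // copy for later ones
    ++n;
    return g;
  }
};
```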
cc vkuzo
Test Plan: Imported from OSS
Reviewed By: davidberard98
Differential Revision: D34451424
Pulled By: eellison
fbshipit-source-id: b7c7e54679d723f5ded1e11ffb32eb6d2176431d
(cherry picked from commit 81a42b31522b890311a3f512448b372c4ebbefd1)
Summary:
Things changed in this PR that require review:
1. aten/src/ATen/core/interned_strings.h
2. torch/csrc/jit/ir/alias_analysis.h : exposing createValue to allow efficient mutation
3. torch/csrc/jit/runtime/symbolic_shape_registry.cpp : added gelu/tanh/erf in registry
4. torch/jit/_script.py : throws when scripting a model that uses autocast as a decorator, since that's not supported
nvfuser code update:
1. codegen improvements and performance tuning
2. integration bug fixes for shape expression logic
3. kernel segmentation update to address perf regression from horizontal fusion
4. scalar cpu tensor promotion to support inter-device operation between cpu scalar tensor and cuda tensor
Things reverted from local changes:
aten::gelu with approximation (tracked in PR: https://github.com/pytorch/pytorch/pull/61439)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72127
Reviewed By: HamidShojanazeri
Differential Revision: D34113233
Pulled By: jbschlosser
fbshipit-source-id: b82cde32b71e324eca0ea57cb8c9f9647278ca74
(cherry picked from commit e009bc5c4e)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69854
ghstack-source-id: 148315147
Test Plan: Time reported to start up static runtime on ctr_mobile_feed local_ro net is 8.8s instead of 9.5s
Reviewed By: suo, d1jang
Differential Revision: D33039733
fbshipit-source-id: 218dc7ff9aa421a352b71952ec77757368095860
(cherry picked from commit 7586712948)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69853
We can implement this overload more efficiently.
ghstack-source-id: 146924693
Test Plan:
patched alias_analysis tests
Time reported to initialize a predictor by static runtime when given ctr_mobile_feed local_ro net is 9.5s instead of 10.5s.
Reviewed By: mikeiovine
Differential Revision: D33039731
fbshipit-source-id: 52559d678e9eb00e335b9e0db304e7a5840ea397
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66554
In native_functions.yaml, the schemas for batch_norm and instance_norm
are incorrect: the inputs `running_mean` and `running_var` are mutated,
but are not marked as such in the function schema. Since `(a!)?`
annotations are currently not working (see #65760), this instead adds a
special case to `alias_analysis.cpp`. If the value of `training` or
`use_input_stats` is known to be `false`, then `alias_analysis` will
mark the input as _not_ being written to.
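The shape of that special case, as a simplified standalone sketch (stand-in types rather than the real `Node*`/AliasDb API):
```
#include <optional>
#include <string>

// Stand-in for a norm node whose flag argument may be a known constant.
struct NormNodeSketch {
  std::string kind;                       // e.g. "aten::batch_norm"
  std::optional<bool> training_constant;  // known value of training /
                                          // use_input_stats, if constant
};

// running_mean / running_var are only mutated when the op actually updates
// running statistics, i.e. when the flag is (or may be) true.
bool writesToRunningStats(const NormNodeSketch& node) {
  if (node.training_constant.has_value() && !*node.training_constant) {
    return false; // statically known not to update running stats
  }
  return true;    // unknown or true: conservatively assume a write
}
```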
Test Plan:
Removed the `skip` annotation on the following test, and added a special
exception in `check_alias_annotations`:
```
python test/test_ops.py -k test_variant_consistency_jit_nn_functional_batch_norm
```
Also:
```
./build/bin/test_jit --gtest_filter="*BatchAndInstanceNormFixture*"
```
Imported from OSS
Reviewed By: eellison
Differential Revision: D31612339
fbshipit-source-id: 12ca61b782b9e41e06883ba080a276209dc435bb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65344
Callsites that know they are using a cache can borrow AliasTypeSets from the cache instead of copying them.
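A minimal sketch of the borrow-vs-copy distinction (hypothetical names, with a plain vector standing in for `AliasTypeSet`):
```
#include <string>
#include <unordered_map>
#include <vector>

using AliasTypeSetSketch = std::vector<std::string>;

struct TypeSetCacheSketch {
  std::unordered_map<std::string, AliasTypeSetSketch> cache;

  // Copying: every caller pays for an allocation plus element copies.
  AliasTypeSetSketch getCopy(const std::string& key) const {
    return cache.at(key);
  }

  // Borrowing: callers that know the cache outlives them hold a pointer
  // into it and skip the copy entirely.
  const AliasTypeSetSketch* borrow(const std::string& key) const {
    auto it = cache.find(key);
    return it == cache.end() ? nullptr : &it->second;
  }
};
```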
ghstack-source-id: 140484162
Test Plan: Running perf on static runtime startup seems to show less inclusive time spent in AliasDb::getElements
Reviewed By: ejguan
Differential Revision: D31027363
fbshipit-source-id: b7a1473f4f9e9f14566f56f4b3b4e6317076beeb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66025
This change adds an option to selectively enable precise alias analysis for `prim::TupleConstruct` (introduced by D30437737 (cd458fe092)), limiting its exposure to `StaticRuntime` for now.
Test Plan: Modified existing unit tests whose behavior depends on D30437737 (cd458fe092).
Reviewed By: eellison
Differential Revision: D31350285
fbshipit-source-id: 3ce777f07f99650d74634481ad0805192dce55c6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64879
This change makes the output of `prim::TupleConstruct` alias only with its inputs *when* the created tuple is directly returned from the graph.
The same treatment could be applied to any tuple newly constructed by `prim::TupleConstruct` whose elements do not escape. However, this change focuses on only the simplest, but very frequently used, case: tuples constructed solely to be returned from a graph.
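A simplified sketch of the condition being checked (stand-in types; the real check walks the uses of the `prim::TupleConstruct` output in the graph):
```
#include <vector>

// Stand-in: each use of the constructed tuple is either the graph return
// or some other consumer.
enum class UseKindSketch { GraphReturn, Other };

// The precise (output-aliases-only-its-inputs) treatment applies only when
// the tuple has exactly one use and that use is the graph return, so its
// elements cannot escape through any other consumer.
bool tupleOutputOnlyAliasesInputs(const std::vector<UseKindSketch>& uses) {
  return uses.size() == 1 && uses.front() == UseKindSketch::GraphReturn;
}
```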
Test Plan:
Added
- `AliasMoveForTupleConstructWithSingleUseAsGraphOutput`
- `WildcardAliasForTupleConstructWithUses`
to cover the newly added code.
Reviewed By: eellison
Differential Revision: D30437737
fbshipit-source-id: 417fbc6bc348062e60e7acdddd340d4754d090eb
Summary:
This PR replaces the https://github.com/pytorch/pytorch/pull/53180 PR stack, which has all of the review discussion. The replacement was needed due to a messy Sandcastle issue.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64234
Reviewed By: gmagogsfm
Differential Revision: D30656444
Pulled By: ansley
fbshipit-source-id: 77536c8bcc88162e2c72636026ca3c16891d669a
Summary:
This PR adds a simple debugging helper which exports the AliasDb state as a [GraphViz](http://www.graphviz.org/) graph definition. The generated files can be viewed with any Graphviz viewer (including web-based ones, for example http://viz-js.com).
Usage:
1. Call `AliasDb::dumpToGraphvizFile()` from a debugger. Using gdb for example:
`call aliasDb_->dumpToGraphvizFile("alias.dot")`
2. Add explicit calls to `AliasDb::dumpToGraphvizFile()`, which returns `true` if it succeeds.
An example output file is attached: [example.zip](https://github.com/pytorch/pytorch/files/5805840/example.zip)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50452
Reviewed By: ngimel
Differential Revision: D25980222
Pulled By: eellison
fbshipit-source-id: 47805a0a81ce73c6ba859340d37b9a806f9000d5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39111
In our present alias analysis, we consider any Value that enters another container as entering the heap, and thus aliasing all other heap values of the same type. There are a number of advantages to this approach:
- it is not too hard to maintain the aliasDb implementation
- it is much easier from an op schema perspective - there are many composite list ops registered internally and externally that would be tricky to register and get right if we did something more complicated
- it limits the size of the AliasDb, because a container of size 10 only contains a single memory dag element instead of 10 elements.
The downside is that we are unable to handle the simple and extremely common case of a list of tensors being used in an ATen op.
In an example like:
```
def foo(input):
    x = torch.tensor([1, 2, 3, 4])
    y = [x, x]
    input.add_(1)
    return torch.cat(y)
```
we will consider x to be written to: any write to any wildcard element (an element that enters a tuple, an element that is taken from a list) will mark x as written to. This limits our ability to create a functional subset and fuse graphs - as a result, 4 of the TorchVision classification models could not be functionalized.
Test Plan: Imported from OSS
Reviewed By: SplitInfinity
Differential Revision: D23828003
Pulled By: eellison
fbshipit-source-id: 9109fcb6f2ca20ca897cae71683530285da9d537
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37106
Recomputing the aliasdb on every fusion iteration + in every subblock
is hugely expensive. Instead, update it in-place when doing fusion.
The graph fuser pass operates by pushing nodes into a fusion group. So
we start with
```
x, y = f(a, b, c)
```
and end with:
```
x_out, y_out = prim::fusionGroup(a, b, c)
  x_in, y_in = f(a_in, b_in, c_in)
  -> x_in, y_in
```
We destroy the `x` and `y` `Value*`s in the process. This operation is
easy to express as an update to the aliasDb--`x_out` just takes on all
the aliasing information `x` used to have. In particular, since we know
`f` and `prim::fusionGroup` are purely functional, we don't have to mess
with any write information.
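A minimal sketch of that kind of in-place update, modelling aliasing information as plain integer ids (illustrative only; the real code moves the MemoryDAG element from the old value to the new one):
```
#include <unordered_map>

using ValueId = int;
using AliasSetId = int;

// When fusion replaces value oldV with newV, the new value simply takes over
// the old value's aliasing information instead of rebuilding the whole
// alias database.
void takeOverAliasInfo(
    std::unordered_map<ValueId, AliasSetId>& aliasOf,
    ValueId oldV,
    ValueId newV) {
  auto it = aliasOf.find(oldV);
  if (it != aliasOf.end()) {
    AliasSetId inherited = it->second;
    aliasOf.erase(it);          // oldV is about to be destroyed
    aliasOf[newV] = inherited;  // newV aliases whatever oldV aliased
  }
}
```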
This PR is the bare minimum to get this working, in the interest of
unscrewing the compilation times ASAP.
Followups I want to do:
- We don't have a way of expressing deletion of values in AliasDb. In
`graph_fuser.cpp` we sometimes construct nodes that we end up throwing
away, and we are littering `MemoryDAG` with references to dangling
pointers. Because of the way the pass works, it's fine, but this is
fragile so I want to fix it.
- We should decouple alias analysis from write tracking, to simplify the
job of keeping the write caches consistent as we mutate the aliasing
information.
- The tensorexpr fuser doesn't do this and is thus incorrect today; we need to update it to work.
Test Plan: Imported from OSS
Differential Revision: D21219179
Pulled By: suo
fbshipit-source-id: 8ae5397b3a0ad90edec2fbc555647091f1ad5284
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36345
During compilation, we spend a huge amount of time in alias analysis.
This PR does a few things to speed it up.
1. Separate the analysis into two phases: one where we build up the
necessary data structures, and the other where we service aliasing
queries. This allows us to defer building indices/maintaining index
consistency until after the "buildup" phase is done.
2. Properly memoize/dynamic program the memory locations lookups.
3. Done naively, setting wildcards invalidates the above memoization,
triggering costly recomputation. So I added a cache-aware `setWildcards`.
Sadly that means you need alias analysis to reach into the guts of
memorydag, but the speedup is worth it.
Sadly, these changes are kind of coupled for correctness reasons, so
they're all here at once.
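As an illustration of item 2, memoizing memory-location lookups over a simplified points-to structure could look like this (stand-in types; the real code caches `MemoryLocations` per MemoryDAG element, and this sketch assumes the structure is acyclic):
```
#include <unordered_map>
#include <unordered_set>
#include <vector>

using ElementId = int;
using MemoryLocations = std::unordered_set<ElementId>;

struct MemoryDAGSketch {
  // pointsTo[e] = elements that e may point to (assumed acyclic here).
  std::unordered_map<ElementId, std::vector<ElementId>> pointsTo;
  // Memoization cache: fully resolved memory locations per element.
  mutable std::unordered_map<ElementId, MemoryLocations> cache;

  const MemoryLocations& getMemoryLocations(ElementId e) const {
    auto it = cache.find(e);
    if (it != cache.end()) {
      return it->second; // memoized: each element is resolved at most once
    }
    MemoryLocations locs{e};
    auto pit = pointsTo.find(e);
    if (pit != pointsTo.end()) {
      for (ElementId child : pit->second) {
        const MemoryLocations& childLocs = getMemoryLocations(child);
        locs.insert(childLocs.begin(), childLocs.end());
      }
    }
    return cache.emplace(e, std::move(locs)).first->second;
  }
};
```
A cache like this is also what makes `setWildcards` tricky: a naive implementation would have to throw the whole cache away, hence the cache-aware version described in item 3.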
I used this model (thanks IlyaOvodov) as a provisional benchmark. You
can get it here:
https://www.dropbox.com/s/jlyygn6yygj1jkx/yolov3.zip. Unzip and run
`python test_timing.py`.
Baseline: (752.076s) right before 6bc8ffe824
After optimizing before inlining: (699.593s)
After deferring cache construction: (426.180s)
After cache-aware `setWildcards`: (193.678s)
So a nice 75% speedup to overall compilation. There's a lot more to do
in other places of the compilation pipeline though.
Followup to this PR specifically: Everything that fans out from the
`analyze` call is the "buildup" phase of AliasDB construction. This
should be factored into a separate analysis pass to statically
distinguish the two phases (right now we just null out stuff to
accomplish the same thing dynamically).
Test Plan: Imported from OSS
Differential Revision: D20952727
Pulled By: suo
fbshipit-source-id: 099f797222d7e71e5c04991584adc2c7eab5a70f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35421
This PR makes it so that we don't have to rebuild the entire alias db each time we remove a node in alias analysis.
Test Plan: Imported from OSS
Differential Revision: D20922470
Pulled By: eellison
fbshipit-source-id: 9f43ed6dc743bf8a6b84a4aa38cff7059d46741d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35474
I had previously tried to optimize getMutableTypePtr calls by not recursing through container types, but it turns out there are a few uses of container types which refine their contained elements.
This attempt was in #35301
Now I am optimizing these calls by caching TypePtr -> mutable TypePtr conversions. Now that we are doing caching, none of the functions marked as const are really const anymore; previously, many of the const functions actually mutated internal state, such as rebuildWriteCache.
One slightly annoying thing is that there is a general API for querying mutability, isMutableType, that doesn't use the cache, and an internal one that does, isMutableTypeInternal. It would be nice if calling isMutableType within alias analysis dispatched to the internal function, but I'm not sure how to do that.
getMutableTypePtr showed up as 12% of the first run of FairSeq, so this is a function worth optimizing.
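A sketch of the kind of memoization described above, using a hypothetical string-keyed type representation instead of the real `TypePtr` (illustrative only):
```
#include <string>
#include <unordered_map>

// Cache expensive type -> mutable-type conversions so repeated queries for
// the same type become a single hash lookup. Note that once such a cache
// exists, the "const" query methods that populate it are no longer logically
// const, which is what the description above refers to.
class MutableTypeCacheSketch {
 public:
  const std::string& getMutableType(const std::string& type) {
    auto it = cache_.find(type);
    if (it != cache_.end()) {
      return it->second;
    }
    // Stand-in for the real (recursive, container-aware) conversion.
    std::string converted = "mutable(" + type + ")";
    return cache_.emplace(type, std::move(converted)).first->second;
  }

 private:
  std::unordered_map<std::string, std::string> cache_;
};
```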
Test Plan: Imported from OSS
Differential Revision: D20873493
Pulled By: eellison
fbshipit-source-id: 1b42bb58ba4142c118a6bc47a26978cd7fd0ac79
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35115
This commit runs the newly added tools/clang_format.py on the JIT
codebase and includes all of the formatting changes thus produced.
Testing:
Ran the script, CI.
Test Plan: Imported from OSS
Reviewed By: eellison
Differential Revision: D20568523
Pulled By: SplitInfinity
fbshipit-source-id: e09bdb982ccf090eecfb7c7b461b8d0681eef82b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33020
This is a pass to create functional blocks. The other PRs in the stack help avoid some of the limitations that are often found in graphs. It's possible that this would work well with a graph that is frozen. Follow-up work items that will help this pass:
- We don't currently have any capacity in alias analysis to tell whether a Value that came from the wildcard set "re-escapes" back into the wildcard set.
- More comments on the semantics of the graph and correctness conditions
- We could consider using dynamic dag if the perf of this is a limitation.
- Potentially make Functional Graphs Functional Blocks instead, so that we do not repeatedly copy constants, and also to make the IR easier to read.
Test Plan: Imported from OSS
Differential Revision: D20603188
Pulled By: eellison
fbshipit-source-id: 6822a6e65f4cc2676f8f6445fe8aa1cb858ebeeb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33329
# Use case
```
@torch.jit.script
def send_rpc_async(dst_worker_name, user_callable_qual_name, tensor):
    # type: (str, str, Tensor) -> None
    rpc._rpc_async_torchscript(
        dst_worker_name, user_callable_qual_name, args=(tensor,)
    )
```
# Problem
```
torch.jit.frontend.NotSupportedError: keyword-arg expansion is not supported:
File "/data/users/shihaoxu/fbsource/fbcode/buck-out/dev/gen/caffe2/test/distributed/rpc/rpc_spawn#binary,link-tree/torch/distributed/rpc/api.py", line 722
args = args if args else ()
kwargs = kwargs if kwargs else {}
fut = _invoke_rpc_torchscript(to, qualified_name, *args, **kwargs)
~~~~~~ <--- HERE
return fut
```
# Solution
Register `rpc.rpc_async(..)` as a JIT operator to handle variable-length argument list.
# Plan
This PR contains the required changes to make `rpc.rpc_async(..)` a JIT prim operator that can dynamically handle different numbers of arguments.
- Register "prim::rpc_async" as a `Symbol` in "interned_strings.h"
- Add an if branch in the "python_sugared_value.cpp" `toSugarValue(py::object, ..)` entry utility function to set up how the JIT frontend converts the `torch.distributed.rpc.rpc_async(..)` Python function (a Python object) into a `SpecialFormValue` (IR SugaredValue).
- Add a switch case for the "prim::rpc_async" Symbol in "ir_emitter.cpp" `emitApplySpecialForm(..)` to set up how the JIT compiler provides inputs to the "prim::rpc_async" Operator.
- Register "prim::rpc_async" as a `jit::Operator` and provide its implementation in "register_distributed_ops.cpp".
Note: since the distributed module is an optional part of the PyTorch build, the code added in this PR should be wrapped within a preprocessor macro.
```
#ifdef USE_DISTRIBUTED
new code here
#endif
```
Test Plan:
Items that need to be confirmed in the test cases
https://fb.quip.com/DCvdA9ZLjeO0
```
buck test mode/dev-nosan //caffe2/test/distributed/rpc/jit:rpc_fork
buck build mode/dev-nosan //caffe2/test/distributed/rpc/jit:rpc_fork \
  && buck-out/gen/caffe2/test/distributed/rpc/jit/rpc_fork\#binary.par -r test_call_python_function_remotely_from_script_not_supported
```
```
buck test mode/dev-nosan //caffe2/test/distributed/rpc/jit:rpc_spawn
```
```
buck test mode/dev-nosan //caffe2/caffe2/python/operator_test:layer_norm_op_test-2.7 -- test_layer_norm_op_jit
```
Differential Revision: D5738300
fbshipit-source-id: a4604fe762e00be062dc8232ca9790df31fb2074
Summary:
This patch enables folding GetAttr nodes with their corresponding values. The _jit_pass_freeze_module API returns a new TorchScript module where all function calls and get attributes are inlined.
Usage:
frozen_model = torch._C._freeze_module(scripted_model._c)
frozen_model.forward(...)
This API currently optimizes the forward method. We will follow up to preserve and optimize methods and attributes that are annotated as torch.jit.interface.
Several future improvements to JIT optimizations are required to fully clean up/de-sugar the graph and eliminate redundancies.
Ideally, we want to produce a graph that can easily be lowered to
GLOW and other low-level backends.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32178
Differential Revision: D19419640
Pulled By: bzinodev
fbshipit-source-id: 52baffaba9bca2cd60a8e747baa68d57711ad42b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33806
as title
Test Plan: Imported from OSS
Differential Revision: D20122117
Pulled By: suo
fbshipit-source-id: 209d29ed2c873181140c9fb5cdc305c200ce4008