Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59298
After recent changes, LazyStreamContext had in fact become always eager, making it equivalent to a vector of streams. It therefore makes more sense to remove this abstraction and use a more self-descriptive type.
This PR migrates the RequestCallback internals. The next PR migrates the TensorPipe agent.
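A minimal sketch of the shape of this migration; `Stream` stands in for `c10::Stream`, and both functions are hypothetical, for illustration only:
```cpp
#include <memory>
#include <vector>

// "Stream" is a simplified stand-in for c10::Stream.
struct Stream {
  int device;
};

// Before: callers received an opaque context. It originally created
// streams lazily, but after recent changes it always created them
// eagerly, making it a thin wrapper around a vector.
struct LazyStreamContext {
  std::vector<Stream> streams;
};
void processBefore(std::shared_ptr<LazyStreamContext> ctx) { /* ... */ }

// After: pass the streams directly; the type says exactly what it holds.
void processAfter(std::vector<Stream> streams) { /* ... */ }
```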
ghstack-source-id: 130583774
Test Plan: CI
Reviewed By: mrshenli
Differential Revision: D28789175
fbshipit-source-id: fa581a50f9a6a1e42c2ad8c808a9b099bea7433e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57636
The "preferred" pointer holder for Future is `intrusive_ptr` (e.g., `then` returns an `intrusive_ptr`, `toFuture` returns `intrusive_ptr`, ...). However in RPC we often wrap it with `shared_ptr`. This probably dates back to when we had a separate Future type, before the merge.
At the boundary between RPC and JIT this difference becomes a bit annoying, as conversions between the pointer types are needed. I think it would be simpler and more consistent to always use `intrusive_ptr`, also in RPC.
This PR was produced mainly by find-and-replace, plus a couple of manual fixes.
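A minimal model of the friction this removes; `Future` and both aliases are simplified stand-ins (both modeled with `shared_ptr` here just so the sketch compiles), not the real c10 types:
```cpp
#include <memory>
#include <utility>

struct Future {};
using IntrusiveFut = std::shared_ptr<Future>;  // models c10::intrusive_ptr
using SharedFut = std::shared_ptr<Future>;

// Before: JIT handed back the intrusive flavor while RPC stored the
// shared flavor, so every boundary crossing needed a conversion hop.
SharedFut fromJit(IntrusiveFut f) { return SharedFut(std::move(f)); }

// After: RPC also holds futures as intrusive_ptr, so values returned by
// `then` / `toFuture` flow through the boundary unchanged.
IntrusiveFut passThrough(IntrusiveFut f) { return f; }
```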
ghstack-source-id: 128296581
Test Plan: CI
Reviewed By: pritamdamania87
Differential Revision: D28187972
fbshipit-source-id: d4609273a1550b4921910e85d2198e02f31c905b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34497
Use a thread_local table to intercept UserRRefs created while deserializing
the user function's arguments, and then wait for confirmations of those
UserRRefs before launching the given user function.
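A minimal sketch of the interception, assuming hypothetical names (`pendingUserRRefs`, `onUserRRefDeserialized`, `waitForConfirmations`) and a simplified UserRRef:
```cpp
#include <future>
#include <vector>

// Simplified stand-in: a UserRRef whose creation must be confirmed by
// the owner before user code may safely use it.
struct UserRRef {
  std::shared_future<void> confirmed;  // resolves on owner confirmation
};

// The thread_local table: deserialization registers every UserRRef it
// materializes while unpickling the user function's arguments.
thread_local std::vector<UserRRef*> pendingUserRRefs;

void onUserRRefDeserialized(UserRRef* rref) {
  pendingUserRRefs.push_back(rref);
}

// Before launching the user function, drain the table and block until
// each UserRRef is confirmed, so the function never observes one that
// the owner has not yet acknowledged.
void waitForConfirmations() {
  for (UserRRef* rref : pendingUserRRefs) {
    rref->confirmed.wait();
  }
  pendingUserRRefs.clear();
}
```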
Differential Revision: D20347464
Test Plan: Imported from OSS
Pulled By: mrshenli
fbshipit-source-id: 087484a2d2f03fbfb156752ab25653f39b412a07
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30637
The RequestCallback API currently forces work to be synchronous, which, as we
scale, means we will need to throw a large number of (mostly blocked) threads
at the RPC problem. For some activities, such as dependent autograd RPCs,
there is no inherent reason to block in these threads.
In this change, the RequestCallback API is updated to return a
`shared_ptr<FutureMessage>` rather than a Message:
`std::shared_ptr<FutureMessage> operator()(Message& request) const;`
With a futures-style api, RPC ops that wish to be async can then be async,
while short-lived blocking functions (or Python UDFs) can just block.
In this change, we keep all of the current ops synchronous (i.e. we block
and then return a completed FutureMessage). We also update the rpc_agents in
a manner compatible with this sort of parallelism.
Here, we only want to incur overhead when we use the async behavior.
Some modest extra cost seems unavoidable here (e.g. the allocation for the
std::make_shared<>), but we can trivially detect the synchronous/completed
case in the rpc_agent and avoid the extra thread-switches/etc. in that case.
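A minimal sketch of how a synchronous op fits the futures-style API; `Message`, `FutureMessage`, and `handleEcho` are simplified stand-ins, not the actual PyTorch types:
```cpp
#include <memory>
#include <string>
#include <utility>

struct Message {
  std::string payload;
};

struct FutureMessage {
  Message value;
  bool done = false;
  void markCompleted(Message m) {
    value = std::move(m);
    done = true;
  }
  bool completed() const { return done; }
};

// A synchronous op under the futures-style API: do the work inline and
// return an already-completed future. The rpc_agent can test completed()
// and skip the extra thread switch in this common case, so only truly
// async ops pay for the deferral machinery.
std::shared_ptr<FutureMessage> handleEcho(Message& request) {
  auto future = std::make_shared<FutureMessage>();
  future->markCompleted(Message{request.payload});  // blocking path
  return future;
}
```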
ghstack-source-id: 95287026
Test Plan:
- Basic: buck test mode/dev-nosan caffe2/test/...
- Additional testcase in ThriftRpcAgentTest for deferred work.
Differential Revision: D18774322
fbshipit-source-id: cf49922a71707cfb1726de16f93af23b160385d8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29696
The paths distributed/autograd/context/dist_autograd_context.h and
distributed/autograd/context/dist_autograd_container.h were repetitive,
so they are renamed to distributed/autograd/context/context.h and
distributed/autograd/context/container.h.
ghstack-source-id: 93850266
Test Plan: waitforbuildbot
Differential Revision: D18467624
fbshipit-source-id: bbf3905396f553006851af296c880c1bd106ec47
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29605
Adds a wrapper around the existing createException function that allows
passing an error string instead of a regular C++ exception. This allows us
to create exceptions for errors that aren't necessarily C++ exceptions.
This function is used by
https://github.com/pytorch/pytorch/pull/29601 and
https://github.com/pytorch/pytorch/pull/26336.
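A minimal sketch of the overload, with a simplified stand-in `Message` type (the real signatures may differ):
```cpp
#include <exception>
#include <stdexcept>
#include <string>

struct Message {
  std::string payload;
};

// Existing path: build an error response from a C++ exception.
Message createException(const Message& request, const std::exception& e) {
  return Message{std::string("error: ") + e.what()};
}

// New wrapper: accept a plain error string, for failures that never
// surfaced as a C++ exception (e.g. an error reported by a remote peer).
Message createException(const Message& request, const std::string& what) {
  return createException(request, std::runtime_error(what));
}
```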
ghstack-source-id: 93819039
Test Plan: Unit tests pass
Differential Revision: D18439216
fbshipit-source-id: 70b6a2e4f107304e322cdd2630847ad0071bc0c1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28312
1. Currently, if the autograd context is valid, an RPC is sent with autograd
metadata even when the tensors do not require grad and no grad functions are
attached. This is not ideal. This diff ensures that an RPC carries autograd
metadata only if the autograd context is valid and the tensors require grad
(see the sketch after this list).
2. Meanwhile, create a utility to attach autograd info and functions as needed.
3. Add autograd send/recv functions for Python RPC calls.
4. Make changes to support nested Python RPC calls.
5. Disallow nested dist autograd contexts (was landed in #27022).
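A minimal sketch of the gating condition, using simplified stand-in types and a placeholder context check:
```cpp
#include <vector>

struct Tensor {
  bool requires_grad = false;
};

bool hasValidContext() {
  return true;  // placeholder for "current dist autograd context is valid"
}

bool anyRequiresGrad(const std::vector<Tensor>& tensors) {
  for (const auto& t : tensors) {
    if (t.requires_grad) {
      return true;
    }
  }
  return false;
}

// Attach autograd metadata (and send/recv functions) only when both
// conditions hold; otherwise send a plain RPC with no autograd overhead.
bool shouldAttachAutogradMeta(const std::vector<Tensor>& tensors) {
  return hasValidContext() && anyRequiresGrad(tensors);
}
```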
ghstack-source-id: 92240367
Test Plan: unit tests
Differential Revision: D18017554
fbshipit-source-id: dbe79a5171063901a78a9b3322b9b31c159d098d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27576
1. Currently, if the autograd context is valid, an RPC is sent with autograd
metadata even when the tensors do not require grad and no grad functions are
attached. This is not ideal. This diff ensures that an RPC carries autograd
metadata only if the autograd context is valid and the tensors require grad.
2. Meanwhile, create a utility to attach autograd info and functions as needed.
3. Add autograd send/recv functions for Python RPC calls.
4. Make changes to support nested Python RPC calls.
5. Disallow nested dist autograd contexts (was landed in #27022).
ghstack-source-id: 92154535
Test Plan: unit tests
Differential Revision: D17819153
fbshipit-source-id: 37d8a85855bf591f2f2da48d475a06e870a30ea1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27022
This change implements the "FAST" mode distributed autograd backward
pass as described in https://github.com/pytorch/pytorch/issues/23110.
At a high level the backward pass works as follows:
1. We start by computing dependencies on the node that calls
`torch.distributed.backward`.
2. This node computes the dependencies starting from the root nodes provided in
the backward call and all the 'send' functions present in the current autograd
context. The "FAST" mode assumes all 'send' functions are part of the autograd
computation.
3. Once the dependency computation is done, the distributed autograd engine
calls the local autograd engine to execute the autograd graph. Note that the
autograd graph on a single node is not necessarily connected because of
inter-node communication. As a result, we have special handling to ensure
that the local autograd engine executes the entire graph starting from the
provided roots and all 'send' functions on the node.
4. When the local autograd engine hits a 'recv' function, it performs an async
RPC to send the gradients over to the appropriate node and stores a future in
the autograd context to keep track of this RPC.
5. On the destination node, the appropriate 'send' function is looked up and
enqueued on the local autograd engine. If this is the first time the node is
hearing about this autograd context id on the backward pass, then the node
computes dependencies for the local autograd engine.
6. As part of computing dependencies, the distributed autograd engine discovers
all leaf nodes and ensures those are passed as 'outputs' to the local autograd
engine. This avoids running the 'AccumulateGrad' function.
7. The gradients computed for the leaf nodes are then actually accumulated in
`DistAutogradContext` for the appropriate autograd context id.
8. The distributed autograd engine waits for the local autograd engine to
complete and also waits for all the 'Futures' (stored in step 4) for the
respective RPCs to finish (see the sketch after this list).
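At a very high level, the control flow resembles the following; all types and function names here are heavily simplified stand-ins for the real engine:
```cpp
#include <future>
#include <utility>
#include <vector>

struct Root {};
struct SendFunction {};
struct GraphTask {};

GraphTask computeDependencies(const std::vector<Root>& roots,
                              const std::vector<SendFunction>& sends) {
  // Steps 1-2: dependencies start from the roots plus every 'send'
  // function in the current context ("FAST" mode assumes all sends
  // participate in the backward pass).
  return GraphTask{};
}

std::vector<std::future<void>> runLocalEngine(GraphTask task) {
  // Steps 3-4: the local engine executes the (possibly disconnected)
  // graph; each 'recv' node fires an async RPC whose future is stored
  // in the autograd context.
  return {};
}

void distributedBackward(const std::vector<Root>& roots,
                         const std::vector<SendFunction>& sends) {
  auto task = computeDependencies(roots, sends);
  auto rpcFutures = runLocalEngine(std::move(task));
  // Step 8: wait for the local engine and all outstanding gradient RPCs.
  for (auto& fut : rpcFutures) {
    fut.wait();
  }
}
```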
We have made the following changes to the local autograd engine for this
purpose:
1. Expose GraphTask and NodeTask so that the distributed autograd engine can
use them.
2. Expose an `execute_with_graph_task` API which allows the distributed engine
to build a GraphTask and pass it to the local autograd engine.
3. Expose an `enqueue_on_cpu` API, which allows the distributed engine to build
a `NodeTask` for a 'send' function and enqueue it on the local autograd engine.
In addition to this, a few general improvements were made:
1. Added a `PropagateGradients` RPC call for the 'recv' function to pass
gradients to the appropriate node during the backward pass.
2. Use IValues as much as possible in serialization for RpcWithAutograd.
3. If Future.wait() receives a message of type EXCEPTION, we throw an
appropriate exception instead of just returning the message. This is in line
with what most Future.wait() APIs do.
4. Added a `get_gradients(context_id)` API which allows users to retrieve a map
from Tensor to respective gradient for the provided context_id on the local
node.
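A minimal sketch of per-context gradient accumulation and the lookup that `get_gradients(context_id)` performs; all types here are simplified stand-ins:
```cpp
#include <cstdint>
#include <unordered_map>

struct Tensor {};
using GradMap = std::unordered_map<const Tensor*, Tensor>;

// One context per distributed backward pass. Leaf gradients are
// accumulated here (step 7) rather than via 'AccumulateGrad'.
struct DistAutogradContext {
  GradMap grads;
};

std::unordered_map<int64_t, DistAutogradContext> contexts;

// Models `get_gradients(context_id)`: return the tensor -> gradient map
// recorded on the local node for the given context.
GradMap getGradients(int64_t contextId) {
  return contexts.at(contextId).grads;
}
```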
ghstack-source-id: 91794926
Test Plan: unit tests.
Differential Revision: D17652615
fbshipit-source-id: 96f65c52adb2706ee29f4b49e1655afaa0a3bec3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25527
Master GH issue: https://github.com/pytorch/pytorch/issues/23110.
This change builds upon https://github.com/pytorch/pytorch/pull/24876 and
provides all the autograd hooks needed for a forward pass with distributed RPC
for builtin operators. This change does not address distributed RPC for Python
UDFs; that will be addressed in follow-up PRs.
Summary of changes:
1. Attach send autograd functions when a request is sent from the client and
response is sent from the server.
2. Attach receive autograd functions when a request is received on the server
and a response is received on the client.
3. Generate a globally unique autograd_message_id for each send/recv autograd
function pair to uniquely identify them.
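One plausible way to generate such a globally unique id (an assumed encoding for illustration, not necessarily the actual scheme): pack the worker id into the high bits and a local monotonically increasing counter into the low bits, so ids never collide across workers:
```cpp
#include <atomic>
#include <cstdint>

// Hypothetical encoding: high 16 bits hold the worker id, low 48 bits
// hold a per-worker counter, so each worker mints distinct ids without
// any cross-worker coordination.
std::atomic<int64_t> localCounter{0};

int64_t newAutogradMessageId(int64_t workerId) {
  return (workerId << 48) | localCounter.fetch_add(1);
}
```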
ghstack-source-id: 91240466
Test Plan: unit tests.
Differential Revision: D17148077
fbshipit-source-id: 192d8a3f552ed7cc939f55dcca332965c9bd3233