From 85320336799e38411d15c0e159b41248cda01218 Mon Sep 17 00:00:00 2001
From: Howard Huang
Date: Wed, 9 Jul 2025 09:57:55 -0700
Subject: [PATCH] RPC tutorial audit (#157938)

Fix [T228333894](https://www.internalfb.com/intern/tasks/?t=228333894)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/157938
Approved by: https://github.com/AlannaBurke
---
 docs/source/rpc.md | 21 ++++-----------------
 1 file changed, 4 insertions(+), 17 deletions(-)

diff --git a/docs/source/rpc.md b/docs/source/rpc.md
index 77f8ec439ae..38e9354f70d 100644
--- a/docs/source/rpc.md
+++ b/docs/source/rpc.md
@@ -8,16 +8,14 @@ higher-level API to automatically differentiate models split across several
 machines.
 
 ```{warning}
-APIs in the RPC package are stable. There are multiple ongoing work items
-to improve performance and error handling, which will ship in future releases.
+APIs in the RPC package are stable and in maintenance mode.
 ```
 
 ```{warning}
-CUDA support was introduced in PyTorch 1.9 and is still a **beta** feature.
+CUDA support is a **beta** feature.
 Not all features of the RPC package are yet compatible with CUDA support and
 thus their use is discouraged. These unsupported features include: RRefs,
-JIT compatibility, dist autograd and dist optimizer, and profiling. These
-shortcomings will be addressed in future releases.
+JIT compatibility, dist autograd and dist optimizer, and profiling.
 ```
 
 ```{note}
@@ -102,13 +100,6 @@ device lists on source and destination workers do not match. In such cases,
 applications can always explicitly move the input tensors to CPU on the caller
 and move it to the desired devices on the callee if necessary.
 
-```{warning}
- TorchScript support in RPC is a prototype feature and subject to change. Since
- v1.5.0, ``torch.distributed.rpc`` supports calling TorchScript functions as
- RPC target functions, and this will help improve parallelism on the callee
- side as executing TorchScript functions does not require GIL.
-```
-
 ```{eval-rst}
 .. autofunction:: rpc_sync
 .. autofunction:: rpc_async
@@ -159,9 +150,7 @@ multiple different transports (TCP, of course, but also shared memory, NVLink,
 InfiniBand, ...) and can automatically detect their availability and negotiate
 the best transport to use for each pipe.
 
-The TensorPipe backend has been introduced in PyTorch v1.6 and is being actively
-developed. At the moment, it only supports CPU tensors, with GPU support coming
-soon. It comes with a TCP-based transport, just like Gloo. It is also able to
+The TensorPipe backend comes with a TCP-based transport, just like Gloo. It is also able to
 automatically chunk and multiplex large tensors over multiple sockets and
 threads in order to achieve very high bandwidths. The agent will be able to
 pick the best transport on its own, with no intervention required.
@@ -301,6 +290,4 @@ to use [the profiler](https://pytorch.org/docs/stable/autograd.html#profiler) to
 - [Getting started with Distributed RPC Framework](https://pytorch.org/tutorials/intermediate/rpc_tutorial.html)
 - [Implementing a Parameter Server using Distributed RPC Framework](https://pytorch.org/tutorials/intermediate/rpc_param_server_tutorial.html)
 - [Combining Distributed DataParallel with Distributed RPC Framework](https://pytorch.org/tutorials/advanced/rpc_ddp_tutorial.html) (covers **RemoteModule** as well)
-- [Profiling RPC-based Workloads](https://pytorch.org/tutorials/recipes/distributed_rpc_profiling.html)
 - [Implementing batch RPC processing](https://pytorch.org/tutorials/intermediate/rpc_async_execution.html)
-- [Distributed Pipeline Parallel](https://pytorch.org/tutorials/intermediate/dist_pipeline_parallel_tutorial.html)
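
For quick reference while reviewing the doc changes above, here is a minimal sketch of the `rpc_sync` / `rpc_async` calls that the edited page documents, using the default TensorPipe backend. It is an illustration, not part of the patch: the worker names, master address/port, and the `add` helper are assumptions chosen for the example.

```python
import os

import torch
import torch.distributed.rpc as rpc


def add(a, b):
    # Plain Python function executed on the callee. CPU tensors are used here,
    # since several RPC features are not yet CUDA-compatible (see the warning
    # in the doc above).
    return a + b


def run(rank, world_size):
    # Illustrative rendezvous settings for a single-machine demo.
    os.environ["MASTER_ADDR"] = "localhost"
    os.environ["MASTER_PORT"] = "29500"

    # The default backend is TensorPipe, which negotiates the best available
    # transport (TCP, shared memory, ...) on its own.
    rpc.init_rpc(f"worker{rank}", rank=rank, world_size=world_size)

    if rank == 0:
        # Blocking call: returns the result directly.
        ret = rpc.rpc_sync("worker1", add, args=(torch.ones(2), torch.ones(2)))
        # Non-blocking call: returns a Future; wait() retrieves the result.
        fut = rpc.rpc_async("worker1", add, args=(torch.ones(2), 3))
        print(ret, fut.wait())

    rpc.shutdown()


if __name__ == "__main__":
    # Launch two worker processes on one machine; in a real deployment each
    # worker would typically run on its own node.
    torch.multiprocessing.spawn(run, args=(2,), nprocs=2)
```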