Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40222
Mention the TensorPipe agent in the RPC docs and give users the information they need to choose which agent to use.
ghstack-source-id: 106225711
Test Plan: Export to GitHub, build locally and try out the docs.
Differential Revision: D22116494
fbshipit-source-id: 30703ba8410c40f64e785f60d71dfd9faa8de4a1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39216
The `rpc.functions.async_execution` decorator specifies that the
wrapped function is guaranteed to return a `torch.futures.Future`.
The decorator adds a `_wrapped_async_rpc_function` attribute to
the wrapper function. The caller retrieves this information and
then sets the `isAsyncFunction` argument accordingly, which is later
added to the PythonCall RPC message as a field. On the callee side,
if the PythonCall carries an asynchronous function, it will cast
the function's return value to a jit::PythonFutureWrapper object,
and then install response creation and communication as a callback
on that jit::PythonFutureWrapper.
For applications, this feature is useful when a function needs to
wait for IO or additional signaling. In those cases, marking the
user function as `rpc.functions.async_execution` will prevent it
from blocking one thread on the callee for too long.
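As a rough illustration of the mechanism described above, here is a
pure-Python sketch that does not use PyTorch. `concurrent.futures.Future`
stands in for `torch.futures.Future`, and `async_execution` and
`handle_call` below are toy stand-ins (assumptions for this example), not
the real implementation.

```python
import functools
from concurrent.futures import Future


def async_execution(fn):
    """Toy decorator: tag the wrapper so a callee can detect it."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        return fn(*args, **kwargs)
    wrapper._wrapped_async_rpc_function = fn
    return wrapper


def handle_call(fn, args, send_response):
    """Toy callee: reply right away, or once the returned future completes."""
    is_async = hasattr(fn, "_wrapped_async_rpc_function")
    result = fn(*args)
    if is_async:
        # Install response creation as a callback instead of blocking here.
        result.add_done_callback(lambda fut: send_response(fut.result()))
    else:
        send_response(result)


# Hypothetical external event the user function has to wait for.
pending = Future()


@async_execution
def wait_for_signal(x):
    out = Future()
    pending.add_done_callback(lambda f: out.set_result(x + f.result()))
    return out


responses = []
handle_call(wait_for_signal, (1,), responses.append)
before = list(responses)   # nothing has been sent yet
pending.set_result(41)     # the signal arrives, the callback chain fires
```

The handler returns as soon as the user function hands back its future, so
no thread sits blocked while the signal is outstanding.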
Test Plan: Imported from OSS
Reviewed By: rohan-varma
Differential Revision: D21779962
fbshipit-source-id: 6b6aa698bf6f91dad6ed2a7ee433df429b59e941
Summary:
xref gh-32838, gh-34032
This is a major refactor of parts of the documentation to split it up using sphinx's `autosummary` feature, which will build out `autofunction` and `autoclass` stub files and link to them. The end result is that the top module pages like torch.nn.rst and torch.rst are now more like tables of contents for the actual single-class or single-function documentation pages.
Along the way, I modified many of the docstrings to eliminate sphinx warnings when building. I think the only thing I changed from a non-documentation perspective is to add names to `__all__` when adding them to `globals()` in `torch/__init__.py`.
I do not know the CI system: are the documentation build artifacts available after the build, so reviewers can preview before merging?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37419
Differential Revision: D21337640
Pulled By: ezyang
fbshipit-source-id: d4ad198780c3ae7a96a9f22651e00ff2d31a0c0f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37666
Add `:orphan:` to avoid "WARNING: document isn't included in any toctree".
Test Plan: Imported from OSS
Differential Revision: D21351053
Pulled By: mrshenli
fbshipit-source-id: 6ff67c418fc1de410c7dc39ad9a0be5c30d07122
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34081
Before this commit, applications had to do the following to configure
the number of threads in the ProcessGroup RPC backend:
```
op = ProcessGroupRpcBackendOptions()
op.rpc_timeout = rpc_timeout
op.init_method = init_method
op.num_send_recv_threads = 32
init_rpc(...., rpc_backend_options=op)
```
After this commit, it can be simplified to:
```
init_rpc(...., rpc_backend_options=ProcessGroupRpcBackendOptions(num_send_recv_threads=32))
```
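The equivalence of the two styles can be sketched without PyTorch.
`ToyBackendOptions` below is a hypothetical stand-in for
`ProcessGroupRpcBackendOptions`, and its default values are placeholders
chosen for this example, not the library's documented defaults.

```python
from dataclasses import dataclass


@dataclass
class ToyBackendOptions:
    # Hypothetical stand-in; defaults are placeholders for illustration.
    rpc_timeout: float = 60.0
    init_method: str = "env://"
    num_send_recv_threads: int = 4


# Old style: construct first, then assign fields one by one.
old = ToyBackendOptions()
old.num_send_recv_threads = 32

# New style: pass the override as a constructor keyword argument.
new = ToyBackendOptions(num_send_recv_threads=32)
```

Both objects compare equal, so the keyword form is only a shorter way to
reach the same configuration.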
Fixes #34075
Test Plan: Imported from OSS
Differential Revision: D20227344
Pulled By: mrshenli
fbshipit-source-id: def4318e987179b8c8ecca44d7ff935702c8a6e7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30491
Our RPC API docs present the APIs well but lack a general
introduction to them. Readers might be a little lost the first
time they land on this page. This commit reorganizes the APIs into
four components from the user's perspective: RPC, RRef, dist autograd,
and dist optimizer. It also adds an intro to each and briefly
describes why we provide them.
Test Plan: Imported from OSS
Differential Revision: D18723294
Pulled By: mrshenli
fbshipit-source-id: 4aced4ab537b070aa780aaaf9724659fd47cb3cb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30330
This is now possible due to previous changes made in `gloo` and `ProcessGroupGloo`. We `abort` the listener thread that is waiting for a message, and join all other threads. The API is changed so that the previous `wait_all_workers` does not destroy the agent, and this is now done in a new `shutdown` method. All callsites are updated appropriately.
ghstack-source-id: 94673884
Test Plan: Unit tests pass.
Reviewed By: mrshenli
Differential Revision: D18661775
fbshipit-source-id: 5aaa7c14603e18253394224994f6cd43234301c2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30020
This is now possible due to previous changes made in `gloo` and `ProcessGroupGloo`. We `abort` the listener thread that is waiting for a message, and join all other threads. The destructor calls this same `localShutdown` method, but we ensure this is not called multiple times.
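A minimal pure-Python sketch of the pattern described above: abort a
blocked listener thread, join it, and guard the shutdown so a second call
(for example from a destructor) is a no-op. `ToyAgent` and its members are
hypothetical names for this example; the real agent is implemented in C++
on top of `ProcessGroupGloo`, and the sentinel used here only stands in for
the `abort` call.

```python
import queue
import threading


class ToyAgent:
    _ABORT = object()  # sentinel that wakes up the blocked listener

    def __init__(self):
        self.inbox = queue.Queue()
        self.handled = []
        self.shutdown_calls = 0
        self._lock = threading.Lock()
        self._stopped = False
        self._listener = threading.Thread(target=self._listen)
        self._listener.start()

    def _listen(self):
        while True:
            msg = self.inbox.get()  # blocks until a message or the abort
            if msg is self._ABORT:
                return
            self.handled.append(msg)

    def local_shutdown(self):
        with self._lock:
            if self._stopped:  # already shut down, do nothing
                return
            self._stopped = True
        self.shutdown_calls += 1
        self.inbox.put(self._ABORT)  # "abort" the waiting listener
        self._listener.join()

    def __del__(self):
        # The destructor reuses the same path; the guard makes it safe.
        self.local_shutdown()


agent = ToyAgent()
agent.inbox.put("ping")
agent.local_shutdown()
agent.local_shutdown()  # second call returns immediately
```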
ghstack-source-id: 94415336
Test Plan: Unit tests pass.
Differential Revision: D5578006
fbshipit-source-id: 6258879fb44c9fca97fdfad64468c1488c16ac02
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30052
Some of the examples provided in `rpc/api.py` were not updated along
with the code changes; this PR updates them. It also removes the
`dist.ProcessGroup` information, since `init_rpc` now initializes a default
process group.
ghstack-source-id: 94273004
Test Plan: Unit tests pass
Differential Revision: D18582596
fbshipit-source-id: a637683f0221f9600f7e50b74e9f7e5a1d331d8f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30160
The path torch.distributed.rpc.api is an implementation detail, which
should not be used by applications to import RPC APIs. Instead, all
RPC APIs are exposed directly as torch.distributed.rpc.*. This
commit makes the API doc consistent with the above expectation.
Test Plan: Imported from OSS
Differential Revision: D18616359
Pulled By: mrshenli
fbshipit-source-id: 8207f7d36c24cf55af737c03a27fd1896c231641
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29762
Rename this API as discussed, since its use cases extend beyond only
model parallelism.
ghstack-source-id: 94020627
Test Plan: Unit tests pass
Differential Revision: D18491743
fbshipit-source-id: d07676bb14f072c64da0ce99ee818bcc582efc57
Summary:
Small fixes to rpc docs:
- Mark as experimental and subject to change.
- Reference the distributed autograd design document in pytorch notes page.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29857
Differential Revision: D18526252
Pulled By: rohan-varma
fbshipit-source-id: e09757fa60a9f8fe9c76a868a418a1cd1c300eae
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29927
With the docs page now up, we can update the links in the design doc
to point to the docs page.
ghstack-source-id: 94055423
Test Plan: waitforbuildbot
Differential Revision: D18541878
fbshipit-source-id: f44702d9a8296ccc0a5d58d56c3b6dc8a822b520