Commit Graph

30 Commits

Author SHA1 Message Date
Luca Wehrstedt
2393bab036 [TensorPipe] Update documentation (#40222)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40222

Mention the TensorPipe agent in the RPC docs and give users the information they need to choose which agent to use.
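
For reference, a hedged sketch of opting into the TensorPipe agent at initialization time (the two-worker setup and environment variables are illustrative assumptions, not part of this commit):

```
import os
from torch.distributed import rpc

os.environ.setdefault("MASTER_ADDR", "localhost")
os.environ.setdefault("MASTER_PORT", "29500")

# Run once per process, with rank=0 and rank=1 respectively.
rpc.init_rpc(
    "worker0",
    rank=0,
    world_size=2,
    backend=rpc.BackendType.TENSORPIPE,  # select the TensorPipe agent
)
rpc.shutdown()
```
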
ghstack-source-id: 106225711

Test Plan: Export to GitHub, build locally and try out the docs.

Differential Revision: D22116494

fbshipit-source-id: 30703ba8410c40f64e785f60d71dfd9faa8de4a1
2020-06-19 04:26:49 -07:00
Shihao Xu
f3f30d4354 [JIT x RPC] Consolidate RRef type class and RRef impl class (#35694)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35694

close https://github.com/pytorch/pytorch/issues/35110

Differential Revision: D7881729

fbshipit-source-id: eedda8f1b7510491886d469efeed4e002bb8b991
2020-06-18 07:46:38 -07:00
Shen Li
a05ef17e46 Add rpc.functions.async_execution decorator for rpc_sync/rpc_async (#39216)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39216

The `rpc.functions.async_execution` decorator specifies that the
wrapped function is guaranteed to return a `torch.futures.Future`.
The decorator adds a `_wrapped_async_rpc_function` attribute to
the wrapper function. The caller retrieves this information and
then sets the `isAsyncFunction` argument accordingly, which is
later added to the PythonCall RPC message as a field. On the
callee side, if the PythonCall carries an asynchronous function,
the callee casts the function's return value to a
jit::PythonFutureWrapper object, and then installs response
creation and communication as a callback on that
jit::PythonFutureWrapper.

For applications, this feature is useful when a function needs to
wait for IO or additional signaling. In those cases, marking the
user function with `rpc.functions.async_execution` prevents it
from blocking one callee thread for too long.
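
A minimal sketch of the decorator in use, following the pattern described above (the worker names and the nested `rpc_async` call are illustrative assumptions):

```
import torch
from torch.distributed import rpc

@rpc.functions.async_execution
def async_add(to, x, y):
    # Returns a torch.futures.Future immediately; the callee thread is
    # released while the nested RPC completes, and the response is sent
    # from a callback installed on the returned future.
    return rpc.rpc_async(to, torch.add, args=(x, y))

# Caller side (assumes three workers already called init_rpc):
# ret = rpc.rpc_sync("worker1", async_add,
#                    args=("worker2", torch.ones(2), 1))
```
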

Test Plan: Imported from OSS

Reviewed By: rohan-varma

Differential Revision: D21779962

fbshipit-source-id: 6b6aa698bf6f91dad6ed2a7ee433df429b59e941
2020-06-02 23:21:25 -07:00
Edward Yang
4fef3763dd Revert "Revert D21337640: [pytorch][PR] Split up documentation into subpages and clean up some warnings" (#37778)
Summary:
Original PR: https://github.com/pytorch/pytorch/pull/37419

cc mattip suo
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37778

Differential Revision: D21385774

Pulled By: ezyang

fbshipit-source-id: 5de532faab8bae132736b6b5189e0ee2ac9935be
2020-05-04 14:32:35 -07:00
Michael Suo
20f7e62b1d Revert D21337640: [pytorch][PR] Split up documentation into subpages and clean up some warnings
Test Plan: revert-hammer

Differential Revision: D21337640

Original commit changeset: d4ad198780c3

fbshipit-source-id: fa9ba6ac542173a50bdb45bfa12f3fec0ed704fb
2020-05-04 10:57:55 -07:00
mattip
f10fbcc820 Split up documentation into subpages and clean up some warnings (#37419)
Summary:
xref gh-32838, gh-34032

This is a major refactor of parts of the documentation to split it up using sphinx's `autosummary` feature, which builds out `autofunction` and `autoclass` stub files and links to them. The end result is that the top module pages like torch.nn.rst and torch.rst are now more like tables of contents pointing to the actual single-class or single-function documentation pages.

Along the way, I modified many of the docstrings to eliminate sphinx warnings when building. I think the only thing I changed from a non-documentation perspective is adding names to `__all__` when adding them to `globals()` in `torch/__init__.py`.
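
A minimal sketch of that `__all__` pattern, using a hypothetical module rather than the actual `torch/__init__.py` contents:

```
import math as _math

existing_name = 42
__all__ = ["existing_name"]

# Names copied into globals() are also appended to __all__, so that
# `from module import *` and sphinx autosummary both see them.
for _name in ("sqrt", "floor"):
    globals()[_name] = getattr(_math, _name)
    __all__.append(_name)
```
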

I do not know the CI system: are the documentation build artifacts available after the build, so reviewers can preview before merging?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37419

Differential Revision: D21337640

Pulled By: ezyang

fbshipit-source-id: d4ad198780c3ae7a96a9f22651e00ff2d31a0c0f
2020-05-04 09:39:22 -07:00
Shen Li
49c8a37a0d Fix doc-gen warnings in RPC (#37666)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37666

Add `:orphan:` to avoid "WARNING: document isn't included in any toctree".

Test Plan: Imported from OSS

Differential Revision: D21351053

Pulled By: mrshenli

fbshipit-source-id: 6ff67c418fc1de410c7dc39ad9a0be5c30d07122
2020-05-01 12:17:15 -07:00
Shen Li
049dede3be Move rpc.rst back to the source folder to preserve existing doc URLs (#36675)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/36675

Test Plan: Imported from OSS

Differential Revision: D21048628

Pulled By: mrshenli

fbshipit-source-id: 3cb1b35ddc1f40c673b0db9048d77dfa024be1e7
2020-04-16 08:12:34 -07:00
Rohan Varma
1f06db2579 Refactored rpc docs (#35109)
Summary:
Reorganize as per jlin27's comments. Screenshots added in comments.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35109

Differential Revision: D20788774

Pulled By: rohan-varma

fbshipit-source-id: 7d64be70ef76ed6ff303d05d39c338293c234766
2020-04-01 02:01:34 -07:00
Shen Li
3c48aadd98 Update descriptions for transmitting CUDA tensors (#34888)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34888

Test Plan: Imported from OSS

Differential Revision: D20491408

Pulled By: mrshenli

fbshipit-source-id: 4ca35ac9edd4c1af4f2bae2cfb0f1f6060658d5c
2020-03-17 17:43:48 -07:00
Shen Li
800bdcf000 Removing experimental tag for RPC and adding experimental tag for RPC+TorchScript (#34887)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34887

Test Plan: Imported from OSS

Differential Revision: D20491409

Pulled By: mrshenli

fbshipit-source-id: ce79c9706eb70a3a52a4032de4f0bd538b694332
2020-03-17 17:43:42 -07:00
Shen Li
3af0dffe84 Use double quotes in C++ to stay consistent with Python RPC docs (#34095)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34095

Test Plan: Imported from OSS

Differential Revision: D20227343

Pulled By: mrshenli

fbshipit-source-id: 69c556beee1f9e944eb1053b5ff0ac368dd99c60
2020-03-03 16:44:30 -08:00
Shen Li
f1085a8e41 Improve ProcessGroup RpcBackendOptions Constructor API (#34081)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34081

Before this commit, applications had to do the following to configure the
number of threads in the ProcessGroup RPC backend:

```
from torch.distributed.rpc import ProcessGroupRpcBackendOptions, init_rpc

# construct the options object, then set each field separately
op = ProcessGroupRpcBackendOptions()
op.rpc_timeout = rpc_timeout
op.init_method = init_method
op.num_send_recv_threads = 32
init_rpc(...., rpc_backend_options=op)
```

After this commit, it can be simplified to:

```
# pass the settings as constructor keyword arguments instead
init_rpc(...., rpc_backend_options=ProcessGroupRpcBackendOptions(num_send_recv_threads=32))
```

Fixes #34075

Test Plan: Imported from OSS

Differential Revision: D20227344

Pulled By: mrshenli

fbshipit-source-id: def4318e987179b8c8ecca44d7ff935702c8a6e7
2020-03-03 16:43:29 -08:00
Shen Li
62f93443e5 Explain RPC behavior when using Tensor as arg or return value
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/31968

Test Plan: Imported from OSS

Differential Revision: D19321380

Pulled By: mrshenli

fbshipit-source-id: e3431f1f02963cc8d8266a420ab03866106f26ac
2020-01-09 16:42:24 -08:00
Rohan Varma
dbc8b00816 Document WorkerInfo and RpcBackendOptions structures in RPC docs. (#31077)
Summary:
We mention `WorkerInfo` and `RpcBackendOptions` in a couple of different locations in our docs, and these are public classes that the user may use, so we should add these classes to the documentation.
(Screenshot of the rendered docs attached in the PR.)
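
A hedged sketch of how these public classes surface to users (assumes `init_rpc` has already run in this process):

```
from torch.distributed import rpc

local_info = rpc.get_worker_info()          # WorkerInfo for this worker
peer_info = rpc.get_worker_info("worker1")  # WorkerInfo for a named peer
print(local_info.name, local_info.id)
```
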
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31077

Differential Revision: D18928162

Pulled By: rohan-varma

fbshipit-source-id: 67f11eedd87523c469377b791a0ba23704ec3723
2019-12-11 11:39:57 -08:00
Shen Li
ec5e471647 Reorganize rpc API doc and add introduction (#30491)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30491

Our RPC API docs present the APIs well but miss a general
introduction to the APIs. Readers might be a little lost the first
time they land on this page. This commit reorganizes the APIs into
four components from the user's perspective: RPC, RRef, dist autograd,
and dist optimizer. It also adds an intro to each and briefly
describes why we provide them.
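
A hedged sketch touching the four documented components in one place (the two-worker setup, the `make_param` helper, and the hyperparameters are illustrative assumptions):

```
import torch
import torch.distributed.autograd as dist_autograd
import torch.optim as optim
from torch.distributed import rpc
from torch.distributed.optim import DistributedOptimizer

def make_param():
    return torch.nn.Parameter(torch.rand(2))

# RPC: invoke a function on a peer and wait for the result.
ret = rpc.rpc_sync("worker1", torch.add, args=(torch.ones(2), 1))

# RRef: hold a reference to a value owned by the peer.
param_rref = rpc.remote("worker1", make_param)

# Dist autograd + dist optimizer: backward and step across workers.
with dist_autograd.context() as context_id:
    loss = param_rref.to_here().sum()
    dist_autograd.backward(context_id, [loss])
    opt = DistributedOptimizer(optim.SGD, [param_rref], lr=0.05)
    opt.step(context_id)
```
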

Test Plan: Imported from OSS

Differential Revision: D18723294

Pulled By: mrshenli

fbshipit-source-id: 4aced4ab537b070aa780aaaf9724659fd47cb3cb
2019-11-28 15:34:18 -08:00
Rohan Varma
1350b99de4 Add local shutdown to process group agent (#30330)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30330

This is now possible due to previous changes made in `gloo` and `ProcessGroupGloo`. We `abort` the listener thread that is waiting for a message, and join all other threads. The API is changed so that the previous `wait_all_workers` does not destroy the agent, and this is now done in a new `shutdown` method. All callsites are updated appropriately.
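
A sketch of the resulting API shape, assuming `rpc.shutdown()` is the user-facing entry point that took over the teardown previously done in `wait_all_workers`:

```
from torch.distributed import rpc

rpc.init_rpc("worker0", rank=0, world_size=2)
# ... issue RPCs ...
# Synchronization with peers happens first; the agent itself is
# destroyed during shutdown() rather than in wait_all_workers.
rpc.shutdown()
```
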

ghstack-source-id: 94673884

Test Plan: Unit tests pass.

Reviewed By: mrshenli

Differential Revision: D18661775

fbshipit-source-id: 5aaa7c14603e18253394224994f6cd43234301c2
2019-11-27 22:34:08 -08:00
Shen Li
a9f3f48f88 Revert D5578006: Add local shutdown to process group agent
Test Plan: revert-hammer

Differential Revision: D5578006

Original commit changeset: 6258879fb44c

fbshipit-source-id: 11b893b3a280a8383eeb20a0548626811616dca1
2019-11-22 11:31:04 -08:00
Rohan Varma
c478a92b93 Add local shutdown to process group agent (#30020)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30020
This is now possible due to previous changes made in `gloo` and `ProcessGroupGloo`. We `abort` the listener thread that is waiting for a message, and join all other threads. The destructor calls this same `localShutdown` method, but we ensure this is not called multiple times.

ghstack-source-id: 94415336

Test Plan: Unit tests pass.

Differential Revision: D5578006

fbshipit-source-id: 6258879fb44c9fca97fdfad64468c1488c16ac02
2019-11-22 10:03:00 -08:00
Shen Li
aa1e99e983 Fix two links in RPC API doc
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30259

Test Plan: Imported from OSS

Differential Revision: D18644749

Pulled By: mrshenli

fbshipit-source-id: ff515d2588cd59e0d87f020a01885156a6644450
2019-11-21 19:32:22 -08:00
Shen Li
e0325011e4 Add link to RRef protocol in RPC doc
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30218

Test Plan: Imported from OSS

Differential Revision: D18638881

Pulled By: mrshenli

fbshipit-source-id: ca6fae6f8cea8cdcc33d275dd71a347fbb5dd45c
2019-11-21 16:22:35 -08:00
Shen Li
2803261a23 Update API doc for wait_all_workers after rename
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30179

Test Plan: Imported from OSS

Differential Revision: D18623092

Pulled By: mrshenli

fbshipit-source-id: 1bbffc7476f256c156783274f7ef51342820edcd
2019-11-20 16:12:30 -08:00
Rohan Varma
de05114618 polish examples in docstrings and update docs to reflect correct use of (#30052)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30052

Some of the examples provided in `rpc/api.py` were not updated along
with the code changes; this PR updates them. It also removes the
`dist.ProcessGroup` information, since `init_rpc` now initializes a default
process group.
ghstack-source-id: 94273004

Test Plan: Unit tests pass

Differential Revision: D18582596

fbshipit-source-id: a637683f0221f9600f7e50b74e9f7e5a1d331d8f
2019-11-20 15:30:38 -08:00
Shen Li
ff7afede92 Stop showing .api as an API path component in RPC docs (#30160)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30160

The path torch.distributed.rpc.api is an implementation detail, which
should not be used by applications to import RPC APIs. Instead, all
RPC APIs are exposed directly as torch.distributed.rpc.*. This
commit makes the API doc consistent with the above expectation.
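
A short sketch of the intended import surface (the specific function names shown are the usual public entry points, listed here as an assumption):

```
# Exposed, stable import surface:
from torch.distributed.rpc import rpc_sync, rpc_async, remote, RRef

# Implementation detail; do not import from here:
# from torch.distributed.rpc.api import rpc_sync
```
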

Test Plan: Imported from OSS

Differential Revision: D18616359

Pulled By: mrshenli

fbshipit-source-id: 8207f7d36c24cf55af737c03a27fd1896c231641
2019-11-20 12:04:10 -08:00
Pritam Damania
5d69bc1eda Add docs for distributed optimizer. (#29971)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29971

ghstack-source-id: 94132160

Test Plan: waitforbuildbot

Differential Revision: D18554631

fbshipit-source-id: c4485f7cff5159f423d0f35d1caf71074b62dc28
2019-11-18 18:51:26 -08:00
Pritam Damania
ab93b3df60 Polish distributed autograd docs. (#29942)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29942

1) Added links to the design.
2) Fixed function signatures.
3) Expanded examples.
ghstack-source-id: 94162372

Test Plan: waitforbuildbot

Differential Revision: D18547103

fbshipit-source-id: 067ba166c107ed14085af8ee3306d3f8a9dcebe7
2019-11-18 18:13:08 -08:00
Rohan Varma
639133d6d1 rename init_model_parallel to init_rpc (#29762)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29762

Rename this API as discussed, since its use cases extend beyond
model parallelism.
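
For illustration, the renamed entry point in use (a minimal two-worker assumption):

```
from torch.distributed import rpc

# previously: rpc.init_model_parallel("worker0", ...)
rpc.init_rpc("worker0", rank=0, world_size=2)
rpc.shutdown()
```
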
ghstack-source-id: 94020627

Test Plan: Unit tests pass

Differential Revision: D18491743

fbshipit-source-id: d07676bb14f072c64da0ce99ee818bcc582efc57
2019-11-18 06:07:44 -08:00
Rohan Varma
455b5c1a7d minor updates to rpc docs (#29857)
Summary:
Small fixes to rpc docs:
- Mark as experimental and subject to change.
- Reference the distributed autograd design document in the PyTorch notes page.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29857

Differential Revision: D18526252

Pulled By: rohan-varma

fbshipit-source-id: e09757fa60a9f8fe9c76a868a418a1cd1c300eae
2019-11-15 22:28:08 -08:00
Pritam Damania
eb29276623 Update distributed autograd design doc with appropriate links. (#29927)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29927

With the docs page now up, we can update the links in the design doc
to point to the docs page.
ghstack-source-id: 94055423

Test Plan: waitforbuildbot

Differential Revision: D18541878

fbshipit-source-id: f44702d9a8296ccc0a5d58d56c3b6dc8a822b520
2019-11-15 21:10:53 -08:00
Rohan Varma
06ef4a757d Add docs for RPC, dist autograd, and RRef modules (#29276)
Summary:
Closes https://github.com/pytorch/pytorch/issues/28983. Documentation for `torch.distributed.rpc` and `torch.distributed.autograd` modules. Also fixes/tidies up some of the docstrings in rpc/autograd, and moves some functions to be private so they don't show up in the documentation.

Note: Much of the text describing/explaining the RPC/RRef layers is taken from the following RFCs: https://github.com/pytorch/pytorch/issues/23110, https://github.com/pytorch/pytorch/issues/26759
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29276

Differential Revision: D18478754

Pulled By: rohan-varma

fbshipit-source-id: e9a7089baf5275304e5408d319eb9bf98e53fff8
2019-11-14 14:32:03 -08:00