Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45356
In this PR, I'm adding a warning to the ProcessGroup (PG) backend noting that it will
be deprecated in the future. In addition, I removed the warning from the
TensorPipe (TP) backend that stated it is a beta feature.
ghstack-source-id: 112940501
Test Plan: waitforbuildbot
Reviewed By: mrshenli
Differential Revision: D23940144
fbshipit-source-id: d44054aa1e4ef61004a40bbe0ec45ff07829aad4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45235
This is so that users know that the profiler works as expected with
RPC and they can learn how to use it to profile RPC-based workloads.
ghstack-source-id: 112773748
Test Plan: CI
Reviewed By: mrshenli
Differential Revision: D23777888
fbshipit-source-id: 4805be9b949c8c7929182f291a6524c3c6a725c1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41200
In short, we messed up. The SHM and CMA backends of TensorPipe are Linux-specific and are thus guarded by a #ifdef in the agent's code. Due to a mishap with CMake (caused by the fact that TensorPipe has two CMake files, one for PyTorch and a "standalone" one), we were not correctly propagating some flags, and these #ifdefs were always false. This means that these two backends have always been disabled and have thus never been covered by our OSS CI. It would be irresponsible to enable them now in v1.6, so instead we remove any mention of them from the docs.
Note that this is perhaps not as bad as it sounds. These two backends provided higher performance (lower latency) when the two endpoints were on the same machine. However, I suspect that most RPC users will only do transfers across machines, for which SHM and CMA wouldn't have played any role.
ghstack-source-id: 107458630
Test Plan: Docs only
Differential Revision: D22462158
fbshipit-source-id: 0d72fea11bcaab6d662184bbe7270529772a5e9b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40461
It turned out `:inherited-members:` (see [doc](https://www.sphinx-doc.org/en/master/usage/extensions/autodoc.html#directive-autoclass)) is not really usable,
because pybind11 generates docstrings that annotate `self` with the parent-class type, `rpc.PyRRef`.
As a workaround, I am pulling the docstrings of the parent class, `PyRRef`, into the subclass, `RRef`, and doing surgery on the docstrings generated by pybind11.
{F241283111}
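A minimal sketch of this surgery, assuming a hand-written `PyRRef` stand-in for the pybind11 class (the `to_here` docstring and the `_patch` helper below are illustrative, not the actual code):

```python
import functools

# Stand-in for the pybind11-generated parent class, whose docstrings
# annotate `self` with the parent type name.
class PyRRef:
    def to_here(self):
        """to_here(self: PyRRef) -> object

        Blocking call that copies the value of the RRef to the local node.
        """

class RRef(PyRRef):
    pass

def _patch(name):
    # Define a fresh method on the subclass (mutating the inherited
    # function's __doc__ would also change the parent's docstring).
    parent_fn = getattr(PyRRef, name)

    @functools.wraps(parent_fn)
    def method(self, *args, **kwargs):
        return parent_fn(self, *args, **kwargs)

    # The "surgery": rewrite the parent type name in the docstring so
    # `self` reads as the subclass type.
    method.__doc__ = parent_fn.__doc__.replace("PyRRef", "RRef")
    setattr(RRef, name, method)

for name in ("to_here",):
    _patch(name)
```

After patching, `RRef.to_here.__doc__` shows `self: RRef` while the parent class docstring is left untouched.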
ghstack-source-id: 106472496
Test Plan:
buck test mode/dev-nosan //caffe2/test/distributed/rpc/:rpc_fork
buck build mode/dev-nosan //caffe2/test/distributed/rpc/:rpc_fork && \
buck-out/gen/caffe2/test/distributed/rpc/rpc_fork\#binary.par \
-r test_rref_str
buck build mode/dev-nosan //caffe2/test/distributed/rpc/:rpc_fork && \
buck-out/gen/caffe2/test/distributed/rpc/rpc_fork\#binary.par \
-r test_return_local_rrefs
buck test mode/dev-nosan //caffe2/torch/fb/distributed/model_parallel/tests:test_elastic_averaging -- 'test_elastic_averaging_center \(caffe2\.torch\.fb\.distributed\.model_parallel\.tests\.test_elastic_averaging\.TestElasticAveragingCenter\)'
P134031188
Differential Revision: D7933834
fbshipit-source-id: c03a8a4c9d98888b64492a8caba1591595bfe247
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40296
1. Added a link to parameter server tutorial
2. Explained current states for TorchScript support
Test Plan: Imported from OSS
Differential Revision: D22142647
Pulled By: mrshenli
fbshipit-source-id: ffd697dd64a3aa874cf3f3488122ed805903370d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40222
Mention the TensorPipe agent in the RPC docs and give users the information they need to choose which agent to use.
ghstack-source-id: 106225711
Test Plan: Export to GitHub, build locally and try out the docs.
Differential Revision: D22116494
fbshipit-source-id: 30703ba8410c40f64e785f60d71dfd9faa8de4a1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39216
The `rpc.functions.async_execution` decorator specifies that the
wrapped function is guaranteed to return a `torch.futures.Future`.
The decorator adds a `_wrapped_async_rpc_function` attribute to
the wrapper function. The caller retrieves this information and
then sets the `isAsyncFunction` argument accordingly, which is later
added to the PythonCall RPC message as a field. On the callee side,
if the PythonCall carries an asynchronous function, it will cast
the function's return value to a jit::PythonFutureWrapper object,
and then install response creation and communication as a callback
on that jit::PythonFutureWrapper.
For applications, this feature is useful when a function needs to
wait for IO or additional signaling. In those cases, marking the
user function with `rpc.functions.async_execution` will prevent it
from blocking a thread on the callee for too long.
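The flow described above can be sketched with a minimal stand-in; `async_execution`, `slow_add`, and the attribute check below are illustrative (using `concurrent.futures.Future` in place of `torch.futures.Future`), not the real torch implementation:

```python
import concurrent.futures
import threading

def async_execution(fn):
    # Mimic of the real decorator: tag the function so the RPC caller
    # can detect it and set isAsyncFunction on the outgoing PythonCall.
    fn._wrapped_async_rpc_function = fn
    return fn

@async_execution
def slow_add(x, y):
    fut = concurrent.futures.Future()
    # Return the future immediately and complete it later (e.g. after IO
    # or a signal), so no callee thread blocks while waiting.
    threading.Timer(0.01, lambda: fut.set_result(x + y)).start()
    return fut

# Caller-side check mirroring how the RPC layer detects async functions:
is_async = hasattr(slow_add, "_wrapped_async_rpc_function")
fut = slow_add(1, 2)
result = fut.result()  # the real callee would install a callback instead
```

The key point the sketch captures: the decorated function returns a future rather than a value, so the callee can free the thread and respond from a callback once the future completes.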
Test Plan: Imported from OSS
Reviewed By: rohan-varma
Differential Revision: D21779962
fbshipit-source-id: 6b6aa698bf6f91dad6ed2a7ee433df429b59e941
Summary:
xref gh-32838, gh-34032
This is a major refactor of parts of the documentation to split it up using sphinx's `autosummary` feature, which will build out `autofunction` and `autoclass` stub files and link to them. The end result is that the top module pages like torch.nn.rst and torch.rst are now more like tables of contents for the actual single-class or single-function documentation pages.
Along the way, I modified many of the docstrings to eliminate sphinx warnings when building. I think the only thing I changed from a non-documentation perspective is to add names to `__all__` when adding them to `globals()` in `torch.__init__.py`
I do not know the CI system: are the documentation build artifacts available after the build, so reviewers can preview before merging?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37419
Differential Revision: D21337640
Pulled By: ezyang
fbshipit-source-id: d4ad198780c3ae7a96a9f22651e00ff2d31a0c0f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37666
Add `:orphan:` to avoid "WARNING: document isn't included in any toctree".
Test Plan: Imported from OSS
Differential Revision: D21351053
Pulled By: mrshenli
fbshipit-source-id: 6ff67c418fc1de410c7dc39ad9a0be5c30d07122
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34081
Before this commit, applications had to do the following to configure
the number of threads in the ProcessGroup RPC backend:
```
op = ProcessGroupRpcBackendOptions()
op.rpc_timeout = rpc_timeout
op.init_method = init_method
op.num_send_recv_threads = 32
init_rpc(...., rpc_backend_options=op)
```
After this commit, it can be simplified to:
```
init_rpc(...., rpc_backend_options=ProcessGroupRpcBackendOptions(num_send_recv_threads=32))
```
Fixes #34075
Test Plan: Imported from OSS
Differential Revision: D20227344
Pulled By: mrshenli
fbshipit-source-id: def4318e987179b8c8ecca44d7ff935702c8a6e7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30491
Our RPC API docs present the APIs well but miss a general
introduction to them. Readers might be a little lost the first
time landing on this page. This commit reorganizes the APIs into
four components from the user's perspective: RPC, RRef, dist autograd,
and dist optimizer. It also adds an intro to each and briefly
describes why we provide them.
Test Plan: Imported from OSS
Differential Revision: D18723294
Pulled By: mrshenli
fbshipit-source-id: 4aced4ab537b070aa780aaaf9724659fd47cb3cb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30330
This is now possible due to previous changes made in `gloo` and `ProcessGroupGloo`. We `abort` the listener thread that is waiting for a message, and join all other threads. The API is changed so that the previous `wait_all_workers` does not destroy the agent, and this is now done in a new `shutdown` method. All callsites are updated appropriately.
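The shutdown sequence can be mimicked with plain threads; `ToyAgent` below is an illustrative stand-in (an event-guarded queue wait models the abortable `gloo` recv), not the actual agent code:

```python
import queue
import threading

class ToyAgent:
    """Illustrative stand-in for the agent's graceful-shutdown sequence."""

    def __init__(self):
        self._inbox = queue.Queue()
        self._stop = threading.Event()
        self._shut_down = False
        # Listener blocks waiting for incoming messages, like the agent's
        # receive loop.
        self._listener = threading.Thread(target=self._listen)
        self._listener.start()

    def _listen(self):
        while not self._stop.is_set():
            try:
                # Bounded wait stands in for an abortable blocking recv.
                self._inbox.get(timeout=0.05)
            except queue.Empty:
                continue

    def shutdown(self):
        # Guard against double shutdown, since the destructor calls the
        # same local-shutdown path.
        if self._shut_down:
            return
        self._stop.set()       # "abort" the listener's blocking wait
        self._listener.join()  # join the remaining threads
        self._shut_down = True

agent = ToyAgent()
agent.shutdown()
agent.shutdown()  # safe: the second call is a no-op
```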
ghstack-source-id: 94673884
Test Plan: Unit tests pass.
Reviewed By: mrshenli
Differential Revision: D18661775
fbshipit-source-id: 5aaa7c14603e18253394224994f6cd43234301c2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30020
This is now possible due to previous changes made in `gloo` and `ProcessGroupGloo`. We `abort` the listener thread that is waiting for a message, and join all other threads. The destructor calls this same `localShutdown` method, but we ensure this is not called multiple times.
ghstack-source-id: 94415336
Test Plan: Unit tests pass.
Differential Revision: D5578006
fbshipit-source-id: 6258879fb44c9fca97fdfad64468c1488c16ac02
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30052
Some of the examples provided in `rpc/api.py` were not updated along
with the code changes; this PR updates them. It also removes the
`dist.ProcessGroup` information, since `init_rpc` now initializes a default
process group.
ghstack-source-id: 94273004
Test Plan: Unit tests pass
Differential Revision: D18582596
fbshipit-source-id: a637683f0221f9600f7e50b74e9f7e5a1d331d8f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30160
The path torch.distributed.rpc.api is an implementation detail, which
should not be used by applications to import RPC APIs. Instead, all
RPC APIs are exposed directly as torch.distributed.rpc.*. This
commit makes the API doc consistent with the above expectation.
Test Plan: Imported from OSS
Differential Revision: D18616359
Pulled By: mrshenli
fbshipit-source-id: 8207f7d36c24cf55af737c03a27fd1896c231641
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29762
Rename this API as discussed, since its use cases extend beyond
model parallelism alone.
ghstack-source-id: 94020627
Test Plan: Unit tests pass
Differential Revision: D18491743
fbshipit-source-id: d07676bb14f072c64da0ce99ee818bcc582efc57
Summary:
Small fixes to rpc docs:
- mark as experimental and subject to change
- Reference the distributed autograd design document in pytorch notes page.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29857
Differential Revision: D18526252
Pulled By: rohan-varma
fbshipit-source-id: e09757fa60a9f8fe9c76a868a418a1cd1c300eae
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29927
With the docs page now up, we can update the links in the design doc
to point to the docs page.
ghstack-source-id: 94055423
Test Plan: waitforbuildbot
Differential Revision: D18541878
fbshipit-source-id: f44702d9a8296ccc0a5d58d56c3b6dc8a822b520