pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
Can Balioglu	e1db2f13ce	Refactor TORCH_DISTRIBUTED_DEBUG implementation (#73166 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/73166 This PR refactors, cleans up, and optimizes the implementation of `TORCH_DISTRIBUTED_DEBUG`. It also introduces three new user APIs: `get_debug_level()`, `set_debug_level()`, and `set_debug_level_from_env()` to retrieve and modify the debug level after a process has started. ghstack-source-id: 149778566 Test Plan: Run the existing unit tests. Reviewed By: rohan-varma Differential Revision: D34371226 fbshipit-source-id: e18443b411adcbaf39b2ec999178c198052fcd5b (cherry picked from commit 26d6bb1584b83a0490d8b766482656a5887fa21d)	2022-02-24 02:33:05 +00:00
Wanchao Liang	9b53d3194c	Implement gather primitive for ProcessGroupNCCL (#66745 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66745 This PR implement NCCL gather and add gather to ProcessGroupNCCL using nccl send/recv api. NCCL doesn’t directly provide primitives for gather, so we need to be implemented on top of NCCL’s send/recv API. 1. In ProcessGroupNCCL.cpp, the outputTensors are first flattened, then inputTensors and outputFlattened are passed by the collective class to gather() function in nccl.cpp. 1. In nccl.cpp, gather is implemented using ncclSend/ncclRecv: all the ranks send inputTensor to the root rank, and the root rank uses a for loop to receive these inputTensors. ghstack-source-id: 147754838 Test Plan: test_gather_ops test_gather_checks test_gather_stress Reviewed By: pritamdamania87 Differential Revision: D29616361 fbshipit-source-id: b500d9b8e67113194c5cc6575fb0e5d806dc7782 (cherry picked from commit `d560ee732e`)	2022-01-27 19:37:55 +00:00
Shen Li	7bc220e060	Update distributed.rst for ProcessGroup Extensions (#71482 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/71482 cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang Test Plan: Imported from OSS Reviewed By: rohan-varma Differential Revision: D33745986 Pulled By: mrshenli fbshipit-source-id: fe2d0491901bf00be09deb5c556bc1e1d359b725 (cherry picked from commit `be5104bfd7`)	2022-01-25 00:30:08 +00:00
mrshenli	b8c3693281	Remove autograd-enabled collective APIs from distributed docs (#69011 ) Summary: These APIs are not yet officially released and are still under discussion. Hence, this commit removes those APIs from docs and will add them back when ready. cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang Pull Request resolved: https://github.com/pytorch/pytorch/pull/69011 Reviewed By: fduwjj Differential Revision: D32703124 Pulled By: mrshenli fbshipit-source-id: ea049fc7ab6b0015d38cc40c5b5daf47803b7ea0	2021-11-29 18:14:50 -08:00
Yi Wang	7f25c3e666	Update distributed.rst to show that CUDA send/recv on GPU is supported (#65601 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65601 I believe this feature was supported one year ago: https://github.com/pytorch/pytorch/pull/44921 #Closes: https://github.com/pytorch/pytorch/issues/65525 ghstack-source-id: 138918961 Test Plan: N/A Reviewed By: pritamdamania87, mingzhe09088 Differential Revision: D31163535 fbshipit-source-id: 9321a0a5137a3e265e2b54bd78730ac28c7acd55	2021-09-24 12:30:10 -07:00
Kiuk Chung	9d95d48567	(torch.distributed) Add torch.distributed.is_torchelastic_launched() util method + make init_method=tcp:// compatible with torchelastic (#63910 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63910 Addresses the current issue that `init_method=tcp://` is not compatible with `torch.distributed.run` and `torch.distributed.launch`. When running with a training script that initializes the process group with `init_method=tcp://localhost:$port` as such: ``` $ python -u -m torch.distributed.run --max_restarts 0 --nproc_per_node 1 --nnodes 1 --master_addr $(hostname) --master_port 6000 ~/tmp/test.py ``` An `Address in use` error is raised since the training script tries to create a TCPStore on port 6000, which is already taken since the elastic agent is already running a TCPStore on that port. For details see: https://github.com/pytorch/pytorch/issues/63874. This change does a couple of things: 1. Adds `is_torchelastic_launched()` check function that users can use in the training scripts to see whether the script is launched via torchelastic. 1. Update the `torch.distributed` docs page to include the new `is_torchelastic_launched()` function. 1. Makes `init_method=tcp://` torchelastic compatible by modifying `_tcp_rendezvous_handler` in `torch.distributed.rendezvous` (this is NOT the elastic rendezvous, it is the old rendezvous module which is slotted for deprecation in future releases) to check `is_torchelastic_launched()` AND `torchelastic_use_agent_store()` and if so, only create TCPStore clients (no daemons, not even for rank 0). 1. Adds a bunch of unittests to cover the different code paths NOTE: the issue mentions that we should fail-fast with an assertion on `init_method!=env://` when `is_torchelastic_launched()` is `True`. There are three registered init_methods in pytorch: env://, tcp://, file://. Since this diff makes tcp:// compatible with torchelastic and I've validated that file is compatible with torchelastic. There is no need to add assertions. I did update the docs to point out that env:// is the RECOMMENDED init_method. We should probably deprecate the other init_methods in the future but this is out of scope for this issue. Test Plan: Unittests. Reviewed By: cbalioglu Differential Revision: D30529984 fbshipit-source-id: 267aea6d4dad73eb14a2680ac921f210ff547cc5	2021-08-25 22:57:43 -07:00
Howard Huang	cdc027679b	Add compare_set in distributed docs (#61351 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61351 Test Plan: Imported from OSS Reviewed By: mrshenli Differential Revision: D29588206 Pulled By: H-Huang fbshipit-source-id: 9db48e7b6de29503275f10616470ad2d66b075f9	2021-07-08 12:30:32 -07:00
Rohan Varma	2f395f3b54	[reland] Document debugability features in torch.distributed (#59726 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59726 Reland of https://github.com/pytorch/pytorch/pull/59604 with indentation fix ghstack-source-id: 130979356 Test Plan: ci Reviewed By: SciPioneer Differential Revision: D29001923 fbshipit-source-id: 225d9dc5054c223b453f3b39749e2b62f61b9a2c	2021-06-09 16:40:11 -07:00
Luca Wehrstedt	f1786b293d	Revert D28972444: [pytorch][PR] Document debugability features in torch.distributed Test Plan: revert-hammer Differential Revision: D28972444 (`a9d2810817`) Original commit changeset: da5e8ee84f0d fbshipit-source-id: 94d3b3b75ddec74ea5b2b76f6a7519dc921ee2a7	2021-06-09 03:04:36 -07:00
Rohan Varma	a9d2810817	Document debugability features in torch.distributed (#59604 ) Summary: Adds comprehensive documentation around debugability features added to `torch.distributed` recently, including the `monitored_barrier` and TORCH_DISTRIBUTED_DEBUG env variable. ![dist_one](https://user-images.githubusercontent.com/8039770/121102672-0f052180-c7b3-11eb-974c-81dbbe102cb6.png) ![dist_two](https://user-images.githubusercontent.com/8039770/121102734-39ef7580-c7b3-11eb-94f7-c75469351440.png) Pull Request resolved: https://github.com/pytorch/pytorch/pull/59604 Reviewed By: jbschlosser, SciPioneer Differential Revision: D28972444 Pulled By: rohan-varma fbshipit-source-id: da5e8ee84f0d6f252c703c4d70ff2a0d5817cc4e	2021-06-08 23:52:19 -07:00
Rohan Varma	071d49a970	Document monitored barrier (#58322 ) Summary: Will not land before the release, but would be good to have this function documented in master for its use in distributed debugability. Pull Request resolved: https://github.com/pytorch/pytorch/pull/58322 Reviewed By: SciPioneer Differential Revision: D28595405 Pulled By: rohan-varma fbshipit-source-id: fb00fa22fbe97a38c396eae98a904d1c4fb636fa	2021-05-21 19:04:57 -07:00
Rohan Varma	52bb8120b8	Mention distributed profiling in documentation (#58286 ) Summary: Added a simple section indicating distributed profiling is expected to work similar to other torch operators, and is supported for all communication backends out of the box. Pull Request resolved: https://github.com/pytorch/pytorch/pull/58286 Reviewed By: bdhirsh Differential Revision: D28436489 Pulled By: rohan-varma fbshipit-source-id: ce1905a987c0ede8011e8086a2c30edc777b4a38	2021-05-14 09:43:00 -07:00
Alexander Golynski	bc30c3165c	Update docs for get_future support (#58107 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58107 Test Plan: Imported from OSS Reviewed By: SciPioneer Differential Revision: D28387374 Pulled By: agolynski fbshipit-source-id: 70052afbb0b07ba341ea55f7ec30f7d9759b7bd4	2021-05-12 18:29:28 -07:00
Wanchao Liang	270d675f86	update distributed doc table for alltoall nccl (#54277 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/54277 alltoall already supported in nccl backend, so update the doc to reflect it. Test Plan: Imported from OSS Reviewed By: divchenko Differential Revision: D27172904 Pulled By: wanchaol fbshipit-source-id: 9afa89583d56b247b2017ea2350936053eb30827	2021-03-19 15:35:10 -07:00
Stas Bekman	924c15c962	[doc] reorg dist init and non-init functions (#52976 ) Summary: This PR proposes to improve the distributed doc: * [x] putting the init functions together * [x] moving post-init functions into their own sub-section as they are only available after init and moving that group to after all init sub-sections If this is too much, could we at least put these 2 functions together: ``` .. autofunction:: init_process_group .. autofunction:: is_initialized ``` as they are interconnected. and the other functions are not alphabetically sorted in the first place. Thank you. Pull Request resolved: https://github.com/pytorch/pytorch/pull/52976 Reviewed By: albanD Differential Revision: D26993933 Pulled By: mrshenli fbshipit-source-id: 7cacbe28172ebb5849135567b1d734870b49de77	2021-03-12 08:48:18 -08:00
Joe Zhu	f2b43ddbf4	Update api doc for enabling TcpStore on Windows (#51847 ) Summary: Fixes #{issue number} Pull Request resolved: https://github.com/pytorch/pytorch/pull/51847 Reviewed By: albanD Differential Revision: D26405678 Pulled By: malfet fbshipit-source-id: 073b675225b48d1732771583f8f2473e0fdcf35c	2021-02-11 14:44:03 -08:00
Nikita Shulga	76c6e12a5c	Minor spelling updates (#52149 ) Summary: Add space between 'e.g.' and 'build' 'pacakge'->'package' Pull Request resolved: https://github.com/pytorch/pytorch/pull/52149 Reviewed By: osalpekar Differential Revision: D26405824 Pulled By: malfet fbshipit-source-id: 386390d3f31a9fc268b05902b9dca1deeaf626f9	2021-02-11 12:36:27 -08:00
Emilio Castillo	233e4ebdb6	Implement autograd functions for c10d communication operations (#40762 ) Summary: Closes https://github.com/pytorch/pytorch/issues/40702, Fixes https://github.com/pytorch/pytorch/issues/40690 Currently wip. But I would appreciate some feedback. Functions should be double-differentiable. Contrary to `b35cdc5200/torch/nn/parallel/_functions.py` This PR generates list of tensors instead of aggregating the received data in a single tensor. Is this behavior correct? Thanks! Pull Request resolved: https://github.com/pytorch/pytorch/pull/40762 Reviewed By: glaringlee Differential Revision: D24758889 Pulled By: mrshenli fbshipit-source-id: 79285fb4b791cae3d248f34e2aadb11c9ab10cce	2021-01-26 07:52:51 -08:00
Rohan Varma	d6b5f3ad98	Add object-based collective APIs to public docs (#48909 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48909 Adds these new APIs to the documentation ghstack-source-id: 117965961 Test Plan: CI Reviewed By: mrshenli Differential Revision: D25363279 fbshipit-source-id: af6889d377f7b5f50a1a77a36ab2f700e5040150	2020-12-07 14:30:25 -08:00
Rohan Varma	362d9a932e	Remove object-based collective APIs from public docs (#46075 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46075 Removes these from public docs for now as we are still iterating/formalizing these APIs. Will add them back once they are part of a PyTorch release. ghstack-source-id: 113928700 Test Plan: CI Reviewed By: mrshenli Differential Revision: D24211510 fbshipit-source-id: 3e36ff6990cf8e6ef72b6e524322ae06f9097aa2	2020-10-09 09:24:51 -07:00
Rohan Varma	154347d82f	Fix distributed documentation for asynchronous collective Work objects (#45709 ) Summary: Closes https://github.com/pytorch/pytorch/issues/42247. Clarifies some documentation related to `Work` object semantics (outputs of async collective functions). Clarifies the difference between CPU operations and CUDA operations (on Gloo or NCCL backend), and provides an example where the difference in CUDA operation's wait() semantics is necessary to understand for correct code. ![sync](https://user-images.githubusercontent.com/8039770/94875710-6f64e780-040a-11eb-8fb5-e94fd53534e5.png) Pull Request resolved: https://github.com/pytorch/pytorch/pull/45709 Reviewed By: ngimel Differential Revision: D24171256 Pulled By: rohan-varma fbshipit-source-id: 6365a569ef477b59eb2ac0a8a9a1c1f34eb60e22	2020-10-07 19:59:51 -07:00
Omkar Salpekar	3799ba83e5	[Docs] Adding Store API Docs (#45543 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45543 This PR adds documentation for the c10d Store to the public docs. Previously these docs were missing although we exposed a lightly-used (but potentially useful) Python API for our distributed key-value store. ghstack-source-id: 113409195 Test Plan: Will verify screenshots by building the docs. Reviewed By: pritamdamania87 Differential Revision: D24005598 fbshipit-source-id: 45c3600e7c3f220710e99a0483a9ce921d75d044	2020-10-02 11:16:56 -07:00
gunandrose4u	47debdca42	Document change for DDP enabled on Windows platform (#45392 ) Summary: Document change for DDP enabled on Windows platform Pull Request resolved: https://github.com/pytorch/pytorch/pull/45392 Reviewed By: gchanan Differential Revision: D23962344 Pulled By: mrshenli fbshipit-source-id: 8924c6ca36d68699871d8add3e0aab6542ea269c	2020-09-28 13:22:42 -07:00
Rohan Varma	fbea2ee917	broadcast_object API for c10d (#43887 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43887 As part of addressing #23232, this PR adds support for `broadcast_object_list` which is an API to broadcast arbitrary picklable objects to all the other ranks. This has been a long-requested feature, so would be good for Pytorch to natively support this. The implementation approach follows a similar approach as https://github.com/pytorch/pytorch/pull/42189. The input is a list of objects to be broadcasted and it is in place, meaning all ranks part of the group will have their input list modified to contain the broadcasted objects from the src rank. Note that the API is designed to match the tensor-based collectives other than supporting async_op. For now, it is a blocking call. If we see demand to support async_op, we will have to make more progress on merging work/future to support this. ghstack-source-id: 111180436 Reviewed By: mrshenli Differential Revision: D23422577 fbshipit-source-id: fa700abb86eff7128dc29129a0823e83caf4ab0e	2020-09-01 18:54:17 -07:00
Shen Li	2f52748515	Publish all_gather_object and gather_object docs (#43772 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43772 Test Plan: Imported from OSS Reviewed By: rohan-varma Differential Revision: D23398495 Pulled By: rohan-varma fbshipit-source-id: 032e1d628c0c0f2dec297226167471698c56b605	2020-08-31 13:28:00 -07:00
Shen Li	0edbe6b063	Add a link in RPC doc page to point to PT Distributed overview (#41108 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/41108 Test Plan: Imported from OSS Differential Revision: D22440751 Pulled By: mrshenli fbshipit-source-id: 9e7b002091a3161ae385fdfcc26484ae8fc243bb	2020-07-08 14:00:05 -07:00
Shen Li	b982a6a247	Expose torch.distributed.is_available() API (#37021 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/37021 Test Plan: Imported from OSS Differential Revision: D21164318 Pulled By: mrshenli fbshipit-source-id: 08a446af342cbe54f3eb4994956ffa7ef4922bcf	2020-04-21 18:38:46 -07:00
Orion Reblitz-Richardson	2d8dbcd3ef	Remove python2 and 3.5 from requirements.txt, README and docs (#35677 ) Summary: Some more cleanup now that we no longer support python2 or 3.5 on master and eventually PyTorch 1.6 release. Pull Request resolved: https://github.com/pytorch/pytorch/pull/35677 Differential Revision: D20838097 Pulled By: orionr fbshipit-source-id: 95d553a1e8769f3baa395e0bc6d4ce7cd93236e9	2020-04-03 11:05:43 -07:00
Feng Tian	762270c51f	add c10d dynamic loading mechanism and unit test (#28068 ) Summary: The original behavior of pytorch c10d only supports built-in c10d backends, such as nccl/gloo/mpi. This patch is used to extend the c10d capability to support dynamically loading 3rd party communication libraries which are derived from ProcessGroup base class. related RFC is in: https://github.com/pytorch/pytorch/issues/27955 Through this way, user just need specify a 3rd party c10d backend name when invoking torch.distributed.init_process_group(). The proposed logic will try to load corresponding c10d backend cpp extension automatically. as for how to develop a new 3rd party c10d backend through cpp extension, pls refer to test/cpp_extensions/cpp_c10d_extension.cpp Pull Request resolved: https://github.com/pytorch/pytorch/pull/28068 Differential Revision: D19174838 Pulled By: agolynski fbshipit-source-id: 3409a504a43ce7260e6f9d1207c00e87471fac62	2020-04-02 15:46:51 -07:00
Dhiraj D Kalamkar	945d7a7408	Add All-to-all comms support to distributed module and MPI backend (#32361 ) Summary: As described in https://github.com/pytorch/pytorch/issues/32345, a prototype implementation to add an alltoall communication primitive to torch.distributed module and ProcessGroup abstract interface. Also, implements alltoall in ProcessGroupMPI backend. mnaumovfb JianpingChen066 dmudiger srinivas212 Jianhui-Li mshiryaev ftian1 cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini xush6528 osalpekar Pull Request resolved: https://github.com/pytorch/pytorch/pull/32361 Reviewed By: mrshenli Differential Revision: D20635481 Pulled By: srinivas212 fbshipit-source-id: 3dd0af800ce55d02f02813cde550e3a0f1a287d2	2020-04-01 08:57:12 -07:00
Yuichiro Ueno	aadd0fda8b	Document reduce_scatter collective operation (#35274 ) Summary: I don't know why reduce_scatter collective operation is not documented so I add it to the document. Pull Request resolved: https://github.com/pytorch/pytorch/pull/35274 Differential Revision: D20645850 Pulled By: mrshenli fbshipit-source-id: 0a4458bff1a4e15a4593dd4dcc25e4e0f6e2265d	2020-03-25 13:36:29 -07:00
Xiang Gao	df8d6eeb19	Update docs about DP and DDP for CUDA (#35063 ) Summary: We should recommend DDP instead of DP. Hope we can also cherry-pick this for 1.5 Pull Request resolved: https://github.com/pytorch/pytorch/pull/35063 Differential Revision: D20549621 Pulled By: ngimel fbshipit-source-id: 86b1b2134664065cc6070ea4212895f993eaf543	2020-03-20 20:06:37 -07:00
Brian Wignall	e7fe64f6a6	Fix typos (#30606 ) Summary: Should be non-semantic. Uses https://en.wikipedia.org/wiki/Wikipedia:Lists_of_common_misspellings/For_machines to find likely typos. Pull Request resolved: https://github.com/pytorch/pytorch/pull/30606 Differential Revision: D18763028 Pulled By: mrshenli fbshipit-source-id: 896515a2156d062653408852e6c04b429fc5955c	2019-12-02 20:17:42 -08:00
zou3519	23bffc4f14	Fix most documentation warnings (#27782 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27782 Warnings show up when running `make html` to build documentation. All of the warnings are very reasonable and point to bugs in our docs. This PR attempts to fix most of those warnings. In the future we will add something to the CI that asserts that there are no warnings in our docs. Test Plan: - build and view changes locally Differential Revision: D17887067 Pulled By: zou3519 fbshipit-source-id: 6bf4d08764759133b20983d6cd7f5d27e5ee3166	2019-10-13 10:34:01 -07:00
Yuxin Wu	23f963e4a8	Update distributed.rst (#23289 ) Summary: Different backend is supported since https://github.com/pytorch/pytorch/pull/18595 Pull Request resolved: https://github.com/pytorch/pytorch/pull/23289 Differential Revision: D16528229 Pulled By: soumith fbshipit-source-id: 57753e84c015817661ba30835278ee3a899aa2d0	2019-07-26 16:55:52 -07:00
Pieter Noordhuis	95e822622b	Enhance interpretation of GLOO_SOCKET_IFNAME (#22978 ) Summary: With this change you can now list multiple interfaces separated by comma. ProcessGroupGloo creates a single Gloo context for every device in the list (a context represents a connection to every other rank). For every collective that is called, it will select the context in a round robin fashion. The number of worker threads responsible for executing the collectives is set to be twice the number of devices. If you have a single physical interface, and wish to employ increased parallelism, you can also specify `GLOO_SOCKET_IFNAME=eth0,eth0,eth0,eth0`. This makes ProcessGroupGloo use 4 connections per rank, 4 I/O threads, and 8 worker threads responsible for executing the collectives. Pull Request resolved: https://github.com/pytorch/pytorch/pull/22978 ghstack-source-id: 87006270 Differential Revision: D16339962 fbshipit-source-id: 9aa1dc93d8e131c1714db349b0cbe57e9e7266f1	2019-07-25 04:52:38 -07:00
Seungwon Park	6c7135decb	fix typo: pytoch -> pytorch Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19719 Differential Revision: D15080095 Pulled By: ezyang fbshipit-source-id: b731a0fde87d25c63c1e3d4b9a9c2244e5ad84af	2019-04-25 06:40:40 -07:00
Teng Li	2d3cf98b49	Making dist.get_default_group private for PT1 release (#14767 ) Summary: When I wrote the frontend API, it is designed on not letting users use the default_group directly on any functions. It should really be private. All collectives are supposed to either use group.WORLD, or anything that comes out of new_group. That was the initial design. We need to make a TODO on removing group.WORLD one day. It exists for backward compatibility reasons and adds lots of complexity. Pull Request resolved: https://github.com/pytorch/pytorch/pull/14767 Reviewed By: pietern Differential Revision: D13330655 Pulled By: teng-li fbshipit-source-id: ace107e1c3a9b3910a300b22815a9e8096fafb1c	2018-12-04 19:22:24 -08:00
Pieter Noordhuis	3648c269e9	Misc distributed documentation updates (#14605 ) Summary: * s/environmental/environment/g * Casing (CUDA, InfiniBand, Ethernet) * Don't embed torch.multiprocessing.spawn but link to it (not part of the package) * spawn _function_ instead of _utility_ (it's mentioned after the launch utility which is a proper utility) Pull Request resolved: https://github.com/pytorch/pytorch/pull/14605 Differential Revision: D13273480 Pulled By: pietern fbshipit-source-id: da6b4b788134645f2dcfdd666d1bbfc9aabd97b1	2018-11-29 21:51:43 -08:00
Teng Li	2b7345bcd5	PT1 distributed doc update (#14530 ) Summary: Removed an incorrect section. We don't support this. I wrote this from my memory :( Pull Request resolved: https://github.com/pytorch/pytorch/pull/14530 Differential Revision: D13253471 Pulled By: teng-li fbshipit-source-id: c3f1ffc6c98ef8789157e885776e0b775ec47b15	2018-11-29 17:50:47 -08:00
Teng Li	a38ed0268e	PT1 Stable Release Distributed Documentation (#14444 ) Summary: The doc covers pretty much all we have had on distributed for PT1 stable release, tracked in https://github.com/pytorch/pytorch/issues/14080 Tested by previewing the sphinx generated webpages. All look good. Pull Request resolved: https://github.com/pytorch/pytorch/pull/14444 Differential Revision: D13227675 Pulled By: teng-li fbshipit-source-id: 752f00df096af38dd36e4a337ea2120ffea79f86	2018-11-28 00:34:11 -08:00
Tongzhou Wang	044d00516c	Rename DistBackend -> Backend (#11830 ) Summary: Also add docs for get_backend, Backend, and reduce_op fixes #11803 cc The controller you requested could not be found. pietern apaszke Pull Request resolved: https://github.com/pytorch/pytorch/pull/11830 Differential Revision: D9927991 Pulled By: SsnL fbshipit-source-id: a2ffb70826241ba84264f36f2cb173e00b19af48	2018-11-07 11:58:12 -08:00
Teng Li	3d5fd12488	Documentation for c10d: torch.distributed and deprecate the old distributed doc (#11450 ) Summary: This is the new documentation for c10d release, and it also deprecates the old torch.distributed document. This PR depends on https://github.com/pytorch/pytorch/pull/11405 and should only be landed after https://github.com/pytorch/pytorch/pull/11405 is landed Pull Request resolved: https://github.com/pytorch/pytorch/pull/11450 Differential Revision: D9765504 Pulled By: teng-li fbshipit-source-id: 48f38b27b8c270baf389f8e478ea226b9ecc63db	2018-09-11 02:10:28 -07:00
Tongzhou Wang	8e33451e2e	Make torch.cuda.* take device objects; Update distributed docs (#10833 ) Summary: Commits: 1. Make `torch.cuda.*` take device objects 2. Update `torch.distributed` docs to emphasize calling `torch.cuda.set_device` before `init_process_group` Pull Request resolved: https://github.com/pytorch/pytorch/pull/10833 Differential Revision: D9514241 Pulled By: SsnL fbshipit-source-id: 2497464305fb1e63d6c495291a5744aaa7e2696e	2018-08-27 15:24:42 -07:00
Soumith Chintala	d4f6c84041	fix nccl distributed documentation	2018-05-17 18:03:54 -04:00
Teng Li	f5beff334b	Added distributed docs on NCCL2 backend/functions and launch module (#6579 )	2018-04-15 21:53:10 -04:00
Scott Sievert	3821fca0c6	DOC: i{send, recv} message order with MPI backend	2017-09-14 20:38:11 -04:00
Brett Koonce	08b4770adf	minor spelling, intialize->initialize	2017-09-14 15:13:01 -04:00
Soumith Chintala	4fec5f658b	add Bilinear to docs, fix reference	2017-09-11 20:12:27 -04:00
jekbradbury	7aa6bc516f	add "Basics" section to distributed docs (#2433 )	2017-08-24 17:07:20 -04:00
Kai Arulkumaran	11a14fd0fd	Clarifications on setting up torch.distributed (#2475 )	2017-08-18 09:21:04 -04:00
Adam Paszke	4f035f14de	Add a support matrix for distributed backends	2017-07-21 14:19:46 -04:00
Soumith Chintala	81fd2bf2d0	fix some language / typos	2017-07-12 14:47:36 -04:00
Adam Paszke	8915e2710c	Refactor scatter/gather and add distributed docs	2017-07-12 14:47:36 -04:00

1 2 3

104 Commits