Commit Graph

94 Commits

Author SHA1 Message Date
Rohan Varma
071d49a970 Document monitored barrier (#58322)
Summary:
Will not land before the release, but it would be good to have this function documented in master for its use in distributed debuggability.
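
For reference, a minimal usage sketch (it assumes a Gloo process group has already been initialized, since monitored_barrier is documented for the Gloo backend):

```python
from datetime import timedelta

import torch.distributed as dist

# Assumes dist.init_process_group("gloo", ...) has already run on every rank.
# Raises on the calling rank if some rank fails to reach the barrier in time.
dist.monitored_barrier(timeout=timedelta(seconds=30))
```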

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58322

Reviewed By: SciPioneer

Differential Revision: D28595405

Pulled By: rohan-varma

fbshipit-source-id: fb00fa22fbe97a38c396eae98a904d1c4fb636fa
2021-05-21 19:04:57 -07:00
Rohan Varma
52bb8120b8 Mention distributed profiling in documentation (#58286)
Summary:
Added a simple section indicating that distributed profiling is expected to work similarly to other torch operators, and is supported for all communication backends out of the box.
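
As a rough illustration (not taken from the PR), profiling a collective looks the same as profiling any other op; the snippet assumes a CPU/Gloo process group, with NCCL requiring CUDA tensors instead:

```python
import torch
import torch.autograd.profiler as profiler
import torch.distributed as dist

# Assumes a process group is already initialized (e.g. Gloo on CPU).
with profiler.profile() as prof:
    t = torch.ones(10)
    dist.all_reduce(t)
print(prof.key_averages().table(sort_by="cpu_time_total"))
```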

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58286

Reviewed By: bdhirsh

Differential Revision: D28436489

Pulled By: rohan-varma

fbshipit-source-id: ce1905a987c0ede8011e8086a2c30edc777b4a38
2021-05-14 09:43:00 -07:00
Alexander Golynski
bc30c3165c Update docs for get_future support (#58107)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58107

Test Plan: Imported from OSS

Reviewed By: SciPioneer

Differential Revision: D28387374

Pulled By: agolynski

fbshipit-source-id: 70052afbb0b07ba341ea55f7ec30f7d9759b7bd4
2021-05-12 18:29:28 -07:00
Wanchao Liang
270d675f86 update distributed doc table for alltoall nccl (#54277)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54277

alltoall is already supported in the NCCL backend, so update the doc to reflect it.
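
A minimal sketch of the list-based variant on NCCL (assuming each process has already been pinned to its own GPU):

```python
import torch
import torch.distributed as dist

# Assumes dist.init_process_group("nccl", ...) has run and torch.cuda.set_device
# was called, so "cuda" resolves to this rank's GPU.
world_size = dist.get_world_size()
rank = dist.get_rank()
inputs = [torch.full((4,), float(rank), device="cuda") for _ in range(world_size)]
outputs = [torch.empty(4, device="cuda") for _ in range(world_size)]
dist.all_to_all(outputs, inputs)  # outputs[i] now holds rank i's chunk for this rank
```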

Test Plan: Imported from OSS

Reviewed By: divchenko

Differential Revision: D27172904

Pulled By: wanchaol

fbshipit-source-id: 9afa89583d56b247b2017ea2350936053eb30827
2021-03-19 15:35:10 -07:00
Stas Bekman
924c15c962 [doc] reorg dist init and non-init functions (#52976)
Summary:
This PR proposes to improve the distributed doc:

* [x] putting the init functions together
* [x] moving post-init functions into their own sub-section, since they are only available after init, and moving that group to after all the init sub-sections

If this is too much, could we at least put these 2 functions together:

```
.. autofunction:: init_process_group

.. autofunction:: is_initialized
```
as they are interconnected, and the other functions are not alphabetically sorted in the first place.
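
For context, a tiny sketch of why the two belong side by side (names and the env:// rendezvous are just illustrative):

```python
import torch.distributed as dist

# env:// reads MASTER_ADDR, MASTER_PORT, RANK and WORLD_SIZE from the environment.
if not dist.is_initialized():
    dist.init_process_group(backend="gloo", init_method="env://")
```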

Thank you.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/52976

Reviewed By: albanD

Differential Revision: D26993933

Pulled By: mrshenli

fbshipit-source-id: 7cacbe28172ebb5849135567b1d734870b49de77
2021-03-12 08:48:18 -08:00
Joe Zhu
f2b43ddbf4 Update api doc for enabling TcpStore on Windows (#51847)
Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51847

Reviewed By: albanD

Differential Revision: D26405678

Pulled By: malfet

fbshipit-source-id: 073b675225b48d1732771583f8f2473e0fdcf35c
2021-02-11 14:44:03 -08:00
Nikita Shulga
76c6e12a5c Minor spelling updates (#52149)
Summary:
Add space between 'e.g.' and 'build'
'pacakge'->'package'

Pull Request resolved: https://github.com/pytorch/pytorch/pull/52149

Reviewed By: osalpekar

Differential Revision: D26405824

Pulled By: malfet

fbshipit-source-id: 386390d3f31a9fc268b05902b9dca1deeaf626f9
2021-02-11 12:36:27 -08:00
Emilio Castillo
233e4ebdb6 Implement autograd functions for c10d communication operations (#40762)
Summary:
Closes https://github.com/pytorch/pytorch/issues/40702, Fixes https://github.com/pytorch/pytorch/issues/40690

Currently a work in progress, but I would appreciate some feedback. Functions should be double-differentiable.

Contrary to b35cdc5200/torch/nn/parallel/_functions.py, this PR generates a list of tensors instead of aggregating the received data in a single tensor. Is this behavior correct?

Thanks!
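
A rough sketch of how such autograd-aware collectives are used, assuming they end up under torch.distributed.nn.functional (the exact location and naming are up to this PR):

```python
import torch
import torch.distributed as dist
import torch.distributed.nn.functional as dist_fn  # assumed module path

# Assumes a process group is initialized; gradients flow back through the collective.
x = torch.randn(4, requires_grad=True)
gathered = dist_fn.all_gather(x)      # list with one tensor per rank
loss = torch.stack(gathered).sum()
loss.backward()                       # x.grad is populated on every rank
```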

Pull Request resolved: https://github.com/pytorch/pytorch/pull/40762

Reviewed By: glaringlee

Differential Revision: D24758889

Pulled By: mrshenli

fbshipit-source-id: 79285fb4b791cae3d248f34e2aadb11c9ab10cce
2021-01-26 07:52:51 -08:00
Rohan Varma
d6b5f3ad98 Add object-based collective APIs to public docs (#48909)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48909

Adds these new APIs to the documentation
ghstack-source-id: 117965961
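
For readers, a small sketch of one of the documented object collectives (assuming an initialized group and picklable inputs):

```python
import torch.distributed as dist

rank = dist.get_rank()
gathered = [None] * dist.get_world_size()
dist.all_gather_object(gathered, {"rank": rank})
# gathered now holds one dict per rank, in rank order, on every process.
```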

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D25363279

fbshipit-source-id: af6889d377f7b5f50a1a77a36ab2f700e5040150
2020-12-07 14:30:25 -08:00
Rohan Varma
362d9a932e Remove object-based collective APIs from public docs (#46075)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46075

Removes these from public docs for now as we are still
iterating/formalizing these APIs. Will add them back once they are part of a
PyTorch release.
ghstack-source-id: 113928700

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D24211510

fbshipit-source-id: 3e36ff6990cf8e6ef72b6e524322ae06f9097aa2
2020-10-09 09:24:51 -07:00
Rohan Varma
154347d82f Fix distributed documentation for asynchronous collective Work objects (#45709)
Summary:
Closes https://github.com/pytorch/pytorch/issues/42247. Clarifies some documentation related to `Work` object semantics (outputs of async collective functions). Clarifies the difference between CPU operations and CUDA operations (on the Gloo or NCCL backend), and provides an example where understanding the difference in a CUDA operation's wait() semantics is necessary for writing correct code.
![sync](https://user-images.githubusercontent.com/8039770/94875710-6f64e780-040a-11eb-8fb5-e94fd53534e5.png)
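
A hedged sketch of the semantics being clarified (assuming an NCCL group with one GPU per rank):

```python
import torch
import torch.distributed as dist

t = torch.ones(8, device="cuda")
work = dist.all_reduce(t, async_op=True)
work.wait()                # for NCCL this blocks the current CUDA stream, not the host
out = t * 2                # safe: queued on the same stream after the collective
torch.cuda.synchronize()   # only now is the result guaranteed visible on the host
```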

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45709

Reviewed By: ngimel

Differential Revision: D24171256

Pulled By: rohan-varma

fbshipit-source-id: 6365a569ef477b59eb2ac0a8a9a1c1f34eb60e22
2020-10-07 19:59:51 -07:00
Omkar Salpekar
3799ba83e5 [Docs] Adding Store API Docs (#45543)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45543

This PR adds documentation for the c10d Store to the public docs. Previously these docs were missing although we exposed a lightly-used (but potentially useful) Python API for our distributed key-value store.
ghstack-source-id: 113409195
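
As a reference point, the kind of usage these docs describe (host and port are placeholders; the two constructors would normally run in different processes):

```python
from datetime import timedelta

import torch.distributed as dist

# Run in process 1 (the server).
server_store = dist.TCPStore("127.0.0.1", 29500, 2, True, timedelta(seconds=30))
# Run in process 2 (a client).
client_store = dist.TCPStore("127.0.0.1", 29500, 2, False)
# Any rank can then use the shared key-value API.
client_store.set("first_key", "first_value")
print(server_store.get("first_key"))  # b'first_value'
```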

Test Plan: Will verify screenshots by building the docs.

Reviewed By: pritamdamania87

Differential Revision: D24005598

fbshipit-source-id: 45c3600e7c3f220710e99a0483a9ce921d75d044
2020-10-02 11:16:56 -07:00
gunandrose4u
47debdca42 Document change for DDP enabled on Windows platform (#45392)
Summary:
Document change for DDP enabled on Windows platform

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45392

Reviewed By: gchanan

Differential Revision: D23962344

Pulled By: mrshenli

fbshipit-source-id: 8924c6ca36d68699871d8add3e0aab6542ea269c
2020-09-28 13:22:42 -07:00
Rohan Varma
fbea2ee917 broadcast_object API for c10d (#43887)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43887

As part of addressing #23232, this PR adds support for `broadcast_object_list`, which is an API to broadcast arbitrary picklable objects to all the other ranks. This has been a long-requested feature, so it would be good for PyTorch to support it natively.

The implementation approach follows a similar approach to https://github.com/pytorch/pytorch/pull/42189. The input is a list of objects to broadcast, and the operation is in place, meaning every rank in the group has its input list modified to contain the objects broadcast from the src rank.

Note that the API is designed to match the tensor-based collectives, except that it does not support async_op. For now, it is a blocking call. If we see demand for async_op, we will have to make more progress on merging work/future to support it.
ghstack-source-id: 111180436
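
A minimal sketch of the in-place semantics described above (assuming an initialized group):

```python
import torch.distributed as dist

if dist.get_rank() == 0:
    objects = ["config.yaml", {"lr": 0.01}, 42]
else:
    objects = [None, None, None]  # must have the same length on every rank
dist.broadcast_object_list(objects, src=0)
# After the call, every rank's `objects` holds the values from rank 0.
```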

Reviewed By: mrshenli

Differential Revision: D23422577

fbshipit-source-id: fa700abb86eff7128dc29129a0823e83caf4ab0e
2020-09-01 18:54:17 -07:00
Shen Li
2f52748515 Publish all_gather_object and gather_object docs (#43772)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43772
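
For reference, a sketch of gather_object, the destination-only counterpart of all_gather_object (assuming an initialized group):

```python
import torch.distributed as dist

rank = dist.get_rank()
output = [None] * dist.get_world_size() if rank == 0 else None
dist.gather_object({"rank": rank}, output, dst=0)
# Only rank 0's `output` is filled; other ranks pass None.
```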

Test Plan: Imported from OSS

Reviewed By: rohan-varma

Differential Revision: D23398495

Pulled By: rohan-varma

fbshipit-source-id: 032e1d628c0c0f2dec297226167471698c56b605
2020-08-31 13:28:00 -07:00
Shen Li
0edbe6b063 Add a link in RPC doc page to point to PT Distributed overview (#41108)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/41108

Test Plan: Imported from OSS

Differential Revision: D22440751

Pulled By: mrshenli

fbshipit-source-id: 9e7b002091a3161ae385fdfcc26484ae8fc243bb
2020-07-08 14:00:05 -07:00
Shen Li
b982a6a247 Expose torch.distributed.is_available() API (#37021)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/37021
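
A small sketch of the guard this API enables (the NCCL check is just an extra illustration):

```python
import torch

if torch.distributed.is_available():
    import torch.distributed as dist
    print("NCCL available:", dist.is_nccl_available())
```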

Test Plan: Imported from OSS

Differential Revision: D21164318

Pulled By: mrshenli

fbshipit-source-id: 08a446af342cbe54f3eb4994956ffa7ef4922bcf
2020-04-21 18:38:46 -07:00
Orion Reblitz-Richardson
2d8dbcd3ef Remove python2 and 3.5 from requirements.txt, README and docs (#35677)
Summary:
Some more cleanup now that we no longer support Python 2 or 3.5 on master and, eventually, in the PyTorch 1.6 release.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35677

Differential Revision: D20838097

Pulled By: orionr

fbshipit-source-id: 95d553a1e8769f3baa395e0bc6d4ce7cd93236e9
2020-04-03 11:05:43 -07:00
Feng Tian
762270c51f add c10d dynamic loading mechanism and unit test (#28068)
Summary:
The original behavior of pytorch c10d only supports built-in c10d backends, such as
nccl/gloo/mpi. This patch is used to extend the c10d capability to support dynamically
loading 3rd party communication libraries which are derived from ProcessGroup base class.

related RFC is in: https://github.com/pytorch/pytorch/issues/27955

Through this way, user just need specify a 3rd party c10d backend name when invoking
torch.distributed.init_process_group(). The proposed logic will try to load corresponding
c10d backend cpp extension automatically. as for how to develop a new 3rd party c10d backend
through cpp extension, pls refer to test/cpp_extensions/cpp_c10d_extension.cpp
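
A hypothetical sketch of the user-facing flow (the `dummy_collectives` extension name and the "dummy" backend string are made up; the real registration happens inside the backend's C++ extension, as in the test file above):

```python
import torch.distributed as dist

# Hypothetical third-party C++ extension; importing it is assumed to register
# a ProcessGroup-derived backend under the name "dummy".
import dummy_collectives  # noqa: F401

dist.init_process_group(backend="dummy", init_method="env://")
```
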
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28068

Differential Revision: D19174838

Pulled By: agolynski

fbshipit-source-id: 3409a504a43ce7260e6f9d1207c00e87471fac62
2020-04-02 15:46:51 -07:00
Dhiraj D Kalamkar
945d7a7408 Add All-to-all comms support to distributed module and MPI backend (#32361)
Summary:
As described in https://github.com/pytorch/pytorch/issues/32345, this is a prototype implementation that adds an alltoall communication primitive to the torch.distributed module and the ProcessGroup abstract interface. It also implements alltoall in the ProcessGroupMPI backend.
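
A hedged sketch of the fixed-split variant (equal-sized chunks; values are illustrative):

```python
import torch
import torch.distributed as dist

# Assumes dist.init_process_group("mpi") has run (or another alltoall-capable backend).
world_size = dist.get_world_size()
rank = dist.get_rank()
inp = torch.arange(4 * world_size, dtype=torch.float32) + rank * 100
out = torch.empty_like(inp)
dist.all_to_all_single(out, inp)  # each rank sends/receives equal-sized chunks
```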

mnaumovfb JianpingChen066 dmudiger srinivas212 Jianhui-Li mshiryaev ftian1

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini xush6528 osalpekar
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32361

Reviewed By: mrshenli

Differential Revision: D20635481

Pulled By: srinivas212

fbshipit-source-id: 3dd0af800ce55d02f02813cde550e3a0f1a287d2
2020-04-01 08:57:12 -07:00
Yuichiro Ueno
aadd0fda8b Document reduce_scatter collective operation (#35274)
Summary:
I don't know why the reduce_scatter collective operation is not documented, so I am adding it to the documentation.
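
A brief usage sketch (assuming an NCCL group with one GPU per rank):

```python
import torch
import torch.distributed as dist

world_size = dist.get_world_size()
# Each rank contributes world_size chunks and receives the reduced chunk for its rank.
inputs = [torch.full((4,), float(r), device="cuda") for r in range(world_size)]
output = torch.empty(4, device="cuda")
dist.reduce_scatter(output, inputs, op=dist.ReduceOp.SUM)
```
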
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35274

Differential Revision: D20645850

Pulled By: mrshenli

fbshipit-source-id: 0a4458bff1a4e15a4593dd4dcc25e4e0f6e2265d
2020-03-25 13:36:29 -07:00
Xiang Gao
df8d6eeb19 Update docs about DP and DDP for CUDA (#35063)
Summary:
We should recommend DDP instead of DP. Hopefully we can also cherry-pick this for 1.5.
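
For readers, a minimal DDP setup of the kind the updated docs recommend (LOCAL_RANK is assumed to be provided by the launcher):

```python
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# One process per GPU; assumes dist.init_process_group("nccl", ...) has run.
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)
model = torch.nn.Linear(10, 10).cuda(local_rank)
model = DDP(model, device_ids=[local_rank])
```
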
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35063

Differential Revision: D20549621

Pulled By: ngimel

fbshipit-source-id: 86b1b2134664065cc6070ea4212895f993eaf543
2020-03-20 20:06:37 -07:00
Brian Wignall
e7fe64f6a6 Fix typos (#30606)
Summary:
Should be non-semantic.

Uses https://en.wikipedia.org/wiki/Wikipedia:Lists_of_common_misspellings/For_machines to find likely typos.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30606

Differential Revision: D18763028

Pulled By: mrshenli

fbshipit-source-id: 896515a2156d062653408852e6c04b429fc5955c
2019-12-02 20:17:42 -08:00
zou3519
23bffc4f14 Fix most documentation warnings (#27782)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27782

Warnings show up when running `make html` to build documentation. All of
the warnings are very reasonable and point to bugs in our docs. This PR
attempts to fix most of those warnings.

In the future we will add something to the CI that asserts that there
are no warnings in our docs.

Test Plan: - build and view changes locally

Differential Revision: D17887067

Pulled By: zou3519

fbshipit-source-id: 6bf4d08764759133b20983d6cd7f5d27e5ee3166
2019-10-13 10:34:01 -07:00
Yuxin Wu
23f963e4a8 Update distributed.rst (#23289)
Summary:
Different backends have been supported since https://github.com/pytorch/pytorch/pull/18595.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23289

Differential Revision: D16528229

Pulled By: soumith

fbshipit-source-id: 57753e84c015817661ba30835278ee3a899aa2d0
2019-07-26 16:55:52 -07:00
Pieter Noordhuis
95e822622b Enhance interpretation of GLOO_SOCKET_IFNAME (#22978)
Summary:
With this change you can now list multiple interfaces separated by
commas. ProcessGroupGloo creates a single Gloo context for every device
in the list (a context represents a connection to every other
rank). For every collective that is called, it selects a context
in round-robin fashion. The number of worker threads responsible for
executing the collectives is set to twice the number of devices.

If you have a single physical interface, and wish to employ increased
parallelism, you can also specify
`GLOO_SOCKET_IFNAME=eth0,eth0,eth0,eth0`.  This makes ProcessGroupGloo
use 4 connections per rank, 4 I/O threads, and 8 worker threads
responsible for executing the collectives.
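
Illustrative only (interface names are placeholders); the variable has to be set before the process group is created:

```python
import os

import torch.distributed as dist

os.environ["GLOO_SOCKET_IFNAME"] = "eth0,eth1"  # or "eth0,eth0,eth0,eth0" on one NIC
dist.init_process_group(backend="gloo", init_method="env://")
```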

Pull Request resolved: https://github.com/pytorch/pytorch/pull/22978
ghstack-source-id: 87006270

Differential Revision: D16339962

fbshipit-source-id: 9aa1dc93d8e131c1714db349b0cbe57e9e7266f1
2019-07-25 04:52:38 -07:00
Seungwon Park
6c7135decb fix typo: pytoch -> pytorch
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19719

Differential Revision: D15080095

Pulled By: ezyang

fbshipit-source-id: b731a0fde87d25c63c1e3d4b9a9c2244e5ad84af
2019-04-25 06:40:40 -07:00
Teng Li
2d3cf98b49 Making dist.get_default_group private for PT1 release (#14767)
Summary:
When I wrote the frontend API, it was designed so that users would not use the default group directly in any functions. It should really be private.

All collectives are supposed to use either group.WORLD or anything that comes out of new_group. That was the initial design.

We need to make a TODO on removing group.WORLD one day. It exists for backward-compatibility reasons and adds lots of complexity.
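
A sketch of the intended usage pattern (collectives target either the implicit default group or an explicit subgroup from new_group; the default group object itself stays private):

```python
import torch
import torch.distributed as dist

t = torch.ones(1)
dist.all_reduce(t)                       # implicitly uses the default (WORLD) group
subgroup = dist.new_group(ranks=[0, 1])  # every rank must call new_group
if dist.get_rank() in (0, 1):
    dist.all_reduce(t, group=subgroup)
```
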
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14767

Reviewed By: pietern

Differential Revision: D13330655

Pulled By: teng-li

fbshipit-source-id: ace107e1c3a9b3910a300b22815a9e8096fafb1c
2018-12-04 19:22:24 -08:00
Pieter Noordhuis
3648c269e9 Misc distributed documentation updates (#14605)
Summary:
* s/environmental/environment/g
* Casing (CUDA, InfiniBand, Ethernet)
* Don't embed torch.multiprocessing.spawn but link to it (not part of the package)
* spawn _function_ instead of _utility_ (it's mentioned after the launch utility which is a proper utility)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14605

Differential Revision: D13273480

Pulled By: pietern

fbshipit-source-id: da6b4b788134645f2dcfdd666d1bbfc9aabd97b1
2018-11-29 21:51:43 -08:00
Teng Li
2b7345bcd5 PT1 distributed doc update (#14530)
Summary:
Removed an incorrect section. We don't support this. I wrote this from my memory :(
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14530

Differential Revision: D13253471

Pulled By: teng-li

fbshipit-source-id: c3f1ffc6c98ef8789157e885776e0b775ec47b15
2018-11-29 17:50:47 -08:00
Teng Li
a38ed0268e PT1 Stable Release Distributed Documentation (#14444)
Summary:
The doc covers pretty much everything we have had on distributed for the PT1 stable release, tracked in https://github.com/pytorch/pytorch/issues/14080

Tested by previewing the Sphinx-generated webpages. All look good.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14444

Differential Revision: D13227675

Pulled By: teng-li

fbshipit-source-id: 752f00df096af38dd36e4a337ea2120ffea79f86
2018-11-28 00:34:11 -08:00
Tongzhou Wang
044d00516c Rename DistBackend -> Backend (#11830)
Summary:
Also add docs for get_backend, Backend, and reduce_op
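
A tiny sketch of the documented helpers (assuming a Gloo group was initialized):

```python
import torch.distributed as dist

# Backend values compare as lowercase strings, e.g. "gloo".
assert dist.get_backend() == dist.Backend.GLOO
```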

fixes #11803

cc pietern apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11830

Differential Revision: D9927991

Pulled By: SsnL

fbshipit-source-id: a2ffb70826241ba84264f36f2cb173e00b19af48
2018-11-07 11:58:12 -08:00
Teng Li
3d5fd12488 Documentation for c10d: torch.distributed and deprecate the old distributed doc (#11450)
Summary:
This is the new documentation for the c10d release, and it also deprecates the old torch.distributed documentation.

This PR depends on https://github.com/pytorch/pytorch/pull/11405

and should only be landed after https://github.com/pytorch/pytorch/pull/11405 is landed
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11450

Differential Revision: D9765504

Pulled By: teng-li

fbshipit-source-id: 48f38b27b8c270baf389f8e478ea226b9ecc63db
2018-09-11 02:10:28 -07:00
Tongzhou Wang
8e33451e2e Make torch.cuda.* take device objects; Update distributed docs (#10833)
Summary:
Commits:

1. Make `torch.cuda.*` take device objects
2. Update `torch.distributed` docs to emphasize calling `torch.cuda.set_device` before `init_process_group`
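
The ordering being emphasized, as a sketch (local_rank is assumed to come from the launcher):

```python
import os

import torch
import torch.distributed as dist

local_rank = int(os.environ.get("LOCAL_RANK", "0"))
torch.cuda.set_device(local_rank)        # pin this process to its GPU first
dist.init_process_group(backend="nccl", init_method="env://")
```
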
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10833

Differential Revision: D9514241

Pulled By: SsnL

fbshipit-source-id: 2497464305fb1e63d6c495291a5744aaa7e2696e
2018-08-27 15:24:42 -07:00
Soumith Chintala
d4f6c84041 fix nccl distributed documentation 2018-05-17 18:03:54 -04:00
Teng Li
f5beff334b Added distributed docs on NCCL2 backend/functions and launch module (#6579) 2018-04-15 21:53:10 -04:00
Scott Sievert
3821fca0c6 DOC: i{send, recv} message order with MPI backend 2017-09-14 20:38:11 -04:00
Brett Koonce
08b4770adf minor spelling, intialize->initialize 2017-09-14 15:13:01 -04:00
Soumith Chintala
4fec5f658b add Bilinear to docs, fix reference 2017-09-11 20:12:27 -04:00
jekbradbury
7aa6bc516f add "Basics" section to distributed docs (#2433) 2017-08-24 17:07:20 -04:00
Kai Arulkumaran
11a14fd0fd Clarifications on setting up torch.distributed (#2475) 2017-08-18 09:21:04 -04:00
Adam Paszke
4f035f14de Add a support matrix for distributed backends 2017-07-21 14:19:46 -04:00
Soumith Chintala
81fd2bf2d0 fix some language / typos 2017-07-12 14:47:36 -04:00
Adam Paszke
8915e2710c Refactor scatter/gather and add distributed docs 2017-07-12 14:47:36 -04:00