
31 Commits

Author SHA1 Message Date
Edward Yang
173f224570 Turn on F401: Unused import warning. (#18598)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18598
ghimport-source-id: c74597e5e7437e94a43c163cee0639b20d0d0c6a

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18598 Turn on F401: Unused import warning.**

This was requested by someone at Facebook; this lint is turned
on for Facebook by default.  "Sure, why not."

I had to noqa a number of imports in __init__.  Hypothetically
we're supposed to use __all__ in this case, but I was too lazy
to fix it.  Left for future work.

Be careful!  flake8-2 and flake8-3 behave differently with
respect to import resolution for # type: comments.  flake8-3 will
report an import unused; flake8-2 will not.  For now, I just
noqa'd all these sites.

All the changes were done by hand.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: D14687478

fbshipit-source-id: 30d532381e914091aadfa0d2a5a89404819663e3
2019-03-30 09:01:17 -07:00
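F401 flags imports that a module never references. Below is a minimal, simplified model of that check, not pyflakes' actual implementation. It treats names listed in a literal `__all__` as used, which is the alternative to `noqa` that the message mentions, and it skips import lines marked `# noqa`. Like the flake8-3 behavior described above, it does not look inside `# type:` comments, so an import used only there is reported. The function name `unused_imports` is hypothetical.

```python
import ast


def unused_imports(source):
    """Return imported names that `source` never references.

    Simplified sketch of an F401-style check (hypothetical helper):
    - a name counts as used if it appears as an identifier anywhere,
      or if it is listed in a literal __all__;
    - import lines carrying a "# noqa" comment are skipped;
    - "# type:" comments are ignored, so names used only there are reported.
    """
    lines = source.splitlines()
    tree = ast.parse(source)

    imported = {}  # bound name -> line number of the import statement
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            if "# noqa" in lines[node.lineno - 1]:
                continue  # suppressed, like the noqa'd sites in this commit
            for alias in node.names:
                if alias.name == "*":
                    continue  # star imports are a separate check (F403)
                # `import a.b` binds the top-level name `a`.
                bound = alias.asname or alias.name.split(".")[0]
                imported[bound] = node.lineno

    # Every identifier in the module; `os.path` starts with the Name `os`.
    used = {n.id for n in ast.walk(tree) if isinstance(n, ast.Name)}

    # Names re-exported through a literal list/tuple assigned to __all__.
    exported = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Assign) and any(
            isinstance(t, ast.Name) and t.id == "__all__" for t in node.targets
        ):
            if isinstance(node.value, (ast.List, ast.Tuple)):
                exported.update(
                    e.value
                    for e in node.value.elts
                    if isinstance(e, ast.Constant) and isinstance(e.value, str)
                )

    return sorted(n for n in imported if n not in used | exported)
```

For an `__init__`-style module, listing a re-exported name in `__all__` silences the warning without a `noqa` comment, while genuinely unused imports are still reported.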
Tongzhou Wang
540ef9b1fc Add distributed get_backend (#11715)
Summary:
I have no idea how to run distributed tests locally so I'll let CI do this. Hopefully everything still works with `IntEnum`.

cc mcarilli
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11715

Reviewed By: pietern

Differential Revision: D9889646

Pulled By: SsnL

fbshipit-source-id: 1e2a487cb6fe0bd4cc67501c9d72a295c35693e2
2018-09-18 10:56:24 -07:00
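A minimal sketch of calling `get_backend` from a current PyTorch release. It assumes a single process (`world_size=1`), the `gloo` backend, and the `file://` init method on a POSIX filesystem so it runs without a launcher; `init_file` is an illustrative name.

```python
import os
import tempfile

import torch.distributed as dist

# Single-process group so the sketch runs without a launcher.
# The file:// init method expects a path that does not exist yet.
init_file = os.path.join(tempfile.mkdtemp(), "pg_init")
dist.init_process_group(
    backend="gloo",
    init_method="file://" + init_file,
    rank=0,
    world_size=1,
)

# With no group argument, get_backend reports the default group's backend.
backend = dist.get_backend()
print(backend)

dist.destroy_process_group()
```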
Teng Li
0988bbad2d C10d release to torch.distributed for PT1 (#11405)
Summary:
The old `torch.distributed` will go to `torch.distributed.deprecated`
The old DDP will go to `torch.nn.parallel.deprecated`

Now `torch.nn.parallel.DDP` will use c10d DDP
Now `torch.distributed` will use C10d frontend API
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11405

Reviewed By: pietern

Differential Revision: D9733733

Pulled By: teng-li

fbshipit-source-id: d6a3f3e73f8d3a7fcb1f4baef53c78063b8cbb08
2018-09-10 23:27:22 -07:00
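After this change, `torch.nn.parallel.DistributedDataParallel` runs on top of the c10d process group. The sketch below wraps a small model in a single-process `gloo` group. It assumes a current PyTorch release, where DDP accepts CPU modules when `device_ids` is omitted; the 2018 release may have required a different class for CPU training.

```python
import os
import tempfile

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel

# One-process group (rank 0 of 1) initialized through a fresh file path.
init_file = os.path.join(tempfile.mkdtemp(), "pg_init")
dist.init_process_group("gloo", init_method="file://" + init_file, rank=0, world_size=1)

model = torch.nn.Linear(4, 2)
# No device_ids: the module stays on CPU and gradients are all-reduced over gloo.
ddp_model = DistributedDataParallel(model)

out = ddp_model(torch.randn(3, 4))
out.sum().backward()  # backward triggers the gradient all-reduce

grad_shape = tuple(model.weight.grad.shape)
dist.destroy_process_group()
```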
Tongzhou Wang
8e33451e2e Make torch.cuda.* take device objects; Update distributed docs (#10833)
Summary:
Commits:

1. Make `torch.cuda.*` take device objects
2. Update `torch.distributed` docs to emphasize calling `torch.cuda.set_device` before `init_process_group`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10833

Differential Revision: D9514241

Pulled By: SsnL

fbshipit-source-id: 2497464305fb1e63d6c495291a5744aaa7e2696e
2018-08-27 15:24:42 -07:00
Tongzhou Wang
db7b7f1359 fix typo
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10686

Differential Revision: D9399874

Pulled By: SsnL

fbshipit-source-id: 28130992d2416721552f72cfa835ff0358caeefa
2018-08-20 10:40:55 -07:00
Tongzhou Wang
3f603eeee8 some improvements on distributed docs
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10666

Differential Revision: D9395242

Pulled By: SsnL

fbshipit-source-id: 952326b9c5a1a974a1c33a0e12738e1e21ad9956
2018-08-19 17:40:28 -07:00
Ailing Zhang
371a786b18 Errors out when Openmpi < 2.x.x with distributed. (#10015)
Summary:
This PR fixes #9418.
OpenMPI 1.10 segfaults in MPI_Bcast with a CUDA buffer, and it is a retired OpenMPI version.
I've tested on 2.1.1 and 3.0.0 and they work well.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10015

Reviewed By: soumith

Differential Revision: D9088103

Pulled By: ailzhang

fbshipit-source-id: fc0a45e5cd016093ef0dbb9f371cbf67170d7045
2018-07-31 12:24:40 -07:00
Teng Li
f5beff334b Added distributed docs on NCCL2 backend/functions and launch module (#6579)
2018-04-15 21:53:10 -04:00
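The launch module starts one process per GPU, for example `python -m torch.distributed.launch --nproc_per_node=NUM_GPUS train.py`, and tells each process which local GPU it owns. Older releases pass `--local_rank=N`; newer ones spell it `--local-rank` or rely on the `LOCAL_RANK` environment variable. A minimal sketch of reading that value in the training script, accepting all three forms; `parse_local_rank` is a hypothetical name.

```python
import argparse
import os


def parse_local_rank(argv=None):
    """Return the local rank supplied by the launcher (hypothetical helper).

    Accepts --local_rank or --local-rank on the command line and falls back
    to the LOCAL_RANK environment variable, then to 0.
    """
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--local_rank",
        "--local-rank",
        type=int,
        default=int(os.environ.get("LOCAL_RANK", 0)),
    )
    # Ignore the script's own arguments; only the rank matters here.
    args, _unknown = parser.parse_known_args(argv)
    return args.local_rank
```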
Teng Li
5c65466b86 Release NCCL distributed backend from experimental (#4921)
* Release NCCL distributed backend from experimental

* fix typo
2018-01-30 16:21:21 +01:00
Teng Li
a3b098dcf9 Adding is process_group initialized support (#4618)
2018-01-12 22:56:54 +01:00
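A common use of `dist.is_initialized()` is to make setup idempotent, so library code can join a process group only if the caller has not already created one. A minimal sketch, assuming a single-process `gloo` group with `file://` initialization; `ensure_process_group` is a hypothetical helper.

```python
import os
import tempfile

import torch.distributed as dist


def ensure_process_group(backend="gloo", rank=0, world_size=1):
    """Create the default group only if none exists yet (hypothetical helper).

    Returns True if this call initialized the group, False otherwise.
    """
    if dist.is_initialized():
        return False
    init_file = os.path.join(tempfile.mkdtemp(), "pg_init")
    dist.init_process_group(
        backend, init_method="file://" + init_file, rank=rank, world_size=world_size
    )
    return True


before = dist.is_initialized()
first = ensure_process_group()
second = ensure_process_group()  # no-op: the default group already exists
dist.destroy_process_group()
```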
Teng Li
926ed2b280 Implemented NCCL Distributed Backend for PyTorch with new dist APIs (#3435)
* Implemented NCCL Distributed Backend for PyTorch with new dist APIs

* Let FindNCCL determine the NCCL version

* Let NCCL2 Backend use ATen instead of deprecated THPP

* Let distributed parallel model use a single reduction thread for NCCL backend

* Caching the sockets, bug fix, refactoring, and addressed Adam's comments

* Make BcastNcclID take a single param and bug fix for all_gather

* Removed barrier function, added warning for users, and not exposing experimental func to users

* Use the simplest single bucket working solution for distributed data parallel model with rebase

* Cleanup, fixes and further addressed Adam's comments

* Used PySequence_Fast in distributed csrc

* Removed the limitation that each group is only bound to a given device sequence

* Used THPObjectPtr for PySequence_Fast
2017-11-29 15:57:02 -05:00
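The "single bucket" approach for the data parallel model packs all gradients into one buffer so that each step issues a single collective instead of one per tensor. The sketch below illustrates that idea with `torch.cat` and `dist.all_reduce`; it is not the code from this commit, and `allreduce_single_bucket` is a hypothetical name. It assumes a single-process `gloo` group, so averaging leaves the values unchanged.

```python
import os
import tempfile

import torch
import torch.distributed as dist

init_file = os.path.join(tempfile.mkdtemp(), "pg_init")
dist.init_process_group("gloo", init_method="file://" + init_file, rank=0, world_size=1)


def allreduce_single_bucket(tensors):
    """Average `tensors` across ranks in place with one all_reduce call."""
    world_size = dist.get_world_size()
    # Pack every tensor into one contiguous 1-D buffer: the single bucket.
    bucket = torch.cat([t.reshape(-1) for t in tensors])
    dist.all_reduce(bucket, op=dist.ReduceOp.SUM)
    bucket /= world_size
    # Unpack: copy each slice back into its original tensor.
    offset = 0
    for t in tensors:
        n = t.numel()
        t.copy_(bucket[offset:offset + n].view_as(t))
        offset += n


grads = [torch.ones(2, 3), torch.full((4,), 2.0)]
allreduce_single_bucket(grads)
dist.destroy_process_group()
```

One bucket keeps the logic simple at the cost of overlap: communication cannot start until every gradient is ready, which is why later designs split gradients into several buckets.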
Adam Paszke
2a8603c5e1 Make distributed recv return sender rank
2017-09-25 12:11:52 -04:00
Scott Sievert
dd27997aeb DOC: adding note about distributed MPI backend (#2750)
2017-09-15 13:47:35 -04:00
Zhou Mo
2c07f88ea3 Fix typos.
2017-08-25 14:27:07 -04:00
Gregory Chanan
50c208a50b Revert "Fix typos."
This reverts commit 4622b33952.
2017-08-10 13:57:00 -04:00
Zhou Mo
4622b33952 Fix typos.
2017-08-08 11:05:38 -04:00
Adam Paszke
575a4a98e0 Remove assertions with side effects
2017-07-20 01:45:57 -04:00
Adam Paszke
8915e2710c Refactor scatter/gather and add distributed docs
2017-07-12 14:47:36 -04:00
Adam Paszke
714351ff39 Officially enable process-group mode
2017-06-12 22:02:11 -04:00
Adam Paszke
12813b88f6 Add DistributedDataParallel
2017-06-12 22:00:22 -04:00
Adam Paszke
5a0d5ec058 Add more checks in torch.distributed
2017-06-12 21:58:38 -04:00
Janusz Marcinkiewicz
34804e9600 Refactor file and tcp init methods
* Add sanity checks
* Refactor InitMethodFile and TCPInitMethod into more logical functions
* Update a few error messages
* Add passing parameters by **kwargs, so the order of parameters no longer matters
* Review comments
2017-06-02 23:42:11 +02:00
Janusz Marcinkiewicz
c41555fb0a Add rank parameter; Fix MW mode initialization
2017-06-02 23:42:11 +02:00
Janusz Marcinkiewicz
e685277299 Add address discovery; Bug fixes;
2017-06-02 23:42:11 +02:00
Janusz Marcinkiewicz
09c0d9c51c Add multiple initialization methods for DataChannels
2017-06-02 23:42:11 +02:00
Adam Paszke
79232c24e2 Fixes after rebase
2017-01-31 01:58:09 +01:00
Janusz Marcinkiewicz
ac1f68127a Add barrier, scatter, gather and allGather implementations + groups (#34)
2017-01-31 01:58:09 +01:00
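The collectives and groups added here surface in today's Python API as `dist.barrier`, `dist.all_gather` and `dist.new_group`. A minimal sketch using those current names, not the 2017 bindings, assuming a single-process `gloo` group so the subgroup contains only rank 0.

```python
import os
import tempfile

import torch
import torch.distributed as dist

init_file = os.path.join(tempfile.mkdtemp(), "pg_init")
dist.init_process_group("gloo", init_method="file://" + init_file, rank=0, world_size=1)

# A subgroup of explicit ranks; collectives can be scoped to it via `group=`.
group = dist.new_group(ranks=[0])
dist.barrier(group=group)  # every member of the group waits here

# all_gather fills one output tensor per rank in the group.
tensor = torch.tensor([float(dist.get_rank() + 1)])
gathered = [torch.zeros(1) for _ in range(dist.get_world_size(group=group))]
dist.all_gather(gathered, tensor, group=group)

dist.destroy_process_group()
```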
Adam Paszke
60d1852c7b Major improvements to master-worker mode
* Fixed all undefined symbol errors
* Implemented storage interface and THStorage class
* RPC improvements
* Code refactor
2017-01-31 01:58:09 +01:00
Adam Paszke
ea876eb6d5 Add initial bindings for master-worker mode
2017-01-31 01:58:09 +01:00
Janusz Marcinkiewicz
5e6fcd02b5 Implement data channel groups (#25)
2017-01-31 01:58:09 +01:00
Adam Paszke
55632d81d2 Add Python wrappers for process group mode
2017-01-31 01:58:09 +01:00