Commit Graph

34 Commits

Author SHA1 Message Date
Peter Goldsborough
7ddc6f84c4 NULL -> nullptr (#11047)
Summary:
How did we get so many uses of `NULL` again?

ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11047

Differential Revision: D9566799

Pulled By: goldsborough

fbshipit-source-id: 83469f352ac69aa65bdaf1a1a21f922d892e0db3
2018-08-30 16:25:42 -07:00
Edward Yang
227635142f Delete THD master_worker (#10731)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10731

Differential Revision: D9423675

Pulled By: ezyang

fbshipit-source-id: 37221e11d84cc3672b944af598ea229a1d4c38cc
2018-08-22 08:54:36 -07:00
Kaiyu Shi
342dbcc35a Remove legacy redundant codes (#9252)
Summary:
Fix #9167
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9252

Differential Revision: D8774644

Pulled By: soumith

fbshipit-source-id: 0b004f497026bca3b101c577e78aec22bdc3df51
2018-07-09 16:55:28 -07:00
Soumith Chintala
dc186cc9fe Remove NO_* and WITH_* across codebase, except in setup.py (#8555)
* remove legacy options from CMakeLists

* codemod WITH_ to USE_ for WITH_CUDA, WITH_CUDNN, WITH_DISTRIBUTED, WITH_DISTRIBUTED_MW, WITH_GLOO_IBVERBS, WITH_NCCL, WITH_ROCM, WITH_NUMPY

* cover SYSTEM_NCCL, MKLDNN, NNPACK, C10D, NINJA

* removed NO_* variables and hotpatch them only in setup.py

* fix lint
2018-06-15 12:29:48 -04:00
Peter Goldsborough
04a3616de0 Replace std::size_t with size_t (#8093) 2018-06-04 11:10:44 -04:00
Luca Antiga
0703357723 Don't build THD/master_worker if not explicitly requested (#7081) 2018-04-29 13:17:09 -04:00
Zachary DeVito
d985cf46f1 Add workaround to fix include warnings in Python 2 builds. (#6716) 2018-04-24 12:30:19 -07:00
Sam Gross
30ec06c140 Merge Variable and Tensor classes (#5225)
This replaces the torch.Tensor constructors with factories that produce
Variables. Similarly, functions on the torch module (e.g. torch.randn)
now return Variables.

To keep the PR to a reasonable size, I've left most of the unused tensor
code. Subsequent PRs will remove the dead code, clean-up calls to
torch.autograd.Variable, and rename Variable to Tensor everywhere.

There are some breaking changes because Variable and Tensors had
slightly different semantics. There's a list of those changes here:

 https://github.com/pytorch/pytorch/wiki/Breaking-Changes-from-Variable-and-Tensor-merge
2018-02-23 18:03:31 -05:00
Christian Sarofeen
6db9f6dc78 Enable half communication for distributed (#4091) 2017-12-13 13:00:12 +01:00
Teng Li
926ed2b280 Implemented NCCL Distributed Backend for PyTorch with new dist APIs (#3435)
* Implemented NCCL Distributed Backend for PyTorch with new dist APIs

* Let FindNCCL determine the NCCL version

* Let NCCL2 Backend use ATEN instead of deprecated THPP

* Let distributed parallel model use a single reduction thread for NCCL backend

* Caching the sockets, bug fix, refactoring, and addressed Adam's comments

* Make BcastNcclID take a single param and bug fix for all_gather

* Removed barrier function, added warning for users, and not exposing experimental func to users

* Use the simplest single bucket working solution for distributed data parallel model with rebase

* Cleanup, fixes and further addressed Adam's comments

* Used PySequence_Fast in distributed csrc

* Removed the limitation that each group is only bound to a given device sequence

* Used THPObjectPtr for PySequence_Fast
2017-11-29 15:57:02 -05:00
Trevor Killeen
b544882335 ATen in THD (Part I) (#2288)
* enable size from ATen type

* temp commit aten thd

* port copy, math

* port random

* changes after rebase

* lapack bind

* thd and csrc compile

* fix min/max reductions in DataChannelTCP

* clean up changes

* re-enable tensor constructors

* port MPI to at::Tensor

* fix storage methods to not cast to thpp storage ptrs
2017-11-01 09:59:02 -04:00
Edward Z. Yang
3696300fcf Include Python.h less using a new stub header.
In many "non-Python" headers, we include Python.h because we need
to declare a pointer to PyObject, and solely because of that.  It
would be a lot better if we had a simpler version of Python.h that
just declared PyObject available for pointers, without anything
else.  This is what torch/csrc/utils/python_stub.h does.

The good thing about not including Python.h is that it is easy to
be warning-less; no more ugly insertions of Python.h on headers
where it has no good reason to be.

This makes PyTorch warning clean again.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-10-19 23:04:19 -04:00
Adam Paszke
2a8603c5e1 Make distributed recv return sender rank 2017-09-25 12:11:52 -04:00
Adam Paszke
714351ff39 Officially enable process-group mode 2017-06-12 22:02:11 -04:00
Adam Paszke
b37f18be53 Free GIL when entering THD functions 2017-06-12 21:58:38 -04:00
Adam Paszke
095ddc7d08 THD updates and bug fixes
* Add keepdim
* Fix DataChannel signature
* Fix incorrect locking
* Use current stream in DataChannelGloo
2017-06-12 21:58:38 -04:00
Adam Paszke
c6c9e61169 Implement THD tensor copies 2017-06-02 23:42:11 +02:00
Janusz Marcinkiewicz
c41555fb0a Add rank parameter; Fix MW mode initialization 2017-06-02 23:42:11 +02:00
Adam Paszke
447d9287bf Refactor multicast and change env init method 2017-06-02 23:42:11 +02:00
Janusz Marcinkiewicz
e685277299 Add address discovery; Bug fixes; 2017-06-02 23:42:11 +02:00
Adam Paszke
8ea7c87c29 Improve init methods 2017-06-02 23:42:11 +02:00
Janusz Marcinkiewicz
09c0d9c51c Add multiple initialization methods for DataChannels 2017-06-02 23:42:11 +02:00
Adam Paszke
181d2f41bd Add initial Python wrappers for THDTensors 2017-06-02 23:42:11 +02:00
Adam Paszke
4ebf3ff46d Add base for CUDA allReduce and broadcast in DataChannelGloo 2017-05-01 01:49:10 -07:00
Adam Paszke
7e8830c3d5 Initial gloo bindings 2017-05-01 01:49:09 -07:00
Sam Gross
4c1cdb6148 Refactor Python string utility function 2017-04-28 21:25:26 +02:00
Adam Paszke
79232c24e2 Fixes after rebase 2017-01-31 01:58:09 +01:00
Janusz Marcinkiewicz
962084c8e8 Add Data Channel receive from any source (#52) 2017-01-31 01:58:09 +01:00
Janusz Marcinkiewicz
76520512e7 DataChannel tests rewrite (#42); DataChannel isend and irecv implementation (#44) 2017-01-31 01:58:09 +01:00
Janusz Marcinkiewicz
ac1f68127a Add barrier, scatter, gather and allGather implementations + groups (#34) 2017-01-31 01:58:09 +01:00
Adam Paszke
60d1852c7b Major improvements to master-worker mode
* Fixed all undefined symbol errors
* Implemented storage interface and THStorage class
* RPC improvements
* Code refactor
2017-01-31 01:58:09 +01:00
Adam Paszke
ea876eb6d5 Add initial bindings for master-worker mode 2017-01-31 01:58:09 +01:00
Janusz Marcinkiewicz
5e6fcd02b5 Implement data channel groups (#25) 2017-01-31 01:58:09 +01:00
Adam Paszke
55632d81d2 Add Python wrappers for process group mode 2017-01-31 01:58:09 +01:00