
105 Commits

Author SHA1 Message Date
Aapo Kyrola
42279a610c use Pieter-MPI and fb.distributed
Summary:
Remove MPI and use fb.distributed rendezvous and Pieter's new Ops.

One can now pass a 'rendezvous' struct to data_parallel_model to initiate distributed SyncSGD. The provided rendezvous implementation uses the kv-store handler of fb.distributed to disseminate information about the other hosts. We can easily add other rendezvous implementations, such as a file-based one, but that is the topic of another diff.
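The kv-store rendezvous idea can be sketched roughly as follows. This is a minimal illustration, not the fb.distributed or data_parallel_model API. It assumes each host knows its own shard_id and the total num_shards. InMemoryKVStore, kv_rendezvous, and the "host/<rank>" key scheme are hypothetical stand-ins, and threads play the role of separate hosts.

```python
import threading
import time
from typing import Dict, List


class InMemoryKVStore:
    """Hypothetical stand-in for a shared key-value store (not the fb.distributed API)."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}
        self._cond = threading.Condition()

    def set(self, key: str, value: str) -> None:
        # Publish a value and wake up anyone waiting for a key.
        with self._cond:
            self._data[key] = value
            self._cond.notify_all()

    def wait_get(self, key: str, timeout: float = 10.0) -> str:
        # Block until another participant publishes `key`, or raise on timeout.
        deadline = time.monotonic() + timeout
        with self._cond:
            while key not in self._data:
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    raise TimeoutError(f"rendezvous key {key!r} never appeared")
                self._cond.wait(remaining)
            return self._data[key]


def kv_rendezvous(store: InMemoryKVStore, shard_id: int, num_shards: int,
                  my_address: str) -> List[str]:
    """Hypothetical helper: publish this host's address, then collect all peers by rank."""
    store.set(f"host/{shard_id}", my_address)
    return [store.wait_get(f"host/{rank}") for rank in range(num_shards)]


if __name__ == "__main__":
    store = InMemoryKVStore()
    num_shards = 4
    results: Dict[int, List[str]] = {}

    def worker(rank: int) -> None:
        # Each simulated host announces an assumed address and waits for the others.
        results[rank] = kv_rendezvous(store, rank, num_shards, f"10.0.0.{rank}:5000")

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(num_shards)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    for rank in range(num_shards):
        print(rank, results[rank])
```

Every participant ends up with the same rank-ordered peer list, which is the information a SyncSGD setup needs before creating its communication ops. A file-based rendezvous could, under the same assumptions, implement the same set/wait_get pair on top of a shared filesystem.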

Removing MPI also allowed simplifying the Xray startup scripts, which are included in this diff.

Once this is accepted, I will work on simple example code so others can use this as well. The Flow implementation will be the topic of next week.

Differential Revision: D4180012

fbshipit-source-id: 9e74f1fb43eaf7d4bb3e5ac6718d76bef2dfd731
2016-11-29 15:18:36 -08:00
Yangqing Jia
589398950f fbsync at f5a877 2016-11-18 15:41:06 -08:00
Yangqing Jia
238ceab825 fbsync. TODO: check if build files need updating. 2016-11-15 00:00:46 -08:00
Yangqing Jia
44509f9f91 fbsync: mostly lint changes, added mkl files 2016-10-11 22:45:06 -07:00
Yangqing Jia
d1e9215184 fbsync 2016-10-07 13:08:53 -07:00