pytorch/torch/lib/c10d
Max Wang 268859ce0d Fix CUDA stream syncing bug in allgather and reduce_scatter (#19631)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19631
ghimport-source-id: edc47e77d6ef03e966944ff98eefc22f2574eeaa

Reviewed By: mrshenli

Differential Revision: D15110077

Pulled By: mxw

fbshipit-source-id: 27a68308ade5ea511e2ea568a071eedb5d21c1ba
2019-04-27 08:35:56 -07:00
bin Revert "remove use of tmp_install" (#15847) 2019-01-08 16:30:19 -08:00
example FileStore auto deletes file and FileStore::add bug fix (#13708) 2018-11-14 01:34:22 -08:00
test Add support for reduce-scatter in c10d (#18844) 2019-04-26 13:46:57 -07:00
CMakeLists.txt Remove GLOO usage when USE_GLOO is OFF 2019-03-20 09:31:53 -07:00
FileStore.cpp Canonicalize all includes in PyTorch. (#14849) 2018-12-08 19:38:30 -08:00
FileStore.hpp FileStore auto deletes file and FileStore::add bug fix (#13708) 2018-11-14 01:34:22 -08:00
NCCLUtils.hpp Working async version of AllGather, test fix and compiler warnings, and CI (#10932) 2018-08-28 12:40:14 -07:00
PrefixStore.cpp Canonicalize all includes in PyTorch. (#14849) 2018-12-08 19:38:30 -08:00
PrefixStore.hpp Adding setTimeout option in Store (#11265) 2018-09-06 12:55:50 -07:00
ProcessGroup.cpp Fix a few instances of notifying on a CV while holding the lock (#18857) 2019-04-05 08:41:53 -07:00
ProcessGroup.hpp Add support for reduce-scatter in c10d (#18844) 2019-04-26 13:46:57 -07:00
ProcessGroupGloo.cpp Add support for reduce-scatter in c10d (#18844) 2019-04-26 13:46:57 -07:00
ProcessGroupGloo.hpp Add support for reduce-scatter in c10d (#18844) 2019-04-26 13:46:57 -07:00
ProcessGroupMPI.cpp Add support for reduce-scatter in c10d (#18844) 2019-04-26 13:46:57 -07:00
ProcessGroupMPI.hpp Add support for reduce-scatter in c10d (#18844) 2019-04-26 13:46:57 -07:00
ProcessGroupNCCL.cpp Fix CUDA stream syncing bug in allgather and reduce_scatter (#19631) 2019-04-27 08:35:56 -07:00
ProcessGroupNCCL.hpp Add support for reduce-scatter in c10d (#18844) 2019-04-26 13:46:57 -07:00
README.md Revert "remove use of tmp_install" (#15847) 2019-01-08 16:30:19 -08:00
Store.cpp Make Store::setTimeout take milliseconds (#16278) 2019-01-29 16:15:25 -08:00
Store.hpp Make Store::setTimeout take milliseconds (#16278) 2019-01-29 16:15:25 -08:00
TCPStore.cpp TCP init method race condition fix (#15684) 2019-01-18 02:29:38 -08:00
TCPStore.hpp TCP init method race condition fix (#15684) 2019-01-18 02:29:38 -08:00
Types.hpp Add support for reduce-scatter in c10d (#18844) 2019-04-26 13:46:57 -07:00
Utils.cpp Fix c10d checking errno unconditionally (#15986) 2019-01-14 16:02:05 -08:00
Utils.hpp Propagate ProcessGroup timeout to Store (#16571) 2019-04-09 12:36:28 -07:00

THD refactor

This is a work in progress. It lives outside the main THD directory to avoid disrupting THD users and to avoid having to deal with backwards compatibility early on. Once it reaches a usable state, we'll add Python bindings and a compatibility layer.

See https://github.com/pytorch/pytorch/issues/7434 for the main issue.

This tree is intentionally kept out of the main build. It will be buildable and testable in isolation, provided ATen is installed in <repository root>/torch/lib/tmp_install.

To build and install ATen here, navigate to the root of this repository and run:

tools/build_pytorch_libs.sh --with-cuda ATen
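
Before building this tree, it can help to confirm that the ATen install prefix mentioned above is actually present. The following is a minimal sketch, not part of the repository: the helper names aten_prefix and aten_available are hypothetical, and the check only tests that the directory exists, not that the ATen install inside it is complete.

```shell
#!/bin/sh
# Sketch only: aten_prefix and aten_available are hypothetical helper
# names introduced for illustration; they do not exist in the PyTorch tree.

# Print the install prefix where this README expects ATen to live,
# given the repository root as the first argument.
aten_prefix() {
  printf '%s\n' "$1/torch/lib/tmp_install"
}

# Succeed if the expected install prefix exists as a directory.
# This does not verify the contents of the directory.
aten_available() {
  [ -d "$(aten_prefix "$1")" ]
}

# Possible usage from the repository root (left commented out so that
# sourcing this file never triggers a build):
#   aten_available "$PWD" || tools/build_pytorch_libs.sh --with-cuda ATen
```

Keeping the check in a function means it can be sourced into a larger build script and run only when needed.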