pytorch/torch/lib/c10d
Pieter Noordhuis 5d4624a1d9 Fix return temporary as reference in MPI backend (#11947)
Summary:
The MPI async work class returned a reference to a temporary, which is
invalid (hat tip to colesbury for noticing it). This change fixes that by
holding on to the exception in a std::exception_ptr, if applicable, and
returning the reference by rethrowing the stored exception and returning
the caught object, like the existing code path.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11947

Differential Revision: D10019928

Pulled By: pietern

fbshipit-source-id: 5a8ed0e894615a09224ca5e48c8b3104275a3019
2018-09-24 20:17:38 -07:00
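
The pattern described in the commit summary above can be sketched roughly as follows. This is an illustrative sketch, not the code from the pull request: the class name `SketchWork` and the method `finishWithError` are hypothetical, and only `std::exception_ptr`, `std::make_exception_ptr`, and `std::rethrow_exception` are standard library facilities. It also assumes an implementation where `std::rethrow_exception` rethrows the stored object rather than a copy (the standard permits either), so the returned reference stays valid while the `std::exception_ptr` member is alive.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for an async work handle; not the actual c10d class.
class SketchWork {
 public:
  // Record a failure by capturing the exception in an owned member.
  void finishWithError(const std::string& message) {
    exception_ = std::make_exception_ptr(std::runtime_error(message));
  }

  // True if no exception has been recorded.
  bool isSuccess() const {
    return !exception_;
  }

  // Broken pattern, shown for contrast only:
  //   const std::exception& exception() const {
  //     return std::runtime_error("...");  // refers to a destroyed temporary
  //   }

  // Rethrow the stored exception and return a reference to the caught
  // object. Assumption: the implementation rethrows the same object that
  // exception_ refers to, so the reference outlives the catch block for as
  // long as exception_ is held.
  const std::exception& exception() const {
    assert(exception_ && "exception() called on work that did not fail");
    try {
      std::rethrow_exception(exception_);
    } catch (const std::exception& e) {
      return e;
    }
  }

 private:
  std::exception_ptr exception_;  // null until a failure is recorded
};
```

A caller would check `isSuccess()` first and call `exception()` only on failure. Storing a `std::exception_ptr` rather than a `std::exception` by value avoids slicing the concrete exception type.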
bin Process group base class and Gloo implementation (#7628) 2018-05-23 09:02:18 -07:00
cmake Adding python binding for MPI process group (#10199) 2018-08-14 15:56:33 -07:00
example Working async version of AllGather, test fix and compiler warnings, and CI (#10932) 2018-08-28 12:40:14 -07:00
private Update include paths to use c10d prefix everywhere 2018-07-12 17:55:22 -07:00
test Support torch.distributed.barrier in gloo backend 2018-09-20 09:25:59 -07:00
CMakeLists.txt Make C10d support CPU only build (#11513) 2018-09-11 22:10:34 -07:00
CUDAUtils.cpp Fix some more warnings (#11257) 2018-09-05 11:10:27 -07:00
CUDAUtils.hpp Fix some more warnings (#11257) 2018-09-05 11:10:27 -07:00
Def.hpp Add c10d/Def.hpp placeholder (#8711) 2018-06-20 15:03:58 -07:00
FileStore.cpp Adding setTimeout option in Store (#11265) 2018-09-06 12:55:50 -07:00
FileStore.hpp Adding setTimeout option in Store (#11265) 2018-09-06 12:55:50 -07:00
NCCLUtils.hpp Working async version of AllGather, test fix and compiler warnings, and CI (#10932) 2018-08-28 12:40:14 -07:00
PrefixStore.cpp Adding setTimeout option in Store (#11265) 2018-09-06 12:55:50 -07:00
PrefixStore.hpp Adding setTimeout option in Store (#11265) 2018-09-06 12:55:50 -07:00
ProcessGroup.cpp Process group base class and Gloo implementation (#7628) 2018-05-23 09:02:18 -07:00
ProcessGroup.hpp Add message tag parameter to send/recv 2018-09-14 10:55:37 -07:00
ProcessGroupGloo.cpp Get rid of most usages of Type.tensor. (#12002) 2018-09-24 10:16:18 -07:00
ProcessGroupGloo.hpp Defer lazyInitCUDA() until needed (#11893) 2018-09-20 12:12:42 -07:00
ProcessGroupMPI.cpp Fix return temporary as reference in MPI backend (#11947) 2018-09-24 20:17:38 -07:00
ProcessGroupMPI.hpp Fix return temporary as reference in MPI backend (#11947) 2018-09-24 20:17:38 -07:00
ProcessGroupNCCL.cpp Add message tag parameter to send/recv 2018-09-14 10:55:37 -07:00
ProcessGroupNCCL.hpp Add message tag parameter to send/recv 2018-09-14 10:55:37 -07:00
README.md Port interface of store base class from Caffe2 (#7439) 2018-05-10 16:04:19 -07:00
Store.cpp Adding setTimeout option in Store (#11265) 2018-09-06 12:55:50 -07:00
Store.hpp bumping up the default store timeout (#11409) 2018-09-07 23:55:23 -07:00
TCPStore.cpp Adding setTimeout option in Store (#11265) 2018-09-06 12:55:50 -07:00
TCPStore.hpp Adding setTimeout option in Store (#11265) 2018-09-06 12:55:50 -07:00
Types.hpp Support torch.distributed.barrier in gloo backend 2018-09-20 09:25:59 -07:00
Utils.cpp Run clang-format on c10d (#7791) 2018-05-23 11:26:35 -07:00
Utils.hpp Get rid of most usages of Type.tensor. (#12002) 2018-09-24 10:16:18 -07:00

THD refactor

This is a work in progress. It lives apart from the main THD directory to avoid disrupting THD users or having to deal with backwards compatibility early on. Once it reaches a usable state, we'll add Python bindings and a compatibility layer.

See https://github.com/pytorch/pytorch/issues/7434 for the main issue.

This tree is intentionally not part of the main build and will be buildable/testable in isolation, as long as ATen is available in <repository root>/torch/lib/tmp_install.

To build and install ATen here, navigate to the root of this repository and run:

tools/build_pytorch_libs.sh --with-cuda ATen