pytorch/torch/csrc/distributed/c10d
Yuxin Wu c8ed84ad06 Fix a static initialization order fiasco in c10d (#90149)
The `TORCH_LIBRARY_IMPL` registrations in `OpsImpl.cpp` need to happen after `ProcessGroup` is registered as a torch class, which happens in `Ops.cpp`. However, the order of the registrations is undefined between the two files.

If the registration in `OpsImpl.cpp` runs before the one in `Ops.cpp`, we get a crash at program launch similar to #83255. This happens in our internal build.

This PR moves the contents of `OpsImpl.cpp` to the end of `Ops.cpp`, because, according to the omniscient lord of ChatGPT:
<img width="600" alt="2022-12-04_19-25" src="https://user-images.githubusercontent.com/1381301/205542847-3535b319-3c2a-4e8e-bc11-27913f6afb39.png">
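A minimal sketch of why merging the two files helps, assuming the standard C++ rule that namespace-scope objects with ordered dynamic initialization in a single translation unit are initialized in definition order. The names `Registrar`, `registration_log`, and `class_registered_first` are hypothetical stand-ins for illustration only; they are not part of PyTorch or c10d.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a global registry (not PyTorch's real API).
// A function-local static is constructed on first use, so it is safe to
// touch from any static initializer, whatever the translation-unit order.
std::vector<std::string>& registration_log() {
  static std::vector<std::string> log;
  return log;
}

// Hypothetical registrar whose constructor runs during static initialization,
// similar in spirit to what a registration macro expands to.
struct Registrar {
  explicit Registrar(const std::string& name) {
    registration_log().push_back(name);
  }
};

// Both registrars are defined in the same translation unit, so they are
// initialized in definition order: the class first, then the op impls.
// If they lived in two separate .cpp files, their relative order would be
// unspecified, which is the fiasco described in the commit message.
static Registrar register_class{"ProcessGroup class"};
static Registrar register_impl{"op implementations"};

// Returns true if the class was registered before the implementations.
bool class_registered_first() {
  const std::vector<std::string>& log = registration_log();
  return log.size() == 2 && log[0] == "ProcessGroup class" &&
      log[1] == "op implementations";
}

// Convenience check that can be called from main() or from a test.
void check_registration_order() {
  assert(class_registered_first());
}
```

Calling `check_registration_order()` from `main()` should pass, since both registrars are defined in one file. Splitting them across two source files would make the result depend on the link order.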

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90149
Approved by: https://github.com/kwen2501, https://github.com/H-Huang, https://github.com/soumith
2022-12-12 08:21:54 +00:00
..
quantization Refactor distribuetd to use absolute header path (#85780) 2022-09-30 05:13:50 +00:00
Backend.cpp Refactor distribuetd to use absolute header path (#85780) 2022-09-30 05:13:50 +00:00
Backend.hpp Refactor distribuetd to use absolute header path (#85780) 2022-09-30 05:13:50 +00:00
c10d.h
comm.cpp [BE]fix DDP when the number of output features is zero (#87793) 2022-11-01 15:27:40 +00:00
comm.hpp Refactor distribuetd to use absolute header path (#85780) 2022-09-30 05:13:50 +00:00
debug.cpp Refactor distribuetd to use absolute header path (#85780) 2022-09-30 05:13:50 +00:00
debug.h Enable NCCL_DESYNC_DEBUG when TORCH_DISTRIBUTED_DEBUG=DETAIL (#83881) 2022-08-23 17:57:16 +00:00
default_comm_hooks.cpp Refactor distribuetd to use absolute header path (#85780) 2022-09-30 05:13:50 +00:00
default_comm_hooks.hpp Refactor distribuetd to use absolute header path (#85780) 2022-09-30 05:13:50 +00:00
error.h
exception.cpp Refactor distribuetd to use absolute header path (#85780) 2022-09-30 05:13:50 +00:00
exception.h
FileStore.cpp Add reference counter in FileStore (#85601) 2022-10-07 17:59:29 +00:00
FileStore.hpp Add reference counter in FileStore (#85601) 2022-10-07 17:59:29 +00:00
GlooDeviceFactory.cpp Refactor distribuetd to use absolute header path (#85780) 2022-09-30 05:13:50 +00:00
GlooDeviceFactory.hpp
HashStore.cpp Refactor distribuetd to use absolute header path (#85780) 2022-09-30 05:13:50 +00:00
HashStore.hpp Refactor distribuetd to use absolute header path (#85780) 2022-09-30 05:13:50 +00:00
init.cpp [21/N] Add alltoall_base custom op with CPU/CUDA implementations (#89813) 2022-12-08 23:39:26 +00:00
logger.cpp Refactor distribuetd to use absolute header path (#85780) 2022-09-30 05:13:50 +00:00
logger.hpp Refactor distribuetd to use absolute header path (#85780) 2022-09-30 05:13:50 +00:00
logging.cpp Refactor distribuetd to use absolute header path (#85780) 2022-09-30 05:13:50 +00:00
logging.h
NCCLUtils.cpp Adopt ncclRemoteError (#85887) 2022-09-30 09:17:49 +00:00
NCCLUtils.hpp Add torch.distributed.DistBackendError exception type, thrown from C10D_NCCL_CHECK (#88134) 2022-11-08 13:26:42 +00:00
Ops.cpp Fix a static initialization order fiasco in c10d (#90149) 2022-12-12 08:21:54 +00:00
Ops.hpp [21/N] Add alltoall_base custom op with CPU/CUDA implementations (#89813) 2022-12-08 23:39:26 +00:00
ParamCommsUtils.cpp Refactor distribuetd to use absolute header path (#85780) 2022-09-30 05:13:50 +00:00
ParamCommsUtils.hpp Enable capturing of comm collective parameters (#98) (#85368) 2022-10-11 04:38:26 +00:00
PrefixStore.cpp Refactor distribuetd to use absolute header path (#85780) 2022-09-30 05:13:50 +00:00
PrefixStore.hpp [BE] Store helper functions C++ for python API parity (#82136) 2022-10-12 17:49:38 +00:00
ProcessGroup.cpp Refactor distribuetd to use absolute header path (#85780) 2022-09-30 05:13:50 +00:00
ProcessGroup.hpp Refactor distribuetd to use absolute header path (#85780) 2022-09-30 05:13:50 +00:00
ProcessGroupGloo.cpp [Docs] Remove outdated comment for sparse all-reduce (#87018) 2022-10-17 21:17:07 +00:00
ProcessGroupGloo.hpp Refactor distribuetd to use absolute header path (#85780) 2022-09-30 05:13:50 +00:00
ProcessGroupMPI.cpp Refactor distribuetd to use absolute header path (#85780) 2022-09-30 05:13:50 +00:00
ProcessGroupMPI.hpp Refactor distribuetd to use absolute header path (#85780) 2022-09-30 05:13:50 +00:00
ProcessGroupNCCL.cpp [c10d] Implement __instancecheck__ for c10d::ReduceOp (#88275) 2022-11-15 13:21:41 +00:00
ProcessGroupNCCL.hpp Refactor distribuetd to use absolute header path (#85780) 2022-09-30 05:13:50 +00:00
ProcessGroupRoundRobin.cpp Refactor distribuetd to use absolute header path (#85780) 2022-09-30 05:13:50 +00:00
ProcessGroupRoundRobin.hpp Refactor distribuetd to use absolute header path (#85780) 2022-09-30 05:13:50 +00:00
ProcessGroupUCC.cpp [UCC] Properly finalize unsuccessful collective posts (#89306) 2022-12-01 23:01:45 +00:00
ProcessGroupUCC.hpp Add sequence number support for UCC (#85047) 2022-10-31 03:56:55 +00:00
ProcessGroupWrapper.cpp Refactor distribuetd to use absolute header path (#85780) 2022-09-30 05:13:50 +00:00
ProcessGroupWrapper.hpp Refactor distribuetd to use absolute header path (#85780) 2022-09-30 05:13:50 +00:00
PyProcessGroup.hpp Refactor distribuetd to use absolute header path (#85780) 2022-09-30 05:13:50 +00:00
python_comm_hook.cpp Refactor distribuetd to use absolute header path (#85780) 2022-09-30 05:13:50 +00:00
python_comm_hook.h Refactor distribuetd to use absolute header path (#85780) 2022-09-30 05:13:50 +00:00
reducer_cuda.cpp Refactor distribuetd to use absolute header path (#85780) 2022-09-30 05:13:50 +00:00
reducer_timer.hpp
reducer.cpp Refactor distribuetd to use absolute header path (#85780) 2022-09-30 05:13:50 +00:00
reducer.hpp Refactor distribuetd to use absolute header path (#85780) 2022-09-30 05:13:50 +00:00
sequence_num.cpp Refactor distribuetd to use absolute header path (#85780) 2022-09-30 05:13:50 +00:00
sequence_num.hpp
socket.cpp Refactor distribuetd to use absolute header path (#85780) 2022-09-30 05:13:50 +00:00
socket.h Refactor distribuetd to use absolute header path (#85780) 2022-09-30 05:13:50 +00:00
Store.cpp [BE] Store helper functions C++ for python API parity (#82136) 2022-10-12 17:49:38 +00:00
Store.hpp [BE] Store helper functions C++ for python API parity (#82136) 2022-10-12 17:49:38 +00:00
TCPStore.cpp Refactor distribuetd to use absolute header path (#85780) 2022-09-30 05:13:50 +00:00
TCPStore.hpp Refactor distribuetd to use absolute header path (#85780) 2022-09-30 05:13:50 +00:00
TraceUtils.h Refactor distribuetd to use absolute header path (#85780) 2022-09-30 05:13:50 +00:00
Types.hpp [ReduceOp] ameliorate custom __eq__ (#90088) 2022-12-06 05:13:50 +00:00
UCCForNCCL.hpp
UCCTracing.cpp Enable capturing of comm collective parameters (#98) (#85368) 2022-10-11 04:38:26 +00:00
UCCTracing.hpp Refactor distribuetd to use absolute header path (#85780) 2022-09-30 05:13:50 +00:00
UCCUtils.cpp Fix typos in messages under torch (#88961) 2022-11-14 19:06:41 +00:00
UCCUtils.hpp [UCC] Properly finalize unsuccessful collective posts (#89306) 2022-12-01 23:01:45 +00:00
UnixSockUtils.hpp Refactor distribuetd to use absolute header path (#85780) 2022-09-30 05:13:50 +00:00
Utils.cpp Refactor distribuetd to use absolute header path (#85780) 2022-09-30 05:13:50 +00:00
Utils.hpp Refactor distribuetd to use absolute header path (#85780) 2022-09-30 05:13:50 +00:00
WinSockUtils.hpp Refactor distribuetd to use absolute header path (#85780) 2022-09-30 05:13:50 +00:00
Work.cpp Refactor distribuetd to use absolute header path (#85780) 2022-09-30 05:13:50 +00:00
Work.hpp [2/N] [Dispatchable Collectives] Extract ProcessGroup::Work into a separate class and update references (#83680) 2022-09-14 13:05:58 +00:00