Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45181

`init_process_group` and `new_group` update several global variables after initializing the actual process group. As a result, there is a race: after initializing the process group on, say, rank 0, if we immediately check the default process group on rank 1 (for example via RPC), we might get an error because rank 1 has not yet updated its `_default_pg` variable.

To resolve this issue, I've added a `barrier()` at the end of both of these calls. This ensures that once these calls return, initialization is guaranteed to have completed correctly on all ranks. Since these calls are usually made only during initialization, the overhead of the extra `barrier()` is acceptable.

Closes: https://github.com/pytorch/pytorch/issues/40434, https://github.com/pytorch/pytorch/issues/40378

ghstack-source-id: 112923112

Test Plan: Reproduced the failures in https://github.com/pytorch/pytorch/issues/40434 and https://github.com/pytorch/pytorch/issues/40378 and verified that this PR fixes them.

Reviewed By: mrshenli

Differential Revision: D23858025

fbshipit-source-id: c4d5e46c2157981caf3ba1525dec5310dcbc1830
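
A minimal sketch of the behavior this change guarantees. The worker function, backend choice (`gloo`), address/port, and `world_size` below are illustrative values, not part of the PR; the point is that no extra synchronization is needed after `init_process_group` returns, since the call itself now ends with a `barrier()`.

```python
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp

def worker(rank, world_size):
    # Illustrative rendezvous settings (not taken from the PR).
    os.environ["MASTER_ADDR"] = "127.0.0.1"
    os.environ["MASTER_PORT"] = "29500"

    # Per the summary above, init_process_group now finishes with a barrier(),
    # so by the time it returns on any rank, every rank has updated its
    # process-group globals (including _default_pg).
    dist.init_process_group("gloo", rank=rank, world_size=world_size)

    # It is therefore safe to use the default group immediately after
    # initialization, e.g. via a collective, without an explicit extra
    # synchronization step on the caller's side.
    t = torch.ones(1)
    dist.all_reduce(t)  # t now holds world_size on every rank

    dist.destroy_process_group()

if __name__ == "__main__":
    world_size = 2
    mp.spawn(worker, args=(world_size,), nprocs=world_size)
```

The same guarantee applies to `new_group`, which also ends with a `barrier()` after this change.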
| Name |
|---|
| no_python_abi_suffix_test |
| self_compiler_include_dirs_test |
| torch_test_cpp_extension |
| cpp_c10d_extension.cpp |
| cpp_c10d_extension.hpp |
| cpp_frontend_extension.cpp |
| cuda_extension_kernel.cu |
| cuda_extension_kernel2.cu |
| cuda_extension.cpp |
| cuda_extension.cu |
| cudnn_extension.cpp |
| doubler.h |
| extension.cpp |
| jit_extension.cpp |
| jit_extension2.cpp |
| msnpu_extension.cpp |
| rng_extension.cpp |
| setup.py |