pytorch/torch/distributed/algorithms
Yi Wang 5b6818f08a [Model Averaging] Enforce a synchronization before allreduce parameters (#60891)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60891

This fix is particularly useful for local SGD when the averaging period is very small, which can cause a conflict between the gradient allreduce within each per-machine subgroup and the global parameter allreduce over the whole communication world.
ghstack-source-id: 132564252
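
For context, here is a minimal sketch of the local SGD pattern described above: DDP allreduces gradients within a per-machine subgroup, and parameters are periodically averaged across the whole world. The `torch.cuda.synchronize()` call before the parameter allreduce only illustrates the kind of explicit synchronization this change enforces; the helper names (`average_parameters_globally`, `period`) and the exact sync mechanism used in this PR are assumptions for illustration, not the code added here.
```
# Illustrative sketch only (not the code added by this PR): local SGD where
# gradients are allreduced inside a per-machine subgroup and parameters are
# periodically averaged across the default (global) process group.
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP


def average_parameters_globally(model: torch.nn.Module) -> None:
    # Make sure any outstanding subgroup collectives (e.g. DDP's asynchronous
    # gradient allreduce) have finished before launching collectives on the
    # global group, so the two sets of NCCL calls cannot conflict.
    torch.cuda.synchronize()
    world_size = dist.get_world_size()
    for param in model.parameters():
        # Allreduce on the default (global) process group, then average.
        dist.all_reduce(param.data, op=dist.ReduceOp.SUM)
        param.data.div_(world_size)


def train(ddp_model: DDP, optimizer, data_loader, device, period: int = 4) -> None:
    # `ddp_model` is assumed to be wrapped with process_group=<per-machine
    # subgroup>, so its gradient allreduce stays within one machine.
    for step, (inputs, targets) in enumerate(data_loader):
        loss = torch.nn.functional.cross_entropy(
            ddp_model(inputs.to(device)), targets.to(device))
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
        if (step + 1) % period == 0:
            # With a very small period, this global allreduce could otherwise
            # overlap with the subgroup gradient allreduce from the last step.
            average_parameters_globally(ddp_model.module)
```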

Test Plan:
f281873295 (#Try1) failed due to a conflict between the global process group and the subgroup.
```
<Thread(configerator-monitor-singleton, started 139839806633728)>
  File "/usr/local/fbcode/platform009/lib/python3.8/threading.py", line 890, in _bootstrap
    self._bootstrap_inner()
  File "/usr/local/fbcode/platform009/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/usr/local/fbcode/platform009/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/tmp/jetter.gson7tr3/configerator/client.py", line 348, in _monitor_loop
    self._parent_thread.join(self._interval_ms / 1000)
  File "/usr/local/fbcode/platform009/lib/python3.8/threading.py", line 1015, in join
    self._wait_for_tstate_lock(timeout=max(timeout, 0))
  File "/usr/local/fbcode/platform009/lib/python3.8/threading.py", line 1027, in _wait_for_tstate_lock
    elif lock.acquire(block, timeout):
```

Fixed after adding an explicit sync: f282044866, f282241800

Reviewed By: rohan-varma

Differential Revision: D29434597

fbshipit-source-id: a4f777fc26f379639f85fda32de425cd3b337b33
2021-06-29 01:39:40 -07:00
ddp_comm_hooks [Reland][Gradient Compression] Apply division first to avoid overflow (#59576) 2021-06-08 10:03:21 -07:00
model_averaging [Model Averaging] Enforce a synchronization before allreduce parameters (#60891) 2021-06-29 01:39:40 -07:00
__init__.py [Gradient Compression] Add unit tests that test default Python comm hook implementations (#47158) 2020-11-06 00:28:09 -08:00