[PT-D][BE] Fix DDP no_sync() test logic (#72348)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72348
**Overview**
#43307 changed `_test_accumulate_gradients_no_sync()` to add a `num_iters` argument. However, I think the change misconstrued the test logic slightly.
61ab04e1db/torch/testing/_internal/distributed/distributed_test.py (L4369-L4397)
- `iteration % num_iters == 0` evaluates to `True` only for `iteration == 0`, since `iteration` comes from `for iteration in range(num_iters)`.
- IIUC, the intention is to alternate between accumulating gradients locally (using `no_sync()`) and synchronizing gradients normally (see the sketch after this list). With the existing check, only the first iteration accumulates locally and every later iteration synchronizes, so all iterations after the second add nothing beyond testing normal DDP.
- This PR changes the check back to `iteration % 2 == 0` to restore the alternating behavior.
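
For context, here is a minimal sketch (not the actual test) of the intended alternation: even iterations accumulate gradients locally under `no_sync()`, odd iterations run a normal synchronized backward. The toy `Linear` model, loss, gloo process group, and loop bounds are illustrative assumptions, not taken from `distributed_test.py`.

```python
# Illustrative sketch only -- model, loss, and loop bounds are assumptions,
# not the code in distributed_test.py.
import os

import torch
import torch.distributed as dist
import torch.multiprocessing as mp
from torch.nn.parallel import DistributedDataParallel as DDP


def worker(rank, world_size):
    os.environ["MASTER_ADDR"] = "127.0.0.1"
    os.environ["MASTER_PORT"] = "29501"
    dist.init_process_group("gloo", rank=rank, world_size=world_size)

    ddp_model = DDP(torch.nn.Linear(8, 4))
    loss_fn = torch.nn.MSELoss()

    for iteration in range(4):
        inp, target = torch.randn(2, 8), torch.randn(2, 4)
        if iteration % 2 == 0:
            # Even iteration: accumulate grads locally; no all-reduce runs,
            # so replicas' grads can diverge (the assertNotEqual branch).
            with ddp_model.no_sync():
                loss_fn(ddp_model(inp), target).backward()
        else:
            # Odd iteration: normal DDP backward; grads are all-reduced
            # (the assertEqual branch).
            loss_fn(ddp_model(inp), target).backward()

    dist.destroy_process_group()


if __name__ == "__main__":
    mp.spawn(worker, args=(2,), nprocs=2)
```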
Test Plan: Imported from OSS
Reviewed By: rohan-varma
Differential Revision: D34011559
Pulled By: awgu
fbshipit-source-id: 4ba771e45b28a343167a324462571e4b8e25ae72
parent 6a5a1b0c3f
commit 8492a8b803
@@ -4376,7 +4376,7 @@ class DistributedTest:
                     rank * local_batch_size : (rank + 1) * local_batch_size
                 ]
 
-                if iteration % num_iters == 0:
+                if iteration % 2 == 0:
                     # accumulate grads locally
                     with ddp_model.no_sync():
                         step_model(ddp_model, ddp_input, ddp_target)
@@ -4387,7 +4387,7 @@ class DistributedTest:
                 for i, j in zip(model.parameters(), ddp_model.parameters()):
                     if not i.requires_grad:
                         continue
-                    if iteration % num_iters == 0:
+                    if iteration % 2 == 0:
                         self.assertNotEqual(i.grad, j.grad)
                     else:
                         self.assertEqual(i.grad, j.grad)