[PT-D][BE] Fix DDP no_sync() test logic (#72348)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72348

**Overview**
#43307 changed `_test_accumulate_gradients_no_sync()` to add a `num_iters` argument. However, I think the change misconstrued the test logic slightly.

61ab04e1db/torch/testing/_internal/distributed/distributed_test.py (L4369-L4397)

- `iteration % num_iters == 0` evaluates to `True` only for `iteration == 0` since `iteration` comes from `for iteration in range(num_iters)`.
- IIUC, the intention is to alternate between accumulating gradients locally (inside `no_sync()`) and synchronizing gradients normally (see the sketch after this list). With the current check, every iteration after the second runs with normal gradient synchronization, so those iterations add no coverage beyond testing vanilla DDP.
- This PR changes the check back to `iteration % 2 == 0` to restore the alternating behavior.
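For context, here is a minimal sketch of the alternation the test is meant to exercise. The helper `step_model` and the data names are simplified stand-ins for the helpers in `distributed_test.py`; the sketch assumes a process group is already initialized and `ddp_model` is a `DistributedDataParallel`-wrapped module.

```python
import torch.nn.functional as F

def step_model(model, inp, target):
    # One forward/backward pass; gradients accumulate into param.grad.
    loss = F.mse_loss(model(inp), target)
    loss.backward()

def accumulate_then_sync(ddp_model, inputs, targets, num_iters):
    # Even iterations: run inside no_sync(), which skips the gradient
    # all-reduce, so each rank's grads stay local (and should differ from
    # a single-process reference model's grads).
    # Odd iterations: run normally, so grads are all-reduced and should
    # match the reference model's grads again.
    for iteration in range(num_iters):
        inp, target = inputs[iteration], targets[iteration]
        if iteration % 2 == 0:
            with ddp_model.no_sync():
                step_model(ddp_model, inp, target)
        else:
            step_model(ddp_model, inp, target)
```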

Test Plan: Imported from OSS

Reviewed By: rohan-varma

Differential Revision: D34011559

Pulled By: awgu

fbshipit-source-id: 4ba771e45b28a343167a324462571e4b8e25ae72

@@ -4376,7 +4376,7 @@ class DistributedTest:
                     rank * local_batch_size : (rank + 1) * local_batch_size
                 ]
-                if iteration % num_iters == 0:
+                if iteration % 2 == 0:
                     # accumulate grads locally
                     with ddp_model.no_sync():
                         step_model(ddp_model, ddp_input, ddp_target)
@@ -4387,7 +4387,7 @@ class DistributedTest:
                 for i, j in zip(model.parameters(), ddp_model.parameters()):
                     if not i.requires_grad:
                         continue
-                    if iteration % num_iters == 0:
+                    if iteration % 2 == 0:
                         self.assertNotEqual(i.grad, j.grad)
                     else:
                         self.assertEqual(i.grad, j.grad)