Commit Graph

6 Commits

Author SHA1 Message Date
Pritam Damania
2d671ca41b [8/N] Remove c10d/ddp fork tests. (#63454)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63454

Continuation of https://github.com/pytorch/pytorch/pull/63443, this
PR removes all fork tests from torch.distributed.
ghstack-source-id: 136285511

Test Plan: waitforbuildbot

Reviewed By: SciPioneer

Differential Revision: D30387872

fbshipit-source-id: f6d6313db126ae7b95b86f78a1e0726887c5c513
2021-08-20 12:23:18 -07:00
Pritam Damania
d565a7bd68 [6/N] Enable opt-asan for elastic and launcher tests. (#63442)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63442

Continuation of https://github.com/pytorch/pytorch/pull/62051, this PR
enables elastic and launcher tests to run in opt-asan mode, which is
supported with spawn multiprocessing.

This allows us to completely get rid of fork-based tests in torch.distributed
and have all tests run in spawn mode.
ghstack-source-id: 136057123

Test Plan: waitforbuildbot

Reviewed By: cbalioglu

Differential Revision: D30384267

fbshipit-source-id: ad3447cfb9d6e31e7ec8332d64c8ff1054858dcb
2021-08-18 10:48:49 -07:00
Pritam Damania
82d81455ae [2/N] Remove unittest.skip across all of torch.distributed. (#61887)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61887

1) Introduced a `sandcastle_skip_if` decorator that marks these tests
as passing on Sandcastle rather than skipping them.
2) Fixed all test files under `test/distributed` to no longer use `unittest.skip`

Overall goal is to avoid using skips since sandcastle tags these tests as
continuously skipping.
ghstack-source-id: 134382237

Test Plan: waitforbuildbot

Reviewed By: SciPioneer

Differential Revision: D29784152

fbshipit-source-id: 17b4df6c5a55ff1d1e8e1de128fa679c3dfbcb7d
2021-07-27 10:53:23 -07:00
Aliaksandr Ivanou
060e4c96ee Torchelastic: forbid mp tests running with *san (#56827)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56827

The diff ensures that multiprocessing tests are not executed in build modes that enable *san, since Python multiprocessing does not behave well under TSAN and ASAN.

Test Plan: buck test mode/opt-tsan //caffe2/test/distributed/launcher/... -- --run-disabled

Reviewed By: cbalioglu

Differential Revision: D27976626

fbshipit-source-id: 7747d67687fa0fd095f799b3708038f672119e73
2021-04-23 17:55:26 -07:00
Aliaksandr Ivanou
8f663170bd [17/n][torch/elastic] Make torchelastic launcher compatible with the caffe2.distributed.launch (#55687)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55687

The diff makes sure that users can pass through the following parameters:
* master_addr
* master_port
* node_rank
* use_env

The diff implements StaticTCPRendezvous, which creates a store with a listener on agent rank #0.

The diff modifies caffe2/rendezvous: if a worker process is launched by the torchelastic agent, the worker processes will create a PrefixStore("worker/") on top of the TCPStore, without a listener.

The diff adds macro functionality to torch/distributed/elastic/utils that helps resolve the local_rank parameter.
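To illustrate the key-prefixing idea described above without a live TCP server, here is a toy sketch. `DictStore` is a hypothetical stand-in for `torch.distributed.TCPStore`, and this `PrefixStore` only mimics the behavior of the real `torch.distributed.PrefixStore`:

```python
class DictStore:
    """Stand-in for a shared key/value store such as torch.distributed.TCPStore
    (in the real setup, only agent rank #0 hosts the listener)."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data[key]


class PrefixStore:
    """Namespaces every key under a prefix (e.g. "worker/") so worker traffic
    cannot collide with other users of the same underlying store."""

    def __init__(self, prefix, store):
        self._prefix = prefix
        self._store = store

    def set(self, key, value):
        self._store.set(self._prefix + key, value)

    def get(self, key):
        return self._store.get(self._prefix + key)
```

In the real flow, every worker wraps the same shared TCPStore in a `PrefixStore("worker/")`, so worker-side keys live in their own namespace while the agent keeps using the unprefixed store.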

Test Plan: buck test mode/dev-nosan //pytorch/elastic/torchelastic/distributed/test:launch_test

Reviewed By: cbalioglu, wilson100hong

Differential Revision: D27643206

fbshipit-source-id: 540fb26feac322cc3ec0a989fe53324755ccc4ea
2021-04-14 19:33:26 -07:00
Aliaksandr Ivanou
960b40156c [6/n][torch/elastic][upstream] Move torchelastic/distributed/api to torch/distributed/elastic/launchers/api (#55471)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55471

Move torchelastic/distributed/api to torch/distributed/elastic/launchers/api

Test Plan:
buck test mode/dev-nosan //pytorch/elastic/torchelastic/...
buck test mode/dev-nosan //caffe2/test/distributed/elastic/agent/server/test/...

SyncSGD: tsm_aivanou-SparseNNApplication_432fc009

f263322216

Reviewed By: wilson100hong

Differential Revision: D27614353

fbshipit-source-id: a3b58fac2ebf803b8da5852ae2be0851b1cca695
2021-04-08 12:30:25 -07:00