7 Commits

Author SHA1 Message Date
Pritam Damania
82d81455ae [2/N] Remove unittest.skip across all of torch.distributed. (#61887)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61887

1) Introduced a `sandcastle_skip_if` decorator that ensures these
tests are reported as passed on sandcastle instead of skipped.
2) Fixed all test files under `test/distributed` to not use `unittest.skip`.

The overall goal is to avoid skips, since sandcastle tags these tests as
continuously skipping.
ghstack-source-id: 134382237
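
The decorator idea described above can be sketched roughly as follows. This is not the code from the PR: the name `pass_instead_of_skip_if`, the `on_ci` parameter, and the `HYPOTHETICAL_CI` environment variable are all assumptions made for illustration. The only point shown is that, when the condition holds, the test body is bypassed and the test returns normally on the CI system, while elsewhere it raises a regular `unittest.SkipTest`.

```python
import os
import sys
import unittest
from functools import wraps

# Hypothetical flag; the real environment check used in the PR is not shown in this log.
ON_CI = os.environ.get("HYPOTHETICAL_CI") == "1"


def pass_instead_of_skip_if(condition, reason, on_ci=None):
    """Bypass a test when `condition` is true.

    On the CI system the test returns early, so it is reported as passed.
    Everywhere else it raises unittest.SkipTest, as unittest.skipIf would.
    """
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            if condition:
                ci = ON_CI if on_ci is None else on_ci
                if ci:
                    # Leave a trace in the log so the bypass is still visible.
                    print(f"Bypassing {func.__name__}: {reason}", file=sys.stderr)
                    return None
                raise unittest.SkipTest(reason)
            return func(*args, **kwargs)
        return wrapper
    return decorator
```

A test would then be decorated with something like `@pass_instead_of_skip_if(no_gpu, "requires a GPU")` in place of `@unittest.skipIf(no_gpu, "requires a GPU")`, where `no_gpu` is again a hypothetical flag.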

Test Plan: waitforbuildbot

Reviewed By: SciPioneer

Differential Revision: D29784152

fbshipit-source-id: 17b4df6c5a55ff1d1e8e1de128fa679c3dfbcb7d
2021-07-27 10:53:23 -07:00
Yanli Zhao
85b97e449d [RFC]fix test_ddp_logging_data_cpu with tsan (#54465)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54465

A data race is reported when the test runs under tsan. The root cause is the 'model.frc1.double()' call, not its interaction with DistributedDataParallel(): if we remove DistributedDataParallel() and just call 'model.frc1.double(); model.frc2.double();', tsan reports the same data race.

I'm not sure how to cast the data type in this test without tsan complaining, so this change removes that line of code and the mixed data type logging check.

Please let me know if you have a better suggestion on how to do the data type cast correctly.

Test Plan: unit test

Reviewed By: SciPioneer

Differential Revision: D27249821

fbshipit-source-id: 0368157e11cbe7d15828dccca78271d89d502ec4
2021-04-13 11:20:43 -07:00
Xiang Gao
dfb5f029da Disable TF32 on DDP tests (#52941)
Summary:
When a system has both an Ampere and a non-Ampere card, many tests fail because the results differ between the cards.
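
To illustrate why results can differ, here is a small pure-Python sketch that mimics the reduced precision of TF32 inputs. It assumes TF32 keeps 10 explicit mantissa bits of a float32 value and approximates the conversion by truncation; real hardware may round differently, and the helper names (`simulate_tf32`, `dot`) are made up for this example.

```python
import struct


def float32_bits(x):
    """Return the IEEE 754 float32 bit pattern of x as an integer."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_float32(bits):
    """Return the float32 value for a 32-bit pattern."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def simulate_tf32(x):
    """Keep only the top 10 of the 23 mantissa bits (truncation, an assumption)."""
    return bits_to_float32(float32_bits(x) & 0xFFFFE000)


def as_float32(x):
    """Round-trip through float32 without dropping mantissa bits."""
    return bits_to_float32(float32_bits(x))


def dot(a, b, convert):
    """Dot product with each input converted before multiplication."""
    return sum(convert(x) * convert(y) for x, y in zip(a, b))


a = [1.0 + 2.0 ** -11, 3.0]
b = [1.0, 0.5]
full = dot(a, b, as_float32)        # inputs keep full float32 precision
reduced = dot(a, b, simulate_tf32)  # inputs lose the low mantissa bits
print(full, reduced, full - reduced)
```

A test that compares outputs across the two kinds of cards with a tight tolerance would see a gap like `full - reduced`, which is consistent with the motivation for disabling TF32 in these tests.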

Pull Request resolved: https://github.com/pytorch/pytorch/pull/52941

Reviewed By: albanD

Differential Revision: D26994287

Pulled By: mrshenli

fbshipit-source-id: 287537495fc13361104a4460f5bcd79a208b5d8d
2021-03-11 18:31:28 -08:00
Hong Xu
1b35b1a0c4 Properly skip distributed tests when distributed module is not built (#52945)
Summary:
Currently there is some code that intends to skip distributed tests if
the distributed module is not built. However, this check is missing in
some test files, and in some other test files it runs after the
distributed module is imported, which leads to failure. This causes a
lot of headaches when testing minimal builds locally.
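
The ordering problem described above can be sketched generically as below. This is not the code from the PR: `optional_module_available` and `load_tests_or_skip` are hypothetical helpers, and the availability check uses the standard library's `importlib.util.find_spec` rather than whatever check the test files actually perform. The point is only that the availability check runs before anything that depends on the optional module is imported.

```python
import importlib
import importlib.util
import sys


def optional_module_available(name):
    """Return True if a top-level module can be located, without importing it."""
    return importlib.util.find_spec(name) is not None


def load_tests_or_skip(required, test_module):
    """Import `test_module` only after confirming `required` is present.

    Returns the imported module, or None when the requirement is missing,
    so the caller can exit cleanly instead of failing on import.
    """
    if not optional_module_available(required):
        print(f"{required} not available, skipping tests", file=sys.stderr)
        return None
    return importlib.import_module(test_module)
```

Doing the check after the dependent import, as in some of the test files mentioned above, would raise an import error before the skip logic ever runs.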

Pull Request resolved: https://github.com/pytorch/pytorch/pull/52945

Reviewed By: anjali411

Differential Revision: D26848241

Pulled By: ezyang

fbshipit-source-id: 983a848844add40869a86f3c9413503a3659b115
2021-03-05 10:28:47 -08:00
Rong Rong
ef50c94e7c reenabling MPI test (#48725)
Summary:
fixes https://github.com/pytorch/pytorch/issues/47443.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48725

Reviewed By: mrshenli

Differential Revision: D25278758

Pulled By: walterddr

fbshipit-source-id: a02d0fef99a7941c8e98da16a45d840e12b8b0c3
2020-12-03 06:50:36 -08:00
Xiang Gao
20ac736200 Remove py2 compatible future imports (#44735)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44735

Reviewed By: mruberry

Differential Revision: D23731306

Pulled By: ezyang

fbshipit-source-id: 0ba009a99e475ddbe22981be8ac636f8a1c8b02f
2020-09-16 12:55:57 -07:00
Rohan Varma
106459acac Rename test_distributed to test_distributed_fork (#42932)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42932

Follow up from https://github.com/pytorch/pytorch/pull/41769, rename `test_distributed` to `test_distributed_fork` to make it explicit that it forks.

New command to run test:
`python test/run_test.py -i distributed/test_distributed_fork -v`
ghstack-source-id: 111632568

Test Plan: `python test/run_test.py -i distributed/test_distributed_fork -v`

Reviewed By: izdeby

Differential Revision: D23072201

fbshipit-source-id: 48581688b6c5193a309e803c3de38e70be980872
2020-09-08 23:13:37 -07:00