Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61887
1) Introduced a `sandcastle_skip_if` decorator that makes the affected
tests pass on sandcastle instead of being skipped.
2) Updated all test files under `test/distributed` to not use `unittest.skip`.
The overall goal is to avoid skips, since sandcastle tags such tests as
continuously skipping.
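A minimal sketch of the idea behind such a decorator, not the implementation from this PR. The names `sandcastle_skip_if_sketch` and `running_on_sandcastle`, and the use of a `SANDCASTLE` environment variable to detect the CI environment, are assumptions made for illustration.

```python
import functools
import os
import sys
import unittest


def running_on_sandcastle():
    # Assumption: the CI environment is detected through an environment
    # variable. The real detection logic may differ.
    return os.environ.get("SANDCASTLE") == "true"


def sandcastle_skip_if_sketch(condition, reason):
    """Skip the test when `condition` holds, except on sandcastle,
    where the test body is not run and the test simply passes."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if condition:
                if running_on_sandcastle():
                    # Return without running the body so the test is
                    # reported as passed rather than skipped.
                    print(
                        f"Skipping {func.__name__} on sandcastle: {reason}",
                        file=sys.stderr,
                    )
                    return None
                # Everywhere else, behave like a regular unittest skip.
                raise unittest.SkipTest(reason)
            return func(*args, **kwargs)
        return wrapper
    return decorator
```

With this shape, a test decorated with `@sandcastle_skip_if_sketch(True, "needs 2 GPUs")` is reported as skipped in a local run and as passed when the environment variable is set.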
ghstack-source-id: 134382237
Test Plan: waitforbuildbot
Reviewed By: SciPioneer
Differential Revision: D29784152
fbshipit-source-id: 17b4df6c5a55ff1d1e8e1de128fa679c3dfbcb7d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54465
A data race was reported when this test runs under tsan. The root cause is the 'model.frc1.double()' call, not the interaction between DistributedDataParallel() and 'model.frc1.double()': if we remove DistributedDataParallel() and just call 'model.frc1.double(); model.frc2.double();', tsan reports the same data race.
I'm not sure how to do the data type cast in this test without tsan complaining, so this change removes that line of code and the mixed data type logging check.
Please let me know if you have a better suggestion for how to do the data type cast correctly.
Test Plan: unit test
Reviewed By: SciPioneer
Differential Revision: D27249821
fbshipit-source-id: 0368157e11cbe7d15828dccca78271d89d502ec4
Summary:
When a system has an Ampere and a non-Ampere card, many tests fail because results differ between the cards.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52941
Reviewed By: albanD
Differential Revision: D26994287
Pulled By: mrshenli
fbshipit-source-id: 287537495fc13361104a4460f5bcd79a208b5d8d
Summary:
Currently there is some code that intends to skip distributed tests if
the distributed module is not built. However, these checks are missing in
some test files, and in some other test files they run after the
distributed module is imported, which leads to failure. This causes
a lot of headaches when testing minimal builds locally.
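The ordering problem can be illustrated with a generic stand-in that checks for a module before importing it. This is only a sketch: `module_available` and `import_or_exit` are hypothetical helpers, and looking the module up with `importlib.util.find_spec` is an assumption that replaces whatever availability check the test files actually use.

```python
import importlib
import importlib.util
import sys


def module_available(name):
    # Look the module up without importing it. A missing parent package
    # raises ImportError, which is treated as "not available" here.
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        return False


def import_or_exit(name):
    # Check first, import second. Doing it the other way round fails at
    # import time, before the check ever gets a chance to run.
    if not module_available(name):
        print(f"{name} not available, skipping tests", file=sys.stderr)
        sys.exit(0)
    return importlib.import_module(name)
```

A test file would call `import_or_exit(...)` at the top, before any import that depends on the optional module, so a minimal build exits cleanly instead of raising.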
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52945
Reviewed By: anjali411
Differential Revision: D26848241
Pulled By: ezyang
fbshipit-source-id: 983a848844add40869a86f3c9413503a3659b115
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42932
Follow up from https://github.com/pytorch/pytorch/pull/41769, rename `test_distributed` to `test_distributed_fork` to make it explicit that it forks.
New command to run test:
`python test/run_test.py -i distributed/test_distributed_fork -v`
ghstack-source-id: 111632568
Test Plan: `python test/run_test.py -i distributed/test_distributed_fork -v`
Reviewed By: izdeby
Differential Revision: D23072201
fbshipit-source-id: 48581688b6c5193a309e803c3de38e70be980872