Summary:
This updates assertEqual and assertEqual-like functions to require that either both or neither of atol and rtol be specified. This should improve clarity around handling precision in the test suite, and it allows us to remove the legacy positional atol argument from assertEqual. In addition, the "message" kwarg is replaced with a keyword-only "msg" argument whose name is consistent with the corresponding argument of unittest's assertEqual.
In the future we could make "msg" an optional third positional argument to be more consistent with unittest's assertEqual, but requiring it to be passed by keyword is clear, and we can easily change the signature to make "msg" an optional positional argument later.
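A minimal sketch of the argument handling described above, assuming a standalone helper: `assert_close` and its default tolerances are hypothetical (the defaults are borrowed from `numpy.allclose`) and are not the test suite's actual assertEqual. The bare `*` makes `atol`, `rtol`, and `msg` keyword-only, so a legacy positional tolerance fails with a `TypeError`, and a half-specified tolerance pair is rejected outright.

```python
from typing import Optional


def assert_close(actual: float, expected: float, *,
                 atol: Optional[float] = None,
                 rtol: Optional[float] = None,
                 msg: Optional[str] = None) -> None:
    """Hypothetical assertEqual-like check: atol and rtol come as a pair or not at all."""
    # Reject a half-specified tolerance instead of silently defaulting the other one.
    if (atol is None) != (rtol is None):
        raise ValueError("atol and rtol must both be specified or both be omitted")
    if atol is None:
        # Neither was given: use placeholder defaults (borrowed from numpy.allclose).
        atol, rtol = 1e-8, 1e-5
    # Same closeness rule as numpy.allclose: |actual - expected| <= atol + rtol * |expected|.
    if abs(actual - expected) > atol + rtol * abs(expected):
        raise AssertionError(msg or f"{actual} != {expected} (atol={atol}, rtol={rtol})")


assert_close(1.0, 1.05, atol=0.1, rtol=0.0)  # passes: both tolerances given
assert_close(1.0, 1.0 + 1e-9)                # passes: neither given, defaults apply
```

Calls such as `assert_close(x, y, 0.1)` or `assert_close(x, y, atol=0.1)` now fail immediately instead of silently picking a precision.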
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38872
Differential Revision: D21740237
Pulled By: mruberry
fbshipit-source-id: acbc027aa1d7877a49664d94db9a5fff91a07042
Summary:
This updates assertEqual and assertEqual-like functions to require that either both or neither of atol and rtol be specified. This should improve clarity around handling precision in the test suite, and it allows us to remove the legacy positional atol argument from assertEqual. In addition, the "message" kwarg is replaced with a keyword-only "msg" argument whose name is consistent with the corresponding argument of unittest's assertEqual.
In the future we could make "msg" an optional third positional argument to be more consistent with unittest's assertEqual, but requiring it to be passed by keyword is clear, and we can easily change the signature to make "msg" an optional positional argument later.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38872
Differential Revision: D21717199
Pulled By: mruberry
fbshipit-source-id: 9feb856f94eee911b44f6c7140a1d07c1b026d3a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37990
The code in `ddp.{h, cpp}` and the corresponding pybind implementations are no longer used. The Python bindings exposed only private APIs that were exercised solely by unit tests, so we should remove these unused APIs.
https://github.com/pytorch/pytorch/pull/20234, from a year ago, also noted that we should delete `_dist_broadcast_coalesced`.
Verified that all tests pass with CUDA by running `test_c10d` on a GPU-enabled machine.
ghstack-source-id: 103885383
Test Plan: CI
Differential Revision: D21443879
fbshipit-source-id: 764d8681ca629056bfe2c260ffab47fa5bdf07ff
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38078
`common_distributed` and `test_distributed` define some of the same error codes with different meanings. For example, code 75 means "no cuda available" in `test_distributed` but "need at least 2 CUDA devices" in `common_distributed`.
This is an issue because the tests in `test_distributed` now use the utils in `common_distributed`, so a test could be skipped with the wrong reason.
It is also the source of test failures in https://github.com/pytorch/pytorch/pull/37990.
This diff deduplicates the test-skipping logic and moves it into `common_distributed.py`, where it can be reused and imported by `test_distributed`.
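One way to express a deduplicated skip table, sketched under assumptions: the names `SkipReason`, `TEST_SKIPS`, and `skip_reason_for`, and the exit code 74, are illustrative rather than taken from `common_distributed.py`. Keeping a single mapping and refusing to load when two reasons share an exit code prevents the same code from meaning two different things again.

```python
from typing import NamedTuple, Optional


class SkipReason(NamedTuple):
    exit_code: int
    message: str


# Hypothetical single source of truth shared by every distributed test module.
TEST_SKIPS = {
    "no_cuda": SkipReason(74, "CUDA is not available"),
    "multi-gpu": SkipReason(75, "Need at least 2 CUDA devices"),
}

# Reverse lookup the parent uses to turn a child's exit code back into a reason.
_SKIP_BY_EXIT_CODE = {skip.exit_code: skip for skip in TEST_SKIPS.values()}

# Refuse to load if two reasons ever share an exit code again.
if len(_SKIP_BY_EXIT_CODE) != len(TEST_SKIPS):
    raise RuntimeError("duplicate exit codes in TEST_SKIPS")


def skip_reason_for(exit_code: int) -> Optional[str]:
    """Return the skip message for a child's exit code, or None if it was not a skip."""
    skip = _SKIP_BY_EXIT_CODE.get(exit_code)
    return skip.message if skip is not None else None
```

A child process would call `sys.exit(TEST_SKIPS["multi-gpu"].exit_code)`, and the parent would map the exit code back through `skip_reason_for` and raise `unittest.SkipTest` with the returned message.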
ghstack-source-id: 103782583
Test Plan: CI
Differential Revision: D21466768
fbshipit-source-id: 53b5af36672ebd8b51ba8b42709d87e96cadef20
Summary:
When a subprocess in a distributed test terminates with an exception, log its process number as well.
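A minimal sketch of the child-side half of this change, assuming each subprocess runs its test body through a wrapper; `run_test_in_subprocess` and `EXCEPTION_EXIT_CODE` are hypothetical names, not PyTorch's. The point is simply that the log line carries the process number, so a failure in one rank is attributable at a glance.

```python
import logging

logger = logging.getLogger(__name__)

# Hypothetical exit code a child uses to report that its test body raised.
EXCEPTION_EXIT_CODE = 10


def run_test_in_subprocess(rank: int, test_fn) -> int:
    """Run test_fn and return an exit code, logging the rank if it raises."""
    try:
        test_fn()
    except Exception:
        # logger.exception appends the traceback; the rank says which process failed.
        logger.exception("Process %d terminated with an exception", rank)
        return EXCEPTION_EXIT_CODE
    return 0
```

Inside the child, the return value would be passed to `sys.exit(...)` so that the parent also sees a nonzero exit code for the failing rank.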
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37627
Differential Revision: D21366149
Pulled By: rohan-varma
fbshipit-source-id: 132c4b4c1eb336761c2be26d034d8b739ae19691
Summary:
This makes its wrappers stackable with those of `common_utils.TestCase`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36826
Test Plan: CI
Differential Revision: D21178217
Pulled By: mrshenli
fbshipit-source-id: f80dd4aa175e20bd338b38b2c42c3118258f45dc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35615
Python 2 has reached end-of-life and is no longer supported by PyTorch.
Now we can clean up a lot of cruft that we put in place to support it.
These changes were all done manually, and I skipped anything that seemed
like it would take more than a few seconds, so I think it makes sense to
review them manually as well (though using side-by-side view and ignoring
whitespace changes might help).
Test Plan: CI
Differential Revision: D20842886
Pulled By: dreiss
fbshipit-source-id: 8cad4e87c45895e7ce3938a88e61157a79504aed
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36970
We would like to move all distributed testing to use the existing
multiprocessing tooling defined in common_distributed.py. With this change, we
make `TestDistBackend` inherit from `MultiProcessTestCase` and enable fork-mode
multiprocessing. In the next step, we can enable spawn mode for these tests,
which will give us TSAN coverage.
ghstack-source-id: 102553801
Test Plan: Unittests
Differential Revision: D21146947
fbshipit-source-id: 608fa2cb93e88f8de6a5ac87c523e2c4e4fede1b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36542
Python 3.8 changed the default multiprocessing start method on macOS to spawn,
but we need fork in these tests; otherwise, we hit pickling errors.
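The sketch below, which assumes a POSIX platform (fork is unavailable on Windows), shows why requesting fork explicitly matters: `multiprocessing.get_context("fork")` returns a fork context regardless of the platform default, and a forked child inherits its target from the parent's memory instead of unpickling it. `run_in_forked_child` is a hypothetical helper, not the tests' actual code.

```python
import multiprocessing


def run_in_forked_child(fn) -> int:
    """Run fn in a child process created with fork and return the child's exit code."""
    # Request fork explicitly instead of relying on the platform default
    # (Python 3.8 made spawn the default on macOS). Fork is unavailable on Windows.
    ctx = multiprocessing.get_context("fork")
    # Under fork the child inherits fn from the parent's memory, so fn never has
    # to be pickled; under spawn, a closure or lambda target fails to pickle.
    proc = ctx.Process(target=fn)
    proc.start()
    proc.join()
    return proc.exitcode


if __name__ == "__main__":
    config = {"world_size": 2}  # state that exists only in the parent

    def check_config():  # a local closure: fine under fork, unpicklable under spawn
        assert config["world_size"] == 2

    print(run_in_forked_child(check_config))
```

Pinning the context per call site, rather than calling `multiprocessing.set_start_method` globally, keeps the choice local to the tests that need it.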
Test: Ensure that these tests succeed when run with Python 3.8
ghstack-source-id: 102093824
Test Plan: Ensure success with Python 3.8
Differential Revision: D21007753
fbshipit-source-id: 4b39844c6ba76a53293c0dfde7c98ec5a78fe113
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34150
In the distributed setting, we commonly have tests in which an error causes one
process to exit while the others do not (for example, because they are waiting
for work from the process that exited). Currently, we handle this situation
poorly: we wait for process 0 to time out. This wastes time on failing tests and
produces a less helpful "Process 0 timed out..." error message when the error
was actually something else.
This diff fixes the issue by checking for exited subprocesses and terminating
the test as soon as a subprocess has exited uncleanly. We still enforce
timeouts, and we return when all processes have exited cleanly in the happy path.
ghstack-source-id: 99921462
Test Plan:
All distributed tests + tested by writing tests that should trigger
the unclean subprocess detection, and verified that we exit quickly instead of
waiting for the entire timeout.
Differential Revision: D20231032
fbshipit-source-id: 3e0d4a20925b7d1098ec4c40ffcc66845425dd62
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30445
Create distributed and rpc directories under caffe/test for better management
of unit tests.
Differential Revision: D18702786
fbshipit-source-id: e9daeed0cfb846ef68806f6decfcb57c0e0e3606