pytorch/torch/testing
Luca Wehrstedt 0128eb9a85 Fix TSAN issue in distributed tests (#59238)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59238

Creating a `multiprocessing.Manager()` launches a new process using the `fork` method (because it's the default one), and then in that subprocess it launches a new thread. TSAN really doesn't like this (and rightly so!) because we already had threads in the superprocess, and intermixing threads and forks is dangerous: the forked child inherits only the thread that called `fork`, so a lock held by any other thread at that moment can remain locked in the child with no thread left to release it. The proper way to deal with this is to `exec` inside the child process or, in other words, use the `spawn` method.

Note that the method used to launch the Manager is entirely independent of the method used to launch our "own" subprocesses, which is why we were using `fork` for the Manager even though we were using `spawn` for our own subprocesses.
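A minimal sketch of the idea, using only the standard library `multiprocessing` module. It is an illustration, not the actual patch in `torch/testing/_internal`: the names `SPAWN_CTX` and `make_spawn_manager` are hypothetical. The key point is that `multiprocessing.Manager()` always uses the default start method, while `ctx.Manager()` on a context obtained from `multiprocessing.get_context("spawn")` launches the Manager's server process with `spawn`.

```python
import multiprocessing

# Hypothetical module-level context. The start method of a Manager's server
# process is chosen by the context the Manager is created from, separately
# from whatever start method is used for any other subprocesses.
SPAWN_CTX = multiprocessing.get_context("spawn")


def make_spawn_manager():
    """Return a started SyncManager whose server process is launched via "spawn".

    multiprocessing.Manager() would instead use the default start method,
    which (per the commit message) forks the possibly multi-threaded parent.
    """
    return SPAWN_CTX.Manager()


if __name__ == "__main__":
    manager = make_spawn_manager()
    try:
        shared = manager.dict()
        shared["status"] = "ok"
        print(shared["status"])  # prints "ok"
        # Our own subprocesses would be created with SPAWN_CTX.Process(...):
        # a separate choice from the one made for the Manager above.
    finally:
        manager.shutdown()
```

Creating both the Manager and the worker processes from the same context object keeps the two start methods from silently diverging. The `__main__` guard matters here because `spawn` re-imports the main module in the child.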
ghstack-source-id: 130240724

Test Plan: Reverted the silencing introduced in D28490129, ran the `test_init_rpc_then_pg` test from the TensorPipe suite and saw the original TSAN failure. Then applied my fix, re-ran the test, and the failure was gone.

Reviewed By: zhaojuanmao

Differential Revision: D28794321

fbshipit-source-id: 12242e69be399a7f02a40a0ebb3d92f92e00ce73
2021-07-01 11:53:01 -07:00
_internal Fix TSAN issue in distributed tests (#59238) 2021-07-01 11:53:01 -07:00
__init__.py Un-ignore F403 in .flake8 (#55838) 2021-04-13 09:24:07 -07:00
_asserts.py add support for quantized tensors in torch.testing.assert_close (#58926) 2021-06-30 21:43:02 -07:00
_check_kernel_launches.py Paren-matching kernel launch check without external deps (#60778) 2021-06-28 10:18:04 -07:00
_core.py Modify error message when atol=0 and rtol=0 (#60897) 2021-06-29 14:17:02 -07:00