Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32749
The test was flaky since the message from owner RRef confirming fork would arrive after the test checked whether the pending User RRefs map was empty - leading to an assertion error. This diff creates a utility function that should be used by any test to wait for this message to complete processing before doing any assertions related to the pending User RRefs map.
GitHub Issue: https://github.com/pytorch/pytorch/issues/30988
Test Plan: Stress tested `test_rref_context_debug_info` 200 times.
Differential Revision: D19612289
fbshipit-source-id: 57a7c19b1cf792b94c263d3efbbbb6da60c07d07
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32656
Fixes these flaky tests.
Test Plan: Run the test 500 times and verify that it succeeds every time.
Differential Revision: D19584453
fbshipit-source-id: 07cbc4914211f274182ac0fa74bb5ef6d43392d1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32633
There were 2 sources of current RPC agent.
- One is in Python world, `torch.distributedrpc.api._agent`.
- The other is in C++ world, `RpcAgent::defaultRpcAgent_`
Setting Python `_agent` to `None`, does not necessarily reset the C++ `defaultRpcAgent_` to `nullptr`.
i.e.
```
torch.distributedrpc.api._agent = None
```
does not translate to
```
RpcAgent::defaultRpcAgent_ = nullptr
```
This PR is to remove this ambiguity, and use the C++ pointer as source of truth.
The solution is to leverage a pybind11 behavior that it implicitly casts C++ `shared_ptr<RpcAgent>(nullptr)` to Python `None`.
ghstack-source-id: 97293315
Test Plan:
```
buck test mode/dev-nosan //caffe2/test/distributed/rpc:rpc_fork -- test_duplicate_name
buck build mode/dev-nosan //caffe2/test/distributed/rpc:rpc_fork
buck-out/gen/caffe2/test/distributed/rpc/rpc_fork\#binary.par -r test_process_group_debug_info
```
```
buck test mode/dev-nosan //caffe2/torch/fb/distributed/pytorch/tests:test_remote_module
buck test mode/dev-nosan //caffe2/torch/fb/distributed/modules/tests:test_sharded_embedding
buck test mode/dev-nosan //caffe2/torch/fb/distributed/modules/tests:test_sharded_pairwise_attention_pooling
buck test mode/dev-nosan //caffe2/torch/fb/distributed/pytorch/tests:test_rpc
```
Differential Revision: D5733066
fbshipit-source-id: b3e6032ee975f19ca556497edbbf40b517b25be8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30445
Create distributed and rpc directories under caffe/test for better management
of unit tests.
Differential Revision: D18702786
fbshipit-source-id: e9daeed0cfb846ef68806f6decfcb57c0e0e3606