Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46773
Changed the constructor of RemoteModule to accept a `remote_device` arg in the following format:
"<workername>/<device>" (e.g., "trainer0/cpu", "ps0/cuda:0")
This arg merges the original `on` and `device` arg.
Original PR issue: RemoteDevice Format #46554
ghstack-source-id: 115448051
Test Plan: buck test mode/dev-nosan caffe2/test/distributed/rpc:process_group_agent -- RemoteModule
Reviewed By: pritamdamania87
Differential Revision: D24482562
fbshipit-source-id: 5acfc73772576a4b674df27625bf560b8f8e67c1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44254
Add a device parameter to RemoteModule, so it can be placed on any device
and not just CPU.
Original PR issue: RemoteModule enhancements #40550
Test Plan: buck test test/distributed/rpc:process_group_agent -- RemoteModule
Reviewed By: pritamdamania87
Differential Revision: D23483803
fbshipit-source-id: 4918583c15c6a38a255ccbf12c9168660ab7f6db
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43906
This method returns a list of RRefs of remote parameters that can be fed into the DistributedOptimizer.
Original PR issue: RemoteModule enhancements #40550
Test Plan: buck test caffe2/test/distributed/rpc:process_group_agent -- RemoteModule
Reviewed By: rohan-varma
Differential Revision: D23399586
fbshipit-source-id: 4b0f1ccf2e47c8a9e4f79cb2c8668f3cdbdff820
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40902
See the bottom of this stack for context.
Test Plan: Imported from OSS
Reviewed By: eellison
Differential Revision: D22360210
Pulled By: suo
fbshipit-source-id: 4275127173a36982ce9ad357aa344435b98e1faf
Summary:
Previously the module would log some data using `print()`. This can be
a problem when used in contexts where the process expects to write data to
stdout itself. This diff changes the log statements to use `logger` instead.
This makes it similar to other log statements in the same module.
Test Plan:
Confirmed no weird test showed up when running:
buck test caffe2/test/distributed/nn/api:remote_module_fork
Differential Revision: D22136172
fbshipit-source-id: a3d144eba6c75925ed684981793c84b36eb45a5d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40173
- Avoid path sharing across runs and workers, so even the test methods/workers run in parallel on the same host, they don't interfere with each other.
- On some environment (e.g. fb internal CI platform), the torch package file tree is not writable. But the temporary folder chosen by Python `tempfile` module is always writable, on linux it's "/tmp".
close https://github.com/pytorch/pytorch/issues/40120
ghstack-source-id: 106086340
Test Plan:
```
buck test mode/dev-nosan //caffe2/test/distributed/nn/jit:test_instantiator
buck build mode/dev-nosan //caffe2/test/distributed/nn/jit:test_instantiator && \
buck-out/gen/caffe2/test/distributed/nn/jit/test_instantiator\#binary.par -r test_instantiate_scripted_remote_module_template
buck build mode/dev-nosan //caffe2/test/distributed/nn/jit:test_instantiator && \
buck-out/gen/caffe2/test/distributed/nn/jit/test_instantiator\#binary.par -r test_instantiate_non_scripted_remote_module_template
```
```
buck test mode/dev-nosan //caffe2/test/distributed/nn/api:remote_module_fork
```
Differential Revision: D5708493
fbshipit-source-id: dd92695682433aaf79d1912c7956cef40a450eaf