Commit Graph

21 Commits

Author SHA1 Message Date
Richard Barnes
d9ea93181b Some types for remote_module (#58012)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58012

Test Plan: Sandcastle

Reviewed By: SciPioneer

Differential Revision: D28334611

fbshipit-source-id: 5e4645a7de65e064cb6a919cdc2372151ec48d44
2021-05-11 16:43:55 -07:00
Yi Wang
4db88307d9 [RPC Framework] Add a link to the tutorial in RemoteModule docstring (#57875)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57875

This tutorial combines DDP and RemoteModule.
ghstack-source-id: 128482681

Test Plan: N/A

Reviewed By: rohan-varma

Differential Revision: D28305382

fbshipit-source-id: 572e1ec4b4aa00735fff16a6ce6ae4c7cad0b27f
2021-05-07 19:42:27 -07:00
Yi Wang
74d493cc07 [RPC Framework] Support passing RemoteModule as an arg (#57695)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57695

Add pickling/unpickling support for `RemoteModule`.

#Closes: https://github.com/pytorch/pytorch/issues/57516
ghstack-source-id: 128472946

Test Plan:
buck test mode/dev-nosan caffe2/test/distributed/rpc:process_group_agent -- test_send_remote_module_over_the_wire

buck test mode/dev-nosan caffe2/test/distributed/rpc:process_group_agent -- test_send_remote_module_with_a_new_attribute_over_the_wire

buck test mode/dev-nosan caffe2/test/distributed/rpc:process_group_agent -- RemoteModule

Reviewed By: rohan-varma

Differential Revision: D28233108

fbshipit-source-id: 94eea2251fa53fb71912457c80d0a1e44504fc85
2021-05-07 19:41:17 -07:00
Yi Wang
5c7e35c689 [RPC Framework] Clang-format remote_module.py and instantiator.py (#57414)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57414

ghstack-source-id: 127927609

Test Plan: N/A

Reviewed By: rohan-varma

Differential Revision: D28138870

fbshipit-source-id: 04894abaf2e713dc559cd9795197f85539b25e17
2021-05-03 20:28:51 -07:00
Yi Wang
4143483d95 [RPC Framework] Create a separate remote module template when moving CPU tensors to a cuda device is not enabled (#57413)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57413

An internal test fails because somehow `Tuple[()]` is not considered compatible with `Tuple[Any]` in TorchScript, even if the code that involves this type of variables is not executed at all.

Therefore, create separate templates for instantiation to avoid typing check failure. This can address the FIXME left in https://github.com/pytorch/pytorch/pull/57288

#Closes: https://github.com/pytorch/pytorch/issues/51670

Test Plan:
buck test mode/dev-nosan caffe2/test/distributed/rpc:process_group_agent -- RemoteModule -j 1

buck test mode/dev-nosan caffe2/torch/fb/training_toolkit/applications/sparse_nn/batch_distributed_inference/tests:batch_distributed_inference_test -- test_load_di_parts

Reviewed By: wanchaol

Differential Revision: D28138864

fbshipit-source-id: 39e3e67b0c3979b607ff104d84b4fb1070ffefd6
2021-05-03 19:10:24 -07:00
Yi Wang
13dbb77b7a [RPC Framework] Enable RemoteModule to directly send GPU tensors over the wire on TensorPipe RPC backend if a device map is provided (#57288)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57288

If the device map provided by RemoteModue is not empty, then TensorPipe RPC backend can support directly sending GPU tensors over the wire.

Also add pybind of `_get_device_map`.

The changes in unit test setup is separated out as a follow-up PR, as currently it breaks some tests in `distributed/rpc/test_faulty_agent.py`.

Still need to fix test_load_di_parts in `torch/fb/training_toolkit/applications/sparse_nn/batch_distributed_inference/tests:batch_distributed_inference_test`. Currently an early return is used to bypass this test failure.

#Original PR issue: https://github.com/pytorch/pytorch/issues/51670

Test Plan:
buck test mode/dev-nosan caffe2/test/distributed/rpc:process_group_agent -- test_input_moved_to_cuda_device

buck test mode/dev-nosan caffe2/test/distributed/rpc:process_group_agent -- test_input_moved_to_cuda_device_script

buck test mode/dev-nosan caffe2/test/distributed/rpc:process_group_agent -- RemoteModule -j 1

CAUTION: This one actually fails and now it is bypassed. See FIXME in `_remote_forward`.
buck test mode/dev-nosan caffe2/torch/fb/training_toolkit/applications/sparse_nn/batch_distributed_inference/tests:batch_distributed_inference_test -- test_load_di_parts

Reviewed By: wanchaol

Differential Revision: D28021672

fbshipit-source-id: a89245dc35e1d9479811ec6f98d9f34116837d79
2021-04-30 18:04:45 -07:00
Jerry Zhang
0a541e23e1 [nn] Add allow_duplicate option for named_modules (#54812)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54812

Needed for quantization since different attribute might refer to the same module instance

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D27408376

fbshipit-source-id: cada85c4a1772d3dd9502c3f6f9a56d690d527e7
2021-04-16 01:26:16 -07:00
Nikita Shulga
add49e7e4e Enforce PEP263 for PyTorch python codebase (#55346)
Summary:
All python files containing non-ASCII characters should be correctly annotated with `# -*- coding: utf-8 -*-` comment

Delete number of superfluous UTF-8 characters, most commonly UTF-8 opening closing quotation mark U+2019 (’) instead of ascii apostrophe ', for example `Module’s`->`Module's`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/55346

Reviewed By: samestep

Differential Revision: D27582044

Pulled By: malfet

fbshipit-source-id: c1cd89655915858ff3a41f675cdfffff795a8e44
2021-04-06 18:31:38 -07:00
Pritam Damania
f612d4eb58 Add 'remote_parameters' and 'get_module_rref' to RemoteModule docs. (#54645)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54645

Had to replace RRef[..] with just RRef in the return signature since
sphynx seemed to completely mess up rendering RRef[..]
ghstack-source-id: 125024783

Test Plan: View locally.

Reviewed By: SciPioneer

Differential Revision: D27314609

fbshipit-source-id: 2dd9901e79f31578ac7733f79dbeb376f686ed75
2021-03-26 21:41:28 -07:00
Pritam Damania
59c0c19be2 Add RemoteModule to master RPC docs. (#53084)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53084

Adding RemoteModule to master RPC docs since it is a prototype
feature.
ghstack-source-id: 122816689

Test Plan: waitforbuildbot

Reviewed By: rohan-varma

Differential Revision: D26743372

fbshipit-source-id: 00ce9526291dfb68494e07be3e67d7d9c2686f1b
2021-03-03 13:52:11 -08:00
chengjun
4a8ef4525e Add new backend type for Intel heterogeneous computation platform. (#49786)
Summary:
Add a new device type 'XPU' ('xpu' for lower case) to PyTorch. Changes are needed for code related to device model and kernel dispatch, e.g. DeviceType, Backend and DispatchKey etc.

https://github.com/pytorch/pytorch/issues/48246

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49786

Reviewed By: mrshenli

Differential Revision: D25893962

Pulled By: ezyang

fbshipit-source-id: 7ff0a316ee34cf0ed6fc7ead08ecdeb7df4b0052
2021-01-20 08:15:18 -08:00
Guilherme Leobas
5f8e1a1da9 add type annotations to torch.nn.modules.module (#49045)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/49044

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49045

Reviewed By: malfet

Differential Revision: D25767092

Pulled By: walterddr

fbshipit-source-id: a81ba96f3495943af7bb9ee3e5fc4c94c690c405
2021-01-11 17:01:47 -08:00
Samuel Marks
e6779d4357 [*.py] Rename "Arguments:" to "Args:" (#49736)
Summary:
I've written custom parsers and emitters for everything from docstrings to classes and functions. However, I recently came across an issue when I was parsing/generating from the TensorFlow codebase: inconsistent use of `Args:` and `Arguments:` in its docstrings.

```sh
(pytorch#c348fae)$ for name in 'Args:' 'Arguments:'; do
    printf '%-10s %04d\n' "$name" "$(rg -IFtpy --count-matches "$name" | paste -s -d+ -- | bc)"; done
Args:      1095
Arguments: 0336
```

It is easy enough to extend my parsers to support both variants, however it looks like `Arguments:` is wrong anyway, as per:

  - https://google.github.io/styleguide/pyguide.html#doc-function-args @ [`ddccc0f`](https://github.com/google/styleguide/blob/ddccc0f/pyguide.md)

  - https://chromium.googlesource.com/chromiumos/docs/+/master/styleguide/python.md#describing-arguments-in-docstrings @ [`9fc0fc0`](https://chromium.googlesource.com/chromiumos/docs/+/9fc0fc0/styleguide/python.md)

  - https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html @ [`c0ae8e3`](https://github.com/sphinx-contrib/napoleon/blob/c0ae8e3/docs/source/example_google.rst)

Therefore, only `Args:` is valid. This PR replaces them throughout the codebase.

PS: For related PRs, see tensorflow/tensorflow/pull/45420

PPS: The trackbacks automatically appearing below are sending the same changes to other repositories in the [PyTorch](https://github.com/pytorch) organisation.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49736

Reviewed By: albanD

Differential Revision: D25710534

Pulled By: soumith

fbshipit-source-id: 61e8ff01abb433e9f78185c2d1d0cbd7c22c1619
2020-12-28 09:34:47 -08:00
Yi Wang
2b1057b0cf [RPC Framework] Support retrieving the RRef to the remote module (#48983)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48983

Expose an API for users to retrieve the RRef for the underlying module.

This would be useful if users would like to run custom code on the remote end for the nn.Module.

Original PR issue: RemoteModule enhancements #40550
ghstack-source-id: 118378601

Test Plan: buck test mode/dev-nosan caffe2/test/distributed/rpc:process_group_agent -- RemoteModule

Reviewed By: pritamdamania87

Differential Revision: D25386042

fbshipit-source-id: 2dff33e8d5c9770be464eacf0b26c3e82f49a943
2020-12-10 23:53:44 -08:00
Xu Zhao
7f66fa62ca Fix typing errors in torch.distributed.nn.* directory. (#47533)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47533

Test Plan: Imported from OSS

Reviewed By: walterddr

Differential Revision: D24952500

Pulled By: xuzhao9

fbshipit-source-id: 8e66784fd8f9f111b6329e0bb48d6cd61c690a4a
2020-11-16 23:27:55 -08:00
Yi Wang
cab32d9cdf [RPC Framework] Support remote device format "<workername>/<device>" (#46773)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46773

Changed the constructor of RemoteModule to accept a `remote_device` arg in the following format:
"<workername>/<device>" (e.g., "trainer0/cpu", "ps0/cuda:0")

This arg merges the original `on` and `device` arg.

Original PR issue: RemoteDevice Format #46554
ghstack-source-id: 115448051

Test Plan: buck test mode/dev-nosan caffe2/test/distributed/rpc:process_group_agent -- RemoteModule

Reviewed By: pritamdamania87

Differential Revision: D24482562

fbshipit-source-id: 5acfc73772576a4b674df27625bf560b8f8e67c1
2020-10-29 00:14:56 -07:00
Yi Wang
c68cc78299 Add a device parameter to RemoteModule (#44254)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44254

Add a device parameter to RemoteModule, so it can be placed on any device
and not just CPU.

Original PR issue: RemoteModule enhancements #40550

Test Plan: buck test test/distributed/rpc:process_group_agent -- RemoteModule

Reviewed By: pritamdamania87

Differential Revision: D23483803

fbshipit-source-id: 4918583c15c6a38a255ccbf12c9168660ab7f6db
2020-09-18 10:31:03 -07:00
Yi Wang
396469f18c Explicitly forbidden the other inherited methods of RemoteModule. (#43895)
Summary:
Throw exceptions when the methods except for forwardXXX are used.

Original PR issue: RemoteModule enhancements #40550

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43895

Test Plan: buck test test/distributed/rpc:process_group_agent -- RemoteModule

Reviewed By: rohan-varma

Differential Revision: D23392842

Pulled By: SciPioneer

fbshipit-source-id: 7c09a55a03f9f0b7e9f9264a42bfb907607f4651
2020-09-05 14:48:56 -07:00
Yi Wang
8b17fd2516 Add remote_parameters() into RemoteModule class. (#43906)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43906

This method returns a list of RRefs of remote parameters that can be fed into the DistributedOptimizer.

Original PR issue: RemoteModule enhancements #40550

Test Plan: buck test caffe2/test/distributed/rpc:process_group_agent -- RemoteModule

Reviewed By: rohan-varma

Differential Revision: D23399586

fbshipit-source-id: 4b0f1ccf2e47c8a9e4f79cb2c8668f3cdbdff820
2020-09-04 16:22:40 -07:00
wudenggang
9600ed9af3 typo fixes (#41632)
Summary:
typo fixes

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41632

Reviewed By: ezyang

Differential Revision: D22617827

Pulled By: mrshenli

fbshipit-source-id: c2bfcb7cc36913a8dd32f13fc9adc3aa0a9b682f
2020-07-20 07:23:00 -07:00
Shihao Xu
00651b8c93 [distribtued.nn] Implement TorchScript-compatible RemoteModule API (#37139)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37139

See design doc in https://github.com/pytorch/pytorch/issues/37136

ghstack-source-id: 105926270

Test Plan:
TODO:

- Make the generated Interface usable. https://github.com/pytorch/pytorch/pull/37139#discussion_r434190978
-
- Avoid generating the same template instances for Module that is not scriptable.
- Remove "infer_module_interface_cls".
- Use Python format instead of a CodeTemplate
- Use Python tempfile to track and delete file. Does it work if there is crash.

```
buck test mode/dev-nosan //caffe2/test/distributed/nn/jit:test_instantiator

buck build mode/dev-nosan //caffe2/test/distributed/nn/jit:test_instantiator && \
buck-out/gen/caffe2/test/distributed/nn/jit/test_instantiator\#binary.par -r test_instantiate_scripted_remote_module_template

buck build mode/dev-nosan //caffe2/test/distributed/nn/jit:test_instantiator && \
buck-out/gen/caffe2/test/distributed/nn/jit/test_instantiator\#binary.par -r test_instantiate_non_scripted_remote_module_template
```

```
buck test mode/dev-nosan //caffe2/test/distributed/nn/api:remote_module_spawn
```

```
buck test mode/dev-nosan //caffe2/test/distributed/nn/api:remote_module_fork

buck build mode/dev-nosan //caffe2/test/distributed/nn/api:remote_module_fork && \
buck-out/gen/caffe2/test/distributed/nn/api/remote_module_fork\#binary.par -r test_user_provided_global_unique_name

buck build mode/dev-nosan //caffe2/test/distributed/nn/api:remote_module_fork && \
buck-out/gen/caffe2/test/distributed/nn/api/remote_module_fork\#binary.par -r test_forward_async_script

buck build mode/dev-nosan //caffe2/test/distributed/nn/api:remote_module_fork && \
buck-out/gen/caffe2/test/distributed/nn/api/remote_module_fork\#binary.par -r test_forward_sync_script

buck build mode/dev-nosan //caffe2/test/distributed/nn/api:remote_module_fork && \
buck-out/gen/caffe2/test/distributed/nn/api/remote_module_fork\#binary.par -r test_forward_with_kwargs

buck build mode/dev-nosan //caffe2/test/distributed/nn/api:remote_module_fork && \
buck-out/gen/caffe2/test/distributed/nn/api/remote_module_fork\#binary.par -r test_user_provided_global_unique_name
```

```
buck test mode/dev-nosan //caffe2/test/distributed/rpc:rpc_fork
```

buck test mode/opt-asan //caffe2/test:jit -- 'test_script_forward_method_replacement

buck build mode/dev-nosan //caffe2/test:jit && \
buck-out/gen/caffe2/test/jit\#binary.par -r 'test_script_forward_method_replacement'

buck build mode/dev-nosan //caffe2/test:jit && \
buck-out/gen/caffe2/test/jit\#binary.par -r 'test_imported_classes'

Differential Revision: D20499658

fbshipit-source-id: dd9383ae4eb2343366c11127664f845b91ca3b0a
2020-06-15 19:07:35 -07:00