Commit Graph

44 Commits

Author SHA1 Message Date
anjali411
4bf076e964 Add __all__ to torch.distributed, futures, fx, nn, package, benchmark submodules (#80520)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80520
Approved by: https://github.com/rohan-varma
2022-07-08 14:31:24 +00:00
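A minimal sketch of what declaring `__all__` in a submodule does; the module and names below are hypothetical illustrations, not the actual PyTorch edit. Only the listed names are exported by a star-import, which gives documentation and lint tooling an explicit public surface.

```python
# mypkg/parallel.py -- hypothetical module illustrating __all__
__all__ = ["DataParallel", "data_parallel"]

class DataParallel:
    """Public: exported by `from mypkg.parallel import *`."""

def data_parallel(module, inputs):
    """Public helper."""
    return module(inputs)

def _replicate(module, devices):
    """Private: a star-import will not bring this name in."""
    return [module for _ in devices]
```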
Rohan Varma
eeabab03e7 [DataParallel] Log API Usage for tracking (#66038)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66038

Will help track workflows for DP deprecation. Tested via standalone DP
script.

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D31356975

fbshipit-source-id: c0a3ac3a1faed794e3362f3f3a19a6fb800587a7
2021-10-05 18:30:23 -07:00
Mike Guo
5b4c3a9da1 record Torch DP and DDP modules forward (#55578)
Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/55578

Reviewed By: gdankel

Differential Revision: D27862392

Pulled By: ilia-cher

fbshipit-source-id: 18545d23e35a97c8f760707fecb696a24d47dc0a
2021-04-19 17:52:59 -07:00
lixinyu
870a5a0d6d Enable DataParallel to run zero input Module (#46565)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46565

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D24405275

Pulled By: glaringlee

fbshipit-source-id: a8baaf4cf227f7f21fc3b080a446f92f0effe18e
2020-10-22 18:04:33 -07:00
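A hedged sketch of the capability named above: a module whose `forward` takes no tensor arguments, wrapped in `DataParallel` when CUDA is available. The module name and output are made up for illustration.

```python
import torch
import torch.nn as nn

class Const(nn.Module):
    def forward(self):              # takes no inputs at all
        return torch.ones(3)

model = Const()
if torch.cuda.is_available():
    # After #46565, forward works even though there are no tensors to scatter.
    model = nn.DataParallel(model.cuda())
print(model())
```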
Alexander Grund
5b0f400488 Replace list(map(...)) constructs by list comprehensions (#46461)
Summary:
As discussed in https://github.com/pytorch/pytorch/issues/46392 this makes the code more readable and possibly more performant.

It also fixes a bug detected by this where the argument order of `map` was confused: 030a24906e (diff-5bb26bd3a23ee3bb540aeadcc0385df2a4e48de39f87ed9ea76b21990738fe98L1537-R1537)

Fixes https://github.com/pytorch/pytorch/issues/46392

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46461

Reviewed By: ailzhang

Differential Revision: D24367015

Pulled By: ezyang

fbshipit-source-id: d55a67933cc22346b00544c9671f09982ad920e7
2020-10-19 18:42:49 -07:00
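A generic illustration (not the actual PyTorch diff) of the refactor and the pitfall it removes: with `map` it is easy to swap the function and the iterable, while the comprehension keeps the call order explicit and drops the extra `list()` pass.

```python
names = ["dp", "ddp", "comm"]

# Before: list(map(...)) works, but the argument order is easy to get wrong.
lengths_map = list(map(len, names))

# After: a comprehension reads left to right and needs no extra list() call.
lengths_comp = [len(n) for n in names]

assert lengths_map == lengths_comp == [2, 3, 4]
```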
chengjun
8d570bc708 Decouple DataParallel/DistributedDataParallel from CUDA (#38454)
Summary:
Decouple DataParallel/DistributedDataParallel from CUDA to support more device types.
- Move torch/cuda/comm.py to torch/nn/parallel/comm.py with minor changes for common device support. torch.cuda.comm is kept as-is for backward compatibility.
- Provide common APIs to arbitrary device types without changing existing CUDA APIs in the torch.cuda space.
- Replace the torch.cuda calls in DataParallel/DistributedDataParallel with the new APIs.

Related RFC: [https://github.com/pytorch/pytorch/issues/36160](https://github.com/pytorch/pytorch/issues/36160)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/38454

Differential Revision: D22051557

Pulled By: mrshenli

fbshipit-source-id: 7842dad0e5d3ca0f6fb760bda49182dcf6653af8
2020-07-07 12:48:16 -07:00
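A hedged sketch of the comm helpers mentioned above, guarded so it only runs with two visible GPUs. The `torch.nn.parallel.comm` import path is assumed from the post-move layout described in the commit, with `torch.cuda.comm` remaining as the backward-compatible alias.

```python
import torch

if torch.cuda.is_available() and torch.cuda.device_count() >= 2:
    import torch.nn.parallel.comm as comm   # post-move location; torch.cuda.comm still works

    t = torch.arange(6.0, device="cuda:0")
    copies = comm.broadcast(t, devices=[0, 1])   # one full copy per device
    chunks = comm.scatter(t, devices=[0, 1])     # split along dim 0 across devices
    total = comm.reduce_add(copies)              # sum the copies back onto a single device
    print([c.device for c in copies], [c.shape for c in chunks], total.device)
```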
Xiang Gao
df8d6eeb19 Update docs about DP and DDP for CUDA (#35063)
Summary:
We should recommend DDP instead of DP. Hope we can also cherry-pick this for 1.5
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35063

Differential Revision: D20549621

Pulled By: ngimel

fbshipit-source-id: 86b1b2134664065cc6070ea4212895f993eaf543
2020-03-20 20:06:37 -07:00
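A hedged, CPU-runnable sketch of the recommendation in the commit above: one process per replica with `DistributedDataParallel` over the gloo backend, rather than single-process `DataParallel`. The address, port, and model are placeholders.

```python
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
from torch.nn.parallel import DistributedDataParallel as DDP

def worker(rank: int, world_size: int):
    os.environ["MASTER_ADDR"] = "127.0.0.1"   # placeholder rendezvous address
    os.environ["MASTER_PORT"] = "29500"       # placeholder port
    dist.init_process_group("gloo", rank=rank, world_size=world_size)

    model = DDP(torch.nn.Linear(10, 10))      # one replica per process
    out = model(torch.randn(4, 10))
    out.sum().backward()                      # gradients are all-reduced across ranks
    dist.destroy_process_group()

if __name__ == "__main__":
    mp.spawn(worker, args=(2,), nprocs=2)
```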
Shen Li
f62a006097 Retry Fix Python DataParallel RNN in no_grad mode (#21262)
Summary:
Retry #21197

The previous one failed because it used some Python 3-only syntax.

ezyang Do we still have multi-GPU py2 tests? I am curious why the CI tests did not catch this error.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21262

Differential Revision: D15598941

Pulled By: mrshenli

fbshipit-source-id: 95f416589448c443685d6d236d205b011998a715
2019-06-03 08:04:35 -07:00
Karl Ostmo
aac424a6c4 Revert D15577342: [pytorch][PR] Fix Python DataParallel RNN in no_grad mode
Differential Revision:
D15577342

Original commit changeset: 1a024c572171

fbshipit-source-id: 9a3ddc14ebb2d75d9dc3ee1fe69df9ffba3529de
2019-06-01 22:17:19 -07:00
Shen Li
51ebbe970a Fix Python DataParallel RNN in no_grad mode (#21197)
Summary:
Fixes #21108

When grad is disabled, Python autograd function outputs are [wrapped as detached aliases](8cde4c4d22/torch/csrc/autograd/python_function.cpp (L395-L399)), which prevents calling `Tensor.set_()` on them after recent changes in Tensors and Variables. This will hit a problem when users would like to call `rnn.flatten_parameters()` in the forward pass, as the function [calls `set_()`](9d09f5df6c/aten/src/ATen/native/cudnn/RNN.cpp (L669)).

The proposed solution is to avoid using an autograd Broadcast if in no_grad mode.

apsdehal
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21197

Differential Revision: D15577342

Pulled By: mrshenli

fbshipit-source-id: 1a024c572171a3f2daca9454fd3ee6450d112f7c
2019-06-01 10:37:57 -07:00
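A hedged reconstruction of the pattern that triggered #21108: a module that calls `flatten_parameters()` inside `forward`, wrapped in `DataParallel` and run under `torch.no_grad()`. The module name and shapes are made up for illustration.

```python
import torch
import torch.nn as nn

class Tagger(nn.Module):
    def __init__(self):
        super().__init__()
        self.rnn = nn.LSTM(8, 16, batch_first=True)

    def forward(self, x):
        self.rnn.flatten_parameters()   # calls Tensor.set_() on the replica's weights
        out, _ = self.rnn(x)
        return out

model, x = Tagger(), torch.randn(4, 5, 8)
if torch.cuda.is_available():
    model, x = nn.DataParallel(model.cuda()), x.cuda()

with torch.no_grad():   # the inference path where the detached-alias problem surfaced
    y = model(x)
print(y.shape)
```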
Alexandr Morev
abc171bd53 Fix typo in docstring
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18216

Differential Revision: D14539824

Pulled By: ezyang

fbshipit-source-id: 490b72951a75f3f8b949a2d692d660a3693ee98a
2019-03-20 11:16:36 -07:00
ZhuBaohe
19a6de328f Correct docstring of vision/init functions
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17351

Differential Revision: D14276355

Pulled By: soumith

fbshipit-source-id: 9b572b6a04eeb1e44cd93961edac76ed10f7b24e
2019-03-01 11:40:23 -08:00
Tongzhou Wang
3d5968d366 Fix DataParallel(cpu_m).cuda() not working by checking at forward (#17363)
Summary:
Fixes #17362
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17363

Differential Revision: D14175151

Pulled By: soumith

fbshipit-source-id: 7b7e2335d553ed2133287deeaca3f6b6254aea4a
2019-02-22 08:31:36 -08:00
Shen Li
472cfc0f2c Enforce module device at DataParallel construction time (#17129)
Summary:
closes #17065

CC douwekiela
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17129

Differential Revision: D14093353

Pulled By: mrshenli

fbshipit-source-id: 9a5a10f16e392337a7f7073223541cf69b402f82
2019-02-15 11:14:46 -08:00
Derek Kim
9cb41e5386 Enhance the documentation for torch.nn.DataParallel (#15993)
Summary:
I found a few sentences in the DataParallel docstring confusing, so I suggest this enhancement.

- Arbitrary arguments are allowed to be passed ... *INCLUDING* tensors (not *EXCLUDING*).
- The original author said that "other types" are shallow-copied, but actually only some builtin types are (effectively) shallow-copied; "other types" are shared between replicas. Here is an example.

```python
import torch
from torch.nn import Module, DataParallel
from collections import deque

class MyModel(Module):
    def forward(self, x):
        x.append(None)   # mutate the argument in place

model = MyModel()
model.cuda()
model = DataParallel(model)

d = deque()   # not a tensor, list, tuple, or dict, so it is not scattered
model(d)      # every replica receives the same deque object
print(d)      # one None appended per replica
```

This is a side note.

As far as I know, copying objects is not an especially frequent operation in Python, unlike in some other languages. Notably, no copying is involved in assignment or function parameter passing; both are only name bindings, which is the whole point of Python's "everything is an object" philosophy. Keeping this in mind also helps when dealing with things like multithreading.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15993

Differential Revision: D14020404

Pulled By: ezyang

fbshipit-source-id: a38689c94d0b8f77be70447f34962d3a7cd25e2e
2019-02-10 15:55:31 -08:00
Edward Yang
34cfbb0040 Typofix (#16800)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16800

Differential Revision: D13972592

Pulled By: ezyang

fbshipit-source-id: 45c352ac6090c8060bf75f44dec7205556986d88
2019-02-06 10:34:04 -08:00
Tongzhou Wang
ac994f2c78 Fix SpectralNorm with DataParallel (#12671)
Summary:
There were two problems with SN + DP:

1. In SN, the updated _u vector is saved back to the module via `setattr`. However, in DP, everything is run on a replica, so those updates are lost.
2. In DP, the buffers are broadcast via a `broadcast_coalesced`, so on replicas they are all views. Therefore, the `detach_` call won't work.

Fixes are:
1. Update the _u vector in-place so that, via the storage shared between the first replica and the parallelized module, the update is retained.
2. Do not call `detach_`.
3. Added comments in SN about the subtlety.
4. Added a note to the DP doc on this particular behavior of DP.

cc crcrpar taesung89 yaoshengfu

Fixes https://github.com/pytorch/pytorch/issues/11476
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12671

Differential Revision: D10410232

Pulled By: SsnL

fbshipit-source-id: c447951844a30366d8c196bf9436340e88f3b6d9
2018-10-16 16:02:17 -07:00
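A hedged, simplified sketch of fix 1 above, using a plain registered buffer instead of the real spectral-norm internals: rebinding the attribute on a replica is lost, while an in-place copy writes through the storage that the first replica shares with the wrapped module. The class name and shapes are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PowerIterBuffer(nn.Module):
    def __init__(self):
        super().__init__()
        self.register_buffer("u", F.normalize(torch.randn(8), dim=0))

    def forward(self, x):
        new_u = F.normalize(torch.randn(8, device=self.u.device), dim=0)
        # Lost under DataParallel: `self.u = new_u` only rebinds the attribute on the replica.
        # Retained: the in-place copy mutates the storage shared with the first replica,
        # so the update survives the forward pass.
        with torch.no_grad():
            self.u.copy_(new_u)
        return x

m = PowerIterBuffer()
if torch.cuda.is_available():
    m = nn.DataParallel(m.cuda())
print(m(torch.zeros(2, 3)).shape)
```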
Wei Yang
54107ae8cf convert output_device at data_parallel from torch.device to index (#10189)
Summary:
- fixes #9984
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10189

Differential Revision: D9545390

Pulled By: weiyangfb

fbshipit-source-id: 3a6a705437553ba319e9fd4b7f676ff73857a27e
2018-09-11 20:27:07 -07:00
Tongzhou Wang
de460c7ad3 Improvements on conv/pool/fold/stft/ParamDict docs (#11106)
Summary:
Also fixes some incorrect formula rendering.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11106

Differential Revision: D9752433

Pulled By: SsnL

fbshipit-source-id: 535fc8498638e8b645757fc7535d8771992b7d21
2018-09-11 08:56:21 -07:00
Tongzhou Wang
c6a923f486
Support modules that output scalar in Gather (and data parallel) (#7973)
* Support modules that output scalar in Gather (and data parallel)

* Improve warning msg
2018-06-01 16:20:39 -04:00
Isaac Ge
537cb10525 improve DataParallel/DistributedDataParallel docs (#7407) 2018-05-09 10:30:42 +02:00
Tongzhou Wang
1c01eabd3c
Codemod to update our codebase to 0.4 standard (#6641)
* Codemod to update our codebase to 0.4 standard

* Update some of the test scripts

* remove Variable in test_clip_grad_value

* fix _symbolic_override_wrapper_maker
2018-04-17 22:06:54 -04:00
Tongzhou Wang
6b7ec95abb Link relevant FAQ section in DataLoader docs (#6476)
* Link FAQ section on workers returning same random numbers in DataLoader docs

* explicitly mention section names
2018-04-11 13:41:46 -04:00
Tongzhou Wang
4d15442ebc
Add total_length option to pad_packed_sequence (#6327)
* add total_length to pad_packed_sequence; add example on how to use pack->rnn->unpack with DP

* address comments

* fix typo
2018-04-08 20:25:48 -04:00
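A hedged sketch of the pack → RNN → unpack pattern this commit documents: passing `total_length` makes every `DataParallel` replica unpack to the full padded length of the batch, so gather sees equally sized outputs. The model and shapes are illustrative.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

class Encoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.rnn = nn.GRU(4, 8, batch_first=True)

    def forward(self, x, lengths):
        total_length = x.size(1)   # padded length of the whole batch, not just this shard
        packed = pack_padded_sequence(x, lengths, batch_first=True)
        out, _ = self.rnn(packed)
        # Without total_length, each replica would only pad to its own longest sequence,
        # and gathering the per-replica outputs could fail on mismatched time dimensions.
        out, _ = pad_packed_sequence(out, batch_first=True, total_length=total_length)
        return out

x = torch.randn(6, 10, 4)
lengths = [10, 9, 7, 5, 3, 2]          # sorted, as pack_padded_sequence expects by default
print(Encoder()(x, lengths).shape)     # torch.Size([6, 10, 8])
```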
Carl Lemaire
6b95ca4eda DataParallel: GPU imbalance warning (#5376) 2018-02-27 21:30:41 +01:00
Kaiyu Shi
10fd272b7a Update doc of batch size requirements for DP (#5108)
* Update doc of batch size requirements for DP 

Fix #5039

* Delete the recommendation for batch size

There's no significant speed difference between divisible and indivisible batch sizes.
2018-02-26 00:55:08 -05:00
Richard Zou
cac3026b35 Fix typo in DataParallel docs (#5268) 2018-02-15 23:02:26 +01:00
Tongzhou Wang
805639906a Broadcast output requires_grad only if corresponding input requires_grad (#5061) 2018-02-05 23:38:35 -05:00
Nintorac
2e42272cc1 Make DataParallel a no-op when CUDA not available (#3318) 2017-10-29 13:47:36 +01:00
SsnL
de1f4e69dd raw text (#3327) 2017-10-28 01:24:02 +05:30
Adam Paszke
421607a935 DataParallel device_ids slicing fixes (#2200) 2017-07-26 01:54:38 +05:30
Adam Paszke
dc17fb68e4 Fix minor bug in parallel_apply (#2193) 2017-07-25 03:45:00 +05:30
Adam Paszke
4af40e3471 Let parallel_apply accept arbitrary inputs 2017-07-20 01:45:57 -04:00
Adam Paszke
12813b88f6 Add DistributedDataParallel 2017-06-12 22:00:22 -04:00
Soumith Chintala
e7f5220dfa device_ids can be None again in data_parallel (#1187) 2017-04-06 10:30:53 -04:00
Sam Gross
e50a1f19b3 Use streams in scatter to overlap copy with compute 2017-03-14 22:46:07 +01:00
Soumith Chintala
60736bdf99 fix corner case in kwargs for DataParallel (#930) 2017-03-05 14:27:52 -05:00
Christian Sarofeen
b1ae7f90d5 Added functionality for data parallel table (#843) 2017-03-05 02:35:46 +01:00
Eli Stevens
88275da5e8 CUDA documentation tweaks (#858) 2017-02-26 20:37:43 +01:00
Eli Stevens
b87c113cf4 CUDA documentation enhancement and docs versioning (#848)
* Add more detail to CUDA documentation

Also adds better cross-linking to the pages that discuss relevant topics.

* Adds recommendation to torch.save docs

* Make the version numbers for the docs dynamic

Might need tweaks for beta, 1.0, etc.
2017-02-26 08:33:26 -05:00
Adam Paszke
876202503f Support multiple inputs in data parallel 2017-02-20 23:28:31 -08:00
Natalia Gimelshein
7c44506441 allow DataParallel to have tuple inputs on a single GPU 2017-02-16 19:07:17 +01:00
Adam Paszke
d6fa3b3fd5 Deprecate nn.Container in favor of nn.Module 2017-01-16 19:07:37 -05:00
Sam Gross
ea728e7c5e Add DataParallel container (#268)
Adds a container version of the `data_parallel` function. This is a
drop-in replacement for the DataParallel class in the ImageNet example.
2016-11-29 16:36:01 -05:00
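A hedged sketch contrasting the functional `data_parallel` call with the `DataParallel` container added here, guarded so the parallel path only runs when CUDA is available. The model and shapes are placeholders.

```python
import torch
import torch.nn as nn

model, x = nn.Linear(10, 5), torch.randn(16, 10)

if torch.cuda.is_available():
    model, x = model.cuda(), x.cuda()

    # Functional form: replicate / scatter / apply / gather happen inside one call.
    out_fn = nn.parallel.data_parallel(model, x)

    # Container form: wrap once, then use it like any other module (the drop-in replacement).
    dp = nn.DataParallel(model)
    out_container = dp(x)

    assert out_fn.shape == out_container.shape == (16, 5)
```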