Commit Graph

13 Commits

Michael Carilli
0f0271e255 [RELAND2] Eager autocasting, out-of-place ops only (with MSVC 2017 fix) (#35102)
Summary:
This is the second reland attempt for https://github.com/pytorch/pytorch/pull/32140.

The first reland attempt https://github.com/pytorch/pytorch/pull/35011 failed due to a [small incompatible change](https://github.com/pytorch/pytorch/pull/35011#issuecomment-601754216) in recent master (`skipIfRocm` was removed from `test_data_parallel.py`).

The present PR restores `skipIfRocm`.

Description from first reland attempt https://github.com/pytorch/pytorch/pull/35011:

> https://github.com/pytorch/pytorch/pull/32140 was approved and merged, but [reverted](d0577e19f0) because it broke builds with versions of Visual Studio older than 15.8 that were not represented in public CI.  The build failures were caused by a [known VS bug](https://developercommunity.visualstudio.com/content/problem/27729/allow-function-with-internal-linkage-as-template-n.html), fixed in versions 15.8 and newer.
>
> The present PR reverts the revert (restoring https://github.com/pytorch/pytorch/pull/32140 's diffs) and adds a workaround to enable compilation with VS < 15.8.  The workaround isn't pretty, but it's guarded by macros such that it's only used when compiling with VS < 15.8.  All other builds compile with the same code/control flow as was merged in https://github.com/pytorch/pytorch/pull/32140.
>
> Original description of https://github.com/pytorch/pytorch/pull/32140:
> > Initial integration of eager autocasting, supporting out-of-place ops only for easier review.
> > Relevant issue/RFC: https://github.com/pytorch/pytorch/issues/25081
>
> > In-place ops and ops with user-supplied `out=...` can certainly be supported as well (my initial WIP https://github.com/pytorch/pytorch/issues/29552 handled many) but require substantially more complex special casing in the autocasting backend and tests. Support for these ops (much of which has already been written) will be broken into later PRs.
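For concreteness, a minimal sketch of the out-of-place vs. in-place/`out=` distinction described above (illustrative only; it assumes a CUDA device and uses `torch.cuda.amp.autocast` as the user-facing entry point, the name this feature eventually shipped under):

```python
import torch

# Illustrative sketch of the scope described above; not part of this commit.
a = torch.randn(4, 4, device="cuda")
b = torch.randn(4, 4, device="cuda")
c = torch.empty(4, 4, device="cuda")

with torch.cuda.amp.autocast():
    y = torch.mm(a, b)        # out-of-place op: eligible for autocasting in this PR
    a.mul_(2.0)               # in-place op: deferred to later PRs
    torch.mm(a, b, out=c)     # user-supplied out=: deferred to later PRs
```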
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35102

Differential Revision: D20596918

Pulled By: ezyang

fbshipit-source-id: 60caa279bb0ce4a9bb0b28c1d585d42cf1cc7e50
2020-03-24 09:08:04 -07:00
Mike Ruberry
fe276d541e Revert D20541921: [pytorch][PR] [RELAND] Eager autocasting, out-of-place ops only (with MSVC 2017 fix)
Test Plan: revert-hammer

Differential Revision: D20541921

Original commit changeset: abb5488dca86

fbshipit-source-id: d2c6038978f80e5429632f8b49107090a8a247f4
2020-03-19 22:39:12 -07:00
Michael Carilli
991b97277a [RELAND] Eager autocasting, out-of-place ops only (with MSVC 2017 fix) (#35011)
Summary:
https://github.com/pytorch/pytorch/pull/32140 was approved and merged, but [reverted](d0577e19f0) because it broke builds with versions of Visual Studio older than 15.8 that were not represented in public CI.  The build failures were caused by a [known VS bug](https://developercommunity.visualstudio.com/content/problem/27729/allow-function-with-internal-linkage-as-template-n.html), fixed in versions 15.8 and newer.

The present PR reverts the revert (restoring https://github.com/pytorch/pytorch/pull/32140 's diffs) and adds a workaround to enable compilation with VS < 15.8.  The workaround isn't pretty, but it's guarded by macros such that it's only used when compiling with VS < 15.8.  All other builds compile with the same code/control flow as was merged in https://github.com/pytorch/pytorch/pull/32140.

Original description of https://github.com/pytorch/pytorch/pull/32140:
> Initial integration of eager autocasting, supporting out-of-place ops only for easier review.
> Relevant issue/RFC: https://github.com/pytorch/pytorch/issues/25081

> In-place ops and ops with user-supplied `out=...` can certainly be supported as well (my initial WIP https://github.com/pytorch/pytorch/issues/29552 handled many) but require substantially more complex special casing in the autocasting backend and tests. Support for these ops (much of which has already been written) will be broken into later PRs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35011

Differential Revision: D20541921

Pulled By: ezyang

fbshipit-source-id: abb5488dca8620b0daac4306ebf2bb47fc36e4f5
2020-03-19 20:18:18 -07:00
Jeff Daily
35d9874a35 in test_data_parallel.py, remove skipIfRocm from tests that pass (#34978)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34978
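For context, a minimal sketch of the decorator pattern being removed from tests that now pass (the import path is an assumption about where the helper lives in PyTorch's shared test utilities):

```python
import unittest

# Import path is an assumption; skipIfRocm is a helper in PyTorch's shared
# test utilities.
from torch.testing._internal.common_utils import skipIfRocm


class TestDataParallel(unittest.TestCase):
    @skipIfRocm  # previously skipped this test on ROCm builds; dropped for tests that now pass
    def test_data_parallel_example(self):
        self.assertTrue(True)


if __name__ == "__main__":
    unittest.main()
```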

Differential Revision: D20535920

Pulled By: mrshenli

fbshipit-source-id: 3baa8608dd3b0dd5578bc32e56a2e6c1fe69492d
2020-03-19 09:16:43 -07:00
Edward Yang
d0577e19f0 Revert D20346700: [pytorch][PR] Eager autocasting, out-of-place ops only
Test Plan: revert-hammer

Differential Revision: D20346700

Original commit changeset: 12d77b391731

fbshipit-source-id: 108d72bf24232f443c0be293ec932c0c478d6a60
2020-03-18 11:42:51 -07:00
Michael Carilli
aaa8f02156 Eager autocasting, out-of-place ops only (#32140)
Summary:
Initial integration of eager autocasting, supporting out-of-place ops only for easier review.
Relevant issue/RFC: https://github.com/pytorch/pytorch/issues/25081

In-place ops and ops with user-supplied `out=...` can certainly be supported as well (my initial WIP https://github.com/pytorch/pytorch/pull/29552 handled many) but require substantially more complex special casing in the autocasting backend and tests.  Support for these ops (much of which has already been written) will be broken into later PRs.
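A minimal usage sketch of the feature being integrated (illustrative only; it assumes a CUDA device and the `torch.cuda.amp.autocast` context manager that this work eventually shipped as):

```python
import torch

# Illustrative sketch: run the forward pass under autocast so eligible
# out-of-place ops (matmuls, convolutions, ...) execute in float16 while
# precision-sensitive ops stay in float32.
model = torch.nn.Linear(16, 4).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
data = torch.randn(8, 16, device="cuda")
target = torch.randn(8, 4, device="cuda")

with torch.cuda.amp.autocast():
    output = model(data)
    loss = torch.nn.functional.mse_loss(output, target)

loss.backward()   # backward runs outside the autocast region
optimizer.step()
```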
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32140

Differential Revision: D20346700

Pulled By: ezyang

fbshipit-source-id: 12d77b3917310186fbddf11c59b2794dc859131f
2020-03-18 10:28:21 -07:00
Natalia Gimelshein
cd9d9a2235 fix handling of replica parameters in DataParallel (#33907)
Summary:
In DataParallel, replica parameters are not leaves (they are computed via broadcast from the master parameters) and should be treated as non-leaf tensors. Fixes https://github.com/pytorch/pytorch/issues/33552
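A minimal sketch of the distinction (illustrative only; assumes at least two CUDA devices and reflects the behavior with this fix in place):

```python
import torch
import torch.nn as nn

# replicate() broadcasts the master parameters onto each device, so a replica's
# "parameters" are outputs of an autograd op rather than leaf tensors.
module = nn.Linear(4, 4).cuda(0)
replicas = nn.parallel.replicate(module, [0, 1])

print(module.weight.is_leaf)       # True: the master parameter is a leaf
print(replicas[1].weight.is_leaf)  # False: a broadcast result, not a leaf
```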
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33907

Differential Revision: D20150199

Pulled By: ngimel

fbshipit-source-id: 5965d4115b6b3a8433063126ff6269567872fbeb
2020-03-10 10:35:44 -07:00
iurii zdebskyi
908eee5583 remove .data from test/distributed/ (#33874)
Summary:
`.data` calls are unsafe (mutations through `.data` bypass autograd's version tracking and can silently produce incorrect gradients) and should not be used.
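A short sketch of the kind of rewrite this implies (variable names are illustrative):

```python
import torch

p = torch.nn.Parameter(torch.randn(3))

# Before: mutating through .data hides the change from autograd's version
# tracking and can silently produce incorrect gradients.
# p.data.add_(1.0)

# After: state the intent explicitly.
with torch.no_grad():
    p.add_(1.0)          # mutation that autograd is deliberately told to ignore

snapshot = p.detach()    # read-only view that is explicitly cut from the graph
```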
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33874

Differential Revision: D20141059

Pulled By: izdeby

fbshipit-source-id: 8e11afc74f0cb04f5b18b458068fb813a6d51708
2020-02-27 13:14:29 -08:00
Pritam Damania
fd684cc312 Use torch.set_default_dtype in test_data_parallel and rename dtype2prec (#32962)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32962

As per gchanan's comments on https://github.com/pytorch/pytorch/pull/30445, I've used `torch.set_default_dtype` in test_data_parallel instead of specifying dtype=torch.double everywhere. Also renamed `dtype2prec` to `dtype2prec_DONTUSE`.
ghstack-source-id: 98388429
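A minimal sketch of the pattern (test names are illustrative):

```python
import unittest
import torch

# Illustrative sketch: switch the default dtype for the duration of the test
# class instead of passing dtype=torch.double to every tensor factory call.
class TestDataParallelDouble(unittest.TestCase):
    def setUp(self):
        self._prev_dtype = torch.get_default_dtype()
        torch.set_default_dtype(torch.double)

    def tearDown(self):
        torch.set_default_dtype(self._prev_dtype)

    def test_factories_default_to_double(self):
        self.assertEqual(torch.randn(2, 2).dtype, torch.float64)


if __name__ == "__main__":
    unittest.main()
```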

Test Plan: waitforbuildbot

Differential Revision: D19714374

fbshipit-source-id: eb55bbca33881625636ba9ea6dd4cb692f25668e
2020-02-15 14:07:54 -08:00
Peter Bell
efba630287 Issue a warning when zero_grad is used in DataParallel (#33064)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/31768; this is a second attempt at https://github.com/pytorch/pytorch/issues/32870

DataParallel creates replicas of the original `nn.Module` with the parameters duplicated onto the destination devices. Calling `backward` propagates gradients onto the original module's parameters, but calling `zero_grad` on the replica module doesn't clear the gradients from the parent module. However, any replica calling `backward` was broken anyway, since the replica's parameters are not leaf nodes in autograd. So, we should issue a warning.
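A minimal sketch of the pattern the warning targets (illustrative only; assumes at least two CUDA devices):

```python
import torch
import torch.nn as nn

# The replica's parameters are copies of the originals, so zero_grad() called
# on a replica never clears the gradients held by the parent module. After this
# PR, calling zero_grad() from inside a replicated forward issues a warning.
class SelfClearingModule(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 1)

    def forward(self, x):
        self.zero_grad()  # has no effect on the parent module's grads when replicated
        return self.linear(x)

model = nn.DataParallel(SelfClearingModule().cuda(), device_ids=[0, 1])
out = model(torch.randn(8, 4, device="cuda"))  # triggers the new warning
```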
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33064

Differential Revision: D19790178

Pulled By: albanD

fbshipit-source-id: 886f36640acef4834a6fa57a26ce16b42ff0e9ad
2020-02-10 07:04:27 -08:00
Richard Zou
8195961f20 Revert D19730209: [pytorch][PR] Issue a warning when using zero_grad in DataParallel
Test Plan: revert-hammer

Differential Revision: D19730209

Original commit changeset: cb9b2cb0c2e0

fbshipit-source-id: 5bf53ea3c37a7ed2411a2acc34e40d07eff144c9
2020-02-06 07:05:51 -08:00
Peter Bell
46c3c18bcc Issue a warning when using zero_grad in DataParallel (#32870)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/31768

`DataParallel` creates replicas of the original `nn.Module` with the parameters duplicated onto the destination devices. Calling `backward` propagates gradients onto the original module's parameters, but calling `zero_grad` on the replica module doesn't clear the gradients from the parent module,

~~breaking any model that uses `backward`-`zero_grad` in its `forward`. I fix this by patching the replica module so that `zero_grad` clears grads on the parent as well.~~

However, any replica calling `backward` was broken anyway, since the replica's parameters are not leaf nodes in autograd. So, we should raise a warning.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32870

Differential Revision: D19730209

Pulled By: ezyang

fbshipit-source-id: cb9b2cb0c2e0aca688ce0ff3e56b40fbd2aa3c66
2020-02-05 20:25:04 -08:00
Pritam Damania
f050b16dd9 Move pytorch distributed tests to separate folder for contbuild. (#30445)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30445

Create distributed and rpc directories under caffe/test for better management
of unit tests.

Differential Revision: D18702786

fbshipit-source-id: e9daeed0cfb846ef68806f6decfcb57c0e0e3606
2020-01-22 21:16:59 -08:00