Commit Graph

1070 Commits

Author SHA1 Message Date
Jessica Lin
8acfecaecb
[1.6] Add optimizer_for_mobile doc into python api root doc (#41491)
* Add optimizer_for_mobile doc into python api root doc

* Apply suggestions from code review

Remove all references to `optimization_blacklist` as it's missing in 1.6

Co-authored-by: Nikita Shulga <nshulga@fb.com>
2020-07-22 17:37:45 -07:00
anjali411
8f804baaa9 Doc note for complex (#41252)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/41252

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D22553266

Pulled By: anjali411

fbshipit-source-id: f6dc409da048496d72b29b0976dfd3dd6645bc4d
2020-07-22 14:49:51 -07:00
anjali411
a395e0903e Autograd Doc for Complex Numbers (#41012)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/41012

Test Plan: Imported from OSS

Differential Revision: D22476911

Pulled By: anjali411

fbshipit-source-id: 7da20cb4312a0465272bebe053520d9911475828
2020-07-22 14:40:52 -07:00
Edward Z. Yang
2ca55430d2
Add reference documentation for torch/library.h (#41470) (#41602)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41470

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D22577426

Pulled By: ezyang

fbshipit-source-id: 4bfe5806061e74181a74d161c868acb7c1ecd1e4
2020-07-22 11:10:16 -07:00
Luca Wehrstedt
d9e9e0087a
[v1.6] [RPC docs] Remove mention of TensorPipe's SHM and CMA backends as they're not built (#41229)
Summary:
In short, we messed up. The SHM and CMA backends of TensorPipe are Linux-specific and thus they are guarded by an #ifdef in the agent's code. Due to a mishap with CMake (due to the fact that TensorPipe has two CMake files, one for PyTorch and a "standalone" one) we were not correctly propagating some flags, and these #ifdefs were always false. This means that these two backends have always been disabled and have thus never been covered by our OSS CI. It would be irresponsible to enable them now in v1.6, so instead we remove any mention of them from the docs.

Note that this is perhaps not as bad as it sounds. These two backends were providing higher performance (latency) when the two endpoints were on the same machine. However, I suspect that most RPC users will only do transfers across machines, for which SHM and CMA wouldn't have played any role.

Original PR against master: #41200 (merged as dde3d5f4a8)

Test Plan: Docs only
2020-07-10 09:02:08 -07:00
mrshenli
59bb44a8e8
Add a link in RPC doc page to point to PT Distributed overview (#41108) (#41156)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/41108

Test Plan: Imported from OSS

Differential Revision: D22440751

Pulled By: mrshenli

fbshipit-source-id: 9e7b002091a3161ae385fdfcc26484ae8fc243bb
2020-07-09 07:49:10 -07:00
mcarilli
eaf3f2fd34
Added index_put to promotelist (#41036)
* Added index_put to promotelist

* docstring

Co-authored-by: Michael Carilli <mcarilli@nvidia.com>
2020-07-07 13:00:32 -07:00
Wanchao
31d9776c04
[1.6] fix autograd doc subsubsection display issue (#40796)
Master branch PR: https://github.com/pytorch/pytorch/pull/40582
2020-07-01 09:28:25 -07:00
eellison
8682ac147b
Docs merge (#40569)
Co-authored-by: Elias Ellison <eellison@fb.com>
2020-06-26 12:24:08 -07:00
Jessica Lin
4cc605e80a
(1.6) Update docs feature classifications (#40539)
Co-authored-by: Eli Uriegas <1700823+seemethere@users.noreply.github.com>
2020-06-26 12:23:02 -07:00
Jessica Lin
b0cce716f7
Add beta warning for quant docs (#40540)
Add a beta warning to match stable and master docs: https://github.com/pytorch/pytorch/blob/master/docs/source/quantization.rst
2020-06-26 12:20:06 -07:00
mrshenli
0dc93ac119
[v1.6.0 patch] Install method docstrings from PyRRef to RRef (#40620)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40461

It turned out `:inherited-members:` (see [doc](https://www.sphinx-doc.org/en/master/usage/extensions/autodoc.html#directive-autoclass)) is not really usable.

This is because pybind11 generates docstrings that annotate `self` with the parent-class type, `rpc.PyRRef`.

As a workaround, I am pulling the docstrings from the parent class, `PyRRef`, into the subclass, `RRef`, and doing surgery on the docstrings generated by pybind11.
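
A generic sketch of that workaround (a hypothetical helper, not the actual PyTorch code): copy each method docstring from the pybind11-generated parent class onto the Python subclass, rewriting the parent's name so rendered signatures show the user-facing class.

```python
# Hypothetical illustration of the docstring "surgery" described above;
# PyRRef/RRef stand in for the pybind11 parent class and its Python subclass.
def _install_docstrings(parent, child, old="PyRRef", new="RRef"):
    for name, child_attr in vars(child).items():
        parent_attr = getattr(parent, name, None)
        doc = getattr(parent_attr, "__doc__", None)
        if doc and callable(child_attr):
            # Rewrite mentions of the parent class so the rendered signatures
            # refer to the public subclass instead.
            child_attr.__doc__ = doc.replace(old, new)
```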

ghstack-source-id: 106472496

P134031188

Differential Revision: D7933834

fbshipit-source-id: c03a8a4c9d98888b64492a8caba1591595bfe247

Co-authored-by: Shihao Xu <shihaoxu@fb.com>
2020-06-26 12:15:28 -07:00
Jessica Lin
bb848df10b
[1.6] Remove table of contents at the top of rpc.rst (#40482)
Master PR: https://github.com/pytorch/pytorch/pull/40205

Remove the table of contents created by the `.. contents:: :local: :depth: 2` since this page isn't one of the large documentation pages (https://github.com/pytorch/pytorch/issues/38010) and is simply a landing page for the Distributed RPC Framework.

Changes made in this original PR: f10fbcc820 (diff-250b9b23fd6f1a5c15aecdb72afb9d7d)
2020-06-26 08:37:49 -07:00
Michael Carilli
3b040c478a Make custom_fwd a no-op when not executed under autocast (#36171)
Summary:
Currently, a custom autograd function written with
```
@torch.cuda.amp.custom_fwd(cast_inputs=dtype)
def forward(ctx, *args):
    ...
```
casts incoming floating-point CUDA tensors to `dtype` unconditionally, regardless of whether the function executes in an autocast-enabled region.  I think I had the wrong idea there.  Autocast-disabled regions should give the user control of input types.  Also, `custom_fwd(cast_inputs=dtype)`-decorated functions' behavior should align with native fp32list/fp16list functions.  C++-side casting wrappers have no effect when autocast is disabled, and  `custom_fwd`'s casting should behave the same way.

The present PR changes `custom_fwd` so it only casts in autocast-enabled regions (also updates custom_fwd to ignore fp64 inputs, like the C++ wrappers).
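
A minimal sketch of the resulting behavior (assuming a CUDA device is available; the function and shapes are placeholders):

```python
import torch

class MyMM(torch.autograd.Function):
    @staticmethod
    @torch.cuda.amp.custom_fwd(cast_inputs=torch.float16)
    def forward(ctx, a, b):
        ctx.save_for_backward(a, b)
        return a.mm(b)

    @staticmethod
    @torch.cuda.amp.custom_bwd
    def backward(ctx, grad):
        a, b = ctx.saved_tensors
        return grad.mm(b.t()), a.t().mm(grad)

a = torch.randn(8, 8, device="cuda", requires_grad=True)
b = torch.randn(8, 8, device="cuda", requires_grad=True)

# Inside an autocast-enabled region, custom_fwd casts the fp32 inputs to fp16.
with torch.cuda.amp.autocast():
    out = MyMM.apply(a, b)

# Outside autocast, custom_fwd is now a no-op and the inputs keep their dtypes.
out_fp32 = MyMM.apply(a, b)
```
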
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36171

Differential Revision: D22179511

Pulled By: ngimel

fbshipit-source-id: 5a93d070179a43206066bce19da0a5a19ecaabbd
2020-06-23 10:23:02 -07:00
Vasiliy Kuznetsov
9bf255573f quant docs: add and clean up ELU (#40377)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40377

Cleans up the docstring for quantized ELU and adds it to the quantization docs.

Test Plan: * build on Mac OS and inspect

Differential Revision: D22162834

Pulled By: vkuzo

fbshipit-source-id: e548fd4dc8d67db27ed19cac4dbdf2a942586759
2020-06-23 09:02:43 -07:00
Vasiliy Kuznetsov
d71ec51c0e quant docs: add and clean up BatchNorm{n}d (#40346)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40346

Cleans up docstrings for quantized BatchNorm and adds to quantization docs

Test Plan: * build on Mac OS and inspect

Differential Revision: D22152633

Pulled By: vkuzo

fbshipit-source-id: e0bf02194158231e0205b5b2df7f6f1ffc3c4d65
2020-06-23 09:02:41 -07:00
Vasiliy Kuznetsov
5e683517a7 quant docs: add and clean up InstanceNorm{n}d (#40345)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40345

Fixes docstrings and adds to quantization docs for quantized InstanceNorm.

Test Plan: * build on Mac OS and inspect

Differential Revision: D22152637

Pulled By: vkuzo

fbshipit-source-id: 7a485311ead20796b7a0944827d1d04e14ec8dcd
2020-06-23 09:02:39 -07:00
Vasiliy Kuznetsov
6e3fdd77ca quant docs: add and clean up GroupNorm (#40343)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40343

Cleans up the quantized GroupNorm docstring and adds it to quantization docs.

Test Plan: * build on Mac OS and inspect

Differential Revision: D22152635

Pulled By: vkuzo

fbshipit-source-id: 5553b841c7a5d77f1467f0c40657db9e5d730a12
2020-06-23 09:02:36 -07:00
Vasiliy Kuznetsov
d15fcc7e49 quant docs: add and clean up LayerNorm (#40342)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40342

Cleans up the docstrings for quantized LayerNorm, and adds it to the docs.

Test Plan: * build on Mac OS and inspect

Differential Revision: D22152639

Pulled By: vkuzo

fbshipit-source-id: 38adf14b34675d1983ac4ed751938aa396e5400b
2020-06-23 09:02:34 -07:00
Vasiliy Kuznetsov
d27f8eaf92 quant docs: add and clean up hardtanh (#40341)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40341

Cleans up the hardtanh docstring and adds it to quantization docs.

Test Plan: * build and inspect on Mac OS

Differential Revision: D22152636

Pulled By: vkuzo

fbshipit-source-id: c98e635199c8be332aa6958664ff23faad834908
2020-06-23 09:02:32 -07:00
Vasiliy Kuznetsov
8e74fb6a0c quant docs: add and clean up hardsigmoid (#40340)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40340

Adds and simplifies quantization docs for hardsigmoid

Test Plan:
* build docs on Mac OS
* inspect

Differential Revision: D22152634

Pulled By: vkuzo

fbshipit-source-id: 18da273023fb00e5f0bc1e881b00536492c606d3
2020-06-23 09:02:29 -07:00
Vasiliy Kuznetsov
c4594a97ae quant docs: clean up hardswish (#40323)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40323

Cleans up the naming and the function param docs for quantized hardswish.
Remove redundant docstrings and link to floating point modules instead.

Test Plan:
* build the docs on Mac OS
* verify that every link works as expected

Differential Revision: D22152638

Pulled By: vkuzo

fbshipit-source-id: fef04874ae460b449c677424a6a1c6dd47054795
2020-06-23 08:59:34 -07:00
Michael Carilli
8066fba226 [RELAND2] Change AccumulateGrad to yield .grads that match weights' memory layout (#40358)
Summary:
https://github.com/pytorch/pytorch/pull/40129 fixed the error responsible for the first revert, but exposed another error in the same test.

This PR is intended as the "master copy" for merge, and it runs on full CI.
Two other PRs (restricted to run on a small subset of CI) support debugging DDP failures/hangs with multiple devices per process (`test_c10d.py:DistributedDataParallelTest.test_grad_layout_1devicemodule_2replicaperprocess`).
- https://github.com/pytorch/pytorch/pull/40290 tries the test with purely rowmajor contiguous params on an untouched master.  In other words https://github.com/pytorch/pytorch/pull/40290 contains none of this PR's diffs aside from the test itself.
- https://github.com/pytorch/pytorch/pull/40178, for comparison, tries the test with this PR's diffs.

Both fail the same way, indicating failure is unrelated to this PR's other diffs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40358

Differential Revision: D22165785

Pulled By: albanD

fbshipit-source-id: ac7cdd79af5c080ab74341671392dca8e717554e
2020-06-22 17:13:21 -07:00
Rohan Varma
ae2f1f0372 [DDP Note] Remove refs to RoundRobin PG until we officially support it (#40380)
Summary:
Removes the line mentioning `ProcessGroupRoundRobin` since we don't intend it to be used as a public API just yet. We can add this back when we officially support the API.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40380

Differential Revision: D22165556

Pulled By: rohan-varma

fbshipit-source-id: 24d0477d881dc74f2ff579de61dfd1ced2b09e75
2020-06-22 16:19:29 -07:00
anjali411
8ec2ae9a9f Add view_as_real, view_as_complex for complex tensors (#39099)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/39099
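
A quick usage sketch of the two new views (values are illustrative):

```python
import torch

c = torch.tensor([1 + 2j, 3 - 4j])   # complex tensor
r = torch.view_as_real(c)            # shape (2, 2); last dim holds (real, imag)
c2 = torch.view_as_complex(r)        # views the same memory back as complex
```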

Test Plan: Imported from OSS

Differential Revision: D22057886

Pulled By: anjali411

fbshipit-source-id: bad5ba7097ba0dd13f2c549b2463094dee9afa14
2020-06-22 15:15:27 -07:00
Edward Yang
e4766fb4d9 Meta tensors, but without code deduplication (#38490)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38490

A meta tensor is a tensor that is a lot like a normal tensor,
except it doesn't actually have any data associated with it.
You can use them to carry out shape/dtype computations without
actually having to run the actual code; for example, this could
be used to do shape inference in a JIT analysis pass.
Check out the description in DispatchKey.h for more information.

Meta tensors are part of a larger project to rationalize how we
write kernels so that we don't have to duplicate shape logic
in CPU kernel, CUDA kernel and meta kernel (this PR makes the
duplication problem worse!)  However, that infrastructure can
be built on top of this proof of concept, which just shows how
you can start writing meta kernels today even without this
infrastructure.

There are a lot of things that don't work:
- I special cased printing for dense tensors only; if you try to
  allocate a meta sparse / quantized tensor things aren't going
  to work.
- The printing formula implies that torch.tensor() can take an
  ellipsis, but I didn't add this.
- I wrote an example formula for binary operators, but it isn't
  even right!  (It doesn't do type promotion or memory layout
  correctly).  The most future proof way to do it right is to
  factor out the relevant computation out of TensorIterator,
  as it is quite involved.
- Nothing besides torch.add works right now
- Meta functions are ALWAYS included in mobile builds (selective
  build doesn't work on them).  This isn't a big deal for now
  but will become more pressing as more meta functions are added.

One reason I'm putting up this PR now is to check with Yinghai Lu
if we can unblock shape inference for accelerators, while we are
still working on a long term plan for how to unify all shape
computation across our kernels.
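
A small sketch of what this enables, assuming the `meta` device string introduced here:

```python
import torch

# Meta tensors carry shape/dtype but no storage, so only shape inference runs.
x = torch.empty(4, 8, device="meta")
y = torch.empty(4, 8, device="meta")
z = torch.add(x, y)                # per the notes above, add is the op wired up so far
print(z.shape, z.dtype, z.device)  # inferred metadata; no data is ever allocated
```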

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D21935609

Pulled By: ezyang

fbshipit-source-id: f7d8636eeb8516b6bc296db99a16e56029972eee
2020-06-22 09:18:33 -07:00
Jerry Zhang
59ca1d31ca [quant][graphmode] docstrings for top level APIs (#40328)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40328

Test Plan: Imported from OSS

Differential Revision: D22149708

fbshipit-source-id: 63a1cd229d9e4668fba0ef3977e894cb8984318b
2020-06-19 22:20:23 -07:00
Mike Ruberry
4f761f325c Back out "[pytorch][PR] Removes dunder div"
Summary: NVIDIA's Apex is updating to no longer rely on this behavior, but we're reverting this Python2->Python3 update to unblock internal apex users.

Test Plan: Sandcastle + OSS CI.

Reviewed By: ngimel

Differential Revision: D22146782

fbshipit-source-id: f9483d2cbf9dc3a469ad48a6c863edea3ae51070
2020-06-19 18:31:20 -07:00
Shen Li
3ca05500fa Improve RPC documents (#40296)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40296

1. Added a link to parameter server tutorial
2. Explained the current state of TorchScript support

Test Plan: Imported from OSS

Differential Revision: D22142647

Pulled By: mrshenli

fbshipit-source-id: ffd697dd64a3aa874cf3f3488122ed805903370d
2020-06-19 15:34:49 -07:00
James Reed
c73095e78f Add note to serialization docs about zipfile format (#40288)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40288

Test Plan: Imported from OSS

Differential Revision: D22140324

Pulled By: jamesr66a

fbshipit-source-id: 01d7aa642ed2f4e4bdac4b7f3223bf4d7e62fd4d
2020-06-19 13:40:08 -07:00
Negin Raoof
73a156e81f [ONNX] Update pytorch/onnx docs for new export API args (#39802)
Summary:
Update pytorch/onnx docs for new export API args:
Use external data format and Training args.
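
A hedged sketch of the two new args (the model, output path, and shapes are placeholders):

```python
import torch

model = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3), torch.nn.ReLU())
dummy = torch.randn(1, 3, 32, 32)

torch.onnx.export(
    model, dummy, "model.onnx",
    training=torch.onnx.TrainingMode.TRAINING,  # export with training-mode behavior
    use_external_data_format=True,              # store large weights as external data files
)
```
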
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39802

Reviewed By: hl475

Differential Revision: D22139664

Pulled By: houseroad

fbshipit-source-id: 7d6dcf75129cb88987f8c37b7d9d48ca594c0f38
2020-06-19 13:38:47 -07:00
Luca Wehrstedt
2393bab036 [TensorPipe] Update documentation (#40222)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40222

Mention the TensorPipe agent in the RPC docs and give users the information they need to choose which agent to use.
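
A hedged sketch of selecting the TensorPipe agent at init time (worker names, ranks, and the init method are placeholders, and a peer worker must exist for the rendezvous to complete):

```python
import torch.distributed.rpc as rpc

rpc.init_rpc(
    "worker0",
    rank=0,
    world_size=2,
    backend=rpc.BackendType.TENSORPIPE,
    rpc_backend_options=rpc.TensorPipeRpcBackendOptions(
        init_method="tcp://localhost:29500",
    ),
)
rpc.shutdown()
```
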
ghstack-source-id: 106225711

Test Plan: Export to GitHub, build locally and try out the docs.

Differential Revision: D22116494

fbshipit-source-id: 30703ba8410c40f64e785f60d71dfd9faa8de4a1
2020-06-19 04:26:49 -07:00
Meghan Lele
d58b8222b7 [JIT] Add support for with statements (#34705)
Summary:
**Summary**
This commit adds support for with statements to PyTorch JIT. Each
of the with items in a with statement is represented in the JIT IR
as a pair of `prim::Enter` and `prim::Exit` nodes that call the
`__enter__` and `__exit__` methods defined on the context manager objects
returned by the expressions in the with item.
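
A rough sketch, modeled loosely on the tests this commit describes (the exact annotations required for scriptability may differ):

```python
from typing import Any

import torch

@torch.jit.script
class Counter(object):
    def __init__(self):
        self.count = torch.zeros(1)

    def __enter__(self):
        self.count += 1
        return self.count

    def __exit__(self, type: Any, value: Any, tb: Any):
        self.count -= 1

@torch.jit.script
def use_counter(x: torch.Tensor) -> torch.Tensor:
    c = Counter()
    with c as n:   # lowered to paired prim::Enter / prim::Exit nodes in the IR
        x = x + n
    return x
```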

**Testing**
This commit adds unit tests for with statements with named with items,
nameless with items, and with statements that encounter exceptions.
```
$ python test/test_jit.py TestWith.test_with_as
Fail to import hypothesis in common_utils, tests are not derandomized
.
----------------------------------------------------------------------
Ran 1 test in 0.430s

OK
```

```
$ python test/test_jit.py TestWith.test_with_no_as
Fail to import hypothesis in common_utils, tests are not derandomized
.
----------------------------------------------------------------------
Ran 1 test in 0.264s

OK
```

```
$ python test/test_jit.py TestWith.test_with_exceptions
Fail to import hypothesis in common_utils, tests are not derandomized
Couldn't download test skip set, leaving all tests enabled...
.
----------------------------------------------------------------------
Ran 1 test in 1.053s

OK
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34705

Differential Revision: D22095945

Pulled By: SplitInfinity

fbshipit-source-id: f661565a834786725259b8ea014b4d7532f9419d
2020-06-18 16:57:18 -07:00
Shihao Xu
f3f30d4354 [JIT x RPC] Consolidate RRef type class and RRef impl class (#35694)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35694

Closes https://github.com/pytorch/pytorch/issues/35110

Differential Revision: D7881729

fbshipit-source-id: eedda8f1b7510491886d469efeed4e002bb8b991
2020-06-18 07:46:38 -07:00
Shen Li
74142f76fa Adding torch.futures to API docs (#40051)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40051
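
For reference, a tiny sketch of the `torch.futures.Future` API these docs cover:

```python
import torch

fut = torch.futures.Future()
chained = fut.then(lambda f: f.wait() + 1)  # runs once the parent future completes
fut.set_result(torch.zeros(2))
print(chained.wait())                       # tensor([1., 1.])
```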

Test Plan: Imported from OSS

Differential Revision: D22055031

Pulled By: mrshenli

fbshipit-source-id: ce8a79ba4ffdc7dbed6d4c62b1c33b96764c89e7
2020-06-17 17:55:48 -07:00
Alban Desmaison
08227fea4f Revert D22079377: [pytorch][PR] [RELAND] Change AccumulateGrad to yield .grads that match weights' memory layout
Test Plan: revert-hammer

Differential Revision:
D22079377

Original commit changeset: 9bd2b7e0c34f

fbshipit-source-id: c22cc349d790caa574eace0d63980854c33e5a59
2020-06-17 10:17:27 -07:00
Michael Carilli
1ec8ece2b9 [RELAND] Change AccumulateGrad to yield .grads that match weights' memory layout (#40129)
Summary:
https://github.com/pytorch/pytorch/pull/34904 was reverted because it had a misconfigured 4 GPU test that for some reason wasn't caught by external CI ([example failure](https://app.circleci.com/pipelines/github/pytorch/pytorch/181719/workflows/cfb37cd9-9a0c-4738-898b-d683934cd308/jobs/5868948/steps)).

This PR reverts the revert, and adds diffs that should repair the misconfigured test.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40129

Differential Revision: D22079377

Pulled By: albanD

fbshipit-source-id: 9bd2b7e0c34fdaf887497b52037cfe82cba709c1
2020-06-17 09:02:54 -07:00
Mike Ruberry
9d588f7ce2 Removes dunder div (#39151)
Summary:
BC-breaking note:

If a user is using one of these dunders directly they will no longer be available. Users should update to Python3-compatible dunders.

Original PR note:

`__div__` (and `__idiv__` and `__rdiv__`) are no longer special dunders in Python3. This PR replaces them with the `__truediv__` (`__itruediv__`, `__rtruediv__`) dunders, since we no longer support Python2.
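
Illustrative only: with the Python 2 dunders gone, tensor division dispatches through the Python 3 dunders.

```python
import torch

x = torch.tensor([2.0, 4.0])
y = x / 2        # x.__truediv__(2)
x /= 2           # x.__itruediv__(2)
z = 2 / x        # x.__rtruediv__(2)
```
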
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39151

Differential Revision: D22075713

Pulled By: mruberry

fbshipit-source-id: d318b47b51f7cc4c3728b1606a34d81e49ba0fa1
2020-06-16 23:02:20 -07:00
Alban Desmaison
f1e575a0bf Revert D20496044: [pytorch][PR] Change AccumulateGrad to yield .grads that match weights' memory layout
Test Plan: revert-hammer

Differential Revision:
D20496044

Original commit changeset: 248d680f4b1b

fbshipit-source-id: 6462b25e3fb9c8596c1da443389089f09c32df4d
2020-06-16 10:38:40 -07:00
mattip
dd581b4512 DOC: fix rpc reference in top-level index (#40077)
Summary:
Fixes gh-40046

PR gh-37419 refactored the content of `docs/source/rpc/index.rst` into `docs/source/rpc.rst` but did not link to the latter from `docs/source/index.rst`, so the top-level RPC documentation is missing from https://pytorch.org/docs/master/.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40077

Differential Revision: D22068128

Pulled By: mrshenli

fbshipit-source-id: 394433f98f86509e0c9cb6d910a86fb8a2932683
2020-06-16 10:26:03 -07:00
Michael Carilli
2beb9690c3 Change AccumulateGrad to yield .grads that match weights' memory layout (#34904)
Summary:
Currently, whether `AccumulateGrad`  [steals](67cb018462/torch/csrc/autograd/functions/accumulate_grad.h (L42)) or [clones](67cb018462/torch/csrc/autograd/functions/accumulate_grad.h (L80)) an incoming gradient, the gradient ends up rowmajor contiguous, regardless of its param's layout.  If the param's layout is channels last, or otherwise not rowmajor contiguous, later kernels that apply gradients to params are forced into an uncoalesced memory access pattern for either the param or the gradient.  This may not sound like a big deal but for any binary op on large tensors it's a >3X increase in gmem traffic => 3X slowdown.

The present PR changes `AccumulateGrad` to prefer, where possible, stashing gradients that match their params' layouts (["Gradient Layout Contract"](https://github.com/pytorch/pytorch/pull/34904/files#diff-ef1a56d24f66b280dcdb401502d6a796R29-R38)).

Allowing `AccumulateGrad` to stash non-rowmajor-contiguous grads means DDP allreduces and DP reduces must allow non-rowmajor-contiguous grads.  This PR extends DDP and DP to allow gradients with non-rowmajor-contiguous strides as long as their layout is nonoverlapping and dense.

For good measure, I include changes that allow all five nccl primitives (allreduce, reduce, broadcast, allgather, reducescatter) to act on non-rowmajor-contiguous tensors (again as long as each input's layout is nonoverlapping and dense, and as long as all tensors participating in a given collective have the same layout).  The primitive comm changes aren't necessary to enable the DDP changes, but I wasn't sure this would end up true until I had written both sets of changes.  I think primitive comm enablement is reasonable to keep in the PR, especially since the code for it is simple.

Channels last params will be a major beneficiary of this PR, but I don't see it as channels-last-specific fix.  The spirit is layout matching in general:
- Grads should be stashed with memory layouts matching their params.
- Src and dst tensors on opposite ends of collectives should have matching dense layouts.

This PR also updates autograd docs to describe potential BC-breaking changes below.

## BC notes
ngimel albanD gchanan

#### BC-breaking
In the common case where the user lets AccumulateGrad decide grad layouts, strides for grads of dense but non-rowmajor-contiguous params will change.  Any user code that was accustomed to `view(-1)`ing these grads will break.

Also, the circumstances under which a grad can be stolen directly from the backward function that created it, as opposed to deep-copied by AccumulateGrad, have changed.  In most cases we expect silent performance improvement, because we expect channels-last-aware backward kernels will create channels last gradients for channels last params.  Now those can be stolen, whereas before this PR they were cloned and made rowmajor contiguous.  IMO this is a mild BC breakage.  Param backward hooks still see grads come in with whatever format the backward kernel gave them.  The only BC breakage potential I see is if user code relies somehow on a grad in a hook having or not having the same deep memory as the eventual `param.grad`.  Any such users hopefully know they're off the edge of the map and understand how to update their expectations.

#### BC escape hatches
At alband's recommendation, this PR's changes to AccumulateGrad do not alter the pre-PR code's decisions about whether grad is accumulated in or out of place.  Accumulations of new grads onto an existing `.grad` attribute were (usually) in-place before this PR and remain in-place after this PR, keeping the existing `.grad`'s layout.  After this PR, if the user wants to force accumulation into a grad with a particular layout, they can preset `param.grad` to a zeroed tensor with the desired strides or call `grad.contiguous(desired format)`.  This likely won't be as performant as letting AccumulateGrad establish grad layouts by cloning or stealing grads with contract-compliant strides, but at least users have a control point.

One limitation (present before this PR and unchanged by this PR):  Presetting `param.grad` does not ensure in-place accumulation all the time.  For example, if `create_graph=True`, or if incoming `new_grad` is dense and existing `variable_grad` is sparse, accumulation occurs out of place, and the out-of-place result may not match the existing grad's strides.
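
A hedged sketch of that escape hatch: preset `.grad` with the desired memory format so in-place accumulation keeps that layout (the module and shapes are placeholders).

```python
import torch

conv = torch.nn.Conv2d(8, 8, 3).to(memory_format=torch.channels_last)
for p in conv.parameters():
    if p.dim() == 4:
        # Force the accumulated gradient to keep channels_last strides.
        p.grad = torch.zeros_like(p, memory_format=torch.channels_last)

x = torch.randn(2, 8, 16, 16).to(memory_format=torch.channels_last)
conv(x).sum().backward()   # accumulates in place into the preset channels_last grads
```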

----------------------------
I also noticed some potential DDP improvements that I considered out of scope but want to mention for visibility:
1. make sure Reducer's ops sync with AccumulateGrad streams
2. ~to reduce CPU overhead and incur fewer kernel launches, lazily create flat `contents` tensors by a single `cat` kernel only when a bucket is full, instead of `copy_`ing grads into `contents` individually as soon as they are received.~  PR includes a [minor change](https://github.com/pytorch/pytorch/pull/34904/files#diff-c269190a925a4b0df49eda8a8f6c5bd3R312-R315) to divide grads while copying them into flat buffers, instead of copying them in, then dividing separately.  Without cat+div fusion, div-while-copying is the best we can do.
3. https://github.com/pytorch/pytorch/issues/38942
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34904

Differential Revision: D20496044

Pulled By: albanD

fbshipit-source-id: 248d680f4b1bf77b0a986451844ec6e254469217
2020-06-16 08:43:31 -07:00
Shawn Zhong
96870181c6 Remove duplicated entries in random.rst (#39725)
Summary:
In the current master doc, every function under [`torch.random`](https://pytorch.org/docs/master/random.html) appears twice because the function docs are generated by both `automodule` and `autofunction`.

This PR removes the parts generated by `autofunction`.

See changed docs at https://5751500-65600975-gh.circle-artifacts.com/0/docs/random.html:

![image](https://user-images.githubusercontent.com/6421097/84165823-bf720580-aa39-11ea-9149-c428d43260f8.png)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39725

Differential Revision: D21983701

Pulled By: ngimel

fbshipit-source-id: 5f515d7fd8034687e754e2c7b2ea9e154b3ea9b9
2020-06-10 16:51:15 -07:00
lixinyu
7cb4eae8b1 correct some cpp extension code usages and documents (#39766)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/39766

Test Plan: Imported from OSS

Differential Revision: D21967284

Pulled By: glaringlee

fbshipit-source-id: 8597916bee247cb5f8c82ed8297119d2f3a72170
2020-06-10 08:31:22 -07:00
kshitij12345
9733390998 Add torch.flip{lr, ud} (#38599)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/38349

TODO:
* [x] Add Tests
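
A quick usage sketch of the new ops, which mirror their NumPy counterparts:

```python
import torch

x = torch.arange(6).reshape(2, 3)
torch.fliplr(x)   # reverse along dim 1 (columns), like np.fliplr
torch.flipud(x)   # reverse along dim 0 (rows), like np.flipud
```
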
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38599

Differential Revision: D21941884

Pulled By: mruberry

fbshipit-source-id: 7a442ff11051c2c868cf8e3c04e4bba0f1a1d426
2020-06-09 07:19:37 -07:00
krshrimali
335e4a1e3b Add arcosh, arcsinh and arctanh to unary ops (#38388)
Summary:
This PR aims to add `arccosh`, `arcsinh` and `arctanh` support. Please see issue https://github.com/pytorch/pytorch/issues/38349 for more details.

**TODOs:**

* [x] Add test cases for `arccosh`, `arcsinh` and `arctanh`. (need help)
* [x] Overload ops if `std::op` does not work with `thrust::complex` types (like for `sinh`, `cosh`).

Note: `std::acosh, std::asinh, std::atanh` do not support `thrust::complex` types. Added support for complex types for these 3 ops (`arccosh, arcsinh, arctanh`)
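
A quick usage sketch, assuming the ops are exposed under the C++-style spellings `torch.acosh`, `torch.asinh`, and `torch.atanh` as listed in this release's notes; the complex call reflects the note above.

```python
import torch

torch.acosh(torch.tensor([1.5, 2.0, 10.0]))
torch.asinh(torch.tensor([-1.0, 0.0, 1.0]))
torch.atanh(torch.tensor([0.0, 0.5]))
torch.asinh(torch.tensor([1 + 1j]))   # complex support added by this PR
```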

cc: mruberry
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38388

Differential Revision: D21882055

Pulled By: mruberry

fbshipit-source-id: d334590b47c5a89e491a002c3e41e6ffa89000e3
2020-06-04 11:40:55 -07:00
mattip
ada2652ca6 Restore docs coverage test via sphinx (#39331)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39331

Fixes gh-37590

Adds an extra `make coverage` to document building, which uses the built-in facility in sphinx to check docstring coverage. Also fixes a failure to import `torch/jit/supported_ops.py` which broke the [Torchscript Builtins](https://pytorch.org/docs/stable/jit_builtin_functions.html) page.

This also adds the required `SPHINXOPTS` to turn warnings into errors, but this is commented out. Note that since documentation of `torchvision` is merged in here, failures there would cause failures here if this is made active. Some thought might be needed about pinning the torchvision version merged into documentation.

The first commit should fail, since the "ScriptModule" class is commented out. I did that in order to check that a CI failure is properly reported.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38244

Differential Revision: D21640589

Pulled By: ezyang

fbshipit-source-id: 1e240d81669b5f21404d596de4a27d192dc9fd8a
2020-06-04 10:49:38 -07:00
Aayush Naik
0829cadca3 Implement rad2deg, deg2rad (#38852)
Summary:
Resolves https://github.com/pytorch/pytorch/issues/38372.

cc mruberry
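
A quick usage sketch of the two new conversions:

```python
import math

import torch

torch.rad2deg(torch.tensor([0.0, math.pi]))     # tensor([  0., 180.])
torch.deg2rad(torch.tensor([180.0, 360.0]))     # tensor([3.1416, 6.2832])
```
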
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38852

Differential Revision: D21868935

Pulled By: mruberry

fbshipit-source-id: ae6ded11b743c9d1cdc032984b4abe0a115290d6
2020-06-03 22:21:54 -07:00
neginraoof
4d597cb794 [ONNX] Update pytorch/onnx doc (#39480)
Summary:
Updated docs for operator_export_types and recently added op symbolics.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39480

Reviewed By: hl475

Differential Revision: D21877364

Pulled By: houseroad

fbshipit-source-id: 9831fe5776629da897db6d7943f830528cb916d2
2020-06-03 22:15:30 -07:00
Shen Li
a05ef17e46 Add rpc.functions.async_execution decorator for rpc_sync/rpc_async (#39216)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39216

The `rpc.functions.async_execution` decorator specifies that the
wrapped function is guaranteed to return a `torch.futures.Future`.
The decorator adds a `_wrapped_async_rpc_function` attribute to
the wrapper function. The caller retrieves this information and
then sets the `isAsyncFunction` argument accordingly, which is later
added to the PythonCall RPC message as a field. On the callee side,
if the PythonCall carries an asynchronous function, it will cast
the function's return value to a jit::PythonFutureWrapper object,
and then install response creation and communication as a callback
on that jit::PythonFutureWrapper.

For applications, this feature is useful when a function needs to
wait for IO or additional signaling. In those cases, marking the
user function as `rpc.functions.async_execution` will prevent it
from blocking one thread on the callee for too long.
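
A minimal sketch of the decorator in use, following the pattern described above (worker names and tensor values are placeholders, and RPC must already be initialized):

```python
import torch
import torch.distributed.rpc as rpc

@rpc.functions.async_execution
def async_add_chained(to, x, y, z):
    # Returns a Future immediately; the callee thread is freed while the
    # nested RPC runs, and the chained callback produces the final value.
    return rpc.rpc_async(to, torch.add, args=(x, y)).then(
        lambda fut: fut.wait() + z
    )

# On the caller, assuming workers named "worker1" and "worker2" exist:
# ret = rpc.rpc_sync("worker1", async_add_chained,
#                    args=("worker2", torch.ones(2), 1, 1))
```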

Test Plan: Imported from OSS

Reviewed By: rohan-varma

Differential Revision: D21779962

fbshipit-source-id: 6b6aa698bf6f91dad6ed2a7ee433df429b59e941
2020-06-02 23:21:25 -07:00
Xiang Gao
ebd4125e7e [JIT] Make torch.unique_consecutive compatible (#39339)
Summary:
A `unique_consecutive` version of https://github.com/pytorch/pytorch/pull/38156
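
A small sketch of what the change enables, assuming the op is now callable from TorchScript:

```python
import torch

@torch.jit.script
def dedupe(x: torch.Tensor) -> torch.Tensor:
    # After this change, unique_consecutive can be used inside scripted code;
    # the return_inverse/return_counts variants are handled by the same mechanism.
    return torch.unique_consecutive(x)

print(dedupe(torch.tensor([1, 1, 2, 2, 2, 3])))   # tensor([1, 2, 3])
```
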
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39339

Differential Revision: D21823997

Pulled By: eellison

fbshipit-source-id: d14596a36ba36497e296da5a344e0376cef56f1b
2020-06-02 14:54:29 -07:00