Summary:
Doc update intended to clarify and expand our current serialization behavior, including explaining the difference between torch.save/torch.load, torch.nn.Module.state_dict/torch.nn.Module.load_state_dict, and torch.jit.save/torch.jit.load. Also explains, for the time, when historic serialized Torchscript behavior is preserved and our recommendation for preserving behavior (using the same PyTorch version to consume a model as produced it).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41395
Reviewed By: ngimel
Differential Revision: D22560538
Pulled By: mruberry
fbshipit-source-id: dbc2f1bb92ab61ff2eca4888febc21f7dda76ba1
Summary:
Fixes https://github.com/pytorch/pytorch/issues/36403
Copy-paste of the issue description:
* Escape hatch: Introduce unsafe_* version of the three functions above that have the current behavior (outputs not tracked as views). The documentation will explain in detail why they are unsafe and when it is safe to use them. (basically, only the outputs OR the input can be modified inplace but not both. Otherwise, you will get wrong gradients).
* Deprecation: Use the CreationMeta on views to track views created by these three ops and throw warning when any of the views is modified inplace saying that this is deprecated and will raise an error soon. For users that really need to modify these views inplace, they should look at the doc of the unsafe_* version to make sure their usecase is valid:
* If it is not, then pytorch is computing wrong gradients for their use case and they should not do inplace anymore.
* If it is, then they can use the unsafe_* version to keep the current behavior.
* Removal: Use the CreationMeta on view to prevent any inplace on these views (like we do for all other views coming from multi-output Nodes). The users will still be able to use the unsafe_ versions if they really need to do this.
Note about BC-breaking:
- This PR changes the behavior of the regular function by making them return proper views now. This is a modification that the user will be able to see.
- We skip all the view logic for these views and so the code should behave the same as before (except the change in the `._is_view()` value).
- Even though the view logic is not performed, we do raise deprecation warnings for the cases where doing these ops would throw an error.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39299
Differential Revision: D22432885
Pulled By: albanD
fbshipit-source-id: 324aef091b32ce69dd067fe9b13a3f17d85d0f12
Summary:
some people have been confused by `retain_graph` in the snippet, they thought it was an additional requirement imposed by amp.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41203
Differential Revision: D22463700
Pulled By: ngimel
fbshipit-source-id: e6fc8871be2bf0ecc1794b1c6f5ea99af922bf7e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41200
In short, we messed up. The SHM and CMA backends of TensorPipe are Linux-specific and thus they are guarded by a #ifdef in the agent's code. Due to a mishap with CMake (due the fact that TensorPipe has two CMake files, one for PyTorch and a "standalone" one) we were not correctly propagating some flags and these #ifdefs were always false. This means that these two backends have always been disabled and have thus never been covered by our OSS CI. It would be irresponsible to enable them now in v1.6, so instead we remove any mention of them from the docs.
Note that this is perhaps not as bad as it sounds. These two backends were providing higher performance (latency) when the two endpoints were on the same machine. However, I suspect that most RPC users will only do transfers across machines, for which SHM and CMA wouldn't have played any role.
ghstack-source-id: 107458630
Test Plan: Docs only
Differential Revision: D22462158
fbshipit-source-id: 0d72fea11bcaab6d662184bbe7270529772a5e9b
Summary:
Fixes gh-39007
We replaced actual content with links to generated content in many places to break the documentation into manageable chunks. This caused references like
```
https://pytorch.org/docs/stable/torch.html#torch.flip
```
to become
```
https://pytorch.org/docs/master/generated/torch.flip.html#torch.flip
```
The textual content that was located at the old reference was replaced with a link to the new reference. This PR adds a `<p id="xxx"/p>` reference next to the link, so that the older references from outside tutorials and forums still work: they will bring the user to the link that they can then follow through to see the actual content.
The way this is done is to monkeypatch the sphinx writer method that produces the link. It is ugly but practical, and in my mind not worse than adding javascript to do the same thing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39086
Differential Revision: D22462421
Pulled By: jlin27
fbshipit-source-id: b8f913b38c56ebb857c5a07bded6509890900647
Summary:
[index_put](https://pytorch.org/docs/master/tensors.html#torch.Tensor.index_put) requires src and dst tensors to be the same dtype, so imo it belongs on the promote list when autocast is active (output should be widest dtype among input dtypes).
i also put some other registrations in alphabetical order.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41035
Differential Revision: D22418305
Pulled By: ngimel
fbshipit-source-id: b467cb16ac6c2ba1f9e43531f69a144b17f00b87
Summary:
solves most of gh-38011 in the framework of solving gh-32703.
These should only be formatting fixes, I did not try to fix grammer and syntax.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41068
Differential Revision: D22411919
Pulled By: zou3519
fbshipit-source-id: 25780316b6da2cfb4028ea8a6f649bb18b746440
Summary:
I ran `make linkcheck` using `sphinx.builders.linkcheck` on the documentation and noticed a few links weren't using HTTPS so I quickly updated them all.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40878
Differential Revision: D22404647
Pulled By: ngimel
fbshipit-source-id: 9c9756db59197304023fddc28f252314f6cf4af3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40461
It turned out `:inheried-members:` (see [doc](https://www.sphinx-doc.org/en/master/usage/extensions/autodoc.html#directive-autoclass)) is not really usable.
Because pybind11 generates a docstring that writes `self` as parent class, `rpc.PyRRef`, type.
As a workaround, I am pulling docstrings on parent-class, `PyRRef` class, into subclass, `RRef`. And do surgery on the docstring generated by pybind11.
{F241283111}
ghstack-source-id: 106472496
Test Plan:
buck test mode/dev-nosan //caffe2/test/distributed/rpc/:rpc_fork
buck build mode/dev-nosan //caffe2/test/distributed/rpc/:rpc_fork && \
buck-out/gen/caffe2/test/distributed/rpc/rpc_fork\#binary.par \
-r test_rref_str
buck build mode/dev-nosan //caffe2/test/distributed/rpc/:rpc_fork && \
buck-out/gen/caffe2/test/distributed/rpc/rpc_fork\#binary.par \
-r test_return_local_rrefs
buck test mode/dev-nosan //caffe2/torch/fb/distributed/model_parallel/tests:test_elastic_averaging -- 'test_elastic_averaging_center \(caffe2\.torch\.fb\.distributed\.model_parallel\.tests\.test_elastic_averaging\.TestElasticAveragingCenter\)'
P134031188
Differential Revision: D7933834
fbshipit-source-id: c03a8a4c9d98888b64492a8caba1591595bfe247
Summary:
Currently, a custom autograd function written with
```
torch.cuda.amp.custom_fwd(cast_inputs=dtype)
def forward(ctx, *args):
...
```
casts incoming floating-point CUDA tensors to `dtype` unconditionally, regardless of whether the function executes in an autocast-enabled region. I think I had the wrong idea there. Autocast-disabled regions should give the user control of input types. Also, `custom_fwd(cast_inputs=dtype)`-decorated functions' behavior should align with native fp32list/fp16list functions. C++-side casting wrappers have no effect when autocast is disabled, and `custom_fwd`'s casting should behave the same way.
The present PR changes `custom_fwd` so it only casts in autocast-enabled regions (also updates custom_fwd to ignore fp64 inputs, like the C++ wrappers).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36171
Differential Revision: D22179511
Pulled By: ngimel
fbshipit-source-id: 5a93d070179a43206066bce19da0a5a19ecaabbd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40377
Cleans up the docstring for quantized ELU and adds it to the quantization docs.
Test Plan: * build on Mac OS and inspect
Differential Revision: D22162834
Pulled By: vkuzo
fbshipit-source-id: e548fd4dc8d67db27ed19cac4dbdf2a942586759
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40346
Cleans up docstrings for quantized BatchNorm and adds to quantization docs
Test Plan: * build on Mac OS and inspect
Differential Revision: D22152633
Pulled By: vkuzo
fbshipit-source-id: e0bf02194158231e0205b5b2df7f6f1ffc3c4d65
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40345
Fixes docstrings and adds to quantization docs for quantized InstanceNorm.
Test Plan: * build on Mac OS and inspect
Differential Revision: D22152637
Pulled By: vkuzo
fbshipit-source-id: 7a485311ead20796b7a0944827d1d04e14ec8dcd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40343
Cleans up the quantized GroupNorm docstring and adds it to quantization docs.
Test Plan: * build on Mac OS and inspect
Differential Revision: D22152635
Pulled By: vkuzo
fbshipit-source-id: 5553b841c7a5d77f1467f0c40657db9e5d730a12
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40342
Cleans up the docstrings for quantized LayerNorm, and adds it to the docs.
Test Plan: * build on Mac OS and inspect
Differential Revision: D22152639
Pulled By: vkuzo
fbshipit-source-id: 38adf14b34675d1983ac4ed751938aa396e5400b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40341
Cleans up the hardtanh docstring and adds it to quantization docs.
Test Plan: * build and inspect on Mac OS
Differential Revision: D22152636
Pulled By: vkuzo
fbshipit-source-id: c98e635199c8be332aa6958664ff23faad834908
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40340
Adds and simplifies quantization docs for hardsigmoid
Test Plan:
* build docs on Mac OS
* inspect
Differential Revision: D22152634
Pulled By: vkuzo
fbshipit-source-id: 18da273023fb00e5f0bc1e881b00536492c606d3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40323
Cleans up the naming and the function param docs for quantized hardswish.
Remove redundant docstrings and link to floating point modules instead.
Test Plan:
* build the docs on Mac OS
* verify that every link works as expected
Differential Revision: D22152638
Pulled By: vkuzo
fbshipit-source-id: fef04874ae460b449c677424a6a1c6dd47054795
Summary:
https://github.com/pytorch/pytorch/pull/40129 fixed the error responsible for the first revert, but exposed another error in the same test.
This PR is intended as the "master copy" for merge, and it runs on full CI.
Two other PRs (restricted to run on a small subset of CI) supporting debugging DDP failures/hangs with multiple devices per process (`test_c10d.py:DistributedDataParallelTest.test_grad_layout_1devicemodule_2replicaperprocess`).
- https://github.com/pytorch/pytorch/pull/40290 tries the test with purely rowmajor contiguous params on an untouched master. In other words https://github.com/pytorch/pytorch/pull/40290 contains none of this PR's diffs aside from the test itself.
- https://github.com/pytorch/pytorch/pull/40178, for comparison, tries the test with this PR's diffs.
Both fail the same way, indicating failure is unrelated to this PR's other diffs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40358
Differential Revision: D22165785
Pulled By: albanD
fbshipit-source-id: ac7cdd79af5c080ab74341671392dca8e717554e
Summary:
Removes line mentioning `ProcessGroupRoundRobin` since we don't intend it to be used as a public API just yet. We can add this back when we officially support the API
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40380
Differential Revision: D22165556
Pulled By: rohan-varma
fbshipit-source-id: 24d0477d881dc74f2ff579de61dfd1ced2b09e75
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38490
A meta tensor is a tensor that is a lot like a normal tensor,
except it doesn't actually have any data associated with it.
You can use them to carry out shape/dtype computations without
actually having to run the actual code; for example, this could
be used to do shape inference in a JIT analysis pass.
Check out the description in DispatchKey.h for more information.
Meta tensors are part of a larger project to rationalize how we
write kernels so that we don't have to duplicate shape logic
in CPU kernel, CUDA kernel and meta kernel (this PR makes the
duplication problem worse!) However, that infrastructure can
be built on top of this proof of concept, which just shows how
you can start writing meta kernels today even without this
infrastructure.
There are a lot of things that don't work:
- I special cased printing for dense tensors only; if you try to
allocate a meta sparse / quantized tensor things aren't going
to work.
- The printing formula implies that torch.tensor() can take an
ellipsis, but I didn't add this.
- I wrote an example formula for binary operators, but it isn't
even right! (It doesn't do type promotion of memory layout
correctly). The most future proof way to do it right is to
factor out the relevant computation out of TensorIterator,
as it is quite involved.
- Nothing besides torch.add works right now
- Meta functions are ALWAYS included in mobile builds (selective
build doesn't work on them). This isn't a big deal for now
but will become more pressing as more meta functions are added.
One reason I'm putting up this PR now is to check with Yinghai Lu
if we can unblock shape inference for accelerators, while we are
still working on a long term plan for how to unify all shape
computation across our kernels.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D21935609
Pulled By: ezyang
fbshipit-source-id: f7d8636eeb8516b6bc296db99a16e56029972eee
Summary: NVIDIA's Apex is updating to no longer rely on this behavior, but we're reverting this Python2->Python3 update to unblock internal apex users.
Test Plan: Sandcaslte + OSS CI.
Reviewed By: ngimel
Differential Revision: D22146782
fbshipit-source-id: f9483d2cbf9dc3a469ad48a6c863edea3ae51070
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40296
1. Added a link to parameter server tutorial
2. Explained current states for TorchScript support
Test Plan: Imported from OSS
Differential Revision: D22142647
Pulled By: mrshenli
fbshipit-source-id: ffd697dd64a3aa874cf3f3488122ed805903370d