Summary:
Add a new device type, 'XPU' ('xpu' in lower case), to PyTorch. Changes are needed in code related to the device model and kernel dispatch, e.g. DeviceType, Backend, and DispatchKey.
https://github.com/pytorch/pytorch/issues/48246
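A hedged sketch of how the new device type surfaces on the Python side (assuming it follows the same pattern as existing device strings; allocating tensors on it still requires an XPU backend to be built and available):
```
import torch

# The new device type can be named like any other device string; the indexed
# form, e.g. "xpu:0", is assumed to work the same way as "cuda:0".
dev = torch.device("xpu")
print(dev.type)  # "xpu"
```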
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49786
Reviewed By: mrshenli
Differential Revision: D25893962
Pulled By: ezyang
fbshipit-source-id: 7ff0a316ee34cf0ed6fc7ead08ecdeb7df4b0052
Summary:
Building on top of the work of anjali411 (https://github.com/pytorch/pytorch/issues/46640)
Things added in this PR:
1. Modify backward and double-backward formulas
2. Add complex support for `new module tests` and criterion tests (and add complex tests for L1; a usage sketch follows this list)
3. Modify some existing tests to support complex
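A hedged sketch of the kind of complex-valued usage these tests exercise (assuming L1Loss accepts complex inputs, as the added L1 tests suggest):
```
import torch
import torch.nn as nn

# Gradients flow through the backward formulas modified in this PR.
loss_fn = nn.L1Loss()
x = torch.randn(4, 3, dtype=torch.cfloat, requires_grad=True)
target = torch.randn(4, 3, dtype=torch.cfloat)
loss = loss_fn(x, target)   # |x - target| is real-valued, so the loss is real
loss.backward()
print(loss.item(), x.grad.dtype)
```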
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49912
Reviewed By: zhangguanheng66
Differential Revision: D25853036
Pulled By: soulitzer
fbshipit-source-id: df619f1b71c450ab2818eb17804e0c55990aa8ad
Summary:
Fixes https://github.com/pytorch/pytorch/issues/49726
Just cleaned up the unnecessary `ModuleAttributeError`
BC-breaking note:
`ModuleAttributeError` was added in the previous unsuccessful [PR](https://github.com/pytorch/pytorch/pull/49879) and removed here. If a user was catching `ModuleAttributeError` specifically, this will no longer work; they should catch `AttributeError` instead.
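A minimal sketch of the recommended pattern after this change (the attribute name below is illustrative):
```
import torch.nn as nn

# Catch the standard AttributeError, which the removed ModuleAttributeError
# subclassed, when probing a Module for an attribute that may not exist.
module = nn.Linear(2, 2)
try:
    module.nonexistent_buffer
except AttributeError:
    print("attribute not found")
```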
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50298
Reviewed By: mrshenli
Differential Revision: D25907620
Pulled By: jbschlosser
fbshipit-source-id: cdfa6b1ea76ff080cd243287c10a9d749a3f3d0a
Summary:
Fixes https://github.com/pytorch/pytorch/issues/49034
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49035
Test Plan:
Imported from GitHub, without a `Test Plan:` line.
Force rebased to deal with merge conflicts
Reviewed By: zhangguanheng66
Differential Revision: D25767065
Pulled By: walterddr
fbshipit-source-id: ffb904e449f137825824e3f43f3775a55e9b011b
Summary:
These unused variables were identified by [pyflakes](https://pypi.org/project/pyflakes/). They can be safely removed to simplify the code and possibly improve performance.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50100
Reviewed By: ezyang
Differential Revision: D25797764
Pulled By: smessmer
fbshipit-source-id: ced341aee692f429d2dcc3a4ef5c46c8ee99cabb
Summary:
The first commit fixes the `MultiheadAttention` docstrings, which are causing a cryptic KaTeX crash.
The second commit fixes many documentation issues in `torch/_torch_docs.py`, and closes gh-43667 (missing "Keyword arguments" headers). It also fixes a weird duplicate docstring for `torch.argmin`; there are more of these, and it looks like they were written based on whether the C++ implementation has an overload. That makes little sense to a Python user, though, and the content is simply duplicated.
The `Shape:` heading for https://pytorch.org/docs/master/generated/torch.nn.MultiheadAttention.html looked bad; here's what it looks like with this PR:
<img width="475" alt="image" src="https://user-images.githubusercontent.com/98330/102797488-09a44e00-43b0-11eb-8788-acdf4e936f2f.png">
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49684
Reviewed By: ngimel
Differential Revision: D25730909
Pulled By: mruberry
fbshipit-source-id: d25bcf8caf928e7e8e918017d119de12e10a46e9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49549
Somehow `mypy torch/quantization` got broken in the past couple of days:
https://gist.github.com/vkuzo/07af454246f0a68e6fa8929beeec7e0d. I didn't see any relevant PRs other than
https://github.com/pytorch/pytorch/pull/47725, which doesn't seem
related. The error doesn't seem real, as the arguments to
`_cudnn_rnn_flatten_weight` seem correct. For now,
ignoring the failure so we have a clean `mypy` run on
`torch/quantization`.
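A hedged, illustrative sketch of the suppression pattern (this is not the actual line touched by the PR):
```
import torch

# A "# type: ignore" comment silences only the flagged expression, keeping a
# clean mypy run for the rest of the module.
flatten = torch._cudnn_rnn_flatten_weight  # type: ignore[attr-defined]
```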
Test Plan:
```
mypy torch/quantization
```
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D25616972
fbshipit-source-id: 46c207fe1565ec949c0b1f57d6cd0c93f627e6bd
Summary:
Fixes https://github.com/pytorch/pytorch/issues/598
This is BC-breaking, as we now explicitly don't call the hook when there are no Tensors at the top level of the output.
This feature was not working anyway, as the returned grad_input/grad_output were wrong (not respecting the output structure, and wrong inputs for multi-Node Modules).
This is also BC-breaking, as we now report the correct gradients for `nn.Module`s that contain multiple autograd `Node`s, whereas we used to return bad results before.
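A minimal sketch of the fixed hook semantics (assuming register_full_backward_hook is the API carrying this behavior):
```
import torch
import torch.nn as nn

# grad_input/grad_output now match the Module's actual input/output structure,
# and the hook is only called when the output contains Tensors at the top level.
def hook(module, grad_input, grad_output):
    print([None if g is None else tuple(g.shape) for g in grad_input])
    print([tuple(g.shape) for g in grad_output])

m = nn.Linear(3, 2)
m.register_full_backward_hook(hook)
m(torch.randn(4, 3, requires_grad=True)).sum().backward()
```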
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46163
Reviewed By: ailzhang, mruberry
Differential Revision: D24894180
Pulled By: albanD
fbshipit-source-id: e1b5d193d2818eb2f51e2a2722c7405c8bd13c2b
Summary:
Fixes https://github.com/pytorch/pytorch/issues/46213
I haven't yet updated the documentation; I will add those changes soon. There are a few other things that I didn't do, but want to clarify whether I should. (A usage sketch of the new projection argument follows the numbered list below.)
1. I didn't expose projections in the C++ API (torch/csrc/api/src/nn/modules/rnn.cpp). Let me know if this is desirable and I will add those changes.
2. I didn't expose projections in the "lstm_cell" and "_thnn_differentiable_lstm_cell_backward" functions from aten/src/ATen/native/RNN.cpp. As far as I understand, they are not needed for nn.LSTM CPU execution. For lstm_cell, projections don't bring any real benefit, since if the cell is used separately, they can easily be added in Python. For "_thnn_differentiable_lstm_cell_backward", I'm actually not sure where exactly that function is used, so I also disabled projections there for now. Please let me know if I should change that.
3. I added a check that projections are not supported for quantized LSTMs to the quantized_lstm_<data/input> functions. But I didn't add any checks to the LSTMCell code. It seems that since I disabled projections in the "lstm_cell" function, they should also not be available for quantized models through any API other than quantized_lstm_<data/input>. Please let me know if I'm not correct and I will add checks to other places.
4. Projections are not supported for CuDNN versions < 7.1.2. Should I add a check for the CuDNN version and disable projections in that case? If so, what would be the best way to do that?
5. Currently I added the projection weight as the last weight, so the layout is "w_ih, w_hh, b_ih, b_hh, w_hr". This breaks the assumption that biases come after weights, and thus I had to add additional if statements in various places. An alternative would be the "w_ih, w_hh, w_hr, b_ih, b_hh" layout, in which case the assumption would hold. But in that case I would need to split the loop in the get_parameters function from aten/src/ATen/native/cudnn/RNN.cpp. And in some cases, I would still need to add an "undefined" tensor in the 3rd position, because we get all 5 weights from CuDNN most of the time. So I'm not sure which way is better. Let me know if you think I should change to the weights-then-biases layout.
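A minimal usage sketch of the new projection argument (assuming the proj_size keyword is how it is exposed on nn.LSTM):
```
import torch
import torch.nn as nn

# The hidden state h is projected from hidden_size down to proj_size.
lstm = nn.LSTM(input_size=10, hidden_size=20, proj_size=5, num_layers=2, batch_first=True)
out, (h, c) = lstm(torch.randn(3, 7, 10))   # (batch, seq, feature)
print(out.shape)  # torch.Size([3, 7, 5])   -- outputs use the projected size
print(h.shape)    # torch.Size([2, 3, 5])   -- hidden state is projected
print(c.shape)    # torch.Size([2, 3, 20])  -- cell state keeps hidden_size
```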
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47725
Reviewed By: zou3519
Differential Revision: D25449794
Pulled By: ngimel
fbshipit-source-id: fe6ce59e481d1f5fd861a8ff7fa13d1affcedb0c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49187
Expands the implementation of PixelShuffle to support any number of batch dimensions
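A minimal sketch of the expanded behavior (shapes below are illustrative):
```
import torch
import torch.nn as nn

# PixelShuffle now accepts inputs with arbitrary leading batch dimensions,
# not just a single batch dim: (*, C*r^2, H, W) -> (*, C, H*r, W*r).
ps = nn.PixelShuffle(upscale_factor=2)
x = torch.randn(5, 3, 8, 4, 4)   # two leading batch dims (5, 3)
print(ps(x).shape)               # torch.Size([5, 3, 2, 8, 8])
```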
Test Plan: `buck test caffe2/test:nn -- test_pixel_shuffle`
Reviewed By: mruberry
Differential Revision: D25399058
fbshipit-source-id: ab0a7f593b276cafc9ebb46a177e2c1dce56d0de
Summary:
- Add the link to the original paper (Attention is All You Need)
- Fix indentation
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48775
Reviewed By: H-Huang
Differential Revision: D25465914
Pulled By: heitorschueroff
fbshipit-source-id: bbc296ec1523326e323587023c126e820e90ad8d
Summary:
Fixes https://github.com/pytorch/pytorch/issues/49064 by using raw strings
I removed `# noqa: W605` because that's the "invalid escape sequence" check: https://www.flake8rules.com/rules/W605.html
I wrote a quick test to make sure the strings are the same before and after this PR. This block should print `True` (it does for me).
```
convolution_notes1 = \
{"groups_note": r"""* :attr:`groups` controls the connections between inputs and outputs.
:attr:`in_channels` and :attr:`out_channels` must both be divisible by
:attr:`groups`. For example,
* At groups=1, all inputs are convolved to all outputs.
* At groups=2, the operation becomes equivalent to having two conv
layers side by side, each seeing half the input channels
and producing half the output channels, and both subsequently
concatenated.
* At groups= :attr:`in_channels`, each input channel is convolved with
its own set of filters (of size
:math:`\frac{\text{out\_channels}}{\text{in\_channels}}`).""",
"depthwise_separable_note": r"""When `groups == in_channels` and `out_channels == K * in_channels`,
where `K` is a positive integer, this operation is also known as a "depthwise convolution".
In other words, for an input of size :math:`(N, C_{in}, L_{in})`,
a depthwise convolution with a depthwise multiplier `K` can be performed with the arguments
:math:`(C_\text{in}=C_\text{in}, C_\text{out}=C_\text{in} \times \text{K}, ..., \text{groups}=C_\text{in})`."""} # noqa: B950
convolution_notes2 = \
{"groups_note": """* :attr:`groups` controls the connections between inputs and outputs.
:attr:`in_channels` and :attr:`out_channels` must both be divisible by
:attr:`groups`. For example,
* At groups=1, all inputs are convolved to all outputs.
* At groups=2, the operation becomes equivalent to having two conv
layers side by side, each seeing half the input channels
and producing half the output channels, and both subsequently
concatenated.
* At groups= :attr:`in_channels`, each input channel is convolved with
its own set of filters (of size
:math:`\\frac{\\text{out\_channels}}{\\text{in\_channels}}`).""", # noqa: W605
"depthwise_separable_note": """When `groups == in_channels` and `out_channels == K * in_channels`,
where `K` is a positive integer, this operation is also known as a "depthwise convolution".
In other words, for an input of size :math:`(N, C_{in}, L_{in})`,
a depthwise convolution with a depthwise multiplier `K` can be performed with the arguments
:math:`(C_\\text{in}=C_\\text{in}, C_\\text{out}=C_\\text{in} \\times \\text{K}, ..., \\text{groups}=C_\\text{in})`."""} # noqa: W605,B950
print(convolution_notes1 == convolution_notes2)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49065
Reviewed By: agolynski
Differential Revision: D25464507
Pulled By: H-Huang
fbshipit-source-id: 88a65a24e3cc29774af25e09823257b2136550fe
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48661
Previously, `_conv_forward` would add `self.bias` to the result, so the bias was added twice in the QAT ConvBn module. This PR adds a `bias` argument to `_conv_forward`, and the ConvBn module now calls `_conv_forward` with a zero bias.
fixes: https://github.com/pytorch/pytorch/issues/48514
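A hedged, simplified sketch of the change (not the exact PyTorch source):
```
import torch
import torch.nn.functional as F

# _conv_forward now takes the bias explicitly, so the QAT ConvBn module can run
# the convolution with a zero bias and apply the folded bias itself, instead of
# having self.bias added a second time.
def _conv_forward(input, weight, bias):
    return F.conv2d(input, weight, bias, stride=1, padding=1)

weight = torch.randn(8, 3, 3, 3)
zero_bias = torch.zeros(8)        # passed by ConvBn so the bias is not double-counted
out = _conv_forward(torch.randn(1, 3, 16, 16), weight, zero_bias)
print(out.shape)                  # torch.Size([1, 8, 16, 16])
```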
Test Plan:
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D25249175
fbshipit-source-id: 4536c7545d3dcd7e8ea254368ffb7cf15118d78c
Summary:
Fixes https://github.com/pytorch/pytorch/issues/46983.
The solution is based on two components:
1. The introduction of the `_initialized` attribute. This is used during the ParameterList/Dict creation methods `__init__` (introduced in https://github.com/pytorch/pytorch/issues/47772) and `__setstate__` so that setting general `Module` attributes does not trigger warnings.
2. The introduction of the `not hasattr(self, key)` check to avoid triggering warnings when changing general `Module` attributes such as `.training` during the `train()` and `eval()` methods.
Tests related to the fix are added.
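A minimal sketch of the scenario fixed here (assuming the warning in question is the one raised when setting attributes on ParameterList/ParameterDict):
```
import torch
import torch.nn as nn

# train()/eval() set the Module attribute .training, which previously
# triggered a spurious warning on ParameterList/ParameterDict.
params = nn.ParameterList([nn.Parameter(torch.randn(2, 2))])
params.eval()    # sets .training = False; should not warn after this fix
params.train()   # sets .training = True; should not warn either
```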
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48315
Reviewed By: mrshenli
Differential Revision: D25130217
Pulled By: albanD
fbshipit-source-id: 79e2abf1eab616f5de74f75f370c2fe149bed4cb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48593
Previously, `_conv_forward` would add `self.bias` to the result, so the bias was added twice in the QAT ConvBn module. This PR adds a `bias` argument to `_conv_forward`, and the ConvBn module now calls `_conv_forward` with a zero bias.
fixes: https://github.com/pytorch/pytorch/issues/48514
Test Plan: Imported from OSS
Reviewed By: raghuramank100
Differential Revision: D25222215
fbshipit-source-id: 90c0ab79835b6d09622dcfec9de4139881a60746
Summary:
Silence errors from flake8. Only a couple of code changes were needed for deprecated Python syntax from before 2.4; the rest is just adding noqa markers.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48453
Reviewed By: mruberry
Differential Revision: D25181871
Pulled By: ngimel
fbshipit-source-id: f8d7298aae783b1bce2a46827b088fc390970641
Summary:
Added a convenience function that allows users to load a model without a DP/DDP wrapper from a DP/DDP state dict.
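A hedged sketch of the idea behind such a helper (the actual function added by this PR may differ in name and location): state dicts saved from a DP/DDP wrapper prefix every key with "module.", which must be stripped before loading into an unwrapped model.
```
import torch.nn as nn

plain = nn.Linear(4, 2)
# Simulate a DP/DDP-style state dict, whose keys look like "module.weight".
ddp_style_state = {"module." + k: v for k, v in plain.state_dict().items()}

# Strip the prefix so the state dict loads into a plain (unwrapped) model.
stripped = {k[len("module."):] if k.startswith("module.") else k: v
            for k, v in ddp_style_state.items()}
nn.Linear(4, 2).load_state_dict(stripped)
```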
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45643
Reviewed By: rohan-varma
Differential Revision: D24574649
fbshipit-source-id: 17d29ab16ae24a30890168fa84da6c63650e61e9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46758
It's in general helpful to support int32 indices and offsets, especially when such tensors are large and need to be transferred to accelerator backends. Since it may not be very useful to support the combination of int32 indices and int64 offsets, here we enforce that these two must have the same type.
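A minimal sketch of the newly supported case (assuming this applies to nn.EmbeddingBag on CPU):
```
import torch
import torch.nn as nn

# int32 indices paired with int32 offsets; mixing int32 indices with int64
# offsets is rejected, since both must share the same dtype.
bag = nn.EmbeddingBag(num_embeddings=10, embedding_dim=3, mode="sum")
indices = torch.tensor([1, 2, 4, 5, 4, 3], dtype=torch.int32)
offsets = torch.tensor([0, 3], dtype=torch.int32)   # same dtype as indices
print(bag(indices, offsets).shape)                  # torch.Size([2, 3])
```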
Test Plan: unit tests
Reviewed By: ngimel
Differential Revision: D24470808
fbshipit-source-id: 94b8a1d0b7fc9fe3d128247aa042c04d7c227f0b