Summary:
This PR adds a polyfill for `typing.List` for Python versions that don't
support `typing` as a builtin. It also moves the type definitions from
`annotations.py` so that they can be used in `torch.nn`.
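A minimal sketch of the idea (not the PR's implementation; `_ListStandIn` is a hypothetical name): try the real `typing.List`, otherwise install a subscriptable stand-in so `List[int]`-style annotations still parse.
```python
try:
    from typing import List  # available when the typing module is present
except ImportError:
    class _ListStandIn(object):
        # hypothetical fallback: allow List[int]-style subscripting and map
        # it to the builtin list for annotation purposes
        def __getitem__(self, types):
            return list

    List = _ListStandIn()
```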
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14482
Differential Revision: D13237570
Pulled By: driazati
fbshipit-source-id: 6575b7025c2d98198aee3b170f9c4323ad5314bd
Summary:
To convert `max_unpool` functions to weak script, this PR adds support
for `T` as default arguments for `BroadcastingListN[T]`.
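A hedged usage sketch of the effect (illustrative code, not the PR's): a single scalar can be passed where a broadcasting list is expected when calling a `max_unpool` function from script.
```python
import torch
import torch.nn.functional as F

@torch.jit.script
def unpool(x, indices):
    # kernel_size given as a single int where a list of ints is also accepted
    return F.max_unpool1d(x, indices, kernel_size=2)

x = torch.randn(1, 1, 8)
pooled, idx = F.max_pool1d(x, 2, return_indices=True)
print(unpool(pooled, idx).shape)  # back to (1, 1, 8)
```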
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14361
Differential Revision: D13192231
Pulled By: driazati
fbshipit-source-id: a25b75a0e88ba3dfa22d6a83775e9778d735e249
Summary:
This PR adds weak modules for all activation modules and uses the `test_nn` module tests to exercise weak modules that have been annotated with `weak_module` and are therefore listed in `torch._jit_internal._weak_types`.
Also depends on #14379
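For reference, a hedged sketch of the pattern being tested, assuming the `weak_module`/`weak_script_method` decorators exposed by `torch._jit_internal` at the time of this PR (the module itself is illustrative):
```python
import torch
import torch.nn.functional as F
from torch._jit_internal import weak_module, weak_script_method

@weak_module
class MyReLU6(torch.nn.Module):
    __constants__ = ['inplace']

    def __init__(self, inplace=False):
        super(MyReLU6, self).__init__()
        self.inplace = inplace

    @weak_script_method
    def forward(self, input):
        # compiled lazily, only when used from a script function/module
        return F.relu6(input, self.inplace)
```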
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14238
Differential Revision: D13252887
Pulled By: driazati
fbshipit-source-id: e9638cf74089884a32b8f0f38396cf432c02c988
Summary:
Resubmitting PR #14415
The tests added for Embedding + EmbeddingBag used random numbers as input, which affected the random number generator state and caused the flaky test to break.
Everything but the last two commits has already been accepted
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14509
Differential Revision: D13247917
Pulled By: eellison
fbshipit-source-id: ea6963c47f666c07687787e2fa82020cddc6aa15
Summary:
Add support for Embedding and EmbeddingBag in script. Both functions require `with torch.no_grad()`, which we don't have any plans to support in the near future. To work around this, I added an `embedding_renorm` function without derivatives.
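A hedged usage sketch (illustrative, not from the PR): calling `F.embedding` with `max_norm` from a scripted function, which exercises the renorm path.
```python
import torch
import torch.nn.functional as F

@torch.jit.script
def lookup(weight, indices):
    # type: (Tensor, Tensor) -> Tensor
    # max_norm triggers the (derivative-free) renorm of the selected rows
    return F.embedding(indices, weight, max_norm=1.0)

weight = torch.randn(10, 3)
print(lookup(weight, torch.tensor([1, 2, 4])))
```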
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14415
Reviewed By: wanchaol
Differential Revision: D13219647
Pulled By: eellison
fbshipit-source-id: c90706aa6fbd48686eb10f3efdb65844be7b8717
Summary:
This PR adds weak modules for all activation modules and uses the `test_nn` module tests to exercise weak modules that have been annotated with `weak_module` and are therefore listed in `torch._jit_internal._weak_types`.
Also depends on #14379
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14238
Differential Revision: D13192230
Pulled By: driazati
fbshipit-source-id: 36488960b6c91448b38c0fa65422539a93af8c5e
Summary:
This PR allows overloading functions based on the value of a parameter (so long as it is a constant). See `max_pool1d` for an example usage.
This is the first step in enabling the use of max_pool functions for the standard library that can return `Tensor` or `Tuple[Tensor, Tensor]` based on the `return_indices` flag. This will give the JIT identical results to the Python versions of the functions.
Fixes #14081
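A hedged sketch of the user-facing effect (the function names below are illustrative): the same `max_pool1d` call site yields a `Tensor` or a `Tuple[Tensor, Tensor]` depending on the constant value of `return_indices`.
```python
import torch
import torch.nn.functional as F

@torch.jit.script
def pool(x):
    # return_indices is False (the default) -> returns a Tensor
    return F.max_pool1d(x, kernel_size=2)

@torch.jit.script
def pool_with_indices(x):
    # return_indices is the constant True -> returns Tuple[Tensor, Tensor]
    return F.max_pool1d(x, kernel_size=2, return_indices=True)
```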
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14425
Differential Revision: D13222104
Pulled By: driazati
fbshipit-source-id: 8cb676b8b13ebcec3262234698edf4a7d7dcbbe1
Summary:
Port AffineGrid to C++, because script does not support compiling Function classes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14392
Differential Revision: D13219698
Pulled By: eellison
fbshipit-source-id: 3ddad8a84c72010b5a6c6f7f9712be614202faa6
Summary:
This PR allows overloading functions based on the value of a parameter (so long as it is a constant). See `max_pool1d` for an example usage.
This is the first step in enabling the use of `max_pool` functions for the standard library that can return `Tensor` or `Tuple[Tensor, Tensor]` based on the `return_indices` flag. This will give the JIT identical results to the Python versions of the functions.
Depends on #14232 for `Optional[BroadcastingList[T]]`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14081
Differential Revision: D13192228
Pulled By: driazati
fbshipit-source-id: fce33c400c1fd06e59747d98507c5fdcd8d4c113
Summary:
1. Support `Optional[BroadcastingList1[int]]`-style type annotations so an argument can accept an int or a list of ints (see the sketch below)
2. Convert `gumbel_softmax` and the LP pooling weak functions and modules
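A hedged usage sketch (illustrative): `lp_pool1d`, one of the functions converted here, called from script with its optional stride left at the `None` default.
```python
import torch
import torch.nn.functional as F

@torch.jit.script
def lp_pool(x):
    # type: (Tensor) -> Tensor
    # norm_type=2.0, kernel_size=3; the optional stride stays at None
    return F.lp_pool1d(x, 2.0, 3)

print(lp_pool(torch.randn(1, 4, 16)).shape)
```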
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14232
Differential Revision: D13164506
Pulled By: wanchaol
fbshipit-source-id: 6c2a2b9a0613bfe907dbb5934122656ce2b05700
Summary:
This PR inserts `prim::None` constants for undefined tensors. This comes up in the standard library when an `Optional[Tensor]` is statically determined to be `None`:
```python
@torch.jit.script
def fn(x=None):
    # type: (Optional[Tensor]) -> Tensor
    return torch.jit._unwrap_optional(x)

@torch.jit.script
def fn2():
    # type: () -> Tensor
    return fn()
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14120
Differential Revision: D13124625
Pulled By: driazati
fbshipit-source-id: 9eaa82e478c49c503f68ed89d8c770e8273ea569
Summary:
This PR does three things:
1. It exports the BatchNorm functional and module, rewriting some of the components to stay aligned with the currently supported JIT features
2. In the process of the export, it adds the necessary compiler support for in-place augmented-assignment ops (see the sketch after this list)
3. It changes the test_jit behavior in add_module_test to use a single rng state during module initialization
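A hedged sketch of the compiler feature from item 2 (illustrative code, not the PR's BatchNorm):
```python
import torch

@torch.jit.script
def update_running_mean(running_mean, batch_mean, momentum):
    # type: (Tensor, Tensor, float) -> Tensor
    # in-place augmented assignment on tensors, as used by running-stat updates
    running_mean *= 1.0 - momentum
    running_mean += momentum * batch_mean
    return running_mean

print(update_running_mean(torch.zeros(3), torch.ones(3), 0.1))
```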
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14016
Differential Revision: D13112064
Pulled By: wanchaol
fbshipit-source-id: 31e3aee5fbb509673c781e7dbb6d8884cfa55d91
Summary:
This PR does two things:
1. It fixes optional import/export to cover any type, including tensor types (previously only base types were supported); this is essential to unblock optional tensor type annotations in our test logic
2. It exports the multi_margin_loss functional to serve as an example of the optional undefined tensor use case.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13877
Differential Revision: D13076090
Pulled By: wanchaol
fbshipit-source-id: c9597295efc8cf4b6462f99a93709aae8dcc0df8
Summary:
I'm now traveling and don't have access to a good computer to compile and test by myself. Will see the outcome of CI.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12199
Differential Revision: D13062326
Pulled By: nairbv
fbshipit-source-id: 85873525caa94906ccaf2c739eb4cd55a72a4ffd
Summary:
Converts some more functions to match the features that have been added. Some
conversions were unsuccessful, but the type line was left in for later.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13707
Differential Revision: D13030210
Pulled By: driazati
fbshipit-source-id: 02d5712779b83b7f18d0d55539e336321335e0cc
Summary:
* Adds `OptionalType` support for import/export
* Optionals get exported along with their contained type, e.g. `Optional[int]`
* Allows concrete types and `None` to be passed to an op that takes an optional (see the sketch after this list)
* Converts `softmax`
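A hedged round-trip sketch (the module and file name are illustrative): an `Optional[float]` argument accepts either `None` or a concrete value and survives save/load of a scripted module.
```python
import torch
from typing import Optional

class MaybeScale(torch.nn.Module):
    def forward(self, x, scale):
        # type: (Tensor, Optional[float]) -> Tensor
        if scale is not None:
            return torch.softmax(x, -1) * scale
        return torch.softmax(x, -1)

m = torch.jit.script(MaybeScale())
m.save("maybe_scale.pt")
loaded = torch.jit.load("maybe_scale.pt")
print(loaded(torch.randn(2, 3), None))
print(loaded(torch.randn(2, 3), 0.5))
```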
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13647
Differential Revision: D12954672
Pulled By: driazati
fbshipit-source-id: 159e9bfb7f3e398bec3912d414c393098cc7455a
Summary:
This PR is part of the task to unblock standard library export.
* None passed as a Tensor is treated differently from None passed as other types: it becomes an undefined tensor rather than the None IValue.
* Refine the type system so that we have a correct tensor type hierarchy (Dynamic/Tensor/CompleteTensor); Dynamic should be at the top of the inheritance hierarchy.
* It also exports bilinear as an example of undefined tensor (None) input.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13650
Differential Revision: D12967026
Pulled By: wanchaol
fbshipit-source-id: 6aedccc7ce2a12fadd13d9e620c03e1260103a5a
Summary:
To support `_Reduction` in the JIT, this PR moves it out to a new file so that it goes through the paths for Python modules in the script compiler, and converts `F.ctc_loss` to weak script
Depends on #13484 for saving rng state
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13401
Differential Revision: D12868501
Pulled By: driazati
fbshipit-source-id: 23cec0fb135744578c73e31ac825e238db495d27
Summary:
This PR adds functions defined in `torch._C._nn` as builtin functions (including inplace variants). This allows for the conversion of more functions to weak script
NB: many `torch.nn.functional` functions have to be slightly rewritten to avoid early returns (as with `threshold` in this PR; see the sketch after the list below)
Converts these functions to weak script:
* `threshold`
* `relu`
* `hardtanh`
* `relu6`
* `elu`
* `selu`
* `celu`
* `leaky_relu`
* `rrelu`
* `tanh`
* `sigmoid`
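A hedged sketch of the single-return rewrite pattern referred to above (illustrative code using public `torch` ops, not the PR's exact functions):
```python
import torch

def relu_like(input, inplace=False):
    # type: (Tensor, bool) -> Tensor
    # one assignment per branch and a single return at the end,
    # instead of returning early from inside the if/else
    if inplace:
        result = torch.relu_(input)
    else:
        result = torch.relu(input)
    return result

print(relu_like(torch.randn(4)))
```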
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13322
Differential Revision: D12852203
Pulled By: driazati
fbshipit-source-id: 220670df32cb1ff39d120bdc04aa1bd41209c809
Summary:
To convert `nn.functional.dropout`
* `_VF` had to be exposed as a Python module, so this PR adds a module class that forwards to `torch._C._VariableFunctions` (sketched below)
* rng state between calls in the tests needed to be made consistent
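A hedged sketch of the forwarding-module idea (a stand-in name, not the actual `torch._VF` implementation):
```python
import sys
import types
import torch

class _VFStandIn(types.ModuleType):
    # forwards attribute lookups to the C extension's VariableFunctions, so
    # e.g. _vf.dropout(...) resolves to torch._C._VariableFunctions.dropout
    def __init__(self):
        super(_VFStandIn, self).__init__('torch._vf_standin')

    def __getattr__(self, name):
        return getattr(torch._C._VariableFunctions, name)

sys.modules['torch._vf_standin'] = _VFStandIn()
_vf = sys.modules['torch._vf_standin']
print(_vf.relu(torch.randn(3)))
```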
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13484
Differential Revision: D12929622
Pulled By: driazati
fbshipit-source-id: 78b455db9c8856b94d2dda573fb7dc74d5784f56
Summary:
```
The previous threshold implementation was not vectorized or parallelized.
This speeds up ResNet-50 CPU inference [1] from ~88 ms to ~67 ms.

CPU timings:
https://gist.github.com/colesbury/d0d1be6974841d62696dbde329a8fde8

1 thread (before vs. after)
  10240:  17.4 µs vs.  6.9 µs per loop
  102400: 141 µs  vs. 39.8 µs per loop

16 threads (before vs. after)
  10240:  17.4 µs vs.  6.7 µs per loop
  102400: 141 µs  vs. 14.3 µs per loop

CUDA timings are not measurably different.

[1]: compiled with MKL-DNN, 8 threads, batch norm merged into convolutions
https://gist.github.com/colesbury/8a64897dae97558b3b82da665048c782
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13182
Reviewed By: soumith
Differential Revision: D12825105
Pulled By: colesbury
fbshipit-source-id: 557da608ebb87db8a04adbb0d2882af4f2eb3c15
Summary:
Made the previous description for max_norm more precise, avoiding 'this' and describing what actually happens in the code.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13310
Differential Revision: D12840813
Pulled By: SsnL
fbshipit-source-id: 98090c884267a62ce93cd85da84252d46926dfa5
Summary:
1. Refactor DDPG predictor. Merge the critic predictor with ParametricDQNPredictor since they are the same
2. Fix bug where loss was multiplied by the batch size
3. Create DDPGFeedPredictor which uses the feed predictor output format
4. Add support for gridworld simulation memoization to DDPG. Also memoize normalization tables.
Reviewed By: kittipatv
Differential Revision: D10161240
fbshipit-source-id: 2813890043de1241c1fb9b9c2b6a897403f9fc12
Summary:
Addresses #9499. Completed work on the forward function; tests should be passing for that. Working on the backward function now.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10885
Differential Revision: D9643786
Pulled By: SsnL
fbshipit-source-id: 2930d6f3d2975c45b2ba7042c55773cbdc8fa3ac
Summary:
* Moves the `weak_script` annotation to `torch/_jit_internal.py` to resolve a dependency issue between `torch.jit` and `torch.nn`
* Add `torch._jit.weak_script` to `tanhshrink` and `softsign`; their tests now pass instead of giving an `unknown builtin op` error
* Blacklist converted `torch.nn.functional` functions from appearing in the builtin op list if they don't actually have corresponding `aten` ops
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12723
Differential Revision: D10452986
Pulled By: driazati
fbshipit-source-id: c7842bc2d3ba0aaf7ca6e1e228523dbed3d63c36
Summary:
Include atomicAdd commentary, as this is less well known.
There is some discussion in #12207.
Unfortunately, I cannot seem to get the `..include` working in `_tensor_docs.py` and `_torch_docs.py`. I could use a hint for that.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12217
Differential Revision: D10419739
Pulled By: SsnL
fbshipit-source-id: eecd04fb7486bd9c6ee64cd34859d61a0a97ec4e
Summary:
There were two problems with SN + DP:
1. In SN, the updated `_u` vector is saved back to the module via a `setattr`. However, in DP, everything runs on a replica, so those updates are lost.
2. In DP, the buffers are broadcast via `broadcast_coalesced`, so on the replicas they are all views. Therefore, the `detach_` call won't work.
Fixes are:
1. Update the `_u` vector in-place so that, via the storage shared between the 1st replica and the parallelized module, the update is retained
2. Do not call `detach_`.
3. Added comments in SN about the subtlety.
4. Added a note to the DP doc on this particular behavior of DP.
cc crcrpar taesung89 yaoshengfu
Fixes https://github.com/pytorch/pytorch/issues/11476
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12671
Differential Revision: D10410232
Pulled By: SsnL
fbshipit-source-id: c447951844a30366d8c196bf9436340e88f3b6d9
Summary:
Add dtype argument to softmax/log_softmax functions.
Computing softmax in fp32 precision is necessary for mixed precision training; converting the output of the previous layer into fp32 and then reading it as fp32 in softmax is expensive, both memory- and perf-wise, and this PR allows one to avoid it.
For most input data/dtype combinations, the input data is converted to `dtype` and then softmax is computed. If the input data is half type and `dtype` is fp32, kernels with the corresponding template arguments are called.
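A hedged usage sketch: fp16 activations with the softmax computed and returned in fp32, without first materializing an fp32 copy of the input in user code.
```python
import torch
import torch.nn.functional as F

logits = torch.randn(8, 1000).half()                    # fp16 activations
probs = F.softmax(logits, dim=-1, dtype=torch.float32)  # computed and returned in fp32
print(probs.dtype)  # torch.float32
```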
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11719
Reviewed By: ezyang
Differential Revision: D10175514
Pulled By: zou3519
fbshipit-source-id: 06d285af91a0b659932236d41ad63b787eeed243
Summary:
- fixes https://github.com/pytorch/pytorch/issues/10723
- migrate PReLU to ATen and deprecate legacy PReLU
- performance:
CPU with weight.numel() = 1
```
>>> m = nn.PReLU()
>>> x = torch.randn(100, 100, 100, requires_grad=True)
>>> %timeit -r 100 y = m(x)
100 loops, best of 100: 9.43 ms per loop
>>> y = m(x).sum()
>>> %timeit -r 100 y.backward(retain_graph=True)
10 loops, best of 100: 24.4 ms per loop
>>> m = nn.PReLU()
>>> x = torch.randn(100, 100, 100, requires_grad=True)
>>> %timeit -r 100 y = m(x)
1000 loops, best of 100: 695 µs per loop
>>> y = m(x).sum()
>>> %timeit -r 100 y.backward(retain_graph=True)
100 loops, best of 100: 2.47 ms per loop
```
CPU with weight.numel() = channels
```
>>> m = nn.PReLU(100)
>>> x = torch.randn(100, 100, 100, requires_grad=True)
>>> %timeit -r 100 y = m(x)
1000 loops, best of 100: 603 µs per loop
>>> y = m(x).sum()
>>> %timeit -r 100 y.backward(retain_graph=True)
100 loops, best of 100: 13.3 ms per loop
>>> m = nn.PReLU(100)
>>> x = torch.randn(100, 100, 100, requires_grad=True)
>>> %timeit -r 100 y = m(x)
1000 loops, best of 100: 655 µs per loop
>>> y = m(x).sum()
>>> %timeit -r 100 y.backward(retain_graph=True)
100 loops, best of 100: 2.45 ms per loop
```
CUDA with weight.numel() = 1
```
>>> m = nn.PReLU().cuda()
>>> x = torch.randn(100, 100, 100, requires_grad=True).cuda()
>>> %timeit -r 100 torch.cuda.synchronize(); y = m(x); torch.cuda.synchronize();
10000 loops, best of 100: 187 µs per loop
>>> y = m(x).sum()
>>> %timeit -r 100 torch.cuda.synchronize(); y.backward(retain_graph=True); torch.cuda.synchronize();
100 loops, best of 100: 2.01 ms per loop
>>> m = nn.PReLU().cuda()
>>> x = torch.randn(100, 100, 100, requires_grad=True).cuda()
>>> %timeit -r 100 torch.cuda.synchronize(); y = m(x); torch.cuda.synchronize();
1000 loops, best of 100: 195 µs per loop
>>> y = m(x).sum()
>>> %timeit -r 100 torch.cuda.synchronize(); y.backward(retain_graph=True); torch.cuda.synchronize();
100 loops, best of 100: 2.28 ms per loop
```
CUDA with weight.numel() = channel
```
>>> m = nn.PReLU(100).cuda()
>>> x = torch.randn(100, 100, 100, requires_grad=True).cuda()
>>> %timeit -r 100 torch.cuda.synchronize(); y = m(x); torch.cuda.synchronize();
1000 loops, best of 100: 174 µs per loop
>>> y = m(x).sum()
>>> %timeit -r 100 torch.cuda.synchronize(); y.backward(retain_graph=True); torch.cuda.synchronize();
100 loops, best of 100: 2.27 ms per loop
>>> m = nn.PReLU(100).cuda()
>>> x = torch.randn(100, 100, 100, requires_grad=True).cuda()
>>> %timeit -r 100 torch.cuda.synchronize(); y = m(x); torch.cuda.synchronize();
10000 loops, best of 100: 181 µs per loop
>>> y = m(x).sum()
>>> %timeit -r 100 torch.cuda.synchronize(); y.backward(retain_graph=True); torch.cuda.synchronize();
100 loops, best of 100: 2.26 ms per loop
```
The huge performance regression on CPU when weight.numel() = 1 is addressed by replacing at::CPU_tensor_apply* with parallelized kernels.
ezyang SsnL zou3519 soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11758
Differential Revision: D9995799
Pulled By: weiyangfb
fbshipit-source-id: d289937c78075f46a54dafbde92fab0cc4b5b86e
Summary:
Related to #11624, adding `maxes` to the function definition of `embedding_bag`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11784
Differential Revision: D9892598
Pulled By: ezyang
fbshipit-source-id: e6372ccf631826ddf1e1885b2f8f75f354a36c0b
Summary:
I'm reading the doc of `torch.nn.functional.pad` and it looks a bit confusing to me. Hopefully this PR makes it clearer.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11623
Differential Revision: D9818255
Pulled By: soumith
fbshipit-source-id: 4f6b17b0211c6927007f44bfdf42df5f84d47536