Summary:
https://github.com/pytorch/pytorch/issues/38349
mruberry
I'm not entirely sure whether all of these changes are necessary, given how functions are added to PyTorch.
Should it throw an error when called with a non-complex tensor? NumPy allows non-complex arrays in its imag() function, which its isreal() uses, but PyTorch's imag() throws an error for non-complex arrays.
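For concreteness, a small sketch of the discrepancy in question (outputs as produced by current NumPy/PyTorch; illustrative only):
```
import numpy as np
import torch

# NumPy's imag() returns zeros for real input, so np.isreal works on any dtype:
np.imag(np.array([1.0, 2.0]))    # array([0., 0.])
np.isreal(np.array([1.0, 2.0]))  # array([ True,  True])

# PyTorch's imag() throws for non-complex tensors, so torch.isreal has to
# special-case them rather than reuse imag():
# torch.imag(torch.tensor([1.0, 2.0]))  # RuntimeError
torch.isreal(torch.tensor([1 + 0j, 2 + 1j]))  # tensor([ True, False])
```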
Where does assertONNX() get its expected output to compare to?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41298
Reviewed By: ngimel
Differential Revision: D22610500
Pulled By: mruberry
fbshipit-source-id: 817d61f8b1c3670788b81690636bd41335788439
Summary:
Implements the quantile operator, similar to [numpy.quantile](https://numpy.org/devdocs/reference/generated/numpy.quantile.html).
For this implementation I'm reducing it to existing torch operators to get a CUDA implementation for free. It would be more efficient to run the quickselect algorithm multiple times instead of sorting, but this can be addressed in a future PR.
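A minimal usage sketch of the new operator, mirroring the NumPy semantics linked above (outputs illustrative):
```
import numpy as np
import torch

x = torch.arange(10, dtype=torch.float64)
q = torch.tensor([0.25, 0.50, 0.75], dtype=torch.float64)

torch.quantile(x, q)               # tensor([2.2500, 4.5000, 6.7500], dtype=torch.float64)
np.quantile(x.numpy(), q.numpy())  # array([2.25, 4.5 , 6.75]) -- same result
```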
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39417
Reviewed By: mruberry
Differential Revision: D22525217
Pulled By: heitorschueroff
fbshipit-source-id: 27a8bb23feee24fab7f8c228119d19edbb6cea33
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41575
Fixes https://github.com/pytorch/pytorch/issues/34294
This updates the C++ argument parser to correctly handle `TensorList` operands. I've also included a number of updates to the testing infrastructure; this is because we're now doing a much more careful job of testing the signatures of aten kernels, using the type information about the arguments as read from `Declarations.yaml`. The changes to the tests are required because we're now only checking for `__torch_function__` attributes on `Tensor`, `Optional[Tensor]`, and elements of `TensorList` operands, whereas before we were checking for `__torch_function__` on all operands. The relatively simplistic approach the tests were using before -- assuming all positional arguments might be tensors -- doesn't work anymore. I now think that checking for `__torch_function__` on all operands was a mistake in the original design.
The updates to the signatures of the `lambda` functions are to handle this new, more stringent checking of signatures.
I also added override support for `torch.nn.functional.threshold` and `torch.nn.functional.layer_norm`, which did not yet have Python-level support.
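As an illustration (a sketch against a recent PyTorch with the default `__torch_function__` implementation on `Tensor`, not code from this PR), an override is now found even when the overriding object appears inside a `TensorList` argument such as the input to `torch.cat`:
```
import torch

class LoggingTensor(torch.Tensor):
    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        kwargs = kwargs or {}
        print(f"intercepted {func.__name__}")
        # Defer to the default implementation for the actual computation.
        return super().__torch_function__(func, types, args, kwargs)

a = torch.randn(2).as_subclass(LoggingTensor)
b = torch.randn(2)
torch.cat([a, b])  # prints "intercepted cat": the TensorList element dispatches
```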
Benchmarks are still WIP.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34725
Reviewed By: mruberry
Differential Revision: D22357738
Pulled By: ezyang
fbshipit-source-id: 0e7f4a58517867b2e3f193a0a8390e2ed294e1f3
Summary:
Fixes https://github.com/pytorch/pytorch/issues/36403
Copy-paste of the issue description:
* Escape hatch: Introduce unsafe_* versions of the three functions above that keep the current behavior (outputs not tracked as views). The documentation will explain in detail why they are unsafe and when it is safe to use them (basically, only the outputs OR the input can be modified in place, but not both; otherwise, you will get wrong gradients).
* Deprecation: Use the CreationMeta on views to track views created by these three ops, and throw a warning when any of the views is modified in place, saying that this is deprecated and will raise an error soon. Users who really need to modify these views in place should look at the doc of the unsafe_* version to make sure their use case is valid:
  * If it is not, then PyTorch is computing wrong gradients for their use case, and they should stop doing the inplace modification.
  * If it is, then they can use the unsafe_* version to keep the current behavior.
* Removal: Use the CreationMeta on views to prevent any inplace modification of these views (like we do for all other views coming from multi-output Nodes). Users will still be able to use the unsafe_* versions if they really need to do this.
Note about BC-breaking:
- This PR changes the behavior of the regular functions by making them return proper views now. This is a modification that the user will be able to see.
- We skip all the view logic for these views, so the code should behave the same as before (except for the change in the `._is_view()` value).
- Even though the view logic is not performed, we do raise deprecation warnings for the cases where performing these ops would throw an error. A short sketch of the resulting behavior follows.
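A minimal sketch of the states described above (illustrative; the exact warning and error text is an assumption):
```
import torch

x = torch.randn(4, requires_grad=True).clone()
a, b = torch.split(x, 2)
# a.mul_(2)  # at the Deprecation stage this warns; after Removal it errors,
#            # like other views coming from multi-output Nodes

# unsafe_split keeps the old untracked-view behavior: correct gradients only
# if you modify the outputs OR the input in place, never both.
c, d = torch.unsafe_split(x, 2)
c.mul_(2)  # no warning
```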
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39299
Differential Revision: D22432885
Pulled By: albanD
fbshipit-source-id: 324aef091b32ce69dd067fe9b13a3f17d85d0f12
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39342
Many networks, such as ResNet, have adds followed by ReLUs. This op is the
first step in enabling a fused implementation.
Once we have the fused add_relu op, a JIT pass will be written to
replace add + relu patterns with add_relu, as sketched below.
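For concreteness, the targeted pattern (illustrative only; the fused kernel is an internal aten op that the planned JIT rewrite would call, rather than something users invoke directly):
```
import torch

x = torch.randn(8)
y = torch.randn(8)

# Today: two kernel launches and a temporary for the intermediate sum.
out = torch.relu(torch.add(x, y))

# After the planned JIT pass, graphs containing this add + relu pattern
# would instead call the single fused add_relu kernel introduced here.
```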
Test Plan:
python test/test_nn.py TestAddRelu
Imported from OSS
Differential Revision: D21822397
fbshipit-source-id: 03df83a3e46ddb48a90c5a6f755227a7e361a0e8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38490
A meta tensor is a tensor that is a lot like a normal tensor,
except it doesn't actually have any data associated with it.
You can use them to carry out shape/dtype computations without
having to run the actual kernels; for example, this could
be used to do shape inference in a JIT analysis pass.
Check out the description in DispatchKey.h for more information.
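For a flavor of the idea (sketched with the modern `device="meta"` spelling; treat this as illustrative of the concept rather than the exact API in this PR):
```
import torch

a = torch.empty(2, 3, device="meta")
b = torch.empty(2, 3, device="meta")
c = torch.add(a, b)       # no data is allocated or touched
print(c.shape, c.dtype)   # torch.Size([2, 3]) torch.float32 -- metadata only
```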
Meta tensors are part of a larger project to rationalize how we
write kernels so that we don't have to duplicate shape logic
in CPU kernel, CUDA kernel and meta kernel (this PR makes the
duplication problem worse!) However, that infrastructure can
be built on top of this proof of concept, which just shows how
you can start writing meta kernels today even without this
infrastructure.
There are a lot of things that don't work:
- I special-cased printing for dense tensors only; if you try to
allocate a meta sparse / quantized tensor things aren't going
to work.
- The printing formula implies that torch.tensor() can take an
ellipsis, but I didn't add this.
- I wrote an example formula for binary operators, but it isn't
even right! (It doesn't do type promotion or memory layout
correctly.) The most future-proof way to do it right is to
factor the relevant computation out of TensorIterator,
as it is quite involved.
- Nothing besides torch.add works right now.
- Meta functions are ALWAYS included in mobile builds (selective
build doesn't work on them). This isn't a big deal for now
but will become more pressing as more meta functions are added.
One reason I'm putting up this PR now is to check with Yinghai Lu
whether we can unblock shape inference for accelerators while we are
still working on a long-term plan for how to unify all shape
computation across our kernels.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D21935609
Pulled By: ezyang
fbshipit-source-id: f7d8636eeb8516b6bc296db99a16e56029972eee
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39203
Adds logic and test coverage for optional weights and biases for
the quantized normalization operators. This was broken before this
PR because the `TORCH_LIBRARY` registration had these as required parameters;
removed that requirement and cleaned up the call sites.
Note: consolidating the registrations in `native_functions.yaml` as opposed to `library.cpp`
after a discussion with ezyang.
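For example, a call like the following (a sketch; the registered signature shown is an assumption modeled on the fp32 counterpart) no longer fails when weight and bias are omitted:
```
import torch

x = torch.quantize_per_tensor(torch.randn(2, 4), scale=0.1, zero_point=0,
                              dtype=torch.quint8)
# weight=None / bias=None is now accepted instead of being required:
y = torch.ops.quantized.layer_norm(x, [4], weight=None, bias=None, eps=1e-5,
                                   output_scale=0.1, output_zero_point=0)
```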
Test Plan:
```
python test/test_quantization.py TestQuantizedOps.test_qlayer_norm
python test/test_quantization.py TestQuantizedOps.test_group_norm
python test/test_quantization.py TestQuantizedOps.test_instance_norm
python test/test_quantization.py TestStaticQuantizedModule.test_layer_norm
python test/test_quantization.py TestStaticQuantizedModule.test_group_norm
python test/test_quantization.py TestStaticQuantizedModule.test_instance_norm
python test/test_quantization.py TestQuantizeScriptPTSQOps.test_layer_norm
python test/test_quantization.py TestQuantizeScriptPTSQOps.test_group_norm
python test/test_quantization.py TestQuantizeScriptPTSQOps.test_instance_norm
```
Imported from OSS
Differential Revision: D21885259
fbshipit-source-id: 978c7b8bd6c11a03e9e5fdb68f154cb80cc43599
Summary:
Adds a `torch.experimental.deterministic` flag to enforce deterministic algorithms across all of PyTorch.
Adds `torch.experimental.deterministic_error_level` to allow users to choose between error/warning/silent behavior when a deterministic implementation of an operation is not available.
Adds `torch.experimental.alert_not_deterministic()`, which should be called within operations that are not deterministic.
Offers both Python and ATen interfaces.
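A usage sketch (note: to keep the snippet runnable on current PyTorch it uses `torch.use_deterministic_algorithms`, the name this functionality eventually shipped under; this PR proposes the `torch.experimental` surface described above):
```
import torch

torch.use_deterministic_algorithms(True)   # flag nondeterministic ops

# Deterministic ops proceed normally:
x = torch.randn(3, 3, requires_grad=True)
(x @ x).sum().backward()

# An op with only a nondeterministic implementation (e.g., certain CUDA
# kernels) would raise a RuntimeError here instead of silently proceeding.
torch.use_deterministic_algorithms(False)
```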
Issue https://github.com/pytorch/pytorch/issues/15359
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38683
Differential Revision: D21998093
Pulled By: ezyang
fbshipit-source-id: 23aabbddd20f6199d846f97764ff24d728163737
Summary:
This PR aims to add `arccosh`, `arcsinh` and `arctanh` support. Please see issue https://github.com/pytorch/pytorch/issues/38349 for more details.
**TODOs:**
* [x] Add test cases for `arccosh`, `arcsinh` and `arctanh`. (need help)
* [x] Overload ops if `std::op` does not work with `thrust::complex` types (as for `sinh`, `cosh`).
Note: `std::acosh`, `std::asinh` and `std::atanh` do not support `thrust::complex` types. Added support for complex types for these 3 ops (`arccosh`, `arcsinh`, `arctanh`).
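Usage sketch (current PyTorch exposes these as `acosh`/`asinh`/`atanh`, with `arccosh`/`arcsinh`/`arctanh` as aliases; outputs omitted):
```
import torch

x = torch.tensor([1.5, 2.0])
torch.acosh(x)                    # inverse hyperbolic cosine
torch.asinh(x)
torch.atanh(torch.tensor([0.5]))

# Complex support added here, since std::acosh/asinh/atanh do not accept
# thrust::complex:
torch.asinh(torch.tensor([1 + 1j]))
```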
cc: mruberry
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38388
Differential Revision: D21882055
Pulled By: mruberry
fbshipit-source-id: d334590b47c5a89e491a002c3e41e6ffa89000e3
Summary:
Fixes https://github.com/pytorch/pytorch/issues/37259, fixes https://github.com/pytorch/pytorch/issues/20156
This lazily calls `at::init_num_threads` once per thread by adding a call to `lazy_init_num_threads` in `at::parallel_for` and `at::parallel_reduce`.
If this solution is okay, then we should add the same guard to other places that might use MKL or OpenMP.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37461
Reviewed By: ezyang
Differential Revision: D21472763
Pulled By: ilia-cher
fbshipit-source-id: 889d6664f5bd4080037ade02ee324b1233992915
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36847
Adds a quantized instancenorm operator, which can reuse most of
groupnorm's logic.
Benchmarking shows that the quantized version is about 10x faster than
floating point for equivalent input sizes
(https://gist.github.com/vkuzo/2f230e84d26f26cc6030afdbfbc8e7f0)
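A hypothetical call into the new op (the registered signature here is an assumption modeled on the fp32 `instance_norm`):
```
import torch

xq = torch.quantize_per_tensor(torch.randn(1, 2, 4, 4), scale=0.1,
                               zero_point=0, dtype=torch.quint8)
yq = torch.ops.quantized.instance_norm(
    xq, torch.ones(2), torch.zeros(2),   # per-channel weight and bias
    1e-5,                                # eps
    0.1, 0)                              # output scale / zero_point
```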
Test Plan:
```
python test/quantization/test_quantized.py TestQuantizedOps.test_instance_norm
```
Imported from OSS
Differential Revision: D21107925
fbshipit-source-id: 6bacda402f0eb9857bc8f9a5cf8ef306150613d4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36835
Adds a quantized groupnorm operator. We reuse most of the layernorm
kernel, modifying it to be able to perform channel-wise scaling.
Benchmark results: the quantized layer is between 6x and 15x faster
than fp, depending on input shapes
(full results:
https://gist.github.com/vkuzo/db67623232415382dabff6c8923124e9)
Test Plan:
```
python test/quantization/test_quantized.py TestQuantizedOps.test_group_norm
python test/quantization/test_quantized.py TestQuantizedOps.test_qlayer_norm
```
Numerics are nearly equivalent, with the only difference documented
in the test case. The difference is of the same kind as with quantized
layernorm. Making numerics exactly equivalent is possible but would
sacrifice speed.
Imported from OSS
Differential Revision: D21107926
fbshipit-source-id: 80e87e9e2c71310bc28c3d114c88de428819cb45
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36815
PyTorch does not have a native channel shuffle op.
This diff adds one for both fp and quantized tensors.
The FP implementation is an inefficient one; for quantized tensors there is a native
QNNPACK op for this.
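For reference, the op's semantics (a manual sketch; the new native op performs this directly, with the QNNPACK path covering quantized tensors):
```
import torch

def channel_shuffle_ref(x, groups):
    # Interleave channels across groups: (N, C, H, W) -> (N, C, H, W)
    n, c, h, w = x.shape
    return (x.view(n, groups, c // groups, h, w)
             .transpose(1, 2)
             .reshape(n, c, h, w))

x = torch.arange(8.0).view(1, 8, 1, 1)
channel_shuffle_ref(x, 2).flatten()  # tensor([0., 4., 1., 5., 2., 6., 3., 7.])
```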
ghstack-source-id: 103267234
Test Plan:
buck run caffe2/test:quantization --
quantization.test_quantized.TestQuantizedOps.test_channel_shuffle
The x86 implementation in QNNPACK is SSE2, so this may not be the most efficient
for x86.
Reviewed By: dreiss
Differential Revision: D21093841
fbshipit-source-id: 5282945f352df43fdffaa8544fe34dba99a5b97e
Summary:
Adds support for generating Vandermonde matrices, based on the NumPy implementation found [here](https://github.com/numpy/numpy/blob/v1.17.0/numpy/lib/twodim_base.py#L475-L563).
Adds a test to ensure the generated matrix matches the expected NumPy output. Note: tests are limited to torch.long and torch.double due to differences in how PyTorch and NumPy handle type promotion.
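Usage sketch mirroring `numpy.vander` (outputs illustrative):
```
import torch

x = torch.tensor([1, 2, 3])
torch.vander(x, N=3)
# tensor([[1, 1, 1],
#         [4, 2, 1],
#         [9, 3, 1]])
torch.vander(x, N=3, increasing=True)
# tensor([[1, 1, 1],
#         [1, 2, 4],
#         [1, 3, 9]])
```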
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36725
Differential Revision: D21075138
Pulled By: jessebrizzi
fbshipit-source-id: 6bb1559e8247945714469b0e2b07c6f4d5fd1fd0
Summary:
Notes:
1. Didn't name them `_copy_real` and `_copy_imag` because it's desirable (but not necessary) to have these methods as tensor methods.
2. Replaced old `.real()` and `.imag()` call sites with the `_copy_real()` and `_copy_imag()` methods.
3. Didn't add documentation because we plan to remove these methods once real and imag are added as tensor attributes.
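For context, the attribute-based design this is moving toward (shown with the behavior that later shipped, where real and imag are views):
```
import torch

z = torch.tensor([1 + 2j, 3 - 4j])
z.real        # tensor([1., 3.]) -- a view into z, not a copy
z.imag        # tensor([ 2., -4.])
z.real += 10  # writes through to z, since these are views
```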
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35879
Differential Revision: D20841760
Pulled By: anjali411
fbshipit-source-id: 7267e6fbaab9a5ce426e9396f12238994666b0dd
Summary:
Since the last one was apparently reverted.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35530
Differential Revision: D20777341
Pulled By: ezyang
fbshipit-source-id: 6aaaf2a0755359074ae3d0efe32018d78dafe976