This PR heavily simplifies the code of `linalg.solve`. At the same time,
this implementation saves quite a few copies of the input data in some
cases (e.g. A is contiguous)
We also implement it in such a way that the derivative goes from
computing two LU decompositions and two LU solves to no LU
decompositions and one LU solves. It also avoids a number of unnecessary
copies the derivative was unnecessarily performing (at least the copy of
two matrices).
On top of this, we add a `left` kw-only arg that allows the user to
solve `XA = B` rather concisely.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74046
Approved by: https://github.com/nikitaved, https://github.com/IvanYashchuk, https://github.com/mruberry
```Python
chebyshev_polynomial_v(input, n, *, out=None) -> Tensor
```
Chebyshev polynomial of the third kind $V_{n}(\text{input})$.
```Python
chebyshev_polynomial_w(input, n, *, out=None) -> Tensor
```
Chebyshev polynomial of the fourth kind $W_{n}(\text{input})$.
```Python
legendre_polynomial_p(input, n, *, out=None) -> Tensor
```
Legendre polynomial $P_{n}(\text{input})$.
```Python
shifted_chebyshev_polynomial_t(input, n, *, out=None) -> Tensor
```
Shifted Chebyshev polynomial of the first kind $T_{n}^{\ast}(\text{input})$.
```Python
shifted_chebyshev_polynomial_u(input, n, *, out=None) -> Tensor
```
Shifted Chebyshev polynomial of the second kind $U_{n}^{\ast}(\text{input})$.
```Python
shifted_chebyshev_polynomial_v(input, n, *, out=None) -> Tensor
```
Shifted Chebyshev polynomial of the third kind $V_{n}^{\ast}(\text{input})$.
```Python
shifted_chebyshev_polynomial_w(input, n, *, out=None) -> Tensor
```
Shifted Chebyshev polynomial of the fourth kind $W_{n}^{\ast}(\text{input})$.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78304
Approved by: https://github.com/mruberry
Adds:
```Python
bessel_j0(input, *, out=None) -> Tensor
```
Bessel function of the first kind of order $0$, $J_{0}(\text{input})$.
```Python
bessel_j1(input, *, out=None) -> Tensor
```
Bessel function of the first kind of order $1$, $J_{1}(\text{input})$.
```Python
bessel_j0(input, *, out=None) -> Tensor
```
Bessel function of the second kind of order $0$, $Y_{0}(\text{input})$.
```Python
bessel_j1(input, *, out=None) -> Tensor
```
Bessel function of the second kind of order $1$, $Y_{1}(\text{input})$.
```Python
modified_bessel_i0(input, *, out=None) -> Tensor
```
Modified Bessel function of the first kind of order $0$, $I_{0}(\text{input})$.
```Python
modified_bessel_i1(input, *, out=None) -> Tensor
```
Modified Bessel function of the first kind of order $1$, $I_{1}(\text{input})$.
```Python
modified_bessel_k0(input, *, out=None) -> Tensor
```
Modified Bessel function of the second kind of order $0$, $K_{0}(\text{input})$.
```Python
modified_bessel_k1(input, *, out=None) -> Tensor
```
Modified Bessel function of the second kind of order $1$, $K_{1}(\text{input})$.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78451
Approved by: https://github.com/mruberry
Adds:
```Python
chebyshev_polynomial_t(input, n, *, out=None) -> Tensor
```
Chebyshev polynomial of the first kind $T_{n}(\text{input})$.
If $n = 0$, $1$ is returned. If $n = 1$, $\text{input}$ is returned. If $n < 6$ or $|\text{input}| > 1$ the recursion:
$$T_{n + 1}(\text{input}) = 2 \times \text{input} \times T_{n}(\text{input}) - T_{n - 1}(\text{input})$$
is evaluated. Otherwise, the explicit trigonometric formula:
$$T_{n}(\text{input}) = \text{cos}(n \times \text{arccos}(x))$$
is evaluated.
## Derivatives
Recommended $k$-derivative formula with respect to $\text{input}$:
$$2^{-1 + k} \times n \times \Gamma(k) \times C_{-k + n}^{k}(\text{input})$$
where $C$ is the Gegenbauer polynomial.
Recommended $k$-derivative formula with respect to $\text{n}$:
$$\text{arccos}(\text{input})^{k} \times \text{cos}(\frac{k \times \pi}{2} + n \times \text{arccos}(\text{input})).$$
## Example
```Python
x = torch.linspace(-1, 1, 256)
matplotlib.pyplot.plot(x, torch.special.chebyshev_polynomial_t(x, 10))
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78196
Approved by: https://github.com/mruberry
`nn.Transformer` is not possible to be used to implement BERT, while `nn.TransformerEncoder` does. So this PR moves the sentence 'Users can build the BERT model with corresponding parameters.' from `nn.Transformer` to `nn.TransformerEncoder`.
Fixes#68053
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78337
Approved by: https://github.com/jbschlosser
Euler beta function:
```Python
torch.special.beta(input, other, *, out=None) → Tensor
```
`reentrant_gamma` and `reentrant_ln_gamma` implementations (using Stirling’s approximation) are provided. I started working on this before I realized we were missing a gamma implementation (despite providing incomplete gamma implementations). Uses the coefficients computed by Steve Moshier to replicate SciPy’s implementation. Likewise, it mimics SciPy’s behavior (instead of the behavior in Cephes).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78031
Approved by: https://github.com/mruberry
This PR modifies `lu_unpack` by:
- Using less memory when unpacking `L` and `U`
- Fuse the subtraction by `-1` with `unpack_pivots_stub`
- Define tensors of the correct types to avoid copies
- Port `lu_unpack` to be a strucutred kernel so that its `_out` version
does not incur on extra copies
Then we implement `linalg.lu` as a structured kernel, as we want to
compute its derivative manually. We do so because composing the
derivatives of `torch.lu_factor` and `torch.lu_unpack` would be less efficient.
This new function and `lu_unpack` comes with all the things it can come:
forward and backward ad, decent docs, correctness tests, OpInfo, complex support,
support for metatensors and support for vmap and vmap over the gradients.
I really hope we don't continue adding more features.
This PR also avoids saving some of the tensors that were previously
saved unnecessarily for the backward in `lu_factor_ex_backward` and
`lu_backward` and does some other general improvements here and there
to the forward and backward AD formulae of other related functions.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67833
Approved by: https://github.com/IvanYashchuk, https://github.com/nikitaved, https://github.com/mruberry
This functionality does not seem to be used
and there are some requests to update dependency.
Add `third_party` to torch_cpu include directories if compiling with
Caffe2 support, as `caffe2/quantization/server/conv_dnnlowp_op.cc` depends on `third_party/fbgemm/src/RefImplementations.h`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75394
Approved by: https://github.com/janeyx99, https://github.com/seemethere
This PR adds a function for computing the LDL decomposition and a function that can solve systems of linear equations using this decomposition. The result of `torch.linalg.ldl_factor_ex` is in a compact form and it's required to use it only through `torch.linalg.ldl_solve`. In the future, we could provide `ldl_unpack` function that transforms the compact representation into explicit matrices.
Fixes https://github.com/pytorch/pytorch/issues/54847.
cc @jianyuh @nikitaved @pearu @mruberry @walterddr @IvanYashchuk @xwang233 @Lezcano
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69828
Approved by: https://github.com/Lezcano, https://github.com/mruberry, https://github.com/albanD
Closes#44459
This migrates the python implementation of `_pad_circular` to ATen and
removes the old C++ implementation that had diverged from python.
Note that `pad` can't actually use this until the
forward-compatibility period is over.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73410
Approved by: https://github.com/ezyang
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73378
1) ran check_for_c10_loops.py to automatically update all files (*.h, *.hpp, *.cpp) under fbcode/caffe2/torch (this is the path in the check_for_c10_loops.py, slightly different from the task description where the path mentioned was fbcode/caffe2. since current commit already contains 27 files, will use a separate commit for additional files).
2) manually reviewed each change, and reverted a few files:
(a) select_keys.cpp, bucketize_calibration.cpp, index_mmh and TCPStore.cpp: iterator modified in loop
(b) qlinear_4bit_ops.cpp and id_list_feature_merge_conversion.cpp: condition containing multiple expressions.
Test Plan:
Doing the following (still in progress, will address issues as they appear):
buck build ...
buck test ...
Reviewed By: r-barnes
Differential Revision: D34435473
fbshipit-source-id: b8d3c94768b02cf71ecb24bb58d29ee952f672c2
(cherry picked from commit fa9b0864f3761a501868fe0373204b12fdfc2b32)
Summary:
With this change, the optimizer is almost twice as fast as before. As the result of the first call is never used, it looks like a copy paste error and therefore can be removed. In addition, this duplicate call is not present in the Python implementation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72773
Reviewed By: samdow
Differential Revision: D34214312
Pulled By: albanD
fbshipit-source-id: 4f4de08633c7236f3ccce8a2a74e56500003281b
(cherry picked from commit 4a63f812ab)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68693
Generation of python bindings for native functions is split over 8
different files. One for each namespace, with the torch namespace
split into 3 shards, and methods in their own file as well. This
change ensures that editing any single (non-method) operator only
causes one of these files to be rebuilt.
Test Plan: Imported from OSS
Reviewed By: jbschlosser
Differential Revision: D32596270
Pulled By: albanD
fbshipit-source-id: 0570ec69e7476b8f1bc21138ba18fe8f95ebbe3f
(cherry picked from commit ba0fc71a3a)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70326
See D24145988 for context: it allows loops such as for(int i=0;i<10;i++) to be expressed as for(const auto i : c10::irange(10)). This is nice because it auto-types the loops and adds const-safety to the iteration variable.
Test Plan: buck run //caffe2/torch/fb/sparsenn:test
Reviewed By: r-barnes
Differential Revision: D33243400
fbshipit-source-id: b1f1b4163f4bf662031baea9e5268459b40c69a3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66933
This PR exposes `torch.lu` as `torch.linalg.lu_factor` and
`torch.linalg.lu_factor_ex`.
This PR also adds support for matrices with zero elements both in
the size of the matrix and the batch. Note that this function simply
returns empty tensors of the correct size in this case.
We add a test and an OpInfo for the new function.
This PR also adds documentation for this new function in line of
the documentation in the rest of `torch.linalg`.
Fixes https://github.com/pytorch/pytorch/issues/56590
Fixes https://github.com/pytorch/pytorch/issues/64014
cc jianyuh nikitaved pearu mruberry walterddr IvanYashchuk xwang233 Lezcano
Test Plan: Imported from OSS
Reviewed By: gchanan
Differential Revision: D32834069
Pulled By: mruberry
fbshipit-source-id: 51ef12535fa91d292f419acf83b800b86ee9c7eb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69421
I've hit a lot of build issues in D32671972, and I've come to realize that a lot of it boils down to header hygene. `function.h` includes `profiler.h` *solely* to transitively include `record_function.h` which winds up leaking the profiler symbols. Moreover several files are relying on transitive includes to get access to `getTime`. As long as I have to touch all the places that use `getTime`, I may as well also move them to the new namespace.
Test Plan: Unit tests and CI.
Reviewed By: aaronenyeshi, albanD
Differential Revision: D32865907
fbshipit-source-id: f87d6fd5afb784dca2146436e72c69e34623020e
Summary:
Follow up to https://github.com/pytorch/pytorch/issues/68095
This also changes the files from the ATen folder to include c10's `Export.h` instead since they can't ever be exporting `TORCH_PYTHON_API`.
cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69585
Reviewed By: mrshenli
Differential Revision: D32958594
Pulled By: albanD
fbshipit-source-id: 1ec7ef63764573fa2b486928955e3a1172150061
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66933
This PR exposes `torch.lu` as `torch.linalg.lu_factor` and
`torch.linalg.lu_factor_ex`.
This PR also adds support for matrices with zero elements both in
the size of the matrix and the batch. Note that this function simply
returns empty tensors of the correct size in this case.
We add a test and an OpInfo for the new function.
This PR also adds documentation for this new function in line of
the documentation in the rest of `torch.linalg`.
Fixes https://github.com/pytorch/pytorch/issues/56590
Fixes https://github.com/pytorch/pytorch/issues/64014
cc jianyuh nikitaved pearu mruberry walterddr IvanYashchuk xwang233 Lezcano
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D32521980
Pulled By: mruberry
fbshipit-source-id: 26a49ebd87f8a41472f8cd4e9de4ddfb7f5581fb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63568
This PR adds the first solver with structure to `linalg`. This solver
has an API compatible with that of `linalg.solve` preparing these for a
possible future merge of the APIs. The new API:
- Just returns the solution, rather than the solution and a copy of `A`
- Removes the confusing `transpose` argument and replaces it by a
correct handling of conj and strides within the call
- Adds a `left=True` kwarg. This can be achieved via transposes of the
inputs and the result, but it's exposed for convenience.
This PR also implements a dataflow that minimises the number of copies
needed before calling LAPACK / MAGMA / cuBLAS and takes advantage of the
conjugate and neg bits.
This algorithm is implemented for `solve_triangular` (which, for this, is
the most complex of all the solvers due to the `upper` parameters).
Once more solvers are added, we will factor out this calling algorithm,
so that all of them can take advantage of it.
Given the complexity of this algorithm, we implement some thorough
testing. We also added tests for all the backends, which was not done
before.
We also add forward AD support for `linalg.solve_triangular` and improve the
docs of `linalg.solve_triangular`. We also fix a few issues with those of
`torch.triangular_solve`.
Resolves https://github.com/pytorch/pytorch/issues/54258
Resolves https://github.com/pytorch/pytorch/issues/56327
Resolves https://github.com/pytorch/pytorch/issues/45734
cc jianyuh nikitaved pearu mruberry walterddr IvanYashchuk xwang233 Lezcano
Test Plan: Imported from OSS
Reviewed By: jbschlosser
Differential Revision: D32588230
Pulled By: mruberry
fbshipit-source-id: 69e484849deb9ad7bb992cc97905df29c8915910
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68672
This PR adds `python_module: sparse` to `native_function.yaml`.
These functions would appear in `torch._C._sparse` namespace instead of
just `torch`.
Test Plan: Imported from OSS
Reviewed By: mruberry
Differential Revision: D32517813
fbshipit-source-id: 7c3d6df57a24d7c7354d0fefe1b628dc89be9431
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63568
This PR adds the first solver with structure to `linalg`. This solver
has an API compatible with that of `linalg.solve` preparing these for a
possible future merge of the APIs. The new API:
- Just returns the solution, rather than the solution and a copy of `A`
- Removes the confusing `transpose` argument and replaces it by a
correct handling of conj and strides within the call
- Adds a `left=True` kwarg. This can be achieved via transposes of the
inputs and the result, but it's exposed for convenience.
This PR also implements a dataflow that minimises the number of copies
needed before calling LAPACK / MAGMA / cuBLAS and takes advantage of the
conjugate and neg bits.
This algorithm is implemented for `solve_triangular` (which, for this, is
the most complex of all the solvers due to the `upper` parameters).
Once more solvers are added, we will factor out this calling algorithm,
so that all of them can take advantage of it.
Given the complexity of this algorithm, we implement some thorough
testing. We also added tests for all the backends, which was not done
before.
We also add forward AD support for `linalg.solve_triangular` and improve the
docs of `linalg.solve_triangular`. We also fix a few issues with those of
`torch.triangular_solve`.
Resolves https://github.com/pytorch/pytorch/issues/54258
Resolves https://github.com/pytorch/pytorch/issues/56327
Resolves https://github.com/pytorch/pytorch/issues/45734
cc jianyuh nikitaved pearu mruberry walterddr IvanYashchuk xwang233 Lezcano
Test Plan: Imported from OSS
Reviewed By: zou3519, JacobSzwejbka
Differential Revision: D32283178
Pulled By: mruberry
fbshipit-source-id: deb672e6e52f58b76536ab4158073927a35e43a8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67803
* Addresses comments from #63589
[ONNX] remove torch::onnx::PRODUCER_VERSION (#67107)
Use constants from version.h instead.
This simplifies things since we no longer have to update
PRODUCER_VERSION for each release.
Also add TORCH_VERSION to version.h so that a string is available for
this purpose.
[ONNX] Set `ir_version` based on opset_version. (#67128)
This increases the odds that the exported ONNX model will be usable.
Before this change, we were setting the IR version to a value which may
be higher than what the model consumer supports.
Also some minor clean-up in the test code:
* Fix string replacement.
* Use a temporary file so as to not leave files around in the test
current working directory.
Test Plan: Imported from OSS
Reviewed By: msaroufim
Differential Revision: D32181306
Pulled By: malfet
fbshipit-source-id: 02f136d34ef8f664ade0bc1985a584f0e8c2b663
Co-authored-by: BowenBao <bowbao@microsoft.com>
Co-authored-by: Gary Miguel <garymiguel@microsoft.com>
Co-authored-by: Nikita Shulga <nshulga@fb.com>
Summary:
Summary : Inserted a check for the momentum and print "None" in case is not defined. See https://github.com/pytorch/pytorch/issues/65143
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67335
Test Plan:
The code below now prints `torch::nn::BatchNorm2d(128, eps=1e-05, momentum=None, affine=true, track_running_stats=true)` without generating errors.
```
torch::nn::BatchNorm2d m(torch::nn::BatchNormOptions(128).momentum(c10::nullopt));
std::cerr << *m << "\n";
```
Fixes https://github.com/pytorch/pytorch/issues/65143
Reviewed By: mruberry
Differential Revision: D32067820
Pulled By: ngimel
fbshipit-source-id: f40f9bbe090aa78e00f6c3a57deae393d946b88d
Summary:
The documentation of torch.nn.Upsample stated that `align_corners` only affects `linear`, `bilinear` and `trilinear`.
This PR updates the documentation for the Python `Upsample` module and the C++ `UpsampleOptions` struct to reflect that `bicubic` is also affected by `align_corners`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66756
Reviewed By: zou3519
Differential Revision: D31731148
Pulled By: jbschlosser
fbshipit-source-id: 3ec277fc3fbdf8414d0de327d8c57ba07342a5b9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63612
This makes Tensor inherit from a new class TensorBase, that provides a subset of Tensor that doesn't
directly depend on native_functions.yaml. Code that only includes TensorBase.h with thus not need to
be rebuilt every time someone changes an operator signature.
Making `Tensor` inherit from this class means that `const TensorBase&` parameters will be callable
with an ordinary `Tensor`. I've also made `Tensor` constructible and assignable from `TensorBase` to
minimize friction in code mixing the two types.
To help enforce that `Tensor.h` and `Functions.h` aren't accidentally included, I've added an error
into `Operators.h` if `TORCH_ASSERT_NO_OPERATORS` is defined. We can either set this in the build
system for certain folders, or just define it at the top of any file.
I've also included an example of manually special-casing the commonly used `contiguous` operator.
The inline function's slow path defers to `TensorBase::__dispatch_contiguous` which is defined in
`Tensor.cpp`. I've made it so `OptionalTensorRef` is constructible from `TensorBase`, so I can
materialize a `Tensor` for use in dispatch without actually increasing its refcount.
Test Plan: Imported from OSS
Reviewed By: gchanan
Differential Revision: D30728580
Pulled By: ezyang
fbshipit-source-id: 2cbc8eee08043382ee6904ea8e743b1286921c03
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63995
JIT methods already have name() in their interface, and Py methods have names in their implementation. I'm adding this for a particular case where someone tried to use name() on a JIT method that we're replacing with an IMethod.
Test Plan: add case to imethod API test
Reviewed By: suo
Differential Revision: D30559401
fbshipit-source-id: 76236721f5cd9a9d9d488ddba12bfdd01d679a2c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63345
This diff did the following few things to enable the tests:
1. Exposed IMethod as TORCH_API.
2. Linked torch_deploy to test_api if USE_DEPLOY == 1.
3. Generated torch::deploy examples when building torch_deploy library.
Test Plan: ./build/bin/test_api --gtest_filter=IMethodTest.*
Reviewed By: ngimel
Differential Revision: D30346257
Pulled By: alanwaketan
fbshipit-source-id: 932ae7d45790dfb6e00c51893933a054a0fad86d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63380
Crosses off some more of #62011, see the test in the stacked PR #63381
Test Plan: Imported from OSS
Reviewed By: malfet, seemethere
Differential Revision: D30455843
Pulled By: driazati
fbshipit-source-id: d473545d05ffa0b2476968f0b1c55f3a16a2c755
Summary:
Replace for loop with for `irange` loop. Also fix some unused variable warnings in range loop cases
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62928
Reviewed By: driazati
Differential Revision: D30171904
Pulled By: malfet
fbshipit-source-id: 1b437a0f7e3515f4a2e324f3450e93312f1933ae
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62521
This diff did the following few things to enable the tests:
1. Exposed IMethod as TORCH_API.
2. Linked torch_deploy to test_api if USE_DEPLOY == 1.
Test Plan:
./build/bin/test_api --gtest_filter=IMethodTest.*
To be noted, one needs to run `python torch/csrc/deploy/example/generate_examples.py` before the above command.
Reviewed By: ezyang
Differential Revision: D30055372
Pulled By: alanwaketan
fbshipit-source-id: 50eb3689cf84ed0f48be58cd109afcf61ecca508
Summary:
Fixes https://github.com/pytorch/pytorch/issues/60747
Enhances the C++ versions of `Transformer`, `TransformerEncoderLayer`, and `TransformerDecoderLayer` to support callables as their activation functions. The old way of specifying activation function still works as well.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62342
Reviewed By: malfet
Differential Revision: D30022592
Pulled By: jbschlosser
fbshipit-source-id: d3c62410b84b1bd8c5ed3a1b3a3cce55608390c4
Summary:
Fixes https://github.com/pytorch/pytorch/issues/11959
Alternative approach to creating a new `CrossEntropyLossWithSoftLabels` class. This PR simply adds support for "soft targets" AKA class probabilities to the existing `CrossEntropyLoss` and `NLLLoss` classes.
Implementation is dumb and simple right now, but future work can add higher performance kernels for this case.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61044
Reviewed By: zou3519
Differential Revision: D29876894
Pulled By: jbschlosser
fbshipit-source-id: 75629abd432284e10d4640173bc1b9be3c52af00
Summary:
Fixes https://github.com/pytorch/pytorch/issues/60691
### Changes
Per the discussion in the above issue, this PR makes 2 changes:
1. When `error_if_nonfinite=False`, the NaN/Inf checks are truly skipped, and no device synchronization occurs.
- Additionally, when performing the checks, the 2 results are combined with `torch.logical_or` to incur only a single sync (instead of 2 in the happy/finite path).
2. The `clip_coef` conditional is removed, in favor of a call to `clamp(..., max=1.0)` and an unconditional multiplication.
### Testing
- The existing unit tests for `clip_grad_norm_` pass.
- I have manually profiled the example program from https://github.com/pytorch/pytorch/issues/60691, and verified that:
- No synchronizations occur when using `error_if_nonfinite=False`.
- A single synchronization occurs when using `error_if_nonfinite=True`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61042
Reviewed By: mrshenli
Differential Revision: D29764096
Pulled By: jbschlosser
fbshipit-source-id: db594b24608d16374b91bcbb9469046dfeeb152d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61856
This diff did the following few things:
1. It implemented IMethod::getArgumentNames() for all IMethod's subclasses.
2. It refactors PyTorchDeployPredictor to use IMethod for model executions.
Test Plan:
[... ~/fbsource/fbcode/caffe2] buck test mode/dev caffe2/fb/predictor:pytorch_predictor_test -- PyTorchDeployPredictor
[... ~/fbsource/fbcode/caffe2] buck test mode/dev caffe2/fb/predictor:pytorch_predictor_test -- PyTorchPredictor
Reviewed By: wconstab
Differential Revision: D29648756
fbshipit-source-id: e047345f26ce495a5d74d8063f7f8edc32a1b13c
Summary:
This PR suppresses clang-tidy warnings in the codebase (for now) so that we can re-enable clang-tidy checks on master.
I ran this script to add the `NOLINTNEXTLINE` comments (on a devserver):
```bash
python3 setup.py develop
# Uses same script that's run on CI and adds the -j (parallel), -s (add comments), -k (continue if diagnostic errors are found) options
python3 tools/clang_tidy.py \
-j \
-s \
-k \
-v \
--paths torch/csrc/ \
-g"-torch/csrc/jit/passes/onnx/helper.cpp" \
-g"-torch/csrc/jit/passes/onnx/shape_type_inference.cpp" \
-g"-torch/csrc/jit/serialization/onnx.cpp" \
-g"-torch/csrc/jit/serialization/export.cpp" \
-g"-torch/csrc/jit/serialization/import.cpp" \
-g"-torch/csrc/jit/serialization/import_legacy.cpp" \
-g"-torch/csrc/onnx/init.cpp" \
-g"-torch/csrc/cuda/nccl.*" \
-g"-torch/csrc/cuda/python_nccl.cpp" \
-g"-torch/csrc/autograd/FunctionsManual.cpp" \
-g"-torch/csrc/generic/*.cpp" \
-g"-torch/csrc/jit/codegen/cuda/runtime/*" \
-g"-torch/csrc/deploy/interpreter/interpreter.cpp" \
-g"-torch/csrc/deploy/interpreter/interpreter.h" \
-g"-torch/csrc/deploy/interpreter/interpreter_impl.h" \
-g"-torch/csrc/deploy/interpreter/test_main.cpp"
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60649
Test Plan: Verified changes by re-running the script (without the `-s` option) and seeing no warnings/errors.
Reviewed By: walterddr, janeyx99
Differential Revision: D29504258
Pulled By: 1ntEgr8
fbshipit-source-id: 78310b30ee8213b73ddb4771ad874665323e7a4e
Summary:
Expose IMethod interface, which provides a unified interface to either script or python methods backed by torchscript or torchdeploy.
IMethod provides a way to depend on a torch method without depending on a particular runtime implementation such as torchscript or python/deploy.
Test Plan: add unit tests.
Reviewed By: suo
Differential Revision: D29463455
fbshipit-source-id: 903391d9af9fbdd8fcdb096c1a136ec6ac153b7c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60631
Per #48360, speed up `Transformer.generate_square_subsequent_mask`. New impl is informally ~5x faster, though absolute difference is probably small.
PR includes Python and C++ versions as well as a couple of places where the previous impl had been copied around.
Test Plan: Imported from OSS
Reviewed By: jbschlosser, albanD
Differential Revision: D29356673
Pulled By: bhosmer
fbshipit-source-id: 4c062ba0ead61a445aeef451c78777bf0b3a631e
Summary:
Reference https://github.com/pytorch/pytorch/issues/50345
`zeta` was already present in the codebase to support computation of `polygamma`.
However, `zeta` only had `double(double, double)` signature **for CPU** before the PR (which meant that computation `polygamma` were always upcasted to `double` for zeta part).
With this PR, float computations will take place in float and double in double.
Have also refactored the code and moved the duplicate code from `Math.cuh` to `Math.h`
**Note**: For scipy, q is optional, and if it is `None`, it defaults `1` which corresponds to Reimann-Zeta. However, for `torch.specia.zeta`, I made it mandatory cause for me it feels odd without `q` this is Reimann-Zeta and with `q` it is the general Hurwitz Zeta. I think sticking to just general made more sense as passing `1` for q sounds trivial.
Verify:
* [x] Docs https://14234587-65600975-gh.circle-artifacts.com/0/docs/special.html#torch.special.zeta
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59623
Reviewed By: ngimel
Differential Revision: D29348269
Pulled By: mruberry
fbshipit-source-id: a3f9ebe1f7724dbe66de2b391afb9da1cfc3e4bb
Summary:
Fixes https://github.com/pytorch/pytorch/issues/27655
This PR adds a C++ and Python version of ReflectionPad3d with structured kernels. The implementation uses lambdas extensively to better share code from the backward and forward pass.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59791
Reviewed By: gchanan
Differential Revision: D29242015
Pulled By: jbschlosser
fbshipit-source-id: 18e692d3b49b74082be09f373fc95fb7891e1b56
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59728
I noticed Sandcastle jobs failing with:
```
fbcode/caffe2/torch/csrc/api/include/torch/nn/modules/rnn.h:19:35: error: using namespace directive in global context in header [-Werror,-Wheader-hygiene]
using namespace torch::nn::utils::rnn;
```
(cf. V3 of D28939167 or https://www.internalfb.com/intern/sandcastle/job/36028797455955174/).
Removing `using namespace ...` fixes the problem.
~~... also applied code formatting ...~~
Test Plan: Sandcastle
Reviewed By: jbschlosser
Differential Revision: D29000888
fbshipit-source-id: 10917426828fc0c82b982da435ce891dc2bb6eec
Summary:
Switches most of the simple for loops outside of `jit` directories to use `c10::irange`.
Generated with D28874212.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59481
Test Plan: Sandcastle
Reviewed By: ngimel
Differential Revision: D28909681
fbshipit-source-id: ec9ab1bd602933238d9d0f73d4d8d027b75d9d85
Summary:
This PR is focused on the API for `linalg.matrix_norm` and delegates computations to `linalg.norm` for the moment.
The main difference between the norms is when `dim=None`. In this case
- `linalg.norm` will compute a vector norm on the flattened input if `ord=None`, otherwise it requires the input to be either 1D or 2D in order to disambiguate between vector and matrix norm
- `linalg.vector_norm` will flatten the input
- `linalg.matrix_norm` will compute the norm over the last two dimensions, treating the input as batch of matrices
In future PRs, the computations will be moved to `torch.linalg.matrix_norm` and `torch.norm` and `torch.linalg.norm` will delegate computations to either `linalg.vector_norm` or `linalg.matrix_norm` based on the arguments provided.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57127
Reviewed By: mrshenli
Differential Revision: D28186736
Pulled By: mruberry
fbshipit-source-id: 99ce2da9d1c4df3d9dd82c0a312c9570da5caf25
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57180
We have now a separate function for computing only the singular values.
`compute_uv` argument is not needed and it was decided in the
offline discussion to remove it. This is a BC-breaking change but our
linalg module is beta, therefore we can do it without a deprecation
notice.
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D28142163
Pulled By: mruberry
fbshipit-source-id: 3fac1fcae414307ad5748c9d5ff50e0aa4e1b853
Summary:
As per discussion here https://github.com/pytorch/pytorch/pull/57127#discussion_r624948215
Note that we cannot remove the optional type from the `dim` parameter because the default is to flatten the input tensor which cannot be easily captured by a value other than `None`
### BC Breaking Note
This PR changes the `ord` parameter of `torch.linalg.vector_norm` so that it no longer accepts `None` arguments. The default behavior of `2` is equivalent to the previous default of `None`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57662
Reviewed By: albanD, mruberry
Differential Revision: D28228870
Pulled By: heitorschueroff
fbshipit-source-id: 040fd8055bbe013f64d3c8409bbb4b2c87c99d13
Summary:
In my last PR I've missed CUDA and distributed folders, fixing this now
This change is autogenerated by `python tool/clang_tidy.py -s`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57235
Reviewed By: janeyx99
Differential Revision: D28084444
Pulled By: malfet
fbshipit-source-id: bf222f69ee90c7872c3cb0931e8cdb84f0cb3cda
Summary:
This is an automatic change generated by the following script:
```
#!/usr/bin/env python3
from subprocess import check_output, check_call
import os
def get_compiled_files_list():
import json
with open("build/compile_commands.json") as f:
data = json.load(f)
files = [os.path.relpath(node['file']) for node in data]
for idx, fname in enumerate(files):
if fname.startswith('build/') and fname.endswith('.DEFAULT.cpp'):
files[idx] = fname[len('build/'):-len('.DEFAULT.cpp')]
return files
def run_clang_tidy(fname):
check_call(["python3", "tools/clang_tidy.py", "-c", "build", "-x", fname,"-s"])
changes = check_output(["git", "ls-files", "-m"])
if len(changes) == 0:
return
check_call(["git", "commit","--all", "-m", f"NOLINT stubs for {fname}"])
def main():
git_files = check_output(["git", "ls-files"]).decode("ascii").split("\n")
compiled_files = get_compiled_files_list()
for idx, fname in enumerate(git_files):
if fname not in compiled_files:
continue
if fname.startswith("caffe2/contrib/aten/"):
continue
print(f"[{idx}/{len(git_files)}] Processing {fname}")
run_clang_tidy(fname)
if __name__ == "__main__":
main()
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56892
Reviewed By: H-Huang
Differential Revision: D27991944
Pulled By: malfet
fbshipit-source-id: 5415e1eb2c1b34319a4f03024bfaa087007d7179
Summary:
Match updated `Embedding` docs from https://github.com/pytorch/pytorch/pull/54026 as closely as possible. Additionally, update the C++ side `Embedding` docs, since those were missed in the previous PR.
There are 6 (!) places for docs:
1. Python module form in `sparse.py` - includes an additional line about newly constructed `Embedding`s / `EmbeddingBag`s
2. Python `from_pretrained()` in `sparse.py` (refers back to module docs)
3. Python functional form in `functional.py`
4. C++ module options - includes an additional line about newly constructed `Embedding`s / `EmbeddingBag`s
5. C++ `from_pretrained()` options
6. C++ functional options
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56065
Reviewed By: malfet
Differential Revision: D27908383
Pulled By: jbschlosser
fbshipit-source-id: c5891fed1c9d33b4b8cd63500a14c1a77d92cc78
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56181
Need to change to size_t vs size_t:
Reviewed By: ezyang
Differential Revision: D27800849
fbshipit-source-id: 25f744128eb8750c382dc967a99af3c9f16247d9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55647
This adds [breakpad](https://github.com/google/breakpad) which comes with out-of-the-box utilities to register a signal handler that writes out a minidump on an unhandled exception. Right now this is gated behind a flag in `torch.utils`, but in the future it could be on by default. Sizewise this adds aboute 500k to `libtorch_cpu.so` (187275968 B to 187810016 B).
```bash
$ cat <<EOF > test.py
import torch
torch.utils.enable_minidump_collection()
# temporary util that just segfaults
torch._C._crash()
EOF
$ python test.py
Wrote minidump to /tmp/pytorch_crashes/6a829041-50e9-4247-ea992f99-a74cf47a.dmp
fish: “python test.py” terminated by signal SIGSEGV (Address boundary error)
$ minidump-2-core /tmp/pytorch_crashes/6a829041-50e9-4247-ea992f99-a74cf47a.dmp -o core.dmp
$ gdb python core.dmp
... commence debugging ...
```
Right now all exceptions that get passed up to Python don't trigger the signal handler (which by default only
handles [these](https://github.com/google/breakpad/blob/main/src/client/linux/handler/exception_handler.cc#L115)). It would be possible for PyTorch exceptions to explicitly write a minidump when passed up to Python (maybe only when the exception is unhandled or something).
Test Plan: Imported from OSS
Reviewed By: ailzhang
Differential Revision: D27679767
Pulled By: driazati
fbshipit-source-id: 1ab3b5160b6dc405f5097eb25acc644d533358d7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54812
Needed for quantization since different attribute might refer to the same module instance
Test Plan: Imported from OSS
Reviewed By: vkuzo
Differential Revision: D27408376
fbshipit-source-id: cada85c4a1772d3dd9502c3f6f9a56d690d527e7
Summary:
Reference: https://github.com/pytorch/pytorch/issues/50345
Changes:
* Add `i0e`
* Move some kernels from `UnaryOpsKernel.cu` to `UnarySpecialOpsKernel.cu` to decrease compilation time per file.
Time taken by i0e_vs_scipy tests: around 6.33.s
<details>
<summary>Test Run Log</summary>
```
(pytorch-cuda-dev) kshiteej@qgpu1:~/Pytorch/pytorch_module_special$ pytest test/test_unary_ufuncs.py -k _i0e_vs
======================================================================= test session starts ========================================================================
platform linux -- Python 3.8.6, pytest-6.1.2, py-1.9.0, pluggy-0.13.1
rootdir: /home/kshiteej/Pytorch/pytorch_module_special, configfile: pytest.ini
plugins: hypothesis-5.38.1
collected 8843 items / 8833 deselected / 10 selected
test/test_unary_ufuncs.py ...sss.... [100%]
========================================================================= warnings summary =========================================================================
../../.conda/envs/pytorch-cuda-dev/lib/python3.8/site-packages/torch/backends/cudnn/__init__.py:73
test/test_unary_ufuncs.py::TestUnaryUfuncsCUDA::test_special_i0e_vs_scipy_cuda_bfloat16
/home/kshiteej/.conda/envs/pytorch-cuda-dev/lib/python3.8/site-packages/torch/backends/cudnn/__init__.py:73: UserWarning: PyTorch was compiled without cuDNN/MIOpen support. To use cuDNN/MIOpen, rebuild PyTorch making sure the library is visible to the build system.
warnings.warn(
-- Docs: https://docs.pytest.org/en/stable/warnings.html
===================================================================== short test summary info ======================================================================
SKIPPED [3] test/test_unary_ufuncs.py:1182: not implemented: Could not run 'aten::_copy_from' with arguments from the 'Meta' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'aten::_copy_from' is only available for these backends: [BackendSelect, Named, InplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, UNKNOWN_TENSOR_TYPE_ID, AutogradMLC, AutogradNestedTensor, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, Tracer, Autocast, Batched, VmapMode].
BackendSelect: fallthrough registered at ../aten/src/ATen/core/BackendSelectFallbackKernel.cpp:3 [backend fallback]
Named: registered at ../aten/src/ATen/core/NamedRegistrations.cpp:7 [backend fallback]
InplaceOrView: fallthrough registered at ../aten/src/ATen/core/VariableFallbackKernel.cpp:56 [backend fallback]
AutogradOther: registered at ../torch/csrc/autograd/generated/VariableType_4.cpp:8761 [autograd kernel]
AutogradCPU: registered at ../torch/csrc/autograd/generated/VariableType_4.cpp:8761 [autograd kernel]
AutogradCUDA: registered at ../torch/csrc/autograd/generated/VariableType_4.cpp:8761 [autograd kernel]
AutogradXLA: registered at ../torch/csrc/autograd/generated/VariableType_4.cpp:8761 [autograd kernel]
UNKNOWN_TENSOR_TYPE_ID: registered at ../torch/csrc/autograd/generated/VariableType_4.cpp:8761 [autograd kernel]
AutogradMLC: registered at ../torch/csrc/autograd/generated/VariableType_4.cpp:8761 [autograd kernel]
AutogradNestedTensor: registered at ../torch/csrc/autograd/generated/VariableType_4.cpp:8761 [autograd kernel]
AutogradPrivateUse1: registered at ../torch/csrc/autograd/generated/VariableType_4.cpp:8761 [autograd kernel]
AutogradPrivateUse2: registered at ../torch/csrc/autograd/generated/VariableType_4.cpp:8761 [autograd kernel]
AutogradPrivateUse3: registered at ../torch/csrc/autograd/generated/VariableType_4.cpp:8761 [autograd kernel]
Tracer: registered at ../torch/csrc/autograd/generated/TraceType_4.cpp:9348 [kernel]
Autocast: fallthrough registered at ../aten/src/ATen/autocast_mode.cpp:250 [backend fallback]
Batched: registered at ../aten/src/ATen/BatchingRegistrations.cpp:1016 [backend fallback]
VmapMode: fallthrough registered at ../aten/src/ATen/VmapModeRegistrations.cpp:33 [backend fallback]
==================================================== 7 passed, 3 skipped, 8833 deselected, 2 warnings in 6.33s =====================================================
```
</details>
TODO:
* [x] Check rendered docs (https://11743402-65600975-gh.circle-artifacts.com/0/docs/special.html)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54409
Reviewed By: jbschlosser
Differential Revision: D27760472
Pulled By: mruberry
fbshipit-source-id: bdfbcaa798b00c51dc9513c34626246c8fc10548
Summary:
This PR adds a `padding_idx` parameter to `nn.EmbeddingBag` and `nn.functional.embedding_bag`. As with `nn.Embedding`'s `padding_idx` argument, if an embedding's index is equal to `padding_idx` it is ignored, so it is not included in the reduction.
This PR does not add support for `padding_idx` for quantized or ONNX `EmbeddingBag` for opset10/11 (opset9 is supported). In these cases, an error is thrown if `padding_idx` is provided.
Fixes https://github.com/pytorch/pytorch/issues/3194
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49237
Reviewed By: walterddr, VitalyFedyunin
Differential Revision: D26948258
Pulled By: jbschlosser
fbshipit-source-id: 3ca672f7e768941f3261ab405fc7597c97ce3dfc
Summary:
Reference: https://github.com/pytorch/pytorch/issues/50345
Chages:
* Alias for sigmoid and logit
* Adds out variant for C++ API
* Updates docs to link back to `special` documentation
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54759
Reviewed By: mrshenli
Differential Revision: D27615208
Pulled By: mruberry
fbshipit-source-id: 8bba908d1bea246e4aa9dbadb6951339af353556
Summary:
This PR adds `torch.linalg.eig`, and `torch.linalg.eigvals` for NumPy compatibility.
MAGMA uses a hybrid CPU-GPU algorithm and doesn't have a GPU interface for the non-symmetric eigendecomposition. It means that it forces us to transfer inputs living in GPU memory to CPU first before calling MAGMA, and then transfer results from MAGMA to CPU. That is rather slow for smaller matrices and MAGMA is faster than CPU path only for matrices larger than 3000x3000.
Unfortunately, there is no cuSOLVER function for this operation.
Autograd support for `torch.linalg.eig` will be added in a follow-up PR.
Ref https://github.com/pytorch/pytorch/issues/42666
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52491
Reviewed By: anjali411
Differential Revision: D27563616
Pulled By: mruberry
fbshipit-source-id: b42bb98afcd2ed7625d30bdd71cfc74a7ea57bb5
Summary:
Converts loops of the form:
```
for(int64_t VAR=0;VAR<LIMIT;VAR++)
```
to the form
```
for(const auto VAR : c10::irange(LIMIT))
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55148
Test Plan: Sandcastle
Reviewed By: ngimel
Differential Revision: D27447811
fbshipit-source-id: 6311a094ec4a81a0b57383aaee0ba1b1dc2445c4
Summary:
Non-backwards-compatible change introduced in https://github.com/pytorch/pytorch/pull/53843 is tripping up a lot of code. Better to set it to False initially and then potentially flip to True in the later version to give people time to adapt.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55169
Reviewed By: mruberry
Differential Revision: D27511150
Pulled By: jbschlosser
fbshipit-source-id: 1ac018557c0900b31995c29f04aea060a27bc525
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52859
This reverts commit 92a4ee1cf6.
Added support for bfloat16 for CUDA 11 and removed fast-path for empty input tensors that was affecting autograd graph.
Test Plan: Imported from OSS
Reviewed By: H-Huang
Differential Revision: D27402390
Pulled By: heitorschueroff
fbshipit-source-id: 73c5ccf54f3da3d29eb63c9ed3601e2fe6951034
Summary:
**BC-breaking note**: This change throws errors for cases that used to silently pass. The old behavior can be obtained by setting `error_if_nonfinite=False`
Fixes https://github.com/pytorch/pytorch/issues/46849
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53843
Reviewed By: malfet
Differential Revision: D27291838
Pulled By: jbschlosser
fbshipit-source-id: 216d191b26e1b5919a44a3af5cde6f35baf825c4
Summary:
* Lowering NLLLoss/CrossEntropyLoss to ATen dispatch
* This allows the MLC device to override these ops
* Reduce code duplication between the Python and C++ APIs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53789
Reviewed By: ailzhang
Differential Revision: D27345793
Pulled By: albanD
fbshipit-source-id: 99c0d617ed5e7ee8f27f7a495a25ab4158d9aad6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45667
First part of #3867 (Pooling operators still to do)
This adds a `padding='same'` mode to the interface of `conv{n}d`and `nn.Conv{n}d`. This should match the behaviour of `tensorflow`. I couldn't find it explicitly documented but through experimentation I found `tensorflow` returns the shape `ceil(len/stride)` and always adds any extra asymmetric padding onto the right side of the input.
Since the `native_functions.yaml` schema doesn't seem to support strings or enums, I've moved the function interface into python and it now dispatches between the numerically padded `conv{n}d` and the `_conv{n}d_same` variant. Underscores because I couldn't see any way to avoid exporting a function into the `torch` namespace.
A note on asymmetric padding. The total padding required can be odd if both the kernel-length is even and the dilation is odd. mkldnn has native support for asymmetric padding, so there is no overhead there, but for other backends I resort to padding the input tensor by 1 on the right hand side to make the remaining padding symmetrical. In these cases, I use `TORCH_WARN_ONCE` to notify the user of the performance implications.
Test Plan: Imported from OSS
Reviewed By: ejguan
Differential Revision: D27170744
Pulled By: jbschlosser
fbshipit-source-id: b3d8a0380e0787ae781f2e5d8ee365a7bfd49f22
Summary:
This PR adds autograd support for `torch.orgqr`.
Since `torch.orgqr` is one of few functions that expose LAPACK's naming and all other linear algebra routines were renamed a long time ago, I also added a new function with a new name and `torch.orgqr` now is an alias for it.
The new proposed name is `householder_product`. For a matrix `input` and a vector `tau` LAPACK's orgqr operation takes columns of `input` (called Householder vectors or elementary reflectors) scalars of `tau` that together represent Householder matrices and then the product of these matrices is computed. See https://www.netlib.org/lapack/lug/node128.html.
Other linear algebra libraries that I'm aware of do not expose this LAPACK function, so there is some freedom in naming it. It is usually used internally only for QR decomposition, but can be useful for deep learning tasks now when it supports differentiation.
Resolves https://github.com/pytorch/pytorch/issues/50104
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52637
Reviewed By: agolynski
Differential Revision: D27114246
Pulled By: mruberry
fbshipit-source-id: 9ab51efe52aec7c137aa018c7bd486297e4111ce
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53583
`Scalar` takes 32 bytes due to `c10::complex<double>`
requires aligning to 16 bytes. Passing Scalar by reference
shows about 1% improvements on instruction count.
All the changes in this commit are codemoded except for
the following 4 files (which code-gen signatures):
```
tools/codegen/api/cpp.py
tools/codegen/api/native.py
tools/codegen/api/structured.py
caffe2/contrib/aten/gen_op.py
```
# Codemode
## Main Step
For the codemod part, here is the main command used:
```
fastmod --extensions h '([a-zA-Z_+]\([^)]*,?\s*)Scalar (\w+)' '${1}const Scalar& ${2}'
fastmod --extensions h '([a-zA-Z_+]\([^)]*,?\s*)optional<Scalar> (\w+)' '${1}const optional<Scalar>& ${2}'
fastmod --extensions cpp '([a-zA-Z_+]\([^)]*,?\s*)Scalar (\w+)' '${1}const Scalar& ${2}'
fastmod --extensions cpp '([a-zA-Z_+]\([^)]*,?\s*)optional<Scalar> (\w+)' '${1}const optional<Scalar>& ${2}'
```
As you can tell, it codemods both `Scalar` and `optional<Scalar>`. Apply these commands iteratively until reaching a fix-point (since one method signature might contain multiple `Scalar` parameter).
In retrospect, excluding `thrid_party` and `torch/csrc/jit` would be a good idea. (I revert it manually later, see https://github.com/pytorch/pytorch/pull/53479 as an reference).
## Pre-Step
Prior to applying the main command, as some `Scalar` are presented as `at::Scalar` or `c10::Scalar`, so I codemod some of them in advance. Here is an incomplete list:
```
fastmod --extensions h '([a-zA-Z_+]\([^)]*,?\s*)at::Scalar (\w+)' '${1}const at::Scalar& ${2}'
fastmod --extensions cpp '([a-zA-Z_+]\([^)]*,?\s*)at::Scalar (\w+)' '${1}const at::Scalar& ${2}'
fastmod --extensions h '([a-zA-Z_+]\([^)]*,?\s*)c10::optional<Scalar> (\w+)' '${1}const c10::optional<Scalar>& ${2}'
fastmod --extensions cpp '([a-zA-Z_+]\([^)]*,?\s*)c10::optional<Scalar> (\w+)' '${1}const c10::optional<Scalar>& ${2}'
```
## Fixup
There are a couple of post codemod fixup. For example, `const Scalar` will be codemoded into `const const Scalar&`. `at:Scalar` will be codemoded into `at::const Scalar&` (if `Pre-step` is not done comprehensively). Here is an incomplete list:
```
fastmod --extensions cpp 'const const Scalar' 'const Scalar'
fastmod --extensions h 'const const c10::optional<Scalar>' 'const c10::optional<Scalar>'
fastmod --extensions cpp 'const const c10::optional<Scalar>' 'const c10::optional<Scalar>'
fastmod 'at::const Scalar&' 'const at::Scalar&'
```
## Supplementary
`cu` and `mm` files also need to be codemoded, for example:
```
fastmod --extensions cu 'at::const Scalar&' 'const at::Scalar&'
fastmod --extensions mm '([a-zA-Z_+]\([^)]*,?\s*)Scalar (\w+)' '${1}const Scalar& ${2}'
```
Function pointers are not codemoded. Here is an incomplete list:
```
# Cover case: using index_fill_fn = void(*)(TensorIterator & iter, int64_t dim, int64_t self_dim_size, int64_t self_dim_stride, Scalar source);
fastmod --extensions h '(void\s*\(\s*\*\s*\)\([^)]*,?\s*)Scalar (\w+)' '${1}const Scalar& ${2}'
# Cover case: using softplus_fn = void (*)(TensorIterator&, Scalar, Scalar);
fastmod --extensions h '(void\s*\(\s*\*\s*\)\([^)]*,?\s*)Scalar([, \)])' '${1}const Scalar&${2}'
fastmod --extensions cpp '(void\s*\(\s*\*\s*\)\([^)]*,?\s*)Scalar([, \)])' '${1}const Scalar&${2}'
fastmod --extensions h '(void\s*\(\s*\*\s*\)\([^)]*,?\s*)optional<Scalar>([, \)])' '${1}const optional<Scalar>&${2}'
```
Some corner cases needs to be manually fixed.
ghstack-source-id: 123970306
Test Plan: Imported from OSS
Reviewed By: smessmer
Differential Revision: D26904445
fbshipit-source-id: 8d8a002af4b5125f153a32f03c6956be7ae5671d
Summary:
Fixes https://github.com/pytorch/pytorch/issues/44378 by providing a wider range of drivers similar to what SciPy is doing.
The supported CPU drivers are `gels, gelsy, gelsd, gelss`.
The CUDA interface has only `gels` implemented but only for overdetermined systems.
The current state of this PR:
- [x] CPU interface
- [x] CUDA interface
- [x] CPU tests
- [x] CUDA tests
- [x] Memory-efficient batch-wise iteration with broadcasting which fixes https://github.com/pytorch/pytorch/issues/49252
- [x] docs
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49093
Reviewed By: albanD
Differential Revision: D26991788
Pulled By: mruberry
fbshipit-source-id: 8af9ada979240b255402f55210c0af1cba6a0a3c
Summary:
Fixes https://github.com/pytorch/pytorch/issues/50577
Learning rate schedulers had not yet been implemented for the C++ API.
This pull request introduces the learning rate scheduler base class and the StepLR subclass. Furthermore, it modifies the existing OptimizerOptions such that the learning rate scheduler can modify the learning rate.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52268
Reviewed By: mrshenli
Differential Revision: D26818387
Pulled By: glaringlee
fbshipit-source-id: 2b28024a8ea7081947c77374d6d643fdaa7174c1
Summary:
Context: https://github.com/pytorch/pytorch/pull/53299#discussion_r587882857
These are the only hand-written parts of this diff:
- the addition to `.github/workflows/lint.yml`
- the file endings changed in these four files (to appease FB-internal land-blocking lints):
- `GLOSSARY.md`
- `aten/src/ATen/core/op_registration/README.md`
- `scripts/README.md`
- `torch/csrc/jit/codegen/fuser/README.md`
The rest was generated by running this command (on macOS):
```
git grep -I -l ' $' -- . ':(exclude)**/contrib/**' ':(exclude)third_party' | xargs gsed -i 's/ *$//'
```
I looked over the auto-generated changes and didn't see anything that looked problematic.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53406
Test Plan:
This run (after adding the lint but before removing existing trailing spaces) failed:
- https://github.com/pytorch/pytorch/runs/2043032377
This run (on the tip of this PR) succeeded:
- https://github.com/pytorch/pytorch/runs/2043296348
Reviewed By: walterddr, seemethere
Differential Revision: D26856620
Pulled By: samestep
fbshipit-source-id: 3f0de7f7c2e4b0f1c089eac9b5085a58dd7e0d97
Summary:
Fixes https://github.com/pytorch/pytorch/issues/44378 by providing a wider range of drivers similar to what SciPy is doing.
The supported CPU drivers are `gels, gelsy, gelsd, gelss`.
The CUDA interface has only `gels` implemented but only for overdetermined systems.
The current state of this PR:
- [x] CPU interface
- [x] CUDA interface
- [x] CPU tests
- [x] CUDA tests
- [x] Memory-efficient batch-wise iteration with broadcasting which fixes https://github.com/pytorch/pytorch/issues/49252
- [x] docs
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49093
Reviewed By: H-Huang
Differential Revision: D26723384
Pulled By: mruberry
fbshipit-source-id: c9866a95f14091955cf42de22f4ac9e2da009713
Summary:
-Lower Relu6 to ATen
-Change Python and C++ to reflect change
-adds an entry in native_functions.yaml for that new function
-this is needed as we would like to intercept ReLU6 at a higher level with an XLA-approach codegen.
-Should pass functional C++ tests pass. But please let me know if more tests are required.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52723
Reviewed By: ailzhang
Differential Revision: D26641414
Pulled By: albanD
fbshipit-source-id: dacfc70a236c4313f95901524f5f021503f6a60f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51807
Implemented torch.linalg.multi_dot similar to [numpy.linalg.multi_dot](https://numpy.org/doc/stable/reference/generated/numpy.linalg.multi_dot.html).
This function does not support broadcasting or batched inputs at the moment.
**NOTE**
numpy.linalg.multi_dot allows the first and last tensors to have more than 2 dimensions despite their docs stating these must be either 1D or 2D. This PR diverges from NumPy in that it enforces this restriction.
**TODO**
- [ ] Benchmark against NumPy
- [x] Add OpInfo testing
- [x] Remove unnecessary copy for out= argument
Test Plan: Imported from OSS
Reviewed By: nikithamalgifb
Differential Revision: D26375734
Pulled By: heitorschueroff
fbshipit-source-id: 839642692424c4b1783606c76dd5b34455368f0b
Summary:
Uses cmake's `configure_file()` macro to generate a new `torch/csrc/api/include/torch/version.h` header with `TORCH_VERSION_{MAJOR,MINOR,PATCH}` \#defines from an input file `torch/csrc/api/include/torch/version.h.in`.
For Bazel builds, this is accomplished with `header_template_rule()`.
For Buck builds, this is accomplished with `fb_native.genrule()`.
Fixes https://github.com/pytorch/pytorch/issues/44365
<img width="1229" alt="Screen Shot 2021-01-05 at 3 19 24 PM" src="https://user-images.githubusercontent.com/75754324/103809279-3fd80380-5027-11eb-9039-fd23922cebd5.png">
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50073
Reviewed By: glaringlee
Differential Revision: D25855877
Pulled By: jbschlosser
fbshipit-source-id: 6bb792718c97e2c2dbaa74b7b7b831a4f6938e49
Summary:
This PR adds `torch.linalg.slogdet`.
Changes compared to the original torch.slogdet:
- Complex input now works as in NumPy
- Added out= variant (allocates temporary and makes a copy for now)
- Updated `slogdet_backward` to work with complex input
Ref. https://github.com/pytorch/pytorch/issues/42666
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49194
Reviewed By: VitalyFedyunin
Differential Revision: D25916959
Pulled By: mruberry
fbshipit-source-id: cf9be8c5c044870200dcce38be48cd0d10e61a48
Summary:
This PR adds `torch.linalg.pinv`.
Changes compared to the original `torch.pinverse`:
* New kwarg "hermitian": with `hermitian=True` eigendecomposition is used instead of singular value decomposition.
* `rcond` argument can now be a `Tensor` of appropriate shape to apply matrix-wise clipping of singular values.
* Added `out=` variant (allocates temporary and makes a copy for now)
Ref. https://github.com/pytorch/pytorch/issues/42666
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48399
Reviewed By: zhangguanheng66
Differential Revision: D25869572
Pulled By: mruberry
fbshipit-source-id: 0f330a91d24ba4e4375f648a448b27594e00dead
Summary:
This PR adds `torch.linalg.inv` for NumPy compatibility.
`linalg_inv_out` uses in-place operations on provided `result` tensor.
I modified `apply_inverse` to accept tensor of Int instead of std::vector, that way we can write a function similar to `linalg_inv_out` but removing the error checks and device memory synchronization.
I fixed `lda` (leading dimension parameter which is max(1, n)) in many places to handle 0x0 matrices correctly.
Zero batch dimensions are also working and tested.
Ref https://github.com/pytorch/pytorch/issues/42666
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48261
Reviewed By: gchanan
Differential Revision: D25849590
Pulled By: mruberry
fbshipit-source-id: cfee6f1daf7daccbe4612ec68f94db328f327651
Summary:
Adding `torch::cuda::synchronize()` to libtorch. Note that the implementation here adds a new method to the `CUDAHooksInterface`. An alternative that was suggested to me is to add a method to the `DeviceGuard` interface.
Fixes https://github.com/pytorch/pytorch/issues/47722
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50072
Reviewed By: H-Huang
Differential Revision: D25804342
Pulled By: jbschlosser
fbshipit-source-id: 45aa61d7c6fbfd3178caf2eb5ec053d6c01b5a43
Summary:
This PR adds `torch.linalg.inv` for NumPy compatibility.
`linalg_inv_out` uses in-place operations on provided `result` tensor.
I modified `apply_inverse` to accept tensor of Int instead of std::vector, that way we can write a function similar to `linalg_inv_out` but removing the error checks and device memory synchronization.
I fixed `lda` (leading dimension parameter which is max(1, n)) in many places to handle 0x0 matrices correctly.
Zero batch dimensions are also working and tested.
Ref https://github.com/pytorch/pytorch/issues/42666
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48261
Reviewed By: ngimel
Differential Revision: D25690129
Pulled By: mruberry
fbshipit-source-id: edb2d03721f22168c42ded8458513cb23dfdc712
Summary:
This PR adds `torch.linalg.solve`.
`linalg_solve_out` uses in-place operations on the provided result tensor.
I modified `apply_solve` to accept tensor of Int instead of std::vector, that way we can write a function similar to `linalg_solve_out` but removing the error checks and device memory synchronization.
In comparison to `torch.solve` this routine accepts 1-dimensional tensors and batches of 1-dim tensors for the right-hand-side term. `torch.solve` requires it to be at least 2-dimensional.
Ref. https://github.com/pytorch/pytorch/issues/42666
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48456
Reviewed By: izdeby
Differential Revision: D25562222
Pulled By: mruberry
fbshipit-source-id: a9355c029e2442c2e448b6309511919631f9e43b
Summary:
Fixes https://github.com/pytorch/pytorch/issues/46213
I didn't yet update the documentation, will add those change soon. A few other things that I didn't do, but want to clarify if I maybe should.
1. I didn't expose projections in c++ API: torch/csrc/api/src/nn/modules/rnn.cpp. Let me know if this is desirable and I will add those changes.
2. I didn't expose projections in "lstm_cell" function and "_thnn_differentiable_lstm_cell_backward" functions from aten/src/ATen/native/RNN.cpp. As far as I understand, they are not needed for nn.LSTM CPU execution. For lstm_cell, projections don't bring any real benefit, since if cell is used separately, it can be easily added in Python. For "_thnn_differentiable_lstm_cell_backward", I'm actually not sure where exactly that function is used, so I also disabled projections there for now. Please let me know if I should change that.
3. I added check that projections are not supported for quantized LSTMs to quantized_lstm_<data/input> functions. But I didn't add any checks to LSTMCell code. It seems that since I disabled projections in "lstm_cell" function, they should also not be available for quantized models through any other API than quantized_lstm_<data/input>. Please let me know if I'm not correct and I will add checks to other places.
4. Projections are not supported for CuDNN versions < 7.1.2. Should I add the check for CuDNN version and disable projections in that case? If so, what will be the best way to do that?
5. Currently I added projection weight as the last weight, so the layout is "w_ih, w_hh, b_ih, b_hh, w_hr". This breaks the assumption that biases come after weights and thus I had to add additional if-s in various places. Alternative way would be to have "w_ih, w_hh, w_hr, b_ih, b_hh" layout, in which case the assumption will be true. But in that case I will need to split the loop in get_parameters function from aten/src/ATen/native/cudnn/RNN.cpp. And in some cases, I will still need to add an "undefined" tensor in the 3rd position, because we get all 5 weights from CuDNN most of the time. So I'm not sure which way is better. Let me know if you think I should change to the weights-then-biases layout.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47725
Reviewed By: zou3519
Differential Revision: D25449794
Pulled By: ngimel
fbshipit-source-id: fe6ce59e481d1f5fd861a8ff7fa13d1affcedb0c
Summary:
Ref https://github.com/pytorch/pytorch/issues/42175
This removes the 4 deprecated spectral functions: `torch.{fft,rfft,ifft,irfft}`. `torch.fft` is also now imported by by default.
The actual `at::native` functions are still used in `torch.stft` so can't be full removed yet. But will once https://github.com/pytorch/pytorch/issues/47601 has been merged.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48594
Reviewed By: heitorschueroff
Differential Revision: D25298929
Pulled By: mruberry
fbshipit-source-id: e36737fe8192fcd16f7e6310f8b49de478e63bf0
Summary:
This PR adds `torch.linalg.matrix_rank`.
Changes compared to the original `torch.matrix_rank`:
- input with the complex dtype is supported
- batched input is supported
- "symmetric" kwarg renamed to "hermitian"
Should I update the documentation for `torch.matrix_rank`?
For the input with no elements (for example 0×0 matrix), the current implementation is divergent from NumPy. NumPy stumbles on not defined max for such input, here I chose to return appropriately sized tensor of zeros. I think that's mathematically a correct thing to do.
Ref https://github.com/pytorch/pytorch/issues/42666.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48206
Reviewed By: albanD
Differential Revision: D25211965
Pulled By: mruberry
fbshipit-source-id: ae87227150ab2cffa07f37b4a3ab228788701837
Summary:
This PR adds `torch.linalg.eigh`, and `torch.linalg.eigvalsh` for NumPy compatibility.
The current `torch.symeig` uses (on CPU) a different LAPACK routine than NumPy (`syev` vs `syevd`). Even though it shouldn't matter in practice, `torch.linalg.eigh` uses `syevd` (as NumPy does).
Ref https://github.com/pytorch/pytorch/issues/42666
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45526
Reviewed By: gchanan
Differential Revision: D25022659
Pulled By: mruberry
fbshipit-source-id: 3676b77a121c4b5abdb712ad06702ac4944e900a
Summary:
Fix https://github.com/pytorch/pytorch/issues/44601
I added bicubic grid sampler in both cpu and cuda side, but haven't in AVX2
There is a [colab notebook](https://colab.research.google.com/drive/1mIh6TLLj5WWM_NcmKDRvY5Gltbb781oU?usp=sharing) show some test results. The notebook use bilinear for test, since I could only use distributed version of pytorch in it. You could just download it and modify the `mode_torch=bicubic` to show the results.
There are some duplicate code about getting and setting values, since the helper function used in bilinear at first clip the coordinate beyond boundary, and then get or set the value. However, in bicubic, there are more points should be consider. I could refactor that part after making sure the overall calculation are correct.
Thanks
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44780
Reviewed By: mrshenli
Differential Revision: D24681114
Pulled By: mruberry
fbshipit-source-id: d39c8715e2093a5a5906cb0ef040d62bde578567
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45164
This PR implements `fft2`, `ifft2`, `rfft2` and `irfft2`. These are the last functions required for `torch.fft` to match `numpy.fft`. If you look at either NumPy or SciPy you'll see that the 2-dimensional variants are identical to `*fftn` in every way, except for the default value of `axes`. In fact you can even use `fft2` to do general n-dimensional transforms.
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D24363639
Pulled By: mruberry
fbshipit-source-id: 95191b51a0f0b8e8e301b2c20672ed4304d02a57
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45377
This PR adds a C++ implementation of the TripletMarginWithDistanceLoss, for which the Python implementation was introduced in PR #43680. It's based on PR #44072, but I'm resubmitting this to unlink it from Phabricator.
Test Plan: Imported from OSS
Reviewed By: izdeby
Differential Revision: D24003973
fbshipit-source-id: 2d9ada7260a6f27425ff2fdbbf623dad0fb79405
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44433
Not entirely sure why, but changing the type of beta from `float` to `double in autocast_mode.cpp and FunctionsManual.h fixes my compiler errors, failing instead at link time
fixing some type errors, updated fn signature in a few more files
removing my usage of Scalar, making beta a double everywhere instead
Test Plan: Imported from OSS
Reviewed By: mrshenli
Differential Revision: D23636720
Pulled By: bdhirsh
fbshipit-source-id: caea2a1f8dd72b3b5fd1d72dd886b2fcd690af6d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44550
Part of the `torch.fft` work (gh-42175).
This adds n-dimensional transforms: `fftn`, `ifftn`, `rfftn` and `irfftn`.
This is aiming for correctness first, with the implementation on top of the existing `_fft_with_size` restrictions. I plan to follow up later with a more efficient rewrite that makes `_fft_with_size` work with arbitrary numbers of dimensions.
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D23846032
Pulled By: mruberry
fbshipit-source-id: e6950aa8be438ec5cb95fb10bd7b8bc9ffb7d824
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44486
SmoothL1Loss had a completely different (and incorrect, see #43228) path when target.requires_grad was True.
This PR does the following:
1) adds derivative support for target via the normal derivatives.yaml route
2) kill the different (and incorrect) path for when target.requires_grad was True
3) modify the SmoothL1Loss CriterionTests to verify that the target derivative is checked.
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D23630699
Pulled By: gchanan
fbshipit-source-id: 0f94d1a928002122d6b6875182867618e713a917
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44437
MSELoss had a completely different (and incorrect, see https://github.com/pytorch/pytorch/issues/43228) path when target.requires_grad was True.
This PR does the following:
1) adds derivative support for target via the normal derivatives.yaml route
2) kill the different (and incorrect) path for when target.requires_grad was True
3) modify the MSELoss CriterionTests to verify that the target derivative is checked.
TODO:
1) do we still need check_criterion_jacobian when we run grad/gradgrad checks?
2) ensure the Module tests check when target.requires_grad
3) do we actually test when reduction='none' and reduction='mean'?
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D23612166
Pulled By: gchanan
fbshipit-source-id: 4f74d38d8a81063c74e002e07fbb7837b2172a10
Summary:
Fixes https://github.com/pytorch/pytorch/issues/43732.
Requires importing the fft namespace in the C++ API, just like the Python API does, to avoid clobbering torch::fft the function.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43749
Reviewed By: glaringlee
Differential Revision: D23391544
Pulled By: mruberry
fbshipit-source-id: d477d0b6d9a689d5c154ad6c31213a7d96fdf271
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43341
This is to remove the empty pretty_print() since it overrides the impl within Module base which is not as designed here.
Test Plan: Imported from OSS
Reviewed By: pbelevich
Differential Revision: D23244616
Pulled By: glaringlee
fbshipit-source-id: 94b8dfd3697dfc450f53b3b4eee6e9c13cafba7b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43069
The transformer c++ impl need to put TransformerEncoderLayer/DecoderLayer and TransformerEncoder/TransformerDecoder in different header since TransformerEncoder/Decoder's options class need TransformerEncoderLayer/DecoderLayer as input parameter. Split header files to avoid cycle includsion.
Test Plan: Imported from OSS
Reviewed By: yf225
Differential Revision: D23139437
Pulled By: glaringlee
fbshipit-source-id: 3c752ed7702ba18a9742e4d47d049e62d2813de0
Summary:
This PR:
- updates test_op_normalization.py, which verifies that aliases are correctly translated in the JIT
- adds torch.linalg.det as an alias for torch.det
- moves the torch.linalg.outer alias to torch.outer (to be consistent with NumPy)
The torch.linalg.outer alias was put the linalg namespace erroneously as a placeholder since it's a "linear algebra op" according to NumPy but is actually still in the main NumPy namespace.
The updates to test_op_normalization are necessary. Previously it was using method_tests to generate tests, and method_tests assumes test suites using it also use the device generic framework, which test_op_normalization did not. For example, some ops require decorators like `skipCPUIfNoLapack`, which only works in device generic test classes. Moving test_op_normalization to the device generic framework also lets these tests run on CPU and CUDA.
Continued reliance on method_tests() is excessive since the test suite is only interested in testing aliasing, and a simpler and more readable `AliasInfo` class is used for the required information. An example impedance mismatch between method_tests and the new tests, for example, was how to handle ops in namespaces like torch.linalg.det. In the future this information will likely be folded into a common 'OpInfo' registry in the test suite.
The actual tests performed are similar to what they were previously: a scripted and traced version of the op is run and the test verifies that both graphs do not contain the alias name and do contain the aliased name.
The guidance for adding an alias has been updated accordingly.
cc mattip
Note:
ngimel suggests:
- deprecating and then removing the `torch.ger` name
- reviewing the implementation of `torch.outer`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42802
Reviewed By: zou3519
Differential Revision: D23059883
Pulled By: mruberry
fbshipit-source-id: 11321c2a7fb283a6e7c0d8899849ad7476be42d1
Summary:
This PR adds the `torch.linalg` namespace as part of our continued effort to be more compatible with NumPy. The namespace is tested by adding a single function, `torch.linalg.outer`, and testing it in a new test suite, test_linalg.py. It follows the same pattern that https://github.com/pytorch/pytorch/pull/41911, which added the `torch.fft` namespace, did.
Future PRs will likely:
- add more functions to torch.linalg
- expand the testing done in test_linalg.py, including legacy functions, like torch.ger
- deprecate existing linalg functions outside of `torch.linalg` in preference to the new namespace
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42664
Reviewed By: ngimel
Differential Revision: D22991019
Pulled By: mruberry
fbshipit-source-id: 39258d9b116a916817b3588f160b141f956e5d0b
Summary:
This PR creates a new namespace, torch.fft (torch::fft) and puts a single function, fft, in it. This function is analogous to is a simplified version of NumPy's [numpy.fft.fft](https://numpy.org/doc/1.18/reference/generated/numpy.fft.fft.html?highlight=fft#numpy.fft.fft) that accepts no optional arguments. It is intended to demonstrate how to add and document functions in the namespace, and is not intended to deprecate the existing torch.fft function.
Adding this namespace was complicated by the existence of the torch.fft function in Python. Creating a torch.fft Python module makes this name ambiguous: does it refer to a function or module? If the JIT didn't exist, a solution to this problem would have been to make torch.fft refer to a callable class that mimicked both the function and module. The JIT, however, cannot understand this pattern. As a workaround it's required to explicitly `import torch.fft` to access the torch.fft.fft function in Python:
```
import torch.fft
t = torch.randn(128, dtype=torch.cdouble)
torch.fft.fft(t)
```
See https://github.com/pytorch/pytorch/issues/42175 for future work. Another possible future PR is to get the JIT to understand torch.fft as a callable class so it need not be imported explicitly to be used.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41911
Reviewed By: glaringlee
Differential Revision: D22941894
Pulled By: mruberry
fbshipit-source-id: c8e0b44cbe90d21e998ca3832cf3a533f28dbe8d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42215
Specifically on https://github.com/pytorch/pytorch/pull/27477#discussion_r371402079
We would like to supported with include_last=True overall for other reduction types like mean and max. It now causes further code fragmentation in DPER (https://www.internalfb.com/intern/diff/D22794469/).
More details: https://www.internalfb.com/intern/diff/D22794469/?dest_fbid=309597093427021&transaction_id=631457624153457
ghstack-source-id: 108733009
Test Plan:
```
buck test mode/dev-nosan //caffe2/test:nn -- "test_EmbeddingBag_per_sample_weights_and_new_offsets_cpu"
```
```
(base) [jianyuhuang@devbig281.ftw3.facebook.com: ~/fbsource/fbcode/caffe2/test] $ TORCH_SHOW_CPP_STACKTRACES=1 buck test mode/dev-nosan //caffe2/test:
nn -- "test_EmbeddingBag_per_sample_weights_and_new_offsets_cpu" --print-passing-details
Parsing buck files: finished in 1.2 sec
Building: finished in 5.5 sec (100%) 10130/10130 jobs, 2 updated
Total time: 6.7 sec
More details at https://www.internalfb.com/intern/buck/build/dbdc2063-69d8-45cb-9146-308a9e8505ef
First unknown argument: --print-passing-details.
Falling back to TestPilot classic.
Trace available for this run at /tmp/testpilot.20200728-195414.1422748.log
TestPilot test runner for Facebook. See https://fburl.com/testpilot for details.
Testpilot build revision cd2638f1f47250eac058b8c36561760027d16add fbpkg f88726c8ebde4ba288e1172a348c7f46 at Mon Jul 27 18:11:43 2020 by twsvcscm from /usr/local/fbprojects/packages/testinfra.testpilot/887/t.par
Discovering tests
Running 1 test
Started new test run: https://our.intern.facebook.com/intern/testinfra/testrun/844425097242375
✓ caffe2/test:nn - test_EmbeddingBag_per_sample_weights_and_new_offsets_cpu (test_nn.TestNNDeviceTypeCPU) 0.162 1/1 (passed)
Test output:
> /data/users/jianyuhuang/fbsource/fbcode/buck-out/dev/gen/caffe2/test/nn#binary,link-tree/torch/_utils_internal.py:103: DeprecationWarning: This is a NOOP in python >= 3.7, its just too dangerous with how we write code at facebook. Instead we patch os.fork and multiprocessing which can raise exceptions if a deadlock would happen.
> threadSafeForkRegisterAtFork()
> /usr/local/fbcode/platform007/lib/python3.7/importlib/_bootstrap.py:219: ImportWarning: can't resolve package from __spec__ or __package__, falling back on __name__
and __path__
> return f(*args, **kwds)
> test_EmbeddingBag_per_sample_weights_and_new_offsets_cpu (test_nn.TestNNDeviceTypeCPU) ... Couldn't download test skip set, leaving all tests enabled...
> ok
>
> ----------------------------------------------------------------------
> Ran 1 test in 0.162s
>
> OK
Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/844425097242375
Summary (total time 5.54s):
PASS: 1
FAIL: 0
SKIP: 0
FATAL: 0
TIMEOUT: 0
OMIT: 0
Did _not_ run with tpx. See https://fburl.com/tpx for details.
```
Reviewed By: dzhulgakov
Differential Revision: D22801881
fbshipit-source-id: 80a624465727081bb9bf55c28419695a3d79c6e5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42037
This is to fix#41951
Test Plan: Imported from OSS
Reviewed By: yf225
Differential Revision: D22764717
Pulled By: glaringlee
fbshipit-source-id: e6da0aeb05a2356f52446e6d5fad391f2cd1cf6f
Summary:
xref gh-38010 and gh-38011.
After this PR, there should be only two warnings:
```
pytorch/docs/source/index.rst:65: WARNING: toctree contains reference to nonexisting \
document 'torchvision/index'
WARNING: autodoc: failed to import class 'tensorboard.writer.SummaryWriter' from module \
'torch.utils'; the following exception was raised:
No module named 'tensorboard'
```
If tensorboard and torchvision are prerequisites to building docs, they should be added to the `requirements.txt`.
As for breaking up quantization into smaller pieces: I split out the list of supported operations and the list of modules to separate documents. I think this makes the page flow better, makes it much "lighter" in terms of page cost, and also removes some warnings since the same class names appear in multiple sub-modules.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41321
Reviewed By: ngimel
Differential Revision: D22753099
Pulled By: mruberry
fbshipit-source-id: d504787fcf1104a0b6e3d1c12747ec53450841da
Summary:
Update the API to access grad in cpp to avoid unexpected thread safety issues.
In particular, with the current API, a check like `t.grad().defined()` is not thread safe.
- This introduces `t.mutable_grad()` that should be used when getting a mutable version of the saved gradient. This function is **not** thread safe.
- The `Tensor& grad()` API is now removed. We could not do a deprecation cycle as most of our call side use non-const Tensors that use the non-const overload. This would lead to most calls hitting the warning. This would be too verbose for all the users.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40887
Reviewed By: ezyang
Differential Revision: D22343932
Pulled By: albanD
fbshipit-source-id: d5eb909bb743bc20caaf2098196e18ca4110c5d2
Summary:
BCELoss currently uses different broadcasting semantics than numpy. Since previous versions of PyTorch have thrown a warning in these cases telling the user that input sizes should match, and since the CUDA and CPU results differ when sizes do not match, it makes sense to upgrade the size mismatch warning to an error.
We can consider supporting numpy broadcasting semantics in BCELoss in the future if needed.
Closes https://github.com/pytorch/pytorch/issues/40023
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41426
Reviewed By: zou3519
Differential Revision: D22540841
Pulled By: ezyang
fbshipit-source-id: 6c6d94c78fa0ae30ebe385d05a9e3501a42b3652
Summary:
This is a duplicate of https://github.com/pytorch/pytorch/pull/38362
"This PR completes Interpolate's deprecation process for recomputing the scales values, by updating the default value of the parameter recompute_scale_factor as planned for pytorch 1.6.0.
The warning message is also updated accordingly."
I'm recreating this PR as previous one is not being updated.
cc gchanan
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39453
Reviewed By: hl475
Differential Revision: D21955284
Pulled By: houseroad
fbshipit-source-id: 911585d39273a9f8de30d47e88f57562216968d8
Summary:
Remove `-std=c++14` flag from `utils.cmake`, since PyTorch C++ API can be invoked by any compiler compliant with C++14 standard or later
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40510
Differential Revision: D22253313
Pulled By: malfet
fbshipit-source-id: ff731525868b251c27928fc98b0724080ead9be2
Summary:
Slightly modified Adam, following the python implementation, and the `ProducesPyTorchValues` tests pass. I had a problem with another test though (see commit c1a6241676ab84fc531c1c3a10f964aa5704092e), it seems that optimizing for two steps with the same optimizer vs optimizing for two steps using freshly initialized objects will produce the same output.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40009
Differential Revision: D22096053
Pulled By: glaringlee
fbshipit-source-id: a31a8f5488cb37c53752ddf15436efabdba67dc4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35614
Python 2 has reached end-of-life and is no longer supported by PyTorch.
Now we can clean up a lot of cruft that we put in place to support it.
These changes were all done manually, and I skipped anything that seemed
like it would take more than a few seconds, so I think it makes sense to
review it manually as well.
Test Plan: CI
Differential Revision: D20842876
Pulled By: dreiss
fbshipit-source-id: 18abf0d324ed2185ec6d27c864e935d856dcc6ad
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30922
New c++14 feature we can use now
ghstack-source-id: 103767403
Test Plan: waitforsandcastle
Differential Revision: D18869644
fbshipit-source-id: 54541c8004b2116386668a31eb9b0410a603b7dc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37548
Moving RecordFunction from torch::autograd::profiler into at namespace
Test Plan:
CI
Imported from OSS
Differential Revision: D21315852
fbshipit-source-id: 4a4dbabf116c162f9aef0da8606590ec3f3847aa
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37704
If input tensor can not be chunked, run `parallel_apply` on fewer devices
Modfy input tensor dimention in `DataParallelUsesAllAvailableCUDADevices_CUDA` to be chunkable by any number of available CUDA devices
Test Plan: Run `test/cpp/api/parallel` on machine with 6 GPUs
Differential Revision: D21365416
fbshipit-source-id: 60fdfed4a0e6256b2c966c2ea3e8d0bfb298d9a8
Summary:
Allows creation of _NamedAnyModule_ directly from _AnyModule_, e.g.
```
auto a=torch::nn::AnyModule(torch::nn::Linear(1,2));
auto m=torch::nn::NamedAnyModule("fc", a);
```
Without the public constructor, it would be necessary to recast the AnyModule to underlying type,
then have the constructor cast it back to AnyModule.
With the public AnyModule constructor,
possible to do
```
auto q=Sequential({m});
```
or
```
q->push_back(m.name, m.module());
```
(works in conjunction with PR https://github.com/pytorch/pytorch/issues/36720 which allowed adding _AnyModule_ directly)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36869
Differential Revision: D21110074
Pulled By: yf225
fbshipit-source-id: aaea02282b9024824785e54d8732c0a12c850977
Summary:
It was called twice, but the result of the first invocation was not used.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36453
Differential Revision: D20993535
Pulled By: yf225
fbshipit-source-id: 4d85207a936b846866424903d7622905f3fddd36
Summary:
Modified messages in the check of default options for the Adam optimizer.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36161
Differential Revision: D20920140
Pulled By: yf225
fbshipit-source-id: e697ef1741d4dd86f7f18dc0be2c3b4bd3894d8f
Summary:
This supersedes https://github.com/pytorch/pytorch/pull/35698.
`abs` is a C-style function that takes only integral argument
`std::abs` is polymorphic and can be applied to both integral and floating point types
This PR also increases `kBatchSize` in `test_optimizer_xor` function in `test/cpp/api/optim.cpp` to fix `OptimTest.XORConvergence_LBFGS` failure under ASAN.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35974
Test Plan: CI
Reviewed By: pbelevich
Differential Revision: D20853570
Pulled By: yf225
fbshipit-source-id: 6135588df2426c5b974e4e097b416955d1907bd4
Summary:
`abs` is a C-style function that takes only integral argument
`std::abs` is polymorphic and can be applied to both integral and floating point types
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35698
Test Plan: CI
Differential Revision: D20749588
Pulled By: malfet
fbshipit-source-id: b6640af67587650786366fe3907384bc8803069f
Summary:
1. Removed LossClosureOptimizer, and merged Optimizer into OptimizerBase (and renamed the merged class to Optimizer)
2. Merged the LBFGS-specific serialize test function and the generic test_serialize_optimizer function.
3. BC-compatibility serialization test for LBFGS
4. Removed mentions of parameters_ in optimizer.cpp, de-virtualize all functions
5. Made defaults_ optional argument in all optimizers except SGD
**TODO**: add BC-breaking notes for this PR
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34957
Test Plan: Imported from GitHub, without a `Test Plan:` line.
Differential Revision: D20678162
Pulled By: yf225
fbshipit-source-id: 74e062e42d86dc118f0fbaddd794e438b2eaf35a
Summary:
1. Removed LossClosureOptimizer, and merged Optimizer into OptimizerBase (and renamed the merged class to Optimizer)
2. Merged the LBFGS-specific serialize test function and the generic test_serialize_optimizer function.
3. BC-compatibility serialization test for LBFGS
4. Removed mentions of parameters_ in optimizer.cpp, de-virtualize all functions
5. Made defaults_ optional argument in all optimizers except SGD
**TODO**: add BC-breaking notes for this PR
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34957
Differential Revision: D20645945
Pulled By: yf225
fbshipit-source-id: 383588065bf1859b38f0ad0a25d93d41e153c96e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35163
This PR is BC-breaking in the following way:
Renaming:
- `torch::nn::functional::MultiLabelMarginLossFuncOptions` -> `torch::nn::functional::MultilabelMarginLossFuncOptions`
- `torch::nn::functional::MultiLabelSoftMarginLossFuncOptions` -> `torch::nn::functional::MultilabelSoftMarginLossFuncOptions`
Reason for renaming: to be consistent with the corresponding functional name after camel case to snake case conversion (e.g. the `multilabel_margin_loss` functional should use `MultilabelMarginLossFuncOptions` as options)
Test Plan: Imported from OSS
Differential Revision: D20582598
Pulled By: yf225
fbshipit-source-id: 0f5bdb8249d901b310875a14320449a2fdfa8ecd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35025
This PR fixes `F::interpolate` and `torch::nn::Upsample` implementation to match the Python API implementation.
**This PR is BC-breaking in the following way:**
There are changes to `UpsampleOptions` and `InterpolateFuncOptions`:
- `size` is changed from `std::vector<int64_t>` to `c10::optional<std::vector<int64_t>>`. If you want to pass a list of `int64_t` to this argument, you must pass it as `std::vector<int64_t>`.
- `scale_factor` is changed from `std::vector<double>` to `c10::optional<std::vector<double>>`. If you want to pass a list of `double` to this argument, you must pass it as `std::vector<double>`.
**TODO**: cherry-pick this PR into v1.5 release branch.
Test Plan: Imported from OSS
Differential Revision: D20559892
Pulled By: yf225
fbshipit-source-id: ac18609e351a9f2931eaeced8966b9491b2995f7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35023
This PR fixes Conv and ConvTranspose implementation to match the Python API implementation.
**TODO**: cherry-pick this PR into v1.5 release branch.
Test Plan: Imported from OSS
Differential Revision: D20559889
Pulled By: yf225
fbshipit-source-id: 53783a7398ef968ec6d25b6f568fde44907417c5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35022
This PR fixes `AdaptiveAvgPool{2,3}d` and `AdaptiveMaxPool{2,3}d` implementation to match the Python API implementation. Particularly, `output_size` is changed to accept `c10::nullopt` in its elements, matching the Python API behavior.
**TODO**: cherry-pick this PR into v1.5 release branch.
Test Plan: Imported from OSS
Differential Revision: D20559890
Pulled By: yf225
fbshipit-source-id: ccddbd278dd39165cf1dda11fc0e49387c76dbef