Commit Graph

2762 Commits

pritam
a81be44410 Fix shard_module to appropriately deal with sub process groups.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79264

`shard_module` API didn't work correctly with a sub-pg since
`dist.scatter` actually takes the global rank as input for `src`.

Fixing this by passing in the appropriate global rank to `dist.scatter`.
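The gist of the fix can be sketched in plain Python: `dist.scatter` expects a *global* rank for `src`, so a rank that is local to a sub-process-group must first be translated. The helper name below is illustrative, not a PyTorch API; it mirrors what `torch.distributed.new_group(ranks=...)` receives.

```python
def to_global_rank(sub_group_ranks, group_rank):
    """Map a rank within a sub-process-group to its global rank.

    `sub_group_ranks` lists the global ranks that make up the sub-group,
    in sub-group order (the same list one would pass to dist.new_group).
    """
    return sub_group_ranks[group_rank]

# Example: a sub-group built from global ranks 4..7. Rank 1 *within* the
# sub-group is global rank 5 -- that is the value that must be passed as
# `src` to dist.scatter, not 1.
print(to_global_rank([4, 5, 6, 7], 1))  # 5
```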

Differential Revision: [D37062766](https://our.internmc.facebook.com/intern/diff/D37062766/)

Approved by: https://github.com/fduwjj, https://github.com/wanchaol
2022-06-12 03:50:45 +00:00
Mikayla Gawarecki
1ec30a6647 Add offsets-based reduction to segment_reduce (CPU, CUDA)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78907

Approved by: https://github.com/cpuhrsch
2022-06-11 17:43:42 +00:00
Michael Suo
c978b609f7 [ci] remove IN_CI env var
The conventional env var to set is `CI`. Both Circle and GHA set it, so
`IN_CI` is unnecessary.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79229

Approved by: https://github.com/janeyx99
2022-06-11 17:16:30 +00:00
Michael Suo
f51d5233f2 [ci] fix GITHUB_ACTIONS env var checks
`GITHUB_ACTIONS` is set to `true`, but some of our code checks that it
is `1`. Make the checks more general.
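A more general check along the lines described above (a sketch, not the actual PyTorch helper) accepts both spellings:

```python
import os

def running_in_github_actions() -> bool:
    # GitHub Actions sets GITHUB_ACTIONS="true", but some older code
    # compared against "1"; accept both so either convention works.
    return os.environ.get("GITHUB_ACTIONS", "").lower() in ("1", "true")

os.environ["GITHUB_ACTIONS"] = "true"
print(running_in_github_actions())  # True
os.environ["GITHUB_ACTIONS"] = "1"
print(running_in_github_actions())  # True
del os.environ["GITHUB_ACTIONS"]
print(running_in_github_actions())  # False
```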

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79290

Approved by: https://github.com/janeyx99
2022-06-11 17:16:30 +00:00
George Qi
164029f783 masked logasumexp/logaddexp
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78291

Approved by: https://github.com/cpuhrsch
2022-06-11 05:46:36 +00:00
lezcano
54949a5abc Simplify and optimize linalg.solve
This PR heavily simplifies the code of `linalg.solve`. At the same time,
this implementation saves quite a few copies of the input data in some
cases (e.g. when A is contiguous).

We also implement it in such a way that the derivative goes from
computing two LU decompositions and two LU solves to no LU
decompositions and one LU solve. It also avoids a number of copies
that the derivative was unnecessarily performing (at least the copies
of two matrices).

On top of this, we add a `left` kw-only arg that allows the user to
solve `XA = B` rather concisely.
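Conceptually, the right-solve reduces to an ordinary left-solve via transposes: `XA = B` is equivalent to `AᵀXᵀ = Bᵀ`. A minimal pure-Python 2x2 sketch of that identity (illustrative only, not the PyTorch implementation, which uses LU factorizations):

```python
def solve2(A, b):
    # Solve the 2x2 system A @ x = b via Cramer's rule.
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(b[0] * A[1][1] - b[1] * A[0][1]) / det,
            (A[0][0] * b[1] - A[1][0] * b[0]) / det]

def solve_right(A, B):
    # Solve X @ A = B using the transpose identity A^T @ X^T = B^T:
    # row i of X is the solution of A^T @ y = (row i of B).
    At = [list(r) for r in zip(*A)]
    return [solve2(At, row) for row in B]

# X @ A = B with A = diag(2, 3) and B = [[2, 3], [4, 6]] gives X = [[1, 1], [2, 2]].
print(solve_right([[2, 0], [0, 3]], [[2, 3], [4, 6]]))  # [[1.0, 1.0], [2.0, 2.0]]
```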

Pull Request resolved: https://github.com/pytorch/pytorch/pull/74046

Approved by: https://github.com/nikitaved, https://github.com/IvanYashchuk, https://github.com/mruberry
2022-06-11 04:06:40 +00:00
Mikayla Gawarecki
e727539c29 Support multi-dimensional lengths in segment_reduce to support pytorch_scatter.segment_* functionalities (CUDA)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77061

Approved by: https://github.com/cpuhrsch
2022-06-11 01:45:22 +00:00
anjali411
38350acf8f Autogen Tags enum, and allow specifying tags while defining an op
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79322

Approved by: https://github.com/albanD
2022-06-11 00:29:32 +00:00
kshitij12345
5e656eaae5 [refs] ravel (#78421)
As per title
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78421
Approved by: https://github.com/mruberry
2022-06-10 20:20:13 +00:00
kshitij12345
3d77017674 [primTorch] refs: masked_fill (#78132)
TODO

* [x] Add error inputs
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78132
Approved by: https://github.com/mruberry
2022-06-10 20:19:48 +00:00
PyTorch MergeBot
b712467cd1 Revert "Add mutation checks for tensor inputs"
This reverts commit 83c0a2bc38.

Reverted https://github.com/pytorch/pytorch/pull/79078 on behalf of https://github.com/davidberard98 due to broke bazel build-and-test, see [https://github.com/pytorch/pytorch/runs/6836001002?check_suite_focus=true](https://github.com/pytorch/pytorch/runs/6836001002?check_suite_focus=true)
2022-06-10 20:15:30 +00:00
goldenxuett
83c0a2bc38 Add mutation checks for tensor inputs
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79078

Approved by: https://github.com/davidberard98, https://github.com/Krovatkin
2022-06-10 18:17:33 +00:00
kshitij12345
adaecb2cbb [chalf] index_select: cpu support (#79217)
Fixes https://github.com/pytorch/pytorch/issues/79204

PR https://github.com/pytorch/pytorch/pull/78173 took care of adding CUDA support.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79217
Approved by: https://github.com/mruberry
2022-06-10 14:06:32 +00:00
pritam
b9e3d722c4 Use appropriate dtype for sharded linear implementation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79255

We use several collective operations in our sharded linear
implementation and for many collectives, we do not set the `dtype` of the
output tensor appropriately. As a result, using a datatype like torch.float16
(which is not the default torch.float32) results in errors.

Fixing this across the board and adding appropriate tests.

Differential Revision: [D37059752](https://our.internmc.facebook.com/intern/diff/D37059752/)

Approved by: https://github.com/fduwjj, https://github.com/wanchaol
2022-06-10 07:32:15 +00:00
Kshiteej K
d837443a6f [fix] composite compliance: matrix_rank (#78968)
Ref: https://github.com/pytorch/pytorch/issues/69991
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78968
Approved by: https://github.com/zou3519
2022-06-10 05:41:19 +00:00
PyTorch MergeBot
fefff54cad Revert "Revert "Revert "Added {logical_not, trace} refs, moved logical ops to use method overloads"""
This reverts commit a2d2981e8e.

Reverted https://github.com/pytorch/pytorch/pull/79224 on behalf of https://github.com/suo due to broke lots of things a2d2981e8e
2022-06-10 04:40:43 +00:00
Horace He
a2d2981e8e Revert "Revert "Added {logical_not, trace} refs, moved logical ops to use method overloads""
This reverts commit d67309aefb.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79224

Approved by: https://github.com/mruberry
2022-06-10 03:07:14 +00:00
PyTorch MergeBot
87a5ecced2 Revert "Support multi-dimensional lengths in segment_reduce to support pytorch_scatter.segment_* functionalities (CUDA)"
This reverts commit 40f7ef1f3d.

Reverted https://github.com/pytorch/pytorch/pull/77061 on behalf of https://github.com/janeyx99 due to Broke segment_reduce tests on trunk, e.g., 40f7ef1f3d
2022-06-10 01:57:34 +00:00
Mikayla Gawarecki
40f7ef1f3d Support multi-dimensional lengths in segment_reduce to support pytorch_scatter.segment_* functionalities (CUDA)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77061

Approved by: https://github.com/cpuhrsch
2022-06-10 00:49:37 +00:00
Joel Benjamin Schlosser
70d6446a3d Support both train / eval modes for ModuleInfo
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78735

Approved by: https://github.com/albanD
2022-06-09 20:57:17 +00:00
Olga Andreeva
b1ae519df9 Added functionality for post_local SGD (#78988)
Fixes #74556

Added functionality to save and restore step counter for model averager.
Added a unittest.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78988
Approved by: https://github.com/rohan-varma, https://github.com/awgu
2022-06-09 17:47:04 +00:00
lezcano
af6321f3d8 Port linalg_qr to structured
This PR simplifies the logic of `linalg.qr` using structured kernels. I
also took this chance and merged a few `copy_` operations with other
ops.

This PR removes the previous MAGMA implementation, as it is never faster
than that of cuSOLVER and it's rather buggy. This has the side-effect
that now `qr` is not supported on ROCm. Ivan confirmed that this is
fine, given how incredibly slow QR was on ROCm anyway (we were marking
some tests as slow because of this...).

This PR also corrects the dispatch in `geqrf`. Before, if we called it
with a matrix for which `input.size(-2) <= 256 && batchCount(input) >= std::max<int64_t>(2, input.size(-2) / 16)` is false, and we have cuBLAS but not cuSOLVER, we would end up calling MAGMA rather than cuBLAS. This is not what the heuristic suggested.
Probably we should benchmark these heuristics again, but that's beyond the scope of this PR.
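The shape of that dispatch bug can be sketched in Python. The backend names, flag names, and ordering below are illustrative guesses from the commit text, not PyTorch's actual C++ dispatch; the point is only that the "cuBLAS available but no cuSOLVER" case must no longer fall through to MAGMA.

```python
def choose_geqrf_backend(rows, batch, has_cublas, has_cusolver):
    # The heuristic from the commit: prefer the batched path only for
    # many small matrices.
    many_small = rows <= 256 and batch >= max(2, rows // 16)
    if many_small and has_cublas:
        return "cublas_batched"
    if has_cusolver:
        return "cusolver"
    if has_cublas:
        return "cublas"  # previously this case incorrectly reached MAGMA
    return "magma"

# Large single matrix, cuBLAS but no cuSOLVER: must pick cuBLAS.
print(choose_geqrf_backend(512, 1, has_cublas=True, has_cusolver=False))  # cublas
```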

Note: it looks like `torch.geqrf` may be broken in MAGMA, as per the
previous comment in `linalg_qr_helper_magma`. IvanYashchuk wdyt?

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79054

Approved by: https://github.com/IvanYashchuk, https://github.com/ezyang
2022-06-09 14:41:30 +00:00
PyTorch MergeBot
d67309aefb Revert "Added {logical_not, trace} refs, moved logical ops to use method overloads"
This reverts commit 64b6bd8c1e.

Reverted https://github.com/pytorch/pytorch/pull/79000 on behalf of https://github.com/malfet due to Introduces test failure, see https://hud.pytorch.org/pr/79000
2022-06-09 13:11:23 +00:00
PyTorch MergeBot
3556457dd2 Revert "kl_div: fix for grads wrt target, double backward, forward-over-reverse AD support. (#79007)"
This reverts commit 72ad222cff.

Reverted https://github.com/pytorch/pytorch/pull/79007 on behalf of https://github.com/janeyx99 due to Broke test_fn_fwgrad_bwgrad_nn_functional_kl_div_cpu_float64 on trunk https://hud.pytorch.org/minihud?name_filter=pull%20/%20linux-xenial-py3.7-clang7-asan%20/%20test%20(default,%202,%205,%20linux.2xlarge)
2022-06-09 13:07:03 +00:00
Pearu Peterson
fb6749d977 Support CSC/BSR/BSC inputs to unary zero-preserving functions.
In addition, enable testing masked reductions in sparse compressed consistency check.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78173

Approved by: https://github.com/cpuhrsch
2022-06-09 09:46:34 +00:00
Pearu Peterson
2a0e4322e6 Support ComplexHalf in nonzero and add of sparse_csr input.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79062

Approved by: https://github.com/cpuhrsch
2022-06-09 09:46:33 +00:00
Nikita Vedeneev
72ad222cff kl_div: fix for grads wrt target, double backward, forward-over-reverse AD support. (#79007)
Fixes https://github.com/pytorch/pytorch/issues/78867,
fixes https://github.com/pytorch/pytorch/issues/65466.
Adds forward-over-reverse AD support.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79007
Approved by: https://github.com/soulitzer, https://github.com/jbschlosser
2022-06-09 09:06:52 +00:00
Peter Bell
cd9e158007 Accept non-standard bools in more CUDA kernels
This fixes all remaining CUDA kernels, except those using `cub` or
`thrust`, to accept boolean tensors with values other than 1 or 0.

I do this by using `c10::load` in more places, and also adding a
`load_vector` helper into `MemoryAccess.cuh` that does the same thing
for vectorized loads.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78957

Approved by: https://github.com/mruberry
2022-06-09 08:31:28 +00:00
Horace He
64b6bd8c1e Added {logical_not, trace} refs, moved logical ops to use method overloads
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79000

Approved by: https://github.com/ezyang
2022-06-09 07:16:36 +00:00
PyTorch MergeBot
854c833f81 Revert "Support both train / eval modes for ModuleInfo"
This reverts commit 12658fcd5b.

Reverted https://github.com/pytorch/pytorch/pull/78735 on behalf of https://github.com/malfet due to Broke eval tests on Win, 10.2 and ROCM, see 12658fcd5b
2022-06-09 03:37:55 +00:00
Horace He
dc11a5642d Improved stack ref and added more decomposition annotations
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78994

Approved by: https://github.com/mruberry
2022-06-09 03:20:28 +00:00
Joel Benjamin Schlosser
12658fcd5b Support both train / eval modes for ModuleInfo
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78735

Approved by: https://github.com/albanD
2022-06-08 23:20:17 +00:00
Kshiteej K
e85f3b58ab [fix] composite compliance: margin_ranking_loss, hinge_embedding_loss (#78935)
Ref: #69991

Cause of failure is similar to the one discussed for fixing forward_ad of `nn.functional.linear`: https://github.com/pytorch/pytorch/pull/77950#discussion_r878328822
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78935
Approved by: https://github.com/zou3519
2022-06-08 20:58:35 +00:00
Wei Wei
79ee0c0dd5 Swap fx2trt_oss to torch_tensorrt (#950) (#79115)
Summary:
X-link: https://github.com/pytorch/benchmark/pull/950

X-link: https://github.com/pytorch/fx2trt/pull/91

Reviewed By: yinghai

Differential Revision: D36958046

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79115
Approved by: https://github.com/yinghai, https://github.com/842974287, https://github.com/malfet
2022-06-08 19:02:43 +00:00
Khushi Agrawal
5b32c34450 [reland][complex32, jiterator] cos, sinh, cosh, tanh (#78718)
Ref: #78458
Follows: #74537 and #74748

cc @kshitij12345 @anjali411 :)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78718
Approved by: https://github.com/anjali411, https://github.com/kshitij12345
2022-06-08 15:00:41 +00:00
Ivan Yashchuk
ff39e3493a Test torch._refs with aten and nvfuser executors (#78926)
This PR adds testing of references with "aten" and "nvfuser" executors using `torch._prims.executor.make_traced`.

Many tests are skipped even for "aten" executor because of https://github.com/pytorch/pytorch/issues/78923.

I limited the dtypes for the nvfuser executor tests because it's slow due to compilation overhead (it took about 30 mins in total). With `float32` and `int32` types nvfuser tests take 5 minutes.
```
58 passed, 2507 skipped, 28162 deselected, 79 xfailed, 5 warnings in 297.58s (0:04:57)
```
58 tests passed means that 29 references work correctly with nvfuser executor now.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78926
Approved by: https://github.com/mruberry
2022-06-08 12:45:27 +00:00
Philip Meier
32593ef2dd move MPS compat into common comparison machinery (#77836)
Addresses https://github.com/pytorch/pytorch/issues/77144#issuecomment-1128168082.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77836
Approved by: https://github.com/albanD
2022-06-08 08:09:18 +00:00
soulitzer
99ffeff949 [forward ad] Sync conj for between primal and tangent on set forward grad
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78358

Approved by: https://github.com/Lezcano, https://github.com/zou3519
2022-06-08 04:20:17 +00:00
lezcano
f7b9a46880 Deprecate torch.lu
**BC-breaking note**:

This PR deprecates `torch.lu` in favor of `torch.linalg.lu_factor`.
An upgrade guide is added to the documentation for `torch.lu`.

Note this PR DOES NOT remove `torch.lu`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77636

Approved by: https://github.com/malfet
2022-06-07 22:50:14 +00:00
PyTorch MergeBot
c8a5f28fde Revert "Test torch._refs with aten and nvfuser executors (#78926)"
This reverts commit d4eebca7bc.

Reverted https://github.com/pytorch/pytorch/pull/78926 on behalf of https://github.com/malfet due to breaks rocms, see d4eebca7bc
2022-06-07 22:39:05 +00:00
lezcano
c7d6cec078 Add linalg.lu_solve
This PR adds `linalg.lu_solve`. While doing so, I found a bug in MAGMA
when calling the batched MAGMA backend with trans=True. We work around
it by instead solving two triangular systems.

We also update the heuristics for this function, as they were fairly
outdated. We found that cuSolver is king, so luckily we do not need to
rely on the buggy backend from MAGMA for this function.
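The two-triangular-systems workaround rests on a simple identity: given `A = L @ U`, the transposed system `Aᵀx = b` is `Uᵀ Lᵀ x = b`, i.e. a forward solve with the lower-triangular `Uᵀ` followed by a backward solve with the upper-triangular `Lᵀ`, never calling a transposed kernel. A pure-Python sketch (pivoting omitted for brevity; not the PyTorch implementation):

```python
def forward_sub(L, b):
    # Solve L @ y = b for a lower-triangular L.
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][j] * y[j] for j in range(i))) / L[i][i])
    return y

def backward_sub(U, b):
    # Solve U @ x = b for an upper-triangular U.
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(U[i][j] * x[j] for j in range(i + 1, n))) / U[i][i]
    return x

def lu_solve_trans(L, U, b):
    # Solve A^T @ x = b given A = L @ U: since A^T = U^T @ L^T, do a
    # forward solve with U^T, then a backward solve with L^T.
    Ut = [list(r) for r in zip(*U)]  # lower-triangular
    Lt = [list(r) for r in zip(*L)]  # upper-triangular
    return backward_sub(Lt, forward_sub(Ut, b))

# A = L @ U = [[3, 1], [6, 6]], and A^T @ [1, 1] = [9, 7].
print(lu_solve_trans([[1, 0], [2, 1]], [[3, 1], [0, 4]], [9, 7]))  # [1.0, 1.0]
```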

We added tests testing this function left and right. We also added tests
for the different backends. We also activated the tests for AMD, as
those should work as well.

Fixes https://github.com/pytorch/pytorch/issues/61657

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77634

Approved by: https://github.com/malfet
2022-06-07 22:28:28 +00:00
Omkar Salpekar
a07f57d44b [fx2trt] support for new_ones, new_empty, as_strided, einsum (#79047)
Fix the internal<>OSS divergence caused by D36460857
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79047
Approved by: https://github.com/frank-wei
2022-06-07 21:32:31 +00:00
Ivan Yashchuk
d4eebca7bc Test torch._refs with aten and nvfuser executors (#78926)
This PR adds testing of references with "aten" and "nvfuser" executors using `torch._prims.executor.make_traced`.

Many tests are skipped even for "aten" executor because of https://github.com/pytorch/pytorch/issues/78923.

I limited the dtypes for the nvfuser executor tests because it's slow due to compilation overhead (it took about 30 mins in total). With `float32` and `int32` types nvfuser tests take 5 minutes.
```
58 passed, 2507 skipped, 28162 deselected, 79 xfailed, 5 warnings in 297.58s (0:04:57)
```
58 tests passed means that 29 references work correctly with nvfuser executor now.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78926
Approved by: https://github.com/mruberry
2022-06-07 20:34:07 +00:00
Mikayla Gawarecki
814ff74460 Add prod reduce option to segment_reduce + opinfo
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76067

Approved by: https://github.com/cpuhrsch
2022-06-07 17:06:07 +00:00
Peter Bell
c936396af2 Always convert truthy booleans to 1
Ref #54789

A `bool` has only two valid values, 1 or 0. Any in-memory value
outside of those leads to undefined behavior. So, instead of
`reinterpret_cast`-ing to `bool*` I introduce `c10::load<scalar_t>`
which will read as `unsigned char` and convert to a valid `bool`.
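A Python analogue of the idea (the real `c10::load` is a C++ template; this sketch only illustrates the normalization it performs):

```python
def load_bool(storage: bytes, index: int) -> bool:
    # Mirror of the c10::load idea: read the raw byte as an integer and
    # compare against zero, instead of reinterpreting the memory as a
    # bool -- any non-zero byte becomes a canonical True.
    return storage[index] != 0

buf = bytes([0, 1, 255, 42])  # 255 and 42 are "non-standard" bool bytes
print([load_bool(buf, i) for i in range(len(buf))])  # [False, True, True, True]
```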

This gets >90% of operators working, but the remaining operators where
skips and xfails have been added will require individual attention.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77122

Approved by: https://github.com/mruberry
2022-06-07 16:00:30 +00:00
Horace He
e675dbadc4 Ported gelu decomp to ref (#78697)
Ugh... these are actually so painful to write without operator overloading lol.

Decided to just utilize operator overloading, and xfail the ref tests for now.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78697
Approved by: https://github.com/mruberry
2022-06-06 22:30:20 +00:00
Edward Z. Yang
80f2c175be Follow up on CR for "Replace TensorMeta with FakeTensor"
See https://github.com/pytorch/pytorch/pull/78836

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78895

Approved by: https://github.com/albanD
2022-06-06 22:20:40 +00:00
Khushi Agrawal
e7b96ad078 [complex32] sqrt-rsqrt : cuda (#77490)
Follows #74537

cc @kshitij12345!

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77490
Approved by: https://github.com/ngimel
2022-06-06 20:53:54 +00:00
Kshiteej K
c461d8a977 [primTorch] refs: hsplit, vsplit (#78418)
As per title

TODO:
* [x] Add error inputs (already exist)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78418
Approved by: https://github.com/mruberry
2022-06-06 19:54:05 +00:00
goldenxuett
1f53d036d2 Build a __torch_dispatch__ class that records torch operator names
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78835

Approved by: https://github.com/Gamrix
2022-06-06 16:39:46 +00:00