Commit Graph

275 Commits

Author SHA1 Message Date
CaoE
9399e0b1ff add fp16 support for gemm (#99498)
### Testing

Native matmul vs. mkldnn matmul  on SPR (with avx512_fp16 support)

single core:

Input | Naïve impl   / ms | oneDNN /   ms | Speed up
-- | -- | -- | --
M: 128, N: 128, K: 128, trans_a: False, trans_b: False | 2010.387 | 64.700 | 31.072
M: 128, N: 256, K: 128, trans_a: False, trans_b: False | 4027.116 | 107.780 | 37.364
M: 8192, N: 768, K: 768, trans_a: False, trans_b: False | 28685868.488 | 90663.008 | 316.401

56 cores:
Input | Naïve impl   / ms | oneDNN /   ms | Speed up
-- | -- | -- | --
M: 128, N: 128, K: 128, trans_a: False, trans_b: False | 5.091 | 0.24 | 211.30
M: 128, N: 128, K: 128, trans_a: False, trans_b: True | 5.224 | 0.23 | 220.09
M: 128, N: 256, K: 128, trans_a: False, trans_b: False | 10.006 | 0.30 | 330.31
M: 8192, N: 768, K: 768, trans_a: False, trans_b: False | 29435.372 | 1.770 | 1662.80
M: 8192, N: 768, K: 768, trans_a: False, trans_b: True | 31464.961 | 1.728 |  18204.76
M: 8192, N: 768, K: 3072, trans_a: False, trans_b: False | 115035.849  | 7.990 | 14396.90
M: 8192, N: 768, K: 3072, trans_a: False, trans_b: True | 122981.023 |  7.725 | 15918.34
Batch: 768, M: 128, N: 64, K: 128  | 2032.523 | 0.705 | 2882.23

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99498
Approved by: https://github.com/jgong5, https://github.com/malfet
2023-09-28 01:03:50 +00:00
Edward Z. Yang
869226bf94 Avoid passing generator to parametrize (#110104)
Fixes

```
ValueError: <function TestMeta.test_layer_norm_backward at 0x7f555f56e440>: An empty arg_values was passed to @parametrize. Note that this may result from reuse of a generator.
```

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110104
Approved by: https://github.com/malfet, https://github.com/jbschlosser, https://github.com/voznesenskym
2023-09-27 02:52:48 +00:00
Mwiza Kunda
5c4b5baf21 Fix python decomps for OpOverloadPackets and add tests (#107707)
- Extend `test_torch_dispatch_meta_outplace` to test torch ops that do not have an out parameter but have aten op overloads that have out parameters. Additionally, Python decompositions may register `OpOverloadPacket`'s so decompositions need to be tested to ensure all `OpOverloads` still function for the `Meta` key (e.g. if a python decomposition is registered for an aten op `aten.foo` with overloads `[default, out]`, the python function needs to support receiving out arguments)

- Add out parameter wrappers to python decomps for aten ops that have out overloads

CC. @ezyang @albanD @lezcano

Fixes #107713

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107707
Approved by: https://github.com/lezcano
2023-09-25 20:53:30 +00:00
Mwiza Kunda
8dedc9dd9b Add meta tests for layer/group/batch norm backward (#109591)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109591
Approved by: https://github.com/ezyang
2023-09-21 18:58:51 +00:00
CaoE
54c28c564f add Half support for BatchNorm on CPU (#102070)
Fixes #106543

### Testing

Single core:

shape | fp32 forward / ms | fp16 forward / ms | bf16 forward / ms | fp32 backward / ms | fp16 backward / ms | bf16 backward / ms
-- | -- | -- | -- | -- | -- | --
(1, 4, 256, 256) | 0.7116 | 0.1427 | 0.1744 | 0.2638 | 0.2002 | 0.2556
(1, 32, 100, 100) | 0.8579 | 0.1725 | 0.2077 | 0.3023 | 0.2399 | 0.2995
(32, 16, 200, 200) | 57.3466 | 12.2179 | 13.1320 | 45.9524 | 24.1526 | 24.9882

28 cores:

shape | fp32 forward / ms | fp16 forward / ms | bf16 forward / ms | fp32 backward / ms | fp16 backward / ms | bf16 backward / ms
-- | -- | -- | -- | -- | -- | --
(1, 4, 256, 256) | 0.2571 | 0.0713 | 0.0846 | 0.1140 | 0.0883 |  0.1043
(1, 32, 100, 100) | 0.1077 | 0.0510 | 0.0548 | 0.0700 | 0.0645 | 0.0713
(32, 16, 200, 200) | 5.5060 | 1.4195 | 1.4663 | 6.773 | 3.0886 | 3.1343

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102070
Approved by: https://github.com/jgong5, https://github.com/mikaylagawarecki, https://github.com/mingfeima
2023-09-19 10:43:33 +00:00
Jez Ng
7f3885137f Add meta function for _segment_reduce (#109359)
This fixes numerous tests which were xfailing. For instance, the
`_segment_reduce.lengths` OpInfo test, which was previously relying on
the fallback kernel to determine the shape of the meta tensor. The
fallback kernel would fail with

    segment_reduce(): Expected all rows of lengths along axis to sum to data.size(lengths.dim()-1) when !unsafe.

as it was trying to read the values of a meta tensor.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109359
Approved by: https://github.com/ezyang
2023-09-16 13:31:03 +00:00
PyTorch MergeBot
b226373d16 Revert "add Half support for BatchNorm on CPU (#102070)"
This reverts commit b6a1d3fb97.

Reverted https://github.com/pytorch/pytorch/pull/102070 on behalf of https://github.com/clee2000 due to I'm very sorry but it looks like #106543 was not fixed, I still see it failing on main b6a1d3fb97 https://github.com/pytorch/pytorch/actions/runs/6185704949/job/16793975677 ([comment](https://github.com/pytorch/pytorch/pull/102070#issuecomment-1719747065))
2023-09-14 16:13:34 +00:00
CaoE
b6a1d3fb97 add Half support for BatchNorm on CPU (#102070)
Fixes #106543

### Testing

Single core:

shape | fp32 forward / ms | fp16 forward / ms | bf16 forward / ms | fp32 backward / ms | fp16 backward / ms | bf16 backward / ms
-- | -- | -- | -- | -- | -- | --
(1, 4, 256, 256) | 0.7116 | 0.1427 | 0.1744 | 0.2638 | 0.2002 | 0.2556
(1, 32, 100, 100) | 0.8579 | 0.1725 | 0.2077 | 0.3023 | 0.2399 | 0.2995
(32, 16, 200, 200) | 57.3466 | 12.2179 | 13.1320 | 45.9524 | 24.1526 | 24.9882

28 cores:

shape | fp32 forward / ms | fp16 forward / ms | bf16 forward / ms | fp32 backward / ms | fp16 backward / ms | bf16 backward / ms
-- | -- | -- | -- | -- | -- | --
(1, 4, 256, 256) | 0.2571 | 0.0713 | 0.0846 | 0.1140 | 0.0883 |  0.1043
(1, 32, 100, 100) | 0.1077 | 0.0510 | 0.0548 | 0.0700 | 0.0645 | 0.0713
(32, 16, 200, 200) | 5.5060 | 1.4195 | 1.4663 | 6.773 | 3.0886 | 3.1343

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102070
Approved by: https://github.com/jgong5, https://github.com/mikaylagawarecki
2023-09-14 12:23:59 +00:00
PyTorch MergeBot
04a765f95d Revert "add Half support for BatchNorm on CPU (#102070)"
This reverts commit 6065e7a97c.

Reverted https://github.com/pytorch/pytorch/pull/102070 on behalf of https://github.com/clee2000 due to sorry it looks like this is causing an unexpected success for `test_jit_fuser_te.py::TestNNCOpInfoCPU::test_nnc_correctness_nn_functional_batch_norm_cpu_float16` 6065e7a97c https://github.com/pytorch/pytorch/actions/runs/6178069462/job/16770849782 ([comment](https://github.com/pytorch/pytorch/pull/102070#issuecomment-1718402208))
2023-09-13 22:38:42 +00:00
CaoE
6065e7a97c add Half support for BatchNorm on CPU (#102070)
Fixes #106543

### Testing

Single core:

shape | fp32 forward / ms | fp16 forward / ms | bf16 forward / ms | fp32 backward / ms | fp16 backward / ms | bf16 backward / ms
-- | -- | -- | -- | -- | -- | --
(1, 4, 256, 256) | 0.7116 | 0.1427 | 0.1744 | 0.2638 | 0.2002 | 0.2556
(1, 32, 100, 100) | 0.8579 | 0.1725 | 0.2077 | 0.3023 | 0.2399 | 0.2995
(32, 16, 200, 200) | 57.3466 | 12.2179 | 13.1320 | 45.9524 | 24.1526 | 24.9882

28 cores:

shape | fp32 forward / ms | fp16 forward / ms | bf16 forward / ms | fp32 backward / ms | fp16 backward / ms | bf16 backward / ms
-- | -- | -- | -- | -- | -- | --
(1, 4, 256, 256) | 0.2571 | 0.0713 | 0.0846 | 0.1140 | 0.0883 |  0.1043
(1, 32, 100, 100) | 0.1077 | 0.0510 | 0.0548 | 0.0700 | 0.0645 | 0.0713
(32, 16, 200, 200) | 5.5060 | 1.4195 | 1.4663 | 6.773 | 3.0886 | 3.1343

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102070
Approved by: https://github.com/jgong5, https://github.com/mikaylagawarecki
2023-09-13 17:30:16 +00:00
Huamin Li
9fa5283401 [dynamo+aten] Enable embedding_bag_byte_unpack + meta kernel impl (#107937)
Summary:
```
torch._dynamo.exc.Unsupported: unsupported operator: quantized.embedding_bag_byte_unpack.default
```

Differential Revision: D48652953

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107937
Approved by: https://github.com/houseroad
2023-08-26 08:52:42 +00:00
Jordan Fix
df16b1ed53 [dynamo+aten] Enable embedding_bag_byte_rowwise_offsets + meta kernel impl (#106105)
Differential Revision: D47007550

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106105
Approved by: https://github.com/gmagogsfm
2023-08-21 16:33:42 +00:00
Sam Larsen
e165938853 Implement decomposition for aten.rrelu_with_noise (#106812)
Test Plan:
* Primarily, added new test in test/test_decomp.py
* Updated existing tests, e.g., to NOT expect failure

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106812
Approved by: https://github.com/eellison
2023-08-11 19:18:29 +00:00
Nikita Karetnikov
12041d8e1f Use default dispatch table for tensordot.out (#106669)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106669
Approved by: https://github.com/ezyang
2023-08-07 00:58:17 +00:00
Nikita Karetnikov
05e1a50723 [pt2] remove meta skips for aminmax, decomp exists (#106670)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106670
Approved by: https://github.com/ezyang
2023-08-07 00:55:25 +00:00
Nikita Karetnikov
19621a73c0 [pt2] add metas for grid_sampler_3d ops (#106261)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106261
Approved by: https://github.com/ezyang
2023-08-05 14:48:11 +00:00
Kshiteej K
a899333ffc fix: nll_loss batch rule with negative ignore_idx (#106118)
We use python decompositions instead of writing our own for batching rules.

Fixes https://github.com/pytorch/pytorch/issues/105736

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106118
Approved by: https://github.com/lezcano, https://github.com/zou3519
2023-08-04 07:43:02 +00:00
Nikita Karetnikov
1f734e03df [pt2] add metas for mode ops (#106273)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106273
Approved by: https://github.com/ezyang
ghstack dependencies: #106272
2023-08-03 13:11:10 +00:00
Nikita Karetnikov
70469e6f04 [pt2] add metas for median ops (#106272)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106272
Approved by: https://github.com/ezyang
2023-08-03 13:11:10 +00:00
Nikita Karetnikov
f23d755e1f [pt2] add meta for ormqr (#106278)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106278
Approved by: https://github.com/ezyang
2023-08-01 06:47:48 +00:00
Nikita Karetnikov
0ee3b84021 [pt2] add meta for cholesky_inverse (#106120)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106120
Approved by: https://github.com/ezyang
2023-07-29 17:16:20 +00:00
Nikita Karetnikov
80755884be [pt2] add meta for cholesky (#106115)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106115
Approved by: https://github.com/Skylion007, https://github.com/ezyang
2023-07-29 17:16:20 +00:00
Nikita Karetnikov
a4cffaae67 [pt2] add metas for _cholesky_solve_helper and cholesky_solve (#105867)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105867
Approved by: https://github.com/ezyang
2023-07-25 20:21:47 +00:00
Justin Chu
73e1455327 [BE] Enable ruff's UP rules and autoformat test/ (#105434)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105434
Approved by: https://github.com/albanD
2023-07-19 20:36:06 +00:00
Nikita Karetnikov
c00dd43e43 [pt2] add metas for multilabel_margin_loss ops (#104388)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104388
Approved by: https://github.com/ezyang
2023-07-05 13:42:22 +00:00
Nikita Karetnikov
a3aa4da154 [pt2] add metas for multi_margin_loss ops (#104236)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104236
Approved by: https://github.com/ezyang
2023-07-05 13:40:05 +00:00
Nikita Karetnikov
b1c31b1d26 [pt2] metas and SymInt support for max_pool ops (#103951)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103951
Approved by: https://github.com/Chillee, https://github.com/kulinseth
2023-07-01 01:33:35 +00:00
Nikita Karetnikov
c4a6f86062 [pt2] add metas for max_unpool2d and max_unpool3d (#103821)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103821
Approved by: https://github.com/Skylion007, https://github.com/Chillee
2023-07-01 01:33:35 +00:00
Nikita Karetnikov
e9705c52ac [pt2] add metas for _pdist_forward and _pdist_backward (#103817)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103817
Approved by: https://github.com/ezyang
2023-06-22 11:18:05 +00:00
Aleksandar Samardžić
09fdea8564 Fix autograd issue with identity conversions (#92022)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92022
Approved by: https://github.com/pearu, https://github.com/mtaaooby, https://github.com/amjames, https://github.com/cpuhrsch
2023-06-21 21:23:03 +00:00
Nikita Karetnikov
2b3d955ffd [pt2] add meta and SymInt support for linalg_matrix_exp (#102945)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102945
Approved by: https://github.com/lezcano
2023-06-09 22:45:16 +00:00
Edward Z. Yang
96fd283640 Preserve CreationMeta when metafying views. (#103152)
This helps us avoid erroring / generate more accurate error messages
in Dynamo when doing mutations on views.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103152
Approved by: https://github.com/albanD
2023-06-09 12:34:54 +00:00
Nikita Karetnikov
757791d1e3 [pt2] add SymInt support for linalg.vander (#102469)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102469
Approved by: https://github.com/Skylion007, https://github.com/lezcano
2023-06-04 09:58:02 +00:00
PyTorch MergeBot
463df86ce8 Revert "[pt2] add SymInt support for linalg.vander (#102469)"
This reverts commit 05717895aa.

Reverted https://github.com/pytorch/pytorch/pull/102469 on behalf of https://github.com/clee2000 due to broke test_aotdispatch on linux ex 05717895aa https://github.com/pytorch/pytorch/actions/runs/5125654882/jobs/9219389448, shows up as green on pr due to bug with keep-going flag and reruns ([comment](https://github.com/pytorch/pytorch/pull/102469#issuecomment-1569041604))
2023-05-30 20:24:26 +00:00
Nikita Karetnikov
05717895aa [pt2] add SymInt support for linalg.vander (#102469)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102469
Approved by: https://github.com/Skylion007, https://github.com/lezcano
2023-05-30 19:50:16 +00:00
Nikita Karetnikov
995ac703cd [pt2] add SymInt support for linalg.pinv (#102367)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102367
Approved by: https://github.com/lezcano
2023-05-27 11:10:47 +00:00
PyTorch MergeBot
da3aba1e46 Revert "[pt2] add SymInt support for linalg.pinv (#102367)"
This reverts commit 0d5b74da0c.

Reverted https://github.com/pytorch/pytorch/pull/102367 on behalf of https://github.com/kit1980 due to Broke slow tests https://github.com/pytorch/pytorch/actions/runs/5095190248/jobs/9160028124 ([comment](https://github.com/pytorch/pytorch/pull/102367#issuecomment-1565104562))
2023-05-27 00:33:42 +00:00
Nikita Karetnikov
0d5b74da0c [pt2] add SymInt support for linalg.pinv (#102367)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102367
Approved by: https://github.com/lezcano
2023-05-26 15:20:34 +00:00
Nikita Karetnikov
e79d9b9938 [pt2] add SymInt support for linalg.matrix_power (#101940)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101940
Approved by: https://github.com/lezcano, https://github.com/ezyang
2023-05-24 00:21:52 +00:00
Nikita Karetnikov
42b974e8f7 [pt2] add meta for linalg_lu_solve (#101836)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101836
Approved by: https://github.com/lezcano
2023-05-24 00:21:50 +00:00
drisspg
6f13d6892a Add meta support for multinomial (#101324)
# Summary
Found this when trying to compile the text gen loop of nanogpt here: b33289942b/torchbenchmark/models/nanogpt_generate/model.py (L322)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101324
Approved by: https://github.com/ngimel
2023-05-19 00:04:26 +00:00
Angela Yi
72a73ef67b Add aten.searchsorted.Tensor meta kernel (#101637)
Test Plan: CI

Differential Revision: D45933187

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101637
Approved by: https://github.com/ezyang
2023-05-18 06:55:11 +00:00
kshitij12345
afea1a9fe9 [meta] error checking for inplace ops (#101532)
Fixes #100753

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101532
Approved by: https://github.com/lezcano
2023-05-16 17:26:59 +00:00
Nikita Karetnikov
a8964d6377 [pt2] add meta and SymInt support for linalg_householder_product (#101315)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101315
Approved by: https://github.com/lezcano
2023-05-15 02:56:49 +00:00
Sun, Jiayi
d56e1b2f67 add Half support for unary ops on CPU (#98493)
Add Half support for log_sigmoid and some unary ops on CPU, including sinc, acosh, asinh, atanh, digamma, trigamma, rsqrt, acos, asin, atan, ceil, cos, erf, erfc, erfinv, exp, expml, floor, log, log10, log1p, log2, i0, round, sin, sqrt, tan, tanh, trunc, lgamma.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98493
Approved by: https://github.com/jgong5, https://github.com/mingfeima, https://github.com/ngimel
2023-05-12 04:52:34 +00:00
Khushi
51fe53e619 [opinfo] item (#100313)
Follows #100223

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100313
Approved by: https://github.com/ezyang
2023-05-10 11:32:45 +00:00
Nikita Karetnikov
1e591a8b64 [pt2] add meta function for solve_triangular (#100829)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100829
Approved by: https://github.com/ezyang
2023-05-08 13:48:15 +00:00
Nikita Karetnikov
e87ed2a88d [primTorch] add ref for polar (#100345)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100345
Approved by: https://github.com/ezyang
2023-05-04 01:37:02 +00:00
Richard Zou
4135295a76 Excise yaml dependency in torchgen.model (#100203)
The problem:
- The new CustomOp API depends on torchgen.model
- torchgen.model imports `yaml`
- `yaml` is not a PyTorch runtime dependency

To unblock myself, because I'm not sure how long it'll take to
convince people yaml should be a PyTorch runtime dependency
(unless one of you wants to approve #100166), this PR removes the
yaml dependency from torchgen.model.

It does so by splitting torchgen.utils (the offender) into
torchgen.utils (no yaml) and torchgen.yaml (which uses yaml).

Test Plan:
- CI
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100203
Approved by: https://github.com/ezyang, https://github.com/Skylion007
2023-04-28 13:45:39 +00:00
Jiong Gong
e5c9a0fcf5 [dynamo] avoid graph break on repeat_interleave.self_int (#99528)
Address convit_base failure: https://github.com/pytorch/torchdynamo/issues/1886 mentioned in https://github.com/pytorch/pytorch/issues/93777
Also for models like EleutherAI/gpt-j-6B.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99528
Approved by: https://github.com/ezyang
2023-04-25 04:47:39 +00:00
Peter Bell
7b91bd2a7b [primTorch] Add count_nonzero (#98995)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98995
Approved by: https://github.com/lezcano
2023-04-13 22:08:19 +00:00
Nikita Karetnikov
8db04e080c [pt2] add SymInt support for cdist (#98881)
Fixes #98853.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98881
Approved by: https://github.com/ezyang
2023-04-12 23:06:40 +00:00
Nikita Karetnikov
ff825de442 [primTorch] add ref for cumprod (#98670)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98670
Approved by: https://github.com/ezyang
2023-04-09 15:22:28 +00:00
Nikita Karetnikov
b411238d76 [pt2] add meta function for logcumsumexp (#98683)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98683
Approved by: https://github.com/ezyang
2023-04-09 01:26:37 +00:00
Nikita Karetnikov
1c226f5aad [pt2] add meta functions for cummax and cummin (#98552)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98552
Approved by: https://github.com/Chillee
2023-04-07 17:58:28 +00:00
albanD
0210481dcb Fix _like meta registrations (#98160)
The meta implementation for these _like function is wrong whenever device != "meta" (it doesn't fill the memory!).
zeros_like is special due to sparse and is fixed directly by always filling it with zeros.
Every other one is CompositeExplicit implementation, I went with removing their meta registration and tweaking code to avoid infinite recursions.
I can do the same as zeros_like (and add the proper filling for each) but that would duplicate the c++ logic and make the meta registrations non trivial. I can do it if you prefer to removal.

test_meta works fine with these fixes, relying on CI to see if other tests are breaking as well.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98160
Approved by: https://github.com/ezyang
2023-04-06 18:44:34 +00:00
Nikita Karetnikov
7b25976323 [pt2] add meta function for take (#98451)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98451
Approved by: https://github.com/ezyang
2023-04-06 14:48:35 +00:00
Wonjoo Lee
3095c95828 Fixes for PyTorch/XLA functionalization integration (#94537)
Fixes for PyTorch/XLA functionalization integration

---
Some notable changes include:
- More asserts in `FunctionalTensorWrapper`, so bugs show up more cleanly in cases where we e.g. forget to wrap an output
- Make the *_scatter ops `CompositeExplicitAutogradNonFunctional`, so we get a better error message and XLA doesn't accidentally try to us them
- Fix LTC/XLA codegen in core to handle multi-tensor out= ops with no returns
- Better erroring: Allow XLA to use the CPU fallback from core in a way so that it always errors on view ops, which XLA should no longer see.
- Update MetaConverter to exclude XLA tensors in raising NotImplemented…
- Add `_propagate_xla_data` op
- Add meta tensor support for some ops
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94537
Approved by: https://github.com/bdhirsh
2023-03-02 23:02:34 +00:00
Khushi
a0389681c2 [complex] nansum & nanmean (#93199)
Follows: #71472

Pull Request resolved: https://github.com/pytorch/pytorch/pull/93199
Approved by: https://github.com/Skylion007, https://github.com/malfet, https://github.com/kshitij12345
2023-02-16 06:13:42 +00:00
Zheng Yan
753c33bf86 Enable half type support for unique cpu (#91666)
Test Plan: CI

Differential Revision: D42326527

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91666
Approved by: https://github.com/jgong5, https://github.com/ngimel
2023-02-16 04:59:35 +00:00
haozhe.zhu
ed54a5d06b enable bf16 emb (#94163)
Merge https://github.com/pytorch/pytorch/pull/89199 and https://github.com/pytorch/pytorch/pull/91949 into one PR.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94163
Approved by: https://github.com/jianyuh, https://github.com/malfet, https://github.com/jgong5
2023-02-12 00:05:09 +00:00
Brian Hirsh
948cd61afc add fallthrough kernel for AutogradMeta key (#94603)
The other `Autograd[Backend]` keys all have fallthrough kernels registered to them, but `AutogradMeta` was missing the fallthrough kernel.

This is a problem for custom ops that don't have autograd support, if you try to run them with meta tensors. If you have a custom op, and register a CPU and a Meta kernel, then:

(1) if you run the op with cpu tensors, it will dispatch straight to the CPU kernel (as expected)

(2) if you run the op with meta tensors, you will error - because we don't have a fallthrough registered to the AutogradMeta key, we will try to dispatch to the AutogradMeta key and error, since the op author hasn't provided an autograd implementation.

Here's a repro that I confirmed now works:

```
import torch
from torch._dispatch.python import enable_python_dispatcher
from torch._subclasses.fake_tensor import FakeTensorMode

lib = torch.library.Library("test", "DEF")
impl_cpu = torch.library.Library("test", "IMPL", "CPU")
impl_meta = torch.library.Library("test", "IMPL", "Meta")

def foo_impl(x):
    return x + 1

lib.define("foo(Tensor a) -> Tensor")
impl_meta.impl("foo", foo_impl)
impl_cpu.impl("foo", foo_impl)

with enable_python_dispatcher():
    a = torch.ones(2, device='meta')
    print("@@@@@")
    b = torch.ops.test.foo.default(a)
    print(b)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94603
Approved by: https://github.com/ezyang, https://github.com/albanD
2023-02-10 22:44:52 +00:00
Aaron Gokaslan
748bac8757 [BE]: Apply pyupgrade yield from and unit test alias upgrades (#94309)
Applies some more harmless pyupgrades. This one gets rid of deprecated aliases in unit_tests and more upgrades yield for loops into yield from generators which are more performance and propagates more information / exceptions from original generator. This is the modern recommended way of forwarding generators.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94309
Approved by: https://github.com/albanD
2023-02-07 20:08:58 +00:00
PyTorch MergeBot
53e4fe076a Revert "enable bf16 emb (#94163)"
This reverts commit f3bf46e801.

Reverted https://github.com/pytorch/pytorch/pull/94163 on behalf of https://github.com/huydhn due to Sorry for reverting your PR. But I suspect that it causes flaky SIGSEGV failure for linux-bionic-py3.8-clang9 / test (crossref) job in trunk.  For example, 05397b1250
2023-02-07 00:32:22 +00:00
albanD
496c0a207b Make segment_reduce properly private. (#93166)
I am attempting not to change the aten function to reduce the amount of BC issues on the torchscript side.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/93166
Approved by: https://github.com/ngimel
2023-02-06 18:32:23 +00:00
haozhe.zhu
f3bf46e801 enable bf16 emb (#94163)
Merge https://github.com/pytorch/pytorch/pull/89199 and https://github.com/pytorch/pytorch/pull/91949 into one PR.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94163
Approved by: https://github.com/jianyuh, https://github.com/malfet, https://github.com/jgong5
2023-02-06 07:11:40 +00:00
Michael Suo
4e4293f15f Add meta registration for bucketize (#93893)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/93893
Approved by: https://github.com/zhxchen17
2023-02-02 21:03:08 +00:00
Ivan Yashchuk
fba13d94a1 Remove deprecated torch.symeig (#70988)
The time has come to remove deprecated linear algebra related functions. This PR removes `torch.symeig`.

- [x] XLA PR: https://github.com/pytorch/xla/pull/4498

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70988
Approved by: https://github.com/lezcano, https://github.com/kit1980, https://github.com/malfet
2023-01-31 11:59:11 +00:00
mfkasim1
75cfc0be21 Logcumsumexp for CPU (#93153)
Partial work from #90847, in the direction of solving #89205.
Most of the content is from #90847, but this is only for CPU, so hopefully it does not increase the build time by a lot.

tag: @albanD, @malfet

Pull Request resolved: https://github.com/pytorch/pytorch/pull/93153
Approved by: https://github.com/malfet, https://github.com/Skylion007
2023-01-27 22:29:33 +00:00
PyTorch MergeBot
9b23fd378f Revert "Logcumsumexp for complex in CPU and CUDA (#90847)"
This reverts commit 64985123e4.

Reverted https://github.com/pytorch/pytorch/pull/90847 on behalf of https://github.com/malfet due to Reverting to decrease build time, let's discuss the alternatives here
2023-01-24 20:49:08 +00:00
PyTorch MergeBot
acdd462b1a Revert "Remove deprecated torch.symeig (#70988)"
This reverts commit d70ed68162.

Reverted https://github.com/pytorch/pytorch/pull/70988 on behalf of https://github.com/kit1980 due to Failing XLA tests, forward fix unsuccessful
2023-01-24 19:03:40 +00:00
Ivan Yashchuk
d70ed68162 Remove deprecated torch.symeig (#70988)
The time has come to remove deprecated linear algebra related functions. This PR removes `torch.symeig`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70988
Approved by: https://github.com/lezcano, https://github.com/kit1980
2023-01-23 22:51:40 +00:00
mfkasim1
64985123e4 Logcumsumexp for complex in CPU and CUDA (#90847)
Another PR towards solving #89205.
What's in this PR:

* The implementation of forward `logcumsumexp` for complex numbers in CPU & CUDA
* The tests on forward call of `logcumsumexp` for complex numbers
* The implementation of backward `logcumsumexp` for complex numbers

What's missing:

* The test on backward gradient of `logcumsumexp` (it complaints `RuntimeError: logcumsumexp does not support automatic differentiation for outputs with complex dtype.` and I don't know how to solve the error and I don't know where to put the test for the backward computation). If possible, I'd like this to be done in this PR.

It's really tricky to handle the edge cases here (i.e. the ones involving `inf`), but I've tried my best to put some comments explaining the reasonings of my decisions in this PR.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90847
Approved by: https://github.com/albanD
2023-01-20 15:10:50 +00:00
lezcano
138a0188e0 Add support for logaddexp(float16) in CUDA and implement its reference (#91869)
The reference is implemented so that it generates efficient and
numerically stable triton code.

Fixes https://github.com/pytorch/pytorch/issues/91683

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91869
Approved by: https://github.com/ngimel
2023-01-10 00:19:24 +00:00
Peter Bell
ad7aefb608 Fix Meta tests for FFT functions (#91628)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91628
Approved by: https://github.com/kit1980
2023-01-05 00:58:26 +00:00
Edward Z. Yang
e686a442b4 If a torch.* returns non-Tensor, make this unimplemented rather than assert. (#89918)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89918
Approved by: https://github.com/albanD
2022-12-15 21:53:54 +00:00
Natalia Gimelshein
bc93454e4a correctly set strides for expanded/unsqueezed dimensions (#90341)
Fixes https://github.com/pytorch/torchdynamo/issues/1959, #90260
However, I wasn't able to make existing stride tests fail before the fix, even though I'm comparing all, not just significant strides.
Separately running refs on meta tensors produces wrong strides as shown in #90260, however, it looks like in meta tests some other way of computing meta info is used (I've been running
```
pytest -s -v test/test_meta.py -k test_meta_outplace_expand_cuda_float64
```
and verified that it has sample input that should fail, and that it indeed compares all the strides, but the produced `meta_rs` results somehow still had correct strides).

Edit: @SherlockNoMad helped me figure out how to fail the tests, and now I've set the correct ops for checking. `expand` fails for some test inputs because it special-cases 0-dim input case, correctly modeling it in prims would require a lot of changes, so skipping that for now.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90341
Approved by: https://github.com/SherlockNoMad
2022-12-07 23:38:33 +00:00
Ram Rachum
351d73b97f Fix exception causes all over the codebase (#90271)
This is the continuation to #90134 and hopefully the final PR in this series.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90271
Approved by: https://github.com/kit1980
2022-12-07 04:29:00 +00:00
Yanbo Liang
e1532af0bb Fix meta registration for aten._cdist_forward (#90042)
Error from [7k github model](https://github.com/pytorch/torchdynamo/issues/1884).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90042
Approved by: https://github.com/ezyang, https://github.com/eellison
2022-12-02 21:13:52 +00:00
Jane Xu
8695f0cced Rectify native_batch_norm schema by splitting it into two legit schemas (#88697)
Using the same repro from the issue (but with BatchNorm2D)

Rectifies native_batch_norm schema by splitting the schema into 2:
1. one will have NON-optional alias-able running_mean and running_var inputs
2. the other will just not have those parameters at all (no_stats variation)

**Calling for name suggestions!**

## test plan
I've added tests in test_functionalization.py as well as an entry in common_method_invocations.py for `native_batch_norm_legit`
CI should pass.

## next steps
Because of bc/fc reasons, we reroute native_batch_norm to call our new schemas ONLY through the python dispatcher, but in 2 weeks or so, we should make `native_batch_norm_legit` the official batch_norm.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88697
Approved by: https://github.com/albanD
2022-11-23 23:23:17 +00:00
Driss Guessous
1d9e1fca97 Update sdp dispatch logic to enable fused backward (#89154)
# Summary
Reorganizes how the sdp dispatch logic is down in order to enable backwards for fused kernels

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89154
Approved by: https://github.com/cpuhrsch
2022-11-21 20:02:09 +00:00
PyTorch MergeBot
e1d58b1928 Revert "Update sdp dispatch logic to enable fused backward (#89154)"
This reverts commit 2e72ec7982.

Reverted https://github.com/pytorch/pytorch/pull/89154 on behalf of https://github.com/huydhn due to Sorry for reverting your PR but the new test_sdp_math_gradcheck test breaks periodic slow gradcheck, i.e. 419ef2cdcf
2022-11-20 22:14:38 +00:00
Driss Guessous
2e72ec7982 Update sdp dispatch logic to enable fused backward (#89154)
# Summary
Reorganizes how the sdp dispatch logic is down in order to enable backwards for fused kernels

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89154
Approved by: https://github.com/cpuhrsch
2022-11-19 02:06:27 +00:00
lezcano
154e58c032 Add most in-place references/decompositions (#88117)
We add most in-place references in a generic way. We also implement a
wrapper to implement the annoying interface that `nn.functional`
nonlinearities have.

We fix along the way a couple decompositions for some non-linearities by
extending the arguments that the references have.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88117
Approved by: https://github.com/mruberry
2022-11-18 14:59:46 +00:00
Nikita Karetnikov
4270bb37da [primTorch] Improve narrow and narrow_copy: refs, tests, docs (#87045)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87045
Approved by: https://github.com/mruberry
2022-11-12 15:03:50 +00:00
PyTorch MergeBot
93d3bd626e Revert "[primTorch] Improve narrow and narrow_copy: refs, tests, docs (#87045)"
This reverts commit aa8279bcb8.

Reverted https://github.com/pytorch/pytorch/pull/87045 on behalf of https://github.com/izaitsevfb due to BC-breaking change, D41161182
2022-11-09 20:48:32 +00:00
Nikita Karetnikov
aa8279bcb8 [primTorch] Improve narrow and narrow_copy: refs, tests, docs (#87045)
Fixes #87019.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/87045
Approved by: https://github.com/mruberry
2022-11-09 09:19:28 +00:00
Sherlock Huang
46730aec35 [Reland] Fix primTorch compute_elementwise_output_strides (#88525)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88525
Approved by: https://github.com/desertfire
2022-11-05 05:42:07 +00:00
PyTorch MergeBot
2b117c8436 Revert "Fix primTorch compute_elementwise_output_strides (#88175)"
This reverts commit 1c8a0656d6.

Reverted https://github.com/pytorch/pytorch/pull/88175 on behalf of https://github.com/huydhn due to Sorry for reverting your PR but it breaks cuda 11.6 in trunk. As the PR signal was green, this is probably a landrace
2022-11-03 16:53:04 +00:00
Sherlock Huang
1c8a0656d6 Fix primTorch compute_elementwise_output_strides (#88175)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88175
Approved by: https://github.com/ngimel
2022-11-03 08:38:55 +00:00
soulitzer
4c20c0509d Split out forward AD tests from test_ops_gradients and reenable slow gradcheck CI (#88216)
Fixes: https://github.com/pytorch/pytorch/issues/88010

This PR does a couple things to stop slow gradcheck from timing out:
- Splits out test_ops_fwd_gradients from test_ops_gradients, and factors out TestFwdGradients and TestBwdGradients which both inherit from TestGradients, now situated in common_utils (maybe there is a better place?)
- Skips CompositeCompliance (and several other test files) for slow gradcheck CI since they do not use gradcheck
- because test times for test_ops_fwd_gradients and test_ops_gradients are either unknown or wrong, we hardcode them for now to prevent them from being put together. We can undo the hack after we see actual test times are updated. ("def calculate_shards" randomly divides tests with unknown test times in a round-robin fashion.)
- Updates references to test_ops_gradients and TestGradients
- Test files that are skipped for slow gradcheck CI are now centrally located in in run_tests.py, this reduces how fine-grained we can be with the skips, so for some skips (one so far) we still use the old skipping mechanism, e.g. for test_mps

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88216
Approved by: https://github.com/albanD
2022-11-03 00:20:45 +00:00
Sherlock Huang
c00c34fb69 Fix meta for aten.upsample_bilinear2d.vec (#88158)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88158
Approved by: https://github.com/ngimel
2022-11-02 16:58:29 +00:00
lezcano
39d9d2ed70 Implement reference for lerp (#87424)
We follow the vectorised CPU implementation for numerical accuracy

Pull Request resolved: https://github.com/pytorch/pytorch/pull/87424
Approved by: https://github.com/ezyang
2022-11-02 11:21:01 +00:00
Sherlock Huang
de1f641f11 Fix meta function for aten.addmm (#88068)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88068
Approved by: https://github.com/albanD
2022-11-01 17:05:48 +00:00
Sherlock Huang
c368c0faf0 Fix meta for aten.fill, constant_pad_nd, _adaptive_avg_pool2d (#88069)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88069
Approved by: https://github.com/ngimel, https://github.com/malfet
2022-11-01 15:36:06 +00:00
Sherlock Huang
c7ac333430 Fix args for meta__fused_moving_avg_obs_fq_helper (#88058)
Fixes https://github.com/pytorch/torchdynamo/issues/1802

There are a few problems,
1. torch.fused_moving_avg_obs_fake_quant doesn't have OpInfo test
2. self.empty_like() is not a valid call. it should be torch.empty_like(self)
3. python meta function has some unexplained behavior for arguments with default value of bool type?

In particular, problem 3 is the most concerning one.
**UPDATE: This is expected behavior, see discussion below for explanation.**

Without setting the default value for `per_row_fake_quant` and `symmetric_quant`, it gets the following error when running with meta tensor.
```
meta__fused_moving_avg_obs_fq_helper() missing 2 required positional arguments: 'per_row_fake_quant' and 'symmetric_quant'
```
I can fix this by adding the default values to these two args. However, I observer something strange when examining the actual value in meta function.

```
    print("per_row_fake_quant", per_row_fake_quant)
    print("symmetric_quant", symmetric_quant)
```

When default values are False, printed value correctly reflect the args value populated from call site.
When default values are True, printed value is ALWAYS True, regardless of the populated value from call site.
When default Values are None, printed value is `None` when call site set the value to 'False', printed value is 'True' when call site sets the value to 'True'.

I also verify that this bug also affect for other meta function with default args....

My speculation is that this is something about pybind value packing when called from c++ dispatcher to python meta function, and default value parsing for python meta function (and other python dispatch functions) ?

I tried to find the c++ call stack, but gdb is missing symbols and C++ stacktrace is not working properly... Appreciate anyone who can point me to the source file for pybind value packing.

cc @ezyang
cc @bdhirsh. I know you had a fix in the symbolic shape branch...
cc @yanboliang  who reported this bug
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88058
Approved by: https://github.com/bdhirsh, https://github.com/yanboliang
2022-10-31 19:00:16 +00:00
Edward Z. Yang
ff94494644 Revert "Revert "Unify meta tensor and fake tensor converter conversion (#87943)"" (#88045)
This reverts commit bc64999b83.

Check torch/_subclasses/meta_utils.py for "This is very tricky" for the bugfix explanation.

cc @mlazos @soumith @voznesenskym @yanboliang @penguinwu @anijain2305 @EikanWang @jgong5 @Guobing-Chen @chunyuan-w @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88045
Approved by: https://github.com/kit1980, https://github.com/Chillee
2022-10-31 17:50:14 +00:00
Sherlock Huang
0a4ca9d083 Fix meta for aten.angle and aten.index_copy (#88066)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88066
Approved by: https://github.com/albanD
2022-10-31 17:11:29 +00:00
Sherlock Huang
5723fd503c Fix meta function for aten.flip and aten.rot90 (#88065)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88065
Approved by: https://github.com/mruberry
2022-10-31 16:52:05 +00:00
PyTorch MergeBot
bc64999b83 Revert "Unify meta tensor and fake tensor converter conversion (#87943)"
This reverts commit baa715e790.

Reverted https://github.com/pytorch/pytorch/pull/87943 on behalf of https://github.com/kit1980 due to Broke several inductor tests
2022-10-29 18:39:28 +00:00