Commit Graph

617 Commits

Author SHA1 Message Date
PyTorch MergeBot
340ec1f460 Revert "Meff Attn Bias (#104310)"
This reverts commit 5453508115.

Reverted https://github.com/pytorch/pytorch/pull/104310 on behalf of https://github.com/DanilBaibak due to PR introduced cuda OOM issue ([comment](https://github.com/pytorch/pytorch/pull/104310#issuecomment-1650171538))
2023-07-25 16:37:32 +00:00
Jane Xu
5fec1f93dc Add meta registration for foreach_maximum_.List (#105864)
Will fix issues when compiling with amsgrad=True for Adam(W); see related failures in https://github.com/pytorch/benchmark/actions/runs/5628705163/job/15252867793

Also did some refactoring where common registrations could be deduplicated.

Test plan:
python test/inductor/test_compiled_optimizers.py -k test_adam

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105864
Approved by: https://github.com/albanD, https://github.com/mlazos
2023-07-25 00:39:13 +00:00
drisspg
5453508115 Meff Attn Bias (#104310)
# Summary

### Review Points
- Automatically pad tensors to create aligned masks when seqlen_kv is not a multiple of 16. This will cause a memory spike of ~2x the attn_mask size, which could in theory be big. It appears, though, that doing this + mem_eff is faster than no_pad + math, so it seems to be worth it (see the sketch after these points).
- Using expand to view the attn_mask in 4d. This is a little different from how we enforce q, k, v to be viewed in 4d prior to calling. Also not supporting the (b*n_heads, seq_len_q, seq_len_kv) case.
- Should enable #96099
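
A minimal sketch of the alignment idea (a hypothetical `pad_bias_for_alignment` helper for illustration; the kernel's real logic lives in the C++/CUDA path): pad the last dimension of the bias out to a multiple of 16, then view back to the original size so only the underlying stride changes.

```
import torch
import torch.nn.functional as F

def pad_bias_for_alignment(attn_bias: torch.Tensor) -> torch.Tensor:
    # Hypothetical illustration of the review point above, not the kernel's actual code.
    seqlen_kv = attn_bias.size(-1)
    pad = (-seqlen_kv) % 16
    if pad == 0:
        return attn_bias
    # Materializing the padded copy is the ~2x attn_mask memory spike mentioned above;
    # slicing back keeps the logical shape while the last dim's storage row is 16-aligned.
    return F.pad(attn_bias, (0, pad))[..., :seqlen_kv]

bias = torch.randn(1, 1, 555, 555, dtype=torch.float16)
aligned = pad_bias_for_alignment(bias)  # same shape, storage rows padded from 555 to 560
```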

### Profiling
I ran a bunch of comparisons between sdpa.MATH and sdp.MemEffAttention. I added an attn_bias of shape (1, 1, seqlen_q, seqlen_k). For these experiments seqlen_q == seqlen_k. These were all run on an A100 80GB GPU.
Configs:
```
    # Run a bunch of experiments
    batch_sizes = [8, 16, 32]
    num_heads = [16, 32]
    max_seq_lens = [15, 64, 128, 512, 555, 1024]
    embed_dims = [32, 64, 128]
    dtypes = [torch.float16, torch.bfloat16, torch.float32]
    pad_percentages = [None]
    backends = [SDPBackend.EFFICIENT_ATTENTION, SDPBackend.MATH]
    run_backward = True
    attn_mask = True
```

   The function calls `sdpa(input**).sum().backward()`.

   I calculated the geomean speedup of the efficient attention path over the math path for all these configs:
   `Geomean Speedup: 1.977`

An example comparison with batch_size = 8, num_heads = 32, embed_dim = 64, and dtype = torch.float16:
![attn_mask_compare_bsz_8_num_heads_32_embed_dim_64_dtype_fp16](https://github.com/pytorch/pytorch/assets/32754868/0d75bffe-350b-43f2-a37f-514f9158dcff)

 This was done using the current state of the branch, where we force alignment of the mask when the last dim is not divisible by 16, which shows up in the seq_len = 15 and 555 cases.

The full data can be found here:

[attn_mask_sweep.csv](https://github.com/pytorch/pytorch/files/11962399/attn_mask_sweep.csv)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104310
Approved by: https://github.com/cpuhrsch
2023-07-24 22:19:26 +00:00
Nikita Karetnikov
45e4706aff [pt2] add decomps for multilabel_margin_loss_forward ops (#105302)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105302
Approved by: https://github.com/ezyang
2023-07-23 02:16:29 +00:00
Nikita Shulga
5837e95d30 [Reland] Update mypy to 1.4.1 (#105227)
This PR re-lands
- [Typing] Fix PEP 484 Violation (#105022)
- Update mypy to 1.4.1 (#91983)

That were reverted due to the conflict with internal source repo.

Mostly fixes for PEP-484 violation (i.e. when default arg is set to None, but type is not annotated as optional)
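
For reference, a minimal illustration of the violation being fixed (example names are made up, not from the PR):

```
from typing import Optional

def resize(shape, device: str = None):  # PEP 484 violation: defaults to None,
    ...                                 # but the type is not Optional

def resize_fixed(shape, device: Optional[str] = None):  # compliant
    ...
```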
Plus a few real fixes:
  - Add missing `_get_upgraders_entry_map` to `torch/_C/__init__.pyi`
  - Add missing return statement to `torch._export.deserialize_graph`
  - Fix error message in `torch.ao.ns.fx.weight_utils.get_lstm_mod_weights`
  - Add assert in `torch/optim/optimizer.py` that Optional list is not None
TODO (in followup PR):
  - Fix erroneous `isinstance` check in `torch/ao/quantization/_pt2e/qat_utils.py`

Unrelated, to bypass CI failures due to the gcc9 dependency update in Ubuntu-18.04:
- Add hack to squash older libstdc++ from the conda environment in favor of the one from the OS to `.ci/docker/install_conda.sh`
- Update bazel cuda builds to focal, as with libstdc++-6.0.32 bazel builds lose the ability to catch exceptions (probably because they link with cupti statically, but I could not find where it is done)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105227
Approved by: https://github.com/atalman, https://github.com/albanD, https://github.com/Skylion007
2023-07-15 20:30:20 +00:00
PyTorch MergeBot
15fd1ea118 Revert "[Reland] Update mypy to 1.4.1 (#105227)"
This reverts commit c9c4f8efc3.

Reverted https://github.com/pytorch/pytorch/pull/105227 on behalf of https://github.com/atalman due to trying to mitigate ci sev #105248 ([comment](https://github.com/pytorch/pytorch/pull/105227#issuecomment-1636510935))
2023-07-14 22:28:35 +00:00
Nikita Karetnikov
7e72126487 [pt2] add decomps for multi_margin_loss ops (#104578)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104578
Approved by: https://github.com/ezyang, https://github.com/lezcano
2023-07-14 21:16:09 +00:00
Nikita Karetnikov
0a6888243b multi_margin_loss: check weight shape, make contiguous on CPU, add tests (#104852)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104852
Approved by: https://github.com/ezyang
2023-07-14 21:16:09 +00:00
Nikita Karetnikov
de67b52a88 Unify multi_margin_loss_shape_check on CPU and CUDA (#104851)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104851
Approved by: https://github.com/ezyang
2023-07-14 21:16:09 +00:00
Nikita Shulga
c9c4f8efc3 [Reland] Update mypy to 1.4.1 (#105227)
This PR re-lands
- [Typing] Fix PEP 484 Violation (#105022)
- Update mypy to 1.4.1 (#91983)

That were reverted due to the conflict with internal source repo.

Mostly fixes for PEP-484 violation (i.e. when default arg is set to None, but type is not annotated as optional)
Plus a few real fixes:
  - Add missing `_get_upgraders_entry_map` to `torch/_C/__init__.pyi`
  - Add missing return statement to `torch._export.deserialize_graph`
  - Fix error message in `torch.ao.ns.fx.weight_utils.get_lstm_mod_weights`
  - Add assert in `torch/optim/optimizer.py` that Optional list is not None
TODO (in followup PR):
  - Fix erroneous `isinstance` check in `torch/ao/quantization/_pt2e/qat_utils.py`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105227
Approved by: https://github.com/atalman, https://github.com/albanD, https://github.com/Skylion007
2023-07-14 20:45:12 +00:00
PyTorch MergeBot
b4d91b1c5b Revert "[Typing] Fix PEP 484 Violation (#105022)"
This reverts commit 4148b7bada.

Reverted https://github.com/pytorch/pytorch/pull/105022 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/105022#issuecomment-1635967734))
2023-07-14 14:45:09 +00:00
Brian Hirsh
c6b9c31a2c [inductor] fix incorrect strides in copy() decomp, fix hf_LongFormer + hf_BigBird errors (#100115)
Fixes https://github.com/pytorch/pytorch/issues/100067, https://github.com/pytorch/pytorch/issues/98268 and https://github.com/pytorch/pytorch/issues/93428.

See the comment [here](https://github.com/pytorch/pytorch/issues/100067#issuecomment-1523856970) for details. The bug was that the decomposition that inductor uses for `aten.copy` doesn't respect the strides of the input in all cases. The fixes that I added should work, but will be pretty slow - we allocate a tensor (potentially larger than `self` if `self` is a slice), and perform an `as_strided_scatter` + `as_strided`. Longer term, stride-agnostic IR should let us remove this decomp?  cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @ngimel @yf225 @chenyang78 @kadeng @muchulee8 @anijain2305 @soumith @desertfire
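
A rough sketch of the stride-preserving approach described above (hypothetical helper name, not the PR's exact decomp): allocate a flat buffer covering `self`'s storage region, scatter `src` into it at `self`'s size/stride/offset, and view the result back out with the same layout.

```
import torch

def copy_keeping_strides(self: torch.Tensor, src: torch.Tensor) -> torch.Tensor:
    # Smallest flat storage that the strided view of `self` can address.
    needed = self.storage_offset() + 1 + sum(
        (dim - 1) * stride for dim, stride in zip(self.shape, self.stride())
    )
    flat = torch.empty(needed, dtype=self.dtype, device=self.device)
    # Write src's values into the buffer exactly where `self`'s layout expects them...
    scattered = torch.as_strided_scatter(
        flat, src.expand(self.shape).to(self.dtype),
        self.shape, self.stride(), self.storage_offset(),
    )
    # ...and view them back out with self's original size/stride/offset.
    return torch.as_strided(scattered, self.shape, self.stride(), self.storage_offset())
```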

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100115
Approved by: https://github.com/albanD, https://github.com/ngimel
2023-07-13 14:40:57 +00:00
Michael Lazos
b99d605a30 Add meta registration for foreach_mul_ (#105107)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105107
Approved by: https://github.com/Chillee, https://github.com/voznesenskym
2023-07-13 04:45:22 +00:00
Nikita Shulga
4148b7bada [Typing] Fix PEP 484 Violation (#105022)
Not sure how it worked before, but arguments must be annotated as Optional if they default to None

Towards enabling mypy-1.4.1 in lintrunner

<!--
copilot:poem
-->
### <samp>🤖 Generated by Copilot at 5e1b9f4</samp>

> _We annotate the arguments of doom_
> _To show the `None` values of gloom_
> _We improve the type checking and readability_
> _With `Optional` annotations of metal-ity_

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105022
Approved by: https://github.com/izaitsevfb, https://github.com/huydhn, https://github.com/Skylion007
2023-07-12 10:20:48 +00:00
Michael Lazos
9861c4a3f8 Add lerp decomps + meta registrations (#104866)
as title

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104866
Approved by: https://github.com/janeyx99
2023-07-10 22:07:57 +00:00
Jane Xu
e25f5732c8 Add meta registrations and distributed decomps: _foreach_div_.Scalar, sqrt_.default (#104779)
This PR unblocks #104780 by resolving spmd tracing test issues and by adding meta registrations for foreach inplace ops (div_ and sqrt_)
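
For context, a meta kernel only describes output metadata (shape, dtype, device) without touching real data; for in-place foreach ops like these it essentially just returns its inputs. A self-contained sketch using `torch.library` with a made-up namespace (not the internal `register_meta` path this PR uses):

```
import torch
from torch.library import Library

lib = Library("my_ns", "DEF")  # hypothetical namespace, for illustration only
lib.define("scale(Tensor self, Scalar s) -> Tensor")

def scale_cpu(self, s):
    return self * s

def scale_meta(self, s):
    # Only metadata matters here: same shape/dtype/device, no real computation.
    return torch.empty_like(self)

lib.impl("scale", scale_cpu, "CPU")
lib.impl("scale", scale_meta, "Meta")

x = torch.randn(3, device="meta")
y = torch.ops.my_ns.scale(x, 2.0)  # dispatches to scale_meta; y is a meta tensor of shape (3,)
```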

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104779
Approved by: https://github.com/fegin, https://github.com/albanD
2023-07-10 17:38:46 +00:00
Nikita Karetnikov
c00dd43e43 [pt2] add metas for multilabel_margin_loss ops (#104388)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104388
Approved by: https://github.com/ezyang
2023-07-05 13:42:22 +00:00
Nikita Karetnikov
a3aa4da154 [pt2] add metas for multi_margin_loss ops (#104236)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104236
Approved by: https://github.com/ezyang
2023-07-05 13:40:05 +00:00
Nikita Karetnikov
ad58aba932 [pt2] add metas for adaptive_max_pool ops (#104167)
Fixes #103892.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104167
Approved by: https://github.com/ezyang
2023-07-05 07:02:07 +00:00
Nikita Karetnikov
b1c31b1d26 [pt2] metas and SymInt support for max_pool ops (#103951)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103951
Approved by: https://github.com/Chillee, https://github.com/kulinseth
2023-07-01 01:33:35 +00:00
Nikita Karetnikov
c4a6f86062 [pt2] add metas for max_unpool2d and max_unpool3d (#103821)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103821
Approved by: https://github.com/Skylion007, https://github.com/Chillee
2023-07-01 01:33:35 +00:00
Yanbo Liang
77642da3b8 Fix broken meta registration for torch.full (#104451)
Fixes #104117

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104451
Approved by: https://github.com/eellison
2023-06-30 05:14:52 +00:00
Driss Guessous
4a008d268a REDO of dropout support for mem eff #102038 (#103704)
This is a new PR with the changes from #102038 + #103201, plus namespacing changes to fix a bug.

# Summary
This PR builds off of:
- https://github.com/pytorch/pytorch/pull/101847
- https://github.com/pytorch/pytorch/pull/100583

It specifically adds dropout support to the memory efficient attention kernel. In the process of doing so roughly 3 changes were made:
- Update sdpa dispatching to allow for inputs requiring grad to be sent to efficient attention
- Update how memory efficient attention handles passing the rng state from forward to backward in order to enable cuda_graph support
- Fix a bug in the kernel that was causing incorrect gradients to be produced for num_keys > 64 with dropout and causal masking set. https://github.com/facebookresearch/xformers/pull/755
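
A usage sketch of what this enables (assumes a CUDA device; backend selection uses the `torch.backends.cuda.sdp_kernel` context manager available at the time):

```
import torch
import torch.nn.functional as F

q = torch.randn(2, 8, 128, 64, device="cuda", dtype=torch.float16, requires_grad=True)
k = torch.randn_like(q)
v = torch.randn_like(q)

# Force the memory-efficient backend; per this PR it supports dropout and
# inputs that require grad.
with torch.backends.cuda.sdp_kernel(enable_flash=False, enable_math=False, enable_mem_efficient=True):
    out = F.scaled_dot_product_attention(q, k, v, dropout_p=0.1)
out.sum().backward()
```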

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103704
Approved by: https://github.com/cpuhrsch
2023-06-26 23:05:03 +00:00
xuanqi
344bab2669 [RFC]: Functionalize assertions (#103757)
The idea here is to do a graph mutation to:
* Create an initial dependency token at the beginning of the program.
* Replace non-functional version of assertion statements to functional version.
* The functional version of assertion statement will:
  * Accept a dependency token from output of previous functional assertion statement (or the initial dependency token if there isn't any).
  * Generate a dependency token as the output of assertion statement.
  * Augment the output to include the dependency token generated by last assertion statement.

The goal here is to:
* Form an explicit dependency chain and avoid potential reordering during other passes of compiling.
* Make the assertions part of the overall execution graph that affects the final output (otherwise they could potentially be DCEed); see the sketch below.
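
An illustrative Python sketch of the dependency-token idea, using made-up helper names (not this PR's actual ops):

```
import torch

def functional_assert(dep_token: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
    # The underlying runtime check, plus a fresh token that is data-dependent on both
    # the previous token and the condition, so asserts cannot be reordered or DCEed.
    torch._assert_async(cond)
    return dep_token + cond.to(dep_token.dtype) * 0

def program(x: torch.Tensor):
    token = torch.zeros(())                           # initial dependency token
    token = functional_assert(token, (x > 0).all())   # assertion 1
    y = x * 2
    token = functional_assert(token, (y < 100).all()) # assertion 2, chained after 1
    return y, token                                   # output is augmented with the token
```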

**NOTE:**
* Currently this only covers `constrain_range`; support for other assertions is WIP. Sending out this PR to collect feedback first.
* This focuses only on the implementation itself. It will be integrated with the current export flow in a future PR.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103757
Approved by: https://github.com/avikchaudhuri
2023-06-24 00:23:35 +00:00
Nikita Karetnikov
e9705c52ac [pt2] add metas for _pdist_forward and _pdist_backward (#103817)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103817
Approved by: https://github.com/ezyang
2023-06-22 11:18:05 +00:00
Nikita Karetnikov
e48851033a [pt2] add metas for pad ops (#103815)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103815
Approved by: https://github.com/ezyang
2023-06-22 11:18:05 +00:00
Nikita Karetnikov
c40fa8b614 [inductor] remove fft and svd ops from fake_incorrect_kernels (#103616)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103616
Approved by: https://github.com/eellison
2023-06-22 03:01:43 +00:00
Kurt Mohler
ee83c646bb Replace _prims_common.check with torch._check* (#103240)
This relands most of the changes from #102219 which were backed out by #103128. However, instead of removing `_prims_common.check`, it adds a warning and a comment mentioning that it will be removed in the future and `torch._check*` should be used instead. As mentioned in https://github.com/pytorch/pytorch/pull/103128#pullrequestreview-1466414415, `_prims_common.check` cannot yet be removed because of some internal usage

Part of #72948

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103240
Approved by: https://github.com/albanD
2023-06-21 00:46:17 +00:00
xuanqi
b27c3558a4 [RFC]: Create aten native op for constrain_range (#103346)
At a high level, the current implementation of the constraint functions (`constrain_as_*`) will raise an exception for the following code snippet:
```
def f(x):
    a = x.item()
    constrain_as_size(a, 4, 7)
    return torch.empty((a, 4))

inp = torch.tensor([5])
ep = torch._export.export(f, (inp,))
```

The reason is that the current constraint logic:
1) Is purely Python, so it won't survive AOT export (the full node is gone after AOT export since AOT export only maintains aten-level ops).
2) Relies on a side effect to add range constraints to the traced symbol's shape env ([code](9591e52880/torch/fx/experimental/symbolic_shapes.py (L370-L372))).
3) If runtime assertions are turned on (the default), [`_AddRuntimeAssertionsForConstraintsPass`](9591e52880/torch/_export/passes/add_runtime_assertions_for_constraints_pass.py (L98-L100)) will try to append assertion nodes based on the range constraints extracted from the symbols' shape env during another interpretation round.
4) However, because of 1), during the AOT export round the range constraint logic won't run for symbols generated in that round, so no range constraint information is available for the assertion round, which causes the issue.
5) As a result, it will fail at `torch.empty((a, 4))` (there is no constraint that `a` must be positive).

The fix here is to implement the range constraint logic as a native aten op (with a no-op CPU implementation) so that it can survive AOT export.

**NOTE:**
[Logic](2d745b95d7/torch/fx/experimental/symbolic_shapes.py (L350-L365C15)) within [`constrain_range`](2d745b95d7/torch/fx/experimental/symbolic_shapes.py (LL313C74-L313C74)) is split out as `constrain_range_int` to capture the case when a non-`SymInt` is passed in, and is reused in the new `_constrain_range`. The reason is that when a non-`SymInt` is provided:
* If it directly calls `sym_constrain_range`, the C++ version will be called, which is a no-op.
* So in this case it calls `constrain_range_int` instead, to be able to catch issues like the user providing an input whose tensor shape is out of range during export, like the following for the code example above:
```
...
inp = torch.tensor([10])
ep = torch._export.export(f, (inp,)) # immediately raise error
```

Differential Revision: [D46734204](https://our.internmc.facebook.com/intern/diff/D46734204)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103346
Approved by: https://github.com/tugsbayasgalan
2023-06-16 14:55:40 +00:00
Driss Guessous
155691a7d9 Implement meta functions for rshift and lshift (#103637)
Fixes #103606

I was using this script to exercise the new code, because I can never remember which test it is.
```
import torch

@torch.compile(fullgraph=True, dynamic=True)
def shift_right(tensor: torch.Tensor) -> torch.Tensor:
    return (tensor >> 2).to(torch.long)

def main():
    sample_input = torch.tensor([4, 4, 16, 32], dtype=torch.uint8)
    print(shift_right(sample_input))

if __name__ == "__main__":
    main()
```
And iterated through the error messages

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103637
Approved by: https://github.com/ezyang
2023-06-15 21:49:22 +00:00
Michael Lazos
00546333a5 Register more foreach op lowerings (#102654)
Adds the necessary foreach op lowerings for Adam

Adds two decomps for addcdiv and addcmul (need to verify that type promotion works correctly here)
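
For reference, a plausible sketch of such decomps (not necessarily the PR's exact code; the type promotion of `value` against the tensor dtypes is the subtle part being verified):

```
import torch

def addcmul_decomp(self, tensor1, tensor2, *, value=1):
    # self + value * tensor1 * tensor2
    return self + value * tensor1 * tensor2

def addcdiv_decomp(self, tensor1, tensor2, *, value=1):
    # self + value * tensor1 / tensor2
    return self + value * (tensor1 / tensor2)
```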

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102654
Approved by: https://github.com/jansel
2023-06-15 02:52:17 +00:00
PyTorch MergeBot
6ff6b49039 Revert "Register more foreach op lowerings (#102654)"
This reverts commit 05c01b9bfc.

Reverted https://github.com/pytorch/pytorch/pull/102654 on behalf of https://github.com/ZainRizvi due to This is breaking internal builds ([comment](https://github.com/pytorch/pytorch/pull/102654#issuecomment-1591639478))
2023-06-14 16:49:30 +00:00
Nikita Karetnikov
4a76fb49f3 [pt2] add metas for avg_pool3d and avg_pool3d_backward (#103392)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103392
Approved by: https://github.com/ezyang
2023-06-13 21:23:46 +00:00
Michael Lazos
05c01b9bfc Register more foreach op lowerings (#102654)
Adds the necessary foreach op lowerings for Adam

Adds two decomps for addcdiv and addcmul (need to verify that type promotion works correctly here)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102654
Approved by: https://github.com/jansel
2023-06-13 17:30:03 +00:00
Yinghai Lu
4c3799447f Back out "Dropout support for memory efficient attention (#102038)" & "Two small mem_eff bug fixes (#103201)" (#103464)
Summary:
Original commit changeset: 04c4473d8510

Original Phabricator Diff: D46584152 & D46582033

Test Plan: Already explained in summary.

Reviewed By: yinghai

Differential Revision: D46633283

fbshipit-source-id: c23c2945408988f3c4339dfd5cd40ae46261716c

Co-authored-by: Shenxiu Liu <shenxiu@meta.com>
2023-06-12 18:56:48 -07:00
Bearnardd
2abad0c184 Add dtype check baddbmm (#102659)
Fixes the part of #100838 related to disabling support for non-matching dtypes for input/batches for the `baddbmm` operator.

* [x] added dtype checks
* [x] added test case

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102659
Approved by: https://github.com/ngimel
2023-06-13 00:31:06 +00:00
Nikita Shulga
4cfa06f706 [BE] Deprecate has_XYZ attributes (#103279)
Use [`__getattr__`](https://peps.python.org/pep-0562/) to raise a warning when one tries to access `has_XYZ` attributes and recommend the appropriate `torch.backends.XYZ` methods
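
The PEP 562 pattern in question, sketched for one attribute (the mapping shown is illustrative, not the exact set deprecated here):

```
import warnings
import torch

_deprecated = {"has_cuda": lambda: torch.backends.cuda.is_built()}

def __getattr__(name):
    # Module-level __getattr__ (PEP 562): only called when normal lookup fails,
    # so existing attributes are unaffected.
    if name in _deprecated:
        warnings.warn(f"`{name}` is deprecated, please use the torch.backends API instead",
                      DeprecationWarning, stacklevel=2)
        return _deprecated[name]()
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
```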

Make respective properties in `torch._C` private (by prefixing them with underscore), to exclude from `from torch._C import *`.

Added `warnings.simplefilter` to workaround Python-3.11 torch.compile lineinfo issue.

Fixes https://github.com/pytorch/pytorch/issues/102484

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103279
Approved by: https://github.com/janeyx99, https://github.com/Skylion007
2023-06-10 05:17:17 +00:00
Nikita Karetnikov
2b3d955ffd [pt2] add meta and SymInt support for linalg_matrix_exp (#102945)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102945
Approved by: https://github.com/lezcano
2023-06-09 22:45:16 +00:00
Nikita Karetnikov
3a0f37735c [pt2] bug fix: invert condition in checkFloatingOrComplex (#102944)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102944
Approved by: https://github.com/lezcano
2023-06-09 22:45:16 +00:00
Driss Guessous
606fb882c4 Dropout support for memory efficient attention (#102038)
# Summary
This PR builds off of:
- https://github.com/pytorch/pytorch/pull/101847
- https://github.com/pytorch/pytorch/pull/100583

It specifically adds dropout support to the memory efficient attention kernel. In the process of doing so roughly 3 changes were made:
- Update sdpa dispatching to allow for inputs requiring grad to be sent to efficient attention
- Update how memory efficient attention handles passing the rng state from forward to backward in order to enable cuda_graph support
- Fix a bug in the kernel that was causing incorrect gradients to be produced for num_keys > 64 with dropout and causal masking set. https://github.com/facebookresearch/xformers/pull/755

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102038
Approved by: https://github.com/cpuhrsch
2023-06-08 21:50:12 +00:00
Yanbo Liang
686d7e4c48 [Inductor] Fix x.view(dtype) decomp and make inductor support it (#102920)
Fixes #99804

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102920
Approved by: https://github.com/jansel, https://github.com/ngimel
2023-06-07 17:10:54 +00:00
Ivan Zaitsev
821493715c Back out "Remove check from _prims_common, replace with torch._check* (#102219)", Back out "Forwatd fix for D46427687" (#103128)
Test Plan: revertitparrot

Reviewed By: malfet

Differential Revision: D46506433

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103128
Approved by: https://github.com/malfet
2023-06-07 01:41:41 +00:00
Nikita Karetnikov
ec0aa965da [pt2] add meta for _linalg_solve_ex (#102454)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102454
Approved by: https://github.com/lezcano
2023-06-06 08:06:55 +00:00
Nikita Karetnikov
4bda4a7e4d [pt2] add meta for lu_unpack (#102937)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102937
Approved by: https://github.com/lezcano
2023-06-06 08:06:53 +00:00
Nikita Karetnikov
6ac3352a37 [pt2] add meta for _linalg_slogdet (#102464)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102464
Approved by: https://github.com/ezyang
2023-06-05 03:17:08 +00:00
Kurt Mohler
a84bb2709a Remove check from _prims_common, replace with torch._check* (#102219)
Part of #72948

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102219
Approved by: https://github.com/lezcano, https://github.com/albanD
2023-06-03 02:23:21 +00:00
Shunting Zhang
86c7652503 [inductor] layout optimization for conv (#99773)
A convolution kernel with channels-last inputs runs much faster than a kernel with contiguous inputs. The PR leverages that to optimize tensor layouts so we provide channels-last inputs to convolution. Some care needs to be taken not to convert tensor layouts back and forth between contiguous and channels last; those extra copies hurt performance quite a bit.
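
A small illustration of the layout effect being exploited (assumes a CUDA device; the perf numbers below come from the linked dashboard):

```
import torch
import torch.nn as nn

conv = nn.Conv2d(64, 128, kernel_size=3, padding=1).cuda().to(memory_format=torch.channels_last)
x = torch.randn(32, 64, 56, 56, device="cuda").to(memory_format=torch.channels_last)

# With both weights and inputs in channels-last, the faster NHWC kernels are picked and
# the output stays channels-last, so no extra layout conversions are inserted.
y = conv(x)
print(y.is_contiguous(memory_format=torch.channels_last))  # True
```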

Latest perf number [here](https://hud.pytorch.org/benchmark/compilers?startTime=Wed%2C%2024%20May%202023%2023%3A40%3A37%20GMT&stopTime=Wed%2C%2031%20May%202023%2023%3A40%3A37%20GMT&granularity=hour&suite=torchbench&mode=training&dtype=amp&lBranch=shunting-layout-opt-19&lCommit=baa797fc100688dfb044fbcbdebcfd2591710f78&rBranch=main&rCommit=999bae0f54108ffc5b7cf2524a02a83901554b16)
- TB: 1.64x -> 1.69x
- HF: 1.79x -> 1.78x (random noise)
- TIMM: 1.51x -> 1.65x

Right now we disable layout optimization for dynamic shape since there is perf loss in that combination. Here is a GH issue to followup: https://github.com/pytorch/pytorch/issues/102670

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99773
Approved by: https://github.com/jansel
2023-06-02 21:08:18 +00:00
PyTorch MergeBot
a7efa0ce35 Revert "Remove check from _prims_common, replace with torch._check* (#102219)"
This reverts commit fb79d43649.

Reverted https://github.com/pytorch/pytorch/pull/102219 on behalf of https://github.com/malfet due to Broke lint, see https://github.com/pytorch/pytorch/actions/runs/5158949959/jobs/9293466925 ([comment](https://github.com/pytorch/pytorch/pull/102219#issuecomment-1574245414))
2023-06-02 20:00:48 +00:00
Kurt Mohler
fb79d43649 Remove check from _prims_common, replace with torch._check* (#102219)
Part of #72948

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102219
Approved by: https://github.com/lezcano, https://github.com/albanD
2023-06-02 19:13:45 +00:00
Nikita Karetnikov
0f1621df1a [pt2] fix typos in checkFloatingOrComplex errors (#102456)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102456
Approved by: https://github.com/lezcano
2023-05-30 11:18:50 +00:00
Nikita Karetnikov
c3ea8cc58b [pt2] convert out params in register_meta (#101344)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101344
Approved by: https://github.com/lezcano
2023-05-27 18:38:52 +00:00
Michael Lazos
69c7f710ba Add meta registrations for some foreach ops (#102225)
as title
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102225
Approved by: https://github.com/ngimel
2023-05-25 02:59:11 +00:00
Peter Bell
ce42010722 [inductor][decomp] Add aten._unsafe_index_put for unchecked indexing (#101812)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101812
Approved by: https://github.com/lezcano
2023-05-24 22:17:32 +00:00
Nikita Karetnikov
42b974e8f7 [pt2] add meta for linalg_lu_solve (#101836)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101836
Approved by: https://github.com/lezcano
2023-05-24 00:21:50 +00:00
PyTorch MergeBot
5147fe4969 Revert "[inductor][decomp] Add aten._unsafe_index_put for unchecked indexing (#101812)"
This reverts commit b9721bd705.

Reverted https://github.com/pytorch/pytorch/pull/101812 on behalf of https://github.com/osalpekar due to Causing test_nn_cuda tests to crash during runtime. More details at [D46093942](https://www.internalfb.com/diff/D46093942) ([comment](https://github.com/pytorch/pytorch/pull/101812#issuecomment-1560238085))
2023-05-23 23:06:21 +00:00
Peter Bell
b9721bd705 [inductor][decomp] Add aten._unsafe_index_put for unchecked indexing (#101812)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101812
Approved by: https://github.com/lezcano
2023-05-22 20:39:18 +00:00
drisspg
6f13d6892a Add meta support for multinomial (#101324)
# Summary
Found this when trying to compile the text gen loop of nanogpt here: b33289942b/torchbenchmark/models/nanogpt_generate/model.py (L322)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101324
Approved by: https://github.com/ngimel
2023-05-19 00:04:26 +00:00
Angela Yi
72a73ef67b Add aten.searchsorted.Tensor meta kernel (#101637)
Test Plan: CI

Differential Revision: D45933187

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101637
Approved by: https://github.com/ezyang
2023-05-18 06:55:11 +00:00
Peter Bell
66e398951a [inductor/decomp] Add aten._unsafe_index to disable range checks (#101602)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101602
Approved by: https://github.com/lezcano, https://github.com/ngimel
2023-05-17 23:36:24 +00:00
Nikita Karetnikov
42e65a2587 [pt2] add meta for linalg_lu_factor_ex (#101375)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101375
Approved by: https://github.com/lezcano
2023-05-16 20:56:54 +00:00
kshitij12345
afea1a9fe9 [meta] error checking for inplace ops (#101532)
Fixes #100753

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101532
Approved by: https://github.com/lezcano
2023-05-16 17:26:59 +00:00
Nikita Karetnikov
9eb1748b2b [pt2] add meta and SymInt support for linalg_lu (#101372)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101372
Approved by: https://github.com/lezcano, https://github.com/albanD
2023-05-15 20:25:00 +00:00
Nikita Karetnikov
ac4cc63ae2 [pt2] add meta for linalg_ldl_solve (#101367)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101367
Approved by: https://github.com/lezcano
2023-05-15 20:25:00 +00:00
Nikita Karetnikov
7dd8e08817 [pt2] add meta for linalg_ldl_factor_ex (#101362)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101362
Approved by: https://github.com/lezcano
2023-05-15 02:56:49 +00:00
Nikita Karetnikov
a8964d6377 [pt2] add meta and SymInt support for linalg_householder_product (#101315)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101315
Approved by: https://github.com/lezcano
2023-05-15 02:56:49 +00:00
Natalia Gimelshein
15a51e2012 simplify sdpa backward meta registration (#101128)
Per title.

there's an off chance that query_reshaped etc was actually discontiguous after reshape, but even in that case I'm pretty sure the computed gradients would still be contiguous, and we are properly transposing output gradients to produce correct strides.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101128
Approved by: https://github.com/drisspg
2023-05-11 03:30:07 +00:00
Nikita Karetnikov
c0d33f66c9 [pt2] remove unused meta_linalg_eigh (#100965)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100965
Approved by: https://github.com/ezyang
2023-05-10 15:45:36 +00:00
Nikita Karetnikov
6abde61f8e [pt2] add meta function for _linalg_eigh (#100964)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100964
Approved by: https://github.com/ezyang
2023-05-10 15:45:15 +00:00
Natalia Gimelshein
bfe5f5bbe1 [WIP] enable cuda graphs support for flash attention with dropout (#100196)
Fixes #99905

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100196
Approved by: https://github.com/drisspg
2023-05-08 16:19:18 +00:00
Nikita Karetnikov
1e591a8b64 [pt2] add meta function for solve_triangular (#100829)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100829
Approved by: https://github.com/ezyang
2023-05-08 13:48:15 +00:00
Nikita Karetnikov
266c84e3ab [pt2] add meta function for linalg_qr (#100714)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100714
Approved by: https://github.com/ezyang, https://github.com/lezcano
2023-05-06 15:04:02 +00:00
Nikita Karetnikov
37f1be041a [pt2] enable svd in fake_tensor (#100130)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100130
Approved by: https://github.com/ezyang, https://github.com/lezcano
2023-05-05 06:27:59 +00:00
Michael Voznesensky
fe3ecfe0cf Add AotAutogradFallbackTests to dynamic suite (#100454)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100454
Approved by: https://github.com/ezyang
2023-05-04 04:28:45 +00:00
PyTorch MergeBot
c3aa59c8f5 Revert "[WIP] enable cuda graphs support for flash attention with dropout (#100196)"
This reverts commit 32615618e4.

Reverted https://github.com/pytorch/pytorch/pull/100196 on behalf of https://github.com/clee2000 due to broke no ops build 32615618e4 https://github.com/pytorch/pytorch/actions/runs/4866578063/jobs/8678258318 ([comment](https://github.com/pytorch/pytorch/pull/100196#issuecomment-1532352810))
2023-05-03 01:41:56 +00:00
Natalia Gimelshein
32615618e4 [WIP] enable cuda graphs support for flash attention with dropout (#100196)
Fixes #99905

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100196
Approved by: https://github.com/drisspg
2023-05-02 23:05:31 +00:00
Justin Chu
e779a30d50 [BE] Fix SIM109 compare-with-tuple (#100337)
Use {replacement} instead of multiple equality comparisons

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100337
Approved by: https://github.com/Skylion007
2023-04-30 19:51:32 +00:00
Tugsbayasgalan Manlaibaatar
d4bf76c2a4 Persist torch.assert in aten graph (#100101)
This PR introduces a new operator called aten._assert_async.msg, which allows passing a tensor value and an assertion message as inputs. As part of TorchDynamo, we're replacing the use of torch._assert with this new operator so that make_fx also knows how to handle assertions. This is a subset of https://github.com/pytorch/pytorch/pull/98878; refer there for historic reviews.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100101
Approved by: https://github.com/jansel
2023-04-28 07:31:43 +00:00
Aaron Gokaslan
e2a3817dfd [BE] Enable C419 rule for any all shortcircuiting (#99890)
Apparently https://github.com/pytorch/pytorch/pull/78142 made torch.JIT allow for simple generator expressions which allows us to enable rules that replace unnecessary list comprehensions with generators in any/all. This was originally part of #99280 but I split it off into this PR so that it can be easily reverted should anything break.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99890
Approved by: https://github.com/justinchuby, https://github.com/kit1980, https://github.com/malfet
2023-04-25 15:02:13 +00:00
Xiaodong Wang
cc01568efd [pt2] Register meta func to randperm.default (#99593)
Summary:
Looks like we're missing the meta func for randperm.default. I get complaints like this when I compile randperm with dynamic shapes, which I think is because it hits the real implementation instead of the meta func.

```
RuntimeError: expected int but got s0
Exception raised from expect_int at fbcode/caffe2/c10/core/SymInt.h:128 (most recent call first):
# 0  c10::get_backtrace[abi:cxx11](unsigned long, unsigned long, bool)
# 1  std::_Function_handler<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > (), c10::(anonymous namespace)::GetFetchStackTrace()::$_1>::_M_invoke(std::_Any_data const&)
# 2  c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)
# 3  c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
# 4  c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor (c10::SymInt, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CompositeExplicitAutograd__randperm>, at::Tensor, c10::guts::typelist::typelist<c10::SymInt, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool> > >, at::Tensor (c10::SymInt, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>)>::call(c10::OperatorKernel*, c10::DispatchKeySet, c10::SymInt, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>)
# 5  at::Tensor c10::Dispatcher::redispatch<at::Tensor, c10::SymInt, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool> >(c10::TypedOperatorHandle<at::Tensor (c10::SymInt, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>)> const&, c10::DispatchKeySet, c10::SymInt, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>) const
# 6  at::_ops::randperm::redispatch(c10::DispatchKeySet, c10::SymInt, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>)
# 7  c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor (c10::SymInt, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>), &at::(anonymous namespace)::randperm>, at::Tensor, c10::guts::typelist::typelist<c10::SymInt, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool> > >, at::Tensor (c10::SymInt, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>)>::call(c10::OperatorKernel*, c10::DispatchKeySet, c10::SymInt, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>)
# 8  c10::impl::make_boxed_from_unboxed_functor<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor (c10::SymInt, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>), &at::(anonymous namespace)::randperm>, at::Tensor, c10::guts::typelist::typelist<c10::SymInt, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool> > >, false>::call(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >*)

```

Differential Revision: D45137851

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99593
Approved by: https://github.com/ezyang
2023-04-25 08:55:43 +00:00
Wanchao Liang
ca24a96216 minor fix to fused adam meta registration (#99436)
This PR fixes the registration by adding `max_exp_avg_sqs` to the output shape list too, and fixes some type check issues.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99436
Approved by: https://github.com/mrshenli
2023-04-24 22:50:02 +00:00
Edward Z. Yang
10c938abef Handle meta['val'] for tuple of lists. (#99724)
Fixes https://github.com/pytorch/pytorch/issues/99356

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99724
Approved by: https://github.com/wanchaol
2023-04-21 22:33:21 +00:00
Rodrigo Kumpera
38e964056b Reland python ops (#99170)
Waiting for the revert to land.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99170
Approved by: https://github.com/albanD
2023-04-18 15:15:46 +00:00
PyTorch MergeBot
1c042a2137 Revert "Reland python ops (#99170)"
This reverts commit d4de64ae8d.

Reverted https://github.com/pytorch/pytorch/pull/99170 on behalf of https://github.com/DanilBaibak due to Break internal build
2023-04-18 11:37:43 +00:00
Rodrigo Kumpera
d4de64ae8d Reland python ops (#99170)
Waiting for the revert to land.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99170
Approved by: https://github.com/albanD
2023-04-17 21:53:41 +00:00
Nikita Karetnikov
106ccf4a2a [pt2] add meta function for linalg.cross (#99279)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99279
Approved by: https://github.com/ezyang
2023-04-17 21:21:45 +00:00
PyTorch MergeBot
f957334c2b Revert "[pt2] add meta function for linalg.cross (#99279)"
This reverts commit efc3887ea5.

Reverted https://github.com/pytorch/pytorch/pull/99279 on behalf of https://github.com/ezyang due to Apparently this is breaking inductor on master? So weird
2023-04-17 19:33:16 +00:00
Nikita Karetnikov
efc3887ea5 [pt2] add meta function for linalg.cross (#99279)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99279
Approved by: https://github.com/ezyang
2023-04-17 03:05:20 +00:00
Rodrigo Kumpera
a910045add [PATCH] Back out "Move functional collectives implementation to python. (#98595) (#99168)
Summary:
Original commit changeset: ba36f8751adc

Original Phabricator Diff: D44788697

Test Plan: model loading is fine after reverting the diff

Reviewed By: zyan0, sayitmemory

Differential Revision: D44921259
---

Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99168
Approved by: https://github.com/izaitsevfb
2023-04-14 23:48:19 +00:00
XiaobingSuper
9c98f2ceb7 inductor: rewrite mkldnn fx fusion using pattern_matcher(binary) (#97141)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97141
Approved by: https://github.com/jgong5, https://github.com/EikanWang, https://github.com/jansel
2023-04-12 06:23:03 +00:00
XiaobingSuper
c214c50355 inductor: rewrite mkldnn fx fusion using pattern_matcher(conv_unary) (#97007)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97007
Approved by: https://github.com/jgong5, https://github.com/EikanWang, https://github.com/jansel
2023-04-12 05:52:54 +00:00
Guang Yang
c377a8590b Add nonzero_static() op to pytorch to unblock export (#97417)
Summary: Add new experimental python op (`torch.nonzero_static`) for export. There is NO cuda impl included in this PR

Example:

Say input tensor is `x = torch.tensor([[1, 0], [3, 2]])`

Calling regular `nonzero()` on x will give you the tensor `tensor([[0, 0], [1, 0], [1, 1]])`
Calling `nonzero_static(x, size=4)` on x will give you the tensor `tensor([[0, 0], [1, 0], [1, 1], [fill_value, fill_value]])` (padded)
Calling `nonzero_static(x, size=2)` on x will give you the tensor `tensor([[0, 0], [1, 0]])` (truncated)

Test Plan:
**Unit Tests**
```
buck test @mode/dev-nosan //caffe2/test:test_dynamo -- 'caffe2/test:test_dynamo - test_export.py::ExportTests::test_export_with_nonzero_static' -- 'caffe2/test:test_dynamo - test_misc.py::MiscTests::test_nonzero_static'
```

**PT2 Export with `nonzero_static()`**
Example of `GraphModule` in the exported graph
```
def forward(self, x):
    arg0, = fx_pytree.tree_flatten_spec(([x], {}), self._in_spec)
    nonzero_static_default = torch.ops.aten.nonzero_static.default(arg0, size = 4);  arg0 = None
    return pytree.tree_unflatten([nonzero_static_default], self._out_spec)
```

Differential Revision: D44324808

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97417
Approved by: https://github.com/ezyang
2023-04-11 05:13:36 +00:00
Nikita Karetnikov
b411238d76 [pt2] add meta function for logcumsumexp (#98683)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98683
Approved by: https://github.com/ezyang
2023-04-09 01:26:37 +00:00
Rodrigo Kumpera
24d9001527 Move functional collectives implementation to python. (#98595)
This simplifies a lot of the work needed to add new ops.

This relands the previous PR, not sure why it was reverted.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98595
Approved by: https://github.com/wconstab
2023-04-07 21:48:05 +00:00
Nikita Karetnikov
1c226f5aad [pt2] add meta functions for cummax and cummin (#98552)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98552
Approved by: https://github.com/Chillee
2023-04-07 17:58:28 +00:00
albanD
0210481dcb Fix _like meta registrations (#98160)
The meta implementation for these _like functions is wrong whenever device != "meta" (it doesn't fill the memory!).
zeros_like is special due to sparse and is fixed directly by always filling it with zeros.
Every other one has a CompositeExplicit implementation, so I went with removing their meta registrations and tweaking the code to avoid infinite recursions.
I could do the same as zeros_like (and add the proper fill for each), but that would duplicate the C++ logic and make the meta registrations non-trivial. I can do that instead of the removal if you prefer.
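
A sketch of the bug class being described (hypothetical helper, not the PR's code):

```
import torch

def zeros_like_sketch(x):
    out = torch.empty_like(x)
    # Fine when x is a meta tensor (there is no data to fill), but for a real device
    # the values must actually be written, otherwise callers see uninitialized memory.
    if out.device.type != "meta":
        out.zero_()
    return out
```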

test_meta works fine with these fixes, relying on CI to see if other tests are breaking as well.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98160
Approved by: https://github.com/ezyang
2023-04-06 18:44:34 +00:00
PyTorch MergeBot
67d1a77086 Revert "Move functional collectives implementation to python. (#98315)"
This reverts commit 8b0374f83c.

Reverted https://github.com/pytorch/pytorch/pull/98315 on behalf of https://github.com/huydhn due to Sorry for reverting for PR. This is failing in trunk probably due to a landrace
2023-04-06 16:49:40 +00:00
Nikita Karetnikov
7b25976323 [pt2] add meta function for take (#98451)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98451
Approved by: https://github.com/ezyang
2023-04-06 14:48:35 +00:00
Rodrigo Kumpera
8b0374f83c Move functional collectives implementation to python. (#98315)
This simplifies a lot of the work needed to add new ops.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98315
Approved by: https://github.com/albanD, https://github.com/wconstab, https://github.com/Neilblaze
2023-04-06 14:06:16 +00:00
PyTorch MergeBot
fa08e546f3 Revert "Add all_reduce_coalesced functional collective (#97157)"
This reverts commit a3fc3531f5.

Reverted https://github.com/pytorch/pytorch/pull/97157 on behalf of https://github.com/huydhn due to Sorry for reverting your PR, but it seems to have a land race with https://github.com/pytorch/pytorch/pull/96226 and fails lint on trunk
2023-04-04 01:50:49 +00:00
Rodrigo Kumpera
a3fc3531f5 Add all_reduce_coalesced functional collective (#97157)
Inductor codegen is suboptimal when calling all_reduce_coalesced with input args. We need to fix inductor's calling convention for that, or something else.

Might not work if any output is unused.

Test code:

```python
import torch
import torch.distributed as dist
import torch.nn.functional as F
from functorch import make_fx
import os

import torch.distributed._functional_collectives as ft_c
from torch.testing._internal.common_distributed import (
    spawn_threads_and_init_comms,
)
from torch._inductor.compile_fx import compile_fx_inner

def my_fun(a, b):
    c = a * 3
    tensors = ft_c.all_reduce_coalesced([a, c, b], "sum", [0])
    return ((tensors[1] + tensors[0] + tensors[2]).sum(), )

@spawn_threads_and_init_comms(world_size=1)
def inductor_main(self):

    x = torch.arange(4).cuda() * (dist.get_rank() + 1)
    y = torch.arange(4).cuda() * (dist.get_rank() + 1)
    x = x.to(torch.float)
    y = y.to(torch.float) * 0.5
    res = make_fx(my_fun)(x, y)
    print(f"fx graph:\n{res.graph}")
    ind = compile_fx_inner(res, [x, y])
    print(f"inductor done:\n{ind}")

os.environ["PROXY_TENSOR_TRACING"] = "1"
os.environ["TORCH_COMPILE_DEBUG"] = "1"
torch._dynamo.config.output_code = True

if __name__ == "__main__":
    inductor_main(None)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97157
Approved by: https://github.com/fegin
2023-04-04 01:13:18 +00:00
Shen Li
e8d39606eb [SPMD] Enable fused Adam in full train step tracing (#98113)
Differential Revision: [](https://our.internmc.facebook.com/intern/diff/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98113
Approved by: https://github.com/yifuwang, https://github.com/fegin
2023-04-01 15:54:13 +00:00
Shen Li
bccf2ef0ce Format DTensor dispatch.py and _meta_registrations.py (#98114)
Format-only changes with black and lintrunner to prepare for the commit on top.

Differential Revision: [D44603809](https://our.internmc.facebook.com/intern/diff/D44603809)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98114
Approved by: https://github.com/yifuwang, https://github.com/fegin
2023-04-01 15:54:13 +00:00
Shen Li
9ec6fdb29b Enable adam foreach in full train step tracing (#97897)
Main changes:

1. Registered several foreach ops to both meta and DTensor
2. Skip redundant getitem node when expanding foreach ops with DTensor
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97897
Approved by: https://github.com/wanchaol, https://github.com/fegin
2023-03-30 16:47:10 +00:00
Shen Li
379fb47654 [SPMD] Support foreach optimizers with functionalization (#97853)
My first attempt was to apply the same solution as how proxy_tensor.py
handles other inplace ops. However, foreach is different in the way
that its schema in `native_functions.yaml` does not return anything,
whereas ops like `addcmul_` and `addcdiv_` do return Tensors (Thanks
bdhirsh for teaching me this!). As a result, the proxy output
during tracing does not wrap anything, and hence we cannot correctly
connect it with subsequent operators. Modifying `native_functions.yaml`
is not a preferred solution. After discussing with bdhirsh, the
temporary solution is to do foreach functionalization as a graph
pass for now. Later, when https://github.com/pytorch/pytorch/issues/97852
is addressed, we will switch to default functionalization.

Edit: the latest version follows @bdhirsh 's suggestion on using
`make_fx` `decomposition_table` instead of implementing manual
fx.Graph transforms to functionalize `_foreach_add_`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97853
Approved by: https://github.com/fegin, https://github.com/wanchaol
2023-03-30 11:27:10 +00:00
Christian Puhrsch
9d37cefcb0 Resubmit _int_mm (#96685)
Avoids any changes to gemm_and_bias

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96685
Approved by: https://github.com/drisspg, https://github.com/ngimel
2023-03-27 16:14:07 +00:00
Shen Li
a8f7e0b213 [Easy] Improve error message for meta_mm (#97533)
Differential Revision: [D44376381](https://our.internmc.facebook.com/intern/diff/D44376381)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97533
Approved by: https://github.com/bdhirsh, https://github.com/albanD
2023-03-25 01:09:41 +00:00
Driss Guessous
98a5cf090d [SDPA] Remove the chunk_grad from mem-eff attention (#96880)
# Summary

There exists an optimization within the scaled_dot_product_efficient backward attention path to, under the right conditions, output grad_q, grad_k, grad_v all as aliases of the same storage. This was done to optimize for the hot path where mha does packed linear_projection -> chunk -> (view stuff) -> sdpa. The thought was that chunk would be able to "trivially" cat inputs in chunk.backward(). However, upon closer inspection, chunk.backward will call `cat` regardless of the inputs, so this is not being utilized.

I validated this by profiling on main and then on this branch, and the traces were the same, both with `split.backward()` calling into cat.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96880
Approved by: https://github.com/cpuhrsch
2023-03-17 21:28:25 +00:00
Nikita Karetnikov
bf08d1387c [primTorch] handle out in sort meta function (#96719)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96719
Approved by: https://github.com/ezyang
2023-03-16 07:38:53 +00:00
Christian Puhrsch
0a53c9624a Back out "Add _int_mm to expose cuBLAS int8@int8 -> int32 matmul (#94339)" (#96885)
Summary:
Backing out  _int_mm to expose cuBLAS int8@int8 -> int32 matmul (#94339)

Test Plan: CI

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96885
Approved by: https://github.com/drisspg
2023-03-16 05:32:55 +00:00
BowenBao
60a68477a6 Bump black version to 23.1.0 (#96578)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96578
Approved by: https://github.com/ezyang
2023-03-15 06:27:59 +00:00
Nikita Karetnikov
ec536232a3 [primTorch] add meta implementation for upsample_nearest2d_backward (#96612)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96612
Approved by: https://github.com/ezyang
2023-03-14 06:51:42 +00:00
PyTorch MergeBot
be220690d9 Revert "[primTorch] add meta implementation for upsample_nearest2d_backward (#96612)"
This reverts commit fe180596b8.

Reverted https://github.com/pytorch/pytorch/pull/96612 on behalf of https://github.com/malfet due to broke lint
2023-03-13 03:07:23 +00:00
Nikita Karetnikov
fe180596b8 [primTorch] add meta implementation for upsample_nearest2d_backward (#96612)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96612
Approved by: https://github.com/ezyang
2023-03-13 00:25:23 +00:00
Nikita Karetnikov
cb7c796b4b Enable min.unary_out (#96441)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96441
Approved by: https://github.com/ngimel
2023-03-11 19:23:33 +00:00
Nikita Karetnikov
0d7c44096a Add baddbmm meta function (#96548)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96548
Approved by: https://github.com/ezyang
2023-03-11 19:09:24 +00:00
Nikita Karetnikov
8e0d5bf538 [primTorch] add meta implementation for aten.min.dim (#96442)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96442
Approved by: https://github.com/ngimel
2023-03-11 18:51:51 +00:00
Driss Guessous
11aab72dc9 [SDPA] Add an optional scale kwarg (#95259)
# Summary
This PR adds an optional kwarg to torch torch.nn.functional.scaled_dot_product_attention()
The new kwarg is a scaling factor that is applied after the q@k.T step of the computation. Updates were made to the efficient kernel to support it, while flash and math were minimally updated to support it as well.

Will reduce the complexity of: #94729 and has been asked for by a couple of users.

# Review Highlights
- As far as I know I did this the correct way, and this is both BC and FC compliant. However, I always seem to break internal workloads, so I would love it if someone could advise whether I did this right.
- I named the optional arg 'scale'. This is probably dumb and I should name it 'scale_factor'. I will make this change, but it is annoying and will require someone deciding we should rename.
- 'scale' is interpreted as `Q@K.T * (scale)`; see the usage sketch below.
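
A usage sketch of the new kwarg (when `scale` is not passed, the default 1/sqrt(head_dim) scaling still applies):

```
import torch
import torch.nn.functional as F

q = torch.randn(2, 4, 16, 32)
k = torch.randn(2, 4, 16, 32)
v = torch.randn(2, 4, 16, 32)

# `scale` multiplies Q @ K.T before the softmax, overriding the default 1/sqrt(head_dim).
out = F.scaled_dot_product_attention(q, k, v, scale=0.5)
```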

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95259
Approved by: https://github.com/cpuhrsch
2023-03-08 18:07:40 +00:00
Wonjoo Lee
3095c95828 Fixes for PyTorch/XLA functionalization integration (#94537)
Fixes for PyTorch/XLA functionalization integration

---
Some notable changes include:
- More asserts in `FunctionalTensorWrapper`, so bugs show up more cleanly in cases where we e.g. forget to wrap an output
- Make the *_scatter ops `CompositeExplicitAutogradNonFunctional`, so we get a better error message and XLA doesn't accidentally try to use them
- Fix LTC/XLA codegen in core to handle multi-tensor out= ops with no returns
- Better erroring: Allow XLA to use the CPU fallback from core in a way so that it always errors on view ops, which XLA should no longer see.
- Update MetaConverter to exclude XLA tensors in raising NotImplemented…
- Add `_propagate_xla_data` op
- Add meta tensor support for some ops
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94537
Approved by: https://github.com/bdhirsh
2023-03-02 23:02:34 +00:00
Wanchao Liang
f397d1700f Inductor reduce_scatter_tensor (#95764)
This adds reduce_scatter to the functional collective and adds the
inductor lowering support

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95764
Approved by: https://github.com/kumpera
2023-03-02 22:05:30 +00:00
Will Constable
cc6da7b901 Inductor allgather_into_tensor (#95530)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95530
Approved by: https://github.com/kumpera
2023-02-27 21:38:36 +00:00
Christian Puhrsch
1fe2a9d122 Add _int_mm to expose cuBLAS int8@int8 -> int32 matmul (#94339)
Add the _int_mm primitive that binds the cuBLAS int8@int8 -> int32 matmul and that translates to Triton-based mm templates under max autotune. This is a very useful first step towards better supporting quantization on the GPU. This is not a user-facing API, but an internal primitive.
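
A usage sketch (assumes a CUDA device; the shapes chosen here are assumed to satisfy the op's alignment constraints on the inner/output dims):

```
import torch

a = torch.randint(-128, 127, (17, 32), dtype=torch.int8, device="cuda")
b = torch.randint(-128, 127, (32, 24), dtype=torch.int8, device="cuda")

# int8 @ int8 accumulated into int32; result has shape (17, 24) and dtype torch.int32.
c = torch._int_mm(a, b)
```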

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94339
Approved by: https://github.com/ngimel, https://github.com/jansel
2023-02-27 20:27:25 +00:00
HELSON
f43ce9553b [meta_tensor] polish error strings in meta registrations (#95052)
I found that some error messages should be formatted to include more detailed information, so I polished those error messages.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95052
Approved by: https://github.com/bdhirsh
2023-02-27 20:12:09 +00:00
Edward Z. Yang
4833e47feb Add support for nonzero, some improvements to reduce guards (#95387)
This takes the strategy described in https://docs.google.com/document/d/1lFRYAJo5nrfxRhwIzGnfi2pbLpU6T4ytSRSuLJ5qebI/edit#

It is essentially https://github.com/pytorch/pytorch/pull/95222 but squashed and with changes that are unnecessary given that we assume nonzero returns > 1.

What's in the PR:

* nonzero now supports meta propagation. When `capture_dynamic_output_shape_ops`, it will return a tensor with an unbacked SymInt representing the size in question.
* The unbacked SymInt is UNSOUNDLY assumed to be not equal to 0/1. We will still error if you guard otherwise.
* PrimTorch pointwise operators are updated to use empty_permuted, to avoid guarding on unbacked SymInt from empty_strided (tested in `test_dynamic_pointwise_scalar`)
* Convolution is updated to skip backend selection if batch is unbacked, to avoid guarding on unbacked SymInt (tested in `test_unbacked_batch_resnet`)
* I kept the helper utilities like `definitely_true` for working with possibly unbacked SymInts. They're not used right now but maybe someone will find them useful.
* Added `constrain_unify` to let you specify two unbacked SymInts must have the same value
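
A minimal usage sketch, assuming the torch.compile surface and the config name referenced above:

```python
import torch
import torch._dynamo

# Assumption: this flag is the config knob referred to above.
torch._dynamo.config.capture_dynamic_output_shape_ops = True

@torch.compile
def f(x):
    return torch.nonzero(x)   # data-dependent output size -> unbacked SymInt

print(f(torch.tensor([0.0, 1.0, 0.0, 2.0])).shape)  # torch.Size([2, 1])
```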

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95387
Approved by: https://github.com/voznesenskym
2023-02-24 00:27:45 +00:00
Yanan Cao (PyTorch)
039b4c8809 Add meta function for _upsample_bilinear2d_aa (#94982)
Differential Revision: D43353000

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94982
Approved by: https://github.com/ezyang
2023-02-19 07:11:20 +00:00
Rodrigo Kumpera
e22d791287 [PTD] Introduce tracing friendly collectives. (#93990)
This change adds torch.distributed.traceable_collectives.

This experimental API enables collectives to be fully traced by dynamo and FX.

See #93173 for the RFC

Pull Request resolved: https://github.com/pytorch/pytorch/pull/93990
Approved by: https://github.com/wconstab, https://github.com/wanchaol, https://github.com/H-Huang
2023-02-16 15:35:01 +00:00
PyTorch MergeBot
641dc0b844 Revert "[quant] Add quantize and dequantize operators to decomposition table (#93312)"
This reverts commit 782e4f5c02.

Reverted https://github.com/pytorch/pytorch/pull/93312 on behalf of https://github.com/jeanschmidt due to this commits breaks internal builds: https://fburl.com/sandcastle/dw0rqcbv
2023-02-13 09:20:37 +00:00
Jerry Zhang
782e4f5c02 [quant] Add quantize and dequantize operators to decomposition table (#93312)
Summary:
This PR tries to decompose the operators in the torch.ops.quantized_decomposed namespace into more
primitive aten operators. This frees us from maintaining the semantics of the quantize/dequantize
operators, which can be expressed more precisely in terms of the underlying aten operators

Note: this PR just adds them to the decomposition table; we haven't enabled this by default yet
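
For illustration, a hedged sketch of what such a decomposition can look like for the per-tensor affine case, written purely in terms of aten-level ops (the exact registered decompositions may differ in detail):

```python
import torch

def quantize_per_tensor_decomp(x, scale, zero_point, quant_min, quant_max, dtype):
    # round, shift by zero_point, clamp to the quantized range, then cast
    return torch.clamp(torch.round(x / scale) + zero_point, quant_min, quant_max).to(dtype)

def dequantize_per_tensor_decomp(xq, scale, zero_point):
    # back to float: subtract zero_point and rescale
    return (xq.to(torch.float32) - zero_point) * scale

x = torch.randn(4)
xq = quantize_per_tensor_decomp(x, 0.05, 128, 0, 255, torch.uint8)
print(dequantize_per_tensor_decomp(xq, 0.05, 128))
```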

Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_q_dq_decomposition

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/93312
Approved by: https://github.com/vkuzo, https://github.com/SherlockNoMad
2023-02-10 01:40:12 +00:00
PyTorch MergeBot
3a5a762443 Revert "[quant] Add quantize and dequantize operators to decomposition table (#93312)"
This reverts commit 3fd46a2f9c.

Reverted https://github.com/pytorch/pytorch/pull/93312 on behalf of https://github.com/huydhn due to Sorry for reverting your PR, but it breaks trunk due to a landrace 3fd46a2f9c.  Please rebase and re-land it
2023-02-08 18:29:10 +00:00
Jerry Zhang
3fd46a2f9c [quant] Add quantize and dequantize operators to decomposition table (#93312)
Summary:
This PR tries to decompose the operators in the torch.ops.quantized_decomposed namespace into more
primitive aten operators. This frees us from maintaining the semantics of the quantize/dequantize
operators, which can be expressed more precisely in terms of the underlying aten operators

Note: this PR just adds them to the decomposition table; we haven't enabled this by default yet

Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_q_dq_decomposition

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/93312
Approved by: https://github.com/vkuzo, https://github.com/SherlockNoMad
2023-02-08 17:26:01 +00:00
Peter Bell
5817695bfa [pt2] Fix arange to match ATen behavior (#93353)
Fixes #92676

`arange` infers the output dtype from the argument types, but in order to reduce
falling back to ATen, inductor preferred to cast whole number float arguments to
int which gave the wrong output dtype. Instead, this decomposes floating point
arange into the prim equivalent for integers.

This also changes the signature of `prims.arange` to

```python
prims.iota(length, *, start, step, **factory_kwargs)
```

which only supports integer arguments. This is done because calculating the
output size from `start, end, step` is surprisingly complex and liable to off-by-one
errors, so it should not be duplicated in each backend.
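
A hedged sketch of the idea: the backend sees only an integer iota (the PR routes this through `prims.iota(length, *, start, step, ...)`), and floating-point `arange` is recovered with pointwise ops; the length computation is done once in Python since that is the off-by-one-prone part:

```python
import math
import torch

def float_arange_decomp(start, end, step, dtype=torch.float32):
    length = math.ceil((end - start) / step)          # computed once, not per backend
    idx = torch.arange(length, dtype=torch.int64)     # stand-in for the integer iota prim
    return (start + step * idx).to(dtype)

torch.testing.assert_close(float_arange_decomp(0.0, 1.0, 0.3), torch.arange(0.0, 1.0, 0.3))
```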

Pull Request resolved: https://github.com/pytorch/pytorch/pull/93353
Approved by: https://github.com/ngimel, https://github.com/lezcano
2023-02-03 00:44:32 +00:00
Michael Suo
4e4293f15f Add meta registration for bucketize (#93893)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/93893
Approved by: https://github.com/zhxchen17
2023-02-02 21:03:08 +00:00
Driss Guessous
653dc73df0 [SDPA] Wire up FlashAttention's backward (#92917)
# Summary
This PR creates _flash_attention_backward and _scaled_dot_product_flash_attention_backward native functions and registers them to the respective derivatives.yaml.

The goal is to replicate the torch.autograd.Function defined in the FlashAttention repo [here](33e0860c9c/flash_attn/flash_attn_interface.py (L126)) natively in PyTorch. One thing that we don't have access to in native PyTorch is ctx.save_for_backward, so in order to save these variables I extended the objects returned from the forward functions.

### MetaFunctions
I also updated the FlashAttention meta functions to mirror the real outputs now. I also added a meta registration for backwards. I have an XLMR training script, and while eager training now works with FlashAttention, compiling this module fails with the inductor error down below.

### Questions?
Performance issues vs mem efficient when using torch.nn.mha_forward

TorchCompile -> See proposed solution below.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92917
Approved by: https://github.com/cpuhrsch
2023-02-02 04:02:30 +00:00
Driss Guessous
df14650f0b [SDPA] Update SDPA API and make function Public (#92189)
# Summary
In preparation for the pt 2.0 launch, this PR updates SDPA's API and makes the function a public nn.functional function.

## Changes
### API
Previously the function signature was:
`scaled_dot_product_attention(query, key, value, attn_mask=None, need_attn_weights=False, dropout_p=0.0, is_causal=False) -> (Tensor, Tensor)`
Updated signature:
`scaled_dot_product_attention(query, key, value, attn_mask=None, dropout_p=0.0, is_causal=False) -> Tensor`

This PR removes the need_attn_weights optional boolean variable and updates the return type to a singular tensor.
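
A minimal usage sketch of the updated public signature (single tensor return, no attention weights materialized):

```python
import torch
import torch.nn.functional as F

q = torch.randn(2, 8, 128, 64)   # (batch, heads, seq_len, head_dim)
k = torch.randn(2, 8, 128, 64)
v = torch.randn(2, 8, 128, 64)

out = F.scaled_dot_product_attention(q, k, v, dropout_p=0.0, is_causal=True)
print(out.shape)  # torch.Size([2, 8, 128, 64])
```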

#### Reasoning:
The main goal of this function is to provide an easy interface for users to call into fused attention kernels (e.g. FlashAttention). The fused kernels do not currently support arbitrary attn_mask or dropout, but there is a PR to mem-efficient attention to enable these. We want to have the API surface ready for when the backing kernels get updated.

The fused kernels save on memory usage by not materializing the weights, and it is unlikely that a fast fused implementation will enable this feature, so we are removing it.

Discussed with folks at FAIR/Xformers, who +1'd this API change.

#### Make function Public
In preparation for the pt 2.0 launch, we make the function public to start generating user feedback.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92189
Approved by: https://github.com/cpuhrsch
2023-01-23 20:50:46 +00:00
Brian Hirsh
76cb2d0ede fix incorrect _embedding_bag meta (#92549)
Fixes https://github.com/pytorch/pytorch/issues/92286. See the issue for diagnosis.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92549
Approved by: https://github.com/albanD, https://github.com/eellison
2023-01-18 22:50:31 +00:00
Avik Chaudhuri
bb11e072ae Squash and merge linalg meta kernels (#92335)
Squashed changes from https://github.com/pytorch/pytorch/pull/92021 and https://github.com/pytorch/pytorch/pull/92020 and https://github.com/pytorch/pytorch/pull/92019

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92335
Approved by: https://github.com/avikchaudhuri
2023-01-18 05:55:52 +00:00
yanbing-j
94a7c01159 Enable oneDNN implementation in LSTM op (#91158)
### Description
This PR enables the oneDNN implementation in the LSTM op to improve its performance. Both FP32 and BF16 are supported.

### Performance improvement
In CPX 28C, with iomp and jemalloc set.
We chose 8 LSTM input configurations (covering input_size, hidden_size, num_layers, bidirectional, bias, batch_first, dropout, batch_size, and seq_len); the final one is a real input from train-clean-100 in the LibriSpeech dataset. The performance improvements are shown in the following figures. We can see that the LSTM with the oneDNN implementation performs better than the original.

In single socket:
![image](https://user-images.githubusercontent.com/61222868/211182994-833debec-518a-4b35-8504-6b0fadb17930.png)

![image](https://user-images.githubusercontent.com/61222868/211183012-31e1253f-2c60-4c92-a656-c239a971b453.png)

In single core:
![image](https://user-images.githubusercontent.com/61222868/211183017-186e5d47-cb9a-4c1e-914f-fa718e769f1c.png)

![image](https://user-images.githubusercontent.com/61222868/211183022-53266857-5a9e-4a95-b300-33fa34811d08.png)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91158
Approved by: https://github.com/jgong5, https://github.com/malfet
2023-01-18 04:41:18 +00:00
Driss Guessous
f219970990 Return empty attention weights when need_atten_weights = False (#91782)
# Summary
This PR updates the second return value from SDPA to be an empty tensor of size 0, rather than what it would be if need_attn_weights were True. It also updates the meta function to account for this change.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91782
Approved by: https://github.com/cpuhrsch
2023-01-06 19:06:48 +00:00
Xia, Weiwen
de9c82f41a [Meta] Register aten.pixel_shuffle.default for meta (#91605)
**Summary**
Fixes #91551
`aten.pixel_shuffle.default` is not registered for meta, so it always generates contiguous (channels-first) outputs. This can be reproduced with `torch.compile` (as described in issue #91551) and when running in FakeTensorMode.
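
For reference, a hedged sketch of the check this implies: after the fix, the meta output's strides should match the eager kernel's, including for channels-last inputs:

```python
import torch

def strides_match(memory_format):
    x = torch.randn(2, 16, 8, 8).to(memory_format=memory_format)
    out_eager = torch.pixel_shuffle(x, upscale_factor=2)            # (2, 4, 16, 16)
    out_meta = torch.pixel_shuffle(x.to("meta"), upscale_factor=2)  # shape-only computation
    return out_eager.stride() == out_meta.stride()

print(strides_match(torch.contiguous_format), strides_match(torch.channels_last))
```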

**Test plan**
python test/inductor/test_torchinductor.py -k test_pixel_shuffle_channels_last
python test/test_proxy_tensor.py

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91605
Approved by: https://github.com/jgong5, https://github.com/mingfeima, https://github.com/anijain2305
2023-01-06 00:45:14 +00:00
Peter Bell
ad7aefb608 Fix Meta tests for FFT functions (#91628)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91628
Approved by: https://github.com/kit1980
2023-01-05 00:58:26 +00:00
XiaobingSuper
dfb651452a inductor: meta registration for mkldnn ops (#91299)
Fixes https://github.com/pytorch/torchdynamo/issues/198 by supporting meta tensors for the conv/linear fused ops to reduce compilation time.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91299
Approved by: https://github.com/jgong5, https://github.com/EikanWang, https://github.com/jansel
2023-01-03 14:24:36 +00:00
Joel Schlosser
1c40ec46ff Decomps and meta registrations for upsample_nearest 1D / 2D / 3D (#91260)
Adds decompositions and meta registrations for the 1D, 2D, and 3D implementations of `upsample_nearest`. All related OpInfo-based tests for AOTAutograd now pass.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91260
Approved by: https://github.com/ezyang
2022-12-28 16:03:25 +00:00
Yanbo Liang
789b1437e9 Fix meta registration for aten._cudnn_rnn (#91333)
Found this issue from [weekly running 7k github models](https://github.com/pytorch/torchdynamo/issues/1884). It caused a regression in the pass rate; 25 models failed due to this issue.
The reason is that the argument ```cx``` of ```aten._cudnn_rnn``` can be ```None```, but this case is not handled in the meta registration, so it throws the following error:
```
Traceback (most recent call last):
  File "/scratch/ybliang/work/repos/pytorch/torch/_dynamo/utils.py", line 1059, in run_node
    return nnmodule(*args, **kwargs)
  File "/scratch/ybliang/work/repos/pytorch/torch/nn/modules/module.py", line 1482, in _call_impl
    return forward_call(*args, **kwargs)
  File "/scratch/ybliang/work/repos/pytorch/torch/nn/modules/rnn.py", line 477, in forward
    result = _VF.rnn_tanh(input, hx, self._flat_weights, self.bias, self.num_layers,
  File "/scratch/ybliang/work/repos/pytorch/torch/_subclasses/fake_tensor.py", line 916, in __torch_dispatch__
    r = func(*args, **kwargs)
  File "/scratch/ybliang/work/repos/pytorch/torch/_ops.py", line 284, in __call__
    return self._op(*args, **kwargs or {})
  File "/scratch/ybliang/work/repos/pytorch/torch/_meta_registrations.py", line 2108, in _cudnn_rnn
    cy = cx.new_empty(0 if cx is None else cell_shape)
AttributeError: 'NoneType' object has no attribute 'new_empty'
```
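
Illustrative only: the None-safe pattern needed at that call site, written as a tiny hypothetical helper (argument names follow the traceback above, not the actual meta registration):

```python
import torch

def make_cy(hx, cx, cell_shape):
    # cx is None for plain RNN/GRU, so don't call methods on it unconditionally.
    if cx is None:
        return hx.new_empty(0)
    return cx.new_empty(cell_shape)

hx = torch.empty(2, 3, 4, device="meta")
print(make_cy(hx, None, (2, 3, 4)).shape)   # torch.Size([0])
print(make_cy(hx, hx, (2, 3, 4)).shape)     # torch.Size([2, 3, 4])
```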

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91333
Approved by: https://github.com/ezyang
2022-12-23 22:59:31 +00:00
Brian Hirsh
c47bdd7522 *_scatter ops should preserve input stride/storage_offset (#91029)
It turns out that we *do* need to update *_scatter ops to return the exact same strides as their inputs. I added a test to `test/test_functionalization.py`, which now trips thanks to Ed's functionalization stride debugging check. It only actually ends up causing a silent correctness problem if you try to .backward() on that function.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91029
Approved by: https://github.com/ezyang
2022-12-22 19:41:53 +00:00
Joel Schlosser
3226209636 LSTM SymInt-aware changes & meta registration (cuDNN) (#90944)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90944
Approved by: https://github.com/ezyang
2022-12-16 21:42:32 +00:00
Joel Schlosser
b0cda0b38c LSTM SymInt-aware changes & meta registration (non-cuDNN CUDA) (#90701)
Adds meta registrations for cuDNN and vanilla CUDA ops underneath `lstm()` and makes the logic SymInt-aware.
TODO:
* cuDNN side does some [nasty stuff](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/cudnn/RNN.cpp#L1567) with buffers; this needs larger redesign to figure out
* Indicate that AOT Autograd can be used when an LSTM is present (remove the check for this once it's fully supported)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90701
Approved by: https://github.com/ezyang
2022-12-16 18:08:45 +00:00
Driss Guessous
51c6c5e156 [SDPA] Standardizes the return shape for dense tensor of SDPA regardless of fused kernel called (#90776)
# Summary
Continues to fix up the meta output story of SDPA to be more correct

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90776
Approved by: https://github.com/cpuhrsch
2022-12-14 18:08:02 +00:00
Driss Guessous
42a5f6ee5d Create stub function for doing SDPA cpp and cuda dispatch (#90576)
## Summary
torch.compile was previously not working for TransformerEncoder because torch.SDPA calls a native function on tensors that returns an int. This PR instead creates a dispatch stub for the called function so that a separate FX node is not created for this native function.
This PR also adds meta functions for the fused kernels.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90576
Approved by: https://github.com/cpuhrsch
2022-12-13 03:19:40 +00:00
Peter Bell
79406378ae [primTorch] Add prim and ref for as_strided_scatter (#88426)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88426
Approved by: https://github.com/mruberry
2022-12-08 00:17:39 +00:00
Sherlock Huang
42705bd7b3 Disallow registering meta function for CompositeImplicitAutograd ops (#90222)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90222
Approved by: https://github.com/ezyang
2022-12-06 04:22:31 +00:00
Yanbo Liang
e1532af0bb Fix meta registration for aten._cdist_forward (#90042)
Error from [7k github model](https://github.com/pytorch/torchdynamo/issues/1884).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90042
Approved by: https://github.com/ezyang, https://github.com/eellison
2022-12-02 21:13:52 +00:00
Edward Z. Yang
6fb6eb0a74 Support unspecialized integers with dynamic shapes (#89639)
Previously, we hackily wrapped unspecialized integers into
tensors and treated them as tensor inputs.  Sometimes, downstream
operations would not be able to deal with the tensor input.  Now,
we wrap them into SymInt, so more correct overload selection occurs.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89639
Approved by: https://github.com/anjali411
2022-11-24 22:46:42 +00:00
anjali411
9c0bf9387c Meta impl for linalg_cholesky and linalg_cholesky_ex (#89430)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89430
Approved by: https://github.com/ezyang
2022-11-22 17:05:34 +00:00
Edward Z. Yang
5582001bd5 Reland 2 "Towards unifying symbolic and non symbolic fake tensor (#89038) (#89143)" (#89346)
This reverts commit 8e4c9828f4.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89346
Approved by: https://github.com/wconstab
2022-11-19 21:14:31 +00:00
PyTorch MergeBot
8e4c9828f4 Revert "Reland "Towards unifying symbolic and non symbolic fake tensor (#89038)" (#89143)"
This reverts commit e686b8c3ba.

Reverted https://github.com/pytorch/pytorch/pull/89143 on behalf of https://github.com/ZainRizvi due to This seems to be causing the test_make_fx_symbolic_exhaustive_rad2deg_cpu_float32 and test_make_fx_symbolic_exhaustive_inplace_rad2deg_cpu_float32 test to fail across multiple jobs
2022-11-17 17:02:36 +00:00
Edward Z. Yang
e686b8c3ba Reland "Towards unifying symbolic and non symbolic fake tensor (#89038)" (#89143)
This reverts commit cf6003f046.

Differential Revision: [D41363992](https://our.internmc.facebook.com/intern/diff/D41363992)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89143
Approved by: https://github.com/albanD
2022-11-17 13:55:06 +00:00
PyTorch MergeBot
cf6003f046 Revert "Towards unifying symbolic and non symbolic fake tensor (#89038)"
This reverts commit 37d54239c7.

Reverted https://github.com/pytorch/pytorch/pull/89038 on behalf of https://github.com/ezyang due to executorch segfaults
2022-11-16 16:52:47 +00:00
Edward Z. Yang
37d54239c7 Towards unifying symbolic and non symbolic fake tensor (#89038)
Fake tensor behaves pretty differently depending on if you have
symbolic shapes or not.  This leads to bugs; for example, we
weren't getting correct convolution_backward strides because we
bypassed the correct stride logic in fake tensor on symbolic
shapes.

This PR attempts to unify the two codepaths.  I don't manage to
unify everything, but I get most of it.  The algorithm is delicate
and I'm still hosing down test failures.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89038
Approved by: https://github.com/anjali411
2022-11-16 14:02:43 +00:00
anjali411
dc40d3f93f Add meta impl for grid_sampler_2d_backward (#88745)
TODO: add an OpInfo

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88745
Approved by: https://github.com/ezyang
2022-11-16 13:01:47 +00:00
anjali411
52be0c42ab meta function for max_pool2d_with_indices_backward (#88743)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88743
Approved by: https://github.com/lezcano, https://github.com/ezyang
2022-11-13 18:31:56 +00:00
anjali411
d615d12289 Add meta impl for topk (#88694)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88694
Approved by: https://github.com/ezyang
2022-11-11 15:28:41 +00:00
anjali411
fc9e36dd42 Add meta support for scalar_tensor and argmax (#88590)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88590
Approved by: https://github.com/albanD
2022-11-11 01:31:00 +00:00
Sherlock Huang
133e61af7a OpOverload is_view (#88722)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88722
Approved by: https://github.com/ezyang
2022-11-09 19:03:12 +00:00
Edward Z. Yang
d81797e845 Meta function for aten.sort and aten.scatter* (#88705)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88705
Approved by: https://github.com/ezyang
2022-11-09 17:47:14 +00:00
Fabio Rocha
652af5ec15 upsample_*.vec ops are now CompositeImplicit (#85638)
It was previously CompositeExplicit but it was not really necessary.
See discussion in https://github.com/pytorch/pytorch/issues/85405

Pull Request resolved: https://github.com/pytorch/pytorch/pull/85638
Approved by: https://github.com/ezyang, https://github.com/lezcano, https://github.com/malfet, https://github.com/jansel
2022-11-09 09:58:04 +00:00
Edward Z. Yang
f0e6cea2ed Meta registrations for inplace operators (#88678)
Also, handle non-default alpha correctly.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88678
Approved by: https://github.com/SherlockNoMad, https://github.com/albanD
2022-11-09 01:27:01 +00:00
Edward Z. Yang
a880ddc164 Meta implementation for unsqueeze_ (#88675)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88675
Approved by: https://github.com/SherlockNoMad
2022-11-09 01:27:01 +00:00
Edward Z. Yang
1dab35ca1b Meta implementation for bernoulli (#88676)
For some reason bernoulli uses legacy memory format, see linked issue.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88676
Approved by: https://github.com/SherlockNoMad
2022-11-09 01:26:58 +00:00
Edward Z. Yang
6bb7f4f29f Minor error message improvements on meta functions (#88677)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88677
Approved by: https://github.com/SherlockNoMad
2022-11-08 19:16:29 +00:00
Edward Z. Yang
245144a636 Propagate layout and pin memory in randint to inner constructor (#88673)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88673
Approved by: https://github.com/anjali411
2022-11-08 18:22:30 +00:00
Sherlock Huang
95d57b54e0 Handle pin_memory in refs.randn (#88473)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88473
Approved by: https://github.com/mruberry
2022-11-07 20:25:56 +00:00
Elias Ellison
7d95b1e344 Run all fallback kernels with FakeTensor (#88248)
This improves the memory compression of resnet18 from .84 -> .94 on inductor no-cudagraphs. It does mean that any extern kernel which incorrectly computes strides will be a hard error at runtime, but that's an issue we are going to have to face with dynamic shapes anyway. CC @ezyang, @SherlockNoMad
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88248
Approved by: https://github.com/ezyang
2022-11-04 02:06:38 +00:00
Brian Hirsh
70782981f0 aot_dispatch test fix: always use functionalization in symbolic tests (#87647)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87647
Approved by: https://github.com/ezyang, https://github.com/Chillee
2022-11-02 14:36:49 +00:00
Tugsbayasgalan Manlaibaatar
2c7de4a144 Add meta implementation for aten.max.dim (#88005)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88005
Approved by: https://github.com/Chillee, https://github.com/bdhirsh
2022-11-01 18:37:24 +00:00
Sherlock Huang
c368c0faf0 Fix meta for aten.fill, constant_pad_nd, _adaptive_avg_pool2d (#88069)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88069
Approved by: https://github.com/ngimel, https://github.com/malfet
2022-11-01 15:36:06 +00:00
Sherlock Huang
c7ac333430 Fix args for meta__fused_moving_avg_obs_fq_helper (#88058)
Fixes https://github.com/pytorch/torchdynamo/issues/1802

There are a few problems:
1. torch.fused_moving_avg_obs_fake_quant doesn't have an OpInfo test
2. self.empty_like() is not a valid call; it should be torch.empty_like(self)
3. The python meta function has some unexplained behavior for arguments with a default value of bool type?

In particular, problem 3 is the most concerning one.
**UPDATE: This is expected behavior, see discussion below for explanation.**

Without setting the default values for `per_row_fake_quant` and `symmetric_quant`, it gets the following error when running with meta tensors.
```
meta__fused_moving_avg_obs_fq_helper() missing 2 required positional arguments: 'per_row_fake_quant' and 'symmetric_quant'
```
I can fix this by adding the default values to these two args. However, I observed something strange when examining the actual values in the meta function.

```
    print("per_row_fake_quant", per_row_fake_quant)
    print("symmetric_quant", symmetric_quant)
```

When the default values are False, the printed values correctly reflect the argument values populated from the call site.
When the default values are True, the printed value is ALWAYS True, regardless of the value populated from the call site.
When the default values are None, the printed value is `None` when the call site sets the value to 'False', and 'True' when the call site sets the value to 'True'.

I also verified that this bug affects other meta functions with default args....

My speculation is that this is something about pybind value packing when calling from the C++ dispatcher into the python meta function, and about default value parsing for python meta functions (and other python dispatch functions)?

I tried to find the C++ call stack, but gdb is missing symbols and the C++ stacktrace is not working properly... I would appreciate anyone who can point me to the source file for pybind value packing.

cc @ezyang
cc @bdhirsh. I know you had a fix in the symbolic shape branch...
cc @yanboliang  who reported this bug
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88058
Approved by: https://github.com/bdhirsh, https://github.com/yanboliang
2022-10-31 19:00:16 +00:00
Sherlock Huang
0a4ca9d083 Fix meta for aten.angle and aten.index_copy (#88066)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88066
Approved by: https://github.com/albanD
2022-10-31 17:11:29 +00:00
Sherlock Huang
e8a97a3721 FakeTensorMode and Prims.add/sub/mul/div support scalar only inputs (#87759)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87759
Approved by: https://github.com/ngimel, https://github.com/mruberry, https://github.com/eellison
2022-10-28 04:34:25 +00:00
Sherlock Huang
b21fe312c0 Fix meta for index_add and index_put (#87775)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87775
Approved by: https://github.com/ezyang, https://github.com/ngimel
2022-10-26 20:33:23 +00:00
Sherlock Huang
0b162f5b49 Fix stride for prims.where (#87563)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87563
Approved by: https://github.com/ngimel, https://github.com/mruberry
2022-10-25 21:22:50 +00:00
Sherlock Huang
ece3758afc Fix _refs for aten.zeros/ones/empty/randn (#87569)
The refs for aten.zeros/ones/empty/randn don't support the .names overload.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87569
Approved by: https://github.com/ngimel
2022-10-25 20:06:57 +00:00
Sherlock Huang
eb99c1efce Prefer python meta function over c++ meta function (#87426)
This is a policy update for meta registration. **We now prefer the python meta implementation over the C++ meta function.**  This is a flip of the previous policy, where we preferred the C++ meta function over the python meta function if both existed.

Here's the meta registration process:
1. register_meta and register_decomposition will place the python meta/decomp functions into the `global_decomp_table`.  However, they will NOT register them into the dispatcher.
2. After global_decomp_table is populated, we will compile an `active_meta_table`. For a given op, we pick the most specific decomp function from `global_decomp_table` in the preference order of Meta > PostAutograd > PreAutograd.
3. We will unconditionally register all of them into the python dispatcher. We also register them into the C++ dispatcher, unless the op falls into one of the following 3 cases
- 1. the op is CompositeImplicitAutograd, and should rely on the decomposed ops' metas
- 2. the op is a view op, as the MetaTensor doesn't support aliased storage
- 3. the op is in the blocklist (due to UT failures, and we will burn down this list op by op)

Over the long run, we wish to implement all meta functions in python. With this PR, 321 op_overloads will have their cpp meta overridden by a python meta. There are still 400 op_overloads using a cpp meta. The exact list can be found here https://gist.github.com/SherlockNoMad/d20bb736178df8eebd3b054c8bb7cdc5
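
For intuition only, a self-contained sketch of what a python meta function does. This uses the public `torch.library` route with a hypothetical `demo` namespace, not the internal `register_meta`/decomp-table machinery described above:

```python
import torch
from torch.library import Library

lib = Library("demo", "DEF")                     # hypothetical namespace for illustration
lib.define("double_rows(Tensor x) -> Tensor")

def double_rows_meta(x):
    # A meta function only computes output metadata (shape/dtype/device);
    # it never reads or writes real data.
    return x.new_empty((2 * x.shape[0], *x.shape[1:]))

lib.impl("double_rows", double_rows_meta, "Meta")

out = torch.ops.demo.double_rows(torch.empty(3, 4, device="meta"))
print(out.shape)  # torch.Size([6, 4])
```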

cc @ngimel @jansel @lezcano @fdrocha @mlazos @soumith @voznesenskym @yanboliang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87426
Approved by: https://github.com/ezyang, https://github.com/jansel
2022-10-25 16:49:02 +00:00
lezcano
faf9c47abb Simplify a few diagonal-related functions (#87180)
`diag` was unnecessarily implemented as a kernel rather than as a composite
function, which made it unnecessarily difficult (explicit backward + all it entails).

We also change a few uses of `diag` on 2D tensors for `diagonal()`. The
latter returns a view rather than creating a new tensor.
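
The view semantics being relied on, as a two-line check:

```python
import torch

x = torch.arange(9.0).reshape(3, 3)
d = torch.diagonal(x)     # a view into x, unlike torch.diag which allocates a new tensor
d.add_(100.0)
print(x)                  # the diagonal entries of x have been mutated too
```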

We also upgrade its meta implementation to a fully-fledged
decomposition

I tried implementing the backwards of `diagonal()` via `diag_scatter` (or better `diag_scatter_` to keep the perf) but functionalisation was failing and I was not sure how to fix this, so I moved on. It may be possible to simplify that one as well if @soulitzer or someone knows how to do this.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87180
Approved by: https://github.com/ngimel, https://github.com/albanD, https://github.com/mruberry
2022-10-24 06:11:53 +00:00
Sherlock Huang
f3f1b44778 Fix meta for meta_fill_ (#87493)
The existing meta_fill_ doesn't correctly reflect the aliasing relationship for aten.fill. A new MetaTensor should be returned instead.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87493
Approved by: https://github.com/eellison, https://github.com/bdhirsh
2022-10-22 12:41:03 +00:00
Edward Z. Yang
d73d4aa7de Audit for error prone isinstance int/float and add lint (#87345)
We recently fixed a bug on symbolic-shapes branch where
an isinstance(x, int) test failed when passed a SymIntNode.
To prevent this, I've added a lint for all the codepaths
where we may pass SymInt/SymFloat directly to reject
direct isinstance int/float tests, and instead use one of
the aliases.  The lint rule explains the options.  I then
go and fix all of them.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87345
Approved by: https://github.com/bdhirsh, https://github.com/albanD
2022-10-21 15:55:24 +00:00
Sherlock Huang
f7da9db9c1 Unify decomp registries into global_decomposition_table (#86857)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86857
Approved by: https://github.com/ezyang
2022-10-20 21:29:05 +00:00
albanD
254b681dc6 Convert torch.Size() argument to sym size in test_proxy_tensor (#87304)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87304
Approved by: https://github.com/ezyang
2022-10-20 14:20:19 +00:00
anjali411
6351220573 Add meta support for _adaptive_avg_pool2d_backward (#86359) (#87074)
This reverts commit 3edf79dc03.

Reland of https://github.com/pytorch/pytorch/pull/86359
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87074
Approved by: https://github.com/ezyang
2022-10-17 16:15:04 +00:00
albanD
b40f4434ac conv backward impl (#87047)
~~Waiting for test run to see if this backward is actually exercised.
If not, I will add test before merging.~~
Test updated. Ready to go now.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87047
Approved by: https://github.com/ezyang
2022-10-17 13:14:12 +00:00
albanD
86c2e44cb6 meta funcs for avg_pool2d and avg_pool2d_backward (#87043)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87043
Approved by: https://github.com/ezyang
2022-10-17 13:14:10 +00:00
albanD
3a4c0900c7 Reland 3 of Merge more symbolic meta kernels and symint changes from branch (#86795)
Take 3
Contains:
- symintification of split*
- floor support on SymFloat
- pad_backward, gather, scatter meta
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86795
Approved by: https://github.com/z-a-f
2022-10-17 02:09:40 +00:00
Brian Hirsh
34c86adec4 symintify all of derivatives.yaml (#86610)
Big-bang PR to symintify **all** .sizes() calls in derivatives.yaml, which will be needed for symbolic tracing.

* with the exception of `split()`, which is tougher to land because it requires internal changes.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/86610
Approved by: https://github.com/albanD
2022-10-14 20:15:48 +00:00
Tugsbayasgalan Manlaibaatar
cff333bdb5 Enable max.unary_out (#86855)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86855
Approved by: https://github.com/jerryzh168, https://github.com/bdhirsh
2022-10-13 17:14:53 +00:00
PyTorch MergeBot
f912b58544 Revert "Enable max.unary_out (#85926)"
This reverts commit 16a0fa1204.

Reverted https://github.com/pytorch/pytorch/pull/85926 on behalf of https://github.com/osalpekar due to The internal diff for this commit shows a number of pytorch quantization test failures. Here is a sample output: AssertionError: Tensor-likes are not close! Mismatched elements: 319 / 320 (99.7%). Greatest absolute difference: 0.056652069091796875 at index (0, 0, 4, 5) (up to 1e-05 allowed). Link to the diff: [D40232598](https://www.internalfb.com/diff/D40232598). Link to the Sandcastle job that is failing: https://www.internalfb.com/intern/sandcastle/job/18014399302908587/
2022-10-11 23:53:12 +00:00
PyTorch MergeBot
2aa981ab74 Revert "Reland 2 of Merge more symbolic meta kernels and symint changes from branch (#86334) (#86488)"
This reverts commit 978b46d7c9.

Reverted https://github.com/pytorch/pytorch/pull/86488 on behalf of https://github.com/osalpekar due to Broke executorch builds internally with the following message: RuntimeError: Missing out variant for functional op: aten::split.Tensor(Tensor(a -> *) self, SymInt split_size, int dim=0) -> Tensor(a)[] . Make sure you have loaded your custom_ops_generated_lib
2022-10-11 23:39:50 +00:00
PyTorch MergeBot
3edf79dc03 Revert "Add meta support for _adaptive_avg_pool2d_backward (#86359)"
This reverts commit a56a8c0fc0.

Reverted https://github.com/pytorch/pytorch/pull/86359 on behalf of https://github.com/clee2000 due to causing unexpected success for functorch on master but PR is green (landrace?) https://github.com/pytorch/pytorch/actions/runs/3227306657/jobs/5282180524 a56a8c0fc0
2022-10-11 16:33:41 +00:00
anjali411
a56a8c0fc0 Add meta support for _adaptive_avg_pool2d_backward (#86359)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86359
Approved by: https://github.com/ezyang, https://github.com/albanD
2022-10-11 13:37:25 +00:00
Jianyu Huang
17074389de index op with int32 support (#86318)
Differential Revision: D40089960

Pull Request resolved: https://github.com/pytorch/pytorch/pull/86318
Approved by: https://github.com/malfet
2022-10-11 06:12:17 +00:00
anjali411
9eb771583c symintify rand and randint functions and meta suport for randint (#86358)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86358
Approved by: https://github.com/ezyang, https://github.com/albanD
2022-10-10 17:07:11 +00:00
Tugsbayasgalan Manlaibaatar
16a0fa1204 Enable max.unary_out (#85926)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85926
Approved by: https://github.com/bdhirsh
2022-10-10 16:53:33 +00:00
albanD
978b46d7c9 Reland 2 of Merge more symbolic meta kernels and symint changes from branch (#86334) (#86488)
symintify split_with_sizes, dropout, fused_fake_obs_quant. meta for padding_2d ops

add meta_bernoulli_

meta kernel for at::gather

get pytorch_struct to pass: meta for scatter_add, fix backward

symintify split ops
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86488
Approved by: https://github.com/ezyang
2022-10-10 15:54:28 +00:00
albanD
55663b7f81 Reland 3 of Symintify getitem and add the required helper functions (#86207) (#86487)
Note that this might not cover every use of the function (we know it doesn't),
but this is enough to get a few models passing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86487
Approved by: https://github.com/ezyang
2022-10-10 15:54:28 +00:00
PyTorch MergeBot
5b69b87d5a Revert "Symintify getitem and add the required helper functions (#86207)"
This reverts commit fd5085c445.

Reverted https://github.com/pytorch/pytorch/pull/86207 on behalf of https://github.com/seemethere due to  Fails internal tests, see: https://www.internalfb.com/intern/sandcastle/job/22517998926071860/insights
2022-10-07 16:10:30 +00:00
PyTorch MergeBot
75df4b5e3d Revert "Merge more symbolic meta kernels and symint changes from branch (#86334)"
This reverts commit 08e3999fa4.

Reverted https://github.com/pytorch/pytorch/pull/86334 on behalf of https://github.com/seemethere due to Trying to revert https://github.com/pytorch/pytorch/pull/86207, this PR causes merge conflicts with the initial revert so will have to revert this as well
2022-10-07 16:03:30 +00:00
Brian Hirsh
08e3999fa4 Merge more symbolic meta kernels and symint changes from branch (#86334)
symintify split_with_sizes, dropout, fused_fake_obs_quant. meta for padding_2d ops

add meta_bernoulli_

meta kernel for at::gather

get pytorch_struct to pass: meta for scatter_add, fix backward

symintify split ops
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86334
Approved by: https://github.com/ezyang
2022-10-06 23:29:04 +00:00
albanD
fd5085c445 Symintify getitem and add the required helper functions (#86207)
Note that this might not cover every use of the function (we know it doesn't),
but this is enough to get a few models passing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86207
Approved by: https://github.com/ezyang, https://github.com/Chillee, https://github.com/bdhirsh
2022-10-06 04:46:19 +00:00
PyTorch MergeBot
168ba066e3 Revert "Symintify getitem and add the required helper functions (#86207)"
This reverts commit 17addb307e.

Reverted https://github.com/pytorch/pytorch/pull/86207 on behalf of https://github.com/malfet due to Broke lint, by double-registering `meta_index_put`, but no CI was run during the outage
2022-10-05 22:42:56 +00:00
albanD
17addb307e Symintify getitem and add the required helper functions (#86207)
Note that this might not cover every use of the function (we know it doesn't),
but this is enough to get a few models passing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86207
Approved by: https://github.com/ezyang
2022-10-05 21:19:00 +00:00
Edward Z. Yang
d07b85393a SymInt fixes from symbolic-shapes branch (#86242)
symintify a few inplace meta functions

symintify resize_(), nbytes(), functionalization input mutations

meta funcs for avg_pool2d_backward
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86242
Approved by: https://github.com/Chillee
2022-10-05 04:52:02 +00:00
Horace He
82d9592f1b Batch of symintifications to allow more models to pass in inference (#86104)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86104
Approved by: https://github.com/ezyang
2022-10-04 04:01:58 +00:00
Will Constable
833edeb020 Register py metas to py dispatcher so they are used by functionalization (#86057)
- this ensures python metas are always used during symbolic tracing/functionalization without overshadowing c++ metas during eager runtime

Pull Request resolved: https://github.com/pytorch/pytorch/pull/86057
Approved by: https://github.com/ezyang
2022-10-02 00:00:46 +00:00
Horace He
39130ccf73 Registered _like metas (#85793)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85793
Approved by: https://github.com/ezyang
2022-09-28 17:23:07 +00:00
PyTorch MergeBot
b44a4a8b51 Revert "Registered _like metas (#85793)"
This reverts commit a4e75ccf85.

Reverted https://github.com/pytorch/pytorch/pull/85793 on behalf of https://github.com/huydhn due to Sorry, reverting as this breaks an aot_autograd mac test on functorch. https://github.com/pytorch/pytorch/pull/85794 was reverted before but it was at the top of the stack so the revert still fail 823dc33b00
2022-09-28 17:18:29 +00:00
Horace He
a4e75ccf85 Registered _like metas (#85793)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85793
Approved by: https://github.com/ezyang
2022-09-28 14:07:57 +00:00
Horace He
2f4a517d67 Ported matmul compositeimplicitautograd impl into core (#85239)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85239
Approved by: https://github.com/ezyang, https://github.com/lezcano
2022-09-21 09:25:24 +00:00
Elias Ellison
a3afb2c2f6 Fake: fix conv_transpose2d striding (#82846)
The output striding channels-last preservation logic differs between cuda and cpu. For the meta kernel, we can peek at the fake tensor device and use that to determine whether to do cpu or cuda.

You could argue there's a leaking of abstraction here but this seems like a pretty minimal leak and I'm not sure there's a much cleaner way forward for device-specific striding tracing logic.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82846
Approved by: https://github.com/ezyang
2022-09-20 18:00:59 +00:00
Sherlock Huang
29eba319b4 Use alias for nop decomp (#84727)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84727
Approved by: https://github.com/Chillee
2022-09-16 18:50:56 +00:00
Horace He
5ea2eb304e Converted batch norm over to use symints (#84113)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84113
Approved by: https://github.com/wconstab, https://github.com/ezyang
2022-09-12 05:36:24 +00:00
Natalia Gimelshein
72f0f24a76 remove unneeded _to_copy meta (#84460)
Fixes #84335

Pull Request resolved: https://github.com/pytorch/pytorch/pull/84460
Approved by: https://github.com/Chillee
2022-09-02 18:08:39 +00:00
Edward Z. Yang
d39490a711 Add meta function for repeat (#84349)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84349
Approved by: https://github.com/Krovatkin
2022-09-01 20:44:52 +00:00
Edward Z. Yang
cd96f3f676 Use register_meta for everything in meta_registrations (#84297)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84297
Approved by: https://github.com/Chillee
2022-08-31 23:58:24 +00:00
PyTorch MergeBot
65f98eb47d Revert "Add meta function for repeat (#84349)"
This reverts commit 44bc6db8f8.

Reverted https://github.com/pytorch/pytorch/pull/84349 on behalf of https://github.com/janeyx99 due to Land race with the revert causing test_fx failures 44bc6db8f8
2022-08-31 18:27:59 +00:00
Edward Z. Yang
44bc6db8f8 Add meta function for repeat (#84349)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84349
Approved by: https://github.com/Krovatkin
2022-08-31 17:20:21 +00:00
PyTorch MergeBot
14093b5979 Revert "Use register_meta for everything in meta_registrations (#84297)"
This reverts commit 8cd296f680.

Reverted https://github.com/pytorch/pytorch/pull/84297 on behalf of https://github.com/suo due to broke test_proxy_tensor on master
2022-08-31 16:32:24 +00:00
Edward Z. Yang
8cd296f680 Use register_meta for everything in meta_registrations (#84297)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84297
Approved by: https://github.com/Chillee
2022-08-31 02:13:21 +00:00
Mario Lezcano
f5a3515083 Make linalg.inv composite of linalg.solve (#80074)
The `getri` kernel calls inside `getrs` so we can do so explicitly
ourselves and save ourselves from having to maintain an extra kernel.
This way we just need to optimise `lu_factor` and `lu_solve` and `inv`
will be as efficient as it can be, as it'll be choosing the best backend
to perform the factorisation and the best backend (not necessarily the
same) to perform the solve.
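
The identity being exploited, as a minimal numerical check:

```python
import torch

A = torch.randn(4, 4, dtype=torch.float64)
A = A @ A.T + 4 * torch.eye(4, dtype=torch.float64)     # keep it well-conditioned

# inv(A) == solve(A, I): one LU factorization plus a solve, no separate getri kernel.
inv_via_solve = torch.linalg.solve(A, torch.eye(4, dtype=torch.float64))
torch.testing.assert_close(inv_via_solve, torch.linalg.inv(A))
```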

Fixes https://github.com/pytorch/pytorch/issues/77498

The benchmarks: https://github.com/pytorch/pytorch/pull/80074#issuecomment-1164309071
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80074
Approved by: https://github.com/IvanYashchuk, https://github.com/albanD, https://github.com/malfet
2022-08-25 09:28:55 +00:00
PyTorch MergeBot
5321bf52f2 Revert "Make linalg.inv composite of linalg.solve (#80074)"
This reverts commit 4737b33614.

Reverted https://github.com/pytorch/pytorch/pull/80074 on behalf of https://github.com/malfet due to Depends on the changes from https://github.com/pytorch/pytorch/pull/83628
2022-08-25 00:43:00 +00:00
Mario Lezcano
4737b33614 Make linalg.inv composite of linalg.solve (#80074)
The `getri` kernel calls into `getrs`, so we can do so explicitly
ourselves and save ourselves from having to maintain an extra kernel.
This way we just need to optimise `lu_factor` and `lu_solve` and `inv`
will be as efficient as it can be, as it'll be choosing the best backend
to perform the factorisation and the best backend (not necessarily the
same) to perform the solve.

Fixes https://github.com/pytorch/pytorch/issues/77498

The benchmarks: https://github.com/pytorch/pytorch/pull/80074#issuecomment-1164309071
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80074
Approved by: https://github.com/IvanYashchuk, https://github.com/albanD, https://github.com/malfet
2022-08-24 15:18:56 +00:00
Horace He
7ebdb4c72f Refactored ops on size to be dispatcher ops (#83719)
An example of how the graph looks now.
```
def forward(self, x_1):
    size = torch.ops.math.size(x_1, 0)
    size_1 = torch.ops.math.size(x_1, 1);  x_1 = None
    ones = torch.ops.aten.ones.default([1], device = device(type='cpu'), pin_memory = False)
    expand_sym_int = torch.ops.aten.expand.SymInt(ones, [size, size_1]);  ones = size = size_1 = None
    cos_default = torch.ops.aten.cos.default(expand_sym_int);  expand_sym_int = None
    return (cos_default,)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/83719
Approved by: https://github.com/ezyang
2022-08-23 15:48:00 +00:00
Elias Ellison
8a6b076196 lift numpy tensor, add randperm support (#83191)
A couple of changes needed to trace huggingface with fake tensors.

Similar to https://github.com/pytorch/pytorch/pull/81927, we need to call lift_fresh for tensors created from numpy arrays. This also adds randperm support for meta.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/83191
Approved by: https://github.com/bdhirsh
2022-08-10 22:27:51 +00:00
Elias Ellison
1c0f7bd6d2 Enable complex for meta tensors (#79975)
There weren't really any fundamental blockers
- add support for `aten::complex`
- update `angle` for complex
- remove the error in the fallback kernel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79975
Approved by: https://github.com/ezyang
2022-07-27 22:19:14 +00:00
Horace He
fc389cc0a0 Added new_empty.symint overload and a new_empty ref (#82049)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82049
Approved by: https://github.com/ezyang
2022-07-27 00:31:57 +00:00
Robert
47b4ec8aa7 Conv2d shape calculation for meta tensors (#79834)
Fixes #79512

This PR adds support for convolutional meta modules and computes the output shape correctly for a meta input tensor.
Currently in progress; no tests written so far.
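
A minimal sketch of the behaviour this enables: build the module and input on the meta device and read the propagated output shape, which follows the usual formula `floor((H + 2*padding - dilation*(kernel - 1) - 1) / stride) + 1` per spatial dim:

```python
import torch

m = torch.nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1, device="meta")
x = torch.empty(8, 3, 224, 224, device="meta")

out = m(x)
print(out.shape)   # torch.Size([8, 16, 112, 112]) -- no real compute happens
```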

**Feature implementations**:
- [x] `Conv1d`
- [x] `Conv2d`
- [x] `Conv3d`

**Tests**:
- [x] `Conv1d`
- [x] `Conv2d`
- [x] `Conv3d`

cc @albanD @anjali411
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79834
Approved by: https://github.com/ezyang, https://github.com/albanD
2022-07-23 05:58:56 +00:00
Huy Do
12cb26509a Apply ufmt to torch internal (#81643)
This is a big-bang PR; merge conflicts are probably expected and will be addressed at merge.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81643
Approved by: https://github.com/ezyang
2022-07-22 02:19:50 +00:00
Horace He
2529ff4bd9 Registered python meta functions to a table (#81092)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81092
Approved by: https://github.com/ezyang, https://github.com/anjali411
2022-07-21 21:45:43 +00:00
Horace He
a5fb41e3d3 Revert "Revert "Refactored prim utils into _prims_utils folder (#81746)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81746
Approved by: https://github.com/anijain2305, https://github.com/Krovatkin
2022-07-20 23:43:57 +00:00
PyTorch MergeBot
e43a02c314 Revert "Refactored prim utils into _prims_utils folder (#81088)"
This reverts commit 80231d0a72.

Reverted https://github.com/pytorch/pytorch/pull/81088 on behalf of https://github.com/jeanschmidt due to breaking internal tests
2022-07-19 19:56:41 +00:00
Horace He
80231d0a72 Refactored prim utils into _prims_utils folder (#81088)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81088
Approved by: https://github.com/ngimel
2022-07-19 03:55:51 +00:00
lezcano
eb0889cf7d Add support for multiple inputs to out_wrapper and strict dtype checking (#80601)
Reland of https://github.com/pytorch/pytorch/pull/79941
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80601
Approved by: https://github.com/albanD
2022-07-05 12:31:21 +00:00
PyTorch MergeBot
184a065ba7 Revert "Add support for multiple inputs to out_wrapper and strict dtype checking (#79941)"
This reverts commit dc7066a8f0.

Reverted https://github.com/pytorch/pytorch/pull/79941 on behalf of https://github.com/suo due to broke master dc7066a8f0
2022-06-30 03:29:30 +00:00
lezcano
dc7066a8f0 Add support for multiple inputs to out_wrapper and strict dtype checking (#79941)
When a function returns multiple outputs in PyTorch, the `out`
parameter takes a tuple of tensors (see `linalg.svd` for example).
The current implementation in `out_wrapper_multi` modelled this wrong,
as it assumed that it would take a number of different named
parameters.

This PR implements the correct behaviour in `out_wrapper`. As a small
side-effect, we now need to call `@out_wrapper()` when the output is
just one tensor.

This PR also implements an additional optional parameter that checks
whether the dtype of the given `out` is exactly the dtype that the meta
function requires. This is the behaviour that we currently have in
PyTorch, and this check is necessary in eager when we call with these
tensors into external libraries.

We also make the functions with several outputs return a namedtuple,
similar to what we do in PyTorch.
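
The calling convention being modelled, shown with an existing multi-output op (`torch.linalg.svd` returns a namedtuple, and its `out=` takes a single tuple of tensors):

```python
import torch

A = torch.randn(5, 3)

res = torch.linalg.svd(A, full_matrices=False)
print(res.U.shape, res.S.shape, res.Vh.shape)      # namedtuple access

U, S, Vh = torch.empty(5, 3), torch.empty(3), torch.empty(3, 3)
torch.linalg.svd(A, full_matrices=False, out=(U, S, Vh))   # out= is one tuple, not named args
```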
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79941
Approved by: https://github.com/mruberry, https://github.com/ezyang
2022-06-30 02:47:16 +00:00
Edward Z. Yang
4331bc436e Ensure torch._refs registrations also get triggered on import torch (#80270)
Fixes https://github.com/pytorch/pytorch/issues/79938

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80270
Approved by: https://github.com/ngimel
2022-06-26 02:23:03 +00:00
lezcano
ff5a588e6e Port cholesky to structured kernels (#79300)
Yeah.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79300
Approved by: https://github.com/IvanYashchuk, https://github.com/albanD
2022-06-24 02:37:45 +00:00
PyTorch MergeBot
79507d2a9d error when registering meta kernels to composite ops in core
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79741

Approved by: https://github.com/Chillee, https://github.com/albanD
2022-06-21 02:17:13 +00:00
Elias Ellison
9705fb03b3 Add support for a couple ops
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79581

Approved by: https://github.com/Chillee
2022-06-20 22:25:39 +00:00
lezcano
648a6658ec Remove python implementation for eigh meta
Following https://github.com/pytorch/pytorch/pull/79072#discussion_r898210048

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79786

Approved by: https://github.com/ngimel, https://github.com/bdhirsh
2022-06-17 18:52:28 +00:00
kshitij12345
31ada133cb [meta] nansum, nanmedian (and few minor clean-ups) (#79411)
meta support for `nansum` and `nanmedian`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79411
Approved by: https://github.com/anjali411
2022-06-14 16:21:13 +00:00
kshitij12345
a732bbea23 [meta] Add meta support for fft ops (#79311)
As per title
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79311
Approved by: https://github.com/ezyang
2022-06-13 01:56:42 +00:00
kshitij12345
bd1a35dfc8 [meta] diag ops, trace (#79341)
meta registration for `diag.out` and update test skips/expectedFailures
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79341
Approved by: https://github.com/ezyang
2022-06-12 18:45:03 +00:00
kshitij12345
7b307e5fca [meta] angle, angle.out (#79278)
meta registration for `angle, angle.out`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79278
Approved by: https://github.com/anjali411
2022-06-10 20:06:31 +00:00
Brian Hirsh
7b3a0ff87a Port index.Tensor to structured kernels.
Tracking issue: #55070

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69607

Approved by: https://github.com/bdhirsh
2022-06-10 17:27:47 +00:00
lezcano
af6321f3d8 Port linalg_qr to structured
This PR simplifies the logic of `linalg.qr` using structured kernels. I
also took this chance and merged a few `copy_` operations with other
ops.

This PR removes the previous MAGMA implementation, as it is never faster
than that of cuSOLVER and it's rather buggy. This has the side-effect
that `qr` is now not supported on ROCm. Ivan confirmed that this is
fine, given how incredibly slow QR was on ROCm anyway (we were marking
some tests as slow because of this...).

This PR also corrects the dispatch in geqrf. Before, if we called it
with a matrix for which `input.size(-2) <= 256 && batchCount(input) >= std::max<int64_t>(2, input.size(-2) / 16)` is false, and we have cublas but not cusolver, we would end up calling magma rather than cublas. This is not what the heuristic suggested.
Probably we should benchmark these heuristics again, but that's beyond the scope of this PR.

Note: it looks like `torch.geqrf` may be broken in MAGMA as per the
previous comment in `linalg_qr_helper_magma`. IvanYashchuk wdyt?

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79054

Approved by: https://github.com/IvanYashchuk, https://github.com/ezyang
2022-06-09 14:41:30 +00:00
Edward Z. Yang
225bf132ab Black torch._meta_registrations
Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79026

Approved by: https://github.com/Chillee
2022-06-09 03:03:09 +00:00
Edward Z. Yang
50f2af84da Add embedding_bag meta functions
Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78997

Approved by: https://github.com/Chillee, https://github.com/Lezcano
2022-06-08 22:03:27 +00:00
Edward Z. Yang
41bd5b85fd cdist meta function
Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78993

Approved by: https://github.com/Lezcano, https://github.com/Chillee
2022-06-08 01:57:00 +00:00
Edward Z. Yang
d09e3674d8 addbmm meta function
Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78992

Approved by: https://github.com/Lezcano, https://github.com/Chillee
2022-06-07 23:24:57 +00:00
lezcano
c7d6cec078 Add linalg.lu_solve
This PR adds `linalg.lu_solve`. While doing so, I found a bug in MAGMA
when calling the batched MAGMA backend with trans=True. We work around
that by solving the system via two triangular solves instead.

We also update the heuristics for this function, as they were fairly
outdated. We found that cuSOLVER is king, so luckily we do not need to
rely on the buggy backend from MAGMA for this function.
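
Minimal usage sketch of the new function (factor once, then reuse the factorization for a solve):

```python
import torch

A = torch.randn(4, 4, dtype=torch.float64) + 4 * torch.eye(4, dtype=torch.float64)
B = torch.randn(4, 2, dtype=torch.float64)

LU, pivots = torch.linalg.lu_factor(A)
X = torch.linalg.lu_solve(LU, pivots, B)
torch.testing.assert_close(A @ X, B)
```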

We added tests testing this function left and right. We also added tests
for the different backends. We also activated the tests for AMD, as
those should work as well.

Fixes https://github.com/pytorch/pytorch/issues/61657

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77634

Approved by: https://github.com/malfet
2022-06-07 22:28:28 +00:00
Edward Z. Yang
157d478a30 Fix omission of shape in size check in index.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78897

Approved by: https://github.com/Lezcano, https://github.com/anjali411
2022-06-05 23:10:55 +00:00
Edward Z. Yang
99882fc492 Make check() strongly typed, fix erroneous call sites
Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78896

Approved by: https://github.com/Lezcano, https://github.com/anjali411
2022-06-05 23:10:55 +00:00
Edward Z. Yang
83d40a4dba linalg_cholesky_ex meta function
Taken from https://github.com/albanD/subclass_zoo/blob/main/python_meta_tensor.py

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78604

Approved by: https://github.com/bdhirsh, https://github.com/ngimel, https://github.com/Lezcano
2022-06-03 23:11:02 +00:00
Edward Z. Yang
6120a8e05d Implement meta function for aten::index.Tensor
Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78527

Approved by: https://github.com/bdhirsh, https://github.com/ngimel, https://github.com/Lezcano
2022-06-03 23:11:02 +00:00
Edward Z. Yang
1bd21dd152 _linalg_qr_helper meta function
Taken from https://github.com/albanD/subclass_zoo/blob/main/python_meta_tensor.py

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78603

Approved by: https://github.com/Lezcano, https://github.com/mruberry
2022-06-03 20:27:05 +00:00
Edward Z. Yang
9446f9678a repeat_interleaves meta function
Taken from https://github.com/albanD/subclass_zoo/blob/main/python_meta_tensor.py

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78602

Approved by: https://github.com/mruberry
2022-06-02 21:24:46 +00:00
Edward Z. Yang
a1765f0176 addr ref
Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78014

Approved by: https://github.com/ngimel
2022-05-25 01:40:11 +00:00
Edward Z. Yang
9e0e949484 Fix bugs, increase meta coverage
Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77499

Approved by: https://github.com/mruberry
2022-05-19 21:04:57 +00:00
Edward Z. Yang
baeffdbc6c reflection_pad2d support
Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77681

Approved by: https://github.com/mruberry
2022-05-19 14:43:35 +00:00
Edward Z. Yang
d5ed73badd Make it possible to register decompositions to Meta key
Decompositions can be used to fill in meta support where necessary,
assuming the operations they decompose into support the meta key.
This PR adds a register_meta kwarg to register_decomposition that
optionally lets you register the meta into the C++ dispatch table
for meta tensors.  I use this to then get the meta functions for
where and huber_loss for free.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77353

Approved by: https://github.com/mruberry
2022-05-12 23:20:16 +00:00
anjali411
767af8e335 Add meta tensor support for some operations using python registration
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76916

Approved by: https://github.com/ezyang
2022-05-10 17:55:06 +00:00