Commit Graph

22654 Commits

Author SHA1 Message Date
Jez Ng
ddb0c26511 [inductor] Re-enable more fixed tests (#110798)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110798
Approved by: https://github.com/Skylion007
2023-10-09 04:36:51 +00:00
Kazuaki Ishizaki
a603dcc307 Fix typo under test directory (#110826)
This PR fixes the typo `the the` in comments in files under the `test` directory.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110826
Approved by: https://github.com/Skylion007
2023-10-08 20:52:38 +00:00
Sam Larsen
8a8668e1ae [inductor] Implement Fx graph caching to improve warm compilation time. (#103453)
Summary: Implement an on-disk cache to save and reuse compiled FX Graphs. This implementation does not handle tensors with symbolic shapes. This needs to be done in a follow-up PR.
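
As a minimal sketch of how the cache would be exercised (the `fx_graph_cache` config name is an assumption here; the PR only states that an on-disk cache for compiled FX graphs is implemented):

```python
import torch
import torch._inductor.config as inductor_config

inductor_config.fx_graph_cache = True  # assumed opt-in flag for the on-disk cache

@torch.compile
def f(x):
    return torch.sin(x) + torch.cos(x)

f(torch.randn(8))  # cold run: compiles the graph and writes it to the cache
f(torch.randn(8))  # a warm run in a fresh process would reuse the cached artifact
```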

Test Plan:
* New unit tests exercising saving and loading from the cache.
* New unit tests to exercise the cache key calculations.
* Ran several benchmarks to see cache hits and the resulting compilation times.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103453
Approved by: https://github.com/eellison
2023-10-08 20:32:15 +00:00
Jon Chuang
844ea6408b feat(dynamo): handle accumulate kwargs ("func", "initial") (#110686)
Follow up to: https://github.com/pytorch/pytorch/pull/110683

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110686
Approved by: https://github.com/ezyang
2023-10-08 07:06:52 +00:00
cdzhan
fa8e4ea212 Add support for hasattr on ListVariable (#110438)
Fixes #109502

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110438
Approved by: https://github.com/jansel
2023-10-08 05:34:00 +00:00
Animesh Jain
58637c4b43 [dynamo] Remove SuperSource (#110475)
The motivation for removing this is already present in the pre-PR comments. Copying it here:

~~~
# NB - SuperSource is a weird one.
# it is our only source with 2 bases, so we use the object
# as the base, rather than the type, since an invocation
# like super(Foo, foo) is represented here, the source object base is more spiritually
# aligned with the instance, rather than the type.
# This whole construction is questionable tho, and we should probably find a way to
# avoid this exception to our otherwise nice source parentage invariant.
~~~

Instead of using super(a, b), we can use `type(b).__mro__[index]`.
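
For illustration, a small pure-Python sketch of the equivalence this relies on (class names are made up):

```python
class Base:
    def greet(self):
        return "base"

class Foo(Base):
    def greet(self):
        return "foo"

foo = Foo()

# The form Dynamo previously modeled with SuperSource:
via_super = super(Foo, foo).greet()

# The equivalent lookup through the MRO of the instance's type:
mro = type(foo).__mro__          # (Foo, Base, object)
index = mro.index(Foo) + 1       # the class after Foo in the MRO
via_mro = mro[index].greet(foo)  # unbound lookup; pass the instance explicitly

assert via_super == via_mro == "base"
```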

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110475
Approved by: https://github.com/jansel
2023-10-08 04:45:06 +00:00
albanD
1824ea3c0f Add a test to make sure all modules in the codebase are importable (#110598)
As per title, running import on any of these files led to a crash.
I'm very curious how the code in them is used!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110598
Approved by: https://github.com/janeyx99, https://github.com/malfet
2023-10-08 03:52:30 +00:00
Oguz Ulgen
defa0d3a2d Add a side table for triton kernels to avoid using itertools.partial (#110633)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110633
Approved by: https://github.com/jansel
2023-10-08 02:01:59 +00:00
albanD
57cc886639 Fix public binding check to check all submodules (#110601)
Fix https://github.com/pytorch/pytorch/issues/86619

The test to make sure modules are importable is being added at https://github.com/pytorch/pytorch/pull/110598
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110601
Approved by: https://github.com/zou3519
2023-10-08 00:36:31 +00:00
Eric Grinstein
0a5bb1c2eb Feature/stft no window warn (#110695)
Fixes #88919

@mruberry @peterbell10

This PR adds a warning to the .cpp STFT and ISTFT functions if a window is not provided.
It also describes the warning in the documentation on `functional.py`.
Finally, it adds unit tests to check if the warning is being produced.

I have audited for internal calls of `stft` and `istft` on Pytorch and haven't found any.
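
A minimal sketch of the new behavior (the exact warning text is an assumption; `return_complex=True` is required for real inputs in recent PyTorch):

```python
import warnings
import torch

x = torch.randn(1, 2048)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # No window= argument, so a UserWarning about the missing window is expected.
    spec = torch.stft(x, n_fft=512, return_complex=True)

print([str(w.message) for w in caught])
```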

Thank you for the opportunity to contribute!

Eric
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110695
Approved by: https://github.com/ezyang
2023-10-07 20:24:36 +00:00
angelayi
096b14eae8 Fix numel test to be > 2 (#110731)
This makes it consistent with the comment.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110731
Approved by: https://github.com/angelayi
2023-10-07 19:18:59 +00:00
fduwjj
2dc5e166a5 [TP][Inference] Enable DTensor TP inference (#110751)
In https://github.com/pytorch/pytorch/pull/109977, we observed that during inference mode, aten.Linear does not get decomposed. So instead of enabling sharding propagation for the linear op, we use func.decompose so that it gets decomposed into matmul and mm.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110751
Approved by: https://github.com/bdhirsh, https://github.com/wanchaol
2023-10-07 18:57:27 +00:00
Adnan Akhundov
98b79e9488 [inductor] Add AOTI ABI shim function for torch.nonzero (#110766)
Summary: `torch.nonzero` doesn't have an inductor lowering (yet). To invoke the operator in AOT Inductor's ABI compatibility mode, we need a dedicated shim function.

Test Plan:

```
$ python test/inductor/test_aot_inductor.py -k test_zero_grid_with_unbacked_symbols
...
----------------------------------------------------------------------
Ran 4 tests in 78.650s

OK
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110766
Approved by: https://github.com/chenyang78
ghstack dependencies: #110713, #110745, #110764
2023-10-07 08:32:27 +00:00
Adnan Akhundov
abb00f66d8 [inductor] Add AOTI ABI shim function for repeat_interleave.Tensor (#110745)
Summary: `repeat_interleave.Tensor` doesn't have an inductor lowering. To invoke the operator in AOT Inductor's ABI compatibility mode, we need a dedicated shim function.

Test Plan:

```
$ python test/inductor/test_aot_inductor.py -k test_repeat_interleave
...
----------------------------------------------------------------------
Ran 4 tests in 70.526s

OK
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110745
Approved by: https://github.com/chenyang78
ghstack dependencies: #110713
2023-10-07 08:18:01 +00:00
Yang Chen
432df71820 [inductor] added a config to always add tensor constants (#110491)
Summary:
In some scenarios, we want to update constants at runtime.
In such cases, we have to keep the original constants in
the generated code without applying any constant-inlining
optimizations.

This PR adds a config to force us to add tensor constants.

Differential Revision: D49895154

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110491
Approved by: https://github.com/mikekgfb
2023-10-07 07:51:54 +00:00
Chien-Chin Huang
d54e20f457 [FSDP][state_dict] Add a unittest for local_state_dict resharding (#110625)
This PR adds a unittest to demonstrate the ability for LOCAL_STATE_DICT to do resharding.

Differential Revision: [D44260141](https://our.internmc.facebook.com/intern/diff/D44260141/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110625
Approved by: https://github.com/wz337
2023-10-07 07:22:41 +00:00
Stephen Jia
c2e7a0d689 [core IR] Add decomps for aten.sum and aten.squeeze variants (#110645)
Summary:
## Context

Both `aten.sum` and `aten.squeeze` have a "most generic" variant in the form of `aten.sum.dim_IntList` and `aten.squeeze.dims` respectively. Add decompositions for the other, non-generic variants of these operators to express them using the most generic variant.

Note that to register these decomps, the reference implementation under `_refs` had to be removed as registered decompositions. cc: @lezcano @peterbell10

Test Plan: Github CI + Meta Internal CI

Differential Revision: D49965952

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110645
Approved by: https://github.com/peterbell10, https://github.com/digantdesai, https://github.com/manuelcandales
2023-10-07 04:21:51 +00:00
Oguz Ulgen
e8ef8bfdce [Inductor] Allow matmul to have flexible layout when we are not autotuning (#110726)
Fixes #102804

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110726
Approved by: https://github.com/Chillee
2023-10-07 04:08:37 +00:00
jjsjann123
bf0866fc16 deprecating nvfuser c++ API (#110318)
deprecating nvfuser c++ API

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110318
Approved by: https://github.com/davidberard98
2023-10-07 02:25:21 +00:00
Chien-Chin Huang
90bf6e3938 [FSDP][optim_state_dict] Enable cpu_offload config for optimizer state_dict (#108434)
We had the option but never used cpu_offload, as optimizer state_dict offloads the tensors to CPU by default. This is usually what most users want, as the tensors are required to be moved to CPU eventually. However, we may want to disable offloading to CPU in some cases, especially for debugging purposes. This PR lets optimizer state_dict read the flag.

Differential Revision: [D48913340](https://our.internmc.facebook.com/intern/diff/D48913340/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108434
Approved by: https://github.com/wz337
2023-10-07 01:14:49 +00:00
soulitzer
563728f61c [reland] Update custom Function preserve torch function when inputs returned as-is (#110679)

reland of https://github.com/pytorch/pytorch/pull/109825#issuecomment-1749803837

Opening this without ghstack to do codev. In our PR, we changed the signature of `_wrap_outputs`. There is some internal code that calls `_wrap_outputs` directly, so we also need to update that callsite.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110679
Approved by: https://github.com/albanD
2023-10-07 00:27:45 +00:00
Wanchao Liang
1c97808f81 [dtensor] support lt/gt op (#110585)
This PR enables the lt/gt aten ops.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110585
Approved by: https://github.com/fduwjj
ghstack dependencies: #110584
2023-10-07 00:06:36 +00:00
Wanchao Liang
9378a2ceda [dtensor] support aten.where and enable implicit scalar promotion (#110584)
This PR adds support for aten.where and enables implicit scalar
promotion: when we meet scalar tensors in the dispatching logic,
we implicitly convert them to replicated DTensors.

The latter also enables a bunch of ops in the op db to pass.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110584
Approved by: https://github.com/fduwjj
2023-10-07 00:06:36 +00:00
Yue Dong
e3bf5000a7 Hide the contiguous requirement for user input mesh when initializing DeviceMesh (#110628)
Summary:
As titled, this diff hides the contiguous requirement for the user input mesh when initializing DeviceMesh.

In the current implementation, when testing with inter-node model parallelism, an exception is thrown during mesh validation when the following input is provided:
```
mesh = torch.arange(0, world_size).view(mp_size, dp_size).transpose(0, 1)
device_mesh = DeviceMesh(
    "cuda",
    mesh.contiguous(),
    mesh_dim_names=("dp", "mp"),
)
```

Test Plan:
**Unit Test**:
```
buck2 test mode/dev-nosan //caffe2/test/distributed/_tensor:device_mesh -- test_validate_device_mesh

Test UI: https://www.internalfb.com/intern/testinfra/testrun/3940649876878399
Network: Up: 0B  Down: 0B
Jobs completed: 6. Time elapsed: 1:58.7s.
Tests finished: Pass 1. Fail 0. Fatal 0. Skip 0. Build failure 0
```

**Test with MP**
```
mesh = torch.arange(0, world_size).view(mp_size, dp_size).transpose(0, 1)
device_mesh = DeviceMesh(
    "cuda",
    mesh.contiguous(),
    mesh_dim_names=("dp", "mp"),
)
```
Without the change: exception.
After this change: initialized successfully.

Differential Revision: D49942839

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110628
Approved by: https://github.com/wanchaol, https://github.com/xw285cornell, https://github.com/fduwjj
2023-10-06 23:54:13 +00:00
chilli
6b1007b2a7 Fix error in div lowering with integers (#102809)
Fixes https://github.com/pytorch/pytorch/issues/101016
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102809
Approved by: https://github.com/ngimel
ghstack dependencies: #110501, #110504, #110591, #110668, #110687
2023-10-06 23:21:40 +00:00
Jon Chuang
9b55194f81 fix(dynamo): Incorrect accumulate implementation, bad tests (#110683)
Root cause of: https://github.com/pytorch/pytorch/issues/110287

Fixed many tests that didn't actually test what they intended, due to the unreliability of `CompileCounter.frame_count` in detecting graph breaks: https://github.com/pytorch/pytorch/issues/110730

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110683
Approved by: https://github.com/voznesenskym
2023-10-06 23:07:56 +00:00
Nikita Shulga
65d40a72c4 Delete rogue print from test_quantize_pt2e.py (#110732)
Introduced by https://github.com/pytorch/pytorch/pull/110308

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110732
Approved by: https://github.com/clee2000, https://github.com/ZainRizvi, https://github.com/jerryzh168
2023-10-06 22:16:10 +00:00
Avik Chaudhuri
44d34fe65c different bounds for same Dim name (#110638)
Previously, `Dim` definitions that shared the same name but had different ranges were allowed to appear in the `dynamic_shapes` argument of an `export` call. They would correspond to the *same* dynamic dimension (identified by the shared name), with an effective range equal to the *intersection* of the different ranges.

However, this behavior can be confusing, because having different definitions with the same name is more likely than not unintentional. Therefore, this PR makes it a user error.

We still allow different definitions with the same name to exist at the same time (no global uniqueness) as long as they are not confused in the same `export` call. Redefinitions with the same bounds are also allowed, in case they are accidentally created by executing the same code multiple times.
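
A hypothetical sketch of the new behavior (the module, the model, and the exact error type are assumptions; only the `Dim`/`dynamic_shapes` usage follows the description above):

```python
import torch
from torch.export import Dim, export

class M(torch.nn.Module):
    def forward(self, x, y):
        return x + y

d1 = Dim("batch", min=2, max=10)
d2 = Dim("batch", min=4, max=12)  # same name, different bounds

try:
    export(M(), (torch.randn(4, 3), torch.randn(4, 3)),
           dynamic_shapes={"x": {0: d1}, "y": {0: d2}})
except Exception as e:
    # After this change, reusing the name "batch" with different ranges within a
    # single export call is a user error instead of a silent range intersection.
    print(type(e).__name__, e)
```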

Differential Revision: [D49965944](https://our.internmc.facebook.com/intern/diff/D49965944/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110638
Approved by: https://github.com/zhxchen17
2023-10-06 21:22:52 +00:00
Adnan Akhundov
f74937741e Remove runtime assertions between export and AOT compilation (#110710)
Summary: The runtime assertions inserted in the `torch._export.export` by the `_AddRuntimeAssertionsForInlineConstraintsPass` lead to errors in AOT Inductor like #109884. In `torch._export.aot_compile` export and AOT compilation are run consecutively which would lead to the above issue if any assertions are inserted.

In this PR, we're adding a new parameter / flag to `torch._export.aot_compile`, `remove_runtime_assertions`, to remove the assertions inserted during export before AOT compilation. The flag is set to `False` for BC.

Additionally, we remove the flag `add_runtime_assertions_for_inline_constraints` recently added to `torch._dynamo.config`, as it can lead to undesirable `torch._export` behavior and is no longer required for AOT Inductor testing purposes.

Test Plan: CI

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110710
Approved by: https://github.com/zhxchen17, https://github.com/chenyang78
2023-10-06 21:09:35 +00:00
cdzhan
7cc0020a80 [decomp] Fix different return type in threshold_backward vs. eager (#110689)
due to type promotion with floating point scalar in decompositions.py

Fixes part of #100838

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110689
Approved by: https://github.com/ezyang
2023-10-06 20:59:58 +00:00
kshitij12345
b8a3998c23 add batch rule for missing inplace ops (#110692)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110692
Approved by: https://github.com/ezyang
2023-10-06 20:53:28 +00:00
Yanbo Liang
1b1bc08557 [Dynamo] SizeVariable can be indexed by symint (#110349)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110349
Approved by: https://github.com/williamwen42
2023-10-06 20:48:07 +00:00
PyTorch MergeBot
ff0358b038 Revert "[C10] PG observability hooks. (#108815)"
This reverts commit 0c7a877745.

Reverted https://github.com/pytorch/pytorch/pull/108815 on behalf of https://github.com/albanD due to Add a new torch.distributed.hooks namespace but does not document it, test was added this morning ([comment](https://github.com/pytorch/pytorch/pull/108815#issuecomment-1751327751))
2023-10-06 19:49:49 +00:00
Rodrigo Kumpera
0c7a877745 [C10] PG observability hooks. (#108815)
Expose a set of observability hooks into C10D so that our users can
detect collective failures faster and more easily.

The design is similar to NCCL desync debug in that it minimizes the
overhead by doing most of the work off the main thread.

This PR introduces a new module torch.distributed.hooks that exposes the following set of methods:

    register_collective_start_hook
    register_collective_end_hook
    register_process_group_hook

The process group hook exposes PG creation on the member ranks and is called inline from
the PG creation code. This is fine since this happens during initialization and only a limited number of times.

The collective start/end hooks are fired from a single background thread, which reads
events from a C++ queue and dispatches them.

Queue notification is, oddly, done using a pipe; this is needed so Python can abort the thread on shutdown
and keep it as a background thread. This is not possible with more reasonable choices like a condvar.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108815
Approved by: https://github.com/wconstab, https://github.com/fduwjj
2023-10-06 18:52:46 +00:00
Joel Schlosser
17348b0f51 Implement split_with_sizes backward for NT (#110647)
Needed internally. Note that `split_with_sizes()` for NT is currently supported only on `dim=-1`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110647
Approved by: https://github.com/cpuhrsch, https://github.com/soulitzer
ghstack dependencies: #110646
2023-10-06 18:44:22 +00:00
Jesse Cai
33da6c8951 [sparse] Add i8i8->i32 support for cuSPARSELt (#110499)
Summary:

With the release of cuSPARSELt v0.5.0, we now have support for
int8 x int8 -> int32 matmul.

This PR adds support for this via out_dtype.

Test Plan:
```
python test/test_sparse_semi_structured.py -k int32
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110499
Approved by: https://github.com/cpuhrsch
2023-10-06 18:32:47 +00:00
soulitzer
69ea214cc2 [reland] Update singleton int to error when inequality relation is undefined (#110672)
reland of https://github.com/pytorch/pytorch/pull/110044
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110672
Approved by: https://github.com/ezyang
2023-10-06 17:50:25 +00:00
Jeff Daily
e8f1f4ed66 [quant][pt2][ROCm] follow-up PR 109908 for miopen_batch_norm (#110653)
Fixes recently broken unit tests caused by PR #109908, because cudnn and miopen have separate batch norm functions.

```
2023-10-05T09:35:01.6606614Z _______________ TestQuantizePT2EQAT.test_qat_conv_bn_fusion_cuda _______________
2023-10-05T09:35:01.6606948Z Traceback (most recent call last):
2023-10-05T09:35:01.6607362Z   File "/var/lib/jenkins/pytorch/test/quantization/pt2e/test_quantize_pt2e_qat.py", line 323, in test_qat_conv_bn_fusion_cuda
2023-10-05T09:35:01.6607767Z     self._verify_symmetric_xnnpack_qat_graph(
2023-10-05T09:35:01.6608217Z   File "/var/lib/jenkins/pytorch/test/quantization/pt2e/test_quantize_pt2e_qat.py", line 130, in _verify_symmetric_xnnpack_qat_graph
2023-10-05T09:35:01.6608658Z     self._verify_symmetric_xnnpack_qat_graph_helper(
2023-10-05T09:35:01.6609105Z   File "/var/lib/jenkins/pytorch/test/quantization/pt2e/test_quantize_pt2e_qat.py", line 173, in _verify_symmetric_xnnpack_qat_graph_helper
2023-10-05T09:35:01.6609623Z     m = prepare_qat_pt2e(m, quantizer)
2023-10-05T09:35:01.6610171Z   File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/ao/quantization/quantize_pt2e.py", line 178, in prepare_qat_pt2e
2023-10-05T09:35:01.6610561Z     _fuse_conv_bn_qat(model)
2023-10-05T09:35:01.6611072Z   File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/ao/quantization/pt2e/qat_utils.py", line 501, in _fuse_conv_bn_qat
2023-10-05T09:35:01.6611497Z     m = _fuse_conv_bn_qat_helper(m, is_cuda=True)
2023-10-05T09:35:01.6612065Z   File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/ao/quantization/pt2e/qat_utils.py", line 575, in _fuse_conv_bn_qat_helper
2023-10-05T09:35:01.6612492Z     _get_conv_bn_getitem_nodes(r.replacements)
2023-10-05T09:35:01.6613058Z   File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/ao/quantization/pt2e/qat_utils.py", line 383, in _get_conv_bn_getitem_nodes
2023-10-05T09:35:01.6613465Z     assert bn_node is not None
2023-10-05T09:35:01.6613716Z AssertionError
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110653
Approved by: https://github.com/jerryzh168, https://github.com/pruthvistony
2023-10-06 15:30:55 +00:00
Jack Taylor
6b92c367c5 Add test_jit_cuda_fuser to ROCM_BLOCKLIST (#110440)
Adds the nvfuser-related unit test suite to ROCM_BLOCKLIST, as it should not be run on ROCm.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110440
Approved by: https://github.com/jeffdaily, https://github.com/pruthvistony, https://github.com/lezcano
2023-10-06 08:47:15 +00:00
Michael Voznesensky
7d98549ca9 retain_graph=True in compiled_autograd (#110367)
Adds support for retain_graph=True - known as keep_graph_ internally in the autograd engine.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110367
Approved by: https://github.com/jansel
2023-10-06 08:22:10 +00:00
Jon Chuang
63fe5de89b feat(optim): add SGD sparse multitensor to testing path (#110562)
Follow up to: https://github.com/pytorch/pytorch/pull/110454, which defines the infra for sparse multi-tensor optimizer testing

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110562
Approved by: https://github.com/janeyx99
2023-10-06 07:48:25 +00:00
kshitij12345
371d8ba599 vmap: decompose real and imag instead of registering batch rule (#110508)
Clean-up

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110508
Approved by: https://github.com/zou3519
2023-10-06 06:01:12 +00:00
Banit Agrawal
64583c4d04 [CUDA Host Allocator] Add support of CudaHostRegister (#108488)
Summary: This diff adds another option to create cuda pinned memory using cudaHostRegister.

Differential Revision: D45843715

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108488
Approved by: https://github.com/zdevito
2023-10-06 04:13:02 +00:00
Jon Chuang
57e9969021 feat(optim): Add adadelta multi_tensor support for complex, with has_complex shortcut (#110631)
Partial fix: https://github.com/pytorch/pytorch/issues/110606

More on `has_complex` shortcut: https://github.com/pytorch/pytorch/pull/110613#issuecomment-1749314805

CC: @janeyx99, @mlazos, @lezcano
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110631
Approved by: https://github.com/lezcano
2023-10-06 03:34:41 +00:00
Jon Chuang
11047be10e feat(optim): Add NAdam support for complex, with has_complex shortcut (#110634)
Partial fix: https://github.com/pytorch/pytorch/issues/110606

More on `has_complex` shortcut: https://github.com/pytorch/pytorch/pull/110613#issuecomment-1749314805

CC: @janeyx99 @mlazos @lezcano
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110634
Approved by: https://github.com/lezcano
2023-10-06 03:31:48 +00:00
Jon Chuang
347ea3fe0d feat(optim): Add RAdam support for complex, with has_complex shortcut (#110635)
Partial fix: https://github.com/pytorch/pytorch/issues/110606

More on `has_complex` shortcut: https://github.com/pytorch/pytorch/pull/110613#issuecomment-1749314805

CC: @janeyx99 @mlazos @lezcano
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110635
Approved by: https://github.com/lezcano
2023-10-06 03:29:26 +00:00
Catherine Lee
8a09fe4a05 [ez] Remove print in heuristics aggregation (#110621)
Moved the print to the beginning instead, because putting it at the end makes you scroll through it when debugging, and nothing in that function indicates that it should be printing anything.

Also moved the line that prints disabled issues out of the for loop.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110621
Approved by: https://github.com/huydhn
2023-10-06 02:04:53 +00:00
PyTorch MergeBot
dac895c10a Revert "Multiprocessing support for NT (#110292)"
This reverts commit f17fe89e14.

Reverted https://github.com/pytorch/pytorch/pull/110292 on behalf of https://github.com/kit1980 due to Causes CUDA memory leaks ([comment](https://github.com/pytorch/pytorch/pull/110292#issuecomment-1749852095))
2023-10-06 01:07:40 +00:00
Tobias Ringwald
555c83d097 Added a UserWarning when using torch.{std,var,std_mean,var_mean} with dof<=0 (#109824)
Fixes #109696.

This PR adds a `UserWarning` when calling
- `torch.var`
- `torch.var_mean`
- `torch.std`
- `torch.std_mean`

with an effective `dof<=0`. Until now, only `torch.cov` warned about this. The code also handles edge cases, such as `torch.empty`
```
>>> import torch; torch.std_mean(torch.empty(0), correction=0)
<stdin>:1: UserWarning: std_mean(): degrees of freedom is <= 0 (Triggered internally at /app/aten/src/ATen/native/ReduceOps.cpp:1671.)
(tensor(nan), tensor(nan))
```

multi-dim reductions

```
>>> import torch; torch.std_mean(torch.empty(10, 30, 20, 50), correction=600, dim=(1, 2))
<stdin>:1: UserWarning: std_mean(): degrees of freedom is <= 0 (Triggered internally at /app/aten/src/ATen/native/ReduceOps.cpp:1671.)
[... snip ...]
```

and a negative `correction`.

```
>>> import torch; torch.std_mean(torch.randn(0), correction=-5)
(tensor(nan), tensor(nan))
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109824
Approved by: https://github.com/soulitzer
2023-10-06 01:03:47 +00:00
PyTorch MergeBot
81ce5d5725 Revert "pin_memory support for NT (#110404)"
This reverts commit 3597325bc7.

Reverted https://github.com/pytorch/pytorch/pull/110404 on behalf of https://github.com/kit1980 due to Previous PR in the stack caused CUDA memory leaks ([comment](https://github.com/pytorch/pytorch/pull/110404#issuecomment-1749850211))
2023-10-06 01:03:17 +00:00
PyTorch MergeBot
330db8278b Revert "Update singleton int to error when inequality relation is undefined (#110044)"
This reverts commit 07331c65e6.

Reverted https://github.com/pytorch/pytorch/pull/110044 on behalf of https://github.com/PaliC due to bottom diff is causing a plethora of internal failures ([comment](https://github.com/pytorch/pytorch/pull/110044#issuecomment-1749805209))
2023-10-05 23:55:37 +00:00
PyTorch MergeBot
1c3fae46ee Revert "Support SingletonSymNode mul with coefficient (#110369)"
This reverts commit eb8feb8ff8.

Reverted https://github.com/pytorch/pytorch/pull/110369 on behalf of https://github.com/PaliC due to bottom diff is causing a plethora of internal failures ([comment](https://github.com/pytorch/pytorch/pull/110369#issuecomment-1749802899))
2023-10-05 23:51:28 +00:00
PyTorch MergeBot
236afe73a2 Revert "Update custom Function preserve torch function when inputs returned as-is (#109825)"
This reverts commit 4e73eee93f.

Reverted https://github.com/pytorch/pytorch/pull/109825 on behalf of https://github.com/PaliC due to causing a plethora of internal failures ([comment](https://github.com/pytorch/pytorch/pull/109825#issuecomment-1749802739))
2023-10-05 23:49:41 +00:00
PyTorch MergeBot
fdf6055ea7 Revert "Add symbolic singleton int (#110370)"
This reverts commit a7145cb3a4.

Reverted https://github.com/pytorch/pytorch/pull/110370 on behalf of https://github.com/PaliC due to bottom diff is causing a plethora of internal failures ([comment](https://github.com/pytorch/pytorch/pull/110370#issuecomment-1749801188))
2023-10-05 23:47:09 +00:00
Edward Z. Yang
f274c7b32c Add functional collective all_to_all_single and support it in Inductor (#110195)
Copy of https://github.com/pytorch/pytorch/pull/106655 from yf225
rebased on top of item() support changes

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110195
Approved by: https://github.com/Skylion007
2023-10-05 23:11:51 +00:00
Jerry Zhang
7b6042111f [quant][pt2e] Refactor conv related annotation for XNNPACKQuantizer (#110308)
Summary:
Since we changed the IR that we are working with to the pre-autograd ATen IR, it's easier
to use plain pattern matching instead of relying on source_matcher_utils now. This
PR refactors the annotation for conv to use aten ops directly.

Also fixed reentrant test after this change.

Test Plan:
python test/test_quantization.py TestQuantizePT2E

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110308
Approved by: https://github.com/kimishpatel
2023-10-05 22:36:18 +00:00
Aaron Gokaslan
668eb55488 [BE]: Enable some basic pytest style rules (#110362)
Adds some basic flake8-pytest-style rules from ruff with their autofixes. I just picked a couple of uncontroversial changes about having a consistent pytest style that we were already following. We should consider enabling some more in the future, but this is a good start. I also upgraded ruff to the latest version.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110362
Approved by: https://github.com/ezyang, https://github.com/albanD, https://github.com/kit1980
2023-10-05 21:40:43 +00:00
Wanchao Liang
c95cf4b4c9 [dtensor] add grad placements kwarg to to_local API (#110629)
When we convert to a local tensor, DTensor can't track autograd or the
gradient layout of the local tensor anymore. If the user does something unexpected, there
needs to be a way for the user to hint at the gradient layout of the
local tensor.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110629
Approved by: https://github.com/zdevito
2023-10-05 21:34:01 +00:00
chilli
ada65508d2 Add option to flop counter formula registration to get raw values (#110591)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110591
Approved by: https://github.com/awgu
ghstack dependencies: #110501, #110504
2023-10-05 21:14:41 +00:00
William Wen
71beca4899 [dynamo, logging] Report name of defining class alongside function name in Dynamo logs (#110190)
Implement https://github.com/pytorch/pytorch/issues/109236

Sample code:
```python
import torch

class AAA:
    class DUMMY:
        class DUMMY2:
            pass
    def dummy(self):
        def dummy2():
            pass
    class BBB:
        @staticmethod
        def CCC():
            class DDD:
                if True:
                    @staticmethod
                    def EEE():
                        x = [torch.ones(3, 3) for _ in range(5)]
                        return x
            return DDD

def fn():
    return AAA.BBB.CCC().EEE()

opt_fn = torch.compile(fn, backend="eager")

opt_fn()
```

Logs:
```bash
$TORCH_LOGS="trace_source" python playground2.py
[2023-09-27 17:38:35,641] [0/0] torch._dynamo.symbolic_convert.__trace_source: [DEBUG] TRACE starts_line /data/users/williamwen/pytorch/playground2.py:21 in fn (fn)
[2023-09-27 17:38:35,641] [0/0] torch._dynamo.symbolic_convert.__trace_source: [DEBUG]     def fn():
[2023-09-27 17:38:35,642] [0/0] torch._dynamo.symbolic_convert.__trace_source: [DEBUG] TRACE starts_line /data/users/williamwen/pytorch/playground2.py:22 in fn (fn)
[2023-09-27 17:38:35,642] [0/0] torch._dynamo.symbolic_convert.__trace_source: [DEBUG]         return AAA.BBB.CCC().EEE()
[2023-09-27 17:38:35,661] [0/0] torch._dynamo.symbolic_convert.__trace_source: [DEBUG] TRACE starts_line /data/users/williamwen/pytorch/playground2.py:11 in CCC (AAA.BBB) (inline depth: 1)
[2023-09-27 17:38:35,661] [0/0] torch._dynamo.symbolic_convert.__trace_source: [DEBUG]             @staticmethod
[2023-09-27 17:38:35,661] [0/0] torch._dynamo.symbolic_convert.__trace_source: [DEBUG] TRACE starts_line /data/users/williamwen/pytorch/playground2.py:13 in CCC (AAA.BBB.CCC.DDD) (inline depth: 1)
[2023-09-27 17:38:35,661] [0/0] torch._dynamo.symbolic_convert.__trace_source: [DEBUG]                 class DDD:
[2023-09-27 17:38:35,723] [1/0] torch._dynamo.symbolic_convert.__trace_source: [DEBUG] TRACE starts_line /data/users/williamwen/pytorch/playground2.py:17 in <listcomp> (AAA.BBB.CCC.DDD.EEE)
[2023-09-27 17:38:35,723] [1/0] torch._dynamo.symbolic_convert.__trace_source: [DEBUG]                             x = [torch.ones(3, 3) for _ in range(5)]
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110190
Approved by: https://github.com/ezyang, https://github.com/mlazos
2023-10-05 20:41:38 +00:00
Jon Chuang
c99de9f37c fix(optim): adagrad sparse multitensor incorrect early exit (#110454)
Fixes https://github.com/pytorch/pytorch/issues/110444#issuecomment-1745181530

This PR:
Passes

Main:
```
test/optim/test_optim.py::TestOptim::test_adagrad_sparse FAILED [0.0058s]

==================================================================================================================================== FAILURES =====================================================================================================================================
__________________________________________________________________________________________________________________________ TestOptim.test_adagrad_sparse __________________________________________________________________________________________________________________________
Traceback (most recent call last):
  File "/home/jonch/Desktop/Programming/mlsys/pytorch/test/optim/test_optim.py", line 1448, in test_adagrad_sparse
    self._test_rosenbrock_sparse(
  File "/home/jonch/Desktop/Programming/mlsys/pytorch/test/optim/test_optim.py", line 128, in _test_rosenbrock_sparse
    self.assertEqual(params, params_c, atol=1e-6, rtol=1e-6)
  File "/home/jonch/Desktop/Programming/mlsys/pytorch/torch/testing/_internal/common_utils.py", line 3309, in assertEqual
    raise error_metas.pop()[0].to_error(
AssertionError: Tensor-likes are not close!

Mismatched elements: 1 / 2 (50.0%)
Greatest absolute difference: 0.09999999999993325 at index (1,) (up to 1e-06 allowed)
Greatest relative difference: 0.06249999999996089 at index (1,) (up to 1e-06 allowed)

```

CC: @janeyx99
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110454
Approved by: https://github.com/janeyx99
2023-10-05 20:37:57 +00:00
CK Luk
ecdd1bcf03 Back out "[Inductor] Break the loop fusion when node2 depends on node1 mutations (#109172)" (#110622)
Summary:
Original commit changeset: 03980fb054d5

Original Phabricator Diff: D49519512

Bisecting shows that this diff is the cause of S369683. Since this affects Ads production, need to back out this diff immediately.

Test Plan: See S369683

Reviewed By: ezyang

Differential Revision: D49958638

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110622
Approved by: https://github.com/yanboliang
2023-10-05 20:09:09 +00:00
Chien-Chin Huang
88616349d7 [state_dict][1/N] Implement the basic functions of distributed.checkpoint._state_dict (#105902)
This PR implements the basic functions of distributed.checkpoint._state_dict. This PR currently contains the flattening of optimizer state_dict which makes the PR too large. A later version may split it into 2 for a better code review.

Differential Revision: [D47647719](https://our.internmc.facebook.com/intern/diff/D47647719/)

**NOTE FOR REVIEWERS**: This PR has internal Meta-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D47647719/)!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105902
Approved by: https://github.com/wz337
2023-10-05 20:04:15 +00:00
Bin Bao
298f01d9a2 [aotinductor] Avoid generating redundant kernel loading code (#110510)
Summary: 1) Stop forcing triton.unique_kernel_names to True for AOTInductor, because the unique kernel name can be read from metadata; 2) Only generate load_kernel once for each kernel since we don't have control flow in our generated code.  This solves https://github.com/pytorch/pytorch/issues/105553.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110510
Approved by: https://github.com/chenyang78, https://github.com/jansel
2023-10-05 19:59:38 +00:00
Antoni Viros i Martin
efdf155383 Add requirement for input to AllGatherIntoTensor to be contiguous (#109561)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109561
Approved by: https://github.com/Chillee
2023-10-05 17:04:48 +00:00
Catherine Lee
d6e5898e8d Quieter logs in CI (#110033)
To reduce the amount of logs
* for successes, only print the part that says what tests ran and don't print the rest. Zip the log into an artifact. The line listing all the test names is really long, but if you view the source of the raw logs, it will not wrap, so it will only be one line. The log classifier can also be configured to ignore this line. Gets rid of lines like `test_ops.py::TestCommonCPU::test_multiple_devices_round_cpu_int64 SKIPPED [0.0010s] (Only runs on cuda) [  9%]`
* for failures/reruns, print logs. Do not zip.

Also
* change log artifact name

Examples of various logs:
a074db0f7f failures
1b439e24c4 failures

possibly controversial haha
should i include an option for always printing?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110033
Approved by: https://github.com/huydhn
2023-10-05 16:40:37 +00:00
Joel Schlosser
3597325bc7 pin_memory support for NT (#110404)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110404
Approved by: https://github.com/cpuhrsch, https://github.com/albanD
ghstack dependencies: #110292
2023-10-05 16:33:22 +00:00
ydwu4
cc1de49340 [HigherOrderOp] fallthrough some keys by default. (#110478)
Fixes #109253

Test Plan:
Added a new test that shows default fallthrough keys can be overridden.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110478
Approved by: https://github.com/ezyang
2023-10-05 16:25:42 +00:00
chilli
f767a6c57a Made pattern-matcher diagnostics lazily reported + added TORCH_COMPILE_CPROFILE (#110504)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110504
Approved by: https://github.com/mlazos, https://github.com/eellison
ghstack dependencies: #110501
2023-10-05 15:47:30 +00:00
PyTorch MergeBot
1e4c0641ce Revert "Made pattern-matcher diagnostics lazily reported + added TORCH_COMPILE_CPROFILE (#110504)"
This reverts commit 9648df1a6a.

Reverted https://github.com/pytorch/pytorch/pull/110504 on behalf of https://github.com/PaliC due to temporarily will revert as it's causing problems with difftrain import ([comment](https://github.com/pytorch/pytorch/pull/110504#issuecomment-1749132253))
2023-10-05 15:28:23 +00:00
Chien-Chin Huang
1a729618ef [FSDP][optim_state_dict] Make the new optimizer allgather fusion work with fine-tuning models (#110540)
With use_orig_params=True, it is possible that some parameters with the same FlatParameter are in the optimizer while other parameters are frozen. This PR makes the allgather fusion logic support this case.

Differential Revision: [D49922028](https://our.internmc.facebook.com/intern/diff/D49922028/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110540
Approved by: https://github.com/awgu, https://github.com/rohan-varma
2023-10-05 15:17:10 +00:00
Joel Schlosser
f17fe89e14 Multiprocessing support for NT (#110292)
Fixes #110161

Allows NTs to be used in DataLoaders with `num_workers > 1`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110292
Approved by: https://github.com/cpuhrsch, https://github.com/albanD
2023-10-05 15:04:48 +00:00
Andrew Or
7c72238e4b Back out "Enable pickling model prepared with QAT qconfig" (#110392)
Summary:
D49187352 caused our model conversion and loading of the QAT checkpoint to get stuck with a Thrift timeout.

we are actively checking in final code and model for static quant HTP prod model, and encountered this breakage at head Thursday.

A Thrift timeout is not a failure, and because of that, it's hard to bisect and find this culprit. It is also hard to set up a unit test, because the job simply times out. A better test is needed to guard downstream model conversion against upstream changes.

Our suspicion of why this diff broke us is that we create a lot of modules with QAT (in a recursive manner), but our model is not a QAT-traceable module (it is a graph with many QAT modules and floating-point modules). With functools.partial as in the original diff, we would be caching modules in memory, causing the memory of the machine to be taken up completely.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110392
Approved by: https://github.com/junesg, https://github.com/jerryzh168
2023-10-05 14:41:00 +00:00
Oleg Khabinov
cf1b494afd [AOTInductor] Store loaded kernels in the model (#110554)
Defining kernels as static vars is problematic for subsequent model loading on non-default CUDA devices.

If those kernels were loaded in the context of device #0, they are no longer nullptr, and therefore the kernels won't work on devices other than device #0.

This change makes the loaded kernels remembered at the model level in AOT mode.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110554
Approved by: https://github.com/chenyang78, https://github.com/desertfire
2023-10-05 10:17:05 +00:00
PyTorch MergeBot
21019620ee Revert "[Dynamo] SizeVariable can be indexed by symint (#110349)"
This reverts commit 510ec7e3c5.

Reverted https://github.com/pytorch/pytorch/pull/110349 on behalf of https://github.com/PaliC due to breaking internal tests (check diff) ([comment](https://github.com/pytorch/pytorch/pull/110349#issuecomment-1748021641))
2023-10-05 04:42:33 +00:00
andrewor14
62cad5b5b0 [quant][pt2] Support cudnn_batch_norm in QAT fusion (#109908)
Summary: Today, we get different batch norm ops depending on
the device the model is placed on at export time. Exporting
`model.cpu()` gives `_native_batch_norm_legit`, while exporting
`model.cuda()` gives `cudnn_batch_norm`. QAT fusion currently
only supports the former and silently ignores the latter. This
commit fixes this by additionally matching on the latter op
during QAT fusion.

Test Plan:
python test/test_quantization.py TestQuantizePT2EQAT.test_qat_conv_bn_fusion
python test/test_quantization.py TestQuantizePT2EQAT.test_qat_conv_bn_relu_fusion

Reviewers: jerryzh168, kimishpatel

Subscribers: jerryzh168, kimishpatel, supriyar

Differential Revision: [D49615145](https://our.internmc.facebook.com/intern/diff/D49615145)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109908
Approved by: https://github.com/jerryzh168
2023-10-05 04:08:44 +00:00
chilli
9648df1a6a Made pattern-matcher diagnostics lazily reported + added TORCH_COMPILE_CPROFILE (#110504)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110504
Approved by: https://github.com/mlazos, https://github.com/eellison
ghstack dependencies: #110501
2023-10-05 01:34:57 +00:00
Amadeusz Skrzypczak
653f966df0 Fix type promotion of float8_e5m2 and float8_e4m3fn (#110279)
There is an issue with float8 type promotion, because _promoteTypesLookup doesn't contain records for a few types between bfloat16 and float8.
I have simply moved the float8 types just after bfloat16; however, I'm not sure whether it breaks serialization.

Please decide whether it can stay like this, or whether I should insert missing records filled with "ud" into _promoteTypesLookup instead of moving the types.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110279
Approved by: https://github.com/albanD
2023-10-05 01:28:48 +00:00
Bin Bao
c121f957c2 [aotinductor] Enable test_non_default_cuda_device on CI (#110509)
Summary: test_non_default_cuda_device needs to run on a multi-gpu CI instance

Differential Revision: [D49937115](https://our.internmc.facebook.com/intern/diff/D49937115)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110509
Approved by: https://github.com/angelayi, https://github.com/khabinov, https://github.com/chenyang78
2023-10-05 01:25:50 +00:00
Jane Xu
9f40ffeec6 [optim] disable large_tensor tests for ROCm (#110559)
Closes #105825, #105820, and #105754 by replacing them with an in-code skip.

Fixes #105825, fixes #105820, fixes #105754

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110559
Approved by: https://github.com/albanD
2023-10-05 01:21:21 +00:00
Edward Z. Yang
6a974bec5d Change flash attention outputs to be SymInt instead of int (#110533)
Fixes https://github.com/pytorch/pytorch/issues/110322

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110533
Approved by: https://github.com/albanD
2023-10-05 01:00:07 +00:00
Justin Chu
f3aba45049 [ONNX] Create onnxscript-torchlib specific xfails/skips for fx tests (#110536)
Creates xfail_onnxscript/skip_onnxscript so that it is clear torchlib needs to support it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110536
Approved by: https://github.com/BowenBao
2023-10-05 00:39:05 +00:00
Xuehai Pan
449271f3f1 [pytree] Extract reusable generic tests for pytree (#110395)
Part of #109684

- #109684

Changes:

- Add new functions `tree_structure`, `tree_leaves`, `tree_map_`, and `tree_map_only_` to Python pytree (see the sketch after this list).
- Extract reusable tests for pytree to `TestGenericPytree`.
- Change `treespec_dumps` and `treespec_loads` in C++ pytree to call Python pytree and use JSON string as serialization type.
- Rename `torch.utils.pytree` -> `torch.utils._cxx_pytree`.
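
A minimal sketch of the new Python pytree helpers (assuming they live in `torch.utils._pytree`, the existing Python pytree module):

```python
import torch.utils._pytree as pytree

tree = {"a": [1, 2], "b": (3, {"c": 4})}

leaves = pytree.tree_leaves(tree)    # [1, 2, 3, 4]
spec = pytree.tree_structure(tree)   # TreeSpec describing the container layout

# tree_map_ applies the function to every leaf for its side effects.
pytree.tree_map_(print, tree)
```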

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110395
Approved by: https://github.com/zou3519
2023-10-04 23:40:50 +00:00
Jon Chuang
37afa0c349 fix(inductor): Increase coverage of Inductor ATen lowering (#110473)
Add sqrt to the decomp testing path and fix missing `minimum`, `clamp_min`, `clamp_max` lowerings and/or registrations.

Follow up to: https://github.com/pytorch/pytorch/pull/110468#issuecomment-1745718602 (requires upstream to merge to avoid merge conflict)

CC: @janeyx99

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110473
Approved by: https://github.com/janeyx99
2023-10-04 23:40:46 +00:00
soulitzer
a7145cb3a4 Add symbolic singleton int (#110370)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110370
Approved by: https://github.com/ezyang
ghstack dependencies: #110044, #110369
2023-10-04 22:56:26 +00:00
soulitzer
eb8feb8ff8 Support SingletonSymNode mul with coefficient (#110369)
We want to be able to use SingletonSymNode to represent strides for the jagged layout tensor. The following is for 3D, but is easily generalizable to higher dimensions.

Constraints:
- [B, x, D] (where x represents the "variably lengthed dim") can be strided in two ways [x, 1, sum(x)] and [dx, d, 1]. We need two different placeholder values depending on how the jagged tensor is strided.
- When doing operations we need the strides of output tensors to be expressible in terms of the strides and sizes of the inner tensors. Given [B, x, D] @ [D, D'], the output strides are [x * D', D', 1] rather than some opaque [x2, D', 1]. This constraint exists because if I'm tracing, I need a symint to represent the output stride. This symint needs to come from somewhere; I get it in several ways: (1) create a constant, (2) unbacked symint, (3) create a new input using a source, (4) output of an operation on an existing symint. It is clear that (4) is what we want here, which brings us to the design below.

Design:

Given the two constraints, the most straightforward way to implement this is actually to update SingletonSymNode to include some scalar factor, i.e., morally, SingletonSymNode represents `factor * [s_0, s_1, …, s_n]`. This enables us to symbolically compute strides from sizes.
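
A plain-integer illustration of the stride arithmetic above (concrete sizes are made up; the point is that the jagged placeholder only ever appears scaled by a constant factor):

```python
import torch

B, X, Dp = 4, 7, 16          # X stands in for the "variably lengthed" dim
out = torch.randn(B, X, Dp)  # e.g. the result of a [B, x, D] @ [D, D'] matmul

# For a contiguous [B, X, D'] tensor the strides are [X * D', D', 1], which is
# why the symbolic output stride must be expressible as a factor times x.
assert out.stride() == (X * Dp, Dp, 1)
```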
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110369
Approved by: https://github.com/ezyang
ghstack dependencies: #110044
2023-10-04 22:56:15 +00:00
soulitzer
07331c65e6 Update singleton int to error when inequality relation is undefined (#110044)
Previously, something like j0 >= 3 would return False. In sympy, however, it is not possible to make both j0 >= 3 and j0 < 3 return False: you only get to dispatch on Ge, and the remaining relations are derived, e.g. defining Ge(j0, 3) to be False would force Lt(j0, 3) to be True, which is not what we want.

In this PR, we make it so that both j0 >= 3 and j0 < 3 error, so that in a future PR, when we create the symbolic counterpart of this singleton, the behaviors can be the same.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110044
Approved by: https://github.com/ezyang
2023-10-04 22:55:53 +00:00
soulitzer
4e73eee93f Update custom Function preserve torch function when inputs returned as-is (#109825)
Fixes https://github.com/pytorch/pytorch/issues/109805
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109825
Approved by: https://github.com/albanD
2023-10-04 22:45:11 +00:00
Avik Chaudhuri
6fc09aee36 constant output errors (#110472)
When mapping between the original signature of a program and the graph-captured signature of its exported program, we emit errors when we see unexpected original or graph-captured inputs or outputs.

These errors can arise because of various reasons, e.g.:
1. some input or output has been lifted because of mutation
2. some type is not pytree-registered for flattening / unflattening
3. some type cannot be realized with graph operations

(This is probably not an exhaustive list.)

Previously we used to emit errors based on a vanilla id-based membership check between the two sides, mostly anticipating (1) as the reason for errors. But this does not do justice to errors because of (2) or (3).

This PR emits a different error when it finds (3) to be a probable cause. Specifically, it considers only Tensor and Sym* types to be "supported": no other type seems to be realizable by graph operations.

When (2) is a probable cause, we sometimes also hit the same error because we would expect the supported types to show through upon registration. But this kind of error may need some more work in the future.

Differential Revision: [D49885828](https://our.internmc.facebook.com/intern/diff/D49885828/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110472
Approved by: https://github.com/ydwu4
2023-10-04 21:56:20 +00:00
Dmitry Nikolaev
901aa85b58 fix TEST_ROCM definition to disable test_jit_cudnn_extension on rocm (#110385)
Define TEST_ROCM before modifying TEST_CUDA. Otherwise, TEST_ROCM will always be false and will not disable test_jit_cudnn_extension for ROCm.
Fixes https://github.com/pytorch/pytorch/issues/107182

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110385
Approved by: https://github.com/jithunnair-amd, https://github.com/kit1980
2023-10-04 20:02:02 +00:00
Yang Chen
46a5558cd5 [AOTInductor] Simplified AOTInductor interface and model class (#110411)
Summary:
This PR removed several APIs from the AOTInductor interface,
which are not used by the client.

It also simplified AOTInductor's model class by removing
the dim info for input/output tensors. We included dim info
before to return max output shapes, which was used by the client
to allocate memory for output tensors. Now, we allocate output
tensor memory from the .so so that we don't need to maintain
such information any more. The deletion of dim info from
the model class also simplified the codegen quite a bit.

Test Plan: ci

Reviewed By: khabinov

Differential Revision: D49835430

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110411
Approved by: https://github.com/khabinov, https://github.com/desertfire, https://github.com/jansel
2023-10-04 18:35:24 +00:00
Oguz Ulgen
baa9af155e Add more tests for native triton kernels (#110486)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110486
Approved by: https://github.com/jansel
ghstack dependencies: #110403
2023-10-04 18:26:45 +00:00
Oguz Ulgen
f04b1a0d27 [AOTInductor] Implement autograd eager backend for native triton kernels (#110403)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110403
Approved by: https://github.com/zou3519, https://github.com/bdhirsh
2023-10-04 17:56:56 +00:00
Yukio Siraichi
0e55cc4986 [HigherOrderOp] Flatten outputs of wrap. (#109433)
Fix: #109247

This PR flattens `wrap` outputs by inlining the `pytree.tree_flatten` function after calling
the inner function.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109433
Approved by: https://github.com/zou3519
ghstack dependencies: #110290
2023-10-04 13:43:55 +00:00
Yukio Siraichi
f68f49c462 Refactor expect tests on test_higher_order_ops.py. (#110290)
This PR inlines the expected strings into the `assertExpectedInline` calls, so that, when
a change is needed, we can do so by using the `expecttest` machinery: setting the
environment variable `EXPECTTEST_ACCEPT=1`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110290
Approved by: https://github.com/zou3519
2023-10-04 13:43:55 +00:00
Ken Jin
31d635803b [Dynamo] Fx proxy for builtin all with list iterators (#109972)
Fixes https://github.com/pytorch/pytorch/issues/109057.
Fixes https://github.com/pytorch/pytorch/issues/103620.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109972
Approved by: https://github.com/ezyang
2023-10-04 07:59:26 +00:00
Yu Guo
2bf3ca1be7 [torchdynamo] preserve deterministic_algorithms_warn_only in convert_context (#110457)
Summary: preserve deterministic_algorithms_warn_only in the dynamo context.
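
A minimal sketch of the setting being preserved (the eager backend is only for illustration):

```python
import torch

torch.use_deterministic_algorithms(True, warn_only=True)

@torch.compile(backend="eager")
def f(x):
    return x * 2

f(torch.randn(4))

# After this change, the warn_only flag survives entering the compiled region.
assert torch.is_deterministic_algorithms_warn_only_enabled()
```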

Test Plan: modified unit tests to test warn_only

Differential Revision: D49872622

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110457
Approved by: https://github.com/jansel
2023-10-04 07:12:32 +00:00
Jez Ng
dddf581da7 [dynamo] Add graph break on requires_grad_() (#110053)
Fixes #107861.
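
A small sketch of the pattern that now produces a graph break (backend choice is illustrative):

```python
import torch

@torch.compile(backend="eager")
def f(x):
    y = x + 1
    y.requires_grad_()  # Dynamo now inserts a graph break at this call
    return y * 2

f(torch.randn(4))
```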

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110053
Approved by: https://github.com/eellison
2023-10-04 06:22:16 +00:00
Jon Chuang
3fd938369f add foreach_abs meta registration and inductor decomp (#110468)
Fixes https://github.com/pytorch/pytorch/issues/110458

Somehow it is on allowlist but not on testing path.

CC @janeyx99

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110468
Approved by: https://github.com/janeyx99
2023-10-04 06:09:37 +00:00
Yanbo Liang
510ec7e3c5 [Dynamo] SizeVariable can be indexed by symint (#110349)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110349
Approved by: https://github.com/williamwen42
2023-10-04 03:20:18 +00:00