Commit Graph

32108 Commits

Author SHA1 Message Date
Jerry Zhang
1b51d29b66 [quant][pt2e] Enable constant folding for quantize ops (#109343)
Summary:
This PR adds constant folding for quantize ops so that instead of storing fp32 weights in the
quantized model, we store int8/int16 etc. weights.
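
A minimal sketch of the idea, not the PR's implementation: a quantize op applied to a constant fp32 weight can be evaluated once ahead of time and its integer payload stored instead.

```python
import torch

# Constant weight that would otherwise be stored as fp32 in the program.
fp32_weight = torch.randn(4, 4)

# Before folding: the graph keeps fp32_weight and quantizes it at runtime.
qweight = torch.quantize_per_tensor(fp32_weight, scale=0.1, zero_point=0, dtype=torch.qint8)

# After folding: the int8 representation is computed once and persisted.
int8_weight = qweight.int_repr()
print(int8_weight.dtype)  # torch.int8
```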

Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_fold_quantize

Will also verify in ExecuTorch later.

Differential Revision: [D49399210](https://our.internmc.facebook.com/intern/diff/D49399210)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109343
Approved by: https://github.com/kimishpatel, https://github.com/jgong5
2023-09-27 06:04:45 +00:00
Angela Yi
ddbf1aab64 [export] Add dynamic_shapes to _export.aot_compile (#110101)
Summary: Following the new dynamic_shapes API (introduced in https://github.com/pytorch/pytorch/pull/108448), we also add a dynamic_shapes API to _export.aot_compile.
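
A usage sketch, assuming the new keyword mirrors the torch.export dynamic_shapes API; the exact call shape here is an assumption, not taken from the PR:

```python
import torch
from torch.export import Dim

def fn(x):
    return x.sin()

batch = Dim("batch")

# Hypothetical call: dynamic_shapes maps each positional input to its
# dynamic dimensions, mirroring torch.export.
so_path = torch._export.aot_compile(
    fn,
    (torch.randn(4, 8),),
    dynamic_shapes=({0: batch},),
)
```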

Test Plan: CI

Differential Revision: D49653815

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110101
Approved by: https://github.com/gmagogsfm
2023-09-27 04:10:22 +00:00
Edward Z. Yang
f7c9ef88f5 Add masked_select abstract impl (#110103)
Fixes https://github.com/pytorch/pytorch/issues/109871

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110103
Approved by: https://github.com/bdhirsh
2023-09-27 04:07:58 +00:00
SS-JIA
dec140f1ea [core IR] Add a core decomposition for aten.all (#110093)
## Context

Change the ref implementation of `aten.all` to only use other `torch` operators such that we can use it for the core ATen decomposition table. This will replace the decomposition for `aten.all` that was used specifically by Inductor.
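
An assumed form of such a decomposition (the PR's exact ref implementation may differ):

```python
import torch

def all_decomp(x: torch.Tensor) -> torch.Tensor:
    # aten.all in terms of other torch ops: every element is truthy
    # iff no element's negation is truthy.
    return torch.logical_not(torch.any(torch.logical_not(x)))

t = torch.tensor([True, True, False])
assert all_decomp(t) == torch.all(t)
```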

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110093
Approved by: https://github.com/manuelcandales, https://github.com/peterbell10, https://github.com/lezcano
2023-09-27 01:31:41 +00:00
Yukio Siraichi
51a8c166a6 Add test for ShapeEnv recording fallback. (#109944)
This PR adds a test for the previous PR in this stack: #109904. In summary, it calls
functions decorated with `@record_shapeenv_event` that don't have an explicit `ShapeEnv`
parameter, with arguments that don't hold a `ShapeEnv` instance.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109944
Approved by: https://github.com/ezyang
2023-09-27 00:50:14 +00:00
SS-JIA
9928c10e71 [core IR] Add glu as a core decomposition (#110043)
## Context

Add a decomposition for `aten.glu` to the core ATen decomposition table. Don't use it in the Inductor decomposition table, since Inductor has a lowering for it.
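
A sketch of the standard glu split-and-gate form (assumed; may differ from the PR's decomposition):

```python
import torch
import torch.nn.functional as F

def glu_decomp(x: torch.Tensor, dim: int = -1) -> torch.Tensor:
    # glu halves the input along `dim` and gates one half with the
    # sigmoid of the other.
    a, b = x.chunk(2, dim=dim)
    return a * torch.sigmoid(b)

x = torch.randn(3, 8)
torch.testing.assert_close(glu_decomp(x), F.glu(x, dim=-1))
```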

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110043
Approved by: https://github.com/peterbell10, https://github.com/lezcano
ghstack dependencies: #110046
2023-09-27 00:23:05 +00:00
Yang Chen
4d0ae7c9da [inductor] support _scaled_dot_product_flash_attention fallback (#110085)
Summary:
This PR supports the _scaled_dot_product_flash_attention fallback kernel.
Note that in abi_compatible mode, we retrieve outputs by passing
output-argument pointers rather than relying on std::get.

It also fixes an issue related to dynamic shapes, where we wrongly
queried undefined dynamic symbols.

Test Plan: ci

Reviewed By: frank-wei

Differential Revision: D49620191

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110085
Approved by: https://github.com/desertfire
2023-09-27 00:09:56 +00:00
Shiyan Deng
19ca883f8b [pytorch][jit] allow passing in obj loader in unpickle api (#109730)
Summary: We are trying to use wire messages to pass Python objects like KJT. To make JIT able to unpickle them, we need to provide a type resolver as well as an obj loader. This diff modifies the interface so that we can do that.

Test Plan:
Rely on current CI to make sure existing usage doesn't break.

In the next diff, test e2e

Differential Revision: D49438569

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109730
Approved by: https://github.com/davidberard98
2023-09-26 23:50:20 +00:00
Edward Z. Yang
3262c5358f Use _check_is_size for validate_dim_length (#109849)
_check_is_size has some extra juice for unbacked SymInts, so use it.
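
A usage sketch (assumed call pattern for the private helper):

```python
import torch

def validate_dim_length(length):
    # Unlike a plain bounds assert, _check_is_size also teaches the
    # symbolic shape machinery that an unbacked SymInt is a valid
    # (non-negative) size.
    torch._check_is_size(length)

validate_dim_length(3)
```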

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109849
Approved by: https://github.com/yanboliang
2023-09-26 23:33:31 +00:00
Wanchao Liang
27443eadeb [dtensor][7/n] remove reduction rule (#109144)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109144
Approved by: https://github.com/fduwjj
ghstack dependencies: #108263, #108264
2023-09-26 22:24:50 +00:00
Wanchao Liang
2dd9a79d22 [dtensor][6/n] refactor reduction to use op strategy (#108264)
This PR refactors the reduction op to use strategy-based propagation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108264
Approved by: https://github.com/fduwjj
ghstack dependencies: #108263
2023-09-26 22:24:50 +00:00
Wanchao Liang
986d255db2 [dtensor][5/n] switch random ops to op strategy (#108263)
This PR switches the random ops to use op strategy instead of rule-based
propagation. It is the first in a series of PRs refactoring ops after the
op dispatch logic refactor.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108263
Approved by: https://github.com/fduwjj
2023-09-26 22:24:42 +00:00
Richard Zou
bb9779ecd2 Revert D49640259: Revert D49615962: [optests] Test names in failure dicts should be prefixed with test class (#110094)
Summary: Revert D49640259: Revert D49615962: [optests] Test names in failure dicts should be prefixed with test class

Test Plan: revert-hammer

Differential Revision: D49645397

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110094
Approved by: https://github.com/izaitsevfb
2023-09-26 21:16:36 +00:00
PyTorch MergeBot
194d9aa0f2 Revert "[Dynamo] Match closures by code ID (#109427)"
This reverts commit 3de0857503.

Reverted https://github.com/pytorch/pytorch/pull/109427 on behalf of https://github.com/voznesenskym due to Fails test `PYTORCH_TEST_WITH_DYNAMO=1 python test_ops.py -k test_out_warning__refs_cat_cpu ([comment](https://github.com/pytorch/pytorch/pull/109427#issuecomment-1736101561))
2023-09-26 18:54:36 +00:00
Angela Yi
a7409695bb [export] Verifier for exported program (#109519)
Summary:
X-link: https://github.com/pytorch/executorch/pull/292

Added a verifier for the graph signature in a exported program

Test Plan: CI

Differential Revision: D48926643

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109519
Approved by: https://github.com/zhxchen17
2023-09-26 18:47:43 +00:00
Jane Xu
0a60219fe3 [foreach] Fix 0-size handling for real for real (#109402)
@crcrpar's last attempt to fix the 0-size problem unfortunately did not pass all cases. See my comment in https://github.com/pytorch/pytorch/issues/100701. When we have a tail tensor of size 0, the old code would mess with the chunk logic to check the previous tensor's length. This is flawed because:
1. if the previous tensor was also 0-sized (so a tensor list of [tensor, tensor, tensor, ..., 0-sized tensor, 0-sized tensor]), chunks would still be 0 and the nested for loop would be skipped.
2. the nested for loop introduces side effects on tensorListMeta that _shouldn't_ be there! This can mess up the compute in unexpected ways that I haven't fully reasoned through.

An internal report alerted us that the problem had not been fixed. This PR solves the issue by:
- removing the finagling of chunks when the tail tensor is 0-sized
- adding a surefire way for the kernel to be launched when the last tensor is 0-sized AND there's content in the metadata, signifying there is still work to compute (see the repro sketch below).
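
A hypothetical repro sketch of the failure mode (assumes a CUDA device; the shapes from the internal report are unknown):

```python
import torch

# A foreach op over a list whose tail tensors are 0-sized: previously the
# trailing empty tensors could derail the launch covering the earlier work.
tensors = [
    torch.ones(5, device="cuda"),
    torch.empty(0, device="cuda"),
    torch.empty(0, device="cuda"),
]
out = torch._foreach_add(tensors, 1.0)
assert torch.equal(out[0], tensors[0] + 1) and out[1].numel() == 0
```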

## test plan

As I went through the code, I also added some comments explaining what's going on, and modified our tensor inputs to ensure this case is exercised by the test_parity test in test_foreach.py. Yes, I do realize there is quite a bit of duplication and that this file is due for a refactor. That said, the primary goal of this PR is to fix a fairly egregious bug; refactoring can be a follow-up.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109402
Approved by: https://github.com/albanD
2023-09-26 17:38:20 +00:00
Rodrigo Kumpera
317e39a8ad [C10d] Cleanup collective sequence number. (#109136)
Sequence numbers must be associated with a Work object
if we want to use them as a way to report collective progress.

The API surface change introduces Work::getSequenceNumber, which
should eventually be exposed to Python.

The bulk of this change makes gloo always use the sequence number
and weaves it through the dozens of subclasses of Work.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109136
Approved by: https://github.com/fduwjj
2023-09-26 17:17:04 +00:00
Li-Huai (Allan) Lin
d91492a7a4 [MPS] Fix sort with empty tensor. (#109584)
Fixes #107284
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109584
Approved by: https://github.com/kulinseth, https://github.com/albanD
ghstack dependencies: #109557, #109574
2023-09-26 16:30:38 +00:00
Bin Bao
993530ee4f [aotinductor] Relax the CUDAGuard device index check (#110030)
Summary: Although AOTInductor only supports running on a single CUDA device, it does work when there is a mix of CPU and CUDA ops. So instead of asserting the first time a CUDA device index appears, we check that only one CUDA device index is used. This solves https://github.com/pytorch/pytorch/issues/109655
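
A sketch of the relaxed check (assumed logic, not the PR's actual codegen):

```python
import torch

seen_cuda_indices = set()

def on_device(device: torch.device):
    # Old behavior: assert the first time any CUDA index appears.
    # New behavior: CPU devices pass freely; CUDA is limited to one index.
    if device.type == "cuda":
        seen_cuda_indices.add(device.index)
        assert len(seen_cuda_indices) == 1, "AOTInductor supports a single CUDA device"

on_device(torch.device("cpu"))
on_device(torch.device("cuda", 0))
```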

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110030
Approved by: https://github.com/jansel
2023-09-26 16:23:23 +00:00
leslie-fang-intel
0dcea70bfd fix sfdp pattern 13 accuracy issue (#110001)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110001
Approved by: https://github.com/eellison
2023-09-26 15:23:45 +00:00
PyTorch MergeBot
2393864070 Revert "[optests] Test names in failure dicts should be prefixed with test class (#110045)"
This reverts commit 76fcec74c4.

Reverted https://github.com/pytorch/pytorch/pull/110045 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/110045#issuecomment-1735711094))
2023-09-26 14:56:08 +00:00
rzou
ea20db8aa0 [optests] Excise unused operator_compile_check (#110011)
The recommendation is to just use `opcheck`, which has superseded all
uses of `operator_compile_check`.

Test Plan:
- existing tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110011
Approved by: https://github.com/ezyang
ghstack dependencies: #109912
2023-09-26 13:24:21 +00:00
PyTorch MergeBot
812bf847b7 Revert "Add test for ShapeEnv recording fallback. (#109944)"
This reverts commit a4dec8d306.

Reverted https://github.com/pytorch/pytorch/pull/109944 on behalf of https://github.com/atalman due to New test failing internally ([comment](https://github.com/pytorch/pytorch/pull/109944#issuecomment-1735512734))
2023-09-26 13:11:22 +00:00
Peter Bell
92d86cd1ad [inductor] Fix triton compiler error in multilayer any (#109325)
Fixes #109196

When we have a split reduction and the tensor is not an even multiple of the split size,
we use `ops.masked` to pad to an even multiple. In the case here we generated:
```python
tmp5 = tl.where(mask, tmp4, 0)
```

which implicitly promotes our boolean value to `int32`. The fix is to give the default
value the same dtype as `result`.
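
A hypothetical user-level repro of this class of failure (the exact shape from #109196 may differ):

```python
import torch

# A boolean reduction big enough to trigger a split ("multilayer")
# reduction, sized so it is not an even multiple of the split size.
fn = torch.compile(lambda x: x.any())
x = torch.zeros(2**20 + 1, dtype=torch.bool, device="cuda")
print(fn(x))  # previously failed to compile due to the int32-typed default
```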

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109325
Approved by: https://github.com/lezcano
2023-09-26 12:29:29 +00:00
PyTorch MergeBot
1b90f07f5a Revert "Reland "Update AOTAutograd to use FunctionalTensorMode instead of C++ functionalization (#106406)" (#109906)"
This reverts commit d0fe8fa5db.

Reverted https://github.com/pytorch/pytorch/pull/109906 on behalf of https://github.com/atalman due to Breaks internal tests ([comment](https://github.com/pytorch/pytorch/pull/109906#issuecomment-1735416852))
2023-09-26 12:10:25 +00:00
wz337
8140494afd [3/N][2D] Enable training with new 2D flow (#110034)
Replacing https://github.com/pytorch/pytorch/pull/109553, which was reverted.

This PR enables training with the new 2D flow and adds an associated test. In addition, it moves the FSDP-specific parts of tensor/parallel/_data_parallel_utils.py back to tensor/parallel/fsdp.py to avoid a circular dependency for ddp.py and test/distributed/tensor/parallel/test_ddp_2d_parallel.py.

state_dict-related changes will come in later PRs.

cc. @fegin, @fduwjj, @wanchaol, @awgu
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110034
Approved by: https://github.com/fduwjj
2023-09-26 09:14:15 +00:00
Animesh Jain
0673aa3d28 [dynamo][guards-log] Print nn module guard saved dict versions for debugging (#110028)
This is the output for nn module guards

~~~
[DEBUG] GUARDS:
[DEBUG] hasattr(L['x'], '_dynamo_dynamic_indices') == False           # _dynamo/variables/builder.py:1356 in wrap_fx_proxy_cls
[DEBUG] ___check_obj_id(L['self'], 139820807110912)                   # for mod in self.mods:  # examples/graph_break.py:35 in forward
[DEBUG] __nn_module_guard_0(L['self']) # versions(mod=9998, _parameters=1194395, _buffers=1194397, _modules=1194423, _forward_hooks=1194405, _forward_pre_hooks=1194411, _backward_hooks=1194402, _backward_pre_hooks=1194400)  # for mod in self.mods:  # examples/graph_break.py:35 in forward
[DEBUG] ___check_obj_id(L['self'].mods[0], 139817945727568)           # for mod in self.mods:  # examples/graph_break.py:35 in forward
[DEBUG] __nn_module_guard_1(L['self'].mods[0]) # versions(mod=10001, _parameters=1194428, _buffers=1194430, _modules=1194522, _forward_hooks=1194438, _forward_pre_hooks=1194444, _backward_hooks=1194435, _backward_pre_hooks=1194433)  # for mod in self.mods:  # examples/graph_break.py:35 in forward
[DEBUG] ___check_obj_id(L['self'].mods[1], 139817945560640)           # for mod in self.mods:  # examples/graph_break.py:35 in forward
[DEBUG] __nn_module_guard_2(L['self'].mods[1]) # versions(mod=10001, _parameters=1194660, _buffers=1194662, _modules=1194753, _forward_hooks=1194670, _forward_pre_hooks=1194676, _backward_hooks=1194667, _backward_pre_hooks=1194665)  # for mod in self.mods:  # examples/graph_break.py:35 in forward
[DEBUG] ___check_obj_id(L['self'].mods[0].linear, 139817945727856)    # return self.linear(a)  # examples/graph_break.py:24 in helper
[DEBUG] __nn_module_guard_3(L['self'].mods[0].linear) # versions(mod=10004, _parameters=1470004, _buffers=1194467, _modules=1194493, _forward_hooks=1194475, _forward_pre_hooks=1194481, _backward_hooks=1194472, _backward_pre_hooks=1194470)  # return self.linear(a)  # examples/graph_break.py:24 in helper
[DEBUG] ___check_obj_id(L['self'].mods[1].linear, 139817945561120)    # return self.linear(a)  # examples/graph_break.py:24 in helper
[DEBUG] __nn_module_guard_4(L['self'].mods[1].linear) # versions(mod=10004, _parameters=1470008, _buffers=1194699, _modules=1194725, _forward_hooks=1194707, _forward_pre_hooks=1194713, _backward_hooks=1194704, _backward_pre_hooks=1194702)  # return self.linear(a)  # examples/graph_break.py:24 in helper
[DEBUG] utils_device.CURRENT_DEVICE == None                           # _dynamo/output_graph.py:373 in init_ambient_guards
~~~

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110028
Approved by: https://github.com/ezyang
ghstack dependencies: #110023, #110039
2023-09-26 08:53:07 +00:00
SS-JIA
5df8aca994 [core IR] Add a core decomposition for floor_divide (#110046)
## Context

Introduce a core decomposition of `aten.floor_divide` into other `aten` ops, and add it to the core ATen decomposition table.
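
One assumed form such a decomposition can take (the PR's exact decomposition may differ):

```python
import torch

def floor_divide_decomp(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    # floor_divide is division rounded toward negative infinity,
    # i.e. div with floor rounding.
    return torch.div(a, b, rounding_mode="floor")

a = torch.tensor([7.0, -7.0])
b = torch.tensor([2.0, 2.0])
torch.testing.assert_close(floor_divide_decomp(a, b), torch.floor_divide(a, b))
```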

This replaces the decomposition of `floor_divide` that was used by Inductor. I noticed there was a note on that decomposition

```
# TorchInductor-only decomposition. It should not be taken to core.
# See https://github.com/pytorch/torchdynamo/pull/1120
```

but couldn't discern why that is the case. cc: @lezcano

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110046
Approved by: https://github.com/peterbell10
2023-09-26 08:39:21 +00:00
Yukio Siraichi
26e8cc0465 Add test for ShapeEnv state when not recording. (#109945)
This PR adds a test for checking `ShapeEnv` state when it's built with
`should_record_events=False`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109945
Approved by: https://github.com/ezyang
ghstack dependencies: #109904, #109944
2023-09-26 07:20:46 +00:00
Animesh Jain
2ac7e52d34 [dynamo][nn_module_guards] Config flag to disable nn_module_guards (#110039)
This flag was requested by @Chillee, who is seeing recompilations in simple GPT experiments. We are observing recompilations because the `_parameters` ordered dict keeps changing from run to run, and it's unclear why that is happening.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110039
Approved by: https://github.com/Chillee
ghstack dependencies: #110023
2023-09-26 06:35:23 +00:00
rzou
76fcec74c4 [optests] Test names in failure dicts should be prefixed with test class (#110045)
We want to use the same failures dict for multiple TestCases; this is
common in e.g. fbgemm. To move toward that, we need to prefix each test name
with its test class to avoid ambiguity (see the sketch below).
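
A hypothetical failures-dict entry illustrating the new keying (field names and statuses are illustrative, not the library's exact schema):

```python
failures_dict = {
    "mylib::my_op": {
        # Test names are now qualified by their TestCase class, so two
        # classes can share one failures dict without ambiguity.
        "TestFoo.test_aot_dispatch_static": {"status": "xfail", "comment": ""},
        "TestBar.test_aot_dispatch_static": {"status": "skip", "comment": ""},
    },
}
```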

Differential Revision: [D49615962](https://our.internmc.facebook.com/intern/diff/D49615962/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110045
Approved by: https://github.com/williamwen42
2023-09-26 03:21:12 +00:00
Jez Ng
41bb5c27a2 Enable typechecking for _inductor/fx_passes/joint_graph.py (#109955)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109955
Approved by: https://github.com/Skylion007
ghstack dependencies: #109951, #109952, #109954
2023-09-26 02:49:43 +00:00
Jez Ng
86762f33d1 Enable typechecking for _inductor/fx_passes/pad_mm.py (#109954)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109954
Approved by: https://github.com/Skylion007
ghstack dependencies: #109951, #109952
2023-09-26 02:49:43 +00:00
Jez Ng
55f8553078 Enable typechecking for _inductor/fx_passes/pre_grad.py (#109952)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109952
Approved by: https://github.com/Skylion007
ghstack dependencies: #109951
2023-09-26 02:49:42 +00:00
Jez Ng
89fc66fb36 Enable typechecking for _inductor/fx_passes/split_cat.py (#109951)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109951
Approved by: https://github.com/Skylion007
2023-09-26 02:49:40 +00:00
rzou
f8fcc54f70 Add torch.library.impl_abstract (#109912)
Changelog:
- torch.library.impl_abstract optionally accepts a torch.library.Library
  object. If passed in, then the lifetime of the registration is tied to
  the Library object.
- we've also changed torch.library.impl_abstract to work on all
  operators, including overloads.
- we refactored the `torch._custom_ops.*` and `torch._custom_op.*`
  impl_abstract APIs and put them under torch._library. This is their
  final resting place. I will follow up by deleting
  all the `torch._custom_ops.*` stuff later.
- There is a new "SimpleOperatorRegistry" where we actually collect the
  abstract_impl. We will expand this to also hold the other
  torch._custom_ops.* APIs when we move those to torch.library

NB: We previously designed
`impl_abstract` assuming a very high-level Python-only custom op API.
We've since revisited that; now, impl_abstract works for all custom ops,
whether Python or C++, whatever the schema. The new refactored design
reflects this better.
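
A usage sketch of the Library-scoped registration (the namespace and op here are made up for illustration):

```python
import torch

lib = torch.library.Library("mylib", "DEF")
lib.define("bar(Tensor x) -> Tensor")

# Passing lib= ties the abstract impl's lifetime to the Library object.
@torch.library.impl_abstract("mylib::bar", lib=lib)
def bar_abstract(x):
    # Abstract (meta) impl: describe output metadata without real compute.
    return torch.empty_like(x)
```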

Test Plan:
- existing and new tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109912
Approved by: https://github.com/ezyang
2023-09-26 01:59:50 +00:00
Animesh Jain
b481349d3c [dynamo][guards-log] Do not print duplicate guard entries (#110023)
Cleans up logs for nn module guards, which always get duplicated.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110023
Approved by: https://github.com/ezyang
2023-09-26 01:59:25 +00:00
Yuqing Jiang
56659844f9 [profiler] Show shapes for lists of tensors in chrome traces #109263 (#109751)
Summary:
https://github.com/pytorch/pytorch/issues/109263
Show the shapes of a tensor list when its length is < 30.
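
A usage sketch: with `record_shapes=True`, ops that take tensor lists (e.g. `torch.stack`) report per-element shapes in the exported trace.

```python
import torch
from torch.profiler import ProfilerActivity, profile

with profile(activities=[ProfilerActivity.CPU], record_shapes=True) as prof:
    torch.stack([torch.randn(2, 3) for _ in range(4)])

# The stack op's recorded input dims now list each tensor's shape.
prof.export_chrome_trace("trace.json")
```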

Test Plan:
{F1097707985}
and unit tests

Reviewed By: davidberard98

Differential Revision: D49351902

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109751
Approved by: https://github.com/davidberard98
2023-09-26 01:03:54 +00:00
Bin Bao
4bf1cd6961 [aotinductor] Rename aot_runtime to aoti_runtime (#110007)
Summary: Make the naming more explicit

Differential Revision: D49593528

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110007
Approved by: https://github.com/houseroad
2023-09-26 00:46:54 +00:00
Yanbo Liang
a81cb0de16 [Dynamo] Support python class member_descriptor (#109956)
Fixes Meta-internal cases.
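
For context, `member_descriptor` is the CPython type behind `__slots__` attributes; a minimal illustration:

```python
class Point:
    __slots__ = ("x", "y")

print(type(Point.x).__name__)  # member_descriptor
```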

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109956
Approved by: https://github.com/jansel
2023-09-26 00:03:41 +00:00
Edward Z. Yang
5f6216b12c Add torch.fx.experimental.recording to uninteresting_files() (#109887)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109887
Approved by: https://github.com/Chillee
2023-09-25 23:22:29 +00:00
Mu-Chu Lee
7af30ea54c [AOTInductor] Bug fix for redefining symbol name (#110041)
Summary:
Bug fix for redefining symbol name.

Test Plan:
python benchmarks/dynamo/huggingface.py --bfloat16 --accuracy --inference --device cuda --export-aot-inductor --cold-start-latency --only OPTForCausalLM

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110041
Approved by: https://github.com/desertfire
2023-09-25 23:03:06 +00:00
Andrei Gheorghe
6275f91654 Improved DDP checkpoint documentation (#106985)
Amended the documentation for the specified case.

Fixes #84589

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106985
Approved by: https://github.com/wanchaol, https://github.com/fduwjj
2023-09-25 22:54:24 +00:00
Sam Larsen
7ed06e8317 [inductor] enable mypy checking in torch/_inductor/codegen/cpp.py (#109729)
Summary: Add enough typehints / ignores to enable mypy checking in torch/_inductor/codegen/cpp.py

Test Plan: lintrunner

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109729
Approved by: https://github.com/Skylion007
2023-09-25 22:53:05 +00:00
Pritam Damania
ab70183c53 [RFC] Allow "spawn" start method for torchinductor workers. (#108850)
Context: https://github.com/pytorch/pytorch/issues/108586

This PR adds a config to TorchInductor so users can specify the multiprocessing context for the TorchInductor workers in codecache.

This gives users the choice of "spawn" in multithreaded environments instead of the hardcoded "fork" default (see the usage sketch below).
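
A hypothetical usage sketch; the option name `worker_start_method` is an assumption, not confirmed by the summary:

```python
import torch._inductor.config as inductor_config

# Assumed config knob: pick the multiprocessing context used by the
# parallel-compile workers ("fork" remains the default).
inductor_config.worker_start_method = "spawn"
```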

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108850
Approved by: https://github.com/ezyang, https://github.com/zdevito
2023-09-25 21:30:17 +00:00
Yukio Siraichi
a4dec8d306 Add test for ShapeEnv recording fallback. (#109944)
This PR adds a test for the previous PR in this stack: #109904. In summary, it calls
functions decorated with `@record_shapeenv_event` that don't have an explicit `ShapeEnv`
parameter, with arguments that don't hold a `ShapeEnv` instance.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109944
Approved by: https://github.com/ezyang
ghstack dependencies: #109904
2023-09-25 20:59:41 +00:00
Mwiza Kunda
5c4b5baf21 Fix python decomps for OpOverloadPackets and add tests (#107707)
- Extend `test_torch_dispatch_meta_outplace` to test torch ops that do not have an out parameter but whose aten overloads do. Additionally, Python decompositions may register `OpOverloadPacket`s, so decompositions need to be tested to ensure all `OpOverload`s still function for the `Meta` key (e.g. if a Python decomposition is registered for an aten op `aten.foo` with overloads `[default, out]`, the Python function needs to support receiving out arguments).

- Add out-parameter wrappers to Python decomps for aten ops that have out overloads (see the sketch below).
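
A sketch of such an out-parameter wrapper (hypothetical helper, not the PR's exact code):

```python
import torch

def with_out_support(decomp):
    # Wrap a functional Python decomposition so the .out overload also
    # works: compute the result, then copy it into `out` when provided.
    def wrapper(*args, out=None, **kwargs):
        result = decomp(*args, **kwargs)
        if out is None:
            return result
        return out.copy_(result)
    return wrapper

x, y, buf = torch.ones(2), torch.ones(2), torch.empty(2)
with_out_support(torch.add)(x, y, out=buf)  # buf now holds x + y
```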

CC. @ezyang @albanD @lezcano

Fixes #107713

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107707
Approved by: https://github.com/lezcano
2023-09-25 20:53:30 +00:00
PyTorch MergeBot
c1a2f35805 Revert "Disallow skipping dynamo (#109476)"
This reverts commit 7bb1d10c2f.

Reverted https://github.com/pytorch/pytorch/pull/109476 on behalf of https://github.com/atalman due to Failing internal CI ([comment](https://github.com/pytorch/pytorch/pull/109476#issuecomment-1734402581))
2023-09-25 20:20:50 +00:00
fwenguang
c4f2b6dbd2 [profiler] use PyCFunction_Check to check both PyCMethod_Type and PyC… (#110002)
At https://github.com/pytorch/pytorch/blob/main/torch/csrc/autograd/profiler_python.cpp#L1096, when `what` is `PyTrace_C_CALL`, `Py_TYPE(arg)` can only be `PyCFunction_Type` before Python 3.9. But in Python 3.9 or later, `Py_TYPE(arg)` can also be `PyCMethod_Type`.
`PyCMethod_Type` is a subtype of `PyCFunction_Type`; ref:
f2eaa92b0c/Objects/methodobject.c (L372).
So we should use `PyCFunction_Check` to check `arg->ob_type`.

Fixes #109877

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110002
Approved by: https://github.com/ezyang
2023-09-25 20:17:25 +00:00
PyTorch MergeBot
83deaa16ed Revert "[1/N] Cleanup header inclusions in torch_cpu by iwyu (#101178)"
This reverts commit b7a95f4fdb.

Reverted https://github.com/pytorch/pytorch/pull/101178 on behalf of https://github.com/atalman due to Break internal CI ([comment](https://github.com/pytorch/pytorch/pull/101178#issuecomment-1734384645))
2023-09-25 20:05:25 +00:00