Commit Graph

12789 Commits

Author SHA1 Message Date
Yu Guo
2bf3ca1be7 [torchdynamo] preserve deterministic_algorithms_warn_only in convert_context (#110457)
Summary: preserve deterministic_algorithms_warn_only in the dynamo context

Test Plan: modified unit tests to test warn_only

Differential Revision: D49872622

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110457
Approved by: https://github.com/jansel
2023-10-04 07:12:32 +00:00
Xiaodong Wang
562c68e56f [nccl] denoise warning msg (#110433)
Summary: This is too noisy for anything set with TORCH_NCCL_USE_COMM_NONBLOCKING. Just warn once.
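The "warn once" behavior can be sketched with a hypothetical standalone example (the function names and message text are illustrative, not the actual TORCH_WARN_ONCE implementation):

```cpp
#include <iostream>
#include <mutex>

// Tracks how many times the warning body actually ran (for illustration only).
inline int& nonblockingWarnCount() {
    static int count = 0;
    return count;
}

// Emits the warning on the first call only; later calls are no-ops.
inline void warnNonblockingOnce() {
    static std::once_flag flag;
    std::call_once(flag, [] {
        ++nonblockingWarnCount();  // stands in for the actual log write
        std::cerr << "warning: TORCH_NCCL_USE_COMM_NONBLOCKING is set\n";
    });
}
```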

Test Plan: GH CI

Differential Revision: D49846339

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110433
Approved by: https://github.com/awgu
2023-10-04 06:21:53 +00:00
zdevito
3fe3439242 Use LLVMSymbolizer directly for unwind inside fbcode (#108800)
Using LLVMSymbolizer directly avoids having to call fork which has caused timeouts in some circumstances.

Differential Revision: [D49070589](https://our.internmc.facebook.com/intern/diff/D49070589/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108800
Approved by: https://github.com/aaronenyeshi
2023-10-04 04:04:08 +00:00
Howard Huang
efb73fe8e4 Fix send()/recv() to adhere to timeout (#109611)
Summary: Point-to-point ops don't enqueue their work to `workMetaList_`, which means that the NCCL watchdog does not watch over them; hence they do not respect the collective timeouts.

Test Plan:
While trying to add a test, I found we don't have tests which validate the NCCL watchdog. It looks like this is because we don't have a good way to detect when the NCCL watchdog has thrown an error (the exception is thrown in a side thread) in our testing framework / `MultiprocessTestCase`.

I manually tested this change with the script in https://github.com/pytorch/pytorch/issues/109401, but I need to look more closely at how to automate a test for the NCCL watchdog.

Differential Revision: D49418976

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109611
Approved by: https://github.com/wconstab
2023-10-03 23:27:45 +00:00
Xiaodong Wang
a0bffe7ed7 [S366352] Print nccl version during initialization (#110305)
Summary: print nccl version during initialization

Differential Revision: D49603220

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110305
Approved by: https://github.com/Skylion007, https://github.com/fegin, https://github.com/rohan-varma
2023-10-03 23:09:48 +00:00
cyy
c31fcdaa4f [3/N] Add -Wdeprecated and related fixes (#109698)
This PR follows #108626. Hopefully we can enable the warning in the next PR.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109698
Approved by: https://github.com/Skylion007, https://github.com/ezyang
2023-10-03 22:50:53 +00:00
Mu-Chu Lee
836ba6430a [AOTInductor] Initial functionality for Inf and NaN checker (#109526)
Summary:
Add initial functionality for Inf and NaN checker for AOTInductor.

Test Plan:
Included in commit. Skipped for CI as SIGABRT can't be captured by pytest.

Differential Revision: [D49379751](https://our.internmc.facebook.com/intern/diff/D49379751)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109526
Approved by: https://github.com/chenyang78
2023-10-03 22:39:42 +00:00
Octavian Guzu
b5c3a17c2c [fuzzing result][fuzz_torch_jit_lite_interpreter] read-heap-buffer-overflow-far-from-bounds (size 4) in c10::IValue::IValue() (#110441)
Summary: This diff fixes a heap underflow found by fuzzing in torch/csrc/jit/runtime/vararg_functions.cpp

Test Plan:
CI and
```
arc lionhead crash reproduce 1753074381791061
```
doesn't crash anymore.

Differential Revision: D49537535

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110441
Approved by: https://github.com/Skylion007
2023-10-03 18:48:12 +00:00
Yang Chen
da63c7f2c3 [AOTInductor] remove CUDA dependency for cpp backend (#110409)
Summary:
Previously, we linked against CUDA libs even for the pure cpp backend.
This caused issues for cases where the inference platform does not
have GPUs. This diff removes the CUDA dependency for the cpp backend.

Reviewed By: bertmaher, muchulee8, mikekgfb

Differential Revision: D49800712

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110409
Approved by: https://github.com/bertmaher, https://github.com/desertfire
2023-10-03 18:36:00 +00:00
Bert Maher
aecfe5d168 [aoti] Remove pessimizing move (#110446)
"`std::move` of a temporary prevents copy elision" says the compiler,
and I am pretty sure it is right.  Since AtenTensorHandle* implicitly converts
to RAIIAtenTensorHandle, I simply called emplace_back; happy to put an explicit
ctor if that makes folks happier.
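For context, a minimal example of the warning the compiler refers to (hypothetical code, not the actual aoti source):

```cpp
#include <string>
#include <utility>

// std::move on a returned prvalue is "pessimizing": it blocks copy elision
// and typically draws -Wpessimizing-move from gcc/clang.
std::string makePessimized() {
    return std::move(std::string("tensor"));
}

// Returning the prvalue directly gets guaranteed copy elision (C++17).
std::string makeElided() {
    return std::string("tensor");
}
```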

Differential Revision: [D49842542](https://our.internmc.facebook.com/intern/diff/D49842542/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110446
Approved by: https://github.com/desertfire, https://github.com/Skylion007
ghstack dependencies: #110445
2023-10-03 17:44:58 +00:00
Bert Maher
174e46b853 [inductor][easy] Free functions in headers should be declared inline (#110445)
If multiple files include model.h, you end up with duplicate symbol errors.
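The rule in play can be sketched with a hypothetical helper (not an actual model.h function): a free function *defined* in a header must be declared `inline`, or every translation unit that includes the header emits its own external-linkage definition and the linker reports duplicate symbols.

```cpp
#include <cstddef>

// Safe to define in a header: inline gives it one program-wide definition
// even when the header is included by many translation units.
inline std::size_t align_up(std::size_t n, std::size_t alignment) {
    // round n up to the next multiple of alignment (a power of two)
    return (n + alignment - 1) & ~(alignment - 1);
}
```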

Differential Revision: [D49842167](https://our.internmc.facebook.com/intern/diff/D49842167/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110445
Approved by: https://github.com/desertfire, https://github.com/Skylion007
2023-10-03 17:44:49 +00:00
David Berard
4069d1de59 [distributed] Remove recordStream for callback that ends a profiler event (#109933)
**Background**: recordStreams can result in memory spikes, so we don't want them to appear in FSDP (https://dev-discuss.pytorch.org/t/fsdp-cudacachingallocator-an-outsider-newb-perspective/1486). @awgu is working on fixing this, but it turns out the profiler was causing recordStream to get called when it is enabled.

Why profiler was causing recordStream to get called: NCCL calls add profiler events manually; they register a callback to be executed when the future for the collective is completed; this indicates the end of the CPU-side profiler event for the callback:

c2c7c4035f/torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp (L1822-L1824)

In order to guarantee safety, ivalue::Future::invokeCallback calls `recordStream` on the future's storage buffers; this marks the fact that other streams (e.g. the one that the callback runs on) may need to use the storage.

c2c7c4035f/aten/src/ATen/core/ivalue_inl.h (L1171-L1173)

**Change**: The end-profiler-event callback doesn't actually use the future, so we don't need to call recordStream on it. This PR introduces an optional parameter `uses_future` for adding callbacks; a user can set this parameter to `false` to (unsafely) skip the recordStream call, if the user knows that the future will not be used in the lambda.
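The opt-out can be sketched with a toy future (names and structure are illustrative; this is not the actual ivalue::Future API):

```cpp
#include <functional>
#include <utility>
#include <vector>

struct ToyFuture {
    int record_stream_calls = 0;  // stands in for recordStream bookkeeping
    std::vector<std::function<void()>> callbacks;

    // uses_future=false lets a caller that never touches the future's value
    // skip the cross-stream safety bookkeeping.
    void addCallback(std::function<void()> cb, bool uses_future = true) {
        if (uses_future) {
            ++record_stream_calls;
        }
        callbacks.push_back(std::move(cb));
    }

    void complete() {
        for (auto& cb : callbacks) cb();
    }
};
```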

**Tests**: (a) unit tests; (b) added an assert in recordStream: c2c7c4035f/c10/cuda/CUDACachingAllocator.cpp (L3260) and verified that it doesn't get triggered when running basic distributed tests w/ profiler enabled
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109933
Approved by: https://github.com/wconstab
2023-10-03 14:40:43 +00:00
cyy
d58a91b2a6 [4/N] Move remaining c10::variant calls to std::variant (#110382)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110382
Approved by: https://github.com/Skylion007
2023-10-02 23:52:04 +00:00
sunghyunjun
b5268456f9 Fix optimize_for_inference to support modules that don't have a forward method (#110013)
Fixes #108662

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110013
Approved by: https://github.com/davidberard98
2023-10-02 20:13:44 +00:00
RihamSelim
92242f599a [PyTorch] Add Expanded call stack to nodes [Take 2] (#110229)
Summary:
Adding back D46578700 / PR https://github.com/pytorch/pytorch/pull/108426

Note: The changes were originally reverted due to a memory regression; these changes put the code behind a gflag so it is only used by binaries that require the expanded stack for BPF profiling.

Original Diff comment:
To get a Node's call stack we currently loop on the InlinedCallStack graph and follow the "callee" chain. Since the node's inlined stack does not change, we can optimize this by expanding the node's inlined stack once and reusing it. This is particularly useful when reading the node's stack from another process (e.g. BPF) as it simplifies the memory traversal process.
The new data structure (NodeSourceInfo) only holds pointers to the function name and file name variables, and assumes these objects will be alive throughout the lifetime of the process.
Each Node has an extended attribute that holds an index into a vector of stack frames, `expanded_node_stacks_`.
`node_stack_attr_symbol_` is only needed to make accessing the stack vector index attribute easier from BPF.

Test Plan:
- Verified using BPF Program in subsequent diffs
- Perf testing for loading large model: P822455246

Differential Revision: D49565461

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110229
Approved by: https://github.com/zdevito
2023-10-02 19:52:41 +00:00
Li-Huai (Allan) Lin
a3c1e3c95c Generalize toAccumulateType() (#108248)
Trying to address this comment: https://github.com/pytorch/pytorch/pull/106666#discussion_r1297397554

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108248
Approved by: https://github.com/kulinseth, https://github.com/albanD
2023-10-02 16:34:36 +00:00
cyy
d0ad848aa5 Enable misc clang-tidy checks (#110283)
This PR enables the misc-XX checks in clang-tidy. Meanwhile, I excluded some of them that require a lot of code changes and have no immediate benefit. Some additional fixes and suppressions were also added.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110283
Approved by: https://github.com/albanD
2023-09-30 10:39:52 +00:00
Nikita Shulga
ad8aef0f98 [BE] [3/N] Use nested namespaces (#110314)
Mostly in torch/csrc/jit/runtime and in `ATen/cuda/`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110314
Approved by: https://github.com/seemethere
2023-09-30 02:23:48 +00:00
davidgens-cerebras
ee0bff209c [LTC] correct AdaptiveAvgPool3d channel dim index for shape inference (#109822)
Fixes #109821

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109822
Approved by: https://github.com/mikaylagawarecki, https://github.com/alanwaketan
2023-09-29 22:54:12 +00:00
PyTorch MergeBot
b083058e45 Revert "Make unbind() overrideable for NT subclass (#109122)"
This reverts commit f5a23ca78d.

Reverted https://github.com/pytorch/pytorch/pull/109122 on behalf of https://github.com/PaliC due to breaking slow tests ([comment](https://github.com/pytorch/pytorch/pull/109122#issuecomment-1741555305))
2023-09-29 22:41:56 +00:00
Octavian Guzu
9c7071b0e3 [fuzzing result][fuzz_torch_jit_lite_interpreter] read-heap-use-after-free (size 8) in std::_Function_base::_M_empty() (#110289)
Summary: This diff fixes a heap UAF found by fuzzing in torch/csrc/jit/mobile/interpreter.cpp

Test Plan:
CI and
```
arc lionhead crash reproduce 1009060456885023
```
doesn't crash anymore.

Reviewed By: malfet

Differential Revision: D49538326

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110289
Approved by: https://github.com/malfet
2023-09-29 22:32:38 +00:00
Mu-Chu Lee
d6d3f6cfe5 Add weight update for DSOModel. (#110273)
Summary: Add weight update for DSOModel and AOTInductorModel

Test Plan: buck2 test accelerators/workloads/models/slimdsnn:slimdsnn_dso_test - SlimDSNN.DSO_Update_Constants

Reviewed By: mikekgfb

Differential Revision: D49748685

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110273
Approved by: https://github.com/hl475
2023-09-29 18:14:01 +00:00
jjsjann123
e6b5e0ecc6 removing the functionality of nvfuser python APIs (#110124)
Removing the functionalities from the nvfuser python APIs.

Since the use of nvfuser was deprecated before the last release cut, we are removing TorchScript support.

The next PR will actually remove the code base.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110124
Approved by: https://github.com/davidberard98
2023-09-29 01:45:00 +00:00
skc7
bbb95878e9 [LLVM] Update apis incompatible with llvm versions in codegen (#110200)
Opaque pointer support is disabled in LLVM 14 and enabled by default from LLVM 15 onward.
setOpaquePointers API usage is deprecated from LLVM 16, so this API was removed.

Also updated the CreateMalloc and CreateFree APIs for the latest LLVM release.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110200
Approved by: https://github.com/Skylion007
2023-09-28 21:49:30 +00:00
cyy
168f516fae [3/N] Move c10::variant to std::variant (#110141)
This PR moves more c10::variant calls to std::variant

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110141
Approved by: https://github.com/Skylion007
2023-09-28 18:43:55 +00:00
cyy
7f5fd92372 Reland use std::make_unique after internal changes (#109742)
Check internal.
Follow-up of #109780.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109742
Approved by: https://github.com/ezyang
2023-09-28 17:24:08 +00:00
Bert Maher
5f417fd710 [aot_inductor] Lightweight model runner (#110158)
It's useful to have a simple, lightweight way to run a model that adds
essentially no overhead to calling the model's generated `run_impl` method.
This C API is a super thin wrapper around AOTInductorModel: Create, Run, and
Delete are provided, and do very little work beyond dispatch to the appropriate
helpers.

Note the Create function also provides additional functionality beyond the
Container API; it allows the user to pass in a weight map defined in userland,
which is a requirement for several serving use cases.
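The shape of such a thin C-style API can be sketched as follows (the ToyModel* names are hypothetical, not the actual AOTInductor symbols):

```cpp
// An owning C++ object hidden behind an opaque handle.
struct ToyModel {
    int scale;
    int run_impl(int x) const { return scale * x; }  // the "generated" method
};

using ToyModelHandle = ToyModel*;

// Create/Run/Delete do essentially no work beyond dispatching to the object.
ToyModelHandle ToyModelCreate(int scale) { return new ToyModel{scale}; }
int ToyModelRun(ToyModelHandle h, int x) { return h->run_impl(x); }
void ToyModelDelete(ToyModelHandle h) { delete h; }
```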

Differential Revision: [D49670711](https://our.internmc.facebook.com/intern/diff/D49670711/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110158
Approved by: https://github.com/desertfire, https://github.com/chenyang78
2023-09-28 14:59:41 +00:00
cyy
a81d083b1c [Reland] Add -Wdeprecated and related fixes (#110019)
This is a reland of PRs https://github.com/pytorch/pytorch/pull/108626 and #109564. We fixed the iOS build failure by changing
```
((CHECK) ? (EXPR) : ([] { assert(!#CHECK); }(), (EXPR)))
```
to
```
((CHECK) ? (EXPR) : ([] { assert(false); }(), (EXPR)))
```
in TR2_OPTIONAL_ASSERTED_EXPRESSION, since the former syntax was invalid on Apple Clang. Anyway, we could apply the simple fix hoping that c10::optional would be replaced by std::optional soon.
We also enabled -Wdeprecated on c10.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110019
Approved by: https://github.com/clee2000
2023-09-28 03:34:29 +00:00
Joel Schlosser
f5a23ca78d Make unbind() overrideable for NT subclass (#109122)
Goal: avoid making unbind composite implicit so we can override it within `__torch_dispatch__()` for the NT subclass.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109122
Approved by: https://github.com/cpuhrsch, https://github.com/soulitzer
2023-09-28 01:26:22 +00:00
Sherlock Huang
ec5bbef8af [AOTInductor] Switch ProxyExecutor to use AtenTensorHandle (#109748)
Summary: Switch ProxyExecutor to use AtenTensorHandle.

Test Plan: E2E Test

Differential Revision: D49471659

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109748
Approved by: https://github.com/yifuwang, https://github.com/desertfire, https://github.com/chenyang78
2023-09-27 17:51:30 +00:00
Lei, Zhenyuan
633bd0765e Integrate xpu into torch.Generator and torch.seed (#109866)
Integrate torch.xpu.Generator into torch.Generator
Integrate torch.xpu.seed into torch.seed
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109866
Approved by: https://github.com/ezyang
2023-09-27 17:44:45 +00:00
Kaichao You
34ded74399 [Dynamo] fix signature in dynamo types (#110081)
The type signature is obsolete. This PR fixes the type signature and leaves comments in the C code.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110081
Approved by: https://github.com/jansel
2023-09-27 09:30:04 +00:00
Yang Chen
4d0ae7c9da [inductor] support _scaled_dot_product_flash_attention fallback (#110085)
Summary:
This PR supports _scaled_dot_product_flash_attention fallback kernel.
Note that in the abi_compatible mode, we retrieve outputs by passing
output argument pointers rather than relying on std::get.

It also fixes an issue related to dynamic shapes, where we wrongfully
query undefined dynamic symbols.

Test Plan: ci

Reviewed By: frank-wei

Differential Revision: D49620191

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110085
Approved by: https://github.com/desertfire
2023-09-27 00:09:56 +00:00
Shiyan Deng
19ca883f8b [pytorch][jit] allow passing in obj loader in unpickle api (#109730)
Summary: We are trying to use wire messages to pass Python objects like KJT. In order for JIT to be able to unpickle them, we need to provide a type resolver as well as an object loader. This diff modifies the interface to make that possible.

Test Plan:
Rely on current CI to make sure existing usage doesn't break.

In the next diff, test e2e

Differential Revision: D49438569

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109730
Approved by: https://github.com/davidberard98
2023-09-26 23:50:20 +00:00
Rodrigo Kumpera
317e39a8ad [C10d] Cleanup collective sequence number. (#109136)
Sequence numbers must be associated with a Work object
if we want to use them as a way to report collective progress.

The API surface change is introducing Work::getSequenceNumber, which
should eventually be exposed to Python.

The bulk of this change is changing gloo to make the sequence number
always be in use and weave it through the dozens of subclasses of Work.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109136
Approved by: https://github.com/fduwjj
2023-09-26 17:17:04 +00:00
Animesh Jain
0673aa3d28 [dynamo][guards-log] Print nn module guard saved dict versions for debugging (#110028)
This is the output for nn module guards

~~~
[DEBUG] GUARDS:
[DEBUG] hasattr(L['x'], '_dynamo_dynamic_indices') == False           # _dynamo/variables/builder.py:1356 in wrap_fx_proxy_cls
[DEBUG] ___check_obj_id(L['self'], 139820807110912)                   # for mod in self.mods:  # examples/graph_break.py:35 in forward
[DEBUG] __nn_module_guard_0(L['self']) # versions(mod=9998, _parameters=1194395, _buffers=1194397, _modules=1194423, _forward_hooks=1194405, _forward_pre_hooks=1194411, _backward_hooks=1194402, _backward_pre_hooks=1194400)  # for mod in self.mods:  # examples/graph_break.py:35 in forward
[DEBUG] ___check_obj_id(L['self'].mods[0], 139817945727568)           # for mod in self.mods:  # examples/graph_break.py:35 in forward
[DEBUG] __nn_module_guard_1(L['self'].mods[0]) # versions(mod=10001, _parameters=1194428, _buffers=1194430, _modules=1194522, _forward_hooks=1194438, _forward_pre_hooks=1194444, _backward_hooks=1194435, _backward_pre_hooks=1194433)  # for mod in self.mods:  # examples/graph_break.py:35 in forward
[DEBUG] ___check_obj_id(L['self'].mods[1], 139817945560640)           # for mod in self.mods:  # examples/graph_break.py:35 in forward
[DEBUG] __nn_module_guard_2(L['self'].mods[1]) # versions(mod=10001, _parameters=1194660, _buffers=1194662, _modules=1194753, _forward_hooks=1194670, _forward_pre_hooks=1194676, _backward_hooks=1194667, _backward_pre_hooks=1194665)  # for mod in self.mods:  # examples/graph_break.py:35 in forward
[DEBUG] ___check_obj_id(L['self'].mods[0].linear, 139817945727856)    # return self.linear(a)  # examples/graph_break.py:24 in helper
[DEBUG] __nn_module_guard_3(L['self'].mods[0].linear) # versions(mod=10004, _parameters=1470004, _buffers=1194467, _modules=1194493, _forward_hooks=1194475, _forward_pre_hooks=1194481, _backward_hooks=1194472, _backward_pre_hooks=1194470)  # return self.linear(a)  # examples/graph_break.py:24 in helper
[DEBUG] ___check_obj_id(L['self'].mods[1].linear, 139817945561120)    # return self.linear(a)  # examples/graph_break.py:24 in helper
[DEBUG] __nn_module_guard_4(L['self'].mods[1].linear) # versions(mod=10004, _parameters=1470008, _buffers=1194699, _modules=1194725, _forward_hooks=1194707, _forward_pre_hooks=1194713, _backward_hooks=1194704, _backward_pre_hooks=1194702)  # return self.linear(a)  # examples/graph_break.py:24 in helper
[DEBUG] utils_device.CURRENT_DEVICE == None                           # _dynamo/output_graph.py:373 in init_ambient_guards
~~~

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110028
Approved by: https://github.com/ezyang
ghstack dependencies: #110023, #110039
2023-09-26 08:53:07 +00:00
Yuqing Jiang
56659844f9 [profiler] Show shapes for lists of tensors in chrome traces #109263 (#109751)
Summary:
https://github.com/pytorch/pytorch/issues/109263
Show the shapes of a tensor list when its length is < 30.

Test Plan:
{F1097707985}
and unit tests

Reviewed By: davidberard98

Differential Revision: D49351902

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109751
Approved by: https://github.com/davidberard98
2023-09-26 01:03:54 +00:00
Bin Bao
4bf1cd6961 [aotinductor] Rename aot_runtime to aoti_runtime (#110007)
Summary: Make the naming more explicit

Differential Revision: D49593528

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110007
Approved by: https://github.com/houseroad
2023-09-26 00:46:54 +00:00
fwenguang
c4f2b6dbd2 [profiler] use PyCFunction_Check to check both PyCMethod_Type and PyC… (#110002)
At https://github.com/pytorch/pytorch/blob/main/torch/csrc/autograd/profiler_python.cpp#L1096, when `what` is PyTrace_C_CALL, Py_TYPE(arg) can only be PyCFunction_Type before Python 3.9. But in Python 3.9 or later, Py_TYPE(arg) can also be PyCMethod_Type.
PyCMethod_Type is a subtype of PyCFunction_Type; ref:
f2eaa92b0c/Objects/methodobject.c (L372).
So PyCFunction_Check should be used there to check arg->ob_type.

Fixes #109877

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110002
Approved by: https://github.com/ezyang
2023-09-25 20:17:25 +00:00
PyTorch MergeBot
83deaa16ed Revert "[1/N] Cleanup header inclusions in torch_cpu by iwyu (#101178)"
This reverts commit b7a95f4fdb.

Reverted https://github.com/pytorch/pytorch/pull/101178 on behalf of https://github.com/atalman due to Break internal CI ([comment](https://github.com/pytorch/pytorch/pull/101178#issuecomment-1734384645))
2023-09-25 20:05:25 +00:00
Moritz Hennen
09c598745c Rename torch._C._TensorBase to TensorBase (#109940)
I have gone ahead and implemented the renaming of the type `torch._C._TensorBase` to a non-private class name `TensorBase`.
The changes also include leaving `torch._C._TensorBase` as an alias to the new type: 70458768fb/torch/csrc/autograd/python_variable.cpp (L2196-L2197) both in the c++ code and in the corresponding `__init__.pyi.in` file:
70458768fb/torch/_C/__init__.pyi.in (L1522)

Fixes #109438

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109940
Approved by: https://github.com/ezyang
2023-09-25 19:10:22 +00:00
Randolf Scholz
837272f150 Python 3.10 Union operator | support for JIT (#109293)
Fixes #101777

- [x] Duplicated the tests from `test/jit/test_union.py` into [`test/jit/test_union_pep604.py`](https://github.com/pytorch/pytorch/pull/109293/files#diff-b981f6493093482b43b0e62057b0c01b004b3e932d4e63a1166c3808c0172b83), using PEP604 style Unions
- [x] Exchanged custom `get_args` and `get_origin`  with `typing.get_args` and `typing.get_origin` which have the same functionality and became part of the standard library in 3.8
- [x] Added utility function `pep604union_to_union` in `tree_views.h` which converts a `BinOP("|")` node into the corresponding `Union`. This function intercepts `ScriptTypeParser::parseTypeFromExpr` and `ScriptTypeParser::parseTypeFromExprImpl` and patches the expression.
- [ ] There is a single failing test; I commented it out for the moment to see if CI complains about anything else. I tried for several hours to figure out how to patch it, but I am not experienced with C++ development and debugging.

From what I could gather, the following fails:

```python
    def test_union_optional_of_union_return(self):
        @torch.jit.script
        def fn() -> None | str | int:
            y: Optional[int | str] = "foo"
            return y
```

In the section:

75b954b715/torch/csrc/jit/frontend/script_type_parser.cpp (L232-L243)

When using a regular `Union`, the `resolver` path is taken, whereas with the patched PEP 604 union, `resolveType` doesn't work.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109293
Approved by: https://github.com/ezyang
2023-09-25 15:35:54 +00:00
Pritam Damania
5565a29568 Release GIL in torch.cuda ops wherever possible. (#109159)
Most `torch.cuda` ops (ex: `torch.cuda.synchronize`) do not release the GIL in C++ land. This has the potential of causing deadlocks and freezing the Python process. For example, `torch.cuda.synchronize` could hold the GIL and get blocked on some operation. However, that operation might never complete in Python land since the GIL is held by `torch.cuda.synchronize`.

In this PR, I've tried to release the GIL as much as possible in `torch.cuda` ops.

See https://github.com/pytorch/pytorch/issues/109074 for an example of how holding GIL causes a deadlock.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109159
Approved by: https://github.com/ezyang
2023-09-25 14:35:31 +00:00
cyy
b7a95f4fdb [1/N] Cleanup header inclusions in torch_cpu by iwyu (#101178)
Following our previous IWYU work #100304 on C10, it makes more sense to try IWYU on torch_cpu. This PR does exactly that. Meanwhile, it fixes issue #48684.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101178
Approved by: https://github.com/ezyang
2023-09-24 05:01:20 +00:00
cyy
dee100945e [2/N] Move c10::variant to std::variant (#109723)
This PR moves most of c10::variant calls to std::variant.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109723
Approved by: https://github.com/ezyang
2023-09-24 02:47:43 +00:00
Oleg Khabinov
54faedf5f2 [AOTInductor] Load model on arbitrary device (#109816)
Reviewed By: desertfire

Differential Revision: D49402404

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109816
Approved by: https://github.com/chenyang78
2023-09-23 04:45:20 +00:00
Modi Mo
f0d71de4ac Update caffe2 with LLVM-18 API change (#109408)
Summary: https://github.com/llvm/llvm-project/pull/66295 modified some internal LLVM APIs; update these call sites with the changes under an LLVM version guard.

Test Plan: CI

Differential Revision: D49340871

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109408
Approved by: https://github.com/Skylion007
2023-09-22 21:40:58 +00:00
Rodrigo Kumpera
c26270c733 [C10D] Even more store scalability work. (#109218)
Fix a bug socket.cpp in timeout detection that only shows up with 10k ranks.

Make the minimum wait time in _store_based_barrier to be adaptative based on
the number of ranks.

Longer timeouts give more room for the store to do productive work when swamped.
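The adaptive minimum wait might be sketched like this (the scaling constants are invented for illustration and are not the values used in _store_based_barrier):

```cpp
#include <algorithm>
#include <chrono>

// Poll interval grows with world size so that, at very large scale,
// clients hammer the store less often and it can do productive work.
std::chrono::milliseconds barrierPollInterval(int world_size) {
    long ms = std::clamp<long>(world_size / 10, 10, 1000);
    return std::chrono::milliseconds(ms);
}
```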
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109218
Approved by: https://github.com/XilunWu
ghstack dependencies: #109217
2023-09-22 21:27:09 +00:00
Rodrigo Kumpera
92de1d3222 [C10D] Push store scalability a bit further. (#109217)
This is a bunch of small changes to improve store scalability:

- stagger client connection to avoid a stampede.
- warn if somaxconn is too small.
- increase the backlog to 16k.
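The staggering idea can be sketched as a rank-dependent delay before dialing the store (the 1 ms-per-rank slope and 100 ms cap are illustrative, not the actual constants):

```cpp
#include <algorithm>
#include <chrono>

// Each rank waits rank milliseconds (capped) before connecting, so a 10k-rank
// job spreads its connection attempts instead of stampeding the backlog.
std::chrono::milliseconds connectDelay(int rank) {
    return std::chrono::milliseconds(std::min(rank, 100));
}
```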

Differential Revision: [D49238587](https://our.internmc.facebook.com/intern/diff/D49238587)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109217
Approved by: https://github.com/XilunWu
2023-09-22 17:23:46 +00:00
Andrew Calvano
2512017814 Fix for out of bounds read in torch mobile flatbuffer loader (#108439)
Remove the redundant (and unsafe) `mobile::serialization::ModuleBufferHasIdentifier(data)`, as `mobile::serialization::VerifyModuleBuffer(verifier)` validates the same thing but in a bounds-check-safe manner.

Test Plan: Out of bounds read crash no longer reproduces

Differential Revision: D48914114

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108439
Approved by: https://github.com/manuelcandales, https://github.com/malfet
2023-09-22 14:26:33 +00:00
Nikita Shulga
f092eecc92 Handle C++ exceptions raised during finfo/iinfo calls (#109743)
Partially fixes https://github.com/pytorch/pytorch/issues/109737
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109743
Approved by: https://github.com/albanD
ghstack dependencies: #109744
2023-09-22 14:17:58 +00:00
Bin Bao
d7dfa91e12 [inductor] Refactor some libtorch c shim interfaces (#109834)
Summary: Change the returned values to be at the back of the parameter list, because 1) it is more consistent with the AOTInductor runtime API convention; 2) since the out-variant ops have the out tensor at the beginning of the parameters, this makes the return values more distinguishable from the out tensors.

Test Plan:
```
buck test mode/opt caffe2/torch/fb/model_transform/experimental/benchmark/test/aotinductor:test_aot_inductor_benchmark
```

Differential Revision: D49522928

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109834
Approved by: https://github.com/chenyang78
2023-09-22 12:45:23 +00:00
Brian Hirsh
63526a63f5 Make FunctionalTensor subclass to be more like functorch (interaction with ZeroTensor + Conjugate key) (#109023)
I added some tests for Conj, Neg and ZeroTensor for both python and C++ functionalization. This also fixes a nasty segfult when running a functorch `jacfwd` test with `torch.compile`, once AOTAutograd is using `FunctionalTensor`.

Changes:

(1) I use Jeffrey's `make_wrapper_subclass(extra_dispatch_keys)` kwarg to plumb extra dispatch keys onto the wrapper, mirroring what C++ functionalization does (C++ functionalization will mirror all dispatch keys from the inner tensor to the wrapper, except for the python and functorch keys).

(2) FunctionalTensorMode will decompose CompositeImplicitAutograd ops, since (for example) ZeroTensor kernels can send ops like `.to()` directly to the Python key. We'll need a way to toggle this later for pre-dispatch functionalization

(3) Bound `_ForceDispatchKeyGuard` and BatchedTensorImpl's dispatch keyset to python

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109023
Approved by: https://github.com/zou3519
ghstack dependencies: #108654, #109662, #109632
2023-09-22 07:09:04 +00:00
Brian Hirsh
dae9aa8925 fix subclass custom sizes dynamic shapes caching (#108654)
This PR fixes the ownership/lifetime handling for tensor subclasses that override sizes/strides, when tensors get resized.

This is needed now, because `FunctionalTensor` is a subclass that has a custom size/stride (so it can plumb requests to its inner tensor), and is also a core piece of infra (it's used during tracing in AOTAutograd, which means that metadata mutation and resizing that happens to work with torch.compile today needs to work with FunctionalTensor).

After a bunch of discussion with @ezyang and @soulitzer, I updated `PyInterpreter::sym_sizes()` (and friends) so that:
(1) They allocate a py::capsule buffer and stash it on the tensor on the first call to size/stride
(2) On a size/stride call where we notice that the number of **dimensions** on the tensor has changed (so our buffer is stale), we re-allocate the buffer
(3) On a size/stride call where we notice that the number of dimensions is the same, but the values are different (this happens whenever a tensor experiences a metadata mutation, like `.transpose_()`), we inplace-modify the buffer and put the new ints/symints into it
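The buffer-reuse policy in steps (1)-(3) can be sketched independently of the Python bindings (a hypothetical cache, not the actual PyInterpreter code):

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

struct SizeCache {
    std::vector<int64_t> buf;
    int reallocations = 0;

    // Reallocate only when the rank (number of dims) changed; otherwise
    // write the new values into the existing buffer in place.
    const std::vector<int64_t>& refresh(const std::vector<int64_t>& sizes) {
        if (buf.size() != sizes.size()) {
            buf.assign(sizes.begin(), sizes.end());
            ++reallocations;
        } else {
            std::copy(sizes.begin(), sizes.end(), buf.begin());
        }
        return buf;
    }
};
```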

I also ended up doing the SmallVector optimization, which was required to fix some tests in AOTAutograd. Ideally we should look into those tests, and nail down the parts of our codebase that rely on SmallVector not re-allocating on a resize... but I'm saving this for a followup.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108654
Approved by: https://github.com/ezyang
2023-09-22 07:09:04 +00:00
cyy
cd99cdc3af fix std::move warnings from gcc (#105780)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105780
Approved by: https://github.com/Skylion007
2023-09-22 05:55:21 +00:00
rzou
8124a6c40c [TORCH_LIBRARY] Add impl_abstract_pystub (#109529)
We want users to be able to define custom ops in C++ but put the
abstract impl in Python (since it is easier to write them in Python and
the abstract impl better models device semantics and data-dependent
operators).

`m.impl_abstract_pystub(opname, python_module, context)` declares the
abstract_impl of the operator to exist in the given python module.
When the abstract_impl needs to be accessed (either via FakeTensor or
Meta), and it does not exist, the PyTorch Dispatcher will yell
with a descriptive error message.

Some details:
- We construct a new global AbstractImplPyStub mapping in
  Dispatcher.cpp. Read/write to this map is protected by the Dispatcher
  lock.
- We add a new Meta Tensor fallback kernel. The fallback errors out if there is
  no meta kernel, but also offers a nicer error message if we see that there is
  a pystub.
- We create a `torch._utils_internal.throw_abstract_impl_not_imported_error`
  helper function to throw errors. This way, we can throw different error
  messages in OSS PyTorch vs internal PyTorch. To invoke this from C++, we
  added a PyInterpreter::throw_abstract_impl_not_imported_error.
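A simplified sketch of the bookkeeping the details above describe, in plain Python (names like `impl_abstract_pystub` follow the commit, but the map and fallback here are illustrative, not the real dispatcher code):

```python
# Global map from operator name to the Python module declared to hold its
# abstract impl (the real map lives in Dispatcher.cpp behind the lock).
ABSTRACT_IMPL_PYSTUBS = {}  # opname -> (python_module, context)

def impl_abstract_pystub(opname, python_module, context=""):
    ABSTRACT_IMPL_PYSTUBS[opname] = (python_module, context)

def run_meta_fallback(opname, meta_kernels):
    """Simulates the Meta fallback: run the kernel if one exists, otherwise
    raise a descriptive error, nicer when a pystub was declared."""
    if opname in meta_kernels:
        return meta_kernels[opname]()
    stub = ABSTRACT_IMPL_PYSTUBS.get(opname)
    if stub is not None:
        module, context = stub
        raise NotImplementedError(
            f"{opname}: abstract impl is declared in '{module}' but that "
            f"module was not imported. {context}"
        )
    raise NotImplementedError(f"{opname}: no meta kernel registered")
```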

Differential Revision: [D49464753](https://our.internmc.facebook.com/intern/diff/D49464753/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109529
Approved by: https://github.com/ezyang, https://github.com/bdhirsh
2023-09-22 04:55:36 +00:00
Bin Bao
8856c1628e [inductor] Change AOTInductor to return output tensors (#109790)
Summary:
Change AOTInductor to directly return output tensors instead of taking pre-allocated output tensors to return the results. This gives several benefits:

* It makes sure AOTInductor has the same behavior when managing the output tensors as the default Inductor, which is widely tested and thus more reliable.
* As we have debugged before, there are cases where we still have to codegen extra copy_ ops to fill the pre-allocated output tensors, which doesn't make sense for performance.
* With the coming enhanced memory planning, this again will make sure the memory planning logic is the same between AOTInductor and Inductor, which will greatly simplify the problem and improve the reliability.

This change also combines D49494954 from Yang and https://github.com/pytorch/pytorch/pull/109560 from Angela.

Differential Revision: D49502318

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109790
Approved by: https://github.com/chenyang78
2023-09-22 02:31:52 +00:00
Gustav Larsson
8dcdc74915 torch->onnx export support: quantized::linear_relu (#109755)
- Adds support for quantized::linear_relu
  - Adds weight unpacking pattern matcher
  - Adds to export for opset 10 and 13.
- Adds QAT test modeled after conv2d+relu fusion test

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109755
Approved by: https://github.com/BowenBao, https://github.com/thiagocrepaldi
2023-09-21 23:24:20 +00:00
Daniil Kutz
175ccfc4c8 Verify flatbuffer module fields are initialized (#109794)
Fixes #109793

Add validation on flatbuffer module fields to prevent segfault

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109794
Approved by: https://github.com/malfet
2023-09-21 23:19:17 +00:00
PyTorch MergeBot
b5fde4c382 Revert "[Reland] Remove calls of c10::either (#109708)"
This reverts commit 0735f6c0d5.

Reverted https://github.com/pytorch/pytorch/pull/109708 on behalf of https://github.com/atalman due to Broke windows periodic tests ([comment](https://github.com/pytorch/pytorch/pull/109708#issuecomment-1730356321))
2023-09-21 22:04:25 +00:00
cyy
e9e93c5350 [Reland] Move torch::make_unique to std::make_unique (#109780)
We can first try to move torch::make_unique to std::make_unique despite reverting of #108866 .

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109780
Approved by: https://github.com/ezyang
2023-09-21 18:30:21 +00:00
Edward Z. Yang
09622d8d49 Allow inferring size-nature from sizes passed to empty constructor (#109720)
This removes the need for many constrain_as_size calls as we now
infer them from error checking for sizes.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109720
Approved by: https://github.com/aakhundov
2023-09-21 17:57:40 +00:00
Edward Z. Yang
0351e2042b Avoid throwing exception in ClosingTHPObjectPtr (#109758)
Previously, if ClosingTHPObjectPtr was destructed because we
were unwinding the stack from an exception, we would attempt to call
close() which just isn't going to work.  Two fixes:

1. Detect if we're unwinding due to a Python error, and don't try
   to do more Python stuff if so.

2. If close() fails somehow, write an unraisable exception, don't
   try to throw because that will terminate if you're in an
   exception.
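A loose Python analog of the two fixes (the real change is in C++, where the destructor checks for an active Python error and reports failures as unraisable instead of throwing): a guard that skips its normal close() work while an exception is propagating, and never lets a close() failure escape.

```python
import traceback

class ClosingGuard:
    """Illustrative context-manager analog of ClosingTHPObjectPtr."""

    def __init__(self, resource):
        self.resource = resource

    def __enter__(self):
        return self.resource

    def __exit__(self, exc_type, exc, tb):
        if exc_type is not None:
            # unwinding due to an error: don't attempt more work
            return False
        try:
            self.resource.close()
        except Exception:
            # analog of "write an unraisable exception": report, don't raise
            traceback.print_exc()
        return False
```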

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109758
Approved by: https://github.com/jansel
2023-09-21 17:04:14 +00:00
Nikita Shulga
cddd0db241 Add finfo properties for float8 dtypes (#109744)
Add float8 finfo checks to `test_type_info.py`
Fixes https://github.com/pytorch/pytorch/issues/109737
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109744
Approved by: https://github.com/drisspg
2023-09-21 03:41:48 +00:00
Bin Bao
9c2715bbb2 [inductor] Clean up AOTInductor runtime ABI (#109678)
Summary: Change the AOTInductor runtime interface to avoid referring to aten data structures directly, mostly at::Tensor and ProxyExecutor. This a combination of https://github.com/pytorch/pytorch/pull/109436,  https://github.com/pytorch/pytorch/pull/109498, https://github.com/pytorch/pytorch/pull/109450, https://github.com/pytorch/pytorch/pull/109606, plus a few internal build changes.

Reviewed By: frank-wei

Differential Revision: D49374820

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109678
Approved by: https://github.com/frank-wei, https://github.com/chenyang78
2023-09-21 00:25:24 +00:00
Nikita Shulga
4e3b03217d [BE] Replace 8 with CHAR_BIT (#109740)
Defined in [limits.h](https://en.cppreference.com/w/c/types/limits) as number of bits per byte

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109740
Approved by: https://github.com/kit1980, https://github.com/ZainRizvi
2023-09-20 23:42:25 +00:00
Peter Bell
7ce69d5dbe [RELAND] Remove some unnecessary <iostream> includes from headers (#108150)
In almost all cases this is only included for writing the output formatter, which
only uses `std::ostream` so including `<ostream>` is sufficient.

The istream header is ~1000 lines so the difference is non-trivial.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108150
Approved by: https://github.com/albanD, https://github.com/malfet
ghstack dependencies: #108149
2023-09-20 21:55:15 +00:00
cyy
0735f6c0d5 [Reland] Remove calls of c10::either (#109708)
While there were FB issues encountered when removing c10::either #109299 , we should be able to change OSS code.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109708
Approved by: https://github.com/clee2000
2023-09-20 21:23:10 +00:00
soulitzer
8bc00dfffd Hashing for constant and singleton SymInt/SymBool (#109170)
Bugfix:
- previously, SymBool did not implement `__eq__`, so Python fell back to the default `__eq__` and `__hash__`
- in this PR, we make SymBool implement `__eq__`
- symbolic SymBool now raises an error when hashed just like SymInt/SymFloat

New feature:
- previously, SymInt and SymFloat were unhashable (even if they were singleton or constant)
- in this PR, SymInt and SymBool are hashable if singleton/constant

Stay the same:
- SymNode are hashable due to default Python behavior
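
The intended behavior can be sketched with a stand-in class (`FakeSymInt` is illustrative, not the real SymInt): hashing succeeds for constant nodes and raises for symbolic ones, while `__eq__` is defined explicitly.

```python
class FakeSymInt:
    """Toy model of the constant-vs-symbolic hashing rule above."""

    def __init__(self, value=None, expr=None):
        self.value = value  # plain int for constant nodes
        self.expr = expr    # e.g. "s0" for symbolic nodes

    def is_symbolic(self):
        return self.value is None

    def __hash__(self):
        if self.is_symbolic():
            raise TypeError("unhashable type: symbolic FakeSymInt")
        # constant nodes hash like their plain value
        return hash(self.value)

    def __eq__(self, other):
        if isinstance(other, FakeSymInt):
            other = other.value
        return not self.is_symbolic() and self.value == other
```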
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109170
Approved by: https://github.com/ezyang
ghstack dependencies: #109169
2023-09-20 20:37:15 +00:00
soulitzer
5252fcb133 Handle constant SymBool in unary and binary operations (#109169)
In this PR:
- When Constant SymNode are detected in unary/binary ops demote them to plain int/bool before proceeding. Sometimes this means doing a unary op with a Constant SymNode would result in a plain bool.
- Introduce an is_symbolic method, only available from Python. We need this because isinstance(x, SymInt) is no longer sufficient to check whether a given int/SymInt is symbolic or not. See later PR in the stack to see how this is used.
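
A minimal sketch of the demotion rule (illustrative names; the real logic lives in the SymNode binary/unary op handlers): constant nodes are replaced by their plain Python values before the op runs, so an op between constants yields a plain bool.

```python
class ConstantSymNode:
    """Stand-in for a constant SymNode wrapping a plain value."""

    def __init__(self, value):
        self.value = value

def demote(x):
    # constant SymNode -> plain int/bool; everything else passes through
    return x.value if isinstance(x, ConstantSymNode) else x

def sym_and(a, b):
    a, b = demote(a), demote(b)
    if isinstance(a, bool) and isinstance(b, bool):
        return a and b  # plain bool result, as described above
    raise NotImplementedError("truly symbolic operands handled elsewhere")
```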
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109169
Approved by: https://github.com/ezyang
2023-09-20 20:37:15 +00:00
Rodrigo Kumpera
9a1b6d44bb [C10d] Add PG::enableCollectivesTiming to make it dynamically enabled. (#108814)
Collectives timing gates tracking of when a collective starts on the device.

Currently it's enabled by setting the NCCL_ENABLE_TIMING env var.

The goal of this PR is to make it possible to dynamically enable that flag so users of the PG hooks don't have to set that flag in order to have their hooks work.

The design is that once set, all new collectives will have such behavior so we track it on each Work object.

We make enableTiming_ atomic in PGNCCL to avoid races on non-TSO hardware.

To ensure consistency, we copy its value during Work construction and replace all previous usage of enableTiming_ from the PG with usages from the Work, which now has an immutable value.
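
The snapshot-at-construction design can be sketched in plain Python (the real code is C++ with `std::atomic<bool>`; the lock below merely stands in for the atomic):

```python
import threading

class Work:
    def __init__(self, timing_enabled):
        # copied once at construction; immutable afterwards
        self.timing_enabled = timing_enabled

class ProcessGroup:
    def __init__(self):
        self._enable_timing = False
        self._lock = threading.Lock()  # stand-in for std::atomic<bool>

    def enable_collectives_timing(self):
        with self._lock:
            self._enable_timing = True

    def new_work(self):
        # each new collective's Work snapshots the current flag value
        with self._lock:
            return Work(self._enable_timing)
```

Work objects created before the flag flips keep their original behavior; only new collectives pick up the change.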

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108814
Approved by: https://github.com/wconstab, https://github.com/fduwjj
ghstack dependencies: #108813
2023-09-20 19:47:41 +00:00
PyTorch MergeBot
cdb51d2ad0 Revert "[2/N] Add -Wdeprecated and related fixes (#109564)"
This reverts commit 5b50641bac.

Reverted https://github.com/pytorch/pytorch/pull/109564 on behalf of https://github.com/atalman due to Need to revert as followup revert of first PR 108626 ([comment](https://github.com/pytorch/pytorch/pull/109564#issuecomment-1728137207))
2023-09-20 17:15:57 +00:00
cyy
567e8ebf94 [1/N] Move c10::variant to std::variant (#103675)
This PR moves some calls of c10::variant to std::variant.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103675
Approved by: https://github.com/ezyang
2023-09-20 15:21:24 +00:00
Aleksei Nikiforov
a019e5cbff s390x onnx: byteswap data when serializing it (#107963)
This change fixes test_pad, test_pad_with_dynamic_input_shape, test_reshape, test_resize and test_resize_after_concat in test/onnx/test_pytorch_onnx_shape_inference.py
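
For illustration, this is the kind of byteswap a big-endian host such as s390x needs so serialized float32 data comes out little-endian (a generic sketch under that assumption, not the actual ONNX export code; `to_little_endian_f32` is a hypothetical helper):

```python
import struct
import sys

def to_little_endian_f32(values):
    """Pack float32 values, byteswapping on big-endian hosts so the
    serialized bytes are always little-endian."""
    native = struct.pack(f"={len(values)}f", *values)
    if sys.byteorder == "big":
        # reverse each 4-byte element on big-endian platforms
        native = b"".join(
            native[i:i + 4][::-1] for i in range(0, len(native), 4)
        )
    return native
```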

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107963
Approved by: https://github.com/justinchuby
2023-09-20 14:27:45 +00:00
cyy
5b50641bac [2/N] Add -Wdeprecated and related fixes (#109564)
This PR follows #108626.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109564
Approved by: https://github.com/ezyang
2023-09-20 07:03:25 +00:00
Edward Z. Yang
518308a740 Trace through pytree API with dynamo. (#108533)
Fix: #107315

This PR enables dynamo to trace through the `pytree` API by inlining its functions. In
order to do so, a few details of `pytree` had to be changed.

In summary, this PR:

- Introduces `TreeSpecVariable` for representing `TreeSpec` instances
- Specializes `<type>.__bases__` call, returning a `TupleVariable`
- Enables the call to `id` builtin function for every variable that implements
  `as_python_constant` method
- Specializes `ConstantVariable.call_method` for its (un)flatten functions
- Implements `UserDefinedObjectVariable.as_python_constant`
- Modifies `pytree` by:
    - Made `SUPPORTED_NODES` a map of ids (instead of types) to `NodeDef`
    - Removed `functools.wraps` function, since it can't be inlined
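
A minimal sketch of the id-keyed registry idea in plain Python (illustrative only; the real `torch.utils._pytree` carries much more machinery):

```python
# id(node_type) -> (flatten_fn, unflatten_fn), mirroring the change above
SUPPORTED_NODES = {}

def register_node(ty, flatten_fn, unflatten_fn):
    SUPPORTED_NODES[id(ty)] = (flatten_fn, unflatten_fn)

register_node(list, lambda x: (x, None), lambda ch, ctx: list(ch))
register_node(tuple, lambda x: (list(x), None), lambda ch, ctx: tuple(ch))

def tree_flatten(tree):
    entry = SUPPORTED_NODES.get(id(type(tree)))
    if entry is None:
        return [tree], ("leaf",)  # unregistered types are leaves
    flatten_fn, _ = entry
    children, ctx = flatten_fn(tree)
    leaves, specs = [], []
    for child in children:
        child_leaves, child_spec = tree_flatten(child)
        leaves.extend(child_leaves)
        specs.append(child_spec)
    return leaves, (type(tree), ctx, specs)

def tree_unflatten(leaves, spec):
    leaves = list(leaves)
    def go(s):
        if s == ("leaf",):
            return leaves.pop(0)
        ty, ctx, child_specs = s
        children = [go(cs) for cs in child_specs]
        _, unflatten_fn = SUPPORTED_NODES[id(ty)]
        return unflatten_fn(children, ctx)
    return go(spec)
```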

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108533
Approved by: https://github.com/ezyang, https://github.com/voznesenskym
ghstack dependencies: #109201
2023-09-20 00:04:56 +00:00
Digant Desai
5845fc2fa6 [PyTorch][Coreml] Bubble up NSError from loadModel (#109444)
Summary: This can help debug issues, especially fc/bc issues with coreml tools, when a model fails to load.

Test Plan:
On a macbook fbsource,
```
arc focus2 -b pp-ios -a ModelRunner -a //xplat/caffe2/c10:c10Apple -a //xplat/caffe2/fb/dynamic_pytorch:dynamic_pytorch_implApple -a //xplat/caffe2:coreml_delegateApple --auto-test-schemes --force-with-wrong-xcode
```
It builds and runs the Playground app using a bunch of coreml models on my iPhone. Here is one for example,
https://pxl.cl/3nSPn

Also forcefully triggering MLModel ctor failure to test this code by setting a `modelURL=nil`, and as expected got this,
```
libc++abi: terminating due to uncaught exception of type c10::Error: Error loading MLModel Error details:  Localized_description: nil value for URL Domain: com.apple.CoreML Code: 3 User Info: {
    NSLocalizedDescription = "nil value for URL";
} Input Shapes: N/A

Exception raised from compile at xplat/caffe2/torch/csrc/jit/backends/coreml/objc/PTMCoreMLBackend.mm:162 (most recent call first):
(no backtrace available)
```

Instead of a previous message would have been,
```
Loading MLModel failed
```

Unrelated issues
* P829736691 - with running MaskRCNN on Coreml with the Playground app. Only happens some times.
* P829741377 - with Metal Operator Tests with the Playground app.

Differential Revision: D49349726

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109444
Approved by: https://github.com/kimishpatel
2023-09-19 20:08:37 +00:00
Brian Hirsh
25e81f19f3 reland "python functionalization: add helpers, functionalize_sync and mirror_autograd_meta (#107917)" (#109518)
Reland - the previous PR was reverted by internal with this error:
```
  File "/data/sandcastle/boxes/eden-trunk-hg-fbcode-fbsource/buck-out/v2/gen/fbcode/363cd7e240f5d021/caffe2/torch/fb/trainer/data_modules/tests/__test_dataloader__/test_dataloader#link-tree/torch/__init__.py", line 29, in <module>
    from ._utils_internal import _functionalize_sync as _sync
ImportError: cannot import name '_functionalize_sync' from 'torch._utils_internal'
```

I couldn't figure out why internal was unhappy with the import. One potential reason is that I see a build rule for *another* `_utils_internal.py` in the fb folder here ([link](https://www.internalfb.com/code/fbsource/[30ed85cd88409af98b7490be137aaa5dfd7afd01]/fbcode/caffe2/TARGETS?lines=444))

Rather than burn more time investigating, I confirmed internally that the error goes away if I move the util from `torch/_utils_internal.py` to `torch/_utils.py`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109518
Approved by: https://github.com/albanD
2023-09-19 13:25:24 +00:00
cyy
ac603bc2f8 [Reland] Eliminate invocations of c10::stoi,c10::stod,c10::stoull,c10::stoll (#109566)
This is reland of #87603 with definitions of c10::stoXX kept for further investigation.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109566
Approved by: https://github.com/huydhn
2023-09-19 07:15:25 +00:00
Edward Z. Yang
2c1554a032 Make SymFloat behave symmetrically with float in torch.tensor (#109513)
Previously, SymFloat would force double precision.  That's wrong;
instead, we must respect default dtype.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109513
Approved by: https://github.com/voznesenskym
2023-09-19 01:52:41 +00:00
Pritam Damania
550b0ec3d4 Release GIL around VariableInfo::zeros to avoid deadlocks (#109454)
See https://github.com/pytorch/pytorch/issues/109074#issue-1891369807 and https://github.com/pytorch/pytorch/issues/109074#issuecomment-1718825855
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109454
Approved by: https://github.com/albanD
2023-09-18 22:28:48 +00:00
PyTorch MergeBot
4d44d8c00a Revert "Eliminate c10::stoi,c10::stod,c10::stoull,c10::stoll (#109179)"
This reverts commit 852f1b8417.

Reverted https://github.com/pytorch/pytorch/pull/109179 on behalf of https://github.com/huydhn due to Sorry for reverting your change but this is breaking periodic buck build, so please fix the issue and reland the change https://github.com/pytorch/pytorch/actions/runs/6207458526/job/16852695272 ([comment](https://github.com/pytorch/pytorch/pull/109179#issuecomment-1724168571))
2023-09-18 18:41:12 +00:00
Bin Bao
6ffa59031a [inductor] Fix CudaStreamGuard in AOTInductor ABI compatible mode (#109471)
Summary: Use a RAII class to wrap around at::cuda::CUDAStreamGuard. The previous implementation didn't follow the exact CUDAStreamGuard behavior.

Test Plan: CI

Differential Revision: D49355542

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109471
Approved by: https://github.com/chenyang78
2023-09-18 15:54:58 +00:00
Catherine Lee
0cae3b5df5 Revert "[PyTorch] Add Expanded call stack to nodes (#108426)" (#109468)
This reverts commit c657d9ecc5. https://github.com/pytorch/pytorch/pull/108426

The diff got reverted internally via a backout diff without getting exported to github.

Do not import this PR


Pull Request resolved: https://github.com/pytorch/pytorch/pull/109468
Approved by: https://github.com/kit1980
2023-09-17 23:46:20 +00:00
Ken Jin
f9e72acc8f Guard default dtype in torchdynamo (#109459)
Fixes https://github.com/pytorch/pytorch/issues/109458

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109459
Approved by: https://github.com/ezyang
2023-09-17 22:51:33 +00:00
PyTorch MergeBot
71420a98ab Revert "Remove c10::either (#109299)"
This reverts commit 9d297cc773.

Reverted https://github.com/pytorch/pytorch/pull/109299 on behalf of https://github.com/clee2000 due to sorry but there are a few internal usages and when I tried swapping them out, I got some errors.  I will get someone to look at them on Monday ([comment](https://github.com/pytorch/pytorch/pull/109299#issuecomment-1722579387))
2023-09-17 22:05:47 +00:00
PyTorch MergeBot
525e4f42d0 Revert "replace torch::make_unique with std::make_unique (#108866)"
This reverts commit 03e35efbf7.

Reverted https://github.com/pytorch/pytorch/pull/108866 on behalf of https://github.com/clee2000 due to Sorry but I found more usages of `torch::make_unique` internally, I can go change all of these, but I'd prefer if that gets done before this gets merged ([comment](https://github.com/pytorch/pytorch/pull/108866#issuecomment-1722577925))
2023-09-17 21:57:30 +00:00
PyTorch MergeBot
49b18ae546 Revert "python functionalization: add helpers, functionalize_sync and mirror_autograd_meta (#107917)"
This reverts commit 0ad595954a.

Reverted https://github.com/pytorch/pytorch/pull/107917 on behalf of https://github.com/clee2000 due to breaking internal builds D49346637 ([comment](https://github.com/pytorch/pytorch/pull/107917#issuecomment-1722566885))
2023-09-17 20:57:41 +00:00
cyy
75b954b715 [4/N] Enable clang-tidy in torch/csrc/autograd (#109455)
The PR enables clang-tidy checks in torch/csrc/autograd.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109455
Approved by: https://github.com/Skylion007
2023-09-17 17:11:50 +00:00
cyy
51d2d825ab [3/N] apply clang-tidy in torch/csrc/autograd (#109368)
This PR applies clang-tidy fixes in torch/csrc/autograd/FunctionsManual.cpp. There are also other fixes.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109368
Approved by: https://github.com/Skylion007
2023-09-17 07:26:59 +00:00
Bin Bao
0f646b1d15 [inductor] Add a C shim layer for libtorch (#109391)
Summary:
This PR adds a limited C shim layer for libtorch. The ultimate goal is to ban any direct reference to aten/c10 data structures or functions, to avoid ABI breakage by providing stable C interfaces.

To make the review and landing easier, we broke the changes into several steps. In this PR (a combination of https://github.com/pytorch/pytorch/pull/109022 and https://github.com/pytorch/pytorch/pull/109351), we add C interfaces for certain libtorch functions and modify the wrapper codegen to generate calls to those interfaces. There are a few other items to be addressed in future PRs:

* The AOTInductor runtime interface still takes lists of aten tensors as input and output
* The interaction with ProxyExecutor (general fallback support) needs to move away from aten tensor
* Remove all references to aten/c10 headers in the AOTInductor-generated code

Differential Revision: D49302669

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109391
Approved by: https://github.com/chenyang78
2023-09-16 16:46:26 +00:00
cyy
852f1b8417 Eliminate c10::stoi,c10::stod,c10::stoull,c10::stoll (#109179)
We can remove these functions in favor of std ones.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109179
Approved by: https://github.com/colesbury
2023-09-16 07:22:50 +00:00
cyy
a14d30d8d1 [1/N] apply clang-tidy in torch/csrc/autograd (#109032)
This PR begins a new series of patches for enabling clang-tidy checks in torch/csrc/autograd
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109032
Approved by: https://github.com/albanD, https://github.com/Skylion007
2023-09-15 23:28:43 +00:00
Brian Hirsh
0ad595954a python functionalization: add helpers, functionalize_sync and mirror_autograd_meta (#107917)
Added two new utils to help with turning python functionalization on in AOTAutograd (next PR):

(1) updated `torch._sync()`. Previously, this API could only handle `torch.Tensor` instances that had a `FunctionalTensorWrapper` TensorImpl. It now needs to handle python `FunctionalTensor`'s. In theory I can probably break BC and change this API (since it's private?), but I decided not to do it in this PR stack to minimize the chance of reverts. Instead of updating that API directly (which is in C++), I just added a python shim that first tries to unwrap the python `FunctionalTensor` if there is one, then calls the existing C++ logic

(2) `mirror_autograd_meta` is now a standalone API that tries to mirror the `requires_grad` and `is_leaf` autograd metadata from one tensor to another. Previously this was hardcoded into `torch._to_functional_tensor()`. But I now need to use it in a more standalone way: later in AOTAutograd when we unwrap and re-wrap a tensor subclasses, we need to manually mirror the autograd metadata from the original to the updated version of the subclass.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107917
Approved by: https://github.com/ezyang
ghstack dependencies: #106404
2023-09-15 20:19:25 +00:00
Brian Hirsh
f22b303f65 Add TorchDispatch version of functionalization (#106404)
This PR adds a new `FunctionalTensor` subclass, and `FunctionalTensorMode` torch dispatch mode. Together, this class/mode are a lightweight wrapper around our existing C++ functionalization logic.

This idea came from Ed - later in the stack, I want to be able to run functionalization **underneath** torch_dispatch, when performing tracing in AOTAutograd. I can't do this easily with vanilla C++ functionalization, because it has a dedicated dispatch key that always runs before TorchDispatch. However, by adding a torch_dispatch mode shim around functionalization, we can use functionalization as a torch_dispatch mode, which will make it easier to run underneath other modes later.

This PR provides the basic new classes, and some light testing.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106404
Approved by: https://github.com/ezyang
2023-09-15 20:19:25 +00:00
Edward Z. Yang
d3a64ff249 Display subclass name when tolist() fails due to tensor subclass (#109376)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109376
Approved by: https://github.com/wanchaol
2023-09-15 19:42:39 +00:00
cyy
9d297cc773 Remove c10::either (#109299)
We can replace it with std::variant.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109299
Approved by: https://github.com/colesbury, https://github.com/ezyang
2023-09-15 19:34:31 +00:00
Oleg Khabinov
cc03e3a892 [AOTInductor] Do not hardcode directory with .cubin files (#109151)
Reviewed By: frank-wei, chenyang78

Differential Revision: D49081883

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109151
Approved by: https://github.com/chenyang78
2023-09-15 18:38:05 +00:00
Andrew Gallagher
a873f523ba [aarch64][caffe2/torch/csrc/profiler] Support aarch64 in inline assembly (#104707)
Summary:
Port x86 inline assembly to aarch64:
- Use `sp` instead of `%rsp` for stack pointer; move to second caller-
  saved register `x1` instead of `%rsi`
- Use `x29` instead of `%rbp` for base pointer; move to third caller-
   saved register `x2` instead of `%rdx`

Test Plan:
```
$ buck2 build fbcode//mode/opt fbcode//caffe2/torch/fb/model_transform/fx2trt/packaging:generate_merge_net_file
```

Reviewed By: jasonjk-park

Differential Revision: D47242468

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104707
Approved by: https://github.com/aaronenyeshi
2023-09-15 14:34:55 +00:00
Paul Gesel
0cbca85707 Add check to prevent NumPy ndarray from being treated as tuple when indexing (#108954)
Fixes #108689

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108954
Approved by: https://github.com/lezcano
2023-09-15 08:51:58 +00:00