1246 Commits

Author SHA1 Message Date
Nikita Shulga
5596cefba6 Fix segfault during NumPy string tensor conversion (#155364)
By checking the dtype first, and adding an element_size check as well

Fixes https://github.com/pytorch/pytorch/issues/155328

Pull Request resolved: https://github.com/pytorch/pytorch/pull/155364
Approved by: https://github.com/Skylion007
2025-06-07 01:55:00 +00:00
Yuanhao Ji
75b24c273b Export torch::utils::tensor_to_numpy (#154178)
Fixes #154105

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154178
Approved by: https://github.com/albanD, https://github.com/Skylion007, https://github.com/youkaichao
2025-06-04 05:48:27 +00:00
Aaron Gokaslan
0cd18ba1ca [BE][Ez] Update deprecated pybind11 functions (#154798)
* getType() is deprecated; replace it with the new, proper static method. The replacements are backwards compatible with the old pybind11 versions we support, so this change is split off ahead of the upgrade to pybind11 3.0 in #154115, where the deprecated methods are dropped.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154798
Approved by: https://github.com/jansel, https://github.com/cyyever
2025-06-01 06:17:50 +00:00
Yiming Zhou
0289313551 [AOTI] Support OptionalTensor return type in AOTI proxy executor (#154286)
Summary:

When a C++ custom op returns an uninitialized tensor, it will be marked as None in Python. For this scenario, the user should mark the possibly uninitialized return as Tensor? in the custom op schema.
This diff adds the `as_optional_tensor` type to the export schema and support for optional tensors in the AOTI proxy executor.

Test Plan:

```
buck2 run mode/dev-nosan caffe2/test/inductor:test_aot_inductor_custom_ops -- -r test_fn_with_optional_tensor_output
```

Differential Revision: D75262529

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154286
Approved by: https://github.com/desertfire
2025-05-30 01:53:00 +00:00
ILCSFNO
cf7451f279 Fix signature of torch.sparse_coo_tensor() (#152681)
Fixes #145371

@pearu I searched the codebase and found this code. I wonder whether it is the root cause of the issue; could you review it? Thanks a lot!

Pull Request resolved: https://github.com/pytorch/pytorch/pull/152681
Approved by: https://github.com/Skylion007, https://github.com/pearu, https://github.com/nikitaved
2025-05-28 13:16:41 +00:00
Laith Sakka
ab5137b048 used guard_or_false instead of guard_size_oblivious in is_int_or_symint (#154167)
This is a short circuit that we should not fail on. Before this PR we would not fail on u0 or u0+u1,
but only if they are size-like; we would still fail needlessly on u0-u1 and similar expressions.
guard_or_false seems appropriate for that reason.

This was added in https://github.com/pytorch/pytorch/pull/122145. There were no unit tests for me to verify
why it was added, and I could not reproduce it using the associated issue; the example does not work.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154167
Approved by: https://github.com/bobrenjc93
ghstack dependencies: #154154, #154164
2025-05-26 21:59:45 +00:00
Nikita Shulga
c4d1ff02f8 [Lint] Update clang-format to 19.1.4 (#153889)
All changes other than the one to `tools/linter/adapters/s3_init_config.json` are generated by newer clang-format
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153889
Approved by: https://github.com/cyyever, https://github.com/atalman
2025-05-20 14:12:46 +00:00
Matthijs Hogervorst
b117a6c47b Fix two error messages involving Tensor.dense() (#152631)
Two error messages in the codebase instruct the user to use `Tensor.dense()`. This method doesn't exist, but `Tensor.to_dense()` does, and this is what the user should be using instead.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/152631
Approved by: https://github.com/jansel
2025-05-04 20:44:08 +00:00
rzou
762844355e Make DispatchKeySet serializable; add __eq__ (#152732)
These seem like reasonable things to add. Also fixes a bug in vLLM for
me.

Test Plan:
- new tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/152732
Approved by: https://github.com/bdhirsh
2025-05-03 14:40:06 +00:00
zhxchen17
a34c28e0d2 [dynamo] Add guard serialization for tensor matches. (#151318)
This is a proof-of-concept of how we could serialize a guard and deserialize it back from the bytes.

The main behavioral change introduced in this diff is on CheckFunctionManager:

```
check_fn_manager = CheckFunctionManager(code, output_graph, guards_serialization_mode="save")

guards_state: bytes = check_fn_manager.guards_state
```

Once `guards_serialization_mode` is set to `save`, CheckFunctionManager will return an additional `bytes` object called `guards_state`, which should contain all the information needed to deserialize the guards later.

When we load the guards state back, we set `guards_serialization_mode` to `load`:

```
output_graph_state = pickle.loads(guards_state)
check_fn_manager = CheckFunctionManager(code, output_graph_state, guards_serialization_mode="load")
```

# TENSOR_MATCH

Since we have many types of guards to support, we will break the work into small diffs instead of supporting every guard in a single diff.

We kick off the work with TENSOR_MATCH in this diff.

# Testing

For each type of guard, we will test it as follows:
1. Use guard_filter_fn to select one type of guard at a time.
2. Call InstructionTranslator directly on an example function to get the OutputGraph and CheckFunctionManager (reference guard manager).
3. Serialize and then deserialize the output graph state, and rebuild the guards with a new CheckFunctionManager (loaded guard manager).
4. Pass a set of example inputs to both the reference and the loaded guard manager to check that their behavior matches.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/151318
Approved by: https://github.com/jansel, https://github.com/anijain2305
2025-04-25 14:16:23 +00:00
PyTorch MergeBot
b1d055fd6a Revert "[dynamo] Add guard serialization for tensor matches. (#151318)"
This reverts commit 81c4369d81.

Reverted https://github.com/pytorch/pytorch/pull/151318 on behalf of https://github.com/zhxchen17 due to macos test failing ([comment](https://github.com/pytorch/pytorch/pull/151318#issuecomment-2828638168))
2025-04-24 19:22:45 +00:00
zhxchen17
81c4369d81 [dynamo] Add guard serialization for tensor matches. (#151318)
This is a proof-of-concept of how we could serialize a guard and deserialize it back from the bytes.

The main behavioral change introduced in this diff is on CheckFunctionManager:

```
check_fn_manager = CheckFunctionManager(code, output_graph, guards_serialization_mode="save")

guards_state: bytes = check_fn_manager.guards_state
```

Once `guards_serialization_mode` is set to `save`, CheckFunctionManager will return an additional `bytes` object called `guards_state`, which should contain all the information needed to deserialize the guards later.

When we load the guards state back, we set `guards_serialization_mode` to `load`:

```
output_graph_state = pickle.loads(guards_state)
check_fn_manager = CheckFunctionManager(code, output_graph_state, guards_serialization_mode="load")
```

# TENSOR_MATCH

Since we have many types of guards to support, we will break the work into small diffs instead of supporting every guard in a single diff.

We kick off the work with TENSOR_MATCH in this diff.

# Testing

For each type of guard, we will test it as follows:
1. Use guard_filter_fn to select one type of guard at a time.
2. Call InstructionTranslator directly on an example function to get the OutputGraph and CheckFunctionManager (reference guard manager).
3. Serialize and then deserialize the output graph state, and rebuild the guards with a new CheckFunctionManager (loaded guard manager).
4. Pass a set of example inputs to both the reference and the loaded guard manager to check that their behavior matches.

Differential Revision: [D72987485](https://our.internmc.facebook.com/intern/diff/D72987485/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/151318
Approved by: https://github.com/jansel, https://github.com/anijain2305
2025-04-24 18:07:01 +00:00
Tugsbayasgalan Manlaibaatar
eb1f85a2a0 Support C++ statically_known_true (#151346)
Differential Revision: [D73040543](https://our.internmc.facebook.com/intern/diff/D73040543/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/151346
Approved by: https://github.com/laithsakka
2025-04-18 06:42:12 +00:00
Yu, Guangye
b0810168a3 Generalize poison fork logic for each device backend (#144664)
# Motivation
Generalize the poison_fork code to make it reusable across different devices.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144664
Approved by: https://github.com/EikanWang, https://github.com/albanD
2025-04-13 09:54:30 +00:00
Yiming Zhou
dbcd0b571d Back out "[AOTI] Always use oss schema for ExternKernelNodes serialization" (#151026)
Summary: Revert for FC breaking

Test Plan: CI

Differential Revision: D72802075

Pull Request resolved: https://github.com/pytorch/pytorch/pull/151026
Approved by: https://github.com/hl475
2025-04-10 22:36:35 +00:00
PyTorch MergeBot
a0ab243c3a Revert "Generalize poison fork logic for each device backend (#144664)"
This reverts commit 83bd0b63b5.

Reverted https://github.com/pytorch/pytorch/pull/144664 on behalf of https://github.com/atalman due to failing internal tests ([comment](https://github.com/pytorch/pytorch/pull/144664#issuecomment-2795157082))
2025-04-10 21:02:14 +00:00
Yu, Guangye
83bd0b63b5 Generalize poison fork logic for each device backend (#144664)
# Motivation
Generalize the poison_fork code to make it reusable across different devices.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144664
Approved by: https://github.com/EikanWang, https://github.com/albanD
2025-04-10 02:34:53 +00:00
Yiming Zhou
89505f4498 [AOTI] Always use oss schema for ExternKernelNodes serialization (#150197)
Summary: Added a `protocol` field to `ExternKernelNodes`; from now on, all lowering passes will always use the OSS schema to serialize external kernel nodes.

Test Plan: CI

Differential Revision: D72020444

Pull Request resolved: https://github.com/pytorch/pytorch/pull/150197
Approved by: https://github.com/zhxchen17
2025-04-08 22:35:28 +00:00
PyTorch MergeBot
bf1132c196 Revert "Generalize poison fork logic for each device backend (#144664)"
This reverts commit d86c14156d.

Reverted https://github.com/pytorch/pytorch/pull/144664 on behalf of https://github.com/atalman due to failing periodic test: python test/test_cpp_extensions_mtia_backend.py TestCppExtensionMTIABackend.test_device_context ([comment](https://github.com/pytorch/pytorch/pull/144664#issuecomment-2784506104))
2025-04-07 20:09:53 +00:00
Yu, Guangye
d86c14156d Generalize poison fork logic for each device backend (#144664)
# Motivation
Generalize the poison_fork code to make it reusable across different devices.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144664
Approved by: https://github.com/EikanWang, https://github.com/albanD
2025-04-07 02:06:21 +00:00
Pian Pawakapan
284b766898 [dynamic shapes] C++ bindings for guard_or_false/true (#150148)
C++ version. Would like to add it in one place to prove it works, but couldn't find one that doesn't expose a chain of data-dependent changes... so just gonna put up the base implementation

Pull Request resolved: https://github.com/pytorch/pytorch/pull/150148
Approved by: https://github.com/laithsakka, https://github.com/jingsh
2025-03-31 17:04:25 +00:00
Yuxin Wu
40ec9d2bfa avoid allocation when tensor_new from storage (#149797)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/149797
Approved by: https://github.com/Skylion007
2025-03-24 20:02:45 +00:00
cyy
8fa81a6066 Enable misc-use-internal-linkage check and apply fixes (#148948)
Enables the clang-tidy rule [`misc-use-internal-linkage`](https://clang.llvm.org/extra/clang-tidy/checks/misc/use-internal-linkage.html). This check was introduced in Clang-Tidy 18 and is available thanks to the recent update to Clang-Tidy 19.

The check marks functions and variables used only in the translation unit as static. Therefore undesired symbols are not leaked into other units, more link time optimisations are possible and the resulting binaries may be smaller.

The detected violations were mostly fixed by using static. In other cases, the symbols were indeed consumed by other files, so their declaring headers were included. Some declarations were still wrong and have been fixed.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/148948
Approved by: https://github.com/Skylion007
2025-03-12 14:22:56 +00:00
Wei-Sheng Chin
9c9b05bc4f Expose functions used in custom backend in torch_python dll (#148213)
Fixes #148208. There are solutions for exposing symbols that are used implicitly through inline functions (i.e., inline function A in foo.h calls non-inline function B, so code that includes foo.h has to see the symbol B in the DLL).

Solution 1: tag the entire struct where the inline functions are defined as member functions with TORCH_PYTHON_API --- this PR does this for python_arg_parser.h. An alternative solution exists but will slow down dispatching a lot --- drop inline keyword and move implementation to .cc file.

Solution 2: tag individual functions with TORCH_PYTHON_API. This PR does this for python_tensor.h.

Related discussion about hiding torch_python symbols: https://github.com/pytorch/pytorch/pull/142214

Pull Request resolved: https://github.com/pytorch/pytorch/pull/148213
Approved by: https://github.com/malfet
2025-03-07 02:34:37 +00:00
Zhengxu Chen
915b9c80ab [export] Sync aoti schema to schema.py (#148017)
Summary: Synchronizing internal AOTI schema to OSS schema.py

Test Plan: CI

Differential Revision: D70271151

Pull Request resolved: https://github.com/pytorch/pytorch/pull/148017
Approved by: https://github.com/yiming0416
2025-02-27 21:46:11 +00:00
vasiliy
382fbcc1e4 add the torch.float8_e8m0fnu dtype to PyTorch (#147466)
Summary:

Continuing the work from https://github.com/pytorch/pytorch/pull/146427

Adds the `torch.float8_e8m0fnu` dtype to PyTorch, as detailed in
https://github.com/pytorch/pytorch/issues/146414 . Please see the issue for a detailed definition of the format.  Example of basic functionality:

```python
import torch

# round trip
x0 = torch.randn(4, 4, dtype=torch.float32)
x1 = x0.to(torch.float8_e8m0fnu)  # RNE rounding
x2 = x1.to(torch.float32)  # 2 ** exponent

# creation with empty
x0 = torch.empty(4, 4, dtype=torch.float8_e8m0fnu)

# printing
print(x0)
```
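
To make the `2 ** exponent` comment above concrete, here is a minimal pure-Python sketch of decoding a single e8m0 byte. It assumes the OCP Microscaling (MX) convention of an exponent bias of 127 with the all-ones byte reserved for NaN; neither detail is stated in this commit message, and `decode_e8m0` is a hypothetical helper, not a PyTorch API.

```python
import math

E8M0_BIAS = 127   # assumption: bias taken from the OCP MX convention
E8M0_NAN = 0xFF   # assumption: all-ones encoding is NaN


def decode_e8m0(byte: int) -> float:
    """Decode one e8m0 byte (exponent only, no sign, no mantissa)."""
    if not 0 <= byte <= 0xFF:
        raise ValueError(f"expected a value in [0, 255], got {byte}")
    if byte == E8M0_NAN:
        return math.nan
    # Every other encoding is an exact power of two.
    return 2.0 ** (byte - E8M0_BIAS)
```

Under these assumptions every representable value is a positive power of two, which is why the round trip through float32 in the example above keeps only the exponent.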

Done in this PR:
* numerical correctness
* op coverage (except for `torch._scaled_mm`): create tensor, cast to/from float32
* printing a tensor works

For future PRs:
* performance optimizations for casting
* torch._scaled_mm
* PT2
* various cleanups (detailed in comments with issue numbers)

Test Plan:

```
pytest test/quantization/core/experimental/test_float8.py -s
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/147466
Approved by: https://github.com/drisspg
2025-02-20 13:55:42 +00:00
Zhengxu Chen
0b84311842 [export] Generate printers/parsers for serialization enum values. (#147126)
Summary:
Generate two helper functions for enum classes in generated_serialization_types.h

printEnum: will convert enum values into strings.
parseEnum: will convert strings into enum values.
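
The generated helpers live in a C++ header, but the round trip they provide can be sketched in Python. This is only an analogue under stated assumptions: `ArgumentKind`, `print_enum` and `parse_enum` are hypothetical names chosen for illustration and do not come from generated_serialization_types.h.

```python
from enum import Enum


class ArgumentKind(Enum):
    # Hypothetical enum used only for this illustration.
    UNKNOWN = 0
    POSITIONAL = 1
    KEYWORD = 2


def print_enum(value: Enum) -> str:
    """Convert an enum value into its string name."""
    return value.name


def parse_enum(enum_cls, text: str):
    """Convert a string name back into the matching enum value."""
    try:
        return enum_cls[text]
    except KeyError:
        raise ValueError(f"unknown {enum_cls.__name__} value: {text!r}") from None
```

Parsing the printed name returns the original value, so `parse_enum(ArgumentKind, print_enum(v)) is v` holds for every member.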

Test Plan: CI

Differential Revision: D69604850

Pull Request resolved: https://github.com/pytorch/pytorch/pull/147126
Approved by: https://github.com/yiming0416
2025-02-14 02:14:35 +00:00
Zhengxu Chen
683bb1242c [export][ez] Update tag_ for union setters. (#146912)
Summary: ez fix to set tag for union type fields.

Test Plan: CI

Differential Revision: D69467715

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146912
Approved by: https://github.com/yiming0416
2025-02-12 03:52:36 +00:00
Zhengxu Chen
664550ecbf [export] Serialize special values of float into strings for json. (#146490)
Summary: Currently inf is serialized as Infinity in JSON, which is not standard-compliant. Instead, we convert all special floating-point values into strings and handle them at the JSON layer.
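
A minimal sketch of the idea, using only Python's standard `json` module. The string tags and the helper names `encode_float` and `decode_float` are assumptions made for illustration; the commit message does not say which strings the real serializer uses.

```python
import json
import math

# Assumed string tags for the special values (hypothetical choice).
_SPECIAL = {"Infinity": math.inf, "-Infinity": -math.inf, "NaN": math.nan}


def encode_float(x: float):
    """Return finite floats unchanged and special values as strings."""
    if math.isnan(x):
        return "NaN"
    if math.isinf(x):
        return "Infinity" if x > 0 else "-Infinity"
    return x


def decode_float(v):
    """Inverse of encode_float: map string tags back to floats."""
    if isinstance(v, str):
        return _SPECIAL[v]
    return float(v)


# allow_nan=False makes json.dumps reject bare Infinity/NaN tokens,
# so the output is guaranteed to be standard-compliant JSON.
payload = json.dumps({"value": encode_float(float("inf"))}, allow_nan=False)
restored = decode_float(json.loads(payload)["value"])
```

Because the special values travel as ordinary JSON strings, any strict parser can read the payload, and the float is restored only at the layer that knows about the tags.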

Test Plan:
see D69060784
CI

Differential Revision: D69186425

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146490
Approved by: https://github.com/yiming0416
2025-02-11 20:01:27 +00:00
cyy
15635b14ce [4/N] Remove unnecessary once flag usage (#146783)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146783
Approved by: https://github.com/albanD
2025-02-11 13:55:06 +00:00
angelayi
0c37c332da [export] Additionally save pytree namedtuple field names (#145956)
If a user passes in a namedtuple as an input, currently the input TreeSpec looks like: `TreeSpec(type=namedtuple, context="class_fqn", children_spec=[*, *])`

The user then saves the program containing this input TreeSpec. But what happens if they load it in a new environment where `class_fqn` now contains an additional field?

This means that the exported program is now expected to take in another input. But since those fields were not used in the original program, users should be able to just drop those additional fields and the program will run successfully. This is needed/used in APS where they use unflattener's adapter to adapt the inputs based on the previously saved treespecs.

There are a couple of [solutions](https://docs.google.com/document/d/1V4ZSdy-8PUISWc8RqvGu3DU01BVegJhHHPWqa1Io7Eg/edit?tab=t.0) for how we can address this, but eventually we settled on saving a side table mapping namedtuple types to their list of field names, which can then be accessed by the adapter.
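
The side table can be sketched in plain Python as follows. The table layout, the key `my_module.Point` and the helper `adapt_namedtuple` are hypothetical; they only illustrate how an adapter could drop fields that were added after export.

```python
from collections import namedtuple

# Saved at export time: namedtuple type name -> field names (assumed layout).
saved_fields = {"my_module.Point": ["x", "y"]}

# In the new environment the class has gained an extra field "z".
Point = namedtuple("Point", ["x", "y", "z"])


def adapt_namedtuple(value, class_fqn, field_table):
    """Keep only the fields recorded at export time, in the saved order."""
    return [getattr(value, name) for name in field_table[class_fqn]]


flat_inputs = adapt_namedtuple(Point(1, 2, 3), "my_module.Point", saved_fields)
```

Looking fields up by name rather than by position means the adapter keeps working even if the new field is inserted in the middle of the namedtuple.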
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145956
Approved by: https://github.com/zhxchen17
2025-02-04 04:42:30 +00:00
Zhengxu Chen
1580f47bf4 [export][ez] Fix generated header file. (#146208)
Summary: as title.

Test Plan: CI

Differential Revision: D68978788

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146208
Approved by: https://github.com/yiming0416
2025-02-03 06:01:05 +00:00
Zhengxu Chen
aad9f44b2e [export] Sync model container types to schema.py (#145959)
Summary: Synced from D68840230

Test Plan: No behavior changes to existing API. Will be tested internally.

Differential Revision: D68846532

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145959
Approved by: https://github.com/yiming0416
2025-01-31 18:17:56 +00:00
cyy
116af809eb Use std::string_view (#145906)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145906
Approved by: https://github.com/albanD
2025-01-30 03:14:27 +00:00
cyyever
ef28df5c9e [Reland][Environment Variable][4/N] Use thread-safe getenv functions (#140593)
Reland of #137843 , after checking the code again.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140593
Approved by: https://github.com/albanD

Co-authored-by: albanD <desmaison.alban@gmail.com>
2025-01-28 20:51:49 +00:00
wengshiy
73622fc5fa Fix Throughputbenchmark issue (#144669)
Fixes [144461](https://github.com/pytorch/pytorch/issues/144461)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144669
Approved by: https://github.com/leslie-fang-intel, https://github.com/williamwen42, https://github.com/jansel
2025-01-26 03:37:20 +00:00
Yichen Yan
d4171b724e Let tensor_a.new_tensor() be on tensor_a.device by default (#144958)
Fixes #144957
Closes #73838 cc @albanD @ezyang

Currently, `tensor_a.new_tensor()` returns a CPU tensor no matter which device `tensor_a` is on. This differs from the documentation and is a side effect of https://github.com/pytorch/pytorch/pull/41984.

See #144957 for how the current logic breaks dynamo.

This PR restores the documented behavior and adds tests for `new_tensor`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144958
Approved by: https://github.com/ezyang
2025-01-24 22:12:31 +00:00
Edward Z. Yang
b3e90c8c33 Add support for torch function on dtype arguments (#145085)
Along the lines of https://github.com/pytorch/pytorch/issues/119194 although it doesn't actually address the FCD case.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145085
Approved by: https://github.com/vmoens, https://github.com/Skylion007
2025-01-21 17:44:47 +00:00
garfield1997
3a5bf0bc36 expose extra torch_python apis (#144746)
Fixes #144302
After checking the code of our third-party device backend, I found that we also rely on these APIs, so I exposed them as discussed in the issue.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144746
Approved by: https://github.com/albanD
2025-01-16 20:50:31 +00:00
Brian Hirsh
4831f89790 support numbers as tensors for aten.copy(Tensor, Tensor) (#141161)
Fixes https://github.com/pytorch/pytorch/issues/141149. `aten.copy_` supports numbers as tensors in the python arg parser. So we need to give the same treatment to `aten.copy`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/141161
Approved by: https://github.com/ezyang
2025-01-16 00:08:25 +00:00
Zhengxu Chen
834086c023 [export] Load side info about pos/kw argument kind for serialization. (#144686)
Summary:
Fixing issue of nodes like
```
torch.ops.aten.linear.default(x, w, b)
```
being deserialized as
```
torch.ops.aten.linear.default(x, w, bias=b)
```
which breaks roundtripping.
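
A minimal sketch of why recording the argument kind preserves the round trip. The dictionary layout and the `POSITIONAL`/`KEYWORD` tags are assumptions for illustration, not the actual export schema.

```python
def serialize_call(target, args, kwargs):
    """Record every input together with how it was passed (assumed layout)."""
    inputs = [{"name": "", "value": a, "kind": "POSITIONAL"} for a in args]
    inputs += [{"name": k, "value": v, "kind": "KEYWORD"} for k, v in kwargs.items()]
    return {"target": target, "inputs": inputs}


def deserialize_call(node):
    """Rebuild args and kwargs using the recorded kind, not the op schema."""
    args = tuple(i["value"] for i in node["inputs"] if i["kind"] == "POSITIONAL")
    kwargs = {i["name"]: i["value"] for i in node["inputs"] if i["kind"] == "KEYWORD"}
    return node["target"], args, kwargs
```

With the kind stored per input, a call written as `linear(x, w, b)` comes back with three positional arguments instead of being rewritten to `bias=b`.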

Test Plan:
buck test mode/opt caffe2/test:test_export -- -r TestDeserialize
buck test mode/opt caffe2/test:test_export -- -r TestSerialize

Differential Revision: D67991410

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144686
Approved by: https://github.com/angelayi
2025-01-15 19:08:38 +00:00
dilililiwhy
7c52c97a65 Expose several APIs to public (torch python APIs) (#144525)
Fixes #144302
Exposes several APIs publicly for the privateuse1 scenario.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144525
Approved by: https://github.com/cyyever, https://github.com/albanD
2025-01-15 14:34:45 +00:00
Yiming Zhou
87843ee9ab [export] Unify single and multiple return for hops (#143227)
Summary: Introduce `is_hop_single_tensor_return` field to the `Node` class in serialization so that during deserialization when there is a single return, we know whether it is a tuple of a single element or a single element.
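
The ambiguity this flag resolves can be sketched in plain Python. Only the field name `is_hop_single_tensor_return` comes from the summary above; the dictionary layout and helper names are hypothetical.

```python
def serialize_outputs(result):
    """Flatten outputs to a list and remember whether the result was a bare value."""
    if isinstance(result, tuple):
        return {"outputs": list(result), "is_hop_single_tensor_return": False}
    return {"outputs": [result], "is_hop_single_tensor_return": True}


def deserialize_outputs(node):
    """Restore either the bare value or the tuple, as originally returned."""
    outputs = node["outputs"]
    if node["is_hop_single_tensor_return"]:
        return outputs[0]
    return tuple(outputs)
```

Without the flag, both `t` and `(t,)` serialize to a one-element list and cannot be told apart when loading.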

Test Plan:
```
buck2 run @mode/dev-nosan sigmoid/inference/test:e2e_test_cpu -- -r E2ETestCPUCond
buck2 run @mode/dev-nosan sigmoid/inference/test:test_passes -- -r test_const_folding2
```

Differential Revision: D66991624

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143227
Approved by: https://github.com/zhxchen17
2025-01-13 03:31:14 +00:00
Aaron Gokaslan
bbec35f028 [BE]: Replace clone detach with detach clone to be more efficient (#144469)
Follow up to #144270 and fix some vulkan code
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144469
Approved by: https://github.com/awgu
2025-01-09 18:28:39 +00:00
cyy
b0be30dd79 [19/N] Fix extra warnings brought by clang-tidy-17 (#144448)
Apply more clang-tidy fixes. There was a bug introduced by #144014 due to incorrect namespace concatenation which is reverted here.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144448
Approved by: https://github.com/albanD
2025-01-09 15:58:05 +00:00
cyy
d0070ca07e [18/N] Fix extra warnings brought by clang-tidy-17 (#144014)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144014
Approved by: https://github.com/Skylion007, https://github.com/albanD
2025-01-08 17:21:55 +00:00
Aaron Gokaslan
e4a05dec0f [BE][Ez]: Fix docs recommending inefficient tensor op order (#144270)
`detach().clone()` is faster than `.clone().detach()` since the gradients are not cloned. Let's update all the documentation and tests so that users do not use the inefficient op ordering.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144270
Approved by: https://github.com/awgu, https://github.com/XuehaiPan
2025-01-07 17:31:32 +00:00
cyy
af629a8146 Enable readability-redundant-declaration (#143982)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143982
Approved by: https://github.com/Skylion007
2024-12-31 00:20:10 +00:00
cyy
dca443835e Enable more readability-redundant checks (#143963)
They help simplify the code.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143963
Approved by: https://github.com/albanD
2024-12-30 14:49:33 +00:00
Aaron Orenstein
9bf4b1c2e9 dynamo tracing perf: c++ strip_function_call: 49.12 -> 47.77 (#143063)
See #143056 for overall docs.

This PR: Convert `strip_function_call()` into C++

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143063
Approved by: https://github.com/jansel
ghstack dependencies: #143057, #143062
2024-12-22 06:38:46 +00:00