Commit Graph

1312 Commits

Author SHA1 Message Date
Scott Wolchok
6a5a436624 DTensor: C++ compute_global_tensor_info (#162990)
compute_global_tensor_info is on the hot path for DTensor.{from,to}_local. More incremental progress toward C++.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/162990
Approved by: https://github.com/ezyang
2025-10-30 15:10:54 +00:00
Samuel Park
f58f301313 Fixes bug with tolist calls to GradTrackingTensors (#165184)
Fixes #161943

## The Fix
I implemented a recursive unwrapping helper function in the `tensor_to_list.cpp` file that looks for wrapped tensors and unwraps them. The recursive implementation was needed for multi-level gradTrackingTensors.

Let me know if there is any more suggestions on fixing this issue!

@guilhermeleobas @KimbingNg

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165184
Approved by: https://github.com/zou3519
2025-10-15 12:54:28 +00:00
Yuanyuan Chen
36871622f1 [2/N] Mark unused parameters in C++ code (#165121)
This is follow-up of #164912 to mark unused C++ parameters to improve code readability.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165121
Approved by: https://github.com/Skylion007
2025-10-15 03:04:39 +00:00
Laith Sakka
7f2a902ea2 more sizelike deprecation (#164889)
remove expext_size c++ bindings and usages

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164889
Approved by: https://github.com/mlazos
ghstack dependencies: #164884, #164885, #164886, #164887, #164888
2025-10-10 03:45:06 +00:00
albanD
24d69c57cb Add view support for library custom Function (#164520)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164520
Approved by: https://github.com/soulitzer, https://github.com/ezyang
2025-10-09 16:17:48 +00:00
Chris Leonard
e89d12bf5d Numpy zerotensor handling (#164487)
Fixes #89034

Updated tensor_to_numpy() function in tensor_numpy.cpp to handle ZeroTensors by throwing an error if force=False and returning an array full of zeros if force=True.

@ngimel, I just saw that you mentioned PyTorch is not too concerned with this issue but I had already worked on it so I figured I would push it anyways and see what you thought. Feel free to close the PR if you think it is not worth merging.

@albanD

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164487
Approved by: https://github.com/izaitsevfb
2025-10-07 00:34:14 +00:00
PyTorch MergeBot
1fc71d1b57 Revert "Numpy zerotensor handling (#164487)"
This reverts commit f7ad6dbad6.

Reverted https://github.com/pytorch/pytorch/pull/164487 on behalf of https://github.com/malfet due to Did it break torchbench?, see 8c728e129d/1 ([comment](https://github.com/pytorch/pytorch/pull/164487#issuecomment-3374635051))
2025-10-06 23:32:12 +00:00
Scott Wolchok
11f5f65686 Use PyObject_GetOptionalAttrString in PyObject_FastGetAttrString when available (#164624)
Python 3.13 added PyObject_GetOptionalAttrString. I'm not 100% certain that it is strictly better than the old approach in all cases, but based on documentation/comments it seems to be meant for this type of use, and it's faster when I profile torchtitan training (which gets to the "check for the `__torch_function__` attr on some object" part of maybe_has_torch_function frequently enough to notice, but wastes a bunch of time generating exceptions that we then suppressed here).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164624
Approved by: https://github.com/Skylion007
2025-10-06 18:26:09 +00:00
Chris Leonard
f7ad6dbad6 Numpy zerotensor handling (#164487)
Fixes #89034

Updated tensor_to_numpy() function in tensor_numpy.cpp to handle ZeroTensors by throwing an error if force=False and returning an array full of zeros if force=True.

@ngimel, I just saw that you mentioned PyTorch is not too concerned with this issue but I had already worked on it so I figured I would push it anyways and see what you thought. Feel free to close the PR if you think it is not worth merging.

@albanD

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164487
Approved by: https://github.com/ngimel, https://github.com/albanD
2025-10-04 12:03:48 +00:00
PyTorch MergeBot
c6329524d8 Revert "Add magic TORCH_MAKE_PYBIND_ENUM_FASTER macro (#163527)"
This reverts commit 50c0550f5a.

Reverted https://github.com/pytorch/pytorch/pull/163527 on behalf of https://github.com/swolchok due to breaking import torch in debug builds, see #164297 ([comment](https://github.com/pytorch/pytorch/pull/163527#issuecomment-3361919142))
2025-10-02 15:42:42 +00:00
PyTorch MergeBot
cc5d74c366 Revert "[BE] Remove HermeticPyObjectTLS and Simplify PythonOpRegistrationTrampoline (#163464)"
This reverts commit 94195a37ae.

Reverted https://github.com/pytorch/pytorch/pull/163464 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/163464#issuecomment-3353307034))
2025-09-30 18:20:20 +00:00
Yuanyuan Chen
3766513d25 Remove C++ workarounds for Python < 3.10 (#164055)
Remove two unnecessary `PY_VERSION_HEX` branches.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164055
Approved by: https://github.com/ezyang
2025-09-28 20:00:02 +00:00
Scott Wolchok
50c0550f5a Add magic TORCH_MAKE_PYBIND_ENUM_FASTER macro (#163527)
See comment on the macro definition. In short, pybind11 3.x
added `py::native_enum`, and also had to add overhead for that new way
to bind enums on the critical path for calling functions that take
regular old `py::enum_`s as arguments (for example, `__eq__`).

Differential Revision: [D82873169](https://our.internmc.facebook.com/intern/diff/D82873169/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/163527
Approved by: https://github.com/ezyang
2025-09-26 17:59:22 +00:00
arkadip-maitra
d7491fb1c1 Fix tensor creation with empty names crash (#163957)
Partially fixes #148324

Pull Request resolved: https://github.com/pytorch/pytorch/pull/163957
Approved by: https://github.com/malfet, https://github.com/janeyx99
2025-09-26 17:41:00 +00:00
PaliC
94195a37ae [BE] Remove HermeticPyObjectTLS and Simplify PythonOpRegistrationTrampoline (#163464)
Removes HermeticPyObjectTLS as we no longer need since torch deploy is no longer supported. PythonOpRegistrationTrampoline is also drastically simplified as and being prepped for removal in a future PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/163464
Approved by: https://github.com/albanD, https://github.com/Skylion007
2025-09-25 23:30:50 +00:00
FFFrog
33aabdd8ac [Code Clean] Remove deadcodes about Python3.9 [3/N] (#163629)
As the title stated.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/163629
Approved by: https://github.com/albanD
ghstack dependencies: #163626, #163627
2025-09-24 07:30:50 +00:00
Pian Pawakapan
4c007073e6 [dynamic shapes] DynamicInts prototype (#162194)
Initial prototype for dynamic int inputs, allows users to run with `torch.compile(f)(DynamicInt(4))`, compiling dynamically and using the underlying hint at runtime.

Current behavior:
- Also works in eager (mostly by subclassing int), as scalar input to torch functions, or numpy/math/etc. For example, `x = DynamicInt(3); torch.randn(x); torch.add(y, z, alpha=x); np.arange(x)` all act as if x = 3.
- Behavior for arithmetic ops is to return new DynamicInts rather than static ints; `DynamicInt(3) * 2 = DynamicInt(6)`. This is via SymNode magic methods, but coverage might not be 100% - for example, I had to explicitly override floordiv to avoid int casting. This is not necessarily the case for non-magic method ops (e.g. `math.cos(x)`). The alternative here is to int cast on all operations, but I opted for this for dynamism propagation in non-compiled regions.
- Doesn't ban fullgraph=False; DynamicInt objects might be leaked back to the user, but I guess this is fine, because they can be casted to ints when needed?
- Dynamo only allocates one symbol per DynamicInt; specifying the same DynamicInt for multiple inputs leads to input deduplication, and a guard installed.
- We don't raise on int specialization (in allowlist/maybe_mark_dynamic style) - but an easy change if needed.
- DynamicInts as nn.Module attributes are handled.
- We don't guard on the DynamicInt id, e.g. users can do the following without recompiling (maybe we should guard?)
```python
x = DynamicInt(4)
f(x)
f(1)
f(DynamicInt(3))  # same as f(3)
```

Follow-up work:
- Specifying shape constraints, either at the int-level, e.g.
```python
DynamicInt(64, name="s0", constraints=["s0 % 32 == 0", "s0 <= 1024"]
```
or at the compilation level, e.g. something like
```python
s0 = DynamicInt(64, name="s0")
s1 = DynamicInt(128, name="s1")
with some_compiler_config.dynamic_int_constraints(["s1 == 2*s0", "s0 % 32 == 0"]):
    f(s0, s1)
```
This should subsume the need for specifying derived SymInts?
- SymFloat support - currently it seems backed floats are specialized by the tensorify float pass, and there's no handling in inductor.
- Propagating dynamism in tensor constructors, e.g. `x = DynamicInt(4); torch.randn(x)` could annotate `_dynamo_dynamic_indices`.

Differential Revision: D81698719

Pull Request resolved: https://github.com/pytorch/pytorch/pull/162194
Approved by: https://github.com/bobrenjc93
2025-09-18 23:26:28 +00:00
zeshengzong
e3783a9575 Replace std::runtime_error with TORCH_CHECK (#159344)
Fixes part of #148114

Pull Request resolved: https://github.com/pytorch/pytorch/pull/159344
Approved by: https://github.com/cyyever, https://github.com/albanD
2025-09-16 09:00:06 +00:00
PyTorch MergeBot
deb7ebe0a3 Revert "[Reland] Use std::string_view in torchgen (#158625)"
This reverts commit 972e409829.

Reverted https://github.com/pytorch/pytorch/pull/158625 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it seems to break a couple of ExecuTorch tests for Vulkan backend ([comment](https://github.com/pytorch/pytorch/pull/158625#issuecomment-3287754275))
2025-09-13 07:52:50 +00:00
Yuanyuan Chen
972e409829 [Reland] Use std::string_view in torchgen (#158625)
Reland of #157050, which is incidentally closed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/158625
Approved by: https://github.com/albanD
2025-09-12 08:31:54 +00:00
Scott Wolchok
12db2a7889 Call checkLong in is_int_or_symint, completing TODO (#161692)
Calling this first minimizes overhead for plain old ints, making cheap things cheap.

Differential Revision: [D81530098](https://our.internmc.facebook.com/intern/diff/D81530098)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/161692
Approved by: https://github.com/ezyang, https://github.com/Skylion007
ghstack dependencies: #161591, #161595, #161633, #161634
2025-09-09 01:10:06 +00:00
Avik Chaudhuri
711c8c821e shape guards (#161178)
Summary: This PR introduces shape guards to export. Previously only value ranges,  equalities, and specializations would be tracked for symbolic expressions, and we had a forward hook to check them. Instead now we create a function to check shape guards and call it in the exported program.

Test Plan:
updated several tests

Rollback Plan:

Differential Revision: D80713603

Pull Request resolved: https://github.com/pytorch/pytorch/pull/161178
Approved by: https://github.com/tugsbayasgalan
2025-09-08 22:44:09 +00:00
rzou
70d36e047d Making batching rule for F.embedding DTensor-aware (#162117)
`vmap(F.embedding)(DTensor, DTensor)` was failing because F.embedding's
batching rule generates a new tensor via at::arange, at::arange
generates a regular tensor, and DTensor rightfully errors on mixed
DTensor-regular Tensor operations.

This PR fixes the problem by activating DTensor implicit replication on
just the at::arange and the subsequent add operation.

In order to accomplish this I move the DTensor implicit replication flag
to C++ (most batching rules are in C++).

Test Plan:
- new test

Pull Request resolved: https://github.com/pytorch/pytorch/pull/162117
Approved by: https://github.com/bdhirsh
2025-09-05 21:40:14 +00:00
Scott Wolchok
f8ffa9194e Perf nitpicks on python_arg_parser's is_int_or_symint_list (#161998)
This function has come up in DTensor perf work, and I had a nitpick on #160256 so here it is. I have neither compiled nor measured this, but am reasonably confident it's better nonetheless.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/161998
Approved by: https://github.com/ezyang
2025-09-03 05:38:30 +00:00
Edward Yang
9a1c5c0a07 Detect torch function in lists as well (#160256)
We basically follow the same pattern we do for tensor arguments. The major downside is we now have to traverse the entirety of the int list / etc where previously we didn't have. Benchmark suggests 2% regression for relevant things.

Signed-off-by: Edward Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160256
Approved by: https://github.com/albanD
2025-09-02 16:22:42 +00:00
Edward Yang
d9d6dde0f4 Leak Python filenames so that we can give good dispatcher errors. (#160418)
Signed-off-by: Edward Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160418
Approved by: https://github.com/zou3519
2025-08-31 22:31:39 +00:00
Scott Wolchok
68738beff7 PythonArgs::toBool: order cheap mutually exclusive checks first (#161455)
symbools are not identical with Py_True or PyFalse, so we can do those cheap checks first and at least get plain old bools to go fast.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/161455
Approved by: https://github.com/Skylion007
ghstack dependencies: #161301, #161292, #161304, #161308, #161315, #161317, #161328, #161329, #161432
2025-08-31 21:35:48 +00:00
Scott Wolchok
4d3ab2669b Stop trying to intern arguments in PyObject_FastGetAttrString (#161432)
If we want them interned, we should intern at callsites. (The numpy reference has bit rotted; see b222eb66c7 (diff-6bdb6105198083838f51c57b55b3a49472ed23043bb40018f1ea41138e687163))

Profiling a simple torchdispatch benchmark with perf before/after seems to show that time spent copying std::strings and interning Python strings is gone, though there is some noise and the improvement is very small.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/161432
Approved by: https://github.com/ezyang
ghstack dependencies: #161301, #161292, #161304, #161308, #161315, #161317, #161328, #161329
2025-08-30 06:55:43 +00:00
Yiming Zhou
a55d2beb50 [export] Support complex constant in serde (#161517)
Summary:

Fixes #160749

For a model like
```
class M(torch.nn.Module):
    def forward(self, x):
        s = torch.sin(x)
        z = 1j * s
        return z
```
Its graph will be
```
graph():
    %x : [num_users=1] = placeholder[target=x]
    %sin : [num_users=1] = call_function[target=torch.ops.aten.sin.default](args = (%x,), kwargs = {})
    %mul : [num_users=1] = call_function[target=torch.ops.aten.mul.Tensor](args = (%sin, 1j), kwargs = {})
    return (mul,)
```

`1j` will appear as a constant complex argument in the `aten.mul`

Test Plan:
buck2 run mode/dev-nosan caffe2/test:test_export -- -r test_complex_constant

Rollback Plan:

Differential Revision: D80672323

Pull Request resolved: https://github.com/pytorch/pytorch/pull/161517
Approved by: https://github.com/angelayi
2025-08-29 08:13:21 +00:00
Yiming Zhou
089ad1d88b [1/n][export] Refactor PT2 Archive weight saving and loading (#160394)
Summary:

We split the refactoring in two parts for forward compatibility concerns
First, we land the deserialization (loading part)
Then, we land the serialization (saving part)

Save weights and constants as individual files in PT2 archive. Each weight/constant will be saved as raw bytes, unless it is a custom object (TorchBind object) or a non-fake tensor subclass, for these two special cases we still save them using pickle.

The metadata of saved tensors along with the file name will be saved as `PayloadMeta`.
The mapping from FQN to `PayloadMeta` will be saved as `PayloadConfig` under `WEIGHTS_CONFIG_FORMAT` and `CONTANTS_CONFIG_FORMAT`

This changes the serialization in python side when calling `torch.export.save()`.

For deserialization in python `torch.export.load()`, we make it BC-safe by allowing loading legacy format weights/constants.

For deserialization in C++ `torch/nativert/ModelRunner.cpp`, we make this a BC breaking change as currently the OSS ModelRunner API is not being used.

The file structure

```
├── archive_format
├── archive_version
├── byteorder
├── .data
│   ├── serialization_id
│   └── version
├── data
│   ├── sample_inputs
│   │   └── model.pt
│   ├── constants
│   │   ├── tensor_0
│   │   ├── tensor_1
│   │   └── model_constants_config.json
│   └── weights
│       ├── weight_0
│       ├── weight_1
│       ├── weight_2
│       ├── weight_3
│       └── model_weights_config.json
└── models
    └── model.json
```

Test Plan:
CI

Rollback Plan:

Differential Revision: D80035490

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160394
Approved by: https://github.com/SherlockNoMad
2025-08-26 01:15:42 +00:00
Yiming Zhou
be2e6b3158 [export] Remove unused Model, tensor_paths, constant_paths (#161185)
Summary:
Removed `Model`, it's not being used anywhere so it's safe.

Removed `tensor_paths` and `constant_paths` fields in `ExportedProgram`
- BC: when the current deserializer load a previously serialized EP (that comes with empty `tensor_paths` and `constant_paths`), it will just ignore those two fields
- FC: when the old deserializer load a newly serialized EP (that doesn't come with `tensor_paths` and `constant_paths`, it will also ignore those two fields in `_dict_to_dataclass()`

Differential Revision: D80725094

Pull Request resolved: https://github.com/pytorch/pytorch/pull/161185
Approved by: https://github.com/SherlockNoMad
2025-08-22 01:07:01 +00:00
Kaichao You
dae7710bf2 [cuda][cupy] Improve cupy device placement when device is provided with explicit index (#158529)
resubmit https://github.com/pytorch/pytorch/pull/158320 , fixing a potential bug when device index is not specified explicitly.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/158529
Approved by: https://github.com/ezyang
2025-08-15 00:27:42 +00:00
Tugsbayasgalan (Tugsuu) Manlaibaatar
194fcfcfbd Add support for param mutation under inference mode (#159661)
Summary:
In HF model rwkv, we have parameter mutation under inference mode which should be safe. This PR does multiple things to make sure it works:
1. We execute global autograd mutation while tracing so that we can actually trace through parameter inplace mutation
2. Add support for parameter mutation under inference mode in AOTAutograd
3. Add support for parameter mutation under inference mode in export.

Test Plan:
test

Rollback Plan:

Differential Revision: D79460136

Pull Request resolved: https://github.com/pytorch/pytorch/pull/159661
Approved by: https://github.com/ydwu4
2025-08-14 03:34:04 +00:00
Mikayla Gawarecki
1196bb1c2e Add utility to get computed kernel in torch.library (#158393)
Adds `OperatorEntry::getComputedKernelForDispatchKey` which returns the KernelFunction corresponding to `OperatorEntry.dispatchTable_[dispatch_ix]` for a given dispatch key
- Specifically it returns a `SafeKernelFunction` that holds a `KernelToken`. This `KernelToken` is registered to the `KernelFunction` in `OperatorEntry.kernels_` and will be invalidated when the `KernelFunction` is destructed (i.e. when the `AnnotatedKernel` that holds this `KernelFunction` is removed from `kernels_`, which happens when the corresponding impl is deregistered).
- `SafeKernelFunction` can be called via `callBoxed`, the validity of the token will be checked before this happens
- `SafeKernelFunction` is pybinded and `getComputedKernelForDispatchKey` is exposed to the frontend ia `torch.library.get_kernel`

Related to https://github.com/pytorch/pytorch/issues/155330

Pull Request resolved: https://github.com/pytorch/pytorch/pull/158393
Approved by: https://github.com/albanD
2025-08-13 21:00:59 +00:00
Sherlock Huang
99bc2f94c1 Update export/schema.py (#160220)
Summary:
Model could have multiple ExportedPrograms
- for different methods. They can have different weights.
- for different delegates. They can also have different weights.

For this reason, we make weight per ExportedProgram.

Also, we cleanup Model, and Program. IIUC, Model and Program are not used anywhere, so it's ok to make BC breaking change.

Test Plan:
CI

Rollback Plan:

Differential Revision: D79917395

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160220
Approved by: https://github.com/angelayi, https://github.com/dolpm, https://github.com/jingsh
2025-08-11 23:14:08 +00:00
Nikita Shulga
d8cb3db533 Add unsigned support to IValue (#160102)
- Moved repeated logic of saving int64/uint64 into a polymorphic container into `THPUtils_unpackInteger`
- Added `TestPythonDispatch.test_dispatch_uint64` regression test

Fixes https://github.com/pytorch/pytorch/issues/159168

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160102
Approved by: https://github.com/ezyang
2025-08-11 03:57:18 +00:00
Edward Z. Yang
03b254e49f Extend torch function support to ALL arguments, not just scalar type (but not insides of list) (#145089)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145089
Approved by: https://github.com/albanD, https://github.com/zou3519
2025-08-07 23:43:53 +00:00
Sherlock Huang
aa75e917bd [Export Schema] Remove deviceAllocationMap field (#159653)
Summary:
This field is not used today, and it's not useful either.

The device allocation is configured at model loading time, specified by user.
It shouldn't be part of the model definition.

Test Plan:
CI

Rollback Plan:

Differential Revision: D79385513

Pull Request resolved: https://github.com/pytorch/pytorch/pull/159653
Approved by: https://github.com/zhxchen17
2025-08-07 07:31:42 +00:00
Zhengxu Chen
6a82da392e [export] Fix generated schema for C++20/23 (#159871)
Summary: Fixing the issue from https://github.com/pytorch/pytorch/issues/159838

Test Plan:
buck run caffe2/:export_update_schema -- --prefix /data/users/$USER/fbsource/fbcode/caffe2/

Rollback Plan:

Differential Revision: D79647167

Pull Request resolved: https://github.com/pytorch/pytorch/pull/159871
Approved by: https://github.com/malfet
2025-08-06 00:23:05 +00:00
PaliC
1b99c1859c [BE] Make PyObjectSlot use a global PyInterpreter and remove (#158427)
This PR is a bit more involved but effectively works to drastically simplify PyObjectSlot and PyInterpreter.
1) For PyObjectSlot we now use a global pyinterpreter since there only is one. From here we change all of the call sites to rely on this assumption.
2) We also remove the "tags" of the PyInterpreter by deprecating `PyInterpreterStatus`.

For the reviewer, sadly it seems like `functorch/csrc/dim/dim.cpp` needed to get linted, so there is an unreadable amount of changes there. Fortunately, the only actual change in the file is as follows which just removes `getPyInterpreter()` from  the `check_pyobj` call.

```
 mpy::handle handle_from_tensor(Arena& A, TensorRef t) {
-    // fast case: tensor is live in python
-    std::optional<PyObject*> mb_obj =
-        t->unsafeGetTensorImpl()->pyobj_slot()->check_pyobj(getPyInterpreter(), /*ignore_hermetic_tls=*/false);
-    if (mb_obj.has_value() && !t->unsafeGetTensorImpl()->pyobj_slot()->owns_pyobj()) {
-        return *mb_obj;
-    }
-    return A.autorelease(mpy::object::checked_steal(THPVariable_Wrap(*t)));
-}
-}
+  // fast case: tensor is live in python
+  std::optional<PyObject*> mb_obj =
+      t->unsafeGetTensorImpl()->pyobj_slot()->check_pyobj(
+          /*ignore_hermetic_tls=*/false);
+  if (mb_obj.has_value() &&
+      !t->unsafeGetTensorImpl()->pyobj_slot()->owns_pyobj()) {
+    return *mb_obj;
+  }
+  return A.autorelease(mpy::object::checked_steal(THPVariable_Wrap(*t)));
+}
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/158427
Approved by: https://github.com/albanD
2025-07-30 17:29:43 +00:00
PaliC
6162e650b0 [BE] remove torch deploy - conditionals (#158288)
This PR is part of the work to deprecate torch::deploy in OSS. Effectively it does 3 things to get started.
1. Remove test_deploy_interaction as we no longer need to worry about this
2. Remove all torch._running_with_deploy checks and use the False path always (surfaced 1)
3. Remove `USE_DEPLOY` and switch to the default path always

Note: MyPy does fail on a bunch of things here as a bunch of older files are touched. It may be better to fix these things on a separate PR

Pull Request resolved: https://github.com/pytorch/pytorch/pull/158288
Approved by: https://github.com/albanD
2025-07-29 17:40:49 +00:00
gaoyufeng
f5cf05c983 Throw invalid_argument instead of RuntimeError when parameters exceed… (#158267)
Throw invalid_argument instead of RuntimeError when parameters exceed limits (for torch.int32 dtype)

Fixes #157707

Pull Request resolved: https://github.com/pytorch/pytorch/pull/158267
Approved by: https://github.com/albanD
2025-07-25 23:49:46 +00:00
PyTorch MergeBot
f8fafdc7a6 Revert "[BE] remove torch deploy - conditionals (#158288)"
This reverts commit ab26d4fbeb.

Reverted https://github.com/pytorch/pytorch/pull/158288 on behalf of https://github.com/ZainRizvi due to Reverting as per offline discussion to fix internal breaks.  @PaliC will reland this as a codev diff. Instructions here: https://fburl.com/fixing-ghfirst-reverts ([comment](https://github.com/pytorch/pytorch/pull/158288#issuecomment-3119037960))
2025-07-25 16:09:39 +00:00
PaliC
ab26d4fbeb [BE] remove torch deploy - conditionals (#158288)
This PR is part of the work to deprecate torch::deploy in OSS. Effectively it does 3 things to get started.
1. Remove test_deploy_interaction as we no longer need to worry about this
2. Remove all torch._running_with_deploy checks and use the False path always (surfaced 1)
3. Remove `USE_DEPLOY` and switch to the default path always

Note: MyPy does fail on a bunch of things here as a bunch of older files are touched. It may be better to fix these things on a separate PR

Pull Request resolved: https://github.com/pytorch/pytorch/pull/158288
Approved by: https://github.com/albanD
2025-07-23 20:27:28 +00:00
PyTorch MergeBot
ee5a434f8c Revert "[BE] remove torch deploy - conditionals (#158288)"
This reverts commit 1a4268b811.

Reverted https://github.com/pytorch/pytorch/pull/158288 on behalf of https://github.com/ZainRizvi due to Sorry but this is breaking internally, see D78496147 for details. To validate your fixes internally, you can follow the instructions here: https://fburl.com/fixing-ghfirst-reverts ([comment](https://github.com/pytorch/pytorch/pull/158288#issuecomment-3099826158))
2025-07-21 23:17:39 +00:00
PyTorch MergeBot
15a50dcf1c Revert "[BE] Make PyObjectSlot use a global PyInterpreter and remove (#158427)"
This reverts commit eb73650723.

Reverted https://github.com/pytorch/pytorch/pull/158427 on behalf of https://github.com/ZainRizvi due to Reverting this as part of reverting the stack for https://github.com/pytorch/pytorch/pull/158288 ([comment](https://github.com/pytorch/pytorch/pull/158427#issuecomment-3099815367))
2025-07-21 23:14:57 +00:00
PaliC
eb73650723 [BE] Make PyObjectSlot use a global PyInterpreter and remove (#158427)
This PR is a bit more involved but effectively works to drastically simplify PyObjectSlot and PyInterpreter.
1) For PyObjectSlot we now use a global pyinterpreter since there only is one. From here we change all of the call sites to rely on this assumption.
2) We also remove the "tags" of the PyInterpreter by deprecating `PyInterpreterStatus`.

For the reviewer, sadly it seems like `functorch/csrc/dim/dim.cpp` needed to get linted, so there is an unreadable amount of changes there. Fortunately, the only actual change in the file is as follows which just removes `getPyInterpreter()` from  the `check_pyobj` call.

```
 mpy::handle handle_from_tensor(Arena& A, TensorRef t) {
-    // fast case: tensor is live in python
-    std::optional<PyObject*> mb_obj =
-        t->unsafeGetTensorImpl()->pyobj_slot()->check_pyobj(getPyInterpreter(), /*ignore_hermetic_tls=*/false);
-    if (mb_obj.has_value() && !t->unsafeGetTensorImpl()->pyobj_slot()->owns_pyobj()) {
-        return *mb_obj;
-    }
-    return A.autorelease(mpy::object::checked_steal(THPVariable_Wrap(*t)));
-}
-}
+  // fast case: tensor is live in python
+  std::optional<PyObject*> mb_obj =
+      t->unsafeGetTensorImpl()->pyobj_slot()->check_pyobj(
+          /*ignore_hermetic_tls=*/false);
+  if (mb_obj.has_value() &&
+      !t->unsafeGetTensorImpl()->pyobj_slot()->owns_pyobj()) {
+    return *mb_obj;
+  }
+  return A.autorelease(mpy::object::checked_steal(THPVariable_Wrap(*t)));
+}
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/158427
Approved by: https://github.com/albanD
2025-07-18 05:23:00 +00:00
Kevin Fu
ad223a6c5f Add FP8 Types (#158430)
Summary: Add FP8 Types

Test Plan:
sandcastle

Rollback Plan:

Differential Revision: D78395110

Pull Request resolved: https://github.com/pytorch/pytorch/pull/158430
Approved by: https://github.com/henryoier
2025-07-17 18:09:56 +00:00
PaliC
1a4268b811 [BE] remove torch deploy - conditionals (#158288)
This PR is part of the work to deprecate torch::deploy in OSS. Effectively it does 3 things to get started.
1. Remove test_deploy_interaction as we no longer need to worry about this
2. Remove all torch._running_with_deploy checks and use the False path always (surfaced 1)
3. Remove `USE_DEPLOY` and switch to the default path always

Note: MyPy does fail on a bunch of things here as a bunch of older files are touched. It may be better to fix these things on a separate PR

Pull Request resolved: https://github.com/pytorch/pytorch/pull/158288
Approved by: https://github.com/albanD
2025-07-17 05:56:07 +00:00
PyTorch MergeBot
944a140e90 Revert "[cuda][cupy] Improve cupy device placement when device is provided (#158320)"
This reverts commit 59f9b25f3c.

Reverted https://github.com/pytorch/pytorch/pull/158320 on behalf of https://github.com/wdvr due to reverting because most likely causing test/test_numba_integration.py::TestNumbaIntegration::test_from_cuda_array_interface_inferred_strides to fail ([comment](https://github.com/pytorch/pytorch/pull/158320#issuecomment-3079960616))
2025-07-16 19:15:33 +00:00