Commit Graph

431 Commits

Author SHA1 Message Date
Scott Wolchok
7d16fcf2df Re-re-re-re-apply "C++-accessible Placements via pybind11 (#163030)" (#166132)
Was reverted (again!) due to a merge conflict that crept in sometime during the "export to github -> land internally -> merge on github" process.

D85096233

Pull Request resolved: https://github.com/pytorch/pytorch/pull/166132
Approved by: https://github.com/Skylion007, https://github.com/ezyang, https://github.com/malfet
2025-10-27 21:19:32 +00:00
Catherine Lee
c6a02eae5b Add XLAHooksInterface to bazel file (#166179)
Differential Revision: D85446553

Internal builds failing after https://github.com/pytorch/pytorch/pull/161369

```
buck-headers/ATen/Context.h:22:10: fatal error: 'ATen/detail/XLAHooksInterface.h' file not found
   22 | #include <ATen/detail/XLAHooksInterface.h>
      |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 error generated.
```

Changes like the ones in that PR also need to update the build_variables file, which I've done here. I'm not sure why this wasn't caught by our bazel build.

Sanity checked that some of the previously failing builds pass after this change.
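
For illustration, a hedged sketch of the kind of build_variables.bzl entry involved (the list name and path are assumptions, not taken from this PR):

```
# build_variables.bzl (sketch): the file lists here feed both the buck and
# bazel builds, so new sources/headers must be registered to keep them in sync.
aten_cpu_source_non_codegen_list = [
    # ...existing entries...
    "aten/src/ATen/detail/XLAHooksInterface.cpp",
]
```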
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166179
Approved by: https://github.com/Camyll
2025-10-27 18:47:06 +00:00
Eddie Yan
e64a814ae7 [CUDA] Add experimental green context support for SM carveout (#159104)
The low-level PyTorch APIs should be usable/stable enough at this point, but we might still move the underlying driver API usage a bit from here...

Built on top of @drisspg 's branch

Pull Request resolved: https://github.com/pytorch/pytorch/pull/159104
Approved by: https://github.com/ngimel, https://github.com/malfet, https://github.com/kwen2501

Co-authored-by: drisspg <drisspguessous@gmail.com>
Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
2025-10-22 21:38:52 +00:00
soulitzer
71aefd5595 [reland] Allow setting grad_dtype on leaf tensors (#164751)
ghstack-source-id: e44b3941530be83a630ec93f1478eec741ffca2e
Pull-Request-resolved: https://github.com/pytorch/pytorch/pull/162815

Fixes #ISSUE_NUMBER

Relanding due to internal weirdness. Separate PR to codev w/o ghstack.
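
Based only on the PR title, a hedged sketch of the feature (the attribute form and semantics below are assumptions):

```
import torch

w = torch.randn(3, requires_grad=True)  # leaf tensor
# w.grad_dtype = torch.bfloat16         # assumed usage: accumulate grads in bf16
# w.sum().backward()
# assert w.grad.dtype == torch.bfloat16
```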

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164751
Approved by: https://github.com/albanD
2025-10-08 20:23:13 +00:00
PyTorch MergeBot
df640df68a Revert "Reapply "C++-accessible Placements via pybind11 (#163030)" (#164519)"
This reverts commit 8c0bc879b9.

Reverted https://github.com/pytorch/pytorch/pull/164519 on behalf of https://github.com/malfet due to Still breaks internal workflows ([comment](https://github.com/pytorch/pytorch/pull/164519#issuecomment-3378469432))
2025-10-07 19:46:17 +00:00
Scott Wolchok
8c0bc879b9 Reapply "C++-accessible Placements via pybind11 (#163030)" (#164519)
This makes Placement data representation available in C++ via pybind11. Reapply with fix for internal errors.

D83788896

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164519
Approved by: https://github.com/Skylion007, https://github.com/ezyang
2025-10-06 23:19:14 +00:00
PyTorch MergeBot
f6f7676756 Revert "C++-accessible Placements via pybind11 (#163030)"
This reverts commit 3e03deab6f.

Reverted https://github.com/pytorch/pytorch/pull/163030 on behalf of https://github.com/swolchok due to doesn't pass pyre ([comment](https://github.com/pytorch/pytorch/pull/163030#issuecomment-3362450379))
2025-10-02 18:25:24 +00:00
Scott Wolchok
3e03deab6f C++-accessible Placements via pybind11 (#163030)
This makes Placement data representation available in C++ via pybind11.
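
For context, a hedged illustration of the Python-side Placement objects whose data representation is being exposed (import path per the current DTensor Python API, not from this PR):

```
# Placements describe how a DTensor maps onto a device mesh.
from torch.distributed.tensor import Shard, Replicate

placements = (Shard(0), Replicate())  # shard dim 0 on mesh dim 0; replicate on mesh dim 1
print(placements[0].dim)  # 0
```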

Pull Request resolved: https://github.com/pytorch/pytorch/pull/163030
Approved by: https://github.com/ezyang
2025-10-02 02:38:23 +00:00
Han Qi
b5c4f46bb9 Add functions to setup PrivateUse1 as a python backend device. (#157859)
Fixes #156052 and #156444.

This PR sets up the PrivateUse1 key in Python so it can be used as a Python backend for PyTorch. Meaning that, after calling `setup_privateuseone_for_python_backend('npy')`, one can use a tensor subclass with that device to hold arbitrary Python data as "device data" and use `torch.library` to register ops that take that Tensor.

Changes done in this PR:

1. Register a vanilla Device Guard: I extended NoOpDeviceGuard to allow a device index of 0 and to not raise errors when event-related functions are accessed. Without those changes, calling backward would raise errors. (The CPU backend uses NoOpDeviceGuard just fine, although there seems to be special treatment of CPU in the autograd engine.)
2. Tensor subclasses may now omit `__torch_dispatch__` if the device is not CUDA or CPU. The comment on the check suggests it was there to avoid segfaults when calling into ops that expect a storage; here we have a different device, so we will not call into those ops.
3. Add a Python function that invokes the other incantations to set up the PrivateUse1 backend.

This took inspiration from https://github.com/bdhirsh/pytorch_open_registration_example and https://github.com/tinygrad/tinygrad/blob/master/extra/torch_backend/wrapped_tensor.cpp; great thanks to @bdhirsh and @geohot.
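
For illustration, a minimal sketch of the resulting flow (the new helper's import path isn't shown in this PR, and `npy_add` is a hypothetical kernel):

```
import torch

# Name the PrivateUse1 dispatch key "npy" (existing public API).
torch.utils.rename_privateuse1_backend("npy")

# setup_privateuseone_for_python_backend("npy")  # helper added by this PR; import path omitted

# Register a stand-in kernel for aten::add on the PrivateUse1 key via torch.library.
lib = torch.library.Library("aten", "IMPL")

def npy_add(x, y, *, alpha=1):
    # Hypothetical kernel body; a real backend would unwrap its "device data".
    return torch.empty(x.shape)

lib.impl("add.Tensor", npy_add, "PrivateUse1")
```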
Pull Request resolved: https://github.com/pytorch/pytorch/pull/157859
Approved by: https://github.com/albanD
2025-10-01 21:32:59 +00:00
PyTorch MergeBot
410ed3006b Revert "Add functions to setup PrivateUse1 as a python backend device. (#157859)"
This reverts commit 1310d6a1f9.

Reverted https://github.com/pytorch/pytorch/pull/157859 on behalf of https://github.com/jeanschmidt due to introduce linting errors ([comment](https://github.com/pytorch/pytorch/pull/157859#issuecomment-3352140098))
2025-09-30 13:24:37 +00:00
Han Qi
1310d6a1f9 Add functions to setup PrivateUse1 as a python backend device. (#157859)
Fixes #156052 and #156444.

This PR sets up the PrivateUse1 key in Python so it can be used as a Python backend for PyTorch. Meaning that, after calling `setup_privateuseone_for_python_backend('npy')`, one can use a tensor subclass with that device to hold arbitrary Python data as "device data" and use `torch.library` to register ops that take that Tensor.

Changes done in this PR:

1. Register a vanilla Device Guard: I extended NoOpDeviceGuard to allow a device index of 0 and to not raise errors when event-related functions are accessed. Without those changes, calling backward would raise errors. (The CPU backend uses NoOpDeviceGuard just fine, although there seems to be special treatment of CPU in the autograd engine.)
2. Tensor subclasses may now omit `__torch_dispatch__` if the device is not CUDA or CPU. The comment on the check suggests it was there to avoid segfaults when calling into ops that expect a storage; here we have a different device, so we will not call into those ops.
3. Add a Python function that invokes the other incantations to set up the PrivateUse1 backend.

This took inspiration from https://github.com/bdhirsh/pytorch_open_registration_example and https://github.com/tinygrad/tinygrad/blob/master/extra/torch_backend/wrapped_tensor.cpp; great thanks to @bdhirsh and @geohot.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/157859
Approved by: https://github.com/albanD
2025-09-30 08:39:36 +00:00
Brian Hirsh
7d710403b0 Reapply "Make functionalization ViewMeta serializable with pickle. (#143712)" (#163769)
### Summary:
NOTE: This is a re-export of https://github.com/pytorch/pytorch/pull/161994; the changes between these two PRs are exclusively to the buck/build files.

(Summary from #161994 )
Attempted rebase of https://github.com/pytorch/pytorch/pull/143712.

This reverts commit 6c713ccb5e.

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx chenyang78 kadeng chauhang amjames Lucaskabela

imported-using-ghimport

Test Plan: Imported from OSS

Differential Revision: D81524507

Pulled By: Lucaskabela

Pull Request resolved: https://github.com/pytorch/pytorch/pull/163769
Approved by: https://github.com/dolpm

Co-authored-by: Brian Hirsh <hirsheybar@fb.com>
2025-09-25 10:27:37 +00:00
dolpm
30e16d6389 [nativert] aoti (#162353)
Summary: att

Test Plan:
ci

Differential Revision: D81731425

Pull Request resolved: https://github.com/pytorch/pytorch/pull/162353
Approved by: https://github.com/yiming0416
2025-09-12 05:56:25 +00:00
dolpm
4f72d932fe re-land triton runtime implementation (#162217)
Summary: original PR: https://github.com/pytorch/pytorch/pull/161798

Test Plan:
ci

Differential Revision: D81724234

Pull Request resolved: https://github.com/pytorch/pytorch/pull/162217
Approved by: https://github.com/SherlockNoMad
2025-09-06 00:52:29 +00:00
rzou
70d36e047d Making batching rule for F.embedding DTensor-aware (#162117)
`vmap(F.embedding)(DTensor, DTensor)` was failing because F.embedding's
batching rule generates a new tensor via at::arange, at::arange
generates a regular tensor, and DTensor rightfully errors on mixed
DTensor-regular Tensor operations.

This PR fixes the problem by activating DTensor implicit replication on
just the at::arange and the subsequent add operation.

To accomplish this, I moved the DTensor implicit replication flag to C++ (most batching rules are in C++).
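
For reference, the batched call in question, shown here with regular tensors (the DTensor/device-mesh setup of the failing case is omitted):

```
import torch
import torch.nn.functional as F
from torch.func import vmap

indices = torch.randint(0, 10, (4, 5))  # batch of 4 index sequences
weights = torch.randn(4, 10, 3)         # batch of 4 embedding tables
out = vmap(F.embedding)(indices, weights)
print(out.shape)  # torch.Size([4, 5, 3])
```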

Test Plan:
- new test

Pull Request resolved: https://github.com/pytorch/pytorch/pull/162117
Approved by: https://github.com/bdhirsh
2025-09-05 21:40:14 +00:00
Shunzhi Wen
c10195e723 [C10d][Gloo] Enable complex datatype support in ProcessGroupGloo (#156633)
- Enable communication of tensors with Complex datatype in ProcessGroupGloo, similar to how ProcessGroupNCCL handles it.
- Move the function that checks whether a reduce operation supports the Complex datatype from ProcessGroupNCCL.cpp into a new file shared with ProcessGroupGloo.

Fixes #156632
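
A hedged sketch of the newly supported call (the process-group setup is an assumption, not part of this PR):

```
import torch
import torch.distributed as dist

# Assuming a Gloo process group has been initialized, e.g.
# dist.init_process_group("gloo", rank=rank, world_size=world_size):
# t = torch.ones(4, dtype=torch.complex64)
# dist.all_reduce(t)  # complex tensors now pass through ProcessGroupGloo
```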

Pull Request resolved: https://github.com/pytorch/pytorch/pull/156633
Approved by: https://github.com/d4l3k
2025-09-05 21:24:36 +00:00
PyTorch MergeBot
95ee0bfea9 Revert "[nativert] triton runtime implementation (#161798)"
This reverts commit 3dde5d7f9b.

Reverted https://github.com/pytorch/pytorch/pull/161798 on behalf of https://github.com/jeanschmidt due to introducing linting failures ([comment](https://github.com/pytorch/pytorch/pull/161798#issuecomment-3255412085))
2025-09-04 20:05:24 +00:00
dolpm
3dde5d7f9b [nativert] triton runtime implementation (#161798)
Summary:
att
Test Plan:
ci

Reviewed By: minjang

Differential Revision: D80828148

Pull Request resolved: https://github.com/pytorch/pytorch/pull/161798
Approved by: https://github.com/minjang, https://github.com/SherlockNoMad
2025-09-04 19:00:15 +00:00
Ke Wen
61e18b5304 [2/N][SymmMem] Add MemPool allocator and tests (#161471)
(Porting most of #161008)

Hooking the SymmetricMemory allocator up to MemPool so that users can create symmetric tensors with regular factories such as `torch.zeros` and `torch.arange`, and so that our ops can have functional variants that create `out` tensors on symmetric memory.

To end users, this PR supports a python UI as follows:
```
allocator = symm_mem.get_mempool_allocator(device)
mempool = torch.cuda.MemPool(allocator)
with torch.cuda.use_mem_pool(mempool):
    tensor = torch.arange(numel, dtype=dtype, device=device)
```

Added tests for both use cases above.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/161471
Approved by: https://github.com/ngimel
ghstack dependencies: #161470
2025-08-31 18:08:57 +00:00
Tan Hoang
91f0bcf43f [c10d][nvshmem] add nvshmem build rules and dependency for libtorch_cuda (#159562)
Summary:
Add guarded build option for nvshmem-related c10d code with `-c fbcode.caffe2_use_nvshmem`

The guarded clause includes nvshmem device + host code (statically linked) plus these 2 files:
- `torch/csrc/distributed/c10d/symm_mem/NVSHMEMSymmetricMemory.cu`
- `torch/csrc/distributed/c10d/symm_mem/nvshmem_extension.cu`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/159562
Approved by: https://github.com/Skylion007, https://github.com/kwen2501
2025-08-31 12:56:51 +00:00
PyTorch MergeBot
fb2d5ea697 Revert "[2/N][SymmMem] Add MemPool allocator and tests (#161471)"
This reverts commit b291dc9684.

Reverted https://github.com/pytorch/pytorch/pull/161471 on behalf of https://github.com/atalman due to Multiple internal failures on PR #https://github.com/pytorch/pytorch/pull/161471 will need to land it via co-dev ([comment](https://github.com/pytorch/pytorch/pull/161471#issuecomment-3239283585))
2025-08-30 14:00:29 +00:00
Ke Wen
b291dc9684 [2/N][SymmMem] Add MemPool allocator and tests (#161471)
(Porting most of #161008)

Hooking the SymmetricMemory allocator up to MemPool so that users can create symmetric tensors with regular factories such as `torch.zeros` and `torch.arange`, and so that our ops can have functional variants that create `out` tensors on symmetric memory.

To end users, this PR supports a python UI as follows:
```
allocator = symm_mem.get_mempool_allocator(device)
mempool = torch.cuda.MemPool(allocator)
with torch.cuda.use_mem_pool(mempool):
    tensor = torch.arange(numel, dtype=dtype, device=device)
```

Added tests for both use cases above.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/161471
Approved by: https://github.com/ngimel
ghstack dependencies: #161470
2025-08-28 06:31:29 +00:00
PyTorch MergeBot
903181bb6f Revert "[2/N][SymmMem] Add MemPool allocator and tests (#161471)"
This reverts commit 4ed71d5412.

Reverted https://github.com/pytorch/pytorch/pull/161471 on behalf of https://github.com/atalman due to failing internal builds ([comment](https://github.com/pytorch/pytorch/pull/161471#issuecomment-3230069186))
2025-08-27 23:18:36 +00:00
Ke Wen
4ed71d5412 [2/N][SymmMem] Add MemPool allocator and tests (#161471)
(Porting most of #161008)

Hooking the SymmetricMemory allocator up to MemPool so that users can create symmetric tensors with regular factories such as `torch.zeros` and `torch.arange`, and so that our ops can have functional variants that create `out` tensors on symmetric memory.

To end users, this PR supports a python UI as follows:
```
allocator = symm_mem.get_mempool_allocator(device)
mempool = torch.cuda.MemPool(allocator)
with torch.cuda.use_mem_pool(mempool):
    tensor = torch.arange(numel, dtype=dtype, device=device)
```

Added tests for both use cases above.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/161471
Approved by: https://github.com/ngimel
ghstack dependencies: #161470
2025-08-27 00:49:06 +00:00
dolpm
1471b20cb3 add static dispatch kernel registration to open source (#160439)
Summary: The static dispatch registry should be moved to open source. The rest can stay internal for now, since delegates will all go through the ET hop.

Test Plan: spot checked existing tests and didn't see any missing registrations

Differential Revision: D80099377

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160439
Approved by: https://github.com/SherlockNoMad, https://github.com/zhxchen17
2025-08-20 17:58:00 +00:00
dolpm
b439675ae2 [nativert] oss pass graph pass registration (#160859)
Summary: att

Test Plan:
CI

Differential Revision: D80368343

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160859
Approved by: https://github.com/georgiaphillips
2025-08-18 22:23:38 +00:00
dolpm
138413907a [nativert] oss subgraph rewriter (#160780)
Summary: att

Test Plan:
ci

Differential Revision: D80367765

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160780
Approved by: https://github.com/SherlockNoMad, https://github.com/georgiaphillips
2025-08-18 04:25:05 +00:00
Sherlock Huang
c1722db0f7 [NativeRT] Make VariadicOpConverter and FuseListUnpackConverter for cpu nodes only (#159519)
Summary:
VariadicOpConverter and FuseListUnpackConverter would introduce ops that only have CPU kernels.

Currently, these graph passes are run if static_dispatch is enabled.

As we plan to enable static_dispatch by default, this diff adds an additional check so the graph passes only apply to nodes whose inputs/outputs are all on CPU.

Test Plan:
CI

Differential Revision: D79295640

Pull Request resolved: https://github.com/pytorch/pytorch/pull/159519
Approved by: https://github.com/dolpm, https://github.com/henryoier
2025-07-31 18:17:21 +00:00
PaliC
1b99c1859c [BE] Make PyObjectSlot use a global PyInterpreter and remove (#158427)
This PR is a bit more involved, but effectively it drastically simplifies PyObjectSlot and PyInterpreter.
1) For PyObjectSlot we now use a global PyInterpreter, since there is only one. From here we change all of the call sites to rely on this assumption.
2) We also remove the "tags" of the PyInterpreter by deprecating `PyInterpreterStatus`.

For the reviewer, sadly it seems like `functorch/csrc/dim/dim.cpp` needed to get linted, so there is an unreadable number of changes there. Fortunately, the only actual change in the file is as follows, which just removes `getPyInterpreter()` from the `check_pyobj` call.

```
 mpy::handle handle_from_tensor(Arena& A, TensorRef t) {
-    // fast case: tensor is live in python
-    std::optional<PyObject*> mb_obj =
-        t->unsafeGetTensorImpl()->pyobj_slot()->check_pyobj(getPyInterpreter(), /*ignore_hermetic_tls=*/false);
-    if (mb_obj.has_value() && !t->unsafeGetTensorImpl()->pyobj_slot()->owns_pyobj()) {
-        return *mb_obj;
-    }
-    return A.autorelease(mpy::object::checked_steal(THPVariable_Wrap(*t)));
-}
-}
+  // fast case: tensor is live in python
+  std::optional<PyObject*> mb_obj =
+      t->unsafeGetTensorImpl()->pyobj_slot()->check_pyobj(
+          /*ignore_hermetic_tls=*/false);
+  if (mb_obj.has_value() &&
+      !t->unsafeGetTensorImpl()->pyobj_slot()->owns_pyobj()) {
+    return *mb_obj;
+  }
+  return A.autorelease(mpy::object::checked_steal(THPVariable_Wrap(*t)));
+}
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/158427
Approved by: https://github.com/albanD
2025-07-30 17:29:43 +00:00
Zhengxu Chen
8460131087 [nativert] Add OSS version of ModelRunner (#159268)
Summary: Implement a ModelRunner from scratch with the minimum features for OSS only

Test Plan:
test_export -r NativeRT

Differential Revision: D78979812

Pull Request resolved: https://github.com/pytorch/pytorch/pull/159268
Approved by: https://github.com/dolpm
2025-07-29 21:08:14 +00:00
PyTorch MergeBot
15a50dcf1c Revert "[BE] Make PyObjectSlot use a global PyInterpreter and remove (#158427)"
This reverts commit eb73650723.

Reverted https://github.com/pytorch/pytorch/pull/158427 on behalf of https://github.com/ZainRizvi due to Reverting this as part of reverting the stack for https://github.com/pytorch/pytorch/pull/158288 ([comment](https://github.com/pytorch/pytorch/pull/158427#issuecomment-3099815367))
2025-07-21 23:14:57 +00:00
Tristan Rice
ab557421a4 [cca] [c10d] Refactor CUDAEventCache into separate files (#158616)
Summary:
Refactored CUDAEventCache from ProcessGroupNCCL.hpp/.cpp into dedicated header and implementation files for better code organization and maintainability.

Split out CUDAEventCache into:
- New header file: CUDAEventCache.hpp
- New implementation file: CUDAEventCache.cpp
- Updated build_variables.bzl to include the new file

This change improves code maintainability, readability, and follows better code organization practices.
---
> Generated by [Confucius Code Assist (CCA)](https://www.internalfb.com/wiki/Confucius/Analect/Shared_Analects/Confucius_Code_Assist_(CCA)/)
[Session](https://www.internalfb.com/confucius?session_id=61b9029a-636b-11f0-9d9a-f1bcc55be1ce&tab=Chat), [Trace](https://www.internalfb.com/confucius?session_id=61b9029a-636b-11f0-9d9a-f1bcc55be1ce&tab=Trace)

Test Plan:
Verified build with:
```
buck build //caffe2/test/distributed:c10d
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/158616
Approved by: https://github.com/fduwjj
2025-07-19 02:51:28 +00:00
PaliC
eb73650723 [BE] Make PyObjectSlot use a global PyInterpreter and remove (#158427)
This PR is a bit more involved, but effectively it drastically simplifies PyObjectSlot and PyInterpreter.
1) For PyObjectSlot we now use a global PyInterpreter, since there is only one. From here we change all of the call sites to rely on this assumption.
2) We also remove the "tags" of the PyInterpreter by deprecating `PyInterpreterStatus`.

For the reviewer, sadly it seems like `functorch/csrc/dim/dim.cpp` needed to get linted, so there is an unreadable number of changes there. Fortunately, the only actual change in the file is as follows, which just removes `getPyInterpreter()` from the `check_pyobj` call.

```
 mpy::handle handle_from_tensor(Arena& A, TensorRef t) {
-    // fast case: tensor is live in python
-    std::optional<PyObject*> mb_obj =
-        t->unsafeGetTensorImpl()->pyobj_slot()->check_pyobj(getPyInterpreter(), /*ignore_hermetic_tls=*/false);
-    if (mb_obj.has_value() && !t->unsafeGetTensorImpl()->pyobj_slot()->owns_pyobj()) {
-        return *mb_obj;
-    }
-    return A.autorelease(mpy::object::checked_steal(THPVariable_Wrap(*t)));
-}
-}
+  // fast case: tensor is live in python
+  std::optional<PyObject*> mb_obj =
+      t->unsafeGetTensorImpl()->pyobj_slot()->check_pyobj(
+          /*ignore_hermetic_tls=*/false);
+  if (mb_obj.has_value() &&
+      !t->unsafeGetTensorImpl()->pyobj_slot()->owns_pyobj()) {
+    return *mb_obj;
+  }
+  return A.autorelease(mpy::object::checked_steal(THPVariable_Wrap(*t)));
+}
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/158427
Approved by: https://github.com/albanD
2025-07-18 05:23:00 +00:00
dolpm
51a708ffc6 [nativert] libtorch kernel registry (#157150)
Summary: att

Test Plan:
ci

Differential Revision: D77451703

Pull Request resolved: https://github.com/pytorch/pytorch/pull/157150
Approved by: https://github.com/georgiaphillips, https://github.com/henryoier
2025-07-16 12:36:55 +00:00
Tristan Rice
1b3d69b59f Work: block_current_stream API (#156883)
This implements a new `wait_stream` API in Work that gives CPU-based backends such as Gloo the same behavior that `wait` has for ProcessGroupNCCL.

The idea is to support Gloo communication overlap in FSDPv2/HSDP with minimal changes to FSDP.

There was a previous attempt to make FSDPv2 use Work.wait but given the extensive stream semantics used it doesn't play nicely. https://github.com/pytorch/pytorch/pull/148780

This uses a "Baton" CUDA kernel which spinlocks on a pinned CPU tensor waiting for it to be set.
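
A hedged usage sketch (the method name comes from the commit title; the call sites and setup below are assumptions):

```
import torch
import torch.distributed as dist

# Assuming dist.init_process_group("gloo", ...) on a CUDA machine:
# t = torch.ones(8, device="cuda")
# work = dist.all_reduce(t, async_op=True)
# work.block_current_stream()  # current CUDA stream waits; the CPU thread does not block
# y = t * 2                    # ordered after the collective on this stream
```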

Test plan:

```
pytest test/distributed/test_c10d_gloo.py -v -k wait_stream
pytest test/distributed/test_c10d_nccl.py -v -k wait_stream
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/156883
Approved by: https://github.com/kwen2501, https://github.com/fduwjj
2025-07-08 23:55:46 +00:00
Sheng Qin
f7130c097e [nativert] Move Executor to PyTorch core (#157514)
Test Plan:
CI

Differential Revision: D77693984

Pull Request resolved: https://github.com/pytorch/pytorch/pull/157514
Approved by: https://github.com/zhxchen17
2025-07-03 23:31:54 +00:00
Yidi Wu
aeffb68d34 [schema_upgrader] add C++ upgrader for json based upgrading (#156761)
Differential Revision: [D77459912](https://our.internmc.facebook.com/intern/diff/D77459912)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/156761
Approved by: https://github.com/angelayi
2025-06-28 18:15:06 +00:00
Sheng Qin
88c6199db0 [nativert] Move KernelFactory to PyTorch core (#156913)
Summary: The kernel factory handles kernel node initialization and the execution of different kernel types.

Test Plan:
CI

Differential Revision: D77346836

Pull Request resolved: https://github.com/pytorch/pytorch/pull/156913
Approved by: https://github.com/zhxchen17
2025-06-28 06:34:24 +00:00
PyTorch MergeBot
f810480dbe Revert "[schema_upgrader] add C++ upgrader for json based upgrading (#156761)"
This reverts commit 61712e6f2b.

Reverted https://github.com/pytorch/pytorch/pull/156761 on behalf of https://github.com/ydwu4 due to break linter test, which doesn't show up in the pr ([comment](https://github.com/pytorch/pytorch/pull/156761#issuecomment-3014918800))
2025-06-28 03:58:25 +00:00
Yidi Wu
61712e6f2b [schema_upgrader] add C++ upgrader for json based upgrading (#156761)
Differential Revision: [D77459912](https://our.internmc.facebook.com/intern/diff/D77459912)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/156761
Approved by: https://github.com/angelayi
2025-06-27 23:50:19 +00:00
dolpm
7392470da4 [nativert] alias analyzer + layout planner/manager to pytorch core (#156897)
Summary: att

Test Plan:
ci - unit tests still have some unresolved deps but will move them later.

Differential Revision: D77320950

Pull Request resolved: https://github.com/pytorch/pytorch/pull/156897
Approved by: https://github.com/zhxchen17
2025-06-27 03:01:22 +00:00
dolpm
262654ee51 [nativert] move constantfolder to libtorch (#156918)
Summary: att -- unit tests will be migrated later, since they still have unresolved deps.

Test Plan:
ci

Differential Revision: D77159278

Pull Request resolved: https://github.com/pytorch/pytorch/pull/156918
Approved by: https://github.com/henryoier, https://github.com/zhxchen17
2025-06-26 21:26:37 +00:00
Dylan Maloy
d98fa4a103 implement SR's storage group planning algorithm (#156715)
Summary: att

Test Plan:
Tested on a localnet. It uses ~15% more memory than greedy-by-size, but is more performant.

local:
gbs: 110656 B
dsg: 131584 B

local_ro:
gbs: 38208 B
dsg: 44544 B

Differential Revision: D75653840

Pull Request resolved: https://github.com/pytorch/pytorch/pull/156715
Approved by: https://github.com/zhxchen17
2025-06-25 22:43:40 +00:00
Sheng Qin
6c008e2fb5 [nativert] Move ParallelGraphExecutor to PyTorch core (#156751)
Summary: `ParallelGraphExecutor` inherits from `GraphExecutorBase` and executes all nodes in the graph in a parallel manner.

Test Plan:
CI

Differential Revision: D77088996

Pull Request resolved: https://github.com/pytorch/pytorch/pull/156751
Approved by: https://github.com/zhxchen17, https://github.com/dolpm
2025-06-25 06:54:45 +00:00
FFFrog
e8cf5ff564 Fix the Problems About Defining Static Variable in Inline Function (#147095)
Refer to https://github.com/pytorch/pytorch/issues/125465 for more information.

- Remove unused header files
- Move common functionality to separate files to reduce dependencies between picklers and unpicklers
- Move the inline function that defines the static variable to .cc

Differential Revision: [D76266755](https://our.internmc.facebook.com/intern/diff/D76266755)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/147095
Approved by: https://github.com/cyyever, https://github.com/albanD

Co-authored-by: Edward Yang <ezyang@meta.com>
2025-06-25 01:59:10 +00:00
Yiming Zhou
310e8361c5 [nativert] Move PrimKernelRegistry to PyTorch core (#156506)
Summary:
Torch Native Runtime RFC: pytorch/rfcs#72
PrimKernelRegistry manages a small subset of the kernel registry in NativeRT, including ListPack, ListUnpack, Input, Output, VarConcat, and VarStack.

Test Plan: Internal unittests

Differential Revision: D77034945

Pull Request resolved: https://github.com/pytorch/pytorch/pull/156506
Approved by: https://github.com/zhxchen17
2025-06-24 21:42:41 +00:00
Jeddie Ji
4c59edf0c5 [nativert] Move call_torchbind_kernel (#156571)
Summary: Move call_torchbind_kernel target from internal sigmoid to pytorch

Test Plan:
Tested internally:

buck2 test mode/dev-nosan caffe2/test/cpp/nativert:op_kernel_test
buck build //sigmoid/core/kernels:kernel_factory

and all sandcastle tests.

Differential Revision: D77118592

Pull Request resolved: https://github.com/pytorch/pytorch/pull/156571
Approved by: https://github.com/zhxchen17
2025-06-24 15:24:06 +00:00
dolpm
9665702c64 [nativert] reland D76832891 remove designated initializer cpp20 (#156565)
Summary: fix the Windows build broken by https://github.com/pytorch/pytorch/pull/156508

Test Plan:
ci

Differential Revision: D77080420

Pull Request resolved: https://github.com/pytorch/pytorch/pull/156565
Approved by: https://github.com/zhxchen17
2025-06-24 02:38:08 +00:00
Shangdi Yu
56b3bf0c74 [nativert] Move HigherOrderKernel (#156507)
Summary:
Torch Native Runtime RFC: https://github.com/pytorch/rfcs/pull/72
As part of the effort to open source TorchNativeRuntime (or what we call Sigmoid), we are moving the implementation to torch/:
fbcode/sigmoid/kernels -> fbcode/caffe2/torch/nativert/kernels

Test Plan: CI

Differential Revision: D77032074

Pull Request resolved: https://github.com/pytorch/pytorch/pull/156507
Approved by: https://github.com/zhxchen17
2025-06-23 19:29:27 +00:00
PyTorch MergeBot
d846e21355 Revert "[nativert] move layout planner algorithms to libtorch (#156508)"
This reverts commit eab45643f2.

Reverted https://github.com/pytorch/pytorch/pull/156508 on behalf of https://github.com/atalman due to [GH job link](https://github.com/pytorch/pytorch/actions/runs/15793524714/job/44524067679) [HUD commit link](eab45643f2) ([comment](https://github.com/pytorch/pytorch/pull/156508#issuecomment-2993589983))
2025-06-21 13:42:40 +00:00