soulitzer
b3861ac8e7
[reland] Warn if AccumulateGrad stream does not match producer node stream ( #166136 )
...
ghstack-source-id: 59641aa32dc6fd027abf3276017432b693aa71f8
Pull-Request-resolved: https://github.com/pytorch/pytorch/pull/165065
Fixes #ISSUE_NUMBER
Opening a new PR for codev
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166136
Approved by: https://github.com/ngimel
2025-11-01 12:33:48 +00:00
Yuanyuan Chen
f0745ddb11
Replace c10::call_once with static initialization ( #166381 )
...
This PR replaces c10::call_once calls with static initialization where possible. C++11 guarantees that initialization of function-local statics is thread-safe, and it is also cheaper than going through c10::call_once.
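A minimal sketch of the pattern (illustrative names; using std::call_once directly in place of the c10 wrapper):
```cpp
#include <mutex>

// Before: lazy initialization guarded by an explicit once-flag.
int& counter_with_call_once() {
  static std::once_flag flag;
  static int* value = nullptr;
  std::call_once(flag, [] { value = new int(0); });
  return *value;
}

// After: C++11 "magic statics" make this initialization thread-safe,
// with less overhead than the once-flag machinery.
int& counter_with_static() {
  static int value = 0;
  return value;
}
```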
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166381
Approved by: https://github.com/malfet
2025-11-01 07:09:40 +00:00
Yuanyuan Chen
e2dc32f4ba
Replace decltype(auto) with auto ( #166537 )
...
This PR replaces `decltype(auto)` with `auto` for C++ return type deduction and simplifies some templates.
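For instance (an illustrative sketch, not a call site from the PR), when a function returns by value the two spellings deduce the same type, so `auto` is the simpler choice; `decltype(auto)` only matters when reference-ness must be preserved:
```cpp
#include <vector>

// Returns by value: auto and decltype(auto) both deduce int here.
auto sum(int a, int b) { return a + b; }

// decltype(auto) deduces int& (the reference returned by operator[]);
// plain auto would deduce int and silently return a copy.
decltype(auto) first(std::vector<int>& v) { return v[0]; }
```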
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166537
Approved by: https://github.com/Skylion007
2025-11-01 00:30:23 +00:00
PyTorch MergeBot
2699f5410b
Revert "[xpu][feature] Integrate OneDNN SDPA training forward/backward into XPU OVERRIDEABLE Backend ( #162454 )"
...
This reverts commit fd68d409ad .
Reverted https://github.com/pytorch/pytorch/pull/162454 on behalf of https://github.com/atalman due to internal build failure ([comment](https://github.com/pytorch/pytorch/pull/162454#issuecomment-3475009089 ))
2025-10-31 21:58:52 +00:00
PyTorch MergeBot
5bcfdae71d
Revert "Make PT2 compile backprop through custom op without autograd key a hard error ( #166367 )"
...
This reverts commit 4acc66f119 .
Reverted https://github.com/pytorch/pytorch/pull/166367 on behalf of https://github.com/atalman due to internal build failures ([comment](https://github.com/pytorch/pytorch/pull/166367#issuecomment-3473150269 ))
2025-10-31 13:44:05 +00:00
fengqing.lu
fd68d409ad
[xpu][feature] Integrate OneDNN SDPA training forward/backward into XPU OVERRIDEABLE Backend ( #162454 )
...
This is the second PR split from https://github.com/pytorch/pytorch/pull/156272
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162454
Approved by: https://github.com/guangyey , https://github.com/EikanWang , https://github.com/drisspg
2025-10-31 11:20:38 +00:00
Andy (An) Wang
d3be06cbdc
[MTIAGraph][Pytorch][2/n] Add binding for Python to C++, and hook for Pytorch to Fbcode ( #165963 )
...
Summary:
This diff is the binding and hook layer for MTIA Graph, including
1. binding between Python and C++
2. hook between Pytorch and mtia fbcode
[Doc](https://docs.google.com/document/d/1Q3xdZAIqhBvuy2HxGDfJyXVmxYXUEeYSZSwsp7bcJF8/edit?tab=t.osb46a42t6wb#heading=h.ayp9tkk08x00 )
Test Plan: Will be tested in the python implementation which will use the binding and hook
Differential Revision: D84457757
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165963
Approved by: https://github.com/malfet , https://github.com/albanD
2025-10-31 02:52:51 +00:00
Yu, Guangye
0ec0549823
Introduce a new API torch.xpu.get_per_process_memory_fraction ( #165511 )
...
# Motivation
Aligned with other backends, this PR introduces a new API `torch.xpu.get_per_process_memory_fraction` to allow users to retrieve the allowed memory fraction for a single process.
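A usage sketch; the positional `device` argument is assumed here, mirroring the setter introduced in #165510:
```python
import torch

if torch.xpu.is_available():
    # Assumed signature: get_per_process_memory_fraction(device)
    fraction = torch.xpu.get_per_process_memory_fraction(0)
    print(f"This process may use up to {fraction:.0%} of XPU 0's memory")
```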
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165511
Approved by: https://github.com/EikanWang , https://github.com/ezyang
ghstack dependencies: #165508 , #165509 , #165510
2025-10-30 19:30:09 +00:00
Scott Wolchok
639a0b1239
Remove torch.distributed.tensor.OpSchema.has_symints ( #163667 )
...
It appears to be unused based on `cd torch; rg has_symints`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/163667
Approved by: https://github.com/xmfan , https://github.com/azahed98 , https://github.com/albanD
ghstack dependencies: #162990
2025-10-30 18:57:17 +00:00
FFFrog
398775a43e
[CodeClean] Replace std::runtime_error with TORCH_CHECK ( #165119 )
...
As the title states.
**Changes**:
- torch/csrc/inductor (Part 2); the pattern is sketched below
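The pattern being applied, as a sketch (the call site is illustrative, not taken from the PR):
```cpp
#include <c10/util/Exception.h>
#include <cstdint>

void check_dim(int64_t dim) {
  // Before: if (dim < 0) throw std::runtime_error("dim must be non-negative");
  // After: TORCH_CHECK raises a c10::Error carrying the message and
  // integrating with PyTorch's standard error reporting.
  TORCH_CHECK(dim >= 0, "dim must be non-negative, got ", dim);
}
```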
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165119
Approved by: https://github.com/janeyx99
ghstack dependencies: #165139
2025-10-30 18:43:58 +00:00
FFFrog
fcd5f8c352
[CodeClean] Remove the Unused MACRO for AOT Inductor Runtime ( #165139 )
...
As the title states.
- AOTI_TORCH_CHECK depends on TORCH_CHECK_MSG, which is located in c10/util/Exception.h and may break BC
- AOTI_TORCH_CHECK is not used anywhere
- STD_TORCH_CHECK has ABI check tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165139
Approved by: https://github.com/Skylion007 , https://github.com/janeyx99
2025-10-30 18:43:58 +00:00
Edward Z. Yang
4acc66f119
Make PT2 compile backprop through custom op without autograd key a hard error ( #166367 )
...
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166367
Approved by: https://github.com/bdhirsh
2025-10-30 18:43:07 +00:00
PyTorch MergeBot
694d205143
Revert "shrink_group implementation to expose ncclCommShrink API ( #164518 )"
...
This reverts commit 311ea0dec0 .
Reverted https://github.com/pytorch/pytorch/pull/164518 on behalf of https://github.com/atalman due to breaks internal builds Error: from logging_utils import ( ModuleNotFoundError: No module named 'logging_utils' ([comment](https://github.com/pytorch/pytorch/pull/164518#issuecomment-3469308568 ))
2025-10-30 17:52:29 +00:00
Scott Wolchok
6a5a436624
DTensor: C++ compute_global_tensor_info ( #162990 )
...
compute_global_tensor_info is on the hot path for DTensor.{from,to}_local. More incremental progress toward C++.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162990
Approved by: https://github.com/ezyang
2025-10-30 15:10:54 +00:00
linhaifeng
369f2d6951
[3/N] fix typo in other folders ( #166606 )
...
Fixes typos in other folders; follow-up to
#166374
#166126
Exclusions are configured in _typos.toml:
```toml
[files]
extend-exclude = ["tools/linter/dictionary.txt"]
[default.extend-words]
nd = "nd"
arange = "arange"
Nd = "Nd"
GLOBALs = "GLOBALs"
hte = "hte"
iy = "iy"
PN = "PN"
Dout = "Dout"
optin = "optin"
gam = "gam"
PTD = "PTD"
Sur = "Sur"
nin = "nin"
tme = "tme"
inpt = "inpt"
mis = "mis"
Raison = "Raison"
ouput = "ouput"
nto = "nto"
Onwer = "Onwer"
callibrate = "callibrate"
ser = "ser"
Metdata = "Metdata"
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166606
Approved by: https://github.com/ezyang
2025-10-30 10:30:40 +00:00
Bruce Chang
311ea0dec0
shrink_group implementation to expose ncclCommShrink API ( #164518 )
...
Closes #164529
To expose the new [ncclCommShrink](https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/api/comms.html#ncclcommshrink ) API to PyTorch.
This is useful when you need to exclude certain GPUs or nodes from a collective operation, for example in fault tolerance scenarios or when dynamically adjusting resource utilization.
For more info: [Shrinking a communicator](https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/usage/communicators.html#shrinking-a-communicator )
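A hypothetical sketch of the fault-tolerance use case; the Python-level entry point and its signature here are assumed, not taken from the PR:
```python
import torch
import torch.distributed as dist

# Hypothetical: after detecting a failed rank, the surviving ranks build
# a smaller group that excludes it, backed by ncclCommShrink underneath.
dist.init_process_group("nccl")
failed_ranks = [3]  # e.g. a rank whose GPU was lost
new_group = dist.shrink_group(failed_ranks)  # assumed entry point/signature
t = torch.ones(1, device="cuda")
dist.all_reduce(t, group=new_group)  # collective over the survivors only
```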
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164518
Approved by: https://github.com/kwen2501
2025-10-30 01:50:54 +00:00
Michael Lazos
c54e2c5b41
[User-streams] Make torch.Event weakref compatible ( #164522 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164522
Approved by: https://github.com/williamwen42
ghstack dependencies: #164304
2025-10-29 23:06:31 +00:00
Michael Lazos
c3047938a0
[user-streams] Make device-agnostic streams weakref compatible ( #164304 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164304
Approved by: https://github.com/williamwen42 , https://github.com/colesbury
2025-10-29 23:06:31 +00:00
Shangdi Yu
d2eff5d454
Add python stack trace to AOTI generated code ( #160539 )
...
Summary:
We add a thread_local KernelContext object so Strobelight (and other potential profilers) can read the stack trace information of the running kernel.
This adds some overhead, so we guard it behind the `cpp.enable_kernel_profile` flag.
Example output code:
```cpp
#include <torch/csrc/inductor/aoti_runtime/kernel_context_tls.h>
namespace torch::aot_inductor {
thread_local KernelContext* tls_kernel_context = nullptr;
}
// Other code .....
void AOTInductorModel::run_impl(
AtenTensorHandle*
input_handles, // array of input AtenTensorHandle; handles
// are stolen; the array itself is borrowed
AtenTensorHandle*
output_handles, // array for writing output AtenTensorHandle; handles
// will be stolen by the caller; the array itself is
// borrowed
DeviceStreamType stream,
AOTIProxyExecutorHandle proxy_executor
) {
__check_inputs_outputs(input_handles, output_handles);
auto inputs = steal_from_raw_handles_to_raii_handles(input_handles, 4);
auto arg2_1 = std::move(inputs[0]);
auto arg3_1 = std::move(inputs[1]);
auto arg4_1 = std::move(inputs[2]);
auto arg5_1 = std::move(inputs[3]);
[[maybe_unused]] auto& fc1_weight = constants_->at(0);
[[maybe_unused]] auto& fc1_bias = constants_->at(1);
inputs.clear();
[[maybe_unused]] auto& kernels = static_cast<AOTInductorModelKernels&>(*this->kernels_.get());
static constexpr int64_t int_array_0[] = {8L, 16L};
static constexpr int64_t int_array_1[] = {16L, 1L};
AtenTensorHandle buf0_handle;
AOTI_TORCH_ERROR_CODE_CHECK(aoti_torch_empty_strided(2, int_array_0, int_array_1, cached_torch_dtype_float32, cached_torch_device_type_cpu, this->device_idx_, &buf0_handle));
RAIIAtenTensorHandle buf0(buf0_handle);
// Topologically Sorted Source Nodes: [linear], Original ATen: [aten.t, aten.addmm]
// [Provenance debug handles] aoti_torch_cpu_addmm_out:1
static constexpr int64_t int_array_2[] = {10L, 16L};
static constexpr int64_t int_array_3[] = {1L, 10L};
{
KernelContextGuard _ctx("aoti_torch_cpu_addmm_out", R"(
File "/data/users/shangdiy/fbsource/buck-out/v2/gen/fbcode/cba6f4fb5faa5f79/caffe2/test/inductor/__provenance_tracing__/provenance_tracing#link-tree/caffe2/test/inductor/test_provenance_tracing.py", line 829, in forward
x = self.fc1(x)
File "/data/users/shangdiy/fbsource/buck-out/v2/gen/fbcode/cba6f4fb5faa5f79/caffe2/test/inductor/__provenance_tracing__/provenance_tracing#link-tree/torch/nn/modules/linear.py", line 134, in forward
return F.linear(input, self.weight, self.bias)
)");
RAIIAtenRecordFunctionHandle record_aoti_torch_cpu_addmm_out_("aoti_torch_cpu_addmm_out", nullptr);
AOTI_TORCH_ERROR_CODE_CHECK(aoti_torch_cpu_addmm_out(buf0, fc1_bias, arg2_1, wrap_with_raii_handle_if_needed(reinterpret_tensor_wrapper(fc1_weight, 2, int_array_2, int_array_3, 0L)), 1L, 1L));
}
arg2_1.reset();
auto buf1 = std::move(buf0); // reuse
static constexpr int64_t int_array_4[] = {10L, 20L};
static constexpr int64_t int_array_5[] = {20L, 1L};
AtenTensorHandle buf2_handle;
AOTI_TORCH_ERROR_CODE_CHECK(aoti_torch_empty_strided(2, int_array_4, int_array_5, cached_torch_dtype_float32, cached_torch_device_type_cpu, this->device_idx_, &buf2_handle));
RAIIAtenTensorHandle buf2(buf2_handle);
// [Provenance debug handles] cpp_fused_mul_relu_sigmoid_0:2
{
KernelContextGuard _ctx("cpp_fused_mul_relu_sigmoid_0", R"(
File "/data/users/shangdiy/fbsource/buck-out/v2/gen/fbcode/cba6f4fb5faa5f79/caffe2/test/inductor/__provenance_tracing__/provenance_tracing#link-tree/caffe2/test/inductor/test_provenance_tracing.py", line 831, in forward
x = self.sigmoid(x)
File "/data/users/shangdiy/fbsource/buck-out/v2/gen/fbcode/cba6f4fb5faa5f79/caffe2/test/inductor/__provenance_tracing__/provenance_tracing#link-tree/torch/nn/modules/activation.py", line 359, in forward
return torch.sigmoid(input)
File "/data/users/shangdiy/fbsource/buck-out/v2/gen/fbcode/cba6f4fb5faa5f79/caffe2/test/inductor/__provenance_tracing__/provenance_tracing#link-tree/caffe2/test/inductor/test_provenance_tracing.py", line 830, in forward
x = self.relu(x)
File "/data/users/shangdiy/fbsource/buck-out/v2/gen/fbcode/cba6f4fb5faa5f79/caffe2/test/inductor/__provenance_tracing__/provenance_tracing#link-tree/torch/nn/modules/activation.py", line 144, in forward
return F.relu(input, inplace=self.inplace)
File "/data/users/shangdiy/fbsource/buck-out/v2/gen/fbcode/cba6f4fb5faa5f79/caffe2/test/inductor/__provenance_tracing__/provenance_tracing#link-tree/caffe2/test/inductor/test_provenance_tracing.py", line 832, in forward
d = a * 3.14
)");
cpp_fused_mul_relu_sigmoid_0((float*)(buf1.data_ptr()), (const float*)(arg3_1.data_ptr()), (float*)(buf2.data_ptr()));
}
arg3_1.reset();
static constexpr int64_t int_array_6[] = {10L, 30L};
static constexpr int64_t int_array_7[] = {30L, 1L};
AtenTensorHandle buf3_handle;
AOTI_TORCH_ERROR_CODE_CHECK(aoti_torch_empty_strided(2, int_array_6, int_array_7, cached_torch_dtype_float32, cached_torch_device_type_cpu, this->device_idx_, &buf3_handle));
RAIIAtenTensorHandle buf3(buf3_handle);
// Topologically Sorted Source Nodes: [mul, addmm], Original ATen: [aten.mul, aten.addmm]
// [Provenance debug handles] aoti_torch_cpu_addmm_out:3
{
KernelContextGuard _ctx("aoti_torch_cpu_addmm_out", R"(
File "/data/users/shangdiy/fbsource/buck-out/v2/gen/fbcode/cba6f4fb5faa5f79/caffe2/test/inductor/__provenance_tracing__/provenance_tracing#link-tree/caffe2/test/inductor/test_provenance_tracing.py", line 833, in forward
y = torch.addmm(c, d, b)
)");
RAIIAtenRecordFunctionHandle record_aoti_torch_cpu_addmm_out_("aoti_torch_cpu_addmm_out", nullptr);
AOTI_TORCH_ERROR_CODE_CHECK(aoti_torch_cpu_addmm_out(buf3, arg5_1, buf2, arg4_1, 1L, 1L));
}
arg4_1.reset();
arg5_1.reset();
buf2.reset();
auto buf4 = std::move(buf3); // reuse
// [Provenance debug handles] cpp_fused_gelu_1:4
{
KernelContextGuard _ctx("cpp_fused_gelu_1", R"(
File "/data/users/shangdiy/fbsource/buck-out/v2/gen/fbcode/cba6f4fb5faa5f79/caffe2/test/inductor/__provenance_tracing__/provenance_tracing#link-tree/caffe2/test/inductor/test_provenance_tracing.py", line 834, in forward
z = torch.nn.functional.gelu(y)
)");
cpp_fused_gelu_1((float*)(buf4.data_ptr()));
}
output_handles[0] = buf1.release();
output_handles[1] = buf4.release();
} // AOTInductorModel::run_impl
```
Test Plan:
```
buck run mode/dev-nosan fbcode//caffe2/test/inductor:provenance_tracing -- -r stack_traces
```
Rollback Plan:
Differential Revision: D78436007
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160539
Approved by: https://github.com/yiming0416
2025-10-29 22:47:52 +00:00
Jeff Daily
d401e4e70a
[ROCm][CUDA] add unit test utility busy_wait_for_flag ( #166218 )
...
torch.cuda._busy_wait_for_flag() will launch a kernel that spins until a flag is set by a corresponding torch.cuda._clear_flag(). These **must** be run on separate streams or it will deadlock.
When used correctly, these kernels put work on the GPU that is more predictable than torch.cuda._sleep() in cases where a unit test depends on the GPU being busy.
Fixes #120318 .
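A sketch of the intended usage per the description above (both utilities are private test helpers; the argument-free signatures are assumed):
```python
import torch

busy = torch.cuda.Stream()
clear = torch.cuda.Stream()

with torch.cuda.stream(busy):
    torch.cuda._busy_wait_for_flag()  # kernel spins on-GPU until the flag is set

with torch.cuda.stream(clear):        # must be a *different* stream, or deadlock
    torch.cuda._clear_flag()          # sets the flag, releasing the spin kernel

torch.cuda.synchronize()
```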
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166218
Approved by: https://github.com/jeffdaily
Co-authored-by: Jeff Daily <jeff.daily@amd.com>
2025-10-29 22:40:23 +00:00
rraminen
deb776319b
[ROCm] Reduce duplication in bfloat16_support_literal definition ( #166147 )
...
This PR refactors the bfloat16_support_literal constant in the PyTorch build logic to eliminate duplicated ROCm-specific code.
Previously, there were two nearly identical branches for ROCM_VERSION < 70000 and ROCM_VERSION >= 70000, differing only by a single typedef. These have been unified into one conditional block with a minimal version guard inside. (https://github.com/ROCm/pytorch/pull/2502 )
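Roughly, the shape of the refactor (typedef names are placeholders, not the actual literal):
```cpp
// Before: two near-identical branches duplicated the entire literal.
// After: one shared block, with the single differing typedef guarded inside.
#if ROCM_VERSION >= 70000
typedef __hip_bfloat16 rocm_bf16_t;  // placeholder typedef
#else
typedef hip_bfloat16 rocm_bf16_t;    // placeholder typedef
#endif
// ... rest of the bfloat16 support code, now written once ...
```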
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166147
Approved by: https://github.com/jerrymannil , https://github.com/jeffdaily
2025-10-29 16:59:03 +00:00
PyTorch MergeBot
5fd1d41e62
Revert "[user-streams] Make device-agnostic streams weakref compatible ( #164304 )"
...
This reverts commit bfc2050db9 .
Reverted https://github.com/pytorch/pytorch/pull/164304 on behalf of https://github.com/atalman due to Breaks periodic: test/dynamo/test_streams.py::TestStreams::test_stream_weakref [GH job link](https://github.com/pytorch/pytorch/actions/runs/18909552619/job/53979171605 ) [HUD commit link](cde81e92b9 ) ([comment](https://github.com/pytorch/pytorch/pull/164304#issuecomment-3462489278 ))
2025-10-29 16:09:54 +00:00
PyTorch MergeBot
5cdbcb5233
Revert "[User-streams] Make torch.Event weakref compatible ( #164522 )"
...
This reverts commit cde81e92b9 .
Reverted https://github.com/pytorch/pytorch/pull/164522 on behalf of https://github.com/atalman due to Breaks periodic: test/dynamo/test_streams.py::TestStreams::test_stream_weakref [GH job link](https://github.com/pytorch/pytorch/actions/runs/18909552619/job/53979171605 ) [HUD commit link](cde81e92b9 ) ([comment](https://github.com/pytorch/pytorch/pull/164522#issuecomment-3462450571 ))
2025-10-29 16:03:03 +00:00
Mikayla Gawarecki
eae701cad0
Add scaffolding for StableIValue FC/BC (no PoC) ( #164332 )
...
1. Add `extension_build_version` and `is_internal` to `FromImpl`/`ToImpl` (this will be useful for future if we need to break the BC of any type) #163832 has the PoC of how we would actually use this system
2. Add `aoti_torch_library_impl_v2` that takes in an additional `extension_build_version` argument, updates callsite in `torch/csrc/stable/library.h` to always pass `TORCH_ABI_VERSION` for this argument
3. Add `extension_build_version` to `from_ivalue` and `to_ivalue` and update all callsites
4. Add a private `_from` and `_to` that pass `is_internal=True` to `FromImpl`/`ToImpl`, making it easier to reason about what is being called from libtorch-land / extension-land
**Note: This PR does not include a linter that tells the user to update from/to if changing the ABI of a type in headeronly, which I intend to do in https://github.com/pytorch/pytorch/pull/163998 **
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164332
Approved by: https://github.com/janeyx99
ghstack dependencies: #164356 , #166373 , #163683
2025-10-29 15:41:45 +00:00
Mikayla Gawarecki
8f51556daa
Add scaffolding for aoti_torch_call_dispatcher BC with native ops ( #163683 )
...
Part 1 of plan in https://docs.google.com/document/d/1MaX51H5aEQE5XnOlnZIpf9oCYwzGrTWkgBACxNzsmWE/edit?usp=sharing
- Upgrade `aoti_torch_call_dispatcher` to v2 with an `extension_build_version`
- Allow registration of StableIValue stack --> IValue stack adapters for schema changes
#### Note: This PR does not include a linter that tells the user to add the upgrader if the schema changes, which is an important piece that will be added in a separate PR
Pull Request resolved: https://github.com/pytorch/pytorch/pull/163683
Approved by: https://github.com/janeyx99
ghstack dependencies: #164356 , #166373
2025-10-29 15:41:45 +00:00
Mikayla Gawarecki
c0bbda37e8
Move static from_ivalue/to_ivalue to new shim_common.cpp ( #166373 )
...
Move `from_ivalue` and `to_ivalue` and their dependents `StableIValueBoxedKernel`, `aoti_torch_library_impl`, and `aoti_torch_call_dispatcher` into a new, non-AOTI shim_common.cpp.
This is in prep for the above PRs, where I add version-aware v2s (`torch_call_dispatcher` and `torch_library_impl`).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166373
Approved by: https://github.com/janeyx99
ghstack dependencies: #164356
2025-10-29 15:41:36 +00:00
Mikayla Gawarecki
fefb546b91
Add TORCH_TARGET_VERSION for stable ABI ( #164356 )
...
And update it so comparisons can be done by the preprocessor
**Note: We also need to gate in shim.h and figure out how to enforce this**
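A sketch of what preprocessor-comparable versioning enables (the encoding macro below is an assumption, not the PR's actual scheme):
```cpp
// Assumed encoding: pack (major, minor, patch) into one integer so that
// ordinary #if comparisons order versions correctly.
#define TORCH_VERSION_ENCODE(major, minor, patch) \
  (((major) * 10000) + ((minor) * 100) + (patch))

#if TORCH_TARGET_VERSION >= TORCH_VERSION_ENCODE(2, 10, 0)
  // target newer stable-ABI entry points
#else
  // fall back to older shims
#endif
```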
Differential Revision: [D85683549](https://our.internmc.facebook.com/intern/diff/D85683549 )
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164356
Approved by: https://github.com/janeyx99
2025-10-29 15:41:28 +00:00
Michael Lazos
cde81e92b9
[User-streams] Make torch.Event weakref compatible ( #164522 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164522
Approved by: https://github.com/williamwen42
ghstack dependencies: #162903 , #164343 , #164344 , #164507 , #162901 , #164304
2025-10-29 04:57:23 +00:00
Michael Lazos
bfc2050db9
[user-streams] Make device-agnostic streams weakref compatible ( #164304 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164304
Approved by: https://github.com/williamwen42 , https://github.com/colesbury
ghstack dependencies: #162903 , #164343 , #164344 , #164507 , #162901
2025-10-29 04:57:23 +00:00
Yu, Guangye
753d9bd806
Introduce a new API torch.xpu.set_per_process_memory_fraction ( #165510 )
...
# Motivation
Aligned with other backends, this PR introduces a new API `torch.xpu.set_per_process_memory_fraction` to allow users to customize the allowed memory fraction for a single process.
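A usage sketch, assuming the signature mirrors the CUDA analogue `torch.cuda.set_per_process_memory_fraction(fraction, device)`:
```python
import torch

if torch.xpu.is_available():
    # Cap this process at ~50% of device 0's memory; allocations beyond
    # the cap should fail with an out-of-memory error instead of growing.
    torch.xpu.set_per_process_memory_fraction(0.5, device=0)
```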
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165510
Approved by: https://github.com/EikanWang , https://github.com/ezyang
ghstack dependencies: #165508 , #165509
2025-10-29 03:24:52 +00:00
Yingji Zhang
17bdb232e1
[GR v0] AOTI Enablement - Fix GR model AOTI inplace update by skipping empty named ( #165970 ) ( #166037 )
...
Summary:
Add a gflag to allow us to skip empty constant named parameters during
dense loading. In [vm_parameters.py](https://fburl.com/code/7xr9ihwy ), there is
a constant _empty_tensor parameter used by the model. This constant parameter
is skipped in XL weights during model publish because it is empty, which later
breaks model in-place update: the parameter is reported by the AOTI
container but cannot be found among the model merge weights. This diff
solves that problem.
Test Plan: Verified inplace update in job https://www.internalfb.com/vanguard/serving_test_cases/1165842932095688
Reviewed By: muchulee8, joannec3634
Differential Revision: D85082330
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166037
Approved by: https://github.com/muchulee8 , https://github.com/jcwchen
2025-10-28 01:50:36 +00:00
Scott Wolchok
7d16fcf2df
Re-re-re-re-apply "C++-accessible Placements via pybind11 ( #163030 )" ( #166132 )
...
Was reverted (again!) due to a merge conflict that crept in sometime during the "export to github -> land internally -> merge on github" process.
D85096233
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166132
Approved by: https://github.com/Skylion007 , https://github.com/ezyang , https://github.com/malfet
2025-10-27 21:19:32 +00:00
Weinan Liu
fa4cb91846
add support for ir scalar literal parsing for inf/-inf/True/False ( #163924 )
...
Currently the IR parser doesn't support parsing IR like
```
graph():
%12 : float = prim::Constant[value=-inf]()
%13 : float = prim::Constant[value=inf]()
%14 : bool = prim::Constant[value=True]()
%15 : bool = prim::Constant[value=False]()
return (%12)
```
So the Python script below throws an error.
```
#!/bin/env python
import torch
def test():
return [True, False]
f = torch.jit.script(test)
torch._C._jit_pass_constant_propagation(f.graph)
ts_str = f.graph.__repr__()
print(ts_str)
ts = torch.parse_ir(ts_str)
func = torch._C._create_function_from_graph("forward", ts)
ret = func()
assert ret == [True, False]
def test():
return [float("inf"), float("-inf")]
f = torch.jit.script(test)
torch._C._jit_pass_constant_propagation(f.graph)
ts_str = f.graph.__repr__()
print(ts_str)
ts = torch.parse_ir(ts_str)
func = torch._C._create_function_from_graph("forward", ts)
ret = func()
assert ret == [float("inf"), float("-inf")]
```
I add "inf" and bool cases for IRParser::parseScalarLiteral in irparser.cpp.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/163924
Approved by: https://github.com/ezyang
2025-10-27 05:10:21 +00:00
Chang Pan
74e53d0761
[TorchScript] clearer debug for ConcreteModuleType::findSubmoduleConcreteType ( #166192 )
...
Summary:
right now the log is just
```
RuntimeError: it != data_.modules_.end() INTERNAL ASSERT FAILED at "fbcode/caffe2/torch/csrc/jit/frontend/concrete_module_type.cpp":207, please report a bug to PyTorch.
```
we have no clue where the error happens
https://fb.workplace.com/groups/gpuinference/posts/789257990578348/?comment_id=789284783909002&reply_comment_id=789415260562621
Test Plan: UT
Reviewed By: jcwchen
Differential Revision: D80020093
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166192
Approved by: https://github.com/gmagogsfm
2025-10-25 14:07:54 +00:00
Jason Ansel
78bcfcf870
[fx] Optimize torch.fx.Node.replace_all_uses_with ( #165889 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165889
Approved by: https://github.com/aorenste
2025-10-25 03:44:41 +00:00
Natalia Gimelshein
2efcf3ca98
Reverts #163712 and forces allgather/scatter inputs/outputs to be contiguous ( #166181 )
...
Per title
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166181
Approved by: https://github.com/kwen2501
2025-10-25 02:43:10 +00:00
Jane Xu
cddd5f74ab
Hide stable Library structs instead of using anon namespace ( #166078 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166078
Approved by: https://github.com/malfet
ghstack dependencies: #166076 , #166077
2025-10-25 00:18:26 +00:00
Jane Xu
dfdb68e51f
Hide all APIs in torch::stable ( #166077 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166077
Approved by: https://github.com/malfet
ghstack dependencies: #166076
2025-10-25 00:18:26 +00:00
PyTorch MergeBot
75b8295868
Revert "Warn if AccumulateGrad stream does not match producer node stream ( #165065 )"
...
This reverts commit 12f742941d .
Reverted https://github.com/pytorch/pytorch/pull/165065 on behalf of https://github.com/clee2000 due to broke internal builds D85273204 usages of TORCH_API void add need to be updated? ([comment](https://github.com/pytorch/pytorch/pull/165065#issuecomment-3438061854 ))
2025-10-23 17:02:49 +00:00
Eddie Yan
e64a814ae7
[CUDA] Add experimental green context support for SM carveout ( #159104 )
...
Low-level PyTorch APIs should be usable/stable enough at this point but we might move the underlying driver API usage a bit from here...
Built on top of @drisspg 's branch
Pull Request resolved: https://github.com/pytorch/pytorch/pull/159104
Approved by: https://github.com/ngimel , https://github.com/malfet , https://github.com/kwen2501
Co-authored-by: drisspg <drisspguessous@gmail.com>
Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
2025-10-22 21:38:52 +00:00
soulitzer
12f742941d
Warn if AccumulateGrad stream does not match producer node stream ( #165065 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165065
Approved by: https://github.com/ngimel
2025-10-22 17:33:27 +00:00
zhudada
2998abd777
[Code Clean] Better error handling in torch/csrc/distributed ( #165053 )
...
Replace vanilla C++ runtime_error exceptions with TORCH_CHECK.
Including:
torch/csrc/distributed/*
Partially fixes #148114
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165053
Approved by: https://github.com/FFFrog , https://github.com/albanD
2025-10-22 01:40:36 +00:00
Han Chao
a1005427bf
[xpu] Support high stream for ProcessGroupXCCL ( #163049 )
...
Add high-priority stream support for ProcessGroupXCCL. Just like CUDA, XPU streams support execution at a higher priority than other streams. The implementation is in https://github.com/intel/torch-xpu-ops/pull/1715; this PR adds the registration.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/163049
Approved by: https://github.com/guangyey , https://github.com/gujinghui , https://github.com/EikanWang , https://github.com/albanD
2025-10-22 00:54:25 +00:00
Zhaoqi Zhu
04adfe5ba9
Make Backend::setGroupUid virtual ( #165957 )
...
As titled, so that we may customize this function in custom backends
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165957
Approved by: https://github.com/d4l3k
2025-10-21 21:33:24 +00:00
PyTorch MergeBot
ad4dc52bf6
Revert "shrink_group implementation to expose ncclCommShrink API ( #164518 )"
...
This reverts commit 4e643422f6 .
Reverted https://github.com/pytorch/pytorch/pull/164518 on behalf of https://github.com/albanD due to Breaks lint ([comment](https://github.com/pytorch/pytorch/pull/164518#issuecomment-3429426503 ))
2025-10-21 20:24:14 +00:00
Bruce Chang
4e643422f6
shrink_group implementation to expose ncclCommShrink API ( #164518 )
...
Closes #164529
To expose the new [ncclCommShrink](https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/api/comms.html#ncclcommshrink ) API to PyTorch.
This is useful when you need to exclude certain GPUs or nodes from a collective operation, for example in fault tolerance scenarios or when dynamically adjusting resource utilization.
For more info: [Shrinking a communicator](https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/usage/communicators.html#shrinking-a-communicator )
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164518
Approved by: https://github.com/kwen2501
2025-10-21 19:47:33 +00:00
Jason Ansel
3c3b278872
[reland][fx] Move Node._prepend/Node._remove_from_list to C++ ( #165882 )
...
Relands #148261 that was reverted by #150542
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165882
Approved by: https://github.com/ezyang
2025-10-21 19:43:55 +00:00
Gufan Yin
e6ba4d0725
Back out "Do not decompose in functionalization/proxy tensor if autograd wouldn't have decomposed ( #164939 )" ( #165910 )
...
Summary:
Original commit changeset: d6d62d0c96dd
Original Phabricator Diff: D84468451 and D84613184
D84468451 caused a CUDA OutOfMemoryError in a model.
Test Plan:
D84468451 was found through bisect. Also double checked on recent trunk 9866939225248c2adc307be7a804b26db0b9b555: f815887517
With this diff that backs out D84468451 and D84613184 : f816114560
Differential Revision: D85025378
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165910
Approved by: https://github.com/clee2000
2025-10-21 16:36:38 +00:00
lichuyang
8b3dc0d1b0
Better error handling in torch/csrc/jit/runtime/* ( #165118 )
...
Refactors error handling by using TORCH_CHECK for improved clarity in constants and scope management in some files under torch/csrc/jit/runtime/*.
Partially fixes https://github.com/pytorch/pytorch/issues/148114
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165118
Approved by: https://github.com/FFFrog , https://github.com/albanD
2025-10-21 15:22:49 +00:00
lichuyang
410e6a4321
Better error handling in torch/csrc/jit/frontend/* ( #165213 )
...
Refactors error handling by using TORCH_CHECK for improved clarity in constants and scope management in some files under torch/csrc/jit/frontend/*.
Partially fixes https://github.com/pytorch/pytorch/issues/148114
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165213
Approved by: https://github.com/FFFrog , https://github.com/albanD
2025-10-21 13:54:59 +00:00