soulitzer
b3861ac8e7
[reland] Warn if AccumulateGrad stream does not match producer node stream ( #166136 )
...
ghstack-source-id: 59641aa32dc6fd027abf3276017432b693aa71f8
Pull-Request-resolved: https://github.com/pytorch/pytorch/pull/165065
Fixes #ISSUE_NUMBER
Opening a new PR for codev
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166136
Approved by: https://github.com/ngimel
2025-11-01 12:33:48 +00:00
Yuanyuan Chen
f0745ddb11
Replace c10::call_once with static initialization ( #166381 )
...
This PR replaces c10::call_once calls with static initialization where possible. C++11 guarantees that initialization of function-local statics is thread-safe, and it is also cheaper than going through c10::call_once.
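A minimal sketch of the pattern (illustrative names; using std::call_once directly in place of the c10 wrapper):
```cpp
#include <mutex>

// Before: lazy initialization guarded by an explicit once-flag.
int& counter_with_call_once() {
  static std::once_flag flag;
  static int* value = nullptr;
  std::call_once(flag, [] { value = new int(0); });
  return *value;
}

// After: C++11 "magic statics" make this initialization thread-safe,
// with less overhead than the once-flag machinery.
int& counter_with_static() {
  static int value = 0;
  return value;
}
```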
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166381
Approved by: https://github.com/malfet
2025-11-01 07:09:40 +00:00
Yuanyuan Chen
e2dc32f4ba
Replace decltype(auto) with auto ( #166537 )
...
This PR replaces `decltype(auto)` with `auto` for C++ return type deduction and simplifies some templates.
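For instance (an illustrative sketch, not a call site from the PR), when a function returns by value the two spellings deduce the same type, so `auto` is the simpler choice; `decltype(auto)` only matters when reference-ness must be preserved:
```cpp
#include <vector>

// Returns by value: auto and decltype(auto) both deduce int here.
auto sum(int a, int b) { return a + b; }

// decltype(auto) deduces int& (the reference returned by operator[]);
// plain auto would deduce int and silently return a copy.
decltype(auto) first(std::vector<int>& v) { return v[0]; }
```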
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166537
Approved by: https://github.com/Skylion007
2025-11-01 00:30:23 +00:00
PyTorch MergeBot
2699f5410b
Revert "[xpu][feature] Integrate OneDNN SDPA training forward/backward into XPU OVERRIDEABLE Backend ( #162454 )"
...
This reverts commit fd68d409ad .
Reverted https://github.com/pytorch/pytorch/pull/162454 on behalf of https://github.com/atalman due to internal build failure ([comment](https://github.com/pytorch/pytorch/pull/162454#issuecomment-3475009089 ))
2025-10-31 21:58:52 +00:00
PyTorch MergeBot
5bcfdae71d
Revert "Make PT2 compile backprop through custom op without autograd key a hard error ( #166367 )"
...
This reverts commit 4acc66f119 .
Reverted https://github.com/pytorch/pytorch/pull/166367 on behalf of https://github.com/atalman due to internal build failures ([comment](https://github.com/pytorch/pytorch/pull/166367#issuecomment-3473150269 ))
2025-10-31 13:44:05 +00:00
fengqing.lu
fd68d409ad
[xpu][feature] Integrate OneDNN SDPA training forward/backward into XPU OVERRIDEABLE Backend ( #162454 )
...
This is the second PR split from https://github.com/pytorch/pytorch/pull/156272
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162454
Approved by: https://github.com/guangyey , https://github.com/EikanWang , https://github.com/drisspg
2025-10-31 11:20:38 +00:00
Andy (An) Wang
d3be06cbdc
[MTIAGraph][Pytorch][2/n] Add binding for Python to C++, and hook for Pytorch to Fbcode ( #165963 )
...
Summary:
This diff is the binding and hook layer for MTIA Graph, including
1. binding between Python and C++
2. hook between Pytorch and mtia fbcode
[Doc](https://docs.google.com/document/d/1Q3xdZAIqhBvuy2HxGDfJyXVmxYXUEeYSZSwsp7bcJF8/edit?tab=t.osb46a42t6wb#heading=h.ayp9tkk08x00 )
Test Plan: Will be tested in the python implementation which will use the binding and hook
Differential Revision: D84457757
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165963
Approved by: https://github.com/malfet , https://github.com/albanD
2025-10-31 02:52:51 +00:00
Yu, Guangye
0ec0549823
Introduce a new API torch.xpu.get_per_process_memory_fraction ( #165511 )
...
# Motivation
Aligned with other backends, this PR introduces a new API `torch.xpu.get_per_process_memory_fraction` to allow users to retrieve the allowed memory fraction for a single process.
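A usage sketch; the positional `device` argument is assumed here, mirroring the setter introduced in #165510:
```python
import torch

if torch.xpu.is_available():
    # Assumed signature: get_per_process_memory_fraction(device)
    fraction = torch.xpu.get_per_process_memory_fraction(0)
    print(f"This process may use up to {fraction:.0%} of XPU 0's memory")
```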
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165511
Approved by: https://github.com/EikanWang , https://github.com/ezyang
ghstack dependencies: #165508 , #165509 , #165510
2025-10-30 19:30:09 +00:00
Scott Wolchok
639a0b1239
Remove torch.distributed.tensor.OpSchema.has_symints ( #163667 )
...
It appears to be unused based on `cd torch; rg has_symints`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/163667
Approved by: https://github.com/xmfan , https://github.com/azahed98 , https://github.com/albanD
ghstack dependencies: #162990
2025-10-30 18:57:17 +00:00
FFFrog
398775a43e
[CodeClean] Replace std::runtime_error with TORCH_CHECK ( #165119 )
...
As the title states.
**Changes**:
- torch/csrc/inductor (Part 2); the pattern is sketched below
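The pattern being applied, as a sketch (the call site is illustrative, not taken from the PR):
```cpp
#include <c10/util/Exception.h>
#include <cstdint>

void check_dim(int64_t dim) {
  // Before: if (dim < 0) throw std::runtime_error("dim must be non-negative");
  // After: TORCH_CHECK raises a c10::Error carrying the message and
  // integrating with PyTorch's standard error reporting.
  TORCH_CHECK(dim >= 0, "dim must be non-negative, got ", dim);
}
```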
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165119
Approved by: https://github.com/janeyx99
ghstack dependencies: #165139
2025-10-30 18:43:58 +00:00
FFFrog
fcd5f8c352
[CodeClean] Remove the Unused MACRO for AOT Inductor Runtime ( #165139 )
...
As the title states.
- AOTI_TORCH_CHECK depends on TORCH_CHECK_MSG, which is located in c10/util/Exception.h and may break BC
- AOTI_TORCH_CHECK is not used anywhere
- STD_TORCH_CHECK has ABI check tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165139
Approved by: https://github.com/Skylion007 , https://github.com/janeyx99
2025-10-30 18:43:58 +00:00
Edward Z. Yang
4acc66f119
Make PT2 compile backprop through custom op without autograd key a hard error ( #166367 )
...
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166367
Approved by: https://github.com/bdhirsh
2025-10-30 18:43:07 +00:00
PyTorch MergeBot
694d205143
Revert "shrink_group implementation to expose ncclCommShrink API ( #164518 )"
...
This reverts commit 311ea0dec0 .
Reverted https://github.com/pytorch/pytorch/pull/164518 on behalf of https://github.com/atalman due to breaks internal builds Error: from logging_utils import ( ModuleNotFoundError: No module named 'logging_utils' ([comment](https://github.com/pytorch/pytorch/pull/164518#issuecomment-3469308568 ))
2025-10-30 17:52:29 +00:00
Scott Wolchok
6a5a436624
DTensor: C++ compute_global_tensor_info ( #162990 )
...
compute_global_tensor_info is on the hot path for DTensor.{from,to}_local. More incremental progress toward C++.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162990
Approved by: https://github.com/ezyang
2025-10-30 15:10:54 +00:00
linhaifeng
369f2d6951
[3/N] fix typo in other folders ( #166606 )
...
Fixes typos in other folders; follow-up to
#166374
#166126
Exclusions are configured in _typos.toml:
```toml
[files]
extend-exclude = ["tools/linter/dictionary.txt"]
[default.extend-words]
nd = "nd"
arange = "arange"
Nd = "Nd"
GLOBALs = "GLOBALs"
hte = "hte"
iy = "iy"
PN = "PN"
Dout = "Dout"
optin = "optin"
gam = "gam"
PTD = "PTD"
Sur = "Sur"
nin = "nin"
tme = "tme"
inpt = "inpt"
mis = "mis"
Raison = "Raison"
ouput = "ouput"
nto = "nto"
Onwer = "Onwer"
callibrate = "callibrate"
ser = "ser"
Metdata = "Metdata"
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166606
Approved by: https://github.com/ezyang
2025-10-30 10:30:40 +00:00
Bruce Chang
311ea0dec0
shrink_group implementation to expose ncclCommShrink API ( #164518 )
...
Closes #164529
To expose the new [ncclCommShrink](https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/api/comms.html#ncclcommshrink ) API to PyTorch.
This is useful when you need to exclude certain GPUs or nodes from a collective operation, for example in fault tolerance scenarios or when dynamically adjusting resource utilization.
For more info: [Shrinking a communicator](https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/usage/communicators.html#shrinking-a-communicator )
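A hypothetical sketch of the fault-tolerance use case; the Python-level entry point and its signature here are assumed, not taken from the PR:
```python
import torch
import torch.distributed as dist

# Hypothetical: after detecting a failed rank, the surviving ranks build
# a smaller group that excludes it, backed by ncclCommShrink underneath.
dist.init_process_group("nccl")
failed_ranks = [3]  # e.g. a rank whose GPU was lost
new_group = dist.shrink_group(failed_ranks)  # assumed entry point/signature
t = torch.ones(1, device="cuda")
dist.all_reduce(t, group=new_group)  # collective over the survivors only
```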
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164518
Approved by: https://github.com/kwen2501
2025-10-30 01:50:54 +00:00
Michael Lazos
c54e2c5b41
[User-streams] Make torch.Event weakref compatible ( #164522 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164522
Approved by: https://github.com/williamwen42
ghstack dependencies: #164304
2025-10-29 23:06:31 +00:00
Michael Lazos
c3047938a0
[user-streams] Make device-agnostic streams weakref compatible ( #164304 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164304
Approved by: https://github.com/williamwen42 , https://github.com/colesbury
2025-10-29 23:06:31 +00:00
Shangdi Yu
d2eff5d454
Add python stack trace to AOTI generated code ( #160539 )
...
Summary:
We add a thread_local KernelContext object so Strobelight (and other potential profilers) can read the stack trace information of the running kernel.
This adds some overhead, so we guard it behind the `cpp.enable_kernel_profile` flag.
Example output code:
```cpp
#include <torch/csrc/inductor/aoti_runtime/kernel_context_tls.h>
namespace torch::aot_inductor {
thread_local KernelContext* tls_kernel_context = nullptr;
}
// Other code .....
void AOTInductorModel::run_impl(
AtenTensorHandle*
input_handles, // array of input AtenTensorHandle; handles
// are stolen; the array itself is borrowed
AtenTensorHandle*
output_handles, // array for writing output AtenTensorHandle; handles
// will be stolen by the caller; the array itself is
// borrowed
DeviceStreamType stream,
AOTIProxyExecutorHandle proxy_executor
) {
__check_inputs_outputs(input_handles, output_handles);
auto inputs = steal_from_raw_handles_to_raii_handles(input_handles, 4);
auto arg2_1 = std::move(inputs[0]);
auto arg3_1 = std::move(inputs[1]);
auto arg4_1 = std::move(inputs[2]);
auto arg5_1 = std::move(inputs[3]);
[[maybe_unused]] auto& fc1_weight = constants_->at(0);
[[maybe_unused]] auto& fc1_bias = constants_->at(1);
inputs.clear();
[[maybe_unused]] auto& kernels = static_cast<AOTInductorModelKernels&>(*this->kernels_.get());
static constexpr int64_t int_array_0[] = {8L, 16L};
static constexpr int64_t int_array_1[] = {16L, 1L};
AtenTensorHandle buf0_handle;
AOTI_TORCH_ERROR_CODE_CHECK(aoti_torch_empty_strided(2, int_array_0, int_array_1, cached_torch_dtype_float32, cached_torch_device_type_cpu, this->device_idx_, &buf0_handle));
RAIIAtenTensorHandle buf0(buf0_handle);
// Topologically Sorted Source Nodes: [linear], Original ATen: [aten.t, aten.addmm]
// [Provenance debug handles] aoti_torch_cpu_addmm_out:1
static constexpr int64_t int_array_2[] = {10L, 16L};
static constexpr int64_t int_array_3[] = {1L, 10L};
{
KernelContextGuard _ctx("aoti_torch_cpu_addmm_out", R"(
File "/data/users/shangdiy/fbsource/buck-out/v2/gen/fbcode/cba6f4fb5faa5f79/caffe2/test/inductor/__provenance_tracing__/provenance_tracing#link-tree/caffe2/test/inductor/test_provenance_tracing.py", line 829, in forward
x = self.fc1(x)
File "/data/users/shangdiy/fbsource/buck-out/v2/gen/fbcode/cba6f4fb5faa5f79/caffe2/test/inductor/__provenance_tracing__/provenance_tracing#link-tree/torch/nn/modules/linear.py", line 134, in forward
return F.linear(input, self.weight, self.bias)
)");
RAIIAtenRecordFunctionHandle record_aoti_torch_cpu_addmm_out_("aoti_torch_cpu_addmm_out", nullptr);
AOTI_TORCH_ERROR_CODE_CHECK(aoti_torch_cpu_addmm_out(buf0, fc1_bias, arg2_1, wrap_with_raii_handle_if_needed(reinterpret_tensor_wrapper(fc1_weight, 2, int_array_2, int_array_3, 0L)), 1L, 1L));
}
arg2_1.reset();
auto buf1 = std::move(buf0); // reuse
static constexpr int64_t int_array_4[] = {10L, 20L};
static constexpr int64_t int_array_5[] = {20L, 1L};
AtenTensorHandle buf2_handle;
AOTI_TORCH_ERROR_CODE_CHECK(aoti_torch_empty_strided(2, int_array_4, int_array_5, cached_torch_dtype_float32, cached_torch_device_type_cpu, this->device_idx_, &buf2_handle));
RAIIAtenTensorHandle buf2(buf2_handle);
// [Provenance debug handles] cpp_fused_mul_relu_sigmoid_0:2
{
KernelContextGuard _ctx("cpp_fused_mul_relu_sigmoid_0", R"(
File "/data/users/shangdiy/fbsource/buck-out/v2/gen/fbcode/cba6f4fb5faa5f79/caffe2/test/inductor/__provenance_tracing__/provenance_tracing#link-tree/caffe2/test/inductor/test_provenance_tracing.py", line 831, in forward
x = self.sigmoid(x)
File "/data/users/shangdiy/fbsource/buck-out/v2/gen/fbcode/cba6f4fb5faa5f79/caffe2/test/inductor/__provenance_tracing__/provenance_tracing#link-tree/torch/nn/modules/activation.py", line 359, in forward
return torch.sigmoid(input)
File "/data/users/shangdiy/fbsource/buck-out/v2/gen/fbcode/cba6f4fb5faa5f79/caffe2/test/inductor/__provenance_tracing__/provenance_tracing#link-tree/caffe2/test/inductor/test_provenance_tracing.py", line 830, in forward
x = self.relu(x)
File "/data/users/shangdiy/fbsource/buck-out/v2/gen/fbcode/cba6f4fb5faa5f79/caffe2/test/inductor/__provenance_tracing__/provenance_tracing#link-tree/torch/nn/modules/activation.py", line 144, in forward
return F.relu(input, inplace=self.inplace)
File "/data/users/shangdiy/fbsource/buck-out/v2/gen/fbcode/cba6f4fb5faa5f79/caffe2/test/inductor/__provenance_tracing__/provenance_tracing#link-tree/caffe2/test/inductor/test_provenance_tracing.py", line 832, in forward
d = a * 3.14
)");
cpp_fused_mul_relu_sigmoid_0((float*)(buf1.data_ptr()), (const float*)(arg3_1.data_ptr()), (float*)(buf2.data_ptr()));
}
arg3_1.reset();
static constexpr int64_t int_array_6[] = {10L, 30L};
static constexpr int64_t int_array_7[] = {30L, 1L};
AtenTensorHandle buf3_handle;
AOTI_TORCH_ERROR_CODE_CHECK(aoti_torch_empty_strided(2, int_array_6, int_array_7, cached_torch_dtype_float32, cached_torch_device_type_cpu, this->device_idx_, &buf3_handle));
RAIIAtenTensorHandle buf3(buf3_handle);
// Topologically Sorted Source Nodes: [mul, addmm], Original ATen: [aten.mul, aten.addmm]
// [Provenance debug handles] aoti_torch_cpu_addmm_out:3
{
KernelContextGuard _ctx("aoti_torch_cpu_addmm_out", R"(
File "/data/users/shangdiy/fbsource/buck-out/v2/gen/fbcode/cba6f4fb5faa5f79/caffe2/test/inductor/__provenance_tracing__/provenance_tracing#link-tree/caffe2/test/inductor/test_provenance_tracing.py", line 833, in forward
y = torch.addmm(c, d, b)
)");
RAIIAtenRecordFunctionHandle record_aoti_torch_cpu_addmm_out_("aoti_torch_cpu_addmm_out", nullptr);
AOTI_TORCH_ERROR_CODE_CHECK(aoti_torch_cpu_addmm_out(buf3, arg5_1, buf2, arg4_1, 1L, 1L));
}
arg4_1.reset();
arg5_1.reset();
buf2.reset();
auto buf4 = std::move(buf3); // reuse
// [Provenance debug handles] cpp_fused_gelu_1:4
{
KernelContextGuard _ctx("cpp_fused_gelu_1", R"(
File "/data/users/shangdiy/fbsource/buck-out/v2/gen/fbcode/cba6f4fb5faa5f79/caffe2/test/inductor/__provenance_tracing__/provenance_tracing#link-tree/caffe2/test/inductor/test_provenance_tracing.py", line 834, in forward
z = torch.nn.functional.gelu(y)
)");
cpp_fused_gelu_1((float*)(buf4.data_ptr()));
}
output_handles[0] = buf1.release();
output_handles[1] = buf4.release();
} // AOTInductorModel::run_impl
```
Test Plan:
```
buck run mode/dev-nosan fbcode//caffe2/test/inductor:provenance_tracing -- -r stack_traces
```
Rollback Plan:
Differential Revision: D78436007
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160539
Approved by: https://github.com/yiming0416
2025-10-29 22:47:52 +00:00
Jeff Daily
d401e4e70a
[ROCm][CUDA] add unit test utility busy_wait_for_flag ( #166218 )
...
torch.cuda._busy_wait_for_flag() will launch a kernel that spins until a flag is set by a corresponding torch.cuda._clear_flag(). These **must** be run on separate streams or it will deadlock.
When used correctly, these kernels put work on the GPU that is more predictable than torch.cuda._sleep() in cases where a unit test depends on the GPU being busy.
Fixes #120318 .
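A sketch of the intended usage per the description above (both utilities are private test helpers; the argument-free signatures are assumed):
```python
import torch

busy = torch.cuda.Stream()
clear = torch.cuda.Stream()

with torch.cuda.stream(busy):
    torch.cuda._busy_wait_for_flag()  # kernel spins on-GPU until the flag is set

with torch.cuda.stream(clear):        # must be a *different* stream, or deadlock
    torch.cuda._clear_flag()          # sets the flag, releasing the spin kernel

torch.cuda.synchronize()
```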
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166218
Approved by: https://github.com/jeffdaily
Co-authored-by: Jeff Daily <jeff.daily@amd.com>
2025-10-29 22:40:23 +00:00
rraminen
deb776319b
[ROCm] Reduce duplication in bfloat16_support_literal definition ( #166147 )
...
This PR refactors the bfloat16_support_literal constant in the PyTorch build logic to eliminate duplicated ROCm-specific code.
Previously, there were two nearly identical branches for ROCM_VERSION < 70000 and ROCM_VERSION >= 70000, differing only by a single typedef. These have been unified into one conditional block with a minimal version guard inside. (https://github.com/ROCm/pytorch/pull/2502 )
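Roughly, the shape of the refactor (typedef names are placeholders, not the actual literal):
```cpp
// Before: two near-identical branches duplicated the entire literal.
// After: one shared block, with the single differing typedef guarded inside.
#if ROCM_VERSION >= 70000
typedef __hip_bfloat16 rocm_bf16_t;  // placeholder typedef
#else
typedef hip_bfloat16 rocm_bf16_t;    // placeholder typedef
#endif
// ... rest of the bfloat16 support code, now written once ...
```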
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166147
Approved by: https://github.com/jerrymannil , https://github.com/jeffdaily
2025-10-29 16:59:03 +00:00
PyTorch MergeBot
5fd1d41e62
Revert "[user-streams] Make device-agnostic streams weakref compatible ( #164304 )"
...
This reverts commit bfc2050db9 .
Reverted https://github.com/pytorch/pytorch/pull/164304 on behalf of https://github.com/atalman due to Breaks periodic: test/dynamo/test_streams.py::TestStreams::test_stream_weakref [GH job link](https://github.com/pytorch/pytorch/actions/runs/18909552619/job/53979171605 ) [HUD commit link](cde81e92b9 ) ([comment](https://github.com/pytorch/pytorch/pull/164304#issuecomment-3462489278 ))
2025-10-29 16:09:54 +00:00
PyTorch MergeBot
5cdbcb5233
Revert "[User-streams] Make torch.Event weakref compatible ( #164522 )"
...
This reverts commit cde81e92b9 .
Reverted https://github.com/pytorch/pytorch/pull/164522 on behalf of https://github.com/atalman due to Breaks periodic: test/dynamo/test_streams.py::TestStreams::test_stream_weakref [GH job link](https://github.com/pytorch/pytorch/actions/runs/18909552619/job/53979171605 ) [HUD commit link](cde81e92b9 ) ([comment](https://github.com/pytorch/pytorch/pull/164522#issuecomment-3462450571 ))
2025-10-29 16:03:03 +00:00
Mikayla Gawarecki
eae701cad0
Add scaffolding for StableIValue FC/BC (no PoC) ( #164332 )
...
1. Add `extension_build_version` and `is_internal` to `FromImpl`/`ToImpl` (this will be useful for future if we need to break the BC of any type) #163832 has the PoC of how we would actually use this system
2. Add `aoti_torch_library_impl_v2` that takes in an additional `extension_build_version` argument, updates callsite in `torch/csrc/stable/library.h` to always pass `TORCH_ABI_VERSION` for this argument
3. Add `extension_build_version` to `from_ivalue` and `to_ivalue` and update all callsites
4. Add a private `_from` and `_to` that pass `is_internal=True` to `FromImpl`/`ToImpl`, making it easier to reason about what is being called from libtorch-land / extension-land
**Note: This PR does not include a linter that tells the user to update from/to if changing the ABI of a type in headeronly, which I intend to do in https://github.com/pytorch/pytorch/pull/163998 **
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164332
Approved by: https://github.com/janeyx99
ghstack dependencies: #164356 , #166373 , #163683
2025-10-29 15:41:45 +00:00
Mikayla Gawarecki
8f51556daa
Add scaffolding for aoti_torch_call_dispatcher BC with native ops ( #163683 )
...
Part 1 of plan in https://docs.google.com/document/d/1MaX51H5aEQE5XnOlnZIpf9oCYwzGrTWkgBACxNzsmWE/edit?usp=sharing
- Upgrade `aoti_torch_call_dispatcher` to v2 with an `extension_build_version`
- Allow registration of StableIValue stack --> IValue stack adapters for schema changes
#### Note: This PR does not include a linter that tells the user to add the upgrader if the schema changes, which is an important piece that will be added in a separate PR
Pull Request resolved: https://github.com/pytorch/pytorch/pull/163683
Approved by: https://github.com/janeyx99
ghstack dependencies: #164356 , #166373
2025-10-29 15:41:45 +00:00
Mikayla Gawarecki
c0bbda37e8
Move static from_ivalue/to_ivalue to new shim_common.cpp ( #166373 )
...
Move `from_ivalue` and `to_ivalue` and their dependents `StableIValueBoxedKernel`, `aoti_torch_library_impl`, and `aoti_torch_call_dispatcher` into a new, non-AOTI shim_common.cpp.
This is in prep for the above PRs, where I add version-aware v2s (`torch_call_dispatcher` and `torch_library_impl`).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166373
Approved by: https://github.com/janeyx99
ghstack dependencies: #164356
2025-10-29 15:41:36 +00:00
Mikayla Gawarecki
fefb546b91
Add TORCH_TARGET_VERSION for stable ABI ( #164356 )
...
And update it so comparisons can be done by the preprocessor
**Note: We also need to gate in shim.h and figure out how to enforce this**
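A sketch of what preprocessor-comparable versioning enables (the encoding macro below is an assumption, not the PR's actual scheme):
```cpp
// Assumed encoding: pack (major, minor, patch) into one integer so that
// ordinary #if comparisons order versions correctly.
#define TORCH_VERSION_ENCODE(major, minor, patch) \
  (((major) * 10000) + ((minor) * 100) + (patch))

#if TORCH_TARGET_VERSION >= TORCH_VERSION_ENCODE(2, 10, 0)
  // target newer stable-ABI entry points
#else
  // fall back to older shims
#endif
```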
Differential Revision: [D85683549](https://our.internmc.facebook.com/intern/diff/D85683549 )
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164356
Approved by: https://github.com/janeyx99
2025-10-29 15:41:28 +00:00
Michael Lazos
cde81e92b9
[User-streams] Make torch.Event weakref compatible ( #164522 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164522
Approved by: https://github.com/williamwen42
ghstack dependencies: #162903 , #164343 , #164344 , #164507 , #162901 , #164304
2025-10-29 04:57:23 +00:00
Michael Lazos
bfc2050db9
[user-streams] Make device-agnostic streams weakref compatible ( #164304 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164304
Approved by: https://github.com/williamwen42 , https://github.com/colesbury
ghstack dependencies: #162903 , #164343 , #164344 , #164507 , #162901
2025-10-29 04:57:23 +00:00
Yu, Guangye
753d9bd806
Introduce a new API torch.xpu.set_per_process_memory_fraction ( #165510 )
...
# Motivation
Aligned with other backends, this PR introduces a new API `torch.xpu.set_per_process_memory_fraction` to allow users to customize the allowed memory fraction for a single process.
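A usage sketch, assuming the signature mirrors the CUDA analogue `torch.cuda.set_per_process_memory_fraction(fraction, device)`:
```python
import torch

if torch.xpu.is_available():
    # Cap this process at ~50% of device 0's memory; allocations beyond
    # the cap should fail with an out-of-memory error instead of growing.
    torch.xpu.set_per_process_memory_fraction(0.5, device=0)
```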
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165510
Approved by: https://github.com/EikanWang , https://github.com/ezyang
ghstack dependencies: #165508 , #165509
2025-10-29 03:24:52 +00:00
Yingji Zhang
17bdb232e1
[GR v0] AOTI Enablement - Fix GR model AOTI inplace update by skipping empty named ( #165970 ) ( #166037 )
...
Summary:
Add a gflag to allow us to skip empty constant named parameters during
dense loading. In [vm_parameters.py](https://fburl.com/code/7xr9ihwy ), there is
a constant _empty_tensor parameter used by the model. This constant parameter
is skipped in XL weights during model publish because it is empty, which later
breaks model in-place update: the parameter is reported by the AOTI
container but cannot be found among the model merge weights. This diff
solves that problem.
Test Plan: Verified inplace update in job https://www.internalfb.com/vanguard/serving_test_cases/1165842932095688
Reviewed By: muchulee8, joannec3634
Differential Revision: D85082330
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166037
Approved by: https://github.com/muchulee8 , https://github.com/jcwchen
2025-10-28 01:50:36 +00:00
Scott Wolchok
7d16fcf2df
Re-re-re-re-apply "C++-accessible Placements via pybind11 ( #163030 )" ( #166132 )
...
Was reverted (again!) due to a merge conflict that crept in sometime during the "export to github -> land internally -> merge on github" process.
D85096233
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166132
Approved by: https://github.com/Skylion007 , https://github.com/ezyang , https://github.com/malfet
2025-10-27 21:19:32 +00:00
Weinan Liu
fa4cb91846
add support for ir scalar literal parsing for inf/-inf/True/False ( #163924 )
...
Currently the IR parser doesn't support parsing IR like
```
graph():
%12 : float = prim::Constant[value=-inf]()
%13 : float = prim::Constant[value=inf]()
%14 : bool = prim::Constant[value=True]()
%15 : bool = prim::Constant[value=False]()
return (%12)
```
So the Python script below throws an error.
```
#!/bin/env python
import torch
def test():
return [True, False]
f = torch.jit.script(test)
torch._C._jit_pass_constant_propagation(f.graph)
ts_str = f.graph.__repr__()
print(ts_str)
ts = torch.parse_ir(ts_str)
func = torch._C._create_function_from_graph("forward", ts)
ret = func()
assert ret == [True, False]
def test():
return [float("inf"), float("-inf")]
f = torch.jit.script(test)
torch._C._jit_pass_constant_propagation(f.graph)
ts_str = f.graph.__repr__()
print(ts_str)
ts = torch.parse_ir(ts_str)
func = torch._C._create_function_from_graph("forward", ts)
ret = func()
assert ret == [float("inf"), float("-inf")]
```
I add "inf" and bool cases for IRParser::parseScalarLiteral in irparser.cpp.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/163924
Approved by: https://github.com/ezyang
2025-10-27 05:10:21 +00:00
Chang Pan
74e53d0761
[TorchScript] clearer debug for ConcreteModuleType::findSubmoduleConcreteType ( #166192 )
...
Summary:
right now the log is just
```
RuntimeError: it != data_.modules_.end() INTERNAL ASSERT FAILED at "fbcode/caffe2/torch/csrc/jit/frontend/concrete_module_type.cpp":207, please report a bug to PyTorch.
```
we have no clue where the error happens
https://fb.workplace.com/groups/gpuinference/posts/789257990578348/?comment_id=789284783909002&reply_comment_id=789415260562621
Test Plan: UT
Reviewed By: jcwchen
Differential Revision: D80020093
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166192
Approved by: https://github.com/gmagogsfm
2025-10-25 14:07:54 +00:00
Jason Ansel
78bcfcf870
[fx] Optimize torch.fx.Node.replace_all_uses_with ( #165889 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165889
Approved by: https://github.com/aorenste
2025-10-25 03:44:41 +00:00
Natalia Gimelshein
2efcf3ca98
Reverts #163712 and forces allgather/scatter inputs/outputs to be contiguous ( #166181 )
...
Per title
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166181
Approved by: https://github.com/kwen2501
2025-10-25 02:43:10 +00:00
Jane Xu
cddd5f74ab
Hide stable Library structs instead of using anon namespace ( #166078 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166078
Approved by: https://github.com/malfet
ghstack dependencies: #166076 , #166077
2025-10-25 00:18:26 +00:00
Jane Xu
dfdb68e51f
Hide all APIs in torch::stable ( #166077 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166077
Approved by: https://github.com/malfet
ghstack dependencies: #166076
2025-10-25 00:18:26 +00:00
PyTorch MergeBot
75b8295868
Revert "Warn if AccumulateGrad stream does not match producer node stream ( #165065 )"
...
This reverts commit 12f742941d .
Reverted https://github.com/pytorch/pytorch/pull/165065 on behalf of https://github.com/clee2000 due to broke internal builds D85273204 usages of TORCH_API void add need to be updated? ([comment](https://github.com/pytorch/pytorch/pull/165065#issuecomment-3438061854 ))
2025-10-23 17:02:49 +00:00
Eddie Yan
e64a814ae7
[CUDA] Add experimental green context support for SM carveout ( #159104 )
...
Low-level PyTorch APIs should be usable/stable enough at this point but we might move the underlying driver API usage a bit from here...
Built on top of @drisspg 's branch
Pull Request resolved: https://github.com/pytorch/pytorch/pull/159104
Approved by: https://github.com/ngimel , https://github.com/malfet , https://github.com/kwen2501
Co-authored-by: drisspg <drisspguessous@gmail.com>
Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
2025-10-22 21:38:52 +00:00
soulitzer
12f742941d
Warn if AccumulateGrad stream does not match producer node stream ( #165065 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165065
Approved by: https://github.com/ngimel
2025-10-22 17:33:27 +00:00
zhudada
2998abd777
[Code Clean] Better error handling in torch/csrc/distributed ( #165053 )
...
Replace vanilla C++ runtime_error exceptions with TORCH_CHECK.
Including:
torch/csrc/distributed/*
Partially fixes #148114
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165053
Approved by: https://github.com/FFFrog , https://github.com/albanD
2025-10-22 01:40:36 +00:00
Han Chao
a1005427bf
[xpu] Support high stream for ProcessGroupXCCL ( #163049 )
...
Add high-priority stream support for ProcessGroupXCCL. Just like CUDA, XPU streams support execution at a higher priority than other streams. The implementation is in https://github.com/intel/torch-xpu-ops/pull/1715; this PR adds the registration.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/163049
Approved by: https://github.com/guangyey , https://github.com/gujinghui , https://github.com/EikanWang , https://github.com/albanD
2025-10-22 00:54:25 +00:00
Zhaoqi Zhu
04adfe5ba9
Make Backend::setGroupUid virtual ( #165957 )
...
As titled, so that we may customize this function in custom backends
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165957
Approved by: https://github.com/d4l3k
2025-10-21 21:33:24 +00:00
PyTorch MergeBot
ad4dc52bf6
Revert "shrink_group implementation to expose ncclCommShrink API ( #164518 )"
...
This reverts commit 4e643422f6 .
Reverted https://github.com/pytorch/pytorch/pull/164518 on behalf of https://github.com/albanD due to Breaks lint ([comment](https://github.com/pytorch/pytorch/pull/164518#issuecomment-3429426503 ))
2025-10-21 20:24:14 +00:00
Bruce Chang
4e643422f6
shrink_group implementation to expose ncclCommShrink API ( #164518 )
...
Closes #164529
To expose the new [ncclCommShrink](https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/api/comms.html#ncclcommshrink ) API to PyTorch.
This is useful when you need to exclude certain GPUs or nodes from a collective operation, for example in fault tolerance scenarios or when dynamically adjusting resource utilization.
For more info: [Shrinking a communicator](https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/usage/communicators.html#shrinking-a-communicator )
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164518
Approved by: https://github.com/kwen2501
2025-10-21 19:47:33 +00:00
Jason Ansel
3c3b278872
[reland][fx] Move Node._prepend/Node._remove_from_list to C++ ( #165882 )
...
Relands #148261 that was reverted by #150542
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165882
Approved by: https://github.com/ezyang
2025-10-21 19:43:55 +00:00
Gufan Yin
e6ba4d0725
Back out "Do not decompose in functionalization/proxy tensor if autograd wouldn't have decomposed ( #164939 )" ( #165910 )
...
Summary:
Original commit changeset: d6d62d0c96dd
Original Phabricator Diff: D84468451 and D84613184
D84468451 caused a CUDA OutOfMemoryError in a model.
Test Plan:
D84468451 was found through bisect. Also double checked on recent trunk 9866939225248c2adc307be7a804b26db0b9b555: f815887517
With this diff that backs out D84468451 and D84613184 : f816114560
Differential Revision: D85025378
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165910
Approved by: https://github.com/clee2000
2025-10-21 16:36:38 +00:00
lichuyang
8b3dc0d1b0
Better error handling in torch/csrc/jit/runtime/* ( #165118 )
...
Refactors error handling by using TORCH_CHECK for improved clarity in constants and scope management in some files under torch/csrc/jit/runtime/*.
Partially fixes https://github.com/pytorch/pytorch/issues/148114
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165118
Approved by: https://github.com/FFFrog , https://github.com/albanD
2025-10-21 15:22:49 +00:00
lichuyang
410e6a4321
Better error handling in torch/csrc/jit/frontend/* ( #165213 )
...
Refactors error handling by using TORCH_CHECK for improved clarity in constants and scope management in some files under torch/csrc/jit/frontend/*.
Partially fixes https://github.com/pytorch/pytorch/issues/148114
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165213
Approved by: https://github.com/FFFrog , https://github.com/albanD
2025-10-21 13:54:59 +00:00