Commit Graph

95310 Commits

Author SHA1 Message Date
Ruben Rodriguez Buchillon
e380028a51 [inductor][choices] lookup table choices 1/3 (#164978)
# why

- enable users to control which choices get used on which inputs
- reduce lowering time and pin kernel selection by selecting
  them up front for the given inputs

# what

- a new InductorChoices subclass that implements a lookup table
- a README explaining the usage
- corresponding testing

- currently only supports templates that go through
  `V.choices.get_template_configs` (see the sketch below)
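
A hedged sketch of the general extension point (subclassing `InductorChoices` and installing it via `V.set_choices_handler`); the actual lookup-table subclass and its config format live in this PR and its README:

```
# Illustrative only: the real lookup-table subclass consults a table keyed
# on input properties inside V.choices.get_template_configs.
from torch._inductor.choices import InductorChoices
from torch._inductor.virtualized import V

class LookupTableChoices(InductorChoices):
    ...  # override choice-selection hooks per the PR's README

V.set_choices_handler(LookupTableChoices())
```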

# testing

```
python3 -bb -m pytest test/inductor/test_lookup_table.py -v
```

Differential Revision: [D85685743](https://our.internmc.facebook.com/intern/diff/D85685743)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164978
Approved by: https://github.com/PaulZhang12, https://github.com/eellison, https://github.com/mlazos
2025-10-30 01:28:01 +00:00
Colin L Reliability Rice
b4403bfc62 Add waitcounters for torch.compile subprocess pool (#164527)
Summary:
This adds waitcounters for whether or not the pool is running, as well as for whether we
are running jobs.

This also adds waitcounters for the first job within a pool. The first-job and running waitcounters are working correctly. The job waitcounter seems either to be detecting a leaked job or to be subtly broken.

Test Plan:
We've tested this internally and see valid ods metrics.

Note that we may be leaking jobs, or the job logic may not be handling an exception correctly.

Differential Revision: D83705931

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164527
Approved by: https://github.com/masnesral
2025-10-30 01:15:26 +00:00
Jeff Daily
12c12466b0 [ROCm][CI] remove amdgpu from install_rocm.sh (#166575)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166575
Approved by: https://github.com/jeffdaily

Co-authored-by: Jeff Daily <jeff.daily@amd.com>
2025-10-30 01:08:33 +00:00
Sherlock Huang
f4d05feb7a Repro dynamo issue for union typed annotation (#166443)
When a nested function has a type annotation using "|", it fails.

It works fine with `Union[torch.Tensor, DTensor]`, though.
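
A minimal repro sketch of the failure mode described (assumed shape; the actual test lives in the PR):

```
import torch
from torch.distributed.tensor import DTensor

@torch.compile(fullgraph=True)
def outer(x: torch.Tensor):
    # PEP 604 union in a nested function's annotation trips Dynamo...
    def inner(t: torch.Tensor | DTensor):
        return t + 1
    # ...while Union[torch.Tensor, DTensor] works fine
    return inner(x)

outer(torch.ones(3))
```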

Pull Request resolved: https://github.com/pytorch/pytorch/pull/166443
Approved by: https://github.com/anijain2305
2025-10-30 01:05:15 +00:00
Pian Pawakapan
7481622237 [symbolic shapes] remove maybe_guard_rel warning (#166553)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/166553
Approved by: https://github.com/laithsakka
2025-10-30 00:57:28 +00:00
Laith Sakka
b2a0f90501 Fix: comparing Inductor actual strides vs. bw graph strides for activations should not throw a DDE (#166277)
Fix https://github.com/pytorch/pytorch/issues/163894

Pull Request resolved: https://github.com/pytorch/pytorch/pull/166277
Approved by: https://github.com/Lucaskabela
2025-10-30 00:34:05 +00:00
eellison
14d4a77495 disable current modes instead of no dispatch in estimation (#166571)
Otherwise, the custom estimation's TorchDispatchModes would be disabled.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/166571
Approved by: https://github.com/SherlockNoMad, https://github.com/bdhirsh
2025-10-29 23:24:41 +00:00
Ivan Zaitsev
3d4ca228be Remove METADATA.bzl files (#166574)
(meta-internal, should not be synced)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166574
Approved by: https://github.com/bigfootjon
2025-10-29 23:17:41 +00:00
eellison
c3d205d598 helper function for replacing nodes in aug graph (#166309)
When we do bucketing, we replace starts and waits with new nodes. This PR adds a helper to transfer the augmented graph's additional deps.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/166309
Approved by: https://github.com/IvanKobzarev
2025-10-29 23:08:33 +00:00
Michael Lazos
c54e2c5b41 [User-streams] Make torch.Event weakref compatible (#164522)
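A minimal illustration of what weakref compatibility in this stacked pair of PRs enables (assuming a default-constructed `torch.Event`):

```
import weakref
import torch

ev = torch.Event()
r = weakref.ref(ev)  # previously raised TypeError: cannot create weak reference
assert r() is ev
```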
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164522
Approved by: https://github.com/williamwen42
ghstack dependencies: #164304
2025-10-29 23:06:31 +00:00
Michael Lazos
c3047938a0 [user-streams] Make device-agnostic streams weakref compatible (#164304)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164304
Approved by: https://github.com/williamwen42, https://github.com/colesbury
2025-10-29 23:06:31 +00:00
Shangdi Yu
d2eff5d454 Add python stack trace to AOTI generated code (#160539)
Summary:
We add a thread_local KernelContext object so Strobelight (and other potential profilers) can read the stack trace information of the running kernel.

This will bring extra overhead, so we guard this behind the `cpp.enable_kernel_profile` flag.

Example output code:

```cpp
#include <torch/csrc/inductor/aoti_runtime/kernel_context_tls.h>
namespace torch::aot_inductor {
thread_local KernelContext* tls_kernel_context = nullptr;
}
// Other code .....
void AOTInductorModel::run_impl(
    AtenTensorHandle*
        input_handles, // array of input AtenTensorHandle; handles
                        // are stolen; the array itself is borrowed
    AtenTensorHandle*
        output_handles, // array for writing output AtenTensorHandle; handles
                        // will be stolen by the caller; the array itself is
                        // borrowed
    DeviceStreamType stream,
    AOTIProxyExecutorHandle proxy_executor
) {
    __check_inputs_outputs(input_handles, output_handles);
    auto inputs = steal_from_raw_handles_to_raii_handles(input_handles, 4);
    auto arg2_1 = std::move(inputs[0]);
    auto arg3_1 = std::move(inputs[1]);
    auto arg4_1 = std::move(inputs[2]);
    auto arg5_1 = std::move(inputs[3]);
    [[maybe_unused]] auto& fc1_weight = constants_->at(0);
    [[maybe_unused]] auto& fc1_bias = constants_->at(1);
    inputs.clear();
    [[maybe_unused]] auto& kernels = static_cast<AOTInductorModelKernels&>(*this->kernels_.get());
    static constexpr int64_t int_array_0[] = {8L, 16L};
    static constexpr int64_t int_array_1[] = {16L, 1L};
    AtenTensorHandle buf0_handle;
    AOTI_TORCH_ERROR_CODE_CHECK(aoti_torch_empty_strided(2, int_array_0, int_array_1, cached_torch_dtype_float32, cached_torch_device_type_cpu, this->device_idx_, &buf0_handle));
    RAIIAtenTensorHandle buf0(buf0_handle);
    // Topologically Sorted Source Nodes: [linear], Original ATen: [aten.t, aten.addmm]
    // [Provenance debug handles] aoti_torch_cpu_addmm_out:1
    static constexpr int64_t int_array_2[] = {10L, 16L};
    static constexpr int64_t int_array_3[] = {1L, 10L};
    {
    KernelContextGuard _ctx("aoti_torch_cpu_addmm_out", R"(
File "/data/users/shangdiy/fbsource/buck-out/v2/gen/fbcode/cba6f4fb5faa5f79/caffe2/test/inductor/__provenance_tracing__/provenance_tracing#link-tree/caffe2/test/inductor/test_provenance_tracing.py", line 829, in forward
    x = self.fc1(x)
  File "/data/users/shangdiy/fbsource/buck-out/v2/gen/fbcode/cba6f4fb5faa5f79/caffe2/test/inductor/__provenance_tracing__/provenance_tracing#link-tree/torch/nn/modules/linear.py", line 134, in forward
    return F.linear(input, self.weight, self.bias)
)");
    RAIIAtenRecordFunctionHandle record_aoti_torch_cpu_addmm_out_("aoti_torch_cpu_addmm_out", nullptr);
    AOTI_TORCH_ERROR_CODE_CHECK(aoti_torch_cpu_addmm_out(buf0, fc1_bias, arg2_1, wrap_with_raii_handle_if_needed(reinterpret_tensor_wrapper(fc1_weight, 2, int_array_2, int_array_3, 0L)), 1L, 1L));
    }
    arg2_1.reset();
    auto buf1 = std::move(buf0);  // reuse
    static constexpr int64_t int_array_4[] = {10L, 20L};
    static constexpr int64_t int_array_5[] = {20L, 1L};
    AtenTensorHandle buf2_handle;
    AOTI_TORCH_ERROR_CODE_CHECK(aoti_torch_empty_strided(2, int_array_4, int_array_5, cached_torch_dtype_float32, cached_torch_device_type_cpu, this->device_idx_, &buf2_handle));
    RAIIAtenTensorHandle buf2(buf2_handle);
    // [Provenance debug handles] cpp_fused_mul_relu_sigmoid_0:2
    {
    KernelContextGuard _ctx("cpp_fused_mul_relu_sigmoid_0", R"(
File "/data/users/shangdiy/fbsource/buck-out/v2/gen/fbcode/cba6f4fb5faa5f79/caffe2/test/inductor/__provenance_tracing__/provenance_tracing#link-tree/caffe2/test/inductor/test_provenance_tracing.py", line 831, in forward
    x = self.sigmoid(x)
  File "/data/users/shangdiy/fbsource/buck-out/v2/gen/fbcode/cba6f4fb5faa5f79/caffe2/test/inductor/__provenance_tracing__/provenance_tracing#link-tree/torch/nn/modules/activation.py", line 359, in forward
    return torch.sigmoid(input)
File "/data/users/shangdiy/fbsource/buck-out/v2/gen/fbcode/cba6f4fb5faa5f79/caffe2/test/inductor/__provenance_tracing__/provenance_tracing#link-tree/caffe2/test/inductor/test_provenance_tracing.py", line 830, in forward
    x = self.relu(x)
  File "/data/users/shangdiy/fbsource/buck-out/v2/gen/fbcode/cba6f4fb5faa5f79/caffe2/test/inductor/__provenance_tracing__/provenance_tracing#link-tree/torch/nn/modules/activation.py", line 144, in forward
    return F.relu(input, inplace=self.inplace)
File "/data/users/shangdiy/fbsource/buck-out/v2/gen/fbcode/cba6f4fb5faa5f79/caffe2/test/inductor/__provenance_tracing__/provenance_tracing#link-tree/caffe2/test/inductor/test_provenance_tracing.py", line 832, in forward
    d = a * 3.14
)");
    cpp_fused_mul_relu_sigmoid_0((float*)(buf1.data_ptr()), (const float*)(arg3_1.data_ptr()), (float*)(buf2.data_ptr()));
    }
    arg3_1.reset();
    static constexpr int64_t int_array_6[] = {10L, 30L};
    static constexpr int64_t int_array_7[] = {30L, 1L};
    AtenTensorHandle buf3_handle;
    AOTI_TORCH_ERROR_CODE_CHECK(aoti_torch_empty_strided(2, int_array_6, int_array_7, cached_torch_dtype_float32, cached_torch_device_type_cpu, this->device_idx_, &buf3_handle));
    RAIIAtenTensorHandle buf3(buf3_handle);
    // Topologically Sorted Source Nodes: [mul, addmm], Original ATen: [aten.mul, aten.addmm]
    // [Provenance debug handles] aoti_torch_cpu_addmm_out:3
    {
    KernelContextGuard _ctx("aoti_torch_cpu_addmm_out", R"(
File "/data/users/shangdiy/fbsource/buck-out/v2/gen/fbcode/cba6f4fb5faa5f79/caffe2/test/inductor/__provenance_tracing__/provenance_tracing#link-tree/caffe2/test/inductor/test_provenance_tracing.py", line 833, in forward
    y = torch.addmm(c, d, b)
)");
    RAIIAtenRecordFunctionHandle record_aoti_torch_cpu_addmm_out_("aoti_torch_cpu_addmm_out", nullptr);
    AOTI_TORCH_ERROR_CODE_CHECK(aoti_torch_cpu_addmm_out(buf3, arg5_1, buf2, arg4_1, 1L, 1L));
    }
    arg4_1.reset();
    arg5_1.reset();
    buf2.reset();
    auto buf4 = std::move(buf3);  // reuse
    // [Provenance debug handles] cpp_fused_gelu_1:4
    {
    KernelContextGuard _ctx("cpp_fused_gelu_1", R"(
File "/data/users/shangdiy/fbsource/buck-out/v2/gen/fbcode/cba6f4fb5faa5f79/caffe2/test/inductor/__provenance_tracing__/provenance_tracing#link-tree/caffe2/test/inductor/test_provenance_tracing.py", line 834, in forward
    z = torch.nn.functional.gelu(y)
)");
    cpp_fused_gelu_1((float*)(buf4.data_ptr()));
    }
    output_handles[0] = buf1.release();
    output_handles[1] = buf4.release();
} // AOTInductorModel::run_impl
```

Test Plan:
```
buck run mode/dev-nosan fbcode//caffe2/test/inductor:provenance_tracing -- -r  stack_traces
```

Rollback Plan:

Differential Revision: D78436007

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160539
Approved by: https://github.com/yiming0416
2025-10-29 22:47:52 +00:00
PyTorch MergeBot
972030fe2e Revert "[pytree] add treespec_{leaf,tuple,dict} functions for args_spec modification (#160843)"
This reverts commit 284716a691.

Reverted https://github.com/pytorch/pytorch/pull/160843 on behalf of https://github.com/atalman due to failing internal torchrec test ([comment](https://github.com/pytorch/pytorch/pull/160843#issuecomment-3464647878))
2025-10-29 22:46:48 +00:00
Jeff Daily
d401e4e70a [ROCm][CUDA] add unit test utility busy_wait_for_flag (#166218)
torch.cuda._busy_wait_for_flag() will launch a kernel that spins until a flag is set by a corresponding torch.cuda._clear_flag(). These **must** be run on separate streams or they will deadlock.

When used correctly, these kernels put work on the GPU that is more predictable than torch.cuda._sleep() in cases where a unit test depends on the GPU being busy.

Fixes #120318.
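
A hedged usage sketch based on the description above (these are private test utilities; exact signatures are assumed):

```
import torch

spin = torch.cuda.Stream()
release = torch.cuda.Stream()

with torch.cuda.stream(spin):
    torch.cuda._busy_wait_for_flag()  # kernel spins until the flag is set

with torch.cuda.stream(release):      # must be a *different* stream
    torch.cuda._clear_flag()          # sets the flag, ending the spin

torch.cuda.synchronize()
```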

Pull Request resolved: https://github.com/pytorch/pytorch/pull/166218
Approved by: https://github.com/jeffdaily

Co-authored-by: Jeff Daily <jeff.daily@amd.com>
2025-10-29 22:40:23 +00:00
Mikayla Gawarecki
f1a3440715 FC/BC policy for libtorch stable ABI (#163991)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/163991
Approved by: https://github.com/janeyx99
ghstack dependencies: #163899
2025-10-29 22:35:36 +00:00
Andrey Talman
82ff07c788 Add py 3.14 CI docker build pytorch-linux-jammy-py3.14-clang12 (#164791)
Related to https://github.com/pytorch/pytorch/issues/156856
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164791
Approved by: https://github.com/huydhn, https://github.com/malfet, https://github.com/albanD
2025-10-29 22:21:22 +00:00
Rob Timpe
e0604d3170 [dynamo] Fix ListIterator tracking mutations to original list (#166350)
Currently ListIteratorVariable copies the underlying list, which prevents it
from seeing mutations to the original list. Remove the copy to match CPython behavior.
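
The CPython behavior being matched, for reference:

```
lst = [1, 2, 3]
it = iter(lst)
next(it)    # 1
lst[1] = 99
next(it)    # 99 — the iterator reads the live list, not a copy
```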

Pull Request resolved: https://github.com/pytorch/pytorch/pull/166350
Approved by: https://github.com/guilhermeleobas
ghstack dependencies: #166349, #162768
2025-10-29 21:54:37 +00:00
Rob Timpe
8101fd46d4 [dynamo] Implement iter with a polyfill (#162768)
Currently most variable trackers implement `iter` via `_call_iter_tuple_list`.
This makes it difficult to customize the behavior of `iter` for different
variable types.  Instead, implement `iter` via a polyfill, which will delegate
to the appropriate `__iter__` method.

While this method is more flexible, it increases the overhead of dynamo tracing.
For example, `iter(x)` will generate 9x more instructions than the current
implementation for common iterable types.  Microbenchmarking shows a ~6x
slowdown for this operation.  I suspect this would be much less for realistic
workloads, but more work would be needed to get specific numbers.  If the
performance is a concern we could also consider adding a fast path for types
that are known to correctly implement `__iter__`.
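A minimal sketch of the polyfill idea (not Dynamo's actual implementation, which also has to handle the `__getitem__` fallback and error cases):

```
def iter_polyfill(obj):
    # Delegate to the type's __iter__, as iter() does for iterable types
    return type(obj).__iter__(obj)

assert list(iter_polyfill([1, 2, 3])) == [1, 2, 3]
```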
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162768
Approved by: https://github.com/guilhermeleobas
ghstack dependencies: #166349
2025-10-29 21:54:37 +00:00
Rob Timpe
3d4a2d8a93 [dynamo] Add __iter__ for iterable VariableTrackers (#166349)
This is in preparation for implementing iter with a polyfill

Pull Request resolved: https://github.com/pytorch/pytorch/pull/166349
Approved by: https://github.com/guilhermeleobas
2025-10-29 21:54:37 +00:00
Camyll Harajli
59ddfb69a7 [cpu/gpu split] (#165696)
Summary: CPU/GPU split. CUDA is the default due to some downstream target configurations.

Test Plan: test in CI

Differential Revision: D80712802

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165696
Approved by: https://github.com/jeffdaily, https://github.com/malfet, https://github.com/atalman
2025-10-29 21:44:52 +00:00
Boyuan Feng
bebabd7fce [Graph Partition] move custom rules to inductor config (#166458)
This PR adds `custom_should_partition_ops: list[str]` to specify the names of custom ops at which graph partition happens. It works with caching since it is a `list[str]` in the config file. Op names should be of the format "mylib::baz".

Close: #165341
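
A hedged usage sketch, assuming the config lives in `torch._inductor.config` as described:

```
import torch._inductor.config as inductor_config

# Partition the graph at every call to the custom op "mylib::baz"
inductor_config.custom_should_partition_ops = ["mylib::baz"]
```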

Pull Request resolved: https://github.com/pytorch/pytorch/pull/166458
Approved by: https://github.com/ProExpertProg, https://github.com/eellison, https://github.com/zou3519
2025-10-29 21:43:58 +00:00
Sean McGovern
56a809aa07 [DTensor] Fix torch.all() using incorrect reduction operator (#165924)
Fixes #165923
Corrects the reduction operation to be product (logical AND over booleans).

Enables "all" in the boolean tensor tests.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165924
Approved by: https://github.com/malfet, https://github.com/Skylion007
2025-10-29 20:58:35 +00:00
Yuanyuan Chen
b33762bd2f Fix incomplete test_memory_plots_metadata (#166508)
The different context cases were not fully tested before this PR.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/166508
Approved by: https://github.com/Skylion007
2025-10-29 20:55:00 +00:00
fduwjj
f02708c2be [DeviceMesh] Remove slicing submesh warning messages and clean up in fsdp params (#166466)
Differential Revision: [D85735294](https://our.internmc.facebook.com/intern/diff/D85735294)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166466
Approved by: https://github.com/fegin
2025-10-29 20:52:49 +00:00
Justin Chu
a186aa8d6c [ONNX] Change stacklevel in warning message for export (#166558)
Change stacklevel to 3 so that the warning shows the user callsite (where the user calls torch.onnx.export).
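
How `stacklevel` moves the reported location, as a generic illustration (not the actual torch.onnx code):

```
import warnings

def _warn_deprecated():
    # stacklevel=3: skip this frame and its caller, attributing the warning
    # to the call site two frames up
    warnings.warn("deprecated", DeprecationWarning, stacklevel=3)

def export():
    _warn_deprecated()

export()  # with stacklevel=3 the warning points at this line
```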

Pull Request resolved: https://github.com/pytorch/pytorch/pull/166558
Approved by: https://github.com/titaiwangms
2025-10-29 20:45:25 +00:00
Tushar Jain
48c3b71ecc transform fr traces for ft (#166149)
Summary:
- the ranks in the default PG config are local ranks
- however, FR trace analysis requires them to be global ranks
- so, based on a CLI flag, we transform the local ranks to global ranks before the analysis kicks in

Pull Request resolved: https://github.com/pytorch/pytorch/pull/166149
Approved by: https://github.com/fduwjj
2025-10-29 20:44:48 +00:00
Nikita Shulga
2c9f877fa7 Revert "[PyTorch] Improve aarch64 performance of bfloat16 ops (#166028)"
This reverts commit 3e77a2b478.

Otherwise it fails the ARM build with older compilers, with errors that look
as follows:
```
vec128_bfloat16_neon.h:666:12: error: operation not permitted on type ‘bfloat16_t’
  666 |   return (-x) * y - z;
```

For more self-contained example see https://godbolt.org/z/bbY4xWh45
(that compiles the same code using clang-15 and clang-19)
2025-10-29 13:35:59 -07:00
Tushar Jain
fc540cefd4 set pg name based on ranks (#166182)
Summary:
- in torchft we have multiple default PGs, one for each task group
- for flight recorder to work, each of these needs to have a different name, so entries can be matched
- change the `init_process_group` API to optionally take a list of ranks; if provided, we use the hash of the ranks as the name of the PG (sketched below). For torchft, we'll pass global ranks here so the default PG has a different name on each task group
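
A hedged sketch of the naming scheme described (helper name and hash choice are illustrative, not the PR's code):

```
import hashlib

def _pg_name_from_ranks(ranks: list[int]) -> str:
    # Same global ranks -> same name on every task group's default PG
    key = ",".join(map(str, sorted(ranks)))
    return hashlib.sha1(key.encode()).hexdigest()

print(_pg_name_from_ranks([0, 1, 2, 3]))
```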

Pull Request resolved: https://github.com/pytorch/pytorch/pull/166182
Approved by: https://github.com/fduwjj
2025-10-29 20:13:48 +00:00
Maggie Moss
d1a6e006e0 Fix syntax for pyrefly errors (#166496)
Last one! This ensures all existing suppressions match the expected syntax and will silence only one error code.

pyrefly check
lintrunner

Pull Request resolved: https://github.com/pytorch/pytorch/pull/166496
Approved by: https://github.com/Skylion007, https://github.com/mlazos
2025-10-29 20:00:25 +00:00
Rohit Singh Rathaur
fa560e1158 [ao][pruning] Replace assert statements with AssertionError exceptions (#164926)
Replace assert statements with explicit ValueError exceptions to ensure the validation checks are not removed when Python runs with the optimization flag (-O).

This is a draft PR to confirm the process.

Fixes partially #164878.
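
The pattern in question: an `assert` is stripped under `python -O`, while an explicit raise always runs (illustrative helper, not the PR's code):

```
def set_sparsity_level(level: float) -> None:
    # Before: silently skipped when Python runs with -O
    # assert 0.0 <= level <= 1.0, "sparsity level must be in [0, 1]"
    # After: always enforced
    if not (0.0 <= level <= 1.0):
        raise ValueError("sparsity level must be in [0, 1]")
```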

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164926
Approved by: https://github.com/fffrog, https://github.com/albanD

Co-authored-by: Jiawei Li <ljw1101.vip@gmail.com>
2025-10-29 17:46:46 +00:00
Yuanyuan Chen
a3fe1825aa Fix incomplete torch.cdist tests (#166507)
The `p` value was not actually used by the tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166507
Approved by: https://github.com/Skylion007
2025-10-29 17:11:07 +00:00
rraminen
deb776319b [ROCm] Reduce duplication in bfloat16_support_literal definition (#166147)
This PR refactors the `bfloat16_support_literal` constant in the PyTorch build logic to eliminate duplicated ROCm-specific code.

Previously, there were two nearly identical branches for ROCM_VERSION < 70000 and ROCM_VERSION >= 70000, differing only by a single typedef. These have been unified into one conditional block with a minimal version guard inside. (https://github.com/ROCm/pytorch/pull/2502)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/166147
Approved by: https://github.com/jerrymannil, https://github.com/jeffdaily
2025-10-29 16:59:03 +00:00
PyTorch MergeBot
d7040e6d75 Revert "[dynamo][guards] 1/N Guard selectively for DTensor (#165824)"
This reverts commit ee7434be82.

Reverted https://github.com/pytorch/pytorch/pull/165824 on behalf of https://github.com/anijain2305 due to internal job failed ([comment](https://github.com/pytorch/pytorch/pull/165824#issuecomment-3462667536))
2025-10-29 16:52:31 +00:00
PyTorch MergeBot
35f3572fa4 Revert "[ROCm] Enable group gemm through CK (#166334)"
This reverts commit 1fa520ea65.

Reverted https://github.com/pytorch/pytorch/pull/166334 on behalf of https://github.com/atalman due to Internal build failures ([comment](https://github.com/pytorch/pytorch/pull/166334#issuecomment-3462640668))
2025-10-29 16:45:02 +00:00
anwang
bc5111cd8d [Inductor] Prevent kernel fusion with too many unique inputs and outputs (#166275)
MTIA Triton currently has a limitation that it cannot support cases with too many input/output buffers. This PR adds the limit to prevent large fusions with many input/output buffers.

Differential Revision: [D85509351](https://our.internmc.facebook.com/intern/diff/D85509351/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/166275
Approved by: https://github.com/eellison
ghstack dependencies: #166274
2025-10-29 16:41:34 +00:00
Millie Chen
398fdd32bb [Inductor] Lower fallback nodes annotated with "should_fallback" (#166339)
Summary:
This PR introduces an Inductor-level fallback mechanism that gives users control over which operations or subgraphs Inductor should lower and which should fall back to preexisting kernels. This has similar motivation to #164776 in providing flexibility to selectively disable Inductor lowering for specific nodes.

The implementation simply adds a check for the `"should_fallback"` metadata annotation on FX graph nodes. If this is set to `True`, the lowerer falls back before attempting the normal lowering path. Note that since these are user-directed fallbacks dependent on specific, customized conditions, use `add_to_fallback_set=False` to avoid permanently overwriting Inductor's lowering/fallback rules.

A simple example marking nodes for fallback based on a custom predicate:

```
from typing import Callable

import torch

def should_fallback_predicate(node: torch.fx.Node, pred: Callable[[torch.fx.Node], bool]) -> None:
    # Apply the predicate and mark the node for fallback if it matches
    if pred(node):
        node.meta["should_fallback"] = True
```

Test Plan: added a CI test

Differential Revision: D85347587

Pull Request resolved: https://github.com/pytorch/pytorch/pull/166339
Approved by: https://github.com/blaine-rister, https://github.com/eellison
2025-10-29 16:33:55 +00:00
PyTorch MergeBot
5fd1d41e62 Revert "[user-streams] Make device-agnostic streams weakref compatible (#164304)"
This reverts commit bfc2050db9.

Reverted https://github.com/pytorch/pytorch/pull/164304 on behalf of https://github.com/atalman due to Breaks periodic: test/dynamo/test_streams.py::TestStreams::test_stream_weakref [GH job link](https://github.com/pytorch/pytorch/actions/runs/18909552619/job/53979171605) [HUD commit link](cde81e92b9) ([comment](https://github.com/pytorch/pytorch/pull/164304#issuecomment-3462489278))
2025-10-29 16:09:54 +00:00
PyTorch MergeBot
c594950e86 Revert "nn.Linear: nD contiguous input + bias -- dispatch to addmm also when weight is sparse (#166071)"
This reverts commit 467c21ad9a.

Reverted https://github.com/pytorch/pytorch/pull/166071 on behalf of https://github.com/atalman due to Multiple CI breakages: test/profiler/test_profiler_tree.py::TestProfilerTree::test_profiler_experimental_tree_with_stack_and_modules [GH job link](https://github.com/pytorch/pytorch/actions/runs/18909087335/job/53976915830) [HUD commit link](467c21ad9a) ([comment](https://github.com/pytorch/pytorch/pull/166071#issuecomment-3462458968))
2025-10-29 16:05:30 +00:00
Laith Sakka
14102fb1f3 add new line in log (#164240)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164240
Approved by: https://github.com/ColinPeppler, https://github.com/Skylion007, https://github.com/ezyang
ghstack dependencies: #164075
2025-10-29 16:03:32 +00:00
PyTorch MergeBot
5cdbcb5233 Revert "[User-streams] Make torch.Event weakref compatible (#164522)"
This reverts commit cde81e92b9.

Reverted https://github.com/pytorch/pytorch/pull/164522 on behalf of https://github.com/atalman due to Breaks periodic: test/dynamo/test_streams.py::TestStreams::test_stream_weakref [GH job link](https://github.com/pytorch/pytorch/actions/runs/18909552619/job/53979171605) [HUD commit link](cde81e92b9) ([comment](https://github.com/pytorch/pytorch/pull/164522#issuecomment-3462450571))
2025-10-29 16:03:03 +00:00
Mikayla Gawarecki
eae701cad0 Add scaffolding for StableIValue FC/BC (no PoC) (#164332)
1. Add `extension_build_version` and `is_internal` to `FromImpl`/`ToImpl` (this will be useful in the future if we need to break the BC of any type). #163832 has the PoC of how we would actually use this system
2. Add `aoti_torch_library_impl_v2` that takes in an additional `extension_build_version` argument, updates callsite in `torch/csrc/stable/library.h` to always pass `TORCH_ABI_VERSION` for this argument
3. Add `extension_build_version` to `from_ivalue` and `to_ivalue` and update all callsites
4. Add a private `_from` and `_to` that pass `is_internal=True` to `FromImpl`/`ToImpl`, making it easier to reason about what is being called from libtorch-land / extension-land

**Note: This PR does not include a linter that tells the user to update from/to if changing the ABI of a type in headeronly, which I intend to do in https://github.com/pytorch/pytorch/pull/163998**

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164332
Approved by: https://github.com/janeyx99
ghstack dependencies: #164356, #166373, #163683
2025-10-29 15:41:45 +00:00
Mikayla Gawarecki
8f51556daa Add scaffolding for aoti_torch_call_dispatcher BC with native ops (#163683)
Part 1 of plan in https://docs.google.com/document/d/1MaX51H5aEQE5XnOlnZIpf9oCYwzGrTWkgBACxNzsmWE/edit?usp=sharing

- Upgrade `aoti_torch_call_dispatcher` to v2 with an `extension_build_version`
- Allow registration of StableIValue stack → IValue stack adapters for schema changes

#### Note: This PR does not include a linter that tells the user to add the upgrader if the schema changes, which is an important piece that will be added in a separate PR

Pull Request resolved: https://github.com/pytorch/pytorch/pull/163683
Approved by: https://github.com/janeyx99
ghstack dependencies: #164356, #166373
2025-10-29 15:41:45 +00:00
Mikayla Gawarecki
c0bbda37e8 Move static from_ivalue/to_ivalue to new shim_common.cpp (#166373)
Move `from_ivalue` and `to_ivalue` and their dependents `StableIValueBoxedKernel`, `aoti_torch_library_impl`, and `aoti_torch_call_dispatcher` into the new (non-AOTI) shim_common.cpp.

This is in prep for the above PRs, where I add version-aware v2s (`torch_call_dispatcher` and `torch_library_impl`).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/166373
Approved by: https://github.com/janeyx99
ghstack dependencies: #164356
2025-10-29 15:41:36 +00:00
Mikayla Gawarecki
fefb546b91 Add TORCH_TARGET_VERSION for stable ABI (#164356)
And update it so comparisons can be done by the preprocessor

**Note: We also need to gate in shim.h and figure out how to enforce this**

Differential Revision: [D85683549](https://our.internmc.facebook.com/intern/diff/D85683549)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164356
Approved by: https://github.com/janeyx99
2025-10-29 15:41:28 +00:00
PyTorch MergeBot
d6d6fa26f5 Revert "bwd pass (#164504)"
This reverts commit f36f372acc.

Reverted https://github.com/pytorch/pytorch/pull/164504 on behalf of https://github.com/jeffdaily due to CI had been clean for both cuda and rocm before merge, broke post merge? ([comment](https://github.com/pytorch/pytorch/pull/164504#issuecomment-3462116676))
2025-10-29 15:10:40 +00:00
Nikita Vedeneev
467c21ad9a nn.Linear: nD contiguous input + bias -- dispatch to addmm also when weight is sparse (#166071)
As per title.

It seems safe to generalize to arbitrary contiguous inputs since `at::matmul` is likely to do the flattening to avoid `baddbmm`.

Additionally, we check that the bias is 1D and contiguous, which guarantees it is fused with no copies.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/166071
Approved by: https://github.com/ngimel
2025-10-29 13:13:40 +00:00
Way Wang
4a94591321 filter out alloc-free pairs from trace plot (#165752)
Summary:
When dealing with a large memory trace, the resulting plot can be challenging to interpret and analyze.
This commit introduces a feature that enables filtering of allocations that have already been freed, providing a more focused view.
The remaining events in the plot often warrant closer examination, as they may be indicative of potential out-of-memory (OOM) issues.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165752
Approved by: https://github.com/zdevito
2025-10-29 12:44:54 +00:00
PyTorch MergeBot
5e7272b60a Revert "[BE] Move GreenContext implementation details to cpp (#166462)"
This reverts commit afaaaa314c.

Reverted https://github.com/pytorch/pytorch/pull/166462 on behalf of https://github.com/atalman due to multiple internal build failures ([comment](https://github.com/pytorch/pytorch/pull/166462#issuecomment-3461145801))
2025-10-29 11:59:41 +00:00
PyTorch MergeBot
1dd6b76914 Revert "[1/N] Remove unused loop variables (#166258)"
This reverts commit 76b2c37045.

Reverted https://github.com/pytorch/pytorch/pull/166258 on behalf of https://github.com/atalman due to breaks test/distributed/test_serialization.py::TestSerialization::test_weights_only [GH job link](https://github.com/pytorch/pytorch/actions/runs/18894311802/job/53929321703) [HUD commit link](76b2c37045) ([comment](https://github.com/pytorch/pytorch/pull/166258#issuecomment-3460964612))
2025-10-29 11:10:37 +00:00
Xuehai Pan
284716a691 [pytree] add treespec_{leaf,tuple,dict} functions for args_spec modification (#160843)
The goal of this PR is to provide a standard way to create simple treespec instances and hide the implementation details of the `PyTreeSpec` class.

Changes:

1. Add function `treespec_leaf()` to replace `LeafSpec()`.
2. Add function `treespec_tuple(...)` and `treespec_dict(...)` to create treespecs for `tuple` / `dict`, which are used for `*args` / `**kwargs` (usage sketched after this list). This avoids direct modification of `treespec` instances that relies on the implementation details of the `PyTreeSpec` class.
3. Change `len(spec.children_specs)` to `spec.num_children`.
4. Change `isinstance(spec, LeafSpec)` to `spec.is_leaf()`.
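
A hedged usage sketch of the new helpers (assuming they are exposed from `torch.utils._pytree` alongside the existing APIs):

```
import torch.utils._pytree as pytree

leaf = pytree.treespec_leaf()                    # replaces LeafSpec()
args_spec = pytree.treespec_tuple([leaf, leaf])  # spec for two positional args
kwargs_spec = pytree.treespec_dict({"x": leaf})  # spec for keyword args
```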

------

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160843
Approved by: https://github.com/mlazos
2025-10-29 09:16:24 +00:00