Ruben Rodriguez Buchillon
e380028a51
[inductor][choices] lookup table choices 1/3 ( #164978 )
...
\# why
- enable users to control which choices get used on which inputs
- reduce lowering time, and pin kernel selection, by selecting
them for the inputs
\# what
- a new InductorChoices subclass that implements a lookup table
- a README explaining the usage
- corresponding testing
- currently only supports templates that go through
`V.choices.get_template_configs`
\# testing
```
python3 -bb -m pytest test/inductor/test_lookup_table.py -v
```
Differential Revision: [D85685743](https://our.internmc.facebook.com/intern/diff/D85685743 )
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164978
Approved by: https://github.com/PaulZhang12 , https://github.com/eellison , https://github.com/mlazos
2025-10-30 01:28:01 +00:00
Colin L Reliability Rice
b4403bfc62
Add waitcounters for torch.compile subprocess pool ( #164527 )
...
Summary:
This adds waitcounters for whether or not the pool is running, as well as for whether we
are running jobs.
This also adds a waitcounter for the first job within a pool. The first-job and running waitcounters are working correctly. The job waitcounter seems either to be detecting a leaked job or to be subtly broken.
Test Plan:
We've tested this internally and see valid ods metrics.
Note that we may be leaking jobs, or the job logic may not be handling an exception correctly.
Differential Revision: D83705931
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164527
Approved by: https://github.com/masnesral
2025-10-30 01:15:26 +00:00
Jeff Daily
12c12466b0
[ROCm][CI] remove amdgpu from install_rocm.sh ( #166575 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166575
Approved by: https://github.com/jeffdaily
Co-authored-by: Jeff Daily <jeff.daily@amd.com>
2025-10-30 01:08:33 +00:00
Sherlock Huang
f4d05feb7a
Repro dynamo issue for union typed annotation ( #166443 )
...
When a nested function has a type annotation that uses `|`, Dynamo fails.
It works fine with `Union[torch.Tensor, DTensor]`, though.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166443
Approved by: https://github.com/anijain2305
2025-10-30 01:05:15 +00:00
Pian Pawakapan
7481622237
[symbolic shapes] remove maybe_guard_rel warning ( #166553 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166553
Approved by: https://github.com/laithsakka
2025-10-30 00:57:28 +00:00
Laith Sakka
b2a0f90501
Fix comparing inductor actual strides vs bw graph for activations should not throw DDE. ( #166277 )
...
Fix https://github.com/pytorch/pytorch/issues/163894
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166277
Approved by: https://github.com/Lucaskabela
2025-10-30 00:34:05 +00:00
eellison
14d4a77495
disable current modes instead of no dispatch in estimation ( #166571 )
...
otherwise, the custom estimation's TorchDispatchModes will be disabled.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166571
Approved by: https://github.com/SherlockNoMad , https://github.com/bdhirsh
2025-10-29 23:24:41 +00:00
Ivan Zaitsev
3d4ca228be
Remove METADATA.bzl files ( #166574 )
...
(meta-internal, should not be synced)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166574
Approved by: https://github.com/bigfootjon
2025-10-29 23:17:41 +00:00
eellison
c3d205d598
helper function for replacing nodes in aug graph ( #166309 )
...
When we do bucketing, we replace starts and waits with new nodes. This PR adds a helper that transfers the augmented graph's additional deps to the replacement nodes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166309
Approved by: https://github.com/IvanKobzarev
2025-10-29 23:08:33 +00:00
Michael Lazos
c54e2c5b41
[User-streams] Make torch.Event weakref compatible ( #164522 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164522
Approved by: https://github.com/williamwen42
ghstack dependencies: #164304
2025-10-29 23:06:31 +00:00
Michael Lazos
c3047938a0
[user-streams] Make device-agnostic streams weakref compatible ( #164304 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164304
Approved by: https://github.com/williamwen42 , https://github.com/colesbury
2025-10-29 23:06:31 +00:00
Shangdi Yu
d2eff5d454
Add python stack trace to AOTI generated code ( #160539 )
...
Summary:
We add a thread_local KernelContext object so Strobelight (and other potential profilers) can read the stack trace information of the running kernel.
This will bring extra overhead, so we guard this behind the `cpp.enable_kernel_profile` flag.
Example output code:
```cpp
#include <torch/csrc/inductor/aoti_runtime/kernel_context_tls.h>
namespace torch::aot_inductor {
thread_local KernelContext* tls_kernel_context = nullptr;
}
// Other code .....
void AOTInductorModel::run_impl(
AtenTensorHandle*
input_handles, // array of input AtenTensorHandle; handles
// are stolen; the array itself is borrowed
AtenTensorHandle*
output_handles, // array for writing output AtenTensorHandle; handles
// will be stolen by the caller; the array itself is
// borrowed
DeviceStreamType stream,
AOTIProxyExecutorHandle proxy_executor
) {
__check_inputs_outputs(input_handles, output_handles);
auto inputs = steal_from_raw_handles_to_raii_handles(input_handles, 4);
auto arg2_1 = std::move(inputs[0]);
auto arg3_1 = std::move(inputs[1]);
auto arg4_1 = std::move(inputs[2]);
auto arg5_1 = std::move(inputs[3]);
[[maybe_unused]] auto& fc1_weight = constants_->at(0);
[[maybe_unused]] auto& fc1_bias = constants_->at(1);
inputs.clear();
[[maybe_unused]] auto& kernels = static_cast<AOTInductorModelKernels&>(*this->kernels_.get());
static constexpr int64_t int_array_0[] = {8L, 16L};
static constexpr int64_t int_array_1[] = {16L, 1L};
AtenTensorHandle buf0_handle;
AOTI_TORCH_ERROR_CODE_CHECK(aoti_torch_empty_strided(2, int_array_0, int_array_1, cached_torch_dtype_float32, cached_torch_device_type_cpu, this->device_idx_, &buf0_handle));
RAIIAtenTensorHandle buf0(buf0_handle);
// Topologically Sorted Source Nodes: [linear], Original ATen: [aten.t, aten.addmm]
// [Provenance debug handles] aoti_torch_cpu_addmm_out:1
static constexpr int64_t int_array_2[] = {10L, 16L};
static constexpr int64_t int_array_3[] = {1L, 10L};
{
KernelContextGuard _ctx("aoti_torch_cpu_addmm_out", R"(
File "/data/users/shangdiy/fbsource/buck-out/v2/gen/fbcode/cba6f4fb5faa5f79/caffe2/test/inductor/__provenance_tracing__/provenance_tracing#link-tree/caffe2/test/inductor/test_provenance_tracing.py", line 829, in forward
x = self.fc1(x)
File "/data/users/shangdiy/fbsource/buck-out/v2/gen/fbcode/cba6f4fb5faa5f79/caffe2/test/inductor/__provenance_tracing__/provenance_tracing#link-tree/torch/nn/modules/linear.py", line 134, in forward
return F.linear(input, self.weight, self.bias)
)");
RAIIAtenRecordFunctionHandle record_aoti_torch_cpu_addmm_out_("aoti_torch_cpu_addmm_out", nullptr);
AOTI_TORCH_ERROR_CODE_CHECK(aoti_torch_cpu_addmm_out(buf0, fc1_bias, arg2_1, wrap_with_raii_handle_if_needed(reinterpret_tensor_wrapper(fc1_weight, 2, int_array_2, int_array_3, 0L)), 1L, 1L));
}
arg2_1.reset();
auto buf1 = std::move(buf0); // reuse
static constexpr int64_t int_array_4[] = {10L, 20L};
static constexpr int64_t int_array_5[] = {20L, 1L};
AtenTensorHandle buf2_handle;
AOTI_TORCH_ERROR_CODE_CHECK(aoti_torch_empty_strided(2, int_array_4, int_array_5, cached_torch_dtype_float32, cached_torch_device_type_cpu, this->device_idx_, &buf2_handle));
RAIIAtenTensorHandle buf2(buf2_handle);
// [Provenance debug handles] cpp_fused_mul_relu_sigmoid_0:2
{
KernelContextGuard _ctx("cpp_fused_mul_relu_sigmoid_0", R"(
File "/data/users/shangdiy/fbsource/buck-out/v2/gen/fbcode/cba6f4fb5faa5f79/caffe2/test/inductor/__provenance_tracing__/provenance_tracing#link-tree/caffe2/test/inductor/test_provenance_tracing.py", line 831, in forward
x = self.sigmoid(x)
File "/data/users/shangdiy/fbsource/buck-out/v2/gen/fbcode/cba6f4fb5faa5f79/caffe2/test/inductor/__provenance_tracing__/provenance_tracing#link-tree/torch/nn/modules/activation.py", line 359, in forward
return torch.sigmoid(input)
File "/data/users/shangdiy/fbsource/buck-out/v2/gen/fbcode/cba6f4fb5faa5f79/caffe2/test/inductor/__provenance_tracing__/provenance_tracing#link-tree/caffe2/test/inductor/test_provenance_tracing.py", line 830, in forward
x = self.relu(x)
File "/data/users/shangdiy/fbsource/buck-out/v2/gen/fbcode/cba6f4fb5faa5f79/caffe2/test/inductor/__provenance_tracing__/provenance_tracing#link-tree/torch/nn/modules/activation.py", line 144, in forward
return F.relu(input, inplace=self.inplace)
File "/data/users/shangdiy/fbsource/buck-out/v2/gen/fbcode/cba6f4fb5faa5f79/caffe2/test/inductor/__provenance_tracing__/provenance_tracing#link-tree/caffe2/test/inductor/test_provenance_tracing.py", line 832, in forward
d = a * 3.14
)");
cpp_fused_mul_relu_sigmoid_0((float*)(buf1.data_ptr()), (const float*)(arg3_1.data_ptr()), (float*)(buf2.data_ptr()));
}
arg3_1.reset();
static constexpr int64_t int_array_6[] = {10L, 30L};
static constexpr int64_t int_array_7[] = {30L, 1L};
AtenTensorHandle buf3_handle;
AOTI_TORCH_ERROR_CODE_CHECK(aoti_torch_empty_strided(2, int_array_6, int_array_7, cached_torch_dtype_float32, cached_torch_device_type_cpu, this->device_idx_, &buf3_handle));
RAIIAtenTensorHandle buf3(buf3_handle);
// Topologically Sorted Source Nodes: [mul, addmm], Original ATen: [aten.mul, aten.addmm]
// [Provenance debug handles] aoti_torch_cpu_addmm_out:3
{
KernelContextGuard _ctx("aoti_torch_cpu_addmm_out", R"(
File "/data/users/shangdiy/fbsource/buck-out/v2/gen/fbcode/cba6f4fb5faa5f79/caffe2/test/inductor/__provenance_tracing__/provenance_tracing#link-tree/caffe2/test/inductor/test_provenance_tracing.py", line 833, in forward
y = torch.addmm(c, d, b)
)");
RAIIAtenRecordFunctionHandle record_aoti_torch_cpu_addmm_out_("aoti_torch_cpu_addmm_out", nullptr);
AOTI_TORCH_ERROR_CODE_CHECK(aoti_torch_cpu_addmm_out(buf3, arg5_1, buf2, arg4_1, 1L, 1L));
}
arg4_1.reset();
arg5_1.reset();
buf2.reset();
auto buf4 = std::move(buf3); // reuse
// [Provenance debug handles] cpp_fused_gelu_1:4
{
KernelContextGuard _ctx("cpp_fused_gelu_1", R"(
File "/data/users/shangdiy/fbsource/buck-out/v2/gen/fbcode/cba6f4fb5faa5f79/caffe2/test/inductor/__provenance_tracing__/provenance_tracing#link-tree/caffe2/test/inductor/test_provenance_tracing.py", line 834, in forward
z = torch.nn.functional.gelu(y)
)");
cpp_fused_gelu_1((float*)(buf4.data_ptr()));
}
output_handles[0] = buf1.release();
output_handles[1] = buf4.release();
} // AOTInductorModel::run_impl
```
Test Plan:
```
buck run mode/dev-nosan fbcode//caffe2/test/inductor:provenance_tracing -- -r stack_traces
```
Rollback Plan:
Differential Revision: D78436007
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160539
Approved by: https://github.com/yiming0416
2025-10-29 22:47:52 +00:00
PyTorch MergeBot
972030fe2e
Revert "[pytree] add treespec_{leaf,tuple,dict} functions for args_spec modification ( #160843 )"
...
This reverts commit 284716a691 .
Reverted https://github.com/pytorch/pytorch/pull/160843 on behalf of https://github.com/atalman due to failing internal torchrec test ([comment](https://github.com/pytorch/pytorch/pull/160843#issuecomment-3464647878 ))
2025-10-29 22:46:48 +00:00
Jeff Daily
d401e4e70a
[ROCm][CUDA] add unit test utility busy_wait_for_flag ( #166218 )
...
torch.cuda._busy_wait_for_flag() will launch a kernel that spins until a flag is set by a corresponding torch.cuda._clear_flag(). These **must** be run on separate streams or it will deadlock.
When used correctly these kernels will put work on the GPU that is more predictable than torch.cuda._sleep() in cases where the unit test is depending on the GPU being busy.
Fixes #120318 .
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166218
Approved by: https://github.com/jeffdaily
Co-authored-by: Jeff Daily <jeff.daily@amd.com>
2025-10-29 22:40:23 +00:00
Mikayla Gawarecki
f1a3440715
FC/BC policy for libtorch stable ABI ( #163991 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/163991
Approved by: https://github.com/janeyx99
ghstack dependencies: #163899
2025-10-29 22:35:36 +00:00
Andrey Talman
82ff07c788
Add py 3.14 CI docker build pytorch-linux-jammy-py3.14-clang12 ( #164791 )
...
Related to https://github.com/pytorch/pytorch/issues/156856
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164791
Approved by: https://github.com/huydhn , https://github.com/malfet , https://github.com/albanD
2025-10-29 22:21:22 +00:00
Rob Timpe
e0604d3170
[dynamo] Fix ListIterator tracking mutations to original list ( #166350 )
...
Currently ListIteratorVariable copies the underlying list, which prevents it
from seeing mutations to the original list. Remove the copy to match cpython behavior.
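The CPython behavior being matched can be seen in plain Python. This small sketch is independent of Dynamo and only illustrates that a list iterator observes later mutations of the list it was created from:

```python
items = [1, 2, 3]
it = iter(items)

first = next(it)   # consumes 1
items.append(4)    # mutate the original list after the iterator was created
rest = list(it)    # the iterator sees the appended element

print(first, rest)  # prints: 1 [2, 3, 4]
```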
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166350
Approved by: https://github.com/guilhermeleobas
ghstack dependencies: #166349 , #162768
2025-10-29 21:54:37 +00:00
Rob Timpe
8101fd46d4
[dynamo] Implement iter with a polyfill ( #162768 )
...
Currently most variable trackers implement `iter` via `_call_iter_tuple_list`.
This makes it difficult to customize the behavior of `iter` for different
variable types. Instead, implement `iter` via a polyfill, which will delegate
to the appropriate `__iter__` method.
While this method is more flexible, it increases the overhead of dynamo tracing.
For example, `iter(x)` will generate 9x more instructions than the current
implementation for common iterable types. Microbenchmarking shows a ~6x
slowdown for this operation. I suspect this would be much less for realistic
workloads, but more work would be needed to get specific numbers. If the
performance is a concern we could also consider adding a fast path for types
that are known to correctly implement `__iter__`.
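As a rough illustration of the idea, a polyfill for `iter` can be written as ordinary Python that delegates to the type's `__iter__`. The names `iter_polyfill` and `Countdown` are hypothetical, and this sketch is not Dynamo's actual polyfill; it also omits the `__getitem__` sequence fallback that the builtin `iter` supports.

```python
def iter_polyfill(obj):
    """Hypothetical stand-in for a polyfilled iter(); not Dynamo's code."""
    # Look up __iter__ on the type, as the interpreter does for special methods.
    iter_method = getattr(type(obj), "__iter__", None)
    if iter_method is None:
        raise TypeError(f"'{type(obj).__name__}' object is not iterable")
    return iter_method(obj)


class Countdown:
    """Example user type that customizes iteration."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        return iter(range(self.start, 0, -1))


countdown_values = list(iter_polyfill(Countdown(3)))
list_values = list(iter_polyfill([1, 2]))
```

Because the polyfill is plain Python, each type's own `__iter__` decides the behavior, which is the flexibility described above; the extra calls are also where the additional traced instructions come from.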
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162768
Approved by: https://github.com/guilhermeleobas
ghstack dependencies: #166349
2025-10-29 21:54:37 +00:00
Rob Timpe
3d4a2d8a93
[dynamo] Add __iter__ for iterable VariableTrackers ( #166349 )
...
This is in preparation for implementing iter with a polyfill
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166349
Approved by: https://github.com/guilhermeleobas
2025-10-29 21:54:37 +00:00
Camyll Harajli
59ddfb69a7
[cpu/gpu split] ( #165696 )
...
Summary: cpu/gpu split. cuda is default due to some downstream targets configurations.
Test Plan: test in CI
Differential Revision: D80712802
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165696
Approved by: https://github.com/jeffdaily , https://github.com/malfet , https://github.com/atalman
2025-10-29 21:44:52 +00:00
Boyuan Feng
bebabd7fce
[Graph Partition] move custom rules to inductor config ( #166458 )
...
This PR adds `custom_should_partition_ops: list[str]` to specify the name of custom ops upon which graph partition happens. It works with cache since it is a `list[str]` in the config file. The op name should be of format "mylib::baz".
Close : #165341
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166458
Approved by: https://github.com/ProExpertProg , https://github.com/eellison , https://github.com/zou3519
2025-10-29 21:43:58 +00:00
Sean McGovern
56a809aa07
[DTensor] Fix torch.all() using incorrect reduction operator ( #165924 )
...
Fixes #165923
Corrects the reduction operation to be product.
Enables "all" in the boolean tensor tests.
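A plain-Python sketch of why product is the right way to combine per-shard results for `all`. The shard layout is made up for illustration, and the sum is shown only as a contrast; the commit message does not say which operator was used before.

```python
import math

# Hypothetical boolean shards, one per rank.
shards = [[True, True], [True, False], [True, True]]

# Each rank reduces its local shard first (1 encodes True, 0 encodes False).
local = [int(all(shard)) for shard in shards]

combined_product = math.prod(local)  # 0 as soon as any shard is False
combined_sum = sum(local)            # nonzero here, so it would read as True

result = bool(combined_product)
```

The product acts as a logical AND across ranks, which matches `all` on the full tensor.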
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165924
Approved by: https://github.com/malfet , https://github.com/Skylion007
2025-10-29 20:58:35 +00:00
Yuanyuan Chen
b33762bd2f
Fix incomplete test_memory_plots_metadata ( #166508 )
...
The different context cases were not fully tested before this PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166508
Approved by: https://github.com/Skylion007
2025-10-29 20:55:00 +00:00
fduwjj
f02708c2be
[DeviceMesh] Remove slicing submesh warning messages and clean up in fsdp params ( #166466 )
...
Differential Revision: [D85735294](https://our.internmc.facebook.com/intern/diff/D85735294 )
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166466
Approved by: https://github.com/fegin
2025-10-29 20:52:49 +00:00
Justin Chu
a186aa8d6c
[ONNX] Change stacklevel in warning message for export ( #166558 )
...
Change to 3 so that the warning shows user callsite. (Where user calls torch.onnx.export)
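The effect of `stacklevel` can be sketched with the standard `warnings` module. The functions below are hypothetical stand-ins and do not mirror the real call chain inside `torch.onnx.export`; they only show how `stacklevel=3` attributes a warning to the caller of the public entry point.

```python
import warnings


def _emit_warning():
    # stacklevel=1 would blame this line, 2 would blame export(),
    # and 3 blames whoever called export().
    warnings.warn("hypothetical export warning", UserWarning, stacklevel=3)


def export():
    # Hypothetical stand-in for a library entry point.
    _emit_warning()


def user_code():
    export()  # with stacklevel=3 the warning is attributed to this line


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    user_code()

reported_line = caught[0].lineno
```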
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166558
Approved by: https://github.com/titaiwangms
2025-10-29 20:45:25 +00:00
Tushar Jain
48c3b71ecc
transform fr traces for ft ( #166149 )
...
Summary:
- the ranks in the default pg config are local ranks
- however fr trace analysis requires them to be global ranks
- so we transform the local ranks to global ranks before the analysis kicks in based on a cli flag
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166149
Approved by: https://github.com/fduwjj
2025-10-29 20:44:48 +00:00
Nikita Shulga
2c9f877fa7
Revert "[PyTorch] Improve aarch64 performance of bfloat16 ops ( #166028 )"
...
This reverts commit 3e77a2b478 .
Otherwise the ARM build fails with older compilers, with errors that look
as follows:
```
vec128_bfloat16_neon.h:666:12: error: operation not permitted on type ‘bfloat16_t’
666 | return (-x) * y - z;
```
For more self-contained example see https://godbolt.org/z/bbY4xWh45
(that compiles the same code using clang-15 and clang-19)
2025-10-29 13:35:59 -07:00
Tushar Jain
fc540cefd4
set pg name based on ranks ( #166182 )
...
Summary:
- in torchft we have multiple default pg's, 1 for each task group
- for flight recorder to work, each of these needs to have a different name, so entries can be matched
- change the `init_process_group` api to optionally take a list of ranks. if provided, we use the hash of the ranks as the name of the pg. for torchft, we'll pass global ranks here so the default pg has a different name on each task group
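A minimal sketch of deriving a group name from a list of ranks. The helper name, the choice of SHA-1, the sorting, and the truncation are all assumptions made for illustration; the commit message does not say which hash `init_process_group` uses.

```python
import hashlib


def pg_name_from_ranks(ranks):
    """Hypothetical helper: derive a stable group name from a list of ranks."""
    # Sort so the same set of ranks always yields the same name (an assumption).
    canonical = ",".join(str(r) for r in sorted(ranks))
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()[:16]


# Two task groups with different global ranks get different names.
name_a = pg_name_from_ranks([0, 1, 2, 3])
name_b = pg_name_from_ranks([4, 5, 6, 7])
```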
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166182
Approved by: https://github.com/fduwjj
2025-10-29 20:13:48 +00:00
Maggie Moss
d1a6e006e0
Fix syntax for pyrefly errors ( #166496 )
...
Last one! This ensures all existing suppressions match the syntax expected and will silence only one error code
pyrefly check
lintrunner
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166496
Approved by: https://github.com/Skylion007 , https://github.com/mlazos
2025-10-29 20:00:25 +00:00
Rohit Singh Rathaur
fa560e1158
[ao][pruning] Replace assert statements with AssertionError exceptions ( #164926 )
...
Replace assert statements with explicit AssertionError exceptions to ensure the validation checks are not removed when Python runs with the optimization flag (-O).
This is a draft PR to confirm the process.
Fixes partially #164878 .
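A small sketch of the pattern, following the commit title. The function and parameter names are hypothetical and are not taken from the pruning code; the point is only that an explicit `raise` survives `python -O`, while an `assert` statement is stripped.

```python
def check_sparsity(sparsity_level):
    """Hypothetical validation helper, not taken from torch.ao.pruning."""
    # An `assert` here would be skipped under `python -O`; an explicit raise is not.
    if not 0.0 <= sparsity_level <= 1.0:
        raise AssertionError(
            f"sparsity_level must be in [0, 1], got {sparsity_level}"
        )
    return sparsity_level
```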
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164926
Approved by: https://github.com/fffrog , https://github.com/albanD
Co-authored-by: Jiawei Li <ljw1101.vip@gmail.com>
2025-10-29 17:46:46 +00:00
Yuanyuan Chen
a3fe1825aa
Fix incomplete torch.cdist tests ( #166507 )
...
Because the `p` value is not used.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166507
Approved by: https://github.com/Skylion007
2025-10-29 17:11:07 +00:00
rraminen
deb776319b
[ROCm] Reduce duplication in bfloat16_support_literal definition ( #166147 )
...
This PR refactors the bfloat16_support_literal constant in the PyTorch build logic to eliminate duplicated ROCm-specific code.
Previously, there were two nearly identical branches for ROCM_VERSION < 70000 and ROCM_VERSION >= 70000, differing only by a single typedef. These have been unified into one conditional block with a minimal version guard inside. (https://github.com/ROCm/pytorch/pull/2502 )
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166147
Approved by: https://github.com/jerrymannil , https://github.com/jeffdaily
2025-10-29 16:59:03 +00:00
PyTorch MergeBot
d7040e6d75
Revert "[dynamo][guards] 1/N Guard selectively for DTensor ( #165824 )"
...
This reverts commit ee7434be82 .
Reverted https://github.com/pytorch/pytorch/pull/165824 on behalf of https://github.com/anijain2305 due to internal job failed ([comment](https://github.com/pytorch/pytorch/pull/165824#issuecomment-3462667536 ))
2025-10-29 16:52:31 +00:00
PyTorch MergeBot
35f3572fa4
Revert "[ROCm] Enable group gemm through CK ( #166334 )"
...
This reverts commit 1fa520ea65 .
Reverted https://github.com/pytorch/pytorch/pull/166334 on behalf of https://github.com/atalman due to Internal build failures ([comment](https://github.com/pytorch/pytorch/pull/166334#issuecomment-3462640668 ))
2025-10-29 16:45:02 +00:00
anwang
bc5111cd8d
[Inductor] Prevent kernel fusion with too many unique inputs and outputs ( #166275 )
...
MTIA Triton currently cannot handle kernels with too many input/output buffers. This PR adds a limit that prevents large fusions with many unique input/output buffers.
Differential Revision: [D85509351](https://our.internmc.facebook.com/intern/diff/D85509351/ )
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166275
Approved by: https://github.com/eellison
ghstack dependencies: #166274
2025-10-29 16:41:34 +00:00
Millie Chen
398fdd32bb
[Inductor] Lower fallback nodes annotated with "should_fallback" ( #166339 )
...
Summary:
This PR introduces an inductor-level fallback mechanism that gives users control over which operations or subgraphs Inductor should lower and which should fall back to preexisting kernels. This has similar motivation as #164776 in providing flexibility to selectively disable Inductor lowering for specific nodes.
The implementation simply adds a check for the `"should_fallback"` metadata annotation on FX graph nodes. If this is set to `True`, the lowerer falls back before attempting the normal lowering path. Note that since these are user-directed fallbacks dependent upon specific, customized conditions, use `add_to_fallback_set=False` to avoid permanent overwrites of inductor's lowering/fallback rules.
Simple example marking nodes for fallback based on custom predicates:
```
from typing import Callable

import torch

def should_fallback_predicate(node: torch.fx.Node, pred: Callable[[torch.fx.Node], bool]):
    # Apply the predicate and mark the node for fallback if it matches
    if pred(node):
        node.meta["should_fallback"] = True
```
Test Plan: added a CI test
Differential Revision: D85347587
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166339
Approved by: https://github.com/blaine-rister , https://github.com/eellison
2025-10-29 16:33:55 +00:00
PyTorch MergeBot
5fd1d41e62
Revert "[user-streams] Make device-agnostic streams weakref compatible ( #164304 )"
...
This reverts commit bfc2050db9 .
Reverted https://github.com/pytorch/pytorch/pull/164304 on behalf of https://github.com/atalman due to Breaks periodic: test/dynamo/test_streams.py::TestStreams::test_stream_weakref [GH job link](https://github.com/pytorch/pytorch/actions/runs/18909552619/job/53979171605 ) [HUD commit link](cde81e92b9 ) ([comment](https://github.com/pytorch/pytorch/pull/164304#issuecomment-3462489278 ))
2025-10-29 16:09:54 +00:00
PyTorch MergeBot
c594950e86
Revert "nn.Linear: nD contiguous input + bias -- dispatch to addmm also when weight is sparse ( #166071 )"
...
This reverts commit 467c21ad9a .
Reverted https://github.com/pytorch/pytorch/pull/166071 on behalf of https://github.com/atalman due to Multiple CI breakages: test/profiler/test_profiler_tree.py::TestProfilerTree::test_profiler_experimental_tree_with_stack_and_modules [GH job link](https://github.com/pytorch/pytorch/actions/runs/18909087335/job/53976915830 ) [HUD commit link](467c21ad9a ) ([comment](https://github.com/pytorch/pytorch/pull/166071#issuecomment-3462458968 ))
2025-10-29 16:05:30 +00:00
Laith Sakka
14102fb1f3
add new line in log ( #164240 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164240
Approved by: https://github.com/ColinPeppler , https://github.com/Skylion007 , https://github.com/ezyang
ghstack dependencies: #164075
2025-10-29 16:03:32 +00:00
PyTorch MergeBot
5cdbcb5233
Revert "[User-streams] Make torch.Event weakref compatible ( #164522 )"
...
This reverts commit cde81e92b9 .
Reverted https://github.com/pytorch/pytorch/pull/164522 on behalf of https://github.com/atalman due to Breaks periodic: test/dynamo/test_streams.py::TestStreams::test_stream_weakref [GH job link](https://github.com/pytorch/pytorch/actions/runs/18909552619/job/53979171605 ) [HUD commit link](cde81e92b9 ) ([comment](https://github.com/pytorch/pytorch/pull/164522#issuecomment-3462450571 ))
2025-10-29 16:03:03 +00:00
Mikayla Gawarecki
eae701cad0
Add scaffolding for StableIValue FC/BC (no PoC) ( #164332 )
...
1. Add `extension_build_version` and `is_internal` to `FromImpl`/`ToImpl` (this will be useful in the future if we need to break the BC of any type). #163832 has the PoC of how we would actually use this system.
2. Add `aoti_torch_library_impl_v2` that takes in an additional `extension_build_version` argument, updates callsite in `torch/csrc/stable/library.h` to always pass `TORCH_ABI_VERSION` for this argument
3. Add `extension_build_version` to `from_ivalue` and `to_ivalue` and update all callsites
4. Add a private `_from` and `_to` that pass `is_internal=True` to `FromImpl`/`ToImpl`, making it easier to reason about what is being called from libtorch-land / extension-land
**Note: This PR does not include a linter that tells the user to update from/to if changing the ABI of a type in headeronly, which I intend to do in https://github.com/pytorch/pytorch/pull/163998 **
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164332
Approved by: https://github.com/janeyx99
ghstack dependencies: #164356 , #166373 , #163683
2025-10-29 15:41:45 +00:00
Mikayla Gawarecki
8f51556daa
Add scaffolding for aoti_torch_call_dispatcher BC with native ops ( #163683 )
...
Part 1 of plan in https://docs.google.com/document/d/1MaX51H5aEQE5XnOlnZIpf9oCYwzGrTWkgBACxNzsmWE/edit?usp=sharing
- Upgrade `aoti_torch_call_dispatcher` to v2 with an `extension_build_version`
- Allow registration of StableIValue stack --> IValue stack adapters for schema changes
#### Note: This PR does not include a linter that tells the user to add the upgrader if the schema changes, which is an important piece that will be added in a separate PR
Pull Request resolved: https://github.com/pytorch/pytorch/pull/163683
Approved by: https://github.com/janeyx99
ghstack dependencies: #164356 , #166373
2025-10-29 15:41:45 +00:00
Mikayla Gawarecki
c0bbda37e8
Move static from_ivalue/to_ivalue to new shim_common.cpp ( #166373 )
...
Move `from_ivalue` and `to_ivalue` and their dependents `StableIValueBoxedKernel`, `aoti_torch_library_impl`, and `aoti_torch_call_dispatcher` into the new (non-aoti) shim_common.cpp
This is in prep for the above PRs where I add v2s (`torch_call_dispatcher` and `torch_library_impl`) that are versioning aware
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166373
Approved by: https://github.com/janeyx99
ghstack dependencies: #164356
2025-10-29 15:41:36 +00:00
Mikayla Gawarecki
fefb546b91
Add TORCH_TARGET_VERSION for stable ABI ( #164356 )
...
And update it so comparisons can be done by the preprocessor
**Note: We also need to gate in shim.h and figure out how to enforce this**
Differential Revision: [D85683549](https://our.internmc.facebook.com/intern/diff/D85683549 )
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164356
Approved by: https://github.com/janeyx99
2025-10-29 15:41:28 +00:00
PyTorch MergeBot
d6d6fa26f5
Revert "bwd pass ( #164504 )"
...
This reverts commit f36f372acc .
Reverted https://github.com/pytorch/pytorch/pull/164504 on behalf of https://github.com/jeffdaily due to CI had been clean for both cuda and rocm before merge, broke post merge? ([comment](https://github.com/pytorch/pytorch/pull/164504#issuecomment-3462116676 ))
2025-10-29 15:10:40 +00:00
Nikita Vedeneev
467c21ad9a
nn.Linear: nD contiguous input + bias -- dispatch to addmm also when weight is sparse (#166071 )
...
As per title.
It seems safe to be able to generalize to arbitrary contiguous inputs since `at::matmul` is likely to do the flattening to avoid `baddmm`.
Additionally, we guard for bias to be 1D and contiguous which is guaranteed to be fused with no copies.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166071
Approved by: https://github.com/ngimel
2025-10-29 13:13:40 +00:00
Way Wang
4a94591321
filter out alloc-free pairs from trace plot ( #165752 )
...
Summary:
When dealing with a large memory trace, the resulting plot can be challenging to interpret and analyze.
This commit introduces a feature that enables filtering of allocations that have already been freed, providing a more focused view.
The remaining events in the plot often warrant closer examination, as they may be indicative of potential out-of-memory (OOM) issues.
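The filtering idea can be sketched as follows. The event format (a list of `(action, address)` tuples) and the function name are hypothetical; the real memory trace format and the actual implementation are not shown in the commit message.

```python
def drop_freed_allocations(events):
    """Keep only allocations that are never freed within the trace.

    `events` is a hypothetical list of (action, address) tuples, where
    action is "alloc" or "free".
    """
    live = {}  # address -> index of the still-unfreed alloc event
    for index, (action, addr) in enumerate(events):
        if action == "alloc":
            live[addr] = index
        elif action == "free":
            live.pop(addr, None)  # the matching alloc/free pair is filtered out
    return [events[i] for i in sorted(live.values())]


trace = [("alloc", 1), ("alloc", 2), ("free", 1), ("alloc", 3), ("alloc", 1)]
remaining = drop_freed_allocations(trace)
```

Only the allocations still live at the end of the trace remain, which are the ones worth inspecting first when chasing an OOM.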
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165752
Approved by: https://github.com/zdevito
2025-10-29 12:44:54 +00:00
PyTorch MergeBot
5e7272b60a
Revert "[BE] Move GreenContext implementation details to cpp ( #166462 )"
...
This reverts commit afaaaa314c .
Reverted https://github.com/pytorch/pytorch/pull/166462 on behalf of https://github.com/atalman due to multiple internal build failures ([comment](https://github.com/pytorch/pytorch/pull/166462#issuecomment-3461145801 ))
2025-10-29 11:59:41 +00:00
PyTorch MergeBot
1dd6b76914
Revert "[1/N] Remove unused loop variables ( #166258 )"
...
This reverts commit 76b2c37045 .
Reverted https://github.com/pytorch/pytorch/pull/166258 on behalf of https://github.com/atalman due to breaks test/distributed/test_serialization.py::TestSerialization::test_weights_only [GH job link](https://github.com/pytorch/pytorch/actions/runs/18894311802/job/53929321703 ) [HUD commit link](76b2c37045 ) ([comment](https://github.com/pytorch/pytorch/pull/166258#issuecomment-3460964612 ))
2025-10-29 11:10:37 +00:00
Xuehai Pan
284716a691
[pytree] add treespec_{leaf,tuple,dict} functions for args_spec modification ( #160843 )
...
The goal of this PR is to provide a standard way to create simple treespec instances and hide the implementation details of the `PyTreeSpec` class.
Changes:
1. Add function `treespec_leaf()` to replace `LeafSpec()`.
2. Add function `treespec_tuple(...)` and `treespec_dict(...)` to create treespec for `tuple` / `dict` which is used for `*args` / `**kwargs`. This avoids direct modification to `treespec` instances that rely on the implementation details of the `PyTreeSpec` class.
3. Change `len(spec.children_specs)` to `spec.num_children`.
4. Change `isinstance(spec, LeafSpec)` to `spec.is_leaf()`.
------
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160843
Approved by: https://github.com/mlazos
2025-10-29 09:16:24 +00:00