linhaifeng
369f2d6951
[3/N] fix typo in other folders ( #166606 )
...
fix typo in other folders
#166374
#166126
_typos.toml
```toml
[files]
extend-exclude = ["tools/linter/dictionary.txt"]
[default.extend-words]
nd = "nd"
arange = "arange"
Nd = "Nd"
GLOBALs = "GLOBALs"
hte = "hte"
iy = "iy"
PN = "PN"
Dout = "Dout"
optin = "optin"
gam = "gam"
PTD = "PTD"
Sur = "Sur"
nin = "nin"
tme = "tme"
inpt = "inpt"
mis = "mis"
Raison = "Raison"
ouput = "ouput"
nto = "nto"
Onwer = "Onwer"
callibrate = "callibrate"
ser = "ser"
Metdata = "Metdata"
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166606
Approved by: https://github.com/ezyang
2025-10-30 10:30:40 +00:00
Zhang, Jianyi
32920926f0
[xpu][fix] [Inductor] Avoid using tl.sqrt_rn on XPU before triton is ready ( #165740 )
...
Fixes #165738
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165740
Approved by: https://github.com/etaf , https://github.com/EikanWang , https://github.com/chuanqi129 , https://github.com/desertfire
2025-10-30 09:24:24 +00:00
Yuanyuan Chen
39e5cdddf7
[2/N] Add strict parameter to Python zip calls ( #166257 )
...
This PR adds `strict=True/False` to zip calls in the test utils; `strict=True` is passed when possible.
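For context, a minimal illustration of the `strict=True` semantics (standard Python 3.10+ behavior) that this change relies on:
```python
# zip(..., strict=True) raises on length mismatch instead of silently truncating.
a = [1, 2, 3]
b = ["x", "y"]

print(list(zip(a, b)))  # [(1, 'x'), (2, 'y')] -- the extra element is dropped

try:
    list(zip(a, b, strict=True))
except ValueError as e:
    print(e)  # zip() argument 2 is shorter than argument 1
```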
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166257
Approved by: https://github.com/janeyx99
2025-10-30 08:10:10 +00:00
libohao1201
2829d48bd1
[xpu][test][1/N] Port 3 fsdp distributed test cases to Intel GPU ( #161476 )
...
For https://github.com/pytorch/pytorch/issues/114850 , we port 3 distributed tests to Intel GPU.
We enable Intel GPU with the following methods (see the sketch after this list) while trying to keep the original code style:
- use `torch.accelerator.current_accelerator()` to determine the accelerator backend
- use `requires_accelerator_dist_backend` to enable `xccl`
- enable XPU for some test paths
- skip test cases that Intel GPU does not support
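A minimal sketch of the device-detection pattern the list above describes; the backend fallback logic here is an illustrative assumption, not code from the PR:
```python
# Hedged sketch: detect the available accelerator and pick a distributed
# backend accordingly. The cpu/nccl fallbacks are illustrative assumptions.
import torch

acc = torch.accelerator.current_accelerator()  # e.g. device(type='xpu'), or None
device_type = acc.type if acc is not None else "cpu"
backend = "xccl" if device_type == "xpu" else "nccl"
print(device_type, backend)
```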
Pull Request resolved: https://github.com/pytorch/pytorch/pull/161476
Approved by: https://github.com/weifengpy , https://github.com/guangyey
2025-10-30 07:30:04 +00:00
Michael Lazos
f1af679270
[user-streams] Handle returning the current stream with/without device index ( #165356 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165356
Approved by: https://github.com/anijain2305
ghstack dependencies: #164304 , #164522 , #164819 , #165211 , #165212
2025-10-30 07:20:25 +00:00
Shawn Xu
d46d8d6f54
[triton][sigmoid] Fix kernel cache and serialization issue for triton sigmoid + CUDA kernel bug ( #166568 )
...
Differential Revision: D85792537
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166568
Approved by: https://github.com/minjang
2025-10-30 06:17:39 +00:00
Michael Lazos
a5335263d3
[user-streams] Track symbolic current stream ( #165212 )
...
merge into stream tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165212
Approved by: https://github.com/anijain2305
ghstack dependencies: #164304 , #164522 , #164819 , #165211
2025-10-30 04:58:53 +00:00
Michael Lazos
79aee77381
[user-streams] Add current stream source ( #165211 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165211
Approved by: https://github.com/anijain2305
ghstack dependencies: #164304 , #164522 , #164819
2025-10-30 04:58:53 +00:00
Michael Lazos
f5cb9a4c68
[user-streams] Fix stream graph output semantics ( #164819 )
...
Previously, we would stash a single stream value constructed at trace time in a global and return that same value from repeated calls to the graph.
With this PR, we construct the stream value in advance, reference the constructed value in the graph via the lookup table, and if that value is returned as an output, read the value from the lookup table and return it (in bytecode, not as a graph output, since we don't support arbitrary stream outputs).
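A hedged sketch of the user-visible pattern at stake, assuming a CUDA build; under the old behavior, the same trace-time stream object could be returned from every call:
```python
# Illustrative only: a compiled function that constructs a stream and
# returns it as an output, which is the case this PR fixes.
import torch

@torch.compile
def f(x):
    s = torch.cuda.Stream()  # stream value constructed in the graph
    with torch.cuda.stream(s):
        y = x + 1
    return y, s              # the stream escapes as an output

# y, s = f(torch.ones(4, device="cuda"))
```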
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164819
Approved by: https://github.com/anijain2305
ghstack dependencies: #164304 , #164522
2025-10-30 04:58:46 +00:00
PyTorch UpdateBot
f20bf77874
[audio hash update] update the pinned audio hash ( #166597 )
...
This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/main/.github/workflows/nightly.yml ).
Update the pinned audio hash.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166597
Approved by: https://github.com/pytorchbot
2025-10-30 04:28:30 +00:00
Nicolas Macchioni
75f798e05b
[inductor][mi350] add tech specs for MI350 ( #166576 )
...
Summary:
While digging through matmul padding for other work, I noticed that the compute-bound check won't work on MI350 since we haven't supplied the tech specs yet.
This PR adds the MI350 specs following the predefined format.
Test Plan: CI
Differential Revision: D85804980
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166576
Approved by: https://github.com/leitian
2025-10-30 03:46:52 +00:00
Angel Li
476b149a00
bwd pass ( #164504 )
...
**Summary**
This implements the backward pass for the Varlen API and registers `_varlen_attn()` as a custom op.
**Benchmarking**
To benchmark, we compare runtime and TFLOPs against the current SDPA approach with padding.
Settings:
- 1 H100 machine
- `batch_size=8`, `max_seq_len=2048`, `embed_dim=1024`, `num_heads=16`
- dtype `torch.bfloat16`
- `is_causal=False`
- for variable length, we set sequences to be random multiples of 64 up to `max_seq_len`
- 100 runs
| | Variable Length API | SDPA |
|--------|--------------------|----------|
| Runtime | 0.8189142608642578 ms | 3.263883056640625 ms |
| TFLOPs | 268.652 | 158.731 |
We can see that the Varlen runtime is >3x faster.
**Testing**
Run `python test/test_varlen_attention.py` for unit tests where we verify basic functionality and confirm numerical match between varlen gradients vs SDPA.
For custom op testing, `test_custom_op_registration` uses logging mode to verify that `_varlen_attn()` was called and tests with `torch.compile`. `test_custom_op_compliances` uses `torch.library.opcheck()` to verify.
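A self-contained sketch of the `torch.library.opcheck()` pattern mentioned above, using a toy custom op since the varlen op's actual schema is not shown here:
```python
import torch

# Toy custom op standing in for _varlen_attn(); "mylib::double" is illustrative.
@torch.library.custom_op("mylib::double", mutates_args=())
def double(x: torch.Tensor) -> torch.Tensor:
    return x * 2

@double.register_fake
def _(x):
    # Fake (meta) implementation so the op works under torch.compile tracing.
    return torch.empty_like(x)

# opcheck runs a suite of custom-op compliance tests (schema correctness,
# fake-tensor coverage, autograd registration, ...) on the sample inputs.
torch.library.opcheck(double, (torch.randn(3),))
```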
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164504
Approved by: https://github.com/drisspg
2025-10-30 03:46:37 +00:00
Justin Chu
845da9c817
[ONNX] Ignore pyrefly errors in torchlib ( #166588 )
...
Fixes #166475
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166588
Approved by: https://github.com/titaiwangms
2025-10-30 03:43:52 +00:00
xinan.lin
0918bf321c
[xpu][test] Reuse native_mm and mix_order_reduction for Intel GPU. ( #166384 )
...
This PR reuses native_mm and mix_order_reduction for Intel GPU and enables the corresponding tests.
Fixes #165370
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166384
Approved by: https://github.com/jansel
2025-10-30 03:38:35 +00:00
Laith Sakka
90519402c2
address DDE in matmul decomp ( #166541 )
...
Address https://github.com/pytorch/pytorch/issues/165081
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166541
Approved by: https://github.com/mlazos
2025-10-30 03:19:29 +00:00
Dzmitry Huba
791ca80d3a
Enable local tensor mode for DTensor attention and convolution tests ( #166406 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166406
Approved by: https://github.com/ezyang
2025-10-30 02:48:02 +00:00
Yuanyuan Chen
5cbdade914
Fix a syntactic error in test_indexing.py ( #166390 )
...
This PR fixes a syntactic error in test_indexing.py caused by a misplaced `if else` expression.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166390
Approved by: https://github.com/jerryzh168
2025-10-30 02:28:01 +00:00
amdfaa
0187db88d4
[ROCm][CI] Create periodic-rocm-mi200.yml ( #166544 )
...
* We are separating out the rocm jobs of the periodic workflow
* We are introducing a new label `ciflow/periodic-rocm-mi200` to allow us to run distributed tests only on ROCm runners, without triggering many other jobs on the `periodic.yml` workflow (via `ciflow/periodic`)
* This new workflow will also be triggered via the `ciflow/periodic`, thus maintaining the old status quo.
* We are reverting to the `linux.rocm.gpu.4` label since it targets a lot more CI nodes at this point than the K8s/ARC-based `linux.rocm.gpu.mi250.4` label, as that is still having some network/scaling issues.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166544
Approved by: https://github.com/jeffdaily
Co-authored-by: Jeff Daily <jeff.daily@amd.com>
2025-10-30 02:08:07 +00:00
Bruce Chang
311ea0dec0
shrink_group implementation to expose ncclCommShrink API ( #164518 )
...
Closes #164529
To expose the new [ncclCommShrink](https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/api/comms.html#ncclcommshrink ) API to PyTorch.
This is useful when you need to exclude certain GPUs or nodes from a collective operation, for example in fault tolerance scenarios or when dynamically adjusting resource utilization.
For more info: [Shrinking a communicator](https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/usage/communicators.html#shrinking-a-communicator )
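A hypothetical sketch of the fault-tolerance use case described above; the `shrink_group` name follows the PR title, but the exact Python signature here is an assumption:
```python
import torch.distributed as dist

# Assumes a normally initialized job (MASTER_ADDR etc. set by the launcher).
dist.init_process_group("nccl")

# Suppose ranks 2 and 5 sit on a failed node and must be excluded; the call
# below is hypothetical, mirroring the PR title, not a verified signature.
# new_pg = dist.distributed_c10d.shrink_group(ranks_to_exclude=[2, 5])
# Subsequent collectives would then run on new_pg without the excluded ranks.
```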
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164518
Approved by: https://github.com/kwen2501
2025-10-30 01:50:54 +00:00
dependabot[bot]
cf7756da38
Bump uv from 0.9.5 to 0.9.6 in /.ci/lumen_cli ( #166578 )
...
Bumps [uv](https://github.com/astral-sh/uv ) from 0.9.5 to 0.9.6.
- [Release notes](https://github.com/astral-sh/uv/releases )
- [Changelog](https://github.com/astral-sh/uv/blob/main/CHANGELOG.md )
- [Commits](https://github.com/astral-sh/uv/compare/0.9.5...0.9.6 )
---
updated-dependencies:
- dependency-name: uv
dependency-version: 0.9.6
dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-29 18:28:14 -07:00
Ruben Rodriguez Buchillon
e380028a51
[inductor][choices] lookup table choices 1/3 ( #164978 )
...
# why
- enable users to control which choices get used on which inputs
- reduce lowering time, and pin kernel selection, by selecting
them for the inputs
# what
- a new InductorChoices subclass that implements a lookup table
- a README explaining the usage
- corresponding testing
- currently only supports templates that go through
`V.choices.get_template_configs`
# testing
```
python3 -bb -m pytest test/inductor/test_lookup_table.py -v
```
Differential Revision: [D85685743](https://our.internmc.facebook.com/intern/diff/D85685743 )
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164978
Approved by: https://github.com/PaulZhang12 , https://github.com/eellison , https://github.com/mlazos
2025-10-30 01:28:01 +00:00
Colin L Reliability Rice
b4403bfc62
Add waitcounters for torch.compile subprocess pool ( #164527 )
...
Summary:
This adds a waitcounter for whether or not the pool is running, as well as one for whether we are running jobs.
It also adds waitcounters for the first job within a pool. The first-job and running counters are working correctly; the job waitcounter seems to either be detecting a leaked job or is subtly broken.
Test Plan:
We've tested this internally and see valid ods metrics.
Note that we may be leaking jobs, or the job logic may not be handling an exception correctly.
Differential Revision: D83705931
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164527
Approved by: https://github.com/masnesral
2025-10-30 01:15:26 +00:00
Jeff Daily
12c12466b0
[ROCm][CI] remove amdgpu from install_rocm.sh ( #166575 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166575
Approved by: https://github.com/jeffdaily
Co-authored-by: Jeff Daily <jeff.daily@amd.com>
2025-10-30 01:08:33 +00:00
Sherlock Huang
f4d05feb7a
Repro dynamo issue for union typed annotation ( #166443 )
...
When a nested function has a type annotation using `|`, it fails.
It works fine with `Union[torch.Tensor, DTensor]`, though.
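A minimal sketch of the repro shape described above; the function bodies are illustrative, and `torch.Tensor | None` stands in for the `torch.Tensor | DTensor` case from the report:
```python
import torch

@torch.compile(fullgraph=True)
def f(x):
    def g(y: torch.Tensor | None):  # PEP 604 "|" annotation: reported to fail
        return y + 1
    # def g(y: Union[torch.Tensor, None]):  # the Union[...] spelling works
    return g(x)

# f(torch.ones(3))
```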
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166443
Approved by: https://github.com/anijain2305
2025-10-30 01:05:15 +00:00
Pian Pawakapan
7481622237
[symbolic shapes] remove maybe_guard_rel warning ( #166553 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166553
Approved by: https://github.com/laithsakka
2025-10-30 00:57:28 +00:00
Laith Sakka
b2a0f90501
Fix comparing inductor actual strides vs bw graph for activations should not throw DDE. ( #166277 )
...
Fix https://github.com/pytorch/pytorch/issues/163894
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166277
Approved by: https://github.com/Lucaskabela
2025-10-30 00:34:05 +00:00
eellison
14d4a77495
disable current modes instead of no dispatch in estimation ( #166571 )
...
Otherwise, the custom estimation's TorchDispatchModes would be disabled.
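For context, a hedged sketch of the kind of custom estimation mode at stake; the class and its bookkeeping are illustrative, not the PR's code:
```python
import torch
from torch.utils._python_dispatch import TorchDispatchMode

class EstimationMode(TorchDispatchMode):
    # A toy TorchDispatchMode that intercepts every dispatched op, the sort
    # of mode that a blanket no-dispatch guard would wrongly disable.
    def __torch_dispatch__(self, func, types, args=(), kwargs=None):
        print(f"estimating {func}")  # record per-op cost here
        return func(*args, **(kwargs or {}))

with EstimationMode():
    torch.randn(4) + 1  # each op hits __torch_dispatch__
```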
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166571
Approved by: https://github.com/SherlockNoMad , https://github.com/bdhirsh
2025-10-29 23:24:41 +00:00
Ivan Zaitsev
3d4ca228be
Remove METADATA.bzl files ( #166574 )
...
(meta-internal, should not be synced)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166574
Approved by: https://github.com/bigfootjon
2025-10-29 23:17:41 +00:00
eellison
c3d205d598
helper function for replacing nodes in aug graph ( #166309 )
...
When we do bucketing, we replace starts and waits with new nodes. This PR adds a helper to transfer the augmented graph's additional deps.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166309
Approved by: https://github.com/IvanKobzarev
2025-10-29 23:08:33 +00:00
Michael Lazos
c54e2c5b41
[User-streams] Make torch.Event weakref compatible ( #164522 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164522
Approved by: https://github.com/williamwen42
ghstack dependencies: #164304
2025-10-29 23:06:31 +00:00
Michael Lazos
c3047938a0
[user-streams] Make device-agnostic streams weakref compatible ( #164304 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164304
Approved by: https://github.com/williamwen42 , https://github.com/colesbury
2025-10-29 23:06:31 +00:00
Shangdi Yu
d2eff5d454
Add python stack trace to AOTI generated code ( #160539 )
...
Summary:
We add a thread_local KernelContext object so Strobelight (and other potential profilers) can read the stack trace information of the running kernel.
This will bring extra overhead, so we guard this behind the `cpp.enable_kernel_profile` flag.
Example output code:
```cpp
#include <torch/csrc/inductor/aoti_runtime/kernel_context_tls.h>
namespace torch::aot_inductor {
thread_local KernelContext* tls_kernel_context = nullptr;
}
// Other code .....
void AOTInductorModel::run_impl(
AtenTensorHandle*
input_handles, // array of input AtenTensorHandle; handles
// are stolen; the array itself is borrowed
AtenTensorHandle*
output_handles, // array for writing output AtenTensorHandle; handles
// will be stolen by the caller; the array itself is
// borrowed
DeviceStreamType stream,
AOTIProxyExecutorHandle proxy_executor
) {
__check_inputs_outputs(input_handles, output_handles);
auto inputs = steal_from_raw_handles_to_raii_handles(input_handles, 4);
auto arg2_1 = std::move(inputs[0]);
auto arg3_1 = std::move(inputs[1]);
auto arg4_1 = std::move(inputs[2]);
auto arg5_1 = std::move(inputs[3]);
[[maybe_unused]] auto& fc1_weight = constants_->at(0);
[[maybe_unused]] auto& fc1_bias = constants_->at(1);
inputs.clear();
[[maybe_unused]] auto& kernels = static_cast<AOTInductorModelKernels&>(*this->kernels_.get());
static constexpr int64_t int_array_0[] = {8L, 16L};
static constexpr int64_t int_array_1[] = {16L, 1L};
AtenTensorHandle buf0_handle;
AOTI_TORCH_ERROR_CODE_CHECK(aoti_torch_empty_strided(2, int_array_0, int_array_1, cached_torch_dtype_float32, cached_torch_device_type_cpu, this->device_idx_, &buf0_handle));
RAIIAtenTensorHandle buf0(buf0_handle);
// Topologically Sorted Source Nodes: [linear], Original ATen: [aten.t, aten.addmm]
// [Provenance debug handles] aoti_torch_cpu_addmm_out:1
static constexpr int64_t int_array_2[] = {10L, 16L};
static constexpr int64_t int_array_3[] = {1L, 10L};
{
KernelContextGuard _ctx("aoti_torch_cpu_addmm_out", R"(
File "/data/users/shangdiy/fbsource/buck-out/v2/gen/fbcode/cba6f4fb5faa5f79/caffe2/test/inductor/__provenance_tracing__/provenance_tracing#link-tree/caffe2/test/inductor/test_provenance_tracing.py", line 829, in forward
x = self.fc1(x)
File "/data/users/shangdiy/fbsource/buck-out/v2/gen/fbcode/cba6f4fb5faa5f79/caffe2/test/inductor/__provenance_tracing__/provenance_tracing#link-tree/torch/nn/modules/linear.py", line 134, in forward
return F.linear(input, self.weight, self.bias)
)");
RAIIAtenRecordFunctionHandle record_aoti_torch_cpu_addmm_out_("aoti_torch_cpu_addmm_out", nullptr);
AOTI_TORCH_ERROR_CODE_CHECK(aoti_torch_cpu_addmm_out(buf0, fc1_bias, arg2_1, wrap_with_raii_handle_if_needed(reinterpret_tensor_wrapper(fc1_weight, 2, int_array_2, int_array_3, 0L)), 1L, 1L));
}
arg2_1.reset();
auto buf1 = std::move(buf0); // reuse
static constexpr int64_t int_array_4[] = {10L, 20L};
static constexpr int64_t int_array_5[] = {20L, 1L};
AtenTensorHandle buf2_handle;
AOTI_TORCH_ERROR_CODE_CHECK(aoti_torch_empty_strided(2, int_array_4, int_array_5, cached_torch_dtype_float32, cached_torch_device_type_cpu, this->device_idx_, &buf2_handle));
RAIIAtenTensorHandle buf2(buf2_handle);
// [Provenance debug handles] cpp_fused_mul_relu_sigmoid_0:2
{
KernelContextGuard _ctx("cpp_fused_mul_relu_sigmoid_0", R"(
File "/data/users/shangdiy/fbsource/buck-out/v2/gen/fbcode/cba6f4fb5faa5f79/caffe2/test/inductor/__provenance_tracing__/provenance_tracing#link-tree/caffe2/test/inductor/test_provenance_tracing.py", line 831, in forward
x = self.sigmoid(x)
File "/data/users/shangdiy/fbsource/buck-out/v2/gen/fbcode/cba6f4fb5faa5f79/caffe2/test/inductor/__provenance_tracing__/provenance_tracing#link-tree/torch/nn/modules/activation.py", line 359, in forward
return torch.sigmoid(input)
File "/data/users/shangdiy/fbsource/buck-out/v2/gen/fbcode/cba6f4fb5faa5f79/caffe2/test/inductor/__provenance_tracing__/provenance_tracing#link-tree/caffe2/test/inductor/test_provenance_tracing.py", line 830, in forward
x = self.relu(x)
File "/data/users/shangdiy/fbsource/buck-out/v2/gen/fbcode/cba6f4fb5faa5f79/caffe2/test/inductor/__provenance_tracing__/provenance_tracing#link-tree/torch/nn/modules/activation.py", line 144, in forward
return F.relu(input, inplace=self.inplace)
File "/data/users/shangdiy/fbsource/buck-out/v2/gen/fbcode/cba6f4fb5faa5f79/caffe2/test/inductor/__provenance_tracing__/provenance_tracing#link-tree/caffe2/test/inductor/test_provenance_tracing.py", line 832, in forward
d = a * 3.14
)");
cpp_fused_mul_relu_sigmoid_0((float*)(buf1.data_ptr()), (const float*)(arg3_1.data_ptr()), (float*)(buf2.data_ptr()));
}
arg3_1.reset();
static constexpr int64_t int_array_6[] = {10L, 30L};
static constexpr int64_t int_array_7[] = {30L, 1L};
AtenTensorHandle buf3_handle;
AOTI_TORCH_ERROR_CODE_CHECK(aoti_torch_empty_strided(2, int_array_6, int_array_7, cached_torch_dtype_float32, cached_torch_device_type_cpu, this->device_idx_, &buf3_handle));
RAIIAtenTensorHandle buf3(buf3_handle);
// Topologically Sorted Source Nodes: [mul, addmm], Original ATen: [aten.mul, aten.addmm]
// [Provenance debug handles] aoti_torch_cpu_addmm_out:3
{
KernelContextGuard _ctx("aoti_torch_cpu_addmm_out", R"(
File "/data/users/shangdiy/fbsource/buck-out/v2/gen/fbcode/cba6f4fb5faa5f79/caffe2/test/inductor/__provenance_tracing__/provenance_tracing#link-tree/caffe2/test/inductor/test_provenance_tracing.py", line 833, in forward
y = torch.addmm(c, d, b)
)");
RAIIAtenRecordFunctionHandle record_aoti_torch_cpu_addmm_out_("aoti_torch_cpu_addmm_out", nullptr);
AOTI_TORCH_ERROR_CODE_CHECK(aoti_torch_cpu_addmm_out(buf3, arg5_1, buf2, arg4_1, 1L, 1L));
}
arg4_1.reset();
arg5_1.reset();
buf2.reset();
auto buf4 = std::move(buf3); // reuse
// [Provenance debug handles] cpp_fused_gelu_1:4
{
KernelContextGuard _ctx("cpp_fused_gelu_1", R"(
File "/data/users/shangdiy/fbsource/buck-out/v2/gen/fbcode/cba6f4fb5faa5f79/caffe2/test/inductor/__provenance_tracing__/provenance_tracing#link-tree/caffe2/test/inductor/test_provenance_tracing.py", line 834, in forward
z = torch.nn.functional.gelu(y)
)");
cpp_fused_gelu_1((float*)(buf4.data_ptr()));
}
output_handles[0] = buf1.release();
output_handles[1] = buf4.release();
} // AOTInductorModel::run_impl
```
Test Plan:
```
buck run mode/dev-nosan fbcode//caffe2/test/inductor:provenance_tracing -- -r stack_traces
```
Rollback Plan:
Differential Revision: D78436007
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160539
Approved by: https://github.com/yiming0416
2025-10-29 22:47:52 +00:00
PyTorch MergeBot
972030fe2e
Revert "[pytree] add treespec_{leaf,tuple,dict} functions for args_spec modification ( #160843 )"
...
This reverts commit 284716a691 .
Reverted https://github.com/pytorch/pytorch/pull/160843 on behalf of https://github.com/atalman due to a failing internal torchrec test ([comment](https://github.com/pytorch/pytorch/pull/160843#issuecomment-3464647878 ))
2025-10-29 22:46:48 +00:00
Jeff Daily
d401e4e70a
[ROCm][CUDA] add unit test utility busy_wait_for_flag ( #166218 )
...
torch.cuda._busy_wait_for_flag() will launch a kernel that spins until a flag is set by a corresponding torch.cuda._clear_flag(). These **must** be run on separate streams or it will deadlock.
When used correctly, these kernels put work on the GPU that is more predictable than torch.cuda._sleep() in cases where a unit test depends on the GPU being busy.
Fixes #120318 .
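A hedged usage sketch based on the description above; the two calls are taken verbatim from the commit message, but their exact signatures are an assumption:
```python
import torch

# The wait and the clear MUST be issued on different streams, or the
# spinning kernel never observes the flag and the program deadlocks.
s1, s2 = torch.cuda.Stream(), torch.cuda.Stream()
with torch.cuda.stream(s1):
    torch.cuda._busy_wait_for_flag()  # spins on-GPU until the flag is set
with torch.cuda.stream(s2):
    torch.cuda._clear_flag()          # releases the spinning kernel
torch.cuda.synchronize()
```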
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166218
Approved by: https://github.com/jeffdaily
Co-authored-by: Jeff Daily <jeff.daily@amd.com>
2025-10-29 22:40:23 +00:00
Mikayla Gawarecki
f1a3440715
FC/BC policy for libtorch stable ABI ( #163991 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/163991
Approved by: https://github.com/janeyx99
ghstack dependencies: #163899
2025-10-29 22:35:36 +00:00
Andrey Talman
82ff07c788
Add py 3.14 CI docker build pytorch-linux-jammy-py3.14-clang12 ( #164791 )
...
Related to https://github.com/pytorch/pytorch/issues/156856
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164791
Approved by: https://github.com/huydhn , https://github.com/malfet , https://github.com/albanD
2025-10-29 22:21:22 +00:00
Rob Timpe
e0604d3170
[dynamo] Fix ListIterator tracking mutations to original list ( #166350 )
...
Currently, ListIteratorVariable copies the underlying list, which prevents it
from seeing mutations to the original list. Remove the copy to match CPython behavior.
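The CPython behavior being matched, as a quick illustration:
```python
# A list iterator observes mutations made after the iterator is created.
lst = [1, 2]
it = iter(lst)
print(next(it))   # 1
lst.append(3)     # mutate the original list mid-iteration
print(list(it))   # [2, 3] -- the appended element is visible
```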
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166350
Approved by: https://github.com/guilhermeleobas
ghstack dependencies: #166349 , #162768
2025-10-29 21:54:37 +00:00
Rob Timpe
8101fd46d4
[dynamo] Implement iter with a polyfill ( #162768 )
...
Currently most variable trackers implement `iter` via `_call_iter_tuple_list`.
This makes it difficult to customize the behavior of `iter` for different
variable types. Instead, implement `iter` via a polyfill, which will delegate
to the appropriate `__iter__` method.
While this method is more flexible, it increases the overhead of dynamo tracing.
For example, `iter(x)` will generate 9x more instructions than the current
implementation for common iterable types. Microbenchmarking shows a ~6x
slowdown for this operation. I suspect this would be much less for realistic
workloads, but more work would be needed to get specific numbers. If the
performance is a concern we could also consider adding a fast path for types
that are known to correctly implement `__iter__`.
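A minimal sketch of the polyfill idea (not the actual dynamo polyfill, which also has to handle error cases):
```python
def iter_polyfill(obj):
    # Look up __iter__ on the type, mirroring how CPython resolves iter();
    # the real builtin also supports the __getitem__ fallback and the
    # two-argument form, which this sketch omits.
    return type(obj).__iter__(obj)

print(list(iter_polyfill([1, 2, 3])))  # [1, 2, 3]
```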
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162768
Approved by: https://github.com/guilhermeleobas
ghstack dependencies: #166349
2025-10-29 21:54:37 +00:00
Rob Timpe
3d4a2d8a93
[dynamo] Add __iter__ for iterable VariableTrackers ( #166349 )
...
This is in preparation for implementing iter with a polyfill.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166349
Approved by: https://github.com/guilhermeleobas
2025-10-29 21:54:37 +00:00
Camyll Harajli
59ddfb69a7
[cpu/gpu split] ( #165696 )
...
Summary: cpu/gpu split. CUDA is the default due to some downstream target configurations.
Test Plan: test in CI
Differential Revision: D80712802
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165696
Approved by: https://github.com/jeffdaily , https://github.com/malfet , https://github.com/atalman
2025-10-29 21:44:52 +00:00
Boyuan Feng
bebabd7fce
[Graph Partition] move custom rules to inductor config ( #166458 )
...
This PR adds `custom_should_partition_ops: list[str]` to specify the names of custom ops at which graph partitioning happens. It works with caching since it is a `list[str]` in the config file. The op name should be of the format "mylib::baz".
Closes #165341
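A sketch of the config usage; the option name and the "mylib::baz" format come from the PR text, while the exact access path via `torch._inductor.config` is an assumption:
```python
import torch._inductor.config as inductor_config

# Ask Inductor's graph partitioner to split the graph at calls to this
# custom op ("namespace::opname" format, per the PR description).
inductor_config.custom_should_partition_ops = ["mylib::baz"]
```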
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166458
Approved by: https://github.com/ProExpertProg , https://github.com/eellison , https://github.com/zou3519
2025-10-29 21:43:58 +00:00
Sean McGovern
56a809aa07
[DTensor] Fix torch.all() using incorrect reduction operator ( #165924 )
...
Fixes #165923
Corrects the reduction operation to be product.
Enables "all" in the boolean tensor tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165924
Approved by: https://github.com/malfet , https://github.com/Skylion007
2025-10-29 20:58:35 +00:00
Yuanyuan Chen
b33762bd2f
Fix incomplete test_memory_plots_metadata ( #166508 )
...
The different context cases were not fully tested before this PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166508
Approved by: https://github.com/Skylion007
2025-10-29 20:55:00 +00:00
fduwjj
f02708c2be
[DeviceMesh] Remove slicing submesh warning messages and clean up in fsdp params ( #166466 )
...
Differential Revision: [D85735294](https://our.internmc.facebook.com/intern/diff/D85735294 )
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166466
Approved by: https://github.com/fegin
2025-10-29 20:52:49 +00:00
Justin Chu
a186aa8d6c
[ONNX] Change stacklevel in warning message for export ( #166558 )
...
Change stacklevel to 3 so that the warning shows the user callsite (where the user calls torch.onnx.export).
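The standard-library behavior behind the change, as a quick illustration; the function names are illustrative:
```python
import warnings

def _internal_helper():
    # stacklevel=3 attributes the warning to the caller's caller, i.e. the
    # user code that invoked api(), not to api() or this helper.
    warnings.warn("deprecated", DeprecationWarning, stacklevel=3)

def api():
    _internal_helper()

api()  # the warning is reported at this line, the user callsite
```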
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166558
Approved by: https://github.com/titaiwangms
2025-10-29 20:45:25 +00:00
Tushar Jain
48c3b71ecc
transform fr traces for ft ( #166149 )
...
Summary:
- The ranks in the default pg config are local ranks.
- However, fr trace analysis requires them to be global ranks.
- So, based on a CLI flag, we transform the local ranks to global ranks before the analysis kicks in (see the sketch after this list).
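A hypothetical sketch of the transform this describes; the entry layout and mapping construction are assumptions, not the PR's actual code:
```python
def to_global_ranks(entries, rank_map):
    """Rewrite local ranks in fr trace entries to global ranks.

    rank_map: local rank -> global rank for one task group (assumed shape).
    """
    return [{**e, "rank": rank_map[e["rank"]]} for e in entries]

# e.g. a task group of size 4 whose global ranks start at 8:
rank_map = {local: 8 + local for local in range(4)}
print(to_global_ranks([{"rank": 2, "op": "allreduce"}], rank_map))
```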
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166149
Approved by: https://github.com/fduwjj
2025-10-29 20:44:48 +00:00
Nikita Shulga
2c9f877fa7
Revert "[PyTorch] Improve aarch64 performance of bfloat16 ops ( #166028 )"
...
This reverts commit 3e77a2b478 .
Otherwise it fails ARM build with older compilers with errors that looks
as follows:
```
vec128_bfloat16_neon.h:666:12: error: operation not permitted on type ‘bfloat16_t’
666 | return (-x) * y - z;
```
For more self-contained example see https://godbolt.org/z/bbY4xWh45
(that compiles the same code using clang-15 and clang-19)
2025-10-29 13:35:59 -07:00
Tushar Jain
fc540cefd4
set pg name based on ranks ( #166182 )
...
Summary:
- in torchft we have multiple default pg's, 1 for each task group
- for flight recorder to work, each of these need to have a different name, so entries can be matched
- change the `init_process_group` api to optionally take a list of ranks. if provided, we use the hash of the ranks as the name of the pg. for torchft, we'll pass global ranks here so the default pg have a different name on each task group
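A hypothetical sketch of deriving a pg name from the ranks; the exact hashing scheme used by the PR is an assumption here:
```python
import hashlib

def pg_name_from_ranks(ranks):
    # Sort so the name is order-independent, then hash the joined ranks.
    key = ",".join(str(r) for r in sorted(ranks))
    return hashlib.sha1(key.encode()).hexdigest()

# Two task groups with different global ranks get distinct pg names:
print(pg_name_from_ranks([0, 1, 2, 3]))
print(pg_name_from_ranks([4, 5, 6, 7]))
```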
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166182
Approved by: https://github.com/fduwjj
2025-10-29 20:13:48 +00:00
Maggie Moss
d1a6e006e0
Fix syntax for pyrefly errors ( #166496 )
...
Last one! This ensures all existing suppressions match the expected syntax and silence only one error code.
Test plan: `pyrefly check`, `lintrunner`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166496
Approved by: https://github.com/Skylion007 , https://github.com/mlazos
2025-10-29 20:00:25 +00:00
Rohit Singh Rathaur
fa560e1158
[ao][pruning] Replace assert statements with AssertionError exceptions ( #164926 )
...
Replace assert statements with explicit ValueError exceptions to ensure the validation checks are not removed when Python runs with the optimization flag (-O).
This is a draft PR to confirm the process.
Partially fixes #164878 .
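The pattern behind the change, with an illustrative function name: `assert` statements are stripped when Python runs with `-O`, so validation must use an explicit raise to survive optimization:
```python
def set_sparsity_level(level):     # hypothetical validator for illustration
    # assert 0.0 <= level <= 1.0  # would be removed entirely under `python -O`
    if not (0.0 <= level <= 1.0):
        raise ValueError(f"sparsity level must be in [0, 1], got {level}")
    return level
```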
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164926
Approved by: https://github.com/fffrog , https://github.com/albanD
Co-authored-by: Jiawei Li <ljw1101.vip@gmail.com>
2025-10-29 17:46:46 +00:00