linhaifeng
369f2d6951
[3/N] fix typo in other folders ( #166606 )
...
fix typo in other folders
#166374
#166126
_typos.toml
```toml
[files]
extend-exclude = ["tools/linter/dictionary.txt"]
[default.extend-words]
nd = "nd"
arange = "arange"
Nd = "Nd"
GLOBALs = "GLOBALs"
hte = "hte"
iy = "iy"
PN = "PN"
Dout = "Dout"
optin = "optin"
gam = "gam"
PTD = "PTD"
Sur = "Sur"
nin = "nin"
tme = "tme"
inpt = "inpt"
mis = "mis"
Raison = "Raison"
ouput = "ouput"
nto = "nto"
Onwer = "Onwer"
callibrate = "callibrate"
ser = "ser"
Metdata = "Metdata"
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166606
Approved by: https://github.com/ezyang
2025-10-30 10:30:40 +00:00
Zhang, Jianyi
32920926f0
[xpu][fix] [Inductor] Avoid using tl.sqrt_rn on XPU before triton is ready ( #165740 )
...
Fixes #165738
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165740
Approved by: https://github.com/etaf , https://github.com/EikanWang , https://github.com/chuanqi129 , https://github.com/desertfire
2025-10-30 09:24:24 +00:00
Yuanyuan Chen
39e5cdddf7
[2/N] Add strict parameter to Python zip calls ( #166257 )
...
This PR adds `strict=True/False` to zip calls in the test utils; `strict=True` is passed when possible.
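For context, a minimal illustration of the `strict=True` semantics (standard Python 3.10+ behavior) that this change relies on:
```python
# zip(..., strict=True) raises on length mismatch instead of silently truncating.
a = [1, 2, 3]
b = ["x", "y"]

print(list(zip(a, b)))  # [(1, 'x'), (2, 'y')] -- the extra element is dropped

try:
    list(zip(a, b, strict=True))
except ValueError as e:
    print(e)  # zip() argument 2 is shorter than argument 1
```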
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166257
Approved by: https://github.com/janeyx99
2025-10-30 08:10:10 +00:00
libohao1201
2829d48bd1
[xpu][test][1/N] Port 3 fsdp distributed test cases to Intel GPU ( #161476 )
...
For https://github.com/pytorch/pytorch/issues/114850 , we port 3 distributed tests to Intel GPU.
We enable Intel GPU with the following methods (see the sketch after this list) while trying to keep the original code style:
- use `torch.accelerator.current_accelerator()` to determine the accelerator backend
- use `requires_accelerator_dist_backend` to enable `xccl`
- enable XPU for some test paths
- skip test cases that Intel GPU does not support
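A minimal sketch of the device-detection pattern the list above describes; the backend fallback logic here is an illustrative assumption, not code from the PR:
```python
# Hedged sketch: detect the available accelerator and pick a distributed
# backend accordingly. The cpu/nccl fallbacks are illustrative assumptions.
import torch

acc = torch.accelerator.current_accelerator()  # e.g. device(type='xpu'), or None
device_type = acc.type if acc is not None else "cpu"
backend = "xccl" if device_type == "xpu" else "nccl"
print(device_type, backend)
```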
Pull Request resolved: https://github.com/pytorch/pytorch/pull/161476
Approved by: https://github.com/weifengpy , https://github.com/guangyey
2025-10-30 07:30:04 +00:00
Michael Lazos
f1af679270
[user-streams] Handle returning the current stream with/without device index ( #165356 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165356
Approved by: https://github.com/anijain2305
ghstack dependencies: #164304 , #164522 , #164819 , #165211 , #165212
2025-10-30 07:20:25 +00:00
Shawn Xu
d46d8d6f54
[triton][sigmoid] Fix kernel cache and serialization issue for triton sigmoid + CUDA kernel bug ( #166568 )
...
Differential Revision: D85792537
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166568
Approved by: https://github.com/minjang
2025-10-30 06:17:39 +00:00
Michael Lazos
a5335263d3
[user-streams] Track symbolic current stream ( #165212 )
...
merge into stream tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165212
Approved by: https://github.com/anijain2305
ghstack dependencies: #164304 , #164522 , #164819 , #165211
2025-10-30 04:58:53 +00:00
Michael Lazos
79aee77381
[user-streams] Add current stream source ( #165211 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165211
Approved by: https://github.com/anijain2305
ghstack dependencies: #164304 , #164522 , #164819
2025-10-30 04:58:53 +00:00
Michael Lazos
f5cb9a4c68
[user-streams] Fix stream graph output semantics ( #164819 )
...
Previously, we would stash a single stream value constructed at trace time in a global and return that same value from repeated calls to the graph.
With this PR, we construct the stream value in advance, reference the constructed value in the graph via the lookup table, and if that value is returned as an output, read the value from the lookup table and return it (in bytecode, not as a graph output, since we don't support arbitrary stream outputs).
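A hedged sketch of the user-visible pattern at stake, assuming a CUDA build; under the old behavior, the same trace-time stream object could be returned from every call:
```python
# Illustrative only: a compiled function that constructs a stream and
# returns it as an output, which is the case this PR fixes.
import torch

@torch.compile
def f(x):
    s = torch.cuda.Stream()  # stream value constructed in the graph
    with torch.cuda.stream(s):
        y = x + 1
    return y, s              # the stream escapes as an output

# y, s = f(torch.ones(4, device="cuda"))
```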
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164819
Approved by: https://github.com/anijain2305
ghstack dependencies: #164304 , #164522
2025-10-30 04:58:46 +00:00
PyTorch UpdateBot
f20bf77874
[audio hash update] update the pinned audio hash ( #166597 )
...
This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/main/.github/workflows/nightly.yml ).
Update the pinned audio hash.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166597
Approved by: https://github.com/pytorchbot
2025-10-30 04:28:30 +00:00
Nicolas Macchioni
75f798e05b
[inductor][mi350] add tech specs for MI350 ( #166576 )
...
Summary:
While digging through matmul padding for other work, I noticed that the compute-bound check won't work on MI350 since we haven't supplied the tech specs yet.
This PR adds the MI350 specs following the predefined format.
Test Plan: CI
Differential Revision: D85804980
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166576
Approved by: https://github.com/leitian
2025-10-30 03:46:52 +00:00
Angel Li
476b149a00
bwd pass ( #164504 )
...
**Summary**
This implements the backward pass for the Varlen API and registers `_varlen_attn()` as a custom op.
**Benchmarking**
To benchmark, we compare runtime and TFLOPs against the current SDPA approach with padding.
Settings:
- 1 H100 machine
- `batch_size=8`, `max_seq_len=2048`, `embed_dim=1024`, `num_heads=16`
- dtype `torch.bfloat16`
- `is_causal=False`
- for variable length, we set sequences to be random multiples of 64 up to `max_seq_len`
- 100 runs
| | Variable Length API | SDPA |
|--------|--------------------|----------|
| Runtime | 0.8189142608642578 ms | 3.263883056640625 ms |
| TFLOPs | 268.652 | 158.731 |
We can see that the Varlen runtime is >3x faster.
**Testing**
Run `python test/test_varlen_attention.py` for unit tests where we verify basic functionality and confirm numerical match between varlen gradients vs SDPA.
For custom op testing, `test_custom_op_registration` uses logging mode to verify that `_varlen_attn()` was called and tests with `torch.compile`. `test_custom_op_compliances` uses `torch.library.opcheck()` to verify.
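A self-contained sketch of the `torch.library.opcheck()` pattern mentioned above, using a toy custom op since the varlen op's actual schema is not shown here:
```python
import torch

# Toy custom op standing in for _varlen_attn(); "mylib::double" is illustrative.
@torch.library.custom_op("mylib::double", mutates_args=())
def double(x: torch.Tensor) -> torch.Tensor:
    return x * 2

@double.register_fake
def _(x):
    # Fake (meta) implementation so the op works under torch.compile tracing.
    return torch.empty_like(x)

# opcheck runs a suite of custom-op compliance tests (schema correctness,
# fake-tensor coverage, autograd registration, ...) on the sample inputs.
torch.library.opcheck(double, (torch.randn(3),))
```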
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164504
Approved by: https://github.com/drisspg
2025-10-30 03:46:37 +00:00
Justin Chu
845da9c817
[ONNX] Ignore pyrefly errors in torchlib ( #166588 )
...
Fixes #166475
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166588
Approved by: https://github.com/titaiwangms
2025-10-30 03:43:52 +00:00
xinan.lin
0918bf321c
[xpu][test] Reuse native_mm and mix_order_reduction for Intel GPU. ( #166384 )
...
This PR reuses native_mm and mix_order_reduction for Intel GPU and enables the corresponding tests.
Fixes #165370
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166384
Approved by: https://github.com/jansel
2025-10-30 03:38:35 +00:00
Laith Sakka
90519402c2
address DDE in matmul decomp ( #166541 )
...
Address https://github.com/pytorch/pytorch/issues/165081
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166541
Approved by: https://github.com/mlazos
2025-10-30 03:19:29 +00:00
Dzmitry Huba
791ca80d3a
Enable local tensor mode for DTensor attention and convolution tests ( #166406 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166406
Approved by: https://github.com/ezyang
2025-10-30 02:48:02 +00:00
Yuanyuan Chen
5cbdade914
Fix a syntactic error in test_indexing.py ( #166390 )
...
This PR fixes a syntactic error in test_indexing.py caused by a misplaced `if else` expression.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166390
Approved by: https://github.com/jerryzh168
2025-10-30 02:28:01 +00:00
amdfaa
0187db88d4
[ROCm][CI] Create periodic-rocm-mi200.yml ( #166544 )
...
* We are separating out the rocm jobs of the periodic workflow
* We are introducing a new label `ciflow/periodic-rocm-mi200` to allow us to run distributed tests only on ROCm runners, without triggering many other jobs on the `periodic.yml` workflow (via `ciflow/periodic`)
* This new workflow will also be triggered via the `ciflow/periodic`, thus maintaining the old status quo.
* We are reverting to the `linux.rocm.gpu.4` label since it targets a lot more CI nodes at this point than the K8s/ARC-based `linux.rocm.gpu.mi250.4` label, as that is still having some network/scaling issues.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166544
Approved by: https://github.com/jeffdaily
Co-authored-by: Jeff Daily <jeff.daily@amd.com>
2025-10-30 02:08:07 +00:00
Bruce Chang
311ea0dec0
shrink_group implementation to expose ncclCommShrink API ( #164518 )
...
Closes #164529
To expose the new [ncclCommShrink](https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/api/comms.html#ncclcommshrink ) API to PyTorch.
This is useful when you need to exclude certain GPUs or nodes from a collective operation, for example in fault tolerance scenarios or when dynamically adjusting resource utilization.
For more info: [Shrinking a communicator](https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/usage/communicators.html#shrinking-a-communicator )
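A hypothetical sketch of the fault-tolerance use case described above; the `shrink_group` name follows the PR title, but the exact Python signature here is an assumption:
```python
import torch.distributed as dist

# Assumes a normally initialized job (MASTER_ADDR etc. set by the launcher).
dist.init_process_group("nccl")

# Suppose ranks 2 and 5 sit on a failed node and must be excluded; the call
# below is hypothetical, mirroring the PR title, not a verified signature.
# new_pg = dist.distributed_c10d.shrink_group(ranks_to_exclude=[2, 5])
# Subsequent collectives would then run on new_pg without the excluded ranks.
```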
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164518
Approved by: https://github.com/kwen2501
2025-10-30 01:50:54 +00:00
dependabot[bot]
cf7756da38
Bump uv from 0.9.5 to 0.9.6 in /.ci/lumen_cli ( #166578 )
...
Bumps [uv](https://github.com/astral-sh/uv ) from 0.9.5 to 0.9.6.
- [Release notes](https://github.com/astral-sh/uv/releases )
- [Changelog](https://github.com/astral-sh/uv/blob/main/CHANGELOG.md )
- [Commits](https://github.com/astral-sh/uv/compare/0.9.5...0.9.6 )
---
updated-dependencies:
- dependency-name: uv
dependency-version: 0.9.6
dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-29 18:28:14 -07:00
Ruben Rodriguez Buchillon
e380028a51
[inductor][choices] lookup table choices 1/3 ( #164978 )
...
# why
- enable users to control which choices get used on which inputs
- reduce lowering time, and pin kernel selection, by selecting
them for the inputs
# what
- a new InductorChoices subclass that implements a lookup table
- a README explaining the usage
- corresponding testing
- currently only supports templates that go through
`V.choices.get_template_configs`
# testing
```
python3 -bb -m pytest test/inductor/test_lookup_table.py -v
```
Differential Revision: [D85685743](https://our.internmc.facebook.com/intern/diff/D85685743 )
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164978
Approved by: https://github.com/PaulZhang12 , https://github.com/eellison , https://github.com/mlazos
2025-10-30 01:28:01 +00:00
Colin L Reliability Rice
b4403bfc62
Add waitcounters for torch.compile subprocess pool ( #164527 )
...
Summary:
This adds a waitcounter for whether or not the pool is running, as well as one for whether we are running jobs.
It also adds waitcounters for the first job within a pool. The first-job and running counters are working correctly; the job waitcounter seems to either be detecting a leaked job or is subtly broken.
Test Plan:
We've tested this internally and see valid ods metrics.
Note that we may be leaking jobs, or the job logic may not be handling an exception correctly.
Differential Revision: D83705931
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164527
Approved by: https://github.com/masnesral
2025-10-30 01:15:26 +00:00
Jeff Daily
12c12466b0
[ROCm][CI] remove amdgpu from install_rocm.sh ( #166575 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166575
Approved by: https://github.com/jeffdaily
Co-authored-by: Jeff Daily <jeff.daily@amd.com>
2025-10-30 01:08:33 +00:00
Sherlock Huang
f4d05feb7a
Repro dynamo issue for union typed annotation ( #166443 )
...
When a nested function has a type annotation using `|`, it fails.
It works fine with `Union[torch.Tensor, DTensor]`, though.
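A minimal sketch of the repro shape described above; the function bodies are illustrative, and `torch.Tensor | None` stands in for the `torch.Tensor | DTensor` case from the report:
```python
import torch

@torch.compile(fullgraph=True)
def f(x):
    def g(y: torch.Tensor | None):  # PEP 604 "|" annotation: reported to fail
        return y + 1
    # def g(y: Union[torch.Tensor, None]):  # the Union[...] spelling works
    return g(x)

# f(torch.ones(3))
```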
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166443
Approved by: https://github.com/anijain2305
2025-10-30 01:05:15 +00:00
Pian Pawakapan
7481622237
[symbolic shapes] remove maybe_guard_rel warning ( #166553 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166553
Approved by: https://github.com/laithsakka
2025-10-30 00:57:28 +00:00
Laith Sakka
b2a0f90501
Fix comparing inductor actual strides vs bw graph for activations should not throw DDE. ( #166277 )
...
Fix https://github.com/pytorch/pytorch/issues/163894
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166277
Approved by: https://github.com/Lucaskabela
2025-10-30 00:34:05 +00:00
eellison
14d4a77495
disable current modes instead of no dispatch in estimation ( #166571 )
...
Otherwise, the custom estimation's TorchDispatchModes would be disabled.
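For context, a hedged sketch of the kind of custom estimation mode at stake; the class and its bookkeeping are illustrative, not the PR's code:
```python
import torch
from torch.utils._python_dispatch import TorchDispatchMode

class EstimationMode(TorchDispatchMode):
    # A toy TorchDispatchMode that intercepts every dispatched op, the sort
    # of mode that a blanket no-dispatch guard would wrongly disable.
    def __torch_dispatch__(self, func, types, args=(), kwargs=None):
        print(f"estimating {func}")  # record per-op cost here
        return func(*args, **(kwargs or {}))

with EstimationMode():
    torch.randn(4) + 1  # each op hits __torch_dispatch__
```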
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166571
Approved by: https://github.com/SherlockNoMad , https://github.com/bdhirsh
2025-10-29 23:24:41 +00:00
Ivan Zaitsev
3d4ca228be
Remove METADATA.bzl files ( #166574 )
...
(meta-internal, should not be synced)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166574
Approved by: https://github.com/bigfootjon
2025-10-29 23:17:41 +00:00
eellison
c3d205d598
helper function for replacing nodes in aug graph ( #166309 )
...
When we do bucketing, we replace starts and waits with new nodes. This PR adds a helper to transfer the augmented graph's additional deps.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166309
Approved by: https://github.com/IvanKobzarev
2025-10-29 23:08:33 +00:00
Michael Lazos
c54e2c5b41
[User-streams] Make torch.Event weakref compatible ( #164522 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164522
Approved by: https://github.com/williamwen42
ghstack dependencies: #164304
2025-10-29 23:06:31 +00:00
Michael Lazos
c3047938a0
[user-streams] Make device-agnostic streams weakref compatible ( #164304 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164304
Approved by: https://github.com/williamwen42 , https://github.com/colesbury
2025-10-29 23:06:31 +00:00
Shangdi Yu
d2eff5d454
Add python stack trace to AOTI generated code ( #160539 )
...
Summary:
We add a thread_local KernelContext object so Strobelight (and other potential profilers) can read the stack trace information of the running kernel.
This will bring extra overhead, so we guard this behind the `cpp.enable_kernel_profile` flag.
Example output code:
```cpp
#include <torch/csrc/inductor/aoti_runtime/kernel_context_tls.h>
namespace torch::aot_inductor {
thread_local KernelContext* tls_kernel_context = nullptr;
}
// Other code .....
void AOTInductorModel::run_impl(
AtenTensorHandle*
input_handles, // array of input AtenTensorHandle; handles
// are stolen; the array itself is borrowed
AtenTensorHandle*
output_handles, // array for writing output AtenTensorHandle; handles
// will be stolen by the caller; the array itself is
// borrowed
DeviceStreamType stream,
AOTIProxyExecutorHandle proxy_executor
) {
__check_inputs_outputs(input_handles, output_handles);
auto inputs = steal_from_raw_handles_to_raii_handles(input_handles, 4);
auto arg2_1 = std::move(inputs[0]);
auto arg3_1 = std::move(inputs[1]);
auto arg4_1 = std::move(inputs[2]);
auto arg5_1 = std::move(inputs[3]);
[[maybe_unused]] auto& fc1_weight = constants_->at(0);
[[maybe_unused]] auto& fc1_bias = constants_->at(1);
inputs.clear();
[[maybe_unused]] auto& kernels = static_cast<AOTInductorModelKernels&>(*this->kernels_.get());
static constexpr int64_t int_array_0[] = {8L, 16L};
static constexpr int64_t int_array_1[] = {16L, 1L};
AtenTensorHandle buf0_handle;
AOTI_TORCH_ERROR_CODE_CHECK(aoti_torch_empty_strided(2, int_array_0, int_array_1, cached_torch_dtype_float32, cached_torch_device_type_cpu, this->device_idx_, &buf0_handle));
RAIIAtenTensorHandle buf0(buf0_handle);
// Topologically Sorted Source Nodes: [linear], Original ATen: [aten.t, aten.addmm]
// [Provenance debug handles] aoti_torch_cpu_addmm_out:1
static constexpr int64_t int_array_2[] = {10L, 16L};
static constexpr int64_t int_array_3[] = {1L, 10L};
{
KernelContextGuard _ctx("aoti_torch_cpu_addmm_out", R"(
File "/data/users/shangdiy/fbsource/buck-out/v2/gen/fbcode/cba6f4fb5faa5f79/caffe2/test/inductor/__provenance_tracing__/provenance_tracing#link-tree/caffe2/test/inductor/test_provenance_tracing.py", line 829, in forward
x = self.fc1(x)
File "/data/users/shangdiy/fbsource/buck-out/v2/gen/fbcode/cba6f4fb5faa5f79/caffe2/test/inductor/__provenance_tracing__/provenance_tracing#link-tree/torch/nn/modules/linear.py", line 134, in forward
return F.linear(input, self.weight, self.bias)
)");
RAIIAtenRecordFunctionHandle record_aoti_torch_cpu_addmm_out_("aoti_torch_cpu_addmm_out", nullptr);
AOTI_TORCH_ERROR_CODE_CHECK(aoti_torch_cpu_addmm_out(buf0, fc1_bias, arg2_1, wrap_with_raii_handle_if_needed(reinterpret_tensor_wrapper(fc1_weight, 2, int_array_2, int_array_3, 0L)), 1L, 1L));
}
arg2_1.reset();
auto buf1 = std::move(buf0); // reuse
static constexpr int64_t int_array_4[] = {10L, 20L};
static constexpr int64_t int_array_5[] = {20L, 1L};
AtenTensorHandle buf2_handle;
AOTI_TORCH_ERROR_CODE_CHECK(aoti_torch_empty_strided(2, int_array_4, int_array_5, cached_torch_dtype_float32, cached_torch_device_type_cpu, this->device_idx_, &buf2_handle));
RAIIAtenTensorHandle buf2(buf2_handle);
// [Provenance debug handles] cpp_fused_mul_relu_sigmoid_0:2
{
KernelContextGuard _ctx("cpp_fused_mul_relu_sigmoid_0", R"(
File "/data/users/shangdiy/fbsource/buck-out/v2/gen/fbcode/cba6f4fb5faa5f79/caffe2/test/inductor/__provenance_tracing__/provenance_tracing#link-tree/caffe2/test/inductor/test_provenance_tracing.py", line 831, in forward
x = self.sigmoid(x)
File "/data/users/shangdiy/fbsource/buck-out/v2/gen/fbcode/cba6f4fb5faa5f79/caffe2/test/inductor/__provenance_tracing__/provenance_tracing#link-tree/torch/nn/modules/activation.py", line 359, in forward
return torch.sigmoid(input)
File "/data/users/shangdiy/fbsource/buck-out/v2/gen/fbcode/cba6f4fb5faa5f79/caffe2/test/inductor/__provenance_tracing__/provenance_tracing#link-tree/caffe2/test/inductor/test_provenance_tracing.py", line 830, in forward
x = self.relu(x)
File "/data/users/shangdiy/fbsource/buck-out/v2/gen/fbcode/cba6f4fb5faa5f79/caffe2/test/inductor/__provenance_tracing__/provenance_tracing#link-tree/torch/nn/modules/activation.py", line 144, in forward
return F.relu(input, inplace=self.inplace)
File "/data/users/shangdiy/fbsource/buck-out/v2/gen/fbcode/cba6f4fb5faa5f79/caffe2/test/inductor/__provenance_tracing__/provenance_tracing#link-tree/caffe2/test/inductor/test_provenance_tracing.py", line 832, in forward
d = a * 3.14
)");
cpp_fused_mul_relu_sigmoid_0((float*)(buf1.data_ptr()), (const float*)(arg3_1.data_ptr()), (float*)(buf2.data_ptr()));
}
arg3_1.reset();
static constexpr int64_t int_array_6[] = {10L, 30L};
static constexpr int64_t int_array_7[] = {30L, 1L};
AtenTensorHandle buf3_handle;
AOTI_TORCH_ERROR_CODE_CHECK(aoti_torch_empty_strided(2, int_array_6, int_array_7, cached_torch_dtype_float32, cached_torch_device_type_cpu, this->device_idx_, &buf3_handle));
RAIIAtenTensorHandle buf3(buf3_handle);
// Topologically Sorted Source Nodes: [mul, addmm], Original ATen: [aten.mul, aten.addmm]
// [Provenance debug handles] aoti_torch_cpu_addmm_out:3
{
KernelContextGuard _ctx("aoti_torch_cpu_addmm_out", R"(
File "/data/users/shangdiy/fbsource/buck-out/v2/gen/fbcode/cba6f4fb5faa5f79/caffe2/test/inductor/__provenance_tracing__/provenance_tracing#link-tree/caffe2/test/inductor/test_provenance_tracing.py", line 833, in forward
y = torch.addmm(c, d, b)
)");
RAIIAtenRecordFunctionHandle record_aoti_torch_cpu_addmm_out_("aoti_torch_cpu_addmm_out", nullptr);
AOTI_TORCH_ERROR_CODE_CHECK(aoti_torch_cpu_addmm_out(buf3, arg5_1, buf2, arg4_1, 1L, 1L));
}
arg4_1.reset();
arg5_1.reset();
buf2.reset();
auto buf4 = std::move(buf3); // reuse
// [Provenance debug handles] cpp_fused_gelu_1:4
{
KernelContextGuard _ctx("cpp_fused_gelu_1", R"(
File "/data/users/shangdiy/fbsource/buck-out/v2/gen/fbcode/cba6f4fb5faa5f79/caffe2/test/inductor/__provenance_tracing__/provenance_tracing#link-tree/caffe2/test/inductor/test_provenance_tracing.py", line 834, in forward
z = torch.nn.functional.gelu(y)
)");
cpp_fused_gelu_1((float*)(buf4.data_ptr()));
}
output_handles[0] = buf1.release();
output_handles[1] = buf4.release();
} // AOTInductorModel::run_impl
```
Test Plan:
```
buck run mode/dev-nosan fbcode//caffe2/test/inductor:provenance_tracing -- -r stack_traces
```
Rollback Plan:
Differential Revision: D78436007
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160539
Approved by: https://github.com/yiming0416
2025-10-29 22:47:52 +00:00
PyTorch MergeBot
972030fe2e
Revert "[pytree] add treespec_{leaf,tuple,dict} functions for args_spec modification ( #160843 )"
...
This reverts commit 284716a691 .
Reverted https://github.com/pytorch/pytorch/pull/160843 on behalf of https://github.com/atalman due to a failing internal torchrec test ([comment](https://github.com/pytorch/pytorch/pull/160843#issuecomment-3464647878 ))
2025-10-29 22:46:48 +00:00
Jeff Daily
d401e4e70a
[ROCm][CUDA] add unit test utility busy_wait_for_flag ( #166218 )
...
torch.cuda._busy_wait_for_flag() will launch a kernel that spins until a flag is set by a corresponding torch.cuda._clear_flag(). These **must** be run on separate streams or it will deadlock.
When used correctly, these kernels put work on the GPU that is more predictable than torch.cuda._sleep() in cases where a unit test depends on the GPU being busy.
Fixes #120318 .
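A hedged usage sketch based on the description above; the two calls are taken verbatim from the commit message, but their exact signatures are an assumption:
```python
import torch

# The wait and the clear MUST be issued on different streams, or the
# spinning kernel never observes the flag and the program deadlocks.
s1, s2 = torch.cuda.Stream(), torch.cuda.Stream()
with torch.cuda.stream(s1):
    torch.cuda._busy_wait_for_flag()  # spins on-GPU until the flag is set
with torch.cuda.stream(s2):
    torch.cuda._clear_flag()          # releases the spinning kernel
torch.cuda.synchronize()
```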
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166218
Approved by: https://github.com/jeffdaily
Co-authored-by: Jeff Daily <jeff.daily@amd.com>
2025-10-29 22:40:23 +00:00
Mikayla Gawarecki
f1a3440715
FC/BC policy for libtorch stable ABI ( #163991 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/163991
Approved by: https://github.com/janeyx99
ghstack dependencies: #163899
2025-10-29 22:35:36 +00:00
Andrey Talman
82ff07c788
Add py 3.14 CI docker build pytorch-linux-jammy-py3.14-clang12 ( #164791 )
...
Related to https://github.com/pytorch/pytorch/issues/156856
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164791
Approved by: https://github.com/huydhn , https://github.com/malfet , https://github.com/albanD
2025-10-29 22:21:22 +00:00
Rob Timpe
e0604d3170
[dynamo] Fix ListIterator tracking mutations to original list ( #166350 )
...
Currently, ListIteratorVariable copies the underlying list, which prevents it
from seeing mutations to the original list. Remove the copy to match CPython behavior.
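The CPython behavior being matched, as a quick illustration:
```python
# A list iterator observes mutations made after the iterator is created.
lst = [1, 2]
it = iter(lst)
print(next(it))   # 1
lst.append(3)     # mutate the original list mid-iteration
print(list(it))   # [2, 3] -- the appended element is visible
```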
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166350
Approved by: https://github.com/guilhermeleobas
ghstack dependencies: #166349 , #162768
2025-10-29 21:54:37 +00:00
Rob Timpe
8101fd46d4
[dynamo] Implement iter with a polyfill ( #162768 )
...
Currently most variable trackers implement `iter` via `_call_iter_tuple_list`.
This makes it difficult to customize the behavior of `iter` for different
variable types. Instead, implement `iter` via a polyfill, which will delegate
to the appropriate `__iter__` method.
While this method is more flexible, it increases the overhead of dynamo tracing.
For example, `iter(x)` will generate 9x more instructions than the current
implementation for common iterable types. Microbenchmarking shows a ~6x
slowdown for this operation. I suspect this would be much less for realistic
workloads, but more work would be needed to get specific numbers. If the
performance is a concern we could also consider adding a fast path for types
that are known to correctly implement `__iter__`.
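A minimal sketch of the polyfill idea (not the actual dynamo polyfill, which also has to handle error cases):
```python
def iter_polyfill(obj):
    # Look up __iter__ on the type, mirroring how CPython resolves iter();
    # the real builtin also supports the __getitem__ fallback and the
    # two-argument form, which this sketch omits.
    return type(obj).__iter__(obj)

print(list(iter_polyfill([1, 2, 3])))  # [1, 2, 3]
```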
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162768
Approved by: https://github.com/guilhermeleobas
ghstack dependencies: #166349
2025-10-29 21:54:37 +00:00
Rob Timpe
3d4a2d8a93
[dynamo] Add __iter__ for iterable VariableTrackers ( #166349 )
...
This is in preparation for implementing iter with a polyfill.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166349
Approved by: https://github.com/guilhermeleobas
2025-10-29 21:54:37 +00:00
Camyll Harajli
59ddfb69a7
[cpu/gpu split] ( #165696 )
...
Summary: cpu/gpu split. CUDA is the default due to some downstream target configurations.
Test Plan: test in CI
Differential Revision: D80712802
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165696
Approved by: https://github.com/jeffdaily , https://github.com/malfet , https://github.com/atalman
2025-10-29 21:44:52 +00:00
Boyuan Feng
bebabd7fce
[Graph Partition] move custom rules to inductor config ( #166458 )
...
This PR adds `custom_should_partition_ops: list[str]` to specify the names of custom ops at which graph partitioning happens. It works with caching since it is a `list[str]` in the config file. The op name should be of the format "mylib::baz".
Closes #165341
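A sketch of the config usage; the option name and the "mylib::baz" format come from the PR text, while the exact access path via `torch._inductor.config` is an assumption:
```python
import torch._inductor.config as inductor_config

# Ask Inductor's graph partitioner to split the graph at calls to this
# custom op ("namespace::opname" format, per the PR description).
inductor_config.custom_should_partition_ops = ["mylib::baz"]
```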
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166458
Approved by: https://github.com/ProExpertProg , https://github.com/eellison , https://github.com/zou3519
2025-10-29 21:43:58 +00:00
Sean McGovern
56a809aa07
[DTensor] Fix torch.all() using incorrect reduction operator ( #165924 )
...
Fixes #165923
Corrects the reduction operation to be product.
Enables "all" in the boolean tensor tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165924
Approved by: https://github.com/malfet , https://github.com/Skylion007
2025-10-29 20:58:35 +00:00
Yuanyuan Chen
b33762bd2f
Fix incomplete test_memory_plots_metadata ( #166508 )
...
The different context cases were not fully tested before this PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166508
Approved by: https://github.com/Skylion007
2025-10-29 20:55:00 +00:00
fduwjj
f02708c2be
[DeviceMesh] Remove slicing submesh warning messages and clean up in fsdp params ( #166466 )
...
Differential Revision: [D85735294](https://our.internmc.facebook.com/intern/diff/D85735294 )
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166466
Approved by: https://github.com/fegin
2025-10-29 20:52:49 +00:00
Justin Chu
a186aa8d6c
[ONNX] Change stacklevel in warning message for export ( #166558 )
...
Change stacklevel to 3 so that the warning shows the user callsite (where the user calls torch.onnx.export).
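The standard-library behavior behind the change, as a quick illustration; the function names are illustrative:
```python
import warnings

def _internal_helper():
    # stacklevel=3 attributes the warning to the caller's caller, i.e. the
    # user code that invoked api(), not to api() or this helper.
    warnings.warn("deprecated", DeprecationWarning, stacklevel=3)

def api():
    _internal_helper()

api()  # the warning is reported at this line, the user callsite
```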
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166558
Approved by: https://github.com/titaiwangms
2025-10-29 20:45:25 +00:00
Tushar Jain
48c3b71ecc
transform fr traces for ft ( #166149 )
...
Summary:
- The ranks in the default pg config are local ranks.
- However, fr trace analysis requires them to be global ranks.
- So, based on a CLI flag, we transform the local ranks to global ranks before the analysis kicks in (see the sketch after this list).
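A hypothetical sketch of the transform this describes; the entry layout and mapping construction are assumptions, not the PR's actual code:
```python
def to_global_ranks(entries, rank_map):
    """Rewrite local ranks in fr trace entries to global ranks.

    rank_map: local rank -> global rank for one task group (assumed shape).
    """
    return [{**e, "rank": rank_map[e["rank"]]} for e in entries]

# e.g. a task group of size 4 whose global ranks start at 8:
rank_map = {local: 8 + local for local in range(4)}
print(to_global_ranks([{"rank": 2, "op": "allreduce"}], rank_map))
```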
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166149
Approved by: https://github.com/fduwjj
2025-10-29 20:44:48 +00:00
Nikita Shulga
2c9f877fa7
Revert "[PyTorch] Improve aarch64 performance of bfloat16 ops ( #166028 )"
...
This reverts commit 3e77a2b478 .
Otherwise it fails ARM build with older compilers with errors that looks
as follows:
```
vec128_bfloat16_neon.h:666:12: error: operation not permitted on type ‘bfloat16_t’
666 | return (-x) * y - z;
```
For more self-contained example see https://godbolt.org/z/bbY4xWh45
(that compiles the same code using clang-15 and clang-19)
2025-10-29 13:35:59 -07:00
Tushar Jain
fc540cefd4
set pg name based on ranks ( #166182 )
...
Summary:
- in torchft we have multiple default pg's, 1 for each task group
- for flight recorder to work, each of these need to have a different name, so entries can be matched
- change the `init_process_group` api to optionally take a list of ranks. if provided, we use the hash of the ranks as the name of the pg. for torchft, we'll pass global ranks here so the default pg have a different name on each task group
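A hypothetical sketch of deriving a pg name from the ranks; the exact hashing scheme used by the PR is an assumption here:
```python
import hashlib

def pg_name_from_ranks(ranks):
    # Sort so the name is order-independent, then hash the joined ranks.
    key = ",".join(str(r) for r in sorted(ranks))
    return hashlib.sha1(key.encode()).hexdigest()

# Two task groups with different global ranks get distinct pg names:
print(pg_name_from_ranks([0, 1, 2, 3]))
print(pg_name_from_ranks([4, 5, 6, 7]))
```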
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166182
Approved by: https://github.com/fduwjj
2025-10-29 20:13:48 +00:00
Maggie Moss
d1a6e006e0
Fix syntax for pyrefly errors ( #166496 )
...
Last one! This ensures all existing suppressions match the expected syntax and silence only one error code.
Test plan: `pyrefly check`, `lintrunner`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166496
Approved by: https://github.com/Skylion007 , https://github.com/mlazos
2025-10-29 20:00:25 +00:00
Rohit Singh Rathaur
fa560e1158
[ao][pruning] Replace assert statements with AssertionError exceptions ( #164926 )
...
Replace assert statements with explicit ValueError exceptions to ensure the validation checks are not removed when Python runs with the optimization flag (-O).
This is a draft PR to confirm the process.
Partially fixes #164878 .
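The pattern behind the change, with an illustrative function name: `assert` statements are stripped when Python runs with `-O`, so validation must use an explicit raise to survive optimization:
```python
def set_sparsity_level(level):     # hypothetical validator for illustration
    # assert 0.0 <= level <= 1.0  # would be removed entirely under `python -O`
    if not (0.0 <= level <= 1.0):
        raise ValueError(f"sparsity level must be in [0, 1], got {level}")
    return level
```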
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164926
Approved by: https://github.com/fffrog , https://github.com/albanD
Co-authored-by: Jiawei Li <ljw1101.vip@gmail.com>
2025-10-29 17:46:46 +00:00