Pian Pawakapan
b9bcb37f40
[DebugMode] store stringify args by default ( #166347 )
...
DebugMode currently stores the args & kwargs of every dispatch call, which includes all intermediate tensors and more. This quickly OOMed on GPU when trying to debug some torchtitan / llama 8b models.
This change defaults to storing the stringified version instead, and adds a flag `DebugMode(store_original_args=True)` for users who want to store the original args as-is (and for BC).
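The memory effect described above can be illustrated with a toy recorder. This is only a sketch: `FakeTensor`, `CallLog`, and `is_kept_alive` are hypothetical names invented for the illustration and are not part of DebugMode; only the flag name `store_original_args` is taken from the commit message.

```python
import gc
import weakref


class FakeTensor:
    """Hypothetical stand-in for a large tensor (not a torch type)."""

    def __init__(self, shape):
        self.shape = shape

    def __repr__(self):
        return f"t: f32{list(self.shape)}"


class CallLog:
    """Toy call recorder; an illustration, not the real DebugMode."""

    def __init__(self, store_original_args=False):
        self.store_original_args = store_original_args
        self.calls = []

    def record(self, op, *args):
        if self.store_original_args:
            # Holding the objects themselves keeps them alive for the log's lifetime.
            self.calls.append((op, args))
        else:
            # Holding only strings lets the original objects be freed.
            self.calls.append((op, tuple(repr(a) for a in args)))


def is_kept_alive(store_original_args):
    """Return True if the log still keeps the recorded object alive."""
    log = CallLog(store_original_args)
    t = FakeTensor((4, 4))
    ref = weakref.ref(t)
    log.record("aten::relu", t)
    del t
    gc.collect()
    return ref() is not None
```

In this sketch, `is_kept_alive(True)` reports that the object survives as long as the log does, while `is_kept_alive(False)` reports that it was released once the caller dropped it.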
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166347
Approved by: https://github.com/yushangdi
2025-10-30 22:12:23 +00:00
Maggie Moss
84b14f3a10
Fix error suppression syntax in utils and nn ( #166242 )
...
Fixes the syntax of `pyrefly: ignore` comments so that each one suppresses only a specific error category. No functional changes.
pyrefly check
lintrunner
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166242
Approved by: https://github.com/oulgen , https://github.com/cyyever
2025-10-26 05:21:07 +00:00
Pian Pawakapan
6494cdc40c
[DebugMode] add nn.Module tracking ( #165498 )
...
Uses ModTracker to record nn.Module entries, much like CommDebugMode.
Can be switched on with `DebugMode(record_nn_module=True)`:
```
[nn.Mod] Bar
[nn.Mod] Bar.abc
[nn.Mod] Bar.abc.l1
aten::t(t: f32[4, 4])
aten::addmm(t: f32[4], t: f32[4, 4], t: f32[4, 4])
[nn.Mod] Bar.abc.l2
aten::t(t: f32[4, 4])
aten::addmm(t: f32[4], t: f32[4, 4], t: f32[4, 4])
[nn.Mod] Bar.xyz
aten::t(t: f32[4, 4])
aten::addmm(t: f32[4], t: f32[4, 4], t: f32[4, 4])
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165498
Approved by: https://github.com/SherlockNoMad
2025-10-24 05:08:33 +00:00
Pian Pawakapan
82ef1b5db3
[DebugMode] refactor logs into _DebugCalls ( #165376 )
...
Refactors `DebugMode.operators` to hold more structured `_DebugCall` objects instead of (op, args, kwargs, call_depth) tuples. This is useful going forward for attaching more information (e.g. output info, call metadata).
This is BC-breaking, but `_OpCall` and `_RedistributeCall` get an `__iter__` method so that previous tuple-style usage still works.
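A minimal sketch of the compatibility idea, assuming a record class that yields the legacy four fields from `__iter__`. `OpCallSketch` and its `output_info` field are hypothetical names for illustration; they are not the real `_OpCall` definition.

```python
from dataclasses import dataclass
from typing import Any, Iterator


@dataclass
class OpCallSketch:
    """Hypothetical structured call record (not the real _OpCall)."""

    op: str
    args: tuple
    kwargs: dict
    call_depth: int
    # Extra information that a plain 4-tuple could not carry.
    output_info: Any = None

    def __iter__(self) -> Iterator[Any]:
        # Yield only the legacy four fields so old unpacking code keeps working.
        yield from (self.op, self.args, self.kwargs, self.call_depth)


call = OpCallSketch("aten::mm", ("a", "b"), {}, 1, output_info="f32[4, 4]")
op, args, kwargs, call_depth = call  # legacy tuple-style unpacking
```

In this sketch only iteration and unpacking are preserved; indexing such as `call[0]` would additionally need `__getitem__`.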
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165376
Approved by: https://github.com/yushangdi
2025-10-22 19:01:56 +00:00
PyTorch MergeBot
b08d8c2e50
Revert "[DebugMode][2/N] add nn.Module tracking ( #165498 )"
...
This reverts commit 45afaf08a1 .
Reverted https://github.com/pytorch/pytorch/pull/165498 on behalf of https://github.com/seemethere due to First part of the stack was reverted so will need to revert this too ([comment](https://github.com/pytorch/pytorch/pull/165498#issuecomment-3416618198 ))
2025-10-17 18:22:48 +00:00
PyTorch MergeBot
9a71d96256
Revert "[DebugMode][1/N] refactor logs into _DebugCalls ( #165376 )"
...
This reverts commit 556fc09a9f .
Reverted https://github.com/pytorch/pytorch/pull/165376 on behalf of https://github.com/seemethere due to This is failing for internal tests, see D84877379 for more context ([comment](https://github.com/pytorch/pytorch/pull/165376#issuecomment-3416570407 ))
2025-10-17 18:08:59 +00:00
Pian Pawakapan
45afaf08a1
[DebugMode][2/N] add nn.Module tracking ( #165498 )
...
Uses ModTracker to record nn.Module entries, much like CommDebugMode.
Can be switched on with `DebugMode(record_nn_module=True)`:
```
[nn.Mod] Bar
[nn.Mod] Bar.abc
[nn.Mod] Bar.abc.l1
aten::t(t: f32[4, 4])
aten::addmm(t: f32[4], t: f32[4, 4], t: f32[4, 4])
[nn.Mod] Bar.abc.l2
aten::t(t: f32[4, 4])
aten::addmm(t: f32[4], t: f32[4, 4], t: f32[4, 4])
[nn.Mod] Bar.xyz
aten::t(t: f32[4, 4])
aten::addmm(t: f32[4], t: f32[4, 4], t: f32[4, 4])
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165498
Approved by: https://github.com/SherlockNoMad
ghstack dependencies: #165376
2025-10-17 17:39:48 +00:00
Pian Pawakapan
556fc09a9f
[DebugMode][1/N] refactor logs into _DebugCalls ( #165376 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165376
Approved by: https://github.com/SherlockNoMad
2025-10-16 22:43:52 +00:00
zpcore
512dd79ff0
[4/N] [DTensor device order] Support debugmode to show dtensor distribution transform path ( #164821 )
...
Enables DebugMode to print how `placements` and `shard_order` are updated as `transform_infos` are executed to transform the source placement into the target placement.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164821
Approved by: https://github.com/SherlockNoMad , https://github.com/pianpwk
ghstack dependencies: #164806 , #164820
2025-10-11 09:44:54 +00:00
Dzmitry Huba
1e35b3c4e0
Augment DebugMode to support attributes reporting ( #165109 )
...
While active, DebugMode reports each tensor's type, shape, and placements. This change extends the report to include tensor attributes from a configured set. The feature is intended to make the debug string easier to understand when dealing with larger outputs. For example, before running the forward pass of a model, we can annotate each parameter and buffer with its fully qualified name, so that we can see which ops are executed against which specific tensors.
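The annotation idea can be sketched with plain Python objects. Everything here is an assumption made for illustration: `FakeTensor`, `describe`, the attribute name `fqn`, and the `{name=value}` output format are hypothetical and do not reflect DebugMode's actual option names or output.

```python
class FakeTensor:
    """Hypothetical stand-in for a tensor (not a torch type)."""

    def __init__(self, shape):
        self.shape = shape


def describe(t, attributes=()):
    """Render a tensor summary, appending any configured attributes it carries."""
    base = f"t: f32{list(t.shape)}"
    extras = [f"{name}={getattr(t, name)}" for name in attributes if hasattr(t, name)]
    return base + ("{" + ", ".join(extras) + "}" if extras else "")


# Annotate a parameter with its fully qualified name before running the model.
weight = FakeTensor((4, 4))
weight.fqn = "layers.0.weight"

annotated = describe(weight, attributes=["fqn"])
plain = describe(FakeTensor((4,)), attributes=["fqn"])
```

Tensors without the configured attribute are rendered unchanged, so only annotated parameters and buffers stand out in the log.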
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165109
Approved by: https://github.com/ezyang , https://github.com/pianpwk
2025-10-10 21:27:05 +00:00
Maggie Moss
086dec3235
Pyrefly suppressions 6/n ( #164877 )
...
Adds suppressions so that pyrefly will typecheck clean: https://github.com/pytorch/pytorch/issues/163283
Almost there!
Test plan:
dmypy restart && python3 scripts/lintrunner.py -a
pyrefly check
step 1: delete lines in the pyrefly.toml file from the project-excludes field
step 2: run pyrefly check
step 3: add suppressions, clean up unused suppressions
before: https://gist.github.com/maggiemoss/4b3bf2037014e116bc00706a16aef199
after:
INFO 0 errors (5,064 ignored)
Only four directories left to enable
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164877
Approved by: https://github.com/oulgen
2025-10-08 02:30:57 +00:00
Sherlock Huang
27eb36debb
DebugMode add ignore_compile_internals ( #164205 )
...
Fixes #164143
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164205
Approved by: https://github.com/albanD
2025-10-02 07:39:54 +00:00
Yuanyuan Chen
e30f01b5b5
[1/N] Simplify "in" operation for containers of a single item ( #164224 )
...
These issues are detected by ruff [FURB171](https://docs.astral.sh/ruff/rules/single-item-membership-test/#single-item-membership-test-furb171 ).
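As an illustration of the kind of rewrite FURB171 suggests (the function names below are hypothetical and not taken from the PR):

```python
def is_cpu_before(device: str) -> bool:
    # Membership test against a single-item container: flagged by FURB171.
    return device in ("cpu",)


def is_cpu_after(device: str) -> bool:
    # Equivalent direct comparison.
    return device == "cpu"
```

The two forms agree for ordinary values. They can differ for objects that are not equal to themselves, such as NaN, because `in` checks identity before equality.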
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164224
Approved by: https://github.com/rec , https://github.com/Skylion007
2025-09-30 19:59:43 +00:00
Sherlock Huang
f9821b1be7
DebugMode supports_higher_order_operators=True ( #163824 )
...
Makes DebugMode support higher-order operators (HOPs).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/163824
Approved by: https://github.com/ydwu4
2025-09-25 17:11:43 +00:00
Sherlock Huang
4c2c401ccf
Record redistribute_local_tensor in DebugMode ( #163704 )
...
An explicit redistribute_local_tensor API call can also result in communication, so record it!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/163704
Approved by: https://github.com/ezyang
2025-09-24 16:11:26 +00:00
Sherlock Huang
95ac7d724e
Rename to _debug_mode.py to make it private ( #163534 )
...
Renames debug_mode.py to _debug_mode.py to make it private, per @alban's request.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/163534
Approved by: https://github.com/albanD
2025-09-23 04:27:10 +00:00