Commit Graph

230 Commits

Author SHA1 Message Date
Yuanyuan Chen
f91899ca6c [2/N] Add strict parameter to Python zip calls (#166257)
This PR adds `strict=True/False` to zip calls in test utils. strict=True is passed when possible.
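
A minimal illustration of the pattern (example code, not taken from the PR diff):

```python
a = [1, 2, 3]
b = ["x", "y", "z"]

# With strict=True (Python 3.10+), zip raises ValueError if the iterables
# differ in length instead of silently truncating to the shorter one.
pairs = list(zip(a, b, strict=True))
print(pairs)  # [(1, 'x'), (2, 'y'), (3, 'z')]
```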

Pull Request resolved: https://github.com/pytorch/pytorch/pull/166257
Approved by: https://github.com/janeyx99
2025-11-01 00:35:41 +00:00
PyTorch MergeBot
f60751024e Revert "[2/N] Add strict parameter to Python zip calls (#166257)"
This reverts commit 39e5cdddf7.

Reverted https://github.com/pytorch/pytorch/pull/166257 on behalf of https://github.com/atalman due to Failing: test/distributed/fsdp/test_fsdp_mixed_precision.py::TestFSDPTrainEval::test_train_ema_eval_flow [GH job link](https://github.com/pytorch/pytorch/actions/runs/18934047991/job/54057218160) [HUD commit link](39e5cdddf7) ([comment](https://github.com/pytorch/pytorch/pull/166257#issuecomment-3467955332))
2025-10-30 13:20:00 +00:00
Yuanyuan Chen
2de4cf2102 [1/N] Remove unused loop variables (#166258)
This PR removes unused loop variables.
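
A minimal illustration of the cleanup pattern (example code, not from the PR diff):

```python
items = ["a", "b", "c"]

# before: `item` is never used inside the loop body
count = 0
for item in items:
    count += 1

# after: renaming to `_` signals the value is intentionally ignored
count = 0
for _ in items:
    count += 1
print(count)  # 3
```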

Pull Request resolved: https://github.com/pytorch/pytorch/pull/166258
Approved by: https://github.com/Lucaskabela, https://github.com/mlazos
2025-10-30 12:22:25 +00:00
linhaifeng
369f2d6951 [3/N] fix typo in other folders (#166606)
fix typo in other folders

#166374
#166126

_typos.toml
```toml
[files]
extend-exclude = ["tools/linter/dictionary.txt"]
[default.extend-words]
nd = "nd"
arange = "arange"
Nd = "Nd"
GLOBALs = "GLOBALs"
hte = "hte"
iy = "iy"
PN = "PN"
Dout = "Dout"
optin = "optin"
gam = "gam"
PTD = "PTD"
Sur = "Sur"
nin = "nin"
tme = "tme"
inpt = "inpt"
mis = "mis"
Raison = "Raison"
ouput = "ouput"
nto = "nto"
Onwer = "Onwer"
callibrate = "callibrate"
ser = "ser"
Metdata = "Metdata"
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/166606
Approved by: https://github.com/ezyang
2025-10-30 10:30:40 +00:00
Yuanyuan Chen
39e5cdddf7 [2/N] Add strict parameter to Python zip calls (#166257)
This PR adds `strict=True/False` to zip calls in test utils. strict=True is passed when possible.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/166257
Approved by: https://github.com/janeyx99
2025-10-30 08:10:10 +00:00
PyTorch MergeBot
1dd6b76914 Revert "[1/N] Remove unused loop variables (#166258)"
This reverts commit 76b2c37045.

Reverted https://github.com/pytorch/pytorch/pull/166258 on behalf of https://github.com/atalman due to breaks test/distributed/test_serialization.py::TestSerialization::test_weights_only [GH job link](https://github.com/pytorch/pytorch/actions/runs/18894311802/job/53929321703) [HUD commit link](76b2c37045) ([comment](https://github.com/pytorch/pytorch/pull/166258#issuecomment-3460964612))
2025-10-29 11:10:37 +00:00
Yuanyuan Chen
76b2c37045 [1/N] Remove unused loop variables (#166258)
This PR removes unused loop variables.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/166258
Approved by: https://github.com/Lucaskabela, https://github.com/mlazos
2025-10-29 01:34:15 +00:00
fduwjj
904abfc2ca Export flex attention with kwargs and DTensor (#166045)
Fixes #165948

Adding registration of the BlockMask makes flex attention with kwargs exportable.

Also modified unit tests to accept kwargs.

```
python test/distributed/tensor/test_dtensor_export.py -k test_flex_attention_dtensor_export

python test/inductor/test_flex_attention.py -k test_pytree_
```
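
A rough sketch of the kind of case targeted here, assuming an example module and shapes; it is not guaranteed that this exact snippet exports as-is:

```python
import torch
from torch.nn.attention.flex_attention import flex_attention, create_block_mask

class Attn(torch.nn.Module):
    def forward(self, q, k, v, *, block_mask=None):
        # BlockMask passed through a keyword argument, as in the updated tests
        return flex_attention(q, k, v, block_mask=block_mask)

def causal(b, h, q_idx, kv_idx):
    return q_idx >= kv_idx

B, H, S, D = 2, 4, 128, 64
q, k, v = (torch.randn(B, H, S, D) for _ in range(3))
block_mask = create_block_mask(causal, B=B, H=H, Q_LEN=S, KV_LEN=S, device="cpu")

ep = torch.export.export(Attn(), (q, k, v), kwargs={"block_mask": block_mask})
```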

Pull Request resolved: https://github.com/pytorch/pytorch/pull/166045
Approved by: https://github.com/drisspg, https://github.com/SherlockNoMad

Co-authored-by: fduwjj <fduwjj@gmail.com>
2025-10-27 21:40:40 +00:00
Maggie Moss
8f80892359 Use correct pyrefly syntax in suppressions distributed/... (#166241)
Updates the pyrefly-ignores in the torch/distributed directory to use the correct syntax. No functional changes.
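
A hedged illustration of the comment-based suppression form (example code, not taken from the PR diff; the exact placement and error-code variants pyrefly accepts are assumptions):

```python
def answer() -> int:
    # The trailing suppression below silences pyrefly's type error for this line.
    return "forty-two"  # pyrefly: ignore
```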

pyrefly check
lintrunner

Pull Request resolved: https://github.com/pytorch/pytorch/pull/166241
Approved by: https://github.com/oulgen
2025-10-26 04:16:41 +00:00
Maggie Moss
eb83c3ca23 Clean up unused Pyrefly suppressions (#166178)
Cleaning up ignores that are no longer needed in the repo and adding select suppressions so the main branch is clean.

test plan:
`lintrunner -a`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/166178
Approved by: https://github.com/oulgen
2025-10-25 05:32:21 +00:00
Howard Huang
b20deec3d1 [PP] Add optional argument to not save outputs (#165822)
Fix https://github.com/pytorch/pytorch/issues/159251

Add an optional argument `return_outputs` to the schedule `step`
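
A hedged usage sketch of the new argument; `schedule`, `inputs`, and `target` are placeholders for an existing pipeline schedule and its per-step data:

```python
# Skip accumulating (and returning) the model outputs on the last stage when
# only the losses are needed, to save memory.
losses = []
schedule.step(inputs, target=target, losses=losses, return_outputs=False)
```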

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165822
Approved by: https://github.com/wconstab
2025-10-21 00:09:31 +00:00
Yuanyuan Chen
3255e7872b Enable all flake8-logging-format rules (#164655)
These rules are enabled by removing existing suppressions.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164655
Approved by: https://github.com/janeyx99, https://github.com/mlazos
2025-10-19 00:59:28 +00:00
Yuanyuan Chen
e595136187 Enable PLC1802 on ruff (#165813)
This PR enables ruff check `PLC1802`, which detects len calls on sequences in a boolean test context.
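
A minimal illustration of what PLC1802 flags (example code, not from the PR diff):

```python
values = [1, 2, 3]

# flagged: len() call used only for its truthiness
if len(values):
    print("non-empty")

# preferred: rely on the sequence's truthiness directly
if values:
    print("non-empty")
```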

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165813
Approved by: https://github.com/ezyang
2025-10-18 05:44:14 +00:00
Howard Huang
bc1f2108d7 [PP] Update backward_counter and fsdp util to schedule class (#165513)
Fixed one issue with FSDP last reshard not being called.

The rest is mostly refactoring: changing some variables into class variables so they can be used in https://github.com/pytorch/torchtitan/pull/1721.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165513
Approved by: https://github.com/fegin
2025-10-15 21:58:16 +00:00
Timm Ruland
ffe3cb226a In pipeline parallelism: Use same dtype for receive and send tensor when initializing p2p communication. (#165539)
When initializing the p2p communication for pipeline parallelism, currently different default dtypes are used for the send and receive tensor here:
5c583e2573/torch/distributed/pipelining/stage.py (L935-L936)

This caused hard-to-trace issues when training on multiple nodes. Multiple stages on one node seem to work for some reason, which probably explains why the unit tests did not catch this.

Fixes #165143

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165539
Approved by: https://github.com/H-Huang
2025-10-15 15:05:55 +00:00
Yuanyuan Chen
b11593c31b [8/N] Apply ruff UP035 rule (#165214)
This is a follow-up of #164653 to continue applying `UP035` fixes. The purpose is to finally enable this rule.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165214
Approved by: https://github.com/ezyang
2025-10-15 03:18:57 +00:00
Howard Huang
ca65023b90 [PP] Fix edge case with FSDP when stages_per_rank > 3 (#165467)
There is an edge case with FSDP + PP: when we add UNSHARD + RESHARD, at most 3 stages are unsharded at a time, 3f83e8915e/torch/distributed/pipelining/schedules.py (L1029-L1031)

This change is needed to be able to unshard and reshard a stage multiple times.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165467
Approved by: https://github.com/wwwjn
2025-10-15 01:53:04 +00:00
Yuanyuan Chen
fbe0d20a17 [2/N] More ruff SIM fixes (#165031)
This is a follow-up of #164695 to apply ruff `SIM` rules to more files. Most changes simplify `dict.get` calls, since `None` is already the default value.
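
A minimal illustration of the typical fix (example code, not from the PR diff):

```python
config = {"device": "cuda"}

# before: redundant default, since dict.get already returns None when the key is missing
dtype = config.get("dtype", None)

# after
dtype = config.get("dtype")
print(dtype)  # None
```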

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165031
Approved by: https://github.com/mlazos
2025-10-14 14:22:54 +00:00
Chien-Chin Huang
6bda3bb286 [PP] Fix split_args_kwargs_into_chunks issues (#165306)
1. https://github.com/pytorch/pytorch/pull/164111/ adds support for splitting BlockMask. However, BlockMask also has a B=1 case in which the mask is broadcast across the batch. This PR adds support for that case.

2. The original `split_args_kwargs_into_chunks` doesn't initialize the default specs correctly. Since we now use `tree_flatten` and `tree_unflatten` to do the split, we should also use `tree_map` to initialize the default spec. This also supports values that are not `torch.Tensor`, which previously worked only if users explicitly provided the shard spec.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165306
Approved by: https://github.com/H-Huang
2025-10-13 15:52:39 +00:00
Howard Huang
2beead7523 [PP] move FSDP reduce scatters to end of step (#165106)
Move the FSDP reduce-scatters to the end of the PP step. The reduce-scatter compute-stream sync blocks the other stages from executing their backwards, leading to bubbles. There should be a way to execute these reduce-scatters earlier, but doing this for now as a quick fix.

<img width="1056" height="463" alt="image" src="https://github.com/user-attachments/assets/b945dd55-8ab1-4acc-b862-c6e2e476b834" />

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165106
Approved by: https://github.com/weifengpy
ghstack dependencies: #164976
2025-10-12 13:28:02 +00:00
Howard Huang
a3eb275d3c Add torch compile check for ZeroBubble (#162511)
Fix https://github.com/pytorch/pytorch/issues/161904

Pull Request resolved: https://github.com/pytorch/pytorch/pull/162511
Approved by: https://github.com/fegin
2025-10-10 18:49:45 +00:00
Yuanyuan Chen
fb64da0791 [2/N] Use "is" in python type comparison (#165142)
This is a follow-up of #165037. It is generally recommended to use `is`/`is not` to compare types, so this series of changes applies that suggestion across the code base, with the aim of finally enabling the related linter checks.
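
A minimal illustration of the recommended pattern (example code, not from the PR diff):

```python
x = 3

# before: equality comparison on types
if type(x) == int:
    print("int")

# after: identity comparison, since each type object is a singleton
if type(x) is int:
    print("int")
```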

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165142
Approved by: https://github.com/albanD
2025-10-10 15:36:44 +00:00
PyTorch MergeBot
b8be796a57 Revert "[2/N] More ruff SIM fixes (#165031)"
This reverts commit 38095fbd13.

Reverted https://github.com/pytorch/pytorch/pull/165031 on behalf of https://github.com/albanD due to One of the changed line started to fail on trunk ([comment](https://github.com/pytorch/pytorch/pull/165031#issuecomment-3390190870))
2025-10-10 13:42:14 +00:00
Howard Huang
238dd5517d [PP] Move profiler record_function in schedule (#164976)
Better-engineering change to move the `record_function` call so that it also encompasses the custom callback. This line is the only functional change: https://github.com/pytorch/pytorch/pull/164976/files#diff-1d3d91f53db88fb886901fb178d69e47776e71b8103f85688fa9ca64cc55d068R2147; the rest is just formatting.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164976
Approved by: https://github.com/fegin
ghstack dependencies: #162016, #164962
2025-10-10 13:09:23 +00:00
Yuanyuan Chen
38095fbd13 [2/N] More ruff SIM fixes (#165031)
This is a follow-up of #164695 to apply ruff `SIM` rules to more files. Most changes simplify `dict.get` calls, since `None` is already the default value.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165031
Approved by: https://github.com/mlazos
2025-10-10 05:37:46 +00:00
Maggie Moss
7457d139c5 Add pyrefly suppressions to torch/distributed (7/n) (#165002)
Adds suppressions so that pyrefly will typecheck clean: https://github.com/pytorch/pytorch/issues/163283

One more PR after this one.

Test plan:
dmypy restart && python3 scripts/lintrunner.py -a
pyrefly check

step 1: delete lines in the pyrefly.toml file from the project-excludes field
step 2: run pyrefly check
step 3: add suppressions, clean up unused suppressions
before: https://gist.github.com/maggiemoss/4b3bf2037014e116bc00706a16aef199

after:
INFO 0 errors (6,884 ignored)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165002
Approved by: https://github.com/oulgen
2025-10-09 04:08:25 +00:00
Howard Huang
005c3d449e Support custom callback functions in schedule (#162016)
This is going to be used in https://github.com/pytorch/torchtitan/issues/1682

Add a `register_custom_function` method to `_PipelineScheduleRuntime`, which allows users to supply a custom function that dynamically replaces a runtime operation.

The signature of the callback should look like:

```python
class _CustomFunctionProtocol(Protocol):
    def __call__(self, action: _Action, ctx: _PipelineContext) -> None: ...
```

`_PipelineContext` contains a reference to the schedule which is executing the operations.
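
A hedged usage sketch based on the description above; the import path, the `FORWARD` constant, and the exact `register_custom_function` signature are assumptions, and `schedule` is a placeholder for an existing `_PipelineScheduleRuntime` instance:

```python
# assumed import path; these are private APIs and subject to change
from torch.distributed.pipelining.schedules import FORWARD, _Action, _PipelineContext

def my_forward(action: _Action, ctx: _PipelineContext) -> None:
    # ctx holds a reference to the schedule executing the operations
    print(f"custom FORWARD for {action}")

# assumed registration call, keyed by the schedule-IR computation type
schedule.register_custom_function(FORWARD, my_forward)
```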

### Testing

Added a test which adds custom methods for `FORWARD` and `OVERLAP_F_B` which are just the same implementations as those used in the default schedule runtime. Check that the schedule can still run, numerics are correct, and the callbacks are executed the correct number of times.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/162016
Approved by: https://github.com/fegin
2025-10-08 20:43:26 +00:00
mingyuan.wang
0a3e4e894c [PP]: Optimize memory by early releasing stage inputs' gradients (#164329)
It seems that we can release the input activations' gradients early in `stage_backward()` in PP, which helps reduce the peak memory.

I tested this using the `1F1B` and `Interleaved1F1B` PP strategies (for simplicity, I use 4 decoder layers of llama3, set the PP size to 2, and set num_microbatches to 128) based on torchtitan.
Run command using torchtitan:
```bash
CUDA_VISIBLE_DEVICES=4,5 LOG_RANK=0,1 NGPU=2 CONFIG_FILE=./torchtitan/models/llama3/train_configs/llama3_8b.toml ./run_train.sh --metrics.log_freq 1 --training.seq_len 8192 --training.steps 10 --parallelism.data_parallel_shard_degree 1 --activation_checkpoint.mode full --model.tokenizer_path /workspace/torchtitan-v0.1.0/torchtitan/torchtitan/datasets/tokenizer/original/tokenizer.model --training.dataset wikipedia --parallelism.pipeline_parallel_degree 2 --training.local_batch_size 128 --parallelism.pipeline_parallel_microbatch_size 1 --training.dataset_path /workspace/wikipedia_subset --training.seed 42 --parallelism.pipeline_parallel_schedule 1F1B
```
## 1F1B torchtitan train results
### before fix
<img width="1526" height="606" alt="b8e281cce1dac15e827c216e7d83f402" src="https://github.com/user-attachments/assets/545c0a80-6276-40c0-893f-fd2df0a53b8d" />

### after fix
<img width="1526" height="594" alt="70d5ceba311a8398d041189bf8897cfc" src="https://github.com/user-attachments/assets/0d606e08-238a-4115-a1c0-b40df101d867" />

After the fix, the memory usage on rank1 (i.e., the non-first stages) is reduced by 6.9GB compared to before the fix; the memory usage on rank0 remains unchanged (rank0 represents stage0).

## Interleaved1F1B torchtitan train results
### before fix
<img width="1514" height="601" alt="a28b7f9704b9234870619c43194e8a72" src="https://github.com/user-attachments/assets/2c28565f-ffff-4747-a8f5-722b5c65dc7e" />

### after fix
<img width="1526" height="621" alt="2d8d6d956b72885186f8c7059146c41a" src="https://github.com/user-attachments/assets/8c4a4ff2-336b-4e0b-8ac4-014ae22c2ed1" />

After the fix, the memory usage on rank1 drops by 14.57GB (rank1 holds layer1 and layer3) and on rank0 by 7.5GB (rank0 holds layer0 and layer2).

## Memory snapshot results
I have also dumped a memory snapshot to observe memory usage under the 1F1B PP strategy.

### before fix
<img width="1906" height="918" alt="6fd4e4ba82b8bacf9ca6edee4f3d5581" src="https://github.com/user-attachments/assets/d1b9245c-b09f-43c5-87ce-87ba48533a70" />

We can see the memory increasing as the PP `step_microbatches` runs (the lifetime of the input activation's gradient, i.e., the output of `FusedRMSNormBackward`, lasts too long).

### after fix
<img width="1903" height="918" alt="2e415f25af6750d06e5e647683b212b9" src="https://github.com/user-attachments/assets/b657c8f6-5a56-46bd-8743-f3b8375c81b0" />

After the fix, we get much steadier memory usage during training (the input activation's gradient is released back to the allocator sooner).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164329
Approved by: https://github.com/H-Huang
2025-10-08 16:12:00 +00:00
Howard Huang
c813617c53 [PP] Migrate other schedules to use PipelineScheduleRuntime (#164777)
Second fix for https://github.com/pytorch/pytorch/issues/164756

It has been a TODO to make all schedules execute using the same runtime. After this change, schedules use the same `_PipelineScheduleRuntime` logic, which adds `UNSHARD` and `RESHARD` operations to the schedules and fixes the issue mentioned above.

<img width="920" height="406" alt="image" src="https://github.com/user-attachments/assets/a4d5bcd0-7dac-43cd-96f9-8ca33cfd8b91" />

One test was failing after the conversion:
- Fixed a gradient scaling issue for dWeight

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164777
Approved by: https://github.com/fegin
ghstack dependencies: #164775
2025-10-08 01:45:57 +00:00
Howard Huang
e659661ffa [PP] Fix FSDP unshard/reshard (#164775)
First fix for https://github.com/pytorch/pytorch/issues/164756

In the pipeline IR we call `UNSHARD` and `RESHARD`, but there is a bug: `module.unshard()` does not recurse into the nested FSDP modules, so the all-gather sometimes ends up being issued only just before the module's forward.

Since we want the pipeline IR to handle this explicitly, we call `group.unshard` instead, which ensures that all the modules are unsharded.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164775
Approved by: https://github.com/weifengpy
2025-10-08 01:45:57 +00:00
Chien-Chin Huang
e3ae80fc03 [PP] Let PP split BlockMask into micro-BlockMask (#164111)
BlockMask carries batch-dimension information, so PP has to split it just like all the other tensors. All the tensors in BlockMask have the batch dimension, so we can split them without too many issues. However, `mask_mod` takes the batch index as input, and that value changes after the split, so we have to wrap it inside a closure that remaps the batch index.
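
A hedged sketch (not the actual PP code) of the closure described above, which maps the micro-batch-local batch index back to the index the user's `mask_mod` expects:

```python
def offset_mask_mod(mask_mod, batch_offset):
    """Wrap a mask_mod so it sees the original (global) batch index."""
    def wrapped(b, h, q_idx, kv_idx):
        # b is the index within the split micro-batch; shift it back to the
        # index in the full batch before calling the user's mask_mod.
        return mask_mod(b + batch_offset, h, q_idx, kv_idx)
    return wrapped
```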

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164111
Approved by: https://github.com/H-Huang
2025-10-07 23:25:34 +00:00
Howard Huang
65f10becdf Support OVERLAP_F_B in schedule (#161072)
Previously, we converted `OVERLAP_F_B` into separate forward and backward operations in the plan. This small change keeps it in the plan and handles it in the runtime.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/161072
Approved by: https://github.com/fegin, https://github.com/wconstab
2025-10-07 19:55:10 +00:00
PyTorch MergeBot
5d7360bb03 Revert "Enable all SIM rules except disabled ones (#164645)"
This reverts commit 321e602692.

Reverted https://github.com/pytorch/pytorch/pull/164645 on behalf of https://github.com/izaitsevfb due to causes lint failures ([comment](https://github.com/pytorch/pytorch/pull/164645#issuecomment-3369274351))
2025-10-05 19:32:21 +00:00
Yuanyuan Chen
321e602692 Enable all SIM rules except disabled ones (#164645)
`SIM` rules are useful for simplifying boolean expressions and enhances code readability.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164645
Approved by: https://github.com/ezyang
2025-10-05 07:38:25 +00:00
Yuanyuan Chen
35c4130fd1 [2/N] Fix ruff warnings (#164460)
Apply ruff `SIM` rules.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164460
Approved by: https://github.com/ezyang
2025-10-04 03:40:32 +00:00
Anshul Sinha
3ffaab3bc8 [Replicate][Pipeline Parallelism] integration of new replicate function with pipeline parallelism (#164031)
**Summary:** In order to test numerics for replicate + PP, `stage.py` needs to be able to call replicate's backward manually, as pipeline parallelism doesn't have this feature.

**Test Case**
1.  pytest test/distributed/_composable/test_composability/test_pp_composability.py -k test_replicate_pp

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164031
Approved by: https://github.com/weifengpy, https://github.com/H-Huang
ghstack dependencies: #163897
2025-10-01 18:01:16 +00:00
Ke Wen
e419dc6d08 [PP] Customize pipeline's submod name (#164037)
Changing the PP submodules' names from `submod_i` to `submod_pp_i` to distinguish them from the submodules created by HOPs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164037
Approved by: https://github.com/H-Huang
ghstack dependencies: #164045, #164035
2025-10-01 16:29:19 +00:00
PyTorch MergeBot
36a37b81cd Revert "[PP] Customize pipeline's submod name (#164037)"
This reverts commit 704cd771f6.

Reverted https://github.com/pytorch/pytorch/pull/164037 on behalf of https://github.com/yangw-dev due to internal build failed Buck build failed for this target, and is likely caused by your changes. ([comment](https://github.com/pytorch/pytorch/pull/164035#issuecomment-3357113348))
2025-10-01 16:09:50 +00:00
Yuanyuan Chen
a293206bd5 Fix invalid f-strings (#164112)
Fixes invalid f-strings detected by `ruff`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164112
Approved by: https://github.com/Skylion007, https://github.com/mlazos
2025-09-30 04:17:13 +00:00
Yuanyuan Chen
85012fe167 Remove unnecessary list comprehensions (#164103)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164103
Approved by: https://github.com/Lucaskabela, https://github.com/mlazos
2025-09-30 03:56:54 +00:00
Yuanyuan Chen
da003d7b95 [3/N] Import Callable from collections.abc in torch/distributed (#164104)
This is the result of applying the ruff `UP035` check.
`Callable` is imported from `collections.abc` instead of `typing`.
This PR is the follow-up of #164054.
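
A minimal illustration of the UP035 fix (example code, not from the PR diff):

```python
# before (deprecated import source):
# from typing import Callable

# after:
from collections.abc import Callable

def apply(fn: Callable[[int], int], x: int) -> int:
    return fn(x)

print(apply(lambda v: v + 1, 41))  # 42
```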

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164104
Approved by: https://github.com/Skylion007
2025-09-30 00:28:53 +00:00
Ke Wen
704cd771f6 [PP] Customize pipeline's submod name (#164037)
Changing the PP submodules' names from `submod_i` to `submod_pp_i` to distinguish them from the submodules created by HOPs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164037
Approved by: https://github.com/H-Huang
ghstack dependencies: #164045, #164035
2025-09-29 23:29:52 +00:00
Ke Wen
5ddad22196 [PP] Use default export mode (non-strict) (#164045)
export's default mode has switched from strict to non-strict. We just follow suit in PP.
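
A hedged sketch of what this means for a plain export call (example module, not from the PR):

```python
import torch

class M(torch.nn.Module):
    def forward(self, x):
        return x * 2

# With the new default, this traces in non-strict mode; passing strict=True
# would opt back into the previous behavior.
ep = torch.export.export(M(), (torch.randn(2, 3),))
print(ep)
```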

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164045
Approved by: https://github.com/H-Huang
2025-09-29 06:31:06 +00:00
can-gaa-hou
7c7ae86991 [Fix] Adding missing f prefixes to formatted strings [2/N] (#164066)
As stated in the title.

* #164068
* #164067
* __->__ #164066
* #164065
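
A minimal illustration of the fix pattern (example code, not from the PR diff):

```python
name = "pipeline"

# before: the braces stay literal because the f prefix is missing
msg = "stage {name} is ready"

# after
msg = f"stage {name} is ready"
print(msg)  # stage pipeline is ready
```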

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164066
Approved by: https://github.com/Skylion007
2025-09-29 04:40:44 +00:00
Timm Ruland
5fcde74aed Fix pipeline parallelism not correctly initializing backwards stages when evaluating before training. (#162823)
Previously, an eval() call before a training step() would not correctly initialize the backward pass of the pipeline stages, leading to errors during the subsequent training step. This PR ensures that the backward stages can still be initialized after an eval() call.

Fixes #162822

Pull Request resolved: https://github.com/pytorch/pytorch/pull/162823
Approved by: https://github.com/dcci, https://github.com/H-Huang
2025-09-25 15:13:19 +00:00
Howard Huang
9de22bc5da Inspect schedule IR comms (#162996)
Small change to the util to allow us to see comms (e.g. `SEND`, `RECV`, etc.) in the schedule IR.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/162996
Approved by: https://github.com/fegin
2025-09-16 16:59:06 +00:00
Tugsbayasgalan (Tugsuu) Manlaibaatar
de05dbc39c Replace export_for_training with export (#162396)
Summary: replace export_for_training with export

Test Plan:
CI

Rollback Plan:

Differential Revision: D81935792

Pull Request resolved: https://github.com/pytorch/pytorch/pull/162396
Approved by: https://github.com/angelayi, https://github.com/jerryzh168
2025-09-10 14:19:34 +00:00
Howard Huang
e1be887870 [PP] Add spacing to visualizer (#160474)
When visualizing the schedules using `_PipelineScheduleExecution`, we don't provide any spacing between dependencies, so when visualizing `DualPipeV` it looks like this:

<img width="3168" height="486" alt="image" src="https://github.com/user-attachments/assets/d2c881ad-4ee0-46b6-ac03-13e5600b5a55" />

While it has the correct order of operations, it does not show the dependencies correctly. As shown in the original implementation, it should look something like this:

<img width="3542" height="384" alt="image" src="https://github.com/user-attachments/assets/c930fa98-848e-4951-a58b-c81f41092d14" />

This allows an option to add spacing to the visualizer, so it is easier to see dependencies. After change:

<img width="3633" height="486" alt="image" src="https://github.com/user-attachments/assets/7708367e-bdb4-46e8-a7c4-f19e18047f59" />

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160474
Approved by: https://github.com/fegin
2025-09-09 17:52:52 +00:00
Avik Chaudhuri
711c8c821e shape guards (#161178)
Summary: This PR introduces shape guards to export. Previously, only value ranges, equalities, and specializations would be tracked for symbolic expressions, and we had a forward hook to check them. Now we instead create a function that checks the shape guards and call it in the exported program.
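
A hedged sketch of where such guards come into play (example module and dynamic shapes chosen for illustration, not taken from the PR):

```python
import torch
from torch.export import Dim, export

class M(torch.nn.Module):
    def forward(self, x):
        return x + 1

batch = Dim("batch", min=2, max=64)
ep = export(M(), (torch.randn(4, 8),), dynamic_shapes={"x": {0: batch}})

# Calling the exported module re-checks the recorded shape guards on new inputs.
print(ep.module()(torch.randn(3, 8)).shape)  # torch.Size([3, 8])
```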

Test Plan:
updated several tests

Rollback Plan:

Differential Revision: D80713603

Pull Request resolved: https://github.com/pytorch/pytorch/pull/161178
Approved by: https://github.com/tugsbayasgalan
2025-09-08 22:44:09 +00:00
Howard Huang
abc447174c [PP] Add profiling to schedule execution (#160753)
The profiling title will be `str(action)`.
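
A hedged sketch of the profiling pattern described above; the action string below is a placeholder rather than the schedule's actual internals:

```python
import torch
from torch.profiler import record_function

action = "1F0"  # stand-in for str(action) of a schedule IR action
with record_function(action):
    torch.randn(512, 512) @ torch.randn(512, 512)
```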

<img width="1545" height="694" alt="image" src="https://github.com/user-attachments/assets/60b3506b-b8d6-4ae0-8b32-0d51d45fa2f0" />

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160753
Approved by: https://github.com/wconstab
2025-09-03 21:31:50 +00:00