Commit Graph

230 Commits

Author SHA1 Message Date
Yuanyuan Chen
f91899ca6c [2/N] Add strict parameter to Python zip calls (#166257)
This PR adds `strict=True/False` to zip calls in test utils. strict=True is passed when possible.
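
A minimal illustration of the pattern (example code, not taken from the PR diff):

```python
a = [1, 2, 3]
b = ["x", "y", "z"]

# With strict=True (Python 3.10+), zip raises ValueError if the iterables
# differ in length instead of silently truncating to the shorter one.
pairs = list(zip(a, b, strict=True))
print(pairs)  # [(1, 'x'), (2, 'y'), (3, 'z')]
```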

Pull Request resolved: https://github.com/pytorch/pytorch/pull/166257
Approved by: https://github.com/janeyx99
2025-11-01 00:35:41 +00:00
PyTorch MergeBot
f60751024e Revert "[2/N] Add strict parameter to Python zip calls (#166257)"
This reverts commit 39e5cdddf7.

Reverted https://github.com/pytorch/pytorch/pull/166257 on behalf of https://github.com/atalman due to Failing: test/distributed/fsdp/test_fsdp_mixed_precision.py::TestFSDPTrainEval::test_train_ema_eval_flow [GH job link](https://github.com/pytorch/pytorch/actions/runs/18934047991/job/54057218160) [HUD commit link](39e5cdddf7) ([comment](https://github.com/pytorch/pytorch/pull/166257#issuecomment-3467955332))
2025-10-30 13:20:00 +00:00
Yuanyuan Chen
2de4cf2102 [1/N] Remove unused loop variables (#166258)
This PR removes unused loop variables.
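
A minimal illustration of the cleanup pattern (example code, not from the PR diff):

```python
items = ["a", "b", "c"]

# before: `item` is never used inside the loop body
count = 0
for item in items:
    count += 1

# after: renaming to `_` signals the value is intentionally ignored
count = 0
for _ in items:
    count += 1
print(count)  # 3
```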

Pull Request resolved: https://github.com/pytorch/pytorch/pull/166258
Approved by: https://github.com/Lucaskabela, https://github.com/mlazos
2025-10-30 12:22:25 +00:00
linhaifeng
369f2d6951 [3/N] fix typo in other folders (#166606)
fix typo in other folders

#166374
#166126

_typos.toml
```toml
[files]
extend-exclude = ["tools/linter/dictionary.txt"]
[default.extend-words]
nd = "nd"
arange = "arange"
Nd = "Nd"
GLOBALs = "GLOBALs"
hte = "hte"
iy = "iy"
PN = "PN"
Dout = "Dout"
optin = "optin"
gam = "gam"
PTD = "PTD"
Sur = "Sur"
nin = "nin"
tme = "tme"
inpt = "inpt"
mis = "mis"
Raison = "Raison"
ouput = "ouput"
nto = "nto"
Onwer = "Onwer"
callibrate = "callibrate"
ser = "ser"
Metdata = "Metdata"
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/166606
Approved by: https://github.com/ezyang
2025-10-30 10:30:40 +00:00
Yuanyuan Chen
39e5cdddf7 [2/N] Add strict parameter to Python zip calls (#166257)
This PR adds `strict=True/False` to zip calls in test utils. strict=True is passed when possible.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/166257
Approved by: https://github.com/janeyx99
2025-10-30 08:10:10 +00:00
PyTorch MergeBot
1dd6b76914 Revert "[1/N] Remove unused loop variables (#166258)"
This reverts commit 76b2c37045.

Reverted https://github.com/pytorch/pytorch/pull/166258 on behalf of https://github.com/atalman due to breaks test/distributed/test_serialization.py::TestSerialization::test_weights_only [GH job link](https://github.com/pytorch/pytorch/actions/runs/18894311802/job/53929321703) [HUD commit link](76b2c37045) ([comment](https://github.com/pytorch/pytorch/pull/166258#issuecomment-3460964612))
2025-10-29 11:10:37 +00:00
Yuanyuan Chen
76b2c37045 [1/N] Remove unused loop variables (#166258)
This PR removes unused loop variables.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/166258
Approved by: https://github.com/Lucaskabela, https://github.com/mlazos
2025-10-29 01:34:15 +00:00
fduwjj
904abfc2ca Export flex attention with kwargs and DTensor (#166045)
Fixes #165948

Adding registration of the BlockMask makes flex attention with kwargs exportable.

Also modified unit tests to accept kwargs.

```
python test/distributed/tensor/test_dtensor_export.py -k test_flex_attention_dtensor_export

python test/inductor/test_flex_attention.py -k test_pytree_
```
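
A rough sketch of the kind of case targeted here, assuming an example module and shapes; it is not guaranteed that this exact snippet exports as-is:

```python
import torch
from torch.nn.attention.flex_attention import flex_attention, create_block_mask

class Attn(torch.nn.Module):
    def forward(self, q, k, v, *, block_mask=None):
        # BlockMask passed through a keyword argument, as in the updated tests
        return flex_attention(q, k, v, block_mask=block_mask)

def causal(b, h, q_idx, kv_idx):
    return q_idx >= kv_idx

B, H, S, D = 2, 4, 128, 64
q, k, v = (torch.randn(B, H, S, D) for _ in range(3))
block_mask = create_block_mask(causal, B=B, H=H, Q_LEN=S, KV_LEN=S, device="cpu")

ep = torch.export.export(Attn(), (q, k, v), kwargs={"block_mask": block_mask})
```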

Pull Request resolved: https://github.com/pytorch/pytorch/pull/166045
Approved by: https://github.com/drisspg, https://github.com/SherlockNoMad

Co-authored-by: fduwjj <fduwjj@gmail.com>
2025-10-27 21:40:40 +00:00
Maggie Moss
8f80892359 Use correct pyrefly syntax in suppressions distributed/... (#166241)
Updates the pyrefly-ignores in the torch/distributed directory to use the correct syntax. No functional changes.
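
A hedged illustration of the comment-based suppression form (example code, not taken from the PR diff; the exact placement and error-code variants pyrefly accepts are assumptions):

```python
def answer() -> int:
    # The trailing suppression below silences pyrefly's type error for this line.
    return "forty-two"  # pyrefly: ignore
```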

pyrefly check
lintrunner

Pull Request resolved: https://github.com/pytorch/pytorch/pull/166241
Approved by: https://github.com/oulgen
2025-10-26 04:16:41 +00:00
Maggie Moss
eb83c3ca23 Clean up unused Pyrefly suppressions (#166178)
Cleaning up ignores that are no longer needed in the repo and adding select suppressions so the main branch is clean.

test plan:
`lintrunner -a`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/166178
Approved by: https://github.com/oulgen
2025-10-25 05:32:21 +00:00
Howard Huang
b20deec3d1 [PP] Add optional argument to not save outputs (#165822)
Fix https://github.com/pytorch/pytorch/issues/159251

Add an optional argument `return_outputs` to the schedule `step`
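
A hedged usage sketch of the new argument; `schedule`, `inputs`, and `target` are placeholders for an existing pipeline schedule and its per-step data:

```python
# Skip accumulating (and returning) the model outputs on the last stage when
# only the losses are needed, to save memory.
losses = []
schedule.step(inputs, target=target, losses=losses, return_outputs=False)
```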

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165822
Approved by: https://github.com/wconstab
2025-10-21 00:09:31 +00:00
Yuanyuan Chen
3255e7872b Enable all flake8-logging-format rules (#164655)
These rules are enabled by removing existing suppressions.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164655
Approved by: https://github.com/janeyx99, https://github.com/mlazos
2025-10-19 00:59:28 +00:00
Yuanyuan Chen
e595136187 Enable PLC1802 on ruff (#165813)
This PR enables ruff check `PLC1802`, which detects len calls on sequences in a boolean test context.
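
A minimal illustration of what PLC1802 flags (example code, not from the PR diff):

```python
values = [1, 2, 3]

# flagged: len() call used only for its truthiness
if len(values):
    print("non-empty")

# preferred: rely on the sequence's truthiness directly
if values:
    print("non-empty")
```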

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165813
Approved by: https://github.com/ezyang
2025-10-18 05:44:14 +00:00
Howard Huang
bc1f2108d7 [PP] Update backward_counter and fsdp util to schedule class (#165513)
Fixed one issue with FSDP last reshard not being called.

The rest is mostly refactoring: changing some variables into class variables so they can be used in https://github.com/pytorch/torchtitan/pull/1721.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165513
Approved by: https://github.com/fegin
2025-10-15 21:58:16 +00:00
Timm Ruland
ffe3cb226a In pipeline parallelism: Use same dtype for receive and send tensor when initializing p2p communication. (#165539)
When initializing the p2p communication for pipeline parallelism, currently different default dtypes are used for the send and receive tensor here:
5c583e2573/torch/distributed/pipelining/stage.py (L935-L936)

This caused hard-to-trace issues when training on multiple nodes. Multiple stages on one node seem to work for some reason, which probably explains why the unit tests did not catch this.

Fixes #165143

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165539
Approved by: https://github.com/H-Huang
2025-10-15 15:05:55 +00:00
Yuanyuan Chen
b11593c31b [8/N] Apply ruff UP035 rule (#165214)
This is a follow-up of #164653 to continue applying `UP035` fixes. The purpose is to finally enable this rule.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165214
Approved by: https://github.com/ezyang
2025-10-15 03:18:57 +00:00
Howard Huang
ca65023b90 [PP] Fix edge case with FSDP when stages_per_rank > 3 (#165467)
There is an edge case with FSDP + PP: when we add UNSHARD + RESHARD, at most 3 stages are unsharded at a time, 3f83e8915e/torch/distributed/pipelining/schedules.py (L1029-L1031)

This change is needed to be able to unshard and reshard a stage multiple times.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165467
Approved by: https://github.com/wwwjn
2025-10-15 01:53:04 +00:00
Yuanyuan Chen
fbe0d20a17 [2/N] More ruff SIM fixes (#165031)
This is a follow-up of #164695 to apply ruff `SIM` rules to more files. Most changes simplify `dict.get` calls, since `None` is already the default value.
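
A minimal illustration of the typical fix (example code, not from the PR diff):

```python
config = {"device": "cuda"}

# before: redundant default, since dict.get already returns None when the key is missing
dtype = config.get("dtype", None)

# after
dtype = config.get("dtype")
print(dtype)  # None
```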

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165031
Approved by: https://github.com/mlazos
2025-10-14 14:22:54 +00:00
Chien-Chin Huang
6bda3bb286 [PP] Fix split_args_kwargs_into_chunks issues (#165306)
1. https://github.com/pytorch/pytorch/pull/164111/ adds support for splitting BlockMask. However, BlockMask also has a B=1 case in which the mask is broadcast across the batch. This PR adds support for that case.

2. The original `split_args_kwargs_into_chunks` doesn't initialize the default specs correctly. Since we now use `tree_flatten` and `tree_unflatten` to do the split, we should also use `tree_map` to initialize the default spec. This also supports values that are not `torch.Tensor`, which previously worked only if users explicitly provided the shard spec.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165306
Approved by: https://github.com/H-Huang
2025-10-13 15:52:39 +00:00
Howard Huang
2beead7523 [PP] move FSDP reduce scatters to end of step (#165106)
Move the FSDP reduce-scatters to the end of the PP step. The reduce-scatter compute-stream sync blocks the other stages from executing their backwards, leading to bubbles. There should be a way to execute these reduce-scatters earlier, but doing this for now as a quick fix.

<img width="1056" height="463" alt="image" src="https://github.com/user-attachments/assets/b945dd55-8ab1-4acc-b862-c6e2e476b834" />

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165106
Approved by: https://github.com/weifengpy
ghstack dependencies: #164976
2025-10-12 13:28:02 +00:00
Howard Huang
a3eb275d3c Add torch compile check for ZeroBubble (#162511)
Fix https://github.com/pytorch/pytorch/issues/161904

Pull Request resolved: https://github.com/pytorch/pytorch/pull/162511
Approved by: https://github.com/fegin
2025-10-10 18:49:45 +00:00
Yuanyuan Chen
fb64da0791 [2/N] Use "is" in python type comparison (#165142)
This is a follow-up of #165037. It is generally recommended to use `is`/`is not` to compare types, so this series of changes applies that suggestion across the code base, with the aim of finally enabling the related linter checks.
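
A minimal illustration of the recommended pattern (example code, not from the PR diff):

```python
x = 3

# before: equality comparison on types
if type(x) == int:
    print("int")

# after: identity comparison, since each type object is a singleton
if type(x) is int:
    print("int")
```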

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165142
Approved by: https://github.com/albanD
2025-10-10 15:36:44 +00:00
PyTorch MergeBot
b8be796a57 Revert "[2/N] More ruff SIM fixes (#165031)"
This reverts commit 38095fbd13.

Reverted https://github.com/pytorch/pytorch/pull/165031 on behalf of https://github.com/albanD due to One of the changed line started to fail on trunk ([comment](https://github.com/pytorch/pytorch/pull/165031#issuecomment-3390190870))
2025-10-10 13:42:14 +00:00
Howard Huang
238dd5517d [PP] Move profiler record_function in schedule (#164976)
Better-engineering change to move the `record_function` call so that it also encompasses the custom callback. This line is the only functional change: https://github.com/pytorch/pytorch/pull/164976/files#diff-1d3d91f53db88fb886901fb178d69e47776e71b8103f85688fa9ca64cc55d068R2147; the rest is just formatting.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164976
Approved by: https://github.com/fegin
ghstack dependencies: #162016, #164962
2025-10-10 13:09:23 +00:00
Yuanyuan Chen
38095fbd13 [2/N] More ruff SIM fixes (#165031)
This is a follow-up of #164695 to apply ruff `SIM` rules to more files. Most changes simplify `dict.get` calls, since `None` is already the default value.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165031
Approved by: https://github.com/mlazos
2025-10-10 05:37:46 +00:00
Maggie Moss
7457d139c5 Add pyrefly suppressions to torch/distributed (7/n) (#165002)
Adds suppressions so that pyrefly will typecheck clean: https://github.com/pytorch/pytorch/issues/163283

One more PR after this one.

Test plan:
dmypy restart && python3 scripts/lintrunner.py -a
pyrefly check

step 1: delete lines in the pyrefly.toml file from the project-excludes field
step 2: run pyrefly check
step 3: add suppressions, clean up unused suppressions
before: https://gist.github.com/maggiemoss/4b3bf2037014e116bc00706a16aef199

after:
INFO 0 errors (6,884 ignored)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165002
Approved by: https://github.com/oulgen
2025-10-09 04:08:25 +00:00
Howard Huang
005c3d449e Support custom callback functions in schedule (#162016)
This is going to be used in https://github.com/pytorch/torchtitan/issues/1682

Add a `register_custom_function` method to `_PipelineScheduleRuntime`, which allows users to supply a custom function that dynamically replaces a runtime operation.

The signature of the callback should look like:

```python
class _CustomFunctionProtocol(Protocol):
    def __call__(self, action: _Action, ctx: _PipelineContext) -> None: ...
```

`_PipelineContext` contains a reference to the schedule which is executing the operations.
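
A hedged usage sketch based on the description above; the import path, the `FORWARD` constant, and the exact `register_custom_function` signature are assumptions, and `schedule` is a placeholder for an existing `_PipelineScheduleRuntime` instance:

```python
# assumed import path; these are private APIs and subject to change
from torch.distributed.pipelining.schedules import FORWARD, _Action, _PipelineContext

def my_forward(action: _Action, ctx: _PipelineContext) -> None:
    # ctx holds a reference to the schedule executing the operations
    print(f"custom FORWARD for {action}")

# assumed registration call, keyed by the schedule-IR computation type
schedule.register_custom_function(FORWARD, my_forward)
```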

### Testing

Added a test which adds custom methods for `FORWARD` and `OVERLAP_F_B` which are just the same implementations as those used in the default schedule runtime. Check that the schedule can still run, numerics are correct, and the callbacks are executed the correct number of times.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/162016
Approved by: https://github.com/fegin
2025-10-08 20:43:26 +00:00
mingyuan.wang
0a3e4e894c [PP]: Optimize memory by early releasing stage inputs' gradients (#164329)
It seems that we can release the input activations' gradients early in `stage_backward()` in PP, which helps reduce the peak memory.

I tested this using the `1F1B` and `Interleaved1F1B` PP strategies (for simplicity, I use 4 decoder layers of llama3, set the PP size to 2, and set num_microbatches to 128) based on torchtitan.
Run command using torchtitan:
```bash
CUDA_VISIBLE_DEVICES=4,5 LOG_RANK=0,1 NGPU=2 CONFIG_FILE=./torchtitan/models/llama3/train_configs/llama3_8b.toml ./run_train.sh --metrics.log_freq 1 --training.seq_len 8192 --training.steps 10 --parallelism.data_parallel_shard_degree 1 --activation_checkpoint.mode full --model.tokenizer_path /workspace/torchtitan-v0.1.0/torchtitan/torchtitan/datasets/tokenizer/original/tokenizer.model --training.dataset wikipedia --parallelism.pipeline_parallel_degree 2 --training.local_batch_size 128 --parallelism.pipeline_parallel_microbatch_size 1 --training.dataset_path /workspace/wikipedia_subset --training.seed 42 --parallelism.pipeline_parallel_schedule 1F1B
```
## 1F1B torchtitan train results
### before fix
<img width="1526" height="606" alt="b8e281cce1dac15e827c216e7d83f402" src="https://github.com/user-attachments/assets/545c0a80-6276-40c0-893f-fd2df0a53b8d" />

### after fix
<img width="1526" height="594" alt="70d5ceba311a8398d041189bf8897cfc" src="https://github.com/user-attachments/assets/0d606e08-238a-4115-a1c0-b40df101d867" />

After the fix, the memory usage on rank1 (i.e., the non-first stages) is reduced by 6.9GB compared to before the fix; the memory usage on rank0 remains unchanged (rank0 represents stage0).

## Interleaved1F1B torchtitan train results
### before fix
<img width="1514" height="601" alt="a28b7f9704b9234870619c43194e8a72" src="https://github.com/user-attachments/assets/2c28565f-ffff-4747-a8f5-722b5c65dc7e" />

### after fix
<img width="1526" height="621" alt="2d8d6d956b72885186f8c7059146c41a" src="https://github.com/user-attachments/assets/8c4a4ff2-336b-4e0b-8ac4-014ae22c2ed1" />

After the fix, the memory usage on rank1 drops by 14.57GB (rank1 holds layer1 and layer3) and on rank0 by 7.5GB (rank0 holds layer0 and layer2).

## Memory snapshot results
I have also dumped a memory snapshot to observe memory usage under the 1F1B PP strategy.

### before fix
<img width="1906" height="918" alt="6fd4e4ba82b8bacf9ca6edee4f3d5581" src="https://github.com/user-attachments/assets/d1b9245c-b09f-43c5-87ce-87ba48533a70" />

We can see the memory increasing as the PP `step_microbatches` runs (the lifetime of the input activation's gradient, i.e., the output of `FusedRMSNormBackward`, lasts too long).

### after fix
<img width="1903" height="918" alt="2e415f25af6750d06e5e647683b212b9" src="https://github.com/user-attachments/assets/b657c8f6-5a56-46bd-8743-f3b8375c81b0" />

After the fix, we get much steadier memory usage during training (the input activation's gradient is released back to the allocator sooner).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164329
Approved by: https://github.com/H-Huang
2025-10-08 16:12:00 +00:00
Howard Huang
c813617c53 [PP] Migrate other schedules to use PipelineScheduleRuntime (#164777)
Second fix for https://github.com/pytorch/pytorch/issues/164756

It has been a TODO to make all schedules execute using the same runtime. After this change, schedules use the same `_PipelineScheduleRuntime` logic, which adds `UNSHARD` and `RESHARD` operations to the schedules and fixes the issue mentioned above.

<img width="920" height="406" alt="image" src="https://github.com/user-attachments/assets/a4d5bcd0-7dac-43cd-96f9-8ca33cfd8b91" />

One test was failing after the conversion:
- Fixed a gradient scaling issue for dWeight

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164777
Approved by: https://github.com/fegin
ghstack dependencies: #164775
2025-10-08 01:45:57 +00:00
Howard Huang
e659661ffa [PP] Fix FSDP unshard/reshard (#164775)
First fix for https://github.com/pytorch/pytorch/issues/164756

In the pipeline IR we call `UNSHARD` and `RESHARD`, but there is a bug: `module.unshard()` does not recurse into the nested FSDP modules, so the all-gather sometimes ends up being issued only just before the module's forward.

Since we want the pipeline IR to handle this explicitly, we call `group.unshard` instead, which ensures that all the modules are unsharded.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164775
Approved by: https://github.com/weifengpy
2025-10-08 01:45:57 +00:00
Chien-Chin Huang
e3ae80fc03 [PP] Let PP split BlockMask into micro-BlockMask (#164111)
BlockMask carries batch-dimension information, so PP has to split it just like all the other tensors. All the tensors in BlockMask have the batch dimension, so we can split them without too many issues. However, `mask_mod` takes the batch index as input, and that value changes after the split, so we have to wrap it inside a closure that remaps the batch index.
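
A hedged sketch (not the actual PP code) of the closure described above, which maps the micro-batch-local batch index back to the index the user's `mask_mod` expects:

```python
def offset_mask_mod(mask_mod, batch_offset):
    """Wrap a mask_mod so it sees the original (global) batch index."""
    def wrapped(b, h, q_idx, kv_idx):
        # b is the index within the split micro-batch; shift it back to the
        # index in the full batch before calling the user's mask_mod.
        return mask_mod(b + batch_offset, h, q_idx, kv_idx)
    return wrapped
```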

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164111
Approved by: https://github.com/H-Huang
2025-10-07 23:25:34 +00:00
Howard Huang
65f10becdf Support OVERLAP_F_B in schedule (#161072)
Previously, we converted `OVERLAP_F_B` into separate forward and backward operations in the plan. This small change keeps it in the plan and handles it in the runtime.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/161072
Approved by: https://github.com/fegin, https://github.com/wconstab
2025-10-07 19:55:10 +00:00
PyTorch MergeBot
5d7360bb03 Revert "Enable all SIM rules except disabled ones (#164645)"
This reverts commit 321e602692.

Reverted https://github.com/pytorch/pytorch/pull/164645 on behalf of https://github.com/izaitsevfb due to causes lint failures ([comment](https://github.com/pytorch/pytorch/pull/164645#issuecomment-3369274351))
2025-10-05 19:32:21 +00:00
Yuanyuan Chen
321e602692 Enable all SIM rules except disabled ones (#164645)
`SIM` rules are useful for simplifying boolean expressions and enhances code readability.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164645
Approved by: https://github.com/ezyang
2025-10-05 07:38:25 +00:00
Yuanyuan Chen
35c4130fd1 [2/N] Fix ruff warnings (#164460)
Apply ruff `SIM` rules.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164460
Approved by: https://github.com/ezyang
2025-10-04 03:40:32 +00:00
Anshul Sinha
3ffaab3bc8 [Replicate][Pipeline Parallelism] integration of new replicate function with pipeline parallelism (#164031)
**Summary:** In order to test numerics for replicate + PP, `stage.py` needs to be able to call replicate's backward manually, as pipeline parallelism doesn't have this feature.

**Test Case**
1.  pytest test/distributed/_composable/test_composability/test_pp_composability.py -k test_replicate_pp

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164031
Approved by: https://github.com/weifengpy, https://github.com/H-Huang
ghstack dependencies: #163897
2025-10-01 18:01:16 +00:00
Ke Wen
e419dc6d08 [PP] Customize pipeline's submod name (#164037)
Changing the PP submodules' names from `submod_i` to `submod_pp_i` to distinguish them from the submodules created by HOPs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164037
Approved by: https://github.com/H-Huang
ghstack dependencies: #164045, #164035
2025-10-01 16:29:19 +00:00
PyTorch MergeBot
36a37b81cd Revert "[PP] Customize pipeline's submod name (#164037)"
This reverts commit 704cd771f6.

Reverted https://github.com/pytorch/pytorch/pull/164037 on behalf of https://github.com/yangw-dev due to internal build failed Buck build failed for this target, and is likely caused by your changes. ([comment](https://github.com/pytorch/pytorch/pull/164035#issuecomment-3357113348))
2025-10-01 16:09:50 +00:00
Yuanyuan Chen
a293206bd5 Fix invalid f-strings (#164112)
Fixes invalid f-strings detected by `ruff`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164112
Approved by: https://github.com/Skylion007, https://github.com/mlazos
2025-09-30 04:17:13 +00:00
Yuanyuan Chen
85012fe167 Remove unnecessary list comprehensions (#164103)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164103
Approved by: https://github.com/Lucaskabela, https://github.com/mlazos
2025-09-30 03:56:54 +00:00
Yuanyuan Chen
da003d7b95 [3/N] Import Callable from collections.abc in torch/distributed (#164104)
This is the result of applying the ruff `UP035` check.
`Callable` is imported from `collections.abc` instead of `typing`.
This PR is the follow-up of #164054.
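
A minimal illustration of the UP035 fix (example code, not from the PR diff):

```python
# before (deprecated import source):
# from typing import Callable

# after:
from collections.abc import Callable

def apply(fn: Callable[[int], int], x: int) -> int:
    return fn(x)

print(apply(lambda v: v + 1, 41))  # 42
```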

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164104
Approved by: https://github.com/Skylion007
2025-09-30 00:28:53 +00:00
Ke Wen
704cd771f6 [PP] Customize pipeline's submod name (#164037)
Changing the PP submodules' names from `submod_i` to `submod_pp_i` to distinguish them from the submodules created by HOPs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164037
Approved by: https://github.com/H-Huang
ghstack dependencies: #164045, #164035
2025-09-29 23:29:52 +00:00
Ke Wen
5ddad22196 [PP] Use default export mode (non-strict) (#164045)
export's default mode has switched from strict to non-strict. We just follow suit in PP.
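
A hedged sketch of what this means for a plain export call (example module, not from the PR):

```python
import torch

class M(torch.nn.Module):
    def forward(self, x):
        return x * 2

# With the new default, this traces in non-strict mode; passing strict=True
# would opt back into the previous behavior.
ep = torch.export.export(M(), (torch.randn(2, 3),))
print(ep)
```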

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164045
Approved by: https://github.com/H-Huang
2025-09-29 06:31:06 +00:00
can-gaa-hou
7c7ae86991 [Fix] Adding missing f prefixes to formatted strings [2/N] (#164066)
As stated in the title.

* #164068
* #164067
* __->__ #164066
* #164065
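
A minimal illustration of the fix pattern (example code, not from the PR diff):

```python
name = "pipeline"

# before: the braces stay literal because the f prefix is missing
msg = "stage {name} is ready"

# after
msg = f"stage {name} is ready"
print(msg)  # stage pipeline is ready
```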

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164066
Approved by: https://github.com/Skylion007
2025-09-29 04:40:44 +00:00
Timm Ruland
5fcde74aed Fix pipeline parallelism not correctly initializing backwards stages when evaluating before training. (#162823)
Previously, an eval() call before a training step() would not correctly initialize the backward pass of the pipeline stages, leading to errors during the subsequent training step. This PR ensures that the backward stages can still be initialized after an eval() call.

Fixes #162822

Pull Request resolved: https://github.com/pytorch/pytorch/pull/162823
Approved by: https://github.com/dcci, https://github.com/H-Huang
2025-09-25 15:13:19 +00:00
Howard Huang
9de22bc5da Inspect schedule IR comms (#162996)
Small change to the util to allow us to see comms (e.g. `SEND`, `RECV`, etc.) in the schedule IR.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/162996
Approved by: https://github.com/fegin
2025-09-16 16:59:06 +00:00
Tugsbayasgalan (Tugsuu) Manlaibaatar
de05dbc39c Replace export_for_training with export (#162396)
Summary: replace export_for_training with export

Test Plan:
CI

Rollback Plan:

Differential Revision: D81935792

Pull Request resolved: https://github.com/pytorch/pytorch/pull/162396
Approved by: https://github.com/angelayi, https://github.com/jerryzh168
2025-09-10 14:19:34 +00:00
Howard Huang
e1be887870 [PP] Add spacing to visualizer (#160474)
When visualizing the schedules using `_PipelineScheduleExecution`, we don't provide any spacing between dependencies, so when visualizing `DualPipeV` it looks like this:

<img width="3168" height="486" alt="image" src="https://github.com/user-attachments/assets/d2c881ad-4ee0-46b6-ac03-13e5600b5a55" />

While it has the correct order of operations, it does not show the dependencies correctly. As shown in the original implementation, it should look something like this:

<img width="3542" height="384" alt="image" src="https://github.com/user-attachments/assets/c930fa98-848e-4951-a58b-c81f41092d14" />

This allows an option to add spacing to the visualizer, so it is easier to see dependencies. After change:

<img width="3633" height="486" alt="image" src="https://github.com/user-attachments/assets/7708367e-bdb4-46e8-a7c4-f19e18047f59" />

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160474
Approved by: https://github.com/fegin
2025-09-09 17:52:52 +00:00
Avik Chaudhuri
711c8c821e shape guards (#161178)
Summary: This PR introduces shape guards to export. Previously, only value ranges, equalities, and specializations would be tracked for symbolic expressions, and we had a forward hook to check them. Now we instead create a function that checks the shape guards and call it in the exported program.
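
A hedged sketch of where such guards come into play (example module and dynamic shapes chosen for illustration, not taken from the PR):

```python
import torch
from torch.export import Dim, export

class M(torch.nn.Module):
    def forward(self, x):
        return x + 1

batch = Dim("batch", min=2, max=64)
ep = export(M(), (torch.randn(4, 8),), dynamic_shapes={"x": {0: batch}})

# Calling the exported module re-checks the recorded shape guards on new inputs.
print(ep.module()(torch.randn(3, 8)).shape)  # torch.Size([3, 8])
```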

Test Plan:
updated several tests

Rollback Plan:

Differential Revision: D80713603

Pull Request resolved: https://github.com/pytorch/pytorch/pull/161178
Approved by: https://github.com/tugsbayasgalan
2025-09-08 22:44:09 +00:00
Howard Huang
abc447174c [PP] Add profiling to schedule execution (#160753)
The profiling title will be `str(action)`.
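
A hedged sketch of the profiling pattern described above; the action string below is a placeholder rather than the schedule's actual internals:

```python
import torch
from torch.profiler import record_function

action = "1F0"  # stand-in for str(action) of a schedule IR action
with record_function(action):
    torch.randn(512, 512) @ torch.randn(512, 512)
```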

<img width="1545" height="694" alt="image" src="https://github.com/user-attachments/assets/60b3506b-b8d6-4ae0-8b32-0d51d45fa2f0" />

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160753
Approved by: https://github.com/wconstab
2025-09-03 21:31:50 +00:00