Commit Graph

26 Commits

Author SHA1 Message Date
Howard Huang
b0c3d39e0d [pipelining] Update tutorials and documentation (#143045)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/143045
Approved by: https://github.com/wconstab, https://github.com/kwen2501
2024-12-12 18:42:17 +00:00
Howard Huang
88154024b3 [pipelining] Add ZBV schedule (#142084)
Adds the ZBV schedule, which is explained in https://arxiv.org/pdf/2401.10241, Section 6. Tested that it works under the new PipelineScheduleRuntime after fixing a small bug in handling V-shaped schedules. This PR is a replacement for https://github.com/pytorch/pytorch/pull/138444
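A minimal construction sketch, assuming the schedule is exposed as `ScheduleZBVZeroBubble` (its name in today's torch.distributed.pipelining; `stages` and `loss_fn` are placeholders for objects built elsewhere):

```python
from torch.distributed.pipelining import ScheduleZBVZeroBubble

# ZBV is a V-shaped schedule: each rank holds two stages, with rank 0 owning
# both the first and the last stage. `stages` is assumed to be this rank's two
# local PipelineStage objects; requires an initialized process group.
schedule = ScheduleZBVZeroBubble(stages, n_microbatches=8, loss_fn=loss_fn)
```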

cc the original authors: @QPHutu @ufotalent https://github.com/pytorch/pytorch/pull/138444#issuecomment-2472684977

Pull Request resolved: https://github.com/pytorch/pytorch/pull/142084
Approved by: https://github.com/kwen2501
2024-12-11 02:00:57 +00:00
Howard Huang
75109682b6 [Pipelining] Refactor Interleaved1F1B and ZeroBubble (#137783)
NOTE: this PR removes `ScheduleFlexibleInterleaved1F1B`; let me know if there are any concerns.

`ScheduleFlexibleInterleaved1F1B` is a superset of `Interleaved1F1B` that shares most of the same implementation but relaxes the condition that `n_microbatches % pp_size == 0`. This PR folds that implementation into `Interleaved1F1B` and then removes `ScheduleFlexibleInterleaved1F1B`, since it is confusing to have two schedules with such similar names. It also refactors the zero-bubble logic to live in the `ZeroBubble` schedule class.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/137783
Approved by: https://github.com/wconstab
2024-10-16 03:05:14 +00:00
Howard Huang
108a75b454 [PP] Add ZeroBubble schedule (#133467)
Zero bubble can be expressed through `ScheduleFlexibleInterleaved1F1B` by setting `enable_zero_bubble=True`. But instead of requiring this flag at schedule initialization, we should create a separate ZeroBubble schedule and also make `Interleaved1F1B` derive from `ScheduleFlexibleInterleaved1F1B`. Then we don't need to expose `ScheduleFlexibleInterleaved1F1B`, since its naming is not obvious. A rough before/after sketch follows.
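The sketch below assumes the dedicated class landed as `ScheduleInterleavedZeroBubble`, its name in today's torch.distributed.pipelining (the PR text just says "ZeroBubbleSchedule"); `stages` is a placeholder for this rank's local PipelineStage objects:

```python
from torch.distributed.pipelining import ScheduleInterleavedZeroBubble

# Before: zero bubble was a flag on the flexible schedule, e.g.
#   ScheduleFlexibleInterleaved1F1B(stages, n_microbatches=8, enable_zero_bubble=True)
# After: a dedicated schedule class (requires an initialized process group).
schedule = ScheduleInterleavedZeroBubble(stages, n_microbatches=8)
```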

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133467
Approved by: https://github.com/wconstab
ghstack dependencies: #132691
2024-08-22 13:32:15 +00:00
Wouter Devriendt
e8645fa2b9 [Doc] fix some typos (found by codespell and typos) (#132544)
Applying doc fixes from PR https://github.com/pytorch/pytorch/pull/127267 - with CLA
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132544
Approved by: https://github.com/kit1980
2024-08-05 17:21:56 +00:00
PyTorch MergeBot
eb9409511e Revert "support zb1p and zb2p algorithms (#130752)"
This reverts commit 8fe5b93667.

Reverted https://github.com/pytorch/pytorch/pull/130752 on behalf of https://github.com/atalman due to Broke Periodic CI: distributed/pipelining/test_composability.py::ComposabilityTest::test_manual_with_data_parallel_dp_type_DDP_ScheduleClass4 [GH job link](https://github.com/pytorch/pytorch/actions/runs/10131472868/job/28014900187) [HUD commit link](8fe5b93667) ([comment](https://github.com/pytorch/pytorch/pull/130752#issuecomment-2255819078))
2024-07-29 12:40:00 +00:00
Haoci Zhang
8fe5b93667 support zb1p and zb2p algorithms (#130752)
Previously, we proved that ZB2P is not truly zero bubble when num_local_stages exceeds 4, so only ZB1P was supported.

We made a few tweaks to ZB2P to make it truly zero bubble. The algorithm and proof are attached:
[zero_bubble.pdf](https://github.com/user-attachments/files/16238738/zero_bubble.pdf)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130752
Approved by: https://github.com/H-Huang
2024-07-24 17:58:46 +00:00
Howard Huang
4eb449f7dc [pipelining] add small logging section to docs (#129368)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129368
Approved by: https://github.com/wconstab
2024-07-02 18:19:28 +00:00
Haoci Zhang
1ad683033b Implemented flexible PP schedule (#129597)
Enabled some cases where num_microbatches % pp_size != 0 to work. Using the flex_pp schedule, we will have

num_rounds = max(1, n_microbatches // pp_group_size), and the schedule works as long as n_microbatches % num_rounds is 0. A few examples that are now supported (see the runnable check after this list):

pp_group_size = 4, n_microbatches = 10: num_rounds = 2 and 10 % 2 is 0.
pp_group_size = 4, n_microbatches = 3: num_rounds = 1 and 3 % 1 is 0.
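A runnable restatement of that rule (the helper name is illustrative, not from the PR):

```python
def flex_pp_supported(pp_group_size: int, n_microbatches: int) -> bool:
    """Mirror of the rule above: schedulable iff n_microbatches % num_rounds == 0."""
    num_rounds = max(1, n_microbatches // pp_group_size)
    return n_microbatches % num_rounds == 0

assert flex_pp_supported(4, 10)      # num_rounds = 2; 10 % 2 == 0
assert flex_pp_supported(4, 3)       # num_rounds = 1; 3 % 1 == 0
assert not flex_pp_supported(4, 9)   # num_rounds = 2; 9 % 2 == 1
```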

Moved over from PiPPy (https://github.com/pytorch/PiPPy/pull/1129)

Tested using the first example config above; the schedule looks like the following graph:

```
=========== ALL_RANK_ACTIONS ===========
         Rank 0  Rank 1  Rank 2  Rank 3
Step 00: F0_s0   None    None    None
Step 01: F1_s0   F0_s1   None    None
Step 02: F2_s0   F1_s1   F0_s2   None
Step 03: F3_s0   F2_s1   F1_s2   F0_s3
Step 04: F4_s0   F3_s1   F2_s2   F1_s3
Step 05: F0_s4   F4_s1   F3_s2   F2_s3
Step 06: F1_s4   F0_s5   F4_s2   F3_s3
Step 07: F2_s4   F1_s5   F0_s6   F4_s3
Step 08: F3_s4   F2_s5   F1_s6   F0_s7
Step 09: F4_s4   F3_s5   None    B0_s7
Step 10: F5_s0   None    F2_s6   F1_s7
Step 11: None    None    B0_s6   B1_s7
Step 12: None    F4_s5   F3_s6   F2_s7
Step 13: None    B0_s5   B1_s6   B2_s7
Step 14: F6_s0   F5_s1   F4_s6   F3_s7
Step 15: B0_s4   B1_s5   B2_s6   B3_s7
Step 16: F7_s0   F6_s1   F5_s2   F4_s7
Step 17: B1_s4   B2_s5   B3_s6   B4_s7
Step 18: F8_s0   F7_s1   F6_s2   F5_s3
Step 19: B2_s4   B3_s5   B4_s6   B0_s3
Step 20: F9_s0   F8_s1   F7_s2   F6_s3
Step 21: B3_s4   B4_s5   B0_s2   B1_s3
Step 22: F5_s4   F9_s1   F8_s2   F7_s3
Step 23: B4_s4   B0_s1   B1_s2   B2_s3
Step 24: F6_s4   F5_s5   F9_s2   F8_s3
Step 25: B0_s0   B1_s1   B2_s2   B3_s3
Step 26: F7_s4   F6_s5   F5_s6   F9_s3
Step 27: B1_s0   B2_s1   B3_s2   B4_s3
Step 28: F8_s4   F7_s5   F6_s6   F5_s7
Step 29: B2_s0   B3_s1   B4_s2   B5_s7
Step 30: F9_s4   F8_s5   F7_s6   F6_s7
Step 31: B3_s0   B4_s1   B5_s6   B6_s7
Step 32: None    F9_s5   F8_s6   F7_s7
Step 33: B4_s0   B5_s5   B6_s6   B7_s7
Step 34: None    None    F9_s6   F8_s7
Step 35: B5_s4   B6_s5   B7_s6   B8_s7
Step 36: None    None    None    F9_s7
Step 37: B6_s4   B7_s5   B8_s6   B9_s7
Step 38: None    None    None    None
Step 39: B7_s4   B8_s5   B9_s6   B5_s3
Step 40: None    None    None    None
Step 41: B8_s4   B9_s5   B5_s2   B6_s3
Step 42: None    None    None    None
Step 43: B9_s4   B5_s1   B6_s2   B7_s3
Step 44: None    None    None    None
Step 45: B5_s0   B6_s1   B7_s2   B8_s3
Step 46: None    None    None    None
Step 47: B6_s0   B7_s1   B8_s2   B9_s3
Step 48: None    None    None
Step 49: B7_s0   B8_s1   B9_s2
Step 50: None    None
Step 51: B8_s0   B9_s1
Step 52: None
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129597
Approved by: https://github.com/H-Huang
2024-07-02 07:54:38 +00:00
Ke Wen
fe39c07826 [pipelining][doc] Remove duplicated words (#128368)
"for execution" is used in both step titles

Pull Request resolved: https://github.com/pytorch/pytorch/pull/128368
Approved by: https://github.com/wconstab
ghstack dependencies: #128361
2024-06-11 04:52:57 +00:00
Ke Wen
4077cdd589 [pipelining][doc] Update arg list of pipeline API (#128361)
And document the use of the `build_stage` API.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/128361
Approved by: https://github.com/wconstab
2024-06-11 02:55:17 +00:00
Ke Wen
613c7d270d [pipelining] Format doc (#128279)
- Should use two backticks around `var`
- Wrap lines
- Add section cross ref
Pull Request resolved: https://github.com/pytorch/pytorch/pull/128279
Approved by: https://github.com/H-Huang
ghstack dependencies: #128273, #128278
2024-06-08 04:59:04 +00:00
Ke Wen
2e42671619 [pipelining] Rename to stage.py and schedules.py (#128278)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/128278
Approved by: https://github.com/H-Huang
ghstack dependencies: #128273
2024-06-08 04:42:35 +00:00
Ke Wen
0e3fe694d1 [pipelining] Restore a stage constructor for tracer path (#128273)
In case the user modified the stage module out of place, such as:
mod = DDP(mod)
mod = torch.compile(mod)

they need a stage builder other than `pipe.build_stage()`.

This PR provides an API to do so:
```
def build_stage(
    stage_module,
    stage_index,
    pipe_info,  # e.g. obtained from pipe.info()
    ...
)
```
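A hedged usage sketch of the out-of-place case above (assumes `pipe` and `device` already exist, the process group is initialized, and stage index 0 belongs to this rank; `get_stage_module` and the `build_stage` argument order follow the current torch.distributed.pipelining API):

```python
from torch.distributed.pipelining import build_stage
from torch.nn.parallel import DistributedDataParallel as DDP

stage_mod = pipe.get_stage_module(0)   # extract this rank's stage module
stage_mod = DDP(stage_mod)             # out-of-place modification
stage = build_stage(stage_mod, 0, pipe.info(), device)
```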

Pull Request resolved: https://github.com/pytorch/pytorch/pull/128273
Approved by: https://github.com/wconstab
2024-06-08 04:42:35 +00:00
Will Constable
f9508b4c1f [pipelining] Update Pipelining Docs (#128236)
----

- Bring PipelineStage/Schedule more front-and-center
- provide details on how to manually construct PipelineStage
- move tracer example and manual example below so the high-level flow
  (e2e) is closer to the top
Pull Request resolved: https://github.com/pytorch/pytorch/pull/128236
Approved by: https://github.com/H-Huang
ghstack dependencies: #128201, #128228
2024-06-08 02:03:46 +00:00
Ke Wen
ad96f991a5 [pipelining] Add pipe.build_stage() (#128240)
The `PipelineStage` name was given to the manual side, so this PR adds a method under `Pipe` to create a PipelineStage (sketched below).
Also moved `PipeInfo` to utils.py to avoid a circular dependency between `_IR` and `PipelineStage`.
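The resulting call is roughly the following (assumes `pipe` and `device` exist; the signature matches the current `Pipe.build_stage(stage_index, device, group=None)` API):

```python
# Build this rank's stage directly from the traced pipe (stage 0 illustrative).
stage = pipe.build_stage(stage_index=0, device=device)
```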

Pull Request resolved: https://github.com/pytorch/pytorch/pull/128240
Approved by: https://github.com/wconstab, https://github.com/H-Huang
2024-06-08 01:26:02 +00:00
Howard Huang
bef586111a [pipelining] pipelining.rst updates (#128228)
fix some nits and add `PipelineStage` (manual)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/128228
Approved by: https://github.com/wconstab
ghstack dependencies: #128201
2024-06-07 23:29:54 +00:00
Ke Wen
3090667cf9 [pipelining] pipeline() taking microbatch as example input (#128163)
Changed the API of `pipeline()` to take a microbatch instead of the full batch as example args.

Main purpose is to:
- make this API more atomic;
- decouple tracing frontend from runtime info like `num_chunks`.

Side effects (see the sketch after this list):
- Creates the opportunity for schedules sharing the same `pipe` object to vary `num_chunks`.
- The user has to create an example microbatch input.
- Chunk spec arguments are all moved to the runtime side.
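A self-contained sketch of the changed frontend (the model, sizes, and split point are illustrative; `mb_args` and `SplitPoint` follow the current torch.distributed.pipelining API):

```python
import torch
from torch import nn
from torch.distributed.pipelining import pipeline, SplitPoint

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.ModuleList(nn.Linear(512, 512) for _ in range(4))

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x

mb_x = torch.randn(4, 512)  # one microbatch, NOT the full batch
pipe = pipeline(Model(), mb_args=(mb_x,),
                split_spec={"layers.2": SplitPoint.BEGINNING})
```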

Pull Request resolved: https://github.com/pytorch/pytorch/pull/128163
Approved by: https://github.com/H-Huang
2024-06-07 15:51:53 +00:00
Howard Huang
543a870943 [pipelining] Rename ManualPipelineStage -> PipelineStage (#128157)
Renaming ManualPipelineStage to remove the "Manual" part. I needed to replace the existing `PipelineStage`, which takes in the `pipe` argument, so I renamed that one to `TracerPipelineStage`. @kwen2501 will remove this entirely in favor of adding a util to `Pipe` to create the stage directly.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/128157
Approved by: https://github.com/wconstab
2024-06-07 09:24:16 +00:00
Ke Wen
01601ebd41 Retire torch.distributed.pipeline (#127354)
Actually retiring the module after it carried a deprecation warning for a while.
The newly supported module is torch.distributed.pipelining.
Please migrate (import sketch below).
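Roughly, the migration looks like this (the old import is shown for contrast; the replacement names are from today's torch.distributed.pipelining):

```python
# Retired:
#   from torch.distributed.pipeline.sync import Pipe
# Supported replacements live in the pipelining package:
from torch.distributed.pipelining import pipeline, PipelineStage, ScheduleGPipe
```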

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127354
Approved by: https://github.com/wconstab
2024-06-07 08:11:58 +00:00
Ke Wen
96806b1777 [pipelining][doc] Add frontend description and change tracer example (#128070)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/128070
Approved by: https://github.com/wconstab, https://github.com/H-Huang
2024-06-07 04:09:36 +00:00
Ke Wen
403012b50a [pipelining] expose APIs per pytorch rule (#126812)
Rule is enforced by #126103.

The rule:
- If `torch.a.b` defines a public class `C` (i.e. one to be exposed in the torch API namespace), then `torch.a.b` must be a public path, i.e. contain no `_`.
- `torch.a.b` should ideally have an `__all__` that defines what is imported from this file when it is imported.
- All other definitions in `torch.a.b` that you don't want to expose should have a `_` prefix (see the sketch below).
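A minimal sketch of a module laid out per this rule (the module path and names are hypothetical):

```python
# torch/a/b.py  -- public path: no leading underscore in "a" or "b"

__all__ = ["C"]            # explicitly declares the public surface

class C:                   # public: exposed in the torch API namespace
    pass

def _helper():             # underscore prefix: not part of the public API
    pass
```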

Pull Request resolved: https://github.com/pytorch/pytorch/pull/126812
Approved by: https://github.com/wconstab
2024-05-22 16:21:13 +00:00
Ke Wen
07d6ab5aa2 [pipelining] Add pipeline schedules (#125975)
1. Add pipeline schedules:
- GPipe
- 1F1B
- Interleaved 1F1B
- LoopedBFS

2. Add basic forward and backward tests:
test_schedule.py
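Construction sketches for the added schedules, assuming `stage`/`stages` (this rank's PipelineStage objects) and `loss_fn` exist and the process group is initialized; argument names follow the current torch.distributed.pipelining API and may differ from this PR's:

```python
from torch.distributed.pipelining import (
    Schedule1F1B,
    ScheduleGPipe,
    ScheduleInterleaved1F1B,
)

# Single-stage-per-rank schedules take one PipelineStage:
schedule = ScheduleGPipe(stage, n_microbatches=8, loss_fn=loss_fn)
schedule = Schedule1F1B(stage, n_microbatches=8, loss_fn=loss_fn)

# Interleaved 1F1B places multiple stages per rank, so it takes a list
# (ScheduleLoopedBFS is constructed the same way):
schedule = ScheduleInterleaved1F1B(stages, n_microbatches=8, loss_fn=loss_fn)
```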

Pull Request resolved: https://github.com/pytorch/pytorch/pull/125975
Approved by: https://github.com/wconstab
ghstack dependencies: #125729
2024-05-11 21:17:53 +00:00
Ke Wen
5cd7c75bd9 [pipelining] Add tracing frontend (#125448)
This PR allows the user to transform a model into a pipeline representation with split stages, according to a split spec.
```
def pipeline(
    module: torch.nn.Module,
    num_chunks: int,
    example_args: Tuple[Any, ...],
    example_kwargs: Optional[Dict[str, Any]] = None,
    split_spec: Optional[Dict[str, SplitPoint]] = None,
    split_policy: Optional[Callable[[fx.GraphModule], fx.GraphModule]] = None,
) -> Pipe:
```
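A usage sketch matching the signature above; note that `num_chunks` and the full-batch `example_args` reflect this point in history (a later commit above switches the frontend to a microbatch). The model and split point are illustrative:

```python
import torch
from torch import nn
from torch.distributed.pipelining import pipeline, SplitPoint

model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 512))
pipe = pipeline(
    model,
    num_chunks=4,                           # microbatches per full batch
    example_args=(torch.randn(16, 512),),   # full batch at this API version
    split_spec={"2": SplitPoint.BEGINNING}, # split before submodule "2"
)
```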

Pull Request resolved: https://github.com/pytorch/pytorch/pull/125448
Approved by: https://github.com/H-Huang
ghstack dependencies: #125273
2024-05-04 09:00:25 +00:00
Ke Wen
0199ce8d6c [pipelining] Add microbatch split and merge utils (#125273)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125273
Approved by: https://github.com/H-Huang
ghstack dependencies: #124776, #124875, #124958
2024-05-02 21:09:47 +00:00
Ke Wen
52142192d4 [pipelining] Add stage backward function (#124958)
This is a helper function which:
1. computes the gradients for the stage inputs, and
2. accumulates gradients for the stage module's parameters.

A unit test for this function is also added.
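A minimal sketch of what such a helper does (the names are illustrative, not the PR's actual implementation):

```python
import torch

def stage_backward(stage_outputs, output_grads, stage_inputs):
    # Run autograd from this stage's outputs. This (1) populates .grad on the
    # received stage inputs (leaf tensors created with requires_grad=True),
    # which would be sent to the previous stage, and (2) accumulates .grad on
    # the stage module's parameters, since they are leaves of the same graph.
    torch.autograd.backward(stage_outputs, grad_tensors=output_grads)
    return [inp.grad for inp in stage_inputs]

# Tiny self-contained check:
lin = torch.nn.Linear(4, 4)
x = torch.randn(2, 4, requires_grad=True)   # stands in for a received activation
out = lin(x)
grads_in = stage_backward([out], [torch.ones_like(out)], [x])
assert grads_in[0].shape == x.shape and lin.weight.grad is not None
```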

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124958
Approved by: https://github.com/wconstab
ghstack dependencies: #124776, #124875
2024-05-01 07:56:58 +00:00