pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Haoci Zhang 1ad683033b Implemented flexible PP schedule (#129597 )

Enabled some cases to work where num_microbatches % pp_size != 0. Using the flex_pp schedule, we will have

num_rounds = max(1, n_microbatches // pp_group_size)

and it works as long as n_microbatches % num_rounds is 0. As a few examples, support

pp_group_size = 4, n_microbatches = 10. We will have num_rounds = 2 and n_microbatches % 2 is 0.
pp_group_size = 4, n_microbatches = 3. We will have num_rounds = 1 and n_microbatches % 1 is 0.

Moved over from PiPPy (https://github.com/pytorch/PiPPy/pull/1129)

Tested using the config in (1), schedule looks like the following graph:

```
=========== ALL_RANK_ACTIONS ===========
         Rank 0  Rank 1  Rank 2  Rank 3
Step 00: F0_s0   None    None    None
Step 01: F1_s0   F0_s1   None    None
Step 02: F2_s0   F1_s1   F0_s2   None
Step 03: F3_s0   F2_s1   F1_s2   F0_s3
Step 04: F4_s0   F3_s1   F2_s2   F1_s3
Step 05: F0_s4   F4_s1   F3_s2   F2_s3
Step 06: F1_s4   F0_s5   F4_s2   F3_s3
Step 07: F2_s4   F1_s5   F0_s6   F4_s3
Step 08: F3_s4   F2_s5   F1_s6   F0_s7
Step 09: F4_s4   F3_s5   None    B0_s7
Step 10: F5_s0   None    F2_s6   F1_s7
Step 11: None    None    B0_s6   B1_s7
Step 12: None    F4_s5   F3_s6   F2_s7
Step 13: None    B0_s5   B1_s6   B2_s7
Step 14: F6_s0   F5_s1   F4_s6   F3_s7
Step 15: B0_s4   B1_s5   B2_s6   B3_s7
Step 16: F7_s0   F6_s1   F5_s2   F4_s7
Step 17: B1_s4   B2_s5   B3_s6   B4_s7
Step 18: F8_s0   F7_s1   F6_s2   F5_s3
Step 19: B2_s4   B3_s5   B4_s6   B0_s3
Step 20: F9_s0   F8_s1   F7_s2   F6_s3
Step 21: B3_s4   B4_s5   B0_s2   B1_s3
Step 22: F5_s4   F9_s1   F8_s2   F7_s3
Step 23: B4_s4   B0_s1   B1_s2   B2_s3
Step 24: F6_s4   F5_s5   F9_s2   F8_s3
Step 25: B0_s0   B1_s1   B2_s2   B3_s3
Step 26: F7_s4   F6_s5   F5_s6   F9_s3
Step 27: B1_s0   B2_s1   B3_s2   B4_s3
Step 28: F8_s4   F7_s5   F6_s6   F5_s7
Step 29: B2_s0   B3_s1   B4_s2   B5_s7
Step 30: F9_s4   F8_s5   F7_s6   F6_s7
Step 31: B3_s0   B4_s1   B5_s6   B6_s7
Step 32: None    F9_s5   F8_s6   F7_s7
Step 33: B4_s0   B5_s5   B6_s6   B7_s7
Step 34: None    None    F9_s6   F8_s7
Step 35: B5_s4   B6_s5   B7_s6   B8_s7
Step 36: None    None    None    F9_s7
Step 37: B6_s4   B7_s5   B8_s6   B9_s7
Step 38: None    None    None    None
Step 39: B7_s4   B8_s5   B9_s6   B5_s3
Step 40: None    None    None    None
Step 41: B8_s4   B9_s5   B5_s2   B6_s3
Step 42: None    None    None    None
Step 43: B9_s4   B5_s1   B6_s2   B7_s3
Step 44: None    None    None    None
Step 45: B5_s0   B6_s1   B7_s2   B8_s3
Step 46: None    None    None    None
Step 47: B6_s0   B7_s1   B8_s2   B9_s3
Step 48: None    None    None
Step 49: B7_s0   B8_s1   B9_s2
Step 50: None    None
Step 51: B8_s0   B9_s1
Step 52: None
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129597
Approved by: https://github.com/H-Huang
2024-07-02 07:54:38 +00:00
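
The round arithmetic stated in the commit message above can be checked with a small standalone sketch. It only restates the two formulas from the message (`num_rounds = max(1, n_microbatches // pp_group_size)` and the requirement that `n_microbatches % num_rounds` is 0). The function name `flex_pp_num_rounds` is hypothetical and is not part of `torch.distributed.pipelining`; the actual schedule implementation may perform additional checks.

```python
def flex_pp_num_rounds(n_microbatches: int, pp_group_size: int) -> int:
    """Return num_rounds for the flexible PP schedule, per the commit message.

    Hypothetical helper for illustration only; not a PyTorch API.
    Raises ValueError when the microbatch count is not divisible by num_rounds.
    """
    if n_microbatches < 1 or pp_group_size < 1:
        raise ValueError("n_microbatches and pp_group_size must be positive")

    # Formula quoted in the commit message.
    num_rounds = max(1, n_microbatches // pp_group_size)

    # Validity condition quoted in the commit message.
    if n_microbatches % num_rounds != 0:
        raise ValueError(
            f"n_microbatches={n_microbatches} is not divisible by "
            f"num_rounds={num_rounds} (pp_group_size={pp_group_size})"
        )
    return num_rounds


if __name__ == "__main__":
    # The two configurations listed in the commit message.
    print(flex_pp_num_rounds(n_microbatches=10, pp_group_size=4))  # 2
    print(flex_pp_num_rounds(n_microbatches=3, pp_group_size=4))   # 1

    # A configuration that fails the condition: 9 // 4 = 2, and 9 % 2 = 1.
    try:
        flex_pp_num_rounds(n_microbatches=9, pp_group_size=4)
    except ValueError as err:
        print("rejected:", err)
```

Under these two formulas, a microbatch count smaller than the pipeline group size always yields one round, which is why the `n_microbatches = 3` case is accepted.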
_static	Clean up distributed/CONTRIBUTING.md (#128450 )	2024-06-22 02:41:22 +00:00
_templates	Remove sdp_kernel and replace with sdpa_kernel in attention namespace (#114689 )	2024-01-24 22:28:04 +00:00
community	[ONNX] Update 'person_of_interest.rst', 'CODEOWNERS' and 'merge_rules.yaml' (#126364 )	2024-06-16 04:52:16 +00:00
elastic	DOC: add docstring to construct_and_record_rdzv_event() (#128189 )	2024-06-10 22:17:33 +00:00
notes	[docs] Redirect custom ops landing page to the correct place (#129177 )	2024-06-21 13:31:32 +00:00
rpc
scripts	Revert "[BE][Easy] use `pathlib.Path` instead of `dirname` / `".."` / `pardir` (#129374 )"	2024-06-29 00:47:15 +00:00
amp.rst	add xpu for amp (#127276 )	2024-06-20 21:49:35 +00:00
autograd.rst	Add torch.library.register_autograd (#124071 )	2024-04-18 12:47:59 +00:00
backends.rst	[cuDNN][SDPA] Remove `TORCH_CUDNN_SDPA_ENABLED=1`, enable cuDNN SDPA by default on H100 and 2nd on other archs >= sm80 (#125343 )	2024-06-30 19:22:16 +00:00
benchmark_utils.rst	Adding Compare in torch.utils.benchmark documentation (#125009 )	2024-05-03 00:50:54 +00:00
bottleneck.rst
checkpoint.rst	[checkpoint] Clean up selective activation checkpoint and make public (#125795 )	2024-06-18 18:18:50 +00:00
complex_numbers.rst	Document complex optimizer semantic behavior (#121667 )	2024-03-16 00:43:47 +00:00
cond.rst	Fix typo under docs directory (#119657 )	2024-02-15 21:14:34 +00:00
conf.py	Remove Caffe2 handling from onnx_unpack_quantized_weights (#129021 )	2024-06-21 06:16:44 +00:00
config_mod.rst
cpp_extension.rst
cpp_index.rst
cpu.rst	Add current_device() to torch.cpu (#110987 )	2023-10-11 05:13:10 +00:00
cuda_environment_variables.rst	Add doc page for environment variables that effect PyTorch Runtime (#119087 )	2024-02-15 21:41:38 +00:00
cuda._sanitizer.rst
cuda.rst	Created docs (and example) for cudart function in torch.cuda (#128741 )	2024-06-17 16:50:37 +00:00
cuda.tunable.rst	[ROCm] TunableOp improvements (#124362 )	2024-06-03 22:30:11 +00:00
cudnn_persistent_rnn.rst
cudnn_rnn_determinism.rst
data.rst	Revert "reseed all Generators in Dataloader's _worker_loop() -- via GC (#107131 )"	2023-08-23 17:08:07 +00:00
ddp_comm_hooks.rst	[DOCS][DDP]Fix the simple of saving and reloading PowerSGD state and hook. (#102721 )	2023-06-10 00:15:00 +00:00
debugging_environment_variables.rst	Add doc page for environment variables that effect PyTorch Runtime (#119087 )	2024-02-15 21:41:38 +00:00
deploy.rst
deterministic.rst	Add `torch.utils.deterministic.fill_uninitialized_memory` flag (#111377 )	2023-11-01 16:10:09 +00:00
distributed.algorithms.join.rst
distributed.checkpoint.rst	[DCP] Provides default AsyncStager (#124939 )	2024-05-02 19:48:54 +00:00
distributed.elastic.rst	Reapply "distributed debug handlers (#126601 )" (#127805 )	2024-06-04 19:44:30 +00:00
distributed.optim.rst
distributed.pipelining.rst	Implemented flexible PP schedule (#129597 )	2024-07-02 07:54:38 +00:00
distributed.rst	Retire torch.distributed.pipeline (#127354 )	2024-06-07 08:11:58 +00:00
distributed.tensor.parallel.rst	[tp] doc fixes (#121431 )	2024-03-08 17:46:44 +00:00
distributions.rst	Add inverse gamma distribution and fix `sign` bug in `PowerTransform`. (#104501 )	2023-11-01 02:26:25 +00:00
dlpack.rst
docutils.conf
export.ir_spec.rst	[export] Remove torch._export.export (#119095 )	2024-02-08 21:22:04 +00:00
export.rst	[export] minor typo fix (#129543 )	2024-06-26 18:35:31 +00:00
fft.rst
fsdp.rst	[FSDP][state_dict] Expose optimizer state_dict config (#105949 )	2023-08-21 07:29:49 +00:00
func.api.rst
func.batch_norm.rst
func.migrating.rst
func.rst
func.ux_limitations.rst
func.whirlwind_tour.rst
future_mod.rst	Add swap_tensors path to nn.Module._apply (#117167 )	2024-02-07 18:55:44 +00:00
futures.rst
fx.experimental.rst	Traceable wrapper subclass support for deferred runtime asserts (#126198 )	2024-05-21 01:21:46 +00:00
fx.rst	Implement Graph Transform Observer (#127427 )	2024-06-02 06:49:47 +00:00
hub.rst
index.rst	Retire torch.distributed.pipeline (#127354 )	2024-06-07 08:11:58 +00:00
jit_builtin_functions.rst
jit_language_reference_v2.rst
jit_language_reference.rst
jit_python_reference.rst
jit_unsupported.rst	Add support for `torch.Generator` type in TorchScript (#110413 )	2023-11-21 23:07:21 +00:00
jit_utils.rst
jit.rst	Doc test non packages (#110568 )	2023-10-06 14:16:01 +00:00
library.rst	New Custom Ops Documentation landing page (#127400 )	2024-05-30 01:06:04 +00:00
linalg.rst
logging.rst	Change classification to beta for TORCH_LOGS (#118682 )	2024-01-31 21:50:55 +00:00
masked.rst	Doc test non packages (#110568 )	2023-10-06 14:16:01 +00:00
math-quantizer-equation.png
meta.rst	Add documentation for meta device (#119119 )	2024-02-04 01:05:22 +00:00
miscellaneous_environment_variables.rst	Add doc page for environment variables that effect PyTorch Runtime (#119087 )	2024-02-15 21:41:38 +00:00
mobile_optimizer.rst
model_zoo.rst
module_tracker.rst	Add module tracker (#125352 )	2024-05-04 18:33:35 +00:00
monitor.rst
mps_environment_variables.rst	[MPS] Fast math env var (#129007 )	2024-06-25 13:52:07 +00:00
mps.rst	Add support in Python API for the recommended max working set size. (#128289 )	2024-06-12 16:03:57 +00:00
mtia.rst	[MTIA] Add set_device support (#128040 )	2024-06-10 23:42:52 +00:00
multiprocessing.rst	Doc test non packages (#110568 )	2023-10-06 14:16:01 +00:00
name_inference.rst	[docs] Properly link register_post_accumulate_grad_hook docs (#108157 )	2023-08-29 22:13:33 +00:00
named_tensor.rst	fixing named tensor unflatten example (#106921 )	2023-08-22 18:00:10 +00:00
nested.rst
nn.attention.bias.rst	Remove sdp_kernel and replace with sdpa_kernel in attention namespace (#114689 )	2024-01-24 22:28:04 +00:00
nn.attention.rst	Remove sdp_kernel and replace with sdpa_kernel in attention namespace (#114689 )	2024-01-24 22:28:04 +00:00
nn.functional.rst	Add RMSNorm module (#121364 )	2024-03-29 18:05:28 +00:00
nn.init.rst
nn.rst	Cleanup some duplicated placeholder py:module docs (#123244 )	2024-04-05 03:18:53 +00:00
onnx_dynamo_onnxruntime_backend.rst	Follow-up #108379 (#108905 )	2023-09-09 01:38:36 +00:00
onnx_dynamo.rst	[ez][doc] Fix sample code in onnx_dynamo.rst (#114770 )	2023-11-29 19:27:52 +00:00
onnx_torchscript_supported_aten_ops.rst	Refactor torch.onnx documentation (#108379 )	2023-09-08 18:23:48 +00:00
onnx_torchscript.rst	Follow-up #108379 (#108905 )	2023-09-09 01:38:36 +00:00
onnx.rst	fix pytorch version for onnx in doc (#124182 )	2024-04-17 18:05:15 +00:00
optim.rst	[MPS] Fused SGD optimizer (#129350 )	2024-06-27 04:37:14 +00:00
package.rst	Doc test non packages (#110568 )	2023-10-06 14:16:01 +00:00
profiler.rst	Doc test non packages (#110568 )	2023-10-06 14:16:01 +00:00
quantization-accuracy-debugging.rst
quantization-backend-configuration.rst
quantization-support.rst	[quant][pt2e] Add `model_is_exported` util function (#119726 )	2024-02-16 19:29:36 +00:00
quantization.rst	Cleanup some duplicated placeholder py:module docs (#123244 )	2024-04-05 03:18:53 +00:00
random.rst
rpc.rst	[BE] RPC is missing RRef docs (#106902 )	2023-08-10 16:26:27 +00:00
signal.rst
size.rst	Added a docstring for torch.Size.numel. (#124186 )	2024-04-19 09:23:02 +00:00
sparse.rst	Fix typo in sparse.rst (#121826 )	2024-03-19 00:17:19 +00:00
special.rst
storage.rst
tensor_attributes.rst	Include the scalar tensor auto-transfer in the doc (#119967 )	2024-02-15 22:37:39 +00:00
tensor_view.rst
tensorboard.rst
tensors.rst	add xpu to torch.tensors (#127280 )	2024-06-11 18:13:01 +00:00
testing.rst
threading_environment_variables.rst	Add doc page for environment variables that effect PyTorch Runtime (#119087 )	2024-02-15 21:41:38 +00:00
torch_cuda_memory.rst	Fix typo under docs directory (#110359 )	2023-10-03 16:36:05 +00:00
torch_environment_variables.rst	[Docs][MPS] Add mps environment variable table (#129008 )	2024-06-20 03:30:35 +00:00
torch_nccl_environment_variables.rst	[c10d][doc] add a doc page for NCCL ENVs (#128235 )	2024-06-09 16:08:38 +00:00
torch.ao.ns._numeric_suite_fx.rst
torch.ao.ns._numeric_suite.rst
torch.compiler_aot_inductor.rst	[AOTI] docs: add suggestion to turn on freezing on CPU (#128010 )	2024-06-07 08:57:02 +00:00
torch.compiler_api.rst	[torch.export] Support is_compiling() flag for non-strict mode (#119602 )	2024-02-29 05:52:51 +00:00
torch.compiler_best_practices_for_backends.rst	Restructure torch.compile docs (#105376 )	2023-07-28 20:58:57 +00:00
torch.compiler_cudagraph_trees.rst	[CUDAGraph] add more docs for cudagraph trees (#127963 )	2024-06-18 02:07:07 +00:00
torch.compiler_custom_backends.rst	Fix a link in the compiler backend doc (#126079 )	2024-05-21 20:16:04 +00:00
torch.compiler_dynamic_shapes.rst	feat: Add min, max ranges to mark_dynamic API (#119737 )	2024-03-07 23:26:03 +00:00
torch.compiler_dynamo_deepdive.rst	Fix links rendering when surrounding code in Dynamo deepdive (#123427 )	2024-04-13 04:55:15 +00:00
torch.compiler_dynamo_overview.rst	Rename TorchDynamo -> Dyanamo in the dynamo tutorial doc (#123431 )	2024-05-07 05:07:00 +00:00
torch.compiler_fake_tensor.rst	Restructure torch.compile docs (#105376 )	2023-07-28 20:58:57 +00:00
torch.compiler_faq.rst	Fixed broken link and removed unfinished sentence from issue #126367 (#127938 )	2024-06-05 07:37:32 +00:00
torch.compiler_fine_grain_apis.rst	[torch.export] Support is_compiling() flag for non-strict mode (#119602 )	2024-02-29 05:52:51 +00:00
torch.compiler_get_started.rst	add xpu to torch.compile (#127279 )	2024-06-13 21:15:09 +00:00
torch.compiler_inductor_profiling.rst	Restructure torch.compile docs (#105376 )	2023-07-28 20:58:57 +00:00
torch.compiler_ir.rst	[export] torch.export landing page (#108783 )	2023-09-10 01:40:42 +00:00
torch.compiler_nn_module.rst	Revert "Reland 3rd try [finishing colesbury's PR 100642] Guard on nn.Module dicts and type (#109323 )" + Forward fixes + test (#110964 )	2023-10-11 05:16:47 +00:00
torch.compiler_performance_dashboard.rst	Restructure torch.compile docs (#105376 )	2023-07-28 20:58:57 +00:00
torch.compiler_profiling_torch_compile.rst	[docs] Update PT2+Profiler docs (#122272 )	2024-03-28 17:52:28 +00:00
torch.compiler_transformations.rst	Fix typo under docs directory (#110359 )	2023-10-03 16:36:05 +00:00
torch.compiler_troubleshooting.rst	Add force_disable_caches to the docs (#126184 )	2024-05-15 07:16:08 +00:00
torch.compiler.rst	add xpu to torch.compile (#127279 )	2024-06-13 21:15:09 +00:00
torch.overrides.rst	Doc test non packages (#110568 )	2023-10-06 14:16:01 +00:00
torch.rst	torch.mtia module for MTIA device backend (#123612 )	2024-04-26 16:17:54 +00:00
type_info.rst
utils.rst	New swap function (#111747 )	2023-12-08 18:49:35 +00:00
xpu.rst	[2/2] Intel GPU Runtime Upstreaming for Generator (#118613 )	2024-02-28 05:28:11 +00:00