pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Andrew Gu c932b39739 [FSDP2] Added `_set_unshard_async_op` (#135523)	2024-09-10 19:28:02 +00:00

This PR adds a private API `_set_unshard_async_op` that allows for running pre-forward and pre-backward all-gathers using the `async_op=True` path so that all-gather allocations happen in the default stream to avoid inter-stream fragmentation.

If using this option, forward requires explicit prefetching, e.g. via the `unshard(async_op=True)` API, for overlap. fp32 -> bf16 casts and the all-gather copy-in will not overlap with compute.

Differential Revision: [D62401551](https://our.internmc.facebook.com/intern/diff/D62401551)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/135523
Approved by: https://github.com/weifengpy

_composable	[FSDP2] Added `_set_unshard_async_op` (#135523)	2024-09-10 19:28:02 +00:00
_shard	[BE][Easy] enable `ruff` rule `PIE790`: unnecessary `pass` statement (#133200)	2024-08-15 15:50:19 +00:00
_sharded_tensor
_sharding_spec
_symmetric_memory	[micro_pipeline_tp] support all _scaled_mm args (#131984)	2024-08-05 21:44:37 +00:00
_tensor	[reland][dtensor] move DTensor to public namespace (#134203)	2024-09-08 17:08:40 +00:00
_tools	Runtime Estimator for estimating GPU compute time (#134243)	2024-08-28 20:06:54 +00:00
algorithms	[BE][Easy] enable `ruff` rule `PIE790`: unnecessary `pass` statement (#133200)	2024-08-15 15:50:19 +00:00
autograd
benchmarks	[BE][Easy] enable `ruff` rule `PIE790`: unnecessary `pass` statement (#133200)	2024-08-15 15:50:19 +00:00
checkpoint	[DCP] Fixes the stateless optimizer issue of distributed state_dict (#135535)	2024-09-10 03:10:00 +00:00
elastic	[elastic] support local_addr across all rendezvous impls (#135262)	2024-09-06 17:55:43 +00:00
examples
fsdp	[reland][dtensor] move DTensor to public namespace (#134203)	2024-09-08 17:08:40 +00:00
launcher
nn	Revert "added persistent option to buffers and namedbuffers (#132994)"	2024-08-09 18:14:53 +00:00
optim	Revert "[BE] typing for decorators - _jit_internal (#131573)"	2024-07-28 03:29:32 +00:00
pipelining	[PP] Fix zero bubble composability with DP (#134052)	2024-09-04 23:46:29 +00:00
rpc	[BE][Easy] enable `ruff` rule `PIE790`: unnecessary `pass` statement (#133200)	2024-08-15 15:50:19 +00:00
tensor	[CP] Extend CP to support load-balancing shards (#132442)	2024-09-09 18:04:38 +00:00
__init__.py	Remove ProcessGroupRoundRobin (#132888)	2024-08-08 01:07:40 +00:00
_checkpointable.py
_composable_state.py
_functional_collectives_impl.py
_functional_collectives.py	[reland][dtensor] move DTensor to public namespace (#134203)	2024-09-08 17:08:40 +00:00
_state_dict_utils.py	[reland][dtensor] move DTensor to public namespace (#134203)	2024-09-08 17:08:40 +00:00
argparse_util.py
c10d_logger.py	[DCP] Fix duplicated logging messages when enable both c10d and dcp l… (#130423)	2024-07-11 13:43:39 +00:00
collective_utils.py
constants.py
CONTRIBUTING.md
device_mesh.py	[DeviceMesh][Easy] Make RuntimeError a bit more descriptive by including the actual world_size (#135271)	2024-09-06 06:23:20 +00:00
distributed_c10d.py	Revert "[c10d] Remove Option for ProcessGroup and Expose backend Options to reflect the correct code structure (#132931)"	2024-08-30 16:27:40 +00:00
launch.py
logging_handlers.py
remote_device.py	[BE][Easy] fix ruff rule needless-bool (SIM103) (#130206)	2024-07-14 08:17:52 +00:00
rendezvous.py
run.py	fix torchrun log message (#131652)	2024-07-25 14:50:10 +00:00
utils.py	[FSDP] casting input args with dataclass(frozen=True) (#135067)	2024-09-05 01:19:53 +00:00