pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 00:21:07 +01:00

Saurabh Mishra 381d0cb239 [DCP] Avoid in-place update and deepcopy during dedupe (#149320)

Summary:
Avoid in-place update and deepcopy during dedupe. Deepcopy becomes prohibitively expensive with models having a huge number of FQNs. This manifested in the Ads 2K experiment as well. Here are the results from the TextRay model in Mitra:

#### Control job with deepcopy regression:
First save is ~24.8s
Global step latency is ~7-8s

Test job with the new fix to avoid deepcopy:
First save is ~21s
Global step latency is ~2s

Test Plan:
```
buck test 'fbcode//mode/dev-nosan' fbcode//caffe2/test/distributed/checkpoint:test_planner
```
https://www.internalfb.com/intern/testinfra/testrun/3940649945104822

Differential Revision: D71245218

Pull Request resolved: https://github.com/pytorch/pytorch/pull/149320
Approved by: https://github.com/MeetVadakkanchery

2025-03-18 16:08:40 +00:00
_composable	[BE][PYFMT] migrate PYFMT for `torch.{distributed,distributions}` to `ruff format` (#144547)	2025-02-28 07:35:56 +00:00
_shard	[BE][PYFMT] migrate PYFMT for `torch.{distributed,distributions}` to `ruff format` (#144547)	2025-02-28 07:35:56 +00:00
_sharded_tensor	[BE][PYFMT] migrate PYFMT for `torch.{distributed,distributions}` to `ruff format` (#144547)	2025-02-28 07:35:56 +00:00
_sharding_spec
_symmetric_memory	[BE][PYFMT] migrate PYFMT for `torch.{distributed,distributions}` to `ruff format` (#144547)	2025-02-28 07:35:56 +00:00
_tensor	[BE][PYFMT] migrate PYFMT for `torch.{distributed,distributions}` to `ruff format` (#144547)	2025-02-28 07:35:56 +00:00
_tools	Add support for non functional collectives under FakeTensorMode and fake_pg for memory tracking (#147566)	2025-03-08 18:00:49 +00:00
algorithms	[BE][PYFMT] migrate PYFMT for `torch.{distributed,distributions}` to `ruff format` (#144547)	2025-02-28 07:35:56 +00:00
autograd
benchmarks	[BE][CI] bump `ruff` to 0.8.4 (#143753)	2024-12-24 12:24:10 +00:00
checkpoint	[DCP] Avoid in-place update and deepcopy during dedupe (#149320)	2025-03-18 16:08:40 +00:00
elastic	Expose the rendezvous keepalive arguments (#145228)	2025-03-03 19:11:56 +00:00
examples
fsdp	[BE]: Apply ruff PERF403 to use dict comprehensions more often (#149257)	2025-03-18 00:46:07 +00:00
launcher	PEP585 update - torch/distributed (#145164)	2025-01-21 04:23:29 +00:00
nn	[BE][PYFMT] migrate PYFMT for `torch.{distributed,distributions}` to `ruff format` (#144547)	2025-02-28 07:35:56 +00:00
optim	[BE][Ez]: Use itertools.chain.from_iterable when possible (#148190)	2025-03-06 20:37:06 +00:00
pipelining	[BE][PYFMT] migrate PYFMT for `torch.{distributed,distributions}` to `ruff format` (#144547)	2025-02-28 07:35:56 +00:00
rpc	[BE][PYFMT] migrate PYFMT for `torch.{distributed,distributions}` to `ruff format` (#144547)	2025-02-28 07:35:56 +00:00
tensor	Add batch dim sharding rule to sdpa (#149253)	2025-03-18 07:54:02 +00:00
__init__.py	PEP585 update - torch/distributed (#145164)	2025-01-21 04:23:29 +00:00
_checkpointable.py	PEP585 update - torch/distributed (#145164)	2025-01-21 04:23:29 +00:00
_composable_state.py
_functional_collectives_impl.py	PEP585 update - torch/distributed (#145164)	2025-01-21 04:23:29 +00:00
_functional_collectives.py	[BE][PYFMT] migrate PYFMT for `torch.{distributed,distributions}` to `ruff format` (#144547)	2025-02-28 07:35:56 +00:00
_serialization.py	PEP585: More UP006 fixes (#146392)	2025-02-20 06:18:13 +00:00
_state_dict_utils.py	Create and send `full_tensor` on `ProcessGroup`-supported device in `_broadcast_tensors` (#148865)	2025-03-12 20:56:31 +00:00
argparse_util.py
c10d_logger.py	PEP585 update - torch/distributed (#145164)	2025-01-21 04:23:29 +00:00
collective_utils.py	PEP585 update - torch/distributed (#145164)	2025-01-21 04:23:29 +00:00
constants.py
CONTRIBUTING.md
device_mesh.py	[DeviceMesh] Add some documentation for `from_group` API and add a 2D test (#146364)	2025-03-01 00:57:37 +00:00
distributed_c10d.py	Revert "[PGNCCL] Launch kernel on current stream & remove `record_stream` entirely (#148590)"	2025-03-17 22:43:15 +00:00
launch.py	[BE][PYFMT] migrate PYFMT for `torch.{distributed,distributions}` to `ruff format` (#144547)	2025-02-28 07:35:56 +00:00
logging_handlers.py	PEP585 update - torch/distributed (#145164)	2025-01-21 04:23:29 +00:00
remote_device.py
rendezvous.py	Fix dist.init_process_group on windows (#148266)	2025-03-05 00:07:56 +00:00
run.py	[BE][PYFMT] migrate PYFMT for `torch.{distributed,distributions}` to `ruff format` (#144547)	2025-02-28 07:35:56 +00:00
utils.py	[BE][PYFMT] migrate PYFMT for `torch.{distributed,distributions}` to `ruff format` (#144547)	2025-02-28 07:35:56 +00:00