pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-06 12:20:52 +01:00

Author	SHA1	Message	Date
Yuanyuan Chen	a60d9e1f6d	Fix flake8 B028 warnings (#166224 ) This PR fixes flake8 B028 warning by specifying stacklevel=2 in `warnings.warn`. The advantage is that users can know more contextual information about PyTorch warnings. Pull Request resolved: https://github.com/pytorch/pytorch/pull/166224 Approved by: https://github.com/ezyang	2025-10-26 06:18:55 +00:00
Maggie Moss	8f80892359	Use correct pyrefly syntax in suppressions distributed/... (#166241 ) Updates the pyrefy-ignores in the torch/distributed directory to use the correct syntax. No functional changes. pyrefly check lintrunner Pull Request resolved: https://github.com/pytorch/pytorch/pull/166241 Approved by: https://github.com/oulgen	2025-10-26 04:16:41 +00:00
Maggie Moss	7457d139c5	Add pyrefly suppressions to torch/distributed (7/n) (#165002 ) Adds suppressions to pyrefly will typecheck clean: https://github.com/pytorch/pytorch/issues/163283 One more PR after this one. Test plan: dmypy restart && python3 scripts/lintrunner.py -a pyrefly check step 1: delete lines in the pyrefly.toml file from the project-excludes field step 2: run pyrefly check step 3: add suppressions, clean up unused suppressions before: https://gist.github.com/maggiemoss/4b3bf2037014e116bc00706a16aef199 after: INFO 0 errors (6,884 ignored) Pull Request resolved: https://github.com/pytorch/pytorch/pull/165002 Approved by: https://github.com/oulgen	2025-10-09 04:08:25 +00:00
joshuamarkovic	559e8d1c20	[doc]: Small typos (#162982 ) Small typo fixes Pull Request resolved: https://github.com/pytorch/pytorch/pull/162982 Approved by: https://github.com/ezyang, https://github.com/zou3519	2025-09-16 17:42:19 +00:00
Will Constable	e3d68dfae2	[DTensor] Make default RNG semantics match user-passed generator (#160482 ) Previously, DTensor kept its own copy of the generator state after the first time a random operator was called on a DTensor. This copy would evolve independently from the generator outside of DTensor. After adding support for users to pass a specific generator into random operators (e.g. `uniform_(..., generator=)`), it was determined (in discussion on #159991) to change the semantics so that any random operations performed on DTensor would evolve the state of the publicly visible generators (either the default one or user-passed one). The upsides are (1) it is now possible to call torch.manual_seed() at any point in the program and have a consistent effect on DTensor, (2) DTensor ops have an observable effect on the generator. The downside is that users are now responsible for seeding their generator before using DTensor, ensuring all ranks use the same seed. Fixes #159991 confirmed docs rendered OK <img width="897" height="414" alt="image" src="https://github.com/user-attachments/assets/c082f0f0-5447-47aa-834f-65342eb237cd" /> Pull Request resolved: https://github.com/pytorch/pytorch/pull/160482 Approved by: https://github.com/wanchaol	2025-08-25 04:21:19 +00:00
PyTorch MergeBot	c7a77470c5	Revert "[DTensor] Make default RNG semantics match user-passed generator (#160482 )" This reverts commit `d1faf2ef04`. Reverted https://github.com/pytorch/pytorch/pull/160482 on behalf of https://github.com/jeffdaily due to failing cuda and rocm jobs ([comment](https://github.com/pytorch/pytorch/pull/160482#issuecomment-3214694297))	2025-08-22 15:04:28 +00:00
Will Constable	d1faf2ef04	[DTensor] Make default RNG semantics match user-passed generator (#160482 ) Previously, DTensor kept its own copy of the generator state after the first time a random operator was called on a DTensor. This copy would evolve independently from the generator outside of DTensor. After adding support for users to pass a specific generator into random operators (e.g. `uniform_(..., generator=)`), it was determined (in discussion on #159991) to change the semantics so that any random operations performed on DTensor would evolve the state of the publicly visible generators (either the default one or user-passed one). The upsides are (1) it is now possible to call torch.manual_seed() at any point in the program and have a consistent effect on DTensor, (2) DTensor ops have an observable effect on the generator. The downside is that users are now responsible for seeding their generator before using DTensor, ensuring all ranks use the same seed. Fixes #159991 confirmed docs rendered OK <img width="897" height="414" alt="image" src="https://github.com/user-attachments/assets/c082f0f0-5447-47aa-834f-65342eb237cd" /> Pull Request resolved: https://github.com/pytorch/pytorch/pull/160482 Approved by: https://github.com/wanchaol	2025-08-21 22:02:16 +00:00
Will Constable	3cf7b4024e	[DTensor] Support user-supplied Generator for random ops (#159933 ) If the user provides a generator kwarg to a random op (e.g. nn.init.uniform_(..., generator=my_generator)), we can still advance that generator's state in a SPMD-global way so that each local-tensor gets appropriate values and the generator advances to the same state as if it had operated on the full tensor. Pull Request resolved: https://github.com/pytorch/pytorch/pull/159933 Approved by: https://github.com/fduwjj, https://github.com/XilunWu, https://github.com/wanchaol	2025-08-07 18:47:22 +00:00
pralay	5430990bd7	Added philox based RNG context for HPU device in Dtensor scenarios (#156581 ) In this PR, we are enabling `HPU` device-specific function calls for random operations. These calls will manage the setting and unsetting of the `context of Random Number Generator`. While HPU devices typically utilize a `Mersenne-based RNG`, Dtensor-specific random operations employ an `offset-based (Philox) RNG tracker` which is specifically integrated with `CUDA` in scope. To integrate a similar offset-based RNG tracker within the `HPU backend`, a backend-specific device handle function is necessary to identify the execution context of these random operations. Pull Request resolved: https://github.com/pytorch/pytorch/pull/156581 Approved by: https://github.com/jeromean, https://github.com/wanchaol	2025-07-08 08:50:24 +00:00
Will Constable	a8b48ff14c	[DTensor] clean up _local_shard_size_and_offset (#150650 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/150650 Approved by: https://github.com/wanchaol, https://github.com/XilunWu ghstack dependencies: #150490	2025-04-09 22:07:48 +00:00
_githubsgi	f0e1a0838c	Enabling xpu in OffsetBasedRNGTracker . (#148360 ) Else torch.distributed breaks on xpu devices. Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/148360 Approved by: https://github.com/zhangxiaoli73, https://github.com/guangyey, https://github.com/gujinghui, https://github.com/XilunWu, https://github.com/kwen2501 Co-authored-by: Yu, Guangye <106960996+guangyey@users.noreply.github.com>	2025-03-27 10:55:05 +00:00
Xilun Wu	ef61c290e1	[DTensor][random] defer DTensor RNG state sync until first random op call or manual_seed call; support more flexible OffsetBasedRNGTracker init (#147025 ) Resolves https://github.com/pytorch/pytorch/issues/146767. May also resolve https://github.com/pytorch/pytorch/issues/147584. ### Summary This PR removes the RNG tracker init from the `distribute_tensor` call for the following reasons: 1. if the user does not use random ops on DTensor, there's no need to init DTensor RNG which currently requires CUDA device to be present. 2. this complies with the 0-communication semantic of `src_data_rank=None` shard distribution. Besides, `OffsetBasedRNGTracker` only accepts `DeviceMesh` argument to its constructor method. ### Consequence DTensor RNG initialization is delayed till the first DTensor random ops call or `torch.distributed.tensor.random.manual_seed`. ### Test `pytest test/distributed/tensor/test_random_ops.py` `pytest test/distributed/tensor/parallel/test_tp_random_state.py` `pytest test/distributed/tensor/parallel/test_tp_style.py` Differential Revision: [D70201856](https://our.internmc.facebook.com/intern/diff/D70201856) Pull Request resolved: https://github.com/pytorch/pytorch/pull/147025 Approved by: https://github.com/kwen2501	2025-02-26 17:33:22 +00:00
Aaron Orenstein	c95efc37ba	PEP585 update - torch/distributed/tensor (#145141 ) See #145101 for details. Pull Request resolved: https://github.com/pytorch/pytorch/pull/145141 Approved by: https://github.com/bobrenjc93	2025-01-18 20:01:59 +00:00
Xilun Wu	ce572fedfc	[dtensor][random] use torch.uint64 as the seed/offset tensor dtype to avoid overflow (#141532 ) Summary DTensor RNG code raises error if the seed passed in is beyong `torch.int64` range (e.g. `torch.tensor([2**64-1])` raises error). The solution is to specify the `dtype=torch.uint64` in the `torch.tensor()` call. Pull Request resolved: https://github.com/pytorch/pytorch/pull/141532 Approved by: https://github.com/wconstab ghstack dependencies: #141731, #141220, #141223	2024-11-29 07:59:34 +00:00
Xilun Wu	93cbb287c2	[dtensor][random] allow user to manual_seed different seed on device mesh; only sync RNG state in WORLD when manual_seed has not been called (#141223 ) Summary This PR proposes 4 changes to DTensor RNG management: 1. DTensor allows users to eagerly initialize the RNG tracker by calling `torch.distributed.tensor._random.manual_seed`. 2. DTensor `manual_seed` no longer checks the integrity of the `seed` argument. Users are responsible for setting the same seed on all ranks within an SPMD group, but if there are multiple separate SPMD groups (e.g. across pipeline stages), users should set a _different_ seed for each SPMD group. For cases like Pipeline Parallel, users can set different initial seed for pipelining stages by calling ``` world_mesh = init_device_mesh( device_type="cuda", mesh_shape=(2, 2, 2), mesh_dim_names=("pp", "dp", "tp"), ) pp_mesh = world_mesh["pp"] pp_rank = pp_mesh.get_local_rank() spmd_mesh = world_mesh["dp", "tp"]._flatten("spmd") # this flattening is only needed if you need to call collective over this mesh torch.distributed.tensor._random.manual_seed(123+pp_rank, spmd_mesh) ``` In other word, if users want to call `torch.distributed.tensor._random.manual_seed`, they will be responsible for passing in the right value and DTensor won't perform any checks on it. If the current rank is not a part of the mesh, it will use the current device RNG state to initialize. 3. `OffsetBasedRNGTracker` still performs RNG state synchronization by broadcasting the RNG state on rank 0 to `WORLD`. However, calling `torch.distributed.tensor._random.manual_seed` is an exception. In this case, no broadcast will happen. 4. Enforce that the `manual_seed` call only accept "full mesh" i.e. the DTensor RNG state on every rank must be set through the call. This makes sure that no rank has its RNG state left uninitialized and the SPMD ranks have their RNG state synchronous. Motivation tl;dr 1. Lazily initializing DTensor RNG tracker causes hang in non-SPMD code such as Pipeline Parallel. 2. Users may want to set different seed on ranks in one device mesh. 3. We want to keep the old behavior if users prefer not curating the RNG state and want to have DTensor take care of it. see detail in https://github.com/pytorch/pytorch/issues/140301 Test `pytest test/distributed/_tensor/test_random_ops.py` `pytest test/distributed/tensor/parallel/test_tp_random_state.py` Pull Request resolved: https://github.com/pytorch/pytorch/pull/141223 Approved by: https://github.com/wconstab ghstack dependencies: #141731, #141220	2024-11-29 07:59:34 +00:00
Xilun Wu	7f5bc9dd87	[dtensor][random][tp] remove the adhoc DTensor RNG tracker TensorParallelRNGTracker since it does not match FSDP2+TP (#141220 ) Summary The ad-hoc DTensor RNG tracker was used to mimic Megatron DDP+TP RNG behavior but it turns out not compatible with PyTorch Distributed FSDP2+TP so we decide to deprecate it and use `OffsetBasedRNGTracker` to replace, which follows the SPMD semantics (replicas get the same random sampling result, shards get different results). Motivation `TensorParallelRNGTracker` was designed for DDP+TP where the random operators produce the same result along the data parallel mesh dimension and different results along the tensor parallel dimension. However this does not apply to the new FSDP+TP composable combination where the model weights are sharded along data parallel mesh dimension as well. Therefore we decide to remove this outdated RNG tracker type for now. If users have demands for exact match between PyTorch Distributed and Megatron on Random Number generation result, feel free to file an issue. Impact `TensorParallelRNGTracker` was only used when Tensor Parallel is used (i.e. calling `parallelize_module`). For non-FSDP users, the "replicas get the same random numbers and shards get different ones" remains unchanged. Unlike `TensorParallelRNGTracker` which sets different seeds (`base_seed + 2718 + TP_rank`) within the TP group, DTensor now sets the same seed (default value is 1234 but users can call `torch.distributed.tensor._random.manual_seed` to modify) on all ranks but choose the right RNG offset based on DTensor placements to enforce the "replicas get the same random numbers and shards get different ones" invariant. For FSDP2 users, improvement should be observed in a way that DTensor sharded within DP group now gets different random number sampling which `TensorParallelRNGTracker` failed to do, though we're not sure how much this change will improve the eventual training loss convergence. Test 1-d model weight meta init: `pytest test/distributed/_tensor/test_random_ops.py -s -k test_tp_model_meta_init` 2-d model weight meta init: `pytest test/distributed/_tensor/test_random_ops.py -s -k test_fsdp_tp_model_meta_init` TP model weight init test: `pytest test/distributed/tensor/parallel/test_tp_random_state.py` FSDP+TP model weight init test: `pytest test/distributed/_composable/fsdp/test_fully_shard_init.py` Pull Request resolved: https://github.com/pytorch/pytorch/pull/141220 Approved by: https://github.com/wconstab ghstack dependencies: #141731	2024-11-29 07:59:26 +00:00
Xilun Wu	c55191f3a2	[dtensor][random] add 1d and 2d model meta init tests (#141731 ) Summary Added tests for model meta init on 1-d mesh (TP) and 2-d mesh (FSDP+TP). This exploits the issue where DTensor RNG failed to initialize weights differently across FSDP ranks. Test `pytest test/distributed/_tensor/test_random_ops.py -s -k meta_init` Pull Request resolved: https://github.com/pytorch/pytorch/pull/141731 Approved by: https://github.com/wconstab	2024-11-29 07:59:20 +00:00
Wanchao Liang	cfc227ad43	[reland][dtensor] move DTensor to public namespace (#134203 ) reland of https://github.com/pytorch/pytorch/pull/133113 I have to create a new PR because the previous reverted PR could not either be rebased, or imported successfully :( ---- Moving DTensor to be in the public namespace, to formally add the documentation page that includes all the public APIs. This includes: * many path renames and path import fixes * a dedicated doc page without too much content yet (adding in the next PRs) * To preserve the BC for users still using the torch.distributed._tensor, I added a shim script to redirect old path calls to the new module The BC preserving is evidented by the fact that all DTensor tests are still working without changing the public imports. So it's safe to land the changes Pull Request resolved: https://github.com/pytorch/pytorch/pull/134203 Approved by: https://github.com/tianyu-l	2024-09-08 17:08:40 +00:00

18 Commits