pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
Aaron Orenstein	7c12cc7ce4	Flip default value for mypy disallow_untyped_defs [6/11] (#127843) See #127836 for details. Pull Request resolved: https://github.com/pytorch/pytorch/pull/127843 Approved by: https://github.com/oulgen ghstack dependencies: #127842	2024-06-08 18:49:29 +00:00
Jeeja	556e4ec6c9	[FSDP] Add device in pin_memory argument (#119878) Add device to pin_memory argument to support other backends like HPU Pull Request resolved: https://github.com/pytorch/pytorch/pull/119878 Approved by: https://github.com/awgu	2024-05-14 10:30:00 +00:00
Andrew Gu	79af814369	[FSDP] Added private `_unshard` API (#124304) Some toy example: [image: "Screenshot 2024-04-17 at 2 00 05 PM", https://github.com/pytorch/pytorch/assets/31054793/b5665a63-beb0-4ca1-92c6-c57a052812fd] We define `FullyShardedDataParallel._unshard(async_op: bool = False)` that can be used to prefetch all-gathers. The user should make sure: 1. Run lazy init before the first `_unshard` call of training. For example, this can hackily be done via `root_module.check_is_root()` on the root FSDP module `root_module`. 2. Call `root_module._wait_unshard_streams_on_current_stream()` before the first `_unshard` call of the current iteration (just need to call it once after last optimizer step and before first `_unshard` of this iteration). Differential Revision: [D56262876](https://our.internmc.facebook.com/intern/diff/D56262876) Pull Request resolved: https://github.com/pytorch/pytorch/pull/124304 Approved by: https://github.com/wanchaol	2024-05-03 13:14:15 +00:00
willfengg	d60135e915	[FSDP1] fix _same_storage check for DTensor (#123617) For FSDP (SHARD_GRAD_OP + use_orig_params) + TP, params in the backward are DTensors. However, ``DTensor.untyped_storage().data_ptr()`` does not work in ``_same_storage``. Thus, desugar to ``DTensor._local_tensor.untyped_storage().data_ptr()``. Issue: https://github.com/pytorch/pytorch/issues/123272 Credit to @bigning for the original fix. After landing, we would not need the patch in mosaic composer: https://github.com/mosaicml/composer/pull/3175/files Pull Request resolved: https://github.com/pytorch/pytorch/pull/123617 Approved by: https://github.com/awgu	2024-04-10 10:26:12 +00:00
Chirag Pandya	b6201a60c5	[BE] minor logging cleanup in distributed (#122921) Summary: Minor logging cleanup in distributed library 1. Don't use "f" formatted strings - address linter issues. 2. Nits: Make use of unused `e` (error) in a few logs. 3. Change info->debug as asked in issue #113545 4. Nit: rename log -> logger in a few files for consistency 5. Fix a linter error. Test Plan: 1. Local build passes. 2. Linter is happy. Reviewers: wanchaol Pull Request resolved: https://github.com/pytorch/pytorch/pull/122921 Approved by: https://github.com/wanchaol	2024-03-29 03:34:01 +00:00
Catherine Lee	4f5785b6b3	Enable possibly-undefined error code (#118533) Fixes https://github.com/pytorch/pytorch/issues/118129 Suppressions automatically added with ``` import re with open("error_file.txt", "r") as f: errors = f.readlines() error_lines = {} for error in errors: match = re.match(r"(.*):(\d+):\d+: error:.*\[(.*)\]", error) if match: file_path, line_number, error_type = match.groups() if file_path not in error_lines: error_lines[file_path] = {} error_lines[file_path][int(line_number)] = error_type for file_path, lines in error_lines.items(): with open(file_path, "r") as f: code = f.readlines() for line_number, error_type in sorted(lines.items(), key=lambda x: x[0], reverse=True): code[line_number - 1] = code[line_number - 1].rstrip() + f" # type: ignore[{error_type}]\n" with open(file_path, "w") as f: f.writelines(code) ``` Signed-off-by: Edward Z. Yang <ezyang@meta.com> Co-authored-by: Catherine Lee <csl@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/118533 Approved by: https://github.com/Skylion007, https://github.com/zou3519	2024-01-30 21:07:01 +00:00
PyTorch MergeBot	40ece2e579	Revert "Enable possibly-undefined error code (#118533)" This reverts commit `4f13f69a45`. Reverted https://github.com/pytorch/pytorch/pull/118533 on behalf of https://github.com/clee2000 due to sorry i'm trying to figure out a codev merge conflict, if this works i'll be back to rebase and merge ([comment](https://github.com/pytorch/pytorch/pull/118533#issuecomment-1917695185))	2024-01-30 19:00:34 +00:00
Edward Z. Yang	4f13f69a45	Enable possibly-undefined error code (#118533) Fixes https://github.com/pytorch/pytorch/issues/118129 Suppressions automatically added with ``` import re with open("error_file.txt", "r") as f: errors = f.readlines() error_lines = {} for error in errors: match = re.match(r"(.*):(\d+):\d+: error:.*\[(.*)\]", error) if match: file_path, line_number, error_type = match.groups() if file_path not in error_lines: error_lines[file_path] = {} error_lines[file_path][int(line_number)] = error_type for file_path, lines in error_lines.items(): with open(file_path, "r") as f: code = f.readlines() for line_number, error_type in sorted(lines.items(), key=lambda x: x[0], reverse=True): code[line_number - 1] = code[line_number - 1].rstrip() + f" # type: ignore[{error_type}]\n" with open(file_path, "w") as f: f.writelines(code) ``` Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/118533 Approved by: https://github.com/Skylion007, https://github.com/zou3519	2024-01-30 05:08:10 +00:00
Wei (Will) Feng	91d5f94f85	[FSDP] Idempotent reshard (#117997) address assertion error "Expects storage to be allocated" by making reshard idempotent https://github.com/pytorch/pytorch/issues/117510 ```pytest test/distributed/fsdp/test_fsdp_fine_tune.py -k test_parity_with_non_frozen_fsdp``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/117997 Approved by: https://github.com/awgu	2024-01-25 23:29:23 +00:00
Wei (Will) Feng	8b0bfb3aaa	[FSDP] remove unused flat_param_part_view (#117082) flat_param_part_view is unused in pytorch repo: https://fburl.com/ssaomd7x it became unused since refactoring in https://github.com/pytorch/pytorch/pull/115497 before that, the original code is below. Since flat_param is 1D, we do not need .view for reshaping ``` self.flat_param.data = padded_unsharded_flat_param[ : unsharded_size.numel() ].view( unsharded_size ) ``` unit test: pytest test/distributed/fsdp/test_fsdp_core.py Pull Request resolved: https://github.com/pytorch/pytorch/pull/117082 Approved by: https://github.com/awgu, https://github.com/wconstab, https://github.com/Skylion007	2024-01-11 21:59:51 +00:00
Wei (Will) Feng	ebedce24ab	[FSDP] enable autograd in forward prefetching (#116792) Problem: when prefetching for the next forward, the current forward may be annotated by `@torch.no_grad`, so `param.grad_fn` stays None during prefetching and `_post_backward_hook` never gets triggered. Repro: ```pytest test/distributed/fsdp/test_fsdp_freezing_weights.py``` Solution: this PR enables autograd during prefetching (`_use_unsharded_views`), so `param.grad_fn` is properly assigned for the next forward. A longer-term fix would be moving `_use_unsharded_views` out of `_prefetch_handle` and into `_pre_forward_unshard`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/116792 Approved by: https://github.com/awgu	2024-01-05 18:44:27 +00:00
drisspg	5f5405f809	I have seen this deprecation and I am curious if this is the fix (#116714) Let's see what CI/CD says Pull Request resolved: https://github.com/pytorch/pytorch/pull/116714 Approved by: https://github.com/awgu, https://github.com/wanchaol	2024-01-05 07:02:58 +00:00
voznesenskym	74e8cfc9a0	Forward fix torch package bug - dont depend on dynam in fsdp directly (#116229) Differential Revision: [D52350752](https://our.internmc.facebook.com/intern/diff/D52350752) Pull Request resolved: https://github.com/pytorch/pytorch/pull/116229 Approved by: https://github.com/janeyx99, https://github.com/zou3519	2023-12-21 03:10:22 +00:00
voznesenskym	77d5f60740	[fsdp][torch.compile] FSDP changes (#115497) Pull Request resolved: https://github.com/pytorch/pytorch/pull/115497 Approved by: https://github.com/albanD	2023-12-19 18:44:36 +00:00
voznesenskym	310f6ab11a	[fsdp] Replace acc_grad hooking with register_post_accumulate_grad_hook on flat_param (#112184) Pull Request resolved: https://github.com/pytorch/pytorch/pull/112184 Approved by: https://github.com/albanD ghstack dependencies: #115315	2023-12-13 16:24:44 +00:00
CK Luk	0ea126e834	add use_fake_all_gather and use_fake_reduce_scatter to FSDP for ablation studies (#113106) Summary: As titled Test Plan: Not needed because this is only for doing ablation studies Reviewed By: awgu Differential Revision: D50867908 Pull Request resolved: https://github.com/pytorch/pytorch/pull/113106 Approved by: https://github.com/awgu	2023-11-17 05:43:30 +00:00
Konstantin Dobler	3700894099	Fix FSDP `summon_full_params(..., with_grads=True)` when grad precision is not `fp32` (#112746) Fixes #112717 I moved the `torch.empty` call after the conditional so that we don't need to check whether `flat_param.grad` is None Pull Request resolved: https://github.com/pytorch/pytorch/pull/112746 Approved by: https://github.com/awgu	2023-11-13 19:04:24 +00:00
BJ Hargrave	670abff6ff	docs: Fix docstring lint errors in torch/distributed/fsdp/_flat_param.py & torch/distributed/fsdp/_init_utils.py (#113358) Fixes #113189 Pull Request resolved: https://github.com/pytorch/pytorch/pull/113358 Approved by: https://github.com/kit1980	2023-11-11 01:53:02 +00:00
wz337	31ded95cd5	[2D] Bind _fsdp_extension to FSDP instances (#113237) Currently, when we have 2D composition, a global variable _extensions controls the 2D deviation we need to take in state_dict calls (See https://github.com/pytorch/pytorch/blob/release/2.1/torch/distributed/fsdp/_fsdp_extensions.py#L66-L68). This is problematic when we have both a 2D model and a plain FSDP model in the same dist environment, as the _extensions will be mistakenly turned on for the plain FSDP model, resulting in a state_dict error (RuntimeError: No parent device_mesh is found for FSDP device_mesh.). This PR binds _fsdp_extension to the FSDP instances to make sure that state_dict calls do not interfere with each other when mixing 2D and 1D parallelism. Pull Request resolved: https://github.com/pytorch/pytorch/pull/113237 Approved by: https://github.com/fduwjj, https://github.com/fegin	2023-11-09 03:31:03 +00:00
Ke Wen	a2dcf26df4	[c10d] Pass avoidRecordStreams into collective() function (#112195) Even after PR #111431, the `collective(...)` function still uses the underscore-suffixed member `avoidRecordStreams_` internally and does not respect each collective call's preference, as `avoidRecordStreams_` is only controlled by an environment variable. As a fix, we pass `avoidRecordStreams` into the collective() function. Pull Request resolved: https://github.com/pytorch/pytorch/pull/112195 Approved by: https://github.com/awgu	2023-10-28 03:28:51 +00:00
Matthew Hoffman	68b0db1274	Define the public API for torch.distributed.fsdp (#109922) Related: https://github.com/pytorch/pytorch/wiki/Public-API-definition-and-documentation Related: https://github.com/microsoft/pylance-release/issues/2953 This fixes pylance issues for these classes: ``` "FullyShardedDataParallel" is not exported from module "torch.distributed.fsdp" ``` These classes all have public docs: * [`BackwardPrefetch`](https://pytorch.org/docs/stable/fsdp.html#torch.distributed.fsdp.BackwardPrefetch) * [`CPUOffload`](https://pytorch.org/docs/stable/fsdp.html#torch.distributed.fsdp.CPUOffload) * [`FullyShardedDataParallel`](https://pytorch.org/docs/stable/fsdp.html#torch.distributed.fsdp.FullyShardedDataParallel) * [`MixedPrecision`](https://pytorch.org/docs/stable/fsdp.html#torch.distributed.fsdp.MixedPrecision) * [`ShardingStrategy`](https://pytorch.org/docs/stable/fsdp.html#torch.distributed.fsdp.ShardingStrategy) And it seems like all the newly added classes will have docs once they are released. Pull Request resolved: https://github.com/pytorch/pytorch/pull/109922 Approved by: https://github.com/wanchaol	2023-09-28 02:15:58 +00:00
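
The suppression script quoted in the "Enable possibly-undefined error code" (#118533) entries is flattened onto one line in this listing. A minimal sketch of the same idea follows, not the script that was actually run. It assumes mypy output of the form `path:line:col: error: message  [code]`. The function names `parse_errors` and `add_suppressions` are hypothetical, and it works on in-memory lists of lines rather than reading and rewriting files.

```python
import re
from collections import defaultdict

# Assumed mypy output shape: "path:line:col: error: message  [error-code]"
ERROR_RE = re.compile(r"(.*):(\d+):\d+: error:.*\[(.*)\]")


def parse_errors(report_lines):
    """Map file path -> {line number -> error code} from mypy output lines."""
    by_file = defaultdict(dict)
    for line in report_lines:
        match = ERROR_RE.match(line)
        if match:
            path, lineno, code = match.groups()
            by_file[path][int(lineno)] = code
    return dict(by_file)


def add_suppressions(source_lines, errors_for_file):
    """Return a copy of source_lines with a type-ignore comment appended
    to every line that mypy reported (line numbers are 1-based)."""
    patched = list(source_lines)
    for lineno, code in errors_for_file.items():
        stripped = patched[lineno - 1].rstrip()
        # Two spaces before the inline comment (the quoted script used one).
        patched[lineno - 1] = f"{stripped}  # type: ignore[{code}]\n"
    return patched


# Hypothetical report and source file contents, for illustration only.
report = [
    'pkg/mod.py:2:5: error: Name "x" may be undefined  [possibly-undefined]\n',
    "Found 1 error in 1 file (checked 1 source file)\n",
]
source = ["if flag:\n", "    print(x)\n"]

errors = parse_errors(report)
patched = add_suppressions(source, errors["pkg/mod.py"])
```

Because each suppression is appended to an existing line, line numbers do not shift while patching, so the order in which reported lines are processed does not matter in this sketch.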

21 Commits
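
The logging cleanup described in #122921 (avoid f-strings in log calls, make use of the caught exception `e`) matches a general pattern in the standard-library `logging` module, sketched below. This is an illustration under stated assumptions, not code from the PR: the logger name, the `Expensive` class, `risky_step`, and the messages are all hypothetical.

```python
import logging

logger = logging.getLogger("fsdp_logging_sketch")  # hypothetical logger name
logger.setLevel(logging.INFO)


class Expensive:
    """Counts how often it is rendered to a string."""

    def __init__(self):
        self.renders = 0

    def __str__(self):
        self.renders += 1
        return "expensive-value"


def risky_step():
    # Hypothetical failure used only to demonstrate logging the exception.
    raise RuntimeError("collective timed out")


value = Expensive()

# %-style arguments are formatted only if the record is actually emitted.
# DEBUG is below the INFO threshold here, so __str__ is never called.
logger.debug("state: %s", value)
lazy_renders = value.renders

# An f-string is rendered eagerly, before logging checks the level.
logger.debug(f"state: {value}")
eager_renders = value.renders

try:
    risky_step()
except RuntimeError as e:
    # Pass the caught exception as an argument instead of leaving it unused.
    logger.warning("risky_step failed: %s", e)
```

The deferred formatting is why linters commonly flag f-strings inside logging calls: the string is built even when the message is filtered out.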