pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-06 12:20:52 +01:00

Author	SHA1	Message	Date
Tom Ritchford	498a7808ff	Fix unused Python variables outside torch/ and test/ (#136359 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/136359 Approved by: https://github.com/albanD	2024-12-11 17:10:23 +00:00
Xuehai Pan	c0ed38e644	[BE][Easy][3/19] enforce style for empty lines in import segments in `benchmarks/` (#129754 ) See https://github.com/pytorch/pytorch/pull/129751#issue-2380881501. Most changes are auto-generated by linter. You can review these PRs via: ```bash git diff --ignore-all-space --ignore-blank-lines HEAD~1 ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/129754 Approved by: https://github.com/ezyang	2024-07-17 14:34:42 +00:00
Yichen Yan	b1eb9e172a	remove jit from dynamo benchmark (#113338 ) Continuous of https://github.com/pytorch/pytorch/pull/106071, without this dynamo dist cannot run at the moment. Related to https://github.com/pytorch/benchmark/pull/1787 Pull Request resolved: https://github.com/pytorch/pytorch/pull/113338 Approved by: https://github.com/ezyang	2023-11-10 18:02:08 +00:00
Xuehai Pan	8d45f555d7	[BE] [1/3] Rewrite `super()` calls in caffe2 and benchmarks (#94587 ) Rewrite Python built-in class `super()` calls. Only non-semantic changes should be applied. - #94587 - #94588 - #94592 Also, methods with only a `super()` call are removed: ```diff class MyModule(nn.Module): - def __init__(self): - super().__init__() - def forward(self, ...): ... ``` Some cases that change the semantics should be kept unchanged. E.g.: `f152a79be9/caffe2/python/net_printer.py (L184-L190)` `f152a79be9/test/test_jit_fuser_te.py (L2628-L2635)` Pull Request resolved: https://github.com/pytorch/pytorch/pull/94587 Approved by: https://github.com/ezyang	2023-02-11 18:19:48 +00:00
Will Constable	bdc9911575	Fix typo in dist_util.py (#89167 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/89167 Approved by: https://github.com/davidberard98	2022-11-17 08:45:27 +00:00
Will Constable	f920bfaf2a	Use torchrun for dynamo/distributed.py (#89149 ) Mainly wanted to confirm torchrun works fine with dynamo/ddp, but it is also a better system than manually launching processes. Partially addresses issue #1779 New run commands ------------ single process: python benchmarks/dynamo/distributed.py [args] multi-gpu (e.g. 2 gpu on one host): torchrun --nproc_per_node 2 benchmarks/dynamo/distributed.py [args] Pull Request resolved: https://github.com/pytorch/pytorch/pull/89149 Approved by: https://github.com/aazzolini	2022-11-16 23:05:34 +00:00
Andrew Gu	4284862db6	[Dynamo][FSDP] Migrate to `ModuleWrapPolicy` (#88453 ) Hello @wconstab! As you saw, `transformer_auto_wrap_policy()` is a misnomer and actually works for any module classes. The PR before this one tries to add a class `ModuleWrapPolicy` that takes in the `module_classes` in its constructor and works just like `transformer_auto_wrap_policy()` without requiring the `functools.partial()`. I hope you do not mind if we update the dynamo benchmarks util file with this migration. The PR before this one might require some back and forth within FSDP devs, so I apologize for any consequent updates to this PR, which in itself is an easy change. I will request review once we know the previous PR is good for land. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88453 Approved by: https://github.com/wconstab	2022-11-13 14:56:30 +00:00
Will Constable	a3f3ec8fac	[FSDP+dynamo]: forward treats parameter-views as params (#88781 ) Dynamo+AotAutograd needs a way to wrap all tensors (whether inputs or params/buffers) in FakeTensor wrappers, and FSDP's mangling of parameters hides them from this wrapping. This PR unblocks running hf_bert and hf_T5 with FSDP under dynamo, whether using recursive wrapping around transformer layers or only applying FSDP around the whole model. Perf/memory validation and possibly optimization is the next step. `python benchmarks/dynamo/distributed.py --torchbench_model hf_Bert --fsdp --dynamo aot_eager` `python benchmarks/dynamo/distributed.py --torchbench_model hf_Bert --fsdp --dynamo aot_eager --fsdp_wrap` `python benchmarks/dynamo/distributed.py --torchbench_model hf_T5 --fsdp --dynamo aot_eager` `python benchmarks/dynamo/distributed.py --torchbench_model hf_T5 --fsdp --dynamo aot_eager --fsdp_wrap` The problem: Dynamo (Actually aot_autograd) trips up with FSDP becuase it must wrap all input tensors in FakeTensor wrappers, and it only knows to wrap graph inputs or named_(parameters, buffers). FSDP's pre_forward hook sets views (which are not nn.param) into the flatparam as attrs on the module with the same name as the original param, but they will not show up in named_parameters. - in use_orig_params mode, FSDP still de-registers params during pre-forward hook, then re-registers them post-forward - during forward (between the hooks), the params are setattr'd on the module as regular view tensors, not nn.Parameters - note: use_orig_params is the recommended way to use FSDP, and use_orig_params=False is being deprecated. So i only consider use_orig_params=True for this enablement The solution: - adding them to named_buffers is not possible because it interferes with how FSDP's `_apply` works - since they are not actual nn.parameters, register_parameter will complain about registering them - simply seting `module._parameters[name] = view` seems to be a viable workaround, despite being hacky, and FSDP code does modify _parameters directly already. Note: Manual checkpointing still isn't working with FSDP+dynamo, so that will have to be addressed in a follow up. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88781 Approved by: https://github.com/ezyang, https://github.com/awgu	2022-11-12 01:17:23 +00:00
Will Constable	7a4d91cac4	Add distributed dynamo benchmarking utils (#87419 ) Util for convenient local benchmarking/debugging of distributed models. Not to be confused with the 'real' distributed benchmark script we use for torchbench experiments on slurm. Tries to be simple/hackable and let you use different combinations of DDP/FSDP with models and dynamo backends. Example usage `python benchmarks/dynamo/distributed.py --toy_model --dynamo inductor --ddp` `--dynamo` flag accepts normal dynamo backends (plus 'print' which literally prints graphs to screen) `--torchbench_model <model_name>` works in place of `--toy_model` `--fsdp` is WIP cc @jansel @lezcano @fdrocha @mlazos @soumith @voznesenskym @yanboliang @penguinwu @anijain2305 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87419 Approved by: https://github.com/jansel	2022-10-24 17:39:57 +00:00

9 Commits