pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-06 12:20:52 +01:00

Author	SHA1	Message	Date
Xuehai Pan	758a0a88a2	[BE][Easy] enable `ruff` rule `PIE790`: unnecessary `pass` statement (#133200 ) This PR removes unnecessary `pass` statement. This is semanticly safe because the bytecode for the Python code does not change. Note that if there is a docstring in the function, a empty function does not need a `pass` statement as placeholder. Pull Request resolved: https://github.com/pytorch/pytorch/pull/133200 Approved by: https://github.com/malfet, https://github.com/eqy, https://github.com/kit1980	2024-08-15 15:50:19 +00:00
PyTorch MergeBot	cbee9c1fd2	Revert "Deprecate `torch._utils.is_compiling()` and `torch._dynamo.external_utils.is_compiling()` (#127690 )" This reverts commit `0e7e61f7ce`. Reverted https://github.com/pytorch/pytorch/pull/127690 on behalf of https://github.com/kit1980 due to breaking internal builds ([comment](https://github.com/pytorch/pytorch/pull/127690#issuecomment-2272370386))	2024-08-07 00:05:20 +00:00
Xuehai Pan	0e7e61f7ce	Deprecate `torch._utils.is_compiling()` and `torch._dynamo.external_utils.is_compiling()` (#127690 ) This PR is split from PR #126898. - #126898 ------ Pull Request resolved: https://github.com/pytorch/pytorch/pull/127690 Approved by: https://github.com/Skylion007, https://github.com/malfet	2024-08-03 09:43:38 +00:00
Oguz Ulgen	72d2dba992	Add None return type to init (#132335 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/132335 Approved by: https://github.com/albanD	2024-08-01 15:26:45 +00:00
Mikayla Gawarecki	1dd10ac802	[BE] [Reland] Make nn.Module state_dict load_state_dict pre-hook and state_dict post-hook public (#131690 ) Reland https://github.com/pytorch/pytorch/pull/126704 #### Fixes the issue with type of `nn.Module._state_dict_hooks` being changed in that PR which was problematic: Instead of using `Tuple(Callable, bool)` to keep track of whether the private `_register_state_dict_hook` or the public `register_state_dict_post_hook` API was used to register the hook and toggle the behavior accordingly, I set an attribute on the Callable in the private API, which is never cleaned up. If a callable previously registered using the private API is registered via the public API, a RuntimeError will be raised #### Copied from previous PR description Fixes https://github.com/pytorch/pytorch/issues/75287 and https://github.com/pytorch/pytorch/issues/117437 - `nn.Module._register_state_dict_hook` --> add public `nn.Module.register_state_dict_post_hook` - Add a test as this API was previously untested - `nn.Module._register_load_state_dict_pre_hook` --> add public `nn.Module.register_load_state_dict_pre_hook` (remove the `with_module` flag, default it to `True` ~- For consistency with optimizer `load_state_dict_pre_hook` raised by @janeyx99, allow the pre-hook to return a new `state_dict`~ - For issuet by https://github.com/pytorch/pytorch/issues/117437 regarding `_register_state_dict_hook` semantic of returning a new state_dict only being respected for the root for private hook - Document this for private `_register_state_dict_hook` - Remove this for the public `register_state_dict_post_hook` Pull Request resolved: https://github.com/pytorch/pytorch/pull/131690 Approved by: https://github.com/albanD	2024-07-26 18:14:07 +00:00
Jovian Anthony Jaison	f6edd1f7c9	[BE] Make ActivationWrapper an abstract class (#129808 ) Fixes #95481 Test Plan: Unit tested checkpoint_wrapper.py by instantizing ActivationWrapper and got TypeError as expected. Pull Request resolved: https://github.com/pytorch/pytorch/pull/129808 Approved by: https://github.com/Skylion007	2024-07-02 04:29:43 +00:00
Xuehai Pan	e6d4451ae8	[BE][Easy] enable UFMT for `torch/distributed/{algorithms,autograd,benchmarks,checkpoint,elastic}/` (#128866 ) Part of #123062 - #123062 Pull Request resolved: https://github.com/pytorch/pytorch/pull/128866 Approved by: https://github.com/fegin	2024-06-18 13:51:53 +00:00
PyTorch MergeBot	1d233b8f50	Revert "Make nn.Module state_dict load_state_dict pre-hook and state_dict post hook public (#126704 )" This reverts commit `c38b3381a1`. Reverted https://github.com/pytorch/pytorch/pull/126704 on behalf of https://github.com/clee2000 due to broke internal typecheck D58394110 (which probably means the code wouldn't work either but I guess it didn't run on the diff). Probably an easy fix? ([comment](https://github.com/pytorch/pytorch/pull/126704#issuecomment-2161299193))	2024-06-11 17:45:20 +00:00
Mikayla Gawarecki	c38b3381a1	Make nn.Module state_dict load_state_dict pre-hook and state_dict post hook public (#126704 ) Fixes https://github.com/pytorch/pytorch/issues/75287 and https://github.com/pytorch/pytorch/issues/117437 - `nn.Module._register_state_dict_hook` --> add public `nn.Module.register_state_dict_post_hook` - Add a test as this API was previously untested - `nn.Module._register_load_state_dict_pre_hook` --> add public `nn.Module.register_load_state_dict_pre_hook` (remove the `with_module` flag, default it to `True` ~- For consistency with optimizer `load_state_dict_pre_hook` raised by @janeyx99, allow the pre-hook to return a new `state_dict`~ - Document issue pointed out by https://github.com/pytorch/pytorch/issues/117437 regarding `_register_state_dict_hook` semantic of returning a new state_dict only being respected for the root for private hook - Remove this for the public `register_state_dict_post_hook` Pull Request resolved: https://github.com/pytorch/pytorch/pull/126704 Approved by: https://github.com/albanD ghstack dependencies: #126906	2024-06-10 21:50:17 +00:00
PyTorch MergeBot	90bb510ece	Revert "Deprecate `torch._utils.is_compiling()` and `torch._dynamo.external_utils.is_compiling()` (#127690 )" This reverts commit `348b181a97`. Reverted https://github.com/pytorch/pytorch/pull/127690 on behalf of https://github.com/clee2000 due to sorry I think https://github.com/pytorch/pytorch/pull/126898#issuecomment-2142884456 is still relevant, I will reach out to them to see what needs to be done in internal to get this remerged ([comment](https://github.com/pytorch/pytorch/pull/127690#issuecomment-2159248859))	2024-06-10 20:44:42 +00:00
Aaron Orenstein	3a0d088517	Flip default value for mypy disallow_untyped_defs [5/11] (#127842 ) See #127836 for details. Pull Request resolved: https://github.com/pytorch/pytorch/pull/127842 Approved by: https://github.com/oulgen	2024-06-08 18:49:18 +00:00
Xuehai Pan	348b181a97	Deprecate `torch._utils.is_compiling()` and `torch._dynamo.external_utils.is_compiling()` (#127690 ) This PR is split from PR #126898. - #126898 ------ Pull Request resolved: https://github.com/pytorch/pytorch/pull/127690 Approved by: https://github.com/Skylion007	2024-06-08 15:25:03 +00:00
Xuehai Pan	67ef2683d9	[BE] wrap deprecated function/class with `typing_extensions.deprecated` (#127689 ) Use `typing_extensions.deprecated` for deprecation annotation if possible. Otherwise, add `category=FutureWarning` to `warnings.warn("message")` if the category is missing. Note that only warnings that their messages contain `[Dd]eprecat(ed\|ion)` are updated in this PR. Resolves #126888 - #126888 This PR is split from PR #126898. - #126898 ------ Pull Request resolved: https://github.com/pytorch/pytorch/pull/127689 Approved by: https://github.com/Skylion007	2024-06-02 12:30:43 +00:00
PyTorch MergeBot	033e733021	Revert "[BE] wrap deprecated function/class with `typing_extensions.deprecated` (#126898 )" This reverts commit `749a132fb0`. Reverted https://github.com/pytorch/pytorch/pull/126898 on behalf of https://github.com/fbgheith due to switching typing-extensions=4.3.0 to 4.9.0 causes internal failure ([comment](https://github.com/pytorch/pytorch/pull/126898#issuecomment-2142884456))	2024-05-31 19:47:24 +00:00
Xuehai Pan	749a132fb0	[BE] wrap deprecated function/class with `typing_extensions.deprecated` (#126898 ) Use `typing_extensions.deprecated` for deprecation annotation if possible. Otherwise, add `category=FutureWarning` to `warnings.warn("message")` if the category is missing. Note that only warnings that their messages contain `[Dd]eprecat(ed\|ion)` are updated in this PR. UPDATE: Use `FutureWarning` instead of `DeprecationWarning`. Resolves #126888 - #126888 Pull Request resolved: https://github.com/pytorch/pytorch/pull/126898 Approved by: https://github.com/albanD	2024-05-29 12:09:27 +00:00
Jeeja	1110edb94b	Fix stream type to generic in comms default hooks (#120069 ) In comms default_hooks - decompress stream is hardcoded to cuda type. fix this to use generic type based on the grad tensor device. Pull Request resolved: https://github.com/pytorch/pytorch/pull/120069 Approved by: https://github.com/jgong5, https://github.com/fegin	2024-05-27 10:27:30 +00:00
Chien-Chin Huang	1d2382f141	[DDP] Use compiled_autograd to trace DDP backward allreduce (#110662 ) Summary The reducer of `DistributedDataParallel` is implemented with C++ and it is not easy to trace the allreduce launched in the reducer. This PR modifies `DistributedDataParallel` to launch one allreduce per gradient when `compiled_autograd` is enabled. The changes allow us to use `compiled_autograd` to trace the allreduce and later be optimized (fused) in the Inductor. Key Logic 1. If `ddp_python_hook` is True, we assume `compiled_autograd` is used. `DistributedDataParallel` registers `compiled_accum_grad_hook` for all parameters. 2. In the first forward() call, if `DistributedDataParallel` is not compiled, all `compiled_accum_grad_hook` are deregistered. If `DistributedDataParallel` is compiled, all `compiled_accum_grad_hook` will be compiled by `compiled_autograd`. 3. `compiled_accum_grad_hook` launches an allreduce to reduce the gradient of the parameter. Bucketing The compiled backward is slow because there is no bucketing for the allreduces. We rely on Inductor to bucket the allreduces. The bucketing is done in a separate PR. Differential Revision: [D49428482](https://our.internmc.facebook.com/intern/diff/D49428482/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/110662 Approved by: https://github.com/wconstab	2024-02-08 03:03:15 +00:00
Edward Z. Yang	46712b019d	Enable local_partial_types (#118467 ) When using dmypy, this setting is enabled and cannot be turned off. Force it for regular mypy too. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/118467 Approved by: https://github.com/Skylion007 ghstack dependencies: #118414, #118418, #118432	2024-01-28 13:38:22 +00:00
mingxzhao	1859895ffa	Docs: fix docstring errors in model_averaging (#117038 ) pydocstyle check averagers.py Pre /workspaces/pytorch/torch/distributed/algorithms/model_averaging/averagers.py:1 at module level: D100: Missing docstring in public module /workspaces/pytorch/torch/distributed/algorithms/model_averaging/averagers.py:20 in public method `__init__`: D107: Missing docstring in __init__ /workspaces/pytorch/torch/distributed/algorithms/model_averaging/averagers.py:27 in public method `average_parameters`: D102: Missing docstring in public method /workspaces/pytorch/torch/distributed/algorithms/model_averaging/averagers.py:84 in public method `__init__`: D107: Missing docstring in __init__ /workspaces/pytorch/torch/distributed/algorithms/model_averaging/averagers.py:106 in public method `average_parameters`: D205: 1 blank line required between summary line and description (found 0) /workspaces/pytorch/torch/distributed/algorithms/model_averaging/averagers.py:106 in public method `average_parameters`: D400: First line should end with a period (not '`') 6 Post /workspaces/pytorch/torch/distributed/algorithms/model_averaging/averagers.py:1 at module level: D100: Missing docstring in public module /workspaces/pytorch/torch/distributed/algorithms/model_averaging/averagers.py:20 in public method `__init__`: D107: Missing docstring in __init__ /workspaces/pytorch/torch/distributed/algorithms/model_averaging/averagers.py:27 in public method `average_parameters`: D102: Missing docstring in public method /workspaces/pytorch/torch/distributed/algorithms/model_averaging/averagers.py:84 in public method `__init__`: D107: Missing docstring in __init__ 4 utils.py Pre /workspaces/pytorch/torch/distributed/algorithms/model_averaging/utils.py:1 at module level: D100: Missing docstring in public module /workspaces/pytorch/torch/distributed/algorithms/model_averaging/utils.py:17 in public function `average_parameters`: D205: 1 blank line required between summary line and description (found 0) /workspaces/pytorch/torch/distributed/algorithms/model_averaging/utils.py:45 in public function `get_params_to_average`: D205: 1 blank line required between summary line and description (found 0) /workspaces/pytorch/torch/distributed/algorithms/model_averaging/utils.py:45 in public function `get_params_to_average`: D401: First line should be in imperative mood (perhaps 'Return', not 'Returns') /workspaces/pytorch/torch/distributed/algorithms/model_averaging/utils.py:68 in public function `average_parameters_or_parameter_groups`: D200: One-line docstring should fit on one line with quotes (found 3) 5 Post /workspaces/pytorch/torch/distributed/algorithms/model_averaging/utils.py:1 at module level: D100: Missing docstring in public module 1 hierarchical_model_averager.py Pre /workspaces/pytorch/torch/distributed/algorithms/model_averaging/hierarchical_model_averager.py:1 at module level: D100: Missing docstring in public module /workspaces/pytorch/torch/distributed/algorithms/model_averaging/hierarchical_model_averager.py:16 in public class `HierarchicalModelAverager`: D205: 1 blank line required between summary line and description (found 0) /workspaces/pytorch/torch/distributed/algorithms/model_averaging/hierarchical_model_averager.py:98 in public method `__init__`: D107: Missing docstring in __init__ /workspaces/pytorch/torch/distributed/algorithms/model_averaging/hierarchical_model_averager.py:137 in private method `_find_process_group`: D205: 1 blank line required between summary line and description (found 0) /workspaces/pytorch/torch/distributed/algorithms/model_averaging/hierarchical_model_averager.py:137 in private method `_find_process_group`: D400: First line should end with a period (not ',') /workspaces/pytorch/torch/distributed/algorithms/model_averaging/hierarchical_model_averager.py:137 in private method `_find_process_group`: D401: First line should be in imperative mood (perhaps 'Return', not 'Returns') /workspaces/pytorch/torch/distributed/algorithms/model_averaging/hierarchical_model_averager.py:151 in public method `average_parameters`: D205: 1 blank line required between summary line and description (found 0) /workspaces/pytorch/torch/distributed/algorithms/model_averaging/hierarchical_model_averager.py:151 in public method `average_parameters`: D400: First line should end with a period (not '`') 8 Post /workspaces/pytorch/torch/distributed/algorithms/model_averaging/hierarchical_model_averager.py:1 at module level: D100: Missing docstring in public module /workspaces/pytorch/torch/distributed/algorithms/model_averaging/hierarchical_model_averager.py:99 in public method `__init__`: D107: Missing docstring in __init__ 2 Pull Request resolved: https://github.com/pytorch/pytorch/pull/117038 Approved by: https://github.com/H-Huang	2024-01-18 04:12:51 +00:00
mingxzhao	b4f1ab4505	Docs: fix docstring errors in ddp_comm_hooks (#116866 ) Reopens #115272 Fixes ddp_comm_hooks errors in https://github.com/pytorch/pytorch/issues/112644 Pull Request resolved: https://github.com/pytorch/pytorch/pull/116866 Approved by: https://github.com/awgu	2024-01-10 01:24:06 +00:00
Ashvanth.S	e6bffc6b87	Fix docstring errors in default_hooks.py, optimizer_overlap.py, checkpoint_wrapper.py, copy.py, benchmark_ddp_rpc.py, utils.py, dependency.py, phony.py, pipeline.py, checkpoint.py, worker.py, batchnorm.py, quantization.py (#113511 ) Fixes #112645 Updated the files by fixing the docstring errors. Pull Request resolved: https://github.com/pytorch/pytorch/pull/113511 Approved by: https://github.com/weifengpy	2023-11-14 18:52:41 +00:00
ChanBong	c0b57d4e3b	fix docstring issues in torch.distributed (#113337 ) Fixes #112643 Fixes all the issues listed ### Error Count \|File \| Count Before \| Count now\| \|---- \| ---- \| ---- \| \|`torch/distributed/optim/named_optimizer.py` \| 13 \| 1\| \|`torch/distributed/nn/functional.py` \| 7 \| 1\| \|`torch/distributed/nn/api/remote_module.py` \| 25 \| 3\| \|`torch/distributed/algorithms/join.py` \| 43 \| 4\| Pull Request resolved: https://github.com/pytorch/pytorch/pull/113337 Approved by: https://github.com/ezyang	2023-11-13 19:37:29 +00:00
Fabrice Pont	053367b1ed	fix: flake8-bugbear code B024 (#107265 ) See #106571 item B024 This fix concerns the addition of `abstractmethod` to methods declared inside abstract classes. Should I also include PEP8 compliant reformatting on the files I had to modify ? Pull Request resolved: https://github.com/pytorch/pytorch/pull/107265 Approved by: https://github.com/kit1980	2023-10-04 23:52:52 +00:00
redwrasse	ba4782e3c0	cleanup typos; redundant parentheses (#109003 ) - minor spelling fixes in `aten/src/ATen/core/TransformationHelper.h` - remove redundant parentheses in control statements in `torch/distributed/algorithms/_quantization/quantization.py`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/109003 Approved by: https://github.com/davidradl, https://github.com/H-Huang	2023-09-11 17:09:17 +00:00
Rohan Varma	208fd1cb84	[RFC] Somewhat BC breaking: make checkpoint_wrapper default to NO_REENTRANT (#108435 ) We should use no_reentrant. There are a lot of users of this API, but it is in a prototype state so should be fine to change. Differential Revision: [D48898148](https://our.internmc.facebook.com/intern/diff/D48898148/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/108435 Approved by: https://github.com/awgu ghstack dependencies: #108032, #108033	2023-09-05 21:43:41 +00:00
Andrew Gu	15953fdf35	[FSDP][8/N] Replace `_FSDPPolicy.policy` with `_Policy._run_policy` (#104969 ) This does some code organization improvement. - It renames `_FSDPPolicy` to `_Policy` to show that it is not only for FSDP but for any module-level API. - It formalizes the contract that such a policy should return something like `target_module_to_kwargs: Dict[nn.Module, Dict[str, Any]]` that maps each module to wrap to its kwargs. It does so by requiring a `_run_policy` abstract method (this time private since users do not need to care about it). Then, our auto wrapping can just call `_run_policy()` to generate the dict and do any validation or post-processing. This PR is technically BC-breaking because it removes the public `ModuleWrapPolicy.policy`. However, I do not think anyone was using that anyway, so this is a pretty safe breakage. Pull Request resolved: https://github.com/pytorch/pytorch/pull/104969 Approved by: https://github.com/rohan-varma ghstack dependencies: #104427, #104967, #104999	2023-08-03 12:42:14 +00:00
Rohan Varma	2ec7cd2db2	[CheckpointWrapper] Test for kwarg propagation, remove checkpoint_fn_arg support (#102679 ) Closes https://github.com/pytorch/pytorch/issues/100576 Differential Revision: [D46342398](https://our.internmc.facebook.com/intern/diff/D46342398/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/102679 Approved by: https://github.com/awgu	2023-07-28 21:18:35 +00:00
Andrew Gu	800287fb56	[FSDP] Optimize away intermediate `div_` for HSDP (#106034 ) ### Background: Gradient Pre-Divide Consider $N$ data parallel workers. Define $g_i$ to be the $i$ th worker's local unsharded gradient. Data parallel gradient reduction computes $\overline g = \frac{1}{N} \sum_{i \in [N]} g_i$. $\sum_{i \in [N]} g_i$ increases the magnitude by a factor of $N$, which may overflow for fp16. However, if we pre-divide and compute $\sum_{i \in [N]} \frac{g_i}{N}$, then the $\frac{g_i}{N}$ may underflow. The current solution from Myle for FSDP is to pre-divide by $\sqrt{N}$ and post-divide by $\sqrt{N}$: $$\overline{g} = \frac{1}{\sqrt{N}} \sum_{i \in [N]} \frac{g_i}{\sqrt{N}}.$$ Now, consider HSDP with $N = S \cdot R$ data parallel workers, sharding over $S$ workers and replicating over $R$ workers. Define $g_{i,j}$ to be the $i \cdot S + j$ th worker's local unsharded gradient (so sharding indexes with $i$ and replication indexes with $j$). The existing implementation computes $$\overline{g} = \frac{1}{\sqrt{R}} \sum_{j \in [R]} \textcolor{red}{ \frac{1}{\sqrt{R}} \frac{1}{\sqrt{S}} } \sum_{i \in [S]} \frac{g_i}{\sqrt{S}},$$ where the $\frac{1}{\sqrt{R}} \frac{1}{\sqrt{S}}$ involves two separate `aten::div_` kernels. ### Revisiting Pre-Divide for HSDP A minor optimization that we can do is with this intermediate `div_`. There are two options: 1. Compute $\overline{g}$ in the same way as FSDP: $$\overline{g} = \frac{1}{\sqrt{N}} \sum_{j \in [R]} \sum_{i \in [S]} \frac{g_{i,j}}{\sqrt{N}}.$$ 2. Compute $\overline{g}$ still with an intermediate division for rescaling but coalescing the two `divs_` into one: $$\overline{g} = \frac{1}{\sqrt{R}} \sum_{j \in [R]} \textcolor{red}{ \frac{1}{\sqrt{N}} } \sum_{i \in [S]} \frac{g_i}{\sqrt{S}}$$ This PR goes with the 1st approach prioritizing performance because (1) it matches the existing FSDP behavior and (2) it avoids a memor-bandwidth bound `div_` kernel that blocks all-reduce launch. ### Implementation Details In order to accommodate this, we need to refactor the communication hook logic that baked the gradient pre/post-division into the default hook. - We raise an error if registering a communication hook for HSDP since the current implementation would only apply the hook to the reduce-scatter, not the all-reduce, which may be unexpected. - We change it so that `state._comm_hook is not None` iff a communication hook is registered. This makes the collectives and the pre/post-division in the default no-communication-hook path more visible in the code. Differential Revision: [D47852459](https://our.internmc.facebook.com/intern/diff/D47852459) Pull Request resolved: https://github.com/pytorch/pytorch/pull/106034 Approved by: https://github.com/rohan-varma	2023-07-28 18:36:26 +00:00
Justin Chu	232b96b6e2	[BE] Enable ruff's UP rules and autoformat distributed/ (#105433 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/105433 Approved by: https://github.com/albanD	2023-07-19 14:27:11 +00:00
Nikita Shulga	5837e95d30	[Reland] Update mypy to 1.4.1 (#105227 ) This PR re-lands - [Typing] Fix PEP 484 Violation (#105022) - Update mypy to 1.4.1 (#91983) That were reverted due to the conflict with internal source repo. Mostly fixes for PEP-484 violation (i.e. when default arg is set to None, but type is not annotated as optional) Plus few real fixes: - Add missing `_get_upgraders_entry_map` to `torch/_C/__init__.pyi` - Add missing return statement to `torch._export. deserialize_graph` - Fix error message in `torch.ao.ns.fx.weight_utils.get_lstm_mod_weights` - Add assert it `torch/optim/optimizer.py` that Optional list is not None TODO (in followup PR): - Fix erroneous `isinstance` check in `torch/ao/quantization/_pt2e/qat_utils.py` Unrelated, to bypass CI failures due to the gcc9 dependency update in Ubuntu-18.04: - Add hack to squash older libstdc++ from conda environment in favor one from OS to `.ci/docker/install_conda.sh` - Update bazel cuda builds to focal, as with libstdc++-6.0.32 bazel builds loose the ability to catch exceptions (probably because they link with cupti statically, but I could not found where it is done) Pull Request resolved: https://github.com/pytorch/pytorch/pull/105227 Approved by: https://github.com/atalman, https://github.com/albanD, https://github.com/Skylion007	2023-07-15 20:30:20 +00:00
PyTorch MergeBot	15fd1ea118	Revert "[Reland] Update mypy to 1.4.1 (#105227 )" This reverts commit `c9c4f8efc3`. Reverted https://github.com/pytorch/pytorch/pull/105227 on behalf of https://github.com/atalman due to trying to mitigate ci sev #105248 ([comment](https://github.com/pytorch/pytorch/pull/105227#issuecomment-1636510935))	2023-07-14 22:28:35 +00:00
Nikita Shulga	c9c4f8efc3	[Reland] Update mypy to 1.4.1 (#105227 ) This PR re-lands - [Typing] Fix PEP 484 Violation (#105022) - Update mypy to 1.4.1 (#91983) That were reverted due to the conflict with internal source repo. Mostly fixes for PEP-484 violation (i.e. when default arg is set to None, but type is not annotated as optional) Plus few real fixes: - Add missing `_get_upgraders_entry_map` to `torch/_C/__init__.pyi` - Add missing return statement to `torch._export. deserialize_graph` - Fix error message in `torch.ao.ns.fx.weight_utils.get_lstm_mod_weights` - Add assert it `torch/optim/optimizer.py` that Optional list is not None TODO (in followup PR): - Fix erroneous `isinstance` check in `torch/ao/quantization/_pt2e/qat_utils.py` Pull Request resolved: https://github.com/pytorch/pytorch/pull/105227 Approved by: https://github.com/atalman, https://github.com/albanD, https://github.com/Skylion007	2023-07-14 20:45:12 +00:00
PyTorch MergeBot	b4d91b1c5b	Revert "[Typing] Fix PEP 484 Violation (#105022 )" This reverts commit `4148b7bada`. Reverted https://github.com/pytorch/pytorch/pull/105022 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/105022#issuecomment-1635967734))	2023-07-14 14:45:09 +00:00
Nikita Shulga	4148b7bada	[Typing] Fix PEP 484 Violation (#105022 ) Not sure, how it worked before, but if arguments must be annotated is optional if they are defaulted to None Towards enabling mypy-1.4.1 in lintrunner <!-- copilot:poem --> ### <samp>🤖 Generated by Copilot at 5e1b9f4</samp> > _We annotate the arguments of doom_ > _To show the `None` values of gloom_ > _We improve the type checking and readability_ > _With `Optional` annotations of metal-ity_ Pull Request resolved: https://github.com/pytorch/pytorch/pull/105022 Approved by: https://github.com/izaitsevfb, https://github.com/huydhn, https://github.com/Skylion007	2023-07-12 10:20:48 +00:00
Rohan Varma	a748be93df	[CheckpointWrapper] Warn on reentrant use (#102890 ) We'd like to encourage users to try non-reentrant as much as possible, and identify any gaps this way. Differential Revision: [D46397786](https://our.internmc.facebook.com/intern/diff/D46397786/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/102890 Approved by: https://github.com/awgu	2023-06-04 18:31:22 +00:00
Rohan Varma	957ea485c4	[FSDP/AC] checkpoint_wrapper acccept auto_wrap_policy (#102672 ) Some feedback for this API is that folks would like to use auto_wrap_policy similar to FSDP instead of having to adapt to the signature of ``check_fn``. Differential Revision: [D46340320](https://our.internmc.facebook.com/intern/diff/D46340320/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/102672 Approved by: https://github.com/awgu	2023-06-04 18:31:19 +00:00
Dirk Groeneveld	75945d54f7	Properly propagates checkpoint wrapper args and kwargs (#99791 ) It looks like passing `args` and `kwargs` to `checkpoint_wrapper()` does not work because someone forgot some ``s. This adds them back in. Pull Request resolved: https://github.com/pytorch/pytorch/pull/99791 Approved by: https://github.com/awgu	2023-05-03 23:19:21 +00:00
Rohan Varma	bba2090831	Enable fused optimizer for DP (#98270 ) Differential Revision: [D42714482](https://our.internmc.facebook.com/intern/diff/D42714482/) NOTE FOR REVIEWERS: This PR has internal Meta-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D42714482/)! Pull Request resolved: https://github.com/pytorch/pytorch/pull/98270 Approved by: https://github.com/awgu	2023-04-13 20:16:32 +00:00
Edward Z. Yang	5a7aad9681	Convert logging f-strings to use % format, part four (#98705 ) This does multi-line concatenated string literals. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/98705 Approved by: https://github.com/voznesenskym	2023-04-11 13:17:59 +00:00
Kazuaki Ishizaki	6514d71add	Fix typos under torch/distributed directory (#98225 ) This PR fixes typos in comments and messages of `.py` files under `torch/distributed` directory Pull Request resolved: https://github.com/pytorch/pytorch/pull/98225 Approved by: https://github.com/soulitzer, https://github.com/kit1980	2023-04-05 00:21:33 +00:00
Edward Z. Yang	5df59f957f	Fix G001,G002,G003 in logs to % syntax (#97812 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/97812 Approved by: https://github.com/Skylion007, https://github.com/kiukchung, https://github.com/malfet, https://github.com/mlazos	2023-04-01 01:43:33 +00:00
Kazuaki Ishizaki	35fd5c548e	Fix typos under torch/distributed directory (#95638 ) This PR fixes typos in comments and messages of `.py` files under torch/distributed directory Pull Request resolved: https://github.com/pytorch/pytorch/pull/95638 Approved by: https://github.com/usamah1, https://github.com/H-Huang, https://github.com/kit1980	2023-03-27 21:13:44 +00:00
Rohan Varma	32f11f58c9	DDP native mixed precision (#92882 ) Implements native mixed precision support for DDP in a similar fashion to how it is enabled for FSDP. The implementation works as follows: 1. In DDP init, we save `_mp_param` and `_fp_param` variables to manage mixed precision parameter usage. In particular, _mp_param will represent the parameter in the reduced precision, while _fp_param will represent the param in regular precision. During forward/backward, we swap back and forth as needed. 2. The root module gets a root pre-forward hook that kicks off copies to the reduced precision for all submodules. An event is recorded for each submodule to allow for waiting, as we run these asynchronously. 3. Each module gets a pre-forward hook that waits on its corresponding event. note that modules might be reused during training, in this case the wait is only done for the first module. After this wait, the module's parameters are in reduced precision. 4. In the pre-forward hook, we register a backward hook on the lower precision parameters in order to run reduced precision allreduce + parameter upcast. We can't rely on the Reducer's constructor setting up these hooks because the gradient is accumulated on the low precision param, so we need to register them ourselves. 5. In the backward pass, when the hook runs, we first run allreduce + divide in the reduced precision. Next, we upcast parameters and gradients back to fp32 asynchronously. We also queue a callback at the end of backward to wait on these upcasts so that the upcast is complete before optim.step() runs. 6. Parameters that don't require grad are also cast since they may be used in computation, they are upcast back in the final autograd callback. 7. DDP Ignored parameters are not touched. Follow-ups: 1. Unify comm hooks and make it work with apply optimizer in backward 2. implement keep_low_precision_grads, 3. allow BN, LN, or custom units to run in reduced precision, 4. support for cast_forward_inputs 5. Unify certain APIs / helpers with FSDP where possible, such as for _cast_forward_inputs 6. Integrate this with replicate() API. 7. The order in which we kick off copies and wait for them is set by the iteration order of module.modules(), but this might not be how the modules are used in the actual training. In the worst case, the last module in module.modules() could be used first which would result in waiting for all copies unnecessarily. For static graphs, we should record the module execution order and copy / wait in this order. 8. Entirely unused modules probably don't need to be cast. Differential Revision: [D42515803](https://our.internmc.facebook.com/intern/diff/D42515803/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/92882 Approved by: https://github.com/zhaojuanmao	2023-03-13 14:10:31 +00:00
Xuehai Pan	5b1cedacde	[BE] [2/3] Rewrite `super()` calls in functorch and torch (#94588 ) Rewrite Python built-in class `super()` calls. Only non-semantic changes should be applied. - #94587 - #94588 - #94592 Also, methods with only a `super()` call are removed: ```diff class MyModule(nn.Module): - def __init__(self): - super().__init__() - def forward(self, ...): ... ``` Some cases that change the semantics should be kept unchanged. E.g.: `f152a79be9/caffe2/python/net_printer.py (L184-L190)` `f152a79be9/test/test_jit_fuser_te.py (L2628-L2635)` Pull Request resolved: https://github.com/pytorch/pytorch/pull/94588 Approved by: https://github.com/ezyang, https://github.com/albanD	2023-02-10 21:16:33 +00:00
Aaron Gokaslan	8fce9a09cd	[BE]: pyupgrade Python to 3.8 - imports and object inheritance only (#94308 ) Apply parts of pyupgrade to torch (starting with the safest changes). This PR only does two things: removes the need to inherit from object and removes unused future imports. Pull Request resolved: https://github.com/pytorch/pytorch/pull/94308 Approved by: https://github.com/ezyang, https://github.com/albanD	2023-02-07 21:10:56 +00:00
Vasiliy Kuznetsov	f15ab8a7f2	AO migration: replace torch internal callsites (#94170 ) Summary: Do the following renames: `torch.quantization` -> `torch.ao.quantization` `torch.nn.quantized` -> `torch.ao.nn.quantized` `torch.nn.quantizable` -> `torch.ao.nn.quantizable` `torch.nn.qat` -> `torch.ao.nn.qat` `torch.nn.intrinsic` -> `torch.ao.nn.intrinsic` And then, do `torch.ao.nn.quantized._reference` -> `torch.ao.nn.quantized.reference` to clean up the aftermath of https://github.com/pytorch/pytorch/pull/84974 Then, manually update `test/test_module_init.py` to fix hanging whitespace due to the replace. Run this script to do the replacements: https://gist.github.com/vkuzo/7f7afebf8c31b9ba48306223e68a1c82 This is for https://github.com/pytorch/pytorch/issues/81667 Test plan: CI Pull Request resolved: https://github.com/pytorch/pytorch/pull/94170 Approved by: https://github.com/jerryzh168	2023-02-07 02:32:23 +00:00
Rohan Varma	e8bf7c21e4	Integrate apply_optim_in_backward with DDP (#89194 ) Allow _apply_optim_in_backward to work with DDP. Example: ``` dist.init_process_group("nccl", rank=rank, world_size=2) torch.cuda.set_device(rank) e = enc().cuda(rank) _apply_optimizer_in_backward( optimizer_class=torch.optim.SGD, params=e.parameters(), optimizer_kwargs={"lr": 0.03}, ) e = DDP(e, device_ids=[rank]) inp = torch.randn(1, 10, device=rank) e(inp).sum().backward() ``` Constraints: 1. Custom communication hook not yet supported 2. _apply_optim_in_backward needs to be called _before_ wrapping model in DDP. 3. DDP will remove the gradient hooks _apply_optim_in_backward registers, so these gradient hooks will not be fired and cannot be used. 4. All DDP managed parameters have grads set to None by default once optimizer is applied. There is no support for setting only some parameter grads to None, this must be done manually by user (and DDP_OVERLAPPED_OPTIM_SET_GRADS_TO_NONE=0 needs to be set.) Differential Revision: [D41329694](https://our.internmc.facebook.com/intern/diff/D41329694/) NOTE FOR REVIEWERS: This PR has internal Meta-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D41329694/)! Pull Request resolved: https://github.com/pytorch/pytorch/pull/89194 Approved by: https://github.com/zhaojuanmao	2022-12-21 07:35:19 +00:00
Andrew Gu	fc429512d5	[FSDP] Clean up `FlatParamHandle` dtypes, post-backward hook (#90660 ) This PR reworks the internal handling of parameter and gradient reduction mixed precision, cleans up the post-backward hook logic, and adds some minor changes to the communication hooks. Overview This PR addresses everything in https://github.com/pytorch/pytorch/issues/90657 except renaming `keep_low_precision_grads` to `keep_grads_in_reduce_dtype` since that is BC breaking. I recommend reading the issue before preceding. For `MixedPrecision(param_dtype, reduce_dtype, ...)`, the exact rule for parameter and gradient reduction mixed precision that we are following is: > If `param_dtype is not None` and `reduce_dtype is None`, then we infer `reduce_dtype = param_dtype`. Otherwise, we take `param_dtype` and `reduce_dtype` as is. This PR enforces that, at the `FlatParamHandle` level, `handle._config.fwd_bwd_param_dtype` and `handle._config.reduce_dtype` are never `None`. The way to check if mixed precision is enabled is to compare against the original parameter dtype, which is now stored in `handle._orig_param_dtype`. It is no longer to check against `None`. This avoids ambiguous cases such as when the user passes `MixedPrecision(param_dtype=torch.float32)`. In that case, our existing implementation mistakenly thinks that parameter mixed precision is enabled and either relies on no-ops silently or errors (such as one case reported by MosaicML). Additional Details - We remove `FullyShardedDataParallel._mixed_precision_enabled_for_params`, `FullyShardedDataParallel._mixed_precision_enabled_for_reduce`, and `FullyShardedDataParallel._mixed_precision_keep_low_precision_grads` since they are not used. - The unit test `test_meta_device_with_mixed_precision()` exercises a tricky edge case with meta device initialization, `apply()` (calling into `summon_full_params()`), and `param_dtype=torch.float32` for a nested wrapping case, where each nested instance has parameters. - We include some minor fixes/improvements to the communication hook implementation. Follow-Ups - We should get rid of `HandleConfig` and store its fields as attributes on `FlatParamHandle` directly. - Rename `keep_low_precision_grads` to `keep_grads_in_reduce_dtype`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/90660 Approved by: https://github.com/zhaojuanmao	2022-12-13 07:34:59 +00:00
Rohan Varma	a5532929da	Remove DDP import (#89982 ) This import is only used for typing, removing it to avoid circular ref in next diffs Differential Revision: [D41636897](https://our.internmc.facebook.com/intern/diff/D41636897/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/89982 Approved by: https://github.com/zhaojuanmao	2022-12-01 14:56:48 +00:00
Kazuaki Ishizaki	2ddefbdc3c	Fix typos used in documents under torch directory (#88300 ) This PR fixes typos, in comments of Python files, that are found from a search box at https://pytorch.org/docs/master/search.html Pull Request resolved: https://github.com/pytorch/pytorch/pull/88300 Approved by: https://github.com/lezcano	2022-11-02 09:38:13 +00:00

1 2 3 4 5

248 Commits