pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
Wanchao Liang	2ee6b97464	[dtensor] move DTensor to public namespace (#133113 ) Moving DTensor to be in the public namespace, to formally add the documentation page that includes all the public APIs. This includes: * many path renames and path import fixes * a dedicated doc page without too much content yet (more to be added in the next PRs) * To preserve BC for users still using `torch.distributed._tensor`, I added a shim script to redirect old path calls to the new module. The BC preservation is evidenced by the fact that all DTensor tests still work without changing the public imports, so it is safe to land the changes. Pull Request resolved: https://github.com/pytorch/pytorch/pull/133113 Approved by: https://github.com/XilunWu ghstack dependencies: #133305, #133306	2024-08-17 05:09:52 +00:00
Xuehai Pan	b25ef91bf1	[BE][Easy][18/19] enforce style for empty lines in import segments in `torch/d*/` (#129770 ) See https://github.com/pytorch/pytorch/pull/129751#issue-2380881501. Most changes are auto-generated by linter. You can review these PRs via: ```bash git diff --ignore-all-space --ignore-blank-lines HEAD~1 ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/129770 Approved by: https://github.com/wconstab	2024-08-01 04:22:50 +00:00
Lucas Pasqualin	69c34f6e4c	Corrects Error Codes from cudaHostRegister (#132089 ) Causing some terrible error messages e.g. : ``` # printing directly: cudaError.??? # casting to int first: 712 Traceback (most recent call last): File "/data/users/lpasqualin/fbsource/fbcode/scripts/lpasqualin/playground.py", line 15, in <module> main() File "/data/users/lpasqualin/fbsource/fbcode/scripts/lpasqualin/playground.py", line 11, in main _create_cpu_state_dict(sd, share_memory=True, pin_memory=True) File "/home/lpasqualin/pytorch/torch/distributed/_state_dict_utils.py", line 436, in _create_cpu_state_dict ret = _iterate_state_dict( ^^^^^^^^^^^^^^^^^^^^ File "/home/lpasqualin/pytorch/torch/distributed/_state_dict_utils.py", line 143, in _iterate_state_dict ret = { ^ File "/home/lpasqualin/pytorch/torch/distributed/_state_dict_utils.py", line 144, in <dictcomp> key: _iterate_state_dict( ^^^^^^^^^^^^^^^^^^^^ File "/home/lpasqualin/pytorch/torch/distributed/_state_dict_utils.py", line 125, in _iterate_state_dict ret = tensor_func(iter_object, pg, device, companion_obj) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/lpasqualin/pytorch/torch/distributed/_state_dict_utils.py", line 428, in tensor_func succ == 0 AssertionError: Pinning shared memory failed with error-code: cudaError.??? ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/132089 Approved by: https://github.com/Skylion007	2024-07-30 21:42:00 +00:00
Teja	b61600f6cc	[pytorch] fix the leak for pinned memory when using _create_cpu_state… (#131270 ) When pin_memory and share_memory are both set to True in _create_cpu_state_dict, the memory is pinned using cudaHostRegister but is never unpinned. So, once a tensor is created and freed, the caching allocator allocates the same memory when a new tensor is created. This fails with the error below. ``` obj = <[RuntimeError('CUDA error: part or all of the requested memory range is already mapped\nCUDA kernel errors might be a...pile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.\n') raised in repr()] Tensor object at 0x7f0028a4d6c0> pg = None, device = None, _ = None ``` This PR fixes this by attaching a hook that unregisters the memory when the tensor is freed. This is easily reproducible with the xlformers checkpointing unit tests, and the fix is verified with the same tests. Pull Request resolved: https://github.com/pytorch/pytorch/pull/131270 Approved by: https://github.com/LucasLLC	2024-07-23 15:47:21 +00:00
Xuehai Pan	973037be6a	[BE][Easy] apply autofix for ruff rules unnecessary-collection-call (C408): `list()` / `tuple()` / `dict()` (#130199 ) This PR changes the empty collection factory call to Python literals: - `list()` -> `[]` - `tuple()` -> `()` - `dict()` -> `{}` The Python literals are more performant and safer. For example, the bytecode for building an empty dictionary: ```bash $ python3 -m dis - <<EOS import collections d1 = {} d2 = dict() dict = collections.OrderedDict d3 = dict() EOS ``` ```text 0 0 RESUME 0 1 2 LOAD_CONST 0 (0) 4 LOAD_CONST 1 (None) 6 IMPORT_NAME 0 (collections) 8 STORE_NAME 0 (collections) 3 10 BUILD_MAP 0 12 STORE_NAME 1 (d1) 4 14 PUSH_NULL 16 LOAD_NAME 2 (dict) 18 CALL 0 26 STORE_NAME 3 (d2) 6 28 LOAD_NAME 0 (collections) 30 LOAD_ATTR 8 (OrderedDict) 50 STORE_NAME 2 (dict) 7 52 PUSH_NULL 54 LOAD_NAME 2 (dict) 56 CALL 0 64 STORE_NAME 5 (d3) 66 RETURN_CONST 1 (None) ``` The dict literal `{}` only has one bytecode `BUILD_MAP`, while the factory call `dict()` has three `PUSH_NULL + LOAD_NAME + CALL`. Also, the factory call is not safe if users override the `dict` name in `locals` or `globals` (see the example of replacing with `OrderedDict` above). Pull Request resolved: https://github.com/pytorch/pytorch/pull/130199 Approved by: https://github.com/malfet	2024-07-11 17:30:28 +00:00
Xuehai Pan	94dc3253a0	[BE][Easy] enable UFMT for `torch/distributed/` (#128870 ) Part of #123062 - #123062 Pull Request resolved: https://github.com/pytorch/pytorch/pull/128870 Approved by: https://github.com/fegin, https://github.com/wconstab	2024-06-22 18:53:28 +00:00
PyTorch MergeBot	9c929f6ce9	Revert "[BE][Easy] enable UFMT for `torch/distributed/` (#128870 )" This reverts commit `a0e1e20c41`. Reverted https://github.com/pytorch/pytorch/pull/128870 on behalf of https://github.com/fbgheith due to breaking internal builds ([comment](https://github.com/pytorch/pytorch/pull/128870#issuecomment-2181780356))	2024-06-21 00:38:28 +00:00
Xuehai Pan	a0e1e20c41	[BE][Easy] enable UFMT for `torch/distributed/` (#128870 ) Part of #123062 - #123062 Pull Request resolved: https://github.com/pytorch/pytorch/pull/128870 Approved by: https://github.com/fegin ghstack dependencies: #128868, #128869	2024-06-18 21:49:08 +00:00
mori360	d71f92213c	[DSD] keep 'exp_avg' as DTensor after torch.distributed.checkpoint.state_dict.set_optimizer_state_dict (#128004 ) Fixes #126950. `ptd_state_dict` with `broadcast_from_rank0=False` might miss 2 condition checks in `set_optimizer_state_dict`. Here we add another condition, `full_state_dict=True`, with the corresponding tensor distribution done without broadcasting if `broadcast_from_rank0=False`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/128004 Approved by: https://github.com/fegin	2024-06-12 18:14:56 +00:00
Aaron Orenstein	3a0d088517	Flip default value for mypy disallow_untyped_defs [5/11] (#127842 ) See #127836 for details. Pull Request resolved: https://github.com/pytorch/pytorch/pull/127842 Approved by: https://github.com/oulgen	2024-06-08 18:49:18 +00:00
Chien-Chin Huang	6d21685b45	[DSD] Fixes various bugs for broadcast_from_rank0 (#127635 ) Fixes https://github.com/pytorch/pytorch/issues/126285 Summary: 1. Fixes https://github.com/pytorch/pytorch/issues/126285 2. Broadcasts one tensor at a time to avoid OOM. 3. Adds some docstrings. Pull Request resolved: https://github.com/pytorch/pytorch/pull/127635 Approved by: https://github.com/weifengpy	2024-06-03 06:35:21 +00:00
Lucas Pasqualin	42312a52b3	[DSD] Adds type_check param to copy state dict utils (#127417 ) [DSD] Adds type_check param to copy state dict utils. Pull Request resolved: https://github.com/pytorch/pytorch/pull/127417 Approved by: https://github.com/fegin	2024-06-01 17:50:52 +00:00
Chien-Chin Huang	15a9770225	[DSD] Implement broadcast_from_rank0 option for optim state_dict (#125339 ) Summary: This is useful if users would like to avoid CPU memory OOM when loading from a full state_dict. Pull Request resolved: https://github.com/pytorch/pytorch/pull/125339 Approved by: https://github.com/weifengpy ghstack dependencies: #125708, #125338	2024-05-08 07:22:20 +00:00
Chien-Chin Huang	0542fd485f	[DSD] Implement broadcast_from_rank0 option for model state_dict (#125338 ) Summary: This is useful if users would like to avoid CPU memory OOM when loading from a full state_dict. Pull Request resolved: https://github.com/pytorch/pytorch/pull/125338 Approved by: https://github.com/weifengpy ghstack dependencies: #125708	2024-05-08 07:11:18 +00:00
Xuehai Pan	93e249969b	[BE] enable `ruff` rule `RSE` and remove useless parentheses in `raise` statements (#124261 ) Remove useless parentheses in `raise` statements if the exception type is raised with no argument. Pull Request resolved: https://github.com/pytorch/pytorch/pull/124261 Approved by: https://github.com/albanD	2024-04-17 19:29:34 +00:00
Lucas Pasqualin	46a25cc0db	[DCP] Adds support for non-primitives in async_save by deep copying during cpu offloading (#123941 ) Adds support for non-primitives in async_save by deep copying during cpu offloading. If users are not type checking, the expectation in async is likely that the object is copied. Differential Revision: [D56065237](https://our.internmc.facebook.com/intern/diff/D56065237/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/123941 Approved by: https://github.com/fegin	2024-04-16 20:49:25 +00:00
Lucas Pasqualin	d838cc8f66	[DCP] Returns a copy of sd in copy sd (#123567 ) I found that returning the copy is actually useful in situations where you might do something like: ``` ret = _copy_state_dict(obj, cache) ret.update(some_other_values) ``` and would like `cache` not to change structure from `ret.update(some_other_values)`. Open to some notes here; not returning a copy might force the user to make additional copies for this case. Pull Request resolved: https://github.com/pytorch/pytorch/pull/123567 Approved by: https://github.com/wz337	2024-04-16 15:29:32 +00:00
Lucas Pasqualin	620aaaf0cb	[DCP] Adds ability to create a CPU state dict that is both shared and pinned (#122338 ) [DCP] Adds ability to create a CPU state dict that is both shared and pinned, as well as a new utility specific to copying the state dict https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__MEMORY.html#group__CUDART__MEMORY_1ge8d5c17670f16ac4fc8fcb4181cb490c Pull Request resolved: https://github.com/pytorch/pytorch/pull/122338 Approved by: https://github.com/fegin	2024-04-03 20:05:01 +00:00
Chien-Chin Huang	0811f15270	[DCP][state_dict] Let _offload_state_dict_to_cpu return the companion_obj if it exists. (#121273 ) As title. Pull Request resolved: https://github.com/pytorch/pytorch/pull/121273 Approved by: https://github.com/wz337, https://github.com/LucasLLC	2024-03-08 00:24:29 +00:00
Chien-Chin Huang	5abf7972d1	[DCP][state_dict] Implement pin_memory and shared_memory copy for _offload_state_dict_to_cpu (#120378 ) Summary: This PR extends `_offload_state_dict_to_cpu` to accept a `cpu_offload_state_dict` argument. If `cpu_offload_state_dict` is not None, `_offload_state_dict_to_cpu` will use `copy_` to copy the GPU data to the CPU tensors. This allows users to pass a pin_memory or share_memory version of `cpu_offload_state_dict`. This PR also adds `_create_cpu_state_dict` to allow users to easily create a pin_memory or share_memory cpu state_dict. Performance improvement: ``` # The micro-benchmark has a source state_dict with 150 tensors, and each tensor is 50MB. # The micro-benchmark is run on an H100 machine with PCIe 5 cpu_state_dict_2 = _create_cpu_state_dict(state_dict, pin_memory=True) cpu_state_dict_3 = _create_cpu_state_dict(state_dict, share_memory=True) # GPU->CPU memory: 4.6556 seconds cpu_state_dict = _offload_state_dict_to_cpu(state_dict) # GPU->pin memory: 0.1566 seconds _offload_state_dict_to_cpu(state_dict, cpu_offload_state_dict=cpu_state_dict_2) # GPU->shared memory: 0.5509 seconds (variation is quite large) _offload_state_dict_to_cpu(state_dict, cpu_offload_state_dict=cpu_state_dict_3) # GPU->pin memory->shared memory: 0.2550 seconds _offload_state_dict_to_cpu(state_dict, cpu_offload_state_dict=cpu_state_dict_2) _offload_state_dict_to_cpu(cpu_state_dict_2, cpu_offload_state_dict=cpu_state_dict_3) ``` Differential Revision: [D54045845](https://our.internmc.facebook.com/intern/diff/D54045845/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/120378 Approved by: https://github.com/LucasLLC	2024-03-05 17:48:15 +00:00
Yue Dong	2bda6b4cb8	[DTensor] Only wait on AsyncCollectiveTensor after DTensor-based state dict loading (#119716 ) Summary: This PR serves as a follow-up fix to address numerical correctness concerns identified in PR #118197, and we should only wait on `AsyncCollectiveTensor`. Without the change, we occasionally ran into an exception: `AttributeError("'Tensor' object has no attribute 'wait'")` Test Plan: CI: Wait for the CI test. Test with prod model: - Tested with models and no longer ran into the exception after checkpoint loading. Differential Revision: D53680406 Pull Request resolved: https://github.com/pytorch/pytorch/pull/119716 Approved by: https://github.com/fegin, https://github.com/Skylion007, https://github.com/wz337	2024-02-13 04:30:45 +00:00
Catherine Lee	f9971daaee	Fix divergence between internal + external (#118509 ) D53049807 and https://github.com/pytorch/pytorch/pull/118197 got out of sync somehow. Fixing externally since I'm pretty sure the internal version is correct. Pull Request resolved: https://github.com/pytorch/pytorch/pull/118509 Approved by: https://github.com/malfet	2024-01-29 14:53:50 +00:00
Chien-Chin Huang	4f78869c18	[state_dict] Calls wait() for the DTensor to_local() result (#118197 ) See the discussion in https://github.com/pytorch/pytorch/pull/117799. There are some issues when returning an AsyncCollectiveTensor (haven't found the root causes), including OOM and unexpected values. This PR forces `_gather_state_dict()` to be synchronous with respect to the main stream. Differential Revision: [D53049807](https://our.internmc.facebook.com/intern/diff/D53049807/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/118197 Approved by: https://github.com/wz337, https://github.com/LucasLLC	2024-01-25 17:14:08 +00:00
Chien-Chin Huang	cc28f61fa3	[DCP][BE] Move DCP._state_dict_utils out from DCP (#115523 ) DCP._state_dict_utils is also used by FSDP, which can sometimes cause circular imports. Move it out of DCP to avoid them. Differential Revision: [D52022440](https://our.internmc.facebook.com/intern/diff/D52022440/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/115523 Approved by: https://github.com/wz337	2023-12-13 08:59:48 +00:00

24 Commits
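
Commit 973037be6a above argues that the literal `{}` is both cheaper and safer than the factory call `dict()`, because the call has to look up the name `dict` at run time and that name can be rebound. The following is a minimal sketch of that argument, assuming CPython. Opcode names vary between CPython versions, so it only checks for `BUILD_MAP` and for any opcode whose name starts with `CALL`. The helper names `make_with_literal`, `make_with_call`, and `opnames` are illustrative and do not come from the PyTorch tree.

```python
import builtins
import collections
import dis


def make_with_literal():
    # The literal is compiled to a BUILD_MAP instruction; no name lookup happens.
    return {}


def make_with_call():
    # The call looks up the global name `dict` at run time and calls whatever it finds.
    return dict()


def opnames(func):
    """Return the list of opcode names CPython emits for `func`."""
    return [ins.opname for ins in dis.get_instructions(func)]


literal_ops = opnames(make_with_literal)
call_ops = opnames(make_with_call)

# Rebind the module-level name `dict`, mirroring the commit's OrderedDict example.
dict = collections.OrderedDict

shadowed = make_with_call()        # built through the rebound name
unaffected = make_with_literal()   # the literal ignores the rebinding

print("literal uses BUILD_MAP:", "BUILD_MAP" in literal_ops)
print("call uses a CALL* opcode:", any(op.startswith("CALL") for op in call_ops))
print("dict() now builds:", type(shadowed).__name__)
print("{} still builds:", type(unaffected).__name__)
```

After the rebinding, only `make_with_call` changes behaviour, which is the safety point the commit message makes. The exact instruction counts quoted in the commit apply to the interpreter version used there and may differ elsewhere.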