## Motivation
The device reported by `tensor.device`, for both sharded and non-sharded tensors, defaults to CUDA. As a result, the FSDP unit tests fail with the error below on builds without CUDA support (the traceback is from an HPU run). This change resolves the device type from the tensor that was actually created instead of assuming CUDA; a rough sketch of the idea follows the traceback.
```
[rank3] File "/root/repos/pytorch-training-tests/tests/pytorch/v2.4.0/distributed_hpu/fsdp/test_fsdp_dtensor_state_dict.py", line 143, in test_dtensor_sharded_tensor_state_dict_identical
[rank3] sharded_tensor_sd = ref_model.state_dict()
[rank3] File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1944, in state_dict
[rank3] hook_result = hook(self, destination, prefix, local_metadata)
[rank3] File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[rank3] return func(*args, **kwargs)
[rank3] File "/usr/local/lib/python3.10/dist-packages/torch/distributed/fsdp/_state_dict_utils.py", line 752, in _post_state_dict_hook
[rank3] tensor.device,
[rank3] File "/usr/local/lib/python3.10/dist-packages/typing_extensions.py", line 2853, in wrapper
[rank3] return arg(*args, **kwargs)
[rank3] File "/usr/local/lib/python3.10/dist-packages/torch/distributed/_shard/sharded_tensor/api.py", line 1152, in __torch_function__
[rank3] return dispatch(st_instance, func)
[rank3] File "/usr/local/lib/python3.10/dist-packages/torch/distributed/_shard/sharded_tensor/api.py", line 1134, in dispatch
[rank3] return _SHARDED_OPS[func](types, args, kwargs, st._process_group)
[rank3] File "/usr/local/lib/python3.10/dist-packages/torch/distributed/_shard/op_registry_utils.py", line 33, in wrapper
[rank3] return wrapped_func(types, args, kwargs, process_group)
[rank3] File "/usr/local/lib/python3.10/dist-packages/torch/distributed/_shard/sharded_tensor/_ops/tensor_ops.py", line 52, in tensor_device
[rank3] dev = torch.device(torch.cuda.current_device())
[rank3] File "/usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py", line 878, in current_device
[rank3] _lazy_init()
[rank3] File "/usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py", line 305, in _lazy_init
[rank3] raise AssertionError("Torch not compiled with CUDA enabled")
[rank3] AssertionError: Torch not compiled with CUDA enabled
```
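For illustration only, the gist of the change can be sketched as deriving the device from the local shard's backing tensor rather than unconditionally querying the CUDA runtime. The function name and signature below mirror the `tensor_device` frames in the traceback; this is a hypothetical sketch under those assumptions, not the actual patch (see the PR diff for that).

```python
import torch

def tensor_device(types, args=(), kwargs=None, pg=None):
    # Hypothetical sketch: resolve the device from the ShardedTensor itself
    # instead of calling torch.cuda.current_device(), which raises
    # "Torch not compiled with CUDA enabled" on non-CUDA builds.
    self_st = args[0]
    local_shards = self_st.local_shards()
    if local_shards:
        # Report the device of the tensor that actually backs this rank's
        # shard (cpu, cuda, hpu, ...), i.e. the device it was created on.
        return local_shards[0].tensor.device
    # No local shard on this rank: fall back to CUDA only if it is usable,
    # otherwise default to CPU.
    if torch.cuda.is_available():
        return torch.device(torch.cuda.current_device())
    return torch.device("cpu")
```

With a change of this shape, the `ref_model.state_dict()` call in the failing test no longer forces CUDA lazy initialization on CPU- or HPU-only builds.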
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134994
Approved by: https://github.com/fegin