Resolves https://github.com/pytorch/pytorch/issues/146767.
May also resolve https://github.com/pytorch/pytorch/issues/147584.
### Summary
This PR removes the RNG tracker init from the `distribute_tensor` call for the following reasons:
1. if the user does not use random ops on DTensor, there's no need to init DTensor RNG which currently requires CUDA device to be present.
2. this complies with the 0-communication semantic of `src_data_rank=None` shard distribution.
Besides, `OffsetBasedRNGTracker` only accepts `DeviceMesh` argument to its constructor method.
### Consequence
DTensor RNG initialization is delayed till the first DTensor random ops call or `torch.distributed.tensor.random.manual_seed`.
### Test
`pytest test/distributed/tensor/test_random_ops.py`
`pytest test/distributed/tensor/parallel/test_tp_random_state.py`
`pytest test/distributed/tensor/parallel/test_tp_style.py`
Differential Revision: [D70201856](https://our.internmc.facebook.com/intern/diff/D70201856)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/147025
Approved by: https://github.com/kwen2501
Motivation: Generalize unit tests so that can be executed for cuda and non cuda devices.
Chnages: There are general changes in common_dtesnor module for device type generalization so that tests can be executed on non cuda devices too.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/139749
Approved by: https://github.com/kwen2501