Also partially fixes#122109
This PR:
- We add a C++ flag (only_lift_cpu_tensors) to toggle the
torch.tensor(1, device='cuda') ctor strategy.
When false (default), it does the current PyTorch behavior
of unconditionally constructing a concrete CUDA tensor then calling
lift_fresh on it. When true, we instead construct a concrete CPU
tensor, call lift_fresh, and then call Tensor.to(device) (under any ambient
modes).
- FakeTensorMode flips this flag depending on if CUDA is available or
not. We don't unconditionally set the flag to True because that is
likely BC-breaking.
Test Plan:
- existing tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124413
Approved by: https://github.com/eellison
# Motivation
This PR intends to extend `cuda_lazy_init` to `device_lazy_init` which is a device-agnostic API that can support any backend. And change `maybe_initialize_cuda` to `maybe_initialize_device` to support lazy initialization for CUDA while maintaining scalability.
# Design
We maintain a flag for each backend to manage the lazy initialization state separately.
# Additional Context
No need more UTs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118846
Approved by: https://github.com/malfet