Place local_used_map_dev_ on CPU for MTIA (#111581)

Summary:
The dist backend used on MTIA does not support int32 allreduce for now, so local_used_map_dev_ has to be placed on CPU instead of on the MTIA device.
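
As a rough standalone illustration of the device-selection logic (not the exact reducer code), consider the sketch below. It assumes the ATen C++ API (at::TensorOptions, Tensor::is_mtia(), c10::Device); the helper name make_local_used_map_dev is hypothetical.

    #include <ATen/ATen.h>

    // Sketch: decide where local_used_map_dev_ should live. When the parameters
    // sit on MTIA, fall back to CPU because the MTIA dist backend cannot
    // allreduce int32 tensors yet; otherwise keep it on the params' device.
    at::Tensor make_local_used_map_dev(const at::Tensor& first_param,
                                       int64_t variable_count) {
      auto options = at::TensorOptions().dtype(at::kInt);
      options = options.device(
          first_param.is_mtia() ? c10::Device(c10::DeviceType::CPU)
                                : first_param.device());
      return at::empty({variable_count}, options);
    }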

Test Plan: See diff D50387636

Differential Revision: D50460304

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111581
Approved by: https://github.com/fduwjj
Author: Jun Luo
Date: 2023-10-24 17:02:44 +00:00
Committed by: PyTorch MergeBot
Commit: fb7047e1a1
Parent: ad3572a5dc

@@ -291,8 +291,11 @@ void Reducer::initialize_local_used_map() {
   // This tensor needs to be on the same device as the replica params because
   // backend such as NCCL may not support CPU tensors, and hence it might not
-  // work if we always put it on CPU.
-  options = options.device(params_[0].device());
+  // work if we always put it on CPU. The dist backend for MTIA doesn't support
+  // int32 allreduce for now, so it has to be placed on CPU.
+  options = options.device(
+      (params_[0].is_mtia()) ? c10::Device(c10::DeviceType::CPU)
+                             : params_[0].device());
   local_used_map_dev_ = at::empty({static_cast<long>(variable_count)}, options);
 }