Place local_used_map_dev_ on CPU for MTIA (#111581)

Summary:
The dist backend used on MTIA does not support int32 allreduce for now, so local_used_map_dev_ has to be placed on CPU instead of on the MTIA device.
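
As a rough standalone illustration of the device-selection logic (not the exact reducer code), consider the sketch below. It assumes the ATen C++ API (at::TensorOptions, Tensor::is_mtia(), c10::Device); the helper name make_local_used_map_dev is hypothetical.

    #include <ATen/ATen.h>

    // Sketch: decide where local_used_map_dev_ should live. When the parameters
    // sit on MTIA, fall back to CPU because the MTIA dist backend cannot
    // allreduce int32 tensors yet; otherwise keep it on the params' device.
    at::Tensor make_local_used_map_dev(const at::Tensor& first_param,
                                       int64_t variable_count) {
      auto options = at::TensorOptions().dtype(at::kInt);
      options = options.device(
          first_param.is_mtia() ? c10::Device(c10::DeviceType::CPU)
                                : first_param.device());
      return at::empty({variable_count}, options);
    }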

Test Plan: See diff D50387636

Differential Revision: D50460304

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111581
Approved by: https://github.com/fduwjj
Author: Jun Luo
Date: 2023-10-24 17:02:44 +00:00
Committed by: PyTorch MergeBot
Commit: fb7047e1a1
Parent: ad3572a5dc

@@ -291,8 +291,11 @@ void Reducer::initialize_local_used_map() {
   // This tensor needs to be on the same device as the replica params because
   // backend such as NCCL may not support CPU tensors, and hence it might not
-  // work if we always put it on CPU.
-  options = options.device(params_[0].device());
+  // work if we always put it on CPU. The dist backend for MTIA doesn't support
+  // int32 allreduce for now, so it has to be placed on CPU.
+  options = options.device(
+      (params_[0].is_mtia()) ? c10::Device(c10::DeviceType::CPU)
+                             : params_[0].device());
   local_used_map_dev_ = at::empty({static_cast<long>(variable_count)}, options);
 }