[ez][c10d] change ERROR to WARNING (#134349)

Summary:
Downgrade the log level from ERROR to WARNING. TCPStore can be torn down during a normal shutdown, so being unable to access it while checking the dump flag is expected and should not be reported as an error.

Test Plan:
Ran locally

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134349
Approved by: https://github.com/fduwjj, https://github.com/wconstab
Chirag Pandya 2024-08-25 14:22:55 +00:00 committed by PyTorch MergeBot
parent 4648848696
commit 08d111250a

@@ -1399,7 +1399,7 @@ void ProcessGroupNCCL::heartbeatMonitor() {
           checkExceptionDump =
               globalStore_->check({std::string(EXCEPTION_DUMP)});
         } catch (const std::exception& e) {
-          LOG(ERROR)
+          LOG(WARNING)
               << logPrefix()
               << "Failed to check the \"should dump\" flag on TCPStore, "
               << "(maybe TCPStore server has shut down too early), with error: "
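For illustration only, the pattern this change relies on can be sketched in isolation: a monitor polls a store for a flag, and a failed lookup is recorded as a warning rather than an error because the store may already be gone during shutdown. Everything below is hypothetical and is not the c10d API: FakeStore, LogRecord, checkDumpFlag, and the "should_dump" key are stand-ins, and treating a failed check as "flag not set" is an assumption of this sketch, not something the diff shows.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a key-value store client (NOT c10d::TCPStore).
struct FakeStore {
  bool serverAlive = true;  // false simulates a server that has shut down
  bool flagSet = false;     // whether the polled key exists

  // Throws when the server is unreachable, otherwise reports the flag.
  bool check(const std::string& key) const {
    if (!serverAlive) {
      throw std::runtime_error("store unreachable while checking " + key);
    }
    return flagSet;
  }
};

enum class Severity { Warning, Error };

// Collected log entries, so the chosen severity can be inspected.
struct LogRecord {
  Severity severity;
  std::string message;
};

// Polls the flag. A failed lookup is logged as a WARNING, since the
// store can legitimately disappear during a normal shutdown.
// Assumption of this sketch: a failed check is treated as "not set".
bool checkDumpFlag(const FakeStore& store, std::vector<LogRecord>& log) {
  try {
    return store.check("should_dump");  // hypothetical key name
  } catch (const std::exception& e) {
    log.push_back({Severity::Warning,
                   std::string("Failed to check the \"should dump\" flag, "
                               "with error: ") + e.what()});
    return false;
  }
}
```

Calling checkDumpFlag with serverAlive set to false returns false and appends a single Warning record, so an expected shutdown does not produce error-level entries in the log.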