pytorch/torch/distributed/elastic
Amandeep Chhabra e15848669f [1/n]adding torch.distributed.run option to provide destination for event logging (#154644) (#155268)
Summary:

**Problem Statement**
Currently, torch distributed elastic does not support an option to specify the destination for event logging from torch.distributed.run.
*recording events to default destination:* https://fburl.com/code/7f9b0993
The default destination is "null".

***Solution***
Add an option to torch.distributed.run to specify event_logging_destination. The default value is "null", which is the current default, so users are unaffected unless they specify it on the command line.
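
A minimal sketch of the idea, using only the Python standard library. It is not the code from this change: the flag name `--event-logging-destination`, the handler registry `_HANDLERS`, and the helpers `build_parser` and `get_event_logger` are all hypothetical names chosen for illustration. It shows how a launcher option that defaults to "null" leaves existing behavior untouched unless the user passes a different destination.

```python
import argparse
import logging

# Hypothetical registry mapping a destination name to a logging handler class.
# "null" discards events, mirroring the default described above.
_HANDLERS = {
    "null": logging.NullHandler,
    "console": logging.StreamHandler,
}


def build_parser() -> argparse.ArgumentParser:
    """Build a launcher-style parser with a hypothetical destination flag."""
    parser = argparse.ArgumentParser(prog="launcher-sketch")
    parser.add_argument(
        "--event-logging-destination",  # hypothetical flag name
        dest="event_logging_destination",
        default="null",  # same as the current default, so nothing changes
        help="Where to send event records (assumed values: null, console).",
    )
    return parser


def get_event_logger(destination: str) -> logging.Logger:
    """Return a logger whose handler matches the requested destination."""
    if destination not in _HANDLERS:
        raise ValueError(f"unknown event logging destination: {destination!r}")
    logger = logging.getLogger(f"events-sketch-{destination}")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep event records out of the root logger
    if not logger.handlers:  # avoid stacking handlers on repeated calls
        logger.addHandler(_HANDLERS[destination]())
    return logger


if __name__ == "__main__":
    args = build_parser().parse_args()
    get_event_logger(args.event_logging_destination).info("worker started")
```

Run without arguments, the sketch discards the event; run with `--event-logging-destination console`, it writes the event to standard error.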

Test Plan:

https://www.internalfb.com/mlhub/pipelines/runs/mast/f738408681-TrainingApplication_torch_distributed_run_3?job_attempt=0&version=0&tab=execution_details&env=PRODUCTION

Rollback Plan:

Reviewed By: kiukchung

Differential Revision: D75183591

Pull Request resolved: https://github.com/pytorch/pytorch/pull/155268
Approved by: https://github.com/d4l3k
2025-06-09 10:43:52 +00:00
..
agent [1/n]adding torch.distributed.run option to provide destination for event logging (#154644) (#155268) 2025-06-09 10:43:52 +00:00
events [BE][PYFMT] migrate PYFMT for torch.{distributed,distributions} to ruff format (#144547) 2025-02-28 07:35:56 +00:00
metrics [BE][PYFMT] migrate PYFMT for torch.{distributed,distributions} to ruff format (#144547) 2025-02-28 07:35:56 +00:00
multiprocessing Revert "[BE]: Enable RUFF TRY400 rule - log.exception (#153473)" 2025-05-16 08:29:26 +00:00
rendezvous Expose the rendezvous keepalive arguments (#145228) 2025-03-03 19:11:56 +00:00
timer Propagate callable parameter types using ParamSpec (#142306) (#151014) 2025-04-13 20:38:11 +00:00
utils remove allow-untyped-defs from torch/distributed/elastic/utils/logging.py (#154625) 2025-05-30 07:37:56 +00:00
__init__.py
control_plane.py Propagate callable parameter types using ParamSpec (#142306) (#151014) 2025-04-13 20:38:11 +00:00