Elastic Agent
==============
.. automodule:: torch.distributed.elastic.agent
.. currentmodule:: torch.distributed.elastic.agent

Server
--------
.. automodule:: torch.distributed.elastic.agent.server

Below is a diagram of an agent that manages a local group of workers.

.. image:: agent_diagram.jpg

Concepts
--------
This section describes the high-level classes and concepts that
are relevant to understanding the role of the ``agent`` in torchelastic.

.. currentmodule:: torch.distributed.elastic.agent.server

.. autoclass:: ElasticAgent
   :members:

.. autoclass:: WorkerSpec
   :members:

.. autoclass:: WorkerState
   :members:

.. autoclass:: Worker
   :members:

.. autoclass:: WorkerGroup
   :members:

Implementations
-------------------
Below are the agent implementations provided by torchelastic.

.. currentmodule:: torch.distributed.elastic.agent.server.local_elastic_agent
.. autoclass:: LocalElasticAgent

Extending the Agent
---------------------
To extend the agent you can implement ``ElasticAgent`` directly; however,
we recommend that you extend ``SimpleElasticAgent`` instead, which provides
most of the scaffolding and leaves you with a few specific abstract methods
to implement.

.. currentmodule:: torch.distributed.elastic.agent.server
.. autoclass:: SimpleElasticAgent
   :members:
   :private-members:

.. autoclass:: torch.distributed.elastic.agent.server.api.RunResult
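The division of labor described in this section, where a base class owns the
run loop and a subclass supplies only a few worker-management hooks, can be
sketched without importing torch. The following is a toy illustration under
stated assumptions, not the torchelastic implementation: ``ToySimpleAgent``,
``ToyRunResult``, ``ToyState`` and ``FlakyAgent`` are hypothetical stand-ins,
the hook signatures are simplified (the real hooks operate on a
``WorkerGroup``), and the restart policy is reduced to a counter. Refer to the
``SimpleElasticAgent`` reference above for the actual abstract methods and
their signatures.

.. code-block:: python

    import abc
    from dataclasses import dataclass, field
    from enum import Enum
    from typing import Any, Dict, List


    class ToyState(Enum):
        """Hypothetical, reduced stand-in for ``WorkerState``."""

        HEALTHY = "HEALTHY"
        SUCCEEDED = "SUCCEEDED"
        FAILED = "FAILED"


    @dataclass
    class ToyRunResult:
        """Hypothetical stand-in for ``RunResult``."""

        state: ToyState
        return_values: Dict[int, Any] = field(default_factory=dict)


    class ToySimpleAgent(abc.ABC):
        """Stand-in for the scaffolding role: owns the run loop."""

        def __init__(self, max_restarts: int = 3) -> None:
            self.max_restarts = max_restarts

        # The hooks a concrete agent has to provide (simplified signatures).
        @abc.abstractmethod
        def _start_workers(self) -> None: ...

        @abc.abstractmethod
        def _stop_workers(self) -> None: ...

        @abc.abstractmethod
        def _monitor_workers(self) -> ToyRunResult: ...

        def run(self) -> ToyRunResult:
            remaining = self.max_restarts
            self._start_workers()
            while True:
                result = self._monitor_workers()
                if result.state is ToyState.SUCCEEDED:
                    return result
                if result.state is ToyState.FAILED:
                    self._stop_workers()
                    if remaining == 0:
                        return result  # restart budget exhausted
                    remaining -= 1
                    self._start_workers()
                # HEALTHY: keep polling (a real agent would sleep here).


    class FlakyAgent(ToySimpleAgent):
        """Example subclass: fails a fixed number of times, then succeeds."""

        def __init__(self, failures_before_success: int, max_restarts: int = 3) -> None:
            super().__init__(max_restarts)
            self._failures_left = failures_before_success
            self.events: List[str] = []  # records hook calls for inspection

        def _start_workers(self) -> None:
            self.events.append("start")

        def _stop_workers(self) -> None:
            self.events.append("stop")

        def _monitor_workers(self) -> ToyRunResult:
            if self._failures_left > 0:
                self._failures_left -= 1
                return ToyRunResult(ToyState.FAILED)
            return ToyRunResult(ToyState.SUCCEEDED, {0: "done"})


    agent = FlakyAgent(failures_before_success=1)
    result = agent.run()

In this sketch the subclass never touches the loop in ``run``; it only reports
state and starts or stops its workers, which is the same reason extending
``SimpleElasticAgent`` is usually less work than implementing
``ElasticAgent`` from scratch.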