pytorch/docs/source
Aliaksandr Ivanou 4e181dfc35 [torch] Various improvements to torch.distributed.launch and torch.distributed.run (#60925)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60925

* Make `torch.distributed.launch` restarts to 0
* Remove unnecessary `-use_env` warning, move `-use_env` warnings
* Move `-use_env` warnings to `torch.distributed.launch`
* Make default log level WARNING
* Add new doc section around transitioning to `torch.distributed.run`
* Make `torch.distributed.launch` not use error-propagation
* Set default events handler to `null` that does not print events to console
* Add reference from `torch.distributed.launch` to `torch.distributed.run`
* Set correct preexec function that sends SIGTERM to child processes when parent dies

Issues resolved:

https://github.com/pytorch/pytorch/issues/60716
https://github.com/pytorch/pytorch/issues/60754

Test Plan:
sandcastle

    python -m torch.distributed.launch --nproc_per_node 2 main.py -> uses 0 restarts
    python -m torch.distributed.run --nproc_per_node 2 main.py -> uses default for torchelastic, 0 restarts

    python -m torch.distributed.launch --nproc_per_node=4  --use_env --no_python  main.py -> produces error
    python -m torch.distributed.launch --nproc_per_node=4  --use_env main.py -> no warning
    python -m torch.distributed.launch --nproc_per_node=4  --no_python  main.py ->warning

Output of running torch.distributed.launch without --use_env:

    $path/torch/distributed/launch.py:173: FutureWarning: The module torch.distributed.launch is deprecated
    and will be removed in future. Use torch.distributed.run.
    Note that --use_env is set by default in torch.distributed.run.
    If your script expects `--local_rank` argument to be set, please
    change it to read from `os.environ('LOCAL_RANK')` instead.

New section:

{F628923078}

{F628974089}

Reviewed By: kiukchung, cbalioglu

Differential Revision: D29413019

fbshipit-source-id: 323bfbad9d0e4aba3b10ddd7a243ca6e48169630
2021-06-30 23:31:02 -07:00
..
_static DOC Improve documentation for LayerNorm (#59178) 2021-06-07 14:34:10 -07:00
_templates Remove master documentation from being indexable by search engines (#58056) 2021-05-18 06:20:09 -07:00
community Lint trailing newlines (#54737) 2021-03-30 13:09:52 -07:00
elastic [torch] Various improvements to torch.distributed.launch and torch.distributed.run (#60925) 2021-06-30 23:31:02 -07:00
notes [CUDA graphs][BC-breaking] Removes post-backward syncs on default stream (#60421) 2021-06-24 17:34:02 -07:00
rpc Forbid trailing whitespace (#53406) 2021-03-05 17:22:55 -08:00
scripts Add mish activation function (#58648) 2021-05-25 10:36:21 -07:00
__config__.rst Fix __config__ docs (#48557) 2020-11-29 23:57:06 -08:00
amp.rst Moves grid_sampler to autocast promote list (#58618) 2021-06-21 10:22:36 -07:00
autograd.rst Add no-grad inference mode note (#58513) 2021-05-25 13:06:54 -07:00
backends.rst Forbid trailing whitespace (#53406) 2021-03-05 17:22:55 -08:00
benchmark_utils.rst Expand benchmark utils docs (#51664) 2021-02-04 00:22:41 -08:00
bottleneck.rst [docs] Clarify more CUDA profiling gotchas in bottleneck docs (#6763) 2018-04-19 13:15:27 -04:00
checkpoint.rst Stashing checkpointing RNG states based on devices of arg tensors (#14518) 2018-12-11 09:48:45 -08:00
complex_numbers.rst Abladawood patch 1 (#58496) 2021-05-20 10:32:18 -07:00
conf.py Use proper Google Analytics id (#56578) 2021-05-04 13:23:16 -07:00
cpp_extension.rst correct some cpp extension code usages and documents (#39766) 2020-06-10 08:31:22 -07:00
cpp_index.rst Add C++ Landing Page (#38450) 2020-05-14 16:02:01 -07:00
cuda.rst breakup optim, cuda documentation (#55673) 2021-04-14 12:44:00 -07:00
cudnn_persistent_rnn.rst Forbid trailing whitespace (#53406) 2021-03-05 17:22:55 -08:00
cudnn_rnn_determinism.rst Forbid trailing whitespace (#53406) 2021-03-05 17:22:55 -08:00
data.rst [DataLoader][doc] Randomness for base_seed generator and NumPy seed (#56528) 2021-04-22 09:40:45 -07:00
ddp_comm_hooks.rst [Gradient Compression] Remove unnecessary warning on the rst file and the check on C++ version (#58170) 2021-05-12 14:15:10 -07:00
distributed.elastic.rst [1/n][torch/elastic] Move torchelastic docs *.rst (#148) 2021-05-04 00:57:56 -07:00
distributed.optim.rst [Reland] Update and expose ZeroRedundancyOptimizer docs (#53112) 2021-03-02 14:16:12 -08:00
distributed.rst [reland] Document debugability features in torch.distributed (#59726) 2021-06-09 16:40:11 -07:00
distributions.rst Add sample validation for LKJCholesky.log_prob (#52763) 2021-02-25 16:12:29 -08:00
dlpack.rst Lint trailing newlines (#54737) 2021-03-30 13:09:52 -07:00
docutils.conf Revert "Revert D21337640: [pytorch][PR] Split up documentation into subpages and clean up some warnings" (#37778) 2020-05-04 14:32:35 -07:00
fft.rst Use autosummary on torch.fft, torch.linalg (#55748) 2021-04-13 12:02:36 -07:00
futures.rst Update docs to mention CUDA support for Future (#50048) 2021-05-11 08:26:33 -07:00
fx.rst [FX][docs][EZ] Fix link to fuser example (#59670) 2021-06-08 17:32:55 -07:00
hub.rst Add a torch.hub.load_local() function that can load models from any local directory with a hubconf.py (#44204) 2020-09-21 14:17:21 -07:00
index.rst add torch.testing to docs (#57247) 2021-05-07 09:16:39 -07:00
jit_builtin_functions.rst Lint trailing newlines (#54737) 2021-03-30 13:09:52 -07:00
jit_language_reference_v2.rst Fix hasattr support type (#57950) 2021-05-10 12:21:56 -07:00
jit_language_reference.rst add type annotations to torch.nn.modules.conv (#49564) 2021-01-15 11:16:11 -08:00
jit_python_reference.rst [JIT] improve documentation (#57991) 2021-05-19 11:47:32 -07:00
jit_unsupported.rst [JIT] Update docs for recently added features (#45232) 2020-09-28 18:17:42 -07:00
jit.rst Remove caption for Lang Reference (#56526) 2021-04-20 14:33:42 -07:00
linalg.rst Add torch.linalg.inv_ex without checking for errors by default (#58039) 2021-05-13 09:42:15 -07:00
math-quantizer-equation.png adding quantization.rst file for quantization feature (#27559) 2019-10-09 16:45:09 -07:00
mobile_optimizer.rst Mod lists to neutral+descriptive terms in caffe2/docs (#49803) 2020-12-23 11:37:11 -08:00
model_zoo.rst add/move a few apis in torch.hub (#18758) 2019-04-10 23:10:39 -07:00
multiprocessing.rst Forbid trailing whitespace (#53406) 2021-03-05 17:22:55 -08:00
name_inference.rst Abladawood patch 1 (#58496) 2021-05-20 10:32:18 -07:00
named_tensor.rst Forbid trailing whitespace (#53406) 2021-03-05 17:22:55 -08:00
nn.functional.rst Add mish activation function (#58648) 2021-05-25 10:36:21 -07:00
nn.init.rst Bag of documentation fixes; fix more sphinx warnings (#27850) 2019-10-15 07:31:14 -07:00
nn.rst Added GLU and FeatureAlphaDropout to nn docs (#60590) 2021-06-24 08:00:18 -07:00
onnx.rst Update Autograd Export Docs (#56594) (#59534) 2021-06-15 12:23:00 -07:00
optim.rst To add Rectified Adam Algorithm to Optimizers (#58968) 2021-06-23 18:27:57 -07:00
package.rst [package] fix tutorial link (#60113) 2021-06-16 11:27:25 -07:00
pipeline.rst Add tutorials to pipeline docs. (#55209) 2021-04-05 20:01:00 -07:00
profiler.rst docs: fix profiler docstring (#55750) 2021-04-13 00:23:14 -07:00
quantization-support.rst [docs][quant] Add fx graph mode quant api doc (#55306) 2021-04-05 13:56:23 -07:00
quantization.rst quantization: improve documentation on natively supported backends (#58925) 2021-06-07 17:29:03 -07:00
random.rst Remove duplicated entries in random.rst (#39725) 2020-06-10 16:51:15 -07:00
rpc.rst Add a disclaimer about limited CUDA support in RPC (#58023) 2021-05-12 00:11:22 -07:00
sparse.rst Add CSR (compressed sparse row) layout for sparse tensors (#50937) 2021-04-12 10:09:12 -07:00
special.rst [special] add zeta (#59623) 2021-06-24 00:00:12 -07:00
storage.rst Lint trailing newlines (#54737) 2021-03-30 13:09:52 -07:00
tensor_attributes.rst Remove legacy constructor calls from pytorch codebase. (#54142) 2021-04-11 15:45:17 -07:00
tensor_view.rst Conjugate View (#54987) 2021-06-04 14:12:41 -07:00
tensorboard.rst Add method add_hparams to API doc (#27344) 2019-10-03 17:07:45 -07:00
tensors.rst Implemented torch.corrcoef (#60420) 2021-06-30 12:36:02 -07:00
testing.rst add torch.testing to docs (#57247) 2021-05-07 09:16:39 -07:00
torch.nn.intrinsic.qat.rst [quantization] Add some support for 3d operations (#50003) 2021-03-10 16:40:35 -08:00
torch.nn.intrinsic.quantized.rst Lint trailing newlines (#54737) 2021-03-30 13:09:52 -07:00
torch.nn.intrinsic.rst [quantization] Add some support for 3d operations (#50003) 2021-03-10 16:40:35 -08:00
torch.nn.qat.rst Lint trailing newlines (#54737) 2021-03-30 13:09:52 -07:00
torch.nn.quantized.dynamic.rst Forbid trailing whitespace (#53406) 2021-03-05 17:22:55 -08:00
torch.nn.quantized.rst [quant] add docs for embedding/embedding_bag (#51770) 2021-02-05 11:43:15 -08:00
torch.overrides.rst Add documentation for torch.overrides submodule. (#48170) 2020-11-30 11:25:31 -08:00
torch.quantization.rst Lint trailing newlines (#54737) 2021-03-30 13:09:52 -07:00
torch.rst Implemented torch.corrcoef (#60420) 2021-06-30 12:36:02 -07:00
type_info.rst DOC: split quantization.rst into smaller pieces (#41321) 2020-07-25 23:59:40 -07:00