pytorch/docs
Paul de Supinski 768a1017c5 Allow parallel start NUMA binding (#161576)
# Context
In #161183, we added NUMA-binding support for `Callable` entrypoints to `elastic_launch`.

However, we raised an exception if the subprocesses were spawned in parallel via `ThreadPoolExecutor`, an option enabled through the `TORCH_MP_PARALLEL_START` environment variable (see diff).

The reasoning was that `os.sched_setaffinity`, which we used to set CPU affinities, is [per process](https://docs.python.org/3/library/os.html#os.sched_setaffinity), so concurrent spawner threads could race and overwrite each other's affinity settings during a parallel start:

> Restrict the process with PID pid (or the current process if zero) to a set of CPUs. mask is an iterable of integers representing the set of CPUs to which the process should be restricted.

On closer reading, however, the Linux man page says [`sched_setaffinity` is per *thread*.](https://man7.org/linux/man-pages/man2/sched_setaffinity.2.html) The Python documentation's wording is misleading.

I [verified that `sched_setaffinity` only affects the calling thread, not the entire calling process.](https://gist.github.com/pdesupinski/7e2de3cbe5bb48d489f257b83ccddf07)
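A minimal sketch of this kind of check, assuming a Linux host (where `os.sched_setaffinity` exists and `pid=0` refers to the calling thread). It is not the code from the linked gist; `per_thread_affinity_demo` is a hypothetical helper. A worker thread pins itself to a single CPU, and the main thread's mask is compared before and after.

```python
import os
import threading


def per_thread_affinity_demo() -> tuple[set[int], set[int], set[int]]:
    """Pin a worker thread to one CPU and report both threads' masks.

    Linux-only assumption: pid 0 passed to os.sched_setaffinity /
    os.sched_getaffinity refers to the *calling thread*.
    """
    main_before = os.sched_getaffinity(0)
    target_cpu = min(main_before)
    worker_mask: set[int] = set()

    def worker() -> None:
        # Restrict only this worker thread to a single CPU.
        os.sched_setaffinity(0, {target_cpu})
        worker_mask.update(os.sched_getaffinity(0))

    t = threading.Thread(target=worker)
    t.start()
    t.join()

    # If affinity were per process, this would now be {target_cpu}.
    main_after = os.sched_getaffinity(0)
    return main_before, worker_mask, main_after


if __name__ == "__main__":
    before, worker_mask, after = per_thread_affinity_demo()
    print("worker thread mask:", sorted(worker_mask))
    print("main thread mask unchanged:", before == after)
```

On a single-CPU machine the check is trivially satisfied, so it is only informative when more than one CPU is available.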

The upshot is that we *can* safely use the inheritance trick from #161183 even with parallel start: `os.sched_setaffinity` affects only the calling thread, and a subprocess inherits the affinity of the thread that spawned it.
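To illustrate the idea, here is a sketch assuming Linux, with hypothetical helpers `spawn_pinned` and `parallel_spawn`; it is not the `elastic_launch` implementation. Each `ThreadPoolExecutor` worker sets its own thread's affinity, launches a child Python process, and then restores its mask. Because the setting is per thread, parallel workers do not interfere with one another, and each child reports exactly the CPU its spawning thread was pinned to.

```python
import os
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

# The child simply prints its own CPU affinity as a comma-separated list.
CHILD_CODE = "import os; print(','.join(map(str, sorted(os.sched_getaffinity(0)))))"


def spawn_pinned(cpu: int) -> set[int]:
    """Pin the calling thread to `cpu`, spawn a child, and return its mask."""
    original = os.sched_getaffinity(0)
    os.sched_setaffinity(0, {cpu})  # affects only this worker thread
    try:
        # The child inherits the affinity of the thread that spawns it.
        out = subprocess.run(
            [sys.executable, "-c", CHILD_CODE],
            capture_output=True,
            text=True,
            check=True,
        ).stdout.strip()
    finally:
        # Restore the mask so the pooled thread can be reused safely.
        os.sched_setaffinity(0, original)
    return {int(c) for c in out.split(",")}


def parallel_spawn(cpus: list[int]) -> dict[int, set[int]]:
    """Spawn one pinned child per CPU, all in parallel."""
    with ThreadPoolExecutor(max_workers=len(cpus)) as pool:
        results = list(pool.map(spawn_pinned, cpus))
    return dict(zip(cpus, results))


if __name__ == "__main__":
    cpus = sorted(os.sched_getaffinity(0))[:4]
    for cpu, mask in parallel_spawn(cpus).items():
        print(f"child spawned from thread pinned to {cpu}: {sorted(mask)}")
```

Restoring the mask in `finally` matters because pool threads are reused; without it, a later task on the same thread would start from the previous task's affinity.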

# This PR
Remove restrictions against parallel start for NUMA binding.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/161576
Approved by: https://github.com/d4l3k
2025-08-28 01:15:58 +00:00
cpp Removing conda references from PyTorch Docs (#152702) 2025-05-20 20:33:28 +00:00
source Allow parallel start NUMA binding (#161576) 2025-08-28 01:15:58 +00:00
.gitignore
libtorch.rst Add ROCm documentation to libtorch (C++) reST. (#136378) 2024-09-25 02:30:56 +00:00
make.bat
Makefile [ONNX] Filter out torchscript sentences (#158850) 2025-07-24 20:59:06 +00:00
README.md
requirements.txt Revert "Switch to standard pep517 sdist generation (#152098)" 2025-07-01 14:14:52 +00:00

Please see the Writing documentation section of CONTRIBUTING.md for details on both writing and building the docs.