pytorch/tools/setup_helpers
Ke Wen 5bf0c3518c Detect NVSHMEM location (#153010)
### Changes
- Detect NVSHMEM install location via `sysconfig.get_path("purelib")`, which typically resolves to `<conda_env>/lib/python/site-packages`, and NVSHMEM include and lib live under `nvidia/nvshmem`
- Added link dir via `target_link_directories`
- Removed direct dependency on mlx5
- Added preload rule (following other other NVIDIA libs)

### Plan of Record
1. End user experience: link against NVSHMEM dynamically (NVSHMEM lib size is 100M, similar to NCCL, thus we'd like users to `pip install nvshmem` than torch carrying the bits)
2. Developer experience: at compile time, prefers wheel dependency than using Git submodule
General rule: submodule for small lib that torch can statically link with
If user pip install a lib, our CI build process should do the same, rather than building from Git submodule (just for its header, for example)
3. Keep `USE_NVSHMEM` to gate non-Linux platforms, like Windows, Mac
4. At configuration time, we should be able to detect whether nvshmem is available, if not, we don't build `NVSHMEMSymmetricMemory` at all.

For now, we have symbol dependency on two particular libs from NVSHMEM:
- libnvshmem_host.so: contains host side APIs;
- libnvshmem_device.a: contains device-side global variables AND device function impls.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153010
Approved by: https://github.com/ngimel, https://github.com/fduwjj, https://github.com/Skylion007
2025-05-07 23:35:04 +00:00
..
__init__.py
BUILD.bazel
build.bzl
cmake_utils.py Add scripts to check xrefs and urls (#151844) 2025-04-28 09:30:07 +00:00
cmake.py Detect NVSHMEM location (#153010) 2025-05-07 23:35:04 +00:00
env.py Remove conda refs in tools (#152368) 2025-04-29 02:45:47 +00:00
gen_unboxing.py [BE][Easy] use pathlib.Path instead of dirname / ".." / pardir (#129374) 2024-12-29 17:23:13 +00:00
gen_version_header.py [3/N] Apply py39 ruff fixes (#142115) 2024-12-11 17:50:10 +00:00
gen.py [BE][Easy] use pathlib.Path instead of dirname / ".." / pardir (#129374) 2024-12-29 17:23:13 +00:00
generate_code.py [BE][CI] bump ruff to 0.9.2: multiline assert statements (#144546) 2025-02-27 20:46:16 +00:00
generate_linker_script.py Allow users to overwrite ld with environment variable in linker optimization script (#137331) 2024-11-26 22:54:24 +00:00