An Open Source Machine Learning Framework for Everyone tensorflow.org
Terry Sun 8134117476 PR #32836: [GPU] Dispatch S-curve model to single-partition multi-host topology
Imported from GitHub PR https://github.com/openxla/xla/pull/32836

📝 Summary of Changes
Updated the SINGLE_HOST communication type to SINGLE_PARTITION (fast-interconnect domain) to support multi-node NVLink (MNNVL) topologies. Piped the auto-detected partition size through to communication-type determination, and exposed the partition size in SolGPUCostModel::Config for AOT compilation.

🎯 Justification
The S-curve model cannot handle NVLink latency, so a single fast-interconnect domain, including an MNNVL topology, should use the latency table model. This PR updates the routing mechanism so that MNNVL is treated as a single partition; previously, a host was assumed to be equivalent to a partition.

🚀 Kind of Contribution
 New Feature

📊 Benchmark (for Performance Improvements)
N/A

🧪 Unit Tests:
Added unit tests for model dispatching mechanism.

🧪 Execution Tests:
Behavior unchanged for non-MNNVL topology, N/A.

Copybara import of the project:

--
a9544375934873f7b888fdb5ff6c9dc6ee8b0e6c by Terry Sun <tesun@nvidia.com>:

use partition size for static model dispatching

--
e3445a5deb8da10146e90c50da5598f91cfe0a69 by Terry Sun <tesun@nvidia.com>:

expose partition size to config

--
212535ce891b8eb96ebb3c1e215a91d2b5035594 by Terry Sun <tesun@nvidia.com>:

better modularity

--
a9fe8a0f89dea9e2811d76a3570c7398df8dd756 by Terry Sun <tesun@nvidia.com>:

better code structure and doc string

--
a64a2b5ed1d45d815c6a2c47628b4d9ebb8368bd by Terry Sun <tesun@nvidia.com>:

update naming

Merging this change closes #32836

PiperOrigin-RevId: 826697791
2025-10-31 18:28:25 -07:00


Documentation

TensorFlow is an end-to-end open source platform for machine learning. It has a comprehensive, flexible ecosystem of tools, libraries, and community resources that lets researchers push the state-of-the-art in ML and developers easily build and deploy ML-powered applications.

TensorFlow was originally developed by researchers and engineers working within the Machine Intelligence team at Google Brain to conduct research in machine learning and neural networks. However, the framework is versatile enough to be used in other areas as well.

TensorFlow provides stable Python and C++ APIs, as well as APIs for other languages that carry no backward-compatibility guarantee.

Keep up-to-date with release announcements and security updates by subscribing to announce@tensorflow.org. See all the mailing lists.

Install

See the TensorFlow install guide for instructions on installing the pip package, enabling GPU support, using a Docker container, and building from source.

To install the current release, which includes support for CUDA-enabled GPU cards (Ubuntu and Windows):

$ pip install tensorflow

Other devices (DirectX and macOS Metal) are supported using Device Plugins.

A smaller CPU-only package is also available:

$ pip install tensorflow-cpu

To update TensorFlow to the latest version, add the --upgrade flag to the above commands.

Nightly binaries are available for testing using the tf-nightly and tf-nightly-cpu packages on PyPI.
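
To check which of the packages named in this section is installed in the current environment, a short script can help. The following is a minimal sketch using only the Python standard library (importlib.metadata); the helper name installed_tensorflow is an illustrative assumption and is not part of TensorFlow.

```python
from importlib import metadata

# Distribution names mentioned in this section.
PACKAGES = ("tensorflow", "tensorflow-cpu", "tf-nightly", "tf-nightly-cpu")


def installed_tensorflow(names=PACKAGES):
    """Return {distribution name: version} for each listed package that is installed.

    Hypothetical helper for illustration; it only reads package metadata
    and never imports TensorFlow itself.
    """
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # This distribution is not installed in the current environment.
            continue
    return found


if __name__ == "__main__":
    versions = installed_tensorflow()
    if versions:
        for name, version in versions.items():
            print(f"{name} {version}")
    else:
        print("No TensorFlow package found; try: pip install tensorflow")
```

Because the check reads metadata only, it returns quickly even when the full framework is installed.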

Try your first TensorFlow program

$ python
>>> import tensorflow as tf
>>> tf.add(1, 2).numpy()
3
>>> hello = tf.constant('Hello, TensorFlow!')
>>> hello.numpy()
b'Hello, TensorFlow!'

For more examples, see the TensorFlow Tutorials.

Contribution guidelines

If you want to contribute to TensorFlow, be sure to review the Contribution Guidelines. This project adheres to TensorFlow's Code of Conduct. By participating, you are expected to uphold this code.

We use GitHub Issues for tracking requests and bugs. Please see the TensorFlow Forum for general questions and discussion, and direct specific questions to Stack Overflow.

The TensorFlow project strives to abide by generally accepted best practices in open-source software development.

Patching guidelines

Follow these steps to patch a specific version of TensorFlow, for example, to apply fixes to bugs or security vulnerabilities:

  • Clone the TensorFlow repository and switch to the appropriate branch for your desired version—for example, r2.8 for version 2.8.
  • Apply the desired changes (i.e., cherry-pick them) and resolve any code conflicts.
  • Run TensorFlow tests and ensure they pass.
  • Build the TensorFlow pip package from source.
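
The first two steps above can be sketched as a small script. This is a minimal sketch, assuming git is available on the PATH and that the repository is cloned from https://github.com/tensorflow/tensorflow.git; the function names patch_commands and run, the dry_run switch, and the commit placeholder in the usage line are hypothetical. The test and build steps are left out because the exact commands depend on the release branch.

```python
import subprocess

REPO_URL = "https://github.com/tensorflow/tensorflow.git"


def patch_commands(branch, commits, workdir="tensorflow"):
    """Build the git commands for cloning, switching branch, and cherry-picking.

    Hypothetical helper: `branch` is a release branch such as "r2.8" and
    `commits` is a list of commit hashes to apply in order.
    """
    cmds = [
        ["git", "clone", REPO_URL, workdir],
        ["git", "-C", workdir, "checkout", branch],
    ]
    for sha in commits:
        cmds.append(["git", "-C", workdir, "cherry-pick", sha])
    return cmds


def run(cmds, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in cmds:
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops at the first failure, for example a
            # cherry-pick conflict that must be resolved by hand.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # "<commit-sha>" is a placeholder, not a real commit.
    run(patch_commands("r2.8", ["<commit-sha>"]))
```

With the default dry_run=True the script only prints the commands, so the plan can be reviewed before anything is cloned or modified.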

Continuous build status

You can find more community-supported platforms and configurations in the TensorFlow SIG Build Community Builds Table.

Official Builds

Build Type                   Status                           Artifacts
Linux CPU                    Status                           PyPI
Linux GPU                    Status                           PyPI
Linux XLA                    Status                           TBA
macOS                        Status                           PyPI
Windows CPU                  Status                           PyPI
Windows GPU                  Status                           PyPI
Android                      Status                           Download
Raspberry Pi 0 and 1         Status                           Py3
Raspberry Pi 2 and 3         Status                           Py3
Libtensorflow MacOS CPU      Status Temporarily Unavailable   Nightly Binary, Official GCS
Libtensorflow Linux CPU      Status Temporarily Unavailable   Nightly Binary, Official GCS
Libtensorflow Linux GPU      Status Temporarily Unavailable   Nightly Binary, Official GCS
Libtensorflow Windows CPU    Status Temporarily Unavailable   Nightly Binary, Official GCS
Libtensorflow Windows GPU    Status Temporarily Unavailable   Nightly Binary, Official GCS

Resources

Learn more about the TensorFlow Community and how to Contribute.

Courses

License

Apache License 2.0