tensorflow

mirror of https://github.com/zebrajr/tensorflow.git synced 2025-12-06 00:19:58 +01:00

An Open Source Machine Learning Framework for Everyone tensorflow.org


Sevin Fide Varoglu c655468288 PR #31375 : [XLA:GPU] Add NVLink domain check to CollectiveBackendAssigner	2025-10-31 15:48:52 -07:00

Imported from GitHub PR https://github.com/openxla/xla/pull/31375

📝 Summary of Changes
This PR updates the CollectiveBackendAssigner pass to account for NVLink domain connectivity when deciding between NVSHMEM and DEFAULT backends. It does this by adding a slice_size parameter to the compilation pipeline and introducing an IsIntraNVLinkDomain check.

🎯 Justification
The CollectiveBackendAssigner now uses NVSHMEM not only for single-host scenarios, but also when all devices are within the same NVLink domain.

🚀 Kind of Contribution
⚡️ Performance Improvement, 🧪 Tests

📊 Benchmark (for Performance Improvements)
H100

| | NVSHMEM enabled | NVSHMEM disabled |
|----------|----------|----------|
| llama31_8b_fp8_1x8 | 1095330 us | 1093816 us |
| llama31_8b_bf16_2x8 | 1368948 us | 1370896 us |
| llama31_8b_fp8_2x8 | 1096447 us | 1092437 us |
| llama31_70b_fp8_16x8 | 9723821 us | 9707544 us |

🧪 Unit Tests: Added unit tests to xla/service/gpu/transforms/collectives/collective_backend_assigner_test.cc

🧪 Execution Tests: Tested with llama3-8b on 2 GB200 nodes (fsdp = 8). The average step time in NVSHMEM case was 3.69s (vs. 3.76s in the default case).

Copybara import of the project:

-- a02b77cec9622314af01ae481d0fb28b149f1b45 by Sevin Varoglu <svaroglu@nvidia.com>:

Add NVLink domain check to CollectiveBackendAssigner

Merging this change closes #31375

PiperOrigin-RevId: 826649437
.github	Bump the github-actions group with 6 updates	2025-10-01 08:14:08 +00:00
ci	Update ML Build Docker container to use hermetic C++	2025-10-30 13:25:44 -07:00
tensorflow	Update tflite schema to allow external buffer	2025-10-31 15:07:53 -07:00
third_party	PR #31375 : [XLA:GPU] Add NVLink domain check to CollectiveBackendAssigner	2025-10-31 15:48:52 -07:00
tools
.bazelignore
.bazelrc	Remove usage of mirrored `tar` files from CI because hermetic `xz` tool helps to unpack `tar.xz` faster.	2025-10-22 16:08:18 -07:00
.bazelversion	Update Bazel version to 7.7.0.	2025-10-30 10:27:38 -07:00
.clang-format
.gitignore
.pylintrc
.zenodo.json
arm_compiler.BUILD
AUTHORS
BUILD
CITATION.cff
CODE_OF_CONDUCT.md
CODEOWNERS
configure
configure.cmd
configure.py
CONTRIBUTING.md
ISSUES.md
LICENSE
models.BUILD
README.md
RELEASE.md	Add i4 support in tfl.slice	2025-10-28 15:27:41 -07:00
requirements_lock_3_9.txt	Update from flatbuffers 25.2.10 to 25.9.23.	2025-10-01 16:25:25 -07:00
requirements_lock_3_10.txt	Update from flatbuffers 25.2.10 to 25.9.23.	2025-10-01 16:25:25 -07:00
requirements_lock_3_11.txt	Update from flatbuffers 25.2.10 to 25.9.23.	2025-10-01 16:25:25 -07:00
requirements_lock_3_12.txt	Update from flatbuffers 25.2.10 to 25.9.23.	2025-10-01 16:25:25 -07:00
requirements_lock_3_13.txt	Update from flatbuffers 25.2.10 to 25.9.23.	2025-10-01 16:25:25 -07:00
SECURITY.md
WORKSPACE	Replace RBE Docker container image: use Docker image without pre-installed CUDA packages.	2025-09-23 15:16:44 -07:00

README.md


TensorFlow is an end-to-end open source platform for machine learning. It has a comprehensive, flexible ecosystem of tools, libraries, and community resources that lets researchers push the state of the art in ML and lets developers easily build and deploy ML-powered applications.

TensorFlow was originally developed by researchers and engineers working within the Machine Intelligence team at Google Brain to conduct research in machine learning and neural networks. However, the framework is versatile enough to be used in other areas as well.

TensorFlow provides stable Python and C++ APIs, as well as APIs for other languages that come without a backward-compatibility guarantee.

Keep up-to-date with release announcements and security updates by subscribing to announce@tensorflow.org. See all the mailing lists.

Install

See the TensorFlow install guide for instructions on installing the pip package, enabling GPU support, using a Docker container, and building from source.

To install the current release, which includes support for CUDA-enabled GPU cards (Ubuntu and Windows):

$ pip install tensorflow

Other devices (DirectX and macOS Metal) are supported using Device Plugins.

A smaller CPU-only package is also available:

$ pip install tensorflow-cpu

To update TensorFlow to the latest version, add the --upgrade flag to the above commands.

Nightly binaries are available for testing using the tf-nightly and tf-nightly-cpu packages on PyPI.
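
After installing, it can be useful to confirm which of the packages named above actually ended up in the environment. The following is a minimal sketch using only the Python standard library; the helper name `installed_tensorflow_packages` is made up for illustration and is not part of TensorFlow.

```python
from importlib import metadata

# Distribution names mentioned in this README.
CANDIDATES = ("tensorflow", "tensorflow-cpu", "tf-nightly", "tf-nightly-cpu")


def installed_tensorflow_packages():
    """Return {distribution name: version} for each candidate that is installed.

    Hypothetical helper for illustration; it only reads package metadata
    and does not import TensorFlow itself.
    """
    found = {}
    for name in CANDIDATES:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            pass  # this distribution is not installed in the environment
    return found


if __name__ == "__main__":
    packages = installed_tensorflow_packages()
    if packages:
        for name, version in packages.items():
            print(f"{name} {version}")
    else:
        print("No TensorFlow distribution found; try: pip install tensorflow")
```

If more than one entry is reported, the environment likely has several TensorFlow distributions installed side by side, which is worth cleaning up before debugging anything else.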

Try your first TensorFlow program

$ python

>>> import tensorflow as tf
>>> tf.add(1, 2).numpy()
3
>>> hello = tf.constant('Hello, TensorFlow!')
>>> hello.numpy()
b'Hello, TensorFlow!'
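
The same two operations can also be run as a script rather than typed into the interpreter. This is a sketch that assumes TensorFlow 2.x with eager execution (the default); the function name `first_program` is chosen here for illustration only.

```python
def first_program():
    """Run the two operations from the interactive session and return the results."""
    # Imported lazily so this file can be loaded even where TensorFlow is absent.
    import tensorflow as tf

    total = tf.add(1, 2).numpy()  # eager execution yields a NumPy scalar
    hello = tf.constant("Hello, TensorFlow!").numpy()  # string tensors come back as bytes
    return int(total), hello


if __name__ == "__main__":
    try:
        total, hello = first_program()
    except ImportError:
        print("TensorFlow is not installed; run: pip install tensorflow")
    else:
        print(total)
        print(hello.decode("utf-8"))
```

Note that `hello.numpy()` returns a `bytes` object, which is why the interactive session shows a `b'...'` prefix; decoding it gives an ordinary Python string.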

For more examples, see the TensorFlow Tutorials.

Contribution guidelines

If you want to contribute to TensorFlow, be sure to review the Contribution Guidelines. This project adheres to TensorFlow's Code of Conduct. By participating, you are expected to uphold this code.

We use GitHub Issues for tracking requests and bugs. For general questions and discussion, please see the TensorFlow Forum, and please direct specific questions to Stack Overflow.

The TensorFlow project strives to abide by generally accepted best practices in open-source software development.

Patching guidelines

Follow these steps to patch a specific version of TensorFlow, for example, to apply fixes to bugs or security vulnerabilities:

1. Clone the TensorFlow repository and switch to the appropriate branch for your desired version (for example, r2.8 for version 2.8).
2. Apply the desired changes (i.e., cherry-pick them) and resolve any code conflicts.
3. Run TensorFlow tests and ensure they pass.
4. Build the TensorFlow pip package from source.
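
The first two steps above can be sketched as a small script that assembles the git commands without running them. The helper names, the `workdir` default, and the `<commit-sha>` placeholder are assumptions for illustration; the test and build steps are deliberately left out because their exact commands depend on the version being patched and are covered by the install guide.

```python
import shlex

REPO_URL = "https://github.com/tensorflow/tensorflow.git"


def release_branch(version):
    """Map a major.minor version such as '2.8' to its release branch, 'r2.8'."""
    return f"r{version}"


def patch_commands(version, commits, workdir="tensorflow"):
    """Build (but do not run) the git commands for steps 1 and 2.

    `commits` is a list of commit hashes to cherry-pick, in order.
    Hypothetical helper; conflicts still have to be resolved by hand.
    """
    commands = [
        ["git", "clone", REPO_URL, workdir],
        ["git", "-C", workdir, "checkout", release_branch(version)],
    ]
    for commit in commits:
        commands.append(["git", "-C", workdir, "cherry-pick", commit])
    return commands


if __name__ == "__main__":
    # "<commit-sha>" is a placeholder; substitute the fixes you need.
    for argv in patch_commands("2.8", ["<commit-sha>"]):
        print(shlex.join(argv))
```

Printing the commands first, rather than executing them, makes it easy to review the plan before touching a working tree; each list can later be passed to `subprocess.run(argv, check=True)` if automation is wanted.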

Continuous build status

You can find more community-supported platforms and configurations in the TensorFlow SIG Build Community Builds Table.

Official Builds

Build Type	Status	Artifacts
Linux CPU		PyPI
Linux GPU		PyPI
Linux XLA		TBA
macOS		PyPI
Windows CPU		PyPI
Windows GPU		PyPI
Android		Download
Raspberry Pi 0 and 1		Py3
Raspberry Pi 2 and 3		Py3
Libtensorflow MacOS CPU	Status Temporarily Unavailable	Nightly Binary Official GCS
Libtensorflow Linux CPU	Status Temporarily Unavailable	Nightly Binary Official GCS
Libtensorflow Linux GPU	Status Temporarily Unavailable	Nightly Binary Official GCS
Libtensorflow Windows CPU	Status Temporarily Unavailable	Nightly Binary Official GCS
Libtensorflow Windows GPU	Status Temporarily Unavailable	Nightly Binary Official GCS