TensorFlow is an end-to-end open source platform for machine learning. It has a comprehensive, flexible ecosystem of tools, libraries, and community resources that lets researchers push the state-of-the-art in ML and developers easily build and deploy ML-powered applications.
TensorFlow was originally developed by researchers and engineers working within the Machine Intelligence team at Google Brain to conduct research in machine learning and neural networks. However, the framework is versatile enough to be used in other areas as well.
TensorFlow provides stable Python and C++ APIs, as well as APIs for other languages that come without a backward-compatibility guarantee.
Keep up-to-date with release announcements and security updates by subscribing to announce@tensorflow.org. See all the mailing lists.
Install
See the TensorFlow install guide for instructions on installing the pip package, enabling GPU support, using a Docker container, and building from source.
To install the current release, which includes support for CUDA-enabled GPU cards (Ubuntu and Windows):
$ pip install tensorflow
Other devices (DirectX and MacOS-metal) are supported using Device Plugins.
A smaller CPU-only package is also available:
$ pip install tensorflow-cpu
To update TensorFlow to the latest version, add the --upgrade flag to the above
commands.
Nightly binaries are available for testing using the tf-nightly and tf-nightly-cpu packages on PyPI.
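Because several package variants exist (release, CPU-only, and nightly), it can be useful to check which of them is present in the current environment. The following is a minimal sketch using only the Python standard library; the helper name `installed_tensorflow_packages` is hypothetical and not part of TensorFlow, and the package names are the ones listed above.

```python
from importlib import metadata

# Package names taken from the install instructions above.
PACKAGES = ("tensorflow", "tensorflow-cpu", "tf-nightly", "tf-nightly-cpu")


def installed_tensorflow_packages() -> dict:
    """Return {package name: version string, or None if not installed}.

    Hypothetical helper for illustration; it only reads package metadata
    and does not import TensorFlow itself.
    """
    found = {}
    for name in PACKAGES:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, version in installed_tensorflow_packages().items():
        print(f"{name}: {version or 'not installed'}")
```

Reading metadata rather than importing the package keeps the check fast and avoids loading GPU libraries just to learn a version number.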
Try your first TensorFlow program
$ python
>>> import tensorflow as tf
>>> tf.add(1, 2).numpy()
3
>>> hello = tf.constant('Hello, TensorFlow!')
>>> hello.numpy()
b'Hello, TensorFlow!'
For more examples, see the TensorFlow Tutorials.
Contribution guidelines
If you want to contribute to TensorFlow, be sure to review the Contribution Guidelines. This project adheres to TensorFlow's Code of Conduct. By participating, you are expected to uphold this code.
We use GitHub Issues for tracking requests and bugs. Please see the TensorFlow Forum for general questions and discussion, and direct specific questions to Stack Overflow.
The TensorFlow project strives to abide by generally accepted best practices in open-source software development.
Patching guidelines
Follow these steps to patch a specific version of TensorFlow, for example, to apply fixes to bugs or security vulnerabilities:
- Clone the TensorFlow repository and switch to the appropriate branch for
your desired version, for example, r2.8 for version 2.8.
- Apply the desired changes (i.e., cherry-pick them) and resolve any code conflicts.
- Run TensorFlow tests and ensure they pass.
- Build the TensorFlow pip package from source.
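The first two steps above can be sketched as a small Python helper that assembles the corresponding git commands without running them. This is an illustration under stated assumptions, not an official tool: the function names `release_branch` and `patch_plan` are hypothetical, the commit hash is a placeholder, and mapping a patch version such as 2.8.4 to the r2.8 branch is assumed from the r2.8 example. The test and build steps are left to the TensorFlow build documentation.

```python
import shlex

REPO_URL = "https://github.com/tensorflow/tensorflow.git"


def release_branch(version: str) -> str:
    """Map a version such as '2.8' to its release branch name, 'r2.8'.

    Assumption: patch versions like '2.8.4' live on the same 'r2.8' branch.
    """
    major, minor = version.split(".")[:2]
    return f"r{major}.{minor}"


def patch_plan(version: str, commits: list) -> list:
    """Return git commands (as argument lists) for clone, checkout, cherry-pick."""
    plan = [
        ["git", "clone", REPO_URL],
        # -C runs the command inside the freshly cloned 'tensorflow' directory.
        ["git", "-C", "tensorflow", "checkout", release_branch(version)],
    ]
    for sha in commits:
        plan.append(["git", "-C", "tensorflow", "cherry-pick", sha])
    return plan


if __name__ == "__main__":
    # 'abc1234' is a placeholder commit hash, not a real fix.
    for cmd in patch_plan("2.8", ["abc1234"]):
        print(shlex.join(cmd))
```

Printing the plan first makes it easy to review the commands before running them by hand; any conflicts raised by a cherry-pick still need to be resolved manually.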
Continuous build status
You can find more community-supported platforms and configurations in the TensorFlow SIG Build Community Builds Table.
Official Builds
| Build Type | Status | Artifacts |
|---|---|---|
| Linux CPU | | PyPI |
| Linux GPU | | PyPI |
| Linux XLA | | TBA |
| macOS | | PyPI |
| Windows CPU | | PyPI |
| Windows GPU | | PyPI |
| Android | | Download |
| Raspberry Pi 0 and 1 | | Py3 |
| Raspberry Pi 2 and 3 | | Py3 |
| Libtensorflow MacOS CPU | Status Temporarily Unavailable | Nightly Binary Official GCS |
| Libtensorflow Linux CPU | Status Temporarily Unavailable | Nightly Binary Official GCS |
| Libtensorflow Linux GPU | Status Temporarily Unavailable | Nightly Binary Official GCS |
| Libtensorflow Windows CPU | Status Temporarily Unavailable | Nightly Binary Official GCS |
| Libtensorflow Windows GPU | Status Temporarily Unavailable | Nightly Binary Official GCS |
Resources
- TensorFlow.org
- TensorFlow Tutorials
- TensorFlow Official Models
- TensorFlow Examples
- TensorFlow Codelabs
- TensorFlow Blog
- Learn ML with TensorFlow
- TensorFlow Twitter
- TensorFlow YouTube
- TensorFlow model optimization roadmap
- TensorFlow White Papers
- TensorBoard Visualization Toolkit
- TensorFlow Code Search
Learn more about the TensorFlow Community and how to Contribute.