Official CI Directory
Maintainer: TensorFlow and TensorFlow DevInfra
Issue Reporting: File an issue against this repo and tag @devinfra
TensorFlow's Official CI and Build/Test Scripts
TensorFlow's official CI jobs run the scripts in this folder. Our internal CI
system, Kokoro, schedules our CI jobs by combining a build script with a file
from the envs directory that is filled with configuration options:
- Nightly jobs (Run nightly on the nightly branch)
  - Uses wheel.sh, libtensorflow.sh, code_check_full.sh
- Continuous jobs (Run on every GitHub commit)
  - Uses pycpp.sh
- Presubmit jobs (Run on every GitHub PR)
  - Uses pycpp.sh, code_check_changed_files.sh
These "env" files match up with an environment matrix that roughly covers:
- Different Python versions
- Linux, MacOS, and Windows machines (these pool definitions are internal)
- x86 and arm64
- CPU-only, or with NVIDIA CUDA support (Linux only), or with TPUs
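To make the idea of "a build script combined with env files" concrete, here is a minimal sketch that prints candidate invocations for one script across a few Python versions and platforms. The function name `print_candidate_jobs` is hypothetical, and the cross product is an assumption for illustration: the real job matrix is defined internally and may not include every combination.

```shell
#!/usr/bin/env bash
# Illustration only: the real CI matrix is internal and is not necessarily
# the full cross product printed here. print_candidate_jobs is a
# hypothetical helper, not part of ci/official.
print_candidate_jobs() {
  local script="$1" py platform
  for py in py39 py310 py311 py312; do
    for platform in linux_x86 linux_x86_cuda macos_arm64; do
      # Each line pairs a TFCI value (env file names) with a build script.
      echo "TFCI=${py},${platform} ci/official/${script}"
    done
  done
}

print_candidate_jobs pycpp.sh
```

Each printed line has the same shape as the invocations used for local runs: a comma-separated TFCI value followed by the script path.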
How to Test Your Changes to TensorFlow
You may check how your changes will affect TensorFlow by:
- Creating a PR and observing the presubmit test results
- Running the CI scripts locally, as explained below
- Google employees only: Google employees can use an internal-only tool called "MLCI" that makes testing more convenient: it can execute any full CI job against a pending change. Search for "MLCI" internally to find it.
You may invoke a CI script of your choice by following these instructions:
cd tensorflow-git-dir
# Here is a single-line example of running a script on Linux to build the
# GPU version of TensorFlow for Python 3.12, using the public TF bazel cache and
# a local build cache:
TFCI=py312,linux_x86_cuda,public_cache,disk_cache ci/official/wheel.sh
# First, set your TFCI variable to choose the environment settings.
# TFCI is a comma-separated list of filenames from the envs directory, which
# are all settings for the scripts. TF's CI jobs are all made of a combination
# of these env files.
#
# If you've clicked on a test result from our CI (via a dashboard or GitHub link),
# click through to "Invocation Details" and find BUILD_CONFIG, which will contain a TFCI
# value in the "env_vars" list that you can choose to copy that environment.
# Ex. 1: TFCI=py311,linux_x86_cuda,nightly_upload (nightly job)
# Ex. 2: TFCI=py39,linux_x86,rbe (continuous job)
# Non-Googlers should replace "nightly_upload" or "rbe" with
# "public_cache,disk_cache".
# Googlers should replace "nightly_upload" with "public_cache,disk_cache" or
# "rbe", if you have set up your system to use RBE (see further below).
#
# Here is how to choose your TFCI value:
# 1. A Python version must come first, because other scripts reference it.
# Ex. py39 -- Python 3.9
# Ex. py310 -- Python 3.10
# Ex. py311 -- Python 3.11
# Ex. py312 -- Python 3.12
# 2. Choose the platform, which corresponds to the version of TensorFlow to
# build. This should also match the system you're using--you cannot build
# the TF MacOS package from Linux.
# Ex. linux_x86 -- x86_64 Linux platform
# Ex. linux_x86_cuda -- x86_64 Linux platform, with NVIDIA CUDA support
# Ex. macos_arm64 -- arm64 MacOS platform
# 3. Add modifiers. Some modifiers for local execution are:
# Ex. disk_cache -- Use a local cache
# Ex. public_cache -- Use TF's public cache (read-only)
# Ex. public_cache_push -- Use TF's public cache (read and write, Googlers only)
# Ex. rbe -- Use RBE for faster builds (Googlers only; see below)
# Ex. no_docker -- Disable docker on enabled platforms
# See full examples below for more details on these. Some other modifiers are:
# Ex. versions_upload -- for TF official release versions
# Ex. nightly_upload -- for TF nightly official builds; changes version numbers
# Ex. no_upload -- Disable all uploads, usually for temporary CI issues
# Recommended: use a local+remote cache.
#
# Bazel will cache your builds in tensorflow/build_output/cache,
# and will also try using public build cache results to speed up
# your builds. This usually saves a lot of time, especially when
# re-running tests. However, note that:
#
# - New environments like new CUDA versions, changes to manylinux,
# compilers, etc. can cause undefined behavior such as build failures
# or tests passing incorrectly.
# - Automatic LLVM updates are known to extend build time even with
# the cache; this is unavoidable.
export TFCI=py311,linux_x86,public_cache,disk_cache
# Recommended: Configure Docker. (Linux only)
#
# TF uses hub.docker.com/r/tensorflow/build containers for CI,
# and scripts on Linux create a persistent container called "tf"
# which mounts your TensorFlow directory into the container.
#
# Important: because the container is persistent, you cannot change TFCI
# variables in between script executions. To forcibly remove the
# container and start fresh, run "docker rm -f tf". Removing the container
# destroys some temporary bazel data and causes longer builds.
#
# You will need the NVIDIA Container Toolkit for GPU testing:
# https://github.com/NVIDIA/nvidia-container-toolkit
#
# Note: if you interrupt a bazel command on docker (ctrl-c), you
# will need to run `docker exec tf pkill bazel` to quit bazel.
#
# Note: new files created from the container are owned by "root".
# You can run e.g. `docker exec tf chown -R $(id -u):$(id -g) build_output`
# to transfer ownership to your user.
#
# Docker is enabled by default on Linux. You may disable it if you prefer:
# export TFCI=py311,linux_x86,no_docker
# Advanced: Use Remote Build Execution (RBE) (internal developers only)
#
# RBE dramatically speeds up builds and testing. It also gives you a
# public URL to share your build results with collaborators. However,
# it is only available to a limited set of internal TensorFlow developers.
#
# RBE is incompatible with local caching, so you must remove
# disk_cache, public_cache, and public_cache_push from your $TFCI value.
#
# To use RBE, you must first run `gcloud auth application-default login`, then:
export TFCI=py311,linux_x86,rbe
# Finally: Run your script of choice.
# If you've clicked on a test result from our CI (via a dashboard or GitHub link),
# click through to "Invocation Details" and find BUILD_CONFIG, which will contain a
# "build_file" item that indicates the script used.
ci/official/wheel.sh
# Advanced: Select specific build/test targets with "any.sh".
# TF_ANY_TARGETS=":your/target" TF_ANY_MODE="test" ci/official/any.sh
# Afterwards: Examine the results, which will include: The bazel cache,
# generated artifacts like .whl files, and "script.log", from the script.
# Note that files created under Docker will be owned by "root".
ls build_output
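The rules in the comments above (a Python version comes first, RBE cannot be mixed with the cache modifiers, and CI-only modifiers such as nightly_upload should be swapped for public_cache,disk_cache when running locally) can be sketched as two small helpers. Both `localize_tfci` and `check_tfci` are hypothetical names for illustration; they are not part of ci/official, and the checks are a simplification of what the real scripts expect.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of ci/official.

# Rewrite a TFCI value copied from a CI result so it can be run locally:
# "nightly_upload" and "rbe" are replaced with "public_cache,disk_cache".
localize_tfci() {
  local value="$1" part out=""
  local -a parts
  IFS=',' read -r -a parts <<< "$value"
  for part in "${parts[@]}"; do
    case "$part" in
      nightly_upload|rbe) part="public_cache,disk_cache" ;;
    esac
    out="${out:+$out,}$part"
  done
  echo "$out"
}

# Sanity-check a TFCI value against the rules described in the comments.
check_tfci() {
  local value="$1"
  local py_re='^py3[0-9]+$'
  local rbe_re='(^|,)rbe(,|$)'
  local cache_re='(^|,)(disk_cache|public_cache|public_cache_push)(,|$)'
  local -a parts
  IFS=',' read -r -a parts <<< "$value"
  if [[ ! "${parts[0]}" =~ $py_re ]]; then
    echo "error: the first TFCI entry must be a Python version such as py311" >&2
    return 1
  fi
  if [[ "$value" =~ $rbe_re && "$value" =~ $cache_re ]]; then
    echo "error: rbe cannot be combined with disk_cache or public_cache*" >&2
    return 1
  fi
  echo "ok: ${#parts[@]} entries"
}

local_value="$(localize_tfci "py311,linux_x86_cuda,nightly_upload")"
echo "$local_value"
check_tfci "$local_value"
```

A value that passes could then be exported as TFCI before running a script such as ci/official/wheel.sh.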
Contribution & Maintenance
The TensorFlow team does not yet have guidelines in place for contributing to this directory. We are working on it. Please join a TF SIG Build meeting (see: bit.ly/tf-sig-build-notes) if you'd like to discuss the future of contributions.
Brief System Overview
The top-level scripts and utility scripts should be fairly well-documented. Here is a brief explanation of how they tie together:
- envs/* are lists of variables made with bash syntax. A user must set a TFCI env param pointing to a list of env files.
- utilities/setup.sh, initialized by all top-level scripts, reads and sets values from those TFCI paths.
  - set -a / set -o allexport exports the variables from env files so all scripts can use them.
  - utilities/setup_docker.sh creates a container called tf with all TFCI_ variables shared to it.
- Top-level scripts (wheel.sh, etc.) reference env variables and call utilities/ scripts.
  - The tfrun function makes a command run correctly in Docker if Docker is enabled.
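The allexport mechanism can be illustrated with a minimal, self-contained sketch. It is not the actual contents of utilities/setup.sh: the function `load_tfci_envs`, the temporary directory, and the `TFCI_DEMO_*` variable names are all made up for the demonstration. The only point is that variables assigned while `set -a` is active become visible to child processes.

```shell
#!/usr/bin/env bash
# Sketch only: the real logic lives in utilities/setup.sh and may differ.
# The env files and TFCI_DEMO_* names below are invented for this demo.
demo_dir="$(mktemp -d)"
mkdir -p "$demo_dir/envs"
echo 'TFCI_DEMO_PYTHON_VERSION=3.11' > "$demo_dir/envs/py311"
echo 'TFCI_DEMO_DOCKER_ENABLE=1' > "$demo_dir/envs/linux_x86"

# Source each env file named in a comma-separated list, exporting every
# variable the files assign.
load_tfci_envs() {
  local envs_dir="$1" value="$2" name
  local -a names
  IFS=',' read -r -a names <<< "$value"
  set -a   # same as: set -o allexport
  for name in "${names[@]}"; do
    source "$envs_dir/$name"
  done
  set +a
}

load_tfci_envs "$demo_dir/envs" "py311,linux_x86"

# A child shell can read the values because they were exported.
bash -c 'echo "python=$TFCI_DEMO_PYTHON_VERSION docker=$TFCI_DEMO_DOCKER_ENABLE"'

rm -rf "$demo_dir"
```

Because later files are sourced after earlier ones, a later entry in the list can override a variable set by an earlier one in this sketch.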