mirror of https://github.com/zebrajr/tensorflow.git synced 2025-12-06 00:19:58 +01:00

containers	Update ML Build Docker container to use hermetic C++	2025-10-30 13:25:44 -07:00
envs	Increase the Linux arm64 wheel size limit to 270 MB to unblock nightly builds.	2025-10-17 11:18:57 -07:00
requirements_updater	Fix MacOS nightly wheel builds by adding h5py version limit.	2025-10-15 13:30:40 -07:00
utilities	Ignore errors in brew update on MacOS platform.	2025-10-06 15:42:26 -07:00
any.sh	Update scripts/configs for Windows nightly/release builds.	2025-01-09 10:37:25 -08:00
bisect.sh	Update scripts/configs for Windows nightly/release builds.	2025-01-09 10:37:25 -08:00
code_check_changed_files.sh	Fix bats output directory for newer scripts	2023-10-05 11:41:59 -07:00
code_check_full.sh	Minor fix for addressing TF query analysis errors	2024-02-06 09:29:44 -08:00
debug_tfci.sh	Add debug_tfci.sh script	2023-11-01 12:54:25 -07:00
installer_wheel.sh	Refactor mechanisms of building TF wheel and storing TF project version.	2025-03-04 14:39:12 -08:00
libtensorflow.sh	Add a workaround for gsutil not working properly on MSYS2.	2025-01-24 12:17:08 -08:00
pycpp.sh	Update scripts/configs for Windows nightly/release builds.	2025-01-09 10:37:25 -08:00
README.md	Split "multicache" into specific cache envs	2024-01-31 19:21:19 -08:00
upload.sh	Refactor mechanisms of building TF wheel and storing TF project version.	2025-03-04 14:39:12 -08:00
wheel.sh	Move `--verbose_failures` option to the command that builds the first wheel.	2025-03-03 09:29:53 -08:00

README.md

Official CI Directory

Maintainer: TensorFlow and TensorFlow DevInfra

Issue Reporting: File an issue against this repo and tag @devinfra

TensorFlow's Official CI and Build/Test Scripts

TensorFlow's official CI jobs run the scripts in this folder. Our internal CI system, Kokoro, schedules our CI jobs by combining a build script with a file from the envs directory that is filled with configuration options:

Nightly jobs (Run nightly on the nightly branch)
- Uses wheel.sh, libtensorflow.sh, code_check_full.sh
Continuous jobs (Run on every GitHub commit)
- Uses pycpp.sh
Presubmit jobs (Run on every GitHub PR)
- Uses pycpp.sh, code_check_changed_files.sh

These "env" files match up with an environment matrix that roughly covers:

Different Python versions
Linux, MacOS, and Windows machines (these pool definitions are internal)
x86 and arm64
CPU-only, or with NVIDIA CUDA support (Linux only), or with TPUs
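
To make the matrix concrete, here is a minimal sketch that prints one TFCI value (the comma-separated list of env file names described later in this README) per Python version and platform pair. The env names are taken from this README; the helper name list_tfci_matrix and the choice of modifiers are assumptions for illustration, and the real job definitions live in the internal Kokoro configuration, so not every printed combination necessarily corresponds to an actual CI job.

```bash
# Hypothetical helper (not part of ci/official): enumerate TFCI values
# for a small slice of the environment matrix.
list_tfci_matrix() {
  for py in py39 py310 py311 py312; do
    for platform in linux_x86 linux_x86_cuda macos_arm64; do
      # Append the cache modifiers this README recommends for local runs.
      echo "${py},${platform},public_cache,disk_cache"
    done
  done
}

# Print the values only; nothing is built here.
list_tfci_matrix
```

Each printed line can be used as a TFCI value with a top-level script, provided the platform matches the machine you are running on.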

How to Test Your Changes to TensorFlow

You may check how your changes will affect TensorFlow by:

Creating a PR and observing the presubmit test results
Running the CI scripts locally, as explained below
Google employees only: use the internal-only tool "MLCI", which makes testing more convenient by executing any full CI job against a pending change. Search for "MLCI" internally to find it.

You may invoke a CI script of your choice by following these instructions:

cd tensorflow-git-dir

# Here is a single-line example of running a script on Linux to build the
# GPU version of TensorFlow for Python 3.12, using the public TF bazel cache and
# a local build cache:
TFCI=py312,linux_x86_cuda,public_cache,disk_cache ci/official/wheel.sh

# First, set your TFCI variable to choose the environment settings.
#   TFCI is a comma-separated list of filenames from the envs directory, which
#   are all settings for the scripts. TF's CI jobs are all made of a combination
#   of these env files.
#
#   If you've clicked on a test result from our CI (via a dashboard or GitHub link),
#   open "Invocation Details" and find BUILD_CONFIG, which contains a TFCI value
#   in its "env_vars" list. You can copy that value to reproduce the environment.
#      Ex. 1: TFCI=py311,linux_x86_cuda,nightly_upload  (nightly job)
#      Ex. 2: TFCI=py39,linux_x86,rbe                   (continuous job)
#   Non-Googlers should replace "nightly_upload" or "rbe" with
#   "public_cache,disk_cache".
#   Googlers should replace "nightly_upload" with "public_cache,disk_cache" or
#   "rbe", if you have set up your system to use RBE (see further below).
#
# Here is how to choose your TFCI value:
# 1. A Python version must come first, because other scripts reference it.
#      Ex. py39  -- Python 3.9
#      Ex. py310 -- Python 3.10
#      Ex. py311 -- Python 3.11
#      Ex. py312 -- Python 3.12
# 2. Choose the platform, which corresponds to the version of TensorFlow to
#    build. This should also match the system you're using--you cannot build
#    the TF MacOS package from Linux.
#      Ex. linux_x86        -- x86_64 Linux platform
#      Ex. linux_x86_cuda   -- x86_64 Linux platform, with NVIDIA CUDA support
#      Ex. macos_arm64      -- arm64 MacOS platform
# 3. Add modifiers. Some modifiers for local execution are:
#      Ex. disk_cache -- Use a local cache
#      Ex. public_cache -- Use TF's public cache (read-only)
#      Ex. public_cache_push -- Use TF's public cache (read and write, Googlers only)
#      Ex. rbe        -- Use RBE for faster builds (Googlers only; see below)
#      Ex. no_docker  -- Disable docker on enabled platforms
#    See full examples below for more details on these. Some other modifiers are:
#      Ex. versions_upload -- for TF official release versions
#      Ex. nightly_upload -- for TF nightly official builds; changes version numbers
#      Ex. no_upload      -- Disable all uploads, usually for temporary CI issues

# Recommended: use a local+remote cache.
#
#   Bazel will cache your builds in tensorflow/build_output/cache,
#   and will also try using public build cache results to speed up
#   your builds. This usually saves a lot of time, especially when
#   re-running tests. However, note that:
#
#    - New environments like new CUDA versions, changes to manylinux,
#      compilers, etc. can cause undefined behavior such as build failures
#      or tests passing incorrectly.
#    - Automatic LLVM updates are known to extend build time even with
#      the cache; this is unavoidable.
export TFCI=py311,linux_x86,public_cache,disk_cache

# Recommended: Configure Docker. (Linux only)
#
#   TF uses hub.docker.com/r/tensorflow/build containers for CI,
#   and scripts on Linux create a persistent container called "tf"
#   which mounts your TensorFlow directory into the container.
#
#   Important: because the container is persistent, you cannot change TFCI
#   variables in between script executions. To forcibly remove the
#   container and start fresh, run "docker rm -f tf". Removing the container
#   destroys some temporary bazel data and causes longer builds.
#
#   You will need the NVIDIA Container Toolkit for GPU testing:
#   https://github.com/NVIDIA/nvidia-container-toolkit
#
#   Note: if you interrupt a bazel command on docker (ctrl-c), you
#   will need to run `docker exec tf pkill bazel` to quit bazel.
#
#   Note: new files created from the container are owned by "root".
#   You can run e.g. `docker exec tf chown -R $(id -u):$(id -g) build_output`
#   to transfer ownership to your user.
#
# Docker is enabled by default on Linux. You may disable it if you prefer:
# export TFCI=py311,linux_x86,no_docker

# Advanced: Use Remote Build Execution (RBE) (internal developers only)
#
#   RBE dramatically speeds up builds and testing. It also gives you a
#   public URL to share your build results with collaborators. However,
#   it is only available to a limited set of internal TensorFlow developers.
#
#   RBE is incompatible with local caching, so you must remove
#   disk_cache, public_cache, and public_cache_push from your $TFCI value.
#
# To use RBE, you must first run `gcloud auth application-default login`, then:
export TFCI=py311,linux_x86,rbe

# Finally: Run your script of choice.
#   If you've clicked on a test result from our CI (via a dashboard or GitHub link),
#   open "Invocation Details" and find BUILD_CONFIG, which contains a
#   "build_file" item that indicates the script used.
ci/official/wheel.sh

# Advanced: Select specific build/test targets with "any.sh".
# TF_ANY_TARGETS=":your/target" TF_ANY_MODE="test" ci/official/any.sh

# Afterwards: Examine the results, which include the bazel cache, generated
# artifacts like .whl files, and the script's log, "script.log".
# Note that files created under Docker will be owned by "root".
ls build_output
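
Two of the rules above are easy to get wrong: a Python version must come first in TFCI, and rbe must not be combined with disk_cache, public_cache, or public_cache_push. The following is a minimal sketch of a pre-flight check for those two rules. The function name check_tfci is hypothetical and is not part of ci/official; it does not verify that each name has a matching file in the envs directory.

```bash
# Hypothetical helper (not part of ci/official): sanity-check a TFCI value
# against two rules stated in this README.
check_tfci() {
  _tfci="$1"
  _first="${_tfci%%,*}"   # everything before the first comma

  # Rule 1: the first entry must start with "py" followed by a digit.
  case "$_first" in
    py[0-9]*) ;;
    *)
      echo "error: TFCI must start with a Python version, got '$_first'" >&2
      return 1
      ;;
  esac

  # Rule 2: rbe must not be combined with any cache modifier.
  case ",$_tfci," in
    *,rbe,*)
      case ",$_tfci," in
        *,disk_cache,*|*,public_cache,*|*,public_cache_push,*)
          echo "error: rbe cannot be combined with cache modifiers" >&2
          return 1
          ;;
      esac
      ;;
  esac

  echo "ok: $_tfci"
}

check_tfci "py311,linux_x86,public_cache,disk_cache"
check_tfci "py311,linux_x86,rbe,disk_cache" || echo "rejected"
```

Running the check before a long build catches an invalid combination early, for example `check_tfci "$TFCI" && ci/official/wheel.sh`.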

Contribution & Maintenance

The TensorFlow team does not yet have guidelines in place for contributing to this directory. We are working on it. Please join a TF SIG Build meeting (see: bit.ly/tf-sig-build-notes) if you'd like to discuss the future of contributions.

Brief System Overview

The top-level scripts and utility scripts should be fairly well-documented. Here is a brief explanation of how they tie together:

envs/* are lists of variables made with bash syntax. A user must set a TFCI env param pointing to a list of env files.
utilities/setup.sh, sourced by all top-level scripts, reads and sets values from those TFCI paths.
- set -a / set -o allexport exports the variables from env files so all scripts can use them.
- utilities/setup_docker.sh creates a container called tf with all TFCI_ variables shared to it.
Top-level scripts (wheel.sh, etc.) reference env variables and call utilities/ scripts.
- The tfrun function makes a command run correctly in Docker if Docker is enabled.
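
The pieces above can be sketched in a few lines of shell. This is an illustration of the mechanism only, not the contents of utilities/setup.sh or utilities/setup_docker.sh: the temporary directory, the two env files, and the variable names TFCI_DEMO_PYTHON and TFCI_DEMO_DOCKER are all made up for the example.

```bash
# Sketch only: file and variable names here are hypothetical.
demo_dir="$(mktemp -d)"
mkdir -p "$demo_dir/envs"

# Env files are plain lists of shell variable assignments.
echo 'TFCI_DEMO_PYTHON=3.11' > "$demo_dir/envs/py311"
echo 'TFCI_DEMO_DOCKER=0'    > "$demo_dir/envs/no_docker"

TFCI="py311,no_docker"

# Split the comma-separated TFCI value into positional parameters.
old_ifs="$IFS"; IFS=','; set -- $TFCI; IFS="$old_ifs"

# With allexport on, every variable assigned while sourcing is exported,
# so child processes and other scripts can read it.
set -a
for name in "$@"; do
  . "$demo_dir/envs/$name"
done
set +a

# tfrun runs a command directly, or inside the "tf" container
# when Docker is enabled.
if [ "$TFCI_DEMO_DOCKER" = "1" ]; then
  tfrun() { docker exec tf "$@"; }
else
  tfrun() { "$@"; }
fi

# The child shell sees the exported value.
tfrun sh -c 'echo "python=$TFCI_DEMO_PYTHON"'

rm -rf "$demo_dir"
```

Because callers always go through tfrun, the same top-level script can run with or without Docker without changing its build commands.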