[BE] Isolate pre-push hook dependencies in dedicated virtual environment (#160048)

This PR makes two changes:
- Isolates pre-push hook dependencies in a dedicated venv, so they no longer affect your system environment
- Lets you manually run the pre-push lintrunner (including with `lintrunner -a`) by invoking `python scripts/lintrunner.py [-a]` (it's ugly, but better than nothing...for now)

This is a follow up to:
- https://github.com/pytorch/pytorch/pull/158389

## Problem
The current pre-push hook setup installs lintrunner and related dependencies globally, which makes developers nervous about system pollution and can cause version conflicts with existing installations.

Also, if the pre-push lintrunner found errors, you had to hope your normal lintrunner could fix them (which wasn't always the case, e.g. if those errors only manifested in certain Python versions).

##  Key Changes:
  - Isolated Environment: Creates `.git/hooks/linter/.venv/` with Python 3.9 (the Python version used in CI) and an isolated lintrunner installation
  - User-Friendly CLI: The new `python scripts/lintrunner.py` wrapper lets developers run lintrunner (including `-a` auto-fix) from any environment
  - Simplified Architecture: Eliminates pre-commit dependency entirely - uses direct git hooks

##  File Changes:
  - scripts/setup_hooks.py: Rewritten to create isolated uv-managed virtual environment
  - scripts/lintrunner.py: New wrapper script with shared hash management logic
  - scripts/run_lintrunner.py: Removed (functionality merged into lintrunner.py)
  - .pre-commit-config.yaml: Removed (no longer needed)
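
The "shared hash management logic" listed for `scripts/lintrunner.py` boils down to re-running `lintrunner init` only when `.lintrunner.toml` has changed. The following is a minimal sketch of that idea, not the script itself: `needs_reinit` and `record_init` are hypothetical names introduced here for illustration, and the caller is assumed to run the actual init step in between.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    hasher = hashlib.sha256()
    with path.open("rb") as f:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: f.read(8192), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def needs_reinit(config: Path, marker: Path) -> bool:
    """True when no hash is stored yet or the config changed since the last init.

    Hypothetical helper: `marker` is a small file holding the last seen hash.
    """
    if not marker.exists():
        return True
    return marker.read_text().strip() != file_sha256(config)


def record_init(config: Path, marker: Path) -> None:
    """Store the config's current hash after a successful init (hypothetical helper)."""
    marker.write_text(file_sha256(config))
```

Storing the marker file inside the venv means a freshly recreated venv always triggers a new init, since the marker disappears with it.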

##  Usage:
```
  # Setup (run once)
  python scripts/setup_hooks.py

  # Manual linting (works from any environment)
  python scripts/lintrunner.py        # Check mode
  python scripts/lintrunner.py -a     # Auto-fix mode

  # Git hooks work automatically
  git push  # Runs lintrunner in isolated environment

  # Need to skip the pre-push hook?
  git push --no-verify
```
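
Git runs the executable at `.git/hooks/pre-push` before each push, which is why no pre-commit shim is needed. A rough sketch of how such a hook file can be generated is shown below; it assumes a POSIX system with bash, and `build_hook_script` and `install_hook` are illustrative names rather than functions from this PR.

```python
import shlex
from pathlib import Path


def build_hook_script(python_exe: Path, wrapper: Path) -> str:
    """Return the text of a bash pre-push hook that runs `wrapper` with `python_exe`."""
    # Quote both paths so repositories with spaces in their path still work.
    py = shlex.quote(str(python_exe))
    script = shlex.quote(str(wrapper))
    lines = [
        "#!/bin/bash",
        "set -e",
        # Skip linting instead of failing when the wrapper is absent on this checkout.
        f"if [ ! -f {script} ]; then",
        '    echo "lint wrapper not found, skipping"',
        "    exit 0",
        "fi",
        f"{py} {script}",
    ]
    return "\n".join(lines) + "\n"


def install_hook(hooks_dir: Path, python_exe: Path, wrapper: Path) -> Path:
    """Write the hook into `hooks_dir` and mark it executable."""
    hook = hooks_dir / "pre-push"
    hook.write_text(build_hook_script(python_exe, wrapper))
    hook.chmod(0o755)  # git ignores hooks that are not executable
    return hook
```

Because the hook is a plain file, deleting `.git/hooks/pre-push` disables it entirely, and `git push --no-verify` skips it for a single push.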

##  Benefits:
  -  Zero global dependency installation
  -  Per-repository isolation prevents version conflicts
  -  Full lintrunner functionality is now accessible

##  Implementation Notes:
  - The virtual env is kept in a dedicated directory under `.git`, so each clone of the repo gets its own hook environment
  - lintrunner.py does not need to be invoked from a specific venv.  It'll invoke the right venv itself.
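
Running a tool from a venv without activating it mostly comes down to calling the executable in the venv's `bin` directory with an adjusted environment. The sketch below illustrates that approach under the assumption of a POSIX venv layout (`bin/` rather than `Scripts/`); `venv_environment` is a hypothetical helper name.

```python
from __future__ import annotations

import os
from pathlib import Path


def venv_environment(
    venv_dir: Path, base_env: dict[str, str] | None = None
) -> dict[str, str]:
    """Return a copy of the environment that mimics an activated venv."""
    env = dict(os.environ if base_env is None else base_env)
    # Put the venv's bin directory first so its python and tools win lookups.
    env["PATH"] = str(venv_dir / "bin") + os.pathsep + env.get("PATH", "")
    # Many tools check VIRTUAL_ENV to decide whether they are inside a venv.
    env["VIRTUAL_ENV"] = str(venv_dir)
    return env
```

A caller could then run something like `subprocess.call([str(venv_dir / "bin" / "lintrunner"), *args], env=venv_environment(venv_dir))` from any shell, regardless of which venv (if any) is active there.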

A minor bug: the wrapper tends to garble lintrunner's progress output a bit, as the screenshot below shows. I haven't found a workaround so far, but the output remains understandable to users:
<img width="241" height="154" alt="image" src="https://github.com/user-attachments/assets/9496f925-8524-4434-8486-dc579442d688" />

## What's next?
Features that could be added:
- Check for lintrunner updates, auto-update if needed
- Depending on dev response, this could be enabled by default for all pytorch/pytorch environments

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160048
Approved by: https://github.com/seemethere
Zain Rizvi 2025-08-12 01:58:44 +00:00 committed by PyTorch MergeBot
parent 7a974a88f2
commit 95210cc409
4 changed files with 251 additions and 207 deletions

.pre-commit-config.yaml

@@ -1,12 +0,0 @@
repos:
  - repo: local
    hooks:
      - id: lintrunner
        name: Run Lintrunner in an isolated venv before every push. The first run may be slow...
        entry: python scripts/run_lintrunner.py  # wrapper below
        language: python  # pre-commit manages venv for the wrapper
        additional_dependencies: []  # wrapper handles lintrunner install
        always_run: true
        stages: [pre-push]  # fire only on pre-push
        pass_filenames: false  # Lintrunner gets no per-file args
        verbose: true  # stream output as it is produced...allegedly anyways

scripts/lintrunner.py Normal file

@@ -0,0 +1,181 @@
#!/usr/bin/env python3
"""
Wrapper script to run the isolated hook version of lintrunner.

This allows developers to easily run lintrunner (including with -a for auto-fixes)
using the same isolated environment that the pre-push hook uses, without having
to manually activate/deactivate virtual environments.

Usage:
    python scripts/lintrunner.py         # Check mode (same as git push)
    python scripts/lintrunner.py -a      # Auto-fix mode
    python scripts/lintrunner.py --help  # Show lintrunner help

This module also provides shared functionality for lintrunner hash management.
"""

from __future__ import annotations

import hashlib
import os
import shlex
import shutil
import subprocess
import sys
from pathlib import Path


def find_repo_root() -> Path:
    """Find repository root using git."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "--show-toplevel"],
            capture_output=True,
            text=True,
            check=True,
        )
        return Path(result.stdout.strip())
    except subprocess.CalledProcessError:
        sys.exit("❌ Not in a git repository")


def compute_file_hash(path: Path) -> str:
    """Returns SHA256 hash of a file's contents."""
    hasher = hashlib.sha256()
    with path.open("rb") as f:
        while chunk := f.read(8192):
            hasher.update(chunk)
    return hasher.hexdigest()


def read_stored_hash(path: Path) -> str | None:
    if not path.exists():
        return None
    try:
        return path.read_text().strip()
    except Exception:
        return None


# Venv location - change this if the path changes
HOOK_VENV_PATH = ".git/hooks/linter/.venv"


def get_hook_venv_path() -> Path:
    """Get the path to the hook virtual environment."""
    repo_root = find_repo_root()
    return repo_root / HOOK_VENV_PATH


def find_hook_venv() -> Path:
    """Locate the isolated hook virtual environment."""
    venv_dir = get_hook_venv_path()

    if not venv_dir.exists():
        sys.exit(
            f"❌ Hook virtual environment not found at {venv_dir}\n"
            " Please set this up by running: python scripts/setup_hooks.py"
        )

    return venv_dir


def check_lintrunner_installed(venv_dir: Path) -> None:
    """Check if lintrunner is installed in the given venv, exit if not."""
    result = subprocess.run(
        [
            "uv",
            "pip",
            "show",
            "--python",
            str(venv_dir / "bin" / "python"),
            "lintrunner",
        ],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    if result.returncode != 0:
        sys.exit(
            "❌ lintrunner is required but was not found in the hook environment. "
            "Please run `python scripts/setup_hooks.py` to reinstall."
        )
    print("✅ lintrunner is already installed")


def run_lintrunner(venv_dir: Path, args: list[str]) -> int:
    """Run lintrunner command in the specified venv and return exit code."""
    # Run lintrunner directly from the venv's bin directory with environment setup
    lintrunner_exe = venv_dir / "bin" / "lintrunner"
    cmd = [str(lintrunner_exe)] + args
    env = os.environ.copy()

    # PATH: Ensures lintrunner can find other tools in the venv (like python, pip, etc.)
    env["PATH"] = str(venv_dir / "bin") + os.pathsep + env.get("PATH", "")
    # VIRTUAL_ENV: Tells tools like pip_init.py that we're in a venv (prevents --user flag issues)
    env["VIRTUAL_ENV"] = str(venv_dir)

    # Note: Progress tends to be slightly garbled due to terminal control sequences,
    # but functionality and final results will be correct
    return subprocess.call(cmd, env=env)


def initialize_lintrunner_if_needed(venv_dir: Path) -> None:
    """Check if lintrunner needs initialization and run init if needed."""
    repo_root = find_repo_root()
    lintrunner_toml_path = repo_root / ".lintrunner.toml"
    initialized_hash_path = venv_dir / ".lintrunner_plugins_hash"

    if not lintrunner_toml_path.exists():
        print("⚠️ No .lintrunner.toml found. Skipping init.")
        return

    current_hash = compute_file_hash(lintrunner_toml_path)
    stored_hash = read_stored_hash(initialized_hash_path)

    if current_hash != stored_hash:
        print("🔁 Running `lintrunner init` …", file=sys.stderr)
        result = run_lintrunner(venv_dir, ["init"])
        if result != 0:
            sys.exit(f"❌ lintrunner init failed")
        initialized_hash_path.write_text(current_hash)
    else:
        print("✅ Lintrunner plugins already initialized and up to date.")


def main() -> None:
    """Run lintrunner in the isolated hook environment."""
    venv_dir = find_hook_venv()
    python_exe = venv_dir / "bin" / "python"

    if not python_exe.exists():
        sys.exit(f"❌ Python executable not found at {python_exe}")

    try:
        print(f"🐍 Virtual env being used: {venv_dir}", file=sys.stderr)

        # 1. Ensure lintrunner binary is available in the venv
        check_lintrunner_installed(venv_dir)

        # 2. Check for plugin updates and re-init if needed
        initialize_lintrunner_if_needed(venv_dir)

        # 3. Run lintrunner with any passed arguments and propagate its exit code
        args = sys.argv[1:]
        result = run_lintrunner(venv_dir, args)

        # If lintrunner failed and we're not already in auto-fix mode, suggest the wrapper
        if result != 0 and "-a" not in args:
            print(
                "\n💡 To auto-fix these issues, run: python scripts/lintrunner.py -a",
                file=sys.stderr,
            )

        sys.exit(result)

    except KeyboardInterrupt:
        print("\n Lintrunner interrupted by user (KeyboardInterrupt)", file=sys.stderr)
        sys.exit(1)  # Tell git push to fail


if __name__ == "__main__":
    main()

scripts/run_lintrunner.py

@@ -1,110 +0,0 @@
#!/usr/bin/env python3
"""
Pre-push hook wrapper for Lintrunner.

Stores a hash of .lintrunner.toml in the venv
Re-runs `lintrunner init` if that file's hash changes
"""

from __future__ import annotations

import hashlib
import os
import shutil
import subprocess
import sys
from pathlib import Path


REPO_ROOT = Path(__file__).resolve().parents[1]
LINTRUNNER_TOML_PATH = REPO_ROOT / ".lintrunner.toml"

# This is the path to the pre-commit-managed venv
VENV_ROOT = Path(sys.executable).parent.parent
# Stores the hash of .lintrunner.toml from the last time we ran `lintrunner init`
INITIALIZED_LINTRUNNER_TOML_HASH_PATH = VENV_ROOT / ".lintrunner_plugins_hash"


def ensure_lintrunner() -> None:
    """Fail if Lintrunner is not on PATH."""
    if shutil.which("lintrunner"):
        print("✅ lintrunner is already installed")
        return
    sys.exit(
        "❌ lintrunner is required but was not found on your PATH. Please run the `python scripts/setup_hooks.py` to install to configure lintrunner before using this script. If `git push` still fails, you may need to open an new terminal"
    )


def ensure_virtual_environment() -> None:
    """Fail if not running within a virtual environment."""
    in_venv = (
        os.environ.get("VIRTUAL_ENV") is not None
        or hasattr(sys, "real_prefix")
        or (hasattr(sys, "base_prefix") and sys.base_prefix != sys.prefix)
    )

    if not in_venv:
        sys.exit(
            "❌ This script must be run from within a virtual environment. "
            "Please activate your virtual environment before running this script."
        )


def compute_file_hash(path: Path) -> str:
    """Returns SHA256 hash of a file's contents."""
    hasher = hashlib.sha256()
    with path.open("rb") as f:
        while chunk := f.read(8192):
            hasher.update(chunk)
    return hasher.hexdigest()


def read_stored_hash(path: Path) -> str | None:
    if not path.exists():
        return None
    try:
        return path.read_text().strip()
    except Exception:
        return None


def initialize_lintrunner_if_needed() -> None:
    """Runs lintrunner init if .lintrunner.toml changed since last run."""
    if not LINTRUNNER_TOML_PATH.exists():
        print("⚠️ No .lintrunner.toml found. Skipping init.")
        return

    print(
        f"INITIALIZED_LINTRUNNER_TOML_HASH_PATH = {INITIALIZED_LINTRUNNER_TOML_HASH_PATH}"
    )
    current_hash = compute_file_hash(LINTRUNNER_TOML_PATH)
    stored_hash = read_stored_hash(INITIALIZED_LINTRUNNER_TOML_HASH_PATH)

    if current_hash == stored_hash:
        print("✅ Lintrunner plugins already initialized and up to date.")
        return

    print("🔁 Running `lintrunner init` …", file=sys.stderr)
    subprocess.check_call(["lintrunner", "init"])
    INITIALIZED_LINTRUNNER_TOML_HASH_PATH.write_text(current_hash)


def main() -> None:
    # 0. Ensure we're running in a virtual environment
    ensure_virtual_environment()
    print(f"🐍 Virtual env being used: {VENV_ROOT}", file=sys.stderr)

    # 1. Ensure lintrunner binary is available
    ensure_lintrunner()

    # 2. Check for plugin updates and re-init if needed
    initialize_lintrunner_if_needed()

    # 3. Run lintrunner with any passed arguments and propagate its exit code
    args = sys.argv[1:]  # Forward all arguments to lintrunner
    result = subprocess.call(["lintrunner"] + args)
    sys.exit(result)


if __name__ == "__main__":
    main()

scripts/setup_hooks.py

@@ -1,31 +1,51 @@
 #!/usr/bin/env python3
 """
-Bootstrap Git pre-push hook.
+Bootstrap Git pre-push hook with isolated virtual environment.

 Requires uv to be installed (fails if not available)
-Installs/updates pre-commit with uv (global, venv-proof)
-Registers the repo's pre-push hook and freezes hook versions
+Creates isolated venv in .git/hooks/linter/.venv/ for hook dependencies
+Installs lintrunner only in the isolated environment
+Creates direct git hook that bypasses pre-commit

 Run this from the repo root (inside or outside any project venv):

     python scripts/setup_hooks.py
+
+IMPORTANT: The generated git hook references scripts/lintrunner.py. If users checkout
+branches that don't have this file, git push will fail with "No such file or directory".
+Users would need to either:
+1. Re-run the old setup_hooks.py from that branch, or
+2. Manually delete .git/hooks/pre-push to disable hooks temporarily, or
+3. Switch back to a branch with the new scripts/lintrunner.py
 """

 from __future__ import annotations

+import shlex
 import shutil
 import subprocess
 import sys
 from pathlib import Path
-from typing import Tuple
+
+
+# Add scripts directory to Python path so we can import lintrunner module
+scripts_dir = Path(__file__).parent
+sys.path.insert(0, str(scripts_dir))
+
+# Import shared functions from lintrunner module
+from lintrunner import find_repo_root, get_hook_venv_path
+
+
+# Restore sys.path to avoid affecting other imports
+sys.path.pop(0)


 # ───────────────────────────────────────────
 # Helper utilities
 # ───────────────────────────────────────────
-def run(cmd: list[str]) -> None:
+def run(cmd: list[str], cwd: Path = None) -> None:
     print(f"$ {' '.join(cmd)}")
-    subprocess.check_call(cmd)
+    subprocess.check_call(cmd, cwd=cwd)


 def which(cmd: str) -> bool:
@@ -34,28 +54,7 @@ def which(cmd: str) -> bool:
 def ensure_uv() -> None:
     if which("uv"):
-        # Ensure the path uv installs binaries to is part of the system path
-        print("$ uv tool update-shell")
-        result = subprocess.run(
-            ["uv", "tool", "update-shell"], capture_output=True, text=True
-        )
-        if result.returncode == 0:
-            # Check if the output indicates changes were made
-            if (
-                "Updated" in result.stdout
-                or "Added" in result.stdout
-                or "Modified" in result.stdout
-            ):
-                print(
-                    "⚠️ Shell configuration updated. You may need to restart your terminal for changes to take effect."
-                )
-            elif result.stdout.strip():
-                print(result.stdout)
-            return
-        else:
-            sys.exit(
-                f"❌ Warning: uv tool update-shell failed: {result.stderr}. uv installed tools may not be available."
-            )
+        return

     sys.exit(
         "\n❌ uv is required but was not found on your PATH.\n"
@@ -65,29 +64,6 @@ def ensure_uv() -> None:
     )


-def ensure_tool_installed(
-    tool: str, force_update: bool = False, python_ver: Tuple[int, int] = None
-) -> None:
-    """
-    Checks to see if the tool is available and if not (or if force update requested) then
-    it reinstalls it.
-
-    Returns: Whether or not the tool is available on PATH. If it's not, a new terminal
-    needs to be opened before git pushes work as expected.
-    """
-    if force_update or not which(tool):
-        print(f"Ensuring latest {tool} via uv …")
-        command = ["uv", "tool", "install", "--force", tool]
-        if python_ver:
-            # Add the Python version to the command if specified
-            command.extend(["--python", f"{python_ver[0]}.{python_ver[1]}"])
-        run(command)
-        if not which(tool):
-            print(
-                f"\n⚠️ {tool} installation succeed, but it's not on PATH. Launch a new terminal if your git pushes don't work.\n"
-            )
-
-
 if sys.platform.startswith("win"):
     print(
         "\n⚠️ Lintrunner is not supported on Windows, so there are no pre-push hooks to add. Exiting setup.\n"
@@ -95,52 +71,61 @@ if sys.platform.startswith("win"):
     sys.exit(0)

 # ───────────────────────────────────────────
-# 1. Install dependencies
+# 1. Setup isolated hook environment
 # ───────────────────────────────────────────
 ensure_uv()

-# Ensure pre-commit is installed globally via uv
-ensure_tool_installed("pre-commit", force_update=True, python_ver=(3, 9))
-
-# Don't force a lintrunner update because it might break folks
-# who already have it installed in a different way
-ensure_tool_installed("lintrunner")
-
-# ───────────────────────────────────────────
-# 2. Activate (or refresh) the pre-push hook
-# ───────────────────────────────────────────
-
-# ── Activate (or refresh) the repo's pre-push hook ──────────────────────────
-# Creates/overwrites .git/hooks/pre-push with a tiny shim that will call
-# `pre-commit run --hook-stage pre-push` on every `git push`.
-# This is why we need to install pre-commit globally.
-#
-# The --allow-missing-config flag lets pre-commit succeed if someone changes to
-# a branch that doesn't have pre-commit installed
+# Find repo root and setup hook directory
+repo_root = find_repo_root()
+venv_dir = get_hook_venv_path()
+hooks_dir = venv_dir.parent.parent  # Go from .git/hooks/linter/.venv to .git/hooks
+
+print(f"Setting up isolated hook environment in {venv_dir}")
+
+# Create isolated virtual environment for hooks
+if venv_dir.exists():
+    print("Removing existing hook venv...")
+    shutil.rmtree(venv_dir)
+
+run(["uv", "venv", str(venv_dir), "--python", "3.9"])
+
+# Install lintrunner in the isolated environment
+print("Installing lintrunner in isolated environment...")
 run(
-    [
-        "uv",
-        "tool",
-        "run",
-        "pre-commit",
-        "install",
-        "--hook-type",
-        "pre-push",
-        "--allow-missing-config",
-    ]
+    ["uv", "pip", "install", "--python", str(venv_dir / "bin" / "python"), "lintrunner"]
 )

-# ── Pin remote-hook versions for reproducibility ────────────────────────────
-# (Note: we don't have remote hooks right now, but it future-proofs this script)
-# 1. `autoupdate` bumps every remote hook's `rev:` in .pre-commit-config.yaml
-#    to the latest commit on its default branch.
-# 2. `--freeze` immediately rewrites each `rev:` to the exact commit SHA,
-#    ensuring all contributors and CI run identical hook code.
-run(["uv", "tool", "run", "pre-commit", "autoupdate", "--freeze"])
+# ───────────────────────────────────────────
+# 2. Create direct git pre-push hook
+# ───────────────────────────────────────────
+
+pre_push_hook = hooks_dir / "pre-push"
+python_exe = venv_dir / "bin" / "python"
+
+lintrunner_script_path_quoted = shlex.quote(
+    str(repo_root / "scripts" / "lintrunner.py")
+)
+
+hook_script = f"""#!/bin/bash
+set -e
+
+# Check if lintrunner script exists (user might be on older commit)
+if [ ! -f {lintrunner_script_path_quoted} ]; then
+    echo "⚠️ {lintrunner_script_path_quoted} not found - skipping linting (likely on an older commit)"
+    exit 0
+fi
+
+# Run lintrunner wrapper using the isolated venv's Python
+{shlex.quote(str(python_exe))} {lintrunner_script_path_quoted}
+"""
+
+print(f"Creating git pre-push hook at {pre_push_hook}")
+pre_push_hook.write_text(hook_script)
+pre_push_hook.chmod(0o755)  # Make executable

 print(
-    "\n✅ pre-commit is installed globally via uv and the pre-push hook is active.\n"
+    "\nIsolated hook environment created and pre-push hook is active.\n"
     " Lintrunner will now run automatically on every `git push`.\n"
+    f" Hook dependencies are isolated in {venv_dir}\n"
 )