Retry ONNX tests (the quick way) (#98627)

This mitigates a flaky ONNX test in trunk and improves the job's reliability until https://github.com/pytorch/pytorch/issues/98626 is addressed (I figure that this is better than moving the job to unstable).

I tried to disable the flaky test via https://github.com/pytorch/pytorch/issues/98622, but that won't work, as @clee2000 points out, because the ONNX tests don't go through `run_test.py`, which is what downloads and applies the list of disabled tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98627
Approved by: https://github.com/BowenBao
Huy Do 2023-04-07 22:20:36 +00:00 committed by PyTorch MergeBot
parent 4f9dbc17a4
commit 0a0f107b50

@@ -3,11 +3,19 @@
 # shellcheck source=./common.sh
 source "$(dirname "${BASH_SOURCE[0]}")/common.sh"
+# Use to retry ONNX test, only retry it twice
+retry () {
+  "$@" || (sleep 60 && "$@")
+}
 if [[ "$BUILD_ENVIRONMENT" == *onnx* ]]; then
   pip -q install --user "file:///var/lib/jenkins/workspace/third_party/onnx#egg=onnx"
   # TODO: This can be removed later once vision is also part of the Docker image
   pip install -q --user --no-use-pep517 "git+https://github.com/pytorch/vision.git@$(cat .github/ci_commit_pins/vision.txt)"
   # JIT C++ extensions require ninja, so put it into PATH.
   export PATH="/var/lib/jenkins/.local/bin:$PATH"
-  "$ROOT_DIR/scripts/onnx/test.sh"
+  # NB: ONNX test is fast (~15m) so it's ok to retry it few more times to avoid any flaky issue, we
+  # need to bring this to the standard PyTorch run_test eventually. The issue will be tracked in
+  # https://github.com/pytorch/pytorch/issues/98626
+  retry "$ROOT_DIR/scripts/onnx/test.sh"
 fi
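
Note that the `retry` helper in this change runs the command at most twice: one initial attempt plus a single retry after a 60-second sleep. For comparison, here is a minimal sketch of how the same idea could be generalized to a configurable number of attempts and delay. The names `retry_n` and `flaky` are hypothetical and are not part of this commit or of the PyTorch CI scripts; the delay is set to 0 only so the example finishes immediately.

```shell
#!/usr/bin/env bash

# Hypothetical generalization of the commit's retry helper (not part of the PR).
# Usage: retry_n <max_attempts> <delay_seconds> <command> [args...]
retry_n () {
  local max_attempts="$1" delay="$2"
  shift 2
  local attempt=1
  # Re-run the command until it succeeds or the attempts are used up.
  until "$@"; do
    if (( attempt >= max_attempts )); then
      return 1
    fi
    attempt=$(( attempt + 1 ))
    sleep "$delay"
  done
}

# Hypothetical stand-in for a flaky test: fails on the first call,
# succeeds on every later call.
calls=0
flaky () {
  calls=$(( calls + 1 ))
  (( calls >= 2 ))
}

# Two attempts with no delay mirrors the "try once, retry once" behavior above.
retry_n 2 0 flaky
status=$?
echo "status=$status calls=$calls"
```

With two attempts allowed, the first call to `flaky` fails and the second succeeds, so `retry_n` returns success. In a CI script the equivalent of the committed behavior would be something like `retry_n 2 60 "$ROOT_DIR/scripts/onnx/test.sh"`, again assuming the hypothetical helper above.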