GitLab.com runner never finishes after script is completed

I have a runner on Windows 11 that never finishes. The bash script has an echo at the end of it, and I see that echo in the CICD log, so I know the script reached the end. But after that script the cache part never starts, and the job just hangs until it times out. Any ideas how to fix this?

Here’s the CICD yml entry for it:
build-OculusQuestAppLab:
  stage: deploy
  cache:
    key: "$CI_COMMIT_REF_SLUG-android-applab"
    paths:
      - Library/
  script:
    - bash "ci/build.sh"
    - bash "ci/deploy-oculus-quest.sh"
  variables:
    BUILD_TARGET: Android
    SCRIPTING_DEFINES: OCULUS;DISABLESTEAMWORKS
  tags:
    - unity
  rules:
    - if: '$CI_COMMIT_BRANCH == "master" || $CI_COMMIT_BRANCH == "qa" || $CI_COMMIT_BRANCH == "dev"'
      when: always
    - when: manual
      allow_failure: true

It reaches the last line of “bash ci/deploy-oculus-quest.sh” which is ‘echo “Deploy finished”’, and then just freezes there.

Do you also store any artifacts? Is the runner cache accessed over the network? If so, uploading the artifacts/cache may take long enough that the job times out (or you manually stopped it because you thought it was hanging).

No artifacts, only the cache. The cache is stored locally. The CICD has no logs for the cache starting, which it does have logs for in other pipelines that run correctly. It’s only this one pipeline that is having issues. It also doesn’t happen 100% of the time, maybe 90% or so at the moment. It’s just as if the runner thinks the previous script is still running or something and never starts the cache portion.

Can you check if the runner CPU usage is high when the job presumably hangs?
I suppose it could be the runner compressing the cache so it can be stored locally.

Also, consider not using cache for packages (like node_modules for example, as it is in most cases faster if you just download them every time)

It’s not an issue with the cache taking a long time to finish. The cache never even starts, because it has 0 log entries saying so. Every other pipeline will show a log entry when the cache is starting.

Is there anything in the GitLab Runner logs? See Install GitLab Runner on Windows | GitLab.

The final message in the logs is just the “Deploy Finished” from the last script. Nothing else shows until it times out a few hours later.

Since you’re using Windows with an .sh script, I’d like to ask which bash interpreter you’re using. It’s possible that’s the root cause of your issue.
Maybe the interpreter hangs on script end? (though in this case the first script should hang before we even reach the second one, so this is probably not the case)

Can you add another script line to your .gitlab-ci.yml, like so:

build-OculusQuestAppLab:
  script:
    - bash "ci/build.sh"
    - bash "ci/deploy-oculus-quest.sh"
    - echo "third step of the script"

Also, please do use codeblocks next time :wink:

The config is set to use PowerShell as the shell executor. I just tried it with the extra echo, and the log did reach it, but the job then froze and never started the cache, like usual.

Aight so the scripts are not the problem… but damn, you’re on windows… do you know how to see the whole process tree on Win? 'cause I have no clue. Something like htop or alike - so you create the job, and when it gets stuck you go see on which part it hangs and hopefully get some useful info?
Maybe there’s a specific command that’s run right after the script finishes which fails.
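
On the process-tree question, a rough sketch rather than a tested recipe: Sysinternals Process Explorer shows the tree in a GUI, and from a script the same information is available as PID / parent-PID pairs. The helper below reads `PID PPID NAME` lines on stdin and prints everything underneath a given root PID, so whatever is still alive under the runner when the job hangs stands out. The function names `print_tree` and `_walk` are made up for illustration, and the PowerShell one-liner in the comment is an assumption about how to produce the listing on the Windows runner.

```shell
#!/usr/bin/env bash
# Needs bash 4+ (associative arrays).
#
# Possible way to feed it on Windows (assumption, adjust as needed):
#   powershell.exe -NoProfile -Command \
#     'Get-CimInstance Win32_Process | ForEach-Object { "$($_.ProcessId) $($_.ParentProcessId) $($_.Name)" }' \
#     | tr -d '\r' | print_tree <PID of gitlab-runner>

# Recursively print one process and its children, indenting by depth.
# Relies on the `names` and `children` arrays set up by print_tree.
_walk() {
  local pid="$1" indent="$2" child
  printf '%s%s %s\n' "$indent" "$pid" "${names[$pid]:-?}"
  for child in ${children[$pid]:-}; do
    _walk "$child" "$indent  "
  done
}

# Read "PID PPID NAME" lines from stdin and print the subtree under ROOT_PID.
print_tree() {
  local root="$1" pid ppid name
  declare -A names children
  while read -r pid ppid name; do
    # Skip headers, blank lines and anything else that is not "number number ...".
    [[ "$pid" =~ ^[0-9]+$ && "$ppid" =~ ^[0-9]+$ ]] || continue
    names[$pid]="$name"
    # A process listed as its own parent (PID 0 on Windows) would loop forever.
    [[ "$pid" == "$ppid" ]] || children[$ppid]+="$pid "
  done
  _walk "$root" ""
}
```

Windows reuses PIDs, so a parent link can point at an unrelated newer process; treat the output as a hint about where to look, not as proof.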

Maybe it is a permission problem, or a more generic OS error. I’d suggest correlating all events at the date and time when the CI job runs and fails. win+r and eventvwr opens the event viewer.

Also, can you share the script content of deploy-oculus-quest.sh and build.sh to exactly see what it does? It may start background services and the like, which are blocking the termination of the job itself.
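
To illustrate how a leftover background process could block termination even though the last `echo` was printed, here is a minimal sketch. It assumes the runner behaves like any reader of the script’s output stream: the reader only sees end-of-file once every process holding that stream has exited. Whether that is what happens on this particular runner is an assumption, and `LINGER`, `run_inheriting`, `run_detached`, and `elapsed` are hypothetical names; `sleep` stands in for whatever a tool might leave running.

```shell
#!/usr/bin/env bash
# Seconds the stand-in background process stays alive (arbitrary value).
LINGER=3

# The background child inherits the script's stdout, so the reader (cat)
# keeps waiting for it after "Deploy finished" has been printed.
run_inheriting() {
  bash -c "sleep $LINGER & echo 'Deploy finished'" | cat
}

# Same script, but the child's output is redirected away from the pipe,
# so the reader sees end-of-file as soon as the script itself exits.
run_detached() {
  bash -c "sleep $LINGER >/dev/null 2>&1 & echo 'Deploy finished'" | cat
}

# Print how many whole seconds a command takes to return.
elapsed() {
  local start=$SECONDS
  "$@" >/dev/null
  echo $(( SECONDS - start ))
}
```

Comparing `elapsed run_inheriting` with `elapsed run_detached` shows the first taking about as long as the background process lives, while the second returns almost immediately. If something similar is going on here, the process tree at the moment of the hang should show a child that outlived the script.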

Other jobs that also use build.sh complete fine, so if it’s a script-related thing, then it would be the deploy script causing it, I imagine.

Here’s the build script:

#!/usr/bin/env bash

set -e
set -x

echo "Building for $BUILD_TARGET"

export BUILD_PATH=./Builds/$BUILD_TARGET/
mkdir -p $BUILD_PATH

"C:\Program Files\Unity\Hub\Editor\\${UNITY_VERSION}\Editor\Unity.exe" \
  -projectPath $(pwd) \
  -quit \
  -batchmode \
  -gitBranch $CI_COMMIT_BRANCH \
  -buildTarget $BUILD_TARGET \
  -customBuildTarget $BUILD_TARGET \
  -customBuildName "$BUILD_NAME" \
  -customBuildPath $BUILD_PATH \
  -customScriptingDefines $SCRIPTING_DEFINES \
  -customZipPrefix $ZIP_PREFIX \
  -executeMethod BuildCommand.PerformBuild \
  -logFile -

UNITY_EXIT_CODE=$?

if [ $UNITY_EXIT_CODE -eq 0 ]; then
  echo "Run succeeded, no failures occurred";
elif [ $UNITY_EXIT_CODE -eq 2 ]; then
  echo "Run succeeded, some tests failed";
elif [ $UNITY_EXIT_CODE -eq 3 ]; then
  echo "Run failure (other failure)";
else
  echo "Unexpected exit code $UNITY_EXIT_CODE";
fi

ls -la $BUILD_PATH
[ -n "$(ls -A $BUILD_PATH)" ] # fail job if build folder is empty
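
A side note on the exit-code handling in the build script, offered as a sketch and not as a diagnosis of the hang: because `set -e` is active, a non-zero exit from Unity ends the script on the spot, so `UNITY_EXIT_CODE=$?` and the `elif` branches only ever see 0. One common way around that is to capture the code with `||`. `fake_unity`, `exit_code_of`, and `describe_exit` are hypothetical names used for illustration; the messages are the ones from the script above.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the Unity invocation: exits with the given code.
fake_unity() { return "$1"; }

# Run a command and print its exit code. A command on the left-hand side
# of `||` does not trigger `set -e`, so the code can still be recorded.
exit_code_of() {
  local code=0
  "$@" || code=$?
  echo "$code"
}

# Map an exit code to the messages used in the build script.
describe_exit() {
  case "$1" in
    0) echo "Run succeeded, no failures occurred" ;;
    2) echo "Run succeeded, some tests failed" ;;
    3) echo "Run failure (other failure)" ;;
    *) echo "Unexpected exit code $1" ;;
  esac
}
```

In the real script this would look like `code=$(exit_code_of <unity command>)` followed by `describe_exit "$code"` and, if the job should fail on a bad build, an explicit `exit "$code"`.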

Here’s the deploy script:

#!/usr/bin/env bash

set -e
set -x

if [ "$CI_COMMIT_REF_SLUG" = "master" ]; then
	channel="RC"
elif [ "$CI_COMMIT_REF_SLUG" = "qa" ]; then
	channel="BETA"
elif [ "$CI_COMMIT_REF_SLUG" = "dev" ]; then
	channel="ALPHA"
else
	channel=$CI_COMMIT_REF_SLUG
fi

gameVersion=`cat GameVersion.txt`
echo "Deploying Oculus Quest build for channel $channel version $gameVersion"

C:/OculusDeploy/ovr-platform-util.exe upload-quest-build -a $OCULUS_QUEST_APP_ID --app_secret $OCULUS_QUEST_APP_SECRET --apk "C:/OculusDeploy/Build/Cards & Tankards.apk" -c $channel

echo "Deploy finished"
exit 0