GitLab.com runner never finishes after script is completed

emrys90 · January 6, 2022, 5:59pm

I have a runner on Windows 11 that never finishes. The bash script has an echo at the end of it, and I see that echo in the CICD log, so I know the script reached the end. But after that script the cache part never starts, and the job just hangs until it times out. Any ideas how to fix this?

Here’s CICD yml entry for it:
build-OculusQuestAppLab:
  stage: deploy
  cache:
    key: "$CI_COMMIT_REF_SLUG-android-applab"
    paths:
      - Library/
  script:
    - bash "ci/build.sh"
    - bash "ci/deploy-oculus-quest.sh"
  variables:
    BUILD_TARGET: Android
    SCRIPTING_DEFINES: OCULUS;DISABLESTEAMWORKS
  tags:
    - unity
  rules:
    - if: '$CI_COMMIT_BRANCH == "master" || $CI_COMMIT_BRANCH == "qa" || $CI_COMMIT_BRANCH == "dev"'
      when: always
    - when: manual
      allow_failure: true

It reaches the last line of “bash ci/deploy-oculus-quest.sh” which is ‘echo “Deploy finished”’, and then just freezes there.

aljaxus · January 6, 2022, 7:26pm

Do you also store any artifacts? Is the runner cache accessed over the network? If so, the artifact/cache upload may take long enough that the runner job times out (or you manually stopped it because you thought it was hanging).

emrys90 · January 6, 2022, 7:46pm

No artifacts, only the cache. The cache is stored locally. The CICD has no logs for the cache starting, which it does have logs for in other pipelines that run correctly. It’s only this one pipeline that is having issues. It also doesn’t happen 100% of the time, maybe 90% or so at the moment. It’s just as if the runner thinks the previous script is still running or something and never starts the cache portion.

aljaxus · January 6, 2022, 7:50pm

Can you check if the runner CPU usage is high when the job presumably hangs?
I suppose it could be the runner compressing the cache so it can be stored locally

Also, consider not using cache for packages (like node_modules for example, as it is in most cases faster if you just download them every time)

emrys90 · January 6, 2022, 8:02pm

It’s not an issue with the cache taking a long time to finish. The cache never even starts, because it has 0 log entries saying so. Every other pipeline will show a log entry when the cache is starting.

balonik · January 7, 2022, 2:04pm

Is there anything in the GitLab Runner logs? See: Install GitLab Runner on Windows | GitLab

emrys90 · January 7, 2022, 4:35pm

The final message in the logs is just the “Deploy Finished” from the last script. Nothing else shows until it times out a few hours later.

aljaxus · January 11, 2022, 2:34am

Since you’re using Windows, with an .sh script - I’d like to ask which bash interpreter you’re using. It’s possible that that’s the root cause of your issue?
Maybe the interpreter hangs on script end? (though in this case the first script should hang before we even reach the second one, so this is probably not the case)

Can you add another script line to your .gitlab-ci.yml, like this, and check whether it shows up in the log?

build-OculusQuestAppLab:
  script:
    - bash "ci/build.sh"
    - bash "ci/deploy-oculus-quest.sh"
    - echo "third step of the script"

Also, please do use codeblocks next time

emrys90 · January 11, 2022, 6:39pm

The config is set to use powershell as the shell executor. I just tried it with the echo and it did reach that in the log, but it then froze after and never started the cache like usual.

aljaxus · January 11, 2022, 7:11pm

Aight so the scripts are not the problem… but damn, you’re on windows… do you know how to see the whole process tree on Win? 'cause I have no clue. Something like htop or alike - so you create the job, and when it gets stuck you go see on which part it hangs and hopefully get some useful info?
Maybe there’s a specific command that’s run right after the script finishes, and that command fails.

dnsmichi · January 11, 2022, 8:23pm

Maybe it is a permission problem, or a more generic OS error. I’d suggest correlating all events at the date and time when the CI job runs and fails. win+r and eventvwr opens the event viewer.

Also, can you share the script content of deploy-oculus-quest.sh and build.sh to exactly see what it does? It may start background services and the like, which are blocking the termination of the job itself.
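
To illustrate how a background process can block termination, here is a minimal bash sketch. It assumes the hang comes from a leftover process holding on to the job's output stream; the thread does not confirm that this is what GitLab Runner waits on under Windows. The `launcher` and `detached_launcher` functions are hypothetical stand-ins for a tool that starts a helper and then exits.

```shell
#!/usr/bin/env bash

# Hypothetical stand-in for a tool that starts a helper and then exits.
launcher() {
  sleep 3 &                     # helper inherits the launcher's stdout
  echo "launcher finished"
}

# Same tool, but the helper's output is detached from the caller's stream.
detached_launcher() {
  sleep 3 >/dev/null 2>&1 &     # helper no longer holds the caller's pipe
  echo "launcher finished"
}

start=$SECONDS
output=$(launcher)              # returns only after every holder of the pipe closes it
elapsed=$((SECONDS - start))
echo "inherited stdout: '$output' after ${elapsed}s"

start=$SECONDS
output2=$(detached_launcher)    # returns as soon as the launcher itself exits
elapsed2=$((SECONDS - start))
echo "detached stdout:  '$output2' after ${elapsed2}s"
```

In both cases the last echo has already been printed when the wait begins, which would match the symptom of seeing "Deploy finished" in the log while the job never moves on.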

emrys90 · January 11, 2022, 8:32pm

Other jobs that also use build.sh complete fine, so if it’s a script-related thing, then I imagine it would be the deploy script causing it.

Here’s the build script:

#!/usr/bin/env bash

set -e
set -x

echo "Building for $BUILD_TARGET"

export BUILD_PATH=./Builds/$BUILD_TARGET/
mkdir -p $BUILD_PATH

"C:\Program Files\Unity\Hub\Editor\\${UNITY_VERSION}\Editor\Unity.exe" \
  -projectPath $(pwd) \
  -quit \
  -batchmode \
  -gitBranch $CI_COMMIT_BRANCH \
  -buildTarget $BUILD_TARGET \
  -customBuildTarget $BUILD_TARGET \
  -customBuildName "$BUILD_NAME" \
  -customBuildPath $BUILD_PATH \
  -customScriptingDefines $SCRIPTING_DEFINES \
  -customZipPrefix $ZIP_PREFIX \
  -executeMethod BuildCommand.PerformBuild \
  -logFile -

UNITY_EXIT_CODE=$?

if [ $UNITY_EXIT_CODE -eq 0 ]; then
  echo "Run succeeded, no failures occurred";
elif [ $UNITY_EXIT_CODE -eq 2 ]; then
  echo "Run succeeded, some tests failed";
elif [ $UNITY_EXIT_CODE -eq 3 ]; then
  echo "Run failure (other failure)";
else
  echo "Unexpected exit code $UNITY_EXIT_CODE";
fi

ls -la $BUILD_PATH
[ -n "$(ls -A $BUILD_PATH)" ] # fail job if build folder is empty

Here’s the deploy script:

#!/usr/bin/env bash

set -e
set -x

if [ "$CI_COMMIT_REF_SLUG" = "master" ]; then
	channel="RC"
elif [ "$CI_COMMIT_REF_SLUG" = "qa" ]; then
	channel="BETA"
elif [ "$CI_COMMIT_REF_SLUG" = "dev" ]; then
	channel="ALPHA"
else
	channel=$CI_COMMIT_REF_SLUG
fi

gameVersion=`cat GameVersion.txt`
echo "Deploying Oculus Quest build for channel $channel version $gameVersion"

C:/OculusDeploy/ovr-platform-util.exe upload-quest-build -a $OCULUS_QUEST_APP_ID --app_secret $OCULUS_QUEST_APP_SECRET --apk "C:/OculusDeploy/Build/Cards & Tankards.apk" -c $channel

echo "Deploy finished"
exit 0

emrys90 · February 18, 2022, 10:19pm

Does anyone have any other ideas to try? It’s a bit random too, sometimes it will complete, and other times it will just hang there until it times out or I manually cancel the job. It’s only this one deploy to Oculus that ever has issues too. I have a separate deploy to Steam that never freezes like this.

emrys90 · May 1, 2022, 11:37am

I figured it out. The issue is that some of the exes I call in my scripts spawn child processes that keep running after the exe I called exits. As a workaround, after the exe exits I find all child processes of that process and kill them.

greg_marineverse · May 18, 2022, 1:50am

Would you be able to share the code for that? Looking at pretty much the same issue, building for Android with Unity, and the build does not finish.

Greg

emrys90 · May 18, 2022, 2:07am

My previous attempt at that actually didn’t work 100% of the time. I further narrowed it down to an issue of adb not being killed after Unity exits. This is the powershell code that I used for it and it is working 100% of the time for me now:

if ( $Env:BUILD_TARGET -eq "Android" )
{
  echo "Killing adb.exe processes as it prevents the CICD process from exiting"
  
  try
  {
    Stop-Process -Name "adb" -ErrorAction SilentlyContinue
  }
  catch
  {
    Write-Host "An error occurred:"
    Write-Host $_
  }
}

greg_marineverse · May 18, 2022, 2:18am

Awesome, thank you emrys90!

For me, this seems to work as well:
taskkill //IM "adb.exe" //F

For future reference: I did not have too much luck trying to find child processes ( nothing showed up after the build step finished )… however running “tasklist” I was able to see that “adb” is still running, and killing it allowed the CI job to finish.
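
The tasklist check and the taskkill call above could be combined into one step at the end of the bash script. This is only a sketch: `kill_if_running` is a hypothetical helper name, and the doubled slashes assume a Git Bash style shell on the Windows runner, as in the taskkill line above.

```shell
#!/usr/bin/env bash

# Hypothetical helper: force-kill a process by image name, but only if
# tasklist reports it, so the step does not fail when nothing is running.
kill_if_running() {
  local image="$1"

  # Skip quietly on machines without tasklist (for example, non-Windows runners).
  if ! command -v tasklist >/dev/null 2>&1; then
    echo "tasklist not available; skipping $image"
    return 0
  fi

  # //FI filters by image name; grep -F matches the name literally.
  if tasklist //FI "IMAGENAME eq $image" 2>/dev/null | grep -qiF "$image"; then
    echo "$image is still running; killing it"
    taskkill //IM "$image" //F || true   # do not fail the job if the kill races
  else
    echo "$image is not running"
  fi
}

kill_if_running "adb.exe"
```

Calling it as the last line of the build script, after Unity exits, keeps the cleanup next to the step that spawns adb.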

mangoni_altair · October 18, 2022, 12:53pm

My pipeline returns an error whenever I try to kill the process with Stop-Process, even when it is wrapped in try/catch, and even with -ErrorAction set to either SilentlyContinue or Ignore.

The same thing happens when I pipe Get-Process into Stop-Process.

Any experience with this issue?

emrys90 · October 19, 2022, 1:54am

This works for me.

  try
  {
    Stop-Process -Name "adb" -ErrorAction SilentlyContinue
  }
  catch
  {
    Write-Host "An error occurred:"
    Write-Host $_
  }
