We’ve had a gitlab.com + gitlab-runner setup, with a specific runner deployed to our Kubernetes cluster. It ran for ~6 months with no problems, but as of yesterday all of our pipelines fail with the same error on every stage. We had a devops guy set everything up and he left a couple of weeks ago, so I’m on the steep learning curve of figuring out what was done without much in the way of documentation.
I can see what is failing, but I don’t really understand where to start applying fixes to resolve this issue.
The job output from one of our builds:
Running with gitlab-runner 13.3.1 (738bbe5a)
on runner-gitlab-runner-594484c775-v2zxb w7WiGHhk
Preparing the "kubernetes" executor 00:00
Using Kubernetes namespace: gitlab-managed-apps
Using Kubernetes executor with image gcr.io/kaniko-project/executor:debug ...
Preparing environment 03:03
Waiting for pod gitlab-managed-apps/runner-w7wighhk-project-20213218-concurrent-0xsr7z to be running, status is Pending
Waiting for pod gitlab-managed-apps/runner-w7wighhk-project-20213218-concurrent-0xsr7z to be running, status is Pending
<same message for another 3-4 minutes>
ERROR: Job failed (system failure): prepare environment: timed out waiting for pod to start. Check https://docs.gitlab.com/runner/shells/index.html#shell-profile-loading for more information
Then the build reports as failed. The shell documentation linked in the error does not provide any info that I believe helps me resolve this. I’ve logged into the gitlab-runner instance and don’t see any of the referenced shell files. All the builds show similar output. We can get our builds to run on the shared runners, but we have a lot of builds and don’t think we’ll be able to rely on that long-term.
I can see the gitlab-runner deployment running:
$ kubectl get pods -n gitlab-managed-apps
NAME READY STATUS RESTARTS AGE
runner-gitlab-runner-594484c775-v2zxb 1/1 Running 0 161d
If I run a build, I can see where it spins up a pod to run the build:
$ kubectl get pods -n gitlab-managed-apps
NAME READY STATUS RESTARTS AGE
runner-gitlab-runner-594484c775-v2zxb 1/1 Running 0 161d
runner-w7wighhk-project-20213218-concurrent-024xhx 0/2 Pending 0 8s
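Since the build pod never leaves Pending, the pod's events seem like the obvious next thing to look at. A minimal sketch of how the Pending pods could be picked out of that listing and described, assuming the namespace shown above; `pending_pods` and `describe_pending` are hypothetical helper names I made up, not part of kubectl:

```shell
# Print the names of pods whose STATUS column is "Pending".
# Reads `kubectl get pods` output on stdin; the header line is skipped.
pending_pods() {
  awk 'NR > 1 && $3 == "Pending" { print $1 }'
}

# Describe every Pending pod in the given namespace. The Events section at
# the end of each description usually says why the pod has not started
# (for example scheduling or image pull problems).
describe_pending() {
  ns="$1"
  kubectl get pods -n "$ns" | pending_pods | while read -r pod; do
    kubectl describe pod "$pod" -n "$ns"
  done
}
```

This would be called as `describe_pending gitlab-managed-apps` while a job is waiting, since the build pod is removed once the job times out.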
The runner logs show a lot of the same errors as the build screen, plus some additional info that might point in a helpful direction. Googling for these errors and searching the forum didn’t turn up anything useful. Here are some excerpts from the logs:
WARNING: Failed to process runner builds=1 error=prepare environment: timed out waiting for pod to start. Check https://docs.gitlab.com/runner/shells/index.html#shell-profile-loading for more information executor=kubernetes runner=w7WiGHhk
ERROR: Job failed (system failure): prepare environment: timed out waiting for pod to start. Check https://docs.gitlab.com/runner/shells/index.html#shell-profile-loading for more information duration=3m3.679822327s job=1051575746 project=20213218 runner=w7WiGHhk
WARNING: Appending trace to coordinator... aborted code=403 job=1052542445 job-log= job-status=canceled runner=w7WiGHhk sent-log=6733-6934 status=403 Forbidden update-interval=0s
This is about all I’ve figured out so far. Someone with a similar setup had reported that his logs bloated and stopped his container from responding, but as best I can tell I don’t have space issues on my gitlab-runner instance.
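One way the space question could be double-checked is to filter `df` output from inside the runner pod. This is only a sketch: `full_filesystems` is a hypothetical helper name, the 90% threshold is an arbitrary assumption, and it assumes `df -P` is available in the runner image:

```shell
# Print mount point and usage for filesystems at or above a usage limit.
# Reads `df -P` output on stdin; the limit (percent) defaults to 90.
full_filesystems() {
  limit="${1:-90}"
  awk -v limit="$limit" '
    NR > 1 {
      use = $5
      sub(/%/, "", use)              # "95%" -> "95"
      if (use + 0 >= limit) print $6, $5
    }'
}
```

Against the runner pod from the listing above, that would be `kubectl exec -n gitlab-managed-apps runner-gitlab-runner-594484c775-v2zxb -- df -P | full_filesystems 90`. No output would mean nothing is at or above the threshold.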
Can anyone help point me in a helpful direction? I’m new to the gitlab-runner/kubernetes/docker world but not to servers and software; I’m just at a loss for where to start.
Thanks!