Sporadic Error in Build Jobs: "error: could not write config file [...] *.tmp\git-template\config: Permission denied"

Problem to solve

Our build jobs sporadically fail with a “Permission denied” error. The failure occurs during the git fetch step, before our own script starts (see the log below).

We have 12 runners, all identically configured Windows VMs, which are reset weekly to a state from long before this issue first appeared. There is no pattern as to which runner is affected; the problem seems to occur at random. It happened a few times in 2023, but since the end of January it has been happening multiple times a day. There seems to be no correlation between the issue occurring and changes to our infrastructure (e.g. GitLab/Runner updates): we updated GitLab and the runners on Jan 14, and the failure only became a daily occurrence on Jan 29.

Because the failure occurs even before the pre-build cleanup is performed, these jobs leave outdated artifacts behind. Subsequent jobs then pick them up, which leads to incorrect data being sent to our other systems.
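One possible mitigation for the stale-artifact side effect is to clean the workspace at the end of every job instead of relying only on the pre-build cleanup. This is a minimal sketch, assuming the runner feature flag `FF_ENABLE_JOB_CLEANUP` applies to this shell-executor setup. It has not been tested here, and it is unknown whether the end-of-job cleanup would itself succeed on a runner that is hitting the permission error.

```yaml
ci_build_job:
    variables:
        # Runner feature flag (assumption: supported by the runner version in use).
        # Asks the runner to clean the project directory at the end of each job,
        # so a later job that fails before its own cleanup finds no stale files.
        FF_ENABLE_JOB_CLEANUP: "true"
```

This does not address the permission error itself; it only aims to limit the damage when a job fails before cleanup.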

A retry usually helps to “fix” the issue, but we lose a lot of time, as our developers usually just start pipelines and then come back a while later, only to find out that they failed.

The Job Log looks like this:

Running with gitlab-runner 16.8.1 (a6097117)
  on runner-VM-FX-GLR-6 %%: , system ID: %%
Preparing the "shell" executor   00:00
Using Shell (powershell) executor...
Preparing environment   00:00
Running on VM-FX-GLR-6...
Getting source from Git repository   00:01
Fetching changes...
error: could not write config file C:\Gitlab-Runner\builds\%%\repo.tmp\git-template\config: Permission denied
Uploading artifacts for failed job   00:05
Version:      16.8.1
Git revision: a6097117
Git branch:   16-8-stable
GO version:   go1.21.5
Built:        2024-02-15T18:34:46+0000
OS/Arch:      windows/amd64
Uploading artifacts...
Runtime platform                                    arch=amd64 os=windows pid=8316 revision=a6097117 version=16.8.1
%%: found 4 matching artifact files and directories
[...]
%%: found 1 matching artifact files and directories
Uploading artifacts as "archive" to coordinator... 201 Created  id=921858 responseStatus=201 Created token=64_Csaoe
Cleaning up project directory and file based variables   00:01
ERROR: Job failed: exit status 4

Steps to reproduce

I found some other posts with a similar error, but they all seemed to be “could not lock” instead of “could not write” and they were having this problem every single time, instead of just sporadically like in our case.

Stackoverflow
GitLab Issue

We’re running out of ideas and the current plan is to build a workaround that somehow detects this type of failure and automatically restarts the job…
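Before building a custom detector, GitLab CI's built-in retry settings might cover part of this. The following is a minimal sketch, not a confirmed fix: it assumes the `GET_SOURCES_ATTEMPTS` variable and the `retry` keyword behave as documented for this runner, and the values 3 and 2 are arbitrary. Whether either setting actually recovers from this particular failure is untested.

```yaml
ci_build_job:
    variables:
        # Ask the runner to repeat the "Getting source from Git repository"
        # step up to 3 times before failing the job.
        GET_SOURCES_ATTEMPTS: "3"
    retry:
        # Re-run the whole job up to 2 more times if it still fails.
        # Without a `when:` filter this also retries genuine build failures.
        max: 2
```

The variable targets only the source-fetch step, so it is the cheaper of the two. The job-level `retry` is a blunter fallback and could mask real build failures.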

Configuration

I doubt that the job configuration has anything to do with it, but here it is anyway:

ci_build_job:
    stage: cibuild
    allow_failure:
        exit_codes: %% 
    script:
        - [...]
    artifacts:
        paths:
            - [...]
        when: always
        expire_in: 1 week
    tags:
        - %%
    rules:
        - if: '$CI_PIPELINE_SOURCE == "push"'
          when: never
        - if: '$CI_MERGE_REQUEST_LABELS =~ /^.*p:Hotfix.*/'
          when: never
        - if: '$CI_MERGE_REQUEST_TITLE =~ /^Draft.*/'
          when: manual
        - when: on_success

Runner config:

concurrent = 1
check_interval = 0
 
[session_server]
  session_timeout = 1800
 
[[runners]]
  name = "runner-VM-FX-GLR-1"
  url = "%%%"
  token = "%%%"
  executor = "shell"
  shell = "powershell"
  [runners.custom_build_dir]
  [runners.cache]
    [runners.cache.s3]
    [runners.cache.gcs]

Versions

  • Self-managed
  • Self-hosted Runners

Versions

  • GitLab 16.8.4
  • GitLab Runner 16.8.1