Dear all,
I configured my project to run some jobs using GitLab CI. My runner uses the Docker executor, and I have created a group token that I use to run some jobs. For the general pipelines, I am using the git clone strategy. If I create an MR and a branch and commit some changes, the jobs are triggered by my user and succeed with no problems. However, when I merge my branch into the main branch, some jobs are triggered through the GitLab API by the bot that corresponds to the group token.
The problem is that, for some reason, the bot is able to clone some of the submodules but not others. By chance, I discovered that the modules that can be cloned are older than the ones that cannot. The bot also stopped being able to clone a module that could previously be cloned after I deleted that project and created it again.
Something else worth mentioning: after a job triggered by the bot fails, if I run the job again from the GitLab web interface, it runs under my user and succeeds.
Below are the log messages for the failing job, with my comments in capital letters on the lines starting with COMMENT:
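To confirm which identity each run actually uses, a job could print the predefined variables that describe who triggered it. This is only a minimal sketch, assuming Python is available in the job image (I have not checked this for the hog-docker image); describe_trigger is a hypothetical helper name. It reads GITLAB_USER_LOGIN, GITLAB_USER_ID and CI_PIPELINE_SOURCE, and reports only whether CI_JOB_TOKEN is set, never its value.

```python
import os

# Predefined GitLab CI variables that identify who or what started the job.
IDENTITY_VARS = ("GITLAB_USER_LOGIN", "GITLAB_USER_ID", "CI_PIPELINE_SOURCE")


def describe_trigger(env=None):
    """Return a dict describing the job's triggering identity.

    `env` defaults to os.environ; passing a plain dict makes the helper testable.
    """
    env = os.environ if env is None else env
    info = {name: env.get(name, "<unset>") for name in IDENTITY_VARS}
    # Report only the presence of the token, never the secret itself.
    info["CI_JOB_TOKEN set"] = bool(env.get("CI_JOB_TOKEN"))
    return info


if __name__ == "__main__":
    for key, value in describe_trigger().items():
        print(f"{key}: {value}")
```

Comparing this output between the bot-triggered run and the manual retry should show whether the two runs differ only in the triggering user.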
Running with gitlab-runner 15.7.2 (0e7679e6)
on lrashpa-dr2 XLx9sykz
Preparing the "docker" executor 00:02
Using Docker executor with image gitlab-registry.cern.ch/hog/hog-docker:ubuntu ...
Pulling docker image gitlab-registry.cern.ch/hog/hog-docker:ubuntu ...
Using docker image sha256:4069161b20396428aa6c8a5b765acb0678c188824f603460eea4cb01dec1735d for gitlab-registry.cern.ch/hog/hog-docker:ubuntu with digest gitlab-registry.cern.ch/hog/hog-docker@sha256:61c2dc9a77d11f2bae4eb400d367d579fabeee5b0b96e16cb95d494c23e4141f ...
Preparing environment 00:00
Running on runner-xlx9sykz-project-3801-concurrent-0 via lrashpa-dr2...
Getting source from Git repository 00:07
Fetching changes...
Initialized empty Git repository in /builds/deg/fpga-shared/hdl/absenc_test/.git/
Created fresh repository.
Checking out a826be34 as v0.0.37...
Updating/initializing submodules recursively...
Submodule 'Hog' (https://gitlab-ci-token:[MASKED]@gitlab.esrf.fr/deg/fpga-shared/Hog.git) registered for path 'Hog'
COMMENT: IF I OPEN THE LINK BELOW FROM A WEB BROWSER I AM ABLE TO ACCESS THE REPOSITORY
Submodule 'hdl/PoC' (https://gitlab-ci-token:[MASKED]@gitlab.esrf.fr/deg/fpga-shared/hdl/PoC.git) registered for path 'hdl/PoC'
Synchronizing submodule url for 'Hog'
Synchronizing submodule url for 'hdl/PoC'
COMMENT: THE COPY OF HOG HOSTED IN THE ESRF GITLAB IS CLONING JUST FINE. THE PROJECT HAS THE SAME CONFIGURATION AS THE OTHER PROJECTS.
Cloning into '/builds/deg/fpga-shared/hdl/absenc_test/Hog'...
Cloning into '/builds/deg/fpga-shared/hdl/absenc_test/hdl/PoC'...
remote: The project you were looking for could not be found or you don't have permission to view it.
fatal: repository 'https://gitlab.esrf.fr/deg/fpga-shared/hdl/PoC.git/' not found
COMMENT: IF I OPEN THE LINK BELOW FROM A WEB BROWSER I AM NOT ABLE TO ACCESS THE REPOSITORY
fatal: clone of 'https://gitlab-ci-token:[MASKED]@gitlab.esrf.fr/deg/fpga-shared/hdl/PoC.git' into submodule path '/builds/deg/fpga-shared/hdl/absenc_test/hdl/PoC' failed
Failed to clone 'hdl/PoC'. Retry scheduled
Cloning into '/builds/deg/fpga-shared/hdl/absenc_test/hdl/PoC'...
remote: The project you were looking for could not be found or you don't have permission to view it.
fatal: repository 'https://gitlab.esrf.fr/deg/fpga-shared/hdl/PoC.git/' not found
fatal: clone of 'https://gitlab-ci-token:[MASKED]@gitlab.esrf.fr/deg/fpga-shared/hdl/PoC.git' into submodule path '/builds/deg/fpga-shared/hdl/absenc_test/hdl/PoC' failed
Failed to clone 'hdl/PoC' a second time, aborting
Cleaning up project directory and file based variables 00:00
ERROR: Job failed: exit code 1
The token seems to be fine at the “Updating/initializing submodules recursively…” step for the PoC project, because I can access the repository by opening that link in a web browser. However, if I open the link printed for PoC after “Cloning into”, I can no longer access it from the browser. The two links are identical; only the [MASKED] token could differ. So it looks as if the token is being corrupted between these two steps, and, stranger still, only for recently created projects. It looks like something needs to be restarted, refreshed or cleaned up so that the bot can correctly use the token to clone newly created projects.
So far I have tried creating a new token, restarting gitlab-runner, creating a new gitlab-runner, clearing the Docker cache (docker system prune) and clearing the runner caches from the GitLab web interface, and now I am running out of ideas. If anyone has another idea or suggestion, I would really appreciate it.
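A more direct test than opening the links in a browser would be to ask git itself whether a given token can read each repository. The following is a minimal sketch, not something I have run on this setup: ls_remote_command and can_read are hypothetical helper names, and it assumes that git and Python are available inside the job. Because the job token is only valid while the job is running, the check would have to run as part of the failing job.

```python
import os
import subprocess


def ls_remote_command(host, project_path, token, user="gitlab-ci-token"):
    """Build a `git ls-remote` command that authenticates over HTTPS."""
    url = f"https://{user}:{token}@{host}/{project_path}.git"
    return ["git", "ls-remote", "--heads", url]


def can_read(host, project_path, token):
    """Return True if git can list the remote's branches with this token."""
    result = subprocess.run(
        ls_remote_command(host, project_path, token),
        capture_output=True,
        text=True,
        # Fail immediately instead of prompting for credentials.
        env={**os.environ, "GIT_TERMINAL_PROMPT": "0"},
    )
    return result.returncode == 0


if __name__ == "__main__":
    token = os.environ.get("CI_JOB_TOKEN")
    if token:
        # Project paths taken from the job log above.
        for path in ("deg/fpga-shared/Hog", "deg/fpga-shared/hdl/PoC"):
            status = "readable" if can_read("gitlab.esrf.fr", path, token) else "NOT readable"
            print(f"{path}: {status}")
```

If Hog is reported as readable and PoC is not with the same token in the same job, the token itself is intact and the difference lies in what that token is allowed to access.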
Many thanks.