Multiple projects/repos in a single CI job

I am new to GitLab, coming from on-premises TFS (aka Azure DevOps) using TFVC, so I am also new to Git.

I need to use multiple repos in a single CI job. For example, our test data lives in its own repo, which is shared by many projects. The test job in a repo depends on the build job so that it gets its artifacts, but it then needs to clone or pull the test data repo before running the tests.

I don’t know if it matters, but I’m using a Windows GitLab Runner to build C# code: many libraries and several WPF applications. My CI scripts are therefore written in PowerShell (so $AnyText is a variable in the code below), and I have fleshed out a number of helper functions that make things much easier. The one I’m struggling with is the one that clones/pulls the extra repos.

Following the GitLab documentation, I managed to construct valid URLs for any project/repo. For the extra repos, I want to do exactly what the runner does for the project/repo the CI job comes from.

After reading this GitLab document carefully, and assuming the defaults, I produced this command for the initial clone:
git clone --depth $Env:GIT_DEPTH -b $Branch $RootRepoURL/$Repo.git "$FullFolder"

And this series of commands for updating the local repo on the runner:
git clean -ffdx
git fetch $RootRepoURL/$Repo.git $Branch --depth $Env:GIT_DEPTH --prune --quiet
git checkout -B $Branch

If I log on to the runner and clean all the build output off, the next build and test work correctly, so cloning the extra repos works fine. But if I push changes to the repo(s) and then build/test again, I can see that the files in the extra repos are not updated; they are still the ones from the clone.

The CI jobs will be run for every push of every different branch, as well as for merges back to main, so I need it to handle all that bouncing around.

Because it isn’t working correctly right now, I have temporarily made the script delete the extra repos before the clone/pull step, so that it always does a fresh clone. So it works, but it is doing a LOT of repeated cloning, which is drastically inefficient.

Does anyone know the actual list of Git commands a runner uses when updating a repo and switching branches?

git fetch only downloads the new commits into the local object database (and updates the remote-tracking branches, or only FETCH_HEAD when you fetch from a URL); it does not update your local branch or your working tree. That works well in combination with checking out a new branch, but not for later commits on the remote branch.

git pull fetches the remote changes and then merges them into your current local branch. I think you can solve your problem by adding git pull origin $Branch at the end of the script.
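To make the fetch/pull difference concrete: git pull <url> <branch> first fetches the branch tip into FETCH_HEAD and then merges FETCH_HEAD into the current branch. The sketch below does those two steps explicitly (bash syntax; the git commands are identical from PowerShell). pull_ff_only is a hypothetical helper name, and --ff-only is a deliberate choice: it only ever moves the branch forward and fails instead of creating a merge commit. The "Committer identity unknown" error later in this thread is raised when Git has to record a new commit, such as a merge commit.

```shell
#!/usr/bin/env bash
# Sketch: the two steps behind `git pull <url> <branch>`, done explicitly.
# pull_ff_only is a hypothetical helper; run it inside the repo to update.
pull_ff_only() {
  local url="$1" branch="$2"
  # Step 1: download the branch tip. Fetching by URL (no named remote)
  # records the fetched commit only in FETCH_HEAD.
  git fetch -q "$url" "$branch" || return 1
  # Step 2: move the current branch to that commit, but only as a
  # fast-forward, so no merge commit is ever created.
  git merge -q --ff-only FETCH_HEAD
}
```

Usage, assuming develop is already checked out: pull_ff_only "$url" develop. If the local branch has diverged from the remote one, the merge step stops with an error instead of guessing.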

Git has many concepts and features. I recommend taking some time to review the learning resources at Git | GitLab and to practice, including locally on the CLI of your client. That can help with debugging CI/CD pipelines later.

No, but GitLab Runner is open source, so its state machine / workflows can be inspected. I have not done that myself yet. GitLab.org / gitlab-runner · GitLab

Hmmm, I thought the fetch updated the local database to the latest changes, and then the checkout would bring them out into the file system.
I had initially tried just using pull and it was failing; then I found the GitLab document I referenced in the OP and tried to do what the runner does.

So your suggestion to add pull didn’t work. Using origin just threw an authorization error. I replaced it with the full URL with the token, and then it failed for other reason(s), possibly the same ones as when I first tried using pull, but I can’t remember, as that was a couple of weeks ago. Anyway, here is an excerpt of the CI run log:

git clean -ffdx
git fetch https://gitlab-ci-token:[MASKED]@gitlab.XXXXX.com/YYYYY/ZZZZZ.git develop --prune --quiet
git checkout -B develop
Reset branch 'develop'
git pull https://gitlab-ci-token:[MASKED]@gitlab.XXXXX.com/YYYYY/ZZZZZ.git develop
From https://gitlab.XXXXX.com/YYYYY/ZZZZZ
 * branch develop -> FETCH_HEAD
Committer identity unknown
*** Please tell me who you are.
Run
  git config --global user.email "you@example.com"
  git config --global user.name "Your Name"
to set your account's default identity.
Omit --global to set the identity only in this repository.
fatal: unable to auto-detect email address (got 'SYSTEM@AAAAA.(none)')

I have no idea why a pull needs to know who the runner is, nor why a runner doesn’t have a default identity.
I guess I’ll have to look at the runner source. I didn’t know it was open source, so thanks for the link.

Sorry, my tired brain did not translate the variable into a URL. I wrongly suggested origin, when I should first have explained git remote and how URLs come into play.

One way is to always specify the URL in the git clone/pull/push command. Another way is to configure so-called remotes and then use only their names, which Git resolves to the actual URL.
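A minimal sketch of the second style, a named remote, in bash syntax (the git commands are the same from PowerShell). The helper name fetch_via_remote and the remote name testdata are hypothetical. A practical difference from fetching by URL: git fetch <remote> <branch> with a configured remote also updates the remote-tracking branch refs/remotes/<remote>/<branch>, not only FETCH_HEAD. The helper refreshes the URL on every call because a URL that embeds CI_JOB_TOKEN changes with every job, and that URL is stored in the repo's .git/config.

```shell
#!/usr/bin/env bash
# Sketch: fetch a branch through a named remote instead of a raw URL.
# fetch_via_remote and its arguments are hypothetical names.
fetch_via_remote() {
  local repo="$1" name="$2" url="$3" branch="$4"
  if git -C "$repo" remote get-url "$name" >/dev/null 2>&1; then
    # Remote already exists (e.g. from a previous job): refresh its URL,
    # since an embedded job token is only valid for the job that issued it.
    git -C "$repo" remote set-url "$name" "$url"
  else
    git -C "$repo" remote add "$name" "$url"
  fi
  # Updates FETCH_HEAD and refs/remotes/$name/$branch.
  git -C "$repo" fetch -q "$name" "$branch"
}
```

After a call such as fetch_via_remote . testdata "$url" develop, the fetched tip can be referred to as testdata/develop.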

Within the scope of .gitlab-ci.yml, configuring remotes might not make sense for short-lived jobs. Also, there are predefined CI/CD variables available that avoid copy-paste problems. Another use case for multiple remotes is keeping a fork in sync with upstream (Refreshing a Fork - #2 by dnsmichi). I’d suggest going forward with variables rather than remotes; I only shared this detail for better understanding.
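As an illustration of the variables approach, the authenticated URL for another project on the same instance can be assembled from GitLab's predefined CI/CD variables instead of being hard-coded. This is a sketch in bash (in PowerShell the same values are available as $Env:CI_JOB_TOKEN and so on). It assumes the instance is served on the default port for its protocol and that the job token has been granted access to the other project. extra_repo_url and group/test-data are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build an HTTPS clone URL for another project on the
# same GitLab instance, authenticated with the current job's CI_JOB_TOKEN.
# Assumes CI_SERVER_PROTOCOL, CI_SERVER_HOST and CI_JOB_TOKEN are set (GitLab
# predefines them in CI/CD jobs) and that the instance uses its default port.
extra_repo_url() {
  local project_path="$1"   # e.g. "group/test-data" (hypothetical path)
  printf '%s://gitlab-ci-token:%s@%s/%s.git\n' \
    "$CI_SERVER_PROTOCOL" "$CI_JOB_TOKEN" "$CI_SERVER_HOST" "$project_path"
}
```

Usage: git clone --depth "$GIT_DEPTH" -b "$branch" "$(extra_repo_url group/test-data)" test-data. This produces the same https://gitlab-ci-token:...@host/group/project.git form that appears in the CI log in this thread.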

The error you are seeing with

Committer identity unknown
*** Please tell me who you are.

means that the local git binary has no user.name and user.email settings in its .gitconfig (they are blank in a fresh Git installation inside the runner executor and/or container image). git pull is a fetch followed by a merge, and when the merge is not a fast-forward, Git has to create a merge commit, which needs an author and committer identity. One way to overcome this problem is to do the setup manually in CI/CD, similar to how you would with the Git CLI on your desktop client.

before_script:
  - git config --global user.email "our@email.com"
  - git config --global user.name "Gitlab Runner"

A more in-depth example is shared in Git push from inside a gitlab-runner - #3 by epicode, in its before_script section, but it uses SSH keys instead of HTTPS as the Git transport.
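A variation worth considering: if the runner uses the shell executor, git config --global writes to the global config of the account the runner runs as (SYSTEM in the log earlier in this thread), so the setting persists into every later job. Git's -c option instead applies a setting to a single invocation only. A minimal sketch in bash; git_as_runner is a hypothetical helper, and the name and e-mail are placeholders, not required values.

```shell
#!/usr/bin/env bash
# Sketch: supply a committer identity for one git invocation only, instead
# of persisting it with `git config --global`.
# git_as_runner is a hypothetical helper; name/e-mail are placeholders.
git_as_runner() {
  git -c user.name="Gitlab Runner" -c user.email="runner@example.com" "$@"
}
```

Usage: git_as_runner pull "$url" "$branch" behaves like git pull, except that any merge commit it creates is recorded under the placeholder identity.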

Different approach: Git submodules

I kept thinking about the initial request to merge different repository sources in CI/CD jobs.

One way to avoid needing to know Git repository URLs, and to avoid extra configuration in CI/CD, is to use git submodules in the main repository where the CI/CD pipeline is triggered.

git submodule add $RootRepoURL/$Repo.git "$SubmoduleFolder"

This way, the submodules are also available to everyone who clones the repository for their development environment, which can be handy for running tests locally. They are also visible in the GitLab UI.

Inside the GitLab Runner, Git submodules can be initialized by cloning the repository recursively. The runner does this step itself; you do not need any Git CLI commands of your own. Using Git submodules with GitLab CI/CD | GitLab The only required change is to set a global variable that defines the recursive clone strategy.

variables:
  GIT_SUBMODULE_STRATEGY: recursive

Updating git submodules

The submodule’s directory points to a specific Git commit, which “pins” the submodule to exactly that commit of its repository. If you need to update the submodule’s content to the latest, you can navigate into the directory, run git pull, navigate back out, and commit the change to the main repository. In my experience, we did that as developers when pushing new versions in MRs, which allowed us to test new revisions carefully.

Steps as a developer on a local client:

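git fetch
git checkout main
git pull
git checkout -b update-submodule-xy
cd "$SubmoduleFolder"
# A submodule is usually in detached HEAD state, where a plain git pull fails;
# switch to the branch it should follow first (main is an assumption here).
git checkout main
git pull
cd ..
git commit -m "Update submodule XY" "$SubmoduleFolder"
git push -u origin HEAD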

I played with submodules for a bit and found them not useful for us, because we always want to use the latest code/binaries from the other library repos. Using them required constantly pushing and merging changes and then updating the submodule.
After a lot of experimenting with the runners and my code, I finally found the difference. The runner checks out a specific commit SHA, which puts the repo into detached HEAD mode, so when my script tries to update it, it just fails.
I couldn’t figure out how to get it to reconnect and move to the most recent commit, so I resigned myself to nuking the folder and doing a full clone each time. It works and everything builds as it should; it just uses far more CPU, bandwidth, etc. on the runner, not to mention that the builds take longer.
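For the detached HEAD problem, one way to avoid re-cloning (a sketch, not the runner's actual code) is to skip merging and instead point the local branch directly at the fetched commit: git checkout -B <branch> <start-point> creates or resets the branch to that start point and checks it out, regardless of whether HEAD was detached or on another branch. Note that git checkout -B $Branch without a start point, as in the original script, resets the branch to the current HEAD, so it never picks up the fetched commit. Because nothing is merged, no shared history and no committer identity are needed. Written in bash; the same git commands work from PowerShell. sync_extra_repo, its arguments, and the default depth are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: keep an extra repo cached between jobs and move it to the latest
# commit of a branch, even if a previous job left it on another branch or
# in detached HEAD state. All names here are hypothetical.
sync_extra_repo() {
  local url="$1" branch="$2" folder="$3"
  local depth="${4:-20}"   # hypothetical default; pass $GIT_DEPTH in CI
  if [ ! -d "$folder/.git" ]; then
    # First run: shallow clone of just the requested branch.
    git clone -q --depth "$depth" -b "$branch" "$url" "$folder"
    return
  fi
  # Later runs: fetch the branch tip by URL. With no remote-tracking ref for
  # it, the fetched commit is recorded in FETCH_HEAD.
  git -C "$folder" fetch -q --depth "$depth" "$url" "$branch" || return 1
  # Create or reset the local branch to the fetched commit and check it out;
  # -f discards local modifications. This also leaves detached HEAD.
  git -C "$folder" checkout -q -f -B "$branch" FETCH_HEAD || return 1
  # Remove untracked and ignored files (old build output).
  git -C "$folder" clean -q -ffdx
}
```

Usage: sync_extra_repo "$url" develop TestData "$GIT_DEPTH". Every call leaves TestData on a local develop branch matching the remote tip, whichever branch the previous job used.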