Cloning Git submodules in a GitLab CI/CD pipeline and building PyPI packages - authentication setup?

I am trying to add CI/CD to our workflow. To do that, I would like to leave the existing repos untouched and add this functionality on top, in side repos that use the sources as Git submodules. This ninja approach :grin: will ensure that some of my colleagues, who are against version control in general, let alone things such as CI/CD, will have very little to do but can still enjoy the benefits, at least when it comes to automatic packaging and deployment.

A colleague of mine has created a subgroup project (we all have access to it) in our company’s GitLab instance (which we do not manage; we are simply end users). Inside the project there are multiple repos, some just for experimentation and others actual products for customers. Let’s say the repos I am interested in are called r1, r2 and r3. Each is a Python repo with a setup.py that builds a wheel using setuptools.setup, with r2 and r3 depending on r1 (as declared in their respective setup.py files).

My plan is to do the following:

  • Create a repository that takes r1, r2 and r3 as Git submodules, each initialized in its own directory under ./components/ (e.g. ./components/r1 for r1).
  • Add a .gitlab-ci.yml at the top level of this repo that will handle the building of each package for each git submodule using the provided setup.py file.

My .gitlab-ci.yml currently contains a single stage with a single job (plus some log messages for debugging purposes).

.gitlab-ci.yml

.git_vars:
  variables:
    GIT_SUBMODULE_STRATEGY: recursive
    GIT_SUBMODULE_DEPTH: 1

stages:
 - build

build-pypi-pkg:
  stage: build
  rules:
  image: python:latest
  variables: !reference [.git_vars, variables]
  script:
    - echo Installing Twine for publishing PyPI package
    - pip install build twine
    # TODO Remove. Currently for debugging purposes
    - cat .gitmodules
    # TODO Remove. Currently for debugging purposes
    - ls -alhR components/
    - echo Building package for component SWA Generic
    - python -m build components/swa_generic/
    - echo Building package for component SWA Kernel
    - python -m build components/swa_kernel/
    - echo Building package for component SWA Visibility
    - python -m build components/swa_visibility/
    - echo Package will be published at ${CI_API_V4_URL}/projects/${CI_PROJECT_ID}/packages/pypi
    - TWINE_PASSWORD=${CI_JOB_TOKEN} TWINE_USERNAME=gitlab-ci-token python -m twine upload --verbose --repository-url ${CI_API_V4_URL}/projects/${CI_PROJECT_ID}/packages/pypi dist/*

The job is more or less copy-paste from the official GitLab documentation on building PyPI-compatible packages and publishing those to the project’s repository.

The job fails with the following errors (I have used dummy data, but the structure of the URLs and paths is as close to the real thing as possible):

Fetching changes with git depth set to 20...
Initialized empty Git repository in /builds/company/deperatment/group/cloud/services/project/.git/
Created fresh repository.
Checking out 05d65552 as main...
Updating/initializing submodules recursively with git depth set to 1...
Submodule 'r1' (https://gitlab.example.com/company/deperatment/group/project/r1.git) registered for path 'components/r1'
Submodule 'r2' (https://gitlab.example.com/company/deperatment/group/project/r2.git) registered for path 'components/r2'
Submodule 'r3' (https://gitlab.example.com/company/deperatment/group/project/r3.git) registered for path 'components/r3'
Synchronizing submodule url for 'components/r1'
Synchronizing submodule url for 'components/r2'
Synchronizing submodule url for 'components/r3'
Cloning into '/builds/company/deperatment/group/cloud/services/project/components/r1'...
fatal: could not read Username for 'https://gitlab.example.com': No such device or address
fatal: clone of 'https://gitlab.example.com/company/deperatment/group/project/r1.git' into submodule path '/builds/company/deperatment/group/cloud/services/project/components/r1' failed
Failed to clone 'components/r1'. Retry scheduled
Cloning into '/builds/company/deperatment/group/cloud/services/project/components/r2'...
fatal: could not read Username for 'https://gitlab.example.com': No such device or address
fatal: clone of 'https://gitlab.example.com/company/deperatment/group/project/r2.git' into submodule path '/builds/company/deperatment/group/cloud/services/project/components/r2' failed
Failed to clone 'components/r2'. Retry scheduled
Cloning into '/builds/company/deperatment/group/cloud/services/project/components/r3'...
fatal: could not read Username for 'https://gitlab.example.com': No such device or address
fatal: clone of 'https://gitlab.example.com/company/deperatment/group/project/r3.git' into submodule path '/builds/company/deperatment/group/cloud/services/project/components/r3' failed
Failed to clone 'components/r3'. Retry scheduled
Cloning into '/builds/company/deperatment/group/cloud/services/project/components/r1'...
fatal: could not read Username for 'https://gitlab.example.com': No such device or address
fatal: clone of 'https://gitlab.example.com/company/deperatment/group/project/r1.git' into submodule path '/builds/company/deperatment/group/cloud/services/project/components/r1' failed
Failed to clone 'components/r1' a second time, aborting
Cleaning up project directory and file based variables 00:01
ERROR: Job failed: command terminated with exit code 1

According to the GitLab documentation on CI/CD and Git submodules, a submodule can be accessed via either absolute or relative URLs. I prefer to stick to absolute URLs since we are still learning about GitLab (and Git in general :smiley: ), so things change quite often. In addition, sometimes one would like to use an external repo (outside of the GitLab instance), and then it is easier to reference that repo than to import it.

Whenever absolute URLs are used, the documentation states that a Personal Access Token (PAT) has to be used.

First of all, I find this strange, especially in a team environment where a lot of things are shared. Second, it binds the whole procedure to a specific user. If that user disappears (e.g. through a layoff), another user will have to take over with their own PAT, and so on. I would hope that a CI_JOB_TOKEN can be used here instead?

Last but not least, I still don’t know how to add my PAT in this particular case, if I absolutely have to use it. Should I add git config commands to my script and set up SSH in the image that builds the package, or should I set up an access token in the project and pipe it in via stdin? This looks like overkill to me, and I am anything but an expert in this regard.
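One commonly used alternative to a PAT is to let Git rewrite the submodule URLs so that they carry the job token. The following is a minimal sketch under stated assumptions, not a verified configuration: it assumes the submodules live on the same instance (gitlab.example.com, the placeholder host used above), that the user who triggers the pipeline can read the submodule projects, and that each submodule project accepts this project's job token (depending on the GitLab version, this is controlled by the job token allowlist in the submodule project's CI/CD settings). The job name build-with-submodules is hypothetical.

```yaml
build-with-submodules:   # hypothetical job name
  stage: build
  image: python:latest
  variables:
    # Let the runner clone only the parent repo; submodules are fetched manually below.
    GIT_SUBMODULE_STRATEGY: none
  before_script:
    # Rewrite every HTTPS URL on this host so that it carries the job token as credentials.
    - 'git config --global url."https://gitlab-ci-token:${CI_JOB_TOKEN}@gitlab.example.com/".insteadOf "https://gitlab.example.com/"'
    - git submodule sync --recursive
    - git submodule update --init --recursive
  script:
    # Quick check that the submodules were actually checked out.
    - ls -alh components/
```

The submodule strategy is set to none because the runner's built-in submodule checkout happens before before_script runs, so the URL rewrite would otherwise come too late.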

I had to bypass GitLab’s built-in handling of Git submodules completely and run the Git commands manually inside the pipeline’s configuration file. In case someone wants to know how it’s done (it scales up and down with the number of submodules!):

.git_vars:
  variables:
    # Manually handle the submodule strategy (see build-pypi-pkg)
    GIT_SUBMODULE_STRATEGY: none
    GIT_STRATEGY: clone

stages:
  - build

build-pypi-pkg:
  stage: build
  rules:
  image: python:latest
  variables: !reference [.git_vars, variables]
  before_script:
    - git config --global credential.helper store
    # TODO Remove. Debugging only; avoid printing credentials in job logs
    - echo "Login URL               https://${CI_REGISTRY_USER}:${CI_JOB_TOKEN}@gitlab.example.com"
    - echo "https://${CI_REGISTRY_USER}:${CI_JOB_TOKEN}@gitlab.example.com" > ~/.git-credentials
    - git submodule sync --recursive
    - git submodule update --init --recursive
    #- mkdir dist
  script:
    - echo Installing Twine for publishing PyPI package
    - pip install build twine
    - echo Building PyPI packages
    - |
      #!/bin/bash
      
      for i in `git submodule foreach --quiet 'echo "$name"'`
      do 
        echo "Building PyPI package for submodule '$i'"
        python -m build "components/$i"
      done
    - echo Publishing packages in ${CI_API_V4_URL}/projects/${CI_PROJECT_ID}/packages/pypi
    - |
      #!/bin/bash
      
      for i in `git submodule foreach --quiet 'echo "$name"'`
      do 
        echo "Publishing PyPI package '$i'"
        TWINE_PASSWORD=${CI_JOB_TOKEN} TWINE_USERNAME=gitlab-ci-token python -m twine upload --verbose --repository-url ${CI_API_V4_URL}/projects/${CI_PROJECT_ID}/packages/pypi "components/$i/dist/*"
      done
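The loops above assume that each submodule's name matches its directory under components/. A variant that iterates over submodule paths instead might look like the sketch below. It assumes the submodules have already been initialized (for example by the before_script above), and relies on $sm_path, which git submodule foreach sets to the submodule's path relative to the superproject in current Git versions. The job name is hypothetical, and this has not been run against the pipeline described here.

```yaml
build-pypi-pkg-by-path:   # hypothetical job name
  stage: build
  image: python:latest
  script:
    - pip install build twine
    - |
      # Iterate over submodule paths (e.g. components/r1) rather than submodule names.
      for p in $(git submodule foreach --quiet 'echo "$sm_path"'); do
        echo "Building and publishing PyPI package in '$p'"
        python -m build "$p"
        TWINE_PASSWORD="${CI_JOB_TOKEN}" TWINE_USERNAME=gitlab-ci-token \
          python -m twine upload --repository-url "${CI_API_V4_URL}/projects/${CI_PROJECT_ID}/packages/pypi" "$p"/dist/*
      done
```

Building and uploading in the same loop iteration keeps each package's dist/ directory next to its source, as in the original solution. The for loop splits on whitespace, so this only works for submodule paths without spaces.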

Might be related to GitLab issue #428594, “Cannot fetch private submodules on GitLab SaaS using absolute URLs”.

Unfortunately, I was unable to reproduce the same result using this workaround.

In my case, the submodule that caused the problem had its project visibility incorrectly set to “Private” (even though it is in the same namespace, with the same owner, as the parent project). After correcting the visibility setting, the submodule could be cloned successfully.