Docker executor not able to find packages installed in previous stages despite reuse enabled

Introduction details

Hi there, I currently have a gitlab runner that uses a Docker executor to run a .gitlab-ci.yml pipeline. My current understanding is that by default, the behavior for a Docker runner is that each stage will utilize a new container.

However, I can set pull_policy = ["if-not-present"] in my config.toml for the runner and have the pipeline re-use a container for subsequent stages, so I added that key/value to my config.toml. It seemingly works, as each stage in the pipeline does successfully use the same container (exact identical image hash).
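
For context, the relevant part of my config.toml looks roughly like this (runner name, URL, and token are placeholders):

[[runners]]
  name = "my-docker-runner"
  url = "https://gitlab.example.com/"
  token = "REDACTED"
  executor = "docker"
  [runners.docker]
    image = "python:3.11"
    # skip `docker pull` when the image already exists on the host
    pull_policy = ["if-not-present"]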

Problem to solve

The issue I’m facing is that the packages installed in the build stage don’t seem to be in the container in the subsequent test stage. The test stage jobs are all using the same container (the same exact image that was used in the build stage, due to me setting pull_policy = ["if-not-present"]).

E.g. every single test stage job’s output says Using docker image sha256:{hash} for python:3.11 with digest python@sha256:{hash}, with the same hash for every job.

Despite all the double-checking, the test stage jobs fail with the errors:

  • /usr/bin/bash: line 136: pytest: command not found
  • /usr/bin/bash: line 136: mypy: command not found

Both pytest and mypy are installed in the build stage.

Configuration

Here is my .gitlab-ci.yml with sensitive things anonymized:

stages:
  - build
  - test

build-job:
  image: python:${PYTHON_VERSION_CI}
  stage: build
  script:
    - echo "Compiling the code..."
    - pip install -r requirements.txt # requirements.txt contains pytest and mypy
    - echo "Compile complete."

test_1:
  image: python:${PYTHON_VERSION_CI}
  stage: test
  script:
    - echo "Running test 1"
    - echo $PATH
    - pytest src/1/tests --cov=1 --cov-report term --cov-report xml:coverage_1.xml
    - echo "Test 1 complete"

test_2:
  image: python:${PYTHON_VERSION_CI}
  stage: test
  script:
    - echo "Running coeur test"
    - echo $PATH
    - pytest src/2/tests --cov=2 --cov-report term --cov-report xml:coverage_2.xml
    - echo "Test 2 complete"

test_mypy:
  stage: test
  image: python:${PYTHON_VERSION_CI}
  script:
    - echo "Running mypy tests"
    - mypy
    - echo "mypy tests complete"

Recap

Is there a reason why mypy and pytest aren’t found in the test stage jobs? I can get the pipeline to work if I add pip install -r requirements.txt to each test stage job, but that seems redundant.

Hi @calicowearer :wave:

The Docker executor pulls the python:${PYTHON_VERSION_CI} image and starts a fresh container from it before each job executes. If the python:${PYTHON_VERSION_CI} image doesn’t already contain the dependencies that subsequent jobs need, you’ll need to install them in each job. Changes made inside a job’s container (like the result of pip install -r requirements.txt) are discarded when the job ends; they are only preserved if you use an image that’s built and pushed to a container registry after those dependencies have been installed.

Setting pull_policy = if-not-present just tells your runner that it doesn’t need to do a fresh docker pull of the python:${PYTHON_VERSION_CI} image if it hasn’t been updated (i.e. the sha256:{hash} hasn’t changed). It does not control whether changes made or dependencies installed in the build stage jobs get carried over to subsequent jobs; it only controls whether your runner will docker pull a new image tag.

The boring solution here would be to add

- pip install -r requirements.txt

to each job script (or before_script) so that the required python dependencies are installed and available in all jobs that need them.
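
For example, a minimal sketch using a top-level default: section so the install runs before every job (job names and paths copied from your pipeline above, untested):

default:
  image: python:${PYTHON_VERSION_CI}
  before_script:
    # runs before the script of every job that doesn't override before_script
    - pip install -r requirements.txt

test_1:
  stage: test
  script:
    - pytest src/1/tests --cov=1 --cov-report term --cov-report xml:coverage_1.xml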

A better solution would be to build a Docker image in which pip install -r requirements.txt has already been executed, push that image to your container registry, and use that image for all jobs in the pipeline.

A simplified example of how to do this would be:

  1. Create a Dockerfile in the root of your repository containing

    FROM python:<tag>
    
    COPY requirements.txt .
    
    RUN pip install -r requirements.txt
    
  2. Build and push this image to the GitLab Container Registry using GitLab CI

    build-python-docker-image:
      image: docker:20.10.16
      stage: build
      services:
        - docker:20.10.16-dind
      variables:
        IMAGE_TAG: "$CI_REGISTRY_IMAGE:$PYTHON_VERSION_CI"
      script:
        - docker login -u $CI_REGISTRY_USER -p $CI_REGISTRY_PASSWORD $CI_REGISTRY
        - docker build -t "$IMAGE_TAG" .
        - docker push "$IMAGE_TAG"
    
  3. Change all jobs to use the image created/saved/pushed by the build-python-docker-image job (which has the requirements.txt dependencies installed)
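
     For example, the test jobs would then look something like this (a sketch assuming the IMAGE_TAG from step 2; adjust the image reference to your project’s registry path):

     test_mypy:
       stage: test
       # this image already has the requirements.txt dependencies installed
       image: "$CI_REGISTRY_IMAGE:$PYTHON_VERSION_CI"
       script:
         - mypy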



Hey @gitlab-greg,

I’ve gone ahead and created an image per your suggestion; however, the behavior I see when I test the container locally doesn’t match what happens when the pipeline runs it.

My Dockerfile is essentially:

FROM python:3.11

WORKDIR /app

COPY . ./

RUN pip install -r requirements.txt

ENTRYPOINT ["/bin/bash", "-l", "-c"]

This builds nicely and after it’s built I can run:

user@host:$ docker run -it --rm 

root@1c961fe15998:/app# mypy
Success: no issues found in 101 source files

root@1c961fe15998:/app# exit
exit

user@host:$

Same case with running the container and then running pytest src/dir/tests.

However, when the pipeline tries to run the same test_mypy job:

test_mypy:
  stage: test
  image: image-url
  script:
    - echo "Running mypy tests"
    - mypy
    - echo "mypy tests complete"

I get:

26 Using docker image sha256:...
27 /usr/bin/sh: /usr/bin/sh: cannot execute binary file
29 Uploading artifacts for failed job 00:01
30 Uploading artifacts...

I’m not sure why the pipeline is failing there; is the docker executor not starting the container that I built to run the script in each test stage?