PHPUnit tests become slow after a few minutes in GitLab CI

I am running a test suite of 741 tests and 3390 assertions with PHPUnit in the context of a Laravel application. Most of these tests are integration tests that use a database connection, Redis, and S3 services in the CI. I am also producing a code coverage report during this run. Here is the command I run in the CI:

I am using GitLab.com CI with the shared runners.

vendor/bin/phpunit --verbose --testdox --colors=never --exclude slow --stop-on-error --stop-on-failure --coverage-clover coverage.xml

Everything works fine at first. The testing process is long, but that is expected: each test generally takes between 0.5 and 2.5 seconds. But later on, once I have about 1000 lines of output in my CI job console, my tests suddenly take about 5-7 minutes each to complete. Eventually I hit the timeout and the job fails. Here is an excerpt of the PHPUnit output with the timings:
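Since the coverage report is collected via Xdebug, one cheap diagnostic would be to rerun the exact same suite with coverage disabled and compare timings. This is only a sketch: the flags mirror the command above, and XDEBUG_MODE=off is Xdebug 3's runtime switch for turning the extension off.

```shell
# Same suite, but with Xdebug disabled and no coverage report, to check
# whether coverage instrumentation is what degrades over time.
XDEBUG_MODE=off vendor/bin/phpunit --verbose --testdox --colors=never \
    --exclude slow --stop-on-error --stop-on-failure
```

If the slowdown disappears, the coverage collection itself (and its memory footprint) is the prime suspect.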

Reservable Controller
 ✔ List reservables for organization [1433.97 ms]
 ✔ List reservables for organization does not show private scoped reservables [407.01 ms]
 ✔ List reservables for organization filter by category [398.82 ms]
User Controller
 ✔ Index [1093.55 ms]
 ✔ Index with user type filter [720.10 ms]
 ✔ Index with university affiliation filter [8091.16 ms]
 ✔ Index with profile and formation [15232.75 ms]
Organization Controller
 ✔ Me [1549.88 ms]
Publication Controller
 ✔ List publication for specific user [555.23 ms]
 ✔ List publication for specific user with dois [439.23 ms]
Activity Report
 ✔ Get new or updated dataset count for user projects [2517.39 ms]
 ✔ Get new or updated dataset count for user projects with multiple projects [2703.64 ms]
 ✔ Get new or updated dataset count for user projects count only updated [2061.95 ms]
 ✔ Get team most active user [5881.47 ms]
 ✔ Get team most active user ex aequo [18263.78 ms]
 ✔ Get team most active user none [445702.22 ms]
 ✔ Get team most active user past activity doesnt count [232573.56 ms]
ERROR: Job failed: execution took longer than 1h0m0s seconds

We can clearly see that the completion time changes drastically at Get team most active user ex aequo, which is very similar in workload to the test just before it.

I also noted that:

  • When I run the tests on my local machine, there is no drastic change in completion time.
  • I also tagged all the tests from Activity Report with @group slow; when I ran the suite again excluding that group, other unrelated tests became slow after, again, about a thousand lines of output.

Here is the relevant part of my .gitlab-ci.yml file:

services:
  - mariadb:10.3
  - name: minio/minio
    alias: minio
    command: ['server', '--address', ':9005', '/data']
  - redis
  - name: docker:dind
    entrypoint: [ "env", "-u", "DOCKER_HOST" ]
    command: [ "dockerd-entrypoint.sh" ]

backend_tests:
  image: registry.gitlab.com/redacted/redacted
  stage: test
  interruptible: true
  dependencies:
    - composer
  artifacts:
    when: always
    paths:
      - src/api/storage/logs/
    expire_in: 1 day
  variables:
    XDEBUG_MODE: coverage
    SENTRY_RELEASE: $CI_COMMIT_SHORT_SHA
  rules:
    - if: $CI_MERGE_REQUEST_IID
      changes:
        - src/api/*
    - if: $CI_COMMIT_TAG
      changes:
        - src/api/*
    - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH
      changes:
        - src/api/*
  script:
    - cd src/api
    - cp .env.gitlab .env
    - rm phpunit.xml && cp phpunit.gitlab.xml phpunit.xml
    - php artisan key:generate
    - php artisan migrate:fresh
    - php artisan db:seed --class TestingSeeder
    - vendor/bin/phpunit --verbose --testdox --colors=never --exclude slow --stop-on-error --stop-on-failure --coverage-clover coverage.xml
    - bash <(curl -s https://codecov.io/bash)

I don't really know where to start diagnosing this problem. Trying things out is slow, since each iteration takes about 20 minutes to run, and I am burning through my CI minutes very fast doing so.
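One way to make each iteration cheaper would be to reproduce on a small slice of the suite instead of rerunning the whole job. This is a sketch: ActivityReportTest is a placeholder class name, and --filter / --no-coverage are standard PHPUnit options for restricting the run and skipping coverage.

```shell
# Run only the suspect tests, without coverage, to shorten the feedback loop.
vendor/bin/phpunit --no-coverage --filter ActivityReportTest
```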

My gut feeling is that the runner suddenly starts swapping memory to disk. But this is only a wild guess based on nothing.

Has anyone experienced something similar, or have an idea of where to start looking?

I have been running some tests to see whether memory is the problem.

I added a memory_get_peak_usage() call to the tearDown() method of the base test case, so I could see in the logs how much memory PHPUnit was using. It was fairly low: always below 100 MB.

Since my tests run in a Docker container, I tried measuring how much memory the container was using. I ran the container on my own machine and wrapped it with this script:

Unfortunately I forgot to note the source of the script; it was from someone on Stack Overflow.

#!/bin/bash
# Run a command and report its peak resident memory (VmRSS), sampled once
# per second from /proc. Usage: ./maxmem.sh <command> [args...]

maxmem=0

"$@" &
pid=$!

# Sample VmRSS while the process is alive and keep the maximum seen.
while kill -0 "${pid}" 2>/dev/null; do
    mem=$(awk '/VmRSS/ {print $2}' "/proc/${pid}/status" 2>/dev/null)
    if [[ -n ${mem} && ${mem} -gt ${maxmem} ]]; then
        maxmem=${mem}
    fi
    sleep 1
done

wait "${pid}"
exitstatus=$?   # exit status of the wrapped command
echo "Memory usage for $* is: ${maxmem} KB. Exit status: ${exitstatus}"

Surprisingly, the full PHPUnit run used more than 6 GB of RAM on my system, and I have read in several places that the GitLab.com shared runners have a maximum of 4 GB of RAM. So my guess about memory swapping might be right.
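To confirm that hypothesis, it might be worth printing, at the start of the CI job, how much memory the runner actually exposes. This is a sketch using standard Linux interfaces; the first cgroup path assumes cgroup v1, the second cgroup v2.

```shell
# How much RAM and swap does this environment report?
grep -E 'MemTotal|MemAvailable|SwapTotal' /proc/meminfo
# Is the container itself memory-limited by a cgroup?
cat /sys/fs/cgroup/memory/memory.limit_in_bytes 2>/dev/null \
    || cat /sys/fs/cgroup/memory.max 2>/dev/null \
    || echo "no cgroup memory limit file found"
```

If SwapTotal is zero, the runner cannot swap at all and the kernel will instead stall or kill processes under memory pressure, which would also explain extreme slowdowns.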

So the problem does not seem to be GitLab CI itself, but a memory leak occurring inside the Docker container.
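If memory really does accumulate across tests, one mitigation worth trying (a sketch, not a confirmed fix; it makes each test slower but keeps the peak footprint flat) is PHPUnit's built-in process isolation, which runs every test in a fresh PHP process so leaked memory is reclaimed between tests:

```shell
# Each test runs in its own PHP process; leaked memory dies with it.
vendor/bin/phpunit --process-isolation --exclude slow --coverage-clover coverage.xml
```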

Did you ever find a solution to this? I feel like I may be having the same issue.