Pipelines running Codeception / PHPUnit tests take way too much time

Hello,

Problem to solve

Our job running Codeception tests (Codeception is based on PHPUnit) suddenly became slow around 7 May 2024, 5:30 PM UTC.

Here is the report of a fast job, which took 5 minutes: api-test (#6796639610) · Jobs · Eternaltwin / Mush / Mush · GitLab

And here is the report of a slow job, which took 42 minutes (!!!): api-test (#6800520977) · Jobs · Eternaltwin / Mush / Mush · GitLab

Even though more tests have been added since, the time increase is clearly abnormal: the suite is still fast when run locally.

A similar issue was raised by another user, who has since deleted their post: GitLab Runner v17 appears to be taking significantly longer to run PHPUnit tests than v16

Note: the tests themselves still seem to execute quickly, but the delay between each test is enormous.

Steps to reproduce

Run the api-test job on our repository.

I can try to reproduce this in a minimal repository if needed.

Configuration

Here is our .gitlab-ci.yml · develop · Eternaltwin / Mush / Mush · GitLab
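
Since not everyone can open that link, here is a simplified sketch of what the api-test job boils down to. The image names, Postgres version, credentials and command are placeholders, not our exact values; the linked file is the source of truth.

    # Simplified sketch of the api-test job (placeholders only; see the
    # linked .gitlab-ci.yml for the real configuration).
    api-test:
      image: php:8.2-cli               # placeholder for our PHP image
      services:
        - postgres:14                  # database started by GitLab as a job service
      variables:
        POSTGRES_DB: mush_test         # placeholder credentials
        POSTGRES_USER: mush
        POSTGRES_PASSWORD: password
      script:
        - composer install --no-interaction --prefer-dist
        - vendor/bin/codecept run      # Codeception drives PHPUnit under the hood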

Versions

Please select whether options apply, and add the version information.

  • Self-managed
  • GitLab.com SaaS
  • Self-hosted Runners

Thank you in advance for your help.

We’ve been seeing exactly the same problems, starting at the same time mentioned above. GitLab CI/CD is borderline unusable for us at the moment, with tests taking almost an hour to run.

We have seen a similar step change in performance on 7 May: a job has gone from ~20 minutes to ~170 minutes.

Thanks for the tag! I removed my post because I determined this was not strictly an issue between GitLab Runner v16 and v17, but something else has certainly changed with the updates over the last few days.

For example, GitLab’s shared SaaS runners were updated to use Docker Engine v23 instead of v19.

I’m not seeing delays between our test cases, but each test case is now taking multiple seconds to run (we use Pest, which wraps PHPUnit, which may explain the difference). The slowdown is particularly pronounced in tests that hit the MySQL service database provisioned by GitLab through .gitlab-ci.yml.
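
For context, the snippet below is a rough stand-in for how that service is provisioned; it is not our actual file, and the job name, image and credentials are made up.

    # Rough stand-in for the MySQL service setup (names and credentials made up).
    phpunit:
      image: php:8.3-cli               # placeholder for our custom PHP 8.3.6 image
      services:
        - mysql:8.3                    # GitLab starts this container next to the job
      variables:
        MYSQL_DATABASE: app_test
        MYSQL_ROOT_PASSWORD: secret
        DB_HOST: mysql                 # tests reach the service through its hostname alias
      script:
        - vendor/bin/pest              # Pest wraps PHPUnit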

I ruled out the engine change causing any weirdness by manually running my tests inside Docker Engine v23 locally. My next hunch is network latency when reaching our MySQL container. I will leave a comment if I learn anything further.
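
If anyone wants to repeat that local check, a Compose file along these lines reproduces the CI topology (one app container talking to a MySQL container over a bridge network). Everything here is a placeholder; adjust image tags, paths and credentials to your own project.

    # docker-compose.yml sketch for reproducing the CI topology locally.
    services:
      app:
        image: php:8.3-cli             # stand-in for the custom PHP image used in CI
        working_dir: /app
        volumes:
          - .:/app                     # mount the project checkout
        environment:
          DB_HOST: mysql               # same hostname the CI service alias provides
        command: vendor/bin/pest       # run the test suite once MySQL is healthy
        depends_on:
          mysql:
            condition: service_healthy
      mysql:
        image: mysql:8.3
        environment:
          MYSQL_DATABASE: app_test
          MYSQL_ROOT_PASSWORD: secret
        healthcheck:
          test: ["CMD", "mysqladmin", "ping", "-h", "127.0.0.1", "-psecret"]
          interval: 5s
          retries: 10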

Here is my now-deleted post, for posterity:

We’ve been running our PHP (Laravel) project through Pipelines for a few months now, with one of those steps being to run PHPUnit tests. Up until a week ago, this step was taking just under 3 minutes to run. Today, on the same commit, that same step is taking around 23 minutes.

We’re using a custom PHP 8.3.6 Docker image and MySQL 8.3. The hashes for these images have remained the same. Based on Pipeline logs, the only obvious thing that has changed is our GitLab Runner version:

  • Before: Running with gitlab-runner 16.11.0~pre.21.gaa21be2d (aa21be2d) on blue-5.saas-linux-small-amd64.runners-manager.gitlab.com/default -AzERasQ, system ID: s_4cb09cee29e2 feature flags: FF_USE_IMPROVED_URL_MASKING:true
  • After: Running with gitlab-runner 17.0.0~pre.88.g761ae5dd (761ae5dd) on green-4.saas-linux-small-amd64.runners-manager.gitlab.com/default ntHFEtyX, system ID: s_8990de21c550

All the setup steps take the same amount of time to run, including steps that interact with a seeded MySQL database. The only step that is taking longer is running the unit tests (each test case seems to consistently carry some significant overhead).

I tried comparing gitlab-runner source code versions to see if I could spot anything obvious, but I’m in over my head. I also can’t find any obvious release notes anywhere indicating what might have happened here.

The ask

I haven’t posted my .gitlab-ci.yml or Dockerfile (but can if there’s interest) because I’m primarily trying to validate my hunch that some change with Runner caused this issue I’m having (and there are admittedly a TON of weird potential variables that would make raw copies of those files not very helpful).

If you’re seeing unexpectedly elevated Pipeline run times over the past week or so or have an idea of what might have changed between Runner versions, please leave a comment! Thanks.

Versions

Please select whether options apply, and add the version information.

  • Self-managed
  • GitLab.com SaaS
  • Self-hosted Runners

Versions

  • GitLab: 17.0.0-pre 76a706b4d69
  • GitLab Runner, if self-hosted: Not self hosted

Can confirm this issue is happening at the database layer for us. We swapped MySQL for SQLite and, while not all tests pass (due to engine incompatibilities), speeds are back to what we expect. Next, I will try running MySQL inside the same container image.
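
For anyone wanting to run the same experiment on a Laravel/Pest project, the swap can be as small as overriding the database connection in the job’s variables. This assumes a stock Laravel env-based configuration and a made-up job name; your setup may differ.

    # Sketch: point the test run at in-memory SQLite instead of the MySQL service.
    phpunit:
      variables:
        DB_CONNECTION: sqlite          # Laravel connection name
        DB_DATABASE: ":memory:"        # in-memory database, no service container involved
      script:
        - vendor/bin/pest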

I see the OP is using a PostgreSQL image, so the problem may not be isolated to one database engine; it points to some form of communication latency instead.

What the heck. Tests are randomly back to normal execution speed without any other changes over multiple runs. Wondering if there are inconsistencies between different shared runner runtimes. Will pursue my previous theory if this reappears. Curious what everyone else in this thread is seeing.

We upgraded our runner to medium by adding the tag

    tags:
        - saas-linux-medium-amd64

which has brought things back to an acceptable speed.

Our job also runs a postgres service in the background. I wonder if that is a key factor: if the runner change means there is less memory available, we may have started hitting the point where memory is now a limiting factor when it wasn’t before.
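
A cheap way to test the memory theory would be to print the available memory at the start of the job and compare a small run against a medium run. This is just an idea, not something we have confirmed; the job name is a placeholder.

    # Diagnostic sketch: is the small runner under memory pressure once a
    # database service is running alongside the job?
    test:
      before_script:
        - grep -E 'MemTotal|MemAvailable' /proc/meminfo   # works in any Linux image
        - free -m || true                                 # nicer output if procps is installed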

I have an active support case open with GitLab.

Thank you for your insights!

On our side, jobs also came back to their usual speeds earlier today.

Interesting. I tried that as well last night, but with negligible speed improvements. We’ve been on saas-linux-small-amd64 since the beginning. If you learn anything new, I’d love to hear about it!

Well, that sure is something! I appreciate the additional data point. I guess I’ll write this off as an anomaly for now 🙂

I had this feedback from GitLab support:

So we indeed recently made some changes to the VMs executing jobs for our saas-linux-* runners.

This caused some customers to run into problems when getting close to the available disk space. However, in those cases the jobs failed with “no space left on device”. I am now wondering whether this could also lead to slower performance, as you describe.

We realised this yesterday, and the following issue was created: Increase disk size of saas-linux-small-amd64 ephemeral runners.

They suggested we try again with the old tag, and the job is back to its previous runtime, so it looks like they have fixed the underlying issue.
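
For anyone who wants to check whether their jobs brush against that disk limit, printing the free space at the start of the job is enough. This is just a suggestion on my part, not something support asked us to run; the job name is a placeholder.

    # Diagnostic sketch: how close does the job get to the ephemeral disk limit?
    test:
      before_script:
        - df -h .                      # free space on the filesystem holding the build dir
        - du -sh . || true             # size of the checked-out workspace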

I can’t see the link you posted, but I stumbled across the same thing here: Increase disk size for saas-linux-* Hosted Runners for GitLab.com (#217) · Issues · GitLab.org / Ops Sub-Department / shared-runners / infrastructure · GitLab

It’s weird that upgrading my runner size didn’t help in my case last night, but that definitely explains what you experienced. While my own container should’ve been pretty small, it’s likely the database container ran into some performance issues. Oh well! Glad things are back to normal.