Sorry for the delay: I asked internally and have been busy with other tasks. Here's an update:
We could not identify any change in the runner infrastructure, so we only looked at the logs of a fast (4.5 min) and a slow (60+ min) job; the timings of the api-lint jobs seem to vary randomly.
The slow job's log ends with:
Found 0 of 1582 files that can be fixed in 18.746 seconds, 30.000 MB memory used
Warning: "findUnusedBaselineEntry" will default to "true" in Psalm 6. You should explicitly enable or disable this setting.
Warning: "findUnusedCode" will default to "true" in Psalm 6. You should explicitly enable or disable this setting.
Terminated
Please fix psalm errors : vendor/bin/psalm
WARNING: step_script could not run to completion because the timeout was exceeded. For more control over job and script timeouts see: https://docs.gitlab.com/ee/ci/runners/configure_runners.html#set-script-and-after_script-timeouts
while the fast job's log ends with:
Found 0 of 1582 files that can be fixed in 18.397 seconds, 30.000 MB memory used
Warning: "findUnusedBaselineEntry" will default to "true" in Psalm 6. You should explicitly enable or disable this setting.
Warning: "findUnusedCode" will default to "true" in Psalm 6. You should explicitly enable or disable this setting.
Warning: "findUnusedBaselineEntry" will default to "true" in Psalm 6. You should explicitly enable or disable this setting.
Warning: "findUnusedCode" will default to "true" in Psalm 6. You should explicitly enable or disable this setting.
Target PHP version: 8.3 (inferred from composer.json).
Scanning files...
Analyzing files...
░░░░░░░░░░░░
------------------------------
No errors found!
------------------------------
847 other issues found.
You can display them with --show-info=true
------------------------------
Checks took 51.09 seconds and used 374.328MB of memory
Psalm was able to infer types for 96.8692% of the codebase
The code is ready to be merged
Saving cache for successful job
00:05
Creating cache default-non_protected...
Api/vendor/: found 18490 matching artifact files and directories
Api/.php-cs-fixer.cache: found 1 matching artifact files and directories
Api/.psalm/: found 9843 matching artifact files and directories
Uploading cache.zip to https://storage.googleapis.com/gitlab-com-runners-cache/project/19328996/default-non_protected
Created cache
Cleaning up project directory and file based variables
00:01
Job succeeded
Since the CI/CD jobs only run a single catch-all script, it is hard to get more tracing insight. To debug further, we'd suggest adding timing points, log calls, and traces to the CI/CD scripts themselves to better identify the bottlenecks.
Following the Git commit SHA from the MR with the slow job above, the script we looked at specifically is Api/linters.sh · 1649cf4ed42a5645ab77f591f6a3aba49d4b17f4 · Eternaltwin / Mush / Mush · GitLab.
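Here is a minimal sketch of such timing points. It assumes linters.sh runs its linters as sequential shell commands. The `run_step` helper and the commented-out step list are hypothetical and were not taken from the actual script. Each step logs its start time, exit status and duration, so a slow job's log would show which linter was still running when the timeout hit.

```shell
#!/bin/sh
# Hypothetical sketch: timing points for a linter script such as Api/linters.sh.
# run_step and the commented-out step list below are illustrative assumptions,
# not the actual contents of linters.sh.
set -eu

# run_step NAME COMMAND [ARGS...]
# Logs when a step starts, then its exit status and duration in seconds,
# so the CI job log shows which linter is the bottleneck.
run_step() {
  local name="$1"
  shift
  local start end status
  start=$(date +%s)
  echo "[timing] ${name}: started at $(date -u +%H:%M:%S) UTC"
  # Run the step without letting `set -e` abort before the timing line is printed.
  if "$@"; then status=0; else status=$?; fi
  end=$(date +%s)
  echo "[timing] ${name}: finished with status ${status} after $((end - start))s"
  return "$status"
}

# Example usage (placeholder commands and paths; adapt to what linters.sh really runs):
# run_step php-cs-fixer vendor/bin/php-cs-fixer fix --dry-run
# run_step psalm vendor/bin/psalm
# run_step phpmd vendor/bin/phpmd src text phpmd.xml
```

If a step hangs, the last `started at` line without a matching `finished` line identifies it. GitLab's custom collapsible sections (`section_start`/`section_end` markers in the job log) are an alternative. They also show per-section durations in the job log UI, like the `00:05` shown for the cache step in the log above.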
phpmd?
Another finding was that the script uses phpmd, which might have changed with the PHP upgrade on May 26th and could be causing this behavior.
The phpmd CLI has a more verbose mode, which could help with debugging. GitHub - phpmd/phpmd: PHPMD is a spin-off project of PHP Depend and aims to be a PHP equivalent of the well known Java tool PMD. PHPMD can be seen as an user friendly frontend application for the raw metrics stream measured by PHP Depend.
CI/Docker changes
You wrote: "The only difference I can spot between the two jobs is that I pushed a new Docker image for CI (upgrading PHP from 8.3.2 to 8.3.7) and installed some new dependencies (ext-protobuf and some upgrades of existing packages)."
In the feat/opentelemetry branch, I found that the protobuf changes were reverted: refactor: Remove protobuf install from CI Docker image and bypass... (a6b00114) · Commits · Eternaltwin / Mush / Mush · GitLab
Is there a way for you to use an older version of the container image or the installed software, and see whether the problem goes away, e.g. with an older PHP version? I only see the latest tag in Container Registry · Eternaltwin / Mush / Mush · GitLab, though.
While inspecting the Git history to see what exactly happened in the PHP upgrade change, I also learned that the base image is now Alpine, which by itself did not change anything: ci: Migrate PHP CI image to Alpine (d20ad8b6) · Commits · Eternaltwin / Mush / Mush · GitLab, along with more changes.
Looking into the PHP bug tracker for bug reports against 8.3.7, I found an infinite loop issue, PHP 8.3.7 with JIT encounters infinite loop on specific paths · Issue #14475 · php/php-src · GitHub, which suggests running the scripts with strace. Running that in CI/CD could also be an option to actually capture the blocking syscalls. Beware that strace is very verbose, though.
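Here is a rough sketch of what running a linter under strace in CI could look like. It assumes strace is installed in the CI image (for example via `apk add strace` on Alpine) and that the runner allows ptrace inside the container. Neither is guaranteed on shared runners. The `trace_cmd` helper and the trace file name are hypothetical. The trace goes to a file rather than the job log, because of the verbosity.

```shell
#!/bin/sh
# Hypothetical sketch: run a possibly hanging command under strace, as suggested
# in php-src issue #14475, and keep the (very verbose) trace out of the job log.
# Assumes strace is installed in the CI image (e.g. `apk add strace` on Alpine)
# and that the runner permits ptrace inside the container, which is not guaranteed.
set -eu

# trace_cmd OUTFILE COMMAND [ARGS...]
trace_cmd() {
  local out="$1"
  shift
  # -f  follow forked child processes
  # -tt prefix each line with a wall-clock timestamp (microseconds)
  # -T  show the time spent inside each syscall, which makes blocking calls stand out
  # -o  write the trace to OUTFILE instead of stderr
  strace -f -tt -T -o "$out" "$@"
}

# Example usage (hypothetical file name):
# trace_cmd strace-psalm.log vendor/bin/psalm
# tail -n 50 strace-psalm.log  # the last syscalls show where the process was stuck
```

A job that runs into the GitLab timeout is simply terminated. For that reason, it may help to wrap the traced command in a `timeout` shorter than the job timeout. Exposing the trace file as an artifact with `when: always` would then let it be inspected even though the step fails.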
I'm not sure where to dig deeper here; I'd suggest focusing on adding log timing points to the linters.sh script first.