CI job causing segmentation fault only from GitLab - memory issue?

I am attempting to set up GitLab CI/CD for my project, with the GitLab instance hosted internally at my company and the gitlab-runner hosted on my project's separate virtual machine. I'm using the bash shell on CentOS 7. When I use the command "gitlab-runner exec shell <job>", the program that the job runs executes without issues. However, when I commit to my repository and GitLab triggers the runner, the program generates a segmentation fault and the job fails. This made me wonder whether the runner is hitting some kind of memory limit when it tries to run the program. We previously got segfaults running the program in our own environment when our machine only had 4 GB of memory allocated; they disappeared when that was raised to 16 GB. I'm totally new to GitLab CI/CD and have not had much luck searching for answers on this issue, so any feedback or ideas on how to proceed would be greatly appreciated!
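For reference, the local invocation is along these lines (sweep being the job name defined in my .gitlab-ci.yml):

# run the named CI job locally using the shell executor
gitlab-runner exec shell sweep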

Hi,

please share the GitLab version you are using (/help on your server's website). Also, please add the content of your .gitlab-ci.yml to get a better idea about your CI pipeline and executed jobs. Last, please also add some details about the project itself, e.g. are you compiling C++ code, or doing some resource-intensive package building?

Cheers,
Michael

Thank you for your response!

GitLab version: GitLab Community Edition 12.6.4

Contents of my .gitlab-ci.yml file:

stages:
  - sweep
  - deploy

sweep:
  stage: sweep
  script:
    - make all
    - bash test.sh

deploy:
  stage: deploy
  only:
    - master
  script:
    - echo "Do your deploy here"

The project is a large Fortran codebase that does analysis of electrical power systems. I am trying to build the Fortran program that does the analysis (this succeeds) and then run it on some test cases (this generates the segmentation fault). The test.sh script calls a Perl script which runs a sweep of test cases, passing the appropriate input files and command-line arguments to a second Perl script that calls the Fortran program. Everything runs up until the Fortran program, which starts and then hits a segmentation fault.
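Roughly, the chain looks like this (a simplified sketch; the actual script and case names differ):

#!/bin/bash
# test.sh (simplified sketch; real names differ)
perl run_sweep.pl                       # outer Perl script: sweeps over the test cases
# run_sweep.pl invokes, per test case, something like:
#   perl run_case.pl case_input.dat     # inner Perl script: launches the Fortran program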

Hi,

sounds interesting. I have zero knowledge about Fortran, but I could imagine that the environment inside the GitLab runner and shell executor differs from a “normal” Linux shell.

In such situations, I’ll try to go the iterative way: reduce the number of tests fired, and see at which point they start failing. Maybe there is one that does “something weird” with memory allocations.

What happens if you run the program under gdb and capture the stack trace? Does it crash there too, or does it survive because the debugger slows everything down?

Maybe the stack trace from the segfault provides some indications of what’s going wrong here.
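Something along these lines, with the binary and its arguments as placeholders:

# run the program under gdb non-interactively and print a backtrace if it crashes
gdb -batch -ex run -ex bt --args ./your_fortran_program input_file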

Cheers,
Michael


Thank you for the advice! Unfortunately every test is the same - run the entire program beginning to end - but with different inputs. I just tried running a single test and still got the segmentation fault. I’m not sure how to set it up to run with gdb as I haven’t used it before, but I will do some research and give that a try.


So, it turns out that it was a stack size issue in the bash shell that the gitlab-runner was using. I added a stack size increase (ulimit -S -s 1000000) to my test script before running the program, and it was able to run.
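For anyone hitting the same thing, the change was essentially just this at the top of test.sh:

#!/bin/bash
# raise the soft stack size limit for this shell and its children (value in kB)
ulimit -S -s 1000000
# ... then run the sweep as before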


Hey there. I have similar experiences.

image: openfoamplus/of_v2006_centos73

stages:          # List of stages for jobs, and their order of execution
  - build

before_script:
  - set +euo pipefail; . /opt/OpenFOAM/setImage_v2006.sh;

build-job:       # This job runs in the build stage, which runs first.
  stage: build
  script:
    - cd openfoam_ras_T106C
    - gmshToFoam T106_3D.msh
    - createPatch -overwrite
    - checkMesh

When I execute the commands in a terminal, nothing fails. When they are executed by GitLab CI, I get a segmentation fault:

Created fresh repository.
Checking out 84a8d6cc as main...
Skipping Git submodules setup
Executing "step_script" stage of the job script
00:02
Using docker image sha256:f37ab3b17c2dc1fdf1a0e497a0312038b289b3882f5f320dd4e380d71d7c97c4 for openfoamplus/of_v2006_centos73 with digest openfoamplus/of_v2006_centos73@sha256:45438eaff7ab8522eaf8ff48c5dfaa1ef85a25f1ecaa7d19648d9be22278d3ce ...
$ set +euo pipefail;. /opt/OpenFOAM/setImage_v2006.sh;
$ cd openfoam_ras_T106C
$ gmshToFoam T106_3D.msh
/*---------------------------------------------------------------------------*\
| =========                 |                                                 |
| \\      /  F ield         | OpenFOAM: The Open Source CFD Toolbox           |
|  \\    /   O peration     | Version:  v2006                                 |
|   \\  /    A nd           | Website:  www.openfoam.com                      |
|    \\/     M anipulation  |                                                 |
\*---------------------------------------------------------------------------*/
Build  : v2006 OPENFOAM=2006
Arch   : "LSB;label=32;scalar=64"
Exec   : gmshToFoam T106_3D.msh
Date   : Jul 26 2022
Time   : 08:00:55
Host   : runner-y2tsxzyg-project-2588-concurrent-0
PID    : 332
I/O    : uncollated
Case   : /builds/tfd-institute-of-turbomachinery-and-fluid-dynamics/simulations/openfoam_simplefoam_ras_compressor_cascade/openfoam_ras_T106C
nProcs : 1
trapFpe: Floating point exception trapping enabled (FOAM_SIGFPE).
fileModificationChecking : Monitoring run-time modified files using timeStampMaster (fileModificationSkew 5, maxFileModificationPolls 20)
allowSystemOperations : Allowing user-supplied system call operations
// * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * //
Create time
#0  Foam::error::printStack(Foam::Ostream&) at ??:?
#1  Foam::sigSegv::sigHandler(int) at ??:?
#2  ? in /lib64/libpthread.so.0
#3  ? at ??:?
#4  __libc_start_main in /lib64/libc.so.6
#5  ? at ??:?
/usr/bin/bash: line 126:   332 Segmentation fault      (core dumped) gmshToFoam T106_3D.msh
$ createPatch -overwrite

ulimit is unlimited in the CI, so that fix does not help. We are using GitLab 15.0.


I’m unfamiliar with OpenFOAM; what is happening in those build steps? It seems to do some sort of code compilation and then run something.

Quick thoughts:

  • Enable verbose / debug logging to see where exactly things fail
  • Enable core dumps so you can reproduce the error later with a debugger (see the sketch after this list)
  • Try a different runner, self-hosted, with more resources.
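For the core dump route, a minimal sketch (where the core file ends up depends on the container's core_pattern setting, so the name below may differ):

# inside the job script: allow core dumps, run the failing tool, inspect the core
ulimit -c unlimited
gmshToFoam T106_3D.msh || true
gdb -batch -ex bt "$(which gmshToFoam)" core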

Sorry that I forgot about this issue.

The command reads in a file and saves the data into ASCII-based files. I checked the resources of the runner and they don’t seem to be the issue.

$ . /opt/OpenFOAM/setImage_v2006.sh || true;
$ cd openfoam_ras_T106C
$ ls -l
total 16704
drwxrwxrwx. 2 root root       82 Apr 18 18:15 0
-rw-rw-rw-. 1 root root 17093794 Apr 18 18:15 T106C_3D.msh
drwxrwxrwx. 2 root root       66 Apr 18 18:15 constant
-rw-rw-rw-. 1 root root      524 Apr 18 18:15 deltavalues.py
-rw-rw-rw-. 1 root root      372 Apr 18 18:15 slurmjob.sh
drwxrwxrwx. 2 root root      143 Apr 18 18:15 system
$ gmshToFoam T106_3D.msh
$ . /opt/OpenFOAM/setImage_v2006.sh || true;
$ cd openfoam_ras_T106C
$ ulimit -a
core file size          (blocks, -c) unlimited
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 63155
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1048576
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) unlimited
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

I will try out your suggestions. I have not come up with anything different so far…
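One more thing I want to try, based on the stack-size fix earlier in this thread: the ulimit -a output above actually reports a stack size of 8192 kB rather than unlimited, so raising the soft stack limit in the job script might be worth a shot. A sketch (whether this succeeds depends on the hard limit set for the container):

build-job:
  stage: build
  script:
    # raise the soft stack limit before running the OpenFOAM tools;
    # if the hard limit blocks this, it has to be raised on the Docker side
    - ulimit -S -s unlimited
    - cd openfoam_ras_T106C
    - gmshToFoam T106_3D.msh
    - createPatch -overwrite
    - checkMesh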