I am attempting to set up GitLab CI/CD for my project, with the GitLab instance hosted internally at my company and the gitlab-runner hosted on my project’s separate virtual machine. I’m using the bash shell on CentOS 7. When I use the command "gitlab-runner exec shell", the program that the job runs executes without issues. However, when I commit to my repository and GitLab executes the runner, the program generates a segmentation fault and the job fails. This made me wonder whether the runner is hitting some kind of memory limit when it tries to run the program. We got segfaults running the program in our own environment when our machine had only 4 GB of memory allocated; they disappeared when that was raised to 16 GB. I’m totally new to GitLab CI/CD and have not had much luck searching for answers on this issue, so any feedback or ideas on how to proceed would be greatly appreciated!
Please share the GitLab version you are using; you can find it at /help on your server’s website. Also, please add the content of your .gitlab-ci.yml to get a better idea about your CI pipeline and executed jobs. Last, please also add some details about the project itself, i.e. are you compiling C++ code, or doing some resource-intensive package building?
Thank you for your response!
GitLab version: GitLab Community Edition 12.6.4
Contents of my .gitlab-ci.yml file:
```yaml
- make all
- bash test.sh
- echo "Do your deploy here"
```
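For context, those script lines on their own are only a fragment; a minimal `.gitlab-ci.yml` wrapping them could look like the following (the stage and job names here are placeholders of mine, not taken from the actual file):

```yaml
# Hypothetical minimal pipeline around the script lines above;
# the stage and job names are assumptions.
stages:
  - test

build-and-test:
  stage: test
  script:
    - make all
    - bash test.sh
    - echo "Do your deploy here"
```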
The project is a large Fortran codebase that does analysis of electrical power systems. I am trying to build the Fortran program that does the analysis (this succeeds) and then run it on some test cases (this generates the segmentation fault). The test.sh script calls a perl script which runs a sweep of test cases, passing the appropriate input files and command line arguments to a second perl script that calls the Fortran program. Everything runs up until the Fortran program, which starts, then hits a segmentation fault.
sounds interesting. I have zero knowledge about Fortran but I could imagine that the environment inside the GitLab runner and shell executor is different to a “normal” Linux shell.
In such situations, I’d try the iterative approach: reduce the number of tests run, and see at which point they start failing. Maybe there is one which does “something weird” with memory allocations.
What happens if you run the program under gdb and write out the stack trace? Does it crash there too, or does it survive because the debugging VM slows things down?
Maybe the stack trace from the segfault provides some indications of what’s going wrong here.
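As a sketch of that gdb suggestion: batch mode prints a backtrace straight into the job log. The program name and input file below are placeholders, not the actual project’s names.

```shell
# Hypothetical: run the crashing program under gdb in batch mode so the
# backtrace lands in the CI job log. Program and input names are placeholders.
if command -v gdb >/dev/null 2>&1; then
    # -batch: exit when done; -ex run: start the program; -ex bt: print backtrace
    gdb -batch -ex run -ex bt --args ./analysis_program test_case.inp || true
    gdb_note="ran gdb in batch mode"
else
    gdb_note="gdb not installed on this runner"
fi
echo "$gdb_note"
```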
Thank you for the advice! Unfortunately every test is the same - run the entire program beginning to end - but with different inputs. I just tried running a single test and still got the segmentation fault. I’m not sure how to set it up to run with gdb as I haven’t used it before, but I will do some research and give that a try.
So, it turns out that it was a stack size issue in the bash shell that the gitlab-runner was using. I added a stack size increase (ulimit -S -s 1000000) to my test script before running the program, and it was then able to run.
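A minimal sketch of that fix as it could appear at the top of test.sh; the limit value is the one from the post, and the commented-out program name is a placeholder:

```shell
#!/usr/bin/env bash
# Raise the soft stack limit for this shell and all child processes
# before launching the Fortran program (value is in kB, so ~1 GB here).
ulimit -S -s 1000000
echo "stack size limit is now $(ulimit -s) kB"
# ./analysis_program test_case.inp   # placeholder for the real run
```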
Hey there. I have had similar experiences.
```yaml
image: openfoamplus/of_v2006_centos73

stages:          # List of stages for jobs, and their order of execution
  - build

before_script:
  - set +euo pipefail; . /opt/OpenFOAM/setImage_v2006.sh;

build-job:       # This job runs in the build stage, which runs first.
  stage: build
  script:
    - cd openfoam_ras_T106C
    - gmshToFoam T106_3D.msh
    - createPatch -overwrite
    - checkMesh
```
When executing the commands in a terminal, nothing fails. When they are executed by GitLab CI, I get a segmentation fault:
```
Created fresh repository.
Checking out 84a8d6cc as main...
Skipping Git submodules setup
Executing "step_script" stage of the job script 00:02
Using docker image sha256:f37ab3b17c2dc1fdf1a0e497a0312038b289b3882f5f320dd4e380d71d7c97c4 for openfoamplus/of_v2006_centos73 with digest openfoamplus/of_v2006_centos73@sha256:45438eaff7ab8522eaf8ff48c5dfaa1ef85a25f1ecaa7d19648d9be22278d3ce ...
$ set +euo pipefail;. /opt/OpenFOAM/setImage_v2006.sh;
$ cd openfoam_ras_T106C
$ gmshToFoam T106_3D.msh
/*---------------------------------------------------------------------------*\
| =========                 |                                                 |
| \\      /  F ield         | OpenFOAM: The Open Source CFD Toolbox           |
|  \\    /   O peration     | Version:  v2006                                 |
|   \\  /    A nd           | Website:  www.openfoam.com                      |
|    \\/     M anipulation  |                                                 |
\*---------------------------------------------------------------------------*/
Build  : v2006 OPENFOAM=2006
Arch   : "LSB;label=32;scalar=64"
Exec   : gmshToFoam T106_3D.msh
Date   : Jul 26 2022
Time   : 08:00:55
Host   : runner-y2tsxzyg-project-2588-concurrent-0
PID    : 332
I/O    : uncollated
Case   : /builds/tfd-institute-of-turbomachinery-and-fluid-dynamics/simulations/openfoam_simplefoam_ras_compressor_cascade/openfoam_ras_T106C
nProcs : 1
trapFpe: Floating point exception trapping enabled (FOAM_SIGFPE).
fileModificationChecking : Monitoring run-time modified files using timeStampMaster (fileModificationSkew 5, maxFileModificationPolls 20)
allowSystemOperations : Allowing user-supplied system call operations

// * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * //
Create time

#0  Foam::error::printStack(Foam::Ostream&) at ??:?
#1  Foam::sigSegv::sigHandler(int) at ??:?
#2  ? in /lib64/libpthread.so.0
#3  ? at ??:?
#4  __libc_start_main in /lib64/libc.so.6
#5  ? at ??:?
/usr/bin/bash: line 126:   332 Segmentation fault      (core dumped) gmshToFoam T106_3D.msh
$ createPatch -overwrite
```
ulimit is unlimited in the CI. This does not help. We are using GitLab 15.0
I’m unfamiliar with OpenFOAM; what is happening in those build steps? It seems to do some sort of code compilation and then run something.
- Enable verbose / debug logging to see where exactly things fail
- Enable core dumps being written to manually reproduce the error later with a debugger
- Try a different runner, self-hosted, with more resources.
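For the second point, a sketch of enabling core dumps inside the job shell; the gdb command in the comment assumes a binary name and core path:

```shell
# Allow core files of unlimited size in this shell and its children.
ulimit -c unlimited
echo "core file size limit: $(ulimit -c)"
# Show where the kernel writes core files on this machine:
cat /proc/sys/kernel/core_pattern
# Later, a dump could be inspected with something like:
#   gdb ./gmshToFoam /path/to/core -batch -ex bt
```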
Sorry that I forgot about this issue.
The command reads in a file and saves the data into ASCII-based files. I checked the resources of the runner and they don’t seem to be the issue.
```
$ . /opt/OpenFOAM/setImage_v2006.sh || true;
$ cd openfoam_ras_T106C
$ ls -l
total 16704
drwxrwxrwx. 2 root root       82 Apr 18 18:15 0
-rw-rw-rw-. 1 root root 17093794 Apr 18 18:15 T106C_3D.msh
drwxrwxrwx. 2 root root       66 Apr 18 18:15 constant
-rw-rw-rw-. 1 root root      524 Apr 18 18:15 deltavalues.py
-rw-rw-rw-. 1 root root      372 Apr 18 18:15 slurmjob.sh
drwxrwxrwx. 2 root root      143 Apr 18 18:15 system
$ gmshToFoam T106_3D.msh
```
```
$ . /opt/OpenFOAM/setImage_v2006.sh || true;
$ cd openfoam_ras_T106C
$ ulimit -a
core file size          (blocks, -c) unlimited
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 63155
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1048576
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) unlimited
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
```
I will be trying out your suggestions. I have not come up with anything different so far…