Troubleshooting GitLab Crashes

Looking to troubleshoot a problem where our GitLab instance will just appear to die: we can't connect to it, and it doesn't respond to requests. It's almost always solved with a simple reboot, but we'd like to understand the root cause of why this might be happening. We're currently on GitLab CE 16.11.8 (working on porting the instance over to 17.x).

I’ve looked through the logs documentation, but I'm not sure which log is going to have the information I need. I’ve looked at a few, but nothing is really standing out. Any assistance is appreciated.

By “it” do you mean ports 80, 443, and 22 with HTTP and SSH clients? Or does the entire server not respond to any sort of connection or ICMP request? If the latter, start the investigation in the system log first before going into the application (GitLab) logs.
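A quick way to tell the two cases apart, from another machine (gitlab.example.com is a placeholder for your hostname):

```
# Does the host answer ICMP at all?
ping -c 3 gitlab.example.com

# Are the individual service ports reachable?
nc -zv gitlab.example.com 22
nc -zv gitlab.example.com 80
nc -zv gitlab.example.com 443

# Does the web application itself respond?
curl -sSI https://gitlab.example.com/users/sign_in
```

If ping fails too, the whole box is gone and the GitLab logs won't tell you much; the system log (or cloud console) is the place to look.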

Also, infrastructure monitoring can provide trend graphs over time for CPU, memory, IO, disk, and network usage, and help with root cause analysis and error correlation. Tools like Prometheus, Zabbix, or Checkmk can be helpful for monitoring.
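Worth noting that an Omnibus GitLab install already bundles Prometheus and node_exporter, so some of this history may already be sitting on the box. A rough sketch, assuming a default Omnibus setup:

```
# Check whether the bundled monitoring services are running
sudo gitlab-ctl status | grep -E 'prometheus|node-exporter'

# The bundled Prometheus listens on localhost:9090 by default;
# e.g. query available memory via its HTTP API:
curl -s 'http://localhost:9090/api/v1/query?query=node_memory_MemAvailable_bytes'
```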

Speaking from my own experience, one possibility is that the hardware is undersized and processes fight for resources, causing OS freezes or out-of-memory crashes. Or there is a Linux OS bug or issue that causes instability. Or disk usage climbs to 100% and a reboot only buys some time by clearing out caches until they fill up again.
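The disk-full scenario is cheap to check; a sketch (the paths assume a default Omnibus install):

```
# Overall usage per filesystem
df -h

# The usual GitLab space hogs: repositories, logs, backups
sudo du -sh /var/opt/gitlab/git-data /var/log/gitlab /var/opt/gitlab/backups 2>/dev/null
```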

Which leads me to a few questions (the snippet after the list shows one way to collect most of this):

  • Which Linux distribution, and which version (e.g. Ubuntu 24.04 LTS)?
  • How much CPU, RAM, and disk is assigned?
  • How many GitLab projects, groups, and users?
  • VM, bare metal, cloud, or containers?
  • A screenshot of sudo htop (if not installed: yum/apt/zypper install htop)
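A sketch for gathering most of that in one pass on an Omnibus install (the gitlab-rails counts can take a moment on a loaded box):

```
# Distribution and version
lsb_release -a 2>/dev/null || cat /etc/os-release

# CPU, RAM, disk
nproc
free -h
df -h

# Rough GitLab object counts
sudo gitlab-rails runner "puts({users: User.count, projects: Project.count, groups: Group.count})"
```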

Here’s the htop output. It’s a small number of users/projects… I think 5 users? It’s cloud-hosted in AWS, a t2.medium running Ubuntu 18.04.6 LTS.

Here’s the disk usage:
[screenshot: disk usage]

AFAIK regular support for Ubuntu 18.04 ended in 2023. It might be worthwhile to upgrade to 22.04 or 24.04 LTS and rule out performance issues due to old/outdated software.

RAM usage is high, and 4 GB might be too little. Searching the syslog for crashes and out-of-memory errors would be my first approach.
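Something like this would be my starting point (on Ubuntu the kernel messages land in /var/log/syslog, and the rotated files survive the reboots you've been doing):

```
# OOM killer traces in kernel messages since the current boot
sudo dmesg -T | grep -iE 'out of memory|oom'

# Same search across current and rotated syslogs (zgrep handles the .gz files)
sudo zgrep -iE 'oom|out of memory|killed process' /var/log/syslog* 2>/dev/null

# Kernel messages from the systemd journal, current boot
sudo journalctl -k | grep -i oom
```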


I would suggest moving to 8 GB; 4 GB just doesn't cut it, and tuning won't buy you much, so it's more hassle than just having a machine with 8 GB. Even after tuning (I tried), it will use 3.5 GB plus swap, which means your VM runs slowly and is susceptible to the OOM killer, which is what you are experiencing when GitLab dies. What @dnsmichi wrote pretty much hits the nail on the head.
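For reference, the usual memory-trimming knobs live in /etc/gitlab/gitlab.rb; a sketch with illustrative values (exact option names can vary between GitLab versions, so check the docs for yours):

```
# Example settings to trim memory in /etc/gitlab/gitlab.rb:
#   puma['worker_processes'] = 2              # fewer Puma workers
#   sidekiq['concurrency'] = 10               # fewer Sidekiq threads
#   prometheus_monitoring['enable'] = false   # drop bundled monitoring
# Then apply and watch what memory does:
sudo gitlab-ctl reconfigure
ps aux --sort=-rss | head -n 15   # top processes by resident memory
```

Even so, on 4 GB you're mostly rearranging deck chairs, hence the recommendation to just add RAM.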


Good call. A faster first step is simply increasing resources: I'd recommend 8 GB RAM, and maybe 4 CPUs instead of 2. Once the system runs reliably again, you can conduct a root cause analysis without stress.
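On AWS that's just an instance-type change; a sketch with the AWS CLI (the instance ID is a placeholder, and e.g. t3.large gives 2 vCPU/8 GiB while t3.xlarge gives 4 vCPU/16 GiB):

```
# The instance must be stopped before its type can be changed
aws ec2 stop-instances --instance-ids i-0123456789abcdef0
aws ec2 wait instance-stopped --instance-ids i-0123456789abcdef0

aws ec2 modify-instance-attribute \
  --instance-id i-0123456789abcdef0 \
  --instance-type "{\"Value\": \"t3.xlarge\"}"

aws ec2 start-instances --instance-ids i-0123456789abcdef0
```

One caveat: moving from t2 to t3 requires ENA support in the AMI; if that's a concern, a larger t2 (t2.large is 2 vCPU/8 GiB, t2.xlarge is 4 vCPU/16 GiB) sidesteps the question.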
