Troubleshooting GitLab Crashes

Looking to troubleshoot a problem where our GitLab instance will just appear to die: we can't connect to it, and it doesn't respond to requests. It's almost always solved with a simple reboot, but we'd like to understand the root cause of why this might be happening. We're currently on GitLab CE 16.11.8 (working on porting the instance over to 17.x).

I’ve looked through the logs documentation, but I'm not sure which log is going to have the information I need. I’ve looked at a few, but nothing is really standing out. Any assistance is appreciated.

By “it” do you mean ports 80, 443, and 22 with HTTP and SSH clients? Or does the entire server not respond to any sort of connection or ICMP request? If the latter, start the investigation in the system log first before going into the application (GitLab) logs.
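A quick way to tell the two cases apart, from another machine (gitlab.example.com is a placeholder for your hostname):

```
# Does the host answer ICMP at all?
ping -c 3 gitlab.example.com

# Are the individual service ports reachable?
nc -zv gitlab.example.com 22
nc -zv gitlab.example.com 80
nc -zv gitlab.example.com 443

# Does the web application itself respond?
curl -sSI https://gitlab.example.com/users/sign_in
```

If ping fails too, the whole box is gone and the GitLab logs won't tell you much; the system log (or cloud console) is the place to look.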

Also, infrastructure monitoring can provide trend graphs over time for CPU, memory, IO, disk, and network usage, and help with root cause analysis and error correlation. Tools like Prometheus, Zabbix, or Checkmk can be helpful for monitoring.
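Worth noting that an Omnibus GitLab install already bundles Prometheus and node_exporter, so some of this history may already be sitting on the box. A rough sketch, assuming a default Omnibus setup:

```
# Check whether the bundled monitoring services are running
sudo gitlab-ctl status | grep -E 'prometheus|node-exporter'

# The bundled Prometheus listens on localhost:9090 by default;
# e.g. query available memory via its HTTP API:
curl -s 'http://localhost:9090/api/v1/query?query=node_memory_MemAvailable_bytes'
```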

Speaking from my own experience, one possibility is that the hardware is undersized and processes fight for resources, causing OS freezes or out-of-memory crashes. Or there is a Linux OS bug or issue that causes instability. Or disk usage climbs to 100% and a reboot only buys some time by clearing out caches until they fill up again.
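The disk-full scenario is cheap to check; a sketch (the paths assume a default Omnibus install):

```
# Overall usage per filesystem
df -h

# The usual GitLab space hogs: repositories, logs, backups
sudo du -sh /var/opt/gitlab/git-data /var/log/gitlab /var/opt/gitlab/backups 2>/dev/null
```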

Which leads me to a few questions (the snippet after the list shows one way to collect most of this):

  • Which Linux distribution, and which version (e.g. Ubuntu 24.04 LTS)?
  • How much CPU, RAM, and disk is assigned?
  • How many GitLab projects, groups, and users?
  • VM, bare metal, cloud, or containers?
  • A screenshot of sudo htop (if not installed: yum/apt/zypper install htop)
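A sketch for gathering most of that in one pass on an Omnibus install (the gitlab-rails counts can take a moment on a loaded box):

```
# Distribution and version
lsb_release -a 2>/dev/null || cat /etc/os-release

# CPU, RAM, disk
nproc
free -h
df -h

# Rough GitLab object counts
sudo gitlab-rails runner "puts({users: User.count, projects: Project.count, groups: Group.count})"
```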

Here’s the htop output. It’s a small number of users/projects… I think 5 users? It’s cloud-hosted in AWS, a t2.medium running Ubuntu 18.04.6 LTS.

Here’s the disk usage:
[screenshot: disk usage]

AFAIK regular support for Ubuntu 18.04 ended in 2023. It might be worthwhile to upgrade to 22.04 or 24.04 LTS and rule out performance issues due to old/outdated software.

RAM usage is high, and 4 GB might be too little. Searching the syslog for crashes and out-of-memory errors would be my first approach.
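Something like this would be my starting point (on Ubuntu the kernel messages land in /var/log/syslog, and the rotated files survive the reboots you've been doing):

```
# OOM killer traces in kernel messages since the current boot
sudo dmesg -T | grep -iE 'out of memory|oom'

# Same search across current and rotated syslogs (zgrep handles the .gz files)
sudo zgrep -iE 'oom|out of memory|killed process' /var/log/syslog* 2>/dev/null

# Kernel messages from the systemd journal, current boot
sudo journalctl -k | grep -i oom
```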


I would suggest moving to 8 GB; 4 GB just doesn't cut it, and tuning won't buy you much, so it's more hassle than just having a machine with 8 GB. Even after tuning (I tried), it will use 3.5 GB plus swap, which means your VM runs slowly and is susceptible to the OOM killer, which is what you are experiencing when GitLab dies. What @dnsmichi wrote pretty much hits the nail on the head.
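For reference, the usual memory-trimming knobs live in /etc/gitlab/gitlab.rb; a sketch with illustrative values (exact option names can vary between GitLab versions, so check the docs for yours):

```
# Example settings to trim memory in /etc/gitlab/gitlab.rb:
#   puma['worker_processes'] = 2              # fewer Puma workers
#   sidekiq['concurrency'] = 10               # fewer Sidekiq threads
#   prometheus_monitoring['enable'] = false   # drop bundled monitoring
# Then apply and watch what memory does:
sudo gitlab-ctl reconfigure
ps aux --sort=-rss | head -n 15   # top processes by resident memory
```

Even so, on 4 GB you're mostly rearranging deck chairs, hence the recommendation to just add RAM.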


Good call. A faster first step is simply increasing resources: I'd recommend 8 GB RAM, and maybe 4 CPUs instead of 2. Once the system runs reliably again, you can conduct a root cause analysis without stress.
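On AWS that's just an instance-type change; a sketch with the AWS CLI (the instance ID is a placeholder, and e.g. t3.large gives 2 vCPU/8 GiB while t3.xlarge gives 4 vCPU/16 GiB):

```
# The instance must be stopped before its type can be changed
aws ec2 stop-instances --instance-ids i-0123456789abcdef0
aws ec2 wait instance-stopped --instance-ids i-0123456789abcdef0

aws ec2 modify-instance-attribute \
  --instance-id i-0123456789abcdef0 \
  --instance-type "{\"Value\": \"t3.xlarge\"}"

aws ec2 start-instances --instance-ids i-0123456789abcdef0
```

One caveat: moving from t2 to t3 requires ENA support in the AMI; if that's a concern, a larger t2 (t2.large is 2 vCPU/8 GiB, t2.xlarge is 4 vCPU/16 GiB) sidesteps the question.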
