GitLab crashes with a 502 error randomly

Problem to solve

Frequently throughout the day, our GitLab Omnibus instance appears to “restart” (we have watched sudo gitlab-ctl status during the issue and nothing is down). We have dug extensively through the logs and found nothing that correlates directly with the problem. We suspected it was related to reviewing issues or merge requests, but we have also been hit with a 502 error on the first page view after a period when no one was using the application.

Steps to reproduce

We do not know how to reproduce it reliably, but have dug extensively through the logs. This issue started happening towards the end of last year and seems to be slowly getting worse. Our debugging process has gone like this:

sudo gitlab-ctl tail > gitlab.log
# Reproduce the 502 crash error (usually by spamming interactions on the clients: comments, reactions, issues, approvals, etc.)
sudo journalctl --since="DATE TIME" --until="DATE TIME" > journal.log
less gitlab.log # Grep for (502|error|status)
less journal.log

We looked for the first 502 error in the file, then looked for preceding errors. We did this multiple times; in most cases there were no errors until about 5-10 seconds before the crash, and in some cases up to 30 seconds. The only errors we found (some of which only had a severity of WARN) were: IpynbDiff::InvalidNotebookError, MergeRequest.mergeError, BlobViewer.renderError. None of these were consistent across the crashes. We also tried disabling certain features such as Pages and SMTP, without any success.
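A minimal sketch of that search, assuming GNU grep and bash. The function name first_match_context and its default of 50 context lines are made up for illustration; the badgateway pattern is taken from the error message quoted in this post.

```shell
# Hypothetical helper (not part of GitLab): print the first line in a log
# that matches a pattern, together with the N lines that precede it.
first_match_context() {
  local file="$1" pattern="$2" before="${3:-50}"
  # -E  extended regex, so alternations like 'a|b' work
  # -n  prefix every printed line with its line number
  # -m 1  stop reading after the first matching line
  # -B  print this many lines of context before the match
  grep -n -E -m 1 -B "$before" "$pattern" "$file"
}

# Example against the capture from `sudo gitlab-ctl tail > gitlab.log`:
#   first_match_context gitlab.log 'badgateway' 200
```

In the output, context lines are marked with "-" after the line number and the matching line with ":", which makes it easy to see what was logged in the seconds before the first 502.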

The message preceding the 502 errors after the crash is:

badgateway: failed to receive response: dial unix /var/opt/gitlab/gitlab-rails/sockets/gitlab.socket: connect: connection refused

The CPU/RAM do not spike when the crash occurs. We watched btop while a crash occurred and the CPU graph is at ~0% when it occurs, then about 20-30% during the “startup”. We have tried doubling CPU cores and RAM before without success.

Please suggest more places we should be looking for error logs, as so far this problem seems to be silent. Simple checks like sudo gitlab-ctl status show no issues.

Configuration

  • GitLab Omnibus v16.9.2-ee
  • Red Hat Enterprise Linux 8.5 (Linux localhost.localdomain 4.18.0-348.23.1.el8_5.x86_64 #1 SMP Tue Apr 12 11:20:32 EDT 2022 x86_64 x86_64 x86_64 GNU/Linux)
  • This system is a VM running on an “identical” system.
  • This VM has 4 cores and 8 GB of RAM; the CPU never peaks and RAM usage averages 4 GB.
  • GitLab is the only thing installed and running on the system
  • We have about 4 “active” users on the server.
gitlab.rb
external_url 'https://gitlab.DOMAIN.com'
gitlab_rails['smtp_enable'] = false
gitlab_rails['smtp_address'] = "smtp.office365.com"
gitlab_rails['smtp_port'] = 587
gitlab_rails['smtp_user_name'] = "noreply@DOMAIN.com"
gitlab_rails['smtp_password'] = "PASSWORD"
gitlab_rails['smtp_domain'] = "DOMAIN.com"
gitlab_rails['smtp_authentication'] = "login"
gitlab_rails['smtp_enable_starttls_auto'] = true
gitlab_rails['smtp_openssl_verify_mode'] = 'peer'
gitlab_rails['gitlab_email_from'] = 'noreply@DOMAIN.com'
gitlab_rails['gitlab_email_display_name'] = 'GitLab'
gitlab_rails['gitlab_email_subject_suffix'] = 'GitLab'
puma['worker_processes'] = 0 # 2
sidekiq['concurrency'] = 10
nginx['ssl_certificate'] = "/etc/gitlab/ssl/gitlab.crt"
nginx['ssl_certificate_key'] = "/etc/gitlab/ssl/gitlab.key"
pages_external_url "https://pages.DOMAIN.com"
gitlab_pages['enable'] = true
gitlab_pages['listen_proxy'] = "127.0.0.1:8090"
gitlab_pages['access_control'] = false
gitlab_pages['namespace_in_path'] = true
pages_nginx['enable'] = true
pages_nginx['redirect_http_to_https'] = true
pages_nginx['ssl_certificate'] = "/etc/gitlab/ssl/pages.DOMAIN.com.crt"
pages_nginx['ssl_certificate_key'] = "/etc/gitlab/ssl/pages.DOMAIN.com.key"
prometheus['enable'] = false
prometheus_monitoring['enable'] = false
gitaly['configuration'][:hooks][:custom_hooks_dir] = "/var/opt/gitlab/gitaly/custom_hooks"

sudo netstat -tulpn
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 127.0.0.1:9229          0.0.0.0:*               LISTEN      1665/gitlab-workhor
tcp        0      0 127.0.0.1:8080          0.0.0.0:*               LISTEN      537963/puma 6.4.0 (
tcp        0      0 0.0.0.0:80              0.0.0.0:*               LISTEN      349202/nginx: maste
tcp        0      0 127.0.0.1:8082          0.0.0.0:*               LISTEN      361886/sidekiq_expo
tcp        0      0 127.0.0.1:9236          0.0.0.0:*               LISTEN      1693/gitaly
tcp        0      0 127.0.0.1:8150          0.0.0.0:*               LISTEN      1644/gitlab-kas
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      1230/sshd
tcp        0      0 127.0.0.1:8151          0.0.0.0:*               LISTEN      1644/gitlab-kas
tcp        0      0 127.0.0.1:25            0.0.0.0:*               LISTEN      340886/master
tcp        0      0 127.0.0.1:8153          0.0.0.0:*               LISTEN      1644/gitlab-kas
tcp        0      0 127.0.0.1:8090          0.0.0.0:*               LISTEN      349092/gitlab-pages
tcp        0      0 127.0.0.1:8154          0.0.0.0:*               LISTEN      1644/gitlab-kas
tcp        0      0 0.0.0.0:443             0.0.0.0:*               LISTEN      349202/nginx: maste
tcp        0      0 127.0.0.1:8155          0.0.0.0:*               LISTEN      1644/gitlab-kas
tcp        0      0 127.0.0.1:8092          0.0.0.0:*               LISTEN      361884/sidekiq 7.1.
tcp        0      0 0.0.0.0:8060            0.0.0.0:*               LISTEN      349202/nginx: maste
tcp6       0      0 :::22                   :::*                    LISTEN      1230/sshd
tcp6       0      0 ::1:25                  :::*                    LISTEN      340886/master
udp        0      0 127.0.0.1:323           0.0.0.0:*                           1172/chronyd
udp6       0      0 ::1:323                 :::*                                1172/chronyd

Versions

sudo gitlab-rake gitlab:env:info
System information
System:
Proxy:          no
Current User:   git
Using RVM:      no
Ruby Version:   3.1.4p223
Gem Version:    3.5.5
Bundler Version:2.5.5
Rake Version:   13.0.6
Redis Version:  7.0.15
Sidekiq Version:7.1.6
Go Version:     unknown

GitLab information
Version:        16.9.2-ee
Revision:       0d71d32d321
Directory:      /opt/gitlab/embedded/service/gitlab-rails
DB Adapter:     PostgreSQL
DB Version:     14.10
URL:            https://gitlab.DOMAIN.com
HTTP Clone URL: https://gitlab.DOMAIN.com/some-group/some-project.git
SSH Clone URL:  git@gitlab.DOMAIN.com:some-group/some-project.git
Elasticsearch:  no
Geo:            no
Using LDAP:     no
Using Omniauth: yes
Omniauth Providers:

GitLab Shell
Version:        14.33.0
Repository storages:
- default:      unix:/var/opt/gitlab/gitaly/gitaly.socket
GitLab Shell path:              /opt/gitlab/embedded/service/gitlab-shell

Gitaly
- default Address:      unix:/var/opt/gitlab/gitaly/gitaly.socket
- default Version:      16.9.2
- default Git Version:  2.43.0
sudo gitlab-runner --version
Version:      16.8.0
Git revision: c72a09b6
Git branch:   16-8-stable
GO version:   go1.21.5
Built:        2024-01-18T22:42:25+0000
OS/Arch:      linux/amd64

Similar Issues

Please let me know what else I can provide to help solve this issue. We have been poring over logs for the past few weeks and can’t find anything. I am required to screen anything I send, which is why I didn’t upload the logs directly.

Thanks to @stanhu, we have found the source of this issue.

The problem was this line in our gitlab.rb file.

puma['worker_processes'] = 0

Simply removing it (reverting to the default value) solved our crashes!
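For anyone checking their own instance, here is a minimal sketch that looks for the same override. It assumes the default Omnibus config path /etc/gitlab/gitlab.rb; the function name find_puma_worker_override is made up for illustration. It only reports active (uncommented) lines that set the value.

```shell
# Hypothetical check (not part of GitLab): list uncommented
# puma['worker_processes'] settings in a gitlab.rb file.
find_puma_worker_override() {
  local conf="${1:-/etc/gitlab/gitlab.rb}"
  # Match only lines where the setting starts the line (after optional
  # whitespace), so commented-out "# puma[...]" lines are ignored.
  grep -n -E "^[[:space:]]*puma\['worker_processes'\]" "$conf"
}

# If this prints a line such as
#   puma['worker_processes'] = 0
# remove or comment it out in gitlab.rb, then apply the change:
#   sudo gitlab-ctl reconfigure
#   sudo gitlab-ctl status
```

This thread only confirms that removing the line fixed the 502s in our setup; it does not establish why the value 0 caused them.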
